(LK-C-3) Add controlled MultiRZ support to Lightning Kokkos #954

Merged
merged 82 commits into from
Nov 13, 2024
Changes from 73 commits
4b9062e
initial commit
josephleekl Oct 18, 2024
ba70971
initial commit
josephleekl Oct 18, 2024
6ac216a
support controlled 1 qubit gate
josephleekl Oct 18, 2024
bea8bf7
add controlled globalphase test
josephleekl Oct 18, 2024
38c48b9
fix controlled globalphase return
josephleekl Oct 18, 2024
968b86e
format
josephleekl Oct 18, 2024
8c97d82
format
josephleekl Oct 18, 2024
09b07d5
format
josephleekl Oct 19, 2024
a7106b7
add globalphase python test
josephleekl Oct 19, 2024
34e6400
apply controlled only for 1 qubit gate
josephleekl Oct 21, 2024
b1263fa
format
josephleekl Oct 21, 2024
1245270
trigger CI
josephleekl Oct 21, 2024
7ebee0f
fix parity_2_offset to kokkos_inline_function
josephleekl Oct 21, 2024
cec599b
separate generate/control bit patterns
josephleekl Oct 22, 2024
a01b38a
format
josephleekl Oct 22, 2024
8dec286
initial commit
josephleekl Oct 18, 2024
17ef49b
support 2/3/4 qubit control gates
josephleekl Oct 19, 2024
e3911c3
update statevectorkokkos error
josephleekl Oct 19, 2024
a136669
update statevectorkokkos error
josephleekl Oct 19, 2024
bead29b
test control=wire exception
josephleekl Oct 19, 2024
aa944b0
remove controlled toffoli matrix test
josephleekl Oct 19, 2024
b98a2a1
fix disable control qubitunitary/blockencode
josephleekl Oct 21, 2024
bde08d0
format
josephleekl Oct 21, 2024
35b048a
fix disable multiRZ
josephleekl Oct 21, 2024
993a6f1
separate generate/controlbitpatterns
josephleekl Oct 22, 2024
ded7ac4
initial commit
josephleekl Oct 18, 2024
dc05a33
initial commit
josephleekl Oct 18, 2024
fd0799d
LK support controlled multiRZ
josephleekl Oct 21, 2024
65757fd
format
josephleekl Oct 21, 2024
3e36d9d
fix operation list
josephleekl Oct 21, 2024
bc9d408
separate generate/controlbitpatterns
josephleekl Oct 22, 2024
3fb1ec0
add fail matrix test
josephleekl Oct 23, 2024
92f8939
clean up
josephleekl Oct 23, 2024
29d7136
update test
josephleekl Oct 23, 2024
0a4b71d
edit state_vector.py
josephleekl Oct 23, 2024
2a2c4f9
recover c-globalphase + check target_wire=1
josephleekl Oct 23, 2024
e4e534f
format
josephleekl Oct 23, 2024
d0e75e3
accept list of controlled op
josephleekl Oct 24, 2024
193338c
add control H/T/S/phaseshift test
josephleekl Oct 24, 2024
3367a4b
update globalphase impl
josephleekl Oct 24, 2024
79d5c01
update ali comments
josephleekl Oct 30, 2024
3a99c5a
update comments
josephleekl Oct 31, 2024
4dd344e
update comments
josephleekl Oct 31, 2024
465c46c
update to std::ranges::views
josephleekl Oct 31, 2024
bb46635
revert std::view change
josephleekl Oct 31, 2024
e148462
add tests with 2 control wires
josephleekl Nov 1, 2024
c79d411
format
josephleekl Nov 1, 2024
6c08fbd
fix maybe_unused attribute from lambda to func body
josephleekl Nov 1, 2024
ecd8677
update lightning_kokkos.toml
josephleekl Nov 1, 2024
a250a76
format
josephleekl Nov 1, 2024
0e160bd
update basicgatefunctors
josephleekl Nov 6, 2024
9502b78
update lambda capture by value
josephleekl Nov 7, 2024
498802e
implement comments
josephleekl Nov 7, 2024
3d1e859
make format
josephleekl Nov 8, 2024
a69a4b5
merge with lk-control-base
josephleekl Nov 8, 2024
f25011b
merge with lk-control-1Q
josephleekl Nov 8, 2024
cc45e61
fix merge issues
josephleekl Nov 8, 2024
fedd270
format
josephleekl Nov 8, 2024
10a18bb
update gate functors and toml
josephleekl Nov 8, 2024
82b0b85
add tests
josephleekl Nov 8, 2024
240500e
Merge branch 'lk-control-base' into lk-control-gate-23Q
josephleekl Nov 9, 2024
020c215
fix merge squash
josephleekl Nov 9, 2024
b33d680
update alfredo comments
josephleekl Nov 11, 2024
b2e278c
update ali comment test instance
josephleekl Nov 12, 2024
9d93fe4
add qml.ctrl test comment
josephleekl Nov 12, 2024
d59df2f
update shuli/ali comments
josephleekl Nov 12, 2024
78c925c
Merge branch 'lk-control-gate-23Q' into lk-control-gate-NQ-multiRZ
josephleekl Nov 12, 2024
d84a24a
std::tie to structured binding
josephleekl Nov 12, 2024
3f1d800
add util:: namespace
josephleekl Nov 12, 2024
a8ec292
Merge branch 'lk-control-gate-23Q' into lk-control-gate-NQ-multiRZ
josephleekl Nov 12, 2024
e95b814
Merge branch 'lk-control-base' into lk-control-gate-NQ-multiRZ
josephleekl Nov 12, 2024
b76b995
python test and toml
josephleekl Nov 12, 2024
9230d73
update kokkos:: and std::accumulate
josephleekl Nov 12, 2024
21c4a8c
update test name
josephleekl Nov 13, 2024
0a62ea2
add contiguous control test
josephleekl Nov 13, 2024
63114fd
add test
josephleekl Nov 13, 2024
e93965b
format test
josephleekl Nov 13, 2024
3cef4e8
remove scratch for NCMultiRZ
josephleekl Nov 13, 2024
e8ba058
remove scratch using
josephleekl Nov 13, 2024
8e4ff26
restore measure test
josephleekl Nov 13, 2024
bd8b70a
format
josephleekl Nov 13, 2024
cf1d66e
add TODO for using shmem for indices in multirz
josephleekl Nov 13, 2024
@@ -36,6 +36,64 @@ using Pennylane::LightningKokkos::Util::vector2view;
/// @endcond

namespace Pennylane::LightningKokkos::Functors {
template <class PrecisionT, class FuncT> class applyNCNFunctor {
using KokkosComplexVector = Kokkos::View<Kokkos::complex<PrecisionT> *>;
using KokkosIntVector = Kokkos::View<std::size_t *>;
using ScratchViewComplex =
Kokkos::View<Kokkos::complex<PrecisionT> *,
Kokkos::DefaultExecutionSpace::scratch_memory_space,
Kokkos::MemoryTraits<Kokkos::Unmanaged>>;
using ScratchViewSizeT =
Kokkos::View<std::size_t *,
Kokkos::DefaultExecutionSpace::scratch_memory_space,
Kokkos::MemoryTraits<Kokkos::Unmanaged>>;
using MemberType = Kokkos::TeamPolicy<>::member_type;

Kokkos::View<Kokkos::complex<PrecisionT> *> arr;
const FuncT core_function;
KokkosIntVector indices;
KokkosIntVector parity;
KokkosIntVector rev_wires;
KokkosIntVector rev_wire_shifts;
std::size_t dim;

public:
template <class ExecutionSpace>
applyNCNFunctor([[maybe_unused]] ExecutionSpace exec,
Kokkos::View<Kokkos::complex<PrecisionT> *> arr_,
std::size_t num_qubits,
const std::vector<std::size_t> &controlled_wires,
const std::vector<bool> &controlled_values,
const std::vector<std::size_t> &wires, FuncT core_function_)
: arr(arr_), core_function(core_function_) {

std::size_t two2N =
std::exp2(num_qubits - wires.size() - controlled_wires.size());
dim = std::exp2(wires.size());
const auto &[parity_, rev_wires_] =
reverseWires(num_qubits, wires, controlled_wires);
parity = parity_;
std::vector<std::size_t> indices_ =
generateBitPatterns(wires, num_qubits);
ControlBitPatterns(indices_, num_qubits, controlled_wires,
controlled_values);
indices = vector2view(indices_);
std::size_t scratch_size = ScratchViewComplex::shmem_size(dim) +
Member

@multiphaseCFD multiphaseCFD Nov 13, 2024

Is shmem_size used to set the amount of GPU shared memory when the target is a GPU? If so, how can we make this safer, given that shared memory size varies from GPU to GPU?

Contributor Author

When the scratch memory level is set to 0 (in L85), it uses GPU shared memory on NVIDIA GPUs. Assuming O(10 KB) of shared memory, this should be fine for at least 9 wires. If that limit is exceeded, we could in theory set the scratch memory level to 1, which is larger but slower.

The scratch_size here could actually be smaller, I will update this line from

std::size_t scratch_size = ScratchViewComplex::shmem_size(dim) +  ScratchViewSizeT::shmem_size(dim);

to

std::size_t scratch_size = ScratchViewSizeT::shmem_size(dim);

(Unlike QubitUnitary, this kernel does not need scratch memory for the matrix.)

Some further reference (p.8):

▶ Accessing data in (level 0) scratch memory is (usually) much faster than global memory.
▶ GPUs have separate, dedicated, small, low-latency scratch memories (NOT subject to coalescing requirements).
▶ CPUs don’t have special hardware, but programming with scratch memory results in cache-aware memory access patterns.
▶ Roughly, it’s like a user-managed L1 cache
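
As a rough check on the sizes discussed above, the following sketch computes the per-team scratch request before and after dropping the complex buffer. It is plain C++ rather than Kokkos code: the helper names are hypothetical, the element sizes (8-byte indices, 16-byte double-precision complex values) are assumptions, and any alignment padding that a real `shmem_size` call may add is ignored.

```cpp
#include <cstddef>

// Assumed element sizes: 64-bit std::size_t and complex<double>.
constexpr std::size_t kIndexBytes = 8;
constexpr std::size_t kComplexBytes = 16;

// Bytes for a buffer holding 2^num_target_wires elements (no padding).
constexpr std::size_t buffer_bytes(std::size_t num_target_wires,
                                   std::size_t elem_bytes) {
    return (std::size_t{1} << num_target_wires) * elem_bytes;
}

// Original request: one complex buffer plus one index buffer.
constexpr std::size_t scratch_before(std::size_t num_target_wires) {
    return buffer_bytes(num_target_wires, kComplexBytes) +
           buffer_bytes(num_target_wires, kIndexBytes);
}

// Proposed request: index buffer only.
constexpr std::size_t scratch_after(std::size_t num_target_wires) {
    return buffer_bytes(num_target_wires, kIndexBytes);
}
```

Under these assumptions, 9 target wires need 12 KiB with the original expression and 4 KiB with the reduced one; each additional wire doubles both figures.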

Member

That is to say, would a MultiRZ gate targeting 20 wires with only 1 control wire break the simulation?

Contributor Author

@josephleekl josephleekl Nov 13, 2024

It might actually break it; that's a good point. I'll test it now. In that case I might either not use scratch at all or use level 1 scratch, depending on the performance and the memory limit.

Contributor Author

I've done some investigation with different settings on ISAIC A100:

  • Scratch level 0: fails with insufficient shared memory for >13 wires
  • Scratch level 1: does not fail, but is ~10% slower for 12-13 wires compared with scratch level 0
  • Not using scratch at all: does not fail; performance is about the same as scratch level 1, and about 10% faster for >22 wires

I have now removed using scratch for this.
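
To see why a level 0 request has a hard wire-count cutoff, the sketch below inverts the size calculation: given a byte budget, it returns the largest number of target wires whose 2^n-entry index buffer still fits. The function name and the budgets used in the usage note are hypothetical illustrations, not measured device limits; the real cutoff depends on the GPU and on how Kokkos is configured.

```cpp
#include <cstddef>

// Largest n such that 2^n elements of elem_bytes each fit in budget_bytes.
// Returns 0 when at most one element fits. Alignment padding is ignored.
constexpr std::size_t max_target_wires(std::size_t budget_bytes,
                                       std::size_t elem_bytes) {
    const std::size_t max_entries = budget_bytes / elem_bytes;
    std::size_t n = 0;
    // The n + 1 < 63 guard keeps the shift well defined for huge budgets.
    while (n + 1 < 63 && (std::size_t{1} << (n + 1)) <= max_entries) {
        ++n;
    }
    return n;
}
```

For example, with 8-byte indices a hypothetical 64 KiB budget gives `max_target_wires(65536, 8) == 13`, and a 48 KiB budget gives 12. Because the buffer doubles with every wire, any fixed budget is exhausted after a handful of extra wires, which is consistent with dropping scratch entirely for this kernel.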

ScratchViewSizeT::shmem_size(dim);
Kokkos::parallel_for(
Kokkos::TeamPolicy(two2N, Kokkos::AUTO, dim)
.set_scratch_size(0, Kokkos::PerTeam(scratch_size)),
*this);
}
KOKKOS_FUNCTION void operator()(const MemberType &teamMember) const {
const std::size_t k = teamMember.league_rank();
const std::size_t offset = Util::parity_2_offset(parity, k);
Kokkos::parallel_for(Kokkos::TeamThreadRange(teamMember, dim),
[&](const std::size_t i) {
core_function(arr, i, indices, offset);
});
}
};

template <class PrecisionT, class FuncT, bool has_controls>
class applyNC1Functor {};
@@ -1714,6 +1772,40 @@ void applyMultiRZ(Kokkos::View<Kokkos::complex<PrecisionT> *> arr_,
});
}

template <class ExecutionSpace, class PrecisionT>
void applyNCMultiRZ(Kokkos::View<Kokkos::complex<PrecisionT> *> arr_,
const std::size_t num_qubits,
const std::vector<std::size_t> &controlled_wires,
const std::vector<bool> &controlled_values,
const std::vector<std::size_t> &wires,
const bool inverse = false,
const std::vector<PrecisionT> &params = {}) {
const PrecisionT &angle = params[0];
const Kokkos::complex<PrecisionT> shift_0 = Kokkos::complex<PrecisionT>{
std::cos(angle / 2),
(inverse) ? std::sin(angle / 2) : -std::sin(angle / 2)};
const Kokkos::complex<PrecisionT> shift_1 = Kokkos::conj(shift_0);
std::size_t wires_parity = 0U;
wires_parity =
std::accumulate(wires.begin(), wires.end(), std::size_t{0},
[num_qubits](std::size_t acc, std::size_t wire) {
return acc | (static_cast<std::size_t>(1U)
<< (num_qubits - wire - 1));
});
auto core_function = KOKKOS_LAMBDA(
Kokkos::View<Kokkos::complex<PrecisionT> *> arr, const std::size_t i,
Kokkos::View<std::size_t *> indices, std::size_t offset) {
const std::size_t index = indices(i);
arr(index + offset) *=
(Kokkos::Impl::bit_count((index + offset) & wires_parity) % 2 == 0)
? shift_0
: shift_1;
};

applyNCNFunctor(ExecutionSpace{}, arr_, num_qubits, controlled_wires,
controlled_values, wires, core_function);
}
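
The phase rule used by the kernel above can be checked on the host with a small reference routine. This is a sketch, not the Kokkos implementation: the function name is hypothetical, it uses `std::complex<double>` throughout, and it loops over every amplitude and tests the control bits directly instead of using the precomputed `indices` and parity offsets. Wire 0 is taken as the most significant bit, matching the `num_qubits - wire - 1` shift in the kernel.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Host-side reference for a controlled MultiRZ (hypothetical helper).
std::vector<std::complex<double>> controlled_multirz_reference(
    std::vector<std::complex<double>> state, std::size_t num_qubits,
    const std::vector<std::size_t> &control_wires,
    const std::vector<bool> &control_values,
    const std::vector<std::size_t> &target_wires, double angle,
    bool inverse = false) {
    // Same phases as the kernel: exp(-i*angle/2), or its conjugate.
    const std::complex<double> shift_0{
        std::cos(angle / 2),
        inverse ? std::sin(angle / 2) : -std::sin(angle / 2)};
    const std::complex<double> shift_1 = std::conj(shift_0);

    // Value (0 or 1) of the bit that belongs to `wire` in basis index `idx`.
    auto bit = [num_qubits](std::size_t idx, std::size_t wire) {
        return (idx >> (num_qubits - wire - 1)) & std::size_t{1};
    };

    for (std::size_t idx = 0; idx < state.size(); ++idx) {
        // Skip amplitudes whose control bits do not match control_values.
        bool active = true;
        for (std::size_t c = 0; c < control_wires.size(); ++c) {
            const bool is_set = bit(idx, control_wires[c]) == 1;
            if (is_set != control_values[c]) {
                active = false;
                break;
            }
        }
        if (!active) {
            continue;
        }
        // Even parity over the target bits picks shift_0, odd picks shift_1.
        std::size_t parity = 0;
        for (const std::size_t wire : target_wires) {
            parity ^= bit(idx, wire);
        }
        state[idx] *= (parity == 0) ? shift_0 : shift_1;
    }
    return state;
}
```

Applied to a uniform 4-qubit state with amplitude 0.25, control wires {0, 2} with values {true, false}, target wires {1, 3} and angle 0.234, only indices 8, 9, 12 and 13 satisfy the controls, and they pick up the phases listed in the test's `expected_result`.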

template <class ExecutionSpace, class PrecisionT>
void applyPauliRot(Kokkos::View<Kokkos::complex<PrecisionT> *> arr_,
const std::size_t num_qubits,
@@ -2009,6 +2101,11 @@ void applyNCNamedOperation(const ControlledGateOperation gateop,
controlled_values, wires, inverse,
params);
return;
case ControlledGateOperation::MultiRZ:
applyNCMultiRZ<ExecutionSpace>(arr_, num_qubits, controlled_wires,
controlled_values, wires, inverse,
params);
return;
default:
PL_ABORT("Controlled gate operation does not exist.");
}
@@ -958,6 +958,41 @@ TEMPLATE_TEST_CASE("StateVectorKokkos::applyOperation param "
CHECK(real(sv_gate_host[j]) == Approx(real(expected_result[j])));
}
}

SECTION("2-controlled MultiRZ") {
Kokkos::deep_copy(sv_gate.getView(), ini_sv);

const std::vector<std::size_t> control_wires = {0, 2};
const std::vector<bool> control_values = {true, false};
const std::vector<std::size_t> target_wire = {1, 3};
const TestType param = 0.234;
sv_gate.applyOperation("MultiRZ", control_wires, control_values,
target_wire, inverse, {param});
auto sv_gate_host = Kokkos::create_mirror_view_and_copy(
Kokkos::HostSpace{}, sv_gate.getView());

std::vector<ComplexT> expected_result{// Generated using Pennylane
ComplexT{0.25, 0.0},
ComplexT{0.25, 0.0},
ComplexT{0.25, 0.0},
ComplexT{0.25, 0.0},
ComplexT{0.25, 0.0},
ComplexT{0.25, 0.0},
ComplexT{0.25, 0.0},
ComplexT{0.25, 0.0},
ComplexT{0.24829083, -0.02918331},
ComplexT{0.24829083, +0.02918331},
ComplexT{0.25, 0.0},
ComplexT{0.25, 0.0},
ComplexT{0.24829083, +0.02918331},
ComplexT{0.24829083, -0.02918331},
ComplexT{0.25, 0.0},
ComplexT{0.25, 0.0}};
for (std::size_t j = 0; j < exp2(num_qubits); j++) {
CHECK(imag(sv_gate_host[j]) == Approx(imag(expected_result[j])));
CHECK(real(sv_gate_host[j]) == Approx(real(expected_result[j])));
}
}
}

TEMPLATE_TEST_CASE(
2 changes: 1 addition & 1 deletion pennylane_lightning/lightning_kokkos/_state_vector.py
@@ -285,7 +285,7 @@ def _apply_lightning(
param = operation.parameters
method(wires, invert_param, param)
elif isinstance(operation, qml.ops.Controlled) and not isinstance(
- operation.base, (qml.QubitUnitary, qml.BlockEncode, qml.MultiRZ)
+ operation.base, (qml.QubitUnitary, qml.BlockEncode)
): # apply n-controlled gate
# Kokkos does not support controlled gates except for GlobalPhase and single-qubit
self._apply_lightning_controlled(operation)
1 change: 1 addition & 0 deletions pennylane_lightning/lightning_kokkos/lightning_kokkos.py
@@ -128,6 +128,7 @@
"C(DoubleExcitation)",
"C(DoubleExcitationMinus)",
"C(DoubleExcitationPlus)",
"C(MultiRZ)",
"C(GlobalPhase)",
"CRot",
"IsingXX",
2 changes: 1 addition & 1 deletion pennylane_lightning/lightning_kokkos/lightning_kokkos.toml
@@ -23,7 +23,7 @@ IsingXX = { properties = [ "invertible", "controllable", "differe
IsingXY = { properties = [ "invertible", "controllable", "differentiable" ] }
IsingYY = { properties = [ "invertible", "controllable", "differentiable" ] }
IsingZZ = { properties = [ "invertible", "controllable", "differentiable" ] }
- MultiRZ = { properties = [ "invertible", "differentiable" ] }
+ MultiRZ = { properties = [ "invertible", "controllable", "differentiable" ] }
PauliX = { properties = [ "invertible", "controllable", "differentiable" ] }
PauliY = { properties = [ "invertible", "controllable", "differentiable" ] }
PauliZ = { properties = [ "invertible", "controllable", "differentiable" ] }
2 changes: 0 additions & 2 deletions tests/test_gates.py
@@ -487,8 +487,6 @@ def test_controlled_qubit_gates(operation, n_qubits, control_value, tol):
dev = qml.device(device_name, wires=n_qubits)
threshold = 5 if device_name == "lightning.tensor" else 250
num_wires = max(operation.num_wires, 1)
- if device_name == "lightning.kokkos" and isinstance(operation, qml.MultiRZ):
- pytest.skip("lightning.kokkos does not support controlled-multiRZ")

for n_wires in range(num_wires + 1, num_wires + 4):
wire_lists = list(itertools.permutations(range(0, n_qubits), n_wires))