(LK-C-3) Add controlled MultiRZ support to Lightning Kokkos #954
Hello. You may have forgotten to update the changelog!
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## lk-control-base #954 +/- ##
==================================================
Coverage ? 97.01%
==================================================
Files ? 221
Lines ? 34464
Branches ? 0
==================================================
Hits ? 33436
Misses ? 1028
Partials ? 0
Nice work @josephleekl!
..._lightning/core/src/simulators/lightning_kokkos/gates/tests/Test_StateVectorKokkos_Param.cpp
..._lightning/core/src/simulators/lightning_kokkos/gates/tests/Test_StateVectorKokkos_Param.cpp
ControlBitPatterns(indices_, num_qubits, controlled_wires,
                   controlled_values);
indices = vector2view(indices_);
std::size_t scratch_size = ScratchViewComplex::shmem_size(dim) +
Is shmem_size used to set the size of GPU shared memory when the target is a GPU? If so, how can we make it safer, given that the shared memory size varies from GPU to GPU?
When the scratch memory is set to level 0 (in L85), it uses GPU shared memory on NVIDIA GPUs. Assuming O(10 KB) of shared memory, this should be fine for at least 9 wires. If the requirement exceeds that, we could in principle set the scratch memory level to 1, which is larger but slower.
The scratch_size here could actually be smaller; I will update this line from
std::size_t scratch_size = ScratchViewComplex::shmem_size(dim) + ScratchViewSizeT::shmem_size(dim);
to
std::size_t scratch_size = ScratchViewSizeT::shmem_size(dim);
(Unlike the QubitUnitary kernel, this kernel does not need scratch memory for the matrix.)
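To put the change in numbers, the sketch below compares the per-team scratch request of the old and new lines. It assumes dim = 2^n for a kernel indexing n wires, uses std::complex<double> as a stand-in for Kokkos::complex<double> (both are two doubles), and ignores any alignment padding that shmem_size may add; the helper names are hypothetical.

```cpp
#include <complex>
#include <cstddef>

// Assumption: std::complex<double> has the same 16-byte layout as
// Kokkos::complex<double>; std::size_t is the index type of the scratch view.
constexpr std::size_t kComplexBytes = sizeof(std::complex<double>);
constexpr std::size_t kIndexBytes = sizeof(std::size_t);

// Hypothetical helper: number of entries for a kernel indexing n wires.
constexpr std::size_t dim_for_wires(std::size_t n) { return std::size_t{1} << n; }

// Old line: complex scratch view + index scratch view (padding ignored).
constexpr std::size_t old_scratch_bytes(std::size_t n) {
    return dim_for_wires(n) * (kComplexBytes + kIndexBytes);
}

// New line: only the index scratch view remains.
constexpr std::size_t new_scratch_bytes(std::size_t n) {
    return dim_for_wires(n) * kIndexBytes;
}
```

On a typical 64-bit platform this gives 512 × 24 = 12,288 bytes for the old line and 512 × 8 = 4,096 bytes for the new one at n = 9, so dropping the complex view cuts the request to a third for every n, while both still grow as 2^n.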
Some further reference (p. 8):
▶ Accessing data in (level 0) scratch memory is (usually) much faster than global memory.
▶ GPUs have separate, dedicated, small, low-latency scratch memories (NOT subject to coalescing requirements).
▶ CPUs don’t have special hardware, but programming with scratch memory results in cache-aware memory access patterns.
▶ Roughly, it’s like a user-managed L1 cache.
That is to say, would a MultiRZ gate targeting 20 wires with only 1 control wire break the simulation?
It might actually break it; that's a good point. I'll test it now. If it does, I will either not use scratch memory or use level 1 scratch, depending on the performance and memory limits.
I've investigated different settings on the ISAIC A100:
- Scratch level 0: fails with insufficient shared memory for >13 wires
- Scratch level 1: does not fail, but is ~10% slower than scratch level 0 for 12-13 wires
- No scratch at all: does not fail, performs about the same as scratch level 1, and is about 10% faster for >22 wires
I have now removed the use of scratch memory for this kernel.
Thanks @josephleekl! Just a couple of questions.
Do you want to add a changelog entry as well?
Thanks @multiphaseCFD, I will update the changelog for the final PR to master :)
LGTM! Thanks @josephleekl. We could add a TODO for runtime selection of shared memory (shmem) support.
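A runtime selection along the lines of this TODO could look like the sketch below. It is hypothetical: the enum, the function name, and the level-0 budget parameter are illustrative, and in practice the budget would come from a device query rather than a constant. The policy mirrors the A100 measurements in this thread: use level 0 scratch when the 2^n-entry buffer fits, otherwise skip scratch, since no scratch performed about the same as level 1 in those tests.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical strategy names; Lightning Kokkos does not define these.
enum class ScratchStrategy { Level0, NoScratch };

// Sketch of a runtime policy: request level 0 scratch only when a buffer of
// dim = 2^num_wires entries of `bytes_per_entry` fits in `level0_budget_bytes`
// (assumed to come from a device query); otherwise run without scratch.
constexpr ScratchStrategy choose_scratch(std::size_t num_wires,
                                         std::size_t bytes_per_entry,
                                         std::size_t level0_budget_bytes) {
    // Shifting by the full width of std::size_t would be undefined behaviour.
    if (num_wires >= static_cast<std::size_t>(std::numeric_limits<std::size_t>::digits)) {
        return ScratchStrategy::NoScratch;
    }
    // Nothing to stage, so scratch memory buys nothing.
    if (bytes_per_entry == 0) {
        return ScratchStrategy::NoScratch;
    }
    const std::size_t dim = std::size_t{1} << num_wires;
    // Compare via division so dim * bytes_per_entry cannot overflow.
    return dim <= level0_budget_bytes / bytes_per_entry ? ScratchStrategy::Level0
                                                        : ScratchStrategy::NoScratch;
}
```

For example, with 8-byte indices and an assumed 48 KiB budget, choose_scratch(9, 8, 48 * 1024) selects Level0 (a 4 KiB buffer), while choose_scratch(14, 8, 48 * 1024) falls back to NoScratch (a 128 KiB buffer). A real version would also need the team-policy plumbing on the Kokkos side, which this sketch leaves out.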
🙌
Before submitting
Please complete the following checklist when submitting a PR:
- All new features must include a unit test. If you've fixed a bug or added code that should be tested, add a test to the tests directory!
- All new functions and code must be clearly commented and documented. If you do make documentation changes, make sure that the docs build and render correctly by running make docs.
- Ensure that the test suite passes, by running make test.
- Add a new entry to the .github/CHANGELOG.md file, summarizing the change, and including a link back to the PR.
- Ensure that code is properly formatted by running make format.
When all the above are checked, delete everything above the dashed line and fill in the pull request template.
Context:
Description of the Change:
This PR adds support for controlled MultiRZ to Lightning Kokkos. It is defined in BasicGateFunctor.hpp and is applied through applyNCNFunctor, defined in the same file.
Benefits:
Performance benchmarks for gates are shown here: https://www.notion.so/xanaduai/Lightning-Kokkos-Native-Controlled-Operation-Gate-Benchmarks-12ebc6bd17648017a2dcd237748b24fe
Possible Drawbacks:
Related GitHub Issues:
[sc-76775]
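For readers checking the semantics of the new gate, here is a naive CPU reference for what a controlled MultiRZ does to a state vector. It is a sketch, not the applyNCNFunctor implementation: it assumes wire 0 is the most significant bit of a basis-state index and that MultiRZ(θ) = exp(−iθ/2 Z⊗…⊗Z) acts on the target wires, and the function name apply_controlled_multirz is hypothetical.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Naive reference for a controlled MultiRZ (hypothetical, for illustration).
// Assumption: wire 0 is the most significant bit of the basis-state index.
void apply_controlled_multirz(std::vector<std::complex<double>>& state,
                              std::size_t num_qubits,
                              const std::vector<std::size_t>& controlled_wires,
                              const std::vector<bool>& controlled_values,
                              const std::vector<std::size_t>& target_wires,
                              double theta) {
    // Extract the bit of `idx` that corresponds to `wire`.
    auto bit_of = [num_qubits](std::size_t idx, std::size_t wire) -> std::size_t {
        return (idx >> (num_qubits - 1 - wire)) & std::size_t{1};
    };
    // Eigenvalues of Z⊗...⊗Z are +1 (even parity) and -1 (odd parity).
    const std::complex<double> even_phase = std::polar(1.0, -theta / 2);
    const std::complex<double> odd_phase = std::polar(1.0, theta / 2);

    for (std::size_t idx = 0; idx < state.size(); ++idx) {
        // Only amplitudes whose control bits all match controlled_values change.
        bool controls_match = true;
        for (std::size_t k = 0; k < controlled_wires.size(); ++k) {
            if ((bit_of(idx, controlled_wires[k]) == 1) != controlled_values[k]) {
                controls_match = false;
                break;
            }
        }
        if (!controls_match) {
            continue;
        }
        // Parity of the target bits selects the phase.
        std::size_t parity = 0;
        for (std::size_t w : target_wires) {
            parity ^= bit_of(idx, w);
        }
        state[idx] *= (parity == 0) ? even_phase : odd_phase;
    }
}
```

Each amplitude is left alone unless every control bit equals its controlled value; matching amplitudes pick up e^{−iθ/2} for even target parity and e^{+iθ/2} for odd parity. A GPU kernel would avoid scanning all 2^n indices for the control check, which is presumably what the precomputed control bit patterns are for.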