
(LK-C-3) Add controlled MultiRZ support to Lightning Kokkos #954

Merged: josephleekl merged 82 commits into lk-control-base from lk-control-gate-NQ-multiRZ on Nov 13, 2024


@josephleekl (Contributor) commented Oct 21, 2024



Context:

Description of the Change:
This PR adds support for the controlled MultiRZ gate in Lightning Kokkos. The gate kernel is defined in BasicGateFunctor.hpp and is applied through applyNCNFunctor, defined in the same file.
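For context, MultiRZ(θ) = exp(-iθ/2 Z⊗…⊗Z) multiplies each computational-basis amplitude by e^{-iθ/2} when the target bits have even parity and by e^{+iθ/2} when they have odd parity; the controlled version does this only for basis states whose control bits equal the requested control values. The following is a minimal serial sketch of that math on a plain std::vector, not the BasicGateFunctor.hpp / applyNCNFunctor implementation. The function name applyControlledMultiRZ and the wire ordering (wire 0 as the most significant bit) are assumptions for illustration.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical serial reference for a controlled MultiRZ gate.
// Assumes wire 0 maps to the most significant bit of the basis index.
void applyControlledMultiRZ(std::vector<std::complex<double>> &state,
                            std::size_t num_qubits,
                            const std::vector<std::size_t> &controlled_wires,
                            const std::vector<bool> &controlled_values,
                            const std::vector<std::size_t> &wires,
                            double theta) {
    assert(state.size() == (std::size_t{1} << num_qubits));
    assert(controlled_wires.size() == controlled_values.size());

    // Extract the bit of basis index idx that belongs to the given wire.
    auto bit = [num_qubits](std::size_t idx, std::size_t wire) {
        return (idx >> (num_qubits - 1 - wire)) & 1U;
    };
    const std::complex<double> phase_even = std::polar(1.0, -theta / 2);
    const std::complex<double> phase_odd = std::polar(1.0, theta / 2);

    for (std::size_t idx = 0; idx < state.size(); ++idx) {
        // Skip basis states whose control bits do not match.
        bool controls_match = true;
        for (std::size_t k = 0; k < controlled_wires.size(); ++k) {
            if (bit(idx, controlled_wires[k]) !=
                static_cast<std::size_t>(controlled_values[k])) {
                controls_match = false;
                break;
            }
        }
        if (!controls_match) {
            continue;
        }
        // Parity of the target bits decides the sign of the phase.
        std::size_t parity = 0;
        for (std::size_t w : wires) {
            parity ^= bit(idx, w);
        }
        state[idx] *= parity ? phase_odd : phase_even;
    }
}
```

With two qubits in the uniform state, control wire 0 set to 1, target wire 1 and θ = π, amplitudes |00⟩ and |01⟩ are untouched, |10⟩ picks up a factor of -i and |11⟩ a factor of +i.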

Benefits:
Performance benchmarks for gates are shown here: https://www.notion.so/xanaduai/Lightning-Kokkos-Native-Controlled-Operation-Gate-Benchmarks-12ebc6bd17648017a2dcd237748b24fe

Possible Drawbacks:

Related GitHub Issues:

[sc-76775]

@josephleekl josephleekl added ci:build_wheels Activate wheel building. ci:use-gpu-runner Enable usage of GPU runner for this Pull Request labels Oct 21, 2024
Contributor

Hello. You may have forgotten to update the changelog!
Please edit .github/CHANGELOG.md with:

  • A one-to-two sentence description of the change. You may include a small working example for new features.
  • A link back to this PR.
  • Your name (or GitHub username) in the contributors section.

@josephleekl josephleekl changed the title from "LK support controlled MultiRZ (arbitrary qubit controlled gate)" to "Add controlled N-qubit gate support to Lightning Kokkos" on Oct 21, 2024
@josephleekl josephleekl marked this pull request as ready for review October 21, 2024 18:24

codecov bot commented Oct 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (lk-control-base@0acc148). Learn more about missing BASE report.

Additional details and impacted files
@@                Coverage Diff                 @@
##             lk-control-base     #954   +/-   ##
==================================================
  Coverage                   ?   97.01%           
==================================================
  Files                      ?      221           
  Lines                      ?    34464           
  Branches                   ?        0           
==================================================
  Hits                       ?    33436           
  Misses                     ?     1028           
  Partials                   ?        0           


@josephleekl josephleekl marked this pull request as draft October 21, 2024 19:49
@josephleekl josephleekl force-pushed the lk-control-gate-NQ-multiRZ branch 2 times, most recently from 42f9cd3 to 23f398c, October 21, 2024 21:01
@josephleekl josephleekl marked this pull request as ready for review October 21, 2024 21:22
@josephleekl josephleekl added the urgent Mark a pull request as high priority label Oct 21, 2024
Base automatically changed from lk-control-gate-23Q to lk-control-base November 12, 2024 18:21
@maliasadi (Member) left a comment

Nice work @josephleekl!

ControlBitPatterns(indices_, num_qubits, controlled_wires,
controlled_values);
indices = vector2view(indices_);
std::size_t scratch_size = ScratchViewComplex::shmem_size(dim) +

@multiphaseCFD (Member) commented Nov 13, 2024

Is shmem_size used to set the size of GPU shared memory when the target is a GPU? If so, how can we make this safer, given that shared memory size varies from one GPU to another?

@josephleekl (Contributor Author)

When the scratch memory is set to level 0 (in L85), it uses GPU shared memory on NVIDIA GPUs. Assuming O(10 KB) of shared memory, this should be fine for at least 9 wires. If that limit is exceeded, we could in principle set the scratch memory level to 1, which is larger but slower.

The scratch_size here could actually be smaller. I will update this line from

std::size_t scratch_size = ScratchViewComplex::shmem_size(dim) +  ScratchViewSizeT::shmem_size(dim);

to

std::size_t scratch_size = ScratchViewSizeT::shmem_size(dim);

(Unlike QubitUnitary, this kernel does not need scratch memory for the matrix.)

For further reference (p. 8):

▶ Accessing data in (level 0) scratch memory is (usually) much faster than global memory.
▶ GPUs have separate, dedicated, small, low-latency scratch memories (NOT subject to coalescing requirements).
▶ CPUs don’t have special hardware, but programming with scratch memory results in cache-aware memory access patterns.
▶ Roughly, it’s like a user-managed L1 cache
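To make the sizing discussion concrete, here is a rough back-of-the-envelope sketch of how much per-team scratch each scratch_size formula needs as the wire count grows. It assumes dim = 2^(number of wires), a 16-byte std::complex<double> and an 8-byte std::size_t, and it ignores any alignment padding that Kokkos' shmem_size may add. The helper names scratchBytes and maxWiresForBudget are hypothetical, and the budget is an input rather than a real device limit.

```cpp
#include <cassert>
#include <climits>
#include <complex>
#include <cstddef>

// Approximate scratch bytes for 2^num_wires entries of bytes_per_entry each.
// Ignores any alignment padding that Kokkos' shmem_size may add.
constexpr std::size_t scratchBytes(std::size_t num_wires,
                                   std::size_t bytes_per_entry) {
    return (std::size_t{1} << num_wires) * bytes_per_entry;
}

// Largest wire count whose 2^wires entries fit in budget_bytes
// (returns 0 if not even two entries fit).
std::size_t maxWiresForBudget(std::size_t budget_bytes,
                              std::size_t bytes_per_entry) {
    assert(bytes_per_entry > 0);
    const std::size_t max_entries = budget_bytes / bytes_per_entry;
    std::size_t wires = 0;
    // Stop before the shift would exceed the width of std::size_t.
    while (wires + 1 < sizeof(std::size_t) * CHAR_BIT &&
           (std::size_t{1} << (wires + 1)) <= max_entries) {
        ++wires;
    }
    return wires;
}

// Per-entry cost of the original scratch_size (ScratchViewComplex + ScratchViewSizeT).
constexpr std::size_t kBytesBefore =
    sizeof(std::complex<double>) + sizeof(std::size_t);
// Per-entry cost of the reduced scratch_size (ScratchViewSizeT only).
constexpr std::size_t kBytesAfter = sizeof(std::size_t);
```

For example, with an assumed 49152-byte (48 KiB) budget, maxWiresForBudget(49152, 24) gives 11 wires and maxWiresForBudget(49152, 8) gives 12, while scratchBytes(9, 24) is 12288 bytes, in line with the O(10 KB) estimate for 9 wires. Actual limits depend on the GPU, the Kokkos backend and padding, so these figures are illustrative and are not meant to reproduce the A100 measurements.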

Member

That is to say, would a MultiRZ gate targeting 20 wires with only 1 control wire break the simulation?

@josephleekl (Contributor Author), Nov 13, 2024

It might actually break it; that's a good point. I'll test it now. In that case I might either not use scratch memory or use level 1 scratch, depending on the performance and memory limits.

@josephleekl (Contributor Author)

I've investigated different settings on an ISAIC A100:

  • Scratch level 0: fails with insufficient shared memory for >13 wires
  • Scratch level 1: does not fail, but is ~10% slower than scratch level 0 for 12-13 wires
  • No scratch at all: does not fail, performs about the same as scratch level 1, and is about 10% faster for >22 wires

I have now removed the use of scratch memory for this kernel.

@multiphaseCFD (Member) left a comment

Thanks @josephleekl! Just a couple of questions.

@multiphaseCFD (Member)

Do you want to add a changelog entry as well?

@josephleekl (Contributor Author)


Thanks @multiphaseCFD, I will update the changelog in the final PR to master :)

@multiphaseCFD (Member) left a comment

LGTM! Thanks @josephleekl. We could add a TODO for supporting runtime selection of shared memory (shmem) usage.
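One possible shape for that runtime-selection TODO, sketched under assumptions: pick level 0 scratch when the index buffer fits the device's per-team limit, otherwise skip scratch, since the A100 measurements in this thread showed no-scratch matching level 1 and beating it at large wire counts. The enum ScratchChoice, the function chooseScratch and the level0_max_bytes input (assumed to be queried for the current device at runtime) are hypothetical; in Kokkos the chosen size would then be passed to the team policy, e.g. via set_scratch_size with Kokkos::PerTeam.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical scratch placement options for the controlled MultiRZ kernel.
enum class ScratchChoice { Level0, None };

// level0_max_bytes is assumed to be the per-team level-0 scratch limit,
// obtained at runtime for the current device rather than hard-coded.
ScratchChoice chooseScratch(std::size_t num_wires,
                            std::size_t level0_max_bytes) {
    // A shift this wide is undefined; such a buffer could never fit anyway.
    if (num_wires >= sizeof(std::size_t) * CHAR_BIT) {
        return ScratchChoice::None;
    }
    // Index buffer: 2^num_wires entries of std::size_t.
    const std::size_t entries = std::size_t{1} << num_wires;
    if (entries <= level0_max_bytes / sizeof(std::size_t)) {
        return ScratchChoice::Level0;  // fastest option when it fits
    }
    // Otherwise skip scratch: no scratch matched level 1 and was faster
    // at large wire counts in the A100 measurements reported above.
    return ScratchChoice::None;
}
```

Keeping the decision in a small host-side function like this would let the limit come from the device at runtime instead of a hard-coded wire threshold.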

@maliasadi (Member) left a comment

🙌

@josephleekl josephleekl merged commit d7aecd7 into lk-control-base Nov 13, 2024
110 checks passed
@josephleekl josephleekl deleted the lk-control-gate-NQ-multiRZ branch November 13, 2024 21:36