Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster Layout and MDC Store Creation #448

Open
wants to merge 28 commits into
base: develop
Choose a base branch
from

Conversation

coreyostrove
Copy link
Contributor

@coreyostrove coreyostrove commented Jun 5, 2024

Building off of the changes in PR #445 as part of an effort at reducing the overhead associated with running GST analysis, this PR further speeds up the construction of COPA layouts and MDC stores/objective functions. Both in general, but even more so in the context of iterative GST estimation.

Note: This branch is forked off of 'feature-faster-circuit-primitives', and so some of these commits are from that branch and will get cleaned up whenever #445 gets merged into develop (in case you notice duplicates).

Here is a summary of the changes:

  1. Caching for layout creation: One of the main bottlenecks in layout creation is performing a variety of (relatively expensive) preprocessing steps on circuits. In the context of iterative GST we almost always use a nested circuit structure wherein for each new iteration we keep all of the previous circuits and add a few new ones. New layouts are constructed at each iteration, and previously we started from scratch each time. Now prior to starting the iterative estimation we precompute and construct a cache for the more expensive circuit structures used in layout generation, significantly reducing the amount of repeated calculations.
  2. Caching for MDC store construction: Similar story to layout creation. Previously one of the bottlenecks in MDC store construction was computing the number of possible outcomes for a circuit (including POVMs and potentially instruments). This resulted in repeated calls to a method called expand_instruments_and_separate_povm which is quite expensive. Like layouts we construct a fresh MDC store at each iteration, so by caching this information we significantly reduce the amount of repeated calculations.
  3. The expand_instruments_and_separate_povm method has been refactored, moving it from being a method of Circuits which takes as input a model into being a method of OpModels which takes as input a circuit. This method was a strange fit for the Circuit class, as it relies pretty heavily on knowledge of the internal details of a model, and from an abstraction hierarchy level this was also a out of character thing for a Circuit to be asked to do. It makes much more sense for an OpModel (the implementation effectively assumed you'd only be calling it on an OpModel given the attributes and methods it assumed were implemented, so now that is enforced), and OpModel is also where the conceptually similar complete_circuit(s) and split_circuit(s) methods live. In addition to the refactor, I have also added a new bulk_expand_instruments_and_separate_povm which more efficiently performs this operation on a list of circuits.
  4. Miscellaneous performance updates in service of speeding up layout and MDC store construction.

I look forward to any and all feedback.

P.S. I didn't mention how these changes affect the bottom line on runtime yet. For single-qubit GST with the XYI gate set, the full TP parameterization, and a maximum depth of 128 (as before, I haven't yet tested this on the 2Q case, or for other experiment designs, so ymmv) overall we have:

  • MatrixCOPALayout creation is ~4X faster
  • MapCOPALayout creation is ~2X faster
  • ModelDataSetCircuitStore/TimeIndependentMDCObjectiveFunction creation is ~5X faster.

Those changes are measured relative to the runtimes following the changes made on #445. Combined with #445 this gives an overall reduction in end-to-end runtime (for the 1Q test case described above) of ~50% when using the MatrixForwardSimulator, and ~15% when using the MapForwardSimulator (proportionally less of the overall runtime is spent performing these subroutines with map, so this difference makes sense).

Corey Ostrove added 19 commits May 22, 2024 16:50
The creation of COPA layouts relies on a number of specialized circuit structures which require non-trivial time to construct. In the context of iterative GST estimation with nested circuit lists (i.e. the default) this results in unnecessarily repeat construction of these objects. This is an initial implementation of a caching scheme allowing for more efficient re-use of these circuit structures across iterations.
Cache the expanded SPAM-free circuits to reduce recomputing things unnecessarily.
This updates the implementation of the SeparatePOVMCircuit containter class. The most important change is adding an attribute for the full_effect_labels that avoids uneeded reconstruction. To add protection then, to ensure that this is kept in sync with everything else, the povm_label and effect_labels attributes (which feed into full_effect_labels) have been promoted to properties with setters that ensure the full_effect_labels are kept synced.
Adds a new method to OpModel that allows for doing instrument expansion and povm expansion in bulk, speeding things up be avoiding recomputation of shared quantities. Also adds a pipeline for re-using completed or split circuits (as produced by the related OpModel methods) for more efficient re-use of done work.
Some minor performance oriented tweaks to the init for COPA layouts.
Refactor some of the ordered dictionaries in matrix layout creation into regular ones.
Start adding infrastructure for caching things used in MDC store creation and for plumbing in stuff from layout creation.
Performance optimization for the method for adding omitted frequencies to incorporate caching of the number of outcomes per circuit (which is somewhat expensive since it goes through the instrument/povm expansion code). Additionally refactor some other parts of this code for improved efficiency. Also makes a few minor tweaks to the method for adding counts to speed that up as well. Can probably make this a bit faster still by merging the two calls to reduce redundancy, but that is a future us problem. Additionally make a few microoptimizations to the dataset code for grabbing counts, and to slicetools adding a function for directly giving a numpy array for a slice (instead of needing to cast from a list).

Miscellaneous cleanup of old commented out code that doesn't appear needed any longer.
Fix a bug I introduced in dataset indexing into something that could be None.
Another minor bug caught by testing.
Not sure why this didn't get caught on the circuit update branch, but oh well...
Fixes minor error in split_circuits.
Improve the performance of __getitem__ when indexing into static circuits by making use of the _copy_init code path.
Implement caching of circuit structures tailored to the map forward simulator's requirements.
@coreyostrove coreyostrove changed the base branch from master to develop June 5, 2024 05:37
Corey Ostrove added 2 commits June 4, 2024 23:54
This finishes the process of refactoring expand_instruments_and_separate_povm from a circuit method to a method of OpModel.
Refactor expand_instruments_and_separate_povm to use the multi-circuit version under the hood to reduce code duplication.
@coreyostrove coreyostrove self-assigned this Jun 5, 2024
@coreyostrove coreyostrove added this to the 0.9.13 milestone Jun 5, 2024
@coreyostrove coreyostrove marked this pull request as ready for review June 5, 2024 06:48
@coreyostrove coreyostrove requested review from rileyjmurray and a team as code owners June 5, 2024 06:48
@coreyostrove coreyostrove requested a review from sserita June 5, 2024 06:48
Copy link
Contributor

@rileyjmurray rileyjmurray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this branch came off of the feature-faster-circuit-primitives branch. That's making Git think this PR contains all the material from PR #445. I'll hold off on a full review of this PR until (1) #445 is merged and (2) this PR is updated, as needed, in order for Git to see the minimal set of changes.

That said, I do have three actionable comments. Two comments are given in-line. Here's a broader comment.

Right now the changes introduce free-functions like create_matrix_copa_layout_circuit_cache and create_map_copa_layout_circuit_cache. I'd prefer that these were static class functions, and that the name class signifier (_matrix_, _map__, etc..) be dropped from the function names. From there, you could invoke the function just with mld.sim.create_copa_layout_circuit_cache(...).

If you'd like you could make create_copa_layout_circuit_cache a method in the ForwardSimulator base class. If that method were to raise a NotImplementedError by default then you could replace some branching if-statements with

try:
    precomp_layour_circuit_cache = mdl.sim.create_copa_layout_circuit_cache(...)
except NotImplementedError:
    precomp_layour_circuit_cache = None

pygsti/algorithms/core.py Outdated Show resolved Hide resolved
pygsti/algorithms/core.py Outdated Show resolved Hide resolved
Corey Ostrove added 2 commits June 5, 2024 16:25
Refactor cache creation functions into static methods of the corresponding forward simulator class. Also add an empty base version of this method, and clean up a few miscellaneous things caught by review.
@coreyostrove
Copy link
Contributor Author

Thanks for the feedback and suggestions, @rileyjmurray. I have incorporated your feedback and refactored the cache creation methods into static methods of the corresponding forward simulator classes. I also addressed the two inline comments, so I have marked those as resolved.

@@ -78,24 +82,29 @@ def add_expanded_circuits(indices, add_to_this_dict):
for i in indices:
nospam_c = unique_nospam_circuits[i]
for unique_i in circuits_by_unique_nospam_circuits[nospam_c]: # "unique" circuits: add SPAM to nospam_c
observed_outcomes = None if (dataset is None) else dataset[ds_circuits[unique_i]].unique_outcomes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question for folks, but particularly aimed at @enielse. A comparable line to this one for computing the quantity named observed_outcomes appears here, but in the map layout code this queries the outcomes property of DataSetRow instead of the unique_outcomes property as is done here. I think this should be unique_outcomes in both cases (I don't see a reason why it wouldn't also be for map), but I wanted to run this by you before I made that change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any reason why map cannot also use unique_outcomes instead of outcomes, but am basing that just on looking through this code while reviewing. Expert @enielse opinion still welcome.

Copy link
Contributor

@rileyjmurray rileyjmurray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good overall, but the current changes have broken unit tests for some forward simulators. I don't know why all of the failures are happening. One of the failures is because TorchForwardSimulator uses Circuit.expand_instruments_and_separate_povm.

@coreyostrove
Copy link
Contributor Author

This looks good overall, but the current changes have broken unit tests for some forward simulators. I don't know why all of the failures are happening. One of the failures is because TorchForwardSimulator uses Circuit.expand_instruments_and_separate_povm.

TIL that you can copy a link to a specific line of the github actions log, neat.

Yup, this makes sense. One of the changes made on this branch was moving that method from the Circuit class to the OpModel class. This branch was forked off of a version of develop before the torch forward simulator code was merged in, so I'll need to make some minor updates.

Fix a few minor issues related to refactored code and updates made in this branch.
Copy link
Contributor

@sserita sserita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited to get this in and get those speedups in layout creation! First, I have a few comments: one which I think will error in certain edge cases, and others that are optional but may cut down on branching to increase maintainability.


#pre-compute a dictionary caching completed circuits for layout construction performance.
unique_circuits = list({ckt for circuit_list in circuit_lists for ckt in circuit_list})
if isinstance(mdl.sim, (_fwdsims.MatrixForwardSimulator, _fwdsims.MapForwardSimulator)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we checking this because only matrix and map forward sims have this defined? If so, thoughts on a try/catch on NotImplementedError instead so that if create_[...]_cache is defined on another forward sim, we don't have to remember and come back to change this check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This staticmethod is implemented in the ForwardSimulator base class, but raises a NotImplementedError. Though that might eventually change (it would be possible in principle to implement caching for the base COPA layout class' cache creation, but I haven't done so yet).

I've been spending too much time the past few months on SWE youtube, and one of the SWE design philosophies du jour that I've been enamored with as of late is 'Locality of Behavior' (LoB, as described here). While I'm not a purist by any means, in this instance I think it is easier (for someone reading this for the first time, or the first time in a while) to understand this portion of the code by explicitly signaling for which forward simulators the caching will happen, rather than the try/except paradigm which would require looking into the source code for the various forward simulator classes to identify when it won't happen.

That said, you're right there is a trade-off in doing so, which is needing to remember to come back to this and explicitly enable caching in protocols which support it. (Realistically it'd most likely be one of either me, you or Riley messing around with pyGSTi internals at this level, so hopefully we'd remember).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't heard of LoB before, but I like it.

cache['split_circuits'] = {ckt: split_ckt for ckt, split_ckt in zip(circuits, split_circuits)}

#There is some potential aliasing that happens in the init that I am not
#doing here, but I think 90+% of the time this ought to be fine.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean? In the <10% of cases that you expect to run into, will the cache/resulting layout still be created correctly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, it took me a minute to remember what I was thinking about here. (It wasn't loaded into cache, ba dum dum tsss...)

The thing that happens in the init, that isn't happening here, are the following calls:

unique_circuits, to_unique = self._compute_unique_circuits(circuits)
aliases = circuits.op_label_aliases if isinstance(circuits, _CircuitList) else None
ds_circuits = _lt.apply_aliases_to_circuits(unique_circuits, aliases)

This allows for the possibility of aliasing in the keys of the DataSet. This aliasing is the part that I didn't implement in the code shown below. The thing that would happen in the <10% of cases I was alluding to, i.e. the cases where someone is using aliasing (as far as I am aware this is not a widely used feature) is that ds_row would be None some amount (maybe all) of the time, and we'd pass a list with None entries into bulk_expand_instruments_and_separate_povm. This is allowed (and is in fact the default), but it would result in expanded circuits being created for every POVM and instrument effect, instead of just those for which outcomes were observed. The resulting layout wouldn't be wrong, but would be less memory efficient than it could otherwise be, as it would have space allocated for probabilities we won't be calculating. It would also make the bulk_expand_instruments_and_separate_povm less efficient than it otherwise could be.

All of that said, I am not entirely sure why I didn't implement the aliasing for the DataSet keys at the time. The answer is probably something along the lines of: "I wasn't sure how the aliasing code worked and couldn't be arsed to figure it out at the time." Anyhow, I see no reason not to add this back in (I'm pretty sure I can just essentially copy and paste the lines above into the cache creation routine), so I'll give it a crack.

@@ -78,24 +82,29 @@ def add_expanded_circuits(indices, add_to_this_dict):
for i in indices:
nospam_c = unique_nospam_circuits[i]
for unique_i in circuits_by_unique_nospam_circuits[nospam_c]: # "unique" circuits: add SPAM to nospam_c
observed_outcomes = None if (dataset is None) else dataset[ds_circuits[unique_i]].unique_outcomes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any reason why map cannot also use unique_outcomes instead of outcomes, but am basing that just on looking through this code while reviewing. Expert @enielse opinion still welcome.

pygsti/layouts/matrixlayout.py Outdated Show resolved Hide resolved
pygsti/layouts/matrixlayout.py Outdated Show resolved Hide resolved
pygsti/layouts/matrixlayout.py Outdated Show resolved Hide resolved
pygsti/layouts/matrixlayout.py Show resolved Hide resolved
pygsti/objectivefns/objectivefns.py Show resolved Hide resolved
Corey Ostrove added 3 commits July 30, 2024 20:29
Add in support for data set key aliasing in COPA layout cache creation.
Rework some of the if statement branching in the layout creation to instead use fallback behavior of get more.
I accidentally put down the wrong directory for temp testing files in the RB testing code.
@coreyostrove coreyostrove requested a review from a team as a code owner July 31, 2024 03:43
@coreyostrove
Copy link
Contributor Author

Thanks for the careful feedback, @sserita!

I have just pushed some changes addressing this feedback, to summarize the main changes:

  1. I've added back in alias support during cache creation to better match the original implementations in the various COPA layout inits. (Inspired by one of your questions about a comment I had left).
  2. I have replace some of the branching statements for cache existence with an empty dictionary + get based version where appropriate. There are a few places where the behavior in the case where the cache isn't present is meaningfully different than when invoking the fallback behavior for missing values (because they are using bulk methods), and in those instances I haven't made this change.

There is also a miscellaneous typo fix in the RB unit tests that I accidentally introduced with the IRB branch that I'm throwing in for good measure, in case you notice the bizarre inclusion.

Copy link
Contributor

@sserita sserita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aliasing looks good. I just have one question left on the RB test change.

@@ -163,9 +163,9 @@ def test_combined_design_access(self):

def test_serialization(self):

self.irb_design.write('../../test_packages/temp_test_files/test_InterleavedRBDesign_serialization')
self.irb_design.write('../../test/test_packages/temp_test_files/test_InterleavedRBDesign_serialization')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about why this change is needed. Aren't we in test/unit/protocols, so ../../test_packages is the right way to get to those files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought so originally myself, but I noticed when running unit tests locally last night that a new test_packages directory was being created at the base level (i.e. at the same level as test and pygsti) and populated with these files. In retrospect I suppose I could have equivalently removed one of the sets of '..'. Not sure why this is the case though, tbh, maybe something to do with how pytest sets the cwd?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also possible this was related to which directory I was launching pytest from, so if you have a venv handy to run these tests on and have a different experience that'd be good to know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants