Implement multithreaded stochastic swap in rust #7658

Merged
merged 42 commits into from
Feb 28, 2022

Conversation

mtreinish
Member

@mtreinish mtreinish commented Feb 14, 2022

Summary

This commit is a rewrite of the core swap trials functionality in the
StochasticSwap transpiler pass. Previously this core routine was written
in Cython (see #1789), which had great performance, but that
implementation was single threaded. The core of the stochastic swap
algorithm is by its nature well suited to parallel execution: it
attempts a number of random trials, picks the best result from all
the trials, and uses that for the layer. These trials can
easily be run in parallel as there is no data dependency between the
trials (there are shared inputs, but they are read-only). As the algorithm
generally scales exponentially, the speedup from running the trials in
parallel can offset this somewhat and improve the general scaling of the
pass up to a point. Running the pass in parallel was previously tried in #4781
using Python multiprocessing, but the overhead of launching an additional
process and serializing the input arrays for each trial was significantly larger
than the speed gains. To run the algorithm efficiently in parallel,
multithreading is needed to leverage shared memory for the shared inputs.
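
The "independent trials over shared read-only inputs, keep the best" structure described above can be sketched as follows. This is only an illustration under stated assumptions, not the PR's code: it uses `std::thread::scope` from the standard library instead of rayon, a tiny splitmix64 generator instead of a real RNG crate, and `run_trial` and `best_of_trials` are hypothetical names standing in for a real swap trial.

```rust
use std::thread;

/// Tiny deterministic PRNG (splitmix64) so the sketch needs no external crates.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Hypothetical stand-in for one swap trial: returns a randomized cost
/// computed from the shared, read-only input.
fn run_trial(shared: &[f64], seed: u64) -> f64 {
    let mut state = seed;
    shared
        .iter()
        .map(|&d| {
            // Uniform factor in [0.0, 1.0) perturbs each shared value.
            let r = (splitmix64(&mut state) >> 11) as f64 / (1u64 << 53) as f64;
            d * (1.0 + r)
        })
        .sum()
}

/// Runs `num_trials` independent trials, one scoped thread each, and
/// returns (trial index, cost) of the cheapest one.
fn best_of_trials(shared: &[f64], num_trials: u64, base_seed: u64) -> Option<(u64, f64)> {
    let results: Vec<(u64, f64)> = thread::scope(|s| {
        let handles: Vec<_> = (0..num_trials)
            .map(|t| s.spawn(move || (t, run_trial(shared, base_seed.wrapping_add(t)))))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // Results are collected in trial order, so ties resolve deterministically.
    results
        .into_iter()
        .min_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
}
```

Every thread borrows the same `shared` slice without copying or serializing it, which is the property the multiprocessing attempt lacked. Seeding each trial from its index keeps the result independent of thread scheduling.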

This commit rewrites the Cython routine in Rust. This was done for
two reasons. The first is that Rust's safety guarantees make
writing and working with parallel code much easier and safer. It's also
multiplatform, because Rust natively supports threading
primitives. The second is that, while parallel Cython code can be
written using OpenMP, there are limitations with it, mainly on Windows. In
practice it was also difficult to write and maintain parallel Cython
code, as it has very strict requirements on Python and C code
interactions. It was much faster and easier to port it to Rust, and the
performance for each iteration (outside of parallelism) is the same (and in
some cases marginally faster) in Rust. The implementation here reuses
the data structures that the previous Cython implementation introduced
(mainly flattening all the terra objects into 1d or 2d numpy arrays for
efficient access from C).

The speedups from this PR can be significant. Calling transpile() on a
400 qubit (with a depth of 10) QV model circuit targeting a 409 qubit heavy
hex coupling map goes from ~200 seconds with the single threaded Cython
to ~60 seconds with this PR locally on a 32 core system. Transpiling
a 1000 qubit (also with a depth of 10) QV model circuit targeting a 1081
qubit heavy hex coupling map goes from ~3700 seconds to ~720
seconds.

The tradeoff with this PR is that local qiskit-terra development now
requires a Rust compiler to be installed. This is made trivial using rustup
(https://rustup.rs/), but it is an additional burden and one that we
might not want to impose. If so, we can look at turning this PR into a
separate repository/package that qiskit-terra can depend on. The
tradeoff there is that we'd be adding friction to the API boundary
between the pass and the core swap trials interface, but it would ease
the development dependencies for qiskit-terra.

Details and comments

Fixes #1743

TODO:

  • Fix test failures
  • Fix hanging test/test timeout
  • Fix tox configuration and docs build
  • Add Docs
  • Update contributing docs to include requiring a rust compiler
  • Add release notes
  • More thorough benchmarking and potentially determining if there is a break even point for parallel vs serial (and if there is use that as a default switch)
  • Update linux wheel publish jobs to install rust compiler

@mtreinish mtreinish added type: discussion performance Changelog: API Change Include in the "Changed" section of the changelog labels Feb 14, 2022
@mtreinish mtreinish requested a review from a team as a code owner February 14, 2022 14:39
@mtreinish mtreinish force-pushed the parallel-stochastic-swap-in-rust branch from c1dc67c to 9d9c2dc Compare February 14, 2022 14:45
@mtreinish
Member Author

Just for comparison I reran the 1k qubit QV transpile with logging enabled per this script:

import logging
import time

import numpy as np
from qiskit import QuantumCircuit, transpile
from qiskit.quantum_info import random_unitary
from qiskit.transpiler import CouplingMap

logging.basicConfig(level="INFO")


def build_qv_model_circuit(width, depth, seed=None):
    """
    The model circuits consist of layers of Haar random
    elements of SU(4) applied between corresponding pairs
    of qubits in a random bipartition.
    """
    np.random.seed(seed)
    circuit = QuantumCircuit(width)
    # For each layer
    for _ in range(depth):
        # Generate uniformly random permutation Pj of [0...n-1]
        perm = np.random.permutation(width)
        # For each pair p in Pj, generate Haar random SU(4)
        for k in range(int(np.floor(width/2))):
            U = random_unitary(4)
            pair = int(perm[2*k]), int(perm[2*k+1])
            circuit.append(U, [pair[0], pair[1]])
    return circuit

qv = build_qv_model_circuit(1000, 10)
basis_gates = ["id", "u1", "u2", "u3", "cx"]
cmap = CouplingMap.from_heavy_hex(21)
start_time = time.time()
transpile(qv, coupling_map=cmap, basis_gates=basis_gates, seed_transpiler=42)
stop_time = time.time()
print(stop_time - start_time)

to get the isolated stochastic swap times as part of the larger transpile() call. With this PR applied it logged 553533.51545 ms (~554 seconds) and without this PR (using the serial cython implementation) it logged 3585305.39703 ms (~3585 seconds).

@ewinston
Contributor

I profiled random circuits of depth 10 and variable width on a grid lattice. For the single threaded cython I got,
image
While for the multithreaded rust binary on a cpu with 16 cores I got a ~3.8x improvement in speed.
image
I noticed that while the rust version was running, although all the cores were in use they did not always seem to be at 100% which might partially account for the ratio.

Member

@jakelishman jakelishman left a comment

I personally am fine with adding Rust components to Terra, but I have a much larger tolerance for increases in complexity of development tooling than most others, I think. It probably wouldn't hurt for us to coalesce our compiled components into either Cython or Rust, but not both, in due course.

I'd like to hear others' opinions on adding Rust (unless you all talked about this in a meeting while I was out), since I think it affects the larger Terra dev team in quite a big way.

The changes largely look fine to me - it looks like a pretty mechanical conversion of the Cython code to Rust (you've still got the variables ii, jj and kk which are a dead giveaway that Paul originally wrote it!). I left a few comments about typos, and a couple that are basically just beginner Rust questions.

src/stochastic_swap.rs
Comment on lines 33 to 47
#[inline]
fn compute_cost(
    dist: &ArrayView2<f64>,
    layout: &NLayout,
    gates: &[usize],
    num_gates: usize,
) -> f64 {
    (0..num_gates)
        .map(|kk| {
            let ii = layout.logic_to_phys[gates[2 * kk]];
            let jj = layout.logic_to_phys[gates[2 * kk + 1]];
            dist[[ii, jj]]
        })
        .fold(0.0, |a, b| a + b)
}
Member

This is minor, but something I was thinking about. Given floating-point maths, fold can't assume that the addition is associative, so it can't do fast-math-like optimisations like turning the fold into a threaded reduction. Given that this seems to be the worker routine of the loop (by eye - I did exactly zero profiling), is it worth us looking at that for large sizes?

The addition technically still isn't associative, but this shouldn't be a situation where we care about small differences - anything with a coupled link large enough to really screw with the calculation results will never be the best solution anyway, and there can't be catastrophic cancellation because the values should all be positive.
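
To make the associativity point concrete, here is a small sketch that is not taken from the PR. It sums the same values in two different orders, the way a sequential `fold` and a regrouping reduction might. The helper names `sum_forward` and `sum_backward` are made up for the illustration.

```rust
/// Sums left to right, like a sequential `fold`.
fn sum_forward(xs: &[f64]) -> f64 {
    xs.iter().fold(0.0, |a, b| a + b)
}

/// Sums right to left, one of the regroupings a reduction may pick.
fn sum_backward(xs: &[f64]) -> f64 {
    xs.iter().rev().fold(0.0, |a, b| a + b)
}
```

For `[0.1, 0.2, 0.3]` the two orders differ in the last bit, while for small integer-valued inputs such as `[3.0, 1.0, 2.0, 5.0]` both orders give exactly `11.0`. With all-positive terms the discrepancy stays at rounding level, which matches the argument that it should not change which trial wins.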

Member Author

It is something I've thought about and why I switched from the for loop in the original cython to the iterator based approach here. My concern with doing a multithreaded reduce here was that we're already in a parallel context when this is run, so I was worried we'd just be introducing more overhead.

Basically if you do:

Suggested change
#[inline]
fn compute_cost(
    dist: &ArrayView2<f64>,
    layout: &NLayout,
    gates: &[usize],
    num_gates: usize,
) -> f64 {
    (0..num_gates)
        .into_par_iter()
        .map(|kk| {
            let ii = layout.logic_to_phys[gates[2 * kk]];
            let jj = layout.logic_to_phys[gates[2 * kk + 1]];
            dist[[ii, jj]]
        })
        .reduce(|| 0.0, |a, b| a + b)
}

it becomes a parallel reduction. My other thought here was whether there is some simd operation we could use here too.

As for the fp error here, yeah I agree. Honestly the typing was a bit fast and loose in the cython around this call. The distance matrix here is always an integer (although the array comes from retworkx distance_matrix() function) because it's just counting edges between nodes in the graph. So when this is called with cdist we don't really have to worry about it, but for the times this is called with scale that is actually a float.
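
As a rough illustration of the integer-distance case, the sketch below is a std-only stand-in for `compute_cost`, not the reviewed function. It assumes a flat row-major slice in place of `ArrayView2`, a bare `logic_to_phys` slice in place of `NLayout`, and `chunks_exact(2)` in place of the `num_gates` index arithmetic. `line_distances` is a hypothetical helper that builds hop counts for qubits on a line.

```rust
/// Std-only stand-in for the reviewed `compute_cost`: `dist` is a row-major
/// `n x n` matrix stored flat, `logic_to_phys` maps logical to physical
/// qubits, and `gates` holds logical qubit pairs back to back.
fn compute_cost(dist: &[f64], n: usize, logic_to_phys: &[usize], gates: &[usize]) -> f64 {
    gates
        .chunks_exact(2)
        .map(|pair| {
            let ii = logic_to_phys[pair[0]];
            let jj = logic_to_phys[pair[1]];
            dist[ii * n + jj]
        })
        .sum()
}

/// Hop-count distances on a line of `n` qubits: |i - j|, always integer-valued.
fn line_distances(n: usize) -> Vec<f64> {
    (0..n * n)
        .map(|k| ((k / n) as f64 - (k % n) as f64).abs())
        .collect()
}
```

Because every entry is a small whole number stored as `f64`, the sum is exact in any order, so reordering the gate pairs cannot change the cost. That guarantee no longer holds once the matrix carries real-valued scaling.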

Member

To be honest, it probably needs a bunch of considerations for the fastest speed here, because there's a couple of levels of memory indirection before we get to the register variables we want.

Yeah, the threading inside process parallelisation may well not be worth it in the end. Maybe something to look at in the future, but probably not super high priority.

Member Author

Well we're already multithreaded here. That's where the speed advantage comes in this PR, the outer trials loop is executed in a rayon parallel iterator. This does raise a good point that we might still be in parallel processes too (although given the state of our multiprocessing support on windows, macOS>=py3.8, or linux==py3.9) and this multithreaded single pass might be more harmful for performance then. What I meant here though was nested multithreaded worker pools. I did try this locally just a little while ago to see what would happen and it basically has pegged all 64 threads on my workstation for about 1.5 hours now. So definitely not the approach we want to take. :) (we might be able to try to balance threads a bit like on my workstation I have 44 free threads, when running with 20 trials, which means we could probably get away with using 2 threads per trial here to do this sum in parallel, but my workstation is a pretty extreme case).

I was also looking at simdeez as a potential way to use simd here, but yeah I agree it will require more thought (I'll also have to brush up on my vectorized intrinsics to figure out how to use that lib) and is a future thing. We can leave optimizing this function as a future todo

@jakelishman
Member

Also, with us potentially adding a new compiled dependency, how does Rust handle choosing the set of CPU instructions it's going to target? I'm assuming it does something sensible, but just wanted to check that we're not accidentally going to start compiling in AVX-512 instructions if the build machine happens to have them available.

@mtreinish
Member Author

mtreinish commented Feb 16, 2022

Also, with us potentially adding a new compiled dependency, how does Rust handle choosing the set of CPU instructions it's going to target? I'm assuming it does something sensible, but just wanted to check that we're not accidentally going to start compiling in AVX-512 instructions if the build machine happens to have them available.

By default it's fairly conservative. For x86_64 the only extra features over the core ISA it uses are fxsr, sse, and sse2 (which pretty much every 64bit x86 cpu supports). To get more than that you have to explicitly ask to tune for a specific CPU (either on the command line, via an env variable, or I think you can change the default in the Cargo.toml too). For example, you can do RUSTFLAGS='-Ctarget-cpu=native' pip install . and it will build for your native local CPU. We can obviously tune this to be something a bit more modern if we wanted, for example if we set the target cpu to x86-64-v2 it adds sse3, ssse3, sse4.1, sse4.2, and popcnt, which basically limits us to everything newer than 2008. But for retworkx at least I opted to maximize compatibility in the precompiled binaries and just left it as is with the generic x86_64 target even if that potentially left some performance on the floor.
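
One way to check which instruction-set extensions a given build actually enabled is to query `cfg!(target_feature = ...)`, which is resolved at compile time from the target and any `-Ctarget-cpu` or `-Ctarget-feature` flags. The helper below is a hypothetical sketch for inspection only and is not part of the PR.

```rust
/// Lists which of a few x86 SIMD features were enabled at compile time.
/// On other architectures the list is simply empty.
fn compiled_x86_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    if cfg!(target_feature = "sse2") {
        enabled.push("sse2");
    }
    if cfg!(target_feature = "sse4.2") {
        enabled.push("sse4.2");
    }
    if cfg!(target_feature = "avx2") {
        enabled.push("avx2");
    }
    if cfg!(target_feature = "avx512f") {
        enabled.push("avx512f");
    }
    enabled
}
```

With the default generic x86_64 target this should report only `sse2` from the list; building with `-Ctarget-cpu=native` on a newer machine would be expected to add more entries.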

@mtreinish
Member Author

I noticed that while the rust version was running, although all the cores were in use they did not always seem to be at 100% which might partially account for the ratio.

The library I used for parallel iterators, rayon, uses a thread pool with work stealing to implement the parallel execution. When I was running on my 16 vCPU (pinned to 8 physical cores) windows vm and my old desktop/benchmarking machine (also with 8 physical cores and 16 threads) I saw a pretty clear pattern for optimization level 1 with 20 trials configured on each layer permutation: it would start out running on all the cores, and as those 16 initial executions started to finish the remaining 4 trials would run in 4 threads. But while those last 4 ran most of the cores were idle. I didn't really look closer than that though, but if we're not getting 100% cpu utilization on each thread we can look at profiling this to see where things are waiting, because I really wouldn't expect much idle time, and if there is some that's pointing to some further optimization we can do here.
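
The pattern described above can be put into rough numbers. The sketch below assumes, as a simplification, that every trial takes the same amount of time and that `threads` is greater than zero; real trials vary in length, so this is only a back-of-the-envelope model and the function names are made up.

```rust
/// Number of back-to-back "waves" needed when every trial takes equally long.
fn waves(trials: usize, threads: usize) -> usize {
    (trials + threads - 1) / threads
}

/// Threads left idle during the final wave (assumes at least one trial).
fn idle_in_last_wave(trials: usize, threads: usize) -> usize {
    waves(trials, threads) * threads - trials
}

/// Fraction of the available thread time spent doing work.
fn utilization(trials: usize, threads: usize) -> f64 {
    trials as f64 / (waves(trials, threads) * threads) as f64
}
```

For 20 trials on 16 threads this gives 2 waves, 12 idle threads in the second wave, and a utilization of 20 / 32 = 0.625, which is in line with cores being visibly underused.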

mtreinish and others added 6 commits February 16, 2022 09:34
This commit fixes how we package the compiled rust module in
qiskit-terra. As a single rust project only gives us a single compiled
binary output we can't use the same scheme we did previously with cython
with a separate dynamic lib file for each module. This shifts us to
making the rust code build a `qiskit._accelerate` module and in that we
have submodules for everything we need from compiled code. For this PR
there is only one submodule, `stochastic_swap`, so for example the
parallel swap_trials routine can be imported from
`qiskit._accelerate.stochastic_swap.swap_trials`. In the future we can
have additional submodules for other pieces of compiled code in qiskit.
For example, the likely next candidate is the pauli expectation value
cython module, which we'll likely port to rust and also make parallel
(for sufficiently large number of qubits). In that case we'd add a new
submodule for that functionality.
This commit corrects the use of the normal distribution to have the mean
set to 1.0. Previously we were doing this out of band for each value by
adding 1 to the random value which wasn't necessary because we could
just generate it with a mean of 1.0.
This commit removes an unnecessary extra scope around the locked read for
where we store the best solution. The scope was previously there to
release the lock after we check if there is a solution or not. However
this wasn't actually needed as we can just do the check inline and the
lock will release after the condition block.
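
The scoping behaviour this relies on can be shown with the standard library alone. The sketch below is a generic illustration, not the PR's code: `store_if_empty` and the `Option<u64>` payload are assumptions, and `try_write` is used so that the example fails instead of blocking if the read guard were still held.

```rust
use std::sync::RwLock;

/// Records `candidate` as the best solution only if none is stored yet.
/// Returns true when the write happened.
fn store_if_empty(best: &RwLock<Option<u64>>, candidate: u64) -> bool {
    // The read guard is a temporary of the `if` condition, so it is dropped
    // before the body runs; no extra `{ ... }` scope is needed to release it.
    if best.read().unwrap().is_none() {
        // try_write can succeed here because the read guard is already gone.
        if let Ok(mut slot) = best.try_write() {
            // Re-check under the write lock in case another thread got in first.
            if slot.is_none() {
                *slot = Some(candidate);
                return true;
            }
        }
    }
    false
}
```

The first call on an empty lock stores the value and later calls leave it untouched.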
Co-authored-by: Jake Lishman <jake@binhbar.com>
Previously the swap_trials() function had an explicit lifetime
annotation `'p` which wasn't necessary because the compiler can
determine this on its own. Normally when dealing with numpy views and a
Python object (i.e. a GIL handle) we need a lifetime annotation to tell
the rust compiler the numpy view and the python gil handle will have the
same lifetime. But since swap_trials doesn't take a gil handle and
operates purely in rust we don't need this lifetime and the rust
compiler can deal with the lifetime of the numpy views on their own.
@jlapeyre
Contributor

By default it's fairly conservative.

If this were a separate package, is it possible for a user to optionally build from source while installing? The number of people willing to do it would be small enough that it is unlikely to be worth it unless there are already robust tools for installing from source.

@jlapeyre
Contributor

The tradeoff with this PR is for local qiskit-terra development a rust
compiler needs to be installed. This is made trivial using rustup
(https://rustup.rs/)

I mostly metoo @jakelishman 's first comment. I am inclined to vote yes. But, I use linux, so I find the burden is essentially zero. How many people develop Qiskit on windows? I often hear that installing and using tools on windows is not easy, even when it's superficially easy.

@mtreinish
Member Author

mtreinish commented Feb 16, 2022

By default it's fairly conservative.

If this were a separate package, is it possible for a user to optionally build from source while installing? The number of people willing to do it would be small enough that it is unlikely to be worth it unless there are already robust tools for installing from source.

Yeah, you can do this very easily either as a separate package or if it was part of qiskit-terra (there's functionally no difference for cpu tuning). Either way it's the same step: you just need to set the env variable RUSTFLAGS=-Ctarget-cpu=native or RUSTFLAGS=-Ctarget-cpu=x86-64-v3 when you call pip install, either on a local checkout or on the sdist we publish to pypi. That will be passed to the rust compiler internally by setuptools-rust when it compiles the rust extension. For example, on retworkx you can do this today with something like: RUSTFLAGS="-Ctarget-cpu=native" pip install --no-binary retworkx retworkx which will build it from the published sdist tuned for your local cpu.

@mtreinish
Member Author

mtreinish commented Feb 16, 2022

The tradeoff with this PR is for local qiskit-terra development a rust
compiler needs to be installed. This is made trivial using rustup
(https://rustup.rs/)

I mostly metoo @jakelishman 's first comment. I am inclined to vote yes. But, I use linux, so I find the burden is essentially zero. How many people develop Qiskit on windows? I often hear that installing and using tools on windows is not easy, even when it's superficially easy.

I'm not sure how many people are actively developing on windows as their daily driver. I do it from time to time, mostly to debug environment specific issues (I've lost count of the number of FP precision issues we've hit in windows), but it's not my daily/primary dev environment. I will say installing rust on Windows is about as easy as on linux. Instead of being a command line program that you curl and run, rustup is just a .exe which opens a shell and runs the same basic script (they also have a graphical .msi installer available, but I've never tried it). The only complexity is you do need a c compiler (well technically the linker) which on windows by default is msvc. But this was already a requirement to do terra dev with the cython stuff. So at least for me testing this PR on my VM yesterday I could just run the rustup .exe file I got from https://rustup.rs/ and everything just worked the same.

@jlapeyre
Contributor

will build it from the published sdist tuned for your local cpu.
That's really slick.

yesterday I could just run the rustup .exe

My comment was partly inspired by juliaup story (only partly, there are many other stories of installation woe on windows). juliaup is written in rust and is modeled closely on rustup. But, they still have problems. For example, winget install julia -s msstore can fail. But, it may be easier for developers to work around problems than average users. Maybe the juliaup authors have not yet been able to organize something as simple as a web page with a download link to rustup-init.exe.

Beyond installing rustup, there is getting rust to work. Searching shows things like this: https://stackoverflow.com/a/68835925 . I'm still not saying adding rust is the wrong move, just trying to predict what might happen... I suppose getting cython to work on windows can be a problem too, and eventually, that requirement could be dropped.

This commit fixes the python lint failures and also updates the ci
configuration for the lint job to also run rust's style and lint
enforcement.
@jakelishman
Member

To be fair, that stackoverflow post is about linking Windows Rust against mingw's gcc rather than the first-class-supported Visual C++ tooling. That's a very niche use-case, and you'd have similar issues trying to force current Terra to build with a non-standard C compiler on any operating system.

"Getting Rust to work" shouldn't be an issue with pre-compiled wheels - it shouldn't be any different to linking against a compiled C library - Terra already relies on retworkx, which is all distributed Rust code.

@jlapeyre
Contributor

To be fair, that stackoverflow post is about linking Windows Rust against mingw's gcc

You are correct, they say explicitly that they are trying to do a non-standard installation. That didn't sink in the first time I read it. I could dig some more, but I'm not going to run into a show stopper. I guess the development requirements won't be an excessive burden. Starting with one module seems like a good idea. If it doesn't work out, it is not too much trouble to revert.

This commit fixes the output list from the `layout_mapping()`
method of `NLayout`. Previously it returned the wrong indices;
the output should be a list of virtual -> physical
qubit pairs. This commit corrects this error.
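
For orientation, the intended output shape can be sketched as below. This is a hypothetical, pared-down `NLayout` with only a `logic_to_phys` vector (the one field visible in the reviewed `compute_cost`); the real struct and method signature in the PR may differ.

```rust
/// Hypothetical, pared-down layout: index = virtual qubit, value = physical qubit.
struct NLayout {
    logic_to_phys: Vec<usize>,
}

impl NLayout {
    /// Returns `[virtual, physical]` pairs, one per virtual qubit.
    fn layout_mapping(&self) -> Vec<[usize; 2]> {
        self.logic_to_phys
            .iter()
            .enumerate()
            .map(|(virt, &phys)| [virt, phys])
            .collect()
    }
}
```

With `logic_to_phys = [2, 0, 1]`, virtual qubit 0 sits on physical qubit 2, so the first pair is `[0, 2]` rather than `[2, 0]`.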

Co-authored-by: georgios-ts <45130028+georgios-ts@users.noreply.github.com>
@mergify mergify bot merged commit ccc371f into Qiskit:main Feb 28, 2022
@mtreinish mtreinish deleted the parallel-stochastic-swap-in-rust branch February 28, 2022 21:50
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Feb 28, 2022
This commit replaces the cython implementation of the pauli expectation
value functions with a multithreaded rust implementation. This was done
for two reasons. The first and primary reason
is that after Qiskit#7658 this module was the only cython code left in the
qiskit-terra repository, so unifying on a single compiled language will
reduce the maintenance burden in qiskit-terra. The second reason is
similar to the rationale in Qiskit#7658 for using rust over cython for a
multi-threaded hybrid python module. The difference here though is that,
unlike stochastic swap, this module isn't as performance critical as
it's not nearly as widely used.
mtreinish added a commit to mtreinish/qiskit that referenced this pull request Mar 1, 2022
This commit updates the asv configuration to support two recent changes
in the terra repo. The first is updating the supported python version
list to reflect the current versions supported by terra. Python 3.6 is
no longer supported and python 3.10 is now supported. Additionally,
after Qiskit/qiskit#7658 merged, setuptools-rust and Rust are now
being used to build compiled extensions. While cython is still being
used, its use will be removed soon with Qiskit/qiskit#7702. This
commit updates the build configuration to build the rust extension and
then build a wheel from it instead of building the cython extension.
jakelishman pushed a commit to Qiskit/qiskit-metapackage that referenced this pull request Mar 1, 2022
mtreinish added a commit to mtreinish/qiskit that referenced this pull request Mar 1, 2022
With Qiskit/qiskit#7658 merged and Qiskit/qiskit#7702 not far
behind, the requirements for building terra from source will change.
A C++ compiler is no longer required and instead a rust compiler is
needed. This commit updates the instructions on building from source and
also removes some old out of date notes from the document at the same
time.
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Mar 3, 2022
In Qiskit#7658 we added rust code to the Qiskit build environment.
This was done to accelerate performance critical portions of the
library. However, in Qiskit#7658 we overlooked the binder tests which are used
to perform image comparisons for visualizations in a controlled
environment. The base binder docker image does not have the rust
compiler installed, so we need to manually install it prior to running
pip to install terra. This commit takes care of this and adds rust to
the binder environment.
mergify bot added a commit that referenced this pull request Mar 4, 2022
* Add rust to binder postBuild

In #7658 we added rust code to the Qiskit build environment.
This was done to accelerate performance critical portions of the
library. However, in #7658 we overlooked the binder tests which are used
to perform image comparisons for visualizations in a controlled
environment. The base binder docker image does not have the rust
compiler installed, so we need to manually install it prior to running
pip to install terra. This commit takes care of this and adds rust to
the binder environment.

* Install rust via conda for binder env

The binder image build is implicitly installing terra as it launches. So
trying to install rust manually as part of the postBuild script is too
late because it will have failed by then. Looking at the available
configuration files:

    https://mybinder.readthedocs.io/en/latest/config_files.html?#configuration-files

we can install conda packages defined in environment.yml as part of the
image build, and this will occur prior to the installation of terra.
This commit pivots to using the environment.yml to do this and we can
rely on the conda packaged version of rust instead of rustup.

* Move all binder files .binder/ dir

* Remove duplicate pip install in postBuild

* Revert "Remove duplicate pip install in postBuild"

This was actually needed, without this terra isn't actually installed.

This reverts commit d61c7a6.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
ikkoham added a commit to ikkoham/qiskit-terra that referenced this pull request Mar 7, 2022
commit 689dd275aaf81c46b8c8e399097373a5ecc136a5
Merge: 408745636 77219b5c7
Author: Ikko Hamamura <ikkoham@users.noreply.github.com>
Date:   Mon Mar 7 09:20:32 2022 +0900

    Merge branch 'main' into primitives/base-class

commit 408745636c507003419ae2a51480554e9b0524b4
Author: Ikko Hamamura <ikkoham@users.noreply.github.com>
Date:   Sat Mar 5 12:38:38 2022 +0900

    Apply suggestions from code review

commit 77219b5c7b7146b1545c5e5190739b36f4064b2f
Author: Jake Lishman <jake.lishman@ibm.com>
Date:   Fri Mar 4 20:38:04 2022 +0000

    Workaround Aer bug with subnormal floats in randomised tests (#7719)

    Aer currently sets `-ffast-math` during compilation, which when compiled
    with GCC causes the CPU's floating-point rounding mode to be set to
    "flush to zero", and subnormal numbers are disallowed.  This should not
    be the case, and Qiskit/qiskit-aer#1469 will solve the problem in
    release.  Until then, we must instruct `hypothesis` to avoid subnormal
    numbers in its floating-point strategies, as since version 6.38 it
    explicitly tests to ensure that they are functional, if used.

    This commit should be reverted once Aer no longer sets `-ffast-math`.
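
For readers unfamiliar with the term, the following is a minimal sketch of what "subnormal" means for Python floats, using only the standard library. The helper name `is_subnormal` is hypothetical and is not part of Aer, Qiskit, or `hypothesis`; in `hypothesis` itself the relevant switch is, as far as I know, the `allow_subnormal` argument of the `floats()` strategy.

```python
import sys


def is_subnormal(x):
    """Return True if x is a nonzero float smaller in magnitude than the
    smallest positive normal float (hypothetical helper for illustration)."""
    return x != 0.0 and abs(x) < sys.float_info.min


# The smallest positive normal double and the smallest positive subnormal double.
smallest_normal = sys.float_info.min
smallest_subnormal = 5e-324
```

With flush-to-zero enabled, values for which `is_subnormal` is true would be treated as zero by the hardware, which is why a test strategy that generates them can fail.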

    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 79cbb2455d2de2eead5e145785444ab42b5f8380
Author: Iulia Zidaru <iuliazidaru@users.noreply.github.com>
Date:   Fri Mar 4 20:52:03 2022 +0200

    Relocate mock backends from qiskit.test.mock to qiskit.mock (#7437)

    * Relocate mock backends from qiskit.test.mock to qiskit.mock

    * Relocate mock backends from qiskit.test.mock to qiskit.mock

    * fix: Inline literal start-string without end-string

    * change package to qiskit.providers.fake_provider

    * fix test failure

    * reformat file

    * fix review comments

    * Release note is API change not feature

    Co-authored-by: Jake Lishman <jake@binhbar.com>

commit 6e29dfe5c431a21828cc59868ebdb204c7f5c3a0
Author: ikkoham <ikkoham@users.noreply.github.com>
Date:   Sat Mar 5 01:43:23 2022 +0900

    fix by the suggestion

commit 4e608871e79b89beed4c79fcb1c78014ea6874f6
Author: ikkoham <ikkoham@users.noreply.github.com>
Date:   Sat Mar 5 01:40:34 2022 +0900

    fix BaseSampler's doc

commit 9f976ce01ba7e640f9fa6819c022d0b3ab4a9364
Author: Lev Bishop <18673315+levbishop@users.noreply.github.com>
Date:   Fri Mar 4 11:03:12 2022 -0500

    Update qiskit/primitives/base_estimator.py

commit db522c021b77cb7874aa1e1a29e2d44189fe440b
Author: Ikko Hamamura <ikkoham@users.noreply.github.com>
Date:   Sat Mar 5 00:16:08 2022 +0900

    Apply suggestions from code review

    Co-authored-by: Ali Javadi-Abhari <ajavadia@users.noreply.github.com>

commit 7380fafc2e6203fb0b930e2769ccd817c7812c77
Merge: 7b143f7da 439f7a633
Author: Ikko Hamamura <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 23:47:37 2022 +0900

    Merge pull request #40 from t-imamichi/fix-sphnix

    fix sphinx markup

commit 439f7a63395892a27140e683d8a1648a5a6dd696
Author: Takashi Imamichi <imamichi@jp.ibm.com>
Date:   Fri Mar 4 23:46:29 2022 +0900

    fix sphinx markup

commit 7b143f7dac4f5ca901a424ff09d00d518d6cb622
Author: ikkoham <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 22:29:11 2022 +0900

    Fix according to comments

commit 8cf25aa16ca6318565f7fc742dfd9ef41df7fdbd
Author: ikkoham <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 22:00:46 2022 +0900

    pick from https://github.com/levbishop/qiskit-terra/commit/a5033d7785d58157a365892a2f954caeecaaabdf

commit 32f769e367666cc2c0c0a2f7dcdc2f98c2da4baf
Merge: 24b8a765e a8d7f707b
Author: Ikko Hamamura <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 21:58:39 2022 +0900

    Merge pull request #39 from ikkoham/primitives/base-class-remove-grouping

    Remove grouping

commit a8d7f707b8abf37840eb60e95b37003925500e9f
Author: ikkoham <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 21:54:58 2022 +0900

    use typing instead of collections.abc

commit 24b8a765e114a15cc44858181a91b293fbe40357
Author: ikkoham <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 21:54:58 2022 +0900

    use typing instead of collections.abc

commit 592e1bed006fd345704bd4dba6980a2808fceadf
Author: ikkoham <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 21:49:08 2022 +0900

    remove grouping

commit 77140eb0d8e99f12826cb039ff45b68194070941
Author: ikkoham <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 21:47:41 2022 +0900

    use Iterable

commit becf352555a50e07ab0201efa7c21ec52632e7b3
Author: Ikko Hamamura <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 21:27:47 2022 +0900

    Update qiskit/primitives/base_estimator.py

    Co-authored-by: Lev Bishop <18673315+levbishop@users.noreply.github.com>

commit 67a0f19a9b2510265c3f14da51972d3eb8695c39
Author: ikkoham <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 21:13:25 2022 +0900

    remove duplicated methods and remove variances

commit c532395226943df5b5fe2cc0707c07676129b46e
Merge: ce0ad7625 6bb4b1d91
Author: Ikko Hamamura <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 19:29:02 2022 +0900

    Merge pull request #38 from levbishop/primitives/base-class

    Primitives/base class

commit 6bb4b1d91022795dfc51601ea3fc7bb0eb3bab34
Author: Ikko Hamamura <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 19:28:54 2022 +0900

    Update qiskit/primitives/base_sampler.py

    Co-authored-by: Takashi Imamichi <31178928+t-imamichi@users.noreply.github.com>

commit bf9b52454c22878c61209f428e46e4600bd4f3c2
Author: Ikko Hamamura <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 19:28:48 2022 +0900

    Update qiskit/primitives/base_sampler.py

    Co-authored-by: Takashi Imamichi <31178928+t-imamichi@users.noreply.github.com>

commit dbe5bb544d73e05c585d2c00835752b44858e9f9
Merge: b7fc45e40 ce0ad7625
Author: Ikko Hamamura <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 19:00:52 2022 +0900

    Merge branch 'primitives/base-class' into primitives/base-class

commit b7fc45e40112619d6aff9085fcf041b0c6ba3bd9
Author: Lev S. Bishop <18673315+levbishop@users.noreply.github.com>
Date:   Fri Mar 4 03:50:39 2022 -0500

    Parameter ordering and docs

commit ce0ad7625d7b645dd48e9f3e2db7366dd50db3ed
Merge: 3f9cc5cbd 75b7a7a5a
Author: Ikko Hamamura <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 17:47:05 2022 +0900

    Merge pull request #37 from t-imamichi/doc-fix

    Doc fix

commit 75b7a7a5a3ba1ea14d9d26c414160feeeb7d328b
Author: Takashi Imamichi <imamichi@jp.ibm.com>
Date:   Fri Mar 4 17:41:05 2022 +0900

    fix sampler example

commit eaabfc84094d19ec42fcb6ebb8947a51e9e84657
Author: Lev S. Bishop <18673315+levbishop@users.noreply.github.com>
Date:   Fri Mar 4 03:19:36 2022 -0500

    Release note

commit 3f9cc5cbdc25bf0f4ac4c3f3c04e2f2dfed48248
Author: ikkoham <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 16:55:28 2022 +0900

    move dunder methods and fix lint

commit b2d8030d841223b6f3b25fafabc2c23cca790668
Author: ikkoham <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 16:16:29 2022 +0900

    add parenthesis

commit 45a8d85e0dce602d59470698190bb03a47b2be68
Author: ikkoham <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 16:14:48 2022 +0900

    fix docs to pass the CI

commit 5f96cd9025d0e85b8d3492daf6602d7c9bb3d70b
Author: Takashi Imamichi <imamichi@jp.ibm.com>
Date:   Fri Mar 4 15:43:47 2022 +0900

    fix docstring

commit 348338ecfb7877771a3bb49d835547f068feda99
Author: ikkoham <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 12:42:33 2022 +0900

    fix type hints

commit d26515980f10f6beab7fe22ac7ce529fceeb0825
Merge: 0fb08e94c f08e647cd
Author: Ikko Hamamura <ikkoham@users.noreply.github.com>
Date:   Fri Mar 4 11:37:09 2022 +0900

    Merge branch 'main' into primitives/base-class

commit f08e647cd67f0644e1080e86f30a88705bfcc449
Author: Matthew Treinish <mtreinish@kortar.org>
Date:   Thu Mar 3 21:32:17 2022 -0500

    Add rust to binder configuration (#7732)

    * Add rust to binder postBuild

    In #7658 we added rust code to the Qiskit build environment.
    This was done to accelerate performance critical portions of the
    library. However, in #7658 we overlooked the binder tests which are used
    to perform image comparisons for visualizations in a controlled
    environment. The base binder docker image does not have the rust
    compiler installed, so we need to manually install it prior to running
    pip to install terra. This commit takes care of this and adds rust to
    the binder environment.

    * Install rust via conda for binder env

    The binder image build is implicitly installing terra as it launches. So
    trying to install rust manually as part of the postBuild script is too
    late because it will have failed by then. Looking at the available
    configuration files:

        https://mybinder.readthedocs.io/en/latest/config_files.html?#configuration-files

    we can install conda packages defined in environment.yml as part of the
    image build, and this will occur prior to the installation of terra.
    This commit pivots to using the environment.yml to do this and we can
    rely on the conda packaged version of rust instead of rustup.

    * Move all binder files to .binder/ dir

    * Remove duplicate pip install in postBuild

    * Revert "Remove duplicate pip install in postBuild"

    This was actually needed; without it terra isn't installed.

    This reverts commit d61c7a68e88747cd38f82643c2b35eca33932a52.

    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 0fb08e94cd856170d1b5e57f078ea30a004c86bf
Merge: b88e398c9 6e47c2cd2
Author: Lev S. Bishop <18673315+levbishop@users.noreply.github.com>
Date:   Thu Mar 3 20:46:53 2022 -0500

    Merge branch 'primitives/base-class' of github.com:ikkoham/qiskit-terra into primitives/base-class

commit b88e398c94867325f5cc031cfceca0cfeaad93ff
Author: Lev S. Bishop <18673315+levbishop@users.noreply.github.com>
Date:   Thu Mar 3 20:45:50 2022 -0500

    Backwards compatible typechecking

commit 6e47c2cd20cc607d2c5039e2670c0a063cee34ed
Author: Lev Bishop <18673315+levbishop@users.noreply.github.com>
Date:   Thu Mar 3 18:09:47 2022 -0500

    Delete prim.md

    Added in error

commit 8a70ff0536f934be1cb4f5bdda48d5f0e2153e9d
Author: Lev S. Bishop <18673315+levbishop@users.noreply.github.com>
Date:   Thu Mar 3 17:53:29 2022 -0500

    Zip parameters and circuits

commit 0be2c44f2ce4e0b83cb49bb275d2db30aef56b3c
Author: Matthew Treinish <mtreinish@kortar.org>
Date:   Thu Mar 3 15:12:04 2022 -0500

    Update FakeWashington backend with new API snapshots (#7731)

    The FakeWashington backend was recently added in #7392 but when that PR
    was created the washington device was missing its pulse defaults
    payload. Since that PR was first created the IBM API has started
    returning a pulse defaults payload. This commit updates the FakeWashington backend
    to use current snapshots which includes the missing data. It is then
    changed to be a pulse backend now that we have the defaults payload
    available.

commit 9a757c8ae20aa88dec6841e3986da0b4ce70b4c9
Author: Matthew Treinish <mtreinish@kortar.org>
Date:   Thu Mar 3 12:47:13 2022 -0500

    Support reproducible builds of Rust library (#7728)

    By default Rust libraries don't ship a Cargo.lock file. This is to allow
    other Rust consumers of the library to pick a compatible version with
    the other upstream dependencies. [1] However, the library we build in
    Qiskit is a bit different since it's not a traditional Rust library but
    instead we're building a C dynamic library that is meant to be consumed
    by Python. This model is much closer to developing a Rust binary
    program because we're shipping a standalone binary. To support
    reproducible builds we should include the Cargo.lock file in our source
    distribution to ensure that all builds of qiskit-terra are using the
    same versions of our upstream Rust dependencies. This commit commits the
    missing Cargo.lock file, removes it from the .gitignore (which was added
    automatically by cargo when creating a library project), and includes it
    in the sdist. This will ensure that any downstream consumer of terra
    from source will have a reproducible build. Additionally this adds a
    dependabot config file so the bot will manage proposing version bumps on
    upstream project releases, since we probably want to be using the latest
    versions on new releases in our lock file.

    [1] https://doc.rust-lang.org/cargo/faq.html#why-do-binaries-have-cargolock-in-version-control-but-not-libraries

commit 4b86e1ef052d66beada61f039057006f4e9f909f
Merge: f10d130b3 148c04448
Author: Ikko Hamamura <ikkoham@users.noreply.github.com>
Date:   Thu Mar 3 17:30:54 2022 +0900

    Merge pull request #36 from t-imamichi/doc

    (wip) docstrings

commit 148c04448d564c937812c9c90c5828106c753b1a
Author: Takashi Imamichi <imamichi@jp.ibm.com>
Date:   Thu Mar 3 17:24:28 2022 +0900

    (wip) docstrings

commit f10d130b3b1102591d1d7039575c623c821692dd
Merge: 139e0cd52 ccfed937f
Author: Ikko Hamamura <ikkoham@users.noreply.github.com>
Date:   Wed Mar 2 23:34:45 2022 +0900

    Merge pull request #35 from t-imamichi/doc

    (wip) docstrings

commit ccfed937f88210ae4d3beb84851e66dbc3c36cc2
Author: Takashi Imamichi <imamichi@jp.ibm.com>
Date:   Wed Mar 2 23:32:25 2022 +0900

    (wip) docstrings

commit bee5e7f62db400a4c2f6924064413371be0048eb
Author: Julien Gacon <gaconju@gmail.com>
Date:   Tue Mar 1 16:35:19 2022 +0100

    Remove deprecated methods in ``qiskit.algorithms`` (#7257)

    * rm deprecated algo methods

    * add reno

    * fix tests, remove from varalgo

    * initial point was said to be abstract in varalgo!

    * attempt to fix sphinx #1 of ?

    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 139e0cd529e724c464499eb145fbb80de8e79170
Author: ikkoham <ikkoham@users.noreply.github.com>
Date:   Wed Mar 2 00:35:05 2022 +0900

    add grouping and minimum implementation

commit 9eb6fc3394325684048d835ea15f9a0a5631aee1
Author: ikkoham <ikkoham@users.noreply.github.com>
Date:   Tue Mar 1 23:47:37 2022 +0900

    move result files

commit 733a9b7b5fbcbd44947e66b10c1b3dc17cfad49e
Author: ikkoham <ikkoham@users.noreply.github.com>
Date:   Tue Mar 1 23:37:25 2022 +0900

    Add base classes for primitives

commit ccc371f8ff4dd8fbb7cef41a7d231800a72bda4e
Author: Matthew Treinish <mtreinish@kortar.org>
Date:   Mon Feb 28 16:49:54 2022 -0500

    Implement multithreaded stochastic swap in rust (#7658)

    * Implement multithreaded stochastic swap in rust

    This commit is a rewrite of the core swap trials functionality in the
    StochasticSwap transpiler pass. Previously this core routine was written
    using Cython (see #1789) which had great performance, but that
    implementation was single threaded. The core of the stochastic swap
    algorithm is by its nature well suited to parallel execution: it
    attempts a number of random trials, picks the best result from all
    the trials, and uses that for the layer. These trials can
    easily be run in parallel as there is no data dependency between the
    trials (there are shared inputs but read-only). As the algorithm
    generally scales exponentially the speed up from running the trials in
    parallel can offset this and improve the scaling of the pass. Running
    the pass in parallel was previously tried in #4781 using Python
    multiprocessing but the overhead of launching an additional process and
    serializing the input arrays for each trial was significantly larger
    than the speed gains. To run the algorithm efficiently in parallel
    multithreading is needed to leverage shared memory on shared inputs.
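
The structure described above (independent trials over shared read-only inputs, then a reduction to the best result) can be sketched as follows. This is an illustration only, not the code from this PR: the distance matrix, the gate list, the cost function, and the names `run_trial` and `best_of_trials` are all hypothetical, and Python threads will not show the speedup the Rust implementation gets because of the GIL.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared, read-only input: distances on a 4-qubit line.
DISTANCES = (
    (0, 1, 2, 3),
    (1, 0, 1, 2),
    (2, 1, 0, 1),
    (3, 2, 1, 0),
)
# Hypothetical layer: pairs of virtual qubits that must interact.
GATES = ((0, 3), (1, 2))


def run_trial(trial_index, seed):
    """Run one independent trial and return (cost, trial_index, layout)."""
    rng = random.Random(seed + trial_index)  # per-trial RNG, no shared state
    layout = list(range(len(DISTANCES)))
    rng.shuffle(layout)
    # Only reads the shared inputs; nothing is written to them.
    cost = sum(DISTANCES[layout[a]][layout[b]] for a, b in GATES)
    return cost, trial_index, tuple(layout)


def best_of_trials(num_trials, seed, max_workers=4):
    """Run all trials on a thread pool and keep the cheapest result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda i: run_trial(i, seed), range(num_trials)))
    # Tuples compare by cost first, then by trial index, so ties are
    # resolved the same way regardless of thread scheduling.
    return min(results)
```

Because each trial derives its own random generator from the seed and its index, the result does not depend on how many worker threads are used.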

    This commit rewrites the cython routine using rust. This was done for
    two reasons. The first is that rust's safety guarantees make dealing
    with and writing parallel code much easier and safer. It's also
    multiplatform because the rust language supports native threading
    primitives in the language. The second is that parallel cython
    code written with open-mp has limitations, mainly on windows. In
    practice it was also difficult to write and maintain parallel cython
    code as it has very strict requirements on python and c code
    interactions. It was much faster and easier to port it to rust and the
    performance for each iteration (outside of parallelism) is the same (in
    some cases marginally faster) in rust. The implementation here reuses
    the data structures that the previous cython implementation introduced
    (mainly flattening all the terra objects into 1d or 2d numpy arrays for
    efficient access from C).

    The speedups from this PR can be significant. Calling transpile() on a
    400 qubit (with a depth of 10) QV model circuit targeting a 409 qubit
    heavy hex coupling map goes from ~200 seconds with the single threaded
    cython to ~60 seconds with this PR locally on a 32 core system. Transpiling
    a 1000 qubit (also with a depth of 10) QV model circuit targeting a 1081
    qubit heavy hex coupling map goes from taking ~6500 seconds to ~720
    seconds.

    The tradeoff with this PR is for local qiskit-terra development a rust
    compiler needs to be installed. This is made trivial using rustup
    (https://rustup.rs/), but it is an additional burden and one that we
    might not want to make. If so we can look at turning this PR into a
    separate repository/package that qiskit-terra can depend on. The
    tradeoff here is that we'll be adding friction to the api boundary
    between the pass and the core swap trials interface. But, it does ease
    the dependency on development for qiskit-terra.

    * Sanitize packaging to support future modules

    This commit fixes how we package the compiled rust module in
    qiskit-terra. As a single rust project only gives us a single compiled
    binary output we can't use the same scheme we did previously with cython
    with a separate dynamic lib file for each module. This shifts us to
    making the rust code build a `qiskit._accelerate` module and in that we
    have submodules for everything we need from compiled code. For this PR
    there is only one submodule, `stochastic_swap`, so for example the
    parallel swap_trials routine can be imported from
    `qiskit._accelerate.stochastic_swap.swap_trials`. In the future we can
    have additional submodules for other pieces of compiled code in qiskit.
    For example, the likely next candidate is the pauli expectation value
    cython module, which we'll likely port to rust and also make parallel
    (for sufficiently large number of qubits). In that case we'd add a new
    submodule for that functionality.

    * Adjust random normal distribution to use correct mean

    This commit corrects the use of the normal distribution to have the mean
    set to 1.0. Previously we were doing this out of band for each value by
    adding 1 to the random value which wasn't necessary because we could
    just generate it with a mean of 1.0.
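
As a small illustration of why the two approaches are interchangeable, the sketch below compares them using Python's standard library generator rather than the Rust code from this PR. The function names and the value of `sigma` are hypothetical.

```python
import random


def perturbation_shifted(rng, sigma):
    # Earlier approach described above: sample with mean 0.0, then add 1.
    return 1.0 + rng.gauss(0.0, sigma)


def perturbation_direct(rng, sigma):
    # Approach after the fix: sample directly with mean 1.0.
    return rng.gauss(1.0, sigma)


def sample(fn, seed, count, sigma=0.1):
    """Draw `count` values from `fn` using a freshly seeded generator."""
    rng = random.Random(seed)
    return [fn(rng, sigma) for _ in range(count)]
```

With the same seed both functions draw from the same underlying stream, so the samples agree up to floating-point rounding.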

    * Remove unnecessary extra scope from locked read

    This commit removes an unnecessary extra scope around the locked read for
    where we store the best solution. The scope was previously there to
    release the lock after we check if there is a solution or not. However
    this wasn't actually needed as we can just do the check inline and the
    lock will release after the condition block.

    * Remove unnecessary explicit type from opt_edges variable

    * Fix indices typo in NLayout constructor

    Co-authored-by: Jake Lishman <jake@binhbar.com>

    * Remove explicit lifetime annotation from swap_trials

    Previously the swap_trials() function had an explicit lifetime
    annotation `'p` which wasn't necessary because the compiler can
    determine this on its own. Normally when dealing with numpy views and a
    Python object (i.e. a GIL handle) we need a lifetime annotation to tell
    the rust compiler the numpy view and the python gil handle will have the
    same lifetime. But since swap_trials doesn't take a gil handle and
    operates purely in rust we don't need this lifetime and the rust
    compiler can deal with the lifetime of the numpy views on their own.

    * Use sum() instead of fold()

    * Fix lint and add rust style and lint checks to CI

    This commit fixes the python lint failures and also updates the ci
    configuration for the lint job to also run rust's style and lint
    enforcement.

    * Fix returned layout mapping from NLayout

    This commit fixes the output list from the `layout_mapping()`
    method of `NLayout`. Previously, it returned the wrong indices;
    it should be a list of virtual -> physical qubit pairs.
    This commit corrects this error.

    Co-authored-by: georgios-ts <45130028+georgios-ts@users.noreply.github.com>

    * Tweak tox configuration to try and reliably build rust extension

    * Make swap_trials parallelization configurable

    This commit makes the parallelization of the swap_trials() configurable.
    This is done in two ways. First, a new argument parallel_threshold is
    added which takes an optional int: the number of qubits at which to
    switch between a parallel and serial version. Second, it
    takes into account the state of the QISKIT_IN_PARALLEL environment
    variable. This variable is set to TRUE by parallel_map() when we're
    running in a multiprocessing context. In those cases also running
    stochastic swap in parallel will likely just cause too much load as
    we're potentially oversubscribing work to the number of available CPUs.
    So, if QISKIT_IN_PARALLEL is set to True we run swap_trials serially.

    * Revert "Make swap_trials parallelization configurable"

    This reverts commit 57790c84b03da10fd7296c57b38b54c5bccebf4c. That
    commit attempted to solve some issues in test running, mainly around
    multiple parallel dispatch causing excess load. But in practice it was
    broken and caused more issues than it fixed. We'll investigate and add
    control for the parallelization in a future commit separately after all
    the tests are passing so we have a good baseline.

    * Add docs to swap_trials() and remove unnecessary num_gates arg

    * Fix race condition leading to non-deterministic behavior

    Previously, in the case of circuits that had multiple best possible
    depth == 1 solutions for a layer, there was a race condition in the fast
    exit path between the threads which could lead to a non-deterministic
    result even with a fixed seed. The output was always valid, but which
    result was returned depended on which parallel thread with an ideal
    solution finished last and wrote to the locked best result. This was causing
    weird non-deterministic test failures for some tests because of #1794 as
    the exact match result would change between runs. This could be a bigger
    issue because user expectations are that with a fixed seed set on the
    transpiler that the output circuit will be deterministically
    reproducible.

    To address this issue this commit trades off some performance to
    ensure we're always returning a deterministic result in this case. This
    is accomplished as follows: when checking whether a depth == 1 solution
    has been found in another trial thread, we only act (so either exit early
    or update the already found depth == 1 solution) if that solution
    has a trial number that is less than this thread's trial number.
    This does limit the effectiveness of the fast exit, but in practice it
    should hopefully not affect the speed too much.

    As part of this commit some tests are updated because the new
    deterministic behavior is slightly different from the previous results
    from the cython serial implementation. I manually verified that the
    new output circuits are still valid (it also looks like the quality
    of the results in some of those cases improved, but this is strictly
    anecdotal and shouldn't be taken as a general trend with this PR).
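
The difference between an order-dependent and an order-independent reduction can be sketched as below. This is a simplified stand-in, not the locking logic in the Rust code: results are modelled as hypothetical `(depth, trial_number, payload)` tuples, and the arrival order of the list plays the role of thread timing.

```python
def last_writer_wins(results):
    """Order-dependent: whichever depth == 1 result arrives last is kept."""
    best = None
    for depth, trial, payload in results:
        if depth == 1:
            best = (depth, trial, payload)
    return best


def lowest_trial_wins(results):
    """Order-independent: prefer the smallest depth, then the lowest trial number."""
    return min(results, key=lambda r: (r[0], r[1]))


# Hypothetical trial outcomes; three of them reach the ideal depth of 1.
arrivals = [(1, 0, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d")]
```

Reversing `arrivals` changes what `last_writer_wins` returns but not what `lowest_trial_wins` returns, which is the property a fixed seed is expected to guarantee.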

    * Apply suggestions from code review

    Co-authored-by: georgios-ts <45130028+georgios-ts@users.noreply.github.com>

    * Fix compiler errors in previous commit

    * Revert accidental commit of parallel reduction in compute_cost

    This was only for local testing to prove it was a bad idea and was
    accidentally included in the branch. We should not nest the parallel
    execution like this.

    * Eliminate short circuit for depth == 1 swap_trial() result

    This commit eliminates the short circuit fast return in swap_trial()
    when another trial thread has found an ideal solution. Trying to do this
    in a parallel context is tricky to make deterministic because in cases
    of >1 depth == 1 solutions there is an inherent race condition between
    the threads for writing out their depth == 1 result to the shared
    location. Different strategies were tried to make this reliably
    deterministic but there was still a race condition. Since this was just a
    performance optimization to avoid doing unnecessary work this commit
    removes this step. Weighing improved performance against repeatability
    in the output of the compiler, the reproducible results are more
    important. After we've adopted a multithreaded stochastic swap we can
    investigate adding this back as a potential future optimization.

    * Add missing docstrings

    * Add section to contributing on installing from source

    * Make rust python classes pickleable

    * Add rust compiler install to linux wheel jobs

    * Try more tox changes to fix docs builds

    * Revert "Eliminate short circuit for depth == 1 swap_trial() result"

    This reverts commit c510764a770cb610661bdb3732337cd45ab587fd. The
    removal there was premature and we had a fix for the non-determinism in
    place, ignoring a typo which was preventing it from working.

    Co-Authored-By: Georgios Tsilimigkounakis <45130028+georgios-ts@users.noreply.github.com>

    * Fix submodule declaration and module attribute on rust classes

    * Fix rust lint

    * Fix docs job definition

    * Disable multiprocessing parallelism in unit tests

    This commit disables the multiprocessing based parallelism when running
    unittest jobs in CI. We historically have defaulted the use of
    multiprocessing in environments only where the "fork" start method is
    available because this has the best performance and has no caveats
    around how it is used by users (you don't need an
    `if __name__ == "__main__"` guard). However, the use of the "fork"
    method isn't always 100% reliable (see
    https://bugs.python.org/issue40379), which we saw on Python 3.9 #6188.
    In unittest CI (and tox) by default we use stestr which spawns (not using
    fork) parallel workers to run tests in parallel. With this PR this means
    in unittest we're now running multiple test runner subprocesses, which
    are executing parallel dispatched code using multiprocessing's fork
    start method, which is executing multithreaded rust code. These three layers
    of nesting fairly reliably hang as Python's fork doesn't seem to
    be able to handle this many layers of nested parallelism. There are 2
    ways I've been able to fix this. The first is to change the start method
    used by `parallel_map()` to either "spawn" or "forkserver"; neither of
    these suffers from random hanging. However, doing this in the
    unittest context causes significant overhead and slows down test
    execution significantly. The other is to just disable the
    multiprocessing, which fixes the hanging and doesn't impact runtime
    performance significantly (and might actually help in CI so we're not
    oversubscribing the limited resources).

    As I have not been able to reproduce `parallel_map()` hanging in
    a standalone context with multithreaded stochastic swap this commit opts
    for just disabling multiprocessing in CI and documenting the known issue
    in the release notes as this is the simpler solution. It's unlikely that
    users will nest parallel processes as it typically hurts performance
    (and parallel_map() actively guards against it), we only did it in
    testing previously because the tests which relied on it were a small
    portion of the test suite (roughly 65 tests) and typically did not have
    a significant impact on the total throughput of the test suite.

    * Fix typo in azure pipelines config

    * Remove unnecessary extension compilation for image tests

    * Add test script to explicitly verify parallel dispatch

    In an earlier commit we disabled the use of parallel dispatch in
    parallel_map() to avoid a bug in cpython associated with their fork()
    based subprocess launch. Doing this works around the bug which was
    reliably triggered by running multiprocessing in parallel subprocesses.
    It also has the side benefit of providing a ~2x speed up for test suite
    execution in CI. However, this meant we lost our test coverage in CI for
    running parallel_map() with actual multiprocessing based parallel
    dispatch. To ensure we don't inadvertently regress this code path
    moving forward this commit adds a dedicated test script which runs a
    simple transpilation in parallel and verifies that everything works as
    expected with the default parallelism settings.

    * Avoid multi-threading when run in a multiprocessing context

    This commit adds a switch on running between a single threaded and a
    multithreaded variant of the swap_trials loop based on whether the
    QISKIT_IN_PARALLEL flag is set. If QISKIT_IN_PARALLEL is set to TRUE
    this means the `parallel_map()` function is running in the outer python
    context and we're running in multiprocessing already. This means we do
    not want to be running in multiple threads generally as that will lead
    to potential resource exhaustion by spawning `n` processes each potentially
    running with `m` threads, where `n` is `min(num_phys_cpus, num_tasks)` and
    `m` is num_logical_cpus (although only
    `min(num_logical_cpus, num_trials)` will be active). On the typical
    system there aren't enough cores to leverage both multiprocessing and
    multithreading. However, in case a user does have such an environment
    they can set the `QISKIT_FORCE_THREADS` env variable to `TRUE` which
    will use threading regardless of the status of `QISKIT_IN_PARALLEL`.
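
The decision described above can be sketched in Python as follows. The function name `use_threads` is hypothetical, and the exact parsing of the variables (for example the case-insensitive comparison against `TRUE`) is an assumption; only the variable names `QISKIT_IN_PARALLEL` and `QISKIT_FORCE_THREADS` come from the commit message.

```python
import os


def use_threads(environ=None):
    """Decide whether the multithreaded variant should run (illustrative only)."""
    env = os.environ if environ is None else environ
    # Assumption: a variable counts as set when its value is "TRUE" in any case.
    in_parallel = env.get("QISKIT_IN_PARALLEL", "FALSE").upper() == "TRUE"
    force_threads = env.get("QISKIT_FORCE_THREADS", "FALSE").upper() == "TRUE"
    # Threads are used unless we are already inside a multiprocessing
    # context, and the force flag overrides that restriction.
    return force_threads or not in_parallel
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without changing the real process environment.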

    * Apply suggestions from code review

    Co-authored-by: Jake Lishman <jake@binhbar.com>

    * Minor fixes from review comments

    This commit fixes some minor details found during code review. It
    expands the section on building from source to explain how to build a
    release optimized binary with editable mode, makes the QISKIT_PARALLEL
    env variable usage consistent across all jobs, and adds a missing
    shebang to the `install_rust.sh` script which is used to install rust in
    the manylinux container environment.

    * Simplify tox configuration

    In earlier commits the tox configuration was changed to try and fix the
    docs CI job by going to great effort to try and enforce that
    setuptools-rust was installed in all situations, even before it was
    actually needed. However, the problem with the docs ci job was unrelated
    to the tox configuration and this reverts the configuration to something
    that works with more versions of tox and setuptools-rust.

    * Add missing pieces of cargo configuration

    Co-authored-by: Jake Lishman <jake@binhbar.com>
    Co-authored-by: georgios-ts <45130028+georgios-ts@users.noreply.github.com>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 44f794aa7afa545900bda4ed361a7a27d71dff4f
Author: Edwin Navarro <enavarro@comcast.net>
Date:   Sat Feb 26 13:56:55 2022 -0800

    Fix display of sidetext gates with conditions (#7673)

    * First testing

    * Fix display of sidetext gates with conditions in text

    * Add comment

    * Start it up again

    * Add mpl and latex tests

    * Add cu1 and rzz tests

    * Start it up again

    * Break out RZZ and CU1

    * Restart

commit 5b53a15d047b51079b8d8269967514fd34ab8d81
Author: Ikko Hamamura <ikkoham@users.noreply.github.com>
Date:   Sat Feb 26 08:05:57 2022 +0900

    Fix endianness in result.mitigator (#7689)

    * fix endian

    * add a release note

    * Reword release note

    * Remove debugging print

    Co-authored-by: Jake Lishman <jake.lishman@ibm.com>
    Co-authored-by: Jake Lishman <jake@binhbar.com>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 6d3a4f8f9ebbc55f2f6f6ecf01b49ffed800db53
Author: Lolcroc <Lolcroc@users.noreply.github.com>
Date:   Fri Feb 25 17:20:45 2022 +0100

    Add `__slots__` for `Bit` subclasses (#7708)

    * Add __slots__ for Bit subclasses

    * Add release note

    Co-authored-by: Jake Lishman <jake.lishman@ibm.com>

commit 15a109e05f6ecb4388512b428b81adb709847244
Author: Daniel J. Egger <38065505+eggerdj@users.noreply.github.com>
Date:   Fri Feb 25 01:38:43 2022 +0100

    Parameters in InstructionDurations. (#7321)

    * * First draft of the instruction duration modification.

    * * Adding suggestion by Itoko

    * * Fix bug where duration and parameters were switched

    * * Remove None from tests.

    * * black.

    * * Added check on None duration.

    * * Added test.

    * * Reno

    * * Test fix.

    * * Moved test and updated reno.

    * * Docstring.

    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 2600586c9caa019634b6219c6e58bc4b640c680d
Author: Alexander Ivrii <alexi@il.ibm.com>
Date:   Thu Feb 24 23:19:06 2022 +0200

    Define LinearFunction class and collect blocks of gates that make a LinearFunction (#7361)

    * Implementing LinearFunction gate and a transpiler pass that collects a sequence of linear gates into a LinearFunction

    * removing test file

    * running black and pylint

    * Reimplementing linear function to inherit from Gate; adding tests for linear functions

    * adding tests for CollectLinearFunctions transpiler optimization pass

    * Improving tests for CollectLinearFunctions pass

    * style

    * style

    * Adding LinearFunction to exclude in test_gate_definitions

    * Normalizing internal representation to numpy array format; adding example of a linear matrix

    * using find_bit command

    * removing stray pylint comment

    * fixing lint error

    * adding a comment

    * Update qiskit/circuit/library/generalized_gates/linear_function.py

    Co-authored-by: Kevin Krsulich <kevin@krsulich.net>

    * adding a transpiler pass that synthesizes linear functions, and updating tests

    * renaming; changing default behavior to copy

    * Adding comments regarding synthesis and big-endian

    * adding transpiler passes to synthesize linear functions and to promote them to permutations whenever possible

    * adding release notes

    * code improvements following the review

    * adding explicit reference to pdf

    * removing redundant is_permutation check

    * First pass over comments in the review

    * minor tweak to release notes

    * trying to get links in release notes to work

    * trying to get links in release notes to work

    * pass over documentation

    * improving tests

    * treating other review comments

    * removing accidentally added param

    * Use specific testing assertions

    * Specific assertion stragglers

    * changing the assert

    Co-authored-by: Kevin Krsulich <kevin@krsulich.net>
    Co-authored-by: Jake Lishman <jake@binhbar.com>

commit 7cab49fb7f223798d31ff295f1b9d0b2f7e15fed
Author: Matthew Treinish <mtreinish@kortar.org>
Date:   Thu Feb 24 07:09:27 2022 -0500

    Use VF2Layout in all preset passmanagers (#7213)

    * Use VF2Layout in all preset passmanagers

    With the introduction of the VF2Layout pass we now have a very fast
    method of searching for a perfect layout. Previously we only had the
    CSPLayout method for doing this which could be quite slow and we only
    used in level 2 and level 3. Since VF2Layout operates quickly adding the
    pass to each preset pass manager makes sense so we always use a perfect
    layout if available (or unless a user explicitly specifies an initial
    layout or layout method). This commit makes this change and adds
    VF2Layout to each optimization level and uses a perfect layout if found
    by default.

    Fixes #7156

    * Revert changes to level 0

    For optimization level 0 we don't actually want to use VF2Layout because
    while it can find a perfect layout it would be potentially surprising to
    users that the level which is supposed to have no optimizations picks a
    non-trivial layout.

    * Set seed on perfect layout test

    * Fix test failures and unexpected change in behavior

    This commit makes several changes to fix both unexpected changes in
    behavior and also update the default behavior of the vf2 layout pass.
    The first issue is that the vf2 pass was raising an exception if it was
    run with >2 qubit gates. This caused issues if we run with calibrations
    (or backends that support >2q gates) as vf2 layout is used as an
    opportunistic thing and if there is for example a 5q gate being used
    we shouldn't fail the entire transpile just because vf2 can't deal with
    it. It's only an issue if the later passes can't either, it just means
    vf2 won't be able to find a perfect layout. The second change is around
    the default seeding. For the preset pass managers to have a consistent
    output this removes the randomization if no seed is specified and just
    uses an in-order comparison. This is necessary to have a consistent
    layout for testing and reproducibility; while we can set a seed
    everywhere, the previous behavior was more stable as it would default to
    trivial layout most of the time (assuming that was perfect). When we add
    multiple vf2 trials and are picking the best choice among those for a
    given time budget we can add back in the default seed randomization. The
    last change made here is that several tests implicitly expected a
    trivial layout (mainly around device aware transpilation or
    calibrations). In those cases the transpile wasn't valid for an
    arbitrary layout, for example if a calibrated gate is only defined on a
    single qubit. Using vf2 layout in those cases doesn't work because the
    gate is only defined on a single qubit so picking a non-trivial layout
    correctly errors. To fix these cases the tests are updated to explicitly
    state they require a trivial layout instead of assuming the transpiler
    will implicitly give them that if it's a perfect layout.

    * Add missing release note

    * Apply suggestions from code review in release note

    Co-authored-by: Jake Lishman <jake@binhbar.com>

    * Make vf2 layout stop reason an Enum

    * Update releasenotes/notes/vf2layout-preset-passmanager-db46513a24e79aa9.yaml

    Co-authored-by: Kevin Krsulich <kevin@krsulich.net>

    * Add back initial layout to level1

    There seems to be a pretty baked-in assumption for level 1 that it
    will use the trivial layout by default if it's a perfect mapping. This
    was causing the majority of the test failures and might be an unexpected
    breakage for people. However, in a future release we should remove this
    (likely when vf2layout is made noise aware). To anticipate this a
    FutureWarning is emitted when a trivial layout is used to indicate that
    this behavior will change in the future for level 1 and if you're
    relying on it you should explicitly set the layout_method='trivial'.

    * Tweak seeds to reduce effect of noise on fake_yorktown with aer

    * Update release note

    * Use id_order=True on vf2_mapping() for VF2Layout pass

    Using id_order=False orders the nodes by degree which biases the
    mapping found by vf2 towards nodes with higher connectivity which
    typically have higher error rates. Until the pass is made noise aware
    to counter this bias we should just use id_order=True which uses the
    node id for the order (which roughly matches insertion order) which
    won't have this bias in the results.

    * Revert "Use id_order=True on vf2_mapping() for VF2Layout pass"

    VF2 without the VF2++ heuristic is too slow for some common use cases,
    so we probably don't want to use it by default. Instead we should use
    some techniques to improve the quality of the results. The first
    approach will be applying a score heuristic to a found mapping. A
    potential follow on after that could be to do some pre-filtering of
    noisy nodes.

    This reverts commit 53d54c3a3648288e40d3ae21e76206f88cb7b981.

    * Set real limits on calling vf2 and add quality heuristic

    This commit adds options to set limits on the vf2 pass, both the
    internal call limit for the vf2 execution in retworkx, a total time
    spent in the pass trying multiple layouts, and the number of trials to
    attempt. These are then set in the preset pass manager to ensure we
    don't sit spinning on vf2 forever in the real world. While the pass is
    generally fast there are edge cases where it can get stuck. At the same
    time this adds a rough quality heuristic (based on readout error falling
    back to connectivity) to select between multiple mappings found by
    retworkx. This addresses the poor quality results we were getting with
    vf2++ in earlier revisions as we can find the best from multiple
    mappings.
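
A minimal sketch of the selection step described above, written in plain Python rather than against retworkx. The function names, the dict-based inputs, and the choice that a lower score is better for both readout error and degree are assumptions made for illustration; the time budget and call limit are left out.

```python
def score_mapping(mapping, readout_error=None, degree=None):
    """Score a virtual -> physical qubit mapping; lower is assumed better.

    Uses the summed readout error when it is available and falls back to
    the summed connectivity (degree) of the chosen physical qubits.
    """
    physical_qubits = list(mapping.values())
    if readout_error is not None:
        return sum(readout_error[q] for q in physical_qubits)
    return sum(degree[q] for q in physical_qubits)


def pick_best_mapping(mappings, readout_error=None, degree=None, max_trials=None):
    """Return the best-scoring mapping among at most max_trials candidates."""
    best, best_score = None, None
    for trial, mapping in enumerate(mappings):
        if max_trials is not None and trial >= max_trials:
            break  # stop once the trial budget is used up
        score = score_mapping(mapping, readout_error, degree)
        if best_score is None or score < best_score:
            best, best_score = mapping, score
    return best
```

With `max_trials=1` this degenerates to taking the first mapping found, which is the single-sample behaviour the earlier revisions had.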

    * Remove initial layout default from level 1

    Now that vf2 layout has limited noise awareness and multiple trials it
    no longer will be defaulting to the worst qubits like it was with only a
    single sample when vf2++ is used. This commit removes the implicit
    trivial layout attempt as it's no longer needed.

    * Remove unused imports

    * Use vigo instead of yorktown for oracle tests

    * Revert "Remove initial layout default from level 1"

    This is breaking the pulse tutorials, as fixing that is a more
    involved change and probably indicates we should continue to
    assume trivial layout by default if perfect for level 1 and raise a
    warning to users that it's going away to give everyone who needs a
    default trivial layout time to adjust their code. This commit reverts
    only using vf2 for level 1 and adds back in a trivial layout stage.

    This reverts commit df478357b9c59fe88f8d464c361de7a2e0c03976.
    This reverts commit 65ae6ee0da156ecb1659e0abfb0b19df3bbbd367.

    * Fix warning and update docs to not emit one

    This commit fixes the warning so it's only emitted if the trivial layout
    is used, previously it would also be emitted if an initial layout was
    set. The docs are updated to not emit the warning, in some cases
    code examples are updated to explicitly use a trivial layout if that's
    what's needed. In others there were needless jupyter-execute directives
    being used when there was no visualization and a code-block is just as
    effective (which avoids the execution during doc builds).

    * Add debug logging and fix heuristic usage

    * Add better test coverage of new pass features

    * Only run a single trial if the graphs are the same size

    If the interaction graph and the coupling graph are the same size
    currently the score heuristic will produce the same results since it
    just looks at the sum of qubit noise (or degree). We don't need to run
    multiple trials or bother scoring things since we'll just pick the first
    mapping anyway.

    * Fix rebase error

    * Use enum type for stop reason condition

    * Add comment about call_limit value

    * Undo unnecessary seed change

    * Add back default seed randomization

    After all the improvements to the VF2Layout pass in #7276 this was no
    longer needed. It was added back prior to #7276 where the vf2 layout
    pass was not behaving well for simple cases.

    * Permute operator bits based on layout

    The backendv2 transpilation tests were failing with vf2 layout enabled
    by default because we were no longer guaranteed to get an initial layout
    by default. The tests were checking for an equivalent operator between
    the output circuit and the input one. However, the use of vf2layout was
    potentially changing the bit order (especially at higher optimization
    levels) in the operator because a non trivial layout was selected. This
    caused the tests to fail. This commit fixes the test failures by adding
    a helper function to permute the qubits back based on the layout
    property in the transpiled circuit.

    * Fix docs build

    * Remove warning on use of TrivialLayout in opt level 1

    The warning which was being emitted by an optimization level 1 transpile()
    if a trivial layout was used was decided to be too potentially noisy for
    users, especially because it wasn't directly actionable. For the first
    step of using vf2 layout everywhere we decided to leave level1 as trying
    a trivial layout first and then falling back to vf2 layout if the
    trivial layout isn't a perfect match. We'll investigate whether it makes
    sense in the future to change this behavior and come up with a migration
    plan when that happens.

    * Apply suggestions from code review

    Co-authored-by: Jake Lishman <jake@binhbar.com>

    * Cleanup inline comment numbering in preset pass manager modules

    * Fix lint

    * Remove operator_permuted_layout() from backendv2 tests

    This commit removes the custom operator_permuted_layout() function from
    the backendv2 tests. This function was written to permute the qubits
    based on the output layout from transpile() so it can be compared to the
    input circuit for equivalence. However, since this PR was first opened a
    new constructor method Operator.from_circuit() was added in #7616 to
    handle this directly in the Operator construction instead of doing it
    out of band. This commit just leverages the new constructor instead of
    having a duplicate local test function.

    * Remove unused import

    Co-authored-by: Jake Lishman <jake@binhbar.com>
    Co-authored-by: Kevin Krsulich <kevin@krsulich.net>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 9ec0c4eed3cf2f72e9f1f3c24233bfbe62df3511
Author: Omar Costa Hamido <omarcostinha@gmail.com>
Date:   Wed Feb 23 22:07:55 2022 +0000

    Use pulse configuration in fake Bogota, Rome, Manila and Santiago  (#7688)

    * Update fake_bogota.py

    - refer to the defs file and turn it into a FakePulseBackend

    * Update fake_manila.py

    - refer to the defs file and turn it into a FakePulseBackend

    * Update fake_rome.py

    - refer to the defs file and turn it into a FakePulseBackend

    * Update fake_santiago.py

    - refer to the defs file and turn it into a FakePulseBackend

    * Update fake_bogota.py

    - make sure we are using FakePulseLegacyBackend where it is needed.

    * Create bogota-manila-rome-santiago-as-fakepulsebackends-2907dec149997a27.yaml

    - add release notes

    * Update releasenotes/notes/bogota-manila-rome-santiago-as-fakepulsebackends-2907dec149997a27.yaml

    no need for prelude 🎹🎵 😕

    Co-authored-by: Matthew Treinish <mtreinish@kortar.org>

    * Update releasenotes/notes/bogota-manila-rome-santiago-as-fakepulsebackends-2907dec149997a27.yaml

    🧐 using proper notation.

    Co-authored-by: Matthew Treinish <mtreinish@kortar.org>

    * Fix typo

    Co-authored-by: Matthew Treinish <mtreinish@kortar.org>
    Co-authored-by: Jake Lishman <jake@binhbar.com>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 3591fa635db75e3d7f1afea19359acb2fb3a6740
Author: Jake Lishman <jake.lishman@ibm.com>
Date:   Wed Feb 23 19:21:22 2022 +0000

    Rework `QuantumCircuit._append` and bit resolver (#7618)

    The previous resolver of indices for bits involved catching several
    exceptions even when resolving a valid specifier.  This is comparatively
    slow for inner-loop code.  The implementation also assumed that if a
    type could be cast to an integer, the only way it could be a valid
    specifier was as an index.  This broke for size-1 Numpy arrays, which
    can be cast to `int`, but should be treated as iterables.

    Since `QuantumCircuit.append` necessarily checks the types of all its
    arguments, it is unnecessary for `QuantumCircuit._append` to do so as
    well.  This also allows anywhere that is constructing a `QuantumCircuit`
    from known-safe data (such as copying from an existing circuit, or
    building templates) to do so without the checks.  This is now
    documented as its contract.
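
To illustrate the idea of resolving a specifier by type instead of by catching exceptions, here is a small sketch. It is not the implementation from this commit: `resolve_bits` and `SizeOneArray` are hypothetical names, and `SizeOneArray` is only a stand-in for an object (such as a size-1 NumPy array) that can be cast to `int` but should still be treated as an iterable.

```python
class SizeOneArray:
    """Stand-in for a size-1 array: castable to int, but also iterable."""

    def __init__(self, value):
        self._value = value

    def __int__(self):
        return self._value

    def __iter__(self):
        return iter([self._value])


def resolve_bits(specifier, bits):
    """Resolve an index, slice, or iterable of indices into a list of bits.

    Valid specifiers are dispatched on their type, so no exception is
    raised and caught on the successful path.
    """
    if isinstance(specifier, int):
        return [bits[specifier]]
    if isinstance(specifier, slice):
        return bits[specifier]
    if hasattr(specifier, "__iter__") and not isinstance(specifier, (str, bytes)):
        # Iterables are expanded element by element, even if the object
        # could also be cast to a single integer.
        return [bits[index] for index in specifier]
    raise TypeError(f"Invalid bit specifier: {specifier!r}")
```

Because the `int` check uses `isinstance` rather than attempting `int(specifier)`, an object that merely supports integer conversion falls through to the iterable branch.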

    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 2a67dc1342517b88367f88475bff6a524f46d295
Author: Naoki Kanazawa <nkanazawa1989@gmail.com>
Date:   Thu Feb 24 02:37:53 2022 +0900

    Move QPY serializer to own module (#7582)

    * Move qpy to own module.

    qpy_serialization.py is split into several files for maintainability. This commit also adds several bytes Enum classes for type keys in the header, and provides several helper functions. Some namedtuple class names are updated because, for example, INSTRUCTION will be vague when we add schedule, i.e. it's basically a different program and has its own instructions with a different data format. Basically, a CIRCUIT_ prefix is added to them.

    * manually cherry-pick #7584 with some cleanup

    - change qiskit.qpy.objects -> qiskit.qpy.binary_io
    - TUPLE -> SEQUENCE (we may use this for list in future)
    - add QpyError
    - add _write_register in circuit io to remove boilerplate code

    * respond to review comments
    - expose several private methods for backward compatibility
    - use options for symengine
    - rename alphanumeric -> value
    - rename write, read methods and remove alias
    - improve container read

    * remove import warning

    * replace alphanumeric with value in comments and messages.

    * private functions import

    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit a6cb1a37f52fd98f90d0685d1fdcba447a5345b7
Author: Naoki Kanazawa <nkanazawa1989@gmail.com>
Date:   Tue Feb 22 15:50:33 2022 +0900

    Fix ASAP/ALAP scheduling pass (#7655)

    * fix scheduling pass

    This commit updates both ASAP and ALAP passes not to allow measurement instructions to simultaneously write in the same register.
    In addition, delay appended after end of circuit is removed since this instruction has no effect.

    * Update behavior of passes

    Added `BaseScheduler` as a parent class of ASAP and ALAP passes. These schedulers can take two control parameters, `clbit_write_latency` and `conditional_latency`. These represent the I/O latency of clbits.
    In addition, delays at the very end of the scheduled circuit are re-added because Dynamical Decoupling passes insert echo sequences there. More unit tests and a reno are also added.

    * fix ASAP conditional bug

    The conditional bit start time was only looking at cregs. But this should start right before the gate.

    Co-authored-by: Toshinari Itoko <itoko@jp.ibm.com>

    * respond to review comments
    - fix typo
    - update model drawing in comment
    - more comment

    * update documentation and add todo comment

    * update logic to insert conditional bit assert

    * add more docs on topological ordering

    * Update qiskit/transpiler/passes/scheduling/base_scheduler.py

    Co-authored-by: Toshinari Itoko <15028342+itoko@users.noreply.github.com>

    * lint fix

    Co-authored-by: Toshinari Itoko <itoko@jp.ibm.com>
    Co-authored-by: Toshinari Itoko <15028342+itoko@users.noreply.github.com>
    Co-authored-by: Ikko Hamamura <ikkoham@users.noreply.github.com>

commit 6eb12965fd2ab3e9a6816353467c8e1c4ad2a477
Author: Matthew Treinish <mtreinish@kortar.org>
Date:   Mon Feb 21 12:47:46 2022 -0500

    Bump minimum supported symengine version for built-in pickle support (#7682)

    * Bump minimum supported symengine version for built-in pickle support

    The new symengine 0.9 release added native support in the package for
    pickling symengine objects. Previously we had been converting symengine
    objects to sympy objects so we could pickle them. With native support
    for pickle in symengine now we no longer need this which besides
    removing an unnecessary conversion should hopefully make pickling (which we do
    internally as part of using multiprocessing) more reliable.

    This also seems to fix the hanging we were seeing with multiprocessing
    with Python 3.9 on Linux. While investigating that issue it points to
    the underlying cause being a bug in CPython with the `fork()` based
    start method, but we were only able to reliably trigger it after
    switching to symengine in #6270 and having to rely on importing
    symengine to pickle the symengine objects. Since we're no longer doing
    that after bumping the minimum symengine version this removes the
    default disabling of parallel dispatch with Python 3.9. While I'm not
    100% confident this fixes the bug, in my testing locally I haven't been
    able to reproduce the hang we were encountering (but this is anecdotal
    at best). If we do encounter issues with multiprocess hanging in the
    future we can look at rewriting the internals of `parallel_map()` or
    switching it back to disabled by default.

    Fixes #6188

    * Fix typos in release notes

    Co-authored-by: Jake Lishman <jake@binhbar.com>

    Co-authored-by: Jake Lishman <jake@binhbar.com>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit a185ee6e3dd8d1f87c773e4e894a458ed1930ad2
Author: Matthew Treinish <mtreinish@kortar.org>
Date:   Mon Feb 21 11:33:56 2022 -0500

    Add fake backends for new IBM Quantum systems (#7392)

    * Add fake backends for new IBM Quantum systems (#6808)

    This commit adds new fake backend classes for new IBM Quantum systems:
    Cairo, Hanoi, Kolkata, Nairobi, and Washington. Just as with the other
    fake backends these new classes contain snapshots of calibration and error
    data taken from the real system, and can be used for local testing,
    compilation and simulation.

    Legacy backends are not added for these new fake backends as the
    legacy backend interface is deprecated and will be removed in a future
    release so there is no need to expose that for the new backends (it was
    only added for compatibility testing on the old fake backends).

    * Update qiskit/test/mock/backends/washington/fake_washington.py

    Co-authored-by: Ali Javadi-Abhari <ajavadia@users.noreply.github.com>

    * Update releasenotes/notes/new-fake-backends-04ea9cb26374e385.yaml

    Co-authored-by: Luciano Bello <bel@zurich.ibm.com>

    * Fix lint

    Co-authored-by: Ali Javadi-Abhari <ajavadia@users.noreply.github.com>
    Co-authored-by: Luciano Bello <bel@zurich.ibm.com>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
mergify bot pushed a commit that referenced this pull request Mar 10, 2022
…entation (#7702)

* Replace pauli expectation value cython with rust implementation

This commit replaces the cython implementation of the pauli expectation
value functions with a multithreaded rust implementation. This was done
primarily for two reasons, the first and primary reason for this change
is because after #7658 this module was the only cython code left in the
qiskit-terra repository so unifying on a single compiled language will
reduce the maintenance burden in qiskit-terra. The second reason is
similar to the rationale in #7658 around why to use rust over cython for
a multi-threaded hybrid python module. The difference here though is
unlike in stochastic swap this module isn't as performance critical as
it's not nearly as widely used.

* Tune single threaded performance for rust sum

This commit tunes the sum for the single threaded path. Using the
iterator sum() method is very convenient but for the single threaded
path it doesn't create the most efficient output. This was causing a
regression in performance over the previous cython version. To address
that issue, this commit adds a new tuned function which does a chunked
sum which the compiler can handle better. It more closely models how
we'd do this with vectorized SIMD instructions. As a future step we can
look at using simdeez https://github.com/jackmott/simdeez
to further optimize this by doing runtime CPU feature detection and
leveraging SIMD intrinsics (we might want to look at using `fast_sum()`
in the multithreaded path if we do that too).
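
The shape of a chunked sum can be sketched as follows. This is a structural illustration in Python, not the Rust `fast_sum()` from this commit: the value of `LANES` is an assumption, and in Python the chunking brings no speed-up; it only shows how independent partial accumulators mirror what a vectorizing compiler can do.

```python
LANES = 8  # assumed chunk width, chosen only for illustration


def fast_sum(values):
    """Sum a sequence using LANES independent partial accumulators."""
    partial = [0.0] * LANES
    # Number of elements covered by complete chunks of LANES values.
    full = len(values) - len(values) % LANES
    for start in range(0, full, LANES):
        for lane in range(LANES):
            partial[lane] += values[start + lane]
    total = sum(partial)
    # Add the leftover elements that did not fill a complete chunk.
    for value in values[full:]:
        total += value
    return total
```

Each lane only ever adds to its own accumulator, so there is no dependency between lanes inside a chunk; the partial sums are combined once at the end.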

* Add release notes

* Fix lint

* Add docstring and signature to rust functions

* Define parallel threshold as a constant

* Add attribution comment to fast_sum()

* Rename eval_parallel_env -> getenv_use_multiple_threads

* Use inline literal type for size

Co-authored-by: Kevin Hartman <kevin@hart.mn>

* Add overflow check on num_qubits

The functions only work when the number of qubits is less than the number
of bits in a usize; anything larger would cause an overflow. While rust provides overflow
checking in debug mode it disables this for performance in release mode.
Since we ship binaries in release mode this commit adds an overflow check
for the num_qubits argument to ensure that we don't overflow and produce
incorrect results.
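
As a rough illustration of such a guard, the check can be written as below. The function name is hypothetical, and `USIZE_BITS = 64` is an assumption about the target platform; Python integers do not overflow, so the explicit comparison stands in for what would otherwise wrap silently in a release build.

```python
USIZE_BITS = 64  # assumption: a 64-bit target, where usize has 64 bits


def checked_dimension(num_qubits):
    """Return 2**num_qubits, rejecting values that would not fit in a usize."""
    if num_qubits >= USIZE_BITS:
        raise OverflowError(
            f"num_qubits must be less than {USIZE_BITS}, got {num_qubits}"
        )
    return 1 << num_qubits
```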

* Remove unnecessary setup_requires field from setup.py

The setup_requires field in the setup.py is deprecated and
has been superseded by the pyproject.toml to define build
system dependencies. Since we're already relying on the
pyproject.toml to install setuptools-rust for us having the
setup_requires line will do nothing but potentially cause
issues as it will use an older install mechanism that will potentially
conflict with people's environments.

* Drop `.iter().take(LANES)`.

* Fix typo.

Co-authored-by: Kevin Hartman <kevin@hart.mn>
mtreinish added a commit to Qiskit/qiskit-metapackage that referenced this pull request Mar 23, 2022
* Update building from source instructions

With Qiskit/qiskit#7658 and Qiskit/qiskit#7702 not far
behind the requirements for building terra from source will be changed.
A C++ compiler is no longer required and instead a rust compiler is
needed. This commit updates the instructions on building from source and
also removes some old out of date notes from the document at the same
time.

* Apply suggestions from code review

* Consistently capitalise "Rust"

* Add section on modifying rust extension

* Fix typos

* Empty-Commit to retrigger ci after outage

Co-authored-by: Jake Lishman <jake@binhbar.com>
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Oct 19, 2022
In Qiskit#7658 we disabled multiprocessing as part of unittest runs in CI
because the multiple layers of parallelism (subprocess from stestr,
multiprocessing from qiskit, and multithreading from rust in qiskit)
were triggering a latent bug in python's multiprocessing implementation
that was blocking CI. To counter the lost coverage from disabling
multiprocessing in that PR we added a script verify_parallel_map which
force enabled multiprocessing and validated transpile() run in parallel
executed correctly. However, in Qiskit#8952 we introduced a model for
selectively enabling multiprocessing just for a single test method. This
should allow us to avoid the stochastic failure triggering the deadlock
in python's multiprocessing by overloading parallelism but still test in
isolation that parallel_map() works.

This commit builds on the test class introduced in Qiskit#8952 and adds
identical test cases to what was previously in verify_parallel_map.py to
move that coverage into the unit test suite. Then the
verify_parallel_map script is removed and all callers are updated to
just run unit tests instead of also executing that script.
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Oct 20, 2022
In Qiskit#7658 we disabled multiprocessing as part of unittest runs in CI
because the multiple layers of parallelism (subprocess from stestr,
multiprocessing from qiskit, and multithreading from rust in qiskit)
were triggering a latent bug in python's multiprocessing implementation
that was blocking CI. To counter the lost coverage from disabling
multiprocessing in that PR we added a script verify_parallel_map which
force enabled multiprocessing and validated transpile() run in parallel
executed correctly. However, in Qiskit#8952 we introduced a model for
selectively enabling multiprocessing just for a single test method. This
should allow us to avoid the stochastic failure triggering the deadlock
in python's multiprocessing by overloading parallelism but still test in
isolation that parallel_map() works.

This commit builds on the test class introduced in Qiskit#8952 and adds
identical test cases to what was previously in verify_parallel_map.py to
move that coverage into the unit test suite. Then the
verify_parallel_map script is removed and all callers are updated to
just run unit tests instead of also executing that script.
mergify bot added a commit that referenced this pull request Nov 1, 2022
In #7658 we disabled multiprocessing as part of unittest runs in CI
because the multiple layers of parallelism (subprocess from stestr,
multiprocessing from qiskit, and multithreading from rust in qiskit)
were triggering a latent bug in python's multiprocessing implementation
that was blocking CI. To counter the lost coverage from disabling
multiprocessing in that PR we added a script verify_parallel_map which
force enabled multiprocessing and validated transpile() run in parallel
executed correctly. However, in #8952 we introduced a model for
selectively enabling multiprocessing just for a single test method. This
should allow us to avoid the stochastic failure triggering the deadlock
in python's multiprocessing by overloading parallelism but still test in
isolation that parallel_map() works.

This commit builds on the test class introduced in #8952 and adds
identical test cases to what was previously in verify_parallel_map.py to
move that coverage into the unit test suite. Then the
verify_parallel_map script is removed and all callers are updated to
just run unit tests instead of also executing that script.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
jakelishman pushed a commit to jakelishman/qiskit-terra that referenced this pull request Aug 11, 2023
…kage#1437)

This commit updates the asv configuration to support two recent changes
in the terra repo. The first is updating the supported python version
list to reflect the current versions supported by terra. Python 3.6 is
no longer supported and python 3.10 is now supported. Additionally,
after Qiskit#7658 merged, setuptools-rust and Rust are now
being used to build compiled extensions. While Cython is still being
used, its use will be removed soon with Qiskit#7702. This
commit updates the build configuration to build the rust extension and
then build a wheel from it instead of building the cython extension.
jakelishman added a commit to jakelishman/qiskit-terra that referenced this pull request Aug 11, 2023
)

* Update building from source instructions

With Qiskit#7658 merged and Qiskit#7702 not far behind, the
requirements for building terra from source will change. A C++ compiler
is no longer required; a Rust compiler is needed instead. This commit
updates the instructions on building from source and also removes some
old, out-of-date notes from the document at the same time.

* Apply suggestions from code review

* Consistently capitalise "Rust"

* Add section on modifying rust extension

* Fix typos

* Empty-Commit to retrigger ci after outage

Co-authored-by: Jake Lishman <jake@binhbar.com>
Labels: Changelog: API Change, Changelog: New Feature, performance
Linked issue: Execute trials in StochasticSwap in parallel
7 participants