-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Rewrite default UnitarySynthesis to cache decomposers
#15492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
jakelishman
wants to merge
1
commit into
Qiskit:main
Choose a base branch
from
jakelishman:unitary-synthesis
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+1,935
−1,693
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Collaborator
|
One or more of the following people are relevant to this code:
|
Member
Author
|
Just as some vague-inkling timings, the script: from qiskit.circuit import library as lib
from qiskit.converters import circuit_to_dag
from qiskit.transpiler import CouplingMap, Target, passes
num_qubits = 50
qv = lib.quantum_volume(num_qubits, num_qubits, seed=2026_01_01)
qv_dag = circuit_to_dag(qv, copy_operations=False)
cm = CouplingMap(
list({tuple(qv.find_bit(q).index for q in inst.qubits): None for inst in qv})
)
cm.make_symmetric()
target = Target.from_configuration(basis_gates=["sx", "rz", "cx"], coupling_map=cm)
pass_ = passes.UnitarySynthesis(target=target)
%timeit pass_.run(qv_dag)on this ancient laptop went from 150ms in PGO'd Qiskit 2.2.3 to 95ms non-PGO'd on this branch, which is 1.6x ish. (The "approximately 2x" in the commit message is tbf half-remembered from flamegraphs I made about 4 months ago.) |
Pull Request Test Coverage Report for Build 20666534892Details
💛 - Coveralls |
This is a complete rewrite of the default `UnitarySynthesis` plugin to vastly improve the efficiency of 2q decomposition. This is good for approximately a 2x runtime improvement in the `UnitarySynthesis` pass, since before this commit, we were reconstructing each relevant 2q decomposer each time we had a new matrix to decompose. However, constructing a 2q KAK decomposer is about as expensive as using a constructed KAK decomposer on a single matrix. This commit caches the available decomposers for each encountered pair of qubits, and allows the cache to be persisted between calls to `run_unitary_synthesis`. Before the move of the default unitary synthesis plugin to Rust, we effectively had decomposer caching for the "loose constraint" (basis gates + coupling map) hardware description, since we just chose a single decomposer on initialisation and used it throughout. The `Target` form in Python space cached at the `qargs` level, which meant that multiple unitaries on the same qargs pair would use the same set of decomposers, but each qargs pair would be calculated separately on first access, still (generally) leading to multiple constructions of the same decomposer. Both of these types of caching were lost in the move to Rust, but the effects were largely masked by the total runtime still being drastically better than the Python-space versions. This new form reinstates all the previous caching, and additionally caches at the level of individual decomposer construction as well (by caching the arguments used to construct a decomposer), so that (mostly) homogeneous `Target`s will re-use the same decomposer whenever it is valid on more than one 2q link.
b675f05 to
f68172f
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Changelog: New Feature
Include in the "Added" section of the changelog
mod: transpiler
Issues and PRs related to Transpiler
performance
Rust
This PR or issue is related to Rust code in the repository
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
This is a complete rewrite of the default
UnitarySynthesisplugin to vastly improve the efficiency of 2q decomposition.This is good for approximately a 2x runtime improvement in the
UnitarySynthesispass, since before this commit, we were reconstructing each relevant 2q decomposer each time we had a new matrix to decompose. However, constructing a 2q KAK decomposer is about as expensive as using a constructed KAK decomposer on a single matrix. This commit caches the available decomposers for each encountered pair of qubits, and allows the cache to be persisted between calls torun_unitary_synthesis.Before the move of the default unitary synthesis plugin to Rust, we effectively had decomposer caching for the "loose constraint" (basis gates + coupling map) hardware description, since we just chose a single decomposer on initialisation and used it throughout. The
Targetform in Python space cached at theqargslevel, which meant that multiple unitaries on the same qargs pair would use the same set of decomposers, but each qargs pair would be calculated separately on first access, still (generally) leading to multiple constructions of the same decomposer. Both of these types of caching were lost in the move to Rust, but the effects were largely masked by the total runtime still being drastically better than the Python-space versions.This new form reinstates all the previous caching, and additionally caches at the level of individual decomposer construction as well (by caching the arguments used to construct a decomposer), so that (mostly) homogeneous
Targets will re-use the same decomposer whenever it is valid on more than one 2q link.Details and comments
Built on top of #15491- we fail that test with the errors left as-is because this new PR now (more correctly) passes on a giant error to the decomposer, which causes it to return separable gates and consequently fail the test. Previously, we were erroneously using a good error rate (from the reversed gate) even when outputting to a link with a crappy error. With the errors at more reasonable levels, we do the right thing. Note: this may indicate that we've accidentally made it too easy in the C API to default to the Python-space equivalent ofapproximation_degree=1.0instead of the better defaultapproximation_degree=None. We might want to revisit that / heavily document it in the C API.I am currently using an ancient Macbook from 2015 (8GB of RAM and some naff Intel i5!), so I'm disinclined to make realistic timings til I'm back at a proper computer haha.
For follow-ups: I also have ideas on how to multithread all of this, and there are several TODOs left in the PR commenting on potential performance or logic improvements. I also suspect that there's performance gains available within the
TwoQubitBasisDecomposeritself, since that was a very early and mechanical port to Rust.