@jakelishman (Member) commented Jan 1, 2026

Summary

This is a complete rewrite of the default `UnitarySynthesis` plugin to vastly improve the efficiency of 2q decomposition.

This is good for approximately a 2x runtime improvement in the `UnitarySynthesis` pass. Before this commit, we were reconstructing each relevant 2q decomposer every time we had a new matrix to decompose, but constructing a 2q KAK decomposer is about as expensive as using an already-constructed KAK decomposer on a single matrix. This commit caches the available decomposers for each encountered pair of qubits, and allows the cache to be persisted between calls to `run_unitary_synthesis`.
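A minimal Python sketch of per-pair caching, using a hypothetical `PairDecomposerCache` class and `synthesize_all` function (neither is Qiskit API, and the real implementation lives in Rust). It shows the two properties described above: a pair's decomposers are built once however many unitaries act on that pair, and because the caller owns the cache, a second run reuses the first run's work.

```python
from typing import Callable, Dict, List, Tuple

Qargs = Tuple[int, int]


class PairDecomposerCache:
    """Hypothetical per-qubit-pair cache of 2q decomposers.

    `build` stands in for the expensive step of constructing every decomposer
    that is valid on a given pair; it runs at most once per pair.
    """

    def __init__(self, build: Callable[[Qargs], List[object]]):
        self._build = build
        self._by_qargs: Dict[Qargs, List[object]] = {}
        self.constructions = 0  # number of times `build` actually ran

    def decomposers_for(self, qargs: Qargs) -> List[object]:
        if qargs not in self._by_qargs:
            self.constructions += 1
            self._by_qargs[qargs] = self._build(qargs)
        return self._by_qargs[qargs]


def synthesize_all(unitaries, cache: PairDecomposerCache):
    """Hypothetical stand-in for one synthesis run over (qargs, matrix) pairs.

    The cache is passed in rather than created here, so it outlives the call.
    """
    # A real pass would apply a decomposer to `matrix`; this only fetches them.
    return [(qargs, cache.decomposers_for(qargs)) for qargs, matrix in unitaries]


cache = PairDecomposerCache(build=lambda qargs: [f"decomposer on {qargs}"])
synthesize_all([((0, 1), "U1"), ((0, 1), "U2"), ((1, 2), "U3")], cache)
synthesize_all([((0, 1), "U4"), ((1, 2), "U5")], cache)
print(cache.constructions)  # 2: one build per distinct pair, across both runs
```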

Before the move of the default unitary synthesis plugin to Rust, we effectively had decomposer caching for the "loose constraint" (basis gates + coupling map) hardware description, since we just chose a single decomposer on initialisation and used it throughout. The `Target` form in Python space cached at the `qargs` level, which meant that multiple unitaries on the same qargs pair would use the same set of decomposers, but each qargs pair would be calculated separately on first access, still (generally) leading to multiple constructions of the same decomposer. Both of these types of caching were lost in the move to Rust, but the effects were largely masked by the total runtime still being drastically better than the Python-space versions.

This new form reinstates all the previous caching, and additionally caches at the level of individual decomposer construction as well (by caching the arguments used to construct a decomposer), so that (mostly) homogeneous `Target`s will re-use the same decomposer whenever it is valid on more than one 2q link.
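A minimal Python sketch of the construction-level cache, with hypothetical names throughout. Each qubit pair maps to a hashable `DecomposerKey`, and decomposers are shared by key. The key here is just the gate name plus a fidelity derived as `1 - error`; that choice is an assumption for illustration, not a description of the real constructor arguments. On a homogeneous line of `cx` links, four pairs cost a single construction.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Qargs = Tuple[int, int]


@dataclass(frozen=True)
class DecomposerKey:
    """Hypothetical hashable bundle of everything that determines a decomposer."""

    gate_name: str
    basis_fidelity: float


class SharedDecomposerCache:
    """Two-level cache: qubit pair -> construction key -> decomposer."""

    def __init__(self):
        self._key_by_qargs: Dict[Qargs, DecomposerKey] = {}
        self._by_key: Dict[DecomposerKey, object] = {}
        self.constructions = 0

    def _construct(self, key: DecomposerKey) -> object:
        # Stand-in for the expensive decomposer constructor.
        self.constructions += 1
        return ("decomposer", key.gate_name, key.basis_fidelity)

    def get(self, qargs: Qargs, gate_name: str, error: float) -> object:
        # Assumes a pair's gate and error don't change while the cache lives.
        key = self._key_by_qargs.get(qargs)
        if key is None:
            key = DecomposerKey(gate_name, 1.0 - error)
            self._key_by_qargs[qargs] = key
        if key not in self._by_key:
            self._by_key[key] = self._construct(key)
        return self._by_key[key]


cache = SharedDecomposerCache()
for link in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    cache.get(link, "cx", error=1e-3)
print(cache.constructions)  # 1: every link maps to the same key
cache.get((4, 5), "cx", error=5e-2)
print(cache.constructions)  # 2: a noisier link needs its own decomposer
```

Keying on the construction arguments rather than on the pair is what lets a mostly homogeneous device pay for each distinct decomposer once.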

Details and comments

Built on top of #15491. With that test's errors left as they were, this PR fails it: it now (more correctly) passes a giant error to the decomposer, which then returns separable gates and so fails the test. Previously, we were erroneously using a good error rate (from the reversed gate) even when outputting to a link with a crappy error. With the errors at more reasonable levels, we do the right thing. Note: this may indicate that we've accidentally made it too easy in the C API to default to the Python-space equivalent of `approximation_degree=1.0` instead of the better default `approximation_degree=None`. We might want to revisit that, or heavily document it in the C API.
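To make the error-direction point concrete, here is a small hypothetical sketch (the helper names are illustrative, not Qiskit API) contrasting an error lookup on exactly the directed link the output will be emitted on with one that also considers the reversed link and keeps the better of the two. The second variant is only one way the old mistake could arise; the PR does not say exactly how the reversed gate's error was picked up.

```python
from typing import Dict, Optional, Tuple

Qargs = Tuple[int, int]


def link_error(errors: Dict[Qargs, float], qargs: Qargs) -> Optional[float]:
    """Error of the 2q gate on exactly `qargs`, the link the output is emitted on."""
    return errors.get(qargs)


def best_of_both_directions(errors: Dict[Qargs, float], qargs: Qargs) -> Optional[float]:
    """Flawed variant: borrows the reversed link's error if it is lower."""
    candidates = [errors[q] for q in (qargs, qargs[::-1]) if q in errors]
    return min(candidates) if candidates else None


# (0, 1) is a good link, but (1, 0) is a poor one.
errors = {(0, 1): 1e-3, (1, 0): 0.5}
print(link_error(errors, (1, 0)))  # 0.5: the value the decomposer should see
print(best_of_both_directions(errors, (1, 0)))  # 0.001: over-optimistic for (1, 0)
```

The large, honest value is the one that should reach the decomposer; as noted above, a giant error is what makes it return separable gates.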

I am currently using an ancient MacBook from 2015 (8GB of RAM and some naff Intel i5!), so I'm disinclined to take realistic timings until I'm back at a proper computer haha.


For follow-ups: I also have ideas on how to multithread all of this, and there are several TODOs left in the PR commenting on potential performance or logic improvements. I also suspect that there's performance gains available within the TwoQubitBasisDecomposer itself, since that was a very early and mechanical port to Rust.

@jakelishman added this to the 2.4.0 milestone Jan 1, 2026
@jakelishman requested a review from a team as a code owner January 1, 2026 21:48
@jakelishman added the `Changelog: New Feature`, `Rust`, and `mod: transpiler` labels Jan 1, 2026
@github-project-automation (bot) moved this to Ready in Qiskit 2.4 Jan 1, 2026
@qiskit-bot (Collaborator)

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core
  • @levbishop

@jakelishman (Member, Author) commented Jan 1, 2026

Just as some vague-inkling timings, the script:

```python
from qiskit.circuit import library as lib
from qiskit.converters import circuit_to_dag
from qiskit.transpiler import CouplingMap, Target, passes

num_qubits = 50
qv = lib.quantum_volume(num_qubits, num_qubits, seed=2026_01_01)
qv_dag = circuit_to_dag(qv, copy_operations=False)
# Couple exactly the qubit pairs the QV circuit acts on (dict keys dedupe, keeping order).
cm = CouplingMap(
    list({tuple(qv.find_bit(q).index for q in inst.qubits): None for inst in qv})
)
cm.make_symmetric()
target = Target.from_configuration(basis_gates=["sx", "rz", "cx"], coupling_map=cm)
pass_ = passes.UnitarySynthesis(target=target)

# IPython/Jupyter magic: run this in an IPython session.
%timeit pass_.run(qv_dag)
```

on this ancient laptop went from 150ms in PGO'd Qiskit 2.2.3 to 95ms non-PGO'd on this branch, which is 1.6x ish. (The "approximately 2x" in the commit message is tbf half-remembered from flamegraphs I made about 4 months ago.)
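The "1.6x ish" figure is the ratio of the two timings (a PGO'd release build against a non-PGO'd branch build, as stated):

$$
\frac{150\,\text{ms}}{95\,\text{ms}} \approx 1.58
$$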

@coveralls commented Jan 1, 2026

Pull Request Test Coverage Report for Build 20666534892

Details

- 1040 of 1085 (95.85%) changed or added relevant lines in 11 files are covered.
- 52 unchanged lines in 6 files lost coverage.
- Overall coverage decreased (-0.002%) to 88.309%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| crates/circuit/src/operations.rs | 4 | 5 | 80.0% |
| crates/circuit/src/packed_instruction.rs | 9 | 11 | 81.82% |
| crates/transpiler/src/transpiler.rs | 40 | 43 | 93.02% |
| crates/circuit/src/dag_circuit.rs | 7 | 14 | 50.0% |
| crates/transpiler/src/passes/unitary_synthesis/mod.rs | 382 | 393 | 97.2% |
| crates/transpiler/src/passes/unitary_synthesis/decomposers.rs | 532 | 553 | 96.2% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| qiskit/synthesis/two_qubit/xx_decompose/decomposer.py | 2 | 89.23% |
| crates/transpiler/src/transpiler.rs | 4 | 94.01% |
| crates/circuit/src/parameter/symbol_expr.rs | 5 | 72.94% |
| crates/qasm2/src/lex.rs | 5 | 92.29% |
| crates/synthesis/src/two_qubit_decompose.rs | 10 | 90.46% |
| crates/transpiler/src/target/mod.rs | 26 | 82.62% |

| Totals | |
|---|---|
| Change from base Build 20663023597: | -0.002% |
| Covered Lines: | 96644 |
| Relevant Lines: | 109438 |

💛 - Coveralls


Labels

`Changelog: New Feature`, `mod: transpiler`, `performance`, `Rust`

Projects

Status: Ready
