TUSQ Simulation Streamlines Noisy Quantum Circuit Computing
The TUSQ simulator dramatically speeds up the modelling of noisy quantum circuits.
Quantum computing could transform fields such as healthcare and materials research. However, the high cost and long wait times associated with real quantum hardware remain a major impediment. Simulators that model quantum circuits on noisy hardware accurately and scalably are therefore needed.
Siddharth Dangwal, Tina Oberoi, and Ajay Sailopal of the University of Chicago and their colleagues developed TUSQ (Tracking, Uncomputation, and Sampling for Noisy Quantum Simulation) to address this problem. By eliminating redundant computations and reusing intermediate results, the approach substantially speeds up noisy quantum circuit simulation.
Traditional quantum circuit simulation (QCS) struggles with noise. State Vector Simulation (SVS) handles noiseless QCS by multiplying a quantum state vector by a sequence of deterministic unitary matrices, but noisy QCS involves stochastic processes that a single such pass cannot capture.
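As a minimal sketch of noiseless SVS, the snippet below prepares a two-qubit Bell state by repeated matrix-vector multiplication. It uses dense NumPy matrices for readability; the function name `simulate` and the gate ordering convention (first qubit is the most significant bit) are assumptions made for this illustration, not part of TUSQ.

```python
import numpy as np

# Standard single- and two-qubit gates as dense unitary matrices.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
I2 = np.eye(2, dtype=complex)
CNOT = np.array(
    [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]],
    dtype=complex,
)  # control = first qubit, target = second qubit


def simulate(gates, n_qubits):
    """Noiseless SVS: start in |0...0> and apply each unitary in order."""
    state = np.zeros(2 ** n_qubits, dtype=complex)
    state[0] = 1.0
    for unitary in gates:
        state = unitary @ state  # one matrix-vector multiplication per gate
    return state


# Hadamard on the first qubit, then CNOT, gives (|00> + |11>) / sqrt(2).
bell_state = simulate([np.kron(H, I2), CNOT], n_qubits=2)
probabilities = np.abs(bell_state) ** 2
print(probabilities)
```

Every gate here is a fixed matrix, so one pass over the circuit fully determines the output distribution. Noise breaks that property, which is what the rest of the article deals with.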
A basic SVS approach must repeat the matrix-vector multiplications for every sample to account for probabilistic noise effects. This causes an S-fold time overhead, where S is the number of samples. Density Matrix Simulation (DMS) instead represents quantum states as matrices and uses matrix-matrix multiplications to account for noise in a single circuit execution. However, a density matrix for n qubits has 4^n entries, compared with 2^n for a state vector, so the memory footprint of DMS is the square of that of SVS, which makes it impractical for many qubits.
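The trade-off between the two approaches can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes 16 bytes per complex amplitude (double precision) and counts one matrix-vector multiplication per gate; the function names are hypothetical and chosen only for this illustration.

```python
BYTES_PER_AMPLITUDE = 16  # assumption: complex128 storage


def state_vector_bytes(n_qubits):
    """A state vector holds 2^n complex amplitudes."""
    return 2 ** n_qubits * BYTES_PER_AMPLITUDE


def density_matrix_bytes(n_qubits):
    """A density matrix holds 2^n x 2^n = 4^n complex entries."""
    return 4 ** n_qubits * BYTES_PER_AMPLITUDE


def naive_svs_multiplications(n_gates, n_samples):
    """Naive noisy SVS reruns the whole circuit once per sample."""
    return n_gates * n_samples


for n in (10, 20, 30):
    sv_gib = state_vector_bytes(n) / 2 ** 30
    dm_gib = density_matrix_bytes(n) / 2 ** 30
    print(f"{n} qubits: state vector {sv_gib:.6g} GiB, density matrix {dm_gib:.6g} GiB")

print(naive_svs_multiplications(n_gates=100, n_samples=10_000), "multiplications")
```

Under these assumptions a 30-qubit state vector needs 16 GiB, while the corresponding density matrix is larger by a further factor of 2^30, which is why state-vector methods are preferred at this scale despite the repeated executions.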
Like CUDA-Q and the Qiskit Statevector Simulator, TUSQ simulates noise with many SVS runs, which avoids the much larger memory footprint of DMS. Unlike them, it also addresses the time overhead. TUSQ's efficiency comes from two components, the Error Characterisation Module (ECM) and the Tree-based Execution Module (TEM), which identify and eliminate redundant or insignificant computations to make noisy QCS faster.
ECM helps streamline circuit execution
The ECM analyses circuits containing stochastic noise channels to identify samples that yield the same output, reducing the number of circuits that must be simulated. This involves two important steps:
Error Realisation (ER) Tallying: A noisy quantum circuit can be viewed as a classical average over many circuits in which each stochastic channel has been sampled to give a fixed noisy gate. TUSQ calls each such circuit an "error realisation" (ER) and tracks how often each one occurs. If an ER occurs s times, TUSQ simulates the circuit once and samples its output state vector s times, which is significantly cheaper than s separate simulations. This works well because error rates on modern quantum computers are low, so most shots fall into a small number of ERs with low Hamming weight.
ER Commutation: Beyond counting, TUSQ looks for cases in which different ERs produce the same output state vector. It uses the commutation rules of Pauli gates and CNOT to "push" noisy gates as far right as possible without altering the noiseless circuit. If two ERs become identical after this transformation, their shot counts are combined, reducing the number of distinct circuits to simulate.
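The two ECM steps can be sketched on a toy example with a single CNOT and two noisy locations, one before and one after the gate. Everything in the snippet is a hypothetical illustration rather than TUSQ's actual data structures: Paulis are encoded as bit tuples (x_control, z_control, x_target, z_target), global phases are ignored, and the shot list is hard-coded instead of sampled from a noise model.

```python
from collections import Counter

IDENT = (0, 0, 0, 0)
X_CTRL = (1, 0, 0, 0)   # X on the control qubit
X_BOTH = (1, 0, 1, 0)   # X on control and target
Z_TGT = (0, 0, 0, 1)    # Z on the target qubit


def push_through_cnot(pauli):
    """Move a Pauli from before a CNOT to after it.

    Standard rules: X on the control spreads to the target,
    Z on the target spreads to the control.
    """
    x_c, z_c, x_t, z_t = pauli
    return (x_c, z_c ^ z_t, x_t ^ x_c, z_t)


def multiply(p, q):
    """Product of two Paulis, up to a global phase (bitwise XOR)."""
    return tuple(a ^ b for a, b in zip(p, q))


def canonical(er):
    """Push the pre-CNOT error to the right and merge it with the post-CNOT error."""
    before, after = er
    return multiply(push_through_cnot(before), after)


# Each shot is one error realisation: (error before CNOT, error after CNOT).
shots = [(IDENT, IDENT)] * 95 + [(X_CTRL, IDENT)] * 3 + [(IDENT, X_BOTH)] * 2

# Step 1, tallying: simulate each distinct ER once, sample its output count times.
tally = Counter(shots)

# Step 2, commutation: ERs with the same canonical form share one simulation.
merged = Counter()
for er, count in tally.items():
    merged[canonical(er)] += count

print(len(shots), "shots ->", len(tally), "distinct ERs ->", len(merged), "circuits")
```

Here 100 shots reduce to three distinct ERs by tallying alone, and to two circuits once the X error on the control is pushed through the CNOT and found to match the two-qubit X error after it.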
Tree-based Execution Module (TEM): Computation Reuse
The TEM executes the minimal set of circuits with distinct outputs produced by the ECM, exploiting computational reuse between them. The module includes:
Depth-first Tree Traversal (DFTT): TUSQ represents the circuits as a tree in which nodes are state vectors and edges are gates. Instead of computing each circuit from scratch, it uses a rollback-recovery scheme, an idea borrowed from classical computer design. After computing the final state vector of one circuit (a leaf node), it "uncomputes" that state by applying the inverses of the gates before moving to a new branch to compute the next circuit. This greatly reduces redundant matrix-vector multiplications. Uncomputation is possible because TUSQ assumes that all gates, including noisy ones, are unitary, an assumption that holds for a wide range of realistic noise models, including depolarising noise, measurement noise, and Pauli-twirling approximations of decoherence. Because it does not store intermediate states, TUSQ has a minimal memory footprint and can use the available memory to parallelise DFTT. DFTT also reduces the asymptotic number of operations relative to a naive implementation; the bound is expressed in terms of |E|, the number of edges in the tree, and b, the number of possible outcomes of a noisy channel.
Pruning: To boost speed further, TUSQ identifies "insignificant circuits", those that occur rarely after the ECM stage and therefore have little effect on the output distribution. These branches are removed to speed up processing. This introduces a small, controlled perturbation but is necessary for efficiency. To limit the error, TUSQ samples a subset of the insignificant circuits whose aggregate contribution is large and rescales their probabilities to preserve their share of the output distribution. Depending on user-defined hyperparameters, the average relative fidelity difference caused by pruning ranges from 2.1% to 8.7%.
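The rollback idea behind depth-first traversal with uncomputation can be illustrated on a one-qubit toy tree. The `Node` class, the `traverse` function, and the choice of a Hadamard, a Pauli noise slot, and a second Hadamard are assumptions made for this sketch; it shows only the compute and uncompute mechanism, not TUSQ's implementation or its performance characteristics.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


class Node:
    """One gate in the circuit tree; children are the possible continuations."""

    def __init__(self, label, gate, children=()):
        self.label = label
        self.gate = gate
        self.children = list(children)


def traverse(node, state, path, outputs):
    """Depth-first traversal that carries a single working state vector."""
    state = node.gate @ state                   # compute: step down the tree
    path = path + (node.label,)
    if not node.children:                       # leaf: one complete circuit
        outputs[path] = np.abs(state) ** 2
    for child in node.children:
        state = traverse(child, state, path, outputs)
    return node.gate.conj().T @ state           # uncompute: apply the inverse gate


# Shared prefix H, then a noise slot with outcomes I, X or Z, then a final H.
tree = Node("H", H, [
    Node("I", I2, [Node("H", H)]),
    Node("X", X, [Node("H", H)]),
    Node("Z", Z, [Node("H", H)]),
])

initial = np.array([1, 0], dtype=complex)
outputs = {}
final = traverse(tree, initial, (), outputs)

for path, probs in outputs.items():
    print(" -> ".join(path), np.round(probs, 3))
```

Because every gate is unitary, applying its conjugate transpose restores the parent state exactly, so no intermediate state vectors are cached and the traversal ends back at the initial state.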
Performance and Impact
These optimisations boost performance dramatically. TUSQ was evaluated on 186 benchmarks on an Nvidia A100 GPU, including QAOA, Adder, Bit Code, Phase Code, and GHZ circuits.
Performance highlights include:
An average speedup of 12.53x over CUDA-Q and 52.5x over Qiskit.
For larger benchmarks (more than 15 qubits), a speedup of 55.42x over Qiskit and 23.03x over CUDA-Q.
In some cases, TUSQ surpassed Qiskit by 7878.03x and CUDA-Q by 439.38x.
Comparable simulators need almost 10 hours to simulate a 30-qubit Adder circuit, whereas TUSQ took 819.87 seconds.
TUSQ outperformed the recently proposed TQSim simulator by 68.6x and 493.4x. Its advantage comes from the depth-first traversal and uncomputation approach, whereas TQSim relies on memoization and requires more memory.
TUSQ's speedup grows with the number of qubits, but deeper circuits or higher error rates may reduce it, since more branches become "significant" and pruning becomes less effective. The 1% error rate that TUSQ assumes is typical of existing systems, and improvements in future hardware should make the approach more viable still.
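A rough way to see why depth and error rate matter is to ask how much of the shot probability sits in circuits with only a few errors. The sketch below assumes each noisy location fails independently with the same probability p, so the number of errors follows a binomial distribution; this simplified model and the function name are illustrative assumptions, not TUSQ's noise model.

```python
from math import comb


def prob_at_most_k_errors(n_locations, p, k):
    """Probability that at most k of n independent noisy locations fail."""
    return sum(
        comb(n_locations, j) * p ** j * (1 - p) ** (n_locations - j)
        for j in range(k + 1)
    )


# (noisy locations, error rate): a baseline, a higher error rate, a deeper circuit.
for n_locations, p in [(100, 0.01), (100, 0.05), (1000, 0.01)]:
    mass = prob_at_most_k_errors(n_locations, p, k=2)
    print(f"{n_locations} locations, p = {p}: P(at most 2 errors) = {mass:.4f}")
```

Under this model, most of the probability is concentrated in low-weight error realisations when there are 100 locations at a 1% error rate, but that concentration largely disappears at a 5% error rate or with ten times as many locations, so far more branches carry non-negligible weight.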
TUSQ is a powerful tool for modelling noisy quantum circuits and a step forward for quantum computing research. By reusing computation and keeping overhead under control, it allows researchers to study larger and more complex quantum algorithms, speeding up the move towards scalable quantum computing. There are plans to release TUSQ as open source, which should encourage further innovation.