15 techniques tested on real hardware across three quantum processors. Only readout correction achieves chemical accuracy, and stacking more techniques doesn't mean better results.
Quantum computers make errors on every shot. For a simple H₂ VQE circuit, the ideal result is a clean 50/50 split between |01⟩ and |10⟩. Real hardware leaks probability into the wrong states — and that leak corrupts the energy.
The detector misreads 0 as 1 (or vice versa). Largest error source.
Imperfect control pulses. Each gate adds a small rotation error.
The qubit loses its quantum state over time (T1 decay, T2 dephasing).
A blurry photograph — Every measurement is like taking a photo with a shaky camera. Error mitigation is computational image stabilization — you can't eliminate the shake, but you can mathematically correct for it if you know how the camera shakes.
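To make the "leak" concrete, here is a minimal numpy sketch of how asymmetric readout error smears the ideal 50/50 split between |01⟩ and |10⟩ into |00⟩ and |11⟩. The error rates are illustrative placeholders, not our calibration data:

```python
import numpy as np

# Ideal H2 VQE output: 50/50 between |01> and |10> (states indexed 0..3).
ideal = np.array([0.0, 0.5, 0.5, 0.0])

# Hypothetical per-qubit readout error rates (asymmetric, as on real hardware):
p01 = 0.02   # P(read 1 | prepared 0)
p10 = 0.08   # P(read 0 | prepared 1): larger, since |1> can decay mid-readout
A1 = np.array([[1 - p01, p10],       # single-qubit assignment matrix
               [p01,     1 - p10]])  # columns: prepared state, rows: measured
A = np.kron(A1, A1)                  # two independent qubits (tensor product)

noisy = A @ ideal                    # distribution the detector actually reports
print(np.round(noisy, 4))            # -> [0.0784 0.4516 0.4516 0.0184]
```

Nearly 10% of the probability ends up in states the ideal circuit never produces, and the energy estimate inherits that bias.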
We ran 12 gate-folding experiments on Tuna-9, tripling and quintupling the CNOT count. The energy error barely changed, indicating that gate noise contributes less than 5% of the total error, while readout errors dominate at 80%.
This explains everything: techniques that fix readout (TREX, REM) achieve chemical accuracy. Techniques that fix gates (ZNE, DD) barely help. And combining both adds overhead without proportional benefit.
All 11 configurations we tested on H₂ VQE (R=0.735 Å), ranked from best to worst. The dashed green line marks chemical accuracy (1 kcal/mol).
The kitchen-sink fallacy — Adding more mitigation techniques is like adding more cooks — past a point, they get in each other's way. TREX alone (0.22 kcal/mol) beats TREX + DD + Twirling (10 kcal/mol) by 45x.
The four main techniques, each attacking a different aspect of quantum noise.
Discard measurement shots that violate a known symmetry (e.g., parity conservation).
Measure known states (|0⟩, |1⟩) to build a confusion matrix, then invert it to correct all subsequent measurements.
Randomly insert X gates before measurement across many shots, then classically undo the randomization. Averages out readout asymmetry automatically.
Run the circuit at multiple amplified noise levels (by inserting extra gates), then extrapolate back to the zero-noise limit.
The simplest technique. H₂'s ground state has one electron spin-up and one spin-down (odd parity). Any shot measuring both qubits the same way is noise — throw it away.
Barely loses data
Just above chemical accuracy
Not enough on its own
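The post-selection step is a one-liner over the measurement counts. A minimal sketch with made-up counts for illustration:

```python
# Hypothetical noisy measurement counts for the H2 circuit (bitstrings "q1q0").
counts = {"00": 310, "01": 4820, "10": 4660, "11": 210}

# Keep only odd-parity shots: the H2 ground state has exactly one excitation,
# so "00" and "11" can only come from noise.
kept = {b: n for b, n in counts.items() if b.count("1") % 2 == 1}
discarded = sum(counts.values()) - sum(kept.values())

total = sum(kept.values())
probs = {b: n / total for b, n in kept.items()}
print(probs, f"discarded {discarded} of {sum(counts.values())} shots")
```

With these numbers only ~5% of shots are thrown away, which is why the technique "barely loses data" — but it can only remove errors that break the symmetry, not ones that shuffle probability between the valid states.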
Measure known states to learn the detector's error pattern, then mathematically invert it. The key insight: readout errors are highly asymmetric — |1⟩→|0⟩ flips are 10x more common than |0⟩→|1⟩.
Calibrating a bathroom scale — If your scale always reads 2 kg too heavy, you can correct every future measurement by subtracting 2. Readout error mitigation does the same thing for quantum measurements — but the “bias” is different for |0⟩ and |1⟩.
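The "scale calibration" amounts to building a confusion matrix and solving a small linear system. A single-qubit sketch with illustrative error rates (for two qubits, the calibration matrix is the Kronecker product of the per-qubit matrices, or is measured directly on all four basis states):

```python
import numpy as np

# Hypothetical calibration: prepare |0> and |1> many times, record what is read.
# Columns = prepared state, rows = measured state (single qubit).
p01, p10 = 0.02, 0.08
A = np.array([[1 - p01, p10],
              [p01,     1 - p10]])

# Raw measured distribution for a qubit ideally in |1>.
measured = A @ np.array([0.0, 1.0])       # -> [0.08, 0.92]

# Invert the calibration to recover the prepared distribution.
corrected = np.linalg.solve(A, measured)  # -> approximately [0, 1]
print(measured, corrected)
```

In practice the inversion is applied to finite-shot counts, so the corrected vector can have small negative entries that are clipped or handled with a least-squares fit; the sketch above shows only the core idea.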
Zero-noise extrapolation (ZNE) is theoretically elegant: amplify gate noise by repeating gates, measure at multiple noise levels, extrapolate to zero. But on our circuits, it fails — because gates aren't the problem.
12.84 kcal/mol on IBM. 7.24 kcal/mol best extrapolation on Tuna-9. Not useful when readout dominates.
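The extrapolation itself is just a polynomial fit over noise scale factors. A minimal sketch with illustrative energies (the fit is correct; the problem is that when readout dominates, amplifying gate noise barely moves the measured energy, so ZNE extrapolates away the wrong error):

```python
import numpy as np

# Hypothetical energies measured at noise scale factors 1, 3, 5, obtained by
# gate folding (G -> G Gdag G triples each gate's noise contribution).
lams = np.array([1.0, 3.0, 5.0])
energies = np.array([-1.101, -1.062, -1.023])   # illustrative, roughly linear

# Linear Richardson-style extrapolation to the zero-noise limit (lambda = 0).
slope, intercept = np.polyfit(lams, energies, 1)
print(round(intercept, 4))   # estimated zero-noise energy
```

Richardson extrapolation assumes the observable depends smoothly on the amplified noise; readout error is not amplified by folding at all, so it survives the extrapolation untouched.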
TREX alone: 0.22. TREX + DD: 1.33. TREX + DD + Twirl: 10.0. Each addition degraded performance by 6-45x.
TREX 4K shots: 0.22 kcal/mol. TREX 16K shots: 3.77 kcal/mol. The noise is systematic, not statistical.
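The core TREX idea — randomly flip each measured qubit with an X, undo the flip classically, then rescale — can be sketched classically. This symmetrizes the asymmetric readout error into an effective rate (p01 + p10)/2, which shrinks ⟨Z⟩ by a known factor that can simply be divided out (error rates below are illustrative, not our calibration data):

```python
import numpy as np

rng = np.random.default_rng(1)
shots = 200_000
p01, p10 = 0.02, 0.08        # hypothetical asymmetric readout error rates

def measure(bit):
    """Noisy readout of a classical bit."""
    flip = p01 if bit == 0 else p10
    return bit ^ (rng.random() < flip)

# Qubit prepared in |1>: ideal <Z> = -1.
raw = np.array([measure(1) for _ in range(shots)])
z_raw = 1 - 2 * raw.mean()               # biased toward 0 by the 8% decay error

# Twirled readout: flip each shot with an X half the time, undo classically.
mask = rng.integers(0, 2, shots)
twirled = np.array([measure(1 ^ m) for m in mask]) ^ mask
z_twirled = 1 - 2 * twirled.mean()
z_corrected = z_twirled / (1 - p01 - p10)  # symmetric error -> simple rescale

print(round(z_raw, 3), round(z_corrected, 3))
```

The raw estimate lands near -0.84 while the twirled-and-rescaled one recovers roughly -1.0. Note this only removes the *systematic* readout bias, which is exactly why extra shots didn't help in the table above: the residual error was bias, not shot noise.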
Different processors have different noise profiles — and different optimal mitigation strategies. The best technique on one chip may not transfer.
Built-in TREX handles readout correction automatically. Adding more techniques (DD, twirling) only adds overhead.
No built-in TREX — manual confusion matrix calibration + parity post-selection reaches chemical accuracy.
Highest raw gate fidelity (99.82% RB). REM calibration data collected. Full VQE mitigation comparison pending.
Different hospitals, different treatments — A treatment that works at one hospital may not work at another with different equipment. Quantum error mitigation is the same — you need to diagnose each processor individually.
Our central finding: the ratio of Hamiltonian coefficients |g₁|/|g₄| predicts how badly readout errors corrupt the final energy. When this ratio exceeds ~5, even the best mitigation can't achieve chemical accuracy.
Below threshold. TREX achieves 0.22 kcal/mol — 119x improvement over raw. The Z-coefficient (g₁) is only 4.4x larger than the entangling coefficient (g₄), so readout errors in the Z measurement get moderately amplified.
Above threshold. Best result is 4.45 kcal/mol (IBM) / 4.44 (Tuna-9) — confirmed across platforms. 1.8x higher ratio → 20x worse error. The asymmetric electron distribution amplifies Z-basis readout errors.
This means that for larger molecules, the Hamiltonian structure itself determines whether current hardware can produce useful results — before you even consider the noise level. Tapering and basis rotation to minimize this ratio could be a path to chemical accuracy on harder problems.
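A toy first-order model makes the mechanism visible. Assume a symmetric readout bit-flip rate ε shrinks a single-qubit ⟨Z⟩ by (1 − 2ε) and a two-qubit correlator by (1 − 2ε)²; then the energy bias from the Z terms grows linearly with |g₁| while the entangling terms contribute a fixed amount, so a larger |g₁|/|g₄| ratio means more bias. All coefficients and the error rate below are illustrative, not our measured values:

```python
# Toy model: leading-order energy bias from a symmetric readout bit-flip
# error of rate eps. Coefficients are illustrative, not measured values.
eps = 0.03

def energy_error(g1, g4, z=1.0, xx=1.0):
    """Leading-order |energy bias| from readout error, per term family."""
    dz  = abs(g1) * z  * (1 - (1 - 2 * eps))        # single-Z terms
    dxx = abs(g4) * xx * (1 - (1 - 2 * eps) ** 2)   # entangling terms
    return dz + dxx

low  = energy_error(g1=0.44, g4=0.10)   # ratio ~4.4, like R = 0.735 A
high = energy_error(g1=0.80, g4=0.10)   # ratio 8, above the ~5 threshold
print(round(low, 4), round(high, 4))
```

The model ignores correlations, asymmetry, and mitigation, but it captures the qualitative point: the Hamiltonian's coefficient structure sets how hard readout errors hit the energy before any hardware detail enters.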
The 1 kcal/mol threshold — accuracy needed for quantum chemistry to be practically useful.
A calibration matrix measuring P(measured state | prepared state). Asymmetric errors are common: |1⟩→|0⟩ flips are much more frequent than |0⟩→|1⟩.
Twirled Readout EXtraction. IBM's technique that randomizes measurement basis across shots and classically corrects. resilience_level=1.
Discarding shots that violate known physical constraints (parity, particle number). Trades data for accuracy.
Sequences of identity-equivalent gate pairs during idle periods to refocus environmental noise. Effective for long idle times.
Randomly conjugating noisy gates with Pauli gates to convert coherent errors into stochastic (easier to handle) errors.
Zero-noise extrapolation. Amplify noise by gate folding (G → G G† G), measure at multiple levels, extrapolate to zero noise.
The |1⟩→|0⟩ error rate is typically much higher than |0⟩→|1⟩, because excited states can decay during readout.
Replacing a gate G with G G† G (three gates, same ideal effect, 3x gate noise) to amplify noise for ZNE.
Our finding: the ratio |g₁|/|g₄| in the Hamiltonian predicts how much readout errors get amplified into energy errors.
[1] Sagastizabal et al., "Error mitigation by symmetry verification on a variational quantum eigensolver," Phys. Rev. A 100, 010302(R) (2019)
[2] Kandala et al., "Hardware-efficient variational quantum eigensolver for small molecules," Nature 549, 242-246 (2017)
[3] Peruzzo et al., "A variational eigenvalue solver on a photonic quantum processor," Nat. Commun. 5, 4213 (2014)
[4] Kim et al., "Evidence for the utility of quantum computing before fault tolerance," Nature 618, 500-505 (2023)
[5] van den Berg et al., "Probabilistic error cancellation with sparse Pauli-Lindblad models," Nat. Phys. 19, 1116-1121 (2023)
[6] Our experimental data: 100+ runs across IBM Torino, QI Tuna-9, and IQM Garnet (2025-2026)
Current quantum processors are noisy — gate errors, measurement errors, and decoherence corrupt results. Error mitigation techniques reduce this noise without the overhead of full quantum error correction. They work by running extra circuits and post-processing the results to extract a better estimate of the ideal answer.
We tested 15 techniques on real hardware: readout error mitigation (REM), zero-noise extrapolation (ZNE), probabilistic error cancellation (PEC), Pauli twirling, symmetry verification, and more. Results vary dramatically by platform: IBM's built-in TREX achieves 119x improvement, while ZNE fails entirely on Tuna-9's native gate set.
Every result shown here comes from actual quantum experiments — no simulated noise models. The rankings reflect what works in practice, not in theory.