Diagnosing the problem takes 30 seconds. Skipping the diagnosis wastes hours.
We Kept Using the Same Error Fix. Then It Stopped Working.
IBM's error mitigation went from 119x improvement to 1.3x when we changed circuits. A 30-second diagnostic would have told us why.
After our error mitigation showdown, we had a clear winner: IBM's TREX. Set resilience_level=1 and your VQE results go from 26 kcal/mol error to 0.22 kcal/mol. A 119x improvement. Chemical accuracy in a single API parameter. We were thrilled.
So naturally, we used TREX on the next experiment too — a kicked Ising model from Kim et al. 2023. Same IBM Torino backend. Same resilience_level=1.
The improvement: 1.3x. Not 119x. One point three.
We'd been treating TREX like a universal fix. It isn't. And the reason is embarrassingly simple once you see it.
Aspirin Doesn't Fix a Broken Leg
TREX corrects readout errors: the tendency for the measurement hardware to misread |1⟩ as |0⟩. It's like noise-canceling headphones for the detector. On our VQE circuit (3 gates deep), readout error accounts for over 80% of total error. Fix readout, fix almost everything. Hence 119x.
The kicked Ising circuit is 40–80 gates deep. By the time the quantum state reaches the detector, it's already been scrambled by dozens of imperfect gates. Readout error is maybe 10% of the problem. TREX faithfully corrects that 10% and calls it a day. Hence 1.3x.
It's like taking aspirin for a broken leg. The aspirin works — your headache is gone. But your leg is still broken.
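The arithmetic behind those two numbers can be sketched with a toy error budget. This is an illustration under stated assumptions, not the analysis behind our measurements: total error is treated as a simple sum of a readout part and a gate part, and the readout fix is treated as perfect. The function names are hypothetical.

```python
def max_improvement(readout_fraction: float) -> float:
    """Best-case improvement factor if the readout share of the error is removed.

    Assumption: total error = readout part + gate part (purely additive),
    and the readout correction removes its part completely.
    """
    if not 0.0 <= readout_fraction < 1.0:
        raise ValueError("readout_fraction must be in [0, 1)")
    return 1.0 / (1.0 - readout_fraction)


def implied_removed_fraction(improvement: float) -> float:
    """Invert the toy model: what share of the error did the fix remove?"""
    if improvement < 1.0:
        raise ValueError("improvement must be >= 1")
    return 1.0 - 1.0 / improvement


# Improvement factors quoted in this post.
for label, factor in [("H2 VQE", 119.0), ("kicked Ising", 1.3)]:
    share = implied_removed_fraction(factor)
    print(f"{label}: {factor}x improvement -> {share:.1%} of the error removed")
```

Under this toy model, a 119x gain means nearly all of the error was removable by a readout fix, while a 1.3x gain means only a small minority was. Real error sources do not add this cleanly, so treat the fractions as rough guides.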
Two Circuits, Two Diagnoses
| | H2 VQE (shallow) | Kicked Ising (deep) |
|---|---|---|
| Gate depth | 3 | 40–80 |
| What's actually broken | The detector (readout) | The computation (gates) |
| TREX (readout fix) | 119x better | 1.3x better |
| ZNE (gate noise fix) | Made it worse | 14x better (emulator) |
ZNE (zero-noise extrapolation) is the opposite tool: it fixes gate errors by running the circuit at multiple noise levels and extrapolating to zero. On our VQE circuit, ZNE made results worse — there's no gate-noise signal to extrapolate. On the deep Ising circuit, it's exactly the right medicine.
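The extrapolation step can be sketched in a few lines. This is a minimal version that assumes a straight-line fit; production ZNE often uses Richardson or exponential extrapolation instead. The scale factors and expectation values below are made up for illustration, not taken from our runs, and the function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two matching points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def zne_linear(scale_factors, expectation_values):
    """Estimate the zero-noise value as the fitted line's intercept at scale 0."""
    _, intercept = linear_fit(scale_factors, expectation_values)
    return intercept


# Hypothetical measurements at 1x, 3x, and 5x noise (illustrative numbers only).
scales = [1, 3, 5]
noisy_values = [0.82, 0.58, 0.34]
print(f"zero-noise estimate: {zne_linear(scales, noisy_values):.2f}")
```

The fit only helps when the measured value actually trends with the noise scale. If the points are flat or scattered, the intercept is mostly extrapolated shot noise, which is one way ZNE can make a shallow circuit's result worse.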
The 30-Second Diagnostic
Here's what we wish we'd done first. It takes three circuit submissions (~30 seconds of QPU time):
- Run your circuit normally (baseline)
- Run it with 3x the CNOT gates (insert CNOT-CNOT pairs that cancel out mathematically but add physical noise)
- Run it with 5x the CNOT gates
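The folded circuits in these steps can be sketched without any quantum SDK. The representation here (a list of `(gate_name, qubits)` tuples) and the names `fold_cnots` and `count_cnots` are assumptions for illustration, not the code we ran.

```python
from typing import List, Tuple

# Hypothetical circuit representation: (gate name, qubit indices).
Gate = Tuple[str, Tuple[int, ...]]


def fold_cnots(circuit: List[Gate], scale: int) -> List[Gate]:
    """Return a copy where every CNOT appears `scale` times in a row.

    An odd number of identical CNOTs equals one CNOT mathematically,
    so the ideal output is unchanged while the physical noise grows.
    """
    if scale < 1 or scale % 2 == 0:
        raise ValueError("scale must be a positive odd integer (1, 3, 5, ...)")
    extra_pairs = (scale - 1) // 2
    folded: List[Gate] = []
    for gate in circuit:
        folded.append(gate)
        if gate[0] == "cx":
            # Each extra CNOT-CNOT pair cancels to the identity.
            folded.extend([gate, gate] * extra_pairs)
    return folded


def count_cnots(circuit: List[Gate]) -> int:
    return sum(1 for name, _ in circuit if name == "cx")


# Toy two-qubit circuit with a single CNOT (illustrative only).
base_circuit: List[Gate] = [("ry", (0,)), ("cx", (0, 1)), ("x", (0,))]
for scale in (1, 3, 5):
    print(scale, count_cnots(fold_cnots(base_circuit, scale)))
```

On a real toolchain, a compiler may simplify adjacent CNOT pairs away, so the folded circuit typically needs barriers or a reduced optimization level to keep the extra gates on the hardware.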
Then look at how the error changes:
If the error increases steadily (more gates = worse results): gate noise dominates. Use ZNE.
If the error stays flat (more gates barely matter): readout noise dominates. Use TREX or confusion matrix correction.
We ran this diagnostic on Tuna-9 for our VQE circuit:
- 1 CNOT: 7.70 kcal/mol
- 3 CNOTs: 8.62 kcal/mol
- 5 CNOTs: 6.86 kcal/mol
Flat. Even non-monotonic. Going from 1 CNOT to 5 changed the error by less than 1 kcal/mol. The remaining ~7 kcal/mol is essentially all readout. This told us: TREX is correct, ZNE is pointless. Thirty seconds, three jobs, clear answer.
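The "steady increase versus flat" judgment can be made mechanical with a slope check. This is a sketch: the 10% relative-growth threshold and the function names are our assumptions for illustration, not a calibrated rule. The three data points are the Tuna-9 measurements listed above.

```python
def error_slope(scales, errors):
    """Least-squares slope of error versus CNOT scale factor."""
    n = len(scales)
    if n < 2 or n != len(errors):
        raise ValueError("need at least two matching points")
    mean_s = sum(scales) / n
    mean_e = sum(errors) / n
    sxx = sum((s - mean_s) ** 2 for s in scales)
    sxy = sum((s - mean_s) * (e - mean_e) for s, e in zip(scales, errors))
    return sxy / sxx


def diagnose(scales, errors, rel_threshold=0.10):
    """Return "gate" if error grows with gate count, else "readout".

    Assumptions: errors[0] is the unfolded baseline and is nonzero;
    rel_threshold (growth per unit of scale, relative to baseline) is arbitrary.
    """
    relative_growth = error_slope(scales, errors) / errors[0]
    return "gate" if relative_growth > rel_threshold else "readout"


# Tuna-9 VQE diagnostic from this post: error in kcal/mol at 1, 3, 5 CNOTs.
tuna9_scales = [1, 3, 5]
tuna9_errors = [7.70, 8.62, 6.86]
print(f"slope: {error_slope(tuna9_scales, tuna9_errors):.2f} kcal/mol per scale unit")
print("dominant noise:", diagnose(tuna9_scales, tuna9_errors))
```

With only three points, the slope is a coarse indicator. It separates "clearly growing" from "clearly flat" and should not be read more finely than that.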
A Decision Tree
After running this diagnostic on multiple circuits across multiple backends, the pattern is consistent enough to be a rule:
| Your circuit | Dominant error | Use this | Skip this |
|---|---|---|---|
| <5 two-qubit gates | Readout | TREX, REM, confusion matrices | ZNE, DD (just adds overhead) |
| 5–20 two-qubit gates | Mixed | TREX + light ZNE together | Any single technique alone |
| >20 two-qubit gates | Gates | ZNE, DD, Pauli twirling | TREX alone (fixes 10% of problem) |
The predictor is two-qubit gate count, not qubit count and not backend. A 9-qubit circuit with 1 CNOT per qubit still benefits from TREX. A 2-qubit circuit with 40 CNOTs does not.
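The table translates directly into a lookup. The function name `recommend_mitigation` and the returned dictionary shape are hypothetical, and the thresholds are simply the table's rows, which come from our experiments rather than from any general guarantee.

```python
def recommend_mitigation(two_qubit_gates: int) -> dict:
    """Map a two-qubit gate count to the table's recommendation.

    Boundaries follow the table: <5 readout, 5-20 mixed, >20 gates.
    """
    if two_qubit_gates < 0:
        raise ValueError("gate count cannot be negative")
    if two_qubit_gates < 5:
        return {
            "dominant": "readout",
            "use": ["TREX", "REM", "confusion matrices"],
            "skip": ["ZNE", "DD"],
        }
    if two_qubit_gates <= 20:
        return {
            "dominant": "mixed",
            "use": ["TREX + light ZNE"],
            "skip": ["any single technique alone"],
        }
    return {
        "dominant": "gates",
        "use": ["ZNE", "DD", "Pauli twirling"],
        "skip": ["TREX alone"],
    }


# The two circuits from this post: 1 CNOT (VQE) and 40 CNOTs (deep example).
for count in (1, 40):
    rec = recommend_mitigation(count)
    print(count, rec["dominant"], rec["use"])
```

A lookup like this is a starting guess. The gate-folding diagnostic remains the check that confirms it for a specific circuit and backend.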
The Kim 2023 Results
Once we understood the regime, we matched the mitigation to the noise:
- Emulator with simulated noise: ZNE achieved 14.1x improvement on kicked Ising magnetization. The gate-noise signal was clean and monotonic — exactly what ZNE needs.
- Tuna-9 hardware: ZNE achieved 2.3x. Modest but meaningful, limited by the 9-qubit chip's noise characteristics.
- IBM Torino (5-qubit chain): ZNE achieved 3.1x. Better than TREX's 1.3x on the same circuit.
None of these are as dramatic as TREX's 119x on VQE. Deep circuits are just harder — gate errors compound exponentially with depth, and no amount of post-processing fully undoes that. But choosing the right mitigation gets you 3x instead of 1.3x, which can be the difference between a publishable result and noise.
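The compounding is easy to see with a back-of-the-envelope model. This sketch assumes every gate fails independently with the same probability. The 1% per-gate error rate is a hypothetical round number, not a measured value for any backend in this post.

```python
def circuit_fidelity(n_gates: int, error_per_gate: float) -> float:
    """Probability that no gate fails, assuming independent, identical errors."""
    if n_gates < 0 or not 0.0 <= error_per_gate <= 1.0:
        raise ValueError("invalid gate count or error rate")
    return (1.0 - error_per_gate) ** n_gates


# Hypothetical 1% error per gate; depths match the circuits discussed above.
ASSUMED_ERROR_PER_GATE = 0.01
for depth in (3, 40, 80):
    fidelity = circuit_fidelity(depth, ASSUMED_ERROR_PER_GATE)
    print(f"{depth:>2} gates: fidelity ~ {fidelity:.3f}")
```

At 3 gates almost nothing is lost to gate errors, which leaves readout as the main problem. At 40 to 80 gates a large share of the signal is gone before measurement, and no readout correction can recover it.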
The Broader Point
The quantum computing community tends to rank error mitigation techniques: "TREX is the best," "ZNE is state-of-the-art," "post-selection is too wasteful." Our data says these rankings are meaningless without context. The best technique depends entirely on what's actually breaking your circuit.
This is also the NISQ preview of a deeper truth. As quantum algorithms scale from VQE (shallow, readout-limited) to quantum dynamics (medium, gate-limited) to fault-tolerant computing (deep, logically protected), the optimal error strategy shifts from readout correction to gate-level mitigation to full quantum error correction. The 119x-to-1.3x cliff we hit is the first transition in that ladder.
Don't pick your mitigation based on what worked last time. Diagnose your circuit first. It takes 30 seconds.
VQE mitigation data: IBM mitigation ladder. Kim 2023 kicked Ising data: kicked Ising results. Gate-folding diagnostic: error mitigation showdown.
Sources & References
- IBM mitigation ladder (JSON): https://github.com/JDerekLomas/quantuminspire/blob/main/experiments/results/vqe-mitigation-ladder-001-ibm.json
- Kim 2023 Ising replication (JSON): https://github.com/JDerekLomas/quantuminspire/blob/main/experiments/results/kim2023-ising-tuna9.json
- Error mitigation showdown: /blog/error-mitigation-showdown
- Kim et al., Nature 618, 2023: https://doi.org/10.1038/s41586-023-06096-3
- Cai et al., Rev. Mod. Phys. 95, 2023: https://doi.org/10.1103/RevModPhys.95.045005