We Kept Using the Same Error Fix. Then It Stopped Working.

IBM's error mitigation went from a 119x improvement to 1.3x when we changed circuits. A 30-second diagnostic would have told us why.

Tags: TREX, ZNE, circuit depth, error mitigation, kicked Ising, VQE, IBM Quantum, Tuna-9, Kim 2023

After our error mitigation showdown, we had a clear winner: IBM's TREX. Set resilience_level=1 and your VQE results go from 26 kcal/mol error to 0.22 kcal/mol. A 119x improvement. Chemical accuracy in a single API parameter. We were thrilled.

So naturally, we used TREX on the next experiment too — a kicked Ising model from Kim et al. 2023. Same IBM Torino backend. Same resilience_level=1.

The improvement: 1.3x. Not 119x. One point three.

We'd been treating TREX like a universal fix. It isn't. And the reason is embarrassingly simple once you see it.

Aspirin Doesn't Fix a Broken Leg

TREX corrects readout errors — the tendency for the measurement hardware to misread |1⟩ as |0⟩. It's like noise-canceling headphones for the detector. On our VQE circuit (3 gates deep), readout error accounts for over 80% of total error. Fix readout, fix almost everything. Hence 119x.
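To make the readout-only scope concrete, here is a minimal single-qubit sketch of the idea behind TREX (twirled readout error extinction), worked out with expectation values rather than sampled shots. The flip probabilities `p01` and `p10` and all function names are illustrative assumptions, not numbers from our hardware and not IBM's implementation.

```python
def readout_z(z_ideal: float, p01: float, p10: float) -> float:
    """Expected <Z> as seen through an asymmetric readout channel.

    p01 = P(read 1 | state 0), p10 = P(read 0 | state 1).
    """
    return (p10 - p01) + (1.0 - p01 - p10) * z_ideal


def twirled_z(z_ideal: float, p01: float, p10: float) -> float:
    """Average of two cases: no flip, and an X flip before readout
    whose sign is undone classically afterwards."""
    plain = readout_z(z_ideal, p01, p10)
    flipped = -readout_z(-z_ideal, p01, p10)
    return 0.5 * (plain + flipped)


def trex_estimate(z_ideal: float, p01: float, p10: float) -> float:
    """Divide the twirled value by the same quantity measured on |0>."""
    scale = twirled_z(1.0, p01, p10)  # calibration run, ideal <Z> = 1
    return twirled_z(z_ideal, p01, p10) / scale


if __name__ == "__main__":
    z_true, p01, p10 = 0.30, 0.02, 0.08  # hypothetical values
    print("raw readout:", readout_z(z_true, p01, p10))
    print("mitigated:  ", trex_estimate(z_true, p01, p10))
```

The twirl removes the constant offset and leaves a pure rescaling, which the calibration run then divides out. Nothing in this correction looks at what happened before the measurement, so any error introduced by gates passes straight through it.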

The kicked Ising circuit is 40–80 gates deep. By the time the quantum state reaches the detector, it's already been scrambled by dozens of imperfect gates. Readout error is maybe 10% of the problem. TREX faithfully corrects that 10% and calls it a day. Hence 1.3x.

It's like taking aspirin for a broken leg. The aspirin works — your headache is gone. But your leg is still broken.

Two Circuits, Two Diagnoses

	H2 VQE (shallow)	Kicked Ising (deep)
Gate depth	3	40–80
What's actually broken	The detector (readout)	The computation (gates)
TREX (readout fix)	119x better	1.3x better
ZNE (gate noise fix)	Made it worse	14x better (emulator)

ZNE (zero-noise extrapolation) is the opposite tool: it fixes gate errors by running the circuit at multiple noise levels and extrapolating to zero. On our VQE circuit, ZNE made results worse — there's no gate-noise signal to extrapolate. On the deep Ising circuit, it's exactly the right medicine.
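The extrapolation step can be sketched in a few lines. This is a toy, assuming a made-up noise model in which the measured value decays exponentially with the noise scale; the decay rate, the scale factors, and the function name are all hypothetical, and a real ZNE run would use measured expectation values instead.

```python
import numpy as np


def zne_extrapolate(scales, values, degree=1):
    """Fit a polynomial in the noise scale and evaluate it at scale 0."""
    coeffs = np.polyfit(scales, values, degree)
    return float(np.polyval(coeffs, 0.0))


if __name__ == "__main__":
    ideal = 1.0
    scales = np.array([1.0, 3.0, 5.0])
    # Hypothetical gate-noise model: signal shrinks as noise is amplified.
    noisy = ideal * np.exp(-0.05 * scales)

    linear = zne_extrapolate(scales, noisy, degree=1)
    quadratic = zne_extrapolate(scales, noisy, degree=2)
    print("unmitigated error:", abs(noisy[0] - ideal))
    print("linear ZNE error: ", abs(linear - ideal))
    print("quadratic error:  ", abs(quadratic - ideal))
```

The fit only helps when the values actually trend with the noise scale. If they scatter around a constant, as they do on a readout-limited circuit, the extrapolation has no slope to follow and mostly amplifies shot noise.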

The 30-Second Diagnostic

Here's what we wish we'd done first. It takes three circuit submissions (~30 seconds of QPU time):

1. Run your circuit normally (baseline)
2. Run it with 3x the CNOT gates (insert CNOT-CNOT pairs that cancel out mathematically but add physical noise)
3. Run it with 5x the CNOT gates
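The folded runs can be generated mechanically. The sketch below uses a deliberately tiny, hypothetical circuit format (a list of `(name, qubits)` tuples) rather than any real SDK's circuit object, and `fold_cnots` is our own illustrative helper.

```python
from typing import List, Tuple

# Hypothetical minimal circuit format: (gate name, qubit indices).
Gate = Tuple[str, Tuple[int, ...]]


def fold_cnots(circuit: List[Gate], scale: int) -> List[Gate]:
    """Repeat every CNOT `scale` times, where `scale` is odd.

    An odd number of identical CNOTs acts like a single CNOT in the
    ideal case, so the logic is unchanged while the physical
    two-qubit gate count grows by the scale factor.
    """
    if scale < 1 or scale % 2 == 0:
        raise ValueError("scale must be a positive odd integer")
    folded: List[Gate] = []
    for name, qubits in circuit:
        repeats = scale if name == "cx" else 1
        folded.extend([(name, qubits)] * repeats)
    return folded


if __name__ == "__main__":
    circuit = [("ry", (0,)), ("cx", (0, 1)), ("x", (0,))]
    for scale in (1, 3, 5):
        n_cx = sum(name == "cx" for name, _ in fold_cnots(circuit, scale))
        print(f"scale {scale}: {n_cx} CNOT(s)")
```

One practical caveat: compilers usually cancel adjacent CNOT pairs as an optimization, so on real hardware the extra gates need to be protected (for example with barriers or by lowering the optimization level) or the three runs end up identical.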

Then look at how the error changes:

If the error increases steadily (more gates = worse results): gate noise dominates. Use ZNE.

If the error stays flat (more gates barely matter): readout noise dominates. Use TREX or confusion matrix correction.

We ran this diagnostic on Tuna-9 for our VQE circuit:

1 CNOT: 7.70 kcal/mol
3 CNOTs: 8.62 kcal/mol
5 CNOTs: 6.86 kcal/mol

Flat. Even non-monotonic. Quintupling the CNOT count never moved the error more than 1 kcal/mol from the baseline. The remaining ~7 kcal/mol is essentially all readout. This told us: TREX is the right tool, ZNE is pointless. Thirty seconds, three jobs, clear answer.
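The "steady increase or flat" judgment can be turned into a rough rule by fitting a line through the three errors. The sketch below feeds in the Tuna-9 numbers above; the 10%-per-scale-unit threshold and the `diagnose` helper are our own assumptions for illustration, not a calibrated criterion.

```python
import numpy as np


def diagnose(scales, errors, rel_slope_threshold=0.10):
    """Classify the dominant noise from error versus CNOT scale factor.

    rel_slope_threshold is a hypothetical cutoff: the fractional growth
    in error per unit of scale, relative to the baseline error.
    """
    slope, _intercept = np.polyfit(scales, errors, 1)
    rel_slope = slope / errors[0]
    verdict = "gate" if rel_slope > rel_slope_threshold else "readout"
    return verdict, float(slope), float(rel_slope)


if __name__ == "__main__":
    scales = [1, 3, 5]

    # Errors in kcal/mol from the Tuna-9 VQE diagnostic.
    vqe_errors = [7.70, 8.62, 6.86]
    print("VQE:", diagnose(scales, vqe_errors))

    # Made-up example of a gate-dominated circuit, for contrast.
    deep_errors = [1.0, 2.0, 3.0]
    print("hypothetical deep circuit:", diagnose(scales, deep_errors))
```

With only three points and real shot noise, the fitted slope is a coarse indicator. It is good enough to separate "clearly climbing" from "clearly flat", which is all the diagnostic needs.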

A Decision Tree

After running this diagnostic on multiple circuits across multiple backends, the pattern is consistent enough to be a rule:

Your circuit	Dominant error	Use this	Skip this
<5 two-qubit gates	Readout	TREX, REM, confusion matrices	ZNE, DD (just adds overhead)
5–20 two-qubit gates	Mixed	TREX + light ZNE together	Any single technique alone
>20 two-qubit gates	Gates	ZNE, DD, Pauli twirling	TREX alone (fixes 10% of problem)

The predictor is two-qubit gate count, not qubit count and not backend. A 9-qubit circuit with 1 CNOT per qubit still benefits from TREX. A 2-qubit circuit with 40 CNOTs does not.
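For anyone who wants the table as a lookup, here is a small sketch. The function name and return format are ours; the thresholds are the rule-of-thumb boundaries from the table, not sharp physical limits.

```python
def pick_mitigation(two_qubit_gates: int) -> dict:
    """Map a two-qubit gate count to the regimes in the table above."""
    if two_qubit_gates < 0:
        raise ValueError("gate count cannot be negative")
    if two_qubit_gates < 5:
        return {
            "dominant": "readout",
            "use": ["TREX", "REM", "confusion matrices"],
            "skip": ["ZNE", "DD"],
        }
    if two_qubit_gates <= 20:
        return {
            "dominant": "mixed",
            "use": ["TREX + light ZNE"],
            "skip": ["any single technique alone"],
        }
    return {
        "dominant": "gates",
        "use": ["ZNE", "DD", "Pauli twirling"],
        "skip": ["TREX alone"],
    }


if __name__ == "__main__":
    for n in (1, 12, 40):
        choice = pick_mitigation(n)
        print(f"{n} two-qubit gates -> {choice['dominant']}: {choice['use']}")
```

Note that the input is the number of two-qubit gates, not the qubit count, which matches the point above: the lookup never asks how wide the circuit is.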

The Kim 2023 Results

Once we understood the regime, we matched the mitigation to the noise:

Emulator with simulated noise: ZNE achieved 14.1x improvement on kicked Ising magnetization. The gate-noise signal was clean and monotonic — exactly what ZNE needs.
Tuna-9 hardware: ZNE achieved 2.3x. Modest but meaningful, limited by the 9-qubit chip's noise characteristics.
IBM Torino (5-qubit chain): ZNE achieved 3.1x. Better than TREX's 1.3x on the same circuit.

None of these are as dramatic as TREX's 119x on VQE. Deep circuits are just harder — gate errors compound exponentially with depth, and no amount of post-processing fully undoes that. But choosing the right mitigation gets you 3x instead of 1.3x, which can be the difference between a publishable result and noise.
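The compounding is easy to see in a crude model that assumes every gate fails independently with the same probability. The 1% per-gate error below is a hypothetical round number chosen for illustration, not a measured figure for any of our backends.

```python
def circuit_fidelity(n_gates: int, error_per_gate: float) -> float:
    """Probability that no gate errs, assuming independent errors."""
    return (1.0 - error_per_gate) ** n_gates


if __name__ == "__main__":
    error_per_gate = 0.01  # hypothetical value
    for n_gates in (3, 40, 80):
        fidelity = circuit_fidelity(n_gates, error_per_gate)
        print(f"{n_gates:>2} gates: fidelity ~ {fidelity:.3f}")
```

Under this assumption a 3-gate circuit keeps about 97% of its signal while an 80-gate circuit keeps less than half, which is why a readout fix that was nearly sufficient for VQE barely registers on the kicked Ising circuit.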

The Broader Point

The quantum computing community tends to rank error mitigation techniques: "TREX is the best," "ZNE is state-of-the-art," "post-selection is too wasteful." Our data says these rankings are meaningless without context. The best technique depends entirely on what's actually breaking your circuit.

This is also the NISQ preview of a deeper truth. As quantum algorithms scale from VQE (shallow, readout-limited) to quantum dynamics (medium, gate-limited) to fault-tolerant computing (deep, logically protected), the optimal error strategy shifts from readout correction to gate-level mitigation to full quantum error correction. The 119x-to-1.3x cliff we hit is the first transition in that ladder.

Don't pick your mitigation based on what worked last time. Diagnose your circuit first. It takes 30 seconds.

VQE mitigation data: IBM mitigation ladder. Kim 2023 kicked Ising data: kicked Ising results. Gate-folding diagnostic: error mitigation showdown.

Sources & References

IBM mitigation ladder (JSON): https://github.com/JDerekLomas/quantuminspire/blob/main/experiments/results/vqe-mitigation-ladder-001-ibm.json
Kim 2023 Ising replication (JSON): https://github.com/JDerekLomas/quantuminspire/blob/main/experiments/results/kim2023-ising-tuna9.json
Error mitigation showdown: /blog/error-mitigation-showdown
Kim et al., Nature 618, 2023: https://doi.org/10.1038/s41586-023-06096-3
Cai et al., Rev. Mod. Phys. 95, 2023: https://doi.org/10.1103/RevModPhys.95.045005
