Paper Reproduction | 3 claims tested

Evidence for the utility of quantum computing before fault tolerance

Kim et al., Nature 618, 500-505 (2023)

IBM Quantum | 127-qubit Eagle (ibm_kyiv) | arXiv:2302.11590

In Plain Language

What this paper does: This high-profile IBM paper claimed "evidence for quantum utility": that a 127-qubit quantum computer could produce results that are difficult for classical computers to simulate. It simulated the dynamics of a kicked Ising model (a physics model of interacting magnets) on the processor's heavy-hex qubit lattice, using error mitigation to recover accurate expectation values.

Why it matters: This is one of the most contested claims in recent quantum computing: can current, pre-fault-tolerant hardware do anything classically intractable? Classical simulation groups later challenged the paper's results. Reproducing the key experimental signatures tests whether the underlying claims hold up.

Our scope: Mechanism verification, not a replication. The original ran 127 qubits at up to 20 Trotter steps (60 layers of two-qubit gates), using ZNE with probabilistic error amplification (PEA), which amplifies noise according to a learned noise model. We ran 5-9 qubits at up to 10 Trotter steps with simple gate-folding ZNE. Our scale is trivially classically simulable, so we tested whether the error-mitigation methodology works, not the quantum utility claim.
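To make the "trivially classically simulable" point concrete, here is a minimal statevector sketch of a kicked Ising circuit in plain NumPy. It assumes a 1D nearest-neighbour chain (not the actual Tuna-9 or heavy-hex coupling map), one kick RX(theta_h) on every qubit followed by RZZ(theta_J) on every edge per Trotter step, and theta_J = -pi/2; the function name `kicked_ising_mz` is hypothetical. At 9 qubits the full state has only 2^9 = 512 amplitudes.

```python
import numpy as np


def kicked_ising_mz(n, steps, theta_h, theta_j=-np.pi / 2, edges=None):
    """Return M_z = (1/n) * sum_i <Z_i> after `steps` kicked Ising Trotter steps.

    Hypothetical helper: assumes a 1D chain unless `edges` is given, and applies
    RX(theta_h) on every qubit, then RZZ(theta_j) on every edge, in each step.
    """
    if edges is None:
        edges = [(i, i + 1) for i in range(n - 1)]  # assumed nearest-neighbour chain
    dim = 2 ** n
    # Z eigenvalue (+1 for |0>, -1 for |1>) of each qubit in each basis state;
    # qubit q is bit q of the basis index.
    bits = (np.arange(dim)[:, None] >> np.arange(n)) & 1
    z = 1 - 2 * bits  # shape (dim, n)
    # RZZ(theta) = exp(-i theta/2 Z_a Z_b) is diagonal, so one phase vector covers all edges.
    zz_sum = sum(z[:, a] * z[:, b] for a, b in edges)
    zz_phase = np.exp(-1j * theta_j / 2 * zz_sum)
    # RX(theta) = exp(-i theta/2 X)
    c, s = np.cos(theta_h / 2), np.sin(theta_h / 2)
    rx = np.array([[c, -1j * s], [-1j * s, c]])

    psi = np.zeros(dim, dtype=complex)
    psi[0] = 1.0  # start in |00...0>, where M_z = 1
    for _ in range(steps):
        t = psi.reshape([2] * n)
        for q in range(n):
            axis = n - 1 - q  # C-order reshape puts bit n-1 on axis 0
            t = np.moveaxis(np.tensordot(rx, t, axes=([1], [axis])), 0, axis)
        psi = zz_phase * t.reshape(dim)
    probs = np.abs(psi) ** 2
    return float(np.mean(probs @ z))


print(kicked_ising_mz(9, 10, 0.0))  # Clifford point: M_z stays at 1 (up to rounding)
print(kicked_ising_mz(9, 10, 0.5))  # non-Clifford kick: M_z drops below 1
```

At theta_h = 0 every layer is either the identity or diagonal, so the ideal M_z is 1 at any depth; this is the reference value the Clifford-point claim compares against.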

What we found: All 3 mechanism claims were confirmed on a 9-qubit subset, with all three passing on the QI Emulator and tuna9_12edge. ZNE achieved a 14.1x improvement on the emulator and 2.3x to 8x on hardware. The mitigation technique works as described, but our small-scale test cannot address the paper's central quantum utility argument.

Key Terms

Kicked Ising model: A physics model in which quantum spins (tiny magnets) interact and are periodically "kicked", used to study quantum dynamics and chaos.

ZNE: Zero-noise extrapolation. Run the same circuit at several amplified noise levels, then extrapolate the results back to estimate the zero-noise answer.

Quantum utility: The claim that a quantum computer can produce useful results faster or better than any classical computer for a specific task.
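The ZNE definition above can be sketched in a few lines of NumPy: fit the expectation values measured at each noise scale factor and evaluate the fit at zero noise. This is a minimal sketch, assuming odd scale factors (1, 3, 5) as produced by gate folding and a polynomial fit; the helper name `zne_extrapolate` and the synthetic exponential-decay data are illustrative assumptions, not this report's fitting code or measurements.

```python
import numpy as np


def zne_extrapolate(scale_factors, values, degree=1):
    """Fit <O>(lambda) with a polynomial of the given degree and return its value at lambda = 0.

    degree=1 is a linear fit; degree=len(scale_factors) - 1 gives Richardson
    extrapolation (the polynomial passes exactly through every point).
    """
    coeffs = np.polyfit(scale_factors, values, degree)
    return float(np.polyval(coeffs, 0.0))


# Synthetic example (not measured data): ideal M_z = 1, decaying as exp(-0.05 * lambda).
scales = np.array([1.0, 3.0, 5.0])
noisy = np.exp(-0.05 * scales)

linear = zne_extrapolate(scales, noisy, degree=1)
richardson = zne_extrapolate(scales, noisy, degree=2)
raw_error = abs(noisy[0] - 1.0)
print(f"raw error {raw_error:.4f}, linear ZNE error {abs(linear - 1):.4f}, "
      f"Richardson error {abs(richardson - 1):.5f}")
```

With this synthetic decay the linear fit cuts the error roughly sevenfold and the Richardson fit does better still. On real data, Richardson extrapolation also amplifies shot noise, so the better choice depends on the device.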

Overall: 100% (3/3 claims passed)

Backends Tested

QI Emulator, IBM Torino, ibm_marrakesh, QI Tuna-9, tuna9_12edge, ibm_torino_9q_trex

Failure Modes

PASS: 3 (100%)

Claim-by-Claim Comparison

Each claim from the paper is tested on multiple quantum backends. Published values are compared against our measurements.

Unmitigated magnetization M_z decays monotonically with Trotter depth due to noise accumulation

Fig. 2c | Published: Yes
Backend | Measured | Discrepancy | Status
QI Emulator | Yes | match | PASS
IBM Torino | Yes | match | PASS
ibm_marrakesh | Yes | match | PASS
QI Tuna-9 | Yes | match | PASS
tuna9_12edge | Yes | match | PASS
ibm_torino_9q_trex | Yes | match | PASS

ZNE error mitigation recovers ideal M_z at the Clifford point (theta_h = 0) across depths

Fig. 2c | Published: Yes
Backend | Measured | Discrepancy | Status
QI Emulator | Yes | match | PASS
IBM Torino | - | - | -
ibm_marrakesh | Yes | match | PASS
QI Tuna-9 | Yes | match | PASS
tuna9_12edge | Yes | match | PASS
ibm_torino_9q_trex | No | mismatch | PARTIAL

ibm_torino_9q_trex: TREX (readout error mitigation, not ZNE) was run on IBM Torino using the 9-qubit Tuna-9 topology. Because TREX corrects only readout errors, not gate noise, it cannot recover ideal M_z in deep circuits: the error grows from 5.2% at d=1 (M_z = 0.948) to a maximum of 20.3% at d=10 (M_z = 0.797). TREX MAE is 0.113 versus raw MAE 0.150, only a marginal improvement. This confirms that readout mitigation alone is insufficient for deep circuits.
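Why readout mitigation cannot fix gate noise can be seen in a single-qubit toy model of TREX (twirled readout error extinction): random X flips before measurement, undone classically afterwards, make the readout error symmetric, so the measured <Z> is simply scaled by a contrast factor that a calibration run on |0> can measure and divide out. This sketch uses exact expectation values instead of shots; the flip probabilities and the gate-damped <Z> of 0.85 are made-up illustrative numbers, and the helper names are hypothetical.

```python
def twirled_z(z_true, p01, p10):
    """Measured <Z> after readout twirling, for exact (infinite-shot) statistics.

    p01 = P(read 1 | state 0), p10 = P(read 0 | state 1).
    """
    eta = 1.0 - p01 - p10  # readout contrast
    bias = p10 - p01       # asymmetry that twirling cancels
    no_flip = eta * z_true + bias
    # An X before readout flips the state (z -> -z); flipping the recorded bit
    # back afterwards negates the measured value again.
    with_flip = -(eta * (-z_true) + bias)
    return 0.5 * (no_flip + with_flip)


def trex_correct(z_measured, z_calibration):
    """Divide out the readout contrast measured on the |0> calibration circuit."""
    return z_measured / z_calibration


p01, p10 = 0.02, 0.05   # hypothetical readout error rates
z_after_gates = 0.85    # hypothetical <Z> already damped by gate noise (ideal = 1)

calibration = twirled_z(1.0, p01, p10)  # |0> state: true <Z> = 1
measured = twirled_z(z_after_gates, p01, p10)
corrected = trex_correct(measured, calibration)
print(f"raw {measured:.3f}, TREX {corrected:.3f}, ideal 1.000")
```

In this model TREX removes the readout contrast exactly, yet the corrected value stays at the gate-damped 0.85 rather than the ideal 1, which is qualitatively the d=10 behaviour described above.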

ZNE error mitigation substantially improves accuracy over unmitigated results

Fig. 3 | Published: 10 ± 5x improvement factor
Backend | Measured | Discrepancy (published - measured) | Status
QI Emulator | 14.1x | -4.1 | PASS
IBM Torino | - | - | -
ibm_marrakesh | 3.1x | +6.9 | PARTIAL_SUCCESS
QI Tuna-9 | 2.3x | +7.7 | PARTIAL
tuna9_12edge | 8x | +2.0 | PASS
ibm_torino_9q_trex | 1.3x | +8.7 | PARTIAL

ibm_marrakesh: ZNE gate folding on IBM Marrakesh achieves a 3.1x improvement (M_z error falls from 3.2% raw to 1.0% with ZNE). We attribute the gap to the emulator's 14.1x to non-depolarizing hardware noise (coherent errors, crosstalk), which gate folding does not amplify in the simple, uniform way linear extrapolation assumes. The paper's PEA method instead amplifies noise according to a learned noise model, achieving ~10x on 127 qubits.
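How gate folding treats coherent errors can be illustrated with unitary folding, G -> G (G^dagger G)^k, which leaves the ideal operation unchanged while multiplying the gate count (the noise scale factor) by 2k + 1. This minimal NumPy sketch applies a coherent over-rotation to an RZZ gate under two hypothetical error models: one where the error is proportional to the rotation angle (so the inverse gate undoes it) and one where a fixed angle offset is added to every pulse, including the inverse. The helper names and error sizes are assumptions for illustration only.

```python
import numpy as np


def rzz(theta):
    """RZZ(theta) = exp(-i theta/2 Z Z), diagonal in the computational basis."""
    return np.diag(np.exp(-1j * theta / 2 * np.array([1, -1, -1, 1])))


def fold(gate, gate_inv, k):
    """Unitary folding G (G_inv G)^k; the noise scale factor is 2k + 1."""
    u = gate
    for _ in range(k):
        u = gate @ gate_inv @ u
    return u


def op_error(u, target):
    """Frobenius-norm distance between two unitaries (0 means identical)."""
    return float(np.linalg.norm(u - target))


theta, eps = -np.pi / 2, 0.02
ideal = rzz(theta)

# Noise-free folding leaves the operation unchanged.
assert np.allclose(fold(ideal, rzz(-theta), 2), ideal)

for k in range(3):
    # Model A: over-rotation proportional to the angle; the inverse cancels it.
    prop = fold(rzz(theta * (1 + eps)), rzz(-theta * (1 + eps)), k)
    # Model B: fixed offset eps added to every pulse, including the inverse.
    offset = fold(rzz(theta + eps), rzz(-theta + eps), k)
    print(f"scale {2 * k + 1}: proportional error {op_error(prop, ideal):.4f}, "
          f"offset error {op_error(offset, ideal):.4f}")
```

Under model A every folded circuit carries the same error, so there is no trend to extrapolate; under model B the error grows with the scale factor. Real devices mix such behaviours with incoherent noise, consistent with the explanation above for why folding-based ZNE falls short of the emulator result.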

QI Tuna-9: ZNE on the Tuna-9 9-qubit topology achieves 3.1x at d=1 and 1.5x at d=3 (mean 2.3x). This is below the paper's ~10x with PEA, but the d=1 result matches IBM Marrakesh's 3.1x with the same basic ZNE method. The hardware noise is non-depolarizing (dephasing-dominated), which simple gate folding cannot fully exploit. At d=3, a fold factor of 3 requires 180 CZ gates, so hardware decoherence limits ZNE effectiveness.
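The 180-CZ figure can be reproduced with simple bookkeeping. This sketch assumes each ZZ interaction is compiled with the generic two-CZ decomposition (CNOT, RZ, CNOT, with each CNOT written as a Hadamard-conjugated CZ) and that the 9-qubit layout has 10 couplers; both are assumptions chosen because they reproduce the quoted number, not the report's documented compilation. The helper names are hypothetical; the NumPy check only confirms that the two-CZ decomposition itself is exact.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
CZ = np.diag([1, 1, 1, -1]).astype(complex)


def rz(theta):
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])


def rzz(theta):
    return np.diag(np.exp(-1j * theta / 2 * np.array([1, -1, -1, 1])))


def rzz_from_two_cz(theta):
    """RZZ(theta) = CNOT (I x RZ(theta)) CNOT, with CNOT = (I x H) CZ (I x H)."""
    cnot = np.kron(I2, H) @ CZ @ np.kron(I2, H)  # control = first qubit
    return cnot @ np.kron(I2, rz(theta)) @ cnot


def cz_count(n_couplers, trotter_steps, fold_factor, cz_per_zz=2):
    """CZ gates: one ZZ gate per coupler per step, times the fold factor."""
    return n_couplers * cz_per_zz * trotter_steps * fold_factor


assert np.allclose(rzz_from_two_cz(0.37), rzz(0.37))

N_COUPLERS = 10  # assumption: coupler count that reproduces the quoted 180
for d in (1, 3):
    for fold in (1, 3):
        print(f"d={d}, fold={fold}: {cz_count(N_COUPLERS, d, fold)} CZ gates")
```

Under these assumptions, d=1 with fold 3 needs 60 CZ gates, so the drop from 3.1x at d=1 to 1.5x at d=3 coincides with a threefold increase in two-qubit gate count.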

ibm_torino_9q_trex: TREX (readout mitigation) on the 9-qubit topology achieves only a 1.3x improvement over raw (TREX MAE 0.113 vs raw MAE 0.150), the worst of all mitigation methods tested. TREX corrects only readout errors, so for deep Ising circuits where gate noise dominates, it provides minimal benefit. By contrast, TREX achieved a 119x improvement in the H2 VQE experiment, a shallow circuit whose error is readout-dominated. Key finding: the mitigation method must match the dominant error source.
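The improvement factors and the Discrepancy column in the Fig. 3 table follow from simple arithmetic: improvement is the raw error divided by the mitigated error, and Discrepancy is the published 10x minus the measured factor. The report does not state its PASS threshold; this sketch assumes a row passes when the discrepancy lies within the published ± 5 uncertainty, a rule that happens to agree with every row above. Helper names are hypothetical, and the sketch does not distinguish the report's PARTIAL and PARTIAL_SUCCESS labels.

```python
PUBLISHED, TOLERANCE = 10.0, 5.0  # Fig. 3: 10 +/- 5x improvement factor


def improvement_factor(raw_error, mitigated_error):
    """How many times smaller the mitigated error is than the raw error."""
    return raw_error / mitigated_error


def classify(measured, published=PUBLISHED, tolerance=TOLERANCE):
    """Return (discrepancy, status); assumed rule: PASS if |published - measured| <= tolerance."""
    discrepancy = published - measured
    return discrepancy, ("PASS" if abs(discrepancy) <= tolerance else "PARTIAL")


trex = improvement_factor(0.150, 0.113)  # TREX vs raw MAE on ibm_torino_9q_trex
print(f"TREX improvement: {trex:.1f}x")  # TREX improvement: 1.3x

for backend, factor in [("QI Emulator", 14.1), ("ibm_marrakesh", 3.1),
                        ("QI Tuna-9", 2.3), ("tuna9_12edge", 8.0),
                        ("ibm_torino_9q_trex", 1.3)]:
    d, status = classify(factor)
    print(f"{backend}: {factor}x, discrepancy {d:+.1f}, {status}")
```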

Cross-Backend Summary

Backend | Claims Tested | Passed | Pass Rate | Primary Issue
QI Emulator | 3 | 3 | 100% | -
IBM Torino | 1 | 1 | 100% | -
ibm_marrakesh | 3 | 2 | 67% | PARTIAL_SUCCESS
QI Tuna-9 | 3 | 2 | 67% | PARTIAL
tuna9_12edge | 3 | 3 | 100% | -
ibm_torino_9q_trex | 3 | 1 | 33% | PARTIAL

Key Findings

QI Emulator: 3/3 claims matched. The simulation pipeline correctly reproduces the published physics.

IBM Torino: 1/1 claim matched; only the unmitigated-decay claim was tested on this backend.

ibm_marrakesh: 2/3 claims matched. The ZNE improvement factor (3.1x) falls below the published 10 ± 5x range.

QI Tuna-9: 2/3 claims matched. The mean ZNE improvement factor (2.3x) falls below the published 10 ± 5x range.

tuna9_12edge: 3/3 claims matched, including an 8x ZNE improvement factor within the published 10 ± 5x range.

ibm_torino_9q_trex: 1/3 claims matched. TREX corrects only readout errors, so it neither recovers ideal M_z at depth nor reaches the published improvement factor.

Report Metadata

Generated: 2/10/2026 | Paper ID: kim2023