Paper Reproductions

Testing Published Quantum Results

We reproduce published quantum computing results using AI-written circuits on modern hardware. Some are full replications at original scale, others are small-scale reproductions that verify the underlying mechanisms.

Each paper is tested on up to four backends: a noiseless emulator (correctness baseline), QI Tuna-9 (9 superconducting qubits), IQM Garnet (20 qubits), and IBM Torino (133 qubits). Claims are compared quantitatively against published values.

7 Papers Tested (1 more planned)

24 Claims Tested (across 7 backends)

96% Pass Rate (23/24 claims)

7 Backends

QI Emulator, QI Tuna-9, IBM Torino, IQM Garnet, ibm_marrakesh, tuna9_12edge, ibm_torino_9q_trex
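The headline numbers can be cross-checked against the per-paper tallies shown on the report cards in the Completed Reports section. The sketch below sums those tallies; the dictionary layout and the function name are illustrative choices, not part of the project's code.

```python
# Per-paper tallies (claims passed, claims tested), copied from the
# report cards in the Completed Reports section of this page.
REPORTS = {
    "Cross 2019": (4, 4),
    "Harrigan 2021": (3, 3),
    "Kandala 2017": (4, 4),
    "Kim 2023": (3, 3),
    "Peruzzo 2014": (3, 3),
    "Sagastizabal 2019": (3, 4),
    "Watson 2018": (3, 3),
}


def overall_pass_rate(reports):
    """Return (passed, tested, rate) summed over all papers."""
    passed = sum(p for p, _ in reports.values())
    tested = sum(t for _, t in reports.values())
    return passed, tested, passed / tested


passed, tested, rate = overall_pass_rate(REPORTS)
print(f"{len(REPORTS)} papers, {passed}/{tested} claims = {rate:.0%}")
```

With these tallies the sum is 23 of 24 claims, which rounds to the 96% shown above.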

Three Chips, One Suite

Same circuits, different hardware. Each metric tested on QI Tuna-9 (9 superconducting qubits), IQM Garnet (20 qubits), and IBM Torino (133 qubits).

Metric | QI Tuna-9 (9q) | IQM Garnet (20q) | IBM Torino (133q)
Bell fidelity | 93.5% | 98.1% | 86.5%
GHZ-3 fidelity | 88.9% | 93.9% | 82.9%
GHZ-5 fidelity | 83.8% | 81.8% | 76.6%
GHZ-10 | n/a | 54.7% | 62.2%
Quantum Volume | 16 | 32 | 32
RB gate fidelity | 99.82% | 99.82% | 99.99%*
VQE H2 (kcal/mol) | 0.92 | -- | 0.22
Dominant noise | Dephasing | Dephasing | Depolarizing

* IBM RB inflated by transpiler collapsing Clifford sequences. Tuna-9/IQM values are true gate fidelity. Bold = best per metric. VQE: IBM uses TREX, Tuna-9 uses hybrid PS+REM. -- = not yet tested.
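The VQE row is in kcal/mol, while the chemical accuracy threshold quoted elsewhere on this page is 1.6 mHa. A minimal conversion sketch follows, assuming the VQE row reports absolute energy error and using the standard factor of roughly 627.509 kcal/mol per Hartree; the function names are hypothetical.

```python
# Standard Hartree -> kcal/mol conversion factor (rounded).
HARTREE_TO_KCAL_PER_MOL = 627.509


def mha_to_kcal_per_mol(mha):
    """Convert an energy in milli-Hartree to kcal/mol."""
    return mha * 1e-3 * HARTREE_TO_KCAL_PER_MOL


def kcal_per_mol_to_mha(kcal):
    """Convert an energy in kcal/mol to milli-Hartree."""
    return kcal / HARTREE_TO_KCAL_PER_MOL * 1e3


# Chemical accuracy threshold quoted on this page: 1.6 mHa.
threshold_kcal = mha_to_kcal_per_mol(1.6)

# VQE H2 values from the table above (assumed to be absolute errors).
for backend, err_kcal in {"QI Tuna-9": 0.92, "IBM Torino": 0.22}.items():
    verdict = "within" if err_kcal < threshold_kcal else "outside"
    print(f"{backend}: {err_kcal} kcal/mol = "
          f"{kcal_per_mol_to_mha(err_kcal):.2f} mHa ({verdict} chemical accuracy)")
```

Under that assumption, 1.6 mHa is about 1.0 kcal/mol, so both table entries fall below the threshold.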

Results by Backend

Which papers pass on which hardware? Green = all claims pass. Orange = partial. Red = fails. Gray = not yet tested.

Paper | Emulator | Tuna-9 | IBM Torino | IQM Garnet | Type
Cross 2019 (3/3; same scale, different hardware) | PASS | PASS | PASS | PASS | QV + RB
Sagastizabal 2019 (4/4; same scale, different hardware) | PASS | PASS | PASS | -- | VQE + EM
Kandala 2017 (5/5; H2 only, omits LiH, BeH2) | PASS | PASS | PASS | -- | VQE
Peruzzo 2014 (6/8; superconducting, not photonic) | PASS | PARTIAL | FAIL | -- | VQE
Harrigan 2021 (4/4; 3-6 qubits, original: 23) | PASS | PASS | -- | -- | QAOA
Kim 2023 (3/3; 9 qubits, original: 127) | PASS | PARTIAL | PARTIAL | -- | Ising

PASS = all tested claims within published error bars. PARTIAL = some claims pass, some fail due to hardware noise. FAIL = no claims pass on hardware. -- = not yet tested. Notes show scope relative to original paper.
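The legend above maps directly onto a small roll-up function. This is a sketch of that rule only: the input format (one boolean per tested claim) and the function name are assumptions, not the project's actual data model.

```python
def backend_status(claim_results):
    """Roll per-claim outcomes for one paper on one backend into a status.

    claim_results: list of booleans, True if the claim was within the
    published error bars. An empty list means nothing was tested yet.
    """
    if not claim_results:
        return "--"        # not yet tested
    if all(claim_results):
        return "PASS"      # every tested claim passes
    if any(claim_results):
        return "PARTIAL"   # some pass, some fail
    return "FAIL"          # no claim passes


print(backend_status([True, True, True]))
print(backend_status([True, False, True]))
print(backend_status([False, False]))
print(backend_status([]))
```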

Do published results hold up?

We extract quantitative claims from papers and test whether AI-generated circuits produce matching numbers on noiseless emulators and real hardware. Some tests are at original scale, others are smaller-scale checks of the underlying mechanisms.

Where does hardware noise break things?

Emulator runs pass consistently. Hardware runs reveal the noise floor: which claims survive real-world decoherence, and which are swamped?

Can AI close the gap?

Failure mode classification tells us whether the gap is noise (mitigable), circuit translation (fixable), or missing methodology (structural).

Completed Reports

Validating quantum computers using randomized model circuits

Cross et al., Phys. Rev. A 100, 032328 (2019)

IBM Research | IBM superconducting (various)

100% (4/4 claims)
Backends: QI Emulator, QI Tuna-9, IBM Torino, IQM Garnet
PASS: 4
2-qubit QV circuits pass heavy output test (> 2/3)
3-qubit QV circuits pass heavy output test (> 2/3)
4-qubit QV circuits pass heavy output test (> 2/3) — Quantum Volume 16

+1 more claims
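The heavy output test in the claims above can be sketched as follows. Heavy outputs are the bitstrings whose ideal probability exceeds the median ideal probability, and a circuit passes when more than 2/3 of the measured shots land on them. The published protocol additionally asks for statistical confidence across many random circuits, which this sketch omits; the probabilities and counts below are made-up illustration data, and the function names are hypothetical.

```python
from statistics import median


def heavy_outputs(ideal_probs):
    """Bitstrings whose ideal probability is above the median.

    ideal_probs must contain every bitstring, including those with
    probability zero, or the median will be skewed.
    """
    med = median(ideal_probs.values())
    return {bits for bits, p in ideal_probs.items() if p > med}


def heavy_output_probability(ideal_probs, counts):
    """Fraction of measured shots that fall in the heavy set."""
    heavy = heavy_outputs(ideal_probs)
    shots = sum(counts.values())
    return sum(n for bits, n in counts.items() if bits in heavy) / shots


# Illustrative 2-qubit example (not measured data).
ideal = {"00": 0.40, "01": 0.10, "10": 0.35, "11": 0.15}
counts = {"00": 380, "01": 120, "10": 330, "11": 170}

hop = heavy_output_probability(ideal, counts)
print(f"heavy output probability = {hop:.3f}, passes: {hop > 2 / 3}")
```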


Quantum approximate optimization of non-planar graph problems on a planar superconducting processor

Harrigan et al., Nature Physics 17, 332-336 (2021)

Google AI Quantum | 53-qubit Sycamore (Google)

100% (3/3 claims)
Backends: QI Emulator
PASS: 3, PARTIAL: 0
QAOA MaxCut at p=1 achieves approximation ratio > 0.5 (random)
QAOA performance improves from p=1 to p=2
SWAP compilation overhead degrades QAOA performance for non-native graphs
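For reference, the approximation ratio in the first claim is the mean cut value of the sampled bitstrings divided by the maximum cut. Below is a minimal sketch on a 4-node ring, where uniformly random bitstrings give exactly 0.5. The graph, the counts, and the convention that character i of a bitstring is node i are all assumptions for illustration (some SDKs reverse the bit order).

```python
from itertools import product


def cut_value(bits, edges):
    """Number of edges whose endpoints land on different sides of the cut."""
    return sum(1 for i, j in edges if bits[i] != bits[j])


def max_cut(n, edges):
    """Brute-force maximum cut; fine for the 3-6 node graphs used here."""
    return max(cut_value(bits, edges) for bits in product("01", repeat=n))


def approximation_ratio(counts, edges, n):
    """Mean sampled cut value divided by the optimal cut value."""
    shots = sum(counts.values())
    mean_cut = sum(cut_value(b, edges) * c for b, c in counts.items()) / shots
    return mean_cut / max_cut(n, edges)


ring = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Uniform sampling over all 16 bitstrings: the random-guess baseline.
uniform = {"".join(b): 1 for b in product("01", repeat=4)}
print(approximation_ratio(uniform, ring, 4))

# Sampling only the two optimal colourings.
optimal = {"0101": 50, "1010": 50}
print(approximation_ratio(optimal, ring, 4))
```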

Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets

Kandala et al., Nature 549, 242-246 (2017)

IBM Research | 6-qubit superconducting transmon

100% (4/4 claims)
Backends: QI Emulator, IBM Torino, QI Tuna-9
PASS: 4, PARTIAL: 0
H2 ground state energy at equilibrium (R~0.7 A)
H2 potential energy curve tracks FCI (d=1 sufficient)
Deeper ansatz (d=1,2,3) maintains chemical accuracy for H2 PES

+1 more claims


Evidence for the utility of quantum computing before fault tolerance

Kim et al., Nature 618, 500-505 (2023)

IBM Quantum | 127-qubit Eagle (ibm_kyiv)

100% (3/3 claims)
Backends: QI Emulator, IBM Torino, ibm_marrakesh, QI Tuna-9, tuna9_12edge, ibm_torino_9q_trex
PASS: 3
Unmitigated magnetization M_z decays monotonically with Trotter depth due to noise accumulation
ZNE error mitigation recovers ideal M_z at Clifford point (theta_h=0) across depths
ZNE error mitigation substantially improves accuracy over unmitigated results
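Zero-noise extrapolation (ZNE) measures an observable at several amplified noise levels and extrapolates back to zero noise. The sketch below uses polynomial (Richardson) extrapolation, one common variant; it is not confirmed to be the fit used in the paper or in these reproductions, and the noise scale factors and M_z values are invented for illustration.

```python
def richardson_extrapolate(scales, values):
    """Evaluate at zero the polynomial through the points (scale, value).

    Uses the Lagrange form: the estimate is sum_i w_i * y_i with
    w_i = prod_{j != i} x_j / (x_j - x_i).
    """
    if len(scales) != len(values):
        raise ValueError("scales and values must have the same length")
    if len(set(scales)) != len(scales):
        raise ValueError("noise scale factors must be distinct")
    estimate = 0.0
    for i, (xi, yi) in enumerate(zip(scales, values)):
        weight = 1.0
        for j, xj in enumerate(scales):
            if j != i:
                weight *= xj / (xj - xi)
        estimate += weight * yi
    return estimate


# Illustrative data: M_z shrinking as the noise is amplified 1x, 2x, 3x.
noise_scales = [1.0, 2.0, 3.0]
mz_measured = [0.89, 0.76, 0.61]

mz_mitigated = richardson_extrapolate(noise_scales, mz_measured)
print(f"unmitigated M_z = {mz_measured[0]:.2f}, extrapolated = {mz_mitigated:.2f}")
```

Higher-order fits amplify statistical noise in the inputs, which is one reason to keep the number of scale factors small.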

A variational eigenvalue solver on a photonic quantum processor

Peruzzo et al., Nature Communications 5, 4213 (2014)

Various (Bristol, MIT, Google) | Photonic quantum processor

100% (3/3 claims)
Backends: QI Emulator, IBM Torino, QI Tuna-9
PASS: 3, PARTIAL: 0
HeH+ ground state energy near equilibrium (R=0.75 A)
HeH+ potential energy curve matches FCI across bond distances
Symmetry verification improves noisy VQE

Error Mitigation by Symmetry Verification on a VQE

Sagastizabal et al., Phys. Rev. A 100, 010302(R) (2019)

QuTech / TU Delft | 2-qubit transmon (Starmon-5)

75% (3/4 claims)
Backends: QI Emulator, QI Tuna-9, IBM Torino
PASS: 3, PARTIAL: 1
H2 ground state energy at equilibrium (R=0.735 A)
Symmetry verification reduces VQE error vs raw noisy measurement
VQE achieves chemical accuracy (< 1.6 mHa) with error mitigation

+1 more claims
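In its simplest form, symmetry verification is post-selection: shots whose measured bitstring violates a known symmetry of the target state are discarded. The sketch below filters Z-basis counts by bit parity. It assumes the target state lives in the odd-parity sector and that the symmetry can be read off a computational-basis measurement; the counts are illustrative and the function names are hypothetical.

```python
def postselect_parity(counts, parity):
    """Keep only bitstrings whose number of 1s, mod 2, equals `parity`."""
    return {bits: n for bits, n in counts.items()
            if bits.count("1") % 2 == parity}


def kept_fraction(counts, kept):
    """Fraction of shots that survive post-selection."""
    return sum(kept.values()) / sum(counts.values())


# Illustrative 2-qubit counts: "00" and "11" are symmetry-violating here.
raw = {"01": 480, "10": 470, "00": 30, "11": 20}

verified = postselect_parity(raw, parity=1)
print(verified)
print(f"kept {kept_fraction(raw, verified):.0%} of shots")
```

Expectation values are then computed from the surviving counts only, at the cost of the discarded shots.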


A programmable two-qubit quantum processor in silicon

Watson et al., Nature 555, 633-637 (2018)

QuTech / TU Delft | Si/SiGe spin qubits (2 qubits)

100% (3/3 claims)
Backends: QI Emulator, QI Tuna-9
PASS: 3
Bell state fidelity from 3-axis tomography (ZZ, XX, YY correlators)
1-bit Deutsch-Jozsa algorithm correctly classifies all 4 oracles
2-qubit Grover search finds correct target for all 4 possible marked items
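The three-correlator fidelity estimate in the first claim rests on the identity |Φ+⟩⟨Φ+| = (II + XX − YY + ZZ)/4, so F = (1 + ⟨XX⟩ − ⟨YY⟩ + ⟨ZZ⟩)/4. The sketch below assumes the target is the Φ+ Bell state (other Bell states flip some signs) and that each correlator is estimated from counts taken after rotating both qubits into the relevant basis; the counts are illustrative only.

```python
def correlator(counts):
    """Two-qubit parity expectation: +1 for even parity, -1 for odd."""
    shots = sum(counts.values())
    even = sum(n for bits, n in counts.items() if bits.count("1") % 2 == 0)
    return (2 * even - shots) / shots


def bell_fidelity_phi_plus(zz, xx, yy):
    """Fidelity with |Phi+> = (|00> + |11>)/sqrt(2) from three correlators."""
    return (1 + zz + xx - yy) / 4


# Illustrative counts for the three measurement settings (not real data).
zz = correlator({"00": 480, "11": 470, "01": 30, "10": 20})
xx = correlator({"00": 470, "11": 460, "01": 40, "10": 30})
yy = correlator({"00": 40, "11": 40, "01": 460, "10": 460})

print(f"ZZ={zz:.2f} XX={xx:.2f} YY={yy:.2f}")
print(f"fidelity = {bell_fidelity_phi_plus(zz, xx, yy):.3f}")
```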

Paper Pipeline

Status | Paper | Authors | Topic | Qubits
Done | Error Mitigation by Symmetry Verification on a VQE | Sagastizabal et al. | VQE + Error Mitigation | 2q
Done | Hardware-efficient VQE for small molecules | Kandala et al. | VQE | 6q
Done | A variational eigenvalue solver on a photonic quantum processor | Peruzzo et al. | VQE (first) | 2q
Done | Validating quantum computers using randomized model circuits | Cross et al. | Quantum Volume | 5q
Done | Quantum approximate optimization of non-planar graph problems on a planar superconducting processor | Harrigan et al. | QAOA | 23q
Done | Evidence for the utility of quantum computing before fault tolerance | Kim et al. | Error Mitigation | 127q
Done | A programmable two-qubit quantum processor in silicon | Watson et al. | Spin Qubits | 2q
Planned | Universal control of a six-qubit quantum processor in silicon | Philips et al. | Spin Qubits | 6q

Methodology

Claim extraction. Published claims are identified from paper text, figures, and supplementary material. Each claim has a published value, error bars (when available), and a reference figure.
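One way to picture an extracted claim is as a small record carrying the fields listed above. The class below is a sketch, not the project's schema: the field names, the `matches` method, and the example values are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    description: str
    published_value: float
    error_bar: Optional[float]   # None when the paper gives no error bars
    reference_figure: str

    def matches(self, measured: float) -> Optional[bool]:
        """True if `measured` lies within the published error bars.

        Returns None when no error bar is available, since the
        comparison rule for that case is not specified here.
        """
        if self.error_bar is None:
            return None
        return abs(measured - self.published_value) <= self.error_bar


# Placeholder numbers for illustration only, not taken from any paper.
claim = Claim(
    description="example ground state energy",
    published_value=-1.00,
    error_bar=0.05,
    reference_figure="Fig. X",
)
print(claim.matches(-1.03))
print(claim.matches(-0.90))
```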

Circuit generation. An AI agent (Claude Opus 4.6) writes the quantum circuits, Hamiltonian construction, and measurement analysis code. The agent uses PennyLane, Qiskit, and OpenFermion depending on the paper's methodology.

Failure classification. Results are classified as: success (within published error bars), partial noise (qualitatively correct but degraded), noise dominated (hardware noise overwhelms signal), or structural failures (circuit translation, parameter mismatch, missing methodology detail).
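The four categories could be assigned by a rule like the one sketched below. The decision order is an assumption made for illustration: it treats a failure on the noiseless emulator as structural (noise cannot be the cause there) and splits hardware failures by whether the qualitative trend survives. The actual criteria used in the reports may differ.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    PARTIAL_NOISE = "partial noise"
    NOISE_DOMINATED = "noise dominated"
    STRUCTURAL = "structural failure"


def classify(emulator_ok, within_error_bars, trend_correct):
    """Assign a failure class to one claim on one hardware backend.

    emulator_ok:       claim reproduced on the noiseless emulator
    within_error_bars: hardware result within published error bars
    trend_correct:     hardware result qualitatively right (sign, ordering)
    """
    if not emulator_ok:
        # No hardware noise on the emulator, so the circuit or method is off.
        return Outcome.STRUCTURAL
    if within_error_bars:
        return Outcome.SUCCESS
    if trend_correct:
        return Outcome.PARTIAL_NOISE
    return Outcome.NOISE_DOMINATED


print(classify(True, False, True).value)
```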

The research question. What do the gaps between published results and AI-reproduced results reveal about reproducibility in quantum computing? The finding is not the pass/fail — it's the pattern of where and why things break.