Testing Published Quantum Results
We reproduce published quantum computing results using AI-written circuits on modern hardware. Some are full replications at original scale; others are small-scale reproductions that verify the underlying mechanisms.
Each paper is tested on up to four backends: a noiseless emulator (correctness baseline), QI Tuna-9 (9 superconducting qubits), IQM Garnet (20 qubits), and IBM Torino (133 qubits). Claims are compared quantitatively against published values.
- **7** papers tested (1 more planned)
- **24** claims tested (across 7 backends)
- **96%** pass rate (23/24 claims)
- **7** backends: QI Emulator, QI Tuna-9, IBM Torino, IQM Garnet, ibm_marrakesh, tuna9_12edge, ibm_torino_9q_trex
Three Chips, One Suite
Same circuits, different hardware. Each metric tested on QI Tuna-9 (9 superconducting qubits), IQM Garnet (20 qubits), and IBM Torino (133 qubits).
| Metric | QI Tuna-9 (9q) | IQM Garnet (20q) | IBM Torino (133q) |
|---|---|---|---|
| Bell fidelity | 93.5% | 98.1% | 86.5% |
| GHZ-3 fidelity | 88.9% | 93.9% | 82.9% |
| GHZ-5 fidelity | 83.8% | 81.8% | 76.6% |
| GHZ-10 fidelity | n/a | 54.7% | 62.2% |
| Quantum Volume | 16 | 32 | 32 |
| RB gate fidelity | 99.82% | 99.82% | 99.99%* |
| VQE H2 (kcal/mol) | 0.92 | -- | 0.22 |
| Dominant noise | Dephasing | Dephasing | Depolarizing |
* IBM's RB value is inflated by the transpiler collapsing Clifford sequences; the Tuna-9 and IQM values reflect true gate fidelity. VQE error mitigation: IBM Torino uses TREX, Tuna-9 uses hybrid PS+REM. -- = not yet tested.
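Fidelities like those in the table are typically estimated from measurement counts. For GHZ states a standard estimator is F = (P + C)/2, where P is the population of the two all-equal bitstrings and C is the coherence extracted from parity oscillations. A minimal sketch of the population term (the coherence measurement is omitted, and the function name is illustrative):

```python
def ghz_population(counts):
    """Population term P for a GHZ fidelity estimate F = (P + C) / 2:
    the fraction of shots landing in |00...0> or |11...1>.

    counts: dict mapping measured bitstrings to shot counts.
    """
    shots = sum(counts.values())
    n = len(next(iter(counts)))  # number of qubits, from bitstring length
    return (counts.get("0" * n, 0) + counts.get("1" * n, 0)) / shots
```

For example, GHZ-3 counts of `{"000": 450, "111": 430, "001": 60, "110": 60}` give P = 0.88, which bounds the fidelity before the coherence term is folded in.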
Results by Backend
Which papers pass on which hardware?
| Paper | Emulator | Tuna-9 | IBM Torino | IQM Garnet | Type |
|---|---|---|---|---|---|
| Cross 2019 (3/3 claims) — same scale, different hardware | PASS | PASS | PASS | PASS | QV + RB |
| Sagastizabal 2019 (4/4 claims) — same scale, different hardware | PASS | PASS | PASS | -- | VQE + EM |
| Kandala 2017 (5/5 claims) — H2 only (omits LiH, BeH2) | PASS | PASS | PASS | -- | VQE |
| Peruzzo 2014 (6/8 claims) — superconducting, not photonic | PASS | PARTIAL | FAIL | -- | VQE |
| Harrigan 2021 (4/4 claims) — 3-6 qubits (original: 23) | PASS | PASS | -- | -- | QAOA |
| Kim 2023 (3/3 claims) — 9 qubits (original: 127) | PASS | PARTIAL | PARTIAL | -- | Ising |
PASS = all tested claims within published error bars. PARTIAL = some claims pass, some fail due to hardware noise. FAIL = no claims pass on hardware. -- = not yet tested. Notes show scope relative to original paper.
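The roll-up from per-claim outcomes to the paper-level labels in the table can be sketched as follows (the function name and input shape are illustrative):

```python
def paper_status(claim_results):
    """Roll per-claim outcomes up to a paper-level label.

    claim_results: list of booleans, True = claim within published
    error bars. An empty list means not yet tested on this backend.
    """
    if not claim_results:
        return "--"          # not yet tested
    passed = sum(claim_results)
    if passed == len(claim_results):
        return "PASS"        # all tested claims within published error bars
    if passed > 0:
        return "PARTIAL"     # some claims pass, some fail
    return "FAIL"            # no claims pass
```

So Peruzzo 2014 on Tuna-9, with a mix of passing and failing claims, maps to PARTIAL, while its Torino run with no passing claims maps to FAIL.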
Do published results hold up?
We extract quantitative claims from papers and test whether AI-generated circuits produce matching numbers on noiseless emulators and real hardware. Some tests are at original scale; others are smaller-scale checks of the underlying mechanisms.
Where does hardware noise break things?
Emulator runs pass consistently. Hardware runs reveal the noise floor: which claims survive real-world decoherence, and which are swamped?
Can AI close the gap?
Failure mode classification tells us whether the gap is noise (mitigable), circuit translation (fixable), or missing methodology (structural).
Completed Reports
Validating quantum computers using randomized model circuits
Cross et al. — Phys. Rev. A 100, 032328 (2019)
IBM Research | IBM superconducting (various)
Quantum approximate optimization of non-planar graph problems on a planar superconducting processor
Harrigan et al. — Nature Physics 17, 332-336 (2021)
Google AI Quantum | 53-qubit Sycamore (Google)
Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets
Kandala et al. — Nature 549, 242-246 (2017)
IBM Research | 6-qubit superconducting transmon
Evidence for the utility of quantum computing before fault tolerance
Kim et al. — Nature 618, 500-505 (2023)
IBM Quantum | 127-qubit Eagle (ibm_kyiv)
A variational eigenvalue solver on a photonic quantum processor
Peruzzo et al. — Nature Communications 5, 4213 (2014)
Various (Bristol, MIT, Google) | Photonic quantum processor
Error Mitigation by Symmetry Verification on a VQE
Sagastizabal et al. — Phys. Rev. A 100, 010302(R) (2019)
QuTech / TU Delft | 2-qubit transmon (Starmon-5)
A programmable two-qubit quantum processor in silicon
Watson et al. — Nature 555, 633-637 (2018)
QuTech / TU Delft | Si/SiGe spin qubits (2 qubits)
Paper Pipeline
Methodology
Claim extraction. Published claims are identified from paper text, figures, and supplementary material. Each claim has a published value, error bars (when available), and a reference figure.
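A claim record of this shape might look like the following sketch (field names are assumptions for illustration, not the project's actual schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    """One extracted claim: a published value plus where it came from."""
    paper: str                    # e.g. "Cross 2019"
    name: str                     # e.g. "quantum_volume"
    published_value: float
    error_bar: Optional[float]    # None when the paper gives no error bars
    reference_figure: str         # figure/table the value was read from

# Hypothetical example entry:
qv = Claim("Cross 2019", "quantum_volume", 16.0, None, "Fig. 5")
```

Keeping the reference figure on each claim makes every pass/fail verdict traceable back to a specific spot in the paper.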
Circuit generation. An AI agent (Claude Opus 4.6) writes the quantum circuits, Hamiltonian construction, and measurement analysis code. The agent uses PennyLane, Qiskit, and OpenFermion depending on the paper's methodology.
Failure classification. Results are classified as: success (within published error bars), partial noise (qualitatively correct but degraded), noise dominated (hardware noise overwhelms the signal), or structural failure (circuit translation, parameter mismatch, missing methodology detail).
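The classification rule could be sketched as follows; the signature and the 50% degradation threshold are assumptions for illustration, not the project's actual criteria:

```python
def classify(measured, published, error_bar, emulator_ok,
             degradation_threshold=0.5):
    """Map one hardware result to a failure-mode category."""
    if not emulator_ok:
        # Fails even without noise: circuit translation, parameter
        # mismatch, or missing methodology detail.
        return "structural"
    deviation = abs(measured - published)
    if error_bar is not None and deviation <= error_bar:
        return "success"
    if published != 0 and deviation / abs(published) < degradation_threshold:
        return "partial_noise"    # qualitatively correct but degraded
    return "noise_dominated"      # noise overwhelms the signal
```

The emulator run acts as the control: a claim that fails on a noiseless emulator can never be blamed on hardware, which is what separates structural failures from the two noise categories.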
The research question. What do the gaps between published results and AI-reproduced results reveal about reproducibility in quantum computing? The finding is not the pass/fail — it's the pattern of where and why things break.