Research

AI agents replicate quantum papers on real hardware.

Can AI systematically reproduce quantum computing experiments? We tested 27 claims from 6 landmark papers across 3 quantum processors. 93% pass. The gaps between published results and AI-reproduced results are the finding.

6 papers replicated, 100+ experiments, 3 hardware platforms

Paper Replications

H2 VQE error mitigation: 4/4 pass

Sagastizabal et al. 2019

IBM TREX: 0.22 kcal/mol

Hardware-efficient VQE: 5/5 pass

Kandala et al. 2017

Chemical accuracy on 3 configs

First VQE (HeH+): 7/9 pass

Peruzzo et al. 2014

Coefficient amplification discovered

Quantum Volume: 3/3 pass

Cross et al. 2019

QV=32 on IBM & IQM

QAOA MaxCut: 4/4 pass

Harrigan et al. 2021

74.1% approx ratio on Tuna-9

Kicked Ising / utility: 3/3 pass

Kim et al. 2023

9-qubit, 180 CZ gates, 14.1x ZNE

View all replications with detailed claims →
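For readers unfamiliar with the "approx ratio" figure quoted for QAOA MaxCut, the sketch below shows one common way such a number can be computed from measured bitstring counts: the mean cut value divided by the best possible cut. The graph, the counts, and the function names are illustrative assumptions, not the project's data or code, and the sketch assumes character i of each bitstring belongs to node i (some SDKs reverse this order).

```python
from itertools import product

def cut_value(bits, edges):
    """Number of edges whose endpoints land on opposite sides of the cut."""
    return sum(1 for u, v in edges if bits[u] != bits[v])

def approximation_ratio(counts, edges, n_nodes):
    """Mean measured cut divided by the brute-force optimal cut."""
    shots = sum(counts.values())
    mean_cut = sum(cut_value(b, edges) * c for b, c in counts.items()) / shots
    # Brute force is fine for small graphs (2**n_nodes assignments).
    best_cut = max(cut_value(bits, edges) for bits in product("01", repeat=n_nodes))
    return mean_cut / best_cut

# Hypothetical example: a triangle graph and made-up counts.
triangle = [(0, 1), (1, 2), (0, 2)]
example_counts = {"010": 600, "000": 400}
ratio = approximation_ratio(example_counts, triangle, n_nodes=3)
print(f"approximation ratio: {ratio:.3f}")
```

Brute-force search for the optimal cut only scales to small graphs, which is adequate for a 9-qubit device.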

Key Findings

0.22 kcal/mol

Chemical accuracy on real hardware

TREX readout mitigation on IBM Torino reaches 0.22 kcal/mol, a 119x improvement over the raw result. The simplest mitigation wins.

|g1|/|g4|

Coefficient amplification predicts error

For H2, the ratio |g1|/|g4| is 4.4 and the error is 0.22 kcal/mol. For HeH+, the ratio is 7.8 and the error is 4.45 kcal/mol. A 1.8x larger ratio comes with a 20x larger error.
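The arithmetic behind that comparison can be checked in a few lines. The sketch below uses only the four figures quoted above; the 1.0 kcal/mol chemical-accuracy threshold is the conventional value and is an assumption about how the pass/fail line is drawn here.

```python
# Figures quoted above: |g1|/|g4| ratio and energy error (kcal/mol) per molecule.
molecules = {
    "H2": {"ratio": 4.4, "error_kcal_mol": 0.22},
    "HeH+": {"ratio": 7.8, "error_kcal_mol": 4.45},
}

# Conventional chemical-accuracy threshold (assumed, not taken from the page).
CHEMICAL_ACCURACY_KCAL_MOL = 1.0

ratio_growth = molecules["HeH+"]["ratio"] / molecules["H2"]["ratio"]
error_growth = molecules["HeH+"]["error_kcal_mol"] / molecules["H2"]["error_kcal_mol"]

within_accuracy = {
    name: m["error_kcal_mol"] <= CHEMICAL_ACCURACY_KCAL_MOL
    for name, m in molecules.items()
}

print(f"coefficient ratio grows {ratio_growth:.1f}x")  # 1.8x
print(f"energy error grows {error_growth:.0f}x")       # 20x
print(within_accuracy)
```

With only two molecules this shows a steep, faster-than-linear trend, not a fitted law.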

9q > 20q

Topology beats scale

Tuna-9 (9 qubits) beats IQM Garnet (20 qubits) on GHZ-5: 83.8% vs 81.8%. Knowing your chip matters more than its size.

>80% readout

Most error is readout, not gates

ZNE failed on both backends. Gate folding adds <1.3 kcal/mol. Readout correction is what works.
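As a rough illustration of what readout correction does, here is a minimal single-qubit sketch based on inverting a 2x2 confusion matrix. This is plain matrix-inversion mitigation, not TREX and not the project's code. The function name `correct_readout` and the error rates are made up for the example.

```python
def correct_readout(counts, p_1_given_0, p_0_given_1):
    """Undo single-qubit readout error by inverting the confusion matrix.

    p_1_given_0: probability of reading 1 when the qubit was in 0
    p_0_given_1: probability of reading 0 when the qubit was in 1
    Returns the estimated true probabilities (p0, p1).
    """
    shots = counts.get("0", 0) + counts.get("1", 0)
    m0 = counts.get("0", 0) / shots
    m1 = counts.get("1", 0) / shots
    # Confusion matrix M = [[1 - p_1_given_0, p_0_given_1],
    #                       [p_1_given_0,     1 - p_0_given_1]]
    # with measured = M @ true; apply the closed-form 2x2 inverse.
    det = 1.0 - p_1_given_0 - p_0_given_1
    p0 = ((1.0 - p_0_given_1) * m0 - p_0_given_1 * m1) / det
    p1 = (-p_1_given_0 * m0 + (1.0 - p_1_given_0) * m1) / det
    return p0, p1

# Hypothetical calibration: 2% of 0s read as 1, 8% of 1s read as 0.
# A qubit prepared in |1> would then show roughly these raw counts.
raw_counts = {"0": 80, "1": 920}
p0, p1 = correct_readout(raw_counts, p_1_given_0=0.02, p_0_given_1=0.08)

raw_z = (raw_counts["0"] - raw_counts["1"]) / 1000
corrected_z = p0 - p1
print(f"raw <Z> = {raw_z:+.2f}, corrected <Z> = {corrected_z:+.2f}")
```

On real data the inverted probabilities can fall slightly outside [0, 1] because of shot noise, so practical implementations usually clip or use a constrained fit.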

100+ Experiments

15+ Bell states

Entanglement benchmarking across qubit pairs

12+ GHZ states

3-50 qubit multipartite entanglement

40+ VQE chemistry

H2 and HeH+ energy estimation with mitigation

8+ QAOA MaxCut

Combinatorial optimization on hardware

20+ Benchmarks

RB, QV, connectivity probes, characterization

10+ QEC

[[4,2,2]] detection code, NN decoders

Browse all experiments with raw data →

Three Chips, One Suite

QI Tuna-9

Qubits: 9

QV: 8

Bell best: 93.5%

VQE best: 0.92 kcal/mol

Best small-scale fidelity

IQM Garnet

Qubits: 20

QV: 32

Bell best: 98.1%

VQE best: n/a

Highest Bell fidelity

IBM Torino

Qubits: 133

QV: 32

Bell best: 86.5%

VQE best: 0.22 kcal/mol

Best VQE with TREX

Full platform comparison →

Explore the Research

Paper Replications

6 papers, 27 claims tested across 3 chips. Every claim documented.

Experiment Dashboard

100+ experiments with raw counts, analysis, and circuit details.

Platform Comparison

Tuna-9 vs Garnet vs Torino. Bell, GHZ, QV, VQE head-to-head.

Methodology

349 prompts from 445 sessions. The 5-phase workflow that emerged.

Research Blog

14 posts: mitigation showdowns, topology maps, noise forensics.

Quantum VibeCoding

The method that made all this possible.

Data & Reproducibility

All raw data, circuits, and analysis scripts are open on GitHub. Every result file uses schema-versioned JSON with SHA256 checksums for raw counts and circuits.
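A checksum check of that kind might look like the sketch below. The field names (`schema_version`, `raw_counts`, `checksums`, `raw_counts_sha256`) and the canonical serialization (sorted keys, compact separators) are assumptions for illustration. The repository's actual schema may differ.

```python
import hashlib
import json

def canonical_sha256(obj):
    """SHA256 of a deterministic JSON serialization (sorted keys, no spaces)."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def verify_result(result):
    """Return True if the stored checksum matches the raw counts."""
    expected = result["checksums"]["raw_counts_sha256"]  # hypothetical field name
    return canonical_sha256(result["raw_counts"]) == expected

# Hypothetical result file content with made-up Bell-state counts.
raw_counts = {"00": 480, "01": 35, "10": 30, "11": 455}
result = {
    "schema_version": "1.0",
    "raw_counts": raw_counts,
    "checksums": {"raw_counts_sha256": canonical_sha256(raw_counts)},
}

# A single altered count should fail verification.
tampered = dict(result, raw_counts=dict(raw_counts, **{"00": 481}))

print(verify_result(result), verify_result(tampered))
```

Hashing a canonical serialization instead of the file bytes keeps the checksum stable across whitespace or key-order changes.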

Read the paper (PDF) — outline on GitHub

Environment

Python 3.12, Qiskit 2.1.2, PennyLane 0.44, QI SDK 3.5.1

Hardware

IBM Torino: 133 qubits, QI Tuna-9: 9 qubits, IQM Garnet: 20 qubits
