TU Delft / QuTech — Open Research Initiative

How might AI accelerate quantum?

Using natural language, you can ask AI agents to run real quantum experiments on real hardware. This is the era of quantum vibecoding.

Get started vibecoding on real quantum hardware in under 15 minutes

hAIqu
AI as the interface between humans & quantum

The Question

Quantum vibecoding?

Can AI actually support quantum computing work? We tested this by connecting Claude Code to real quantum hardware through MCP (Model Context Protocol) servers. You describe the experiment in natural language; the agent derives the Hamiltonian (the energy model of the molecule), writes the circuits, submits them to real chips, and analyzes the results.
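To make the "analyzes the results" step concrete, here is a minimal sketch of how an energy estimate can be assembled from measurement counts: each Pauli term's expectation value is a parity average over the measured bitstrings, and the energy is the coefficient-weighted sum. The function names, the counts, and the coefficients below are illustrative assumptions, not the values or code used in the project, and bit positions are taken as plain string indices.

```python
from typing import Dict, Tuple

def parity_expectation(counts: Dict[str, int], positions: Tuple[int, ...]) -> float:
    """Average of (-1)^(parity of the selected bits) over all shots."""
    shots = sum(counts.values())
    signed = 0
    for bitstring, n in counts.items():
        parity = sum(int(bitstring[p]) for p in positions) % 2
        signed += n if parity == 0 else -n
    return signed / shots

def energy(coeffs: Dict[str, float], expectations: Dict[str, float]) -> float:
    """Weighted sum of Pauli expectation values: E = sum_i c_i * <P_i>."""
    return sum(c * expectations[term] for term, c in coeffs.items())

# Hypothetical counts: one run measured in the Z basis, one in the X basis
# (i.e. with a Hadamard on each qubit before measurement).
z_counts = {"01": 900, "10": 50, "00": 30, "11": 20}
x_counts = {"00": 300, "11": 250, "01": 230, "10": 220}

# Placeholder coefficients, NOT the H2 Hamiltonian from any paper.
coeffs = {"II": -0.5, "ZI": 0.3, "IZ": -0.3, "ZZ": 0.1, "XX": 0.2}

expectations = {
    "II": 1.0,
    "ZI": parity_expectation(z_counts, (0,)),
    "IZ": parity_expectation(z_counts, (1,)),
    "ZZ": parity_expectation(z_counts, (0, 1)),
    "XX": parity_expectation(x_counts, (0, 1)),
}

estimated_energy = energy(coeffs, expectations)
```

Terms that do not commute qubit-wise (here the Z-type terms and XX) need separate measurement settings, which is why the sketch uses two count histograms.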

445 sessions. 349 prompts. 3 quantum chips. 0 lines of quantum code by hand.

> Replicate Sagastizabal 2019 on IBM Torino. Try every error mitigation strategy and rank them.

TREX (readout error correction) achieves an energy error of 0.22 kcal/mol, a 119x improvement over the raw result. Adding further mitigation on top makes the result worse.
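The idea behind twirled readout mitigation can be sketched with a simplified single-qubit model in the infinite-shot limit. This is an illustration under stated assumptions, not the pipeline used in the experiment: `p01` and `p10` are hypothetical misread probabilities (reading 1 when 0 was prepared, and reading 0 when 1 was prepared), and all function names are made up for this example. Randomly flipping the qubit before measurement and undoing the flip classically cancels the bias term, leaving a pure rescaling that a calibration run on the |0> state can measure and divide out.

```python
def noisy_z(z_true: float, p01: float, p10: float) -> float:
    """<Z> as seen through an asymmetric readout channel (no twirling)."""
    return (p10 - p01) + (1.0 - p01 - p10) * z_true

def twirled_z(z_true: float, p01: float, p10: float) -> float:
    """Average over 'no flip' and 'X flip, then negate the outcome'."""
    plain = noisy_z(z_true, p01, p10)
    flipped = -noisy_z(-z_true, p01, p10)
    return 0.5 * (plain + flipped)

def mitigate(z_twirled: float, calibration_factor: float) -> float:
    """Divide out the readout attenuation measured on the |0> state."""
    return z_twirled / calibration_factor

# Hypothetical error rates and true expectation value.
p01, p10 = 0.02, 0.08
z_true = 0.5

raw = noisy_z(z_true, p01, p10)           # biased and attenuated
twirled = twirled_z(z_true, p01, p10)     # attenuated only
factor = twirled_z(1.0, p01, p10)         # calibration: |0> has <Z> = 1
recovered = mitigate(twirled, factor)     # matches z_true in this model
```

On hardware the division amplifies shot noise, so the recovered value carries a larger statistical uncertainty than the raw one; the model above ignores this.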

The Evidence

And it actually works

AI agents replicated 6 landmark quantum papers on real hardware and set a new state-of-the-art on quantum code generation.

93% of claims pass

27 claims tested across 6 landmark papers and 3 quantum chips.

Benchmark Review

Can LLMs write quantum code? We tested 12 models on Qiskit HumanEval — 151 hand-verified quantum programming tasks (circuit construction, transpilation, error mitigation, VQE).

QSpark (best fine-tuned specialist): 56.3%
Claude 3.5 Sonnet (general-purpose, zero-shot): 58.9%
Claude Opus 4.6 (general-purpose + RAG): 70.9%
5 frontier models (ensemble ceiling): 79.5%
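The four scores above can be read as pass counts over the 151 tasks. The sketch below converts each percentage to the nearest whole number of tasks and shows one plausible reading of "ensemble ceiling": a task counts as solved if at least one model in the ensemble passes it. That definition, the helper names, and the toy pass sets are assumptions made for illustration.

```python
from typing import Dict, Set

TOTAL_TASKS = 151  # size of the benchmark as stated in the text

def implied_passes(percent: float, total: int = TOTAL_TASKS) -> int:
    """Nearest whole number of passed tasks for a reported percentage."""
    return round(percent / 100 * total)

def ensemble_ceiling(passed_by_model: Dict[str, Set[int]], total: int) -> float:
    """Percentage of tasks passed by at least one model (union of pass sets)."""
    solved = set().union(*passed_by_model.values())
    return 100 * len(solved) / total

reported = {
    "QSpark": 56.3,
    "Claude 3.5 Sonnet": 58.9,
    "Claude Opus 4.6 + RAG": 70.9,
    "5 frontier models": 79.5,
}
pass_counts = {name: implied_passes(pct) for name, pct in reported.items()}

# Toy example with hypothetical task IDs, not benchmark data.
toy = {"model_a": {1, 2, 3}, "model_b": {2, 4}}
toy_ceiling = ensemble_ceiling(toy, total=5)
```

Under this reading the ceiling is an upper bound: it assumes a perfect selector that always picks whichever model's answer passes.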

General-purpose frontier models beat every fine-tuned quantum specialist — zero-shot, with no quantum training data. Adding dynamic RAG (feeding relevant Qiskit docs at inference) pushes accuracy to 70.9%.
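As a rough illustration of dynamic RAG, the sketch below picks the documentation snippets most relevant to a task and prepends them to the prompt. Keyword overlap stands in for whatever retriever the benchmark actually used (which the text does not specify), and the snippets, function names, and prompt layout are all hypothetical.

```python
import re
from typing import List, Set

def tokenize(text: str) -> Set[str]:
    """Lowercase word tokens, used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def retrieve(task: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k snippets sharing the most tokens with the task."""
    task_tokens = tokenize(task)
    ranked = sorted(docs, key=lambda d: len(task_tokens & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(task: str, docs: List[str], k: int = 2) -> str:
    """Prepend the retrieved snippets to the task description."""
    context = "\n".join(f"- {d}" for d in retrieve(task, docs, k))
    return f"Relevant documentation:\n{context}\n\nTask:\n{task}\n"

# Hypothetical snippets written for this example, not quoted from real docs.
DOCS = [
    "transpile: map a circuit to the gates and qubit layout of a target backend",
    "hadamard gate: put a qubit into an equal superposition",
    "measure: read a qubit into a classical bit",
    "estimator: compute expectation values of observables",
]

task = "Transpile a circuit with a Hadamard gate for a backend"
prompt = build_prompt(task, DOCS)
```

"Dynamic" here means retrieval runs per task at inference time, so each prompt carries only the snippets relevant to that task rather than a fixed documentation dump.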
