TU Delft / QuTech — Open Research Initiative

How might AI accelerate quantum?

Using natural language, you can ask AI agents to run real quantum experiments on real hardware. This is the era of Quantum Vibecoding.

Get started vibecoding on real quantum hardware in under 15 minutes

Try it yourself | Read the paper

hAIqu—
AI as the interface between humans & quantum

The Question

Quantum vibecoding?

Can AI actually support quantum computing work? We tested this by connecting Claude Code to real quantum hardware through MCP servers. You describe the experiment in natural language; the agent derives Hamiltonians (energy models for molecules), writes circuits, submits them to real chips, and analyzes the results.
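As an illustration of the final analysis step, the sketch below turns measurement counts into Pauli-Z expectation values and combines them into an energy estimate. It is a minimal sketch, not the project's code: the function names, the counts, and the Hamiltonian coefficients are hypothetical, and bitstrings are assumed to follow Qiskit's convention that the rightmost character is qubit 0.

```python
def z_expectation(counts, qubits):
    """Estimate <Z...Z> on the given qubits from a bitstring -> shots mapping."""
    total = sum(counts.values())
    value = 0
    for bitstring, shots in counts.items():
        # Rightmost character is qubit 0 (assumed Qiskit-style ordering).
        ones = sum(bitstring[-1 - q] == "1" for q in qubits)
        value += shots if ones % 2 == 0 else -shots
    return value / total


def energy(counts, terms, offset=0.0):
    """Sum coefficient * <Z-string> over Z-type terms, plus a constant offset."""
    return offset + sum(c * z_expectation(counts, qs) for qs, c in terms)


# Hypothetical counts and coefficients, for illustration only.
counts = {"00": 480, "11": 520}
terms = [((0,), 0.5), ((1,), -0.5), ((0, 1), 0.25)]

zz = z_expectation(counts, (0, 1))   # parity of both qubits
z0 = z_expectation(counts, (0,))     # qubit 0 alone
e = energy(counts, terms, offset=-1.0)
```

Only Z-type terms are handled here. Terms containing X or Y would need a basis rotation before measurement, which this sketch leaves out.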

445 sessions. 349 prompts. 3 quantum chips. 0 lines of quantum code by hand.

> Replicate Sagastizabal 2019 on IBM Torino. Try every error mitigation strategy and rank them.

TREX (readout error correction) reaches an error of 0.22 kcal/mol, a 119x improvement over the raw result. Stacking more mitigation on top makes the result worse.
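The idea behind this kind of readout correction can be sketched for a single qubit. The example below assumes the twirled-readout scheme (randomly applying X before measurement and flipping the recorded bit), which turns an asymmetric readout error into a pure rescaling of the Z expectation value that a calibration run on the |0> state can divide out. The flip probabilities and function names are hypothetical, and this toy model is not the implementation used in the experiment.

```python
def measured_z(z, p01, p10):
    """<Z> seen through a readout that flips 0->1 with p01 and 1->0 with p10."""
    return (p10 - p01) + z * (1 - p01 - p10)


def twirled_z(z, p01, p10):
    """Average over plain shots and X-twirled shots (state flipped, bit flipped back)."""
    plain = measured_z(z, p01, p10)
    flipped = -measured_z(-z, p01, p10)
    return 0.5 * (plain + flipped)


def mitigate(noisy, calibration):
    """Divide out the rescaling factor measured on the |0> state."""
    return noisy / calibration


# Hypothetical readout error rates and ideal expectation value.
p01, p10, ideal = 0.01, 0.05, 0.5

raw = measured_z(ideal, p01, p10)        # biased by the asymmetric error
noisy = twirled_z(ideal, p01, p10)       # bias removed, value only rescaled
calibration = twirled_z(1.0, p01, p10)   # rescaling factor from |0>
mitigated = mitigate(noisy, calibration)
```

In this toy model the raw value is shifted away from the ideal one, while the twirled value divided by the calibration factor recovers it. Real devices add shot noise and correlated errors that the sketch ignores.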

The workflow

5 prompt patterns that emerged from 445 sessions

349 prompts, annotated

Every prompt and what it discovered

Try it yourself

Claude Code + MCP servers for 3 quantum backends

8 silent bugs

Traps that produce wrong results without errors

The Evidence

And it actually works

AI agents replicated 6 landmark quantum papers on real hardware and set a new state-of-the-art on quantum code generation.

93% of claims pass

27 claims tested across 6 landmark papers, 3 quantum chips.

Benchmark Review

Can LLMs write quantum code? We tested 12 models on Qiskit HumanEval — 151 hand-verified quantum programming tasks (circuit construction, transpilation, error mitigation, VQE).

56.3%: QSpark (best fine-tuned specialist)
58.9%: Claude 3.5 Sonnet (general-purpose, zero-shot)
70.9%: Claude Opus 4.6 (general-purpose + RAG)
79.5%: 5 frontier models (ensemble ceiling)
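For scale, each percentage can be converted to an approximate number of solved tasks. This assumes every score is the fraction of the 151 tasks passed and that scores pair with models in the order they appear, neither of which the page states explicitly. The resulting counts are inferred here and are not reported figures.

```python
TOTAL_TASKS = 151

# Scores in percent, paired with models in the order listed (an assumption).
scores = {
    "QSpark": 56.3,
    "Claude 3.5 Sonnet": 58.9,
    "Claude Opus 4.6 + RAG": 70.9,
    "5-model ensemble": 79.5,
}


def solved_tasks(percent, total=TOTAL_TASKS):
    """Nearest whole number of tasks matching a pass rate in percent."""
    return round(percent / 100 * total)


solved = {name: solved_tasks(pct) for name, pct in scores.items()}

# Check that each inferred count rounds back to the listed percentage.
consistent = all(
    round(100 * solved[name] / TOTAL_TASKS, 1) == pct
    for name, pct in scores.items()
)
```

Under these assumptions the gap between the best specialist and the RAG-assisted model is a little over twenty tasks out of 151.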

General-purpose frontier models beat every fine-tuned quantum specialist — zero-shot, with no quantum training data. Adding dynamic RAG (feeding relevant Qiskit docs at inference) pushes accuracy to 70.9%.
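The retrieval step can be sketched in a few lines. This is a toy illustration of the general RAG pattern using keyword overlap; the benchmark's actual retriever, document corpus, and prompt format are not described here, and the function names and prompt template are hypothetical. The three snippets paraphrase standard Qiskit calls.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into a set of word-like tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(query, docs, k=2):
    """Return the k snippets sharing the most tokens with the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(task, docs, k=2):
    """Prepend the retrieved snippets to the task (hypothetical template)."""
    context = "\n".join(f"- {d}" for d in retrieve(task, docs, k))
    return f"Relevant documentation:\n{context}\n\nTask:\n{task}"


# Paraphrased documentation snippets, for illustration only.
DOCS = [
    "QuantumCircuit.h(qubit) applies a Hadamard gate to one qubit.",
    "QuantumCircuit.cx(control, target) applies a CNOT gate.",
    "transpile(circuit, backend) rewrites a circuit for a target backend.",
]

task = "Put qubit 0 in superposition with a Hadamard gate"
top = retrieve(task, DOCS, k=1)
prompt = build_prompt(task, DOCS, k=1)
```

A production retriever would more likely use embeddings than raw token overlap, but the shape is the same: rank documentation against the task, then place the best matches in the prompt before the model answers.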

See the replications → | Benchmark review → | Download the paper (PDF) →


Learning & Education

Making quantum intuitive

AI-generated visualizations, simulations, and interactive tools that build new mental models for quantum computing — designed for learners, not just experts.