
Google Willow: 105-Qubit Chip Achieves New Quantum Supremacy Milestone

Google Quantum AI

Google's Willow processor completed a random circuit sampling benchmark in under 5 minutes that would require an estimated 10^25 years on one of the world's fastest classical supercomputers. More importantly, it demonstrated below-threshold quantum error correction: logical error rates decreased as the surface code distance increased from d=3 to d=7.

Key Outcome
Willow achieved QV > 1000 and demonstrated below-threshold error correction scaling; logical error rate decreased as code distance increased from d=3 to d=7, a critical milestone for fault-tolerant quantum computing.

The Willow Processor

Google Quantum AI announced the Willow chip in December 2024. The 105-qubit superconducting processor is a major advance over the 2019 Sycamore chip (53 working qubits), which made the first claim of quantum computational advantage. Willow's qubits are laid out on a 2D grid and have improved coherence times: T1 (energy relaxation) of approximately 100 microseconds and T2 (dephasing) of approximately 150 microseconds, roughly 5x improvements over Sycamore.
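To see why longer T1 and T2 matter, a common first-order estimate of the average error a single qubit accumulates during a gate of duration t, from energy relaxation and dephasing alone, is t/(6 T1) + t/(3 T2), valid when t is much shorter than T1 and T2. The sketch below applies it to the coherence times quoted above. The 25 ns gate duration is a hypothetical value, and the five-fold-shorter comparison is illustrative rather than a measured Sycamore figure.

```python
def coherence_limited_error(gate_time, t1, t2):
    """Approximate average error of one single-qubit gate limited only by
    T1 (energy relaxation) and T2 (dephasing), valid for gate_time << T1, T2.
    First-order result: error ~= gate_time / (6 * T1) + gate_time / (3 * T2).
    """
    return gate_time / (6 * t1) + gate_time / (3 * t2)

# T1 and T2 follow the text; the 25 ns gate duration is a hypothetical value.
gate_time = 25e-9
err_willow = coherence_limited_error(gate_time, t1=100e-6, t2=150e-6)

# Illustrative comparison: the same gate with T1 and T2 both five times shorter.
err_short = coherence_limited_error(gate_time, t1=20e-6, t2=30e-6)

print(f"Coherence-limited error, T1=100 us, T2=150 us: {err_willow:.2e}")
print(f"Coherence-limited error, T1=20 us,  T2=30 us:  {err_short:.2e}")
```

Because the estimate is linear in 1/T1 and 1/T2, a 5x improvement in both coherence times cuts the coherence-limited error by the same factor; real gate errors also include control and crosstalk contributions that this formula ignores.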

The two-qubit gate fidelity on Willow reached 99.7% on average across the chip, with single-qubit gates at 99.9%. These improvements were achieved through redesigned transmon qubit geometry, improved superconducting resonator fabrication, and better isolation between neighboring qubits using tunable couplers that can be switched off between gate operations to suppress residual ZZ coupling.
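The effect of these fidelities on a whole circuit can be estimated with a simple digital error model, which multiplies the success probabilities of every operation and assumes errors are independent. The sketch below uses the average error rates implied by the figures above (0.1% single-qubit, 0.3% two-qubit). The circuit size (105 qubits, 20 cycles, one single-qubit gate per qubit and 52 two-qubit gates per cycle) is a hypothetical stand-in, and measurement errors are left out.

```python
def digital_error_model_fidelity(n_1q, n_2q, e_1q, e_2q, n_meas=0, e_meas=0.0):
    """Estimate whole-circuit fidelity as the product of per-operation success
    probabilities, assuming independent (uncorrelated) errors."""
    return (1 - e_1q) ** n_1q * (1 - e_2q) ** n_2q * (1 - e_meas) ** n_meas

# Hypothetical circuit size; real chip layouts and gate counts differ.
n_qubits, n_cycles = 105, 20
n_1q = n_qubits * n_cycles
n_2q = (n_qubits // 2) * n_cycles

# Average error rates from the fidelities quoted in the text.
f_est = digital_error_model_fidelity(n_1q, n_2q, e_1q=0.001, e_2q=0.003)
print(f"Estimated circuit fidelity (0.3% two-qubit error): {f_est:.4f}")

# Same circuit with a Sycamore-era two-qubit error of about 0.6% (99.4% fidelity).
f_old = digital_error_model_fidelity(n_1q, n_2q, e_1q=0.001, e_2q=0.006)
print(f"Estimated circuit fidelity (0.6% two-qubit error): {f_old:.2e}")
```

Because fidelity decays exponentially in the number of gates, halving the two-qubit error rate changes the whole-circuit fidelity by far more than a factor of two.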

Willow’s announcement makes two separate claims: a new random circuit sampling (RCS) supremacy benchmark result and, more important in the long term, the first experimental demonstration of below-threshold quantum error correction scaling on surface codes.

Random Circuit Sampling Benchmark

The RCS benchmark asks the quantum computer to sample from the output distribution of a deep random quantum circuit. Classically verifying that the output is correct requires simulating the circuit, which becomes exponentially hard as circuit width and depth grow. Google’s team estimated that reproducing the output of Willow’s 105-qubit RCS circuit at the fidelity the chip achieved would take approximately 10^25 years on Frontier, one of the world’s fastest classical supercomputers.
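One way to see the exponential growth in classical cost is the memory a brute-force state-vector simulator needs: 2^n complex amplitudes for n qubits. This sketch assumes single-precision complex amplitudes (8 bytes each). Tensor network methods avoid storing the full vector, so the numbers illustrate the brute-force approach only, not the cost of the best known classical algorithm.

```python
def state_vector_bytes(n_qubits, bytes_per_amplitude=8):
    """Memory to store a full n-qubit state vector. 8 bytes per amplitude
    corresponds to complex64 (two float32 values); use 16 for complex128."""
    return (2 ** n_qubits) * bytes_per_amplitude

# 10 qubits (the demo below), 53 (Sycamore), 105 (Willow)
for n in (10, 53, 105):
    print(f"{n:>3} qubits: {state_vector_bytes(n):.3e} bytes")
```

Each added qubit doubles the memory, so going from 53 to 105 qubits multiplies it by 2^52.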

import numpy as np
import cirq
from collections import Counter

# Construct a small RCS-style random circuit for illustration
# Full Willow RCS uses 105 qubits and ~20 cycles of 2-qubit gates

def build_rcs_circuit(n_qubits=10, n_cycles=5, seed=42):
    """
    Random Circuit Sampling circuit: alternating layers of
    single-qubit random gates and two-qubit iSWAP-like gates.
    Measurements are not included, so the same circuit can be
    simulated exactly to obtain ideal output probabilities.
    """
    rng = np.random.default_rng(seed)
    qubits = cirq.LineQubit.range(n_qubits)
    circuit = cirq.Circuit()

    # Single-qubit gate set from Google's 2019 Sycamore RCS experiment:
    # sqrt(X), sqrt(Y) and sqrt(W), where W = (X + Y) / sqrt(2)
    sq_gates = [
        cirq.X**0.5,
        cirq.Y**0.5,
        cirq.PhasedXPowGate(phase_exponent=0.25, exponent=0.5),
    ]

    for cycle in range(n_cycles):
        # Layer of random single-qubit gates (draw an index so numpy
        # never tries to convert the gate objects into an array)
        circuit.append(
            sq_gates[rng.integers(len(sq_gates))](q) for q in qubits
        )
        # Layer of two-qubit gates on alternating pairs
        pairs = (
            list(zip(qubits[::2], qubits[1::2]))
            if cycle % 2 == 0
            else list(zip(qubits[1::2], qubits[2::2]))
        )
        for q0, q1 in pairs:
            circuit.append(cirq.SQRT_ISWAP(q0, q1))

    return circuit

n_qubits, shots = 10, 1000
qubits = cirq.LineQubit.range(n_qubits)
rcs = build_rcs_circuit(n_qubits=n_qubits, n_cycles=5)
print(f"RCS circuit depth: {len(rcs)}")
print(f"Two-qubit gate count: {sum(1 for op in rcs.all_operations() if len(op.qubits) == 2)}")

# Sample with cirq (exact state vector, feasible for 10 qubits)
measured = rcs.copy()
measured.append(cirq.measure(*qubits, key='m'))
sim = cirq.Simulator()
result = sim.run(measured, repetitions=shots)
counts = result.measurements['m']
# Convert each shot to a bitstring; qubit 0 is the most significant bit
samples = [''.join(str(int(b)) for b in row) for row in counts]
hist = Counter(samples)
print(f"\nSampled {len(hist)} unique bitstrings from {shots} shots")
print(f"Top 5 bitstrings: {hist.most_common(5)}")

# Linear cross-entropy benchmarking (XEB) fidelity estimate:
# F_XEB = 2^n * mean(P_ideal(x_i)) - 1 over the sampled bitstrings x_i.
# This simulator is noiseless, so its score is of order 1, while uniformly
# random bitstrings score ~0. For the full 105-qubit Willow circuit, the
# figures cited here put the XEB fidelity small but positive (> 0.001).
# A positive value only shows the hardware retains signal; the advantage
# claim rests on how costly it is to match that fidelity classically.
ideal_result = sim.simulate(rcs, qubit_order=qubits)
ideal_probs = np.abs(ideal_result.final_state_vector) ** 2
sampled_probs = ideal_probs[[int(s, 2) for s in samples]]
xeb = 2 ** n_qubits * np.mean(sampled_probs) - 1
print(f"\nXEB fidelity (10-qubit noiseless demo): {xeb:.4f}")
print("(Willow 105-qubit XEB fidelity: ~0.002; uniformly random bitstrings: ~0)")
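The XEB formula in the demo can be sanity-checked in an idealized setting. The sketch below, using only NumPy, shows why linear XEB behaves like a fidelity estimate: an ideal sampler scores about 1, uniformly random bitstrings score about 0, and a device that reproduces the ideal distribution only a fraction F of the time scores about F. It assumes a Haar-random state as a stand-in for a deep random circuit (Porter-Thomas output statistics) and a global depolarizing noise model; both are simplifying assumptions, not a model of Willow's actual noise.

```python
import numpy as np

def linear_xeb(samples, ideal_probs, n_qubits):
    """Linear cross-entropy benchmark: 2^n * mean(P_ideal(x_i)) - 1."""
    return 2 ** n_qubits * np.mean(ideal_probs[samples]) - 1

rng = np.random.default_rng(7)
n_qubits = 12
dim = 2 ** n_qubits

# Stand-in for a deep random circuit: a Haar-random state, whose output
# probabilities follow Porter-Thomas statistics (an idealizing assumption).
psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi /= np.linalg.norm(psi)
ideal_probs = np.abs(psi) ** 2

shots = 20_000
ideal_samples = rng.choice(dim, size=shots, p=ideal_probs)
uniform_samples = rng.integers(0, dim, size=shots)

# Toy noisy device: with probability F it outputs an ideal sample, otherwise
# a uniformly random bitstring (global depolarizing model).
F = 0.3
keep = rng.random(shots) < F
noisy_samples = np.where(keep, ideal_samples, uniform_samples)

xeb_ideal = linear_xeb(ideal_samples, ideal_probs, n_qubits)
xeb_uniform = linear_xeb(uniform_samples, ideal_probs, n_qubits)
xeb_noisy = linear_xeb(noisy_samples, ideal_probs, n_qubits)
print(f"Ideal sampler XEB:    {xeb_ideal:.3f}")
print(f"Uniform sampler XEB:  {xeb_uniform:.3f}")
print(f"Noisy (F={F}) XEB:    {xeb_noisy:.3f}")
```

Under this model the hardware's small XEB value reads directly as a small circuit fidelity, which is why it can be compared against what a classical simulator can achieve at the same fidelity.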

Below-Threshold Error Correction: The Critical Milestone

The supremacy benchmark, while impressive, does not by itself demonstrate that useful fault-tolerant quantum computing is approaching. The critical milestone in Willow is the surface code scaling experiment.

A surface code logical qubit is encoded in a d x d grid of data qubits (code distance d), interleaved with measure qubits that repeatedly extract error syndromes. These syndrome measurements detect errors without collapsing the logical state. If the physical error rate p is below the surface code threshold (approximately 1% for standard error models), then increasing d suppresses the logical error rate exponentially.
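The physical qubit budget per logical qubit follows from the layout. Assuming the standard rotated surface code (d x d data qubits plus d x d - 1 measure qubits), this hypothetical helper counts qubits for the distances Google tested; the distance-7 patch fits within Willow's 105 qubits.

```python
def rotated_surface_code_qubits(d):
    """Return (data, measure, total) qubit counts for a distance-d rotated
    surface code patch: d*d data qubits and d*d - 1 measure qubits."""
    if d < 3 or d % 2 == 0:
        raise ValueError("code distance should be an odd integer >= 3")
    data = d * d
    measure = d * d - 1
    return data, measure, data + measure

for d in (3, 5, 7):
    data, measure, total = rotated_surface_code_qubits(d)
    print(f"d={d}: {data} data + {measure} measure = {total} physical qubits")
```

The total grows as 2d^2 - 1, so each step up in distance costs quadratically more hardware in exchange for exponentially lower logical error below threshold.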

Before Willow, Google’s hardware had physical error rates hovering near the threshold, so increasing d gave little or no benefit: in Google’s 2023 surface code experiment, a distance-5 code only marginally outperformed distance 3. Willow’s improved coherence and gate fidelities pushed it clearly below threshold for the first time.

# Surface code logical error rate model (toy)
# Below threshold: p_L ~ (p/p_th)^((d+1)/2) per round
# Above threshold: p_L increases with d
# Prefactors are omitted, so absolute values are not comparable to
# experiment; only the trend with d is meaningful.

def surface_code_logical_error_rate(p_physical, code_distance, p_threshold=0.01):
    """
    Simplified surface code logical error rate per round.
    p_physical: physical two-qubit gate error rate
    code_distance: d (a d x d grid of data qubits)
    """
    ratio = p_physical / p_threshold
    if ratio < 1:
        # Below threshold: exponential suppression with d
        p_logical = ratio ** ((code_distance + 1) / 2)
    else:
        # Above threshold: error rate grows with d (capped at 0.5, a random guess)
        p_logical = min(0.5, ratio ** (code_distance / 4))
    return p_logical

# Sycamore-era hardware: p ~ 0.006 to 0.009, near threshold
# Willow (2024): p ~ 0.003, well below threshold

print("Logical error rate vs code distance (toy model):")
print(f"{'Distance':>10} | {'Sycamore p=0.008':>18} | {'Willow p=0.003':>16}")
print("-" * 50)
for d in [3, 5, 7]:
    syc = surface_code_logical_error_rate(0.008, d)
    wil = surface_code_logical_error_rate(0.003, d)
    print(f"{'d = ' + str(d):>10} | {syc:>18.2e} | {wil:>16.2e}")

print()
print("Willow experimental results (Google, Dec 2024):")
print("  d=7: ~0.143% logical error rate per cycle")
print("  Lambda (error suppression per d -> d+2): ~2.14")
# Rates at d=3 and d=5 implied by Lambda (derived here, not quoted measurements)
for d in [3, 5]:
    print(f"  d={d}: ~{0.143 * 2.14 ** ((7 - d) / 2):.2f}% per cycle (implied by Lambda)")
print("  -> Each distance step roughly halves the logical error: below-threshold confirmed")

The experimental data showed the logical error rate per cycle falling by a factor of about Λ = 2.14 each time the code distance increased by two (d=3 to d=5 to d=7), reaching approximately 0.143% per cycle at d=7. This is the first unambiguous demonstration of below-threshold scaling in a superconducting system. The implication is that, as long as physical error rates stay below threshold and errors remain largely uncorrelated, a logical qubit with arbitrarily low error rate can in principle be built by increasing code distance; the central scaling assumption behind fault-tolerant quantum computing now has experimental support.
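The claim that increasing code distance can drive the logical error rate arbitrarily low can be made concrete with a short calculation. This sketch starts from roughly 0.14% per cycle at d=7 and treats the suppression factor Λ (the ratio by which the logical error falls for each increase of d by two) as a free parameter. The values 1.5 and 2.0 and the 10^-6 target are illustrative assumptions, and the calculation assumes Λ stays constant at larger distances, which experiments have not yet established. Qubit counts assume the rotated surface code layout (2d^2 - 1 physical qubits per logical qubit).

```python
def distance_for_target(eps_start, d_start, lam, target):
    """Smallest odd code distance d >= d_start with eps_d <= target, assuming
    the logical error per cycle drops by a constant factor lam per d -> d + 2."""
    if lam <= 1:
        raise ValueError("lam must exceed 1 (below-threshold operation)")
    d, eps = d_start, eps_start
    while eps > target:
        d += 2
        eps /= lam
    return d, eps

def physical_qubits(d):
    # Rotated surface code: d*d data qubits plus d*d - 1 measure qubits.
    return 2 * d * d - 1

EPS_D7 = 0.14e-2  # ~0.14% logical error per cycle at d = 7
TARGET = 1e-6     # hypothetical target logical error per cycle

for lam in (1.5, 2.0):  # illustrative suppression factors, not measured values
    d, eps = distance_for_target(EPS_D7, 7, lam, TARGET)
    print(f"Lambda={lam}: d={d}, eps={eps:.1e}, "
          f"{physical_qubits(d)} physical qubits per logical qubit")
```

The required distance, and therefore the qubit overhead, is very sensitive to Λ, which is why pushing physical error rates further below threshold matters as much as adding qubits.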

Comparison to Sycamore and Implications

Metric | Sycamore (2019) | Willow (2024)
--- | --- | ---
Physical qubits | 53 | 105
T1 coherence time | ~20 μs | ~100 μs
Two-qubit gate fidelity | 99.4% | 99.7%
RCS classical simulation estimate | 10,000 years | 10^25 years
Surface code scaling | Above/near threshold | Below threshold
Quantum Volume (est.) | ~128 | >1000

The 2019 Sycamore result was challenged by improved classical simulations: IBM researchers argued that same year that the task could be done in about 2.5 days on the Summit supercomputer, and later tensor network methods, including Google’s own updated estimates, brought the cost down to days or less. Willow’s 10^25-year estimate is considered much harder to overturn because its circuits are larger and deeper and run at lower error rates, which makes tensor network contraction far more expensive.

Google’s roadmap targets a “useful” quantum computation (solving a problem with practical value faster than any classical computer) by 2029. The below-threshold error correction demonstration is the critical prerequisite: without it, adding more qubits would not reduce errors and fault-tolerant algorithms would be impossible. Willow’s result shifts the quantum computing community’s assessment of the fault-tolerant timeline from speculative to credible.