Adaptive Variational Circuits with ADAPT-VQE in PennyLane

Standard variational quantum eigensolvers (VQE) fix the circuit structure before optimization begins. This choice is consequential: an ansatz that is too shallow cannot represent the ground state accurately, while one that is too deep runs into barren plateaus and hardware noise. ADAPT-VQE, introduced by Grimsley et al. in 2019, sidesteps this dilemma by constructing the ansatz one operator at a time, guided by gradient information from the current state.

The algorithm iterates three steps: evaluate the energy gradient with respect to every operator in a predefined pool, add the operator with the largest gradient magnitude to the circuit, then re-optimize all parameters. The circuit grows only as fast as the physics demands, typically reaching chemical accuracy with far fewer parameters than UCCSD.

The ADAPT-VQE Algorithm

Given a Hamiltonian $H$ and an operator pool $\{A_k\}$ (anti-Hermitian operators), ADAPT-VQE builds the ansatz:

$|\psi(\theta)\rangle = e^{\theta_p A_p} \cdots e^{\theta_1 A_1} |0\rangle$

At each iteration, it selects the next operator by computing:

$\frac{\partial E}{\partial \theta_k}\bigg|_{\theta_k=0} = \langle \psi | [H, A_k] | \psi \rangle$

The operator with the largest $|\partial E / \partial \theta_k|$ is appended to the circuit and a full parameter optimization is run.

Setup and Operator Pool

For the H2 molecule, we use a minimal (STO-3G) basis. The second-quantized Hamiltonian mapped to qubits via Jordan-Wigner gives a 4-qubit problem. The operator pool consists of single and double excitation operators derived from the Hartree-Fock reference.

import pennylane as qml
from pennylane import numpy as np
import pennylane.qchem as qchem
import matplotlib.pyplot as plt

# Build the H2 Hamiltonian at bond length 0.74 Angstrom
symbols = ["H", "H"]
coordinates = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.3984])  # Bohr

H, n_qubits = qchem.molecular_hamiltonian(symbols, coordinates)
n_electrons = 2
hf_state = qchem.hf_state(n_electrons, n_qubits)

print(f"Number of qubits: {n_qubits}")
print(f"Number of Hamiltonian terms: {len(H.operands)}")
print(f"HF state: {hf_state}")

Building the Operator Pool

The pool contains generators of single excitations $\hat{T}_{ia} - \hat{T}_{ia}^\dagger$ and double excitations $\hat{T}_{ijab} - \text{h.c.}$ , where $i, j$ index occupied orbitals and $a, b$ index virtual orbitals.

def build_operator_pool(n_qubits, n_electrons):
    """
    Build pool of single and double excitation operators.
    Returns list of (wires, excitation_type) tuples.
    """
    occupied = list(range(n_electrons))
    virtual = list(range(n_electrons, n_qubits))

    pool = []

    # Single excitations
    for i in occupied:
        for a in virtual:
            pool.append(("single", [i, a]))

    # Double excitations
    for i in occupied:
        for j in occupied:
            if j <= i:
                continue
            for a in virtual:
                for b in virtual:
                    if b <= a:
                        continue
                    pool.append(("double", [i, j, a, b]))

    return pool

pool = build_operator_pool(n_qubits, n_electrons)
print(f"Pool size: {len(pool)} operators")
for kind, wires in pool:
    print(f"  {kind} excitation on wires {wires}")

Gradient Evaluation

The ADAPT gradient for operator $k$ is $\langle \psi | [H, G_k] | \psi \rangle$ , where $G_k$ is the anti-Hermitian generator. PennyLane can compute this as a parameter-shift derivative of the energy when the operator is appended with $\theta = 0$ .

dev = qml.device("default.qubit", wires=n_qubits)

def make_adapt_circuit(selected_ops, params, candidate=None):
    """Build the current ansatz, optionally appending a candidate at theta=0."""
    qml.BasisState(hf_state, wires=range(n_qubits))
    for (kind, wires), theta in zip(selected_ops, params):
        if kind == "single":
            qml.SingleExcitation(theta, wires=wires)
        else:
            qml.DoubleExcitation(theta, wires=wires)
    if candidate is not None:
        kind, wires = candidate
        if kind == "single":
            qml.SingleExcitation(0.0, wires=wires)
        else:
            qml.DoubleExcitation(0.0, wires=wires)

def compute_adapt_gradient(H, selected_ops, params, candidate):
    """
    Estimate the ADAPT gradient for a candidate operator by finite difference
    around theta=0 appended at the end of the current circuit.
    """
    eps = 1e-3

    @qml.qnode(dev)
    def energy_plus():
        qml.BasisState(hf_state, wires=range(n_qubits))
        for (kind, wires), theta in zip(selected_ops, params):
            if kind == "single":
                qml.SingleExcitation(theta, wires=wires)
            else:
                qml.DoubleExcitation(theta, wires=wires)
        kind, wires = candidate
        if kind == "single":
            qml.SingleExcitation(eps, wires=wires)
        else:
            qml.DoubleExcitation(eps, wires=wires)
        return qml.expval(H)

    @qml.qnode(dev)
    def energy_minus():
        qml.BasisState(hf_state, wires=range(n_qubits))
        for (kind, wires), theta in zip(selected_ops, params):
            if kind == "single":
                qml.SingleExcitation(theta, wires=wires)
            else:
                qml.DoubleExcitation(theta, wires=wires)
        kind, wires = candidate
        if kind == "single":
            qml.SingleExcitation(-eps, wires=wires)
        else:
            qml.DoubleExcitation(-eps, wires=wires)
        return qml.expval(H)

    return float((energy_plus() - energy_minus()) / (2 * eps))

The ADAPT-VQE Main Loop

def adapt_vqe(H, pool, max_iterations=6, grad_threshold=1e-3, n_opt_steps=80):
    """
    Run ADAPT-VQE and return energy history and selected operators.
    """
    selected_ops = []
    params = np.array([], requires_grad=True)
    energy_history = []
    gradient_norms = []

    for iteration in range(max_iterations):
        # Step 1: evaluate gradients for all pool operators
        grads = []
        for candidate in pool:
            g = compute_adapt_gradient(H, selected_ops, params, candidate)
            grads.append(abs(g))

        max_grad = max(grads)
        gradient_norms.append(max_grad)
        print(f"Iteration {iteration+1}: max gradient = {max_grad:.6f}")

        if max_grad < grad_threshold:
            print("Converged: gradient below threshold.")
            break

        # Step 2: select the operator with largest gradient
        best_idx = int(np.argmax(grads))
        selected_ops.append(pool[best_idx])
        params = np.append(params, 0.0)
        params = np.array(params, requires_grad=True)
        print(f"  Added {pool[best_idx][0]} excitation on wires {pool[best_idx][1]}")

        # Step 3: optimize all parameters
        @qml.qnode(dev)
        def cost(p):
            qml.BasisState(hf_state, wires=range(n_qubits))
            for (kind, wires), theta in zip(selected_ops, p):
                if kind == "single":
                    qml.SingleExcitation(theta, wires=wires)
                else:
                    qml.DoubleExcitation(theta, wires=wires)
            return qml.expval(H)

        opt = qml.GradientDescentOptimizer(stepsize=0.4)
        for _ in range(n_opt_steps):
            params, e = opt.step_and_cost(cost, params)

        final_energy = float(cost(params))
        energy_history.append(final_energy)
        print(f"  Energy after optimization: {final_energy:.8f} Ha")

    return energy_history, gradient_norms, selected_ops, params

energy_history, grad_norms, final_ops, final_params = adapt_vqe(H, pool)

Why Adaptive Circuits Win

For H2 in STO-3G, the full UCCSD ansatz includes all singles and doubles simultaneously and typically has 4-6 parameters (after symmetry reduction). ADAPT-VQE reaches the same energy with 1 or 2 operators in most cases because only the double excitation $|0011\rangle \to |1100\rangle$ is dominant. The algorithm discovers this automatically.

The savings become more dramatic for larger molecules. For systems like BeH2 or H2O, UCCSD may require dozens of parameters while ADAPT-VQE converges with a handful, reducing both circuit depth and optimization difficulty.

# Compare ADAPT convergence to HF reference and FCI
hf_energy = -1.1167  # Hartree for H2 at this geometry (approximate)
fci_energy = -1.1362  # FCI/STO-3G reference (approximately -1.136189 Ha)

print("\n--- Results ---")
print(f"HF energy:   {hf_energy:.6f} Ha")
print(f"FCI energy:  {fci_energy:.6f} Ha")
for i, e in enumerate(energy_history):
    error_kcal = (e - fci_energy) * 627.5
    print(f"ADAPT iter {i+1}: {e:.6f} Ha  (error: {error_kcal:.3f} kcal/mol)")

# Chemical accuracy threshold
print(f"\nChemical accuracy = 1.6 kcal/mol = {1.6/627.5:.6f} Ha")

Visualizing Circuit Growth

# Plot energy convergence
iterations = list(range(1, len(energy_history) + 1))
plt.figure(figsize=(8, 4))
plt.plot(iterations, energy_history, 'o-', label='ADAPT-VQE')
plt.axhline(fci_energy, color='red', linestyle='--', label='FCI reference')
plt.axhline(hf_energy, color='gray', linestyle=':', label='HF reference')
plt.xlabel('ADAPT Iteration')
plt.ylabel('Energy (Ha)')
plt.title('ADAPT-VQE Convergence for H2')
plt.legend()
plt.tight_layout()
plt.savefig('adapt_convergence.png', dpi=150)

# Plot gradient norms
plt.figure(figsize=(6, 3))
plt.semilogy(range(1, len(grad_norms) + 1), grad_norms, 's-')
plt.xlabel('ADAPT Iteration')
plt.ylabel('Max Gradient Norm (log scale)')
plt.title('Gradient Norms During ADAPT-VQE')
plt.tight_layout()
plt.savefig('adapt_gradients.png', dpi=150)

Practical Considerations

Operator pool design matters. Using qubit-ADAPT operators (Pauli strings) instead of fermionic excitations reduces circuit depth on hardware at the cost of a larger pool and more gradient evaluations per iteration. Fermionic pools lead to more chemically interpretable ansatze.

Gradient measurement cost scales linearly with pool size per iteration. For large molecules, screening the pool with approximate gradients (e.g., one-qubit reduced density matrices) before full gradient evaluation is a common optimization.

Convergence criteria: the gradient norm threshold is typically set to $10^{-3}$ Ha. Tighter thresholds can help but may require many additional iterations with diminishing energy returns.

Hardware compilation: since the circuit grows iteratively, each new operator should be transpiled efficiently to native gates. PennyLane’s device compilation pipeline handles this, but choosing operators whose generators map to short gate sequences helps minimize depth.

ADAPT-VQE represents a broader principle: letting the quantum state itself guide ansatz construction. The same idea underlies MCCI, iCI, and other selected configuration interaction methods in classical quantum chemistry. On quantum hardware, it is one of the most promising strategies for achieving chemical accuracy without fixed-structure overparameterization.