PennyLane · Intermediate · 25/26 in series · 25 min read

QAOA: Quantum Approximate Optimisation Algorithm with PennyLane

Implement the Quantum Approximate Optimisation Algorithm to solve the MaxCut graph problem using PennyLane, and understand how QAOA bridges quantum computing and combinatorial optimisation.

What you'll learn

  • QAOA
  • optimisation
  • PennyLane
  • MaxCut
  • variational algorithms

Prerequisites

  • Python proficiency
  • Beginner quantum computing concepts (superposition, entanglement)
  • Linear algebra basics

The Quantum Approximate Optimisation Algorithm (QAOA) is a variational algorithm designed for combinatorial optimisation problems. It was introduced by Farhi, Goldstone, and Gutmann in 2014 and sits at the intersection of quantum computing and classical operations research. Like VQE, it uses a parameterised quantum circuit optimised by a classical loop, making it a candidate for near-term quantum hardware.

This tutorial builds QAOA from first principles: we derive the cost Hamiltonian, decompose the unitaries into gates, implement the algorithm in PennyLane both manually and with the built-in qml.qaoa module, visualise the optimisation landscape, explore parameter initialisation strategies, extend to weighted graphs, and analyse approximation ratios across different circuit depths.

Installation

pip install pennylane networkx matplotlib

QAOA Theory from First Principles

QAOA is best understood as an approximation to adiabatic quantum optimisation. The adiabatic theorem says: if you start in the ground state of a simple Hamiltonian B and slowly evolve to a problem Hamiltonian C, the system stays in the ground state throughout the evolution (provided you evolve slowly enough and the spectral gap never closes). The ground state of C encodes the solution to the optimisation problem.

The simple starting Hamiltonian is the transverse-field mixer:

B = sum_i X_i

The ground state of -B is the uniform superposition |+…+>, which is easy to prepare with Hadamard gates. The problem Hamiltonian C encodes the objective function (in our case, MaxCut). The ground state of C is the bitstring that maximises the cut.
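As a quick check (a NumPy sketch, separate from the tutorial code), diagonalising -B for two qubits confirms that the uniform superposition is its ground state:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=float)
I2 = np.eye(2)

# -B = -(X tensor I + I tensor X) for two qubits
negB = -(np.kron(X, I2) + np.kron(I2, X))

vals, vecs = np.linalg.eigh(negB)   # eigenvalues in ascending order
ground = vecs[:, 0]                 # eigenvector of the smallest eigenvalue

print(np.round(vals[0], 6))          # lowest eigenvalue: -2.0
print(np.round(np.abs(ground), 3))   # uniform amplitudes: [0.5 0.5 0.5 0.5]
```

The ground state |++> has equal amplitude on every basis state, which is exactly what the layer of Hadamards prepares.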

Adiabatic evolution applies the time-dependent Hamiltonian H(t) = (1 - t/T) B + (t/T) C for time 0 to T. As T grows large, the system stays in the instantaneous ground state and arrives at C’s ground state.

QAOA approximates this continuous evolution using Trotterization. Instead of smooth time evolution, we break the path into p discrete steps. At each step, we apply the cost unitary exp(-i * gamma_k * C) followed by the mixer unitary exp(-i * beta_k * B). The key insight: rather than fixing gamma and beta to follow a specific adiabatic schedule, we treat them as free parameters and optimise them classically.

The full QAOA state is:

|gamma, beta> = U_B(beta_p) U_C(gamma_p) ... U_B(beta_1) U_C(gamma_1) |+...+>

As p increases, the space of reachable states grows, and QAOA better approximates the adiabatic path. In the limit p -> infinity, QAOA can represent any adiabatic schedule and is guaranteed to find the exact optimum.

The connection to Trotterization is direct. A first-order Trotter decomposition of the time evolution operator exp(-i * (C + B) * dt) is approximately exp(-i * C * dt) * exp(-i * B * dt), which is exactly one QAOA layer with gamma = dt and beta = dt. Multiple layers with optimised angles generalise this by allowing each step to have its own pace, which can outperform a uniform Trotter schedule.
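The quality of one first-order Trotter step can be checked numerically. The sketch below (NumPy only; the two-qubit Hamiltonians are illustrative, not the tutorial's graph) compares a single step against the exact evolution and shows the error shrinking roughly quadratically in dt:

```python
import numpy as np

def expmi(H, t):
    """exp(-i * t * H) for Hermitian H via eigendecomposition (NumPy only)."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * t * w)) @ V.conj().T

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

C = np.kron(Z, Z)                      # single-edge cost term
B = np.kron(X, I2) + np.kron(I2, X)    # transverse-field mixer

def trotter_error(dt):
    exact = expmi(C + B, dt)
    step = expmi(C, dt) @ expmi(B, dt)   # one QAOA layer with gamma = beta = dt
    return np.linalg.norm(exact - step, 2)

for dt in [0.4, 0.2, 0.1]:
    print(f"dt={dt}: first-order Trotter error = {trotter_error(dt):.2e}")
```

Halving dt reduces the spectral-norm error by roughly a factor of four, consistent with the O(dt^2) error of a first-order split.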

The MaxCut Problem

MaxCut is a graph partitioning problem: given an undirected graph G = (V, E), divide the nodes into two sets S and S’ to maximise the number of edges that cross the partition. It is NP-hard in general, making it a useful benchmark for optimisation algorithms.

For a 4-node graph with edges (0,1), (0,3), (1,2), (2,3), (1,3), an optimal cut places nodes {0, 2} in one set and {1, 3} in the other, cutting 4 of the 5 edges.

import pennylane as qml
from pennylane import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

# Define a 4-node graph
n_qubits = 4
edges = [(0, 1), (0, 3), (1, 2), (2, 3), (1, 3)]

G = nx.Graph()
G.add_nodes_from(range(n_qubits))
G.add_edges_from(edges)

Cost Hamiltonian Derivation

The MaxCut objective counts how many edges cross the partition. We assign a spin variable z_i in {+1, -1} to each node: z_i = +1 means node i is in set S, z_i = -1 means it is in set S’. An edge (i, j) is cut when z_i and z_j have opposite signs.

For a single edge (i, j), the contribution to the cut count is:

f(z_i, z_j) = (1 - z_i * z_j) / 2

Check the two cases:

  • z_i = z_j (same partition): (1 - 1) / 2 = 0 (edge is not cut)
  • z_i != z_j (different partition): (1 - (-1)) / 2 = 1 (edge is cut)

The total cut value for a partition z is:

Cut(z) = sum_{(i,j) in E} (1 - z_i * z_j) / 2
       = |E|/2 - sum_{(i,j) in E} z_i * z_j / 2
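As a sanity check, the spin formula can be evaluated directly for the example graph and the claimed optimal partition {0, 2} vs {1, 3}:

```python
# Edge list of the 4-node example graph
edges = [(0, 1), (0, 3), (1, 2), (2, 3), (1, 3)]

# Spin assignment for the partition {0, 2} vs {1, 3}: z = +1 in S, -1 in S'
z = [+1, -1, +1, -1]

# Each cut edge contributes (1 - z_i * z_j) / 2 = 1, each uncut edge 0
cut = sum((1 - z[i] * z[j]) / 2 for i, j in edges)
print(cut)  # 4.0: four of the five edges are cut
```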

Now promote the classical spin variables z_i to Pauli Z operators Z_i. The eigenvalues of Z_i are +1 and -1, matching our spin encoding. The cost Hamiltonian becomes:

C = sum_{(i,j) in E} (I - Z_i Z_j) / 2

Since we want to maximise Cut(z), and variational optimisers typically minimise, we minimise the negative:

H_cost = -C = sum_{(i,j) in E} (Z_i Z_j - I) / 2

Dropping the constant identity term (it shifts all eigenvalues equally and does not affect the optimal bitstring), the effective cost Hamiltonian is:

H_cost = sum_{(i,j) in E} (1/2) * Z_i Z_j

A cut edge has <Z_i Z_j> = -1 and so lowers the energy by 1/2: minimising H_cost maximises the cut. This is why cost_coeffs = [0.5] appears in the code: each ZZ term carries a coefficient of +0.5.

# Build the cost Hamiltonian as a PennyLane Hamiltonian
cost_coeffs = [0.5] * len(edges)
cost_ops = [qml.PauliZ(i) @ qml.PauliZ(j) for i, j in edges]
H_cost = qml.Hamiltonian(cost_coeffs, cost_ops)
print("Cost Hamiltonian:", H_cost)

U_C Gate Decomposition

The cost unitary is U_C(gamma) = exp(-i * gamma * H_cost). Since H_cost is a sum of commuting ZZ terms (all Z operators commute with each other), the exponential factorises into one two-qubit factor per edge. Because gamma is a free parameter optimised classically, the constant coefficient in front of each term can be absorbed into gamma, so the circuit implements the conventional building block exp(-i * gamma * Z_i Z_j) on each edge.

Consider a single factor exp(-i * gamma * Z_i Z_j) acting on the computational basis. Since Z_i Z_j is diagonal in the computational basis:

exp(-i * gamma * Z_i Z_j) |z_i, z_j> = exp(-i * gamma * z_i * z_j) |z_i, z_j>

where z_i, z_j in {+1, -1} are the eigenvalues. Concretely:

State   z_i * z_j   Phase factor
|00>    +1          exp(-i * gamma)
|01>    -1          exp(+i * gamma)
|10>    -1          exp(+i * gamma)
|11>    +1          exp(-i * gamma)

This gate applies a relative phase of 2*gamma between the same-parity states (|00>, |11>) and the opposite-parity states (|01>, |10>).

The standard circuit decomposition uses a CNOT sandwich with an Rz rotation:

CNOT(i, j)  ->  Rz(2*gamma, j)  ->  CNOT(i, j)

Here is why this works. The first CNOT maps: |00> -> |00>, |01> -> |01>, |10> -> |11>, |11> -> |10>. Then Rz(2*gamma) on qubit j applies phase exp(-i*gamma) to states where qubit j is |0> and exp(+i*gamma) where it is |1>. The second CNOT undoes the bit flip. The net effect:

  • |00>: CNOT gives |00>, Rz gives e^{-i*gamma}|00>, CNOT gives e^{-i*gamma}|00>
  • |01>: CNOT gives |01>, Rz gives e^{+i*gamma}|01>, CNOT gives e^{+i*gamma}|01>
  • |10>: CNOT gives |11>, Rz gives e^{+i*gamma}|11>, CNOT gives e^{+i*gamma}|10>
  • |11>: CNOT gives |10>, Rz gives e^{-i*gamma}|10>, CNOT gives e^{-i*gamma}|11>

This matches the ZZ phase table above. In PennyLane, the qml.IsingZZ(2*gamma, wires=[i, j]) gate implements exactly this operation in a single call.
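The CNOT-Rz-CNOT sandwich can be verified against the phase table with plain NumPy (a standalone check, not part of the tutorial's circuit code):

```python
import numpy as np

gamma = 0.7

# CNOT with control qubit i (first) and target qubit j (second),
# basis order |00>, |01>, |10>, |11>
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# Rz(2 * gamma) acting on the target qubit j
Rz = np.diag([np.exp(-1j * gamma), np.exp(1j * gamma)])
sandwich = CNOT @ np.kron(np.eye(2), Rz) @ CNOT

# The diagonal ZZ phase gate from the table above
zz_phase = np.diag([np.exp(-1j * gamma), np.exp(1j * gamma),
                    np.exp(1j * gamma), np.exp(-1j * gamma)])

print(np.allclose(sandwich, zz_phase))  # True
```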

U_B Mixer Derivation

The mixer Hamiltonian is B = sum_k X_k. The mixer unitary is:

U_B(beta) = exp(-i * beta * sum_k X_k)

Since X operators on different qubits commute (they act on separate Hilbert spaces), the exponential of the sum factorises into a tensor product:

U_B(beta) = tensor product_k exp(-i * beta * X_k)

Each single-qubit factor exp(-i * beta * X) is a rotation about the X axis. Using the identity exp(-i * theta * X) = cos(theta) I - i sin(theta) X, the matrix is:

exp(-i * beta * X) = [[cos(beta),    -i*sin(beta)],
                       [-i*sin(beta),  cos(beta)  ]]

This is precisely the Rx(2*beta) gate in PennyLane’s convention, where Rx(phi) = exp(-i * phi/2 * X). So when we call qml.RX(2*beta, wires=q), we get exp(-i * beta * X_q).
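The angle convention is easy to confirm numerically (a standalone NumPy check with an illustrative value of beta):

```python
import numpy as np

beta = 0.4
X = np.array([[0, 1], [1, 0]], dtype=complex)

# exp(-i * beta * X) via eigendecomposition of X
w, V = np.linalg.eigh(X)
direct = (V * np.exp(-1j * beta * w)) @ V.conj().T

# Closed form cos(beta) I - i sin(beta) X: the matrix of RX(2 * beta)
closed = np.cos(beta) * np.eye(2) - 1j * np.sin(beta) * X

print(np.allclose(direct, closed))  # True
```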

An important point: the standard X mixer does not preserve Hamming weight. Starting from |+…+> (which is a superposition of all Hamming weights), the mixer can explore the full 2^n Hilbert space. This is desirable for unconstrained problems like MaxCut. For constrained problems (like graph colouring, where solutions must have a fixed number of nodes per colour), a Hamming-weight-preserving mixer such as the XY mixer is more appropriate. We discuss this later.

The QAOA Circuit

A p-layer QAOA circuit has 2p parameters: angles gamma (one per cost layer) and beta (one per mixer layer). The circuit starts in a uniform superposition, then alternates cost and mixer unitaries.

dev = qml.device("default.qubit", wires=n_qubits)

def cost_layer(gamma):
    """Apply exp(-i * gamma * Z_i Z_j) on each edge (coefficients absorbed into gamma)."""
    for i, j in edges:
        qml.IsingZZ(2 * gamma, wires=[i, j])

def mixer_layer(beta):
    """Apply exp(-i * beta * sum_i X_i): RX on each qubit."""
    for q in range(n_qubits):
        qml.RX(2 * beta, wires=q)

@qml.qnode(dev)
def qaoa_circuit(params, p):
    """
    params: flat array of length 2p, alternating [gamma_0, beta_0, gamma_1, beta_1, ...]
    p: number of QAOA layers
    """
    # Prepare |+...+>, the ground state of -B
    for q in range(n_qubits):
        qml.Hadamard(wires=q)

    # Alternate cost and mixer layers
    for layer in range(p):
        gamma = params[2 * layer]
        beta  = params[2 * layer + 1]
        cost_layer(gamma)
        mixer_layer(beta)

    return qml.expval(H_cost)

Optimising the Parameters

The cost function is the expectation value of H_cost. We minimise it (more negative = better cut) using PennyLane's Adam optimiser; gradients are computed automatically (backpropagation on default.qubit, or the parameter-shift rule on hardware).

def run_qaoa(p=1, n_steps=150, stepsize=0.1, seed=42):
    np.random.seed(seed)
    params = np.random.uniform(0, np.pi, 2 * p, requires_grad=True)
    opt = qml.AdamOptimizer(stepsize=stepsize)

    energies = []
    for step in range(n_steps):
        params, cost = opt.step_and_cost(lambda p_: qaoa_circuit(p_, p), params)
        energies.append(float(cost))
        if step % 30 == 0:
            print(f"Step {step:3d}: cost = {cost:.4f}")

    return params, energies

print("=== p=1 ===")
params_p1, energies_p1 = run_qaoa(p=1)

print("\n=== p=2 ===")
params_p2, energies_p2 = run_qaoa(p=2)

Reading the Results

After optimisation, sample the circuit to find which partition was most often returned. For an ideal simulator, compute the full probability distribution over all 2^n bitstrings.

@qml.qnode(dev)
def qaoa_probs(params, p):
    for q in range(n_qubits):
        qml.Hadamard(wires=q)
    for layer in range(p):
        gamma = params[2 * layer]
        beta  = params[2 * layer + 1]
        cost_layer(gamma)
        mixer_layer(beta)
    return qml.probs(wires=range(n_qubits))

probs_p1 = qaoa_probs(params_p1, p=1)
probs_p2 = qaoa_probs(params_p2, p=2)

# Find the most probable bitstring
best_idx = int(np.argmax(probs_p2))
best_bits = format(best_idx, f"0{n_qubits}b")
print(f"Most probable bitstring (p=2): {best_bits}")
print(f"Probability: {probs_p2[best_idx]:.3f}")

# Evaluate cut value for this bitstring
partition = [int(b) for b in best_bits]
cut = sum(1 for i, j in edges if partition[i] != partition[j])
print(f"Cut value: {cut} / {len(edges)} edges")
# Plot probability distributions for p=1 and p=2
labels = [format(i, f"0{n_qubits}b") for i in range(2 ** n_qubits)]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4))
ax1.bar(labels, probs_p1, color="steelblue")
ax1.set_title("QAOA p=1 state probabilities")
ax1.set_xlabel("Bitstring")
ax1.set_ylabel("Probability")
ax1.tick_params(axis="x", rotation=90)

ax2.bar(labels, probs_p2, color="seagreen")
ax2.set_title("QAOA p=2 state probabilities")
ax2.set_xlabel("Bitstring")
ax2.tick_params(axis="x", rotation=90)

plt.tight_layout()
plt.savefig("qaoa_probs.png", dpi=150)
plt.show()

PennyLane Built-in QAOA Module

PennyLane provides a dedicated qml.qaoa module that automates Hamiltonian construction and layer application. This reduces boilerplate and eliminates manual errors in building the cost and mixer operators.

# Build cost and mixer Hamiltonians using qml.qaoa
H_cost_auto, H_mixer_auto = qml.qaoa.maxcut(G)

print("Auto cost Hamiltonian:", H_cost_auto)
print("Auto mixer Hamiltonian:", H_mixer_auto)

The qml.qaoa.maxcut function takes a NetworkX graph and returns two Hamiltonians: the cost Hamiltonian (0.5 * ZZ terms plus a constant identity offset, so that minimising its expectation maximises the cut) and the mixer Hamiltonian (sum of X operators).

Now build the QAOA circuit using the built-in layer functions:

@qml.qnode(dev)
def qaoa_auto_circuit(params, p):
    # Start in |+...+>
    for q in range(n_qubits):
        qml.Hadamard(wires=q)

    for layer in range(p):
        gamma = params[2 * layer]
        beta  = params[2 * layer + 1]
        # Built-in cost and mixer layers
        qml.qaoa.cost_layer(gamma, H_cost_auto)
        qml.qaoa.mixer_layer(beta, H_mixer_auto)

    return qml.expval(H_cost_auto)

# Compare the two implementations. The built-in Hamiltonian carries a constant
# identity offset of -|E|/2 that our manual Hamiltonian dropped, and
# qml.qaoa.cost_layer exponentiates the full Hamiltonian directly, so its
# gamma relates to the manual IsingZZ(2 * gamma) convention by a rescaling.
test_params = np.array([0.5, 0.3], requires_grad=True)
manual_val = qaoa_circuit(test_params, p=1)
auto_val = qaoa_auto_circuit(test_params, p=1)
print(f"Manual circuit: {manual_val:.6f}")
print(f"Auto circuit:   {auto_val:.6f}")

Because of the constant offset and the gamma rescaling, the two printed values need not coincide for the same raw parameters. The two parameterisations span the same family of states, however, so after optimisation both converge to the same cuts.

Brute-Force Comparison

For small graphs (n <= 20 or so), we can compute the exact MaxCut by evaluating all 2^n bitstrings. This gives us the ground truth for computing approximation ratios.

def brute_force_maxcut(n_nodes, edges):
    """
    Enumerate all 2^n partitions and return the maximum cut value
    along with all bitstrings achieving that cut.
    """
    best_cut = 0
    best_bitstrings = []

    for b in range(2 ** n_nodes):
        bits = [(b >> i) & 1 for i in range(n_nodes)]
        cut_val = sum(1 for i, j in edges if bits[i] != bits[j])
        if cut_val > best_cut:
            best_cut = cut_val
            best_bitstrings = [format(b, f"0{n_nodes}b")]
        elif cut_val == best_cut:
            best_bitstrings.append(format(b, f"0{n_nodes}b"))

    return best_cut, best_bitstrings

optimal_cut, optimal_bitstrings = brute_force_maxcut(n_qubits, edges)
print(f"Optimal MaxCut value: {optimal_cut}")
print(f"Optimal bitstrings: {optimal_bitstrings}")

# Compute approximation ratio for p=2
qaoa_cut = cut  # from the earlier result
approx_ratio = qaoa_cut / optimal_cut
print(f"QAOA (p=2) cut: {qaoa_cut}, Approximation ratio: {approx_ratio:.4f}")

Note that the brute-force function reads bit i of the integer as the partition label of node i (little-endian), so the formatted bitstrings it returns place node 0 at the rightmost position. This is the reverse of PennyLane's convention, where wire 0 is the leftmost (most significant) bit of a basis-state label; the maximum cut value itself is unaffected by the ordering.

Landscape Visualisation for p=1

For p=1, the cost function E(gamma, beta) depends on only two parameters, so we can visualise the full landscape as a 2D contour plot. This reveals the structure of the optimisation problem.

# Scan a 30x30 grid over (gamma, beta)
n_grid = 30
gamma_range = np.linspace(0, np.pi / 2, n_grid)
beta_range = np.linspace(0, np.pi / 4, n_grid)
landscape = np.zeros((n_grid, n_grid))

for i, gamma in enumerate(gamma_range):
    for j, beta in enumerate(beta_range):
        params_scan = np.array([gamma, beta], requires_grad=False)
        landscape[j, i] = qaoa_circuit(params_scan, p=1)

# Plot the landscape
fig, ax = plt.subplots(figsize=(8, 6))
cf = ax.contourf(gamma_range, beta_range, landscape, levels=30, cmap="RdYlBu_r")
plt.colorbar(cf, ax=ax, label="Cost function value")

# Mark the optimised point from Adam
opt_gamma, opt_beta = float(params_p1[0]), float(params_p1[1])
# Wrap into the scan range for visualisation
opt_gamma_mod = opt_gamma % (np.pi / 2)
opt_beta_mod = opt_beta % (np.pi / 4)
ax.plot(opt_gamma_mod, opt_beta_mod, "k*", markersize=15, label="Adam optimum")

ax.set_xlabel(r"$\gamma$")
ax.set_ylabel(r"$\beta$")
ax.set_title("QAOA p=1 cost landscape")
ax.legend()
plt.tight_layout()
plt.savefig("qaoa_landscape.png", dpi=150)
plt.show()

The landscape is clearly non-convex, with multiple local minima and saddle points. This is why a single optimisation run from random initialisation may not find the global optimum. The standard remedy is to run multiple random restarts and keep the best result.

Approximation Ratio Analysis for Different p

We now run QAOA for p = 1, 2, 3 with multiple random restarts and track how the approximation ratio improves with depth.

def run_qaoa_with_restarts(p, n_restarts=5, n_steps=200, stepsize=0.1):
    """Run QAOA with multiple random restarts, return the best parameters and cut value."""
    best_cost = 0.0
    best_params = None

    for restart in range(n_restarts):
        seed = restart * 17 + 3  # deterministic but varied seeds
        np.random.seed(seed)
        params = np.random.uniform(0, np.pi, 2 * p, requires_grad=True)
        opt = qml.AdamOptimizer(stepsize=stepsize)

        for step in range(n_steps):
            params, cost = opt.step_and_cost(lambda p_: qaoa_circuit(p_, p), params)

        if best_params is None or float(cost) < best_cost:
            best_cost = float(cost)
            best_params = params.copy()

    # Get probabilities and find best bitstring
    probs = qaoa_probs(best_params, p=p)
    best_idx = int(np.argmax(probs))
    best_bits = format(best_idx, f"0{n_qubits}b")
    partition = [int(b) for b in best_bits]
    cut_val = sum(1 for i, j in edges if partition[i] != partition[j])

    return best_params, best_cost, cut_val, best_bits

# Run for p = 1, 2, 3
optimal_cut, _ = brute_force_maxcut(n_qubits, edges)
results = {}

for p in [1, 2, 3]:
    params, cost, cut_val, bits = run_qaoa_with_restarts(p, n_restarts=5)
    ratio = cut_val / optimal_cut
    results[p] = {"cost": cost, "cut": cut_val, "ratio": ratio, "bitstring": bits}
    print(f"p={p}: cut={cut_val}, optimal={optimal_cut}, ratio={ratio:.4f}, bitstring={bits}")

# Display as a table
print("\n  p | QAOA cut | Optimal cut | Approx. ratio | Best bitstring")
print("----+----------+-------------+---------------+----------------")
for p in [1, 2, 3]:
    r = results[p]
    print(f"  {p} |    {r['cut']}     |      {optimal_cut}      |    {r['ratio']:.4f}     | {r['bitstring']}")
# Plot approximation ratio vs p
p_values = [1, 2, 3]
ratios = [results[p]["ratio"] for p in p_values]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(p_values, ratios, "o-", color="steelblue", markersize=10, linewidth=2)
ax.axhline(y=1.0, color="gray", linestyle="--", alpha=0.5, label="Optimal")
ax.axhline(y=0.6924, color="red", linestyle=":", alpha=0.7, label="p=1 guarantee (3-regular)")
ax.set_xlabel("QAOA depth p")
ax.set_ylabel("Approximation ratio")
ax.set_title("QAOA approximation ratio vs circuit depth")
ax.set_xticks(p_values)
ax.set_ylim(0.5, 1.05)
ax.legend()
plt.tight_layout()
plt.savefig("qaoa_ratio_vs_p.png", dpi=150)
plt.show()

For p = 1 on 3-regular graphs, QAOA provably achieves an approximation ratio of at least 0.6924. Our 4-node graph is not 3-regular (nodes 1 and 3 have degree 3, while nodes 0 and 2 have degree 2), but we still observe that increasing p systematically improves the result. Classical algorithms like Goemans-Williamson achieve 0.878, so surpassing classical performance requires higher depth or larger, more structured problem instances.

Parameter Initialisation Strategies

Random initialisation works for small p, but as the number of parameters grows, the optimisation landscape becomes harder to navigate. Three common strategies address this.

Strategy 1: Random Initialisation

The simplest approach samples gamma and beta uniformly at random. For p = 1 this is fine, but for larger p the search space grows exponentially and random starting points often land in poor local minima.

Strategy 2: INTERP (Interpolation)

The INTERP strategy uses the optimal parameters from depth p to initialise depth p+1. The idea: if the optimal schedule at depth p is (gamma_1, …, gamma_p, beta_1, …, beta_p), then a reasonable initial guess for depth p+1 is a linearly interpolated version that stretches the p-step schedule into p+1 steps.

The interpolation formulas are:

gamma'_k = (k-1)/(p) * gamma_{k-1} + (p+1-k)/(p) * gamma_k    for k = 1, ..., p+1
beta'_k  = (k-1)/(p) * beta_{k-1}  + (p+1-k)/(p) * beta_k     for k = 1, ..., p+1

with boundary conditions gamma_0 = beta_0 = gamma_{p+1} = beta_{p+1} = 0.

def interp_initialisation(gamma_opt, beta_opt):
    """
    Given optimised parameters for depth p, produce initial parameters for depth p+1
    using linear interpolation (INTERP strategy from Zhou et al., 2020).
    """
    p = len(gamma_opt)
    p_new = p + 1
    gamma_new = np.zeros(p_new)
    beta_new = np.zeros(p_new)

    # Pad with zeros at boundaries
    gamma_padded = np.concatenate([[0.0], gamma_opt, [0.0]])
    beta_padded = np.concatenate([[0.0], beta_opt, [0.0]])

    for k in range(p_new):
        # k is 0-indexed, paper uses 1-indexed
        frac = k / p if p > 0 else 0
        gamma_new[k] = frac * gamma_padded[k] + (1 - frac) * gamma_padded[k + 1]
        beta_new[k]  = frac * beta_padded[k]  + (1 - frac) * beta_padded[k + 1]

    return gamma_new, beta_new

# Example: use p=1 optimal to initialise p=2
gamma_1 = np.array([float(params_p1[0])])
beta_1 = np.array([float(params_p1[1])])
gamma_2_init, beta_2_init = interp_initialisation(gamma_1, beta_1)

print(f"p=1 optimal: gamma={gamma_1}, beta={beta_1}")
print(f"p=2 INTERP init: gamma={gamma_2_init}, beta={beta_2_init}")

# Run optimisation starting from INTERP initialisation
interp_params = np.array(
    [val for pair in zip(gamma_2_init, beta_2_init) for val in pair],
    requires_grad=True,
)
opt = qml.AdamOptimizer(stepsize=0.1)
for step in range(150):
    interp_params, cost = opt.step_and_cost(lambda p_: qaoa_circuit(p_, 2), interp_params)
print(f"INTERP-initialised p=2 cost: {float(cost):.4f}")

Strategy 3: FOURIER

The FOURIER strategy reparameterises gamma and beta using a truncated Fourier series. Instead of p free parameters each, we use q << p Fourier coefficients:

gamma_k = sum_{l=1}^{q} u_l * sin((l - 0.5) * (k - 0.5) * pi / p)
beta_k  = sum_{l=1}^{q} v_l * cos((l - 0.5) * (k - 0.5) * pi / p)

This reduces the number of optimisable parameters from 2p to 2q, which is especially helpful for large p. The low-frequency bias acts as an implicit regulariser, avoiding erratic parameter schedules.
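The tutorial does not include FOURIER code, but the mapping from Fourier coefficients to angles can be sketched as follows (the coefficient values below are made up for illustration; the formula follows Zhou et al., 2020):

```python
import numpy as np

def fourier_to_angles(u, v, p):
    """Map q Fourier coefficients (u, v) to p QAOA angles (gamma, beta).

    Sketch of the FOURIER reparameterisation (Zhou et al., 2020).
    """
    q = len(u)
    k = np.arange(1, p + 1)  # 1-indexed layer labels
    gamma = np.zeros(p)
    beta = np.zeros(p)
    for l in range(1, q + 1):
        gamma += u[l - 1] * np.sin((l - 0.5) * (k - 0.5) * np.pi / p)
        beta += v[l - 1] * np.cos((l - 0.5) * (k - 0.5) * np.pi / p)
    return gamma, beta

# Two Fourier coefficients per schedule generate smooth angles for p=6 layers
gamma, beta = fourier_to_angles([0.8, 0.1], [0.7, -0.05], p=6)
print(np.round(gamma, 3))
print(np.round(beta, 3))
```

The optimiser would then update the 2q coefficients (u, v) rather than the 2p angles directly, with this mapping applied inside the cost function.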

Weighted MaxCut

Real-world optimisation problems often involve weighted graphs, where each edge has an associated weight. The MaxCut objective for a weighted graph is:

Cut(z) = sum_{(i,j) in E} w_ij * (1 - z_i * z_j) / 2

The cost Hamiltonian becomes:

H_cost = sum_{(i,j) in E} (-w_ij / 2) * Z_i Z_j

The cost unitary for each edge now applies exp(-i * gamma * w_ij * Z_i Z_j), which is an IsingZZ gate with angle 2 * gamma * w_ij.

# Define a weighted 4-node graph
weighted_edges = [(0, 1, 3.0), (0, 3, 1.0), (1, 2, 2.0), (2, 3, 4.0), (1, 3, 1.5)]

G_weighted = nx.Graph()
G_weighted.add_nodes_from(range(n_qubits))
for i, j, w in weighted_edges:
    G_weighted.add_edge(i, j, weight=w)

# Build weighted cost Hamiltonian
w_cost_coeffs = [-w / 2 for _, _, w in weighted_edges]
w_cost_ops = [qml.PauliZ(i) @ qml.PauliZ(j) for i, j, _ in weighted_edges]
H_cost_w = qml.Hamiltonian(w_cost_coeffs, w_cost_ops)

def w_cost_layer(gamma):
    for i, j, w in weighted_edges:
        qml.IsingZZ(2 * gamma * w, wires=[i, j])

@qml.qnode(dev)
def qaoa_weighted(params, p):
    for q in range(n_qubits):
        qml.Hadamard(wires=q)
    for layer in range(p):
        gamma = params[2 * layer]
        beta  = params[2 * layer + 1]
        w_cost_layer(gamma)
        mixer_layer(beta)
    return qml.expval(H_cost_w)

@qml.qnode(dev)
def qaoa_weighted_probs(params, p):
    for q in range(n_qubits):
        qml.Hadamard(wires=q)
    for layer in range(p):
        gamma = params[2 * layer]
        beta  = params[2 * layer + 1]
        w_cost_layer(gamma)
        mixer_layer(beta)
    return qml.probs(wires=range(n_qubits))

# Brute-force weighted MaxCut
def brute_force_weighted_maxcut(n_nodes, weighted_edges):
    best_cut = 0.0
    best_bitstrings = []
    for b in range(2 ** n_nodes):
        bits = [(b >> i) & 1 for i in range(n_nodes)]
        cut_val = sum(w for i, j, w in weighted_edges if bits[i] != bits[j])
        if cut_val > best_cut:
            best_cut = cut_val
            best_bitstrings = [format(b, f"0{n_nodes}b")]
        elif cut_val == best_cut:
            best_bitstrings.append(format(b, f"0{n_nodes}b"))
    return best_cut, best_bitstrings

opt_w_cut, opt_w_bits = brute_force_weighted_maxcut(n_qubits, weighted_edges)
print(f"Optimal weighted cut: {opt_w_cut}, bitstrings: {opt_w_bits}")

# Run weighted QAOA with p=2
np.random.seed(42)
w_params = np.random.uniform(0, np.pi, 4, requires_grad=True)
opt = qml.AdamOptimizer(stepsize=0.1)

for step in range(200):
    w_params, cost = opt.step_and_cost(lambda p_: qaoa_weighted(p_, 2), w_params)
    if step % 50 == 0:
        print(f"Step {step:3d}: cost = {cost:.4f}")

# Evaluate the result
w_probs = qaoa_weighted_probs(w_params, p=2)
w_best_idx = int(np.argmax(w_probs))
w_best_bits = format(w_best_idx, f"0{n_qubits}b")
w_partition = [int(b) for b in w_best_bits]
w_cut_val = sum(w for i, j, w in weighted_edges if w_partition[i] != w_partition[j])
print(f"\nWeighted QAOA result: bitstring={w_best_bits}, cut value={w_cut_val:.1f}")
print(f"Optimal: {opt_w_cut:.1f}, Approx ratio: {w_cut_val / opt_w_cut:.4f}")

The key difference from unweighted QAOA is that the IsingZZ angle scales with the edge weight. Edges with larger weights receive stronger rotations, so the circuit naturally prioritises cutting high-weight edges.

Alternative Mixers

The standard X mixer (B = sum_k X_k) is the default choice for QAOA, but it is not the only option. Different mixers are appropriate for different problem structures.

The XY Mixer

The XY mixer preserves Hamming weight: it swaps excitations between qubits without creating or destroying them. The Hamiltonian is:

B_XY = sum_{(i,j) in E_mixer} (X_i X_j + Y_i Y_j) / 2

This mixer is useful for problems with equality constraints. For example, in graph colouring with k colours, each node must receive exactly one colour. Encoding this as a one-hot binary vector means the Hamming weight is fixed at k. The XY mixer preserves this constraint by construction, so the circuit never explores invalid solutions.

The circuit for a single XY interaction exp(-i * beta * (XX + YY) / 2) between qubits i and j decomposes as:

def xy_mixer_term(beta, i, j):
    """Apply exp(-i * beta * (X_i X_j + Y_i Y_j) / 2)."""
    # X_i X_j and Y_i Y_j commute, so the exponential splits into two
    # Ising rotations: IsingXX(phi) = exp(-i * phi/2 * XX), likewise YY.
    qml.IsingXX(beta, wires=[i, j])
    qml.IsingYY(beta, wires=[i, j])

The full XY mixer applies this to every pair in the mixer graph (which may differ from the problem graph).
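A standalone NumPy check confirms the Hamming-weight-preserving claim: the XY unitary leaves |00> and |11> untouched and rotates only within the {|01>, |10>} subspace (beta = 0.9 is an illustrative value):

```python
import numpy as np

def expmi(H, t):
    """exp(-i * t * H) for Hermitian H via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * t * w)) @ V.conj().T

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

beta = 0.9
U = expmi((np.kron(X, X) + np.kron(Y, Y)) / 2, beta)

# Basis order |00>, |01>, |10>, |11>: the XY generator annihilates |00>
# and |11>, and acts as a 2x2 rotation on the span of |01> and |10>
print(np.round(np.abs(U), 3))
```

The printed matrix has 1s at the |00> and |11> diagonal entries and zero amplitude for any transition that changes the Hamming weight.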

When to Use Which Mixer

Use the standard X mixer when:

  • The problem is unconstrained (like MaxCut)
  • You want to explore all possible bitstrings
  • Simplicity is preferred

Use the XY mixer when:

  • Solutions must satisfy equality constraints (fixed Hamming weight)
  • The feasible subspace is a small fraction of the full Hilbert space
  • Constraint violations would make the optimiser waste effort on invalid solutions

Other mixer options include the Grover mixer (which can preserve arbitrary feasible subspaces) and ring mixers (which connect qubits in a cycle).

Common Mistakes

Building a correct QAOA implementation requires attention to several subtle details. Here are five pitfalls that frequently trip up practitioners.

1. Wrong Sign Convention

QAOA maximises the cut, but optimisers like Adam minimise. With the z in {+1, -1} encoding, a cut edge has <Z_i Z_j> = -1, so the cost Hamiltonian must use positive ZZ coefficients (cost_coeffs = [0.5]) for minimising the expectation value to correspond to maximising the cut. Flip the sign and the optimiser happily finds the minimum cut instead, typically the trivial partition with every node in one set.

A quick sanity check: after optimisation, sample the circuit and confirm the most probable bitstrings are non-trivial (not all zeros or all ones) and actually cut edges. An optimiser that converges to the trivial partition almost always signals a sign error.

2. Too Few Shots on Real Hardware

On a simulator, expectation values are computed analytically. On real hardware, you estimate them by sampling the circuit many times (shots). With only 1000 shots, the estimated expectation value of a ±1-valued observable has a standard deviation of up to roughly 1/sqrt(1000) ≈ 0.03, which can mislead the optimiser into a poor parameter region. Use at least 4000 to 8000 shots for stable gradient estimates. Better yet, use parameter-shift gradients with enough shots per shift to achieve a target precision.
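The 1/sqrt(shots) scaling is easy to demonstrate with a simulated ±1 observable (a toy sketch; the expectation value 0.3 is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
true_expval = 0.3                  # hypothetical <H> of a +/-1 observable
p_plus = (1 + true_expval) / 2

def estimate(shots):
    """Shot-based estimate of the expectation value."""
    samples = rng.choice([1.0, -1.0], size=shots, p=[p_plus, 1 - p_plus])
    return samples.mean()

# Empirical spread of the estimator over 200 repetitions
stds = {}
for shots in [1000, 8000]:
    stds[shots] = np.std([estimate(shots) for _ in range(200)])
    print(f"{shots} shots: std of estimate = {stds[shots]:.4f} "
          f"(1/sqrt(shots) = {1 / np.sqrt(shots):.4f})")
```

Going from 1000 to 8000 shots shrinks the estimator's spread by a factor of about sqrt(8), which is what makes gradient signs reliable.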

3. Not Running Multiple Restarts

The QAOA landscape has many local minima, as the landscape visualisation above demonstrates. A single optimisation run from a random starting point often converges to a suboptimal local minimum. Running 5 to 10 random restarts and keeping the best result is a simple and effective mitigation. For higher p, the INTERP strategy provides a more principled warm-start.

4. Wrong Angle Convention for RX

PennyLane defines Rx(phi) = exp(-i * phi/2 * X). To implement the mixer exp(-i * beta * X), the circuit angle must be 2 * beta, not beta. This factor-of-two error is easy to introduce and hard to debug because the circuit still runs, it just optimises the wrong landscape. Always check: qml.RX(2 * beta, wires=q) implements exp(-i * beta * X).

The same applies to IsingZZ: qml.IsingZZ(2 * gamma, wires=[i, j]) implements exp(-i * gamma * Z_i Z_j).

5. Treating QAOA Output as a Single Solution

QAOA produces a probability distribution over bitstrings, not a single deterministic answer. The most probable bitstring (argmax of the probability vector) is a good candidate, but it is not guaranteed to be the optimal solution. In practice, sample many bitstrings from the output distribution, evaluate the cut value of each, and keep the best one. This post-processing step is essential: sometimes a bitstring with slightly lower probability has a higher cut value than the argmax bitstring.
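The post-processing step can be sketched as follows (self-contained; the output distribution below is made up for illustration, concentrated on a few bitstrings of the tutorial's 4-node graph):

```python
import numpy as np

edges = [(0, 1), (0, 3), (1, 2), (2, 3), (1, 3)]
n = 4

def cut_value(bits):
    return sum(1 for i, j in edges if bits[i] != bits[j])

# Toy QAOA output distribution concentrated on four bitstrings (made-up numbers)
probs = np.zeros(2 ** n)
probs[[5, 10, 6, 9]] = [0.30, 0.28, 0.20, 0.22]  # |0101>, |1010>, |0110>, |1001>

rng = np.random.default_rng(1)
samples = [int(s) for s in rng.choice(2 ** n, size=500, p=probs)]

# Keep the sampled bitstring with the best cut, not just the most frequent one
best = max(samples, key=lambda b: cut_value([int(c) for c in format(b, f"0{n}b")]))
best_bits = format(best, f"0{n}b")
print(best_bits, cut_value([int(c) for c in best_bits]))
```

Even if the argmax bitstring had been one of the cut-3 strings, scanning all samples recovers a cut-4 partition whenever one appears in the sample set.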

A related mistake is reporting only the expectation value as the QAOA performance metric. The expectation value is a weighted average over all bitstrings and is always less than or equal to the best cut value achievable by any single bitstring in the distribution. The true performance metric is the cut value of the best sampled bitstring divided by the optimal cut (the approximation ratio).

Scaling and Quantum Advantage

Increasing p concentrates probability mass on high-cut bitstrings, so the approximation ratio improves with depth. In the p -> infinity limit, QAOA is proven to find the exact optimum. For finite p on real hardware, performance depends on circuit noise, connectivity, and the problem instance.

QAOA with p=1 gives a guaranteed approximation ratio of at least 0.6924 for MaxCut on 3-regular graphs. Classical algorithms like Goemans-Williamson achieve 0.878. Whether quantum hardware can exceed classical solvers for practical problem sizes is an active research question. See the NISQ era discussion for context on hardware limitations.

What to Try Next

  • Scale to larger graphs (8-12 nodes) and compare p=1 vs p=3 approximation ratios
  • Replace the graph edges to model a scheduling or portfolio optimisation problem
  • Implement the FOURIER parameterisation for high-depth QAOA (p >= 5)
  • Experiment with the XY mixer on a constrained problem like graph colouring
  • Run on real hardware via the PennyLane-Qiskit plugin and observe how noise affects the approximation ratio
  • See the PennyLane reference for additional qml.qaoa built-in utilities
