
QAOA for Max-Cut with PennyLane

Implement the Quantum Approximate Optimization Algorithm for the Max-Cut problem in PennyLane: graph encoding, cost Hamiltonian, circuit construction, and parameter optimization.

What you'll learn

  • QAOA
  • PennyLane
  • Max-Cut
  • optimization
  • graph

Prerequisites

  • Python proficiency
  • Beginner quantum computing concepts (superposition, entanglement)
  • Linear algebra basics

The Max-Cut Problem

Given an undirected graph, the Max-Cut problem asks you to partition its vertices into two sets so that the number of edges crossing the partition is maximized.

For the four-vertex, five-edge graph used below, the best cut places four of the five edges across the partition boundary. Finding the truly optimal cut is NP-hard in general; no polynomial-time classical algorithm is known for the worst case. Despite that, Max-Cut is a benchmark problem for QAOA precisely because its structure maps cleanly onto quantum circuits.

The problem also has practical relevance. Max-Cut appears in chip design (minimizing wire crossings), statistical physics (ground state of Ising models), and machine learning (clustering via graph cuts).

QAOA Overview

QAOA (Quantum Approximate Optimization Algorithm, Farhi et al. 2014) is a variational algorithm. It prepares a parameterized quantum state and measures a cost function on it, then uses a classical optimizer to update the parameters until the cost improves.

The circuit alternates two types of operations over p layers:

  1. Cost unitary U_C(gamma): encodes the problem’s objective function by applying phase rotations proportional to the cost of each solution.
  2. Mixer unitary U_B(beta): mixes amplitudes across the solution space, allowing the algorithm to explore.

After p layers, you have 2p parameters: one gamma and one beta per layer. Larger p gives a better approximation at the cost of a deeper circuit. At p = infinity, QAOA can in principle find the exact optimum.

Connection to Adiabatic Computing

QAOA has a deep theoretical connection to quantum adiabatic computation. To understand this connection, consider two Hamiltonians:

  • H_B (the mixer Hamiltonian): a sum of Pauli-X operators, H_B = sum_i X_i. The ground state of H_B is the uniform superposition |+...+>, which is easy to prepare.
  • H_C (the cost Hamiltonian): the operator whose ground state encodes the optimal solution to the combinatorial problem.

The adiabatic theorem says that if you start in the ground state of H_B and slowly interpolate toward H_C via a time-dependent Hamiltonian:

H(t) = (1 - t/T) * H_B + (t/T) * H_C,    t in [0, T]

then for sufficiently large T (slow interpolation), the system remains in the instantaneous ground state throughout the evolution. At t = T, the system sits in the ground state of H_C, which is the optimal solution.

The challenge is that “sufficiently slow” can mean exponentially slow for hard problem instances.

From Adiabatic to QAOA via Trotterization

QAOA approximates this continuous adiabatic evolution with p discrete steps. Consider dividing the total evolution time T into p intervals. At each interval k, the Hamiltonian is approximately constant. The time evolution operator for the full adiabatic path factors as:

U_adiabatic ≈ prod_{k=1}^{p} exp(-i * beta_k * H_B) * exp(-i * gamma_k * H_C)

This is the first-order Trotter product formula (also called the Lie-Trotter decomposition). Each factor is a unitary generated by one of the two Hamiltonians at a specific time step.

The QAOA circuit is exactly this Trotter product, but it treats the gamma_k and beta_k values as free variational parameters rather than fixing them from the adiabatic schedule. This is a key insight: by optimizing the parameters classically, QAOA can potentially find better solutions at low p than the corresponding Trotter approximation of the adiabatic path.

The relationship between p and the adiabatic limit is:

p              Behavior
1              Coarse approximation; only two parameters
2-5            Increasingly fine Trotterization; captures more structure
p -> infinity  Recovers the full adiabatic evolution; finds the exact ground state

In practice, even small p gives useful results because the classical optimizer compensates for the coarse discretization by choosing parameter values that differ from the naive adiabatic schedule.
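This adiabatic connection also suggests a simple initialization heuristic: start the gammas on an increasing ramp (like t/T) and the betas on a decreasing one (like 1 - t/T). A minimal sketch; the scale `delta` is an illustrative choice, not a tuned value:

```python
import numpy as np

def linear_ramp_params(p, delta=0.75):
    """Adiabatic-inspired initial parameters: gammas ramp up toward
    delta, betas ramp down toward 0 across the p layers."""
    k = np.arange(1, p + 1)
    gammas = delta * k / p          # increasing, like t/T
    betas = delta * (1 - k / p)     # decreasing, like 1 - t/T
    return np.concatenate([gammas, betas])

print(linear_ramp_params(4))
```

Such a ramp is only a starting point; the classical optimizer is still expected to move the parameters away from the naive schedule.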

Encoding Max-Cut as an Ising Hamiltonian

For each vertex i, assign a spin variable z_i in {+1, -1} indicating which partition the vertex belongs to. The cut value contributed by an edge (i, j) is:

cut(i, j) = (1 - z_i * z_j) / 2

When z_i and z_j differ (the edge is cut), the contribution is 1. When they match, it is 0.

In quantum notation, replace z_i with the Pauli-Z operator on qubit i. The cost Hamiltonian becomes:

H_C = sum_{(i,j) in edges} (I - Z_i Z_j) / 2

Maximizing the number of cut edges is equivalent to finding the ground state of -H_C, or equivalently maximizing the expectation value <H_C>.
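A quick sanity check of this encoding, mapping a bit b in {0, 1} to a spin z = 1 - 2b:

```python
# Map bits to spins and confirm (1 - z_i * z_j) / 2 is 1 exactly when
# the two endpoints land in different partitions.
def spin(b):
    return 1 - 2 * b  # 0 -> +1, 1 -> -1

for b_i in (0, 1):
    for b_j in (0, 1):
        contribution = (1 - spin(b_i) * spin(b_j)) / 2
        assert contribution == (1 if b_i != b_j else 0)
print("edge term verified for all four bit pairs")
```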

Deriving the Cost and Mixer Unitaries

Understanding how the abstract Hamiltonians become concrete quantum gates is essential for implementing QAOA correctly.

Cost Unitary Derivation

The cost unitary applies a phase proportional to the cost of each computational basis state. For a single edge (i, j), the contribution to the cost Hamiltonian is (I - Z_i Z_j) / 2. The corresponding unitary is:

U_edge(gamma) = exp(-i * gamma * Z_i Z_j / 2)

(We drop the identity term since it contributes only a global phase.)

To compute the matrix form of this unitary, note that Z_i Z_j has eigenvalues +1 and -1. In the computational basis {|00>, |01>, |10>, |11>}, the operator Z_i Z_j is diagonal:

Z_i Z_j = diag(+1, -1, -1, +1)

So the single-edge unitary is the 4x4 diagonal matrix:

U_edge(gamma) = | exp(-i*gamma/2)     0              0              0           |
                | 0                   exp(+i*gamma/2) 0              0           |
                | 0                   0              exp(+i*gamma/2) 0           |
                | 0                   0              0              exp(-i*gamma/2)|

States where qubits i and j agree (|00>, |11>) pick up phase exp(-i*gamma/2), while states where they disagree (|01>, |10>) pick up phase exp(+i*gamma/2). This is exactly how QAOA rewards cut edges.

Gate Decomposition

To implement exp(-i * gamma * Z_i Z_j / 2) on a quantum computer, use the standard ZZ decomposition:

exp(-i * gamma * Z_i Z_j / 2) = CNOT(i, j) * Rz_j(gamma) * CNOT(i, j)

Here Rz(gamma) = exp(-i * gamma * Z / 2), applied to qubit j. The two CNOTs conjugate the Rz gate so that it acts on the parity of qubits i and j rather than on qubit j alone.
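This identity is easy to confirm numerically with plain numpy/scipy (Rz acting on the target qubit):

```python
import numpy as np
from scipy.linalg import expm

gamma = 0.7  # arbitrary test angle

Z = np.diag([1.0, -1.0])
ZZ = np.kron(Z, Z)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
Rz = np.diag([np.exp(-1j * gamma / 2), np.exp(1j * gamma / 2)])

# Rz on the target (second) qubit, conjugated by the two CNOTs
decomposed = CNOT @ np.kron(np.eye(2), Rz) @ CNOT
assert np.allclose(decomposed, expm(-1j * gamma / 2 * ZZ))
print("CNOT * Rz * CNOT matches exp(-i*gamma*ZZ/2)")
```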

PennyLane provides qml.IsingZZ(phi, wires=[i, j]), which implements:

IsingZZ(phi) = exp(-i * phi/2 * Z_i Z_j)

Setting phi = gamma reproduces our single-edge cost unitary exp(-i * gamma * Z_i Z_j / 2) exactly. The circuit code below instead passes phi = 2 * gamma, which applies exp(-i * gamma * Z_i Z_j): it exponentiates gamma * Z_i Z_j without the factor of 1/2 inherited from (I - Z_i Z_j)/2. Both are valid parameterizations, because gamma is a free variational parameter; the optimizer finds the right value regardless of how constant factors are absorbed into it.

Mixer Unitary Derivation

The mixer Hamiltonian is H_B = sum_i X_i. Since the X operators on different qubits commute (they act on different Hilbert spaces), the mixer unitary factors exactly:

exp(-i * beta * H_B) = prod_i exp(-i * beta * X_i)

Each single-qubit factor is simply an X-rotation:

exp(-i * beta * X_i) = Rx(2 * beta)

This follows from the definition Rx(theta) = exp(-i * theta/2 * X). Setting theta/2 = beta gives theta = 2 * beta.
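This identity, too, can be verified numerically:

```python
import numpy as np
from scipy.linalg import expm

beta = 0.3  # arbitrary test angle
X = np.array([[0, 1], [1, 0]], dtype=complex)

def Rx(theta):
    """Standard X-rotation matrix, Rx(theta) = exp(-i * theta/2 * X)."""
    return np.array([[np.cos(theta / 2), -1j * np.sin(theta / 2)],
                     [-1j * np.sin(theta / 2), np.cos(theta / 2)]])

assert np.allclose(expm(-1j * beta * X), Rx(2 * beta))
print("exp(-i*beta*X) equals Rx(2*beta)")
```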

Setup and Graph Definition

pip install pennylane networkx matplotlib

import pennylane as qml
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

# Define a 4-node graph
G = nx.Graph()
G.add_edges_from([(0, 1), (0, 3), (1, 2), (2, 3), (1, 3)])
n_qubits = G.number_of_nodes()

print(f"Nodes: {list(G.nodes())}")
print(f"Edges: {list(G.edges())}")
# Nodes: [0, 1, 2, 3]
# Edges: [(0, 1), (0, 3), (1, 2), (2, 3), (1, 3)]

Building the Cost Hamiltonian

# Build cost Hamiltonian as a PennyLane operator
def build_cost_hamiltonian(graph):
    coeffs = []
    obs = []
    for i, j in graph.edges():
        # Coefficient for (I - Z_i Z_j) / 2
        # Split into: 0.5 * I  -  0.5 * Z_i Z_j
        coeffs.append(0.5)
        obs.append(qml.Identity(i))
        coeffs.append(-0.5)
        obs.append(qml.PauliZ(i) @ qml.PauliZ(j))
    return qml.Hamiltonian(coeffs, obs)

H_cost = build_cost_hamiltonian(G)
print(H_cost)
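As a cross-check that is independent of PennyLane, the largest eigenvalue of H_C built from the same edge list should equal the maximum cut of the graph, which is 4 for this example (the triangle 1-2-3 prevents cutting all five edges):

```python
import numpy as np

edges = [(0, 1), (0, 3), (1, 2), (2, 3), (1, 3)]
n = 4
Z = np.diag([1.0, -1.0])

def zz_term(i, j, n):
    """Kronecker product with Z at positions i and j, identity elsewhere."""
    ops = [np.eye(2)] * n
    ops[i] = Z
    ops[j] = Z
    out = ops[0]
    for op in ops[1:]:
        out = np.kron(out, op)
    return out

H = sum(0.5 * (np.eye(2 ** n) - zz_term(i, j, n)) for i, j in edges)
print(f"largest eigenvalue = {np.linalg.eigvalsh(H).max():.1f}")  # -> 4.0
```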

Using PennyLane’s Built-in QAOA Module

PennyLane provides a dedicated qml.qaoa module that simplifies QAOA construction. Instead of building the cost Hamiltonian by hand, you can use qml.qaoa.maxcut to generate it directly from the graph:

# Build cost and mixer Hamiltonians using the qaoa module
H_cost_auto, H_mixer_auto = qml.qaoa.maxcut(G)

print("Cost Hamiltonian (from qml.qaoa.maxcut):")
print(H_cost_auto)
print("\nMixer Hamiltonian:")
print(H_mixer_auto)

The module also provides qml.qaoa.cost_layer and qml.qaoa.mixer_layer that apply the appropriate unitaries:

dev_compare = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev_compare)
def qaoa_circuit_module(params, p, graph):
    """QAOA circuit using qml.qaoa built-in functions."""
    gammas = params[:p]
    betas = params[p:]

    H_cost_mod, H_mixer_mod = qml.qaoa.maxcut(graph)

    # Initialize in equal superposition
    for wire in range(graph.number_of_nodes()):
        qml.Hadamard(wires=wire)

    # Apply cost and mixer layers using the module
    for layer in range(p):
        qml.qaoa.cost_layer(gammas[layer], H_cost_mod)
        qml.qaoa.mixer_layer(betas[layer], H_mixer_mod)

    return qml.expval(H_cost_mod)

# Compare manual and module implementations
# (qaoa_circuit is the manual implementation defined in the next section)
test_params = np.array([0.5, 0.3])
manual_result = qaoa_circuit(test_params, p=1, graph=G)
module_result = qaoa_circuit_module(test_params, p=1, graph=G)

print(f"Manual implementation: {manual_result:.6f}")
print(f"Module implementation: {module_result:.6f}")
# Note: do not expect the two numbers to coincide. qml.qaoa.maxcut uses the
# convention H_C = sum (Z_i Z_j - I) / 2, whose *minimum* corresponds to the
# maximum cut, and cost_layer applies exp(-i * gamma * H_C) without the extra
# factor of 2 used in the manual cost layer. Both are valid QAOA
# parameterizations; they differ by sign and parameter scaling.

The qml.qaoa module handles the Hamiltonian construction, the decomposition into gates, and the layer structure. For production code, prefer the module; for learning, the manual approach helps you understand what happens inside.

Constructing the QAOA Circuit

dev = qml.device("default.qubit", wires=n_qubits)

def cost_layer(gamma, graph):
    """Cost layer: IsingZZ(2 * gamma) on each edge, i.e. exp(-i * gamma * Z_i Z_j)
    per edge (see the convention note above)."""
    for i, j in graph.edges():
        qml.IsingZZ(2 * gamma, wires=[i, j])

def mixer_layer(beta):
    """Apply exp(-i * beta * H_B) via RX on each qubit."""
    for wire in range(n_qubits):
        qml.RX(2 * beta, wires=wire)

@qml.qnode(dev)
def qaoa_circuit(params, p, graph):
    gammas = params[:p]
    betas  = params[p:]

    # Equal superposition initialization
    for wire in range(n_qubits):
        qml.Hadamard(wires=wire)

    # Alternate cost and mixer layers
    for layer in range(p):
        cost_layer(gammas[layer], graph)
        mixer_layer(betas[layer])

    return qml.expval(build_cost_hamiltonian(graph))

Testing the Circuit at p=1

# Initial parameters (random)
p = 1
np.random.seed(42)
params_init = np.random.uniform(0, np.pi, 2 * p)

cost = qaoa_circuit(params_init, p=p, graph=G)
print(f"Initial cost: {cost:.4f}")
# Expected: some value between 0 and 4 (the maximum cut of this graph is 4)

Parameter Landscape Visualization

Before optimizing, it helps to visualize the energy landscape. For p=1 QAOA, there are only two parameters (gamma, beta), so the full landscape is a 2D surface that you can plot as a heatmap.

# Compute the QAOA energy landscape for p=1
n_points = 30
gamma_range = np.linspace(0, 2 * np.pi, n_points)
beta_range = np.linspace(0, np.pi, n_points)
landscape = np.zeros((n_points, n_points))

for i, gamma in enumerate(gamma_range):
    for j, beta in enumerate(beta_range):
        params = np.array([gamma, beta])
        landscape[j, i] = qaoa_circuit(params, p=1, graph=G)

# Plot the landscape
fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(
    landscape,
    extent=[0, 2 * np.pi, 0, np.pi],
    origin="lower",
    aspect="auto",
    cmap="RdYlBu_r",
)
ax.set_xlabel(r"$\gamma$")
ax.set_ylabel(r"$\beta$")
ax.set_title("QAOA p=1 Energy Landscape for Max-Cut")
plt.colorbar(im, label=r"$\langle H_C \rangle$")
plt.tight_layout()
plt.savefig("qaoa_landscape.png", dpi=150)
plt.show()

# Find the maximum in the landscape
max_idx = np.unravel_index(np.argmax(landscape), landscape.shape)
best_gamma = gamma_range[max_idx[1]]
best_beta = beta_range[max_idx[0]]
print(f"Grid search best: gamma={best_gamma:.3f}, beta={best_beta:.3f}")
print(f"Grid search best <H_C> = {landscape[max_idx]:.4f}")

The landscape has a regular, periodic structure with several local maxima. This periodicity comes from the fact that the cost and mixer unitaries are periodic in their parameters. The landscape also exhibits symmetries: for example, the landscape is symmetric under gamma -> 2*pi - gamma combined with beta -> pi - beta.

This structure has practical implications. First, it means the optimization landscape is non-convex, so gradient-based optimizers can get stuck in local optima. Second, the regular structure means that grid search over a coarse grid (as above) can identify good starting points for fine-tuning with gradient descent. Third, the periodicity means you only need to search over a bounded region, not all of R^2.

Optimizing Parameters

from pennylane.optimize import AdamOptimizer
import pennylane.numpy as pnp

def optimize_qaoa(graph, p=1, steps=150, lr=0.1):
    np.random.seed(0)
    params = pnp.array(np.random.uniform(0, np.pi, 2 * p), requires_grad=True)

    optimizer = AdamOptimizer(stepsize=lr)
    cost_history = []

    for step in range(steps):
        params, cost = optimizer.step_and_cost(
            lambda params_: -qaoa_circuit(params_, p=p, graph=graph),
            params
        )
        cost_history.append(-float(cost))   # flip sign: we minimized -cost

        if step % 30 == 0:
            print(f"Step {step:3d}  <H_C> = {cost_history[-1]:.4f}")

    return params, cost_history

# Run for p=1 and p=2
params_p1, history_p1 = optimize_qaoa(G, p=1, steps=150)
params_p2, history_p2 = optimize_qaoa(G, p=2, steps=200)

INTERP Initialization Strategy

One of the biggest challenges in QAOA is initializing parameters for higher p values. Random initialization becomes increasingly unreliable as the number of parameters grows, because the optimization landscape has more local optima.

The INTERP (interpolation) strategy provides a principled way to bootstrap parameters from lower p to higher p. The idea: once you find good parameters for p layers, use them to construct a starting point for p+1 layers.

The procedure is:

  1. Optimize QAOA at p=1 to find optimal parameters [gamma_1, beta_1].
  2. To initialize p=2, set gamma = [gamma_1, 0] and beta = [beta_1, 0]. In other words, append a zero to both lists.
  3. More generally, for arbitrary p, interpolate the optimal p-layer parameters onto a (p+1)-point grid.

The intuition is that a new final layer with zero parameters acts as the identity, so the initial (p+1)-layer circuit reproduces the optimized p-layer circuit exactly; the optimizer starts from that known-good point and only needs to make small adjustments.

def interp_initialize(params_p, p):
    """
    Initialize p+1 layers from optimal p-layer parameters.

    Given optimal [gamma_1, ..., gamma_p, beta_1, ..., beta_p],
    produce initial [gamma_1, ..., gamma_p, 0, beta_1, ..., beta_p, 0].

    Appending zeros makes the new final layer act as the identity,
    so the starting circuit reproduces the optimized p-layer circuit.
    """
    gammas = params_p[:p]
    betas = params_p[p:]

    # Append a zero to both the gammas and the betas
    new_gammas = np.concatenate([np.array(gammas), [0.0]])
    new_betas = np.concatenate([np.array(betas), [0.0]])

    return pnp.array(np.concatenate([new_gammas, new_betas]), requires_grad=True)


def optimize_with_interp(graph, max_p=4, steps_per_p=150, lr=0.1):
    """Optimize QAOA from p=1 to max_p using INTERP initialization."""
    results = {}

    # Start with p=1 (random init)
    np.random.seed(0)
    params = pnp.array(np.random.uniform(0, np.pi, 2), requires_grad=True)
    optimizer = AdamOptimizer(stepsize=lr)

    for step in range(steps_per_p):
        params, _ = optimizer.step_and_cost(
            lambda p_: -qaoa_circuit(p_, p=1, graph=graph), params
        )

    cost_p1 = float(qaoa_circuit(params, p=1, graph=graph))
    results[1] = {"params": params.copy(), "cost": cost_p1}
    print(f"p=1: <H_C> = {cost_p1:.4f}")

    # Build up from p=2 to max_p using INTERP
    for p in range(2, max_p + 1):
        prev_params = results[p - 1]["params"]
        params = interp_initialize(prev_params, p - 1)
        optimizer = AdamOptimizer(stepsize=lr)

        for step in range(steps_per_p):
            params, _ = optimizer.step_and_cost(
                lambda p_, p_val=p: -qaoa_circuit(p_, p=p_val, graph=graph),
                params
            )

        cost_p = float(qaoa_circuit(params, p=p, graph=graph))
        results[p] = {"params": params.copy(), "cost": cost_p}
        print(f"p={p}: <H_C> = {cost_p:.4f}")

    return results

interp_results = optimize_with_interp(G, max_p=4)

In practice, this layer-by-layer bootstrapping typically outperforms random initialization for p > 1, because it starts the optimizer in a region of parameter space that already encodes useful structure from the lower-depth solution.

Reading the Solution

After optimization, sample bitstrings from the circuit to find the actual cut partition:

dev_sample = qml.device("default.qubit", wires=n_qubits, shots=1024)

@qml.qnode(dev_sample)
def qaoa_sample(params, p, graph):
    gammas = params[:p]
    betas  = params[p:]
    for wire in range(n_qubits):
        qml.Hadamard(wires=wire)
    for layer in range(p):
        cost_layer(gammas[layer], graph)
        mixer_layer(betas[layer])
    return qml.sample(wires=range(n_qubits))

samples = qaoa_sample(params_p1, p=1, graph=G)
# Each row is one bitstring: [0,1,0,1] means vertices 1,3 in partition A; 0,2 in partition B

def count_cut(bitstring, graph):
    cut = 0
    for i, j in graph.edges():
        if bitstring[i] != bitstring[j]:
            cut += 1
    return cut

# Evaluate the most common bitstring
from collections import Counter
bitstrings = [tuple(s) for s in samples]
most_common = Counter(bitstrings).most_common(5)

for bitstring, freq in most_common:
    cut = count_cut(bitstring, G)
    print(f"  {bitstring}  cut={cut}  freq={freq}")

Approximation Ratio

The maximum cut for this 4-node, 5-edge graph is 4 edges (verified by brute force). Compare QAOA against that:

# Brute force optimal
from itertools import product

def brute_force_max_cut(graph):
    n = graph.number_of_nodes()
    best = 0
    best_partition = None
    for bits in product([0, 1], repeat=n):
        cut = count_cut(bits, graph)
        if cut > best:
            best = cut
            best_partition = bits
    return best, best_partition

optimal_cut, optimal_partition = brute_force_max_cut(G)
print(f"Optimal cut: {optimal_cut}  partition: {optimal_partition}")

# Best cut found among the QAOA samples
best_qaoa_cut = max(count_cut(b, G) for b in bitstrings)
approx_ratio = best_qaoa_cut / optimal_cut
print(f"QAOA best cut: {best_qaoa_cut}  approximation ratio: {approx_ratio:.3f}")

For p=1, you typically achieve an approximation ratio around 0.75 on this graph. For p=2, it rises toward 0.9 or higher. For context, the p=1 QAOA worst-case guarantee of 0.6924 (proved by Farhi et al. for 3-regular graphs) is distinct from the 0.878 expected ratio of the classical Goemans-Williamson SDP algorithm. QAOA can beat the 0.6924 floor in practice, especially on structured graphs.
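Note that best-sampled-cut over optimal is an optimistic metric; the quantity QAOA actually optimizes is the expectation-based ratio <H_C> / C_max, and the two can differ noticeably. A sketch with a hypothetical sample set (not real QAOA output):

```python
# Hypothetical 1024-shot sample set for illustration only
edges = [(0, 1), (0, 3), (1, 2), (2, 3), (1, 3)]
optimal = 4  # brute-force maximum cut of this graph

def count_cut(bits, edges):
    return sum(1 for i, j in edges if bits[i] != bits[j])

samples = [(0, 1, 0, 1)] * 600 + [(1, 0, 1, 0)] * 300 + [(0, 0, 1, 1)] * 124

mean_cut = sum(count_cut(b, edges) for b in samples) / len(samples)
best_cut = max(count_cut(b, edges) for b in samples)

print(f"expectation-based ratio: {mean_cut / optimal:.3f}")  # 0.970
print(f"best-sample ratio:       {best_cut / optimal:.3f}")  # 1.000
```

When reporting QAOA results, it is worth stating which of the two ratios you quote.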

Approximation Ratio vs Circuit Depth

p (layers)  Parameters  Approx ratio (typical)  Circuit depth
1           2           ~0.75                   Shallow
2           4           ~0.85                   Moderate
3           6           ~0.92                   Deeper
5           10          ~0.97                   Hardware-challenging

There are diminishing returns as p grows: each additional layer contributes less improvement while adding to circuit depth and optimization difficulty.

Weighted Max-Cut

The standard Max-Cut problem treats all edges equally. In many real-world applications, edges carry weights representing the strength of connections, and the objective becomes maximizing the total weight of cut edges.

For weighted Max-Cut, the cost Hamiltonian generalizes to:

H_C = sum_{(i,j) in edges} w_{ij} * (I - Z_i Z_j) / 2

where w_{ij} is the weight of edge (i, j).

# Define a weighted graph where one heavy edge dominates
G_weighted = nx.Graph()
G_weighted.add_edge(0, 1, weight=100.0)  # Heavy edge: must be in optimal cut
G_weighted.add_edge(1, 2, weight=1.0)
G_weighted.add_edge(2, 3, weight=1.0)
G_weighted.add_edge(3, 0, weight=1.0)
G_weighted.add_edge(0, 2, weight=1.0)

n_qubits_w = G_weighted.number_of_nodes()

def build_weighted_cost_hamiltonian(graph):
    """Build cost Hamiltonian for weighted Max-Cut."""
    coeffs = []
    obs = []
    for i, j, data in graph.edges(data=True):
        w = data.get("weight", 1.0)
        # w * (I - Z_i Z_j) / 2  =  w/2 * I  -  w/2 * Z_i Z_j
        coeffs.append(w / 2)
        obs.append(qml.Identity(i))
        coeffs.append(-w / 2)
        obs.append(qml.PauliZ(i) @ qml.PauliZ(j))
    return qml.Hamiltonian(coeffs, obs)

H_cost_w = build_weighted_cost_hamiltonian(G_weighted)

# Cost and mixer layers for weighted Max-Cut
def weighted_cost_layer(gamma, graph):
    """Apply cost unitary for weighted graph. Each edge scaled by its weight."""
    for i, j, data in graph.edges(data=True):
        w = data.get("weight", 1.0)
        qml.IsingZZ(2 * gamma * w, wires=[i, j])

dev_w = qml.device("default.qubit", wires=n_qubits_w)

@qml.qnode(dev_w)
def qaoa_weighted(params, p, graph):
    gammas = params[:p]
    betas = params[p:]
    for wire in range(graph.number_of_nodes()):
        qml.Hadamard(wires=wire)
    for layer in range(p):
        weighted_cost_layer(gammas[layer], graph)
        for wire in range(graph.number_of_nodes()):
            qml.RX(2 * betas[layer], wires=wire)
    return qml.expval(build_weighted_cost_hamiltonian(graph))

# Optimize weighted QAOA
params_w = pnp.array(np.random.uniform(0, np.pi, 2), requires_grad=True)
optimizer_w = AdamOptimizer(stepsize=0.05)

for step in range(200):
    params_w, cost_w = optimizer_w.step_and_cost(
        lambda p_: -qaoa_weighted(p_, p=1, graph=G_weighted), params_w
    )
    if step % 50 == 0:
        print(f"Step {step:3d}  <H_C> = {-float(cost_w):.4f}")

# Sample and verify the heavy edge is cut
dev_w_sample = qml.device("default.qubit", wires=n_qubits_w, shots=1024)

@qml.qnode(dev_w_sample)
def qaoa_weighted_sample(params, p, graph):
    gammas = params[:p]
    betas = params[p:]
    for wire in range(graph.number_of_nodes()):
        qml.Hadamard(wires=wire)
    for layer in range(p):
        weighted_cost_layer(gammas[layer], graph)
        for wire in range(graph.number_of_nodes()):
            qml.RX(2 * betas[layer], wires=wire)
    return qml.sample(wires=range(graph.number_of_nodes()))

samples_w = qaoa_weighted_sample(params_w, p=1, graph=G_weighted)
bitstrings_w = [tuple(s) for s in samples_w]

# Check: in the most common solutions, is the heavy edge (0,1) always cut?
most_common_w = Counter(bitstrings_w).most_common(5)
for bs, freq in most_common_w:
    heavy_edge_cut = bs[0] != bs[1]
    total_weight = sum(
        data["weight"] for i, j, data in G_weighted.edges(data=True) if bs[i] != bs[j]
    )
    print(f"  {bs}  weight={total_weight:.0f}  heavy edge cut={heavy_edge_cut}  freq={freq}")

The optimizer learns to prioritize cutting the heavy edge (0, 1), since that single edge contributes 100 out of the 104 total weight in the graph. In the output, you should see that the dominant bitstrings always place vertices 0 and 1 in different partitions.
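A classical brute-force check over the same weights confirms this: the best partition separates vertices 0 and 1 and attains cut weight 103 out of the 104 total (the triangle 0-2-3 makes a perfect 104 impossible):

```python
from itertools import product

# Same weighted graph as above, as (i, j, weight) triples
weighted_edges = [(0, 1, 100.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0), (0, 2, 1.0)]

def cut_weight(bits, edges):
    return sum(w for i, j, w in edges if bits[i] != bits[j])

best = max(product([0, 1], repeat=4), key=lambda b: cut_weight(b, weighted_edges))
print(best, cut_weight(best, weighted_edges))
assert best[0] != best[1]                        # the heavy edge is cut
assert cut_weight(best, weighted_edges) == 103.0
```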

XY Mixer for Cardinality-Constrained Problems

The standard QAOA mixer H_B = sum_i X_i conserves no property of the bitstring: each Rx rotation can flip any qubit, so the search explores all 2^n possible bitstrings. This is fine for unconstrained Max-Cut, but many combinatorial problems have cardinality constraints: “choose exactly k items from n.”

The XY mixer preserves the Hamming weight (number of ones) in the bitstring. It swaps excitations between neighboring qubits without creating or destroying them. For a pair of qubits (i, j), the XY interaction is:

H_XY(i, j) = (X_i X_j + Y_i Y_j) / 2

This operator maps |01> <-> |10> (swaps the excitation) while leaving |00> and |11> unchanged. The total number of ones is conserved.
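You can confirm this block structure by exponentiating the XY Hamiltonian directly:

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])

H_xy = (np.kron(X, X) + np.kron(Y, Y)) / 2
U = expm(-1j * 0.4 * H_xy)  # arbitrary beta = 0.4

# Basis order |00>, |01>, |10>, |11>: the unitary acts only inside the
# {|01>, |10>} block, so |00> and |11> (and Hamming weight) are preserved.
assert np.allclose(U[:, 0], [1, 0, 0, 0])
assert np.allclose(U[:, 3], [0, 0, 0, 1])
print(np.round(U, 3))
```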

def xy_mixer_layer(beta, wires, connectivity="full"):
    """
    Apply an XY mixer that preserves Hamming weight.

    For each pair (i, j) in the connectivity graph, apply
        exp(-i * beta * (X_i X_j + Y_i Y_j) / 2).

    PennyLane's qml.IsingXY(phi) implements exp(i * phi/4 * (XX + YY)),
    so phi = -2 * beta gives the unitary above. (Since beta is a free
    variational parameter, the sign convention is not critical.)
    """
    n = len(wires)
    if connectivity == "full":
        pairs = [(wires[i], wires[j]) for i in range(n) for j in range(i + 1, n)]
    elif connectivity == "ring":
        pairs = [(wires[i], wires[(i + 1) % n]) for i in range(n)]
    else:
        pairs = connectivity  # Pass an explicit list of pairs

    for i, j in pairs:
        # IsingXY acts only within the {|01>, |10>} subspace, so the
        # total number of ones in the bitstring is conserved.
        qml.IsingXY(-2 * beta, wires=[i, j])


# Demonstrate: XY mixer on a problem requiring exactly k=2 ones
# We want the best 2-vertex subset that maximizes cut edges
n_qubits_xy = 4
dev_xy = qml.device("default.qubit", wires=n_qubits_xy)

@qml.qnode(dev_xy)
def qaoa_xy(params, p, graph):
    """QAOA with XY mixer, starting from a k=2 state."""
    gammas = params[:p]
    betas = params[p:]

    # Initialize in a state with exactly k=2 ones: |0011>
    # This sets the Hamming weight sector
    qml.PauliX(wires=2)
    qml.PauliX(wires=3)

    for layer in range(p):
        # Cost layer (same as standard QAOA)
        for i, j in graph.edges():
            qml.IsingZZ(2 * gammas[layer], wires=[i, j])
        # XY mixer (preserves Hamming weight)
        xy_mixer_layer(betas[layer], wires=list(range(n_qubits_xy)))

    return qml.expval(build_cost_hamiltonian(graph))

# Test: verify Hamming weight is preserved
dev_xy_sample = qml.device("default.qubit", wires=n_qubits_xy, shots=500)

@qml.qnode(dev_xy_sample)
def qaoa_xy_sample(params, p, graph):
    gammas = params[:p]
    betas = params[p:]
    qml.PauliX(wires=2)
    qml.PauliX(wires=3)
    for layer in range(p):
        for i, j in graph.edges():
            qml.IsingZZ(2 * gammas[layer], wires=[i, j])
        xy_mixer_layer(betas[layer], wires=list(range(n_qubits_xy)))
    return qml.sample(wires=range(n_qubits_xy))

test_params_xy = np.array([0.8, 0.5])
samples_xy = qaoa_xy_sample(test_params_xy, p=1, graph=G)

# All samples should have exactly 2 ones
hamming_weights = [sum(s) for s in samples_xy]
print(f"Hamming weights in samples: {set(hamming_weights)}")
# Expected: {2} (only Hamming weight 2)
print(f"Unique bitstrings: {Counter([tuple(s) for s in samples_xy]).most_common()}")

When to use which mixer:

Mixer          Explores                     Use case
Rx (standard)  All 2^n bitstrings           Unconstrained problems (Max-Cut, Max-SAT)
XY             Fixed Hamming weight sector  Cardinality-constrained problems (k-coloring, portfolio selection with fixed budget)

The XY mixer restricts the search to a smaller, feasible subspace, which can dramatically improve convergence for constrained problems.

Gradient-Based vs Gradient-Free Optimization

The choice of optimizer matters for QAOA, especially when targeting real hardware. Here we compare three approaches: Adam (adaptive gradient), vanilla gradient descent with the parameter-shift rule, and SPSA (gradient-free, stochastic).

from pennylane.optimize import AdamOptimizer, GradientDescentOptimizer, SPSAOptimizer

def run_optimizer_comparison(graph, p=1, steps=150):
    """Compare three optimizers on the same QAOA instance."""

    # Common objective: minimize -<H_C>
    def objective(params):
        return -qaoa_circuit(params, p=p, graph=graph)

    results = {}

    # 1. Adam optimizer
    np.random.seed(42)
    params_adam = pnp.array(np.random.uniform(0, np.pi, 2 * p), requires_grad=True)
    opt_adam = AdamOptimizer(stepsize=0.1)
    history_adam = []
    for step in range(steps):
        params_adam, cost = opt_adam.step_and_cost(objective, params_adam)
        history_adam.append(-float(cost))
    results["Adam"] = history_adam

    # 2. Gradient descent with parameter-shift rule (PennyLane default)
    np.random.seed(42)
    params_gd = pnp.array(np.random.uniform(0, np.pi, 2 * p), requires_grad=True)
    opt_gd = GradientDescentOptimizer(stepsize=0.3)
    history_gd = []
    for step in range(steps):
        params_gd, cost = opt_gd.step_and_cost(objective, params_gd)
        history_gd.append(-float(cost))
    results["GD (param-shift)"] = history_gd

    # 3. SPSA optimizer (gradient-free, stochastic)
    np.random.seed(42)
    params_spsa = pnp.array(np.random.uniform(0, np.pi, 2 * p), requires_grad=True)
    opt_spsa = SPSAOptimizer(maxiter=steps)
    history_spsa = []
    for step in range(steps):
        params_spsa = opt_spsa.step(objective, params_spsa)
        cost_val = float(objective(params_spsa))
        history_spsa.append(-cost_val)
    results["SPSA"] = history_spsa

    return results

comparison = run_optimizer_comparison(G, p=1, steps=150)

# Plot convergence curves
fig, ax = plt.subplots(figsize=(8, 5))
for name, history in comparison.items():
    ax.plot(history, label=name)
ax.set_xlabel("Optimization step")
ax.set_ylabel(r"$\langle H_C \rangle$")
ax.set_title("Optimizer Comparison for QAOA p=1")
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("optimizer_comparison.png", dpi=150)
plt.show()

print("\nFinal values:")
for name, history in comparison.items():
    print(f"  {name}: {history[-1]:.4f}")

Key differences between the optimizers:

Optimizer         Circuit evaluations per step  Gradient method                          Best for
Adam              2 * n_params + 1              Parameter-shift rule (exact)             Simulator, smooth landscapes
GD (param-shift)  2 * n_params + 1              Parameter-shift rule (exact)             Simulator, fine-tuning
SPSA              2 (constant)                  Finite difference, random perturbation   Noisy hardware, many parameters

SPSA is the preferred optimizer for real quantum hardware. It estimates the gradient from only two circuit evaluations regardless of the number of parameters, compared to 2 * n_params evaluations for parameter-shift. On noisy hardware where each circuit evaluation is expensive and noisy, this constant overhead is a major advantage. The tradeoff is that SPSA produces noisy gradient estimates, so convergence is less smooth.
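The scaling difference is easy to see by counting circuit evaluations per step as p grows:

```python
# Evaluations per optimization step: parameter-shift needs two circuits per
# parameter (plus one for the cost value); SPSA always needs two.
for p in (1, 2, 4, 8):
    n_params = 2 * p
    param_shift = 2 * n_params + 1
    spsa = 2
    print(f"p={p}: param-shift {param_shift} evals/step, SPSA {spsa}")
```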

From PennyLane to Real Hardware

Running QAOA on real quantum hardware requires only a device swap in PennyLane, thanks to its hardware-agnostic design. The PennyLane-Qiskit plugin connects to IBM Quantum backends.

# Install the plugin: pip install pennylane-qiskit

# To run on IBM hardware (requires an IBM Quantum account):
# dev_ibm = qml.device(
#     "qiskit.ibmq",
#     wires=4,
#     backend="ibm_brisbane",    # or another available backend
#     shots=4096,
# )

# For demonstration, use the Qiskit Aer simulator (no account needed):
# dev_aer = qml.device("qiskit.aer", wires=4, shots=4096)

# The same QNode works with any device:
# @qml.qnode(dev_ibm)
# def qaoa_hardware(params, p, graph):
#     gammas = params[:p]
#     betas = params[p:]
#     for wire in range(graph.number_of_nodes()):
#         qml.Hadamard(wires=wire)
#     for layer in range(p):
#         cost_layer(gammas[layer], graph)
#         mixer_layer(betas[layer])
#     return qml.expval(build_cost_hamiltonian(graph))

Circuit Depth Analysis for Hardware

Before running on hardware, estimate the circuit fidelity by counting two-qubit gates:

# Draw the circuit and count gates
@qml.qnode(qml.device("default.qubit", wires=4))
def qaoa_for_drawing(params):
    for wire in range(4):
        qml.Hadamard(wires=wire)
    for layer in range(2):  # p=2
        cost_layer(params[layer], G)
        mixer_layer(params[2 + layer])
    return qml.expval(qml.PauliZ(0))

draw_params = np.array([0.5, 0.3, 0.4, 0.2])
print(qml.draw(qaoa_for_drawing)(draw_params))

# Count operations
specs = qml.specs(qaoa_for_drawing)(draw_params)
print(f"\nCircuit statistics:")
print(f"  Total gates: {specs['resources'].num_gates}")
print(f"  Depth: {specs['resources'].depth}")

Estimating hardware fidelity: If each two-qubit gate has error rate epsilon (typically 0.5% to 2% on current hardware), the probability of the entire circuit executing without error is approximately:

fidelity ≈ (1 - epsilon)^(n_two_qubit_gates)

For our 4-node Max-Cut graph at p=2, there are 10 IsingZZ gates (5 edges times 2 layers). At 1% error per gate, the estimated fidelity is (0.99)^10 ≈ 0.90. At p=3 it drops to (0.99)^15 ≈ 0.86. For practical results on NISQ hardware, keep p at 1 or 2 and apply error mitigation techniques.
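This estimate is easy to tabulate for other depths. A quick sketch of the same arithmetic, treating `epsilon` as a uniform two-qubit error rate and neglecting single-qubit gate errors (the helper name `estimate_fidelity` is ours):

```python
def estimate_fidelity(n_edges, p, epsilon=0.01):
    """Crude circuit fidelity estimate: each QAOA layer applies one
    two-qubit IsingZZ gate per edge, each failing independently
    with probability epsilon."""
    n_two_qubit_gates = n_edges * p
    return (1 - epsilon) ** n_two_qubit_gates

# Tabulate for the 5-edge tutorial graph at 1% error per gate
for p in (1, 2, 3, 4):
    print(f"p={p}: estimated fidelity ~{estimate_fidelity(5, p):.3f}")
```

At `p=4` the estimate is already below 0.82, which is the quantitative reason this tutorial caps hardware runs at one or two layers.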

Performance Scaling with Graph Size

How does QAOA performance scale as graphs grow? This section benchmarks p=1 and p=2 on three graph types.

def run_qaoa_benchmark(graph, p, steps=200, lr=0.1, seed=0):
    """Run QAOA optimization and return the best expectation value and approximation ratio."""
    n = graph.number_of_nodes()
    dev_bench = qml.device("default.qubit", wires=n)
    H_bench = build_cost_hamiltonian(graph)

    @qml.qnode(dev_bench)
    def circuit(params):
        gammas = params[:p]
        betas = params[p:]
        for w in range(n):
            qml.Hadamard(wires=w)
        for layer in range(p):
            for i, j in graph.edges():
                qml.IsingZZ(2 * gammas[layer], wires=[i, j])
            for w in range(n):
                qml.RX(2 * betas[layer], wires=w)
        return qml.expval(H_bench)

    np.random.seed(seed)
    params = pnp.array(np.random.uniform(0, np.pi, 2 * p), requires_grad=True)
    opt = AdamOptimizer(stepsize=lr)

    for _ in range(steps):
        params, _ = opt.step_and_cost(lambda p_: -circuit(p_), params)

    qaoa_value = float(circuit(params))
    optimal_cut, _ = brute_force_max_cut(graph)
    ratio = qaoa_value / optimal_cut if optimal_cut > 0 else 0

    return qaoa_value, optimal_cut, ratio

# Test graphs
np.random.seed(123)

# (a) 4-node cycle
G_cycle4 = nx.cycle_graph(4)

# (b) 8-node Erdos-Renyi random graph (edge probability 0.5 -- unrelated to QAOA's p)
G_random8 = nx.erdos_renyi_graph(8, 0.5, seed=42)

# (c) 10-node random 3-regular graph
G_reg10 = nx.random_regular_graph(3, 10, seed=42)

benchmarks = [
    ("4-node cycle", G_cycle4),
    ("8-node random (p=0.5)", G_random8),
    ("10-node 3-regular", G_reg10),
]

print(f"{'Graph':<30} {'p':>2} {'QAOA':>8} {'Optimal':>8} {'Ratio':>8}")
print("-" * 62)

for name, graph in benchmarks:
    for p in [1, 2]:
        qaoa_val, opt_val, ratio = run_qaoa_benchmark(graph, p, steps=200)
        print(f"{name:<30} {p:>2} {qaoa_val:>8.2f} {opt_val:>8d} {ratio:>8.3f}")

Expected results (your numbers may differ slightly due to optimization randomness):

| Graph | p=1 ratio | p=2 ratio |
| --- | --- | --- |
| 4-node cycle | ~1.00 | ~1.00 |
| 8-node random | ~0.75 | ~0.85 |
| 10-node 3-regular | ~0.69 | ~0.78 |

The 4-node cycle is simple enough that p=1 solves it exactly. Larger and denser graphs need more layers. The 0.6924 worst-case guarantee for p=1 QAOA on 3-regular graphs (proved by Farhi, Goldstone, and Gutmann) is a lower bound, and the 10-node 3-regular instance here lands close to that floor.

Classical Benchmark Comparison

To understand QAOA’s competitiveness, compare it against classical approximation algorithms for Max-Cut.

Goemans-Williamson SDP Relaxation

The Goemans-Williamson (GW) algorithm is the best known polynomial-time classical approximation for Max-Cut, achieving a 0.878 approximation ratio in the worst case. It works by relaxing the integer optimization to a semidefinite program (SDP), solving the SDP, then rounding the solution via random hyperplane rounding.

# Goemans-Williamson requires an SDP solver
# pip install cvxpy
import warnings

try:
    import cvxpy as cp

    def goemans_williamson(graph, n_rounds=1000):
        """
        Goemans-Williamson SDP relaxation for Max-Cut.
        Returns the best cut found over n_rounds of random hyperplane rounding.
        """
        n = graph.number_of_nodes()
        nodes = list(graph.nodes())
        node_idx = {v: i for i, v in enumerate(nodes)}

        # SDP relaxation: maximize sum_{(i,j)} w_ij * (1 - X_ij) / 2
        # subject to X is positive semidefinite, X_ii = 1
        X = cp.Variable((n, n), symmetric=True)
        constraints = [X >> 0]  # Positive semidefinite
        for i in range(n):
            constraints.append(X[i, i] == 1)

        # Objective: maximize sum of (1 - X_ij)/2 for edges
        obj = 0
        for u, v in graph.edges():
            i, j = node_idx[u], node_idx[v]
            obj += (1 - X[i, j]) / 2

        prob = cp.Problem(cp.Maximize(obj), constraints)
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            prob.solve(solver=cp.SCS, verbose=False)

        # Extract vectors via Cholesky-like decomposition
        X_val = X.value
        # Ensure positive semidefiniteness (numerical stability)
        eigvals, eigvecs = np.linalg.eigh(X_val)
        eigvals = np.maximum(eigvals, 0)
        V = eigvecs @ np.diag(np.sqrt(eigvals))

        # Random hyperplane rounding
        best_cut = 0
        best_partition = None
        for _ in range(n_rounds):
            r = np.random.randn(n)
            partition = tuple((V @ r > 0).astype(int))
            cut = count_cut(partition, graph)
            if cut > best_cut:
                best_cut = cut
                best_partition = partition

        return best_cut, best_partition

    gw_available = True
except ImportError:
    gw_available = False
    print("cvxpy not installed; skipping Goemans-Williamson comparison")

Greedy Heuristic

def greedy_max_cut(graph):
    """
    Simple greedy heuristic: assign each vertex to the partition that
    maximizes the number of cut edges with already-assigned vertices.
    """
    n = graph.number_of_nodes()
    nodes = list(graph.nodes())
    partition = {}

    # Start: put first vertex in set 0
    partition[nodes[0]] = 0

    for v in nodes[1:]:
        # Count edges to each partition
        cut_if_0 = sum(1 for u in graph.neighbors(v) if u in partition and partition[u] == 1)
        cut_if_1 = sum(1 for u in graph.neighbors(v) if u in partition and partition[u] == 0)

        partition[v] = 0 if cut_if_0 >= cut_if_1 else 1

    bitstring = tuple(partition[v] for v in nodes)
    return count_cut(bitstring, graph), bitstring

greedy_cut, greedy_partition = greedy_max_cut(G)
print(f"Greedy: cut={greedy_cut}, partition={greedy_partition}")

Full Comparison

# Compare all methods on the 4-node graph
optimal_cut, optimal_partition = brute_force_max_cut(G)
greedy_cut, _ = greedy_max_cut(G)

# QAOA results from earlier optimization
qaoa_p1_value = history_p1[-1]
qaoa_p2_value = history_p2[-1]

print(f"\n{'Method':<25} {'Cut value':>10} {'Ratio':>8}")
print("-" * 45)
print(f"{'Optimal (brute force)':<25} {optimal_cut:>10} {1.000:>8.3f}")

if gw_available:
    gw_cut, _ = goemans_williamson(G)
    print(f"{'Goemans-Williamson':<25} {gw_cut:>10} {gw_cut / optimal_cut:>8.3f}")

print(f"{'QAOA p=2':<25} {qaoa_p2_value:>10.2f} {qaoa_p2_value / optimal_cut:>8.3f}")
print(f"{'QAOA p=1':<25} {qaoa_p1_value:>10.2f} {qaoa_p1_value / optimal_cut:>8.3f}")
print(f"{'Greedy heuristic':<25} {greedy_cut:>10} {greedy_cut / optimal_cut:>8.3f}")

Honest assessment: For small graphs where brute force is tractable, Goemans-Williamson typically outperforms QAOA at p=1 and p=2. GW guarantees a 0.878 approximation ratio, while QAOA p=1 guarantees only 0.6924 on 3-regular graphs. For QAOA to match or exceed GW, you generally need p of 5 or more, which pushes circuit depth beyond what current NISQ hardware can execute reliably.

The case for QAOA rests on two hopes: first, that at sufficient depth it may surpass the GW bound (there is no proof it cannot); second, that fault-tolerant quantum computers will eventually make high-p QAOA practical. For now, QAOA is primarily a research tool and a benchmark for quantum hardware, not a practical competitor to classical algorithms on Max-Cut.

Common Mistakes

This section covers pitfalls that frequently trip up QAOA practitioners. Each mistake is paired with an explanation and a fix.

1. Initializing All Parameters to Zero

Setting all gammas and betas to zero produces the uniform superposition state (the Hadamards create it, and zero-angle unitaries are the identity). The expectation value at this point equals the average cut over all bitstrings, and the gradient is exactly zero in every direction: all-zero parameters sit at a stationary point of the symmetric landscape, so a gradient-based optimizer never moves.

Fix: Initialize parameters randomly from a uniform distribution over [0, pi], or use a structured initialization like INTERP (described above).

# Bad: zero initialization (optimizer never moves)
params_bad = pnp.array([0.0, 0.0], requires_grad=True)
grad_at_zero = qml.grad(lambda p_: -qaoa_circuit(p_, p=1, graph=G))(params_bad)
print(f"Gradient at zero: {grad_at_zero}")
# Output: very close to [0, 0]

# Good: random initialization
params_good = pnp.array(np.random.uniform(0, np.pi, 2), requires_grad=True)
grad_at_random = qml.grad(lambda p_: -qaoa_circuit(p_, p=1, graph=G))(params_good)
print(f"Gradient at random init: {grad_at_random}")
# Output: nonzero values

2. Wrong Factor in IsingZZ

The qml.IsingZZ(phi, wires) gate implements exp(-i * phi/2 * Z_i Z_j). If you pass gamma directly instead of 2 * gamma, the rotation angle is halved, so the optimizer must find parameter values twice as large to compensate. The optimizer can often recover, but the landscape is stretched along the gamma axis, so standard initialization ranges and parameter values from the literature no longer apply.

Fix: Always pass 2 * gamma to qml.IsingZZ:

# Correct
qml.IsingZZ(2 * gamma, wires=[i, j])

# Incorrect (but silently produces wrong results)
# qml.IsingZZ(gamma, wires=[i, j])

3. Wrong Sign in Cost Hamiltonian

QAOA maximizes the expected cut value, but PennyLane optimizers minimize by default. If you forget to negate the cost when passing it to the optimizer, the optimizer drives the cost down (minimizing the cut value), which finds the worst cut instead of the best.

Fix: Negate the cost function in the optimization loop:

# Correct: minimize the negative of the cost
params, cost = optimizer.step_and_cost(
    lambda p_: -qaoa_circuit(p_, p=p, graph=graph), params
)

# Incorrect: minimizes the cost, finding the WORST cut
# params, cost = optimizer.step_and_cost(
#     lambda p_: qaoa_circuit(p_, p=p, graph=graph), params
# )

4. Sampling Before Convergence

Drawing samples from a QAOA circuit with unoptimized parameters gives essentially random bitstrings. The resulting cut values are no better than chance. Always verify that the optimization has converged (the cost curve has plateaued) before interpreting samples.

Fix: Plot the cost history and confirm convergence. A good heuristic: the last 20% of steps should show less than 1% variation in the cost.
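That heuristic can be made programmatic. A small sketch (the function name `has_converged` is ours; `history` is the list of cost values recorded during optimization):

```python
import numpy as np

def has_converged(history, tail_frac=0.2, tol=0.01):
    """Return True if the last `tail_frac` of the cost history varies
    by less than `tol` relative to the mean of that tail."""
    tail = np.asarray(history[int(len(history) * (1 - tail_frac)):], dtype=float)
    spread = np.ptp(tail)          # max - min over the tail
    scale = abs(tail.mean())
    if scale == 0:
        return bool(spread < tol)  # absolute check when the mean is zero
    return bool(spread / scale < tol)

# A flat tail passes; a still-rising curve does not
print(has_converged([1.0, 2.0, 3.0, 3.9, 3.99, 4.0, 4.0, 4.0, 4.0, 4.0]))  # True
print(has_converged(list(range(10))))                                       # False
```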

5. Too Many Layers on Noisy Hardware

Each additional QAOA layer adds two-qubit gates proportional to the number of edges. On noisy hardware, gate errors accumulate multiplicatively. Beyond p=2 or p=3 on current NISQ devices, the noise overwhelms the signal, and the output state is closer to the maximally mixed state than to the optimal solution.

Fix: On NISQ hardware, keep p at 1 or 2. Use error mitigation techniques (zero-noise extrapolation, probabilistic error cancellation) rather than adding more layers. Save higher p for simulators or fault-tolerant hardware.

6. Confusing Cut Value with Approximation Ratio

The approximation ratio is QAOA_cut / optimal_cut, not QAOA_cut / n_edges. For a graph with 10 edges and an optimal cut of 7, a QAOA result of 5 gives an approximation ratio of 5/7 = 0.71, not 5/10 = 0.50.

# Correct approximation ratio
optimal_cut, _ = brute_force_max_cut(G)
qaoa_cut = 3  # example value
ratio = qaoa_cut / optimal_cut  # correct

# Wrong: dividing by total edges
n_edges = G.number_of_edges()
wrong_ratio = qaoa_cut / n_edges  # incorrect

print(f"Optimal cut: {optimal_cut}")
print(f"Correct ratio: {ratio:.3f}")
print(f"Wrong ratio: {wrong_ratio:.3f}")

Scaling Considerations

As the graph grows, two costs increase. First, the circuit depth scales as O(p * |E|) since each edge requires a gate in every cost layer. For a dense graph with hundreds of edges, even p=1 produces a deep circuit. Second, the classical optimization problem becomes harder. With 2p parameters to tune over a non-convex landscape, gradient-based optimizers can get stuck. Common mitigations include random restarts, warm-starting from a classical approximation, and layer-by-layer training (fix layers 1 through p-1, then optimize layer p).
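Random restarts are the simplest of these mitigations to sketch. The helper below (`best_over_restarts` is a name we introduce for illustration) takes any seed-to-value runner, such as a wrapper around `run_qaoa_benchmark` from the benchmark section:

```python
def best_over_restarts(run_fn, n_restarts=5):
    """Run `run_fn(seed)` for several seeds and keep the best value.

    `run_fn` is any callable mapping a seed to a scalar QAOA value,
    e.g. lambda s: run_qaoa_benchmark(G, p=1, seed=s)[0].
    """
    best_value, best_seed = -float("inf"), None
    for seed in range(n_restarts):
        value = run_fn(seed)
        if value > best_value:
            best_value, best_seed = value, seed
    return best_value, best_seed

# Toy runner standing in for a full QAOA optimization
value, seed = best_over_restarts(lambda s: [2.0, 5.0, 3.0][s], n_restarts=3)
print(value, seed)  # 5.0 1
```

Each restart multiplies the optimization cost by `n_restarts`, so in practice this is combined with short per-restart budgets: run many cheap optimizations, then fine-tune only the best.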

QAOA currently offers the most practical results on sparse graphs with tens to low hundreds of nodes. On current NISQ hardware, circuit noise limits useful depth to roughly p=2 or p=3 before results degrade below classical benchmarks.
