QAOA for MaxCut: A Complete Qiskit Walkthrough

QAOA (Quantum Approximate Optimization Algorithm) is one of the most studied near-term quantum algorithms. It targets combinatorial optimization problems, problems where you want to find the best assignment from a discrete set of possibilities. MaxCut is the standard benchmark: given a graph, partition vertices into two groups to maximize edges between them.

This tutorial builds QAOA for MaxCut from scratch in Qiskit, explains why the circuit looks the way it does, and shows how increasing depth parameter p improves solution quality. Along the way you will learn the gate-level decomposition, extend to weighted graphs, visualize the optimization landscape, analyze circuit depth for hardware, and implement Recursive QAOA.

Setup

pip install qiskit qiskit-aer networkx scipy matplotlib

import numpy as np
from scipy.optimize import minimize
import networkx as nx
import matplotlib.pyplot as plt
from qiskit import QuantumCircuit
from qiskit.quantum_info import SparsePauliOp, Operator
from qiskit.primitives import StatevectorEstimator, StatevectorSampler
from qiskit.circuit import ParameterVector

The MaxCut problem

Given a graph G = (V, E), partition vertices V into sets S and complement S-bar. The cut value is the number of edges (u, v) where u and v are in different sets. MaxCut finds the partition that maximizes this count.

import networkx as nx

# Build a 4-node test graph
G = nx.Graph()
G.add_edges_from([(0,1), (1,2), (2,3), (3,0), (0,2)])

# Brute-force optimal (feasible for n=4)
best_cut = 0
best_partition = None
for mask in range(2**G.number_of_nodes()):
    partition = [(mask >> i) & 1 for i in range(G.number_of_nodes())]
    cut = sum(1 for (u, v) in G.edges() if partition[u] != partition[v])
    if cut > best_cut:
        best_cut = cut
        best_partition = partition

print(f"Optimal cut: {best_cut}, partition: {best_partition}")

For this 4-node graph with 5 edges, the optimal cut is 4 (only one edge remains uncut). There are multiple optimal partitions. As the graph grows, brute-force becomes infeasible: the search space is 2^n.

The cost Hamiltonian

Each vertex is a qubit: |0> means “in set S”, |1> means “in complement”. An edge (u, v) contributes 1 when the qubits differ. The cost operator for edge (u, v):

C_(u,v) = (1 - Z_u Z_v) / 2

When both qubits match (both |0> or both |1>), Z_u Z_v = +1 and C = 0. When they differ, Z_u Z_v = -1 and C = 1. The total cost Hamiltonian is:

C = sum_{(u,v) in E} (1 - Z_u Z_v) / 2

Each computational basis state |x> is an eigenstate of C with eigenvalue equal to the cut value of partition x. This is the key property that makes QAOA work: the cost Hamiltonian encodes the objective function directly into its eigenvalues.

def maxcut_hamiltonian(graph: nx.Graph) -> SparsePauliOp:
    n = graph.number_of_nodes()
    terms = []
    for (u, v) in graph.edges():
        pauli_str = ['I'] * n
        pauli_str[u] = 'Z'
        pauli_str[v] = 'Z'
        zz_str = ''.join(reversed(pauli_str))  # Qiskit: rightmost = qubit 0
        terms.append((zz_str, -0.5))
        terms.append(('I' * n, 0.5))
    return SparsePauliOp.from_list(terms).simplify()

H_cost = maxcut_hamiltonian(G)
print(H_cost)

QAOA theory: why it works

QAOA prepares a trial state by alternating two unitaries for p layers. Start from the uniform superposition |+>^n, which places equal amplitude on every computational basis state:

|psi(gamma, beta)> = U_B(beta_p) U_C(gamma_p) ... U_B(beta_1) U_C(gamma_1) |+>^n

The two unitaries serve distinct roles:

Problem unitary U_C(gamma) = exp(-i * gamma * C) applies a phase to each basis state proportional to its cut value. States with large cuts accumulate large phases. States with small cuts accumulate small phases. This is a diagonal operator in the computational basis: it changes no amplitudes, only phases.

Mixer unitary U_B(beta) = exp(-i * beta * B) where B = sum_i X_i. Because X_i flips qubit i, this operator mixes amplitude between nearby basis states (those differing by a single bit flip). It converts the phase differences planted by U_C into amplitude differences.

The interplay between these two operations creates constructive interference on high-cut states and destructive interference on low-cut states. Each additional layer (higher p) provides finer control over this interference pattern.

For p approaching infinity, QAOA recovers the adiabatic algorithm: a continuous path from the ground state of B (the uniform superposition) to the ground state of C (the optimal cut). The quantum adiabatic theorem guarantees this converges to the exact solution, provided the path is traversed slowly enough. Finite-p QAOA is a Trotterized (discretized) version of this continuous evolution.

The mathematical structure is:

<C> = <+|^n  [prod_{k=1}^{p} U_C(gamma_k)^dag U_B(beta_k)^dag]  C  [prod_{k=1}^{p} U_B(beta_k) U_C(gamma_k)]  |+>^n

This is a 2p-parameter variational landscape. Classical optimization searches for the (gamma*, beta*) that maximizes .

Gate decomposition of U_C

For MaxCut, the cost unitary for a single edge (u, v) is:

exp(-i * gamma * (1 - Z_u Z_v) / 2) = exp(-i * gamma / 2) * exp(i * gamma * Z_u Z_v / 2)

The first factor exp(-i * gamma / 2) is a global phase and has no effect on measurement outcomes. We drop it. The remaining operator is:

exp(i * gamma * Z_u Z_v / 2)

To understand the ZZ rotation, consider the matrix form. In the two-qubit computational basis {|00>, |01>, |10>, |11>}, the operator Z_u tensor Z_v is diagonal with eigenvalues (+1, -1, -1, +1). So:

exp(i * theta * ZZ) = diag(e^{i*theta}, e^{-i*theta}, e^{-i*theta}, e^{i*theta})

Written as a full matrix with theta = gamma/2:

| cos(theta) + i*sin(theta)    0                          0                          0                         |
| 0                            cos(theta) - i*sin(theta)  0                          0                         |
| 0                            0                          cos(theta) - i*sin(theta)  0                         |
| 0                            0                          0                          cos(theta) + i*sin(theta) |

This decomposes into standard gates as: CNOT(u, v), Rz(2*gamma, v), CNOT(u, v). Here is the derivation:

CNOT maps Z_u tensor Z_v to I tensor Z_v (it diagonalizes the ZZ interaction onto qubit v alone).
Rz(2*gamma, v) = exp(-i * gamma * Z_v) applies the desired phase to qubit v.
The second CNOT undoes the basis change.

The net result is exp(i * gamma * Z_u Z_v / 2) up to a global phase absorbed by Rz.

Let’s verify this decomposition in Qiskit:

from qiskit.quantum_info import Operator
import numpy as np

gamma_val = 0.7  # arbitrary test value

# Build the ZZ rotation using CNOT-Rz-CNOT
qc_decomp = QuantumCircuit(2)
qc_decomp.cx(0, 1)
qc_decomp.rz(2 * gamma_val, 1)
qc_decomp.cx(0, 1)

# Build the exact ZZ rotation matrix
theta = gamma_val / 2
zz_exact = np.diag([
    np.exp(1j * gamma_val / 2),
    np.exp(-1j * gamma_val / 2),
    np.exp(-1j * gamma_val / 2),
    np.exp(1j * gamma_val / 2)
])

# Compare (up to global phase)
op_decomp = Operator(qc_decomp).data
# Align global phase by matching element (0,0)
phase = zz_exact[0, 0] / op_decomp[0, 0]
print("Max error:", np.max(np.abs(zz_exact - phase * op_decomp)))
# Should print ~0.0 (machine precision)

The QAOA circuit

QAOA alternates between two unitaries for p layers, starting from the uniform superposition:

Problem unitary U_C(gamma): encodes the cost Hamiltonian
Mixer unitary U_B(beta): X rotations enabling exploration

For MaxCut, U_C(gamma) on edge (u,v) decomposes to: CX, RZ(2*gamma), CX.

def build_qaoa_circuit(graph: nx.Graph, p: int):
    n = graph.number_of_nodes()
    gamma = ParameterVector('gamma', p)
    beta  = ParameterVector('beta', p)
    qc = QuantumCircuit(n)
    qc.h(range(n))  # Start in |+>^n
    for layer in range(p):
        # Problem unitary: apply ZZ rotation for each edge
        for (u, v) in graph.edges():
            qc.cx(u, v)
            qc.rz(2 * gamma[layer], v)
            qc.cx(u, v)
        # Mixer unitary: X rotation on each qubit
        for i in range(n):
            qc.rx(2 * beta[layer], i)
    return qc, gamma, beta

qc, gamma_params, beta_params = build_qaoa_circuit(G, p=1)
print(f"p=1: depth={qc.depth()}, params={qc.num_parameters}")

Optimizing the parameters

The QAOA expectation value is a function of (gamma, beta). Classical optimization finds the parameters that maximize the expected cut value.

def qaoa_cost(params, qc, gamma_params, beta_params, H, p):
    estimator = StatevectorEstimator()
    gamma_vals = params[:p]
    beta_vals  = params[p:]
    param_dict = {g: gamma_vals[i] for i, g in enumerate(gamma_params)}
    param_dict.update({b: beta_vals[i] for i, b in enumerate(beta_params)})
    bound_qc = qc.assign_parameters(param_dict)
    result = estimator.run([(bound_qc, H)]).result()
    return -float(result[0].data.evs)  # negate: minimize() -> we want max

p = 1
qc, gamma_params, beta_params = build_qaoa_circuit(G, p)
np.random.seed(42)
result = minimize(
    qaoa_cost, np.random.uniform(0, np.pi, 2*p),
    args=(qc, gamma_params, beta_params, H_cost, p),
    method='COBYLA', options={'maxiter': 500}
)
optimal_params = result.x
print(f"Optimal cost: {-result.fun:.4f} (classical optimal: {best_cut})")
print(f"Approximation ratio: {-result.fun/best_cut:.4f}")

Sampling the solution

def sample_qaoa(qc, gamma_params, beta_params, params, p, shots=2048):
    sampler = StatevectorSampler()
    param_dict = {g: params[i] for i, g in enumerate(gamma_params)}
    param_dict.update({b: params[p+i] for i, b in enumerate(beta_params)})
    bound_qc = qc.assign_parameters(param_dict)
    bound_qc.measure_all()
    result = sampler.run([bound_qc], shots=shots).result()
    return result[0].data.meas.get_counts()

counts = sample_qaoa(qc, gamma_params, beta_params, optimal_params, p)
print("\nTop bitstrings:")
for bitstring, count in sorted(counts.items(), key=lambda x: -x[1])[:6]:
    partition = [int(b) for b in reversed(bitstring)]
    cut = sum(1 for (u, v) in G.edges() if partition[u] != partition[v])
    print(f"  {bitstring}: cut={cut}, count={count}" + (" <-- optimal" if cut == best_cut else ""))

Effect of increasing p

for p_val in [1, 2, 3]:
    qc_p, gp, bp = build_qaoa_circuit(G, p_val)
    best_ratio = 0
    for _ in range(5):  # multiple restarts
        res = minimize(qaoa_cost, np.random.uniform(0, np.pi, 2*p_val),
                       args=(qc_p, gp, bp, H_cost, p_val),
                       method='COBYLA', options={'maxiter': 1000})
        best_ratio = max(best_ratio, -res.fun / best_cut)
    print(f"p={p_val}: approx ratio={best_ratio:.4f}, depth={qc_p.depth()}")

Typical results on this 4-node graph:

p=1: ratio ~0.75 (the theoretical >= 0.6924 guarantee applies to 3-regular graphs)
p=2: ratio ~0.85
p=3: ratio ~0.95+

Each additional layer increases the approximation ratio but also increases circuit depth, making hardware execution harder.

QAOA p=1 optimal parameters for 3-regular graphs

For 3-regular graphs (every vertex has exactly 3 neighbors), good p=1 QAOA parameters are known in advance. For triangle-free 3-regular graphs the p=1 optimum has a closed form: beta* = pi/8 and gamma* = arctan(1/sqrt(2)), roughly 0.616.

At the optimal angles, QAOA achieves an approximation ratio of at least 0.6924 on any 3-regular graph, regardless of the number of vertices (Farhi, Goldstone, and Gutmann, 2014).

# Generate a random 3-regular graph on 10 vertices
G3 = nx.random_regular_graph(3, 10, seed=42)

# Brute-force optimal cut
n3 = G3.number_of_nodes()
best_cut_3 = 0
for mask in range(2**n3):
    partition = [(mask >> i) & 1 for i in range(n3)]
    cut = sum(1 for (u, v) in G3.edges() if partition[u] != partition[v])
    best_cut_3 = max(best_cut_3, cut)

H3 = maxcut_hamiltonian(G3)
qc3, gp3, bp3 = build_qaoa_circuit(G3, p=1)

# Analytical optimal parameters (triangle-free 3-regular optimum)
analytical_gamma = np.arctan(1 / np.sqrt(2))  # ~0.616
analytical_beta = np.pi / 8
analytical_params = np.array([analytical_gamma, analytical_beta])

analytical_cost = -qaoa_cost(analytical_params, qc3, gp3, bp3, H3, 1)
analytical_ratio = analytical_cost / best_cut_3
print(f"Analytical params: gamma=arctan(1/sqrt(2)), beta=pi/8")
print(f"  Approximation ratio: {analytical_ratio:.4f}")
print(f"  Expected cut: {analytical_cost:.2f}, optimal: {best_cut_3}")

# Compare to random parameter initializations
random_ratios = []
for seed in range(20):
    rng = np.random.default_rng(seed)
    random_params = rng.uniform(0, np.pi, 2)
    random_cost = -qaoa_cost(random_params, qc3, gp3, bp3, H3, 1)
    random_ratios.append(random_cost / best_cut_3)

print(f"\nRandom params (20 trials):")
print(f"  Mean ratio: {np.mean(random_ratios):.4f}")
print(f"  Max ratio:  {np.max(random_ratios):.4f}")
print(f"  Min ratio:  {np.min(random_ratios):.4f}")
print(f"\nAnalytical params consistently outperform random initialization.")

This result is powerful because it means you can skip the classical optimization loop entirely for p=1 on 3-regular graphs. For other graph types, these parameters still serve as a strong starting point.

Landscape visualization

For p=1, QAOA has only two parameters: gamma and beta. We can visualize the entire cost landscape as a 2D heatmap. This reveals the structure of the optimization problem and helps explain why gradient-free optimizers like COBYLA work well at low p.

# Evaluate QAOA cost over a 2D grid
grid_size = 30
gamma_range = np.linspace(0, np.pi / 2, grid_size)
beta_range = np.linspace(0, np.pi / 4, grid_size)
landscape = np.zeros((grid_size, grid_size))

qc_land, gp_land, bp_land = build_qaoa_circuit(G, p=1)

for i, g in enumerate(gamma_range):
    for j, b in enumerate(beta_range):
        params = np.array([g, b])
        # Store the expected cut value (positive, not negated)
        landscape[j, i] = -qaoa_cost(params, qc_land, gp_land, bp_land, H_cost, 1)

# Find the grid maximum
max_idx = np.unravel_index(np.argmax(landscape), landscape.shape)
best_beta_grid = beta_range[max_idx[0]]
best_gamma_grid = gamma_range[max_idx[1]]

fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(
    landscape, extent=[0, np.pi/2, np.pi/4, 0],
    aspect='auto', cmap='viridis', origin='upper'
)
ax.set_xlabel('gamma')
ax.set_ylabel('beta')
ax.set_title('QAOA p=1 cost landscape (expected cut value)')
plt.colorbar(im, ax=ax, label='<C>')

# Mark the COBYLA optimum
ax.plot(optimal_params[0], optimal_params[1], 'r*', markersize=15,
        label=f'COBYLA optimum ({-result.fun:.3f})')
# Mark the grid maximum
ax.plot(best_gamma_grid, best_beta_grid, 'wx', markersize=12, markeredgewidth=2,
        label=f'Grid max ({landscape[max_idx]:.3f})')
ax.legend(loc='lower right')
plt.tight_layout()
plt.savefig('qaoa_landscape.png', dpi=150)
plt.show()

print(f"Grid maximum at gamma={best_gamma_grid:.4f}, beta={best_beta_grid:.4f}")
print(f"COBYLA optimum at gamma={optimal_params[0]:.4f}, beta={optimal_params[1]:.4f}")

The landscape for this 4-node graph is smooth with a single dominant basin. This is typical for small graphs at p=1. As p increases, the landscape becomes higher-dimensional and harder to visualize, but also develops more local minima, which motivates using multiple random restarts.

Circuit depth analysis

Understanding circuit resource counts is critical for assessing whether QAOA can run on real hardware. For a graph with n vertices, m edges, and p layers, the gate counts are:

Gate	Count per layer	Total (p layers)
H	n (initial state only)	n
CNOT	2m	2mp
Rz	m	mp
Rx	n	np

For our 4-node, 5-edge graph:

n_nodes = G.number_of_nodes()
m_edges = G.number_of_edges()

print(f"Graph: {n_nodes} vertices, {m_edges} edges\n")
print(f"{'p':>3} | {'CNOTs':>6} | {'Rz':>4} | {'Rx':>4} | {'H':>3} | {'Total gates':>11} | {'Circuit depth':>13}")
print("-" * 65)

for p_val in [1, 2, 3, 5, 10]:
    cnots = 2 * m_edges * p_val
    rzs = m_edges * p_val
    rxs = n_nodes * p_val
    hs = n_nodes
    total = cnots + rzs + rxs + hs
    qc_tmp, _, _ = build_qaoa_circuit(G, p_val)
    print(f"{p_val:>3} | {cnots:>6} | {rzs:>4} | {rxs:>4} | {hs:>3} | {total:>11} | {qc_tmp.depth():>13}")

On current NISQ hardware, each CNOT gate has approximately 0.5% error rate. The expected circuit fidelity, assuming independent errors, is:

cnot_error = 0.005
for p_val in [1, 2, 3, 5]:
    n_cnots = 2 * m_edges * p_val
    fidelity = (1 - cnot_error) ** n_cnots
    print(f"p={p_val}: {n_cnots} CNOTs, expected fidelity = {fidelity:.3f}")

At p=3 with 30 CNOTs, the expected fidelity is about 0.86. At p=5, it drops to 0.78. At p=10, it falls below 0.61. This sets a practical ceiling on how many layers you can run before noise overwhelms the signal. Error mitigation techniques (ZNE, PEC) can push this ceiling higher, but at a cost in shot overhead.

Weighted MaxCut

Real-world graph problems often have weighted edges. In weighted MaxCut, each edge (u, v) has a weight w_{uv}, and the objective is:

C = sum_{(u,v) in E} w_{uv} * (1 - Z_u Z_v) / 2

The only change in the QAOA circuit is the Rz rotation angle: instead of Rz(2gamma, v), use Rz(2gamma*w_{uv}, v). Heavier edges get larger rotations, biasing the algorithm toward cutting them.

def weighted_maxcut_hamiltonian(graph: nx.Graph) -> SparsePauliOp:
    """Build the cost Hamiltonian for a weighted MaxCut instance.
    Edge weights are read from the 'weight' attribute."""
    n = graph.number_of_nodes()
    terms = []
    for (u, v, data) in graph.edges(data=True):
        w = data.get('weight', 1.0)
        pauli_str = ['I'] * n
        pauli_str[u] = 'Z'
        pauli_str[v] = 'Z'
        zz_str = ''.join(reversed(pauli_str))
        terms.append((zz_str, -0.5 * w))
        terms.append(('I' * n, 0.5 * w))
    return SparsePauliOp.from_list(terms).simplify()


def build_weighted_qaoa_circuit(graph: nx.Graph, p: int):
    """Build a QAOA circuit where Rz angles scale with edge weights."""
    n = graph.number_of_nodes()
    gamma = ParameterVector('gamma', p)
    beta  = ParameterVector('beta', p)
    qc = QuantumCircuit(n)
    qc.h(range(n))
    for layer in range(p):
        for (u, v, data) in graph.edges(data=True):
            w = data.get('weight', 1.0)
            qc.cx(u, v)
            qc.rz(2 * gamma[layer] * w, v)  # weight scales the rotation
            qc.cx(u, v)
        for i in range(n):
            qc.rx(2 * beta[layer], i)
    return qc, gamma, beta


# Example: 4-node graph with weights
Gw = nx.Graph()
Gw.add_edge(0, 1, weight=1.0)
Gw.add_edge(1, 2, weight=2.0)
Gw.add_edge(2, 3, weight=3.0)
Gw.add_edge(3, 0, weight=4.0)
Gw.add_edge(0, 2, weight=5.0)

# Brute-force weighted MaxCut
best_wcut = 0
best_wpartition = None
for mask in range(2**Gw.number_of_nodes()):
    partition = [(mask >> i) & 1 for i in range(Gw.number_of_nodes())]
    wcut = sum(
        data['weight'] for (u, v, data) in Gw.edges(data=True)
        if partition[u] != partition[v]
    )
    if wcut > best_wcut:
        best_wcut = wcut
        best_wpartition = partition

print(f"Weighted optimal cut: {best_wcut}, partition: {best_wpartition}")

# Run weighted QAOA
Hw = weighted_maxcut_hamiltonian(Gw)
qcw, gpw, bpw = build_weighted_qaoa_circuit(Gw, p=1)

np.random.seed(42)
best_wresult = None
for _ in range(5):
    res = minimize(
        qaoa_cost, np.random.uniform(0, np.pi, 2),
        args=(qcw, gpw, bpw, Hw, 1),
        method='COBYLA', options={'maxiter': 500}
    )
    if best_wresult is None or res.fun < best_wresult.fun:
        best_wresult = res

print(f"QAOA expected weighted cut: {-best_wresult.fun:.4f}")
print(f"Weighted approximation ratio: {-best_wresult.fun / best_wcut:.4f}")

The weighted version biases QAOA toward cutting high-weight edges, which is exactly what the objective function demands. The Hamiltonian structure is identical; only the Rz angles change.

Warm starting from classical solutions

The QAOA optimization landscape can have local minima, especially at higher p. Starting from a good initial point can significantly improve convergence. One strategy is to warm-start from a classical heuristic solution.

A greedy MaxCut algorithm assigns each vertex to the partition that maximizes the immediate gain. We use the greedy solution to construct an initial state closer to the optimal, then run QAOA with parameters initialized near the identity (small gamma and beta) so the quantum circuit starts near the greedy solution.

def greedy_maxcut(graph):
    """Simple greedy MaxCut: assign each vertex to maximize cut gain."""
    n = graph.number_of_nodes()
    partition = [0] * n
    for v in range(n):
        # Count edges to each side
        gain_0 = sum(1 for u in graph.neighbors(v) if partition[u] == 1)
        gain_1 = sum(1 for u in graph.neighbors(v) if partition[u] == 0)
        partition[v] = 1 if gain_1 > gain_0 else 0
    cut = sum(1 for (u, v) in graph.edges() if partition[u] != partition[v])
    return partition, cut

greedy_part, greedy_cut = greedy_maxcut(G)
print(f"Greedy cut: {greedy_cut}, partition: {greedy_part}")
print(f"Optimal cut: {best_cut}")

# Warm-start strategy: initialize near the greedy solution
# Use small random perturbations around (0, 0) so the circuit
# starts close to the identity (preserving the initial state structure)
p_warm = 2
qc_warm, gp_warm, bp_warm = build_qaoa_circuit(G, p_warm)

# Warm start: small initial parameters
warm_init = np.random.uniform(0, 0.3, 2 * p_warm)

# Cold start: random initial parameters
cold_init = np.random.uniform(0, np.pi, 2 * p_warm)

np.random.seed(123)
warm_results = []
cold_results = []

for trial in range(10):
    rng = np.random.default_rng(trial)

    # Warm start
    w_init = rng.uniform(0, 0.3, 2 * p_warm)
    w_res = minimize(
        qaoa_cost, w_init,
        args=(qc_warm, gp_warm, bp_warm, H_cost, p_warm),
        method='COBYLA', options={'maxiter': 300}
    )
    warm_results.append(-w_res.fun / best_cut)

    # Cold start
    c_init = rng.uniform(0, np.pi, 2 * p_warm)
    c_res = minimize(
        qaoa_cost, c_init,
        args=(qc_warm, gp_warm, bp_warm, H_cost, p_warm),
        method='COBYLA', options={'maxiter': 300}
    )
    cold_results.append(-c_res.fun / best_cut)

print(f"\nWarm start (10 trials): mean ratio = {np.mean(warm_results):.4f}, "
      f"best = {np.max(warm_results):.4f}")
print(f"Cold start (10 trials): mean ratio = {np.mean(cold_results):.4f}, "
      f"best = {np.max(cold_results):.4f}")

Warm starting reduces wasted function evaluations on poor regions of the landscape. On hardware, where each function evaluation requires queuing a quantum job, this savings is substantial.

Running QAOA on IBM hardware

Running QAOA on real IBM quantum hardware introduces noise, queue latency, and gate set restrictions. The workflow has several additional steps compared to simulation.

# This code requires an IBM Quantum account and qiskit-ibm-runtime.
# Install with: pip install qiskit-ibm-runtime

from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager

# Step 1: Build the parametrized circuit (same as before)
p_hw = 1
qc_hw, gp_hw, bp_hw = build_qaoa_circuit(G, p_hw)

# Step 2: Transpile to a real backend's native gate set.
# We use optimization_level=3 for maximum gate reduction.
# Replace 'ibm_brisbane' with your available backend.
#
# from qiskit_ibm_runtime import QiskitRuntimeService
# service = QiskitRuntimeService()
# backend = service.least_busy(min_num_qubits=4)
# pm = generate_preset_pass_manager(optimization_level=3, backend=backend)
# transpiled_qc = pm.run(qc_hw)
# print(f"Transpiled depth: {transpiled_qc.depth()}")
# print(f"Transpiled CNOT count: {transpiled_qc.count_ops().get('cx', 0)}")

# Step 3: Run estimation with error mitigation.
# EstimatorV2 with resilience_level=2 enables Zero Noise Extrapolation (ZNE),
# which runs the circuit at multiple noise levels and extrapolates to zero noise.
#
# from qiskit_ibm_runtime import EstimatorV2, SamplerV2, Session, Options
#
# with Session(backend=backend) as session:
#     estimator = EstimatorV2(mode=session)
#     estimator.options.resilience_level = 2  # ZNE
#
#     def hardware_cost(params):
#         gamma_vals = params[:p_hw]
#         beta_vals = params[p_hw:]
#         param_dict = {g: gamma_vals[i] for i, g in enumerate(gp_hw)}
#         param_dict.update({b: beta_vals[i] for i, b in enumerate(bp_hw)})
#         bound = transpiled_qc.assign_parameters(param_dict)
#         result = estimator.run([(bound, H_cost)]).result()
#         return -float(result[0].data.evs)
#
#     # Step 4: Optimize with limited evaluations.
#     # Each evaluation is a hardware job, so budget carefully.
#     # SPSA is a good choice for noisy function evaluations.
#     hw_result = minimize(
#         hardware_cost,
#         np.array([np.pi/8, np.pi/8]),  # warm start for small graphs
#         method='COBYLA',
#         options={'maxiter': 50}  # limited budget due to queue time
#     )
#
#     # Step 5: Sample the optimized circuit
#     sampler = SamplerV2(mode=session)
#     param_dict = {g: hw_result.x[i] for i, g in enumerate(gp_hw)}
#     param_dict.update({b: hw_result.x[p_hw+i] for i, b in enumerate(bp_hw)})
#     bound_final = transpiled_qc.assign_parameters(param_dict)
#     bound_final.measure_all()
#     hw_counts = sampler.run([bound_final], shots=4096).result()
#     print(hw_counts[0].data.meas.get_counts())

Key considerations for hardware execution:

Transpilation matters. The transpiler maps your abstract circuit to the hardware’s physical qubit connectivity and native gates (typically CX, Rz, SX, X on IBM devices). High optimization levels reduce gate count but take longer to compile.
Error mitigation is essential. Without ZNE or similar techniques, noise biases the expectation value toward zero (depolarizing noise). Resilience level 2 uses ZNE; level 1 uses measurement error mitigation only.
Budget your function evaluations. Each COBYLA iteration submits a job to the hardware queue. With 50 iterations and 30-second queue times, optimization takes roughly 25 minutes. Warm starting from known good parameters reduces the number of iterations needed.
Expect degradation. A circuit that achieves 0.95 approximation ratio in simulation might achieve 0.70-0.80 on hardware, depending on the backend’s error rates and circuit depth.

Recursive QAOA (RQAOA)

Standard QAOA at low p often produces mediocre approximation ratios. Recursive QAOA (RQAOA) improves results by iteratively reducing the graph based on qubit-qubit correlations measured from the QAOA output state.

The algorithm:

Run QAOA (typically at p=1) on the current graph.
Measure the correlation <Z_u Z_v> for each edge.
Find the edge with the strongest correlation (largest |<Z_u Z_v>|).
If the correlation is positive (vertices tend to be in the same partition), merge u and v into a single vertex.
If the correlation is negative (vertices tend to be in different partitions), merge u and v with negation (assign opposite partitions).
Reduce the graph by one vertex and repeat from step 1.
When the graph is small enough (e.g., 1 vertex), the partition is fully determined.

def compute_zz_correlations(qc, gamma_params, beta_params, params, p, graph):
    """Compute <Z_u Z_v> for each edge using the optimized QAOA state."""
    n = qc.num_qubits
    estimator = StatevectorEstimator()
    param_dict = {g: params[i] for i, g in enumerate(gamma_params)}
    param_dict.update({b: params[p+i] for i, b in enumerate(beta_params)})
    bound_qc = qc.assign_parameters(param_dict)

    correlations = {}
    for (u, v) in graph.edges():
        pauli_str = ['I'] * n
        pauli_str[u] = 'Z'
        pauli_str[v] = 'Z'
        zz_op = SparsePauliOp.from_list([(''.join(reversed(pauli_str)), 1.0)])
        result = estimator.run([(bound_qc, zz_op)]).result()
        correlations[(u, v)] = float(result[0].data.evs)
    return correlations


def rqaoa(graph, p=1, max_iter=500):
    """Run Recursive QAOA on a MaxCut instance."""
    current_graph = graph.copy()
    # Track variable assignments: maps original vertex -> (reference_vertex, sign)
    # sign=+1 means same partition, sign=-1 means opposite partition
    assignments = {v: (v, 1) for v in graph.nodes()}

    while current_graph.number_of_nodes() > 1:
        n = current_graph.number_of_nodes()

        # Relabel nodes to 0..n-1 for circuit building
        mapping = {old: new for new, old in enumerate(current_graph.nodes())}
        reverse_mapping = {new: old for old, new in mapping.items()}
        relabeled = nx.relabel_nodes(current_graph, mapping)

        # Run QAOA p=1
        H = maxcut_hamiltonian(relabeled)
        qc_r, gp_r, bp_r = build_qaoa_circuit(relabeled, p)

        best_res = None
        for _ in range(3):
            res = minimize(
                qaoa_cost,
                np.random.uniform(0, np.pi, 2 * p),
                args=(qc_r, gp_r, bp_r, H, p),
                method='COBYLA',
                options={'maxiter': max_iter}
            )
            if best_res is None or res.fun < best_res.fun:
                best_res = res

        # Compute correlations
        correlations = compute_zz_correlations(
            qc_r, gp_r, bp_r, best_res.x, p, relabeled
        )

        # Find most correlated edge
        best_edge = max(correlations, key=lambda e: abs(correlations[e]))
        best_corr = correlations[best_edge]
        u_rel, v_rel = best_edge
        u_orig = reverse_mapping[u_rel]
        v_orig = reverse_mapping[v_rel]

        # Determine relationship: positive correlation means same partition,
        # negative means different partitions
        same_partition = best_corr > 0

        # Update assignments: all vertices pointing to v_orig now point through u_orig
        for vertex in list(assignments.keys()):
            ref, sign = assignments[vertex]
            if ref == v_orig:
                if same_partition:
                    assignments[vertex] = (u_orig, sign)
                else:
                    assignments[vertex] = (u_orig, -sign)

        # Merge v_orig into u_orig in the graph
        for neighbor in list(current_graph.neighbors(v_orig)):
            if neighbor != u_orig:
                if current_graph.has_edge(u_orig, neighbor):
                    pass  # keep existing edge (could add weights)
                else:
                    current_graph.add_edge(u_orig, neighbor)
        current_graph.remove_node(v_orig)

    # The last remaining vertex gets partition 0
    last_vertex = list(current_graph.nodes())[0]
    base_partition = {last_vertex: 0}

    # Reconstruct full partition
    final_partition = [0] * graph.number_of_nodes()
    for vertex in graph.nodes():
        ref, sign = assignments[vertex]
        base_val = base_partition.get(ref, 0)
        final_partition[vertex] = base_val if sign == 1 else 1 - base_val

    # Compute cut value
    cut = sum(1 for (u, v) in graph.edges()
              if final_partition[u] != final_partition[v])
    return final_partition, cut


# Run RQAOA on our test graph
np.random.seed(42)
rqaoa_partition, rqaoa_cut = rqaoa(G)
print(f"RQAOA partition: {rqaoa_partition}")
print(f"RQAOA cut: {rqaoa_cut}")
print(f"Optimal cut: {best_cut}")
print(f"RQAOA ratio: {rqaoa_cut / best_cut:.4f}")

RQAOA often outperforms standard QAOA at the same p value because it leverages the correlation structure in the quantum state to make hard variable assignments. The tradeoff is that it requires running QAOA n-1 times (once per reduction step), which increases the total computational cost.

MaxCut approximation guarantees

Different algorithms provide different guarantees on solution quality. The approximation ratio is the expected cut divided by the optimal cut.

Algorithm	Approximation ratio	Notes
Random assignment	0.500	Assign each vertex to S with probability 1/2
Greedy heuristic	0.50 to 0.70	Instance-dependent, no worst-case guarantee
QAOA p=1	>= 0.6924	Proven lower bound for 3-regular graphs
QAOA p=2	~0.7559	Numerically observed on worst-case instances
QAOA p=3	~0.7924	Numerically observed
Goemans-Williamson SDP	>= 0.878	Best known polynomial-time classical algorithm
Exact (brute force)	1.000	Exponential time

The Goemans-Williamson (GW) result is a landmark in approximation algorithms. It solves a semidefinite programming relaxation of MaxCut and rounds the continuous solution to a discrete partition. Under the Unique Games Conjecture, 0.878 is the best ratio achievable in polynomial time by any classical algorithm.

QAOA does not currently beat GW at any fixed p. However, as p grows with the problem size (p = O(n)), QAOA can in principle achieve ratio 1.0. Whether QAOA provides a practical advantage at moderate p on specific graph families remains an open research question.

# Verify random assignment ratio experimentally
random_cuts = []
for _ in range(10000):
    rand_part = np.random.randint(0, 2, G.number_of_nodes())
    cut = sum(1 for (u, v) in G.edges() if rand_part[u] != rand_part[v])
    random_cuts.append(cut)

print(f"Random assignment: mean ratio = {np.mean(random_cuts)/best_cut:.4f} "
      f"(theory: 0.5000)")
print(f"Greedy: ratio = {greedy_cut/best_cut:.4f}")
print(f"QAOA p=1: ratio = {-result.fun/best_cut:.4f}")

QAOA for other problems beyond MaxCut

QAOA generalizes to any problem expressible as a cost function over binary variables. Any QUBO (Quadratic Unconstrained Binary Optimization) maps directly to a diagonal Hamiltonian on qubits. The mixer stays the same (X rotations); only the cost Hamiltonian changes.

Maximum Independent Set

Given a graph, find the largest set of vertices with no edges between them. The cost Hamiltonian combines a reward for selecting vertices with a penalty for selecting adjacent pairs:

C = sum_v Z_v  -  P * sum_{(u,v) in E} (1 + Z_u)(1 + Z_v) / 4

where P is a penalty strength large enough to make any invalid solution (with adjacent selected vertices) worse than any valid one. Setting P > n ensures this.

def mis_hamiltonian(graph: nx.Graph, penalty: float = None) -> SparsePauliOp:
    """Build cost Hamiltonian for Maximum Independent Set.
    Convention: |1> means vertex is in the independent set."""
    n = graph.number_of_nodes()
    if penalty is None:
        penalty = n + 1  # guarantee penalty dominates

    terms = []
    # Reward: -Z_v for each vertex (negate because we encode selecting as |1>
    # and Z|1> = -|1>, so the eigenvalue for selected vertex is -1,
    # and -(- 1) = +1 reward)
    for v in range(n):
        pauli_str = ['I'] * n
        pauli_str[v] = 'Z'
        terms.append((''.join(reversed(pauli_str)), -0.5))
        terms.append(('I' * n, -0.5))

    # Penalty for adjacent selected vertices
    for (u, v) in graph.edges():
        # (1 + Z_u)(1 + Z_v) / 4 = (I + Z_u + Z_v + Z_u Z_v) / 4
        pauli_u = ['I'] * n
        pauli_u[u] = 'Z'
        pauli_v = ['I'] * n
        pauli_v[v] = 'Z'
        pauli_uv = ['I'] * n
        pauli_uv[u] = 'Z'
        pauli_uv[v] = 'Z'

        terms.append(('I' * n, penalty / 4))
        terms.append((''.join(reversed(pauli_u)), penalty / 4))
        terms.append((''.join(reversed(pauli_v)), penalty / 4))
        terms.append((''.join(reversed(pauli_uv)), penalty / 4))

    return SparsePauliOp.from_list(terms).simplify()

H_mis = mis_hamiltonian(G)
print("MIS Hamiltonian terms:", len(H_mis))

Graph coloring

For 3-coloring, encode each vertex with 2 qubits (representing colors 0, 1, 2 in binary, with state |11> penalized as invalid). The cost Hamiltonian penalizes adjacent vertices sharing the same color and penalizes the invalid state |11> on each vertex.

This is more complex than MaxCut because it requires multi-qubit interactions and a larger qubit register (2 qubits per vertex instead of 1).

def graph_coloring_hamiltonian(graph: nx.Graph, n_colors: int = 3,
                                penalty: float = 10.0) -> SparsePauliOp:
    """Build cost Hamiltonian for graph coloring with penalty terms.
    Uses 2 qubits per vertex for 3-coloring (colors 0,1,2 encoded as 00,01,10).
    State 11 is penalized as invalid."""
    n = graph.number_of_nodes()
    n_qubits = 2 * n
    terms = []

    # Penalty for invalid state |11> on each vertex
    # Projector onto |11> for qubits (2v, 2v+1): (1+Z_{2v})(1+Z_{2v+1})/4
    for v in range(n):
        q0, q1 = 2 * v, 2 * v + 1
        # (1 + Z_q0)(1 + Z_q1)/4 terms
        for pauli_ops, coeff in [
            ({}, penalty / 4),
            ({q0: 'Z'}, penalty / 4),
            ({q1: 'Z'}, penalty / 4),
            ({q0: 'Z', q1: 'Z'}, penalty / 4),
        ]:
            pstr = ['I'] * n_qubits
            for q, p in pauli_ops.items():
                pstr[q] = p
            terms.append((''.join(reversed(pstr)), coeff))

    # Penalty for adjacent vertices sharing the same color
    # For each edge (u,v) and each valid color c in {00, 01, 10}:
    # penalize when both u and v are color c
    for (u, v) in graph.edges():
        for color in range(n_colors):
            bits = [(color >> 0) & 1, (color >> 1) & 1]
            # Projector for vertex u having this color
            # and vertex v having this color
            u_qubits = [2*u, 2*u+1]
            v_qubits = [2*v, 2*v+1]

            # Build projector: product of (1 +/- Z)/2 for each qubit
            # bit=0 -> (1+Z)/2, bit=1 -> (1-Z)/2
            all_qubits = u_qubits + v_qubits
            all_bits = bits + bits  # same color for both vertices

            # Expand the product of (1 +/- Z)/2 terms
            from itertools import product as iterproduct
            for choices in iterproduct([0, 1], repeat=4):
                coeff = penalty
                pstr = ['I'] * n_qubits
                for idx, (q, b, c) in enumerate(
                    zip(all_qubits, all_bits, choices)
                ):
                    coeff /= 2
                    if c == 1:  # picked Z term
                        pstr[q] = 'Z'
                        if b == 1:  # (1-Z)/2, so Z contributes -1
                            coeff *= -1
                # Only add if coeff is nonzero
                if abs(coeff) > 1e-12:
                    terms.append((''.join(reversed(pstr)), coeff))

    return SparsePauliOp.from_list(terms).simplify()

# Example: 3-coloring of a triangle
G_tri = nx.Graph([(0,1), (1,2), (2,0)])
H_color = graph_coloring_hamiltonian(G_tri, n_colors=3, penalty=10.0)
print(f"Graph coloring Hamiltonian: {H_color.size} terms on "
      f"{G_tri.number_of_nodes() * 2} qubits")

The key takeaway: QAOA’s structure (alternating cost and mixer unitaries) applies to any cost Hamiltonian. The challenge for complex problems is that the penalty terms increase circuit depth and the optimization landscape becomes harder to navigate.

Common mistakes

1. Not using multiple random restarts

The QAOA landscape has local minima, especially at higher p. A single optimization run can get stuck. Always run multiple restarts and take the best.

# Bad: single restart
res = minimize(qaoa_cost, np.random.uniform(0, np.pi, 2*p),
               args=(qc, gamma_params, beta_params, H_cost, p),
               method='COBYLA')

# Good: multiple restarts
best = None
for _ in range(10):
    res = minimize(qaoa_cost, np.random.uniform(0, np.pi, 2*p),
                   args=(qc, gamma_params, beta_params, H_cost, p),
                   method='COBYLA', options={'maxiter': 500})
    if best is None or res.fun < best.fun:
        best = res

2. Wrong sign convention

QAOA maximizes the cut, but scipy.minimize minimizes. You must negate the cost function. Forgetting this produces the worst cut instead of the best.

# Correct: negate for minimization
return -float(result[0].data.evs)

# Wrong: would find the minimum cut
return float(result[0].data.evs)

3. Qiskit’s bitstring ordering

Qiskit uses little-endian ordering: qubit 0 is the rightmost character in a bitstring. When converting a bitstring to a partition, you must reverse it or index carefully.

bitstring = "1010"
# Qiskit ordering: qubit 3 is leftmost, qubit 0 is rightmost
# So bitstring "1010" means: qubit 0 = 0, qubit 1 = 1, qubit 2 = 0, qubit 3 = 1

# Correct: reverse to get vertex ordering
partition = [int(b) for b in reversed(bitstring)]
# partition = [0, 1, 0, 1] -> vertex 0 = 0, vertex 1 = 1, vertex 2 = 0, vertex 3 = 1

# Also correct: use indexing
partition = [int(bitstring[-(i+1)]) for i in range(len(bitstring))]

4. Confusing approximation ratio with sampling success

A high approximation ratio for the expectation value does not guarantee that the most frequently sampled bitstring is optimal. The expectation value is an average over all sampled bitstrings. Always decode and verify the actual cut values of the top bitstrings.

# Always verify: decode each sampled bitstring and compute its cut
for bitstring, count in sorted(counts.items(), key=lambda x: -x[1])[:5]:
    partition = [int(b) for b in reversed(bitstring)]
    actual_cut = sum(1 for (u, v) in G.edges() if partition[u] != partition[v])
    print(f"  {bitstring}: actual cut = {actual_cut}, sampled {count} times")

5. Ignoring hardware budget constraints

On real hardware, each function evaluation goes through the quantum device queue. An optimization loop with 500 iterations is impractical when each iteration takes 30 seconds. Limit COBYLA to 50-100 evaluations and use warm starting.

# Simulation: can afford many iterations
minimize(cost_fn, x0, method='COBYLA', options={'maxiter': 1000})

# Hardware: budget carefully
minimize(cost_fn, warm_start_x0, method='COBYLA', options={'maxiter': 50})

Practical notes

QAOA is not a practical solver today. For small instances, classical greedy algorithms match or exceed QAOA without the overhead. The algorithm’s potential advantage (if any) lies in large-scale fault-tolerant hardware that does not yet exist.

Parameter transferability. For triangle-free 3-regular graphs specifically, the optimal p=1 parameters are gamma=arctan(1/sqrt(2)) and beta=pi/8 regardless of the instance. Warm-starting from a similar instance’s parameters speeds up optimization significantly.

Use gradient-free optimizers on hardware. COBYLA and BOBYQA are more robust to shot noise than gradient-based methods. On real hardware with finite shots, gradient estimates are too noisy to be useful at small shot counts.

MaxCut is NP-hard. QAOA provides an approximation, not a guarantee of the exact optimal. For graph partitioning in practice, use METIS or spectral methods; they are fast and well-understood. QAOA is a research platform, not a production solver.

Recursive QAOA (RQAOA) improves quality at low p. If you are limited to p=1 by hardware noise, RQAOA extracts more information from each run by correlating variables and reducing the graph iteratively.

Circuit depth is the bottleneck. On current hardware, the CNOT error rate (~0.5%) limits useful circuit depth to roughly 50-100 CNOTs before noise overwhelms the signal. Plan your QAOA depth budget accordingly: for a graph with m edges, each layer costs 2m CNOTs.