SWAP Networks and Qubit Routing in Qiskit

Quantum hardware does not allow every qubit to interact with every other qubit. Each device has a coupling map: a graph where edges represent physically allowed two-qubit gate pairs. When your circuit contains a two-qubit gate between non-adjacent qubits, Qiskit’s transpiler must insert SWAP gates to move qubit states closer together. This process is called qubit routing, and minimizing SWAP overhead is critical for circuit quality on real hardware.

This tutorial covers everything you need to understand about qubit routing in Qiskit, from the hardware constraints that make it necessary to advanced techniques for minimizing its impact on your circuits.

Hardware Topology Deep Dive

Different quantum processors use different qubit connectivity layouts. The topology determines which qubit pairs can execute two-qubit gates directly. Understanding these topologies helps you design circuits that require fewer SWAP insertions.

Linear Chain

In a linear chain, qubits connect in a straight line: 0-1-2-3-4 and so on. Each qubit connects to exactly two neighbors, except for the endpoints, which connect to only one. This is the worst topology for algorithms that need all-to-all interactions, because the maximum distance between any two qubits grows linearly with the number of qubits. A gate between qubit 0 and qubit n-1 requires n-2 SWAPs to bring the states adjacent.

Square Grid

In a square grid, each interior qubit connects to four neighbors (north, south, east, west). Edge qubits connect to two or three neighbors. This topology offers better connectivity than a linear chain because the maximum distance between any two qubits grows as the square root of the total qubit count. Google’s Sycamore processor uses a variant of this layout.
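To make the scaling difference concrete, the sketch below builds a chain and a square grid as plain adjacency lists and measures the longest shortest path in each with breadth-first search. It is a stand-alone illustration in plain Python rather than Qiskit's CouplingMap, and the helper names (line_graph, grid_graph, diameter) are made up for this example.

```python
from collections import deque

def line_graph(n):
    """Adjacency lists for a chain 0-1-...-(n-1)."""
    return {q: [p for p in (q - 1, q + 1) if 0 <= p < n] for q in range(n)}

def grid_graph(rows, cols):
    """Adjacency lists for a rows x cols grid, qubits numbered row by row."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    nbrs.append(rr * cols + cc)
            adj[r * cols + c] = nbrs
    return adj

def distances_from(adj, source):
    """Breadth-first search: hop count from source to every reachable qubit."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        q = queue.popleft()
        for nb in adj[q]:
            if nb not in dist:
                dist[nb] = dist[q] + 1
                queue.append(nb)
    return dist

def diameter(adj):
    """Longest shortest path over all qubit pairs."""
    return max(max(distances_from(adj, q).values()) for q in adj)

for side in (3, 4, 5):
    n = side * side
    print(f"{n:2d} qubits: chain diameter = {diameter(line_graph(n))}, "
          f"grid diameter = {diameter(grid_graph(side, side))}")
```

For a k x k grid the diameter is 2(k - 1), so 25 qubits give a diameter of 8 on the grid versus 24 on the chain.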

Heavy-Hex (IBM Eagle and Heron)

IBM’s Eagle and Heron processors use a heavy-hex topology. This is a hexagonal lattice with a qubit on every vertex and an additional qubit on every edge, so no qubit has more than three neighbors and most have two or three. IBM chose heavy-hex because it reduces frequency collisions between neighboring qubits, a significant source of crosstalk error in superconducting processors. The tradeoff is lower connectivity per qubit compared to a square grid. IBM’s newer Nighthawk family (announced late 2025) moves back to a square lattice, using tunable couplers to manage the crosstalk problem instead.

Inspecting a Real Coupling Map

You can inspect the coupling map of any backend, including Qiskit’s fake backends that mirror real hardware properties:

from qiskit_ibm_runtime.fake_provider import FakeSherbrooke

backend = FakeSherbrooke()
coupling_map = backend.coupling_map

# FakeSherbrooke models IBM's 127-qubit Eagle processor in heavy-hex topology
edges = list(coupling_map.get_edges())
print(f"Number of qubits: {coupling_map.size()}")
print(f"Number of directed edges: {len(edges)}")

# Check the connectivity of a few qubits
for qubit in [0, 1, 2, 63]:
    neighbors = coupling_map.neighbors(qubit)
    print(f"Qubit {qubit} connects to: {list(neighbors)}")

# Shortest-path distance between two specific qubits. This is a lower bound
# on the coupling map's diameter, not the diameter itself.
print(f"Coupling map distance between qubit 0 and qubit 126: "
      f"{coupling_map.distance(0, 126)}")

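On backends that list every connection in both directions, the number of directed edges is twice the number of physical connections (the two directions can have different error rates, as we discuss below). Backends whose native two-qubit gate is directional, such as the ECR gate on Sherbrooke, may list each connection only once, so check the edge list rather than assuming a factor of two.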
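One way to check how a given edge list relates to physical connections is to collapse the directed edges into unordered pairs. The sketch below does this in plain Python on a made-up edge list; in practice the input would be the list returned by coupling_map.get_edges(). The function name summarize_edges is hypothetical.

```python
def summarize_edges(directed_edges):
    """Return (number of physical connections, number listed in both directions).

    Assumes there are no self-loops in the edge list.
    """
    edge_set = {tuple(edge) for edge in directed_edges}
    connections = {frozenset(edge) for edge in edge_set}
    both_ways = 0
    for connection in connections:
        a, b = tuple(connection)
        if (a, b) in edge_set and (b, a) in edge_set:
            both_ways += 1
    return len(connections), both_ways

# Made-up edge list: 0-1 is listed both ways, 1-2 and 3-2 only once.
example_edges = [(0, 1), (1, 0), (1, 2), (3, 2)]
n_connections, n_bidirectional = summarize_edges(example_edges)
print(f"{len(example_edges)} directed edges, {n_connections} physical connections, "
      f"{n_bidirectional} listed in both directions")
```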

How SWAP Insertion Works

When your circuit needs a two-qubit gate between qubits that are not adjacent on the coupling map, the transpiler inserts SWAP gates to “move” qubit states along edges of the connectivity graph until the two qubits become neighbors. Each SWAP gate exchanges the quantum states of two adjacent qubits.

Why a SWAP Equals 3 CNOTs

A SWAP gate decomposes into exactly three CNOT gates. This is not an approximation; it is an exact decomposition. You can verify this directly:

from qiskit import QuantumCircuit
from qiskit.quantum_info import Operator
import numpy as np

# Build SWAP from three CNOTs
swap_from_cnots = QuantumCircuit(2)
swap_from_cnots.cx(0, 1)
swap_from_cnots.cx(1, 0)
swap_from_cnots.cx(0, 1)

# Build SWAP using the native gate
swap_native = QuantumCircuit(2)
swap_native.swap(0, 1)

# Verify they produce the same unitary matrix
print(np.allclose(Operator(swap_from_cnots).data, Operator(swap_native).data))  # True

This decomposition has a direct impact on error accumulation. If each CNOT has error rate p, one SWAP gate introduces error of approximately 3p (for small p where higher-order terms are negligible). On current IBM hardware where CNOT error rates hover around 0.3%, each SWAP contributes roughly 0.9% error. In a circuit with 10 SWAPs, the routing overhead alone introduces about 9% error before accounting for any of the algorithm’s own gates.
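The 3p estimate is a first-order approximation. The sketch below compares it with the probability that at least one CNOT fails, under the simplifying assumption that CNOT errors are independent and all have the same rate. The function names are illustrative, and the 0.3% figure is the one quoted above.

```python
def swap_error_linear(p_cx, n_swaps):
    """First-order estimate: each SWAP is 3 CNOTs and the errors simply add."""
    return 3 * p_cx * n_swaps

def swap_error_independent(p_cx, n_swaps):
    """Probability that at least one of the 3 * n_swaps CNOTs fails,
    assuming independent errors with the same rate."""
    return 1 - (1 - p_cx) ** (3 * n_swaps)

p = 0.003  # 0.3% CNOT error
for n_swaps in (1, 10, 50):
    linear = swap_error_linear(p, n_swaps)
    independent = swap_error_independent(p, n_swaps)
    print(f"{n_swaps:2d} SWAPs: linear = {linear:.4f}, independent = {independent:.4f}")
```

For a handful of SWAPs the two estimates agree closely; the linear estimate increasingly overstates the error as the SWAP count grows.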

The Cost Adds Up Fast

Consider a concrete example. Suppose your algorithm needs 20 CNOT gates, and routing on heavy-hex hardware requires 8 SWAPs. Those 8 SWAPs become 24 additional CNOTs, bringing the total to 44 CNOTs. The circuit depth also increases substantially because SWAP gates are rarely parallelizable with the algorithm’s gates. Every extra layer of depth gives decoherence more time to corrupt your quantum state.

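from qiskit import QuantumCircuit, transpile
from qiskit.providers.fake_provider import GenericBackendV2
from qiskit.transpiler import CouplingMap

# A circuit that requires long-range interactions
qc = QuantumCircuit(5)
qc.h(0)
qc.cx(0, 4)  # far apart if virtual qubit i sits on physical qubit i
qc.cx(1, 3)
qc.cx(2, 4)
qc.measure_all()

# GenericBackendV2 is fully connected unless a coupling map is given,
# so use a 5-qubit chain to make routing necessary.
backend = GenericBackendV2(num_qubits=5, coupling_map=CouplingMap.from_line(5), seed=42)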

# Transpile with default routing
qc_t = transpile(qc, backend=backend, optimization_level=1, seed_transpiler=42)
print(f"Original gates: {qc.count_ops()}")
print(f"Transpiled gates: {qc_t.count_ops()}")
print(f"Original depth: {qc.depth()}")
print(f"Transpiled depth: {qc_t.depth()}")

The transpiled gate count is typically much higher than the original due to inserted SWAPs and their CNOT decompositions.

CNOT Direction Matters

On real superconducting hardware, two-qubit gates are calibrated in specific directions. The coupling map stores directed edges: an edge (a, b) means that CNOT(control=a, target=b) is a natively supported operation. If your circuit needs CNOT(a, b) but only CNOT(b, a) is calibrated, the transpiler must flip the direction by wrapping the CNOT in Hadamard gates on both qubits.

The identity used is:

CNOT(a, b) = H(a) . H(b) . CNOT(b, a) . H(b) . H(a)

Hadamard gates are single-qubit operations with very low error rates (typically 10x lower than CNOT error), so the direction flip is relatively cheap. Still, it adds circuit depth, and those extra single-qubit gates contribute some error.
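The direction-flip identity can be checked numerically with small matrices. The sketch below uses plain Python lists to stay dependency-free; the basis ordering (qubit a as the left bit) is a convention chosen for this example and differs from Qiskit's little-endian ordering, which does not affect the identity.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kron(a, b):
    """Kronecker (tensor) product of two 2x2 matrices."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
HH = kron(H, H)  # Hadamard on both qubits

# Basis order |a b>: 00, 01, 10, 11 (qubit a is the left bit).
CNOT_AB = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # control a, target b
CNOT_BA = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]  # control b, target a

# Wrap CNOT(b, a) in Hadamards on both qubits and compare with CNOT(a, b).
flipped = matmul(HH, matmul(CNOT_BA, HH))
identity_holds = all(
    abs(flipped[i][j] - CNOT_AB[i][j]) < 1e-12
    for i in range(4) for j in range(4)
)
print(f"H.H.CNOT(b,a).H.H equals CNOT(a,b): {identity_holds}")
```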

from qiskit_ibm_runtime.fake_provider import FakeSherbrooke

backend = FakeSherbrooke()
edges = list(backend.coupling_map.get_edges())
print(f"Total directed edges: {len(edges)}")

# Check if both directions exist for a given pair
sample_pair = edges[0]
reverse_pair = (sample_pair[1], sample_pair[0])
has_reverse = reverse_pair in edges
print(f"Edge {sample_pair} exists: True")
print(f"Reverse edge {reverse_pair} exists: {has_reverse}")

# Whether the reverse edge exists depends on the backend. Devices with a
# directional native gate (such as ECR) may list only one direction per pair;
# the transpiler then flips gates with single-qubit gates where needed.

When you see the transpiler inserting unexpected Hadamard gates around your CNOTs, it is usually handling direction flips. This is normal and expected behavior.

Routing Algorithms in Qiskit

Qiskit provides several routing algorithms, each with different tradeoffs between compilation time and output circuit quality.

SABRE (Default)

SABRE (SWAP-based BidiREctional heuristic search algorithm) is the default routing algorithm for optimization levels 1 and above. Here is how it works:

SABRE starts with an initial layout (a mapping from virtual circuit qubits to physical hardware qubits).
It identifies a “front layer” of gates whose input qubits are all available (no unresolved dependencies).
For each gate in the front layer, SABRE checks whether the two qubits are adjacent on the coupling map.
If a gate’s qubits are not adjacent, SABRE considers all possible SWAP insertions on coupling map edges near those qubits.
Each candidate SWAP is scored by how much it reduces the total distance needed to execute all pending gates, not just the current one. This look-ahead heuristic is what makes SABRE effective.
SABRE runs the circuit forward and backward, using the output layout of one pass as the input layout of the next. It keeps the best result across multiple iterations.

The look-ahead scoring is critical. A greedy algorithm might insert a SWAP that helps the current gate but pushes other qubits further apart, creating more work downstream. SABRE’s heuristic avoids this trap by considering the impact on future gates.
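The effect of look-ahead scoring can be seen in a toy model. The sketch below is a deliberately simplified version of the idea on a 5-qubit chain, not Qiskit's implementation: the cost function, the weight of 0.5, and the helper names are assumptions made for illustration. With only the front-layer gate, two candidate SWAPs tie; adding one upcoming gate breaks the tie.

```python
def apply_swap(layout, swap):
    """Return a new virtual->physical layout with two physical qubits exchanged."""
    p, q = swap
    return {v: (q if ph == p else p if ph == q else ph) for v, ph in layout.items()}

def score(layout, front, lookahead, weight=0.5):
    """Average chain distance of front-layer gates plus a weighted look-ahead term."""
    def avg_distance(gates):
        return sum(abs(layout[a] - layout[b]) for a, b in gates) / len(gates)
    cost = avg_distance(front)
    if lookahead:
        cost += weight * avg_distance(lookahead)
    return cost

n = 5
layout = {q: q for q in range(n)}  # virtual qubit i starts on physical qubit i
front = [(0, 4)]                   # gate that cannot run yet
lookahead = [(0, 3)]               # gate waiting behind it

# Candidate SWAPs: chain edges touching a physical qubit used by a front-layer gate.
busy = {layout[q] for gate in front for q in gate}
candidates = [(p, p + 1) for p in range(n - 1) if p in busy or p + 1 in busy]

for swap in candidates:
    trial = apply_swap(layout, swap)
    print(f"SWAP {swap}: front only = {score(trial, front, [])}, "
          f"with look-ahead = {score(trial, front, lookahead)}")

best = min(candidates, key=lambda s: score(apply_swap(layout, s), front, lookahead))
print(f"Chosen SWAP: {best}")
```

Moving virtual qubit 0 toward the middle helps both gates, while moving virtual qubit 4 helps the front gate but pushes qubit 3 further from qubit 0.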

BasicSwap

BasicSwap is a simpler, greedy algorithm. For each non-executable gate, it finds the shortest path between the two qubits on the coupling map and inserts SWAPs along that path. It does not look ahead at future gates. BasicSwap compiles faster but typically produces circuits with more SWAPs.
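The shortest-path idea can be sketched in a few lines. This is a plain-Python illustration of the strategy described above on a hand-built chain, not the BasicSwap pass itself; shortest_path and swaps_along_path are hypothetical helper names, and the graph is assumed to be connected.

```python
from collections import deque

def shortest_path(adj, start, goal):
    """Breadth-first search for one shortest path (assumes goal is reachable)."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        if q == goal:
            break
        for nb in adj[q]:
            if nb not in parents:
                parents[nb] = q
                queue.append(nb)
    path = []
    node = goal
    while node is not None:
        path.append(node)
        node = parents[node]
    return path[::-1]

def swaps_along_path(path):
    """SWAPs that walk the state at path[0] until it neighbors path[-1]."""
    return [(path[i], path[i + 1]) for i in range(len(path) - 2)]

# A 5-qubit chain 0-1-2-3-4.
chain = {q: [p for p in (q - 1, q + 1) if 0 <= p < 5] for q in range(5)}

path = shortest_path(chain, 0, 4)
swaps = swaps_along_path(path)
print(f"Path: {path}")
print(f"SWAPs inserted: {swaps} ({len(swaps)} total)")
```

On an n-qubit chain this gives n - 2 SWAPs for a gate between the two endpoints, and each gate is handled in isolation, which is why the result tends to be worse than a look-ahead router.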

Comparing Routing Algorithms

from qiskit import QuantumCircuit, transpile
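from qiskit.providers.fake_provider import GenericBackendV2
from qiskit.transpiler import CouplingMap

# Build a circuit with multiple long-range interactions
qc = QuantumCircuit(5)
qc.h(0)
qc.cx(0, 4)
qc.cx(1, 3)
qc.cx(2, 4)
qc.cx(0, 3)
qc.cx(1, 4)
qc.measure_all()

# GenericBackendV2 is fully connected unless a coupling map is given,
# so use a 5-qubit chain to make routing necessary.
backend = GenericBackendV2(num_qubits=5, coupling_map=CouplingMap.from_line(5), seed=42)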

# Compare SABRE vs BasicSwap
for routing_method in ["sabre", "basic"]:
    qc_t = transpile(
        qc,
        backend=backend,
        routing_method=routing_method,
        optimization_level=1,
        seed_transpiler=42,
    )
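    ops = qc_t.count_ops()
    # Routed SWAPs are normally translated into CX gates for this backend,
    # so the SWAP count is usually 0; the CX count carries the routing cost.
    swaps = ops.get("swap", 0)
    cx_count = ops.get("cx", 0)
    print(f"{routing_method:6s}: {swaps} SWAPs, {cx_count} CX gates, depth={qc_t.depth()}")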

For most circuits, SABRE produces fewer SWAPs and shallower circuits than BasicSwap. The gap grows larger on circuits with many non-local gates.

Initial Layout Strategies

The initial qubit assignment (layout) has a major impact on routing quality. A good layout places frequently-interacting virtual qubits on adjacent physical qubits, reducing the number of SWAPs the routing pass needs to insert.
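A rough way to see why layout matters is to add up how far apart each gate's qubits sit under a given placement. The sketch below does this on a 5-qubit chain in plain Python. The metric (distance minus one, summed over gates) is only a crude proxy for SWAP count, since it treats every gate in isolation, and the layouts are hand-picked for illustration.

```python
def extra_hops(layout, gates):
    """Sum over gates of (chain distance - 1) under a virtual->physical layout."""
    return sum(abs(layout[a] - layout[b]) - 1 for a, b in gates)

gates = [(0, 4), (1, 3)]  # two-qubit gates between virtual qubits

# Virtual qubit i on physical qubit i.
trivial = {q: q for q in range(5)}

# Hand-picked placement that puts each interacting pair side by side.
tailored = {0: 0, 4: 1, 1: 2, 3: 3, 2: 4}

print(f"Trivial layout:  {extra_hops(trivial, gates)} extra hops")
print(f"Tailored layout: {extra_hops(tailored, gates)} extra hops")
```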

TrivialLayout

TrivialLayout maps virtual qubit i to physical qubit i. It completely ignores the coupling map structure and the circuit’s interaction pattern. This is rarely a good choice, but it is deterministic and useful as a baseline.

DenseLayout

DenseLayout places your circuit’s qubits onto a densely connected subgraph of the coupling map. It does not consider which virtual qubits interact with each other, only the overall connectivity of the physical qubits chosen.

SabreLayout

SabreLayout finds an initial layout by routing the circuit forward and then in reverse several times, feeding the final layout of each pass into the next as its starting point. It is the default for optimization level 1 and above. SabreLayout considers both the circuit’s gate structure and the coupling map, making it the best general-purpose choice.

Manual Layout

You can specify the initial layout explicitly when you know your circuit’s structure well enough to place qubits by hand:

from qiskit import QuantumCircuit, transpile
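from qiskit.providers.fake_provider import GenericBackendV2
from qiskit.transpiler import CouplingMap

# GenericBackendV2 is fully connected unless a coupling map is given,
# so use a 7-qubit chain to make layout and routing matter.
backend = GenericBackendV2(num_qubits=7, coupling_map=CouplingMap.from_line(7), seed=42)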

qc = QuantumCircuit(5)
qc.h(0)
qc.cx(0, 1)
qc.cx(1, 2)
qc.cx(2, 3)
qc.cx(3, 4)
qc.measure_all()

# Compare different layout strategies by measuring resulting circuit depth
results = {}

# TrivialLayout: virtual qubit i -> physical qubit i
qc_trivial = transpile(qc, backend=backend, optimization_level=1,
                        layout_method="trivial", seed_transpiler=42)
results["trivial"] = qc_trivial.depth()

# SabreLayout: let SABRE optimize placement
qc_sabre = transpile(qc, backend=backend, optimization_level=1,
                      layout_method="sabre", seed_transpiler=42)
results["sabre"] = qc_sabre.depth()

# Manual layout: place qubits on a known-good chain in the coupling map.
# This assumes physical qubits 0-4 form a connected chain; check
# backend.coupling_map.get_edges() before reusing it on another device.
initial_layout = [0, 1, 2, 3, 4]
qc_manual = transpile(qc, backend=backend, initial_layout=initial_layout,
                       optimization_level=1, seed_transpiler=42)
results["manual"] = qc_manual.depth()

for method, depth in results.items():
    print(f"{method:10s} layout: depth = {depth}")

For linear circuits (chains of nearest-neighbor gates), a good manual layout can match or beat SABRE. For circuits with complex interaction patterns, SabreLayout almost always wins.

Inspecting the Chosen Layout

After transpilation, you can see exactly which virtual qubits mapped to which physical qubits:

from qiskit import QuantumCircuit, transpile
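from qiskit.providers.fake_provider import GenericBackendV2
from qiskit.transpiler import CouplingMap

# GenericBackendV2 is fully connected unless a coupling map is given,
# so use a 7-qubit chain to make layout and routing matter.
backend = GenericBackendV2(num_qubits=7, coupling_map=CouplingMap.from_line(7), seed=42)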

qc = QuantumCircuit(5)
qc.h(0)
qc.cx(0, 4)
qc.cx(1, 3)
qc.measure_all()

qc_sabre = transpile(
    qc,
    backend=backend,
    layout_method="sabre",
    routing_method="sabre",
    optimization_level=3,
    seed_transpiler=42,
)

# Check which virtual qubits mapped to which physical qubits
layout = qc_sabre.layout.initial_layout
print("Virtual -> Physical qubit mapping:")
for virtual, physical in layout.get_virtual_bits().items():
    print(f"  virtual qubit {virtual} -> physical qubit {physical}")

Noise-Aware Routing

Not all qubits and connections on a quantum processor are equal. Some qubits have lower error rates than others, and some CNOT connections are more reliable than others. Noise-aware routing takes advantage of this by preferring low-error qubits and edges during layout and routing decisions.

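When you provide a real backend (or a fake backend that models real noise properties), the transpiler uses the reported error rates when choosing a layout. In the preset pass managers this happens mainly in the VF2Layout and VF2PostLayout passes, which score candidate qubit placements by their estimated error; optimization level 3 spends the most effort on this search: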

from qiskit import QuantumCircuit, transpile
from qiskit_ibm_runtime.fake_provider import FakeSherbrooke

backend = FakeSherbrooke()

# A circuit that benefits from noise-aware placement
qc = QuantumCircuit(5)
qc.h(0)
for i in range(4):
    qc.cx(i, i + 1)
qc.measure_all()

# Layout selection prefers lower-error qubits and links;
# SABRE's SWAP choice itself is based on coupling-map distance.
qc_noise_aware = transpile(
    qc,
    backend=backend,
    optimization_level=3,
    layout_method="sabre",
    routing_method="sabre",
    seed_transpiler=42,
)

print(f"Transpiled depth: {qc_noise_aware.depth()}")
print(f"Gate counts: {qc_noise_aware.count_ops()}")

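In general, a noise-aware router may choose a physically longer SWAP path (more total gates) if that path uses lower-error connections, trading circuit depth for reduced per-gate error. In Qiskit’s preset pipelines the noise awareness shows up mainly in layout: among placements that fit the circuit, the transpiler prefers the physical qubits with the lowest estimated error. Either way, the net effect can be a significant improvement in output fidelity, and the benefit is most pronounced on larger devices where qubit-to-qubit error variation is substantial.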
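The core idea of preferring low-error regions can be illustrated without any hardware data. The sketch below places a 3-qubit chain on a 5-qubit line and picks the window whose links have the highest combined success probability. The error rates are made up, and the model (one two-qubit gate per link, independent errors) is an assumption for illustration rather than what any Qiskit pass computes.

```python
# Made-up two-qubit error rates for the links of a 5-qubit chain.
edge_error = {(0, 1): 0.020, (1, 2): 0.008, (2, 3): 0.005, (3, 4): 0.012}

def chain_success(start, length):
    """Success probability of one gate on each link of the window
    start, start+1, ..., start+length-1, assuming independent errors."""
    prob = 1.0
    for q in range(start, start + length - 1):
        prob *= 1 - edge_error[(q, q + 1)]
    return prob

length = 3
candidates = list(range(0, 5 - length + 1))  # possible starting qubits

for start in candidates:
    qubits = list(range(start, start + length))
    print(f"qubits {qubits}: success = {chain_success(start, length):.5f}")

best_start = max(candidates, key=lambda s: chain_success(s, length))
best_qubits = list(range(best_start, best_start + length))
print(f"Preferred placement: {best_qubits}")
```

All three windows have the same connectivity, so a noise-blind layout would treat them as equivalent.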

Analyzing Routing Overhead

Before running a circuit on hardware, you should quantify how much overhead routing adds. This tells you whether your circuit is feasible given the device’s coherence time and error rates.

from qiskit import QuantumCircuit, transpile
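from qiskit.providers.fake_provider import GenericBackendV2
from qiskit.transpiler import CouplingMap

# GenericBackendV2 is fully connected unless a coupling map is given,
# so use a 7-qubit chain to make routing necessary.
backend = GenericBackendV2(num_qubits=7, coupling_map=CouplingMap.from_line(7), seed=42)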

# A circuit with several non-local gates
qc = QuantumCircuit(5)
qc.h(0)
qc.cx(0, 4)
qc.cx(1, 3)
qc.cx(2, 4)
qc.cx(0, 3)
qc.cx(1, 4)
qc.measure_all()

qc_t = transpile(qc, backend=backend, optimization_level=2, seed_transpiler=42)

# Count original vs transpiled two-qubit gates
orig_cx = qc.count_ops().get("cx", 0)
trans_ops = qc_t.count_ops()
trans_cx = trans_ops.get("cx", 0)
trans_swaps = trans_ops.get("swap", 0)
# Each remaining SWAP in the transpiled circuit will become 3 CX gates
effective_cx = trans_cx + 3 * trans_swaps

print(f"Original CX gates: {orig_cx}")
print(f"Transpiled CX gates: {trans_cx}")
print(f"Remaining SWAPs (each = 3 CX): {trans_swaps}")
print(f"Effective CX after full decomposition: {effective_cx}")
if orig_cx > 0:
    print(f"Routing overhead: {effective_cx / orig_cx:.1f}x")

print(f"\nOriginal depth: {qc.depth()}")
print(f"Transpiled depth: {qc_t.depth()}")

Routing overhead varies dramatically by circuit type:

Linear nearest-neighbor circuits (like a simple chain of CX gates): 1.0x to 1.5x overhead on most topologies.
QFT on moderate qubit counts: 1.5x to 3x overhead.
QAOA on dense graphs (many non-local interactions): 3x to 5x or more overhead on heavy-hex.
Random circuits with all-to-all connectivity: can exceed 5x overhead.

If your routing overhead exceeds 3x, consider redesigning the circuit or using SWAP networks (covered below).

Reducing SWAP Overhead in Practice

Several strategies reduce SWAP count, and using them together yields the best results.

Strategy 1: Use Higher Optimization Levels

Optimization level 3 runs the most aggressive optimization passes, including repeated layout and routing attempts:

from qiskit import QuantumCircuit, transpile
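from qiskit.providers.fake_provider import GenericBackendV2
from qiskit.transpiler import CouplingMap

# GenericBackendV2 is fully connected unless a coupling map is given,
# so use a 7-qubit chain to make routing necessary.
backend = GenericBackendV2(num_qubits=7, coupling_map=CouplingMap.from_line(7), seed=42)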

qc = QuantumCircuit(5)
qc.h(0)
qc.cx(0, 4)
qc.cx(1, 3)
qc.cx(2, 4)
qc.measure_all()

for opt_level in [0, 1, 2, 3]:
    qc_t = transpile(qc, backend=backend, optimization_level=opt_level,
                      seed_transpiler=42)
    print(f"opt_level={opt_level}: depth={qc_t.depth()}, "
          f"ops={dict(qc_t.count_ops())}")

Strategy 2: Try Multiple Random Seeds

SABRE is a randomized algorithm. Different seeds produce different layouts and routing solutions. Running multiple seeds and picking the best result is a simple but effective technique:

import numpy as np
from qiskit import QuantumCircuit, transpile
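from qiskit.providers.fake_provider import GenericBackendV2
from qiskit.transpiler import CouplingMap

# GenericBackendV2 is fully connected unless a coupling map is given,
# so use a 7-qubit chain to make routing necessary.
backend = GenericBackendV2(num_qubits=7, coupling_map=CouplingMap.from_line(7), seed=42)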

qc = QuantumCircuit(5)
qc.h(0)
qc.cx(0, 4)
qc.cx(1, 3)
qc.cx(2, 4)
qc.measure_all()

best_depth = float("inf")
best_circuit = None
depths = []

for seed in range(20):
    qc_t = transpile(qc, backend=backend, optimization_level=2,
                      seed_transpiler=seed)
    depths.append(qc_t.depth())
    if qc_t.depth() < best_depth:
        best_depth = qc_t.depth()
        best_circuit = qc_t

print(f"Depths across 20 seeds: min={min(depths)}, max={max(depths)}, "
      f"mean={np.mean(depths):.1f}")
print(f"Best depth: {best_depth}")
print(f"Best gate counts: {dict(best_circuit.count_ops())}")

Use at least 10 seeds. For production workloads, 20 to 50 seeds is common. The compilation cost is linear in the number of seeds, but each run is fast for circuits under a few hundred qubits.

Strategy 3: Redesign the Circuit to Match Device Topology

When your algorithm allows flexibility in gate ordering or qubit assignment, restructure the circuit to minimize non-local interactions:

from qiskit import QuantumCircuit, transpile
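from qiskit.providers.fake_provider import GenericBackendV2
from qiskit.transpiler import CouplingMap

# GenericBackendV2 is fully connected unless a coupling map is given,
# so use a 7-qubit chain to make routing necessary.
backend = GenericBackendV2(num_qubits=7, coupling_map=CouplingMap.from_line(7), seed=42)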

# Bad: fan-out from qubit 0 needs long-range interactions
qc_bad = QuantumCircuit(5)
qc_bad.h(0)
qc_bad.cx(0, 1)  # distance 1 on a chain
qc_bad.cx(0, 2)  # distance 2 on a chain
qc_bad.cx(0, 3)  # distance 3 on a chain
qc_bad.cx(0, 4)  # distance 4 on a chain
qc_bad.measure_all()

# Better: restructured to use nearest-neighbor interactions
# Same output state (5-qubit GHZ), different gate structure
qc_good = QuantumCircuit(5)
qc_good.h(0)
qc_good.cx(0, 1)
qc_good.cx(1, 2)
qc_good.cx(2, 3)
qc_good.cx(3, 4)
qc_good.measure_all()

qc_bad_t = transpile(qc_bad, backend=backend, optimization_level=2,
                      seed_transpiler=42)
qc_good_t = transpile(qc_good, backend=backend, optimization_level=2,
                       seed_transpiler=42)

print(f"Long-range version: depth={qc_bad_t.depth()}, "
      f"ops={dict(qc_bad_t.count_ops())}")
print(f"Nearest-neighbor version: depth={qc_good_t.depth()}, "
      f"ops={dict(qc_good_t.count_ops())}")

Circuit Design Principles for Minimal Routing

Beyond the strategies above, these design principles help you write circuits that route efficiently.

Principle 1: Align circuit structure with hardware topology. If the hardware is a chain, design algorithms with linear nearest-neighbor gates. If the hardware is a grid, use 2D nearest-neighbor patterns. Fighting the topology always costs SWAPs.

Principle 2: Avoid crossing interactions. A gate between qubits 0 and 4 with qubits 1, 2, and 3 in between forces SWAPs even under optimal routing. If your algorithm requires such interactions, group them together so the routing pass can amortize SWAP costs across multiple gates.

Principle 3: For dense interaction patterns, use SWAP networks. When every qubit needs to interact with every other qubit (as in QAOA on dense graphs), a structured SWAP network achieves all-to-all connectivity in O(n) layers. This is more predictable and often more efficient than relying on the general-purpose transpiler.

Principle 4: For VQE, choose hardware-efficient ansatze. Hardware-efficient ansatze restrict two-qubit gates to neighboring qubits on the device topology. This eliminates routing overhead entirely for the variational layers, at the cost of a potentially less expressive circuit. The tradeoff is usually worthwhile on NISQ hardware where every extra gate degrades fidelity.

SWAP Networks for All-to-All Connectivity

For algorithms that require all-to-all qubit interactions, structured SWAP networks provide an efficient and deterministic alternative to the transpiler’s general-purpose routing. A SWAP network is a fixed sequence of SWAP layers that systematically permutes qubits so that every pair of qubits becomes adjacent at some point during the network.

The Brick-Wall SWAP Network

The simplest and most common SWAP network uses alternating “odd” and “even” layers of parallel SWAPs, arranged like bricks in a wall:

Even layers: SWAP qubits (0,1), (2,3), (4,5), …
Odd layers: SWAP qubits (1,2), (3,4), (5,6), …

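For n qubits, n layers of this pattern bring every pair of qubits together on a SWAP exactly once and leave the qubits in reversed order. Stopping at n-1 layers leaves some pairs uncovered: for n = 4, three layers cover only 5 of the 6 pairs.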

from qiskit import QuantumCircuit

def swap_network_layer(n_qubits, layer_parity):
    """One layer of a brick-wall SWAP network."""
    qc = QuantumCircuit(n_qubits)
    start = layer_parity % 2
    for i in range(start, n_qubits - 1, 2):
        qc.swap(i, i + 1)
    return qc

# Build a full SWAP network for 6 qubits
n_qubits = 6
full_network = QuantumCircuit(n_qubits)
for layer in range(n_qubits):  # n layers so that every pair meets on a SWAP
    full_network.compose(swap_network_layer(n_qubits, layer), inplace=True)

print(full_network.draw())

Verifying the SWAP Network Covers All Pairs

It is important to verify that your SWAP network actually covers every qubit pair. The following function simulates the network and tracks which pairs become adjacent:

def verify_all_pairs_covered(n_qubits, n_layers):
    """Check that all qubit pairs interact in the SWAP network."""
    covered = set()
    # Track which logical qubit is at each physical position
    positions = list(range(n_qubits))

    for layer in range(n_layers):
        start = layer % 2
        # Record which pairs are adjacent in this layer BEFORE swapping
        for i in range(start, n_qubits - 1, 2):
            pair = tuple(sorted([positions[i], positions[i + 1]]))
            covered.add(pair)
        # Simulate the SWAPs: exchange positions
        for i in range(start, n_qubits - 1, 2):
            positions[i], positions[i + 1] = positions[i + 1], positions[i]

    all_pairs = {(i, j) for i in range(n_qubits) for j in range(i + 1, n_qubits)}
    print(f"Covered {len(covered)}/{len(all_pairs)} pairs in {n_layers} layers")
    return covered == all_pairs

# Verify for several qubit counts. n layers are needed: with n - 1 layers
# some pairs never meet on a SWAP position.
for n in [4, 6, 8, 10]:
    result = verify_all_pairs_covered(n, n)
    print(f"  n={n}: all pairs covered = {result}")

Using SWAP Networks in Practice

In a real algorithm like QAOA on a dense graph, you interleave your two-qubit gates with the SWAP layers. At each layer, you apply ZZ interactions between adjacent qubit pairs, then SWAP those pairs to bring new partners together:

from qiskit import QuantumCircuit
import numpy as np

def qaoa_swap_network_layer(n_qubits, gamma, layer_parity):
    """Apply ZZ interactions on adjacent pairs, then SWAP them."""
    qc = QuantumCircuit(n_qubits)
    start = layer_parity % 2
    for i in range(start, n_qubits - 1, 2):
        # ZZ interaction: CNOT - Rz - CNOT
        qc.cx(i, i + 1)
        qc.rz(2 * gamma, i + 1)
        qc.cx(i, i + 1)
        # SWAP the pair so new partners become adjacent
        qc.swap(i, i + 1)
    return qc

# Build one QAOA round: cost layer via SWAP network, then the mixer
n_qubits = 6
gamma = 0.5
beta = 0.3

qaoa_round = QuantumCircuit(n_qubits)

# Initial superposition
qaoa_round.h(range(n_qubits))

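# Problem unitary via SWAP network: n layers cover all pairs.
# The network leaves the qubit order reversed, which the symmetric mixer
# ignores; map measured bits back accordingly.
for layer in range(n_qubits):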
    qaoa_round.compose(
        qaoa_swap_network_layer(n_qubits, gamma, layer),
        inplace=True,
    )

# Mixer unitary
qaoa_round.rx(2 * beta, range(n_qubits))
qaoa_round.measure_all()

print(f"QAOA circuit depth: {qaoa_round.depth()}")
print(f"QAOA circuit gate counts: {dict(qaoa_round.count_ops())}")

This approach produces a circuit with predictable depth that scales linearly with the number of qubits, regardless of graph density. The transpiler only needs to handle the linear nearest-neighbor SWAPs, which require zero additional routing on a chain topology.
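When the problem graph is not complete, or its edges carry different weights, you need to know which logical pair sits on each SWAP position in every layer. The sketch below tracks that in plain Python for the brick-wall pattern; swap_network_schedule is a hypothetical helper and the problem graph is made up. It uses n layers, which is what it takes for every pair to meet once.

```python
def swap_network_schedule(n_qubits):
    """Return (schedule, final_order) for an n-layer brick-wall SWAP network.

    schedule[layer] is a list of ((wire_i, wire_j), (logical_a, logical_b))
    entries recorded before the SWAP on those wires is applied.
    """
    positions = list(range(n_qubits))  # positions[w] = logical qubit on wire w
    schedule = []
    for layer in range(n_qubits):
        start = layer % 2
        entries = []
        for i in range(start, n_qubits - 1, 2):
            entries.append(((i, i + 1), (positions[i], positions[i + 1])))
            positions[i], positions[i + 1] = positions[i + 1], positions[i]
        schedule.append(entries)
    return schedule, positions

schedule, final_order = swap_network_schedule(6)

# Hypothetical sparse problem graph: apply ZZ only where an edge exists.
problem_edges = {(0, 3), (1, 4), (2, 5), (0, 5)}
for layer_idx, entries in enumerate(schedule):
    for wires, logical in entries:
        edge = tuple(sorted(logical))
        if edge in problem_edges:
            print(f"layer {layer_idx}: ZZ on wires {wires} for logical edge {edge}")

print(f"Final logical order on wires: {final_order}")
```

The final order tells you how to map measured bits back to logical qubits after the network has run.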

Using the PassManager for Fine-Grained Control

For advanced use cases, you can build a custom transpilation pipeline using Qiskit’s PassManager. This gives you control over exactly which optimization passes run and in what order.

from qiskit import QuantumCircuit
from qiskit.transpiler import PassManager, CouplingMap
from qiskit.transpiler.passes import (
    SabreLayout,
    SabreSwap,
    FullAncillaAllocation,
    EnlargeWithAncilla,
    ApplyLayout,
)
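from qiskit.providers.fake_provider import GenericBackendV2

# GenericBackendV2 is fully connected unless a coupling map is given,
# so use a 7-qubit chain to make routing necessary.
backend = GenericBackendV2(num_qubits=7, coupling_map=CouplingMap.from_line(7), seed=42)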
coupling = CouplingMap(backend.coupling_map.get_edges())

# Build a circuit to route
qc = QuantumCircuit(5)
qc.h(0)
qc.cx(0, 4)
qc.cx(1, 3)
qc.cx(2, 4)
qc.measure_all()

# Passing routing_pass makes SabreLayout act as a pure layout pass.
# Without it, SabreLayout runs layout AND routing AND ancilla allocation
# on its own, which then conflicts with the explicit passes below.
pm = PassManager([
    SabreLayout(coupling, routing_pass=SabreSwap(coupling, heuristic="decay", seed=42),
                max_iterations=3, seed=42),
    FullAncillaAllocation(coupling),
    EnlargeWithAncilla(),
    ApplyLayout(),
    SabreSwap(coupling, heuristic="decay", seed=42),
])

qc_custom = pm.run(qc)
print(f"Custom routing: depth={qc_custom.depth()}")
print(f"Gate counts: {dict(qc_custom.count_ops())}")

The heuristic parameter in SabreSwap controls how SWAPs are scored:

"basic": only considers the immediate front layer of gates.
"lookahead": also considers gates in the next few layers.
"decay": like lookahead, but penalizes qubits that have been involved in recent SWAPs. This avoids cycles where qubits keep swapping back and forth.

The decay heuristic is generally the best choice and is the default in higher optimization levels.
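The decay idea can be shown with a toy example. In the sketch below, each physical qubit carries a decay factor that grows every time it takes part in a SWAP, and a candidate's cost is multiplied by the larger factor of its two qubits. The increment of 0.001, the base costs, and the function names are illustrative assumptions, and the periodic reset of decay factors used in practice is omitted.

```python
def pick_swap(candidates, base_cost, decay):
    """Choose the candidate with the lowest decay-weighted cost."""
    return min(candidates,
               key=lambda s: max(decay[s[0]], decay[s[1]]) * base_cost[s])

def record_swap(swap, decay, delta=0.001):
    """Penalize the two qubits that were just swapped."""
    for q in swap:
        decay[q] += delta

decay = {q: 1.0 for q in range(4)}
candidates = [(0, 1), (2, 3)]
base_cost = {(0, 1): 2.0, (2, 3): 2.0}  # equal distance-based cost: a tie

first = pick_swap(candidates, base_cost, decay)
record_swap(first, decay)
second = pick_swap(candidates, base_cost, decay)

print(f"First choice:  {first}")
print(f"Second choice: {second}")
```

With equal base costs, the first pick falls to the first candidate in the list; once its qubits are penalized, the other candidate wins the next tie, which spreads SWAPs across qubits and lets them run in parallel.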

Benchmarking Routing Quality Across Circuit Types

Different circuit structures experience very different routing overhead. Benchmarking across circuit types helps you understand what to expect for your specific application.

import numpy as np
from qiskit import QuantumCircuit, transpile
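from qiskit.providers.fake_provider import GenericBackendV2
from qiskit.transpiler import CouplingMap

# GenericBackendV2 is fully connected unless a coupling map is given,
# so use a 7-qubit chain to make routing necessary.
backend = GenericBackendV2(num_qubits=7, coupling_map=CouplingMap.from_line(7), seed=42)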

# Bell state: minimal routing needed (one CX gate)
bell = QuantumCircuit(2)
bell.h(0)
bell.cx(0, 1)
bell.measure_all()

# GHZ chain: linear nearest-neighbor structure
ghz = QuantumCircuit(5)
ghz.h(0)
for i in range(4):
    ghz.cx(i, i + 1)
ghz.measure_all()

# Dense interactions: every qubit interacts with every other qubit
dense = QuantumCircuit(5)
dense.h(0)
for i in range(5):
    for j in range(i + 1, 5):
        dense.cx(i, j)
dense.measure_all()

circuits = [
    ("Bell state (2 qubits, 1 CX)", bell),
    ("GHZ chain (5 qubits, linear)", ghz),
    ("Dense all-to-all (5 qubits)", dense),
]

for name, qc_bench in circuits:
    depths = []
    cx_counts = []
    for seed in range(10):
        qc_t = transpile(qc_bench, backend=backend,
                          optimization_level=2, seed_transpiler=seed)
        depths.append(qc_t.depth())
        ops = qc_t.count_ops()
        cx_counts.append(ops.get("cx", 0) + 3 * ops.get("swap", 0))
    print(f"{name}:")
    print(f"  depth: {min(depths)}-{max(depths)} (mean {np.mean(depths):.1f})")
    print(f"  effective CX: {min(cx_counts)}-{max(cx_counts)} "
          f"(mean {np.mean(cx_counts):.1f})")

Notice how the variance across seeds increases with circuit complexity. For the Bell state, every seed produces the same result. For the dense circuit, the best seed may produce a circuit that is 2x shallower than the worst seed.

Common Mistakes

Routing is an area where subtle mistakes can silently degrade your results. Here are the most common pitfalls.

Not Transpiling Before Running on Hardware

If you submit a circuit to a real backend without transpiling, it will fail if the circuit contains two-qubit gates between non-adjacent qubits or uses gates not in the device’s basis gate set. Always transpile before execution.

from qiskit import QuantumCircuit, transpile
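from qiskit.providers.fake_provider import GenericBackendV2
from qiskit.transpiler import CouplingMap

# GenericBackendV2 is fully connected unless a coupling map is given,
# so use a 7-qubit chain to make routing necessary.
backend = GenericBackendV2(num_qubits=7, coupling_map=CouplingMap.from_line(7), seed=42)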

qc = QuantumCircuit(5)
qc.h(0)
qc.cx(0, 4)
qc.measure_all()

# Always transpile before running
qc_t = transpile(qc, backend=backend, optimization_level=2, seed_transpiler=42)
# Then run: backend.run(qc_t)

Using optimization_level=0

Optimization level 0 uses the trivial layout and skips gate optimization; it only routes and translates the circuit so that it can run. It is intended for debugging, not production use. Circuits transpiled at level 0 are often 2x to 5x worse than level 2.

from qiskit import QuantumCircuit, transpile
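from qiskit.providers.fake_provider import GenericBackendV2
from qiskit.transpiler import CouplingMap

# GenericBackendV2 is fully connected unless a coupling map is given,
# so use a 7-qubit chain to make routing necessary.
backend = GenericBackendV2(num_qubits=7, coupling_map=CouplingMap.from_line(7), seed=42)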

qc = QuantumCircuit(5)
qc.h(0)
qc.cx(0, 4)
qc.cx(1, 3)
qc.cx(2, 4)
qc.measure_all()

for level in [0, 1, 2, 3]:
    qc_t = transpile(qc, backend=backend, optimization_level=level,
                      seed_transpiler=42)
    print(f"Level {level}: depth={qc_t.depth()}, gates={sum(qc_t.count_ops().values())}")

Ignoring SWAP Decomposition When Comparing Circuits

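An undecomposed SWAP counts as a single gate and a single layer, but it expands to three CNOTs. A circuit reported at “depth 10” with 3 SWAPs on its critical path therefore has a depth of 16 once the SWAPs are decomposed. When comparing routing results, always count effective two-qubit gates after SWAP decomposition.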

Fixing a Manual Layout Without Benchmarking

Manual layouts can be 2x to 3x worse than SABRE-optimized layouts, especially on larger devices where the connectivity structure is complex. If you use a manual layout, always benchmark it against SABRE to confirm it is actually better.

Assuming All Qubits Are Equivalent

Real devices have significant qubit-to-qubit variation in error rates, T1 times, and T2 times. Two circuits with the same topology but different physical qubit assignments can have very different output fidelity. Use noise-aware transpilation (optimization level 3 with a real backend) to automatically prefer lower-error qubits.

Summary

Minimizing SWAP overhead is one of the most impactful optimizations you can make before running circuits on real hardware. A circuit that looks short on paper can easily double or triple in depth after routing, pushing it beyond the device’s coherence time. The key takeaways:

Every SWAP costs 3 CNOTs and their associated errors. On current hardware, each SWAP contributes roughly 1% error.
SABRE is the best general-purpose routing algorithm in Qiskit. Always use optimization level 2 or 3.
Try multiple seeds (10 to 50) because SABRE is randomized and results vary significantly.
Design your circuits to match the hardware topology when possible. Nearest-neighbor gates need no SWAPs; long-range gates pay for every hop.
For algorithms that need all-to-all connectivity, structured SWAP networks give predictable linear-depth circuits that outperform general-purpose routing.
Use noise-aware transpilation on real backends to take advantage of qubit-to-qubit variation in error rates.
Always benchmark routing overhead before committing to a circuit design. If overhead exceeds 3x, redesign the circuit or switch to a SWAP network approach.