Circuit Routing and Optimization in Cirq

Overview

Real quantum processors enforce strict connectivity constraints on which pairs of qubits can interact directly. A two-qubit gate (like CNOT or CZ) requires physical coupling between the qubits involved, and this coupling can only be engineered between qubits that sit close together on the chip. The result is that the device’s connectivity map looks like a sparse graph: each qubit connects to only 2-4 neighbors, not to every other qubit.

On Google’s Sycamore processor, for example, the layout is a rectangular grid where each interior qubit connects to at most 4 neighbors (up, down, left, right). Edge qubits have 3 neighbors, and corner qubits have only 2. When your circuit requires a gate between two qubits that are not adjacent on this grid, the compiler must insert SWAP gates to shuttle qubit states through intermediate positions until the two logical qubits occupy physically adjacent sites. Each SWAP decomposes into 3 CNOT gates, so the overhead adds up fast. An unconstrained 10-qubit circuit with 20 two-qubit gates might balloon to 40-60 gates after routing, and every added gate introduces noise on real hardware.

Cirq’s transformer framework provides composable passes for routing circuits onto device topologies, optimizing gate sequences, and reducing circuit depth. This tutorial walks through the full compilation pipeline: understanding the device graph, routing a logical circuit, applying optimization passes, and measuring the results.

Why Routing is Hard

At first glance, routing seems straightforward: just insert SWAPs wherever you need them. But the problem is combinatorial. Inserting a SWAP to bring two qubits together changes the physical location of every qubit involved, which can break the adjacency that a later gate depends on. A SWAP that fixes one gate might create new violations for the next three gates in the circuit.

Finding the minimum number of SWAPs needed to make a circuit compatible with a given device topology is NP-hard in the general case. The problem is equivalent to a graph embedding problem: you need to embed the interaction graph of your circuit (which qubits talk to which) into the device’s connectivity graph, and you need to do this dynamically as the qubit mapping shifts with each SWAP.

Cirq’s RouteCQC router uses a heuristic approach to keep the SWAP overhead manageable. The key ideas behind RouteCQC are:

Gate-by-gate processing. The router walks through the circuit’s gates in topological order. When it encounters a two-qubit gate that violates connectivity, it decides how to fix the violation before moving on.
Look-ahead scoring. Rather than greedily fixing only the current gate, RouteCQC looks ahead at upcoming gates to evaluate which SWAP insertion will be cheapest overall. A SWAP that costs one extra gate now but saves three later is preferred.
Bridge operations. In some cases, instead of physically swapping qubit states, the router can use a “bridge” strategy that routes the gate through an intermediate qubit using a sequence of CNOT gates. This can be cheaper than a full SWAP when the intermediate qubit is not involved in nearby gates.

The result is not guaranteed to be optimal, but in practice RouteCQC produces circuits with significantly fewer SWAPs than naive approaches, especially on grid topologies.

Understanding Device Topology

Cirq represents device topology as a graph of qubits. Grid-based devices like Google Sycamore use GridQubit with nearest-neighbor connectivity.

import cirq

# A 3x3 grid topology
device_qubits = cirq.GridQubit.rect(3, 3)
print(device_qubits)

# Check adjacency
q00 = cirq.GridQubit(0, 0)
q01 = cirq.GridQubit(0, 1)
q10 = cirq.GridQubit(1, 0)
print(f"(0,0)-(0,1) adjacent: {cirq.GridQubit.is_adjacent(q00, q01)}")
print(f"(0,0)-(1,0) adjacent: {cirq.GridQubit.is_adjacent(q00, q10)}")

Extracting and Visualizing the Connectivity Graph

To work with routing, you need the device’s connectivity as a graph object. Each node is a qubit and each edge represents a pair of qubits that can execute a two-qubit gate directly.

import networkx as nx

# Build the device connectivity graph for a 3x3 grid
qubits = cirq.GridQubit.rect(3, 3)
device_graph = nx.Graph()
device_graph.add_nodes_from(qubits)

# Add edges between adjacent qubits
for q in qubits:
    for neighbor in [
        cirq.GridQubit(q.row + 1, q.col),
        cirq.GridQubit(q.row, q.col + 1),
    ]:
        if neighbor in qubits:
            device_graph.add_edge(q, neighbor)

print(f"Qubits: {device_graph.number_of_nodes()}")
print(f"Connections: {device_graph.number_of_edges()}")
print(f"Degree of (1,1): {device_graph.degree(cirq.GridQubit(1, 1))}")
print(f"Degree of (0,0): {device_graph.degree(cirq.GridQubit(0, 0))}")

On this 3x3 grid, the center qubit (1,1) has degree 4 (connected to all four neighbors), while corner qubit (0,0) has degree 2. This matters for routing: qubits with higher degree give the router more options for SWAP paths, so placing high-connectivity logical qubits on high-degree physical qubits tends to reduce overhead.

`GridQubit.is_adjacent()` vs. the Device Graph

GridQubit.is_adjacent() checks geometric adjacency on the infinite grid: two GridQubits are adjacent if their Manhattan distance equals 1. This works well for rectangular grids, but it does not account for missing qubits or irregular topologies. Real processors often have qubits removed for calibration reasons or have non-rectangular layouts. The device graph, by contrast, represents exactly which connections are available on the actual hardware. For production routing, always use the device graph rather than relying on is_adjacent() alone.

# is_adjacent checks geometric adjacency, not device membership
phantom = cirq.GridQubit(5, 5)  # not in our 3x3 device
print(f"Geometrically adjacent to phantom: {cirq.GridQubit.is_adjacent(phantom, cirq.GridQubit(5, 4))}")
# True, but neither qubit may exist on the actual device

Writing a Logical Circuit

Write your algorithm using LineQubit without worrying about device connectivity first.

# Logical circuit on 4 qubits
a, b, c, d = cirq.LineQubit.range(4)

circuit = cirq.Circuit([
    cirq.H(a),
    cirq.CNOT(a, c),   # non-adjacent on a grid
    cirq.CNOT(b, d),   # non-adjacent on a grid
    cirq.CZ(a, b),
    cirq.measure(a, b, c, d, key="result"),
])

print(circuit)

a and c are not adjacent on most grid devices, so this circuit needs routing before it can run. The CNOT(a, c) gate spans a distance of 2 in the line ordering, and on a 2x2 grid layout, a (mapped to position 0) and c (mapped to position 2) may or may not be neighbors depending on how you assign logical qubits to physical positions.

Applying the Transformer Framework

Cirq’s TransformerContext and built-in transformer passes handle routing and optimization in a composable way.

Manual Qubit Mapping

When you know your circuit’s structure well, you can manually assign logical qubits to physical positions. The goal is to place qubits that interact frequently on adjacent physical sites.

import cirq.transformers as transformers

# Map logical qubits to physical grid qubits
qubit_map = {
    a: cirq.GridQubit(0, 0),
    b: cirq.GridQubit(0, 1),
    c: cirq.GridQubit(1, 0),
    d: cirq.GridQubit(1, 1),
}

routed_circuit = circuit.transform_qubits(lambda q: qubit_map[q])
print(routed_circuit)

In this mapping, a maps to (0,0) and c maps to (1,0), which are adjacent on the grid. Similarly, b maps to (0,1) and d maps to (1,1), also adjacent. This means both problematic CNOT gates now act on adjacent qubits, and no SWAPs are needed. A good manual mapping can eliminate routing overhead entirely for small circuits.

Automatic Routing with RouteCQC

For more complex topologies or larger circuits where manual mapping is impractical, use the RouteCQC router. It automatically inserts SWAP gates to satisfy connectivity constraints.

import networkx as nx
from cirq.transformers import RouteCQC

# Define device connectivity as a networkx graph of allowed qubit pairs
qubits = cirq.GridQubit.rect(3, 3)
device_graph = nx.grid_2d_graph(3, 3)
# Map (row, col) nodes to GridQubit objects
qubit_map = {(q.row, q.col): q for q in qubits}
device_graph = nx.relabel_nodes(device_graph, qubit_map)

router = RouteCQC(device_graph)
# route_circuit returns (routed_circuit, initial_map, swap_map) in that order
routed, initial_map, swap_map = router.route_circuit(circuit)
print(routed)
print(f"SWAPs inserted: {sum(1 for op in routed.all_operations() if op.gate == cirq.SWAP)}")

Understanding the RouteCQC Output

router.route_circuit() returns three values:

routed: The new circuit with SWAP gates inserted where needed. All two-qubit gates in this circuit act on adjacent qubits in the device graph.
initial_map: A dictionary mapping each logical qubit (from your original circuit) to the physical qubit it occupies at the start of the routed circuit. This tells you how the router chose to place your qubits initially.
swap_map: A dictionary mapping each logical qubit to the physical qubit it occupies at the end of the routed circuit. Because SWAPs physically move qubit states around, the final positions differ from the initial positions. You need this map to interpret measurement results correctly.

Comparing Before and After Routing

To understand the cost of routing, compare the original and routed circuits across several metrics.

def circuit_stats(c, label="Circuit"):
    """Print gate count statistics for a circuit."""
    all_ops = list(c.all_operations())
    two_qubit = [op for op in all_ops if cirq.num_qubits(op) == 2
                 and not cirq.is_measurement(op)]
    one_qubit = [op for op in all_ops if cirq.num_qubits(op) == 1]
    swaps = [op for op in all_ops if op.gate == cirq.SWAP]
    print(f"{label}:")
    print(f"  Depth:            {len(c)}")
    print(f"  Total gates:      {len(all_ops)}")
    print(f"  1-qubit gates:    {len(one_qubit)}")
    print(f"  2-qubit gates:    {len(two_qubit)}")
    print(f"  SWAPs:            {len(swaps)}")
    print()

circuit_stats(circuit, "Original (logical)")
circuit_stats(routed, "After routing")

The routed circuit will generally have a higher gate count and deeper depth than the original. Each SWAP contributes 3 additional CNOT-equivalent gates when decomposed to the native gate set. The depth increase depends on how well the router can parallelize SWAP operations with other gates in the circuit.

Optimizing Gate Sequences

After routing, apply optimization passes to reduce gate count and circuit depth. Routing introduces redundant operations that these passes can eliminate.

Merge Single-Qubit Gates

# Merge single-qubit gates
optimized = cirq.merge_single_qubit_gates_to_phased_x_and_z(routed)

This pass finds consecutive single-qubit gates on the same qubit and replaces them with a single PhasedXZ gate that has the same unitary effect. It works by multiplying the 2x2 rotation matrices of consecutive gates analytically and decomposing the product into a single PhasedXZ rotation. For example, an H followed by a T followed by another H on the same qubit becomes one PhasedXZ gate. This is especially valuable after routing, because SWAP decompositions often leave chains of single-qubit gates that can be collapsed.

Drop Negligible Operations

# Drop negligible gates
optimized = cirq.drop_negligible_operations(optimized)

After merging, some gates may be effectively the identity matrix due to floating-point accumulation. For example, merging an Rz(0.3) with an Rz(-0.3) produces Rz(0.0), which does nothing. This pass removes any gate whose unitary is within a small tolerance (default 1e-8) of the identity. It is a cleanup step that should always follow merge passes.

Eject Z Gates

# Eject Z gates through Clifford gates for depth reduction
optimized = cirq.eject_z(optimized)

Z-axis rotations have a special property: they commute with many common operations. A Rz gate commutes with CZ, with other Rz gates, and with measurement in the computational basis. The eject_z pass exploits this by tracking accumulated Z-phase on each qubit and pushing it forward through the circuit. As the phase propagates, it often reaches a measurement gate, where it can be absorbed entirely (since measuring in the Z basis is insensitive to Z-phase). Even when the Z rotation cannot reach a measurement, pushing it later in the circuit can allow it to cancel with other Z rotations or merge into a subsequent gate.

This technique is sometimes called “virtual Z gates” because, on hardware that supports frame tracking, the Z rotation never needs to be physically executed at all. The phase is instead absorbed into the definition of subsequent gates.

Eject Phased Paulis

optimized = cirq.eject_phased_paulis(optimized)

This pass applies a similar strategy to PhasedXPow operations (rotations around axes in the XY plane). It pushes these operations forward through the circuit, commuting them past compatible gates, and absorbs or cancels them where possible. The algebraic rules governing how Pauli operations commute through Clifford gates make this tractable. Like eject_z, this pass reduces both gate count and depth by eliminating operations that would otherwise require physical gate execution.

Printing the Optimized Result

print(f"Original depth:   {len(circuit)}")
print(f"Optimized depth:  {len(optimized)}")

Reading the Optimization Results

To fully assess the impact of routing and optimization, compare circuits across multiple metrics. Gate count and depth tell part of the story, but two-qubit gate count is the most important metric for estimating noise on real hardware, because two-qubit gates have error rates roughly 10x higher than single-qubit gates.

circuit_stats(circuit, "Original (logical)")
circuit_stats(routed, "After routing (before optimization)")
circuit_stats(optimized, "After routing + optimization")

You can also estimate how the two-qubit gate count translates to expected fidelity. A rough model assumes each two-qubit gate has independent error probability p, so the circuit fidelity is approximately (1 - p)^n where n is the number of two-qubit gates.

import numpy as np

# Typical CZ error rate on current hardware
two_qubit_error = 0.005

for label, circ in [("Original", circuit), ("Routed", routed), ("Optimized", optimized)]:
    n_2q = sum(1 for op in circ.all_operations()
               if cirq.num_qubits(op) == 2 and not cirq.is_measurement(op))
    est_fidelity = (1 - two_qubit_error) ** n_2q
    print(f"{label}: {n_2q} two-qubit gates, estimated fidelity {est_fidelity:.4f}")

This is a simplified model (it ignores crosstalk, readout error, and decoherence during idle time), but it illustrates why minimizing two-qubit gate count is the primary goal of the optimization pipeline.

Choosing a Qubit Mapping Strategy

The initial placement of logical qubits onto physical qubits significantly affects routing overhead. You have two main options.

Manual mapping works best when you understand your circuit’s interaction pattern. If you know that qubits A and B interact frequently, you place them on adjacent physical qubits. For small circuits (under ~10 qubits) with a clear interaction structure, manual mapping often produces better results than automatic placement because you can exploit domain knowledge about the algorithm.

Automatic mapping via RouteCQC is the right choice when the circuit is large, when the interaction pattern is irregular or data-dependent, or when you want a reproducible compilation pipeline that does not require human intervention. RouteCQC determines the initial placement as part of its routing process.

To evaluate the quality of a mapping (whether manual or automatic), look at these indicators:

SWAP count. Fewer SWAPs means less overhead. Compare mappings by routing the same circuit with different initial placements and counting the resulting SWAPs.
Two-qubit gate depth. Even with the same SWAP count, different mappings can produce different depths because some allow more parallel execution.
Interaction distance. For each two-qubit gate in your circuit, compute the shortest path between the two qubits in the device graph under the current mapping. The sum of these distances gives a lower bound on the SWAPs needed.

def evaluate_mapping(circuit, mapping, device_graph):
    """Score a qubit mapping by total interaction distance."""
    total_distance = 0
    for op in circuit.all_operations():
        if cirq.num_qubits(op) == 2:
            q0, q1 = op.qubits
            p0 = mapping[q0]
            p1 = mapping[q1]
            dist = nx.shortest_path_length(device_graph, p0, p1)
            total_distance += dist
    return total_distance

# Lower scores indicate better mappings
score = evaluate_mapping(circuit, {
    a: cirq.GridQubit(0, 0),
    b: cirq.GridQubit(0, 1),
    c: cirq.GridQubit(1, 0),
    d: cirq.GridQubit(1, 1),
}, device_graph)
print(f"Mapping interaction distance: {score}")

Combining Passes with a Custom Transformer

Build a pipeline that routes and optimizes in one call.

def compile_for_grid(circuit: cirq.Circuit) -> cirq.Circuit:
    """Route and optimize a circuit for a 3x3 grid device."""
    context = cirq.TransformerContext(deep=True)

    # Step 1: Decompose to native gates
    c = cirq.decompose(circuit)
    c = cirq.Circuit(c)

    # Step 2: Merge redundant single-qubit gates
    c = cirq.merge_single_qubit_gates_to_phased_x_and_z(c, context=context)

    # Step 3: Drop negligible operations
    c = cirq.drop_negligible_operations(c, context=context)

    # Step 4: Synchronize moments
    c = cirq.synchronize_terminal_measurements(c, context=context)

    return c

compiled = compile_for_grid(routed_circuit)
print(compiled)

Note the ordering of passes in this pipeline. Decomposition comes first because it breaks composite gates into primitives that the subsequent passes can reason about. Merging follows decomposition to collapse the resulting chains of single-qubit gates. Dropping negligible operations cleans up any identity-like gates produced by merging. Finally, synchronizing terminal measurements ensures that all measurements happen in the same moment, which is a requirement on some hardware backends.

For a production pipeline that includes routing, structure the passes as: route first, then decompose, then optimize. This ensures that the optimization passes clean up the SWAP decompositions.

def full_compile(circuit: cirq.Circuit, device_graph) -> cirq.Circuit:
    """Full compilation pipeline: route, decompose, optimize."""
    context = cirq.TransformerContext(deep=True)

    # Route onto device
    router = RouteCQC(device_graph)
    c, initial_map, swap_map = router.route_circuit(circuit)

    # Decompose to native gates
    c = cirq.Circuit(cirq.decompose(c))

    # Optimize
    c = cirq.merge_single_qubit_gates_to_phased_x_and_z(c, context=context)
    c = cirq.drop_negligible_operations(c, context=context)
    c = cirq.eject_z(c, context=context)
    c = cirq.eject_phased_paulis(c, context=context)
    c = cirq.drop_negligible_operations(c, context=context)  # second cleanup pass
    c = cirq.synchronize_terminal_measurements(c, context=context)

    return c, initial_map, swap_map

Notice the second drop_negligible_operations call at the end. The eject passes can produce new near-identity gates that the first cleanup did not catch. Running the cleanup twice is cheap and catches these cases.

Measuring the Result

Validate the optimized circuit by comparing simulation outputs against the original.

sim = cirq.Simulator()

# Drop measurements before statevector simulation: simulate() collapses
# measurement gates stochastically, making the vdot comparison meaningless.
circuit_no_meas = cirq.drop_terminal_measurements(circuit)
compiled_no_meas = cirq.drop_terminal_measurements(compiled)

original_result = sim.simulate(circuit_no_meas)
compiled_result = sim.simulate(compiled_no_meas)

import numpy as np
fidelity = abs(np.vdot(
    original_result.final_state_vector,
    compiled_result.final_state_vector
)) ** 2
print(f"State fidelity after routing + optimization: {fidelity:.6f}")

A fidelity of 1.000000 (or very close to it, like 0.999999) confirms that the routing and optimization passes preserved the circuit’s unitary. If the fidelity is noticeably below 1.0, something went wrong. The most common causes are: a bug in a custom transformer pass, applying drop_negligible_operations with too large a tolerance, or comparing circuits that operate on different qubit orderings without accounting for the swap map.

When comparing routed circuits to originals, remember that the qubit labels differ. The original circuit uses LineQubit(0), LineQubit(1), ... while the routed circuit uses GridQubit(r, c) objects. The simulator does not care about qubit types, but you need to make sure the state vector indices line up. The initial_map from RouteCQC tells you the correspondence.

Common Mistakes

Optimizing Before Routing

A tempting but counterproductive pattern is to fully optimize the circuit before routing it. The problem: routing inserts SWAP gates, and those SWAPs decompose into sequences of single-qubit and two-qubit gates that themselves need optimization. If you already ran all your optimization passes, those new gates go unoptimized.

The correct order is:

Write and optionally pre-simplify the logical circuit
Route the circuit onto the device topology
Run the full optimization pipeline on the routed circuit

If you do want to optimize before routing (for example, to reduce the number of two-qubit gates that the router has to handle), run the optimization pipeline again after routing as well.

Forgetting That RouteCQC Changes Qubit Labels

The output circuit from RouteCQC uses GridQubit objects, not the LineQubit objects from your original circuit. If you try to look up results by the original qubit names, you will get incorrect values or KeyErrors.

# Wrong: using original qubit labels after routing
# result.measurements[a]  # 'a' is LineQubit(0), not in the routed circuit

# Right: use the initial_map to find the corresponding physical qubit
physical_a = initial_map[a]
# Then use physical_a to index into the routed circuit's results

Additionally, the swap_map tells you where each logical qubit ends up after all SWAPs have executed. This is critical for interpreting measurement results: the measurement on GridQubit(1, 0) at the end of the routed circuit might correspond to logical qubit b, not the qubit you originally placed at (1, 0).

Assuming the Simulator Checks Topology

Cirq’s Simulator executes any circuit regardless of connectivity. It does not enforce that two-qubit gates act on adjacent qubits. This means a circuit that violates device constraints will simulate perfectly but fail on real hardware.

To verify that your routed circuit actually satisfies the device topology, check every two-qubit gate against the device graph.

def validate_connectivity(circuit, device_graph):
    """Check that all two-qubit gates act on adjacent qubits."""
    violations = []
    for moment in circuit:
        for op in moment:
            if cirq.num_qubits(op) == 2 and not cirq.is_measurement(op):
                q0, q1 = op.qubits
                if not device_graph.has_edge(q0, q1):
                    violations.append(op)
    if violations:
        print(f"Found {len(violations)} connectivity violations:")
        for v in violations[:5]:
            print(f"  {v}")
    else:
        print("All two-qubit gates satisfy device connectivity.")
    return len(violations) == 0

validate_connectivity(routed, device_graph)

Routing Circuits That Already Use GridQubits

If your circuit already uses GridQubit objects but in a layout that does not match the target device, RouteCQC may produce confusing results. The router expects to map from the circuit’s qubits onto the device graph’s qubits. If the circuit already uses GridQubit(0, 0) but the device graph uses a different set of GridQubit objects (perhaps with an offset or different dimensions), the mapping can silently produce an invalid result.

The safest practice is to write logical circuits with LineQubit and let the router handle the mapping to GridQubit. If you must use GridQubit in your logical circuit, verify that every qubit in the circuit is also a node in the device graph before routing.

Summary

Use RouteCQC to insert SWAP gates automatically when device connectivity prevents direct gate execution. The router uses look-ahead heuristics and bridge operations to minimize the number of inserted SWAPs, but routing inevitably increases gate count and depth. Follow routing with Cirq’s optimization passes to recover some of this overhead: merge single-qubit gates into compact PhasedXZ rotations, drop near-identity operations, and eject Z-phase and Pauli operations forward through the circuit. Building a pipeline with TransformerContext lets you compose and reuse these passes across different device targets. Always validate your routed circuit against the device graph before submitting to hardware, and always compare simulation results against the original circuit to confirm that the compilation preserved correctness.