Custom Compilation Passes in TKET

TKET’s compilation system is built around composable BasePass objects. Every optimization is a pass, and passes can be combined into sequences, applied conditionally, or repeated until convergence. This tutorial goes beyond the built-in FullPeepholeOptimise shortcut and shows how to construct custom pass pipelines, rebase to hardware native gate sets, route circuits onto real device topologies, and measure the effect of each stage.

Installation

pip install pytket
# For IBM backend support:
pip install pytket-qiskit
# For IonQ and other Braket-hosted backends:
pip install pytket-braket

Pass Taxonomy

pytket ships with dozens of compilation passes, each targeting a specific transformation. Understanding which passes exist and what category they belong to is the first step toward building effective pipelines.

The following table organizes the most commonly used passes by category.

Category	Pass	What it does
Synthesis	`SynthesiseTket`	Converts to TK1 + CX while applying local optimizations (commutation, redundancy removal, single-qubit squashing)
Synthesis	`SynthesisePauliGraph`	Synthesizes circuits from a Pauli exponential graph representation
Synthesis	`PauliSimp`	Simplifies sequences of Pauli exponentials before synthesis
Optimization	`FullPeepholeOptimise`	Aggressive peephole optimization including two- and three-qubit resynthesis; outputs TK1 + CX by default
Optimization	`PeepholeOptimise2Q`	Optimizes 2-qubit subcircuits via KAK decomposition
Optimization	`KAKDecomposition`	Squashes two-qubit blocks into at most 3 CX gates using the KAK decomposition
Optimization	`CliffordSimp`	Simplifies Clifford gate sequences using a set of Clifford rewrite rules
Reduction	`RemoveRedundancies`	Removes gate-inverse pairs and identity rotations, and merges adjacent rotations
Reduction	`CommuteThroughMultis`	Commutes single-qubit gates through multi-qubit gates to expose cancellations
Reduction	`RemoveBarriers`	Strips barrier instructions from the circuit
Routing	`CXMappingPass`	Places and routes a circuit onto an architecture, decomposing inserted SWAPs into CX gates
Routing	`DefaultMappingPass`	Convenience pass that picks placement and routing automatically
Routing	`RoutingPass`	Inserts SWAPs to satisfy architecture constraints without placement
Rebase	`RebaseTket`	Converts all gates to the TKET canonical set (TK1 + CX)
Rebase	`RebaseCustom`	Converts all gates to a user-specified gate set, given replacement circuits for CX and TK1
Rebase	`AutoRebase`	Automatically builds a rebase pass for a given target gate set (replaces the removed `auto_rebase_pass`)
Verification	`GateSetPredicate`	Predicate (from `pytket.predicates`, not a pass) that checks all gates belong to an allowed set
Verification	`ConnectivityPredicate`	Predicate that checks all two-qubit gates act on connected nodes of a given architecture

Each pass exposes an apply(circuit) method that mutates the circuit in place and returns a boolean indicating whether any changes were made.

The Pass System

Every compilation pass in pytket inherits from BasePass and exposes an apply method that mutates a Circuit in place. Passes are composed with SequencePass.

from pytket.passes import (
    SequencePass,
    FullPeepholeOptimise,
    RebaseCustom,
    CommuteThroughMultis,
    RemoveRedundancies,
    SynthesiseTket,
)
from pytket.circuit import Circuit

# A simple 4-qubit circuit to optimize
circ = Circuit(4)
circ.H(0).CX(0, 1).CX(1, 2).CX(2, 3)
circ.Rz(0.5, 0).Rz(0.5, 0)   # adjacent rotations that can merge into Rz(1.0)
circ.CX(0, 1).H(0)

print("Before:", circ.n_gates, "gates, depth", circ.depth())

pass_sequence = SequencePass([
    CommuteThroughMultis(),
    RemoveRedundancies(),
    SynthesiseTket(),
])
pass_sequence.apply(circ)

print("After:", circ.n_gates, "gates, depth", circ.depth())

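CommuteThroughMultis commutes single-qubit gates through two-qubit gates when the commutativity rules allow it, exposing cancellation opportunities. RemoveRedundancies removes gate-inverse pairs and identity rotations and merges adjacent rotations. SynthesiseTket converts the result to TK1 + CX and applies further local commutation, cancellation, and single-qubit squashing; it does not perform KAK resynthesis, which is the job of KAKDecomposition, PeepholeOptimise2Q, and FullPeepholeOptimise.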

How CommuteThroughMultis Works

CommuteThroughMultis exploits the fact that certain single-qubit gates commute with multi-qubit gates on specific qubits. The pass checks commutativity rules based on the gate’s eigenbasis and the structure of the multi-qubit gate.

The core rules for CX gates are:

Control qubit (qubit 0 of CX): CX is diagonal in the Z basis on its control qubit, so any gate that is also diagonal in the Z basis commutes with CX on the control. This includes Rz, T, Tdg, S, Sdg, and Z gates. Intuitively, CX applies an X to the target only when the control is |1>, and Z-basis rotations do not change the populations of |0> and |1>.
Target qubit (qubit 1 of CX): The target qubit undergoes a conditional X. Gates that commute with X (that is, X-basis diagonal gates like Rx and X itself) commute through CX on the target.
Non-commuting example: An Rx gate on the control qubit does not commute through CX, because Rx changes the Z-basis populations and alters the conditional behavior of the control.

When a single-qubit gate commutes through a CX, it can slide past it to potentially cancel with another gate on the other side.
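These rules can be checked directly on 4x4 matrices. The following NumPy sketch is independent of pytket (angles in radians, control as the first tensor factor) and tests whether a rotation on one qubit commutes with CX:

```python
import numpy as np

I2 = np.eye(2)
# Control is the first tensor factor, target the second
CX = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

def rz(theta):
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])

def rx(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def commutes(a, b):
    return bool(np.allclose(a @ b, b @ a))

theta = 0.37  # any angle that is not a multiple of 2*pi
rz_on_control = commutes(np.kron(rz(theta), I2), CX)
rx_on_target = commutes(np.kron(I2, rx(theta)), CX)
rx_on_control = commutes(np.kron(rx(theta), I2), CX)
print(rz_on_control, rx_on_target, rx_on_control)  # True True False
```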

from pytket.circuit import Circuit
from pytket.passes import CommuteThroughMultis

# Before: Rz on qubit 0, then CX(0,1)
circ = Circuit(2)
circ.Rz(0.5, 0)
circ.CX(0, 1)

print("Before commutation:")
for cmd in circ.get_commands():
    print(f"  {cmd}")

# Apply CommuteThroughMultis
CommuteThroughMultis().apply(circ)

print("\nAfter commutation:")
for cmd in circ.get_commands():
    print(f"  {cmd}")

# Rz is diagonal in the Z basis, so it commutes with the CX on the control
# qubit. The pass is free to move it across the CX; compare the two printouts
# to see where it ends up. The unitary is the same either way.

In a larger circuit, this commutation can push an Rz past a CX to merge with another Rz on the same qubit, and RemoveRedundancies then combines them into a single rotation. This is why running CommuteThroughMultis before RemoveRedundancies is more effective than running RemoveRedundancies alone.

from pytket.circuit import Circuit
from pytket.passes import CommuteThroughMultis, RemoveRedundancies

# Rz(0.3) -- CX -- Rz(0.7) on the same qubit
# Without commutation, RemoveRedundancies cannot merge the two Rz gates.
circ = Circuit(2)
circ.Rz(0.3, 0)
circ.CX(0, 1)
circ.Rz(0.7, 0)

print("Initial gate count:", circ.n_gates)

# Just RemoveRedundancies alone cannot help here
circ_copy = circ.copy()
RemoveRedundancies().apply(circ_copy)
print("After RemoveRedundancies only:", circ_copy.n_gates)

# But CommuteThroughMultis + RemoveRedundancies can merge the Rz gates
CommuteThroughMultis().apply(circ)
RemoveRedundancies().apply(circ)
print("After commute + remove:", circ.n_gates)

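KAK Decomposition and Two-Qubit Resynthesis

KAKDecomposition (also used inside PeepholeOptimise2Q and FullPeepholeOptimise) resynthesizes two-qubit blocks into at most 3 CX gates plus single-qubit rotations. It does this using the KAK (Khaneja-Glaser) decomposition, a fundamental result from Lie group theory applied to SU(4). SynthesiseTket, despite its name, does not perform this step: it converts to TK1 + CX and applies local commutation and cancellation rules.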

The KAK theorem states that any element of SU(4) can be written as:

U = (A1 ⊗ A2) · exp(i(c_x XX + c_y YY + c_z ZZ)) · (A3 ⊗ A4)

where A1, A2, A3, A4 are single-qubit unitaries and (c_x, c_y, c_z) are the Cartan coordinates (also called interaction coefficients). These three real numbers completely characterize the entangling power of the two-qubit gate.

Key examples of Cartan coordinates:

Identity: c_x = c_y = c_z = 0 (no entanglement, 0 CX gates needed)
CNOT: c_x = pi/4, c_y = c_z = 0 (1 CX gate needed)
iSWAP: c_x = c_y = pi/4, c_z = 0 (2 CX gates needed)
SWAP: c_x = c_y = c_z = pi/4 (3 CX gates needed)
Generic unitary: up to 3 CX gates needed

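The number of CX gates required is determined by the Cartan coordinates, taken in canonical order (pi/4 >= c_x >= c_y >= |c_z|). Zero CX gates are needed only when all three coordinates vanish. One CX suffices only when the gate is locally equivalent to CNOT, i.e. (c_x, c_y, c_z) = (pi/4, 0, 0); a gate with only c_x nonzero but c_x < pi/4 (a partial XX rotation) already needs two. Two CX gates suffice whenever c_z = 0, and three are required otherwise.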
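The counting rule fits in a small lookup. This is a sketch that assumes the coordinates have already been brought into canonical order (computing them from a matrix is a separate step not shown here), and `min_cx_count` is a hypothetical helper, not a pytket function:

```python
import math

QUARTER_PI = math.pi / 4

def min_cx_count(cx: float, cy: float, cz: float, tol: float = 1e-9) -> int:
    """Minimum CX count for canonical Cartan coordinates (hypothetical helper).

    Assumes pi/4 >= cx >= cy >= |cz|.
    """
    if max(abs(cx), abs(cy), abs(cz)) < tol:
        return 0  # local gate: no entanglement
    if abs(cx - QUARTER_PI) < tol and abs(cy) < tol and abs(cz) < tol:
        return 1  # locally equivalent to CNOT
    if abs(cz) < tol:
        return 2  # any gate with c_z = 0
    return 3      # generic case

examples = {
    "identity": (0.0, 0.0, 0.0),
    "CNOT": (QUARTER_PI, 0.0, 0.0),
    "partial XX": (0.2, 0.0, 0.0),
    "iSWAP": (QUARTER_PI, QUARTER_PI, 0.0),
    "SWAP": (QUARTER_PI, QUARTER_PI, QUARTER_PI),
}
counts = {name: min_cx_count(*coords) for name, coords in examples.items()}
print(counts)
# {'identity': 0, 'CNOT': 1, 'partial XX': 2, 'iSWAP': 2, 'SWAP': 3}
```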

from pytket.circuit import Circuit, Unitary2qBox, OpType
from pytket.passes import DecomposeBoxes, KAKDecomposition, SynthesiseTket
import numpy as np

# Create a random 2-qubit unitary
rng = np.random.default_rng(42)
# Generate a random unitary using QR decomposition of a random complex matrix
random_matrix = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
q, r = np.linalg.qr(random_matrix)
# Rescale each column by the phase of R's diagonal so that Q is Haar-random
# (this does not force det(Q) = 1)
d = np.diag(r)
ph = d / np.abs(d)
q = q @ np.diag(ph)

# Wrap it as a Unitary2qBox
ubox = Unitary2qBox(q)
circ = Circuit(2)
circ.add_unitary2qbox(ubox, 0, 1)

# Expand the box into gates, squash the two-qubit block with KAK,
# then normalize the result to TK1 + CX
DecomposeBoxes().apply(circ)
KAKDecomposition().apply(circ)
SynthesiseTket().apply(circ)

# Count CX gates
cx_count = sum(
    1 for cmd in circ.get_commands() if cmd.op.type == OpType.CX
)
print(f"CX gate count after KAK decomposition: {cx_count}")
assert cx_count <= 3, "KAK guarantees at most 3 CX gates"

# The circuit now contains only TK1 (single-qubit) and CX (two-qubit) gates
gate_types = set(cmd.op.type for cmd in circ.get_commands())
print(f"Gate types present: {gate_types}")

This is significant for compilation because it means any two-qubit interaction can be expressed with a bounded number of entangling gates. When KAKDecomposition scans a circuit, it identifies two-qubit subcircuits (runs of gates acting on the same pair of qubits), computes their net unitary, and re-synthesizes each one with the minimal number of CX gates when that is cheaper than the original.

Clifford Simplification with CliffordSimp

Clifford gates form a special subgroup of quantum gates that includes H, S, CX, X, Y, and Z. These gates have a remarkable property: they map Pauli operators to Pauli operators under conjugation. This means that a circuit consisting entirely of Clifford gates can be represented compactly using a tableau (a binary matrix tracking how Paulis transform), and composed or simplified in polynomial time.
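The Pauli-to-Pauli property is easy to check numerically. This NumPy sketch conjugates Paulis by H, S, and CX (control as the first tensor factor) and also confirms the Clifford identity H S H = SX:

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1, -1]).astype(complex)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
S = np.diag([1, 1j])
SX = 0.5 * np.array([[1 + 1j, 1 - 1j], [1 - 1j, 1 + 1j]])  # square root of X
CX = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])  # control = first factor

def conjugate(u, p):
    """Return U P U^dagger, the image of Pauli P under the gate U."""
    return u @ p @ u.conj().T

checks = {
    "H X H^dag = Z": np.allclose(conjugate(H, X), Z),
    "S X S^dag = Y": np.allclose(conjugate(S, X), Y),
    "CX (X x I) CX = X x X": np.allclose(conjugate(CX, np.kron(X, I2)), np.kron(X, X)),
    "CX (I x Z) CX = Z x Z": np.allclose(conjugate(CX, np.kron(I2, Z)), np.kron(Z, Z)),
    "H S H = SX": np.allclose(H @ S @ H, SX),
}
for name, ok in checks.items():
    print(f"{name:25s} {ok}")
```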

CliffordSimp applies a set of Clifford rewrite rules to the circuit, simplifying sequences of Clifford gates and, in particular, reducing the number of two-qubit gates. This is especially effective for circuits that contain many Clifford gates interspersed with a few non-Clifford rotations (like T gates), which is common in fault-tolerant circuit constructions.

from pytket.circuit import Circuit
from pytket.passes import CliffordSimp

# Build a circuit with many redundant Clifford gates
circ = Circuit(3)

# Layer of Hadamards and CX gates
circ.H(0).H(1).H(2)
circ.CX(0, 1).CX(1, 2)
circ.S(0).S(1).S(2)
circ.CX(0, 1).CX(1, 2)
circ.H(0).H(1).H(2)

# More Clifford operations that partially cancel
circ.X(0).Z(1).Y(2)
circ.CX(0, 1).CX(1, 0).CX(0, 1)  # This is a SWAP
circ.H(0).S(0).H(0)  # HSH = SX (the square root of X)
circ.CX(1, 2).CX(1, 2)  # Two identical CX gates cancel

gates_before = circ.n_gates
two_qb_before = circ.n_2qb_gates()
print(f"Before CliffordSimp: {gates_before} gates, {two_qb_before} 2-qubit gates")

CliffordSimp().apply(circ)

gates_after = circ.n_gates
two_qb_after = circ.n_2qb_gates()
print(f"After CliffordSimp:  {gates_after} gates, {two_qb_after} 2-qubit gates")
print(f"Reduction: {gates_before - gates_after} gates removed")

CliffordSimp is particularly useful as a cleanup pass after routing, because SWAP insertion introduces sequences of three CX gates (which are all Clifford) that can sometimes be simplified in context.

Rebasing to Hardware Native Gates

Each hardware vendor supports a different native gate set. TKET provides RebaseCustom to rewrite any circuit into an arbitrary set of single-qubit and two-qubit primitives.

IBM Native Gates

For many IBM devices the native set is {Rz, SX, X, CX} (some newer IBM processors use ECR or CZ as the entangling gate instead). The single-qubit decomposition converts the universal TK1(a, b, c) gate into a sequence of Rz and SX rotations. In pytket, TK1(a, b, c) = Rz(a) Rx(b) Rz(c) in matrix-multiplication order, so Rz(c) is applied first, and all angles are in half-turns. Up to a global phase, TK1(a, b, c) = Rz(a + 0.5) SX Rz(b + 1) SX Rz(c + 0.5), which uses 3 Rz and 2 SX gates.
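Under pytket's conventions (TK1(a, b, c) = Rz(a) Rx(b) Rz(c) in matrix order, angles in half-turns), the Rz/SX form Rz(a + 0.5) SX Rz(b + 1) SX Rz(c + 0.5) can be compared against TK1 numerically for random angles. This NumPy sketch checks equality up to global phase:

```python
import numpy as np

# pytket-style gates, angles in half-turns (t means t*pi radians)
def rz(t):
    return np.diag([np.exp(-0.5j * np.pi * t), np.exp(0.5j * np.pi * t)])

def rx(t):
    c, s = np.cos(np.pi * t / 2), np.sin(np.pi * t / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

SX = 0.5 * np.array([[1 + 1j, 1 - 1j], [1 - 1j, 1 + 1j]])  # sqrt(X)

def tk1(a, b, c):
    # TK1(a, b, c) = Rz(a) Rx(b) Rz(c) in matrix-multiplication order
    return rz(a) @ rx(b) @ rz(c)

def tk1_via_rz_sx(a, b, c):
    # IBM-native candidate: Rz(a + 0.5) SX Rz(b + 1) SX Rz(c + 0.5)
    return rz(a + 0.5) @ SX @ rz(b + 1) @ SX @ rz(c + 0.5)

def equal_up_to_phase(u, v):
    # For unitaries, |tr(U^dagger V)| equals the dimension exactly when V = e^{i phi} U
    return bool(np.isclose(abs(np.trace(u.conj().T @ v)), u.shape[0]))

rng = np.random.default_rng(7)
checks = [equal_up_to_phase(tk1(*abc), tk1_via_rz_sx(*abc))
          for abc in rng.uniform(-2, 2, size=(20, 3))]
print(all(checks))  # True
```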

from pytket.passes import RebaseCustom
from pytket.circuit import Circuit, OpType

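def tk1_to_rzsx(a, b, c):
    """Decompose TK1(a,b,c) = Rz(a) Rx(b) Rz(c) into Rz and SX gates (IBM native).

    Angles are in half-turns; the identity holds up to global phase.
    Gates are added in circuit order, so the rightmost matrix factor comes first.
    """
    circ = Circuit(1)
    circ.Rz(c + 0.5, 0)
    circ.SX(0)
    circ.Rz(b + 1, 0)
    circ.SX(0)
    circ.Rz(a + 0.5, 0)
    return circ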

cx_circ = Circuit(2)
cx_circ.CX(0, 1)

ibm_rebase = RebaseCustom(
    {OpType.CX, OpType.Rz, OpType.SX, OpType.X},
    cx_circ,
    tk1_to_rzsx,
)

circ = Circuit(3)
circ.H(0).CX(0, 1).T(1).CX(1, 2).Tdg(2)

ibm_rebase.apply(circ)
print("IBM native gates:")
for cmd in circ.get_commands():
    print(f"  {cmd.op.type.name} on {cmd.qubits}")

IonQ Native Gates

IonQ trapped-ion hardware uses a fundamentally different native gate set. The single-qubit primitive is GPi (and its variant GPi2), which performs rotations on the equator of the Bloch sphere. The two-qubit primitive is the Molmer-Sorensen (MS) gate, a globally entangling operation that creates a maximally entangled state.

The GPi(phi) gate applies a pi-rotation around an axis in the XY plane at angle phi:

GPi(phi) = [[0, e^(-i*phi)], [e^(i*phi), 0]]

The GPi2(phi) gate is a pi/2 rotation around the same axis. Together with Z rotations, which trapped-ion systems typically apply virtually by shifting the phase of later pulses, GPi and GPi2 can produce any single-qubit rotation. Two GPi pulses alone already compose to a Z rotation (matrix order, angles in radians):

Rz(theta) = GPi(theta/2) · GPi(0)

The MS gate is the symmetric Molmer-Sorensen interaction: MS = exp(-i pi/4 XX), which is equivalent to a CX up to single-qubit corrections.
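Both claims can be checked with plain matrices. This NumPy sketch uses the GPi definition given above (radians) and MS = exp(-i pi/4 XX); the particular single-qubit corrections that turn MS into CX are one valid choice derived for this check, not IonQ's compiler output:

```python
import numpy as np

def gpi(phi):
    """GPi(phi) = [[0, e^(-i phi)], [e^(i phi), 0]], phi in radians."""
    return np.array([[0, np.exp(-1j * phi)], [np.exp(1j * phi), 0]])

def rz(theta):
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])

def rx(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def equal_up_to_phase(u, v):
    # For unitaries, |tr(U^dagger V)| equals the dimension exactly when V = e^{i phi} U
    return bool(np.isclose(abs(np.trace(u.conj().T @ v)), u.shape[0]))

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
MS = (np.eye(4) - 1j * np.kron(X, X)) / np.sqrt(2)  # exp(-i pi/4 XX), since (XX)^2 = I
CX = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])  # control = first factor

# 1) Two GPi pulses give a pure Z rotation: GPi(theta/2) . GPi(0) = Rz(theta)
theta = 0.8
z_from_gpi = bool(np.allclose(gpi(theta / 2) @ gpi(0), rz(theta)))

# 2) One MS plus single-qubit rotations gives CX (matrix order, up to phase):
#    CX ~ [Rz(pi/2) (x) Rx(pi/2)] [Ry(pi/2) (x) I] MS [Ry(-pi/2) (x) I]
cx_from_ms = (np.kron(rz(np.pi / 2), rx(np.pi / 2))
              @ np.kron(ry(np.pi / 2), I2)
              @ MS
              @ np.kron(ry(-np.pi / 2), I2))
print(z_from_gpi, equal_up_to_phase(cx_from_ms, CX))  # True True
```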

from pytket.circuit import Circuit, OpType
from pytket.passes import RebaseCustom

def tk1_to_ionq(a, b, c):
    """Decompose TK1(a,b,c) = Rz(a) Rx(b) Rz(c) into Rz and Ry gates for IonQ.

    Uses Rx(b) = Rz(-0.5) Ry(b) Rz(0.5) (matrix order, half-turns), so
    TK1(a,b,c) = Rz(a - 0.5) Ry(b) Rz(c + 0.5). Rz and Ry form an abstract
    target set here; the backend's own compilation translates them into
    GPi/GPi2 pulses.
    """
    circ = Circuit(1)
    circ.Rz(c + 0.5, 0)
    circ.Ry(b, 0)
    circ.Rz(a - 0.5, 0)
    return circ

# MS is locally equivalent to CX: the two differ only by single-qubit
# rotations. For RebaseCustom we keep CX as the two-qubit gate and leave
# the CX -> MS translation to the backend's own compilation.
cx_replacement = Circuit(2)
cx_replacement.CX(0, 1)

ionq_rebase = RebaseCustom(
    {OpType.CX, OpType.Rz, OpType.Ry},
    cx_replacement,
    tk1_to_ionq,
)

circ = Circuit(2)
circ.H(0).CX(0, 1).T(1)
ionq_rebase.apply(circ)

print("IonQ-compatible gates:")
for cmd in circ.get_commands():
    print(f"  {cmd.op.type.name}({cmd.op.params}) on {cmd.qubits}")

When targeting IonQ hardware through a maintained extension such as pytket-braket, the backend’s default compilation applies the appropriate rebase automatically. (An older pytket-ionq extension existed but is no longer maintained.) The manual RebaseCustom shown here is useful for profiling or when building custom pipelines that need to simulate the target gate set without connecting to the actual backend.

Quantinuum (H-Series) Native Gates

Quantinuum’s H-series trapped-ion processors use the native gate set {Rz, PhasedX, ZZPhase}. This is a particularly elegant basis because:

Rz(t) applies a Z rotation by angle t * pi (pytket angles are in half-turns). On trapped ions, Rz is a virtual gate (implemented by frame tracking) with effectively zero error.
PhasedX(a, b) applies a rotation by angle a * pi around an axis in the XY plane at azimuthal angle b * pi; as a matrix product, PhasedX(a, b) = Rz(b) Rx(a) Rz(-b). It generalizes both Rx and Ry: PhasedX(a, 0) = Rx(a) and PhasedX(a, 0.5) = Ry(a).
ZZPhase(t) applies exp(-i * t * pi/2 * ZZ), a symmetric two-qubit ZZ interaction. This is the native entangling operation on Quantinuum hardware.

The CNOT decomposition into ZZPhase plus single-qubit corrections, read in circuit order (left to right, angles in half-turns), is:

CX(0,1) = Ry(-0.5, 1) . ZZPhase(0.5, 0, 1) . Rz(-0.5, 0) . Rz(-0.5, 1) . Ry(0.5, 1)

(up to a global phase).
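A NumPy check of that identity under pytket's conventions (Rz, Ry, and ZZPhase angles in half-turns, ZZPhase(t) = exp(-i t pi/2 ZZ), qubit 0 as the first tensor factor). Because matrices compose right to left, the circuit-order sequence is multiplied in reverse:

```python
import numpy as np

# pytket conventions: angles in half-turns (t means t*pi radians)
def rz(t):
    return np.diag([np.exp(-0.5j * np.pi * t), np.exp(0.5j * np.pi * t)])

def ry(t):
    c, s = np.cos(np.pi * t / 2), np.sin(np.pi * t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def zzphase(t):
    # exp(-i t pi/2 Z(x)Z) is diagonal; Z(x)Z has eigenvalues +1, -1, -1, +1
    return np.diag(np.exp(-0.5j * np.pi * t * np.array([1, -1, -1, 1])))

def equal_up_to_phase(u, v):
    return bool(np.isclose(abs(np.trace(u.conj().T @ v)), u.shape[0]))

I2 = np.eye(2)
CX = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])  # control = qubit 0

# Circuit order: Ry(-0.5, 1), ZZPhase(0.5, 0, 1), Rz(-0.5, 0), Rz(-0.5, 1), Ry(0.5, 1)
# Matrix order is the reverse, so the last gate is the leftmost factor.
candidate = (np.kron(I2, ry(0.5))
             @ np.kron(rz(-0.5), rz(-0.5))
             @ zzphase(0.5)
             @ np.kron(I2, ry(-0.5)))
print(equal_up_to_phase(candidate, CX))  # True
```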

from pytket.circuit import Circuit, OpType
from pytket.passes import RebaseCustom

def tk1_to_phasedx(a, b, c):
    """Decompose TK1(a,b,c) into Rz and PhasedX gates (Quantinuum native).

    TK1(a,b,c) = Rz(a) Rx(b) Rz(c)   (matrix order, half-turns)
    Rx(b) = PhasedX(b, 0)
    So TK1(a,b,c) = Rz(a) PhasedX(b, 0) Rz(c)
    """
    circ = Circuit(1)
    circ.Rz(c, 0)
    circ.add_gate(OpType.PhasedX, [b, 0], [0])
    circ.Rz(a, 0)
    return circ

# CX decomposition into ZZPhase + single-qubit corrections (circuit order)
cx_replacement = Circuit(2)
cx_replacement.add_gate(OpType.PhasedX, [-0.5, 0.5], [1])  # Ry(-0.5) on target
cx_replacement.add_gate(OpType.ZZPhase, [0.5], [0, 1])
cx_replacement.Rz(-0.5, 0)
cx_replacement.Rz(-0.5, 1)
cx_replacement.add_gate(OpType.PhasedX, [0.5, 0.5], [1])  # Ry(0.5) on target

quantinuum_rebase = RebaseCustom(
    {OpType.ZZPhase, OpType.Rz, OpType.PhasedX},
    cx_replacement,
    tk1_to_phasedx,
)

circ = Circuit(2)
circ.H(0).CX(0, 1).Rz(0.3, 1)
quantinuum_rebase.apply(circ)

print("Quantinuum-native gates:")
for cmd in circ.get_commands():
    print(f"  {cmd.op.type.name}({cmd.op.params}) on {cmd.qubits}")

Because Rz is a virtual gate on trapped-ion hardware (zero duration, zero error), the Quantinuum rebase produces circuits where only PhasedX and ZZPhase contribute to the actual execution time and error budget. Optimizing a pipeline for Quantinuum means minimizing PhasedX and ZZPhase counts specifically.

Conditional Pass Application

RepeatWithMetricPass

RepeatWithMetricPass runs a pass repeatedly until a metric function stops improving. This is useful when a single pass application does not converge in one shot.

from pytket.passes import RepeatWithMetricPass, RemoveRedundancies
from pytket.circuit import Circuit

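circ = Circuit(2)
# Every gate here cancels: the Rz pair merges to Rz(0), which is removed,
# after which the two CX gates are adjacent and cancel as well
circ.CX(0, 1).Rz(0.25, 1).Rz(-0.25, 1).CX(0, 1)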

# Repeat RemoveRedundancies until it stops changing the gate count
def gate_count_metric(c):
    return c.n_gates

repeat_pass = RepeatWithMetricPass(RemoveRedundancies(), gate_count_metric)
repeat_pass.apply(circ)
print("Gates after repeated removal:", circ.n_gates)
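The stopping rule is easy to mimic outside pytket. The following pure-Python sketch assumes the semantics described above (stop as soon as the metric fails to decrease, keeping the last improving result); `repeat_with_metric` and `cancel_one_pair` are hypothetical helpers that operate on a list of gate names on a single wire, not pytket APIs:

```python
from typing import Callable, List

def repeat_with_metric(step: Callable[[List[str]], List[str]],
                       metric: Callable[[List[str]], int],
                       state: List[str]) -> List[str]:
    """Apply `step` while `metric` strictly decreases (hypothetical helper)."""
    best = metric(state)
    while True:
        candidate = step(state)
        score = metric(candidate)
        if score >= best:
            return state  # no improvement: stop and keep the last improving state
        state, best = candidate, score

def cancel_one_pair(gates: List[str]) -> List[str]:
    """Remove the first adjacent pair of identical self-inverse gates."""
    self_inverse = {"H", "X", "Z"}
    for i in range(len(gates) - 1):
        if gates[i] == gates[i + 1] and gates[i] in self_inverse:
            return gates[:i] + gates[i + 2:]
    return gates

# One pass removes X X; only then do the two H gates become adjacent
result = repeat_with_metric(cancel_one_pair, len, ["H", "X", "X", "H", "S"])
print(result)  # ['S']
```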

Predicate-Based Conditional Passes

pytket provides predicates that test whether a circuit satisfies certain conditions. You can use GateSetPredicate to check if a circuit already uses the target gate set, and skip the rebase if it does.

from pytket.circuit import Circuit, OpType
from pytket.predicates import GateSetPredicate

# Define what we consider "already rebased to IBM native"
ibm_predicate = GateSetPredicate({
    OpType.CX, OpType.Rz, OpType.SX, OpType.X, OpType.Measure,
})

# A circuit already in the IBM native set
circ_native = Circuit(2)
circ_native.Rz(0.3, 0).SX(0).CX(0, 1)

# A circuit NOT in the IBM native set
circ_foreign = Circuit(2)
circ_foreign.H(0).T(1).CX(0, 1)

print("Native circuit satisfies predicate:", ibm_predicate.verify(circ_native))
print("Foreign circuit satisfies predicate:", ibm_predicate.verify(circ_foreign))

# Only rebase if needed
if not ibm_predicate.verify(circ_foreign):
    ibm_rebase.apply(circ_foreign)
    print("Rebased. Now satisfies predicate:", ibm_predicate.verify(circ_foreign))

This pattern is especially useful in production pipelines where you do not control the input circuit format. Skipping unnecessary passes saves compilation time and avoids introducing redundant gates from an identity rebase.

Placement Strategies

Before routing, TKET must decide which logical qubit maps to which physical qubit. This is the placement step, and the quality of the initial placement directly affects how many SWAP gates routing needs to insert.

TKET provides three main placement strategies:

LinePlacement finds the longest chain of interacting qubits in the circuit and maps them onto a contiguous line of physical qubits. This works well when the circuit has a mostly linear interaction pattern.
GraphPlacement searches for subgraph monomorphisms from the circuit's (weighted) interaction graph into the hardware's connectivity graph and picks the best-scoring match it finds. This is the most general strategy and works well for arbitrary circuits.
NoiseAwarePlacement extends GraphPlacement by incorporating device calibration data. It prefers physical qubits and links with lower error rates. It takes dictionaries of per-node, per-link, and readout errors, which are typically taken from a backend's device characterisation.

from pytket.architecture import Architecture
from pytket.placement import LinePlacement, GraphPlacement
from pytket.circuit import Circuit

# A T-shaped architecture:
#   0 - 1 - 2
#       |
#       3
#       |
#       4
arch = Architecture([(0, 1), (1, 2), (1, 3), (3, 4)])

# LinePlacement finds a line through the architecture
line_place = LinePlacement(arch)

# GraphPlacement uses subgraph matching
graph_place = GraphPlacement(arch)

# Build a circuit with a specific interaction pattern
circ = Circuit(4)
circ.CX(0, 1).CX(1, 2).CX(2, 3)
circ.CX(0, 2)  # This non-local interaction forces at least one SWAP

# Compare the logical-to-physical maps each strategy proposes.
# get_placement_map returns the map; place() would instead relabel the circuit in place.
line_map = line_place.get_placement_map(circ)
graph_map = graph_place.get_placement_map(circ)

print("Line placement mapping:")
print(f"  {line_map}")
print("Graph placement mapping:")
print(f"  {graph_map}")

Qubit placement matters because a good initial mapping can place frequently interacting qubits on adjacent physical qubits, reducing the number of SWAPs the router needs to insert. Since each SWAP costs three CX gates, a poor placement on a large device can add substantially to the two-qubit gate count after routing.
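One crude way to compare placements before routing is to add up how far apart interacting qubits end up. The sketch below uses a hypothetical cost model, not what TKET's placers optimize: for each interacting pair it charges (distance - 1), a rough stand-in for the SWAPs needed, on the T-shaped architecture and 4-qubit interaction pattern from the example above.

```python
from collections import deque

def hop_distances(edges, n_nodes):
    """All-pairs hop distances on an undirected graph, via BFS from each node."""
    adj = {n: [] for n in range(n_nodes)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = {}
    for src in range(n_nodes):
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for dst, d in seen.items():
            dist[(src, dst)] = d
    return dist

def placement_cost(interactions, mapping, dist):
    """Hypothetical cost: sum of (distance - 1) over interacting logical pairs."""
    return sum(dist[(mapping[a], mapping[b])] - 1 for a, b in interactions)

# T-shaped architecture: 0-1-2 with 1-3-4 hanging off node 1
t_shape = hop_distances([(0, 1), (1, 2), (1, 3), (3, 4)], n_nodes=5)
# Logical interactions from CX(0,1), CX(1,2), CX(2,3), CX(0,2)
interactions = [(0, 1), (1, 2), (2, 3), (0, 2)]

good = {0: 0, 1: 1, 2: 3, 3: 4}  # logical qubit -> physical node
bad = {0: 2, 1: 4, 2: 0, 3: 3}
good_cost = placement_cost(interactions, good, t_shape)
bad_cost = placement_cost(interactions, bad, t_shape)
print("good placement cost:", good_cost, " bad placement cost:", bad_cost)
```

The best achievable cost here is 1: logical qubits 0, 1, and 2 form a triangle, which cannot be embedded in a tree-shaped architecture, consistent with the comment that at least one SWAP is required.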

Routing and SWAP Overhead Across Topologies

Once a circuit is rebased to native gates, it still needs to be routed onto the hardware connectivity graph. Different hardware architectures impose different connectivity constraints, and the topology has a dramatic effect on SWAP overhead.

CXMappingPass

CXMappingPass simultaneously places logical qubits and inserts SWAP gates to satisfy connectivity. Each SWAP decomposes into 3 CX gates, so minimizing SWAPs is critical for circuit fidelity.
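The 3-CX cost of a SWAP can be confirmed with explicit matrices (NumPy sketch; qubit 0 is the first tensor factor):

```python
import numpy as np

# Basis order |q0 q1>, with q0 as the first tensor factor
CX_01 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])  # control q0
CX_10 = np.array([[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]])  # control q1
SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]])

# CX(0,1) . CX(1,0) . CX(0,1) exchanges the two qubits exactly
swap_from_cx = CX_01 @ CX_10 @ CX_01
print(np.array_equal(swap_from_cx, SWAP))  # True
```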

from pytket.passes import CXMappingPass, SynthesiseTket
from pytket.architecture import Architecture
from pytket.placement import GraphPlacement
from pytket.circuit import Circuit

# Define a linear connectivity: 0-1-2-3-4
arch = Architecture([(0, 1), (1, 2), (2, 3), (3, 4)])
placement = GraphPlacement(arch)

mapping_pass = CXMappingPass(
    arch,
    placement,
    directed_cx=True,
    delay_measures=True,
)

def make_circuit():
    c = Circuit(5)
    for i in range(4):
        c.CX(i, i + 1)
    for i in range(5):
        c.Rz(0.3, i).H(i)
    for i in range(4):
        c.CX(i + 1, i)
    for i in range(5):
        c.Rz(0.7, i)
    return c

def profile(label, c):
    print(f"{label:35s}  gates={c.n_gates:4d}  depth={c.depth():4d}  "
          f"2q_gates={c.n_2qb_gates():4d}")

circ_to_route = make_circuit()
SynthesiseTket().apply(circ_to_route)
mapping_pass.apply(circ_to_route)

profile("After routing", circ_to_route)

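directed_cx=True makes every CX in the output respect the direction of the architecture's edges, which can cost extra single-qubit gates wherever a CX must be reversed; set it only when the hardware's two-qubit gate is genuinely directional. delay_measures=True commutes measurements towards the end of the circuit where possible, so the router does not have to work around mid-circuit measurements.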

Comparing Topologies

The following example builds a fully-connected circuit on 5 qubits (every qubit interacts with every other) and routes it onto three different topologies to compare SWAP overhead.

from pytket.circuit import Circuit, OpType
from pytket.architecture import Architecture
from pytket.placement import GraphPlacement
from pytket.passes import CXMappingPass, SynthesiseTket, RemoveRedundancies
import copy

def make_fully_connected_circuit(n_qubits=5):
    """Create a circuit where every pair of qubits interacts."""
    circ = Circuit(n_qubits)
    for i in range(n_qubits):
        for j in range(i + 1, n_qubits):
            circ.CX(i, j)
            circ.Rz(0.1 * (i + j), j)
    return circ

# Three topologies for 5 qubits
# Linear chain: 0-1-2-3-4
linear = Architecture([(i, i + 1) for i in range(4)])

# Star: qubit 2 is the hub
star = Architecture([(2, 0), (2, 1), (2, 3), (2, 4)])

# Grid (2x3 with 5 qubits used):
# 0 - 1 - 2
# |   |
# 3 - 4
grid = Architecture([(0, 1), (1, 2), (0, 3), (1, 4), (3, 4)])

topologies = [
    ("Linear chain", linear),
    ("Star", star),
    ("Grid (2x3)", grid),
]

base_circ = make_fully_connected_circuit()
SynthesiseTket().apply(base_circ)
print(f"{'Topology':20s}  {'2Q gates':>10s}  {'Total gates':>12s}  {'Depth':>6s}")
print("-" * 55)

for name, arch in topologies:
    circ = copy.deepcopy(base_circ)
    placement = GraphPlacement(arch)
    routing = CXMappingPass(arch, placement, directed_cx=False)
    routing.apply(circ)
    RemoveRedundancies().apply(circ)
    print(f"{name:20s}  {circ.n_2qb_gates():10d}  {circ.n_gates:12d}  {circ.depth():6d}")

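Linear chains typically produce the most SWAP overhead because distant qubits must communicate through a chain of intermediaries. The grid topology provides shorter paths between most qubit pairs, and the star topology helps most when one qubit interacts with many others; with an all-to-all pattern like this one, every pair of non-hub qubits still has to route through the hub. Exact numbers depend on the placement and routing heuristics.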
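A rough, router-independent way to anticipate these differences is to compare graph distances. This pure-Python sketch computes, for each topology above, the diameter and the total extra distance (distance - 1) summed over all qubit pairs, which is what an all-to-all interaction pattern has to overcome; actual SWAP counts also depend on how the router reuses qubits it has already moved.

```python
from collections import deque
from itertools import combinations

def distances_from(adj, src):
    """Breadth-first search: hop distance from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def topology_stats(edges):
    """Return (diameter, total extra distance over all node pairs)."""
    nodes = sorted({n for edge in edges for n in edge})
    adj = {n: [] for n in nodes}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    all_dist = {n: distances_from(adj, n) for n in nodes}
    pair_dists = [all_dist[a][b] for a, b in combinations(nodes, 2)]
    return max(pair_dists), sum(d - 1 for d in pair_dists)

topologies = {
    "Linear chain": [(i, i + 1) for i in range(4)],
    "Star": [(2, 0), (2, 1), (2, 3), (2, 4)],
    "Grid (2x3)": [(0, 1), (1, 2), (0, 3), (1, 4), (3, 4)],
}
stats = {name: topology_stats(edges) for name, edges in topologies.items()}
for name, (diameter, extra) in stats.items():
    print(f"{name:15s} diameter={diameter}  total extra distance={extra}")
```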

Custom Pass from Scratch Using BasePass

TKET allows you to define custom passes that compose with built-in passes using SequencePass. The simplest way to create a custom pass is with CustomPass, which wraps a function that transforms a circuit.

from pytket.passes import CustomPass, SequencePass, RemoveRedundancies
from pytket.circuit import Circuit, OpType

def cx_counter(circ):
    """A custom pass that logs CX gate statistics."""
    cx_count = sum(
        1 for cmd in circ.get_commands() if cmd.op.type == OpType.CX
    )
    total = circ.n_gates
    print(f"  [CX Counter] {cx_count} CX gates out of {total} total "
          f"({100 * cx_count / max(total, 1):.1f}%)")
    return circ

counter_pass = CustomPass(cx_counter)

# Use it in a pipeline alongside built-in passes
circ = Circuit(3)
circ.H(0).CX(0, 1).CX(1, 2).CX(2, 1).CX(1, 0)
circ.Rz(0.5, 0).Rz(-0.5, 0)  # cancels to identity

print("Before optimization:")
counter_pass.apply(circ)

pipeline = SequencePass([
    RemoveRedundancies(),
    counter_pass,
])

print("After optimization:")
pipeline.apply(circ)

For more complex custom passes that need to track state across invocations, you can use a closure or a class:

from pytket.passes import CustomPass, SequencePass, CommuteThroughMultis
from pytket.circuit import Circuit, OpType

class PassProfiler:
    """Records gate counts at each invocation, for later analysis."""

    def __init__(self, label):
        self.label = label
        self.history = []

    def __call__(self, circ):
        record = {
            "label": self.label,
            "n_gates": circ.n_gates,
            "depth": circ.depth(),
            "n_2qb": circ.n_2qb_gates(),
        }
        self.history.append(record)
        return circ

    def as_pass(self):
        return CustomPass(self)

# Create profilers for each stage
prof_before = PassProfiler("before")
prof_after = PassProfiler("after_commute")

pipeline = SequencePass([
    prof_before.as_pass(),
    CommuteThroughMultis(),
    prof_after.as_pass(),
])

circ = Circuit(4)
for i in range(3):
    circ.CX(i, i + 1)
circ.Rz(0.3, 0).Rz(0.7, 1)
for i in range(3):
    circ.CX(i, i + 1)

pipeline.apply(circ)

print("Profile results:")
for record in prof_before.history + prof_after.history:
    print(f"  {record['label']:20s}  gates={record['n_gates']}  "
          f"depth={record['depth']}  2qb={record['n_2qb']}")

Measuring Pass Effectiveness with Metrics

When building a compilation pipeline, you need to know which passes actually contribute to gate count and depth reduction. The following profiler applies the passes cumulatively to one copy of the circuit, recording metrics after each stage, and runs FullPeepholeOptimise on a separate copy as a baseline.

from pytket.circuit import Circuit, OpType
from pytket.passes import (
    CommuteThroughMultis,
    RemoveRedundancies,
    SynthesiseTket,
    FullPeepholeOptimise,
    CliffordSimp,
    PeepholeOptimise2Q,
)
import copy

def profile(label, c):
    return {
        "label": label,
        "gates": c.n_gates,
        "depth": c.depth(),
        "2qb": c.n_2qb_gates(),
    }

def build_random_circuit(n_qubits=10, seed=42):
    """Build a realistic test circuit with mixed gate types."""
    import random
    random.seed(seed)
    circ = Circuit(n_qubits)
    for _ in range(60):
        gate_type = random.choice(["cx", "h", "rz", "t", "s", "cx"])
        if gate_type == "cx":
            q1, q2 = random.sample(range(n_qubits), 2)
            circ.CX(q1, q2)
        elif gate_type == "h":
            circ.H(random.randint(0, n_qubits - 1))
        elif gate_type == "rz":
            circ.Rz(random.uniform(0, 2), random.randint(0, n_qubits - 1))
        elif gate_type == "t":
            circ.T(random.randint(0, n_qubits - 1))
        elif gate_type == "s":
            circ.S(random.randint(0, n_qubits - 1))
    return circ

# Build the test circuit
base_circ = build_random_circuit(n_qubits=10)

# Define the passes to benchmark
passes = [
    ("CommuteThroughMultis", CommuteThroughMultis()),
    ("RemoveRedundancies", RemoveRedundancies()),
    ("SynthesiseTket", SynthesiseTket()),
    ("CliffordSimp", CliffordSimp()),
    ("PeepholeOptimise2Q", PeepholeOptimise2Q()),
]

# Apply passes cumulatively and record metrics at each stage
results = []
circ = copy.deepcopy(base_circ)
results.append(profile("Initial", circ))

for label, p in passes:
    p.apply(circ)
    results.append(profile(f"After {label}", circ))

# Also benchmark FullPeepholeOptimise as a baseline
circ_full = copy.deepcopy(base_circ)
FullPeepholeOptimise().apply(circ_full)
results.append(profile("FullPeepholeOptimise", circ_full))

# Print results table
print(f"{'Stage':35s}  {'Gates':>6s}  {'Depth':>6s}  {'2Q Gates':>8s}")
print("-" * 60)
for r in results:
    print(f"{r['label']:35s}  {r['gates']:6d}  {r['depth']:6d}  {r['2qb']:8d}")

# Compute per-pass contribution
print("\nPer-pass gate reduction:")
for i in range(1, len(results) - 1):  # exclude FullPeepholeOptimise row
    prev = results[i - 1]
    curr = results[i]
    delta = prev["gates"] - curr["gates"]
    pct = 100 * delta / max(prev["gates"], 1)
    print(f"  {curr['label']:35s}  {delta:+4d} gates  "
          f"({pct:.1f}% of {prev['gates']})")

This profiling approach helps you identify which passes to keep and which add compilation time without meaningful improvement for your specific circuit family. For example, CliffordSimp is very effective on circuits from fault-tolerant synthesis (which are Clifford-heavy) but may do nothing on variational circuits that are dominated by parameterized rotations.

Profiling Each Stage

To understand where optimization wins come from in a specific pipeline, apply passes individually and record metrics at each step.

from pytket.passes import (
    CommuteThroughMultis,
    RemoveRedundancies,
    SynthesiseTket,
    FullPeepholeOptimise,
)
from pytket.circuit import Circuit
import copy

def profile(label, c):
    print(f"{label:35s}  gates={c.n_gates:4d}  depth={c.depth():4d}  "
          f"2q_gates={c.n_2qb_gates():4d}")

# Build a moderately complex circuit
def make_circuit():
    c = Circuit(5)
    for i in range(4):
        c.CX(i, i + 1)
    for i in range(5):
        c.Rz(0.3, i).H(i)
    for i in range(4):
        c.CX(i + 1, i)
    for i in range(5):
        c.Rz(0.7, i)
    return c

stages = [
    ("CommuteThroughMultis", CommuteThroughMultis()),
    ("RemoveRedundancies", RemoveRedundancies()),
    ("SynthesiseTket", SynthesiseTket()),
]

circ = make_circuit()
profile("Initial", circ)

for label, p in stages:
    p.apply(circ)
    profile(f"After {label}", circ)

# Compare against the all-in-one shortcut
circ_full = make_circuit()
FullPeepholeOptimise().apply(circ_full)
profile("FullPeepholeOptimise", circ_full)

This kind of profiling reveals which passes contribute the most reduction for your specific circuit structure. For circuits with many commuting single-qubit gates, CommuteThroughMultis followed by RemoveRedundancies usually does most of the work; for Clifford-heavy circuits, adding CliffordSimp to the stage list is worth measuring.

Assembling a Full Pipeline

A production pipeline typically follows this order: synthesize, optimize, rebase, route, clean up.

The ordering matters. Synthesis and optimization should happen first because they reduce the gate count and simplify the circuit structure before routing. Rebasing converts to the target gate set so that routing inserts SWAPs in the correct basis. The final cleanup catches cancellations introduced by SWAP decomposition.

from pytket.passes import (
    SequencePass, SynthesiseTket, RemoveRedundancies,
    CommuteThroughMultis, CliffordSimp,
)

full_pipeline = SequencePass([
    # Phase 1: high-level optimization
    CommuteThroughMultis(),
    RemoveRedundancies(),
    SynthesiseTket(),
    CliffordSimp(),

    # Phase 2: rebase to hardware native gates
    ibm_rebase,           # from the earlier example

    # Phase 3: routing onto hardware connectivity
    mapping_pass,         # routing pass defined above

    # Phase 4: post-routing cleanup
    ibm_rebase,           # convert any non-native gates added by routing
    RemoveRedundancies(),
])

circ_final = make_circuit()
full_pipeline.apply(circ_final)
profile("Full pipeline", circ_final)

The second RemoveRedundancies after routing catches cancellations that routing sometimes introduces, for example between the CX gates of adjacent SWAP decompositions. Running it again is cheap. How much it reduces the 2-qubit gate count depends on how many SWAPs routing inserted, and profile() shows the effect directly.

Common Mistakes

Five pitfalls that frequently cause subtle problems in TKET compilation pipelines:

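1. Misreading how routing interacts with rebase. CXMappingPass converts the circuit to CX and single-qubit gates and decomposes every SWAP it inserts into three CX gates. Routing a circuit that still contains H or T gates therefore does not make the SWAPs more expensive, and the order of rebase and routing does not change their cost. What the order does affect is whether the output is entirely native. Routing can add gates outside the target set: with directed_cx=True, for example, a CX in a disallowed direction is reversed using Hadamard gates. Rebasing before routing is fine, but make sure a rebase also runs after routing whenever routing can introduce non-native gates.

# Rebase only before routing: gates added during routing (for example,
# Hadamards that reverse a CX when directed_cx=True) are never rebased
rebase_then_route = SequencePass([
    ibm_rebase,      # convert to native gates first
    mapping_pass,    # SWAPs are decomposed into CX gates
])

# Safer: rebase again after routing, then clean up
rebase_route_rebase = SequencePass([
    ibm_rebase,      # convert to native gates first
    mapping_pass,    # place, route, and decompose SWAPs into CX
    ibm_rebase,      # convert anything routing introduced back to native gates
    RemoveRedundancies(),
])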

2. Forgetting that passes mutate circuits in place. BasePass.apply modifies the circuit it is given and returns a bool indicating whether anything changed, so once a pass has run, the original circuit is gone. Use copy.deepcopy (or the Circuit.copy() method) before applying passes when you need to preserve the original for comparison or profiling.

import copy

circ = make_circuit()
circ_backup = copy.deepcopy(circ)  # preserve original

SynthesiseTket().apply(circ)

# Now you can compare circ (optimized) vs circ_backup (original)
print(f"Before: {circ_backup.n_gates} gates")
print(f"After:  {circ.n_gates} gates")

3. Using FullPeepholeOptimise after a custom rebase. FullPeepholeOptimise internally assumes the TK1 + CX gate basis. If you have already rebased to a different gate set (like Rz + SX + CX for IBM), FullPeepholeOptimise will first convert back to TK1 + CX, optimize, and leave the result in TK1 + CX. You then need to rebase again, which can introduce extra gates. If you use FullPeepholeOptimise, apply it before your custom rebase, not after.

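4. Ignoring directed CX constraints. Some hardware only supports CX in one direction (for example, CX(0,1) but not CX(1,0)). With directed_cx=False, the routing pass treats every coupling as usable in both directions, so the routed circuit may contain CX gates the device does not support. Each such gate must later be reversed by conjugating it with Hadamards on both qubits, which adds four H gates per reversed CX, or the circuit may be rejected as invalid for the device. Set directed_cx=True so the routing pass handles direction itself. Reversals are then resolved at compile time, where subsequent passes such as RemoveRedundancies can cancel adjacent Hadamards.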

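from pytket.passes import CXMappingPass

# With directed_cx=True, the pass emits CX gates only in the directions the
# architecture allows, reversing any others at compile time
# (arch and placement are the objects from the routing section)
mapping_pass_directed = CXMappingPass(
    arch,
    placement,
    directed_cx=True,     # respect hardware CX direction
    delay_measures=True,  # commute measurements to the end where possible
)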

5. Not running RemoveRedundancies after routing. SWAP decomposition breaks each SWAP into 3 CX gates. When two SWAPs are adjacent (which happens at topology bottlenecks), the resulting 6 CX gates often contain cancellable pairs. A single RemoveRedundancies pass after routing removes these pairs. How much it saves depends on how often routing places SWAPs back to back, so measure it on your own circuits rather than assuming a fixed percentage. Skipping this step leaves free performance on the table.

Custom pass pipelines are how production quantum software stacks achieve the circuit fidelity needed to run meaningful computations on NISQ hardware. The pytket pass system gives you the composability to experiment with pass ordering without rewriting your circuit construction code.