Noise Mitigation Techniques in tket

Noise on Real Hardware

Every gate applied on a quantum processor introduces some error. Raw results from hardware often differ substantially from ideal simulator output. Understanding the specific types of noise helps you choose the right mitigation strategy.

Gate Errors (Depolarizing Noise)

When a quantum gate executes on hardware, the result is not a perfect unitary operation. The dominant error model for gates is depolarizing noise. After a gate with error probability p, the system is in the ideal (correct) state with probability 1 - p and in a random Pauli error state with probability p. More precisely, each of the three non-trivial Pauli operators (X, Y, Z) occurs with probability p/3 for single-qubit depolarizing noise.
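To make the depolarizing model concrete, here is a minimal sketch that applies it to a qubit's Bloch vector using only the standard library. The function name and the value p = 0.03 are illustrative assumptions, not part of tket. A Pauli error flips the sign of the two Bloch components it anticommutes with, so averaging over the four branches shrinks every component by the factor 1 - 4p/3.

```python
def depolarize_bloch(bloch, p):
    """Apply single-qubit depolarizing noise to a Bloch vector (x, y, z).

    With probability 1 - p nothing happens; with probability p/3 each,
    an X, Y, or Z error is applied.
    """
    x, y, z = bloch
    branches = [
        ((x, y, z), 1 - p),      # no error
        ((x, -y, -z), p / 3),    # X error
        ((-x, y, -z), p / 3),    # Y error
        ((-x, -y, z), p / 3),    # Z error
    ]
    return tuple(sum(w * v[i] for v, w in branches) for i in range(3))


plus_state = (1.0, 0.0, 0.0)  # Bloch vector of (|0> + |1>)/sqrt(2)
noisy = depolarize_bloch(plus_state, p=0.03)  # illustrative error probability
print(noisy)
```

The x component shrinks from 1 to 1 - 4 * 0.03 / 3, which is how depolarizing noise pulls any state toward the maximally mixed state at the origin.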

Two-qubit gates are significantly noisier than single-qubit gates. On current superconducting hardware (IBM, Rigetti), single-qubit gate error rates are typically 0.01% to 0.1%, while two-qubit gate (CX/CZ) error rates range from 0.3% to 1.5%. On trapped-ion systems (Quantinuum, IonQ), two-qubit gate error rates are generally lower (0.1% to 0.5%), but gate execution times are much longer.

The practical takeaway: every two-qubit gate you eliminate from your circuit directly reduces the total error. This is why compilation-based mitigation focuses heavily on CX gate count reduction.

T1 Decay (Amplitude Damping)

T1 describes the timescale over which a qubit spontaneously relaxes from the excited state |1> to the ground state |0>. This is analogous to an atom emitting a photon and returning to its ground state. On superconducting qubits, T1 times typically range from 50 to 200 microseconds.

T1 decay matters for circuit depth. If your circuit takes longer to execute than a significant fraction of T1, qubits that should be in the |1> state will start decaying to |0>. This introduces a bias: you observe more 0s than you should. Reducing circuit depth (the number of sequential gate layers) directly reduces the impact of T1 decay.
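As a rough illustration of why circuit duration matters, the sketch below uses the standard exponential model, in which a qubit prepared in |1> is still excited after time t with probability exp(-t / T1). The T1 value and the circuit durations are illustrative assumptions, not measurements from any device.

```python
import math


def excited_state_survival(t_us, t1_us):
    """Probability that a qubit prepared in |1> has not relaxed after t_us."""
    return math.exp(-t_us / t1_us)


t1_us = 100.0  # assumed T1 in microseconds
for duration_us in (1.0, 20.0, 100.0):  # assumed total circuit durations
    survival = excited_state_survival(duration_us, t1_us)
    print(f"t = {duration_us:5.1f} us -> P(still in |1>) = {survival:.3f}")
```

A circuit that runs for a fifth of T1 already loses close to a fifth of its |1> population, which shows up as the bias toward 0 described above.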

T2 Decay (Dephasing)

T2 describes the timescale over which a qubit loses phase coherence. A qubit in a superposition state like (|0> + |1>)/sqrt(2) gradually loses the phase relationship between its components, turning the pure superposition into a classical mixture. The result is that interference effects, which are the basis of quantum speedup, wash out over time.

T2 is always bounded by T1: specifically, T2 <= 2 * T1. On most superconducting hardware, T2 is roughly equal to T1 or slightly shorter. Like T1, T2 decay is a time-dependent effect, so shorter circuits suffer less dephasing.
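The bound follows from the standard relation 1/T2 = 1/(2 * T1) + 1/T_phi, where T_phi is the pure dephasing time. The sketch below evaluates it for a few illustrative values; the numbers are assumptions chosen to show the two limiting cases.

```python
import math


def t2_from_t1_and_tphi(t1_us, tphi_us):
    """Combine relaxation (T1) and pure dephasing (T_phi) into T2."""
    return 1.0 / (1.0 / (2.0 * t1_us) + 1.0 / tphi_us)


t1_us = 100.0  # assumed T1 in microseconds
for tphi_us in (50.0, 200.0, math.inf):  # assumed pure dephasing times
    t2_us = t2_from_t1_and_tphi(t1_us, tphi_us)
    print(f"T_phi = {tphi_us} us -> T2 = {t2_us:.1f} us")
```

With no pure dephasing at all (T_phi infinite), T2 reaches its maximum of 2 * T1. Any finite T_phi pulls T2 below that limit.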

SPAM Errors (State Preparation and Measurement)

SPAM stands for State Preparation And Measurement. These errors come in two flavors:

Preparation errors: when initializing a qubit in |0>, there is a small probability it actually starts in |1>.
Measurement errors: when reading a qubit, there is a probability of reading 0 when the qubit is actually in |1>, or reading 1 when the qubit is actually in |0>. On superconducting hardware, measurement error rates typically range from 0.5% to 5%, and asymmetry between 0-to-1 and 1-to-0 errors is common.

SPAM errors are systematic and qubit-dependent. Each physical qubit has its own characteristic error rates that remain relatively stable over time. This stability makes SPAM errors amenable to calibration-based correction.

Crosstalk

Crosstalk occurs when a gate on one qubit unintentionally affects neighboring qubits through electromagnetic coupling. For example, driving a microwave pulse to perform an X gate on qubit 3 might induce a small unwanted rotation on qubit 4 if they are physically adjacent. Crosstalk errors are architecture-dependent and difficult to model precisely, but they contribute to overall circuit infidelity, especially on densely connected devices.

Mapping Noise Types to Mitigation Strategies

Each noise source suggests a specific mitigation approach:

Gate errors -> Reduce gate count through compilation (FullPeepholeOptimise, CliffordSimp, RemoveRedundancies)
T1 and T2 decay -> Reduce circuit depth through compilation and parallelization
SPAM errors -> Calibrate and correct with SpamCorrecter
Residual gate errors -> Post-process with ZNE or PEC after compilation
Crosstalk -> Choose routing that avoids simultaneous gates on coupled qubits (device-specific)

tket provides two complementary approaches: compilation-based mitigation (reducing the number of noisy operations through circuit optimization) and post-processing mitigation (correcting systematic biases in measurement results).

Compilation-Based Mitigation

The simplest and most effective way to reduce noise is to reduce the number of gates in your circuit, especially two-qubit gates.

Why Fewer Gates Means Less Noise: A Quantitative Case

Consider a concrete example. Suppose each two-qubit gate has an error rate of p = 0.5%, a realistic figure for current superconducting devices (trapped-ion systems such as Quantinuum’s do better, closer to 0.1%). For a circuit with n two-qubit gates, the probability that all gates execute correctly is approximately (1 - p)^n. For small p, this is well approximated by 1 - n*p.

A circuit with 10 two-qubit gates has an estimated total two-qubit gate error of about 10 * 0.005 = 5%. If FullPeepholeOptimise reduces that to 6 two-qubit gates, the total error drops to 6 * 0.005 = 3%. That is a 40% reduction in error from a compiler pass that costs nothing to apply.
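The linear estimate n * p is only a small-p approximation of 1 - (1 - p)^n. The short sketch below compares the two for a few gate counts so you can see where the approximation starts to overestimate the error. The gate counts are illustrative; p = 0.005 is the figure used in the text.

```python
p = 0.005  # assumed two-qubit gate error rate, as in the text


def exact_error(n, p):
    """Probability that at least one of n gates fails."""
    return 1 - (1 - p) ** n


def linear_error(n, p):
    """First-order approximation, valid while n * p is small."""
    return n * p


for n in (6, 10, 50, 100):
    print(f"n={n:3d}  exact={exact_error(n, p):.4f}  linear={linear_error(n, p):.4f}")
```

For 6 or 10 gates the two values agree closely. By 100 gates the linear estimate is noticeably too pessimistic, so use the exact expression for deeper circuits.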

Here is a concrete example that measures the gate count reduction:

from pytket import Circuit, OpType
from pytket.passes import FullPeepholeOptimise

# Build a circuit with redundant structure
circ = Circuit(4)
for i in range(3):
    circ.H(i)
    circ.CX(i, i + 1)
    circ.Rz(0.3, i + 1)
    circ.CX(i, i + 1)
    circ.H(i)
circ.CX(0, 1).CX(1, 2).CX(2, 3)
circ.Rz(0.5, 3)
circ.CX(2, 3).CX(1, 2).CX(0, 1)

cx_before = circ.n_gates_of_type(OpType.CX)
total_before = circ.n_gates

FullPeepholeOptimise().apply(circ)

cx_after = circ.n_gates_of_type(OpType.CX)
total_after = circ.n_gates

print(f"CX gates: {cx_before} -> {cx_after}")
print(f"Total gates: {total_before} -> {total_after}")

# Estimate error improvement (assuming 0.5% per CX gate)
p = 0.005
error_before = 1 - (1 - p) ** cx_before
error_after = 1 - (1 - p) ** cx_after
print(f"Estimated CX error: {error_before:.2%} -> {error_after:.2%}")

DecomposeBoxes

If your circuit contains high-level abstractions (CircBox, Unitary2qBox, etc.), these must be decomposed into primitive gates before further optimization passes can act on them:

from pytket import Circuit, OpType
from pytket.passes import DecomposeBoxes

circ = Circuit(3)
# ... add CircBox or other high-level operations ...
DecomposeBoxes().apply(circ)

Always apply DecomposeBoxes as the first step in your optimization pipeline. Other passes expect primitive gates and will skip over boxed operations.

CliffordSimp

The CliffordSimp pass identifies and simplifies subcircuits composed entirely of Clifford gates (H, S, CNOT, and their combinations). Clifford circuits can be simulated classically in polynomial time using the Gottesman-Knill theorem, so tket can compute their net effect and replace them with a shorter equivalent sequence:

from pytket import Circuit, OpType
from pytket.passes import CliffordSimp

circ = Circuit(5)
circ.H(0).CX(0, 1).CX(1, 2).S(2).CX(2, 1).CX(1, 0).H(0)
circ.H(3).CX(3, 4).S(4).CX(4, 3).H(3)

print(f"Before CliffordSimp: {circ.n_gates_of_type(OpType.CX)} CX gates")
CliffordSimp().apply(circ)
print(f"After CliffordSimp: {circ.n_gates_of_type(OpType.CX)} CX gates")

CliffordSimp is particularly effective for circuits that contain stabilizer subcircuits, such as error correction syndrome extraction or state preparation routines for entangled states like GHZ and graph states.

RemoveRedundancies

The RemoveRedundancies pass automatically detects and removes pairs of gates that cancel each other out. Since H * H = I, X * X = I, and CX * CX = I, adjacent identical self-inverse gates can be eliminated:

from pytket import Circuit
from pytket.passes import RemoveRedundancies

circ = Circuit(2)
circ.H(0)
circ.H(0)       # These two H gates cancel (H * H = I)
circ.CX(0, 1)
circ.CX(0, 1)   # These two CX gates cancel (CX * CX = I)
circ.X(1)

print(f"Before: {circ.n_gates} gates")
RemoveRedundancies().apply(circ)
print(f"After: {circ.n_gates} gate (only the X remains)")

RemoveRedundancies is a lightweight pass that runs quickly on any circuit. While it only catches straightforward cancellations, it is useful as a cleanup step after other transformations that may introduce redundant gates.

CommuteThroughMultis

The CommuteThroughMultis pass moves single-qubit gates through multi-qubit gates when the commutation relation allows it. This enables cancellations that are not visible to simpler passes because the canceling gates are separated by a multi-qubit gate.

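For example, a Z gate (or any Rz rotation) on the control qubit commutes with a CX gate, as does an X gate (or any Rx rotation) on the target qubit. An X gate on the control does not commute with the CX. This means two Z gates on the control separated by a CX can be brought together. CommuteThroughMultis only moves the gates, so follow it with RemoveRedundancies to perform the cancellation:

from pytket import Circuit
from pytket.passes import CommuteThroughMultis, RemoveRedundancies

circ = Circuit(3)
circ.Z(0)
circ.CX(0, 1)
circ.Z(0)       # Z on the control commutes through the CX and can meet the first Z
circ.H(2)

print(f"Before: {circ.n_gates} gates")
CommuteThroughMultis().apply(circ)   # moves the second Z next to the first
RemoveRedundancies().apply(circ)     # cancels the now-adjacent Z pair
print(f"After: {circ.n_gates} gates")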

CommuteThroughMultis is especially useful in circuits generated by variational algorithms (VQE, QAOA) where parameterized single-qubit rotations are interleaved with entangling gates in repeated layers. Gate commutation can expose cancellations at the boundaries between layers.

SquashCustom for Single-Qubit Gate Merging

Any sequence of single-qubit gates on the same qubit is equivalent to a single rotation in SU(2). The SquashCustom pass merges consecutive single-qubit gates into a minimal sequence using a specified gate set. A common choice is the Rz-Rx decomposition, where any single-qubit unitary can be expressed as Rz * Rx * Rz (the ZXZ Euler decomposition):

from pytket import Circuit
from pytket.circuit import OpType
from pytket.passes import SquashCustom
from pytket.circuit_library import TK1_to_RzRx

circ = Circuit(1)
circ.H(0)
circ.T(0)
circ.S(0)
circ.H(0)
circ.Rz(0.3, 0)

print(f"Before: {circ.n_gates} single-qubit gates")
SquashCustom({OpType.Rz, OpType.Rx}, TK1_to_RzRx).apply(circ)
print(f"After: {circ.n_gates} gates (merged into Rz-Rx-Rz sequence)")

The TK1_to_RzRx function handles the conversion from tket’s internal TK1 gate (a general single-qubit rotation parameterized by three Euler angles) to the Rz-Rx gate set. You can also define custom replacement functions for other gate sets supported by your target hardware.

Squashing five single-qubit gates into two or three gates reduces both gate count and circuit duration, which helps with T1 and T2 decay.

KAK Decomposition for Two-Qubit Gate Reduction

FullPeepholeOptimise uses KAK decomposition (also called the Cartan or Kraus-Cirac decomposition) internally. Understanding what KAK does helps you appreciate why FullPeepholeOptimise is so effective.

The KAK decomposition theorem states that any two-qubit unitary U can be written as:

U = (A1 tensor B1) * exp(i * (a * XX + b * YY + c * ZZ)) * (A2 tensor B2)

where A1, A2, B1, B2 are single-qubit unitaries and a, b, c are real parameters. The middle exponential term (the “interaction part”) requires at most 3 CNOT gates to implement. If the interaction has special structure (for example, if one or more of a, b, c is zero), it may require fewer CNOTs: 0, 1, or 2.

This means that any sequence of two-qubit gates between the same pair of qubits can be consolidated into at most 3 CNOTs plus single-qubit rotations. When FullPeepholeOptimise finds a subcircuit containing multiple two-qubit gates between the same qubit pair, it computes the net unitary, applies KAK decomposition, and replaces the original subcircuit with the minimal CNOT-count equivalent.

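For example, a pair of CNOTs with single-qubit rotations between them can sometimes be resynthesized with fewer CNOTs. Comparing the CX count before and after the pass shows whether a given circuit benefits: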

from pytket import Circuit, OpType
from pytket.passes import FullPeepholeOptimise

# Two CX gates with rotations between them
circ = Circuit(2)
circ.CX(0, 1)
circ.Rz(0.5, 0)
circ.Rz(0.25, 1)
circ.CX(0, 1)

cx_before = circ.n_gates_of_type(OpType.CX)
FullPeepholeOptimise().apply(circ)
cx_after = circ.n_gates_of_type(OpType.CX)

print(f"CX gates: {cx_before} -> {cx_after}")
# The net unitary may require fewer CX gates than the original circuit

FullPeepholeOptimise

FullPeepholeOptimise is tket’s most aggressive optimization pass. It combines all of the techniques above: gate commutation, Clifford simplification, single-qubit squashing, and KAK-based two-qubit gate decomposition. It works by scanning the circuit for subcircuits involving pairs of qubits, computing their net unitary, and resynthesizing with the minimum number of two-qubit gates:

from pytket import Circuit, OpType
from pytket.passes import FullPeepholeOptimise

circ = Circuit(5)
circ.H(0).CX(0, 1).Rz(0.3, 1).CX(0, 1).H(0)
circ.CX(1, 2).CX(2, 3).CX(3, 4)
circ.Rz(0.5, 4).CX(3, 4).CX(2, 3).CX(1, 2)

cx_before = circ.n_gates_of_type(OpType.CX)
FullPeepholeOptimise().apply(circ)
cx_after = circ.n_gates_of_type(OpType.CX)
print(f"CX gates: {cx_before} -> {cx_after}")

For best results, apply FullPeepholeOptimise before routing. The router inserts SWAP gates (each decomposing into 3 CX gates), so starting with a smaller circuit keeps the routed circuit shorter.

SPAM Correction with SpamCorrecter

State Preparation and Measurement (SPAM) errors are systematic: each qubit has a characteristic probability of reading 0 when it should be 1, and vice versa. If you measure these error probabilities through calibration, you can invert them to correct the output distribution.

How SpamCorrecter Works

The core idea is to build a confusion matrix that characterizes the measurement errors. For n qubits, the confusion matrix M is a 2^n by 2^n matrix where entry M[i][j] represents the probability of measuring bitstring i when bitstring j was actually prepared. In the ideal (noiseless) case, M is the identity matrix. On real hardware, the off-diagonal entries capture the measurement errors.

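Once you have M, you can correct a measured probability distribution p_noisy by computing p_corrected = M_inverse * p_noisy. Direct inversion can return small negative "probabilities" when the data carry shot noise. For this reason SpamCorrecter.correct_counts accepts a method argument: "invert" applies the matrix inverse, while "bayesian" uses an iterative update that keeps the corrected probabilities non-negative.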
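The direct-inversion step can be sketched for a single qubit with a 2 by 2 confusion matrix. This is a standalone illustration in plain Python, not SpamCorrecter's implementation; the error rates and the "true" distribution are assumed values.

```python
def confusion_matrix(p_read1_given0, p_read0_given1):
    """M[i][j] = probability of reading i when state j was prepared."""
    return [
        [1 - p_read1_given0, p_read0_given1],
        [p_read1_given0, 1 - p_read0_given1],
    ]


def mat_vec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def invert_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


M = confusion_matrix(p_read1_given0=0.02, p_read0_given1=0.05)  # assumed rates
p_true = [0.3, 0.7]                      # assumed ideal distribution
p_noisy = mat_vec(M, p_true)             # what the hardware would report
p_corrected = mat_vec(invert_2x2(M), p_noisy)
print("noisy:    ", p_noisy)
print("corrected:", p_corrected)
```

With exact probabilities the inversion recovers the ideal distribution. With finite shot counts, p_noisy fluctuates and the corrected vector can contain slightly negative entries.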

Calibration Circuits

SpamCorrecter generates calibration circuits that prepare each computational basis state and measure it. For n qubits, there are 2^n calibration circuits (one per basis state). Each circuit prepares its target state using X gates (to flip qubits from |0> to |1> as needed) and then measures all qubits:

from pytket.utils.spam import SpamCorrecter
from pytket.circuit import Node

# Define qubit nodes (use the actual backend's Node objects)
# SpamCorrecter takes a list of qubit subsets; qubits within a subset
# are treated as having correlated readout errors
qubits = [Node(0), Node(1), Node(2)]
spam = SpamCorrecter([qubits])

# Get the calibration circuits
cal_circs = spam.calibration_circuits()
print(f"Number of calibration circuits: {len(cal_circs)}")
# For 3 qubits: 2^3 = 8 calibration circuits

# Each calibration circuit prepares one basis state:
# Circuit 0: prepare |000>, measure all
# Circuit 1: prepare |001>, measure all
# Circuit 2: prepare |010>, measure all
# ... and so on through |111>

Running Calibration and Applying Correction

After generating the calibration circuits, you run them on the hardware, build the confusion matrix, and apply corrections to your experimental results:

from pytket import Circuit
from pytket.utils.spam import SpamCorrecter
from pytket.circuit import Node

# Step 1: Define qubits and create corrector (takes a list of qubit subsets)
qubits = [Node(0), Node(1), Node(2)]
spam = SpamCorrecter([qubits])
cal_circs = spam.calibration_circuits()

# Step 2: Run calibration circuits on hardware
# Use at least 10,000 shots per circuit for reliable statistics
# cal_results = [
#     backend.get_result(backend.process_circuit(c, n_shots=10000))
#     for c in cal_circs
# ]

# Step 3: Build confusion matrices from calibration data
# spam.calculate_matrices(cal_results)

# Step 4: Run your actual experiment circuit
# experiment_circ = Circuit(3)
# ... build your circuit ...
# experiment_circ.measure_all()
# raw_result = backend.get_result(backend.process_circuit(experiment_circ, n_shots=10000))
# raw_counts = raw_result.get_counts()

# Step 5: Apply SPAM correction. correct_counts takes the BackendResult plus
# a qubit-to-bit map from get_parallel_measure, and returns a corrected result
# parallel_measures = spam.get_parallel_measure(experiment_circ)
# corrected_result = spam.correct_counts(raw_result, parallel_measures)
# corrected_counts = corrected_result.get_counts()

# Step 6: Compare raw vs corrected
# print("Raw counts:", raw_counts)
# print("Corrected counts:", corrected_counts)

The confusion matrix entries reveal information about each qubit’s readout fidelity. Large off-diagonal values indicate qubits with poor measurement fidelity, which may inform your qubit selection strategy when mapping circuits to hardware.

Scaling Considerations for SpamCorrecter

For n qubits treated as a single correlated subset, SpamCorrecter requires 2^n calibration circuits. This scaling limits practical use:

5 qubits: 32 calibration circuits (feasible)
10 qubits: 1,024 calibration circuits (expensive but possible)
20 qubits: over 1 million calibration circuits (impractical)

For larger systems, you can apply SpamCorrecter to subsets of qubits independently, assuming measurement errors on different qubits are uncorrelated. This assumption is reasonable for most superconducting devices, where measurement crosstalk is small.
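The arithmetic behind this trade-off is easy to check. The helper below is a hypothetical counting function, not part of pytket. It assumes that independent subsets are calibrated in parallel within the same circuits, so the number of circuits is set by the largest correlated subset.

```python
def calibration_circuit_count(subset_sizes):
    """Number of calibration circuits for the given correlated-subset sizes.

    Assumption: subsets are calibrated in parallel, so only the largest
    subset determines how many basis states must be prepared.
    """
    return 2 ** max(subset_sizes)


print(calibration_circuit_count([5]))        # one 5-qubit correlated subset
print(calibration_circuit_count([10]))       # one 10-qubit correlated subset
print(calibration_circuit_count([20]))       # one 20-qubit correlated subset
print(calibration_circuit_count([1] * 20))   # 20 qubits, each its own subset
print(calibration_circuit_count([2] * 10))   # 20 qubits in correlated pairs
```

Splitting 20 qubits into small subsets turns an impractical calibration into a handful of circuits, at the price of ignoring correlations between subsets.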

Combining Passes: A Full Optimization Pipeline

Here is a complete pipeline that applies the key optimization passes in the recommended order:

from pytket import Circuit, OpType
from pytket.passes import (
    DecomposeBoxes,
    RemoveRedundancies,
    CommuteThroughMultis,
    CliffordSimp,
    FullPeepholeOptimise,
    SequencePass,
)

# Build a 5-qubit circuit with deliberate redundancy
circ = Circuit(5)
for i in range(4):
    circ.H(i)
    circ.CX(i, i + 1)
    circ.CX(i + 1, i)
    circ.H(i)
circ.Rz(0.25, 4)
circ.measure_all()

# Track gate counts
cx_original = circ.n_gates_of_type(OpType.CX)
total_original = circ.n_gates

# Apply the optimization pipeline in recommended order:
# 1. DecomposeBoxes: expand high-level abstractions
# 2. RemoveRedundancies: quick cleanup of obvious cancellations
# 3. CommuteThroughMultis: move single-qubit gates through CX to enable more cancellations
# 4. CliffordSimp: simplify Clifford subcircuits
# 5. FullPeepholeOptimise: aggressive two-qubit gate optimization via KAK
pipeline = SequencePass([
    DecomposeBoxes(),
    RemoveRedundancies(),
    CommuteThroughMultis(),
    CliffordSimp(),
    FullPeepholeOptimise(),
])
pipeline.apply(circ)

cx_optimized = circ.n_gates_of_type(OpType.CX)
total_optimized = circ.n_gates
print(f"CX count: {cx_original} -> {cx_optimized}")
print(f"Total gate count: {total_original} -> {total_optimized}")

Note that FullPeepholeOptimise internally applies many of the same transformations as the earlier passes, so you could use it alone and get most of the benefit. Including the earlier passes explicitly is useful when you want fine-grained control or when debugging optimization behavior.

Zero-Noise Extrapolation (ZNE)

Zero-noise extrapolation is a post-processing technique for suppressing gate errors. The idea is straightforward: run your circuit at multiple noise levels, observe how the result degrades as noise increases, and extrapolate backward to estimate what the result would be at zero noise.

How ZNE Works

The workflow has four steps:

Start with your compiled (optimized) circuit.
For each noise scale factor (for example, 1x, 3x, 5x), create a “folded” version of the circuit. Gate folding inserts identity-equivalent gate pairs (like CX followed by CX) that do nothing logically but add hardware noise. Replacing a gate G with G, then its inverse, then G again roughly triples the noise from that gate, which is why odd scale factors are common.
Run each folded circuit on hardware and collect the expectation value of your observable.
Fit the expectation values to a model (linear, polynomial, or exponential) as a function of noise scale factor, and evaluate the fit at scale factor 0.

ZNE works best for estimating expectation values of observables rather than full probability distributions. It requires the noise to scale predictably with the number of inserted gates.
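The folding and extrapolation steps can be sketched without any quantum library. In the example below the circuit is a toy list of gate names, and the "measured" values come from a synthetic quadratic noise model; both are assumptions made so that the result is easy to check. The extrapolation evaluates at zero the polynomial that passes through all data points, which is the idea behind Richardson extrapolation.

```python
def fold_global(gates, scale_factor):
    """Global folding for odd integer scale factors: U -> U (U^-1 U)^k.

    `gates` is a list of (name, inverse_name) pairs standing in for a circuit.
    """
    if scale_factor % 2 != 1:
        raise ValueError("this sketch only supports odd integer scale factors")
    k = (scale_factor - 1) // 2
    forward = [name for name, _ in gates]
    backward = [inverse for _, inverse in reversed(gates)]
    return forward + (backward + forward) * k


def richardson_zero_noise(scale_factors, values):
    """Evaluate at scale 0 the polynomial through all (scale, value) points."""
    estimate = 0.0
    for i, (s_i, v_i) in enumerate(zip(scale_factors, values)):
        weight = 1.0
        for j, s_j in enumerate(scale_factors):
            if j != i:
                weight *= (0.0 - s_j) / (s_i - s_j)  # Lagrange basis at 0
        estimate += weight * v_i
    return estimate


toy_circuit = [("H", "H"), ("CX", "CX"), ("T", "Tdg")]
folded = fold_global(toy_circuit, 3)
print(folded)

scale_factors = [1, 3, 5]
# Synthetic expectation values: ideal value 1.0 degraded by a quadratic model
measured = [1.0 - 0.08 * s + 0.002 * s**2 for s in scale_factors]
zero_noise = richardson_zero_noise(scale_factors, measured)
print(f"Extrapolated zero-noise value: {zero_noise:.4f}")
```

Because the synthetic data are exactly quadratic, three points recover the ideal value. Real data carry shot noise and may not follow a low-order polynomial, so the extrapolated value inherits both statistical error and model bias.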

ZNE with tket and Mitiq

tket does not include a built-in ZNE implementation, but it integrates well with Mitiq, the leading open-source error mitigation library. Here is how to combine them:

# pip install mitiq pytket pytket-cirq
import numpy as np
from mitiq import zne
from mitiq.zne.scaling import fold_gates_at_random
from mitiq.zne.inference import RichardsonFactory
from pytket import Circuit
from pytket.passes import FullPeepholeOptimise

# Step 1: Build and optimize your circuit in tket
circ = Circuit(2)
circ.H(0)
circ.CX(0, 1)
circ.Rz(0.5, 1)
circ.CX(0, 1)
circ.H(0)
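# No measurements are added: the executor below reads <Z> directly from
# the final density matrix, which a mid-simulation measurement would collapse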

FullPeepholeOptimise().apply(circ)

# Step 2: Define an executor that takes a Cirq circuit and returns
# an expectation value. Mitiq uses Cirq circuits internally.
# You convert between tket and Cirq as needed.
def execute(circuit, noise_level=0.01):
    """Execute a circuit and return an expectation value.
    
    In practice, this function would:
    1. Convert the Cirq circuit to tket format
    2. Run on a noisy backend (real hardware or noisy simulator)
    3. Compute the expectation value from the measurement counts
    """
    # Placeholder: replace with actual backend execution
    from cirq import DensityMatrixSimulator, depolarize
    noisy_result = DensityMatrixSimulator(noise=depolarize(p=noise_level)).simulate(circuit)
    # Return the expectation value of the Z operator on qubit 0
    rho = noisy_result.final_density_matrix
    z_matrix = np.array([[1, 0], [0, -1]])
    identity = np.eye(2)
    z_op = np.kron(z_matrix, identity)
    return np.real(np.trace(z_op @ rho))

# Step 3: Apply ZNE with Richardson extrapolation
# Scale factors determine how much the noise is amplified
# Richardson extrapolation fits a polynomial and evaluates at scale=0
factory = RichardsonFactory(scale_factors=[1.0, 3.0, 5.0])

# Convert tket circuit to Cirq for Mitiq
from pytket.extensions.cirq import tk_to_cirq
cirq_circuit = tk_to_cirq(circ)

result = zne.execute_with_zne(
    circuit=cirq_circuit,
    executor=execute,
    scale_noise=fold_gates_at_random,
    factory=factory,
)
print(f"ZNE-mitigated expectation value: {result:.4f}")

The choice of scale factors matters. Using too-large scale factors (like 1, 9, 15) amplifies the noise so much that the measured values become essentially random, making the extrapolation unreliable. Scale factors of [1, 3, 5] or [1, 2, 3] are good starting points.

Probabilistic Error Cancellation (PEC)

Probabilistic Error Cancellation is an alternative post-processing technique that can produce more accurate results than ZNE for small circuits with well-characterized noise.

How PEC Works

PEC represents each ideal gate as a linear combination (quasi-probability decomposition) of noisy operations that the hardware can implement. The key insight is that if you know the exact noise channel affecting each gate, you can decompose the ideal (noiseless) gate as a weighted sum of noisy operations:

G_ideal = sum_i (c_i * O_i)

where O_i are operations you can actually implement and c_i are real coefficients (which can be negative, hence “quasi-probability”). By randomly sampling operations from this decomposition and weighting the results by the signs and magnitudes of the coefficients, you obtain an unbiased estimate of the noiseless expectation value.
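A small worked case makes the negative coefficients concrete. The sketch below inverts single-qubit depolarizing noise on a Bloch vector: the inverse map is a quasi-probability mixture of "do nothing" and the three Pauli corrections, with a coefficient above 1 for the identity and small negative coefficients for the Paulis. It assumes the Pauli corrections themselves are noiseless, and all function names are illustrative.

```python
def pauli_branches(bloch):
    """Bloch vector after conjugation by I, X, Y, Z (in that order)."""
    x, y, z = bloch
    return [(x, y, z), (x, -y, -z), (-x, y, -z), (-x, -y, z)]


def mix(bloch, weights):
    """Weighted sum of the four Pauli-conjugated Bloch vectors."""
    branches = pauli_branches(bloch)
    return tuple(sum(w * b[i] for w, b in zip(weights, branches)) for i in range(3))


def depolarize(bloch, p):
    """Depolarizing noise: identity with 1 - p, each Pauli with p/3."""
    return mix(bloch, [1 - p, p / 3, p / 3, p / 3])


def inverse_depolarizing_coeffs(p):
    """Quasi-probabilities c_I, c_X, c_Y, c_Z that undo depolarizing noise."""
    shrink = 1 - 4 * p / 3            # Bloch-vector contraction factor
    c_pauli = (1 - 1 / shrink) / 4    # negative for p > 0
    c_identity = 1 - 3 * c_pauli      # coefficients sum to 1
    return [c_identity, c_pauli, c_pauli, c_pauli]


p = 0.01  # assumed depolarizing probability
coeffs = inverse_depolarizing_coeffs(p)
one_norm = sum(abs(c) for c in coeffs)
state = (0.6, 0.0, 0.8)
recovered = mix(depolarize(state, p), coeffs)
print("coefficients:", coeffs)
print("one-norm:    ", one_norm)
print("recovered:   ", recovered)
```

The coefficients sum to 1 but their absolute values sum to slightly more than 1. That excess is what drives the sampling cost: in practice you sample each branch with probability |c_i| / one_norm and multiply the outcome by the sign of c_i and by one_norm.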

PEC Overhead

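The cost of PEC is that the number of circuit executions (samples) needed for a given precision scales as e^(2 * gamma), where gamma is the logarithm of the total one-norm of the quasi-probability decomposition (the product of the per-gate one-norms sum_i |c_i|). Each per-gate one-norm exceeds 1 by an amount proportional to the gate infidelity, so gamma is closely related to the total gate infidelity. As a rule of thumb, up to a constant of order one that depends on the noise model, for a circuit with n gates each having infidelity epsilon: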

gamma is approximately n * epsilon
Sampling overhead is approximately e^(2 * n * epsilon)

For a 10-gate circuit with 1% error per gate, the overhead is e^(0.2), which is about 1.2x, very manageable. For a 100-gate circuit with 1% error per gate, the overhead is e^(2), which is about 7.4x, still feasible. But for larger or noisier circuits, the exponential scaling becomes prohibitive.
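To see how quickly the cost grows, the sketch below evaluates the rule-of-thumb overhead e^(2 * n * epsilon) quoted above for a range of circuit sizes. The function name is hypothetical, and the exact exponent depends on the noise model, so treat the output as an order-of-magnitude guide.

```python
import math


def pec_sampling_overhead(n_gates, epsilon):
    """Rule-of-thumb multiplier on the number of samples needed by PEC."""
    return math.exp(2 * n_gates * epsilon)


epsilon = 0.01  # assumed per-gate error, as in the text
for n_gates in (10, 100, 200, 500):
    overhead = pec_sampling_overhead(n_gates, epsilon)
    print(f"{n_gates:4d} gates -> about {overhead:,.1f}x more samples")
```

Doubling the gate count squares the overhead, which is why PEC is practical for short circuits and quickly becomes unaffordable beyond a few hundred noisy gates.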

PEC vs ZNE

PEC provides unbiased estimates (it converges to the correct answer as you take more samples), while ZNE relies on an extrapolation model that may not perfectly match the actual noise behavior. However, PEC requires a detailed characterization of the noise model for each gate, while ZNE only requires the ability to scale noise. In practice:

Use PEC when you have a well-characterized noise model and a relatively short circuit.
Use ZNE when you need a simpler setup and can tolerate some model-dependent bias.

Both PEC and ZNE can be accessed through Mitiq, and tket circuits can be converted to the format Mitiq expects using the pytket-cirq or pytket-qiskit extension packages.

Benchmarking Your Mitigation Strategy

To know whether your mitigation strategy is actually helping, you should compare results across multiple configurations: ideal (noiseless), noisy without mitigation, and noisy with mitigation.

from pytket import Circuit, OpType
from pytket.passes import FullPeepholeOptimise, SequencePass, DecomposeBoxes

# Build a test circuit
def build_test_circuit():
    circ = Circuit(3)
    circ.H(0)
    circ.CX(0, 1)
    circ.CX(1, 2)
    circ.Rz(0.25, 0)
    circ.Rz(0.5, 1)
    circ.Rz(0.75, 2)
    circ.CX(1, 2)
    circ.CX(0, 1)
    circ.H(0)
    circ.measure_all()
    return circ

# Unoptimized circuit
circ_raw = build_test_circuit()
cx_raw = circ_raw.n_gates_of_type(OpType.CX)

# Optimized circuit
circ_opt = build_test_circuit()
SequencePass([DecomposeBoxes(), FullPeepholeOptimise()]).apply(circ_opt)
cx_opt = circ_opt.n_gates_of_type(OpType.CX)

print(f"Unoptimized: {cx_raw} CX gates")
print(f"Optimized:   {cx_opt} CX gates")
print(f"Reduction:   {cx_raw - cx_opt} CX gates eliminated")

# To benchmark on a noisy simulator:
# from pytket.extensions.qiskit import AerBackend
#
# ideal_backend = AerBackend()
# ideal_result = ideal_backend.get_result(
#     ideal_backend.process_circuit(circ_raw, n_shots=10000)
# )
#
# For a noisy backend, use a fake device noise model:
# from qiskit_aer.noise import NoiseModel
# from qiskit_ibm_runtime.fake_provider import FakeSherbrooke
# noise_model = NoiseModel.from_backend(FakeSherbrooke())
# noisy_backend = AerBackend(noise_model)
#
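# Compile each circuit for the backend first. optimisation_level=0 rebases
# to supported gates without applying further optimization, so the
# comparison between the raw and optimized circuits stays meaningful.
# compiled_raw = noisy_backend.get_compiled_circuit(circ_raw, optimisation_level=0)
# compiled_opt = noisy_backend.get_compiled_circuit(circ_opt, optimisation_level=0)
# noisy_raw_result = noisy_backend.get_result(
#     noisy_backend.process_circuit(compiled_raw, n_shots=10000)
# )
# noisy_opt_result = noisy_backend.get_result(
#     noisy_backend.process_circuit(compiled_opt, n_shots=10000)
# )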
#
# Compare the distributions using total variation distance:
# from pytket.utils import probs_from_counts
# ideal_dist = probs_from_counts(ideal_result.get_counts())
# noisy_raw_dist = probs_from_counts(noisy_raw_result.get_counts())
# noisy_opt_dist = probs_from_counts(noisy_opt_result.get_counts())

The total variation distance between the ideal and noisy distributions quantifies how much error remains. Comparing this metric before and after optimization tells you exactly how much your mitigation strategy is worth on a specific circuit and device.
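The total variation distance itself is a few lines of plain Python. The sketch below assumes each distribution is a dictionary mapping an outcome (here a tuple of bits) to its probability; the two example distributions are made-up values for a 3-qubit GHZ-like circuit, not real results.

```python
def total_variation_distance(dist_a, dist_b):
    """Half the sum of absolute probability differences over all outcomes."""
    outcomes = set(dist_a) | set(dist_b)
    return 0.5 * sum(abs(dist_a.get(o, 0.0) - dist_b.get(o, 0.0)) for o in outcomes)


# Illustrative distributions (assumed values)
ideal_example = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
noisy_example = {(0, 0, 0): 0.45, (1, 1, 1): 0.45, (0, 0, 1): 0.10}

tvd = total_variation_distance(ideal_example, noisy_example)
print(f"TVD = {tvd:.3f}")
```

A value of 0 means the distributions are identical and 1 means they do not overlap at all. Compute it once for the unoptimized circuit and once for the optimized circuit against the same ideal distribution.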

Comparing Mitigation Strategies

Here is a practical comparison of the available techniques:

Technique	When to Use	Cost	Accuracy
FullPeepholeOptimise	Always, before routing	Free (compile time only)	High: prevents noise by eliminating gates
CliffordSimp	Circuits with Clifford subcircuits	Free (compile time only)	High for applicable subcircuits
RemoveRedundancies	Any circuit, as a cleanup step	Free (compile time only)	Moderate: catches obvious cancellations
CommuteThroughMultis	Circuits with interleaved single/multi-qubit gates	Free (compile time only)	Moderate: enables further cancellations
SquashCustom	Circuits with many consecutive single-qubit gates	Free (compile time only)	Moderate: reduces single-qubit gate count
SpamCorrecter	When readout errors dominate	2^n calibration circuits	High for readout error correction
ZNE	When gate errors dominate, need expectation values	3-5x circuit executions	Moderate: depends on extrapolation model
PEC	Small circuits with known noise model	Exponential sampling overhead	High: unbiased estimates

Compilation-based techniques (the top five rows) are always worth applying because they have zero runtime cost. Post-processing techniques (the bottom three rows) involve trade-offs between accuracy, overhead, and assumptions about the noise.

Common Mistakes

Applying FullPeepholeOptimise After Routing

The routing step inserts SWAP gates to map your logical circuit onto the device’s physical qubit connectivity. Each SWAP decomposes into 3 CX gates. If you apply FullPeepholeOptimise after routing, it may rearrange gates in ways that break the routing constraints, requiring you to re-route (which adds more SWAPs). Always optimize first, then route.

Using SpamCorrecter When Gate Errors Dominate

SpamCorrecter only corrects measurement errors. If your circuit has 50 two-qubit gates with 1% error each, the total gate error is roughly 40%, while readout error might be 2% per qubit. In this scenario, even perfect readout correction barely improves the result. Check the relative magnitudes of gate errors and readout errors for your target device before investing in SPAM calibration.

Over-Folding for ZNE

When the scale factor is too large, the folded circuit has so much noise that the measured expectation value is essentially random (close to the maximally mixed state value). Extrapolating from mostly-random data points produces unreliable results. As a rule of thumb, the highest scale factor should not push the circuit to a regime where the expectation value has flattened to its noise floor. Start with scale factors [1, 2, 3] and increase only if the trend is clearly visible.

Insufficient Calibration Shots for SpamCorrecter

Each calibration circuit needs enough shots to accurately estimate the confusion matrix entries. If a qubit has 1% readout error, you need at least 10,000 shots per calibration circuit to estimate that 1% to within about 0.1% precision (by the central limit theorem, the standard error is sqrt(p*(1-p)/N), which is about 0.1% for p=0.01 and N=10,000). Using only 1,000 shots per calibration circuit gives you 0.3% precision, which may be insufficient.
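The shot-count estimate above follows directly from the binomial standard error. The helper names below are illustrative; the sketch reproduces the two figures in the text and inverts the formula to get the shots needed for a target precision.

```python
import math


def standard_error(p, shots):
    """Standard error of an estimated probability p from `shots` samples."""
    return math.sqrt(p * (1 - p) / shots)


def shots_for_precision(p, target_error):
    """Smallest shot count whose standard error is at most target_error."""
    return math.ceil(p * (1 - p) / target_error**2)


p = 0.01  # assumed readout error rate, as in the text
for shots in (1_000, 10_000, 100_000):
    print(f"{shots:7d} shots -> standard error {standard_error(p, shots):.4%}")

print("Shots for 0.1% precision:", shots_for_precision(p, 0.001))
```

Because the error falls only as the square root of the shot count, each extra digit of precision costs 100 times more shots.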

Expecting Mitigation to Fix Deep Circuits

Error mitigation works at the margin. If your circuit fidelity is below 10% (that is, the probability of getting the correct answer from the noisy circuit is less than 10%), even perfect error mitigation may not recover a meaningful signal. The noisy output is dominated by random noise, and no post-processing technique can reliably extract the correct answer from nearly uniform random data. If your circuit is too deep for the hardware’s coherence time, the solution is to use a different algorithm or decomposition that produces a shallower circuit, not to pile on more mitigation.

Summary

Effective noise mitigation on NISQ hardware combines multiple techniques:

Start with compilation-based optimization. Apply DecomposeBoxes, then RemoveRedundancies and CommuteThroughMultis for quick wins, then CliffordSimp and FullPeepholeOptimise for aggressive gate reduction. These are free and always beneficial.
Characterize your device’s dominant noise source. If readout errors are the bottleneck, invest in SpamCorrecter calibration. If gate errors dominate, consider ZNE or PEC for post-processing.
Benchmark rigorously. Compare your mitigated results against both the ideal result (from a noiseless simulator) and the unmitigated noisy result. Quantify the improvement to ensure your mitigation strategy is actually helping.
Know the limits. Error mitigation is not error correction. It buys you a constant factor improvement, not an exponential one. For circuits beyond a certain depth, the only solution is better hardware or a more efficient algorithm.