PennyLane Error Mitigation: ZNE and PEC for NISQ Circuits

Real quantum hardware is noisy. Gates are imperfect, qubits interact with their environment, and measurements introduce errors. Error mitigation techniques do not fix the underlying noise, but they use classical post-processing to extract more accurate results from noisy circuits without the overhead of full error correction.

This tutorial covers the physics of quantum noise, how to model every noise channel available in PennyLane, and how to implement multiple mitigation strategies: zero-noise extrapolation with gate folding, symmetry verification, Clifford data regression, and readout error correction. Along the way, you will learn noise-aware circuit design principles and how shot noise interacts with gradient estimation.

Physical Noise Sources in Superconducting Qubits

Before modeling noise in software, it helps to understand where it comes from in hardware. Superconducting qubits (the most widely deployed platform from IBM, Google, and Rigetti) suffer from four dominant noise sources.

Charge noise arises from fluctuating background charges near the Josephson junction. Stray electrons on the substrate surface or in the oxide layers shift the qubit’s energy levels randomly. Transmon qubits are designed to be exponentially insensitive to charge noise, but residual sensitivity still contributes to dephasing.

Flux noise comes from magnetic flux fluctuations in the superconducting loop. Qubits whose frequency is tunable via an external flux (like the transmon with a SQUID loop) are particularly susceptible. The noise spectrum is typically 1/f, meaning low-frequency fluctuations dominate.

Two-level system (TLS) defects are microscopic impurities in the substrate or junction oxide that behave like spurious qubits. When a TLS defect is near-resonant with the qubit, energy can swap between them, causing sudden jumps in the qubit’s relaxation time. TLS defects are a major source of T1 variability across qubits and over time.

Photon loss in the readout resonator and control lines leads to information leakage. Stray photons in the resonator can also cause unwanted qubit transitions (the “photon shot noise” dephasing mechanism).

T1 and T2 Relaxation Times

These physical noise sources map onto two characteristic timescales:

T1 (energy relaxation time): the time constant for the qubit to decay from |1> to |0>. The probability of remaining in |1> decays exponentially: P(|1>) = exp(-t/T1). On current hardware, T1 ranges from 50 to 500 microseconds.
T2 (dephasing time): the time constant for loss of phase coherence. A superposition state like |+> = (|0> + |1>)/sqrt(2) loses its off-diagonal density matrix elements as exp(-t/T2). T2 is always bounded by T2 <= 2*T1 (pure dephasing adds to the relaxation-induced dephasing). On current hardware, T2 ranges from 30 to 300 microseconds.

The following code plots the coherence decay for typical superconducting qubit parameters:

import numpy as np
import matplotlib.pyplot as plt

# Typical superconducting qubit parameters
T1 = 100e-6    # 100 microseconds
T2 = 80e-6     # 80 microseconds
t_gate = 50e-9  # 50 nanoseconds per gate

# Time axis: 0 to 400 microseconds
t = np.linspace(0, 400e-6, 500)

# T1 decay: probability of remaining in |1>
p_excited = np.exp(-t / T1)

# T2 decay: coherence (off-diagonal element magnitude)
coherence = np.exp(-t / T2)

# Mark the gate time scale
n_gates = t / t_gate

fig, ax1 = plt.subplots(figsize=(10, 5))

ax1.plot(t * 1e6, p_excited, label=f"T1 decay (T1 = {T1*1e6:.0f} us)", color="tab:blue")
ax1.plot(t * 1e6, coherence, label=f"T2 decay (T2 = {T2*1e6:.0f} us)", color="tab:red")
ax1.axhline(y=1/np.e, color="gray", linestyle="--", alpha=0.5, label="1/e threshold")

# Mark circuit depths
for depth, color in [(100, "green"), (500, "orange"), (2000, "red")]:
    gate_time = depth * t_gate * 1e6
    ax1.axvline(x=gate_time, color=color, linestyle=":", alpha=0.7, label=f"{depth} gates = {gate_time:.1f} us")

ax1.set_xlabel("Time (microseconds)")
ax1.set_ylabel("Survival probability / Coherence")
ax1.set_title("Qubit Decoherence vs Circuit Depth")
ax1.legend(loc="upper right")
ax1.set_xlim(0, 400)
ax1.set_ylim(0, 1.05)
plt.tight_layout()
plt.savefig("coherence_decay.png", dpi=150)
plt.show()

A circuit with 2000 gates at 50 ns each takes 100 microseconds, which is comparable to T2. At that point, phase information is largely gone. This is why circuit depth is the primary bottleneck on NISQ hardware.

Noise Channel Composition

In practice, a qubit experiences multiple noise sources simultaneously during each gate. For a single gate with duration t_gate on a qubit with relaxation times T1 and T2, the combined noise channel is the composition of amplitude damping and phase damping with parameters derived from the physical timescales:

Amplitude damping parameter: gamma = 1 - exp(-t_gate / T1)
Phase damping parameter: lambda = 1 - exp(-t_gate / T2 + t_gate / (2 * T1))

The phase damping parameter accounts for the fact that amplitude damping already causes some dephasing (contributing T1/2 to T2), so the additional pure dephasing rate is 1/T2 - 1/(2*T1).

The following code applies these channels individually and in combination to a qubit in the |+> state, then compares the resulting density matrices:

import numpy as np

# Physical parameters
T1 = 100e-6     # 100 microseconds
T2 = 80e-6      # 80 microseconds
t_gate = 50e-9   # 50 nanoseconds

# Derived noise parameters
gamma = 1 - np.exp(-t_gate / T1)      # amplitude damping
lam = 1 - np.exp(-t_gate / T2 + t_gate / (2 * T1))  # pure dephasing

print(f"Amplitude damping gamma = {gamma:.6e}")
print(f"Phase damping lambda    = {lam:.6e}")

# Initial state: |+> = (|0> + |1>) / sqrt(2)
rho = np.array([[0.5, 0.5],
                [0.5, 0.5]], dtype=complex)

def amplitude_damp(rho, gamma):
    """Apply amplitude damping channel using Kraus operators."""
    K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]])
    K1 = np.array([[0, np.sqrt(gamma)], [0, 0]])
    return K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

def phase_damp(rho, lam):
    """Apply phase damping channel using Kraus operators."""
    K0 = np.array([[1, 0], [0, np.sqrt(1 - lam)]])
    K1 = np.array([[0, 0], [0, np.sqrt(lam)]])
    return K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

# Apply channels individually and in combination
rho_ad = amplitude_damp(rho, gamma)
rho_pd = phase_damp(rho, lam)
rho_combined = phase_damp(amplitude_damp(rho, gamma), lam)

print(f"\nInitial state |+>:")
print(f"  rho_01 (coherence) = {rho[0, 1]:.6f}")

print(f"\nAfter amplitude damping only:")
print(f"  rho_01 = {rho_ad[0, 1]:.6f}  (coherence reduced by {1 - abs(rho_ad[0, 1]) / abs(rho[0, 1]):.2e})")

print(f"\nAfter phase damping only:")
print(f"  rho_01 = {rho_pd[0, 1]:.6f}  (coherence reduced by {1 - abs(rho_pd[0, 1]) / abs(rho[0, 1]):.2e})")

print(f"\nAfter combined (amplitude then phase):")
print(f"  rho_01 = {rho_combined[0, 1]:.6f}  (coherence reduced by {1 - abs(rho_combined[0, 1]) / abs(rho[0, 1]):.2e})")

# After 1000 gates, the accumulated effect is significant
rho_1000 = rho.copy()
for _ in range(1000):
    rho_1000 = phase_damp(amplitude_damp(rho_1000, gamma), lam)

print(f"\nAfter 1000 gates:")
print(f"  rho_01 = {rho_1000[0, 1]:.6f}")
print(f"  Population |1>: {rho_1000[1, 1]:.6f}")

For a single gate at 50 ns, both gamma and lambda are very small (on the order of 5e-7 and 3e-7 respectively). But after 1000 gates, the accumulated decoherence is measurable, and after 10,000 gates it dominates.

Simulating Noise with PennyLane

PennyLane’s default.mixed device simulates noise by working with density matrices instead of pure state vectors. Here is a basic example showing depolarizing noise on a Bell state:

import pennylane as qml
from pennylane import numpy as np

# default.mixed supports noise channels
dev_noisy = qml.device("default.mixed", wires=2)

@qml.qnode(dev_noisy)
def noisy_bell(p_depol):
    qml.Hadamard(wires=0)
    qml.CNOT(wires=[0, 1])

    # Apply depolarizing noise after the gates
    qml.DepolarizingChannel(p_depol, wires=0)
    qml.DepolarizingChannel(p_depol, wires=1)

    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

# Noiseless: ZZ correlation should be 1.0 for a Bell state
print(f"p=0.00: {noisy_bell(0.00):.4f}")
print(f"p=0.01: {noisy_bell(0.01):.4f}")
print(f"p=0.05: {noisy_bell(0.05):.4f}")
print(f"p=0.20: {noisy_bell(0.20):.4f}")

# p=0.00: 1.0000
# p=0.01: 0.9604
# p=0.05: 0.8100
# p=0.20: 0.3600

Notice how rapidly the ZZ correlation degrades. At p=0.05, nearly 20% of the signal is lost. On real hardware, effective error rates per gate are often in this range.

All PennyLane Noise Channels

PennyLane provides a comprehensive set of noise channels for the default.mixed device. Here is the full list with constructor parameters and examples.

qml.DepolarizingChannel(p, wires): replaces the qubit state with the maximally mixed state with probability p. The most common noise model for benchmarking.

qml.DepolarizingChannel(0.01, wires=0)  # 1% depolarizing error

qml.AmplitudeDamping(gamma, wires): models energy relaxation (T1 decay). The qubit decays from |1> to |0> with probability gamma.

qml.AmplitudeDamping(0.02, wires=0)  # 2% amplitude damping

qml.PhaseDamping(gamma, wires): models pure dephasing (T2 decay beyond what T1 causes). Coherences decay without energy loss.

qml.PhaseDamping(0.03, wires=0)  # 3% phase damping

qml.ThermalRelaxationError(pe, t1, t2, tg, wires): combines amplitude damping and phase damping using physical parameters directly. pe is the excited state population at thermal equilibrium, t1 is T1, t2 is T2, and tg is the gate time.

qml.ThermalRelaxationError(pe=0.01, t1=100e-6, t2=80e-6, tg=50e-9, wires=0)

qml.BitFlip(p, wires): flips the qubit (|0> to |1> and vice versa) with probability p. Often used to model readout errors.

qml.BitFlip(0.05, wires=0)  # 5% bit flip probability

qml.PhaseFlip(p, wires): applies a Z gate with probability p. Models random phase kicks.

qml.PhaseFlip(0.02, wires=0)  # 2% phase flip probability

qml.QubitChannel(K_list, wires): applies a custom noise channel defined by a list of Kraus operators. This is the most flexible option, letting you implement any completely positive trace-preserving (CPTP) map.

# Custom channel: 90% identity, 10% X error
import numpy as np

K0 = np.sqrt(0.9) * np.eye(2)
K1 = np.sqrt(0.1) * np.array([[0, 1], [1, 0]])
qml.QubitChannel([K0, K1], wires=0)

The Kraus operators must satisfy the completeness relation: the sum of K_i^dagger @ K_i over all i must equal the identity matrix. PennyLane validates this at runtime.

Here is a complete example using QubitChannel to implement a custom asymmetric depolarizing channel where X, Y, and Z errors have different probabilities:

import pennylane as qml
import numpy as np

dev = qml.device("default.mixed", wires=1)

# Asymmetric depolarizing: different rates for X, Y, Z errors
p_x, p_y, p_z = 0.02, 0.005, 0.01
p_id = 1 - p_x - p_y - p_z

K_I = np.sqrt(p_id) * np.eye(2)
K_X = np.sqrt(p_x) * np.array([[0, 1], [1, 0]])
K_Y = np.sqrt(p_y) * np.array([[0, -1j], [1j, 0]])
K_Z = np.sqrt(p_z) * np.array([[1, 0], [0, -1]])

# Verify completeness: sum of K_i^dag @ K_i = I
completeness = sum(K.conj().T @ K for K in [K_I, K_X, K_Y, K_Z])
print(f"Completeness check (should be identity):\n{completeness}")

@qml.qnode(dev)
def custom_noise_circuit():
    qml.Hadamard(wires=0)
    qml.QubitChannel([K_I, K_X, K_Y, K_Z], wires=0)
    return qml.expval(qml.PauliX(0))

print(f"<X> with asymmetric noise: {custom_noise_circuit():.4f}")
# Without noise, <X> for |+> state would be 1.0

Noise Circuit Insertion Patterns

When building noisy simulations, you need a systematic way to insert noise after gates. PennyLane supports three patterns, from most manual to most automated.

Pattern 1: Manual Insertion

Place noise channels explicitly after each gate. This gives full control but becomes tedious for large circuits.

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.mixed", wires=2)

@qml.qnode(dev)
def manual_noise(params, p=0.01):
    qml.RY(params[0], wires=0)
    qml.DepolarizingChannel(p, wires=0)   # noise after RY

    qml.RY(params[1], wires=1)
    qml.DepolarizingChannel(p, wires=1)   # noise after RY

    qml.CNOT(wires=[0, 1])
    qml.DepolarizingChannel(p, wires=0)   # noise after CNOT on control
    qml.DepolarizingChannel(p, wires=1)   # noise after CNOT on target

    return qml.expval(qml.PauliZ(0))

Pattern 2: Using qml.noise.insert

The qml.noise.insert transform automatically inserts a noise channel after every gate (or after gates of a specified type). This is the cleanest approach for benchmarking. (Older PennyLane releases exposed this transform as qml.transforms.insert; it now lives in the qml.noise module.)

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.mixed", wires=2)

# Define a clean circuit with no noise
@qml.qnode(dev)
def clean_circuit(params):
    qml.RY(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.CNOT(wires=[0, 1])
    qml.RY(params[2], wires=0)
    return qml.expval(qml.PauliZ(0))

# Create a noisy version by inserting depolarizing noise after every gate
noisy_circuit = qml.noise.insert(
    clean_circuit, qml.DepolarizingChannel, 0.01, position="all"
)

params = np.array([0.5, 0.3, 0.8])
print(f"Clean:  {clean_circuit(params):.4f}")
print(f"Noisy:  {noisy_circuit(params):.4f}")

The position argument controls where noise is inserted: "all" inserts after every gate, "start" inserts before all gates, and "end" inserts after all gates.

Pattern 3: Noise Model via Transform Decorator

You can also build a reusable noise model as a transform and apply it to any circuit:

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.mixed", wires=2)

def make_noisy_qnode(qnode, p_single=0.001, p_two=0.01):
    """Wrap a qnode with a realistic noise model.

    Two-qubit gates get 10x higher noise than single-qubit gates,
    matching typical hardware error rates.
    """
    # Insert single-qubit noise after all operations
    noisy = qml.noise.insert(
        qnode, qml.DepolarizingChannel, p_two, position="all"
    )
    return noisy

@qml.qnode(dev)
def my_circuit(params):
    qml.RY(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.CNOT(wires=[0, 1])
    qml.RY(params[2], wires=0)
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

noisy_qnode = make_noisy_qnode(my_circuit)
params = np.array([0.5, 0.3, 0.8])
print(f"Noisy result: {noisy_qnode(params):.4f}")

How Noise Degrades VQE

Noise has a compounding effect on variational algorithms. Each gate adds error, so deeper circuits accumulate more noise. This makes the cost landscape shallower and can prevent convergence to the true ground state.

dev_ideal = qml.device("default.qubit", wires=2)
dev_noisy = qml.device("default.mixed", wires=2)

# Simple two-qubit parameterized circuit
def ansatz_circuit(params, wires):
    qml.RY(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.CNOT(wires=[0, 1])
    qml.RY(params[2], wires=0)

H = qml.Hamiltonian([1.0, 0.5, -0.25], [qml.PauliZ(0), qml.PauliZ(1), qml.PauliZ(0) @ qml.PauliZ(1)])

@qml.qnode(dev_ideal)
def ideal_cost(params):
    ansatz_circuit(params, wires=[0, 1])
    return qml.expval(H)

@qml.qnode(dev_noisy)
def noisy_cost(params):
    ansatz_circuit(params, wires=[0, 1])
    qml.DepolarizingChannel(0.03, wires=0)
    qml.DepolarizingChannel(0.03, wires=1)
    return qml.expval(H)

params = np.array([0.5, 0.3, 0.8], requires_grad=True)

ideal_val = ideal_cost(params)
noisy_val = noisy_cost(params)
print(f"Ideal expectation value:  {ideal_val:.4f}")
print(f"Noisy expectation value:  {noisy_val:.4f}")
print(f"Error from noise:         {abs(ideal_val - noisy_val):.4f}")

Zero-Noise Extrapolation

Zero-noise extrapolation (ZNE) is a practical error mitigation strategy. The key insight: if you can evaluate the circuit at multiple noise levels, you can fit a curve through those results and extrapolate to where the noise would be zero.

Gate Folding

On real hardware, you cannot directly scale the noise parameter. Instead, you use gate folding to physically increase the noise exposure while preserving the logical operation. Gate folding replaces each gate G with the sequence G, G-dagger, G (written as G * G^dagger * G). This sequence is logically equivalent to G (because G^dagger * G = I, so G * G^dagger * G = G * I = G), but the circuit is now three times deeper, so it accumulates approximately three times the noise.

For a general odd scale factor c = 2k + 1:

c = 1: G (one copy, no folding)
c = 3: G * G^dagger * G (three operations)
c = 5: G * G^dagger * G * G^dagger * G (five operations)

Here is a Python implementation that folds a PennyLane circuit and verifies logical equivalence:

import pennylane as qml
from pennylane import numpy as np

def fold_gates(circuit_fn, scale_factor):
    """Create a folded version of a circuit function.

    Args:
        circuit_fn: A function that applies quantum gates (no return statement).
        scale_factor: Odd integer >= 1. Each gate G becomes G*(G^dag*G)^k
                      where scale_factor = 2k + 1.
    """
    assert scale_factor % 2 == 1 and scale_factor >= 1, "Scale factor must be odd and >= 1"
    k = (scale_factor - 1) // 2

    def folded_circuit(*args, **kwargs):
        # Record the operations by running the circuit function
        circuit_fn(*args, **kwargs)

        # For each fold: apply adjoint then forward
        for _ in range(k):
            qml.adjoint(circuit_fn)(*args, **kwargs)
            circuit_fn(*args, **kwargs)

    return folded_circuit

# Original circuit
def my_ansatz(params):
    qml.RY(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.CNOT(wires=[0, 1])
    qml.RY(params[2], wires=0)

# Verify logical equivalence: folded circuit on noiseless device should give same result
dev_test = qml.device("default.qubit", wires=2)
params = np.array([0.5, 0.3, 0.8])

@qml.qnode(dev_test)
def original_circuit(params):
    my_ansatz(params)
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

@qml.qnode(dev_test)
def folded_3x(params):
    fold_gates(my_ansatz, scale_factor=3)(params)
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

@qml.qnode(dev_test)
def folded_5x(params):
    fold_gates(my_ansatz, scale_factor=5)(params)
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

print(f"Original (1x):   {original_circuit(params):.6f}")
print(f"Folded (3x):     {folded_3x(params):.6f}")
print(f"Folded (5x):     {folded_5x(params):.6f}")
# All three should produce identical results on a noiseless simulator

Now apply gate folding with noise to collect ZNE data points:

dev_noisy = qml.device("default.mixed", wires=2)
H = qml.Hamiltonian([1.0, 0.5, -0.25], [qml.PauliZ(0), qml.PauliZ(1), qml.PauliZ(0) @ qml.PauliZ(1)])

def noisy_ansatz_with_folding(params, scale_factor, p_noise=0.02):
    """Run the ansatz with gate folding and per-gate noise."""
    folded = fold_gates(my_ansatz, scale_factor)

    @qml.qnode(dev_noisy)
    def circuit(params):
        folded(params)
        # Apply noise proportional to the circuit depth
        # In a more realistic simulation, noise would be inserted after each gate
        # using qml.noise.insert
        for wire in range(2):
            qml.DepolarizingChannel(p_noise * scale_factor, wires=wire)
        return qml.expval(H)

    return float(circuit(params))

# Evaluate at scale factors 1, 3, 5
scale_factors = [1, 3, 5]
noisy_expectations = [noisy_ansatz_with_folding(params, s) for s in scale_factors]

print("Noisy results at different folding levels:")
for s, e in zip(scale_factors, noisy_expectations):
    print(f"  Scale {s}x: {e:.4f}")

Linear and Polynomial Extrapolation

With the noisy data points in hand, fit a curve and extrapolate to zero noise:

import numpy as np

# Evaluate at noise scale factors 1x, 2x, 3x (using noise scaling for simplicity)
def get_expectation_at_noise_scale(params, scale_factor, base_noise=0.02):
    scaled_noise = base_noise * scale_factor

    @qml.qnode(dev_noisy)
    def scaled_circuit(params):
        my_ansatz(params)
        qml.DepolarizingChannel(scaled_noise, wires=0)
        qml.DepolarizingChannel(scaled_noise, wires=1)
        return qml.expval(H)

    return float(scaled_circuit(params))

scale_factors = [1, 2, 3]
noisy_expectations = [get_expectation_at_noise_scale(params, s) for s in scale_factors]

# Fit a linear extrapolation through the three points
coeffs = np.polyfit(scale_factors, noisy_expectations, deg=1)
zne_result = np.polyval(coeffs, 0)   # evaluate at noise = 0

print(f"\nLinear ZNE estimate:  {zne_result:.4f}")
print(f"Ideal (noiseless):    {float(ideal_cost(params)):.4f}")
print(f"Uncorrected noisy:    {noisy_expectations[0]:.4f}")

Richardson Extrapolation with Error Analysis

Richardson extrapolation fits a polynomial of degree n-1 through n data points. Using more scale factors and higher-degree polynomials can improve accuracy, but higher-degree fits are more sensitive to noise in the data points (overfitting). The following code compares polynomial fits of degree 1, 2, and 4, and uses bootstrap resampling to estimate the uncertainty of each extrapolation:

import numpy as np
import matplotlib.pyplot as plt

# Collect data at 5 noise scale factors
scale_factors = [1, 2, 3, 4, 5]
base_noise = 0.02
params = np.array([0.5, 0.3, 0.8])

# Simulate multiple shots to get error bars
n_repetitions = 100
all_results = np.zeros((n_repetitions, len(scale_factors)))

dev_shots = qml.device("default.mixed", wires=2, shots=1000)

for i, sf in enumerate(scale_factors):
    scaled_noise = base_noise * sf

    @qml.qnode(dev_shots)
    def shot_circuit(params):
        my_ansatz(params)
        qml.DepolarizingChannel(scaled_noise, wires=0)
        qml.DepolarizingChannel(scaled_noise, wires=1)
        return qml.expval(H)

    for rep in range(n_repetitions):
        all_results[rep, i] = float(shot_circuit(params))

# Mean and standard deviation at each noise level
means = all_results.mean(axis=0)
stds = all_results.std(axis=0)

print("Noisy evaluations:")
for sf, m, s in zip(scale_factors, means, stds):
    print(f"  Scale {sf}x: {m:.4f} +/- {s:.4f}")

# Richardson extrapolation with bootstrap error analysis
def richardson_extrapolation(sf, ev, deg):
    """Fit polynomial of given degree and extrapolate to noise=0."""
    coeffs = np.polyfit(sf, ev, deg=deg)
    return np.polyval(coeffs, 0)

# Bootstrap: resample the noisy evaluations 1000 times
n_bootstrap = 1000
degrees = [1, 2, 4]
bootstrap_estimates = {deg: [] for deg in degrees}

for _ in range(n_bootstrap):
    # Resample: pick a random repetition for each scale factor
    boot_indices = np.random.randint(0, n_repetitions, size=len(scale_factors))
    boot_values = [all_results[boot_indices[i], i] for i in range(len(scale_factors))]

    for deg in degrees:
        try:
            est = richardson_extrapolation(scale_factors, boot_values, deg)
            bootstrap_estimates[deg].append(est)
        except np.RankWarning:
            pass

ideal_val = float(ideal_cost(params))
print(f"\nIdeal (noiseless): {ideal_val:.4f}")
print(f"Uncorrected (1x):  {means[0]:.4f}")
print(f"\nExtrapolated estimates (mean +/- std):")
for deg in degrees:
    est = np.array(bootstrap_estimates[deg])
    print(f"  Degree {deg}: {est.mean():.4f} +/- {est.std():.4f}")

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
ax.errorbar(scale_factors, means, yerr=stds, fmt='ko', capsize=5, label="Noisy evaluations")

x_fit = np.linspace(0, 5.5, 100)
colors = ["tab:blue", "tab:orange", "tab:green"]
for deg, color in zip(degrees, colors):
    coeffs = np.polyfit(scale_factors, means, deg=deg)
    y_fit = np.polyval(coeffs, x_fit)
    est_mean = np.mean(bootstrap_estimates[deg])
    est_std = np.std(bootstrap_estimates[deg])
    ax.plot(x_fit, y_fit, color=color, label=f"Degree {deg} (ZNE = {est_mean:.3f} +/- {est_std:.3f})")

ax.axhline(y=ideal_val, color='red', linestyle='--', label=f"Ideal = {ideal_val:.4f}")
ax.set_xlabel("Noise scale factor")
ax.set_ylabel("Expectation value")
ax.set_title("Zero-Noise Extrapolation: Polynomial Fit Comparison")
ax.legend()
ax.set_xlim(-0.5, 5.5)
plt.tight_layout()
plt.savefig("zne_comparison.png", dpi=150)
plt.show()

The degree-1 (linear) fit is the most robust but assumes the noise dependence is linear, which is only approximately true. Higher-degree fits can capture nonlinear behavior but amplify noise in the data. In practice, linear or quadratic extrapolation is the best tradeoff for most circuits.

Symmetry Verification

Symmetry verification is a simple, zero-overhead mitigation technique. If your circuit should conserve a known symmetry (for example, particle number or parity), you can detect errors by checking whether measurement outcomes satisfy the symmetry. Shots that violate the expected symmetry are discarded as corrupted.

For a Bell state |00> + |11>, the two qubits always have the same value. Any shot that produces 01 or 10 indicates an error occurred. Discarding these shots filters out a fraction of the errors at the cost of reduced sample size.

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.mixed", wires=2, shots=10000)

@qml.qnode(dev)
def noisy_bell_samples(p_depol):
    """Create a Bell state with depolarizing noise and return samples."""
    qml.Hadamard(wires=0)
    qml.CNOT(wires=[0, 1])
    qml.DepolarizingChannel(p_depol, wires=0)
    qml.DepolarizingChannel(p_depol, wires=1)
    return qml.sample()

# Run the noisy circuit
p_noise = 0.05
samples = noisy_bell_samples(p_noise)  # shape: (10000, 2)

# Compute ZZ correlator without filtering
zz_raw = np.mean([(-1)**s[0] * (-1)**s[1] for s in samples])

# Apply parity filtering: keep only shots where both qubits agree
parity_mask = samples[:, 0] == samples[:, 1]
filtered_samples = samples[parity_mask]

# Compute ZZ correlator after filtering
zz_filtered = np.mean([(-1)**s[0] * (-1)**s[1] for s in filtered_samples])

# For the Bell state |00> + |11>, the ideal ZZ correlator is +1
print(f"Total shots:     {len(samples)}")
print(f"Kept after filter: {len(filtered_samples)} ({100*len(filtered_samples)/len(samples):.1f}%)")
print(f"Discarded:         {len(samples) - len(filtered_samples)}")
print(f"\nZZ correlator (raw):      {zz_raw:.4f}")
print(f"ZZ correlator (filtered): {zz_filtered:.4f}")
print(f"Ideal ZZ correlator:      1.0000")

The filtered result is closer to the ideal value because the 01 and 10 outcomes (which have ZZ = -1) are precisely the ones caused by single-qubit errors. The cost is a reduced shot count, but for moderate noise levels, the improvement in accuracy more than compensates.

Symmetry verification works best when:

The circuit has a clear symmetry that errors break
The noise rate is moderate (if most shots are discarded, the variance becomes too large)
You are estimating expectation values of observables that commute with the symmetry

Clifford Data Regression (CDR)

Clifford Data Regression is a more sophisticated mitigation technique that learns a correction model from training data. The core idea: Clifford circuits (circuits composed of H, S, CNOT, and Pauli gates) are classically simulable, so you can compute their exact expectation values. By running Clifford training circuits on the noisy hardware and comparing the noisy results to the known ideal results, you can fit a linear model that maps noisy expectation values to corrected ones.

The CDR workflow:

Start with your target circuit containing non-Clifford gates (e.g., Ry(0.3), Rz(1.7))
Generate training circuits by replacing non-Clifford gates with nearby Clifford gates (e.g., replace Ry(0.3) with Ry(0) or Ry(pi/2))
Compute the ideal expectation value for each training circuit (classically, since they are Clifford)
Run each training circuit on the noisy device to get noisy expectation values
Fit a linear model: E_ideal = a * E_noisy + b
Apply the linear correction to the noisy result of the original circuit

import pennylane as qml
from pennylane import numpy as np

dev_ideal = qml.device("default.qubit", wires=3)
dev_noisy = qml.device("default.mixed", wires=3)

# Target VQE circuit with non-Clifford rotation angles
target_params = np.array([0.3, 1.7, 0.8, 2.1, 0.5, 1.2])

def vqe_circuit(params):
    """3-qubit VQE ansatz with parameterized rotations."""
    qml.RY(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.RY(params[2], wires=2)
    qml.CNOT(wires=[0, 1])
    qml.CNOT(wires=[1, 2])
    qml.RY(params[3], wires=0)
    qml.RY(params[4], wires=1)
    qml.RY(params[5], wires=2)

H_vqe = qml.Hamiltonian(
    [1.0, 0.5, 0.5, -0.25],
    [qml.PauliZ(0), qml.PauliZ(1), qml.PauliZ(2), qml.PauliZ(0) @ qml.PauliZ(2)]
)

@qml.qnode(dev_ideal)
def ideal_vqe(params):
    vqe_circuit(params)
    return qml.expval(H_vqe)

@qml.qnode(dev_noisy)
def noisy_vqe(params):
    vqe_circuit(params)
    for w in range(3):
        qml.DepolarizingChannel(0.02, wires=w)
    return qml.expval(H_vqe)

# Step 1: Generate Clifford training circuits
# Replace each non-Clifford angle with the nearest multiple of pi/2
clifford_angles = [0, np.pi/2, np.pi, 3*np.pi/2]

def nearest_clifford(angle):
    """Round an angle to the nearest Clifford rotation (multiple of pi/2)."""
    return min(clifford_angles, key=lambda c: abs(angle - c))

n_training = 20
rng = np.random.default_rng(42)
training_params_list = []

for _ in range(n_training):
    # For each training circuit, randomly choose which gates to make Clifford
    train_params = target_params.copy()
    for i in range(len(train_params)):
        # Randomly pick a Clifford angle for each gate
        train_params[i] = rng.choice(clifford_angles)
    training_params_list.append(train_params)

# Step 2: Compute ideal and noisy values for training circuits
ideal_train = np.array([float(ideal_vqe(p)) for p in training_params_list])
noisy_train = np.array([float(noisy_vqe(p)) for p in training_params_list])

# Step 3: Fit the linear model: E_ideal = a * E_noisy + b
A = np.vstack([noisy_train, np.ones(len(noisy_train))]).T
a, b = np.linalg.lstsq(A, ideal_train, rcond=None)[0]

print(f"CDR linear model: E_ideal = {a:.4f} * E_noisy + {b:.4f}")

# Step 4: Apply correction to the target circuit
noisy_target = float(noisy_vqe(target_params))
ideal_target = float(ideal_vqe(target_params))
cdr_corrected = a * noisy_target + b

print(f"\nTarget circuit results:")
print(f"  Ideal:          {ideal_target:.4f}")
print(f"  Noisy:          {noisy_target:.4f}")
print(f"  CDR corrected:  {cdr_corrected:.4f}")
print(f"  Noisy error:    {abs(noisy_target - ideal_target):.4f}")
print(f"  CDR error:      {abs(cdr_corrected - ideal_target):.4f}")

CDR is particularly effective when the noise channel acts linearly on the expectation values, which is often a good approximation for moderate noise levels. Unlike ZNE, CDR does not require amplifying the noise, which makes it better suited for circuits that are already deep.

Readout Error Mitigation

Readout errors flip measured 0s to 1s and vice versa. The correction requires a calibration matrix: prepare each computational basis state, measure it many times, and record the confusion probabilities. For a single qubit this gives a 2x2 matrix A where column j contains the measured distribution when state j was prepared.

# Build a 1-qubit readout calibration matrix using BitFlip channel
dev_cal = qml.device("default.mixed", wires=1)
p_flip = 0.05   # 5% readout error rate

@qml.qnode(dev_cal)
def measure_state(bit):
    if bit == 1:
        qml.PauliX(wires=0)
    qml.BitFlip(p_flip, wires=0)
    return qml.probs(wires=0)

col0 = measure_state(0)   # [0.95, 0.05]
col1 = measure_state(1)   # [0.05, 0.95]
A = np.array([[col0[0], col1[0]], [col0[1], col1[1]]])
A_inv = np.linalg.inv(A)  # apply A_inv to any measured distribution to correct it
print("Calibration matrix:\n", A)
print("Inverse (correction matrix):\n", A_inv)

# Example correction: measure a state and correct the distribution
@qml.qnode(dev_cal)
def noisy_measurement():
    qml.RY(1.2, wires=0)  # prepare a non-trivial state
    qml.BitFlip(p_flip, wires=0)
    return qml.probs(wires=0)

measured_probs = np.array(noisy_measurement())
corrected_probs = A_inv @ measured_probs

print(f"\nMeasured probabilities:  {measured_probs}")
print(f"Corrected probabilities: {corrected_probs}")

For multi-qubit systems, the full calibration matrix has dimensions 2^n by 2^n, which grows exponentially. For systems with more than about 10 qubits, use tensor product structure (assuming readout errors are independent across qubits) to keep the calibration cost linear.

Noise-Aware Circuit Design

Before applying post-hoc mitigation, reduce noise at the circuit design level. These techniques are free in terms of shot budget and often provide the largest improvement.

Minimize Two-Qubit Gates

Two-qubit gates (CNOT, CZ) have error rates 5 to 10 times higher than single-qubit gates on current hardware. Reducing the two-qubit gate count is the single most effective way to lower circuit noise.

import pennylane as qml
from pennylane import numpy as np

dev_noisy = qml.device("default.mixed", wires=3)

# Original VQE ansatz: 6 CNOT gates
@qml.qnode(dev_noisy)
def deep_ansatz(params):
    # Layer 1: full entangling layer (3 CNOTs)
    qml.RY(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.RY(params[2], wires=2)
    qml.CNOT(wires=[0, 1])
    qml.CNOT(wires=[1, 2])
    qml.CNOT(wires=[2, 0])

    # Layer 2: another full entangling layer (3 CNOTs)
    qml.RY(params[3], wires=0)
    qml.RY(params[4], wires=1)
    qml.RY(params[5], wires=2)
    qml.CNOT(wires=[0, 1])
    qml.CNOT(wires=[1, 2])
    qml.CNOT(wires=[2, 0])

    # Noise after each CNOT (simplified: apply at end)
    for w in range(3):
        qml.DepolarizingChannel(0.02, wires=w)

    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1) @ qml.PauliZ(2))

# Optimized ansatz: 3 CNOT gates (linear connectivity, single layer)
@qml.qnode(dev_noisy)
def shallow_ansatz(params):
    # Single layer with linear entanglement (3 CNOTs instead of 6)
    qml.RY(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.RY(params[2], wires=2)
    qml.CNOT(wires=[0, 1])
    qml.CNOT(wires=[1, 2])
    # Use extra rotation parameters to compensate for reduced entanglement
    qml.RY(params[3], wires=0)
    qml.RY(params[4], wires=1)
    qml.RY(params[5], wires=2)
    qml.CNOT(wires=[1, 0])

    for w in range(3):
        qml.DepolarizingChannel(0.02, wires=w)

    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1) @ qml.PauliZ(2))

params = np.array([0.5, 0.3, 0.8, 0.2, 0.6, 0.4])

deep_result = deep_ansatz(params)
shallow_result = shallow_ansatz(params)

print(f"Deep ansatz (6 CNOTs):    {deep_result:.4f}")
print(f"Shallow ansatz (3 CNOTs): {shallow_result:.4f}")
print(f"The shallow circuit has less noise-induced bias")

Echo Sequences for Systematic Error Cancellation

If a gate has a small systematic over-rotation (e.g., a Pauli-X gate that actually rotates by pi + epsilon), you can cancel the systematic component by sandwiching the circuit between X gates. The idea: apply X, then the circuit, then X. The systematic errors from the two X gates partially cancel.

Adjacent Gate Cancellation

If your circuit contains a gate G immediately followed by its inverse G-dagger, the pair can be removed entirely. This happens more often than you might expect after circuit transpilation or when composing parameterized layers.

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

# Before optimization: redundant gates
@qml.qnode(dev)
def unoptimized(params):
    qml.CNOT(wires=[0, 1])
    qml.CNOT(wires=[0, 1])  # this cancels the previous CNOT
    qml.RY(params[0], wires=0)
    qml.RY(-params[0], wires=0)  # this cancels the previous RY
    qml.RZ(params[1], wires=1)
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

# After optimization: only the non-canceling gates remain
@qml.qnode(dev)
def optimized(params):
    qml.RZ(params[1], wires=1)
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

params = np.array([0.5, 0.8])
print(f"Unoptimized: {unoptimized(params):.6f}")
print(f"Optimized:   {optimized(params):.6f}")
# Results should be identical, but the optimized version runs faster
# and accumulates less noise on real hardware

Shot-Based Gradient Estimation Under Noise

In variational algorithms, gradients are estimated using the parameter-shift rule. For a gate Ry(theta), the gradient of the expectation value is:

dE/dtheta = (E(theta + pi/2) - E(theta - pi/2)) / 2

Each evaluation E(theta +/- pi/2) is estimated from a finite number of shots. The variance of a single expectation value estimated from N_shots is approximately Var(O) / N_shots, where Var(O) is the variance of the observable. For a Pauli observable, Var(O) <= 1.

The gradient estimate, being the difference of two such estimates divided by 2, has variance approximately Var(O) / N_shots (the factor of 2 in the denominator and the sum of two variances in the numerator cancel).

Hardware noise adds an additional bias to the gradient estimate (the noisy gradient points in a slightly different direction than the ideal gradient), but does not significantly change the variance for moderate noise levels.

import pennylane as qml
from pennylane import numpy as np
import matplotlib.pyplot as plt

dev = qml.device("default.mixed", wires=1, shots=1000)

@qml.qnode(dev)
def noisy_expval(theta, p_noise=0.02):
    qml.RY(theta, wires=0)
    qml.DepolarizingChannel(p_noise, wires=0)
    return qml.expval(qml.PauliZ(0))

def parameter_shift_gradient(theta, p_noise=0.02):
    """Compute gradient using the parameter-shift rule."""
    shift = np.pi / 2
    return (float(noisy_expval(theta + shift, p_noise)) -
            float(noisy_expval(theta - shift, p_noise))) / 2

# Compute the gradient 100 times to see the distribution
theta = 0.8
n_trials = 100
gradients = [parameter_shift_gradient(theta) for _ in range(n_trials)]

# Theoretical prediction for the standard deviation
# For PauliZ, Var(O) = 1 - <Z>^2. For the gradient, variance is ~ 2*Var/(4*N_shots) = Var/(2*N_shots)
# Simplified: std of gradient ~ 1/sqrt(N_shots) for a Pauli observable
N_shots = 1000
theoretical_std = 1.0 / np.sqrt(N_shots)  # approximate upper bound

print(f"Gradient at theta={theta}:")
print(f"  Mean:            {np.mean(gradients):.4f}")
print(f"  Empirical std:   {np.std(gradients):.4f}")
print(f"  Theoretical std: {theoretical_std:.4f} (upper bound)")

# Compute noiseless gradient for comparison
dev_exact = qml.device("default.qubit", wires=1)

@qml.qnode(dev_exact)
def exact_expval(theta):
    qml.RY(theta, wires=0)
    return qml.expval(qml.PauliZ(0))

exact_gradient = (float(exact_expval(theta + np.pi/2)) -
                  float(exact_expval(theta - np.pi/2))) / 2
print(f"  Exact gradient:  {exact_gradient:.4f}")
print(f"  Noise bias:      {abs(np.mean(gradients) - exact_gradient):.4f}")

# Plot the distribution
plt.figure(figsize=(8, 4))
plt.hist(gradients, bins=20, density=True, alpha=0.7, color="steelblue")
plt.axvline(x=exact_gradient, color="red", linestyle="--", label=f"Exact = {exact_gradient:.3f}")
plt.axvline(x=np.mean(gradients), color="black", linestyle="-", label=f"Mean = {np.mean(gradients):.3f}")
plt.xlabel("Gradient estimate")
plt.ylabel("Density")
plt.title("Distribution of Parameter-Shift Gradients (1000 shots, 2% noise)")
plt.legend()
plt.tight_layout()
plt.savefig("gradient_distribution.png", dpi=150)
plt.show()

The key takeaway: shot noise and hardware noise both affect gradient estimates. Increasing the shot count reduces variance but not noise-induced bias. To correct the bias, you need error mitigation on the expectation values before computing the gradient.

PennyLane + Mitiq Integration

Mitiq is a dedicated error mitigation library developed by the Unitary Foundation. It provides battle-tested implementations of ZNE (with multiple folding strategies), probabilistic error cancellation, and Clifford data regression. PennyLane integrates with Mitiq through the built-in qml.mitigate_with_zne transform, which accepts Mitiq’s folding functions and extrapolation factories directly.

# pip install mitiq pennylane

import pennylane as qml
from pennylane import numpy as np
from mitiq.zne.scaling import fold_global
from mitiq.zne.inference import RichardsonFactory

dev_noisy = qml.device("default.mixed", wires=2)
dev_ideal = qml.device("default.qubit", wires=2)

H = qml.Hamiltonian(
    [1.0, 0.5, -0.25],
    [qml.PauliZ(0), qml.PauliZ(1), qml.PauliZ(0) @ qml.PauliZ(1)]
)

params = np.array([0.5, 0.3, 0.8])

@qml.qnode(dev_noisy)
def noisy_circuit(params):
    qml.RY(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.CNOT(wires=[0, 1])
    qml.RY(params[2], wires=0)
    qml.DepolarizingChannel(0.02, wires=0)
    qml.DepolarizingChannel(0.02, wires=1)
    return qml.expval(H)

@qml.qnode(dev_ideal)
def ideal_circuit(params):
    qml.RY(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.CNOT(wires=[0, 1])
    qml.RY(params[2], wires=0)
    return qml.expval(H)

# PennyLane's qml.mitigate_with_zne transform plugs Mitiq's tools into a
# QNode. On hardware (or a hardware-style noisy device), it folds the gates
# (fold_global) and extrapolates (RichardsonFactory.extrapolate):
#
#   mitigated = qml.mitigate_with_zne(
#       circuit_on_noisy_device,
#       scale_factors=[1, 2, 3],
#       folding=fold_global,
#       extrapolate=RichardsonFactory.extrapolate,
#   )
#
# Gate folding cannot amplify the explicit channel ops in our simulated
# circuit (channels have no adjoint), so for this simulation we compute
# results at each noise level ourselves and extrapolate the same way
# Mitiq's RichardsonFactory does:
scale_factors = [1.0, 2.0, 3.0]
results_at_scales = []
for sf in scale_factors:
    scaled_noise = 0.02 * sf

    @qml.qnode(dev_noisy)
    def scaled_circuit(params, p=scaled_noise):
        qml.RY(params[0], wires=0)
        qml.RY(params[1], wires=1)
        qml.CNOT(wires=[0, 1])
        qml.RY(params[2], wires=0)
        qml.DepolarizingChannel(p, wires=0)
        qml.DepolarizingChannel(p, wires=1)
        return qml.expval(H)

    results_at_scales.append(float(scaled_circuit(params)))

# Use numpy for Richardson extrapolation (same as Mitiq does internally)
coeffs = np.polyfit(scale_factors, results_at_scales, deg=len(scale_factors) - 1)
manual_zne = np.polyval(coeffs, 0)

ideal_val = float(ideal_circuit(params))
noisy_val = results_at_scales[0]

print(f"Ideal value:       {ideal_val:.4f}")
print(f"Noisy (1x):        {noisy_val:.4f}")
print(f"Manual ZNE:        {manual_zne:.4f}")
print(f"Noisy error:       {abs(noisy_val - ideal_val):.4f}")
print(f"ZNE error:         {abs(manual_zne - ideal_val):.4f}")

Mitiq provides additional folding strategies beyond global folding: fold_gates_at_random randomly selects individual gates to fold (which can reduce the variance of the folded circuit), and fold_gates_from_left/fold_gates_from_right fold gates in order (useful when noise varies across the circuit). For production use, Mitiq’s implementation is more robust than a hand-rolled version.

Common Mistakes

Five pitfalls that trip up practitioners regularly:

1. Using default.qubit instead of default.mixed for noise simulation. The default.qubit device is a statevector simulator that ignores all noise channels. If you add qml.DepolarizingChannel to a circuit running on default.qubit, PennyLane will raise an error or silently produce incorrect results. Always use default.mixed when your circuit includes any noise operations.

# WRONG: noise channels are not supported on default.qubit
dev_wrong = qml.device("default.qubit", wires=2)

# CORRECT: use default.mixed for noisy simulation
dev_correct = qml.device("default.mixed", wires=2)

2. Scaling the noise parameter instead of folding gates. In simulation, it is tempting to scale the depolarizing probability directly (e.g., multiply p by a scale factor). This is a useful approximation for quick experiments, but it does not correspond to what happens on real hardware. On a real device, you cannot change the noise rate; you can only change the circuit depth. Gate folding (G * G^dagger * G) is the physically correct approach. If you build your mitigation pipeline using parameter scaling, it will not transfer to hardware execution.

3. Applying ZNE to a circuit with too much noise. ZNE requires that the expectation value varies smoothly as a function of the noise level. For very noisy circuits, the signal is buried in noise and the noisy expectation values are nearly constant regardless of the scale factor. In this regime, extrapolation amplifies noise rather than canceling it. A useful rule of thumb: check that the difference between the 1x and 3x noise results is at least 3 times larger than the shot noise. If the noisy values are all within the error bars of each other, ZNE will not help.

4. Not including readout error correction. Gate error mitigation (ZNE, CDR) addresses errors that occur during circuit execution, but readout errors happen at measurement time and are unaffected by these techniques. If you apply ZNE without readout correction, the remaining readout bias can be the dominant error source. Best practice: apply readout error correction first (using the calibration matrix approach), then apply ZNE to the readout-corrected expectation values.

5. Computing gradients with the noisy circuit but optimizing as if noiseless. The noisy cost landscape is flatter and can have shifted minima compared to the ideal landscape. If you use noisy gradients to optimize parameters but evaluate convergence against the ideal cost, you may converge to a point that is a minimum of the noisy landscape but not of the ideal one. For best results, apply error mitigation to each cost function evaluation within the optimization loop. This is more expensive (each gradient step requires running the circuit at multiple noise levels) but produces parameters that are closer to the true optimum.

When Mitigation Helps (and When It Does Not)

Situation	Mitigation useful?
Shallow circuit, low noise	Yes, ZNE works well
Deep circuit, high noise	Limited; signal buried in noise before extrapolation is reliable
Few qubits	Yes, calibration overhead is manageable
Many qubits	Calibration cost scales exponentially for full readout mitigation
Estimating expectation values	Yes, ZNE is designed for this
Sampling from a distribution	No, ZNE does not apply
Circuit has known symmetry	Yes, symmetry verification is free
Non-Clifford gate count is small	CDR works well

The cost of ZNE is roughly 3x the shot budget (for 3 noise levels). The cost of CDR scales with the number of training circuits (typically 20 to 50). Symmetry verification has zero overhead beyond the post-processing step.

What to Try Next

Apply the CDR technique from this tutorial to a molecular Hamiltonian from qml.qchem and compare the corrected ground state energy to the exact value
Combine symmetry verification with ZNE for a doubly mitigated result
Experiment with qml.ThermalRelaxationError to model noise using physical T1/T2 parameters directly, rather than abstract depolarizing rates
Build a noise-adaptive optimizer that adjusts the shot count based on the estimated gradient variance
Try Mitiq’s probabilistic error cancellation (PEC) for circuits with well-characterized noise, where the error channel inverse can be expressed as a quasi-probability distribution over Pauli operations
See the PennyLane Reference for the full list of noise channels in default.mixed