Error Mitigation on Amazon Braket
Apply zero-noise extrapolation and measurement error mitigation techniques on Amazon Braket using local and density matrix simulators.
The Physics of Noise in NISQ Devices
Every operation on a real quantum processor introduces error. Understanding the specific sources and magnitudes of that error is the first step toward mitigating it.
Gate errors are the dominant source for most circuits. Single-qubit gates on modern superconducting processors achieve fidelities of 99.8-99.95%, meaning a 0.05-0.2% chance of error per gate. Two-qubit gates (CNOT, CZ) are significantly worse, with typical fidelities of 99.0-99.5%. That 0.5-1.0% error per two-qubit gate sounds small, but it compounds multiplicatively across a circuit.
Consider a 4-qubit GHZ state preparation with 3 CNOT gates at 99.5% fidelity each: the circuit fidelity is roughly 0.995^3 = 98.5%. Manageable. Now scale to a 20-qubit circuit with 50 two-qubit gates: 0.995^50 = 77.8%. At 99.0% fidelity per gate, that drops to 0.99^50 = 60.5%. The signal is already severely degraded.
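The compounding arithmetic above is easy to sanity-check numerically; a minimal sketch (the helper name is illustrative, not a library function):

```python
# Sanity-check the compounding-fidelity figures quoted above.
def circuit_fidelity(gate_fidelity, n_gates):
    """Approximate circuit fidelity as the product of per-gate fidelities."""
    return gate_fidelity ** n_gates

print(f"{circuit_fidelity(0.995, 3):.3f}")   # 3 CNOTs at 99.5%   -> ~0.985
print(f"{circuit_fidelity(0.995, 50):.3f}")  # 50 gates at 99.5%  -> ~0.778
print(f"{circuit_fidelity(0.99, 50):.3f}")   # 50 gates at 99.0%  -> ~0.605
```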
Decoherence limits how long qubits hold their quantum state. Two timescales matter:
- T1 (energy relaxation): the time for an excited qubit to decay to the ground state. Typical values range from 50 to 300 microseconds on superconducting hardware, and milliseconds to seconds on trapped-ion devices.
- T2 (dephasing): the time for a superposition to lose its phase coherence. T2 is always bounded by 2*T1, and in practice is often shorter due to low-frequency noise. Typical values: 50-200 microseconds on superconducting qubits.
A circuit that takes 10 microseconds to execute on hardware with T2 = 100 microseconds loses roughly 10% of its coherence to dephasing alone, on top of gate errors.
Readout errors occur during measurement. A qubit in state |0> might be reported as “1” with probability 1-5%, and vice versa. These errors are typically asymmetric: the probability of misreading |1> as “0” (due to T1 decay during readout) is usually higher than misreading |0> as “1”. On current IBM and Rigetti hardware, readout error rates of 1-3% per qubit are common.
The compounding problem: for a circuit with g two-qubit gates at fidelity f, n qubits with readout error r, and circuit duration t relative to T2:
- Gate fidelity contribution: f^g
- Coherence contribution: exp(-t/T2) (approximate)
- Readout contribution: (1-r)^n for a single bitstring
These multiply together. A circuit with 20 two-qubit gates at 99.5%, 4 qubits with 2% readout error, and t/T2 = 0.1 gives roughly 0.995^20 * exp(-0.1) * 0.98^4 = 0.905 * 0.905 * 0.922 = 75.5% overall fidelity. Error mitigation techniques recover useful information from these noisy results.
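The three contributions above combine into a single estimate; a small sketch reproducing the worked example (function name is illustrative):

```python
import math

# Combine the gate, coherence, and readout contributions multiplicatively.
def overall_fidelity(f, g, r, n, t_over_t2):
    """Gate fidelity f over g two-qubit gates, readout error r over n qubits,
    decoherence over a circuit lasting t_over_t2 * T2."""
    return (f ** g) * math.exp(-t_over_t2) * ((1 - r) ** n)

print(f"{overall_fidelity(f=0.995, g=20, r=0.02, n=4, t_over_t2=0.1):.3f}")  # ~0.755
```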
Zero-Noise Extrapolation (ZNE)
ZNE estimates the noiseless expectation value by deliberately amplifying noise and then extrapolating back to the zero-noise limit.
Why ZNE Works: The Mathematical Foundation
The core idea rests on a simple observation: the expectation value of an observable is a smooth function of the noise level. If we write the noise scale as lambda (where lambda=1 is the physical noise level), the measured expectation value can be expanded as:
E(lambda) = E_ideal + c1*lambda + c2*lambda^2 + c3*lambda^3 + ...
At lambda=0, the noise vanishes and we recover the ideal expectation value E_ideal. The problem is that we cannot run at lambda=0 on real hardware. What we can do is run at lambda=1 (base noise), lambda=3, lambda=5, and so on, by artificially amplifying the noise via circuit folding. With measurements at multiple noise levels, we fit a model to E(lambda) and read off the intercept at lambda=0.
The key assumption is that the noise scales predictably with the folding factor. Gate folding (described below) triples, quintuples, etc. the number of noisy gates, so the noise contribution scales roughly linearly with the scale factor when the per-gate error rate is small. This assumption breaks down when gate errors are large (above ~5%) or when the circuit is so deep that decoherence dominates over gate errors.
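To see the extrapolation mechanics in isolation, here is a toy model (synthetic numbers, not a Braket run) where E(lambda) is exactly linear; the fit recovers the zero-noise intercept:

```python
import numpy as np

# Toy noise model: E(lambda) = E_ideal + c1 * lambda, measured without shot noise.
E_ideal, c1 = 1.0, -0.05
lambdas = np.array([1.0, 3.0, 5.0])
measured = E_ideal + c1 * lambdas  # [0.95, 0.85, 0.75]

# Fit a line through the measured points and read off the intercept at lambda = 0
slope, intercept = np.polyfit(lambdas, measured, deg=1)
print(f"Extrapolated E(0) = {intercept:.4f}")  # recovers E_ideal = 1.0
```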
The Three Steps of ZNE
- Run the circuit at the base noise level (scale factor 1) to get E(1).
- Amplify the noise by folding the circuit at scale factors 3, 5, 7, etc. and measure E(3), E(5), E(7).
- Fit a model to the (scale_factor, expectation_value) data and extrapolate to scale factor 0.
Gate Folding vs. Pulse Stretching
Two main approaches exist for amplifying noise:
Gate folding replaces each gate G with the sequence G * G_dag * G. Since G_dag * G = I (the identity), the ideal action of the circuit is unchanged, but the physical noise triples because three gates execute instead of one. You can fold the entire circuit (global folding) or fold individual gates selectively (local folding). Global folding at scale factor s appends (s-1)/2 copies of (circuit_adjoint + circuit) to the original circuit.
Pulse stretching slows the physical control pulses that implement each gate, increasing the time qubits spend interacting with the environment. This amplifies decoherence-type noise more naturally than gate folding. However, pulse stretching requires pulse-level access to the hardware (like IBM’s Qiskit Pulse or OQC’s pulse interface), which Amazon Braket does not currently expose for most devices.
For Braket, gate folding is the standard approach. Within gate folding, partial folding targets only the noisiest gates (typically two-qubit gates). This is more efficient because it amplifies the dominant noise source without unnecessarily inflating circuit depth from already-high-fidelity single-qubit gates.
Setting Up a Noisy Simulation
We use Braket’s density matrix simulator to model depolarizing noise.
from braket.circuits import Circuit, Observable
from braket.devices import LocalSimulator
from braket.circuits.noise import Noise
import numpy as np
# Base circuit: 4-qubit GHZ state with ZZ expectation measurement
def base_circuit():
    circ = Circuit()
    circ.h(0)
    circ.cnot(0, 1)
    circ.cnot(1, 2)
    circ.cnot(2, 3)
    return circ

# Add depolarizing noise to every gate
def add_noise(circuit, p):
    noisy = circuit.copy()
    noisy.apply_gate_noise(Noise.Depolarizing(probability=p))
    return noisy
Implementing Circuit Folding
def fold_circuit(circuit, scale_factor):
    """Fold the entire circuit to amplify noise by scale_factor.

    scale_factor must be an odd integer (1, 3, 5, ...).
    Folding appends (circuit_adjoint + circuit) pairs.
    """
    if scale_factor == 1:
        return circuit
    folded = circuit.copy()
    n_folds = (scale_factor - 1) // 2
    for _ in range(n_folds):
        # Append circuit_adjoint then circuit again
        folded.add_circuit(circuit.adjoint())
        folded.add_circuit(circuit)
    return folded
Verbatim Compilation on Braket
When you submit a circuit to a Braket QPU, the service may transpile it: reordering gates, canceling redundant operations, and remapping qubits. This is helpful for general workloads, but it undermines ZNE. If the compiler cancels the G_dag*G pairs you inserted for folding, the noise amplification disappears.
To prevent this, Braket supports verbatim compilation on certain devices (IonQ Aria, Rigetti Ankaa-class processors, and others). Verbatim mode tells the compiler to execute your circuit exactly as specified.
from braket.aws import AwsDevice
from braket.circuits import Circuit
# Select a device that supports verbatim compilation
device = AwsDevice("arn:aws:braket:us-east-1::device/qpu/rigetti/Ankaa-3")
# Build a folded circuit and wrap it in a verbatim box so the compiler
# executes it exactly as written
folded = fold_circuit(base_circuit(), scale_factor=3)
verbatim = Circuit().add_verbatim_box(folded)
# disable_qubit_rewiring prevents the compiler from remapping logical to physical qubits
# The circuit must use native gates for the target device when using verbatim mode
task = device.run(
    verbatim,
    shots=1000,
    disable_qubit_rewiring=True
)
result = task.result()
When using verbatim mode, your circuit must be expressed in the device’s native gate set. For Rigetti devices, that typically means decomposing into RZ, RX(pi/2), and CZ gates. For IonQ devices, the native gates are GPI, GPI2, and MS. You can check the supported gates with:
# Inspect native gate set
props = device.properties
print(props.paradigm.nativeGateSet)
# Check if verbatim mode is supported
print(props.action.get("braket.ir.openqasm.program"))
For local simulation (which we use throughout this tutorial), verbatim mode is not needed because the simulator does not perform transpilation.
Running ZNE with Linear Extrapolation
device = LocalSimulator("density_matrix")
noise_prob = 0.01 # 1% depolarizing per gate
scale_factors = [1, 3, 5]
expectation_values = []
for sf in scale_factors:
    folded = fold_circuit(base_circuit(), sf)
    noisy = add_noise(folded, noise_prob)
    noisy.expectation(Observable.Z() @ Observable.Z(), target=[0, 3])
    task = device.run(noisy, shots=0)  # shots=0 for exact density matrix
    result = task.result()
    ev = result.values[0]
    expectation_values.append(ev)
    print(f"Scale factor {sf}: <Z0 Z3> = {ev:.6f}")
# Linear extrapolation to zero noise
coeffs = np.polyfit(scale_factors, expectation_values, deg=1)
zne_estimate = coeffs[1] # y-intercept
print(f"\nLinear ZNE estimate: {zne_estimate:.6f}")
print(f"Ideal value (GHZ): 1.000000")
Richardson Extrapolation
Richardson extrapolation fits a polynomial of degree n-1 through n data points, which can capture nonlinear noise scaling more accurately than linear extrapolation. The idea: if you have measurements at three scale factors, you fit a quadratic and evaluate at lambda=0, capturing both the linear and quadratic terms in the noise expansion.
def richardson_extrapolation(scale_factors, values):
    """Richardson extrapolation to the zero-noise limit."""
    n = len(scale_factors)
    # Solve the Vandermonde system V @ coeffs = values
    V = np.vander(scale_factors, increasing=True)[:, :n]
    coeffs = np.linalg.solve(V, values)
    return coeffs[0]  # Zero-noise estimate is the constant term
richardson_estimate = richardson_extrapolation(scale_factors, expectation_values)
print(f"Richardson ZNE estimate: {richardson_estimate:.6f}")
Richardson extrapolation is more accurate than linear extrapolation when the noise-expectation relationship has significant curvature. However, it is also more sensitive to statistical noise in the individual measurements, because it effectively amplifies the differences between closely spaced data points. With noisy shot-based measurements (rather than exact density matrix simulation), Richardson extrapolation requires more shots per scale factor to be reliable.
Exponential Extrapolation
Some noise channels (particularly amplitude damping and dephasing) cause the expectation value to decay exponentially rather than linearly with circuit depth. In those cases, a better model is:
E(lambda) = E_ideal * exp(-a * lambda)
Taking the logarithm converts this to a linear problem: log(E(lambda)) = log(E_ideal) - a*lambda. The intercept gives log(E_ideal).
from scipy.optimize import curve_fit

def exponential_extrapolation(scale_factors, values):
    """Fit E(lambda) = A * exp(-a * lambda) and return A as the zero-noise estimate."""
    # Only works when all values have the same sign
    if any(v <= 0 for v in values):
        raise ValueError(
            "Exponential extrapolation requires all positive expectation values. "
            "Use linear or Richardson extrapolation instead."
        )
    # Log-linear fit: log(E) = log(A) - a * lambda
    log_values = np.log(values)
    coeffs = np.polyfit(scale_factors, log_values, deg=1)
    log_A = coeffs[1]  # y-intercept in log space
    return np.exp(log_A)

# Alternative: nonlinear least squares for better handling of noise
def exponential_extrapolation_nlls(scale_factors, values):
    """Nonlinear least squares fit for E(lambda) = A * exp(-a * lambda)."""
    def model(x, A, a):
        return A * np.exp(-a * x)
    # Initial guess: A ~ first value, a ~ decay rate from the first two points
    a_guess = -np.log(values[1] / values[0]) / (scale_factors[1] - scale_factors[0])
    p0 = [values[0] * np.exp(a_guess * scale_factors[0]), a_guess]
    popt, pcov = curve_fit(model, scale_factors, values, p0=p0)
    return popt[0]  # A is the zero-noise estimate
When to use each extrapolation method:
| Method | Best for | Weakness |
|---|---|---|
| Linear | Low noise, few scale factors | Ignores curvature, biased for strong noise |
| Richardson | Moderate noise, polynomial scaling | Sensitive to shot noise, can overshoot |
| Exponential | Amplitude damping, dephasing-dominated circuits | Fails when expectation values cross zero |
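The table's tradeoffs show up even on clean synthetic data. For an exactly exponential decay, the log-linear fit recovers the ideal value while the linear fit is biased by the curvature (a toy example, not hardware data):

```python
import numpy as np

lambdas = np.array([1.0, 3.0, 5.0])
E_ideal, a = 1.0, 0.1
values = E_ideal * np.exp(-a * lambdas)  # exponentially decaying expectation values

# Linear fit: biased, because it ignores the curvature of the decay
linear_est = np.polyfit(lambdas, values, deg=1)[1]

# Exponential (log-linear) fit: exact for this model
exp_est = np.exp(np.polyfit(lambdas, np.log(values), deg=1)[1])

print(f"linear: {linear_est:.4f}, exponential: {exp_est:.4f}, ideal: {E_ideal}")
```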
Full ZNE Example with Error Analysis
Real hardware measurements have statistical fluctuations from finite shot counts. A production ZNE workflow needs variance estimates and confidence intervals.
from braket.circuits import Circuit, Observable
from braket.devices import LocalSimulator
from braket.circuits.noise import Noise
import numpy as np
def base_circuit():
    circ = Circuit()
    circ.h(0)
    circ.cnot(0, 1)
    circ.cnot(1, 2)
    circ.cnot(2, 3)
    return circ

def fold_circuit(circuit, scale_factor):
    if scale_factor == 1:
        return circuit
    folded = circuit.copy()
    n_folds = (scale_factor - 1) // 2
    for _ in range(n_folds):
        folded.add_circuit(circuit.adjoint())
        folded.add_circuit(circuit)
    return folded

def add_noise(circuit, p):
    noisy = circuit.copy()
    noisy.apply_gate_noise(Noise.Depolarizing(probability=p))
    return noisy
device = LocalSimulator("density_matrix")
noise_prob = 0.01
scale_factors = [1, 3, 5]
n_repetitions = 20 # Repeat each scale factor for variance estimation
shots_per_run = 1000
# Collect results across repetitions
all_results = {sf: [] for sf in scale_factors}
for sf in scale_factors:
    folded = fold_circuit(base_circuit(), sf)
    noisy = add_noise(folded, noise_prob)
    noisy.expectation(Observable.Z() @ Observable.Z(), target=[0, 3])
    for rep in range(n_repetitions):
        task = device.run(noisy, shots=shots_per_run)
        result = task.result()
        ev = result.values[0]
        all_results[sf].append(ev)
# Compute means and standard errors
means = []
std_errors = []
for sf in scale_factors:
    vals = np.array(all_results[sf])
    mean = np.mean(vals)
    se = np.std(vals, ddof=1) / np.sqrt(n_repetitions)
    means.append(mean)
    std_errors.append(se)
    print(f"Scale factor {sf}: <Z0 Z3> = {mean:.6f} +/- {se:.6f}")
# Linear extrapolation with uncertainty propagation. Weight the fit by the
# known standard errors and use cov="unscaled": with only 3 data points,
# np.polyfit cannot also estimate the residual variance (cov=True would raise)
coeffs, cov = np.polyfit(scale_factors, means, deg=1,
                         w=1.0 / np.array(std_errors), cov="unscaled")
zne_linear = coeffs[1]
zne_linear_err = np.sqrt(cov[1][1])
# Richardson extrapolation
V = np.vander(scale_factors, increasing=True)[:, :len(scale_factors)]
V_inv = np.linalg.inv(V)
richardson_coeffs = V_inv @ np.array(means)
zne_richardson = richardson_coeffs[0]
# Propagate uncertainty through the Richardson inversion
se_array = np.array(std_errors)
zne_richardson_err = np.sqrt(np.sum((V_inv[0, :] * se_array) ** 2))
print(f"\nLinear ZNE: {zne_linear:.6f} +/- {zne_linear_err:.6f}")
print(f"Richardson ZNE: {zne_richardson:.6f} +/- {zne_richardson_err:.6f}")
print(f"Ideal value: 1.000000")
# Visualization of the extrapolation curve
print("\nExtrapolation plot (scale factor vs expectation value):")
print(" lambda | E(lambda)")
print(" -------|-----------")
for sf, m, se in zip(scale_factors, means, std_errors):
    bar = "=" * int(max(0, m * 40))
    print(f" {sf:5d} | {m:.4f} +/- {se:.4f} {bar}")
print(f" {0:5d} | {zne_linear:.4f} (linear extrapolation)")
print(f" {0:5d} | {zne_richardson:.4f} (Richardson extrapolation)")
Resource Estimation: Budgeting Shots for ZNE
Before running ZNE on paid QPU time, calculate the total shot budget. With k scale factors and s shots per scale factor, the total cost is k * s shots for a single ZNE estimate. If you also run r repetitions for error bars, the total is k * s * r.
Example budget for the setup above:
| Component | Count |
|---|---|
| Scale factors | 3 (lambda = 1, 3, 5) |
| Shots per run | 1,000 |
| Repetitions | 20 |
| Total shots | 60,000 |
At Braket’s per-shot pricing (the $0.00145 per shot assumed here for Rigetti hardware), 60,000 shots cost roughly 60,000 * $0.00145 = $87, and the 60 tasks add 60 * $0.30 = $18 in per-task fees, for about $105 total. That is for a single observable on a 4-qubit circuit.
Practical guidelines for shot allocation:
- Minimum reliable shots per scale factor: 1,000-4,000 shots for expectation values with precision to ~0.01-0.02. The standard error of a Z-basis expectation value from s shots is roughly 1/sqrt(s), so 1,000 shots gives ~0.03 precision.
- Number of scale factors: 3 is the minimum (supports quadratic fit). 5 scale factors (1, 3, 5, 7, 9) give a more robust fit but cost 67% more. Using more than 5 rarely helps because high scale factors produce nearly random outputs.
- Repetitions for error bars: 10-30 repetitions are sufficient for estimating the standard error of the mean. On simulators with shot noise, 20 repetitions is a reasonable default.
- Diminishing returns: doubling the shots per scale factor only reduces the standard error by a factor of sqrt(2) = 1.41. Investing in more scale factors or better extrapolation models is often more cost-effective than brute-force shot increases.
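The budgeting rules above reduce to two small formulas; a sketch (helper names are illustrative):

```python
import math

def zne_shot_budget(n_scale_factors, shots_per_run, repetitions=1):
    """Total shots for one ZNE estimate (with optional repetitions for error bars)."""
    return n_scale_factors * shots_per_run * repetitions

def z_expectation_std_error(shots):
    """Worst-case standard error of a Z-basis expectation value from `shots` samples."""
    return 1.0 / math.sqrt(shots)

print(zne_shot_budget(3, 1000, repetitions=20))  # 60000, matching the table above
print(f"{z_expectation_std_error(1000):.3f}")    # ~0.032
```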
Measurement Error Mitigation
Readout errors flip measurement outcomes independently of the quantum state. We can characterize them by preparing known computational basis states and measuring the confusion rates.
Full Calibration Matrix
The calibration matrix M has entries M[i][j] = P(measure i | prepared j). For n qubits, this is a 2^n by 2^n matrix. To correct raw measurement probabilities p_raw, we compute p_corrected = M^(-1) * p_raw (using the pseudoinverse to handle numerical stability).
def build_calibration_matrix(device, n_qubits, shots=10000):
    """Measure each computational basis state to build the readout calibration matrix."""
    n_states = 2 ** n_qubits
    cal_matrix = np.zeros((n_states, n_states))
    for state_idx in range(n_states):
        # Prepare |state_idx> by applying X gates. Identity gates ensure every
        # qubit appears in the circuit, so all n bits show up in the measured
        # bitstring (Braket only measures qubits the circuit touches)
        circ = Circuit()
        bits = format(state_idx, f'0{n_qubits}b')
        for q, b in enumerate(bits):
            circ.i(q)
            if b == '1':
                circ.x(q)
        # Add readout noise
        circ.apply_readout_noise(Noise.BitFlip(probability=0.02))
        task = device.run(circ, shots=shots)
        counts = task.result().measurement_counts
        for outcome, count in counts.items():
            outcome_idx = int(outcome, 2)
            cal_matrix[outcome_idx][state_idx] = count / shots
    return cal_matrix
# Build and invert the calibration matrix
sim = LocalSimulator("density_matrix")  # noise instructions require the density matrix simulator
cal_matrix = build_calibration_matrix(sim, n_qubits=4, shots=10000)
cal_inverse = np.linalg.pinv(cal_matrix)
# Apply correction to raw measurement distribution
raw_circuit = base_circuit()
raw_circuit.apply_readout_noise(Noise.BitFlip(probability=0.02))
task = sim.run(raw_circuit, shots=10000)
raw_counts = task.result().measurement_counts
# Convert counts to probability vector
n_states = 16
raw_probs = np.zeros(n_states)
for outcome, count in raw_counts.items():
    raw_probs[int(outcome, 2)] = count / 10000
mitigated_probs = cal_inverse @ raw_probs
# Clip negative values (physical constraint)
mitigated_probs = np.clip(mitigated_probs, 0, None)
mitigated_probs /= mitigated_probs.sum()
print("Raw vs mitigated probabilities for |0000> and |1111>:")
print(f" |0000>: raw={raw_probs[0]:.4f}, mitigated={mitigated_probs[0]:.4f}")
print(f" |1111>: raw={raw_probs[15]:.4f}, mitigated={mitigated_probs[15]:.4f}")
The Scaling Problem: Tensored Mitigation
The full calibration matrix has 2^n rows and 2^n columns. For 10 qubits, that is 1,024 x 1,024, requiring 1,024 calibration circuits. For 20 qubits, it is over a million calibration circuits, which is completely impractical.
The tensored mitigation approach solves this by assuming readout errors are independent across qubits. Instead of one 2^n x 2^n matrix, you build n separate 2x2 matrices (one per qubit) and take the tensor product. This reduces the calibration cost from 2^n circuits to 2n circuits.
Each per-qubit calibration matrix has the form:
M_qubit = [[P(0|0), P(0|1)],
           [P(1|0), P(1|1)]]
where P(measured|prepared) is the probability of measuring a given outcome when the qubit was prepared in a known state.
def build_tensored_calibration(device, n_qubits, shots=10000):
    """Build per-qubit 2x2 calibration matrices.

    Requires only 2*n_qubits calibration circuits instead of 2^n_qubits.
    """
    qubit_matrices = []
    for q in range(n_qubits):
        cal_2x2 = np.zeros((2, 2))
        for prep_state in [0, 1]:
            circ = Circuit()
            # Identity gates keep every qubit in the measured bitstring;
            # all non-target qubits stay in |0>
            for j in range(n_qubits):
                circ.i(j)
            # Prepare the target qubit
            if prep_state == 1:
                circ.x(q)
            # Add readout noise (applies to all measured qubits)
            circ.apply_readout_noise(Noise.BitFlip(probability=0.02))
            task = device.run(circ, shots=shots)
            counts = task.result().measurement_counts
            # Extract the target qubit's measurement statistics
            for outcome, count in counts.items():
                # outcome is a bitstring over all qubits, ordered by qubit index
                measured_bit = int(outcome[q])
                cal_2x2[measured_bit][prep_state] += count / shots
        qubit_matrices.append(cal_2x2)
    return qubit_matrices
def tensored_inverse(qubit_matrices):
    """Compute the inverse of the tensor product of per-qubit calibration matrices.

    The inverse of a tensor product is the tensor product of the inverses.
    """
    inverses = [np.linalg.inv(m) for m in qubit_matrices]
    # Build the full inverse by taking the tensor product
    result = inverses[0]
    for inv_m in inverses[1:]:
        result = np.kron(result, inv_m)
    return result
# Example usage
sim = LocalSimulator("density_matrix")  # density matrix simulator for noise support
qubit_cals = build_tensored_calibration(sim, n_qubits=4, shots=10000)
# Print per-qubit calibration matrices
for q, m in enumerate(qubit_cals):
    print(f"Qubit {q} calibration matrix:")
    print(f"  P(0|0)={m[0,0]:.4f}  P(0|1)={m[0,1]:.4f}")
    print(f"  P(1|0)={m[1,0]:.4f}  P(1|1)={m[1,1]:.4f}")
# Apply tensored correction
tensored_inv = tensored_inverse(qubit_cals)
mitigated_probs_tensored = tensored_inv @ raw_probs
mitigated_probs_tensored = np.clip(mitigated_probs_tensored, 0, None)
mitigated_probs_tensored /= mitigated_probs_tensored.sum()
print("\nTensored mitigation results:")
print(f" |0000>: mitigated={mitigated_probs_tensored[0]:.4f}")
print(f" |1111>: mitigated={mitigated_probs_tensored[15]:.4f}")
The tradeoff: tensored mitigation cannot capture correlated readout errors (where the measurement of qubit 0 affects qubit 1). On most current hardware, correlations are small enough that tensored mitigation performs nearly as well as the full matrix approach, at a fraction of the cost.
Checking for Built-in Readout Mitigation
Some Braket QPU devices offer native readout error mitigation at the hardware or firmware level. You can check whether a device supports this by inspecting its properties:
from braket.aws import AwsDevice
device = AwsDevice("arn:aws:braket:us-east-1::device/qpu/ionq/Aria-1")
# Check device action properties for error mitigation support
action_props = device.properties.action
if "braket.ir.openqasm.program" in action_props:
    program_props = action_props["braket.ir.openqasm.program"]
    print(f"Supported result types: {program_props.supportedResultTypes}")
# IonQ devices support error mitigation through their "debias" option
# Check device-specific capabilities
print(f"Device capabilities: {device.properties.provider}")
When a device provides native readout mitigation, you may still benefit from applying your own, but be careful not to double-correct. If the device has already applied readout correction to the returned counts, applying your calibration matrix on top will introduce bias. Always check the device documentation and provider metadata to understand what corrections are applied automatically.
Combining ZNE and Readout Mitigation
Both techniques address different error sources: ZNE handles gate errors and decoherence, while readout mitigation handles measurement errors. They compose naturally, but the order matters.
Correct order:
- Collect raw measurement counts from the quantum device.
- Apply readout error correction to the raw counts (classical post-processing on the probability distribution).
- Compute the expectation value from the corrected counts.
- Repeat steps 1-3 for each noise scale factor.
- Apply ZNE extrapolation to the corrected expectation values.
Common mistake: computing the expectation value from raw counts first, then trying to correct readout errors on the expectation value. Readout correction operates on the full probability distribution, not on a scalar. Once you collapse the distribution to a single expectation value, the information needed for readout correction is lost.
def expectation_from_counts(counts, observable_qubits, n_qubits, total_shots):
    """Compute <Z...Z> expectation value from measurement counts.

    For Z-basis observables, each bitstring contributes +1 or -1
    based on the parity of the measured bits on the target qubits.
    """
    ev = 0.0
    for bitstring, count in counts.items():
        # Compute parity of the observable qubits
        parity = 0
        for q in observable_qubits:
            parity += int(bitstring[q])
        sign = (-1) ** parity
        ev += sign * count
    return ev / total_shots
def zne_with_readout_correction(
    base_circ, scale_factors, noise_prob, readout_prob,
    cal_inverse, shots=1000
):
    """Full ZNE pipeline with readout error correction at each scale factor."""
    # Noise instructions require the density matrix simulator
    device = LocalSimulator("density_matrix")
    corrected_evs = []
    for sf in scale_factors:
        # Step 1: Fold and add noise
        folded = fold_circuit(base_circ, sf)
        noisy = folded.copy()
        noisy.apply_gate_noise(Noise.Depolarizing(probability=noise_prob))
        noisy.apply_readout_noise(Noise.BitFlip(probability=readout_prob))
        # Step 2: Run and get raw counts
        task = device.run(noisy, shots=shots)
        raw_counts = task.result().measurement_counts
        # Step 3: Convert to probability vector and apply readout correction
        n_qubits = 4
        n_states = 2 ** n_qubits
        raw_probs = np.zeros(n_states)
        for outcome, count in raw_counts.items():
            raw_probs[int(outcome, 2)] = count / shots
        corrected_probs = cal_inverse @ raw_probs
        corrected_probs = np.clip(corrected_probs, 0, None)
        corrected_probs /= corrected_probs.sum()
        # Step 4: Compute expectation value from corrected distribution
        # For <Z0 Z3>: parity of bits 0 and 3
        ev = 0.0
        for state_idx in range(n_states):
            bits = format(state_idx, f'0{n_qubits}b')
            parity = int(bits[0]) + int(bits[3])
            sign = (-1) ** parity
            ev += sign * corrected_probs[state_idx]
        corrected_evs.append(ev)
        print(f"Scale factor {sf}: corrected <Z0 Z3> = {ev:.6f}")
    # Step 5: ZNE extrapolation on corrected values
    coeffs = np.polyfit(scale_factors, corrected_evs, deg=1)
    zne_linear = coeffs[1]
    richardson = richardson_extrapolation(scale_factors, corrected_evs)
    print(f"\nZNE + readout correction (linear): {zne_linear:.6f}")
    print(f"ZNE + readout correction (Richardson): {richardson:.6f}")
    return zne_linear, richardson
# Run the combined pipeline (density matrix simulator for noise support)
sim = LocalSimulator("density_matrix")
cal_matrix = build_calibration_matrix(sim, n_qubits=4, shots=10000)
cal_inv = np.linalg.pinv(cal_matrix)
zne_with_readout_correction(
    base_circuit(),
    scale_factors=[1, 3, 5],
    noise_prob=0.01,
    readout_prob=0.02,
    cal_inverse=cal_inv,
    shots=2000
)
Comparison of Error Mitigation Techniques
| Technique | Shot overhead | Accuracy | Assumptions | Best use case |
|---|---|---|---|---|
| Zero-Noise Extrapolation (ZNE) | k x base shots (k = number of scale factors) | Good for weak-to-moderate noise | Noise scales predictably with folding; per-gate error is small | Expectation value estimation on circuits with moderate depth |
| Measurement Error Mitigation | 2^n or 2n calibration circuits (full vs. tensored) | Excellent for readout errors | Readout errors are stable over time; independent per qubit (for tensored) | Any circuit where readout error is a significant contributor |
| Probabilistic Error Cancellation (PEC) | Exponential in circuit depth: roughly exp(2*epsilon*d) shots, where epsilon is the per-gate error rate and d the depth | Unbiased estimator | Full knowledge of noise model required; noise must be invertible | High-accuracy estimation when noise model is well characterized |
| Symmetry Verification | 2-3x base shots | Moderate | Circuit has known symmetries (e.g., particle number conservation) | Variational algorithms with conserved quantities |
ZNE and readout mitigation are the most accessible techniques because they require no knowledge of the underlying noise model (ZNE) or only simple calibration (readout). PEC provides unbiased estimates but is exponentially expensive. Symmetry verification is cheap but only applicable when the target state has exploitable symmetries.
Common Mistakes in Error Mitigation
1. Folding Too Aggressively
Using large scale factors (7, 9, 11, …) seems like it would give more data for extrapolation, but the signal degrades exponentially. At scale factor 9 with 1% depolarizing noise per gate, a 4-gate circuit effectively has 36 noisy gates, giving a fidelity of roughly 0.99^36 = 69.5%. The expectation value at that noise level is so far from the ideal that it contributes mostly noise, not signal, to the extrapolation fit.
Rule of thumb: stop increasing the scale factor when the expectation value drops below ~30% of its ideal magnitude. Beyond that point, the measurement is dominated by statistical noise rather than the physical signal you are trying to extrapolate.
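The rule of thumb can be automated: drop scale factors once the measured magnitude falls below a fraction of the base-level magnitude (a sketch; the 30% threshold is the heuristic from the text, and the base-level value stands in for the ideal magnitude):

```python
def usable_scale_factors(scale_factors, values, threshold=0.3):
    """Keep only scale factors whose expectation magnitude stays above
    `threshold` times the magnitude at the smallest scale factor."""
    reference = abs(values[0])  # value at the base noise level
    return [
        (sf, v) for sf, v in zip(scale_factors, values)
        if abs(v) >= threshold * reference
    ]

data = usable_scale_factors([1, 3, 5, 7, 9], [0.90, 0.65, 0.40, 0.22, 0.10])
print(data)  # scale factors 7 and 9 fall below 30% of 0.90 and are dropped
```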
2. Assuming Noise Scales Linearly Without Verification
ZNE extrapolation assumes that the noise-expectation curve is smooth and predictable. Before trusting the extrapolated value, plot E(lambda) against lambda. If the curve has kinks, plateaus, or non-monotonic behavior, the extrapolation will be unreliable.
A simple diagnostic: run at 5 or more scale factors and check whether the data is well-fit by your chosen model (linear, quadratic, exponential). Compute the residuals. If the residuals show systematic patterns rather than random scatter, the model is wrong.
# Diagnostic: check residuals of the linear fit
coeffs = np.polyfit(scale_factors, means, deg=1)
fitted = np.polyval(coeffs, scale_factors)
residuals = np.array(means) - fitted
print("Residuals of linear fit:")
for sf, r in zip(scale_factors, residuals):
    print(f"  lambda={sf}: residual={r:.6f}")

# If residuals are large relative to std_errors, the linear model is insufficient
for sf, r, se in zip(scale_factors, residuals, std_errors):
    if abs(r) > 2 * se:
        print(f"  WARNING: residual at lambda={sf} exceeds 2*stderr, "
              f"linear model may be inadequate")
3. Insufficient Calibration Shots
The calibration matrix itself is estimated from finite shots, introducing statistical error. If you use 1,000 shots per calibration state, each entry in the matrix has a standard error of roughly sqrt(p*(1-p)/1000), where p is the true confusion probability. For p=0.02, that is sqrt(0.02*0.98/1000) = 0.0044, meaning the 2% readout error rate is known only to within +/- 0.4%.
This uncertainty propagates through the matrix inversion and into the corrected probabilities. For the full 2^n calibration matrix, matrix inversion amplifies small errors, especially when the matrix is nearly singular (which happens when readout errors are large).
Guideline: use at least 5,000 shots per calibration state, and 10,000+ if high precision is needed. Verify by running the calibration twice and checking that the resulting matrices agree within statistical uncertainty.
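The standard-error arithmetic above, in code (function name is illustrative):

```python
import math

def calibration_entry_std_error(p, shots):
    """Standard error of one calibration-matrix entry: sqrt(p * (1 - p) / shots)."""
    return math.sqrt(p * (1 - p) / shots)

print(f"{calibration_entry_std_error(0.02, 1000):.4f}")   # ~0.0044 at 1,000 shots
print(f"{calibration_entry_std_error(0.02, 10000):.4f}")  # ~0.0014 at 10,000 shots
```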
4. Exceeding Device Coherence Times
Circuit folding at scale factor s multiplies the circuit depth by roughly s. If the original circuit takes 5 microseconds to execute, a scale factor of 7 produces a circuit that takes 35 microseconds. On hardware with T2 = 50 microseconds, the folded circuit loses 50% of its coherence to dephasing alone, violating the assumption that gate errors dominate.
Before running ZNE on hardware, estimate the folded circuit duration and compare it to the device’s T1 and T2 times. If the folded circuit duration exceeds T2/3 at your largest scale factor, the decoherence contribution will distort the extrapolation. In that case, use fewer or smaller scale factors, or switch to partial folding (folding only two-qubit gates) to reduce the depth increase.
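The coherence check described above, as a small helper (illustrative numbers; the T2/3 threshold is the rule stated in the text):

```python
import math

def coherence_check(base_duration_us, scale_factor, t2_us):
    """Return (folded duration, fraction of coherence remaining, passes T2/3 rule)."""
    folded_us = base_duration_us * scale_factor  # duration scales roughly with s
    remaining = math.exp(-folded_us / t2_us)
    return folded_us, remaining, folded_us <= t2_us / 3

print(coherence_check(5, 7, t2_us=50))   # 35 us, ~0.50 coherence left: fails the rule
print(coherence_check(5, 3, t2_us=200))  # 15 us, ~0.93 coherence left: passes
```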
5. Double-Applying Readout Correction
When combining readout mitigation with a device that has native error mitigation enabled, the corrections can conflict. Native mitigation already adjusts the returned counts; applying your calibration matrix on top introduces a systematic bias that pushes probabilities away from the true values.
Always verify what corrections the device applies by default. On Braket, check the device documentation and the device.properties.provider metadata. If native mitigation is enabled and you cannot disable it, skip the calibration matrix step and apply only ZNE.
Key Takeaways
- NISQ errors come from three sources: gate errors (0.5-1% per two-qubit gate), decoherence (T1/T2 timescales), and readout errors (1-5% per qubit). They compound multiplicatively across a circuit.
- ZNE amplifies noise via circuit folding and extrapolates to the noiseless limit, requiring no knowledge of the noise model. The mathematical basis is a Taylor expansion of E(lambda) around lambda=0.
- Gate folding (G to G*G_dag*G) is the standard noise amplification approach on Braket. Use verbatim compilation to prevent the transpiler from canceling your inserted gates.
- Linear, Richardson, and exponential extrapolation each suit different noise regimes. Always verify the fit quality before trusting the extrapolated value.
- Measurement error mitigation corrects readout biases using a calibration matrix. For circuits with more than ~10 qubits, use tensored mitigation (2n calibration circuits instead of 2^n).
- When combining ZNE and readout mitigation, apply readout correction first (on the probability distribution), then compute expectation values, then extrapolate.
- Budget your shots carefully: ZNE with 3 scale factors at 1,000 shots each costs 3,000 shots minimum, and meaningful error bars require 10-30 repetitions per scale factor.
- Avoid common pitfalls: excessive folding that drowns the signal, unverified linearity assumptions, insufficient calibration shots, and circuits that exceed hardware coherence times.