Introduction to Quantum Error Models

Quantum computers are physical devices, and physical devices are noisy. Every gate has a finite error rate, qubits leak energy to their surroundings, and measurement is imperfect. Before you can understand quantum error correction, you need a working knowledge of the error models that describe what goes wrong. This tutorial covers the main single-qubit and multi-qubit error channels, their matrix representations, their geometric meaning on the Bloch sphere, how they relate to real hardware parameters, and how to simulate them in both raw NumPy and Qiskit Aer.

Why Density Matrices?

Pure quantum states can be described by state vectors |psi>. But noise introduces classical uncertainty on top of quantum uncertainty. Suppose a qubit preparation device produces |0> with probability 0.8 and |1> with probability 0.2, and you do not know which one it produced. This is a classical mixture, and no single state vector can describe it.

The density matrix handles both pure states and classical mixtures in one framework. For a pure state |psi>, the density matrix is the outer product:

import numpy as np

# Pure state |0>
psi = np.array([1, 0], dtype=complex)
rho_pure = np.outer(psi, psi.conj())
print("Pure |0>:")
print(rho_pure)
# [[1. 0.]
#  [0. 0.]]

For a classical mixture of states {|psi_k>} with probabilities {p_k}, the density matrix is:

rho = sum_k  p_k * |psi_k><psi_k|

# Classical mixture: 80% |0>, 20% |1>
psi_0 = np.array([1, 0], dtype=complex)
psi_1 = np.array([0, 1], dtype=complex)
rho_mixed = 0.8 * np.outer(psi_0, psi_0.conj()) + 0.2 * np.outer(psi_1, psi_1.conj())
print("Mixed state (80% |0>, 20% |1>):")
print(rho_mixed)
# [[0.8 0. ]
#  [0.  0.2]]

Purity: Distinguishing Pure from Mixed States

The purity of a density matrix is Tr(rho^2). It satisfies 1/d <= Tr(rho^2) <= 1, where d is the dimension of the Hilbert space. A pure state has purity exactly 1. The maximally mixed state I/d has purity 1/d.

def purity(rho):
    """Compute the purity Tr(rho^2) of a density matrix."""
    return np.real(np.trace(rho @ rho))

print(f"Purity of pure |0>:    {purity(rho_pure):.4f}")   # 1.0
print(f"Purity of 80/20 mix:   {purity(rho_mixed):.4f}")  # 0.68
print(f"Purity of I/2:         {purity(np.eye(2)/2):.4f}")  # 0.5

Purity is a useful diagnostic: when you send a pure state through a noisy channel, the purity of the output tells you how much information the channel destroyed. A purity of 0.5 for a single qubit means the state has become maximally mixed and carries no recoverable quantum information.

Density matrices also arise naturally from entanglement. If you have a two-qubit entangled state and you trace out one qubit (the partial trace), the remaining qubit is described by a mixed-state density matrix even though the global state is pure. This is why density matrices are essential for describing open quantum systems, where the qubit interacts with an environment you cannot access.

What Is a Quantum Channel?

A quantum channel is a completely positive, trace-preserving (CPTP) linear map that takes a density matrix to a density matrix. The most general form is the Kraus (operator-sum) representation:

rho_out = sum_i  K_i @ rho @ K_i^dagger

The Kraus operators {K_i} must satisfy the completeness relation sum_i K_i^dagger @ K_i = I, which ensures the output is a valid density matrix with trace 1.

The Kraus representation is not unique: different sets of Kraus operators can describe the same physical channel. However, every physically realizable quantum operation (including unitary gates, measurements, and noise) can be written in this form.

The Bit Flip Channel

A bit flip error flips |0> to |1> and |1> to |0> with probability p. This is the quantum analog of a classical bit error.

Kraus operators: K0 = sqrt(1-p) * I, K1 = sqrt(p) * X

def bit_flip_channel(rho, p):
    """Apply bit flip channel with probability p."""
    I = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    K0 = np.sqrt(1 - p) * I
    K1 = np.sqrt(p) * X
    return K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

# Start in |+> = (|0> + |1>) / sqrt(2)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(plus, plus.conj())

print("Before:", np.round(rho, 3))
print("After p=0.1:", np.round(bit_flip_channel(rho, 0.1), 3))
print("After p=0.5:", np.round(bit_flip_channel(rho, 0.5), 3))

At p=0.5, the off-diagonal elements (coherences) survive because bit flip acts in the Z basis and |+> is an eigenstate of X. The state is degraded but not fully dephased.

Bloch sphere picture

The bit flip channel transforms the Bloch vector (x, y, z) as:

(x, y, z) -> (x, (1-2p)*y, (1-2p)*z)

The x-component is preserved because X commutes with itself. The y and z components shrink by a factor of (1-2p). At p=0.5, only the x-component survives: any input state collapses to a mixture on the x-axis.

The Phase Flip Channel

A phase flip applies a Z gate with probability p. This flips the sign of the |1> amplitude: |+> becomes |->. Phase flip is the most damaging error for superposition states.

Kraus operators: K0 = sqrt(1-p) * I, K1 = sqrt(p) * Z

def phase_flip_channel(rho, p):
    """Apply phase flip channel with probability p."""
    I = np.eye(2, dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    K0 = np.sqrt(1 - p) * I
    K1 = np.sqrt(p) * Z
    return K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

rho_plus = np.outer(plus, plus.conj())
print("Phase flip p=0.1:", np.round(phase_flip_channel(rho_plus, 0.1), 3))
# Off-diagonal elements are reduced: coherence decays

Phase errors destroy the coherence that makes quantum computation useful. In superconducting qubits, phase noise (dephasing) from charge noise and flux noise is typically a dominant error source.

Bloch sphere picture

The phase flip channel transforms the Bloch vector as:

(x, y, z) -> ((1-2p)*x, (1-2p)*y, z)

The z-component is preserved (populations are unchanged), while the x and y components shrink. This is the opposite pattern from the bit flip channel: it preserves the computational basis information but destroys superposition coherence.

The Dephasing Channel and T2 Time

The phase flip channel is actually the special case of a more general dephasing channel. In real hardware, dephasing is a continuous process driven by the T2 relaxation time. The off-diagonal elements of the density matrix decay exponentially:

rho_out = [[rho_00, rho_01 * exp(-t/T2)],
           [rho_10 * exp(-t/T2), rho_11]]

The dephasing parameter is lambda = exp(-t/T2), so the channel leaves populations unchanged and multiplies coherences by lambda. The phase flip channel with probability p is equivalent to dephasing with lambda = 1 - 2p.

def dephasing_channel(rho, t, T2):
    """Apply dephasing channel for time t given coherence time T2."""
    lam = np.exp(-t / T2)
    rho_out = rho.copy()
    rho_out[0, 1] *= lam
    rho_out[1, 0] *= lam
    return rho_out

# Simulate coherence decay of |+> state over time
T2 = 100e-6  # 100 microseconds
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho_plus = np.outer(plus, plus.conj())

print("Coherence decay over time (T2 = 100 us):")
for t_us in [0, 10, 25, 50, 100, 200]:
    t = t_us * 1e-6
    rho_out = dephasing_channel(rho_plus, t, T2)
    coherence = np.abs(rho_out[0, 1])
    print(f"  t = {t_us:3d} us: |rho_01| = {coherence:.4f}")

This exponential decay is directly measurable in a Ramsey experiment: prepare |+>, wait time t, then measure in the X basis. The visibility of the oscillation decays as exp(-t/T2).

The Depolarizing Channel

The depolarizing channel applies I, X, Y, or Z each with a certain probability, with I being the “no error” outcome. With error parameter p, each Pauli is applied with probability p/4 and the identity with probability 1 - 3p/4.

def depolarizing_channel(rho, p):
    """Apply single-qubit depolarizing channel with error parameter p."""
    I = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)

    ops = [
        np.sqrt(1 - 3 * p / 4) * I,
        np.sqrt(p / 4) * X,
        np.sqrt(p / 4) * Y,
        np.sqrt(p / 4) * Z,
    ]
    return sum(K @ rho @ K.conj().T for K in ops)

# The depolarizing channel shrinks the Bloch vector toward the origin
psi_0 = np.array([1, 0], dtype=complex)
rho_0 = np.outer(psi_0, psi_0.conj())

for p in [0.0, 0.1, 0.5, 1.0]:
    rho_out = depolarizing_channel(rho_0, p)
    bloch_z = rho_out[0, 0].real - rho_out[1, 1].real
    print(f"p={p:.1f}  Bloch Z={bloch_z:.3f}")

At p=1.0, the output is the maximally mixed state I/2, with Bloch vector length 0. Depolarizing noise is the standard model for gate errors in most quantum computing simulators because it is symmetric and mathematically convenient.

Bloch sphere picture

The depolarizing channel uniformly shrinks the entire Bloch vector. With the parameterization above (identity with probability 1 - 3p/4, each Pauli with probability p/4):

(x, y, z) -> (1 - p) * (x, y, z)

This is the only single-qubit channel that preserves the direction of the Bloch vector while uniformly reducing its length. All other channels distort the Bloch sphere into an ellipsoid.

Amplitude Damping

Amplitude damping models energy relaxation: a qubit in |1> spontaneously emits a photon and decays to |0>. This is the physical process behind the T1 relaxation time.

Kraus operators: K0 = [[1, 0], [0, sqrt(1-gamma)]], K1 = [[0, sqrt(gamma)], [0, 0]]

Note that K1 maps |1> to |0> (it is the lowering operator scaled by sqrt(gamma)). This directionality is important: amplitude damping always drives the qubit toward |0>, not the other way around.

def amplitude_damping_channel(rho, gamma):
    """Apply amplitude damping channel with decay parameter gamma."""
    K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

# Start in |1>
psi_1 = np.array([0, 1], dtype=complex)
rho_1 = np.outer(psi_1, psi_1.conj())

for gamma in [0.0, 0.1, 0.5, 1.0]:
    rho_out = amplitude_damping_channel(rho_1, gamma)
    p_excited = rho_out[1, 1].real
    print(f"gamma={gamma:.1f}  P(|1>)={p_excited:.3f}")
# At gamma=1.0, the qubit has completely decayed to |0>

The relationship between gamma and T1 is: gamma = 1 - exp(-t / T1). For a gate duration of t = 50 ns on a device with T1 = 100 us, gamma is approximately 5e-4, small but not negligible over a long circuit.

Bloch sphere picture

The amplitude damping channel is the only channel we have discussed that is asymmetric: it shifts the Bloch sphere rather than just shrinking it. The transformation is:

x -> sqrt(1 - gamma) * x
y -> sqrt(1 - gamma) * y
z -> (1 - gamma) * z + gamma

The z-component is both contracted and shifted upward (toward |0>). The Bloch sphere maps to a teardrop shape, squeezed and pushed toward the north pole.

Computing the Bloch Vector from a Density Matrix

Each channel above transforms the Bloch vector in a specific way. The following utility function extracts the Bloch vector from any single-qubit density matrix:

def bloch_vector(rho):
    """Extract the Bloch vector (x, y, z) from a single-qubit density matrix.

    For rho = (I + r.sigma) / 2, the Bloch vector r = (x, y, z) is:
      x = 2 * Re(rho[0,1])
      y = 2 * Im(rho[1,0])
      z = rho[0,0] - rho[1,1]
    """
    x = 2 * np.real(rho[0, 1])
    y = 2 * np.imag(rho[1, 0])
    z = np.real(rho[0, 0] - rho[1, 1])
    return np.array([x, y, z])

def bloch_length(rho):
    """Length of the Bloch vector. Equals 1 for pure states, <1 for mixed."""
    r = bloch_vector(rho)
    return np.linalg.norm(r)

# Verify Bloch sphere transformations for each channel
# Start with a state pointing in the +x direction: |+>
rho_plus = np.outer(plus, plus.conj())
r0 = bloch_vector(rho_plus)
print(f"Initial Bloch vector: ({r0[0]:.3f}, {r0[1]:.3f}, {r0[2]:.3f})")

p = 0.1
channels = {
    "Bit flip":      bit_flip_channel(rho_plus, p),
    "Phase flip":    phase_flip_channel(rho_plus, p),
    "Depolarizing":  depolarizing_channel(rho_plus, p),
    "Amp. damping":  amplitude_damping_channel(rho_plus, p),
}

print(f"\nAfter each channel (p or gamma = {p}):")
for name, rho_out in channels.items():
    r = bloch_vector(rho_out)
    pur = purity(rho_out)
    print(f"  {name:16s}: r = ({r[0]:+.4f}, {r[1]:+.4f}, {r[2]:+.4f}), "
          f"|r| = {np.linalg.norm(r):.4f}, purity = {pur:.4f}")

Thermal Relaxation: Generalized Amplitude Damping

Standard amplitude damping assumes the environment is at zero temperature, so the qubit always decays to |0>. Real qubits are at a finite temperature (though typically very cold), meaning there is a small thermal excitation probability.

The thermal equilibrium population of the excited state is:

p_eq = 1 / (1 + exp(hbar * omega / (k_B * T)))

For superconducting transmon qubits at 15 mK with a 5 GHz transition frequency, this formula predicts a vanishingly small p_eq (around 10^-7). In practice, measured residual excited-state populations are much higher, typically 0.001 to 0.01, because stray radiation and nonequilibrium quasiparticles keep the qubit’s effective temperature well above the fridge temperature. Trapped-ion systems on optical transitions have p_eq essentially zero.

The generalized amplitude damping channel has four Kraus operators that account for both T1 decay (|1> to |0>) and thermal excitation (|0> to |1>):

def generalized_amplitude_damping(rho, gamma, p_eq):
    """Generalized amplitude damping channel.

    Parameters:
        rho: single-qubit density matrix
        gamma: decay parameter, gamma = 1 - exp(-t/T1)
        p_eq: thermal equilibrium excited-state population
    """
    # Decay operators (weighted by ground-state equilibrium population)
    K0 = np.sqrt(1 - p_eq) * np.array([[1, 0],
                                         [0, np.sqrt(1 - gamma)]], dtype=complex)
    K1 = np.sqrt(1 - p_eq) * np.array([[0, np.sqrt(gamma)],
                                         [0, 0]], dtype=complex)
    # Excitation operators (weighted by excited-state equilibrium population)
    K2 = np.sqrt(p_eq) * np.array([[np.sqrt(1 - gamma), 0],
                                     [0, 1]], dtype=complex)
    K3 = np.sqrt(p_eq) * np.array([[0, 0],
                                     [np.sqrt(gamma), 0]], dtype=complex)

    # Verify completeness: sum K_i^dag K_i = I
    completeness = sum(K.conj().T @ K for K in [K0, K1, K2, K3])
    assert np.allclose(completeness, np.eye(2)), "Completeness violated!"

    return sum(K @ rho @ K.conj().T for K in [K0, K1, K2, K3])

# Simulate T1 decay with thermal population
p_eq = 0.002  # typical measured residual population for a transmon
psi_1 = np.array([0, 1], dtype=complex)
rho_1 = np.outer(psi_1, psi_1.conj())

print("Generalized amplitude damping (p_eq = 0.002):")
for gamma in [0.0, 0.1, 0.5, 0.9, 1.0]:
    rho_out = generalized_amplitude_damping(rho_1, gamma, p_eq)
    p_excited = rho_out[1, 1].real
    print(f"  gamma={gamma:.1f}: P(|1>) = {p_excited:.5f}")
# At gamma=1.0, the qubit reaches thermal equilibrium: P(|1>) = p_eq

At gamma = 1.0 (infinite time), the qubit reaches thermal equilibrium with P(|1>) = p_eq. This is the steady state regardless of the initial state.

Two-Qubit Depolarizing Channel

Single-qubit noise models are not sufficient for characterizing multi-qubit gates. The two-qubit depolarizing channel generalizes the single-qubit version to the 16 two-qubit Pauli operators {I, X, Y, Z} x {I, X, Y, Z}.

With error parameter p, the identity (no error) occurs with probability 1 - 15p/16, and each of the 15 non-identity two-qubit Paulis occurs with probability p/16:

def two_qubit_depolarizing(rho, p):
    """Apply 2-qubit depolarizing channel with error parameter p.

    The 16 two-qubit Paulis {I,X,Y,Z}^2 are applied with probability
    p/16 each, except II which gets probability 1 - 15p/16.
    """
    I = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)

    paulis_1q = [I, X, Y, Z]

    rho_out = np.zeros_like(rho)
    for i, P1 in enumerate(paulis_1q):
        for j, P2 in enumerate(paulis_1q):
            P = np.kron(P1, P2)
            if i == 0 and j == 0:
                # Identity term
                weight = 1 - 15 * p / 16
            else:
                weight = p / 16
            rho_out += weight * (P @ rho @ P.conj().T)

    return rho_out

# Create a Bell state |Phi+> = (|00> + |11>) / sqrt(2)
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_bell = np.outer(bell, bell.conj())

print("2-qubit depolarizing on Bell state |Phi+>:")
for p in [0.0, 0.01, 0.1, 0.5, 1.0]:
    rho_out = two_qubit_depolarizing(rho_bell, p)
    # Fidelity with original Bell state
    fidelity = np.real(bell.conj() @ rho_out @ bell)
    print(f"  p={p:.2f}: fidelity = {fidelity:.4f}")
# At p=1.0, the state is I/4 (maximally mixed), fidelity = 0.25

At p = 1.0, the output is the maximally mixed state I/4 on two qubits, and the fidelity with any pure state is 1/4.

Process Matrix (Chi Matrix) Representation

The Kraus representation is not the only way to describe a quantum channel. The process matrix (or chi matrix) representation expands the channel in a fixed operator basis, typically the Pauli basis {I, X, Y, Z}.

For any single-qubit channel with Kraus operators {K_k}, each Kraus operator can be decomposed in the Pauli basis:

K_k = sum_m  e_{km} * sigma_m

where sigma_0 = I, sigma_1 = X, sigma_2 = Y, sigma_3 = Z. The chi matrix is then:

chi_{mn} = sum_k  e_{km} * conj(e_{kn})

The channel action becomes E(rho) = sum_{m,n} chi_{mn} * sigma_m @ rho @ sigma_n.

def compute_chi_matrix(kraus_ops):
    """Compute the chi (process) matrix for a single-qubit channel.

    The chi matrix is expressed in the Pauli basis {I, X, Y, Z}.
    Each Kraus operator K_k is decomposed as K_k = sum_m e_{km} sigma_m,
    and chi_{mn} = sum_k e_{km} conj(e_{kn}).
    """
    I = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    paulis = [I, X, Y, Z]

    chi = np.zeros((4, 4), dtype=complex)

    for K in kraus_ops:
        # Decompose K in the Pauli basis: K = sum_m e_m sigma_m
        # Using Tr(sigma_m @ K) / 2 since Tr(sigma_m @ sigma_n) = 2 delta_mn
        e = np.array([np.trace(P @ K) / 2 for P in paulis])
        chi += np.outer(e, e.conj())

    return chi

# Compute chi for the depolarizing channel
p = 0.1
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

kraus_depol = [
    np.sqrt(1 - 3 * p / 4) * I,
    np.sqrt(p / 4) * X,
    np.sqrt(p / 4) * Y,
    np.sqrt(p / 4) * Z,
]

chi = compute_chi_matrix(kraus_depol)
print("Chi matrix for depolarizing channel (p=0.1):")
print(np.round(np.real(chi), 6))
print(f"\nDiagonal entries: chi_II={(1-3*p/4):.6f}, "
      f"chi_XX=chi_YY=chi_ZZ={(p/4):.6f}")

The depolarizing channel has a diagonal chi matrix: chi_II = 1 - 3p/4 and chi_XX = chi_YY = chi_ZZ = p/4. This diagonal structure means the Pauli errors are uncorrelated. Any off-diagonal entries in chi indicate coherent (correlated) error contributions.

Coherent vs. Incoherent Errors

Not all errors are stochastic. Incoherent errors are the random Pauli errors we have discussed so far: each gate application randomly picks a Pauli with some probability. Coherent errors are systematic, deterministic rotations that occur the same way every time, for example a gate that consistently over-rotates by a small angle epsilon.

This distinction matters enormously for error accumulation:

Incoherent errors accumulate statistically: after n gates each with error probability p, the total error is approximately n * p (linear in n, but the state remains close to a Pauli mixture).
Coherent errors accumulate quadratically in fidelity: after n gates each with over-rotation epsilon, the infidelity grows as n^2 * epsilon^2 because the rotations add coherently.

def coherent_overrotation(rho, epsilon):
    """Apply a coherent over-rotation: Rz(pi + epsilon) instead of Rz(pi).

    This is a systematic calibration error, not random noise.
    """
    # Rz(theta) = [[exp(-i*theta/2), 0], [0, exp(i*theta/2)]]
    theta = np.pi + epsilon
    Rz = np.array([[np.exp(-1j * theta / 2), 0],
                    [0, np.exp(1j * theta / 2)]], dtype=complex)
    return Rz @ rho @ Rz.conj().T

# Compare coherent vs incoherent error accumulation
# Start in |+> state
rho_init = np.outer(plus, plus.conj())
epsilon = 0.01  # 0.01 radian over-rotation
p_incoh = epsilon**2 / 2  # roughly equivalent depolarizing probability

print("Coherent vs incoherent error accumulation:")
print(f"{'n gates':>8s}  {'Coherent infidelity':>20s}  {'Incoherent infidelity':>22s}")

for n in [1, 10, 50, 100, 200]:
    # Coherent: apply over-rotation n times
    rho_coh = rho_init.copy()
    for _ in range(n):
        rho_coh = coherent_overrotation(rho_coh, epsilon)

    # Incoherent: apply depolarizing n times
    rho_incoh = rho_init.copy()
    for _ in range(n):
        rho_incoh = depolarizing_channel(rho_incoh, p_incoh)

    # Compute fidelity with initial state
    fid_coh = np.real(plus.conj() @ rho_coh @ plus)
    fid_incoh = np.real(plus.conj() @ rho_incoh @ plus)

    print(f"{n:8d}  {1 - fid_coh:20.6f}  {1 - fid_incoh:22.6f}")

The chi matrix reveals the difference: a purely incoherent (Pauli) channel has a real, diagonal chi matrix. A coherent error has significant off-diagonal elements. Quantum process tomography or gate set tomography can distinguish the two in practice.

This distinction has practical consequences for error correction. QEC codes are designed to correct stochastic Pauli errors efficiently. Coherent errors can be converted into incoherent errors through techniques like randomized compiling (also called Pauli frame randomization), which inserts random Pauli gates that average away the coherent component.

Error Rate and Circuit Fidelity

A crucial practical question is: given a per-gate error rate, how many gates can you run before the output becomes useless?

If each gate independently has fidelity 1 - epsilon, and your circuit has n gates, the overall circuit fidelity is approximately:

F = (1 - epsilon)^n ≈ exp(-n * epsilon)  for small epsilon

The circuit “breaks even” (fidelity drops to 1/e ≈ 0.368) when n = 1/epsilon.

print("Circuit fidelity F = (1 - epsilon)^n")
print(f"{'epsilon':>10s}  {'n=10':>10s}  {'n=100':>10s}  {'n=1000':>10s}  "
      f"{'n=10000':>10s}  {'1/e gate count':>15s}")
print("-" * 75)

for epsilon in [1e-2, 1e-3, 1e-4, 1e-5]:
    fidelities = [(1 - epsilon)**n for n in [10, 100, 1000, 10000]]
    breakeven = int(1 / epsilon)
    print(f"{epsilon:10.0e}  " +
          "  ".join(f"{f:10.4f}" for f in fidelities) +
          f"  {breakeven:15,d}")

Per-gate error	10 gates	100 gates	1,000 gates	10,000 gates	Break-even (1/e)
1e-2	0.904	0.366	0.000	0.000	100
1e-3	0.990	0.905	0.368	0.000	1,000
1e-4	0.999	0.990	0.905	0.368	10,000
1e-5	1.000	0.999	0.990	0.905	100,000

Current hardware gate error rates of approximately 0.1% (1e-3) give a break-even of about 1,000 gates. This is why circuits deeper than a few hundred gates on current devices tend to produce noise-dominated outputs, and why quantum error correction is necessary for useful large-scale computation.

Pauli Channels and Pauli Twirling

Not every channel is a Pauli channel (amplitude damping, for example, is not, because it shifts the Bloch sphere rather than just shrinking it). But there is a standard procedure, called Pauli twirling, that converts any CPTP map into a Pauli channel: conjugate the channel by a Pauli operator chosen uniformly at random before and after each application, and average over the choices. The twirled channel is:

E_twirl(rho) = p0 * rho + p1 * X rho X + p2 * Y rho Y + p3 * Z rho Z

where the probabilities p0, p1, p2, p3 are exactly the diagonal entries of the original channel’s chi matrix in the Pauli basis.

This is why Pauli noise models are so widely used: twirling (implemented in practice as randomized compiling) reduces arbitrary single-qubit noise to a Pauli channel with the same average fidelity, and the Pauli error probabilities can be read off the diagonal of chi.

def pauli_channel(rho, p0, p1, p2, p3):
    """General Pauli channel: E(rho) = p0*rho + p1*X*rho*X + p2*Y*rho*Y + p3*Z*rho*Z.

    Parameters must satisfy p0 + p1 + p2 + p3 = 1 and all pi >= 0.
    """
    assert abs(p0 + p1 + p2 + p3 - 1.0) < 1e-10, "Probabilities must sum to 1"
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)

    return (p0 * rho +
            p1 * X @ rho @ X +
            p2 * Y @ rho @ Y +
            p3 * Z @ rho @ Z)

# The bit flip channel is a Pauli channel with p0=1-p, p1=p, p2=p3=0
# The depolarizing channel is a Pauli channel with p0=1-3p/4, p1=p2=p3=p/4
p = 0.1
rho_test = np.array([[0.7, 0.3+0.1j], [0.3-0.1j, 0.3]], dtype=complex)

rho_depol = depolarizing_channel(rho_test, p)
rho_pauli = pauli_channel(rho_test, 1 - 3*p/4, p/4, p/4, p/4)

print("Depolarizing matches Pauli channel:",
      np.allclose(rho_depol, rho_pauli))

How These Channels Map to Real Hardware

Error model	Physical source	Typical error parameter
Bit flip	Control errors, crosstalk	Gate error ~0.1-1% per 2Q gate
Phase flip	Charge noise, flux noise, dephasing (T2)	T2 ~10-200 us
Depolarizing	Gate imperfections (model approximation)	~0.1-0.5% per single-qubit gate
Amplitude damping	Spontaneous emission, T1 relaxation	T1 ~50-500 us
Generalized amp. damping	T1 decay + thermal excitation	T1 ~50-500 us, p_eq ~0.001-0.01
Dephasing	Pure dephasing (T_phi) component	T2 <= 2*T1 always

In practice, noise on real hardware is not well described by any single pure channel. Calibration data is fit to a combination of depolarizing plus amplitude damping, or characterized directly via process tomography.

Simulating with Qiskit Aer Noise Models

The Qiskit Aer simulator provides a NoiseModel class that lets you attach error channels to specific gates. Here is a complete example that builds a realistic noise model and compares noisy vs. noiseless Bell state preparation:

from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator
from qiskit_aer.noise import (
    NoiseModel,
    depolarizing_error,
    amplitude_damping_error,
    pauli_error,
)

# Build a noise model
noise_model = NoiseModel()

# Bit flip on single-qubit gates (0.1% chance)
bit_flip = pauli_error([("X", 0.001), ("I", 0.999)])
noise_model.add_all_qubit_quantum_error(bit_flip, ["x", "h", "rz", "sx"])

# Depolarizing error on two-qubit gates (1% total error)
depol_2q = depolarizing_error(0.01, 2)
noise_model.add_all_qubit_quantum_error(depol_2q, ["cx"])

# Amplitude damping on all qubits during idle (gamma ~ 5e-4)
amp_damp = amplitude_damping_error(5e-4)
noise_model.add_all_qubit_quantum_error(amp_damp, ["id", "delay"])

print("Noise model:")
print(noise_model)

# Build a Bell state circuit
qc = QuantumCircuit(2, 2)
qc.h(0)        # Create superposition on qubit 0
qc.cx(0, 1)    # Entangle qubits 0 and 1
qc.measure([0, 1], [0, 1])

shots = 10_000

# Noiseless simulation
sim_ideal = AerSimulator()
result_ideal = sim_ideal.run(qc, shots=shots).result()
counts_ideal = result_ideal.get_counts()

# Noisy simulation
sim_noisy = AerSimulator(noise_model=noise_model)
result_noisy = sim_noisy.run(qc, shots=shots).result()
counts_noisy = result_noisy.get_counts()

print(f"\nBell state results ({shots} shots):")
print(f"  Ideal:  {counts_ideal}")
print(f"  Noisy:  {counts_noisy}")

# In the ideal case, you see only '00' and '11' (50/50).
# With noise, '01' and '10' appear at a rate proportional to
# the error probabilities.
for outcome in ["00", "01", "10", "11"]:
    ideal_frac = counts_ideal.get(outcome, 0) / shots
    noisy_frac = counts_noisy.get(outcome, 0) / shots
    print(f"  P({outcome}): ideal={ideal_frac:.3f}, noisy={noisy_frac:.3f}")

The noisy simulation shows a small but measurable leakage from the expected {00, 11} outcomes into {01, 10}, which is exactly what the bit flip and depolarizing errors predict.

Practical Noise Characterization Workflow

In a real lab or on cloud hardware, you characterize noise through a sequence of increasingly detailed protocols:

1. Randomized Benchmarking (RB) gives you the average gate fidelity. It works by applying random sequences of Clifford gates of increasing length, then measuring how quickly the survival probability decays. The decay rate gives a single number: the error per Clifford (EPC). RB is insensitive to state preparation and measurement (SPAM) errors, making it the gold standard for gate fidelity.

2. Gate Set Tomography (GST) reconstructs the full process matrix for each gate. It is much more expensive than RB (requiring many more circuits) but gives complete information about coherent and incoherent errors, including over-rotations, axis tilts, and correlated errors.

3. Cross-Entropy Benchmarking (XEB) characterizes system-level performance by running random circuits and comparing the output distribution to the ideal. This is the protocol Google used to demonstrate quantum computational advantage with Sycamore. XEB gives a single fidelity number for the entire circuit, not individual gates.

You can access real device noise data in Qiskit using fake backends:

from qiskit_ibm_runtime.fake_provider import FakeManilaV2

# Load a fake backend with calibration data from a real device
backend = FakeManilaV2()

# The backend contains real T1, T2, gate error, and readout error data
properties = backend.properties()
if properties is not None:
    for qubit in range(backend.num_qubits):
        t1 = properties.t1(qubit) * 1e6  # convert to microseconds
        t2 = properties.t2(qubit) * 1e6
        print(f"Qubit {qubit}: T1 = {t1:.1f} us, T2 = {t2:.1f} us")

# You can also extract a noise model directly from the backend
noise_model_real = NoiseModel.from_backend(backend)
print(f"\nNoise model from FakeManila: {noise_model_real}")

Common Mistakes

These five pitfalls catch both beginners and experienced practitioners:

1. Confusing T1 and T2. T1 is the energy relaxation time (amplitude damping). T2 is the total dephasing time, which includes contributions from both T1 decay and pure dephasing (T_phi). The fundamental constraint is T2 <= 2 * T1, always. When T2 = 2 * T1, there is no pure dephasing; all decoherence comes from energy relaxation. When T2 < 2 * T1, the difference is accounted for by the pure dephasing rate: 1/T2 = 1/(2*T1) + 1/T_phi. If your noise model has T2 > 2*T1, it is unphysical.

2. Applying amplitude damping in the wrong direction. The K1 Kraus operator for amplitude damping is [[0, sqrt(gamma)], [0, 0]]. This maps |1> to |0> (the lowering operator). A common mistake is writing K1 as [[0, 0], [sqrt(gamma), 0]], which would map |0> to |1> and represents excitation, not decay. Always check: does your channel drive the Bloch vector toward +z (|0>) or -z (|1>)?

3. Improper normalization of Kraus operators. The completeness relation sum_k K_k^dag K_k = I must hold exactly. For the depolarizing channel, this means the coefficients are sqrt(1 - 3p/4) for the identity and sqrt(p/4) for each Pauli, not 1 - 3p/4 and p/4. Forgetting the square root is a frequent source of bugs that produces non-physical density matrices with trace not equal to 1.

4. Confusing process fidelity and state fidelity. The average gate fidelity F_avg (what randomized benchmarking measures) and the process fidelity F_pro (the overlap of the process matrix with the ideal) are related but different: F_avg = (d * F_pro + 1) / (d + 1) where d is the Hilbert space dimension. For a single qubit (d=2), if F_pro = 0.99, then F_avg = (2 * 0.99 + 1) / 3 = 0.9933. Always specify which fidelity you are reporting.

5. Forgetting that noise is time-dependent. The amplitude damping parameter gamma = 1 - exp(-t/T1) depends on the gate duration t. A slow gate (e.g., 200 ns for a CX) accumulates more amplitude damping than a fast gate (e.g., 30 ns for a single-qubit gate). Similarly, idle qubits accumulate dephasing. Noise models that assign a fixed error rate per gate without accounting for gate duration are approximations that can significantly underestimate errors in circuits with many idle periods.

What Error Correction Targets

Quantum error correction (QEC) works by encoding one logical qubit into many physical qubits and detecting which error occurred without measuring the qubit’s actual value.

The three-qubit bit flip code detects and corrects single bit flip errors. The three-qubit phase flip code does the same for phase errors. The nine-qubit Shor code combines both and corrects any single-qubit error from the full set {I, X, Y, Z}.

The reason this is possible is that the four Pauli operators span the space of all single-qubit errors. If a quantum code can detect and correct X and Z independently, it corrects all single-qubit errors, including amplitude damping, because any physical error can be decomposed into a linear combination of Pauli operators.

Understanding the error models is the prerequisite for understanding why QEC codes are structured the way they are: each code is engineered to detect the specific syndrome signatures that distinguish I, X, Z, and Y on each physical qubit.

Why Density Matrices?

Purity: Distinguishing Pure from Mixed States

What Is a Quantum Channel?

The Bit Flip Channel

Bloch sphere picture

The Phase Flip Channel

Bloch sphere picture

The Dephasing Channel and T2 Time

The Depolarizing Channel

Bloch sphere picture

Amplitude Damping

Bloch sphere picture

Computing the Bloch Vector from a Density Matrix

Thermal Relaxation: Generalized Amplitude Damping

Two-Qubit Depolarizing Channel

Process Matrix (Chi Matrix) Representation

Coherent vs. Incoherent Errors

Error Rate and Circuit Fidelity

Pauli Channels and Pauli Twirling

How These Channels Map to Real Hardware

Simulating with Qiskit Aer Noise Models

Practical Noise Characterization Workflow

Common Mistakes

What Error Correction Targets

Related Tutorials

Get one quantum email a week