Fast Clifford Simulation with Cirq

Why Clifford Simulation Matters

Simulating a general quantum circuit on n qubits requires tracking a statevector of 2^n complex amplitudes. At n=30 that is 16 GB of memory; at n=50 it exceeds anything on Earth. But a special class of circuits, those built entirely from Clifford gates, can be simulated in O(n^2) time per gate using the Gottesman-Knill theorem.

Daniel Gottesman and Emanuel Knill established this result in 1998. The theorem states that any quantum circuit composed solely of the following ingredients can be simulated efficiently on a classical computer:

Preparation of qubits in computational basis states
Clifford gates (Hadamard, Phase, CNOT, and any compositions of these)
Measurement in the computational basis
Classical feed-forward conditioned on measurement outcomes

“Efficiently” here means polynomial time and space in the number of qubits. This is one of the clearest examples of a quantum circuit class that is provably easy for classical computers.

The theorem does not mean Clifford circuits are useless for quantum computing. Quite the opposite. Clifford circuits form the backbone of nearly every major protocol in quantum information:

Quantum error correction: All stabilizer codes, including the surface code, Steane code, and Shor code, are defined by their stabilizer groups, which are generated by Clifford operations. Designing, testing, and debugging these codes at scale requires simulating hundreds or thousands of qubits through Clifford circuits.
Randomized benchmarking (RB): The standard protocol for measuring gate quality on real hardware uses random sequences of Clifford gates. Computing the ideal reference output for these sequences demands efficient Clifford simulation.
Quantum error detection and correction circuits: Syndrome extraction circuits are composed entirely of Clifford gates. Validating that a syndrome circuit correctly identifies errors requires simulation at the full code distance.
Magic state distillation: Protocols that produce high-fidelity T states (the resource that, combined with Clifford gates, enables universal quantum computation) are themselves built from Clifford circuits that must be verified classically.

The ability to simulate large Clifford circuits efficiently is therefore essential for designing, testing, and validating real quantum experiments. Cirq provides a dedicated CliffordSimulator that implements the stabilizer tableau formalism, letting you simulate circuits on 100+ qubits in seconds on a standard laptop.

The Clifford Gate Set

A unitary U is a Clifford gate if and only if it maps every Pauli operator to another Pauli operator under conjugation. Formally, for every n-qubit Pauli operator P, the conjugated operator UPU† is also a Pauli operator (up to a global phase of +1 or -1). This property is what makes classical simulation possible: instead of tracking an exponentially large statevector, you track how the Pauli operators transform, which requires only polynomial storage.

The standard generating set for the Clifford group consists of just three gates: Hadamard (H), Phase (S), and CNOT. Every Clifford gate on n qubits can be decomposed into a circuit using only these three. Here is how each generator acts on the Pauli operators by conjugation:

Hadamard (H)

The Hadamard gate swaps the X and Z axes of the Bloch sphere:

Input Pauli	Output Pauli
X	Z
Z	X
Y	-Y

Note the sign flip on Y. Since Y = iXZ, applying H gives H(iXZ)H† = i(HXH†)(HZH†) = iZX = -iXZ = -Y. This sign matters when tracking phases in the stabilizer tableau.

In Cirq: cirq.H

Phase Gate (S)

The S gate (also written as P(pi/2)) rotates the X axis toward Y, leaving Z unchanged:

Input Pauli	Output Pauli
X	Y
Y	-X
Z	Z

The S gate is its own square root of Z: SS = Z. It is sometimes called the “quarter-turn” gate because it rotates the Bloch sphere by 90 degrees about the Z axis.

In Cirq: cirq.S

CNOT (Controlled-X)

The CNOT gate entangles two qubits. Its Pauli conjugation rules involve both the control qubit (c) and target qubit (t):

Input Pauli	Output Pauli
X_c (X on control)	X_c X_t
X_t (X on target)	X_t
Z_c (Z on control)	Z_c
Z_t (Z on target)	Z_c Z_t

The pattern: X “spreads forward” from control to target, while Z “spreads backward” from target to control. This asymmetry reflects the fact that CNOT is not symmetric under qubit exchange.

In Cirq: cirq.CNOT

Additional Clifford Gates

Beyond the three generators, several other common gates are also Clifford:

Gate	Cirq class	Notes
CZ	`cirq.CZ`	Symmetric two-qubit gate; equivalent to CNOT with Hadamards on the target
Pauli X	`cirq.X`	Bit flip; trivially Clifford since it is itself a Pauli
Pauli Y	`cirq.Y`	Bit and phase flip
Pauli Z	`cirq.Z`	Phase flip; equals S^2
SWAP	`cirq.SWAP`	Decomposes into three CNOTs
ISWAP	`cirq.ISWAP`	Common on Google hardware; Clifford

Any circuit composed exclusively of these gates is a Clifford circuit. Notably absent: T gates, arbitrary rotations like RZ(theta) for generic theta, and Toffoli. Adding any non-Clifford gate makes classical simulation intractable in general, which is precisely why T gates are the resource that enables universal quantum computation.

Stabilizer State Representation

Instead of storing 2^n amplitudes, the stabilizer formalism tracks n stabilizer generators, each of which is an n-qubit Pauli string (a tensor product of single-qubit Pauli operators I, X, Y, Z with an overall sign of +1 or -1). The quantum state |psi> is the unique state stabilized by all generators, meaning S|psi> = +|psi> for each generator S.

Worked Example: Building a 3-Qubit Entangled State

Let us walk through a concrete example to see how stabilizers evolve under Clifford gates.

Step 1: Start with |000>

The state |000> is stabilized by three independent generators:

+Z_0 I_1 I_2   (qubit 0 is in |0>, eigenvalue +1 of Z)
+I_0 Z_1 I_2   (qubit 1 is in |0>)
+I_0 I_1 Z_2   (qubit 2 is in |0>)

Written compactly: {+ZII, +IZI, +IIZ}.

Step 2: Apply H to qubit 0

The Hadamard on qubit 0 conjugates Z_0 to X_0 (and leaves operators on other qubits unchanged). The stabilizer Z_0 becomes X_0. The state is now |+00>, and the stabilizers are:

{+XII, +IZI, +IIZ}

This makes sense: |+> is the +1 eigenstate of X, and qubits 1 and 2 remain in |0>.

Step 3: Apply CNOT from qubit 0 (control) to qubit 1 (target)

Using the CNOT conjugation rules:

X_0 spreads to become X_0 X_1. So the first stabilizer +XII becomes +XXI.
Z_1 picks up Z_0 and becomes Z_0 Z_1. So the second stabilizer +IZI becomes +ZZI.
The third stabilizer +IIZ has no operators on qubits 0 or 1, so it is unchanged.

The stabilizers are now:

{+XXI, +ZZI, +IIZ}

The state is (|000> + |110>) / sqrt(2). You can verify: +XXI stabilizes this state because X_0 X_1 flips both qubits 0 and 1, mapping |000> to |110> and vice versa. And +ZZI stabilizes it because both terms have qubits 0 and 1 matching (both 0 or both 1), giving eigenvalue +1.

Step 4: Apply CNOT from qubit 0 (control) to qubit 2 (target)

+XXI: X_0 spreads to X_0 X_2, so this becomes +XXI with an extra X_2, giving +X_0 X_1 X_2 = +XXX.
+ZZI: no X on the control or Z on the target, so this is unchanged at +ZZI.
+IIZ: Z on target picks up Z on control, giving +Z_0 Z_2 = +ZIZ.

Wait, let us be more careful. The CNOT is from qubit 0 to qubit 2.

For stabilizer +XXI = X_0 X_1 I_2:

X_0 (on the control) maps to X_0 X_2 under CNOT(0,2). So X_0 X_1 I_2 becomes X_0 X_2 X_1 I_2… no, we must apply the conjugation to each single-qubit factor independently. X_0 maps to X_0 X_2. X_1 is not on the control or target of this CNOT, so it stays as X_1. I_2 stays as identity on qubit 2, but we already have X_2 from the spread of X_0. So the result is X_0 X_1 X_2 = +XXX.

For stabilizer +ZZI = Z_0 Z_1 I_2:

Z_0 (on the control) stays as Z_0. Z_1 is unchanged. I_2 is unchanged. Result: +ZZI.

For stabilizer +IIZ = I_0 I_1 Z_2:

Z_2 (on the target) maps to Z_0 Z_2. So the result is Z_0 I_1 Z_2 = +ZIZ.

Final stabilizers:

{+XXX, +ZZI, +ZIZ}

The state is the 3-qubit GHZ state: (|000> + |111>) / sqrt(2). You can confirm that +XXX stabilizes this state (flipping all three qubits maps |000> to |111> and back) and that +ZZI and +ZIZ check parity between qubit pairs.

This entire calculation tracked three Pauli strings of length 3. The equivalent statevector approach would track 2^3 = 8 complex amplitudes. For n = 100 qubits, the stabilizer approach tracks 100 Pauli strings of length 100 (10,000 bits), while the statevector would require 2^100 complex numbers, roughly 10^30 bytes.

Tableau Representation

Under the hood, Cirq’s CliffordSimulator stores the stabilizer state as a stabilizer tableau, a binary matrix that encodes all the stabilizer generators compactly.

For n qubits, the tableau is a binary matrix with 2n rows and 2n+1 columns:

Rows 0 to n-1: the n stabilizer generators (these define the quantum state)
Rows n to 2n-1: the n destabilizer generators (these are needed for efficient simulation of measurements but do not define the state)
Columns 0 to n-1: the X part of the Pauli string (1 if X or Y is present on that qubit, 0 otherwise)
Columns n to 2n-1: the Z part of the Pauli string (1 if Z or Y is present on that qubit, 0 otherwise)
Column 2n: the phase bit (0 for +1 phase, 1 for -1 phase)

A single-qubit Pauli on qubit k is encoded as:

Pauli	X bit (column k)	Z bit (column n+k)
I	0	0
X	1	0
Z	0	1
Y	1	1

For our 3-qubit GHZ example with stabilizers {+XXX, +ZZI, +ZIZ}, the stabilizer portion of the tableau (rows 0-2) looks like:

       X0 X1 X2 | Z0 Z1 Z2 | phase
  S0:   1  1  1 |  0  0  0 |   0     (+XXX)
  S1:   0  0  0 |  1  1  0 |   0     (+ZZI)
  S2:   0  0  0 |  1  0  1 |   0     (+ZIZ)

Each Clifford gate transforms this matrix by applying simple row operations:

Hadamard on qubit k: swap columns k and n+k (swap the X and Z bits), then update the phase column to account for Y sign flips.
S gate on qubit k: XOR column n+k with column k (Z bit gets toggled wherever X bit is 1), update phases.
CNOT from control c to target t: XOR column t with column c (X spreads forward), XOR column n+c with column n+t (Z spreads backward), update phases.

Each of these operations touches at most 2n+1 entries in a row, applied across 2n rows, giving O(n) time per gate applied to the full tableau. This is the core of the polynomial-time simulation.

You can inspect the tableau directly in Cirq:

import cirq

qubits = cirq.LineQubit.range(3)
circuit = cirq.Circuit([
    cirq.H(qubits[0]),
    cirq.CNOT(qubits[0], qubits[1]),
    cirq.CNOT(qubits[0], qubits[2]),
])

sim = cirq.CliffordSimulator()
result = sim.simulate(circuit)

# Print the stabilizer state (rendered as CH-form superposition)
print(result.final_state)

Using CliffordSimulator in Cirq

import cirq
import time

def ghz_circuit(n):
    """Create an n-qubit GHZ state: (|00...0> + |11...1>) / sqrt(2)."""
    qubits = cirq.LineQubit.range(n)
    circuit = cirq.Circuit()
    circuit.append(cirq.H(qubits[0]))
    for i in range(1, n):
        circuit.append(cirq.CNOT(qubits[0], qubits[i]))
    return circuit, qubits

# Simulate a 100-qubit GHZ state
n = 100
circuit, qubits = ghz_circuit(n)
circuit.append(cirq.measure(*qubits, key='result'))

start = time.time()
sim = cirq.CliffordSimulator()
result = sim.run(circuit, repetitions=1000)
elapsed = time.time() - start

print(f"Simulated {n}-qubit GHZ circuit in {elapsed:.3f} seconds")

# Verify: all measurements should be either all-0 or all-1
measurements = result.measurements['result']
for m in measurements:
    assert all(m == m[0]), "GHZ state should produce uniform bitstrings"

n_zeros = sum(1 for m in measurements if m[0] == 0)
n_ones = 1000 - n_zeros
print(f"All-zeros: {n_zeros}, All-ones: {n_ones}")

This typically runs within a few seconds on a standard laptop. A statevector simulation of 100 qubits would require approximately 10^30 bytes of memory.

Inspecting the Stabilizer State

You can inspect the simulated state without measuring:

circuit_no_meas, qubits = ghz_circuit(5)
sim = cirq.CliffordSimulator()
result = sim.simulate(circuit_no_meas)

# Print the state (rendered as a superposition by CliffordSimulator's CH form)
print(result.final_state)

CliffordSimulator uses the Clifford-Heisenberg (CH) form internally, so result.final_state renders as a superposition of computational basis states rather than a list of stabilizer generator strings. For the 5-qubit GHZ circuit above you will see 0.71|00000⟩ + 0.71|11111⟩, confirming the correct entangled state. The underlying stabilizers of the GHZ state are +XXXXX, +ZZI__, +IZZ__, etc., but these are stored in the CH form rather than exposed as a tableau directly.

Benchmarking: Clifford vs Statevector Simulation

The exponential advantage of Clifford simulation becomes strikingly clear when you benchmark it against the default statevector simulator. The following code measures simulation time for random Clifford circuits at increasing qubit counts:

import cirq
import time

def random_clifford_circuit(n_qubits, depth):
    """Generate a random Clifford circuit with the given number of qubits and depth."""
    qubits = cirq.LineQubit.range(n_qubits)
    return cirq.testing.random_circuit(
        qubits=qubits,
        n_moments=depth,
        op_density=0.8,
        gate_domain={cirq.H: 1, cirq.CNOT: 2, cirq.S: 1, cirq.CZ: 2}
    )

qubit_counts = [10, 15, 20, 25, 30, 35, 40]
depth = 50

print(f"{'Qubits':<10} {'Clifford (s)':<15} {'Statevector (s)':<18} {'Speedup':<10}")
print("-" * 55)

for n in qubit_counts:
    circuit = random_clifford_circuit(n, depth)

    # Clifford simulation
    cliff_sim = cirq.CliffordSimulator()
    start = time.time()
    cliff_sim.simulate(circuit)
    cliff_time = time.time() - start

    # Statevector simulation (skip for large qubit counts to avoid memory issues)
    if n <= 25:
        sv_sim = cirq.Simulator()
        start = time.time()
        sv_sim.simulate(circuit)
        sv_time = time.time() - start
        speedup = sv_time / cliff_time if cliff_time > 0 else float('inf')
        print(f"{n:<10} {cliff_time:<15.4f} {sv_time:<18.4f} {speedup:<10.1f}x")
    else:
        print(f"{n:<10} {cliff_time:<15.4f} {'(skipped)':<18} {'N/A':<10}")

Typical results on a modern laptop:

Qubits	Clifford (s)	Statevector (s)	Speedup
10	0.01	0.02	2x
15	0.01	0.05	5x
20	0.02	0.8	40x
25	0.03	25.0	~800x
30	0.04	(out of memory)	N/A
40	0.07	(impossible)	N/A

The Clifford simulator time grows polynomially (roughly as n^2 per gate, so n^2 * depth overall). The statevector simulator time grows exponentially (roughly 2^n). Beyond about 28 qubits, statevector simulation becomes impractical on consumer hardware. The Clifford simulator handles hundreds of qubits without difficulty.

Practical Applications

Randomized Benchmarking

Randomized benchmarking is the standard protocol for measuring the quality of quantum gates on real hardware. The procedure works as follows:

Choose a sequence length m (number of random Clifford gates)
Generate a random sequence of m Clifford gates: C_1, C_2, …, C_m
Compute the inverse gate C_inv = (C_m … C_2 C_1)^(-1) so the ideal circuit maps |0> back to |0>
Run the circuit on hardware and measure the probability of returning to |0> (the “survival probability”)
Repeat for many random sequences at each length m
Plot survival probability vs sequence length and fit an exponential decay

The decay rate directly gives the average error per Clifford gate. Here is a more complete implementation:

import cirq
import numpy as np

def generate_rb_circuit(qubits, depth):
    """
    Generate a single randomized benchmarking circuit.

    Creates a sequence of random Clifford gates followed by an
    inverse gate that should return the system to |0...0>.
    """
    n = len(qubits)

    # Build random Clifford layers
    layers = []
    for _ in range(depth):
        layer = cirq.testing.random_circuit(
            qubits=qubits,
            n_moments=1,
            op_density=0.8,
            gate_domain={cirq.H: 1, cirq.CNOT: 2, cirq.S: 1}
        )
        layers.append(layer)

    # Combine all layers into one circuit
    forward_circuit = cirq.Circuit()
    for layer in layers:
        forward_circuit += layer

    # Use the Clifford simulator to compute the ideal final state
    sim = cirq.CliffordSimulator()
    result = sim.simulate(forward_circuit)

    # Compute the inverse by simulating, then applying the inverse
    # unitary. For Clifford circuits, we can compute the inverse
    # Clifford tableau and decompose it back into gates.
    inverse_circuit = cirq.inverse(forward_circuit)

    # Full RB circuit: forward + inverse + measurement
    rb_circuit = forward_circuit + inverse_circuit
    rb_circuit.append(cirq.measure(*qubits, key='result'))

    return rb_circuit

def run_rb_experiment(n_qubits, depths, sequences_per_depth, repetitions):
    """
    Run a randomized benchmarking experiment using the Clifford simulator.

    Returns the average survival probability at each depth.
    """
    qubits = cirq.LineQubit.range(n_qubits)
    sim = cirq.CliffordSimulator()
    survival_probs = []

    for depth in depths:
        survivals = []
        for _ in range(sequences_per_depth):
            circuit = generate_rb_circuit(qubits, depth)
            result = sim.run(circuit, repetitions=repetitions)
            measurements = result.measurements['result']
            # Survival = fraction of shots returning all zeros
            n_survived = sum(
                1 for m in measurements if np.all(m == 0)
            )
            survivals.append(n_survived / repetitions)
        survival_probs.append(np.mean(survivals))

    return survival_probs

# Run a small RB experiment (ideal, no noise)
depths = [1, 5, 10, 20, 50, 100]
survival_probs = run_rb_experiment(
    n_qubits=2,
    depths=depths,
    sequences_per_depth=20,
    repetitions=100
)

print("Randomized Benchmarking Results (ideal simulation)")
print(f"{'Depth':<10} {'Survival Prob':<15}")
print("-" * 25)
for d, p in zip(depths, survival_probs):
    print(f"{d:<10} {p:<15.4f}")

# In the ideal case, survival probability should be 1.0 at all depths
# On real hardware, it decays as p(m) = A * r^m + B
# where r is the depolarizing parameter related to the error per Clifford

In a noise-free simulation, the survival probability remains 1.0 at all depths. On real hardware (or in a noisy simulation), the survival probability decays exponentially with depth. Fitting this decay curve to p(m) = A * r^m + B yields the depolarizing parameter r, from which the average error per Clifford is e = (1 - r)(1 - 1/2^n) where n is the number of qubits.

Noise Characterization

Interleave Clifford circuits with noisy channels to measure how specific error types propagate through stabilizer states. The classical simulation gives the ideal baseline for comparison.

# Generate a random 10-qubit Clifford circuit
n = 10
qubits = cirq.LineQubit.range(n)
random_circuit = cirq.testing.random_circuit(
    qubits=qubits,
    n_moments=50,
    op_density=0.8,
    gate_domain={cirq.H: 1, cirq.CNOT: 2, cirq.S: 1, cirq.CZ: 2}
)

sim = cirq.CliffordSimulator()
result = sim.simulate(random_circuit)
print(f"Simulated {n}-qubit, 50-moment random Clifford circuit")

Quantum Error Correction: 3-Qubit Bit-Flip Code

The stabilizer formalism is the mathematical foundation of all stabilizer codes. Every stabilizer code is defined by a set of commuting Pauli operators (the stabilizers of the code), and the encoding, syndrome extraction, and correction operations are all Clifford circuits. Being able to simulate these circuits efficiently at scale is critical for error correction research.

Here is a minimal example of the 3-qubit bit-flip code, which protects against single X (bit-flip) errors:

import cirq
import numpy as np

def three_qubit_bitflip_code():
    """
    Demonstrate the 3-qubit bit-flip code using Clifford simulation.

    Code stabilizers: Z0 Z1 and Z1 Z2
    Logical qubit: encoded across 3 physical qubits
    Correctable errors: any single X (bit-flip) error
    """
    # Data qubits and syndrome (ancilla) qubits
    data = cirq.LineQubit.range(3)
    ancilla = cirq.LineQubit.range(3, 5)

    # === ENCODING ===
    # Encode |0> into logical |0_L> = |000>
    # Encode |1> into logical |1_L> = |111>
    # Encoding circuit: CNOT from qubit 0 to qubits 1, 2
    encode = cirq.Circuit([
        cirq.CNOT(data[0], data[1]),
        cirq.CNOT(data[0], data[2]),
    ])

    # To encode |+> state, start with H on data[0]
    prep = cirq.Circuit([cirq.H(data[0])])

    # === ERROR ===
    # Introduce a bit-flip error on qubit 1
    error = cirq.Circuit([cirq.X(data[1])])

    # === SYNDROME EXTRACTION ===
    # Measure the code stabilizers Z0*Z1 and Z1*Z2
    # These are Clifford measurements using ancilla qubits
    syndrome = cirq.Circuit([
        # Measure Z0*Z1 using ancilla[0]
        cirq.CNOT(data[0], ancilla[0]),
        cirq.CNOT(data[1], ancilla[0]),
        # Measure Z1*Z2 using ancilla[1]
        cirq.CNOT(data[1], ancilla[1]),
        cirq.CNOT(data[2], ancilla[1]),
    ])

    # Measure the ancillas to get syndrome bits
    syndrome.append(cirq.measure(ancilla[0], key='s01'))
    syndrome.append(cirq.measure(ancilla[1], key='s12'))

    # === FULL CIRCUIT ===
    full_circuit = prep + encode + error + syndrome

    print("3-Qubit Bit-Flip Code Circuit:")
    print(full_circuit)

    # Simulate with CliffordSimulator
    sim = cirq.CliffordSimulator()
    result = sim.run(full_circuit, repetitions=10)

    s01 = result.measurements['s01']
    s12 = result.measurements['s12']

    print(f"\nSyndrome s01 (Z0*Z1 parity): {s01[0][0]}")
    print(f"Syndrome s12 (Z1*Z2 parity): {s12[0][0]}")

    # Decode the syndrome
    # s01=1, s12=1 -> error on qubit 1
    # s01=1, s12=0 -> error on qubit 0
    # s01=0, s12=1 -> error on qubit 2
    # s01=0, s12=0 -> no error
    syndrome_table = {
        (0, 0): "No error detected",
        (1, 0): "Error on qubit 0",
        (1, 1): "Error on qubit 1",
        (0, 1): "Error on qubit 2",
    }
    syndrome_key = (int(s01[0][0]), int(s12[0][0]))
    print(f"Diagnosis: {syndrome_table[syndrome_key]}")
    print("(We introduced an X error on qubit 1, so the syndrome correctly identifies it.)")

three_qubit_bitflip_code()

The key insight is that the syndrome extraction circuit is entirely Clifford: it uses only CNOTs and computational basis measurements. This means you can simulate error correction for large codes (surface codes with hundreds of data qubits and ancillas) using the CliffordSimulator. In practice, researchers use Clifford simulation to:

Verify that syndrome extraction circuits are correct before deploying to hardware
Measure logical error rates under different noise models by combining Clifford simulation with classical error injection
Optimize the layout and scheduling of syndrome extraction circuits for specific hardware topologies
Test decoders (the classical algorithms that interpret syndromes and choose corrections) at scale

Common Mistakes

Adding non-Clifford gates

The most common mistake is including a T gate, Toffoli gate, or arbitrary rotation like cirq.rz(theta) where theta is not a multiple of pi/2 in a circuit intended for CliffordSimulator. The simulator will raise an error:

import cirq

q = cirq.LineQubit(0)
circuit = cirq.Circuit([cirq.H(q), cirq.T(q)])  # T is NOT Clifford

sim = cirq.CliffordSimulator()
# This will raise TypeError because T is not a Clifford gate
try:
    sim.simulate(circuit)
except TypeError as e:
    print(f"Error: {e}")

If your circuit is “mostly Clifford” with a few non-Clifford gates, you must use the full statevector simulator (cirq.Simulator()) or explore stabilizer rank decomposition methods.

Confusing “Clifford circuit” with “Clifford group element”

Every element of the n-qubit Clifford group can be implemented as a circuit of Clifford gates, and conversely every Clifford circuit implements some Clifford group element. However, not every Clifford circuit is depth-efficient. Two circuits can implement the same Clifford group element but have very different depths and gate counts. For randomized benchmarking, what matters is the group element (since you need to compute its inverse), not the particular circuit decomposition. Cirq’s cirq.CliffordGate class represents Clifford group elements as tableaus, making inversion O(n^2) regardless of the original circuit depth.

Mid-circuit measurement with classical control

The Gottesman-Knill theorem does cover classical feed-forward (measuring in the middle of a circuit and choosing subsequent gates based on the outcome). However, CliffordSimulator in Cirq has limited support for this. If you need mid-circuit measurements with classically controlled Clifford gates, you may need to:

Defer measurements to the end of the circuit where possible (using the deferred measurement principle)
Manually branch your simulation based on measurement outcomes
Use cirq.DensityMatrixSimulator for mixed-state simulation that naturally handles measurement

# This works: measurement at the end
q = cirq.LineQubit.range(2)
circuit = cirq.Circuit([
    cirq.H(q[0]),
    cirq.CNOT(q[0], q[1]),
    cirq.measure(q[0], key='a'),
    cirq.measure(q[1], key='b'),
])
sim = cirq.CliffordSimulator()
result = sim.run(circuit, repetitions=100)  # Works fine

Using hardware-specific non-Clifford gates

Google’s Sycamore processor uses cirq.FSimGate(theta, phi) as its native two-qubit gate. FSimGate is Clifford only for specific parameter values (for example, FSimGate(pi/2, 0) is equivalent to ISWAP, which is Clifford). For general parameter values, it is not Clifford. If you are working with cirq_google device circuits, check that your gates fall within the Clifford subset before using CliffordSimulator.

Similarly, cirq.PhasedXPowGate is Clifford only when the exponent is a multiple of 0.5 and the phase exponent is a multiple of 0.5. Always verify that your specific gate parameters produce a Clifford gate.

Limitations

Clifford simulation cannot handle circuits containing T gates, arbitrary angle rotations, or Toffoli gates. If even a single non-Clifford gate appears, you must fall back to statevector or density matrix simulation. For circuits that are “mostly Clifford” with a few T gates, look into stabilizer rank methods or the approach used by Cirq’s StabilizerSampler with approximation.

Measurement in the Clifford simulator is also efficient: projecting onto a computational basis state updates the tableau in O(n^2) time, which is why sampling 1000 shots of a 100-qubit circuit takes seconds rather than millennia.

One additional limitation to keep in mind: while the Clifford simulator can handle very large qubit counts, the O(n^2) cost per gate means that circuits with both many qubits and many gates can still take noticeable time. A 1000-qubit circuit with 10,000 gates will require on the order of 10^10 operations, which takes minutes rather than seconds. For extremely large simulations, consider using optimized C/C++ Clifford simulators like Stim (by Craig Gidney at Google), which is orders of magnitude faster than Cirq’s Python-based implementation for large-scale error correction simulations.