• Hardware
  • Also: RB
  • Also: Clifford randomized benchmarking

Randomized Benchmarking (RB)

Randomized benchmarking is a scalable protocol for estimating average gate error rates by running random sequences of Clifford gates of varying length and fitting the exponential decay of survival probability to an error-per-Clifford rate.

The randomized benchmarking protocol works as follows. A random sequence of m Clifford gates is drawn uniformly, a final Clifford gate that inverts the entire sequence is appended, and the circuit is executed on the device. If the gates were perfect, the circuit would act as the identity and the output would always be the initial all-zero state; any deviation is due to gate errors. Repeating this for many random sequences of the same length m and averaging the probability of returning to the zero state gives the sequence fidelity F(m). The experiment is repeated for multiple values of m, producing a dataset of (m, F(m)) pairs. Under a depolarizing noise model these follow an exponential decay F(m) = A * p^m + B, where A and B absorb state preparation and measurement (SPAM) errors and p is the depolarizing parameter per Clifford gate.
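The fitting step can be sketched with synthetic data. Everything below is illustrative: the "true" decay parameter, SPAM amplitudes, sequence lengths, and noise level are assumed values standing in for averaged hardware results, not measurements from any device.

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed ground-truth values for a synthetic RB experiment:
# p_true is the per-Clifford depolarizing parameter; A and B absorb SPAM.
rng = np.random.default_rng(0)
p_true, A_true, B_true = 0.995, 0.48, 0.50

# Sequence lengths m at which the (simulated) survival probability is measured.
lengths = np.array([1, 5, 10, 20, 50, 100, 200])

# Averaged sequence fidelities F(m) = A * p^m + B, plus a little shot noise.
F = A_true * p_true**lengths + B_true + rng.normal(0, 0.002, lengths.size)

def decay(m, A, p, B):
    """Exponential decay model fit to the (m, F(m)) data."""
    return A * p**m + B

(A, p, B), _ = curve_fit(decay, lengths, F, p0=[0.5, 0.99, 0.5])
print(f"fitted depolarizing parameter p = {p:.4f}")
```

Because A and B soak up the SPAM contribution, the fitted p reflects only the per-Clifford error, which is the point of the protocol.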

Fitting the exponential decay yields the error per Clifford (EPC), defined as EPC = (1 - p) * (d - 1) / d where d = 2^n for n qubits. The EPC is a single, hardware-level figure of merit that characterizes average gate quality independent of SPAM errors, making it more reliable than raw fidelity measurements from process tomography. The Clifford group is the natural choice for RB sequences because it is closed (composing Clifford gates gives another Clifford), can be compiled efficiently, and supports the mathematical structure (unitary 2-design) required to make the exponential decay model exact under general Markovian noise.
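The EPC formula above translates directly into code. This is a minimal sketch; the example value of p is hypothetical.

```python
def error_per_clifford(p, n_qubits=1):
    """EPC = (1 - p) * (d - 1) / d, with d = 2**n for n qubits."""
    d = 2 ** n_qubits
    return (1 - p) * (d - 1) / d

# A hypothetical single-qubit fit of p = 0.998 gives an EPC near 1e-3.
print(error_per_clifford(0.998, n_qubits=1))
```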

Interleaved randomized benchmarking (IRB) extends the protocol to characterize individual gates. A target gate G is interleaved after each random Clifford in the sequence, and the decay parameter of this interleaved experiment, p_int, is compared to the reference decay parameter p_ref from standard RB. The ratio p_int / p_ref yields an estimate of the error rate of gate G alone, largely disentangled from the errors of the surrounding Cliffords. IRB is the standard method for reporting single-qubit and two-qubit gate fidelities on real hardware. A known limitation of RB is that it is largely insensitive to coherent (unitary) errors, which average out across random sequences but can still cause errors in structured circuits; coherent errors can therefore make RB look better than the actual performance of a specific algorithm.

Randomized benchmarking results feed directly into quantum volume calculations. IBM’s quantum volume metric combines gate fidelity (as captured by EPC), qubit connectivity, and circuit depth into a single benchmark that measures the largest square circuit a device can run reliably. A device with lower EPC can support deeper circuits and thus achieve higher quantum volume. RB has become the de facto standard for comparing gate quality across platforms and vendors, with published EPC values for superconducting, trapped-ion, and neutral-atom systems. As hardware matures, more refined variants such as character benchmarking, cycle benchmarking, and unitarity benchmarking extend the framework to capture additional error channels beyond average depolarization.