- Fundamentals
- Also: quantum benchmarking
- Also: QPU benchmarking
Quantum Benchmark
A standardized method for characterizing quantum processor performance, encompassing metrics like quantum volume, randomized benchmarking, CLOPS, and cross-entropy benchmarking.
Quantum benchmarks are standardized tests that measure the performance of quantum processors across multiple dimensions: gate quality, circuit execution capability, speed, and algorithmic utility. No single benchmark captures all relevant aspects of quantum hardware performance, which is why the field has developed a suite of complementary metrics. Understanding what each benchmark measures, and what it does not measure, is essential for evaluating quantum computing hardware claims.
Why benchmarking is hard
Classical computers are benchmarked by running well-defined workloads (SPEC, LINPACK, MLPerf) where the correct output is known. Quantum benchmarking faces unique challenges:
- Verification: For large quantum circuits, the correct output cannot be computed classically (that is the whole point of quantum computing). Benchmarks must be designed so that either the output is classically verifiable or statistical properties of the output can be checked.
- Multi-dimensional performance: A quantum processor’s capability depends on qubit count, gate fidelity, connectivity, coherence time, measurement accuracy, and classical control speed. A single number inevitably compresses these dimensions.
- Noise structure matters: Two processors with the same average gate fidelity may behave very differently if one has correlated errors and the other has independent errors.
Key benchmarks
Quantum Volume (QV)
Developed by IBM in 2019, quantum volume measures the largest square random circuit (width n, depth n) that a processor executes with heavy output probability above 2/3. A QV of 2^n means the processor reliably handles n-qubit, depth-n circuits. QV captures the interplay between qubit count, connectivity, and gate fidelity in a single number, but it saturates once a processor can run deep circuits reliably, and it does not measure speed.
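The heavy-output test at the core of QV can be sketched in a few lines. Everything below is illustrative: the function names are ours and the two-qubit distribution is a toy example, but the acceptance criterion (mean heavy-output fraction above 2/3 across many random circuits) is the published one.

```python
import numpy as np

def heavy_outputs(ideal_probs):
    """Bitstrings whose ideal probability exceeds the median (the 'heavy' set)."""
    median = np.median(ideal_probs)
    return {x for x, p in enumerate(ideal_probs) if p > median}

def heavy_output_fraction(measured_counts, ideal_probs):
    """Fraction of measured shots that land in the heavy-output set."""
    heavy = heavy_outputs(ideal_probs)
    shots = sum(measured_counts.values())
    return sum(c for x, c in measured_counts.items() if x in heavy) / shots

# Toy 2-qubit example: ideal distribution from classical simulation,
# counts from (hypothetical) hardware. The processor passes at this width
# if the mean heavy-output fraction over many random circuits exceeds 2/3.
ideal = np.array([0.05, 0.45, 0.10, 0.40])   # heavy set: {1, 3}
counts = {0: 80, 1: 420, 2: 90, 3: 410}      # 1000 shots
print(heavy_output_fraction(counts, ideal))  # 0.83
```

Note that computing the heavy set requires classically simulating the ideal circuit, which is why QV circuits are limited to widths where simulation is feasible.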
Randomized Benchmarking (RB)
Randomized benchmarking measures average gate fidelity by applying random sequences of Clifford gates followed by an inversion gate and measuring how the survival probability decays with sequence length. The decay rate gives the error per Clifford gate. RB is widely used because it is robust against state preparation and measurement (SPAM) errors, but it only characterizes Clifford gates, not the full native gate set.
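The decay fit at the heart of RB is a simple exponential curve fit. A minimal sketch with synthetic, noiseless data (the sequence lengths and decay parameters here are invented for illustration, not from real hardware):

```python
import numpy as np
from scipy.optimize import curve_fit

def rb_decay(m, A, p, B):
    """Standard RB model: survival probability A * p**m + B after m Cliffords."""
    return A * p ** m + B

# Synthetic survival probabilities (assumed values, noiseless for clarity)
lengths = np.array([1, 5, 10, 20, 50, 100, 200])
survival = rb_decay(lengths, 0.5, 0.995, 0.5)

(A, p, B), _ = curve_fit(rb_decay, lengths, survival, p0=[0.5, 0.99, 0.5])

# Error per Clifford for a single qubit (dimension d = 2): r = (1 - p)(d - 1)/d
d = 2
r = (1 - p) * (d - 1) / d
print(f"p = {p:.4f}, error per Clifford = {r:.2e}")
```

Because the model fits only the decay rate p, the SPAM-dependent constants A and B drop out of the error estimate, which is exactly why RB is robust to state preparation and measurement errors.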
Cross-Entropy Benchmarking (XEB)
Cross-entropy benchmarking compares the output distribution of a quantum circuit against the ideal distribution computed via classical simulation. The linear cross-entropy fidelity quantifies how well the quantum processor reproduces the correct distribution. Google used XEB in their 2019 quantum supremacy experiment. XEB works for non-Clifford circuits and can probe regimes where classical simulation is infeasible, but it requires careful statistical analysis and is sensitive to certain types of correlated errors.
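The linear XEB fidelity is F = 2^n * <P_ideal(x)> - 1, where the average is over measured bitstrings x. A toy sketch (the two-qubit distribution is invented; note that F approaches 1 only when the ideal probabilities follow the Porter-Thomas distribution typical of deep random circuits, while a fully depolarized device gives F near 0):

```python
import numpy as np

def linear_xeb_fidelity(ideal_probs, samples, n_qubits):
    """Linear XEB: 2^n times the mean ideal probability of the sampled
    bitstrings, minus 1. Near 0 for uniformly random (depolarized) output."""
    return (2 ** n_qubits) * np.mean([ideal_probs[x] for x in samples]) - 1

# Toy 2-qubit illustration with an assumed ideal distribution
ideal = np.array([0.1, 0.4, 0.1, 0.4])
perfect = np.random.default_rng(0).choice(4, size=100_000, p=ideal)  # ideal sampler
noisy = np.random.default_rng(1).integers(0, 4, size=100_000)        # uniform noise

print(linear_xeb_fidelity(ideal, perfect, 2))  # about 0.36 (= 4*sum(p^2) - 1 here)
print(linear_xeb_fidelity(ideal, noisy, 2))    # about 0
```

In the supremacy regime the ideal probabilities cannot be computed for the full circuit, so F is estimated by extrapolation from smaller or simplified circuits, which is where the careful statistical analysis comes in.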
CLOPS (Circuit Layer Operations Per Second)
CLOPS measures the speed at which a processor executes parameterized circuits, accounting for the full stack: compilation, data transfer, quantum execution, and result retrieval. It captures the practical throughput for variational algorithms that require many circuit evaluations. A processor with excellent gate fidelity but slow classical control may have high QV but low CLOPS.
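IBM's published CLOPS procedure runs M = 100 circuit templates through K = 10 parameter updates at S = 100 shots each, with D = log2(QV) layers per circuit, and divides the total operations by the elapsed wall-clock time. A back-of-the-envelope sketch (the elapsed time below is hypothetical):

```python
# Parameters from IBM's published CLOPS benchmark setup (Wack et al., 2021);
# the elapsed time is a made-up illustration, not a measured value.
M = 100                   # number of circuit templates
K = 10                    # parameter updates per template
S = 100                   # shots per circuit
QV = 128
D = QV.bit_length() - 1   # number of layers: log2(128) = 7

elapsed_seconds = 70.0    # hypothetical wall-clock time for the whole workload

clops = (M * K * S * D) / elapsed_seconds
print(clops)  # 10000.0
```

Because the clock includes compilation, parameter binding, and result retrieval, CLOPS penalizes a slow classical control stack even when the quantum execution itself is fast.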
Algorithmic benchmarks
Some benchmarks focus on performance on specific algorithms:
- Mirror circuits: Circuits designed to return a known output state, allowing verification of deep circuit execution.
- Application-oriented benchmarks: Running specific subroutines (Hamiltonian simulation, QAOA, etc.) and measuring solution quality.
- Volumetric benchmarks: Mapping the full width-by-depth space of circuits a processor can execute successfully, rather than collapsing to a single number.
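A volumetric benchmark amounts to a pass/fail table over the width-by-depth plane. A toy rendering, with invented success rates and a 2/3 pass threshold:

```python
# Toy volumetric-benchmark table: mark each (width, depth) cell as passing (#)
# or failing (.) a success threshold. All success rates here are invented.
success_rate = {
    (2, 2): 0.98, (2, 4): 0.95, (2, 8): 0.90,
    (4, 2): 0.92, (4, 4): 0.81, (4, 8): 0.62,
    (6, 2): 0.85, (6, 4): 0.60, (6, 8): 0.35,
}
threshold = 2 / 3

rows = []
for width in (2, 4, 6):
    row = "".join("#" if success_rate[(width, d)] > threshold else "." for d in (2, 4, 8))
    rows.append(row)
    print(f"w={width}: {row}")
```

The resulting frontier (wide-and-shallow versus narrow-and-deep) conveys information that a single QV number collapses away.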
Interpreting benchmark results
When evaluating benchmark claims:
- Compare like with like. QV measured with all compiler optimizations enabled is not directly comparable to QV measured with a restricted gate set.
- Check the error model. Some benchmarks assume depolarizing noise; real hardware may have biased or correlated noise that the benchmark does not capture.
- Consider the application. A high QV does not guarantee good performance on a specific algorithm. The algorithm’s structure (circuit depth, connectivity requirements, sensitivity to specific error types) may not align with what QV measures.
- Look at the full picture. A processor with QV 128, CLOPS 10,000, and a known two-qubit gate fidelity tells a richer story than any single number.
Why it matters for learners
Quantum benchmarks are the language through which hardware progress is communicated. Every major quantum computing announcement includes benchmark numbers, and the ability to critically evaluate these claims separates informed observers from those swayed by marketing. Understanding benchmarks also reveals what the field considers important: the shift from “qubit count” to “quantum volume” to “algorithmic benchmarks” reflects the maturing understanding that quantum computing capability is multidimensional.