Grover's Algorithm with Multiple Solutions

Review: Grover’s Algorithm for a Single Solution

Before tackling the multi-solution case, let’s establish the single-solution foundation that everything builds on.

The setup. You have a search space of N = 2^n items, each represented as an n-qubit computational basis state |x>. Somewhere in this space, exactly one item x* satisfies some property f(x*) = 1, and f(x) = 0 for all other x. Classically, finding x* requires O(N) queries. Grover’s algorithm finds it in O(sqrt(N)) queries, a quadratic speedup.

The oracle. The oracle O_f encodes the function f as a phase flip:

O_f|x> = -|x>    if f(x) = 1
O_f|x> = |x>     if f(x) = 0

The oracle does not “look at” the state or collapse it. It applies a conditional phase of -1 to the marked state within a superposition, leaving all other amplitudes untouched.

The diffusion operator. The diffusion operator D performs “inversion about the mean.” It reflects every amplitude around the average amplitude of all basis states. After the oracle has reduced the marked state’s amplitude (by flipping its sign), the diffusion operator pushes that amplitude above the new mean while pushing all others below it. Together, O_f followed by D forms one Grover iteration:

G = D * O_f

The math. Define theta by sin(theta) = 1/sqrt(N). After k iterations, the probability of measuring the marked state is:

P(success after k iterations) = sin^2((2k + 1) * theta)

The optimal number of iterations is:

k_opt ~ pi/4 * sqrt(N)

At this point, (2k+1)*theta is close to pi/2, so sin^2 is close to 1, and the success probability approaches 100%.

For 4 qubits (N = 16), the optimal k is about 3. For 10 qubits (N = 1024), it is about 25. The square root scaling is what gives Grover’s algorithm its power.

Single vs Multiple Solutions

The standard presentation of Grover’s algorithm assumes exactly one marked item among N candidates. In many real problems, there are M > 1 solutions. For example, a satisfiability problem may have many satisfying assignments, or a database query may return multiple results.

Grover’s algorithm generalizes naturally to M solutions. The key insight is that the initial amplitude on marked states starts larger when there are more of them, so fewer iterations are needed. The optimal iteration count becomes:

k_opt = round(pi/4 * sqrt(N/M))

With M solutions, the algorithm converges faster because the initial overlap with the “marked subspace” is sqrt(M/N) instead of 1/sqrt(N). The quadratic speedup is maintained: O(sqrt(N/M)) queries instead of the classical O(N/M).

The Geometric Picture

Grover’s algorithm has an elegant geometric interpretation that makes the multi-solution case easy to understand. The entire algorithm plays out in a two-dimensional subspace, regardless of how many qubits you use.

Define two special states:

|marked>   = (1/sqrt(M)) * sum of all |x> where f(x) = 1
|unmarked> = (1/sqrt(N-M)) * sum of all |x> where f(x) = 0

These are the uniform superpositions over marked and unmarked states respectively. They are orthogonal and span a 2D plane. The initial state H^n|0> (the uniform superposition over all N states) lies in this plane:

H^n|0> = sqrt(M/N) * |marked> + sqrt((N-M)/N) * |unmarked>

The angle theta between H^n|0> and the |unmarked> axis satisfies:

sin(theta) = sqrt(M/N)

Each Grover iteration rotates the state by 2*theta toward the |marked> axis. After k iterations, the state makes an angle (2k+1)*theta with the |unmarked> axis.

Here is the geometry:

  |marked>
       |        . /  state after k iterations
       |      .  /
       |    .   /   angle = (2k+1)*theta
       |  .    /
       | .    /
       |.  __/ <-- initial state H^n|0>
       |  /
       | / angle = theta
       |/_____________________________ |unmarked>

The optimal k makes (2k+1)*theta = pi/2, which rotates the state exactly onto |marked>. At that point, measuring yields a marked state with certainty.

This picture immediately shows two things:

More solutions (larger M) means a larger starting angle theta, so fewer rotations are needed.
Too many rotations will overshoot past |marked> and decrease the success probability.

Why M > N/4 Breaks the Algorithm

When M/N > 1/4, the starting angle theta exceeds pi/6. A single Grover iteration rotates by 2*theta > pi/3, which can overshoot the |marked> axis. In this regime, measuring the uniform superposition directly already gives each marked state with probability M/N > 1/4, which is already a decent success probability.

The following code compares the success probability with and without Grover’s algorithm for various M values:

import numpy as np

def success_prob_no_grover(N, M):
    return M / N

def success_prob_optimal_grover(N, M):
    theta = np.arcsin(np.sqrt(M / N))
    k = round((np.pi / 2 - theta) / (2 * theta))
    return np.sin((2 * k + 1) * theta) ** 2

# For M = N/2 (half solutions):
N = 16
for M in [1, 2, 4, 6, 8]:
    print(f"M={M}: no-grover P={success_prob_no_grover(N,M):.3f}, "
          f"optimal Grover P={success_prob_optimal_grover(N,M):.3f}")

The output shows that for M=8 (half of N=16), the no-Grover probability is already 0.500, and Grover’s optimal iteration count drops to 0 or 1. The algorithm is most valuable when M is small relative to N.

Over-Rotation: The “Overcooking” Problem

If you apply too many Grover iterations, the state rotates past |marked> and the success probability drops. This is one of the most common mistakes when using Grover’s algorithm in practice.

import numpy as np

N, M = 64, 1
theta = np.arcsin(np.sqrt(M / N))

for k in range(1, 15):
    p = np.sin((2 * k + 1) * theta) ** 2
    print(f"k={k:2d}: P(success) = {p:.4f} {'<-- optimal' if k == round((np.pi/2 - theta)/(2*theta)) else ''}")

You will see the success probability rise to nearly 1.0 at the optimal k (around 6), then fall back toward 0, then rise again in an oscillating pattern. This oscillation is a direct consequence of the rotation geometry: the state keeps spinning in the 2D plane, alternately aligning with |marked> and |unmarked>.

This behavior has a critical practical implication: you need to know M (or estimate it) to choose the right number of iterations. If you guess wrong and overshoot, you can actually do worse than random. Quantum counting (covered later in this tutorial) solves this problem.

Multi-Solution Oracle in Qiskit

The oracle for multiple solutions marks each solution independently. For a phase oracle, this means applying a phase flip (-1) for each marked state.

The standard approach uses the X-MCX-X pattern: flip the qubits that should be |0> in the target state, apply a multi-controlled Z gate, then undo the flips. This fires the phase flip only when the qubits match the target bit pattern.

from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator
import numpy as np

def multi_solution_oracle(n_qubits, marked_states):
    """
    Returns a QuantumCircuit that phase-flips all states in marked_states.
    marked_states: list of integers (the marked states as decimals)
    """
    qc = QuantumCircuit(n_qubits)
    for target in marked_states:
        # Convert target to binary string, MSB first
        binary = format(target, f"0{n_qubits}b")
        # Flip 0-bits so the multi-controlled gate fires on the correct state
        for i, bit in enumerate(reversed(binary)):
            if bit == "0":
                qc.x(i)
        # Multi-controlled Z: use H-MCX-H to convert MCX into MCZ
        qc.h(n_qubits - 1)
        qc.mcx(list(range(n_qubits - 1)), n_qubits - 1)
        qc.h(n_qubits - 1)
        # Undo the X flips
        for i, bit in enumerate(reversed(binary)):
            if bit == "0":
                qc.x(i)
    return qc

def diffusion_operator(n_qubits):
    """Standard Grover diffusion operator (inversion about the mean)."""
    qc = QuantumCircuit(n_qubits)
    qc.h(range(n_qubits))
    qc.x(range(n_qubits))
    qc.h(n_qubits - 1)
    qc.mcx(list(range(n_qubits - 1)), n_qubits - 1)
    qc.h(n_qubits - 1)
    qc.x(range(n_qubits))
    qc.h(range(n_qubits))
    return qc

def grover_multi_solution(n_qubits, marked_states):
    N = 2 ** n_qubits
    M = len(marked_states)
    k = max(1, round(np.pi / 4 * np.sqrt(N / M)))
    print(f"N={N}, M={M}, optimal iterations k={k}")

    qc = QuantumCircuit(n_qubits, n_qubits)
    qc.h(range(n_qubits))

    oracle = multi_solution_oracle(n_qubits, marked_states)
    diffusion = diffusion_operator(n_qubits)

    for _ in range(k):
        qc.compose(oracle, inplace=True)
        qc.barrier()
        qc.compose(diffusion, inplace=True)
        qc.barrier()

    qc.measure(range(n_qubits), range(n_qubits))
    return qc

# Search for 3 solutions among 16 states (n=4 qubits)
# Marked states: 3 (0011), 7 (0111), 11 (1011)
marked = [3, 7, 11]
qc = grover_multi_solution(4, marked)

sim = AerSimulator()
result = sim.run(transpile(qc, sim), shots=4096).result()
counts = result.get_counts()

print("\nTop measurement outcomes:")
for state, count in sorted(counts.items(), key=lambda x: -x[1])[:6]:
    decimal = int(state, 2)
    label = " <-- MARKED" if decimal in marked else ""
    print(f"  |{state}> ({decimal:2d}): {count:4d} shots{label}")

Alternative Oracle Implementations

The X-MCX-X pattern above is the most common approach, but there are other ways to build a multi-solution oracle. Each has different trade-offs in gate count, readability, and hardware compatibility.

Phase oracle using multi-controlled phase gates. Instead of converting MCX to MCZ with Hadamard wrappers, you can use the mcp gate to apply a controlled phase rotation directly:

from qiskit import QuantumCircuit
import numpy as np

def phase_oracle(n_qubits, marked_states):
    """Oracle using multi-controlled phase gate (mcp)."""
    qc = QuantumCircuit(n_qubits)
    for target in marked_states:
        binary = format(target, f"0{n_qubits}b")
        # Flip 0-bits to match the target pattern
        for i, bit in enumerate(reversed(binary)):
            if bit == "0":
                qc.x(i)
        # Apply multi-controlled phase: pi phase on target qubit
        qc.mcp(np.pi, list(range(n_qubits - 1)), n_qubits - 1)
        # Undo the X flips
        for i, bit in enumerate(reversed(binary)):
            if bit == "0":
                qc.x(i)
    return qc

# Verify: both oracles should produce the same unitary
from qiskit.quantum_info import Operator

n = 3
marked = [2, 5]
oracle_a = multi_solution_oracle(n, marked)
oracle_b = phase_oracle(n, marked)
print("Unitaries match:", Operator(oracle_a).equiv(Operator(oracle_b)))

Diagonal unitary oracle. For small qubit counts, you can construct the oracle directly as a diagonal matrix where each marked state gets a phase of -1:

from qiskit import QuantumCircuit
from qiskit.quantum_info import Operator
import numpy as np

def diagonal_oracle(n_qubits, marked_states):
    """Oracle using an explicit diagonal unitary matrix."""
    N = 2 ** n_qubits
    phases = np.ones(N)
    for s in marked_states:
        phases[s] = -1
    diag_matrix = np.diag(phases)
    qc = QuantumCircuit(n_qubits)
    qc.unitary(diag_matrix, range(n_qubits), label="Oracle")
    return qc

# Compare gate counts after transpilation
from qiskit import transpile
from qiskit_aer import AerSimulator

sim = AerSimulator()
n = 3
marked = [2, 5]

for name, oracle_fn in [("X-MCX-X", multi_solution_oracle),
                         ("Phase (mcp)", phase_oracle),
                         ("Diagonal", diagonal_oracle)]:
    qc = oracle_fn(n, marked)
    transpiled = transpile(qc, sim, optimization_level=2)
    print(f"{name:15s}: depth={transpiled.depth():3d}, "
          f"gate count={sum(transpiled.count_ops().values()):3d}")

The diagonal oracle is conceptually the simplest but does not decompose efficiently for large qubit counts. The X-MCX-X and phase oracle approaches scale better because they exploit the structure of each marked state individually.

Building Oracles from Boolean Functions

In practice, you often have a boolean predicate f(x) and need to construct the corresponding oracle. Here is a utility that bridges the gap:

import numpy as np

def build_oracle_from_function(n_qubits, f):
    """
    Build a phase oracle for any boolean function f: {0,...,2^n-1} -> {0,1}.
    f(x) = 1 for marked states, 0 otherwise.
    """
    marked = [x for x in range(2**n_qubits) if f(x)]
    return multi_solution_oracle(n_qubits, marked)

# Example: mark all states with even Hamming weight (even number of 1-bits)
def even_hamming(x):
    return bin(x).count('1') % 2 == 0

oracle = build_oracle_from_function(4, even_hamming)
marked_states = [x for x in range(16) if even_hamming(x)]
print(f"Marked states: {marked_states}")  # [0, 3, 5, 6, 9, 10, 12, 15] - 8 states
print(f"Number of solutions M = {len(marked_states)}")

This approach enumerates all 2^n inputs to find the marked ones, which is only practical for small n (up to about 20 qubits). For larger problems, you need to build the oracle circuit directly from the structure of f, using reversible logic gates. The enumeration approach is still useful for testing, prototyping, and verifying that a hand-built oracle is correct.

Running Grover’s with Qiskit’s Built-In GroverOperator

Qiskit provides a GroverOperator class that combines the oracle and diffusion operator into a single reusable block. This is cleaner than manually composing the two pieces and ensures the diffusion operator matches the oracle’s qubit layout:

from qiskit.circuit.library import GroverOperator
from qiskit_aer import AerSimulator
from qiskit import QuantumCircuit, transpile
import numpy as np

# Build oracle circuit for 3 marked states in 4-qubit space
oracle = multi_solution_oracle(4, [3, 7, 11])

# GroverOperator combines oracle + diffusion into one block
grover_op = GroverOperator(oracle)

# Build the full Grover circuit
n_qubits = 4
N = 2 ** n_qubits
M = 3
k = max(1, round(np.pi / 4 * np.sqrt(N / M)))  # optimal iterations

qc = QuantumCircuit(n_qubits)
qc.h(range(n_qubits))  # Initial uniform superposition
for _ in range(k):
    qc.compose(grover_op, inplace=True)
qc.measure_all()

sim = AerSimulator()
counts = sim.run(transpile(qc, sim), shots=4096).result().get_counts()

print(f"Using GroverOperator with k={k} iterations:")
for state, count in sorted(counts.items(), key=lambda x: -x[1])[:5]:
    decimal = int(state, 2)
    marked_label = " <-- MARKED" if decimal in [3, 7, 11] else ""
    print(f"  |{state}> ({decimal:2d}): {count:4d} shots{marked_label}")

The GroverOperator class also handles edge cases like ancilla qubits in the oracle and custom reflection qubits, making it the recommended approach for production code.

The Optimal Iteration Count Derivation

With M solutions, the initial state has overlap sqrt(M/N) with the marked subspace. Using the geometric picture, the angle theta satisfies:

sin(theta) = sqrt(M/N)

After k Grover iterations, the state has rotated to angle (2k+1)*theta from the |unmarked> axis. The probability of measuring a marked state is:

P(success after k iterations) = sin^2((2k + 1) * theta)

This is maximized when (2k+1)*theta = pi/2, giving:

k = (pi/2 - theta) / (2*theta)

For small theta (i.e., M much smaller than N), theta is approximately sqrt(M/N), and this simplifies to:

k ~ pi/4 * sqrt(N/M)

Since k must be an integer, we round to the nearest whole number. The following code computes exact optimal iterations and success probabilities:

import numpy as np

def optimal_iterations(N, M):
    """Compute exact optimal Grover iterations for M solutions in N items."""
    theta = np.arcsin(np.sqrt(M / N))
    k = round((np.pi / 2 - theta) / (2 * theta))
    success_prob = np.sin((2 * k + 1) * theta) ** 2
    return k, success_prob

print("Optimal iterations and success probability:")
print(f"{'N':>6}  {'M':>4}  {'k':>5}  {'P(success)':>12}")
for N in [16, 64, 256]:
    for M in [1, 2, 4, 8]:
        if M < N:
            k, p = optimal_iterations(N, M)
            print(f"{N:>6}  {M:>4}  {k:>5}  {p:>12.4f}")

Notice that doubling M roughly reduces k by a factor of sqrt(2), and the success probability remains above 0.9 in all cases.

Success Probability Analysis Across M Values

To build intuition for how M affects the algorithm, here is a systematic analysis for N=64:

import numpy as np

N = 64
M_values = range(1, 33)
k_vals = []
p_vals = []

print(f"{'M':>4}  {'k_opt':>5}  {'P(success)':>12}  {'P(random)':>10}")
print("-" * 40)

for M in M_values:
    theta = np.arcsin(np.sqrt(M / N))
    k = round((np.pi / 2 - theta) / (2 * theta))
    p = np.sin((2 * k + 1) * theta) ** 2
    k_vals.append(k)
    p_vals.append(p)
    print(f"{M:>4}  {k:>5}  {p:>12.4f}  {M/N:>10.4f}")

The table reveals several patterns:

For M=1, the algorithm needs 6 iterations and achieves P=0.9961.
For M=4, only 3 iterations are needed.
For M=16 (N/4), the optimal k drops to 1.
For M=32 (N/2), k=0 or 1 is optimal, and measuring the initial superposition already gives 50% success.

You can also visualize this relationship:

import matplotlib.pyplot as plt
import numpy as np

N = 64
M_values = range(1, 33)
k_vals = []
p_vals = []

for M in M_values:
    theta = np.arcsin(np.sqrt(M / N))
    k = round((np.pi / 2 - theta) / (2 * theta))
    p = np.sin((2 * k + 1) * theta) ** 2
    k_vals.append(k)
    p_vals.append(p)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(list(M_values), k_vals, 'b-o', markersize=3)
ax1.set_xlabel('M (number of solutions)')
ax1.set_ylabel('Optimal iterations k')
ax1.set_title(f'Grover iterations for N={N}')

ax2.plot(list(M_values), p_vals, 'r-o', markersize=3)
ax2.set_xlabel('M (number of solutions)')
ax2.set_ylabel('Success probability')
ax2.set_title('Success probability at optimal k')
ax2.set_ylim(0.8, 1.02)

plt.tight_layout()
plt.savefig('grover_analysis.png', dpi=150)
plt.show()

Quantum Counting

When M is unknown, you need to estimate it before choosing k. Quantum counting combines Grover’s algorithm with quantum phase estimation (QPE) to estimate M without knowing it in advance.

How it works. The Grover operator G = D * O_f has a specific eigenvalue structure. In the 2D subspace spanned by |marked> and |unmarked>, G has two eigenvalues:

exp(+2i * theta)  and  exp(-2i * theta)

where theta = arcsin(sqrt(M/N)) as before. QPE applied to G estimates the phase 2theta/(2pi), from which we recover theta and then M = N * sin^2(theta).

The counting circuit uses counting_bits ancilla qubits to store the phase estimate. More counting qubits give higher precision in the estimate of M.

from qiskit.circuit.library import GroverOperator, PhaseEstimation
from qiskit.quantum_info import Operator
from qiskit import transpile
from qiskit_aer import AerSimulator
import numpy as np

def quantum_counting_circuit(n_qubits, marked_states, counting_qubits=4):
    """
    Build a quantum counting circuit using PhaseEstimation + GroverOperator.
    counting_qubits: precision of the phase estimate (more = more accurate)
    """
    # Build the oracle
    oracle_qc = multi_solution_oracle(n_qubits, marked_states)

    # Grover operator = oracle + diffusion
    grover_op = GroverOperator(oracle_qc)

    # PhaseEstimation estimates the eigenvalue phase of the Grover operator
    pe = PhaseEstimation(counting_qubits, grover_op)
    return pe

# Example: estimate M for 2 marked states in 4-qubit space
pe_circuit = quantum_counting_circuit(4, marked_states=[3, 7], counting_qubits=5)
print(f"Quantum counting circuit depth: {pe_circuit.depth()}")
print(f"Total qubits: {pe_circuit.num_qubits}")

Decoding the result. After measuring the counting register, the measured integer maps to a phase estimate. Here is how to convert that back to an estimate of M:

import numpy as np

def decode_quantum_counting(counts, N, counting_bits):
    """
    Decode quantum counting measurement results to estimate M.
    
    The measured integer maps to phase phi = measured_int / 2^counting_bits.
    Then theta = phi * pi, and M = N * sin^2(theta).
    """
    most_common = max(counts, key=counts.get)
    # The counting register bits encode the phase
    measured_int = int(most_common, 2)
    phi = measured_int / 2**counting_bits
    theta = phi * np.pi
    M_estimate = N * np.sin(theta)**2
    return round(M_estimate)

# Simulated example showing the decoding logic
# Suppose we measured "00110" (integer 6) from 5 counting qubits
example_counts = {"00110": 500, "11010": 480, "00101": 10, "00111": 10}
N = 16
counting_bits = 5
M_est = decode_quantum_counting(example_counts, N, counting_bits)
print(f"Estimated M = {M_est}")

The accuracy of quantum counting depends on the number of counting qubits. With c counting qubits, the phase estimate has resolution 1/2^c, and the error in M scales as O(sqrt(M) / 2^c). For most practical purposes, c = ceil(log2(N)) + 2 gives a reliable estimate.

Application: Solving a Small SAT Problem with Grover’s Algorithm

Grover’s algorithm is often discussed as a tool for solving NP-complete problems like boolean satisfiability (SAT). While the quadratic speedup alone does not make NP problems efficient, it is a real advantage for problems where the search space is moderate.

Consider a simple 3-SAT instance with 3 boolean variables x0, x1, x2 and two clauses:

Clause 1: x0 OR x1 OR NOT(x2)
Clause 2: NOT(x0) OR x2 OR x1

An assignment satisfies the formula if both clauses are true simultaneously. Let’s find all satisfying assignments classically first, then use Grover to search for them:

import numpy as np
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

def clause1(x0, x1, x2):
    """x0 OR x1 OR NOT(x2)"""
    return x0 or x1 or (not x2)

def clause2(x0, x1, x2):
    """NOT(x0) OR x2 OR x1"""
    return (not x0) or x2 or x1

def sat_formula(x):
    """Evaluate the full SAT formula for integer x (3-bit encoding)."""
    x0 = (x >> 2) & 1  # MSB
    x1 = (x >> 1) & 1
    x2 = x & 1         # LSB
    return clause1(x0, x1, x2) and clause2(x0, x1, x2)

# Find all satisfying assignments
satisfying = [x for x in range(8) if sat_formula(x)]
print(f"Satisfying assignments: {satisfying}")
for x in satisfying:
    bits = format(x, '03b')
    print(f"  x={x} ({bits}): x0={bits[0]}, x1={bits[1]}, x2={bits[2]}")
print(f"Number of solutions M = {len(satisfying)}")

For small instances like this, we can encode the satisfying assignments directly into the oracle and run Grover’s algorithm:

# Build the oracle from the SAT predicate
oracle = multi_solution_oracle(3, satisfying)

# Run Grover's algorithm
N = 8
M = len(satisfying)
theta = np.arcsin(np.sqrt(M / N))
k = max(1, round((np.pi / 2 - theta) / (2 * theta)))
print(f"\nN={N}, M={M}, optimal iterations k={k}")

qc = QuantumCircuit(3, 3)
qc.h(range(3))

diffusion = diffusion_operator(3)
for _ in range(k):
    qc.compose(oracle, inplace=True)
    qc.barrier()
    qc.compose(diffusion, inplace=True)
    qc.barrier()

qc.measure(range(3), range(3))

sim = AerSimulator()
counts = sim.run(transpile(qc, sim), shots=4096).result().get_counts()

print("\nMeasurement results:")
for state, count in sorted(counts.items(), key=lambda x: -x[1]):
    decimal = int(state, 2)
    is_sat = " <-- SAT" if decimal in satisfying else ""
    print(f"  |{state}> ({decimal}): {count:4d} shots{is_sat}")

A note on real SAT oracles. For this small example, we enumerated all solutions classically and encoded them directly. For larger instances where classical enumeration is infeasible (which is the whole point of using Grover), you must build the oracle from the clause structure using reversible logic. This requires:

Ancilla qubits to compute each clause’s output.
A multi-controlled gate that flips the phase when all clauses are satisfied.
Uncomputation to reset the ancilla qubits to their initial state.

The ancilla overhead can be significant. For a k-SAT formula with c clauses, you typically need c ancilla qubits (one per clause) plus additional workspace. This makes the oracle circuit much deeper than the simple direct-encoding approach, and it is one reason Grover’s algorithm for SAT is challenging on near-term hardware.

Grover’s Algorithm on Real Hardware

Running Grover’s algorithm on real quantum hardware introduces noise and connectivity constraints that limit the achievable circuit fidelity.

Circuit depth. For n qubits with k Grover iterations, each iteration requires:

The oracle: roughly 2n X gates (for bit flips) plus one MCX gate per marked state
The diffusion operator: 2n H gates, 2n X gates, and one MCX gate
Each MCX gate on n qubits decomposes into O(n) CNOT gates

For 4 qubits and 2 iterations, the transpiled circuit typically contains 20-30 native gates (single-qubit rotations and CNOTs).

Error accumulation. On IBM hardware with approximately 0.3% CNOT error rate:

A circuit with 30 CNOTs has expected fidelity of roughly (1 - 0.003)^30, which is about 0.91
For 6 qubits, the circuit depth roughly doubles per iteration due to more complex MCX decomposition
At 10 qubits, the circuit depth exceeds what current devices can execute reliably within coherence times

The practical tradeoff. Grover’s quadratic speedup (sqrt(N) vs N queries) only becomes meaningful for large N (10+ qubits, meaning 1000+ items). But at those sizes, the circuit depth overwhelms the coherence time of current NISQ hardware. This creates a gap: the algorithm is most useful for large problems, but current hardware can only handle small demonstrations.

Today, Grover’s algorithm on real hardware is limited to about 5-6 qubit demonstrations with 1-2 iterations. Fault-tolerant quantum computers with error correction will be needed to run Grover at problem sizes where it outperforms classical search.

Error mitigation. For near-term experiments, several techniques can improve results:

Measurement error mitigation: calibrate the readout errors and apply a correction matrix
Zero-noise extrapolation: run the circuit at multiple noise levels and extrapolate to zero noise
Dynamical decoupling: insert identity-equivalent gate sequences during idle periods to suppress decoherence

Common Mistakes

Here are pitfalls that frequently trip up people implementing multi-solution Grover:

1. Using M = 0 in the iteration formula. The formula k = round(pi/4 * sqrt(N/M)) diverges when M = 0. If you do not know whether any solutions exist, you must use quantum counting first to estimate M. Alternatively, you can run Grover with a guess for M and check whether the output satisfies the search predicate.

2. Forgetting Qiskit’s little-endian bit ordering. Qiskit stores qubits in little-endian order: qubit 0 is the least significant bit. If you want to mark state 3 (binary 011), then in a 3-qubit system: qubit 0 = 1, qubit 1 = 1, qubit 2 = 0. The binary string representation in Qiskit measurement results reads right-to-left relative to qubit indices. The format(target, f"0{n_qubits}b") and reversed() pattern in the oracle code handles this correctly, but getting the convention wrong produces oracles that mark the wrong states.

# Demonstrate Qiskit's bit ordering
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

qc = QuantumCircuit(3, 3)
# Prepare state |5> = |101> by flipping qubit 0 (LSB) and qubit 2 (MSB)
qc.x(0)  # qubit 0 = 1 (least significant)
qc.x(2)  # qubit 2 = 1 (most significant)
qc.measure(range(3), range(3))

sim = AerSimulator()
counts = sim.run(transpile(qc, sim), shots=1).result().get_counts()
print(f"Measured: {counts}")  # Should show '101' representing decimal 5

3. Off-by-one in the iteration count. The formula pi/4 * sqrt(N/M) is an approximation. The exact optimal k is:

k = round((pi/2 - theta) / (2 * theta))    where theta = arcsin(sqrt(M/N))

Using floor() instead of round() can produce a suboptimal iteration count, especially when the exact optimum falls close to a half-integer. Always use round() for better accuracy.

4. Over-rotation from too many iterations. As demonstrated earlier, applying more than the optimal number of iterations causes the success probability to drop. This is not a minor effect; it can reduce the probability to near zero. If you are unsure about M, it is safer to underestimate k slightly than to overshoot.

5. Omitting barriers between oracle and diffusion. In Qiskit, the transpiler may merge gates from the oracle and diffusion operator during optimization. This can break the oracle’s semantics if gate cancellations alter the intended phase pattern. Adding qc.barrier() between the oracle and diffusion prevents this:

for _ in range(k):
    qc.compose(oracle, inplace=True)
    qc.barrier()  # Prevents transpiler from merging oracle and diffusion gates
    qc.compose(diffusion, inplace=True)
    qc.barrier()

6. Not verifying the oracle. Before running the full algorithm, always verify that your oracle marks the correct states. You can do this by applying the oracle to each computational basis state and checking the phase:

from qiskit.quantum_info import Operator
import numpy as np

def verify_oracle(oracle_circuit, expected_marked):
    """Check that the oracle applies phase -1 to exactly the expected states."""
    n = oracle_circuit.num_qubits
    op = Operator(oracle_circuit)
    diagonal = np.diag(op.data)
    actual_marked = [i for i in range(2**n) if np.isclose(diagonal[i], -1)]
    assert set(actual_marked) == set(expected_marked), \
        f"Oracle marks {actual_marked}, expected {expected_marked}"
    print(f"Oracle verified: correctly marks states {expected_marked}")

oracle = multi_solution_oracle(4, [3, 7, 11])
verify_oracle(oracle, [3, 7, 11])

Summary

Grover’s algorithm adapts cleanly to multiple solutions by adjusting the iteration count. The geometric picture makes the mechanics clear: the initial state starts at angle theta = arcsin(sqrt(M/N)) from the |unmarked> axis, and each iteration rotates by 2theta toward |marked>. The optimal number of iterations is k = round((pi/2 - theta) / (2theta)), maintaining the quadratic speedup of O(sqrt(N/M)).

Key takeaways:

More solutions means fewer iterations. The algorithm self-adjusts through the theta parameter.
When M > N/4, Grover’s iterations offer diminishing returns. Measuring the initial superposition directly is nearly as effective.
Over-rotation is a real danger. Too many iterations decreases the success probability, sometimes to near zero.
When M is unknown, quantum counting (QPE applied to the Grover operator) estimates M in O(sqrt(N)) queries.
Real hardware limits current demonstrations to about 5-6 qubits. Fault-tolerant hardware will be needed for Grover’s algorithm to deliver practical speedups.
Always verify your oracle independently before running the full algorithm.

These properties make multi-solution Grover a practical primitive for constraint satisfaction, optimization, and database search tasks on future fault-tolerant hardware.