Simon's Algorithm: Exponential Quantum Speedup for Period Finding

Background and Historical Significance

Simon’s algorithm, published by Daniel Simon in 1994, was a watershed moment in quantum computing. It provided the first proof that a quantum computer could solve a specific problem exponentially faster than any classical algorithm, not just polynomially faster as Deutsch-Jozsa suggested.

More importantly, Simon’s algorithm directly inspired Peter Shor, who generalized its core technique of quantum Fourier sampling into the famous algorithm for factoring integers. Understanding Simon’s algorithm is essentially understanding the skeleton of Shor’s algorithm before the harder number-theoretic machinery is bolted on.

Simon’s Problem

You are given oracle access to a function f: {0,1}^n -> {0,1}^n with the promise that there exists a secret string s such that:

f(x) = f(y)  if and only if  x = y  OR  x XOR y = s

When s = 0...0 the function is one-to-one. Otherwise it is two-to-one: each output value is hit by exactly two inputs that differ by XOR with s. Your goal is to find s.

This is a period-finding problem. The XOR structure means f is periodic with period s under bitwise addition modulo 2 (the group (Z_2)^n). It is also a special case of the hidden subgroup problem, which unifies many quantum algorithms under one framework.

Classical Complexity

Any classical algorithm, deterministic or randomized, needs at least Omega(2^(n/2)) queries to find s with good probability. This follows from a birthday-paradox argument: you need to find two inputs x and y with f(x) = f(y), and until that collision is found you learn nothing about s. By the birthday bound you expect to wait until you have queried around sqrt(2^n) = 2^(n/2) distinct inputs. For n = 100 that is 2^50 queries, around a quadrillion.

Birthday Paradox Lower Bound: Detailed Proof Sketch

To make the classical lower bound precise, consider a classical algorithm that queries f at k distinct inputs x_1, x_2, ..., x_k. The algorithm finds s only if it discovers a collision: two indices i, j with f(x_i) = f(x_j), which reveals s = x_i XOR x_j.

How many queries does this require? Each pair (i, j) has probability 1/2^n of colliding (since there are 2^n possible output values and exactly one partner for each input). After k queries there are k(k-1)/2 pairs. By the union bound, the probability of at least one collision is at most:

P(collision) <= k(k-1) / (2 * 2^n)  ≈  k^2 / 2^(n+1)

For this probability to reach a constant (say 1/2), we need:

k^2 / 2^(n+1) >= 1/2
k^2 >= 2^n
k >= 2^(n/2)

So any classical algorithm needs Omega(2^(n/2)) queries before it can expect to see a collision. This is tight: a classical birthday attack using O(2^(n/2)) queries and O(2^(n/2)) memory does find s with constant probability.

The quantum algorithm, by contrast, needs only O(n) queries. For n = 100 that is roughly 100 queries versus a quadrillion. This is the exponential separation that makes Simon’s algorithm historically important.

Quantum Algorithm: Polynomial Queries

The quantum algorithm extracts a random linear constraint on s with each quantum query. After n - 1 such constraints you can solve the system with Gaussian elimination over GF(2) and recover s.

Total quantum queries needed: O(n). Total classical post-processing: O(n^3) for Gaussian elimination. Exponential speedup over classical.

The Quantum Circuit

The circuit uses two n-qubit registers. Steps:

Initialize both registers to |0...0>.
Apply Hadamard to every qubit in the first register. It becomes the uniform superposition over all 2^n strings.
Apply the oracle U_f, which maps |x>|0> to |x>|f(x)>. The first register is entangled with the second.
Measure the second register. Suppose the result is some value v. Now the first register collapses to an equal superposition of the two inputs x_0 and x_0 XOR s that both map to v.
Apply Hadamard to the first register.
Measure the first register. The result y satisfies y . s = 0 (mod 2), a linear constraint on s.

Repeat steps 1-6 until you have n - 1 linearly independent constraints, then solve.

Full State Vector Derivation (n=3, s=“110”)

Let us trace the algorithm step by step with n = 3 and secret string s = 110. We label the three qubits in each register as positions 0, 1, 2 from left to right.

Step 1: Initialization

Both registers start in |000>|000>.

Step 2: Hadamard on the first register

Applying H^{tensor 3} to the first register produces the uniform superposition:

|psi_1> = (1/sqrt(8)) * sum_{x in {0,1}^3} |x>|000>

= (1/sqrt(8)) * ( |000>|000> + |001>|000> + |010>|000> + |011>|000>
                 + |100>|000> + |101>|000> + |110>|000> + |111>|000> )

Step 3: Apply the oracle

The oracle maps |x>|0> to |x>|f(x)>. Because s = 110, the function f maps each pair {x, x XOR 110} to the same output. The eight inputs pair up as:

x	x XOR 110	f(x)
000	110	f(000) = f(110)
001	111	f(001) = f(111)
010	100	f(010) = f(100)
011	101	f(011) = f(101)

The specific values of f depend on the oracle construction, but only the pairing matters. Using our shift oracle (described below), we get f(x) = x for x with bit 0 equal to 0, and f(x) = x XOR 110 for x with bit 0 equal to 1. So the concrete values are:

x	f(x)
000	000
001	001
010	010
011	011
100	010
101	011
110	000
111	001

After the oracle the state is:

|psi_2> = (1/sqrt(8)) * ( |000>|000> + |001>|001> + |010>|010> + |011>|011>
                         + |100>|010> + |101>|011> + |110>|000> + |111>|001> )

Step 4: Measure the second register

Suppose we measure the second register and obtain v = 010. Looking at the table, the inputs that produce f(x) = 010 are x = 010 and x = 100. Note that 010 XOR 100 = 110 = s, confirming the pairing. The first register collapses to:

|psi_3> = (1/sqrt(2)) * ( |010> + |100> )

This is exactly (|x_0> + |x_0 XOR s>) / sqrt(2) with x_0 = 010.

Step 5: Hadamard on the first register

Apply H^{tensor 3} to the first register. Recall that H^{tensor n}|x> = (1/sqrt(2^n)) * sum_y (-1)^{x . y} |y>. So:

|psi_4> = (1/sqrt(2)) * (1/sqrt(8)) * sum_y [ (-1)^{010 . y} + (-1)^{100 . y} ] |y>
        = (1/4) * sum_y [ (-1)^{010 . y} + (-1)^{100 . y} ] |y>

For each y, the amplitude is a_y = (1/4) * [ (-1)^{010 . y} + (-1)^{100 . y} ].

We can factor out (-1)^{010 . y}:

a_y = (1/4) * (-1)^{010 . y} * [ 1 + (-1)^{(010 XOR 100) . y} ]
    = (1/4) * (-1)^{010 . y} * [ 1 + (-1)^{110 . y} ]
    = (1/4) * (-1)^{010 . y} * [ 1 + (-1)^{s . y} ]

The factor [1 + (-1)^{s . y}] equals 2 when s . y = 0 (mod 2) and 0 when s . y = 1 (mod 2). So only strings y with y . s = 0 have non-zero amplitude.

For s = 110, the condition y . s = y_0 * 1 + y_1 * 1 + y_2 * 0 = y_0 + y_1 = 0 (mod 2) means y_0 = y_1. The valid strings are:

y = 000:  s.y = 0+0 = 0  ✓   amplitude = (1/4)*(-1)^0 * 2 = 1/2
y = 001:  s.y = 0+0 = 0  ✓   amplitude = (1/4)*(-1)^0 * 2 = 1/2
y = 010:  s.y = 0+1 = 1  ✗   amplitude = 0
y = 011:  s.y = 0+1 = 1  ✗   amplitude = 0
y = 100:  s.y = 1+0 = 1  ✗   amplitude = 0
y = 101:  s.y = 1+0 = 1  ✗   amplitude = 0
y = 110:  s.y = 1+1 = 0  ✓   amplitude = (1/4)*(-1)^1 * 2 = -1/2
y = 111:  s.y = 1+1 = 0  ✓   amplitude = (1/4)*(-1)^1 * 2 = -1/2

Step 6: Measure the first register

The probability of each outcome is |a_y|^2 = 1/4 for the four valid strings and 0 for the rest. Each measurement yields one of {000, 001, 110, 111} with equal probability 1/4. Each of these satisfies y . s = 0 (mod 2).

Note that y = 000 is always valid but gives no information (every s satisfies 0 . s = 0). The informative outcomes are 001, 110, and 111. Collecting any two linearly independent vectors from these (for instance 001 and 110) and solving the system y . s = 0 recovers s = 110.

Oracle Construction Deep Dive

Strategy 1: The Shift Oracle

The oracle used in our Qiskit code works in two stages.

Stage 1: Copy. Apply CNOT gates to copy the first register to the second: for each qubit i, apply CX(i, n+i). After this step, |x>|0> becomes |x>|x>.

Stage 2: Conditional XOR. Find the first bit position j where s_j = 1. For every position i where s_i = 1, apply CX(j, n+i). This XORs the second register with s whenever qubit j of the first register is 1.

After both stages, the mapping is:

If x_j = 0: second register holds x XOR 0 = x, so f(x) = x.
If x_j = 1: second register holds x XOR s, so f(x) = x XOR s.

To verify the Simon promise, consider f(x) and f(x XOR s):

If x_j = 0, then (x XOR s)_j = 1 (since s_j = 1). So f(x) = x and f(x XOR s) = (x XOR s) XOR s = x. They match.
If x_j = 1, then (x XOR s)_j = 0. So f(x) = x XOR s and f(x XOR s) = x XOR s. They match.

In both cases f(x) = f(x XOR s), and you can verify that no other collisions occur. The promise holds.

Strategy 2: Permutation Oracle

An alternative approach constructs the oracle from an explicit truth table. For each input x, define f(x) as the lexicographically smaller element of {x, x XOR s}. This guarantees that paired inputs share the same output.

For n = 2, s = "11":

x	x XOR s	f(x) = min(x, x XOR s)
00	11	00
01	10	01
10	01	01
11	00	00

The function maps {00, 11} to 00 and {01, 10} to 01. The unitary implementing this oracle is a permutation matrix that swaps the appropriate basis states in the second register.

This construction is conceptually clean and works for any s, but it requires O(2^n) gates to implement as a circuit (since you specify the truth table entry by entry). The shift oracle is far more efficient, using only O(n) gates. The permutation oracle is useful for small examples or simulation, while the shift oracle is what you would use on actual hardware.

Why Step 5 Gives a Linear Constraint

After step 4 the first register holds (|x_0> + |x_0 XOR s>) / sqrt(2). After Hadamard the amplitude for output y is:

(1/sqrt(2^n)) * ((-1)^(y . x_0) + (-1)^(y . (x_0 XOR s)))
= (1/sqrt(2^n)) * (-1)^(y . x_0) * (1 + (-1)^(y . s))

This amplitude is zero unless y . s = 0 (mod 2). So the measurement always yields a string y orthogonal to s in GF(2). The second register measurement collapses x_0 but the constraint y . s = 0 holds for any x_0, so it is purely about s.

Linear Independence Probability Analysis

Each measurement of the first register yields a uniformly random element of the subspace {y : y . s = 0}, which has dimension n - 1 and thus contains 2^(n-1) elements. After collecting k linearly independent equations, how likely is it that the next measurement gives an independent one?

The k independent vectors span a k-dimensional subspace of the (n-1)-dimensional solution space. That subspace contains 2^k elements. A new measurement y is linearly dependent on the existing set if and only if y falls in this 2^k-element subspace. Since y is drawn uniformly from the 2^(n-1)-element space, the probability of dependence is:

P(dependent) = 2^k / 2^(n-1) = 2^(k - n + 1)

So the probability that the next measurement is independent is:

P(independent) = 1 - 2^(k - n + 1)

For the first few measurements (when k is much smaller than n - 1), this probability is very close to 1. Even in the worst case, when k = n - 2 (one more independent equation needed), the probability of independence is 1 - 2^(-1) = 1/2.

Expected Number of Measurements

Let T be the total number of measurements needed to collect n - 1 independent equations. We can write T = T_0 + T_1 + ... + T_{n-2}, where T_k is the number of measurements needed to go from k independent equations to k + 1. Each T_k is a geometric random variable with success probability p_k = 1 - 2^(k - n + 1), so E[T_k] = 1/p_k.

E[T] = sum_{k=0}^{n-2} 1 / (1 - 2^(k-n+1))

For k = 0: E[T_0] = 1 / (1 - 2^(-(n-1))) which is slightly above 1. For k = n-2: E[T_{n-2}] = 1 / (1 - 1/2) = 2.

We can bound the sum:

E[T] = sum_{k=0}^{n-2} 1 / (1 - 2^(k-n+1))
     <= sum_{k=0}^{n-2} (1 + 2^(k-n+2))        [since 1/(1-x) <= 1+2x for x <= 1/2]
     = (n-1) + 2 * sum_{k=0}^{n-2} 2^(k-n+1)
     = (n-1) + 2 * (1 - 2^(-(n-1)))
     < (n-1) + 2
     = n + 1

So the expected number of measurements is at most n + 1, which is well within O(n). In practice you typically need very close to n - 1 measurements, since most of them are independent.

Gaussian Elimination over GF(2): Worked Example

Suppose n = 3, s = "110", and our measurements produce the equations y_1 = "010", y_2 = "100", y_3 = "110". Each equation asserts y . s = 0 (mod 2), which means s lies in the null space of the matrix formed by these rows.

Step 1: Form the matrix

    col0  col1  col2
y1: [ 0     1     0  ]
y2: [ 1     0     0  ]
y3: [ 1     1     0  ]

Step 2: Row reduce over GF(2)

Pivot on column 0. Row 1 has a 0 in column 0, but row 2 has a 1. Swap rows 1 and 2:

R1: [ 1     0     0  ]
R2: [ 0     1     0  ]
R3: [ 1     1     0  ]

Row 3 has a 1 in column 0. Replace R3 with R3 XOR R1:

R1: [ 1     0     0  ]
R2: [ 0     1     0  ]
R3: [ 0     1     0  ]

Pivot on column 1. Row 2 already has a 1 in column 1. Row 3 also has a 1. Replace R3 with R3 XOR R2:

R1: [ 1     0     0  ]
R2: [ 0     1     0  ]
R3: [ 0     0     0  ]

Pivot on column 2. No non-zero entry remains. Column 2 is a free variable.

Step 3: Identify the free variable and reconstruct s

The pivot columns are 0 and 1. Column 2 is free. Set s_2 = 1 (the free variable) and back-substitute… but wait. All entries in column 2 are 0, so the pivot rows give s_0 = 0 and s_1 = 0. That gives s = 001, which is wrong.

This reveals an important subtlety: the null space of this matrix is spanned by (0, 0, 1), corresponding to s = 001. But we know the true secret is s = 110. What went wrong?

The issue is that we have only two independent equations (the third was redundant). The null space is one-dimensional and gives s = 001. This is actually a valid element of the null space, but it is not the secret string. The system y . s = 0 has the null space containing both s = 001 and s = 110 and s = 111. Wait, let us recheck. The null space of the matrix:

[ 1  0  0 ]
[ 0  1  0 ]

is {(0, 0, 0), (0, 0, 1)}. The vector (1, 1, 0) is NOT in this null space since R1 . (1,1,0) = 1 != 0.

This means our equations are inconsistent with s = 110. Let us recheck: y_2 = 100, and y_2 . s = 1*1 + 0*1 + 0*0 = 1 != 0. So y_2 = 100 is NOT a valid measurement for s = 110.

This is an important lesson. In the actual algorithm, the measurements are guaranteed to satisfy y . s = 0. For s = 110, the valid measurements are from {000, 001, 110, 111}. Let us redo the example with valid equations.

Corrected Example

Measurements: y_1 = "001", y_2 = "110", y_3 = "111".

Verify: 001 . 110 = 0, 110 . 110 = 1+1 = 0, 111 . 110 = 1+1+0 = 0. All valid.

Form the matrix:

    col0  col1  col2
y1: [ 0     0     1  ]
y2: [ 1     1     0  ]
y3: [ 1     1     1  ]

Row reduce over GF(2):

Swap R1 and R2 so column 0 has a pivot:

R1: [ 1     1     0  ]
R2: [ 0     0     1  ]
R3: [ 1     1     1  ]

R3 has a 1 in column 0. Replace R3 with R3 XOR R1:

R1: [ 1     1     0  ]
R2: [ 0     0     1  ]
R3: [ 0     0     1  ]

Column 1: no non-zero entry below R1 in column 1. Column 1 is free.

Column 2: R2 has a 1. R3 also has a 1. Replace R3 with R3 XOR R2:

R1: [ 1     1     0  ]
R2: [ 0     0     1  ]
R3: [ 0     0     0  ]

Identify free variable: Column 1 is free. Set s_1 = 1.

Back-substitute:

From R2: s_2 = 0.
From R1: s_0 + s_1 = 0, so s_0 = s_1 = 1.

Result: s = (1, 1, 0) which is "110". Correct.

What if the system is underdetermined?

If you collect fewer than n - 1 independent equations, the null space has dimension greater than 1 and contains multiple non-zero candidates. In that case, go back and collect more measurements. You can detect this situation by checking the rank of the matrix after row reduction: if the rank is less than n - 1, you need more equations.

Qiskit Implementation

import numpy as np
from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister
from qiskit_aer import AerSimulator


def simon_oracle(s: str) -> QuantumCircuit:
    """
    Build a Simon oracle for secret string s.
    The oracle implements a two-to-one function with the given period.
    Strategy: copy x to second register, then XOR second register with s
    conditioned on the first non-zero bit of s.
    """
    n = len(s)
    qc = QuantumCircuit(2 * n)

    # Copy first register to second (identity part)
    for i in range(n):
        qc.cx(i, n + i)

    # XOR second register with s when the control qubit (first 1-bit of s) is 1
    # This creates the two-to-one mapping: f(x) = f(x XOR s)
    if "1" in s:
        control_bit = next(i for i, b in enumerate(s) if b == "1")
        for i, bit in enumerate(s):
            if bit == "1":
                qc.cx(control_bit, n + i)

    return qc


def simon_circuit(s: str) -> QuantumCircuit:
    """Build a single Simon's algorithm circuit."""
    n = len(s)
    first = QuantumRegister(n, name="x")
    second = QuantumRegister(n, name="fx")
    cr = ClassicalRegister(n, name="c")
    qc = QuantumCircuit(first, second, cr)

    # Hadamard on first register
    qc.h(first)
    qc.barrier()

    # Oracle
    oracle = simon_oracle(s)
    qc.compose(oracle, inplace=True)
    qc.barrier()

    # Measure second register (implicit collapse)
    # Then Hadamard on first register
    qc.h(first)
    qc.barrier()

    # Measure first register
    qc.measure(first, cr)
    return qc


def is_linearly_independent(equations: list[list[int]], new_eq: list[int], n: int) -> bool:
    """
    Check whether new_eq is linearly independent of the existing equations over GF(2).
    Uses row reduction on the combined matrix.
    """
    mat = [row[:] for row in equations] + [new_eq[:]]
    m = len(mat)
    col = 0
    for row_idx in range(m):
        while col < n:
            pivot_row = None
            for r in range(row_idx, m):
                if mat[r][col] == 1:
                    pivot_row = r
                    break
            if pivot_row is not None:
                mat[row_idx], mat[pivot_row] = mat[pivot_row], mat[row_idx]
                for r in range(m):
                    if r != row_idx and mat[r][col] == 1:
                        mat[r] = [(mat[r][c] ^ mat[row_idx][c]) for c in range(n)]
                col += 1
                break
            col += 1
    # The new equation is independent iff it was not zeroed out
    return any(mat[-1][c] == 1 for c in range(n))


def gf2_solve(equations: list[str], n: int) -> str | None:
    """
    Solve a system of linear equations over GF(2) to find s.
    equations: list of n-bit strings y where y . s = 0 mod 2
    Returns s as a binary string, or None if underdetermined.
    """
    # Build matrix from equations
    matrix = []
    for eq in equations:
        row = [int(b) for b in eq]
        matrix.append(row)

    # Gaussian elimination over GF(2) to find null space
    mat = [row[:] for row in matrix]
    m = len(mat)
    pivot_cols = []

    col = 0
    for row_idx in range(m):
        # Find pivot
        while col < n:
            pivot_row = None
            for r in range(row_idx, m):
                if mat[r][col] == 1:
                    pivot_row = r
                    break
            if pivot_row is not None:
                mat[row_idx], mat[pivot_row] = mat[pivot_row], mat[row_idx]
                pivot_cols.append(col)
                for r in range(m):
                    if r != row_idx and mat[r][col] == 1:
                        mat[r] = [(mat[r][c] ^ mat[row_idx][c]) for c in range(n)]
                col += 1
                break
            col += 1

    free_cols = [c for c in range(n) if c not in pivot_cols]
    if not free_cols:
        return "0" * n  # s = 0, function is one-to-one

    # Set free variable to 1 and back-substitute
    free_col = free_cols[0]
    s = [0] * n
    s[free_col] = 1

    for i, pc in enumerate(pivot_cols):
        s[pc] = mat[i][free_col] % 2

    return "".join(str(b) for b in s)


def verify_solution(oracle_fn, s_candidate: str, n: int, num_tests: int = 20) -> bool:
    """
    Verify a candidate s by checking f(x) = f(x XOR s) for random inputs.
    oracle_fn: a callable that takes a bit string and returns a bit string.
    """
    if s_candidate == "0" * n:
        return True  # trivial case
    for _ in range(num_tests):
        x = format(np.random.randint(0, 2**n), f"0{n}b")
        x_xor_s = format(int(x, 2) ^ int(s_candidate, 2), f"0{n}b")
        if oracle_fn(x) != oracle_fn(x_xor_s):
            return False
    return True


def run_simon(s: str, max_iterations: int = 50):
    """Run Simon's algorithm to recover the secret string s."""
    n = len(s)
    simulator = AerSimulator()
    collected = []
    collected_bits = []

    print(f"Target secret string: s = {s}")
    print(f"Running Simon's algorithm with n = {n} qubits...")
    print()

    for iteration in range(max_iterations):
        qc = simon_circuit(s)
        job = simulator.run(qc, shots=1)
        result = job.result().get_counts()
        y = list(result.keys())[0].replace(" ", "")
        y = y[::-1]  # reverse: Qiskit returns big-endian (bit n-1 leftmost), we want bit 0 leftmost
        y_bits = [int(b) for b in y]

        # Skip the zero vector (gives no information)
        if all(b == 0 for b in y_bits):
            print(f"  Iteration {iteration+1}: y = {y}  (zero vector, skipped)")
            continue

        # Check linear independence before adding
        if is_linearly_independent(collected_bits, y_bits, n):
            collected.append(y)
            collected_bits.append(y_bits)
            print(f"  Iteration {iteration+1}: y = {y}  (independent, added as equation #{len(collected)})")
        else:
            print(f"  Iteration {iteration+1}: y = {y}  (dependent, skipped)")

        if len(collected) >= n - 1:
            candidate = gf2_solve(collected, n)
            if candidate:
                print(f"\nCollected {len(collected)} independent equations after {iteration+1} iterations")
                print(f"Equations: {collected}")
                print(f"Recovered s = {candidate}")
                print(f"Correct: {candidate == s}")
                return candidate

    print("Did not find s within max iterations.")
    return None


if __name__ == "__main__":
    # Test with a known secret string
    secret = "110"
    run_simon(secret)

Complete Worked Example Trace

Below is an annotated trace of a typical run with s = "110" and n = 3. The valid measurement outcomes (non-zero strings y with y . s = 0) are {001, 110, 111}.

"""
Detailed trace of Simon's algorithm for s = '110'.

This script prints every intermediate state: each measured y, whether it is
linearly independent, the current equation list, and the final recovery of s.
"""

import numpy as np
from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister
from qiskit_aer import AerSimulator


def simon_oracle_trace(s: str) -> QuantumCircuit:
    n = len(s)
    qc = QuantumCircuit(2 * n)
    for i in range(n):
        qc.cx(i, n + i)
    if "1" in s:
        control_bit = next(i for i, b in enumerate(s) if b == "1")
        for i, bit in enumerate(s):
            if bit == "1":
                qc.cx(control_bit, n + i)
    return qc


def trace_simon(s: str):
    n = len(s)
    simulator = AerSimulator()
    equations = []
    eq_bits = []

    print(f"=== Simon's Algorithm Trace ===")
    print(f"Secret string: s = {s}")
    print(f"n = {n}, need {n-1} independent equations")
    print(f"Valid non-zero y values (y . s = 0): ", end="")
    valid = [format(y, f"0{n}b") for y in range(1, 2**n)
             if sum(int(a)*int(b) for a, b in zip(format(y, f"0{n}b"), s)) % 2 == 0]
    print(", ".join(valid))
    print()

    iteration = 0
    while len(equations) < n - 1:
        iteration += 1

        # Build and run circuit
        first = QuantumRegister(n, name="x")
        second = QuantumRegister(n, name="fx")
        cr = ClassicalRegister(n, name="c")
        qc = QuantumCircuit(first, second, cr)
        qc.h(first)
        oracle = simon_oracle_trace(s)
        qc.compose(oracle, inplace=True)
        qc.h(first)
        qc.measure(first, cr)

        job = simulator.run(qc, shots=1)
        result = job.result().get_counts()
        y = list(result.keys())[0].replace(" ", "")
        y = y[::-1]  # reverse: Qiskit returns big-endian (bit n-1 leftmost), we want bit 0 leftmost
        y_vec = [int(b) for b in y]

        dot_product = sum(int(a)*int(b) for a, b in zip(y, s)) % 2
        print(f"Iteration {iteration}:")
        print(f"  Measured y = {y}")
        print(f"  Verify y . s = {y} . {s} = {dot_product} (mod 2)")

        if all(b == 0 for b in y_vec):
            print(f"  Status: zero vector, no information gained")
            print(f"  Equations so far: {equations}")
            print()
            continue

        # Check independence using row reduction
        test_mat = [row[:] for row in eq_bits] + [y_vec[:]]
        m = len(test_mat)
        col = 0
        for row_idx in range(m):
            while col < n:
                pivot_row = None
                for r in range(row_idx, m):
                    if test_mat[r][col] == 1:
                        pivot_row = r
                        break
                if pivot_row is not None:
                    test_mat[row_idx], test_mat[pivot_row] = test_mat[pivot_row], test_mat[row_idx]
                    for r in range(m):
                        if r != row_idx and test_mat[r][col] == 1:
                            test_mat[r] = [(test_mat[r][c] ^ test_mat[row_idx][c]) for c in range(n)]
                    col += 1
                    break
                col += 1
        independent = any(test_mat[-1][c] == 1 for c in range(n))

        if independent:
            equations.append(y)
            eq_bits.append(y_vec)
            print(f"  Status: INDEPENDENT (equation #{len(equations)} of {n-1} needed)")
        else:
            print(f"  Status: dependent on existing equations, skipped")

        print(f"  Equations so far: {equations}")
        print()

    # Solve the system
    print("=" * 40)
    print(f"Collected {n-1} independent equations: {equations}")
    print("Solving via GF(2) Gaussian elimination...")

    # Row reduce
    mat = [row[:] for row in eq_bits]
    pivot_cols = []
    col = 0
    for row_idx in range(len(mat)):
        while col < n:
            pivot_row = None
            for r in range(row_idx, len(mat)):
                if mat[r][col] == 1:
                    pivot_row = r
                    break
            if pivot_row is not None:
                mat[row_idx], mat[pivot_row] = mat[pivot_row], mat[row_idx]
                pivot_cols.append(col)
                for r in range(len(mat)):
                    if r != row_idx and mat[r][col] == 1:
                        mat[r] = [(mat[r][c] ^ mat[row_idx][c]) for c in range(n)]
                col += 1
                break
            col += 1

    print(f"Pivot columns: {pivot_cols}")
    free_cols = [c for c in range(n) if c not in pivot_cols]
    print(f"Free columns: {free_cols}")

    free_col = free_cols[0]
    s_recovered = [0] * n
    s_recovered[free_col] = 1
    for i, pc in enumerate(pivot_cols):
        s_recovered[pc] = mat[i][free_col] % 2

    s_str = "".join(str(b) for b in s_recovered)
    print(f"Setting free variable s[{free_col}] = 1")
    print(f"Back-substitution gives: s = {s_str}")
    print(f"Correct: {s_str == s}")

    # Verification step
    print()
    print("=== Verification ===")
    for x_int in range(2**n):
        x = format(x_int, f"0{n}b")
        x_xor_s = format(x_int ^ int(s_str, 2), f"0{n}b")
        print(f"  f({x}) should equal f({x_xor_s}): ", end="")
        print("(same input)" if x == x_xor_s else "verified by oracle promise")


if __name__ == "__main__":
    trace_simon("110")

A typical output looks like:

=== Simon's Algorithm Trace ===
Secret string: s = 110
n = 3, need 2 independent equations
Valid non-zero y values (y . s = 0): 001, 110, 111

Iteration 1:
  Measured y = 110
  Verify y . s = 110 . 110 = 0 (mod 2)
  Status: INDEPENDENT (equation #1 of 2 needed)
  Equations so far: ['110']

Iteration 2:
  Measured y = 001
  Verify y . s = 001 . 110 = 0 (mod 2)
  Status: INDEPENDENT (equation #2 of 2 needed)
  Equations so far: ['110', '001']

========================================
Collected 2 independent equations: ['110', '001']
Solving via GF(2) Gaussian elimination...
Pivot columns: [0, 2]
Free columns: [1]
Setting free variable s[1] = 1
Back-substitution gives: s = 110
Correct: True

Verification Step

After recovering a candidate s, it is essential to verify it. There are two cases to handle:

Case 1: s != 0. Pick a few random inputs x and check that f(x) = f(x XOR s). If any check fails, the candidate is wrong and you need more measurements or a different null space vector.

Case 2: s = 0 (one-to-one function). The GF(2) solver returns s = 0...0 when the null space is trivial (all equations are independent and span the full (n-1)-dimensional space, leaving no free variables). But you need to distinguish this from an underdetermined system. Check that you have exactly n independent equations with rank n. If the rank is n - 1 and the only null space element is 0, the function is genuinely one-to-one.

In practice, you can also verify by running the oracle on a pair (x, x XOR s) and confirming the outputs match. This uses one additional query and gives certainty.

The Hidden Subgroup Problem Perspective

Simon’s problem is an instance of the hidden subgroup problem (HSP), one of the most powerful unifying frameworks in quantum computing.

The HSP Framework

Given a group G, a subgroup H <= G, and a function f: G -> X that is constant on left cosets of H and distinct across different cosets, find H. Formally, f(g_1) = f(g_2) if and only if g_1 * H = g_2 * H.

For Simon’s problem:

G = (Z_2)^n, the group of n-bit strings under bitwise XOR.
H = {0^n, s}, the two-element subgroup generated by s.
The cosets of H are the pairs {x, x XOR s}.
f is constant on each coset and distinct across cosets.

The Dual Subgroup

The quantum algorithm for the abelian HSP works by measuring elements of the dual subgroup H^perp. For (Z_2)^n, the dual group is isomorphic to the group itself, and the dual of H = {0, s} is:

H^perp = { y in (Z_2)^n : y . h = 0 for all h in H }
       = { y : y . s = 0 }

This is exactly the set of valid measurement outcomes. Each quantum circuit run samples a uniformly random element of H^perp. After enough samples, you can reconstruct H^perp, and from it recover H = (H^perp)^perp, which gives you s.

Unification of Quantum Algorithms

The HSP framework unifies several foundational quantum algorithms:

Algorithm	Group G	Hidden Subgroup H
Deutsch-Jozsa	Z_2	{0} or Z_2
Bernstein-Vazirani	(Z_2)^n	{0, s}^perp (hyperplane)
Simon	(Z_2)^n	{0, s}
Shor (period finding)	Z_N	{0, r, 2r, …}

In Deutsch-Jozsa, the group is Z_2 and the “subgroup” is either trivial (balanced) or the whole group (constant). Bernstein-Vazirani finds a secret string through a single Fourier sample. Simon finds a two-element subgroup of (Z_2)^n. Shor finds a cyclic subgroup of Z_N.

The common algorithmic template is: (1) prepare a uniform superposition over G, (2) apply the oracle, (3) apply the quantum Fourier transform over G, (4) measure to get an element of H^perp, (5) classically reconstruct H.

For abelian groups, this approach always works efficiently. The Hadamard transform is the quantum Fourier transform over (Z_2)^n, while Shor uses the QFT over Z_N. The non-abelian HSP (which includes graph isomorphism) remains an open problem and is one of the major challenges in quantum algorithm design.

Classical Post-Processing Detail

The quantum circuit produces vectors y in the dual space of s (the set of all y with y . s = 0 mod 2). This dual space has dimension n - 1. Once you have n - 1 linearly independent vectors, the null space of the system is one-dimensional and its non-trivial element is s.

Gaussian elimination over GF(2) runs in O(n^3) time. The total number of quantum circuit runs expected to yield n - 1 linearly independent vectors is O(n), since at each step the probability that a new vector is linearly independent of the existing ones is at least 1/2.

Connection to Shor’s Algorithm

Shor’s algorithm for integer factoring reduces to finding the period of f(x) = a^x mod N. This is a continuous analog of Simon’s problem: instead of a period in (Z_2)^n, you seek a period in Z. The quantum Fourier transform over Z_N plays the role of the Hadamard transform over (Z_2)^n, producing measurement outcomes concentrated near multiples of 2^n / r where r is the period. Classical post-processing then extracts r via continued fractions.

Simon’s algorithm is exactly the finite-field version of this idea, stripped of the number theory. Shor said in interviews that Simon’s paper was the direct inspiration for his factoring algorithm. Understanding Simon’s algorithm gives you the quantum intuition for why Shor’s works, without the arithmetic overhead.

Property	Deutsch-Jozsa	Bernstein-Vazirani	Simon	Shor
Problem	Constant vs. balanced	Find hidden linear function	Find period in (Z_2)^n	Factor integer N
Group	{0,1}	(Z_2)^n	(Z_2)^n	Z_N
Speedup type	Exponential (exact)	Exponential (exact)	Exponential (probabilistic)	Exponential (probabilistic)
Classical queries	2^(n-1) + 1	n	Omega(2^(n/2))	Sub-exponential
Quantum queries	1	1	O(n)	poly(n)
Post-processing	None	Direct readout	GF(2) Gaussian elimination	Continued fractions
Historical significance	First quantum speedup (promise problem)	Clean HSP example	First exponential speedup proof	Practical cryptographic relevance

Noise and Error Analysis

On real quantum hardware (or noisy simulators), the oracle may introduce errors. Consider a simple noise model: each qubit in the second register has a probability p of undergoing a bit flip after the oracle acts. This corrupts the value f(x), which in turn changes the measurement statistics of the first register.

When the second register is corrupted, the entanglement between registers is partially broken. The measurement of the first register may yield a string y that does NOT satisfy y . s = 0. Such a corrupted equation, if added to the system, can cause the GF(2) solver to return an incorrect s.

Simulating Noise in Qiskit

"""
Demonstrate Simon's algorithm under depolarizing noise.
The oracle has a per-qubit bit-flip probability p applied to the second register.
"""

import numpy as np
from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister
from qiskit_aer import AerSimulator
from qiskit_aer.noise import NoiseModel, depolarizing_error


def simon_oracle_noisy(s: str) -> QuantumCircuit:
    """Build the oracle (same as before)."""
    n = len(s)
    qc = QuantumCircuit(2 * n)
    for i in range(n):
        qc.cx(i, n + i)
    if "1" in s:
        control_bit = next(i for i, b in enumerate(s) if b == "1")
        for i, bit in enumerate(s):
            if bit == "1":
                qc.cx(control_bit, n + i)
    return qc


def run_simon_noisy(s: str, p_error: float = 0.01, trials: int = 100):
    """
    Run Simon's algorithm with depolarizing noise on 2-qubit gates.
    Collects extra equations and uses majority voting for robustness.
    """
    n = len(s)

    # Create a noise model: depolarizing error on CX gates
    noise_model = NoiseModel()
    error = depolarizing_error(p_error, 2)
    noise_model.add_all_qubit_quantum_error(error, ["cx"])

    simulator = AerSimulator(noise_model=noise_model)

    # Collect many measurements
    first = QuantumRegister(n, name="x")
    second = QuantumRegister(n, name="fx")
    cr = ClassicalRegister(n, name="c")
    qc = QuantumCircuit(first, second, cr)
    qc.h(first)
    oracle = simon_oracle_noisy(s)
    qc.compose(oracle, inplace=True)
    qc.h(first)
    qc.measure(first, cr)

    job = simulator.run(qc, shots=trials)
    counts = job.result().get_counts()

    # Filter: keep only y values that appear frequently (likely correct)
    threshold = trials * 0.01  # keep outcomes appearing at least 1% of the time
    candidates = []
    for y_str, count in sorted(counts.items(), key=lambda x: -x[1]):
        y = y_str.replace(" ", "")
        if count >= threshold and y != "0" * n:
            candidates.append(y)

    print(f"Noise level: p = {p_error}")
    print(f"Collected {len(candidates)} candidate equations from {trials} shots")
    print(f"Top outcomes: {dict(list(sorted(counts.items(), key=lambda x: -x[1]))[:8])}")

    # Try solving with different subsets
    from itertools import combinations
    if len(candidates) >= n - 1:
        for combo in combinations(candidates, n - 1):
            result = gf2_solve(list(combo), n)
            if result and result == s:
                print(f"Recovered s = {result} (correct)")
                return result

    print(f"Could not recover s with high confidence at noise level p = {p_error}")
    return None


def gf2_solve(equations, n):
    """GF(2) solver (same as main implementation)."""
    matrix = [[int(b) for b in eq] for eq in equations]
    mat = [row[:] for row in matrix]
    m = len(mat)
    pivot_cols = []
    col = 0
    for row_idx in range(m):
        while col < n:
            pivot_row = None
            for r in range(row_idx, m):
                if mat[r][col] == 1:
                    pivot_row = r
                    break
            if pivot_row is not None:
                mat[row_idx], mat[pivot_row] = mat[pivot_row], mat[row_idx]
                pivot_cols.append(col)
                for r in range(m):
                    if r != row_idx and mat[r][col] == 1:
                        mat[r] = [(mat[r][c] ^ mat[row_idx][c]) for c in range(n)]
                col += 1
                break
            col += 1
    free_cols = [c for c in range(n) if c not in pivot_cols]
    if not free_cols:
        return "0" * n
    free_col = free_cols[0]
    s = [0] * n
    s[free_col] = 1
    for i, pc in enumerate(pivot_cols):
        s[pc] = mat[i][free_col] % 2
    return "".join(str(b) for b in s)


if __name__ == "__main__":
    secret = "110"
    print("=== Noiseless run ===")
    run_simon_noisy(secret, p_error=0.0, trials=100)
    print()
    print("=== 1% depolarizing noise ===")
    run_simon_noisy(secret, p_error=0.01, trials=100)
    print()
    print("=== 2% depolarizing noise ===")
    run_simon_noisy(secret, p_error=0.02, trials=200)
    print()
    print("=== 5% depolarizing noise ===")
    run_simon_noisy(secret, p_error=0.05, trials=500)

At 1-2% error rates, the correct y values still dominate the measurement statistics, and the algorithm typically recovers s by filtering for high-frequency outcomes and trying combinations. At 5% and above, the noise floor rises high enough that incorrect y values appear frequently, requiring more sophisticated error mitigation (such as majority voting across multiple independent runs, or consistency checking where you verify y . s_candidate = 0 for all collected equations).

Connection to Quantum Cryptanalysis

Simon’s algorithm has direct implications for cryptography. In work published in 2010 and 2012, Kuwakado and Morii showed that Simon’s algorithm can break certain symmetric-key constructions in polynomial time when the attacker has quantum access to the cipher.

Attacks on Symmetric Primitives

Even-Mansour cipher. The Even-Mansour construction encrypts as E(x) = pi(x XOR k_1) XOR k_2, where pi is a public permutation and k_1, k_2 are secret keys. If an attacker can query the cipher in superposition, they can define a function f(x) = E(x) XOR pi(x) and observe that f(x) = f(x XOR k_1) (since the XOR with k_2 cancels). This is exactly Simon’s problem with s = k_1. The attacker recovers the key in O(n) quantum queries.

3-round Feistel ciphers. A Feistel cipher with three rounds and independent round functions can similarly be attacked. The attacker constructs a function that has the Simon periodicity structure by exploiting the Feistel network’s symmetry. This breaks 3-round Feistel in polynomial quantum queries, compared to the 2^(n/2) classical queries needed.

These attacks require quantum oracle access to the cipher, meaning the attacker can evaluate the cipher on superpositions of inputs. This is a strong assumption, but it is relevant in settings where the cipher is used as a building block in a larger quantum protocol. These results motivate the study of “post-quantum symmetric cryptography,” ensuring that symmetric primitives remain secure even against quantum adversaries with superposition access.

Common Mistakes

Here are pitfalls that frequently trip up students and implementers of Simon’s algorithm.

1. Measuring both registers at the wrong time

The algorithm requires measuring the first register after applying Hadamard gates. A common error is to measure both registers simultaneously at the end, or to measure the first register before applying the Hadamard. The Hadamard transform on the first register is what converts the spatial information (superposition of x_0 and x_0 XOR s) into the frequency domain (the dual subgroup y . s = 0). Without this step, you just get a random input x and learn nothing about s.

In Qiskit, the second register measurement can be implicit (the simulator traces it out), or you can measure it explicitly before the Hadamard on the first register. Either approach is fine, but never skip the Hadamard.

2. Confusing the dual space with the null space of s

The measurement outcomes lie in the space {y : y . s = 0}, which is the null space of the 1-by-n matrix [s], or equivalently, the dual (orthogonal complement) of the subgroup {0, s}. Some students confuse this with the null space of y. The equations are y . s = 0, not s . y = 0 (though over GF(2) these are the same since the inner product is symmetric). The confusion typically arises when students set up the matrix incorrectly, placing s as the unknown in the wrong position. Remember: the y values are known (measured), and s is the unknown. You are solving Y * s = 0 where Y is the matrix whose rows are the measured y vectors.

3. Not checking linear independence before adding to the equation set

If you blindly add every measurement to your equation list, you may collect n - 1 equations that are linearly dependent. The GF(2) solver will then report an underdetermined system or return a wrong answer. Always check that each new measurement is linearly independent of the existing ones before adding it. The code above includes an is_linearly_independent function for this purpose.

A related error: checking independence by simple string comparison (y not in collected) instead of linear algebra. Two strings can be different yet linearly dependent (for example, "110" is the XOR of "100" and "010").

4. Forgetting that s = 0 is a valid answer

The problem statement allows s = 0...0, in which case the function is one-to-one. If your GF(2) solver only looks for non-trivial null space elements, it may fail to return the correct answer when s = 0. In this case, the null space is trivial (only the zero vector), and the solver should report s = 0...0. Always handle this case explicitly.

5. Off-by-one errors in null space extraction

The GF(2) solver must find the null space of an (n-1)-by-n matrix (assuming n-1 independent equations). After row reduction, there should be exactly one free column. A common bug is to look for the free variable among the wrong set of columns, or to back-substitute incorrectly. Double-check that:

Pivot columns are tracked correctly during elimination.
The free column is identified as a column with no pivot.
Back-substitution reads the correct matrix entries (after row reduction, not before).

6. Assuming uniform probability over all valid y

The measurement outcomes are uniformly distributed over the 2^(n-1) elements of {y : y . s = 0}, including the zero vector y = 0...0. Each valid y appears with probability 1/2^(n-1). Some implementations introduce subtle biases through incorrect circuit construction (for example, applying Hadamard gates to the wrong register or in the wrong order). If your simulation shows non-uniform distribution over the valid y values, check your circuit construction carefully.

Also note that the zero vector y = 0...0 is always a valid measurement but carries no information. It appears with probability 1/2^(n-1), which is the same as any other valid y. For n = 3, this is 1/4, so roughly 25% of measurements are wasted on the zero vector.