Quantum Phase Estimation: Deep Dive and Applications

What QPE Does

Quantum phase estimation (QPE) solves the following problem: given a unitary operator U and one of its eigenstates |u⟩, estimate the eigenvalue’s phase θ where U|u⟩ = e^(2πiθ)|u⟩ and θ ∈ [0, 1).

This is more useful than it might sound. Eigenvalues of unitaries directly correspond to energies in quantum simulation (via U = e^(-iHt)), to periods in Shor’s factoring algorithm, and to matrix condition numbers in the HHL linear-systems algorithm. QPE is one of the most reused subroutines in the quantum algorithm toolkit.

Circuit Construction

The standard QPE circuit uses:

t ancilla (counting) qubits, which control precision.
1 or more qubits holding the eigenstate |u⟩.

Steps:

Initialize counting qubits to |0⟩^t and eigenstate register to |u⟩.
Apply Hadamard to all t counting qubits.
Apply controlled-U^(2^k) from counting qubit k to the eigenstate register, for k = 0, 1, ..., t-1.
Apply the inverse quantum Fourier transform (QFT†) to the counting register.
Measure the counting register. The result is the binary fraction approximation of θ.

After step 2 the counting register holds the uniform superposition. Each controlled-U^(2^k) gate accrues a phase e^(2πi * 2^k * θ) onto qubit k via phase kickback (since U^(2^k)|u⟩ = e^(2πi * 2^k * θ)|u⟩). The counting register therefore encodes θ in its phase pattern. The inverse QFT transforms this phase pattern into a computational basis state encoding the binary representation of θ.

Full Mathematical Derivation

This section derives the QPE algorithm step by step, tracking the full quantum state at each stage.

Initial state

The system begins in the product state:

|Ψ_0⟩ = |0⟩^⊗t ⊗ |u⟩

where |u⟩ is an eigenstate of U with eigenvalue e^(2πiθ).

After Hadamard gates on the counting register

Applying H^⊗t to the counting register produces:

|Ψ_1⟩ = (1/√(2^t)) Σ_{k=0}^{2^t - 1} |k⟩ ⊗ |u⟩

Each basis state |k⟩ represents a t-bit integer. The counting register is now a uniform superposition of all 2^t computational basis states.

After controlled unitary applications

The circuit applies controlled-U^(2^j) from counting qubit j to the eigenstate register, for j = 0, 1, ..., t-1. Consider a single basis state |k⟩ in the superposition, where k = k_{t-1} k_{t-2} ... k_1 k_0 in binary. The controlled-U^(2^j) gate applies U^(2^j) when bit k_j = 1, contributing a phase e^(2πi * 2^j * θ * k_j). The combined action of all controlled gates on |k⟩ ⊗ |u⟩ is:

|k⟩ ⊗ |u⟩  →  e^(2πiθ * Σ_{j=0}^{t-1} k_j * 2^j) |k⟩ ⊗ |u⟩
         =  e^(2πiθk) |k⟩ ⊗ |u⟩

since Σ_{j=0}^{t-1} k_j * 2^j = k by the binary representation. The full state is therefore:

|Ψ_2⟩ = (1/√(2^t)) Σ_{k=0}^{2^t - 1} e^(2πiθk) |k⟩ ⊗ |u⟩

This is the central result: the counting register now holds the state (1/√(2^t)) Σ_k e^(2πiθk) |k⟩, which is exactly the quantum Fourier transform of the state |2^t θ⟩ (or the closest integer to it).

After inverse QFT

The quantum Fourier transform maps computational basis states to phase-encoded states:

QFT|m⟩ = (1/√(2^t)) Σ_{k=0}^{2^t - 1} e^(2πi mk / 2^t) |k⟩

Comparing with the counting register state in |Ψ_2⟩, we see it equals QFT|m⟩ when θ = m/2^t exactly. Applying the inverse QFT recovers:

QFT†: (1/√(2^t)) Σ_{k=0}^{2^t - 1} e^(2πiθk) |k⟩  →  |m⟩   (when θ = m/2^t)

The final state is |m⟩ ⊗ |u⟩, and measuring the counting register yields m with certainty. The phase estimate is θ_est = m / 2^t.

Non-exact phases

When θ ≠ m/2^t for any integer m, the inverse QFT does not produce a single basis state. Instead, the measurement outcome is probabilistic. The probability of measuring integer m is:

P(m) = (1/2^(2t)) |Σ_{k=0}^{2^t - 1} e^(2πi(θ - m/2^t)k)|^2

Evaluating this geometric sum gives the closed-form expression:

P(m) = sin²(π(2^t θ - m)) / (2^(2t) sin²(π(θ - m/2^t)))

This is a sinc-like distribution centered at m* = round(2^t θ). The closest integer m* has probability at least 4/π² ≈ 0.405, and the two closest integers together capture at least 8/π² ≈ 0.81 of the probability. Adding extra qubits beyond the log2(1/ε) minimum increases the success probability exponentially.

Phase Kickback: Detailed Derivation

Phase kickback is the mechanism that makes QPE work. It transfers eigenvalue phase information from the target register onto the control qubit, without disturbing the eigenstate.

Statement

Let U be a unitary with eigenvector |u⟩ and eigenvalue e^(iφ):

U|u⟩ = e^(iφ)|u⟩

Consider a controlled-U gate with the control qubit in state (|0⟩ + |1⟩)/√2 and the target register in |u⟩. The joint state evolves as:

(|0⟩ + |1⟩)/√2 ⊗ |u⟩
→ (|0⟩ ⊗ I|u⟩ + |1⟩ ⊗ U|u⟩) / √2
= (|0⟩ ⊗ |u⟩ + |1⟩ ⊗ e^(iφ)|u⟩) / √2
= (|0⟩ + e^(iφ)|1⟩) / √2  ⊗  |u⟩

The phase e^(iφ) has “kicked back” from the target onto the control qubit. The target register remains in |u⟩, completely undisturbed.

Why the eigenstate is not disturbed

The controlled-U gate acts as |0⟩⟨0| ⊗ I + |1⟩⟨1| ⊗ U. When the target is an eigenstate, U acts as scalar multiplication, so the target factors out of the tensor product. This factorization is what allows the phase to appear solely on the control qubit. If the target were not an eigenstate, entanglement between control and target would prevent this clean separation.

Concrete example: T gate and |1⟩

The T gate has the matrix diag(1, e^(iπ/4)). Its eigenstates are |0⟩ (eigenvalue 1) and |1⟩ (eigenvalue e^(iπ/4)). Consider controlled-T with control in |+⟩ = (|0⟩ + |1⟩)/√2 and target in |1⟩:

(|0⟩ + |1⟩)/√2 ⊗ |1⟩
→ (|0⟩ ⊗ |1⟩ + |1⟩ ⊗ T|1⟩) / √2
= (|0⟩ ⊗ |1⟩ + e^(iπ/4) |1⟩ ⊗ |1⟩) / √2
= (|0⟩ + e^(iπ/4)|1⟩) / √2  ⊗  |1⟩

The control qubit now carries the phase π/4, which encodes θ = 1/8 (since e^(iπ/4) = e^(2πi/8)). This is exactly the information that QPE extracts.

Phase kickback is relative

Note that the kickback phase is a relative phase between |0⟩ and |1⟩ on the control qubit. A global phase (applied to both components equally) would be unobservable. The key point is that controlled-U applies U only on the |1⟩ branch, creating a relative phase. This relative phase is then converted to a measurable quantity by the inverse QFT.

Controlled Power Implementations

The controlled-U^(2^k) gates are the most expensive part of QPE. How to implement them efficiently depends on the structure of U.

Diagonal unitaries

For diagonal unitaries like the T gate, U^n simply raises each diagonal entry to the n-th power. If U = diag(1, e^(iα)), then U^(2^k) = diag(1, e^(i * 2^k * α)). The controlled version is a single controlled-phase gate:

import numpy as np
from qiskit import QuantumCircuit, QuantumRegister

t = 3  # counting qubits
control_qubit = QuantumRegister(t, name="count")
target_qubit = QuantumRegister(1, name="eig")
qc = QuantumCircuit(control_qubit, target_qubit)

# Controlled-T^(2^k) is a single cp gate
for k in range(t):
    angle = 2**k * np.pi / 4  # T gate angle scaled by 2^k
    qc.cp(angle, control_qubit[k], target_qubit[0])

# One two-qubit gate per counting qubit, no matter how large 2^k gets.
print(qc.draw(output="text"))
print("Two-qubit gate count:", qc.num_nonlocal_gates())

This is optimal: each controlled power costs exactly one two-qubit gate regardless of k.

Trotter product formulas

For Hamiltonians decomposed as H = Σ_j H_j, the time evolution U = e^(-iHt) can be approximated by Trotterization:

U ≈ (e^(-iH_1 t/n) e^(-iH_2 t/n) ... e^(-iH_m t/n))^n

To implement U^(2^k), you simply repeat the Trotter circuit 2^k times. For second-order Trotter (Suzuki formula):

U_2(t) = e^(-iH_1 t/2) e^(-iH_2 t/2) ... e^(-iH_m t) ... e^(-iH_2 t/2) e^(-iH_1 t/2)

The circuit depth for U^(2^k) scales as O(2^k * m) where m is the number of Pauli terms. This exponential growth in depth with k is the primary bottleneck of QPE for chemistry applications.

Qubitization and block encodings

For more sophisticated approaches, qubitization replaces Trotterization with a single query to a block encoding of the Hamiltonian. A block encoding embeds H/α (normalized by the spectral norm α) in the upper-left block of a larger unitary W:

W = [ H/α    *  ]
    [  *      *  ]

The walk operator W has eigenvalues e^(±i arccos(E_j/α)), where E_j are eigenvalues of H. QPE on W extracts these eigenvalues with query complexity O(α/ε) instead of the Trotter-based O(α²/ε), a quadratic improvement. The controlled-W^(2^k) operations each require 2^k queries to the block encoding oracle, but the per-query cost is independent of the system size.

Precision and Qubit Count

With t counting qubits you can estimate θ to t bits of precision, i.e., to within 1/2^t. More precisely, the probability of measuring a value within ε of θ is at least 1 - δ when:

t >= log2(1/ε) + log2(2 + 1/(2δ))

For ε = 0.01 and δ = 0.05 (95% success probability, 1% precision) you need roughly t = 8 to 10 counting qubits.

The depth cost comes from the controlled-U^(2^k) gates. If U requires depth d to implement, the controlled-U^(2^(t-1)) gate requires simulating 2^(t-1) applications of U. For large t this is expensive. Iterative QPE (described below) trades circuit depth for classical iterations.

Non-Exact Phase Behavior: Probability Distribution

When the true phase θ does not lie exactly on the t-bit grid, the QPE output is a probability distribution over all 2^t outcomes. Understanding this distribution is essential for assessing QPE accuracy.

The probability formula

For true phase θ and measured integer m with t counting qubits:

P(m) = (1/2^t)² |Σ_{k=0}^{2^t-1} e^(2πi(θ - m/2^t)k)|²

Let δ = θ - m/2^t. The geometric sum evaluates to:

Σ_{k=0}^{N-1} e^(2πiδk) = (1 - e^(2πiδN)) / (1 - e^(2πiδ))

where N = 2^t. Taking the modulus squared:

P(m) = sin²(πδN) / (N² sin²(πδ))

This is a periodic sinc-like function. It peaks sharply when δ → 0 (i.e., m/2^t is close to θ) and has sidelobes that decay as 1/sin²(πδ).

Qiskit simulation of non-exact phases

The following code runs QPE for θ = 0.1 with t = 4 counting qubits. Since 0.1 × 16 = 1.6, the true phase falls between grid points m = 1 and m = 2. The output distribution should peak at m = 2 (nearest grid point).

import numpy as np
from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister, transpile
from qiskit_aer import AerSimulator
from qiskit.circuit.library import QFT


def qpe_arbitrary_phase(theta: float, t: int) -> QuantumCircuit:
    """
    QPE for a single-qubit unitary U = diag(1, e^(2*pi*i*theta)).
    The eigenstate is |1> with eigenvalue e^(2*pi*i*theta).
    """
    counting = QuantumRegister(t, name="count")
    eigenstate = QuantumRegister(1, name="eig")
    cr = ClassicalRegister(t, name="c")
    qc = QuantumCircuit(counting, eigenstate, cr)

    # Prepare eigenstate |1>
    qc.x(eigenstate[0])

    # Hadamard on counting qubits
    qc.h(counting)

    # Controlled-U^(2^k): controlled phase gate with angle 2^k * 2*pi*theta
    for k in range(t):
        angle = 2**k * 2 * np.pi * theta
        qc.cp(angle, counting[k], eigenstate[0])

    # Inverse QFT on counting register
    iqft = QFT(t, inverse=True, do_swaps=True)
    qc.compose(iqft, qubits=counting, inplace=True)

    qc.measure(counting, cr)
    return qc


def run_non_exact_phase():
    theta = 0.1
    t = 4
    N = 2**t

    qc = qpe_arbitrary_phase(theta, t)
    sim = AerSimulator()
    compiled = transpile(qc, sim)
    job = sim.run(compiled, shots=8192)
    counts = job.result().get_counts()

    print(f"QPE for theta = {theta} with t = {t} counting qubits")
    print(f"Grid spacing: 1/{N} = {1/N:.4f}")
    print(f"Nearest grid point: {round(N * theta)}/{N} = {round(N * theta) / N:.4f}")
    print()

    # Compare measured vs theoretical distributions
    print(f"{'m':>4}  {'theta_est':>10}  {'P_theory':>10}  {'P_measured':>10}")
    print("-" * 42)

    for m in range(N):
        bitstring = format(m, f"0{t}b")
        measured_count = counts.get(bitstring, 0)
        p_measured = measured_count / 8192

        # Theoretical probability
        delta = theta - m / N
        if abs(delta) < 1e-12:
            p_theory = 1.0
        else:
            p_theory = (np.sin(np.pi * delta * N) ** 2) / (N**2 * np.sin(np.pi * delta) ** 2)

        if p_theory > 0.01 or p_measured > 0.01:
            print(f"{m:4d}  {m/N:10.4f}  {p_theory:10.4f}  {p_measured:10.4f}")


if __name__ == "__main__":
    run_non_exact_phase()

Expected output: the distribution peaks at m = 2 (θ_est = 0.125) with approximately 57% probability, with secondary weight at m = 1 (θ_est = 0.0625) near 26%. The remaining probability spreads across other bins with rapidly decreasing weights. Together the two bins bracketing the true phase hold about 83% of the total, consistent with the 8/π² ≈ 0.81 lower bound derived above.

Qiskit Implementation: T-Gate Phase

The T gate has eigenvalue e^(iπ/4) = e^(2πi * 1/8), so the true phase is θ = 1/8 = 0.125. With 3 counting qubits we should recover θ = 001 in binary, i.e., the integer 1 out of 8.

import numpy as np
from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister, transpile
from qiskit_aer import AerSimulator
from qiskit.circuit.library import QFT


def qpe_t_gate(t: int = 3) -> QuantumCircuit:
    """
    QPE circuit estimating the phase of the T gate.
    T|1> = e^(i*pi/4)|1>, so theta = 1/8.
    The eigenstate is |1>.
    """
    counting = QuantumRegister(t, name="count")
    eigenstate = QuantumRegister(1, name="eig")
    cr = ClassicalRegister(t, name="c")
    qc = QuantumCircuit(counting, eigenstate, cr)

    # Prepare eigenstate |1> of T gate
    qc.x(eigenstate[0])

    # Hadamard on counting qubits
    qc.h(counting)
    qc.barrier()

    # Controlled-U^(2^k): controlled-T^(2^k)
    # T^(2^k) = P(2^k * pi/4) = Rz(2^k * pi/4) up to global phase
    for k in range(t):
        angle = 2 ** k * np.pi / 4
        qc.cp(angle, counting[k], eigenstate[0])

    qc.barrier()

    # Inverse QFT on counting register
    iqft = QFT(t, inverse=True, do_swaps=True)
    qc.compose(iqft, qubits=counting, inplace=True)

    qc.measure(counting, cr)
    return qc


def run_qpe():
    t = 3
    qc = qpe_t_gate(t)
    sim = AerSimulator()
    compiled = transpile(qc, sim)
    job = sim.run(compiled, shots=2048)
    counts = job.result().get_counts()

    print(f"QPE with t={t} counting qubits for T-gate phase (theta=1/8)")
    print(f"Raw counts: {counts}")

    for bitstring, count in sorted(counts.items(), key=lambda x: -x[1]):
        # Qiskit classical register: integer value is int(bitstring, 2)
        estimated_theta = int(bitstring, 2) / (2 ** t)
        print(f"  Bitstring: {bitstring} -> theta = {int(bitstring, 2)}/{2**t} = {estimated_theta:.4f} ({count} shots)")


if __name__ == "__main__":
    run_qpe()

Expected output: all 2048 shots produce bitstring 001, giving theta = 1/8 = 0.1250.

The Hadamard Test: Full Qiskit Implementation

When you do not have direct access to prepare |u⟩ but can apply U as a controlled operation, the Hadamard test estimates Re(⟨ψ|U|ψ⟩) for any state |ψ⟩. The circuit uses a single ancilla qubit:

Initialize ancilla to |0⟩, target to |ψ⟩.
Hadamard on ancilla.
Controlled-U from ancilla to target.
Hadamard on ancilla.
Measure ancilla. Probability of 0 is (1 + Re(⟨ψ|U|ψ⟩)) / 2.

By inserting an S† gate before the final Hadamard you get the imaginary part instead. Together, real and imaginary parts give the full expectation value. This is useful for quantum chemistry energy estimation when the eigenstate is prepared variationally.

Qiskit implementation

The following code estimates both Re(⟨+|T|+⟩) and Im(⟨+|T|+⟩) using the Hadamard test. Analytically:

⟨+|T|+⟩ = (⟨0| + ⟨1|)(T|0⟩ + T|1⟩) / 2
         = (⟨0|0⟩ + ⟨1|e^(iπ/4)|1⟩) / 2  (cross terms vanish since T is diagonal)
         = (1 + e^(iπ/4)) / 2

Computing numerically: Re = (1 + cos(π/4)) / 2 ≈ 0.8536 and Im = sin(π/4) / 2 ≈ 0.3536.

import numpy as np
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator


def hadamard_test_real(shots: int = 8192) -> float:
    """
    Estimate Re(<+|T|+>) using the Hadamard test.
    |psi> = |+> = H|0>, U = T gate.
    """
    qc = QuantumCircuit(2, 1)

    # Prepare |psi> = |+> on qubit 1
    qc.h(1)

    # Hadamard on ancilla (qubit 0)
    qc.h(0)

    # Controlled-T from ancilla to target.
    # Qiskit has no qc.ct shortcut; controlled-T equals a controlled-phase
    # of pi/4 since T = diag(1, e^(i*pi/4)).
    qc.cp(np.pi / 4, 0, 1)

    # Hadamard on ancilla
    qc.h(0)

    # Measure ancilla
    qc.measure(0, 0)

    sim = AerSimulator()
    compiled = transpile(qc, sim)
    job = sim.run(compiled, shots=shots)
    counts = job.result().get_counts()

    p0 = counts.get("0", 0) / shots
    # Re(<psi|U|psi>) = 2*P(0) - 1
    return 2 * p0 - 1


def hadamard_test_imag(shots: int = 8192) -> float:
    """
    Estimate Im(<+|T|+>) using the Hadamard test with S-dagger.
    """
    qc = QuantumCircuit(2, 1)

    # Prepare |psi> = |+> on qubit 1
    qc.h(1)

    # Hadamard on ancilla (qubit 0)
    qc.h(0)

    # Controlled-T from ancilla to target (controlled-phase of pi/4).
    qc.cp(np.pi / 4, 0, 1)

    # S-dagger on ancilla (rotates to extract imaginary part)
    qc.sdg(0)

    # Hadamard on ancilla
    qc.h(0)

    # Measure ancilla
    qc.measure(0, 0)

    sim = AerSimulator()
    compiled = transpile(qc, sim)
    job = sim.run(compiled, shots=shots)
    counts = job.result().get_counts()

    p0 = counts.get("0", 0) / shots
    # Im(<psi|U|psi>) = 2*P(0) - 1
    return 2 * p0 - 1


def run_hadamard_test():
    re_measured = hadamard_test_real()
    im_measured = hadamard_test_imag()

    # Analytical values
    re_exact = (1 + np.cos(np.pi / 4)) / 2
    im_exact = np.sin(np.pi / 4) / 2

    print("Hadamard test: estimating <+|T|+>")
    print(f"  Re(<+|T|+>): measured = {re_measured:.4f}, exact = {re_exact:.4f}")
    print(f"  Im(<+|T|+>): measured = {im_measured:.4f}, exact = {im_exact:.4f}")
    print(f"  Full value:  measured = {re_measured:.4f} + {im_measured:.4f}i")
    print(f"               exact    = {re_exact:.4f} + {im_exact:.4f}i")


if __name__ == "__main__":
    run_hadamard_test()

The measurements converge to the analytical values with statistical fluctuations that decrease as 1/√(shots). With 8192 shots, you should expect accuracy to about 0.01.

Iterative QPE: Full Qiskit Implementation

Standard QPE requires a circuit depth proportional to t * max_depth(U^(2^k)), which is deep. Iterative QPE (Kitaev’s phase estimation) trades depth for repeated single-qubit measurements.

Each iteration uses a single counting qubit and one controlled-U^(2^k) application, measuring to learn the k-th bit of θ (from least significant to most significant, with feedback). The circuit depth per iteration is constant. The total number of circuit runs is t (plus overhead for error correction and repetition).

Iterative QPE is the practical choice on near-term hardware because it avoids storing large quantum states across deep circuits.

Algorithm details

The iterative QPE algorithm extracts θ one bit at a time, starting from the least significant bit and working up. In round j (where j = t-1, t-2, ..., 0), the circuit:

Prepares a single ancilla in |0⟩.
Applies H to the ancilla.
Applies controlled-U^(2^j) to the eigenstate.
Applies a phase correction Rz(-ω_j) on the ancilla, where ω_j accounts for previously measured bits.
Applies H to the ancilla.
Measures the ancilla to obtain bit b_j.

The phase correction is:

ω_j = π * Σ_{k=j+1}^{t-1} b_k * 2^(j-k+1)

This correction removes the influence of already-determined bits so that each measurement isolates a single bit of θ.

Complete implementation

The following code implements iterative QPE for the phase gate P(2πθ) with configurable θ:

import numpy as np
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator


def iterative_qpe(theta: float, t: int, shots_per_round: int = 1) -> float:
    """
    Iterative (Kitaev) QPE for U = diag(1, e^(2*pi*i*theta)).
    Extracts t bits of theta using a single ancilla qubit per round.

    Parameters:
        theta: true phase in [0, 1)
        t: number of bits of precision
        shots_per_round: shots per circuit execution (1 for deterministic,
                         more for majority voting on noisy hardware)

    Returns:
        Estimated phase as a float in [0, 1).
    """
    sim = AerSimulator()
    bits = [0] * t  # bits[j] stores bit j (j=0 is most significant)

    # Extract bits from most significant (j=t-1) to least significant (j=0)
    # But the standard iterative protocol goes from least significant to most.
    # Here we extract bit j of the binary fraction 0.b_{t-1} b_{t-2} ... b_0.
    # Round j extracts bit b_j using controlled-U^(2^j).

    measured_bits = []  # measured_bits[k] = bit from round k

    for j in range(t - 1, -1, -1):
        qc = QuantumCircuit(2, 1)

        # Prepare eigenstate |1> on qubit 1
        qc.x(1)

        # Hadamard on ancilla (qubit 0)
        qc.h(0)

        # Controlled-U^(2^j): phase gate with angle 2^j * 2*pi*theta
        angle = 2**j * 2 * np.pi * theta
        qc.cp(angle, 0, 1)

        # Phase correction from previously measured (less significant) bits
        omega = 0.0
        for k, prev_bit in enumerate(measured_bits):
            # Each previously measured bit b_k contributes to the correction.
            # The correction for round j given bit from round (j + k + 1):
            omega += prev_bit * np.pi / (2 ** (k + 1))

        if abs(omega) > 1e-12:
            qc.p(-omega, 0)

        # Hadamard on ancilla
        qc.h(0)

        # Measure
        qc.measure(0, 0)

        compiled = transpile(qc, sim)

        if shots_per_round == 1:
            job = sim.run(compiled, shots=1)
            result = job.result().get_counts()
            bit = int(list(result.keys())[0])
        else:
            # Majority voting for noisy scenarios
            job = sim.run(compiled, shots=shots_per_round)
            result = job.result().get_counts()
            zeros = result.get("0", 0)
            ones = result.get("1", 0)
            bit = 0 if zeros >= ones else 1

        measured_bits.insert(0, bit)

    # Convert measured bits to phase
    # measured_bits is now [b_{t-1}, b_{t-2}, ..., b_0]
    # theta_est = sum_j b_j / 2^(t-j)
    theta_est = 0.0
    for j, b in enumerate(measured_bits):
        theta_est += b / (2 ** (j + 1))

    return theta_est


def run_iterative_qpe():
    test_cases = [
        (0.125, 3, "1/8 (exact on 3-bit grid)"),
        (0.25, 4, "1/4 (exact on 4-bit grid)"),
        (0.1, 6, "0.1 (non-exact, 6 bits)"),
        (0.333, 8, "~1/3 (non-exact, 8 bits)"),
    ]

    for theta, t, desc in test_cases:
        theta_est = iterative_qpe(theta, t)
        error = abs(theta_est - theta)
        print(f"theta = {theta} ({desc})")
        print(f"  Estimated: {theta_est:.6f}, Error: {error:.6f}")
        print(f"  Max possible error with {t} bits: {1/2**t:.6f}")
        print()


if __name__ == "__main__":
    run_iterative_qpe()

For exact phases (like θ = 1/8 with t = 3), iterative QPE always returns the correct answer. For non-exact phases, the estimate rounds to the nearest representable value, with accuracy limited by the number of bits t.

Application: Eigenvalue Estimation for Chemistry

In quantum chemistry, the Hamiltonian H of a molecule is decomposed into Pauli terms. The ground state energy is the smallest eigenvalue of H. If you can prepare the ground state (via VQE, for example) and construct U = e^(-iHt) as a quantum circuit (Trotterization), QPE estimates the energy E from the phase θ = Et / (2π).

For hydrogen at equilibrium bond length, t = 8 to 12 counting qubits yields energies accurate to milli-Hartree precision, sufficient for chemical accuracy.

Qiskit implementation: H2 ground state energy

The following code constructs a simplified H2 Hamiltonian as a SparsePauliOp, builds the Trotter circuit for the time evolution operator, and runs QPE to estimate the ground state energy.

Two details are worth flagging before the code, because both are easy to get wrong:

The evolution must be controlled. The whole point of QPE is that counting qubit k picks up the phase e^(2πi·2^k·θ) by phase kickback. If you apply the Trotter steps to the system register unconditionally, the counting register stays in its uniform superposition, the inverse QFT maps it straight back to |0...0⟩, and QPE reports m = 0 no matter what the Hamiltonian is. The fix is to build the Trotter step as its own circuit, convert it with .to_gate().control(1), and condition it on counting[k].

The step size must not shrink with k. Implementing U^(2^k) means repeating the same Trotter step 2^k · n_trotter times. If you also divide dt by 2^k, the total evolution time collapses back to a constant and the 2^k scaling that QPE depends on is destroyed.

On the energy scale. The six coefficients below give the electronic energy in this two-qubit encoding; the ground eigenvalue is about -1.85 Hartree. The familiar -1.137 Hartree figure for H2 is the total energy, which additionally includes the nuclear repulsion term (roughly +0.71 Hartree at this geometry). That constant is not part of the qubit Hamiltonian, so QPE never sees it.

import numpy as np
from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister, transpile
from qiskit_aer import AerSimulator
from qiskit.circuit.library import QFT
from qiskit.quantum_info import SparsePauliOp


def build_h2_hamiltonian() -> SparsePauliOp:
    """
    Simplified H2 Hamiltonian at equilibrium bond length (0.735 Angstrom)
    in the STO-3G basis using the Bravyi-Kitaev transformation.
    H = g0*II + g1*ZI + g2*IZ + g3*ZZ + g4*XX + g5*YY
    Coefficients from Ref: O'Malley et al., PRX 6, 031007 (2016).
    """
    g0 = -0.4804
    g1 = +0.3435
    g2 = -0.4347
    g3 = +0.5716
    g4 = +0.0910
    g5 = +0.0910

    hamiltonian = SparsePauliOp.from_list([
        ("II", g0),
        ("ZI", g1),
        ("IZ", g2),
        ("ZZ", g3),
        ("XX", g4),
        ("YY", g5),
    ])
    return hamiltonian


def trotter_step_circuit(coeffs: dict, dt: float) -> QuantumCircuit:
    """
    One first-order Trotter step approximating exp(-i * H' * dt), where H' is
    the Hamiltonian WITHOUT the identity term (that term is a pure offset and
    is added back classically).

    Returned as a standalone 2-qubit circuit so it can be turned into a gate
    and CONTROLLED. This is the part QPE actually needs: an uncontrolled
    evolution imparts no phase on the counting register at all.
    """
    step = QuantumCircuit(2, name="trot")
    q0, q1 = 0, 1

    # Single-Z terms: exp(-i * g * dt * Z) = Rz(2 * g * dt)
    step.rz(2 * coeffs["ZI"] * dt, q0)
    step.rz(2 * coeffs["IZ"] * dt, q1)

    # ZZ term: exp(-i * g * dt * Z0 Z1), decomposed as CNOT, Rz, CNOT
    step.cx(q0, q1)
    step.rz(2 * coeffs["ZZ"] * dt, q1)
    step.cx(q0, q1)

    # XX term: basis-change with H on both, then a ZZ-type rotation
    step.h(q0)
    step.h(q1)
    step.cx(q0, q1)
    step.rz(2 * coeffs["XX"] * dt, q1)
    step.cx(q0, q1)
    step.h(q0)
    step.h(q1)

    # YY term: basis-change with Sdg then H, ZZ-type rotation, then H then S
    step.sdg(q0)
    step.sdg(q1)
    step.h(q0)
    step.h(q1)
    step.cx(q0, q1)
    step.rz(2 * coeffs["YY"] * dt, q1)
    step.cx(q0, q1)
    step.h(q0)
    step.h(q1)
    step.s(q0)
    step.s(q1)

    return step


def qpe_h2(t: int = 4, n_trotter: int = 2, evolution_time: float = 2.0):
    """
    QPE for the H2 ground state energy.
    Uses t counting qubits and n_trotter Trotter steps per application of U.
    """
    coeffs = {"ZI": 0.3435, "IZ": -0.4347, "ZZ": 0.5716,
              "XX": 0.0910, "YY": 0.0910}
    g0 = -0.4804  # identity coefficient (pure energy offset)

    # Exact reference from classical diagonalization, for comparison.
    exact_energy = float(np.linalg.eigvalsh(build_h2_hamiltonian().to_matrix())[0])

    counting = QuantumRegister(t, name="count")
    system = QuantumRegister(2, name="sys")
    cr = ClassicalRegister(t, name="c")
    qc = QuantumCircuit(counting, system, cr)

    # Prepare the ground state of H'. It lives in the {|01>, |10>} subspace
    # and has the form cos(alpha)|01> + sin(alpha)|10>. This Ry + CNOT ansatz
    # reaches it exactly; alpha comes from diagonalizing the 2x2 block.
    alpha = -0.1149
    qc.x(system[0])
    qc.ry(2 * alpha, system[1])
    qc.cx(system[1], system[0])

    # Hadamard on counting qubits
    qc.h(counting)

    # Controlled-U^(2^k). U = exp(-i H' * evolution_time), built from
    # n_trotter Trotter steps of size dt. Applying it 2^k times means
    # repeating the CONTROLLED step 2^k * n_trotter times. Note dt does NOT
    # shrink with k: that is what makes the phase scale as 2^k.
    dt = evolution_time / n_trotter
    controlled_step = trotter_step_circuit(coeffs, dt).to_gate().control(1)

    for k in range(t):
        for _ in range(2**k * n_trotter):
            qc.append(controlled_step, [counting[k], system[0], system[1]])

    # Inverse QFT
    iqft = QFT(t, inverse=True, do_swaps=True)
    qc.compose(iqft, qubits=counting, inplace=True)

    qc.measure(counting, cr)

    # Run simulation
    sim = AerSimulator()
    compiled = transpile(qc, sim, optimization_level=1)
    job = sim.run(compiled, shots=4096)
    counts = job.result().get_counts()

    print(f"QPE for H2 with t={t} counting qubits, {n_trotter} Trotter steps")
    print(f"Exact ground state energy (diagonalization): {exact_energy:.4f} Hartree")
    print(f"Grid resolution with t={t}: {2 * np.pi / evolution_time / 2**t:.4f} Hartree")
    print()

    for bitstring, count in sorted(counts.items(), key=lambda x: -x[1])[:3]:
        m = int(bitstring, 2)
        theta = m / 2**t
        # U = exp(-i H' t0) has eigenvalue exp(-i E' t0), and QPE reports theta
        # with exp(2*pi*i*theta) = exp(-i E' t0). So E' = -2*pi*theta / t0,
        # and the full energy adds the identity offset back: E = E' + g0.
        e_prime = -2 * np.pi * theta / evolution_time
        energy = e_prime + g0
        print(f"  m={m:2d}, theta={theta:.4f}, E = {energy:+.4f} Hartree ({count} shots)")

    print()
    print("The dominant outcome is the nearest t-bit grid point to the true")
    print("phase, so the residual error here is set by the QPE grid, not by")
    print("Trotter error. More counting qubits shrink the grid spacing.")


if __name__ == "__main__":
    qpe_h2()

The phase-to-energy conversion follows from the relationship U = e^(-iH't₀), where t₀ is the evolution time (not the counting-qubit count). The eigenvalue of U is e^(-iE't₀), and QPE measures θ where e^(2πiθ) = e^(-iE't₀). Solving gives E' = -2πθ/t₀. The identity coefficient g0 is an overall offset that is added back classically: E = E' + g0.

The choice t₀ = 2.0 is not arbitrary. QPE can only report θ ∈ [0, 1), so the evolution time has to be small enough that every eigenphase E'·t₀/(2π) stays inside one period. Pick t₀ too large and distinct eigenvalues alias onto the same θ, which is indistinguishable from a correct answer in the output histogram.

The accuracy of this estimate depends on three factors: (1) the quality of the initial state preparation (how close to the true ground state), (2) the Trotter error (controlled by n_trotter), and (3) the QPE precision (controlled by t). Here the Ry + CNOT ansatz reaches the ground state exactly and the Trotter error is small enough not to move the peak, so the residual error is set entirely by factor (3): the true phase is θ ≈ 0.4363 and the nearest 4-bit grid point is 7/16 = 0.4375, which back-converts to -1.8548 Hartree against an exact -1.8512.

Application: QPE Inside HHL

The HHL algorithm (Harrow-Hassidim-Lloyd) solves linear systems Ax = b in quantum superposition. It requires estimating the eigenvalues of A to invert them. The subroutine structure is:

Phase-encode |b⟩ into the eigenstate register.
Run QPE to copy eigenvalue estimates into an ancilla register.
Apply controlled rotations to invert eigenvalues (rotation angle = 1/λ).
Uncompute QPE (run QPE†).
Measure success ancilla.

The precision of QPE directly determines the accuracy of HHL. Poor phase estimation means poorly inverted eigenvalues and a wrong solution vector.

QPE substructure in HHL: code sketch

The following code illustrates the QPE and eigenvalue inversion steps for a 2x2 system. Consider A with eigenvalues λ_1 = 1 and λ_2 = 2. The HHL circuit uses QPE to encode these eigenvalues in a counting register, then applies controlled-Ry rotations proportional to 1/λ to an ancilla qubit.

import numpy as np
from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister, transpile
from qiskit_aer import AerSimulator
from qiskit.circuit.library import QFT


def hhl_demo():
    """
    Simplified HHL circuit. The controlled-U implemented below is diag(1, e^(i*pi/2)),
    so the only eigenvalue this demo actually exercises is lambda = 1 (theta = 0.25,
    m = 1). The other eigenvector picks up no phase at all and lands in m = 0, so the
    m = 2 inversion branch never fires. Kept because the m=1 branch shows the mechanism.
    Uses 2 counting qubits for QPE, 1 system qubit, 1 ancilla for inversion.

    Circuit layout:
    - qubits 0-1: counting register (QPE)
    - qubit 2: system register (|b>)
    - qubit 3: ancilla for eigenvalue inversion
    """
    t = 2  # counting qubits
    counting = QuantumRegister(t, name="count")
    system = QuantumRegister(1, name="sys")
    ancilla = QuantumRegister(1, name="anc")
    cr = ClassicalRegister(1, name="c")
    qc = QuantumCircuit(counting, system, ancilla, cr)

    # Prepare |b> state on system qubit
    # For demonstration, |b> = |1> (an eigenstate of our simple A)
    qc.x(system[0])

    # === QPE forward ===
    # Hadamard on counting qubits
    qc.h(counting[0])
    qc.h(counting[1])

    # Controlled-U^(2^k). U = diag(1, e^(i*pi/2)), so the lambda=1 eigenvector gets
    # theta = 0.25 and lands in m=1. The other eigenvector gets theta = 0 and lands
    # in m=0, which means this demo only ever exercises the lambda=1 branch.
    # with 2 counting qubits

    # Controlled-U (k=0): phase of 2*pi*theta for each eigenvalue
    qc.cp(np.pi / 2, counting[0], system[0])  # eigenvalue-dependent phase

    # Controlled-U^2 (k=1): phase of 2*2*pi*theta
    qc.cp(np.pi, counting[1], system[0])

    # Inverse QFT on counting register
    iqft = QFT(t, inverse=True, do_swaps=True)
    qc.compose(iqft, qubits=counting, inplace=True)

    qc.barrier()

    # === Eigenvalue inversion ===
    # Counting register now holds |m> where m encodes lambda.
    # Apply controlled-Ry(2*arcsin(C/lambda)) to ancilla.
    # C is a normalization constant (C <= min eigenvalue).
    C = 0.5

    # If counting = |01> (m=1, lambda=1): Ry(2*arcsin(0.5/1))
    angle_1 = 2 * np.arcsin(C / 1.0)
    # If counting = |10> (m=2, lambda=2): Ry(2*arcsin(0.5/2))
    angle_2 = 2 * np.arcsin(C / 2.0)

    # Controlled rotation conditioned on counting register value
    # m=1: counting[1]=0, counting[0]=1
    qc.x(counting[1])  # flip so both are 1 when m=1
    qc.mcry(angle_1, [counting[0], counting[1]], ancilla[0])
    qc.x(counting[1])

    # m=2: counting[1]=1, counting[0]=0
    qc.x(counting[0])  # flip so both are 1 when m=2
    qc.mcry(angle_2, [counting[0], counting[1]], ancilla[0])
    qc.x(counting[0])

    qc.barrier()

    # === QPE inverse (uncompute) ===
    qft = QFT(t, inverse=False, do_swaps=True)
    qc.compose(qft, qubits=counting, inplace=True)

    # Undo controlled unitaries
    qc.cp(-np.pi, counting[1], system[0])
    qc.cp(-np.pi / 2, counting[0], system[0])

    qc.h(counting[0])
    qc.h(counting[1])

    # Measure ancilla: post-select on |1> to get solution
    qc.measure(ancilla, cr)

    # Run
    sim = AerSimulator()
    compiled = transpile(qc, sim)
    job = sim.run(compiled, shots=4096)
    counts = job.result().get_counts()

    print("HHL demo: only the lambda = 1 eigenvector is exercised")
    print(f"Ancilla measurement results: {counts}")
    p_success = counts.get("1", 0) / 4096

    # |b> = |1> was prepared as a SINGLE eigenstate (the lambda = 1 one), so
    # QPE lands entirely in the m=1 branch and the ancilla rotation is
    # Ry(2*arcsin(C/1)). The success amplitude is therefore C/lambda_1 and
    # P(ancilla=1) = (C/lambda_1)^2. In a full HHL, where |b> is spread over several
    # eigenvectors with distinct nonzero eigenvalues, you would instead get the
    # weighted average of C^2/lambda^2. This circuit does not implement that case.
    expected = (C / 1.0) ** 2

    print(f"Success probability (ancilla=1): {p_success:.4f}")
    print(f"Expected for this |b>: (C/lambda_1)^2 = {expected:.4f}")
    print()
    print("The QPE precision (t=2 here) determines the smallest eigenvalue")
    print("that can be resolved and inverted. With t counting qubits, the")
    print("condition number kappa that HHL can handle is at most 2^t.")


if __name__ == "__main__":
    hhl_demo()

The QPE precision directly constrains the condition number that HHL can handle. With t counting qubits, eigenvalues can only be resolved to multiples of 1/2^t. If two eigenvalues are closer than this resolution, they collapse to the same estimate, and the inversion step applies the wrong angle. For a matrix with condition number κ = λ_max / λ_min, you need t ≥ log2(κ) counting qubits to resolve all eigenvalues.

Application: Period Finding in Shor’s Algorithm

Shor’s algorithm uses QPE to find the period r of f(x) = a^x mod N. The unitary is U|x⟩ = |ax mod N⟩. Its eigenvalues are e^(2πi k/r) for k = 0, ..., r-1. QPE applied to a uniform superposition of eigenstates yields measurements close to multiples of 1/r, and continued fraction expansion recovers r.

This is the exact same QPE circuit as above, with U replaced by modular exponentiation. The computational difficulty of implementing U efficiently is where most of Shor’s circuit complexity lives.

Approximate QFT

The standard QFT on n qubits uses controlled-R_k rotation gates for all k from 2 to n, where R_k = diag(1, e^(2πi/2^k)). The small-angle rotations (large k) contribute phases of 2π/2^k, which become negligibly small for large k. Truncating the QFT by omitting rotations with k > m reduces the two-qubit gate count from O(n²) to O(n * m) while introducing only small fidelity loss.

Why truncation works

The rotation R_k adds a phase of 2π/2^k conditioned on a distant qubit. For k = 10, this phase is 2π/1024 ≈ 0.006 radians. On noisy hardware, the gate error from implementing this rotation often exceeds the error from omitting it entirely. The approximate QFT (AQFT) takes advantage of this by keeping only the m largest rotations per qubit.

Qiskit implementation: AQFT vs full QFT fidelity comparison

import numpy as np
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator
from qiskit.quantum_info import Statevector, state_fidelity


def full_qft(n: int) -> QuantumCircuit:
    """Standard QFT on n qubits (no truncation)."""
    qc = QuantumCircuit(n, name=f"QFT_{n}")

    for i in range(n):
        qc.h(i)
        for j in range(i + 1, n):
            k = j - i + 1
            qc.cp(2 * np.pi / 2**k, j, i)

    # Swap qubits to match standard QFT output ordering
    for i in range(n // 2):
        qc.swap(i, n - 1 - i)

    return qc


def approximate_qft(n: int, m: int) -> QuantumCircuit:
    """
    Approximate QFT on n qubits, keeping only rotations R_k with k <= m.
    When m >= n, this is identical to the full QFT.
    """
    qc = QuantumCircuit(n, name=f"AQFT_{n}_m{m}")

    for i in range(n):
        qc.h(i)
        for j in range(i + 1, min(i + m, n)):
            k = j - i + 1
            if k <= m:
                qc.cp(2 * np.pi / 2**k, j, i)

    # Swap qubits
    for i in range(n // 2):
        qc.swap(i, n - 1 - i)

    return qc


def compare_aqft_fidelity():
    """
    Compare fidelity of AQFT vs full QFT for various truncation depths.
    Tests on a specific input state (not |0>, which is trivial).
    """
    n_qubits = 8

    # Prepare a non-trivial input state: apply some rotations
    prep = QuantumCircuit(n_qubits)
    for i in range(n_qubits):
        prep.ry(0.3 * (i + 1), i)
    prep.cx(0, 1)
    prep.cx(2, 3)

    # Get exact QFT output
    full_circ = prep.compose(full_qft(n_qubits))
    sv_full = Statevector.from_instruction(full_circ)

    print(f"AQFT fidelity comparison (n = {n_qubits} qubits)")
    print(f"{'Truncation m':>14}  {'2Q gates':>10}  {'Fidelity':>10}  {'Gate reduction':>15}")
    print("-" * 55)

    full_gates = n_qubits * (n_qubits - 1) // 2
    for m in range(1, n_qubits + 1):
        approx_circ = prep.compose(approximate_qft(n_qubits, m))
        sv_approx = Statevector.from_instruction(approx_circ)
        fid = state_fidelity(sv_full, sv_approx)

        # Count approximate 2Q gates
        approx_gates = sum(min(m - 1, n_qubits - i - 1) for i in range(n_qubits))
        reduction = 1 - approx_gates / full_gates

        print(f"{m:14d}  {approx_gates:10d}  {fid:10.6f}  {reduction:14.1%}")


if __name__ == "__main__":
    compare_aqft_fidelity()

Run on n = 8 qubits, the fidelity climbs steeply with the truncation depth: m = 3 gives only 0.53 and m = 4 gives 0.89, but m = 5 reaches 0.983 and m = 6 reaches 0.998 while still cutting the two-qubit gate count by roughly 11 to 21 percent. So the useful regime here is m ≈ 5 to 6, not the aggressive m = 3 truncation that is sometimes quoted.

The general rule is that m needs to grow like O(log n) to hold fidelity fixed, which is why AQFT is attractive at large n: the gate count drops from O(n²) to O(n log n) while the error stays bounded. At n = 8 there is simply not much room between “too few rotations” and “all of them.” On NISQ hardware the trade is worth making whenever the error of the omitted gates exceeds the error of executing them, which is exactly the case for the smallest-angle rotations.

Noise Impact and Mitigation

On real hardware, QPE degrades due to:

Gate errors accumulating in the controlled-U^(2^k) chains. The k=t-1 term requires 2^(t-1) applications of U, each adding error.
Decoherence over the long circuit depth.
QFT fidelity decreasing with register size.

Iterative QPE, combined with quantum error mitigation techniques like zero-noise extrapolation, extends QPE’s practical reach. For problems requiring more than 5-7 bits of precision, fault-tolerant hardware is the realistic requirement.

Noisy simulation in Qiskit

The following code simulates QPE on a noisy backend and demonstrates how noise shifts the measurement distribution away from the correct answer.

import numpy as np
from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister, transpile
from qiskit_aer import AerSimulator
from qiskit_aer.noise import NoiseModel, depolarizing_error
from qiskit.circuit.library import QFT


def qpe_circuit(theta: float, t: int) -> QuantumCircuit:
    """Build a QPE circuit for U = diag(1, e^(2*pi*i*theta))."""
    counting = QuantumRegister(t, name="count")
    eigenstate = QuantumRegister(1, name="eig")
    cr = ClassicalRegister(t, name="c")
    qc = QuantumCircuit(counting, eigenstate, cr)

    qc.x(eigenstate[0])
    qc.h(counting)

    for k in range(t):
        angle = 2**k * 2 * np.pi * theta
        qc.cp(angle, counting[k], eigenstate[0])

    iqft = QFT(t, inverse=True, do_swaps=True)
    qc.compose(iqft, qubits=counting, inplace=True)
    qc.measure(counting, cr)

    return qc


def run_noisy_qpe():
    theta = 0.125  # exact on 3-bit grid
    t = 3
    shots = 8192

    qc = qpe_circuit(theta, t)

    # Ideal simulation
    ideal_sim = AerSimulator()
    compiled_ideal = transpile(qc, ideal_sim)
    counts_ideal = ideal_sim.run(compiled_ideal, shots=shots).result().get_counts()

    # Noisy simulation with varying error rates
    error_rates = [0.001, 0.005, 0.01, 0.02, 0.05, 0.10, 0.20]

    # With t counting qubits, a fully depolarized register is uniform over
    # 2^t outcomes. That is the floor P(correct) decays toward.
    uniform_floor = 1 / 2**t

    print(f"QPE noise analysis: theta = {theta}, t = {t}")
    print(f"Correct outcome: m = {int(theta * 2**t)}")
    print(f"Uniform (fully depolarized) floor: {uniform_floor:.4f}")
    print()
    print(f"{'Error rate':>12}  {'P(correct)':>12}  {'Peak bin':>10}  {'Peak correct?':>14}")
    print("-" * 54)

    # Ideal result
    correct_bitstring = format(int(theta * 2**t), f"0{t}b")
    p_correct_ideal = counts_ideal.get(correct_bitstring, 0) / shots
    print(f"{'ideal':>12}  {p_correct_ideal:12.4f}  {correct_bitstring:>10}  {'yes':>14}")

    for p_err in error_rates:
        # Build noise model
        noise_model = NoiseModel()
        error_1q = depolarizing_error(p_err, 1)
        error_2q = depolarizing_error(p_err * 2, 2)
        noise_model.add_all_qubit_quantum_error(error_1q, ["h", "rz", "x", "p"])
        noise_model.add_all_qubit_quantum_error(error_2q, ["cp", "cx"])

        noisy_sim = AerSimulator(noise_model=noise_model)
        compiled_noisy = transpile(qc, noisy_sim)
        counts_noisy = noisy_sim.run(compiled_noisy, shots=shots).result().get_counts()

        # Which bin dominates, and how much weight does the right one keep?
        best_bitstring = max(counts_noisy, key=counts_noisy.get)
        p_correct = counts_noisy.get(correct_bitstring, 0) / shots
        is_correct = "yes" if best_bitstring == correct_bitstring else "NO"
        print(f"{p_err:12.3f}  {p_correct:12.4f}  {best_bitstring:>10}  {is_correct:>14}")

    print()
    print("Depolarizing noise is unbiased: it mixes the output toward the uniform")
    print("distribution rather than pushing weight onto any particular wrong bin.")
    print("So the peak stays on the correct answer while P(correct) collapses")
    print("toward the 1/2^t floor. What you lose is CONTRAST, not accuracy: the")
    print("estimate is still right, but you need far more shots to see it, and")
    print("eventually the peak is statistically indistinguishable from noise.")
    print("Coherent errors (systematic over-rotation) behave differently: those")
    print("do shift the peak to a wrong bin, which is far more dangerous because")
    print("the result looks confident and is simply wrong.")


if __name__ == "__main__":
    run_noisy_qpe()

The table shows something more subtle than “noise makes QPE wrong.” Under depolarizing noise the correct bin stays the most probable outcome across the whole sweep, even at a wildly unrealistic 20% gate error. What collapses is the contrast: P(correct) falls from 1.0 to roughly 0.99 at 0.1% error, 0.89 at 1%, 0.56 at 5%, and 0.17 at 20%, converging on the uniform floor of 1/2^t = 0.125. The signal is being drowned, not moved.

This distinction matters in practice. Depolarizing noise is unbiased, so it costs you shots (you need enough samples to resolve the peak above a near-uniform background) but it does not lie to you. Coherent errors, such as a systematic over-rotation in the controlled-phase gates, are the dangerous case: they shift the peak onto a neighboring bin, and the result comes back looking perfectly confident while being simply wrong. When you are debugging a QPE run on hardware, a flat-looking histogram points at decoherence, whereas a sharp peak in the wrong place points at miscalibration.

Zero-noise extrapolation (ZNE)

ZNE mitigates noise by intentionally amplifying it and then extrapolating back to the zero-noise limit. The simplest approach is unitary folding: replacing each gate G with G G† G to triple the effective noise. By running circuits at noise factors 1x, 3x, and 5x, a linear or polynomial extrapolation estimates the noise-free expectation value.

For QPE, ZNE can be applied to the probability of each measurement outcome. After extrapolation, the corrected distribution is re-normalized and the peak identified. This partial correction often recovers the correct bin even when the raw noisy distribution is ambiguous.

On current NISQ hardware (gate errors around 0.1% to 1%), ZNE can extend QPE precision by roughly 1 to 2 additional bits before the extrapolation becomes unreliable.

Scaling Comparison: QPE Variants

The following table compares the three main QPE approaches across key metrics:

Metric	Standard QPE	Iterative QPE	Randomized QPE
Ancilla qubits	t	1	1
Circuit depth	O(2^t * d_U)	O(2^t * d_U) per round, but only one round active at a time	O(2^k * d_U) per sample, random k
Total circuits	1	t	O(t / ε)
Total shots	O(1) for exact phases	t (1 shot per round, or more for noise)	O(log(1/δ) / ε²)
Classical post-processing	Binary fraction readout	Bit accumulation with feedback	Median-of-means or Bayesian estimation
Coherence requirement	All t+n qubits coherent simultaneously	Only 1+n qubits coherent per round	Only 1+n qubits coherent per sample
NISQ suitability	Poor (deep circuits, many qubits)	Good (shallow per round)	Good (shallow per sample)
Fault-tolerant suitability	Excellent (single run)	Good (but more total operations)	Good (but more total operations)

Recommendation matrix

Quantum chemistry (energy estimation):

Near-term: iterative QPE or randomized QPE with error mitigation
Fault-tolerant: standard QPE with qubitized walk operator

Cryptographic period finding (Shor’s):

Standard QPE is the natural fit since the algorithm already requires fault-tolerant hardware for meaningful problem sizes
Iterative QPE can reduce qubit count at the cost of more circuit executions

Linear algebra (HHL and variants):

Standard QPE, since HHL requires QPE as a coherent subroutine (no mid-circuit measurement and classical feedback in the basic version)
For NISQ implementations of HHL-like algorithms, variational approaches often replace QPE entirely

Common Mistakes

This section catalogs errors that frequently appear in QPE implementations, along with explanations and fixes.

1. Using the wrong eigenstate

The T gate has two eigenstates: |0⟩ with eigenvalue 1 (phase θ = 0) and |1⟩ with eigenvalue e^(iπ/4) (phase θ = 1/8). A common mistake is preparing |0⟩ instead of |1⟩ as the eigenstate:

counting = QuantumRegister(3, name="count")
eigenstate = QuantumRegister(1, name="eig")
qc = QuantumCircuit(counting, eigenstate)

# The eigenstate register starts in |0>, which is ALSO an eigenstate of T
# (with eigenvalue 1, so theta = 0). Omit the line below and QPE happily
# returns theta = 0 instead of theta = 1/8. Nothing errors; you just
# measured the other eigenvalue.
qc.x(eigenstate[0])  # <-- the line that is easy to forget

Without the X gate, the eigenstate register starts in |0⟩, and since T|0⟩ = |0⟩, QPE correctly returns θ = 0. The circuit is technically correct; it just measures the wrong eigenvalue. Always verify which eigenstate you are preparing and which eigenvalue it corresponds to.

2. QFT direction confusion

QPE requires the inverse QFT, not the forward QFT. The forward QFT encodes computational basis states into phase patterns; the inverse QFT decodes phase patterns back to computational basis states. Applying the forward QFT instead of the inverse produces a scrambled output.

In Qiskit’s QFT class, set inverse=True:

from qiskit.circuit.library import QFT
from qiskit.quantum_info import Operator

# CORRECT
iqft = QFT(t, inverse=True, do_swaps=True)

# WRONG: forward QFT, not inverse
qft = QFT(t, inverse=False, do_swaps=True)

# The two are adjoints of each other, not the same operator.
same = Operator(iqft).equiv(Operator(qft))
print("forward QFT equals inverse QFT:", same)
print("inverse QFT equals (forward QFT)-dagger:",
      Operator(iqft).equiv(Operator(qft).adjoint()))

3. Off-by-one in the binary fraction

The measured integer m from t counting qubits corresponds to the phase θ = m / 2^t, not θ = m / 2^(t-1). With t = 3 counting qubits and measurement outcome m = 1:

t = 3
m = 1  # the integer read out of the counting register

# CORRECT
theta = m / 2**t  # theta = 1/8 = 0.125

# WRONG
theta_wrong = m / 2 ** (t - 1)  # theta = 1/4 = 0.25 (off by factor of 2)

print(f"correct: theta = {m}/2^{t} = {theta}")
print(f"wrong:   theta = {m}/2^{t-1} = {theta_wrong}  (doubled)")

This off-by-one error is surprisingly common and produces phase estimates that are systematically doubled.

4. Qiskit bit ordering (little-endian)

Qiskit uses little-endian bit ordering for classical registers. The bitstring "001" in Qiskit represents the integer 1 (rightmost bit is most significant in the displayed string, but corresponds to classical bit 0). When converting measurement bitstrings to integers:

bitstring = "001"  # as returned by get_counts()

# Qiskit bitstring "001":
# Bit 0 (rightmost in display) = 1, Bit 1 = 0, Bit 2 = 0
# Integer value = 1*2^0 + 0*2^1 + 0*2^2 = 1
m = int(bitstring, 2)  # This works correctly because Python's int()
                       # reads the string left-to-right as MSB-first,
                       # which matches Qiskit's display convention.

print(f"bitstring {bitstring!r} -> m = {m} -> theta = {m / 2**3}")

The confusion arises because Qiskit’s qubit ordering is q[0] at the bottom of the circuit diagram (least significant), but the classical bit display puts c[0] at the rightmost position. If your counting register maps qubit k to classical bit k, the int(bitstring, 2) conversion works directly. Problems arise when the qubit-to-classical-bit mapping is scrambled.

5. Insufficient shots for non-exact phases

For phases that do not lie on the t-bit grid, QPE produces a probability distribution (not a deterministic outcome). A single shot returns one sample from this distribution, which may not be the most probable outcome. For θ = 0.1 with t = 4, the nearest bin (m = 2) has only about 57% probability. With a single shot, there is a 43% chance of measuring something else.

Use enough shots to reliably identify the peak, and consider taking the most frequent outcome rather than any single measurement. For precision-critical applications, use at least O(1/P(m*)²) shots where P(m*) is the probability of the correct bin.

6. Wrong angle in the controlled-phase gate

The cp(angle) gate in Qiskit applies a phase of e^(i * angle). For QPE of a unitary with eigenvalue e^(2πiθ), the controlled-U^(2^k) gate should apply phase 2^k * 2πθ. A common error is using θ directly as the angle instead of 2πθ:

k = 0
theta = 0.3  # a phase that is NOT a special case, so the 2*pi factor shows up

# CORRECT: angle is 2^k * 2*pi*theta
angle = 2**k * 2 * np.pi * theta
qc.cp(angle, counting[k], eigenstate[0])

# WRONG: using theta directly (missing the 2*pi factor)
qc_wrong = QuantumCircuit(counting, eigenstate)
angle_wrong = 2**k * theta
qc_wrong.cp(angle_wrong, counting[k], eigenstate[0])

print(f"correct angle = {angle:.4f} rad")
print(f"wrong angle   = {angle_wrong:.4f} rad  (short by a factor of 2*pi)")

For the T gate specifically, the eigenvalue phase is π/4 (not 2π * 1/8 = π/4; these happen to coincide). But for arbitrary θ, forgetting the 2π factor produces completely wrong results.

Similarly, be careful not to double-count: if you define θ as the phase in e^(2πiθ) (so θ ∈ [0,1)), the angle for cp is 2^k * 2π * θ. If you define φ as the raw phase in e^(iφ) (so φ ∈ [0, 2π)), then the angle is 2^k * φ. Mixing these conventions is a recipe for factor-of-2π errors.