The Quantum Fourier Transform: How and Why It Works

Fourier Transforms: The Intuition

Before diving into quantum circuits, it helps to understand what a Fourier transform actually does. In plain terms, the Fourier transform takes a signal and decomposes it into frequency components. If you record audio, the Fourier transform tells you which pitches are present and how loud each one is.

The discrete Fourier transform (DFT) does this for a finite list of numbers. Given a vector of N values, it produces N frequency components. A large component at frequency k means the original signal has a strong pattern that repeats every N/k steps.

Here is a concrete example. Consider a signal that repeats every 4 steps in a sequence of length 8:

import numpy as np

# A periodic signal with period 4 in N=8 samples
x = np.array([1, 0, 0, 0, 1, 0, 0, 0])

# Normalized DFT (same normalization the QFT uses)
X = np.fft.fft(x) / np.sqrt(8)

print("Input signal:", x)
print("DFT magnitudes:", np.round(np.abs(X), 3))
# Large components at k=0 (DC offset) and k=2 (frequency 2 = period 4)

The output shows large magnitudes at frequency k=0 (the average value) and k=2. Frequency k=2 corresponds to a period of N/k = 8/2 = 4, which matches the input pattern perfectly.

This frequency detection ability is exactly what makes the Fourier transform so powerful in quantum computing. Shor’s algorithm works because the QFT can detect the period of a modular exponentiation function, and that period reveals the factors of an integer.

What the QFT Does

The classical DFT maps a vector x of N complex numbers to a vector X where:

X_k = (1/sqrt(N)) * sum_{j=0}^{N-1} x_j * exp(2piijk/N)

The best classical algorithm, the FFT, computes this in O(N log N) time.

The quantum Fourier transform (QFT) applies the same transformation to the amplitudes of a quantum state:

QFT |j> = (1/sqrt(N)) * sum_{k=0}^{N-1} exp(2piijk/N) |k>

For n qubits, N = 2^n. The QFT circuit uses O(n^2) gates and runs in depth O(n^2), compared to the classical O(N log N) = O(n * 2^n) operations. This is an exponential improvement in the number of gates.

The catch: you cannot read out all N transformed amplitudes directly. Measurement collapses the state. The QFT is useful as a subroutine when downstream quantum operations exploit the transformed state, rather than when you need the full Fourier transform classically.

The QFT on a Single Qubit

The simplest case of the QFT is n=1, which acts on a single qubit (N=2). Applying the QFT formula:

QFT|0> = (1/sqrt(2)) * (exp(0)|0> + exp(0)|1>) = (|0> + |1>) / sqrt(2)
QFT|1> = (1/sqrt(2)) * (exp(0)|0> + exp(2pii*1/2)|1>) = (|0> - |1>) / sqrt(2)

These are exactly the results of the Hadamard gate:

H|0> = |+> = (|0> + |1>) / sqrt(2)
H|1> = |-> = (|0> - |1>) / sqrt(2)

So the Hadamard gate is the 1-qubit QFT. This gives you an intuitive anchor: every time you see a Hadamard in a quantum algorithm, you can think of it as a tiny Fourier transform. The full n-qubit QFT generalizes this idea by adding controlled phase rotations that encode the finer frequency structure.

The QFT Product Formula

The QFT has a beautiful factored form that reveals exactly why the circuit works. For an n-qubit input state |j>, where j has binary representation j = j_1 j_2 … j_n (with j_1 as the most significant bit):

QFT|j> = (1/sqrt(2^n)) * tensor product over k=1..n of (|0> + exp(2pii * j/2^k) |1>)

Each factor in this tensor product is a single-qubit state. Qubit k in the output holds:

(|0> + exp(2pii * j/2^k) |1>) / sqrt(2)

The phase exp(2pii * j/2^k) depends on the last k bits of j. This is the key insight: each output qubit depends only on a subset of the input bits.

This factored structure maps directly onto the circuit:

The Hadamard on qubit k creates the initial superposition (|0> + |1>) / sqrt(2).
The controlled phase rotations from other qubits add the phases exp(2pii * j/2^k) by reading bits of j from those qubits.

Since each qubit must interact with every other qubit to accumulate all the required phases, the circuit needs O(n^2) two-qubit gates total: qubit 1 interacts with n-1 others, qubit 2 with n-2, and so on, giving n(n-1)/2 controlled rotations plus n Hadamards.

The Circuit Construction

The QFT on n qubits uses three types of gates:

Hadamard (H): creates an equal superposition of |0> and |1>.
Controlled-R_k: applies a phase of exp(2pii/2^k) conditioned on a control qubit.
SWAP: reverses the bit order of the output (sometimes omitted and compensated for in later stages).

The QFT circuit for qubit j applies:

H to qubit j
Controlled-R_2 with control qubit j+1
Controlled-R_3 with control qubit j+2
…
Controlled-R_{n-j} with control qubit n-1

This is repeated for each qubit from 0 to n-1, followed by SWAP gates to correct the bit ordering.

from qiskit import QuantumCircuit
import numpy as np

def qft_circuit(n_qubits, inverse=False):
    """
    Build the QFT (or inverse QFT) circuit on n_qubits.
    """
    qc = QuantumCircuit(n_qubits)

    def qft_rotate(circuit, n):
        """Recursive QFT rotation layer for qubit 0..n-1."""
        if n == 0:
            return circuit
        n -= 1
        circuit.h(n)
        for qubit in range(n):
            # Controlled phase rotation: exp(2*pi*i / 2^(n-qubit+1))
            circuit.cp(np.pi / 2 ** (n - qubit), qubit, n)
        qft_rotate(circuit, n)

    qft_rotate(qc, n_qubits)

    # Swap qubits to restore bit order
    for qubit in range(n_qubits // 2):
        qc.swap(qubit, n_qubits - qubit - 1)

    if inverse:
        return qc.inverse()
    return qc

# Build and display a 4-qubit QFT
qc = qft_circuit(4)
print(qc.draw(output="text", fold=80))
print(f"\nQFT gate count: {qc.size()}")
print(f"QFT circuit depth: {qc.depth()}")

Step-by-Step 3-Qubit QFT on |5>

Working through a concrete example makes the circuit construction tangible. We apply the 3-qubit QFT to the state |5> = |101> in binary.

In Qiskit’s little-endian convention, |5> means qubit 0 = 1, qubit 1 = 0, qubit 2 = 1. The QFT formula predicts:

QFT|5> = (1/sqrt(8)) * sum_{k=0}^{7} exp(2pii5k/8) |k>

Let us trace through the circuit gate by gate:

H on qubit 2 (most significant in the recursive decomposition): creates (|0> + exp(2pii * 0.j_2 j_1 j_0)|1>)/sqrt(2) where j_2 j_1 j_0 = 101.
Controlled-R_2 from qubit 1 to qubit 2: adds phase depending on qubit 1 (which is 0 for |5>), so no phase is added.
Controlled-R_3 from qubit 0 to qubit 2: adds phase depending on qubit 0 (which is 1), contributing exp(2pii/8).
H on qubit 1: starts the superposition for qubit 1.
Controlled-R_2 from qubit 0 to qubit 1: adds phase from qubit 0.
H on qubit 0: the final Hadamard.
SWAP qubit 0 and qubit 2: reverses the output bit order.

We can verify numerically that this produces the correct result:

from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector
import numpy as np

# Prepare |5> = |101> in Qiskit's little-endian convention
n = 3
init = QuantumCircuit(n)
init.x(0)  # qubit 0 = 1
init.x(2)  # qubit 2 = 1
# State is |101> = |5>

# Apply QFT
full = init.compose(qft_circuit(n))
sv = Statevector(full)

print("QFT|5> amplitudes:")
for i, amp in enumerate(sv.data):
    phase = np.angle(amp) / np.pi
    print(f"  |{i:03b}> ({i}): magnitude={abs(amp):.4f}, phase={phase:.4f}*pi")

# Verify against the formula: QFT|5> = (1/sqrt(8)) * sum_k exp(2*pi*i*5*k/8)|k>
expected = np.array([np.exp(2j * np.pi * 5 * k / 8) for k in range(8)]) / np.sqrt(8)
print("\nMatches QFT formula:", np.allclose(sv.data, expected, atol=1e-10))

Each amplitude has the same magnitude 1/sqrt(8), but carries a different phase exp(2pii5k/8). This uniform magnitude with varying phase is characteristic of QFT applied to a computational basis state.

Why the QFT Is Efficient

The classical FFT requires O(N log N) = O(n * 2^n) arithmetic operations to compute the DFT of N = 2^n numbers. The QFT requires only O(n^2) quantum gates. Where does this dramatic speedup come from?

The answer lies in quantum parallelism through superposition. A quantum state of n qubits encodes 2^n amplitudes simultaneously. When you apply a gate to one qubit, it transforms all 2^n amplitudes in a single operation. The QFT circuit applies n^2 gates, but each gate acts on the full superposition of 2^n states in parallel.

However, this parallelism comes with a fundamental limitation. The classical FFT produces all N output values explicitly. You can read any of them. The QFT produces a quantum state whose amplitudes are the Fourier-transformed values, but measurement gives you only ONE random sample from the output distribution. You cannot extract all 2^n amplitudes without repeating the computation exponentially many times.

This is why the QFT alone does not replace the FFT for classical signal processing. Its power emerges when combined with other quantum operations (like in phase estimation) that make the measurement outcome carry useful information with high probability.

The Approximate QFT

In the full QFT circuit, the controlled-R_k gates for large k apply very small phase rotations. For example, R_10 rotates by exp(2pii/1024), an angle of about 0.006 radians. On noisy hardware, the error introduced by a two-qubit gate often exceeds this tiny rotation, making the gate worse than useless.

The approximate QFT drops controlled rotations beyond a chosen cutoff degree m. This reduces the gate count from O(n^2) to O(n*m) while introducing bounded approximation error.

from qiskit import QuantumCircuit
import numpy as np

def approx_qft_circuit(n_qubits, approximation_degree=2):
    """
    Approximate QFT: drop controlled rotations of order > approximation_degree.
    Reduces gate count from O(n^2) to O(n * approximation_degree).

    approximation_degree: maximum distance between control and target qubits.
        Full QFT corresponds to approximation_degree >= n_qubits - 1.
    """
    qc = QuantumCircuit(n_qubits)

    for target in reversed(range(n_qubits)):
        qc.h(target)
        # Only apply R_k for k up to approximation_degree
        for k in range(1, min(approximation_degree, target) + 1):
            control = target - k
            qc.cp(np.pi / 2**k, control, target)

    # Swap qubits to restore bit order
    for qubit in range(n_qubits // 2):
        qc.swap(qubit, n_qubits - qubit - 1)

    return qc

# Compare gate counts for 8-qubit QFT
full = qft_circuit(8)
approx_2 = approx_qft_circuit(8, approximation_degree=2)
approx_3 = approx_qft_circuit(8, approximation_degree=3)

print(f"Full QFT:    {full.size()} gates, depth {full.depth()}")
print(f"Approx (m=2): {approx_2.size()} gates, depth {approx_2.depth()}")
print(f"Approx (m=3): {approx_3.size()} gates, depth {approx_3.depth()}")

The approximation error is bounded by O(n/2^m), so even m=3 or m=4 gives excellent fidelity for practical circuit sizes. This tradeoff between accuracy and circuit depth is crucial for near-term quantum hardware where every two-qubit gate adds noise.

The Inverse QFT

The inverse QFT (IQFT) reverses the Fourier transform: IQFT * QFT = Identity. Since the QFT is a unitary operation, its inverse is its conjugate transpose (adjoint).

In circuit terms, the IQFT is constructed by:

Reversing the order of all gates in the QFT circuit.
Replacing every phase angle theta with -theta (conjugating all phase rotations).

So where the QFT applies H then controlled phase rotations (with positive angles), the IQFT applies controlled phase rotations (with negative angles) then H, working through the qubits in reverse order.

We can verify this numerically:

from qiskit import QuantumCircuit
from qiskit.quantum_info import Operator
import numpy as np

n = 3

# Build QFT and IQFT
qft = qft_circuit(n)
iqft = qft_circuit(n, inverse=True)

# Compose them: IQFT * QFT should equal the identity
combined = qft.compose(iqft)
op = Operator(combined)

# Check if the result is the identity matrix
identity = np.eye(2**n)
print("QFT * IQFT = Identity:", np.allclose(op.data, identity, atol=1e-10))

# Also verify on a specific state
from qiskit.quantum_info import Statevector
init = QuantumCircuit(n)
init.x(0)
init.x(1)  # Prepare |3> = |011>

roundtrip = init.compose(qft).compose(iqft)
original_sv = Statevector(init)
roundtrip_sv = Statevector(roundtrip)
print("Round-trip preserves state:", np.allclose(original_sv.data, roundtrip_sv.data, atol=1e-10))

The IQFT appears in quantum phase estimation, where it converts phase-encoded information in the counting register back into a computational basis state that can be measured.

Phase Kickback in Controlled Rotations

The controlled phase rotations in the QFT circuit rely on a mechanism called phase kickback. Understanding this mechanism clarifies how the circuit accumulates the correct phases.

Consider a controlled-R_k gate acting on a control qubit |c> and a target qubit |t>. The gate applies exp(2pii/2^k) to the target when both control and target are |1>:

CR_k |0>|t> = |0>|t> (control is 0, nothing happens)
CR_k |1>|0> = |1>|0> (target is 0, no phase)
CR_k |1>|1> = exp(2pii/2^k) |1>|1> (both are 1, phase applied)

When the control is in a superposition (a|0> + b|1>) and the target is |1>, the phase goes onto the control:

CR_k (a|0> + b|1>) |1> = a|0>|1> + b * exp(2pii/2^k) |1>|1>

The target state |1> is unchanged, but the control qubit picks up a relative phase. This is phase kickback: the phase “kicks back” from the target onto the control.

In the QFT circuit, after the Hadamard puts a qubit into superposition, the subsequent controlled rotations use other qubits (which encode bits of the input j) as controls. Each control qubit that is |1> kicks a specific phase onto the target qubit, building up the total phase exp(2pii * j/2^k) required by the product formula.

Verifying the QFT Numerically

from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector
import numpy as np

# Prepare a simple input state: equal superposition of |0> and |4>
n = 4
init_qc = QuantumCircuit(n)
init_qc.h(0)        # Creates (|0> + |1>) / sqrt(2) on qubit 0
init_qc.x(2)        # Sets qubit 2 to |1>, shifting to a specific state

# Apply QFT
full_circuit = init_qc.compose(qft_circuit(n))
sv = Statevector(full_circuit)

print("QFT output amplitudes (non-negligible):")
for i, amp in enumerate(sv.data):
    if abs(amp) > 0.01:
        print(f"  |{i:04b}> ({i:2d}): {amp:.4f}")

# Verify against numpy: Qiskit QFT uses the positive-exponent convention,
# which matches numpy's ifft (scaled by sqrt(N))
input_amplitudes = Statevector(init_qc).data
classical_qft = np.fft.ifft(input_amplitudes) * np.sqrt(2**n)
print("\nNumerical QFT matches scaled numpy ifft:",
      np.allclose(sv.data, classical_qft, atol=1e-8))

The QFT in Quantum Phase Estimation

Quantum phase estimation (QPE) is the central subroutine in Shor’s algorithm and many other algorithms. Given a unitary U with eigenvector |u> and eigenvalue exp(2pii*phi), QPE estimates phi.

The algorithm works in three steps:

Prepare t ancilla qubits in equal superposition (t controls accuracy).
Apply controlled-U^(2^j) for j = 0, …, t-1 using the ancilla qubits as controls.
Apply the inverse QFT to the ancilla register.

After step 3, the ancilla register encodes the t-bit binary approximation of phi with high probability.

from qiskit import QuantumCircuit
import numpy as np

def qpe_circuit(unitary_circuit, n_counting_qubits, n_state_qubits):
    """
    Build a QPE circuit for a given unitary.
    unitary_circuit: QuantumCircuit acting on n_state_qubits
    """
    total_qubits = n_counting_qubits + n_state_qubits
    qc = QuantumCircuit(total_qubits, n_counting_qubits)

    # Step 1: Hadamard on counting qubits
    qc.h(range(n_counting_qubits))

    # Step 2: Controlled-U^(2^j) for each counting qubit
    for j in range(n_counting_qubits):
        # Apply U 2^j times, controlled on counting qubit j
        controlled_u = unitary_circuit.control(1)
        repetitions = 2 ** j
        for _ in range(repetitions):
            qc.compose(
                controlled_u,
                qubits=[j] + list(range(n_counting_qubits, total_qubits)),
                inplace=True
            )

    # Step 3: Inverse QFT on counting register
    iqft = qft_circuit(n_counting_qubits, inverse=True)
    qc.compose(iqft, qubits=range(n_counting_qubits), inplace=True)

    qc.measure(range(n_counting_qubits), range(n_counting_qubits))
    return qc

QPE Worked Example: Finding the Phase of the T Gate

The T gate applies a phase of exp(ipi/4) to the |1> state. Since T|1> = exp(ipi/4)|1>, the state |1> is an eigenvector of T with eigenvalue exp(2piiphi) where phi = 1/8 (because pi/4 = 2pi * 1/8).

We use 4 counting qubits, which gives us 2^4 = 16 possible outcomes and a precision of 1/16:

from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator
import numpy as np

# QPE to find the phase of the T gate
# T|1> = exp(i*pi/4)|1>, so phi = 1/8

counting_qubits = 4
state_qubits = 1
total = counting_qubits + state_qubits

qpe = QuantumCircuit(total, counting_qubits)

# Prepare the eigenstate |1> on the state qubit
qpe.x(counting_qubits)

# Hadamard on all counting qubits
qpe.h(range(counting_qubits))

# Controlled-T^(2^j): the T gate has phase pi/4,
# so T^(2^j) has phase 2^j * pi/4
for j in range(counting_qubits):
    angle = (2**j) * np.pi / 4
    qpe.cp(angle, j, counting_qubits)

# Apply inverse QFT to the counting register
iqft = qft_circuit(counting_qubits, inverse=True)
qpe.compose(iqft, qubits=range(counting_qubits), inplace=True)

# Measure counting register
qpe.measure(range(counting_qubits), range(counting_qubits))

# Simulate
sim = AerSimulator()
compiled = transpile(qpe, sim)
result = sim.run(compiled, shots=1024).result()
counts = result.get_counts()

print("QPE measurement results:")
for bitstring, count in sorted(counts.items(), key=lambda x: -x[1]):
    decimal_value = int(bitstring, 2)
    estimated_phase = decimal_value / 2**counting_qubits
    print(f"  {bitstring} (decimal {decimal_value}): "
          f"{count} counts, estimated phi = {decimal_value}/{2**counting_qubits}"
          f" = {estimated_phase:.4f}")

# Expected: bitstring "0010" (decimal 2 in little-endian = binary 0010)
# phi = 2/16 = 1/8, so eigenvalue = exp(2*pi*i/8) = exp(i*pi/4). Correct!

Since phi = 1/8 is exactly representable in 4 bits (binary 0.0010), QPE finds it with certainty. For phases that are not exact binary fractions, QPE returns the closest binary approximation with high probability, and adding more counting qubits increases the precision.

The QFT in Shor’s Algorithm

Shor’s algorithm factors an integer N by reducing factoring to period finding, then solving period finding with QPE. Here is how the reduction works:

Step 1: Choose a random base. Pick a random integer a with 1 < a < N. Compute gcd(a, N). If the gcd is greater than 1, you have already found a factor (this is unlikely but possible). Otherwise, a and N are coprime, and you proceed to period finding.

Step 2: Find the period. The function f(x) = a^x mod N is periodic. There exists a smallest positive integer r (the order of a modulo N) such that a^r = 1 (mod N). For example, with a=2 and N=15: 2^1=2, 2^2=4, 2^3=8, 2^4=16=1 (mod 15), so r=4.

Step 3: Extract factors. If r is even, compute gcd(a^(r/2) - 1, N) and gcd(a^(r/2) + 1, N). With high probability, at least one of these is a nontrivial factor of N. In our example: 2^2 - 1 = 3 and 2^2 + 1 = 5, and indeed 15 = 3 * 5.

Why QPE solves period finding. Define the unitary U_a that maps |y> to |ay mod N>. The eigenstates of U_a have eigenvalues exp(2pii*k/r) for k = 0, 1, …, r-1. Running QPE on U_a produces an estimate of k/r for a random k. The continued fractions algorithm then recovers r from this fraction.

Why the QFT is the key. When QPE prepares the counting register, it creates a superposition with a periodic structure (period related to r). The inverse QFT transforms this periodic superposition into a sharp peak at the corresponding frequency. Without the QFT, the periodic information would be spread across all amplitudes with no way to extract it through measurement.

Classically, detecting the period of a^x mod N requires evaluating f(x) for O(sqrt(N)) values at best (using birthday-bound arguments). QPE with the QFT finds the period using O(log^2(N)) quantum gates (with efficient modular exponentiation), an exponential speedup.

Hardware Implementation Challenges

Building a QFT circuit on real quantum hardware introduces practical challenges. Understanding these helps you make informed choices about circuit design.

Gate counts scale quadratically. For n qubits, the full QFT requires n Hadamards, n(n-1)/2 controlled phase rotations, and n/2 SWAPs:

import numpy as np

def qft_resource_estimate(n):
    """Estimate resources for n-qubit QFT."""
    hadamards = n
    controlled_phases = n * (n - 1) // 2
    swaps = n // 2
    # Each SWAP = 3 CNOTs, each controlled phase = ~2 CNOTs after transpilation
    cnots_from_phases = controlled_phases * 2
    cnots_from_swaps = swaps * 3
    total_cnots = cnots_from_phases + cnots_from_swaps
    return {
        "hadamards": hadamards,
        "controlled_phases": controlled_phases,
        "swaps": swaps,
        "total_cnots_estimated": total_cnots,
    }

for n in [4, 8, 12, 16]:
    r = qft_resource_estimate(n)
    # Estimate fidelity at 0.3% CNOT error rate
    fidelity = (1 - 0.003) ** r["total_cnots_estimated"]
    print(f"n={n:2d}: {r['controlled_phases']:3d} CPhase, "
          f"{r['swaps']:2d} SWAPs, ~{r['total_cnots_estimated']:4d} CNOTs, "
          f"estimated fidelity: {fidelity:.1%}")

At a typical CNOT error rate of 0.3%, a 12-qubit full QFT with roughly 138 CNOTs has an expected fidelity of about 66%. The approximate QFT with degree m=2 cuts the CNOT count dramatically:

def approx_qft_resource_estimate(n, m):
    """Estimate resources for approximate n-qubit QFT with degree m."""
    controlled_phases = sum(min(m, k) for k in range(n))
    swaps = n // 2
    cnots = controlled_phases * 2 + swaps * 3
    return controlled_phases, swaps, cnots

for n in [4, 8, 12, 16]:
    cp_full = n * (n - 1) // 2
    cp_approx, sw, cnots_approx = approx_qft_resource_estimate(n, m=2)
    fidelity = (1 - 0.003) ** cnots_approx
    print(f"n={n:2d}: approx CPhase={cp_approx:3d} (vs full {cp_full:3d}), "
          f"~{cnots_approx:3d} CNOTs, fidelity: {fidelity:.1%}")

Connectivity constraints. Most hardware has limited qubit connectivity (e.g., a linear chain or a heavy-hex lattice). The QFT requires interactions between distant qubits, which means the transpiler must insert SWAP gates for routing. On a linear topology, this can multiply the CNOT count by a factor of 2 to 3.

The approximate QFT is essential on near-term hardware. For circuits beyond about 6 qubits, the approximate QFT with m=2 or m=3 gives a much better fidelity-accuracy tradeoff than the full QFT, because the error from dropped small-angle rotations is far less than the error from noisy two-qubit gates.

Common Mistakes

Several misconceptions about the QFT trip up even experienced quantum programmers.

Confusing amplitudes with measurement outcomes

The QFT transforms quantum amplitudes, not classical data. After applying the QFT, the state holds all 2^n Fourier coefficients as amplitudes, but a single measurement returns only ONE outcome sampled from the probability distribution |amplitude|^2. You do not get the full Fourier transform from one shot. Extracting all amplitudes requires exponentially many measurements, which eliminates the quantum speedup if that is your goal. The QFT is powerful precisely because algorithms like QPE are designed so that a single (or few) measurement(s) give useful information.

Wrong sign convention

Different textbooks and frameworks use opposite sign conventions for the QFT. Some define the QFT with a positive exponent exp(+2piijk/N) and others use negative exp(-2piijk/N). Qiskit uses the positive convention, which matches numpy’s ifft (inverse FFT), not fft. If you compare your QFT output against the wrong numpy function, everything will look wrong even though your circuit is correct. Always check which convention your reference uses.

Forgetting the SWAP gates

The QFT circuit without the final SWAP gates produces the output in bit-reversed order. This is sometimes intentional: in QPE, many implementations skip the SWAPs and simply reorder the measurement results classically, since SWAPs add unnecessary noise. But if you check your QFT output against the formula and forget that the SWAPs are missing, the amplitudes will appear scrambled.

Expecting the QFT to replace the classical FFT

The QFT computes the Fourier transform of quantum amplitudes, not of classical data. Loading classical data into quantum amplitudes (state preparation) is itself a costly operation, often requiring O(2^n) gates, which wipes out the QFT’s speedup. The QFT delivers its advantage when the input state is already prepared efficiently by a quantum process (as in QPE or Shor’s algorithm), not when you try to Fourier-transform a classical dataset.

Key Takeaways

The QFT is a compact circuit with O(n^2) gates that applies the N-point Fourier transform in O(n^2) depth, exponentially fewer operations than the classical FFT. It cannot be used to read out the Fourier transform directly (measurement collapses the state), but it is indispensable as a subroutine in QPE, Shor’s algorithm, quantum signal processing, and quantum arithmetic.

The core ideas to remember:

The Hadamard gate is the single-qubit QFT. The full QFT generalizes it with controlled phase rotations.
The product formula QFT|j> = tensor product of (|0> + exp(2pii*j/2^k)|1>)/sqrt(2) reveals why the circuit has the structure it does.
The approximate QFT drops small-angle rotations, reducing gate count from O(n^2) to O(n*m) with bounded error. This is essential for hardware implementations.
Phase kickback is the mechanism that writes input-dependent phases onto the output qubits.
QPE uses the inverse QFT to convert phase information into measurable bit strings, enabling Shor’s algorithm and many other quantum algorithms.

Understanding the QFT circuit construction is a prerequisite for studying any algorithm that involves periodicity, phase estimation, or integer arithmetic.