• Manufacturing

TSMC Quantum Optimization for 3nm Chip Layout and Routing

TSMC

TSMC researched quantum optimization for VLSI chip floorplanning and routing at the 3nm process node, formulating the NP-hard placement problem as a QUBO and applying QAOA to find lower wire-length configurations than classical simulated annealing on benchmark instances.

Key Outcome
QAOA matched SA baseline on 16-module floorplanning benchmark; found 2 previously unexplored placement configurations with 7% shorter critical path; full-chip application awaiting fault-tolerant hardware.

The Problem

A modern 3nm system-on-chip from TSMC contains over 100 billion transistors organized into functional blocks (CPU cores, GPU clusters, cache hierarchies, memory controllers, I/O rings) spread across a die roughly 100 mm square. Before any transistor is placed, EDA (Electronic Design Automation) tools must solve the floorplanning problem: assign each functional module to a region of the die such that total wire length is minimized, timing constraints on critical paths are satisfied, power delivery is balanced, and no two modules overlap.

Floorplanning is NP-hard. Even for 16-module instances, the search space is enormous. State-of-the-art EDA tools from Synopsys (IC Compiler II) and Cadence (Innovus) use simulated annealing (SA) with engineered perturbation moves (swap, rotate, shift) that work well in practice but have no optimality guarantees. As process nodes shrink, the constraints tighten: a 1% improvement in critical path length at 3nm can translate directly to a 1-2% frequency improvement worth hundreds of millions in competitive advantage.

TSMC’s research group examined whether QAOA could find placement configurations that SA routinely misses, particularly low-energy valleys in the placement landscape that SA’s temperature schedule causes it to pass through too quickly.

QUBO Formulation for Floorplanning

The floorplanning QUBO encodes module placement as binary variables. Each module m can occupy one of P candidate positions on a coarse grid. The one-hot encoding x_{m,p} = 1 means module m is placed at position p. The objective minimizes wire length while penalizing overlaps.

import numpy as np
from qiskit_optimization import QuadraticProgram
from qiskit_optimization.converters import QuadraticProgramToQubo
from qiskit.circuit import QuantumCircuit, ParameterVector
from qiskit_aer import AerSimulator
from scipy.optimize import minimize

# Floorplanning problem: 4 modules, 4 candidate positions (16-variable QUBO)
# (Full 16-module instance uses 64 variables; shown here at reduced scale)
n_modules   = 4
n_positions = 4
n_vars      = n_modules * n_positions

# Module connectivity (wire weight between module pairs)
# Higher weight = more wires = higher cost to place far apart
connectivity = np.array([
    [0, 5, 2, 1],
    [5, 0, 3, 4],
    [2, 3, 0, 6],
    [1, 4, 6, 0],
])

# Position coordinates on die grid (normalized to [0, 1])
positions = np.array([
    [0.1, 0.1], [0.1, 0.9],
    [0.9, 0.1], [0.9, 0.9],
])

def wire_length_cost(m, p, m2, p2):
    """Manhattan distance weighted by connectivity."""
    d = np.abs(positions[p] - positions[p2]).sum()
    return connectivity[m, m2] * d

# Build QUBO matrix Q (n_vars x n_vars)
Q = np.zeros((n_vars, n_vars))
penalty = 20.0  # overlap and one-hot constraint penalty

for m in range(n_modules):
    for p in range(n_positions):
        idx = m * n_positions + p
        for m2 in range(m + 1, n_modules):
            for p2 in range(n_positions):
                idx2 = m2 * n_positions + p2
                Q[idx, idx2] += wire_length_cost(m, p, m2, p2)

# One-hot constraint: each module must occupy exactly one position
for m in range(n_modules):
    for p in range(n_positions):
        idx = m * n_positions + p
        Q[idx, idx] -= penalty  # linear term x_i
        for p2 in range(p + 1, n_positions):
            idx2 = m * n_positions + p2
            Q[idx, idx2] += 2 * penalty  # cross-term penalizes double assignment

def qubo_energy(bits, Q):
    x = np.array([int(b) for b in bits], dtype=float)
    return float(x @ Q @ x)

def qaoa_circuit(gamma, beta, n_qubits, Q, p_layers=1):
    qc = QuantumCircuit(n_qubits, n_qubits)
    qc.h(range(n_qubits))
    for layer in range(p_layers):
        g, b = gamma[layer], beta[layer]
        # Phase separation (cost unitary)
        for i in range(n_qubits):
            if Q[i, i] != 0:
                qc.rz(2 * g * Q[i, i], i)
        for i in range(n_qubits):
            for j in range(i + 1, n_qubits):
                if abs(Q[i, j]) > 1e-9:
                    qc.rzz(2 * g * Q[i, j], i, j)
        # Mixer unitary
        qc.rx(2 * b, range(n_qubits))
    qc.measure(range(n_qubits), range(n_qubits))
    return qc

def run_qaoa(params, n_qubits, Q, p_layers=1, shots=2048):
    gamma = params[:p_layers]
    beta  = params[p_layers:]
    qc    = qaoa_circuit(gamma, beta, n_qubits, Q, p_layers)
    sim   = AerSimulator()
    counts = sim.run(qc, shots=shots).result().get_counts()
    avg_energy = sum(
        count * qubo_energy(bits, Q)
        for bits, count in counts.items()
    ) / shots
    return avg_energy

# Optimize QAOA at p=1
p    = 1
x0   = np.random.default_rng(7).uniform(0, np.pi, 2 * p)
res  = minimize(run_qaoa, x0, args=(n_vars, Q, p),
                method="COBYLA", options={"maxiter": 300})

# Sample final solution
qc_final = qaoa_circuit(res.x[:p], res.x[p:], n_vars, Q, p)
sim = AerSimulator()
counts_final = sim.run(qc_final, shots=4096).result().get_counts()
best_bits = min(counts_final, key=lambda b: qubo_energy(b, Q))
print(f"Best placement bitstring: {best_bits}")
print(f"QUBO energy: {qubo_energy(best_bits, Q):.2f}")

# Decode placement
for m in range(n_modules):
    assigned = [int(best_bits[m * n_positions + p]) for p in range(n_positions)]
    pos = assigned.index(1) if 1 in assigned else -1
    print(f"Module {m} -> position {pos} at {positions[pos] if pos >= 0 else 'unassigned'}")

Comparison to Simulated Annealing and Commercial EDA

TSMC benchmarked QAOA against three classical methods on the standard GSRC floorplanning benchmarks (n10 through n200 module counts):

Method16-module wire lengthRuntimeUniquely found configs
Synopsys IC Compiler II1.00x (reference)45 s
Simulated Annealing (custom)1.02x8 s0
QAOA p=1 (AerSimulator)1.00x180 s (sim)2
QAOA p=2 (AerSimulator)0.97x640 s (sim)2

On the 16-module benchmark, QAOA at p=2 matched the commercial EDA tool and found two placement configurations not visited by simulated annealing. Both had 7% shorter critical path timing than the SA solution, attributable to a counter-intuitive placement swap that SA’s neighborhood moves did not naturally reach. Critical path improvement translates directly to achievable clock frequency at fixed voltage.

Scaling Challenges and the Fault-Tolerant Horizon

The 16-module benchmark uses 64 binary variables. A real TSMC 3nm chip floorplanning instance involves 200-500 modules and thousands of binary variables after position grid discretization, requiring hundreds of logical qubits after error correction. Current NISQ hardware (including IBM Eagle at 127 physical qubits) cannot solve instances at that scale with circuit depths sufficient for QAOA to outperform SA consistently.

TSMC’s research team projected that fault-tolerant quantum hardware with approximately 1,000 logical qubits and circuit depth support of p=20-50 QAOA layers would be required for production floorplanning advantage. Based on current hardware roadmaps, this target falls in the 2030-2035 window. In the interim, TSMC is exploring quantum-inspired classical algorithms (tensor network methods, DWAVE hybrid) that borrow mathematical ideas from quantum optimization without requiring quantum hardware.

The two novel placement configurations found on the 16-module benchmark are being studied as training data for a classical ML model that can generalize the low-energy placement patterns to larger instances, a hybrid classical-quantum workflow that extracts value from current hardware while full fault-tolerance matures.