• Machine Learning

Waymo Quantum-Enhanced Motion Planning for Autonomous Vehicles

Waymo

Waymo researched quantum reinforcement learning for autonomous vehicle motion planning in complex urban environments, exploring quantum approximate optimization and quantum-enhanced RL to handle high-dimensional multi-agent trajectory problems.

Key Outcome
Quantum RL matched a classical deep Q-network on a simplified 8-agent intersection simulation; the team identified a pathway to quantum advantage through quantum walk-based exploration.

The Problem

Motion planning for autonomous vehicles is one of the hardest real-time optimization problems in applied ML. At a busy urban intersection, a self-driving car must simultaneously predict the intentions of pedestrians, cyclists, and other vehicles, generate a safe trajectory for itself, and re-plan at 10 Hz as the scene evolves. With eight or more interacting agents, the joint state space grows exponentially, and classical deep RL struggles to explore it efficiently.
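To make the scaling concrete, a toy calculation (the discretization numbers are illustrative, not Waymo's) shows how the joint state space explodes with agent count:

```python
# Illustrative arithmetic only: if each agent's state is discretized into s
# values, the joint state space of n interacting agents has s**n configurations.
def joint_space_size(states_per_agent: int, n_agents: int) -> int:
    return states_per_agent ** n_agents

# Even a coarse 100-state discretization per agent explodes with 8 agents:
print(joint_space_size(100, 2))  # 10000
print(joint_space_size(100, 8))  # 10000000000000000
```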

Waymo’s research team asked whether quantum reinforcement learning could explore this high-dimensional space more effectively than classical deep Q-networks, and whether quantum approximate optimization could accelerate trajectory selection at inference time.

Quantum RL Approach

The team encoded the multi-agent scene state into a parameterized quantum circuit using angle embedding. Each agent’s position, velocity, and heading were mapped to rotation angles on a qubit register. A variational quantum circuit, trained with the parameter-shift rule, acted as the Q-function approximator, replacing the classical neural network in a standard DQN setup.

import pennylane as qml
from pennylane import numpy as np  # autograd-aware NumPy, required for gradient training

n_qubits = 8  # one qubit per agent in the simplified scenario
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def q_network(state, weights):
    # Angle embedding: encode agent states as rotation angles
    qml.AngleEmbedding(state, wires=range(n_qubits), rotation="Y")

    # Variational ansatz: two layers of Ry rotations, each followed by a CNOT chain
    for layer in range(2):
        for i in range(n_qubits):
            qml.RY(weights[layer, i], wires=i)
        for i in range(n_qubits - 1):
            qml.CNOT(wires=[i, i + 1])

    # Measure each qubit to produce Q-values for 8 action candidates
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

# Training loop (simplified)
weights = np.random.uniform(-np.pi, np.pi, (2, n_qubits), requires_grad=True)
opt = qml.AdamOptimizer(stepsize=0.01)

def loss_fn(weights, state_batch, target_q):
    # qml.math.stack keeps the batch differentiable for the parameter-shift rule
    q_vals = qml.math.stack(
        [qml.math.stack(q_network(s, weights)) for s in state_batch]
    )
    return np.mean((q_vals - target_q) ** 2)

# Dummy batch for illustration; in practice these come from the replay buffer
state_batch = np.random.uniform(0, np.pi, (32, n_qubits), requires_grad=False)
target_q = np.random.uniform(-1, 1, (32, n_qubits), requires_grad=False)

for step in range(500):
    weights, cost = opt.step_and_cost(
        lambda w: loss_fn(w, state_batch, target_q), weights
    )
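How the eight expectation values map to driving actions is not described in the source. One plausible sketch is an epsilon-greedy policy that treats each qubit's PauliZ expectation as the Q-value of one trajectory candidate; the function below is a hypothetical illustration, not Waymo's decoding scheme:

```python
import numpy as np

# Hypothetical epsilon-greedy selection over the circuit's 8 expectation values.
def select_action(q_values: np.ndarray, epsilon: float,
                  rng: np.random.Generator) -> int:
    """Pick a random candidate with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
q_values = np.array([-0.2, 0.7, 0.1, -0.5, 0.3, 0.0, 0.65, -0.1])
greedy = select_action(q_values, epsilon=0.0, rng=rng)
assert greedy == 1  # with epsilon=0, the highest expectation value wins
```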

State Space Encoding and Simulation Results

Encoding the full Waymo scene graph into a fixed-size qubit register required careful dimensionality reduction. The team projected each agent's state onto a single rotation angle, one qubit per agent, using a classical learned embedding before feeding it into the quantum circuit, keeping the circuit width tractable for near-term hardware.
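A minimal sketch of such a per-agent embedding, assuming a hypothetical 5-dimensional agent state (position, velocity, heading) and a simple linear-plus-tanh projection; the actual learned embedding is not described in the source:

```python
import numpy as np

# Hypothetical classical embedding: project each agent's (x, y, vx, vy, heading)
# state vector to a single rotation angle in (-pi, pi) for one qubit.
class AgentToAngle:
    def __init__(self, state_dim: int = 5, seed: int = 0):
        rng = np.random.default_rng(seed)
        # In practice these weights would be learned jointly with the circuit.
        self.w = rng.normal(scale=0.1, size=state_dim)
        self.b = 0.0

    def __call__(self, agent_state: np.ndarray) -> float:
        # tanh squashes to (-1, 1); scaling by pi gives a valid RY angle
        return float(np.pi * np.tanh(self.w @ agent_state + self.b))

embed = AgentToAngle()
rng = np.random.default_rng(1)
angles = [embed(rng.normal(size=5)) for _ in range(8)]  # one angle per agent
assert all(-np.pi < a < np.pi for a in angles)
```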

Simulations ran on PennyLane’s default.qubit simulator and, for selected experiments, on Google Sycamore via Waymo’s internal access to Alphabet’s quantum hardware. On a simplified 8-agent intersection scenario, the quantum DQN reached the same cumulative reward as a classical DQN with a comparable parameter count, while showing measurably broader exploration during early training, consistent with the theoretical advantage of quantum walk-based state space traversal.

The Quantum Advantage Pathway

The team’s clearest finding was not a present-day speedup but a well-defined pathway to quantum advantage. Quantum walk-based exploration provides a quadratic speedup in hitting time over classical random exploration on structured graphs, and the agent interaction graph at an intersection has exactly the kind of low-diameter structure that benefits from quantum walks. As hardware qubit counts and gate fidelities improve, the team projected that quantum-enhanced exploration could reduce the sample complexity of policy learning by a factor proportional to the square root of the state-space diameter.
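A back-of-envelope illustration of that projection, assuming Θ(D²) classical versus Θ(D) quantum-walk hitting time on a graph of diameter D; the scaling is illustrative only, not a result from the study:

```python
# If classical exploration hitting time scales as D**2 in state-space
# diameter D while quantum-walk hitting time scales as D, the projected
# speedup factor grows linearly with D (i.e., the square root of the
# classical cost), matching the quadratic-speedup claim above.
def projected_speedup(diameter: int) -> float:
    classical = diameter ** 2  # Theta(D^2), e.g. a random walk on a path
    quantum = diameter         # Theta(D) quantum-walk hitting time
    return classical / quantum

for d in (10, 100, 1000):
    print(d, projected_speedup(d))  # speedup factor equals the diameter
```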

The near-term conclusion is that quantum RL is not yet faster than GPU-accelerated classical RL at production scale, but the algorithmic foundation is solid. Waymo continues this research line as Sycamore successor hardware matures.