- Machine Learning
Waymo Quantum-Enhanced Motion Planning for Autonomous Vehicles
Waymo
Waymo researched quantum reinforcement learning for autonomous vehicle motion planning in complex urban environments, exploring quantum approximate optimization and quantum-enhanced RL to handle high-dimensional multi-agent trajectory problems.
- Key Outcome
- Quantum RL matched a classical deep Q-network on a simplified 8-agent intersection simulation and identified a quantum-advantage pathway through quantum walk-based exploration.
The Problem
Motion planning for autonomous vehicles is one of the hardest real-time optimization problems in applied ML. At a busy urban intersection, a self-driving car must simultaneously predict the intentions of pedestrians, cyclists, and other vehicles, generate a safe trajectory for itself, and re-plan at 10 Hz as the scene evolves. With eight or more interacting agents, the joint state space grows exponentially, and classical deep RL struggles to explore it efficiently.
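To make the exponential growth concrete, a back-of-the-envelope count shows how quickly the joint state space explodes. The per-agent discretization below is purely illustrative, not a figure from Waymo's work:

```python
# Hypothetical discretization: 20 positions x 10 speeds x 8 headings per agent
per_agent_states = 20 * 10 * 8  # 1,600 discrete states per agent

for n_agents in (2, 4, 8):
    joint_states = per_agent_states ** n_agents
    print(f"{n_agents} agents: {joint_states:.2e} joint states")
```

Even with this coarse discretization, eight agents already yield more joint states than any tabular or naively explored method could cover, which is the regime the quantum RL work targets.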
Waymo’s research team asked whether quantum reinforcement learning could explore this high-dimensional space more effectively than classical deep Q-networks, and whether quantum approximate optimization could accelerate trajectory selection at inference time.
Quantum RL Approach
The team encoded the multi-agent scene state into a parameterized quantum circuit using angle embedding. Each agent’s position, velocity, and heading were mapped to rotation angles on a qubit register. A variational quantum circuit, trained with the parameter-shift rule, acted as the Q-function approximator, replacing the classical neural network in a standard DQN setup.
import pennylane as qml
from pennylane import numpy as np  # autograd-aware NumPy, needed for gradients

n_qubits = 8  # one qubit per agent in the simplified scenario
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def q_network(state, weights):
    # Angle embedding: encode agent states as rotation angles
    qml.AngleEmbedding(state, wires=range(n_qubits), rotation="Y")
    # Variational ansatz: two layers of CNOT-entangled RY gates
    for layer in range(2):
        for i in range(n_qubits):
            qml.RY(weights[layer, i], wires=i)
        for i in range(n_qubits - 1):
            qml.CNOT(wires=[i, i + 1])
    # Measure each qubit to produce Q-values for 8 action candidates
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

# Training loop (simplified). In the full setup, state_batch and target_q
# come from a replay buffer and a target network; dummy data shown here.
state_batch = np.random.uniform(0, np.pi, (32, n_qubits), requires_grad=False)
target_q = np.random.uniform(-1, 1, (32, n_qubits), requires_grad=False)

weights = np.random.uniform(-np.pi, np.pi, (2, n_qubits), requires_grad=True)
opt = qml.AdamOptimizer(stepsize=0.01)

def loss_fn(weights, state_batch, target_q):
    q_vals = qml.math.stack(
        [qml.math.stack(q_network(s, weights)) for s in state_batch]
    )
    return np.mean((q_vals - target_q) ** 2)

for step in range(500):
    weights, cost = opt.step_and_cost(
        lambda w: loss_fn(w, state_batch, target_q), weights
    )
State Space Encoding and Simulation Results
Encoding the full Waymo scene graph into a fixed-size qubit register required careful dimensionality reduction. The team projected each agent's state onto a single rotation angle using a classically learned embedding before feeding it into the quantum circuit, keeping the circuit width tractable for near-term hardware.
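The classical pre-embedding is not specified in detail. One minimal way to realize it, assuming a 5-D agent state and a single learned projection vector (both assumptions for illustration; in the described setup the projection would be trained, not random), is a linear map squashed into a valid rotation angle:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical learned projection for a 5-D agent state (x, y, vx, vy, heading);
# random here purely for illustration.
W = rng.normal(size=(5,))

def embed_agent(agent_state):
    # Project the 5-D agent state to one rotation angle in [0, pi)
    z = np.dot(W, agent_state)
    return np.pi * (np.tanh(z) + 1) / 2

scene = rng.normal(size=(8, 5))                      # 8 agents in the scene
angles = np.array([embed_agent(a) for a in scene])   # one angle per qubit
```

The resulting 8-element angle vector is exactly the shape `AngleEmbedding` expects for an 8-qubit register, which is how the scene stays within the width budget of near-term devices.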
Simulations ran on PennyLane’s default.qubit simulator and, for selected experiments, on Google Sycamore via Waymo’s internal access to Alphabet’s quantum hardware. On a simplified 8-agent intersection scenario, the quantum DQN reached the same cumulative reward as a classical DQN with a comparable parameter count, while showing measurably broader exploration during early training, consistent with the theoretical advantage of quantum walk-based state space traversal.
The Quantum Advantage Pathway
The team’s clearest finding was not a present-day speedup but a well-defined pathway to quantum advantage. Quantum walk-based exploration provides a quadratic speedup in hitting time over classical random exploration on structured graphs, and the agent interaction graph at an intersection has exactly the kind of low-diameter structure that benefits from quantum walks. As hardware qubit counts and gate fidelities improve, the team projected that quantum-enhanced exploration could reduce the sample complexity of policy learning by a factor proportional to the square root of the state-space diameter.
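The quadratic hitting-time gap can be illustrated numerically. The sketch below Monte-Carlo-estimates the classical hitting time of a simple random walk on a path graph, which scales as O(N^2), to contrast with the O(N) scaling cited for quantum walks; the graph and parameters are illustrative, not Waymo's interaction graphs:

```python
import numpy as np

def classical_hitting_time(n_nodes, n_trials=1000, seed=1):
    # Monte Carlo estimate of the expected hitting time of a simple random
    # walk from node 0 to node n_nodes-1 on a path graph, reflecting at the
    # left end. The analytic expectation is (n_nodes - 1)**2.
    rng = np.random.default_rng(seed)
    times = []
    for _ in range(n_trials):
        pos, steps = 0, 0
        while pos != n_nodes - 1:
            pos += rng.choice((-1, 1)) if pos > 0 else 1
            steps += 1
        times.append(steps)
    return float(np.mean(times))

for n in (4, 8, 16):
    t = classical_hitting_time(n)
    print(f"N={n}: classical ~{t:.0f} steps vs O(N)={n} for a quantum walk")
```

Doubling the path length roughly quadruples the classical hitting time, which is the gap a quantum walk-based explorer would close to a linear factor on this graph family.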
The near-term conclusion is that quantum RL is not yet faster than GPU-accelerated classical RL at production scale, but the algorithmic foundation is solid. Waymo continues this research line as Sycamore successor hardware matures.