Continuous-Variable Quantum Neural Networks with Strawberry Fields
Build a continuous-variable quantum neural network using Strawberry Fields with differentiable backends, implementing the Killoran et al. CV-QNN architecture for function approximation.
Continuous-variable quantum neural networks (CV-QNNs) represent a fundamentally different approach to quantum machine learning compared to qubit-based variational circuits. Instead of parameterized rotations on discrete states, CV-QNNs use optical gates on bosonic modes, producing a continuous optimization landscape. In this tutorial, you will implement the CV-QNN architecture proposed by Killoran et al. (2019) using Strawberry Fields with its TensorFlow backend, and train it to approximate a simple mathematical function.
Prerequisites
You will need Strawberry Fields with TensorFlow support:
pip install strawberryfields[tf] tensorflow numpy matplotlib
The TensorFlow backend ("tf") makes all quantum gate parameters differentiable, enabling gradient-based optimization of quantum circuits.
The CV-QNN Architecture
The CV-QNN architecture introduced by Killoran, Bromley, Arrazola, Schuld, Quesada, and Lloyd consists of repeated layers, each containing the following operations applied to the modes:
- Interferometer: A universal linear optical network (beamsplitters and phase shifters) that mixes all modes. This is parameterized by beamsplitter angles theta and phase angles phi.
- Squeezing: An individual Sgate(r, phi) on each mode, adding non-trivial correlations in quadrature space.
- Interferometer: A second interferometer with independent parameters.
- Displacement: An individual Dgate(alpha, phi) on each mode, shifting the state in phase space.
- Non-Gaussian gate: A Kgate(kappa) (Kerr interaction) on each mode. This is the crucial ingredient that makes the network universal; without it, all operations would be Gaussian and classically simulable.
For N modes, each layer has O(N^2) parameters from the two interferometers (each decomposing into N(N-1)/2 beamsplitters with two angles apiece), plus 2N squeezing, 2N displacement, and N Kerr parameters.
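As a sanity check, the per-layer parameter count can be tallied in plain Python. This is a sketch assuming a Clements-style mesh with N(N-1)/2 beamsplitters, two angles each, per interferometer (final per-mode phases are omitted, matching the two-mode circuit built below):

```python
def cv_layer_param_count(num_modes: int) -> int:
    """Count trainable parameters in one CV-QNN layer for num_modes modes."""
    # Each interferometer: N(N-1)/2 beamsplitters, two angles each
    interferometer = (num_modes * (num_modes - 1) // 2) * 2
    squeezing = 2 * num_modes      # Sgate(r, phi) per mode
    displacement = 2 * num_modes   # Dgate(alpha, phi) per mode
    kerr = num_modes               # Kgate(kappa) per mode
    return 2 * interferometer + squeezing + displacement + kerr

# For the 2-mode circuit used in this tutorial: 4 + 4 + 4 + 2 = 14 per layer
print(cv_layer_param_count(2))
```

For the two-mode, four-layer network trained below, this gives 56 trainable parameters in total.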
Why CV-QNNs Differ from Qubit Variational Circuits
There are several important distinctions between CV-QNNs and their qubit-based counterparts such as VQE or QAOA:
Continuous optimization landscape. Qubit circuits produce discrete measurement outcomes (bitstrings), and the cost function is computed from expectation values. CV circuits operate on continuous quadrature variables, and the output can be a continuous function of the input. This means the optimization landscape is inherently smooth.
Non-Gaussian gates provide universality. Gaussian operations (interferometers, squeezing, displacement) alone are efficiently classically simulable. Adding a single non-Gaussian element like the Kerr gate makes the model computationally universal. This is analogous to how adding a non-Clifford gate (like T) to Clifford circuits enables universal qubit computation.
Milder barren plateaus at small depth. Research has shown that CV-QNNs with moderate circuit depth and a small number of modes exhibit trainable gradients, largely avoiding the barren plateau problem that plagues deep random qubit circuits. This advantage diminishes as the number of modes and depth grow, but for practical near-term circuits the optimization landscape remains navigable.
Building a Single CV-QNN Layer
Let us start by constructing a single layer of the CV-QNN. We will use two modes for this example.
import strawberryfields as sf
from strawberryfields.ops import Sgate, Dgate, Kgate, BSgate
import tensorflow as tf
import numpy as np
def cv_qnn_layer(q, params, num_modes):
    """
    Apply one layer of the CV-QNN architecture to the register q.

    params is a dictionary containing:
    - 'bs1_theta', 'bs1_phi': beamsplitter params for first interferometer
    - 'sq_r', 'sq_phi': squeezing params per mode
    - 'bs2_theta', 'bs2_phi': beamsplitter params for second interferometer
    - 'disp_r', 'disp_phi': displacement params per mode
    - 'kerr': Kerr gate params per mode
    """
    # First interferometer (for 2 modes, a single beamsplitter suffices)
    BSgate(params['bs1_theta'], params['bs1_phi']) | (q[0], q[1])
    # Squeezing on each mode
    for i in range(num_modes):
        Sgate(params['sq_r'][i], params['sq_phi'][i]) | q[i]
    # Second interferometer
    BSgate(params['bs2_theta'], params['bs2_phi']) | (q[0], q[1])
    # Displacement on each mode
    for i in range(num_modes):
        Dgate(params['disp_r'][i], params['disp_phi'][i]) | q[i]
    # Kerr non-linearity on each mode
    for i in range(num_modes):
        Kgate(params['kerr'][i]) | q[i]
For larger numbers of modes, the interferometer would be decomposed into a mesh of beamsplitters and phase shifters using the Clements or Reck decomposition. Strawberry Fields provides sf.ops.Interferometer for this purpose, which accepts a unitary matrix and automatically decomposes it.
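As an illustration, a Haar-random unitary for a larger interferometer can be generated with plain NumPy; the QR-with-phase-fix construction below is a standard recipe, and the commented usage assumes a Strawberry Fields program context q is in scope:

```python
import numpy as np

def random_unitary(n: int, seed: int = 0) -> np.ndarray:
    """Haar-random n x n unitary via QR decomposition of a complex Gaussian matrix."""
    rng = np.random.default_rng(seed)
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    # Fix the phases of the QR factors so the distribution is Haar-uniform
    return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

U = random_unitary(4)
assert np.allclose(U @ U.conj().T, np.eye(4))  # verify unitarity
# Inside a program this unitary would be applied with:
#   Interferometer(U) | q
# and Strawberry Fields decomposes it into beamsplitters and phase shifters.
```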
Implementing a Function Approximation Task
We will train a CV-QNN to approximate the sine function. The input is encoded as a displacement in mode 0, and the output is read from a homodyne measurement of the same mode.
import strawberryfields as sf
from strawberryfields.ops import Sgate, Dgate, Kgate, BSgate
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Hyperparameters
num_modes = 2
num_layers = 4
cutoff_dim = 10 # Fock space truncation for the tf backend
# Initialize trainable parameters for all layers
np.random.seed(42)
params = []
for layer in range(num_layers):
    layer_params = {
        'bs1_theta': tf.Variable(np.random.normal(0, 0.1), dtype=tf.float64),
        'bs1_phi': tf.Variable(np.random.normal(0, 0.1), dtype=tf.float64),
        'sq_r': tf.Variable(np.random.normal(0, 0.1, size=num_modes), dtype=tf.float64),
        'sq_phi': tf.Variable(np.random.normal(0, 0.1, size=num_modes), dtype=tf.float64),
        'bs2_theta': tf.Variable(np.random.normal(0, 0.1), dtype=tf.float64),
        'bs2_phi': tf.Variable(np.random.normal(0, 0.1), dtype=tf.float64),
        'disp_r': tf.Variable(np.random.normal(0, 0.1, size=num_modes), dtype=tf.float64),
        'disp_phi': tf.Variable(np.random.normal(0, 0.1, size=num_modes), dtype=tf.float64),
        'kerr': tf.Variable(np.random.normal(0, 0.1, size=num_modes), dtype=tf.float64),
    }
    params.append(layer_params)
# Collect all trainable variables
all_variables = []
for lp in params:
    for v in lp.values():
        all_variables.append(v)
# Create the engine once; it is reset before each run
eng = sf.Engine("tf", backend_options={"cutoff_dim": cutoff_dim})

def circuit(x_input, params):
    """
    Run the CV-QNN circuit with input x encoded as displacement.
    Returns the expectation value of the x quadrature on mode 0.
    """
    prog = sf.Program(num_modes)
    with prog.context as q:
        # Encode input as displacement on mode 0
        Dgate(x_input, 0.0) | q[0]
        # Apply CV-QNN layers
        for layer in range(num_layers):
            p = params[layer]
            # First interferometer
            BSgate(p['bs1_theta'], p['bs1_phi']) | (q[0], q[1])
            # Squeezing
            for i in range(num_modes):
                Sgate(p['sq_r'][i], p['sq_phi'][i]) | q[i]
            # Second interferometer
            BSgate(p['bs2_theta'], p['bs2_phi']) | (q[0], q[1])
            # Displacement
            for i in range(num_modes):
                Dgate(p['disp_r'][i], p['disp_phi'][i]) | q[i]
            # Kerr nonlinearity
            for i in range(num_modes):
                Kgate(p['kerr'][i]) | q[i]
    # Run on the TensorFlow backend
    if eng.run_progs:
        eng.reset()
    result = eng.run(prog)
    state = result.state
    # quad_expectation returns (mean, variance) of the chosen quadrature
    mean_x, _ = state.quad_expectation(0)
    return mean_x
Training the Network
The training loop uses TensorFlow’s automatic differentiation to compute gradients of the loss function with respect to all circuit parameters.
# Training data: learn sin(x) over [-1, 1]
x_train = np.linspace(-1.0, 1.0, 20)
y_train = np.sin(x_train)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
# Training loop
losses = []
for epoch in range(200):
    with tf.GradientTape() as tape:
        loss = 0.0
        for x_val, y_val in zip(x_train, y_train):
            x_tf = tf.constant(x_val, dtype=tf.float64)
            y_pred = circuit(x_tf, params)
            loss += (y_pred - y_val) ** 2
        loss = loss / len(x_train)  # Mean squared error
    gradients = tape.gradient(loss, all_variables)
    optimizer.apply_gradients(zip(gradients, all_variables))
    if epoch % 20 == 0:
        print(f"Epoch {epoch}: MSE = {loss.numpy():.6f}")
    losses.append(loss.numpy())
# Plot training loss
plt.figure(figsize=(8, 4))
plt.plot(losses)
plt.xlabel("Epoch")
plt.ylabel("MSE Loss")
plt.title("CV-QNN Training Progress")
plt.yscale("log")
plt.grid(True)
plt.savefig("cvqnn_training_loss.png", dpi=150)
plt.show()
Evaluating the Trained Network
After training, we evaluate the circuit on a finer grid and compare the output to the target function.
# Evaluate on a finer grid
x_test = np.linspace(-1.5, 1.5, 100)
y_test = np.sin(x_test)
y_pred = []
for x_val in x_test:
    x_tf = tf.constant(x_val, dtype=tf.float64)
    pred = circuit(x_tf, params)
    y_pred.append(pred.numpy())
y_pred = np.array(y_pred)
plt.figure(figsize=(8, 5))
plt.plot(x_test, y_test, "b-", label="sin(x)", linewidth=2)
plt.plot(x_test, y_pred, "r--", label="CV-QNN output", linewidth=2)
plt.scatter(x_train, y_train, c="black", s=30, zorder=5, label="Training points")
plt.xlabel("x")
plt.ylabel("y")
plt.title("CV-QNN Function Approximation")
plt.legend()
plt.grid(True)
plt.savefig("cvqnn_approximation.png", dpi=150)
plt.show()
With sufficient training, the CV-QNN can closely approximate the sine function within the training domain. Performance outside the training range (extrapolation) will degrade, similar to classical neural networks.
The Role of the Kerr Gate
The Kerr gate Kgate(kappa) implements the unitary K(kappa) = exp(i kappa n^2), generated by the self-interaction Hamiltonian proportional to n^2, where n = a†a is the photon number operator. This introduces a non-linear phase shift that depends on the photon number of the mode. Without this gate, the entire circuit would consist of Gaussian operations, which can be efficiently simulated classically and have limited expressive power.
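The action of the Kerr gate is easy to inspect in the truncated Fock basis that the tf backend uses. A NumPy sketch, assuming the convention K(kappa) = exp(i kappa n^2) with an illustrative kappa:

```python
import numpy as np

cutoff = 10   # Fock truncation, matching the tutorial's cutoff_dim
kappa = 0.3   # illustrative Kerr strength

n = np.arange(cutoff)
# In the Fock basis the Kerr gate is diagonal: K|n> = exp(i * kappa * n^2) |n>
K = np.diag(np.exp(1j * kappa * n**2))

# The acquired phase grows quadratically with photon number; a Gaussian
# (linear-optical) element can only produce phases linear in n.
assert np.allclose(K @ K.conj().T, np.eye(cutoff))  # K is unitary
```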
To see this concretely, try removing the Kerr gates from the circuit and retraining. You will find that the network can only learn affine functions (linear maps plus constant offsets), because Gaussian operations preserve the Gaussian character of the state, and the expectation value of the x quadrature is then an affine function of the input displacement.
# Experiment: train without Kerr gates
# Set all Kerr parameters to zero and freeze them
for lp in params:
    lp['kerr'].assign(tf.zeros_like(lp['kerr']))
# Retrain and observe that only linear functions are learnable
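The affine behavior can also be seen directly in the Gaussian phase-space picture, where any Gaussian circuit maps the vector of quadrature means r as r -> S r + d for a symplectic matrix S and displacement d. A NumPy sketch with illustrative, hand-picked gate parameters:

```python
import numpy as np

# Phase-space means of a 2-mode Gaussian state, ordered r = (x0, p0, x1, p1).
# Every Gaussian circuit maps the means affinely: r -> S @ r + d.

def squeezer(r0, r1):
    # Squeezing symplectic (phase 0): x -> e^{-r} x, p -> e^{r} p per mode
    return np.diag([np.exp(-r0), np.exp(r0), np.exp(-r1), np.exp(r1)])

def beamsplitter(theta):
    # Beamsplitter symplectic acting identically on x and p quadratures
    c, s = np.cos(theta), np.sin(theta)
    return np.kron(np.array([[c, s], [-s, c]]), np.eye(2))

def out_mean_x0(x_in):
    r = np.array([x_in, 0.0, 0.0, 0.0])    # input displacement on mode 0
    S = beamsplitter(0.7) @ squeezer(0.3, -0.2) @ beamsplitter(0.4)
    d = np.array([0.1, 0.0, -0.05, 0.0])   # a fixed displacement layer
    return (S @ r + d)[0]                  # <x> of mode 0 at the output

# Affine check: the output at the midpoint lies exactly on the chord
f0, f1, fm = out_mean_x0(0.0), out_mean_x0(1.0), out_mean_x0(0.5)
assert np.isclose(fm, (f0 + f1) / 2)
```

However deep the Gaussian circuit, the composed map stays of the form m*x + c, which is why the Kerr gate is indispensable.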
Alternative Non-Gaussian Gates
While the Kerr gate is the standard choice in the Killoran et al. architecture, Strawberry Fields also provides the cubic phase gate Vgate(gamma), which implements V(gamma) = exp(i gamma x^3 / (3 hbar)), where x is the position quadrature operator. The cubic phase gate is an alternative source of non-Gaussianity. In principle, any single non-Gaussian gate combined with Gaussian operations is sufficient for universality, but the choice affects trainability and the types of functions the network can represent efficiently.
Scaling Considerations
For this tutorial, we used 2 modes and 4 layers with a Fock space cutoff of 10. In practice, several factors constrain the scalability of CV-QNNs:
Fock space truncation. The TensorFlow backend represents each mode as a vector in truncated Fock space. With cutoff dimension D and N modes, the full state vector has D^N elements. This exponential scaling limits classical simulation to small systems, typically 4 to 6 modes with moderate cutoff.
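To make the D^N scaling concrete, here is a pure-Python estimate; actual backend memory use is higher still, since intermediate tensors are also stored during the simulation:

```python
def state_vector_size(cutoff: int, modes: int) -> int:
    """Number of complex amplitudes in the truncated Fock-space state vector."""
    return cutoff ** modes

# Memory for a complex128 state vector (16 bytes per amplitude), in MB
for modes in (2, 4, 6, 8):
    amps = state_vector_size(10, modes)
    print(f"{modes} modes, cutoff 10: {amps:>12,} amplitudes "
          f"({amps * 16 / 1e6:,.1f} MB)")
```

At cutoff 10, going from 6 to 8 modes already jumps from millions to hundreds of millions of amplitudes, which is why the tutorial stays at 2 modes.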
Gate decomposition. The interferometer for N modes requires N(N-1)/2 beamsplitters (using the Clements decomposition). The total parameter count per layer scales as O(N^2).
Hardware mapping. On Xanadu’s photonic hardware, the available gate set and connectivity differ from the idealized model. Time-domain multiplexing on processors like Borealis imposes specific constraints on which gates can be applied in sequence. PennyLane, Xanadu’s other quantum software library, provides more direct hardware integration for variational circuits.
Summary
In this tutorial, you built a continuous-variable quantum neural network using Strawberry Fields, following the architecture of Killoran et al. The key takeaways are that CV-QNNs operate on a fundamentally different computational substrate than qubit variational circuits, that the Kerr gate provides the essential non-Gaussian element for universality, and that the TensorFlow backend enables standard gradient-based optimization of all circuit parameters. For production quantum machine learning workflows, consider using PennyLane, which provides a higher-level interface for variational CV circuits with support for automatic differentiation across multiple backends.