Continuous-Variable Quantum Neural Networks with Strawberry Fields
Build a continuous-variable quantum neural network using Strawberry Fields with differentiable backends, implementing the Killoran et al. CV-QNN architecture for function approximation.
Continuous-variable quantum neural networks (CV-QNNs) represent a fundamentally different approach to quantum machine learning compared to qubit-based variational circuits. Instead of parameterized rotations on discrete states, CV-QNNs use optical gates on bosonic modes, producing a continuous optimization landscape. In this tutorial, you will implement the CV-QNN architecture proposed by Killoran et al. (2019) using Strawberry Fields with its TensorFlow backend, and train it to approximate a simple mathematical function.
Prerequisites
You will need Strawberry Fields with TensorFlow support:
pip install strawberryfields[tf] tensorflow numpy matplotlib
The TensorFlow backend ("tf") makes all quantum gate parameters differentiable, enabling gradient-based optimization of quantum circuits.
The CV-QNN Architecture
The CV-QNN architecture introduced by Killoran, Bromley, Arrazola, Schuld, Quesada, and Lloyd consists of repeated layers, each containing the following operations applied to the modes:
- Interferometer: A universal linear optical network (beamsplitters and phase shifters) that mixes all modes. This is parameterized by beamsplitter angles theta and phase angles phi.
- Squeezing: An individual Sgate(r, phi) on each mode, adding non-trivial correlations in quadrature space.
- Interferometer: A second interferometer with independent parameters.
- Displacement: An individual Dgate(alpha, phi) on each mode, shifting the state in phase space.
- Non-Gaussian gate: A Kgate(kappa) (Kerr interaction) on each mode. This is the crucial ingredient that makes the network universal; without it, all operations would be Gaussian and classically simulable.
For N modes, each layer has O(N^2) parameters from the two interferometers (each decomposing into N(N-1)/2 beamsplitters with two angles apiece), plus 2N squeezing, 2N displacement, and N Kerr parameters.
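As a sanity check, the per-layer parameter count can be tallied in plain Python. This is a sketch assuming a Clements-style mesh with N(N-1)/2 beamsplitters, two angles each, per interferometer (final per-mode phases are omitted, matching the two-mode circuit built below):

```python
def cv_layer_param_count(num_modes: int) -> int:
    """Count trainable parameters in one CV-QNN layer for num_modes modes."""
    # Each interferometer: N(N-1)/2 beamsplitters, two angles each
    interferometer = (num_modes * (num_modes - 1) // 2) * 2
    squeezing = 2 * num_modes      # Sgate(r, phi) per mode
    displacement = 2 * num_modes   # Dgate(alpha, phi) per mode
    kerr = num_modes               # Kgate(kappa) per mode
    return 2 * interferometer + squeezing + displacement + kerr

# For the 2-mode circuit used in this tutorial: 4 + 4 + 4 + 2 = 14 per layer
print(cv_layer_param_count(2))
```

For the two-mode, four-layer network trained below, this gives 56 trainable parameters in total.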
Why CV-QNNs Differ from Qubit Variational Circuits
There are several important distinctions between CV-QNNs and their qubit-based counterparts such as VQE or QAOA:
Continuous optimization landscape. Qubit circuits produce discrete measurement outcomes (bitstrings), and the cost function is computed from expectation values. CV circuits operate on continuous quadrature variables, and the output can be a continuous function of the input. This means the optimization landscape is inherently smooth.
Non-Gaussian gates provide universality. Gaussian operations (interferometers, squeezing, displacement) alone are efficiently classically simulable. Adding a single non-Gaussian element like the Kerr gate makes the model computationally universal. This is analogous to how adding a non-Clifford gate (like T) to Clifford circuits enables universal qubit computation.
Milder barren plateaus at small depth. Research has shown that CV-QNNs with moderate circuit depth and a small number of modes exhibit trainable gradients, largely avoiding the barren plateau problem that plagues deep random qubit circuits. This advantage diminishes as the number of modes and depth grow, but for practical near-term circuits the optimization landscape remains navigable.
Building a Single CV-QNN Layer
Let us start by constructing a single layer of the CV-QNN. We will use two modes for this example.
import strawberryfields as sf
from strawberryfields.ops import Sgate, Dgate, Kgate, BSgate
import tensorflow as tf
import numpy as np
def cv_qnn_layer(q, params, num_modes):
    """
    Apply one layer of the CV-QNN architecture to the register q.

    params is a dictionary containing:
    - 'bs1_theta', 'bs1_phi': beamsplitter params for first interferometer
    - 'sq_r', 'sq_phi': squeezing params per mode
    - 'bs2_theta', 'bs2_phi': beamsplitter params for second interferometer
    - 'disp_r', 'disp_phi': displacement params per mode
    - 'kerr': Kerr gate params per mode
    """
    # First interferometer (for 2 modes, a single beamsplitter suffices)
    BSgate(params['bs1_theta'], params['bs1_phi']) | (q[0], q[1])
    # Squeezing on each mode
    for i in range(num_modes):
        Sgate(params['sq_r'][i], params['sq_phi'][i]) | q[i]
    # Second interferometer
    BSgate(params['bs2_theta'], params['bs2_phi']) | (q[0], q[1])
    # Displacement on each mode
    for i in range(num_modes):
        Dgate(params['disp_r'][i], params['disp_phi'][i]) | q[i]
    # Kerr non-linearity on each mode
    for i in range(num_modes):
        Kgate(params['kerr'][i]) | q[i]
For larger numbers of modes, the interferometer would be decomposed into a mesh of beamsplitters and phase shifters using the Clements or Reck decomposition. Strawberry Fields provides sf.ops.Interferometer for this purpose, which accepts a unitary matrix and automatically decomposes it.
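As an illustration, a Haar-random unitary for a larger interferometer can be generated with plain NumPy; the QR-with-phase-fix construction below is a standard recipe, and the commented usage assumes a Strawberry Fields program context q is in scope:

```python
import numpy as np

def random_unitary(n: int, seed: int = 0) -> np.ndarray:
    """Haar-random n x n unitary via QR decomposition of a complex Gaussian matrix."""
    rng = np.random.default_rng(seed)
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    # Fix the phases of the QR factors so the distribution is Haar-uniform
    return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

U = random_unitary(4)
assert np.allclose(U @ U.conj().T, np.eye(4))  # verify unitarity
# Inside a program this unitary would be applied with:
#   Interferometer(U) | q
# and Strawberry Fields decomposes it into beamsplitters and phase shifters.
```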
Implementing a Function Approximation Task
We will train a CV-QNN to approximate the sine function. The input is encoded as a displacement in mode 0, and the output is read from a homodyne measurement of the same mode.
import strawberryfields as sf
from strawberryfields.ops import Sgate, Dgate, Kgate, BSgate
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Hyperparameters
num_modes = 2
num_layers = 4
cutoff_dim = 10 # Fock space truncation for the tf backend
# Initialize trainable parameters for all layers
np.random.seed(42)
params = []
for layer in range(num_layers):
    layer_params = {
        'bs1_theta': tf.Variable(np.random.normal(0, 0.1), dtype=tf.float64),
        'bs1_phi': tf.Variable(np.random.normal(0, 0.1), dtype=tf.float64),
        'sq_r': tf.Variable(np.random.normal(0, 0.1, size=num_modes), dtype=tf.float64),
        'sq_phi': tf.Variable(np.random.normal(0, 0.1, size=num_modes), dtype=tf.float64),
        'bs2_theta': tf.Variable(np.random.normal(0, 0.1), dtype=tf.float64),
        'bs2_phi': tf.Variable(np.random.normal(0, 0.1), dtype=tf.float64),
        'disp_r': tf.Variable(np.random.normal(0, 0.1, size=num_modes), dtype=tf.float64),
        'disp_phi': tf.Variable(np.random.normal(0, 0.1, size=num_modes), dtype=tf.float64),
        'kerr': tf.Variable(np.random.normal(0, 0.1, size=num_modes), dtype=tf.float64),
    }
    params.append(layer_params)
# Collect all trainable variables
all_variables = []
for lp in params:
    for v in lp.values():
        all_variables.append(v)
# Create the engine once; it is reset before each run
eng = sf.Engine("tf", backend_options={"cutoff_dim": cutoff_dim})

def circuit(x_input, params):
    """
    Run the CV-QNN circuit with input x encoded as displacement.
    Returns the expectation value of the x quadrature on mode 0.
    """
    prog = sf.Program(num_modes)
    with prog.context as q:
        # Encode input as displacement on mode 0
        Dgate(x_input, 0.0) | q[0]
        # Apply CV-QNN layers
        for layer in range(num_layers):
            p = params[layer]
            # First interferometer
            BSgate(p['bs1_theta'], p['bs1_phi']) | (q[0], q[1])
            # Squeezing
            for i in range(num_modes):
                Sgate(p['sq_r'][i], p['sq_phi'][i]) | q[i]
            # Second interferometer
            BSgate(p['bs2_theta'], p['bs2_phi']) | (q[0], q[1])
            # Displacement
            for i in range(num_modes):
                Dgate(p['disp_r'][i], p['disp_phi'][i]) | q[i]
            # Kerr nonlinearity
            for i in range(num_modes):
                Kgate(p['kerr'][i]) | q[i]
    # Run on the TensorFlow backend
    if eng.run_progs:
        eng.reset()
    result = eng.run(prog)
    state = result.state
    # quad_expectation returns (mean, variance) of the chosen quadrature
    mean_x, _ = state.quad_expectation(0)
    return mean_x
Training the Network
The training loop uses TensorFlow’s automatic differentiation to compute gradients of the loss function with respect to all circuit parameters.
# Training data: learn sin(x) over [-1, 1]
x_train = np.linspace(-1.0, 1.0, 20)
y_train = np.sin(x_train)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
# Training loop
losses = []
for epoch in range(200):
    with tf.GradientTape() as tape:
        loss = 0.0
        for x_val, y_val in zip(x_train, y_train):
            x_tf = tf.constant(x_val, dtype=tf.float64)
            y_pred = circuit(x_tf, params)
            loss += (y_pred - y_val) ** 2
        loss = loss / len(x_train)  # Mean squared error
    gradients = tape.gradient(loss, all_variables)
    optimizer.apply_gradients(zip(gradients, all_variables))
    if epoch % 20 == 0:
        print(f"Epoch {epoch}: MSE = {loss.numpy():.6f}")
    losses.append(loss.numpy())
# Plot training loss
plt.figure(figsize=(8, 4))
plt.plot(losses)
plt.xlabel("Epoch")
plt.ylabel("MSE Loss")
plt.title("CV-QNN Training Progress")
plt.yscale("log")
plt.grid(True)
plt.savefig("cvqnn_training_loss.png", dpi=150)
plt.show()
Evaluating the Trained Network
After training, we evaluate the circuit on a finer grid and compare the output to the target function.
# Evaluate on a finer grid
x_test = np.linspace(-1.5, 1.5, 100)
y_test = np.sin(x_test)
y_pred = []
for x_val in x_test:
    x_tf = tf.constant(x_val, dtype=tf.float64)
    pred = circuit(x_tf, params)
    y_pred.append(pred.numpy())
y_pred = np.array(y_pred)
plt.figure(figsize=(8, 5))
plt.plot(x_test, y_test, "b-", label="sin(x)", linewidth=2)
plt.plot(x_test, y_pred, "r--", label="CV-QNN output", linewidth=2)
plt.scatter(x_train, y_train, c="black", s=30, zorder=5, label="Training points")
plt.xlabel("x")
plt.ylabel("y")
plt.title("CV-QNN Function Approximation")
plt.legend()
plt.grid(True)
plt.savefig("cvqnn_approximation.png", dpi=150)
plt.show()
With sufficient training, the CV-QNN can closely approximate the sine function within the training domain. Performance outside the training range (extrapolation) will degrade, similar to classical neural networks.
The Role of the Kerr Gate
The Kerr gate Kgate(kappa) implements the unitary K(kappa) = exp(i kappa n^2), generated by the self-interaction Hamiltonian proportional to n^2, where n = a†a is the photon number operator. This introduces a non-linear phase shift that depends on the photon number of the mode. Without this gate, the entire circuit would consist of Gaussian operations, which can be efficiently simulated classically and have limited expressive power.
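The action of the Kerr gate is easy to inspect in the truncated Fock basis that the tf backend uses. A NumPy sketch, assuming the convention K(kappa) = exp(i kappa n^2) with an illustrative kappa:

```python
import numpy as np

cutoff = 10   # Fock truncation, matching the tutorial's cutoff_dim
kappa = 0.3   # illustrative Kerr strength

n = np.arange(cutoff)
# In the Fock basis the Kerr gate is diagonal: K|n> = exp(i * kappa * n^2) |n>
K = np.diag(np.exp(1j * kappa * n**2))

# The acquired phase grows quadratically with photon number; a Gaussian
# (linear-optical) element can only produce phases linear in n.
assert np.allclose(K @ K.conj().T, np.eye(cutoff))  # K is unitary
```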
To see this concretely, try removing the Kerr gates from the circuit and retraining. You will find that the network can only learn affine functions (linear maps plus constant offsets), because Gaussian operations preserve the Gaussian character of the state, and the expectation value of the x quadrature is then an affine function of the input displacement.
# Experiment: train without Kerr gates
# Set all Kerr parameters to zero and freeze them
for lp in params:
    lp['kerr'].assign(tf.zeros_like(lp['kerr']))
# Retrain and observe that only linear functions are learnable
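The affine behavior can also be seen directly in the Gaussian phase-space picture, where any Gaussian circuit maps the vector of quadrature means r as r -> S r + d for a symplectic matrix S and displacement d. A NumPy sketch with illustrative, hand-picked gate parameters:

```python
import numpy as np

# Phase-space means of a 2-mode Gaussian state, ordered r = (x0, p0, x1, p1).
# Every Gaussian circuit maps the means affinely: r -> S @ r + d.

def squeezer(r0, r1):
    # Squeezing symplectic (phase 0): x -> e^{-r} x, p -> e^{r} p per mode
    return np.diag([np.exp(-r0), np.exp(r0), np.exp(-r1), np.exp(r1)])

def beamsplitter(theta):
    # Beamsplitter symplectic acting identically on x and p quadratures
    c, s = np.cos(theta), np.sin(theta)
    return np.kron(np.array([[c, s], [-s, c]]), np.eye(2))

def out_mean_x0(x_in):
    r = np.array([x_in, 0.0, 0.0, 0.0])    # input displacement on mode 0
    S = beamsplitter(0.7) @ squeezer(0.3, -0.2) @ beamsplitter(0.4)
    d = np.array([0.1, 0.0, -0.05, 0.0])   # a fixed displacement layer
    return (S @ r + d)[0]                  # <x> of mode 0 at the output

# Affine check: the output at the midpoint lies exactly on the chord
f0, f1, fm = out_mean_x0(0.0), out_mean_x0(1.0), out_mean_x0(0.5)
assert np.isclose(fm, (f0 + f1) / 2)
```

However deep the Gaussian circuit, the composed map stays of the form m*x + c, which is why the Kerr gate is indispensable.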
Alternative Non-Gaussian Gates
While the Kerr gate is the standard choice in the Killoran et al. architecture, Strawberry Fields also provides the cubic phase gate Vgate(gamma), which implements V(gamma) = exp(i gamma x^3 / (3 hbar)), where x is the position quadrature operator. The cubic phase gate is an alternative source of non-Gaussianity. In principle, any single non-Gaussian gate combined with Gaussian operations is sufficient for universality, but the choice affects trainability and the types of functions the network can represent efficiently.
Scaling Considerations
For this tutorial, we used 2 modes and 4 layers with a Fock space cutoff of 10. In practice, several factors constrain the scalability of CV-QNNs:
Fock space truncation. The TensorFlow backend represents each mode as a vector in truncated Fock space. With cutoff dimension D and N modes, the full state vector has D^N elements. This exponential scaling limits classical simulation to small systems, typically 4 to 6 modes with moderate cutoff.
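To make the D^N scaling concrete, here is a pure-Python estimate; actual backend memory use is higher still, since intermediate tensors are also stored during the simulation:

```python
def state_vector_size(cutoff: int, modes: int) -> int:
    """Number of complex amplitudes in the truncated Fock-space state vector."""
    return cutoff ** modes

# Memory for a complex128 state vector (16 bytes per amplitude), in MB
for modes in (2, 4, 6, 8):
    amps = state_vector_size(10, modes)
    print(f"{modes} modes, cutoff 10: {amps:>12,} amplitudes "
          f"({amps * 16 / 1e6:,.1f} MB)")
```

At cutoff 10, going from 6 to 8 modes already jumps from millions to hundreds of millions of amplitudes, which is why the tutorial stays at 2 modes.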
Gate decomposition. The interferometer for N modes requires N(N-1)/2 beamsplitters (using the Clements decomposition). The total parameter count per layer scales as O(N^2).
Hardware mapping. On Xanadu’s photonic hardware, the available gate set and connectivity differ from the idealized model. Time-domain multiplexing on processors like Borealis imposes specific constraints on which gates can be applied in sequence. PennyLane, Xanadu’s other quantum software library, provides more direct hardware integration for variational circuits.
Summary
In this tutorial, you built a continuous-variable quantum neural network using Strawberry Fields, following the architecture of Killoran et al. The key takeaways are that CV-QNNs operate on a fundamentally different computational substrate than qubit variational circuits, that the Kerr gate provides the essential non-Gaussian element for universality, and that the TensorFlow backend enables standard gradient-based optimization of all circuit parameters. For production quantum machine learning workflows, consider using PennyLane, which provides a higher-level interface for variational CV circuits with support for automatic differentiation across multiple backends.