
Hybrid Quantum-Classical Optimization Loops with PennyLane and PyTorch

Build a complete hybrid quantum-classical optimization pipeline: PennyLane QNode wrapped as a PyTorch layer, automatic differentiation through quantum circuits, Adam optimizer training on a binary classification task, and comparison to classical logistic regression.

What you'll learn

  • PennyLane
  • PyTorch
  • hybrid optimization
  • TorchLayer
  • variational circuits
  • gradient descent

Prerequisites

  • Python proficiency
  • Beginner quantum computing concepts (superposition, entanglement)
  • Linear algebra basics

Hybrid quantum-classical optimization sits at the heart of practical near-term quantum computing. Variational algorithms like VQE and QAOA alternate between quantum circuit execution (sampling expectation values) and classical optimization (updating circuit parameters). PennyLane makes this pipeline feel natural by treating quantum circuits as differentiable functions that slot directly into classical machine learning frameworks.

This tutorial builds a complete hybrid classifier: a parameterized quantum circuit (PQC) wrapped as a TorchLayer, trained on a synthetic binary classification dataset using PyTorch’s Adam optimizer, with gradients computed automatically through the quantum circuit via PennyLane’s parameter-shift rule.

The Idea: Quantum Circuits as Differentiable Layers

PyTorch’s autograd engine computes gradients by tracing operations through a computational graph. Every operation must be differentiable. Quantum circuits, at first glance, seem incompatible: they involve physical measurement, not a smooth mathematical function.

PennyLane solves this with the parameter-shift rule. For a gate G(theta) = exp(-i theta P / 2) where P is a Pauli operator, the gradient with respect to theta can be computed exactly using two circuit evaluations:

d/d_theta <O> = [<O>(theta + pi/2) - <O>(theta - pi/2)] / 2

This two-point formula is exact (not an approximation like finite differences) and works on real hardware. PennyLane hooks the parameter-shift rule into PyTorch’s autograd, so calling .backward() on a loss that depends on circuit outputs triggers hardware-compatible gradient computation automatically.
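Before wiring this into PyTorch, it is worth sanity-checking the rule on a closed-form example. For a single qubit prepared with RY(theta) acting on |0>, the expectation <Z> is cos(theta), so the exact derivative is -sin(theta). A plain NumPy sketch (no quantum device involved, just the formula above):

```python
import numpy as np

def expval_z(theta):
    # <Z> after RY(theta) on |0>: known closed form cos(theta)
    return np.cos(theta)

def param_shift_grad(f, theta):
    # Two-point parameter-shift rule for gates of the form exp(-i theta P / 2)
    return (f(theta + np.pi / 2) - f(theta - np.pi / 2)) / 2

theta = 0.7
exact = -np.sin(theta)                 # analytic derivative of cos(theta)
shifted = param_shift_grad(expval_z, theta)
print(abs(shifted - exact))            # agrees to machine precision
```

Unlike a finite-difference stencil, no step size needs tuning: the pi/2 shifts are part of the exact identity.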

Setup and Dataset

import torch
import torch.nn as nn
import pennylane as qml
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Synthetic binary classification: two concentric circles
# The classes are not linearly separable, making this a good test case
X_raw, y_raw = make_circles(n_samples=40, noise=0.15, factor=0.4, random_state=42)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_raw)

# Scale to [-pi, pi] for use as rotation angles
# (divide by the largest absolute value so negative features stay in range too)
X = (X_scaled / np.abs(X_scaled).max()) * np.pi

# Split into train/test
split = 30
X_train = torch.tensor(X[:split], dtype=torch.float32)
y_train = torch.tensor(y_raw[:split], dtype=torch.float32)
X_test  = torch.tensor(X[split:], dtype=torch.float32)
y_test  = torch.tensor(y_raw[split:], dtype=torch.float32)

print(f"Training samples: {len(X_train)}, Test samples: {len(X_test)}")
print(f"Features: {X_train.shape[1]}, Classes: {len(torch.unique(y_train))}")
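The [-pi, pi] range is not arbitrary: rotation angles are 2*pi-periodic, so two features a full period apart would encode to the same quantum state. Confining standardized features to a single period avoids that aliasing. A minimal NumPy sketch of the same scaling step, using random data in place of the dataset above:

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(30, 2))

# Standardize (what StandardScaler does), then squeeze into [-pi, pi]
feats = (feats - feats.mean(axis=0)) / feats.std(axis=0)
angles = (feats / np.abs(feats).max()) * np.pi

print(angles.min(), angles.max())  # both within [-pi, pi]
```

The feature with the largest absolute value maps to exactly +/- pi; everything else lands strictly inside one period.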

The Variational Quantum Circuit

We use a 2-qubit circuit with 3 layers of parameterized rotations and entangling gates. The 2 input features are angle-encoded via Ry gates, and the output is the expectation value of Z on qubit 0:

n_qubits = 2
n_layers = 3
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch", diff_method="parameter-shift")
def quantum_circuit(inputs, weights):
    """
    Variational quantum circuit for binary classification.

    inputs:  shape (2,) - the two input features, used as rotation angles
    weights: shape (n_layers, n_qubits, 3) - trainable parameters
             [layer, qubit, {Rz, Ry, Rz}]
    """
    # --- Data encoding: angle embedding ---
    qml.AngleEmbedding(inputs, wires=range(n_qubits), rotation='Y')

    # --- Variational layers ---
    for layer in range(n_layers):
        # Parameterized rotations on each qubit
        for qubit in range(n_qubits):
            qml.Rot(weights[layer, qubit, 0],
                    weights[layer, qubit, 1],
                    weights[layer, qubit, 2],
                    wires=qubit)
        # Entangling layer: CNOT ring
        for qubit in range(n_qubits - 1):
            qml.CNOT(wires=[qubit, qubit + 1])
        if n_qubits > 1:
            qml.CNOT(wires=[n_qubits - 1, 0])   # close the ring

    # Output: expectation value of PauliZ on qubit 0
    # Maps to [-1, 1]; we will convert to probability in the model
    return qml.expval(qml.PauliZ(0))
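qml.Rot(phi, theta, omega) is the standard Z-Y-Z Euler decomposition, RZ(omega) RY(theta) RZ(phi), which is why three angles per qubit per layer suffice for an arbitrary single-qubit rotation (up to global phase). A quick NumPy check, assuming the usual rotation-matrix conventions, confirms the composed matrix is unitary:

```python
import numpy as np

def rz(a):
    # RZ(a) = diag(e^{-ia/2}, e^{ia/2})
    return np.diag([np.exp(-1j * a / 2), np.exp(1j * a / 2)])

def ry(a):
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]])

def rot(phi, theta, omega):
    # Z-Y-Z Euler decomposition used by qml.Rot
    return rz(omega) @ ry(theta) @ rz(phi)

U = rot(0.3, 1.1, -0.7)
print(np.allclose(U.conj().T @ U, np.eye(2)))  # True: U is unitary
```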

Wrapping as a PyTorch Module

qml.qnn.TorchLayer converts the QNode into a torch.nn.Module, making it composable with any PyTorch model:

weight_shapes = {"weights": (n_layers, n_qubits, 3)}

qlayer = qml.qnn.TorchLayer(quantum_circuit, weight_shapes)

class HybridClassifier(nn.Module):
    """
    Hybrid quantum-classical binary classifier.
    Architecture: quantum circuit -> sigmoid activation -> prediction
    """
    def __init__(self):
        super().__init__()
        self.quantum = qlayer
        # Optional: a classical post-processing layer
        self.classical = nn.Linear(1, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Run the quantum circuit sample-by-sample (explicit loop over the batch)
        q_out = torch.stack([self.quantum(x[i]) for i in range(x.shape[0])])
        q_out = q_out.unsqueeze(1)    # shape: (batch, 1)
        logit = self.classical(q_out).squeeze(1)
        return self.sigmoid(logit)

model = HybridClassifier()
print(f"Total trainable parameters: {sum(p.numel() for p in model.parameters())}")
# n_layers * n_qubits * 3 rotation angles + 2 classical (weight + bias) = 20
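The parameter count in the comment above is worth spelling out, since it determines the gradient cost later: each of the 3 layers applies a 3-angle Rot to each of the 2 qubits, and the classical head adds one weight and one bias.

```python
n_layers, n_qubits = 3, 2

quantum_params = n_layers * n_qubits * 3   # 18 rotation angles
classical_params = 1 + 1                   # Linear(1, 1): one weight, one bias
total = quantum_params + classical_params

print(total)  # 20
```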

Training Loop

Standard PyTorch training with Adam optimizer and binary cross-entropy loss:

optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.BCELoss()

n_epochs = 100
train_losses = []
train_accs   = []
test_accs    = []

print(f"{'Epoch':>6} {'Loss':>10} {'Train Acc':>12} {'Test Acc':>10}")
print("-" * 42)

for epoch in range(n_epochs):
    model.train()
    optimizer.zero_grad()

    # Forward pass
    predictions = model(X_train)
    loss = loss_fn(predictions, y_train)

    # Backward pass (triggers parameter-shift gradient computation)
    loss.backward()
    optimizer.step()

    # Metrics
    train_pred_labels = (predictions.detach() > 0.5).float()
    train_acc = (train_pred_labels == y_train).float().mean().item()

    model.eval()
    with torch.no_grad():
        test_pred = model(X_test)
        test_pred_labels = (test_pred > 0.5).float()
        test_acc = (test_pred_labels == y_test).float().mean().item()

    train_losses.append(loss.item())
    train_accs.append(train_acc)
    test_accs.append(test_acc)

    if (epoch + 1) % 10 == 0:
        print(f"{epoch+1:>6} {loss.item():>10.4f} {train_acc:>12.4f} {test_acc:>10.4f}")
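For reference, nn.BCELoss implements the binary cross-entropy -[y log p + (1 - y) log(1 - p)], averaged over the batch. A NumPy version of the same formula (assuming predictions already passed through a sigmoid, as in the model above):

```python
import numpy as np

def bce_loss(p, y, eps=1e-12):
    # Binary cross-entropy, averaged over the batch; eps guards against log(0)
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# A confident correct prediction is cheap; a confident wrong one is expensive
print(bce_loss(np.array([0.9]), np.array([1.0])))  # -ln(0.9) ~ 0.105
print(bce_loss(np.array([0.9]), np.array([0.0])))  # -ln(0.1) ~ 2.303
```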

Visualizing the Loss Curve

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Loss curve
axes[0].plot(train_losses, label='Training Loss', color='royalblue')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('BCE Loss')
axes[0].set_title('Hybrid Quantum Classifier: Training Loss')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Accuracy curves
axes[1].plot(train_accs, label='Train Accuracy', color='royalblue')
axes[1].plot(test_accs,  label='Test Accuracy',  color='coral')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Accuracy')
axes[1].set_title('Train vs Test Accuracy')
axes[1].set_ylim(0, 1.05)
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('hybrid_training.png', dpi=150)
plt.show()

Comparison with Classical Logistic Regression

A meaningful benchmark tests whether the quantum circuit captures non-linear structure better than a linear classical model:

# Classical baseline: logistic regression
X_train_np = X_train.numpy()
X_test_np  = X_test.numpy()
y_train_np = y_train.numpy()
y_test_np  = y_test.numpy()

lr_model = LogisticRegression(max_iter=1000)
lr_model.fit(X_train_np, y_train_np)

lr_train_acc = accuracy_score(y_train_np, lr_model.predict(X_train_np))
lr_test_acc  = accuracy_score(y_test_np,  lr_model.predict(X_test_np))

# Hybrid model final accuracy
model.eval()
with torch.no_grad():
    final_train_pred = (model(X_train) > 0.5).float()
    final_test_pred  = (model(X_test)  > 0.5).float()

hybrid_train_acc = (final_train_pred == y_train).float().mean().item()
hybrid_test_acc  = (final_test_pred  == y_test).float().mean().item()

print("\n--- Accuracy Comparison ---")
print(f"{'Model':>30} {'Train Acc':>12} {'Test Acc':>10}")
print("-" * 54)
print(f"{'Logistic Regression (classical)':>30} {lr_train_acc:>12.4f} {lr_test_acc:>10.4f}")
print(f"{'Hybrid Quantum Classifier':>30} {hybrid_train_acc:>12.4f} {hybrid_test_acc:>10.4f}")

On the concentric circles dataset, logistic regression typically achieves 50-60% test accuracy (the classes are not linearly separable). The hybrid classifier, with the non-linear feature map implemented by the quantum circuit, can reach 80-95% depending on training dynamics and initialization.
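The geometry behind those numbers: the two classes differ only in radius, so no linear function of (x, y) separates them, while a single non-linear feature, the squared radius x^2 + y^2, separates them with a simple threshold. A NumPy illustration with noiseless circles at the same factor=0.4 ratio used above (synthetic points, not the tutorial dataset):

```python
import numpy as np

angles = np.linspace(0, 2 * np.pi, 20, endpoint=False)
inner = 0.4 * np.column_stack([np.cos(angles), np.sin(angles)])  # class 1
outer = 1.0 * np.column_stack([np.cos(angles), np.sin(angles)])  # class 0

r2_inner = (inner ** 2).sum(axis=1)   # all 0.16
r2_outer = (outer ** 2).sum(axis=1)   # all 1.00

# Thresholding the squared radius separates the classes perfectly
threshold = (0.4 ** 2 + 1.0 ** 2) / 2
print((r2_inner < threshold).all() and (r2_outer > threshold).all())  # True
```

The quantum circuit must learn an equivalent radial statistic through angle encoding and entanglement rather than being handed it explicitly.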

Decision Boundary Visualization

# Plot decision boundaries side by side
xx, yy = np.meshgrid(
    np.linspace(-3.5, 3.5, 50),
    np.linspace(-3.5, 3.5, 50)
)
grid_scaled = scaler.transform(np.c_[xx.ravel(), yy.ravel()])
# Reuse the training-data scale so grid points map to the same angle range as X
grid = (grid_scaled / np.abs(X_scaled).max()) * np.pi
grid_tensor = torch.tensor(grid, dtype=torch.float32)

model.eval()
with torch.no_grad():
    Z_hybrid = model(grid_tensor).numpy().reshape(xx.shape)

Z_lr = lr_model.predict_proba(
    scaler.transform(np.c_[xx.ravel(), yy.ravel()])
)[:, 1].reshape(xx.shape)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, Z, title in zip(axes,
                          [Z_lr, Z_hybrid],
                          ['Logistic Regression', 'Hybrid Quantum Classifier']):
    ax.contourf(xx, yy, Z, levels=50, cmap='RdBu', alpha=0.7)
    ax.scatter(X_raw[:split, 0], X_raw[:split, 1], c=y_raw[:split],
               cmap='RdBu', edgecolors='k', s=25, linewidths=0.5)
    ax.set_title(title)
    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')

plt.tight_layout()
plt.savefig('decision_boundaries.png', dpi=150)
plt.show()

The classical model’s linear decision boundary fails to separate the circles. The hybrid model learns a curved boundary by exploiting the non-linear transformation that the quantum circuit applies to the input features through angle encoding and entanglement.

Key Takeaways

PennyLane’s TorchLayer makes hybrid optimization genuinely composable. A few things to keep in mind for practical use:

  • Parameter-shift gradients are exact but expensive. Each trainable circuit parameter requires two circuit evaluations per sample, so the 18 circuit parameters in this tutorial cost 36 circuit executions per training sample per backward pass. On real hardware, batch sizes of 1-10 are practical.
  • Barren plateaus are a real obstacle for deeper or wider circuits. Gradient magnitudes vanish exponentially with circuit size under random initialization. Structured initialization (near-identity) and problem-inspired circuit design help.
  • Quantum advantage in classification is not established for near-term devices. The value of this pipeline is its structure: it will be the correct approach when hardware matures, and it is a useful research tool now for studying quantum learning theory.
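The cost in the first bullet is easy to quantify for this tutorial's configuration: 18 trainable circuit parameters and a full batch of 30 training samples mean every backward pass runs the circuit twice per parameter per sample (the classical Linear parameters add nothing, since ordinary autograd handles them):

```python
n_circuit_params = 3 * 2 * 3   # n_layers * n_qubits * 3 rotation angles
batch_size = 30                # full-batch training above

evals_per_backward = 2 * n_circuit_params * batch_size
print(evals_per_backward)  # 1080 circuit executions per epoch, before forward passes
```

On a simulator this is negligible; on queued cloud hardware it is the dominant cost, which is why smaller batches and fewer parameters matter there.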

From here, explore qml.qnn.KerasLayer for TensorFlow integration, qml.gradients.stoch_pulse_grad for pulse-level gradient computation, and PennyLane’s built-in datasets for standardized quantum ML benchmarking.
