Quantum Kernels and Support Vector Machines with PennyLane
Learn how to build quantum kernel functions with PennyLane, use them with scikit-learn's SVM, and understand when quantum kernels might offer an advantage over classical kernels, with a full working classification example.
Kernels, Features, and the Quantum Angle
Support vector machines (SVMs) are among the most theoretically grounded classifiers in machine learning. Their power comes from the kernel trick: instead of explicitly mapping data into a high-dimensional feature space, you compute inner products in that space directly using a kernel function k(x, x’). The SVM finds the maximum-margin hyperplane in the feature space without ever constructing the features explicitly.
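To make the trick concrete, here is a small scikit-learn sketch showing that an explicitly computed Gram matrix (a classical RBF kernel here) produces the same classifier as the built-in kernel. This is exactly the mechanism the quantum version exploits: any function that fills in the Gram matrix can drive the SVM.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=60, noise=0.1, random_state=0)

# Compute the RBF Gram matrix explicitly: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
gamma = 0.5
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-gamma * sq_dists)

# A precomputed Gram matrix yields the same classifier as the built-in kernel
svm_builtin = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)
svm_precomp = SVC(kernel="precomputed", C=1.0).fit(K, y)
agreement = float((svm_builtin.predict(X) == svm_precomp.predict(K)).mean())
```

The `kernel="precomputed"` path is the one used for the quantum kernel later in the tutorial, since scikit-learn cannot call a quantum circuit directly.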
This kernel trick is exactly where quantum computing enters. A quantum computer can embed classical data into a Hilbert space exponentially larger than the input dimension. The fidelity kernel, the overlap between two quantum states, is a natural inner product in that space. If the quantum feature map extracts structure that classical kernels cannot, a quantum kernel SVM might outperform classical alternatives on certain datasets.
This is not guaranteed. Whether quantum kernels provide practical advantage is an active research question. But the framework is well-defined, implementable today, and produces classifiers that work, which makes it an ideal entry point into quantum machine learning.
The Quantum Feature Map
A quantum feature map embeds a classical data point x into a quantum state:
phi: R^n -> H
x -> |phi(x)>
The quantum fidelity kernel between two data points is then:
k(x, x') = |<phi(x') | phi(x)>|^2
This is the probability of measuring the all-zero state after preparing |phi(x)> and then applying the inverse of the preparation circuit for x'. Equivalently, it is the squared overlap of the two quantum states.
The choice of feature map determines the expressivity and the inductive bias of the kernel. A good quantum feature map should:
- Be non-trivially quantum (not efficiently classically simulatable for all inputs)
- Capture structure relevant to the classification problem
- Be deep enough to express useful features but shallow enough to run before decoherence
IQP Feature Maps and Angle Encoding
Two feature map families dominate the quantum kernel literature:
Angle encoding. Each feature x_i is encoded as a rotation angle for qubit i:
U_angle(x) = product_i R_Y(x_i)
This is simple and cheap to implement on any hardware, but each qubit rotates independently, so the resulting kernel factorizes into a product of cosine terms and is easy to simulate classically.
IQP (Instantaneous Quantum Polynomial) circuits. Data is encoded through repeated layers of Hadamards and data-dependent ZZ interactions:
U_IQP(x) = [H^n * U_ZZ(x)]^d
where U_ZZ(x) = product_{j<k} exp(i x_j x_k Z_j Z_k) and d is the number of repetitions. The ZZ interactions create entanglement that mixes features together, producing a kernel that depends on products of feature pairs. Havlicek et al. (Nature 2019) introduced this map as a candidate for quantum advantage.
The kernel value for an IQP map can be efficiently estimated by running the quantum circuit, while computing it exactly on a classical computer is believed to require time exponential in n (under widely held complexity-theoretic conjectures).
PennyLane Implementation
Install the required packages:
pip install pennylane scikit-learn matplotlib numpy
Full implementation:
import pennylane as qml
from pennylane import numpy as np
import numpy
from sklearn.svm import SVC
from sklearn.datasets import make_circles, make_moons
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
import matplotlib.pyplot as plt
# ---------------------------------------------------------------
# 1. Define the quantum feature map
# ---------------------------------------------------------------
n_qubits = 4 # one qubit per feature
dev = qml.device("default.qubit", wires=n_qubits)
def iqp_feature_map(x, n_layers=2):
    """
    IQP-style feature map encoding a 4-dimensional vector x.
    Uses alternating layers of Hadamards and ZZ entanglers.
    """
    # Initial Hadamard layer: create equal superposition
    for i in range(n_qubits):
        qml.Hadamard(wires=i)
    for layer in range(n_layers):
        # Single-qubit Z rotations encoding individual features
        for i in range(n_qubits):
            qml.RZ(2.0 * x[i], wires=i)
        # Two-qubit ZZ interactions encoding feature products
        # (Havlicek-style data-dependent phases (pi - x_i)(pi - x_j))
        for i in range(n_qubits):
            for j in range(i + 1, n_qubits):
                qml.IsingZZ(2.0 * (numpy.pi - x[i]) * (numpy.pi - x[j]), wires=[i, j])
        # Hadamard layer closing each repetition
        for i in range(n_qubits):
            qml.Hadamard(wires=i)
@qml.qnode(dev)
def kernel_circuit(x1, x2):
    """
    Compute the quantum fidelity kernel between x1 and x2.
    Circuit: U(x1) applied forward, then U(x2) applied in reverse.
    Measure probability of all-zero outcome.
    """
    iqp_feature_map(x1)
    qml.adjoint(iqp_feature_map)(x2)
    return qml.probs(wires=range(n_qubits))
def quantum_kernel(x1, x2):
    """
    Quantum fidelity kernel: k(x1, x2) = |<phi(x2)|phi(x1)>|^2
    Returns the probability of measuring |00...0>.
    """
    probs = kernel_circuit(x1, x2)
    return float(probs[0])  # probability of all-zeros outcome
# ---------------------------------------------------------------
# 2. Build the Gram matrix for a dataset
# ---------------------------------------------------------------
def build_kernel_matrix(X1, X2, kernel_fn):
    """Compute the full kernel matrix K[i,j] = kernel_fn(X1[i], X2[j])."""
    n1, n2 = len(X1), len(X2)
    K = numpy.zeros((n1, n2))
    for i in range(n1):
        for j in range(n2):
            K[i, j] = kernel_fn(X1[i], X2[j])
        if (i + 1) % 10 == 0:
            print(f"  Kernel matrix: {i+1}/{n1} rows computed", flush=True)
    return K
# ---------------------------------------------------------------
# 3. Generate and preprocess data
# ---------------------------------------------------------------
numpy.random.seed(42)
# Two interleaved half-circles (non-linearly separable)
X_raw, y = make_moons(n_samples=40, noise=0.15, random_state=42)
# Scale features to [0, pi] range for the feature map
scaler = MinMaxScaler(feature_range=(0, numpy.pi))
X_2d = scaler.fit_transform(X_raw)
# Pad to 4 features (repeat the two features twice for 4-qubit map)
X = numpy.concatenate([X_2d, X_2d], axis=1)
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
print(f"Training set: {len(X_train)} samples")
print(f"Test set: {len(X_test)} samples")
print(f"Features: {X_train.shape[1]} (2 physical, padded to 4 for qubits)")
# ---------------------------------------------------------------
# 4. Compute kernel matrices (warning: O(n^2) circuit evaluations)
# ---------------------------------------------------------------
print("\nComputing training kernel matrix (this takes a few minutes on CPU)...")
K_train = build_kernel_matrix(X_train, X_train, quantum_kernel)
print("Computing test kernel matrix...")
K_test = build_kernel_matrix(X_test, X_train, quantum_kernel)
# Verify positive semi-definiteness (important for SVM convergence)
eigenvalues = numpy.linalg.eigvalsh(K_train)
print(f"\nKernel matrix min eigenvalue: {eigenvalues.min():.6f}")
print(f"Kernel is PSD: {eigenvalues.min() >= -1e-6}")
# ---------------------------------------------------------------
# 5. Train the quantum kernel SVM
# ---------------------------------------------------------------
# Use precomputed kernel (we pass the Gram matrix directly)
qsvm = SVC(kernel="precomputed", C=1.0)
qsvm.fit(K_train, y_train)
# Predict on test set
y_pred = qsvm.predict(K_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"\nQuantum Kernel SVM Results:")
print(f"Test accuracy: {accuracy:.4f} ({accuracy*100:.1f}%)")
print(classification_report(y_test, y_pred, target_names=["Class 0", "Class 1"]))
print(f"Number of support vectors: {sum(qsvm.n_support_)}")
# ---------------------------------------------------------------
# 6. Compare with classical RBF kernel SVM
# ---------------------------------------------------------------
# SVC was already imported above; reuse it with the built-in RBF kernel
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale")
rbf_svm.fit(X_train, y_train)
y_pred_rbf = rbf_svm.predict(X_test)
rbf_accuracy = accuracy_score(y_test, y_pred_rbf)
print(f"\nClassical RBF Kernel SVM Test accuracy: {rbf_accuracy:.4f} ({rbf_accuracy*100:.1f}%)")
# ---------------------------------------------------------------
# 7. Visualize the decision boundaries
# ---------------------------------------------------------------
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
x_min, x_max = X_2d[:, 0].min() - 0.1, X_2d[:, 0].max() + 0.1
y_min, y_max = X_2d[:, 1].min() - 0.1, X_2d[:, 1].max() + 0.1
# Moderate resolution: each quantum-kernel grid prediction costs len(X_train) circuits
xx, yy = numpy.meshgrid(
    numpy.linspace(x_min, x_max, 20),
    numpy.linspace(y_min, y_max, 20)
)
grid_2d = numpy.c_[xx.ravel(), yy.ravel()]
grid_4d = numpy.concatenate([grid_2d, grid_2d], axis=1)
for ax, (name, svm, use_qkernel) in zip(axes, [
    ("Quantum Kernel SVM", qsvm, True),
    ("Classical RBF SVM", rbf_svm, False),
]):
    if use_qkernel:
        # Quantum SVM needs each grid point's kernel row against the training set
        K_grid = build_kernel_matrix(grid_4d, X_train, quantum_kernel)
        Z = svm.predict(K_grid)
    else:
        Z = svm.predict(grid_4d)
    Z = Z.reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3, cmap="RdBu")
    ax.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="RdBu", edgecolors="k", s=50)
    ax.set_title(name)
    ax.set_xlabel("Feature 1")
    ax.set_ylabel("Feature 2")
plt.suptitle("Quantum vs Classical Kernel SVM on Moon Dataset", fontsize=13)
plt.tight_layout()
plt.savefig("quantum_kernel_svm.png", dpi=150)
plt.show()
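One detail the script glosses over: with kernel="precomputed", scikit-learn never calls the kernel function itself, so classifying a new point requires its kernel row against the training set. A self-contained sketch of the pattern, using a classical RBF function as a fast stand-in for quantum_kernel:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

def rbf(x1, x2, gamma=1.0):
    # Stand-in for quantum_kernel: any pairwise kernel function works here
    return float(np.exp(-gamma * np.sum((x1 - x2) ** 2)))

X, y = make_moons(n_samples=40, noise=0.15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

K_train = np.array([[rbf(a, b) for b in X_train] for a in X_train])
svm = SVC(kernel="precomputed", C=1.0).fit(K_train, y_train)

# A new point is classified from its kernel row against the *training* set
x_new = X_test[0]
k_row = np.array([[rbf(x_new, b) for b in X_train]])  # shape (1, n_train)
label = int(svm.predict(k_row)[0])
```

For the quantum case this means every prediction costs len(X_train) circuit evaluations, which is why the decision-boundary plot above is the most expensive step of the script.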
Understanding What the Kernel Computes
It is worth examining what the IQP kernel actually computes. For two data points x and x’, the kernel value is:
k(x, x') = |<phi(x') | phi(x)>|^2 = Tr[rho(x) * rho(x')]
where rho(x) = |phi(x)><phi(x)|. This is the Hilbert-Schmidt inner product of the two density matrices.
For angle encoding alone, this is equivalent to a product of cosine terms:
k_angle(x, x') = product_i cos^2((x_i - x'_i) / 2)
which is a simple stationary kernel, not obviously better than an RBF kernel. The IQP feature map adds ZZ coupling terms, making the kernel depend on products of feature differences:
k_IQP(x, x') contains interference terms built from the pairwise phases (pi - x_i)(pi - x_j) and (pi - x'_i)(pi - x'_j)
These cross-feature terms can capture interactions that isotropic classical kernels miss. Whether this is useful depends on the data distribution.
When Quantum Kernels Might Help
The theoretical case for quantum kernels is strongest when:
The data has structure in a specific high-dimensional space. If the optimal SVM hyperplane in feature space can be expressed as a polynomial of bounded degree in the features, classical polynomial kernels work fine. If the optimal hyperplane requires exponentially many terms, a quantum feature map might access it more efficiently.
The data is generated by a quantum process. Data from quantum chemistry simulations, quantum sensor measurements, or quantum communication protocols might have natural structure that a quantum feature map encodes efficiently.
The kernel matrix cannot be classically approximated. This is a necessary condition for quantum advantage. If the kernel matrix can be efficiently computed classically (which is true for many feature maps that look “quantum” but are actually low-rank or structured), there is no advantage.
Liu et al. (Nature Physics, 2021) proved that there exist datasets for which a quantum kernel SVM has a provable exponential advantage over any classical kernel SVM. The dataset is artificial and constructed to exploit discrete logarithm hardness, but the existence proof establishes that quantum kernel advantage is possible in principle.
Practical Limitations
The O(n^2) bottleneck. Building the kernel matrix requires O(n^2) circuit evaluations for a training set of n points. For n = 1000, that is a million circuit runs. This is feasible on a simulator but expensive on hardware. Kernel approximation methods (Nystrom approximation, random Fourier features) can reduce this, but adapting them to quantum kernels is an active research area.
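To illustrate the shape of a kernel-approximation workaround, here is a sketch of the classical Nystrom method: it needs only O(n*m + m^2) kernel evaluations for m landmark points instead of O(n^2). The same pattern would apply with a quantum kernel_fn; the RBF function here is just a fast classical stand-in, and adapting this to shot-noisy quantum kernel estimates remains, as noted, an open research area.

```python
import numpy as np

def nystrom_features(X, kernel_fn, m=30, seed=0):
    """Rank-m Nystrom features: z(x) @ z(x').T approximates k(x, x')."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    landmarks = X[idx]
    C = np.array([[kernel_fn(x, l) for l in landmarks] for x in X])  # n x m
    W = C[idx]                                                       # m x m landmark Gram
    evals, evecs = np.linalg.eigh(W)
    keep = evals > 1e-10  # drop numerically null directions
    return C @ evecs[:, keep] @ np.diag(evals[keep] ** -0.5)

# Sanity check against the exact Gram matrix, with an RBF stand-in kernel
rbf = lambda a, b: float(np.exp(-np.sum((a - b) ** 2)))
X = np.random.default_rng(1).uniform(0, np.pi, size=(60, 2))
Z = nystrom_features(X, rbf, m=30)
K_exact = np.array([[rbf(a, b) for b in X] for a in X])
err = float(np.abs(Z @ Z.T - K_exact).max())
```

The features Z can then be fed to a linear SVM, avoiding the full Gram matrix entirely.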
Trainability and expressivity tradeoffs. Very deep feature maps produce kernels that are too expressive; they overfit. Very shallow maps produce kernels that are too simple to be useful. Finding the right depth is as much art as science, though kernel target alignment (measuring how well the kernel matches the label structure) provides a principled search strategy.
Hardware noise degrades the kernel. On a real device, the circuit implementing phi(x) is noisy. The measured kernel value is:
k_noisy(x, x') ≈ (1 - epsilon) * k_ideal(x, x') + epsilon / 2^n
where epsilon aggregates the error sources into a single global depolarizing rate. As n grows, epsilon grows and the signal-to-noise ratio of the kernel estimate degrades. Error mitigation (zero-noise extrapolation, probabilistic error cancellation) is needed for large circuits.
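If epsilon can be estimated independently (for instance from circuits whose ideal kernel value is known), the depolarizing model above can be inverted directly. A minimal round-trip sketch with synthetic numbers, not hardware data:

```python
import numpy as np

def mitigate_depolarizing(k_noisy, epsilon, n_qubits):
    """Invert the global depolarizing model
    k_noisy = (1 - eps) * k_ideal + eps / 2^n.
    epsilon must be estimated separately, e.g. from calibration circuits."""
    return (k_noisy - epsilon / 2 ** n_qubits) / (1.0 - epsilon)

# Round-trip check: corrupt an ideal value, then recover it
k_ideal, eps, n = 0.83, 0.1, 4
k_noisy = (1 - eps) * k_ideal + eps / 2 ** n
recovered = float(mitigate_depolarizing(k_noisy, eps, n))
```

In practice the inversion amplifies shot noise by a factor 1/(1 - epsilon), so it helps only while epsilon stays well below 1; PennyLane's qml.kernels module offers a related mitigate_depolarizing_noise utility for whole kernel matrices.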
Using PennyLane’s Built-in Kernel Utilities
PennyLane provides qml.kernels for convenience:
import pennylane as qml
from pennylane import numpy as np
dev = qml.device("default.qubit", wires=4)
@qml.qnode(dev)
def kernel_circuit(x1, x2):
    qml.AngleEmbedding(x1, wires=range(4))
    qml.adjoint(qml.AngleEmbedding)(x2, wires=range(4))
    return qml.probs(wires=range(4))

# Kernel value = probability of the all-zero outcome
kernel_fn = lambda x1, x2: kernel_circuit(x1, x2)[0]
# Build the kernel matrix on a small subset using PennyLane's batched utility
X_small = X_train[:10]
K = qml.kernels.square_kernel_matrix(X_small, kernel_fn, assume_normalized_kernel=True)
# Regularize to handle numerical issues (displace_matrix adds identity to remove negative eigenvalues)
K_reg = qml.kernels.displace_matrix(K)
print(f"Kernel matrix shape: {K_reg.shape}")
print(f"Kernel matrix condition number: {numpy.linalg.cond(K_reg):.2f}")
The qml.kernels module also provides kernel_target_alignment, a differentiable metric for optimizing the feature map parameters:
# Compute kernel-target alignment for a feature map with trainable parameters
from pennylane import numpy as pnp

params = pnp.array([0.1, 0.2, 0.3, 0.4], requires_grad=True)

def parametric_kernel(x1, x2, params):
    """A fidelity kernel whose feature map has trainable scaling parameters."""
    @qml.qnode(dev)
    def circuit():
        for i in range(4):
            qml.RY(params[i] * x1[i], wires=i)
        # Un-prepare the x2 state: the adjoint encoding, not a second forward encoding
        for i in range(4):
            qml.RY(-params[i] * x2[i], wires=i)
        return qml.probs(wires=range(4))
    return float(circuit()[0])
# Kernel-target alignment measures how well the kernel separates classes
kta = qml.kernels.target_alignment(
    X_train[:10], y_train[:10],
    lambda x1, x2: parametric_kernel(x1, x2, params),
    assume_normalized_kernel=True,
)
print(f"Kernel-target alignment: {kta:.4f}")
# Higher KTA -> kernel better matches the label structure -> better SVM accuracy
Quantum kernel methods sit at the intersection of kernel methods (a mature classical ML field) and quantum feature maps (an active quantum research area). PennyLane’s implementation makes them accessible for experimentation, and the connection to scikit-learn’s SVM means you get a complete, well-understood classifier with theoretical guarantees around the kernel-SVM framework. Whether quantum kernels will prove practically advantageous on real hardware for real datasets remains an open question, but the tools to investigate that question are available today.