• Machine Learning

Quantinuum Compositional Quantum Natural Language Processing

Quantinuum

Quantinuum developed the DisCoCat (Distributional Compositional Categorical) framework for quantum natural language processing, encoding grammatical sentence structure as quantum circuits on their H-series trapped-ion computers. The meaning of a sentence is computed as a tensor contraction of word-state quantum circuits connected by grammatical reduction rules, implemented using the lambeq Python library. Quantinuum trained a binary sentiment classifier on 100 sentences using the H1-2 processor.

Key Outcome
Achieved 87% accuracy on binary sentiment classification (positive/negative) using 5-20 qubit circuits; demonstrated structural advantage of quantum grammar encoding over bag-of-words baseline.

DisCoCat (Distributional Compositional Categorical) theory, developed by Bob Coecke and collaborators at Oxford and later at Quantinuum, unifies two previously separate NLP paradigms: distributional semantics (word meanings as vectors learned from co-occurrence statistics) and compositional grammar (sentence meaning computed by combining word meanings according to syntactic structure). In classical NLP, these paradigms are often in tension; bag-of-words models lose grammar, while symbolic parsers struggle with ambiguity. DisCoCat resolves this by placing both words and grammatical reductions in the same mathematical framework: the compact closed category. The key insight is that the same string-diagram formalism that describes quantum circuits also describes grammatical reduction in pregroup grammar, making quantum computing a natural substrate for DisCoCat computations.
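The pregroup-grammar side can be made concrete with a toy sketch (illustrative Python, not lambeq's parser): each word carries a sequence of atomic types and adjoints, and adjacent pairs n^l·n or n·n^r cancel until, for a grammatical sentence, only the sentence type s remains.

```python
# Minimal pregroup-reduction sketch (illustrative only, not lambeq code).
# A type is a tuple (name, adjoint_order):
#   -1 = left adjoint (n^l), +1 = right adjoint (n^r), 0 = plain type.

def reduce_types(types):
    """Repeatedly cancel adjacent pairs x^l x or x x^r until stable."""
    changed = True
    while changed:
        changed = False
        for i in range(len(types) - 1):
            (a, a_adj), (b, b_adj) = types[i], types[i + 1]
            # x^l followed by x, or x followed by x^r, reduces to identity
            if a == b and (a_adj, b_adj) in [(-1, 0), (0, 1)]:
                types = types[:i] + types[i + 2:]
                changed = True
                break
    return types

# "dog bites man": n  (n^r s n^l)  n
sentence = [("n", 0), ("n", 1), ("s", 0), ("n", -1), ("n", 0)]
print(reduce_types(sentence))  # -> [('s', 0)], a grammatical sentence
```

The cancellation order is exactly the wiring of the cups in the corresponding string diagram.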

In the quantum implementation, each word in a sentence is assigned a quantum circuit that prepares a state in a Hilbert space reflecting that word's grammatical type. With one qubit per atomic type, a noun is a 1-qubit state (type n), an adjective is a 2-qubit state mapping noun-type to noun-type (type n · n^l), and a transitive verb is a 3-qubit state (type n^r · s · n^l). Grammatical reductions in the pregroup parse correspond to Bell-basis measurements (cups in string-diagram notation) that entangle and contract adjacent word circuits, reducing the full sentence to a single sentence-type state (type s, encoded as 1 qubit) whose amplitude encodes the sentence meaning. Training optimizes the rotation angles inside the word circuits using a hybrid quantum-classical loop.
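The cup itself is just the Bell effect ⟨00| + ⟨11| (up to normalization), so contracting two word wires amounts to an inner product between them. A minimal NumPy sketch of this step, using hypothetical 1-qubit word states rather than trained circuit parameters:

```python
import numpy as np

# The "cup" (Bell effect) <00| + <11|, normalized; reshaped, it is the
# 2x2 identity, i.e. a pure wire contraction.
cup = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

# Two hypothetical 1-qubit word states (angles are arbitrary)
psi = np.array([np.cos(0.3), np.sin(0.3)])
phi = np.array([np.cos(1.1), np.sin(1.1)])

# Applying the cup to psi (x) phi collapses the two wires
# into the inner product <psi|phi> (times the 1/sqrt(2) factor)
joint = np.kron(psi, phi)
amplitude = cup @ joint
print(np.isclose(amplitude, psi @ phi / np.sqrt(2)))  # True
```

On hardware the same contraction is realized by a Bell-basis measurement between the corresponding qubits of adjacent word circuits.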

from lambeq import BobcatParser, AtomicType, IQPAnsatz, TketModel
from lambeq import Dataset
from pytket.extensions.quantinuum import QuantinuumBackend
import numpy as np

# Parse sentences into DisCoCat string diagrams
parser = BobcatParser(verbose="suppress")

train_sentences = [
    "the film was excellent",
    "i loved this movie",
    "terrible acting ruined the plot",
    "absolutely wonderful performance",
    "boring and slow moving story",
    # ... 95 more sentences
]
train_labels = [1, 1, 0, 1, 0]  # 1=positive, 0=negative

# Parse and convert to diagrams
raw_diagrams = parser.sentences2diagrams(train_sentences[:5])

# Apply IQP (Instantaneous Quantum Polynomial) ansatz
# Maps DisCoCat diagrams to parameterized quantum circuits
N = AtomicType.NOUN
S = AtomicType.SENTENCE

ansatz = IQPAnsatz(
    {N: 1, S: 1},     # 1 qubit per noun/sentence type
    n_layers=2,        # IQP layers of single-qubit rotations + CZ
    n_single_qubit_params=3,
)
train_circuits = [ansatz(d) for d in raw_diagrams]

# lambeq circuits convert to pytket circuits via .to_tk()
tk_example = train_circuits[0].to_tk()
print(f"Example circuit qubit count: {tk_example.n_qubits}")
print(f"Example circuit gate count: {tk_example.n_gates}")

# Compile for Quantinuum H1-2 via pytket
backend = QuantinuumBackend(device_name="H1-2")
compiled = [backend.get_compiled_circuit(c.to_tk(), optimisation_level=1)
            for c in train_circuits]

# Hybrid training loop (simplified)
from lambeq import QuantumTrainer, SPSAOptimizer

model = TketModel.from_diagrams(
    train_circuits,
    backend_config={
        "backend": backend,
        "compilation": backend.default_compilation_pass(2),
        "shots": 8192,
    },
)

def accuracy(y_pred: np.ndarray, y_true: list[int]) -> float:
    predictions = (y_pred[:, 1] > 0.5).astype(int)  # P(positive) > 0.5
    return np.mean(predictions == np.array(y_true))

trainer = QuantumTrainer(
    model=model,
    loss_function=lambda y_pred, y_true: -np.mean(
        np.array(y_true) * np.log(y_pred[:, 1] + 1e-9)
        + (1 - np.array(y_true)) * np.log(y_pred[:, 0] + 1e-9)
    ),
    optimizer=SPSAOptimizer,
    optim_hyperparams={"a": 0.05, "c": 0.06, "A": 0.001},
    epochs=100,
    evaluate_functions={"acc": accuracy},
    evaluate_on_train=True,
    verbose="text",
    seed=42,
)

# Package circuits and labels for the trainer
train_data = Dataset(train_circuits, train_labels)
# trainer.fit(train_data)  # runs on H1-2 hardware

# Inspect a compiled circuit structure
print(f"\nCircuit for '{train_sentences[0]}':")
print(compiled[0])
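Readout on hardware works by post-selection: the Bell measurements implementing the cups must yield a fixed outcome, and only the surviving shots contribute to the sentence qubit's statistics. A rough sketch of the counts-to-probability step, assuming a hypothetical bitstring layout (last bit is the sentence qubit; all other bits are post-selected on '0'):

```python
def sentiment_probability(counts: dict[str, int]) -> float:
    """Estimate P(positive) from raw measurement counts.

    Hypothetical layout: every bit except the last must read '0'
    (successful post-selection); the last bit is the sentence qubit.
    """
    kept = {bits: n for bits, n in counts.items()
            if set(bits[:-1]) <= {"0"}}
    total = sum(kept.values())
    if total == 0:
        return 0.5  # no shot survived post-selection
    positive = sum(n for bits, n in kept.items() if bits[-1] == "1")
    return positive / total

counts = {"000": 400, "001": 600, "010": 100, "111": 50}
print(sentiment_probability(counts))  # 600 / (400 + 600) = 0.6
```

The discarded shots are the price of implementing cups by measurement; longer sentences with more cups retain a smaller fraction of the shot budget.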

The structural advantage of DisCoCat over bag-of-words baselines comes from how grammatical relations create entanglement between word circuits. In a bag-of-words model, “the dog bit the man” and “the man bit the dog” have identical representations because word order is ignored. In the DisCoCat circuit, the transitive-verb circuit is entangled with the subject and object noun circuits in a grammatically specified way, so the contraction order encodes who bit whom. This relational structure maps naturally onto quantum entanglement, and the sentence-type output qubit reflects the composed meaning, including argument structure.

On the 100-sentence sentiment dataset (balanced positive/negative reviews), the DisCoCat classifier achieved 87% test accuracy against an 82% bag-of-words baseline (logistic regression on the same word embeddings), demonstrating that grammatical structure carries discriminative signal which the quantum circuit explicitly preserves. Circuit sizes ranged from 5 qubits for short sentences (“film was great”) to 20 qubits for longer, more complex sentences, all within H1-2’s 20-qubit limit.
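The word-order point can be checked numerically. In this toy comparison (random illustrative embeddings, not the experiment's), the bag-of-words sum is identical for the two sentences, while a DisCoCat-style verb-tensor contraction distinguishes them:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy 4-dim word embeddings (illustrative, not the experiment's vectors)
emb = {w: rng.normal(size=4) for w in ["the", "dog", "bit", "man"]}

def bag_of_words(words):
    """Order-insensitive sum of word vectors: grammar is discarded."""
    return sum(emb[w] for w in words)

a = bag_of_words(["the", "dog", "bit", "the", "man"])
b = bag_of_words(["the", "man", "bit", "the", "dog"])
print(np.allclose(a, b))  # True: who bit whom is lost

# A DisCoCat-style verb tensor, indexed (subject, sentence, object),
# keeps the two arguments on distinct wires.
verb = rng.normal(size=(4, 2, 4))
s1 = np.einsum("i,isj,j->s", emb["dog"], verb, emb["man"])
s2 = np.einsum("i,isj,j->s", emb["man"], verb, emb["dog"])
print(np.allclose(s1, s2))  # False: argument structure preserved
```

The quantum circuit realizes the same asymmetric contraction, with the verb's wires entangled to subject and object in the order dictated by the parse.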