• Machine Learning

Baidu Quantum Natural Language Processing with PaddlePaddle Quantum

Baidu

Baidu developed PaddlePaddle Quantum (Paddle Quantum), a quantum ML framework built on its PaddlePaddle deep learning library, and applied it to Chinese text classification using DisCoCat-style sentence encoding on a Weibo sentiment dataset.

Key Outcome
Quantum NLP model achieved 83% accuracy on binary Chinese sentiment classification vs. 89% for a fine-tuned classical BERT; Paddle Quantum framework adopted by 5,000+ users; Qian Shi 10-qubit processor validated quantum ML workflows.

The Problem

Chinese natural language processing presents unique challenges: word segmentation is non-trivial, character-level semantics differ from those of alphabetic languages, and social media text (as on Weibo) is dense with abbreviations and code-switching. Baidu, which operates China’s largest search engine, processes billions of Chinese-language queries daily. Its quantum computing research group asked whether quantum circuits could encode sentence structure in ways that offer computational advantages over classical transformer architectures like BERT.

The theoretical motivation comes from DisCoCat (Distributional Compositional Categorical) grammar, which represents sentence meaning using tensor network contractions. These contractions map naturally to quantum circuits: words become quantum states and grammatical composition becomes entangling gates. Baidu’s PaddlePaddle Quantum framework (Paddle Quantum) was built to make this pipeline accessible to researchers without deep quantum hardware expertise.
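
The tensor-contraction view can be illustrated with a toy numpy sketch (the dimension and random tensors are illustrative, not Baidu's model): in DisCoCat, a transitive verb is a rank-3 tensor, and contracting it with subject and object vectors yields the sentence-meaning vector.

```python
import numpy as np

# DisCoCat toy example: sentence meaning as a tensor contraction.
d = 4                                   # toy noun-space dimension (assumption)
rng = np.random.default_rng(42)

subject = rng.standard_normal(d)        # noun vector
verb = rng.standard_normal((d, d, d))   # verb tensor: (subject, sentence, object)
obj = rng.standard_normal(d)            # noun vector

# Contract the subject index i and object index k, leaving sentence index j
sentence = np.einsum('i,ijk,k->j', subject, verb, obj)
print(sentence.shape)  # (4,)
```

It is exactly this contraction pattern that maps onto a quantum circuit: the word tensors become prepared states and the contractions become entangling operations plus measurements.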

Paddle Quantum Architecture

Paddle Quantum integrates quantum circuits as differentiable layers in PaddlePaddle’s computational graph. A quantum circuit layer accepts classical input features, encodes them into qubit states via parameterized rotations, applies entangling layers, and returns measurement expectation values as output features. This allows hybrid quantum-classical models to be trained end-to-end with standard gradient descent.

import paddle
import numpy as np
from paddle_quantum.circuit import UAnsatz  # legacy UAnsatz API (newer releases use paddle_quantum.ansatz.Circuit)

# Paddle Quantum: quantum NLP sentence encoder (DisCoCat-style)
# Each word is encoded into 2 qubits; composition uses CNOT entanglement

N_QUBITS = 4  # 2 words x 2 qubits per word

def build_sentence_circuit(word1_params, word2_params, comp_params):
    """
    Encode two words into 4-qubit state and apply compositional layer.
    word_params: rotation angles for Rx, Ry, Rz on 2 qubits per word
    comp_params: angles for the composition (entangling) layer
    """
    cir = UAnsatz(N_QUBITS)

    # Word 1 encoding on qubits 0, 1
    cir.rx(word1_params[0], 0)
    cir.ry(word1_params[1], 0)
    cir.rx(word1_params[2], 1)
    cir.ry(word1_params[3], 1)

    # Word 2 encoding on qubits 2, 3
    cir.rx(word2_params[0], 2)
    cir.ry(word2_params[1], 2)
    cir.rx(word2_params[2], 3)
    cir.ry(word2_params[3], 3)

    # Composition layer: entangle word representations
    cir.cnot([0, 2])
    cir.cnot([1, 3])
    cir.ry(comp_params[0], 2)
    cir.ry(comp_params[1], 3)

    return cir

# Simulate forward pass for a 2-word sentence
word1 = paddle.to_tensor(np.random.randn(4).astype("float32"))
word2 = paddle.to_tensor(np.random.randn(4).astype("float32"))
comp  = paddle.to_tensor(np.random.randn(2).astype("float32"))

cir = build_sentence_circuit(word1, word2, comp)
state = cir.run_state_vector()  # simulate the circuit; returns the final state vector

# Measure Z expectation on qubit 0 as the sentiment score
# (legacy UAnsatz API: the observable is a [coefficient, Pauli-string] list)
H = [[1.0, 'z0']]
sentiment_logit = cir.expecval(H)
print(f"Sentiment logit (Z0 expectation): {float(sentiment_logit):.4f}")

Chinese Sentiment Classification on Weibo

The Weibo sentiment dataset contains short posts labeled positive or negative. Preprocessing involved jieba word segmentation, removing stopwords, and mapping each post to a fixed two-word representation (subject and predicate) for compatibility with the 4-qubit DisCoCat circuit. Longer sentences were handled by a hierarchical composition that fed intermediate circuit outputs back as word embeddings for the next composition step.
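
The hierarchical composition amounts to a left fold over the word sequence. The sketch below makes that data flow concrete; `compose_pair` is a hypothetical classical stand-in for the 4-qubit circuit (the real model would return measurement expectation values of the same dimension as a word embedding), not Baidu's actual implementation.

```python
import numpy as np

def compose_pair(emb_a, emb_b, params):
    # Stand-in for the 4-qubit DisCoCat circuit: maps two word embeddings
    # plus composition parameters to a new embedding of the same size.
    return np.tanh(params @ np.concatenate([emb_a, emb_b]))

def encode_sentence(word_embs, params):
    # Left fold: the running representation is fed back as the "word"
    # input for the next composition step.
    acc = word_embs[0]
    for emb in word_embs[1:]:
        acc = compose_pair(acc, emb, params)
    return acc

rng = np.random.default_rng(0)
words = [rng.standard_normal(4) for _ in range(5)]   # 5-word sentence
params = rng.standard_normal((4, 8))                 # shared composition weights
sentence_vec = encode_sentence(words, params)
print(sentence_vec.shape)  # (4,)
```

Because the same fixed-size circuit is reused at every step, sentences of any length fit the 4-qubit template, at the cost of compressing everything through one small bottleneck.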

Training used PaddlePaddle’s Adam optimizer, with gradients computed via the parameter-shift rule on both a simulator and the Qian Shi 10-qubit superconducting processor. The quantum model was compared against a classical BERT-base-Chinese model fine-tuned on the same train/test splits. The gap (83% vs 89%) reflects both the limited expressiveness of 4-qubit circuits and the structural constraint of forcing variable-length text into fixed circuit templates.
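
The parameter-shift rule is easy to verify on a one-qubit case: for a gate generated by a Pauli operator, the exact gradient of an expectation value comes from two evaluations shifted by ±π/2. A minimal numpy check, independent of Paddle Quantum:

```python
import numpy as np

def z_expectation(theta):
    # <Z> after Rx(theta) applied to |0>; analytically this equals cos(theta)
    state = np.array([np.cos(theta / 2), -1j * np.sin(theta / 2)])
    z = np.array([[1, 0], [0, -1]])
    return np.real(state.conj() @ z @ state)

def parameter_shift_grad(f, theta, shift=np.pi / 2):
    # Exact gradient for gates generated by Pauli operators:
    # d<Z>/dtheta = (f(theta + pi/2) - f(theta - pi/2)) / 2
    return 0.5 * (f(theta + shift) - f(theta - shift))

theta = 0.7
grad = parameter_shift_grad(z_expectation, theta)
print(grad, -np.sin(theta))  # the two values agree: d cos(theta)/dtheta = -sin(theta)
```

Unlike finite differences, the shifted evaluations give the exact analytic gradient, which is what makes the rule practical on noisy hardware such as Qian Shi.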

QNNB Benchmark and Broader Impact

Alongside the NLP application, Baidu released the Quantum Neural Network Benchmark (QNNB), a suite of standardized tasks for evaluating quantum ML models: binary classification on structured datasets, regression on physics simulation outputs, and generative modeling of quantum state distributions. QNNB fills a gap in the field where ad hoc benchmarks make cross-paper comparisons unreliable.

Paddle Quantum’s integration with PaddlePaddle’s ecosystem (automatic differentiation, GPU acceleration for simulation, distributed training) lowered the barrier for Chinese AI researchers to enter quantum ML. The framework’s adoption by over 5,000 users within a year reflects China’s strategic investment in domestic quantum software stacks that are not dependent on IBM Qiskit or Google Cirq infrastructure.