• Machine Learning

Zapata AI: Generative Quantum Models for Industrial Data

Zapata AI

Zapata AI developed quantum generative models using its Orquestra platform to synthesize tabular data for financial risk modelling and supply chain simulation, outperforming classical VAEs on small datasets.

Key Outcome
Quantum Born Machine models generated synthetic datasets with 22% higher statistical fidelity than classical VAE baselines for financial time-series data with fewer than 500 training samples.

The Challenge

Industrial machine learning applications routinely encounter a painful constraint: the most valuable datasets are also the smallest. Financial institutions collecting rare credit events, manufacturers logging infrequent defect signatures, and logistics operators tracking black-swan disruptions all face the same problem: not enough real data to train robust models. Synthetic data generation is an obvious solution, but classical generative models such as variational autoencoders (VAEs) and GANs require substantial training data to learn meaningful distributions. They often fail or overfit precisely in the low-data regime where synthetic augmentation is most needed. Zapata AI investigated whether quantum generative models could learn richer representations from fewer examples by exploiting the exponentially large Hilbert space accessible to quantum circuits.

The Quantum Approach

Zapata’s team implemented Quantum Born Machines (QBMs) within the Orquestra workflow platform. A QBM represents a probability distribution implicitly through the squared amplitudes of a parameterized quantum circuit’s output state. Training minimizes the maximum mean discrepancy (MMD) between the model distribution and the empirical training distribution, a loss function well suited to small-sample settings because it does not require density estimation. Experiments ran on both IBM Quantum and IonQ hardware, with Orquestra managing circuit compilation, backend routing, and result aggregation across providers.
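
The Born-rule mapping at the heart of a QBM (output probabilities as the squared amplitudes of the circuit's output state) can be sketched in plain NumPy; the two-qubit circuit and angles below are illustrative stand-ins, not Zapata's ansatz:

```python
import numpy as np

# Single-qubit RY rotation matrix
def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

# Two-qubit parameterized state: RY on each qubit, then a CNOT entangler
theta0, theta1 = 0.7, 1.9  # illustrative parameter values
state = np.kron(ry(theta0), ry(theta1)) @ np.array([1.0, 0.0, 0.0, 0.0])
cnot = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])
state = cnot @ state

# Born rule: the model distribution over bitstrings is |amplitude|^2
probs = np.abs(state) ** 2
print({f"{i:02b}": round(p, 4) for i, p in enumerate(probs)})
```

Training nudges the rotation angles so that this implicit distribution matches the data; the Orquestra listing below wraps the same idea in higher-level interfaces.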

from orquestra.quantum.circuits import Circuit, CNOT, RZ, RY
from orquestra.quantum.backends import IBMQBackend
from orquestra.ml.generative import QuantumBornMachine
from orquestra.ml.losses import MaximumMeanDiscrepancy
import numpy as np

# Load small financial time-series dataset (< 500 samples, 8 features)
training_data = np.load("credit_default_timeseries.npy")  # shape (480, 8)

# Define 8-qubit QBM ansatz: an RY initialization layer followed by
# alternating CNOT-ladder entangling layers and RZ rotation layers.
# The random angles are initial parameter values; the QBM optimizer
# updates them during training.
def build_ansatz(n_qubits: int, depth: int) -> Circuit:
    circuit = Circuit()
    # Initialization layer of single-qubit RY rotations
    for q in range(n_qubits):
        circuit += RY(np.random.uniform(0, np.pi))(q)
    # Entangling blocks: nearest-neighbour CNOT ladder, then RZ rotations
    for d in range(depth):
        for q in range(n_qubits - 1):
            circuit += CNOT(q, q + 1)
        for q in range(n_qubits):
            circuit += RZ(np.random.uniform(-np.pi, np.pi))(q)
    return circuit

backend = IBMQBackend(device_name="ibmq_mumbai", n_shots=4096)
mmd_loss = MaximumMeanDiscrepancy(kernel="rbf", sigma=1.0)

qbm = QuantumBornMachine(
    ansatz_factory=lambda: build_ansatz(n_qubits=8, depth=4),
    loss_fn=mmd_loss,
    backend=backend,
    optimizer="Adam",
    learning_rate=0.02,
    n_epochs=150,
)

qbm.fit(training_data)
# Draw synthetic records from the trained Born distribution
synthetic_samples = qbm.sample(n_samples=2000)
print(f"Generated {len(synthetic_samples)} synthetic records")
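
The MMD objective minimized above compares two sample sets through kernel averages and never estimates a density, which is what makes it workable with a few hundred samples. A minimal NumPy version with an RBF kernel (sigma = 1.0, matching the listing) looks like:

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel values between rows of x and rows of y
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    # Biased squared-MMD estimate: E[k(x,x')] - 2 E[k(x,y)] + E[k(y,y')]
    return (rbf_kernel(x, x, sigma).mean()
            - 2 * rbf_kernel(x, y, sigma).mean()
            + rbf_kernel(y, y, sigma).mean())

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 8))
close = rng.normal(0.0, 1.0, size=(200, 8))  # same distribution as real
far = rng.normal(2.0, 1.0, size=(200, 8))    # shifted distribution
print(mmd2(real, close), mmd2(real, far))    # the shifted set scores higher
```

A well-matched generator drives this quantity toward zero, which is the gradient signal the QBM trainer follows.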

Cross-hardware execution allowed Zapata to compare QBM fidelity across gate-based (IBM) and trapped-ion (IonQ) backends, finding that IonQ’s lower noise floor improved fidelity for deeper circuits while IBM provided faster iteration during hyperparameter search.
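
The cross-backend comparison boils down to scoring each provider's empirical output histogram against a common reference distribution. One backend-agnostic way to sketch it is with total variation distance; the shot counts below are invented for illustration, not measured hardware data:

```python
def tv_distance(counts_a, counts_b):
    """Total variation distance between two shot-count histograms."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    keys = set(counts_a) | set(counts_b)
    return 0.5 * sum(abs(counts_a.get(k, 0) / total_a
                         - counts_b.get(k, 0) / total_b) for k in keys)

# Illustrative histograms: a noiseless reference vs. two noisy backends
ideal = {"00": 2048, "11": 2048}
backend_counts = {
    "gate_based":  {"00": 1900, "01": 120, "10": 130, "11": 1946},
    "trapped_ion": {"00": 2010, "01": 35, "10": 30, "11": 2021},
}
for name, counts in backend_counts.items():
    print(name, round(tv_distance(ideal, counts), 4))
```

In this toy example the trapped-ion histogram sits closer to the reference, mirroring the lower noise floor the study reports for deeper circuits.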

Results and Implications

On financial credit-default time-series datasets with fewer than 500 training samples, QBMs generated synthetic data with 22% better statistical fidelity, measured as lower Wasserstein distance and MMD against held-out real data, than classical VAE baselines trained on the same data. The gap narrowed as training set size grew beyond 2,000 samples, consistent with the theoretical expectation that quantum models offer the most relative advantage in data-scarce settings.
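
The Wasserstein side of the fidelity evaluation can be sketched with SciPy's one-dimensional `wasserstein_distance` applied per feature and averaged; the feature-wise aggregation here is one common convention, as the study's exact protocol is not specified:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def mean_feature_wasserstein(synthetic, held_out):
    # Average 1-D Wasserstein distance across feature columns
    return float(np.mean([
        wasserstein_distance(synthetic[:, j], held_out[:, j])
        for j in range(held_out.shape[1])
    ]))

rng = np.random.default_rng(1)
held_out = rng.normal(0.0, 1.0, size=(100, 8))
good_synth = rng.normal(0.0, 1.0, size=(2000, 8))  # matches the real distribution
poor_synth = rng.normal(0.5, 1.5, size=(2000, 8))  # mismatched distribution
print(mean_feature_wasserstein(good_synth, held_out))
print(mean_feature_wasserstein(poor_synth, held_out))
```

Lower scores mean higher fidelity, so "22% better" in the results above corresponds to distances 22% closer to zero.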

Downstream models trained on QBM-augmented datasets showed measurably better calibration on rare event prediction tasks, which is the metric that matters most for credit risk applications. Zapata shared these results with two financial services partners who integrated the QBM workflow into their model development pipelines for low-frequency event modelling. The study highlighted a pragmatic near-term use case for quantum ML: not replacing classical generative models across the board, but selectively deploying quantum approaches where data scarcity creates a genuine performance gap.
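
The Brier score is one simple way to quantify the kind of calibration improvement described above (lower is better); the forecasts and labels below are illustrative, not the study's data:

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return float(np.mean((np.asarray(probs) - np.asarray(outcomes)) ** 2))

# Illustrative rare-event labels (about 5% positive rate) and two forecasters
rng = np.random.default_rng(2)
outcomes = (rng.random(1000) < 0.05).astype(float)
sharp_forecast = np.where(outcomes == 1, 0.6, 0.04)  # separates the classes well
flat_forecast = np.where(outcomes == 1, 0.2, 0.15)   # barely separates them
print(brier_score(sharp_forecast, outcomes), brier_score(flat_forecast, outcomes))
```

For rare-event work, a metric like this on held-out events is a more decision-relevant yardstick than raw accuracy, which is why calibration is the headline number for credit risk.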