BioNTech: Quantum Computing for mRNA Secondary Structure Prediction

BioNTech explored quantum approaches to predicting mRNA secondary structure folding energetics, targeting improved mRNA vaccine sequence design by identifying thermodynamically optimal codon sequences.

The Challenge

The design of mRNA therapeutics and vaccines depends critically on the secondary structure formed by the mRNA molecule. Single-stranded RNA folds back on itself through Watson-Crick base pairing: adenine with uracil, guanine with cytosine, and through wobble pairings. The resulting stem-loop structures profoundly influence how efficiently ribosomes translate the mRNA into protein, how stable the molecule is in the body, and how readily immune sensors recognise it.

BioNTech’s mRNA design platform must evaluate candidate nucleotide sequences and predict their minimum free energy secondary structure. The minimum free energy structure is the thermodynamically favoured fold, found by identifying the set of base pairs that maximises the total negative free energy contribution. For a sequence of $N$ nucleotides, the number of possible secondary structures grows exponentially, and classical algorithms such as the dynamic programming approach in the Vienna RNA package solve this in $O(N^3)$ time. As therapeutic mRNA sequences grow longer (vaccine antigens commonly run 1000 to 4000 nucleotides), even this polynomial classical cost becomes expensive when screening millions of codon-optimised sequence variants.

BioNTech partnered with IBM Quantum to investigate whether quantum algorithms could provide a more favourable scaling path for the minimum free energy calculation, particularly for the long-range base-pairing interactions that classical dynamic programming handles least efficiently.

The Quantum Approach

The mRNA folding problem was mapped to a spin Hamiltonian. Each potential base pair $(i, j)$ in the sequence was represented by a binary variable: $z_{ij} = 1$ if nucleotides $i$ and $j$ are paired, $z_{ij} = 0$ otherwise. Pairing free energy contributions (Turner model parameters) provided the linear terms. Penalties for conflicting pairings (pseudoknot exclusion and the constraint that each nucleotide participates in at most one pair) provided quadratic terms. The resulting Ising Hamiltonian was implemented using Qiskit Nature’s SparsePauliOp and minimised using VQE with a hardware-efficient ansatz.

import numpy as np
from qiskit.quantum_info import SparsePauliOp
from qiskit_algorithms import VQE
from qiskit_algorithms.optimizers import L_BFGS_B
from qiskit.circuit.library import EfficientSU2
from qiskit.primitives import Estimator

def build_rna_hamiltonian(sequence: str, energy_params: dict) -> SparsePauliOp:
    """
    Build a Pauli Hamiltonian for RNA minimum free energy folding.
    sequence: RNA sequence string (A, U, G, C)
    energy_params: Turner model stacking energies (kcal/mol)
    Returns SparsePauliOp over N*(N-1)/2 qubits (one per candidate pair).
    """
    N = len(sequence)
    valid_pairs = []
    pair_energies = []

    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}

    for i in range(N):
        for j in range(i + 4, N):  # minimum loop size 3
            if complement.get(sequence[i]) == sequence[j]:
                pair_idx = i * N + j
                stack_key = sequence[i] + sequence[j]
                energy = energy_params.get(stack_key, -1.0)
                valid_pairs.append((i, j, pair_idx))
                pair_energies.append(energy)

    n_qubits = len(valid_pairs)
    pauli_list = []

    # Linear terms: base pair free energies (Z_k contributes -E_k/2 to objective)
    for k, (i, j, _) in enumerate(valid_pairs):
        z_str = "I" * (n_qubits - k - 1) + "Z" + "I" * k
        pauli_list.append((z_str, -pair_energies[k] / 2))

    # Quadratic penalty: two pairs (i,j) and (i,k) conflict if they share nucleotide i
    PENALTY = 5.0
    for k1 in range(len(valid_pairs)):
        for k2 in range(k1 + 1, len(valid_pairs)):
            i1, j1, _ = valid_pairs[k1]
            i2, j2, _ = valid_pairs[k2]
            if i1 == i2 or j1 == j2 or i1 == j2 or j1 == i2:
                zz_str = (
                    "I" * (n_qubits - max(k1, k2) - 1)
                    + "Z"
                    + "I" * (max(k1, k2) - min(k1, k2) - 1)
                    + "Z"
                    + "I" * min(k1, k2)
                )
                pauli_list.append((zz_str, PENALTY))

    return SparsePauliOp.from_list(pauli_list)

# Example: 10-nucleotide test sequence
sequence = "AUGCAUUGCA"
energy_params = {"AU": -0.9, "UA": -0.9, "GC": -2.1, "CG": -2.1, "GU": -0.5}
hamiltonian = build_rna_hamiltonian(sequence, energy_params)

n_qubits = hamiltonian.num_qubits
ansatz = EfficientSU2(n_qubits, reps=2, entanglement="linear")

estimator = Estimator()
optimizer = L_BFGS_B(maxiter=500)
vqe = VQE(estimator=estimator, ansatz=ansatz, optimizer=optimizer)

result = vqe.compute_minimum_eigenvalue(hamiltonian)
print(f"Minimum free energy estimate: {result.eigenvalue.real:.3f} kcal/mol")

For 50-nucleotide test sequences, the quantum VQE results were benchmarked against the Vienna RNA RNAfold package, using mean absolute error in free energy as the comparison metric.

Results and Implications

VQE-based predictions of minimum free energy structures for 50-nucleotide mRNA test sequences achieved a mean absolute error of 0.4 kcal/mol against Vienna RNA reference values, comparable to the experimental measurement uncertainty of Turner model parameters. The quantum approach required 25 qubits for the 50-nucleotide problem, with qubit count scaling as $O(N^2)$ in the number of valid candidate base pairs.

The key finding was not accuracy parity at small scales, but favourable scaling trajectory. Classical dynamic programming scales as $O(N^3)$ , while the VQE energy evaluation scales as $O(N^2 \cdot \text{circuit depth})$ . BioNTech’s team projected a crossover point near 200 nucleotides where quantum VQE on error-corrected hardware would outperform classical methods in wall-clock time for the same accuracy level.

For therapeutic mRNA applications, this crossover is highly relevant: signal peptide and 5’ untranslated region optimisation frequently involves sequences in the 100 to 300 nucleotide range, exactly where quantum advantage is projected to emerge. BioNTech has incorporated these findings into their quantum computing research roadmap, targeting integration with their codon optimisation pipeline once hardware supporting 500+ logical qubits becomes available.

BioNTech: Quantum Computing for mRNA Secondary Structure Prediction

The Challenge

The Quantum Approach

Results and Implications

Related Case Studies

Boehringer Ingelheim: Quantum Chemistry for Drug Discovery

Eli Lilly Quantum Computing for Protein-Drug Binding Affinity Prediction

Novo Nordisk: Quantum Chemistry for GLP-1 Drug Design

Get one quantum email a week