Quantum Chemistry on Quantum Computers: The Basics

Why Quantum Chemistry Is Hard Classically

The electronic structure of a molecule is governed by the Schrodinger equation. For a molecule with N electrons, the wavefunction is a function of 3N spatial coordinates. Solving the Schrodinger equation exactly requires representing this wavefunction in a Hilbert space that grows exponentially with the number of electrons.

The gold standard exact method is full configuration interaction (FCI). FCI considers every possible way to distribute N electrons among M spin-orbitals. The number of determinants in the FCI expansion is the binomial coefficient C(M, N), which grows combinatorially.

The Scaling Wall: Concrete Numbers

To see why FCI becomes intractable, consider three molecules at increasing size:

Molecule	Electrons	Spin-orbitals (STO-3G)	FCI determinants C(M, N)	Feasible classically?
H2	2	4	6	Yes, trivially
N2	14	20	~38,760	Yes, manageable
N2	14	56 (cc-pVDZ)	~1.0 x 10^12	Barely, with effort
FeMoco	54	108 (minimal CAS)	~1.4 x 10^31	No, utterly impossible

For H2 with 2 electrons in 4 spin-orbitals, FCI requires C(4,2) = 6 Slater determinants. A classical computer handles this in microseconds. For N2 in a cc-pVDZ basis set, the same calculation requires roughly 10^12 determinants, pushing the limits of even the largest supercomputers. For the FeMoco cluster (the active site of nitrogenase, the enzyme that fixes atmospheric nitrogen), C(108,54) produces roughly 10^31 determinants. Storing a single complex amplitude for each determinant would require more memory than exists on Earth, by many orders of magnitude.

A quantum computer with roughly 108 qubits (one per spin-orbital) can represent all 10^31 amplitudes simultaneously through superposition. This is the fundamental promise of quantum chemistry simulation: the Hilbert space that overwhelms classical memory maps naturally onto a quantum register.

What Classical Approximations Sacrifice

Because FCI is intractable for interesting molecules, classical chemists rely on approximations:

Density functional theory (DFT) replaces the full wavefunction with the electron density, reducing the problem from 3N dimensions to 3 dimensions. DFT scales as O(N^3) and handles hundreds of atoms, but it struggles with strongly correlated systems and its accuracy depends on the choice of exchange-correlation functional.
Coupled cluster CCSD(T) truncates the correlation expansion at singles, doubles, and perturbative triples. It gives excellent results for weakly correlated molecules but scales as O(N^7) and fails badly when multiple electronic configurations contribute equally to the ground state.
CASSCF (complete active space self-consistent field) performs FCI within a small subset of orbitals. It captures strong correlation in the active space but misses dynamic correlation outside it.

Quantum computers do not need these approximations. They can, in principle, represent the exact FCI wavefunction directly.

Second Quantization

Rather than working with explicit spatial wavefunctions of 3N coordinates, second quantization describes chemistry in terms of fermionic creation and annihilation operators. This formalism is compact, algebraically convenient, and maps cleanly onto qubit operations.

Creation and Annihilation Operators

The creation operator a_p^dagger adds an electron to spin-orbital p. If orbital p is already occupied, the result is zero (the Pauli exclusion principle). The annihilation operator a_p removes an electron from spin-orbital p. If orbital p is empty, the result is also zero.

These operators obey the fermionic anticommutation relations:

{a_p, a_q^dagger} = a_p a_q^dagger + a_q^dagger a_p = delta_{pq}
{a_p, a_q} = 0
{a_p^dagger, a_q^dagger} = 0

The anticommutation ensures that swapping two electrons introduces a minus sign, enforcing the antisymmetry of the fermionic wavefunction.

The Electronic Hamiltonian

In second quantization, the molecular electronic Hamiltonian takes the form:

H = sum_{pq} h_{pq} a_p^dagger a_q + (1/2) sum_{pqrs} g_{pqrs} a_p^dagger a_q^dagger a_r a_s

The coefficients h_{pq} and g_{pqrs} are computed classically from the molecular geometry and basis set. Here is what they represent physically:

h_{pq} (one-electron integrals) describe a single electron at a time. They contain two contributions: the kinetic energy of electron p and the attraction between electron p and all the atomic nuclei. These terms capture the behavior of each electron moving independently in the field of the nuclei.
g_{pqrs} (two-electron integrals) describe pairs of electrons interacting simultaneously. They represent the Coulomb repulsion between electron p at one position and electron q at another. These terms are responsible for electron correlation, the central challenge in quantum chemistry.

The Schrodinger equation H|psi> = E|psi> finds the lowest-energy arrangement of all electrons simultaneously. Because the two-electron terms couple every pair of electrons, the solution requires accounting for all electrons at once. This is what forces the wavefunction into the exponentially large Hilbert space: you cannot factorize it into independent single-electron pieces when electron-electron repulsion matters.

Basis Sets: Where the Integrals Come From

The one-electron integrals h_{pq} and two-electron integrals g_{pqrs} depend on the choice of basis set. A basis set is a collection of mathematical functions (typically Gaussian functions centered on atoms) used to approximate the spatial part of each molecular orbital.

Common Basis Sets

STO-3G is a minimal basis set that uses 1 basis function per core orbital and 1 per valence orbital, with each function approximated by 3 Gaussians. It is fast but inaccurate, useful mainly for testing and debugging.
6-31G is a split-valence basis that uses 2 functions for each valence orbital (one contracted, one diffuse). This gives more flexibility for describing chemical bonds and improves energy predictions noticeably over STO-3G.
cc-pVDZ (correlation-consistent polarized valence double-zeta) adds polarization functions (d-functions on heavy atoms, p-functions on hydrogen) to capture angular correlation. It is the standard benchmark basis for correlated methods.

Basis Size Determines Qubit Count

The number of spin-orbitals in a calculation grows directly with the basis set size. Under the Jordan-Wigner mapping, each spin-orbital becomes one qubit. Larger basis sets give more accurate chemistry but require more qubits:

Molecule	Basis set	Spatial orbitals	Spin-orbitals	Qubits (JW)
H2	STO-3G	2	4	4
H2	cc-pVDZ	5	10	10
LiH	STO-3G	6	12	12
H2O	STO-3G	7	14	14
H2O	cc-pVDZ	24	48	48
N2	cc-pVDZ	28	56	56

This table reveals a practical tension: accurate chemistry demands large basis sets, but larger basis sets demand more qubits. The active space approximation (discussed below) addresses this by selecting only the most chemically relevant orbitals.

Fermion-to-Qubit Mappings

Quantum computers operate on qubits, not fermions. A mapping translates fermionic creation and annihilation operators into qubit operators (strings of Pauli matrices X, Y, Z, and I). Different mappings produce different Pauli representations of the same Hamiltonian.

Jordan-Wigner Transformation

The Jordan-Wigner (JW) mapping assigns spin-orbital p to qubit p directly. The occupation of each orbital is stored in the corresponding qubit: |1> means occupied, |0> means empty.

The creation operator maps to:

a_p^dagger = (1/2)(X_p - iY_p) tensor Z_{p-1} tensor ... tensor Z_0

The string of Z operators before qubit p enforces fermionic antisymmetry. When you swap two fermions, the wavefunction picks up a minus sign. The Z gates track the parity of all lower-indexed orbitals, reproducing this sign structure.

Worked Example: H2 with STO-3G

H2 in the STO-3G basis has 2 spatial orbitals, giving 4 spin-orbitals. Under Jordan-Wigner, these map to 4 qubits:

Qubit 0: spin-orbital 0 (first spatial orbital, alpha spin)
Qubit 1: spin-orbital 1 (first spatial orbital, beta spin)
Qubit 2: spin-orbital 2 (second spatial orbital, alpha spin)
Qubit 3: spin-orbital 3 (second spatial orbital, beta spin)

The HOMO (highest occupied molecular orbital) corresponds to the first spatial orbital (qubits 0 and 1). The LUMO (lowest unoccupied molecular orbital) corresponds to the second spatial orbital (qubits 2 and 3). The HOMO-LUMO gap is the energy difference between these, and it determines the molecule’s chemical reactivity.

After applying the Jordan-Wigner transformation and simplifying, the H2 Hamiltonian becomes a sum of Pauli terms:

H = c0 * IIII + c1 * IIIZ + c2 * IIZI + c3 * IZII + c4 * ZIII
    + c5 * IIZZ + c6 * IZIZ + c7 * ZIZI + c8 * IZZZ
    + c9 * (XXII + YYII) + c10 * (XXYY + YYXX)
    + c11 * (XXZZ + YYZZ) + ...

The exact coefficients depend on the molecular integrals, but the structure is instructive:

IIII is a constant energy offset (nuclear repulsion plus a portion of the electronic energy).
Single-Z terms (like IIIZ) measure the occupation of individual orbitals. They contribute energy based on whether an orbital is occupied or empty.
ZZ terms (like IIZZ) measure pairwise occupation. They capture the classical part of electron-electron repulsion.
XX + YY terms are the hopping terms. They represent an electron moving from one orbital to another, which is the mechanism of electron correlation. These terms create entanglement in the qubit state, and they are the reason a classical product state (Hartree-Fock) cannot capture the exact ground state.

The XX + YY hopping terms are physically essential. Without them, the Hamiltonian would be diagonal in the computational basis and trivially solvable classically. Electron correlation, the phenomenon that makes quantum chemistry hard, lives in these off-diagonal terms.

Bravyi-Kitaev Transformation

The Bravyi-Kitaev (BK) mapping encodes both occupation and parity information using a balanced binary tree structure. Each qubit stores a mix of occupation and partial parity data, distributing the fermionic sign information more efficiently.

The key advantage is in the length of the Pauli strings. For N spin-orbitals:

Jordan-Wigner produces Pauli strings of length up to O(N) for hopping terms, because the Z-string must span all lower-indexed qubits.
Bravyi-Kitaev produces Pauli strings of length O(log N), because the tree structure localizes the parity information.

In practice, for a molecule with 50 spin-orbitals, Jordan-Wigner hopping terms can produce Pauli strings of length 50, while Bravyi-Kitaev keeps them to length ~6. Shorter Pauli strings translate directly to fewer CNOT gates per term, which matters enormously on noisy hardware where every additional gate introduces error. For small molecules like H2, the difference is negligible, but for larger systems BK is often the better choice.

The Variational Quantum Eigensolver (VQE)

VQE is the primary near-term algorithm for quantum chemistry. It combines a parameterized quantum circuit (the ansatz) with a classical optimizer to find the ground state energy of a molecular Hamiltonian.

The Variational Principle

The mathematical foundation of VQE is the variational principle. It states that for any normalized quantum state |psi(theta)>:

E(theta) = <psi(theta)|H|psi(theta)> >= E_0

where E_0 is the true ground state energy. Equality holds if and only if |psi(theta)> is the exact ground state. This means that any trial wavefunction gives an energy that is an upper bound on the true answer. The closer the trial state is to the true ground state, the lower the energy.

VQE exploits this by parameterizing a quantum circuit with angles theta and iteratively adjusting those angles to minimize E(theta). The minimum energy found is the best approximation to the ground state energy within the family of states the ansatz can represent.

The Hartree-Fock Initial State

Every VQE calculation starts from an initial state. The standard choice is the Hartree-Fock (HF) state, which is the best single-determinant approximation to the ground state. In the Hartree-Fock picture, each electron occupies an independent orbital with no explicit electron-electron correlation.

For H2 with 2 electrons and 4 spin-orbitals, the Hartree-Fock state places both electrons in the two lowest spin-orbitals:

|HF> = |1100>    (qubits 0 and 1 occupied, qubits 2 and 3 empty)

Note: In Qiskit’s little-endian qubit ordering, this state is prepared by applying X gates to qubits 0 and 1. The Hartree-Fock state is a product state with zero entanglement between qubits. It captures roughly 99% of the total electronic energy for many molecules but misses the correlation energy, which is the chemically important part that determines bond breaking, reaction barriers, and spectroscopic properties.

The UCCSD Ansatz

The most common ansatz for quantum chemistry is UCCSD: Unitary Coupled Cluster with Singles and Doubles.

Coupled cluster is a classical method that systematically adds correlated corrections to the Hartree-Fock reference. It parameterizes the wavefunction as:

|CC> = exp(T) |HF>

where T is the cluster operator. The operator T = T1 + T2 contains single excitations (T1, promoting one electron from an occupied to a virtual orbital) and double excitations (T2, promoting two electrons simultaneously).

Singles capture orbital relaxation: if the Hartree-Fock orbitals are not quite optimal, singles adjust the orbital shapes.
Doubles capture pairwise electron correlation: two electrons coordinating their motion to avoid each other. This is the dominant source of correlation energy in most molecules.

Classical coupled cluster uses the non-unitary operator exp(T), which is efficient to compute classically but cannot be directly implemented as a quantum circuit. UCCSD makes the operator unitary:

|UCCSD> = exp(T - T^dagger) |HF>

This unitary operator can be decomposed into a sequence of quantum gates (rotations and CNOTs). The parameters in T are the variational angles that the classical optimizer adjusts.

VQE in Practice: H2 Example

The following code runs VQE on the hydrogen molecule using Qiskit Nature. It computes the molecular integrals classically, maps the fermionic Hamiltonian to qubits with Jordan-Wigner, builds a UCCSD ansatz starting from the Hartree-Fock state, and minimizes the energy using the SLSQP optimizer.

from qiskit_nature.second_q.drivers import PySCFDriver
from qiskit_nature.second_q.mappers import JordanWignerMapper
from qiskit_nature.second_q.circuit.library import HartreeFock, UCCSD
from qiskit_algorithms import VQE
from qiskit_algorithms.optimizers import SLSQP
from qiskit.primitives import Estimator

# Define the molecule: H2 at equilibrium bond length (0.735 Angstrom)
driver = PySCFDriver(atom="H 0 0 0; H 0 0 0.735", basis="sto3g")
problem = driver.run()

# Map the fermionic Hamiltonian to a qubit Hamiltonian via Jordan-Wigner
mapper = JordanWignerMapper()
qubit_op = mapper.map(problem.second_q_ops()[0])

# Set up the UCCSD ansatz with Hartree-Fock initial state
num_particles = problem.num_particles       # (1, 1) -> 1 alpha, 1 beta
num_spatial_orbitals = problem.num_spatial_orbitals  # 2 spatial orbitals

hf_state = HartreeFock(num_spatial_orbitals, num_particles, mapper)
ansatz = UCCSD(num_spatial_orbitals, num_particles, mapper, initial_state=hf_state)

# Run VQE with the SLSQP optimizer (a gradient-based classical optimizer)
estimator = Estimator()
optimizer = SLSQP(maxiter=200)
vqe = VQE(estimator, ansatz, optimizer)
result = vqe.compute_minimum_eigenvalue(qubit_op)

print(f"Ground state energy: {result.eigenvalue:.6f} Ha")
# Expected: approximately -1.137 Ha for H2 at 0.735 Angstrom bond length

Interpreting the Output

When VQE returns result.eigenvalue = -1.137 Ha, this is the electronic ground state energy of H2 in Hartree atomic units. To convert:

1 Hartree = 27.21 eV
1 Hartree = 627.5 kcal/mol

Chemical accuracy is the threshold at which computed energies are useful for predicting reaction outcomes. It is defined as 1 kcal/mol, which equals approximately 1.6 milliHartree (mHa). If a classically exact method like CCSD(T) gives -1.1516 Ha and VQE gives -1.1450 Ha, the error is 6.6 mHa, which is roughly 4 kcal/mol, well above chemical accuracy. Reaching chemical accuracy requires a sufficiently expressive ansatz, enough VQE iterations for convergence, and low enough hardware noise.

For H2 with STO-3G, the UCCSD ansatz is essentially exact because the problem is small enough that UCCSD captures the full correlation. The real test comes with larger molecules and larger basis sets, where the ansatz expressibility and noise become limiting factors.

The Active Space Approximation

Full simulation of all molecular orbitals is unnecessary and too expensive even for quantum computers in the near term. The active space approximation selects a small set of orbitals near the Fermi level that dominate the correlation energy, reducing the qubit requirements dramatically.

How Active Spaces Work

A CAS(n, m) active space contains n electrons distributed among m spatial orbitals. The remaining orbitals are frozen:

Core orbitals (far below the Fermi level) are frozen as doubly occupied. Their electrons do not participate in the quantum calculation. This is called the frozen core approximation, and it is justified because core electrons contribute very little to chemical bonding or reactivity.
Virtual orbitals (far above the Fermi level) are frozen as empty. They have negligible occupation in the ground state and can safely be excluded.
Active orbitals (near the HOMO-LUMO gap) are treated with full quantum correlation. These are the frontier orbitals where chemical bonding, electron correlation, and reactivity are concentrated.

The qubit count drops from the full basis (2M spin-orbitals) to just 2m spin-orbitals for the active space. For example, water in the cc-pVDZ basis has 48 spin-orbitals but a CAS(4,4) active space needs only 8 qubits.

Selecting the Active Space

Choosing which orbitals to include in the active space requires chemical intuition and, ideally, preliminary classical calculations. Standard guidelines include:

Include frontier orbitals: the HOMO, LUMO, and orbitals adjacent to them are almost always in the active space.
Check natural orbital occupation numbers: run a preliminary CASSCF or MP2 calculation and examine the occupation numbers. Orbitals with occupation between 0.1 and 1.9 (on a scale of 0 to 2) have significant correlation character and belong in the active space. Orbitals with occupation very close to 2.0 or 0.0 are safely frozen.
Consider the chemistry: for transition metal complexes, include the d-orbitals and their bonding/antibonding partners. For bond-breaking problems, include the bonding and antibonding orbitals of the bond being broken.

Code Example: Active Space Reduction

from qiskit_nature.second_q.drivers import PySCFDriver
from qiskit_nature.second_q.transformers import ActiveSpaceTransformer
from qiskit_nature.second_q.mappers import JordanWignerMapper

# Set up a water molecule
driver = PySCFDriver(atom="O 0 0 0; H 0 0.757 0.587; H 0 -0.757 0.587", basis="sto3g")
problem = driver.run()

print(f"Full problem: {problem.num_spatial_orbitals} spatial orbitals, "
      f"{problem.num_particles} particles")
# Full problem: 7 spatial orbitals, (5, 5) particles -> 14 qubits with JW

# Reduce to CAS(4, 4): 4 electrons in 4 spatial orbitals
transformer = ActiveSpaceTransformer(num_electrons=4, num_spatial_orbitals=4)
reduced_problem = transformer.transform(problem)

print(f"Active space: {reduced_problem.num_spatial_orbitals} spatial orbitals, "
      f"{reduced_problem.num_particles} particles")
# Active space: 4 spatial orbitals, (2, 2) particles -> 8 qubits with JW

# Now map and solve the reduced problem
mapper = JordanWignerMapper()
qubit_op = mapper.map(reduced_problem.second_q_ops()[0])
print(f"Qubit Hamiltonian has {len(qubit_op)} Pauli terms")

Common Misconceptions

Before looking at the roadmap ahead, it is worth addressing several misconceptions that appear frequently in popular discussions of quantum chemistry simulation.

“Quantum computers will solve all chemistry problems soon.” This is false. Current NISQ demonstrations have simulated H2, LiH, and BeH2, which are toy molecules that classical computers solve exactly in milliseconds. The molecules that would demonstrate true quantum advantage (like FeMoco) require fault-tolerant hardware that does not yet exist. The gap between current capability and industrial relevance is measured in years and in orders of magnitude of hardware improvement.

“VQE always finds the ground state.” This is false. VQE is a variational method that relies on a classical optimizer navigating a potentially rugged energy landscape. The optimizer can get stuck in local minima, especially when the ansatz has many parameters. There is no guarantee that VQE converges to the global minimum. In practice, researchers run VQE multiple times with different initial parameters and use various strategies (parameter initialization from classical methods, adaptive ansatz construction) to improve convergence.

“More qubits means better chemistry.” This is misleading. Adding qubits without improving gate fidelity, connectivity, or the ansatz does not improve results. A 100-qubit device with 1% two-qubit gate error rate produces garbage for any circuit deeper than about 50 layers. The quality of the results depends on the interplay between qubit count, gate error rates, circuit depth, and ansatz expressibility. A 20-qubit calculation with high-fidelity gates often outperforms a 50-qubit calculation with noisy gates.

Where Quantum Chemistry Is Headed

Near-Term vs. Fault-Tolerant Roadmap

The following table summarizes the trajectory from demonstrated results to future targets:

Molecule	Active space	Qubits (JW)	Algorithm	Status
H2	Full	4	VQE on NISQ	Demonstrated 2016
LiH	CAS(2,5)	10	VQE on NISQ	Demonstrated 2017
BeH2	CAS(4,6)	12	VQE on NISQ	Demonstrated 2017
FeMoco	CAS(54,54)	108	QPE (fault-tolerant)	Requires thousands of logical qubits
Cytochrome P450	TBD	~100+	QPE (fault-tolerant)	Future target

The Shift from VQE to QPE

Fault-tolerant quantum computers will replace VQE with quantum phase estimation (QPE). QPE determines eigenvalues directly, without variational optimization and without the risk of local minima. The tradeoff is that QPE requires deep circuits with millions to billions of gates, which demand error-corrected logical qubits.

For FeMoco, published resource estimates range from roughly 100 to a few thousand logical qubits, with T-gate or Toffoli counts on the order of 10^10 to 10^14 depending on the algorithm. With a surface code at physical error rates of 10^{-3}, each logical qubit requires thousands of physical qubits. The total physical qubit count lands in the millions, which places FeMoco simulation firmly in the fault-tolerant era.

The Target Problems

The molecules that define quantum advantage in chemistry share a common trait: they have large numbers of strongly correlated electrons that defeat all classical approximation methods.

FeMoco (the iron-molybdenum cofactor of nitrogenase) catalyzes the conversion of atmospheric N2 into ammonia. Understanding its mechanism at the electronic level could guide the design of synthetic catalysts for nitrogen fixation, potentially replacing the energy-intensive Haber-Bosch process.
Cytochrome P450 enzymes metabolize roughly 75% of all pharmaceutical drugs. Accurate simulation of their active sites would improve drug design and predict drug-drug interactions.
High-temperature superconductors involve strongly correlated electron systems (like the Hubbard model) where DFT and coupled cluster both fail. Quantum simulation could explain the pairing mechanism and guide the design of room-temperature superconductors.

These problems remain years away from practical quantum solution, but they define a clear and scientifically compelling path to quantum advantage in chemistry. The intermediate steps (better error correction, more qubits, improved ansatze, and hybrid quantum-classical workflows) are advancing steadily.