Qiskit SparsePauliOp Tutorial: Build Hamiltonians

Algorithms like VQE, QAOA, and Quantum Phase Estimation all share a common requirement: they need to compute expectation values of Hamiltonians. The naive approach stores the full Hamiltonian as a dense matrix, but an n-qubit Hamiltonian is a 2^n x 2^n matrix. For 20 qubits, that is roughly a million-by-million matrix. For 30 qubits, it is 10^9 x 10^9 entries, requiring exabytes of storage. That approach is impossible on any existing computer.

Physical Hamiltonians, however, are sparse in the Pauli basis. A molecular Hamiltonian for a 30-orbital system might contain a few thousand Pauli terms, but that is thousands, not billions. Qiskit’s SparsePauliOp class exploits this sparsity by storing only the non-zero Pauli terms and their complex coefficients. This makes it practical to represent and manipulate operators for systems that would be completely intractable as dense matrices.

The Pauli Basis

Every operator acting on n qubits can be decomposed as a linear combination of tensor products of the four single-qubit Pauli matrices: I (identity), X (bit-flip), Y (bit-and-phase-flip), and Z (phase-flip). For n qubits, there are 4^n such tensor products, and together they form a complete orthonormal basis for the space of 2^n x 2^n matrices.

A SparsePauliOp stores a list of these Pauli strings (like "IZX") and their complex coefficients. The string "IZX" means I on the leftmost qubit, Z in the middle, and X on the rightmost qubit (qubit 0). A general n-qubit Hamiltonian takes the form:

H = c_0 * P_0 + c_1 * P_1 + ... + c_k * P_k

where each P_i is a tensor product of Pauli matrices (a “Pauli string”) and each c_i is a complex coefficient. The representation is exact and enables efficient arithmetic because Pauli matrices follow simple multiplication rules: XX = I, YY = I, ZZ = I, XY = iZ, Y*X = -iZ, and so on. Multiplying two Pauli strings always produces another Pauli string times a phase factor of +1, -1, +i, or -i.

What Is a SparsePauliOp?

A Hamiltonian acting on n qubits can be written as a weighted sum of tensor products of Pauli matrices:

H = c0 * (I x I) + c1 * (Z x I) + c2 * (I x Z) + c3 * (X x X)

SparsePauliOp stores only the non-zero terms, making it memory-efficient for large systems where most Pauli coefficients are zero. The class stores two arrays internally: a list of Pauli strings (as a PauliList) and a NumPy array of complex coefficients. This compact representation supports fast arithmetic, conversion to matrix form, and direct integration with Qiskit primitives.

Qubit Ordering Convention

Qiskit uses little-endian bit ordering throughout, and SparsePauliOp follows this convention. The rightmost character in a Pauli string corresponds to qubit 0, and the leftmost character corresponds to qubit n-1. This is easy to get wrong, so take a moment to internalize it.

Concretely:

SparsePauliOp("ZI") applies Z to qubit 1 and I to qubit 0. This is not “Z on qubit 0.”
SparsePauliOp("IZ") applies I to qubit 1 and Z to qubit 0.

You can verify this by converting to matrix form and comparing:

from qiskit.quantum_info import SparsePauliOp
import numpy as np

# Z on qubit 1, I on qubit 0
op_zi = SparsePauliOp("ZI")
print("ZI matrix:")
print(op_zi.to_matrix().real)

# I on qubit 1, Z on qubit 0
op_iz = SparsePauliOp("IZ")
print("\nIZ matrix:")
print(op_iz.to_matrix().real)

# For comparison, the single-qubit Z matrix is diag(1, -1).
# "ZI" = Z ⊗ I = diag(1, 1, -1, -1)
# "IZ" = I ⊗ Z = diag(1, -1, 1, -1)

The output confirms: "ZI" produces diag(1, 1, -1, -1) while "IZ" produces diag(1, -1, 1, -1). If you read the string left-to-right as “first qubit, second qubit,” you will get the wrong answer. Always remember: rightmost character is qubit 0.

Creating SparsePauliOp Instances

from qiskit.quantum_info import SparsePauliOp

# Single Pauli string (rightmost character acts on qubit 0)
op = SparsePauliOp("ZI")
print(op)  # SparsePauliOp(['ZI'], coeffs=[1.+0.j])

# Weighted sum of Paulis
H = SparsePauliOp.from_list([
    ("II", -1.0526),
    ("ZI", 0.3979),
    ("IZ", -0.3979),
    ("ZZ", -0.0112),
    ("XX", 0.1809),
])
print(H)

Note the string ordering convention: the rightmost character corresponds to qubit 0, the leftmost to qubit n-1. This is opposite to standard tensor product notation, so keep it consistent throughout your code.

You can also construct a SparsePauliOp from individual Pauli objects or from a PauliList:

from qiskit.quantum_info import Pauli, SparsePauliOp

# From individual Paulis with coefficients
op = SparsePauliOp(Pauli("XYZ"), coeffs=[0.5])

# From a PauliList
labels = ["XXI", "YYI", "ZZI"]
coeffs = [0.25, 0.25, 0.25]
op = SparsePauliOp.from_list(list(zip(labels, coeffs)))
print(op)

Arithmetic Operations

SparsePauliOp supports standard arithmetic:

A = SparsePauliOp.from_list([("XX", 0.5), ("ZZ", 0.5)])
B = SparsePauliOp.from_list([("YY", -0.5), ("ZZ", 0.5)])

C = A + B           # sum
D = 2.0 * A         # scalar multiply
E = A @ B           # operator product
F = A.adjoint()     # Hermitian conjugate

# Simplify and remove near-zero terms
G = C.simplify(atol=1e-10)
print(G)

When you add two operators, the result simply concatenates their Pauli lists. If both operators contain the "ZZ" term, the sum will have two separate "ZZ" entries rather than one merged entry. Calling .simplify() merges duplicate terms and drops any terms whose coefficient magnitude falls below the specified tolerance.

Why Pauli Decomposition?

Physical Hamiltonians do not arrive in Pauli form by default. The process of converting them involves domain-specific mappings that depend on the type of system you are simulating.

Quantum Chemistry: Fermion-to-Qubit Mappings

In quantum chemistry, the molecular Hamiltonian is first expressed in second quantization using fermionic creation and annihilation operators. These operators obey anticommutation relations that have no direct qubit analogue, so a mapping is needed.

The three most common mappings are:

Jordan-Wigner: maps each fermionic mode to one qubit. Simple and intuitive, but produces Pauli strings with many Z operators (long “Z chains”) that grow with system size.
Bravyi-Kitaev: uses a tree-like encoding that reduces the length of Pauli strings to O(log n) at the cost of more complex mapping logic.
Parity mapping: similar to Jordan-Wigner but exploits parity symmetries to reduce qubit count by one or two qubits.

Each mapping produces a different SparsePauliOp for the same physical Hamiltonian, with different numbers of terms and different Pauli string lengths. The physics is identical; only the encoding differs.

How Many Terms?

The number of Pauli terms grows with molecular size, which is why sparse representation matters:

Molecule	Qubits	Pauli Terms (Jordan-Wigner)
H2	4 (2 active)	5
LiH	12	~276
H2O	14	~1,086
N2	20	~2,951

Even for molecules with thousands of Pauli terms, each term is just a string of length n plus one complex number. Storing 3,000 terms for a 20-qubit molecule requires kilobytes of memory. Storing the equivalent dense matrix (2^20 x 2^20 complex entries) would require terabytes.

Condensed Matter and Spin Systems

For spin-lattice Hamiltonians (Heisenberg, Ising, XY models), the Hamiltonian is already naturally expressed in Pauli form. A 1D Heisenberg chain on n sites, for example, has 3(n-1) terms for nearest-neighbor interactions, making SparsePauliOp a natural fit.

Building a Hydrogen Hamiltonian

Here is a 2-qubit Hamiltonian for H2 at a fixed bond length:

from qiskit.quantum_info import SparsePauliOp

# H2 STO-3G Hamiltonian (Jordan-Wigner mapping, minimal basis)
h2_hamiltonian = SparsePauliOp.from_list([
    ("II", -1.0523732),
    ("IZ",  0.39793742),
    ("ZI", -0.39793742),
    ("ZZ", -0.01128010),
    ("XX",  0.18093119),
])

print(f"Number of qubits: {h2_hamiltonian.num_qubits}")
print(f"Number of terms: {len(h2_hamiltonian)}")

This 5-term operator encodes the full electronic structure of H2 in a minimal basis set. The II term is the energy offset (nuclear repulsion plus constant electronic contributions). The IZ and ZI terms represent single-qubit (number operator) contributions. The ZZ and XX terms capture two-body electron-electron interactions. Despite the underlying physics being a continuous-space Coulomb problem, the qubit representation is compact and exact within the chosen basis.

Using SparsePauliOp with Estimator

The Estimator primitive computes expectation values of the form <psi|H|psi> for a given quantum state |psi>. When |psi> is prepared by a parameterized circuit, the Estimator evaluates the expectation value at specified parameter values. This is exactly the core subroutine that VQE calls at every optimization step: prepare a trial state, measure the energy, and feed the result to a classical optimizer.

The Estimator accepts tuples of (circuit, observable, parameter_values). It decomposes the observable into individual Pauli terms, measures each one (or groups compatible ones together), and combines the results to produce the total expectation value.

from qiskit import QuantumCircuit
from qiskit.circuit import Parameter
from qiskit.primitives import StatevectorEstimator
import numpy as np

# Ansatz circuit
theta = Parameter("theta")
ansatz = QuantumCircuit(2)
ansatz.x(0)
ansatz.ry(theta, 1)
ansatz.cx(1, 0)

# Sweep over parameter values
estimator = StatevectorEstimator()
angles = np.linspace(0, 2 * np.pi, 50)

job = estimator.run(
    [(ansatz, h2_hamiltonian, [[a]]) for a in angles]
)
energies = [float(job.result()[i].data.evs.item()) for i in range(len(angles))]
print(f"Minimum energy: {min(energies):.6f} Ha")

A VQE-Like Optimization Loop

The parameter sweep above scans a single parameter. A real VQE uses a classical optimizer to find the minimum more efficiently. Here is a minimal example using SciPy:

from qiskit import QuantumCircuit
from qiskit.circuit import Parameter
from qiskit.primitives import StatevectorEstimator
from scipy.optimize import minimize
import numpy as np

# Build a simple ansatz
theta = Parameter("theta")
ansatz = QuantumCircuit(2)
ansatz.x(0)
ansatz.ry(theta, 1)
ansatz.cx(1, 0)

estimator = StatevectorEstimator()

def cost_function(params):
    """Evaluate <psi(theta)|H|psi(theta)> for the given parameters."""
    job = estimator.run([(ansatz, h2_hamiltonian, [params])])
    energy = float(job.result()[0].data.evs.item())
    return energy

# Run the optimizer
result = minimize(cost_function, x0=[0.5], method="COBYLA")
print(f"VQE ground state energy: {result.fun:.6f} Ha")
print(f"Optimal parameter: {result.x[0]:.4f}")

Each call to cost_function invokes the Estimator, which internally decomposes h2_hamiltonian into its 5 Pauli terms and computes the expectation value of each. The Estimator handles all the measurement logistics, so your code only needs to provide the circuit, the operator, and the parameter values.

Converting to and from Other Formats

import numpy as np
from qiskit.quantum_info import SparsePauliOp, Operator

# To dense matrix
matrix = h2_hamiltonian.to_matrix()
print(matrix.shape)  # (4, 4)

# From a dense matrix (slow for large systems)
dense_op = Operator(np.array([[1, 0], [0, -1]]))
sparse = SparsePauliOp.from_operator(dense_op)
print(sparse)

# Check Hermiticity
import numpy as np
mat = h2_hamiltonian.to_matrix()
print(np.allclose(mat, mat.conj().T))  # True

The .to_matrix() method constructs the full 2^n x 2^n dense matrix, which is useful for verifying small operators or for exact classical simulation. For operators on more than about 15 qubits, this method becomes impractical due to memory requirements.

The .to_list() method returns the operator as a list of (label, coefficient) tuples, which is useful for serialization and inspection:

for label, coeff in h2_hamiltonian.to_list():
    print(f"{label}: {coeff.real:+.6f}")

Reducing Operator Size

For larger molecules, you may want to remove small terms to reduce circuit depth:

# Keep only terms with |coefficient| > threshold
truncated = SparsePauliOp.from_list([
    (label, coeff) for label, coeff in
    zip(h2_hamiltonian.paulis.to_labels(), h2_hamiltonian.coeffs)
    if abs(coeff) > 0.01
])
print(f"Reduced from {len(h2_hamiltonian)} to {len(truncated)} terms")

Truncation introduces a systematic error in the computed expectation value. The error is bounded by the sum of the absolute values of the dropped coefficients. For a Hamiltonian with many small terms, this bound is often loose, and the actual error is much smaller in practice. Always benchmark against the full operator on a small test case before applying truncation to production workloads.

Performance Considerations

When working with large operators or running many Estimator jobs, keep these guidelines in mind.

Batch your Estimator calls. The Estimator accepts a list of (circuit, observable, parameter_values) tuples in a single .run() call. Batching multiple evaluations into one call is significantly more efficient than submitting them one at a time, because it amortizes compilation overhead and allows the backend to optimize execution scheduling.

Call .simplify() after arithmetic. Addition and composition can produce duplicate Pauli strings. For example, adding two operators that both contain a "ZZI" term results in two separate "ZZI" entries. The .simplify() method merges these duplicates by summing their coefficients and removes any terms with near-zero coefficients. If you perform many additions in a loop without simplifying, the operator can accumulate thousands of redundant terms that slow down every subsequent operation.

# Building an operator iteratively
op = SparsePauliOp.from_list([("III", 0.0)])
for i in range(100):
    op = op + SparsePauliOp.from_list([("ZZI", 0.01), ("IXX", 0.02)])

print(f"Before simplify: {len(op)} terms")
op = op.simplify()
print(f"After simplify: {len(op)} terms")

Avoid from_operator() for large matrices. The SparsePauliOp.from_operator(dense_matrix) constructor performs a full Pauli decomposition of the dense matrix, which requires O(4^n) operations and O(4^n) memory. For anything beyond about 10 qubits, this becomes prohibitively slow. If you have a Hamiltonian in a domain-specific form (second-quantized, spin model, etc.), use the appropriate mapping tools to construct the SparsePauliOp directly rather than going through a dense intermediate.

Group commuting terms for measurement. When running on real hardware, the number of measurement circuits scales with the number of distinct measurement bases. Pauli terms that share the same qubit-wise commuting group can be measured simultaneously. Qiskit’s Estimator handles this grouping automatically, but being aware of it helps you understand why operators with fewer unique measurement bases execute faster.

Common Mistakes

Getting qubit ordering backwards. The string "ZXI" applies I to qubit 0, X to qubit 1, and Z to qubit 2. If you construct operators by reading the string left-to-right as qubit 0, 1, 2, every multi-qubit term in your Hamiltonian will act on the wrong qubits. When in doubt, convert a small test case to a matrix and verify.

Forgetting to call .simplify(). After repeated arithmetic (especially in loops), your operator may contain hundreds or thousands of duplicate Pauli strings. This does not change the mathematical operator, but it dramatically slows down the Estimator because each duplicate term is evaluated separately. Call .simplify() after building your operator and periodically during iterative construction.

Passing a non-Hermitian operator to the Estimator. The expectation value of a Hermitian operator is always real, which is what physical observables require. If you accidentally construct a non-Hermitian operator (for example, by using complex coefficients that do not come in conjugate pairs, or by including a bare Y term with a real coefficient where the Hamiltonian physics demands a purely imaginary one), the Estimator may return complex values or raise an error. You can check Hermiticity by verifying that all coefficients of the Pauli decomposition satisfy the conjugation symmetry, or more directly:

mat = op.to_matrix()
is_hermitian = np.allclose(mat, mat.conj().T)

Confusing SparsePauliOp with Pauli. The Pauli class represents a single n-qubit Pauli string (like "XYZZ") with an optional phase factor, but no arbitrary complex coefficient. It is a single basis element, not a linear combination. SparsePauliOp is a weighted sum of Pauli objects. If you try to pass a Pauli where a SparsePauliOp is expected, Qiskit may silently convert it (treating the coefficient as 1.0), but relying on this implicit conversion makes code harder to read and debug.

SparsePauliOp is the standard way to represent quantum operators in modern Qiskit. Mastering it is essential for VQE, QAOA, and any algorithm that needs to evaluate physical observables on quantum hardware.