Qiskit Runtime Sessions and Primitives: A Production Guide

IBM Quantum’s Qiskit Runtime is the execution layer that sits between your Python code and real quantum hardware. Rather than submitting isolated jobs, Runtime introduces structured execution patterns (sessions, batches, and primitives) that simplify how you build algorithms and give you direct control over QPU costs.

Sessions vs Batch Mode

Session mode reserves a continuous block of QPU access. Once a session starts, your jobs run without being interleaved with other users’ work. The QPU stays “warm” (no qubit re-calibration between jobs), which reduces job turnaround time for iterative algorithms like VQE or QAOA.

from qiskit_ibm_runtime import QiskitRuntimeService, Session

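# Channel names depend on the qiskit-ibm-runtime version;
# recent releases use "ibm_quantum_platform" instead of "ibm_quantum".
service = QiskitRuntimeService(channel="ibm_quantum")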
backend = service.least_busy(operational=True, simulator=False)

with Session(backend=backend) as session:
    # All jobs submitted here share QPU access
    # Session closes automatically when the block exits
    pass

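Sessions have a maximum duration (typically 8 hours). If your algorithm finishes sooner, close the session so that it stops accruing QPU time: the with block above does this automatically on exit, and session.close() does it explicitly when you manage the session without a context manager.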

Batch mode is designed for embarrassingly parallel workloads where jobs do not depend on each other’s results. IBM schedules batch jobs more flexibly (interleaving them with other users if needed), so latency per job is higher but throughput for large job sets can be better.

from qiskit_ibm_runtime import Batch

with Batch(backend=backend) as batch:
    # Submit many independent jobs
    pass

Choose Session for iterative algorithms and Batch for parameter sweeps, benchmarking, or running multiple independent circuits.

The Estimator Primitive

The Estimator primitive computes expectation values of observables given a circuit that prepares a quantum state. It is the workhorse for variational algorithms.

Two implementations matter for this guide: StatevectorEstimator from qiskit.primitives, for noiseless simulation (useful for prototyping), and EstimatorV2 from qiskit_ibm_runtime, for execution on real hardware.

Primitive Unified Blocs (PUBs)

Both Estimator and Sampler accept inputs in PUB format: tuples of (circuit, observables, parameter_values, precision) for Estimator, or (circuit, parameter_values, shots) for Sampler.

PUBs allow you to batch multiple circuit-observable pairs into a single job, reducing per-job overhead significantly:

# Requires: qiskit_ibm_runtime
from qiskit.circuit import QuantumCircuit, ParameterVector
from qiskit.quantum_info import SparsePauliOp
from qiskit_ibm_runtime import EstimatorV2, Session

# Build a parameterized ansatz
theta = ParameterVector("theta", 4)
ansatz = QuantumCircuit(2)
ansatz.ry(theta[0], 0)
ansatz.ry(theta[1], 1)
ansatz.cx(0, 1)
ansatz.ry(theta[2], 0)
ansatz.ry(theta[3], 1)

# Define the observable (Hamiltonian)
H = SparsePauliOp.from_list([
    ("ZZ", -1.0),
    ("XX", 0.5),
    ("YY", 0.5),
])

# Parameter sets to evaluate (e.g., multiple optimizer steps batched)
import numpy as np
param_values = np.random.uniform(-np.pi, np.pi, size=(5, 4))

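# All five rows of param_values travel in a single PUB; each row is one
# parameter set and yields one expectation value.
# Note: on real hardware, ansatz and H must first be converted to ISA form
# (see "ISA Circuits" below); they are left untranspiled here for brevity.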
with Session(backend=backend) as session:
    estimator = EstimatorV2(mode=session)
    # Build PUBs: list of (circuit, observables, param_values)
    pubs = [(ansatz, H, param_values)]
    job = estimator.run(pubs)
    result = job.result()
    # result[0].data.evs contains a (5,) array of expectation values
    evs = result[0].data.evs
    print(f"Expectation values: {evs}")
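The (5,) shape of evs follows from broadcasting: the last axis of the parameter array indexes the circuit parameters, and the remaining axes are broadcast against the shape of the observables array using NumPy-style rules. The sketch below reproduces that rule in plain Python so the output shape can be predicted before a job is submitted. The helper names broadcast_shapes and estimator_evs_shape are hypothetical and are not part of Qiskit.

```python
from itertools import zip_longest


def broadcast_shapes(a, b):
    """Broadcast two shape tuples using NumPy-style rules."""
    out = []
    # Align shapes from the trailing axis, padding the shorter with 1s
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} do not broadcast")
        out.append(y if x == 1 else x)
    return tuple(reversed(out))


def estimator_evs_shape(observables_shape, parameter_values_shape):
    """Predict the shape of evs for one Estimator PUB (hypothetical helper)."""
    # The last axis holds one value per circuit parameter, so drop it
    bindings_shape = tuple(parameter_values_shape[:-1])
    return broadcast_shapes(tuple(observables_shape), bindings_shape)


# One observable, five parameter sets of four values each
single = estimator_evs_shape((), (5, 4))
# Three observables arranged as a column, same five parameter sets
grid = estimator_evs_shape((3, 1), (5, 4))
print(single, grid)
```

Arranging observables as a (3, 1) column is one way to evaluate every observable at every parameter set in a single PUB.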

ISA Circuits

Before submitting to hardware, circuits must be transpiled to Instruction Set Architecture (ISA) circuits: circuits expressed only in the native gates of the target device, with qubits mapped to physical qubits.

# Requires: qiskit_ibm_runtime
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager

# Generate ISA circuit (optimization_level ranges from 0 to 3)
pm = generate_preset_pass_manager(
    backend=backend,
    optimization_level=2,
)
isa_ansatz = pm.run(ansatz)
isa_H = H.apply_layout(isa_ansatz.layout)

# Now use the ISA circuit in your PUBs
pubs = [(isa_ansatz, isa_H, param_values)]

Always use ISA circuits with EstimatorV2 on real hardware. The primitive will reject non-ISA circuits to prevent silent transpilation errors.

Estimator Options and Resilience Levels

EstimatorV2 supports a range of error mitigation options through its options interface:

from qiskit_ibm_runtime.options import EstimatorOptions

options = EstimatorOptions()

# resilience_level selects a preset mitigation stack (EstimatorV2 accepts 0-2)
# 0: No mitigation (fastest, cheapest)
# 1: Readout error mitigation with measurement twirling (the default)
# 2: Level 1 + gate twirling + zero-noise extrapolation (ZNE)
# Probabilistic Error Cancellation (PEC) and dynamical decoupling are not
# part of any level; they are enabled through separate options.
options.resilience_level = 1

# Gate twirling converts coherent errors into stochastic noise
options.twirling.enable_gates = True
options.twirling.num_randomizations = 32

# Readout mitigation
options.resilience.measure_mitigation = True

estimator = EstimatorV2(mode=session, options=options)

Resilience level 1 is a good default for most production runs: it provides meaningful noise reduction at modest cost. Level 2 adds ZNE, which can noticeably reduce bias for shallow circuits but multiplies QPU time, because every circuit is executed at several amplified noise levels. EstimatorV2 has no level 3; PEC is enabled through its own option and requires substantially more shots.
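To make the idea behind ZNE concrete, here is a minimal sketch of linear extrapolation to the zero-noise limit. It is not the implementation used by Qiskit Runtime: the function name is hypothetical, and the noisy values are generated from an assumed linear noise model rather than measured on hardware.

```python
def linear_zero_noise_extrapolation(noise_factors, values):
    """Least-squares line through (noise_factor, value), evaluated at 0."""
    n = len(noise_factors)
    mean_x = sum(noise_factors) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in noise_factors)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(noise_factors, values))
    slope = sxy / sxx
    # The intercept of the fitted line is the zero-noise estimate
    return mean_y - slope * mean_x


# Assumed toy model: the expectation value drifts linearly with noise
ideal_value = -1.0        # hypothetical noiseless expectation value
drift_per_factor = 0.04   # hypothetical bias added per unit of noise
noise_factors = [1, 3, 5]
noisy_values = [ideal_value + drift_per_factor * f for f in noise_factors]

mitigated = linear_zero_noise_extrapolation(noise_factors, noisy_values)
print(f"unmitigated: {noisy_values[0]:.3f}, extrapolated: {mitigated:.3f}")
```

Each noise factor requires its own circuit executions, which is why QPU time grows with the number of factors. Real noise is rarely exactly linear, so the extrapolated value on hardware is an estimate with its own uncertainty.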

The Sampler Primitive

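SamplerV2 returns the measured bitstrings for every shot, grouped by classical register, from which counts are derived. It does not return quasi-probability distributions as the V1 sampler did. It is the right choice when you need measurement outcome distributions rather than expectation values: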

# Requires: qiskit_ibm_runtime
from qiskit_ibm_runtime import SamplerV2

qc = QuantumCircuit(3)  # measure_all() adds its own classical register, "meas"
qc.h(0)
qc.cx(0, 1)
qc.cx(0, 2)
qc.measure_all()

isa_qc = pm.run(qc)

with Session(backend=backend) as session:
    sampler = SamplerV2(mode=session)
    # PUB for Sampler: (circuit, param_values, shots)
    # The circuit has no parameters, so param_values is None
    pub = (isa_qc, None, 4096)
    job = sampler.run([pub])
    result = job.result()
    counts = result[0].data.meas.get_counts()
    print(counts)
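The counts dictionary is plain Python data, so post-processing needs no Qiskit calls. The sketch below converts counts to probabilities and computes the parity expectation value (Z on every qubit). The helper names and the example counts are hypothetical; the counts are not hardware results.

```python
def counts_to_probabilities(counts):
    """Normalize a {bitstring: count} mapping by the total shot count."""
    shots = sum(counts.values())
    return {bits: n / shots for bits, n in counts.items()}


def parity_expectation(counts):
    """Expectation value of Z on every qubit, estimated from counts."""
    shots = sum(counts.values())
    # Bitstrings with an even number of 1s contribute +1, odd ones -1
    signed = sum((-1) ** bits.count("1") * n for bits, n in counts.items())
    return signed / shots


# Hypothetical counts for a 3-qubit GHZ-like circuit, 4096 shots in total
example_counts = {"000": 2060, "111": 2036}

probs = counts_to_probabilities(example_counts)
parity = parity_expectation(example_counts)
print(probs)
print(parity)
```

If only expectation values like this are needed, the Estimator is usually the more direct tool; computing them from counts is mainly useful when the raw distribution is needed as well.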

Complete VQE with EstimatorV2

Here is a full VQE implementation using EstimatorV2 inside a session, including ISA transpilation and optimizer integration:

import numpy as np
from scipy.optimize import minimize
from qiskit.circuit.library import TwoLocal
from qiskit.quantum_info import SparsePauliOp
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager
from qiskit_ibm_runtime import QiskitRuntimeService, Session, EstimatorV2
from qiskit_ibm_runtime.options import EstimatorOptions

service = QiskitRuntimeService(channel="ibm_quantum")
backend = service.least_busy(operational=True, simulator=False, min_num_qubits=2)

# H2 Hamiltonian (simplified 2-qubit form)
hamiltonian = SparsePauliOp.from_list([
    ("II", -1.0523732),
    ("IZ",  0.3979374),
    ("ZI", -0.3979374),
    ("ZZ", -0.0112801),
    ("XX",  0.1809312),
])

# Ansatz: TwoLocal with Ry and CNOT
ansatz = TwoLocal(2, "ry", "cx", reps=2)
num_params = ansatz.num_parameters

# Transpile to ISA
pm = generate_preset_pass_manager(backend=backend, optimization_level=2)
isa_ansatz = pm.run(ansatz)
isa_H = hamiltonian.apply_layout(isa_ansatz.layout)

# Options: resilience level 1 for production run
options = EstimatorOptions()
options.resilience_level = 1
options.default_shots = 8192

with Session(backend=backend) as session:
    estimator = EstimatorV2(mode=session, options=options)

    # Mutable counter: the nested function can update it without a
    # nonlocal declaration, which is a SyntaxError at module level
    history = {"evals": 0}

    def cost_function(params):
        pub = (isa_ansatz, isa_H, [params])
        result = estimator.run([pub]).result()
        energy = float(result[0].data.evs[0])
        history["evals"] += 1
        print(f"  Step {history['evals']}: energy = {energy:.6f} Ha")
        return energy

    x0 = np.zeros(num_params)
    # COBYLA is gradient-free; SLSQP's finite-difference gradients are
    # dominated by shot noise when the cost comes from hardware
    opt_result = minimize(cost_function, x0, method="COBYLA",
                          options={"maxiter": 50})

    print(f"\nVQE converged: {opt_result.success}")
    print(f"Ground state energy: {opt_result.fun:.6f} Ha")
    print(f"Total evaluations: {history['evals']}")
    print(f"Session ID: {session.session_id}")

Cost Optimization Strategies

QPU time on IBM Quantum is billed in seconds of usage. Several practices keep costs manageable:

Use the right resilience level. Level 2 multiplies QPU time by a large factor, as do separately enabled techniques such as PEC. Start with level 1 and only increase if the noise is unacceptable for your application.

Batch parameter evaluations within a session. Each estimator.run() call has overhead. Passing multiple PUBs in one call (or multiple parameter sets per PUB) amortizes that overhead.

Prefer least_busy with min_num_qubits. Queue lengths differ between backends, so request only the qubit count you need through min_num_qubits and let least_busy pick the machine with the shortest queue among those that qualify.

Set default_shots appropriately. More shots reduce statistical variance but cost more. For variational optimization steps, 2000-4000 shots are often sufficient. Reserve 8192+ for final result verification.

Close sessions promptly. A session that sits idle still consumes QPU time allocation. Use the context manager (with Session(...)) to ensure automatic cleanup.
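The shot-count trade-off above can be estimated before any QPU time is spent. As a rough sketch, assume a single Pauli observable with outcomes of +1 or -1 and no hardware noise, so that the only error is sampling error, which shrinks as one over the square root of the shot count. The function names are hypothetical, and a multi-term Hamiltonian would need its terms combined with their coefficients.

```python
import math


def pauli_standard_error(expectation, shots):
    """Sampling standard error of a +/-1-valued observable's mean."""
    variance = 1.0 - expectation ** 2  # variance of a +/-1 outcome
    return math.sqrt(variance / shots)


def shots_for_precision(precision, expectation=0.0):
    """Shots needed to reach a target standard error (worst case at 0)."""
    return math.ceil((1.0 - expectation ** 2) / precision ** 2)


# Standard error at the shot counts mentioned in this guide
for shots in (2000, 4000, 8192):
    print(shots, round(pauli_standard_error(0.0, shots), 4))

# Halving the standard error requires four times as many shots
ratio = pauli_standard_error(0.0, 2000) / pauli_standard_error(0.0, 8000)
print(ratio)
```

Because the cost grows with the square of the desired precision, spending shots on the final verification rather than on every optimizer step is usually the better allocation.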

Qiskit Runtime’s session model, combined with PUBs and ISA circuits, represents a mature interface for running real quantum algorithms at production scale. Understanding the cost levers (resilience level, shots, session lifetime, and job batching) lets you get the most out of limited QPU access.