Qiskit Serverless: Scaling Quantum Workloads on IBM Quantum Platform

A naive VQE parameter sweep on IBM Quantum hardware submits circuits one at a time and waits for each job to complete before submitting the next. With hundreds of parameter points, this serializes a workload that should run in parallel and wastes time on queue overhead. Qiskit Serverless addresses this by letting you define a quantum-classical program that runs on IBM Quantum’s cloud infrastructure with access to both classical compute resources and quantum hardware. The program executes circuit batches in parallel and returns aggregated results to your local machine.

What is Qiskit Serverless?

Qiskit Serverless (part of the IBM Quantum Platform) is a cloud execution framework for quantum programs. You write a Python script, mark the functions that should run as parallel tasks with @distribute_task (from qiskit_serverless), describe the script with a QiskitFunction, upload it to the cloud, and invoke it remotely. The infrastructure handles:

Allocating classical compute resources for preprocessing and postprocessing
Batching and parallelizing circuit submissions to quantum hardware
Aggregating results and returning them to the caller
Managing IBM Quantum session and backend connections

This is distinct from the Qiskit Functions Catalog (pre-built functions published by IBM and its partners) and from manually managing Session or Batch objects.

Installation and Setup

# Install required packages
# pip install qiskit-ibm-catalog qiskit-serverless qiskit-ibm-runtime

# Client-side tools for uploading and running programs on IBM Quantum
from qiskit_ibm_catalog import QiskitServerless, QiskitFunction
from qiskit_ibm_runtime import QiskitRuntimeService, EstimatorV2, SamplerV2, Session, Batch
from qiskit.circuit.library import EfficientSU2
from qiskit.primitives import StatevectorEstimator
from qiskit.quantum_info import SparsePauliOp
import numpy as np

# Authenticate with IBM Quantum (older docs show the retired "ibm_quantum" channel)
# service = QiskitRuntimeService(channel="ibm_quantum_platform", token="YOUR_API_KEY")
# backend = service.least_busy(operational=True, simulator=False)

# For local development, use the fake backend
from qiskit_ibm_runtime.fake_provider import FakeSherbrooke
from qiskit_ibm_runtime import EstimatorV2 as RuntimeEstimator
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager

backend = FakeSherbrooke()
pm = generate_preset_pass_manager(backend=backend, optimization_level=1)
print(f"Backend: {backend.name}")
print(f"Qubits: {backend.num_qubits}")

Defining a VQE Workload

A VQE parameter sweep evaluates the energy $E(\theta)$ at many parameter points, either to map the landscape or to feed an outer optimizer. This is embarrassingly parallel: each evaluation is independent.

# Requires: qiskit_ibm_runtime
n_qubits = 4
reps = 2

# Ansatz and Hamiltonian
ansatz = EfficientSU2(n_qubits, reps=reps, entanglement='linear')
n_params = ansatz.num_parameters
print(f"Ansatz parameters: {n_params}")

H = SparsePauliOp.from_list([
    ("ZZII", 1.0),
    ("IZZI", 0.8),
    ("IIZZ", 0.8),
    ("XIXI", 0.4),
    ("IXIX", 0.4),
])

# Transpile the ansatz once
ansatz_isa = pm.run(ansatz)
H_isa = H.apply_layout(ansatz_isa.layout)

print(f"Original depth: {ansatz.depth()}")
print(f"Transpiled depth: {ansatz_isa.depth()}")
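Because each evaluation in the sweep is independent, the work maps directly onto a worker pool. The sketch below uses only the standard library to show the idea: a pooled map returns the same energies, in the same order, as a serial loop. The function names and the cosine landscape in `toy_energy` are assumptions made for illustration; `toy_energy` is a hypothetical stand-in for a real Estimator call and is not part of Qiskit.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def toy_energy(params):
    """Hypothetical stand-in for one energy evaluation E(theta)."""
    return sum(math.cos(p) for p in params)


def sweep_serial(params_list):
    """Evaluate every parameter point one after another."""
    return [toy_energy(p) for p in params_list]


def sweep_parallel(params_list, max_workers=4):
    """Evaluate independent parameter points concurrently.

    executor.map preserves input order, so results line up with params_list.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(toy_energy, params_list))


# 30 points scanning the first of three parameters (made-up values)
points = [[-math.pi + 2 * math.pi * k / 29, 0.5, -0.5] for k in range(30)]
serial = sweep_serial(points)
parallel = sweep_parallel(points)
print(parallel == serial)  # True: same values, same order
```

With a real backend the speedup comes from overlapping waits on remote jobs rather than from local CPU parallelism, which is why a thread pool is an adequate model here.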

Session vs Batch vs Serverless

Understanding the execution modes is essential for cost and latency optimization:

Session: Reserves a dedicated slot on the backend. All jobs in the session run back-to-back without queuing between them. Best for iterative algorithms (VQE, QAOA) where the next circuit depends on the previous result. You pay for the full session duration, including classical compute time between jobs.

Batch: Groups independent jobs and submits them together to the same backend. The classical processing of those jobs can run in parallel, so their QPU executions follow one another with little delay. Best for parallel sweeps where all circuits are known upfront. Lower overhead than multiple individual jobs, but no session reservation.

Serverless: Runs your entire program on IBM Quantum’s cloud, with access to both classical and quantum resources inside the program. Best for complex workflows combining preprocessing, multiple quantum jobs, and postprocessing. The program can spawn parallel tasks internally.

def session_vqe_step(params_batch, backend, ansatz_isa, H_isa):
    """
    Evaluate energy for a batch of parameter sets within a Session.
    Returns list of energies.
    """
    pub_list = [(ansatz_isa, H_isa, [p]) for p in params_batch]

    with Session(backend=backend) as session:
        estimator = RuntimeEstimator(mode=session)
        job = estimator.run(pub_list)
        results = job.result()

    # Each PUB binds one parameter set wrapped in a list, so evs has shape (1,)
    energies = [float(res.data.evs[0]) for res in results]
    return energies

def batch_vqe_sweep(params_list, backend, ansatz_isa, H_isa):
    """
    Evaluate all parameter sets in a single Batch submission.
    """
    pub_list = [(ansatz_isa, H_isa, [p]) for p in params_list]

    with Batch(backend=backend) as batch:
        estimator = RuntimeEstimator(mode=batch)
        job = estimator.run(pub_list)
        results = job.result()

    # Each PUB binds one parameter set wrapped in a list, so evs has shape (1,)
    return [float(res.data.evs[0]) for res in results]
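The Session description above says billing covers the whole session, including classical compute between jobs. A toy model makes the consequence concrete. This sketch takes that statement at face value; the function names and all numbers are assumptions for illustration, and it is not IBM's billing formula.

```python
def session_billed_seconds(qpu_seconds_per_job, classical_gap_seconds):
    """Toy model: a session is billed from its first job to its last.

    qpu_seconds_per_job: QPU time of each job, in submission order.
    classical_gap_seconds: classical compute time between consecutive jobs,
        applied once per gap (number of jobs minus one).
    """
    n_jobs = len(qpu_seconds_per_job)
    idle = classical_gap_seconds * max(0, n_jobs - 1)
    return sum(qpu_seconds_per_job) + idle


def batch_billed_seconds(qpu_seconds_per_job):
    """Toy model: independent jobs are billed for QPU time only."""
    return sum(qpu_seconds_per_job)


jobs = [3.0] * 10  # ten optimizer steps, 3 s of QPU time each (made-up numbers)
print(session_billed_seconds(jobs, classical_gap_seconds=5.0))  # 75.0
print(session_billed_seconds(jobs, classical_gap_seconds=0.5))  # 34.5
print(batch_billed_seconds(jobs))                               # 30.0
```

Under these assumptions a slow optimizer step more than doubles the billed time, while a fast one adds only a small premium for skipping the queue between iterations.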

Writing a Serverless Function

A QiskitFunction is a Python script that runs on IBM Quantum’s serverless infrastructure. You write it as a standalone Python file, upload it, and invoke it remotely.

# This would be written to a file (e.g., vqe_sweep_program.py)
# and uploaded via QiskitServerless (from qiskit-ibm-catalog)

VQE_PROGRAM_SOURCE = '''
from qiskit_serverless import distribute_task, get_arguments, save_result
from qiskit_ibm_runtime import EstimatorV2, Session
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager
from qiskit.quantum_info import SparsePauliOp
from qiskit.circuit.library import EfficientSU2
import numpy as np

@distribute_task()
def evaluate_params(backend_name, params_chunk, hamiltonian_data, ansatz_config):
    """Evaluate a chunk of parameter sets in parallel."""
    from qiskit_ibm_runtime import QiskitRuntimeService
    service = QiskitRuntimeService()
    backend = service.backend(backend_name)

    pm = generate_preset_pass_manager(backend=backend, optimization_level=1)
    H = SparsePauliOp.from_list(hamiltonian_data)
    ansatz = EfficientSU2(**ansatz_config)
    ansatz_isa = pm.run(ansatz)
    H_isa = H.apply_layout(ansatz_isa.layout)

    pub_list = [(ansatz_isa, H_isa, [p]) for p in params_chunk]

    with Session(backend=backend) as session:
        estimator = EstimatorV2(mode=session)
        job = estimator.run(pub_list)
        results = job.result()

    # Each PUB binds one parameter set wrapped in a list, so evs has shape (1,)
    return [float(res.data.evs[0]) for res in results]

# Main program entry point
args = get_arguments()
params_list = args["params_list"]
backend_name = args.get("backend_name", "ibm_fez")
hamiltonian_data = args["hamiltonian_data"]
ansatz_config = args["ansatz_config"]

# Split into chunks for parallel execution
chunk_size = 10
chunks = [params_list[i:i+chunk_size] for i in range(0, len(params_list), chunk_size)]

# Launch parallel tasks
futures = [
    evaluate_params(backend_name, chunk, hamiltonian_data, ansatz_config)
    for chunk in chunks
]

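# Aggregate results. Calling a distributed task returns an object reference,
# not a future with .result(); qiskit_serverless.get() resolves the references.
from qiskit_serverless import get
all_energies = []
for chunk_energies in get(futures):
    all_energies.extend(chunk_energies)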

save_result({"energies": all_energies, "n_params_evaluated": len(all_energies)})
'''

print("Serverless program defined.")
print(f"Program length: {len(VQE_PROGRAM_SOURCE)} characters")
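The chunking step in the program decides how many parallel tasks are launched, so it is worth checking in isolation. The sketch below wraps the same slicing expression in a hypothetical helper, `split_into_chunks` (a name chosen here, not part of any library), and confirms that the last chunk may be smaller and that concatenating the chunks restores the original order.

```python
import math


def split_into_chunks(items, chunk_size):
    """Split items into consecutive chunks of at most chunk_size elements."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


params_list = list(range(35))  # placeholder for 35 parameter sets
chunks = split_into_chunks(params_list, chunk_size=10)

print(len(chunks))               # 4
print([len(c) for c in chunks])  # [10, 10, 10, 5]
print(len(chunks) == math.ceil(len(params_list) / 10))  # True

# Flattening the chunks restores the original order, which is what the
# aggregation step in the program relies on.
flattened = [x for chunk in chunks for x in chunk]
print(flattened == params_list)  # True
```

A smaller chunk size means more tasks and more parallelism but also more per-task setup, since each task in the program above transpiles the ansatz and opens its own connection.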

Local Parallel Simulation

For local testing without IBM Quantum access, we can simulate the parallel sweep using the statevector estimator:

# Requires: qiskit_ibm_runtime
from qiskit.primitives import StatevectorEstimator

def local_parallel_sweep(params_list, ansatz, H, chunk_size=10):
    """
    Simulate a parallel parameter sweep locally.
    In production, each chunk would be a separate Serverless task.
    """
    estimator = StatevectorEstimator()
    all_energies = []

    chunks = [params_list[i:i+chunk_size] for i in range(0, len(params_list), chunk_size)]
    print(f"Processing {len(params_list)} points in {len(chunks)} chunks of {chunk_size}")

    for chunk_idx, chunk in enumerate(chunks):
        pub_list = [(ansatz, H, [p]) for p in chunk]
        job = estimator.run(pub_list)
        results = job.result()
        # Each PUB binds one parameter set wrapped in a list, so evs has shape (1,)
        chunk_energies = [float(res.data.evs[0]) for res in results]
        all_energies.extend(chunk_energies)
        print(f"  Chunk {chunk_idx+1}/{len(chunks)}: min energy = {min(chunk_energies):.4f}")

    return np.array(all_energies)

# Generate parameter sweep: scan the first parameter, hold the others fixed
n_sweep_points = 30
params_sweep = []
np.random.seed(0)
base_params = np.random.uniform(-np.pi, np.pi, n_params)

for angle in np.linspace(-np.pi, np.pi, n_sweep_points):
    p = base_params.copy()
    p[0] = angle
    params_sweep.append(p)

energies = local_parallel_sweep(params_sweep, ansatz, H)

import matplotlib.pyplot as plt
angles = np.linspace(-np.pi, np.pi, n_sweep_points)
plt.figure(figsize=(7, 4))
plt.plot(angles, energies, 'o-', markersize=4)
plt.xlabel('Parameter 0 (radians)')
plt.ylabel('Energy')
plt.title('VQE Energy Landscape (1D Parameter Scan)')
plt.tight_layout()
plt.savefig('vqe_sweep.png', dpi=150)
print(f"\nMinimum energy found: {energies.min():.6f} at angle {angles[energies.argmin()]:.3f}")

Estimating and Controlling Costs

IBM Quantum bills paid plans by QPU time consumed. Per-shot execution time depends on circuit depth, qubit count, and repetition delay, so check IBM’s documentation and your plan’s pricing page for current rates. The function below is a rough illustrative model, not IBM’s billing formula:

# Requires: qiskit_ibm_runtime
def estimate_job_cost(n_circuits, shots_per_circuit, circuit_depth, n_qubits, pubs_per_job=300):
    """
    Rough cost estimate in QPU seconds.
    Illustrative model only; not IBM's billing formula.
    """
    # Toy model: time per shot scales with circuit depth only.
    # n_qubits is accepted for interface completeness but is not used.
    time_per_shot_us = 0.01 * circuit_depth  # microseconds
    total_shots = n_circuits * shots_per_circuit
    total_time_s = total_shots * time_per_shot_us * 1e-6

    # Overhead per job submission (queue, compilation, classical overhead)
    overhead_per_job_s = 2.0
    # pubs_per_job is an illustrative assumption (a few hundred PUBs per job).
    # Ceiling division: a partially filled job still counts as one job.
    n_jobs = max(1, -(-n_circuits // pubs_per_job))
    total_overhead_s = n_jobs * overhead_per_job_s

    total_s = total_time_s + total_overhead_s
    print(f"Estimated QPU time: {total_s:.1f} seconds")
    print(f"  Circuit execution: {total_time_s:.2f} s")
    print(f"  Job overhead ({n_jobs} jobs): {total_overhead_s:.1f} s")
    return total_s

print("Cost estimate for 200-point VQE sweep (one job per circuit):")
estimate_job_cost(
    n_circuits=200,
    shots_per_circuit=1024,
    circuit_depth=ansatz_isa.depth(),
    n_qubits=n_qubits,
    pubs_per_job=1,
)

print("\nCost estimate with Batch (fewer job submissions):")
estimate_job_cost(
    n_circuits=200,
    shots_per_circuit=1024,
    circuit_depth=ansatz_isa.depth(),
    n_qubits=n_qubits,
    pubs_per_job=300,
)

Cost optimization strategies: Use Batch instead of individual jobs to reduce per-job overhead. Transpile once and cache the ISA circuit rather than re-transpiling for each parameter set. Use EstimatorV2 with PUB batching rather than individual circuit jobs. Set shots as low as acceptable for your gradient estimator (fewer shots = lower cost but noisier gradients). For iterative VQE, a Session avoids queue re-entry overhead between optimizer steps but charges for idle time, so keep classical compute between jobs fast.
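The shots trade-off mentioned above can be quantified with the usual statistics of a sample mean: for independent shots, the standard error falls as one over the square root of the shot count, so halving the error costs four times the shots. The sketch below assumes a single-shot standard deviation `sigma` for the measured observable; that value and the helper names are assumptions for illustration.

```python
import math


def standard_error(single_shot_std, shots):
    """Standard error of a mean estimated from independent shots."""
    return single_shot_std / math.sqrt(shots)


def shots_for_target_error(single_shot_std, target_error):
    """Smallest shot count whose standard error is at most target_error."""
    return math.ceil((single_shot_std / target_error) ** 2)


sigma = 1.0  # assumed single-shot standard deviation of the observable
print(standard_error(sigma, 1024))          # 0.03125
print(standard_error(sigma, 4096))          # 0.015625
print(shots_for_target_error(sigma, 0.01))  # 10000
```

Since circuit execution time in the cost model grows linearly with total shots, a tenfold tighter error bar implies roughly a hundredfold increase in that part of the bill.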

Uploading and Running a Serverless Function

# In production (requires an IBM Quantum account):
# from qiskit_ibm_catalog import QiskitServerless, QiskitFunction
# serverless = QiskitServerless(token="YOUR_IBM_CLOUD_API_KEY")
#
# with open("vqe_sweep_program.py", "w") as f:
#     f.write(VQE_PROGRAM_SOURCE)
#
# function = QiskitFunction(
#     title="vqe-parameter-sweep",
#     entrypoint="vqe_sweep_program.py",
#     working_dir="./",
# )
# serverless.upload(function)
#
# vqe_sweep = serverless.load("vqe-parameter-sweep")
# job = vqe_sweep.run(
#     params_list=[p.tolist() for p in params_sweep],
#     backend_name="ibm_fez",
#     hamiltonian_data=[("ZZII", 1.0), ("IZZI", 0.8), ("IIZZ", 0.8), ("XIXI", 0.4), ("IXIX", 0.4)],
#     ansatz_config={"num_qubits": 4, "reps": 2, "entanglement": "linear"},
# )
# result = job.result()
# print(result["energies"])

print("Serverless upload/run pattern shown above (requires IBM Quantum credentials).")
print("For local testing, use the local_parallel_sweep function demonstrated above.")

Note: older docs show a ServerlessClient imported from qiskit_serverless; that client targets self-hosted clusters. For IBM Quantum Platform’s managed service, use QiskitServerless from the qiskit-ibm-catalog package as shown above, while the program itself still imports distribute_task, get_arguments, and save_result from qiskit_serverless.

Qiskit Serverless shifts the mental model from “submit a circuit” to “deploy a quantum program.” For workloads that require dozens to thousands of circuit evaluations, coordinating them at the program level can reduce latency and per-job overhead, and it keeps cost tracking in one place. It is a natural fit for quantum optimization, machine learning training loops, and large-scale variational algorithms.