Parametric Quantum Programs and Compilation with pyQuil

Why Parametric Compilation Matters

Variational quantum algorithms like VQE and QAOA require running the same circuit structure hundreds or thousands of times with different rotation angles. Each iteration of the classical optimizer produces a new set of angles that must be injected into the circuit. The question is: do you recompile the circuit every time, or do you compile once and swap in new parameters?

The numbers make the answer obvious. A typical Quilc compilation of a 10-qubit QAOA circuit with 2 layers takes 0.5 to 2 seconds, depending on circuit depth and the target device topology. If your optimizer runs 500 iterations, that is 250 to 1000 seconds spent purely on compilation. With parametric programs, you compile once (about 2 seconds total) and then stream parameter values for all 500 iterations. Each parameter injection takes on the order of milliseconds, not seconds.

Here is a concrete timing comparison:

import time
import numpy as np
from pyquil import Program, get_qc
from pyquil.gates import RZ, RX, CZ, MEASURE

def build_parametric_circuit():
    """Build a parametric 2-qubit circuit with symbolic angles."""
    prog = Program()
    theta = prog.declare("theta", "REAL", 4)
    prog += RZ(theta[0], 0)
    prog += RX(theta[1], 0)
    prog += RZ(theta[2], 1)
    prog += RX(theta[3], 1)
    prog += CZ(0, 1)
    ro = prog.declare("ro", "BIT", 2)
    prog += MEASURE(0, ro[0])
    prog += MEASURE(1, ro[1])
    prog.wrap_in_numshots_loop(100)
    return prog

def build_fixed_circuit(angles):
    """Build a non-parametric circuit with hardcoded angles."""
    prog = Program()
    prog += RZ(angles[0], 0)
    prog += RX(angles[1], 0)
    prog += RZ(angles[2], 1)
    prog += RX(angles[3], 1)
    prog += CZ(0, 1)
    ro = prog.declare("ro", "BIT", 2)
    prog += MEASURE(0, ro[0])
    prog += MEASURE(1, ro[1])
    prog.wrap_in_numshots_loop(100)
    return prog

qc = get_qc("2q-qvm")
n_iterations = 50
all_angles = [np.random.uniform(0, 2 * np.pi, 4) for _ in range(n_iterations)]

# Approach 1: Recompile every iteration
start = time.time()
for angles in all_angles:
    prog = build_fixed_circuit(angles)
    exe = qc.compile(prog)
    qc.run(exe).get_register_map()["ro"]
recompile_time = time.time() - start

# Approach 2: Compile once, stream parameters
start = time.time()
prog = build_parametric_circuit()
exe = qc.compile(prog)
for angles in all_angles:
    qc.run(exe, memory_map={"theta": angles}).get_register_map()["ro"]
parametric_time = time.time() - start

print(f"Recompile every iteration: {recompile_time:.2f}s")
print(f"Compile once (parametric): {parametric_time:.2f}s")
print(f"Speedup: {recompile_time / parametric_time:.1f}x")

On a local QVM, the speedup is typically 3x to 10x for small circuits. On real hardware through QCS, where compilation is more expensive and network latency adds up, the speedup grows to 50x or more for large circuits with many iterations.

pyQuil supports parametric programs through two mechanisms: the declare / memory map approach (recommended for most users) and Quil-T parameters for pulse-level control. This tutorial covers both in depth.

Setup

Install the Forest SDK (includes Quilc and QVM) and pyQuil:

pip install pyquil

Download and install the Forest SDK from Rigetti’s developer portal. Start Quilc and QVM in separate terminals:

quilc -S   # compiler server on port 5555
qvm -S     # QVM server on port 5000

pyQuil connects to these local servers by default. You can also point to Rigetti QCS endpoints for cloud execution.

Basic pyQuil Program

Before parametrics, a quick reminder of pyQuil’s structure:

from pyquil import Program, get_qc
from pyquil.gates import H, CNOT, MEASURE

prog = Program()
prog += H(0)
prog += CNOT(0, 1)
ro = prog.declare("ro", "BIT", 2)
prog += MEASURE(0, ro[0])
prog += MEASURE(1, ro[1])

qc = get_qc("2q-qvm")
result = qc.run(qc.compile(prog)).get_register_map()["ro"]
print(result)

The Quil Instruction Set

Before diving into parametric programs, it helps to understand what Quil programs actually look like at the text level. Quil (Quantum Instruction Language) is a human-readable assembly language for quantum/classical hybrid computation. Every pyQuil Program object serializes to a Quil string.

A complete Quil program has four main sections:

DECLARE ro BIT[2]              # Classical memory declarations
DECLARE theta REAL[4]          # Parameter declarations

PRAGMA PRESERVE_BLOCK          # Compiler directives (optional)
RZ(theta[0]) 0                 # Gate instructions with parameters
RX(pi/2) 0                    # Gate instructions with constants
CZ 0 1                        # Two-qubit gate instructions
PRAGMA END_PRESERVE_BLOCK

MEASURE 0 ro[0]               # Measurement instructions
MEASURE 1 ro[1]

The DECLARE statements allocate classical memory. Gate instructions specify a gate name, optional parameters in parentheses, and target qubit indices. PRAGMA directives communicate hints to the compiler (for example, telling Quilc not to optimize a particular block). MEASURE instructions read a qubit into a classical bit.

Abstract Quil vs. Native Quil

When you write a pyQuil program using gates like H, CNOT, and RY, you are writing abstract Quil. These gates are hardware-agnostic. Quilc transforms abstract Quil into native Quil, which uses only the gates that the target processor physically implements.

For Rigetti’s Ankaa-class processors, native Quil uses:

RZ(theta) for arbitrary Z-rotations (implemented as virtual frame updates, zero duration)
RX(k * pi/2) for X-rotations restricted to multiples of pi/2
ISWAP for the iSWAP entangling gate, the native two-qubit gate on Ankaa-class chips
CZ for the controlled-Z entangling gate (the QVM’s default target and Rigetti’s newer Cepheus systems; check the device ISA on QCS)

Here is what happens when Quilc compiles a simple parametric circuit:

from pyquil import Program, get_qc
from pyquil.gates import H, CNOT, RY, MEASURE

prog = Program()
theta = prog.declare("theta", "REAL", 2)
prog += H(0)
prog += RY(theta[0], 1)
prog += CNOT(0, 1)
prog += RY(theta[1], 0)

ro = prog.declare("ro", "BIT", 2)
prog += MEASURE(0, ro[0])
prog += MEASURE(1, ro[1])

qc = get_qc("2q-qvm")
native = qc.compiler.quil_to_native_quil(prog)
print(native)

The output looks something like this (exact decomposition varies by Quilc version):

DECLARE ro BIT[2]
DECLARE theta REAL[2]
RZ(pi/2) 0
RX(pi/2) 0
RZ(pi/2) 0
RZ(theta[0]/2) 1
RX(pi/2) 1
RZ(theta[0]/2) 1
RX(-pi/2) 1
CZ 0 1
RZ(theta[1]/2) 0
RX(pi/2) 0
RZ(theta[1]/2) 0
RX(-pi/2) 0
MEASURE 0 ro[0]
MEASURE 1 ro[1]

Notice that H decomposes into an RZ-RX-RZ sequence, RY decomposes into RZ-RX-RZ with the parameter threaded through, and CNOT becomes a combination of single-qubit rotations and CZ. The parametric references (theta[0], theta[1]) survive compilation and appear in the native Quil output. This is the key feature that makes parametric compilation work.

REAL Memory Declarations in Depth

The declare method on a Program object creates a classical memory region. It takes three arguments: the name of the region, the memory type, and the size (number of elements).

prog.declare(name, memory_type, memory_size)

pyQuil supports four memory types:

Memory Type	Quil Keyword	Purpose	Python dtype
`"BIT"`	`BIT`	Measurement results (0 or 1)	`int`
`"REAL"`	`REAL`	Floating-point gate angles	`float64`
`"INTEGER"`	`INTEGER`	Integer data for classical logic	`int`
`"OCTET"`	`OCTET`	Raw bytes for advanced use	`int`

For parametric quantum programs, REAL is the type you use for rotation angles. BIT is the type you use for measurement readout registers.

Each declare call maps to a Quil DECLARE instruction:

theta = prog.declare("theta", "REAL", 4)
# Produces: DECLARE theta REAL[4]

phi = prog.declare("phi", "REAL", 3)
# Produces: DECLARE phi REAL[3]

ro = prog.declare("ro", "BIT", 5)
# Produces: DECLARE ro BIT[5]

Multiple Parameter Groups

For complex ansatze, separating parameters into distinct memory regions makes your code cleaner and your optimizer logic simpler. Consider a multi-layer ansatz where each layer has its own set of rotation angles and you also have separate parameters for single-qubit and two-qubit gates:

from pyquil import Program
from pyquil.gates import RZ, RX, CZ

prog = Program()

# Single-qubit rotation parameters: 4 qubits x 3 layers = 12 values
single_params = prog.declare("single_params", "REAL", 12)

# Two-qubit interaction parameters: 3 edges x 3 layers = 9 values
interaction_params = prog.declare("interaction_params", "REAL", 9)

# Measurement register
ro = prog.declare("ro", "BIT", 4)

# Layer 0: single-qubit rotations on 4 qubits
for q in range(4):
    prog += RZ(single_params[0 * 4 + q], q)
    prog += RX(single_params[0 * 4 + q], q)

# Layer 0: entangling gates on 3 edges
prog += CZ(0, 1)
prog += CZ(1, 2)
prog += CZ(2, 3)

When you run the compiled program, you pass each memory region separately in the memory_map dictionary:

import numpy as np

result = qc.run(exe, memory_map={
    "single_params": np.random.uniform(0, 2 * np.pi, 12),
    "interaction_params": np.random.uniform(0, np.pi, 9),
}).get_register_map()["ro"]

This separation is purely organizational. The Quilc compiler treats all REAL memory regions identically during compilation.

Parametric Programs with `declare` and Memory Maps

In Quil, you can declare a classical memory region of type REAL and reference it as a gate parameter. The circuit is compiled once with a symbolic placeholder; at runtime you fill the memory region with concrete values by passing a memory_map to qc.run.

from pyquil import Program, get_qc
from pyquil.gates import RZ, RX, CZ, MEASURE
import numpy as np

prog = Program()

# Declare a REAL memory region with 4 slots for rotation angles
theta = prog.declare("theta", "REAL", 4)

# Build a 2-qubit VQE-style ansatz using native Rigetti gates
# Layer 1: single-qubit rotations
prog += RZ(theta[0], 0)
prog += RX(theta[1], 0)
prog += RZ(theta[2], 1)
prog += RX(theta[3], 1)

# Entangling layer
prog += CZ(0, 1)

# Layer 2: more single-qubit rotations
prog += RZ(theta[0], 0)
prog += RZ(theta[2], 1)

# Measurement
ro = prog.declare("ro", "BIT", 2)
prog += MEASURE(0, ro[0])
prog += MEASURE(1, ro[1])

prog.wrap_in_numshots_loop(1000)

qc = get_qc("2q-qvm")
native_prog = qc.compiler.quil_to_native_quil(prog)
exe = qc.compiler.native_quil_to_executable(native_prog)

# Now run with different angle values without recompiling
angles_a = np.array([0.1, 0.5, 0.3, 0.7], dtype=np.float64)
angles_b = np.array([1.2, 0.8, 0.4, 1.1], dtype=np.float64)

result_a = qc.run(exe, memory_map={"theta": angles_a}).get_register_map()["ro"]
result_b = qc.run(exe, memory_map={"theta": angles_b}).get_register_map()["ro"]

print("Result A:", result_a.mean(axis=0))
print("Result B:", result_b.mean(axis=0))

The key insight: qc.compiler.quil_to_native_quil and qc.compiler.native_quil_to_executable run once. Subsequent calls to qc.run with different memory_map values execute the pre-compiled binary with new parameters injected at runtime.

Multi-Layer Hardware-Efficient Ansatz

Real variational algorithms use deeper circuits with multiple layers. A hardware-efficient ansatz consists of repeating blocks, where each block applies single-qubit rotations to every qubit followed by an entangling layer of CZ gates between neighboring qubits.

Here is a parametric construction for n=4 qubits and L=3 layers. Each layer uses 4 parameters (one RY rotation per qubit), giving 4 * 3 = 12 total parameters. We use parameter sharing: the same parameter index controls the same qubit across all repetitions within a layer.

from pyquil import Program, get_qc
from pyquil.gates import RY, RZ, CZ, MEASURE
import numpy as np

def build_hardware_efficient_ansatz(n_qubits, n_layers):
    """
    Build a hardware-efficient ansatz with:
    - n_qubits qubits
    - n_layers layers
    - Each layer: RY rotations on all qubits, then CZ on adjacent pairs
    - Total parameters: n_qubits * n_layers
    """
    prog = Program()
    n_params = n_qubits * n_layers
    theta = prog.declare("theta", "REAL", n_params)

    for layer in range(n_layers):
        # Single-qubit rotation layer
        for q in range(n_qubits):
            param_idx = layer * n_qubits + q
            prog += RY(theta[param_idx], q)

        # Entangling layer: CZ on adjacent pairs
        for q in range(n_qubits - 1):
            prog += CZ(q, q + 1)

        # Optional: add a closing RZ layer for expressivity
        for q in range(n_qubits):
            param_idx = layer * n_qubits + q
            prog += RZ(theta[param_idx], q)

    # Measurement
    ro = prog.declare("ro", "BIT", n_qubits)
    for q in range(n_qubits):
        prog += MEASURE(q, ro[q])

    prog.wrap_in_numshots_loop(1000)
    return prog, n_params

# Build for 4 qubits, 3 layers
n_qubits = 4
n_layers = 3
prog, n_params = build_hardware_efficient_ansatz(n_qubits, n_layers)
print(f"Circuit has {n_params} parameters")

# Compile once
qc = get_qc("4q-qvm")
exe = qc.compile(prog)

# Run with random parameters
angles = np.random.uniform(0, 2 * np.pi, n_params)
result = qc.run(exe, memory_map={"theta": angles}).get_register_map()["ro"]
print(f"Bitstring shape: {result.shape}")  # (1000, 4)
print(f"Mean measurements: {result.mean(axis=0)}")

The parameter sharing pattern (reusing theta[param_idx] for both RY and RZ within the same layer) reduces the total parameter count while maintaining expressivity. In practice, you might want independent parameters for each gate. Adjust the indexing to suit your ansatz design.

Gradient Computation with Parameter Shift

Many quantum optimization algorithms benefit from gradient-based optimizers. The parameter-shift rule provides an exact gradient for parametric quantum gates: for a gate of the form exp(-i * theta * G / 2) where G has eigenvalues +/-1, the partial derivative of the expectation value with respect to theta is:

df/d(theta) = [f(theta + pi/2) - f(theta - pi/2)] / 2

This requires evaluating the circuit twice per parameter (once with the angle shifted up by pi/2, once shifted down). With parametric compilation, each evaluation is a fast parameter injection rather than a full recompilation.

from pyquil import Program, get_qc
from pyquil.gates import RY, CZ, MEASURE
import numpy as np

def build_simple_ansatz():
    """2-qubit ansatz with 4 RY parameters."""
    prog = Program()
    theta = prog.declare("theta", "REAL", 4)

    prog += RY(theta[0], 0)
    prog += RY(theta[1], 1)
    prog += CZ(0, 1)
    prog += RY(theta[2], 0)
    prog += RY(theta[3], 1)

    ro = prog.declare("ro", "BIT", 2)
    prog += MEASURE(0, ro[0])
    prog += MEASURE(1, ro[1])
    prog.wrap_in_numshots_loop(2000)
    return prog

def expectation_z0z1(bitstrings):
    """
    Compute <Z0 Z1> from measurement bitstrings.
    Z eigenvalue: 0 -> +1, 1 -> -1
    """
    z0 = 1 - 2 * bitstrings[:, 0]
    z1 = 1 - 2 * bitstrings[:, 1]
    return np.mean(z0 * z1)

def evaluate_cost(exe, qc, params):
    """Run the circuit and return the expectation value."""
    result = qc.run(exe, memory_map={"theta": params}).get_register_map()["ro"]
    return expectation_z0z1(result)

def parameter_shift_gradient(exe, qc, params, shift=np.pi / 2):
    """
    Compute the gradient using the parameter-shift rule.
    For each parameter, evaluate f(theta + shift) and f(theta - shift).
    Total circuit evaluations: 2 * len(params).
    """
    gradient = np.zeros_like(params)
    for i in range(len(params)):
        # Shift parameter i up
        params_plus = params.copy()
        params_plus[i] += shift

        # Shift parameter i down
        params_minus = params.copy()
        params_minus[i] -= shift

        # Evaluate both shifts
        f_plus = evaluate_cost(exe, qc, params_plus)
        f_minus = evaluate_cost(exe, qc, params_minus)

        # Parameter-shift formula
        gradient[i] = (f_plus - f_minus) / 2.0

    return gradient

# Build and compile once
qc = get_qc("2q-qvm")
prog = build_simple_ansatz()
exe = qc.compile(prog)

# Gradient descent loop
params = np.random.uniform(0, 2 * np.pi, 4)
learning_rate = 0.3
n_steps = 30

print("Step | Cost     | Gradient Norm")
print("-----|----------|-------------")
for step in range(n_steps):
    cost = evaluate_cost(exe, qc, params)
    grad = parameter_shift_gradient(exe, qc, params)
    grad_norm = np.linalg.norm(grad)

    print(f"{step:4d} | {cost:+.4f} | {grad_norm:.4f}")

    # Update parameters
    params = params - learning_rate * grad

    # Check convergence
    if grad_norm < 0.01:
        print(f"Converged at step {step}")
        break

print(f"\nFinal parameters: {params}")
print(f"Final cost: {evaluate_cost(exe, qc, params):.4f}")

For 4 parameters, each gradient computation requires 8 circuit evaluations (2 per parameter). Over 30 optimization steps, that is 240 circuit evaluations plus 30 cost evaluations, for 270 total. With parametric compilation, all 270 evaluations use the same compiled executable. Without it, you would compile the circuit 270 times.

Rigetti’s Native Gate Set

Rigetti’s superconducting hardware uses a restricted native gate set. Quilc translates arbitrary unitaries into sequences of these native gates:

Native Gate	Description
`RZ(theta, q)`	Z-rotation by angle theta (virtual, zero duration)
`RX(k * pi/2, q)`	X-rotation by multiples of pi/2 only
`ISWAP(q0, q1)`	iSWAP two-qubit gate (the Ankaa-class native entangler)
`CZ(q0, q1)`	Controlled-Z two-qubit gate (QVM default target, Cepheus systems, older Aspen devices)
`CPHASE(theta, q0, q1)`	Controlled phase gate (older Aspen-class devices)

Rigetti’s Ankaa-3 Topology

Rigetti’s Ankaa-3 processor (84 qubits, launched December 2024) arranges its qubits in a square lattice. Older docs may reference Ankaa-2, which has been retired from QCS. Each qubit connects to up to 4 neighbors (left, right, above, below), with edge qubits having fewer connections. Here is a simplified view of a portion of the Ankaa-3 connectivity:

    0 --- 1 --- 2 --- 3 --- 4
    |     |     |     |     |
    5 --- 6 --- 7 --- 8 --- 9
    |     |     |     |     |
   10 ---11 ---12 ---13 ---14
    |     |     |     |     |
   15 ---16 ---17 ---18 ---19
    |     |     |     |     |
   20 ---21 ---22 ---23 ---24

Each line segment represents a direct CZ (or ISWAP) connection. Qubit 0 can interact directly with qubits 1 and 5. Qubit 6 can interact directly with qubits 1, 5, 7, and 11. Two-qubit gates between non-adjacent qubits require SWAP routing.

Consider a 4-qubit circuit that needs a CZ between qubit 0 and qubit 6. On this topology, qubits 0 and 6 are not adjacent (0 connects to 1 and 5; 6 connects to 1, 5, 7, and 11). Quilc inserts SWAP gates to route the qubit states:

from pyquil import Program, get_qc
from pyquil.gates import H, CZ, MEASURE

prog = Program()
# Attempt a CZ between logical qubits 0 and 6
# On Ankaa-3, these may or may not be adjacent depending on
# the physical qubit assignment Quilc chooses
prog += H(0)
prog += H(6)
prog += CZ(0, 6)

ro = prog.declare("ro", "BIT", 2)
prog += MEASURE(0, ro[0])
prog += MEASURE(6, ro[1])

# Use the Ankaa-3 device (or QVM with its topology)
# qc = get_qc("Ankaa-3")  # real hardware
qc = get_qc("9q-square-qvm")  # QVM with square topology

native = qc.compiler.quil_to_native_quil(prog)
print("Native program with routing:")
print(native)

In the native output, you will see that Quilc has mapped logical qubits to physical qubits and potentially inserted SWAP operations (decomposed into CZ and single-qubit gates) to bring non-adjacent qubits together. This routing overhead is why minimizing long-range interactions in your ansatz design matters for real hardware performance.

Topology-Aware Ansatz Design

When writing circuits for Ankaa-3, restrict your two-qubit gates to pairs that are physically adjacent. A linear nearest-neighbor entangling pattern (CZ between qubits q and q+1 for consecutive physical qubits) avoids all SWAP overhead:

from pyquil import Program
from pyquil.gates import RY, CZ

prog = Program()
theta = prog.declare("theta", "REAL", 8)

# Use physical qubits that form a line on the lattice
# For example, qubits 0, 1, 2, 3 on the top row of Ankaa-3
physical_qubits = [0, 1, 2, 3]

for i, q in enumerate(physical_qubits):
    prog += RY(theta[i], q)

# Only CZ between adjacent pairs: no SWAP needed
for i in range(len(physical_qubits) - 1):
    prog += CZ(physical_qubits[i], physical_qubits[i + 1])

for i, q in enumerate(physical_qubits):
    prog += RY(theta[4 + i], q)

Understanding Quilc Optimization

Quilc does more than just translate gates. It applies several optimization passes to reduce the compiled circuit depth and gate count. Understanding these passes helps you write circuits that compile efficiently.

Key Optimization Passes

Gate fusion: When multiple single-qubit gates act on the same qubit in sequence, Quilc fuses them into a single RZ-RX-RZ decomposition. Three arbitrary rotations on the same qubit become one RZ-RX-RZ sequence.

Dead gate elimination: If a sequence of gates computes to the identity (for example, RZ(pi) followed by RZ(-pi)), Quilc removes the pair entirely.

Qubit mapping: Quilc assigns logical qubits to physical qubits on the target device, choosing an assignment that minimizes the number of SWAP gates needed for the circuit’s two-qubit interactions.

Here is a before-and-after comparison:

from pyquil import Program, get_qc
from pyquil.gates import RY, RX, RZ, H, CNOT, MEASURE

prog = Program()
theta = prog.declare("theta", "REAL", 2)

# Deliberately inefficient: multiple redundant single-qubit gates
prog += H(0)
prog += RZ(theta[0], 0)
prog += RX(0.5, 0)
prog += RZ(0.3, 0)
prog += RY(theta[1], 1)
prog += RX(0.0, 1)  # Identity rotation, should be eliminated
prog += CNOT(0, 1)

ro = prog.declare("ro", "BIT", 2)
prog += MEASURE(0, ro[0])
prog += MEASURE(1, ro[1])

qc = get_qc("2q-qvm")
native = qc.compiler.quil_to_native_quil(prog)

print("Original program (abstract Quil):")
print(prog)
print("\nCompiled program (native Quil):")
print(native)

In the compiled output, Quilc fuses the H, RZ, RX, and RZ on qubit 0 into a single RZ-RX-RZ sequence. The RX(0.0) on qubit 1 (which is an identity operation) gets absorbed into the surrounding gates. The CNOT decomposes into single-qubit rotations and a CZ gate.

Counting Gate Reductions

You can count gates programmatically:

def count_gates(program):
    """Count the number of gate instructions in a program."""
    count = 0
    for inst in program.instructions:
        if hasattr(inst, 'name') and inst.name not in ('DECLARE', 'MEASURE',
                                                         'PRAGMA', 'HALT'):
            count += 1
    return count

original_gates = count_gates(prog)
native_gates = count_gates(native)
print(f"Original gates: {original_gates}")
print(f"Native gates: {native_gates}")

Compiling for the Ankaa Topology

When you target a real device name instead of the QVM, Quilc automatically uses the device’s connectivity and native gate set:

from pyquil import Program, get_qc
from pyquil.gates import RY, CZ, MEASURE
import numpy as np

prog = Program()
theta = prog.declare("theta", "REAL", 3)

# A 3-qubit ansatz
prog += RY(theta[0], 0)
prog += RY(theta[1], 1)
prog += RY(theta[2], 2)
prog += CZ(0, 1)
prog += CZ(1, 2)
prog += RY(theta[0], 0)
prog += RY(theta[1], 1)
prog += RY(theta[2], 2)

ro = prog.declare("ro", "BIT", 3)
prog += MEASURE(0, ro[0])
prog += MEASURE(1, ro[1])
prog += MEASURE(2, ro[2])
prog.wrap_in_numshots_loop(500)

# Connect to QCS for real hardware -- requires QCS credentials
# qc = get_qc("Ankaa-3")
# For local testing use:
qc = get_qc("3q-qvm")

native = qc.compiler.quil_to_native_quil(prog)
print("Native Quil program:")
print(native)

exe = qc.compiler.native_quil_to_executable(native)

angles = np.array([np.pi / 4, np.pi / 3, np.pi / 6])
result = qc.run(exe, memory_map={"theta": angles}).get_register_map()["ro"]
print("Measurement counts:", result.sum(axis=0))

Printing native shows exactly which RZ, RX, CZ, and ISWAP instructions Quilc generated, useful for understanding gate overhead and optimizing your ansatz to minimize two-qubit gate count.

Expectation Values from Pauli Operators

VQE and other variational algorithms compute energy expectation values from measurement results. The Hamiltonian is expressed as a sum of Pauli terms (like Z0, Z1, Z0Z1, X0X1), and each term’s expectation value is estimated from bitstring statistics.

For the Z operator, the eigenvalue mapping is: measuring |0> gives +1, measuring |1> gives -1. For multi-qubit Pauli-Z terms, you multiply the individual Z eigenvalues.

import numpy as np

def expectation_z(bitstrings, qubit_idx):
    """
    Compute <Z_i> from measurement bitstrings.
    bitstrings: numpy array of shape (n_shots, n_qubits), values 0 or 1
    qubit_idx: which qubit to compute the Z expectation for
    """
    # Map 0 -> +1, 1 -> -1
    z_values = 1 - 2 * bitstrings[:, qubit_idx]
    return np.mean(z_values)

def expectation_zz(bitstrings, qubit_i, qubit_j):
    """
    Compute <Z_i Z_j> from measurement bitstrings.
    Product of individual Z eigenvalues.
    """
    zi = 1 - 2 * bitstrings[:, qubit_i]
    zj = 1 - 2 * bitstrings[:, qubit_j]
    return np.mean(zi * zj)

def expectation_from_counts(bitstrings, qubit_i, qubit_j):
    """
    Alternative computation using probability counts.
    <Z_i Z_j> = P(00) + P(11) - P(01) - P(10)
    where the bits refer to qubits i and j.
    """
    n_shots = len(bitstrings)
    bi = bitstrings[:, qubit_i]
    bj = bitstrings[:, qubit_j]

    p00 = np.sum((bi == 0) & (bj == 0)) / n_shots
    p01 = np.sum((bi == 0) & (bj == 1)) / n_shots
    p10 = np.sum((bi == 1) & (bj == 0)) / n_shots
    p11 = np.sum((bi == 1) & (bj == 1)) / n_shots

    return p00 + p11 - p01 - p10

# Example: compute energy for H = 0.5*Z0 + 0.3*Z1 + 0.2*Z0*Z1
def compute_energy(bitstrings):
    """Compute <H> = 0.5*<Z0> + 0.3*<Z1> + 0.2*<Z0 Z1>."""
    ez0 = expectation_z(bitstrings, 0)
    ez1 = expectation_z(bitstrings, 1)
    ezz = expectation_zz(bitstrings, 0, 1)
    return 0.5 * ez0 + 0.3 * ez1 + 0.2 * ezz

# Use with a parametric circuit
from pyquil import Program, get_qc
from pyquil.gates import RY, CZ, MEASURE

prog = Program()
theta = prog.declare("theta", "REAL", 4)

prog += RY(theta[0], 0)
prog += RY(theta[1], 1)
prog += CZ(0, 1)
prog += RY(theta[2], 0)
prog += RY(theta[3], 1)

ro = prog.declare("ro", "BIT", 2)
prog += MEASURE(0, ro[0])
prog += MEASURE(1, ro[1])
prog.wrap_in_numshots_loop(4000)

qc = get_qc("2q-qvm")
exe = qc.compile(prog)

# Evaluate energy at a specific parameter point
params = np.array([0.5, 1.2, 0.8, 0.3])
bitstrings = qc.run(exe, memory_map={"theta": params}).get_register_map()["ro"]

energy = compute_energy(bitstrings)
print(f"<Z0> = {expectation_z(bitstrings, 0):.4f}")
print(f"<Z1> = {expectation_z(bitstrings, 1):.4f}")
print(f"<Z0 Z1> = {expectation_zz(bitstrings, 0, 1):.4f}")
print(f"Energy <H> = {energy:.4f}")

For Pauli-X and Pauli-Y terms, you need to apply a basis rotation before measurement. To measure in the X basis, apply a Hadamard gate before the MEASURE instruction. To measure in the Y basis, apply RX(-pi/2) before MEASURE. Each Pauli term may require a separate circuit (or a separate set of basis rotations), though terms that commute can share measurements.

A VQE-Style Optimization Loop

Putting it all together: compile once, optimize in a loop.

from pyquil import Program, get_qc
from pyquil.gates import RY, RZ, CZ, MEASURE
from scipy.optimize import minimize
import numpy as np

def build_ansatz():
    prog = Program()
    theta = prog.declare("theta", "REAL", 4)
    prog += RZ(theta[0], 0)
    prog += RY(theta[1], 0)
    prog += RZ(theta[2], 1)
    prog += RY(theta[3], 1)
    prog += CZ(0, 1)
    prog += RY(theta[1], 0)
    prog += RY(theta[3], 1)
    ro = prog.declare("ro", "BIT", 2)
    prog += MEASURE(0, ro[0])
    prog += MEASURE(1, ro[1])
    prog.wrap_in_numshots_loop(500)
    return prog

qc = get_qc("2q-qvm")
prog = build_ansatz()
exe = qc.compiler.native_quil_to_executable(
    qc.compiler.quil_to_native_quil(prog)
)

def cost(params):
    result = qc.run(exe, memory_map={"theta": params}).get_register_map()["ro"]
    # Simple cost: minimize <Z0 x Z1> = 1 - 2*(P(01) + P(10))
    counts = result
    n = len(counts)
    excited = ((counts[:, 0] != counts[:, 1]).sum()) / n
    return excited

x0 = np.random.uniform(0, 2 * np.pi, 4)
res = minimize(cost, x0, method="COBYLA", options={"maxiter": 100})
print(f"Optimized cost: {res.fun:.4f}")
print(f"Optimal angles: {res.x}")

Because exe is compiled once outside the optimization loop, the cost function body is fast: just runtime parameter injection and measurement.

Batched VQE with Convergence Tracking

The simple optimization loop above works, but a production VQE implementation needs convergence monitoring, timing information, and proper termination criteria. Here is a more complete version:

import time
import numpy as np
from pyquil import Program, get_qc
from pyquil.gates import RY, RZ, CZ, MEASURE
from scipy.optimize import minimize

def build_vqe_ansatz(n_qubits=2, n_layers=2):
    """Build a layered ansatz and return the program and parameter count."""
    prog = Program()
    n_params = n_qubits * n_layers * 2  # RZ and RY per qubit per layer
    theta = prog.declare("theta", "REAL", n_params)

    idx = 0
    for layer in range(n_layers):
        for q in range(n_qubits):
            prog += RZ(theta[idx], q)
            idx += 1
            prog += RY(theta[idx], q)
            idx += 1
        for q in range(n_qubits - 1):
            prog += CZ(q, q + 1)

    ro = prog.declare("ro", "BIT", n_qubits)
    for q in range(n_qubits):
        prog += MEASURE(q, ro[q])
    prog.wrap_in_numshots_loop(2000)
    return prog, n_params

def compute_zz_energy(bitstrings):
    """Compute <Z0 Z1> as the cost function."""
    z0 = 1 - 2 * bitstrings[:, 0]
    z1 = 1 - 2 * bitstrings[:, 1]
    return np.mean(z0 * z1)

# Build and compile
qc = get_qc("2q-qvm")
prog, n_params = build_vqe_ansatz(n_qubits=2, n_layers=2)

compile_start = time.time()
exe = qc.compile(prog)
compile_time = time.time() - compile_start
print(f"Compilation time: {compile_time:.3f}s (one-time cost)")

# Track optimization progress
cost_history = []
eval_times = []

def tracked_cost(params):
    """Cost function with timing and history tracking."""
    start = time.time()
    result = qc.run(exe, memory_map={"theta": params}).get_register_map()["ro"]
    elapsed = time.time() - start
    eval_times.append(elapsed)

    energy = compute_zz_energy(result)
    cost_history.append(energy)
    return energy

# Run COBYLA optimization
x0 = np.random.uniform(0, 2 * np.pi, n_params)

opt_start = time.time()
res = minimize(
    tracked_cost,
    x0,
    method="COBYLA",
    options={
        "maxiter": 200,
        "rhobeg": 0.5,   # Initial step size
        "catol": 1e-4,    # Constraint absolute tolerance
    },
)
opt_time = time.time() - opt_start

# Print results
print(f"\nOptimization complete:")
print(f"  Final energy: {res.fun:+.4f}")
print(f"  Function evaluations: {res.nfev}")
print(f"  Total optimization time: {opt_time:.2f}s")
print(f"  Average eval time: {np.mean(eval_times):.4f}s")
print(f"  Compile time (amortized over {res.nfev} evals): "
      f"{compile_time / res.nfev:.5f}s per eval")

# Print convergence trace
print(f"\nConvergence trace (every 10 evals):")
for i in range(0, len(cost_history), 10):
    print(f"  Eval {i:4d}: energy = {cost_history[i]:+.4f}")

The key performance insight: the compilation time is paid once, and the per-evaluation time stays constant regardless of how many iterations the optimizer needs. For a 200-iteration optimization, the amortized compilation cost per evaluation drops to nearly zero.

Submitting to Rigetti QCS

For real hardware, swap the QVM endpoint for a QCS device:

from pyquil import get_qc

# Requires ~/.qcs/settings.toml with valid credentials
qc = get_qc("Ankaa-3")
exe = qc.compiler.native_quil_to_executable(
    qc.compiler.quil_to_native_quil(prog)
)
result = qc.run(exe, memory_map={"theta": angles})

QCS manages job queuing, calibration data, and readout error mitigation. The get_qc abstraction handles the difference between QVM and QCS transparently.

Error Handling and QCS Integration

Working with Quilc and QCS introduces several practical issues that you should handle in production code.

QCS Credentials and Environment

Rigetti QCS requires authentication credentials stored in ~/.qcs/settings.toml. You can also set credentials through environment variables:

# Set QCS credentials via environment variables
export QCS_SETTINGS_APPLICATIONS_PYQUIL_QVM_URL="http://localhost:5000"
export QCS_SETTINGS_APPLICATIONS_PYQUIL_QUILC_URL="tcp://localhost:5555"

To list available devices from QCS:

from pyquil import list_quantum_computers

# List all quantum computers available to your account
available = list_quantum_computers()
print("Available devices:", available)

QVM vs. Real Hardware

The distinction between simulation and real hardware is a single string:

from pyquil import get_qc

# Local QVM simulation (no credentials needed)
qc_sim = get_qc("2q-qvm")

# QVM with realistic noise model from a real device
qc_noisy = get_qc("Ankaa-3", as_qvm=True)

# Real Ankaa-3 hardware (requires QCS credentials)
# qc_real = get_qc("Ankaa-3")

Using as_qvm=True with a real device name gives you a QVM simulator that uses the device’s actual noise model and connectivity. This is useful for testing before spending QPU time.

Handling Compiler Errors

Quilc raises errors when it encounters programs it cannot compile. Common causes include referencing qubit indices outside the device topology or using unsupported gate types:

from pyquil import Program, get_qc
from pyquil.gates import RY, CZ

try:
    prog = Program()
    theta = prog.declare("theta", "REAL", 1)
    # Qubit 999 does not exist on a 2-qubit QVM
    prog += RY(theta[0], 999)

    qc = get_qc("2q-qvm")
    native = qc.compiler.quil_to_native_quil(prog)
except Exception as e:
    print(f"Compiler error: {e}")
    # Handle gracefully: log the error, try a different qubit mapping, etc.

Always validate your qubit indices against the target device before compilation. For real devices, query the device topology to get the list of valid qubit indices and their connections.

Quil-T Parameters and Pulse-Level Parametrics

pyQuil also exposes Quil-T (Quil-Timing), a pulse-level extension of Quil that gives you direct control over the microwave pulses applied to qubits. While the declare / REAL memory approach parametrizes at the gate level, Quil-T and the Parameter class from pyquil.quilatom let you parametrize at the pulse level.

Why Pulse-Level Control Matters

Standard gate-level programming limits you to the gate set that Quilc knows about. With Quil-T, you can:

Define custom gate calibrations with specific pulse shapes
Vary pulse amplitude, duration, or frequency as continuous parameters
Implement optimal control pulses that do not decompose cleanly into standard gates
Run randomized benchmarking and tomography experiments with fine-grained timing

Quil-T Basics

Quil-T extends Quil with timing directives, waveform definitions, and pulse instructions. A simple Quil-T program looks like this at the Quil text level:

DEFFRAME 0 "rf":
    DIRECTION: "tx"
    INITIAL-FREQUENCY: 5.2e9
    CENTER-FREQUENCY: 5.2e9
    HARDWARE-OBJECT: "q0_rf"

DEFWAVEFORM my_gaussian:
    1.0, 0.9, 0.6, 0.3, 0.1

PULSE 0 "rf" my_gaussian

Using Quil-T Parameters in pyQuil

The Parameter class from pyquil.quilatom creates a symbolic parameter for pulse-level programs (it prints as %name in Quil text). Here is an example that defines a custom single-qubit rotation using a flat pulse with a parametric amplitude:

from pyquil import Program
from pyquil.quilbase import (
    DefFrame,
    DefWaveform,
    Pulse,
    FormalArgument,
)
from pyquil.quilatom import Frame, Parameter
from pyquil.quiltwaveforms import FlatWaveform
import numpy as np

# Create a parametric amplitude
amp = Parameter("amplitude")

# Build a Quil-T program with parametric pulse amplitude
prog = Program()

# Declare measurement readout
ro = prog.declare("ro", "BIT", 1)

# Define a flat waveform with parametric amplitude and 40ns duration
# The actual pulse shape depends on your calibration data
waveform = FlatWaveform(duration=40e-9, iq=amp + 0j)

# In practice, you would define a DEFCAL (gate calibration) that uses
# this waveform, then call that gate in your program. Note that template
# waveform parameters must be resolved at compile time; for run-time
# sweeps of scale or frequency, combine REAL memory regions with Quil-T
# instructions like SET-SCALE and SHIFT-FREQUENCY.

This is an advanced feature relevant to quantum hardware researchers doing calibration and optimal control. For most variational algorithm work, the declare / REAL memory approach is simpler and sufficient.

When to Use Gate-Level vs. Pulse-Level Parametrics

Feature	`declare` / REAL	Quil-T parameters
Ease of use	Simple, recommended for most users	Complex, requires calibration knowledge
Parameter type	Gate rotation angles	Pulse amplitudes, durations, frequencies
Compilation	Quilc handles gate decomposition	You define the pulse directly
Use case	VQE, QAOA, variational algorithms	Calibration, optimal control, benchmarking
Hardware requirement	Any Rigetti device	Requires pulse-level access on QCS

Common Mistakes

Here are the most frequent pitfalls when working with pyQuil parametric programs, along with how to avoid each one.

1. Forgetting `wrap_in_numshots_loop`

If you compile a program without calling wrap_in_numshots_loop, the circuit executes exactly once per qc.run call. You get a single bitstring instead of the 1000 you expected.

# Wrong: only 1 shot
prog = Program()
prog += RY(theta[0], 0)
ro = prog.declare("ro", "BIT", 1)
prog += MEASURE(0, ro[0])
exe = qc.compile(prog)
result = qc.run(exe).get_register_map()["ro"]
print(result.shape)  # (1, 1) -- only one shot!

# Correct: 1000 shots
prog.wrap_in_numshots_loop(1000)
exe = qc.compile(prog)
result = qc.run(exe).get_register_map()["ro"]
print(result.shape)  # (1000, 1)

Always call wrap_in_numshots_loop before compilation.

2. Modifying the Program After Compilation

The compiled executable is a snapshot of the program at compilation time. Adding gates to the original Program object after compilation has no effect on the executable.

prog = Program()
theta = prog.declare("theta", "REAL", 2)
prog += RY(theta[0], 0)
ro = prog.declare("ro", "BIT", 1)
prog += MEASURE(0, ro[0])
prog.wrap_in_numshots_loop(1000)

exe = qc.compile(prog)

# This does NOTHING to the compiled executable
prog += RY(theta[1], 0)

# The executable still runs the original 1-gate circuit
result = qc.run(exe, memory_map={"theta": [0.5, 1.0]}).get_register_map()["ro"]

If you need to change the circuit structure, you must recompile.

3. Wrong dtype for memory_map Values

The memory_map values must be numpy arrays with float64 dtype for REAL parameters. Passing Python lists sometimes works, but passing integers or wrong dtypes causes silent errors or crashes.

import numpy as np

# Wrong: Python list of ints
result = qc.run(exe, memory_map={"theta": [1, 2, 3, 4]})

# Wrong: float32 array
result = qc.run(exe, memory_map={"theta": np.array([1.0, 2.0], dtype=np.float32)})

# Correct: float64 numpy array
result = qc.run(exe, memory_map={"theta": np.array([1.0, 2.0], dtype=np.float64)})

# Also correct: numpy defaults to float64
result = qc.run(exe, memory_map={"theta": np.array([1.0, 2.0])})

4. Using get_qc(“Ankaa-3”) Without Credentials

Attempting to connect to real hardware without QCS credentials configured raises a connection error. Always test with the QVM first.

from pyquil import get_qc

# This fails without ~/.qcs/settings.toml
try:
    qc = get_qc("Ankaa-3")
except Exception as e:
    print(f"QCS connection failed: {e}")
    print("Falling back to QVM")
    qc = get_qc("2q-qvm")

5. Not Resetting Qubits in Mid-Circuit Reuse

If your program reuses qubits (measures and then applies more gates), you must explicitly reset the qubit first. Without a RESET, the qubit retains its post-measurement state.

from pyquil.gates import RESET

prog = Program()
theta = prog.declare("theta", "REAL", 2)
ro = prog.declare("ro", "BIT", 2)

# First use of qubit 0
prog += RY(theta[0], 0)
prog += MEASURE(0, ro[0])

# Reset before reuse
prog += RESET(0)

# Second use of qubit 0
prog += RY(theta[1], 0)
prog += MEASURE(0, ro[1])

6. Using Unsupported Gate Types

Quilc supports a specific set of abstract gates. If you construct a custom unitary using DefGate without providing a native decomposition, Quilc may fail to compile it. Stick to standard gates (H, CNOT, RX, RY, RZ, CZ, SWAP, etc.) or provide explicit decompositions for custom gates.

from pyquil.quilbase import DefGate
import numpy as np

# Define a custom gate
sqrt_x_matrix = np.array([
    [0.5 + 0.5j, 0.5 - 0.5j],
    [0.5 - 0.5j, 0.5 + 0.5j]
])
sqrt_x_def = DefGate("SQRT-X", sqrt_x_matrix)
SQRT_X = sqrt_x_def.get_constructor()

prog = Program()
prog += sqrt_x_def  # Must include the definition in the program
prog += SQRT_X(0)

# Quilc can handle this because it can decompose any single-qubit
# unitary into RZ-RX-RZ. For multi-qubit custom gates, provide
# the decomposition yourself.

Key Takeaways

Compile once with quil_to_native_quil and native_quil_to_executable, then pass parameters via memory_map at runtime to avoid compilation overhead in optimization loops.
For a 500-iteration optimization, parametric compilation reduces total compilation time from hundreds of seconds to a single compilation of about 2 seconds.
Quilc automatically maps circuits to Rigetti’s native gate set (RZ, RX, CZ, ISWAP) and handles qubit routing for the Ankaa topology.
Use separate DECLARE regions for different parameter groups (single-qubit angles, two-qubit angles, layer-specific parameters) to keep your code organized.
The parameter-shift rule provides exact gradients at a cost of 2 circuit evaluations per parameter, and all evaluations reuse the same compiled executable.
Compute expectation values from bitstrings by mapping measurement outcomes to Pauli eigenvalues: |0> maps to +1 and |1> maps to -1 for Z-basis measurements.
Design your ansatz to match the Ankaa-3 square lattice topology, restricting CZ gates to adjacent qubit pairs to avoid SWAP routing overhead.
The QVM is a perfect local stand-in for developing and debugging before spending real QCS credits. Use as_qvm=True with a real device name for noise-aware simulation.
VQE, QAOA, and other variational algorithms map naturally onto this compile-once-run-many pattern.
For pulse-level parametrics, Quil-T provides fine-grained control over microwave pulses, useful for calibration and optimal control research.