Using PennyLane with Amazon Braket
Connect PennyLane to Amazon Braket backends to run hybrid quantum-classical workflows, from local simulation to IonQ trapped-ion hardware.
Why PennyLane on Braket?
PennyLane and Amazon Braket solve different problems, and the combination is more powerful than either tool alone.
PennyLane is a differentiable quantum programming framework built for gradient-based optimization. It treats quantum circuits as differentiable functions, which means you can compute gradients of quantum computations the same way PyTorch computes gradients of neural networks. This makes it the natural choice for variational quantum algorithms, quantum machine learning, and any workflow where you need to optimize circuit parameters.
Amazon Braket is a managed quantum computing service that provides access to hardware from multiple vendors through a single API. Instead of setting up separate accounts with IonQ, Rigetti, IQM, and QuEra, you access all of them through your AWS account.
The PennyLane-Braket plugin bridges these two systems, and the combination delivers three concrete advantages:
- Write once, run anywhere. You define your circuit in PennyLane and swap the backend device without touching the circuit code. The same QNode runs on a local simulator during development, a cloud simulator for larger tests, and a trapped-ion QPU for the real experiment.
- Automatic gradient computation on real hardware. PennyLane implements the parameter-shift rule, which computes exact gradients by evaluating the circuit at shifted parameter values. This works on any backend that returns expectation values, including physical QPUs. You do not need finite-difference approximations or simulator-specific backpropagation.
- Clean integration with ML frameworks. PennyLane QNodes can operate as PyTorch modules, JAX functions, or TensorFlow layers. This means you can build hybrid classical-quantum models where a neural network feeds parameters into a quantum circuit, and gradients flow through the entire pipeline. Running those quantum circuits on Braket hardware requires no changes to the classical ML code.
Installation
pip install amazon-braket-pennylane-plugin amazon-braket-sdk pennylane
This installs three packages:
- pennylane: the core differentiable quantum framework
- amazon-braket-sdk: the Python SDK for Amazon Braket
- amazon-braket-pennylane-plugin: the bridge that registers Braket devices with PennyLane
After installation, PennyLane automatically recognizes the Braket device strings (braket.local.qubit, braket.aws.qubit). No manual registration is needed.
For PyTorch integration (covered later in this tutorial), also install:
pip install torch
Understanding Braket Device Types
The plugin provides access to several distinct backends. Choosing the right one depends on your circuit size, whether you need noise simulation, and how much you want to spend.
Device Comparison Table
| Device | Type | Max Qubits | Cost | Best For |
|---|---|---|---|---|
| braket.local.qubit | Local simulator | ~25 (RAM limited) | Free | Development, debugging, unit tests |
| SV1 | Cloud statevector | 34 | $0.00075/qubit/circuit | Medium circuits, exact simulation |
| TN1 | Cloud tensor network | 50 | $0.00075/qubit/circuit | Shallow, wide circuits |
| DM1 | Cloud density matrix | 17 | $0.000075/qubit/circuit | Noise simulation |
| IonQ Aria | Trapped-ion QPU | 25 | ~$0.01/shot | High-fidelity experiments, all-to-all connectivity |
| Rigetti Ankaa-2 | Superconducting QPU | 84 | ~$0.0009/shot | Larger circuits, fast repetition rates |
| IQM Garnet | Superconducting QPU | 20 | ~$0.00145/shot | European availability, moderate circuit depths |
Local Simulator: braket.local.qubit
The local simulator runs entirely on your machine. It requires no AWS credentials and no internet connection.
import pennylane as qml
# No AWS setup needed. Runs on your CPU.
dev_local = qml.device("braket.local.qubit", wires=4, shots=1000)
Use this device for all development and debugging. It is fast for small circuits (under 20 qubits), free, and gives you immediate feedback. The practical qubit limit depends on your available RAM: statevector simulation requires 2^n complex amplitudes, so 25 qubits need about 1 GB and 30 qubits need about 32 GB.
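The memory scaling is easy to check with a back-of-the-envelope calculation. The sketch below assumes complex128 amplitudes (16 bytes each) and a ~2x workspace factor for intermediate buffers, which is typical for statevector simulators and matches the figures above:

```python
def statevector_memory_gib(n_qubits, bytes_per_amplitude=16, workspace_factor=2):
    """Estimate RAM needed for statevector simulation of n_qubits.

    Assumes complex128 amplitudes (16 bytes each) and a ~2x workspace
    factor for intermediate buffers; actual simulators vary.
    """
    amplitudes = 2 ** n_qubits
    return amplitudes * bytes_per_amplitude * workspace_factor / 2 ** 30

for n in (20, 25, 30):
    print(f"{n} qubits: ~{statevector_memory_gib(n):.2f} GiB")
```

At 25 qubits this lands at about 1 GiB and at 30 qubits about 32 GiB, which is why a laptop tops out in the mid-20s.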
Cloud Simulators: SV1, TN1, DM1
The cloud simulators run on AWS infrastructure and can handle larger circuits than your laptop. All three require AWS credentials and an S3 bucket for result storage.
# SV1: statevector simulator, up to 34 qubits
dev_sv1 = qml.device(
"braket.aws.qubit",
device_arn="arn:aws:braket:::device/quantum-simulator/amazon/sv1",
wires=10,
shots=1000,
s3_destination_folder=("my-braket-bucket", "sv1-results"),
)
# TN1: tensor network simulator, up to 50 qubits
# Best for circuits with limited entanglement depth
dev_tn1 = qml.device(
"braket.aws.qubit",
device_arn="arn:aws:braket:::device/quantum-simulator/amazon/tn1",
wires=30,
shots=1000,
s3_destination_folder=("my-braket-bucket", "tn1-results"),
)
# DM1: density matrix simulator, up to 17 qubits
# Supports noise models for realistic hardware simulation
dev_dm1 = qml.device(
"braket.aws.qubit",
device_arn="arn:aws:braket:::device/quantum-simulator/amazon/dm1",
wires=8,
shots=1000,
s3_destination_folder=("my-braket-bucket", "dm1-results"),
)
SV1 is the general-purpose choice. It performs exact statevector simulation, which means it tracks all 2^n amplitudes and gives you exact expectation values (when shots=None) or sampled measurement outcomes (when shots is specified).
TN1 uses tensor network contraction, which makes it efficient for circuits where entanglement does not grow too fast. It handles up to 50 qubits but may be slow for highly entangled circuits. Use it when your circuit is wide (many qubits) but shallow (few layers of entangling gates).
DM1 tracks the full density matrix (2^n x 2^n), which means it can simulate noise channels like depolarizing noise and amplitude damping. The qubit limit is lower (17 qubits) because the density matrix has quadratically more entries than the statevector.
QPU Devices
To run on real quantum hardware, use the device ARN for the specific QPU.
# IonQ Aria trapped-ion QPU (us-east-1 region)
dev_ionq = qml.device(
"braket.aws.qubit",
device_arn="arn:aws:braket:us-east-1::device/qpu/ionq/Aria-1",
wires=4,
shots=1000,
s3_destination_folder=("my-braket-bucket", "ionq-results"),
)
# Rigetti Ankaa-2 superconducting QPU (us-west-1 region)
dev_rigetti = qml.device(
"braket.aws.qubit",
device_arn="arn:aws:braket:us-west-1::device/qpu/rigetti/Ankaa-2",
wires=4,
shots=1000,
s3_destination_folder=("my-braket-bucket", "rigetti-results"),
)
The key point: the same QNode (quantum function) runs on any of these devices. You switch backends by changing the device, not the circuit.
S3 Bucket Requirement
Every AWS-backed device (cloud simulators and QPUs) requires an S3 bucket to store results. This is because Braket tasks are asynchronous: when you submit a circuit, Braket queues it, runs it, and writes the results to S3. Your PennyLane session then polls S3 to retrieve those results.
This design means your results persist even if your local session crashes or disconnects. You can also inspect raw results in S3 after the fact, which is useful for debugging.
Creating an S3 Bucket for Braket
# Create a bucket in the same region as your target device
aws s3 mb s3://my-braket-bucket --region us-east-1
A few rules to keep in mind:
- The bucket must be in the same AWS region as the device you are targeting. IonQ Aria runs in us-east-1, so your bucket must also be in us-east-1.
- The bucket name must be globally unique across all of AWS.
- Your IAM user or role needs s3:PutObject and s3:GetObject permissions on the bucket, plus braket:* permissions for submitting tasks.
You pass the bucket information as a tuple of (bucket_name, key_prefix):
s3_destination_folder = ("my-braket-bucket", "experiment-2026-04")
Braket writes results under the given prefix, so you can organize experiments by date or project.
Defining a QNode
A QNode is PennyLane’s core abstraction: a quantum function bound to a specific device. When you call a QNode, PennyLane compiles the quantum function into instructions for the target device, executes it, and returns classical results.
Here we build a simple variational circuit with two layers of parameterized rotations and an entangling layer between them.
import pennylane as qml
import numpy as np
dev = qml.device("braket.local.qubit", wires=4, shots=1000)
@qml.qnode(dev)
def circuit(params):
# First layer: parameterized Y-rotations on each qubit
for i in range(4):
qml.RY(params[i], wires=i)
# Entangling layer: chain of CNOTs creates correlations
qml.CNOT(wires=[0, 1])
qml.CNOT(wires=[1, 2])
qml.CNOT(wires=[2, 3])
# Second layer: another set of Y-rotations
for i in range(4):
qml.RY(params[4 + i], wires=i)
# Measure the ZZ correlation between first and last qubit
return qml.expval(qml.PauliZ(0) @ qml.PauliZ(3))
# Evaluate the circuit with random parameters
params = np.random.uniform(0, np.pi, size=8)
result = circuit(params)
print(f"Expectation value: {result:.4f}")
The circuit has 8 trainable parameters (two per qubit across two layers). The return value is the expectation value of Z_0 Z_3, which measures the correlation between qubits 0 and 3. This value ranges from -1 (anti-correlated) to +1 (correlated).
Gradient Computation via Parameter Shift
PennyLane supports automatic differentiation of quantum circuits. On hardware and shot-based simulators, it uses the parameter-shift rule to compute exact analytical gradients.
The parameter-shift rule works as follows: for a gate of the form R(theta) = exp(-i * theta * G / 2), the derivative of the expectation value with respect to theta is:
d/d(theta) <circuit(theta)> = [ <circuit(theta + pi/2)> - <circuit(theta - pi/2)> ] / 2
Each parameter requires two extra circuit evaluations (one shifted by +pi/2 and one by -pi/2). PennyLane handles this automatically.
# Compute gradient with respect to all 8 parameters
grad_fn = qml.grad(circuit)
gradients = grad_fn(params)
print(f"Gradients: {np.round(gradients, 4)}")
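You can sanity-check the shift rule itself without any quantum hardware. A single-qubit expectation value is sinusoidal in its gate parameter, so a plain cosine (a stand-in for a circuit, not a Braket call) reproduces the identity exactly:

```python
import numpy as np

def f(theta):
    # Stand-in for a circuit expectation value, which is sinusoidal
    # in the parameter for parameter-shift-compatible gates
    return np.cos(theta)

theta = 0.7
# Parameter-shift rule: evaluate at theta +- pi/2 and take the difference
shift_grad = (f(theta + np.pi / 2) - f(theta - np.pi / 2)) / 2
exact_grad = -np.sin(theta)  # analytic derivative of cos

# The two agree to machine precision: the rule is exact, not a
# finite-difference approximation
print(shift_grad, exact_grad)
```

This is why the rule works on shot-based hardware: it only ever asks the device for ordinary expectation values at shifted parameters.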
Understanding the Cost of Gradients
For a circuit with n trainable parameters, the parameter-shift rule requires 2n additional circuit evaluations (plus one forward pass). Each evaluation becomes a separate Braket task.
Here is the arithmetic for a single gradient step with 8 parameters on IonQ Aria at 1000 shots:
- Forward pass: 1 task, 1000 shots at $0.01/shot = $10.00
- Gradient evaluations: 16 tasks, each 1000 shots = $160.00
- Total cost for one gradient step: $170.00
This is why you develop on local simulators and use cloud resources deliberately. On the free local simulator, the same computation takes a few seconds and costs nothing.
For Rigetti Ankaa-2 at ~$0.0009/shot, the same 17-task gradient step costs about $15.30. Cloud simulators like SV1 charge per task rather than per shot, making them much cheaper for gradient-heavy workloads.
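The arithmetic is mechanical, so it is worth wrapping in a small helper before committing to a hardware run. The per-shot prices below are the approximate figures from the comparison table, not authoritative pricing:

```python
def gradient_step_cost(n_params, shots, price_per_shot):
    """Dollar cost of one parameter-shift gradient step on a per-shot QPU.

    One forward pass plus two shifted evaluations per parameter,
    each submitted as a separate Braket task.
    """
    tasks = 2 * n_params + 1
    return tasks * shots * price_per_shot

# Approximate per-shot rates from the device table above (assumptions)
print(f"IonQ Aria:       ${gradient_step_cost(8, 1000, 0.01):.2f}")
print(f"Rigetti Ankaa-2: ${gradient_step_cost(8, 1000, 0.0009):.2f}")
```

Running this for the 8-parameter circuit reproduces the $170 IonQ figure and the ~$15 Rigetti figure above.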
Parallel Execution
By default, PennyLane submits Braket tasks sequentially: it waits for each circuit evaluation to finish before submitting the next one. For gradient computations, this means 17 sequential round-trips (1 forward + 16 shifts).
The plugin supports parallel task submission, which sends all independent circuits to Braket at once. This reduces wall-clock time significantly, especially for QPU tasks where queue times dominate.
# Enable parallel execution with max_parallel=10
dev_parallel = qml.device(
"braket.aws.qubit",
device_arn="arn:aws:braket:::device/quantum-simulator/amazon/sv1",
wires=4,
shots=1000,
s3_destination_folder=("my-braket-bucket", "parallel-results"),
max_parallel=10,
)
The max_parallel parameter controls how many tasks are submitted simultaneously. Setting it to 10 means up to 10 circuits run concurrently on Braket. For a parameter-shift gradient with 16 shift circuits, this reduces the number of sequential batches from 16 to 2.
Parallel execution does not reduce cost (you still pay for the same number of tasks and shots), but it can cut wall-clock time dramatically when tasks spend most of their time in the queue or waiting for hardware.
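The batching arithmetic is a simple ceiling division, sketched here for illustration (the function name is ours, not part of the plugin API):

```python
import math

def sequential_batches(n_tasks, max_parallel):
    # Number of sequential submission rounds when up to max_parallel
    # independent tasks run concurrently
    return math.ceil(n_tasks / max_parallel)

# 16 shift circuits with max_parallel=10: two rounds instead of sixteen
print(sequential_batches(16, 10))
print(sequential_batches(16, 1))
```

If queue time dominates, wall-clock time scales roughly with the number of rounds, so the speedup from max_parallel=10 here is close to 8x.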
Native Gate Support by Device
Different QPUs support different sets of native gates. When you write a circuit using gates like qml.CNOT or qml.RY, PennyLane and Braket automatically compile these into the native gate set of the target device. This compilation is transparent, but it affects circuit depth and therefore noise.
Here is a summary of native gates for major Braket QPUs:
| QPU | Native Gates | Notes |
|---|---|---|
| IonQ Aria | GPi, GPi2, MS (Molmer-Sorensen) | All-to-all connectivity, no SWAP overhead |
| Rigetti Ankaa-2 | RZ, RX, CZ | Limited connectivity, SWAPs may be inserted |
| IQM Garnet | PRX, CZ | Star-like connectivity topology |
You can inspect the supported operations for any device programmatically:
import pennylane as qml
dev = qml.device("braket.local.qubit", wires=2)
# List all operations this device supports
print("Supported operations:")
for op in sorted(dev.operations):
print(f" {op}")
The practical consequence: a single CNOT on IonQ compiles to one MS gate (plus single-qubit corrections), while on Rigetti it compiles to single-qubit gates plus a CZ. If your qubits are not physically connected on Rigetti, the compiler inserts SWAP gates, which increases circuit depth. IonQ’s all-to-all connectivity avoids this entirely.
PyTorch Integration for Hybrid ML
One of PennyLane’s strongest features is its integration with classical ML frameworks. Here is how to use the PennyLane-Braket plugin with PyTorch to build a hybrid classical-quantum model.
The key is specifying interface="torch" in the QNode decorator and wrapping your parameters in torch.tensor with requires_grad=True.
import pennylane as qml
import torch
import numpy as np
dev = qml.device("braket.local.qubit", wires=2, shots=1000)
@qml.qnode(dev, interface="torch")
def quantum_layer(inputs, weights):
# Encode classical data into qubit rotations
qml.RY(inputs[0], wires=0)
qml.RY(inputs[1], wires=1)
# Parameterized entangling layer
qml.CNOT(wires=[0, 1])
qml.RY(weights[0], wires=0)
qml.RY(weights[1], wires=1)
return qml.expval(qml.PauliZ(0))
# Classical input data
x = torch.tensor([0.5, 0.8], dtype=torch.float64)
# Trainable quantum weights
weights = torch.tensor(
np.random.uniform(0, np.pi, size=2),
dtype=torch.float64,
requires_grad=True,
)
# Forward pass: returns a torch tensor with a grad_fn
output = quantum_layer(x, weights)
print(f"Output: {output.item():.4f}")
# Backward pass: gradients flow through the quantum circuit
output.backward()
print(f"Weight gradients: {weights.grad}")
In this example, quantum_layer behaves like any other PyTorch module. You can compose it with classical layers, use standard PyTorch optimizers like Adam, and train end-to-end. The gradients are computed via the parameter-shift rule on the quantum side and standard backpropagation on the classical side, and PennyLane stitches them together seamlessly.
To switch this from local simulation to real hardware, change only the device line. The PyTorch integration, gradient computation, and training loop all remain identical.
Practical VQE Example
The Variational Quantum Eigensolver (VQE) finds the ground state energy of a quantum Hamiltonian by optimizing a parameterized circuit. This is one of the most studied near-term quantum algorithms and a natural fit for PennyLane on Braket.
Here we find the ground state energy of a two-qubit Hamiltonian that describes two interacting spins:
H = -1.0 * Z_0 Z_1 + 0.5 * X_0 + 0.5 * X_1
The exact ground state energy of this Hamiltonian is -sqrt(2), approximately -1.414 (you can verify this by diagonalizing the 4x4 matrix).
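The diagonalization is small enough to carry out inline. This quick NumPy check (an illustration, separate from the VQE code below) assembles the 4x4 matrix from Pauli tensor products:

```python
import numpy as np

# Single-qubit Pauli matrices
I = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=float)
Z = np.array([[1, 0], [0, -1]], dtype=float)

# H = -1.0 * Z0 Z1 + 0.5 * X0 + 0.5 * X1
H = (
    -1.0 * np.kron(Z, Z)
    + 0.5 * np.kron(X, I)
    + 0.5 * np.kron(I, X)
)

# Smallest eigenvalue of the Hermitian matrix is the ground state energy
ground_energy = np.linalg.eigvalsh(H).min()
print(f"Exact ground state energy: {ground_energy:.4f}")  # -1.4142
```

This is the benchmark the VQE optimization below should approach.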
import pennylane as qml
import numpy as np
dev = qml.device("braket.local.qubit", wires=2, shots=2000)
@qml.qnode(dev)
def vqe_circuit(params):
# Prepare an ansatz with enough expressibility to reach the ground state
# Layer 1: general single-qubit rotations
qml.RY(params[0], wires=0)
qml.RY(params[1], wires=1)
# Entangling gate
qml.CNOT(wires=[0, 1])
# Layer 2: additional rotations after entanglement
qml.RY(params[2], wires=0)
qml.RY(params[3], wires=1)
    # Measure the full Hamiltonian as a single observable; PennyLane
    # handles measuring its Pauli terms separately under the hood
    # H = -1.0 * Z0 Z1 + 0.5 * X0 + 0.5 * X1
    H = qml.Hamiltonian(
        [-1.0, 0.5, 0.5],
        [qml.PauliZ(0) @ qml.PauliZ(1), qml.PauliX(0), qml.PauliX(1)],
    )
    return qml.expval(H)
# Initialize parameters randomly
params = np.random.uniform(0, np.pi, size=4)
# Use PennyLane's gradient descent optimizer
opt = qml.GradientDescentOptimizer(stepsize=0.2)
print("Training VQE to find ground state energy...")
print(f"{'Step':>5} {'Energy':>10}")
print("-" * 18)
for step in range(80):
params = opt.step(vqe_circuit, params)
if (step + 1) % 10 == 0:
energy = vqe_circuit(params)
print(f"{step + 1:>5} {energy:>10.4f}")
final_energy = vqe_circuit(params)
print(f"\nFinal VQE energy: {final_energy:.4f}")
print("Exact ground state: -1.4142")
The optimizer adjusts the four rotation angles to minimize the energy expectation value. With enough steps and shots, the VQE converges close to the exact ground state energy.
A few practical notes on VQE:
- Shot noise introduces variance in the energy estimate. More shots reduce this variance but cost more on hardware. Start with 1000-2000 shots for development.
- Ansatz choice matters. The RY+CNOT+RY structure above is expressive enough for this two-qubit problem, but larger systems need deeper or hardware-efficient ansatze.
- Optimizer choice also matters. Gradient descent works but converges slowly. For noisy objectives, consider
qml.AdamOptimizeror SPSA (Simultaneous Perturbation Stochastic Approximation), which are more robust to shot noise.
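The first point is easy to see empirically: simulate sampling a +1/-1 observable with a fixed true expectation value and watch the estimator's spread shrink as the shot count grows. This is plain NumPy, no quantum device required:

```python
import numpy as np

rng = np.random.default_rng(42)
p_plus = 0.8  # true <Z> = 2 * p_plus - 1 = 0.6

stds = {}
for shots in (100, 1000, 10000):
    # 500 repeated experiments, each estimating <Z> from `shots` samples
    samples = rng.choice([1, -1], size=(500, shots), p=[p_plus, 1 - p_plus])
    stds[shots] = samples.mean(axis=1).std()
    print(f"{shots:>6} shots: std of <Z> estimate = {stds[shots]:.4f}")
```

The standard deviation falls as 1/sqrt(shots), so a 10x increase in shots buys only about a 3x reduction in noise, which is why blindly cranking up shots on a QPU is an expensive way to improve gradients.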
Hybrid Training Loop: QAOA Example
Here is a QAOA workflow for a small Max-Cut problem. QAOA (Quantum Approximate Optimization Algorithm) alternates between a “cost layer” encoding the problem and a “mixer layer” that explores the solution space.
import pennylane as qml
import numpy as np
# 4-node ring graph: find a partition that cuts the most edges
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
dev = qml.device("braket.local.qubit", wires=4, shots=2000)
@qml.qnode(dev)
def qaoa_circuit(gammas, betas):
# Start in equal superposition of all bitstrings
for w in range(4):
qml.Hadamard(wires=w)
# p=2 QAOA layers (more layers = better approximation)
for layer in range(2):
# Cost layer: apply ZZ interactions on each edge
# This encodes the Max-Cut objective into phase kickbacks
for i, j in edges:
qml.CNOT(wires=[i, j])
qml.RZ(2 * gammas[layer], wires=j)
qml.CNOT(wires=[i, j])
# Mixer layer: X-rotations explore different solutions
for w in range(4):
qml.RX(2 * betas[layer], wires=w)
    # Measure the Max-Cut cost as a single Hamiltonian of ZZ terms
    cost_h = qml.Hamiltonian(
        [1.0] * len(edges),
        [qml.PauliZ(i) @ qml.PauliZ(j) for i, j in edges],
    )
    return qml.expval(cost_h)
def cost_fn(params):
gammas = params[:2]
betas = params[2:]
return qaoa_circuit(gammas, betas)
# Optimize with gradient descent
opt = qml.GradientDescentOptimizer(stepsize=0.1)
params = np.random.uniform(0, np.pi, size=4)
print("Training QAOA for Max-Cut...")
print(f"{'Step':>5} {'Cost':>10}")
print("-" * 18)
for step in range(50):
params = opt.step(cost_fn, params)
if (step + 1) % 10 == 0:
val = cost_fn(params)
print(f"{step + 1:>5} {val:>10.4f}")
The cost value should decrease toward -4.0 (the minimum for a ring graph with 4 edges, corresponding to the optimal cut that separates alternating nodes).
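That minimum is small enough to verify by brute force. Enumerating all 16 partition assignments of the 4-node ring confirms that the optimal cuts reach a ZZ cost of -4:

```python
from itertools import product

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

def zz_cost(assignment):
    # assignment maps each node to +1 or -1 (its side of the partition);
    # each cut edge contributes -1, each uncut edge +1
    return sum(assignment[i] * assignment[j] for i, j in edges)

best = min(zz_cost(a) for a in product([1, -1], repeat=4))
print(f"Minimum ZZ cost over all bitstrings: {best}")  # -4
```

The minimizers are the alternating assignments (+1, -1, +1, -1) and its complement, which cut all four edges.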
Moving to Real Hardware
To run any of the examples above on IonQ Aria, change only the device line:
# Replace the local device:
# dev = qml.device("braket.local.qubit", wires=4, shots=2000)
# With the IonQ Aria QPU:
dev = qml.device(
"braket.aws.qubit",
device_arn="arn:aws:braket:us-east-1::device/qpu/ionq/Aria-1",
wires=4,
shots=2000,
s3_destination_folder=("my-braket-bucket", "ionq-results"),
)
IonQ’s trapped-ion architecture has all-to-all connectivity, so CNOT gates between any qubit pair execute without SWAP overhead. This is a significant advantage for algorithms like QAOA where the problem graph may not match a nearest-neighbor topology.
Be aware that QPU tasks are asynchronous. Braket queues them and PennyLane blocks until results return. Queue times vary from seconds to minutes depending on demand.
Cost Control Strategies
Running quantum circuits on real hardware can get expensive quickly, especially for gradient-based optimization. Here are strategies to keep costs manageable.
1. Develop locally first. Use braket.local.qubit for all development, debugging, and initial testing. It is free and fast for circuits under 20 qubits. Only move to cloud resources when you have a working circuit.
2. Use cloud simulators before QPUs. SV1 charges per task rather than per shot, so the same 17-task gradient step costs about $1.28 on SV1, compared to $170 on IonQ.
3. Tune shot counts carefully. More shots give better gradient estimates but cost more on QPUs (which charge per shot). For early optimization steps where the parameters are far from optimal, low shot counts (100-500) often suffice. Increase shots in later steps as you fine-tune.
4. Reduce the number of parameters. Fewer parameters means fewer gradient evaluations. Hardware-efficient ansatze that reuse parameters or use structured circuits can dramatically cut the number of parameter-shift evaluations.
5. Set budget alerts. Use AWS Cost Explorer to set alerts at thresholds like $50 and $100. Braket charges appear under the “Amazon Braket” service in your billing console. You can also set a per-task spending limit in your Braket settings.
6. Check device availability windows. Some QPUs have scheduled availability windows. Check the Braket console or use the SDK to query device status before submitting tasks.
from braket.aws import AwsDevice
device = AwsDevice("arn:aws:braket:us-east-1::device/qpu/ionq/Aria-1")
print(f"Device status: {device.status}")
print(f"Device is available: {device.is_available}")
Common Mistakes and How to Fix Them
Forgetting the S3 bucket
If you create an AWS-backed device without specifying s3_destination_folder, or if the bucket does not exist, you get an error like BucketNotFound or AccessDenied.
Fix: Create the bucket first with aws s3 mb s3://my-bucket --region us-east-1 and ensure your IAM permissions include S3 access.
Wrong AWS region
The device ARN contains a region (e.g., us-east-1 for IonQ). Your S3 bucket and your AWS session must be in the same region.
Fix: Check the region in the ARN and make sure your bucket is in that region. You can set your default region with aws configure or the AWS_DEFAULT_REGION environment variable.
Using qml.state() on shot-based devices
qml.state() returns the full statevector, which only works on simulators running in exact (shots=None) mode. If you use it on a shot-based simulator or a QPU, you get an error.
Fix: Use qml.probs(wires=range(n)) to get the probability distribution over computational basis states, or use qml.expval() with a specific observable. Both work on all devices.
@qml.qnode(dev)
def correct_circuit(params):
qml.RY(params[0], wires=0)
qml.RY(params[1], wires=1)
qml.CNOT(wires=[0, 1])
# Works on all devices, including QPUs
return qml.probs(wires=[0, 1])
Setting shots=None on a QPU
Real quantum hardware always requires a finite number of measurement shots. You cannot run a QPU in exact statevector mode because the hardware can only return sampled measurement outcomes.
Fix: Always specify a positive integer for shots when targeting QPU devices. Values between 100 and 10,000 are typical.
Exceeding qubit limits
Each backend has a maximum qubit count. The local simulator is limited by your RAM (practically around 25 qubits on a laptop with 16 GB). SV1 handles up to 34 qubits. QPUs have device-specific limits.
Fix: Check the qubit limit for your target device before designing your circuit. For local development, keep circuits under 20 qubits to ensure fast iteration.
Not accounting for gradient cost
A common surprise: calling qml.grad(circuit)(params) on a circuit with n parameters does not evaluate one circuit. It evaluates 2n + 1 circuits. If each circuit costs money, your bill scales linearly with the parameter count.
Fix: Plan your parameter budget before moving to paid backends. If your circuit has 20 parameters and you plan 100 gradient steps, that is 4,100 circuit evaluations. Multiply by the per-task or per-shot cost to estimate your total spend.
Key Takeaways
- The PennyLane-Braket plugin lets you write circuits once and run them on local simulators, cloud simulators, or real QPUs from IonQ, Rigetti, and IQM.
- Parameter-shift gradients work on all Braket backends, enabling hardware-compatible training loops for VQE, QAOA, and quantum ML.
- Switching from simulation to hardware requires only a device change. No circuit modifications are needed.
- Every circuit evaluation becomes a Braket task. For gradient computation with n parameters, expect 2n + 1 tasks per optimization step. Budget accordingly.
- Use local simulation for development, cloud simulators for validation, and QPU access for final experiments. This workflow minimizes cost while maximizing iteration speed.
- The PyTorch/JAX/TensorFlow interfaces let you embed quantum circuits inside classical ML pipelines, with gradients flowing through the entire hybrid model.