Ocean SDK · Intermediate · Free · 4/12 in series · 25 min read

Using D-Wave Leap Hybrid Solvers

Solve large optimization problems with D-Wave Leap hybrid solvers that combine classical heuristics and quantum annealing.

Tags

  • d-wave
  • ocean
  • leap
  • hybrid solver
  • portfolio optimization
  • large-scale

Prerequisites

  • ocean-qubo-formulation

D-Wave’s Advantage QPU has over 5,000 physical qubits, but the effective problem size is much smaller than that number suggests. The reason is embedding overhead. When you map a logical problem onto the QPU’s Pegasus topology, each logical variable may require a chain of several physical qubits, and chain length grows with the problem’s connectivity. For fully connected problems, the physical-qubit count therefore grows roughly quadratically with the number of logical variables: in practice, the largest fully connected problem that embeds on an Advantage system has only about 175 variables. Problems with hundreds or thousands of densely connected variables simply cannot fit.

D-Wave Leap’s hybrid solvers break through this ceiling. Instead of requiring you to embed the entire problem at once, hybrid solvers combine classical algorithms with targeted QPU access in a pipeline:

  1. Classical preprocessing: The solver decomposes the large problem into smaller sub-problems using heuristic analysis of the problem structure.
  2. QPU phase: The hardest sub-problems (those where classical methods get stuck) are sent to the quantum annealer, which can explore their solution landscapes more effectively.
  3. Classical postprocessing: The solver combines the sub-problem solutions back into a single global solution and iterates.

This entire pipeline runs inside D-Wave’s cloud infrastructure. From your perspective, you submit the full BQM and get back a solution. The decomposition, QPU scheduling, and recombination all happen behind the scenes.

How the Hybrid Solver Works

D-Wave’s LeapHybridSampler uses a proprietary algorithm, but the general approach works roughly as follows:

  1. Initialization: The solver creates an initial solution, either randomly or with a fast greedy heuristic that scans the linear biases.
  2. Frustration detection: It identifies “frustrated” regions of the problem where the current variable assignments produce high energy. These are sub-graphs where flipping variables could reduce the objective value, but classical local search gets trapped in local minima.
  3. Sub-QUBO extraction: The solver extracts small sub-QUBOs (typically 50 to 200 variables) from these frustrated regions. These sub-problems are small enough to embed directly on the QPU.
  4. QPU solving: Each sub-QUBO is sent to the quantum annealer, which explores the sub-problem’s energy landscape using quantum tunneling to escape local minima that would trap classical methods.
  5. Solution update: The QPU results replace the corresponding variables in the full solution. The solver then re-evaluates which regions are now frustrated and repeats.

The time_limit parameter controls how long this loop runs. The solver keeps the best solution found across all iterations. Longer time limits allow more iterations of this decompose-solve-recombine cycle, which generally improves solution quality with diminishing returns.
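The decompose-solve-recombine loop above can be sketched in plain Python. This is an illustrative toy, not D-Wave’s proprietary algorithm: it scores variables by the energy released if they were flipped, extracts the highest-scoring region as a sub-QUBO, and brute-forces that sub-problem in place of the QPU.

```python
import itertools
import random

def qubo_energy(linear, quadratic, x):
    """Energy of a 0/1 assignment x for a QUBO given as bias dicts."""
    energy = sum(bias * x[v] for v, bias in linear.items())
    energy += sum(bias * x[i] * x[j] for (i, j), bias in quadratic.items())
    return energy

def hybrid_sketch(linear, quadratic, sub_size=3, iterations=20, seed=0):
    rng = random.Random(seed)
    # 1. Initialization: random starting assignment
    x = {v: rng.randint(0, 1) for v in linear}
    for _ in range(iterations):
        current = qubo_energy(linear, quadratic, x)
        # 2. Frustration detection: score each variable by the energy
        #    released if it alone were flipped
        score = {v: current - qubo_energy(linear, quadratic, {**x, v: 1 - x[v]})
                 for v in linear}
        # 3. Sub-QUBO extraction: take the highest-scoring variables
        region = sorted(linear, key=score.get, reverse=True)[:sub_size]
        # 4. "QPU" phase: brute-force the sub-problem with the rest fixed
        #    (a real hybrid solver would anneal this on quantum hardware)
        best_e, best_bits = current, tuple(x[v] for v in region)
        for bits in itertools.product([0, 1], repeat=len(region)):
            e = qubo_energy(linear, quadratic, {**x, **dict(zip(region, bits))})
            if e < best_e:
                best_e, best_bits = e, bits
        # 5. Solution update: merge the sub-problem result back
        x.update(dict(zip(region, best_bits)))
    return x, qubo_energy(linear, quadratic, x)

linear = {0: -1.0, 1: -1.0, 2: 0.5, 3: -0.5}
quadratic = {(0, 1): 2.0, (1, 2): -1.0, (2, 3): 1.5}
sol, energy = hybrid_sketch(linear, quadratic)
print(f"solution={sol}  energy={energy}")
```

Because each iteration only accepts strict improvements, the energy is monotone non-increasing, mirroring how the real solver keeps the best solution found across iterations.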

When to Use Hybrid vs. Direct QPU

Use the QPU directly when your problem is small enough to embed (roughly under 150 fully connected variables on Pegasus topology) and you want fine-grained control over annealing parameters. Use hybrid solvers when your problem exceeds the QPU’s effective capacity, or when you want a turnkey solution without managing embedding and chain strength yourself.

Choosing Between QPU, Hybrid, and Classical

The following table summarizes when to use each solver type:

| Method | Problem Size | Control | Cost | When to Use |
| --- | --- | --- | --- | --- |
| Direct QPU (DWaveSampler) | < 150 fully connected vars | Full (chain strength, annealing time, num_reads) | Per shot | Small problems where you need full QPU control and parameter tuning |
| Hybrid BQM (LeapHybridSampler) | Hundreds to millions of vars | Limited (time_limit only) | Per second of runtime | Large problems where embedding fails or is impractical |
| Simulated Annealing (dwave.samplers) | Any size (limited by RAM) | Full (sweeps, beta range, num_reads) | Free (runs locally) | Baseline validation, problems with no demonstrated QPU benefit |
| Hybrid CQM (LeapHybridCQMSampler) | Hundreds to millions of vars | High-level (constraint-based) | Per second of runtime | Problems better expressed with explicit constraints |

A practical workflow for most projects: start with simulated annealing to validate your problem formulation, then move to the hybrid solver for production-scale instances. Use direct QPU access only when you need to experiment with annealing schedules or study hardware behavior.
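This decision logic can be captured in a small helper. The thresholds below are the rough guidelines from the table and workflow above, not hard API limits, and recommend_solver is our own name for illustration:

```python
def recommend_solver(n_vars: int, fully_connected: bool,
                     has_constraints: bool = False) -> str:
    """Rough solver recommendation following the guidelines above."""
    if has_constraints:
        return "LeapHybridCQMSampler"        # explicit constraints -> CQM
    if fully_connected and n_vars < 150:
        return "DWaveSampler"                # small enough to embed directly
    if n_vars < 100:
        return "SimulatedAnnealingSampler"   # free local baseline
    return "LeapHybridSampler"               # too large to embed -> hybrid

print(recommend_solver(500, fully_connected=True))
print(recommend_solver(50, fully_connected=True))
```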

Setup

You need a D-Wave Leap account and API token. Set your token as an environment variable or configure it with the dwave CLI:

pip install dwave-ocean-sdk
dwave config create  # follow prompts to enter your API token

Basic Usage

from dwave.system import LeapHybridSampler
import dimod

# Create a hybrid sampler (requires Leap API token)
sampler = LeapHybridSampler()

# Check the solver's minimum time limit for an empty BQM
empty_bqm = dimod.BinaryQuadraticModel({}, {}, 0, 'BINARY')
print(f"Minimum time limit: {sampler.min_time_limit(empty_bqm)}s")

Portfolio Optimization Example

Consider a simplified portfolio selection problem: choose a subset of 500 assets to maximize expected return while penalizing risk (covariance between selected assets). Each binary variable x_i indicates whether asset i is included in the portfolio.

The objective function balances two competing goals:

  • Maximize returns: Each asset i has an expected return r_i. Including asset i (setting x_i = 1) adds r_i to the portfolio’s expected return. Since QUBO is a minimization framework, we negate the returns so that minimizing energy maximizes return.
  • Minimize risk: The risk comes from covariance between assets. If assets i and j are both selected, their covariance σ_ij contributes to portfolio volatility. The risk_weight parameter controls how much we penalize risk relative to return.

import numpy as np
from dwave.system import LeapHybridSampler
import dimod

np.random.seed(42)
n_assets = 500

# Synthetic data: expected returns and covariance
returns = np.random.uniform(0.01, 0.10, n_assets)
cov_matrix = np.random.uniform(-0.01, 0.02, (n_assets, n_assets))
cov_matrix = (cov_matrix + cov_matrix.T) / 2  # make symmetric
np.fill_diagonal(cov_matrix, np.random.uniform(0.01, 0.05, n_assets))

risk_weight = 0.5  # trade-off between return and risk

# Build the BQM
# Objective: maximize returns (minimize negative returns) + penalize risk
linear = {i: -returns[i] for i in range(n_assets)}
quadratic = {}
for i in range(n_assets):
    for j in range(i + 1, n_assets):
        if abs(cov_matrix[i, j]) > 0.005:  # sparse threshold
            quadratic[(i, j)] = risk_weight * cov_matrix[i, j]

bqm = dimod.BinaryQuadraticModel(linear, quadratic, 0.0, dimod.BINARY)
print(f"BQM: {bqm.num_variables} variables, {bqm.num_interactions} interactions")

# Solve with hybrid
hybrid_sampler = LeapHybridSampler()
hybrid_result = hybrid_sampler.sample(
    bqm,
    time_limit=10,  # seconds; minimum depends on problem size
)

best = hybrid_result.first
selected = [i for i in range(n_assets) if best.sample[i] == 1]
print(f"Selected {len(selected)} assets")
print(f"Hybrid energy: {best.energy:.6f}")

A few notes on this code:

  • Sparse threshold (abs(cov_matrix[i, j]) > 0.005): Many covariance pairs have negligibly small values. Filtering them out reduces the number of quadratic terms in the BQM, which speeds up both submission and solving. Without this threshold, a 500-variable fully connected problem would have ~125,000 quadratic terms. With it, only the pairs with meaningful correlation remain.
  • Linear biases are the negated returns. A more negative linear bias makes the solver prefer to set that variable to 1 (include that asset), since doing so lowers the energy.
  • Quadratic biases encode pairwise risk. A positive quadratic bias between assets i and j penalizes selecting both simultaneously, while a negative bias (from negative covariance) rewards selecting both as a hedge.

Interpreting the Output

The energy value represents the objective function evaluated at the returned solution. Since we negated the returns, a more negative energy means a better portfolio (higher return, lower risk). To recover the actual portfolio metrics:

# Calculate portfolio return and risk from the solution
portfolio_return = sum(returns[i] for i in selected)
portfolio_risk = sum(
    cov_matrix[i, j]
    for i in selected
    for j in selected
    if i < j
)
print(f"Portfolio return: {portfolio_return:.4f}")
print(f"Portfolio risk:   {portfolio_risk:.4f}")
print(f"Number of assets: {len(selected)} out of {n_assets}")

As a sanity check, verify that the solver selected a reasonable number of assets. If all 500 are selected, the risk penalty is too low. If zero or one are selected, the risk penalty is too high. Adjust risk_weight accordingly.
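This tuning is easy to explore on a toy instance. The sketch below uses made-up data and exhaustive search in place of the hybrid sampler (so it runs locally and free); it sweeps risk_weight and reports the size of the optimal portfolio:

```python
import itertools

# Hypothetical data: 8 assets, all returns positive, all covariances positive
toy_returns = [0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02]
toy_cov = {(i, j): 0.01 for i in range(8) for j in range(i + 1, 8)}

def best_portfolio(risk_weight):
    """Exact minimum of the toy portfolio QUBO by exhaustive search."""
    best_e, best_x = float("inf"), None
    for x in itertools.product([0, 1], repeat=len(toy_returns)):
        e = -sum(r * xi for r, xi in zip(toy_returns, x))
        e += risk_weight * sum(c * x[i] * x[j] for (i, j), c in toy_cov.items())
        if e < best_e:
            best_e, best_x = e, x
    return best_x

for w in [0.0, 2.0, 20.0]:
    print(f"risk_weight={w:5.1f}  assets selected: {sum(best_portfolio(w))}")
```

At risk_weight=0 every asset is selected; as the weight grows, the optimal portfolio shrinks toward the single highest-return asset, which is exactly the behavior the sanity check above looks for.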

Constrained Quadratic Model (CQM)

The hybrid solver also accepts a ConstrainedQuadraticModel (CQM), which provides a higher-level interface than raw QUBO. With CQM, you can:

  • Use binary, integer, or real-valued (continuous) variables
  • Specify constraints explicitly instead of encoding them as penalty terms
  • Let the solver handle constraint satisfaction internally

This is a significant advantage for problems that have natural constraints. In QUBO formulations, you must convert each constraint into a penalty term with a hand-tuned penalty weight. If the weight is too low, the solver ignores the constraint. If it is too high, the penalty dominates the objective and the solver spends all its effort satisfying constraints instead of optimizing. CQM eliminates this tuning entirely.
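To make that concrete, here is the penalty route that CQM replaces, sketched for one common constraint. Enforcing “exactly k variables on” in a QUBO means adding the penalty P(Σᵢ xᵢ − k)², which (using xᵢ² = xᵢ for binaries, and dropping the constant Pk²) contributes P(1 − 2k) to each linear bias and 2P to each pairwise bias. The brute-force check below is our own toy code, not dimod; it shows a too-small P letting the objective override the constraint:

```python
import itertools

def add_cardinality_penalty(linear, quadratic, variables, k, penalty):
    """Fold P * (sum_i x_i - k)^2 into QUBO biases:
    P(1 - 2k) on each linear bias, 2P on each pair."""
    for v in variables:
        linear[v] = linear.get(v, 0.0) + penalty * (1 - 2 * k)
    for i, j in itertools.combinations(variables, 2):
        quadratic[(i, j)] = quadratic.get((i, j), 0.0) + 2 * penalty

def brute_force_min(linear, quadratic, variables):
    """Exact minimum-energy assignment by exhaustive search."""
    best_e, best_x = float("inf"), None
    for bits in itertools.product([0, 1], repeat=len(variables)):
        x = dict(zip(variables, bits))
        e = sum(b * x[v] for v, b in linear.items())
        e += sum(b * x[i] * x[j] for (i, j), b in quadratic.items())
        if e < best_e:
            best_e, best_x = e, x
    return best_x

variables = list(range(5))
for penalty in (0.1, 10.0):
    linear = {v: -1.0 for v in variables}  # the objective alone wants all 5 on
    quadratic = {}
    add_cardinality_penalty(linear, quadratic, variables, k=2, penalty=penalty)
    solution = brute_force_min(linear, quadratic, variables)
    print(f"P={penalty:5.1f}  variables on: {sum(solution.values())} (constraint wants 2)")
```

With a CQM you would instead state the constraint directly (an expression like the sum of the binary variables == k passed to add_constraint) and let the solver handle satisfaction.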

Knapsack Problem with CQM

The classic knapsack problem selects items to maximize total value without exceeding a weight limit. Here is a CQM formulation:

from dwave.system import LeapHybridCQMSampler
import dimod

# Define items: (value, weight)
items = {
    "laptop":     (600, 2.5),
    "camera":     (500, 1.5),
    "headphones": (150, 0.3),
    "book":       (100, 0.7),
    "jacket":     (250, 1.0),
    "charger":    (75,  0.2),
    "tablet":     (450, 0.8),
    "snacks":     (30,  0.5),
}

max_weight = 4.0  # kg

# Build the CQM
cqm = dimod.ConstrainedQuadraticModel()

# Create a binary variable for each item
x = {item: dimod.Binary(item) for item in items}

# Objective: maximize total value (minimize the negative)
cqm.set_objective(-sum(value * x[item] for item, (value, weight) in items.items()))

# Constraint: total weight must not exceed the limit
cqm.add_constraint(
    sum(weight * x[item] for item, (value, weight) in items.items()) <= max_weight,
    label="weight_limit",
)

# Solve with the hybrid CQM solver
cqm_sampler = LeapHybridCQMSampler()
result = cqm_sampler.sample_cqm(cqm, time_limit=5)

# Filter to only feasible solutions (those satisfying all constraints)
feasible = result.filter(lambda row: row.is_feasible)
if feasible:
    best = feasible.first
    selected_items = [item for item in items if best.sample[item] == 1]
    total_value = sum(items[item][0] for item in selected_items)
    total_weight = sum(items[item][1] for item in selected_items)
    print(f"Selected: {selected_items}")
    print(f"Total value:  {total_value}")
    print(f"Total weight: {total_weight:.1f} kg (limit: {max_weight} kg)")
else:
    print("No feasible solution found. Try increasing time_limit.")

Notice how the weight constraint is expressed directly as an inequality. There is no need to choose a penalty weight or verify that the constraint is satisfied manually. The filter call on the result separates feasible solutions from infeasible ones, since the hybrid solver may return some solutions that violate constraints (especially with short time limits).

Comparing Hybrid vs. Simulated Annealing

Running the same BQM through both solvers reveals how the hybrid solver performs relative to a classical baseline:

from dwave.samplers import SimulatedAnnealingSampler
import time

sa_sampler = SimulatedAnnealingSampler()

# Time the SA solve
sa_start = time.time()
sa_result = sa_sampler.sample(bqm, num_reads=50, num_sweeps=2000)
sa_elapsed = time.time() - sa_start

print(f"\nHybrid best energy:  {hybrid_result.first.energy:.6f}")
print(f"SA best energy:      {sa_result.first.energy:.6f}")
print(f"Difference:          {sa_result.first.energy - hybrid_result.first.energy:.6f}")
print(f"\nHybrid wall time:    {hybrid_result.info.get('run_time', 0) / 1e6:.2f}s")
print(f"SA wall time:        {sa_elapsed:.2f}s")

For small problems (under ~100 variables), simulated annealing often matches or beats the hybrid solver since the overhead of decomposition provides little benefit. The hybrid solver’s advantage emerges on larger, more complex problem instances. Here is why:

Why hybrid beats SA for large instances: A 500-variable binary problem has a solution space of 2^500 (roughly 10^150 possible configurations). Simulated annealing explores this space through local moves, flipping one or a few variables at a time. It relies on thermal fluctuations to escape local minima, but for large, rugged energy landscapes the barriers between good solutions can be too tall for thermal annealing to cross efficiently.

The hybrid solver takes a fundamentally different approach. Instead of searching the full 2^500 space with local moves, it uses structure-aware decomposition to identify the regions of the problem where the energy landscape is most difficult. It then sends these small, hard sub-problems to the QPU, which uses quantum tunneling to traverse energy barriers that would trap classical methods. The classical component handles the “easy” parts of the problem efficiently, while the QPU focuses its power on the parts that actually benefit from quantum effects.

In practice, you will typically see the hybrid solver produce lower-energy solutions than SA for problems above ~200 to 300 variables, with the gap widening as problem size increases. For problems with strong local structure (clusters of tightly interacting variables), the hybrid solver’s decomposition strategy is especially effective.

The time_limit Parameter

The time_limit parameter sets the maximum wall-clock time (in seconds) for the hybrid solver. Each problem size has a minimum time limit; requesting less than the minimum raises an error. Longer time limits generally improve solution quality, but with diminishing returns. You can query the minimum for your specific BQM:

min_time = hybrid_sampler.min_time_limit(bqm)
print(f"Minimum time limit for this problem: {min_time:.1f}s")

The minimum time limit scales with problem size. Small problems (under 1,000 variables) typically require 3 to 5 seconds minimum. Large problems (100,000+ variables) may require 10 seconds or more. Always check min_time_limit before setting your own value.

A useful strategy is to run the solver with increasing time limits and track the improvement:

for tl in [5, 10, 20, 30]:
    if tl < min_time:
        continue
    result = hybrid_sampler.sample(bqm, time_limit=tl)
    print(f"time_limit={tl:3d}s  energy={result.first.energy:.6f}")

If the energy stops improving between 10s and 20s, there is no reason to pay for 30s of compute time. The solver charges per second of runtime, so this sweep helps you find the cost-effective sweet spot.
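This sweep can be automated with a simple stopping rule. The helper below is solver-agnostic: solve is any callable mapping a time limit to a best energy (for the hybrid solver it would wrap hybrid_sampler.sample and return result.first.energy); the function name and tolerance are our own choices:

```python
def find_time_limit(solve, limits, rel_tol=0.01):
    """Increase time_limit until the energy improvement becomes negligible.

    `solve` is any callable time_limit -> best energy found at that limit.
    """
    best_energy, best_limit = None, None
    for tl in limits:
        energy = solve(tl)
        if best_energy is not None and best_energy - energy < rel_tol * abs(best_energy):
            break  # diminishing returns: keep the cheaper limit
        if best_energy is None or energy < best_energy:
            best_energy, best_limit = energy, tl
    return best_limit, best_energy

# Canned energies stand in for real (billed) solver calls
fake_energies = {5: -10.0, 10: -11.5, 20: -11.55, 30: -11.56}
limit, energy = find_time_limit(fake_energies.get, [5, 10, 20, 30])
print(f"chosen time_limit={limit}s at energy={energy}")
```

Here the jump from 10s to 20s improves the energy by well under 1%, so the helper settles on the 10-second run and never pays for 30 seconds.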

SampleSet Interpretation

The hybrid solver returns a SampleSet with the same interface as classical samplers, though it typically returns fewer samples (often just one, sometimes up to ten). Unlike DWaveSampler where you request hundreds or thousands of reads, the hybrid solver focuses on finding one high-quality solution through its iterative refinement process.

The info dictionary contains timing breakdowns and solver metadata:

print(f"Run time: {hybrid_result.info.get('run_time', 'N/A')} microseconds")
print(f"QPU access time: {hybrid_result.info.get('qpu_access_time', 'N/A')} microseconds")
print(f"Problem ID: {hybrid_result.info.get('problem_id', 'N/A')}")

Timing Breakdown

For hybrid solvers, this timing information is particularly useful because it reveals how the solver allocated its compute budget. Hybrid samplesets report their timing fields (in microseconds) at the top level of the info dictionary:

info = hybrid_result.info

# Total time the solver spent on the problem (microseconds)
total_time = info.get('run_time', 0)

# Time spent on QPU access (microseconds)
qpu_time = info.get('qpu_access_time', 0)

# Percentage of time spent on QPU vs classical processing
if total_time > 0:
    qpu_fraction = qpu_time / total_time * 100
    print(f"Total run time:     {total_time / 1e6:.2f}s")
    print(f"QPU access time:    {qpu_time / 1e6:.4f}s")
    print(f"QPU fraction:       {qpu_fraction:.1f}%")
    print(f"Classical overhead: {(total_time - qpu_time) / 1e6:.2f}s")

You will typically find that QPU access accounts for a small fraction of the total runtime, often under 10%. The majority of time goes to classical preprocessing (decomposition, heuristic initialization) and postprocessing (solution recombination, local improvement). This is by design: the hybrid solver uses the QPU surgically on the hardest sub-problems rather than running everything on quantum hardware.

The run_time field represents actual compute time, which may be less than your time_limit if the solver converges early. Compare these two values to understand whether a longer time limit would likely help.

Common Mistakes

Setting time_limit Below the Minimum

Every problem has a minimum time limit that depends on its size and structure. If you set time_limit below this threshold, the solver raises an error. Always check the minimum first:

min_time = hybrid_sampler.min_time_limit(bqm)
# Use at least the minimum, or your preferred value if higher
time_limit = max(min_time, 10)
result = hybrid_sampler.sample(bqm, time_limit=time_limit)

Expecting Many Samples

The hybrid solver typically returns 1 to 10 samples, not the hundreds you might request from DWaveSampler with num_reads=1000. If you need a distribution of solutions (for example, to estimate solution diversity or calculate statistics), run the hybrid solver multiple times (it is stochastic, so repeated runs can return different solutions), or use simulated annealing for the sampling step after identifying a good energy range with the hybrid solver.

Using Hybrid for Small Problems

For problems under ~100 fully connected variables, the hybrid solver adds unnecessary overhead. The decomposition, cloud round-trip, and QPU scheduling all take time that a local simulated annealing run avoids entirely. Furthermore, the hybrid solver charges per second while SimulatedAnnealingSampler (from dwave.samplers) is free and runs locally.

Use hybrid solvers for problems that are too large to embed on the QPU directly or where simulated annealing produces clearly suboptimal results.

Not Normalizing the Objective Function

The hybrid solver works best when the coefficients in your BQM are in a similar numerical range. If your linear biases are on the order of 0.01 but your quadratic biases are on the order of 1,000, the solver may struggle to balance the two terms effectively. Before submitting, consider normalizing:

# Normalize so that the largest coefficient has magnitude 1.0
bqm.normalize()

# Or scale to a specific range
bqm.normalize(bias_range=(-1, 1), quadratic_range=(-1, 1))

Normalization does not change which solution is optimal (it scales the entire energy landscape uniformly), but it improves numerical stability inside the solver’s decomposition and recombination steps. This is especially important for problems where you combine multiple objective terms with different natural scales, like the return and risk terms in the portfolio example above.
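That invariance is easy to check by hand: scaling every bias by the same positive constant scales every configuration’s energy by that constant, so the minimum-energy assignment cannot change. A brute-force sanity check on a toy problem (plain Python standing in for dimod, with deliberately mismatched bias scales):

```python
import itertools

linear = {0: 0.01, 1: -0.02, 2: 0.015}        # tiny linear biases
quadratic = {(0, 1): 900.0, (1, 2): -1200.0}  # huge quadratic biases

def all_energies(linear, quadratic):
    """Energy of every 0/1 assignment, keyed by the bit tuple."""
    out = {}
    for bits in itertools.product([0, 1], repeat=3):
        e = sum(b * bits[v] for v, b in linear.items())
        e += sum(b * bits[i] * bits[j] for (i, j), b in quadratic.items())
        out[bits] = e
    return out

raw = all_energies(linear, quadratic)

# "Normalize": divide all biases by the largest coefficient magnitude
scale = max(abs(b) for b in list(linear.values()) + list(quadratic.values()))
norm = all_energies({v: b / scale for v, b in linear.items()},
                    {k: b / scale for k, b in quadratic.items()})

best_raw = min(raw, key=raw.get)
best_norm = min(norm, key=norm.get)
print(f"argmin unchanged: {best_raw == best_norm}  ({best_raw})")
```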

Next Steps

Hybrid solvers are the most practical entry point for real-world D-Wave applications, handling the complexity of embedding and decomposition so you can focus on problem formulation. From here, consider:

  • Experimenting with LeapHybridCQMSampler for constraint-heavy problems
  • Profiling your solver calls with the timing breakdown to optimize cost
  • Comparing hybrid results against classical baselines to verify that the quantum component adds value for your specific problem class

Was this tutorial helpful?