QUBO Formulation: Translating Problems for D-Wave
Learn how to formulate combinatorial optimization problems as QUBO matrices for D-Wave, including penalty terms for constraints and a worked number partitioning example.
Circuit diagrams
QUBO (Quadratic Unconstrained Binary Optimization) is the universal language of D-Wave systems. The quantum annealer, the simulated annealing sampler, and the hybrid solvers all accept the same QUBO format. Once you express your problem as a QUBO, switching between solvers is a one-line change. The hard part is always the formulation, not the execution.
This makes QUBO formulation the single most important skill in D-Wave programming. Every combinatorial optimization problem you want to solve, whether it is scheduling, routing, graph partitioning, or constraint satisfaction, must be translated into a matrix of numbers that encodes “what is good” and “what is bad” as energy values. The annealer then searches for the lowest-energy assignment of binary variables. If your QUBO faithfully represents the problem, low-energy solutions correspond to good answers. If the formulation is wrong, the annealer finds the minimum of the wrong function, and the results are useless regardless of how powerful the hardware is.
This tutorial builds the intuition and mechanical skills you need to construct correct QUBO formulations. We start with the mathematical structure, work through standard patterns you will reuse constantly, and solve two complete problems end to end.
The QUBO Objective Function
A QUBO minimizes the function:
where each . The matrix encodes both the objective you want to minimize and any constraints you need to enforce. Since the problem is “unconstrained” in the mathematical sense, you fold constraints into as penalty terms that raise the energy of invalid solutions.
The formula is compact but worth unpacking in plain terms:
- Diagonal entries act as biases on individual variables. A large negative value on encourages (because setting reduces the energy by ). A large positive value discourages (it would add cost).
- Off-diagonal entries (where ) capture pairwise interactions. A negative means the energy decreases when both and , so the two variables “want” to both be active together. A positive penalizes both being 1 simultaneously, pushing them apart.
- The annealer searches over all possible assignments of 0s and 1s and returns the assignment (or assignments) that minimize .
A Two-Variable Example by Hand
Consider the QUBO dictionary Q = {(0,0): -1, (1,1): -1, (0,1): 2}. This represents two binary variables and with:
- : a bias encouraging
- : a bias encouraging
- : a penalty for both variables being 1 simultaneously
Evaluate for all four possible assignments:
| 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | -1 | 0 | 0 | -1 |
| 0 | 1 | 0 | -1 | 0 | -1 |
| 1 | 1 | -1 | -1 | 2 | 0 |
The minimum energy is , achieved by either or . The biases want both variables on, but the positive interaction penalizes them being on together. The result: exactly one of the two variables is 1 at the optimum. This is actually a one-hot constraint in disguise, which we formalize below.
Standard QUBO Patterns
Certain patterns appear in nearly every QUBO formulation. Learning to recognize and apply them saves time and reduces errors.
One-Hot Constraint: “Exactly One of Variables Equals 1”
This is the most common constraint in practice. It arises whenever you model a choice among options (assigning a task, choosing a color, selecting a time slot).
The penalty function is:
Expand the square:
For binary variables, (since and ). Applying this simplification:
This gives us the QUBO entries:
- Diagonal: for each variable
- Off-diagonal: for each pair
- Constant: (does not affect optimization, but matters if you want correct absolute energy)
When exactly one variable is 1, the diagonal terms contribute , the off-diagonal terms contribute , and the constant contributes , giving a net penalty of . Any other assignment yields a positive penalty.
At-Most-One Constraint: “At Most One of Variables Equals 1”
This variant allows the “none selected” case. The penalty is:
This adds only positive off-diagonal terms. Any pair of variables that are both 1 incurs a penalty. Zero or one variable being 1 yields zero penalty. The QUBO entries are simply:
- Off-diagonal: for each pair
Logical Constraints as QUBO Terms
You can express logical relationships between binary variables as quadratic penalties. These are essential for combining sub-problems or enforcing structural rules.
AND constraint (): introduce an auxiliary variable and add the penalty:
This penalty equals zero when and is strictly positive otherwise.
QUBO entries: , , , .
OR constraint (): add the penalty:
QUBO entries: , , , , , .
NOT constraint (): add the penalty:
This is zero when (exactly one is 1) and positive otherwise.
QUBO entries: , , .
You can verify each of these by enumerating all assignments, as we did in the two-variable example above. This is always a good practice before embedding a pattern into a larger formulation.
Worked Example 1: Number Partitioning
Given a set of numbers , partition them into two subsets and such that the difference is minimized. Assign if element goes to subset , and if it goes to subset .
Deriving the QUBO
The difference in sums is:
We minimize the squared difference (minimizing the square also minimizes the absolute value):
Expand by writing (the total sum):
Now expand :
where we used . Substituting back:
Combining the linear terms:
Reading off the QUBO entries (dropping the constant ):
- Diagonal:
- Off-diagonal: for
The optimal energy is . A perfect partition (difference = 0) gives energy (since the constant was dropped from the QUBO but is still subtracted from the raw solver energy). When you see energy from the solver, the partition is perfect.
Implementation
import dimod
from dwave.samplers import SimulatedAnnealingSampler
# The set of numbers to partition
S = [3, 1, 4, 2]
n = len(S)
total = sum(S) # 10
# Build the QUBO matrix as a dictionary
Q = {}
for i in range(n):
# Linear (diagonal) terms: s_i * (4*s_i - 4*total)
Q[(i, i)] = S[i] * (4 * S[i] - 4 * total)
for j in range(i + 1, n):
# Quadratic (off-diagonal) terms: 8 * s_i * s_j
Q[(i, j)] = 8 * S[i] * S[j]
print("QUBO matrix entries:")
for key, val in sorted(Q.items()):
print(f" Q{key} = {val}")
# Create BinaryQuadraticModel from the QUBO
bqm = dimod.BinaryQuadraticModel.from_qubo(Q)
# Solve with simulated annealing
sampler = SimulatedAnnealingSampler()
sampleset = sampler.sample(bqm, num_reads=100, num_sweeps=1000)
# Inspect the best solution
best = sampleset.first
print(f"\nBest assignment: {best.sample}")
print(f"Energy: {best.energy}")
# Interpret the partition
subset_a = [S[i] for i in range(n) if best.sample[i] == 0]
subset_b = [S[i] for i in range(n) if best.sample[i] == 1]
print(f"Subset A: {subset_a}, sum = {sum(subset_a)}")
print(f"Subset B: {subset_b}, sum = {sum(subset_b)}")
print(f"Difference: {abs(sum(subset_a) - sum(subset_b))}")
For with total , a perfect partition exists: and both sum to 5. The solver should return energy (that is, ) and a difference of 0.
Inspecting the Energy Landscape
The SampleSet returned by the sampler contains all sampled solutions ranked by energy. Examining the low-energy spectrum helps you judge whether your formulation is working:
# Show the five lowest-energy samples
print("\nTop 5 solutions:")
for sample, energy, num_occ in list(sampleset.data(
fields=["sample", "energy", "num_occurrences"],
sorted_by="energy"
))[:5]:
print(f" {dict(sample)} energy={energy:.1f} count={num_occ}")
If many low-energy solutions violate your constraints, your penalty strength is too low. If the solver finds few distinct solutions and they cluster at high penalty energies, may be too high relative to the objective range.
Worked Example 2: Minimum Vertex Cover
A vertex cover of a graph is a set of vertices such that every edge has at least one endpoint in the set. The minimum vertex cover problem asks for the smallest such set. This is a classic NP-hard problem and a natural fit for QUBO.
Problem Setup
Consider a small graph with 5 vertices and 6 edges:
Vertices: {0, 1, 2, 3, 4}
Edges: {(0,1), (0,2), (1,2), (1,3), (2,4), (3,4)}
Assign binary variable if vertex is in the cover, otherwise.
Formulating the QUBO
The objective is to minimize the number of selected vertices:
The constraint is that for every edge , at least one endpoint must be selected: . Equivalently, the forbidden case is and . We penalize this with:
For each edge , this contributes:
- Diagonal: ,
- Off-diagonal:
- Constant: (per edge)
Combining the objective and penalty terms for the full QUBO:
- Diagonal: , where the comes from the objective and accumulates once per incident edge.
- Off-diagonal: for each edge .
Implementation
import dimod
from dwave.samplers import SimulatedAnnealingSampler
# Define the graph
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]
n_vertices = 5
# Compute the degree of each vertex
degree = [0] * n_vertices
for u, v in edges:
degree[u] += 1
degree[v] += 1
# Penalty strength: the objective ranges from 0 to n_vertices = 5,
# so P should be somewhat larger than 5.
P = 8.0
# Build the QUBO
Q = {}
# Diagonal terms: objective (+1 per vertex) + penalty (-P per incident edge)
for i in range(n_vertices):
Q[(i, i)] = 1 - P * degree[i]
# Off-diagonal terms: penalty (+P per edge)
for u, v in edges:
Q[(u, v)] = P
print("QUBO matrix entries:")
for key, val in sorted(Q.items()):
print(f" Q{key} = {val}")
# Solve
bqm = dimod.BinaryQuadraticModel.from_qubo(Q)
sampler = SimulatedAnnealingSampler()
sampleset = sampler.sample(bqm, num_reads=200, num_sweeps=1000)
best = sampleset.first
cover = [i for i in range(n_vertices) if best.sample[i] == 1]
print(f"\nBest assignment: {best.sample}")
print(f"Energy: {best.energy}")
print(f"Vertex cover: {cover} (size {len(cover)})")
# Verify: check every edge is covered
all_covered = all(
best.sample[u] == 1 or best.sample[v] == 1
for u, v in edges
)
print(f"All edges covered: {all_covered}")
For this graph, the minimum vertex cover has size 3 (for example, vertices cover all six edges). The solver should find this or an equivalent cover. Notice the verification step at the end: always check that constraints are satisfied, especially when tuning penalty strengths.
Why This Formulation Works
The penalty terms create an energy “floor” for any assignment that leaves an edge uncovered. With and the objective contributing at most 5, any uncovered edge adds 8 to the energy while removing one vertex from the cover only saves 1. The annealer strongly prefers covering all edges and then minimizes the cover size among valid solutions.
Penalty Strength Tuning
Choosing the right penalty strength is one of the most practical challenges in QUBO formulation. The theory is simple (set large enough that violating a constraint always costs more than any objective improvement), but getting it right in practice requires care.
Estimating the Objective Range
Before running any solver, analyze the structure of your problem to bound the objective function’s range:
- For number partitioning, the squared difference ranges from to .
- For vertex cover, the objective (number of selected vertices) ranges from to .
- For assignment problems, sum the range of possible cost assignments.
The key quantity is the maximum improvement the solver could gain by violating a constraint. If the objective ranges over , then must exceed to guarantee that no constraint violation is ever worthwhile.
What Happens When P Is Too Low
When is too small relative to the objective range, the solver can reduce the total energy by violating constraints. In practice, you observe:
- Many of the top solutions in the
SampleSetare infeasible (they violate one or more constraints). - The lowest-energy feasible solution may appear deep in the list, or not at all.
- The solver “wastes” its sampling budget exploring infeasible regions of the search space.
What Happens When P Is Too High
When is much larger than the objective range, the penalty terms dominate the energy landscape. The annealer sees a landscape where all feasible solutions have nearly the same energy (the penalty contribution is zero for all of them, and the objective differences are tiny by comparison). This means:
- The annealer finds feasible solutions easily, but cannot distinguish good feasible solutions from mediocre ones.
- Many samples cluster at the same energy, reducing the effective number of distinct solutions explored.
- In the worst case, the annealer essentially performs random sampling within the feasible space.
The Sweep Approach
A systematic way to find the right :
- Estimate the objective range .
- Run the solver at , , , and .
- For each run, track two metrics: the feasibility rate (fraction of samples satisfying all constraints) and the best feasible objective value.
- Look for the sweet spot: the smallest where the feasibility rate is close to 100%. This gives the annealer maximum sensitivity to objective differences while still respecting constraints.
import dimod
from dwave.samplers import SimulatedAnnealingSampler
def evaluate_penalty_strength(Q_objective, penalty_terms, P_values, num_reads=200):
"""
Sweep over penalty strengths and report feasibility rate
and best objective value for each.
Q_objective: dict of QUBO entries for the objective alone
penalty_terms: function that takes P and returns dict of QUBO penalty entries
P_values: list of penalty strengths to try
"""
sampler = SimulatedAnnealingSampler()
for P in P_values:
# Combine objective and penalty QUBOs
Q = dict(Q_objective)
for key, val in penalty_terms(P).items():
Q[key] = Q.get(key, 0) + val
bqm = dimod.BinaryQuadraticModel.from_qubo(Q)
sampleset = sampler.sample(bqm, num_reads=num_reads, num_sweeps=1000)
# Analyze results (feasibility check is problem-specific)
print(f"P = {P:6.1f} | "
f"Best energy = {sampleset.first.energy:8.1f} | "
f"Unique solutions = {len(sampleset.record)}")
In production, you replace the inner loop with your problem-specific feasibility check and objective extraction.
Common Mistakes
Learning from common pitfalls saves significant debugging time.
Forgetting That
When expanding penalty terms like , the squared terms simplify to . Forgetting this step leaves you with a polynomial that is not in QUBO form. The mechanical rule: every time you see , replace it with . This applies throughout the entire expansion, not just at the final step.
Using a Single Penalty Strength for All Constraints
Different constraints may have different “importance” levels relative to the objective. For example, in a scheduling problem, a hard constraint (“no two meetings in the same room at the same time”) should have a much higher penalty than a soft constraint (“prefer morning slots”). Use separate penalty strengths and tune them independently.
Too Many Variables for Direct QPU Embedding
The D-Wave QPU has a fixed connectivity graph (Pegasus topology on current hardware). Embedding a fully connected QUBO of variables requires chains of physical qubits, and the chain length grows with . In practice:
- Problems with fewer than ~50 logical variables often embed cleanly on current hardware.
- Problems with 50 to 150 variables may embed but with long chains that degrade solution quality.
- Problems with more than 150 variables typically require the hybrid solver (
LeapHybridSampler) or decomposition approaches.
If you are prototyping with SimulatedAnnealingSampler and plan to move to the QPU, keep the variable count in mind from the start. Reformulating to reduce variables is much easier before you have built an entire pipeline around a specific QUBO structure.
Not Verifying on Small Instances
Before running your formulation on the target problem size, test it with a brute-force enumeration on a small instance (4 to 8 variables). This catches formulation errors that are invisible in solver output:
import itertools
import dimod
def brute_force_qubo(Q, n_variables):
"""
Enumerate all 2^n assignments and return the minimum energy
and corresponding assignment. Only practical for n <= ~20.
"""
bqm = dimod.BinaryQuadraticModel.from_qubo(Q)
best_energy = float('inf')
best_assignment = None
for bits in itertools.product([0, 1], repeat=n_variables):
sample = {i: bits[i] for i in range(n_variables)}
energy = bqm.energy(sample)
if energy < best_energy:
best_energy = energy
best_assignment = sample
return best_assignment, best_energy
# Example: verify the number partitioning QUBO for S = [3, 1, 4, 2]
S = [3, 1, 4, 2]
n = len(S)
total = sum(S)
Q = {}
for i in range(n):
Q[(i, i)] = S[i] * (4 * S[i] - 4 * total)
for j in range(i + 1, n):
Q[(i, j)] = 8 * S[i] * S[j]
assignment, energy = brute_force_qubo(Q, n)
print(f"Optimal assignment: {assignment}")
print(f"Optimal energy: {energy}")
# Verify: energy should be -100 (i.e., -T^2) for a perfect partition
If the brute-force optimum does not match the known answer to your problem, the formulation has a bug. Fix it before scaling up.
Next Steps
With QUBO formulation skills in place, three paths open up:
Embedding for direct QPU access. To run a QUBO on D-Wave hardware, the logical problem graph must be mapped onto the QPU’s physical topology. Ocean’s EmbeddingComposite handles this automatically for small problems. For larger or performance-critical problems, you may want to control the embedding manually using minorminer. The embedding step introduces chain strength as another parameter to tune, similar in spirit to penalty strength.
Hybrid solvers for larger problems. D-Wave’s Leap hybrid solvers (LeapHybridSampler, LeapHybridCQMSampler) accept problems with thousands of variables and handle decomposition internally. If your QUBO has more than ~100 variables, hybrid solvers are usually the practical path. They accept the same BinaryQuadraticModel object you have been building.
Constrained Quadratic Models (CQM). For problems with many constraints, manually converting every constraint into a penalty term becomes tedious and error-prone. The dimod.ConstrainedQuadraticModel class lets you specify constraints directly (equality, inequality, one-hot) and handles the penalty conversion internally. This is a higher-level interface that still compiles down to the same underlying optimization, but it separates the modeling step from the penalty-tuning step. The CQM approach is especially useful with LeapHybridCQMSampler, which accepts CQM objects natively.
Regardless of which path you take, the formulation skills from this tutorial remain the foundation. Understanding what the solver is actually optimizing gives you the ability to debug poor results, tune performance, and recognize when a problem is a good or bad fit for quantum annealing.
Was this tutorial helpful?