Ocean SDK · Intermediate · Free · 3/12 in series · 25 min read

QUBO Formulation: Translating Problems for D-Wave

Learn how to formulate combinatorial optimization problems as QUBO matrices for D-Wave, including penalty terms for constraints and a worked number partitioning example.

What you'll learn

  • d-wave
  • ocean
  • qubo
  • optimization
  • binary quadratic model
  • penalty terms

Prerequisites

  • ocean-hello-world

QUBO (Quadratic Unconstrained Binary Optimization) is the universal language of D-Wave systems. The quantum annealer, the simulated annealing sampler, and the hybrid solvers all accept the same QUBO format. Once you express your problem as a QUBO, switching between solvers is a one-line change. The hard part is always the formulation, not the execution.

This makes QUBO formulation the single most important skill in D-Wave programming. Every combinatorial optimization problem you want to solve, whether it is scheduling, routing, graph partitioning, or constraint satisfaction, must be translated into a matrix of numbers that encodes “what is good” and “what is bad” as energy values. The annealer then searches for the lowest-energy assignment of binary variables. If your QUBO faithfully represents the problem, low-energy solutions correspond to good answers. If the formulation is wrong, the annealer finds the minimum of the wrong function, and the results are useless regardless of how powerful the hardware is.

This tutorial builds the intuition and mechanical skills you need to construct correct QUBO formulations. We start with the mathematical structure, work through standard patterns you will reuse constantly, and solve two complete problems end to end.

The QUBO Objective Function

A QUBO minimizes the function:

f(x) = x^T Q x = \sum_i Q_{ii} x_i + \sum_{i<j} Q_{ij} x_i x_j

where each x_i ∈ {0, 1}. The matrix Q encodes both the objective you want to minimize and any constraints you need to enforce. Since the problem is “unconstrained” in the mathematical sense, you fold constraints into Q as penalty terms that raise the energy of invalid solutions.

The formula is compact but worth unpacking in plain terms:

  • Diagonal entries Q_ii act as biases on individual variables. A large negative Q_ii encourages x_i = 1 (setting x_i = 1 reduces the energy by |Q_ii|). A large positive value discourages x_i = 1 (it would add cost).
  • Off-diagonal entries Q_ij (where i < j) capture pairwise interactions. A negative Q_ij means the energy decreases when both x_i = 1 and x_j = 1, so the two variables “want” to be active together. A positive Q_ij penalizes both being 1 simultaneously, pushing them apart.
  • The annealer searches over all 2^n possible assignments of 0s and 1s and returns the assignment (or assignments) that minimize f(x).

A Two-Variable Example by Hand

Consider the QUBO dictionary Q = {(0,0): -1, (1,1): -1, (0,1): 2}. This represents two binary variables x_0 and x_1 with:

  • Q_00 = -1: a bias encouraging x_0 = 1
  • Q_11 = -1: a bias encouraging x_1 = 1
  • Q_01 = 2: a penalty for both variables being 1 simultaneously

Evaluate f(x) for all four possible assignments:

x_0  x_1   Q_00·x_0   Q_11·x_1   Q_01·x_0·x_1   f(x)
 0    0       0          0            0           0
 1    0      -1          0            0          -1
 0    1       0         -1            0          -1
 1    1      -1         -1            2           0

The minimum energy is -1, achieved by either (1, 0) or (0, 1). The biases want both variables on, but the positive interaction penalizes them being on together. The result: exactly one of the two variables is 1 at the optimum. This is actually a one-hot constraint in disguise, which we formalize below.
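This kind of hand enumeration is easy to automate. The sketch below uses a plain-Python energy function (independent of any solver; `qubo_energy` is our own helper, not an Ocean API) to reproduce the table above:

```python
import itertools

def qubo_energy(Q, x):
    """Evaluate f(x) as the sum of Q[i, j] * x_i * x_j over the dict entries."""
    return sum(val * x[i] * x[j] for (i, j), val in Q.items())

# The two-variable QUBO from the table above
Q = {(0, 0): -1, (1, 1): -1, (0, 1): 2}

for bits in itertools.product([0, 1], repeat=2):
    x = {0: bits[0], 1: bits[1]}
    print(bits, qubo_energy(Q, x))
```

Once you build a `BinaryQuadraticModel` from the same dictionary, `bqm.energy(sample)` performs the equivalent computation.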

Standard QUBO Patterns

Certain patterns appear in nearly every QUBO formulation. Learning to recognize and apply them saves time and reduces errors.

One-Hot Constraint: “Exactly One of n Variables Equals 1”

This is the most common constraint in practice. It arises whenever you model a choice among n options (assigning a task, choosing a color, selecting a time slot).

The penalty function is:

P \cdot \left(\sum_i x_i - 1\right)^2

Expand the square:

P \cdot \left(\sum_i x_i^2 - 2 \sum_i x_i + 1 + 2 \sum_{i<j} x_i x_j\right)

For binary variables, x_i^2 = x_i (since 0^2 = 0 and 1^2 = 1). Applying this simplification:

P \cdot \left(\sum_i x_i - 2 \sum_i x_i + 1 + 2 \sum_{i<j} x_i x_j\right) = P \cdot \left(-\sum_i x_i + 2 \sum_{i<j} x_i x_j + 1\right)

This gives us the QUBO entries:

  • Diagonal: Q_ii += -P for each variable i
  • Off-diagonal: Q_ij += 2P for each pair i < j
  • Constant: +P (does not affect optimization, but matters if you want correct absolute energy)

When exactly one variable is 1, the diagonal terms contribute -P, the off-diagonal terms contribute 0, and the constant contributes +P, giving a net penalty of 0. Any other assignment yields a positive penalty.
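These entries can be generated mechanically. The helper below (`one_hot_penalty` is a name of our own choosing, not an Ocean function) builds the one-hot penalty entries and checks by enumeration that the penalty, with the +P constant added back, is zero exactly when one variable is 1:

```python
import itertools

def one_hot_penalty(variables, P):
    """QUBO entries for P * (sum_i x_i - 1)^2, with the +P constant dropped."""
    Q = {}
    for v in variables:
        Q[(v, v)] = -P                       # diagonal: -P per variable
    for a, b in itertools.combinations(variables, 2):
        Q[(a, b)] = 2 * P                    # off-diagonal: +2P per pair
    return Q

P = 4.0
Q = one_hot_penalty([0, 1, 2], P)
for bits in itertools.product([0, 1], repeat=3):
    x = dict(enumerate(bits))
    energy = sum(v * x[i] * x[j] for (i, j), v in Q.items()) + P  # add back +P
    assert (energy == 0) == (sum(bits) == 1)
print("one-hot penalty verified on all 8 assignments")
```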

At-Most-One Constraint: “At Most One of n Variables Equals 1”

This variant allows the “none selected” case. The penalty is:

P \cdot \sum_{i<j} x_i x_j

This adds only positive off-diagonal terms. Any pair of variables that are both 1 incurs a penalty. Zero or one variable being 1 yields zero penalty. The QUBO entries are simply:

  • Off-diagonal: Q_ij += P for each pair i < j
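The same enumeration check works here; a short sketch (again with a helper name of our own):

```python
import itertools

def at_most_one_penalty(variables, P):
    """QUBO entries for P * sum_{i<j} x_i x_j (no linear or constant terms)."""
    return {pair: P for pair in itertools.combinations(variables, 2)}

P = 5.0
Q = at_most_one_penalty([0, 1, 2], P)
for bits in itertools.product([0, 1], repeat=3):
    x = dict(enumerate(bits))
    energy = sum(v * x[i] * x[j] for (i, j), v in Q.items())
    assert (energy == 0) == (sum(bits) <= 1)
print("at-most-one penalty verified on all 8 assignments")
```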

Logical Constraints as QUBO Terms

You can express logical relationships between binary variables as quadratic penalties. These are essential for combining sub-problems or enforcing structural rules.

AND constraint (x_3 = x_1 AND x_2): introduce an auxiliary variable x_3 and add the penalty:

P \cdot (x_1 x_2 - 2 x_1 x_3 - 2 x_2 x_3 + 3 x_3)

This penalty equals zero when x_3 = x_1 · x_2 and is strictly positive otherwise.

QUBO entries: Q_12 += P, Q_13 += -2P, Q_23 += -2P, Q_33 += 3P.

OR constraint (x_3 = x_1 OR x_2): add the penalty:

P \cdot (x_1 x_2 - 2 x_1 x_3 - 2 x_2 x_3 + x_1 + x_2 + x_3)

QUBO entries: Q_11 += P, Q_22 += P, Q_33 += P, Q_12 += P, Q_13 += -2P, Q_23 += -2P.

NOT constraint (x_2 = NOT x_1): add the penalty:

P \cdot (2 x_1 x_2 - x_1 - x_2 + 1)

This is zero when x_1 + x_2 = 1 (exactly one is 1) and positive otherwise.

QUBO entries: Q_11 += -P, Q_22 += -P, Q_12 += 2P.

You can verify each of these by enumerating all assignments, as we did in the two-variable example above. This is always a good practice before embedding a pattern into a larger formulation.
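The enumeration is quick to script. The check below evaluates each gate's QUBO entries over all assignments and confirms the penalty is zero exactly on the valid rows of the corresponding truth table (the NOT penalty has a +P constant in its derivation, which is added back here):

```python
import itertools

def energy(Q, x, const=0.0):
    """Sum the QUBO entries at assignment x, plus any dropped constant."""
    return sum(v * x[i] * x[j] for (i, j), v in Q.items()) + const

P = 1.0
and_Q = {(1, 2): P, (1, 3): -2 * P, (2, 3): -2 * P, (3, 3): 3 * P}
or_Q = {(1, 1): P, (2, 2): P, (3, 3): P,
        (1, 2): P, (1, 3): -2 * P, (2, 3): -2 * P}
not_Q = {(1, 1): -P, (2, 2): -P, (1, 2): 2 * P}

# AND and OR: three variables, penalty zero iff x3 matches the gate output
for bits in itertools.product([0, 1], repeat=3):
    x = {1: bits[0], 2: bits[1], 3: bits[2]}
    assert (energy(and_Q, x) == 0) == (x[3] == x[1] & x[2])
    assert (energy(or_Q, x) == 0) == (x[3] == x[1] | x[2])

# NOT: two variables, penalty (with constant +P) zero iff x2 = 1 - x1
for bits in itertools.product([0, 1], repeat=2):
    x = {1: bits[0], 2: bits[1]}
    assert (energy(not_Q, x, const=P) == 0) == (x[2] == 1 - x[1])

print("AND, OR, and NOT penalties verified")
```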

Worked Example 1: Number Partitioning

Given a set of numbers S = {s_0, s_1, …, s_{n-1}}, partition them into two subsets A and B such that the difference |sum(A) − sum(B)| is minimized. Assign x_i = 1 if element i goes to subset B, and x_i = 0 if it goes to subset A.

Deriving the QUBO

The difference in sums is:

\text{sum}(B) - \text{sum}(A) = \sum_i s_i x_i - \sum_i s_i (1 - x_i) = \sum_i s_i (2x_i - 1)

We minimize the squared difference (minimizing the square also minimizes the absolute value):

f(x) = \left(\sum_i s_i (2x_i - 1)\right)^2

Expand by writing T = \sum_i s_i (the total sum):

f(x) = \left(2 \sum_i s_i x_i - T\right)^2 = 4 \left(\sum_i s_i x_i\right)^2 - 4T \sum_i s_i x_i + T^2

Now expand the squared sum:

\left(\sum_i s_i x_i\right)^2 = \sum_i s_i^2 x_i^2 + 2 \sum_{i<j} s_i s_j x_i x_j = \sum_i s_i^2 x_i + 2 \sum_{i<j} s_i s_j x_i x_j

where we used x_i^2 = x_i. Substituting back:

f(x) = 4 \sum_i s_i^2 x_i + 8 \sum_{i<j} s_i s_j x_i x_j - 4T \sum_i s_i x_i + T^2

Combining the linear terms:

f(x) = \sum_i s_i (4 s_i - 4T) x_i + \sum_{i<j} 8 s_i s_j \, x_i x_j + T^2

Reading off the QUBO entries (dropping the constant T^2):

  • Diagonal: Q_ii = s_i (4 s_i − 4T)
  • Off-diagonal: Q_ij = 8 s_i s_j for i < j

The optimal solver energy is -T^2 + (minimum difference)^2. A perfect partition (difference = 0) gives energy -T^2 (since the constant T^2 was dropped from the QUBO but is still subtracted from the raw solver energy). When you see energy = -T^2 from the solver, the partition is perfect.

Implementation

import dimod
from dwave.samplers import SimulatedAnnealingSampler

# The set of numbers to partition
S = [3, 1, 4, 2]
n = len(S)
total = sum(S)  # 10

# Build the QUBO matrix as a dictionary
Q = {}
for i in range(n):
    # Linear (diagonal) terms: s_i * (4*s_i - 4*total)
    Q[(i, i)] = S[i] * (4 * S[i] - 4 * total)
    for j in range(i + 1, n):
        # Quadratic (off-diagonal) terms: 8 * s_i * s_j
        Q[(i, j)] = 8 * S[i] * S[j]

print("QUBO matrix entries:")
for key, val in sorted(Q.items()):
    print(f"  Q{key} = {val}")

# Create BinaryQuadraticModel from the QUBO
bqm = dimod.BinaryQuadraticModel.from_qubo(Q)

# Solve with simulated annealing
sampler = SimulatedAnnealingSampler()
sampleset = sampler.sample(bqm, num_reads=100, num_sweeps=1000)

# Inspect the best solution
best = sampleset.first
print(f"\nBest assignment: {best.sample}")
print(f"Energy: {best.energy}")

# Interpret the partition
subset_a = [S[i] for i in range(n) if best.sample[i] == 0]
subset_b = [S[i] for i in range(n) if best.sample[i] == 1]
print(f"Subset A: {subset_a}, sum = {sum(subset_a)}")
print(f"Subset B: {subset_b}, sum = {sum(subset_b)}")
print(f"Difference: {abs(sum(subset_a) - sum(subset_b))}")

For S = {3, 1, 4, 2} with total T = 10, a perfect partition exists: {3, 2} and {1, 4} both sum to 5. The solver should return energy = -100 (that is, -T^2) and a difference of 0.
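The energy interpretation can be made concrete with a small helper (a sketch of our own, assuming the raw solver energy has the T^2 constant dropped as derived above):

```python
import math

def partition_difference(energy, total):
    """Recover |sum(A) - sum(B)| from a raw solver energy.
    Since energy = diff^2 - T^2, we have diff = sqrt(energy + T^2)."""
    return math.sqrt(energy + total ** 2)

# For S = [3, 1, 4, 2] with T = 10, energy -100 means a perfect split
print(partition_difference(-100.0, 10))  # 0.0
print(partition_difference(-96.0, 10))   # 2.0 (e.g., {3, 1} vs {4, 2})
```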

Inspecting the Energy Landscape

The SampleSet returned by the sampler contains all sampled solutions ranked by energy. Examining the low-energy spectrum helps you judge whether your formulation is working:

# Show the five lowest-energy samples
print("\nTop 5 solutions:")
for sample, energy, num_occ in list(sampleset.data(
    fields=["sample", "energy", "num_occurrences"],
    sorted_by="energy"
))[:5]:
    print(f"  {dict(sample)}  energy={energy:.1f}  count={num_occ}")

If many low-energy solutions violate your constraints, your penalty strength P is too low. If the solver finds few distinct solutions and they cluster at high penalty energies, P may be too high relative to the objective range.
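Quantifying “many low-energy solutions violate constraints” just needs a feasibility counter. A minimal sketch, assuming samples are plain {variable: 0/1} dicts and you supply the constraint check (the one-hot groups here are purely illustrative, not from any particular problem):

```python
def feasibility_rate(samples, is_feasible):
    """Fraction of samples that satisfy a problem-specific constraint check."""
    samples = list(samples)
    return sum(1 for s in samples if is_feasible(s)) / len(samples)

# Illustrative check: two one-hot groups of three variables each
groups = [[0, 1, 2], [3, 4, 5]]

def one_hot_ok(sample):
    return all(sum(sample[i] for i in g) == 1 for g in groups)

samples = [
    {0: 1, 1: 0, 2: 0, 3: 0, 4: 1, 5: 0},  # feasible
    {0: 1, 1: 1, 2: 0, 3: 0, 4: 1, 5: 0},  # violates the first group
]
print(feasibility_rate(samples, one_hot_ok))  # 0.5
```

With a SampleSet, you would iterate its samples the same way and weight by num_occurrences if desired.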

Worked Example 2: Minimum Vertex Cover

A vertex cover of a graph is a set of vertices such that every edge has at least one endpoint in the set. The minimum vertex cover problem asks for the smallest such set. This is a classic NP-hard problem and a natural fit for QUBO.

Problem Setup

Consider a small graph with 5 vertices and 6 edges:

Vertices: {0, 1, 2, 3, 4}
Edges: {(0,1), (0,2), (1,2), (1,3), (2,4), (3,4)}

Assign binary variable x_i = 1 if vertex i is in the cover, x_i = 0 otherwise.

Formulating the QUBO

The objective is to minimize the number of selected vertices:

\text{Objective} = \sum_i x_i

The constraint is that for every edge (u, v), at least one endpoint must be selected: x_u + x_v ≥ 1. Equivalently, the forbidden case is x_u = 0 and x_v = 0. We penalize this with:

P \cdot (1 - x_u)(1 - x_v) = P \cdot (1 - x_u - x_v + x_u x_v)

For each edge (u, v), this contributes:

  • Diagonal: Q_uu += -P, Q_vv += -P
  • Off-diagonal: Q_uv += P
  • Constant: +P (per edge)

Combining the objective and penalty terms for the full QUBO:

  • Diagonal: Q_ii = 1 − P · (degree of vertex i), where the +1 comes from the objective and −P accumulates once per incident edge.
  • Off-diagonal: Q_uv = P for each edge (u, v).

Implementation

import dimod
from dwave.samplers import SimulatedAnnealingSampler

# Define the graph
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]
n_vertices = 5

# Compute the degree of each vertex
degree = [0] * n_vertices
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Penalty strength: the objective ranges from 0 to n_vertices = 5,
# so P should be somewhat larger than 5.
P = 8.0

# Build the QUBO
Q = {}

# Diagonal terms: objective (+1 per vertex) + penalty (-P per incident edge)
for i in range(n_vertices):
    Q[(i, i)] = 1 - P * degree[i]

# Off-diagonal terms: penalty (+P per edge)
for u, v in edges:
    Q[(u, v)] = P

print("QUBO matrix entries:")
for key, val in sorted(Q.items()):
    print(f"  Q{key} = {val}")

# Solve
bqm = dimod.BinaryQuadraticModel.from_qubo(Q)
sampler = SimulatedAnnealingSampler()
sampleset = sampler.sample(bqm, num_reads=200, num_sweeps=1000)

best = sampleset.first
cover = [i for i in range(n_vertices) if best.sample[i] == 1]
print(f"\nBest assignment: {best.sample}")
print(f"Energy: {best.energy}")
print(f"Vertex cover: {cover} (size {len(cover)})")

# Verify: check every edge is covered
all_covered = all(
    best.sample[u] == 1 or best.sample[v] == 1
    for u, v in edges
)
print(f"All edges covered: {all_covered}")

For this graph, the minimum vertex cover has size 3 (for example, vertices {1, 2, 3} cover all six edges). The solver should find this or an equivalent cover. Notice the verification step at the end: always check that constraints are satisfied, especially when tuning penalty strengths.

Why This Formulation Works

The penalty terms create an energy “floor” for any assignment that leaves an edge uncovered. With P = 8 and the objective contributing at most 5, any uncovered edge adds 8 to the energy while removing one vertex from the cover only saves 1. The annealer strongly prefers covering all edges and then minimizes the cover size among valid solutions.

Penalty Strength Tuning

Choosing the right penalty strength P is one of the most practical challenges in QUBO formulation. The theory is simple (set P large enough that violating a constraint always costs more than any objective improvement), but getting it right in practice requires care.

Estimating the Objective Range

Before running any solver, analyze the structure of your problem to bound the objective function’s range:

  • For number partitioning, the squared difference ranges from 0 to T^2.
  • For vertex cover, the objective (number of selected vertices) ranges from 0 to n.
  • For assignment problems, sum the ranges of the individual assignment costs.

The key quantity is the maximum improvement the solver could gain by violating a constraint. If the objective ranges over [E_min, E_max], then P must exceed E_max − E_min to guarantee that no constraint violation is ever worthwhile.
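Applied to the vertex cover example from earlier, the bound is a back-of-the-envelope calculation (a sketch, not a tuned value):

```python
# Vertex cover on 5 vertices: the objective is the number of selected vertices
E_min, E_max = 0, 5
R = E_max - E_min          # maximum objective improvement a violation could buy
P = R + 1                  # any P > R makes violations never worthwhile
print(f"objective range R = {R}; choose P > {R}, e.g. P = {P}")
```

The tutorial's choice of P = 8 satisfies this bound with some margin.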

What Happens When P Is Too Low

When P is too small relative to the objective range, the solver can reduce the total energy by violating constraints. In practice, you observe:

  • Many of the top solutions in the SampleSet are infeasible (they violate one or more constraints).
  • The lowest-energy feasible solution may appear deep in the list, or not at all.
  • The solver “wastes” its sampling budget exploring infeasible regions of the search space.

What Happens When P Is Too High

When P is much larger than the objective range, the penalty terms dominate the energy landscape. The annealer sees a landscape where all feasible solutions have nearly the same energy (the penalty contribution is zero for all of them, and the objective differences are tiny by comparison). This means:

  • The annealer finds feasible solutions easily, but cannot distinguish good feasible solutions from mediocre ones.
  • Many samples cluster at the same energy, reducing the effective number of distinct solutions explored.
  • In the worst case, the annealer essentially performs random sampling within the feasible space.

The Sweep Approach

A systematic way to find the right P:

  1. Estimate the objective range R = E_max − E_min.
  2. Run the solver at P = 0.5R, P = R, P = 2R, and P = 5R.
  3. For each run, track two metrics: the feasibility rate (fraction of samples satisfying all constraints) and the best feasible objective value.
  4. Look for the sweet spot: the smallest P where the feasibility rate is close to 100%. This gives the annealer maximum sensitivity to objective differences while still respecting constraints.

import dimod
from dwave.samplers import SimulatedAnnealingSampler

def evaluate_penalty_strength(Q_objective, penalty_terms, P_values, num_reads=200):
    """
    Sweep over penalty strengths and report feasibility rate
    and best objective value for each.
    
    Q_objective: dict of QUBO entries for the objective alone
    penalty_terms: function that takes P and returns dict of QUBO penalty entries
    P_values: list of penalty strengths to try
    """
    sampler = SimulatedAnnealingSampler()
    
    for P in P_values:
        # Combine objective and penalty QUBOs
        Q = dict(Q_objective)
        for key, val in penalty_terms(P).items():
            Q[key] = Q.get(key, 0) + val
        
        bqm = dimod.BinaryQuadraticModel.from_qubo(Q)
        sampleset = sampler.sample(bqm, num_reads=num_reads, num_sweeps=1000)
        
        # Analyze results (feasibility check is problem-specific)
        print(f"P = {P:6.1f} | "
              f"Best energy = {sampleset.first.energy:8.1f} | "
              f"Unique solutions = {len(sampleset.record)}")

In production, you replace the inner loop with your problem-specific feasibility check and objective extraction.
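For the vertex cover example, those problem-specific pieces might look like this (a sketch; the function names are placeholders of our own, not Ocean APIs):

```python
# The graph from the vertex cover example
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]

def is_feasible_cover(sample):
    """Feasible iff every edge has at least one selected endpoint."""
    return all(sample[u] == 1 or sample[v] == 1 for u, v in edges)

def cover_size(sample):
    """Objective value: the number of selected vertices."""
    return sum(sample.values())

sample = {0: 0, 1: 1, 2: 1, 3: 1, 4: 0}
print(is_feasible_cover(sample), cover_size(sample))  # True 3
```

Inside the sweep, you would compute the fraction of sampled solutions passing `is_feasible_cover` and the minimum `cover_size` among those that do.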

Common Mistakes

Learning from common pitfalls saves significant debugging time.

Forgetting That x_i^2 = x_i

When expanding penalty terms like (\sum_i x_i - k)^2, the squared terms x_i^2 simplify to x_i. Forgetting this step leaves you with a polynomial that is not in QUBO form. The mechanical rule: every time you see x_i^2, replace it with x_i. This applies throughout the entire expansion, not just at the final step.
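A one-minute check confirms the simplification is harmless on binary inputs: the QUBO-form expansion agrees with the original square on every assignment (here for three variables and k = 1):

```python
import itertools

# Compare (x0 + x1 + x2 - 1)^2 against its QUBO-form expansion
# -sum_i x_i + 2 * sum_{i<j} x_i x_j + 1, which relies on x_i^2 = x_i
for bits in itertools.product([0, 1], repeat=3):
    original = (sum(bits) - 1) ** 2
    simplified = (-sum(bits)
                  + 2 * sum(bits[i] * bits[j]
                            for i in range(3) for j in range(i + 1, 3))
                  + 1)
    assert original == simplified
print("expansion matches on all 8 assignments")
```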

Using a Single Penalty Strength for All Constraints

Different constraints may have different “importance” levels relative to the objective. For example, in a scheduling problem, a hard constraint (“no two meetings in the same room at the same time”) should have a much higher penalty than a soft constraint (“prefer morning slots”). Use separate penalty strengths P_1, P_2, … and tune them independently.

Too Many Variables for Direct QPU Embedding

The D-Wave QPU has a fixed connectivity graph (Pegasus topology on current hardware). Embedding a fully connected QUBO of n variables requires chains of physical qubits, and the chain length grows with n. In practice:

  • Problems with fewer than ~50 logical variables often embed cleanly on current hardware.
  • Problems with 50 to 150 variables may embed but with long chains that degrade solution quality.
  • Problems with more than 150 variables typically require the hybrid solver (LeapHybridSampler) or decomposition approaches.

If you are prototyping with SimulatedAnnealingSampler and plan to move to the QPU, keep the variable count in mind from the start. Reformulating to reduce variables is much easier before you have built an entire pipeline around a specific QUBO structure.

Not Verifying on Small Instances

Before running your formulation on the target problem size, test it with a brute-force enumeration on a small instance (4 to 8 variables). This catches formulation errors that are invisible in solver output:

import itertools
import dimod

def brute_force_qubo(Q, n_variables):
    """
    Enumerate all 2^n assignments and return the minimum energy
    and corresponding assignment. Only practical for n <= ~20.
    """
    bqm = dimod.BinaryQuadraticModel.from_qubo(Q)
    best_energy = float('inf')
    best_assignment = None
    
    for bits in itertools.product([0, 1], repeat=n_variables):
        sample = {i: bits[i] for i in range(n_variables)}
        energy = bqm.energy(sample)
        if energy < best_energy:
            best_energy = energy
            best_assignment = sample
    
    return best_assignment, best_energy

# Example: verify the number partitioning QUBO for S = [3, 1, 4, 2]
S = [3, 1, 4, 2]
n = len(S)
total = sum(S)
Q = {}
for i in range(n):
    Q[(i, i)] = S[i] * (4 * S[i] - 4 * total)
    for j in range(i + 1, n):
        Q[(i, j)] = 8 * S[i] * S[j]

assignment, energy = brute_force_qubo(Q, n)
print(f"Optimal assignment: {assignment}")
print(f"Optimal energy: {energy}")
# Verify: energy should be -100 (i.e., -T^2) for a perfect partition

If the brute-force optimum does not match the known answer to your problem, the formulation has a bug. Fix it before scaling up.

Next Steps

With QUBO formulation skills in place, three paths open up:

Embedding for direct QPU access. To run a QUBO on D-Wave hardware, the logical problem graph must be mapped onto the QPU’s physical topology. Ocean’s EmbeddingComposite handles this automatically for small problems. For larger or performance-critical problems, you may want to control the embedding manually using minorminer. The embedding step introduces chain strength as another parameter to tune, similar in spirit to penalty strength.

Hybrid solvers for larger problems. D-Wave’s Leap hybrid solvers (LeapHybridSampler, LeapHybridCQMSampler) accept problems with thousands of variables and handle decomposition internally. If your QUBO has more than ~100 variables, hybrid solvers are usually the practical path. They accept the same BinaryQuadraticModel object you have been building.

Constrained Quadratic Models (CQM). For problems with many constraints, manually converting every constraint into a penalty term becomes tedious and error-prone. The dimod.ConstrainedQuadraticModel class lets you specify constraints directly (equality, inequality, one-hot) and handles the penalty conversion internally. This is a higher-level interface that still compiles down to the same underlying optimization, but it separates the modeling step from the penalty-tuning step. The CQM approach is especially useful with LeapHybridCQMSampler, which accepts CQM objects natively.

Regardless of which path you take, the formulation skills from this tutorial remain the foundation. Understanding what the solver is actually optimizing gives you the ability to debug poor results, tune performance, and recognize when a problem is a good or bad fit for quantum annealing.
