Ocean SDK · Intermediate · Free · 3/12 in series · 25 min read

QUBO Formulation: Translating Problems for D-Wave

Learn how to formulate combinatorial optimization problems as QUBO matrices for D-Wave, including penalty terms for constraints and a worked number partitioning example.

What you'll learn

  • d-wave
  • ocean
  • qubo
  • optimization
  • binary quadratic model
  • penalty terms

Prerequisites

  • ocean-hello-world

QUBO (Quadratic Unconstrained Binary Optimization) is the universal language of D-Wave systems. The quantum annealer, the simulated annealing sampler, and the hybrid solvers all accept the same QUBO format. Once you express your problem as a QUBO, switching between solvers is a one-line change. The hard part is always the formulation, not the execution.

This makes QUBO formulation the single most important skill in D-Wave programming. Every combinatorial optimization problem you want to solve, whether it is scheduling, routing, graph partitioning, or constraint satisfaction, must be translated into a matrix of numbers that encodes “what is good” and “what is bad” as energy values. The annealer then searches for the lowest-energy assignment of binary variables. If your QUBO faithfully represents the problem, low-energy solutions correspond to good answers. If the formulation is wrong, the annealer finds the minimum of the wrong function, and the results are useless regardless of how powerful the hardware is.

This tutorial builds the intuition and mechanical skills you need to construct correct QUBO formulations. We start with the mathematical structure, work through standard patterns you will reuse constantly, and solve two complete problems end to end.

The QUBO Objective Function

A QUBO minimizes the function:

f(x) = x^T Q x = \sum_i Q_{ii} x_i + \sum_{i<j} Q_{ij} x_i x_j

where each x_i ∈ {0, 1}. The matrix Q encodes both the objective you want to minimize and any constraints you need to enforce. Since the problem is “unconstrained” in the mathematical sense, you fold constraints into Q as penalty terms that raise the energy of invalid solutions.

The formula is compact but worth unpacking in plain terms:

  • Diagonal entries Q_ii act as biases on individual variables. A large negative Q_ii encourages x_i = 1 (setting x_i = 1 reduces the energy by |Q_ii|). A large positive value discourages x_i = 1 (it would add cost).
  • Off-diagonal entries Q_ij (where i < j) capture pairwise interactions. A negative Q_ij means the energy decreases when both x_i = 1 and x_j = 1, so the two variables “want” to be active together. A positive Q_ij penalizes both being 1 simultaneously, pushing them apart.
  • The annealer searches over all 2^n possible assignments of 0s and 1s and returns the assignment (or assignments) that minimize f(x).

A Two-Variable Example by Hand

Consider the QUBO dictionary Q = {(0,0): -1, (1,1): -1, (0,1): 2}. This represents two binary variables x_0 and x_1 with:

  • Q_00 = -1: a bias encouraging x_0 = 1
  • Q_11 = -1: a bias encouraging x_1 = 1
  • Q_01 = 2: a penalty for both variables being 1 simultaneously

Evaluate f(x) for all four possible assignments:

x_0  x_1   Q_00·x_0   Q_11·x_1   Q_01·x_0·x_1   f(x)
 0    0       0          0            0           0
 1    0      -1          0            0          -1
 0    1       0         -1            0          -1
 1    1      -1         -1            2           0

The minimum energy is -1, achieved by either (1, 0) or (0, 1). The biases want both variables on, but the positive interaction penalizes them being on together. The result: exactly one of the two variables is 1 at the optimum. This is actually a one-hot constraint in disguise, which we formalize below.
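This kind of hand enumeration is easy to automate. The sketch below uses a plain-Python energy function (independent of any solver; `qubo_energy` is our own helper, not an Ocean API) to reproduce the table above:

```python
import itertools

def qubo_energy(Q, x):
    """Evaluate f(x) as the sum of Q[i, j] * x_i * x_j over the dict entries."""
    return sum(val * x[i] * x[j] for (i, j), val in Q.items())

# The two-variable QUBO from the table above
Q = {(0, 0): -1, (1, 1): -1, (0, 1): 2}

for bits in itertools.product([0, 1], repeat=2):
    x = {0: bits[0], 1: bits[1]}
    print(bits, qubo_energy(Q, x))
```

Once you build a `BinaryQuadraticModel` from the same dictionary, `bqm.energy(sample)` performs the equivalent computation.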

Standard QUBO Patterns

Certain patterns appear in nearly every QUBO formulation. Learning to recognize and apply them saves time and reduces errors.

One-Hot Constraint: “Exactly One of n Variables Equals 1”

This is the most common constraint in practice. It arises whenever you model a choice among n options (assigning a task, choosing a color, selecting a time slot).

The penalty function is:

P \cdot \left(\sum_i x_i - 1\right)^2

Expand the square:

P \cdot \left(\sum_i x_i^2 - 2 \sum_i x_i + 1 + 2 \sum_{i<j} x_i x_j\right)

For binary variables, x_i^2 = x_i (since 0^2 = 0 and 1^2 = 1). Applying this simplification:

P \cdot \left(\sum_i x_i - 2 \sum_i x_i + 1 + 2 \sum_{i<j} x_i x_j\right) = P \cdot \left(-\sum_i x_i + 2 \sum_{i<j} x_i x_j + 1\right)

This gives us the QUBO entries:

  • Diagonal: Q_ii += -P for each variable i
  • Off-diagonal: Q_ij += 2P for each pair i < j
  • Constant: +P (does not affect optimization, but matters if you want correct absolute energy)

When exactly one variable is 1, the diagonal terms contribute -P, the off-diagonal terms contribute 0, and the constant contributes +P, giving a net penalty of 0. Any other assignment yields a positive penalty.
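These entries can be generated mechanically. The helper below (`one_hot_penalty` is a name of our own choosing, not an Ocean function) builds the one-hot penalty entries and checks by enumeration that the penalty, with the +P constant added back, is zero exactly when one variable is 1:

```python
import itertools

def one_hot_penalty(variables, P):
    """QUBO entries for P * (sum_i x_i - 1)^2, with the +P constant dropped."""
    Q = {}
    for v in variables:
        Q[(v, v)] = -P                       # diagonal: -P per variable
    for a, b in itertools.combinations(variables, 2):
        Q[(a, b)] = 2 * P                    # off-diagonal: +2P per pair
    return Q

P = 4.0
Q = one_hot_penalty([0, 1, 2], P)
for bits in itertools.product([0, 1], repeat=3):
    x = dict(enumerate(bits))
    energy = sum(v * x[i] * x[j] for (i, j), v in Q.items()) + P  # add back +P
    assert (energy == 0) == (sum(bits) == 1)
print("one-hot penalty verified on all 8 assignments")
```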

At-Most-One Constraint: “At Most One of n Variables Equals 1”

This variant allows the “none selected” case. The penalty is:

P \cdot \sum_{i<j} x_i x_j

This adds only positive off-diagonal terms. Any pair of variables that are both 1 incurs a penalty. Zero or one variable being 1 yields zero penalty. The QUBO entries are simply:

  • Off-diagonal: Q_ij += P for each pair i < j
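The same enumeration check works here; a short sketch (again with a helper name of our own):

```python
import itertools

def at_most_one_penalty(variables, P):
    """QUBO entries for P * sum_{i<j} x_i x_j (no linear or constant terms)."""
    return {pair: P for pair in itertools.combinations(variables, 2)}

P = 5.0
Q = at_most_one_penalty([0, 1, 2], P)
for bits in itertools.product([0, 1], repeat=3):
    x = dict(enumerate(bits))
    energy = sum(v * x[i] * x[j] for (i, j), v in Q.items())
    assert (energy == 0) == (sum(bits) <= 1)
print("at-most-one penalty verified on all 8 assignments")
```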

Logical Constraints as QUBO Terms

You can express logical relationships between binary variables as quadratic penalties. These are essential for combining sub-problems or enforcing structural rules.

AND constraint (x_3 = x_1 AND x_2): introduce an auxiliary variable x_3 and add the penalty:

P \cdot (x_1 x_2 - 2 x_1 x_3 - 2 x_2 x_3 + 3 x_3)

This penalty equals zero when x_3 = x_1 · x_2 and is strictly positive otherwise.

QUBO entries: Q_12 += P, Q_13 += -2P, Q_23 += -2P, Q_33 += 3P.

OR constraint (x_3 = x_1 OR x_2): add the penalty:

P \cdot (x_1 x_2 - 2 x_1 x_3 - 2 x_2 x_3 + x_1 + x_2 + x_3)

QUBO entries: Q_11 += P, Q_22 += P, Q_33 += P, Q_12 += P, Q_13 += -2P, Q_23 += -2P.

NOT constraint (x_2 = NOT x_1): add the penalty:

P \cdot (2 x_1 x_2 - x_1 - x_2 + 1)

This is zero when x_1 + x_2 = 1 (exactly one is 1) and positive otherwise.

QUBO entries: Q_11 += -P, Q_22 += -P, Q_12 += 2P.

You can verify each of these by enumerating all assignments, as we did in the two-variable example above. This is always a good practice before embedding a pattern into a larger formulation.
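The enumeration is quick to script. The check below evaluates each gate's QUBO entries over all assignments and confirms the penalty is zero exactly on the valid rows of the corresponding truth table (the NOT penalty has a +P constant in its derivation, which is added back here):

```python
import itertools

def energy(Q, x, const=0.0):
    """Sum the QUBO entries at assignment x, plus any dropped constant."""
    return sum(v * x[i] * x[j] for (i, j), v in Q.items()) + const

P = 1.0
and_Q = {(1, 2): P, (1, 3): -2 * P, (2, 3): -2 * P, (3, 3): 3 * P}
or_Q = {(1, 1): P, (2, 2): P, (3, 3): P,
        (1, 2): P, (1, 3): -2 * P, (2, 3): -2 * P}
not_Q = {(1, 1): -P, (2, 2): -P, (1, 2): 2 * P}

# AND and OR: three variables, penalty zero iff x3 matches the gate output
for bits in itertools.product([0, 1], repeat=3):
    x = {1: bits[0], 2: bits[1], 3: bits[2]}
    assert (energy(and_Q, x) == 0) == (x[3] == x[1] & x[2])
    assert (energy(or_Q, x) == 0) == (x[3] == x[1] | x[2])

# NOT: two variables, penalty (with constant +P) zero iff x2 = 1 - x1
for bits in itertools.product([0, 1], repeat=2):
    x = {1: bits[0], 2: bits[1]}
    assert (energy(not_Q, x, const=P) == 0) == (x[2] == 1 - x[1])

print("AND, OR, and NOT penalties verified")
```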

Worked Example 1: Number Partitioning

Given a set of numbers S = {s_0, s_1, …, s_{n-1}}, partition them into two subsets A and B such that the difference |sum(A) − sum(B)| is minimized. Assign x_i = 1 if element i goes to subset B, and x_i = 0 if it goes to subset A.

Deriving the QUBO

The difference in sums is:

\text{sum}(B) - \text{sum}(A) = \sum_i s_i x_i - \sum_i s_i (1 - x_i) = \sum_i s_i (2x_i - 1)

We minimize the squared difference (minimizing the square also minimizes the absolute value):

f(x) = \left(\sum_i s_i (2x_i - 1)\right)^2

Expand by writing T = \sum_i s_i (the total sum):

f(x) = \left(2 \sum_i s_i x_i - T\right)^2 = 4 \left(\sum_i s_i x_i\right)^2 - 4T \sum_i s_i x_i + T^2

Now expand the squared sum:

\left(\sum_i s_i x_i\right)^2 = \sum_i s_i^2 x_i^2 + 2 \sum_{i<j} s_i s_j x_i x_j = \sum_i s_i^2 x_i + 2 \sum_{i<j} s_i s_j x_i x_j

where we used x_i^2 = x_i. Substituting back:

f(x) = 4 \sum_i s_i^2 x_i + 8 \sum_{i<j} s_i s_j x_i x_j - 4T \sum_i s_i x_i + T^2

Combining the linear terms:

f(x) = \sum_i s_i (4 s_i - 4T) x_i + \sum_{i<j} 8 s_i s_j \, x_i x_j + T^2

Reading off the QUBO entries (dropping the constant T^2):

  • Diagonal: Q_ii = s_i (4 s_i − 4T)
  • Off-diagonal: Q_ij = 8 s_i s_j for i < j

The optimal solver energy is -T^2 + (minimum difference)^2. A perfect partition (difference = 0) gives energy -T^2 (since the constant T^2 was dropped from the QUBO but is still subtracted from the raw solver energy). When you see energy = -T^2 from the solver, the partition is perfect.

Implementation

import dimod
from dwave.samplers import SimulatedAnnealingSampler

# The set of numbers to partition
S = [3, 1, 4, 2]
n = len(S)
total = sum(S)  # 10

# Build the QUBO matrix as a dictionary
Q = {}
for i in range(n):
    # Linear (diagonal) terms: s_i * (4*s_i - 4*total)
    Q[(i, i)] = S[i] * (4 * S[i] - 4 * total)
    for j in range(i + 1, n):
        # Quadratic (off-diagonal) terms: 8 * s_i * s_j
        Q[(i, j)] = 8 * S[i] * S[j]

print("QUBO matrix entries:")
for key, val in sorted(Q.items()):
    print(f"  Q{key} = {val}")

# Create BinaryQuadraticModel from the QUBO
bqm = dimod.BinaryQuadraticModel.from_qubo(Q)

# Solve with simulated annealing
sampler = SimulatedAnnealingSampler()
sampleset = sampler.sample(bqm, num_reads=100, num_sweeps=1000)

# Inspect the best solution
best = sampleset.first
print(f"\nBest assignment: {best.sample}")
print(f"Energy: {best.energy}")

# Interpret the partition
subset_a = [S[i] for i in range(n) if best.sample[i] == 0]
subset_b = [S[i] for i in range(n) if best.sample[i] == 1]
print(f"Subset A: {subset_a}, sum = {sum(subset_a)}")
print(f"Subset B: {subset_b}, sum = {sum(subset_b)}")
print(f"Difference: {abs(sum(subset_a) - sum(subset_b))}")

For S = {3, 1, 4, 2} with total T = 10, a perfect partition exists: {3, 2} and {1, 4} both sum to 5. The solver should return energy = -100 (that is, -T^2) and a difference of 0.
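The energy interpretation can be made concrete with a small helper (a sketch of our own, assuming the raw solver energy has the T^2 constant dropped as derived above):

```python
import math

def partition_difference(energy, total):
    """Recover |sum(A) - sum(B)| from a raw solver energy.
    Since energy = diff^2 - T^2, we have diff = sqrt(energy + T^2)."""
    return math.sqrt(energy + total ** 2)

# For S = [3, 1, 4, 2] with T = 10, energy -100 means a perfect split
print(partition_difference(-100.0, 10))  # 0.0
print(partition_difference(-96.0, 10))   # 2.0 (e.g., {3, 1} vs {4, 2})
```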

Inspecting the Energy Landscape

The SampleSet returned by the sampler contains all sampled solutions ranked by energy. Examining the low-energy spectrum helps you judge whether your formulation is working:

# Show the five lowest-energy samples
print("\nTop 5 solutions:")
for sample, energy, num_occ in list(sampleset.data(
    fields=["sample", "energy", "num_occurrences"],
    sorted_by="energy"
))[:5]:
    print(f"  {dict(sample)}  energy={energy:.1f}  count={num_occ}")

If many low-energy solutions violate your constraints, your penalty strength P is too low. If the solver finds few distinct solutions and they cluster at high penalty energies, P may be too high relative to the objective range.
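Quantifying “many low-energy solutions violate constraints” just needs a feasibility counter. A minimal sketch, assuming samples are plain {variable: 0/1} dicts and you supply the constraint check (the one-hot groups here are purely illustrative, not from any particular problem):

```python
def feasibility_rate(samples, is_feasible):
    """Fraction of samples that satisfy a problem-specific constraint check."""
    samples = list(samples)
    return sum(1 for s in samples if is_feasible(s)) / len(samples)

# Illustrative check: two one-hot groups of three variables each
groups = [[0, 1, 2], [3, 4, 5]]

def one_hot_ok(sample):
    return all(sum(sample[i] for i in g) == 1 for g in groups)

samples = [
    {0: 1, 1: 0, 2: 0, 3: 0, 4: 1, 5: 0},  # feasible
    {0: 1, 1: 1, 2: 0, 3: 0, 4: 1, 5: 0},  # violates the first group
]
print(feasibility_rate(samples, one_hot_ok))  # 0.5
```

With a SampleSet, you would iterate its samples the same way and weight by num_occurrences if desired.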

Worked Example 2: Minimum Vertex Cover

A vertex cover of a graph is a set of vertices such that every edge has at least one endpoint in the set. The minimum vertex cover problem asks for the smallest such set. This is a classic NP-hard problem and a natural fit for QUBO.

Problem Setup

Consider a small graph with 5 vertices and 6 edges:

Vertices: {0, 1, 2, 3, 4}
Edges: {(0,1), (0,2), (1,2), (1,3), (2,4), (3,4)}

Assign binary variable x_i = 1 if vertex i is in the cover, x_i = 0 otherwise.

Formulating the QUBO

The objective is to minimize the number of selected vertices:

\text{Objective} = \sum_i x_i

The constraint is that for every edge (u, v), at least one endpoint must be selected: x_u + x_v ≥ 1. Equivalently, the forbidden case is x_u = 0 and x_v = 0. We penalize this with:

P \cdot (1 - x_u)(1 - x_v) = P \cdot (1 - x_u - x_v + x_u x_v)

For each edge (u, v), this contributes:

  • Diagonal: Q_uu += -P, Q_vv += -P
  • Off-diagonal: Q_uv += P
  • Constant: +P (per edge)

Combining the objective and penalty terms for the full QUBO:

  • Diagonal: Q_ii = 1 − P · (degree of vertex i), where the +1 comes from the objective and −P accumulates once per incident edge.
  • Off-diagonal: Q_uv = P for each edge (u, v).

Implementation

import dimod
from dwave.samplers import SimulatedAnnealingSampler

# Define the graph
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]
n_vertices = 5

# Compute the degree of each vertex
degree = [0] * n_vertices
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Penalty strength: the objective ranges from 0 to n_vertices = 5,
# so P should be somewhat larger than 5.
P = 8.0

# Build the QUBO
Q = {}

# Diagonal terms: objective (+1 per vertex) + penalty (-P per incident edge)
for i in range(n_vertices):
    Q[(i, i)] = 1 - P * degree[i]

# Off-diagonal terms: penalty (+P per edge)
for u, v in edges:
    Q[(u, v)] = P

print("QUBO matrix entries:")
for key, val in sorted(Q.items()):
    print(f"  Q{key} = {val}")

# Solve
bqm = dimod.BinaryQuadraticModel.from_qubo(Q)
sampler = SimulatedAnnealingSampler()
sampleset = sampler.sample(bqm, num_reads=200, num_sweeps=1000)

best = sampleset.first
cover = [i for i in range(n_vertices) if best.sample[i] == 1]
print(f"\nBest assignment: {best.sample}")
print(f"Energy: {best.energy}")
print(f"Vertex cover: {cover} (size {len(cover)})")

# Verify: check every edge is covered
all_covered = all(
    best.sample[u] == 1 or best.sample[v] == 1
    for u, v in edges
)
print(f"All edges covered: {all_covered}")

For this graph, the minimum vertex cover has size 3 (for example, vertices {1, 2, 3} cover all six edges). The solver should find this or an equivalent cover. Notice the verification step at the end: always check that constraints are satisfied, especially when tuning penalty strengths.

Why This Formulation Works

The penalty terms create an energy “floor” for any assignment that leaves an edge uncovered. With P = 8 and the objective contributing at most 5, any uncovered edge adds 8 to the energy while removing one vertex from the cover only saves 1. The annealer strongly prefers covering all edges and then minimizes the cover size among valid solutions.

Penalty Strength Tuning

Choosing the right penalty strength P is one of the most practical challenges in QUBO formulation. The theory is simple (set P large enough that violating a constraint always costs more than any objective improvement), but getting it right in practice requires care.

Estimating the Objective Range

Before running any solver, analyze the structure of your problem to bound the objective function’s range:

  • For number partitioning, the squared difference ranges from 0 to T^2.
  • For vertex cover, the objective (number of selected vertices) ranges from 0 to n.
  • For assignment problems, sum the ranges of the individual assignment costs.

The key quantity is the maximum improvement the solver could gain by violating a constraint. If the objective ranges over [E_min, E_max], then P must exceed E_max − E_min to guarantee that no constraint violation is ever worthwhile.
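Applied to the vertex cover example from earlier, the bound is a back-of-the-envelope calculation (a sketch, not a tuned value):

```python
# Vertex cover on 5 vertices: the objective is the number of selected vertices
E_min, E_max = 0, 5
R = E_max - E_min          # maximum objective improvement a violation could buy
P = R + 1                  # any P > R makes violations never worthwhile
print(f"objective range R = {R}; choose P > {R}, e.g. P = {P}")
```

The tutorial's choice of P = 8 satisfies this bound with some margin.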

What Happens When P Is Too Low

When P is too small relative to the objective range, the solver can reduce the total energy by violating constraints. In practice, you observe:

  • Many of the top solutions in the SampleSet are infeasible (they violate one or more constraints).
  • The lowest-energy feasible solution may appear deep in the list, or not at all.
  • The solver “wastes” its sampling budget exploring infeasible regions of the search space.

What Happens When P Is Too High

When P is much larger than the objective range, the penalty terms dominate the energy landscape. The annealer sees a landscape where all feasible solutions have nearly the same energy (the penalty contribution is zero for all of them, and the objective differences are tiny by comparison). This means:

  • The annealer finds feasible solutions easily, but cannot distinguish good feasible solutions from mediocre ones.
  • Many samples cluster at the same energy, reducing the effective number of distinct solutions explored.
  • In the worst case, the annealer essentially performs random sampling within the feasible space.

The Sweep Approach

A systematic way to find the right P:

  1. Estimate the objective range R = E_max − E_min.
  2. Run the solver at P = 0.5R, P = R, P = 2R, and P = 5R.
  3. For each run, track two metrics: the feasibility rate (fraction of samples satisfying all constraints) and the best feasible objective value.
  4. Look for the sweet spot: the smallest P where the feasibility rate is close to 100%. This gives the annealer maximum sensitivity to objective differences while still respecting constraints.

import dimod
from dwave.samplers import SimulatedAnnealingSampler

def evaluate_penalty_strength(Q_objective, penalty_terms, P_values, num_reads=200):
    """
    Sweep over penalty strengths and report feasibility rate
    and best objective value for each.
    
    Q_objective: dict of QUBO entries for the objective alone
    penalty_terms: function that takes P and returns dict of QUBO penalty entries
    P_values: list of penalty strengths to try
    """
    sampler = SimulatedAnnealingSampler()
    
    for P in P_values:
        # Combine objective and penalty QUBOs
        Q = dict(Q_objective)
        for key, val in penalty_terms(P).items():
            Q[key] = Q.get(key, 0) + val
        
        bqm = dimod.BinaryQuadraticModel.from_qubo(Q)
        sampleset = sampler.sample(bqm, num_reads=num_reads, num_sweeps=1000)
        
        # Analyze results (feasibility check is problem-specific)
        print(f"P = {P:6.1f} | "
              f"Best energy = {sampleset.first.energy:8.1f} | "
              f"Unique solutions = {len(sampleset.record)}")

In production, you replace the inner loop with your problem-specific feasibility check and objective extraction.
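For the vertex cover example, those problem-specific pieces might look like this (a sketch; the function names are placeholders of our own, not Ocean APIs):

```python
# The graph from the vertex cover example
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]

def is_feasible_cover(sample):
    """Feasible iff every edge has at least one selected endpoint."""
    return all(sample[u] == 1 or sample[v] == 1 for u, v in edges)

def cover_size(sample):
    """Objective value: the number of selected vertices."""
    return sum(sample.values())

sample = {0: 0, 1: 1, 2: 1, 3: 1, 4: 0}
print(is_feasible_cover(sample), cover_size(sample))  # True 3
```

Inside the sweep, you would compute the fraction of sampled solutions passing `is_feasible_cover` and the minimum `cover_size` among those that do.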

Common Mistakes

Learning from common pitfalls saves significant debugging time.

Forgetting That x_i^2 = x_i

When expanding penalty terms like (\sum_i x_i - k)^2, the squared terms x_i^2 simplify to x_i. Forgetting this step leaves you with a polynomial that is not in QUBO form. The mechanical rule: every time you see x_i^2, replace it with x_i. This applies throughout the entire expansion, not just at the final step.
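A one-minute check confirms the simplification is harmless on binary inputs: the QUBO-form expansion agrees with the original square on every assignment (here for three variables and k = 1):

```python
import itertools

# Compare (x0 + x1 + x2 - 1)^2 against its QUBO-form expansion
# -sum_i x_i + 2 * sum_{i<j} x_i x_j + 1, which relies on x_i^2 = x_i
for bits in itertools.product([0, 1], repeat=3):
    original = (sum(bits) - 1) ** 2
    simplified = (-sum(bits)
                  + 2 * sum(bits[i] * bits[j]
                            for i in range(3) for j in range(i + 1, 3))
                  + 1)
    assert original == simplified
print("expansion matches on all 8 assignments")
```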

Using a Single Penalty Strength for All Constraints

Different constraints may have different “importance” levels relative to the objective. For example, in a scheduling problem, a hard constraint (“no two meetings in the same room at the same time”) should have a much higher penalty than a soft constraint (“prefer morning slots”). Use separate penalty strengths P_1, P_2, … and tune them independently.

Too Many Variables for Direct QPU Embedding

The D-Wave QPU has a fixed connectivity graph (Pegasus topology on current hardware). Embedding a fully connected QUBO of n variables requires chains of physical qubits, and the chain length grows with n. In practice:

  • Problems with fewer than ~50 logical variables often embed cleanly on current hardware.
  • Problems with 50 to 150 variables may embed but with long chains that degrade solution quality.
  • Problems with more than 150 variables typically require the hybrid solver (LeapHybridSampler) or decomposition approaches.

If you are prototyping with SimulatedAnnealingSampler and plan to move to the QPU, keep the variable count in mind from the start. Reformulating to reduce variables is much easier before you have built an entire pipeline around a specific QUBO structure.

Not Verifying on Small Instances

Before running your formulation on the target problem size, test it with a brute-force enumeration on a small instance (4 to 8 variables). This catches formulation errors that are invisible in solver output:

import itertools
import dimod

def brute_force_qubo(Q, n_variables):
    """
    Enumerate all 2^n assignments and return the minimum energy
    and corresponding assignment. Only practical for n <= ~20.
    """
    bqm = dimod.BinaryQuadraticModel.from_qubo(Q)
    best_energy = float('inf')
    best_assignment = None
    
    for bits in itertools.product([0, 1], repeat=n_variables):
        sample = {i: bits[i] for i in range(n_variables)}
        energy = bqm.energy(sample)
        if energy < best_energy:
            best_energy = energy
            best_assignment = sample
    
    return best_assignment, best_energy

# Example: verify the number partitioning QUBO for S = [3, 1, 4, 2]
S = [3, 1, 4, 2]
n = len(S)
total = sum(S)
Q = {}
for i in range(n):
    Q[(i, i)] = S[i] * (4 * S[i] - 4 * total)
    for j in range(i + 1, n):
        Q[(i, j)] = 8 * S[i] * S[j]

assignment, energy = brute_force_qubo(Q, n)
print(f"Optimal assignment: {assignment}")
print(f"Optimal energy: {energy}")
# Verify: energy should be -100 (i.e., -T^2) for a perfect partition

If the brute-force optimum does not match the known answer to your problem, the formulation has a bug. Fix it before scaling up.

Next Steps

With QUBO formulation skills in place, three paths open up:

Embedding for direct QPU access. To run a QUBO on D-Wave hardware, the logical problem graph must be mapped onto the QPU’s physical topology. Ocean’s EmbeddingComposite handles this automatically for small problems. For larger or performance-critical problems, you may want to control the embedding manually using minorminer. The embedding step introduces chain strength as another parameter to tune, similar in spirit to penalty strength.

Hybrid solvers for larger problems. D-Wave’s Leap hybrid solvers (LeapHybridSampler, LeapHybridCQMSampler) accept problems with thousands of variables and handle decomposition internally. If your QUBO has more than ~100 variables, hybrid solvers are usually the practical path. They accept the same BinaryQuadraticModel object you have been building.

Constrained Quadratic Models (CQM). For problems with many constraints, manually converting every constraint into a penalty term becomes tedious and error-prone. The dimod.ConstrainedQuadraticModel class lets you specify constraints directly (equality, inequality, one-hot) and handles the penalty conversion internally. This is a higher-level interface that still compiles down to the same underlying optimization, but it separates the modeling step from the penalty-tuning step. The CQM approach is especially useful with LeapHybridCQMSampler, which accepts CQM objects natively.

Regardless of which path you take, the formulation skills from this tutorial remain the foundation. Understanding what the solver is actually optimizing gives you the ability to debug poor results, tune performance, and recognize when a problem is a good or bad fit for quantum annealing.
