Run Bigger Circuits with Qiskit Circuit Cutting (Wire + Gate)

The Problem: Circuits Too Large for One QPU

NISQ devices have limited qubit counts and constrained connectivity. A circuit requiring 20 qubits cannot run on a 10-qubit device, and even when a device has enough qubits, heavy cross-chip routing can destroy fidelity. Circuit cutting solves this by decomposing a large circuit into smaller subcircuits that run independently on separate devices (or the same device in separate jobs). Classical post-processing then reconstructs the original circuit’s expectation values from the subcircuit results.

The core tradeoff is simple: you reduce quantum resource requirements at the cost of increased classical processing and more total shots. Each cut introduces a multiplicative overhead in the number of circuit executions needed. This makes circuit cutting practical only when the number of cuts is small, typically one to three.

Circuit cutting fits two main use cases:

Distributing across QPUs. When no single QPU has enough qubits, you split the circuit across two or more devices. Each device runs its subcircuit independently, and you combine results classically.
Reducing circuit depth. Subcircuits are typically shallower than the original circuit, which means less accumulated gate error on noisy hardware. Even when a single QPU has enough qubits, cutting can improve fidelity if the depth reduction outweighs the reconstruction overhead.

Wire Cutting vs Gate Cutting: Physical Intuition

Two distinct techniques exist for cutting circuits. Understanding the physical mechanism behind each one clarifies when to use which.

Wire Cutting

Wire cutting inserts a classical communication channel across a quantum wire. Imagine a circuit where qubit 1 in partition A connects to qubit 2 in partition B through a wire. Wire cutting replaces that quantum connection with a measure-communicate-prepare protocol:

Measure the state on the sender qubit (in partition A) in one of several Pauli bases (X, Y, or Z).
Classically communicate the measurement outcome.
Prepare a corresponding state on the receiver qubit (in partition B) based on the measurement result.

Because a single measurement in one basis cannot capture all the information in a quantum state (quantum states live in a continuous space, but measurements yield discrete outcomes), you must repeat this process across multiple basis choices and average the results. Specifically, each wire cut requires measurements in the X, Y, and Z bases on the sender, paired with preparations of the corresponding eigenstates on the receiver.

The key insight is that no quantum communication occurs between partitions. Everything flows through classical channels. The price you pay is that you need many more circuit executions to reconstruct the same statistical precision.

Gate Cutting

Gate cutting takes a different approach. Instead of cutting a wire, you decompose a two-qubit gate (like a CNOT) into a weighted sum of local single-qubit operations, one set acting on partition A and another on partition B. The decomposition looks like this:

CNOT(q1, q2) = sum_i  c_i * [U_i(q1)] ⊗ [V_i(q2)]

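Each term in the sum is a pair of single-qubit operations that can be executed independently on separate partitions. The equality holds at the level of quantum channels rather than unitary matrices, and some terms involve local measurements as well as gates. The coefficients c_i can be negative (quasi-probabilities), which is why reconstruction requires careful weighted combination of results.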

Gate cutting does not require any physical modification to the circuit’s wiring. It replaces a single two-qubit gate with multiple single-qubit experiments. Like wire cutting, it uses only classical post-processing to combine results.

When to Use Which

Gate cutting is generally preferred when you have a specific two-qubit gate crossing the partition boundary, because its overhead per cut is lower than wire cutting for common gates like CNOT. Wire cutting is useful when the circuit has a natural wire boundary (a qubit whose state flows from one partition to another) or when you need to split the circuit at a point that does not correspond to a single gate.

Quasi-Probability Decomposition: The Math Behind Cutting

The mathematical foundation of circuit cutting is the quasi-probability decomposition (QPD). Understanding QPD explains where the overhead comes from and why it scales the way it does.

The Core Idea

Consider a single wire cut. The identity channel on one qubit (the operation “do nothing, just pass the quantum state through”) can be decomposed as:

I(rho) = sum_{i=1..8}  c_i * Tr(O_i rho) * rho_i

Here O_i runs over the measured observables (I, I, X, X, Y, Y, Z, Z), rho_i over the prepared states (|0⟩, |1⟩, |+⟩, |-⟩, |+i⟩, |-i⟩, |0⟩, |1⟩), and c_i over the weights (+1/2, +1/2, +1/2, -1/2, +1/2, -1/2, +1/2, -1/2). The sum has eight measure-and-prepare terms built from six distinct states, the eigenstates of X, Y, and Z. The coefficients c_i are quasi-probabilities: they sum to 1, but some are negative. You cannot interpret them as classical probabilities, but you can use them as weights when combining measurement results.
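The wire-cut decomposition rests on the Pauli expansion rho = (1/2) * sum over P in {I, X, Y, Z} of Tr(P rho) P: each trace is a measurement on the sender side, and each Pauli operator splits into eigenstate preparations with signs +1 and -1 on the receiver side. The following is a minimal numerical check of that expansion, written with plain nested lists so it needs no external packages. The helper names (matmul, density_matrix, reconstruct) are illustrative and not part of any library.

```python
# Numerical check of rho = (1/2) * sum_{P in {I, X, Y, Z}} Tr(P rho) P
# using plain 2x2 complex matrices stored as nested lists.

PAULIS = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}


def matmul(a, b):
    """Multiply two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace(a):
    """Trace of a 2x2 matrix."""
    return a[0][0] + a[1][1]


def density_matrix(alpha, beta):
    """Return |psi><psi| for the normalized state alpha|0> + beta|1>."""
    norm = (abs(alpha) ** 2 + abs(beta) ** 2) ** 0.5
    a, b = alpha / norm, beta / norm
    return [[a * a.conjugate(), a * b.conjugate()],
            [b * a.conjugate(), b * b.conjugate()]]


def reconstruct(rho):
    """Rebuild rho from its four Pauli expectation values."""
    out = [[0j, 0j], [0j, 0j]]
    for pauli in PAULIS.values():
        weight = trace(matmul(pauli, rho)) / 2  # "measure" step
        for i in range(2):
            for j in range(2):
                out[i][j] += weight * pauli[i][j]  # "prepare" step
    return out


rho = density_matrix(0.6, 0.8j)  # arbitrary test state
rebuilt = reconstruct(rho)
max_error = max(abs(rebuilt[i][j] - rho[i][j])
                for i in range(2) for j in range(2))
print(f"max reconstruction error: {max_error:.2e}")
```

The check only confirms the algebra. In an actual wire cut the traces are estimated from shots, which is where the sampling overhead discussed next comes from.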

The Overhead Factor: Gamma

The total variation of the quasi-probability distribution defines the overhead factor gamma:

gamma = sum_i |c_i|

For a single wire cut implemented with local operations only (no classical communication between subcircuits, which is how the Qiskit addon runs), gamma = 4. The number of additional shots required to achieve the same statistical precision as the uncut circuit scales as gamma squared:

shots_needed = gamma^2 * target_shots = 16 * target_shots

This means a single wire cut requires 16 times more shots. Intuitively, the negative coefficients introduce sign cancellations during reconstruction, which increases the variance of the estimator. You need more samples to beat down that extra variance.

For k independent cuts, the overhead multiplies:

total_gamma^2 = (gamma_1^2) * (gamma_2^2) * ... * (gamma_k^2)

Gate Cutting Overhead vs Wire Cutting Overhead

Different gates have different QPD decompositions with different gamma values. Here is a comparison:

Cut Type	gamma	gamma^2 (shot overhead)
Wire cut	4	16x
CNOT gate cut	3	9x
CZ gate cut	3	9x
SWAP gate cut	7	49x
RZZ(theta) gate cut	1 + 2|sin(theta)|	(1 + 2|sin(theta)|)^2 x

The practical implication: when a CNOT crosses the partition boundary, gate cutting (9x overhead) is cheaper than wire cutting (16x overhead). Prefer gate cutting when the boundary corresponds to a specific two-qubit gate.

For multiple cuts, the overhead compounds. Two CNOT gate cuts cost 9^2 = 81x shots. Two wire cuts cost 16^2 = 256x shots. Three wire cuts cost 16^3 = 4,096x shots. This exponential scaling is the fundamental limitation of circuit cutting.
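The compounding arithmetic can be sketched as a small helper. The gamma values for wire, CNOT, CZ, and SWAP cuts are taken from the comparison table; the RZZ formula gamma = 1 + 2|sin(theta)| is an assumption based on the figure quoted in the addon documentation, so verify it against the version you install. The function names are illustrative only.

```python
import math

# Gamma values quoted in the comparison table.
GAMMA = {"wire": 4.0, "cx": 3.0, "cz": 3.0, "swap": 7.0}


def rzz_gamma(theta):
    """Gamma for an RZZ(theta) gate cut, assuming gamma = 1 + 2|sin(theta)|."""
    return 1.0 + 2.0 * abs(math.sin(theta))


def sampling_overhead(gammas):
    """Multiplicative shot overhead: the product of gamma_i^2 over all cuts."""
    return math.prod(g ** 2 for g in gammas)


print("two CNOT gate cuts:", sampling_overhead([GAMMA["cx"]] * 2))
print("three wire cuts:   ", sampling_overhead([GAMMA["wire"]] * 3))
print("one CNOT + one wire:", sampling_overhead([GAMMA["cx"], GAMMA["wire"]]))
print("RZZ(0.1) gate cut:  ", sampling_overhead([rzz_gamma(0.1)]))
```

Under the assumed RZZ formula, small rotation angles are much cheaper to cut than a CNOT, which is one reason weakly entangling gates make good cut locations.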

Setup

pip install qiskit-addon-cutting qiskit qiskit-aer

All code in this tutorial requires the Qiskit Circuit Cutting addon, which is distributed as its own package (qiskit-addon-cutting) rather than as part of qiskit or qiskit-aer. The addon provides the partition_problem, generate_cutting_experiments, and reconstruct_expectation_values functions that form the core cutting workflow.

Understanding Partition Labels

Before diving into code, let’s clarify how partition labels work. The partition label string maps each qubit index to a named partition. For a 4-qubit circuit with labels "AABB":

Qubit 0 maps to partition A
Qubit 1 maps to partition A
Qubit 2 maps to partition B
Qubit 3 maps to partition B

Any two-qubit gate that acts across the A-B boundary (for example, cx(1, 2)) gets automatically identified and cut by partition_problem. Gates acting within a single partition (like cx(0, 1) within A or cx(2, 3) within B) remain intact.

Choosing Good Partition Labels

The quality of your partition choice directly affects performance. The goal is to minimize the number of gates crossing partition boundaries, because each cross-boundary gate becomes a cut with multiplicative overhead.

To identify the best partition boundary:

List all two-qubit gates in your circuit.
For each possible partition boundary, count how many two-qubit gates cross it.
Choose the boundary with the fewest crossings.

For example, in a linear chain of CNOT gates on qubits 0-1-2-3, the gate pattern is: cx(0,1), cx(1,2), cx(2,3). Partitioning as "AABB" puts one gate (cx(1,2)) on the boundary. Partitioning as "ABBA" would put two gates on boundaries. The first choice is better.
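The counting procedure above can be sketched in a few lines of plain Python. Gates are represented as (qubit, qubit) tuples and a labeling as a string indexed by qubit, mirroring the partition label convention. Both helper functions are hypothetical and not part of qiskit-addon-cutting.

```python
from itertools import combinations


def count_crossings(partition_labels, two_qubit_gates):
    """Count two-qubit gates whose qubits carry different partition labels."""
    return sum(
        1 for q0, q1 in two_qubit_gates
        if partition_labels[q0] != partition_labels[q1]
    )


def best_bipartition(num_qubits, two_qubit_gates, size_a):
    """Brute-force search over A/B labelings with size_a qubits in partition A."""
    best = None
    for group_a in combinations(range(num_qubits), size_a):
        labels = "".join("A" if q in group_a else "B" for q in range(num_qubits))
        crossings = count_crossings(labels, two_qubit_gates)
        if best is None or crossings < best[1]:
            best = (labels, crossings)
    return best


# Linear CNOT chain from the example: cx(0,1), cx(1,2), cx(2,3)
gates = [(0, 1), (1, 2), (2, 3)]

for labels in ("AABB", "ABBA", "ABAB"):
    print(labels, "->", count_crossings(labels, gates), "cut(s)")

print("best balanced split:", best_bipartition(4, gates, size_a=2))
```

The brute-force search grows combinatorially with qubit count, so it is only a sanity check for small circuits. It also treats every crossing as equally expensive, which ignores the different gamma values of different gate types.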

Example: Gate Cutting a 4-Qubit Circuit

This complete example creates a 4-qubit circuit, cuts it into two 2-qubit subcircuits using gate cutting, runs the subcircuits, reconstructs the expectation values, and verifies the results against the uncut circuit.

Step 1: Build the Circuit and Define Observables

import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import SparsePauliOp

# Build a 4-qubit circuit with one cross-partition CX gate
circuit = QuantumCircuit(4)
circuit.h(0)
circuit.h(1)
circuit.cx(0, 1)    # Within partition A
circuit.cx(1, 2)    # Crosses A-B boundary: this gate will be cut
circuit.cx(2, 3)    # Within partition B
circuit.ry(0.4, 0)
circuit.ry(0.8, 1)
circuit.ry(1.2, 2)
circuit.ry(1.6, 3)

print("Original circuit:")
print(circuit.draw(output="text"))

# Define observables to measure
# (Pauli labels are little-endian: the rightmost character acts on qubit 0)
observable = SparsePauliOp(["ZZII", "IZZI", "IIZZ"])

Step 2: Partition the Problem

from qiskit_addon_cutting import (
    partition_problem,
    generate_cutting_experiments,
    reconstruct_expectation_values,
)

# Partition: qubits 0,1 -> A, qubits 2,3 -> B
# The cx(1,2) gate crosses the boundary and will be decomposed
partitioned_problem = partition_problem(
    circuit=circuit,
    partition_labels="AABB",
    observables=observable.paulis,
)

subcircuits = partitioned_problem.subcircuits
subobservables = partitioned_problem.subobservables
bases = partitioned_problem.bases

print(f"Number of subcircuits: {len(subcircuits)}")
for label, subcirc in subcircuits.items():
    print(f"\nPartition {label} ({subcirc.num_qubits} qubits):")
    print(subcirc.draw(output="text"))

The partition_problem function returns a PartitionedCuttingProblem named tuple with three fields:

subcircuits: a dictionary mapping partition labels to their quantum circuits
subobservables: a dictionary mapping partition labels to the local observables each partition must measure
bases: the QPD bases used for the decomposition

Step 3: Generate Subcircuit Experiments

# Generate all subcircuit experiments needed for reconstruction
# num_samples=np.inf means exact decomposition (all basis combinations)
subexperiments, coefficients = generate_cutting_experiments(
    circuits=subcircuits,
    observables=subobservables,
    num_samples=np.inf,
)

# Count the experiments
for label, expts in subexperiments.items():
    print(f"Partition {label}: {len(expts)} subcircuit experiments")
print(f"Total experiments: {sum(len(e) for e in subexperiments.values())}")

With num_samples=np.inf, the function generates all possible basis combinations for the QPD. This gives exact reconstruction (up to shot noise) but produces the maximum number of subcircuit experiments. For production use on real hardware with many cuts, you can set num_samples to a finite integer to stochastically sample a subset of basis combinations, trading reconstruction accuracy for fewer experiments.

Step 4: Run Subcircuit Experiments

from qiskit_aer.primitives import SamplerV2

# Create a sampler for running subcircuit experiments
sampler = SamplerV2()

# Run each partition's experiments
# Each partition's experiments are independent and can be submitted as a batch
results = {
    label: sampler.run(subsystem_subexpts, shots=4096).result()
    for label, subsystem_subexpts in subexperiments.items()
}

Step 5: Reconstruct Expectation Values

# Reconstruct the full expectation values from subcircuit results
reconstructed_expval_terms = reconstruct_expectation_values(
    results,
    coefficients,
    subobservables,
)

# Combine terms weighted by observable coefficients
reconstructed_expval = np.dot(reconstructed_expval_terms, observable.coeffs)
print(f"Reconstructed expectation value: {np.real(reconstructed_expval):.6f}")

Step 6: Verify Against the Full Circuit

from qiskit_aer.primitives import EstimatorV2

# Run the original uncut circuit for comparison
estimator = EstimatorV2()
exact_result = estimator.run([(circuit, observable)]).result()
exact_expval = float(exact_result[0].data.evs)  # evs is a 0-d array for a single observable

print(f"Exact expectation value:         {exact_expval:.6f}")
print(f"Reconstructed expectation value: {np.real(reconstructed_expval):.6f}")
print(f"Absolute error:                  {abs(np.real(reconstructed_expval) - exact_expval):.6f}")

For an ideal (noiseless) simulation with sufficient shots, the reconstructed value should match the exact value to within shot noise. If you see large deviations, check that your partition labels correctly identify the cross-boundary gates.

Wire Cutting with the Move Instruction

Wire cutting uses a different mechanism: the Move instruction. This instruction represents the physical operation of transferring a qubit’s state from one register to another, which the cutting toolbox decomposes into measure-prepare pairs during experiment generation.

from qiskit_addon_cutting.instructions import Move

# Create a circuit that explicitly marks a wire cut with Move
qc_wire = QuantumCircuit(5)  # 5 qubits: 3 in partition A, 2 in partition B
qc_wire.h(0)
qc_wire.cx(0, 1)
qc_wire.cx(1, 2)
# Move qubit 2's state to qubit 3 (crossing the partition boundary)
qc_wire.append(Move(), [2, 3])
qc_wire.cx(3, 4)

print("Circuit with Move instruction:")
print(qc_wire.draw(output="text"))

# Partition: qubits 0,1,2 -> A, qubits 3,4 -> B
wire_partitioned = partition_problem(
    circuit=qc_wire,
    partition_labels="AAABB",
    observables=SparsePauliOp("IZZII").paulis,
)

The Move instruction tells the cutting toolbox exactly where to insert the wire cut. During experiment generation, it gets replaced by the measure-prepare pairs needed for the QPD of the identity channel.

Multiple Cuts: Scaling to Three Partitions

When a circuit needs to be split into more than two pieces, you use multiple cuts. Each additional cut multiplies the shot overhead.

Example: 6-Qubit Circuit with Two Cuts

# A 6-qubit circuit split into three 2-qubit partitions
circuit_6q = QuantumCircuit(6)
circuit_6q.h(range(6))
circuit_6q.cx(0, 1)    # Within partition A
circuit_6q.cx(1, 2)    # Cut 1: crosses A-B boundary
circuit_6q.cx(2, 3)    # Within partition B
circuit_6q.cx(3, 4)    # Cut 2: crosses B-C boundary
circuit_6q.cx(4, 5)    # Within partition C
circuit_6q.ry(0.3, range(6))

# Three partitions: A (qubits 0,1), B (qubits 2,3), C (qubits 4,5)
partitioned_6q = partition_problem(
    circuit=circuit_6q,
    partition_labels="AABBCC",
    observables=SparsePauliOp(["ZZIIII", "IIZZII", "IIIIZZ"]).paulis,
)

print(f"Partitions: {list(partitioned_6q.subcircuits.keys())}")
for label, subcirc in partitioned_6q.subcircuits.items():
    print(f"  Partition {label}: {subcirc.num_qubits} qubits")

Overhead Analysis for Multiple Cuts

With two CNOT gate cuts, the total shot overhead is:

total_overhead = gamma_1^2 * gamma_2^2 = 9 * 9 = 81x

With two wire cuts, the overhead is:

total_overhead = 16 * 16 = 256x

Here is how the overhead scales with the number of cuts:

Number of wire cuts	Shot overhead	Practical?
1	16x	Yes
2	256x	Yes, with sufficient shot budget
3	4,096x	Marginal, requires large shot budget
4	65,536x	Rarely practical
5	1,048,576x	Impractical for most applications

The rule of thumb: budget for at most 2-3 cuts. Beyond that, the exponential overhead makes reconstruction impractically noisy unless you have access to very large total shot budgets (millions of shots or more).
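To turn the rule of thumb into a number for a specific budget, the overhead relation can be inverted. This is a rough sketch that assumes every cut has the same gamma and that the budget counts total shots; the function name and the example budgets are hypothetical.

```python
def max_affordable_cuts(shot_budget, target_shots, gamma):
    """Largest k such that gamma^(2k) * target_shots <= shot_budget.

    Returns 0 when not even one cut fits the budget.
    """
    if gamma <= 1:
        raise ValueError("gamma must be greater than 1 for a nontrivial cut")
    cuts = 0
    required = target_shots
    while required * gamma ** 2 <= shot_budget:
        required *= gamma ** 2
        cuts += 1
    return cuts


# Hypothetical budgets, each aiming for the precision of 4,096 uncut shots.
for budget in (100_000, 1_000_000, 10_000_000):
    wire = max_affordable_cuts(budget, target_shots=4096, gamma=4)
    cnot = max_affordable_cuts(budget, target_shots=4096, gamma=3)
    print(f"budget {budget:>10,}: {wire} wire cut(s) or {cnot} CNOT gate cut(s)")
```

Because the overhead is exponential, increasing the budget tenfold buys at most one additional cut in these examples.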

Subcircuit Parallelization

One of the primary benefits of circuit cutting is that subcircuits can run in parallel on separate QPUs or separate simulator instances. The subcircuit experiments for partition A and partition B are completely independent.

Parallel Execution on Simulators

from concurrent.futures import ThreadPoolExecutor

sampler = SamplerV2()

def run_partition(label_and_experiments):
    label, experiments = label_and_experiments
    result = sampler.run(experiments, shots=4096).result()
    return label, result

# Run all partitions in parallel
with ThreadPoolExecutor(max_workers=len(subexperiments)) as executor:
    futures = executor.map(run_partition, subexperiments.items())
    parallel_results = dict(futures)

Parallel Execution on Multiple QPUs

On real quantum hardware, you submit each partition’s experiments to a different backend:

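from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager
from qiskit_ibm_runtime import QiskitRuntimeService
from qiskit_ibm_runtime import SamplerV2 as RuntimeSamplerV2  # avoid shadowing the Aer SamplerV2

service = QiskitRuntimeService()

# Assign each partition to a different backend.
# The backend names are examples; substitute devices available to your account.
backend_assignments = {
    "A": service.backend("ibm_brisbane"),
    "B": service.backend("ibm_kyoto"),
}

# Submit jobs to separate backends
jobs = {}
for label, experiments in subexperiments.items():
    backend = backend_assignments[label]
    # Runtime primitives accept only ISA circuits, so transpile for the target backend first
    pass_manager = generate_preset_pass_manager(optimization_level=1, backend=backend)
    isa_experiments = pass_manager.run(experiments)
    sampler = RuntimeSamplerV2(mode=backend)
    jobs[label] = sampler.run(isa_experiments, shots=4096)

# Collect results (jobs run concurrently on different hardware)
results = {label: job.result() for label, job in jobs.items()}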

When queue times and partition sizes are comparable, this parallel execution can reduce wall-clock time by up to a factor equal to the number of partitions, which offsets some of the shot overhead from cutting.

Classical Reconstruction Overhead

Reconstruction is a purely classical post-processing step that combines subcircuit results using the quasi-probability coefficients. Its computational cost scales as:

classical_operations = O(N_configs * M)

where N_configs is the number of distinct subcircuit configurations (basis combinations) and M is the number of observable terms.

For k cuts using exact decomposition (num_samples=np.inf):

Cuts (k)	Configs per cut	Total configs	With 5 observables
1	6	6	30 operations
2	6	36	180 operations
3	6	216	1,080 operations
4	6	1,296	6,480 operations

Even for 4 cuts, the classical reconstruction involves only a few thousand arithmetic operations, which is negligible on a modern CPU. For the small cut counts where cutting is practical, the classical overhead is not the bottleneck. The quantum shot overhead (gamma^2 per cut, for example 9^k for k CNOT cuts) is the real cost.
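The table entries follow directly from the O(N_configs * M) estimate. A short sketch that reproduces them, assuming six basis configurations per cut as in the table (the function name is illustrative):

```python
def reconstruction_cost(num_cuts, num_observable_terms, configs_per_cut=6):
    """Return (total configurations, configurations * observable terms)."""
    total_configs = configs_per_cut ** num_cuts
    return total_configs, total_configs * num_observable_terms


for k in range(1, 5):
    configs, operations = reconstruction_cost(k, num_observable_terms=5)
    print(f"{k} cut(s): {configs:>5,} configs, {operations:>6,} operations")
```

The same exponent appears here as in the shot overhead, but with a much smaller constant cost per unit of work, which is why the quantum side dominates for small k.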

Hardware Noise and Circuit Cutting Interaction

On real hardware, circuit cutting creates an interesting tension between two opposing effects.

The Benefit: Shorter Subcircuits

Subcircuits have fewer qubits and lower depth than the original circuit. Fewer two-qubit gates means less accumulated decoherence and gate error. If the original circuit has 80 CNOT gates and the subcircuits each have 35, the per-subcircuit error rate is significantly lower.
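As a back-of-the-envelope illustration of the 80-versus-35 gate comparison, the sketch below assumes independent two-qubit gate errors with a uniform, hypothetical error rate of 1% and ignores decoherence, readout error, and single-qubit gates. It is a toy model, not a hardware prediction.

```python
def survival_probability(num_two_qubit_gates, error_per_gate):
    """Probability that no two-qubit gate fails, assuming independent errors."""
    return (1.0 - error_per_gate) ** num_two_qubit_gates


error_rate = 0.01  # hypothetical two-qubit gate error rate

full_circuit = survival_probability(80, error_rate)
subcircuit = survival_probability(35, error_rate)

print(f"80 CNOTs (uncut circuit):  {full_circuit:.3f}")
print(f"35 CNOTs (one subcircuit): {subcircuit:.3f}")
```

A higher per-subcircuit survival probability does not automatically mean a better final answer, because reconstruction amplifies whatever error remains.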

The Cost: Noise Amplification During Reconstruction

The quasi-probability coefficients used in reconstruction include negative values. When you multiply noisy results by these coefficients and sum them, the noise gets amplified. Specifically, the variance of the reconstructed expectation value scales as gamma^2 times the per-subcircuit variance. This is the same factor that requires more shots in the noiseless case, but with hardware noise, the effect compounds: you need even more shots to overcome both the QPD variance and the hardware noise.
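The shot-noise part of this amplification can be bounded with simple arithmetic. The sketch below assumes an observable with outcomes in [-1, 1], so each quasi-probability sample has magnitude at most gamma_total and the standard error is at most gamma_total / sqrt(total_shots). It covers statistical noise only, since hardware noise adds a bias that more shots do not remove. The function names are illustrative.

```python
import math


def standard_error_bound(gammas, total_shots):
    """Upper bound on the standard error of a QPD estimate (shot noise only)."""
    gamma_total = math.prod(gammas)
    return gamma_total / math.sqrt(total_shots)


def shots_for_precision(gammas, target_error):
    """Total shots that guarantee the bound above stays below target_error."""
    gamma_total = math.prod(gammas)
    return math.ceil((gamma_total / target_error) ** 2)


one_cnot_cut = [3.0]
two_wire_cuts = [4.0, 4.0]

print("error bound, 1 CNOT cut, 8192 shots:", standard_error_bound(one_cnot_cut, 8192))
print("shots for 0.02 error, 1 CNOT cut:   ", shots_for_precision(one_cnot_cut, 0.02))
print("shots for 0.02 error, 2 wire cuts:  ", shots_for_precision(two_wire_cuts, 0.02))
```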

The Net Effect

Whether circuit cutting helps on noisy hardware depends on the balance:

Circuit cutting helps when the original circuit is deep (many two-qubit gates), the cross-partition entanglement is weak (few cuts needed), and the hardware has moderate gate error rates. A rule of thumb: cutting is beneficial when the original circuit has more than 50 two-qubit gates and requires only 1-2 cuts.
Circuit cutting hurts when the original circuit is shallow (few two-qubit gates) or requires many cuts. In these cases, the noise amplification from reconstruction outweighs the benefit of shorter subcircuits.

Practical Guideline

Before committing to circuit cutting on real hardware, run a noise simulation comparing:

The full circuit on a noisy simulator matching your target backend.
The cut-and-reconstructed result on the same noisy simulator.

If the cut version has lower error, proceed. If not, consider whether a larger backend or error mitigation alone would be more effective.

Combining Circuit Cutting with Error Mitigation

Circuit cutting works alongside Qiskit’s error mitigation primitives, but the order of operations matters.

Correct Approach: Mitigate Per Subcircuit, Then Reconstruct

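Apply error mitigation (such as zero-noise extrapolation, ZNE) to each subcircuit independently, then feed the mitigated results into the reconstruction step. This is correct because each subcircuit is a self-contained circuit with its own noise profile.

from qiskit_ibm_runtime import EstimatorV2, QiskitRuntimeService

service = QiskitRuntimeService()
backend = service.least_busy(operational=True, simulator=False)

# Configure ZNE for subcircuit execution.
# EstimatorV2 takes options on the primitive itself; the V1 Options class does not apply.
estimator = EstimatorV2(mode=backend)
estimator.options.resilience_level = 2  # Enables ZNE

# Use the resilience-enabled Estimator for each subcircuit.
# Caveat: reconstruct_expectation_values expects Sampler results, so Estimator-mitigated
# values would have to be recombined with the QPD coefficients in your own code.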

Incorrect Approach: Mitigate After Reconstruction

Do not apply ZNE to the reconstructed expectation value. The reconstructed value is a weighted linear combination of subcircuit results, and applying noise extrapolation to this combination does not correctly account for the quasi-probability structure. The result would be meaningless.

Entanglement Forging

Entanglement forging is a specialized circuit cutting technique for circuits where the quantum state can be written as a Schmidt decomposition with a small number of terms:

|psi⟩ = sum_i  lambda_i * |phi_i⟩_A ⊗ |chi_i⟩_B

When the Schmidt rank is low (few terms in the sum), entanglement forging can reconstruct expectation values with lower overhead than general circuit cutting. Instead of decomposing arbitrary gates, it exploits the product-state structure directly.
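To make "Schmidt rank" concrete, the sketch below computes the Schmidt coefficients of a two-qubit pure state split one qubit per side. It uses the fact that the reduced state of one qubit is a 2x2 matrix with unit trace, so its eigenvalues follow from its determinant. This is a minimal illustration restricted to two qubits; the function names and the tolerance are assumptions, and larger systems would normally use a singular value decomposition instead.

```python
import math


def schmidt_coefficients(amplitudes):
    """Schmidt coefficients of a two-qubit pure state.

    amplitudes = [a00, a01, a10, a11] for basis states |q_A q_B>;
    the input does not need to be normalized.
    """
    norm = math.sqrt(sum(abs(a) ** 2 for a in amplitudes))
    a00, a01, a10, a11 = (a / norm for a in amplitudes)
    # The reduced state of A has trace 1 and determinant |a00*a11 - a01*a10|^2,
    # so its eigenvalues are the roots of x^2 - x + det = 0.
    det = abs(a00 * a11 - a01 * a10) ** 2
    disc = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    eigenvalues = [(1.0 + disc) / 2.0, (1.0 - disc) / 2.0]
    return [math.sqrt(ev) for ev in eigenvalues]


def schmidt_rank(amplitudes, tol=1e-9):
    """Number of Schmidt coefficients above a numerical tolerance."""
    return sum(1 for s in schmidt_coefficients(amplitudes) if s > tol)


bell_state = [1, 0, 0, 1]      # (|00> + |11>) / sqrt(2), maximally entangled
product_state = [1, 1, 0, 0]   # |0> (|0> + |1>) / sqrt(2), no entanglement

print("Bell state:   ", schmidt_coefficients(bell_state), "rank", schmidt_rank(bell_state))
print("Product state:", schmidt_coefficients(product_state), "rank", schmidt_rank(product_state))
```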

The tradeoff: entanglement forging requires the circuit to produce a state with this specific structure, which limits its applicability. For circuits with high entanglement across the partition boundary, general gate or wire cutting is the only option.

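Entanglement forging is not part of the qiskit-addon-cutting workflow used in this tutorial (partition_problem, generate_cutting_experiments, reconstruct_expectation_values). It was developed within the Circuit Knitting Toolbox, the project from which the cutting addon originated, so check the current Qiskit documentation for where it is maintained before relying on it.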

Verification Workflow: Comparing Cut vs Uncut Results

Always verify your cutting implementation before deploying on real hardware. The procedure is straightforward:

import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import SparsePauliOp
from qiskit_aer.primitives import SamplerV2, EstimatorV2
from qiskit_addon_cutting import (
    partition_problem,
    generate_cutting_experiments,
    reconstruct_expectation_values,
)

# 1. Build circuit and observable
qc = QuantumCircuit(4)
qc.h(0)
qc.cx(0, 1)
qc.cx(1, 2)
qc.cx(2, 3)
qc.rz(0.7, range(4))

observable = SparsePauliOp(["ZZII", "IIZZ", "ZIZI"])

# 2. Run the full circuit (reference)
estimator = EstimatorV2()
exact_expval = float(estimator.run([(qc, observable)]).result()[0].data.evs)
print(f"Exact expectation value: {exact_expval:.6f}")

# 3. Run the cut circuit
partitioned = partition_problem(
    circuit=qc,
    partition_labels="AABB",
    observables=observable.paulis,
)

subexperiments, coefficients = generate_cutting_experiments(
    circuits=partitioned.subcircuits,
    observables=partitioned.subobservables,
    num_samples=np.inf,
)

sampler = SamplerV2()
results = {
    label: sampler.run(expts, shots=8192).result()
    for label, expts in subexperiments.items()
}

reconstructed_terms = reconstruct_expectation_values(
    results, coefficients, partitioned.subobservables
)
reconstructed_expval = np.dot(reconstructed_terms, observable.coeffs)

# 4. Compare
print(f"Reconstructed:  {np.real(reconstructed_expval):.6f}")
print(f"Exact:          {exact_expval:.6f}")
deviation = abs(np.real(reconstructed_expval) - exact_expval)
print(f"Deviation:      {deviation:.6f}")

# For ideal simulation with 8192 shots, the deviation is typically well under the 0.1 tolerance checked here
assert deviation < 0.1, f"Deviation too large: {deviation}"
print("Verification passed.")

If verification fails, check:

That your partition labels correctly capture the cross-boundary gates
That you passed observables=observable.paulis (a PauliList) to partition_problem, not the SparsePauliOp itself
That all subcircuit experiments completed successfully
That your shot count is sufficient (increase shots to reduce statistical noise)

Common Mistakes

1. Cutting High-Entanglement Boundaries

Placing cuts on boundaries with many crossing gates does not reduce quantum resource requirements efficiently. If four CNOT gates cross the A-B boundary, you need four cuts, giving an overhead of 9^4 = 6,561x for gate cuts or 16^4 = 65,536x for wire cuts. Instead, rearrange your partition labels to minimize cross-boundary gates, or restructure the circuit so that entanglement across the boundary is concentrated in fewer gates.

2. Using num_samples=np.inf on Real Hardware

Setting num_samples=np.inf generates all possible basis combinations for exact QPD. This is correct for simulation-based verification, but on real hardware with finite shot budgets, you often want to set num_samples to a finite integer. The function then randomly samples basis combinations with probability proportional to the magnitude of their quasi-probability weights, reducing the number of subcircuit experiments at the cost of some reconstruction accuracy.

# For simulation/verification: exact decomposition
subexperiments, coefficients = generate_cutting_experiments(
    circuits=subcircuits,
    observables=subobservables,
    num_samples=np.inf,
)

# For real hardware: stochastic sampling with fewer experiments
subexperiments, coefficients = generate_cutting_experiments(
    circuits=subcircuits,
    observables=subobservables,
    num_samples=1000,  # Sample 1000 basis combinations
)
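What stochastic sampling does can be illustrated with a toy model in plain Python. This is not the addon's implementation: the six coefficients of magnitude 1/2 are chosen to give gamma = 3, matching the CNOT row of the overhead table, and the per-term values are made-up numbers standing in for subexperiment results. Terms are drawn with probability |c_i| / gamma and each draw is weighted by gamma * sign(c_i), which keeps the estimate unbiased.

```python
import random


def sample_qpd_terms(coefficients, num_samples, seed=None):
    """Draw term indices with probability |c_i| / gamma; return counts and gamma."""
    rng = random.Random(seed)
    gamma = sum(abs(c) for c in coefficients)
    weights = [abs(c) / gamma for c in coefficients]
    counts = [0] * len(coefficients)
    for index in rng.choices(range(len(coefficients)), weights=weights, k=num_samples):
        counts[index] += 1
    return counts, gamma


def estimate(coefficients, term_values, counts, gamma):
    """Sampled estimate of sum_i c_i * v_i from the drawn terms."""
    total = sum(counts)
    weighted = sum(
        n * gamma * (1 if c > 0 else -1) * v
        for n, c, v in zip(counts, coefficients, term_values)
    )
    return weighted / total


# Toy quasi-probability coefficients (sum to 1, gamma = 3) and made-up term values.
coefficients = [0.5, 0.5, 0.5, -0.5, 0.5, -0.5]
term_values = [0.9, 0.1, 0.4, -0.2, 0.3, 0.5]

exact = sum(c * v for c, v in zip(coefficients, term_values))
counts, gamma = sample_qpd_terms(coefficients, num_samples=1000, seed=7)
sampled = estimate(coefficients, term_values, counts, gamma)

print(f"gamma:            {gamma}")
print(f"exact weighted sum: {exact:.4f}")
print(f"sampled estimate:   {sampled:.4f}")
```

In a real run each term value is itself estimated from shots, so the sampling error shown here adds to ordinary shot noise rather than replacing it.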

3. Underestimating the Shot Budget

The overhead from cutting is multiplicative and applies to the total number of shots across the cutting experiments. If you want the precision of 4,096 shots on the uncut circuit and you have one wire cut:

required_total_shots = 16 * 4096 = 65,536

For two wire cuts:

required_total_shots = 256 * 4096 = 1,048,576

Forgetting to scale the shot count by gamma^2 gives a reconstructed expectation value with much higher variance than expected. The result may look wrong, but it is simply undersampled.

4. Applying Error Mitigation at the Wrong Level

As discussed in the error mitigation section: apply ZNE or other mitigation techniques to each subcircuit experiment individually, before passing results to reconstruct_expectation_values. Applying mitigation after reconstruction produces incorrect results because the quasi-probability weighting interacts nonlinearly with the extrapolation.

When to Use Circuit Cutting (Decision Framework)

Use circuit cutting when:

Your circuit exceeds the qubit count of any available QPU.
Your circuit’s cross-partition entanglement is low (1-2 gates cross the boundary).
You have access to multiple QPUs and want to parallelize.
The depth reduction from cutting significantly improves subcircuit fidelity.

Avoid circuit cutting when:

A QPU with enough qubits is available and the circuit depth is manageable.
The circuit requires more than 3 cuts (overhead becomes impractical).
The circuit has dense cross-partition entanglement with no clean boundary.

Always benchmark first: run both the full circuit and the cut version on a noisy simulator, then compare accuracy. Circuit cutting is a tool, not a universal improvement.

Key Points

Circuit cutting splits large circuits into smaller subcircuits using gate cutting or wire cutting, both relying on quasi-probability decomposition.
Gate cutting a CNOT costs 9x shots per cut. Wire cutting costs 16x shots per cut. Overhead compounds exponentially with the number of cuts.
Choose partition boundaries that minimize cross-boundary two-qubit gates.
Subcircuits run independently and can be parallelized across QPUs.
Apply error mitigation per subcircuit before reconstruction.
Verify your implementation by comparing cut vs uncut results on a noiseless simulator.
For circuits with low Schmidt rank across the partition boundary, entanglement forging offers lower overhead than general cutting.
Practical limit: 1-3 cuts. Beyond that, the shot overhead dominates any benefit.