#### Sorting unsigned genomes

Theorem 7 below establishes the connection between the cycle/path decompositions of a breakpoint graph and the DCJ distance between two unsigned genomes. Its proof uses the following two lemmas (Lemmas 5 and 6).

**Lemma 5.** *For every cycle/path decomposition of G*(*A*, *B*), *there exist signed versions* $\vec{A}$ *and* $\vec{B}$ *of genomes A and B such that*

$$d_{\mathrm{DCJ}}(\vec{A}, \vec{B}) = b - (c + I_{AA} + I_{AB}),$$

*where b is the number of black edges in the breakpoint graph G*(*A*, *B*) *and c* (*resp.*, $I_{AA}$ *and* $I_{AB}$) *is the number of cycles* (*resp.*, *AA-paths and AB-paths*) *in the given cycle/path decomposition of G*(*A*, *B*).

*Proof.* Note that every gene vertex is visited exactly twice when we traverse all the cycles/paths in a fixed cycle/path decomposition of *G*(*A*, *B*). When a gene vertex (e.g., gene *a*) is visited for the first time, we may assume that we are visiting the 5′ end of the gene (denoted $a_h$); when it is visited for the second time, we may assume that we are visiting the 3′ end of the gene (denoted $a_t$). To obtain a signed genome $\vec{A}$ (represented as a set of adjacencies), we form an adjacency for every two extremities (or one extremity and one A-telomere) that are connected by a black edge in the given cycle/path decomposition. Similarly, to obtain a signed genome $\vec{B}$, we form an adjacency for every two extremities (or one extremity and one B-telomere) that are connected by a gray edge in the given cycle/path decomposition. Since each visit of a gene vertex uses exactly one black edge and one gray edge, every extremity lies in exactly one adjacency of $\vec{A}$ and exactly one adjacency of $\vec{B}$, and merging the two extremities of each gene turns these adjacencies back into the edges of *G*(*A*, *B*). Hence the resulting genomes $\vec{A}$ and $\vec{B}$ are signed versions of genomes *A* and *B*, respectively.
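The labelling step of this proof can be sketched in Python. The encoding is a hypothetical choice made only for illustration: vertices are strings, telomere vertices are listed explicitly, and each walk of the decomposition is stored together with the colour of its first edge (colours then alternate, and a cycle's last vertex is joined back to its first). The suffixes `_h` and `_t` follow the first-visit/second-visit rule above.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical encoding (an assumption of this sketch):
# a walk is (vertices, colour of its first edge, is_cycle).
Walk = Tuple[List[str], str, bool]


def signed_versions(walks: List[Walk], telomeres: Set[str]):
    """Label gene visits as in the proof of Lemma 5 and read off adjacencies."""
    visits: Dict[str, int] = {}
    adj_a: Set[FrozenSet[str]] = set()  # adjacencies from black edges
    adj_b: Set[FrozenSet[str]] = set()  # adjacencies from gray edges
    for vertices, colour, is_cycle in walks:
        labelled = []
        for v in vertices:
            if v in telomeres:
                labelled.append(v)
            else:
                # first visit -> "_h" (5' end), second visit -> "_t" (3' end)
                k = visits.get(v, 0)
                visits[v] = k + 1
                labelled.append(v + ("_h" if k == 0 else "_t"))
        n = len(labelled)
        # a cycle has n edges (last vertex joins the first), a path has n - 1
        for i in range(n if is_cycle else n - 1):
            edge = frozenset((labelled[i], labelled[(i + 1) % n]))
            (adj_a if colour == "black" else adj_b).add(edge)
            colour = "gray" if colour == "black" else "black"
    if any(k != 2 for k in visits.values()):
        raise ValueError("every gene vertex must be visited exactly twice")
    return adj_a, adj_b
```

For example, with *A* = *B* = one linear chromosome (1 2) and telomeres `TA1`, `TA2`, `TB1`, `TB2`, decomposing *G*(*A*, *B*) into the 2-cycle on genes 1 and 2 plus two AB-paths yields identical adjacency sets for $\vec{A}$ and $\vec{B}$, in line with $b - (c + I_{AA} + I_{AB}) = 3 - (1 + 0 + 2) = 0$.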

Moreover, the breakpoint graph $G(\vec{A}, \vec{B})$, before closing its paths into cycles, preserves all the cycles/paths of the given cycle/path decomposition of *G*(*A*, *B*); that is, it still has $b$ black edges, $c$ cycles, $I_{AA}$ AA-paths, $I_{AB}$ AB-paths and $I_{BB}$ BB-paths. After closing paths into cycles, the breakpoint graph $G(\vec{A}, \vec{B})$ has $b + I_{BB}$ black edges (as we close each BB-path into a cycle by adding one black edge) and $c + I_{AA} + I_{AB} + I_{BB}$ cycles. It hence follows from Theorem 4 that

$$d_{\mathrm{DCJ}}(\vec{A}, \vec{B}) = (b + I_{BB}) - (c + I_{AA} + I_{AB} + I_{BB}) = b - (c + I_{AA} + I_{AB}).$$

**Lemma 6.** *For all signed versions* $\vec{A}$ *and* $\vec{B}$ *of genomes A and B, there exists a cycle/path decomposition of G*(*A*, *B*) *such that*

$$d_{\mathrm{DCJ}}(\vec{A}, \vec{B}) = b - (c + I_{AA} + I_{AB}),$$

*where b is the number of black edges in the breakpoint graph G*(*A*, *B*) *and c* (*resp.*, $I_{AA}$ *and* $I_{AB}$) *is the number of cycles* (*resp.*, *AA-paths and AB-paths*) *in this cycle/path decomposition of G*(*A*, *B*).

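*Proof.* Observe that we obtain the breakpoint graph *G*(*A*, *B*) if we merge the two extremity vertices of each gene into a single vertex in the breakpoint graph $G(\vec{A}, \vec{B})$ (before closing paths into cycles). Since every vertex of $G(\vec{A}, \vec{B})$ is incident to at most one black edge and at most one gray edge, its connected components are themselves alternating cycles and paths. Therefore, the trivial cycle/path decomposition of $G(\vec{A}, \vec{B})$ naturally gives rise to a cycle/path decomposition of *G*(*A*, *B*) with the same numbers of black edges, cycles, AA-paths, AB-paths and BB-paths (denoted $b$, $c$, $I_{AA}$, $I_{AB}$ and $I_{BB}$, respectively). After closing paths into cycles, the breakpoint graph $G(\vec{A}, \vec{B})$ has $b + I_{BB}$ black edges and $c + I_{AA} + I_{AB} + I_{BB}$ cycles, as justified in the proof of the preceding lemma. By Theorem 4, we then have

$$d_{\mathrm{DCJ}}(\vec{A}, \vec{B}) = (b + I_{BB}) - (c + I_{AA} + I_{AB} + I_{BB}) = b - (c + I_{AA} + I_{AB}).$$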

**Theorem 7.** *Let A and B be two unsigned genomes defined on the same set of genes. Then*

$$d_{\mathrm{DCJ}}(A, B) = b - (c + I_{AA} + I_{AB}),$$

*where b is the number of black edges in G*(*A*, *B*) *and c* (*resp.*, $I_{AA}$ *and* $I_{AB}$) *is the number of cycles* (*resp.*, *AA-paths and AB-paths*) *in a maximum cycle/path decomposition of G*(*A*, *B*).

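*Proof.* Consider a maximum cycle/path decomposition of *G*(*A*, *B*). By Lemma 5, we obtain signed versions $\vec{A}$ and $\vec{B}$ of genomes *A* and *B* such that $d_{\mathrm{DCJ}}(\vec{A}, \vec{B}) = b - (c + I_{AA} + I_{AB})$. This means that genome $\vec{A}$ can be transformed into genome $\vec{B}$ with a sequence of $b - (c + I_{AA} + I_{AB})$ DCJ operations. Observe that the same sequence of DCJ operations, with gene signs disregarded, also transforms genome *A* into genome *B*. It hence follows that

$$d_{\mathrm{DCJ}}(A, B) \le b - (c + I_{AA} + I_{AB}).$$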

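Assume now that a sequence of $d_{\mathrm{DCJ}}(A, B)$ DCJ operations transforms genome *A* into genome *B*. Let $\vec{A}$ be the signed version of genome *A* in which all genes are positive, and apply the same sequence of $d_{\mathrm{DCJ}}(A, B)$ DCJ operations to $\vec{A}$. The resulting genome, denoted $\vec{B}$, equals genome *B* once all gene signs are disregarded; in other words, $\vec{B}$ is a signed version of genome *B*. Thus, $d_{\mathrm{DCJ}}(\vec{A}, \vec{B}) \le d_{\mathrm{DCJ}}(A, B)$. On the other hand, by Lemma 6, there exists a cycle/path decomposition of *G*(*A*, *B*), say with $c'$ cycles, $I'_{AA}$ AA-paths and $I'_{AB}$ AB-paths, such that $d_{\mathrm{DCJ}}(\vec{A}, \vec{B}) = b - (c' + I'_{AA} + I'_{AB})$. Since the decomposition considered in the theorem is maximum, $c' + I'_{AA} + I'_{AB} \le c + I_{AA} + I_{AB}$. Thus,

$$d_{\mathrm{DCJ}}(A, B) \ge d_{\mathrm{DCJ}}(\vec{A}, \vec{B}) = b - (c' + I'_{AA} + I'_{AB}) \ge b - (c + I_{AA} + I_{AB}).$$

Together with the first part of the proof, this establishes the theorem.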
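The two halves of the proof suggest a brute-force check for very small genomes. The observation in the first half applies to any signed versions, so each pair gives an upper bound on $d_{\mathrm{DCJ}}(A, B)$, while the second half shows that fixing every gene of *A* positive and trying all sign assignments for *B* attains it. The Python sketch below is illustrative only: the genome encoding is an assumption, and the signed distance is computed on $G(\vec{A}, \vec{B})$ as black edges minus cycles after closing paths, which is the counting used in the proofs of Lemmas 5 and 6 (we assume this is the form of Theorem 4). It enumerates $2^n$ sign patterns, so it is only practical for a handful of genes.

```python
from collections import defaultdict
from itertools import product
from typing import List, Tuple

# Assumed encoding: a genome is a list of chromosomes (genes, is_circular);
# signed genes are nonzero ints, unsigned genes are positive ints.
Chromosome = Tuple[List[int], bool]


def adjacencies(genome: List[Chromosome], tag: str) -> list:
    """Adjacencies of a signed genome; each telomere is a fresh vertex (tag, k)."""
    adj, k = [], 0
    for genes, circular in genome:
        # +g is read tail -> head, -g is read head -> tail
        ends = [((abs(g), "t"), (abs(g), "h")) if g > 0 else ((abs(g), "h"), (abs(g), "t"))
                for g in genes]
        adj += [(left[1], right[0]) for left, right in zip(ends, ends[1:])]
        if circular:
            adj.append((ends[-1][1], ends[0][0]))
        else:
            adj += [((tag, k), ends[0][0]), (ends[-1][1], (tag, k + 1))]
            k += 2
    return adj


def signed_dcj(a: List[Chromosome], b: List[Chromosome]) -> int:
    """(black edges + I_BB) - (cycles + I_AA + I_AB + I_BB) after closing paths."""
    black, gray = adjacencies(a, "A"), adjacencies(b, "B")
    nbr = defaultdict(list)
    for u, v in black + gray:
        nbr[u].append(v)
        nbr[v].append(u)
    seen, c, i_aa, i_ab, i_bb = set(), 0, 0, 0, 0
    for start in list(nbr):
        if start in seen:
            continue
        seen.add(start)
        stack, tels = [start], []
        while stack:  # collect one connected component
            u = stack.pop()
            if isinstance(u[0], str):  # telomere vertices are ("A" | "B", k)
                tels.append(u[0])
            for w in nbr[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        tels.sort()
        if not tels:
            c += 1
        elif tels == ["A", "A"]:
            i_aa += 1
        elif tels == ["A", "B"]:
            i_ab += 1
        else:
            i_bb += 1
    return (len(black) + i_bb) - (c + i_aa + i_ab + i_bb)


def unsigned_dcj(a: List[Chromosome], b: List[Chromosome]) -> int:
    """Brute force: all genes of A positive, every sign assignment for B."""
    n = sum(len(genes) for genes, _ in b)
    best = None
    for signs in product((1, -1), repeat=n):
        it = iter(signs)
        signed_b = [([g * next(it) for g in genes], circ) for genes, circ in b]
        d = signed_dcj(a, signed_b)
        best = d if best is None else min(best, d)
    return best
```

For example, the unsigned linear chromosomes (1 2 3) and (1 3 2) give distance 1 (one inversion of the segment 3 2), whereas (1 2) and (2 1) give 0, since a linear chromosome read backwards is the same chromosome.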