In this section we give the main definitions and notation for duplicated genomes and rearrangements. We then generalize the definitions of rearrangements in order to formally define *breakpoint-duplication rearrangements* and the *Genome Dedoubling Problem* studied in the paper.

### Duplicated genomes

A genome consists of linear or circular chromosomes that are composed of genomic markers. Markers are represented by signed integers, where the sign indicates the orientation of the marker in its chromosome. By convention, –(–*x*) = *x*. A linear chromosome is represented by an ordered sequence of signed integers surrounded by the unsigned marker ○ at each end, indicating the telomeres. A circular chromosome is represented by a circularly ordered sequence of signed integers. For example, (1 2 –3) (○ 4 –5 ○) is a genome composed of one circular and one linear chromosome.

Each genome contains at most two occurrences of each marker. Two copies of the same marker in a genome are called paralogs. If a marker *x* is present twice, one of the paralogs is represented by $\overline{x}$. By convention, $\overline{\overline{x}} = x$. Here, such markers represent segments duplicated by a breakpoint-duplication rearrangement.

**Definition 1** *A* duplicated genome *is a genome in which a subset of the markers are duplicated.*

For example,
is a duplicated genome where markers 1, 2, and 5 are duplicated. A *non-duplicated genome* is a genome in which no marker is duplicated. A *totally duplicated genome* is a duplicated genome in which all markers are duplicated. For example,
is a totally duplicated genome.

An *adjacency* in a genome is a pair of consecutive markers. Since a genome can be read in two directions, the adjacencies (*x y*) and (*–y –x*) are equivalent. For example, the genome
has seven adjacencies,
, and
.
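The notion of adjacency and the equivalence of (*x y*) and (*–y –x*) can be sketched in a few lines of Python. This is only an illustration: the representation of a genome as a list of `(is_circular, markers)` pairs, the `TELOMERE` constant, and the function names are assumptions made for this sketch, and pairs involving the telomere marker ○ are assumed to count as adjacencies.

```python
TELOMERE = "○"  # unsigned telomere marker, as in the text


def neg(marker):
    """Flip the sign of a marker; the telomere is unsigned, so it is unchanged."""
    return marker if marker == TELOMERE else -marker


def canonical(adjacency):
    """Orientation-independent key: (x y) and (-y -x) map to the same value."""
    x, y = adjacency
    return frozenset({(x, y), (neg(y), neg(x))})


def adjacencies(genome):
    """List the adjacencies of a genome given as (is_circular, markers) pairs.

    Assumption of this sketch: linear chromosomes carry TELOMERE at both ends.
    """
    result = []
    for is_circular, markers in genome:
        pairs = list(zip(markers, markers[1:]))
        if is_circular:
            # Close the cycle: the last marker is followed by the first one.
            pairs.append((markers[-1], markers[0]))
        result.extend(canonical(p) for p in pairs)
    return result


# The genome (1 2 -3) (○ 4 -5 ○) used as an example earlier in this section.
genome = [(True, [1, 2, -3]), (False, [TELOMERE, 4, -5, TELOMERE])]
adjs = adjacencies(genome)
```

With this representation, the genome (1 2 –3) (○ 4 –5 ○) has six adjacencies, three on the circular chromosome and three on the linear one, and `canonical((1, 2))` equals `canonical((-2, -1))`.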

**Definition 2** *A* dedoubled genome *is a duplicated genome G such that for any duplicated marker x in G*, *either* $(x\ \overline{x})$ *or* $(\overline{x}\ x)$ *is an adjacency of G.*

For example,
is a dedoubled genome. The *reduction* of a dedoubled genome *G*, denoted by *G*^{R}, is the genome obtained from *G* by replacing every pair $(x\ \overline{x})$ or $(\overline{x}\ x)$ by a single marker *x*. For example, the reduction of
is *G*^{R} = (1 –2) (○ –3 4 –5 ○).
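The dedoubled test and the reduction can be sketched as follows. The encoding is an assumption of this sketch: a marker is a pair `(x, copy)` where `x` is the signed integer and `copy` is 0 or 1 (1 standing for the paralog), a genome is a list of `(is_circular, markers)` pairs, and telomeres are left implicit. The genome `D` below is a hypothetical input chosen so that its reduction equals (1 –2) (○ –3 4 –5 ○); it is not claimed to be the example genome of the text.

```python
def is_paralog_pair(a, b):
    """True if a and b are the two copies of one marker, in the same orientation."""
    return a[0] == b[0] and a[1] != b[1]


def consecutive_pairs(is_circular, markers):
    pairs = list(zip(markers, markers[1:]))
    if is_circular and len(markers) > 1:
        pairs.append((markers[-1], markers[0]))
    return pairs


def is_dedoubled(genome):
    """Every duplicated marker must be adjacent to its paralog."""
    counts, paired = {}, set()
    for is_circular, markers in genome:
        for x, _ in markers:
            counts[abs(x)] = counts.get(abs(x), 0) + 1
        for a, b in consecutive_pairs(is_circular, markers):
            if is_paralog_pair(a, b):
                paired.add(abs(a[0]))
    duplicated = {v for v, n in counts.items() if n == 2}
    return duplicated <= paired


def reduction(genome):
    """Replace each adjacent paralog pair by a single signed marker."""
    if not is_dedoubled(genome):
        raise ValueError("reduction is only defined for dedoubled genomes")
    reduced = []
    for is_circular, markers in genome:
        ms = list(markers)
        if is_circular and len(ms) > 1 and is_paralog_pair(ms[-1], ms[0]):
            ms = ms[-1:] + ms[:-1]  # make a wrap-around pair contiguous
        out, i = [], 0
        while i < len(ms):
            if i + 1 < len(ms) and is_paralog_pair(ms[i], ms[i + 1]):
                i += 1  # skip the partner; both copies carry the same sign
            out.append(ms[i][0])
            i += 1
        reduced.append((is_circular, out))
    return reduced


# Hypothetical dedoubled genome: markers 1, 4 and 5 are duplicated.
D = [
    (True, [(1, 0), (1, 1), (-2, 0)]),
    (False, [(-3, 0), (4, 1), (4, 0), (-5, 0), (-5, 1)]),
]
D_reduced = reduction(D)
```

Here `D_reduced` is `[(True, [1, -2]), (False, [-3, 4, -5])]`, that is, one circular chromosome (1 –2) and one linear chromosome (○ –3 4 –5 ○).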

### Rearrangement

A rearrangement operation on a given genome cuts a set of adjacencies of the genome called *breakpoints* and forms new adjacencies with the exposed extremities, while altering no other adjacency. In this paper, we consider two types of rearrangement operation called *double-cut-and-join* (*DCJ*) and *reversal.* In the sequel, the breakpoints of a rearrangement operation are indicated in the genome by the symbol _{▲}, and the new adjacencies are indicated in the genome by dots.

A *DCJ* operation on a genome *G* cuts two different adjacencies in *G* and glues pairs of the four exposed extremities to form two new adjacencies. For example, the following DCJ cuts adjacencies (1 2) and

to produce

and (–5 2).
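As a minimal sketch of the operation, a DCJ can be applied directly to a list of signed adjacencies. The function names and the `crossed` flag are assumptions of this sketch, telomeres are not handled, and the circular chromosome (1 2 3 4) is a hypothetical input rather than the example of the text. Cutting (*a b*) and (*c d*) leaves two ways of regluing the four extremities: (*a* –*c*), (–*b d*) or (*a d*), (*c b*).

```python
def flip(adjacency):
    """Read an adjacency in the other direction: (x y) becomes (-y -x)."""
    x, y = adjacency
    return (-y, -x)


def same(adj1, adj2):
    """Two adjacencies are equivalent if they agree in one reading direction."""
    return adj1 == adj2 or adj1 == flip(adj2)


def dcj(adjacencies, ab, cd, crossed=True):
    """Cut adjacencies ab and cd, then glue the four exposed extremities.

    crossed=True  forms (a -c) and (-b d);
    crossed=False forms (a d) and (c b).
    """
    a, b = ab
    c, d = cd
    kept = [adj for adj in adjacencies
            if not same(adj, ab) and not same(adj, cd)]
    if len(kept) != len(adjacencies) - 2:
        raise ValueError("ab and cd must be two different adjacencies of the genome")
    new = [(a, -c), (-b, d)] if crossed else [(a, d), (c, b)]
    return kept + new


# Hypothetical circular chromosome (1 2 3 4).
genome = [(1, 2), (2, 3), (3, 4), (4, 1)]
inverted = dcj(genome, (1, 2), (3, 4), crossed=True)
split = dcj(genome, (1, 2), (3, 4), crossed=False)
```

With `crossed=True` the result is the adjacency set of the circular chromosome (1 –3 –2 4). With `crossed=False` the result describes two circular chromosomes, (1 4) and (2 3), which shows that a DCJ may also change the number of chromosomes.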

A *reversal* on a genome *G* is a DCJ operation that cuts two adjacencies (*a b*) and (*c d*) in a chromosome of *G* of the form (… *a b* … *c d* …) to form two new adjacencies (*a* –*c*) and (–*b d*), thus reversing the orientation of the segment of *G* beginning with marker *b* and ending with marker *c*. For example, the following reversal cuts adjacencies

and

and reverses the segment

.
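On a single chromosome stored as a list of signed integers, a reversal amounts to reversing a segment and flipping the sign of each of its markers. The sketch below assumes a linear chromosome with implicit telomeres; the function name and the input (○ 1 2 3 4 5 ○) are illustrative choices.

```python
def reversal(chromosome, i, j):
    """Reverse the segment chromosome[i..j] (inclusive) and flip its signs.

    With a = chromosome[i-1], b = chromosome[i], c = chromosome[j] and
    d = chromosome[j+1], this cuts (a b) and (c d) and forms (a -c) and (-b d).
    """
    segment = [-m for m in reversed(chromosome[i:j + 1])]
    return chromosome[:i] + segment + chromosome[j + 1:]


# Cut (1 2) and (3 4), i.e. reverse the segment "2 3".
before = [1, 2, 3, 4, 5]
after = reversal(before, 1, 2)
```

Here `after` is `[1, -3, -2, 4, 5]`: the new adjacencies are (1 –3) and (–2 4), and the adjacency (2 3) survives as the equivalent (–3 –2).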

A *DCJ* (*resp. reversal*) *scenario* between two genomes *A* and *B* is a sequence of DCJ (resp. reversal) operations that transforms *A* into *B*. The length of a scenario is the number of rearrangement operations composing the scenario.

The *DCJ* (*resp. reversal*) *distance* between two genomes *A* and *B* is the minimum length of a DCJ (resp. reversal) scenario between *A* and *B.*
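To make the definition of distance concrete, the reversal distance between two small genomes can be computed by exhaustive breadth-first search over all reversals. This brute-force sketch is an illustration only, not the method used in the paper: it assumes a single linear chromosome with implicit telomeres, treats a chromosome and its reverse complement as the same genome, and is exponential in the number of markers.

```python
from collections import deque


def canon(chromosome):
    """A linear chromosome can be read in two directions; pick one reading."""
    forward = tuple(chromosome)
    backward = tuple(-m for m in reversed(chromosome))
    return min(forward, backward)


def reversal_distance(a, b):
    """Minimum number of reversals turning a into b (exhaustive search)."""
    if sorted(map(abs, a)) != sorted(map(abs, b)):
        raise ValueError("both genomes must contain the same markers")
    start, goal = canon(a), canon(b)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, dist = queue.popleft()
        if current == goal:
            return dist
        n = len(current)
        for i in range(n):
            for j in range(i, n):
                segment = tuple(-m for m in reversed(current[i:j + 1]))
                nxt = canon(current[:i] + segment + current[j + 1:])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return None  # not reached: every signed arrangement is reachable by reversals


d1 = reversal_distance([1, 2, 3], [1, -2, 3])
d2 = reversal_distance([1, 2, 3], [1, -2, -3])
```

In this example `d1` is 1 and `d2` is 2, since no single reversal of (○ 1 2 3 ○) yields (○ 1 –2 –3 ○) or its reverse reading.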

### Breakpoint-duplication rearrangements

We now generalize the definitions of rearrangement operations to account for possible duplications at their breakpoints.

A *1-breakpoint-duplication DCJ* (1-BD-DCJ) operation on a genome *G* is a rearrangement operation that alters two different adjacencies (*a b*) and (*c d*) of *G*, by:

A *2-breakpoint-duplication DCJ* (2-BD-DCJ) operation on a genome *G* is a rearrangement operation that alters two different adjacencies (*a b*) and (*c d*) of *G*, by:

**Definition 3** *A* breakpoint-duplication DCJ (*BD-DCJ*) *operation on a genome G is either a 1-BD-DCJ operation*, *or a 2-BD-DCJ operation.*

In the sequel, if some markers are duplicated by a BD-DCJ operation, they are indicated in bold font in the initial genome. For example, the following rearrangement is a 2-BD-DCJ operation that acts on adjacencies (–2 –1) and (4 –3), and duplicates markers 2 and 4. The intermediate step resulting in the duplication of markers 2 and 4 is shown above the arrow.

To summarize, a BD-DCJ operation consists of a *first step* in which one or two markers are duplicated, followed by a *second step* where a DCJ operation is applied. Similarly, we now define a *breakpoint-duplication reversal* (BD-reversal) operation.

**Definition 4** *A* breakpoint-duplication reversal (*BD-reversal*) *operation on a genome G is a BD-DCJ operation such that the DCJ operation applied in the second step of the BD-DCJ operation is a reversal.*

For example, the following rearrangement is a BD-reversal that is a 1-BD-DCJ operation that acts on adjacencies (2 –1) and (–3 4), and duplicates marker 2.

A *BD-DCJ scenario* (resp. *BD-reversal scenario*) between a non-duplicated genome *A* and a duplicated genome *B* is a sequence composed of BD-DCJ (resp. BD-reversal) operations and possibly DCJ (resp. reversal) operations that transforms *A* into *B*.

**Definition 5** *Given a non-duplicated genome A and a duplicated genome B*, *the* BD-DCJ distance (*resp.* BD-reversal distance) *between A and B is the minimal length of a BD-DCJ* (*resp. BD-reversal*) *scenario between A and B.*

We now give an obvious but useful property that allows a BD-DCJ scenario to be reduced to a DCJ scenario.

**Proposition 1** *Given a non-duplicated genome A and a duplicated genome B*, *for any BD-DCJ* (*resp. BD-reversal*) *scenario between A and B*, *there exists a DCJ* (*resp. reversal*) *scenario of the same length between a dedoubled genome D and B such that the reduction of D is A* (*D*^{R} = *A*)*.*

**Proof.** Let *S* be a BD-DCJ (resp. BD-reversal) scenario between *A* and *B*. Let *D* be the genome obtained from *A* by adding, for every marker *x* duplicated by a BD-DCJ operation in *S*, the marker $\overline{x}$ so as to produce either the adjacency $(x\ \overline{x})$ or $(\overline{x}\ x)$, as done in *S*. Thus, *D*^{R} = *A*. The DCJ (resp. reversal) scenario between *D* and *B*, which has the same length as *S*, is the sequence of DCJ (resp. reversal) operations contained in *S*, either on their own or as the second step of a BD-DCJ (resp. BD-reversal) operation of *S*, in the same order as in *S*. ■

For example, in the following, a BD-reversal scenario of length 4 between *A* = (○ 1 2 3 4 5 ○) and

induces a reversal scenario of length 4 between

and *B*.

### Genome dedoubling problem

We now state the genome dedoubling problems considered in this paper.

**Genome dedoubling problem:** *Given a duplicated genome G*, *the* DCJ (resp. reversal) genome dedoubling problem *consists of finding a non-duplicated genome H such that the BD-DCJ* (*resp. BD-reversal*) *distance between H and G is minimal.*

Given a duplicated genome *G*, we denote by *d*_{dcj}(*G*) (resp. *d*_{rev}(*G*)) the minimum BD-DCJ (resp. BD-reversal) distance between any non-duplicated genome and *G*. From Proposition 1, the following proposition is straightforward.

**Proposition 2** *Given a duplicated genome G*, *the DCJ* (*resp. reversal*) *genome dedoubling problem on G is equivalent to finding a dedoubled genome D such that the DCJ* (*resp. reversal*) *distance between D and G is minimal.*

The next proposition describes a further reduction of the genome dedoubling problem on a duplicated genome *G.*

**Proposition 3** *Given a duplicated genome G*, *the* DCJ (resp. reversal) genome dedoubling problem *on G is equivalent to the* DCJ (resp. reversal) genome dedoubling problem *on the totally duplicated genome G*^{T} *obtained from G by replacing every maximal subsequence of non-duplicated markers beginning with a marker x by the pair* $x\ \overline{x}$.

**Proof.** See proof in Additional file 1 (Supplemental proofs). ■

For example, solving the DCJ (resp. reversal) genome dedoubling problem on
is equivalent to solving it on
. The transformations applied on *G* to obtain *G*^{T} are indicated in bold font.
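The construction of the totally duplicated genome in Proposition 3 can be sketched as follows. The encoding is an assumption of this sketch: a marker is a pair `(x, copy)` with `copy` equal to 1 for the paralog, a genome is a list of `(is_circular, markers)` pairs with implicit telomeres, and each run of non-duplicated markers is assumed to be replaced by its first marker *x* followed by its paralog. The input genome is hypothetical.

```python
from collections import Counter


def totally_duplicated(genome):
    """Collapse each maximal run of non-duplicated markers into a paralog pair."""
    counts = Counter(abs(x) for _, markers in genome for x, _ in markers)

    def is_duplicated(marker):
        return counts[abs(marker[0])] == 2

    result = []
    for is_circular, markers in genome:
        ms = list(markers)
        if is_circular:
            # Rotate so that the chromosome starts on a duplicated marker,
            # which keeps a run from being split by the wrap-around.
            for k, m in enumerate(ms):
                if is_duplicated(m):
                    ms = ms[k:] + ms[:k]
                    break
        out, in_run = [], False
        for m in ms:
            if is_duplicated(m):
                out.append(m)
                in_run = False
            elif not in_run:
                # First marker x of a run: emit x followed by its paralog.
                out.extend([(m[0], 0), (m[0], 1)])
                in_run = True
            # Later markers of the same run are dropped.
        result.append((is_circular, out))
    return result


# Hypothetical genome (○ 1 2 3 1' 4 ○), where 1' stands for the paralog of 1.
G = [(False, [(1, 0), (2, 0), (3, 0), (1, 1), (4, 0)])]
G_T = totally_duplicated(G)
```

For this input, the run "2 3" becomes the pair formed by 2 and its paralog, and the run "4" becomes the pair formed by 4 and its paralog, so every marker of `G_T` is duplicated.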

In the sequel, *G* will always denote a totally duplicated genome, and we focus in Sections **Genome dedoubling by DCJ** and **Genome dedoubling by reversal** on the problem of finding a dedoubled genome *D* such that the DCJ (resp. reversal) distance between *D* and *G* is minimal.