 Methodology article
 Open Access
A unifying model of genome evolution under parsimony
 Benedict Paten^{1},
 Daniel R Zerbino^{2},
 Glenn Hickey^{1} and
 David Haussler^{1, 3}
https://doi.org/10.1186/1471-2105-15-206
© Paten et al.; licensee BioMed Central Ltd. 2014
 Received: 12 December 2013
 Accepted: 8 May 2014
 Published: 19 June 2014
Abstract
Background
Parsimony and maximum likelihood methods of phylogenetic tree estimation and parsimony methods for genome rearrangements are central to the study of genome evolution, yet to date they have largely been pursued in isolation.
Results
We present a data structure called a history graph that offers a practical basis for the analysis of genome evolution. It conceptually simplifies the study of parsimonious evolutionary histories by representing both substitutions and double cut and join (DCJ) rearrangements in the presence of duplications. The problem of constructing parsimonious history graphs thus subsumes related maximum parsimony problems in the fields of phylogenetic reconstruction and genome rearrangement. We show that tractable functions can be used to define upper and lower bounds on the minimum number of substitutions and DCJ rearrangements needed to explain any history graph. These bounds become tight for a special type of unambiguous history graph called an ancestral variation graph (AVG), which constrains in its combinatorial structure the number of operations required. We finally demonstrate that for a given history graph G, a finite set of AVGs describes all parsimonious interpretations of G, and this set can be explored with a few sampling moves.
Conclusion
This theoretical study describes a model in which the inference of genome rearrangements and phylogeny can be unified under parsimony.
Keywords
 Genome rearrangement
 Phylogenomics
 Ancestral reconstruction
Background
In genome evolution there are two interacting relationships between nucleotides of DNA resulting from two key features: DNA nucleotides descend from common ancestral nucleotides, and they are covalently linked to other nucleotides. In this paper we explore the combination of these two relationships in a simple graph model, allowing for change by the process of replication, where a complete sequence of DNA is copied, by substitution, in which the chemical characteristics of a nucleotide are changed, and by the coordinated breaking and rematching of covalent adjacencies between nucleotides in rearrangement operations. These processes have quite different dynamics: DNA molecules replicate essentially continuously, much more rarely substitutions occur and more rarely still rearrangement operations take place. For this reason, and because of inherent complexity issues, a wealth of models, data structures and algorithms have studied these processes either in isolation or in a more limited combination.
Such evolutionary methods generally start with a set of observed sequences in an alignment, an alignment being a partitioning of elements in the sequences into equivalence classes, each of which represents elements that are homologous, i.e. that share a recognisably recent common ancestor. Though alignments represent an uncertain inference, and though their optimisation for standard models is intractable for multiple sequences ([1]), we make the common assumption that the alignment is given, as efficient heuristics exist to compute reasonable genome alignments ([2–4]).
If the sequences in an alignment only differ from one another by substitutions and rearrangements that delete subsequences, or insert novel subsequences (collectively indels), then the alignment data structure is naturally a 2D matrix. In such a matrix, by convention, the rows represent the sequences and the columns represent the equivalence classes of elements. The sequences are interspersed with “gap” symbols to indicate where elements are missing from a column due to indels. From such a matrix alignment, phylogenetic methods infer a history of replication ([5]). Such a history is representable as a phylogenetic tree, whose internal nodes represent the most recent common ancestors (MRCA) of subsets of the input sequences. To create a history including the MRCA sequences, additional rows can be added to the matrix ([6–8]). Both imputing maximum parsimony phylogenetic trees from matrix alignments and calculating maximum parsimony MRCA sequences given a phylogenetic tree and a matrix alignment are NP-hard ([9, 10]).
In addition to substitutions and short indels, homologous recombination operations are a common modifier of individual genomes within a population. The alignment of long DNA sequences related by these operations is also representable as a matrix. However, the history of replication of such an alignment is no longer generally representable as a single phylogenetic tree, as each column in the matrix may have its own distinct tree. To represent the MRCAs of such an alignment requires a more complex data structure, termed an ancestral recombination graph (ARG) ([11, 12]). It is NP-hard under the infinite sites model (no repeated or overlapping changes) to determine the minimum number of homologous recombinations needed to explain the evolutionary history of a given set of sequences, and probably NP-hard under more general models ([13]).
Larger DNA sequences, or complete genomes, are often permuted by more complex rearrangements, such that the matrix alignment representation is insufficient. Instead, the alignment naturally forms a graph called a breakpoint graph ([14, 15]). Assuming rearrangements are balanced (involving neither the gain nor the loss of material), inferring parsimonious rearrangement histories between two genomes has polynomial or better time complexity, whether based upon inversions ([16]), translocations ([17]) or double-cut-and-join (DCJ) operations ([18]). However, for three or more genomes with balanced rearrangements ([19]) or when rearrangements are unbalanced (involving the gain or loss of material) leading to duplications (additional copies of subsequences resulting from rearrangement), these exact parsimony methods are intractable. Exact solutions in the most general case are therefore only feasible for relatively small problems ([20]) before heuristics become necessary ([21, 22]).
Despite the hardness of the general case, there has been substantial work on computing maximum parsimony results, allowing for a wider repertoire of rearrangements. El-Mabrouk studied inversions and indels, though gave no exact algorithm for the general case ([23]). Recently Yancopoulos ([24]) then Braga ([25]) considered the distance between pairs of genomes differing by DCJ operations and indels, the latter providing the first linear-time algorithm for balanced rearrangements and indels, and the former proposing a data structure to model duplications. Many methods have been proposed that deal with the combination of rearrangements and duplications (for good recent reviews see [26, 27]); however, until recently there were no algorithms to our knowledge that explicitly unified both duplications and genome rearrangements as forms of general unbalanced rearrangement. First [28] provided a model allowing for a subset of duplications and deletions as well as balanced DCJ operations, giving a lower bound approximation, while [29] studied a model allowing atomic (single gene) duplications, insertions and deletions, but arrived at no closed-form formula for the total number of rearrangements.
The graph model introduced in this paper is capable of representing a general evolutionary history for any combination of replication, substitution and rearrangement operations, including duplications and homologous recombinations. It therefore generalises phylogenetic trees, graphs representing histories with indels, ancestral recombination graphs and breakpoint graphs, building upon the methods described above. We start by introducing this graph and then develop a maximum parsimony problem that, somewhat imperfectly, generalises maximum parsimony variants of all the problems mentioned, facilitating the study of all these subproblems in one unified domain. We adopt the common assumption that all substitutions and rearrangements occur independently of one another, and account for tradeoffs between them by independent rearrangement and substitution costs, which are themselves essentially sums over the numbers of inferred events. Importantly, replications that are combined with unbalanced rearrangements are costed by the underlying rearrangement cost. We finally provide a bounded sampling approach to cope with the NP-hardness of the general maximum parsimony problem.
Results
Sequence graphs and threads
Sequence graphs are used extensively in comparative genomics, in rearrangement theory typically under the name (multi or master) breakpoint graph ([14, 15, 22]) and in alignment under the name A-Bruijn ([30]) or adjacency graph ([31]). We use the following bidirected form, which is similar to that used by [32] for sequence assembly.
A (bidirected) sequence graph G=(V_{ G },E_{ G }) is a graph in which a set V_{ G } of vertices are connected by a set E_{ G } of bidirected edges ([33]), termed adjacencies. A vertex represents a subsequence of DNA termed a segment. A vertex x is oriented, having a head side and a tail side, respectively denoted x_{ head } and x_{ tail }. These categories {head,tail} are called orientations. An adjacency, which represents the covalent bond between adjacent nucleotides of DNA, is a pair set of sides. We refer to the two sides contained in an adjacency as its endpoints. Adjacencies are bidirected, in that each endpoint is not just a vertex, but a vertex with an independent orientation (either head or tail). For convenience, we say a side is attached if it is contained in an adjacency, else it is unattached. By extension, we say a vertex is attached if either of its sides is attached, else it is unattached.
Associated with a sequence graph is a labeling, i.e. a function l:V_{ G }→Σ^{∗}∪{∅} where Σ={A/T,C/G,G/C,T/A} is the alphabet of bases, which are oriented, paired nucleotides of DNA, and Σ^{∗} is the set of all possible labels consisting of finite sequences of bases in Σ. Bases and labels are directed. For ρ/τ∈Σ, ρ is the forward complement and τ is the reverse complement. If a vertex is traversed from its tail to its head side, its label is read as the sequence of its forward complements. Conversely, if traversed from head to tail, the label is read as the reverse sequence of the reverse complements. A vertex x∈V_{ G } for which l(x)=∅ is unlabeled. A label represents a multibase allele. A path through the sequence therefore represents a single DNA sequence (and its reverse complement) whose bases are encoded by the labels of the vertices, where unlabeled vertices represent missing information.
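The two reading directions of a label can be illustrated with a minimal Python sketch. The function name `read_label` and the use of a plain string of forward complements (with `None` for an unlabeled vertex) are assumptions made for illustration, not part of the model's definition.

```python
# Watson-Crick pairing for the oriented bases A/T, C/G, G/C, T/A.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def read_label(forward_label, tail_to_head=True):
    """Read a vertex label in the given traversal direction.

    forward_label: string of forward complements, or None if unlabeled
                   (hypothetical encoding chosen for this sketch).
    tail_to_head:  True to traverse tail -> head, False for head -> tail.
    """
    if forward_label is None:
        return None  # unlabeled vertex: missing information
    if tail_to_head:
        return forward_label
    # Head to tail: reverse the sequence and take the reverse complements.
    return "".join(COMPLEMENT[base] for base in reversed(forward_label))


print(read_label("ACG"))         # ACG
print(read_label("ACG", False))  # CGT
```

Reading the same vertex in both directions yields a sequence and its reverse complement, which is why a single path encodes both strands.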
History graphs
Nucleotides of DNA derive from one another by a process of replication. This replication process is represented in history graphs, which add ancestry relationships to thread graphs.
To avoid confusion we define terminology to discuss branch relationships. Each weakly connected component of branches forms a branch-tree. Two vertices are homologous if they are in the same branch-tree. A vertex y is a descendant of a vertex x, and conversely x is an ancestor of y, if y is reachable by a directed path of branches from x. If two homologous vertices do not have an ancestor/descendant relationship then they are indirectly related. For a branch e=(x,y), x is the parent of e and y, and y is the child of e and a child of x. Similarly, e is the parent branch of y and a child branch of x. A vertex is a leaf if it has no incident outgoing branches, a root if it has no incident incoming branches, else it is internal. We reuse the terminology of parent, child, homologous, ancestor, descendant and indirectly related with sides. Two sides have a given relationship if their vertices have the relationship and they have the same orientation. Similarly, a side is a leaf (resp. root) if its vertex is a leaf (resp. root).
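The branch relationships defined above can be sketched in a few lines of Python. The encoding is an assumption for illustration: branches are stored as a hypothetical `parent_of` dictionary mapping each non-root vertex to its parent, which presumes that every vertex has at most one incoming branch.

```python
def ancestors(parent_of, x):
    """Return the ancestors of x, nearest first."""
    chain = []
    while x in parent_of:
        x = parent_of[x]
        chain.append(x)
    return chain


def relationship(parent_of, x, y):
    """Classify how x relates to y within the branch-trees."""
    if x == y:
        return "same vertex"
    up_x = ancestors(parent_of, x)
    up_y = ancestors(parent_of, y)
    if x in up_y:
        return "ancestor"      # y is reachable from x
    if y in up_x:
        return "descendant"    # x is reachable from y
    root_x = up_x[-1] if up_x else x
    root_y = up_y[-1] if up_y else y
    # Same root means same branch-tree, hence homologous.
    return "indirectly related" if root_x == root_y else "not homologous"


# Two branch-trees: a -> {b -> d, c} and e -> f.
parent_of = {"b": "a", "c": "a", "d": "b", "f": "e"}
print(relationship(parent_of, "a", "d"))  # ancestor
print(relationship(parent_of, "d", "c"))  # indirectly related
```

The same classification carries over to sides by additionally requiring that the two sides have the same orientation.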
Simple histories
We formally define a class of history graphs, called simple histories, for which parsimonious sequences of substitutions and rearrangements can be trivially derived.
A bilayered history graph is a history graph whose threads can be partitioned into root and leaf layers, such that every branch connects a vertex in the root layer with a vertex in the leaf layer. A rearrangement epoch is a bilayered history graph in which every branch-tree is a root with 1 child, every vertex is labeled, and any set of homologous sides are either all attached or all unattached. For n≥2, an n-way replication epoch is a bilayered history graph in which every branch-tree is a root with n children, every vertex is labeled, any set of homologous sides are either all attached or all unattached, if two root sides x_{ α } and y_{ β } are attached by an adjacency then each child of x_{ α } is attached to a child of y_{ β }, and a root vertex has at most one child with a label different from its own. An epoch is either a rearrangement epoch or an n-way replication epoch for some n≥2. A layered history graph is a history graph that can be edge partitioned into a finite sequence of bilayered history graphs, such that the leaf layer of a contained bilayered history graph is the root layer of the following bilayered history graph. A simple history is a layered history graph whose bilayered subgraphs are all epochs. An example simple history with epoch subgraphs is shown in Figure 2(D).
A substitution occurs on a branch if the labels of its endpoints are not identical. Note that a substitution can occur either in a rearrangement or a replication epoch. The substitution cost of a simple history H is the total number of substitutions, denoted s(H). The example simple history in Figure 2(D) has substitution cost 4. Note the requirement that all homologous sides in a simple history be either all attached or all unattached does not forbid rearrangements involving the observed ends of chromosomes (linear threads), because it is always possible to add material to a simple history at zero cost that attaches such unattached sides and allows them to participate in rearrangements.
The substitution cost defined deals, abstractly, with changes of alleles in which any change between alleles is scored equally. However for the case Σ^{∗}=Σ, i.e. single base labels, the substitution cost is the minimum number of single base changes. Furthermore, any history graph in which all homologous labels have the same length can easily be converted to a semantically equivalent history graph for which Σ^{∗}=Σ. More complex substitution costs to deal with the case where the alphabet represents the alleles of genes, as is commonly dealt with in rearrangement theory, are straightforward but not pursued here for simplicity.
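As a minimal sketch of the substitution cost of a simple history, the following Python function counts the branches whose endpoint labels differ. The representation (a list of `(parent, child)` pairs plus a `label` dictionary) is an assumption chosen for illustration; it relies on every vertex of a simple history being labeled.

```python
def substitution_cost(branches, label):
    """Count branches whose parent and child labels are not identical.

    branches: iterable of (parent, child) vertex pairs.
    label:    dict mapping every vertex to its label.
    """
    return sum(1 for parent, child in branches if label[parent] != label[child])


# A root replicated into two children, one of which changes again later.
branches = [("r", "c1"), ("r", "c2"), ("c1", "g")]
label = {"r": "A", "c1": "A", "c2": "C", "g": "T"}
print(substitution_cost(branches, label))  # 2
```

Any change between alleles is scored equally here, matching the abstract scoring described above.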
A rearrangement cycle in a rearrangement epoch is a circular path consisting of one or more repetitions of the basic pattern consisting of an adjacency edge in the root layer, a forward branch to the leaf layer, an adjacency edge in the leaf layer and a reverse branch to the root layer. Its size is the number of repetitions in it of this basic pattern minus 1. A linear path that follows this same basic pattern but does not complete every pattern and return to the original vertex is a degenerate rearrangement cycle. Its size is the size of the smallest rearrangement cycle that can be obtained from it by adding edges. The rearrangement cost of a simple history H is the total size of all rearrangement cycles in it, denoted r(H). This cost is known to be the number of double-cut-and-join (DCJ) operations needed to achieve all the rearrangements.
Lemma 1.
The rearrangement cost of an epoch is the minimum number of double-cut-and-join (DCJ) operations required to convert the root layer’s adjacencies into the leaf layer’s adjacencies.
Proof.
Similar to that given in [18].
The example simple history in Figure 2(D) has rearrangement cost 3.
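The cycle-based definition of rearrangement cost can be sketched for a single rearrangement epoch as follows. This is an illustrative simplification under stated assumptions, not the authors' implementation: each root has exactly one child, so a root side and its child side share one name, and every side is attached in both layers, so no degenerate cycles arise. The function names are hypothetical.

```python
def partner_map(adjacencies):
    """Map each side to the side it is attached to."""
    partner = {}
    for a, b in adjacencies:
        partner[a] = b
        partner[b] = a
    return partner


def rearrangement_cost(root_adjacencies, leaf_adjacencies):
    """Total size of all rearrangement cycles in one rearrangement epoch."""
    root = partner_map(root_adjacencies)
    leaf = partner_map(leaf_adjacencies)
    if set(root) != set(leaf):
        raise ValueError("every side must be attached in both layers")
    seen = set()
    cost = 0
    for start in root:
        if start in seen:
            continue
        repetitions = 0
        side = start
        while side not in seen:
            seen.add(side)
            across = root[side]   # adjacency edge in the root layer
            seen.add(across)
            repetitions += 1      # one repetition of the basic pattern
            side = leaf[across]   # branch down, leaf adjacency, branch up
        cost += repetitions - 1   # cycle size = repetitions minus 1
    return cost


# Circular chromosome a-b-c in the root layer; segment b is inverted in the leaf layer.
root_layer = [("a_head", "b_tail"), ("b_head", "c_tail"), ("c_head", "a_tail")]
leaf_layer = [("a_head", "b_head"), ("b_tail", "c_tail"), ("c_head", "a_tail")]
print(rearrangement_cost(root_layer, leaf_layer))  # 1
```

Under these assumptions the cost equals the number of root adjacencies minus the number of cycles, so a single inversion, which merges two trivial cycles into one of two repetitions, costs one DCJ operation.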
Because different studies lay different emphases on substitution or rearrangement (e.g. because of the available data) and because the events do not have the same probability in practice, we allow for a degree of freedom in the definition of the overall cost function. A (simple history) cost function for a simple history is any monotone function on the substitution and rearrangement costs in which both substitutions and rearrangements have nonzero cost.
Reduction
Not all history graphs are as detailed as simple histories. We define below a partial order relationship that describes how one graph can be a generalization of another graph, so for example, a less detailed history graph can be used to subsume multiple simple histories.
A branch whose child is unlabeled and unattached is referred to as having a free-child. A branch whose parent is unlabeled, unattached and a root with a single child is referred to as having a free-parent. A vertex is isolated if it has no incident adjacencies or branches.

A reduction operation is an operation on a history graph that either deletes an adjacency, an isolated vertex or the label of a vertex, or contracts a branch with a free-child or free-parent.
Lemma 2.
The result of a reduction operation is itself a history graph.
A history graph G is a reduction of another history graph G^{′} if G is isomorphic to a graph that can be obtained from G^{′} by a sequence of reduction operations, termed a reduction sequence.
Lemma 3.
The reduction relation is a partial order.
We write G⋞G^{′} to indicate that G is a reduction of G^{′} and G≺G^{′} to indicate that G is a reduction of G^{′} not equal to G^{′}. As with reduction and extension operations, if G is a reduction of G^{′} then G^{′} is an extension of G. The reduction relation is examined further in the Discussion section.
History graph cost
Using the parsimony principle, we now extend parsimony cost functions, previously defined on simple histories, to all history graphs.
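A simple history H that is an extension of a history graph G is called a realisation of G. The set of realisations of G is denoted ℋ(G). The cost of G with respect to a cost function c, denoted C(G,c), is the minimum of c(s(H),r(H)) over all realisations H∈ℋ(G).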
Lemma 4.
The problem of finding the cost of a history graph is NP-hard.
Proof.
There are parsimony problems on either substitutions or rearrangements alone that are NP-hard and can be formulated as special cases of the problem of finding the minimum cost realisation of a history graph ([9, 34]).
The lifted graph
Although determining the cost of a history graph is NP-hard, we will show that the cost can be bounded such that the bounds become tight for a broad, characteristic subset of history graphs. To do this we introduce the concept of lifted labels and adjacencies, which are used to project information about labels and adjacencies from descendant to ancestral vertices and are useful in reasoning about the cost of a history graph.
For a labeled vertex y, a lifted label is a label identical to l(y) on its lifting ancestor. For a vertex, the lifted labels therefore form a multiset, because the same label may be lifted to a lifting ancestor from multiple distinct descendants, and each is counted as a separate element of the multiset.
For an adjacency {x_{ α },y_{ β }}, a lifted adjacency is a bidirected edge {A(x_{ α }),A(y_{ β })}. In analogy with the lifted labels for a vertex, the lifted adjacencies for a side is the multiset of lifted adjacencies incident with the side.
A history graph G with free roots, lifted labels and lifted adjacencies is a lifted graph L(G). Figure 4(A) shows an example lifted graph that outlines these concepts.
Some lifted elements do not imply change between descendant and ancestral states, while others do. To formalise such a notion we define trivial and nontrivial labels and adjacencies. A lifted label ρ of a labeled vertex x is trivial if l(x)=ρ. A lifted label ρ on an unlabeled vertex x (necessarily a free root) is trivial if it is the only lifted label on x. Otherwise a lifted label is nontrivial.
A junction side is a most recent common ancestor (MRCA) of two attached, indirectly related sides. For a history graph G, a lifted adjacency e={A(x_{ α }),A(y_{ β })} is trivial if there exists no unattached junction side on the path of branches from (but excluding) A(x_{ α }) to (but excluding) x_{ α }, or on the path of branches from (but excluding) A(y_{ β }) to (but excluding) y_{ β } and either there is a (regular) adjacency between A(x_{ α }) and A(y_{ β }) in G or A(x_{ α }) and A(y_{ β }) are free roots, else e is nontrivial. See Figure 4(A) for examples of trivial and nontrivial labels and adjacencies.
Ancestral variation graphs
We can now define a broad class of history graphs for which cost can be computed in polynomial time. To do this we will define ambiguity, information that is needed to allow the tractable assessment of cost. There are two types of ambiguity. The substitution ambiguity of a history graph G, denoted u_{ s }(G), is the total number of nontrivial lifted labels in excess of one per vertex. Substitution ambiguity reflects uncertainty about MRCA bases. The substitution ambiguity of the history graph in Figure 2(B) is 1, as there exists one vertex with two nontrivial lifted labels.
The rearrangement ambiguity of a history graph G, denoted u_{ r }(G), is the total number of nontrivial lifted adjacency incidences in excess of one per side. Rearrangement ambiguity reflects uncertainty about MRCA adjacencies. The rearrangement ambiguity of the history graph in Figure 2(B) is 5, because two sides have three incident nontrivial lifted edges and one side has two incident nontrivial lifted edges.
The ambiguity of a history graph G is u(G)=u_{ s }(G)+u_{ r }(G). An ancestral variation graph (AVG) H is a history graph such that u(H)=0, i.e. an unambiguous history graph.
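The ambiguity counts are simple excess sums, as the following Python sketch shows. It assumes the numbers of nontrivial lifted labels per vertex and of nontrivial lifted adjacency incidences per side have already been computed; the function names and the list-of-counts input are hypothetical. The example uses the counts quoted above for Figure 2(B), padded with a few unambiguous entries that are invented for illustration.

```python
def excess_over_one(counts):
    """Sum, over all entries, of the amount by which a count exceeds one."""
    return sum(max(0, n - 1) for n in counts)


def ambiguity(labels_per_vertex, adjacencies_per_side):
    """Return (u_s, u_r, u) from per-vertex and per-side nontrivial counts."""
    u_s = excess_over_one(labels_per_vertex)
    u_r = excess_over_one(adjacencies_per_side)
    return u_s, u_r, u_s + u_r


# One vertex with two nontrivial lifted labels; two sides with three and
# one side with two incident nontrivial lifted adjacencies.
print(ambiguity([2, 1, 0], [3, 3, 2, 1]))  # (1, 5, 6)

# An AVG has at most one nontrivial element per vertex and per side.
print(ambiguity([1, 0, 1], [1, 1, 0]))     # (0, 0, 0)
```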
Lemma 5.
Simple histories are AVGs.
Bounds on cost
We provide trivially computable lower and upper bound cost functions for history graphs that are tight for AVGs.
The lower bound substitution cost (LBSC) of a history graph G, denoted s_{ l }(G), is the total number of distinct (not counting duplicates in the multiset) nontrivial lifted labels at all vertices minus the number of unlabeled vertices with nontrivial lifted labels (necessarily free roots). The LBSC of the history graph in Figure 2(B) is 4.
The upper bound substitution cost (UBSC) of a history graph G, denoted s_{ u }(G), is the total number of nontrivial lifted labels at all vertices minus the maximum number of identical lifted labels at each unlabeled vertex with nontrivial lifted labels (again, necessarily free roots). The UBSC of the history graph in Figure 2(B) is 5. For the AVG in Figure 5, LBSC = UBSC = 4.
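The two substitution bounds can be computed directly from the multisets of nontrivial lifted labels, as in the following sketch. The input encoding is an assumption: each vertex is given as a pair `(is_labeled, nontrivial_lifted_labels)`, and deciding which lifted labels are nontrivial is assumed to have been done beforehand. Vertices with no nontrivial lifted labels contribute nothing.

```python
from collections import Counter


def substitution_bounds(vertices):
    """Return (s_l, s_u) for an iterable of (is_labeled, labels) pairs."""
    lower = 0
    upper = 0
    for is_labeled, labels in vertices:
        counts = Counter(labels)
        if not counts:
            continue
        lower += len(counts)            # distinct nontrivial lifted labels
        upper += sum(counts.values())   # all nontrivial lifted labels
        if not is_labeled:
            lower -= 1                      # one label can be chosen for free
            upper -= max(counts.values())   # choose the most frequent label
    return lower, upper


vertices = [
    (True, ["A", "A", "C"]),   # labeled vertex
    (False, ["G", "G", "T"]),  # unlabeled free root
]
print(substitution_bounds(vertices))  # (3, 4)
```

When every vertex carries at most one nontrivial lifted label and unlabeled vertices carry none, the two bounds coincide, which is the situation in an AVG.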
The module graph of a history graph G is a multigraph in which the vertices are the sides of vertices in L(G) that have incident real or lifted adjacencies and the edges are the real and lifted adjacencies in L(G) incident with these sides. Each connected component in a module graph is called a module. The set of modules in the module graph for G is denoted M(G). Figure 4(B) shows the modules for Figure 4(A).
For a history graph that is a simple history this definition is equivalent to the earlier definition of rearrangement cost for simple histories.
The upper bound rearrangement cost (UBRC) of a history graph G, denoted r_{ u }(G), is the total number of nontrivial lifted adjacencies in L(G) minus the number of modules in M(G) in which every side has exactly one incident nontrivial lifted edge. The LBRC of the history graph in Figure 2(B) is 3 and its UBRC is 6. For the AVG in Figure 5 LBRC = UBRC = 3.
Theorem 1.
For any history graph G and any cost function c, c(s_{ l }(G),r_{ l }(G))≤C(G,c)≤c(s_{ u }(G),r_{ u }(G)) with equality if G is an AVG.
The proof is given in the Methods section.
Theorem 1 demonstrates that LBSC and LBRC are lower bounds on cost, UBSC and UBRC are upper bounds on cost, and that all these bounds become tight at the point of zero ambiguity. This implies that to assess the cost of an arbitrary history graph G we need only search for extensions of G to the point that they have zero ambiguity and not the complete set of simple history realisations of G. For an AVG H, as the lower and upper bounds on cost are equivalent, we write r(H)=r_{ l }(H)=r_{ u }(H) and s(H)=s_{ l }(H)=s_{ u }(H).
G-optimal AVGs
We now explore the process of sampling AVG extensions of an initial starting graph. Though it is possible to start from any history graph, in practice we are likely to start from a history graph G based on sequence alignments, such as that shown in Figure 2(A). If G is already an AVG, by Theorem 1, it is trivial to assess its cost. If not we sample AVG extensions of G in order to assess cost and explore the set of most parsimonious realisations of G. With the aim of restricting this search, ultimately to a finite space, we first define the set of G-optimal AVGs.
An AVG extension H of a history graph G is G-parsimonious w.r.t. a cost function c if C(G,c)=c(s(H),r(H)). The set of G-parsimonious AVGs is necessarily infinite: it is always possible to add arbitrary vertices without affecting substitution or rearrangement costs. To avoid the redundant sampling of AVG extensions of G and their own extensions we define the notion of minimality.
An AVG extension H of G is G-minimal if there is no other AVG H^{′} such that G≺H^{′}≺H. The set of G-minimal AVGs contains those AVGs that cannot be reduced without ceasing to be either AVGs or extensions of G. This set is also infinite for some DNA history graphs (Lemma 9 below).
An AVG is G-optimal w.r.t. a cost function c if it is both G-parsimonious w.r.t. c and G-minimal. We establish below that the set of G-optimal AVGs is finite for any history graph G. By definition, any G-parsimonious AVG is either G-minimal or has a G-minimal reduction; therefore we can implicitly represent and explore the set of parsimonious realisations of G by sampling just the G-optimal AVGs.
G-bounded history graphs
Unfortunately, because the history graph cost problem is NP-hard, it is unlikely that there exists an efficient way to sample only G-optimal AVGs. Instead, we now define a finite bounding set that contains the G-optimal AVGs and can be efficiently searched. Conveniently, this bounding set is the same for all cost functions.
A side x_{ α } is a bridge side if it is not a junction, is incident with one nontrivial lifted adjacency and an adjacency e that defines a trivial lifted adjacency e^{′} whose A(x_{ α }) endpoint is a junction side incident with a nontrivial lifted adjacency, and such that if e is deleted at least one endpoint of e^{′} in the original graph remains a junction side in the resulting graph (see Figure 6(C,D)). An adjacency is a junction (again, overloading the term junction) if either of its endpoints are junctions, else it is a bridge (overloading bridge) if either of its endpoints are bridge sides.
An element is nonminimal if it is a branch with a free-child or free-parent, an isolated vertex, or a label or adjacency that is not a junction or bridge.
For G⋞G^{′}, an element in G^{′} is G-reducible if there exists a reduction operation in a reduction sequence from G^{′} to G that either deletes the element if it is an adjacency, label or vertex or contracts it if it is a branch. We are interested in the set of G-reducible elements of an extension of G, as they are the elements which may be added and removed during an iterative sampling procedure.
For G⋞G^{′}, the G-unbridged graph of G^{′} is the reduction resulting from the deletion of all G-reducible bridge adjacencies in G^{′}. A side x_{ α } that has no attached descendants is a hanging side. A pair of adjacencies e and e^{′}, each with a hanging side, and such that e has an endpoint whose most recent attached ancestor is incident with e^{′}, form a pair of ping-pong adjacencies. We call e the ping adjacency and e^{′} the pong adjacency (Figure 6(E)).
A history graph G^{′} is G-bounded if it is an extension of G that does not contain a G-reducible nonminimal element and its G-unbridged graph does not contain a G-reducible ping adjacency.
Theorem 2.
The set of G-bounded AVGs contains the G-optimal AVGs for every cost function.
The proof is given in the Methods section.
Importantly, the following theorem demonstrates that there is a constant k such that the cardinality of any G-bounded history graph is at most k times that of G.
Theorem 3.
A G-bounded history graph contains at most max(0,10n−8) G-reducible adjacencies and max(0,2m−2,20n−16,20n+2m−18) additional vertices, where n is the number of adjacencies in G and m is the number of labeled vertices in G. This bound is tight for all values of n and m.
The proof is given in the Methods section.
The set of G-bounded history graphs and, by inclusion, the set of G-optimal AVGs are therefore finite.
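To make the size limits of Theorem 3 concrete, the following Python sketch evaluates the two expressions for given n and m. The function name is hypothetical; the formulas are copied from the theorem statement.

```python
def g_bounded_size_limits(n, m):
    """Evaluate the Theorem 3 limits for a history graph G.

    n: number of adjacencies in G.
    m: number of labeled vertices in G.
    Returns (max G-reducible adjacencies, max additional vertices).
    """
    max_adjacencies = max(0, 10 * n - 8)
    max_vertices = max(0, 2 * m - 2, 20 * n - 16, 20 * n + 2 * m - 18)
    return max_adjacencies, max_vertices


print(g_bounded_size_limits(0, 0))  # (0, 0)
print(g_bounded_size_limits(1, 1))  # (2, 4)
print(g_bounded_size_limits(2, 3))  # (12, 28)
```

Both limits grow linearly in n and m, which is what makes the set of G-bounded history graphs finite for any fixed G.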
The G-bounded poset
Finally we demonstrate how to navigate between G-bounded history graphs using a characteristic set of operations that define a hierarchy between these graphs.

The composite minimisation of a vertex x is the following sequence of operations:
• If x is unattached and unlabeled and has a G-reducible parent branch, the contraction of the parent branch, renaming the resulting merged vertex x.
• If x is then an unattached, unlabeled root and has a single G-reducible child branch, the contraction of the child branch, renaming the resulting merged vertex x.
• The deletion of x if subsequently isolated, unlabeled and G-reducible.
A G-bounded reduction operation is one of the following:
• A label detachment: the deletion of a G-reducible label on a vertex x, followed by the composite minimisation of x (Figure 7(A-C)).
• An adjacency detachment: the deletion of a G-reducible adjacency {x_{ α },y_{ β }} followed by the composite minimisation of x and y (Figure 7(D-F)). The inverse of an adjacency detachment is an adjacency attachment.
• A lateral-adjacency detachment: the adjacency detachment of a pair of G-reducible junction adjacencies {x_{ α },y_{ β }} and {A(x_{ α }),A(y_{ β })}, and a subsequent adjacency attachment that creates an adjacency that includes x_{ α } or y_{ β } as an endpoint (Figure 7(D-E)).
Note that the first two G-bounded reduction operations are combinations of reduction operations, while the lateral-adjacency detachment, which proves necessary to avoid creating intermediate graphs with G-reducible ping-pong edges, involves both reduction and extension operations, but always reduces the total number of adjacencies. As with reduction operations, the inverse of a G-bounded reduction operation is a G-bounded extension operation. A G-bounded history graph G^{′} is a G-bounded reduction (resp. extension) of another G-bounded history graph G^{′′} if G^{′} is isomorphic to a graph that can be obtained from G^{′′} by a sequence of G-bounded reduction (resp. extension) operations.
Lemma 6.
The G-bounded reduction relation is a partial order.
The G-bounded poset is the set of G-bounded history graphs with the G-bounded reduction relation. We write ≺_{ G } to denote the G-bounded reduction relation and ≺·_{ G } to denote its covering relation (i.e. A≺·_{ G }B iff A≺_{ G }B and there exists no C such that A≺_{ G }C≺_{ G }B).
Theorem 4.
The G-bounded poset is finite, has a single least element G, and its maximal elements are all AVGs. Also, G^{′}≺·_{ G }G^{′′} iff there exists a single G-bounded reduction operation that transforms G^{′′} into G^{′}.
The proof is given in the Methods section.
A basic implementation
The previous four theorems establish the mechanics of everything we need to sample the finite set of G-optimal AVGs, and thus, amongst other things, determine the cost of a history graph. Although it will require further work to establish practical and efficient sampling algorithms, we have implemented a simple graph library in Python that for an input history graph G iteratively generates G-bounded AVGs (https://github.com/dzerbino/pyAVG) through sequences of G-bounded extension operations.
Simulation results assessing substitution ambiguity and cost
exp.  s(H)  u_{ s }(G)  s_{ l }(G)  s_{ u }(G)  s(H_{ smin })  s(H_{ smax }) 

1  3  10  1  1  1  2 
2  1  14  1  2  2  3 
3  2  15  2  3  3  3 
4  3  12  2  2  2  4 
5  2  13  2  2  2  4 
6  2  12  2  2  2  5 
7  2  10  1  1  1  2 
8  1  13  1  1  1  2 
9  3  11  0  0  0  0 
10  4  8  2  2  2  3 
11  2  10  2  2  2  3 
12  2  13  1  1  1  1 
13  2  11  1  2  2  3 
14  2  11  2  2  2  4 
15  3  14  2  2  2  2 
16  2  10  1  1  1  1 
17  2  30  1  1  1  1 
18  3  13  1  1  1  1 
19  2  10  0  0  0  0 
20  1  9  1  1  1  1 
Simulation results assessing rearrangement ambiguity and cost
exp.  r(H)  u_{ r }(G)  r_{ l }(G)  r_{ u }(G)  r(H_{ rmin })  r(H_{ rmax })

1  2  12  2  10  2  9 
2  2  20  2  14  2  14 
3  2  20  2  14  2  12 
4  2  20  2  14  2  14 
5  2  18  1  13  1  11 
6  2  8  2  7  2  6 
7  2  8  0  7  0  4 
8  2  18  1  13  2  10 
9  2  10  1  7  1  7 
10  2  14  0  11  0  8 
11  2  6  0  6  0  4 
12  2  6  1  7  1  4 
13  2  16  0  12  0  9 
14  2  20  2  14  4  12 
15  2  20  1  14  1  10 
16  2  6  0  5  0  5 
17  1  26  1  17  1  13 
18  2  18  1  13  1  11 
19  2  6  0  6  0  5 
20  2  4  2  5  2  2 
For these simulations the minimum rearrangement cost of any sampled AVG is often close or equal to r_{ l }(G), while the maximum rearrangement cost of any sampled AVG never exceeds r_{ u }(G) in the table above and is generally slightly below it. Notably, we found that AVG extensions sometimes had lower cost than the original simple history; this occurs because of the information lost in reducing H to G.
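These observations can be checked directly against the rearrangement table. The short Python sketch below transcribes the twenty rows of that table and counts how often the cheapest sampled AVG meets the lower bound and how often it is cheaper than the original simple history; the names `ROWS` and `summarise` are introduced here purely for illustration.

```python
# Columns: exp, r(H), u_r(G), r_l(G), r_u(G), r(H_rmin), r(H_rmax),
# transcribed from the rearrangement table above.
ROWS = [
    (1, 2, 12, 2, 10, 2, 9), (2, 2, 20, 2, 14, 2, 14),
    (3, 2, 20, 2, 14, 2, 12), (4, 2, 20, 2, 14, 2, 14),
    (5, 2, 18, 1, 13, 1, 11), (6, 2, 8, 2, 7, 2, 6),
    (7, 2, 8, 0, 7, 0, 4), (8, 2, 18, 1, 13, 2, 10),
    (9, 2, 10, 1, 7, 1, 7), (10, 2, 14, 0, 11, 0, 8),
    (11, 2, 6, 0, 6, 0, 4), (12, 2, 6, 1, 7, 1, 4),
    (13, 2, 16, 0, 12, 0, 9), (14, 2, 20, 2, 14, 4, 12),
    (15, 2, 20, 1, 14, 1, 10), (16, 2, 6, 0, 5, 0, 5),
    (17, 1, 26, 1, 17, 1, 13), (18, 2, 18, 1, 13, 1, 11),
    (19, 2, 6, 0, 6, 0, 5), (20, 2, 4, 2, 5, 2, 2),
]


def summarise(rows):
    """Count rows where r(H_rmin) equals r_l(G) and rows where r(H_rmin) < r(H)."""
    at_lower_bound = sum(1 for _, _, _, r_l, _, r_min, _ in rows if r_min == r_l)
    below_original = sum(1 for _, r_h, _, _, _, r_min, _ in rows if r_min < r_h)
    return at_lower_bound, below_original


at_lower_bound, below_original = summarise(ROWS)
print(f"min cost equals r_l(G) in {at_lower_bound} of {len(ROWS)} experiments")
print(f"min cost is below r(H) in {below_original} of {len(ROWS)} experiments")
```

On these rows the minimum sampled cost equals r_{ l }(G) in 18 of the 20 experiments and is lower than r(H) in 11 of them.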
Repeating these experiments with histories that started with 10 root vertices in the simple history, but which were otherwise simulated identically, demonstrates that the naive random search procedure implemented here fails to find reasonable histories within a set of only 20,000 random samples (data not shown), so, as might be expected, more intelligent sampling strategies will be needed to find parsimonious interpretations of even moderately complex datasets. However, with more efficient sampling algorithms, a history graph sampling algorithm could be applied to find solutions to various established parsimony problems, such as the DCJ median problem, or be used for less explored problems, such as the inference of gene trees incorporating synteny information.
Discussion
A valid permutation of a reduction sequence is a permutation in which all operations remain reduction operations when performed in sequence. Clearly not all permutations of a reduction sequence have this property; however, the following lemma illustrates the relationship between valid permutations.
Lemma 7.
All valid permutations of a reduction sequence create isomorphic reductions.
Reduction is somewhat analogous to a restricted form of the graph minor. Importantly, the graph minor relation is a well-quasi-ordering (WQO) ([35]), i.e. in any infinite set of graphs there exists a pair such that one is a minor of the other.
Lemma 8.
Reduction is not a WQO.
Proof.
Consider the infinite set of cyclic threads: no two of them are reductions of one another.
An ordering is a WQO if every set has a finite subset of minimal elements. In contrast, it can be shown that for the reduction relation, even the set of AVG extensions of a single base history G can have an infinite set of minimal elements.
Lemma 9.
There exists a history graph G with an infinite number of G-minimal extensions.
The proof is given in the Methods section.
One barrier to exploring the G-bounded poset is deciding, for a pair of history graphs G and G^{′} such that $G\curlyeqprec {G}^{\prime}$, whether an element is G-reducible. This problem is of unknown complexity, and may well be NP-hard. To avoid the potential complexity of this problem we can define an alternative notion of reducibility. A fix for (G,G^{′}), where $G\curlyeqprec {G}^{\prime}$, is a history subgraph of $({V}_{{G}^{\prime}},{E}_{{G}^{\prime}},{B}_{{G}^{\prime}}^{+})$ isomorphic to G, where ${B}_{{G}^{\prime}}^{+}$ is the transitive closure of ${B}_{{G}^{\prime}}$. Starting from an input history graph G and a fix isomorphic to it, we can easily update the fix as we create extensions of G. For an extension of G, elements in the fix become the equivalent of G-irreducible, while elements not in the fix become the equivalent of G-reducible. From a starting graph we can therefore explore a completely analogous version of G-bounded, replacing the question of G-reducibility with membership of the fix.
Following from Lemma 7, there is a bijection between the set of fixes for $G\curlyeqprec {G}^{\prime}$ and the set of equivalence classes of reduction sequences that are all valid permutations of each other. This is the limitation of considering membership of a fix instead of assessing whether an element is G-reducible: it limits us to considering only a single equivalence class of reduction sequences in exploring the analogous poset to G-bounded.
It is in general possible to reduce the size of the set G-bounded while still maintaining the properties that it can be efficiently sampled and contains G-optimal. However, this is likely to be at the expense of making the definition of G-bounded more complex. One approach is to add further “forbidden configurations” to the definition of G-bounded, like the G-reducible ping adjacencies that are forbidden in the current definition of G-bounded. Forbidding these was essential to making G-bounded finite, but we might consider also forbidding other configurations just to make G-bounded smaller.
It is possible to consider a graph representation of histories that use fewer vertex nodes if we are willing to allow for the possibility that a subrange of the sequence of a vertex be ancestral to a subrange of the sequence of another vertex. This is a common approach in ancestral recombination graphs ([11]). Such a representation entails the additional complexity of needing to specify the sequence subranges for every branch, but may in some applications be a worthwhile trade off for reducing the number of vertices in the graph. The theory of such graphs is mathematically equivalent to the theory of the history graphs presented here, but the implementation would differ.
Conclusion
We have introduced a graph model in which a set of chromosomes evolves via the processes of whole chromosome replication, gain and loss, substitution and DCJ rearrangements. We have demonstrated upper and lower bounds on maximum parsimony cost that are trivial to compute despite the intractability of the underlying problem. Though these cost bounding functions are relatively crude and can almost certainly be tightened for many cases, they become tight for AVGs. This implies that we only need to reach AVG extensions to assess cost when sampling extensions.
To our knowledge, this is the first fully general model of chromosome evolution by substitution, replication, and rearrangement. However, it has its limitations. For example, it treats common rearrangements, such as recombinations and indels as any other rearrangement, and only takes into account maximum parsimony evolutionary histories. We anticipate future extensions that incorporate more nuanced cost functions, as well as probabilistic models over all possible histories.
The constructive definition of the G-bounded poset, coupled with the upper and lower bound functions, suggests simple branch-and-bound sampling algorithms for exploring low-cost genome histories. To facilitate the practical exploration of the space of optimal and near-optimal genome histories, we expect that more advanced sampling strategies across the G-bounded poset could be devised.
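As a rough illustration of what such a branch-and-bound search might look like, the following Python sketch explores a toy poset. It is not the pyAVG implementation: the callables `extensions`, `lower_bound`, `upper_bound` and `is_avg`, and the toy data, are hypothetical. The pruning step relies on two properties stated in this paper: the lower bound never decreases under extension, and the lower and upper bounds coincide for AVGs.

```python
import math


def branch_and_bound(root, extensions, lower_bound, upper_bound, is_avg):
    """Depth-first search for a minimum-cost AVG extension of root.

    Assumes lower_bound never decreases under extension and that
    lower_bound(g) == upper_bound(g) whenever is_avg(g) is true.
    Graphs are assumed hashable so repeated visits can be skipped.
    """
    best_cost, best_graph = math.inf, None
    stack, seen = [root], set()
    while stack:
        graph = stack.pop()
        if graph in seen:
            continue
        seen.add(graph)
        # Prune: no extension of graph can cost less than its lower bound.
        if lower_bound(graph) >= best_cost:
            continue
        if is_avg(graph):
            best_cost, best_graph = upper_bound(graph), graph
            continue
        stack.extend(extensions(graph))
    return best_cost, best_graph


# Toy poset (hypothetical): node -> one-step extensions, and (lower, upper) bounds.
TOY_EXTENSIONS = {"G": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"],
                  "A1": [], "A2": [], "B1": []}
TOY_BOUNDS = {"G": (1, 5), "A": (2, 4), "B": (3, 4),
              "A1": (2, 2), "A2": (4, 4), "B1": (3, 3)}
TOY_AVGS = {"A1", "A2", "B1"}

best = branch_and_bound(
    "G",
    extensions=lambda g: TOY_EXTENSIONS[g],
    lower_bound=lambda g: TOY_BOUNDS[g][0],
    upper_bound=lambda g: TOY_BOUNDS[g][1],
    is_avg=lambda g: g in TOY_AVGS,
)
```

In this toy run the search settles on the AVG `A1` and prunes `A2`, whose lower bound already exceeds the best cost found. A scalar cost is used for simplicity; a cost function c over substitution and rearrangement counts would be applied to the pair of bounds instead.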
Methods
Proof of Theorem 1
We first define some convenient notations to describe lifted labels and edges. For a vertex x let ${L}_{x}^{\prime}=({L}_{x},{N}_{x})$ be its multiset of lifted labels, where L_{ x } is the set of distinct lifted labels for x, and for each lifted label ρ, N_{ x }(ρ) is the number of times ρ appears as a lifted label for x, i.e. L_{ x }={l(y):A(y)=x}⊆Σ^{∗} and ${N}_{x}:{L}_{x}\to {\mathbb{Z}}_{+}$ such that N_{ x }(ρ)=|{y:A(y)=x,l(y)=ρ}|.
For a side x_{ α }, and overloading notation, let ${L}_{{x}_{\alpha}}^{\prime}=({L}_{{x}_{\alpha}},{N}_{{x}_{\alpha}})$ be its multiset of lifted edges, where ${L}_{{x}_{\alpha}}$ is the set of distinct lifted adjacencies incident with x_{ α }, and for each lifted adjacency {x_{ α },w_{ γ }}, ${N}_{{x}_{\alpha}}(\{{x}_{\alpha},{w}_{\gamma}\})$ is the number of sides whose lifting ancestor is x_{ α }, and which are connected by an adjacency to a side whose lifting ancestor is w_{ γ }, i.e. ${L}_{{x}_{\alpha}}=\{\{{x}_{\alpha}=A({y}_{\alpha}),A({z}_{\beta})\}:\{{y}_{\alpha},{z}_{\beta}\}\in {E}_{G}\}$ and ${N}_{{x}_{\alpha}}:{L}_{{x}_{\alpha}}\to {\mathbb{Z}}_{+}$ such that ${N}_{{x}_{\alpha}}(\{{x}_{\alpha},{w}_{\gamma}\})=|\{{y}_{\alpha}:\{{x}_{\alpha}=A({y}_{\alpha}),{w}_{\gamma}\}\in {L}_{{x}_{\alpha}}\}|$.
Note that for a side x_{ α }, ${N}_{{x}_{\alpha}}(\{{x}_{\alpha},{w}_{\gamma}\})$ gives the multiplicity of lifted adjacency incidences with x_{ α }, not the multiplicity of {x_{ α },w_{ γ }}. In particular, if two sides x_{ α } and ${x}_{\alpha}^{\prime}$ are attached and share the same lifting ancestor A(x_{ α }), then ${N}_{A({x}_{\alpha})}(\{A({x}_{\alpha}),A({x}_{\alpha})\})$ is incremented by 2. In contrast, if x_{ α } is connected to w_{ γ } and A(x_{ α }) is distinct from A(w_{ γ }), then both ${N}_{A({x}_{\alpha})}(\{A({x}_{\alpha}),A({w}_{\gamma})\})$ and ${N}_{A({w}_{\gamma})}(\{A({x}_{\alpha}),A({w}_{\gamma})\})$ are incremented by 1.
For a vertex (resp. side) x the multiset of nontrivial lifted labels (resp. adjacencies) is ${\tilde{L}}_{x}^{\prime}=({\tilde{L}}_{x},{\tilde{N}}_{x})\subseteq {L}_{x}^{\prime}$.
The equivalence of LBSC to UBSC and LBRC to UBRC for AVGs
Lemma 10.
For any AVG H, s_{ l }(H)=s_{ u }(H).
Proof.
A module is simple if each side has at most one incidence with a nontrivial lifted adjacency.
Lemma 11.
All modules in an AVG are simple.
Proof.
Follows from definition of rearrangement ambiguity.
Lemma 12.
For an AVG H, r_{ l }(H)=r_{ u }(H).
Proof.
Let M be a simple module and let ${k}_{M}=\sum _{{x}_{\alpha}\in {V}_{M}}{\delta}_{1,|{\tilde{L}}_{{x}_{\alpha}}^{\prime}|}$, i.e. the number of sides in V_{ M } with a single incidence with a nontrivial lift.
A bounded transformation of a history graph into an AVG
In this section we will prove that any history graph G has an AVG extension H such that s_{ u }(G)≥s_{ u }(H) and r_{ u }(G)≥r_{ u }(H). To do this we define sequences of extension operations that when applied iteratively and exhaustively construct such an extension.
Lemma 13.
For any history graph G containing an ambiguous free-root there exists a root labeling extension G^{′} of G such that s_{ u }(G)=s_{ u }(G^{′}), r_{ u }(G)=r_{ u }(G^{′}) and u(G)>u(G^{′}).
For a branch (x,x^{′}) an interpolation is the extension resulting from the creation of a new vertex x^{′′} and branches (x,x^{′′}) and (x^{′′},x^{′}) and the deletion of (x,x^{′}). Let x be a labeled and ambiguous vertex and x^{′} be a labeled vertex such that A(x^{′})=x and l(x)≠l(x^{′}). A substitution ambiguity reducing extension is the interpolation of a vertex x^{′′} along the parent branch of x^{′} labeled with l(x) (See Figure 11(B)).
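The interpolation and the substitution ambiguity reducing extension can be sketched on a minimal data structure. In the following Python sketch the branch forest is a child-to-parent dictionary and labels are a separate dictionary; this representation, the function names, and the reading of A(x) as the nearest labeled proper ancestor are assumptions made for illustration and do not reflect the pyAVG API.

```python
def interpolate(parent, labels, child, new_vertex, new_label=None):
    """Insert new_vertex on the branch (parent[child], child); inputs are not mutated."""
    if parent.get(child) is None:
        raise ValueError("child has no parent branch to interpolate on")
    if new_vertex in parent:
        raise ValueError("new_vertex already exists")
    new_parent, new_labels = dict(parent), dict(labels)
    new_parent[new_vertex] = parent[child]  # creates branch (x, x'')
    new_parent[child] = new_vertex          # creates (x'', x'), replacing (x, x')
    if new_label is not None:
        new_labels[new_vertex] = new_label
    return new_parent, new_labels


def lifting_ancestor(parent, labels, vertex):
    """Nearest labeled proper ancestor of vertex, or None (assumed reading of A)."""
    v = parent.get(vertex)
    while v is not None and v not in labels:
        v = parent.get(v)
    return v


def reduce_substitution_ambiguity(parent, labels, x_prime, new_vertex):
    """Interpolate a vertex labeled l(A(x')) on the parent branch of x'."""
    x = lifting_ancestor(parent, labels, x_prime)
    if x is None or labels[x] == labels[x_prime]:
        raise ValueError("x' must have a lifting ancestor with a different label")
    return interpolate(parent, labels, x_prime, new_vertex, labels[x])


# Toy example: x is labeled A and has two children with different labels.
parent = {"x": None, "y1": "x", "y2": "x"}
labels = {"x": "A", "y1": "C", "y2": "G"}
new_parent, new_labels = reduce_substitution_ambiguity(parent, labels, "y1", "x2")
```

After the extension the lifting ancestor of `y1` is the new vertex `x2`, which carries the label of `x`, so the substitution to C is confined to the single branch from `x2` to `y1`.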
Lemma 14.
For any history graph G containing no ambiguous free-roots and such that u_{ s }(G)>0, there exists a substitution ambiguity reducing extension G^{′} of G such that s_{ u }(G)=s_{ u }(G^{′}), r_{ u }(G)=r_{ u }(G^{′}) and u(G)>u(G^{′}).
The following is used for eliminating rearrangement ambiguity. For an unattached junction side x_{ α } a junction side attachment extension is the extension resulting from the following: If x_{ α } has no attached ancestor, the creation of a new vertex and adjacency connecting a side of the new vertex to x_{ α } (see Figure 11(C) for an example), else {A(x_{ α }),y_{ β }}∈E_{ G } and the extension is the creation of a new vertex y^{′}, branch (y,y^{′}) and adjacency $\{{x}_{\alpha},{y}_{\beta}^{\prime}\}$ (See Figure 11(D)).
Lemma 15.
For any history graph G containing an unattached junction side, there exists a junction side attachment extension G^{′} of G such that s_{ u }(G)=s_{ u }(G^{′}), r_{ u }(G)≥r_{ u }(G^{′}), u(G)≥u(G^{′}) and G^{′} contains one less unattached junction side than G.
Let {x_{ α },y_{ β }} and {A(x_{ α }),z_{ γ }} be a pair of adjacencies and A(x_{ α }) be ambiguous. A rearrangement ambiguity reducing extension is the interpolation along the parent branch of x a vertex x^{′}, the creation of a new vertex z^{′}, new branch (z,z^{′}) and new adjacency $\{{x}_{\alpha}^{\prime},{z}_{\gamma}^{\prime}\}$ (See Figure 11(E)).
Lemma 16.
For any history graph G containing no unattached junction sides and such that u_{ r }(G)>0, there exists a rearrangement ambiguity reducing extension G^{′} of G such that s_{ u }(G)=s_{ u }(G^{′}), r_{ u }(G)≥r_{ u }(G^{′}) and u(G)>u(G^{′}).
We can now prove the desired lemma.
Lemma 17.
Any history graph G has an AVG extension H such that s_{ u }(G)≥s_{ u }(H) and r_{ u }(G)≥r_{ u }(H).
A bounded transformation of an AVG into a realisation
In this section we will prove that any AVG H has a realisation H such that s_{ l }(H)=s(H) and r_{ l }(H)=r(H).
For an attached root vertex x, the creation of a new vertex x^{′} and branch (x^{′},x) is a case 1 extension. The case 1 extension is used iteratively to initially ensure all roots are unattached.
For an attached leaf vertex, the creation of a new vertex x^{′} and branch (x,x^{′}) is a case 2 extension. The case 2 extension is used iteratively to initially ensure all leaves are unattached.
For a side x_{ α } if A(x_{ α }) is in a module M, x_{ α } is in the face of M. Let M be a simple module containing an odd number of sides and let x_{ α } be an unattached root side in the face of M. The following is a case 3 extension: the creation of a pair of vertices y and y^{′}, an adjacency connecting a side of y to x_{ α } and the branch (y,y^{′}). The case 3 extension is used iteratively to ensure all modules contain an even number of sides.
Similarly to vertices and sides, a thread X is ancestral to a thread Y in a history graph G, and reversely Y is a descendant of X, if there exists a directed path in D(G) from the vertex representing X to the vertex representing Y, otherwise two threads are unrelated if they do not have an ancestor/descendant relationship. For a vertex x, T(x) is the thread it is part of. For a pair of unattached root sides x_{ α } and y_{ β } in the face of a simple module such that T(x)=T(y) or T(x) and T(y) are unrelated, the creation of a new adjacency {x_{ α },y_{ β }} is a case 4 extension. The case 4 extension is used iteratively to ensure all modules contain attached root sides.
Let x_{ α } be a side in the face of a simple module M such that x_{ α } is internal, unattached and has an attached parent. Let (y,y^{′}) be a branch such that ${y}_{\beta}^{\prime}$ is a side in the face of M, T(y) is not descendant of T(x), if T(y)=T(x) then y is unattached, T(y^{′}) is descendant or unrelated to T(x), and the sides A(x_{ α }) and $A({y}_{\beta}^{\prime})$ in M are connected by a path containing an odd number of adjacencies/lifted adjacencies. If y_{ β } is unattached and T(y) is unrelated or equal to T(x) then the creation of the adjacency {x_{ α },y_{ β }} is the case 5 extension, else the interpolation of a vertex y^{′′} on the branch (y,y^{′}) and creation of the adjacency $\{{x}_{\alpha},{y}_{\beta}^{\mathrm{\prime \prime}}\}$ is the case 5 extension. The case 5 extension is used iteratively to ensure all internal vertices are attached.
For an adjacency {x_{ α },y_{ β }} such that y has fewer children than x, the creation of a new vertex y^{′} and branch (y,y^{′}) is a case 6 extension. The case 6 extension is used iteratively to ensure there are no vertices with missing children.
Let x_{ α } and y_{ β } be a pair of unattached leaf sides in the face of a simple module M such that T(x) and T(y) are unrelated or equal, A(x_{ α }) and A(y_{ β }) are attached and are either connected by an adjacency or both not incident with a nontrivial lifted adjacency. The creation of a new adjacency {x_{ α },y_{ β }} is a case 7 extension. The case 7 extension is used iteratively to ensure there are no leaf vertices with missing adjacencies.
For a branch-tree containing no labeled vertices, the labeling of any single vertex in the branch-tree with a member of Σ^{∗} is a case 8 extension. For a branch (x,y), such that y is labeled and x is unlabeled the labeling of x with the label of y is a case 9 extension. For a branch (x,y), such that x is labeled and y is unlabeled the labeling of y with the label of x is a case 10 extension. The case 8, 9 and 10 extensions are used iteratively to ensure there are no unlabeled vertices.
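The three labeling cases lend themselves to a compact illustration. The Python sketch below applies cases 8, 9 and 10 exhaustively and in that order to a branch forest stored as a child-to-parent dictionary. The representation, the function name `complete_labels`, the choice of the root as the vertex labeled in case 8, and the `default_label` parameter are all assumptions made for this example.

```python
def complete_labels(parent, labels, default_label="A"):
    """Label every vertex of a branch forest using cases 8, 9 and 10 in turn.

    parent maps each vertex to its parent (None for roots); labels maps
    labeled vertices to their label. Returns a new labels dictionary.
    """
    labels = dict(labels)
    vertices = set(parent) | {p for p in parent.values() if p is not None}

    def root_of(v):
        while parent.get(v) is not None:
            v = parent[v]
        return v

    # Case 8: label one vertex (here the root) of each branch-tree that
    # contains no labeled vertex; default_label is an arbitrary choice.
    labeled_trees = {root_of(v) for v in labels}
    for v in vertices:
        if parent.get(v) is None and v not in labeled_trees:
            labels[v] = default_label

    # Case 9: copy a labeled child's label to its unlabeled parent, exhaustively.
    changed = True
    while changed:
        changed = False
        for child, par in parent.items():
            if par is not None and child in labels and par not in labels:
                labels[par] = labels[child]
                changed = True

    # Case 10: copy a labeled parent's label to its unlabeled child, exhaustively.
    changed = True
    while changed:
        changed = False
        for child, par in parent.items():
            if par is not None and par in labels and child not in labels:
                labels[child] = labels[par]
                changed = True
    return labels


forest = {"r": None, "a": "r", "b": "r", "c": "a", "u": None, "v": "u"}
completed = complete_labels(forest, {"c": "G"})
```

In the toy forest the label G of `c` is first copied up to `a` and `r` (case 9) and then down to `b` (case 10), while the entirely unlabeled tree rooted at `u` receives the default label through case 8 and case 10.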
Lemma 18.
For an AVG H, if H^{′} is obtained from H by any of the 10 extensions cases above then s_{ l }(H)=s_{ l }(H^{′}) and r_{ l }(H)=r_{ l }(H^{′}).
Lemma 19.
For an AVG H, each of the ten types of extensions above can only be applied consecutively a finite number of times until there are no more opportunities in the graph to apply an extension of that type.
Lemma 20.
Any AVG H has an AVG extension H^{′} with no missing labels, adjacencies or branches and such that s_{ l }(H)=s_{ l }(H^{′}) and r_{ l }(H)=r_{ l }(H^{′}).
Proof.
We will demonstrate that the following algorithm converts an AVG into an AVG with no missing adjacencies or branches or unlabeled vertices.

The case 3 complete extension contains no modules with an odd number of sides.

The case 1 extensions ensure that all root vertices are unattached, and every case 3 extension attaches a root vertex in a module with an odd number of sides to a newly created root vertex, so ensuring the module contains an even number of sides. Hence for every module with an odd number of sides there exists a case 3 extension.

The case 4 complete extension additionally contains no root sides with missing adjacencies or root vertices with missing parents.

The case 3 extensions ensure that there are always 0 or 2 unattached root sides in a module, so any unattached root side in a module always has a potential unattached partner root side within the module. The requirement that sides connected in a case 4 extension be in the same or unrelated threads prior to connection does not prevent any root side within the face of a module from becoming attached, because the case 1 extensions ensure that all root vertices are unrelated, the case 2 extensions do not affect root vertices and the case 3 and 4 extensions only result in root vertices being connected to one another.

The case 5 complete extension additionally contains no internal vertices with missing adjacencies.

The case 4 extensions ensure that all root sides within modules are attached. The case 2 extensions ensure that all attached sides have children and the case 3, 4 and 5 extensions ensure this remains true. Given this, and that every module has an even number of sides within it (as a case 3 complete extension), it is straightforward to verify that there is always a case 5 extension in a sequence of such extensions for any internal side within the face of a module.

The case 6 complete extension additionally contains no vertices with missing child branches.

The case 7 complete extension additionally contains no leaf sides with missing adjacencies, and therefore has no missing branches or adjacencies.

Analogously with the case 4 extensions, the requirement that sides connected in a case 7 extension be in the same or unrelated threads does not prevent any leaf side within the face of a module from becoming attached by a case 7 extension. This is because the case 2 extensions ensure all leaf vertices are unrelated, the case 3, 4, 5 and 6 extensions do not connect leaf vertices, and the case 7 extensions only connect leaf sides to one another.

The case 8 complete extension additionally contains no branch-trees without any labeled vertices.

The case 9 complete extension additionally contains no unlabeled ancestral vertices that have labeled descendants.

The case 10 complete extension additionally contains no unlabeled vertices, and therefore has no missing adjacencies, branches or labels.
We can now prove the desired lemma.
Lemma 21.
Any AVG H has a realisation H such that s_{ l }(H)=s(H) and r_{ l }(H)=r(H).
Proof.

On every branch of H^{′} interpolate a vertex.

Label each interpolated vertex identically to its parent.

Connect the sides of the interpolated vertices to one another such that for any adjacency {x_{ α },y_{ β }} connecting interpolated vertices, $\{A({x}_{\alpha}),A({y}_{\beta})\}\in {E}_{{H}^{\prime}}$.
It is easily verified that the result is an AVG that can be edge partitioned into rearrangement and replication epochs and hence is a simple history.
LBSC and LBRC are lower bounds
Lemma 22.
LBSC is a lower bound on substitution cost.
Proof.
From Lemmas 17 and 21 it follows that every history graph has a realisation. It is sufficient therefore to further prove that for any simple history H, s(H)=s_{ l }(H) and that a history graph G has no extension G^{′} such that s_{ l }(G)>s_{ l }(G^{′}). The former is easily verified and we now prove the latter.
Let (G=G_{ n })≺G_{n−1}≺…G_{2}≺(G_{1}=G^{′}) be a sequence of n history graphs for a reduction sequence of n−1 reduction operations. For some integer i∈ [ 1,n) if the ith reduction operation is a vertex deletion, adjacency deletion or branch contraction, as these each have no impact on the calculation of LBSC, s_{ l }(G_{i+1})=s_{ l }(G_{ i }). Else the ith reduction operation is a label deletion. Let x be the vertex whose label is being deleted. As the number of nontrivial lifted labels for A(x) after the deletion of x is less than or equal to the sum of nontrivial lifted labels for x and A(x), it follows that s_{ l }(G_{i+1})≤s_{ l }(G_{ i }). Therefore by induction s_{ l }(G)≤s_{ l }(G^{′}).
Lemma 23.
LBRC is a lower bound on rearrangement cost.
Proof.
Analogously to the proof of Lemma 22, from Lemmas 17 and 21 it follows that every history graph has a realisation. It is sufficient therefore to further prove that for any simple history H, r(H)=r_{ l }(H) and that a history graph G has no extension G^{′} such that r_{ l }(G)>r_{ l }(G^{′}). The former is easily verified and we now prove the latter.
Let (G=G_{ n })≺G_{n−1}≺…G_{2}≺(G_{1}=G^{′}) be a sequence of n history graphs for a reduction sequence of n−1 reduction operations. For some integer i∈[1,n) if the ith reduction operation is a label deletion, vertex deletion or contraction of a branch with a free-parent, as each removes an element that has no effect on the calculation of the LBRC, r_{ l }(G_{i+1})=r_{ l }(G_{ i }).
Hence r_{ l }(G)=E_{ G }+Q(G)−M(G), where $Q(G)=\sum _{M\in M(G)}\lceil q(M)/2\rceil $. Suppose r_{ l }(G_{i+1})>r_{ l }(G_{ i }). If the ith reduction operation is an adjacency deletion, ${E}_{{G}_{i+1}}+1={E}_{{G}_{i}}$, therefore Q(G_{i+1})−M(G_{i+1})≥Q(G_{ i })−M(G_{ i })+2.
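The step from the supposition to the final inequality can be spelled out as follows. This is a sketch that treats E_{ G } and M(G) as counts, as the surrounding text does, and uses the fact that all quantities involved are integers:

```latex
\begin{align*}
r_l(G_{i+1}) > r_l(G_i)
  &\iff E_{G_{i+1}} + Q(G_{i+1}) - M(G_{i+1}) > E_{G_i} + Q(G_i) - M(G_i) \\
  &\iff E_{G_{i+1}} + Q(G_{i+1}) - M(G_{i+1}) > E_{G_{i+1}} + 1 + Q(G_i) - M(G_i) \\
  &\iff Q(G_{i+1}) - M(G_{i+1}) > Q(G_i) - M(G_i) + 1 \\
  &\iff Q(G_{i+1}) - M(G_{i+1}) \ge Q(G_i) - M(G_i) + 2 .
\end{align*}
```

The second line substitutes $E_{G_i} = E_{G_{i+1}} + 1$ for an adjacency deletion, and the last line follows because a strict inequality between integers leaves a gap of at least one.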
The removal of an adjacency can reduce the number of modules by at most two, therefore M(G_{ i })−M(G_{i+1})≤2. The number of modules decreases by the maximum of two only when the adjacency to be deleted connects two sides that each have no incident lifted adjacencies (see Figure 13(B)). However, in this case Q(G_{ i })=Q(G_{i+1})+1, as the number of unattached sides in a module decreases by 2, therefore if M(G_{ i })−M(G_{i+1})=2 then r_{ l }(G_{i+1})≤r_{ l }(G_{ i }).
An unattached side in a module is the side of a free-root, and such a free-root side has incident lifted adjacencies. The side of a free-root with no incident lifted adjacencies cannot become part of a module by the removal of any adjacency from the associated history graph, as by definition the homologous sides in its associated branch-tree are all unattached. The removal of an adjacency can therefore only decrease or leave the same the total number of unattached sides in modules. The only way for Q(G_{i+1})−Q(G_{ i }) to be positive is therefore by the redistribution of unattached sides between modules to exploit the ceiling function. As in the removal of a single adjacency at most two unattached sides can be redistributed from a single module (see Figure 13(C)), Q(G_{i+1})−Q(G_{ i })≤1. But if Q(G_{i+1})−Q(G_{ i })=1 then it is easily verified that M(G_{ i })−M(G_{i+1})≤0. These are all the cases, so r_{ l }(G_{i+1})≤r_{ l }(G_{ i }), and therefore by induction r_{ l }(G)≤r_{ l }(G^{′}).
Theorem 1.
For any history graph G and any cost function c, c(s_{ l }(G),r_{ l }(G))≤C(G,c)≤c(s_{ u }(G),r_{ u }(G)) with equality if G is an AVG.
Proof.
Follows from Lemmas 10, 12, 17, 21, 22 and 23.
Proof of Theorem 2
We first classify nonminimal adjacencies and labels.

A nonminimal label of a vertex x is:

a leaf if ${L}_{x}^{\prime}=\{\}$,

else, as it is not a junction, $|{L}_{x}^{\prime}|=1$ and:

    the label is redundant if ${\tilde{L}}_{x}^{\prime}=\{\}$,

    else complicating if l(A(x))≠l(x),

    else l(A(x))=l(x) and, as it is not a bridge, then ${\tilde{L}}_{A(x)}=\{\}$ and it is an unnecessary bridge.

A nonminimal adjacency {x_{ α },y_{ β }} is:

a leaf if ${L}_{{x}_{\alpha}}^{\prime}\cup {L}_{{y}_{\beta}}^{\prime}=\{\}$,

else, as it is not a junction, neither x_{ α } nor y_{ β } are junction sides and it is complex if $|{L}_{{x}_{\alpha}}^{\prime}|>1$ or $|{L}_{{y}_{\beta}}^{\prime}|>1$,

else $|{L}_{{x}_{\alpha}}^{\prime}|=1$, $|{L}_{{y}_{\beta}}^{\prime}|=1$ and:

    the adjacency is redundant if ${L}_{{x}_{\alpha}}\cup {L}_{{y}_{\beta}}=\{\{{x}_{\alpha},{y}_{\beta}\}\}$,

    else complicating if {A(x_{ α }),A(y_{ β })} is a nontrivial lifted adjacency,

    else {A(x_{ α }),A(y_{ β })} is a trivial lifted adjacency and, as it is not a bridge either:

        $|({L}_{A({x}_{\alpha})}^{\prime}\cup {L}_{{x}_{\alpha}}^{\prime})\backslash \{\{{x}_{\alpha},{y}_{\beta}\}\}|=1$ and $|({L}_{A({y}_{\beta})}^{\prime}\cup {L}_{{y}_{\beta}}^{\prime})\backslash \{\{{x}_{\alpha},{y}_{\beta}\}\}|=1$ and it is a removable bridge,

        else ${\tilde{L}}_{A({x}_{\alpha})}^{\prime}\cup {\tilde{L}}_{A({y}_{\beta})}^{\prime}=\{\}$ and it is an unnecessary bridge.


Lemma 24.
A G-minimal AVG contains no G-reducible nonminimal elements.
Proof.
We prove the contrapositive. It is easily verified that the deletion of any single nonminimal vertex or contraction of a nonminimal branch from an AVG results in a reduction that is also an AVG. It is also easily verified that the deletion of each possible type of nonminimal label/adjacency from an AVG results in a reduction that is also an AVG, with the exceptions of a complex nonminimal adjacency, which can not be present within an AVG (because such an edge implies ambiguity), and a removable bridge adjacency. After deletion of a removable bridge adjacency {x_{ α },y_{ β }} the adjacency {A(x_{ α }),A(y_{ β })} ceases to be a junction adjacency, and may either become a bridge, in which case the resulting graph is an AVG, or it may become a nonminimal adjacency. If it becomes a nonminimal adjacency, then, by the prior argument, if it is not a removable bridge adjacency then its deletion results in an AVG, else if it is a removable bridge then after the deletion of {A(x_{ α }),A(y_{ β })}, the process of considering if {A(A(x_{ α })),A(A(y_{ β }))} is nonminimal and deleting if necessary is repeated iteratively until the resulting graph is an AVG.
Lemma 25.
The only G-reducible adjacencies in the G-unbridged graph of an extension of G containing no nonminimal elements are junction adjacencies.
Proof.
By definition, the only G-reducible adjacencies in an extension of G with no G-reducible nonminimal elements are junction adjacencies and bridges. Each deletion of a G-reducible bridge adjacency does not create any G-reducible nonminimal adjacencies, as a junction adjacency connecting sides that are the lifting ancestors of the sides connected by a bridge adjacency remains a junction adjacency after the deletion of the bridge, and the lifted adjacencies incident with the sides connected by the bridge, which are nontrivial, lift to this junction instead and therefore remain nontrivial.
Lemma 26.
The G-unbridged graph of a G-optimal AVG for any cost function contains no G-reducible ping adjacencies.
Proof.
Theorem 2.
The G-bounded AVGs contain the G-optimal AVGs for every cost function.
Proof.
Follows from Lemmas 24 and 26.
Proof of Theorem 3
In the following let n be the number of adjacencies in a history graph G.
Lemma 27.
If n=0 any G-bounded extension of G contains 0 adjacencies.
Proof.
Follows from Lemma 24.
As the n=0 case is trivial now assume that n≥1. For an adjacency {x_{ α },y_{ β }} its received incidence is $|{L}_{{x}_{\alpha}}^{\prime}|+|{L}_{{y}_{\beta}}^{\prime}|$ and its projected incidence is equal to the number of members of {A(x_{ α }),A(y_{ β })} that are attached, either 0, 1 or 2. For an adjacency, the difference between projected incidence and received incidence is the incidence transmission. A positive incidence transmission occurs when the projected incidence is greater than the received incidence; conversely, a negative incidence transmission occurs when the projected incidence is less than the received incidence. The incidence sum of a history graph is the sum of the received incidences of its adjacencies, or, equivalently, the sum of the projected incidences of its adjacencies.
Lemma 28.
The maximum possible incidence sum of G is 2n−2.
Proof.
The 2n term is because each adjacency has a projected incidence of at most 2, the −2 term is because at least one adjacency has a projected incidence of 0.
It is trivial to show this bound can be achieved for all values of n.
Lemma 29.
The G-unbridged graph G^{′′} for a G-bounded history graph G^{′} has no G-reducible adjacencies with a positive incidence transmission.
Proof.
By Lemma 25, the only G-reducible adjacencies in G^{′′} are junction adjacencies. Junction adjacencies have an incidence transmission of 0 or less.
Lemma 30.
The G-unbridged graph G^{′′} for a G-bounded history graph G^{′} contains less than or equal to 2n−1 adjacencies that either have a negative incidence transmission, or which are G-irreducible and have an incidence transmission of 0.
Proof.
The right-hand side of the inequality is the number of adjacencies with a negative incidence transmission plus two times the number of G-irreducible adjacencies with an incidence transmission of 0.
Lemma 31.
The G-unbridged graph G^{′′} of a G-bounded history graph G^{′} contains less than or equal to 3n−3 G-reducible adjacencies with an incidence transmission of 0.
Proof.
Now let e be a G-reducible junction adjacency in G^{′′} that is contained in a thread that is ancestral or unrelated to all threads that contain a G-reducible adjacency or label. If G^{′′} contains more adjacencies than G then such an adjacency must clearly exist in G^{′′}.
If e makes projected incidences to G-irreducible adjacencies then it makes projected incidences to adjacencies not in X. If e does not make projected incidences then it has negative incidence transmission, and either e is a hanging adjacency, in which case it must receive projected incidences from adjacencies that are not in X (else there exists a G-reducible ping adjacency), or e is not a hanging adjacency and a larger graph exists (see Figure 17(B)) that is H-bounded with respect to a graph H with the same number of adjacencies as G, in which case, using Lemma 30, there must be less than 2n−1 G-reducible negative transmission incidence adjacencies in G^{′′}. Therefore either there exist projected incidences made between adjacencies not in X or there are fewer than 2n−1 G-reducible negative transmission incidence adjacencies in G^{′′}. Either way, there are fewer than 6n−4 projections made from adjacencies in X to adjacencies not in X, and as there are no projections made between adjacencies in X, and all adjacencies in X have a projected incidence of 2, X has cardinality less than 3n−2.
Lemma 32.
A G-bounded history graph G^{′} contains less than or equal to 5n−4 junction adjacencies.
Proof.
From Lemmas 29, 30 and 31 it follows that the unbridged graph of G^{′} contains less than 5n−4 junction adjacencies. Extending the argument of Lemma 25, it is easily verified that G^{′} contains the same number of junction adjacencies as its unbridged graph.
Lemma 33.
A G-bounded history graph G^{′} contains less than or equal to 10n−8 G-reducible adjacencies and 20n−16 additional attached vertices. These bounds are tight for all n≥1.
Proof.
Let m be the number of labeled vertices in the history graph G. As with the n=0 case, the m=0 case is trivial, but in terms of the number of G-reducible labels.
Lemma 34.
If m=0, any G-bounded extension of G contains 0 labels.
Proof.
Follows from Lemma 24.
Now assume that m≥1 and that n≥0.
Lemma 35.
A G-bounded history graph G^{′} contains less than or equal to 2m−2 G-reducible vertex labels. This bound is tight for all m≥1.
Proof.
We are now in a position to prove the desired theorem for any value of n and m.
Theorem 3.
A G-bounded history graph contains less than or equal to max(0,10n−8) G-reducible adjacencies and max(0,2m−2,20n−16,20n+2m−18) additional vertices. This bound is tight for all values of n≥0 and m≥0.
Proof.
Lemmas 27 and 33 prove the bound on the number of G-reducible adjacencies; it remains to prove the bound on the number of additional vertices.
Let X, Y and Z be the total numbers, respectively, of additional attached, labeled and both unattached and unlabeled vertices in G^{′}.
From Lemmas 27 and 33 it follows that X≤ max(0,20n−16). From Lemmas 34 and 35 it follows that Y≤ max(0,2m−2). Combining these results, and noting that max(0,a)+max(0,b)= max(0,a,b,a+b) for any a and b, gives X+Y≤ max(0,2m−2,20n−16,20n+2m−18).
Assume X+Y+Z> max(0,2m−2,20n−16,20n+2m−18). As X+Y≤ max(0,2m−2,20n−16,20n+2m−18), Z≥1. As G^{′} contains no non-minimal branches, Z is a count of additional root vertices that are unlabeled, unattached and have two or more children, all of which are either labeled, attached or both. Using this information, it is straightforward to demonstrate that there exists a modified pair of history graphs (H,H^{′}) such that H has the same size and cardinality as G, and H^{′} is an H-bounded extension of H that has more labeled or attached vertices than G^{′}. The existence of (H,H^{′}) contradicts Lemma 35, Lemma 33, or both.
Proof of Theorem 4
An adjacency {x_{ α },y_{ β }} is old if both A(x_{ α }) and A(y_{ β }) are each independently either the side of a free-root or incident with a G-irreducible adjacency.
Lemma 36.
For any G-bounded history graph G^{′} not isomorphic to G there exists a label detachment or adjacency detachment that results in a G-bounded history graph.
Proof.
The previous lemma implies that for any G-bounded history graph there exists a sequence of label and adjacency detachments that results in G. We now seek the inverse, to demonstrate the existence of a sequence of moves to create a G-bounded AVG from any G-bounded history graph.
The inverse of a label/adjacency/lateral-adjacency detachment is, respectively, a label/adjacency/lateral-adjacency attachment.
Lemma 37.
A G-bounded history graph G^{′} such that u(G)>0 has a label/adjacency/lateral-adjacency attachment that results in a G-bounded history graph.
Proof.
If G^{′} has a free-root x such that $L^{\prime}_{x} > 1$, then the labeling of the root of the branch-tree whose free-root is x is a label attachment that results in a G-bounded extension that contains an additional junction label (see Figure 7(A-B) in the main text). Else, if G^{′} has substitution ambiguity then there exists a labeled vertex with two or more non-trivial lifted labels for which there exists a label attachment that results in a G-bounded extension, which contains an additional bridge label (see Figure 7(B-C) in the main text). Else G^{′} has rearrangement ambiguity. If G^{′} has one or more unattached junction sides, let x_{ α } be such a side. If the most ancestral attached descendants of x_{ α } are not incident with hanging adjacencies then the creation of an isolated vertex y and adjacency {x_{ α },y_{ α }} is an adjacency attachment that results in a G-bounded extension (see Figure 7(C-D) in the main text). Else there exists a lateral-adjacency attachment that results in a G-bounded history graph in which x_{ α } is incident with an adjacency with no hanging endpoints (see Figure 7(D-E); the operation is also an adjacency attachment in this example). Else G^{′} does not have an unattached junction side, and there exists an attached junction side with two or more incident non-trivial lifted adjacencies for which there exists an adjacency attachment that results in a G-bounded extension that contains an additional bridge adjacency (see Figure 7(E-F) in the main text).
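The case analysis in the proof of Lemma 37 can be summarized as a decision procedure. The sketch below is only an illustration of the order in which the cases are tried. It assumes the graph is already known to be ambiguous, and the four boolean flags are hypothetical inputs that a real implementation would have to compute from the history graph; none of these names come from the article.

```python
from dataclasses import dataclass


@dataclass
class AmbiguityFlags:
    # Hypothetical summary of a G-bounded history graph with ambiguity.
    freeroot_condition: bool          # some free-root x satisfies L'_x > 1
    substitution_ambiguity: bool      # some vertex has >= 2 non-trivial lifted labels
    unattached_junction_side: bool    # some junction side is unattached
    descendants_hanging: bool         # its most ancestral attached descendants
                                      # are incident with hanging adjacencies


def choose_attachment(flags: AmbiguityFlags) -> str:
    """Return the kind of attachment the proof selects, in case order."""
    if flags.freeroot_condition:
        return "label attachment (adds a junction label)"
    if flags.substitution_ambiguity:
        return "label attachment (adds a bridge label)"
    # Otherwise the graph has rearrangement ambiguity.
    if flags.unattached_junction_side:
        if not flags.descendants_hanging:
            return "adjacency attachment (new isolated vertex)"
        return "lateral-adjacency attachment"
    return "adjacency attachment (adds a bridge adjacency)"
```

Each branch corresponds to one of the panels of Figure 7 cited in the proof. The sketch only names the move and does not construct it.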
Given Theorem 3, the previous lemma implies that for any G-bounded history graph there exists a sequence of label/adjacency/lateral-adjacency attachment operations that result in a G-bounded AVG.
Theorem 4.
The G-bounded poset is finite and has a single least element G, its maximal elements are AVGs, and G^{′}≺·_{ G }G^{′′} if and only if there exists a G-bounded reduction operation to transform G^{′′} into G^{′}.
Proof.
That the G-bounded poset is finite follows from Theorem 3. Lemma 36 implies it has a single least element. As a corollary of Theorem 3 and Lemma 37 it follows that the set of maximal elements of the G-bounded poset are AVGs.
It remains to prove G^{′}≺·_{ G }G^{′′} if and only if there exists a G-bounded reduction operation to transform G^{′′} into G^{′}. The only if follows by definition. If G^{′}≺_{ G }G^{′′} but not G^{′}≺·_{ G }G^{′′} then there exists a G^{′′′} such that G^{′}≺_{ G }G^{′′′}≺_{ G }G^{′′}. If G^{′′} is transformed to G^{′} by a single G-bounded reduction operation, to complete the proof it is sufficient to show that no such G^{′′′} can exist; this is easily verified.
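The notions used in Theorem 4 (least element, maximal elements, covering relation) can be illustrated on a toy finite poset. The sketch below is not the G-bounded poset itself: it assumes a simplified model in which each element is a set containing a fixed base (standing in for G) plus some optional items (standing in for attached labels or adjacencies), ordered by inclusion, and in which a reduction removes one optional item. All names are hypothetical. The brute-force check confirms that, in this toy model, the covering pairs are exactly the pairs related by a single reduction.

```python
from itertools import combinations


def covers(elements, less_than):
    """Return pairs (a, b) with a < b and no c such that a < c < b."""
    result = set()
    for a in elements:
        for b in elements:
            if less_than(a, b) and not any(
                less_than(a, c) and less_than(c, b) for c in elements
            ):
                result.add((a, b))
    return result


# Toy model: the base set plays the role of G; optional items play the
# role of elements that attachment operations may add.
base = frozenset({"g1"})
optional = ["e1", "e2", "e3"]
elements = [
    base | frozenset(extra)
    for k in range(len(optional) + 1)
    for extra in combinations(optional, k)
]


def strictly_below(a, b):
    return a < b  # proper-subset order on frozensets


cover_pairs = covers(elements, strictly_below)
single_step_pairs = {
    (a, b) for a in elements for b in elements
    if a < b and len(b - a) == 1
}
assert cover_pairs == single_step_pairs

least = [a for a in elements if all(a <= b for b in elements)]
maximal = [a for a in elements if not any(a < b for b in elements)]
```

In this toy model there is one maximal element, whereas the G-bounded poset generally has several maximal elements (the AVGs).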
Proof of Lemma 9
Lemma 9.
There exists a history graph G with an infinite number of G-minimal extensions.
Proof.
Consider the AVG extension H_{0} with zero copies of the repeating subunit and the terminal elements to attach w^{0} and y^{0}, as in Figure 24(C). As a, b, c and d are labeled but no other vertices are labeled, the adjacencies {a_{head},b_{head}} and {c_{head},d_{head}} are G-irreducible, because removal of either in any reduction would create a graph that cannot then be an extension of G. Given this observation, by definition w^{0} and y^{0}, or any vertices produced by contracting incident branches of w^{0} and y^{0}, must be junctions in any G-minimal reduction, and therefore be attached; but by definition of the reduction relation, $\{w_{\text{head}}^{0}, w_{\text{head}}^{\prime 0}\}$ and $\{y_{\text{head}}^{0}, y_{\text{head}}^{\prime 0}\}$ cannot be removed and yet w^{0} and y^{0} be attached in any G-minimal reduction. This therefore implies that the bridge adjacencies $\{x_{\text{head}}^{0}, x_{\text{head}}^{\prime 0}\}$ and $\{z_{\text{head}}^{0}, z_{\text{head}}^{\prime 0}\}$ are also not removed in a G-minimal reduction. These are all the adjacencies in H_{0}, and as all the vertices in H_{0} are attached, H_{0} is therefore G-minimal.
Let H_{ i } be an AVG with i such layers, where i>0 (Figure 24(D) shows an example for i=2). To prove that H_{ i } is a G-minimal AVG extension we proceed by induction. H_{0} is the base case. Assume the adjacencies incident with w^{i−1} and y^{i−1} are not removed in any G-minimal reduction. Using similar logic to the base case, the adjacencies incident with w^{ i }, x^{ i }, y^{ i } and z^{ i } are similarly not removed in a G-minimal reduction. Again, as these are all the added adjacencies and all vertices are attached, by induction H_{ i } is G-minimal.
Endnotes
^{a} The contraction of an edge e is the removal of e from the graph and merger of the vertices x and y incident with e to create new vertex z, such that edges incident with z were incident either with x or y or both, in the latter case becoming a loop edge on z.
^{b} Note: while $\mathscr{H}(G)$ is infinite we show in the sequel that the infimum of this set of costs is always achieved by a history, hence the infimum is the minimum.
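The edge contraction described in endnote a can be sketched as follows. This is a minimal illustration that assumes a multigraph stored as a list of vertex pairs; the function name and the representation are hypothetical and are not taken from the article.

```python
def contract_edge(edges, edge_index, merged):
    """Contract edges[edge_index] in a multigraph given as a list of pairs.

    The contracted edge e = (x, y) is removed and x and y are merged into
    the new vertex `merged`. Any other edge incident with both x and y
    becomes a loop on the merged vertex.
    """
    x, y = edges[edge_index]
    result = []
    for i, (u, v) in enumerate(edges):
        if i == edge_index:
            continue  # remove e itself
        new_u = merged if u in (x, y) else u
        new_v = merged if v in (x, y) else v
        result.append((new_u, new_v))
    return result


# Two parallel x-y edges plus one edge to each of a and b.
example = [("x", "y"), ("x", "y"), ("x", "a"), ("y", "b")]
contracted = contract_edge(example, 0, "z")
# contracted == [("z", "z"), ("z", "a"), ("z", "b")]
```

The second x-y edge is incident with both endpoints of the contracted edge, so it becomes a loop on z, matching the last clause of the endnote.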
Declarations
Acknowledgements
We would like to thank Dent Earl for his help with figures and the Howard Hughes Medical Institute, Dr. and Mrs. Gordon Ringold, NIH grant 2U41 HG00237113 and NHGRI/NIH grant 5U01HG004695 for providing funding.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.