Probabilistic modeling of the evolution of gene synteny within reconciled phylogenies
© Semeria et al. 2015
Published: 2 October 2015
Most models of genome evolution concern either genetic sequences, gene content or gene order. They sometimes integrate two of the three levels, but rarely the three of them. Probabilistic models of gene order evolution usually have to assume constant gene content or adopt a presence/absence coding of gene neighborhoods which is blind to complex events modifying gene content.
We propose a probabilistic evolutionary model for gene neighborhoods, allowing genes to be inserted, duplicated or lost. It uses reconciled phylogenies, which integrate sequence and gene content evolution. We are then able to optimize parameters such as phylogeny branch lengths, or probabilistic laws depicting the diversity of susceptibility of syntenic regions to rearrangements. We reconstruct a structure for ancestral genomes by optimizing a likelihood, keeping track of all evolutionary events at the level of gene content and gene synteny. Ancestral syntenies are associated with a probability of presence.
We implemented the model with the restriction that at most one gene duplication separates two gene speciations in reconciled gene trees. We reconstruct ancestral syntenies on a set of 12 drosophila genomes, and compare the evolutionary rates along the branches and along the sites. We compare with a parsimony method and find a significant number of results not supported by the posterior probability. The model is implemented in the Bio++ library. It thus benefits from and enriches the classical models and methods for molecular evolution.
Genomes evolve through processes that modify their content and organization at different scales, ranging from substitutions, insertions or deletions of single nucleotides to large scale chromosomal rearrangements. Extant genomes are the result of a combination of many such processes, which makes it difficult to reconstruct the big picture of genome evolution. Instead, most models and methods focus on one scale and use only one kind of data, such as gene orders or sequence alignments.
Models based on sequence alignments were first developed in the 1960's and underwent steady development until reaching a high level of complexity . In a recent development, they have been extended to include gene content, modeling duplications, losses and transfers of genes with reconciliation methods [2, 3]. Reconciled gene trees account for evolutionary events at both the sequence level and the gene family level. They thus yield a better representation of genome evolution and pave the way for approaches integrating other levels of information [4, 5].
In parallel extant gene orders have long been used to infer evolutionary relationships between organisms and to reconstruct ancestral genomes [6–8]. Although the early stages of their development were computational challenges, methods based on gene orders gradually overcame theoretical and computational constraints so that they can now handle unequal gene content, multi-chromosomal genomes, whole genome duplications and dozens of genomes with large amounts of genes [9–11], and can be inserted into probabilistic frameworks [12–17].
All ingredients are present to integrate gene order and sequence evolution models, yet this leap has not been taken, mostly because of computational issues. Reconstructing gene order histories is often hard . A computational solution to reconstruct gene orders and scale up with the size of datasets is to see a genome as a set of independently evolving adjacencies, i.e. the links between consecutive genes . One can reconstruct ancestral gene orders following three main steps:
Group potentially homologous adjacencies (they connect homologous pairs of genes)
For each group, reconstruct the common history of adjacencies, by recovering ancestral ones
Assemble the ancestral adjacencies in each ancestral species to obtain ancestral chromosomes
The assumption that adjacencies evolve independently allows quick computations at the second step: the size of the data can be an order of magnitude larger than without the assumption. But an optimization assembly step is required because of possible conflicts between adjacencies wrongly assumed independent .
Another difficulty is the integration of gene content dynamics. Often probabilistic solutions are limited to invariable gene content [12–14]. A solution is to encode altogether the presence and absence of genes and adjacencies as binary characters and use a binary sequence evolution model [15, 16], but it lacks an evolutionary model of gene content and order dynamics. Gene profiles  or reconciled gene trees [22, 10] are more promising for integration with sequence evolution models. They were mainly used with parsimony methods to reconstruct ancestral adjacencies, which makes it difficult to combine with a model at a different scale.
We propose a probabilistic model of adjacency evolution accounting for gene duplications and losses, using extant gene orders and reconciled gene trees. We base on the parsimony algorithm of DeCo  that we adapt to Felsenstein's maximum likelihood algorithm  with a birth/death process that models the evolution of adjacencies. We compute the most likely adjacencies in ancestral genomes and the quantity of gains and losses of adjacencies in all the branches of a species trees, thus providing an insight into the dynamics of rearrangements in these lineages. The model is implemented in Bio++ , the present form allowing at most one duplication node between two speciation nodes in gene trees. We compute the likelihood of gene orders in a set of 12 drosophila whose genomes are annotated in the Ensembl Metazoa  database. We optimize branch lengths in a species phylogeny and construct ancestral genomes. We compare the results with a parsimony approach, showing that while most adjacencies inferred by parsimony have a good probability, a non negligible proportion (> 11%) are not supported (posterior probability < 0.5).
A rooted species tree is a binary tree that describes the evolutionary relationships between organisms. The leaves of the tree are available species, internal nodes are ancestral species. The species tree has branch lengths indicating the quantity of expected evolution. Branch lengths can be also estimated as an output.
An ordered set of genes is represented by a set of adjacencies, which are pairs of consecutive genes. For example, a genome A containing the sequence of genes a1 − a2 − a3 − a4 contains adjacencies a1a2, a2a3, and a3a4. Adjacencies are not oriented, meaning that a1a2 is equivalent to a2a1.
Genes are grouped into homologous families across genomes. The evolutionary history of each family is represented by a rooted gene tree. Gene trees are reconciled with the species tree (see precomputation below).
In this section, we give an overview of the main steps in our method. All these steps are detailed in the following sections, except the precomputation, for which we refer to .
Precomputation: gene trees are reconciled with the species tree in order to minimize the number of duplications and losses (using DeCo ). It consists in annotating each internal node by an ancestral gene, together with the species it belongs to, and the evolutionary event (speciation, duplication, loss) taking place at the bifurcation. This determines a set of ancestral genes for all ancestral species. Gene losses are also annotated in the trees.
Classify extant adjacencies so that every class can be handled independently. Inside each class, two gene families with two trees are involved and all adjacencies have an extremity in each family.
For each selected pair of gene families, construct a tree, called the tree of possible adjacencies. Its nodes are all the couples of nodes from each gene tree, which are in the same extant or ancestral species (the speciation nodes), plus some duplication nodes; the leaves are labeled with the pattern of presence/absence of the possible adjacencies in the data.
Compute, between successive nodes of this tree, the probability of presence or absence of the adjacency using the model of evolution described below.
Compute the likelihood of the adjacency given the observed adjacencies.
Compute a posteriori probabilities of presence of ancestral adjacencies.
The likelihood computation for one adjacency tree allows to obtain a likelihood for the whole dataset by multiplying all likelihoods, considered as independent, and to optimize parameters. These can concern branch lengths on the species tree, or a law of differential fragility for different genome sites, modeling different susceptibility to rearrangements among chromosomal regions [25, 26].
We first reduce the problem to two gene trees, without loss of generality, by classifying adjacencies. Reconciled gene trees define ancestral genes of ancestral species. A necessary condition for an adjacency i1i2 to be an ancestor of a1a2 is that i1 is an ancestor of a1 and i2 an ancestor of a2. By the same idea a necessary condition for adjacencies a1a2 and b1b2 to be homologous is that there is a common ancestor i1 of a1 and b1, and a common ancestor i2 of a2 and b2, such that i1 and i2 are in the same species. This condition for homology is an equivalence relation on all extant adjacencies, which can be clustered and treated by equivalence classes of homology. To a class we can associate i1 and i2 the most ancient distinct common ancestors of all adjacency extremities in the class. So every adjacency in the class has an extremity which is a descendant of i1 and an extremity which is a descendant of i2. Without loss of generality we can work with the two sub-trees rooted at i1 and i2.
Trees of possible adjacencies
We now suppose that we have G1 and G2 two reconciled gene trees with some leaves of G1 involved in adjacencies with some leaves of G2. Each node n in G1 and G2 is annotated with an event (speciation, duplication, loss) and a species S(n). Take each pair of nodes i1i2, where i1 and i2 are speciation nodes associated with the same ancestral species s, i1 ∈ G1 and i2 ∈ G2. Since S(i1) = S(i2) and adjacencies exist between leaves of G1 and leaves of G2, i1i2 is called a possible adjacency.
All possible adjacencies define nodes of the tree of possible adjacencies, in which duplication nodes can be added, as explained below.
If i1i2 is a possible adjacency such that S(i1) = S(i2) = s, let s1 and s2 be the two children of s in the species tree. There is a descent path in the tree of possible adjacencies from i1i2 to all possible adjacencies j1j2 in s1 such that i1 is an ancestor of j1 and i2 is an ancestor of j2, and a similar independent path from s to s2. If there is no duplication node between i1 and j1 and i2 and j2, then this path is a single edge. If there is at least one duplication node between i1 and j1 or i2 and j2, then the path from i1i2 to j1j2 has two edges, one between i1i2 and d, a new duplication node, and one from d to j1j2. The node i1i2 always has only two descendants, but the node d can have an arbitrary number, according to the number of duplications in the gene lineages.
Model of evolution
In the case when there is no duplication in the two gene trees, likelihoods can be computed directly from the tree of possible adjacencies (which itself has no duplication nodes) with Felsenstein's algorithm .
An adjacency can be lost because of a rearrangement (1 → 0), or because at least one of the two adjacent genes is lost. In the first case, the state of the leaf in the tree of possible adjacency is simply 0. In the second case, we assign an undetermined state ? to the loss leaf in the tree of possible adjacencies to differentiate it from a loss due to a rearrangement. We do not compute probabilities of transition for branches leading to these nodes.
In the case when there are duplication nodes, we write the probabilities according to a model of evolution of adjacencies in presence of duplications: when one gene belonging to an adjacency is duplicated, the adjacency is transmitted to one of the two copies of the gene. This is always verified, whether the duplication is tandem or remote. For example, consider a gene i2 involved in an adjacency i1i2 in species I with a gene i1. In species A (descendant of species I ), i1 has one descendant a1, whereas i2 is duplicated, giving two copies a2 and . If the duplication is in tandem it leads to the gene order , and the only adjacency conserved with a1 is a1a2. Otherwise it leads to the gene order and again only a1a2 is conserved. Note nevertheless that the adjacency can appear later in the phylogeny following a rearrangement.
Between two speciation events, we have no date for duplication events. We argue that fixing a date, for example with gene branch lengths, would be a mistake as the position of a duplication between two speciations influences the transition probabilities. Besides, the probabilistic approach means that we can account for all possible dates. Hence we compute an average transition probability for the duplicated branch over all the moments on the branch of the species where this duplication could have occurred. To do this, we integrate the transition probabilities P(t) uniformly over the length of this branch. Depending on the date of the duplications, the probabilities of the several resulting adjacencies are more or less linked. Hence, the integrated transition probability is no longer from one adjacency to another adjacency, but from one adjacency to the set of all the possible adjacencies that result from the duplication. In the previous example (one duplication), the transition probability is from i1i2 to (. We can fully model such a process as several processes in parallel. If Q is the generator of the binary model, Q ⊗ Q ⊗ ... ⊗ Q is the generator of the whole process, where ⊗ is the Kronecker product. Here, from a single Q generator at the beginning of the branch, along the branch each event of duplication gives rise to a larger Kronecker product. From a computational point of view, the whole parallel process is considered all along the branch, but just a subset of the transition probabilities is used.
We restrict here the description of the model to the case when there is at most one duplication node between two speciation nodes in the gene trees, which means that in the tree of possible adjacencies, duplication nodes have at most four descendants (because a gene duplication would have occurred in each gene tree). However, in case of several duplications, the same principle holds, with much more complicated formula.
Let x be the state of adjacency i1i2, y the state of a1a2 and z the state of , (x, y, z) ∈ [0, 1]3. Assuming that is on the duplicated branch, the overall transition probabilities from x to y and z are given by .
The two choices for the duplicated branch are considered during the computation of the likelihood.
Likelihood is computed in the rooted tree of possible adjacencies in a bottom-up way. From here, we describe adjacency nodes with single letters for better clarity. We denote as D i the data that is below node i.
where Y is one possible order of p. If i has n children, Y is a vector of n binary characters representing the states of the n children. Y c is thus the c th element of Y and D c the data below the c th child of i.
This formula is valid for any number of children for the duplication node i, provided N 11 is replaced by an appropriate matrix.
Ancestral adjacencies reconstruction
Ancestral states, that is, posterior probabilities of presence of adjacencies in the tree of possible adjacencies, are reconstructed by a top-down (from the root to the leaves) algorithm following the the bottom-up likelihood computation algorithm. In the top-down likelihood computation algorithm, we compute the conditional likelihoods of each node i according to the conditional likelihood of the data below it (D i ), and to the conditional likelihood of the data that is on the other part of its father f, D f , and to the conditional likelihood that is below the brothers of i (say one brother i′).
where y is the state of i, x is the state of f, z the state of i′ and t′ the length of the branch from f to i′.
From these conditional likelihoods, a posteriori probabilities of presence of adjacencies can be computed. The result is, for each ancestral species, a set of adjacencies associated with probabilities of presence. Transforming it into a bona fide gene order necessitates finding a subset of probable adjacencies in which one ancestral gene can be adjacent to only two others. Efficient methods exist  to do so, but they ignore the main source of possible conflict between adjacencies when they are seen as independently evolving characters: errors in gene trees . So in general we prefer presenting a set of adjacencies associated with probabilities, and leave open the way of choosing among them and/or correcting the input data to avoid conflict.
Implementation and availability
We implemented the model of evolution and the likelihood calculation algorithm in the Bio++ library (http://biopp.univ-montp2.fr/). The algorithm that builds the trees of possible adjacencies was implemented in a separate program which also uses Bio++. Reconciliation was performed with DeCo . All the analytical formulas in our model were computed using Maxima (see Additional file 1). These programs are available upon request to the authors.
Note that the drosophila genomes are not all perfectly assembled and some are fragmented in several hundred contigs. So all the signal does not have to be interpreted as rearrangements, but some of it is due to the absence of adjacencies in extant genomes.
Statistics of extant and ancestral genomes in the drosophila dataset.
adjacencies > 0.9
genes with more than 2 adjacencies
The "degree" column in Table 1 shows that in general less than 4% of the genes harbor a conflicting signal with more than 2 attached adjacencies having posterior probability > 0.9. While this remains a high rate of error, it means that most of the supported signal constitutes linear ancestral contigs or chromosomes. The conflict is variable according to the lineages. A surprisingly high amount of conflict arises for the ancestor of yacuba and erecta, predicted as recent. Perhaps this reflects an ambiguity in the species tree which precisely at this place is debated . It seems that rearrangements support an alternative topology.
Comparison with parsimony
We always reconstruct more ancestral adjacencies than DeCo because DeCo reconstructs ancestral adjacencies up to the last common ancestor of an adjacency class, whereas we reconstruct possible ancestral adjacencies up to the most ancient ancestor of an adjacency class. This explains why many possible ancestral adjacencies have low or no support in the presence/absence pattern at the leaves.
Probabilistic models of evolution have at least four advantages over parsimony approaches: they provide more accurate results in presence of many mutations; they provide a natural support scheme of the results in the form of a probability of ancestral states; the likelihood is computed by an integration over all scenarios rather than choosing only one, even if optimal; and several models at different scales of the genome can be integrated.
But most probabilistic models of gene order evolution are computationally intractable on large datasets, working with too large state spaces. Coding gene order by binary characters is a solution, like for many characters characterized by their presence or absence. Then it is possible, like in , to use a standard model of binary sequence evolution to achieve a probabilistic reconstruction of phylogenies and ancestral gene orders based on the presence/absence of adjacencies in extant species. This way can handle unequal gene content but does not model the processes of joint evolution of gene content and order, and has to simplify the data to make it fit into standard models. As a result a part of the understanding of genome evolution remains out of reach.
This is why we put some efforts in a model of gene neighborhood evolution handling complex histories of genes depicted by their reconciled phylogenies.
We gain several advantages. For example the model allows to follow a pattern of descent of adjacencies. Links between genes evolve, just as genes evolve too. This can be used to detect the positional orthology (orthology of a gene as a locus, in addition to a sequence) when a gene is duplicated in an asymmetric way  - not in tandem, so that from the loci point of view, only one duplicate is a descendant of the unique copy before duplication. Here we allow any kind of duplication, symmetric or not, but in any case an adjacency is transmitted to one copy. In the case of a tandem duplication, this does not yield an asymmetry for the genes, because a gene has two adjacencies, and the two can transmit a descendant to a different copy in the case of a tandem duplication. But in the case of an asymmetric duplication, the two adjacencies are transmitted to the same copy of a gene and a positional homolog is detected.
We also keep track of the evolutionary events that can be responsible for the gain and loss of an adjacency. For example an adjacency can be lost because one of the genes is lost, or because of a rearrangement. It is two different reasons for an adjacency to be absent, and we are able with a model to differentiate both cases.
We found that a significant number of adjacencies inferred by parsimony on a drosophila dataset are not supported by a probabilistic model. It corroborates the usual findings in evolutionary models each time reasonably distant species are compared, whether it is sequence evolution , gene content evolution , or gene order evolution .
There are still several limitations to this work. For the moment the computation time is one of them, the efficiency of optimization algorithms coupled with our model allowed us to work only on a small fixed phylogeny. Theoretically we could even infer phylogenies, coupling a model of sequence evolution and such a model of genome organization evolution, but it will necessitate algorithmic progresses. Another limit is that our current implementation only handles independent duplication events, although we are also developing a model for joint duplications. Finally, the possible presence of many duplications yields intricate integrals difficult to solve analytically, if we want to stick with exact solutions integrating over their position in a branch. Numerical approximations or simplifying hypotheses have to be incorporated. For the moment families with many duplications are filtered out.
The present model is a proof of concept that it is possible to handle whole genomes of dozens of species, including genes with complex histories, into a probabilistic model for gene organization.
We open a path that has many possible continuations:
Handle joint duplications of two consecutive genes as a single duplication event.
Handle more than one gene duplication between two gene speciations.
Handle horizontal gene transfer (a parsimonious framework is available ).
Jointly infer probabilistic presence and absence of genes and gene neighborhoods, using conditional probabilities mixing two models.
Integrate the model into an integrative probabilistic model of genome evolution, handling both sequence evolution and gene content evolution, like Phyldog .
With this integration the model can be used to infer species phylogenies, or at least in the current state of computational complexity, to test among a small number of species phylogenies. For example we will test two different alternative drosophila species tree topologies according to the likelihood of our model, and according to the coherence of ancestral genomes (the linear organization of genes along chromosomes).
Use this model to detect highly variable sites by correlating variable rates of adjacency evolution (in a similar framework as for sequence evolution ) and intergene sizes, and bring a stone to the study of fragile and solid regions .
These constitute our work in progress. We see the model we present here as a decisive step.
Publication of this work is funded by the Agence Nationale pour la Recherche, Ancestrome project ANR-10-BINF01-01. This work was performed using the computing facilities of the Computing Center LBBE/PRABI.
This article has been published as part of BMC Bioinformatics Volume 16 Supplement 14, 2015: Proceedings of the 13th Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics: Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/16/S14.
- Felsenstein J: Inferring Phylogenies. 2004, Sinauer Associates, Incorporated, New YorkGoogle Scholar
- Szöllősi GJ, Boussau B, Abby SS, Tannier E, Daubin V: Phylogenetic modeling of lateral gene transfer reconstructs the pattern and relative timing of speciations. Proc Nat Acad Sci USA. 2012, 109 (43): 17513-17518.PubMedPubMed CentralView ArticleGoogle Scholar
- Sjöstrand J, Tofigh A, Daubin V, Arvestad L, Sennblad B, Lagergren J: A bayesian method for analyzing lateral gene transfer. Syst Biol. 2014, 63 (3): 409-420.PubMedView ArticleGoogle Scholar
- Boussau B, Daubin V: Genomes as documents of evolutionary history. Trends Ecol Evol. 2010, 25 (4): 224-32.PubMedView ArticleGoogle Scholar
- Chauve C, El-Mabrouk N, Guéguen L, Semeria M, Tannier E: Duplication, Rearrangement and Reconciliation: A Follow-Up 13 Years Later. Model Algorithms Genome Evol. Edited by: Chauve C, El-Mabrouk N, Tannier E. 2013, Springer, London, 47-62. Chap. 4View ArticleGoogle Scholar
- Sturtevant AH, Dobzhansky T: Inversions in the Third Chromosome of Wild Races of Drosophila Pseudoobscura, and Their Use in the Study of the History of the Species. Proc Natl Acad Sci USA. 1936, 22 (7): 448-50.PubMedPubMed CentralView ArticleGoogle Scholar
- Sankoff D: Mechanisms of genome evolution: models and inference. Bulletin of international statistical institute. 1989, 47: 461-475.Google Scholar
- Bourque G, Pevzner PA, Tesler G: Reconstructing the genomic architecture of ancestral mammals: lessons from human, mouse, and rat genomes. Genome Res. 2004, 14 (4): 507-16.PubMedPubMed CentralView ArticleGoogle Scholar
- Muffato M, Louis A, Poisnel C-E, Roest Crollius H: Genomicus: a database and a browser to study gene synteny in modern and ancestral genomes. Bioinformatics. 2010, 26 (8): 1119-21.PubMedPubMed CentralView ArticleGoogle Scholar
- Bérard S, Galien C, Boussau B, Szollosi G, Daubin V, Tannier E: Evolution of gene neighborhoods within reconciled phylogenies. Bioinformatics. 2012, 28 (18): 382-388.View ArticleGoogle Scholar
- Gagnon Y, Blanchette M, El-Mabrouk N: A flexible ancestral genome reconstruction method based on gapped adjacencies. BMC Bioinformatics. 2012, 13 Suppl 1 (Suppl 19): 4-Google Scholar
- Durrett R, Nielsen R, York TL: Bayesian estimation of genomic distance. Genetics. 2004, 166 (1): 621-629.PubMedPubMed CentralView ArticleGoogle Scholar
- Larget B, Kadane J, Simon D: A bayesian approach to the estimation of ancestral genome arrangements. Molecular phylogenetics and evolution. 2005, 36 (2): 214-223.PubMedView ArticleGoogle Scholar
- Ma J: A probabilistic framework for inferring ancestral genomic orders. Proceedings of IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2010, 179-184.Google Scholar
- Zhang Y, Hu F, Tang J: A mixture framework for inferring ancestral gene orders. BMC Genomics. 2012, 13 (Suppl 1): 7-View ArticleGoogle Scholar
- Lin Y, Hu F, Tang J, Moret B: Maximum likelihood phylogenetic reconstruction from high-resolution whole-genome data and a tree of 68 eukaryotes. Proc 18th Pacific Symp Bio. 2013, 285-96.Google Scholar
- Yang N, Hu F, Zhou L, Tang J: Reconstruction of ancestral gene orders using probabilistic and gene encoding approaches. PLoS One. 2014, 9 (10): 108796-View ArticleGoogle Scholar
- Blin G, Chauve C, Fertin G, Rizzi R, Vialette S: Comparing genomes with duplications: a computational complexity point of view. IEEE/ACM Trans Comput Biol Bioinform. 2007, 4 (4): 523-534.PubMedView ArticleGoogle Scholar
- Gallut C, Barriel V: Cladistic coding of genomic maps. Cladistics. 2002, 18 (5): 526-536.View ArticleGoogle Scholar
- Maňuch J, Patterson M, Wittler R, Chauve C, Tannier E: Linearization of ancestral multichromosomal genomes. BMC Bioinformatics. 2012, 13 (Suppl 19): 11-Google Scholar
- Wu Y, Rasmussen M, Kellis M: Evolution at the subgene level: domain rearrangements in the drosophila phylogeny. Mol Biol Evol. 2012, 29 (2): 689-705.PubMedPubMed CentralView ArticleGoogle Scholar
- Ma J, Ratan A, Raney B: DUPCAR: reconstructing contiguous ancestral regions with duplications. Journal of computational biology. 2008, 15 (8): 1007-1027.PubMedPubMed CentralView ArticleGoogle Scholar
- Guéguen L, Gaillard S, Boussau B, Gouy M, Groussin M, Rochette NC, Bigot T, Fournier D, Pouyet F, Cahais V, Bernard A, Scornavacca C, Nabholz B, Haudry A, Dachary L, Galtier N, Belkhir K, Dutheil JY: Bio++: efficient extensible libraries and tools for computational molecular evolution. Mol Biol Evol. 2013, 30 (8): 1745-1750.PubMedView ArticleGoogle Scholar
- Cunningham F, Amode MR, Barrell D, Beal K, Billis K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fitzgerald S, Gil L, Gir´on CG, Gordon L, Hourlier T, Hunt SE, Janacek SH, Johnson N, Juettemann T, K¨ah¨ari AK, Keenan S, Martin FJ, Maurel T, McLaren W, Murphy DN, Nag R, Overduin B, Parker A, Patricio M, Perry E, Pignatelli M, Riat HS, Sheppard D, Taylor K, Thormann A, Vullo A, Wilder SP, Zadissa A, Aken BL, Birney E, Harrow J, Kinsella R, Muffato M, Ruffier M, Searle SMJ, Spudich G, Trevanion SJ, Yates A, Zerbino DR, Flicek P: Ensembl 2015. Nucleic Acids Research. 2015, 662-669.Google Scholar
- Peng Q, Pevzner PA, Tesler G: The fragile breakage versus random breakage models of chromosome evolution. PLoS Comput Biol. 2006, 2 (2): 14-View ArticleGoogle Scholar
- Berthelot C, Muffato M, Abecassis J, Roest Crollius H: The 3D Organization of Chromatin Explains Evolutionary Fragile Genomic Regions. Cell Reports. 2015, 10: 1-12.View ArticleGoogle Scholar
- Boussau B, Szöllősi GJ, Duret L, Gouy M, Daubin V: Genome-scale coestimation of species and gene trees. Genome Research. 2013, 23: 323-330.PubMedPubMed CentralView ArticleGoogle Scholar
- Evolution of genes and genomes on the drosophila phylogeny. Nature. 2007, 450 (7167): 203-218. Drosophila 12 Genomes ConsortiumGoogle Scholar
- Pollard DA, Iyer VN, Moses AM, Eisen MB: Widespread discordance of gene trees with species tree in drosophila: evidence for incomplete lineage sorting. PLoS Genet. 2006, 2: 173-View ArticleGoogle Scholar
- Hu F, Zhou J, Zhou L, Tang J: Probabilistic Reconstruction of Ancestral Gene Orders with Insertions and Deletions. IEEE/ACM Trans Comput Biol Bioinforma. 2014, 5963 (c): 1-1.Google Scholar
- Dewey CN: Positional orthology: putting genomic evolutionary relationships into context. Brief. Bioinform. 2011, 12 (5): 401-12.PubMedPubMed CentralView ArticleGoogle Scholar
- Mahmudi O, Sjostrand J, Sennblad B, Lagergren J: Genome-wide probabilistic reconciliation analysis across vertebrates. BMC Bioinformatics. 2013, 14 (Suppl 15): 10-View ArticleGoogle Scholar
- Patterson M, Szöllősi G, Daubin V, Tannier E: Lateral gene transfer, rearrangement, reconciliation. BMC bioinformatics. 2013, 14 (Suppl 15): 4-View ArticleGoogle Scholar
- Yang Z: Among-site rate variation and its impact on phylogenetic analyses. Trends Ecol Evol. 1996, 11: 367-372.PubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.