- Research Article
- Open Access
Inferring synteny between genome assemblies: a systematic evaluation
© The Author(s). 2018
- Received: 26 June 2017
- Accepted: 15 January 2018
- Published: 30 January 2018
Genome assemblies across all domains of life are being produced routinely. Initial analysis of a new genome usually includes annotation and comparative genomics. Synteny provides a framework in which conservation of homologous genes and gene order is identified between genomes of different species. The availability of human and mouse genomes paved the way for algorithm development in large-scale synteny mapping, which eventually became an integral part of comparative genomics. Synteny analysis is regularly performed on assembled sequences that are fragmented, neglecting the fact that most methods were developed using complete genomes. It is unknown to what extent draft assemblies lead to errors in such analysis.
We fragmented genome assemblies of model nematodes to various extents and conducted synteny identification and downstream analysis. We first show that synteny between species can be underestimated up to 40% and find disagreements between popular tools that infer synteny blocks. This inconsistency and further demonstration of erroneous gene ontology enrichment tests raise questions about the robustness of previous synteny analysis when gold standard genome sequences remain limited. In addition, assembly scaffolding using a reference guided approach with a closely related species may result in chimeric scaffolds with inflated assembly metrics if a true evolutionary relationship was overlooked. Annotation quality, however, has minimal effect on synteny if the assembled genome is highly contiguous.
Our results show that a minimum N50 of 1 Mb is required for robust downstream synteny analysis, which emphasizes the importance of gold standard genomes to the science community, and should be achieved given the current progress in sequencing technology.
- Genome synteny
- Assembly quality
- Comparative genomics
- Nematode genomes
The essence of comparative genomics lies in how we compare genomes to reveal species’ evolutionary relationships. Advances in sequencing technologies have enabled the exploration of many new genomes across all domains of life [1–8]. Unfortunately, in most instances correctly aligning even just two genomes at base-pair resolution can be challenging. A genome usually contains millions or billions of nucleotides and is different from the genome of a closely related species as a result of evolutionary processes such as sequence mutations, chromosomal rearrangements, and gene family expansion or loss. There are high computational costs when trying to align and assign multiple copies of DNA that are identical to each other, such as segmental duplications and transposable elements [9–12]. In addition, it has been shown that popular alignment methods disagree with each other .
An alternative and arguably more practical approach relies on the identification of synteny blocks [13, 14], first described as homologous genetic loci that co-occur on the same chromosome [15, 16]. Synteny blocks are more formally defined as regions of chromosomes between genomes that share a common order of homologous genes derived from a common ancestor [17, 18]. Alternative names such as conserved synteny or collinearity have been used interchangeably [13, 19–22]. Comparisons of genome synteny between and within species have provided an opportunity to study evolutionary processes that lead to diversity of chromosome number and structure in many lineages across the tree of life [23, 24]; early discoveries using such approaches include chromosomal conserved regions in nematodes and yeast [25–27], evolutionary history and phenotypic traits of extremely conserved Hox gene clusters across animals and MADS-box gene family in plants [28, 29], and karyotype evolution in mammals  and plants . Analysis of synteny in closely related species is now the norm for every new published genome. However, assembly quality comes into question as it has been demonstrated to affect subsequent analysis such as annotation or rate of lateral transfer [32, 33].
In general, synteny identification is a filtering and organizing process of all local similarities between genome sequences into a coherent global picture . The most intuitive way to identify synteny would be to establish from selective genome alignments [35, 36], but levels of nucleotide divergence between species may make such methodologies challenging. Instead, many tools use orthologous relationships between protein-coding genes as anchors to position statistically significant local alignments. Approaches include the use of a directed acyclic graph [37, 38], a gene homology matrix (GHM) , and an algorithm using reciprocal best hits (RBH) . All of these methods generally agree on long synteny blocks, but have differences in handling local shuffles as well as in the resolution of synteny identification [34, 40]. Better resolution of micro-rearrangements in synteny patterns has been shown when using an improved draft genome of Caenorhabditis briggsae versus Caenorhabditis elegans [26, 41]. Hence, synteny analysis depends highly on assembly quality. For example, missing sequences in an assembly may lead to missing gene annotations and subsequently missing orthologous relationships . With respect to assembly contiguation, it still remains unclear whether assembly fragmentation affects homology assignments for identifying anchors, sequence arrangements for examining order and gaps, or other factors in synteny analysis.
In this study, we focus on how assembly quality affects the identification of genome synteny. We investigate the correlation between error rate (%) in detecting synteny and the level of assembly contiguation using five popular software packages (DAGchainer , i-ADHoRe , MCScanX , SynChro , and Satsuma ) on four nematodes: Caenorhabditis elegans, Caenorhabditis briggsae, Strongyloides ratti, and Strongyloides stercoralis. We also carried out and explored the effects of three scenarios associated with synteny analysis: gene ontology (GO) enrichment, reference-guided assembly scaffolding, and annotation quality. Our findings show that assembly quality does matter in synteny analysis, and fragmented assemblies ultimately lead to erroneous findings. In addition, the true evolutionary relationship may be lost if a fragmented assembly is scaffolded using a reference-guided approach. Our main aim here is to determine a minimum contiguation of assembly for subsequent synteny analysis to be trustworthy, which should be possible using the latest sequencing technologies .
Definition of synteny block, break and coverage
Evaluation of synteny identification programs in fragmented assemblies
We then fragmented the C. elegans genome into fixed intervals of either 100 kb, 200 kb, 500 kb or 1 Mb to evaluate the performance of different programs when using self-comparisons (see Methods). Synteny coverages of the fragmented assembly (query) against the original assembly (reference) were calculated for both query and reference sequences. Generally, synteny coverage decreased as the assembly was broken into smaller pieces. For example, an average of 16% decrease in synteny coverage was obtained using the assembly with fixed fragment size of 100 kb (Additional file 2: Table S1). Sites of fragmentation are highly correlated with synteny breaks in anchor-based programs (Fig. 2, Additional file 3: Figure S2, and Additional file 4: Figure S3). One explanation is that the fragmented assembly introduced breaks within genes that resulted in loss of anchors (Fig.1, break a), which can be common in real assemblies if introns contain hard to assemble sequences . Another explanation is that the breaks between genes lead to the number of anchors not reaching the required minimum (Fig. 1, Break a). For the case of Satsuma, synteny identification was not affected by assembly fragmentation (Fig. 2, Additional file 3: Figure S2, and Additional file 4: Figure S3; Additional file 2: Table S1).
Contribution of assembly contiguation and intrinsic species effect to synteny analysis
A common analysis when reporting a new genome is inferring synteny against a closely related species. Hence, we reanalyzed synteny between C. elegans and C. briggsae, S. ratti and S. stercoralis. Satsuma found only 19% and 54% synteny in C. elegans-C. briggsae and S. ratti-S. stercoralis comparisons, respectively, presumably because of difficulty in establishing orthology at the nucleotide level (Additional file 2: Table S1). On average, the four anchor-based tools found 77% and 83% synteny between C. elegans-C. briggsae and S. ratti-S. stercoralis respectively (Additional file 2: Table S1). In contrast to the general agreement on within-species self-comparisons, the anchor-based tools varied considerably on these inter-species (i.e. more diverged) comparisons (Additional file 6: Figure S5 and Additional file 2: Table S1). For example, in the C. elegans-C. briggsae comparisons, a difference of 25% in synteny coverage was found between the results of i-ADHoRe and SynChro (Additional file 6: Figure S5 and Additional file 2: Table S1), while this tool variation was interestingly much lower in S. ratti-S. stercoralis—only a 9% difference (Additional file 2: Table S1). To increase the complexity, we fragmented C. briggsae and S. stercoralis into fixed sequence sizes using the same approach as previously mentioned and compared them with the genome of C. elegans and S. ratti, respectively. We found that MCScanX still underestimated synteny the most as the scaffold size decreased from 1 Mb to 100 kb. Strikingly, the median error rate was as high as 40% in C. elegans-C. briggsae but only 12% in S. ratti-S. stercoralis comparisons (Fig. 4c and d). The error rate is also as high as 40% and largely variable in the comparison between S. ratti and 100 kb fragmented S. stercoralis using Satsuma (Fig. 4d). This observation suggests that higher gene density leads to a more robust synteny detection in fragmented assemblies when more anchors (genes) are available in a given sequence (Additional file 1: Table 1 and Additional file 1: Figure S1).
Synteny identification in real-world scenarios
Erroneous findings from fragmented assemblies in synteny analysis
True synteny may be compromised by reference-guided assembly methods
Annotation quality has little effect on synteny analysis
Synteny analysis is a practical way to investigate the evolution of genome structure [28–31, 56]. In this study, we have revealed how genome assembly contiguity affects synteny analysis. We present a simple scenario of breaking an assembly into a more fragmented state, which only mimics part of the poor assembly problem. Our genome fragmentation method randomly breaks sequences into same-sized pieces, which gives rise to a distribution of sequence length with peaks enriched in tiny sequences and fixed-size fragments (Fig. 5b). Real assemblies, which usually comprise a few large sequences and many more tiny sequences that are difficult to assemble because of their repetitive nature [25, 26], possess very different sequence length distributions (Fig. 5b). It is probable that we overestimated error rate in regions that can be easily assembled and underestimated error rate in regions that will be more fragmented, but overall synteny coverage were comparable (Fig. 5a). Note that some of the sequences in real assemblies may contain gaps (scaffolds) that will result in more missing genes and will result in further underestimation of synteny. Our results are quite similar when a de novo Pacbio C. elegans assembly and three older versions C. briggsae assemblies were compared to the reference C. elegans genome (Fig. 5a). The use of long read technology and advanced genome mapping such as the Hi-C approach  are becoming the norm for de novo assembly projects. Assemblies with lower contiguation were not compared here as we emphasize the responsibility of research groups to produce assemblies that are of the higher contiguity, made possible by long reads .
Synteny identification from different programs (i.e., DAGchainer , i-ADHoRe , MCScanX , SynChro , and Satsuma ) performed across different species (C. elegans, C. briggsae, S. ratti, and S. stercoralis) have allowed us to examine the wide-ranging effects of assembly contiguation on downstream synteny analysis. Satsuma demonstrates fewer contiguation-dependent patterns as its detection of synteny relies on nucleotide alignments (Fig. 2). However, we show that Satsuma was only robust when comparing species with very low divergence, for example, between strains or assembly versions from the same species. Only ~ 19% of C. elegans and C. briggsae were identified as syntenic using Satsuma, and ~ 54% in S. ratti-S. stercoralis (Additional file 2: Table S1). Because initial identification of synteny coverage was low, any regions that failed to align in fragmented assemblies would consist of larger proportion of the original synteny coverage and lead to a higher error rate (Fig. 4d).
The other four programs, which are anchor-based, tend to produce the same results when the original assembly is compared to itself, but differ extensively when assemblies become fragmented (Fig. 2). It is interesting to note that DAGchainer and MCScanX use the same scoring algorithm for determining synteny regions, except that DAGchainer uses the number of genes without orthology assignment as gaps while MCScanX uses unaligned nucleotides. When comparing closely related species, results from the four anchor-based programs fluctuate even without fragmentation in Caenorhabditis species, while the pattern remains similar to self-comparison in Strongyloides species (Fig. 4). Sensitivity in synteny identification drops sharply as the genome assembly becomes fragmented, and thus genome assembly contiguation must be considered when inferring synteny relationships between species. Our fragmentation approach only affects N50, which mostly leads to loss of anchors in synteny analysis. Other scenarios such as unknown sequences (NNNs) in the assembly, rearrangements causing a break in anchor ordering (Fig. 1, break b), or insertions/deletions (Fig. 1, break c) were not addressed and may lead to greater inaccuracies.
We have shown that genomic features such as gene density and length of intergenic regions play an essential role during the process of synteny identification (Fig. 4; Tables 1 and Additional file 2: Table S1). Synteny identification can be established more readily in species with higher gene density or shorter intergenic space, which is related to the initial setting of minimum anchors needed for synteny identification (Fig. 1 and Additional file 1: Figure S1). Repetitiveness of paralogs is another factor in how anchors were chosen from homology assignments. For example, we found that synteny coverage is low along chromosomal arm regions of C. elegans in both self-comparison and versus C. briggsae, which has been reported to have expansion of G protein-coupled receptor gene families  (Fig. 2 and Additional file 6: Figure S5). Nevertheless, this case may be a result of a combination of repetitive paralogs and high gene density.
Interestingly, synteny comparison with scaffolded assemblies using ALLMAPS  exhibited unexpected variation among programs. Unfortunately, we did not resolve the reason behind the sharp decrease in synteny coverage from i-ADHoRe (Fig. 7). Nevertheless, we have shown that it is dangerous to scaffold an assembly using a reference from closely related species without a priori information about their synteny relationship. Subsequent synteny identification would be misleading if the same reference was compared again  (Fig. 8). We also considered the interplay between genome annotation, assembly and synteny identification. Although it may be intuitive to assume lower annotation quality can lead to errors in synteny analysis, we demonstrated that such influence was minimal if an initial genome assembly of good contiguation is available (Table 2).
In conclusion, this study has demonstrated that a minimum quality of genome assembly is essential for synteny analysis. To keep the error rate below 5% in synteny identification, we suggest that an N50 of 200 kb and 1 Mb is required when gene density of species of interest are 290 and 200 genes per Mb, respectively (Tables 1 and Additional file 1: Figure S1). This is a minimum standard, and a higher N50 may be required for other species with lower gene density or highly expanded gene families.
Data preparation and fragmentation simulation
Identification of Synteny blocks
The four anchor-based programs DAGchainer , i-ADHoRe  (v3.0), MCScanX  and SynChro , and the nucleotide alignment-based Satsuma , were used to identify synteny blocks. Settings for each program were modified to resemble each other on the results of C. elegans vs. C. elegans, where synteny should be close to 100%, with the exception of default setting in Satsuma. All of the anchor-based programs use gene orthology to find anchor points during process of synteny blocks detection. For DAGchainer, i-ADHoRe and MCScanX, we obtained gene orthology from OrthoFinder  (v0.2.8). SynChro has an implemented program called OPSCAN to scan for gene orthology. We arranged the following parameters for each program: DAGchainer (accessory script filter_repetitive_matches.pl was run with option 5 before synteny identification as recommended by manual; options: -Z 12 -D 10 -A 5 -g 1), i-ADHoRe (only top 1 hit of each gene in input blast file was used as recommended; options: cluster_type = collinear, alignment_method = gg2, max_gaps_in_alignment = 10, tandem_gap = 5, gap_size = 10, cluster_gap = 10, q_value = 0.9, prob_cutoff = 0.001, anchor_points = 5, level_2_only = false), MCScanX (only top 5 hits of each gene in the input blast file was used as suggested; options: default) and SynChro (options: 0 6; 0 for all pairwise, and 6 for delta of RBH genes). To calculate synteny coverage, the total length of merged synteny blocks along scaffolds was divided by total assembly size.
Visualization of synteny linkages was made by R (v3.3.1) and circos  (v0.69–4). Gene ontology enrichment analysis was performed using the topGO  (v1.0) package in R and only focused on Biological Process (options: nodeSize = 3, algorithm = “weight01”, statistic = “Fisher”). Gene ontology associations files for C. elegans and C. briggsae were downloaded from WormBase WS255 . Gene orthology was assigned by OrthoFinder . Then, assemblies were scaffolded using ALLMAPS  with a reference guided approach. De novo annotations of C. elegans were predicted using either the manually trained species parameter or from BUSCO  (v2.0).
We thank John Wang for commenting on the manuscript.
D.L and I.J.T are funded by Academia Sinica.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Analysis: DL Wrote the manuscript: DL, MH and IJT. Conceived and directed the project: IJT. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Gordon D, Huddleston J, Chaisson MJ, Hill CM, Kronenberg ZN, Munson KM, Malig M, Raja A, Fiddes I, Hillier LW, et al. Long-read sequence assembly of the gorilla genome. Science (New York, NY). 2016;352:aae0344.View ArticleGoogle Scholar
- Lien S, Koop BF, Sandve SR, Miller JR, Matthew P, Leong JS, Minkley DR, Zimin A, Grammes F, Grove H, et al. The Atlantic salmon genome provides insights into rediploidization. Nature. 2016;533:200–5.View ArticlePubMedGoogle Scholar
- Iorizzo M, Ellison S, Senalik D, Zeng P, Satapoomin P, Huang J, Bowman M, Iovene M, Sanseverino W, Cavagnaro P, et al. A high-quality carrot genome assembly provides new insights into carotenoid accumulation and asterid genome evolution . Nat Genet. 2016;48:657–66.Google Scholar
- Jarvis DE, Ho YS, Lightfoot DJ, Schmöckel SM, Li B, Borm TJA, Ohyanagi H, Mineta K, Michell CT, Saber N, et al. The genome of Chenopodium Quinoa. Nature. 2017;542:1-6.Google Scholar
- Ma L, Chen Z, Huang DW, Kutty G, Ishihara M, Wang H, Abouelleil A, Bishop L, Davey E, Deng R, et al. Genome analysis of three Pneumocystis species reveals adaptation mechanisms to life exclusively in mammalian hosts. Nat Commun. 2016;7:10740.View ArticlePubMedPubMed CentralGoogle Scholar
- de Man TJB, Stajich JE, Kubicek CP, Teiling C, Chenthamara K, Atanasova L, Druzhinina IS, Levenkova N, SSL B, Barribeau SM, et al. Small genome of the fungus Escovopsis weberi, a specialized disease agent of ant agriculture. Proc Natl Acad Sci. 2016;113:3567–72.View ArticlePubMedPubMed CentralGoogle Scholar
- Hunt VL, Tsai IJ, Coghlan A, Reid AJ, Holroyd N, Foth BJ, Tracey A, Cotton JA, Stanley EJ, Beasley H, et al. The genomic basis of parasitism in the Strongyloides clade of nematodes. Nat Genet. 2016;48:299–307.View ArticlePubMedPubMed CentralGoogle Scholar
- Cotton JA, Bennuru S, Grote A, Harsha B, Tracey A, Beech R, Doyle SR, Dunn M, JCD H, Holroyd N, et al. The genome of Onchocerca volvulus, agent of river blindness. Nat Microbiol. 2016;2:16216.View ArticlePubMedPubMed CentralGoogle Scholar
- Chen X, Tompa M. Comparative assessment of methods for aligning multiple genome sequences. Nat Biotechnol. 2010;28:567–72.View ArticlePubMedGoogle Scholar
- Alkan C, Coe BP, Eichler EE. Genome structural variation discovery and genotyping. Nat Rev Genet. 2011;12:363–76.View ArticlePubMedPubMed CentralGoogle Scholar
- Treangen TJ, Salzberg SL. Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat Rev Genet. 2012;46:36–46.View ArticleGoogle Scholar
- Uricaru R, Michotey C, Chiapello H, Rivals E. YOC, a new strategy for pairwise alignment of collinear genomes. BMC Bioinf. 2015;16:111.View ArticleGoogle Scholar
- Ehrlich J, Sankoff D, Nadeau JH: Synteny conservation and chromosome rearrangements during mammalian evolution. Genetics 1997, 296:289-296.Google Scholar
- Ghiurcuta CG, BME M. Evaluating synteny for improved comparative studies. Bioinformatics. 2014;30:9–18.View ArticleGoogle Scholar
- Renwick JH. The mapping of human chromosome. Annu Rev Genet. 1971;5:81–120.View ArticlePubMedGoogle Scholar
- Nadeau JH. Maps of linkage and synteny homologies between mouse and man. Trends Genet. 1989;5:82—6.Google Scholar
- Vergara IA, Chen N. Large synteny blocks revealed between Caenorhabditis Elegans and Caenorhabditis Briggsae genomes using OrthoCluster. BMC Genomics. 2010;11:516.View ArticlePubMedPubMed CentralGoogle Scholar
- Tang H, Lyons E, Pedersen B, Schnable JC, Paterson AH, Freeling M. Screening synteny blocks in pairwise genome comparisons through integer programming. BMC Bioinformatics. 2011:1-11.Google Scholar
- Schmidt R. Synteny - recent advances and future prospects. Curr Opin Plant Biol. 2000;3:97–102.View ArticlePubMedGoogle Scholar
- Vandepoele K, Saeys Y, Simillion C, Raes J, Van de Peer Y. The automatic detection of homologous regions (ADHoRe) and its application to microcolinearity between Arabidopsis and rice. Genome Res. 2002;12:1792–801.View ArticlePubMedPubMed CentralGoogle Scholar
- Coghlan A, Eichler EE, Oliver SG, Paterson AH, Stein L. Chromosome evolution in eukaryotes: a multi-kingdom perspective. Trends Genet. 2005;21:673–82.View ArticlePubMedGoogle Scholar
- Molinari NA, Petrov DA, Price HJ, Smith JD, Gold JR, Vassiliadis C, Dudley JW, Biradar DP, Devos KM, Bennetzen JL, et al. Synteny and collinearity in plant genomes. Science. 2008;320(5875):486—8.Google Scholar
- Zhang G, Li B, Li C, MTP G, Jarvis ED, Wang J. Comparative genomic data of the avian Phylogenomics project. GigaScience. 2014;3:26.View ArticlePubMedPubMed CentralGoogle Scholar
- Howe KL, Bolt BJ, Cain S, Chan J, Chen WJ, Davis P, Done J, Down T, Gao S, Grove C, et al. WormBase 2016: expanding to enable helminth genomic research. Nucleic Acids Res. 2016;44:D774–80.View ArticlePubMedGoogle Scholar
- C. elegans Sequencing Consortium TCeS, Fleischmann RD, Bult CJ, Goffeau A, Coulson AR, Coulson A, Coulson A, Greenwald I, Coulson A, Sulston J, et al. Genome sequence of the nematode C. Elegans: a platform for investigating biology. Science (New York, NY). 1998;282:2012–8.View ArticleGoogle Scholar
- Stein LD, Bao Z, Blasiar D, Blumenthal T, Brent MR, Chen N, Chinwalla A, Clarke L, Clee C, Coghlan A, et al. The genome sequence of Caenorhabditis Briggsae: a platform for comparative genomics. PLoS Biol. 2003;1:E45.View ArticlePubMedPubMed CentralGoogle Scholar
- Wong S, Wolfe KH. Birth of a metabolic gene cluster in yeast by adaptive gene relocation. Nat Genet. 2005;37:777–82.View ArticlePubMedGoogle Scholar
- Luebeck G. Genomic evolution of metastasis.Editorial. Nature. 2010;467:1053–4.View ArticlePubMedGoogle Scholar
- Ruelens P, de Maagd RA, Proost S, Theißen G, Geuten K, Kaufmann K. FLOWERING LOCUS C in monocots and the tandem origin of angiosperm-specific MADS-box genes. Nat Commun. 2013;4:2280.View ArticlePubMedGoogle Scholar
- Kemkemer C, Kohn M, Cooper DN, Froenicke L, Högel J, Hameister H, Kehrer-Sawatzki H. Gene synteny comparisons between different vertebrates provide new insights into breakage and fusion events during mammalian karyotype evolution. BMC Evol Biol. 2009;9:84.View ArticlePubMedPubMed CentralGoogle Scholar
- Murat F, Armero A, Pont C, Klopp C, Salse J. Reconstructing the genome of the most recent common ancestor of flowering plants. Nat Genet. 2017;49:490–6.View ArticlePubMedGoogle Scholar
- Denton JF, Lugo-Martinez J, Tucker AE, Schrider DR, Warren WC, Hahn MW. Extensive Error in the Number of Genes Inferred from Draft Genome Assemblies. PLoS Comput Biol. 2014;10(12):e1003998.Google Scholar
- Dupont P-Y, Cox MP. Genomic data quality impacts automated detection of lateral gene transfer in fungi. G3 (Bethesda, Md). 2017;7:g3.116.038448.Google Scholar
- Batzoglou S. The many faces of sequence alignment. Brief Bioinform. 2005;6:6–22.View ArticlePubMedGoogle Scholar
- Minkin I, Patel A, Kolmogorov M, Vyahhi N, Pham S. Sibelia: a fast synteny blocks generation tool for many closely related microbial genomes. Algorithms Bioinformatics. 2013:215-29.Google Scholar
- Grabherr MG, Russell P, Meyer M, Mauceli E, Alföldi J, di Palma F, Lindblad-Toh K. Genome-wide synteny through highly sensitive sequence alignment: Satsuma. Bioinformatics. 2010;26:1145–51.View ArticlePubMedPubMed CentralGoogle Scholar
- Haas BJ, Delcher AL, Wortman JR, Salzberg SL. DAGchainer: a tool for mining segmental genome duplications and synteny. Bioinformatics. 2004;20:3643–6.View ArticlePubMedGoogle Scholar
- Wang Y, Tang H, Debarry JD, Tan X, Li J, Wang X, Lee TH, Jin H, Marler B, Guo H, et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40:1–14.View ArticleGoogle Scholar
- Proost S, Fostier J, De Witte D, Dhoedt B, Demeester P, Van De Peer Y, Vandepoele K. I-ADHoRe 3.0-fast and sensitive detection of genomic homology in extremely large data sets. Nucleic Acids Res. 2012;40:1–11.View ArticleGoogle Scholar
- Drillon G, Carbone A, Fischer G. SynChro: a fast and easy tool to reconstruct and visualize synteny blocks along eukaryotic chromosomes. PLoS One. 2014;9:1–8.View ArticleGoogle Scholar
- Ross JA, Koboldt DC, Staisch JE, Chamberlin HM, Gupta BP, Miller RD, Baird SE, Haag ES. Caenorhabditis Briggsae recombinant inbred line genotypes reveal inter-strain incompatibility and the evolution of recombination. PLoS Genet. 2011;7(7):e1002174.Google Scholar
- Bhutkar A, Russo S, Smith TF, Gelbart WM. Techniques for multi-genome synteny analysis to overcome assembly limitations. Genome Inform Int Conference Genome Inform. 2006;17:152–61.Google Scholar
- Goodwin S, McPherson JD, McCombie WR. Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet. 2016;17:333–51.View ArticlePubMedGoogle Scholar
- Treangen TJ, Ondov BD, Koren S, Phillippy AM. The harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes. Genome Biol. 2014;15:524.View ArticlePubMedPubMed CentralGoogle Scholar
- Viney ME. The biology and genomics of Strongyloides. Med Microbiol Immunol. 2006;195:49–54.View ArticlePubMedGoogle Scholar
- Ward JD. Rendering the intractable more tractable: tools from caenorhabditis elegans ripe for import into parasitic nematodes. Genetics. 2015;201:1279–94.View ArticlePubMedPubMed CentralGoogle Scholar
- Armengol L, Marquès-Bonet T, Cheung J, Khaja R, González JR, Scherer SW, Navarro A, Estivill X. Murine segmental duplications are hot spots for chromosome and gene evolution. Genomics. 2005;86:692–700.View ArticlePubMedGoogle Scholar
- Davidson RM, Gowda M, Moghe G, Lin H, Vaillancourt B, Shiu SH, Jiang N, Robin Buell C. Comparative transcriptomics of three Poaceae species reveals patterns of gene expression evolution. Plant J. 2012;71:492–502.PubMedGoogle Scholar
- Lovell PV, Wirthlin M, Wilhelm L, Minx P, Lazar NH, Carbone L, Warren WC, Mello CV. Conserved syntenic clusters of protein coding genes are missing in birds. Genome Biol. 2014;15(565):1—27.Google Scholar
- Baldauf J, Marcon C, Paschold A, Hochholdinger F. Nonsyntenic genes drive tissue-specific dynamics of differential, nonadditive and allelic expression patterns in maize hybrids. Plant Physiol. 2016;171:00262.02016.Google Scholar
- Assefa S, Keane TM, Otto TD, Newbold C, Berriman M. ABACAS: algorithm-based automatic contiguation of assembled sequences. Bioinformatics. 2009;25:1968–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Husemann P, Stoye J. r2cat: Synteny plots and comparative assembly. Bioinformatics. 2009;26:570–1.View ArticlePubMedPubMed CentralGoogle Scholar
- Tang H, Zhang X, Miao C, Zhang J, Ming R, Schnable JC, Schnable PS, Lyons E, Lu J. ALLMAPS: robust scaffold ordering based on multiple maps. Genome Biol. 2015;16:3.View ArticlePubMedPubMed CentralGoogle Scholar
- Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003;19:215–25.View ArticleGoogle Scholar
- Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV. BUSCO : assessing genome assembly and annotation completeness with single-copy orthologs. Genome Anal. 2015;31:9–10.Google Scholar
- Hillier LW, Miller RD, Baird SE, Chinwalla A, Fulton LA, Koboldt DC, Waterston RH. Comparison of C. Elegans and C. Briggsae genome sequences reveals extensive conservation of chromosome organization and synteny. PLoS Biol. 2007;5:1603–16.View ArticleGoogle Scholar
- Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, Durand NC, Shamim MS, Machol I, Lander ES, Aiden AP, et al. De novo assembly of the Aedes Aegypti genome using hi-C yields chromosome-length scaffolds. Science. 2017;356:92–5.View ArticlePubMedPubMed CentralGoogle Scholar
- PSG C, Grafham DV, Fulton RS, Fitzgerald MG, Hostetler J, Muzny D, Ali J, Birren B, Bruce DC, Buhay C, et al. Genome project standards in a new era of sequencing. Science (New York, NY). 2009;326:4–5.Google Scholar
- Thompson PC, Zarlenga DS, Liu M-Y, Rosenthal BM. Long-read sequencing improves assembly of Trichinella genomes 10-fold, revealing substantial synteny between lineages diverged over 7 million years. Parasitology. 2017;144(10):1—14.Google Scholar
- Emms DM, Kelly S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 2015;16:157.View ArticlePubMedPubMed CentralGoogle Scholar
- Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. Circos: an information esthetic for comparative genomics. Genome Res. 2009;19:1639–45.View ArticlePubMedPubMed CentralGoogle Scholar
- Alexa A, Rahnenfuhrer J. topGO: enrichment analysis for gene ontology. R package version 2260 2016.Google Scholar