Automated alignment-based curation of gene models in filamentous fungi
© van der Burgt et al.; licensee BioMed Central Ltd. 2014
Received: 15 July 2013
Accepted: 11 January 2014
Published: 16 January 2014
Automated gene-calling is still an error-prone process, particularly for the highly plastic genomes of fungal species. Improving gene models through quality control and manual curation is time-consuming, requires skilled biologists and is performed only marginally. The wealth of available fungal genomes has not yet been exploited by an automated method that applies quality control to gene models in order to obtain more accurate genome annotations.
We provide a novel method named alignment-based fungal gene prediction (ABFGP) that is particularly suitable for plastic genomes like those of fungi. It assesses gene models on a gene-by-gene basis, making use of informant gene loci. Its performance was benchmarked on 6,965 gene models confirmed by full-length unigenes from ten different fungi, and ABFGP correctly predicted 79.4% of all gene models. It outperforms ab initio gene prediction software owing to a higher sensitivity and precision for all gene model components. The applicability of the method was shown by revisiting the annotations of six different fungi, using gene loci from up to 29 fungal genomes as informants. Between 7,231 and 8,337 genes per genome were assessed by ABFGP, and for each genome between 1,724 and 3,505 gene model revisions were proposed. The reliability of the proposed gene models is assessed by an a posteriori introspection procedure of each intron and exon in the multiple gene model alignment. The total number and type of proposed gene model revisions in the six fungal genomes correlate with the quality of the genome assembly and with the sequencing strategies used by the sequencing centres, highlighting different types of errors in different annotation pipelines. The ABFGP method is particularly successful in discovering sequence errors and/or disruptive mutations that cause truncated and erroneous gene models.
The ABFGP method is an accurate and fully automated quality control method for fungal gene catalogues that can be easily implemented into existing annotation pipelines. With the exponential release of new genomes, the ABFGP method will help decrease the number of gene models that require additional manual curation.
Keywords: Gene model; Automated gene model curation; Sequence error; Truncated gene model; Pseudogene; Fungal genome; Cladosporium fulvum
In the past decade, numerous fungal genomes of importance to medicine, agriculture and industry have been sequenced [1, 2] and continuous innovations in next generation sequencing technology will spur this number to rapidly increase further. Once sequenced and assembled, genomes are annotated through an automated gene-calling pipeline, which is still an error-prone process, particularly for the highly plastic and diverse genomes of fungal species.
Most gene annotation pipelines integrate different gene prediction algorithms to increase the accuracy of the annotation. These algorithms include supervised ab initio, unsupervised ab initio and (supervised) alignment-based gene predictors, which are implemented in tools such as Augustus, GeneMark-ES and TWINSCAN 2.0α, respectively. Augustus is one of the most frequently employed and best-performing supervised ab initio gene prediction tools, and it offers parameterizations for several dozen fungi. For species lacking a provided parameterization, considerable manual input is required to obtain a species-specific parameterization by training the algorithm on a large sample (~1000) of correct gene models. Thus, its applicability is limited to those species for which a parameterization is available [5, 6]. GeneMark-ES-2 is an unsupervised ab initio gene predictor that iteratively trains itself on the input genome sequence alone. It outperformed Augustus, but is reported to be relatively inaccurate in predicting single-exon genes. A hybrid strategy between ab initio and alignment- (or evidence-) based gene prediction is currently implemented in several tools. Updated versions of Augustus integrate evidence obtained from unigene alignments, protein multiple sequence alignments and intron and exon hints acquired from RNA-Seq data, which greatly improves their prediction accuracy. To our knowledge, alignment-based gene prediction in fungi using genomic data alone has only been successfully applied using TWINSCAN 2.0α, which was specifically adapted and trained for Cryptococcus neoformans. In that case, the input was the whole-genome DNA alignment of two strains of this fungus, whose genomes are largely syntenic and exhibit around 95% nucleotide identity in coding regions. The reported gene accuracy of ~60% clearly exceeded that of non-alignment-based ab initio gene prediction software.
TWINSCAN 2.0α requires extensive species-specific training and parameterization, offering a tailor-made solution for a defined pair of related species only. Most importantly, the approach taken in TWINSCAN 2.0α is difficult to apply to fungal genomes because of their high plasticity [8–10]. The absence of conserved regions exhibiting macro- or even meso-synteny between related fungal genomes severely hampers the construction of whole-genome DNA alignments. Besides reshuffled gene orders, a highly variable gene content is also observed among fungi, with a large number of genes showing a discontinuous distribution in the fungal tree of life. This is caused by frequent gene, gene-cluster, segmental and whole-chromosome duplications, losses or horizontal transfers, which have created complex variation in both gene family expansion and reduction [8, 10]. Thus, although homologous gene loci can often be inferred easily between distantly related fungi, annotation of fungal genomes by classical alignment-based gene prediction tools is problematic. In recent years, ensemble predictors have been developed to weigh and combine similarity evidence and the predictions made by various other tools into a single, more accurate gene model [11, 12]. However, this approach “often requires significant effort in implementation to cast comparative information into a form compatible with the existing gene models”.
Because none of the available gene prediction tools was specifically developed for fungal genomes, automatic gene annotation of fungi often yields a relatively high fraction of incorrect gene models. These can only be revised through a time-consuming process of quality control and manual curation by skilled biologists or bioinformaticians, which is often performed only marginally. Manual curation usually involves comparative analyses with tools that can accurately identify a spliced gene structure in a target DNA sequence using a homologous protein sequence as a so-called “informant” sequence (e.g. GeneWise and Scipio). However, a large proportion of gene models and derived protein sequences in current fungal sequence releases contain errors, and a manual curator can easily propagate existing errors when using incorrectly predicted informant protein(s). The re-annotation of the Fusarium graminearum genome exemplifies the marginal quality of fungal gene catalogues. In the new version, 1,770 gene models were revised by using various new gene predictors, exploiting expression data, performing extensive manual curation and selecting the best gene model from alternative predictions on the basis of evidence. Despite this effort, recent RNA-Seq data provided experimental proof for at least another 655 incorrectly predicted gene models in the latest version of the F. graminearum annotation.
We have now entered an era in which clusters of related fungi will be sequenced on a massive scale. Subsequent gene prediction on these genomes will require automation with very little manual inspection. Although gene prediction software suitable for fungal genomes has become more accurate over the last decade, it is still error-prone. A method that facilitates or automates the curation of gene models is therefore needed to increase the accuracy of the catalogues of predicted genes in sequenced fungal genomes. Here, we present a novel gene-by-gene method for alignment-based gene prediction that is particularly suitable for the plastic genomes of fungi. Our method, called alignment-based fungal gene prediction (ABFGP), (i) provides improved accuracy of predicted gene models, (ii) is species-independent, (iii) does not require partial or whole-genome DNA alignments, (iv) does not require supervision and (v) can use a variable number of informant genes. We demonstrate the accuracy and versatility of the ABFGP method by re-annotating the genomes of a selection of six sequenced Ascomycete fungi.
The alignment-based fungal gene prediction (ABFGP) method
The input for ABFGP is a list of orthologous gene loci. One of them is assigned as the target locus to be re-annotated, and all others serve as informants. This list of gene-encoding loci is provided as a multi-FASTA file. A second input option provides additional functionality: each (informant) gene locus is a folder that contains the genomic locus (FASTA format) and, optionally, its currently annotated gene model (GFF format) and unigenes aligned to this locus (GFF format). A provided unigene is used as an additional informant, and its spliced alignment guides the inference of intron-exon boundaries, which enhances prediction performance. A provided gene structure is used to speed up similarity searches by prioritization and to visualize differences between the current annotation and the ABFGP prediction. Optionally, the exons of provided genes can be used as prior knowledge to facilitate detection of poorly conserved parts of the gene. The output of an ABFGP execution is a GFF file containing the predicted gene model and several features that assist manual inspection of the predicted gene model.
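To illustrate the first input option, the sketch below parses a multi-FASTA file of gene loci and separates the target locus from the informants. This is a minimal Python 3 sketch and not ABFGP's own parser, which targets Python 2.6 or higher. The function names `read_fasta` and `split_target`, the record identifiers, and the convention of selecting the target by its identifier are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def read_fasta(text: str) -> Dict[str, str]:
    """Parse multi-FASTA text into an insertion-ordered {identifier: sequence} dict.

    The identifier is the first whitespace-delimited word of each header line.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def split_target(records: Dict[str, str], target_id: str) -> Tuple[str, Dict[str, str]]:
    """Return the target locus sequence and all remaining loci as informants."""
    if target_id not in records:
        raise KeyError(f"target locus {target_id!r} not in input")
    informants = {k: v for k, v in records.items() if k != target_id}
    if not informants:
        raise ValueError("at least one informant locus is required")
    return records[target_id], informants


# Hypothetical input: one target locus and one informant locus
locus_fasta = """>target_locus some description
ATGAAAGT
ttag
>informant_1
ATGAAGGTTTAA
"""
target, informants = split_target(read_fasta(locus_fasta), "target_locus")
```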
Benchmarking of the ABFGP method
Table 1: Benchmarking of the ABFGP performance on validated genes compared to GeneMark-ES (columns include: 10 pooled species; Magnaporthe oryzae)
The ABFGP method was applied to a set of available full-length unigenes from Magnaporthe oryzae and Fusarium verticillioides and compared with GeneMark-ES, which had previously been evaluated on a smaller set of unigenes from these two fungi (Table 1). The ABFGP method outperformed GeneMark-ES on all gene components (exons, introns and nucleotides) in terms of sensitivity, and most noticeably in terms of precision. The gene sensitivity achieved by ABFGP was 81.7% and 82.1% for the unigenes of M. oryzae and F. verticillioides, respectively. These benchmarking results show that the ABFGP method can confidently be applied to improve gene models in fungal genomes.
ABFGP as a tool to curate gene models of six annotated fungal genomes
Table: Gene models in six fungal species re-annotated by the ABFGP method (labels include: Zymoseptoria tritici; fold genome coverage; number of annotated genes; total eligible gene models; Bi-Directional Best Hit (BDBH) and Gene Model Error (GME) categories)
Reliability of the ABFGP-predicted gene models
Table: Introspection of results obtained by the ABFGP method (labels include: Zymoseptoria tritici; total number of assessed genes)
Types of revisions proposed by the ABFGP method
Table: Types of revisions in annotated gene models made by the ABFGP method (labels include: Zymoseptoria tritici; total revised genes; genes containing SEs and/or DMs; genes split by ABFGP; genes merged by ABFGP; total annotated exons; 5’ or 3’ exons removed (−) / added (+); internal exons removed (−) / added (+); total annotated introns; stopless 3n introns removed (−) / added (+))
Increasing the number of informants improves performance of the ABFGP method
ABFGP performance decreased when fewer informants were used or when closely related informants were not available (data not shown). For the curation of a particular gene model, the most closely related fungal species failed to provide informants for 7 to 19% of selected loci (Additional file 5). Conversely, the fungal species that provided the lowest number of informants still contributed 16 to 38% of informant loci. Moreover, the fungal species that provided most of the informant loci are not always the closest relatives. For example, M. fijiensis, the closest relative of Z. tritici, is not among the top three species that provided the highest number of informants (Additional file 5). Similarly, N. haematococca and M. oryzae provided more informants than V. albo-atrum for the curation of V. dahliae. For C. fulvum and M. fijiensis, it is striking that fungi belonging to a different taxonomic class are among the top three species that provided the highest number of informants. Our results show that the six studied fungal gene catalogues differ in quality. Because all informant catalogues were predicted by the same genome sequencing centres (see Additional file 1), similar error rates are expected to occur in their gene models. An unexpectedly low contribution of a species to the pool of informants could therefore be explained by a slightly higher error rate in its gene catalogue. In addition, many genes show a discontinuous distribution in the fungal tree of life [8, 10]. This underlines the importance of selecting informants from a wide phylogenetic spectrum of species rather than from a small set of closely related species.
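Per-species statistics such as those discussed above can be tallied from the informant lists of all target loci. One example is the fraction of selected loci for which a given species provided no informant. The minimal sketch below assumes that the informants of each target have already been mapped to their source species. The function name, species labels and data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def informant_coverage(informants_per_target: Dict[str, Iterable[str]],
                       species: List[str]) -> Dict[str, float]:
    """Fraction of target loci for which each species supplied >= 1 informant."""
    n_targets = len(informants_per_target)
    if n_targets == 0:
        raise ValueError("no target loci")
    counts: Counter = Counter()
    for informant_species in informants_per_target.values():
        counts.update(set(informant_species))  # count a species once per target
    return {sp: counts[sp] / n_targets for sp in species}


# Hypothetical data: target locus -> species that provided an informant for it
example = {
    "target_1": ["sp_A", "sp_B"],
    "target_2": ["sp_B"],
    "target_3": ["sp_A", "sp_B", "sp_C"],
    "target_4": ["sp_B", "sp_B"],
}
coverage = informant_coverage(example, ["sp_A", "sp_B", "sp_C"])
# Fraction of loci for which each species failed to provide an informant
missing = {sp: 1 - frac for sp, frac in coverage.items()}
```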
The ABFGP method accurately predicts intron-exon structures of protein-encoding genes in fungi
The ABFGP method can accurately re-annotate the intron-exon structure in a gene-by-gene fashion when a gene locus is provided with sufficient informants. GeneMark-ES was chosen as a state-of-the-art ab initio gene predictor, and we have shown that the ABFGP method improves the quality of the gene models. This is explained by its higher precision (Table 1), which means that ABFGP reports fewer false positives. Indeed, evidence- or alignment-based methods are in general less prone to wrongly assign additional exons, because exons are only predicted when supported by informants. Predicting introns in compact genomes with numerous small introns is challenging, yet ABFGP achieves both a high sensitivity (91.2%) and a high precision (97.3%) (Table 1). This is achieved by exploiting abundantly occurring intron presence-absence patterns. Sequence errors (SEs) and/or disruptive mutations (DMs) can be confidently recognized as discontinuities when compared with exonic sequences of informant genes. Finally, the lack of synteny in distantly related fungi facilitates recognition of false gene fusions, which are frequently observed errors made by ab initio gene predictors [5, 16]. Adjacent genes with the same orientation are prone to be falsely fused to the target gene, but this is minimized in the ABFGP method because of the shuffled gene order in informant genomes. Whole-genome alignment-based gene prediction benchmarked on a test set of 1,483 genes from two strains of C. neoformans achieved 88% exon sensitivity and 89% exon precision, resulting in an overall gene sensitivity of ~60%. This is low considering the high conservation between the two genomes. This indicates that the gene-by-gene approach of the ABFGP method is more powerful, even when it uses informant genes from evolutionarily distant fungal species. The benchmark test showed uniform performance on unigenes from ten selected species (Additional file 4). Yet, in the case of D. septosporum, this performance was achieved with generic position-specific scoring matrices (PSSMs) that were not derived from its own splice sites. Species-specific parameterization of gene properties has been reported to be crucial for the performance of supervised and unsupervised ab initio as well as alignment-based gene prediction methods. We speculate that in the ABFGP method, the number of informants compensates for the absence of species-specific parameterization.
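Intron presence-absence patterns can be pictured by projecting intron positions onto a protein multiple alignment. For each candidate intron in the target, one then asks which informants have an intron at the same position and phase. The sketch below is a hypothetical illustration of that idea only. The `(column, phase)` representation, the `tolerance` parameter and the function name are assumptions, not the scoring used by ABFGP.

```python
from typing import Dict, List, Tuple

# (alignment column, intron phase 0/1/2): hypothetical representation
Intron = Tuple[int, int]


def intron_support(target_introns: List[Intron],
                   informant_introns: Dict[str, List[Intron]],
                   tolerance: int = 0) -> Dict[Intron, List[str]]:
    """Map each candidate target intron to the informants that have an intron
    of the same phase within `tolerance` alignment columns."""
    support: Dict[Intron, List[str]] = {}
    for col, phase in target_introns:
        support[(col, phase)] = sorted(
            name for name, introns in informant_introns.items()
            if any(abs(c - col) <= tolerance and p == phase for c, p in introns)
        )
    return support


informants = {"inf_A": [(120, 0)], "inf_B": [(121, 0), (310, 2)], "inf_C": []}
print(intron_support([(120, 0), (310, 2), (450, 1)], informants))
# {(120, 0): ['inf_A'], (310, 2): ['inf_B'], (450, 1): []}
```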
ABFGP as a genome-wide annotation assessment tool
Between 7,205 and 8,270 gene models of six fungal genomes were automatically assessed by the ABFGP method. Between 1,724 and 3,505 (on average 2,480) of these gene models were proposed to be incorrect and in need of revision. A more stringent indication of correct revisions is obtained by counting only those revised gene models that were labelled ‘ok’ (Table 2), corrected for the observed error rate of the ABFGP method (based on 79% gene sensitivity). This yields an estimated revision of between 1,362 and 2,769 gene models for each fungal species. These numbers are in the same range as those obtained in a recent genome-wide re-annotation of the F. graminearum genome. That re-annotation was based on predictions by a suite of gene predictors and on expression data, followed by extensive manual curation. In that case, 1,770 gene models were revised, 691 new gene models were added and 286 gene models were removed. Yet, a recent study using RNA-Seq data revised another 655 gene models, showing that the manual curation effort was not yet exhaustive. Their analysis and ours independently show that thousands of genes are still wrongly annotated in the gene catalogues of many published fungal genomes. Interestingly, that study reported the same types of revision as those proposed by the ABFGP method: false gene splits and fusions, novel introns and a decrease in average intron length.
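The bounds of this estimate can be reproduced with simple arithmetic. They match the lowest and highest numbers of proposed revisions (1,724 and 3,505) multiplied by the benchmarked gene sensitivity of 0.79. The minimal check below covers only that arithmetic. The helper name is hypothetical.

```python
GENE_SENSITIVITY = 0.79  # benchmarked ABFGP gene sensitivity quoted in the text


def expected_correct_revisions(n_revised: int, sensitivity: float = GENE_SENSITIVITY) -> int:
    """Scale a number of proposed revisions by the benchmarked gene sensitivity."""
    return round(n_revised * sensitivity)


print(expected_correct_revisions(1724))  # 1362
print(expected_correct_revisions(3505))  # 2769
```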
Types of revision are often related to the annotation pipelines used (Table 2). For example, inclusion of new exons represents a rare class of revisions, except in the two genomes that were annotated at the Broad Institute. In contrast, too many stopless 3n introns were predicted in the genomes of M. fijiensis and Z. tritici, which were sequenced at the JGI. The lowest number of revised gene models was proposed for C. fulvum and D. septosporum, which represent the most recently sequenced and independently annotated genomes. We speculate that this might reflect the steady increase in accuracy of ab initio gene prediction software.
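A stopless 3n intron has a length that is a multiple of three and contains no stop codon in the reading frame of the flanking exons. Such a region could therefore either be spliced out or be retained as coding sequence without breaking the open reading frame. The sketch below tests this property for a single intron. It is an illustrative Python 3 check. The function name and the requirement that `left_exon` starts at a codon boundary are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_stopless_3n(left_exon: str, intron: str, right_exon: str) -> bool:
    """True if retaining `intron` would keep the reading frame intact and open.

    `left_exon` must start at a codon boundary, so its length modulo 3 gives
    the intron phase. Only codons overlapping the intron are inspected.
    """
    if len(intron) % 3 != 0:
        return False  # retention would shift the downstream reading frame
    retained = (left_exon + intron + right_exon).upper()
    start, end = len(left_exon), len(left_exon) + len(intron)
    for i in range((start // 3) * 3, end, 3):
        if retained[i:i + 3] in STOP_CODONS:
            return False
    return True


print(is_stopless_3n("ATGGCA", "GTACGCAAG", "TTTTAA"))  # True
print(is_stopless_3n("ATGGCA", "GTATAGCAG", "TTTTAA"))  # False: in-frame TAG
```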
In this study six different fungi from three distinct phylogenetic classes were re-annotated, using informants from five classes of Ascomycota and two unrelated Basidiomycota. This shows that the ABFGP method is species-independent and can be applied to a wide variety of fungal genomes.
Genome-wide re-annotation by the ABFGP method did not capture the complete gene catalogues (Table 2). This is mainly due to the stringent criteria that were chosen to obtain the most likely orthologous informant genes (see Methods). The effect is most obvious for informant genes obtained from poorly annotated genomes. Besides lowering this threshold, performance for those genes can be improved by not restricting informants to annotated genes. An informant locus can be any genomic region that has ample sequence similarity to the target protein or locus. TBLASTN or TBLASTX could be used to detect loci that were not recognized and annotated as protein-coding genes, or that were poorly annotated (see Figure 1). Loci obtained directly from (non-annotated) genomic sequence could serve as an additional source of informants. This would simultaneously increase the number of eligible target genes and the prediction performance of ABFGP. The reverse strategy could also be employed, using the ABFGP method to generate de novo gene models in regions of the target genome that lack predicted gene models but have significant sequence similarity to predicted proteins in other species. However, a general limitation of de novo evidence-based gene prediction, including the ABFGP method, is that species-specific or fast-evolving genes cannot be annotated, because they lack suitable informants. The ABFGP method follows an approach that differs from the various other ensemble predictors, because it derives its evidence directly from genomic informant sequences. Moreover, it proposes revised gene models that include SEs and/or DMs. Because SEs and DMs occur frequently in the gene catalogues of these fungal genomes, this makes the ABFGP method complementary to other ensemble predictors.
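The sketch below shows one way informant loci might be collected from non-annotated genomic sequence. It parses TBLASTN hits in BLAST+ tabular format, for example from `tblastn -query target_protein.fa -db informant_genome -outfmt 6 -evalue 1e-5` (file and database names are hypothetical). It then merges nearby hits on the same scaffold into candidate loci, padded with 1.5 kb flanks as described in Methods. The `max_gap` value of 2,000 bp is an arbitrary assumption, and strand is ignored for brevity.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Hit = Tuple[str, int, int]  # (subject id, start, end), 1-based, start <= end


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield hits from BLAST+ `-outfmt 6` lines (default 12 columns).

    Column 2 is sseqid; columns 9 and 10 are sstart and send. Minus-strand
    hits have sstart > send and are normalised to start <= end.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        sstart, send = int(fields[8]), int(fields[9])
        yield fields[1], min(sstart, send), max(sstart, send)


def cluster_loci(hits: Iterable[Hit], max_gap: int = 2000, flank: int = 1500,
                 seq_lengths: Optional[Dict[str, int]] = None) -> List[Hit]:
    """Merge hits closer than `max_gap` bp on the same subject, then add flanks."""
    by_subject: Dict[str, List[Tuple[int, int]]] = {}
    for sseqid, start, end in hits:
        by_subject.setdefault(sseqid, []).append((start, end))
    loci: List[Hit] = []
    for sseqid in sorted(by_subject):
        intervals = sorted(by_subject[sseqid])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start - cur_end <= max_gap:
                cur_end = max(cur_end, end)  # extend the current locus
            else:
                loci.append((sseqid, cur_start, cur_end))
                cur_start, cur_end = start, end
        loci.append((sseqid, cur_start, cur_end))
    padded: List[Hit] = []
    for sseqid, start, end in loci:
        hi = end + flank
        if seq_lengths and sseqid in seq_lengths:
            hi = min(hi, seq_lengths[sseqid])  # clamp to the scaffold end
        padded.append((sseqid, max(1, start - flank), hi))
    return padded
```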
Sequence errors and disruptive mutations in fungal genes
Presumed inconsistent gene models were revised in 70 to 83% of all cases (Table 2), and on average 55% of these were labelled by the introspection procedure as ‘ok’ for all introns and exons. Among these revisions was an unexpectedly high number of gene models containing SEs and/or DMs. Because ab initio gene prediction software does not allow in-frame stop codons or frameshift-causing indels, (pseudo)genic regions with strong coding signals will often be predicted as truncated or split gene models. Of the six studied fungi, most revisions were proposed for B. cinerea. This is likely because its Sanger-sequenced genome assembly is supported by only 4.5× coverage and its annotation was performed several years ago. Recently, B. cinerea was resequenced using Illumina, supplemented with some additional small Sanger reads, resulting in a new assembly with 50× coverage. This new sequence revealed not only 31,275 SEs (personal communication Dr. Martijn Staats), but also a considerable number of assembly errors in the original reference sequence. Many of these were located in coding regions that contained annotated, yet apparently fragmented genes (personal communication Dr. Jan van Kan). This could explain the higher frequency of abandoned executions by the ABFGP method for B. cinerea (2.0% versus 0.4-1.2% for the other five fungi, Table 2). However, resequencing confirmed a considerable fraction of the inconsistencies observed in coding regions, indicating that they were not SEs but true DMs. Additional studies on DMs in these six fungal species suggest that pseudogenization is very common in fungi. Our results show that many fungal gene catalogues still contain numerous unidentified truncated and erroneous gene models due to SEs and/or DMs, which are readily detected by the ABFGP method.
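In a spliced coding sequence, an SE or DM typically shows up in one of two ways: as an internal in-frame stop codon, or as a length that is no longer a multiple of three (a frameshift). The minimal sketch below flags both signals. It is illustrative only, because ABFGP detects such discontinuities by comparison with informant exons. The signals alone also cannot distinguish an SE from a true DM, which, as noted above, requires resequencing. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def coding_discontinuities(cds: str) -> dict:
    """Flag reading-frame disruptions in a spliced CDS read from its first base.

    Returns 0-based offsets of in-frame stop codons that occur before the final
    codon, and whether the CDS length suggests a frameshift.
    """
    cds = cds.upper()
    last_codon = (len(cds) // 3 - 1) * 3  # offset of the final complete codon
    internal_stops = [i for i in range(0, last_codon, 3) if cds[i:i + 3] in STOP_CODONS]
    return {"internal_stops": internal_stops, "frameshift_suspected": len(cds) % 3 != 0}


print(coding_discontinuities("ATGAAATAGCCCTGA"))
# {'internal_stops': [6], 'frameshift_suspected': False}
```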
Introspection of proposed gene model revisions
The introspection module for assessing gene model correctness is a useful extension of the ABFGP method, as it helps to prioritize gene models that still need manual curation. For the six fungal genomes, between 3,942 and 5,505 genes were suggested not to require additional manual curation (Table 3). Based on the benchmarked performance of the introspection procedure on the unigene dataset, the rate of genes incorrectly labelled as ‘ok’ is estimated to be 12.9%. This corresponds to only about 500 to 700 gene models containing errors out of the 4,000 to 5,500 labelled ‘ok’. For gene models that were recognized as ‘doubtful’, the ABFGP method provides a GFF track that shows the doubtful parts of the predicted gene model that require manual curation. However, the introspection module still needs further improvement, because 20.6% of the gene models are incorrectly labelled. Of these, 12.9% are labelled ‘ok’ but contain (small) errors, and 7.7% are labelled ‘doubtful’ although the gene models are correct. The number of false positives could possibly be lowered by including ab initio gene model prediction in the ABFGP method, which would allow better detection of species-specific variation of genic regions. This would further increase the efficiency of the ABFGP method as an automated and accurate method for gene model curation.
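The figures in this paragraph follow from simple arithmetic. Applying the 12.9% rate of incorrect ‘ok’ labels to the 3,942 and 5,505 genes labelled ‘ok’ gives roughly 500 and 700 models with errors. Adding the two mislabelling rates gives the 20.6% total. A minimal check is shown below. The names are hypothetical.

```python
OK_BUT_WRONG = 0.129        # labelled 'ok' but containing (small) errors
DOUBTFUL_BUT_RIGHT = 0.077  # labelled 'doubtful' although correct


def expected_erroneous_ok(n_ok: int, rate: float = OK_BUT_WRONG) -> int:
    """Expected number of erroneous gene models among those labelled 'ok'."""
    return round(n_ok * rate)


print(expected_erroneous_ok(3942), expected_erroneous_ok(5505))  # 509 710
print(round(OK_BUT_WRONG + DOUBTFUL_BUT_RIGHT, 3))               # 0.206
```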
The availability of an accurate gene catalogue of an organism is a prerequisite and starting point for functional analyses of its genes. Obtaining such a catalogue with minimal manual input is still a major challenge. The ABFGP method is a useful tool to integrate into existing gene annotation pipelines because it can assess and improve gene models with great accuracy in a fully automated manner. The concept of gene-by-gene alignment-based gene prediction exploits the availability of dozens of sequenced fungal genomes, which is particularly useful for annotating novel genomes of these plastic organisms. The ABFGP introspection procedure, which operates at the gene and intron-exon level, helps to decrease the number of gene models that still require manual curation. Because fungal genome sequencing is undertaken at an accelerating pace, both the quality and the number of informant gene loci are expected to increase in the coming years. This will make more target gene loci eligible for re-annotation and also increase the efficiency and reliability of the ABFGP method.
Sequences, annotations and third party software used
Genomes, proteomes and annotations of 29 fungal species were downloaded from the Fungal Genome Initiative of the Broad Institute and the Fungal Genomics Program of the Joint Genome Institute (JGI) (Additional file 1). Available unigenes from ten fungal species were downloaded from the JGI and The Gene Index Project (http://compbio.dfci.harvard.edu/tgi/). The ABFGP method uses several third-party applications: BLAST 2.2.8, ClustalW 2.0.12, HMMER 2.3.2, SignalP 3.0, TMHMM 2.0, and transeq, getorf and tcode from EMBOSS 6.2.0.
Datasets of assembled unigenes (Additional file 1) were aligned to their genomes using GeneSeqer (October 2005). For each unigene, the intron-exon structure of its coding sequence obtained from this alignment was compared to its annotated gene model. Only full-length unigenes were selected for benchmarking the ABFGP method.
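Comparing a unigene-derived structure with an annotated gene model reduces to comparing CDS exon coordinates on the same genomic locus. The minimal sketch below assumes that both structures are given as 1-based inclusive `(start, end)` CDS exon coordinates. The function names and the three outcome labels are hypothetical. The labels only separate intron-chain differences from differences at the start or stop.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # 1-based inclusive CDS exon coordinates


def introns_from_exons(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Derive intron coordinates from the gaps between consecutive exons."""
    ordered = sorted(exons)
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


def compare_structures(unigene_cds: List[Exon], model_cds: List[Exon]) -> str:
    """Classify an annotated gene model against its unigene-derived structure."""
    if sorted(unigene_cds) == sorted(model_cds):
        return "identical"
    if introns_from_exons(unigene_cds) == introns_from_exons(model_cds):
        return "same introns, different start or stop"
    return "different intron-exon structure"


unigene = [(101, 250), (311, 600)]
print(compare_structures(unigene, [(131, 250), (311, 600)]))
# same introns, different start or stop
```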
An all-versus-all similarity matrix of all proteins from the 29 predicted proteomes was created using BLASTP. From this matrix, informant proteins from different fungi were selected for each target protein by applying the following criteria: (i) the informant protein must be the bi-directional best hit (BDBH) in the informant’s proteome, (ii) the alignment must span at least 70% of the length of both the target and the informant protein, (iii) the relative difference in length between target and informant protein must be below 50% (calculated from ii) and (iv) the bitscore of the alignment between target and informant protein must be at least 10% of the bitscore of the proteins when compared to themselves. As a final criterion, at least four informant proteins had to be available for a target protein, and the number of informants was limited to the 19 most similar ones (based on bitscore). This dataset of genes eligible for ABFGP is referred to as BDBH. A second category was created by lowering the required length coverage to 25% and increasing the allowed length difference to 300%. The resulting target proteins were then filtered for those that were linked to either consistently longer or consistently shorter informant proteins. Consistent protein length variation putatively indicates either species-specific variation or major errors in the corresponding gene model (this dataset is referred to as GME). For both categories, target and informant proteins were loaded into ABFGP as DNA sequences of their genomic loci, flanked by an additional 1.5 kb of sequence upstream of the start codon and downstream of the stop codon. Unigenes aligned to these gene loci were included as additional informants. In the benchmark that uses unigenes, informants were selected only by the BDBH approach, and full-length unigene data aligned to the target gene locus were discarded. The parameters `--abinitio` and `--benchmark` were used to discard the unigene of the target locus and the annotated gene models as hints.
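The informant selection criteria can be expressed as a filter over precomputed BLASTP results. The sketch below implements the BDBH criteria with the thresholds given above. The `Hit` record, its field names and the function names are hypothetical. Two interpretations are also assumptions: the relative length difference is taken as (longer − shorter) / shorter, and criterion (iv) is applied to the self-hit bitscores of both proteins. The GME category could be approximated by calling `passes` with `min_cov=0.25` and `max_len_diff=3.0`; the subsequent consistency filter is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """Precomputed BLASTP comparison of one target with one candidate informant."""
    informant: str
    bdbh: bool                  # bi-directional best hit between the two proteomes
    bitscore: float             # target vs informant
    target_len: int
    informant_len: int
    target_span: int            # residues of the target covered by the alignment
    informant_span: int         # residues of the informant covered by the alignment
    target_self_bits: float     # target vs itself
    informant_self_bits: float  # informant vs itself


def passes(h: Hit, min_cov: float = 0.70, max_len_diff: float = 0.50,
           min_self_frac: float = 0.10) -> bool:
    """Apply criteria (i)-(iv) to a single target-informant pair."""
    coverage_ok = (h.target_span >= min_cov * h.target_len and
                   h.informant_span >= min_cov * h.informant_len)
    shorter, longer = sorted((h.target_len, h.informant_len))
    length_ok = (longer - shorter) / shorter < max_len_diff
    bits_ok = h.bitscore >= min_self_frac * max(h.target_self_bits, h.informant_self_bits)
    return h.bdbh and coverage_ok and length_ok and bits_ok


def select_informants(hits: List[Hit], min_informants: int = 4,
                      max_informants: int = 19) -> List[Hit]:
    """Return the best-scoring informants, or [] if the target is not eligible."""
    kept = sorted((h for h in hits if passes(h)), key=lambda h: h.bitscore, reverse=True)
    return kept[:max_informants] if len(kept) >= min_informants else []
```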
In all benchmark analyses, sensitivity and precision were calculated as described by Picardi and Pesole, where specificity is used as a synonym for precision.
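For reference, the usual definitions are sensitivity Sn = TP / (TP + FN) and precision Pr = TP / (TP + FP). Each is evaluated separately at the nucleotide, exon, intron and gene level. The minimal sketch below works at the nucleotide, exon and gene level. It assumes that exons are 1-based inclusive coordinates on the same locus, and that an exon counts as a true positive only when both of its boundaries match exactly. The function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Exon = Tuple[int, int]  # 1-based inclusive coordinates


def sn_pr(predicted: Iterable, reference: Iterable) -> Tuple[float, float]:
    """Sensitivity TP/(TP+FN) and precision TP/(TP+FP) over two sets of features."""
    pred, ref = set(predicted), set(reference)
    tp = len(pred & ref)
    sn = tp / len(ref) if ref else 0.0
    pr = tp / len(pred) if pred else 0.0
    return sn, pr


def nucleotides(exons: Iterable[Exon]) -> Set[int]:
    """Expand exons into the set of coding nucleotide positions."""
    return {pos for start, end in exons for pos in range(start, end + 1)}


reference = [(1, 100), (151, 300)]  # e.g. unigene-supported CDS exons
predicted = [(1, 100), (161, 300)]  # second exon starts 10 nt too late
print(sn_pr(predicted, reference))                            # (0.5, 0.5)
print(sn_pr(nucleotides(predicted), nucleotides(reference)))  # (0.96, 1.0)
gene_correct = set(predicted) == set(reference)  # gene level: all exons exact
```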
Position Specific Scoring Matrices of genic elements
Definitions of the donor site, acceptor site, branch point and polypyrimidine tract were chosen as described previously. Generic fungal PSSMs (Additional file 2) for the canonical donor (n = 571,185), the non-canonical GC donor (n = 2,428) and the canonical acceptor (n = 576,021) were derived from all splice sites without any ambiguous nucleotide in 25 annotated genomes. The annotations of Cladosporium fulvum, Coccidioides posadasii, Dothistroma septosporum, Nectria haematococca and Trichoderma atroviride were excluded, because these species were added as target and/or informant species at a later stage of the analyses.
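A PSSM of this kind can be derived by counting nucleotides per position over aligned splice-site sequences and converting the resulting frequencies into log-odds scores. The minimal sketch below skips sites that contain ambiguous nucleotides, as described above. The pseudocount of 0.5, the uniform background, the base-2 logarithm, the function names and the example sites are assumptions. The window lengths of the generic fungal PSSMs (Additional file 2) are not reproduced.

```python
import math
from typing import Dict, List, Optional

BASES = "ACGT"


def build_pssm(sites: List[str], pseudocount: float = 0.5,
               background: Optional[Dict[str, float]] = None) -> List[Dict[str, float]]:
    """Log2-odds PSSM from equal-length, aligned splice-site sequences.

    Sites containing any character other than A, C, G or T are skipped.
    """
    bg = background or {b: 0.25 for b in BASES}
    clean = [s.upper() for s in sites if set(s.upper()) <= set(BASES)]
    if not clean:
        raise ValueError("no unambiguous sites")
    width = len(clean[0])
    if any(len(s) != width for s in clean):
        raise ValueError("all sites must have the same length")
    pssm = []
    for i in range(width):
        counts = {b: pseudocount for b in BASES}
        for s in clean:
            counts[s[i]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / bg[b]) for b in BASES})
    return pssm


def score_site(pssm: List[Dict[str, float]], site: str) -> float:
    """Sum of per-position log-odds scores; higher means more site-like."""
    if len(site) != len(pssm):
        raise ValueError("site length differs from PSSM width")
    return sum(column[base] for column, base in zip(pssm, site.upper()))


donors = ["AGGTAAGT", "CAGTAAGT", "AGGTGAGT", "AGGTNAGT"]  # hypothetical; last is skipped
pssm = build_pssm(donors)
```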
Access to the method and data
A technical explanation of the ABFGP method and its GFF visualization is provided in Additional file 2. The source code of the ABFGP method is available (see Availability and requirements). Other datasets are available upon request from the corresponding authors: the complete list of unigene identifiers used for the benchmark analyses (.xls), the predicted gene models from the benchmark that uses unigenes (GFF files) and the genome-wide re-annotation of the six fungi (fasta and simplified GFF files).
Availability and requirements
Project name: ABFGP
Project home page: https://github.com/atevanderburgt/ABFGP
Operating system: Linux, Unix
Programming language: Python
Other requirements: Python 2.6 or higher
Licence: GNU GPL
Any restrictions to use by non-academics: None
Ate van der Burgt, Jerome Collemare and Pierre JGM de Wit were funded in part by the laboratory of Phytopathology of Wageningen University and a grant from The Royal Netherlands Academy of Arts and Sciences. Edouard Severing and Ate van der Burgt were also funded in part by the Applied Bioinformatics Group of Wageningen University. The authors would like to thank Dr. Jan A.L. van Kan and Dr. Martijn Staats for sharing unpublished data on the re-sequenced genome of Botrytis cinerea.
- Grigoriev IV, Nordberg H, Shabalov I, Aerts A, Cantor M, Goodstein D, Kuo A, Minovitsky S, Nikitin R, Ohm RA, et al: The genome portal of the Department of Energy Joint Genome Institute. Nucleic Acids Res. 2012, 40 (Database issue): D26-D32.
- Cuomo CA, Birren BW: The fungal genome initiative and lessons learned from genome sequencing. Methods Enzymol. 2010, 470: 833-855.
- Picardi E, Pesole G: Computational methods for ab initio and comparative gene finding. Methods Mol Biol. 2010, 609: 269-284. 10.1007/978-1-60327-241-4_16.
- Stanke M, Tzvetkova A, Morgenstern B: AUGUSTUS at EGASP: using EST, protein and genomic alignments for improved gene prediction in the human genome. Genome Biol. 2006, 7 (1): S11-11–18.
- Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M: Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res. 2008, 18 (12): 1979-1990. 10.1101/gr.081612.108.
- Tenney AE, Brown RH, Vaske C, Lodge JK, Doering TL, Brent MR: Gene prediction and verification in a compact genome with numerous small introns. Genome Res. 2004, 14 (11): 2330-2335. 10.1101/gr.2816704.
- Stanke M, Schoffmann O, Morgenstern B, Waack S: Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 2006, 7: 62. 10.1186/1471-2105-7-62.
- Ohm RA, Feau N, Henrissat B, Schoch CL, Horwitz BA, Barry KW, Condon BJ, Copeland AC, Dhillon B, Glaser F, et al: Diverse lifestyles and strategies of plant pathogenesis encoded in the genomes of eighteen Dothideomycetes fungi. PLoS Pathog. 2012, 8 (12): e1003037. 10.1371/journal.ppat.1003037.
- Oliver R: Genomic tillage and the harvest of fungal phytopathogens. New Phytol. 2012, 196 (4): 1015-1023. 10.1111/j.1469-8137.2012.04330.x.
- Raffaele S, Kamoun S: Genome evolution in filamentous plant pathogens: why bigger can be better. Nat Rev Microbiol. 2012, 10 (6): 417-430.
- Liu Q, Mackey AJ, Roos DS, Pereira FC: Evigan: a hidden variable model for integrating gene evidence for eukaryotic gene prediction. Bioinformatics. 2008, 24 (5): 597-605. 10.1093/bioinformatics/btn004.
- Bernal A, Crammer K, Pereira F: Automated gene-model curation using global discriminative learning. Bioinformatics. 2012, 28 (12): 1571-1578. 10.1093/bioinformatics/bts176.
- Liu Q, Crammer K, Pereira FC, Roos DS: Reranking candidate gene models with cross-species comparison for improved gene prediction. BMC Bioinformatics. 2008, 9: 433. 10.1186/1471-2105-9-433.
- Birney E, Clamp M, Durbin R: GeneWise and Genomewise. Genome Res. 2004, 14 (5): 988-995. 10.1101/gr.1865504.
- Keller O, Odronitz F, Stanke M, Kollmar M, Waack S: Scipio: using protein sequences to determine the precise exon/intron structures of genes and their orthologs in closely related species. BMC Bioinformatics. 2008, 9: 278. 10.1186/1471-2105-9-278.
- Wong P, Walter M, Lee W, Mannhaupt G, Munsterkotter M, Mewes HW, Adam G, Guldener U: FGDB: revisiting the genome annotation of the plant pathogen Fusarium graminearum. Nucleic Acids Res. 2011, 39 (Database issue): D637-D639.
- Zhao C, Waalwijk C, de Wit PJ, Tang D, van der Lee T: RNA-Seq analysis reveals new gene models and alternative splicing in the fungal pathogen Fusarium graminearum. BMC Genomics. 2013, 14 (1): 21. 10.1186/1471-2164-14-21.
- Kupfer DM, Drabenstot SD, Buchanan KL, Lai H, Zhu H, Dyer DW, Roe BA, Murphy JW: Introns and splicing elements of five diverse fungi. Eukaryot Cell. 2004, 3 (5): 1088-1100. 10.1128/EC.3.5.1088-1100.2004.
- Nielsen CB, Friedman B, Birren B, Burge CB, Galagan JE: Patterns of intron gain and loss in fungi. PLoS Biol. 2004, 2 (12): e422. 10.1371/journal.pbio.0020422.
- de Wit PJ, van der Burgt A, Okmen B, Stergiopoulos I, Abd-Elsalam KA, Aerts AL, Bahkali AH, Beenen HG, Chettri P, Cox MP, et al: The genomes of the fungal plant pathogens Cladosporium fulvum and Dothistroma septosporum reveal adaptation to different hosts and lifestyles but also signatures of common ancestry. PLoS Genet. 2012, 8 (11): e1003088. 10.1371/journal.pgen.1003088.
- Amselem J, Cuomo CA, van Kan JA, Viaud M, Benito EP, Couloux A, Coutinho PM, de Vries RP, Dyer PS, Fillinger S, et al: Genomic analysis of the necrotrophic fungal pathogens Sclerotinia sclerotiorum and Botrytis cinerea. PLoS Genet. 2011, 7 (8): e1002230. 10.1371/journal.pgen.1002230.
- Klosterman SJ, Subbarao KV, Kang S, Veronese P, Gold SE, Thomma BP, Chen Z, Henrissat B, Lee YH, Park J, et al: Comparative genomics yields insights into niche adaptation of plant vascular wilt pathogens. PLoS Pathog. 2011, 7 (7): e1002137. 10.1371/journal.ppat.1002137.
- Goodwin SB, M'Barek SB, Dhillon B, Wittenberg AH, Crane CF, Hane JK, Foster AJ, Van der Lee TA, Grimwood J, Aerts A, et al: Finished genome of the fungal wheat pathogen Mycosphaerella graminicola reveals dispensome structure, chromosome plasticity, and stealth pathogenesis. PLoS Genet. 2011, 7 (6): e1002070. 10.1371/journal.pgen.1002070.
- van der Burgt A, Karimi M, Bahkali AH, de Wit PJ: Pseudogenization in pathogenic fungi with different host plants and lifestyles might reflect their evolutionary past. Mol Plant Pathol. 2013, 15: 133-144. in press.
- Staats M, van Kan JA: Genome update of Botrytis cinerea strains B05.10 and T4. Eukaryot Cell. 2012, 11 (11): 1413-1414. 10.1128/EC.00164-12.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.