Volume 10 Supplement 7
Comparison of annotation terms between automated and curated E. coli K12 databases
© Marpuri and Rinehart; licensee BioMed Central Ltd. 2009
Published: 25 June 2009
Genome sequencing and annotation provide ways to understand genomes. Annotating a genome identifies its genes by their precise start and end sites and describes their cellular components, molecular functions and biological processes. The growing wealth of genomic data has made it necessary to identify the information encoded within genomes, which in turn has led to the development of automated annotation techniques that assign functions to newly sequenced genes based on similarity to previously annotated genes. This approach has some problems: for example, an error in a previously annotated genome can propagate into a whole family of misannotated genes. Automated annotation also usually fails to meet the "gold standard" of curated databases, because automated systems provide less detail and classify proteins into broader categories. To overcome this problem, ontology terms have been used in automated databases as a means of recognizing types of proteins at the level of detail of curated databases.
In this project we compared the results of a predictive automated bacterial annotation program with a curated annotation database such as EcoCyc. EcoCyc is a conservative multidimensional annotation system that is validated by over 15,000 publications. Automated annotation systems such as BASys can be used as first-pass annotation tools that try to add as many annotations as possible by drawing upon over 30 sources. Gene Ontology is a defined library of terms describing the biological process, cellular component and molecular function of a gene in an organism. Because ontology annotations use a limited, common set of terms, we compared ontologies between the BASys and EcoCyc databases. BASys also generated additional non-ontology terms and metadata. Methods were developed to compare these additional terms to the EcoCyc database, and it was found that approximately 17% of the BASys-predicted ontologies matched the EcoCyc database.
Materials and methods
Each annotation term from the respective databases was converted into a common GO number using the corresponding conversion file from the Gene Ontology site (http://www.geneontology.org/).
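The conversion and comparison steps can be illustrated with a minimal Python sketch. It is not the code used in this study. It assumes the conversion files follow the "external term > GO:name ; GO:id" line layout, with comment lines starting with "!"; the mapping excerpt, the gene name geneA and all function names are hypothetical and serve only as an illustration.

```python
import re

# Illustrative excerpt in the assumed conversion-file layout (not real study data).
MAPPING_TEXT = """\
! example mapping of external terms to GO
EC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022
EC:2.7.1.1 > GO:hexokinase activity ; GO:0004396
"""

GO_ID = re.compile(r"GO:\d{7}")


def parse_mapping(text):
    """Build a dict: external term identifier -> set of GO numbers."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without a mapping arrow.
        if not line or line.startswith("!") or " > " not in line:
            continue
        external, target = line.split(" > ", 1)
        key = external.split()[0]  # identifier is the first token before ">"
        ids = GO_ID.findall(target)
        if ids:
            mapping.setdefault(key, set()).update(ids)
    return mapping


def to_go_ids(terms, mapping):
    """Convert a list of external terms into a set of GO numbers."""
    go_ids = set()
    for term in terms:
        go_ids |= mapping.get(term, set())
    return go_ids


def compare(automated, curated):
    """Count predicted GO numbers that also appear in the curated set.

    Both arguments map a gene name to a set of GO numbers.
    Returns (matched, total_predicted).
    """
    matched = total = 0
    for gene, predicted in automated.items():
        reference = curated.get(gene, set())
        matched += len(predicted & reference)
        total += len(predicted)
    return matched, total


mapping = parse_mapping(MAPPING_TEXT)
automated = {"geneA": to_go_ids(["EC:1.1.1.1", "EC:2.7.1.1"], mapping)}
curated = {"geneA": {"GO:0004022"}}
matched, total = compare(automated, curated)
print(f"{matched} of {total} predicted GO numbers matched")
```

Reducing every term to a GO number before comparing means a match depends only on shared identifiers, not on how each database words its annotations.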
Results and conclusion
Summary of matches and mismatches between databases
Bioinformatics and Information Science Center, Western Kentucky University.
- Karp PD, Keseler IM, Shearer A, Latendresse M, Krummenacker M, Paley SM, Paulsen I, Collado-Vides J, Gama-Castro S, Peralta-Gil M, et al.: Multidimensional annotation of the Escherichia coli K-12 genome. Nucleic Acids Res 2007, 35(22):7577–7590. doi:10.1093/nar/gkm740
- Van Domselaar GH, Stothard P, Shrivastava S, Cruz JA, Guo A, Dong X, Lu P, Szafron D, Greiner R, Wishart DS: BASys: a web server for automated bacterial genome annotation. Nucleic Acids Res 2005, 33(Web Server issue):W455-W459. doi:10.1093/nar/gki593
- BASys Bacterial Annotation System [http://wishart.biology.ualberta.ca/basys/cgi/gallery.pl]
- Gene Ontology [http://www.geneontology.org/]
This article is published under license to BioMed Central Ltd.