iBarcode.org: web-based molecular biodiversity analysis
© Singer and Hajibabaei; licensee BioMed Central Ltd. 2009
Published: 16 June 2009
DNA sequences have become a primary source of information in biodiversity analysis. For example, short standardized species-specific genomic regions, known as DNA barcodes, are being used as a global standard for species identification and biodiversity studies. Most DNA barcodes are generated by laboratories that have expertise in DNA sequencing but not in bioinformatics data analysis. We have therefore developed a web-based suite of tools to help DNA barcode researchers analyze their vast datasets.
Our web-based tools, available at http://www.ibarcode.org, allow the user to manage their barcode datasets, cull out non-unique sequences, identify haplotypes within a species, and examine the within- to between-species divergences. In addition, we provide a number of phylogenetics tools that will allow the user to manipulate phylogenetic trees generated by other popular programs.
The use of a web-based portal for barcode analysis is convenient, especially since the WWW is inherently platform-neutral. Indeed, we have taken care to ensure that our website is usable from handheld devices such as PDAs and smartphones. Although the current set of tools available at iBarcode.org was developed to meet our own analytic needs, we hope that feedback from users will spark the development of future tools. We also welcome user-built modules that can be incorporated into the iBarcode framework.
Advances in DNA sequencing technologies in recent years have led to an explosion in the use of comparative DNA sequence analysis in the biological sciences. DNA sequence information has been used in a wide range of applications and to address biological questions ranging from development to evolution and biodiversity. In the early days of molecular biology only a handful of sequence analysis software applications existed, several of which had been developed by researchers to address their own needs. In the last decade or so, the development of more robust sequencing platforms, mainly as a result of the human and other genome projects, has led to the introduction of more powerful data analysis packages. Advances in computer technologies and applications have also been essential to the boom in bioinformatics. With its widespread use, the Internet soon became an important vehicle for sequence databases such as GenBank. In addition, organizations such as the National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI), as well as smaller initiatives and even individual labs, started offering some of their services (e.g. search, access to data, analysis and visualization) through web-based portals.
The majority of tools and portals developed for sequence data analysis have been directed towards data from genome projects, mainly because of the overwhelming complexity and size of genomes compared with the sequence of a single gene. Genome browsers and search tools are good examples. This expansion of sequence information from genes to genomes has also influenced, and been applied to, biosystematics analysis. For example, the field of phylogenomics argues for the use of genome sequences (either as a whole or as several portions) to study evolutionary relationships.
In contrast to this move from genes to genomes, a relatively new approach, DNA barcoding, aims to develop a species-specific sequence library for all eukaryotes using a small gene region, with the primary mission of enhancing biodiversity analysis. DNA barcoding is based on two key principles: minimalism and standardization. While an efficient identification library requires analyzing a maximal number of specimens in different taxonomic groups, species-level identification can be achieved by limiting the analysis to small fragments of genomes (i.e. DNA barcodes). A 650 bp fragment of a mitochondrial gene, cytochrome c oxidase 1 (CO1, cox1), has been proposed as the DNA barcode for animal species. Several studies have demonstrated the effectiveness of this CO1-barcode system in groups such as fishes, mammals, birds and a range of insects [7, 8]. While barcoding with a single gene fragment has proven efficient for most animals tested, it may be necessary to use 2–3 fragments to achieve species-level resolution in other kingdoms of life.
Although DNA barcoding data (sequence information attached to specimens from different species) have similarities to other biosystematics sequence data (i.e. phylogenetic and population genetics data), new analysis tools are required to facilitate efficient use of barcode information in biodiversity studies. One of the most distinctive features of barcode datasets is the relatively large number of barcode sequences (e.g. several thousand) connected to collateral information (e.g. geographic and ecological data). The analysis and visualization of such large datasets has been challenging.
Here we introduce iBarcode.org, a web-based application server that provides various visualization and analysis tools for DNA barcoding data in a user-friendly environment. These tools have mainly been designed to enable the analysis of large barcode-style data sets, although the features can be used for the analysis of other sequence data. iBarcode.org is free and does not require registration.
The current implementation of iBarcode.org (July 2008) includes a sequence upload and management suite and nine analysis and visualization tools. The sequence upload and management suite enables input, selection, verification, concatenation, and visualization of sequences. The tools provided by the web server are divided into three categories, operating at the sequence, genetic distance, and phylogenetic tree levels. Here we introduce key features of iBarcode.org and provide exemplar cases from barcode data for each analysis and visualization module.
a. Haplotype variation
b. Haplotype map (Barcode-HAPMAP)
c. Tests of selection at different taxonomic levels
d. DNA barcode cloud visualization
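As an illustration of how non-unique sequences can be culled and haplotypes identified within a species, the following is a minimal Python sketch. It is not the iBarcode.org implementation: the function name `group_haplotypes` and the record layout (specimen ID, species name, aligned sequence) are assumptions made for this example, and two sequences are treated as the same haplotype only if they are identical after case normalization.

```python
from collections import defaultdict


def group_haplotypes(records):
    """Group specimens by species, then by identical sequence (haplotype).

    `records` is an iterable of (specimen_id, species, sequence) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    by_species = defaultdict(lambda: defaultdict(list))
    for specimen_id, species, sequence in records:
        # Identical strings (ignoring case) count as one haplotype.
        by_species[species][sequence.upper()].append(specimen_id)
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {sp: dict(haps) for sp, haps in by_species.items()}


# Made-up example records, not real barcode data.
records = [
    ("spec1", "Species A", "ACGTAC"),
    ("spec2", "Species A", "acgtac"),  # same haplotype as spec1
    ("spec3", "Species A", "ACGTAT"),
    ("spec4", "Species B", "TTGTAC"),
]
haplotypes = group_haplotypes(records)
for species, haps in haplotypes.items():
    print(species, "has", len(haps), "haplotype(s)")
```

Keeping one representative specimen per haplotype would yield a dataset with the non-unique sequences removed. A fuller treatment would also need to decide how to handle ambiguous bases and sequences of unequal length, which this sketch ignores.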
Genetic distance analysis
a. Between- vs. within-species variation graph
DNA barcoding is based on a simple premise: genetic variation between species exceeds that within species. This tool allows the user to visualize this principle in a given barcode dataset. Specifically, for each species with three or more individuals, the tool plots the maximum Within Species Divergence (Max-WSD) against the minimum Between Species Divergence (Min-BSD). The input is a genetic distance matrix (text format) produced either internally (by calculating the number of nucleotide differences between and within species) or by common sequence analysis programs such as MEGA. Several barcoding studies have used graphs of between- vs. within-species variation. These graphs are considered one of the standard methods of visualizing barcode data, as they allow the user to quickly see outliers that may represent misannotated specimens or sequencing errors.
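The two quantities plotted by this tool can be computed from pairwise distances as in the sketch below. It assumes a hypothetical in-memory representation (a dictionary mapping specimens to species, and a dictionary of pairwise distances keyed by `frozenset` pairs) rather than the text matrix format that iBarcode.org reads, and the function name `divergence_summary` is ours.

```python
from collections import defaultdict
from itertools import combinations


def divergence_summary(species_of, distance, min_individuals=3):
    """Return {species: (max_wsd, min_bsd)} for sufficiently sampled species.

    species_of: dict mapping specimen ID -> species name (assumed layout).
    distance:   dict mapping frozenset({id1, id2}) -> pairwise distance.
    """
    members = defaultdict(list)
    for specimen, species in species_of.items():
        members[species].append(specimen)

    summary = {}
    for species, own in members.items():
        if len(own) < min_individuals:
            continue  # too few individuals to estimate within-species spread
        others = [s for s in species_of if species_of[s] != species]
        if not others:
            continue  # no other species to compare against
        # Largest distance between two individuals of the same species.
        max_wsd = max(distance[frozenset(pair)] for pair in combinations(own, 2))
        # Smallest distance from this species to any other species.
        min_bsd = min(distance[frozenset((a, b))] for a in own for b in others)
        summary[species] = (max_wsd, min_bsd)
    return summary
```

Each returned pair corresponds to one point on the graph. Under the barcoding premise a species should have Min-BSD greater than Max-WSD, so points where this does not hold are the outliers worth inspecting.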
a. Organic trees
In Hajibabaei et al., we pioneered a new, visually appealing technique for drawing organic-looking phylogenetic trees. This method maximizes resolution for the tips of the tree (i.e. species), which are most important in barcode analysis. Because building an organic tree takes several hours, we offer the creation of such trees as an e-mail service.
b. Tree collapse
This tool uses bootstrap values in a phylogenetic tree as a benchmark for visualizing the statistical support of a given barcode dataset. This is done by collapsing all branches that are not supported at a bootstrap cut-off value specified by the user. Although short barcode sequences are not strong phylogenetic markers at deep levels, they are excellent for species-level divergences. A high bootstrap cut-off (e.g. 100%) leads to the collapse of most branches deeper than species level, while the majority of species-level branches are kept intact. However, exceptionally closely related species may require longer sequences to gain very high bootstrap support.
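The collapsing step can be sketched in Python as follows. This is an illustration under stated assumptions, not the code behind iBarcode.org: it assumes bootstrap values are stored as internal node labels in a Newick string, supports only unquoted labels, leaves unlabelled internal nodes untouched, and adds the length of a collapsed branch to its descendants so that root-to-tip distances are preserved. The names `Node`, `parse_newick`, `collapse` and `to_newick` are hypothetical.

```python
import re

PUNCT = set("(),:;")
TOKEN = re.compile(r"\s*([(),:;]|[^(),:;\s]+)")


class Node:
    def __init__(self):
        self.label = ""      # tip name, or bootstrap value on internal nodes
        self.length = None   # branch length leading to this node, if given
        self.children = []


def parse_newick(text):
    """Parse a simple Newick string (unquoted labels only) into Node objects."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_node():
        nonlocal pos
        node = Node()
        if tokens[pos] == "(":
            pos += 1
            node.children.append(parse_node())
            while tokens[pos] == ",":
                pos += 1
                node.children.append(parse_node())
            if tokens[pos] != ")":
                raise ValueError("expected ')' in Newick string")
            pos += 1
        if pos < len(tokens) and tokens[pos] not in PUNCT:
            node.label = tokens[pos]
            pos += 1
        if pos < len(tokens) and tokens[pos] == ":":
            node.length = float(tokens[pos + 1])
            pos += 2
        return node

    return parse_node()


def support(node):
    """Bootstrap support of an internal node, or None if it has no numeric label."""
    try:
        return float(node.label)
    except ValueError:
        return None


def collapse(node, cutoff):
    """Collapse, in place, internal branches with support below `cutoff`."""
    kept = []
    for child in node.children:
        collapse(child, cutoff)  # process the subtree first
        value = support(child)
        if child.children and value is not None and value < cutoff:
            # Promote the grandchildren to replace the unsupported branch.
            for grandchild in child.children:
                if child.length is not None and grandchild.length is not None:
                    grandchild.length += child.length
                kept.append(grandchild)
        else:
            kept.append(child)
    node.children = kept


def to_newick(node):
    """Serialize a Node back to Newick text (without the trailing ';')."""
    text = ""
    if node.children:
        text = "(" + ",".join(to_newick(c) for c in node.children) + ")"
    text += node.label
    if node.length is not None:
        text += ":%g" % node.length
    return text
```

A typical call would be `root = parse_newick(newick_text)`, then `collapse(root, 70)`, then `to_newick(root) + ";"`. Collapsed nodes become polytomies, which is how the lack of support shows up in the redrawn tree.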
c. Tree tip colourization
This visualization tool takes a standard Newick format tree and colours the branches leading to individuals of each species (within-species distances) in red and the branches leading to each unique species in blue. It provides a robust way to visually compare different parts of a tree and therefore helps pinpoint exceptional divergence levels or regions of the tree that lack monophyly.
iBarcode.org is built on the Python-based web.py application framework. Although most analyses are performed in Python itself, some visualization and analysis tasks are accomplished via calls to the statistical language R, the graphing package GraphViz, and the phylogenetic analysis package PAML. We have intentionally kept the interface light and clean so that it loads quickly over low-bandwidth connections and is viewable and functional from text-based browsers (such as Lynx) or from small handheld devices (cell phones or PDAs).
In the future, we plan to have an application programming interface (API) for our tools, allowing other developers to integrate our analyses into their own tools.
Like several other branches of biology, biodiversity science has increasingly relied on DNA sequence information. DNA barcoding, as a new global initiative for biodiversity analysis, demands specialized bioinformatics tools and applications. iBarcode.org is a web-based application server developed for the visualization and analysis of DNA barcode data. The suite of simple but highly customized tools in iBarcode.org allows the analysis and visualization of barcode data at the sequence, genetic distance, and phylogenetic tree levels. Several of these applications have already contributed to barcode publications. iBarcode.org provides a Web 2.0 environment for developing and sharing tools for barcode data and sets the stage for a new wave of community-driven bioinformatics applications.
We acknowledge feedback and support from the DNA barcode community, especially during the 2nd International Barcode of Life Conference in Taipei (September 2007). We acknowledge support from the Canadian Centre for DNA Barcoding (CCDB) and an award from the Consortium for the Barcode of Life (CBOL).
This article has been published as part of BMC Bioinformatics Volume 10 Supplement 6, 2009: European Molecular Biology Network (EMBnet) Conference 2008: 20th Anniversary Celebration. Leading applications and technologies in bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/10?issue=S6.
- Murphy WJ, Pevzner PA, O'Brien SJ: Mammalian phylogenomics comes of age. Trends Genet 2004, 20: 631–639. doi:10.1016/j.tig.2004.09.005
- Marshall E: Taxonomy. Will DNA bar codes breathe life into classification? Science 2005, 307: 1037. doi:10.1126/science.307.5712.1037
- Hebert PDN, Cywinska A, Ball SL, deWaard JR: Biological identifications through DNA barcodes. Proc Biol Sci 2003, 270: 313–321. doi:10.1098/rspb.2002.2218
- Ward RD, Zemlak TS, Innes BH, Last PR, Hebert PDN: DNA barcoding Australia's fish species. Philos Trans R Soc Lond B Biol Sci 2005, 360: 1847–1857. doi:10.1098/rstb.2005.1716
- Hajibabaei M, Singer GA, Clare EL, Hebert PDN: Design and applicability of DNA arrays and DNA barcodes in biodiversity monitoring. BMC Biol 2007, 5: 24. doi:10.1186/1741-7007-5-24
- Hebert PDN, Stoeckle MY, Zemlak TS, Francis CM: Identification of birds through DNA barcodes. PLoS Biol 2004, 2: E312. doi:10.1371/journal.pbio.0020312
- Hajibabaei M, Janzen DH, Burns JM, Hallwachs W, Hebert PDN: DNA barcodes distinguish species of tropical Lepidoptera. Proc Natl Acad Sci USA 2006, 103: 968–971. doi:10.1073/pnas.0510466103
- Smith MA, Woodley NE, Janzen DH, Hallwachs W, Hebert PDN: DNA barcodes reveal cryptic host-specificity within the presumed polyphagous members of a genus of parasitoid flies (Diptera: Tachinidae). Proc Natl Acad Sci USA 2006, 103: 3657–3662. doi:10.1073/pnas.0511318103
- Hajibabaei M, Singer GAC, Hebert PDN, Hickey DA: DNA barcoding: how it complements taxonomy, molecular phylogenetics and population genetics. Trends Genet 2007, 23: 167–172. doi:10.1016/j.tig.2007.02.001
- Hajibabaei M, Singer GAC, Hickey DA: Benchmarking DNA barcodes: an assessment using available primate sequences. Genome 2006, 49: 851–854. doi:10.1139/G06-025
- Saitou N, Nei M: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 1987, 4: 406–425.
- Burns JM, Janzen DH, Hajibabaei M, Hallwachs W, Hebert PDN: DNA barcodes and cryptic species of skipper butterflies in the genus Perichares in Area de Conservacion Guanacaste, Costa Rica. Proc Natl Acad Sci USA 2008, 105: 6350–6355. doi:10.1073/pnas.0712181105
- McDonald JH, Kreitman M: Adaptive protein evolution at the Adh locus in Drosophila. Nature 1991, 351: 652–654. doi:10.1038/351652a0
- Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 1997, 13: 555–556.
- Yang Z, Nielsen R: Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol 2000, 17: 32–43.
- Kumar S, Tamura K, Nei M: MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform 2004, 5: 150–163. doi:10.1093/bib/5.2.150
- Clare EL, Lim BK, Engstrom MD, Eger JL, Hebert PDN: DNA barcoding of Neotropical bats: species identification and discovery within Guyana. Mol Ecol Notes 2007, 7: 184–190. doi:10.1111/j.1471-8286.2006.01657.x
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.