The current implementation of iBarcode.org (July 2008) includes a sequence upload and management suite and nine analysis and visualization tools. The sequence upload and management suite enables input, selection, verification, concatenation, and visualization of sequences. The analysis and visualization tools are divided into three categories: sequence analysis, genetic distance analysis, and tree analysis. Here we introduce key features of iBarcode.org and provide exemplar cases from barcode data for each analysis and visualization module.
Sequence analysis
a. Haplotype variation
This tool identifies unique haplotypes for each species and provides statistical information on haplotype frequency and nucleotide variation in a user-friendly table format. Haplotype variation is calculated from a simple measure: the number of nucleotide differences between sequences. Figure 1 shows a screen capture of the output of the haplotype variation tool for a set of primate species (partial data set from Hajibabaei et al. [10]). In addition to this table, a reduced dataset containing only the unique haplotypes is produced in FASTA format. This dataset is stored for further use in other tools (see below) or for download by the submitter.
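The following is a minimal sketch of the kind of computation described above, not the iBarcode.org implementation. It assumes the input is a list of (species, aligned sequence) pairs and treats every character, including gaps and ambiguity codes, as a distinct state. The function names `count_differences` and `haplotype_summary` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Count aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def haplotype_summary(records):
    """Summarize haplotypes per species.

    records: iterable of (species, aligned_sequence) pairs.
    Returns a dict mapping each species to its number of individuals,
    its haplotype frequencies, and the largest number of nucleotide
    differences between any two of its haplotypes.
    """
    by_species = {}
    for species, seq in records:
        # Identical sequences collapse onto one haplotype key.
        by_species.setdefault(species, Counter())[seq.upper()] += 1

    summary = {}
    for species, counts in by_species.items():
        haplotypes = list(counts)
        max_diff = max(
            (count_differences(a, b) for a, b in combinations(haplotypes, 2)),
            default=0,  # a species with a single haplotype has no variation
        )
        summary[species] = {
            "n_individuals": sum(counts.values()),
            "haplotypes": dict(counts),
            "max_differences": max_diff,
        }
    return summary
```

The keys of each `haplotypes` dictionary are the unique sequences, so they could be written out directly as the reduced FASTA dataset.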
b. Haplotype map (Barcode-HAPMAP)
This data visualization module provides a graphical view of the nucleotide character variation in a barcode data set. It allows the user to quickly pinpoint the nucleotide positions within the barcode sequence that account for barcode variation in a set of species. The tool takes a FASTA alignment of barcode sequences (or the alignment of unique haplotypes created by the haplotype variation tool from a given barcode dataset) as input and highlights variable positions across the barcode sequence in an easy-to-read format. It also shows the nucleotide position of each variable site (counting from 5' to 3') as well as the codon position to which it belongs. It is therefore important that the FASTA file of the barcode sequences is in the correct reading frame. This tool works best for focused character-based analysis of a limited number of taxa (e.g. in a species complex or when dealing with cryptic species) as a complement to distance-based methods such as neighbour-joining analysis [11]. The HTML output format generated by this tool allows robust data transfer to other software packages such as MS-Excel. Figure 2 is an exemplar Barcode-HAPMAP of the unique haplotypes in a set of 4 species of skipper butterflies (Lepidoptera: Hesperiidae) [12].
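As an illustration of how variable sites and their codon positions can be located, the sketch below scans an alignment column by column. It assumes that all sequences have equal length and that the first column is the first position of a codon, which is why the reading frame of the input matters. The function name `variable_sites` is hypothetical and the code is not taken from iBarcode.org.

```python
def variable_sites(alignment):
    """Find the variable columns of an in-frame alignment.

    alignment: list of equal-length sequences, starting at codon position 1.
    Returns a list of (position, codon_position, column) tuples, where
    position is 1-based from the 5' end, codon_position is 1, 2, or 3,
    and column holds the character of each sequence at that site.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    sites = []
    for i in range(length):
        column = "".join(seq[i] for seq in alignment)
        if len(set(column)) > 1:  # more than one state: the site varies
            sites.append((i + 1, i % 3 + 1, column))
    return sites
```

If the alignment were shifted by one base, every reported codon position would be wrong even though the variable sites themselves would still be found.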
c. Tests of selection at different taxonomic levels
This module calculates the widely used ratio of non-synonymous to synonymous substitutions (ω) [13] at various taxonomic levels. This ratio has been used to estimate the degree of selective pressure in molecular biosystematics. The module uses the program yn00 from the PAML package [14, 15] to calculate ω for all pairs within a set of aligned sequences. It then calculates the average and standard deviation of ω for all sequence pairs that belong to the same species, belong to the same genus, or belong to different genera. A bar graph depicting these values is then displayed (Figure 3).
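The grouping step can be sketched as follows, assuming the pairwise ω values have already been obtained (for example from yn00 output) and that each sequence is labelled with a (genus, species) tuple. Two points are assumptions rather than details from the text: "same genus" is read here as same genus but different species, and the sample standard deviation is used. The names `classify_pair` and `summarize_omega` are hypothetical.

```python
from statistics import mean, stdev


def classify_pair(taxon_a, taxon_b):
    """Assign a pair of (genus, species) labels to a taxonomic level."""
    genus_a, species_a = taxon_a
    genus_b, species_b = taxon_b
    if genus_a != genus_b:
        return "between genera"
    if species_a != species_b:
        return "within genus"
    return "within species"


def summarize_omega(pairs):
    """Average and standard deviation of omega per taxonomic level.

    pairs: iterable of (taxon_a, taxon_b, omega) tuples.
    Returns a dict mapping level -> (mean, sd); sd is None when a level
    holds fewer than two values, since the sample SD is then undefined.
    """
    groups = {}
    for taxon_a, taxon_b, omega in pairs:
        groups.setdefault(classify_pair(taxon_a, taxon_b), []).append(omega)
    return {
        level: (mean(values), stdev(values) if len(values) > 1 else None)
        for level, values in groups.items()
    }
```

The resulting means and standard deviations are the quantities a bar graph with error bars would display for each level.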
d. DNA barcode cloud visualization
This module takes the popular "word cloud" concept and applies it to the number of individuals of each species within a given barcode dataset, producing a visually appealing view of the relative abundance of species within the dataset. These relative abundances are linearly scaled between font sizes of 50 and 200 points. This feature also provides cloud visualization for sequence divergence within species and for haplotype diversity in each species. Each species represented in the cloud visualization output can be selected to create a new subset dataset for further analysis using other tools. Figure 4 provides an example of a barcode cloud for a set of primate species.
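One plausible reading of the linear scaling is a min-max mapping, in which the least abundant species is drawn at 50 points and the most abundant at 200 points. The sketch below implements that reading. The mapping of the endpoints is an assumption, as is the hypothetical function name `font_sizes`.

```python
def font_sizes(counts, min_size=50.0, max_size=200.0):
    """Map species counts linearly onto font sizes in points.

    counts: dict mapping species name -> number of individuals.
    The smallest count maps to min_size and the largest to max_size.
    """
    lowest = min(counts.values())
    highest = max(counts.values())
    if highest == lowest:
        # All species are equally abundant; use the midpoint size.
        return {name: (min_size + max_size) / 2 for name in counts}
    scale = (max_size - min_size) / (highest - lowest)
    return {name: min_size + (n - lowest) * scale for name, n in counts.items()}
```

With counts of 1, 6, and 11 individuals, for instance, the three species would be drawn at 50, 125, and 200 points respectively.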
Genetic distance analysis
a. Between- vs. within-species variation graph
DNA barcoding is based on a simple premise: genetic variation between species exceeds that within species. This tool allows the user to visualize this principle in a given barcode dataset. Specifically, for each species with 3 or more individuals, this tool plots maximum Within Species Divergence (Max-WSD) against minimum Between Species Divergence (Min-BSD) [7]. The input for this tool is a genetic distance matrix (text format) produced either internally (by calculating the number of nucleotide differences between and within species) or by common sequence analysis programs such as MEGA [16]. Several barcoding studies have used graphs of between- vs. within-species variation. These graphs are considered one of the standard methods of visualizing barcode data [e.g. [7]], as they allow the user to quickly see outliers that may represent misannotated specimens or sequencing errors.
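The two plotted quantities can be derived from a distance matrix as in the sketch below. It assumes a square, symmetric matrix given as a list of lists, with one species label per row, and it is not the code used by iBarcode.org. The function name `max_wsd_min_bsd` is hypothetical.

```python
from itertools import combinations


def max_wsd_min_bsd(labels, dist, min_individuals=3):
    """Compute (Max-WSD, Min-BSD) for each sufficiently sampled species.

    labels: species label of each individual, in matrix order.
    dist: square symmetric matrix of pairwise genetic distances.
    Species with fewer than min_individuals individuals are skipped.
    """
    members = {}
    for index, species in enumerate(labels):
        members.setdefault(species, []).append(index)

    results = {}
    for species, own in members.items():
        if len(own) < min_individuals:
            continue
        others = [j for j, label in enumerate(labels) if label != species]
        if not others:
            continue  # no other species to compare against
        # Largest distance between two individuals of the same species.
        max_wsd = max(dist[i][j] for i, j in combinations(own, 2))
        # Smallest distance to any individual of another species.
        min_bsd = min(dist[i][j] for i in own for j in others)
        results[species] = (max_wsd, min_bsd)
    return results
```

A species whose Max-WSD exceeds its Min-BSD is a candidate outlier of the kind mentioned above.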
Tree analysis
a. Organic trees
In Hajibabaei et al. [7], we pioneered a new visually-appealing technique for drawing organic-looking phylogenetic trees. This method maximizes resolution for tips of the tree (i.e. species), which are most important in barcode analysis. The process of building organic trees takes several hours and therefore we have been offering the creation of such trees as an e-mail service.
b. Tree collapse
This tool uses bootstrap values in a phylogenetic tree as a benchmark for visualizing the statistical support of a given barcode dataset [10]. This is done by collapsing all branches whose bootstrap support falls below a cut-off value specified by the user. Although short barcode sequences are not strong phylogenetic markers at deep levels, they are excellent for species-level divergences. A high bootstrap cut-off (e.g. 100%) leads to the collapse of most branches deeper than species level, but the majority of species-level branches are kept intact. However, exceptionally closely related species may require longer sequences to attain very high bootstrap support.
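Collapsing can be sketched on a simple in-memory tree: every internal node whose support is below the cut-off is removed and its children are attached to its parent, which produces a polytomy. The `Node` class, the `collapse` and `to_newick` functions, and the decision to ignore branch lengths are all assumptions made for illustration. A real tool would first parse the Newick input.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""                     # tip label; empty for internal nodes
    support: Optional[float] = None    # bootstrap value; None if absent
    children: List["Node"] = field(default_factory=list)


def collapse(node, cutoff):
    """Return a copy of the tree with poorly supported branches collapsed.

    An internal child with support below cutoff is replaced by its own
    children. Tips and nodes without a support value are always kept.
    """
    new_children = []
    for child in node.children:
        kept = collapse(child, cutoff)
        unsupported = (
            kept.children
            and kept.support is not None
            and kept.support < cutoff
        )
        if unsupported:
            new_children.extend(kept.children)  # merge into the parent
        else:
            new_children.append(kept)
    return Node(node.name, node.support, new_children)


def to_newick(node):
    """Serialize the topology and support values, without branch lengths."""
    if not node.children:
        return node.name
    inner = ",".join(to_newick(child) for child in node.children)
    label = "" if node.support is None else format(node.support, "g")
    return "(" + inner + ")" + label
```

For example, collapsing a tree at a cut-off of 95 removes a deep node with 60% support but keeps species-level clades supported at 99% and 100%.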
c. Tree tip colourization
This visualization tool takes a standard Newick format tree and colours the branches leading to individuals of each species (within-species distances) in red and the branches leading to each unique species in blue. It provides a robust method for visually comparing different parts of a tree and therefore helps pinpoint exceptional divergence levels or regions of the tree that lack monophyly.