BACA: bubble chArt to compare annotations
© Fortino et al.; licensee BioMed Central. 2015
Received: 7 October 2014
Accepted: 26 January 2015
Published: 5 February 2015
DAVID is the most popular tool for interpreting large lists of gene/proteins classically produced in high-throughput experiments. However, the use of DAVID website becomes difficult when analyzing multiple gene lists, for it does not provide an adequate visualization tool to show/compare multiple enrichment results in a concise and informative manner.
We implemented a new R-based graphical tool, BACA (Bubble chArt to Compare Annotations), which uses the DAVID web service for cross-comparing enrichment analysis results derived from multiple large gene lists. BACA is implemented in R and is freely available at the CRAN repository (http://cran.r-project.org/web/packages/BACA/).
The package BACA allows R users to combine multiple annotation charts into one output graph by passing DAVID website.
High-throughput technologies, such as microarrays and RNA-sequencing, typically produce long lists of differentially expressed genes or transcripts, which are interpreted using functional annotation tools. One of the most used functional annotation program is DAVID [1,2]. The DAVID Bioinformatics Resources [1,2] at http://david.abcc.ncifcrf.gov provides an integrated biological knowledgebase and analytic tools to help users quickly find significantly represented biological themes (e.g. gene ontologies or pathways) in lists of pre-selected genes. DAVID functional annotation tool typically compiles biological terms enriched (overrepresented) in a list of up- or down-regulated genes, for instance from a transcriptomics experiment, in tabular format, which might be difficult to understand when comparing multiple experimental conditions (e.g. treatments, disease states, etc.). Several tools are available to visually compare the results from multiple enrichment analysis, such as GOBar , Go-Mapper , high-throughput GoMiner , the GOEAST  and REViGO . These are specific tools that focus more on the integration than the visualization aspect.
Here we provide BACA, a novel R-based package to concisely visualize DAVID annotations across different experimental conditions. It makes use of the R package RDAVIDWebService  to query the DAVID knowledgebase and the advanced graphical functions provided by the R package ggplot2 (http://ggplot2.org) to build charts showing multiple enrichment analysis results across conditions.
The BBplot creates a global, synthetic picture showing unique and common functional annotations found by using DAVID. In particular, it shows how common annotations are represented across different experimental conditions. BBPlot function accepts different, optional input parameters. The two most important are p-value (or EASE score) and count. These parameters are useful to select from the results by DAVID the most significant annotations.
Results and discussion
BACA is an R package designated to facilitate visualization and comparison of multiple enrichment analysis results. Like any R package, it needs to be installed with all the necessary dependencies. BACA uses external packages and assumes that they are installed. Packages to install and load before to use BACA: RDAVIDWebService  and ggplot2 . After installing, the BACA package can be loaded with the command.
In order to carry out quick examples, a set of data is supplied with BACA. This data consists of artificial up- and down-regulated gene lists corresponding to two time points of two different experimental conditions. These gene lists can be loaded with the command.
Once the data is loaded, the R function DAVIDsearch is used to query the DAVID knowledge base.
DAVIDsearch requires two inputs: 1) the lists of up-/down-regulated gene sets and 2) the email of a given registered DAVID users (http://david.abcc.ncifcrf.gov/webservice/register.htm). Additionally, a number of optional parameters can be specified. For instance, the type of submitted ids (e.g., “ENTREZ_GENE_ID”, “GENBANK_ACCESSION”) and the category name (e.g., “GOTERM_BP_ALL”, “KEGG_PATHWAY”, etc.) to be used in the functional annotation analysis can be indicated, as specified in the BACA manual. During the querying process some notes are printed out. They include the name of the gene list, the number of genes successfully loaded, the number of genes mapped (and unmapped) in DAVID, the specie and the number of annotations found by DAVID.
The DAVIDsearch function compiles a list of DAVIDFunctionalAnnotationChart objects, one for each specified gene list. This list is used as input of the BBplot function in order to build a chart that shows how the functional annotations found by DAVID have changed across different experimental conditions.
BBplot builds a chart where the annotations are compared by the means of bubbles. The bubble size indicates the number of genes enriching the corresponding annotation, while the colour indicates the state of these genes in terms of down- and up-regulation.
BBplot works out with different optional parameters to filter the enrichment analysis results. In particular, they can use the parameters max.pval (or EASE score) and min.genes in order to select the most significant enriched annotations. This is necessary when the lists of enriched annotations found by DAVID are very large.
After building the bubble plot, the users can visualize and save it.
Finally, the users can use the Jplot function to build/plot pairwise comparisons between functional annotation charts.
The Jplot function takes in input two different DAVIDFunctionalAnnotationChart objects and provides in output a table/matrix with colored boxes. Each box reports the Jaccard index-based similarity score computed between the gene sets enriching two functional annotations.
The BACA package provides a set of simple R functions to provide visual comparisons of multiple enrichment results obtained by using DAVID.
Availability and requirements
Project name: BACA project
Project home page: http://cran.r-project.org/web/packages/BACA/
Operating system(s): Platform independent
Programming language: R
Other requirements: BioC 2.13 (R-3.0)
License: GPL (> = 2)
Funding: This work has been supported by the European Commission, under grant agreement FP7-309329 (NANOSOLUTIONS).
- Dennis G, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, et al. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol. 2003;4:P3.View ArticlePubMedGoogle Scholar
- Huang DW, Sherman BT, Tan Q, Kir J, Liu D, Bryant D, et al. DAVID Bioinformatics Resources: Expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res. 2007;35:169-175.Google Scholar
- Fresno C, Fernández EA. RDAVIDWebService: A versatile R interface to DAVID. Bioinformatics. 2013;29:2810–1.View ArticlePubMedGoogle Scholar
- Lee JSM, Katari G, Sachidanandam R. GObar: a gene ontology based analysis and visualization tool for gene sets. BMC Bioinformatics. 2005;6:189.View ArticlePubMedPubMed CentralGoogle Scholar
- Smid M, Dorssers LCJ. GO-Mapper: Functional analysis of gene expression data using the expression level as a score to evaluate Gene Ontology terms. Bioinformatics. 2004;20:2618–25.View ArticlePubMedGoogle Scholar
- Zeeberg BR, Feng W, Wang G, Wang MD, Fojo AT, Sunshine M, et al. GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol. 2003;4:R28.View ArticlePubMedPubMed CentralGoogle Scholar
- Zheng Q, Wang XJ. GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis. Nucleic Acids Res. 2008;36:358–363.Google Scholar
- Supek F, Bošnjak M, Škunca N, Šmuc T. Revigo summarizes and visualizes long lists of gene ontology terms. PLoS One. 2011; 6:e21800.Google Scholar
- H Wickham. ggplot2: elegant graphics for data analysis. Springer New York. 2009.Google Scholar
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.