- Software
- Open access
- Published:
MicroScope: ChIP-seq and RNA-seq software analysis suite for gene expression heatmaps
BMC Bioinformatics volume 17, Article number: 390 (2016)
Abstract
Background
Heatmaps are an indispensible visualization tool for examining large-scale snapshots of genomic activity across various types of next-generation sequencing datasets. However, traditional heatmap software do not typically offer multi-scale insight across multiple layers of genomic analysis (e.g., differential expression analysis, principal component analysis, gene ontology analysis, and network analysis) or multiple types of next-generation sequencing datasets (e.g., ChIP-seq and RNA-seq). As such, it is natural to want to interact with a heatmap’s contents using an extensive set of integrated analysis tools applicable to a broad array of genomic data types.
Results
We propose a user-friendly ChIP-seq and RNA-seq software suite for the interactive visualization and analysis of genomic data, including integrated features to support differential expression analysis, interactive heatmap production, principal component analysis, gene ontology analysis, and dynamic network analysis.
Conclusions
MicroScope is hosted online as an R Shiny web application based on the D3 JavaScript library: http://microscopebioinformatics.org/. The methods are implemented in R, and are available as part of the MicroScope project at: https://github.com/Bohdan-Khomtchouk/Microscope.
Background
Most currently existing heatmap software produce static heatmaps [21, 25, 26, 29, 32, 35, 36, 44], without features that would allow the user to dynamically interact with, explore, and analyze the landscape of a heatmap via integrated tools supporting user-friendly analyses in differential expression, principal components, gene ontologies, and networks. Such features would allow the user to engage the heatmap data in a visual and analytical manner while in real-time, thereby allowing for a deeper, quicker, and more comprehensive data exploration experience.
An interactive, albeit non-reproducible heatmap tool was previously employed in the study of the transcriptome of the Xenopus tropicalis genome [41]. Likewise, manual clustering of dot plots depicting RNA expression is an integral part of the Caleydo data exploration environment [42]. Chemoinformatic-driven clustering can also be toggled in the user interface of Molecular Property Explorer [27]. Furthermore, an interactive heatmap software suite was previously developed with a focus on cancer genomics analysis and data import from external bioinformatics resources [31]. Most recently, a general-purpose heatmap software providing support for transcriptomic, proteomic and metabolomic experiments was developed using the R Shiny framework [1].
Moreover, an interactive cluster heatmap library, InCHlib, was previously proposed for cluster heatmap exploration [39], but did not provide built-in support for gene ontology, principal component, or network analysis. However, InCHlib concentrates primarily in chemoinformatic and biochemical data clustering analysis, including the visualization of microarray and protein data. On the contrary, MicroScope is designed specifically for ChIP-seq and RNA-seq data visualization and analysis in the differential expression, principal component, gene ontology, and network analysis domains. In general, prior software has concentrated primarily in hierarchical clustering, searching gene texts for substrings, and serial analysis of genomic data, with no integrated features to support the aforementioned built-in features [3, 37, 46].
As of yet, no free, open-source heatmap software has been proposed to explore heatmaps at such multiple levels of genomic analysis and interactive visualization capacity. Here we propose a user-friendly genome software suite designed to handle dynamic, on-the-fly JavaScript visualizations of gene expression heatmaps as well as their respective differential expression analysis, principal component analysis, gene ontology analysis, and network analysis of genes.
Implementation
MicroScope is hosted online as an R Shiny web server application. MicroScope may also be run locally from within R Studio, as shown here: https://github.com/Bohdan-Khomtchouk/Microscope. MicroScope leverages the cumulative utility of R’s d3heatmap [20], shiny [19], stats [33], htmlwidgets [43], RColorBrewer [30], dplyr [45], data.table [23], goseq [47], GO.db [4], and networkD3 [24] libraries to create an integrative web browser-based software experience requiring absolutely no programming or statistical experience from the user, or even the need to download R on a local computer.
MicroScope employs the Bioconductor package edgeR [34] to create a one-click, built-in, user-friendly differential expression analysis feature that provides differential expression analysis of gene expression data based on the quantile-adjusted conditional maximum likelihood (qCML) procedure and the Benjamini & Hochberg correction. edgeR is a count-based statistical method that expects input data in the form of a matrix of integer values. The value in the i-th row and the j-th column of the matrix tells how many reads (or fragments, for paired-end RNA-seq) have been unambiguously assigned to gene i in sample j [28]. Analogously, for other types of assays, the rows of the matrix might correspond e.g., to binding regions (with ChIP-seq), species of bacteria (with metagenomic datasets), or peptide sequences (with quantitative mass spectrometry). In general, the values in the matrix must be raw counts of sequencing reads/fragments. This is important for the statistical model to hold, as only the raw counts allow assessing the measurement precision correctly. It is important to never provide counts that were pre-normalized for sequencing depth/library size, as the statistical model is most powerful when applied to raw counts, and is designed to account for library size differences internally via a series of built-in normalization procedures.
The edgeR results supply the user with rank-based information about nominal p-value, false discovery rate, fold change, and counts per million in order to establish which specific genes in the data are differentially expressed with a high degree of statistical significance. This information, in turn, is used to investigate the top gene ontology categories of differentially expressed genes, which can then be conveniently visualized as interactive network graphics. Finally, MicroScope provides user-friendly support for principal component analysis via the generation of biplots, screeplots, and summary tables. PCA is supported for both covariance and correlation matrices via R’s prcomp() function in the stats package.
Results and discussion
Figure 1 shows the MicroScope user interface (UI) upon login. After a user inputs an RNA-seq/ChIP-seq data file containing read counts per gene per sample, the user is guided through the differential expression analysis (Fig. 2) which, in turn, leads to the heatmap visualization stage of differentially expressed genes at user-specified statistical cutoff parameters (Fig. 3). Heatmaps visualizing statistically significant genes, as determined by the differential expression analysis, can be customized in a variety of ways, through user-friendly methods such as:
-
Statistical parameters visualization cutoff widget (p-value and/or FDR)
-
l o g 2 data transformation widget
-
Multiple heatmap color schemes widget
-
Hierarchical clustering widget
-
Row/column dendrogram branch coloring widget
-
Row/column font size widget
-
Heatmap download widget
MicroScope allows the user to magnify any portion of a heatmap by a simple click-and-drag feature to zoom in, and a click-once feature to zoom out. MicroScope is designed with large gene expression heatmaps in mind, where individual gene labels overlap and render the text unreadable. However, MicroScope allows the user to repeatedly zoom in to any sector of the heatmap to investigate a region, cluster, or even a single gene. MicroScope also allows the user to hover the mouse pointer over any specific gene to show gene name, expression level, and column ID. It should be noted that specifying the heatmap statistical parameters impacts the contents of the heatmap visualization itself, as stringent cutoffs will naturally result in less genes displayed. However, the downstream PCA or gene ontology or network analysis is not impacted by these heatmap visualizations. In other words, all downstream analyses are performed on the entire input dataset. It should also be noted that prior to visualizing heatmaps in MicroScope, experiment-specific data normalization procedures are left to the discretion of the user [2, 22, 38, 40], depending on whether the user wants to visualize differences in magnitude among genes or see differences among samples.
One of the user-friendly features within MicroScope is that it is responsive to the demands asked of it by the user. For example, gene ontology analysis buttons are not provided in the UI until a user runs differential expression analysis, which constitutes a prerequisite step required prior to conducting a successful gene ontology analysis. In other words, MicroScope is user-responsive in the sense that it automatically unlocks new features only as they become needed when the user progresses through successive stages in the software. Furthermore, MicroScope automatically provides short and convenient written guidelines directly in the UI to guide the user to the next steps in the usage of the software. As such, complex analytical operations can be performed by the user in a friendly, step-by-step fashion, each time facilitated by the help of the MicroScope software suite, which adjusts to the needs of the user and provides written guidelines on the next steps to pursue. It should be noted that the differential expression analysis in MicroScope (qCML and Benjamini & Hochberg correction) is designed to be broadly applicable to be run on any ChIP-seq or RNA-seq data inputted by the user.
Following the successful completion of the differential expression analysis and interactive heatmap visualization, a user is automatically supplied a suite of UI widgets to perform principal component analysis. The user is given the choice to specify the matrix type (i.e., covariance or correlation matrix) in the sidebar panel marked ‘Choose PCA Option’. After the PCA is completed, the user is supplied with a biplot and screeplot to visualize the results, as well as tabulated information showing the relative importance of each principal component.
Following the successful completion of the PCA (Fig. 4), the user is prompted with more UI widgets to proceed to the gene ontology analysis. Specifying values for these features and clicking the Do Gene Ontology Analysis button returns a list of the top gene ontology (GO) categories according to these exact specifications set by the user (Fig. 5). To perform the gene enrichment analysis, MicroScope uses the Wallenius non-central hypergeometric distribution to retrieve p-values for each GO category analyzed. Specifically, the goseq package implements a default option to use the Wallenius distribution to approximate the true null distribution, without any significant loss in accuracy [47]. After a null distribution is established, each GO category is then tested for over and under-representation amongst the set of differentially expressed genes, and the null is used to calculate a p-value for over and under-representation. Supported organisms for GO category analysis include: human [5], mouse [6], rat [7], zebrafish [8], worm [9], chimpanzee [10], fly [11], yeast [12], bovine [13], canine [14], mosquito [15], rhesus monkey [16], frog [17], and chicken [18].
The successful completion of this step can be followed up by running a network analysis on the top GO categories, thereby generating network graphics corresponding to the number of top gene ontology categories previously requested by the user (Fig. 6). Nodes represent either gene names or gene ontology identifiers, and links represent direct associations between the two entities. In addition to serving as a visualization tool, this network analysis automatically identifies differentially expressed genes that are present within each top gene ontology, which is a level of detail not readily available by running gene ontology analysis alone. By immediately extracting the respective gene names from each top gene ontology category, MicroScope’s network analysis features serve to aid the biologist in identifying the top differentially expressed genes in the top respective gene ontology categories. Figure 7 compares interactive network visualizations of the top two gene ontologies, thereby demonstrating the immediate responsiveness of MicroScope’s network graphics to user-specified settings (e.g., number of top gene ontologies to display widget).
Conclusion
We provide access to a user-friendly web application designed to visualize and analyze dynamically interactive heatmaps within the R programming environment, without any prerequisite programming skills required of the user. Our software tool aims to enrich the genomic data exploration experience by providing a variety of complex visualization and analysis features to investigate gene expression datasets. Coupled with a built-in analytics platform to pinpoint statistically significant differentially expressed genes, an interactive heatmap production platform to visualize them, a principal component analysis platform to investigate variation and patterns in gene expression, a gene ontology platform to categorize the top gene ontology categories, and a network analysis platform to dynamically visualize gene ontology categories at the gene-specific level, MicroScope presents a significant advance in heatmap technology over currently available software.
Availability and requirements
-
Project name: MicroScope
-
Project home page: http://microscopebioinformatics.org
-
Operating system(s): Platform-independent
-
Programming language: R, Shiny
-
Other requirements: Internet connectivity
-
License: GNU General Public License version 3.0 (GPL-3.0)
-
Any restrictions to use by non-academics: No
Abbreviations
- DE:
-
Differential expression
- FDR:
-
False discovery rate
- GO:
-
Gene ontology
- PCA:
-
Principal component analysis
- qCML:
-
Quantile-adjusted conditional maximum likelihood
- UI:
-
User interface
References
Babicki S, Arndt D, Marcu A, Liang Y, Grant JR, Maciejewski A, Wishart DS. Heatmapper: web-enabled heat mapping for all. Nucleic Acids Res. 2016; 44(W1):W147–53.
Bailey T, Krajewski P, Ladunga I, Lefebvre C, Li Q, Liu T, Madrigal P, Taslim C, Zhang J. Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data. PLoS Comput Biol. 2013; 9(11):e1003326.
Caraux G, Pinloche S. Permutmatrix: A Graphical Environment to Arrange Gene Expression Profiles in Optimal Linear Order. Bioinformatics. 2005; 21:1280–1.
Carlson M. GO.db: A set of annotation maps describing the entire Gene Ontology. 2015. R package version 3.3.0.
Carlson M. org.Hs.eg.db: Genome wide annotation for Human. 2016. R package version 3.3.0.
Carlson M. org.Mm.eg.db: Genome wide annotation for Mouse. 2016. R package version 3.3.0.
Carlson M. org.Rn.eg.db: Genome wide annotation for Rat. 2016. R package version 3.3.0.
Carlson M. org.Dr.eg.db: Genome wide annotation for Zebrafish. 2016. R package version 3.3.0.
Carlson M. org.Ce.eg.db: Genome wide annotation for Worm. 2016. R package version 3.3.0.
Carlson M. org.Pt.eg.db: Genome wide annotation for Chimp. 2016. R package version 3.3.0.
Carlson M. org.Dm.eg.db: Genome wide annotation for Fly. 2016. R package version 3.3.0.
Carlson M. org.Sc.sgd.db: Genome wide annotation for Yeast. 2016. R package version 3.3.0.
Carlson M. org.Bt.eg.db: Genome wide annotation for Bovine. 2016. R package version 3.3.0.
Carlson M. org.Cf.eg.db: Genome wide annotation for Canine. 2016. R package version 3.3.0.
Carlson M. org.Ag.eg.db: Genome wide annotation for Anopheles. 2016. R package version 3.3.0.
Carlson M. org.Mmu.eg.db: Genome wide annotation for Rhesus. 2016. R package version 3.3.0.
Carlson M. org.Xl.eg.db: Genome wide annotation for Xenopus. 2016. R package version 3.3.0.
Carlson M. org.Gg.eg.db: Genome wide annotation for Chicken. 2016. R package version 3.3.0.
Chang W, Cheng J, Allaire JJ, Xie Y, McPherson J, RStudio, jQuery Foundation, jQuery contributors, jQuery UI contributors, Otto M, Thornton J, Bootstrap contributors, Twitter Inc., Farkas A, Jehl S, Petre S, Rowls A, Gandy D, Reavis B, Kowal KM, es5-shim contributors, Ineshin D, Samhuri S, SpryMedia Limited, Fraser J, Gruber J, Sagalaev I, R Core Team. shiny: Web Application Framework for R. 2015. R package version 0.12.2.
Cheng J, Galili T, RStudio Inc., Bostock M, Palmer J. d3heatmap: Interactive Heat Maps Using ‘htmlwidgets’ and ‘D3.js’. 2015. R package version 0.6.1.
Chu VT, Gottardo R, Raftery AE, Bumgarner RE, Yeung KY. MeV+R: using MeV as a graphical user interface for Bioconductor applications in microarray analysis. Genome Biol. 2008; 9:R118.
Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, Mortazavi A. A survey of best practices for RNA-seq data analysis. Genome Biol. 2016; 17(13):1–19.
Dowle M, Srinivasan A, Short T, Lianoglou S, Saporta R, Antonyan E. data.table: Extension of Data.frame. 2015. R package version 1.9.6.
Gandrud C, Allaire JJ, Russell K, Lewis BW, Kuo K, Sese C, Ellis P, Owen J, Rogers J. networkD3: D3 JavaScript Network Graphs from R. R package version 0.2.8.
Gould J. GENE-E software hosted at the Broad Institute. http://www.broadinstitute.org/cancer/software/GENE-E/.
Khomtchouk BB, Van Booven DJ, Wahlestedt C. HeatmapGenerator: high performance RNAseq and microarray visualization software suite to examine differential gene expression levels using an R and C++ hybrid computational pipeline. Source Code Biol Med. 2014; 9(1):1–6.
Kibbey C, Calvet A. Molecular Property eXplorer: a novel approach to visualizing SAR using tree-maps and heatmaps. J Chem Inf Model. 2005; 45(2):523–32.
Love M, Anders S, Kim V, Huber W. RNA-seq workflow: gene-level exploratory analysis and differential expression. 2016. http://www.bioconductor.org/help/workflows/rnaseqGene/.
Metsalu T, Vilo J. ClustVis: a web tool for visualizing clustering of multivariate data using Principal Component Analysis and heatmap. Nucleic Acids Res. 2015; 43(W1):W566–70.
Neuwirth E. RColorBrewer: ColorBrewer Palettes. 2014. R package version 1.1-2.
Perez-Llamas C, Lopez-Bigas N. Gitools: analysis and visualisation of genomic data using interactive heat-maps. PLoS ONE. 2011; e19541:6.
Qlucore Omics Explorer. The D.I.Y Bioinformatics Software. http://www.qlucore.com.
R Core Team. A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2015. https://www.R-project.org/.
Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010; 26:139–40.
Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP. GenePattern 2.0. Nat Genet. 2006; 38(5):500–1. 10.1038/ng0506-500.
Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M, Sturn A, Snuffin M, Rezantsev A, Popov D, Ryltsov A, Kostukovich E, Borisovsky I, Liu Z, Vinsavich A, Trush V, Quackenbush J. TM4: a free, open-source system for microarray data management and analysis. Biotechniques. 2003; 34(2):374–8.
Saldanha AJ. Java Treeview – extensive visualization of microarray data. Bioinformatics. 2004; 20(17):3246–8.
Shin H, Liu T, Duan X, Zhang Y, Liu XS. Computational methodology for ChIP-seq analysis. Quant Biol. 2013; 1(1):54–70.
Škuta C, Bartu̇něk P, Svozil D. InCHlib — interactive cluster heatmap for web applications. J Cheminformatics. 2014; 6(44):1–9.
Soneson C, Delorenzi M. A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinforma. 2013; 14:91.
Tan MH, Au KF, Yablonovitch AL, Wills AE, Chuang J, Baker JC, Wong WH, Li JB. RNA sequencing reveals a diverse and dynamic repertoire of the Xenopus tropicalis transcriptome over development. Genome Res. 2013; 23:201–16.
Turkay C, Lex A, Streit M, Pfister H, Hauser H. Characterizing cancer subtypes using dual analysis in Caleydo StratomeX. IEEE Comput Graph Appl. 2014; 34(2):38–47.
Vaidyanathan R, Xie Y, Allaire JJ, Cheng J, Russell K. RStudio. htmlwidgets: HTML Widgets for R. 2015. R package version 0.5.
Verhaak RGW, Sanders MA, Bijl MA, Delwel R, Horsman S, Moorhouse MJ, van der Spek PJ, Lowenberg B, Valk PJM. HeatMapper: powerful combined visualization of gene expression profile correlations, genotypes, phenotypes and sample characteristics. BMC Bioinforma. 2006; 7:337.
Wickham H, Francois R. RStudio. dplyr: A Grammar of Data Manipulation. 2015. R package version 0.4.3.
Wu HM, Tien YJ, Chen CH. GAP: A Graphical Environment for Matrix Visualization and Cluster Analysis. Comput Stat Data Anal. 2010; 54:767–78.
Young MD, Wakefield MJ, Smyth GK, Oshlack A. Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biol. 2010; 11:R14.
Acknowledgements
BBK dedicates this work to the memory of his uncle, Taras Khomchuk. BBK wishes to acknowledge the financial support of the United States Department of Defense (DoD) through the National Defense Science and Engineering Graduate Fellowship (NDSEG) Program: this research was conducted with Government support under and awarded by DoD, Army Research Office (ARO), National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a. CW thanks Vytas Dargis-Robinson for assistance in early stages of the project.
Funding
Funding provided to BBK from the United States Department of Defense (DoD) through the National Defense Science and Engineering Graduate Fellowship (NDSEG) Program: this research was conducted with Government support under and awarded by DoD, Army Research Office (ARO), National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a.
Availability of data and materials
All source code has been made publicly available on Github at: https://github.com/Bohdan-Khomtchouk/Microscope.
Authors’ contributions
BBK conceived the study. BBK and JRH wrote the code. CW participated in the management of the source code and its coordination. BBK wrote the paper. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
This study does not involve humans, human data or animals.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Khomtchouk, B., Hennessy, J. & Wahlestedt, C. MicroScope: ChIP-seq and RNA-seq software analysis suite for gene expression heatmaps. BMC Bioinformatics 17, 390 (2016). https://doi.org/10.1186/s12859-016-1260-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12859-016-1260-x