WholeCellViz: data visualization for whole-cell models
BMC Bioinformatics volume 14, Article number: 253 (2013)
Whole-cell models promise to accelerate biomedical science and engineering. However, discovering new biology from whole-cell models and other high-throughput technologies requires novel tools for exploring and analyzing complex, high-dimensional data.
We developed WholeCellViz, a web-based software program for visually exploring and analyzing whole-cell simulations. WholeCellViz provides 14 animated visualizations, including metabolic and chromosome maps. These visualizations help researchers analyze model predictions by displaying predictions in their biological context. Furthermore, WholeCellViz enables researchers to compare predictions within and across simulations by allowing users to simultaneously display multiple visualizations.
WholeCellViz was designed to facilitate exploration, analysis, and communication of whole-cell model data. Taken together, WholeCellViz helps researchers use whole-cell model simulations to drive advances in biology and bioengineering.
Whole-cell computational models promise to predict how complex cellular behaviors such as growth and replication arise from individual molecules and their interactions. Recently, we developed the first whole-cell model of a single cell, the Gram-positive bacterium Mycoplasma genitalium. The model predicts the dynamics of every molecular species over the entire cell cycle, accounting for the specific function of every annotated gene product. The model’s simulations produce rich data containing valuable insights into cellular behavior. For example, the model’s simulations have generated new insights into cell cycle regulation, energy usage, and gene essentiality [1].
However, the large number of whole-cell model predictions - over 50 billion data points in a typical dataset - makes directly analyzing the predictions time-consuming and cumbersome. Furthermore, directly analyzing the model’s predictions requires deep knowledge of mathematical modeling, computer programming, and the unique data structures used to represent the model’s predictions.
Data visualization software is critically needed to help researchers realize the full potential of whole-cell models by enabling researchers to more quickly and efficiently analyze whole-cell model simulations. We developed WholeCellViz to enable researchers to easily visualize whole-cell model predictions. WholeCellViz provides researchers interactive animations as well as time series plots to easily explore whole-cell model predictions. Furthermore, WholeCellViz facilitates comparisons within and across simulations by enabling researchers to view grids of animations and plots.
Interactive data visualization is becoming increasingly important as biological data continues to grow in complexity and volume. Data visualization can help scientists identify subtle patterns in large data sets, leading to important scientific findings. For example, Lum et al. used Iris to visualize genetic data from 272 breast cancer patients [2]. Iris revealed a specific genetic profile for women with low estrogen receptor expression but high survival rates, a group which now receives targeted treatment for breast cancer. Shannon et al. used Cytoscape to visually link biomolecular networks with high-throughput data on various molecular states and functional annotations [3]. Baliga et al. used Cytoscape to obtain a systems-level understanding of Halobacterium energy transduction by visualizing its protein interaction network [4]. Pathway Tools enables researchers to visually integrate genomic, proteomic, and metabolomic data [5]. Chang et al. and Paley et al. used the Pathway Tools Omics Viewer to investigate the role of individual metabolic networks in bacterial infection [6, 7]. MulteeSum was developed to visualize three-dimensional gene expression data, and has been used to gain insight into Drosophila development [8, 9].
Here we describe WholeCellViz’s implementation, features, and visualizations. We also provide two examples of how WholeCellViz can be used to analyze whole-cell model predictions.
Back-end storage server
Our whole-cell model software stores the predicted values of each biological variable at each time point using a set of MATLAB data files. We converted these data into the JSON format using custom Python scripts. We stored the metadata for each simulation, and the label and units for each data point, in a MySQL database. The WholeCellViz front-end requests metadata and JSON file(s) from the back-end server as needed to display visualizations.
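As a rough illustration of the conversion step, the sketch below serializes one time series, together with its label and units, to JSON. It is a minimal sketch under stated assumptions, not the authors' conversion script: the function name, the field names (`label`, `units`, `data`), and the example numbers are all hypothetical, and reading the MATLAB files themselves is left out.

```python
import json

def series_to_json(label, units, times, values):
    """Serialize one predicted time series, with its label and units, to JSON.

    The field names used here are hypothetical; the paper does not specify
    the JSON schema used by WholeCellViz.
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    record = {
        "label": label,
        "units": units,
        # Store the series as [time, value] pairs.
        "data": [[t, v] for t, v in zip(times, values)],
    }
    return json.dumps(record)

# Placeholder numbers for illustration only, not model output.
doc = series_to_json("ATP", "molecules", [0, 1, 2], [10.0, 12.5, 11.0])
```

Keeping the label and units next to the values means a plotting front end could annotate its axes without a second lookup, although the paper describes storing these in the database instead.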
Graphical user interface
The visualizations were implemented using an extensible framework designed to enable additional visualizations to be easily added to WholeCellViz. Specifically, each visualization extends a common class by defining methods for requesting and displaying data. The source code contains a template for constructing additional visualizations.
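The pattern of a common class with one method for requesting data and one for displaying it can be sketched as follows. The sketch is written in Python purely for illustration, since WholeCellViz's front end runs in the browser; the class names, method names, and the `fake_fetch` stand-in for a server call are assumptions, not the actual WholeCellViz API.

```python
from abc import ABC, abstractmethod

class Visualization(ABC):
    """Hypothetical common base class for all visualizations."""

    def __init__(self, fetch):
        # `fetch` maps a request description to data; it stands in for
        # a call to the back-end server.
        self.fetch = fetch
        self.data = None

    @abstractmethod
    def data_request(self):
        """Describe the data this visualization needs."""

    @abstractmethod
    def draw(self, time_index):
        """Render the frame for one time point."""

    def load(self):
        self.data = self.fetch(self.data_request())

class CopyNumberText(Visualization):
    """Toy visualization that 'draws' a copy number as a text line."""

    def __init__(self, fetch, label):
        super().__init__(fetch)
        self.label = label

    def data_request(self):
        return {"label": self.label}

    def draw(self, time_index):
        return f"{self.label} = {self.data[time_index]}"

def fake_fetch(request):
    # Placeholder values for illustration only, not model output.
    store = {"ATP": [10, 12, 11]}
    return store[request["label"]]

panel = CopyNumberText(fake_fetch, "ATP")
panel.load()
frame = panel.draw(1)
```

Under this design, adding a visualization only requires a new subclass; the grid and playback code can treat every panel uniformly through `load` and `draw`.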
We developed the time series plots using the Flot (http://www.flotcharts.org) plotting library. We used the jQuery and jQuery UI (http://jqueryui.com) libraries to implement WholeCellViz’s grid layout and animation controls.
Results and discussion
We developed WholeCellViz to accelerate data-driven discovery by visualizing whole-cell model simulation data. WholeCellViz uses simulation data to render 14 visualizations that display model predictions in their biological context. Time series plots supplement the visualizations by showing the detailed dynamics of one or multiple biological variables over time. WholeCellViz lays out these visualizations in an easily configurable grid. The animation timeline controls the simultaneous playback of all displayed animations in the grid. Hence, WholeCellViz is able to simultaneously visualize and animate multiple model predictions.
Figure 1 is a sample screenshot of WholeCellViz. We use this figure to describe the features of WholeCellViz.
WholeCellViz contains 14 visualizations that animate specific model predictions within their biological context. These visualizations are listed in Table 1 and illustrated in Figures 1 and 2. Together, these 14 visualizations are capable of displaying 88% of the model’s predictions. These visualizations are also interactive. For example, hovering over the metabolism visualization (Figure 1b) reveals tooltips which display metabolite names, compartments, and concentrations. The gene expression panel’s tooltips display gene names, descriptions, and instantaneous copy numbers (Figure 1c). Clicking on a gene in the translation panel (Figure 1f) opens a new tab which displays the gene’s entry in the WholeCellKB model organism database [10].
Time series plots
WholeCellViz can also display line plots showing the values of one or multiple biological variables over time. For example, the middle-left panel of Figure 3 illustrates the temporal dynamics of the intracellular ATP copy number. Time series plots can also display the dynamics of biological variables across simulations, facilitating comparisons across simulations.
The animation timeline at the bottom of the screen controls the simultaneous playback of all displayed visualizations. It provides play/pause, seek, speed, and repeat controls.
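One way to keep several panels in step is a single shared clock that redraws every registered panel at the same frame. The following sketch illustrates that idea with play/pause, seek, speed, and repeat controls. It is an assumption about how such a timeline could be structured, written in Python for illustration; the `Timeline` class and its methods are hypothetical and do not come from the WholeCellViz source.

```python
class Timeline:
    """Hypothetical shared clock that drives every registered panel."""

    def __init__(self, n_frames, speed=1, repeat=False):
        self.n_frames = n_frames
        self.speed = speed      # frames advanced per tick
        self.repeat = repeat    # wrap around at the end instead of stopping
        self.frame = 0
        self.playing = False
        self.panels = []        # draw callbacks, one per panel

    def register(self, draw):
        self.panels.append(draw)

    def play(self):
        self.playing = True

    def pause(self):
        self.playing = False

    def seek(self, frame):
        # Clamp to the valid range, then redraw all panels.
        self.frame = max(0, min(frame, self.n_frames - 1))
        self._render()

    def tick(self):
        """Advance one step; a real UI would call this from a timer."""
        if not self.playing:
            return
        nxt = self.frame + self.speed
        if nxt >= self.n_frames:
            if self.repeat:
                nxt %= self.n_frames
            else:
                nxt = self.n_frames - 1
                self.playing = False
        self.frame = nxt
        self._render()

    def _render(self):
        # Every panel receives the same frame index.
        for draw in self.panels:
            draw(self.frame)

seen_a, seen_b = [], []
timeline = Timeline(n_frames=5, speed=2)
timeline.register(seen_a.append)
timeline.register(seen_b.append)
timeline.play()
for _ in range(4):
    timeline.tick()
```

Because the panels never keep their own clocks, they cannot drift apart, which is the property that matters when comparing predictions across panels.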
The layout editor is accessed by clicking the gear icon in the top-right corner of the visualization panels. The layout editor enables users to configure the grid dimensions and select the visualization or time series plot displayed in each panel.
Users can visualize data from any server running the server-side WholeCellViz software. The hosted version at http://wholecellviz.stanford.edu provides the more than 3,000 simulations described in Karr et al., 2012 [1]. Users can install the whole-cell model and WholeCellViz server software on their own machines, or use the whole-cell Linux virtual machine to execute and visualize new simulations. See below for more information about availability.
Graphical & data export
WholeCellViz exports the plotted data in JSON format and exports graphics in SVG format.
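To make the SVG export concrete, the sketch below renders a time series as a minimal SVG polyline. It only illustrates what exporting a plot as vector graphics involves; the function name, canvas size, and styling are assumptions, and WholeCellViz's actual export will differ (for example, by including axes and labels).

```python
def series_to_svg(values, width=200, height=100):
    """Render a time series as a minimal SVG polyline (no axes or labels)."""
    if len(values) < 2:
        raise ValueError("need at least two points")
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0           # avoid dividing by zero for flat series
    step = width / (len(values) - 1)  # horizontal distance between points
    # SVG's y axis points down, so larger values get smaller y coordinates.
    points = " ".join(
        f"{i * step:.1f},{height - (v - lo) / span * height:.1f}"
        for i, v in enumerate(values)
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
        f'<polyline fill="none" stroke="black" points="{points}"/></svg>'
    )

# Placeholder values for illustration only, not model output.
svg = series_to_svg([0.0, 5.0, 10.0])
```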
Data exploration using WholeCellViz
WholeCellViz can display multiple visualization panels to facilitate comparative and simultaneous analysis of multiple aspects of simulated cell physiology. In particular, WholeCellViz provides six preconfigured views to help users quickly get started. Each of the six views is a grid of visualizations selected to represent a particular aspect of cellular or population dynamics. These views enable users to explore hypotheses about the data. Here we discuss two case studies to illustrate the power of WholeCellViz to facilitate data exploration.
Figure 3 shows a screenshot of the replication dynamics view. This view displays several perspectives on DNA replication and cytokinesis: cell shape, chromosome dynamics, cytokinesis, replication initiation, and dNTP copy number. First, the view shows that before replication initiates the cell contains a single chromosome and steadily accumulates an increasingly large pool of dNTPs. Second, the view shows that once a sufficiently large oriC DNA complex forms, replication begins, accompanied by a sharp drop in the dNTP level. Third, the view shows that replication then proceeds quickly until the dNTP supply is depleted, at which point the rate of replication slows. Finally, the view shows that the FtsZ ring contracts immediately following replication completion.
Figure 4 shows a screenshot of the population variance view. This view presents summary statistics - growth rate, ATP copy number, dNTP copy number, DNA mass, RNA mass, and protein mass - for eight wild-type in silico cells. The view shows that the growth rate, ATP copy number, RNA mass, and protein mass have relatively little variance at the population level. The dNTP copy number and DNA mass have substantially more variance. In three simulations, the dNTP copy number is depleted more than two hours earlier than in the other simulations, and the DNA mass increases earlier in these simulations. This suggests that the timing of DNA replication initiation does not impact the cellular growth rate, ATP copy number, RNA mass, protein mass, or cell cycle length. Rather, the view suggests that metabolism is the primary factor controlling and coordinating the cell’s growth, chemical content, and division.
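The visual comparison of population-level variance described above could be complemented by a simple numerical summary. The sketch below ranks variables by their coefficient of variation (standard deviation divided by mean) across simulations, sampled at a single time point. This is an illustrative assumption about one possible analysis, not a method used in the paper, and the numbers are placeholders rather than model output.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return pstdev(values) / m

# Placeholder values for illustration only, not model output:
# one value per simulated cell, sampled at the same time point.
samples = {
    "protein mass": [10.0, 10.2, 9.8, 10.0],
    "dNTP copy number": [100.0, 400.0, 50.0, 250.0],
}

cv = {name: coefficient_of_variation(v) for name, v in samples.items()}
# Variables ordered from most to least variable across the population.
ranked = sorted(cv, key=cv.get, reverse=True)
```

Dividing by the mean makes variables with different units comparable, which matters when masses and copy numbers are ranked side by side.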
WholeCellViz is a web-based program designed to facilitate exploration and analysis of in silico biological experiments of whole-cell models. The software enables users to fully explore whole-cell model simulations, and displays whole-cell model predictions in their biological context using visualizations and time series plots. Furthermore, WholeCellViz’s grid layout feature enables users to display multiple visualizations and plots, enabling comparative analysis both within and across in silico cells.
Going forward, we plan to improve WholeCellViz as a tool for novel model analysis. We plan to develop new visualizations to communicate additional model predictions including DNA supercoiling and RNA and protein maturation. We also plan to develop enhanced plotting tools for detecting complex relationships among model predictions and analyzing stochastic variation. For example, scatter plots could be used to drill down to specific time points and examine correlations among multiple variables in a single simulation, or among one variable across multiple simulations. Box plots could be used to compare the variance of variables across simulations.
To date, only one whole-cell model has been developed. Consequently, we chose to focus WholeCellViz on the more than 3,000 M. genitalium simulations described in Karr et al., 2012 [1]. Going forward, we plan to integrate WholeCellViz with other whole-cell models and simulation data servers as they become publicly available. Currently, users can visualize alternative whole-cell model simulations by (1) running their own simulations using either our M. genitalium model or a similarly detailed model, (2) storing their simulations on their own server using the hybrid MySQL/JSON format described here, and (3) editing the back-end server URL configuration option from the WholeCellViz front-end. Researchers can achieve this either by installing the whole-cell model and WholeCellViz software on their own machine or by using our Linux virtual machine which contains both the whole-cell model and WholeCellViz software (see below for more information about availability). In the future, we also plan to enable researchers to configure and run whole-cell simulations through a simple graphical interface within WholeCellViz. However, this will require the development of more computationally efficient whole-cell model simulations.
Overall, whole-cell modeling is an emerging field that has the potential to accelerate the pace of biological discovery and enable rational bioengineering and personalized medicine. Data visualization software such as WholeCellViz is critically needed to help researchers access, explore, and analyze complex, high-dimensional whole-cell model simulations, as well as to accelerate model-driven biological discovery. With the current influx of big data in research and industry, WholeCellViz also serves as an example of how to use animation for scientific communication. We anticipate that WholeCellViz will play a critical role in realizing the full potential of whole-cell models.
Availability and requirements
Project name: WholeCellViz
Project home page: http://wholecellviz.stanford.edu
Operating system(s): Platform independent
Other requirements: Web browser
License: MIT license
Any restrictions to use by non-academics: None
WholeCellViz is available under the MIT license at http://wholecellviz.stanford.edu. The hosted version visualizes the more than 3,000 simulations described in Karr et al., 2012 [1], and is also capable of visualizing simulations stored on other servers running the WholeCellViz server-side software. Researchers can install the whole-cell model and WholeCellViz software locally to execute and visualize new simulations. All source code is available open-source at SimTK: http://simtk.org/home/wholecell. A Linux virtual machine containing the whole-cell model and WholeCellViz server and client software is also available at SimTK.
Abbreviations
HTML: Hypertext markup language
oriC: Origin of replication
PHP: PHP: hypertext preprocessor
SVG: Scalable vector graphics
URL: Uniform resource locator
XML: Extensible markup language.
1. Karr JR, Sanghvi JC, Macklin DN, Gutschow MV, Jacobs JM, Bolival B, Assad-Garcia N, Glass JI, Covert MW: A whole-cell computational model predicts phenotype from genotype. Cell. 2012, 150: 389-401. 10.1016/j.cell.2012.05.044.
2. Lum PY, Paquette J, Singh G, Carlsson G: Patient Stratification using Topological Data Analysis and Iris. http://www.ayasdi.com/_downloads/Patient_Stratification_using_Topological_Data_Analysis.pdf
3. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13: 2498-2504. 10.1101/gr.1239303.
4. Baliga NS, Pan M, Goo YA, Yi EC, Goodlett DR, Dimitrov K, Shannon P, Aebersold R, Ng WV, Hood L: Coordinate regulation of energy transduction modules in Halobacterium sp. analyzed by a global systems approach. Proc Natl Acad Sci U S A. 2002, 99: 14913-14918. 10.1073/pnas.192558999.
5. Karp PD, Paley S, Romero P: The pathway tools software. Bioinformatics. 2002, 18: S225-S232. 10.1093/bioinformatics/18.suppl_1.S225.
6. Chang DE, Smalley DJ, Tucker DL, Leatham MP, Norris WE, Stevenson SJ, Anderson AB, Grissom JE, Laux DC, Cohen PS, Conway T: Carbon nutrition of Escherichia coli in the mouse intestine. Proc Natl Acad Sci U S A. 2004, 101: 7427-7432. 10.1073/pnas.0307888101.
7. Paley SM, Karp PD: The pathway tools cellular overview diagram and omics viewer. Nucleic Acids Res. 2006, 34: 3771-3778. 10.1093/nar/gkl334.
8. Meyer M, Munzner T, DePace A, Pfister H: MulteeSum: a tool for comparative spatial and temporal gene expression data. IEEE Trans Vis Comput Graph. 2010, 16: 908-917.
9. Fowlkes CC, Eckenrode KB, Bragdon MD, Meyer M, Wunderlich Z, Simirenko L, Luengo Hendriks CL, Keränen SV, Henriquez C, Knowles DW, Biggin MD, Eisen MB, DePace AH: A Conserved Developmental Patterning Network Produces Quantitatively Different Output in Multiple Species of Drosophila. PLoS Genet. 2011, 7: e1002346. 10.1371/journal.pgen.1002346.
10. Karr JR, Sanghvi JC, Macklin DN, Arora A, Covert MW: WholeCellKB: model organism databases for comprehensive whole-cell models. Nucleic Acids Res. 2013, 41: D787-D792. 10.1093/nar/gks1108.
We thank Elsa Birch, Derek Macklin, Jane Maynard, Nick Ruggero, and Jayodita Sanghvi for helpful discussions on data analysis and visualization.
This work was supported by an NIH Director’s Pioneer Award (5DP1LM01150-05), an Allen Distinguished Investigator Award, and a Hellman Faculty Scholarship to M.W.C.; a Stanford Bioengineering REU scholarship to R.L.; and NDSEG, NSF, and Stanford Graduate Fellowships to J.R.K.
The authors declare that they have no competing interests.
RL and JRK contributed equally to the conception and development of WholeCellViz. MWC supervised the project. All authors wrote and approved the final manuscript.
Ruby Lee and Jonathan R Karr contributed equally to this work.