Methods for visual mining of genomic and proteomic data atlases
© Boyle et al; licensee BioMed Central Ltd. 2012
Received: 23 September 2011
Accepted: 23 April 2012
Published: 23 April 2012
As the volume, complexity and diversity of the information that scientists work with on a daily basis continues to rise, so too does the requirement for new analytic software. The analytic software must resolve the tension between the need to allow for a high level of scientific reasoning and the requirement for an intuitive, easy-to-use tool that does not require specialist, and often arduous, training. Information visualization provides a solution to this problem, as it allows for direct manipulation of and interaction with diverse and complex data. The challenge facing bioinformatics researchers is how to apply this knowledge to data sets that are continually growing in a field that is rapidly changing.
This paper discusses an approach to the development of visual mining tools capable of supporting the mining of massive data collections used in systems biology research, and also discusses lessons that have been learned providing tools for both local researchers and the wider community. Example tools were developed to enable the exploration and analysis of both proteomics and genomics based atlases. These atlases represent large repositories of raw and processed experimental data generated to support the identification of biomarkers through mass spectrometry (the PeptideAtlas) and the genomic characterization of cancer (The Cancer Genome Atlas). Specifically, the tools are designed to allow for: the visual mining of thousands of mass spectrometry experiments, to assist in designing informed targeted protein assays; and the interactive analysis of hundreds of genomes, to explore the variations across different cancer genomes and cancer types.
The mining of massive repositories of biological data requires the development of new tools and techniques. Visual exploration of the large-scale atlas data sets allows researchers to mine data to find new meaning and make sense at scales from single samples to entire populations. Providing linked task specific views that allow a user to start from points of interest (from diseases to single genes) enables targeted exploration of thousands of spectra and genomes. As the composition of the atlases changes, and our understanding of the biology increases, new tasks will continually arise. It is therefore important to provide the means to make the data available in a suitable manner in as short a time as possible. We have done this through the use of common visualization workflows, into which we rapidly deploy visual tools. These visualizations follow common metaphors where possible to assist users in understanding the displayed data. Rapid development of tools and task specific views allows researchers to mine large-scale data almost as quickly as it is produced. Ultimately these visual tools enable new inferences, new analyses and further refinement of the large scale data being provided in atlases such as PeptideAtlas and The Cancer Genome Atlas.
Systems biology is a field that relies on both technical and scientific innovations. The technical innovations enable new scientific questions to be asked, and these in turn make further demands for advances in technology. High throughput experimentation has been the main driving force behind these advances, and has primarily encompassed measurement types.
Two measurement types that have seen a vast increase in utility and volume are high-throughput sequencing (HTS) and mass spectrometry based proteomics. The dramatic increase in the volume of data is due to changes in instrumentation. In proteomics the adoption of new techniques, principally targeted approaches and high resolution instruments, means that there is a need to capture and mine vast quantities of high resolution spectra to enable the design of new assays. In genomics the cost of HTS is now at a level where populations of genomes and transcriptomes can be captured, and this is being done in a number of projects (e.g. The Cancer Genome Atlas, 1000 Genomes, International Cancer Genome Consortium). This paper outlines an approach to the development of visual tools that allow for the direct mining and usage of data derived from these two technologies. Development of such tools requires an understanding of both the scale of data and the typical needs of the user in any exploration. The basic approach involves providing interactive high level overviews of the data, and then allowing for selection of, and drill down into, smaller data sets. Separate visualization tools are used at each level of data exploration and linked to enable users to quickly move between data views. The fast moving pace of research means that these tools must be put in place quickly, and so they have been built on top of a series of rapid application development technologies, and are delivered as web applications.
The example tools support two major systems biology projects, the PeptideAtlas and The Cancer Genome Atlas (TCGA). The PeptideAtlas encompasses SRM (Selected Reaction Monitoring) data across multiple species as well as shotgun based identifications, and the TCGA is a multi-institution effort to genomically characterize ten thousand cancer genomes across 20 different cancers.
High throughput visualization tools are required to allow for the exploration of large data sets. The data sets in question consist of thousands of genomic sequences and protein mass spectra. As these atlases are relatively new, work to provide visual mining is in its infancy; however, there is a large body of work on network and gene expression visualization that is being adapted and learned from.
Systems biology generally requires the integrated analysis of different data types [3, 4]. In systems biology the majority of information visualization has tended to focus on direct representations of networks. This is because networks are often used to describe the dynamics of living systems (as an integrated and interacting network of genes, proteins and biochemical reactions). Network visualization has been studied in a large number of disciplines (e.g. software visualization, including complex dynamics of systems [6, 7] and the interactions of components [8, 9]). The interest in networks and molecular interactions has resulted in the progression of network visualization techniques, in particular the portrayal of the complexities of relationship characteristics using a number of edge techniques (e.g. edge bundles and edge lenses). Additionally, the context in which parts of the network exist, either through the discovery of motifs or through semantic similarities, has been used to reduce the graph's visual complexity (e.g. different levels of focus on a network, use of magic/document lens, and provision of identified landmarks to aid navigation [16, 17]). These ideas are being applied to visualizations of systems biology networks [18, 19].
Alternative metaphors for the representation of complex data have also been explored. Gene expression array based experiments have provided a rich area for the development of visual tools. In particular, visualization of gene expression data has extended a number of popular n-dimensional data techniques: projecting high dimensional data down to two dimensions (e.g. pair-wise scatter plots and parallel coordinates); encoding aspects of the data onto intrinsic and (non-positional) extrinsic properties (e.g. Spotfire); and using dimension reduction techniques, which transform the data onto a small number of dimensions (e.g. PCA, best-fit approaches [23–25]). Innovations have also arisen from these investigations in terms of improved representation of the data and the provision of specialised visualisations which present the data in a relevant context. A number of visualization suites have been developed which combine these approaches (e.g. [28–30]).
Due to their scale and complexity, the visualizations of large repositories of genomic and proteomic data do represent new challenges; however, it is possible to use many of the general information visualization techniques. We provide details of the visualizations that have been used across data from both the PeptideAtlas and TCGA to enable users to explore the large-scale, highly dimensional data.
Thousands of genomes
The Cancer Genome Atlas (TCGA) will, over the next three years, generate 10,000 patient genomic sequences across 20 different cancers. The goal is to provide a map of large-scale genomic mutations, both between different cancers (e.g. Ovarian and Glioblastoma) and across patients within a single cancer. Using these data, maps of normal variation, disease related disruptions and disease progression can be created for further analysis. Ultimately this atlas will provide a rich set of data to enable better characterization of disease sub types and the development of targeted therapies.
The data gathered by TCGA includes both full and exon-only genome sequences, epigenetic and transcriptomic data, and clinical information (e.g. age, clinical sub type). At the scale of thousands of patients this means that providing effective ways to visually explore this data is necessary for the development of useful analyses or in targeting areas for investigation. Such exploration requires the use of visual tools specifically adapted to data exploration at multiple scales. As the aim of TCGA is to provide genomic data across entire populations of patients and diseases, visual tools must enable exploration using specific knowledge (e.g. starting from a gene of interest) as well as providing for discovery of new information.
Various analyses are being performed across this data including pathway analysis, identification of functionally significant mutations and SNVs, tissue biopsy and imaging, microRNA regulation, and gene dosage analysis (to name a few). Each of these focuses on a different set of data within the atlas. Providing a generic visualization would result in a visual tool too abstract to be easily useful. Instead a set of tools targeted toward the specific analysis and underlying data is necessary. In developing tools for the analysis of gene disruptions, specifically structural variation, a linked set of interactive visual mining tools is used to directly compare the underlying genomic data. Each tool in the set is interactive so that genomic events can be discovered through exploration and used to find further information at each level (e.g. across cancers, across patients within a cancer, within a single patient).
Millions of spectra
The PeptideAtlas repository is designed to provide a compendium of the likelihood of a given peptide being detected on mass spectrometers. The repository contains information from thousands of mass spectrometry experiments (millions of spectra) across numerous species, tissues and disease conditions. The goals in providing this repository are both to annotate genomic information with observable peptides, and to provide an integrated view on a given proteome so it can be used as the basis of Selected Reaction Monitoring (SRM) experiments. PeptideAtlas can be used to identify representative (proteotypic) peptides that are unique to an individual protein. Mining the data it contains makes it possible to identify which transition patterns could be used to uniquely identify any set of proteins in the proteome. This allows for the design of targeted proteomic experiments, where the experimenter defines a priori which proteins they wish to detect and, using the atlas, finds which specific transitions should be scanned for. Targeted approaches can monitor at low mass/charge (m/z) levels, and have been shown to detect protein concentrations at low copy number. As SRM can be used on complex tissues, a minimum of separation chemistry is needed. This means that experiments can more accurately detect smaller amounts of protein in complex samples.
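The selection of peptides that are unique to an individual protein can be illustrated with a minimal sketch. It is not the PeptideAtlas implementation: the flat list of (peptide, protein) pairs, the function name `proteotypic_peptides`, and the toy sequences and protein identifiers are all assumptions made for illustration, and a real selection would also weigh how reliably each peptide is detected.

```python
from collections import defaultdict


def proteotypic_peptides(observations):
    """Return {protein: sorted peptides observed for that protein only}.

    `observations` is an iterable of (peptide_sequence, protein_id) pairs.
    This input layout is a hypothetical simplification, not an atlas format.
    """
    proteins_by_peptide = defaultdict(set)
    for peptide, protein in observations:
        proteins_by_peptide[peptide].add(protein)

    unique = defaultdict(list)
    for peptide, proteins in proteins_by_peptide.items():
        if len(proteins) == 1:  # the peptide maps to exactly one protein
            (protein,) = proteins
            unique[protein].append(peptide)
    return {protein: sorted(peps) for protein, peps in unique.items()}


# Hypothetical toy data: "AAK" is shared by P1 and P2, so it is excluded.
toy = [
    ("LVNELTEFAK", "P1"),
    ("AAK", "P1"),
    ("AAK", "P2"),
    ("YLYEIAR", "P2"),
]
result = proteotypic_peptides(toy)
```

In this sketch a shared peptide is dropped for every protein it maps to, which mirrors the uniqueness requirement described above while ignoring detectability.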
Mining this data requires the use of a number of integration strategies and information theoretic approaches to connect the peptide data with information from genomic sources (e.g. TCGA), disease literature (e.g. MEDLINE), and pathways (e.g. IntAct, MINT). Providing visual tools that access this integrated data allows for refinement of biomarker or transition target lists from many thousands to the tens or hundreds that are detectable. The purpose is typically to identify peptides that will be suitable biomarkers for a specific disease or disease sub type. The mining tasks are rarely initiated without prior knowledge; instead they are typically initiated either through associations with other measurement types or through information in the literature.
One of the important aspects of this work is that the tools must support active research, where the data sets are continually growing and often changing in scale and complexity. This means that the requirements continually change, and this must be factored into the design of visual analytic tools and the associated software technology choice [50, 51]. In most cases information visualizations require a costly investment in terms of expertise, user feedback, and developer time. Such investment is beyond most research groups, who must put tools in place quickly, and so a simple, minimal functionality approach often needs to be adopted. Instead of focusing on a single best solution, we have found that visualizations must be used together, as each visualization has specific strengths in terms of: ability to work with different sizes of data (e.g. responsiveness); portrayal of generic aspects of information so that it can be used with multiple data sources; ease of use and understanding; and suitability for specific tasks (e.g. feature identification).
- Rapid development and deployment of visualizations, which allows for the development of tools to suit specific tasks through the use of software technologies.
- Nested task specific views, which allow for the adoption of best information visualization practices without dramatically increasing development time.
- Common understandable metaphors, which allow for acceptance, as intuitive understanding minimizes learning time.
Rapid development and deployment of visualizations
Visualization is a crucial mechanism for discovering meaningful information from research data. The high volumes, complexity and heterogeneity of the proteomics and genomic data repositories mean that representations that simply mirror the data are not appropriate. As research is not a static process, but rather an ongoing dynamic investigative endeavour, the visualizations must be able to deal with highly diverse and continually changing types of information. This means that it is difficult, or in many cases impossible, to provide one visualization that suits all users and usage. Instead, it is more effective to adopt a task based approach, where visualizations are provided for specific tasks (or analyses). If required, common visualizations can then be developed and refined. As data production and research are a constantly moving target, any tools provided to mine the data must be developed as quickly as the data is being produced.
Nested task specific views
While it is necessary to enable general exploration, researchers often need to explore the data from a particular starting point. Our visualizations are generally dedicated to a limited set of tasks, however, so providing multiple levels of linked visualization for the data becomes necessary (e.g. cross-population, whole genome, chromosomal location). Information visualization has advocated common workflows and scopes for the development of visualizations which are useful when accessing massive data sets. The macro and micro view ideas have more recently evolved into the information seeking mantra workflow (overview, filter, data on demand). Such a workflow offers a practical approach to the delivery of visualizations, so that they can be accessed using desktop and web based tools. However, development of initial "overview" or macro visualizations is not straightforward with large research data sets, and depends largely on the task that is going to be undertaken. For this reason we have typically provided a number of macro views which can be used to start filtering or analyzing the data.
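The overview, filter, data on demand workflow can be sketched as three small functions over a shared set of records, each feeding the next level of view. This is only an illustrative sketch under stated assumptions: the record layout (sample, chromosome, position, gene), the function names, and the toy values are hypothetical and do not come from the tools described in this paper.

```python
from collections import Counter

# Hypothetical event records: (sample_id, chromosome, position, gene).
EVENTS = [
    ("s1", "chr1", 1200, "GENE_A"),
    ("s2", "chr1", 1450, "GENE_A"),
    ("s2", "chr7", 55000, "GENE_B"),
    ("s3", "chr1", 9000, "GENE_C"),
]


def overview(events):
    """Macro view: number of events per chromosome."""
    return Counter(chrom for _sample, chrom, _pos, _gene in events)


def filter_region(events, chrom, start, end):
    """Filter step: keep events on `chrom` with start <= position <= end."""
    return [e for e in events if e[1] == chrom and start <= e[2] <= end]


def details(events, gene):
    """Data on demand: sorted sample ids that carry an event in `gene`."""
    return sorted({sample for sample, _chrom, _pos, g in events if g == gene})


counts = overview(EVENTS)                            # start from the macro view
region = filter_region(EVENTS, "chr1", 1000, 2000)   # narrow to a region
samples = details(region, "GENE_A")                  # drill into one gene
```

Because each step takes and returns plain records, a different macro view (for example one keyed by disease rather than chromosome) could be substituted as the starting point without changing the later steps.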
In the PeptideAtlas project the visual tools provide three different "overview" starting points to explore the underlying data. Using the mspecLINE tool (see Figure 4) users can mine literature associations through a disease of interest (e.g. breast neoplasm) and find observed peptide spectra that are linked within the literature. These spectra can then be viewed in the context of the genome or directly viewed. Using the PeptideAtlas Circos visualization, spectra can also be searched by a gene of interest or chromosomal location (see Figure 5) and then viewed or exported to other tools. Alternatively, data from other experiments (e.g. RNASeq) or network inference analysis can be imported into Cytoscape, and then information about the suitability of the associated proteins to act as biomarkers can be overlaid (see Figure 6).
Using data from TCGA analyses, a separate set of macro visualizations provides users with several methods to search through the data. Starting from the Cancer Comparator (see Figure 1), a user can explore gene disruption rates across multiple diseases. This list of genes can then be used to mine a single cancer across multiple samples within the genome visualization (see Figure 2). The genome visualization can then lead further to specific samples, where the gene of interest can be compared across tens of patient samples with the disruptions annotated (see Figure 3).
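The kind of summary that a cross-cancer comparison starts from, a disruption rate per gene per cancer, can be sketched as follows. This is a hedged illustration and not the Cancer Comparator's code: the input layout, the function name `disruption_rates`, the placeholder gene identifiers and the 0.5 threshold are all assumptions, and the toy values are not results.

```python
from collections import defaultdict


def disruption_rates(samples):
    """Return {gene: {cancer: fraction of that cancer's samples disrupted}}.

    `samples` is an iterable of (cancer_type, sample_id, disrupted_genes);
    this layout is a hypothetical simplification for illustration.
    """
    totals = defaultdict(int)                     # samples per cancer type
    hits = defaultdict(lambda: defaultdict(int))  # gene -> cancer -> count
    for cancer, _sample_id, genes in samples:
        totals[cancer] += 1
        for gene in set(genes):  # count each gene once per sample
            hits[gene][cancer] += 1
    return {
        gene: {cancer: hits[gene].get(cancer, 0) / totals[cancer]
               for cancer in totals}
        for gene in hits
    }


# Hypothetical toy cohort with placeholder gene identifiers.
cohort = [
    ("GBM", "s1", {"GENE_A", "GENE_B"}),
    ("GBM", "s2", {"GENE_A"}),
    ("OV", "s3", {"GENE_C"}),
    ("OV", "s4", {"GENE_C", "GENE_B"}),
]
rates = disruption_rates(cohort)

# Genes passed on to a single-cancer view: disrupted in at least half of
# the GBM samples (the threshold is an arbitrary illustrative choice).
gbm_genes = sorted(g for g, by_cancer in rates.items() if by_cancer["GBM"] >= 0.5)
```

Each gene then has one value per cancer, which is the shape of data that a parallel coordinates display expects, with one axis per cancer type.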
Common understandable metaphors
Information visualization provides a means for the non-expert user to mine, interrogate and interact with information at a highly conceptual level. This conceptualization is through a mental model (or metaphor) of the data which is shared by both the developer of the visualization and the end-user. The majority of information visualizations are based around the idea of providing a metaphor which is easily and immediately understandable and which enables rich interactions with complex and diverse data sets.
In the development of the visualizations discussed above it was found that immediate understanding, typically through the use of common metaphors or positive knowledge transfer, was an important facet of the success of the visualization in portraying information. Rapid understanding and user acceptance of a visualization is important, as it allows scientists to immediately understand what is being portrayed. For example, the Circos plot suffers from problems associated with the use of atypical non-rectangular interactions, which make it more difficult to use standard mouse drag based operations. However, the familiarity of the metaphor means that people are willing to accept these limitations, as they both understand the visual encoding of the information and are familiar with the layout. Conversely, other visualizations, such as the structural variation visualization, are less easy to immediately understand and so general acceptance is limited. In our experience, easy to understand visualizations are those that are used widely, as the researchers themselves typically seek to apply the visualization to other data sets. The metaphors that are familiar are frequently those that are popularly and generically used in visualization (e.g. parallel coordinates) or those that are commonly used in a specific domain (e.g. pathway diagrams, circular genome plots).
Biology is a big data science with the added complexity that there is no clear understanding as to how the data may be used. Supporting the scale of data that is being generated has led to the development of a number of large scale repositories, and the need to provide visualization tools to mine this data. This paper discusses work that has been undertaken to provide such visual mining tools, and also discusses lessons that have been learned providing tools for both local researchers and the wider community.
We have developed a number of bespoke visual tools, but have preferred to adopt more commonly used designs. In this paper we have discussed a number of visualizations, including: an interactive Circos viewer with context sensitive zooming provided through the track viewer, which shows genomic features and their interactions; Parallel Coordinates, which is used to show and analyze comparisons of genes that are disrupted in different cancers; a table based view, for exploring proteins which are associated with specific diseases and are detectable; and a gene rearrangement viewer, which shows the complexity of localized rearrangements that have been identified through anomalies in read-pairs. We have found that delivery using web technologies is preferable, both due to the low admin requirements and the diverse community using the data. These visualizations each have advantages and disadvantages. The macro views, such as Circos and the parallel coordinates, are relatively easy to understand and use as they provide a high level overview. The nested views, such as the track and gene rearrangement views, tend to be more specialized and therefore require some level of learning. Interactivity suffers with the high level views due to the number of items being displayed; this is especially true given the limitations of web based delivery. Where possible, principles of information visualization have been adopted which have been used extensively elsewhere to visualize biological data (e.g. context sensitive displays, multiple encoding of information using intrinsic and extrinsic properties, boundaries and brushing [18, 29, 55, 56]). However, practicalities due to the demands of research (i.e. short time scales, small development teams) mean that good design always has to be weighed against rapidly delivering multiple visualizations with the required functionality.
Visual exploration of the large-scale atlas data sets produced by the PeptideAtlas project and in support of TCGA analysis allows researchers to mine data to find new meaning and make sense at scales from single samples to entire populations. Providing task specific views that allow a user to start from points of interest (from diseases to single genes) allows targeted exploration of thousands of spectra and genomes. As the composition of the atlases changes, and our understanding of the biology increases, new tasks will continually arise. It is therefore important to provide the means to make the data available in a suitable manner in as short a time as possible. We have done this through the use of common visualization workflows, into which we rapidly deploy visual tools which follow common metaphors where possible to assist in understanding. Rapid development of tools and task specific views allows researchers to mine large-scale data almost as quickly as it is produced. Ultimately these visual tools enable new inferences, new analyses and further refinement of atlas level data.
This work was supported by grants U24CA143835 and R01CA137442 from the National Cancer Institute, P50GM076547 and R01GM087221 from the National Institute of General Medical Sciences, and NIH contract HHSN272200700038C from the National Institute of Allergy and Infectious Diseases. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
- Desiere F, et al.: The PeptideAtlas project. Nucleic Acids Res 2006, 34(Database):D655-D658.
- McLendon R, et al.: Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 2008, 455(7216):1061–1068. doi:10.1038/nature07385
- Santamaria R, et al.: Systems biology of infectious diseases: a focus on fungal infections. Immunobiology 2011, 216(11):1212–1227. doi:10.1016/j.imbio.2011.08.004
- Ideker T, Galitski T, Hood L: A new approach to decoding life: systems biology. Annu Rev Genomics Hum Genet 2001, 2:343–372. doi:10.1146/annurev.genom.2.1.343
- Suderman M, Hallett M: Tools for visually exploring biological networks. Bioinformatics 2007, 23(20):2651–2659. doi:10.1093/bioinformatics/btm401
- De Pauw W, Vlissides J, Wegman M: Execution Patterns in Object-Oriented Visualization, in Conference on Object-Oriented Technologies and Systems. 1998.
- Topol B, Sunderam V: PVaniM: A Tool for Visualization in Network Computing Environments. Concurrency: Practice & Experience 1998, 10.
- van Ham F: Using Multilevel Call Matrices in Large Software Projects, in Information Visualization. 2003.
- Aftandilian E, et al.: Heapviz: interactive heap visualization for program understanding and debugging, in International Symposium on Software Visualization. 2010.
- Herman I, Marshall M: Graph Visualization and Navigation in Information Visualization: a Survey. IEEE Transactions on Visualization and Computer Graphics 2000, 6(1).
- Holten D: Hierarchical Edge Bundles: Visualization of Adjacency Relations in Hierarchical Data. IEEE Transactions on Visualization and Computer Graphics 2006, 12(5).
- Wong N, Greenberg S: EdgeLens: An Interactive Method for Managing Edge Congestion in Graphs, in Information Visualization. 2003.
- Shneiderman B, Aris A: Network Visualization by Semantic Substrates. IEEE Transactions on Visualization and Computer Graphics 2006, 12(5).
- Gansner E, Koren Y, North S: Topological fisheye views for visualizing large graphs. IEEE Transactions on Visualization and Computer Graphics 2005, 11(4).
- Bier E, et al.: Toolglass and magic lenses: the see-through interface, in Computer graphics and interactive techniques. 1993.
- Plaisant C, Grosjean J, Bederson B: SpaceTree: supporting exploration in large node link tree, design evolution and empirical evaluation, in Information Visualization. 2002.
- White R, et al.: Supporting Exploratory Search. Communications of the ACM 2006, 49.
- Barsky A, et al.: Cerebral: a Cytoscape plugin for layout of and interaction with biological networks using subcellular localization annotation. Bioinformatics 2007, 23(8).
- Krzywinski M, et al.: Hive plots - rational approach to visualizing networks. Briefings in Bioinformatics 2011. doi:10.1093/bib/bbr069
- Becker R, Cleveland W, Wilks A: Dynamic Graphics for Data Analysis. Statistical Science 1987, 2(4).
- Inselberg A: The Plane with Parallel Coordinates. The Visual Computer 1985, 1.
- Ahlberg C: Spotfire: an information exploration environment. ACM SIGMOD 1996, 25.
- Zhang L, Zhang A, Ramanathan M: VizStruct: exploratory visualization for gene expression profiling. Bioinformatics 2004, 20(1):85–92. doi:10.1093/bioinformatics/btg377
- Komura D, et al.: Multidimensional support vector machines for visualization of gene expression data. Bioinformatics 2005, 21(4):439–444. doi:10.1093/bioinformatics/bti188
- Santamaria R, Theron R, Quintales L: BicOverlapper: a tool for bicluster visualization. Bioinformatics 2008, 24(9):1212–1213. doi:10.1093/bioinformatics/btn076
- Eilers PH, Goeman JJ: Enhancing scatterplots with smoothed densities. Bioinformatics 2004, 20(5):623–628. doi:10.1093/bioinformatics/btg454
- Kim J, et al.: ChromoViz: multimodal visualization of gene expression data onto chromosomes using scalable vector graphics. Bioinformatics 2004, 20(7):1191–1192. doi:10.1093/bioinformatics/bth052
- Saeed AI, et al.: TM4 microarray software suite. Methods Enzymol 2006, 411:134–193.
- Boyle J: SeqExpress: desktop analysis and visualization tool for gene expression experiments. Bioinformatics 2004, 20(10):1649–1650. doi:10.1093/bioinformatics/bth123
- Seo J, Gordish-Dressman H, Hoffman EP: An interactive power analysis tool for microarray hypothesis testing and generation. Bioinformatics 2006, 22(7):808–814. doi:10.1093/bioinformatics/btk052
- Inselberg A, Dimsdale B: Parallel coordinates: a tool for visualizing multi-dimensional geometry, in Information Visualization. 1990.
- Chen K, et al.: BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat Methods 2009, 6(9):677–681. doi:10.1038/nmeth.1363
- Krzywinski M, et al.: Circos: an information aesthetic for comparative genomics. Genome Res 2009, 19(9):1639–1645. doi:10.1101/gr.092759.109
- Davy BE, Robinson ML: Congenital hydrocephalus in hy3 mice is caused by a frameshift mutation in Hydin, a large novel gene. Hum Mol Genet 2003, 12(10):1163–1170. doi:10.1093/hmg/ddg122
- Dawe HR, et al.: The hydrocephalus inducing gene product, Hydin, positions axonemal central pair microtubules. BMC Biol 2007, 5:33. doi:10.1186/1741-7007-5-33
- Lukk M, et al.: A global map of human gene expression. Nat Biotechnol 2010, 28(4):322–324. doi:10.1038/nbt0410-322
- Tanaka M, et al.: Identification of candidate cooperative genes of the Apc mutation in transformation of the colon epithelial cell by retroviral insertional mutagenesis. Cancer Sci 2008, 99(5):979–985. doi:10.1111/j.1349-7006.2008.00757.x
- Parsons DW, et al.: An integrated genomic analysis of human glioblastoma multiforme. Science 2008, 321(5897):1807–1812. doi:10.1126/science.1164382
- Cerami E: 2009. [http://cbio.mskcc.org/tcga-generanker/index.jsp]
- Sjoblom T, et al.: The consensus coding sequences of human breast and colorectal cancers. Science 2006, 314(5797):268–274. doi:10.1126/science.1133427
- D'Angelo A, Franco B: The dynamic cilium in human diseases. PathoGenetics 2009, 2(1):3. doi:10.1186/1755-8417-2-3
- Masica DL, Karchin R: Correlation of somatic mutation and expression identifies genes important in human glioblastoma progression and survival. Cancer Res 2011, 71(13):4550–4561. doi:10.1158/0008-5472.CAN-11-0180
- Anderson L, Hunter CL: Quantitative mass spectrometric multiple reaction monitoring assays for major plasma proteins. Molecular & Cellular Proteomics 2006, 5(4):573–588.
- Aranda B, et al.: The IntAct molecular interaction database in 2010. Nucleic Acids Res 2010, 38(Database):D525-D531.
- Licata L, et al.: MINT, the molecular interaction database: 2012 update. Nucleic Acids Res 2012, 40(Database):D857-D861.
- Handcock J, Deutsch EW, Boyle J: mspecLINE: bridging knowledge of human disease with the proteome. BMC Med Genomics 2010, 3:7. doi:10.1186/1755-8794-3-7
- Brusniak MY, et al.: ATAQS: a computational software tool for high throughput transition optimization and validation for selected reaction monitoring mass spectrometry. BMC Bioinformatics 2011, 12:78. doi:10.1186/1471-2105-12-78
- Shannon P, et al.: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003, 13(11):2498–2504. doi:10.1101/gr.1239303
- Killcoyne S, Boyle J: Managing Chaos: lessons learned developing software in the life sciences. Computing in Science & Engineering 2009, 11(6):20–29.
- Boyle J, et al.: Systems biology driven software design for the research enterprise. BMC Bioinformatics 2008, 9:295. doi:10.1186/1471-2105-9-295
- Boyle J, et al.: Adaptable data management for systems biology investigations. BMC Bioinformatics 2009, 10:79. doi:10.1186/1471-2105-10-79
- Bostock M, Heer J: Protovis: a graphical toolkit for visualization. IEEE Trans Vis Comput Graph 2009, 15(6):1121–1128.
- Tufte E: Envisioning Information. 1990.
- Shneiderman B: The eyes have it: A task by data type taxonomy for information visualizations. IEEE Visual Languages 1996.
- Maere S, Heymans K, Kuiper M: BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 2005, 21(16):3448–3449. doi:10.1093/bioinformatics/bti551
- Vlasblom J, et al.: GenePro: a Cytoscape plug-in for advanced visualization and analysis of interaction networks. Bioinformatics 2006, 22(17):2178–2179. doi:10.1093/bioinformatics/btl356
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.