INTEGRATOR: interactive graphical search of large protein interactomes over the Web
© Chang et al; licensee BioMed Central Ltd. 2006
Received: 31 October 2005
Accepted: 16 March 2006
Published: 16 March 2006
The rapid growth of protein interactome data has made network analysis tools increasingly necessary. However, unlike pure text data, network search spaces are of exponential complexity. This poses special challenges for storing, searching, and navigating these data efficiently. Moreover, developing effective web interfaces for them has been difficult.
We present Integrator, a web-integrated graphical search tool for protein-protein interaction networks across 50+ genomes.
Integrator provides single and multiple protein searches of the Bioverse database containing experimentally-derived and predicted protein-protein interactions. The interface provides animated local network views, rapid subgraph manipulation, and cross-referencing of functional annotations. Integrator is available at http://bioverse.compbio.washington.edu/integrator.
High-throughput technologies that monitor cellular components on a large scale are becoming ubiquitous in the post-genomic era. An important analytical paradigm in systems biology is the molecular interaction network [1]. Networks provide an intuitive visualization of component relationships and are amenable to quantitative graph analysis. Constituents include genes, proteins, small molecules, or combinations thereof [2–5]. In particular, public repositories of protein-protein interaction (PPI) data collected from yeast two-hybrid arrays, affinity chromatography, and manual curation have grown significantly in recent years [6–8].
Building search tools that effectively navigate these interaction networks remains a significant informatics challenge. This is due in large part to the exponential complexity of the search space, rapid data turnover, decentralized storage of primary data, and diversity in data models [9]. With these challenges in mind, we present Integrator, a tool for the analysis of PPI networks using a centralized data model. Integrator combines a highly interactive network viewer that has low memory overhead with an enterprise-level application server back-end that accesses data from the Bioverse project [10, 11]. This database contains a large collection of experimentally derived and predicted PPI data for over 50 genomes, generated in part by applying the Interolog prediction method [12]. Interologs are interactions predicted between proteins in one species from experimental interaction evidence in another species, based on the sequence homology between the proteins of the two species. Such predictions have been used to extrapolate novel functional annotations for previously unannotated proteins with high accuracy [13].
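The Interolog idea described above can be illustrated with a minimal sketch. It is not the Bioverse implementation: the function name, the input layout, and in particular the scoring rule (the product of the two homology similarities) are assumptions chosen only for illustration.

```python
from itertools import product


def predict_interologs(source_ppis, homologs):
    """Transfer interactions from a source species to a target species.

    source_ppis: iterable of (a, b) pairs observed experimentally in the
        source species.
    homologs: dict mapping a source protein to a list of
        (target_protein, similarity) tuples, with similarity in [0, 1].
    Returns a dict mapping a sorted (x, y) target pair to a score.
    """
    predictions = {}
    for a, b in source_ppis:
        # Every homolog of a is paired with every homolog of b.
        for (x, sim_x), (y, sim_y) in product(homologs.get(a, []),
                                              homologs.get(b, [])):
            pair = tuple(sorted((x, y)))
            # Hypothetical scoring rule: product of the two similarities.
            score = sim_x * sim_y
            # Keep only the best-supported score for each predicted pair.
            if score > predictions.get(pair, 0.0):
                predictions[pair] = score
    return predictions
```

A source interaction contributes nothing when either partner lacks a homolog in the target species, so coverage of the predicted network depends directly on the homology mapping supplied.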
In contrast to stand-alone network viewer applications, including one previously released by our group [14], the Integrator interface is fully integrated with a server-based web application. As a result, several million PPIs stored in a relational database can be explored quickly over the web through common web browsers with minimal additional software. Integrator provides several new graph manipulation features that significantly improve upon previously released tools. It also performs multiple protein searches in which sets of protein identifiers can be compared or contrasted by connected graph components. Integrator is a simple, all-in-one graphical search solution for large interactomes across several genomes.
Integrator is based on a three-tier web application architecture using the Java-based Struts web application framework [15]. This design partitions the client, server, and database into three separate information layers. Advantages of this approach include sparing users from installing memory-intensive client programs, particularly each time an upgrade becomes available [16], and shifting computational load from clients to high-performance servers. The Struts model-view-controller (MVC) paradigm is used to organize tasks including node and edge searches, identifier synonym resolution, viewer assembly, and database queries. The JUNG graph analysis Java library is used for connected component analysis in multiple protein searches [17]. The network viewer is a modification of the Touchgraph Java applet viewer [18]. The data layer contains non-redundant pairwise PPI data (experimentally derived and predicted) from the Bioverse project, warehoused in a MySQL database as described previously [10, 11].
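To make the connected component step concrete, the sketch below groups a set of query proteins by the connected component they fall into. It is a plain breadth-first search written for illustration and does not use JUNG or any Integrator code; the function name and the edge-list input are assumptions.

```python
from collections import defaultdict, deque


def group_queries_by_component(edges, queries):
    """Group query proteins that lie in the same connected component.

    edges: iterable of (a, b) undirected interactions.
    queries: list of protein identifiers to compare.
    Returns a list of groups, each a list of query proteins.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    component_of = {}  # protein -> index of its component
    groups = []        # groups[i] holds the queries found in component i
    for query in queries:
        if query not in component_of:
            # Unvisited query: explore its whole component once.
            index = len(groups)
            groups.append([])
            component_of[query] = index
            frontier = deque([query])
            while frontier:
                node = frontier.popleft()
                for neighbour in adjacency.get(node, ()):
                    if neighbour not in component_of:
                        component_of[neighbour] = index
                        frontier.append(neighbour)
        groups[component_of[query]].append(query)
    return groups
```

Queries that land in the same group are linked by some path of interactions, while queries in different groups are not connected in the searched network. A query with no interactions forms a group of its own.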
Results and discussion
Single protein identifier search
To search networks around a specific protein, Integrator first attempts to locate exact or similar identifier matches to a given query. If a single match is found, the user is shown a graph of the network around the query protein. If several similar identifier matches exist, the user is given a list from which to narrow the search. Links to sequence and functional annotation data for each protein are provided to aid this process. Integrator currently recognizes a number of identifiers, including those from GenBank, FlyBase, WormBase, and the Saccharomyces Genome Database [10, 19–23].
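The lookup flow described above can be sketched as follows. This is an illustration only: the function name, the synonym table, the identifiers in the usage example, and the definition of "similar" as a case-insensitive substring match are all assumptions, since the matching rules used by Integrator are not specified here.

```python
def resolve_identifier(query, synonyms):
    """Decide what a single-protein search should return.

    synonyms: dict mapping a known identifier to an internal protein id
        (a hypothetical stand-in for the synonym tables).
    Returns (action, protein_ids) where action is "graph", "choose",
    or "none".
    """
    wanted = query.strip().lower()
    if not wanted:
        return ("none", [])

    # Exact match (ignoring case) leads straight to the network view.
    exact = sorted({pid for name, pid in synonyms.items()
                    if name.lower() == wanted})
    if len(exact) == 1:
        return ("graph", exact)

    # Assumed notion of "similar": the query occurs inside the identifier.
    similar = sorted({pid for name, pid in synonyms.items()
                      if wanted in name.lower()})
    if len(similar) == 1:
        return ("graph", similar)
    if similar:
        return ("choose", similar)  # user narrows the search from a list
    return ("none", [])


# Made-up identifiers, for illustration only.
table = {"ABC1": 1, "ABC2": 2, "XYZ9": 3}
print(resolve_identifier("abc1", table))
print(resolve_identifier("ABC", table))
```

The first call resolves to a single protein and would lead to a graph view; the second matches two proteins and would lead to a selection list.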
Two tables are also provided below the main viewer, which list the proteins and their interactions. The protein (node) table can be used to sort and manipulate node size, shape, or color. Similarly, the interaction (edge) table can be used to sort columns and select specific interactions to render subgraphs in the graph window. A detailed navigation tutorial is provided online.
Multiple protein identifier search
Conclusion
Integrator provides a simple, unified interface for interactome data using familiar web technology. The learning curve required is relatively low and installation of client software is minimal. As a result, users can focus quickly on network analysis. By restricting searches to varying ranges of PPI (edge) confidence values, a user can compare different networks for a given set of proteins. Users have the added flexibility of manipulating subgraphs within each network using graphical and tabular interfaces.
One restriction for users is that PPI prediction scoring in the Bioverse database is currently based on the Interolog method [12]. However, users can restrict their analysis to original source PPIs by filtering for interactions with the highest confidence score (1.0) using the table interface. As new PPI data sets become available, we will continue to update the database, so search results are likely to change over time. Although the current database is limited to intra-genomic PPIs, efforts are underway to compile inter-genomic PPIs (e.g. host-pathogen PPIs).
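Restricting a network to a range of edge confidence values amounts to a simple filter. The sketch below assumes, for illustration, that each interaction is a (protein_a, protein_b, confidence) tuple; the function name and the data are hypothetical. Only the convention that a score of 1.0 marks original source PPIs comes from the text above.

```python
def filter_by_confidence(edges, minimum=0.0, maximum=1.0):
    """Keep interactions whose confidence lies in [minimum, maximum].

    edges: iterable of (protein_a, protein_b, confidence) tuples.
    """
    return [(a, b, score) for a, b, score in edges
            if minimum <= score <= maximum]


# Made-up interactions, for illustration only.
network = [("P1", "P2", 1.0), ("P2", "P3", 0.6), ("P3", "P4", 0.2)]

# Original source PPIs only (confidence of exactly 1.0).
source_only = filter_by_confidence(network, minimum=1.0)

# A mid-confidence band, to compare against the source-only network.
mid_band = filter_by_confidence(network, minimum=0.5, maximum=0.9)
```

Running the same set of proteins through several confidence ranges yields the different networks that a user would compare.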
As with other two-dimensional graph viewers, there is a practical upper limit on the number of nodes in a network (~500) that can be viewed effectively at once. This is commonly due to the spatial crowding produced by existing layout algorithms. Integrator attempts to overcome this limitation by utilizing multiple-frame searches. An alternative solution for viewing large global networks might be the use of three-dimensional hyperbolic layout viewers that can interactively display >100,000 nodes simultaneously [25]. Such viewers are promising approaches to network visualization, with the caveat that they depend on well-defined minimum spanning trees and require significant computational overhead. Two-dimensional viewers will likely remain the preferred interface for related, web-based search technologies.
Currently, our group is working towards providing an advanced text search solution through the Bioverse search homepage (http://bioverse.compbio.washington.edu). This will include significant enhancements in Boolean queries and complex query handling. We are working towards a tighter integration of this interface with Integrator's network analysis tools. Furthermore, by making the Integrator codebase available to the public, we hope that integration with similar or derivative projects will provide interesting new features in the future.
Availability and requirements
Integrator is freely available to all users through any Java-enabled web browser at http://bioverse.compbio.washington.edu/integrator. All visits to the web application preserve user and data anonymity. A source code distribution is available at this site.
Acknowledgements
A.N.C. was supported by a National Library of Medicine Medical Informatics Training Grant (1T15LM07441-01). This work is supported in part by a Searle Scholar Award and NSF Grant DBI-0217241 to R.S.
References
- Barabasi AL, Oltvai ZN: Network biology: understanding the cell's functional organization. Nat Rev Genet 2004, 5: 101–113. 10.1038/nrg1272
- Ito T, Chiba T, Yoshida M: Exploring the protein interactome using comprehensive two-hybrid projects. Trends Biotechnol 2001, 19: S23–7. 10.1016/S0167-7799(01)01790-5
- Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, Andrews B, Tyers M, Boone C: Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 2001, 294: 2364–2368. 10.1126/science.1065810
- Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgarner R, Goodlett DR, Aebersold R, Hood L: Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 2001, 292: 929–934. 10.1126/science.292.5518.929
- Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK, Young RA: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 2002, 298: 799–804. 10.1126/science.1075090
- Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D: DIP: the database of interacting proteins. Nucleic Acids Res 2000, 28: 289–291. 10.1093/nar/28.1.289
- Rigaut G, Shevchenko A, Rutz B, Wilm M, Mann M, Seraphin B: A generic protein purification method for protein complex characterization and proteome exploration. Nat Biotechnol 1999, 17: 1030–1032. 10.1038/13732
- Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, Qureshi-Emili A, Li Y, Godwin B, Conover D, Kalbfleisch T, Vijayadamodar G, Yang M, Johnston M, Fields S, Rothberg JM: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 2000, 403: 623–627. 10.1038/35001009
- Stein LD: Integrating biological databases. Nat Rev Genet 2003, 4: 337–345. 10.1038/nrg1065
- McDermott J, Samudrala R: Bioverse: Functional, structural and contextual annotation of proteins and proteomes. Nucleic Acids Res 2003, 31: 3736–3737. 10.1093/nar/gkg550
- McDermott J, Samudrala R: Enhanced functional information from predicted protein networks. Trends Biotechnol 2004, 22: 60–2; discussion 62–3. 10.1016/j.tibtech.2003.11.010
- Matthews LR, Vaglio P, Reboul J, Ge H, Davis BP, Garrels J, Vincent S, Vidal M: Identification of potential interaction networks using sequence-based searches for conserved protein-protein interactions or "interologs". Genome Res 2001, 11: 2120–2126. 10.1101/gr.205301
- McDermott J, Bumgarner R, Samudrala R: Functional annotation from predicted protein interaction networks. Bioinformatics 2005, 21: 3217–3226. 10.1093/bioinformatics/bti514
- Chang AN, McDermott J, Samudrala R: An enhanced Java graph applet interface for visualizing interactomes. Bioinformatics 2005, 21: 1741–1742. 10.1093/bioinformatics/bti237
- Struts Framework [http://struts.apache.org/]
- Kurniawan B: Java for the Web with Servlets, JSP, and EJB. Indianapolis, New Riders; 2002.
- JUNG Graph Library [http://jung.sourceforge.net]
- Touchgraph Viewer [http://touchgraph.sourceforge.net]
- Gelbart WM, Crosby M, Matthews B, Rindone WP, Chillemi J, Russo Twombly S, Emmert D, Ashburner M, Drysdale RA, Whitfield E, Millburn GH, de Grey A, Kaufman T, Matthews K, Gilbert D, Strelets V, Tolstoshev C: FlyBase: a Drosophila database. The FlyBase consortium. Nucleic Acids Res 1997, 25: 63–66. 10.1093/nar/25.1.63
- Benson DA, Boguski MS, Lipman DJ, Ostell J, Ouellette BF, Rapp BA, Wheeler DL: GenBank. Nucleic Acids Res 1999, 27: 12–17. 10.1093/nar/27.1.12
- Christie KR, Weng S, Balakrishnan R, Costanzo MC, Dolinski K, Dwight SS, Engel SR, Feierbach B, Fisk DG, Hirschman JE, Hong EL, Issel-Tarver L, Nash R, Sethuraman A, Starr B, Theesfeld CL, Andrada R, Binkley G, Dong Q, Lane C, Schroeder M, Botstein D, Cherry JM: Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms. Nucleic Acids Res 2004, 32 Database issue: D311–4. 10.1093/nar/gkh033
- Stein L, Sternberg P, Durbin R, Thierry-Mieg J, Spieth J: WormBase: network access to the genome and biology of Caenorhabditis elegans. Nucleic Acids Res 2001, 29: 82–86. 10.1093/nar/29.1.82
- Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Barrell D, Bateman A, Binns D, Biswas M, Bradley P, Bork P, Bucher P, Copley RR, Courcelle E, Das U, Durbin R, Falquet L, Fleischmann W, Griffiths-Jones S, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lopez R, Letunic I, Lonsdale D, Silventoinen V, Orchard SE, Pagni M, Peyruc D, Ponting CP, Selengut JD, Servant F, Sigrist CJ, Vaughan R, Zdobnov EM: The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res 2003, 31: 315–318. 10.1093/nar/gkg046
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25: 25–29. 10.1038/75556
- Munzner T: Exploring Large Graphs in 3D Hyperbolic Space. IEEE Computer Graphics and Applications 1998, 18: 18–23. 10.1109/38.689657
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.