IntNetDB v1.0: an integrated protein-protein interaction network database generated by a probabilistic model
© Xia et al; licensee BioMed Central Ltd. 2006
Received: 14 July 2006
Accepted: 18 November 2006
Published: 18 November 2006
Although protein-protein interaction (PPI) networks have been explored by various experimental methods, the maps so built are still limited in coverage and accuracy. To further expand the PPI network and to extract more accurate information from existing maps, studies have been carried out to integrate various types of functional relationship data. What is still lacking is a frequently updated database of computationally predicted PPIs that gives biological researchers rapid and easy access to the underlying data in the form of a biological network.
By applying a probabilistic model, we integrated 27 heterogeneous genomic, proteomic and functional annotation datasets to predict PPI networks in human. In addition to previously studied data types, we show that phenotypic distances and genetic interactions can also be integrated to predict PPIs. We further built an easy-to-use, updatable integrated PPI database, the Integrated Network Database (IntNetDB) online, to provide automatic prediction and visualization of PPI networks among genes of interest. The networks can be visualized in SVG (Scalable Vector Graphics) format for zooming in or out. IntNetDB also provides a tool to extract topologically highly connected network neighborhoods from a specific network for further exploration and research. Using the MCODE (Molecular Complex Detection) algorithm, 190 such neighborhoods were detected among all the predicted interactions. The predicted PPIs can also be mapped to worm, fly and mouse interologs.
IntNetDB includes 180,010 predicted protein-protein interactions among 9,901 human proteins and represents a useful resource for the research community. Our study has increased prediction coverage five-fold. IntNetDB also provides easy-to-use network visualization and analysis tools that allow biological researchers unfamiliar with computational biology to access and analyze data over the internet. The web interface of IntNetDB is freely accessible at http://hanlab.genetics.ac.cn/IntNetDB.htm. Visualization requires Mozilla version 1.8 (or higher) or Internet Explorer with SVGviewer installed.
Protein-protein interactions (PPIs) underlie most biological processes. Dissecting the PPI network for a particular biological process may provide important clues to the molecular mechanisms of that process. Recently, large-scale experimental studies have generated many PPI datasets in different model organisms by yeast two-hybrid (Y2H) screens [2–8] and by co-affinity purification (co-AP) followed by mass spectrometry (MS) [9, 10]. These studies have provided opportunities to examine cellular function at a network level.
These data have two shortcomings: (a) the coverage is very low and far from complete, and (b) the accuracy of each dataset is generally not very high and varies considerably from dataset to dataset. The unreliability and incompleteness of PPI data complicate elucidation of biological processes or cellular functions, and may misrepresent the topological features of the network. Many methods have been used to predict PPI networks. These fit into three categories: sequence-based, high-throughput data-based, and a combination of sequence and high-throughput data. The sequence-based prediction methods include gene fusion, gene neighborhood and phylogenetic profiles, and predictions based on protein/domain structure [16, 17]. The high-throughput data-based methods predict PPIs from data generated by high-throughput experiments, such as correlated mRNA expression [11, 18], correlated phenotype profiles, shared protein interaction partners, shared genetic interaction profiles [21, 22], or similar subcellular localizations. The combination methods predict interologs based on gene orthologs [23, 24].
Recently, machine learning methods have been introduced to predict PPIs by combining genomic and experimental features. Bayesian classifiers are probability-based and competent in integrating large numbers of heterogeneous datasets [25–27]. Probabilistic decision trees and random forests (collections of decision trees) specialize in classifying objects into different categories [28–31]. Logistic regression is especially suited for assigning elements into two opposing groups [32–35]. Support vector machines (SVMs) have been used to predict PPIs from a limited number of attributes to binary outputs (interact versus not interact), but have not been used for integrating multiple lines of evidence [36–43].
Among these machine learning approaches, the Bayesian probabilistic model has several advantages for predicting PPIs. It can handle heterogeneous data types, such as numerical phenotype values, discrete survival fitness values, vector microarray expression values, binary interactome values or categorical Gene Ontology annotation values. Heterogeneous data types can be transformed into one uniform probabilistic score by calculating likelihood ratios. Each data source is automatically weighted according to its confidence level. Missing data are tolerated during integration. Furthermore, the Bayesian model is fast and simple: because it is probability-based, it does not require much time to standardize data of different sources or types. Most importantly, the Bayesian model has been shown by previous studies to be particularly competent in predicting PPIs [31, 32]. Lastly, the simple integration scheme is well suited to updating or including future datasets.
To date the Bayesian model has mostly been applied to yeast, and rarely to predict human PPIs [27, 44]. Rhodes et al. integrated 13 datasets of four different data types: physical interactions in model organisms, co-expression, domain-domain interactions and shared biological functions. However, other types of high-throughput data then available were not examined. Since the publication of that analysis, many other high-throughput datasets have been generated, some directly on human proteins. Furthermore, the ever-growing volume of high-throughput data and the data mining demand from the research community require a more comprehensive, current and updatable platform and database for integrating, storing, visualizing and mining the data. Toward these goals, we examined the predictive power of new data types and datasets, created an Integrated Network Database (IntNetDB) and provided easy-to-use web-based visualization and data mining options.
We chose to adopt the Bayesian analysis method because of its advantages in predicting PPIs and because of its effectiveness established by previous studies [25–27]. From the first to the latest study using this analysis framework, more accurate and more extensive integrated PPI networks have been predicted. Here, using ten-fold cross validation, we also demonstrated the effectiveness of Bayesian analysis in predicting human PPIs from 27 datasets of seven different data types.
Construction and content
Gold standard for integration
Naïve Bayes classifiers require a gold standard positive (GSP) and a gold standard negative (GSN) dataset. The Human Protein Reference Database (HPRD) is a protein-protein interaction database with 19,438 distinct interactions among 5,983 proteins. It is manually curated by expert biologists based on small-scale and focused experiments described in the scientific literature. We accepted it as high quality and used it as the GSP dataset. We used the GSN dataset previously generated by Rhodes et al., which includes all the possible pair-wise combinations between two sets of proteins that are assigned a subcellular localization of the plasma membrane (1,397 proteins) and the nucleus (2,224 proteins), respectively, by the Gene Ontology (GO) Consortium. The GSN includes a total of 3,106,928 protein pairs. The size and the ratio of GSP and GSN are adequate for covering interactions of low prediction probability and for predicting human PPIs. To measure the predictive power or confidence level, we used the likelihood ratio (LR) of a gene pair being a true positive interaction versus a true negative. This is calculated as Pr(E|GSP)/Pr(E|GSN), where Pr(E|GSP) is the probability of observing a certain evidence E within the GSP set and Pr(E|GSN) is the probability of observing the same evidence within the GSN set (Methods).
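The likelihood ratio defined above can be illustrated with a minimal sketch. It assumes that each gold-standard pair has already been assigned to an evidence bin; the function name, the bin labels and the toy counts are hypothetical and are not taken from the IntNetDB pipeline.

```python
from collections import Counter

def likelihood_ratios(gsp_bins, gsn_bins):
    """Return {bin: Pr(bin | GSP) / Pr(bin | GSN)} for bins seen in both sets."""
    pos_counts = Counter(gsp_bins)
    neg_counts = Counter(gsn_bins)
    n_pos, n_neg = len(gsp_bins), len(gsn_bins)
    ratios = {}
    for b in pos_counts:
        if neg_counts[b] == 0:
            # LR is undefined without negative pairs; this sketch skips the bin
            continue
        ratios[b] = (pos_counts[b] / n_pos) / (neg_counts[b] / n_neg)
    return ratios

# Toy data (hypothetical): the evidence bin observed for each gold-standard pair
gsp_bins = ["high", "high", "high", "low"]
gsn_bins = ["high", "low", "low", "low", "low", "low", "low", "low"]
lr = likelihood_ratios(gsp_bins, gsn_bins)
```

In this toy case the "high" bin holds 3 of 4 positive pairs but only 1 of 8 negative pairs, so its LR is (3/4)/(1/8) = 6, while the "low" bin has an LR below 1.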
Physical protein-protein interactions
Phenotypic data from model organisms
Loss of function among interacting or functionally related proteins tends to result in similar phenotypes [52–54]. Several large-scale phenotypic datasets are available for model organisms [52–54]. RNAi phenotype data have been used to predict PPIs for model organisms. To examine whether phenotype data from model organisms are also predictive for human PPIs, we transferred phenotype data from model organisms to human by matching the genes in model organisms to their corresponding human orthologs. We then calculated the pair-wise phenotype similarity scores between genes. Because the phenotypic data come in various forms, we used different measurements of phenotypic distance depending on the form of the phenotypic values. For the dataset with one value for each gene under a single condition, we simply used the absolute value of the arithmetic difference between the phenotypic values. For the dataset with discrete values under multiple conditions, we used the cosine value between the phenotypic values of a pair of genes. For the dataset with continuous values under multiple conditions, the Pearson Correlation Coefficient (PCC) was used. Each phenotypic dataset was then binned according to the similarity score (phenotypic difference, cosine or PCC) and the LRs were evaluated within each bin. A correlation between phenotypic similarity and LR can be observed even in the cross-species phenotypic datasets (Figure 2B–D). Therefore phenotype data can also be integrated to predict the PPI network.
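The three phenotypic similarity measures named above can be sketched as follows. This is an illustrative implementation in plain Python; the function names are ours, and returning 0.0 for zero-norm or zero-variance profiles is an assumption of the sketch, not a choice documented in the text.

```python
import math

def abs_difference(x, y):
    """Single value per gene: absolute arithmetic difference."""
    return abs(x - y)

def cosine_similarity(u, v):
    """Discrete values under multiple conditions: cosine between profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: treat an all-zero profile as uninformative
    return dot / (norm_u * norm_v)

def pearson(u, v):
    """Continuous values under multiple conditions: Pearson correlation."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if sd_u == 0 or sd_v == 0:
        return 0.0  # assumption: a constant profile carries no correlation signal
    return cov / (sd_u * sd_v)
```

Each score would then be binned, and the LR of each bin evaluated against the gold standards as described for the other data types.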
Genetic interactions from model organisms
Synthetic genetic analysis (SGA) has been used in Saccharomyces cerevisiae [21, 22] to globally map yeast genetic interactions. A significant overlap between PPIs and genetic interactions was recently demonstrated. The number of common neighbors between a pair of genetically interacting genes can be used to predict potential PPIs. We implemented such an analysis on binary yeast genetic interaction datasets [21, 22]. First, the genetic interactions were mapped to human interologs. Then we binned all the interactions by the number of shared neighbors, and the LRs were calculated for each bin (Figure 2E). We found that the more neighbors a pair of genetically interacting genes share, the more likely it is that a direct PPI occurs between them. These genetic interaction data gave rise to very high confidence PPI predictions, only slightly lower than those from large-scale PPI mapping (Figure 2E).
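Counting shared neighbors for each genetically interacting pair can be sketched as below. The edge list and gene labels are toy placeholders, and the function name is hypothetical; the sketch only shows the counting step that precedes binning.

```python
from collections import defaultdict

def shared_neighbor_counts(edges):
    """For each interacting pair, count the neighbors the two genes have in common."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    counts = {}
    for a, b in edges:
        # Exclude the two genes themselves from their common neighborhood
        common = (neighbors[a] & neighbors[b]) - {a, b}
        counts[frozenset((a, b))] = len(common)
    return counts

# Toy genetic interaction network (hypothetical gene labels)
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"), ("C", "E")]
counts = shared_neighbor_counts(edges)
```

Here the pair A-B shares two neighbors (C and D), whereas C-E shares none; pairs would then be binned by this count before the LR of each bin is computed.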
Genes that exhibit mRNA co-expression tend to show protein interaction, especially those in the same complex or in the same biochemical reaction. Such correlated genes might be regulated by the same transcription factor or set of transcription factors. We examined three high quality large-scale microarray datasets [57–59] to predict human PPIs. One dataset consists of gene expression profiles during aging of the human brain, and the other two consist of gene expression profiles across a variety of tissues or cell lines. For each dataset, gene pairs were assigned to 20 bins of increasing pairwise expression PCC values (Figure 2F). When the PCC is above 0.5, we observed a significant correlation between expression PCC and the likelihood of forming a direct PPI (measured by LR) between a pair of genes, which may help to predict human PPIs (Figure 2F).
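Assigning a gene pair to one of 20 PCC bins can be sketched as follows. The text does not state the bin boundaries, so equal-width bins over [-1, 1] are an assumption of this example, as is the function name.

```python
def pcc_bin(pcc, n_bins=20):
    """Map a PCC in [-1, 1] to a bin index 0..n_bins-1 (assumed equal-width bins)."""
    if not -1.0 <= pcc <= 1.0:
        raise ValueError("PCC must lie in [-1, 1]")
    index = int((pcc + 1.0) * n_bins / 2.0)
    # A PCC of exactly 1.0 would land in bin n_bins; fold it into the last bin
    return min(index, n_bins - 1)
```

With 20 bins of width 0.1, a PCC of 0.5 falls into bin 15, so bins 15 to 19 cover the PCC range above 0.5 where the correlation with LR was observed.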
Shared functional annotation
Proteins with the same biological function are more likely to physically interact than those without. In addition, proteins sharing a more specific annotation are more likely to interact than those sharing a more common, less specific annotation. The Gene Ontology Consortium has assigned 4,416 GO annotations to 14,801 human genes (proteins). To quantify the similarity between gene annotations, we identified the smallest shared biological process (SSBP) between a pair of genes. The SSBP is calculated in three steps: (1) find all the GO terms shared by the pair of genes, (2) find the number of other genes also sharing these GO terms, (3) take the GO term with the smallest gene count. In agreement with expectation, the smaller the SSBP count, the more likely the proteins are to directly interact (Figure 2G). We also examined the GO annotations of four model organisms (Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster and Mus musculus). When genes from the four model organisms were mapped to human orthologs, the SSBP retained good predictive power for human PPIs (Figure 2G).
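The three SSBP steps can be sketched as below. The annotation table, gene labels and term names are hypothetical, and the sketch counts every gene annotated to a term (including the pair itself), which is an assumption about how the gene count is defined.

```python
def ssbp_count(gene_a, gene_b, annotations):
    """Return the gene count of the smallest shared GO term, or None if none is shared.

    annotations maps each gene to the set of GO biological process terms assigned to it.
    """
    # Step 1: GO terms shared by the pair
    shared = annotations.get(gene_a, set()) & annotations.get(gene_b, set())
    if not shared:
        return None
    # Step 2: number of genes annotated with each shared term
    term_sizes = {
        term: sum(1 for terms in annotations.values() if term in terms)
        for term in shared
    }
    # Step 3: the term with the smallest gene count defines the SSBP
    return min(term_sizes.values())

# Toy annotation table (hypothetical genes and terms)
annotations = {
    "G1": {"metabolism", "DNA repair"},
    "G2": {"metabolism", "DNA repair"},
    "G3": {"metabolism"},
    "G4": {"metabolism"},
    "G5": {"transport"},
}
```

For G1 and G2 the shared terms cover 4 and 2 genes, so the SSBP count is 2; G1 and G3 share only the broader term, giving a count of 4.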
Domain-domain interaction (DDI)
Many protein-protein interactions are mediated by domain-domain interactions. If two domains can physically interact, proteins containing these two domains are also likely to interact. DDI predictions have been explored before. We used the domain-domain interaction scores in InterDom, which are derived largely from structural information. After transferring the scores to each pair of proteins that contain the interacting domains, we observed a weak but still clear correlation of the domain interaction score with LR (Figure 2H).
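Transferring domain-domain scores to protein pairs can be sketched as follows. The text does not say how a protein pair with several interacting domain pairs is scored, so taking the maximum is an assumption of this example; the score values are invented and are not InterDom values.

```python
def protein_pair_ddi_score(domains_a, domains_b, ddi_scores):
    """Return the best domain-domain score between two proteins, or None.

    ddi_scores maps frozenset({domain_1, domain_2}) to a score; using the
    maximum over all domain pairs is an assumption of this sketch.
    """
    best = None
    for da in domains_a:
        for db in domains_b:
            score = ddi_scores.get(frozenset((da, db)))
            if score is not None and (best is None or score > best):
                best = score
    return best

# Toy score table (invented values)
ddi_scores = {
    frozenset(("SH3", "PRM")): 12.0,
    frozenset(("PDZ", "PDZ")): 3.5,
}
```

A protein pair with no scored domain pair returns None, which lets the integration step treat the DDI evidence as missing for that pair.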
Gene context analysis
Gene context refers to in silico PPI predictions based on genome sequences. Three types of gene context information have been used to predict protein-protein interactions: a) gene fusion/fission, which finds that interacting proteins in one species are more likely to be fused into one single protein in another species; b) gene co-occurrence, which finds that interacting proteins are more likely to be both present or both absent in an organism with a fully sequenced genome; c) gene neighborhood, which finds that functionally coupled genes (interacting proteins or proteins in a complex) are more likely to be located in the same operon or gene cluster in a genome. Gene context analysis has been previously performed by von Mering et al. to generate in silico PPIs for Saccharomyces cerevisiae. We mapped these interactions to human through interologs and found that all three types of in silico PPI predictions are suitable for predicting human PPIs (Figure 2I).
Integration by probabilistic model
We chose the Naïve Bayes classifier model because it integrates independent lines of evidence simply by multiplying their LRs. Such expediency will facilitate integration of more datasets in the future. Using such an integration scheme, many weak evidences from several data types can be accumulated to predict interactions with increased confidence. We assume that different data types are conditionally independent, because they are derived from different experimental technologies that aim to measure different biological features of genes/proteins or protein pairs, or are from unrelated annotations. We categorized the 27 datasets into seven different types (Figure 1) to satisfy the requirement of conditional independence of the Naïve Bayes classifier. The LR contributed by each dataset toward the predicted confidence score of a gene pair (Figure 1) was set to the LR observed for the corresponding dataset bin where the gene pair was found (Figure 2). Then the maximal LR among all the datasets of the same data type was chosen to represent the LR contributed by that data type toward the prediction score. Finally, the LRs determined by the different data types were multiplied to give the combined LR [25, 27]. The base 2 logarithm of the LR (LLR) was used as the final score.
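The integration scheme described above (maximal LR within a data type, product across data types, base 2 logarithm) can be sketched as follows. The dataset names, type labels and LR values are hypothetical; only the three-step logic follows the text.

```python
import math
from collections import defaultdict

def combined_llr(dataset_lrs, dataset_type):
    """Combine per-dataset LRs for one gene pair into a single LLR.

    dataset_lrs:  {dataset name: LR of the bin in which the pair was found}
    dataset_type: {dataset name: data type}
    """
    best_per_type = defaultdict(float)  # LRs are positive, so 0.0 is a safe floor
    for dataset, lr in dataset_lrs.items():
        data_type = dataset_type[dataset]
        # Keep only the maximal LR among datasets of the same data type
        best_per_type[data_type] = max(best_per_type[data_type], lr)
    # Multiply across data types, then take the base 2 logarithm
    return math.log2(math.prod(best_per_type.values()))

# Toy example (hypothetical datasets and LRs)
dataset_type = {"ppi_yeast": "PPI", "ppi_fly": "PPI",
                "go_human": "GO", "expr_tissue": "Microarray"}
dataset_lrs = {"ppi_yeast": 8.0, "ppi_fly": 2.0, "go_human": 4.0, "expr_tissue": 0.5}
score = combined_llr(dataset_lrs, dataset_type)
```

In this toy case the PPI type contributes 8 (not 8 × 2), so the combined LR is 8 × 4 × 0.5 = 16 and the LLR is 4. Taking the maximum within a type avoids counting correlated datasets of the same type twice.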
Cross validation of the integration results
Improvements over the previous integration analysis for human PPI predictions
Rhodes et al.
Gold standard positive (HPRD version)
Sep, 2005 (About 10000 more interactions)
Number of data types integrated
4 (PPI, GO, Microarray, DDI)
7 (PPI, GO, Microarray, DDI, Phenotypic, Genetic, Gene context)
Number of datasets integrated
Info from model organism
Only the PPI is applied
All the possible large scale data from model organism
Datasets for each data type
4 yeast, 1 worm and 1 fly
5 yeast, 1 worm, 2 fly and 2 Human
yeast, worm, fly, mouse and human
2 yeast and 1 fly datasets
2 yeast datasets
Domain-domain interaction score dataset
3 yeast datasets
Method to validate the results
One simple test using the old HPRD as training set and the updated HPRD as test set
Size of integrated network under TP/FP ratio of 1
38,379 interactions among 5,791 proteins
180,010 interactions among 9,901 proteins
One time spread sheet download
Online query options with selectable data types and confidence level; data can be downloaded through spreadsheet or network graphs.
Network visualization and analysis
Only support one single gene's search and none for download or analysis
Network cluster extraction, visualization, drill-down and download options
Protein-protein interaction search
IntNetDB also provides information on LLR cutoffs. Users can obtain more reliable PPIs by selecting cutoff values larger than the default value of 7.0. If users need as many predicted PPIs as possible for analysis, the predicted PPIs with LLR>6.0 are also accessible from the website. A lower LLR cutoff includes more protein-protein interactions and proteins, but at lower confidence. Confidence levels (percentage of expected true positives) of the predicted PPIs at different LLR cutoffs are also provided (Figure 3A inset). From the seven integrated heterogeneous data types currently available, users can select the exact data types that they want to access. A list of the PPIs above the user-defined LLR cutoff, with data type information, is generated after a gene list is submitted. The default gene list uses p53 and its interacting partners in human as an example. A tab-delimited text file of PPIs can be downloaded directly from the query result page (Figure 4B).
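The effect of the LLR cutoff and of data type selection on a query can be sketched offline as below. This is not the IntNetDB server code: the record layout, the function name, the placeholder gene labels and all LLR values are hypothetical, and summing the per-type LLRs of the selected types is an assumption that follows from multiplying LRs across data types.

```python
def query_ppis(ppis, genes, llr_cutoff=7.0, data_types=None):
    """Return (gene_a, gene_b, LLR) for pairs touching the query genes.

    ppis: iterable of (gene_a, gene_b, {data type: LLR contribution})
    data_types: data types to include, or None for all of them
    """
    genes = set(genes)
    hits = []
    for gene_a, gene_b, type_llrs in ppis:
        if gene_a not in genes and gene_b not in genes:
            continue
        if data_types is None:
            selected = type_llrs
        else:
            selected = {t: v for t, v in type_llrs.items() if t in data_types}
        score = sum(selected.values())  # log of the product of the selected LRs
        if selected and score >= llr_cutoff:
            hits.append((gene_a, gene_b, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)

# Toy records (invented LLR values; GENE_X/Y/Z are placeholders)
ppis = [
    ("TP53", "MDM2", {"PPI": 6.0, "GO": 3.0}),
    ("TP53", "GENE_X", {"Microarray": 2.0, "GO": 4.5}),
    ("GENE_Y", "GENE_Z", {"PPI": 9.0}),
]
default_hits = query_ppis(ppis, ["TP53"])
relaxed_hits = query_ppis(ppis, ["TP53"], llr_cutoff=6.0)
```

With the default cutoff only the first pair is returned; lowering the cutoff to 6.0 adds the second, lower-confidence pair.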
Network visualization is a more informative way to explore a network than a simple textual list. IntNetDB provides an online visualization tool for protein-protein interaction networks called 'intView'. intView represents a PPI network as an undirected graph, where nodes represent proteins and edges correspond to potential PPIs. Figure 4C shows the network layout of TP53 and its partners. Users can click the nodes or edges to see more information about them (Figure 4C inset). When an edge is clicked, a pop-up panel displays which data type(s) support the functional interaction between the two nodes. Different data types are denoted by different colors, and the width of the edge representing a data type is proportional to the LR value derived from that data type (Figure 4C inset). Online visualization is displayed as a scalable vector graphics (SVG) file. A PostScript file and a Cytoscape-compatible GML file can also be generated for download.
Extracting highly connected modules
Densely interconnected neighborhoods or clusters in a network frequently correspond to functional modules [46, 66]. IntNetDB provides a tool to extract and reveal such neighborhoods or clusters in the network of query genes. For this IntNetDB uses the MCODE algorithm, which was created by Bader et al. for predicting yeast protein complexes and implemented by Lee et al. for automatically searching protein complexes in any organism. The resulting modules can be visualized by clicking on the 'Network cluster' button on the web page. All the extracted clusters in IntNetDB are visualized in Cytoscape (Figure 4D). For cross examination and fine control, users are referred to a probabilistic model-based cluster-finding algorithm implemented in NetworkBlast.
An advantage of extracting such densely connected neighborhoods is that new functions can be assigned to a gene, or potential functions can be assigned to a gene of unknown function, based on the functions of other genes in the same cluster. As an example, take the troponin complex, a key protein complex regulating sarcomeric muscle contraction that is composed of three subunits, Tn-I, Tn-C and Tn-T. The Tn-I subunit inhibits actomyosin ATPase, while Tn-T binds Tn-C and has high affinity for tropomyosin. With the release of intracellular calcium, the Tn-C subunit binds calcium and the conformation of the troponin complex changes accordingly to overcome inhibition of actomyosin ATPase activity. In the predicted human troponin complex, TNNI3 (coding for the Tn-I subunit) directly interacts with CASQ1, a protein that binds and putatively stores calcium ions in the sarcoplasmic reticulum. This finding suggests that the troponin complex may also regulate muscle contraction through direct association with intracellular calcium stores. The network modules also show the troponin complex tightly connected with structural proteins (MYH6, MYL3, MYL7, MYBPC3), suggesting that they may also function in sarcomeric muscle contraction (Figure 4D inset).
Another example of an extracted functional module is the proteasome complex involved in ATP/ubiquitin-dependent peptide cleavage (Figure 4D inset). The proteasome is a multicatalytic proteinase complex composed of many subunits. In the extracted network module a hypothetical protein, FLJ11848, is predicted to tightly interact with many subunits of the proteasome. Interestingly, FLJ11848 contains six WD40 domains, which are frequently found in adaptor or regulatory proteins. This observation suggests that FLJ11848 might have an important role in regulating the activity of the proteasome. This prediction has been validated by recent work demonstrating that FLJ11848 functions as a negative regulator of the proteasome by controlling its assembly/disassembly.
Data and GUI updating scheme
To keep IntNetDB up to date, we will continue to track newly published large-scale genomic and proteomic datasets, evaluate their performance for PPI prediction, and use them to update IntNetDB. The addition of new datasets should increase the comprehensiveness of the human interactome. To allow for future extension, and to avoid the burden of updating the user interface each time new data are added, GUI items are dynamically generated from database entries to reflect the current data types available in IntNetDB.
We have integrated and evaluated potential PPIs from different up-to-date genomic and proteomic features. We have provided a user-friendly query and visualization platform which can be easily extended in the future when more data become available. The product of this effort, IntNetDB, will facilitate network and functional analysis.
The Human Protein Reference Database (HPRD) is a manually annotated protein-protein interaction database with 19,438 distinct interactions among 5,983 proteins. As it is derived from literature reporting high quality experimental results, we used it as the gold standard positive (GSP) dataset. We used the gold standard negative (GSN) dataset previously generated by Rhodes et al., which includes all possible pair-wise combinations between two sets of proteins that are assigned a subcellular localization of the plasma membrane (1,397 proteins) and the nucleus (2,224 proteins), respectively, by the Gene Ontology (GO) Consortium. The whole GSN set includes a total of 3,106,928 protein pairs. Ideally the GSP and GSN should have no overlapping interactions. Of the 19,438 protein pairs in the GSP, 4,863 are pairs in which both proteins have a known subcellular localization. Of these 4,863 protein pairs, 330 overlap with the GSN (a fraction of 7% = 330/4,863). This is very small compared to the randomly expected size of the intersection (a fraction of 38% = 1,856/4,863), which was computed by shuffling the subcellular localization assignments among the proteins in the GSN set. Although the gold-standard sets are imperfect, they still provide good approximations for PPI prediction.
The HPRD dataset was downloaded on November 13, 2005. GO annotations were downloaded from NCBI on March 10, 2005. The three recently published interactome datasets [7, 8, 51], the two genetic interaction datasets [21, 22], the three gene context datasets and the three recently published phenotypic datasets of the model organisms [52–54] were downloaded from the journal or authors' websites.
Naïve Bayes model
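We used the Naïve Bayes method described in Jansen et al. and Rhodes et al. [25, 27]. We define a pair of proteins as positive when the two proteins interact and as negative when they do not. Considering the total number of positive pairs within all the possible protein pairs, the prior odds of finding a positive pair are:
$$O_{prior} = \frac{P(\text{positive})}{P(\text{negative})} = \frac{P(\text{positive})}{1 - P(\text{positive})}$$
where P(positive) is the probability of getting an interacting pair of proteins among all the possible pairs, while P(negative) is the probability of getting a pair of non-interacting proteins. In contrast, the posterior odds are the odds of getting a positive when we consider the given evidence:
$$O_{posterior} = \frac{P(\text{positive} \mid \text{evidence}_1 \ldots \text{evidence}_N)}{P(\text{negative} \mid \text{evidence}_1 \ldots \text{evidence}_N)}$$
where each evidence is a data type used to infer a PPI between the proteins. The terms 'prior' and 'posterior' refer to the condition before and after considering the information provided by the N evidences. The likelihood ratio (L) is then defined as:
$$L(\text{evidence}_1 \ldots \text{evidence}_N) = \frac{P(\text{evidence}_1 \ldots \text{evidence}_N \mid \text{positive})}{P(\text{evidence}_1 \ldots \text{evidence}_N \mid \text{negative})}$$
which relates prior and posterior odds according to the Bayes rule:
$$O_{posterior} = L(\text{evidence}_1 \ldots \text{evidence}_N) \cdot O_{prior}$$
When the N evidences are derived independently, the Bayes rule can be simplified to the Naïve Bayes rule and L can be simplified as:
$$L(\text{evidence}_1 \ldots \text{evidence}_N) = \prod_{i=1}^{N} L(\text{evidence}_i) = \prod_{i=1}^{N} \frac{P(\text{evidence}_i \mid \text{positive})}{P(\text{evidence}_i \mid \text{negative})}$$
The likelihood ratio (L) of an evidence can be calculated from the positive and negative hits by binning the evidence into discrete intervals. The integrated L is then obtained by multiplying the L values of all the independent evidences.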
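The relation between prior odds, likelihood ratios and posterior odds can be made concrete with a small numerical sketch. The prior odds and the three LRs below are invented for illustration and are not the values used for IntNetDB.

```python
import math

def posterior_odds(prior_odds, likelihood_ratios):
    """Naive Bayes rule: O_posterior = O_prior * product of independent LRs."""
    return prior_odds * math.prod(likelihood_ratios)

def odds_to_probability(odds):
    """Convert odds of interaction into a probability of interaction."""
    return odds / (1.0 + odds)

# Hypothetical numbers: one true interaction per 400 non-interacting pairs,
# and three independent evidences with LRs of 20, 10 and 4
prior = 1.0 / 400.0
post = posterior_odds(prior, [20.0, 10.0, 4.0])
prob = odds_to_probability(post)
```

Here the combined LR of 800 lifts prior odds of 1/400 to posterior odds of 2, that is, a probability of about 0.67 that the pair interacts. Posterior odds of 1 correspond to an expected true positive to false positive ratio of 1.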
Human orthologs in mouse, fly, worm and yeast were identified as the best reciprocal BlastP hits with an e-value cutoff of 10^-6, based on RefSeq protein sequences downloaded on December 9, 2004.
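Reciprocal best hit selection can be sketched as below, assuming the BlastP results have already been parsed into (query, subject, e-value) tuples. Ranking hits by lowest e-value is an assumption of this sketch (bit score is another common choice), and the protein labels are placeholders.

```python
def best_hits(hits, evalue_cutoff=1e-6):
    """Return {query: best subject} using the lowest e-value at or below the cutoff."""
    best = {}
    for query, subject, evalue in hits:
        if evalue > evalue_cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(human_vs_other, other_vs_human, evalue_cutoff=1e-6):
    """Return {human protein: ortholog} for pairs that are each other's best hit."""
    forward = best_hits(human_vs_other, evalue_cutoff)
    reverse = best_hits(other_vs_human, evalue_cutoff)
    return {h: o for h, o in forward.items() if reverse.get(o) == h}

# Toy parsed BlastP results (placeholder protein labels)
human_vs_yeast = [("hA", "yA", 1e-50), ("hA", "yB", 1e-10),
                  ("hB", "yB", 1e-30), ("hC", "yC", 1e-3)]
yeast_vs_human = [("yA", "hA", 1e-48), ("yB", "hA", 1e-12),
                  ("yB", "hB", 1e-8), ("yC", "hC", 1e-3)]
orthologs = reciprocal_best_hits(human_vs_yeast, yeast_vs_human)
```

Only hA and yA are retained: hB's best hit yB prefers hA in the reverse search, and the hC/yC hit fails the e-value cutoff.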
Availability and requirements
The web interface of IntNetDB is freely accessible at http://hanlab.genetics.ac.cn/IntNetDB.htm. The graphical layouts are based on SVG, which requires Mozilla version 1.8 and up or installation of SVGviewer for Internet Explorer.
We thank Dr. Michael Cusick for carefully reading and editing our manuscript. This work was supported by grants from the China National Science Foundation (Grant # 30588001, 90508006) and the Hundred Talents Plan of the Chinese Academy of Sciences to J.-D.J.H.
- Hartwell LH, Hopfield JJ, Leibler S, Murray AW: From molecular to modular cell biology. Nature 1999, 402(6761 Suppl):C47–52. 10.1038/35011540View ArticlePubMedGoogle Scholar
- Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, et al.: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 2000, 403(6770):623–627. 10.1038/35001009View ArticlePubMedGoogle Scholar
- Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA 2001, 98(8):4569–4574. 10.1073/pnas.061034498PubMed CentralView ArticlePubMedGoogle Scholar
- Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, et al.: A protein interaction map of Drosophila melanogaster. Science 2003, 302(5651):1727–1736. 10.1126/science.1090289View ArticlePubMedGoogle Scholar
- Li S, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain PO, Han JD, Chesneau A, Hao T, et al.: A map of the interactome network of the metazoan C. elegans. Science 2004, 303(5657):540–543. 10.1126/science.1091403PubMed CentralView ArticlePubMedGoogle Scholar
- Formstecher E, Aresta S, Collura V, Hamburger A, Meil A, Trehin A, Reverdy C, Betin V, Maire S, Brun C, et al.: Protein interaction mapping: a Drosophila case study. Genome Res 2005, 15(3):376–384. 10.1101/gr.2659105PubMed CentralView ArticlePubMedGoogle Scholar
- Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S, et al.: A human protein-protein interaction network: a resource for annotating the proteome. Cell 2005, 122(6):957–968. 10.1016/j.cell.2005.08.029View ArticlePubMedGoogle Scholar
- Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, et al.: Towards a proteome-scale map of the human protein-protein interaction network. Nature 2005, 437(7062):1173–1178. 10.1038/nature04209View ArticlePubMedGoogle Scholar
- Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, Millar A, Taylor P, Bennett K, Boutilier K, et al.: Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 2002, 415(6868):180–183. 10.1038/415180aView ArticlePubMedGoogle Scholar
- Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM, et al.: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002, 415(6868):141–147. 10.1038/415141aView ArticlePubMedGoogle Scholar
- von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions. Nature 2002, 417(6887):399–403. 10.1038/nature750View ArticlePubMedGoogle Scholar
- Han JD, Dupuy D, Bertin N, Cusick ME, Vidal M: Effect of sampling on topology predictions of protein-protein interaction networks. Nat Biotechnol 2005, 23(7):839–844. 10.1038/nbt1116View ArticlePubMedGoogle Scholar
- Huynen MA, Snel B, von Mering C, Bork P: Function prediction and protein networks. Curr Opin Cell Biol 2003, 15(2):191–198. 10.1016/S0955-0674(03)00009-7View ArticlePubMedGoogle Scholar
- Joyce AR, Palsson BO: The model organism as a system: integrating 'omics' data sets. Nat Rev Mol Cell Biol 2006, 7(3):198–210. 10.1038/nrm1857View ArticlePubMedGoogle Scholar
- Marcotte EM: Computational genetics: finding protein function by nonhomology methods. Curr Opin Struct Biol 2000, 10(3):359–365. 10.1016/S0959-440X(00)00097-XView ArticlePubMedGoogle Scholar
- Ng SK, Zhang Z, Tan SH, Lin K: InterDom: a database of putative interacting protein domains for validating predicted protein interactions and complexes. Nucleic acids research 2003, 31(1):251–254. 10.1093/nar/gkg079PubMed CentralView ArticlePubMedGoogle Scholar
- Lin N, Wu B, Jansen R, Gerstein M, Zhao H: Information assessment on predicting protein-protein interactions. BMC bioinformatics [electronic resource] 2004, 5: 154. 10.1186/1471-2105-5-154View ArticleGoogle Scholar
- Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 1998, 95(25):14863–14868. 10.1073/pnas.95.25.14863PubMed CentralView ArticlePubMedGoogle Scholar
- Gunsalus KC, Ge H, Schetter AJ, Goldberg DS, Han JD, Hao T, Berriz GF, Bertin N, Huang J, Chuang LS, et al.: Predictive models of molecular machines involved in Caenorhabditis elegans early embryogenesis. Nature 2005, 436(7052):861–865. 10.1038/nature03876
- Goldberg DS, Roth FP: Assessing experimentally derived interactions in a small world. Proc Natl Acad Sci USA 2003, 100(8):4372–4376. 10.1073/pnas.0735871100
- Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, et al.: Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 2001, 294(5550):2364–2368. 10.1126/science.1065810
- Tong AH, Lesage G, Bader GD, Ding H, Xu H, Xin X, Young J, Berriz GF, Brost RL, Chang M, et al.: Global mapping of the yeast genetic interaction network. Science 2004, 303(5659):808–813. 10.1126/science.1091317
- Matthews LR, Vaglio P, Reboul J, Ge H, Davis BP, Garrels J, Vincent S, Vidal M: Identification of potential interaction networks using sequence-based searches for conserved protein-protein interactions or "interologs". Genome Res 2001, 11(12):2120–2126. 10.1101/gr.205301
- Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, Vidal M, Gerstein M: Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. Genome Res 2004, 14(6):1107–1118. 10.1101/gr.1774904
- Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M: A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science 2003, 302(5644):449–453. 10.1126/science.1087361
- Lee I, Date SV, Adai AT, Marcotte EM: A probabilistic functional network of yeast genes. Science 2004, 306(5701):1555–1558. 10.1126/science.1099511
- Rhodes DR, Tomlins SA, Varambally S, Mahavisno V, Barrette T, Kalyana-Sundaram S, Ghosh D, Pandey A, Chinnaiyan AM: Probabilistic model of the human protein-protein interaction network. Nat Biotechnol 2005, 23(8):951–959. 10.1038/nbt1103
- Qi Y, Bar-Joseph Z, Klein-Seetharaman J: Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Proteins 2006, 63(3):490–500. 10.1002/prot.20865
- Qi Y, Klein-Seetharaman J, Bar-Joseph Z: Random forest similarity for protein-protein interaction prediction from multiple sources. Pacific Symposium on Biocomputing 2005, 531–542.
- Wong SL, Zhang LV, Tong AH, Li Z, Goldberg DS, King OD, Lesage G, Vidal M, Andrews B, Bussey H, et al.: Combining biological networks to predict genetic interactions. Proceedings of the National Academy of Sciences of the United States of America 2004, 101(44):15682–15687. 10.1073/pnas.0406614101
- Zhang LV, Wong SL, King OD, Roth FP: Predicting co-complexed protein pairs using genomic and proteomic data integration. BMC bioinformatics 2004, 5: 38. 10.1186/1471-2105-5-38
- Zhong W, Sternberg PW: Genome-wide prediction of C. elegans genetic interactions. Science 2006, 311(5766):1481–1484. 10.1126/science.1123287
- Bader JS, Chaudhuri A, Rothberg JM, Chant J: Gaining confidence in high-throughput protein interaction networks. Nature biotechnology 2004, 22(1):78–85. 10.1038/nbt924
- Wuchty S: Topology and weights in a protein domain interaction network – a novel way to predict protein interactions. BMC genomics 2006, 7: 122. 10.1186/1471-2164-7-122
- Xia Y, Lu LJ, Gerstein M: Integrated prediction of the helical membrane protein interactome in yeast. Journal of molecular biology 2006, 357(1):339–349. 10.1016/j.jmb.2005.12.067
- Brown MP, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M Jr, Haussler D: Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci USA 2000, 97(1):262–267. 10.1073/pnas.97.1.262
- Bock JR, Gough DA: Predicting protein – protein interactions from primary structure. Bioinformatics 2001, 17(5):455–460. 10.1093/bioinformatics/17.5.455
- Bradford JR, Westhead DR: Improved prediction of protein-protein binding sites using a support vector machines approach. Bioinformatics 2005, 21(8):1487–1494. 10.1093/bioinformatics/bti242
- Chinnasamy A, Mittal A, Sung WK: Probabilistic prediction of protein-protein interactions from the protein sequences. Computers in biology and medicine 2006, 36(10):1143–1154. 10.1016/j.compbiomed.2005.09.005
- Koike A, Takagi T: Prediction of protein-protein interaction sites using support vector machines. Protein Eng Des Sel 2004, 17(2):165–173. 10.1093/protein/gzh020
- Lewis DP, Jebara T, Noble WS: Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure. Bioinformatics 2006, 22(22):2753–60. 10.1093/bioinformatics/btl475
- Lo SL, Cai CZ, Chen YZ, Chung MC: Effect of training datasets on support vector machine prediction of protein-protein interactions. Proteomics 2005, 5(4):876–884. 10.1002/pmic.200401118
- Qian J, Lin J, Luscombe NM, Yu H, Gerstein M: Prediction of regulatory networks: genome-wide identification of transcription factor targets from gene expression data. Bioinformatics 2003, 19(15):1917–1926. 10.1093/bioinformatics/btg347
- Ramani AK, Bunescu RC, Mooney RJ, Marcotte EM: Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome. Genome Biol 2005, 6(5):R40. 10.1186/gb-2005-6-5-r40
- Lu LJ, Xia Y, Paccanaro A, Yu H, Gerstein M: Assessing the limits of genomic data integration for predicting protein networks. Genome Res 2005, 15(7):945–953. 10.1101/gr.3610305
- Bader GD, Hogue CW: An automated method for finding molecular complexes in large protein interaction networks. BMC bioinformatics 2003, 4: 2. 10.1186/1471-2105-4-2
- Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TK, Gronborg M, et al.: Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res 2003, 13(10):2363–2371. 10.1101/gr.1680803
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al.: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25(1):25–29. 10.1038/75556
- Huang TW, Tien AC, Huang WS, Lee YC, Peng CL, Tseng HH, Kao CY, Huang CY: POINT: a database for the prediction of protein-protein interactions based on the orthologous interactome. Bioinformatics 2004, 20(17):3273–3276. 10.1093/bioinformatics/bth366
- Pagel P, Mewes HW, Frishman D: Conservation of protein-protein interactions – lessons from ascomycota. Trends Genet 2004, 20(2):72–76. 10.1016/j.tig.2003.12.007
- Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B, et al.: Proteome survey reveals modularity of the yeast cell machinery. Nature 2006, 440(7084):631–6. 10.1038/nature04532
- Boutros M, Kiger AA, Armknecht S, Kerr K, Hild M, Koch B, Haas SA, Consortium HF, Paro R, Perrimon N: Genome-wide RNAi analysis of growth and viability in Drosophila cells. Science 2004, 303(5659):832–835. 10.1126/science.1091266
- Brown JA, Sherlock G, Myers CL, Burrows NM, Deng C, Wu HI, McCann KE, Troyanskaya OG, Brown JM: Global analysis of gene function in yeast by quantitative phenotypic profiling. Mol Syst Biol 2006, 2: 2006.0001. 10.1038/msb4100043
- Dudley AM, Janse DM, Tanay A, Shamir R, Church GM: A global view of pleiotropy and phenotypically derived gene function in yeast. Mol Syst Biol 2005, 1: 2005.0001. 10.1038/msb4100004
- Shlomi T, Segal D, Ruppin E, Sharan R: QPath: a method for querying pathways in a protein-protein interaction network. BMC bioinformatics 2006, 7: 199. 10.1186/1471-2105-7-199
- Ge H, Liu Z, Church GM, Vidal M: Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nat Genet 2001, 29(4):482–486. 10.1038/ng776
- Lu T, Pan Y, Kao SY, Li C, Kohane I, Chan J, Yankner BA: Gene regulation and DNA damage in the ageing human brain. Nature 2004, 429(6994):883–891. 10.1038/nature02661
- Su AI, Cooke MP, Ching KA, Hakak Y, Walker JR, Wiltshire T, Orth AP, Vega RG, Sapinoso LM, Moqrich A, et al.: Large-scale analysis of the human and mouse transcriptomes. Proc Natl Acad Sci USA 2002, 99(7):4465–4470. 10.1073/pnas.012025199
- Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, et al.: A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA 2004, 101(16):6062–6067. 10.1073/pnas.0400782101
- Bader GD, Donaldson I, Wolting C, Ouellette BF, Pawson T, Hogue CW: BIND – The Biomolecular Interaction Network Database. Nucleic acids research 2001, 29(1):242–245. 10.1093/nar/29.1.242
- Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID: a general repository for interaction datasets. Nucleic acids research 2006, (34 Database):D535–539. 10.1093/nar/gkj109
- Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM, Eisenberg D: DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic acids research 2002, 30(1):303–305. 10.1093/nar/30.1.303
- Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, et al.: The COG database: an updated version includes eukaryotes. BMC bioinformatics 2003, 4: 41. 10.1186/1471-2105-4-41
- O'Brien KP, Remm M, Sonnhammer EL: Inparanoid: a comprehensive database of eukaryotic orthologs. Nucleic acids research 2005, (33 Database):D476–480.
- Barabasi AL, Oltvai ZN: Network biology: understanding the cell's functional organization. Nat Rev Genet 2004, 5(2):101–113. 10.1038/nrg1272
- Lee HK, Hsu AK, Sajdak J, Qin J, Pavlidis P: Coexpression analysis of human genes across many microarray data sets. Genome Res 2004, 14(6):1085–1094. 10.1101/gr.1910904
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003, 13(11):2498–2504. 10.1101/gr.1239303
- Sharan R, Ideker T, Kelley B, Shamir R, Karp RM: Identification of protein complexes by comparative analysis of yeast and bacterial protein interaction data. J Comput Biol 2005, 12(6):835–846. 10.1089/cmb.2005.12.835
- Schreier T, Kedes L, Gahlmann R: Cloning, structural analysis, and expression of the human slow twitch skeletal muscle/cardiac troponin C gene. J Biol Chem 1990, 265(34):21247–21253.
- Park Y, Hwang YP, Lee JS, Seo SH, Yoon SK, Yoon JB: Proteasomal ATPase-associated factor 1 negatively regulates proteasome activity by interacting with proteasomal ATPases. Mol Cell Biol 2005, 25(9):3842–3853. 10.1128/MCB.25.9.3842-3853.2005
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.