Reconstruction of human protein interolog network using evolutionary conserved network
© Huang et al; licensee BioMed Central Ltd. 2007
Received: 14 April 2006
Accepted: 10 May 2007
Published: 10 May 2007
The recent increase in the use of high-throughput two-hybrid analysis has generated large quantities of data on protein interactions. Specifically, the availability of information about experimental protein-protein interactions and other protein features on the Internet enables human protein-protein interactions to be computationally predicted from co-evolution events (interolog). This study also considers other protein interaction features, including sub-cellular localization, tissue-specificity, the cell-cycle stage and domain-domain combination. Computational methods need to be developed to integrate these heterogeneous biological data to facilitate the maximum accuracy of the human protein interaction prediction.
This study proposes a relative conservation score by finding maximal quasi-cliques in protein interaction networks, and considering other interaction features to formulate a scoring method. The scoring method can be adopted to discover which protein pairs are the most likely to interact among multiple protein pairs. The predicted human protein-protein interactions associated with confidence scores are derived from six eukaryotic organisms – rat, mouse, fly, worm, thale cress and baker's yeast.
Evaluation results of the proposed method using functional keyword and Gene Ontology (GO) annotations indicate that some confidence is justified in the accuracy of the predicted interactions. Comparisons among existing methods also reveal that the proposed method predicts human protein-protein interactions more accurately than other interolog-based methods.
Large-scale protein-protein interactions (PPIs) have been experimentally identified in several eukaryotic model organisms, such as Drosophila melanogaster [1–3], Caenorhabditis elegans [4, 5], and Saccharomyces cerevisiae [6–9]. Moreover, thousands of PPIs have been collected in web databases including BIND [10], CYGD [11], DIP [12], BioGRID [13], IntAct [14], and MINT [15]. Although a mammalian interaction database, MPPI [16], has been published, no mammalian data set of comparable scale has been described. The large-scale set of interactions among human proteins therefore remains hard to determine directly.
Many computational methods have been developed to predict protein-protein interactions. The phylogenetic profile method [17] describes the presence or absence of proteins among different organisms with sequenced genomes; functional links can be detected between proteins that have similar phylogenetic profiles. The gene or domain fusion method [18, 19] identifies a pair of proteins that are encoded as separate genes in one organism but fused into a single protein in another organism. Such a pair of proteins can be inferred to be functionally linked, particularly in metabolic pathways. In the gene neighbor or gene order method [20–22], genes that encode two proteins and lie close together on the chromosome in several organisms are likely to be functionally linked. However, this method exploits the prevalence of operons in prokaryotes, and operons appear to be uncommon in eukaryotes such as humans. Predictions using interologs are based on the theory that proteins interacting in one organism co-evolve such that their respective orthologs maintain the ability to interact in another organism. The interolog concept has been applied to predict human protein interactions [23–29]. Some bioinformatics models [30, 31] have also been developed to detect interactions among proteins by probabilistic and machine-learning methods, as have literature text-mining approaches [32–34] based on natural language processing. Bader et al. developed a logistic regression approach [35] that employs statistical and topological descriptors to predict the biological relevance of PPIs obtained from high-throughput screening in yeast. Other sources of information, such as mRNA expression, genetic interactions and database annotations, are subsequently used to validate the model predictions. Lu et al. used a simple naive Bayes classifier to integrate diverse sources of genomic evidence, ranging from co-expression relationships to phylogenetic profiling similarity [36].
The greatest challenge in predicting human PPIs using the interolog-based method is that high-throughput interactions generate too many false positives when applied to phylogenetically distant organisms or lower eukaryotes [37], and some researchers have suggested that only 50% of yeast two-hybrid interactions are reliable [38]. Therefore, additional filtering features and scoring schemes should be considered in order to increase the confidence in human interactions predicted by the interolog-based method. This study constructs human PPI maps from six eukaryotes, namely rat, mouse, fly, worm, thale cress and baker's yeast. Quasi-cliques in the protein interaction network of each organism are analyzed to derive a relative conservation score. Further feature scores drawn from spatial proximity (sub-cellular localization and tissue-specificity), temporal synchronicity (cell-cycle stage) and domain-domain combinations are also examined, to obtain human PPI networks with confidence scores.
Results and discussion
Predicted human protein interactions
Number and sources of predicted interactions inferred from each reference organism.
Predicted interologs (nr)
The experimental human PPIs and standard benchmarks available from well-known databases are limited, and few interactions are completely characterized. Therefore, the absence of an interaction between two proteins from the experimental databases does not indicate that the proteins do not interact. Given this limited knowledge, functional keyword annotation and GO term matching were used to assess the accuracy of the various interaction data sets.
Testing for true positives
Number of human interactions (true positives) successfully predicted from each reference organism in different experimental databases.
Predicted true positive interactions
Testing scoring method
Testing functional annotation
Interacting proteins commonly have similar functions. Additionally, researchers should be able to validate the functions of predicted protein pairs. The interactions predicted by the proposed method were evaluated in terms of UniProt functional keyword annotations, GO 'molecular function' (MF) and GO 'biological process' (BP). Irrelevant GO terms such as 'molecular function unknown', 'obsolete molecular function', 'biological process unknown' and 'obsolete biological process' were discarded.
where K_a and K_b are the keyword vectors of the interacting proteins a and b, respectively. For example, in K_a = [1, 0, 1, 0, 1], the presence or absence of a keyword is represented as 1 or 0, respectively. Protein self-interactions (homo-dimers) tend to have high scores, because the two partners always share the same functional annotations. Hence, these interactions were removed from the predicted pairs to avoid biasing the results.
where N and M denote the total number of proteins in the population and the number of proteins in the population that have a particular functional keyword, respectively, and n and x denote the total number of proteins in the set and the number of proteins in the set annotated with that functional keyword, respectively. Since a pair of proteins is observed, both n and x are equal to 2. A protein pair is treated as enriched for a UniProt functional keyword if the corrected p-value is ≤ 0.05. A total of 90,871 predicted interactions meeting this p-value threshold are listed (see Additional File 1).
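The definitions of N, M, n and x above are consistent with a hypergeometric enrichment test. The following is a minimal sketch under that assumption; the function name `hypergeom_tail` and the toy numbers are illustrative only, and the multiple-testing correction behind the "corrected p-value" is not specified in the text and is not implemented here.

```python
from math import comb


def hypergeom_tail(N: int, M: int, n: int, x: int) -> float:
    """Upper-tail hypergeometric probability P(X >= x).

    N: proteins in the population
    M: proteins in the population carrying the keyword
    n: proteins in the observed set
    x: proteins in the observed set carrying the keyword
    """
    upper = min(n, M)
    total = comb(N, n)
    favourable = sum(comb(M, k) * comb(N - M, n - k) for k in range(x, upper + 1))
    return favourable / total


# For a protein pair, n = x = 2, so the tail reduces to C(M, 2) / C(N, 2).
# Toy numbers (assumed, not from the paper): 100 proteins, 10 with the keyword.
p_pair = hypergeom_tail(N=100, M=10, n=2, x=2)
```

With n = x = 2 the probability depends only on how rare the keyword is in the population, so pairs sharing a rare keyword receive small p-values.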
Testing conservation score (C) and interolog score (I)
Mean and standard error of conservation score (C) among the different InParanoid score (IP) interaction data sets.
Columns: Mean of C score (two columns). Rows: IP > 0.0 (CS ≥ 4); IP > 0.0 (CS ≥ 0); IP ≥ 0.1; IP ≥ 0.2; IP ≥ 0.3; IP ≥ 0.4; IP ≥ 0.5; IP ≥ 0.6; IP ≥ 0.7; IP ≥ 0.8; IP ≥ 0.9; IP = 1.0.
Mean and standard error of interolog score (I) among the different InParanoid score (IP) interaction data sets.
Columns: Mean of I score (two columns). Rows: IP > 0.0 (CS ≥ 4); IP > 0.0 (CS ≥ 0); IP ≥ 0.1; IP ≥ 0.2; IP ≥ 0.3; IP ≥ 0.4; IP ≥ 0.5; IP ≥ 0.6; IP ≥ 0.7; IP ≥ 0.8; IP ≥ 0.9; IP = 1.0.
Comparison with cut-off scores
Number of human interactions (true positives) predicted from BLAST with minimum E-value and InParanoid.
InParanoid (1-to-1 mapping)
InParanoid (1-to-many mapping, CS ≥ 0)
InParanoid (1-to-many mapping, CS ≥ 4)
Relationship between cut-off threshold and predicted human interactions (true positives).
Rows: CS ≥ 0; CS ≥ 1; CS ≥ 2; CS ≥ 3; CS ≥ 4; CS ≥ 5; CS ≥ 6; CS ≥ 7.
Comparison with BLAST data sets
All of the 175,085 known interactions (Table 1) from the six reference organisms were used in the orthology search by BLAST with a minimum E-value (E ≤ 0.005 was configured in the BLAST tool). The protein sequences were downloaded from UniProt. The InParanoid one-to-one mapping (InParanoid score = 1.0) and one-to-many mappings (InParanoid score ≥ 0.0) were also compared, as were the InParanoid data sets with threshold CS = 4. Table 5 shows the results of these predictions. Although BLAST predicted more true positive interologs than the InParanoid method, it also produced a higher putative ratio. The predicted and true positive ratios reveal that InParanoid can distinguish potential true orthologs. BTP and BPU denote the true positive and putative data sets, respectively, predicted by the BLAST mapping method. The scoring method testing results are also presented (see Figure 2 and Additional File 2).
Comparison with experimental data sets
Comparison with interolog-based approach
The proposed method was compared with other interolog-based methods for predicting human PPIs, namely HomoMINT [28], HPID [25], IPPRED [24], the method of Lehner et al. [27], OPHID [23], POINT [26] and the method of Rhodes et al. [29].
The properties of the ortholog identification methods and other features are as follows.
An ortholog identification method determines the orthologs between model organisms. With a BLAST search (B) or a BLAST search with an E-value cut-off (BE), orthologs between organisms are not restricted to one-to-one relationships; one-to-many and many-to-many mappings also exist. The InParanoid clustering algorithm distinguishes potential true orthologs from paralogs according to the InParanoid score (IP). Although similar structures typically share similar biological functions, identifying structural similarities among human proteins by structural classification at the protein superfamily level (SS) is not trivial on a large scale.
Other features indicate that some other factors affecting their interactions are considered. The quasi-clique with maximal conservation score (C), domain-domain combinations (D), sub-cellular localization (L), cell-cycle phase (P) and tissue-specificity (T) were also carefully examined in this study. Other existing methods apply the 'biological process' (BP) and 'molecular function' (MF) annotations in the GO hierarchy.
Comparisons with other interolog-based approaches for predicting human PPIs.
D, L, MF
Lehner et al.
D, T, L
Rhodes et al.
D, T, BP
Huang et al.
D, T, L, P
Many predicted pairs have been identified in existing known human PPI databases (KNOWN) and the two human experimental PPI data sets (as shown in Figure 4). The top 20 predictions of the proposed prediction system that were not present in the existing databases are listed (see Additional File 3); some of these top-ranked pairs plausibly reflect physical interactions. For example, for the top 1 interaction between PLK1 and STK6, PLK1 (polo-like kinase 1) was recently reported to interact with Aurora-B in playing critical roles in the regulation of chromosomal dynamics. STK6 is also known as Aurora-A. The kinase domains of Aurora-A and Aurora-B share more than 70% of their sequence. Most importantly, in 3D structure, they are likely to share partially similar surface features. Therefore, the interaction of Aurora-A (i.e., STK6) with PLK1 (top 1 interaction) is not surprising. ORC1, the origin recognition complex protein, binds specifically to origins of replication, and serves as a platform for the assembly of additional initiation factors including MCM and CDC6 proteins. MCM proteins form a hexameric complex with six subunits, namely MCM2, MCM3, MCM4, MCM5, MCM6 and MCM7. To date, ORC1 has been confirmed to interact with MCM2 and MCM7. ORC1 can also be reasonably expected to interact with MCM4 (top 2 interaction) and MCM6 (top 5 interaction), because they are all localized in a complex or at an origin recognition site. Furthermore, since MCM proteins form a hexamer, MCM5 can reasonably be expected to interact with MCM6 (top 3 interaction) and with MCM4 (top 4 interaction). These findings reveal that constructing a protein-protein interaction network allows novel interacting proteins to be identified. All proteins of the predicted pairs are linked to a human disease in the OMIM database whenever possible (see Additional File 3).
Therefore, the interaction network can be further extended through these annotated disease-associated proteins. Moreover, these predicted interactions have high conservation (C) and interolog (I) scores (Table 3 and Table 4, respectively), revealing that these interactions are evolutionarily conserved across species.
Important high-throughput approaches such as yeast two-hybrid have recently been applied to systematically identify PPIs in humans (Figure 4). Surprisingly, the predictions of the proposed method and the results of the high-throughput experiments did not overlap significantly, indicating that each approach to detecting interactions carries its own biases. Hence, the two kinds of methods (interolog-based and experimental) may each reveal a different, partial sub-network of the complete human protein interaction network.
The accuracy of the predicted interactions depends mainly on the quality and completeness of the reference model organism interaction data sets. Although only a subset of the known interactions in the human interaction network can currently be accurately predicted (Table 2), the accuracy can be improved as large-scale protein interaction data for 'higher' eukaryotic reference model organisms become available. The orthologous relationship between sequence and function is difficult to evaluate, because no clear measure of functional similarity between any pair of proteins exists. Many one-to-many and many-to-many mappings exist across species, and can be used to identify protein orthologs. The InParanoid algorithm was applied because several proteins from so-called 'lower' eukaryotes have many co-orthologs in humans, which can be identified using InParanoid, but not with a simple one-to-one sequence similarity search based on BLAST or with structural classification at the protein superfamily level.
The interolog concept was previously proposed to predict C. elegans PPIs from yeast. This study presents 'Interolog' as a concrete method for predicting human PPIs from those of six 'lower' eukaryotes. However, high-throughput interactions with false positives and false negatives have been noted in some eukaryotes. This study utilized other features and scoring schemes to derive the confidence with which human interactions are predicted using the interolog-based method. Computational analysis can be applied to determine conservation scores and other feature scores, and is readily extensible to any newly sequenced genome. Users can construct many genome-wide PPI networks with high confidence using interolog mapping and the proposed scoring method. This concept can also be applied to discover transcription networks, such as simultaneous protein-DNA and protein-protein interaction networks.
The evolution of PPIs from the relative conservation score is comprehensively assessed by finding a quasi-clique from protein networks. However, PPIs in biological organisms are complex, and do not depend only on a single feature, such as protein structural complementarity, gene proximity or co-evolution.
Moreover, some other protein interaction features, including sub-cellular localization, tissue specificity, cell-cycle stage and domain-domain combinations, are also critical factors to be considered. This study describes a scoring method based on integrating these heterogeneous but significant biological resources to prioritize human protein-protein interaction networks. The analytical results indicate that the proposed method can predict potential human PPIs with higher confidence than the other methods studied (Figure 2). The analytical results also reveal that some correlations exist between the true positive data set and the data set produced by the proposed method (Figure 3). Furthermore, the conservation score of a true positive interaction data set is higher than the score of the putative interaction data set (Table 2). Additionally, the proposed method allows researchers to identify quantitatively, rather than simply qualitatively, how (functional domain), when (cell-cycle stage) and where (cellular compartment and tissue specificity) the two proteins interact, using a confidence score.
Studies have been published on both the experimental derivation of PPIs and their in silico prediction. Examples of topics examined include domain-domain co-occurrence [31, 47, 48], gene co-expression as shown by microarrays [49–52] and co-localization to the same sub-cellular compartment using Gene Ontology cellular component terms [35, 38, 53, 54]. The combination of such evidence can support a broader range of PPIs than the results predicted from any single feature.
InParanoid score (IP)
Number of interologs and true positives predicted by InParanoid score (IP) without other feature scores.
Rows: IP > 0.0; IP > 0.1; IP ≥ 0.2; IP ≥ 0.3; IP ≥ 0.4; IP ≥ 0.5; IP ≥ 0.6; IP ≥ 0.7; IP ≥ 0.8; IP ≥ 0.9; IP = 1.0.
Quasi-clique and conservation score (C)
Let G = (V, E) denote a graph, where V is the set of vertices and E is the set of edges in G. The graph is γ-dense if γ = 2|E| / (|V| (|V| - 1)), that is, if its edges make up a fraction γ of all possible vertex pairs. For a subset S ⊆ V, G_S is the sub-graph induced by S. A quasi-clique, also called a γ-clique, is a subset S of V such that the induced graph G_S is connected and γ-dense. The original maximum clique problem is to find a 1-clique, that is, a complete sub-graph (γ = 1), with the maximum number of vertices in G.
A quasi-clique in a PPI network is a group of proteins that tend to interact with each other, but a complete sub-graph (γ = 1) is not always biologically significant. Hence, C = γ |E| is defined as the protein complex conservation score, where |E| is the number of functional links (edges) in the protein complex.
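The density γ and the conservation score C = γ |E| can be computed directly from an edge list. The sketch below is illustrative only: the function names and the toy protein labels are assumptions, and it scores a given subset rather than searching for maximal quasi-cliques.

```python
def induced_edges(edges, subset):
    """Edges of the sub-graph induced by `subset` (undirected, no self-loops)."""
    s = set(subset)
    return {frozenset((u, v)) for u, v in edges if u in s and v in s and u != v}


def is_connected(subset, edges):
    """True if the induced sub-graph on `subset` is connected."""
    nodes = set(subset)
    if not nodes:
        return False
    adjacency = {v: set() for v in nodes}
    for edge in induced_edges(edges, nodes):
        u, v = tuple(edge)
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(adjacency[v] - seen)
    return seen == nodes


def density_and_score(edges, subset):
    """Return (gamma, C) with gamma = 2|E| / (|V|(|V| - 1)) and C = gamma * |E|."""
    n = len(set(subset))
    if n < 2:
        raise ValueError("need at least two vertices")
    e = len(induced_edges(edges, subset))
    gamma = 2 * e / (n * (n - 1))
    return gamma, gamma * e


# Toy network (hypothetical proteins): A-B-C-D with five of six possible links.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("D", "E")]
gamma, c_score = density_and_score(toy_edges, ["A", "B", "C", "D"])
```

Because C multiplies density by the number of links, a large, nearly complete group scores higher than a small complete one, which matches the remark that γ = 1 alone is not always the biologically relevant criterion.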
Interolog score (I)
Number and sources of model organism interaction data sets.
Rual et al.
Stelzl et al.
The weight of evolutionary conservation (w_ec) is defined such that a higher w_ec value indicates an organism that is genetically closer to humans. The following w_ec values were considered: w_rat = 1.0, w_mouse = 1.0, w_fly = 0.75, w_worm = 0.75, w_thalecress = 0.5 and w_yeast = 0.25 for rat, mouse, fly, worm, thale cress and baker's yeast, respectively. Because rat and mouse are both mammals, and are thus genetically closest to human, they were assigned the highest value of 1.0. Drosophila and C. elegans are two animal models that are widely studied to understand human disease genes and development, and are ranked second closest to humans among the organisms studied. Finally, thale cress is ranked higher than yeast, since it is a multi-cellular organism, while yeast is a single-cell species. If a human protein interaction is derived from two or more reference model organisms, then only the highest interolog score is used to generate non-redundant (nr) human protein-protein interactions.
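The non-redundancy rule described above can be sketched as follows. This is an illustration under stated assumptions: the per-organism interolog scores are taken as already computed (their formula is not reproduced here), and the function name, protein identifiers and score values are hypothetical.

```python
def non_redundant(predictions):
    """Keep, for each unordered human protein pair, the highest interolog score.

    `predictions` is an iterable of (protein_a, protein_b, organism, score).
    Returns {frozenset({a, b}): (organism, score)}.
    """
    best = {}
    for a, b, organism, score in predictions:
        key = frozenset((a, b))  # (a, b) and (b, a) are the same interaction
        if key not in best or score > best[key][1]:
            best[key] = (organism, score)
    return best


# Hypothetical predictions of the same human pair from three organisms.
toy_predictions = [
    ("P1", "P2", "yeast", 0.20),
    ("P2", "P1", "fly", 0.60),
    ("P1", "P2", "mouse", 0.45),
    ("P3", "P4", "worm", 0.30),
]
nr = non_redundant(toy_predictions)
```

Using an unordered key ensures that a pair reported in either orientation by different source organisms collapses to a single nr interaction.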
Domain-domain combination score (D)
where pdi and pdj are sets i and j in the power set PDd, respectively, and N' (pdi, pdj) and N (pdi, pdj) are the number of interacting protein pairs and the total number of protein pairs that contain (pdi, pdj) in known interactions, respectively.
Tissue specificity score (T)
Tissue specificity is another measure of spatial proximity to be considered. Two proteins that are active at the same sub-cellular localization and co-expressed in the same tissue are likely to interact with each other. This information can be used to discover tissue-specific PPIs associated with human diseases for biomedical research. Tissue-specific gene expression information was extracted from the GeneAtlas Affymetrix data set, which includes 44,775 human probe sets (30,694 proteins) from 79 normal human tissue samples.
where e_A,i and e_B,i are the normalized expression values of proteins A and B, respectively, in tissue sample i, and ē_A and ē_B are the mean expression values of proteins A and B, respectively, over the 79 tissue samples.
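The symbols defined above (per-tissue normalized expression values and their means) are consistent with a Pearson correlation coefficient between the two expression profiles. The sketch below assumes that form; the function name and the toy profiles are illustrative, and real profiles would have 79 entries.

```python
from math import sqrt


def pearson(profile_a, profile_b):
    """Pearson correlation between two expression profiles of equal length."""
    if len(profile_a) != len(profile_b) or len(profile_a) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_a = sum(profile_a) / len(profile_a)
    mean_b = sum(profile_b) / len(profile_b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(profile_a, profile_b))
    var_a = sum((x - mean_a) ** 2 for x in profile_a)
    var_b = sum((y - mean_b) ** 2 for y in profile_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # flat profile: correlation undefined, treated as 0 here
    return cov / sqrt(var_a * var_b)


# Toy profiles over three tissues (hypothetical values).
co_expressed = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
anti_expressed = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

A value near 1 indicates that the two proteins rise and fall together across tissues; returning 0 for a flat profile is a convention chosen for this sketch.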
Sub-cellular localization score (L)
A physical PPI requires contact between two proteins at a certain cellular location. Hence, this study used the Gene Ontology (GO) annotation in the deep 'Cellular Component' (CC) hierarchy, discarding irrelevant GO terms such as 'cellular component unknown' and 'obsolete cellular component'.
If the GO terms of two interacting proteins share a common ancestor, then the sub-cellular localization score L is the depth of the deepest GO term that is common to the ancestor terms of both annotations (each term counts as its own ancestor) in the GO hierarchy. For example, a protein pair (A, B) has the GO cellular component annotations 'GO:0005623 cell' and 'GO:0005819 spindle' at depths of 2 and 8, respectively. The sub-cellular localization score is L = 2, since the deepest common GO term among the ancestors is at a depth of 2 in the GO hierarchy. Figure 8 shows the detailed hierarchy.
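The deepest-common-ancestor rule can be sketched on a small hierarchy. The hierarchy below is a toy stand-in, not the real GO graph: the term names, the parent links and the depth convention (root at depth 1, depth = 1 + deepest parent) are assumptions made for illustration, so the depths differ from those in the GO example above.

```python
from functools import lru_cache

# Toy hierarchy: term -> list of parent terms (hypothetical, much shallower than GO).
PARENTS = {
    "root": [],
    "cell": ["root"],
    "intracellular": ["cell"],
    "membrane": ["cell"],
    "cytoskeleton": ["intracellular"],
    "spindle": ["cytoskeleton"],
}


@lru_cache(maxsize=None)
def depth(term):
    """Depth of a term: the root is at depth 1."""
    parents = PARENTS[term]
    return 1 if not parents else 1 + max(depth(p) for p in parents)


def ancestors(term):
    """All ancestors of a term, including the term itself."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def localization_score(terms_a, terms_b):
    """Depth of the deepest term shared by the ancestor sets of both proteins."""
    anc_a = set().union(*(ancestors(t) for t in terms_a))
    anc_b = set().union(*(ancestors(t) for t in terms_b))
    return max((depth(t) for t in anc_a & anc_b), default=0)


score_cell_spindle = localization_score(["cell"], ["spindle"])
score_spindle_cytoskeleton = localization_score(["spindle"], ["cytoskeleton"])
```

As in the text's example, a pair annotated with 'cell' and 'spindle' shares only the shallow term 'cell', so its score is the depth of 'cell' (2 in this toy hierarchy).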
Cell-cycle stage score (P)
Number of human cell cycle-regulated proteins at different phases.
Cell cycle stage
Confidence score (CS)
In this scoring scheme, all data sources are weighted equally: w_I = 1, w_D = 1, w_T = 1, w_L = 1 and w_P = 1. Moreover, the confidence score cut-off was set at CS = 4, the threshold at which the recall ratio is ≥ 50% (Table 6).
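One plausible reading of the equal weighting is a weighted sum of the five feature scores. The sketch below assumes that form, which is not confirmed by the text shown here; the function names and the feature values are hypothetical.

```python
# Equal weights as stated in the text; the weighted-sum form itself is an assumption.
WEIGHTS = {"I": 1.0, "D": 1.0, "T": 1.0, "L": 1.0, "P": 1.0}


def confidence_score(features, weights=WEIGHTS):
    """Weighted sum of feature scores; missing features contribute 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def passes_cutoff(features, threshold=4.0):
    """Apply the CS >= 4 cut-off."""
    return confidence_score(features) >= threshold


# Hypothetical feature scores for one predicted pair.
toy_pair = {"I": 1.0, "D": 0.5, "T": 0.5, "L": 2.0, "P": 1.0}
toy_cs = confidence_score(toy_pair)
```

Keeping the weights in a dictionary makes it straightforward to explore unequal weightings later without changing the scoring function.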
Human known interaction data set obtained from well-known databases.
TP0 is the overlap of KNOWN2 and our predicted data set when confidence score CS > 0.
PU0 is the complete predicted data set excluding TP0 when confidence score CS > 0.
TP4 is the overlap of KNOWN2 and the predicted data set when confidence score CS ≥ 4.
PU4 is the complete predicted data set excluding TP4 when confidence score CS ≥ 4.
BTP is the overlap of KNOWN2 and the BLAST predicted data set.
BPU is the complete BLAST predicted data set excluding BTP.
Random interaction data sets with the same number of TP4 interactions.
MF is molecular function in Gene Ontology categories.
BP is biological process in Gene Ontology categories.
- IP: InParanoid score.
- I: Interolog score.
- D: Domain-domain combination score.
- T: Tissue specificity score.
- L: Sub-cellular localization score.
- P: Cell-cycle stage score.
- CS: Confidence score.
The authors would like to thank AfCS-Nature, BIND, BioGRID, CYGD, DIP, Gene Ontology, HPRD, IntAct, InParanoid, MINT, MPPI, Pfam and UniProt for their publicly accessible databases, which provided the foundation for this study. This research was partially supported by the National Research Program for Genomic Medicine, National Science Council, Taiwan (NSC95-3112-B-001-003).
- Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, Vijayadamodar G, Pochart P, Machineni H, Welsh M, Kong Y, Zerhusen B, Malcolm R, Varrone Z, Collis A, Minto M, Burgess S, McDaniel L, Stimpson E, Spriggs F, Williams J, Neurath K, Ioime N, Agee M, Voss E, Furtak K, Renzulli R, Aanensen N, Carrolla S, Bickelhaupt E, Lazovatsky Y, DaSilva A, Zhong J, Stanyon CA, Finley JRL, White KP, Braverman M, Jarvie T, Gold S, Leach M, Knight J, Shimkets RA, McKenna MP, Chant J, Rothberg JM: A protein interaction map of Drosophila melanogaster. Science 2003, 302(5651):1727–36.
- Stanyon CA, Liu G, Mangiola BA, Patel N, Giot L, Kuang B, Zhang H, Zhong J, Finley RL: A Drosophila protein-interaction map centered on cell-cycle regulators. Genome Biol 2004, 5(12):R96.
- Formstecher E, Aresta S, Collura V, Hamburger A, Meil A, Trehin A, Reverdy C, Betin V, Maire S, Brun C, Jacq B, Arpin M, Bellaiche Y, Bellusci S, Benaroch P, Bornens M, Chanet R, Chavrier P, Delattre O, Doye V, Fehon R, Faye G, Galli T, Girault JA, Goud B, de Gunzburg J, Johannes L, Junier MP, Mirouse V, Mukherjee A, Papadopoulo D, Perez F, Plessis A, Ross C, Saule S, Stoppa-Lyonnet D, Vincent A, White M, Legrain P, Wojcik J, Camonis J, Daviet L: Protein interaction mapping: a Drosophila case study. Genome Res 2005, 15(3):376–384.
- Li S, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain PO, Han JD, Chesneau A, Hao T, Goldberg DS, Li N, Martinez M, Rual JF, Lamesch P, Xu L, Tewari M, Wong SL, Zhang LV, Berriz GF, Jacotot L, Vaglio P, Reboul J, Hirozane-Kishikawa T, Li Q, Gabel HW, Elewa A, Baumgartner B, Rose DJ, Yu H, Bosak S, Sequerra R, Fraser A, Mango SE, Saxton WM, Strome S, Van Den Heuvel S, Piano F, Vandenhaute J, Sardet C, Gerstein M, Doucette-Stamm L, Gunsalus KC, Harper JW, Cusick ME, Roth FP, Hill DE, Vidal M: A map of the interactome network of the metazoan C. elegans. Science 2004, 303(5657):540–3.
- Walhout AJ, Boulton SJ, Vidal M: Yeast two-hybrid systems and protein interaction mapping projects for yeast and worm. Yeast 2000, 17(2):88–94.
- Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM, Remor M, Hofert C, Schelder M, Brajenovic M, Ruffner H, Merino A, Klein K, Hudak M, Dickson D, Rudi T, Gnau V, Bauch A, Bastuck S, Huhse B, Leutwein C, Heurtier MA, Copley RR, Edelmann A, Querfurth E, Rybin V, Drewes G, Raida M, Bouwmeester T, Bork P, Seraphin B, Kuster B, Neubauer G, Superti-Furga G: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002, 415(6868):141–7.
- Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, Millar A, Taylor P, Bennett K, Boutilier K, Yang L, Wolting C, Donaldson I, Schandorff S, Shewnarane J, Vo M, Taggart J, Goudreault M, Muskat B, Alfarano C, Dewar D, Lin Z, Michalickova K, Willems AR, Sassi H, Nielsen PA, Rasmussen KJ, Andersen JR, Johansen LE, Hansen LH, Jespersen H, Podtelejnikov A, Nielsen E, Crawford J, Poulsen V, Sorensen BD, Matthiesen J, Hendrickson RC, Gleeson F, Pawson T, Moran MF, Durocher D, Mann M, Hogue CW, Figeys D, Tyers M: Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 2002, 415(6868):180–3.
- Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA 2001, 98(8):4569–74.
- Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, Qureshi-Emili A, Li Y, Godwin B, Conover D, Kalbfleisch T, Vijayadamodar G, Yang M, Johnston M, Fields S, Rothberg JM: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 2000, 403(6770):623–7.
- Bader GD, Betel D, Hogue CWV: BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res 2003, 31: 248–250.
- Guldener U, Munsterkotter M, Kastenmuller G, Strack N, van Helden J, Lemer C, Richelles J, Wodak SJ, Garcia-Martinez J, Perez-Ortin JE, Michael H, Kaps A, Talla E, Dujon B, Andre B, Souciet JL, De Montigny J, Bon E, Gaillardin C, Mewes HW: CYGD: the Comprehensive Yeast Genome Database. Nucleic Acids Res 2005, (33 Database):D364–8.
- Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D: DIP: the database of interacting proteins. Nucleic Acids Res 2000, 28: 289–291.
- Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID: a general repository for interaction datasets. Nucleic Acids Res 2006, (34 Database):D535-D539.
- Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A, Margalit H, Armstrong J, Bairoch A, Cesareni G, Sherman D, Apweiler R: IntAct: an open source molecular interaction database. Nucleic Acids Res 2004, (32 Database):D452–5.
- Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M, Cesareni G: MINT: a Molecular INTeraction database. FEBS Lett 2002, 513: 135–40.
- Pagel P, Kovac S, Oesterheld M, Brauner B, Dunger-Kaltenbach I, Frishman G, Montrone C, Mark P, Stumpflen V, Mewes HW, Ruepp A, Frishman D: The MIPS mammalian protein-protein interaction database. Bioinformatics 2005, 21(6):832–4.
- Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO: Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA 1999, 96(8):4285–8.
- Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA: Protein interaction maps for complete genomes based on gene fusion events. Nature 1999, 402(6757):86–90.
- Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D: Detecting protein function and protein-protein interactions from genome sequences. Science 1999, 285(5428):751–3.
- Dandekar T, Snel B, Huynen M, Bork P: Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci 1998, 23(9):324–8.
- Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N: The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA 1999, 96(6):2896–901.
- Tamames J, Casari G, Ouzounis C, Valencia A: Conserved clusters of functionally related genes in two bacterial genomes. J Mol Evol 1997, 44: 66–73.
- Brown KR, Jurisica I: Online predicted human interaction database. Bioinformatics 2005, 21(9):2076–82.
- Goffard N, Garcia V, Iragne F, Groppi A, De Daruvar A: IPPRED: server for proteins interactions inference. Bioinformatics 2003, 19(7):903–4.
- Han K, Park B, Kim H, Hong J, Park J: HPID: the Human Protein Interaction Database. Bioinformatics 2004, 20(15):2466–70.
- Huang TW, Tien AC, Huang WS, Lee YC, Peng CL, Tseng HH, Kao CY, Huang CY: POINT: a database for the prediction of protein-protein interactions based on the orthologous interactome. Bioinformatics 2004, 20(17):3273–6.
- Lehner B, Fraser AG: A first-draft human protein-interaction map. Genome Biol 2004, 5(9):R63.
- Persico M, Ceol A, Gavrila C, Hoffmann R, Florio A, Cesareni G: HomoMINT: an inferred human network based on orthology mapping of protein interactions discovered in model organisms. BMC Bioinformatics 2005, 6(Suppl 4):S21.
- Rhodes DR, Tomlins SA, Varambally S, Mahavisno V, Barrette T, Kalyana-Sundaram S, Ghosh D, Pandey A, Chinnaiyan AM: Probabilistic model of the human protein-protein interaction network. Nat Biotechnol 2005, 23(8):951–9.
- Dohkan S, Koike A, Takagi T: Prediction of protein-protein interactions using support vector machines. Bioinformatics and Bioengineering, 2004. BIBE 2004. Proceedings. Fourth IEEE Symposium on 2004, 576–583.View ArticleGoogle Scholar
- Han DS, Kim HS, Jang WH, Lee SD, Suh JK: PreSPI: a domain combination based prediction system for protein-protein interaction. Nucleic Acids Res 2004, 32(21):6312–20.PubMed CentralView ArticlePubMedGoogle Scholar
- Donaldson I, Martin J, de Bruijn B, Wolting C, Lay V, Tuekam B, Zhang S, Baskin B, Bader GD, Michalickova K, Pawson T, Hogue CW: PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine. BMC Bioinformatics 2003, 4: 11.
- Huang M, Zhu X, Hao Y, Payan DG, Qu K, Li M: Discovering patterns to extract protein-protein interactions from full texts. Bioinformatics 2004, 20(18):3604–12.
- Ramani AK, Bunescu RC, Mooney RJ, Marcotte EM: Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome. Genome Biol 2005, 6(5):R40.
- Bader JS, Chaudhuri A, Rothberg JM, Chant J: Gaining confidence in high-throughput protein interaction networks. Nat Biotechnol 2004, 22: 78–85.
- Lu LJ, Xia Y, Paccanaro A, Yu H, Gerstein M: Assessing the limits of genomic data integration for predicting protein networks. Genome Res 2005, 15(7):945–953.
- von Mering C, Krause R, Snel B, Cornell M, Oliver S, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions. Nature 2002, 417(6887):399–403.
- Sprinzak E, Sattath S, Margalit H: How reliable are experimental protein-protein interaction data? J Mol Biol 2003, 327(5):919–23.
- Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, Klitgord N, Simon C, Boxem M, Milstein S, Rosenberg J, Goldberg DS, Zhang LV, Wong SL, Franklin G, Li S, Albala JS, Lim J, Fraughton C, Llamosas E, Cevik S, Bex C, Lamesch P, Sikorski RS, Vandenhaute J, Zoghbi HY, Smolyar A, Bosak S, Sequerra R, Doucette-Stamm L, Cusick ME, Hill DE, Roth FP, Vidal M: Towards a proteome-scale map of the human protein-protein interaction network. Nature 2005, 437(7062):1173–8.
- Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S, Timm J, Mintzlaff S, Abraham C, Bock N, Kietzmann S, Goedde A, Toksöz E, Droege A, Krobitsch S, Korn B, Birchmeier W, Lehrach H, Wanker EE: A human protein-protein interaction network: a resource for annotating the proteome. Cell 2005, 122(6):957–68.
- Liu Y, Liu N, Zhao H: Inferring protein-protein interactions through high-throughput interaction data from diverse organisms. Bioinformatics 2005, 21(15):3279–3285.
- Goto H, Kiyono T, Tomono Y, Kawajiri A, Urano T, Furukawa K, Nigg EA, Inagaki M: Complex formation of Plk1 and INCENP required for metaphase-anaphase transition. Nat Cell Biol 2006, 8(2):C180–187.
- Bayliss R, Sardon T, Ebert J, Lindner D, Vernos I, Conti E: Determinants for Aurora-A activation and Aurora-B discrimination by TPX2. Cell Cycle 2004, 3(4):404–407.
- Bell SP, Dutta A: DNA replication in eukaryotic cells. Annu Rev Biochem 2002, 71: 333–374.
- Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA: Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 2005, (33 Database):D514–7.
- Nariai N, Tamada Y, Imoto S, Miyano S: Estimating gene regulatory networks and protein-protein interactions of Saccharomyces cerevisiae from multiple genome-wide data. Bioinformatics 2005, 21(Suppl 2):ii206-ii212.
- Deng M, Mehta S, Sun F, Chen T: Inferring domain-domain interactions from protein-protein interactions. Genome Res 2002, 12(10):1540–8.
- Sprinzak E, Margalit H: Correlated sequence-signatures as markers of protein-protein interaction. J Mol Biol 2001, 311(4):681–92.
- Bader GD, Hogue CWV: Analyzing yeast protein-protein interaction data obtained from different sources. Nat Biotechnol 2002, 20(10):991–7.
- Deane CM, Salwinski L, Xenarios I, Eisenberg D: Protein interactions: two methods for assessment of the reliability of high throughput observations. Mol Cell Proteomics 2002, 1(5):349–356.
- Deng M, Zhang K, Mehta S, Chen T, Sun F: Prediction of protein function using protein-protein interaction data. J Comput Biol 2003, 10(6):947–60.
- Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M: A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science 2003, 302(5644):449–53.
- Jansen R, Greenbaum D, Gerstein M: Relating whole-genome expression data with protein-protein interactions. Genome Res 2002, 12: 37–46.
- Kumar A, Agarwal S, Heyman JA, Matson S, Heidtman M, Piccirillo S, Umansky L, Drawid A, Jansen R, Liu Y, Cheung KH, Miller P, Gerstein M, Roeder GS, Snyder M: Subcellular localization of the yeast proteome. Genes Dev 2002, 16(6):707–19.
- Remm M, Storm CE, Sonnhammer EL: Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol 2001, 314(5):1041–52.
- Vespignani A: Evolution thinks modular. Nat Genet 2003, 35(2):118–9.
- Wuchty S, Oltvai ZN, Barabasi AL: Evolutionary conservation of motif constituents in the yeast protein interaction network. Nat Genet 2003, 35(2):176–9.
- Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW: Evolutionary rate in the protein interaction network. Science 2002, 296(5568):750–2.
- AfCS-Nature Signaling Gateway [http://www.signaling-gateway.org/data/Y2H/cgi-bin/y2h.cgi]
- Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR: The Pfam protein families database. Nucleic Acids Res 2004, (32 Database):D138–41.
- Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB: A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA 2004, 101(16):6062–7.
- Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, Richter J, Rubin GM, Blake JA, Bult C, Dolan M, Drabkin H, Eppig JT, Hill DP, Ni L, Ringwald M, Balakrishnan R, Cherry JM, Christie KR, Costanzo MC, Dwight SS, Engel S, Fisk DG, Hirschman JE, Hong EL, Nash RS, Sethuraman A, Theesfeld CL, Botstein D, Dolinski K, Feierbach B, Berardini T, Mundodi S, Rhee SY, Apweiler R, Barrell D, Camon E, Dimmer E, Lee V, Chisholm R, Gaudet P, Kibbe W, Kishore R, Schwarz EM, Sternberg P, Gwinn M, Hannick L, Wortman J, Berriman M, Wood V, de la Cruz N, Tonellato P, Jaiswal P, Seigfried T, White R: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 2004, (32 Database):D258–61.
- Whitfield ML, Sherlock G, Saldanha AJ, Murray JI, Ball CA, Alexander KE, Matese JC, Perou CM, Hurt MM, Brown PO, Botstein D: Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol Biol Cell 2002, 13(6):1977–2000.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.