Volume 8 Supplement 4
Using indirect protein interactions for the prediction of Gene Ontology functions
© Chua et al; licensee BioMed Central Ltd. 2007
Published: 22 May 2007
Protein-protein interaction has been used to complement traditional sequence homology to elucidate protein function. Most existing approaches only make use of direct interactions to infer function, and some have studied the application of indirect interactions for functional inference but are unable to improve prediction performance. We have previously proposed an approach, FS-Weighted Averaging, which uses topological weighting and level-2 indirect interactions (protein pairs connected via two interactions) for predicting protein function from protein interactions and have found that it yields predictions with superior precision on yeast proteins over existing approaches. Here we study the use of this technique to predict functional annotations from the Gene Ontology for seven genomes: Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Rattus norvegicus, Mus musculus, and Homo sapiens.
Our analysis shows that protein-protein interactions provide supplementary coverage over sequence homology in the inference of protein function and is definitely a complement to sequence homology. We also find that FS-Weighted Averaging consistently outperforms two classical approaches, Neighbor Counting and Chi-Square, across the seven genomes for all three categories of the Gene Ontology. By randomly adding and removing interactions from the interactions, we find that Weighted Averaging is also rather robust against noisy interaction data.
We have conducted a comprehensive study over seven genomes. We conclude that FS-Weighted Averaging can effectively make use of indirect interactions to make the inference of protein functions from protein interactions more effective. Furthermore, the technique is general enough to work over a variety of genomes.
Although sequence similarity search has proven useful in many cases, it has fundamental limitations. First, only a fraction of newly discovered sequences have identifiable homologous genes in current databases. Second, the most prominent vertebrate organisms in GenBank have only a fraction of their genomes present in finished sequences. New bioinformatics methods allow inference of protein function using "associative analysis" of functional properties to complement traditional sequence homology-based methods. Associative properties that have been used to infer function not evident from sequence homology include: co-occurrence of proteins in operons or genome context [1–3]; proteins sharing common domains in fusion proteins [4–6]; proteins with similar phylogenetic profiles [7, 8]; proteins with correlated gene expression patterns ; and so on. Many approaches have also been proposed for utilizing protein-protein interaction data for functional inference [10–18]. A simple but effective approach is to assign a protein with the function that occurs most frequently in its interaction partners . This is further improved in , which predicts function based on chi-square statistics instead of frequency. Some approaches apply machine-learning methods such as clustering and support vector machines [12–14]. Others apply global optimization techniques, such as Markov random fields [15, 16] and simulated annealing , to propagate predictions so that the function of proteins without characterized neighbors can be predicted. Most of these approaches use the observation that a protein often shares functions with proteins that interact with it (i.e., its level-1 neighbors). However, proteins that interact with the same proteins (i.e., level-2 neighbors) may also have a greater likelihood of sharing similar physical or biochemical characteristics with the target protein. In a previous study , we investigated interactions between proteins from the Saccharomyces cerevisiae (bakers' yeast) genome from the General Repository of Interaction Datasets (GRID) . We observed that there are proteins that do not share any function with their immediate interaction partners (i.e., level-1 neighbors, S1) and yet share some function similarity with the interaction partners of their immediate partners (i.e., level-2 neighbors, S2). Out of 4,162 annotated yeast proteins, only 1,999, or 48.0%, share some function with its level-1 neighbors. Of the remaining proteins, 943, or 22.7% of the annotated proteins, share some similarity with at least one of its level-2 neighbors. Less than 2% of the annotated proteins share functions exclusively with level-1 neighbors.
With the assumption that there is no unobserved interaction or annotation, we proposed indirect functional association as a reasonable explanation for this observation . Such an indirect functional association can be considered as an instance of the "guilt by association" principle – the common "property" between the level-2 neighbors and the target protein that is used for deriving the "association" is precisely the set of proteins that they both interact with, namely the level-1 neighbors. It is plausible that two proteins that interact with a common set of proteins have a good likelihood of sharing similar physical or biochemical characteristics, and thus exhibit a common function.
While level-2 neighbors may be used to provide greater coverage during function inference, they contain too many false positives to be useful. In order to reduce the impact of these false positives, we devised a topological weighting measure, Functional Similarity Weight (FS-Weight), which can be used to identify both direct and indirect (level-2) neighbors that are more likely to share functions. FS-Weight improves the precision of function inference, while the inclusion of FS-weighted level-2 neighbors improves both sensitivity and precision. A new method, FS-Weighted Averaging, which incorporates indirect interactions and FS-Weight, was shown to perform better than existing approaches in predicting protein functions for S. cerevisiae from protein interactions .
Here we study how FS-Weighted Averaging performs in predicting protein functions from the protein-protein interaction maps of seven genomes. We also study how the approach performs on noisy interaction data and on predicted interactions. Finally, we show some examples of novel functions predicted for uncharacterized proteins in the S. cerevisiae genome and study the predictions that are biologically significant.
Coverage of protein-protein interactions
To appreciate the feasibility of protein-protein interactions for function discovery, we want to find out whether protein-protein interactions provide any additional coverage over sequence homology. We look at two well-studied genomes, S. cerevisiae and D. melanogaster, and examine: 1) how many known functions can be inferred from other proteins with sequence similarity in the genome; 2) how many more functions can be suggested from interaction partners on top of (1); and 3) how many more functions can be suggested from indirect interaction partners on top of (1) and (2).
Effectiveness of FS-Weight
Statistics of interaction data from seven genomes
Interactions involving annotated proteins
Avg. no. of annotated neighbors per protein
S. cerevisiae (predicted)
Pearson's coefficient between FS-Weight and function sharing likelihood for each genome and GO category
Function prediction using FS-Weighted Averaging
Using Neighbor Counting and Chi-Square as a benchmark, we want to study the prediction performance of FS-Weighted Averaging over a variety of different genomes. These genomes vary greatly in the availability of annotations and interaction data, which provides a good setup to study the strengths and limitations of the technique. Ten-fold cross validation is performed on each genome using Neighbor Counting, Chi-Square, and FS-Weighted Averaging. Proteins with known functions are randomly divided into 10 groups. In each run, the annotations for one group of proteins will be hidden, and their functions predicted based on the annotations for the remaining groups and the interaction data. The hidden annotations are not available to any preprocessing steps, including reliability estimation and the weighting of interactions. The predictions from the three methods are validated and compared using the two criteria, precision-recall analysis and receiver operating characteristics, as outlined in the Methods section. Only informative GO terms (see Methods) are used for validation.
Receiver operating characteristics
Function prediction with predicted protein-protein interactions
Robustness of FS-Weighted Averaging against noise and missing data
FS-Weighted Averaging is designed to take into account the fact that interaction data can be rather noisy . As mentioned in the Methods section, the FS-Weight measure incorporates two forms of countermeasure against noisy interaction data – estimation of the reliability of experimental sources and topological weight. Here we want to study how well the method can perform against noisy data. To simulate noise in interactions, we take the current interaction data and contaminate it with different forms of synthetic noise. We perform this analysis on the S. cerevisiae genome since it has a denser interaction data and more complete functional annotations. To differentiate the effects of different types of noise, we modify the interaction data by 1) adding random interactions and 2) randomly removing interactions. Random additions reflect false positives in experimental sources, while random deletions reflect missing data or detection failures. Real protein-protein interactions are likely to include both forms of noise.
Identifying functions better predicted with indirect neighbors
For all the four genomes in Figure 4, most points on the graph lie above the diagonal line, indicating that FS-Weighted Averaging performed better than Neighbor Counting for most of these GO terms.
To identify GO terms on which FS-Weighted Averaging can achieve the best relative performance over Neighbor Counting, we first select only level-4 GO terms that appear in at least two of the four genomes. For each term, we define a score that reflects the relative ROC score of FS-Weighted Averaging against Neighbor Counting as follows:
where ROC WA (x) is the ROC score of the predictions made by FS-Weighted Averaging; and ROC NC (x) is the ROC score of the predictions made by Neighbor Counting.
Level-4 GO terms annotated to at least 30 proteins in at least two genomes with the top five average FL2 scores for each category of the Gene Ontology
Avg. FL2 score
Regulation of kinase activity
Regulation of biosynthesis
Cellular macromolecule metabolism
Response to pest, pathogen, or parasite
Phosphotransferase activity, alcohol group as acceptor
Transcription factor activity
Transcription cofactor activity
Calcium ion binding
Eukaryotic 48S initiation complex
Eukaryotic 43S preinitiation complex
Intrinsic to plasma membrane
Intracellular non-membrane-bound organelle
We have shown that FS-Weighted Averaging consistently outperforms Neighbor Counting and Chi-Square in function prediction for the various genomes. By incorporating interaction reliability, topological weight, and indirect interactions, the method can predict more functions with higher precision in all three categories of the Gene Ontology. It is also reasonably resilient against interaction noise, maintaining consistent prediction performance even when a large number of interactions are randomly added to the interaction data. However, we have also seen that the performance edge of FS-Weighted Averaging is less significant in the genomes with sparse interaction networks and also less significant when the interaction network is made sparser by random deletions. This is due to two factors. First, the number of indirect interactions is much lower for sparser networks due to limited connectivity. Indirect interactions can only involve proteins that interact with at least one other protein, i.e., the proteins in the interaction network. Second, the performance of FS-Weighted Averaging is dependent on the effectiveness of the FS-Weight measure, which is limited when the network is sparse.
Limitations of FS-Weight
We have seen from Table 2 that FS-Weight remains effective even for the sparser networks. However, while FS-Weight can still identify function-sharing interactions for sparser networks, it will miss more interactions that do share function. The topological weight is computed based on the common interaction neighbors of the network. When the interaction network is very sparse, there is often insufficient information in the local topology for FS-Weight to get a confident estimate on functional similarity between proteins. In such cases, FS-Weight assigns a low weight to the interaction. As such, it may limit the contribution of some function-sharing interactions to the function prediction mechanism in FS-Weighted Averaging. Nonetheless, we can see this as a feature rather than a limitation. When a protein interacts with very few proteins, any form of measure that would assign a high reliability score or high confidence in sharing function without additional evidence would be very susceptible to noise and will not give consistent performance over different datasets.
Examples of indirect functional association
Here we take a look at two examples that illustrate how indirect interactions can provide functional association that cannot be captured through direct interactions.
Indirect functional association of biological process
Indirect functional association of molecular function
Interestingly, the level-2 neighbors and YBR264C form a bipartite graph with four proteins: YGL198W, YNL263C, YNL044W, and YER136W. YGL198W is a Ypt-interacting protein that interacts with Rab GTPases. YNL263C and YNL044W have no known molecular function but are known to be involved in ER to Golgi transport.
YER136W is a GDP dissociation inhibitor that regulates vesicle traffic in secretory pathways by regulating the dissociation of GDP from GTP binding proteins. Two of the four proteins have molecular functions that require interaction with GTPases, while the other two have no known molecular function. It is likely that YNL263C and YNL044W, which have no known molecular function, would have molecular functions that involve interaction with GTPases.
We also notice that YGR172C is the only member on its side of the bipartite graph that does not have the molecular function GTPase activity. YGR172C is known to be an integral membrane protein required for the biogenesis of ER-derived COPII transport vesicles and has no known molecular function. It is likely that YGR172C would share the molecular function GTPase activity with YBR264C.
Novel predictions for S. cerevisiae
Excluding GO terms that are associated with fewer than 30 annotated proteins;
Excluding GO terms that have an ROC of less than 0.7 during cross validation;
For each remaining GO term, retaining only novel predictions that have a score greater than or equal to at least 70% of annotated proteins with the term.
Propagating predictions to include ancestor terms for consistency.
These predictions are publicly available at . We welcome collaborations with experimentalists interested in verifying some of these predictions.
We have investigated the protein-protein interactions from seven genomes and shown that by incorporating topological weighting and indirect neighbors, FS-Weighted Averaging can predict protein function effectively for all three categories of the Gene Ontology. This result is consistent across the seven genomes, indicating that the approach is robust and likely to be generally applicable. We have also studied the impact of noisy interaction data on the performance of FS-Weighted Averaging and find that it is very robust against random perturbations in the interaction network. The study also reveals that FS-Weighted Averaging displays greater effectiveness for sufficiently dense interaction networks as its weighting mechanism requires sufficient local network information.
Interaction and annotation datasets for multiple genomes
In this study, we will cover several genomes, namely Saccharomyces cerevisiae, Drosophila melanogaster (fruit fly), Caenorhabditis elegans (roundworm), Arabidopsis thaliana (mouse-ear cress), Rattus norvegicus (Norway rat), Mus musculus (house mouse), and Homo sapiens (human). Protein-protein interactions for D. melanogaster, C. elegans, and S. cerevisiae are obtained from the latest release (2.0.20) of BioGRID (formerly GRID ). Interaction data for A. thaliana, R. norvegicus, M. musculus, and H. sapiens are obtained from the Biomolecular Interaction Network Database (BIND) . Predicted protein-protein interactions for S. cerevisiae are obtained from the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database .
As genome-specific function annotation schemes may have inherent biases, we use a unified annotation scheme, the Gene Ontology , for function annotations. Gene Ontology (GO) terms are arranged in a hierarchical manner with more general terms at the lower level and more specific terms at the higher level. In this study, we define the GO term "biological process" as level 0, its children terms as level 1, and so on. Annotations follow the true path rule – a protein annotated with a GO term is also annotated with all its ancestor terms.
Table 1 shows some statistics of the various interaction datasets. We consider only annotated proteins in our study since our interest is in function inference. As the lower levels in the GO hierarchy can be very general, we refer to a protein as "annotated" if it is being annotated with at least one level-4 GO term. The first column depicts the number of interactions between annotated proteins. The second column shows the number of proteins that are annotated and have at least one interaction partner. The third column shows the average number of annotated neighbors per (annotated) protein. We use this as a simple indicator of the density of the interaction network. The S. cerevisiae dataset has the densest interaction network followed by D. melanogaster and H. Sapiens datasets. The R. norvegicus and C. elegans datasets have sparser interaction networks, with less than one annotated neighbor per annotated protein on the average.
Direct and indirect interactions
We define a direct interaction as an actual interaction between proteins in the protein-protein interaction data. In Figure 1, nodes in the graph represent proteins while edges between the nodes represent protein-protein interactions. There is a direct interaction between proteins A and B. We define an indirect interaction as the sharing of common interaction partners between two proteins (i.e., the two proteins are level-2 neighbors). Figure 1 shows an indirect interaction between proteins A and C. A pair of proteins may have both direct and indirect interactions, as illustrated by proteins A and D in Figure 1. Level-2 neighbors are able to bind to similar proteins; thus they have a higher likelihood of having similar molecular functions. Since subcellular localization shows substantial correlation with molecular function , level-2 neighbors are also likely to reside in similar cellular components.
Not all indirect interactions indicate function sharing. Indirect relationships are defined upon direct ones and are subjected to noise in the interaction network. Also, while two proteins can interact with a common protein, they may not bind to the common protein at the same site. To identify which indirect interactions are more likely to share functions, we proposed a topological weighting scheme, FS-Weight .
The FS-Weight of the direct or indirect interaction between two proteins u and v is defined as:
N(p) refers to the set that contains protein p and its level-1 neighbors;
λu,v is a pseudo-count included in the computation to penalize similarity weights between protein pairs when any of the proteins has very few level-1 neighbors.
ru,w refers to the estimated reliability of the interaction between u and w:
r i is the estimated reliability of experimental source i;
Eu,vis the set of experimental sources in which interaction between u and v is observed; and
ni,u,vis the number of times that interaction between u and v is observed from experimental source i.
FS-Weight addresses the abovementioned problems in two ways. First, the interaction network is weighted using estimated reliability values to reduce the impact of noise. Second, the weight is largely determined by the number of common interaction partners between the two proteins: if the two proteins share many common neighbors, the likelihood of sharing common physical characteristics increases. The number of non-common neighbors is used as a penalizing factor: if any of the two proteins also bind to many other proteins, then the likelihood of sharing common physical characteristics decreases.
Reliability of experimental sources
To reduce the impact of noise in the interaction network, the reliability of each experimental source of protein interaction data (e.g., two-hybrid, synthetic lethality) is assessed and weighted. Each source may be assigned an estimated reliability weight by experts based on domain knowledge. However, since we do not have domain knowledge, a simple way to do this is to estimate based on known functions as well as the agreement among independent experimental sources. Here we estimate the reliability of each experimental source by the fraction of unique interactions detected by the experimental source in which at least one level-4 Gene Ontology term is shared. This is done using annotated proteins in the training data during cross validation. The reliability of interactions observed in many independent experimental sources will be combined as described in the definition of FS-Weight. We do not use indirect interactions for the estimation of reliability to avoid circular reasoning as the definition of indirect neighbors is dependent upon the reliability.
The comparison of FS-Weighted Averaging with many existing approaches has been done in  on the yeast genome. Here we will study the performance of the approach on various additional genomes using two classical methods, Neighbor Counting and Chi-Square, as a benchmark:
The Neighbor Counting method is proposed in . For each protein u, each function x is ranked based on the frequency of its occurrence in the interaction partners (level-1 neighbors) of u. The rank of each function is used as its score for u:
δ(v, x) = 1 if v has function x, 0 otherwise;
rank(q(x)) refers to the rank of the function x relative to all functions based on q(x).
The Chi-Square method is proposed in . The approach scores each function f observed in the neighbors of a protein u using the Chi-Square statistic. The statistic measure computes the deviation of the observed occurrence of function f in the neighbors of u from its expected occurrence. In , the function with the largest chi-square value is assigned to u. Since we want to assign multiple functions to each protein, we use the rank of each function as its score instead:
e x is the expected number of proteins with function x among the interaction partners of u, computed by multiplying the number of annotated interaction partners of u with the frequency of function x among annotated proteins in the interaction map
Neighbor Counting uses occurrence-based ranking as a score for functions. A score derived for one protein may not reflect similar confidence as the same score derived for another. In , we introduced FS-Weighted Averaging, which uses a normalized weighted voting approach. The likelihood that a protein u has a function x is estimated by:
Z is the sum of all weights:
In the same equation from , we added the background frequency of function x to the summation of weights in f x (u). When the weights of all the interactions in the local neighbourhood of a protein are very low, the background frequency gives an estimate of the score. This is done so that all proteins can be given a prediction. However, as many of the genomes in this study are very sparse, derived interaction weights are very low. As a result, the background frequency will be given excessive weight, which negatively affects predictions results. Hence we exclude the background frequency from FS-Weighted Averaging in this work.
Prediction performance evaluation
To evaluate the performance of each approach, we use two popular validation methods, precision-recall analysis and receiver operating characteristics, both of which are described below.
The first method is to plot the precision against recall for the predictions made. Precision-recall analysis indicates the overall prediction performance of a prediction method. It also reflects the ability of a method to assign scores to predictions across different GO terms since it does not differentiate between scores assigned for different terms.
k i is the number of functions correctly predicted for protein i;
m i is the number of functions predicted for protein i; and
n i is the number of functions annotated for protein i
Receiver operating characteristics
While precision-recall analysis summarizes the overall prediction performance of a prediction method, it does not tell us about the prediction performance for each term. Since it does not differentiate between predictions made for different terms, it also penalizes methods that do not assign scores that reflect prediction confidence uniformly across different terms. Hence we choose to complement precision-recall analysis with another validation method. The Receiver operating characteristics (ROC)  score is the area under the curve derived from plotting true positives as a function of false positives. The ROC score is computed for the predictions for each informative GO term and measures how well the term is predicted for proteins. A higher ROC score indicates a better classifier, and the perfect classifier has an ROC score of 1. For any given GO term, if no prediction is made for a protein, we assume that the lowest possible score is assigned. The ROC does not reflect the recall of a method and does not differentiate between a method with very low recall and a method with high recall but low precision. Hence the two validation methods are complementary.
Since statistical measures are used for the validation of predictions, we only consider terms that are annotated to a reasonably large number of proteins to ensure that conclusions drawn from these measures are statistically sound. We adopt the approach of informative functional classes used in  and . For each of the 3 GO categories – biological process, molecular function, and cellular component – we define an informative GO term as a term which is annotated to at least n proteins and does not have any child term that is annotated to at least n proteins. n = 30 is used for the S. cerevisiae, D. melanogaster, M. musculus, and H. sapiens genomes. For the remaining genomes, n = 10 is used instead as there are few or no informative terms for validation when n = 30 is used.
Functional Similarity Weight
GTP binding proteins
Receiver Operating Characteristics
This research is funded in part by Singapore Agency for Science, Technology, and Research (A*STAR) Graduate Scholarship (HNC) and the A*STAR Institute for Infocomm Research-NUS School of Computing Joint Lab on Knowledge Discovery from Clinical Data (WKS, LW).
This article has been published as part of BMC Bioinformatics Volume 8, Supplement 4, 2007: The Second Automated Function Prediction Meeting. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/8?issue=S4.
- Dandekar T, Snel B, Huynen M, Bork P: Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci 1998, 23: 324–328. 10.1016/S0968-0004(98)01274-2View ArticlePubMedGoogle Scholar
- Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N: The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA 1999, 96: 2896–2901. 10.1073/pnas.96.6.2896PubMed CentralView ArticlePubMedGoogle Scholar
- Salgado H, Moreno-Hagelsieb G, Smith TF, Collado-Vides J: Operons in Escherichia coli : genomic analyses and predictions. Proc Natl Acad Sci USA 2000, 97: 6652–6657. 10.1073/pnas.110147297PubMed CentralView ArticlePubMedGoogle Scholar
- Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D: Detecting protein function and protein-protein interactions from genome sequences. Science 1999, 285: 751–753. 10.1126/science.285.5428.751View ArticlePubMedGoogle Scholar
- Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA: Protein interaction maps for complete genomes based on gene fusion events. Nature 1999, 402: 86–90. 10.1038/47056View ArticlePubMedGoogle Scholar
- Huynen M, Snel B, Lathe W, Bork P: Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res 2000, 10: 1204–1210. 10.1101/gr.10.8.1204PubMed CentralView ArticlePubMedGoogle Scholar
- Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO: Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. Proc Natl Acad Sci USA 1999, 96: 4285–4288. 10.1073/pnas.96.8.4285PubMed CentralView ArticlePubMedGoogle Scholar
- Wu J, Kasif S, DeLisi C: Identification of functional links between genes using phylogenetic profiles. Bioinformatics 2003, 19: 1524–1530. 10.1093/bioinformatics/btg187View ArticlePubMedGoogle Scholar
- Zhou X, Kao MC, Wong WH: Transitive functional annotation by shortest-path analysis of gene expression data. Proc Natl Acad Sci USA 2002, 99: 12783–12788. 10.1073/pnas.192159399PubMed CentralView ArticlePubMedGoogle Scholar
- Schwikowski B, Uetz P, Fields S: A network of interacting proteins in yeast. Nat Biotechnol 2000, 18: 1257–1261. 10.1038/82360View ArticlePubMedGoogle Scholar
- Hishigaki H, Nakai K, Ono T, Tanigami A, Takagi T: Assessment of prediction accuracy of protein function from protein-protein interaction data. Yeast 2001, 18: 525–531. 10.1002/yea.706View ArticleGoogle Scholar
- Brun C, Chevenet F, Martin D, Wojcik J, Guénoche A, Jacq B: Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network. Genome Biol 2003, 5: R6. 10.1186/gb-2003-5-1-r6PubMed CentralView ArticlePubMedGoogle Scholar
- Samanta MP, Liang S: Predicting protein functions from redundancies in large-scale protein interaction networks. Proc Natl Acad Sci USA 2003, 100: 12579–83. 10.1073/pnas.2132527100PubMed CentralView ArticlePubMedGoogle Scholar
- Lanckriet GR, Deng M, Cristianini N, Jordan MI, Noble WS: Kernel-based data fusion and its application to protein function prediction in yeast. Pac Symp Biocomput 2004,:300–311.Google Scholar
- Letovsky S, Kasif S: Predicting protein function from protein/protein interaction data: a probabilistic approach. Bioinformatics 2003, 19(Suppl 1):i197-i204. 10.1093/bioinformatics/btg1026View ArticlePubMedGoogle Scholar
- Deng M, Zhang K, Mehta S, Chen T, Sun F: Prediction of protein function using protein-protein interaction data. J Comput Biol 2003, 10: 947–960. 10.1089/106652703322756168View ArticlePubMedGoogle Scholar
- Vazquez A, Flammi A, Maritan A, Vespignani A: Global protein function prediction from protein-protein interaction networks. Nat Biotechnol 2003, 21: 697–700. 10.1038/nbt825View ArticlePubMedGoogle Scholar
- Chua HN, Sung WK, Wong L: Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions. Bioinformatics 2006, 22: 1623–1630. 10.1093/bioinformatics/btl145View ArticlePubMedGoogle Scholar
- Breitkreutz BJ, Stark C, Tyers M: The GRID: the General Repository for Interaction Datasets. Genome Biol 2003, 4: R23. 10.1186/gb-2003-4-3-r23PubMed CentralView ArticlePubMedGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight S, Eppig JT, et al.: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25: 25–29. [http://www.godatabase.org] 10.1038/75556PubMed CentralView ArticlePubMedGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: A basic local alignment search tool. J Mol Biol 1990, 215: 403–410.View ArticlePubMedGoogle Scholar
- Gribskov M, Robinson NL: Use of receiver operating characteristic analysis to evaluate sequence matching. Comput Chem 1996, 20: 25–33. 10.1016/S0097-8485(96)80004-0View ArticlePubMedGoogle Scholar
- Snel B, Lehmann G, Bork P, Huynen MA: STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Res 2000, 28: 3442–3444. [http://string.embl.de/] 10.1093/nar/28.18.3442PubMed CentralView ArticlePubMedGoogle Scholar
- Grigoriev A: On the number of protein-protein interactions in the yeast proteome. Nucleic Acids Res 2003, 31: 4157–4161. 10.1093/nar/gkg466PubMed CentralView ArticlePubMedGoogle Scholar
- Sprinzak E, Sattath S, Margalit H: How reliable are experimental protein-protein interaction data? J Mol Biol 2003, 327: 919–923. 10.1016/S0022-2836(03)00239-0View ArticlePubMedGoogle Scholar
- FS-Weighted Averaging predictions for uncharacterized yeast ORFs[http://srs2.bic.nus.edu.sg/~kenny/fsweightedavg/]
- Bader G, Donaldson I, Wolting C, Ouellette B, Pawson T, Hogue C: BIND – the biomolecular interaction network database. Nucleic Acids Res 2001, 29: 242–245. 10.1093/nar/29.1.242PubMed CentralView ArticlePubMedGoogle Scholar
- Lu Z, Hunter L: GO molecular function terms are predictive of subcellular localization. Pac Symp Biocomput:151–161.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.