Volume 11 Supplement 1
Selected articles from the Eighth Asia-Pacific Bioinformatics Conference (APBC 2010)
Functional characterization and topological modularity of molecular interaction networks
 Jayesh Pandey^{1},
 Mehmet Koyutürk^{2, 3} and
 Ananth Grama^{1}
DOI: 10.1186/1471-2105-11-S1-S35
© Pandey et al; licensee BioMed Central Ltd. 2010
Published: 18 January 2010
Abstract
Background
Analyzing interaction networks for functional characterization poses significant challenges arising from the noisy, incomplete, and generic nature of both the interaction data and the functional annotation of molecules. Network-based methods focus on interacting molecules (pairs or sets) occurring in close proximity to infer functional associations.
Results
In this paper we perform a formal comparative investigation of the relationship between functional coherence and topological proximity in networks. We investigate the problem of assessing the coherence of sets of biomolecules (or segments thereof) taking into account functional specificity as well as the distribution of functional attributes across entity groups. We also propose novel measures of topological proximity that are more robust to noisy and incomplete interaction data.
Conclusion
We derive the following results in this paper: (i) there is a strong correlation between functional similarity and topological proximity in various network abstractions, with domain interaction networks (DDIs) demonstrating higher correlation than protein interaction networks (PPIs); (ii) measures that quantify coherence among entire sets of proteins are superior to aggregates of known pairwise measures; and (iii) random-walk-based measures of topological proximity are better suited to existing interaction data. We validate our methods on diverse data, including experimentally and computationally derived PPIs and DDIs, as well as on sets of known biologically related groups of molecules.
Background
Analysis of interaction data generated from high-throughput experiments takes a network-centric view of the functions of biological systems and the role of the underlying components. Recent advances in this area have focused on the development of computational tools for network-based functional annotation [1], identification of functionally coherent modules [2], and the relationship between network structure and function [3, 4], among others. Network proximity and connectivity have also been shown to be effective in identifying proteins that are implicated in similar phenotypes [5].
In this paper, we comprehensively investigate the relationship between topological and functional modularity in the context of two network abstractions: protein-protein interaction (PPI) and domain-domain interaction (DDI) networks. Key to understanding the relationship between network topology and functional modularity are: (i) suitable measures for assessing the functional coherence (or similarity) of a group of entities with respect to each other, and (ii) measures for quantifying topological proximity in a network with potentially missing interactions and noise. To assess functional coherence, canonical libraries of molecular function, such as Gene Ontology [6], are typically used [7]. Since annotations for different types of molecular entities (e.g., proteins or domains) are derived in different ways [8], they have different implications with respect to their specificity and frequency distributions. Consequently, an important challenge in the assessment of functional coherence is the development of measures that are robust to variations in distribution as well as to missing data. In recent work, we have shown that information-theoretic measures specifically designed to address these challenges are effective in capturing the relationship between the functional coherence and network proximity of pairs of proteins [9]. In this paper, we build upon existing methods for quantifying functional coherence and topological proximity through the following key contributions:

We propose novel measures for assessing the functional coherence of a group of molecules (in contrast to pairs of molecules).

We propose the use of information-flow-based modeling of topological proximity and connectivity in a network of interactions (in contrast to traditionally used interaction counts or shortest paths).
We elaborate on these contributions below.
Functional coherence of a group of molecules
Traditional measures of functional coherence, including our own prior results [9], have largely focused on pairwise distance measures. Generalizing from pairwise measures to coherence measures for sets of molecules adds significant complexity. For example, in testing the hypothesis that functional modularity is related to connectivity in PPI networks, it is common to investigate the functional purity of groups of proteins that induce dense subgraphs in the network [10]. These enrichment-based methods have been widely used, and they report the GO terms that are over-represented in a given set. They do not, however, provide a measure of the homogeneity of the underlying modules (sets). We show that simple extensions of pairwise measures to group measures by averaging, taking the min or max, or applying other such associative operations result in suboptimal set-coherence measures. We propose novel measures of the homogeneity of entire protein sets and demonstrate their superiority over generalized pairwise measures on known groups of homogeneous complexes, as compared to a control of randomly generated protein sets.
Information flow based topological proximity
Topological information is used to identify functionally related proteins using shortest paths or density of direct interactions [1, 5]. However, evidence suggests that multiple alternate paths between functionally associated proteins are often conserved through evolution, owing to their contribution to robustness against perturbations, as well as amplification of signals [11]. Consequently, consideration of multiple paths between molecules in a network of interactions is likely to be more effective in capturing the functional association between these molecules. Furthermore, consideration of alternate paths may account for missing data and noise in PPI networks [12]. There exist many methods for the assessment of network proximity based on the multiplicity of paths between nodes, including effective resistance [13], commute distance [14], and random walk proximity [15]. In this paper, we adapt an abstraction that models information flow in the cell using random walks with restarts [16].
Methods
Several methods have been proposed for assessing the functional similarity of biological entities (genes, proteins, domains) [17–19]. Since the functional categories into which these entities are classified are themselves interrelated through a taxonomy (e.g., Gene Ontology), measures of similarity must consider the underlying taxonomy while comparing molecules in terms of their functional annotation [20]. Various approaches take into account different factors, including taxonomic distance, specificity/generality (rank in hierarchy) of common ancestors, and the number of molecules associated with the functional terms being compared (statistical significance or information content). Since most molecules are associated with multiple functional terms, assessment of functional similarity between two molecules poses the additional challenge of evaluating the similarity between two sets of terms, as opposed to a pair of terms. In [9], we developed an information-theoretic measure for computing the similarity of two sets of terms associated with a pair of molecules. We showed that this measure is superior to other composite measures computed by applying associative operators (average, max, etc.) to pairwise term similarity measures.
In this paper, we generalize and extend our results to quantify the functional coherence (or similarity) of a set of biomolecules (as opposed to a pair). Since each molecule corresponds to a set of annotations, the problem is one of quantifying the coherence of a set of sets of terms. A straightforward approach would compute the pairwise similarity of each pair of molecules in the set and aggregate these similarities using associative operators (min, max, average). Pairwise similarities (similarity of two sets of annotations) may themselves be computed using our information-theoretic measure. An alternate approach to the problem, proposed in this paper, computes the coherence of the set of molecules without computing intermediate pairwise similarity scores. We show that the latter approach is superior to the former in quantifying the coherence of a set of biomolecules. We validate this claim by applying our proposed measure, along with several other currently used measures, to a test group of known functionally related proteins. We also apply the measures to randomly generated groups and identify the measures that induce the greatest separation between the test and random groups.
Finally, in order to study the correlation between functional coherence and topological proximity in networks, we also need a measure of topological proximity. Traditional measures of topological proximity rely on the shortest path between two nodes. While this measure is suited to well-curated and complete datasets, it is susceptible to missing interactions and noise. A single false positive or false negative may lead to a significant (erroneous) perturbation in shortest-path-based measures. Measures based on random walks with restart [16], on the other hand, are more resilient to incomplete and noisy data. We consider both classes of measures of topological proximity, and evaluate their correlation with various functional similarity measures for both protein interaction (PPI) and domain interaction (DDI) networks. We show that a combination of random-walk-based topological proximity and our similarity measure [9] yields the strongest correlation between network proximity and functional coherence.
Concepts and ontologies
Let C = {c_{ i } | 1 ≤ i ≤ N } be a finite partially ordered set of concepts. In terms of Gene Ontology (GO), these concepts represent the GO terms in the subontologies (i.e., molecular function, biological process, and cellular component). Without loss of generality, we refer to concepts as terms throughout this paper. Terms are related to each other through is-a and part-of relationships, such that c_{ i } → c_{ j } denotes that c_{ i } is a (or is part of) c_{ j }. Note that, if c_{ i } → c_{ j }, then the molecules associated with c_{ i } are also associated with c_{ j }; this is known as the true path rule. Based on these relationships, we define a binary relation over C, denoted by ≼. We say that c_{ j } is an ancestor of c_{ i }, denoted by c_{ i } ≼ c_{ j }, if and only if either c_{ i } → c_{ j }, or for some ℓ ≥ 1, there exist terms c_{ k_1 }, ..., c_{ k_ℓ } such that c_{ i } → c_{ k_1 }, c_{ k_m } → c_{ k_{m+1} } for 1 ≤ m < ℓ, and c_{ k_ℓ } → c_{ j }. Two terms c_{ i }, c_{ j } are comparable, denoted by c_{ i } ~ c_{ j }, if either c_{ j } ≼ c_{ i } or c_{ i } ≼ c_{ j }. If c_{ i } and c_{ j } are comparable, then the length of the shortest path between c_{ i } and c_{ j } is given by L(c_{ i }, c_{ j }) = L(c_{ j }, c_{ i }) = ℓ + 1 for the minimum such ℓ (with ℓ = 0 when the two terms are directly related).
We denote the set of ancestors of a term c_{ i } by A_{ i } = {c_{ k } ∈ C | c_{ i } ≼ c_{ k }}. Note that not all ancestors of a term are comparable, since the GO hierarchy is a directed acyclic graph rather than a tree. We represent the root term of GO with a terminal concept r, such that c_{ i } ≼ r ∀ c_{ i } ∈ C.
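The definitions of ancestors, comparability, and path length can be illustrated with a minimal sketch. The toy hierarchy and the function names below are hypothetical and are not part of the original study; the sketch also assumes that the ancestor set excludes the term itself.

```python
from collections import deque

# Hypothetical toy hierarchy: each term maps to its direct parents
# (is-a / part-of edges). "c3" has two parents, so this is a DAG, not a tree.
PARENTS = {
    "r": [],
    "c1": ["r"],
    "c2": ["r"],
    "c3": ["c1", "c2"],
    "c4": ["c3"],
}


def ancestors(term, parents=PARENTS):
    """Return the proper ancestors of `term` (assumed to correspond to A_i)."""
    found = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents[current])
    return found


def comparable(a, b, parents=PARENTS):
    """Two terms are comparable if one is an ancestor of the other."""
    return a in ancestors(b, parents) or b in ancestors(a, parents)


def path_length(a, b, parents=PARENTS):
    """Number of edges on the shortest upward path between two comparable
    terms; None if the terms are not comparable."""
    for low, high in ((a, b), (b, a)):
        queue = deque([(low, 0)])
        seen = {low}
        while queue:
            current, dist = queue.popleft()
            if current == high:
                return dist
            for parent in parents[current]:
                if parent not in seen:
                    seen.add(parent)
                    queue.append((parent, dist + 1))
    return None
```

In this toy hierarchy, c1 and c2 are both ancestors of c4 but are not comparable to each other, which is the situation the text describes for a directed acyclic graph.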
Semantic similarity of terms
Here, λ(c_{ i }, c_{ j }) is said to be the minimum common ancestor of c_{ i } and c_{ j }.
Observe that this measure does not take into account the specificity of terms with identical common ancestors. This problem can be alleviated by normalizing the similarity between two terms by the self-similarities of the terms being compared, e.g., as in the measure of Jiang and Conrath [21]. Note that this normalized measure has a well-defined maximum of 1 and offers a bounded interpretation (ranging from 0 to 1) of Resnik's metric. We now generalize these term-similarity measures to set-similarity measures.
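As an illustration of the two term-level measures discussed here, the following sketch computes an information-content-based similarity in the style of Resnik [20] and a self-normalized variant. The toy hierarchy, the annotation counts, and the exact form of the normalization (taken by analogy with the ρ_{ JC } expression given later for sets of terms) are assumptions, not a reproduction of the original equations. Each term is also treated as its own ancestor so that self-similarity equals the information content of the term.

```python
import math

# Hypothetical hierarchy (term -> direct parents) and hypothetical counts of
# molecules annotated with each term after applying the true path rule.
PARENTS = {
    "r": [], "c1": ["r"], "c2": ["r"],
    "c3": ["c1", "c2"], "c4": ["c3"], "c5": ["c2"],
}
COUNT = {"r": 8, "c1": 4, "c2": 4, "c3": 2, "c4": 1, "c5": 2}


def information_content(term):
    """I(c) = -log2(|molecules with c| / |molecules with root|)."""
    return math.log2(COUNT["r"] / COUNT[term])


def ancestors_with_self(term):
    """The term together with all of its ancestors (assumed convention)."""
    found = {term}
    stack = list(PARENTS[term])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(PARENTS[current])
    return found


def resnik(a, b):
    """Information content of the most informative common ancestor."""
    common = ancestors_with_self(a) & ancestors_with_self(b)
    return max(information_content(t) for t in common)


def jiang_conrath(a, b):
    """Self-normalized similarity in (0, 1]; assumed form, see lead-in."""
    return 1.0 / (
        information_content(a) + information_content(b) - 2.0 * resnik(a, b) + 1.0
    )
```

For example, c4 and c5 share only c2 and the root, so their Resnik-style similarity is I(c2) = 1, and the normalized value is 1/(3 + 2 - 2 + 1) = 0.25, whereas any term compared with itself scores 1.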
Functional similarity of molecules
Biomolecules are generally associated with multiple molecular functions and are often involved in multiple processes. Consequently, annotations of molecules correspond to sets of terms, as opposed to individual terms. While assessing the similarity of sets of terms, we assume that the sets are nonredundant, i.e., each set consists of terms that are not comparable. This can be easily enforced by ensuring that each branch in the hierarchy is represented by at most one term in each set. In GO, this involves considering only the most specific annotations associated with a gene, which provides a nonredundant representation of functional annotation. In this representation, the association between the gene and the ancestors of the most specific term is implied by the true path rule.
An important challenge in the assessment of the functional coherence of sets is that these sets are often incomplete (that is, for many molecules, some of their functions are unknown). Therefore, a reliable measure is one that rewards the abundance of similar terms in the two sets, but does not penalize the existence of unrelated terms in one of the sets, since the relation between these terms and the other set may be currently unknown. Simple associative measures that aggregate the similarity of pairs of terms in the two sets, such as average (ρ_{ A }) [17], maximum (ρ_{ M }) [22], or average of maximums (ρ_{ H }) [18], do not satisfy these properties [9].
Here, the measure ρ_{ I } is computed from the set of molecules that are associated with all terms in the set Λ(S_{ i }, S_{ j }). Note that ρ_{ I } also needs to be normalized with respect to self-similarities, i.e., ρ_{ JC }(S_{ i }, S_{ j }) = 1/(ρ_{ I }(S_{ i }, S_{ i }) + ρ_{ I }(S_{ j }, S_{ j }) - 2ρ_{ I }(S_{ i }, S_{ j }) + 1).
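A minimal sketch of the two ingredients mentioned here is given below: the information content of a set of terms, computed from the molecules annotated with every term in the set, and the self-normalization ρ_{ JC }. The annotation table and function names are hypothetical, and the construction of Λ(S_{ i }, S_{ j }) itself is not shown because its definition is not reproduced in this text.

```python
import math

# Hypothetical annotations: molecule -> set of terms, with ancestors already
# included according to the true path rule.
ANNOTATIONS = {
    "p1": {"r", "c2", "c4"},
    "p2": {"r", "c2", "c4"},
    "p3": {"r", "c2"},
    "p4": {"r"},
}


def set_information_content(terms, annotations=ANNOTATIONS):
    """-log2 of the fraction of molecules annotated with every term in `terms`.

    `terms` must be a set. Returns infinity if no molecule carries all terms.
    """
    total = len(annotations)
    shared = sum(1 for annotated in annotations.values() if terms <= annotated)
    if shared == 0:
        return math.inf
    return math.log2(total / shared)


def rho_jc(rho_ii, rho_jj, rho_ij):
    """Self-normalized similarity from the self- and cross-similarity values."""
    return 1.0 / (rho_ii + rho_jj - 2.0 * rho_ij + 1.0)
```

With these toy annotations, the term set {c2, c4} is shared by two of four molecules and therefore has information content 1, while the root alone has information content 0. Two identical sets always obtain ρ_{ JC } = 1, since the self- and cross-similarities cancel.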
Functional coherence of modules
Average of pairwise information content
In our running example, the average pairwise information content of the molecules in complex ℛ_{1} is given by σ_{ A }(ℛ_{1}) = (I(c_{4}) + I(c_{4})/2 + 0 + I(c_{4})/2 + 0 + I(c_{6})/4)/6 = 3/8, while that of subcomplex ℛ_{2} is given by σ_{ A }(ℛ_{2}) = (I(c_{4}) + I(c_{4})/2 + I(c_{4})/2)/3 = 2/3, given that I(c_{4}) = I(c_{6}) = -log_{2} (3/6) = 1. Bridged complexes get a lower score than specialized complexes due to differences in subcomplex annotations.
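The arithmetic of the running example can be checked directly. The sketch below only averages the six (respectively three) pairwise values quoted in the text; the variable names are illustrative.

```python
import math

# Information content quoted in the text: I(c4) = I(c6) = -log2(3/6) = 1.
ic_c4 = -math.log2(3 / 6)
ic_c6 = -math.log2(3 / 6)

# Pairwise values for the four molecules of complex R1 (six pairs).
pairs_r1 = [ic_c4, ic_c4 / 2, 0.0, ic_c4 / 2, 0.0, ic_c6 / 4]
sigma_a_r1 = sum(pairs_r1) / len(pairs_r1)

# Pairwise values for the three molecules of subcomplex R2 (three pairs).
pairs_r2 = [ic_c4, ic_c4 / 2, ic_c4 / 2]
sigma_a_r2 = sum(pairs_r2) / len(pairs_r2)
```

The two averages evaluate to 3/8 and 2/3, matching the values stated for ℛ_{1} and ℛ_{2}.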
Generalized information content
In our running example, since λ(c_{4}, c_{1}) = λ(c_{4}, c_{6}) = λ(c_{4}, c_{1}, c_{6}) = r, we have Λ(ℛ_{1}) = {r}, thus the generalized information content of complex ℛ_{1} is σ_{ I }(ℛ_{1}) = I(r) = 0. On the other hand, since Λ(ℛ_{2}) = {c_{4}}, we have σ_{ I }(ℛ_{2}) = I(c_{4}). As illustrated by this example, σ_{ I }is a rather conservative measure of functional coherence and it only rewards specialized modules in which all molecules share very similar functions.
Graph information content
Observe that, if all molecules are annotated with the same set of terms, σ_{ G }(ℛ) would be equal to one, and it would be zero if they have no common terms. Similar to σ_{ I }, a drawback of this measure is its sensitivity to outliers; that is, if a single molecule in the set is sufficiently different in function, it has a significant impact on the score. Indeed, in our running example, we have σ_{ G }(ℛ_{1}) = I(r) = 0, while σ_{ G }(ℛ_{2}) = (I(c_{4}) + I(c_{2}))/(I(c_{4}) + I(c_{2}) + I(c_{6}) + I(c_{3})) = 1/2, given that I(c_{2}) = I(c_{3}) = -log_{2} (3/6) = 1.
Weighted information content
Complexes are functionally cohesive modules, but they are often composed of subcomplexes, each performing a specific part of the general function of the complex [25]. However, as illustrated by our running example, generalized information content (σ_{ I }) and graph information content (σ_{ G }) require all molecules to be functionally coherent with each other for the module to be considered coherent. In order to provide a more relaxed, and biologically motivated measure of functional coherence, we consider shared functionality between all combinations of molecules and weigh the information content of shared functionality by the number of molecules that contribute to the shared functionality.
In other words, we consider all the partial DAGs generated by each S_{ i } in ℛ. All the terms that are part of overlapping DAGs correspond to information shared among those proteins. The numerator in the above equation corresponds to the information content of the overlapping DAG, while the denominator normalizes that score by the total information of the combined DAG. In our running example, we have
σ_{ W }(ℛ_{1}) = (3I(c_{4}) + 3I(c_{2}) + 2I(c_{6}) + 2I(c_{3}))/(3I(c_{4}) + 3I(c_{2}) + 2I(c_{6}) + 2I(c_{3}) + I(c_{1})) = 0.86 and σ_{ W }(ℛ_{2}) = (3I(c_{4}) + 3I(c_{2}))/(3I(c_{4}) + 3I(c_{2}) + I(c_{6}) + I(c_{3})) = 3/4, given that I(c_{1}) = -log_{2} (2/6) ≈ 1.6. Since members of the module ℛ_{1} share all functions other than c_{1}, this measure captures the coherence of the bridged module better than other methods. This method penalizes only those functions that are not shared by a member with the rest of the module.
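The worked values of σ_{ W } can be reproduced with a short sketch. The weighting rule used below (each term weighted by the number of module members that carry it, with only terms carried by at least two members counted in the numerator) is inferred from the worked example rather than taken from the defining equation, and the function and variable names are hypothetical.

```python
import math


def weighted_information_content(term_counts, ic):
    """Ratio of shared to total count-weighted information content.

    term_counts maps a term to the number of module members whose annotation
    DAG contains it; ic maps a term to its information content.
    """
    total = sum(n * ic[t] for t, n in term_counts.items())
    shared = sum(n * ic[t] for t, n in term_counts.items() if n >= 2)
    return shared / total if total > 0 else 0.0


# Information content values quoted in the running example.
IC = {"c1": -math.log2(2 / 6), "c2": 1.0, "c3": 1.0, "c4": 1.0, "c6": 1.0}

# Member counts per term, read off the worked expressions in the text.
R1 = {"c4": 3, "c2": 3, "c6": 2, "c3": 2, "c1": 1}
R2 = {"c4": 3, "c2": 3, "c6": 1, "c3": 1}

sigma_w_r1 = weighted_information_content(R1, IC)
sigma_w_r2 = weighted_information_content(R2, IC)
```

Under these assumptions the sketch gives approximately 0.86 for ℛ_{1} and exactly 3/4 for ℛ_{2}, in agreement with the text. The root term is omitted because its information content is zero.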
Postprocessing coherence scores
We now discuss how coherence scores are processed to make them comparable against each other for different module sizes and across various subontologies.
Combination of subontology scores
Here, max ρ^{(BP)} and max ρ^{(MF)} are the maximum possible scores for biological process and molecular function, respectively. Module coherence scores (σ) are based only on the biological process ontology.
Accounting for module size
In order to compare modules of different sizes, we normalize the functional coherence scores based on a background distribution that characterizes the coherence of modules of identical size. Specifically, for a given module ℛ, we generate a sufficiently large number of random modules of size |ℛ| and compute the functional coherence of each of these modules. We then compute the size-adjusted coherence score of ℛ by normalizing its raw score against the average functional coherence of these random modules.
Index of detectability
This index is proportional to the area under the binormal ROC curve [27]. Here, T and C denote the sets of test and control modules, respectively.
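To make the idea of a detectability index concrete, the sketch below computes one commonly used form: the difference between the mean scores of the test and control modules, divided by the root mean square of their standard deviations. This specific expression is an assumption for illustration; the exact formula used in the study is not reproduced in this text, and the function name is hypothetical.

```python
import math
import statistics


def index_of_detectability(test_scores, control_scores):
    """Separation of two score samples in units of their pooled spread.

    Assumed form: (mean_T - mean_C) / sqrt((var_T + var_C) / 2),
    using sample variances. Each sample needs at least two scores.
    """
    mean_gap = statistics.mean(test_scores) - statistics.mean(control_scores)
    pooled = (
        statistics.variance(test_scores) + statistics.variance(control_scores)
    ) / 2.0
    if pooled == 0:
        raise ValueError("both samples have zero variance")
    return mean_gap / math.sqrt(pooled)
```

A larger value indicates that a coherence measure separates known modules from random modules more clearly; swapping the two samples flips the sign.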
Measure for topological proximity
The most commonly used measure of topological proximity is graph distance, where the distance between a pair of nodes in a connected graph is defined as the length of the shortest path between them. In the context of biological networks, there are several drawbacks to this measure. It is particularly susceptible to missing or incorrect data: a single missing edge may reduce proximity significantly, while a single false edge may increase proximity incorrectly [28]. Furthermore, this measure does not take into account the global structure and connectedness of the graph, including alternate paths between a pair of nodes.
Nodes connected to each other via disjoint paths are likely to be functionally closer than nodes that are connected via a single path. Indeed, evidence suggests that multiple alternate paths between functionally associated proteins are often conserved through evolution, owing to their contribution to robustness against perturbations, as well as amplification of signals [11].
Here, W is the stochastic matrix derived from the adjacency matrix of the network, and the restart vector for node i has its jth entry equal to 1 if j = i and 0 otherwise. The proximity of node j to node i is then given by the jth entry of the steady-state vector of the walk restarted at node i. Repeating this procedure for all proteins, we obtain a matrix of network proximity scores for all pairs of proteins. Note, however, that this measure of proximity is not symmetric (the proximity of j to i is not necessarily equal to the proximity of i to j). Therefore, we take the average of the two proximity values to compute the proximity between a pair of proteins. Using the proposed measures of functional coherence and the random-walk-based measure of topological proximity, we quantify the relationship between topological proximity and functional coherence by computing the correlation of the resulting matrices.
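The random walk with restart and the symmetrization step can be sketched as follows. This is a plain power-iteration sketch under stated assumptions, not the implementation used in the study: the graph is undirected with no isolated nodes, the function names are hypothetical, and the parameter `restart` is the probability of returning to the source at each step, which may or may not correspond to the parameter c used in the text.

```python
def rwr_proximity(adj, source, restart=0.5, tol=1e-12, max_iter=10000):
    """Steady-state visiting probabilities of a walk restarted at `source`.

    adj is a symmetric 0/1 adjacency matrix (list of lists) in which every
    node has at least one neighbor.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    x = [0.0] * n
    x[source] = 1.0
    for _ in range(max_iter):
        nxt = [0.0] * n
        for k in range(n):
            # Node k passes its non-restarting mass equally to its neighbors.
            share = (1.0 - restart) * x[k] / degree[k]
            for j in range(n):
                if adj[k][j]:
                    nxt[j] += share
        nxt[source] += restart  # the restarting mass returns to the source
        change = sum(abs(a - b) for a, b in zip(nxt, x))
        x = nxt
        if change < tol:
            break
    return x


def symmetric_proximity(adj, i, j, restart=0.5):
    """Average of the proximity of j to i and the proximity of i to j."""
    return 0.5 * (
        rwr_proximity(adj, i, restart)[j] + rwr_proximity(adj, j, restart)[i]
    )
```

On a three-node path 0-1-2 with restart probability 0.5, the walk started at node 0 assigns proximity 1/3 to node 1, whereas the walk started at node 1 assigns proximity 1/6 to node 0. This illustrates the asymmetry noted in the text; averaging the two values gives 1/4.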
Materials
We obtain protein interaction data for S. cerevisiae and S. pombe from the BioGRID database [29], version 2.0.51. We filter the dataset to obtain a set of physical interactions between proteins, i.e., genetic interactions are removed based on the experimental systems (e.g., knockout experiments) listed on the BioGRID website. Integr8 [30] is used to map the proteins in the interaction dataset to their UniProt names, keeping only those proteins that we can map to a Gene Ontology term using Integr8.
We obtain domain interaction data from the DOMINE database [31], version 1.1. This dataset is composed of known as well as predicted domain interactions. We partition this dataset based on the source and quality of the data. Struct interactions are inferred from PDB entries of protein complexes and are collected from iPfam and 3did. Comp2 interactions are predicted by at least two computational methods that infer domain interactions from protein interaction networks using techniques such as maximum likelihood estimation, or from coevolution of conserved sites in protein sequences. HC+MC interactions consist of high- and medium-confidence interactions (for details, please refer to [31]).
To test the functional coherence of sets, we obtain positive and random cases from GRIP [32]. GRIP generates positive cases from the MIPS CYGD complex catalogue [33] by picking sets from known complexes. For wild-type cases, GRIP selects proteins at random. We generate a total of 16 datasets, of which eight are made up of positive cases and eight are random. Each dataset consists of 2000 sets of proteins (complexes), ranging from four to eleven proteins each.
Gene Ontology Annotation (GOA) [34] release 47.0, dated 2009/03/09, is used to obtain annotation information for UniProt proteins. GOA combines manual and automated inferences of gene product annotations. The mapping of Pfam-A domains to their Gene Ontology functions is obtained from pfam2go (http://www.geneontology.org/external2go/pfam2go), released on 2009/03/04. We use only the Biological Process and Molecular Function subontologies of Gene Ontology [6] version 1.550 for evaluation, since the coverage for the Cellular Component subontology is relatively sparse.
Results and discussion
We first compare the behavior of the molecular similarity metrics by examining their correlation with different topological proximity measures, and follow with a detailed look at their behavior on comprehensive PPI and DDI data. We then investigate the differences between PPI and DDI networks in terms of the relationship between network proximity and functional similarity using our generalized information content based metric. Finally, we compare various measures for computing the functional coherence of sets. To evaluate similarity vis-à-vis proximity, we compute, for every pair of nodes in a network, the shortest distance between them, the proximity for a given value of c, and various semantic similarity measures. Using these, we compute the correlation between topological proximity metrics and functional similarity measures. As in [9], we normalize raw similarity scores to obtain a mean similarity score of zero and a standard deviation of one. We create groups of pairs based on their topological proximity and compute the average semantic similarity for each group.
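The normalization and grouping steps described above can be sketched as follows. The function names and the toy data are hypothetical, and the use of the population standard deviation is an assumption; the text does not specify which estimator was used.

```python
import statistics
from collections import defaultdict


def zscore(values):
    """Rescale values to mean zero and (population) standard deviation one."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        raise ValueError("all values are identical")
    return [(v - mean) / spread for v in values]


def mean_similarity_by_distance(distances, similarities):
    """Average normalized similarity of node pairs at each graph distance.

    distances[k] and similarities[k] describe the same node pair.
    """
    groups = defaultdict(list)
    for distance, score in zip(distances, zscore(similarities)):
        groups[distance].append(score)
    return {d: statistics.mean(scores) for d, scores in sorted(groups.items())}
```

For instance, with pair distances `[1, 1, 2, 2, 3, 3]` and raw similarities `[0.9, 0.7, 0.5, 0.5, 0.2, 0.2]`, the group averages decrease with distance, which is the kind of trend examined in the comparisons that follow.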
Topological proximity and semantic similarity measures
We first evaluate the proximity measure based on random walks. Since the parameter c can be varied to perturb the affinity between nodes, we first estimate an optimal value for c. We compute the proximity matrix for various values of c, ranging from 0.1 to 0.9, for the domain network HC+MC. We also compute the semantic similarity scores for different metrics: average of information content (IC) based term similarity (ρ_{ A }/δ_{ I }), average of self-normalized IC based term similarity (ρ_{ A }/δ_{ JC }), IC based molecule similarity (ρ_{ I }), and self-normalized IC based molecule similarity (ρ_{ JC }). We compute the correlation between these functional similarity scores and topological proximity. Semantic similarity is computed for the biological process (BP) and molecular function (MF) ontologies separately, as well as by combining the two scores.
In Figure 2(b) we plot the correlation of topological proximity and semantic similarity measures using BP, MF, and both ontologies. BP offers slightly better correlation than MF. In general, MF corresponds to a lower-level property of a molecule that is directly related to its structure. BP is a higher-level construct, related to the wider neighborhood in the network. Hence, interacting molecules are more likely to belong to the same processes even if they have different functions. Finally, the correlation obtained by combining the two ontologies is higher than that obtained with either ontology alone.
Topological proximity and functional similarity in networks
The correlation of proximity measures and similarity in Figure 2a summarizes the comparison of the curves in Figures 3a and 3b. Further comparison of Figures 3a and 3b indicates that the slopes of the curves are generally higher for random-walk-based proximity than for shortest path. Again, since proximity relates two nodes based not only on topological distance but also on the number of paths between them, only strongly correlated nodes have higher proximity values. These observations suggest that network proximity based on random walks is likely to be more relevant to, and hence more indicative of, functional coherence and modularity.
Comparing measures for sets
We evaluate coherence measures for sets using the index of detectability on sets of (known) functionally correlated proteins and sets of randomly selected proteins. We compute functional coherence using the following measures: average of pairwise information content (using term-based similarity, σ_{ A }/ρ_{ A }/δ_{ I }, and molecule-based similarity, σ_{ A }/ρ_{ I }), generalized information content (σ_{ I }), extended graph information content (σ_{ G }), and weighted information content (σ_{ W }). As we observed in the previous section, since biological processes span wider neighborhoods, they are more likely to be shared in a module. For this reason, we compute the coherence score using only the biological process ontology. As the index of detectability is a measure of significance, we can plot a curve indicating the threshold corresponding to p-value < 0.05.
Conclusion
We draw the following conclusions from our study: (i) our proposed measure of functional coherence of sets of entities (proteins, domains) is superior to other existing measures, (ii) we comprehensively study the relationship between functional coherence and topological proximity using suitable measures and derive formal conclusions for process and function based annotations, and (iii) we use our measures to study a range of PPIs and DDIs and establish the suitability of these abstractions to various kinds of analyses.
Declarations
Acknowledgements
This research was supported by NSF grants 0800568, 0835541, and 0641037.
This article has been published as part of BMC Bioinformatics Volume 11 Supplement 1, 2010: Selected articles from the Eighth Asia-Pacific Bioinformatics Conference (APBC 2010). The full contents of the supplement are available online at http://www.biomedcentral.com/14712105/11?issue=S1.
References
1. Sharan R, Ulitsky I, Shamir R: Network-based prediction of protein function. Mol Syst Biol. 2007, 3: 10.1038/msb4100129.
2. Spirin V, Mirny LA: Protein complexes and functional modules in molecular networks. PNAS. 2003, 100 (21): 12123-12128.
3. Yook SH, Oltvai ZN, Barabási AL: Functional and topological characterization of protein interaction networks. Proteomics. 2004, 4: 928-942.
4. Przulj N, Wigle DA, Jurisica I: Functional topology in a network of protein interactions. Bioinformatics. 2004, 20: 340-348.
5. Lage K, Karlberg OE, Størling ZM, Páll , Pedersen AG, Rigina O, Hinsby AM, Tümer Z, Pociot F, Tommerup N: A human phenome-interactome network of protein complexes implicated in genetic disorders. Nature Biotechnology. 2007, 25 (3): 309-316. http://dx.doi.org/10.1038/nbt1295
6. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT: Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics. 2000, 25: 25-29.
7. Khatri P, Drăghici S: Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics. 2005, 21: 3587-3595.
8. Schug J, Diskin S, Mazzarelli J, Brunk B, Stoeckert C: Predicting gene ontology functions from ProDom and CDD protein domains. Genome Res. 2002, 12: 648-655.
9. Pandey J, Koyuturk M, Subramaniam S, Grama A: Functional coherence in domain interaction networks. Bioinformatics. 2008, 24 (16): i28-34. http://dx.doi.org/10.1093/bioinformatics/btn296
10. Grossmann S, Bauer S, Robinson PN, Vingron M: An Improved Statistic for Detecting Over-Represented Gene Ontology Annotations in Gene Sets. 10th International Conference on Research in Computational Molecular Biology (RECOMB'06). 2006, 85-98.
11. Kelley R, Ideker T: Systematic interpretation of genetic interactions using protein networks. Nature Biotechnology. 2005, 23 (5): 561-566. http://dx.doi.org/10.1038/nbt1096
12. Kelley BP, Sharan R, Karp RM, Sittler T, Root DE, Stockwell BR, Ideker T: Conserved pathways within bacteria and yeast as revealed by global protein network alignment. PNAS. 2003, 100 (20): 11394-11399.
13. Tetali P: Random walks and the effective resistance of networks. Journal of Theoretical Probability. 1991, 4: 101-109. 10.1007/BF01046996.
14. Chandra AK, Raghavan P, Ruzzo WL, Smolensky R: The electrical resistance of a graph captures its commute and cover times. In STOC '89: Proceedings of the twenty-first annual ACM symposium on Theory of computing. 1989, 574-586. New York, NY, USA: ACM.
15. Fouss F, Pirotte A, Saerens M: A Novel Way of Computing Similarities between Nodes of a Graph, with Application to Collaborative Recommendation. In WI '05: Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence. 2005, 550-556. Washington, DC, USA: IEEE Computer Society.
16. Tong H, Faloutsos C, Pan JY: Random walk with restart: fast solutions and applications. Knowl Inf Syst. 2008, 14 (3): 327-346. 10.1007/s10115-007-0094-2.
17. Lord P, Stevens R, Brass A, Goble C: Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics. 2003, 19: 1275-1283.
18. Schlicker A, Huthmacher C, Ramírez F, Lengauer T, Albrecht M: Functional evaluation of domain-domain interactions and human protein interaction networks. Bioinformatics. 2007, 23: 859-865.
19. Chagoyen M, Carazo JM, Pascual-Montano A: Assessment of protein set coherence using functional annotations. BMC Bioinformatics. 2008, 9: 444.
20. Resnik P: Using Information Content to Evaluate Semantic Similarity in a Taxonomy. IJCAI. 1995, 448-453. http://citeseer.ist.psu.edu/resnik95using.html
21. Jiang JJ, Conrath DW: Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy. ICRCL. 1997.
22. Sevilla J, Segura V, Podhorski A, Guruceaga E, Mato J, Martínez-Cruz L, Corrales F, Rubio A: Correlation between gene expression and GO semantic similarity. IEEE/ACM Trans Comput Biol Bioinform. 2005, 2: 330-338.
23. Pu S, Vlasblom J, Emili A, Greenblatt J, Wodak SJ: Identifying functional modules in the physical interactome of Saccharomyces cerevisiae. Proteomics. 2007, 7: 944-960.
24. Pesquita C, Faria D, Bastos H, Ferreira AE, Falcão AO, Couto FM: Metrics for GO based protein semantic similarity: a systematic evaluation. BMC Bioinformatics. 2008, 9 (Suppl 5): S4.
25. Gavin AC, Superti-Furga G: Protein complexes and proteome organization from yeast to man. Curr Opin Chem Biol. 2003, 7: 21-27.
26. Schlicker A, Rahnenfuhrer J, Albrecht M, Lengauer T, Domingues F: GOTax: investigating biological processes and biochemical activities along the taxonomic tree. Genome Biology. 2007, 8 (3): R33. http://genomebiology.com/2007/8/3/R33
27. Simpson A, Fitter M: What is the best index of detectability?. Psychological Bulletin. 1973, 80 (6): 481-488. 10.1037/h0035203.
28. Edwards AM, Kus B, Jansen R, Greenbaum D, Greenblatt J, Gerstein M: Bridging structural biology and genomics: assessing protein interaction data with known complexes. Trends Genet. 2002, 18: 529-536.
29. Breitkreutz B, Stark C, Reguly T, Boucher L, Breitkreutz A, Livstone M, Oughtred R, Lackner D, Bähler J, Wood V: The BioGRID Interaction Database: 2008 update. Nucleic Acids Res. 2007.
30. Kersey P, Bower L, Morris L, Horne A, Petryszak R, Kanz C, Kanapin A, Das U, Michoud K, Phan I: Integr8 and Genome Reviews: integrated views of complete genomes and proteomes. Nucleic Acids Res. 2005, 33: 297-302. 10.1093/nar/gki039.
31. Raghavachari B, Tasneem A, Przytycka T, Jothi R: DOMINE: a database of protein domain interactions. Nucleic Acids Res. 2007.
32. Browne F, Wang H, Zheng H, Azuaje F: GRIP: A web-based system for constructing Gold Standard datasets for protein-protein interaction prediction. Source Code Biol Med. 2009, 4: 2.
33. Güldener U, Münsterkötter M, Kastenmüller G, Strack N, van Helden J, Lemer C, Richelles J, Wodak SJ, García-Martínez J, Pérez-Ortín JE: CYGD: the Comprehensive Yeast Genome Database. Nucleic Acids Res. 2005, 33: D364-368.
34. Barrell D, Dimmer E, Huntley RP, Binns D, O'Donovan C, Apweiler R: The GOA database in 2009: an integrated Gene Ontology Annotation resource. Nucleic Acids Res. 2009, 37: 396-403. 10.1093/nar/gkn803.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.