# Commensurate distances and similar motifs in genetic congruence and protein interaction networks in yeast

Ping Ye^{1, 2}, Brian D Peyser^{3, 4}, Forrest A Spencer^{3, 4} and Joel S Bader^{1, 2}

**6**:270

**DOI: **10.1186/1471-2105-6-270

© Ye et al; licensee BioMed Central Ltd. 2005

**Received: **04 April 2005

**Accepted: **09 November 2005

**Published: **09 November 2005

## Abstract

### Background

In a genetic interaction, the phenotype of a double mutant differs from the combined phenotypes of the underlying single mutants. When the single mutants have no growth defect, but the double mutant is lethal or exhibits slow growth, the interaction is termed synthetic lethality or synthetic fitness. These genetic interactions reveal gene redundancy and compensating pathways. Recently available large-scale data sets of genetic interactions and protein interactions in *Saccharomyces cerevisiae* provide a unique opportunity to elucidate the topological structure of biological pathways and how genes function in these pathways.

### Results

We have defined congruent genes as pairs of genes with similar sets of genetic interaction partners and constructed a genetic congruence network by linking congruent genes. By comparing path lengths in three types of networks (genetic interaction, genetic congruence, and protein interaction), we discovered that high genetic congruence not only correlates with direct protein interaction but also yields distances commensurate with those in the protein interaction network. However, consistent distances were not observed between the genetic and protein interaction networks. We also demonstrated that the congruence and protein networks are enriched with motifs that indicate network transitivity, while the genetic network has both transitive (triangle) and intransitive (square) types of motifs. These results suggest that the robustness of yeast cells to gene deletions is due in part to two complementary pathways (square motif) or to three complementary pathways, any two of which are required for viability (triangle motif).

### Conclusion

Genetic congruence is superior to genetic interaction in prediction of protein interactions and function associations. Genetically interacting pairs usually belong to parallel compensatory pathways, which can generate transitive motifs (any two of three pathways needed) or intransitive motifs (either of two pathways needed).

## Background

A powerful tool to dissect the genetic buffering contributing to robustness of an organism is to identify gene pairs whose individual mutants are viable, but whose double mutants are lethal or exhibit reduced fitness [1, 2]. These are particular types of genetic interactions, which more generally indicate that the phenotype of a double mutant differs from that expected from the phenotypes of the single mutants. Other types of genetic interaction include epistasis (an anticipated combined effect is not observed) and suppression (a defect is rectified by a second mutation). For convenience, we use genetic interaction henceforth to refer specifically to synthetic lethal and synthetic fitness genetic interactions.

Genetic interaction partners have been described as acting either in parallel compensating pathways, or in the same essential process [2]. Through revealing gene redundancy and compensating pathways, genetic interaction has contributed to the understanding of gene functions as well as the networks and pathways in which gene products participate [3–6]. It is also highly relevant to understanding genetic instability and variation occurring in various human diseases [2].

While a genetic interaction indicates that genes have compensating functions, it does not necessarily indicate that the gene products work in the same pathway, as would be indicated, for example, by biochemical or physical interactions between proteins. Protein interactions can reveal network topology by linking proteins within the same biological pathway. The recent availability of high-throughput genetic interaction screens [3–6] and protein interaction screens [7–10] for the model organism *Saccharomyces cerevisiae* (budding yeast) provides a unique opportunity to investigate the genetic interaction network and protein interaction network both individually and jointly. Genetic interactions often reflect functional relationships that reach far beyond local protein interactions. Protein interaction data from high-throughput approaches are known to include false positives as well as physiologically relevant observations. It is critical to understand the correlations between genetic and protein interactions, as these two types of networks provide complementary views of how genes function in specific biological pathways, and of how failures of these pathways lead to pathologic conditions relevant to the occurrence and progression of human diseases.

Graph theoretic approaches have been applied to study global properties of protein interaction networks and genetic interaction networks in yeast [6, 11–22]. A few global network analyses have also directly compared the genetic and protein interaction maps. It has been suggested that the current genetic interaction network is at least four times denser than the protein interaction network; that genetic interactions are significantly more abundant between physically interacting proteins; and that the number of common genetic neighbors between two genes correlates with the presence of a known protein-protein interaction [6]. Other studies show that highly connected hubs in the protein network are more likely to interact genetically with each other [23], that the two-hop physical-genetic interaction is the top predictor of genetic interactions [24], and that probabilistic network models favor between-pathway explanations over within-pathway explanations for synthetic lethal genetic interactions [22].

Here, we present a global and local network investigation of the connections among genetic interaction, genetic congruence, and protein interaction networks for yeast, focusing on quantitative comparison of path length and motifs. Our results demonstrate that the genetic congruence network inferred from direct genetic interactions largely overlaps with the protein interaction network, with corresponding distances and motifs, while the genetic interaction network does not. This finding indicates that genetic congruence provides evidence for physical interaction and protein complex membership, as well as similar gene functions. The genetic congruence network we have defined can function as a mini-map to reveal network properties before the entire genetic interaction map is completed in yeast.

## Results

### Network overview

A complete genetic interaction network is *symmetric* when query and target genes are reversed. To account for this symmetric property of the entire genetic network, we have constructed both an *asymmetric genetic network* that includes all currently available high-throughput genetic interactions and a *symmetric genetic network* that covers interactions only between genes that have been used as queries (Fig. 1A). The graph of the symmetric genetic network is shown in Fig. 1B. Each node represents a gene, and each edge represents a genetic interaction between the two connected genes. The edges are considered undirected, and we do not distinguish between edges that were detected in one or both directions. The high connectivity of the symmetric genetic network (Fig. 1B) reflects the selection of query genes based on related functionality [6].

Previous analysis has suggested that shared genetic interaction partners correlate with physical interactions [6]. Quantitative measures of partner sharing in physical interaction networks have been defined as Mutual Clustering Coefficients (MCC) [14]. Here we use the negative log_{10} of the P-value of the hypergeometric MCC as a quantitative measure of neighbor sharing in the genetic interaction network, and for convenience term it the congruence score [25]. Higher scores indicate that two genes share more genetic interaction partners than expected by chance. A genetic congruence network is then derived by introducing undirected edges between congruent genes, using the congruence score as the edge weight (Fig. 1C). Asymmetric and symmetric congruence networks have been constructed from the corresponding asymmetric and symmetric genetic networks. A P-value of 0.01 for shared genetic interaction partners after correcting for multiple testing corresponds to a congruence score of 8 for the congruence network derived from the asymmetric genetic interactions and a congruence score of 6 for the network derived from the symmetric genetic interactions.
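The congruence score can be computed directly from partner sets. The sketch below is a minimal Python illustration, assuming a dictionary that maps each target gene to its set of query-gene partners and taking *t* as the number of query genes; the function names and toy genes are hypothetical, and the hypergeometric upper tail follows the formula given in Methods.

```python
from itertools import combinations
from math import comb, log10


def hypergeom_tail(k_obs, m, n, t):
    """P(x >= k_obs): chance that two genes with m and n partners, drawn
    from t query genes, share at least k_obs partners."""
    hits = sum(comb(m, x) * comb(t - m, n - x)
               for x in range(k_obs, min(m, n) + 1))
    return hits / comb(t, n)


def congruence_edges(partners, t, cutoff):
    """partners: hypothetical dict target gene -> set of query partners.
    Returns {(g1, g2): congruence score} for pairs scoring >= cutoff."""
    edges = {}
    for g1, g2 in combinations(sorted(partners), 2):
        a, b = partners[g1], partners[g2]
        k_obs = len(a & b)
        if k_obs == 0:
            continue  # P(x >= 0) = 1, so the score would be 0
        score = -log10(hypergeom_tail(k_obs, len(a), len(b), t))
        if score >= cutoff:
            edges[(g1, g2)] = score
    return edges


# Toy data: ten query genes q0..q9, three target genes.
partners = {
    "A": {"q0", "q1", "q2"},
    "B": {"q0", "q1", "q2"},
    "C": {"q5"},
}
edges = congruence_edges(partners, t=10, cutoff=2.0)
```

In this toy case A and B share all three of their partners among ten queries, so P = 1/C(10,3) = 1/120 and the score is log10(120), about 2.08; C shares nothing and receives no edge.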

The protein interaction network we used is derived from ~45,000 protein-protein interactions compiled from large-scale yeast two-hybrid and mass spectrometry analyses [7–10]. Each interaction has been assigned a confidence score, which is used as the network edge weight. The confidence score represents the probability that two proteins interact with each other [12].

Standard global topological measures describing network structure are given in the table below. Detailed analyses of path lengths and local motifs are shown in Figs. 2 and 3.

| Measure | Asymmetric genetic network | Symmetric genetic network | Asymmetric congruence network | Symmetric congruence network | Protein network |
|---|---|---|---|---|---|
| No. of nodes | 1004 | 111 | 122 | 61 | 3208 |
| No. of edges | 3799 | 813 | 267 | 146 | 13038 |
| Average degree | 7.6 | 14.6 | 4.4 | 4.8 | 8.1 |
| Average clustering coefficient | 0.10 | 0.37 | 0.73 | 0.84 | 0.45 |
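The degree row follows from simple counts (average degree is 2 × edges / nodes, e.g. 2 × 3799 / 1004 ≈ 7.6). A minimal Python sketch of these summaries for an undirected edge list is below; it assumes the common convention that nodes with fewer than two neighbours have a clustering coefficient of 0, which may differ from the convention behind the table values.

```python
from collections import defaultdict
from itertools import combinations


def summarize(edges):
    """(nodes, edges, average degree, average clustering coefficient)
    for an undirected simple graph given as (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                      # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    clustering = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            clustering.append(0.0)      # assumed convention for degree < 2
            continue
        linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        clustering.append(2 * linked / (k * (k - 1)))
    return n, m, 2 * m / n, sum(clustering) / n


# Toy graph: a triangle (a, b, c) with a pendant node d attached to c.
stats = summarize([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
```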

### Network distances

Conventional analysis shows that genetically interacting genes encode proteins in the same complex more often than would be expected by chance [6]. Because physical associations and genetic interactions each report on functional similarity, we might naively expect physical and genetic links to be correlated. However, it has also been recognized that genetic interaction pairs having a direct physical interaction are a small fraction (~1%) of all genetic interaction pairs [6]. Therefore, given currently known genetic and protein interactions and their overlap, the majority of genetic interactions do not connect physical partners.

We asked whether the other view of genetic interactions, i.e. genetic congruence, might yield improved concordance with physical interactions. We first explored the relationship between pairwise genetic congruence and direct physical interaction. High-throughput physical interaction data sets are known to include many false positives, which can confound analysis. Confidence scores have been developed to reflect the probability that a physical interaction is a true positive [12]. We observed that protein interaction confidence increases with the congruence score (Fig. 2B). Above congruence scores of 8 and 6, which correspond to a network P-value of 0.01 for the asymmetric and symmetric networks respectively, all protein pairs exhibit high-confidence interactions with confidence scores greater than ~0.8. This implies that genetic congruence is an indicator of high-confidence protein interactions. It is notable that information from a purely genetic experiment correlates well with information from a purely proteomic experiment. We also used receiver operating characteristic (ROC) curves to assess the relationship between congruence scores and physical interactions. ROC curves for asymmetric and symmetric congruence scores both climb rapidly away from the origin, with high true positive rates and low false positive rates [see additional file 1, supp. fig. S2]. According to the area under the curve, the congruence score from the symmetric network performs better than the score from the asymmetric network, but at the cost of making fewer predictions. This agrees with the result in Fig. 2B that congruence scores of the symmetric network predict higher-confidence physical interactions than those of the asymmetric network. The difference may be due to biased selection of query genes, as the symmetric network contains only query genes and all query genes were selected from a few related biological processes [6].
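The ROC analysis can be sketched as follows, assuming each gene pair carries a congruence score and a binary label (1 if the pair has a known physical interaction). The area under the curve is computed here through its Mann-Whitney interpretation, the probability that a random positive pair outscores a random negative pair; this is a generic formulation, not necessarily the exact procedure behind supp. fig. S2, and the toy scores are hypothetical.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) at every score threshold,
    sweeping from the highest score downward."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(scores, labels):
    """AUC = P(score of a positive pair > score of a negative pair),
    counting ties as one half (Mann-Whitney U / (n_pos * n_neg))."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical congruence scores; label 1 = known physical interaction.
scores = [9.0, 8.0, 7.0, 3.0, 2.0]
labels = [1, 1, 0, 1, 0]
auc = roc_auc(scores, labels)
curve = roc_points(scores, labels)
```

The rank-based AUC equals the trapezoidal area under the swept curve, so either function can be used to check the other.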

We further investigated pairwise congruence in the context of the protein interaction network. The shortest path of physical interactions between congruent pairs decreases from ~3.6 links to 1 link (direct physical interaction) as the congruence score increases (Fig. 2C). The path length transition begins when the congruence score exceeds 8 and 6 for the asymmetric and symmetric congruence networks, respectively. Once the score reaches 21 and 20 for the asymmetric and symmetric networks, the proteins encoded by congruent genes coincide with known direct physical interactions (4 pairs with congruence score = 21 in the asymmetric network and 1 pair with congruence score = 20 in the symmetric network).

Finally, to explore the connection between the congruence network and the protein network, we computed the highest score path for any two genes in the congruence network. Edge weights, in the range 0 to 1, are generated by applying a sigmoid function to the congruence scores (see Methods), and the score of a path is the product of its edge weights. The higher the path score, the higher the probability that two genes share similar genetic interaction partners. When comparing the highest path score in the congruence network with the shortest path length in the protein interaction network, we observed that the physical distance decreases monotonically from the average path length of ~3.6 links to 1 (direct physical interaction) as the highest path score increases, in both the asymmetric and symmetric congruence networks (Fig. 2D). Therefore, *transitive* genetic congruence is commensurate with physical distances, as is *direct* genetic congruence (Fig. 2C).

### Network motifs

Network motifs represent significantly recurrent patterns of simple interactions in complex networks [17]. Comparison of local structures can help reveal connections among superficially unrelated biological or social networks [18]. Additionally, the local structure of a network contributes to the understanding of its global organization [16]. To contrast the local structure of the three types of networks, we counted the abundance of undirected triads and tetrads in the genetic, congruence, and protein networks. The random networks used to detect tetrads were generated to preserve the triad counts of the real network [18].
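The triad census and a degree-preserving null model can be sketched in a few lines. The following is a minimal Python illustration, not mfinder itself: it counts open triads ('V') and triangles in an undirected graph and builds a randomized graph by repeated double-edge swaps, which keep every node's degree fixed. The additional constraint used for tetrads in the text (preserving triad counts) is not implemented here, and the toy graph is hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def triad_counts(edges):
    """(open triads, triangles) in an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    centred_paths = 0        # length-2 paths, counted by their middle node
    corner_hits = 0          # each triangle is found once from each corner
    for nbrs in adj.values():
        k = len(nbrs)
        centred_paths += k * (k - 1) // 2
        corner_hits += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    triangles = corner_hits // 3
    return centred_paths - 3 * triangles, triangles


def rewire(edges, n_swaps, seed=0):
    """Degree-preserving randomization by double-edge swaps
    (a-b, c-d) -> (a-d, c-b), rejecting self-loops and multi-edges."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    swaps = attempts = 0
    while swaps < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if len({a, b, c, d}) < 4:
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        swaps += 1
    return edge_list


# Ring of ten nodes plus "skip-three" chords: every node has degree 4.
graph = [(i, (i + 1) % 10) for i in range(10)] + [(i, (i + 3) % 10) for i in range(10)]
real_open, real_tri = triad_counts(graph)
rewired = rewire(graph, n_swaps=100, seed=1)
rand_open, rand_tri = triad_counts(rewired)
```

Because the number of length-2 paths depends only on the degree sequence, open triads plus three times the triangles stays constant under rewiring; only the split between the two changes.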

When the asymmetric genetic and congruence networks are compared with the physical network, the patterns of enriched motifs (the relative motif ratios) of the congruence and protein interaction networks are significantly correlated (Pearson correlation coefficient R = 0.76, P-value = 0.03), and both are anti-correlated with the pattern for direct genetic interactions (R = -0.66, P-value = 0.08; R = -0.69, P-value = 0.06, respectively) (Fig. 3A). This is consistent with the above global distance analysis, supporting significant overlap between congruence and protein networks.

The enriched motifs in the asymmetric congruence and protein networks are all transitive, including triad2 (triangle motif) and tetrad6. The triangle motif is the most significantly enriched motif, reflecting the transitive property of congruence interactions and physical interactions. This result agrees with our observation in the previous section that *transitive* congruence is correlated with short physical distance (Fig. 2D). The asymmetric genetic interaction network, however, contains both the intransitive motif tetrad4 (square motif) and the transitive motif tetrad6, with the square motif as the most enriched structure.

The detection of intransitive motifs in the asymmetric genetic network may be an artifact of interactions that have not yet been tested: a missing edge does not necessarily mean that the interaction does not exist. To overcome this bias, we repeated the motif-finding procedure using the symmetric genetic network and the corresponding congruence network (Fig. 3A). The patterns of enriched motifs are still significantly correlated for the symmetric genetic congruence and protein interaction networks (Pearson correlation coefficient R = 0.73, P-value = 0.04), but neither is significantly correlated with the pattern for the symmetric genetic network (R = 0.29, P-value = 0.49; R = 0.10, P-value = 0.82, respectively). The enriched motifs in the symmetric congruence network remain the same as for the asymmetric congruence network, i.e. all transitive motifs, triad2 (triangle motif) and tetrad6.

A final concern is that the transitive motifs could arise from the mathematical process of generating congruence scores: if genes A and B share synthetic lethal partners, and B and C share partners, then A and C may have an increased probability of sharing partners. To address this question, we used the following protocol [see additional file 1, supp. fig. S3]: (1) randomize the genetic interaction network; (2) calculate congruence scores for gene pairs in the randomized network; (3) set a threshold and calculate motif enrichment for the random congruence network. We repeated this process 100 times for both the symmetric and the asymmetric genetic interaction networks. The typical extreme value for the maximum congruence score observed was 5 for the symmetric network and 6 for the asymmetric. Thus, applying the same cutoffs for congruence scores as in the actual networks, 6 for symmetric and 8 for asymmetric, typically rejects all the congruence edges in the randomized network. We therefore reduced the thresholds to retain the same number of congruence edges as in the actual network, with mean values of 1.8 (symmetric) and 3.2 (asymmetric) over the 100 randomizations. The average clustering coefficient is significantly smaller in the random networks than in the actual networks: 0.23 vs. 0.84 (random vs. actual symmetric, P-value 10^{-402}) and 0.12 vs. 0.73 (random vs. actual asymmetric, P-value 10^{-933}). Although the transitive motif triad2 (triangle) is enriched in the random congruence network relative to a fully random network, the motif count is far below that observed in the actual congruence network [see additional file 1, supp. table S3]. Other patterns of motif enrichment are quite different: tetrad4 (square motif, intransitive) is enriched in the random congruence network and depleted in the actual network, and tetrad6 (4-clique, transitive) is enriched in the actual network but not in the random network [see additional file 1, supp. fig. S4]. The transitive motifs in the congruence network are therefore enriched significantly beyond what would be expected based solely on the method of defining congruence edges.

Both transitive and intransitive motifs are still detected in the symmetric genetic interaction network. However, the types differ from those in the asymmetric genetic network. The transitive triangle motif becomes the most enriched structure in the symmetric genetic network, in agreement with a previous study showing that genetic interaction partners of a gene are more likely to interact with each other [24]. One source of the triangle motif could be the requirement for any two of three pathways for viability. Notably, the square motif is still highly enriched in the symmetric genetic network despite the abundance of triangles, indicating that the square motif will remain enriched when the complete genetic interaction network is determined.

Recent studies indicate that high clustering is a generic feature of biological networks, as exemplified by protein interaction and protein domain networks [13]. However, we find that the genetic interaction network has both transitive and intransitive motifs. The coexistence of triangle and square motifs in the genetic network suggests two scenarios for genetic interactions between pathway components. In the first scenario, genetic interactions between two pathways generate a square motif: each edge crosses between the pathways, and genes at opposite corners are in the same pathway. In the second scenario, any two of three pathways are required for viability; genetic interactions cross between all three pathways, generating the triangle motif.

To determine whether the enriched triangles and squares overlap with each other or are mutually exclusive, we compared the members of triangle and square motifs in the symmetric genetic network (Fig. 3B). One-node sharing is the dominant scenario (76%) for triangles and squares. Assuming three pathways for the triangle motif and two pathways for the square motif, the one-node sharing case defines four parallel pathways, one of which is shared by the square and the triangle. Two-node sharing accounts for 22% of cases and suggests three parallel pathways, two of which are shared by the triangle and the square. Only 2% of cases are a complete overlap of the triangle and square, in agreement with our observation that tetrad5 is not an enriched motif in the symmetric genetic network (Fig. 3A).
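The overlap analysis can be sketched by enumerating triangles and 4-cycles and tallying how many nodes each triangle-square pair shares. The Python below is a brute-force illustration suitable only for small graphs (it examines every 4-node subset). Whether squares should be induced (chordless) 4-cycles, and how the percentages were normalized, are assumptions here, so the sketch exposes an `induced` flag; with induced squares a triangle can never share all three of its nodes with a square.

```python
from collections import Counter, defaultdict
from itertools import combinations


def triangles_and_squares(edges, induced=True):
    """Node sets of all triangles and all 4-cycles ('squares').
    induced=True keeps only chordless squares (exactly four edges)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def linked(x, y):
        return y in adj[x]

    nodes = sorted(adj)
    triangles = [frozenset(t) for t in combinations(nodes, 3)
                 if linked(t[0], t[1]) and linked(t[1], t[2]) and linked(t[0], t[2])]
    squares = []
    for a, b, c, d in combinations(nodes, 4):
        # the three distinct 4-cycles through nodes a, b, c, d
        cycles = ((a, b, c, d), (a, b, d, c), (a, c, b, d))
        if not any(linked(w, x) and linked(x, y) and linked(y, z) and linked(z, w)
                   for w, x, y, z in cycles):
            continue
        n_edges = sum(linked(x, y) for x, y in combinations((a, b, c, d), 2))
        if not induced or n_edges == 4:
            squares.append(frozenset((a, b, c, d)))
    return triangles, squares


def overlap_profile(triangles, squares):
    """Fraction of triangle-square pairs sharing 1, 2 or 3 nodes,
    among pairs that share at least one node."""
    shared = Counter(len(t & s) for t in triangles for s in squares if t & s)
    total = sum(shared.values())
    return {k: shared[k] / total for k in sorted(shared)} if total else {}


# Toy graph: triangle a-b-c and square c-d-e-f sharing node c.
toy = [("a", "b"), ("b", "c"), ("a", "c"),
       ("c", "d"), ("d", "e"), ("e", "f"), ("f", "c")]
profile = overlap_profile(*triangles_and_squares(toy))
```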

Because the completed genetic interaction map will necessarily be symmetric (except for false-positives or false-negatives), the enriched motifs in the symmetric genetic network are more relevant than the enriched motifs in the asymmetric genetic network.

### Biological relevance

Correct interpretation of the relationship between genetic and protein interactions enables interesting biological predictions. As demonstrated in previous sections, the genetic congruence and protein networks are similarly organized, with corresponding distances and motifs. We would therefore expect that two genes closer in the congruence network are more likely to physically interact, to reside within the same protein complex, and to be involved in similar biological processes.

Physical interactions usually suggest functional association. Accordingly, we asked whether congruence also indicates functional connection in addition to physical connection. As an initial validation, we found that genes close in the congruence network share similar functional annotations recorded in the Gene Ontology (GO) database [28], namely biological process and molecular function (Fig. 4B). Moreover, functional similarity is consistently higher for gene pairs close in the congruence network (by path score) than for gene pairs close in the genetic network (by distance).

An example of congruence coinciding with protein interaction and functional association is the prefoldin complex, which includes *PAC10*, *GIM3*, *GIM4*, *GIM5*, and *YKE2*. These five genes are clustered in the congruence network, and the average path score between any two members of this complex is 0.51 (Fig. 1C). Their products are chaperone proteins that together form a complex promoting efficient protein folding [29, 30].

## Discussion

We have demonstrated that high genetic congruence implies a high probability of physical interaction and a short distance in the physical interaction network. Short distances in the congruence network (measured by a high path score), but not in the genetic network, are commensurate with distances in the protein network. To account for false positives in the high-throughput protein interaction datasets, parallel analyses were performed using a protein network with edges weighted according to interaction confidence [see additional file 1, supp. fig. S5], and the results were similar to those obtained from the unweighted protein network. A guide to the figures showing path length comparisons among genetic, congruence, and protein networks is provided [see additional file 1, supp. table S4]. Local structure indicates similar transitive motif enrichment in the congruence and protein networks, while the genetic network is significantly enriched in intransitive as well as transitive motifs. Both global distance analysis and local motif analysis demonstrate that the genetic congruence network possesses network transitivity similar to that of the protein network.

The similarity between congruence and protein networks and the dissimilarity between genetic and protein networks have yielded three interesting conclusions with biological significance. *First*, we have demonstrated that significant genetic congruence correlates strongly with protein complex membership and functional association. *Second*, genetically interacting pairs usually belong to compensatory pathways without direct physical interactions. *Finally*, the coexistence of triangles and squares in the genetic network indicates that robustness may be due to two pathways that compensate each other (squares), or three pathways any two of which are needed (triangles).

While the protein interaction and genetic congruence networks exhibit a high degree of similarity, we do not expect them to be identical because they are based on distinctly different experimental measures. The protein interaction network is based on protein binding constants in cellular extracts under selective precipitation conditions [7, 8] or within cells through over-expression of tested proteins [9, 10]. The congruence network is based on growth defects exhibited by cells lacking a pair of gene products cultured under standard conditions [6]. Thus, high congruence need not indicate a physical interaction. The concordance we observed between congruence and protein interaction network structures provides strong support for the argument that they both faithfully reflect biologically relevant network relationships.

The conclusions drawn from our study are limited by the current coverage of the genetic and protein networks. This is especially true for the genetic network, which has low coverage. Moreover, the current genetic network is biased by query gene selection: the ~150 query genes all have relatively large numbers of interaction partners and related functionality [6]. As coverage and symmetry increase, we expect the average degree and average clustering coefficient to decline. Network distance results are robust to changes in genetic network symmetry and protein network edge weighting. The symmetric genetic network has been used for motif counting, and the relative motif ratio is insensitive to network size [18]. Therefore, we believe that our conclusions on network distances and motifs should continue to hold as the entire genetic interaction network is mapped.

## Conclusion

In summary, we have demonstrated that genetic congruence is superior to genetic interaction in predicting protein interactions and within-pathway functional associations. In contrast, genetic interaction pairs usually act in parallel compensatory pathways. Motif analysis indicates that genetic interactions have both transitive and intransitive character. Accounting for the symmetry of a complete genetic interaction network is crucial for determining motif enrichment in the genetic network.

## Methods

### Genetic interaction networks

The genetic interaction dataset is derived from a recent high-throughput study in budding yeast [6]. Interactions were detected as cell growth defects after introducing a deletion of interest (query gene) into each viable yeast single-deletion strain (target gene). Interactions derived from 6 essential query genes, *MYO2*, *SCC1*, *CDC2*, *CDC7*, *CDC42*, and *CDC45*, were removed in our study because phenotypes exhibited by conditional alleles of essential genes may include loss of function, unregulated function, and gain of function, while null alleles of nonessential genes are by definition solely loss-of-function mutations. Results and conclusions do not change, however, when these 6 essential genes are included in the analysis.

We constructed two types of genetic networks. The *asymmetric genetic network* includes currently available high throughput genetic interactions, i.e. 3799 genetic interactions between 126 non-essential query genes and 982 target genes. The *symmetric genetic network* only contains interactions between query genes, i.e. 813 genetic interactions between 108 non-essential query genes and 104 target genes that have been used as queries.

### Randomization of genetic interactions

Genetic interactions from the high-throughput study [6] were reported as interactions between a query gene and a target gene. A randomized network was generated by keeping the query gene list unchanged and randomly matching each query with a target gene, drawn with replacement according to the frequency of each target gene in the interaction list. Duplicate query-target pairs and self-interaction pairs, which are not possible in the experimental networks, were rejected during randomization. Results depict the average over 1000 randomizations.
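The randomization described above can be sketched as follows. This minimal Python version keeps the query gene of every interaction and redraws its target, with replacement, in proportion to each target's frequency in the observed list. Treating (q, t) and (t, q) as the same pair, and the retry limit, are our assumptions; the gene names are hypothetical.

```python
import random
from collections import Counter


def randomize_interactions(pairs, seed=0, max_tries=10_000):
    """pairs: list of (query, target) genetic interactions.
    Keeps every query; redraws each target in proportion to its observed
    frequency, rejecting self-pairs and duplicate (unordered) pairs."""
    rng = random.Random(seed)
    freq = Counter(target for _, target in pairs)
    targets = list(freq)
    weights = [freq[t] for t in targets]
    seen, randomized = set(), []
    for query, _old_target in pairs:
        for _attempt in range(max_tries):
            target = rng.choices(targets, weights=weights, k=1)[0]
            key = frozenset((query, target))
            if target != query and key not in seen:
                break
        else:
            raise RuntimeError(f"could not place an interaction for {query}")
        seen.add(key)
        randomized.append((query, target))
    return randomized


observed = [("q1", "t1"), ("q1", "t2"), ("q2", "t1"),
            ("q2", "t3"), ("q3", "t2"), ("q3", "q1")]
rand = randomize_interactions(observed, seed=7)
```

Averaging a statistic over many seeds gives the randomized baseline.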

### Genetic congruence networks

The congruence score was defined as $-\log_{10} P(x \ge k_{obs})$, where the hypergeometric P-value is $P(x \ge k_{obs}) = \sum_{x=k_{obs}}^{\min(m,n)} C(m,x)\,C(t-m,\,n-x)/C(t,n)$ for two target genes that have *m* and *n* genetic interaction partners and share $k_{obs}$ partners from a list of *t* query genes, and $C(j,k) = j!/[k!\,(j-k)!]$ is the binomial coefficient [25]. Related measures have been used to analyze protein interaction networks to predict protein-protein interactions [14]. The congruence score is calculated for every target gene pair in the symmetric and asymmetric genetic networks. The symmetric and asymmetric congruence networks are derived from the corresponding genetic networks. The distribution of network size over different congruence scores is provided [see additional file 1, supp. fig. S1]. The congruence score of 8 (122 nodes with 267 edges) for the asymmetric congruence network corresponds to a network P-value of 0.01 after multiple-testing correction, i.e. a per-link P-value of 0.01/982^{2} ≈ 10^{-8}. Similarly, the congruence score of 6 (61 nodes with 146 edges) is the cutoff value for the symmetric congruence network.
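The cutoff arithmetic can be checked directly. This short Python sketch divides the network-level P-value of 0.01 by n² tests, as in the text, and converts the result to a congruence score. Using n = 104 target genes for the symmetric network is our assumption, since the text says only that the cutoff of 6 was obtained similarly.

```python
from math import log10


def score_cutoff(network_p, n_genes):
    """Congruence-score cutoff: -log10 of the per-link P-value
    network_p / n_genes**2 (correction over n_genes**2 tests, as in the text)."""
    return -log10(network_p / n_genes ** 2)


asym_cutoff = score_cutoff(0.01, 982)  # 982 target genes in the asymmetric network
sym_cutoff = score_cutoff(0.01, 104)   # assumption: 104 target genes (symmetric)
print(f"{asym_cutoff:.2f} {sym_cutoff:.2f}")  # 7.98 6.03
```

Both values round to the cutoffs of 8 and 6 used throughout.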

### Protein interaction network

We used 47,783 protein-protein interactions with confidence scores [12] compiled from the large-scale two-hybrid data sets of protein-protein interactions [9, 10] and mass spectrometry analysis of protein complexes [7, 8]. The distribution of network size over different confidence scores is provided [see additional file 1, supp. fig. S1].

### Network distances

The shortest path distance was computed for every pair of nodes in the unweighted genetic interaction and protein interaction networks. The shortest path length is the number of links along the path, each link having length 1.
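The unweighted distances can be computed by breadth-first search from every node. The following Python sketch (function names are ours) returns link counts and skips node pairs in different connected components when averaging, matching the treatment of disconnected components described in this section.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Number of links on a shortest path from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def average_shortest_path(edges):
    """Mean shortest-path length over connected node pairs; pairs in
    different components never appear in a BFS result and are skipped."""
    adj = build_adjacency(edges)
    total = pairs = 0
    for source in list(adj):
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d      # each unordered pair is seen twice,
                pairs += 1      # which leaves the mean unchanged
    return total / pairs


toy = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]
```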

The SEEDY algorithm [31] was used to compute the highest score path for the weighted genetic congruence and protein interaction networks. The highest score path is the path with the maximal product of edge weights. Node pairs in different connected components are ignored for both shortest path and highest score path calculations.
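Because all edge weights lie in (0, 1], maximizing a product of weights is equivalent to minimizing the sum of -log w, so a highest score path can be found with Dijkstra's algorithm. The sketch below uses that standard substitution in Python; it is not the SEEDY algorithm itself, whose internals are not described here, and the toy weights are hypothetical.

```python
import heapq
from math import exp, log


def highest_score_paths(adj, source):
    """adj: dict node -> dict neighbour -> weight in (0, 1].
    Returns node -> highest product of edge weights over any path from
    source, via Dijkstra on non-negative costs -log(w)."""
    best = {source: 0.0}          # minimal cumulative -log(weight) so far
    heap = [(0.0, source)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        for nbr, w in adj.get(node, {}).items():
            new_cost = cost - log(w)
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                heapq.heappush(heap, (new_cost, nbr))
    # Unreachable nodes (other components) are simply absent.
    return {node: exp(-c) for node, c in best.items()}


adj = {
    "A": {"B": 0.9, "C": 0.5},
    "B": {"A": 0.9, "C": 0.9},
    "C": {"A": 0.5, "B": 0.9},
    "D": {},
}
best = highest_score_paths(adj, "A")
```

Here the two-step path A-B-C (0.9 × 0.9 = 0.81) beats the direct edge A-C (0.5), illustrating why transitive congruence can be stronger than direct congruence.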

The edge weight for the protein network is the confidence score (in the range 0 to 1) [12]. The edge weight for the genetic congruence network is derived from the sigmoid function $w=\frac{{e}^{(s-a)/b}}{1+{e}^{(s-a)/b}}$ (in the range 0 to 1), where *s* is the congruence score and *a* and *b* are parameters. The rationale for this sigmoid function is that it approximates the probability Pr(true positive|*s*) = Pr(protein interaction|*s*), since genes sharing genetic interaction partners usually exhibit physical association [6]. The parameters *a* = 15.9 and *b* = 1.6 are the best-fit values for the sigmoid function to form a smoothed interpolation of Pr(protein interaction|*s*) for the asymmetric congruence network [see additional file 1, supp. fig. S6]. Results were not sensitive to the choice of parameter values [see additional file 1, supp. fig. S7]. Similarly, *a* = 17.7 and *b* = 3.4 are the best-fit values for the symmetric congruence network.
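The weighting function can be written as a numerically stable logistic function. The Python sketch below uses the best-fit parameters quoted above as defaults (a = 15.9, b = 1.6 for the asymmetric network); the function name is ours.

```python
from math import exp


def congruence_edge_weight(s, a=15.9, b=1.6):
    """Logistic map of congruence score s to an edge weight in (0, 1):
    w = e^{(s-a)/b} / (1 + e^{(s-a)/b}). Defaults are the asymmetric-network
    fit; pass a=17.7, b=3.4 for the symmetric network."""
    z = (s - a) / b
    if z >= 0:
        # algebraically identical form 1 / (1 + e^{-z}); avoids overflow
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


weights = {s: congruence_edge_weight(s) for s in (8, 15.9, 21)}
```

The weight is exactly 0.5 at s = a, so *a* marks the score at which a congruent pair is as likely as not to be a physical interactor, and *b* sets how sharply that transition occurs.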

### Network motifs

We used mfinder1.1, a network motif detection tool (http://www.weizmann.ac.il/mcb/UriAlon/groupNetworkMotifSW.html), to count undirected triad and tetrad motifs in the genetic interaction, genetic congruence, and protein interaction networks. Both the symmetric and asymmetric genetic networks were used for motif searching. Motifs were also counted for:

- the symmetric congruence network with a cutoff value of 6;
- the asymmetric congruence network with a cutoff value of 8;
- the protein network with confidence scores greater than 0.5 [12].

Motif results are insensitive to the threshold values for the congruence and protein networks [see additional file 1, supp. table S2]. For tetrad motif counting, the Metropolis algorithm was used to conserve the number of triads in the random networks.

The relative motif ratio (RMR) represents the abundance of each motif relative to random networks in which each node has the same number of edges as the corresponding node in the real network. It is defined as $\mathrm{RMR}_i = \Delta_i / \left(\sum_j \Delta_j^2\right)^{1/2}$, with $\Delta_i = \frac{N_{\mathrm{real},i} - N_{\mathrm{rand},i}}{N_{\mathrm{real},i} + N_{\mathrm{rand},i} + \epsilon}$ and $\epsilon = 4$. Here $N_{\mathrm{real},i}$ and $N_{\mathrm{rand},i}$ are the numbers of occurrences of motif *i* in the real and random networks, respectively.

A motif was considered enriched if it met three criteria [17, 18]:

- the *Z*-score of $N_{\mathrm{real}}$ is greater than 2;
- $N_{\mathrm{real}}/N_{\mathrm{rand}} > 1.1$;
- *Uniqueness* ≥ 4, where Uniqueness is the number of times the motif appears in the network with completely disjoint groups of nodes.
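mfinder reports these quantities itself. The Python sketch below only illustrates the arithmetic of the RMR and the enrichment criteria described above. It assumes the motif counts for the real and randomized networks are already available, for example from mfinder output. It does not generate the degree-preserving random networks. All function names are hypothetical. The ratio criterion is written multiplicatively so that a random-network count of zero does not cause a division error.

```python
import math
import statistics


def relative_motif_ratios(n_real, n_rand, eps=4.0):
    """RMR_i = Delta_i / sqrt(sum_j Delta_j^2), with
    Delta_i = (N_real_i - N_rand_i) / (N_real_i + N_rand_i + eps)."""
    deltas = [(r - q) / (r + q + eps) for r, q in zip(n_real, n_rand)]
    norm = math.sqrt(sum(d * d for d in deltas))
    if norm == 0:
        return [0.0 for _ in deltas]  # no motif deviates from random
    return [d / norm for d in deltas]


def z_score(n_real, random_counts):
    """Z-score of the real count against counts from randomized networks."""
    mean = statistics.mean(random_counts)
    sd = statistics.pstdev(random_counts)
    if sd == 0:
        raise ValueError("random counts have zero variance")
    return (n_real - mean) / sd


def is_enriched(z, n_real, n_rand, uniqueness):
    """Criteria from the text: Z > 2, N_real/N_rand > 1.1, Uniqueness >= 4."""
    # n_real > 1.1 * n_rand is the ratio test without dividing by n_rand.
    return z > 2 and n_real > 1.1 * n_rand and uniqueness >= 4
```

For example, real counts (10, 0) against random counts (2, 4) give $\Delta = (0.5, -0.5)$ and therefore RMR values of about (0.707, -0.707). The RMR vector always has unit length.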

To quantify motif transitivity, we define the motif transitivity score (MTS) as $\mathrm{MTS} = \frac{3 N_{\Delta} - N_{V}}{3 N_{\Delta} + N_{V}}$. Here $N_{\Delta}$ is the number of 'Δ' subgroups, i.e., groups of 3 vertices each of which is connected to the other two. $N_{V}$ is the number of 'V' subgroups, i.e., groups of 3 vertices only one of which is connected to the other two, which are not connected to each other. The 'Δ' and 'V' subgroups are mutually exclusive in the MTS calculation. The factor of 3 accounts for the fact that each 'Δ' is equivalent to three 'V's. The MTS quantifies motif transitivity on a scale from -1 to 1 and is *insensitive* to motif size. It is 1 for a fully connected motif and -1 for a motif without any triangle. The values of MTS for triads and tetrads are listed [see additional file 1, supp. table S1].
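As a check on the MTS definition above, the minimal Python sketch below enumerates every 3-vertex subgroup of a small undirected motif. A subgroup with three edges is counted as a 'Δ', and a subgroup with exactly two edges is counted as a 'V'. This is one reading of the definition, and the function name is hypothetical.

```python
from itertools import combinations


def motif_transitivity_score(nodes, edges):
    """MTS = (3*N_tri - N_V) / (3*N_tri + N_V) for a small undirected motif.

    A 3-vertex subgroup with all three edges is a triangle ('Δ'); one with
    exactly two edges (one vertex joined to two non-adjacent vertices) is a
    'V'. Each subgroup is counted in at most one category.
    """
    edge_set = {frozenset(e) for e in edges}
    n_tri = n_v = 0
    for a, b, c in combinations(nodes, 3):
        k = sum(frozenset(p) in edge_set for p in ((a, b), (b, c), (a, c)))
        if k == 3:
            n_tri += 1
        elif k == 2:
            n_v += 1
    # A connected motif with >= 3 vertices always has a 'Δ' or a 'V'.
    return (3 * n_tri - n_v) / (3 * n_tri + n_v)
```

Under this reading, a square (4-cycle) has four 'V's and no 'Δ', so its MTS is -1. A triangle with one pendant edge has one 'Δ' and two 'V's, giving (3 - 2)/(3 + 2) = 0.2.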

## Declarations

### Acknowledgements

JSB acknowledges support from NIGMS, NCRR, and the Whitaker Foundation. FAS and BDP acknowledge support from the NHGRI. BDP acknowledges support from an NIH/NIGMS training grant. The authors acknowledge the reviewers for helpful suggestions that improved the manuscript and motivated the motif enrichment calculations for congruence networks obtained from randomized genetic interaction networks.

## References

1. Hartman JL, Garvik B, Hartwell L: Principles for the buffering of genetic variation. *Science* 2001, 291(5506):1001–1004. 10.1126/science.291.5506.1001
2. Tucker CL, Fields S: Lethal combinations. *Nat Genet* 2003, 35(3):204–205. 10.1038/ng1103-204
3. Ooi SL, Shoemaker DD, Boeke JD: DNA helicase gene interaction network defined using synthetic lethality analyzed by microarray. *Nat Genet* 2003, 35(3):277–286. 10.1038/ng1258
4. Pan X, Yuan DS, Xiang D, Wang X, Sookhai-Mahadeo S, Bader JS, Hieter P, Spencer F, Boeke JD: A robust toolkit for functional profiling of the yeast genome. *Mol Cell* 2004, 16(3):487–496. 10.1016/j.molcel.2004.09.035
5. Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, Andrews B, Tyers M, Boone C: Systematic genetic analysis with ordered arrays of yeast deletion mutants. *Science* 2001, 294(5550):2364–2368. 10.1126/science.1065810
6. Tong AH, Lesage G, Bader GD, Ding H, Xu H, Xin X, Young J, Berriz GF, Brost RL, Chang M, Chen Y, Cheng X, Chua G, Friesen H, Goldberg DS, Haynes J, Humphries C, He G, Hussein S, Ke L, Krogan N, Li Z, Levinson JN, Lu H, Menard P, Munyana C, Parsons AB, Ryan O, Tonikian R, Roberts T, Sdicu AM, Shapiro J, Sheikh B, Suter B, Wong SL, Zhang LV, Zhu H, Burd CG, Munro S, Sander C, Rine J, Greenblatt J, Peter M, Bretscher A, Bell G, Roth FP, Brown GW, Andrews B, Bussey H, Boone C: Global mapping of the yeast genetic interaction network. *Science* 2004, 303(5659):808–813. 10.1126/science.1091317
7. Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM, Remor M, Hofert C, Schelder M, Brajenovic M, Ruffner H, Merino A, Klein K, Hudak M, Dickson D, Rudi T, Gnau V, Bauch A, Bastuck S, Huhse B, Leutwein C, Heurtier MA, Copley RR, Edelmann A, Querfurth E, Rybin V, Drewes G, Raida M, Bouwmeester T, Bork P, Seraphin B, Kuster B, Neubauer G, Superti-Furga G: Functional organization of the yeast proteome by systematic analysis of protein complexes. *Nature* 2002, 415(6868):141–147. 10.1038/415141a
8. Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, Millar A, Taylor P, Bennett K, Boutilier K, Yang L, Wolting C, Donaldson I, Schandorff S, Shewnarane J, Vo M, Taggart J, Goudreault M, Muskat B, Alfarano C, Dewar D, Lin Z, Michalickova K, Willems AR, Sassi H, Nielsen PA, Rasmussen KJ, Andersen JR, Johansen LE, Hansen LH, Jespersen H, Podtelejnikov A, Nielsen E, Crawford J, Poulsen V, Sorensen BD, Matthiesen J, Hendrickson RC, Gleeson F, Pawson T, Moran MF, Durocher D, Mann M, Hogue CW, Figeys D, Tyers M: Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. *Nature* 2002, 415(6868):180–183. 10.1038/415180a
9. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome. *Proc Natl Acad Sci U S A* 2001, 98(8):4569–4574. 10.1073/pnas.061034498
10. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, Qureshi-Emili A, Li Y, Godwin B, Conover D, Kalbfleisch T, Vijayadamodar G, Yang M, Johnston M, Fields S, Rothberg JM: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. *Nature* 2000, 403(6770):623–627. 10.1038/35001009
11. Bader GD, Hogue CW: Analyzing yeast protein-protein interaction data obtained from different sources. *Nat Biotechnol* 2002, 20(10):991–997. 10.1038/nbt1002-991
12. Bader JS, Chaudhuri A, Rothberg JM, Chant J: Gaining confidence in high-throughput protein interaction networks. *Nat Biotechnol* 2004, 22(1):78–85. 10.1038/nbt924
13. Barabasi AL, Oltvai ZN: Network biology: understanding the cell's functional organization. *Nat Rev Genet* 2004, 5(2):101–113. 10.1038/nrg1272
14. Goldberg DS, Roth FP: Assessing experimentally derived interactions in a small world. *Proc Natl Acad Sci U S A* 2003, 100(8):4372–4376. 10.1073/pnas.0735871100
15. Maslov S, Sneppen K: Specificity and stability in topology of protein networks. *Science* 2002, 296(5569):910–913. 10.1126/science.1065103
16. Vazquez A, Dobrin R, Sergi D, Eckmann JP, Oltvai ZN, Barabasi AL: The topological relationship between the large-scale attributes and local interaction patterns of complex networks. *Proc Natl Acad Sci U S A* 2004, 101(52):17940–17945. 10.1073/pnas.0406024101
17. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U: Network motifs: simple building blocks of complex networks. *Science* 2002, 298(5594):824–827. 10.1126/science.298.5594.824
18. Milo R, Itzkovitz S, Kashtan N, Levitt R, Shen-Orr S, Ayzenshtat I, Sheffer M, Alon U: Superfamilies of evolved and designed networks. *Science* 2004, 303(5663):1538–1542. 10.1126/science.1089167
19. Lin N, Wu B, Jansen R, Gerstein M, Zhao H: Information assessment on predicting protein-protein interactions. *BMC Bioinformatics* 2004, 5(1):154. 10.1186/1471-2105-5-154
20. Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M: A Bayesian networks approach for predicting protein-protein interactions from genomic data. *Science* 2003, 302(5644):449–453. 10.1126/science.1087361
21. Middendorf M, Ziv E, Adams C, Hom J, Koytcheff R, Levovitz C, Woods G, Chen L, Wiggins C: Discriminative topological features reveal biological network mechanisms. *BMC Bioinformatics* 2004, 5(1):181. 10.1186/1471-2105-5-181
22. Kelley R, Ideker T: Systematic interpretation of genetic interactions using protein networks. *Nat Biotechnol* 2005, 23(5):561–566. 10.1038/nbt1096
23. Ozier O, Amin N, Ideker T: Global architecture of genetic interactions on the protein network. *Nat Biotechnol* 2003, 21(5):490–491. 10.1038/nbt0503-490
24. Wong SL, Zhang LV, Tong AH, Li Z, Goldberg DS, King OD, Lesage G, Vidal M, Andrews B, Bussey H, Boone C, Roth FP: Combining biological networks to predict genetic interactions. *Proc Natl Acad Sci U S A* 2004, 101(44):15682–15687. 10.1073/pnas.0406614101
25. Ye P, Peyser BD, Pan X, Boeke JD, Spencer FA, Bader JS: Gene function prediction from congruent synthetic lethal interactions in yeast. *Molecular Systems Biology* 2005, in press.
26. Newman ME, Strogatz SH, Watts DJ: Random graphs with arbitrary degree distributions and their applications. *Phys Rev E Stat Nonlin Soft Matter Phys* 2001, 64(2 Pt 2):26118.
27. Watts DJ, Strogatz SH: Collective dynamics of 'small-world' networks. *Nature* 1998, 393(6684):440–442. 10.1038/30918
28. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. *Nat Genet* 2000, 25(1):25–29. 10.1038/75556
29. Geissler S, Siegers K, Schiebel E: A novel protein complex promoting formation of functional alpha- and gamma-tubulin. *Embo J* 1998, 17(4):952–966. 10.1093/emboj/17.4.952
30. Vainberg IE, Lewis SA, Rommelaere H, Ampe C, Vandekerckhove J, Klein HL, Cowan NJ: Prefoldin, a chaperone that delivers unfolded proteins to cytosolic chaperonin. *Cell* 1998, 93(5):863–873. 10.1016/S0092-8674(00)81446-4
31. Bader JS: Greedily building protein networks with confidence. *Bioinformatics* 2003, 19(15):1869–1874. 10.1093/bioinformatics/btg358

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.