GAIA: a gram-based interaction analysis tool – an approach for identifying interacting domains in yeast
© Zhang and Ouellette; licensee BioMed Central Ltd. 2009
- Published: 30 January 2009
Protein-Protein Interactions (PPIs) play important roles in many biological functions. Protein domains, which are defined as independently folding structural blocks of proteins, physically interact with each other to perform these biological functions. Therefore, the identification of Domain-Domain Interactions (DDIs) is of great biological interest because it is generally accepted that PPIs are mediated by DDIs. As a result, much effort has been put into the prediction of domain pair interactions based on computational methods. Many DDI prediction tools using PPI network and domain evolution information have been reported. However, tools that combine the primary sequences, domain annotations, and structural annotations of proteins have not been evaluated before.
In this study, we report a novel approach called Gram-bAsed Interaction Analysis (GAIA). GAIA extracts peptide segments composed of a fixed number of contiguous amino acids, called n-grams (where n is the number of amino acids), from the annotated domain and DDI data set in Saccharomyces cerevisiae (budding yeast) and identifies a list of n-grams that may contribute to DDIs and PPIs based on the frequencies of their appearance. GAIA also reports the coordinate position of gram pairs on each interacting domain pair. We demonstrate that our approach improves on other DDI prediction approaches when tested against a gold-standard data set and achieves a true positive rate of 82% and a false positive rate of 21%. We also identify a list of 4-gram pairs that are significantly over-represented in the DDI data set and may mediate PPIs.
GAIA represents a novel and reliable way to predict DDIs that mediate PPIs. Our results, which show the localizations of interacting grams/hotspots, provide testable hypotheses for experimental validation. Complemented with other prediction methods, this study will allow us to elucidate the interactome of cells.
Biological functions of cells are determined by strict temporal and spatial regulation of molecular interactions among proteins, lipids, carbohydrates and nucleic acids. Protein-Protein Interactions (PPIs) play important roles in all biological functions, from enzyme catalysis and signal transduction to many structural functions. Owing to advances in large-scale techniques such as the yeast two-hybrid system and affinity purification followed by mass spectrometry, interactomes of several model organisms such as Saccharomyces cerevisiae [1–6], Drosophila melanogaster [7, 8] and Caenorhabditis elegans have recently been extensively studied. Such large-scale interaction data sets provide tremendous opportunities for data exploration, but there are limitations: 1) the experimental techniques for detecting PPIs are time-consuming, costly and labour intensive; 2) the quality of certain datasets is uneven; and 3) technical limitations such as the requirement to tag proteins of interest still exist. It has been widely accepted that some proteins interact with each other through interactions between their domains, which are defined as independent structural and/or functional blocks of proteins. For example, some cytoskeletal proteins interact with actin because of the interaction between their gelsolin repeat domains. It has also been reported that sets of conserved residues within the WW domains can bind to proline-rich peptides. Therefore, the identification of DDIs can potentially shed light on the mechanism underlying PPIs. Unfortunately, identifying either DDIs or PPIs through experimental approaches is not trivial. As a complementary alternative, computational approaches that identify DDIs have been studied intensively for years, yielding some interesting results.
The currently available computational DDI prediction approaches can be categorized as follows: 1) Association-based approaches, where each DDI is scored by the association of the number of interacting domain pairs between interacting protein pairs and non-interacting protein pairs. These methods, however, only compute each DDI locally without considering the information of other DDIs between protein pairs [12–14]. Deng et al. proposed an optimized approach, maximum likelihood estimation (MLE), which globally calculates the probabilities of interaction between two domains using the expectation-maximization (EM) algorithm. 2) Pattern-based approaches, where the domain interaction pattern of each interacting protein pair is utilized to predict DDIs by applying machine learning approaches such as a clustering algorithm or a random forest algorithm. 3) Co-evolution-based approaches, where a pair of domains is regarded as interacting if the two domains share very similar phylogenetic trees. However, one of the caveats of these DDI prediction approaches is that the information regarding the sequences and structures of these domains is neglected, and as a result they suffer from low sensitivities and specificities.
It is known that segments of n contiguous amino acids (or n-grams) correlate with specific secondary structure elements [19, 20]. Therefore, n-gram-based methods are widely exploited to predict the secondary structure or subcellular localization of proteins and to classify protein families using machine learning techniques [21–23]. The finding that n-grams are closely related to the secondary structure of protein domains prompted us to ask whether n-grams can interact with each other. In fact, several studies have reported interactions between n-grams. For example, a molecular interaction exists between the Smurf1 WW2 domain and the PPXY motifs of Smad1, and the Src-homology 3 domain (SH3) binds to a PXXP peptide. Therefore, we hypothesize that some over-represented gram-gram interactions mediate DDIs and thus PPIs. In this study, we introduce a novel DDI prediction approach based on the primary sequence of proteins, by extracting n-gram frequencies from the annotated domain and DDI data set in yeast. This approach substantially expands on a related study reported previously.
Our approach, called GAIA, improves on other prediction approaches. When tested against a gold-standard data set, GAIA achieves a true positive rate (sensitivity) of 82% with a false positive rate (1 – specificity) of 21% and performs most accurately when the length of the gram is set to 4 amino acids. Using GAIA, we generated a list of 4-gram pairs that are significantly over-represented in the DDI data set. We postulate that these pairs mediate the DDIs in yeast. Overall, we demonstrate that GAIA, a gram-based method, provides a novel and reliable way to predict DDIs that may mediate PPIs in yeast. Our results, which show the localization of interacting grams/hotspots, provide testable hypotheses for experimental validation. Complemented with other prediction methods, this study helps to elucidate the entire interactome of cells.
Performance of the GAIA algorithm
Next, we tested whether our predicted DDIs could be utilized to predict PPIs. When at least one of our predicted DDIs exists between a pair of proteins, this pair of proteins is predicted to interact. For the positive data set, 76% (452 out of 595) of interacting protein pairs were successfully predicted. For the negative data set, 25% (149 out of 595) of non-interacting protein pairs were incorrectly predicted to interact, giving a sensitivity of 76% and a specificity of 75% when the threshold of DDI hits is set to 8.3. These results indicate that GAIA performs better than even in vivo experimental PPI identification approaches [1–6, 8], as assessed in several recent publications [26–28]. However, it should be noted that PPIs are predicted in GAIA under the assumption that interactions of given proteins are mediated by pairs of domains. Therefore, GAIA is not able to predict those PPIs mediated by amino acid segments outside of known interacting domains.
In order to investigate whether some gram pairs act as sequence signatures or markers of PPIs, we assigned a probability score to each gram pair (see Methods) and compared the performance of GAIA with probability scores to that without probability scores. By using gram pairs weighted with probability scores, GAIA improved the sensitivity of DDI prediction from 68% to 82% and the specificity from 66% to 79%. This improvement reflects the importance of highlighting gram pairs that are over-represented in pairs of interacting domains but not in pairs of non-interacting domains, suggesting that these gram pairs can act as sequence signatures.
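The percentages quoted for PPI prediction follow directly from the reported counts. The short Python sketch below reproduces that arithmetic and the "at least one predicted DDI" decision rule; the function names `rates` and `predict_ppi` are illustrative and are not taken from the GAIA source code.

```python
def rates(true_pos, false_pos, n_pos, n_neg):
    """Return (sensitivity, specificity) from gold-standard counts."""
    sensitivity = true_pos / n_pos
    specificity = (n_neg - false_pos) / n_neg
    return sensitivity, specificity


def predict_ppi(ddi_hit_scores, threshold=8.3):
    """Predict an interaction if any domain pair scores above the threshold.

    ddi_hit_scores: iterable of DDI hit scores for one protein pair
    (hypothetical input format, assumed for this sketch).
    """
    return any(score > threshold for score in ddi_hit_scores)


# Counts reported in the text: 452 of 595 interacting pairs recovered,
# 149 of 595 non-interacting pairs wrongly predicted to interact.
sens, spec = rates(452, 149, 595, 595)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
# prints: sensitivity = 76.0%, specificity = 75.0%
```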
Parameters of the GAIA algorithm
The GAIA algorithm is based solely on protein sequence, so no further information such as protein function or protein evolution information is needed. Only two parameters are needed to tune GAIA: (i) the length of the gram (Lg). Different gram lengths (3-grams, 4-grams, and 5-grams) have been tested. From observations of the ROC plots (Fig. 1), we found that with a gram length of 3 or less, the DDI hits are not specific to the input DDI data set, therefore yielding low true positive and high false positive rates. Conversely, with a gram length of 5 or more, the DDI hits are too specific (too few) to differentiate between the positive and negative data sets. Therefore, we concluded that 4-grams yielded the best accuracy; and (ii) the threshold of the number of DDI hits (Nhit). The choice of threshold trades sensitivity against specificity. For example, setting a lower threshold results in an increased sensitivity, at the expense of a decreased specificity. Similarly, a higher threshold results in a decreased sensitivity with an increase in specificity. Based on the ROC plots, it was found that GAIA achieves a sensitivity of 82% and a specificity of 79% when the threshold is set to 8.3 (Fig. 1).
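The effect of the Nhit threshold can be illustrated with a small Python sketch. The hit scores below are invented for illustration only and are not GAIA results; `sens_spec` is a hypothetical helper name.

```python
def sens_spec(pos_scores, neg_scores, threshold):
    """Sensitivity and specificity when scores above `threshold` are called positive."""
    true_pos = sum(s > threshold for s in pos_scores)
    true_neg = sum(s <= threshold for s in neg_scores)
    return true_pos / len(pos_scores), true_neg / len(neg_scores)


# Hypothetical DDI hit scores (illustration only).
pos = [3.1, 9.0, 12.4, 15.3, 18.7]   # known interacting pairs
neg = [1.0, 2.5, 6.0, 8.0, 10.2]     # known non-interacting pairs

for nhit in (5.0, 8.3, 12.0):
    sens, spec = sens_spec(pos, neg, nhit)
    print(f"Nhit = {nhit}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these toy scores, raising the threshold from 5.0 to 12.0 lowers sensitivity from 0.80 to 0.60 while specificity rises from 0.40 to 1.00, which is the trade-off described above.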
Case studies on predicted DDIs
Detecting new DDI-mediated PPIs and unknown domains
The GAIA tool performs well on previously reported PPIs mediated by DDIs in the gold-standard data set, with a true positive rate of 82%. We therefore sought to apply the GAIA tool to identify novel PPIs and to determine the domains through which these interactions are mediated. Recently, Smy2p (YBR172C, NP_009731.2), a yeast protein of unknown function, was found to interact with the Sec23p (YPR181C, NP_015507.1)/Sec24p (YIL109C, NP_012157.1) subcomplex and to participate in coat protein complex II (COPII) vesicle formation from the endoplasmic reticulum (ER). The interaction between Smy2p and Sec23p was also predicted by GAIA. This successful prediction not only demonstrates the ability of GAIA to detect novel PPIs but also suggests that the interaction might be mediated by DDIs. According to the domain annotations from the Pfam database, there is one annotated domain (PF02213: GYF) in Smy2p and 5 annotated domains (PF04810: zf-Sec23_Sec24; PF04811: Sec23_trunk; PF08033: Sec23_BS; PF04815: Sec23_helical; PF00626: Gelsolin) in Sec23p. Currently, there is no report of DDIs between Smy2p and Sec23p in the literature. However, upon close examination of the prediction results from GAIA, we found two gram pairs that may contribute to this PPI. The first pair has 18.7 DDI hits and is located at residues 410 – 413 of Sec23p, which correspond to PF08033, and residues 68 – 71 of Smy2p. The second pair has 15.3 DDI hits and is located at residues 409 – 412 of Sec23p, which correspond to PF08033, and residues 499 – 502 of Smy2p. These results suggest that the Beta sandwich domain of Sec23p might well be involved in the PPI between Sec23p and Smy2p. Furthermore, we also found that a 4-gram located at residues 616 – 619 in the Beta sandwich domain of Sec24p interacts with another 4-gram located at residues 713 – 716 of Smy2p, further supporting an important role for the Beta sandwich domain in the interaction between Sec23p/Sec24p and Smy2p.
However, no known domain annotations have been associated with the location of the 4-grams on Smy2p, suggesting that potential domains of functional interest on Smy2p need to be further validated experimentally.
In addition to identifying new PPIs mediated by DDIs, we also tested our GAIA tool on some protein pairs to infer new interacting domains from the predicted PPIs. Bud5p (YCR038C, NP_009967.2) and Bud8p (YLR353W, NP_013457.1) are two proteins involved in bud-site selection of diploid cells in yeast. Krappmann et al. used systematic structure-function analyses to show that Bud5p physically interacts with Bud8p, and also interacts with Bud9p (YGR041W), which is involved in the delivery of the proteins to the cell poles. They also found that the region of residues 74 – 216 on Bud8p and the region of residues 91 – 218 on Bud9p are interacting domains required to bind Bud5p. Interestingly, GAIA also predicted a 4-gram pair that might mediate this interaction. This gram pair has 12.4 DDI hits and is located at residues 183 – 186 of Bud8p, which fall within the newly discovered 74 – 216 region mentioned above. These data support our hypothesis that GAIA can be used to detect novel interacting domains from public domain-related data sets.
Characterizing over-represented gram pairs
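Table: A list of the most frequent gram pairs in the DDI data set. [Gram-pair entries not recoverable; surviving p-value column, in row order: 2.2 × 10^-16, 7.7 × 10^-16, 2.2 × 10^-16, 2.2 × 10^-16, 2.2 × 10^-16, 2.2 × 10^-16, 2.2 × 10^-16, 2.2 × 10^-16, 2.2 × 10^-16, 2.2 × 10^-16]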
Comparison between different approaches
DDI prediction algorithms similar to GAIA, such as the association method (AM), the maximum likelihood estimation approach (MLE) and the relative co-evolution of domain pairs approach (RCDP), have recently been reported. It is difficult to compare the prediction accuracy of each approach directly because different testing datasets were utilized in each study. It is reported that AM achieves a sensitivity of 97% when tested against a small subset of interacting proteins. MLE achieved a sensitivity of 77.6% and a positive predictive value (PPV) of 42.5% when tested against a combined data set identified by the yeast two-hybrid (Y2H) system. RCDP reported a sensitivity of 63.95% against a positive data set containing interacting proteins with DDIs derived from Protein Data Bank (PDB) crystal structures and a specificity of 55.19% against a data set of randomly generated protein pairs. In order to eliminate the possibility that our gold-standard data set is biased towards GAIA, we tested GAIA against the same testing data set (a combination of two Y2H data sets derived from Uetz et al. and Ito et al.) used in each approach. GAIA achieved a PPV of 69% at a sensitivity of 78%, whereas AM and MLE achieved PPVs of 42.5% and 24%, respectively, at a sensitivity of 78%, indicating that GAIA outperforms both AM and MLE. To rule out the possibility that the improved performance is due to the better quality of input data, we also trained AM and MLE on 6304 PPIs containing the same number of DDIs as our GAIA training data set. We found that AM achieved a sensitivity of 51% with a specificity of 79% and MLE achieved a sensitivity of 57% with a specificity of 79% when tested against our gold-standard data set, suggesting that protein sequence information combined with structural information derived from iPfam is a better indicator for predicting DDIs.
In addition, GAIA also achieved a better sensitivity of 83% if the specificity was set to 55% in comparison to RCDP using the same testing data set as RCDP, illustrating that GAIA also performs better than RCDP. In summary, GAIA has the following advantages compared to other aforementioned approaches: 1) We have shown that GAIA can achieve better sensitivity and specificity in detecting DDIs; 2) GAIA is solely based on domain sequences and DDIs derived from PDB, rather than just PPI information, since prediction performance may be affected by poor PPI data set quality. We strongly believe that gram pairs such as those used in GAIA play a "signature" role in mediating the binding of a domain pair or protein pair. 3) By using protein sequences, GAIA precisely specifies the localization of interacting grams/hotspots.
GAIA is a novel tool for identifying DDIs that mediate PPIs. GAIA takes the public DDI data set and the domain sequence data set as inputs and predicts the interaction between a query protein pair if the DDI hit frequencies of the gram pairs across the query proteins are above the preset threshold (8.3 DDI hits). Tested against a "gold-standard" data set, GAIA achieves an 82% true positive rate at the expense of a 21% false positive rate. GAIA was used to identify a list of 4-gram pairs that are significantly over-represented in the DDI data set and may mediate PPIs. GAIA allows us to predict currently unknown interacting domains and to identify potential interacting gram pairs/hotspots between proteins. This study complements previous prediction approaches and also improves upon similar prediction modeling systems. The resultant predictions provide testable hypotheses for experimental validation. At present, GAIA is limited by its long computation time (10 min per pair), which is being addressed by modifying GAIA so that it can run in a distributed environment. While GAIA has good prediction capacity, increasing the size of the DDI data set would assist identification of a more complete set of gram pairs within the DDI data sets. This could ultimately lead to a more complete identification of PPIs mediated by DDIs.
The aim of this work is to predict DDIs based on the frequency of each possible gram-pair from a pair of query proteins. The frequencies of aforementioned gram-pairs are calculated from the annotated DDI data set and random data sets. In addition to predicting DDIs, GAIA also generates a list of gram pairs and their protein primary structure coordinates that contribute to the interaction between pairs of domains on query proteins. Details of how the GAIA algorithm works are provided in the following section, along with information about the data set collection, performance evaluation, and development environment.
The GAIA algorithm
Step A. For each 4-gram G_i in query protein A, we generated a list of iPfam annotated domains dlistG[i] that contain this gram and the number of hits of this gram in each domain;
Step B. For each 4-gram G_j appearing in query protein B, we also generated a list of Pfam annotated domains dlistG[j] that contain this gram and the number of hits of this gram in each domain;
Step C. For each gram pair (G_i, G_j) between the query proteins A and B, we calculated the frequency of hits freq[i][j] for this gram pair in interacting domain-domain pairs previously established in Pfam. Then, the final hit score hitscore[i][j] for this gram pair was obtained by weighting the number of hits by weightScore[i][j], which reflects whether the number of its occurrences in the interacting domain pairs is statistically significant. The hit scores and weight scores are calculated by the following formulas:
hitscore[i][j] = freq[i][j] × weightScore[i][j]
weightScore[i][j] = P(real|random)(Gram[i][j])
Here, P(real|random)(Gram[i][j]) is the probability that the number of occurrences of Gram[i][j] in the interacting domain pairs is expected at random. Comparable control domain pairs were randomly generated by pairing domains from the DDI data set.
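Steps A to C can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Perl implementation: the function names, the input formats (`domain_seqs`, `ddis`), the way hits are multiplied across the two domains, and the default weight of 1.0 are all assumptions made for the example, and the weighting probability itself is not computed here.

```python
from collections import defaultdict
from itertools import product


def grams(seq, n=4):
    """Yield (start, gram) for every contiguous n-gram; start is 1-based."""
    for i in range(len(seq) - n + 1):
        yield i + 1, seq[i:i + n]


def index_domains(domain_seqs, n=4):
    """Steps A and B: map each gram to {domain_id: number of hits in that domain}."""
    index = defaultdict(lambda: defaultdict(int))
    for dom, seq in domain_seqs.items():
        for _, g in grams(seq, n):
            index[g][dom] += 1
    return index


def score_pairs(seq_a, seq_b, domain_seqs, ddis, weight=None, n=4):
    """Step C: hit score for every gram pair between two query proteins.

    ddis   -- set of frozenset({dom1, dom2}) interacting domain pairs
    weight -- optional callable (gram_a, gram_b) -> weight score; defaults
              to 1.0 (assumption: the weighting is supplied by the caller)
    Returns {(position_in_a, position_in_b): hit score}.
    """
    index = index_domains(domain_seqs, n)
    scores = {}
    for (pos_a, ga), (pos_b, gb) in product(grams(seq_a, n), grams(seq_b, n)):
        freq = 0
        for da, ca in index.get(ga, {}).items():
            for db, cb in index.get(gb, {}).items():
                if frozenset((da, db)) in ddis:
                    freq += ca * cb  # assumed way of counting joint hits
        if freq:
            w = weight(ga, gb) if weight else 1.0
            scores[(pos_a, pos_b)] = freq * w
    return scores


# Toy data (invented): three domains, one known interacting domain pair.
domain_seqs = {"D1": "ACDEFG", "D2": "KLMNPQ", "D3": "ACDEWW"}
ddis = {frozenset(("D1", "D2"))}
result = score_pairs("MMACDEFM", "YYLMNPY", domain_seqs, ddis)
print(result)
```

In the toy example, the grams ACDE and CDEF of the first query (positions 3 and 4) pair with LMNP of the second query (position 3) through the D1-D2 interaction, so each of those two gram pairs receives a hit score of 1.0. A protein pair would then be called interacting if a score exceeds the chosen threshold.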
Data set collection
We compiled 3,020 DDIs in yeast and their corresponding amino acid sequences from Pfam, a database containing protein domains and domain families, and iPfam, a database of DDIs derived from RCSB Protein Data Bank (PDB) crystal structures. For the purpose of evaluating prediction performance, we also used a "gold-standard" dataset that contained 595 PPIs compiled from a PPI dataset identified by the homologous protein interaction verification (HPIV) method. It is reported that the HPIV positive dataset has better quality when used as the training data set for predicting PPIs. All interacting protein pairs in our positive gold-standard dataset were required to meet the following three criteria: 1) each pair is in the HPIV positive dataset; 2) each protein contains more than one domain; 3) each pair contains at least one iPfam domain-domain interaction. We also generated a "gold-standard" negative dataset containing 595 non-interacting protein pairs from the HPIV negative dataset. Compared to other simple approaches [39, 40], HPIV applies a more sophisticated way of identifying non-interacting protein pairs, using multiple lines of evidence such as functional, localization, expression and homology-based data.
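The three selection criteria for the positive gold-standard set amount to a simple filter. The Python sketch below illustrates it on invented identifiers; the function name and the input structures (`hpiv_pos`, `domains_of`, `ipfam_ddis`) are hypothetical and do not reflect the actual file formats of HPIV, Pfam or iPfam.

```python
def gold_standard_positives(hpiv_pos, domains_of, ipfam_ddis):
    """Keep HPIV positive pairs that satisfy the domain-based criteria.

    hpiv_pos   -- list of (protein_a, protein_b) pairs (criterion 1)
    domains_of -- dict: protein -> list of annotated domain identifiers
    ipfam_ddis -- set of frozenset({dom1, dom2}) interacting domain pairs
    """
    selected = []
    for a, b in hpiv_pos:
        doms_a = domains_of.get(a, [])
        doms_b = domains_of.get(b, [])
        # Criterion 2: each protein contains more than one domain.
        if len(doms_a) <= 1 or len(doms_b) <= 1:
            continue
        # Criterion 3: at least one iPfam DDI between the two proteins.
        if any(frozenset((x, y)) in ipfam_ddis for x in doms_a for y in doms_b):
            selected.append((a, b))
    return selected


# Toy data (invented identifiers).
hpiv_pos = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]
domains_of = {"P1": ["D1", "D2"], "P2": ["D3", "D4"], "P3": ["D5"], "P4": ["D6", "D7"]}
ipfam_ddis = {frozenset(("D1", "D3"))}
kept = gold_standard_positives(hpiv_pos, domains_of, ipfam_ddis)
print(kept)
```

Only the P1-P2 pair survives here: P3 has a single domain, and P2-P4 share no annotated interacting domain pair.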
Evaluation of the GAIA algorithm
The performance of the scoring method was measured by the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC). The area under the curve was calculated by the Wilcoxon rank sum test. The ROC curve shows how the sensitivity varies with the specificity. The area under the curve summarizes discrimination (i.e., the ability to correctly classify interacting and non-interacting proteins). The ROC curve was generated by calculating the true positive rate (sensitivity) and the false positive rate (1 – specificity) at different thresholds on scores derived from PPIs and DDIs in the network, and on combined scores from both kinds of interactions, against the "gold-standard" data set. If the number of hits of any domain pair in a protein pair was above the threshold and the pair was in the DDIs of the positive portion of the "gold-standard" data set, then it was regarded as a true positive. Alternatively, if it was not in the positive portion of the "gold-standard" data set, then it was a false positive. If the number of hits of a domain pair in a protein pair was below the threshold and the pair was in the negative portion of the "gold-standard" data set, then it was regarded as a true negative. Alternatively, if it was not in the negative portion of the "gold-standard" data set, then it was a false negative.
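The link between the AUC and the Wilcoxon rank sum statistic can be made concrete with a short Python sketch. It uses the standard identity AUC = U / (n_pos × n_neg), where U is the Mann-Whitney statistic derived from the rank sum of the positive scores; the scores are invented for illustration and `auc_rank_sum` is a hypothetical helper name.

```python
def auc_rank_sum(pos_scores, neg_scores):
    """AUC computed from the Wilcoxon rank-sum (Mann-Whitney U) statistic."""
    labelled = sorted([(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores])
    ranks = [0.0] * len(labelled)
    i = 0
    while i < len(labelled):
        # Find the run of tied scores starting at i and give it the average rank.
        j = i
        while j + 1 < len(labelled) and labelled[j + 1][0] == labelled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, labelled) if label == 1)
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical hit scores (illustration only).
pos = [3.1, 9.0, 12.4, 15.3, 18.7]
neg = [1.0, 2.5, 6.0, 8.0, 10.2]
print(f"AUC = {auc_rank_sum(pos, neg):.2f}")
# prints: AUC = 0.84
```

The value 0.84 is the fraction of positive-negative score pairs in which the positive example scores higher (21 of 25 here), which is the discrimination property the AUC is meant to capture.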
Data and program availability
The related data sets and scripts, source code, and binaries are available for download from . All scripts were written in Perl version 5.8.6 and tested on Mac OS X 10.4.10 on a Macintosh workstation (2.4 GHz Intel Core 2 Duo with 2 GB 667 MHz DDR2 SDRAM). The source code and scripts are distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The authors are grateful to Quang Trinh, Michelle Brazas and Li Zhang for comments on the manuscript. This work was conducted with the support of the Ontario Institute for Cancer Research through funding provided by the government of Ontario. KXZ is supported by the CIHR/MSFHR Strategic Training Program in Bioinformatics. KXZ is also supported by the CIHR Canada Graduate Scholarships Doctoral Award.
This article has been published as part of BMC Bioinformatics Volume 10 Supplement 1, 2009: Proceedings of The Seventh Asia Pacific Bioinformatics Conference (APBC) 2009. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/10?issue=S1
- Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B: Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006, 440 (7084): 631-636.
- Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002, 415 (6868): 141-147.
- Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, Millar A, Taylor P, Bennett K, Boutilier K: Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002, 415 (6868): 180-183.
- Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA. 2001, 98 (8): 4569-4574.
- Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis AP: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006, 440 (7084): 637-643.
- Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000, 403 (6770): 623-627.
- Formstecher E, Aresta S, Collura V, Hamburger A, Meil A, Trehin A, Reverdy C, Betin V, Maire S, Brun C: Protein interaction mapping: a Drosophila case study. Genome Res. 2005, 15 (3): 376-384.
- Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E: A protein interaction map of Drosophila melanogaster. Science. 2003, 302 (5651): 1727-1736.
- Li S, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain PO, Han JD, Chesneau A, Hao T: A map of the interactome network of the metazoan C. elegans. Science. 2004, 303 (5657): 540-543.
- McGough AM, Staiger CJ, Min JK, Simonetti KD: The gelsolin family of actin regulatory proteins: modular structures, versatile functions. FEBS Lett. 2003, 552 (2–3): 75-81.
- Kato Y, Nagata K, Takahashi M, Lian L, Herrero JJ, Sudol M, Tanokura M: Common mechanism of ligand recognition by group II/III WW domains: redefining their functional classification. J Biol Chem. 2004, 279 (30): 31833-31841.
- Kim WK, Park J, Suh JK: Large scale statistical prediction of protein-protein interaction by potentially interacting domain (PID) pair. Genome Inform. 2002, 13: 42-50.
- Ng SK, Zhang Z, Tan SH, Lin K: InterDom: a database of putative interacting protein domains for validating predicted protein interactions and complexes. Nucleic Acids Res. 2003, 31 (1): 251-254.
- Sprinzak E, Margalit H: Correlated sequence-signatures as markers of protein-protein interaction. J Mol Biol. 2001, 311 (4): 681-692.
- Deng M, Mehta S, Sun F, Chen T: Inferring domain-domain interactions from protein-protein interactions. Genome Res. 2002, 12 (10): 1540-1548.
- Wojcik J, Schachter V: Protein-protein interaction map inference using interacting domain profile pairs. Bioinformatics. 2001, 17 (Suppl 1): S296-305.
- Chen XW, Liu M: Prediction of protein-protein interactions using random decision forest framework. Bioinformatics. 2005, 21 (24): 4394-4400.
- Jothi R, Cherukuri PF, Tasneem A, Przytycka TM: Co-evolutionary analysis of domains in interacting proteins reveals insights into domain-domain interactions mediating protein-protein interactions. J Mol Biol. 2006, 362 (4): 861-875.
- Pauling L, Corey RB, Branson HR: The structure of proteins; two hydrogen-bonded helical configurations of the polypeptide chain. Proc Natl Acad Sci USA. 1951, 37 (4): 205-211.
- Vries JK, Liu X, Bahar I: The relationship between n-gram patterns and protein secondary structure. Proteins. 2007, 68 (4): 830-838.
- Birzele F, Kramer S: A new representation for protein secondary structure prediction based on frequent patterns. Bioinformatics. 2006, 22 (21): 2628-2634.
- King BR, Guda C: ngLOC: an n-gram-based Bayesian method for estimating the subcellular proteomes of eukaryotes. Genome Biol. 2007, 8 (5): R68.
- Wu CH, Huang H, Yeh LS, Barker WC: Protein family classification and functional annotation. Comput Biol Chem. 2003, 27 (1): 37-47.
- Sangadala S, Metpally RP, Reddy BV: Molecular interaction between Smurf1 WW2 domain and PPXY motifs of Smad1, Smad5, and Smad6 – modeling and analysis. J Biomol Struct Dyn. 2007, 25 (1): 11-23.
- Lim WA, Richards FM, Fox RO: Structural determinants of peptide-binding orientation and of sequence specificity in SH3 domains. Nature. 1994, 372 (6504): 375-379.
- Pitre S, Dehne F, Chan A, Cheetham J, Duong A, Emili A, Gebbia M, Greenblatt J, Jessulat M, Krogan N: PIPE: a protein-protein interaction prediction engine based on the re-occurring short polypeptide sequences between known interacting protein pairs. BMC Bioinformatics. 2006, 7: 365.
- Bader GD, Hogue CW: Analyzing yeast protein-protein interaction data obtained from different sources. Nat Biotechnol. 2002, 20 (10): 991-997.
- Kiermer V: Protein-protein interactions: better by the dozen. Nat Methods. 2007, 4 (5): 389.
- Collins SR, Kemmeren P, Zhao XC, Greenblatt JF, Spencer F, Holstege FC, Weissman JS, Krogan NJ: Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol Cell Proteomics. 2007, 6 (3): 439-450.
- Hunte C, Palsdottir H, Trumpower BL: Protonmotive pathways and mechanisms in the cytochrome bc1 complex. FEBS Lett. 2003, 545 (1): 39-46.
- Higashio H, Sato K, Nakano A: Smy2p Participates in COPII Vesicle Formation Through the Interaction with Sec23p/Sec24p Subcomplex. Traffic. 2007.
- Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL: The Pfam protein families database. Nucleic Acids Res. 2007.
- Ni L, Snyder M: A genomic study of the bipolar bud site selection pattern in Saccharomyces cerevisiae. Mol Biol Cell. 2001, 12 (7): 2147-2170.
- Krappmann AB, Taheri N, Heinrich M, Mosch HU: Distinct domains of yeast cortical tag proteins Bud8p and Bud9p confer polar localization and functionality. Mol Biol Cell. 2007, 18 (9): 3323-3339.
- Hofmann MW, Peplowska K, Rohde J, Poschner BC, Ungermann C, Langosch D: Self-interaction of a SNARE transmembrane domain promotes the hemifusion-to-fusion transition. J Mol Biol. 2006, 364 (5): 1048-1060.
- Finn RD, Marshall M, Bateman A: iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions. Bioinformatics. 2005, 21 (3): 410-412.
- PDB. [http://www.pdb.org/]
- Saeed R, Deane C: An assessment of the uses of homologous interactions. Bioinformatics. 2008, 24 (5): 689-695.
- Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M: A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science. 2003, 302 (5644): 449-453.
- Qi Y, Klein-Seetharaman J, Bar-Joseph Z: Random forest similarity for protein-protein interaction prediction from multiple sources. Pac Symp Biocomput. 2005, 531-542.
- GAIA. [http://www.oicr.on.ca/research/ouellette/gaia]
- Cn3D. [http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.