High-throughput identification of interacting protein-protein binding sites
© Chung et al; licensee BioMed Central Ltd. 2007
Received: 21 December 2006
Accepted: 27 June 2007
Published: 27 June 2007
With the growing availability of sequence and structural data, a number of methods have been proposed to locate putative protein binding sites on protein surfaces. These methods do not indicate which sites bind to each other, so methods that can identify whether binding sites interact are needed.
We have developed a new method using a machine learning approach to detect whether protein binding sites, once identified, interact with each other. The method exploits information relating to sequence and structural complementarity across protein interfaces and has been tested on a non-redundant data set consisting of 584 homo-dimers and 196 hetero-dimers extracted from the PDB. Results indicate that 87.4% of the interacting binding sites and 68.6% of the non-interacting binding sites were correctly identified. Furthermore, we built a pipeline that links this method to a modified version of our previously developed method that predicts the location of binding sites.
We have demonstrated that this high-throughput pipeline is capable of identifying binding sites for proteins, their interacting binding sites and, ultimately, their binding partners on a large scale.
Protein-protein interactions are essential to most biological processes, for example, signal transduction, hormone-receptor binding and immunological recognition. These processes comprise complex cellular protein interaction networks that are becoming increasingly accessible in the post-genome era of high-throughput proteomics. Experimental methods such as mass spectrometry, phage display and yeast two-hybrid have been developed to quickly identify interactions between proteins in various organisms [1–4]. Concurrently, computational approaches exploiting amino acid properties, genomic and evolutionary information [5–13] have been proposed to determine whether proteins interact or not (binary interactions). While both large scale experimental and computational methods are known to produce many false positive and false negative predictions, combining several methods may provide more reliable results. The idea of using consensus results is not new and has been used in the meta-servers for structure prediction, generating consensus models according to the results of several prediction servers. Here we provide a new approach to the computational prediction of interacting protein-protein binding sites which can contribute to this greater accuracy.
Thanks, in large part, to the availability of increasing sequence and structural information, various computational methods have been proposed to identify putative protein-protein binding sites utilizing evolutionary relationships [15–19], properties of surface patches [20, 21], residue hydrophobicity, etc. Recently, machine learning approaches such as neural networks [23–26], support vector machines [27–31] and Bayesian networks have been used to distinguish interface residues from non-interface residues based on sequence and structural properties.
All of these methods locate binding sites on protein surfaces, but none of them provides information about their binding partners (binding specificity). Therefore, methods that identify interacting protein binding sites are necessary. Inherently, these methods would then allow more reliable determination of binary interactions. Docking approaches provide this information by predicting the binary complex of two known structures based on energetic or geometric complementarity [33, 34]. However, a long computation time is often required to determine each putative complex, and most docking approaches are limited to rigid protein model analysis. Homology modeling and multimeric threading build an atomic model of a complex based on a template structure using sequence alignments. These two methods have been tested on large scale data sets [37, 38]. They both rely on the limited number of structure templates of complexes and usually require sequence identity above 30% between homologs [40, 41]. Aytuna et al. predicted protein-protein complexes by seeking pairs of proteins that share structurally and evolutionarily conserved residue similarity to 67 template interfaces. Pazos et al. utilized correlated mutation for determining pairs of proteins that are likely to bind and also identified binding sites concurrently. Although structural information is not required for this method, a large set of multiple sequence alignments for each possible pair of proteins is needed.
Results and discussion
The contact preferences of interface residues
Ofran et al. have reported significant differences in residue composition and contact preferences between interfaces of hetero-obligomers, hetero-complexes, homo-obligomers, and homo-complexes. In our data set, some differences between homo-dimer and hetero-dimer interfaces have also been observed. There was a higher tendency for hydrophobic-hydrophobic interactions in homo-dimers. On the other hand, more salt bridges and fewer contacts between residues with the same charge were observed in hetero-dimers.
Several previous studies have provided detailed analysis of interaction preferences of different types of protein-protein interfaces in terms of amino acid, secondary structure or other properties [47–51]. The results of those previous studies and our studies show some variations because of the different composition of data sets and the definition of the interface residues. Nevertheless, the survey presented here indicates that information from sequence profile, secondary structure and accessible surface area (ASA) may be useful discriminators for defining contacting interface residues and can be captured by SVM predictors.
Identification of interacting binding sites using support vector machines
We then evaluated the prediction method exploiting structural information. At a surface patch size of 1 (Figure 7(b)), incorporating secondary structure information with the sequence profile increased the prediction accuracy significantly. However, further incorporation of information on the accessible surface area did not result in any additional improvement. As we increased the patch size to 3 (Figure 7(c)), we noticed that secondary structure and ASA had no impact on prediction. Therefore, for this study, we chose a sequence profile with a patch size of 3 as the default input features.
[Table: The prediction performances. Columns: accuracy for interacting binding sites, also known as recall (%); accuracy for non-interacting binding sites (%); average accuracy (%). Rows include Mix1 (sequence window) and Mix1 (putative binding sites).]
The surface patch used here included only the two nearest surface residues, which are very likely to be located in the same sequence segment. For this reason we further performed a trial using the sequence profile with a window size of 5 in sequence (that is, including the 4 sequentially nearest residues). As expected, the ROC curves in Figure 7(a) indicate that the predictor using sequence information only was able to perform similarly to the predictor using a patch size of 3. At a threshold of 56%, 88.7% of interacting binding sites and 63.3% of non-interacting binding sites were correctly assigned. When the average accuracy reached its maximum (threshold: 64%), 78.5% of interacting binding sites and 76.9% of non-interacting binding sites were correctly assigned.
In this study, two binding sites were predicted to interact with each other if >56% of the total possible residue pairs between them were predicted to be in contact with each other. Raising the threshold increases the precision but decreases the recall (fewer sites are assigned as interacting and more as non-interacting), and vice versa. In the data set, the contacting residue pairs for two interacting binding sites constituted, on average, only approximately 8% of the total possible residue pairs. Therefore, the threshold (56%) selected above seems to be very high. However, when the 3 spatially nearest residues of each of any two contacting residues were also considered to be in contact with each other across the interface, the fraction of contacting residue pairs out of the total possible residue pairs increased to 47%.
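As a rough illustration of the sequence-window input, the sketch below lists the residue positions covered by a window of 5 centred on a given residue. The function name `sequence_window` and the truncation of the window at chain termini are assumptions made here; the text does not state how terminal residues were handled.

```python
def sequence_window(index, chain_length, window=5):
    """Return the 0-based positions covered by a window centred on `index`.

    A window of 5 covers the residue itself plus its 4 sequentially
    nearest residues. Positions falling outside the chain are dropped
    (an assumption; padding would be another option).
    """
    if window % 2 == 0:
        raise ValueError("window size must be odd so it can be centred")
    half = window // 2
    return [i for i in range(index - half, index + half + 1)
            if 0 <= i < chain_length]


# Example: residue 5 in a 100-residue chain, and the first residue.
middle = sequence_window(5, 100)
terminus = sequence_window(0, 100)
```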
Predictions on putative binding sites
A pipeline was built to test if two putative binding sites would interact with each other (Figure 1). Given two proteins A and B, the pipeline first identifies the putative binding site of each protein (with predictor I) and then identifies the interaction between the two putative binding sites (with predictor II, which is the method presented in this study). This pipeline is able to provide information on both the location of binding sites and their binding partners.
Putative binding sites of the individual components of each complex in our data set were determined by a method modified from our previous work, using sequence and structural information (predictor I, see methods). With this method, the recall was 65.6% and the precision was 45.2% at the residue level. A binding site was defined as precisely predicted if at least 70% of its residues were identified, and as correctly predicted if at least 50% of its residues were identified. By these definitions, 49.81% of the binding sites were precisely predicted, 71.30% of the binding sites were correctly predicted and 23.6% of the binding sites were partially covered by the predicted residues.
The pipeline consists of binding site identification (predictor I) and subsequent prediction of whether two of these identified sites interact (predictor II). Since predictor II depends little on spatially neighboring residues and is mostly sequence dependent, a sequence-only predictor I could be substituted [19, 42].
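The site-level categories can be expressed as a small check on the fraction of true binding-site residues recovered. This is only a sketch: the function name is hypothetical, and treating "partially covered" as more than zero but fewer than 50% of residues recovered is an assumption, since the text gives explicit cutoffs only for the precise (70%) and correct (50%) categories.

```python
def classify_site_prediction(true_site, predicted_residues):
    """Label one binding site by how many of its residues were predicted.

    true_site and predicted_residues are iterables of residue identifiers.
    Integer comparisons are used so the 70% and 50% cutoffs are exact.
    """
    true_site = set(true_site)
    if not true_site:
        raise ValueError("binding site must contain at least one residue")
    hits = len(true_site & set(predicted_residues))
    n = len(true_site)
    return {
        "coverage": hits / n,
        "precisely_predicted": 10 * hits >= 7 * n,   # at least 70% recovered
        "correctly_predicted": 2 * hits >= n,        # at least 50% recovered
        # Assumption: "partially covered" means some, but under 50%.
        "partially_covered": 0 < hits and 2 * hits < n,
    }


# Example: a 10-residue site with 7 residues recovered plus one false hit.
example = classify_site_prediction(range(10), [0, 1, 2, 3, 4, 5, 6, 99])
```

Note that every precisely predicted site is also counted as correctly predicted, which matches the reported percentages (49.81% within 71.30%).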
We have developed a new method to predict interacting protein binding sites using a machine learning technique. To the best of our knowledge, this method is the first attempt to predict interacting binding sites without the constraint of using structure templates. An SVM was trained to learn the complementary information across interfaces and has been tested on a data set consisting of 584 non-redundant homo-dimer and 196 hetero-dimer interfaces. Our predictor successfully identified 87.4% of the interacting binding sites and 68.6% of the non-interacting binding sites. Separate training and testing on homo-dimers and hetero-dimers showed different prediction results, which might be caused by the differences in residue contact preferences between these two types of interfaces. For homo-dimers, 96.4% of the interacting binding sites and 67.5% of the non-interacting binding sites were correctly identified. For hetero-dimers, 66.3% of the interacting binding sites and 62.8% of the non-interacting binding sites were correctly identified. Better predictions are expected as more structures are determined and the number of homo-dimer and hetero-dimer complexes upon which to train increases.
We built a pipeline combining the method discussed here with a modified version of our previously developed method that identifies the location of binding sites. Taking both predictors together, we showed that the prediction accuracy based on putative binding sites decreased only slightly relative to that based on accurately known sites. Thus the pipeline enables the simultaneous prediction of binding sites and binding partners, identifying 87.3% of the interacting binding sites and 67.6% of the non-interacting binding sites in our data set. In the future, the pipeline can be used to search for new protein binding sites and interactions in various biological systems, and therefore to build interaction networks based on interaction details between proteins. It can also be used to validate existing networks.
At this time it is difficult to compare the results presented here with those of other methods since each uses different training sets and there is a lack of a common test set. In addition, most existing methods have not been tested on negative data and hence prediction statistics (precision, recall, etc.) were not provided. Nevertheless, different methods have different limitations and exploit different information to various extents. For example, most docking procedures are computationally expensive, homology modeling and multimeric threading rely on the availability of complex structure templates, and correlated mutation methods need a large set of sequence alignments for each possible protein pair. Current efforts are directed at attaining higher prediction accuracy through incorporation of additional information such as local interface geometry or water-mediated interactions into our predictor.
A non-redundant data set of dimer complexes was compiled using the method of Zhou et al., modified as follows. All non-NMR multiple-chain protein entries with resolution better than 3.5 Å were collected from the PDB (March, 2004). For each entry, two chains were selected as an interacting protein pair if both had more than 20 residues that formed interfacial contacts with each other. A residue was considered to form an interfacial contact if the distance between any of its heavy atoms and any heavy atom of its interacting protein was < 5 Å. The pairs containing chains with < 80 amino acids or SCOP class >= 8 were then filtered out.
Each of the collected chains was further compared against all other chains by BLAST. Chains were assigned to the same cluster if the sequence identity was > 30% and > 90% of the amino acids were aligned. All interacting protein pairs were mapped to these clusters and the representative pairs were selected. In order to consider dimers only, the representative pairs with chains interacting with more than one chain were discarded. Homo-dimers with both chains having > 30 interface residues and hetero-dimers with both chains having > 20 interface residues were collected, in order to roughly exclude those from crystallographic complexes. This resulted in a non-redundant data set of 584 homo-dimers and 196 hetero-dimers. The data are available upon request from the authors.
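The contact criterion above can be sketched in a few lines. The data layout (a chain as a dictionary mapping a residue id to a list of heavy-atom coordinates) and all function names are assumptions for illustration; parsing of PDB files is left out.

```python
import math

CONTACT_CUTOFF = 5.0  # Å, heavy-atom distance cutoff stated in the text


def residues_in_contact(atoms_a, atoms_b, cutoff=CONTACT_CUTOFF):
    """True if any heavy atom of one residue is closer than `cutoff`
    to any heavy atom of the other. Atoms are (x, y, z) tuples."""
    return any(math.dist(p, q) < cutoff for p in atoms_a for q in atoms_b)


def interface_residues(chain_a, chain_b, cutoff=CONTACT_CUTOFF):
    """Residue ids of chain_a that contact at least one residue of chain_b."""
    return {
        res_id
        for res_id, atoms in chain_a.items()
        if any(residues_in_contact(atoms, other, cutoff)
               for other in chain_b.values())
    }


def is_interacting_pair(chain_a, chain_b, min_interface=20):
    """Both chains need more than `min_interface` contacting residues."""
    return (len(interface_residues(chain_a, chain_b)) > min_interface
            and len(interface_residues(chain_b, chain_a)) > min_interface)


# Toy example with one atom per residue (hypothetical coordinates).
toy_a = {1: [(0.0, 0.0, 0.0)], 2: [(20.0, 0.0, 0.0)]}
toy_b = {1: [(3.0, 0.0, 0.0)]}
toy_interface = interface_residues(toy_a, toy_b)
```

The all-against-all loop is quadratic in the number of atoms; a spatial grid or k-d tree would be the usual choice for full PDB entries.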
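One way to turn pairwise BLAST results into clusters is single-linkage grouping with a union-find structure, sketched below. The text gives only the thresholds (> 30% identity, > 90% aligned); single linkage, the input tuple layout and the function name are assumptions, not the authors' stated procedure.

```python
def cluster_chains(chains, hits, min_identity=30.0, min_coverage=90.0):
    """Group chains connected by qualifying pairwise hits.

    hits: iterable of (chain_a, chain_b, percent_identity, percent_aligned).
    Returns a sorted list of sorted clusters.
    """
    parent = {c: c for c in chains}

    def find(c):
        # Follow parent links to the cluster root, compressing the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b, identity, coverage in hits:
        if identity > min_identity and coverage > min_coverage:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for c in chains:
        clusters.setdefault(find(c), []).append(c)
    return sorted(sorted(members) for members in clusters.values())


# Toy example: C-D fails the alignment-coverage cutoff, so D stays alone.
toy_clusters = cluster_chains(
    ["A", "B", "C", "D"],
    [("A", "B", 45.0, 95.0), ("B", "C", 35.0, 92.0), ("C", "D", 80.0, 50.0)],
)
```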
The calculations of contact preferences of interface residues
We have surveyed the preferences of contacts between different groups of interface residues. We classified the residues into 20 groups in terms of amino acids, 3 groups in terms of secondary structures (alpha helix, beta strand, and others, including coil) and 2 groups in terms of the extent of water exposure (fully exposed: ASA >= 40% of a residue's nominal maximum area; partially exposed: 15% <= ASA < 40%). The secondary structure and ASA of residues for each protein chain were calculated using the DSSP program with the coordinates of a single chain obtained from the corresponding complex structure.
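The two exposure groups follow directly from the relative ASA. In the sketch below the function name and the label "buried" for residues under 15% are assumptions; the nominal maximum areas themselves must be supplied by the caller.

```python
def exposure_class(asa, max_asa):
    """Classify a residue by relative accessible surface area (ASA).

    asa: observed ASA of the residue; max_asa: its nominal maximum area
    (same units). Cutoffs are the 40% and 15% values given in the text.
    """
    if max_asa <= 0:
        raise ValueError("nominal maximum area must be positive")
    relative = 100.0 * asa / max_asa
    if relative >= 40.0:
        return "fully exposed"
    if relative >= 15.0:
        return "partially exposed"
    return "buried"  # below the 15% surface cutoff (label assumed here)


# Example with a hypothetical nominal maximum area of 100 square Å.
labels = [exposure_class(a, 100.0) for a in (50.0, 20.0, 10.0)]
```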
The contact preference (L) for interface residues from group a and group b was calculated as follows:

$$L(a, b) = \frac{F_{\mathrm{observed}}(a, b)}{F_{\mathrm{expected}}(a, b)}$$

where the observed contact frequency was defined as:

$$F_{\mathrm{observed}}(a, b) = \frac{N_{\mathrm{observed}}(a, b)}{N_{\mathrm{total}}}$$

and $N_{\mathrm{observed}}(a, b)$ was the number of contact residue pairs between residue group a and group b. $N_{\mathrm{total}}$ was the total number of all contacting residue pairs. The expected contact frequency was defined as:

$$F_{\mathrm{expected}}(a, b) = F(a) \times F(b)$$

where F(a) and F(b) were the frequency of residue group a and group b at interfaces respectively. In this study, a residue was defined as a surface residue if its ASA was at least 15% of its nominal maximum area. A surface residue was defined to be an interface residue (residue at a binding site) if it formed an interfacial contact. The definition of interfacial contact was described in the data set section.
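The ratio of observed to expected contact frequency can be computed as in the following sketch. It assumes contacting pairs are supplied as ordered tuples (group of the residue on one site, group of the residue on the partner site) and that group frequencies are taken over a flat list of interface residues; the function name and both conventions are illustrative choices, not details given in the text.

```python
from collections import Counter


def contact_preferences(contact_pairs, interface_groups):
    """Return {(a, b): L(a, b)} with L = F_observed / F_expected.

    contact_pairs: list of (group_a, group_b), one per contacting pair.
    interface_groups: list of group labels, one per interface residue.
    """
    n_total = len(contact_pairs)
    n_residues = len(interface_groups)
    observed = Counter(contact_pairs)
    # F(a): frequency of each residue group at interfaces.
    freq = {g: c / n_residues for g, c in Counter(interface_groups).items()}

    preferences = {}
    for (a, b), n_observed in observed.items():
        f_observed = n_observed / n_total
        f_expected = freq[a] * freq[b]
        preferences[(a, b)] = f_observed / f_expected
    return preferences


# Toy data: leucine pairs are over-represented relative to expectation.
toy_pairs = [("L", "L"), ("L", "L"), ("K", "E"), ("K", "D")]
toy_groups = ["L", "L", "L", "L", "K", "K", "E", "D"]
toy_prefs = contact_preferences(toy_pairs, toy_groups)
```

Values above 1 indicate pairings seen more often than expected from group frequencies alone, and values below 1 indicate disfavoured pairings.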
Support vector machine (SVM) classifiers
The SVMs were trained to predict if two binding sites interact with each other. The SVM software used in this study was SVMlight. The radial basis function $\exp(-\gamma \lVert b - a \rVert^2)$ was chosen as the kernel with γ = 0.01 and regularization parameter C = 10.
During the training process (Figure 5), two interface residues, each from interacting binding sites, were considered to form a contacting residue pair (positive class) if the distance between any of their respective heavy atoms was less than 5 Å. A non-contacting residue pair (negative class) was defined as any possible interface residue pair between two non-interacting protein binding sites (binding sites from two non-interacting protein chains). Non-interacting protein chains were generated from our data set by pairing proteins that were not reported to be in the same cellular location as defined by the UniProt database. Since the number of non-interacting protein pairs greatly outnumbered the number of interacting protein pairs, we randomly selected a small portion of the large pool as representative data, making the number of non-interacting protein pairs equal to the number of interacting protein pairs. For example, for the combined data set of homo-dimers and hetero-dimers, there were 780 interacting protein pairs. The number of all possible non-interacting protein pairs was 460070. From this pool, 780 non-interacting protein pairs were randomly chosen.
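For reference, the radial basis function kernel with the stated γ can be evaluated directly, as in this sketch. It is a plain re-implementation for illustration and is not part of SVMlight; the function name is hypothetical.

```python
import math

GAMMA = 0.01  # kernel width parameter reported in the text


def rbf_kernel(a, b, gamma=GAMMA):
    """Radial basis function kernel exp(-gamma * ||b - a||^2)
    for two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((y - x) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


# Identical vectors give 1.0; the value decays as the vectors move apart.
same = rbf_kernel([1.0, -1.0], [1.0, -1.0])
apart = rbf_kernel([1.0, -1.0], [-1.0, 1.0])
```

With inputs scaled to [-1, 1], a small γ such as 0.01 keeps the kernel value from collapsing to zero for high-dimensional vectors.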
To reduce the data redundancy and training time, for each binding site residue with multiple contacting residues at the other site, only the pairing with the smallest distance was selected to be included in the positive training set. Since there were many more non-interacting residue pairs than interacting residue pairs, a set of non-interacting residue pairs was randomly selected so that the ratio of positive to negative data was 1:1.
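The two data-reduction steps can be sketched as follows. Applying the closest-partner rule to residues on both sites, the tuple layout, the fixed random seed and the function names are all assumptions made for the example.

```python
import random


def closest_pairs(contacts):
    """Keep, for every residue on either site, only its closest contact.

    contacts: iterable of (residue_a, residue_b, distance_in_angstrom).
    Returns the sorted union of the retained pairs.
    """
    best_a, best_b = {}, {}
    for a, b, dist in contacts:
        if a not in best_a or dist < best_a[a][2]:
            best_a[a] = (a, b, dist)
        if b not in best_b or dist < best_b[b][2]:
            best_b[b] = (a, b, dist)
    return sorted(set(best_a.values()) | set(best_b.values()))


def balance_negatives(positives, negatives, seed=0):
    """Randomly draw as many negatives as there are positives (1:1 ratio).
    Raises ValueError if the negative pool is too small."""
    rng = random.Random(seed)
    return rng.sample(list(negatives), len(positives))


# Toy example: A1 contacts B1 and B2; only the closer A1-B1 pair is kept.
toy_contacts = [("A1", "B1", 3.0), ("A1", "B2", 4.0), ("A2", "B2", 3.5)]
toy_positive = closest_pairs(toy_contacts)
toy_negative = balance_negatives(toy_positive, [("X", "Y", None)] * 10)
```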
The SVM was fed two surface patches, each comprising a residue of an interface residue pair and its n spatially nearest surface residues (n was an adjustable parameter). The input features were different combinations of sequence profile, secondary structure and accessible surface area of residues in these 2 surface patches. If all three input features were used and the surface patch size was set to 3, each residue pair was encoded as a feature vector with a dimension of 2 × 3 × 24: 2 × (the surface residue to be predicted + 2 nearest neighbors) × (20 amino acids + accessible surface area + 3 types of secondary structure). The sequence profiles were obtained from 3 iterations of a PSI-BLAST search against the NCBI non-redundant database (NR) with e = 0.001 and h = 0.001. The 3 categories of secondary structure were: alpha helix, beta sheet, and others, including coil regions (encoded 1 if it was in this category and -1 if it was not). All input values were scaled between -1 and 1.
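The 2 × 3 × 24 layout can be made concrete with a small encoder. This is a sketch only: the secondary-structure labels, the ordering of features within the vector and the function names are assumptions, and the profile and ASA values are taken to be already scaled to [-1, 1].

```python
SS_CLASSES = ("helix", "sheet", "other")  # hypothetical labels for 3 classes


def encode_residue(profile, asa, secondary_structure):
    """Encode one residue as 24 values: 20 profile scores, 1 ASA value,
    and 3 secondary-structure flags (1 if in the class, -1 otherwise)."""
    if len(profile) != 20:
        raise ValueError("sequence profile must have 20 entries")
    if secondary_structure not in SS_CLASSES:
        raise ValueError("unknown secondary structure class")
    flags = [1.0 if secondary_structure == c else -1.0 for c in SS_CLASSES]
    return list(profile) + [asa] + flags


def encode_pair(patch_a, patch_b):
    """Concatenate the encodings of all residues in both surface patches.
    Each patch is a list of (profile, asa, secondary_structure) tuples."""
    vector = []
    for patch in (patch_a, patch_b):
        for profile, asa, ss in patch:
            vector.extend(encode_residue(profile, asa, ss))
    return vector


# Patch size 3: the residue to be predicted plus its 2 nearest neighbours.
toy_residue = ([0.0] * 20, 0.5, "helix")
toy_vector = encode_pair([toy_residue] * 3, [toy_residue] * 3)
```

With the default features chosen in this study (sequence profile only, patch size 3), the same layout would reduce to 2 × 3 × 20 values.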
During the testing process (Figure 6), two binding sites A and B were predicted to interact with each other if the number of positively predicted residue pairs between them reached a certain threshold:

$$N_{\mathrm{predicted}} \geq P\% \times N_{\mathrm{total}}$$

where $N_{\mathrm{predicted}}$ is the number of positively predicted residue pairs and $N_{\mathrm{total}}$ is the total number of possible residue pairs between binding sites A and B. Given the threshold, the prediction performance was measured as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\text{Accuracy for positive class (recall, true positive rate)} = \frac{TP}{TP + FN}$$

$$\text{Accuracy for negative class} = \frac{TN}{TN + FP}$$

$$\text{False positive rate} = \frac{FP}{FP + TN}$$

$$\text{Average accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP is the number of correctly predicted interacting binding sites, TN is the number of correctly predicted non-interacting binding sites, FP is the number of non-interacting binding sites incorrectly predicted to be interacting and FN is the number of interacting binding sites incorrectly predicted to be non-interacting.
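The decision rule and the performance measures above translate into the following sketch. The function names are hypothetical, the default threshold is the 56% value used in this study, and no handling of empty classes (zero denominators) is attempted.

```python
def sites_interact(n_predicted, n_total, threshold_pct=56.0):
    """Apply N_predicted >= P% x N_total.

    Both sides are multiplied by 100 so that integer counts are compared
    without floating-point rounding of the percentage.
    """
    return n_predicted * 100 >= threshold_pct * n_total


def performance(tp, tn, fp, fn):
    """Compute the measures defined in the text from the four counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),                  # accuracy, positive class
        "negative_accuracy": tn / (tn + fp),       # accuracy, negative class
        "false_positive_rate": fp / (fp + tn),
        "average_accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Toy counts: 10 interacting and 10 non-interacting site pairs.
toy_scores = performance(tp=8, tn=6, fp=4, fn=2)
```

When the two classes are the same size, as in the balanced data set used here, the average accuracy equals the mean of the positive-class and negative-class accuracies.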
Identification of the putative binding sites
The putative protein-protein binding sites were determined by a method modified from our previous work. An SVM was trained to locate binding site residues on a protein surface by using the sequence profile and accessible surface area of spatially neighboring surface residues. The predictor was trained and tested on 976 non-redundant chains (584 chains from one of the components of homo-dimers, and 196 × 2 chains from both components of hetero-dimers) with 2-fold cross-validation. The other component of each homo-dimer was tested with the predictor trained on the set that did not contain its homolog. The residues ranked as the top 30% by the SVM were further clustered using the clustering method described previously.
This work was supported by NIH grants GM63208 and GM08326.
- von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions. Nature 2002, 417(6887):399–403. 10.1038/nature750
- Fields S, Song O: A novel genetic system to detect protein-protein interactions. Nature 1989, 340(6230):245–246. 10.1038/340245a0
- McCafferty J, Griffiths AD, Winter G, Chiswell DJ: Phage antibodies: filamentous phage displaying antibody variable domains. Nature 1990, 348(6301):552–554. 10.1038/348552a0
- Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM, et al.: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002, 415(6868):141–147. 10.1038/415141a
- Valencia A, Pazos F: Computational methods for the prediction of protein interactions. Curr Opin Struct Biol 2002, 12(3):368–373. 10.1016/S0959-440X(02)00333-0
- Tamames J, Casari G, Ouzounis C, Valencia A: Conserved clusters of functionally related genes in two bacterial genomes. J Mol Evol 1997, 44(1):66–73. 10.1007/PL00006122
- Dandekar T, Snel B, Huynen M, Bork P: Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci 1998, 23(9):324–328. 10.1016/S0968-0004(98)01274-2
- Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D: Detecting protein function and protein-protein interactions from genome sequences. Science 1999, 285(5428):751–753. 10.1126/science.285.5428.751
- Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO: Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA 1999, 96(8):4285–4288. 10.1073/pnas.96.8.4285
- Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA: Protein interaction maps for complete genomes based on gene fusion events. Nature 1999, 402(6757):86–90. 10.1038/47056
- Goh CS, Bogan AA, Joachimiak M, Walther D, Cohen FE: Co-evolution of proteins with their interaction partners. J Mol Biol 2000, 299(2):283–293. 10.1006/jmbi.2000.3732
- Pazos F, Valencia A: Similarity of phylogenetic trees as indicator of protein-protein interaction. Protein Eng 2001, 14(9):609–614. 10.1093/protein/14.9.609
- Pazos F, Valencia A: In silico two-hybrid system for the selection of physically interacting protein pairs. Proteins 2002, 47(2):219–227. 10.1002/prot.10074
- Moult J, Fidelis K, Zemla A, Hubbard T: Critical assessment of methods of protein structure prediction (CASP)-round V. Proteins 2003, 53(Suppl 6):334–339. 10.1002/prot.10556
- Glaser F, Pupko T, Paz I, Bell RE, Bechor-Shental D, Martz E, Ben-Tal N: ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics 2003, 19(1):163–164. 10.1093/bioinformatics/19.1.163
- Landgraf R, Xenarios I, Eisenberg D: Three-dimensional cluster analysis identifies interfaces and functional residue clusters in proteins. J Mol Biol 2001, 307(5):1487–1502. 10.1006/jmbi.2001.4540
- Lichtarge O, Bourne HR, Cohen FE: An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol 1996, 257(2):342–358. 10.1006/jmbi.1996.0167
- Lichtarge O, Sowa ME: Evolutionary predictions of binding surfaces and interactions. Curr Opin Struct Biol 2002, 12(1):21–27. 10.1016/S0959-440X(02)00284-1
- Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N: Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 2002, 18(Suppl 1):S71–77.
- Jones S, Thornton JM: Prediction of protein-protein interaction sites using patch analysis. J Mol Biol 1997, 272(1):133–143. 10.1006/jmbi.1997.1233
- Neuvirth H, Raz R, Schreiber G: ProMate: a structure based prediction program to identify the location of protein-protein binding sites. J Mol Biol 2004, 338(1):181–199. 10.1016/j.jmb.2004.02.040
- Gallet X, Charloteaux B, Thomas A, Brasseur R: A fast method to predict protein interaction sites from sequences. J Mol Biol 2000, 302(4):917–926. 10.1006/jmbi.2000.4092
- Chen H, Zhou HX: Prediction of interface residues in protein-protein complexes by a consensus neural network method: test against NMR data. Proteins 2005, 61(1):21–35. 10.1002/prot.20514
- Fariselli P, Pazos F, Valencia A, Casadio R: Prediction of protein-protein interaction sites in heterocomplexes with neural networks. Eur J Biochem 2002, 269(5):1356–1361. 10.1046/j.1432-1033.2002.02767.x
- Ofran Y, Rost B: Predicted protein-protein interaction sites from local sequence information. FEBS Lett 2003, 544(1–3):236–239. 10.1016/S0014-5793(03)00456-3
- Zhou HX, Shan Y: Prediction of protein interaction sites from sequence profile and residue neighbor list. Proteins 2001, 44(3):336–343. 10.1002/prot.1099
- Bordner AJ, Abagyan R: Statistical analysis and prediction of protein-protein interfaces. Proteins 2005, 60(3):353–366. 10.1002/prot.20433
- Bradford JR, Westhead DR: Improved prediction of protein-protein binding sites using a support vector machines approach. Bioinformatics 2005, 21(8):1487–1494. 10.1093/bioinformatics/bti242
- Chung JL, Wang W, Bourne PE: Exploiting sequence and structure homologs to identify protein-protein binding sites. Proteins 2006, 62(3):630–640. 10.1002/prot.20741
- Koike A, Takagi T: Prediction of protein-protein interaction sites using support vector machines. Protein Eng Des Sel 2004, 17(2):165–173. 10.1093/protein/gzh020
- Yan C, Dobbs D, Honavar V: A two-stage classifier for identification of protein-protein interface residues. Bioinformatics 2004, 20(Suppl 1):I371-I378. 10.1093/bioinformatics/bth920
- Bradford JR, Needham CJ, Bulpitt AJ, Westhead DR: Insights into protein-protein interfaces using a Bayesian network prediction method. J Mol Biol 2006, 362(2):365–386. 10.1016/j.jmb.2006.07.028
- Halperin I, Ma B, Wolfson H, Nussinov R: Principles of docking: An overview of search algorithms and a guide to scoring functions. Proteins 2002, 47(4):409–443. 10.1002/prot.10115
- Smith GR, Sternberg MJ: Prediction of protein-protein interactions by docking methods. Curr Opin Struct Biol 2002, 12(1):28–35. 10.1016/S0959-440X(02)00285-3
- Aloy P, Russell RB: Interrogating protein interaction networks through structural biology. Proc Natl Acad Sci USA 2002, 99(9):5896–5901. 10.1073/pnas.092147999
- Lu L, Lu H, Skolnick J: MULTIPROSPECTOR: an algorithm for the prediction of protein-protein interactions by multimeric threading. Proteins 2002, 49(3):350–364. 10.1002/prot.10222
- Aloy P, Bottcher B, Ceulemans H, Leutwein C, Mellwig C, Fischer S, Gavin AC, Bork P, Superti-Furga G, Serrano L, et al.: Structure-based assembly of protein complexes in yeast. Science 2004, 303(5666):2026–2029. 10.1126/science.1092645
- Lu L, Arakaki AK, Lu H, Skolnick J: Multimeric threading-based prediction of protein-protein interactions on a genomic scale: application to the Saccharomyces cerevisiae proteome. Genome Res 2003, 13(6A):1146–1154. 10.1101/gr.1145203
- Vakser IA: Protein-protein interfaces are special. Structure 2004, 12(6):910–912. 10.1016/j.str.2004.05.003
- Aloy P, Ceulemans H, Stark A, Russell RB: The relationship between sequence and interaction divergence in proteins. J Mol Biol 2003, 332(5):989–998. 10.1016/j.jmb.2003.07.006
- Wodak SJ, Mendez R: Prediction of protein-protein interactions: the CAPRI experiment, its evaluation and implications. Curr Opin Struct Biol 2004, 14(2):242–249. 10.1016/j.sbi.2004.02.003
- Aytuna AS, Gursoy A, Keskin O: Prediction of protein-protein interactions by combining structure and sequence conservation in protein interfaces. Bioinformatics 2005, 21(12):2850–2855. 10.1093/bioinformatics/bti443
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28(1):235–242. 10.1093/nar/28.1.235
- Halperin I, Wolfson H, Nussinov R: Protein-protein interactions; coupling of structurally conserved residues and of hot spots across interfaces. Implications for docking. Structure 2004, 12(6):1027–1038. 10.1016/j.str.2004.04.009
- Lu H, Lu L, Skolnick J: Development of unified statistical potentials describing protein-protein interactions. Biophys J 2003, 84(3):1895–1901.
- Bogan AA, Thorn KS: Anatomy of hot spots in protein interfaces. J Mol Biol 1998, 280(1):1–9. 10.1006/jmbi.1998.1843
- Ansari S, Helms V: Statistical analysis of predominantly transient protein-protein interfaces. Proteins 2005, 61(2):344–355. 10.1002/prot.20593
- Glaser F, Steinberg DM, Vakser IA, Ben-Tal N: Residue frequencies and pairing preferences at protein-protein interfaces. Proteins 2001, 43(2):89–102. 10.1002/1097-0134(20010501)43:2<89::AID-PROT1021>3.0.CO;2-H
- Ofran Y, Rost B: Analysing six types of protein-protein interfaces. J Mol Biol 2003, 325(2):377–387. 10.1016/S0022-2836(02)01223-8
- Saha RP, Bahadur RP, Chakrabarti P: Interresidue contacts in proteins and protein-protein interfaces and their use in characterizing the homodimeric interface. J Proteome Res 2005, 4(5):1600–1609. 10.1021/pr050118k
- De S, Krishnadev O, Srinivasan N, Rekha N: Interaction preferences across protein-protein interfaces of obligatory and non-obligatory components are different. BMC Struct Biol 2005, 5:15. 10.1186/1472-6807-5-15
- Pazos F, Helmer-Citterich M, Ausiello G, Valencia A: Correlated mutations contain information about protein-protein interaction. J Mol Biol 1997, 271(4):511–523. 10.1006/jmbi.1997.1198
- Sprinzak E, Margalit H: Correlated sequence-signatures as markers of protein-protein interaction. J Mol Biol 2001, 311(4):681–692. 10.1006/jmbi.2001.4920
- Jones S, Thornton JM: Protein-protein interactions: a review of protein dimer structures. Prog Biophys Mol Biol 1995, 63(1):31–65. 10.1016/0079-6107(94)00008-W
- Zeng Z, Castano AR, Segelke BW, Stura EA, Peterson PA, Wilson IA: Crystal structure of mouse CD1: An MHC-like fold with a large hydrophobic binding groove. Science 1997, 277(5324):339–345. 10.1126/science.277.5324.339
- Rowland P, Norager S, Jensen KF, Larsen S: Structure of dihydroorotate dehydrogenase B: electron transfer between two flavin groups bridged by an iron-sulphur cluster. Structure 2000, 8(12):1227–1238. 10.1016/S0969-2126(00)00530-X
- Card GL, Knowles P, Laman H, Jones N, McDonald NQ: Crystal structure of a gamma-herpesvirus cyclin-cdk complex. Embo J 2000, 19(12):2877–2888. 10.1093/emboj/19.12.2877
- Hopfner KP, Karcher A, Craig L, Woo TT, Carney JP, Tainer JA: Structural biochemistry and interaction architecture of the DNA double-strand break repair Mre11 nuclease and Rad50-ATPase. Cell 2001, 105(4):473–485. 10.1016/S0092-8674(01)00335-X
- Rost B, Sander C: Conservation and prediction of solvent accessibility in protein families. Proteins 1994, 20(3):216–226. 10.1002/prot.340200303
- Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22(12):2577–2637. 10.1002/bip.360221211
- Schölkopf B, Burges CJC, Smola AJ: Advances in kernel methods: support vector learning. Cambridge, Mass.: MIT Press; 1999.
- Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, et al.: The Universal Protein Resource (UniProt). Nucleic Acids Res 2005, (33 Database):D154–159.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389–3402. 10.1093/nar/25.17.3389
- Kyte J, Doolittle RF: A simple method for displaying the hydropathic character of a protein. J Mol Biol 1982, 157(1):105–132. 10.1016/0022-2836(82)90515-0
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.