Exploiting residue-level and profile-level interface propensities for usage in binding sites prediction of proteins
© Dong et al; licensee BioMed Central Ltd. 2007
Received: 01 February 2007
Accepted: 05 May 2007
Published: 05 May 2007
Recognition of binding sites in proteins is a direct computational approach to the characterization of proteins in terms of biological and biochemical function. Residue preferences have been widely used in many studies, but the results are often unsatisfactory. Although different amino acid compositions have been observed among the interaction sites of different complexes, such differences have not been integrated into the prediction process. Furthermore, evolutionary information has not been exploited to derive more powerful propensities.
In this study, the residue interface propensities of four kinds of complexes (homo-permanent, homo-transient, hetero-permanent and hetero-transient complexes) are investigated. These propensities, combined with sequence profiles and accessible surface areas, are used as input to a support vector machine (SVM) for the prediction of protein binding sites. The propensities are further improved by taking evolutionary information into consideration, which yields a class of novel propensities at the profile level, namely the binary profile interface propensities. Experiments are performed on 1139 non-redundant protein chains. Although the residue interface propensities differ among the four kinds of complexes, adding them to the classifier gives only a negligible improvement over the classifier without propensities. The binary profile interface propensities significantly improve the performance of binding site prediction, by about ten percent in terms of both precision and recall.
Although there are minor differences among the four kinds of complexes, the residue interface propensities are not discriminative enough to identify the complicated interfaces of proteins. The binary profile interface propensities significantly improve the performance of protein binding site prediction, which indicates that propensities at the profile level are more accurate than those at the residue level.
Protein function is very often encoded in a small number of residues located in the functional active site, which are dispersed along the primary sequence but packed in a compact spatial region. Recognition of functional sites in proteins is a direct computational approach to the characterization of proteins in terms of biological and biochemical function. Localizing functional sites allows us to understand how a protein recognizes other molecules, to gain clues about its likely function at the level of the cell and the organism, and to identify important binding sites that may serve as useful targets for pharmaceutical design.
Recently, a series of computational efforts to identify interaction sites or interfaces in proteins have been undertaken. A number of studies on the characteristics of protein interfaces have provided clues for binding site prediction. Several methods have been proposed to predict these sites based on the sequence or structure characteristics of known protein-protein interaction sites.
In terms of physical chemistry, protein interfaces are generally observed to be more hydrophobic than the remainder of the protein surface [3, 4]. Moreover, the interfaces of permanent complexes tend to be more hydrophobic than those of transient complexes. Some interfaces contain a significant number of polar residues, usually where interactions are less permanent. Charged side chains are often excluded from protein-protein interfaces, with the exception of arginine, which is one of the most abundant interface residues regardless of interaction type. The evolutionary conservation of residues is another property that may be used to predict protein-protein interfaces. The evolutionary trace (ET) method tries to identify functional sites by using the sequence variations and functional divergences found in nature [11, 12]. Accurate ET analysis requires functionally relevant sequences and high-quality alignments as input. A structure-independent criterion has been proposed to measure the quality of an evolutionary trace. Because sequence conservation reflects not only evolutionary selection at binding sites to maintain protein function but also selection throughout the protein to maintain the stability of the folded state, many researchers try to distinguish functional from structural constraints on protein evolution [16, 17]. A comprehensive evaluation of different conservation scores has been performed by Valdar. Other sequence information has also been exploited, such as phylogenetic profiles [19, 20], sequence motifs, sequence profiles [22, 23] and evolution rates [24, 25].
The features extracted from the three-dimensional structures of protein complexes are critical for a full understanding of the mechanism of interactions because they provide specific interaction details at the atomic level. The accessible surface area (ASA) is one of the most widely used features. Molecular docking seems to be the most principled computational approach for identifying interaction sites, but it requires a carefully designed energy function, either a physics-based energy or an empirical scoring function [27, 30]. 3D motifs have also been used successfully to identify binding sites of the same type in proteins with different folds [31–34]. Patch analysis using a six-parameter scoring function can distinguish the interface from the rest of the surface.
Because none of the above-mentioned properties can unambiguously identify interface regions or patches, combining several of them (via either a linear combination or machine learning) has been found effective for improving the accuracy of binding-site prediction. The PINUP method predicts interface residues using an empirical scoring function that linearly combines an energy score, an interface propensity and a residue conservation score.
Rossi et al. first construct a scoring function and then perform a Monte Carlo optimization to find a high-scoring patch on the protein surface.
Machine learning methods are well suited to classifying surface residues as interface or non-interface residues [40, 41]. Neural networks and support vector machines [43, 44] have been applied in this field, taking sequential or structural information as input. Other researchers adopt a two-stage model to further improve performance. Recently, the conditional random field (CRF) model has been introduced, which formulates the prediction of protein interaction sites as a sequence-labeling task.
In this study, we revisit the differences in amino acid composition between the interface area and the rest of the surface. Although different amino acid compositions have been found among the interaction sites of different complexes (homo-permanent, homo-transient, hetero-permanent and hetero-transient complexes), such differences have not been integrated into the prediction process. Here, the residue interface propensities of the different complexes are collected. These propensities, combined with sequence profiles and accessible surface areas, are used as input to a support vector machine for the prediction of protein binding sites. The propensities are further improved by taking evolutionary information into consideration. The frequency profiles are calculated directly from the multiple sequence alignments output by PSI-BLAST and converted into binary profiles using a probability threshold. As a result, each protein sequence is represented as a sequence of binary profiles rather than a sequence of amino acids. Analogous to the residue interface propensities, a class of novel propensities at the profile level is introduced. Binary profiles can be viewed as novel building blocks of proteins and have been successfully applied in many computational biology tasks, such as domain boundary prediction, knowledge-based mean force potentials and protein remote homology detection. Experimental results show that the binary profile propensities significantly improve the performance of protein binding site prediction.
Results and discussion
Residue interface propensities
The four kinds of complexes have similar residue interface propensities. In all of them, hydrophobic residues (F, I, L, M, V) and some polar aromatic residues (W, Y, H) are favored in the interface area. The charged residue R also prefers the interface area. Other polar amino acids (T, E) and the small amino acids P and A are disfavored in the interface. The same phenomena have been observed by others, although some researchers evaluated the ASA contribution of each amino acid [3, 38], whereas we count residues. Biophysically similar residues, such as L and I, or D and E, usually show similar trends, indicating the reliability of the data.
There are minor differences among the four kinds of complexes. Although many amino acids show the same preference (interface or surface) in all four kinds of complexes, the magnitudes of their propensities differ. Furthermore, some amino acids show different preferences in different complexes: amino acids Q, S and T are preferred in hetero-complexes rather than homo-complexes.
Amino acids C and L are favored in permanent complexes rather than transient complexes. Ofran and Rost found that the composition of all interface types differed substantially from that of SWISS-PROT. Here we conclude that the residue interface propensities show general trends, with minor differences among the different kinds of complexes.
Binary profile interface propensities
The binary profile frequencies in the interface differ from those in the surface area, and these differences can be used to derive discriminative binary profile propensities. In theory, the total number of binary profiles is extremely large ($2^{20}$), but in practice only a small fraction of them appear; this fraction depends on the choice of the probability threshold $P_h$ and on the dataset. Based on the cross-validation results (see the next section), the four kinds of complexes have different numbers of binary profiles, ranging from one hundred to several thousand. The binary profiles and their propensities for the four kinds of complexes are listed in Additional files 1, 2, 3 and 4. Binary profiles that occur fewer than three times are ignored, since they are not statistically significant and may introduce considerable noise.
An increased propensity of hydrophobic residues and their combinations in the interface is observed, for example in the binary profiles FHWY and ILMV. Although some amino acids are preferred in the surface, their combinations with other amino acids may be preferred in the interface, as with AEP and ST. Another notable phenomenon is that some binary profiles occur only in the interface, while others occur only in the surface area. The former are assigned the maximum propensity (set to 4) and the latter the minimum propensity (set to -4). Each kind of complex has many such binary profiles.
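The counting, filtering and capping rules described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code: the function name `binary_profile_propensities` is hypothetical, binary profiles are assumed to be given as amino acid combination strings such as `"ILMV"`, and the "fewer than three occurrences" rule is assumed to apply to the combined interface and surface count.

```python
import math
from collections import Counter

MAX_PROPENSITY = 4.0   # assigned to profiles seen only in the interface
MIN_PROPENSITY = -4.0  # assigned to profiles seen only in the surface
MIN_OCCURRENCE = 3     # assumption: threshold on the combined interface + surface count


def binary_profile_propensities(interface_profiles, surface_profiles):
    """Return {binary_profile: propensity} with P_b = ln(P_{b,I} / P_{b,S})."""
    c_i = Counter(interface_profiles)
    c_s = Counter(surface_profiles)
    n_i = sum(c_i.values())
    n_s = sum(c_s.values())
    propensities = {}
    for b in set(c_i) | set(c_s):
        if c_i[b] + c_s[b] < MIN_OCCURRENCE:
            continue  # too rare to be statistically meaningful
        if c_s[b] == 0:
            propensities[b] = MAX_PROPENSITY  # occurs only in the interface
        elif c_i[b] == 0:
            propensities[b] = MIN_PROPENSITY  # occurs only in the surface
        else:
            propensities[b] = math.log((c_i[b] / n_i) / (c_s[b] / n_s))
    return propensities
```

Only the one-sided cases are capped here; finite log ratios are returned unchanged, since the text does not say whether they are also clipped to the range [-4, 4].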
The differences of binary profile interface propensities among the four kinds of complexes
Comparative results with and without propensities
The baseline SVM takes the sequence profiles and ASA of spatially neighboring residues as input; these are common input features in previous studies [15, 44, 51]. We then add the amino acid or binary profile interface propensities as an extra feature to evaluate whether these propensities improve performance. All results are obtained by five-fold cross-validation.
Comparative results with and without residue interface propensities on the four kinds of complexes.
Cross-validation results with binary profile interface propensities
Comparative results with propensities from other complexes
Comparative results with residue interface propensities from other complexes.
Comparative results with binary profile interface propensities from other complexes.
The performance in Table 4 is close to that in Table 2, which indicates that the differences in residue interface propensities among different complexes are negligible for prediction. The performance in Table 5 decreases significantly compared with that in Table 3, so the binary profile interface propensities are sensitive to the type of complex. In other words, propensities at the profile level give a more precise description of interfaces than propensities at the residue level.
Comparison with conservation scores
Please refer to  for details of the calculation and comparison of these scores.
Cross-validation results with conservation scores
All these conservation scores are positively correlated with the binary profile interface propensities, although the Pearson correlation coefficients are small (0.017, 0.053 and 0.064 for $V_{\mathrm{entropy}}$, $V_{\mathrm{Karlin}}$ and $V_{\mathrm{valdar}}$, respectively). The improvement obtained with conservation scores is much smaller than that obtained with binary profile interface propensities.
Results on the protein-protein docking benchmark 2.0 dataset.
The results are better than those of related work. Liang et al. developed an empirical scoring function for binding site prediction, which is a weighted combination of energy scores, conservation scores and residue interface propensities. They achieved a precision of 0.294 and a recall of 0.305, giving an overall F1 of only 0.30. Their method is trained on a small dataset (only 57 proteins). Furthermore, their method is a simple combination of three features, whereas ours is based on a discriminative model.
In this study, the residue interface propensities of four kinds of complexes (hetero-permanent, hetero-transient, homo-permanent and homo-transient complexes) are collected and applied to the prediction of protein binding sites. These propensities are improved by taking evolutionary information into consideration, which yields the binary profile interface propensities. Although there are minor differences among the four kinds of complexes, the residue interface propensities are not discriminative enough to identify the complicated interfaces of proteins. The binary profile interface propensities significantly improve the performance of protein binding site prediction, which indicates that propensities at the profile level are more accurate than those at the residue level.
A comprehensive set of complexes is chosen from the Protein Data Bank (PDB) and then subjected to a number of stringent filtering steps. All multi-chain, non-NMR structures with a resolution better than 4 Å are selected. Two chains in a protein are considered an interacting pair if at least two non-hydrogen atoms in each chain are separated by no more than 5 Å [42, 56].
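The contact criterion can be illustrated with a brute-force distance check. This sketch assumes that each chain is given as a list of (x, y, z) coordinates of its non-hydrogen atoms, and it reads the criterion as "at least two heavy atoms of each chain lie within 5 Å of some heavy atom of the other chain"; both the input format and that reading are assumptions, and the helper names `atoms_near` and `chains_interact` are hypothetical.

```python
import math

CONTACT_CUTOFF = 5.0  # Angstrom


def atoms_near(chain_a, chain_b, cutoff=CONTACT_CUTOFF):
    """Heavy atoms of chain_a lying within `cutoff` of any heavy atom of chain_b."""
    return [a for a in chain_a if any(math.dist(a, b) <= cutoff for b in chain_b)]


def chains_interact(chain_a, chain_b, min_atoms=2, cutoff=CONTACT_CUTOFF):
    """Assumed reading: each chain contributes >= min_atoms heavy atoms to the contact."""
    return (len(atoms_near(chain_a, chain_b, cutoff)) >= min_atoms
            and len(atoms_near(chain_b, chain_a, cutoff)) >= min_atoms)
```

The pairwise loop is quadratic in the number of atoms; for whole PDB entries a spatial grid or k-d tree would normally replace it.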
For PDB structures with more than two chains, each chain is selected at most once. For a protein chain that interacts with multiple partners, only the partner with the most interfacial residues is kept. Protein chains with fewer than 40 amino acids are removed. The PQS web server is used to eliminate crystal packing complexes and retain biologically functional multimers. The selected chains are further filtered so that no pair of chains shares more than 25% sequence identity. Finally, a total of 1139 chains are obtained.
Classification of complexes
Protein-protein interactions can be divided into different types according to different criteria. In this study, complexes are classified by the homology of the interacting chains (homo versus hetero) and by the lifetime of the complex (transient versus permanent).
Using simple sequence comparisons, complexes can be classified as homo-complexes or hetero-complexes. Two interacting protein chains are defined as a homo-complex if more than 90% of their residues are aligned and the sequence identity over the aligned region is more than 95%. All other complexes are classified as hetero-complexes.
A permanent complex is usually very stable and thus exists only in its complexed form. In contrast, the components of a transient complex can also exist separately. Transient and permanent complexes are distinguished in the same way as by Ofran and Rost. The guidelines for classifying hetero-complexes and homo-complexes into permanent and transient states differ and are briefly described here. If the chains of a hetero-complex are stored in the same SWISS-PROT entry, the complex is classified as hetero-permanent; otherwise it is classified as hetero-transient. All homo-complexes that are annotated as monomers in the DIP database are classified as homo-transient complexes; otherwise they are classified as homo-permanent complexes.
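The two classification rules can be combined into a single decision function. In this hypothetical sketch, the alignment coverage and sequence identity are assumed to be precomputed fractions (coverage taken as, say, the smaller of the two chains' aligned fractions), and the SWISS-PROT and DIP look-ups are reduced to two booleans; none of these names come from the original work.

```python
def is_homo_complex(aligned_fraction, identity):
    """Homo if >90% of residues are aligned and identity over the alignment is >95%."""
    return aligned_fraction > 0.90 and identity > 0.95


def classify_complex(aligned_fraction, identity, same_swissprot_entry, dip_monomer):
    """Return one of the four complex types used in this study.

    same_swissprot_entry: both chains come from the same SWISS-PROT entry (hetero rule).
    dip_monomer: the protein is annotated as a monomer in DIP (homo rule).
    """
    if is_homo_complex(aligned_fraction, identity):
        return "homo-transient" if dip_monomer else "homo-permanent"
    return "hetero-permanent" if same_swissprot_entry else "hetero-transient"
```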
Summary of the four complexes
Calculation of propensities
The amino acid frequencies in the interface differ from those in the rest of the surface. This difference is used to define the residue interface propensity as the log ratio of the amino acid frequency in the interface area to that in the surface area:
$$P_a = \ln\left(\frac{P_{a,I}}{P_{a,S}}\right), \qquad P_{a,I} = \frac{C_{a,I}}{C_I}, \qquad P_{a,S} = \frac{C_{a,S}}{C_S}$$
where $P_{a,I}$ and $P_{a,S}$ are the frequencies of amino acid $a$ in the interface and surface areas, $C_{a,I}$ is the count of amino acid $a$ in the interface area, $C_I$ is the total number of amino acids in the interface area, $C_{a,S}$ is the count of amino acid $a$ in the surface area and $C_S$ is the total number of amino acids in the surface area. The residue interface propensity describes how likely an amino acid is to be found in the interface area compared with the surface area. A propensity of 0 indicates that the amino acid has the same frequency in the interface and surface areas; a positive propensity means that the amino acid is over-represented in the interface area.
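As a minimal sketch (the function name is hypothetical and residues are assumed to be given as one-letter codes), the maximum-likelihood frequencies and the log ratio can be computed directly from counts. Amino acids that never occur on the surface are simply skipped here, which is an assumption; the text does not say how such cases are handled for single residues.

```python
import math
from collections import Counter


def residue_interface_propensities(interface_residues, surface_residues):
    """P_a = ln(P_{a,I} / P_{a,S}), with P_{a,I} = C_{a,I}/C_I and P_{a,S} = C_{a,S}/C_S."""
    c_i = Counter(interface_residues)   # C_{a,I}
    c_s = Counter(surface_residues)     # C_{a,S}
    n_i = sum(c_i.values())             # C_I
    n_s = sum(c_s.values())             # C_S
    return {
        a: math.log((c_i[a] / n_i) / (c_s[a] / n_s))
        for a in c_i
        if c_s[a] > 0  # assumption: skip residues never seen on the surface
    }
```

For example, with interface residues `"LLRA"` and surface residues `"LAAR"`, L gets ln 2, A gets -ln 2 and R gets 0.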
At the binary profile level, each protein sequence is represented as a sequence of binary profiles rather than a sequence of amino acids: each amino acid is replaced by the corresponding binary profile derived from the multiple sequence alignment, as described in the following section. The binary profile interface propensities are calculated in the same way as the residue interface propensities, except that the subscripts denote binary profiles rather than amino acids:
$$P_b = \ln\left(\frac{P_{b,I}}{P_{b,S}}\right)$$
where $P_b$ is the propensity of binary profile $b$, $P_{b,I}$ is the frequency of binary profile $b$ in the interface area and $P_{b,S}$ is its frequency in the surface area. These frequencies are also calculated by maximum likelihood estimation, in the same manner as for amino acids. According to the experimental results, the binary profile interface propensity, which contains evolutionary information, provides more accurate prediction of binding sites than the amino acid interface propensity.
An example of calculating the propensities of binary profiles
When the probability threshold $P_h$ is set to 0.08, the following binary profile is obtained:
Collecting the non-zero terms of this binary profile gives the amino acid combination AGLN. Suppose the frequency of AGLN is 0.00042 in the interface area and 0.00021 in the surface area, as calculated by maximum likelihood estimation using equations (5) and (6). The propensity of AGLN is then 0.693147 (ln(0.00042/0.00021)) by equation (7).
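The worked example can be reproduced in a few lines. The column order of the 20-element vector is assumed to be alphabetical by one-letter code (`ACDEFGHIKLMNPQRSTVWY`), which is consistent with the combinations written in the text (AGLN, ILMV, FHWY) but is not stated explicitly; the vector here is rebuilt from the combination AGLN given in the text.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order (alphabetical one-letter codes)

# 20-element binary profile with 1s at A, G, L and N
binary_profile = [1 if aa in "AGLN" else 0 for aa in AMINO_ACIDS]

# collect the non-zero terms to obtain the amino acid combination
combination = "".join(aa for aa, bit in zip(AMINO_ACIDS, binary_profile) if bit)

# propensity = ln(P_{b,I} / P_{b,S})
p_interface, p_surface = 0.00042, 0.00021
propensity = math.log(p_interface / p_surface)

print(combination, round(propensity, 6))  # prints: AGLN 0.693147
```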
Generating binary profiles
A binary profile can be expressed as a 20-dimensional vector in which each element represents one amino acid and takes the value 0 or 1. A value of 1 means that the corresponding amino acid may occur at that position during evolution; a value of 0 means that it may not. A binary profile can also be expressed as an amino acid combination, i.e. the substring obtained by collecting the amino acids whose elements are non-zero. Each combination of the twenty amino acids corresponds to exactly one binary profile and vice versa. The process of generating the binary profiles is described below.
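The one-to-one mapping between 20-element binary vectors and amino acid combinations can be sketched as a pair of conversion functions. The function names and the alphabetical element order are assumptions made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed element order of the 20-dimensional vector


def vector_to_combination(vector):
    """Collect the amino acids whose element is 1, e.g. a vector -> 'ILMV'."""
    if len(vector) != len(AMINO_ACIDS):
        raise ValueError("a binary profile has exactly 20 elements")
    return "".join(aa for aa, bit in zip(AMINO_ACIDS, vector) if bit == 1)


def combination_to_vector(combination):
    """Inverse mapping: element i is 1 iff amino acid i is in the combination."""
    return [1 if aa in combination else 0 for aa in AMINO_ACIDS]


# Every subset of the 20 amino acids is a distinct binary profile.
NUM_POSSIBLE_PROFILES = 2 ** len(AMINO_ACIDS)
```

Because every subset is allowed, there are 2**20 = 1048576 possible profiles, which is why only the small fraction actually observed in the data is tabulated.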
PSI-BLAST is used to generate the profiles of amino acid sequences with the parameters j = 3 and e = 0.001. The search is performed against the NCBI non-redundant (NR) database. The frequency profiles are obtained directly from the multiple sequence alignments output by PSI-BLAST. The target frequency reflects the probability of an amino acid occurring at a given position of the sequence. The target frequencies are calculated in a way similar to that implemented in PSI-BLAST.
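The conversion from target frequencies to binary profiles is described only as applying a probability threshold $P_h$. The sketch below assumes that an amino acid is kept when its frequency is at least $P_h$ (whether the comparison is strict is not stated) and that each profile row holds 20 frequencies in alphabetical one-letter order; the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order of the frequency profile


def binarize_position(frequencies, p_h):
    """Keep every amino acid whose target frequency reaches the threshold p_h."""
    if len(frequencies) != len(AMINO_ACIDS):
        raise ValueError("expected 20 target frequencies per position")
    return "".join(aa for aa, f in zip(AMINO_ACIDS, frequencies) if f >= p_h)


def to_binary_profile_sequence(frequency_profile, p_h=0.08):
    """Replace each sequence position by its binary profile (amino acid combination)."""
    return [binarize_position(row, p_h) for row in frequency_profile]
```

A higher threshold keeps fewer amino acids per position, so the number of distinct binary profiles (and hence the sparsity of the propensity table) depends directly on $P_h$.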
The support vector machine (SVM) is a class of supervised learning algorithms first introduced by Vapnik. Given a set of labelled training vectors (positive and negative examples), an SVM learns a decision boundary that is linear in a (possibly kernel-induced) feature space and separates the two classes. The resulting classification rule can be used to classify new test examples. SVMs have shown excellent performance in practice and rest on the strong theoretical foundation of statistical learning theory. Here, the LIBSVM package is used as the SVM implementation with a radial basis function (RBF) kernel. The kernel parameter γ and the regularization parameter C are set to 0.005 and 1, respectively.
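For reference, the RBF kernel has the form K(x, y) = exp(-γ‖x - y‖²), which is the form LIBSVM uses, and a trained model scores a window x as a weighted sum of kernel values against its support vectors plus a bias. The sketch below evaluates that decision function with γ = 0.005; the support vectors, coefficients and bias are placeholders that would come from training, and the function names are hypothetical.

```python
import math

GAMMA = 0.005  # kernel width used in this study


def rbf_kernel(x, y, gamma=GAMMA):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def decision_value(x, support_vectors, dual_coefs, bias, gamma=GAMMA):
    """f(x) = sum_i coef_i * K(sv_i, x) + bias; f(x) > 0 is read as 'interface'.

    coef_i = alpha_i * y_i and bias come from training (placeholders here).
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias
```

With LIBSVM's command-line tools, the same settings correspond to `svm-train -t 2 -g 0.005 -c 1`.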
The input to the SVM is a window containing a surface residue and its 12 spatially nearest surface residues. Interface residues are positive samples and non-interface surface residues are negative samples. The input features are the sequence profiles, accessible surface areas and propensities of the residues in the window. The sequence profiles are taken from the Position-Specific Score Matrix (PSSM) output by PSI-BLAST. All input values are scaled to the range [-1, 1] before being fed to the SVM.
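The window construction and scaling can be sketched as below. This assumes one representative 3D coordinate per surface residue (for example, the Cα atom or the side-chain centroid; the text does not say which) and per-feature min-max scaling with bounds taken from the training data, similar to what LIBSVM's `svm-scale` tool does by default. The helper names are hypothetical.

```python
import math

NEIGHBOURS = 12  # spatially nearest surface residues added to each window


def spatial_window(center, coords, k=NEIGHBOURS):
    """Indices of surface residue `center` and its k nearest surface residues.

    coords: one representative (x, y, z) per surface residue (an assumption).
    """
    others = [i for i in range(len(coords)) if i != center]
    others.sort(key=lambda i: math.dist(coords[center], coords[i]))
    return [center] + others[:k]


def scale_feature(values, lo, hi):
    """Map values linearly from [lo, hi] (training-set bounds) to [-1, 1]."""
    if hi == lo:
        return [0.0] * len(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]
```

The per-residue features (PSSM row, ASA and propensity) of the 13 indices returned by `spatial_window` would then be concatenated into one input vector.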
SVMs are known to perform poorly on unbalanced datasets. In this dataset, only 27.3% of the surface residues are interface residues, so a classifier trained on all surface residues would be biased towards predicting residues as non-interface. To address this, non-interface surface residues are randomly sampled so that the ratio of positive to negative samples is 1:1. Five-fold cross-validation is then used to evaluate the SVM. The whole dataset is randomly divided into five subsets with approximately equal numbers of chains, and each SVM is run five times with different training and test sets. In each run, three subsets are used for training, one subset is used to select the optimal parameters and the remaining subset is used for testing.
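The balancing and data-splitting protocol can be sketched as follows. Which subset serves for parameter selection in each run is not specified; this sketch assumes it is the fold following the test fold in a fixed rotation, and it assumes (as in this dataset) that negatives outnumber positives. The function names are hypothetical.

```python
import random


def balance(positives, negatives, rng):
    """Randomly undersample negatives so that positives:negatives = 1:1."""
    if len(negatives) < len(positives):
        raise ValueError("expected more negative than positive samples")
    return list(positives), rng.sample(list(negatives), len(positives))


def five_fold_runs(chain_ids, rng, k=5):
    """Yield (train, validation, test) chain lists for each of the k runs.

    Assumption: in run r, fold r is the test set, fold r+1 (mod k) selects
    parameters, and the remaining three folds are used for training.
    """
    ids = list(chain_ids)
    rng.shuffle(ids)
    folds = [ids[i::k] for i in range(k)]  # approximately equal numbers of chains
    for r in range(k):
        val = (r + 1) % k
        train = [c for j, fold in enumerate(folds) if j not in (r, val) for c in fold]
        yield train, folds[val], folds[r]
```

Splitting by chain rather than by residue keeps all residues of a chain in the same fold, so neighboring residues of one protein never appear in both training and test data.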
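The prediction performance is evaluated with the following measures:
$$\mathrm{Precision} = \frac{TP}{TP+FP}, \qquad \mathrm{Recall} = \frac{TP}{TP+FN}, \qquad F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
$$\mathrm{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}, \qquad CC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$
where TP is the number of true positives (interface residues correctly classified as interface residues), FP is the number of false positives (surface residues incorrectly classified as interface residues), TN is the number of true negatives (surface residues correctly classified as surface residues) and FN is the number of false negatives (interface residues incorrectly classified as surface residues).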
Precision, recall and F1 are used to measure the performance of classifying interface residues, while accuracy is used to measure the performance of classifying the whole test dataset. Correlation coefficient (CC) is applied to measure the correlation between predictions and actual test data.
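The measures named above can be computed from the confusion-matrix counts as sketched below. CC is assumed here to be the Matthews correlation coefficient, a common choice in interface prediction studies that is not named explicitly in the text; the function names are hypothetical. The `f1` helper also reproduces the F1 of about 0.30 quoted for Liang et al. from their precision (0.294) and recall (0.305).

```python
import math


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def evaluate(tp, fp, tn, fn):
    """Precision, recall, F1, accuracy and (Matthews) correlation coefficient."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1(precision, recall),
            "accuracy": accuracy, "cc": cc}
```

For example, `f1(0.294, 0.305)` evaluates to about 0.299, which rounds to the 0.30 reported for Liang et al.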
The authors would like to thank Xuan Liu for her comments on this work, which significantly improved the presentation of the paper. Financial support is provided by the National Natural Science Foundation of China (60673019 and 60435020).
- Zhang Z, Grigorov MG: Similarity networks of protein binding sites. Proteins 2006, 62(2):470–478. 10.1002/prot.20752View ArticlePubMedGoogle Scholar
- Chelliah V, Chen L, Blundell TL, Lovell SC: Distinguishing structural and functional restraints in evolution in order to identify interaction sites. J Mol Biol 2004, 342(5):1487–1504. 10.1016/j.jmb.2004.08.022View ArticlePubMedGoogle Scholar
- Jones S, Thornton JM: Analysis of protein-protein interaction sites using surface patches. J Mol Biol 1997, 272(1):121–132. 10.1006/jmbi.1997.1234View ArticlePubMedGoogle Scholar
- Magliery TJ, Regan L: Sequence variation in ligand binding sites in proteins. BMC Bioinformatics 2005, 6: 240. 10.1186/1471-2105-6-240PubMed CentralView ArticlePubMedGoogle Scholar
- Lo Conte L, Chothia C, Janin J: The atomic structure of protein-protein recognition sites. J Mol Biol 1999, 285(5):2177–2198. 10.1006/jmbi.1998.2439View ArticlePubMedGoogle Scholar
- Bradford JR, Westhead DR: Improved prediction of protein-protein binding sites using a support vector machines approach. Bioinformatics 2005, 21(8):1487–1494. 10.1093/bioinformatics/bti242View ArticlePubMedGoogle Scholar
- Nooren IM, Thornton JM: Structural characterisation and functional significance of transient protein-protein interactions. J Mol Biol 2003, 325(5):991–1018. 10.1016/S0022-2836(02)01281-0View ArticlePubMedGoogle Scholar
- Bradford JR, Needham CJ, Bulpitt AJ, Westhead DR: Insights into protein-protein interfaces using a Bayesian network prediction method. J Mol Biol 2006, 362(2):365–386. 10.1016/j.jmb.2006.07.028View ArticlePubMedGoogle Scholar
- Chakrabarti P, Janin J: Dissecting protein-protein recognition sites. Proteins 2002, 47(3):334–343. 10.1002/prot.10085View ArticlePubMedGoogle Scholar
- Pils B, Copley RR, Schultz J: Variation in structural location and amino acid conservation of functional sites in protein domain families. BMC Bioinformatics 2005, 6: 210. 10.1186/1471-2105-6-210PubMed CentralView ArticlePubMedGoogle Scholar
- Lichtarge O, Bourne HR, Cohen FE: An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol 1996, 257(2):342–358. 10.1006/jmbi.1996.0167View ArticlePubMedGoogle Scholar
- Morgan DH, Kristensen DM, Mittelman D, Lichtarge O: ET viewer: an application for predicting and visualizing functional sites in protein structures. Bioinformatics 2006, 22(16):2049–2050. 10.1093/bioinformatics/btl285View ArticlePubMedGoogle Scholar
- Yao H, Kristensen DM, Mihalek I, Sowa ME, Shaw C, Kimmel M, Kavraki L, Lichtarge O: An accurate, sensitive, and scalable method to identify functional sites in protein structures. J Mol Biol 2003, 326(1):255–261. 10.1016/S0022-2836(02)01336-0View ArticlePubMedGoogle Scholar
- Yao H, Mihalek I, Lichtarge O: Rank information: a structure-independent measure of evolutionary trace quality that improves identification of protein functional sites. Proteins 2006, 65(1):111–123. 10.1002/prot.21101View ArticlePubMedGoogle Scholar
- Chung JL, Wang W, Bourne PE: Exploiting sequence and structure homologs to identify protein-protein binding sites. Proteins 2006, 62(3):630–640. 10.1002/prot.20741View ArticlePubMedGoogle Scholar
- Cheng G, Qian B, Samudrala R, Baker D: Improvement in protein functional site prediction by distinguishing structural and functional constraints on protein family evolution using computational design. Nucleic Acids Res 2005, 33(18):5861–5867. 10.1093/nar/gki894PubMed CentralView ArticlePubMedGoogle Scholar
- Panchenko AR, Kondrashov F, Bryant S: Prediction of functional sites by analysis of sequence and structure conservation. Protein Sci 2004, 13(4):884–892. 10.1110/ps.03465504PubMed CentralView ArticlePubMedGoogle Scholar
- Valdar WS: Scoring residue conservation. Proteins 2002, 48(2):227–241. 10.1002/prot.10146View ArticlePubMedGoogle Scholar
- La D, Sutch B, Livesay DR: Predicting protein functional sites with phylogenetic motifs. Proteins 2005, 58(2):309–320. 10.1002/prot.20321View ArticlePubMedGoogle Scholar
- Kim Y, Subramaniam S: Locally defined protein phylogenetic profiles reveal previously missed protein interactions and functional relationships. Proteins 2006, 62(4):1115–1124. 10.1002/prot.20830View ArticlePubMedGoogle Scholar
- Liu AH, Zhang X, Stolovitzky GA, Califano A, Firestein SJ: Motif-based construction of a functional map for mammalian olfactory receptors. Genomics 2003, 81(5):443–456. 10.1016/S0888-7543(03)00022-3View ArticlePubMedGoogle Scholar
- Wang B, Chen P, Huang DS, Li JJ, Lok TM, Lyu MR: Predicting protein interaction sites from residue spatial sequence profile and evolution rate. FEBS Lett 2006, 580(2):380–384. 10.1016/j.febslet.2005.11.081View ArticlePubMedGoogle Scholar
- Yan C, Dobbs D, Honavar V: A two-stage classifier for identification of protein-protein interface residues. Bioinformatics 2004, 20(Suppl 1):I371-I378. 10.1093/bioinformatics/bth920View ArticlePubMedGoogle Scholar
- Bordner AJ, Abagyan R: REVCOM: a robust Bayesian method for evolutionary rate estimation. Bioinformatics 2005, 21(10):2315–2321. 10.1093/bioinformatics/bti347View ArticlePubMedGoogle Scholar
- Thibert B, Bredesen DE, Del Rio G: Improved prediction of critical residues for protein function based on network and phylogenetic analyses. BMC Bioinformatics 2005, 6(1):213. 10.1186/1471-2105-6-213PubMed CentralView ArticlePubMedGoogle Scholar
- Zhou HX, Shan Y: Prediction of protein interaction sites from sequence profile and residue neighbor list. Proteins 2001, 44(3):336–343. 10.1002/prot.1099View ArticlePubMedGoogle Scholar
- Meiler J, Baker D: ROSETTALIGAND: protein-small molecule docking with full side-chain flexibility. Proteins 2006, 65(3):538–548. 10.1002/prot.21086View ArticlePubMedGoogle Scholar
- Osterberg F, Morris GM, Sanner MF, Olson AJ, Goodsell DS: Automated docking to multiple target structures: incorporation of protein mobility and structural water heterogeneity in AutoDock. Proteins 2002, 46: 34–40. 10.1002/prot.10028View ArticlePubMedGoogle Scholar
- Laurie AT, Jackson RM: Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bioinformatics 2005, 21(9):1908–1916. 10.1093/bioinformatics/bti315View ArticlePubMedGoogle Scholar
- Zhang C, Liu S, Zhu Q, Zhou Y: A knowledge-based energy function for protein-ligand, protein-protein, and protein-DNA complexes. J Med Chem 2005, 48(7):2325–2335. 10.1021/jm049314dView ArticlePubMedGoogle Scholar
- Torrance JW, Bartlett GJ, Porter CT, Thornton JM: Using a library of structural templates to recognise catalytic sites and explore their evolution in homologous families. J Mol Biol 2005, 347(3):565–581. 10.1016/j.jmb.2005.01.044View ArticlePubMedGoogle Scholar
- Ivanisenko VA, Pintus SS, Grigorovich DA, Kolchanov NA: PDBSite: a database of the 3D structure of protein functional sites. Nucleic Acids Res 2005, (33 Database):D183–187.
- Wilczynski B, Hvidsten TR, Kryshtafovych A, Tiuryn J, Komorowski J, Fidelis K: Using local gene expression similarities to discover regulatory binding site modules. BMC Bioinformatics 2006, 7: 505. 10.1186/1471-2105-7-505
- Snyder KA, Feldman HJ, Dumontier M, Salama JJ, Hogue CW: Domain-based small molecule binding site annotation. BMC Bioinformatics 2006, 7: 152. 10.1186/1471-2105-7-152
- Neuvirth H, Raz R, Schreiber G: ProMate: a structure based prediction program to identify the location of protein-protein binding sites. J Mol Biol 2004, 338(1):181–199. 10.1016/j.jmb.2004.02.040
- Res I, Mihalek I, Lichtarge O: An evolution based classifier for prediction of protein interfaces without using protein structures. Bioinformatics 2005, 21(10):2496–2501. 10.1093/bioinformatics/bti340
- Yan C, Terribilini M, Wu F, Jernigan RL, Dobbs D, Honavar V: Predicting DNA-binding sites of proteins from amino acid sequence. BMC Bioinformatics 2006, 7: 262. 10.1186/1471-2105-7-262
- Liang S, Zhang C, Liu S, Zhou Y: Protein binding site prediction using an empirical scoring function. Nucleic Acids Res 2006, 34(13):3698–3707. 10.1093/nar/gkl454
- Rossi A, Marti-Renom MA, Sali A: Localization of binding sites in protein structures by optimization of a composite scoring function. Protein Sci 2006.
- Down T, Leong B, Hubbard TJ: A machine learning strategy to identify candidate binding sites in human protein-coding sequence. BMC Bioinformatics 2006, 7: 419. 10.1186/1471-2105-7-419
- Deng H, Chen G, Yang W, Yang JJ: Predicting calcium-binding sites in proteins – a graph theory and geometry approach. Proteins 2006, 64(1):34–42. 10.1002/prot.20973
- Chen H, Zhou HX: Prediction of interface residues in protein-protein complexes by a consensus neural network method: test against NMR data. Proteins 2005, 61(1):21–35. 10.1002/prot.20514
- Dubey A, Realff MJ, Lee JH, Bommarius AS: Support vector machines for learning to identify the critical positions of a protein. J Theor Biol 2005, 234(3):351–361. 10.1016/j.jtbi.2004.11.037
- Koike A, Takagi T: Prediction of protein-protein interaction sites using support vector machines. Protein Eng Des Sel 2004, 17(2):165–173. 10.1093/protein/gzh020
- Li MH, Lin L, Wang XL, Liu T: Protein-protein interaction site prediction based on conditional random fields. Bioinformatics 2007. To be published.
- Ofran Y, Rost B: Analysing six types of protein-protein interfaces. J Mol Biol 2003, 325(2):377–387. 10.1016/S0022-2836(02)01223-8
- Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 1997, 25(17):3389–3402. 10.1093/nar/25.17.3389
- Dong Q, Wang XL, Lin L, Xu Z: Domain boundary prediction based on profile domain linker propensity index. Comput Biol Chem 2006, 30(2):127–133.
- Dong QW, Wang XL, Lin L: Novel knowledge-based mean force potential at the profile level. BMC Bioinformatics 2006, 7: 324. 10.1186/1471-2105-7-324
- Dong QW, Wang XL, Lin L: Protein remote homology detection based on binary profiles. 1st International Conference on Bioinformatics Research and Development BIRD/LNBI 2007. To be published.
- Ofran Y, Rost B: Predicted protein-protein interaction sites from local sequence information. FEBS Lett 2003, 544(1–3):236–239. 10.1016/S0014-5793(03)00456-3
- Sander C, Schneider R: Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins 1991, 9(1):56–68. 10.1002/prot.340090107
- Karlin S, Brocchieri L: Evolutionary conservation of RecA genes in relation to protein structure and function. J Bacteriol 1996, 178(7):1881–1894.
- Valdar WS, Thornton JM: Protein-protein interfaces: analysis of amino acid conservation in homodimers. Proteins 2001, 42(1):108–124. 10.1002/1097-0134(20010101)42:1<108::AID-PROT110>3.0.CO;2-O
- Kouranov A, Xie L, de la Cruz J, Chen L, Westbrook J, Bourne PE, Berman HM: The RCSB PDB information portal for structural genomics. Nucleic Acids Res 2006, (34 Database):D302–305. 10.1093/nar/gkj120
- Bordner AJ, Abagyan R: Statistical analysis and prediction of protein-protein interfaces. Proteins 2005, 60(3):353–366. 10.1002/prot.20433
- Henrick K, Thornton JM: PQS: a protein quaternary structure file server. Trends Biochem Sci 1998, 23(9):358–361. 10.1016/S0968-0004(98)01253-5
- Nooren IM, Thornton JM: Diversity of protein-protein interactions. Embo J 2003, 22(14):3486–3492. 10.1093/emboj/cdg359
- Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, et al.: The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res 2006, (34 Database):D187–191. 10.1093/nar/gkj161
- Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM, Eisenberg D: DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res 2002, 30(1):303–305. 10.1093/nar/30.1.303
- Kabsch W, Sander C: Dictionary of Secondary structure in Proteins: Pattern Recognition of Hydrogen-bonded and Geometrical Features. Biopolymers 1983, 22(12):2577–2637. 10.1002/bip.360221211
- Vapnik VN: Statistical learning theory. New York: Wiley; 1998.
- Chang CC, Lin CJ: LIBSVM: a library for support vector machines. 2001. [http://www.csie.ntu.edu.tw/~cjlin/libsvm]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.