HomPPI: a class of sequence homology based protein-protein interface prediction methods
© Xue et al; licensee BioMed Central Ltd. 2011
Received: 25 May 2010
Accepted: 17 June 2011
Published: 17 June 2011
Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate.
We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence.
Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein.
Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/.
Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners.
Protein-protein interactions are central to protein function; they constitute the physical basis for formation of complexes and pathways that carry out virtually all major cellular processes. These interactions can be relatively permanent or "obligate" (e.g., in subunits of an RNA polymerase complex) or "transient" (e.g., kinase-substrate interactions in a signalling network). Both the distortion of protein interfaces in obligate complexes and aberrant recognition in transient complexes can lead to disease.
With the increasing availability of high throughput experimental data, two related problems have come to the forefront of research on protein interactions: i) prediction of protein-protein interaction partners; and ii) prediction of protein binding sites or protein-protein interfaces (PPIs). Although most effort to date has focused on one or the other of these problems, it is possible to use information from predicted protein-protein interaction networks as input for interface prediction methods, and predicted interface residues can be used as input for interaction partner predictions, a concept explored in a recent study of Yip et al. In the current study, we focus on the prediction of protein-protein interfaces, specifically, the use of sequence homology-based methods to predict which residues of a query protein participate in its physical interaction with a partner protein or proteins.
Computational Prediction of Protein-Protein Interfaces
Several different genetic, biochemical, and biophysical methods have been used to identify and characterize protein interfaces. These experiments are very valuable and have contributed greatly to our knowledge of protein-protein interfaces. However, the high cost in time and resources required for these experiments calls for reliable computational approaches to identify interface residues. In addition to providing important clues to the biological function of novel proteins, computational predictions can reduce the search space required for docking two polypeptides.
To distinguish interface residues from non-interface surface residues, a wide range of sequence, physicochemical and structural features have been investigated [3–18], and many in silico approaches to protein-protein interface prediction have been explored in the literature (reviewed in [19–21]). Protein-protein interface prediction algorithms can be classified into three categories: (i) sequence-based methods, which use only the primary amino acid sequence of the query protein as input [3, 22–28]; (ii) structure-based methods, which make use of information derived from the structure of the query protein [5, 18, 29–31]; and (iii) methods that use both sequence and structure derived information in making predictions [32, 33].
Several sequence-based protein-protein interface prediction methods have been explored in the literature [3, 22–28]. Most, if not all, of these methods extract, for each residue in the query protein, a fixed-length window that includes the target residue and a fixed number of its sequence neighbours. Each residue is classified as an interface residue or a non-interface residue based on features of the amino acids in the corresponding window. The methods differ both in the specific machine learning algorithms or statistical methods employed and in the specific features of the amino acids used. Commonly used features include the identity of the amino acids in the window, the amino acid composition of interfaces, the physicochemical properties of the amino acids, and the degree of conservation of the amino acids (obtained by aligning the query sequence with homologous sequences). Some studies report substantial improvements in interface residue prediction when predicted structural properties, e.g., solvent surface accessibility and secondary structure of the residues, are utilized.
A number of structure-based methods [5, 18, 29–31] or hybrid methods that combine both sequence and structure-derived information [32, 33] have been proposed for predicting protein interfaces. The performance of the best-performing sequence-based methods is generally lower than that of structure-based methods (see  for a comparison). A possible explanation for the difference in the performance of sequence-based and structure-based protein interface residue predictors is that the latter can trivially eliminate non-surface residues from the set of candidate interface residues and potentially exploit a rich set of features derived from the 3D structures.
The use of structure-based methods, however, is limited to proteins for which the structure of the query protein is available, and the number of solved structures significantly lags behind the number of protein sequences . Even when the structure of a query protein is available, the application of structure-based prediction methods is complicated by conformational changes that take place when some proteins bind to their partners. Structure-based methods rely on structural features extracted from the structure in the unbound state or from a bound complex that has been separated into constituent proteins. It is unclear whether such structural features are indeed reliable predictors of interfaces for proteins that undergo significant conformational changes upon binding [20, 36]. Moreover, higher organisms have a large number of intrinsically disordered proteins/regions (IDPs/IDRs) that undergo induced folding only after binding to their partners . Such disordered regions - for which experimental structure information is, by definition, lacking - participate in many important cellular recognition events, and are believed to contribute to the ability of some hub proteins to interact with multiple partners in protein-protein interaction networks . Hence, there is an urgent need for sequence-based methods for reliable prediction of protein-protein interfaces.
Analysis of Interface Residue Conservation
The relationship between sequence conservation and various aspects of protein structure, interaction, expression, and function has been the focus of many studies over the past decades [39, 40], and sequence homology-based methods have been used for predicting both protein structure and protein function [41–53]. Thus, it is natural to ask whether protein-protein interface residues can be reliably identified using sequence homology-based methods. Published studies disagree on whether protein-protein interfaces are more conserved than the rest of the protein sequences. Grishin and Phillips, after examining five enzyme families, concluded that the degree of conservation of interfaces is the same as that of protein sequences as a whole. The studies by Caffrey et al. as well as Reddy and Kaznessis found that the interacting surface patches are not significantly more conserved than other surface patches. Caffrey et al., based on their study of 64 protein-protein interacting chains, found that interface residues are slightly more conserved than the rest of the protein surface residues. Reddy and Kaznessis, based on their study of 28 hetero transient and non-transient complexes, found that the fraction of highly conserved interface residues is greater than that of highly conserved non-interface surface residues. They suggested that the number of conserved residue positions is more predictive of protein-protein binding sites than the average conservation index of residues in the target patch. Choi et al. analyzed 2,646 protein interfaces based on a conservation score that measures the position-specific evolutionary rate estimated using a phylogenetic tree, and concluded that protein interface residues are more conserved than non-interface surface residues.
Despite the disagreement regarding whether interface residues are conserved or not, several researchers have used conservation of residues to predict protein-protein interfaces with varying degrees of success. For example, the Evolutionary Trace (ET) method [59, 60] and its variants [58, 61–64] calculate a conservation score for each residue using a phylogenetic tree built from a multiple sequence alignment. Residues with conservation scores above a certain threshold are mapped onto the 3D structure of the protein to identify putative binding sites. Carl et al. used a dataset of sixteen transient protein chains to explore the feasibility of predicting protein-protein binding sites based on their membership in structurally conserved surface patches (where conserved patches are identified using structural alignment of a query protein with one or more of its structural homologs). Bordner and Abagyan and Wang et al. calculated the evolutionary rate for each amino acid of protein sequences using phylogenetic trees, and used the evolutionary rate as an attribute along with other physicochemical and sequential attributes to train an SVM classifier for interface residue prediction. Panchenko et al. predicted functional sites of proteins using spatial averages of sequence conservation scores. Shoemaker et al. have recently developed a web server for predicting protein binding sites by inspecting homologous proteins with similar structures. Based on a statistical analysis of target-template sequence alignments on a benchmark dataset of 329 two-chain complexes, Kundrotas and Vakser have shown that it is possible to obtain high quality alignment of interface residues even when the overall alignment quality is rather poor. Specifically, they concluded that in approximately 50% of the complexes considered, the overall accuracy of the modelled interfaces was good enough for guiding docking.
Overview of the Paper
Against this background, we study a class of sequence homology-based methods for protein-protein interface prediction. We introduce a novel measure of interface conservation that captures the degree to which interface residues in each protein are conserved among its sequence homologs. First, we describe the results of our analysis of the interface conservation among homologous sequences using several large non-redundant datasets of protein-protein interfaces extracted from the Protein Data Bank (PDB) , including datasets that allow us to compare "obligate" versus "transient" interfaces. To explore the extent to which interface conservation can be exploited in the prediction of interface residues, we systematically examined the relationship between interface conservation and six sequence-based variables. In one set of experiments, we examined binding interfaces in homologous proteins without specifying a specific interaction partner (i.e., non-partner specific, NPS-interfaces). The results of this analysis indicated that interfaces in obligate complexes are, in general, more highly conserved than those in transient complexes. In a complementary set of experiments, we examined interfaces in complexes between specific pairs of proteins (i.e., partner-specific, PS-interfaces). In contrast to the results for NPS-interfaces, by focusing on the interface of each query protein with a specific binding partner, we discovered a high degree of sequence conservation in transient PS-interfaces. This analysis revealed that transient interfaces tend to be highly partner-specific.
Second, based on the results of protein interface conservation analysis we propose HomPPI, a class of sequence homology-based approaches to protein interface prediction. We present two variants of HomPPI: (i) NPS-HomPPI (non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. The performance of both HomPPI methods was evaluated on several benchmark datasets, including a large non-redundant set of transient complexes. Due to the increasing importance of intrinsically disordered proteins in understanding molecular recognition mechanics and in rational drug design and discovery [72–75], we also tested NPS-HomPPI on two datasets of intrinsically disordered proteins.
Finally, we compare the performance of HomPPI with that of other web-based servers for interface residue prediction, using several performance measures that assess the reliability of correctly predicting, on average, interface and non-interface residues in a given protein. We discuss the relative advantages and limitations of homology-based methods for interface residue prediction.
To define conditions under which it should be possible to infer protein-protein interface (PPI) residues using conservation of interfaces in homologous proteins and/or complexes, we systematically examined the relationship between interface residue conservation and sequence similarity (based on BLAST alignments). Our analyses are based on the following datasets: Nr6505 (a large non-redundant dataset of protein chains extracted from PDB ), Oblig94 and Trans135 (a non-redundant obligate/transient binding dataset taken from ), and nr_pdbaa_s2c (BLAST database) (see Methods for additional details).
Conservation of PPIs in Non-Partner Specific (NPS) Interfaces
First, we examined the conservation of PPI residues in the absence of knowledge of interaction partners. For this study, we analyzed interfaces in putative homologs (hereafter, we refer to putative homologs as "homologs" for simplicity) of each protein in a large non-redundant dataset, Nr6505. After removing chains with interfaces containing fewer than 3 amino acids, we were left with 5853 chains. For each of the 5853 remaining proteins, we extracted homologs from the nr_pdbaa_s2c database using BLASTP with expectation value (EVal) ≤ 10. From the resulting set of homologs, we eliminated those that were nearly identical to the query sequence (to ensure an accurate estimate of conservation). To ensure that the interface residues of the homologs could be reliably determined, we retained only those homologs that were part of complexes with resolution 3.5 Å or better. For each query-homolog pair in sequence alignments generated by BLASTP, we used the interface residues of the homolog(s) to predict the interface residues of the query protein. We calculated the correlation coefficient (CC) between the predicted and actual interface residues of the query protein, and refer to this value as the interface conservation (IC) score, i.e., the degree of conservation of interface residues between the query protein and its homologs (see Methods for details).
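The IC score computation described above can be illustrated with a minimal sketch. It assumes that the CC is the Matthews correlation coefficient over binary interface labels and that the pairwise alignment is supplied as two gapped strings spanning the whole query; both are assumptions made for illustration (the exact definitions are given in Methods), and the function names are hypothetical.

```python
from math import sqrt


def transfer_labels(query_aln, homolog_aln, homolog_labels):
    """Transfer 0/1 interface labels from a homolog to the query.

    query_aln and homolog_aln are gapped strings of equal length ('-' = gap).
    homolog_labels holds one label per ungapped homolog residue.
    Query residues aligned to a gap are labelled non-interface (assumption).
    """
    if len(query_aln) != len(homolog_aln):
        raise ValueError("aligned strings must have equal length")
    predicted = []
    h = 0  # index into the ungapped homolog sequence
    for q_char, h_char in zip(query_aln, homolog_aln):
        label = 0
        if h_char != "-":
            label = homolog_labels[h]
            h += 1
        if q_char != "-":
            predicted.append(label)
    return predicted


def interface_conservation(predicted, actual):
    """Correlation coefficient between predicted and actual 0/1 labels."""
    if len(predicted) != len(actual):
        raise ValueError("label sequences must have equal length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    denom = sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    # Degenerate case (e.g. no predicted interface residues): report 0.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

For example, `interface_conservation(transfer_labels(q, h, labels), actual)` gives the IC score of a single query-homolog pair under these assumptions.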
We examined the dependence of the interface conservation score on six NCBI BLAST alignment statistics: Expectation value (EVal), Identity Score, Positive Score, Local Alignment Length (LAL) and two Alignment Length Fractions (LAL/Query Length) and (LAL/Homolog Length). The EVal is a statistic that estimates the number of hits expected by chance when searching a database of a particular size; the lower the EVal, the more significant the score. The Identity Score is a measure of the degree of sequence identity between two amino acid sequences. The Positive Score returned by BLASTP is the number of positive-scoring matches in an alignment. It takes into account observed substitutions that preserve the physicochemical properties of the original residue. The LAL is the length of the local alignment; Alignment Length Fractions are LAL normalized by the length of the query or the length of the identified homologous sequence. We represent each query-homolog pair as a six-dimensional vector defined by these six variables.
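As an illustration of this representation, the sketch below assembles the six-dimensional vector for one query-homolog pair. The `BlastHit` container and its field names are hypothetical, and expressing the Identity and Positive Scores as percentages of the local alignment length is an assumption made here for readability.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Hypothetical container for one BLASTP query-homolog alignment."""
    evalue: float
    identities: int        # number of identical aligned positions
    positives: int         # number of positive-scoring aligned positions
    alignment_length: int  # local alignment length (LAL)
    query_length: int
    homolog_length: int


def alignment_features(hit):
    """Return (EVal, Identity %, Positive %, LAL, LAL/query, LAL/homolog)."""
    lal = hit.alignment_length
    return (
        hit.evalue,
        100.0 * hit.identities / lal,  # assumed percentage form
        100.0 * hit.positives / lal,   # assumed percentage form
        lal,
        lal / hit.query_length,
        lal / hit.homolog_length,
    )
```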
Principal Components Analysis of NPS-interface Conservation Space
As a first exploratory step, PCA (Principal Component Analysis) was applied to visualize the relationships between the interface conservation (IC) scores and the six BLAST alignment statistics. PCA, which is a dimensionality reduction technique, is typically used to represent dimensions that explain maximum variability and provide a simple and parsimonious description of the covariance structure .
PCA allows us to identify highly correlated explanatory variables. In Figure 1, the axes of the original 6-dimensional space are represented as blue vectors with red circles at their tips in the 2-dimensional space defined by PC1 and PC2. Highly correlated vectors (variables) have small angles between them. This type of analysis reveals, for example, that the two Alignment Length Fractions are highly correlated with each other, as are the Positive Score and Identity Score. Explanatory variables that are highly correlated with each other make similar contributions to the IC score.
BLAST EVal is a strong indicator of NPS-interface conservation
NPS-interface conservation in Twilight/Safe Zone is strongly positively correlated with log(LAL)
A high BLAST Positive Score reflects NPS-interface conservation
We also studied the relationship of IC score with the Identity Score, and the Local Alignment Length Fractions (LAL/Query Length) and (LAL/Homolog Length). As expected, the Identity Score results were similar to those for the Positive Score. The IC score was not as strongly linearly related to LAL fraction as it was to the log(LAL) (data not shown). Taken together, these results provide guidelines for choosing sequence similarity thresholds that reflect the degree of conservation in NPS interfaces.
NPS-Interface Conservation in Transient versus Obligate Binding Proteins
In light of reports that protein interfaces in transient complexes are not as conserved as those in obligate (permanent) complexes, it is interesting to ask whether the query-homolog pairs with near-zero IC scores (Figure 2 and Figure 3) tend to involve proteins that participate in transient interactions. To address this question, we further studied the differences in protein interface conservation among proteins that participate in transient versus obligate interactions.
To compare protein interfaces in transient and obligate complexes, we used the previously published Trans135 and Oblig94 datasets, which include a total of 270 chains from transient and 188 chains from obligate complexes. We extracted the homologs of each chain from nr_pdbaa_s2c using BLASTP with EVal ≤ 10. Query and homolog proteins with interfaces containing fewer than 3 amino acids were removed, as were homologs that were nearly identical to the query proteins. We extracted 43,115 query-homolog pairs containing chains that participate in transient interactions and 24,212 pairs containing chains that participate in obligate interactions.
Figure 6c reveals an obvious pattern of interface conservation in obligate binding proteins: a strong trend of increasing median IC score with decreasing log(EVal). In contrast, Figure 6d shows that for transient binding proteins, more of the median values of IC scores cluster around 0, indicating that log(EVal) has little relation to interface conservation in transient complexes.
Also, comparison of Figure 6e and 6f reveals that the Positive Score is a good indicator of interface conservation in the case of proteins from obligate complexes; however, this is not the case for proteins from transient complexes. For obligate binding proteins, when the Positive Score exceeds 45%, the medians of IC scores begin to show an increasing trend (Figure 6e). In contrast, in the case of transient binding proteins, medians of IC scores do not begin to increase until the Positive Score approaches 70% (Figure 6f).
It is important to emphasize that all of the interfaces analyzed above are what we refer to as "non partner-specific" (NPS). That is, the interface residues of a query protein represent the complete set of its interface residues with all of its partners. However, a given query protein can interact with different binding partners through different interfaces. A possible explanation for the low IC scores for NPS-transient interfaces is that the union of all interface residues of a transient binding protein is not highly conserved across its homologs. This does not preclude the possibility that such interfaces are conserved in the context of partner-specific interactions. We investigate this possibility in the following section.
Conservation of PPIs in Partner-Specific (PS) Interfaces
To examine the conservation of partner-specific (PS) interfaces in transient protein complexes, we again used the Trans135 dataset of protein pairs that participate in transient interactions . For each of the proteins in an interacting pair, we separately extracted the corresponding homologs, using BLASTP with expectation value EVal≤10 against the nr_pdbaa_s2c database. We removed homologs that are part of complexes with resolution worse than 3.5 Å. If query proteins A and B form a complex A-B, and have homologs A' and B' that interact in a complex A'-B', we consider A'-B' as a homo-interolog of A-B. To ensure an accurate estimate of conservation, from the resulting set of homo-interologs, we eliminated those that were within the same PDB complex as the query proteins, and those that were nearly identical to the query pairs (see Methods for additional details). For each protein chain in a query pair, we use the interface residues of its homolog in a homo-interolog to infer the PS interface residues of the query protein chain. Thus, we use the interface residues of A' in the homo-interolog (A'-B') of query pair A-B to infer the interfaces of A with B, based on the sequence alignment between A and A' obtained using BLASTP. We measure the similarity between a pair of interacting proteins A-B and its homo-interolog A'-B', in terms of the metrics for the quality of sequence alignment between A and A' and between B and B', using the six BLAST alignment statistics described above.
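The homo-interolog definition above can be sketched as follows. Chains are represented as hypothetical `(pdb_id, chain_id)` tuples and known interactions as a set of unordered chain pairs; these data structures are assumptions for illustration. The sketch excludes candidates from the query's own PDB entry but omits the resolution and near-identity filters described in the text.

```python
def find_homo_interologs(query_pdb, homologs_a, homologs_b, interacting):
    """Return pairs (A', B') that interact within one complex.

    homologs_a, homologs_b: lists of (pdb_id, chain_id) homologs of A and B.
    interacting: set of frozenset({chain1, chain2}) for interacting chains.
    Pairs from the same PDB entry as the query are skipped.
    """
    result = []
    for a in homologs_a:
        for b in homologs_b:
            if a[0] != b[0] or a == b:
                continue  # A' and B' must be distinct chains of one complex
            if a[0] == query_pdb:
                continue  # exclude the query's own complex
            if frozenset((a, b)) in interacting:
                result.append((a, b))
    return result
```

Given homologs A' and B' returned by two separate BLASTP searches, only those pairs that physically interact in the same structure are retained as candidate templates for the query pair A-B.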
We used PCA of 3,456 candidate homo-interologs to explore the relationship between interface conservation (IC score) and the six alignment statistics computed from the predicted PS interfaces, e.g., of chain A when it interacts with B, using known interfaces of A' with B'. This analysis revealed that much of the observed variance in IC scores is explained by three factors: (i) the average log(EVal); (ii) the average Positive Score of the homo-interolog; and (iii) the alignment fractions Frac_A, Frac_A', Frac_B, and Frac_B' computed from the alignments of constituent chains (A with A' and B with B') (see Methods for additional details).
A comparison of the results for PS-interface conservation in transient complexes here (Figure 7a and 7b) with those obtained for NPS-interface conservation in transient complexes above (Figure 6d and 6f), reveals that the conservation of transient interfaces can be detected easily when the binding partner sequence information is utilized. The seemingly weak conservation of interfaces in transient complexes shown in Figure 6 is thus a consequence of the specificity of transient interfaces for different partners. Therefore, we conclude that interfaces in transient complexes are both highly partner-specific and highly conserved, when their partner-specificity is taken into account.
HomPPI - Homologous Sequence-Based Protein-Protein Interface Prediction
NPS-HomPPI - Given a query protein sequence, NPS-HomPPI searches the nr_pdbaa_s2c database to identify homologous proteins that are components of experimentally determined complexes with one or more other proteins. NPS-HomPPI labels a residue of the query sequence as an "interface" residue if a majority of the residues aligned to it in a selected subset of homologs are interface residues, and as a "non-interface" residue otherwise. Specifically, given a query protein, NPS-HomPPI first searches for sequence homologs within the Safe Zone. If at least one homolog in the Safe Zone is found, NPS-HomPPI uses the Safe homolog(s) to infer the interfaces of the query protein. Otherwise, the process is repeated to search for homologs in the Twilight Zone and then the Dark Zone. If no homologs of the query protein can be identified in any of the three zones, NPS-HomPPI does not provide any predictions. The Safe, Twilight, and Dark Zone homologs of the query protein sequence to be used for interface prediction are identified by searching the nr_pdbaa_s2c database using BLASTP with thresholds based on the interface conservation analysis (see Methods Section for details). To allow unbiased evaluation of the performance of NPS-HomPPI, the query sequence and any highly similar sequences from the same species as the query sequence are removed first.
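The zone fallback and majority vote described for NPS-HomPPI can be sketched as below. This is an illustrative outline, not the server's implementation: homolog labels are assumed to be already mapped onto query positions (with `None` where a homolog is unaligned), ties are assumed to count as non-interface, and the zone thresholds themselves (given in Methods) are not reproduced.

```python
ZONES = ("safe", "twilight", "dark")  # searched in this order


def majority_vote(label_rows):
    """Label each query position by strict majority over aligned homologs.

    label_rows: one list per homolog, with 1 (interface), 0 (non-interface)
    or None (homolog not aligned at that query position).
    """
    n = len(label_rows[0])
    prediction = []
    for i in range(n):
        votes = [row[i] for row in label_rows if row[i] is not None]
        is_interface = bool(votes) and 2 * sum(votes) > len(votes)
        prediction.append(1 if is_interface else 0)
    return prediction


def nps_predict(homologs_by_zone):
    """Use the first non-empty zone; return (zone, labels) or (None, None)."""
    for zone in ZONES:
        rows = homologs_by_zone.get(zone, [])
        if rows:
            return zone, majority_vote(rows)
    return None, None  # no homologs in any zone: no prediction is made
```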
PS-HomPPI - Given the sequences of a query protein A and its putative binding partner B, PS-HomPPI searches the nr_pdbaa_s2c database to identify homologous complexes i.e., the homo-interologs of A-B. PS-HomPPI labels a residue of the query sequence as an "interface" residue (with respect to its putative binding partner) if a majority of the residues in the corresponding position in homologous complexes are interface residues, and as "non-interface" residues otherwise. PS-HomPPI uses homo-interologs in Safe and Twilight Zones to make predictions. The PS-HomPPI prediction process is thus analogous to that for NPS-HomPPI, using thresholds for "close homo-interologs" based on the results of interface conservation analysis of PS-interface conservation (see Methods Section for additional details).
Performance Evaluation of HomPPI Methods
We report several performance measures that provide estimates of the reliability of interface (and non-interface) residue predictions obtained using the HomPPI family of predictors. We compare the performance of HomPPI predictors with several state-of-the-art interface prediction methods on a benchmark dataset. We evaluate the effectiveness of HomPPI in predicting the interface residues of disordered proteins. Finally, we compare the partner-specific and non-partner-specific versions of HomPPI.
We focus our discussion on results using several performance measures that assess the effectiveness of the methods in reliably predicting, on average, the interface and non-interface residues of any given protein (See Methods for details). However, because several of the published studies report performance measures that assess the effectiveness of the methods in reliably assigning interface versus non-interface labels, on average, to any given protein residue, we also include results using "residue-based" performance measures in Supplementary Materials (See http://homppi.cs.iastate.edu/supplementaryData.html).
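The distinction between the two kinds of performance measures can be made concrete with a small sketch, using sensitivity, TP/(TP+FN), as the example. The function names are hypothetical, and skipping proteins without any actual interface residues in the per-protein average is an assumption made for illustration.

```python
def sensitivity(predicted, actual):
    """TP / (TP + FN) over 0/1 labels; None if there are no actual positives."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    return tp / (tp + fn) if (tp + fn) else None


def protein_based_sensitivity(proteins):
    """Average of per-protein sensitivities; each protein counts equally."""
    values = [sensitivity(p, a) for p, a in proteins]
    values = [v for v in values if v is not None]
    return sum(values) / len(values)


def residue_based_sensitivity(proteins):
    """Sensitivity over all residues pooled; each residue counts equally."""
    pooled_pred = [x for p, _ in proteins for x in p]
    pooled_actual = [x for _, a in proteins for x in a]
    return sensitivity(pooled_pred, pooled_actual)
```

With one small, perfectly predicted protein and one large, poorly predicted protein, the protein-based average is pulled towards the mean of the two per-protein values, whereas the residue-based value is dominated by the larger protein.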
(i) NPS-HomPPI Performance on the Benchmark180 Dataset
[Table: Boundaries of Safe, Twilight and Dark Zones used by NPS-HomPPI. Row labels include Twilight Zone 1 and Twilight Zone 2.]
[Table: Interface Residue Prediction Performance of NPS-HomPPI on Benchmark180. Row labels include Hetero-dimer - Obligate and Homo-dimer - Obligate.]
[Table: Prediction Performance of NPS-HomPPI using Homologs from the Safe, Twilight, Dark Zones. Row labels: Enzyme-inhibitor - Transient; Non-enzyme-inhibitor - Transient (NEIT); Hetero-dimer - Obligate; Homo-dimer - Obligate.]
(ii) Comparison of NPS-HomPPI with other PPI Prediction Servers
Direct comparison of NPS-HomPPI with other methods described in the literature is complicated by the limited availability of implementations of the underlying methods (many of which are available only in the form of servers), and differences in the choice of training and evaluation datasets, evaluation procedures and evaluation measures. Hence, we limit our comparison of HomPPI to five state-of-the-art methods available as web-based servers: Promate, Cons-PPISP [33, 81], meta-PPISP, PIER and PSIVER. All of these methods except PSIVER take advantage of both the sequence and the experimentally determined structure of the query proteins. They have been reported to be among the best performing methods currently available for predicting PPIs (see [20, 21] for reviews). PSIVER is one of the most recently published methods for interface residue prediction that uses only protein sequence-derived information. Although direct comparisons of the data representation and the algorithms used by PSIVER with those used by other sequence-based interface residue predictors are currently not available, PSIVER has been reported to outperform two other sequence-based servers: ISIS and the sequence-based variant (made available as an experimental version in 2008) of SPPIDER.
Promate samples the protein surface using circular patches around a set of anchoring dots and estimates the probability that each surface dot belongs to an interface, based on the distribution of various physicochemical properties within interface and non-interface patches. Cons-PPISP is a consensus method that combines six neural networks trained on six datasets. Meta-PPISP is a consensus method that combines the output from cons-PPISP, Promate, and PINUP . PIER relies on partial least squares (PLS) regression of surface patch properties of the query protein. PSIVER uses PSSM profiles and predicted solvent accessibility as input features, and uses a Naïve Bayes classifier with parameters obtained using kernel density estimation. Because NPS-HomPPI does not take structural information into account, to compare its performance with the structure-based servers, we mapped the interfaces predicted by each server onto the full sequence of each query protein in order to evaluate prediction performance on the entire protein sequence.
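The mapping step mentioned above can be sketched as follows. We assume here, for illustration only, that a structure-based server returns labels for a subset of residues (e.g., surface residues) indexed by 0-based sequence position, and that residues without a server prediction are treated as non-interface; the function name is hypothetical.

```python
def map_to_full_sequence(sequence_length, server_predictions):
    """Expand partial server output to one 0/1 label per query residue.

    server_predictions: dict {0-based residue index: 0 or 1}.
    Residues absent from the dict are labelled non-interface (assumption).
    """
    full = [0] * sequence_length
    for index, label in server_predictions.items():
        if not 0 <= index < sequence_length:
            raise IndexError(f"residue index {index} outside the sequence")
        full[index] = label
    return full
```

After this expansion, predictions from structure-based and sequence-based servers have the same length and can be scored with the same performance measures.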
For the two transient complex types, enzyme-inhibitors (Figure 8a) and non-enzyme-inhibitor transient complexes (Figure 8b), NPS-HomPPI consistently outperforms Promate, PIER, meta-PPISP, cons-PPISP, and PSIVER except for sensitivity values lower than 0.2 (which are too low to be useful in practice). On both obligate heterodimers (Figure 8c) and homodimers (Figure 8d), NPS-HomPPI outperforms all five servers across the full range of sensitivity and specificity values for which it can generate homology-based predictions. It should be noted that structure-based methods predict which surface residues are interface residues. In contrast, sequence-based methods have the more challenging task of identifying interface residues from the set of all residues. In other words, structure-based methods can trivially eliminate all non-surface residues from the set of candidate interface residues. Viewed in this light, the observed predictive performance of NPS-HomPPI, a purely sequence-based method, suggests that it is possible to make reliable non-partner-specific interface residue predictions using only the sequence of a protein by taking advantage of the conservation of interfaces in the context of non-partner-specific interactions.
(iii) Performance of NPS-HomPPI on Intrinsically Disordered Proteins
Intrinsically disordered proteins (IDPs) and proteins containing intrinsically disordered regions (IDRs) are attractive targets for drug discovery . The lack of defined tertiary structure in IDPs/IDRs poses a major challenge to structure-based interface prediction methods. Hence, we compared the performance of NPS-HomPPI with ANCHOR , a recently published method for the prediction of binding regions in disordered proteins. For this comparison, we used two non-redundant disordered protein datasets, S1 and S2, recently collected by Meszaros et al. . Some of the test proteins are based on data from NMR structures. In order to compare NPS-HomPPI with ANCHOR on the largest possible number of cases available to us, we extracted interface residues from these NMR cases; however, we used only sequence homologs with interface residues determined from X-ray structures to make predictions.
At present, NPS-HomPPI has relatively high prediction coverage for long disordered proteins (78%; 31 out of 40 interfaces of disordered proteins), but lower coverage for short disordered proteins (50%; 28 out of 56 interfaces of disordered proteins). This is in part due to the fact that many disordered proteins available in the PDB have only NMR structures, which were excluded from the current study. Incorporation of data from NMR structures in the future can be expected to increase the coverage of NPS-HomPPI for disordered proteins.
(iv) Performance of NPS-HomPPI versus PS-HomPPI
Our analysis of the conservation of PS-transient interfaces described earlier suggests that many interfaces in transient protein complexes are highly partner-specific. Thus, we implemented a variant of HomPPI, designated PS-HomPPI, to evaluate the possibility that prediction of interface residues, especially in transient complexes, can be improved by using sequence information about specific binding partners, when available.
We first evaluated the performance of PS-HomPPI on a transient complex dataset, Trans135 (dimers from the dataset in ). PS-HomPPI found at least one homo-interolog that meets the Safe or Twilight similarity thresholds for 60% (162/270) of the proteins in the Trans135 dataset. Overall, PS-HomPPI had an average CC of 0.65, sensitivity of 0.69, specificity of 0.70 and accuracy of 0.92.
To investigate whether the partner information is, in fact, helpful in predicting interfaces, we directly compared the performance of PS-HomPPI with that of NPS-HomPPI on the Trans135 dataset. In Trans135, there were 139 out of 270 chains for which predictions could be generated by both NPS-HomPPI (using homologs) and PS-HomPPI (using homo-interologs) from the Safe or Twilight zones (see Methods for details).
Protein Interface Conservation across Structure Space
The study of protein interface conservation among proteins with similar structures has received considerable attention in recent years. By analyzing the structural similarity of representative protein-protein interfaces in dimeric proteins, Gao and Skolnick  showed that the vast majority of native interfaces have a close structural neighbor with similar backbone Cα geometry and interface contact pattern.
In a related study, Zhang et al.  explored the conservation of interface residues among structural neighbors of a query protein (i.e., proteins that share the same SCOP family, superfamily or fold, or a high degree of structural similarity regardless of their SCOP classification). They showed that: (i) interfaces are indeed conserved among structural neighbors; (ii) the degree of interface conservation is most significant among proteins that have a clear evolutionary relationship. They further showed that conservation of interface residues among structural neighbors can be successfully exploited to predict protein-protein interfaces based on protein structure information.
To investigate the extent to which conservation of interface residues can be used to improve the prediction of protein-protein interfaces based on protein sequence information, we systematically studied interface conservation across sequence space. Our results demonstrate that protein interfaces from different binding types are conserved among proteins with homologous sequences. We further showed that the degree of conservation of interfaces is even greater when putative interaction partners are taken into account. The IC score, our measure of interface conservation, unlike those used in previous studies  (e.g., residue conservation in sequence alignments), makes direct use of experimentally determined interface residues to measure the degree of interface conservation. Specifically, the IC score directly measures the extent to which the interface residues of sequence homologs of a query protein are predictive of the interface residues of a query protein. Hence, the IC score provides the basis for setting the parameters of our sequence homology-based interface prediction methods.
Distance Functions for Identifying Putative Homologs with Conserved Interfaces
Because we do not know the IC score for a query sequence with unknown interface residues, we identified several statistics associated with the BLASTP alignment of a query sequence with its homologs that are correlated with the IC score. We found that interface residues of a query protein can be reliably predicted from the known interfaces of its homologs (and in the case of partner-specific predictions, the homologs of its interaction partner as well) when the homologs are selected taking into account measures of quality of sequence alignment, specifically NCBI BLAST sequence alignment statistics. The HomPPI methods presented here use simple linear combinations of BLAST sequence alignment statistics, determined using PCA analysis of the relationship between the statistics and the IC score. It would be interesting to explore optimal, perhaps non-linear, combinations of parameters to maximize the desired performance criteria (e.g., sensitivity, specificity, or some combination thereof).
Conservation of Interfaces in Obligate and Transient Complexes
Our results are consistent with previous studies , in that we found interface residues to be more highly conserved than non-interface residues, in both obligate and transient complexes. We also found that when information regarding the specific binding partner of a query protein is not taken into account in estimating the conservation score, interfaces in transient complexes appear to be less highly conserved than those in obligate complexes. Our results further show that transient interfaces are highly partner-specific, and that the partner-specific interfaces in transient complexes are, in fact, highly conserved. Interfaces of intrinsically disordered proteins that nevertheless form ordered complexes with globular proteins are also highly conserved (see below).
Interfaces of Disordered Proteins Are Highly Conserved and Non Partner-Specific
Compared with its performance on transient binding proteins in the Benchmark180 dataset, NPS-HomPPI performs much better on interfaces of disordered proteins in the S1 and S2 datasets. This is consistent with the conclusion of Meszaros et al.  that interfaces of intrinsically disordered proteins are evolutionarily conserved. The high degree of conservation of interface (binding) regions in IDPs also reflects the important biological functions in which many disordered proteins participate. It is believed that the flexibility of disordered binding regions may facilitate the binding of IDPs using the same set of binding residues to different binding partners (at different times) . Our results suggest that this specialized disorder-to-order transition as a result of binding may be associated with a high degree of interface conservation.
The conservation of interfaces in IDPs may contribute to the generally successful application of interface residue predictors to interfaces in IDPs. Several groups have developed methods for predicting disordered binding regions, including PONDR VL-XT [92, 93], ANCHOR , and other examples reviewed in , that have produced encouraging results. The success of these predictors suggests that at least some sequence features are likely to be conserved within binding regions of different IDPs.
The fact that disordered interfaces can be reliably inferred by NPS-HomPPI indicates that disordered interfaces are non-partner-specific, which is consistent with findings that these proteins are able to bind a broad range of ligands through common binding regions [95, 96].
Performance of HomPPI Compared with Published Methods
Our results show that whenever the interfaces of the close sequence homologs of a query protein are available, NPS-HomPPI outperforms several state-of-the-art protein interface prediction servers (many of which take advantage of the structure of the query protein), over a broad range of sensitivity and specificity values. In the case of transient complexes (Figure 8a and 8b), NPS-HomPPI consistently outperforms Promate, PIER, meta-PPISP, cons-PPISP, and PSIVER except for sensitivity values lower than 0.2. On obligate dimers (Figure 8c and 8d), NPS-HomPPI significantly outperforms all five servers across the full range of sensitivity and specificity values for which it can generate homology-based predictions. These results strongly suggest that it is possible to reliably predict protein interface residues using only sequence information whenever the interface residues of sequence homologs of the query protein are known. Each of the web-based PPI servers with which we compared our NPS-HomPPI server, except PSIVER, takes advantage of the structure of the query proteins to determine surface residues, and restricts the predicted interface residues to a subset of the surface residues. This trivially reduces the number of false positive interface residue predictions (relative to the total number of residues in the query protein) which, in turn, yields a substantial increase in the specificity of interface predictions produced by structure-based servers. Consequently, purely sequence-based protein interface prediction servers have a handicap relative to structure-based prediction servers. When viewed in this light, the performance of NPS-HomPPI relative to the state-of-the-art protein interface prediction methods is especially impressive.
The HomPPI methods for interface residue prediction do have an important limitation, however, in that they rely on the availability of putative homologs for which experimentally determined structures of bound complexes are available in the PDB. One may ask whether the coverage of the HomPPI family of protein-protein interface prediction methods is broad enough to be sufficiently useful in practice. We address this question below.
Prediction Coverage of HomPPI Methods
The current coverage of HomPPI protein interface prediction methods can be assessed from our results as follows:
• Benchmark180 dataset: NPS-HomPPI found at least one homolog that meets the similarity thresholds for Safe or Twilight Zones for 73% (83/114) of the obligate binding chains (homo and hetero-dimers). Among these, 82% (68/83) were predicted with both sensitivity and specificity ≥0.50, simultaneously. Similarly, at least one homolog was found for 62% (42/66) of transient binding chains (enzyme-inhibitors and non-enzyme inhibitors) in this dataset. Among these 55% (23/42) were predicted with both sensitivity and specificity ≥0.5.
• Trans135 dataset: In the case of transient query proteins in the Trans135 dataset, NPS-HomPPI found at least one homolog that meets the similarity thresholds for Safe or Twilight Zones for 75% (202/270) of chains. Among these, 37% (74/202) were predicted with both sensitivity and specificity ≥0.5.
• Disordered protein datasets S1 and S2: In the case of disordered proteins, NPS-HomPPI found at least one homolog that meets the similarity thresholds for Safe or Twilight or Dark Zones for 50% (26/52) of interfaces of disordered proteins in S1, the short disordered protein set, and 75% (30/40) of interfaces of disordered proteins in S2.
• Trans135 dataset: PS-HomPPI found at least one homo-interolog that meets the Safe or Twilight similarity thresholds for 60% (162/270) of the proteins in the Trans135 dataset. Among these, 80% (130/162) were predicted with sensitivity and specificity ≥0.5, simultaneously.
Based on these results, we estimate that, at present, the coverage of the HomPPI protein interface prediction methods is in the range of 60-70% of all query proteins. As the structural genomics projects currently underway generate increasing numbers of structures of protein-protein complexes , we can expect corresponding increases in the coverage of HomPPI family of protein interface prediction methods. In the meantime, one can envision hybrid methods that combine HomPPI with one or more machine learning based methods that do not require the availability of putative homologs for which experimentally determined structures of bound complexes are available in the PDB.
Parameters for HomPPI Can Be Relaxed for Obligate Interactions
The current default parameters for HomPPI are intentionally rather stringently set based on the results of our statistical analysis of interface conservation using Trans135, which is a dataset of transient binding proteins. Our analyses suggest that NPS-HomPPI has wider Safe and Twilight Zones for obligate binding proteins than for transient binding proteins. Furthermore, even Dark Zone homologs yield interface predictions that are accurate enough to be useful in practice, with average specificity of 0.47 and sensitivity of 0.66 for hetero-obligate dimers, average specificity of 0.44 and sensitivity of 0.47 for homo-obligate dimers (see Table 3). Therefore, for obligate interactions, if a query protein has little sequence similarity with proteins in the PDB, the thresholds of NPS-HomPPI can be relaxed to allow identification of more distant homologs with potentially conserved interfaces that still provide reliable interface predictions.
Prediction of Binding Partners vs. Prediction of Interface Residues
Protein interface (binding site) predictions and protein interaction (partner) predictions answer closely related, but different questions. Non partner-specific protein interface predictors are designed to identify the residues in a query protein that are likely to make contact with the residues of one or more unspecified interaction partner proteins. Partner-specific protein interface predictors are designed to identify the residues in a query protein that are likely to make contact with residues of a putative interaction partner protein. In contrast, protein interaction predictors are designed to predict whether or not a given pair of proteins is likely to interact [98–101]. Although our study does not directly address the latter question, it is possible to use PS-HomPPI predictions to determine whether or not two query proteins interact: Given a pair of protein sequences, say A and B, we can first use PS-HomPPI to predict the interface residues of A with its putative partner B; and the interface residues of B with its putative partner A. If, in both cases, some number of interface residues are predicted, we can infer that proteins A and B are likely to interact with each other. Conversely, it is possible to use information from predicted protein-protein interactions to refine interface predictions. Yip et al.  have proposed an approach to utilize residue level information to improve the accuracy of protein level predictions, and vice versa. They have shown that a two-level machine learning framework that allows information flow between the two levels through shared features yields predictions that are more accurate than those obtained independently at each of the levels.
Using Interface Predictions to Steer Docking and to Rank Docked Conformations
Reliable partner-specific interface predictions can be used to restrict the search space for protein-protein docking by specifying the contacts that need to be preserved in the docked conformation. It is also possible to rank the conformations produced by docking, based on the degree of overlap between the interface of a query protein and its binding partner in the docked conformation with the interface generated by a partner-specific interface prediction method, e.g. PS-HomPPI. In related work, we have shown that PS-HomPPI provides reliable interface predictions on a large subset of a Docking Benchmark Dataset, and is both fast and robust in the face of conformational changes induced by complex formation. The quality of the ranking of docked conformations by PS-HomPPI interface prediction is consistently superior to that produced using ClusPro cluster-size-based and energy-based criteria for 61 out of 64 docking complexes for which PS-HomPPI produces interface predictions.
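The ranking idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the related work: the text only speaks of a "degree of overlap", so the Jaccard index used here is an assumption, and the function names (`jaccard_overlap`, `rank_conformations`) and the model identifiers are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard_overlap(predicted: Set[int], docked: Set[int]) -> float:
    """Jaccard index between two sets of residue positions (0.0 if both are empty).

    Assumption: the overlap measure is not specified in the text; Jaccard is
    one simple choice among several.
    """
    union = predicted | docked
    if not union:
        return 0.0
    return len(predicted & docked) / len(union)


def rank_conformations(
    predicted_interface: Set[int],
    docked_interfaces: Dict[str, Set[int]],
) -> List[Tuple[str, float]]:
    """Sort docked models by decreasing overlap with the predicted interface."""
    scored = [
        (model_id, jaccard_overlap(predicted_interface, residues))
        for model_id, residues in docked_interfaces.items()
    ]
    # Sort by score (descending), then by model id so that ties are deterministic.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical residue positions, for illustration only.
    predicted = {10, 11, 12, 40, 41}
    models = {
        "model_1": {10, 11, 12, 13},
        "model_2": {70, 71, 72},
        "model_3": {10, 11, 12, 40, 41},
    }
    for model_id, score in rank_conformations(predicted, models):
        print(f"{model_id}\t{score:.2f}")
```

In this sketch the interface of each docked model is assumed to have been extracted already (for example, with the same distance-based interface definition used to label the training data), so that both sets refer to positions in the same query sequence.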
We studied a large number of sequence alignments between protein pairs with known interfaces to explore the conditions under which conservation of protein interface residues, as determined by the alignment of a query sequence against its homologs/homo-interologs, can be used to reliably predict protein-protein interfaces. Based on the results of these analyses, we developed HomPPI, a simple sequence-based method for predicting interface residues based on the known interface residues in homologous sequences. HomPPI has two variants: NPS-HomPPI (for predicting interface residues of a query protein with unspecified interaction partners) and PS-HomPPI (for predicting interface residues of query proteins with a specified putative interaction partner).
Our systematic evaluation of NPS-HomPPI showed that, when close homologs can be identified, NPS-HomPPI can reliably predict interface residues in both obligate and transient complexes, with a performance that rivals several state-of-the-art structure-based interface prediction servers. NPS-HomPPI can also be used as a reliable tool for identifying disordered binding regions. In this regard, NPS-HomPPI has an advantage over structure-based interface predictors, which cannot be used to predict binding sites in disordered regions of proteins because these regions do not form stable structures in their unbound state. In addition, the HomPPI family of interface prediction methods is fast enough for proteome-wide analyses.
Many studies on in silico identification of protein interfaces have been published in the past decade. However, despite the fact that many proteins are very specific in their choice of binding partners, the majority of studies focus on only one side of the bound complex. In this study, we implemented a novel partner-specific protein interface prediction method, PS-HomPPI, which infers interface residues based on known interfaces in the homo-interologs, i.e., complexes formed by homologs of the query protein and its putative interaction partner. When homo-interologs can be identified, PS-HomPPI can reliably predict highly partner-specific transient interfaces.
Although our focus in this study was on prediction of protein-protein interfaces, these methods could be useful in other settings, such as sequence-based prediction of protein-DNA, protein-RNA, and protein-ligand interfaces, and the prediction of B and T cell epitopes.
Both NPS-HomPPI and PS-HomPPI have been implemented in a server available at: http://homppi.cs.iastate.edu/.
Five datasets were used in this paper:
Nr6505 - For analyzing the protein interface conservation.
Oblig94 and Trans135 - For comparing the degree of conservation of protein interfaces in transient/obligate binding proteins.
Benchmark180 - For evaluating the prediction performance of HomPPI.
S1 and S2 - For evaluating the performance of NPS-HomPPI on interfaces of disordered proteins.
nr_pdbaa_s2c - For BLASTP searching for close sequence homologs
1. Extract all X-ray-derived protein structures with a resolution of 3.5 Å or better from the PDB. Remove proteins with fewer than 40 residues. This yielded 102,853 protein chains.
2. Remove redundancy from the dataset resulting from step 1 using PISCES, so that all remaining sequences have at most 30% sequence similarity. This yielded 6505 chains.
Oblig94 and Trans135
The dataset of 94 obligate protein-protein dimer complexes and the dataset of 135 transient dimer complexes were obtained from a large non-redundant dataset of 115 obligate complexes and 212 transient complexes (3.25 Å or better resolution, determined using X-ray crystallography) previously generated by Mintseris and Weng to study the conservation of protein-protein interfaces. In order to exclude the influence of other types of interfaces, we extracted 94 obligate dimers and 135 transient dimers from the original dataset to obtain Oblig94 and Trans135. In Oblig94, 1QLA has been superseded by 2BS2. In Trans135, 1DN1 and 1IIS have been superseded by 3C98 and 1T83, respectively, and 1F83, 1DF9, 4CPA and 1JCH have since been marked obsolete and hence removed from the PDB.
We tested NPS-HomPPI on a benchmark dataset manually collected and used as evaluation dataset by Bradford and Westhead . This dataset consists of 180 protein chains taken from 149 complexes; 36 of these are involved in enzyme-inhibitor interactions, 27 in hetero-obligate interactions, 87 in homo-obligate interactions, and 30 in non-enzyme-inhibitor transient (NEIT) interactions.
Disordered protein datasets S1 and S2
We evaluated the performance of NPS-HomPPI on a non-redundant disordered dataset that has been recently collected by Meszaros et al. S1 consists of 46 complexes of short disordered and long globular proteins. S2 consists of 28 complexes of long disordered and long globular proteins. Note that a protein complex, e.g., 1fv1 C:AB, formed by a disordered protein C with two ordered proteins A and B, yields two sets of interface residues for C (corresponding to the interfaces of C with A and of C with B). As a result, the 46 complexes in S1 and the 28 complexes in S2 correspond to 56 and 40 interfaces of disordered proteins, respectively. We focused on cases in which NPS-HomPPI is able to identify Safe/Twilight/Dark Zone homologs for the query proteins, resulting in NPS-HomPPI interface predictions for 28 out of 56 and 31 out of 40 interfaces of disordered proteins in S1 and S2, respectively.
This dataset is used for BLASTP searches. We used the FASTA files from the S2C database to generate our BLAST database, nr_pdbaa_s2c. We removed proteins with resolution worse than 3.5 Å from the S2C FASTA-formatted database. We then built a non-redundant database for BLAST queries by grouping proteins with identical sequences into one entry. We used the resulting database to search for homologs of a query sequence using BLASTP 2.2.22+. There are 36,352 sequences and 9,549,671 total residues in nr_pdbaa_s2c.
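The redundancy-removal step described above (grouping proteins with identical sequences into one entry) can be illustrated with a short Python sketch. It is a minimal example under stated assumptions: the input is plain FASTA text, the merged header simply joins the original identifiers with a `|` separator, and the function names are hypothetical. It does not reproduce the S2C file layout or the resolution filter.

```python
from typing import Dict, Iterable, List, Tuple


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA-formatted text lines into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def group_identical(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each distinct sequence to the list of headers that share it."""
    groups: Dict[str, List[str]] = {}
    for header, sequence in records:
        groups.setdefault(sequence.upper(), []).append(header)
    return groups


def to_fasta(groups: Dict[str, List[str]]) -> str:
    """Write one FASTA entry per distinct sequence, joining the merged headers."""
    entries = [f">{'|'.join(headers)}\n{sequence}" for sequence, headers in groups.items()]
    return "\n".join(entries)


if __name__ == "__main__":
    # Hypothetical chain identifiers and sequences, for illustration only.
    text = """>1abcA
MKTAYIAK
>1abcB
MKTAYIAK
>2xyzA
GSHMLEDP
"""
    groups = group_identical(parse_fasta(text.splitlines()))
    print(to_fasta(groups))
```

The resulting FASTA file could then be turned into a searchable database with the standard BLAST+ tooling; that step is outside the scope of this sketch.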
The Proportion of Interface Residues in Datasets used in this Study (table; columns: Number of Interface Residues, Total Number of Residues, % Interface Residues; rows include Disordered S1 and Disordered S2).
Mapping Interfaces in Structures to Sequences
We label the protein sequences as interface or non-interface residues (according to the definition of interface residues given above) as follows: We first calculate the relevant distances between atoms using the atom coordinates in ATOM section in PDB files. Then, by associating the ATOM section to residues in the SEQRES section, we can map the corresponding residues to protein sequences. However, various errors in PDB files make this a non-trivial task. Hence, we used the mapping files from S2C database, which offers corrected mapping information from ATOM section to residues in the SEQRES section of PDB files, to map interfaces determined in structures to full sequences.
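As an illustration of the mapping step, the following Python sketch labels every position of the full (SEQRES) sequence once a SEQRES-to-ATOM residue mapping is available. It is a sketch under stated assumptions: the mapping is represented as a plain dictionary (the actual S2C file format is not parsed here), the function name is hypothetical, and marking residues that are absent from the ATOM section with `?` is an assumption rather than the treatment used in the paper.

```python
from typing import Dict, Optional, Set


def label_full_sequence(
    seqres: str,
    seqres_to_atom: Dict[int, Optional[str]],
    interface_atom_ids: Set[str],
) -> str:
    """Return one label per SEQRES position: '1' interface, '0' non-interface, '?' unobserved.

    seqres_to_atom maps a 0-based SEQRES index to the residue identifier used in
    the ATOM section (e.g. '42' or '42A'), or to None when the residue has no
    coordinates in the structure.
    """
    labels = []
    for index in range(len(seqres)):
        atom_id = seqres_to_atom.get(index)
        if atom_id is None:
            labels.append("?")  # assumption: unobserved residues are flagged, not labelled
        elif atom_id in interface_atom_ids:
            labels.append("1")
        else:
            labels.append("0")
    return "".join(labels)


if __name__ == "__main__":
    # Hypothetical five-residue chain whose first residue is missing from ATOM records.
    sequence = "MKTAY"
    mapping = {0: None, 1: "5", 2: "6", 3: "6A", 4: "7"}
    interface = {"6", "7"}
    print(label_full_sequence(sequence, mapping, interface))
```

Keeping the residue identifiers as strings allows insertion codes such as `6A` to be handled without special cases.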
NCBI BLAST Parameters
BLAST Substitution Matrices and Gap Costs used for BLASTP searches in this paper.
where TP_i, FP_i, TN_i and FN_i are, respectively, the number of interface residues of protein i that are correctly predicted to be interface residues, the number of residues of protein i that are incorrectly predicted to be interface residues, the number of residues of protein i that are correctly predicted to be non-interface residues, and the number of residues of protein i that are incorrectly predicted to be non-interface residues.
where N is the total number of test proteins.
These measures describe different aspects of predictor performance. The overall sensitivity is the probability, on average, of correctly predicting the interface residues of a given protein. The overall specificity is the probability, on average, that a predicted interface residue in any given protein is in fact an interface residue. The overall accuracy corresponds to the fraction of residues in any given protein, on average, that are correctly predicted. The overall Matthews correlation coefficient measures how well predictions correlate, on average, with true interfaces and non-interfaces.
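The verbal definitions above correspond to the following standard expressions, written here as a sketch in conventional notation. Two points are assumptions on our part: the bar notation for the overall (protein-averaged) measures, and the reading of "specificity" as TP/(TP+FP), which is the quantity usually called precision and which matches the statement that specificity is the probability that a predicted interface residue is in fact an interface residue.

```latex
\begin{align*}
\mathrm{Sensitivity}_i &= \frac{TP_i}{TP_i + FN_i}, &
\mathrm{Specificity}_i &= \frac{TP_i}{TP_i + FP_i}, \\[4pt]
\mathrm{Accuracy}_i &= \frac{TP_i + TN_i}{TP_i + FP_i + TN_i + FN_i}, &
CC_i &= \frac{TP_i \, TN_i - FP_i \, FN_i}
             {\sqrt{(TP_i + FN_i)(TP_i + FP_i)(TN_i + FP_i)(TN_i + FN_i)}}, \\[4pt]
\overline{M} &= \frac{1}{N} \sum_{i=1}^{N} M_i
  && \text{for } M \in \{\mathrm{Sensitivity},\ \mathrm{Specificity},\ \mathrm{Accuracy},\ CC\}.
\end{align*}
```

Each per-protein measure is undefined when its denominator is zero (for example, when no residue of protein i is predicted to be an interface residue); how such cases are handled is not stated here.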
Often it is possible to trade off one performance measure (e.g., specificity) against another (e.g., sensitivity) by varying the threshold that is applied to the prediction score to generate the binary (interface versus non-interface) predictions. Hence, we include plots of the overall sensitivity against the overall specificity for different choices of the threshold. The resulting specificity-sensitivity plots (or precision-recall plots) show the trade-off between sensitivity and specificity and hence provide a much more complete picture of predictive performance.
Residue-based specificity-sensitivity plots in this case show how the trade-off between sensitivityR and specificityR is obtained by varying the threshold applied to the prediction score. The residue-based performance measures provide an estimate of the reliability of the predictor in correctly labelling a given residue. However, in practice, it is useful to know how well a predictor can be expected to perform on a given protein sequence as opposed to a residue. For this purpose, the protein-based measures sensitivityP, specificityP, accuracyP, and CCP are more informative than their residue-based counterparts. Hence, in this paper, we report results based on the protein-based measures although, for the purpose of comparison with other published methods, we include the results based on the residue-based measures in the Supplementary Materials on the HomPPI website.
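The difference between the protein-based and residue-based measures can be made concrete with a short Python sketch. It assumes that the protein-based measures average per-protein values over proteins, while the residue-based measures are computed from counts pooled over all residues of all proteins; "specificity" is computed as TP/(TP+FP), in line with the description given earlier. Returning 0.0 when a denominator is zero is an assumption of this sketch, and all names are hypothetical.

```python
from math import sqrt
from typing import Dict, List, NamedTuple


class Counts(NamedTuple):
    """Confusion-matrix counts for one protein (or for a pooled set of residues)."""
    tp: int
    fp: int
    tn: int
    fn: int


def _ratio(numerator: float, denominator: float) -> float:
    # Assumption: a measure with a zero denominator is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def measures(c: Counts) -> Dict[str, float]:
    """Sensitivity, specificity (precision), accuracy and CC for one set of counts."""
    cc_denominator = sqrt((c.tp + c.fn) * (c.tp + c.fp) * (c.tn + c.fp) * (c.tn + c.fn))
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tp, c.tp + c.fp),
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "cc": _ratio(c.tp * c.tn - c.fp * c.fn, cc_denominator),
    }


def protein_based(per_protein: List[Counts]) -> Dict[str, float]:
    """Average each measure over proteins (per_protein must be non-empty)."""
    rows = [measures(c) for c in per_protein]
    return {name: sum(row[name] for row in rows) / len(rows) for name in rows[0]}


def residue_based(per_protein: List[Counts]) -> Dict[str, float]:
    """Pool the counts over all proteins, then compute each measure once."""
    pooled = Counts(*(sum(column) for column in zip(*per_protein)))
    return measures(pooled)


if __name__ == "__main__":
    # Hypothetical counts: one large, well-predicted protein and one small, poorly predicted one.
    proteins = [Counts(tp=8, fp=2, tn=88, fn=2), Counts(tp=1, fp=3, tn=15, fn=1)]
    print("protein-based:", protein_based(proteins))
    print("residue-based:", residue_based(proteins))
```

In the example, the small protein pulls the protein-based averages down more strongly than it affects the pooled, residue-based values, which is why the two families of measures can differ noticeably on datasets with proteins of very different sizes.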
Interface Conservation (IC) Scores
In the protein interface conservation analysis, we used the CC (defined above) as a measure of the extent to which the interface residues in a query protein are similar to those in a putative homolog. For clarity, we refer to this measure as the Interface Conservation (IC) score.
NPS-HomPPI is a Non-Partner-Specific Homologous Sequence-Based Protein-Protein Interface Prediction algorithm. NPS-HomPPI is based on the conclusion from statistical analysis of protein interface conservation on Nr6505, Trans135 and Oblig94, i.e., that protein interfaces are conserved across close sequence homologs.
If at least one homolog in the Safe Zone is found by the BLASTP search, NPS-HomPPI uses the Safe Zone homolog(s) to infer the interfaces of the query protein. Otherwise, the search is repeated for homologs in the Twilight and Dark Zones. If NPS-HomPPI cannot find homologs in any of the three zones, it does not provide any predictions. The default zone boundaries used by NPS-HomPPI (and hence the parameters used in the NPS-HomPPI search for homologs of a query sequence) are based on our interface conservation analysis on the dataset of transient dimers, Trans135 (Table 1). The choice of these default parameter thresholds for NPS-HomPPI is intentionally rather conservative; the thresholds can be relaxed if additional information is available (e.g., if we know that the query protein is an obligate binding protein). The IC score of each of the homologs of a query sequence in the alignment returned by BLASTP is predicted using the regression model for the IC score (see eq. 1) from the BLASTP statistics for the alignment of each homolog with the query sequence. For a given query sequence, at most K closest homologs (Safe, Twilight, or Dark Zone homologs, as the case may be, in that order) are selected from the alignment of the query sequence with its homologs to be used to infer the interface residues of the query sequence. In our experiments, K, the maximum number of homologs used in the prediction, was set equal to 10. The (at most) K homologs are determined by ranking the homologs in the alignment in decreasing order of their predicted IC scores and choosing (at most) K Safe Zone homologs (or Twilight Zone homologs if no Safe Zone homologs exist, or Dark Zone homologs if neither Safe nor Twilight Zone homologs exist).
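The selection procedure described above can be sketched as follows. This is a minimal illustration rather than the server's implementation: the zone assignment and the predicted IC score are passed in as caller-supplied functions (`zone_of` and `predicted_ic`, both hypothetical), because the zone thresholds and regression coefficients are not reproduced here, and each BLASTP hit is represented as a plain dictionary.

```python
from typing import Callable, Dict, List, Sequence

ZONE_ORDER = ("safe", "twilight", "dark")  # zones are tried in this order


def select_homologs(
    hits: Sequence[dict],
    zone_of: Callable[[dict], str],
    predicted_ic: Callable[[dict], float],
    k: int = 10,
) -> List[dict]:
    """Return at most k hits from the best non-empty zone, ranked by predicted IC score.

    Hits whose zone is not one of ZONE_ORDER are ignored. An empty list means
    that no prediction can be made for the query.
    """
    by_zone: Dict[str, List[dict]] = {zone: [] for zone in ZONE_ORDER}
    for hit in hits:
        zone = zone_of(hit)
        if zone in by_zone:
            by_zone[zone].append(hit)
    for zone in ZONE_ORDER:
        if by_zone[zone]:
            ranked = sorted(by_zone[zone], key=predicted_ic, reverse=True)
            return ranked[:k]
    return []


if __name__ == "__main__":
    # Hypothetical hits with precomputed zone labels and predicted IC scores.
    hits = [
        {"id": "h1", "zone": "twilight", "ic": 0.9},
        {"id": "h2", "zone": "safe", "ic": 0.4},
        {"id": "h3", "zone": "safe", "ic": 0.7},
    ]
    chosen = select_homologs(hits, lambda h: h["zone"], lambda h: h["ic"], k=10)
    print([h["id"] for h in chosen])
```

Note that a Twilight Zone hit is never mixed with Safe Zone hits in this sketch, even when its predicted IC score is higher, which follows the zone-by-zone fallback described in the text.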
Once the (at most) K closest homologs to be used for predicting the interface residues of the query sequence are chosen, each residue in the query sequence is labelled as an interface or non-interface residue based on the majority (over the set of at most K closest homologs of the query sequence) of the labels associated with the corresponding position in the alignment. More specifically, each of the at most K homologs provides a positive vote for a given position in the query sequence if the corresponding residue of the homolog is an interface residue; and a negative vote if it is a non-interface residue. The prediction score of NPS-HomPPI for that position in the query sequence is simply the number of positive votes divided by the total number of votes. A query sequence residue with a HomPPI score ≥0.5 is predicted to be an interface residue (See Figure 11 for an example); otherwise, it is predicted to be a non-interface residue. This procedure can be seen as an application of the (at most) K nearest neighbor classifier at each residue of the query sequence.
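The voting step can be written compactly in Python. The sketch below assumes that the alignment has already been projected onto query coordinates, so that each homolog contributes one label per query position: 1 for an interface residue, 0 for a non-interface residue, and `None` where the homolog has no aligned residue. Treating unaligned positions as abstentions, and predicting non-interface when no homolog votes, are assumptions of this sketch; the function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def vote_interface(
    aligned_labels: Sequence[Sequence[Optional[int]]],
    threshold: float = 0.5,
) -> Tuple[List[Optional[float]], List[int]]:
    """Compute per-position vote scores and binary interface predictions.

    aligned_labels holds one row per homolog; every row has one entry per
    query position (1, 0, or None). The score is the fraction of positive
    votes among the homologs that vote at that position.
    """
    if not aligned_labels:
        return [], []
    query_length = len(aligned_labels[0])
    scores: List[Optional[float]] = []
    predictions: List[int] = []
    for position in range(query_length):
        votes = [row[position] for row in aligned_labels if row[position] is not None]
        if not votes:
            scores.append(None)      # no homolog covers this position
            predictions.append(0)    # assumption: default to non-interface
            continue
        score = sum(votes) / len(votes)
        scores.append(score)
        predictions.append(1 if score >= threshold else 0)
    return scores, predictions


if __name__ == "__main__":
    # Three hypothetical homologs aligned to a four-residue query.
    rows = [
        [1, 1, 0, None],
        [1, 0, 0, None],
        [0, 1, None, None],
    ]
    scores, predictions = vote_interface(rows)
    print(scores)
    print(predictions)
```

Because the threshold is applied with ≥, a tie (for example, one positive and one negative vote) is resolved in favour of the interface label, consistent with the ≥0.5 rule stated above.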
NPS-Interface Conservation As a Function of Sequence Alignment
We built a linear model for NPS-interface conservation based on the most important sequence alignment statistics identified in the PCA analysis: logEVal, Positive Score, logLAL.
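A linear model of this kind can be evaluated as in the sketch below. The fitted parameter estimates are not reproduced here, so the coefficients are supplied by the caller and the values in the example are placeholders, not the values from the paper. The use of the natural logarithm, the floor applied to E-values of 0.0, and the interpretation of logLAL as the logarithm of the local alignment length are all assumptions of this sketch.

```python
from math import log
from typing import Mapping

E_VALUE_FLOOR = 1e-200  # assumption: avoids log(0) when BLAST reports an E-value of 0.0


def predicted_ic_score(
    e_value: float,
    positive_score: float,
    local_alignment_length: int,
    coefficients: Mapping[str, float],
) -> float:
    """Evaluate intercept + b1*logEVal + b2*PositiveScore + b3*logLAL.

    coefficients must provide the keys 'intercept', 'logEVal',
    'PositiveScore' and 'logLAL'.
    """
    if local_alignment_length <= 0:
        raise ValueError("local_alignment_length must be positive")
    log_eval = log(max(e_value, E_VALUE_FLOOR))
    log_lal = log(local_alignment_length)
    return (
        coefficients["intercept"]
        + coefficients["logEVal"] * log_eval
        + coefficients["PositiveScore"] * positive_score
        + coefficients["logLAL"] * log_lal
    )


if __name__ == "__main__":
    # Placeholder coefficients for illustration only; they are NOT the fitted values.
    placeholder = {"intercept": 0.1, "logEVal": -0.01, "PositiveScore": 0.005, "logLAL": 0.0}
    print(predicted_ic_score(1e-30, 80.0, 150, placeholder))
```

Such a function could serve as the ranking key when choosing the K closest homologs, once the fitted coefficients are substituted for the placeholders.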
Variables, Parameter Estimates and Significance Values for the Linear Model for NPS-Interface Conservation.
Pr > |t|
PS-HomPPI predicts the interface residues in a protein chain based on the known interface residues of its closest homo-interologs. Given a query protein A and its interaction partner B, PS-HomPPI first identifies the set of homo-interologs of A-B using BLASTP to identify the homologs of A and the homologs of B. From the BLASTP results, we identify a set of homo-interologs that meet sequence similarity thresholds (determined based on the results of our partner-specific interface conservation analysis, as described in the Results Section). We discard the whole PDB complex that contains A-B, to ensure an objective assessment of the reliability of our prediction procedure. For query A-B and its homologous interacting pair A'-B', we also discard the interacting protein pair A'-B' if A and A' or B and B' share ≥95% sequence identity and belong to the same species.
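The two exclusion rules used for evaluation can be sketched as a simple filter. The rule about sequence identity and species is read here as: discard A'-B' if either pair (A, A') or (B, B') has ≥95% identity and comes from the same species. This reading is an assumption, as are the record layout and the names used below.

```python
from typing import List, NamedTuple


class HomoInterolog(NamedTuple):
    """A candidate interacting pair A'-B' homologous to the query pair A-B."""
    pdb_id: str            # PDB entry containing A'-B'
    identity_a: float      # % sequence identity between A and A'
    identity_b: float      # % sequence identity between B and B'
    same_species_a: bool   # True if A and A' come from the same species
    same_species_b: bool   # True if B and B' come from the same species


def filter_for_evaluation(
    candidates: List[HomoInterolog],
    query_pdb_id: str,
    identity_cutoff: float = 95.0,
) -> List[HomoInterolog]:
    """Drop the query's own PDB complex and near-identical same-species pairs."""
    kept = []
    for c in candidates:
        if c.pdb_id.lower() == query_pdb_id.lower():
            continue  # the complex that contains the query pair itself
        too_close_a = c.identity_a >= identity_cutoff and c.same_species_a
        too_close_b = c.identity_b >= identity_cutoff and c.same_species_b
        if too_close_a or too_close_b:
            continue
        kept.append(c)
    return kept


if __name__ == "__main__":
    # Hypothetical candidates for a query taken from PDB entry 1abc.
    candidates = [
        HomoInterolog("1ABC", 100.0, 100.0, True, True),
        HomoInterolog("2def", 97.0, 60.0, True, False),
        HomoInterolog("3ghi", 97.0, 60.0, False, False),
        HomoInterolog("4jkl", 45.0, 50.0, True, True),
    ]
    print([c.pdb_id for c in filter_for_evaluation(candidates, "1abc")])
```

These exclusions matter only when estimating prediction reliability; for a genuinely new query pair there is no known complex to exclude.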
[Table: Boundaries of Safe, Twilight and Dark Zones used by PS-HomPPI. Rows: Safe Zone, Twilight Zone 1, Twilight Zone 2, Dark Zone, each defined by criteria on the A-A' and B-B' alignments.]
PS-HomPPI predicts whether an amino acid in query sequence A is an interface residue based on the corresponding position in its alignment with (at most) K of the closest homo-interologs of A-B (ranked by their predicted IC scores). In our experiments, K was set to 10. Given a query-partner pair A-B, we label each position in the amino acid sequence of protein A as interface or non-interface according to whether a majority of the corresponding positions of the homologs of A within the homo-interologs of A-B are interface residues. More specifically, each of the at most K homo-interologs casts a positive vote for a given position in the query protein sequence A if the corresponding residue of its homolog A' is an interface residue, and a negative vote if it is a non-interface residue. The prediction score of PS-HomPPI for that position is the number of positive votes divided by the total number of votes. A residue in the query protein A with a prediction score ≥0.5 is predicted to be an interface residue; otherwise, it is predicted to be a non-interface residue.
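A minimal Python sketch of this selection-and-voting step follows. It is illustrative only: the function name `ps_predict` and the tuple layout are hypothetical, a higher predicted IC score is assumed to mean a closer homo-interolog, and `None` marks an alignment gap that casts no vote.

```python
from typing import List, Optional, Tuple

# (predicted IC score, interface labels of A' mapped onto the positions of A)
HomoInterolog = Tuple[float, List[Optional[int]]]

def ps_predict(homo_interologs: List[HomoInterolog],
               k: int = 10,
               threshold: float = 0.5) -> List[Optional[int]]:
    """Vote over the (at most) k homo-interologs with the highest IC score."""
    # Assumption: a higher predicted IC score means a closer homo-interolog.
    top = sorted(homo_interologs, key=lambda h: h[0], reverse=True)[:k]
    n_positions = len(top[0][1]) if top else 0
    predictions: List[Optional[int]] = []
    for i in range(n_positions):
        votes = [labels[i] for _, labels in top if labels[i] is not None]
        if not votes:
            predictions.append(None)  # assumption: no votes, no prediction
            continue
        score = sum(votes) / len(votes)  # positive votes / total votes
        predictions.append(1 if score >= threshold else 0)
    return predictions

# Made-up example: three homo-interologs of A-B, query A has two residues.
pool = [
    (0.9, [1, 0]),
    (0.1, [0, 1]),
    (0.2, [0, 1]),
]
top1 = ps_predict(pool, k=1)    # only the best-scoring homo-interolog votes
top10 = ps_predict(pool, k=10)  # all three vote, since fewer than 10 exist
```

With k=1 the single best-scoring homo-interolog decides each position; with the paper's K=10 all three vote, and the two lower-scoring ones outvote it.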
PS-Interface Conservation As a Function of Sequence Alignment
[Table: Variables, Parameter Estimates and Significance Values for the Linear Model for PS-Interface Conservation. Columns: variable, parameter estimate, Pr > |t|.]
Availability and Requirements
* Project name: HomPPI
* Project home page: http://homppi.cs.iastate.edu/
* Programming language: Perl
This work was funded in part by the National Institutes of Health grant GM066387 to Vasant Honavar and Drena Dobbs and in part by a research assistantship funded by the Center for Computational Intelligence, Learning, and Discovery. The authors sincerely thank Irina Kufareva in Abagyan Lab at the University of California for providing PIER prediction results. The authors also thank Rafael Jordan and Fadi Towfic for helpful discussions and assistance with the web server implementation. The work of Vasant Honavar while working at the National Science Foundation was supported by the National Science Foundation. Any opinions, findings, and conclusions contained in this article are those of the authors and do not necessarily reflect the views of the National Science Foundation.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.