NOXclass: prediction of protein-protein interaction types
© Zhu et al; licensee BioMed Central Ltd. 2006
- Received: 26 September 2005
- Accepted: 19 January 2006
- Published: 19 January 2006
Structural models determined by X-ray crystallography play a central role in understanding protein-protein interactions at the molecular level. Interpretation of these models requires the distinction between non-specific crystal packing contacts and biologically relevant interactions. This has been investigated previously and classification approaches have been proposed. However, less attention has been devoted to distinguishing different types of biological interactions. These interactions are classified as obligate and non-obligate according to the effect of the complex formation on the stability of the protomers. So far no automatic classification methods for distinguishing obligate, non-obligate and crystal packing interactions have been made available.
Six interface properties have been investigated on a dataset of 243 protein interactions. The six properties have been combined using a support vector machine algorithm, resulting in NOXclass, a classifier for distinguishing obligate, non-obligate and crystal packing interactions. We achieve an accuracy of 91.8% for the classification of these three types of interactions using a leave-one-out cross-validation procedure.
NOXclass allows the interpretation and analysis of protein quaternary structures. In particular, it generates testable hypotheses regarding the nature of protein-protein interactions, when experimental results are not available. We expect this server will benefit the users of protein structural models, as well as protein crystallographers and NMR spectroscopists. A web server based on the method and the datasets used in this study are available at http://noxclass.bioinf.mpi-inf.mpg.de/.
- Amino Acid Composition
- Protein Data Bank
- Support Vector Machine Classifier
- Interface Property
- Conservation Score
Protein-protein interactions play important roles in many biological processes. Structural models of the complexes resulting from these interactions are necessary to understand those processes at the molecular level. Among the different techniques that can be employed to determine the structures of protein complexes, X-ray crystallography is still the most popular. However, not all interactions observed in structures of protein complexes determined by X-ray crystallography are biologically relevant. Many of them are formed during the crystallization process and would not appear in vivo. Such crystal packing contacts are non-specific and have no associated biological function. The determination of the quaternary structure of protein complexes remains a field of active research [2–9].
In addition, there are diverse types of biological interactions. Protomers from obligate complexes do not exist as stable structures in vivo, whereas protomers of non-obligate complexes may dissociate from each other and stay as stable and functional units. Similarly, protein complexes have been classified as permanent or transient according to their lifetime.
A number of studies have examined properties of protein-protein interfaces in order to discriminate between biologically relevant interactions and non-biological interactions resulting from crystal packing contacts. It has been shown that biological interactions tend to have a larger interface size than non-biological interactions [2–6, 11]. PQS, which uses interface size as its main discriminant, separated true from false homodimers with an accuracy of 78% on a non-redundant dataset. PQS uses a 400 Å² cutoff for interface size between biological and non-biological interactions. Ponstingl and coworkers reported an optimal cutoff of 856 Å² for differentiating homodimers and monomers. However, counterexamples were also observed for which this criterion failed [4, 6]. Amino acid composition of the interface is another well-analyzed property for identifying biological interactions [3, 9, 13, 14]. It has been reported that the amino acid composition of biological interfaces is different from that of the rest of the protein surface [9, 13, 14]. On the other hand, Carugo and collaborators showed that the chemical composition of crystal packing contacts is very similar to that of the protein surface as a whole. The importance of residue conservation in the identification of the oligomeric state of protein complexes has also been investigated. Using a neural network algorithm for combining the size and conservation measures of the interface, biological homodimeric interactions and crystal packing contacts can be classified with an accuracy of 98.3%. Zhang et al. introduced statistical learning methods to predict protein quaternary structures based on protein sequence information.
Similar properties have been employed for identifying protein-protein interaction sites. Jones and Thornton analyzed six physicochemical interface properties and used them for predicting interaction sites [13, 16]. Gallet et al. identified residues involved in protein interaction sites based on hydrophobicity. Zhou and Shan used sequence profiles of neighboring residues and solvent accessibility of a target residue. Also, residue conservation has been employed to infer functional hot spots at the protein surface [19–22]. These approaches are based on the assumption that key residues involved in biologically relevant interactions are more strongly conserved in evolution than the rest of the protein surface. Though several conservation scores have proven useful, there is still room for improvement. Different properties have been combined with a support vector machine (SVM) implementation in order to predict protein-protein binding sites [24, 25]. Some efforts have been made to discriminate different types of biological interactions. Transient protein-protein interactions, including both homodimers and heterodimers, have been characterized at the structural level. This work revealed that interfaces of transient complexes have smaller area, and are more planar and polar on average than those of stable homodimers. In addition, interface residues of transient homodimers have been found to be more conserved than the other surface residues. Gunasekaran and coworkers reported that both per-residue surface area and interface area of ordered proteins (involving non-obligate interactions) are much smaller than those of disordered proteins (involving obligate interactions). Recently, De et al. performed a statistical analysis of the interface properties for obligate and non-obligate interactions. They reported that obligate interfaces have more contacts than non-obligate interfaces and that these contacts are mainly nonpolar.
The involvement of secondary structure elements at interfaces was also reported to be significantly different. In a recent paper, Mintseris and Weng investigated the difference between obligate and transient complexes from an evolutionary point of view. In obligate interactions, interface residues were reported to be significantly more conserved than those in transient interactions. In addition, the coevolution rate was observed to be lower for obligate interaction partners than for transient interaction partners. In general, obligate and non-obligate proteins have been shown to have distinct interaction preferences. Nevertheless, there is no single interface property with a clear cutoff on whose basis one can discriminate between the different protein interaction types. This is not surprising given the complexity and diversity of protein interactions. Mintseris and Weng used atomic contact vectors to discriminate obligate from non-obligate interactions. They achieved respectable accuracy (91%) in this classification problem. Clearly, there has been considerable progress in the analysis and classification of the different types of interactions, but so far no method has been made available for the prediction of protein-protein interaction types.
In this paper, we first investigated six interface properties for a set of non-redundant protein-protein interactions. These properties are interface area, ratio of interface area to protein surface area, amino acid composition of the interface, correlation between amino acid compositions of interface and protein surface, interface shape complementarity, and conservation of the interface. We then trained an SVM classifier with these interface properties to differentiate not only biological interactions from crystal packing contacts, but also obligate interactions from non-obligate interactions. We constructed a two-stage SVM to handle the three-class classification problem. Our SVM classifier achieved an accuracy of 91.8% using leave-one-out cross-validation on the non-redundant dataset containing 243 interactions.
We compiled a non-redundant data set with three types of protein-protein interactions from several sources. Here, every interaction involves two protomers, which refer to the two polypeptide chains in the protein complex. There may be more than two protomers per complex, resulting in several interactions. When considering a protein-protein interaction, only the two protomers involved are relevant.
Obligate interactions were taken from a previously compiled set. Non-obligate interactions were obtained from both a set of non-obligate interactions and a set of transient interactions, which are non-obligate by definition. To remove redundancies, these interactions were first divided into groups. Each group is defined by the two SCOP families to which the two interacting protomers belong. Then we selected within each group the interaction whose complex has the highest AEROSPACI score. The AEROSPACI score is a measure of the quality of the structural models available in the Protein Data Bank (PDB). After removing redundancy, we had 94 obligate interactions and 88 non-obligate interactions. Some problematic cases were found and removed from the set. For example, small ligands were found in some interfaces, or there was an interaction between two different parts of the same protein that was cleaved into two chains as a result of proteolysis. In total we removed eight cases from the obligate set (1bbh, 1bft, 1g4y, 1mka, 1nsy, 1scf, 1vfr and 5hvp) and six entries from the non-obligate set (1bpl, 1noc, 1fap, 1bmq, 1ef1 and 2kau). The ConSurf server was used to derive the conservation scores for these protein sequences. We could obtain conservation scores for the protomers involved only for a subset of these interactions. In this subset, there are 75 obligate interactions and 62 non-obligate interactions. Enzyme homodimers predominate in the obligate set, but the set also includes other types of proteins, such as transcription regulators and membrane receptors. The non-obligate set includes many interactions between enzymes and inhibitors, but it also includes other types of interactions, such as receptor-ligand interactions and transient signaling complexes.
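The redundancy filter described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the record layout (`pdb`, `families`, `score`), the family labels and the score values are made up, and treating the pair of SCOP families as unordered is an assumption.

```python
from collections import defaultdict


def remove_redundancy(interactions):
    """Keep the best-scoring interaction for each pair of SCOP families.

    Each interaction is a dict with hypothetical keys:
    'pdb' (identifier), 'families' (two SCOP family labels) and
    'score' (AEROSPACI score of the complex).
    """
    groups = defaultdict(list)
    for interaction in interactions:
        # Assumption: the order of the two protomers does not matter.
        key = tuple(sorted(interaction["families"]))
        groups[key].append(interaction)
    # Within each group, keep the complex with the highest score.
    return [max(group, key=lambda i: i["score"]) for group in groups.values()]


# Made-up records for illustration only.
candidates = [
    {"pdb": "cplx1", "families": ("famA", "famB"), "score": 0.41},
    {"pdb": "cplx2", "families": ("famB", "famA"), "score": 0.63},
    {"pdb": "cplx3", "families": ("famC", "famC"), "score": 0.52},
]
kept = remove_redundancy(candidates)
print(sorted(i["pdb"] for i in kept))
```

Here `cplx1` and `cplx2` fall into the same group, so only the higher-scoring one survives.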
Obligate Interactions (75)
1ahj A B
1b34 A B
1dce A B
1efv A B
1gux A B
1h2a L S
1luc A B
1pnk A B
1req A B
1tco A B
2aai A B
1a0f A B
1a4i A B
1afw A B
1aj8 A B
1ajs A B
1aom A B
1aq6 A B
1at3 A B
1b3a A B
1b5e A B
1b7b A C
1b8a A B
1b8j A B
1b9m A B
1bjn A B
1bol A B
1brm A B
1byf A B
1byk A B
1c7n A B
1cli A B
1cmb A B
1cnz A B
1coz A B
1cp2 A B
1dor A B
1f6y A B
1gpe A B
1hgx A B
1hjr A C
1hss A B
1isa A B
1jkm A B
1kpe A B
1msp A B
1nse A B
1one A B
1pp2 L R
1qae A B
1qax A B
1qbi A B
1qfe A B
1qfh A B
1qor A B
1qu7 A B
1smt A B
1sox A B
1spu A B
1trk A B
1vlt A B
1vok A B
1wgj A B
1xik A B
1xso A B
1ypi A B
1yve I J
2ae2 A B
2hdh A B
2hhm A B
2nac A B
2pfl A B
2utg A B
3tmk A B
4mdh A B
Non-obligate Interactions (62)
1ava A C
1avw A B
1bvn T P
1cse I E
1eai C A
1f34 A B
1fss A B
1gla F G
1kxq H A
1smp I A
1tab I E
1tgs I Z
2ptc I E
2sic I E
4sgb I E
1agr E A
1atn A D
1b6c A B
1bkd R S
1buh A B
1dow A B
1euv A B
1i2m A B
1i8l A C
1kac A B
1pdk A B
1qav A B
1tx4 A B
1c0f S A
1zbd A B
1ak4 A D
1d09 A B
1cqi A B
1fin A B
1dhk A B
1bi7 A B
1wql R G
1rrp A B
1cc0 A E
1eg9 A B
1avz B C
1frv A B
3hhr A B
1ycs A B
1cvs A C
1aro L P
1cmx A B
1bml A C
2pcb A B
1f60 A B
1stf E I
1emv A B
1uea A B
1qbk B C
1hlu A P
1itb A B
1eth A B
1jtd A B
1lfd A B
1dnl A B
1tmq A B
1a4y A B
Crystal Packing Contacts (106)
Definition of interface properties
In order to characterize the different types of protein-protein interactions, we analyzed the following six interface properties: interface area, ratio of interface area to protein surface area, amino acid composition of the interface, correlation between amino acid compositions of interface and protein surface, gap volume index, and conservation score of the interface. A residue is defined as being part of the interface if its solvent accessible surface area (SASA) decreases by more than 1 Å² upon the formation of the complex. A protein-protein interface is defined to be the ensemble of all interface residues from both protomers. Solvent accessible surface areas for residues were calculated using NACCESS, with a probe sphere of radius 1.4 Å.
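The interface definition above amounts to a simple per-residue filter. The sketch below assumes the per-residue SASA values have already been parsed into dictionaries; the residue labels and areas are made up, and the function name is hypothetical.

```python
def interface_residues(sasa_free, sasa_complex, threshold=1.0):
    """Return {residue: delta_sasa} for residues buried on complex formation.

    sasa_free and sasa_complex map a residue label to its SASA (in Å^2) in
    the isolated protomer and in the complex, respectively.
    """
    buried = {}
    for residue, free_area in sasa_free.items():
        delta = free_area - sasa_complex[residue]
        if delta > threshold:  # strictly more than 1 Å^2 by default
            buried[residue] = delta
    return buried


# Made-up per-residue areas for one protomer.
free = {"A:LEU12": 95.0, "A:LYS40": 120.0, "A:GLY77": 30.0}
bound = {"A:LEU12": 20.0, "A:LYS40": 119.5, "A:GLY77": 30.0}
print(interface_residues(free, bound))
```

Only the first residue loses more than 1 Å² in this toy input, so it is the only one assigned to the interface.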
Interface area (IA) is defined as one half of the total decrease of SASA (ΔSASA) of the two protomers upon the formation of the interaction:
\[ \mathrm{IA} = \frac{1}{2}\,\Delta\mathrm{SASA} = \frac{1}{2}\left(\mathrm{SASA}_a + \mathrm{SASA}_b - \mathrm{SASA}_{ab}\right) \]
where a and b are two protomers in the complex ab; SASA_a, SASA_b and SASA_ab are the SASA values for a, b, and ab, respectively. The native complex may contain additional protomers, but they are not considered.
Interface area ratio
Biological interactions that involve a small protomer cannot have large interface areas. This applies to some enzyme-inhibitor complexes, for instance. Therefore, we defined a new feature, the interface area ratio (IAR), in which the interface area is normalized by the SASA of the smaller protomer in the complex:
\[ \mathrm{IAR} = \frac{\mathrm{IA}}{\min\left(\mathrm{SASA}_a, \mathrm{SASA}_b\right)} \]
where SASA_a and SASA_b are the SASA values for protomers a and b, respectively.
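As a worked example of the two area-based features, the following sketch computes the interface area and the interface area ratio from three SASA values. The numbers are made up, and taking the "smaller protomer" to be the one with the smaller SASA is an assumption.

```python
def interface_area(sasa_a, sasa_b, sasa_ab):
    """Half of the total SASA lost when protomers a and b form the complex ab."""
    return 0.5 * (sasa_a + sasa_b - sasa_ab)


def interface_area_ratio(sasa_a, sasa_b, sasa_ab):
    """Interface area divided by the SASA of the smaller protomer."""
    return interface_area(sasa_a, sasa_b, sasa_ab) / min(sasa_a, sasa_b)


# Made-up SASA values in Å^2.
ia = interface_area(6000.0, 3000.0, 7600.0)
iar = interface_area_ratio(6000.0, 3000.0, 7600.0)
print(ia, round(iar, 3))
```

With these inputs the complex buries 1400 Å² in total, so the interface area is 700 Å², which is about 23% of the surface of the smaller protomer.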
Amino acid composition of the interface
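We calculated both number-based and area-based amino acid composition. The number-based amino acid composition (v_n) is defined as the frequency of each of the 20 standard amino acid types in the protein-protein interface. By weighting each residue with its ΔSASA, the area-based amino acid composition v_a is computed for each amino acid type t, with the sums running over the interface residues r:
\[ v_a(t) = \frac{\sum_{r:\,\mathrm{type}(r)=t} \Delta\mathrm{SASA}_r}{\sum_{r} \Delta\mathrm{SASA}_r} \]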
where type(r) is the type of the amino acid of residue r.
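The two composition vectors can be sketched as below. The input format (a list of `(type, ΔSASA)` pairs with one-letter amino acid codes) is hypothetical, and normalizing the area-based vector by the total buried area is an assumption.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard types


def number_based_composition(residues):
    """Fraction of interface residues of each type; residues = [(type, delta_sasa), ...]."""
    total = len(residues)
    return {aa: sum(1 for t, _ in residues if t == aa) / total for aa in AMINO_ACIDS}


def area_based_composition(residues):
    """Fraction of the buried area contributed by each type (assumed normalization)."""
    total = sum(delta for _, delta in residues)
    return {aa: sum(d for t, d in residues if t == aa) / total for aa in AMINO_ACIDS}


# Made-up interface: two leucines and one lysine with their ΔSASA in Å^2.
interface = [("L", 30.0), ("L", 10.0), ("K", 60.0)]
v_n = number_based_composition(interface)
v_a = area_based_composition(interface)
print(round(v_n["L"], 3), round(v_a["L"], 3), round(v_a["K"], 3))
```

In this toy interface leucine dominates by count (two of three residues) while lysine dominates by area (60% of the buried surface), which is why the two vectors can carry different information.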
Correlation between amino acid compositions of interface and protein surface
The amino acid composition of the biological interface was shown to be significantly different from that of the rest of the protein surface. It is reasonable to expect the amino acid composition of the crystal packing interface to be similar to that of the rest of the protein surface. To measure this effect, Pearson correlation coefficients between the amino acid compositions of interface and surface were calculated. These correlations were calculated for both number-based and area-based amino acid compositions.
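A plain implementation of the Pearson correlation between two composition vectors is sketched below. The three-component vectors are made up to keep the example short; in practice each vector would have one entry per amino acid type.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up composition vectors (truncated to three amino acid types).
interface = [0.10, 0.30, 0.60]
similar_surface = [0.12, 0.28, 0.60]
different_surface = [0.60, 0.30, 0.10]
print(pearson(interface, similar_surface) > 0.99)
print(pearson(interface, different_surface) < 0.0)
```

A coefficient close to 1 corresponds to the expectation for crystal packing contacts, whose composition resembles the rest of the surface.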
Gap volume index
It has been shown that protein-protein interfaces are more complementary in obligate complexes than in non-obligate complexes [9, 37]. The gap volume index (GVI) is one of the measures of interface complementarity. Since gap volume depends on protein size, this feature is computed by normalizing the gap volume between protomers by their interface area:
\[ \mathrm{GVI} = \frac{\text{gap volume}}{\text{interface area}} \]
The smaller the gap volume index, the more complementary the interface shapes are. Gap volume was computed using the SURFNET program. The minimum and maximum radii for gap spheres were set to 1.0 and 5.0 Å, respectively. The grid separation was set to 2.0 Å.
Conservation score of the interface
We calculated the conservation scores for residues in the interface as determined by the ConSurf method. The conservation score of the interface was defined as the average value of conservation scores of all the residues at the protein-protein interface. In a similar way to the area-based amino acid composition, we weighted the conservation score for each residue by its ΔSASA upon the formation of the interaction. The average of these weighted residue conservation scores was used as the area-based conservation score of the interface.
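The two conservation scores can be illustrated with the sketch below. The text does not spell out how the weighted average is normalized, so dividing by the total ΔSASA is an assumption, as are the input format and the made-up values.

```python
def number_based_conservation(residues):
    """Plain mean of per-residue conservation scores; residues = [(score, delta_sasa), ...]."""
    return sum(score for score, _ in residues) / len(residues)


def area_based_conservation(residues):
    """ΔSASA-weighted mean (assumption: weights are normalized by the total buried area)."""
    total_area = sum(delta for _, delta in residues)
    return sum(score * delta for score, delta in residues) / total_area


# Made-up interface: (conservation score, ΔSASA in Å^2) per residue.
interface = [(1.0, 10.0), (3.0, 30.0)]
cs_n = number_based_conservation(interface)
cs_a = area_based_conservation(interface)
print(cs_n, cs_a)
```

The area-based value is pulled toward the score of the residue that buries more surface.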
List of Interface Properties
AACa: Amino Acid Composition of the interface, Area-based
AACn: Amino Acid Composition of the interface, Number-based
CORa: CORrelation between amino acid compositions of interface and surface, Area-based
CORn: CORrelation between amino acid compositions of interface and surface, Number-based
CSa: Conservation Score of the interface, Area-based
CSn: Conservation Score of the interface, Number-based
Δν DISTance between amino acid compositions of the interfaces, Number-based
Δν DISTance between amino acid compositions of the interfaces, Area-based
GVI: Gap Volume Index
IAR: Interface Area Ratio
SASA: Solvent Accessible Surface Area
We employed a support vector machine [39, 40] to classify the three types of interactions. In general, an SVM is a supervised learning algorithm for binary classification of data. For more than two classes of data, multi-class techniques are required. These techniques include "one-against-one" and "one-against-all" approaches. For these purposes, several binary SVM classifiers are constructed and the appropriate class is determined using a majority voting scheme. An alternative approach is a multi-stage classifier that separates data progressively. Here, the classification is performed in several stages, and in each stage one class of data is separated.
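The control flow of the two strategies can be sketched independently of any SVM library. In the sketch below the binary classifiers are hypothetical stand-ins: simple threshold rules on the interface area (IA), using the 650 Å² cutoff discussed later in the paper for the first split and a made-up 1500 Å² cutoff for the second. Trained SVMs would take their place in a real implementation.

```python
from collections import Counter


def two_stage_predict(features, is_biological, is_obligate):
    """Stage 1 splits off crystal packing; stage 2 splits the biological cases."""
    if not is_biological(features):
        return "crystal packing"
    return "obligate" if is_obligate(features) else "non-obligate"


def one_against_one_predict(features, pairwise_classifiers):
    """Each binary classifier votes for one of its two classes; the majority wins."""
    votes = Counter(clf(features) for clf in pairwise_classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained binary SVMs (thresholds on IA in Å^2).
stage1 = lambda f: f["IA"] > 650.0
stage2 = lambda f: f["IA"] > 1500.0
pairwise = [
    lambda f: "obligate" if f["IA"] > 1500.0 else "non-obligate",
    lambda f: "obligate" if f["IA"] > 650.0 else "crystal packing",
    lambda f: "non-obligate" if f["IA"] > 650.0 else "crystal packing",
]

for area in (400.0, 900.0, 2000.0):
    sample = {"IA": area}
    print(area, two_stage_predict(sample, stage1, stage2),
          one_against_one_predict(sample, pairwise))
```

The two-stage variant needs only two binary classifiers for three classes, whereas one-against-one needs one classifier per pair of classes and a rule for resolving tied votes, which this sketch does not address.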
The R package e1071 [42, 43] interfacing to libsvm was used to perform the SVM classification. Best results were obtained when radial basis kernels were chosen for the SVMs in both stages. To achieve best performance, the parameters gamma and C were tuned using the built-in function "tune" in e1071. We performed a recursive grid-search for the best parameters using a leave-one-out cross-validation procedure. The parameter search stops when the improvement of accuracy is less than 0.1%. In the best performing two-stage SVM using three interface properties (IA, IAR, and AACa), gamma and C were set to 0.004 and 128 for the SVM in the first stage, and 0.00085 and 512 for the SVM in the second stage.
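One way to picture a recursive grid search is the sketch below. It is not the e1071 implementation: the hypothetical `evaluate(gamma, C)` callback stands in for the leave-one-out accuracy of an SVM, and the starting point, grid size, search span and log2 parametrization are all assumptions. The toy accuracy surface is made up so that the example runs without an SVM library.

```python
import itertools
import math


def recursive_grid_search(evaluate, center=(-7.0, 5.0), span=8.0,
                          points=5, tol=0.001, max_rounds=10):
    """Refine a grid over (log2 gamma, log2 C) until accuracy gains fall below tol."""
    best, best_score = center, float("-inf")
    for _ in range(max_rounds):
        # Offsets from -span to +span around the current best point.
        offsets = [span * (2 * i / (points - 1) - 1) for i in range(points)]
        round_best, round_score = best, float("-inf")
        for d_gamma, d_c in itertools.product(offsets, offsets):
            candidate = (best[0] + d_gamma, best[1] + d_c)
            score = evaluate(2.0 ** candidate[0], 2.0 ** candidate[1])
            if score > round_score:
                round_best, round_score = candidate, score
        improvement = round_score - best_score
        if round_score > best_score:
            best, best_score = round_best, round_score
        if improvement < tol:  # stop when the gain is below 0.1%
            break
        span /= 2.0  # zoom in around the new best point
    return 2.0 ** best[0], 2.0 ** best[1], best_score


def toy_accuracy(gamma, cost):
    """Made-up smooth surface peaking at gamma = 2**-8, C = 2**7."""
    return 1.0 - 0.01 * ((math.log2(gamma) + 8) ** 2 + (math.log2(cost) - 7) ** 2)


gamma, cost, accuracy = recursive_grid_search(toy_accuracy)
print(gamma, cost, accuracy)
```

Each round re-evaluates the current best point, so the best score never decreases and the stopping rule compares like with like.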
We obtained posterior probabilities for our classification with the same R package. It fits a logistic distribution to the pairwise classification decision values using a maximum likelihood algorithm. With this fitted distribution, the posterior pairwise class probabilities are estimated for each prediction.
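The idea of mapping decision values to probabilities with a fitted logistic function can be sketched as follows. This is a simplified stand-in: it fits the two parameters by plain gradient descent on the negative log-likelihood, which is not necessarily the optimizer used by the package, and the decision values and labels are made up.

```python
import math


def fit_sigmoid(decision_values, labels, lr=0.1, steps=5000):
    """Fit P(y=1 | f) = 1 / (1 + exp(A*f + B)) by maximum likelihood."""
    a, b = 0.0, 0.0
    n = len(labels)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for f, y in zip(decision_values, labels):
            p = 1.0 / (1.0 + math.exp(a * f + b))
            # Gradient of the negative log-likelihood with respect to A and B.
            grad_a += (y - p) * f
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def probability(f, a, b):
    """Posterior probability of the positive class for decision value f."""
    return 1.0 / (1.0 + math.exp(a * f + b))


# Made-up decision values with partly overlapping classes (1 = positive).
values = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]
a, b = fit_sigmoid(values, labels)
print(probability(2.0, a, b) > 0.5, probability(-2.0, a, b) < 0.5)
```

The overlapping toy labels keep the maximum likelihood solution finite; perfectly separable data would push the slope toward infinity without some form of regularization.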
Analysis of interface properties
Relationship between interface properties
Scatter plots comparing different interface properties are provided in the supplementary material (see Additional file 1: supplementary.pdf). In the scatter plots, one can observe that the crystal packing contacts are more clearly separable from the ensemble than the other two types of interactions.
Performance of the SVM classifiers
We performed leave-one-out cross-validation for the multi-class and two-stage SVMs using the six properties available for the BNCP-CS dataset as input features: IA, IAR, AACa, CORa, GVI, and CSa.
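Leave-one-out cross-validation itself is independent of the classifier. The sketch below uses a hypothetical nearest-centroid rule on a single made-up feature (interface area) as a stand-in for the SVM, purely to show how the accuracy estimate is assembled.

```python
def leave_one_out_accuracy(samples, labels, train):
    """Train on all samples but one, test on the held-out one, repeat for each sample."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        if model(samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def train_nearest_centroid(xs, ys):
    """Stand-in classifier: assign a value to the class with the closest mean."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(centroids, key=lambda y: abs(x - centroids[y]))


# Made-up interface areas (Å^2) and class labels.
areas = [300.0, 400.0, 500.0, 700.0, 1200.0, 1500.0, 1800.0]
classes = ["crystal"] * 3 + ["biological"] * 4
accuracy = leave_one_out_accuracy(areas, classes, train_nearest_centroid)
print(round(accuracy, 3))
```

In this toy set only the 700 Å² biological interface is misclassified when held out, giving an accuracy of 6/7.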
Definitions of Notions TP, FN, FP, and TN
We investigated the best performances of the two-stage SVM in terms of cross-validation accuracy when using combinations of six individual features: IA, IAR, AACa, CORa, GVI, and CSa (see Additional file 1: supplementary.pdf). For the BNCP-CS dataset, the best single feature is IA with an accuracy of 76.5%. The best combination of two features is IA and AACa, yielding 86.0%. Using the three features IA, IAR, and AACa yields 91.8%. With the four features IA, IAR, AACa, and GVI (or CSa), we obtained 91.4%. The best accuracy with five features is 90.5%, using IA, IAR, AACa, GVI, and CSa. When using all six features the accuracy is 89.7%.
The accuracy of the multi-class SVM classifier is slightly below that of the two-stage SVM classifier. With a leave-one-out cross-validation procedure we obtained a best accuracy of 90.9% when using four properties, IA, IAR, AACa, and GVI on the BNCP-CS dataset.
Leave-one-out cross-validation results for the BNCP-CS dataset using the two-stage SVM
Performance of the two-stage SVM classifier
Test for overfitting with nested cross-validation
By selecting parameters for the SVMs after cross-validation, we followed a standard procedure applied when limited data are available. Ideally, the data should be split into training, parameter optimization, and validation sets. Since our dataset is of limited size, we maximized the size of the training dataset to get the best-performing SVM classifiers. The drawback is that the accuracy estimates are possibly too optimistic. In order to test for overfitting, we estimated the misclassification rate following a previously described nested cross-validation protocol. We divided the data into three parts. On two parts, 10-fold cross-validation was performed to train the model and select optimal parameters; on the third part, the model was tested. Repeating the whole procedure five times, the average accuracies and standard deviations are 81.4 ± 1.46% (BNCP-CS, multi-class, four features IA, IAR, AACa, and GVI) and 83.1 ± 1.16% (BNCP-CS, two-stage, three features IA, IAR, and AACa). For the two-stage SVM, the accuracies for the first and second stage are 94.5 ± 0.92% and 75.2 ± 2.52%, respectively. There is no considerable difference between the two average accuracy values for the best performing multi-class and two-stage SVMs. The low standard deviations indicate that the method is quite robust. Because of the small size of the training dataset, the accuracy estimates from the nested cross-validation might be overly pessimistic.
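The nested protocol can be sketched as below. Several details are assumptions: that each of the three parts is held out in turn, how the random splits are drawn, and the hypothetical `fit_and_score(params, train_idx, test_idx)` callback, which stands in for training an SVM with the given parameters and returning its accuracy on the test indices. The toy problem uses a fixed threshold instead of a trained model.

```python
import random


def split_into_parts(indices, k, seed=0):
    """Shuffle the indices and deal them into k roughly equal parts."""
    rng = random.Random(seed)
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_accuracy(n_samples, candidates, fit_and_score,
                       outer_parts=3, inner_folds=10, seed=0):
    """Select parameters by inner cross-validation, then test on the held-out part."""
    parts = split_into_parts(list(range(n_samples)), outer_parts, seed)
    outer_scores = []
    for held_out in range(outer_parts):
        test_idx = parts[held_out]
        dev_idx = [i for p, part in enumerate(parts) if p != held_out for i in part]
        folds = split_into_parts(dev_idx, inner_folds, seed + 1)

        def inner_score(params):
            scores = []
            for f, fold in enumerate(folds):
                train = [i for g, other in enumerate(folds) if g != f for i in other]
                scores.append(fit_and_score(params, train, fold))
            return sum(scores) / len(scores)

        best = max(candidates, key=inner_score)
        # The held-out part is never seen during parameter selection.
        outer_scores.append(fit_and_score(best, dev_idx, test_idx))
    return sum(outer_scores) / len(outer_scores)


# Toy problem: one made-up feature; the "parameter" is a decision threshold.
x = [10.0 * i for i in range(60)]
y = [value >= 300.0 for value in x]


def threshold_score(threshold, train_idx, test_idx):
    """Stand-in for training and testing: score a fixed threshold on test_idx."""
    hits = sum((x[i] >= threshold) == y[i] for i in test_idx)
    return hits / len(test_idx)


estimate = nested_cv_accuracy(60, [100.0, 300.0, 500.0], threshold_score)
print(estimate)
```

Repeating the call with different `seed` values and averaging the results would mirror the five repetitions reported above.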
Testing on Bahadur's dataset
We have applied our best performing SVM, which is the two-stage SVM trained using three features (IA, IAR, and AACa), to the dataset used by Bahadur et al. This dataset includes 188 crystal packing contacts, 122 homodimers, and 70 other protein-protein complexes. This dataset has some overlap with the BNCP-CS dataset. Between the two sets there are 36 homodimers and 19 other biological complexes with more than 40% sequence identity. In total, the accuracy of the first stage SVM is 80.0%, which is considerably less than the performance of the first stage SVM in the nested cross-validation (94.5 ± 0.92%). This can be explained by the fact that the crystal packing dataset used by Bahadur et al. is heavily biased toward crystal packing contacts with a large contact area (> 400 Å²).
We can reasonably expect that in this dataset the subset of homodimers mostly includes obligate interactions. In addition, inspecting the descriptions of the 70 other protein-protein complexes in the PDB files, one can expect that this subset mostly contains non-obligate interactions. The second stage SVM predicts 84.4% of the homodimers to be obligate, and 78.6% of the remaining complexes to be non-obligate. Although these results do not represent an actual validation, they do agree with our expectations.
In this paper we analyzed six interface properties for three types of protein-protein interactions. Interface area remains one of the most important features for distinguishing biological interactions from crystal packing contacts. The area of a crystal packing interface is typically smaller than that of a biological interface (Figure 2). Different cutoffs have been proposed for separating crystal packing contacts from biological interactions [5, 6]. In our analysis we found 650 Å² to be a reasonable cutoff of interface area for the binary classification of biological and non-biological interactions. This threshold separates the BNCP-CS dataset with an accuracy of 93%. Biological interactions that involve small protomers are better identified when the interface area ratio is used in addition.
The 20 amino acids display variable preference for protein-protein interaction in terms of the number of residues taking part in the interaction and the ΔSASA involved in the total interface area. Obligate and non-obligate interactions show noticeable differences regarding the features based on amino acid composition.
Residues involved in biological interactions were shown to be more strongly conserved than residues involved in crystal packing contacts (Figure 8). With increasing conservation scores of the interface residues, the differences between the three types of interactions become more obvious in terms of their ΔSASA per residue. In particular, conserved residues involved in crystal packing contacts tend to have lower ΔSASA values (Figure 9). The SVM classifier did not benefit from including conservation scores. We investigated whether confidence measures for the conservation score improve performance. To this end, we tested the number of sequences used to calculate the ConSurf score as well as the DOPS score. Improvement was only observed when the number of sequences was combined with the conservation score feature, in comparison to using the ConSurf score as a single feature (accuracy improved from 55% to 60% using the multi-class SVM). No significant improvement was observed when using the number of sequences in addition to the five other features. The effect of confidence measures and conservation scores on the SVM performance deserves further investigation.
As demonstrated in the section on the analysis of the interface properties, the non-obligate interactions in our datasets exhibit intermediate values for all interface properties except the interface area ratio. These results agree with the expected different stability of these types of interactions. Recently, Gunasekaran and coworkers examined the structural properties of ordered and disordered proteins. According to their description, ordered proteins are involved in either non-obligate interactions or crystal packing contacts, while disordered proteins are involved in obligate interactions. The authors have shown that ordered proteins have significantly smaller per-residue SASA at both interface and surface than disordered proteins. These results are in agreement with our analysis. In addition, protomers involved in non-obligate interactions are shown to resemble the protomers involved in crystal packing contacts. Recently, De et al. published the results of a statistical analysis of the interface properties for obligate and non-obligate interactions. Our conclusions agree with their results with respect to the interface properties of interface area, residue propensities at the interface, and shape complementarity.
The first stage of the two-stage SVM classifier distinguishes crystal packing contacts from biological interactions with an accuracy of 97.9% (see the Two-stage SVM section). Valdar and Thornton obtained an accuracy of 98.3% on a similar problem. Nevertheless, the performances of the two methods are not directly comparable because the datasets are different and, in particular, the biological interactions were restricted to homodimers in the latter method.
The nested cross-validation results indicate that there is no considerable difference between the performances of the multi-class and two-stage SVMs. The small variances of these results along with the minor difference between the performances of the SVM implementations indicate that the approach is quite robust.
The method based on atomic contact vectors described by Mintseris and Weng achieves considerable accuracy (91%) in the classification of obligate and non-obligate interactions. We intend to integrate this type of feature in a future version of NOXclass.
This study is also related to the work of Bradford and Westhead, who investigated different interaction types. However, the aims of the two studies are different. Bradford and Westhead identify the possible binding site at the surface of a given protein, while we use the structural model of the complex to determine the interaction types. Although the oligomeric states of many proteins may be inferred during the process of protein purification for crystallization, this is not always the case. In addition, this information is not easily available in the literature or well annotated in structural databases like the Protein Data Bank (PDB). There is currently no well-defined criterion for defining interaction types based on experimental results, but there has been some recent progress in this area.
In this work we have analyzed several interface properties for three types of protein-protein interactions, i.e. obligate interactions, non-obligate interactions, and crystal packing contacts. These three types of interactions exhibit distinct interface properties.
To classify the three types of interactions, we have combined the properties using a support vector machine algorithm and implemented it as NOXclass. NOXclass allows the interpretation and analysis of protein quaternary structures. In particular, it generates testable hypotheses regarding the nature of protein-protein interactions when experimental results are not available. We expect this server will benefit the users of protein structural models, as well as protein crystallographers and NMR spectroscopists.
Program home page
A web server based on the method and the datasets used in this study are available at http://noxclass.bioinf.mpi-inf.mpg.de/. Source code for the program can be downloaded from the same address.
NOXclass requires a Linux or UNIX operating system, as well as a Python interpreter.
External program requirement
The NOXclass program uses NACCESS to calculate the solvent accessible surface areas for residues. The LIBSVM package is also required for NOXclass to operate. These two programs are not distributed in the NOXclass package, and users must obtain them separately to run NOXclass on their local computer.
In addition, the NOXclass program uses SURFNET to compute the gap volume between two protomers. Users have to obtain this program to include this feature in the prediction. Similarly, to include evolutionary information in the prediction, users must obtain the corresponding conservation scores for their protein sequences from the ConSurf server.
The source code of the NOXclass program is distributed under the terms of GNU LGPL.
A list of abbreviations used in this paper is given in Table 2.
We are grateful to Jörg Rahnenführer, Oliver Sander, Tobias Sing and Andreas Steffen for helpful discussions. We thank Andreas Kämper for critically reading the manuscript. We want to thank Joachim Büch for his help in the implementation of the NOXclass web server. HZ is supported by the International Max Planck Research School for Computer Science (IMPRS-CS). This research was performed in the context of the EU Network of Excellence BioSapiens (EU grant No. LSHG-CT-2003-503265).
- Russell R, Alber F, Aloy P, Davis F, Korkin D, Pichaud M, Topf M, Sali A: A structural perspective on protein-protein interactions. Curr Opin Struct Biol 2004, 14(3):313–24. 10.1016/j.sbi.2004.04.006
- Janin J, Rodier F: Protein-protein interaction at crystal contacts. Proteins 1995, 23(4):580–7. 10.1002/prot.340230413
- Carugo O, Argos P: Protein-protein crystal-packing contacts. Protein Sci 1997, 6(10):2261–3.
- Janin J: Specific versus non-specific contacts in protein crystals. Nat Struct Biol 1997, 4(12):973–4. 10.1038/nsb1297-973
- Henrick K, Thornton J: PQS: a protein quaternary structure file server. Trends Biochem Sci 1998, 23(9):358–61. 10.1016/S0968-0004(98)01253-5
- Ponstingl H, Henrick K, Thornton J: Discriminating between homodimeric and monomeric proteins in the crystalline state. Proteins 2000, 41: 47–57. 10.1002/1097-0134(20001001)41:1<47::AID-PROT80>3.0.CO;2-8
- Elcock A, McCammon J: Identification of protein oligomerization states by analysis of interface conservation. Proc Natl Acad Sci USA 2001, 98(6):2990–4. 10.1073/pnas.061411798
- Chakrabarti P, Janin J: Dissecting protein-protein recognition sites. Proteins 2002, 47(3):334–43. 10.1002/prot.10085
- Bahadur R, Chakrabarti P, Rodier F, Janin J: A dissection of specific and non-specific protein-protein interfaces. J Mol Biol 2004, 336(4):943–55. 10.1016/j.jmb.2003.12.073
- Nooren I, Thornton J: Diversity of protein-protein interactions. EMBO J 2003, 22(14):3486–92. 10.1093/emboj/cdg359
- Dasgupta S, Iyer G, Bryant S, Lawrence C, Bell J: Extent and nature of contacts between protein molecules in crystal lattices and between subunits of protein oligomers. Proteins 1997, 28(4):494–514. 10.1002/(SICI)1097-0134(199708)28:4<494::AID-PROT4>3.0.CO;2-A
- Valdar W, Thornton J: Conservation helps to identify biologically relevant crystal contacts. J Mol Biol 2001, 313(2):399–416. 10.1006/jmbi.2001.5034
- Jones S, Thornton J: Analysis of protein-protein interaction sites using surface patches. J Mol Biol 1997, 272: 121–32. 10.1006/jmbi.1997.1234
- Lo Conte L, Chothia C, Janin J: The atomic structure of protein-protein recognition sites. J Mol Biol 1999, 285(5):2177–98. 10.1006/jmbi.1998.2439
- Zhang S, Pan Q, Zhang H, Zhang Y, Wang H: Classification of protein quaternary structure with support vector machine. Bioinformatics 2003, 19(18):2390–6. 10.1093/bioinformatics/btg331
- Jones S, Thornton J: Prediction of protein-protein interaction sites using patch analysis. J Mol Biol 1997, 272: 133–43. 10.1006/jmbi.1997.1233
- Gallet X, Charloteaux B, Thomas A, Brasseur R: A fast method to predict protein interaction sites from sequences. J Mol Biol 2000, 302(4):917–26. 10.1006/jmbi.2000.4092
- Zhou H, Shan Y: Prediction of protein interaction sites from sequence profile and residue neighbor list. Proteins 2001, 44(3):336–43. 10.1002/prot.1099
- Lichtarge O, Bourne H, Cohen F: An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol 1996, 257(2):342–58. 10.1006/jmbi.1996.0167
- Lockless S, Ranganathan R: Evolutionarily conserved pathways of energetic connectivity in protein families. Science 1999, 286(5438):295–9. 10.1126/science.286.5438.295
- Armon A, Graur D, Ben-Tal N: ConSurf: an algorithmic tool for the identification of functional regions in proteins by surface mapping of phylogenetic information. J Mol Biol 2001, 307: 447–63. 10.1006/jmbi.2000.4474
- Ma B, Elkayam T, Wolfson H, Nussinov R: Protein-protein interactions: structurally conserved residues distinguish between binding sites and exposed protein surfaces. Proc Natl Acad Sci USA 2003, 100(10):5772–7. 10.1073/pnas.1030237100
- Valdar W: Scoring residue conservation. Proteins 2002, 48(2):227–41. 10.1002/prot.10146
- Bordner AJ, Abagyan R: Statistical analysis and prediction of protein-protein interfaces. Proteins 2005, 60(3):353–66. 10.1002/prot.20433
- Bradford J, Westhead D: Improved prediction of protein-protein binding sites using a support vector machines approach. Bioinformatics 2005, 21(8):1487–94. 10.1093/bioinformatics/bti242
- Nooren I, Thornton J: Structural characterisation and functional significance of transient protein-protein interactions. J Mol Biol 2003, 325(5):991–1018. 10.1016/S0022-2836(02)01281-0
- Gunasekaran K, Tsai C, Nussinov R: Analysis of ordered and disordered protein complexes reveals structural features discriminating between stable and unstable monomers. J Mol Biol 2004, 341(5):1327–41. 10.1016/j.jmb.2004.07.002
- De S, Krishnadev O, Srinivasan N, Rekha N: Interaction preferences across protein-protein interfaces of obligatory and non-obligatory components are different. BMC Struct Biol 2005, 5: 15. 10.1186/1472-6807-5-15
- Mintseris J, Weng Z: Structure, function, and evolution of transient and obligate protein-protein interactions. Proc Natl Acad Sci USA 2005, 102(31):10930–10935. 10.1073/pnas.0502667102
- Mintseris J, Weng Z: Atomic contact vectors in protein-protein recognition. Proteins 2003, 53(3):629–39. 10.1002/prot.10432
- Neuvirth HRR, Schreiber G: ProMate: a structure based prediction program to identify the location of protein-protein binding sites. J Mol Biol 2004, 338: 181–99. 10.1016/j.jmb.2004.02.040
- Aloy P, Ceulemans H, Stark A, Russell R: The relationship between sequence and interaction divergence in proteins. J Mol Biol 2003, 332(5):989–98. 10.1016/j.jmb.2003.07.006
- Chandonia J, Hon G, Walker N, Lo Conte L, Koehl P, Levitt M, Brenner S: The ASTRAL Compendium in 2004. Nucleic Acids Res 2004, 32(Database issue):D189–92. 10.1093/nar/gkh034
- Berman H, Westbrook J, Feng Z, Gilliland G, Bhat T, Weissig H, Shindyalov I, Bourne P: The Protein Data Bank. Nucleic Acids Res 2000, 28: 235–42. 10.1093/nar/28.1.235
- Hubbard S, Thornton J: 'NACCESS', Computer Program, Department of Biochemistry and Molecular Biology, University College London. 1993.
- Ofran Y, Rost B: Analysing six types of protein-protein interfaces. J Mol Biol 2003, 325(2):377–87. 10.1016/S0022-2836(02)01223-8
- Jones S, Thornton J: Principles of protein-protein interactions. Proc Natl Acad Sci USA 1996, 93: 13–20. 10.1073/pnas.93.1.13
- Laskowski RA: SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J Mol Graph 1995, 13(5):323–30, 307–8. 10.1016/0263-7855(95)00073-9
- Vapnik V: The nature of statistical learning theory. New York: Springer; 1995.
- Vapnik V: Statistical Learning Theory. New York: Wiley; 1998.
- Hsu C, Lin C: A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks 2002, 13(2):415–425. 10.1109/72.991427
- R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria; 2005. [http://www.r-project.org]
- Dimitriadou E, Hornik K, Leisch F, Meyer D, Weingessel A: e1071: Misc functions of the department of statistics (e1071), TU Wien. R package version 1.5-8, 2005.
- Chang C, Lin C: LIBSVM: a Library for Support Vector Machines. 2005. [http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf]
- Ruschhaupt M, Huber W, Poustka A, Mansmann U: A Compendium to Ensure Computational Reproducibility in High-Dimensional Classification Tasks. Statistical Applications in Genetics and Molecular Biology 2004, 3.
- Bartlett G, Porter C, Borkakoti N, Thornton J: Analysis of catalytic residues in enzyme active sites. J Mol Biol 2002, 324: 105–21. 10.1016/S0022-2836(02)01036-7
- NOXclass Web Page [http://noxclass.bioinf.mpi-inf.mpg.de/]
- Glaser F, Pupko T, Paz I, Bell R, Bechor-Shental D, Martz E, Ben-Tal N: ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics 2003, 19: 163–4. 10.1093/bioinformatics/19.1.163
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.