Prediction of protein-binding areas by small-world residue networks and application to docking
© Pons et al; licensee BioMed Central Ltd. 2011
Received: 30 March 2011
Accepted: 26 September 2011
Published: 26 September 2011
Protein-protein interactions are involved in most cellular processes, and their detailed physico-chemical and structural characterization is needed in order to understand their function at the molecular level. In-silico docking tools can complement experimental techniques, providing three-dimensional structural models of such interactions at atomic resolution. In several recent studies, protein structures have been modeled as networks (or graphs), where the nodes represent residues and the connecting edges their interactions. From such networks, it is possible to calculate different topology-based values for each of the nodes, and to identify protein regions with high centrality scores, which are known to positively correlate with key functional residues, hot spots, and protein-protein interfaces.
Here we show that this correlation can be efficiently used for the scoring of rigid-body docking poses. When integrated into the pyDock energy-based docking method, the new combined scoring function significantly improved the results of the individual components as shown on a standard docking benchmark. This improvement was particularly remarkable for specific protein complexes, depending on the shape, size, type, or flexibility of the proteins involved.
The network-based representation of protein structures can be used to identify protein-protein binding regions and to efficiently score docking poses, complementing energy-based approaches.
Keywords: protein interactions; small-world networks; binding site prediction; protein-protein docking; pyDock
Protein-protein interactions are fundamental to many cellular processes, and a detailed atomic-level description of protein complexes would be needed in order to fully understand their association mechanism. The inherent technical difficulties of experimental methods to solve the three-dimensional structure of many protein complexes call for the integration of complementary computational approaches [4, 5]. However, the structural prediction of the complex formed by two interacting proteins remains one of the most challenging problems in computational biology. The complex nature of the rotational, translational, and conformational search makes this task extremely difficult and too costly in computational terms to be addressed purely by full-atom molecular mechanics simulations. Therefore, different simplifications are required in order to approach the docking problem. The treatment of proteins as rigid bodies or their description at low resolution (either in grids [7–10] or coarse-grained models [11–13]) are common simplifications in almost all docking approaches, at least in their initial stages. Additionally, the efficient combination of different scoring terms can increase the overall quality of the predictions if they identify different contributions to binding. Therefore, a way to improve the performance of current scoring functions is the detection of new descriptors for protein binding, orthogonal to existing ones, which could be easily integrated in the scoring phase.
Recently, the analysis of protein structures as small-world network systems has attracted significant interest [14–17]. In small-world networks (i) the average shortest path (between any two nodes) is logarithmically related to the total number of nodes, and (ii) a large average clustering coefficient is observed. Using this approach, proteins can be modeled as a network of interactions, where the nodes represent residues and the edges their contacts. It is assumed that highly connected regions of the network play a key role in the protein, which can be easily detected by means of different topology-based network parameters. Indeed, topological data based on small-world network descriptions of proteins have been recently exploited to predict protein-protein interfaces [19, 20], protein-DNA interfaces, protein-RNA interfaces, ligand binding sites [23, 24], modeling, protein dynamics, protein disorder, ribosome functional sites, to identify critical residues for protein function, or to evaluate protein docking poses.
In this work, we characterized unbound proteins as small-world networks for their use in docking. We used different topology measures and evaluated their use to predict protein binding sites. We then applied these descriptors to the scoring stage of protein-protein docking using the latest standard docking benchmark. These scoring functions were integrated in pyDock, a successful docking scoring algorithm based on physico-chemical terms.
Results and discussion
Interface prediction by network-based parameters
We modeled each of the unbound protein structures of the docking benchmark 3.0 as residue-based networks based on Cα atoms. We then calculated different topology-based parameters for all nodes of the network and mapped their values onto the residues they represented (see Methods). For comparison purposes, we also generated topology networks based on the Cβ atoms. The closeness and degree values were virtually the same for the two types of networks (correlation r² = 0.97 and 0.92, respectively), with some differences in the clustering and betweenness parameters (correlation r² = 0.58 and 0.49, respectively). In the next section we describe how we directly used these values for docking scoring, with no further optimization. First, however, we evaluated the ability of the network-based values to predict binding interfaces. For this purpose only, for each protein and network parameter, we defined as interface predictions an arbitrary number of residues (i.e. nodes) with the highest network topology values (see below) and evaluated whether they were present in the binding site of the known protein complex. For these interface predictions, only surface residues of the unbound protein were considered, defined as those having relative accessible surface area (ASA) > 0.1%. The positive predictive value (PPV) for each complex was calculated as the percentage of predicted residues that were part of the real interface (i.e. residues with at least one atom within 10 Å of the partner protein in the complex). Then we computed the mean PPV of all unbound proteins. Additionally, we used different cutoff values to restrict the selection of predicted residues. It should be noted that some proteins had no residues with values above certain cutoffs and, thus, no predictions were computed in these cases. The random PPV was calculated by randomly selecting surface residues of the unbound proteins. This was repeated 100 times for the different cutoff values.
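The interface-prediction protocol described above can be illustrated with a minimal sketch. The function names, the dictionary-based inputs, and the toy values are assumptions made for illustration only; they are not taken from the original implementation. Relative ASA is assumed to be given in the same units as the cutoff.

```python
def predict_interface(scores, rel_asa, n_top=10, asa_cutoff=0.1, min_score=None):
    """Return up to n_top surface residues with the highest network score.

    scores and rel_asa are hypothetical dicts keyed by residue identifier.
    Residues are kept only if their relative ASA exceeds asa_cutoff and,
    when min_score is given, if their score is strictly above it.
    """
    surface = [r for r in scores if rel_asa[r] > asa_cutoff]
    if min_score is not None:
        surface = [r for r in surface if scores[r] > min_score]
    ranked = sorted(surface, key=lambda r: scores[r], reverse=True)
    return ranked[:n_top]


def positive_predictive_value(predicted, true_interface):
    """Percentage of predicted residues found in the real interface.

    Returns None when nothing was predicted (e.g. no residue above the cutoff).
    """
    if not predicted:
        return None
    hits = sum(1 for r in predicted if r in true_interface)
    return 100.0 * hits / len(predicted)


def mean_ppv(ppv_values):
    """Mean PPV over proteins, skipping those without predictions."""
    valid = [v for v in ppv_values if v is not None]
    return sum(valid) / len(valid) if valid else None


# Toy example (made-up values): A2 and A4 are buried and therefore ignored.
scores = {"A1": 0.90, "A2": 0.80, "A3": 0.70, "A4": 0.95}
rel_asa = {"A1": 30.0, "A2": 0.05, "A3": 12.0, "A4": 0.0}
predicted = predict_interface(scores, rel_asa, n_top=2)
ppv = positive_predictive_value(predicted, true_interface={"A1"})
```

A random baseline in the same spirit would draw the same number of residues from the surface list (for instance with `random.sample`) and average the resulting PPV over repeated draws.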
Network-based scoring of docking poses
Additionally, we evaluated how the docking scoring performance depended on the success of the interface predictions for each partner. Notably, the 29 cases (out of 124) for which closeness correctly predicted the interface in both partners achieved a top 10 docking success rate of 34.6%, clearly above average. On the other hand, in the 46 cases in which only one of the partners had a correct interface prediction, the docking predictions were of worse quality (top 10 success rate 7.7%). Finally, when both partners had incorrect interface predictions (49 cases), docking success clearly worsened (2.6%). The correlation between the success of the interface predictions and the docking scoring performance is thus evident.
Combined energy-based and network-based scoring
Scoring with topology-based network parameters was in all cases clearly better than the FTDock default scoring (top 10 success rate 1%), but still far from the performance achieved by state-of-the-art energy-based scoring functions like pyDock (see Figure 3B). Interestingly, the correlation between the results obtained by these two different types of scoring functions (physics-based pyDock and topology-based network) was very low (below 0.15 for all network parameters), which suggests that they are detecting different contributions to binding.
Analysis by complex type
Coarse-grained model and conformational changes upon binding
The improvement of pyDockCloseness over pyDock was noteworthy in cases with significant conformational change (see Figure 5B). We have previously reported the strong dependency of pyDock success rates on the flexibility of proteins. Indeed, the top 10 success rate was excellent (85.7%) for cases with small conformational changes, but then it dropped substantially for the rest of the cases. Closeness 15 Å behaved similarly, yielding top 10 success rates of 28.6%, 15.3% and 0% for the groups of proteins with small, medium and large conformational changes upon binding, respectively. In the cases with small changes upon binding (averaged unbound/bound RMSD for receptor and ligand < 0.5 Å), the Closeness 15 Å contribution to the combined score could not improve the already excellent results of pyDock. In contrast, in cases showing medium conformational changes (unbound/bound RMSD between 0.5 and 1.5 Å), the pyDockCloseness top 10 success rate (30.6%) was considerably better than that of pyDock alone (20.0%). Interestingly, for the most challenging group of cases with high flexibility (unbound/bound RMSD > 1.5 Å), the Closeness 15 Å contribution to the combined score improved the success rate with respect to pyDock (from 16.7% to 23.3%), despite the poor performance of the network-based scoring alone for the top 10 predictions (0%). Altogether, our coarse-grained network-based approach (only Cα atoms were used to build the networks and the scoring was at residue level, see Methods) seems to be especially helpful in cases with significant conformational changes, successfully complementing our all-atom approach, whose predictions quickly deteriorated with inaccurate geometries.
Size and anisotropy
The size and shape of a given protein determine the general topology of the network-based representation, and in consequence, the parameters derived from that are expected to show different features. Therefore, it was of interest to analyze how the different scoring schemes were affected by the size and anisotropy of the proteins.
The anisotropy of the proteins (i.e. the length of the most different axis divided by the mean length of the other two) played a crucial role in the success of the Closeness 15 Å scoring. Spherical cases (those where receptor and ligand had anisotropy values between 0.7 and 2.0) showed a poor performance, whereas prolate cases (those where either receptor or ligand had an anisotropy value above 2.0) and, especially, oblate cases (those where either receptor or ligand had an anisotropy value below 0.7) yielded better predictions (top 10 success rates were 3.9%, 10.8% and 23.7% for spherical, prolate and oblate cases, respectively; see Figure 6B). In spherical proteins, high closeness values tended to be in the core of the protein, which made it difficult for these residue values to contribute to the scoring of near-native poses in a specific manner. Interestingly, this suggests that non-spherical proteins have general topological features that are recognized by the partner. This seems to contradict a recent work in which a new local closeness measure was defined in order to overcome the lack of predictive ability of global closeness (the measure that we use in this study) for protein-ligand binding sites in non-globular proteins. In contrast, our results for anisotropic proteins clearly outperformed those obtained for spherical proteins. This perhaps reflects the different nature of the protein-protein and protein-ligand binding problems. On the other hand, pyDock performance was less affected by anisotropy (25.5%, 16.2% and 26.3% for spherical, prolate and oblate cases, respectively). In this case, the worse results of the prolate cases were probably due to the poorer sampling of FTDock (prolate cases tended to be larger than average in the benchmark). Nevertheless, success rates for pyDockCloseness improved on those of the individual scorings, reaching top 10 success rates of 29.4%, 21.6% and 44.7%, respectively.
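The anisotropy definition and the 0.7 and 2.0 thresholds given above can be sketched as follows. How the three axis lengths are obtained is not stated in the text, so they are taken here as an input. Reading "the most different axis" as the one whose length lies farthest from the median length is an assumption of this sketch, as are the function names.

```python
def anisotropy(axis_lengths):
    """Length of the most different axis divided by the mean of the other two.

    Assumption: the 'most different' axis is the shortest or the longest one,
    whichever lies farther from the median length (the longest on ties).
    """
    a, b, c = sorted(axis_lengths)
    if (b - a) > (c - b):
        odd, others = a, (b, c)
    else:
        odd, others = c, (a, b)
    return odd / ((others[0] + others[1]) / 2.0)


def shape_class(value, low=0.7, high=2.0):
    """Classify a single protein from its anisotropy value."""
    if value < low:
        return "oblate"
    if value > high:
        return "prolate"
    return "spherical"


# Toy axis lengths in arbitrary units (made-up values).
sphere = shape_class(anisotropy((10.0, 10.0, 10.0)))
rod = shape_class(anisotropy((10.0, 10.0, 30.0)))
disc = shape_class(anisotropy((5.0, 20.0, 20.0)))
```

Following the text, a complex would count as spherical only when both receptor and ligand fall in the spherical class, and as prolate or oblate when either partner does.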
Performance by interface area
We also found a strong correlation between the interface size of the complexes and the top 10 success rates of the scoring methods (see Figure 6C). Cases with very small or very large interfaces showed the worst predictions. Top 10 success rates steadily increased with the interface size for pyDock and Closeness 15 Å, but dropped dramatically for the group of largest interfaces (0% with pyDock and Closeness 15 Å). Notably, for this group pyDockCloseness showed a top 10 success rate of 33.3%, emphasizing the complementarity of both individual scoring functions. It is also interesting that topological network parameters can give such predictive trends, similar to energy-based functions.
In this work, we have shown that network topology values can be used to identify binding regions in proteins. Predictions were significantly better than random in all complex types except in the antibody/antigen cases, where the highest closeness values were generally found in the concave surface formed by the two antibody chains. We have also analyzed in detail the potential use of such network topology parameters as scoring functions to identify near-native docking poses according to different interface definitions. Good performance was achieved for small, oblate and enzyme proteins, similar to that of physics-based methods like pyDock. However, the results from both types of scoring functions were found to be complementary and synergistic. Thus, the combination of the network-based scoring Closeness 15 Å and pyDock improved the top 10 success rate of the latter by 36%, as tested in the most updated standard benchmark. This improvement was much larger for oblate proteins, complexes with large interfaces and cases classified as "other", in which energy-based pyDock typically had the worst results. More importantly, the coarse-grained representation in the network-based scoring made it possible to improve the predictive success in the most challenging type of docking cases, that is, those with significant conformational changes upon binding. Although this approach has limitations in cases with certain topological features, like spherical or very large proteins, we have shown here its potential applications for docking as a complement to energy-based approaches.
Representation of proteins as residue networks
In this work, unbound proteins were modeled as topological networks as follows. The nodes represented the Cα atoms of all the residues in a protein, and the edges the residues in contact (i.e. those whose Cα atoms were within 8.5 Å distance). To construct the graph topology and calculate the four centrality parameters analyzed in this work, we used the NetworkX Python package. For comparison purposes, we also generated topological networks based on the Cβ atoms instead (Cα for glycine). The resulting networks were very similar (see additional file 1: Figure S8) and the predictions from these networks were virtually the same (see Results).
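As an illustration of the network construction just described, the following sketch builds a residue contact network from Cα coordinates with the 8.5 Å cutoff. The original work used NetworkX; this dependency-free version, its function names, and the idealized coordinates are assumptions for illustration. It requires Python 3.8 or later for `math.dist`.

```python
import math


def build_residue_network(ca_coords, cutoff=8.5):
    """Build an undirected contact network from Calpha coordinates.

    ca_coords is a hypothetical dict mapping residue id -> (x, y, z) in Angstrom.
    Two residues are connected when their Calpha atoms lie within the cutoff.
    Returns an adjacency dict mapping each residue to the set of its contacts.
    """
    adjacency = {res: set() for res in ca_coords}
    residues = list(ca_coords)
    for i, res_a in enumerate(residues):
        for res_b in residues[i + 1:]:
            if math.dist(ca_coords[res_a], ca_coords[res_b]) <= cutoff:
                adjacency[res_a].add(res_b)
                adjacency[res_b].add(res_a)
    return adjacency


def degree(adjacency):
    """Number of contacts (edges) of every residue."""
    return {res: len(neighbors) for res, neighbors in adjacency.items()}


# Toy chain: four Calpha atoms spaced 3.8 Angstrom apart along x (made-up geometry).
coords = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (7.6, 0.0, 0.0), 4: (11.4, 0.0, 0.0)}
network = build_residue_network(coords)
degrees = degree(network)
```

With NetworkX, the same adjacency could be loaded into an `nx.Graph` and the parameters obtained with `nx.closeness_centrality`, `nx.betweenness_centrality` and `nx.clustering`.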
The closeness of a node x was defined as

$$\mathrm{closeness}(x) = \frac{N - 1}{\sum_{y \neq x} d(x, y)}$$

where N is the total number of nodes in the network and d(x, y) is the shortest path distance between node x and any other node y. Thus, the closeness of node x is the inverse of the average distance to all other nodes. The three remaining network parameters are defined as follows: for any node x, degree is the number of edges incident to that node, betweenness is the sum of the fraction of all shortest paths between any two nodes that pass through x, and clustering is the fraction of contacts that exist between its neighbors (i.e. the number of triangles through x) relative to the maximum possible contacts between them. In this work we report the inverted clustering value (1/clustering) so that higher scores correlate with protein binding sites.
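The parameter definitions above can be sketched in plain Python, taking closeness as the inverse of the average shortest-path distance to all other nodes. The adjacency-dict representation and function names are assumptions of this sketch, the network is assumed to be connected, and returning infinity for a zero clustering value is an illustrative choice that is not stated in the text. Betweenness is omitted for brevity.

```python
from collections import deque
from itertools import combinations


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: number of edges from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness(adjacency, node):
    """(N - 1) divided by the summed shortest-path distances (connected graph assumed)."""
    total = sum(shortest_path_lengths(adjacency, node).values())
    return (len(adjacency) - 1) / total if total else 0.0


def clustering(adjacency, node):
    """Fraction of existing contacts among the neighbors of node."""
    neighbors = list(adjacency[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return 2.0 * links / (k * (k - 1))


def inverted_clustering(adjacency, node):
    """1/clustering; infinity for zero clustering is an assumption of this sketch."""
    value = clustering(adjacency, node)
    return 1.0 / value if value > 0 else float("inf")


# Toy four-residue network (made-up contacts).
toy = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
closeness_1 = closeness(toy, 1)
closeness_2 = closeness(toy, 2)
inv_clust_2 = inverted_clustering(toy, 2)
```

In this toy network, residue 2 touches every other residue and therefore has the highest possible closeness, while residue 1 needs two steps to reach residue 4.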
We used the standard protein-protein docking benchmark 3.0 for (i) the assessment of the use of network-based parameters for binding site prediction, (ii) the comparison of the different topological parameters for docking scoring, and (iii) the training of the optimal balance between pyDock and the network-based scoring. The new cases in benchmark 4.0 (the latest so far) were used to validate the optimal balance found between pyDock and the network-based scoring. Benchmark 4.0 (which includes the cases of benchmark 3.0) was used for the performance analysis.
Generation of docking poses
We used FTDock with standard parameters (using electrostatics and 0.7 Å grid resolution) to generate 10,000 rigid-body docking poses for the 176 unbound cases of the latest standard protein-protein docking benchmark. A docking pose was considered a near-native solution if its ligand Cα-RMSD with respect to the crystal structure was below 10 Å. The success rate for the top 10 predictions was calculated as the percentage of cases in the benchmark that had a near-native solution within the first 10 predictions. For this calculation, only the cases for which FTDock generated at least one near-native solution were considered (103 for benchmark 3.0 and 141 for benchmark 4.0).
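The success-rate calculation described above can be written as a short sketch. The input layout (one list of ligand RMSD values per case, ordered from best to worst score) and the function name are assumptions made for illustration.

```python
def top_n_success_rate(cases, n=10, rmsd_cutoff=10.0):
    """Percentage of cases with a near-native pose among the first n predictions.

    cases is a hypothetical dict: case id -> list of ligand RMSD values (Angstrom)
    ordered by rank. Cases without any near-native pose in the whole list are
    left out of the calculation, as described in the text.
    """
    considered = 0
    hits = 0
    for rmsds in cases.values():
        if not any(r < rmsd_cutoff for r in rmsds):
            continue  # docking generated no near-native solution for this case
        considered += 1
        if any(r < rmsd_cutoff for r in rmsds[:n]):
            hits += 1
    return 100.0 * hits / considered if considered else None


# Toy ranking lists (made-up RMSD values); case C has no near-native pose at all.
toy_cases = {
    "A": [25.0, 8.0, 30.0],
    "B": [40.0, 35.0, 9.5],
    "C": [50.0, 60.0, 70.0],
}
rate_top2 = top_n_success_rate(toy_cases, n=2)
```

Here case A succeeds within the top 2, case B has its only near-native pose at rank 3, and case C is excluded, giving a 50% success rate.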
Scoring by network parameters
pyDock is a scoring function that evaluates the binding energy of rigid-body docking poses, taking into account the contributions of the desolvation, electrostatics and van der Waals energy terms. The desolvation is ASA-based and uses atomic solvation parameters. Coulombic electrostatics is calculated with a distance-dependent dielectric constant, and individual contributions are truncated to ±1 kcal/mol to avoid artificially high scores from models with overlapping proteins. The van der Waals term is based on a 6-12 Lennard-Jones potential, weighted to 0.1. Interatomic potentials are truncated to +1 kcal/mol to avoid excessive penalization of models containing clashes. To calculate the electrostatics and the van der Waals terms, AMBER94 parameters are used. This scoring function showed excellent results in several CAPRI rounds [37, 38] and in external benchmarks.
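The truncation and weighting of the energy terms described above can be illustrated with the sketch below. It is not the pyDock code: the function names are hypothetical, the pairwise energies are taken as precomputed inputs, and applying the truncation to each pairwise contribution is an assumption of this sketch.

```python
def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def truncated_electrostatics(pair_energies, limit=1.0):
    """Sum of electrostatic contributions, each truncated to +/- limit kcal/mol."""
    return sum(clamp(e, -limit, limit) for e in pair_energies)


def truncated_vdw(pair_energies, upper=1.0):
    """Sum of Lennard-Jones contributions, each capped at +upper kcal/mol."""
    return sum(min(e, upper) for e in pair_energies)


def pydock_like_energy(desolvation, elec_pairs, vdw_pairs, vdw_weight=0.1):
    """Desolvation + truncated electrostatics + weighted truncated van der Waals."""
    return (desolvation
            + truncated_electrostatics(elec_pairs)
            + vdw_weight * truncated_vdw(vdw_pairs))


# Toy pose (made-up energies in kcal/mol): one strong attraction and one clash.
energy = pydock_like_energy(
    desolvation=-2.0,
    elec_pairs=[-3.0, 0.5, 2.0],   # truncated to -1.0, 0.5 and 1.0
    vdw_pairs=[-0.2, 50.0],        # the clash is capped at +1.0
)
```

The caps keep a single clashing atom pair from dominating the score of an otherwise reasonable rigid-body pose.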
Combining pyDock and network-based scoring
where the rank term was defined as the best rank of a near-native solution (ligand RMSD < 10 Å) for the benchmark case m, using w to balance the Closeness scoring in the pyDockCloseness function. Values of w ranging from 0.0 to 2.0 with a step of 0.05 were tested to determine the lowest value of F(w).
In order to prevent overfitting, we validated the predictions on the subset of benchmark 4.0 that was not used for the training of w. In addition, we performed a leave-one-out cross-validation to ensure the optimized parameter was robust to permutations. The process consisted in calculating w using all the cases of the training set except one, which was then used for validation. This was repeated in a way that each case in the training set was used once for validation.
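The weight scan and the leave-one-out procedure can be sketched as follows. Several details are assumptions of this sketch and are not given in the text: F(w) is taken as the sum over cases of the best near-native rank, the combined score is taken as the pyDock energy plus w times a network term oriented so that lower is better, and each pose is a hypothetical tuple (pyDock energy, network term, near-native flag).

```python
def best_near_native_rank(poses, w):
    """Rank (1-based) of the best-scored near-native pose for a given weight w.

    Assumed combined score: pydock_energy + w * network_term, lower is better.
    """
    ranked = sorted(poses, key=lambda p: p[0] + w * p[1])
    for rank, pose in enumerate(ranked, start=1):
        if pose[2]:
            return rank
    return len(ranked) + 1  # only reached if the case has no near-native pose


def objective(cases, w):
    """Assumed form of F(w): sum of best near-native ranks over all cases."""
    return sum(best_near_native_rank(poses, w) for poses in cases.values())


def weight_grid(start=0.0, stop=2.0, step=0.05):
    """Weights from start to stop inclusive, rounded to avoid float drift."""
    n_steps = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


def optimize_weight(cases):
    """Weight with the lowest objective (the smallest such weight on ties)."""
    return min(weight_grid(), key=lambda w: objective(cases, w))


def leave_one_out(cases):
    """Train w on all cases but one, then rank the held-out case with that w."""
    results = {}
    for held_out in cases:
        training = {k: v for k, v in cases.items() if k != held_out}
        w = optimize_weight(training)
        results[held_out] = (w, best_near_native_rank(cases[held_out], w))
    return results


# Toy training set (made-up scores): case A only succeeds once w exceeds 0.42.
toy_cases = {
    "A": [(-10.0, 0.0, False), (-7.9, -5.0, True)],
    "B": [(-5.0, -1.0, True), (-4.0, 0.0, False)],
}
best_w = optimize_weight(toy_cases)
loo = leave_one_out(toy_cases)
```

Comparing the weights obtained across the held-out runs gives a rough indication of how stable the optimized parameter is with respect to the composition of the training set.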
FG would like to thank Prof. Yael Mandel-Gutfreund and Dr. Hilda David-Eden from the Technion - Israel Institute of Technology, for useful discussions and suggestions. JFR acknowledges financial support from the Spanish Ministry of Science (grant BIO2010-22324).
1. Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, Klitgord N, Simon C, Boxem M, Milstein S, Rosenberg J, Goldberg DS, Zhang LV, Wong SL, Franklin G, Li S, Albala JS, Lim J, Fraughton C, Llamosas E, Cevik S, Bex C, Lamesch P, Sikorski RS, Vandenhaute J, Zoghbi HY, et al.: Towards a proteome-scale map of the human protein-protein interaction network. Nature 2005, 437: 1173–1178. 10.1038/nature04209
2. Aloy P, Russell RB: Structural systems biology: modelling protein interactions. Nat Rev Mol Cell Biol 2006, 7: 188–197. 10.1038/nrm1859
3. Russell RB, Alber F, Aloy P, Davis FP, Korkin D, Pichaud M, Topf M, Sali A: A structural perspective on protein-protein interactions. Curr Opin Struct Biol 2004, 14: 313–324. 10.1016/j.sbi.2004.04.006
4. Robinson CV, Sali A, Baumeister W: The molecular sociology of the cell. Nature 2007, 450: 973–982. 10.1038/nature06523
5. Alber F, Förster F, Korkin D, Topf M, Sali A: Integrating diverse data for structure determination of macromolecular assemblies. Annu Rev Biochem 2008, 77: 443–477. 10.1146/annurev.biochem.77.060407.135530
6. Ritchie DW: Recent progress and future directions in protein-protein docking. Curr Protein Pept Sci 2008, 9: 1–15. 10.2174/138920308783565741
7. Gabb HA, Jackson RM, Sternberg MJ: Modelling protein docking using shape complementarity, electrostatics and biochemical information. J Mol Biol 1997, 272: 106–120. 10.1006/jmbi.1997.1203
8. Kozakov D, Brenke R, Comeau SR, Vajda S: PIPER: an FFT-based protein docking program with pairwise potentials. Proteins 2006, 65: 392–406. 10.1002/prot.21117
9. Mintseris J, Pierce B, Wiehe K, Anderson R, Chen R, Weng Z: Integrating statistical pair potentials into protein complex prediction. Proteins 2007, 69: 511–520. 10.1002/prot.21502
10. Garzon JI, Lopéz-Blanco JR, Pons C, Kovacs J, Abagyan R, Fernandez-Recio J, Chacon P: FRODOCK: a new approach for fast rotational protein-protein docking. Bioinformatics 2009, 25: 2544–2551. 10.1093/bioinformatics/btp447
11. Moont G, Gabb HA, Sternberg MJ: Use of pair potentials across protein interfaces in screening predicted docked complexes. Proteins 1999, 35: 364–373. 10.1002/(SICI)1097-0134(19990515)35:3<364::AID-PROT11>3.0.CO;2-4
12. Zacharias M: ATTRACT: protein-protein docking in CAPRI using a reduced protein model. Proteins 2005, 60: 252–256. 10.1002/prot.20566
13. Pons C, Talavera D, de la Cruz X, Orozco M, Fernandez-Recio J: Scoring by Intermolecular Pairwise Propensities of Exposed Residues (SIPPER): A New Efficient Potential for Protein-Protein Docking. J Chem Inf Model 2011, 51: 370–377. 10.1021/ci100353e
14. Vendruscolo M, Dokholyan NV, Paci E, Karplus M: Small-world view of the amino acids that play a key role in protein folding. Phys Rev E Stat Nonlin Soft Matter Phys 2002, 65: 061910.
15. Greene LH, Higman VA: Uncovering network systems within protein structures. J Mol Biol 2003, 334: 781–791. 10.1016/j.jmb.2003.08.061
16. Atilgan AR, Akan P, Baysal C: Small-world communication of residues and significance for protein dynamics. Biophys J 2004, 86: 85–91. 10.1016/S0006-3495(04)74086-2
17. Bagler G, Sinha S: Network properties of protein structures. Physica A: Statistical Mechanics and its Applications 2005, 346: 27–33. 10.1016/j.physa.2004.08.046
18. Watts DJ, Strogatz SH: Collective dynamics of "small-world" networks. Nature 1998, 393: 440–442. 10.1038/30918
19. del Sol A, O'Meara P: Small-world network approach to identify key residues in protein-protein interaction. Proteins 2005, 58: 672–682.
20. del Sol A, Fujihashi H, Amoros D, Nussinov R: Residue centrality, functionally important residues, and active site shape: analysis of enzyme and non-enzyme families. Protein Sci 2006, 15: 2120–2128. 10.1110/ps.062249106
21. Sathyapriya R, Vijayabaskar MS, Vishveshwara S: Insights into protein-DNA interactions through structure network analysis. PLoS Comput Biol 2008, 4: e1000170. 10.1371/journal.pcbi.1000170
22. Maetschke SR, Yuan Z: Exploiting structural and topological information to improve prediction of RNA-protein binding sites. BMC Bioinformatics 2009, 10: 341. 10.1186/1471-2105-10-341
23. Amitai G, Shemesh A, Sitbon E, Shklar M, Netanely D, Venger I, Pietrokovski S: Network analysis of protein structures identifies functional residues. J Mol Biol 2004, 344: 1135–1146. 10.1016/j.jmb.2004.10.055
24. Hu Z, Bowen D, Southerland WM, del Sol A, Pan Y, Nussinov R, Ma B: Ligand binding and circular permutation modify residue interaction network in DHFR. PLoS Comput Biol 2007, 3: e117. 10.1371/journal.pcbi.0030117
25. Sathyapriya R, Duarte JM, Stehr H, Filippis I, Lappe M: Defining an essence of structure determining residue contacts in proteins. PLoS Comput Biol 2009, 5: e1000584. 10.1371/journal.pcbi.1000584
26. Montiel Molina HM, Millán-Pacheco C, Pastor N, del Rio G: Computer-based screening of functional conformers of proteins. PLoS Comput Biol 2008, 4: e1000009. 10.1371/journal.pcbi.1000009
27. Konrat R: The protein meta-structure: a novel concept for chemical and molecular biology. Cell Mol Life Sci 2009, 66: 3625–3639. 10.1007/s00018-009-0117-0
28. David-Eden H, Mandel-Gutfreund Y: Revealing unique properties of the ribosome using a network based analysis. Nucleic Acids Res 2008, 36: 4641–4652. 10.1093/nar/gkn433
29. Chang S, Jiao X, Li C-hua, Gong X-qi, Chen W-zu, Wang C-xin: Amino acid network and its scoring application in protein-protein docking. Biophys Chem 2008, 134: 111–118. 10.1016/j.bpc.2007.12.005
30. Cheng TM-K, Blundell TL, Fernandez-Recio J: pyDock: electrostatics and desolvation for effective scoring of rigid-body protein-protein docking. Proteins 2007, 68: 503–515. 10.1002/prot.21419
31. Hwang H, Pierce B, Mintseris J, Janin J, Weng Z: Protein-protein docking benchmark version 3.0. Proteins 2008, 73: 705–709. 10.1002/prot.22106
32. Hwang H, Vreven T, Janin J, Weng Z: Protein-protein docking benchmark version 4.0. Proteins 2010, 78: 3111–3114. 10.1002/prot.22830
33. Pons C, Grosdidier S, Solernou A, Pérez-Cano L, Fernández-Recio J: Present and future challenges and limitations in protein-protein docking. Proteins 2010, 78: 95–108. 10.1002/prot.22564
34. Mitternacht S, Berezovsky IN: A geometry-based generic predictor for catalytic and allosteric sites. Protein Eng Des Sel 2011, 24: 405–409. 10.1093/protein/gzq115
35. Dokholyan NV, Li L, Ding F, Shakhnovich EI: Topological determinants of protein folding. Proc Natl Acad Sci USA 2002, 99: 8637–8641. 10.1073/pnas.122076099
36. Hagberg AA, Schult DA, Swart PJ: Exploring network structure, dynamics, and function using NetworkX. 2008.
37. Grosdidier S, Pons C, Solernou A, Fernández-Recio J: Prediction and scoring of docking poses with pyDock. Proteins 2007, 69: 852–858. 10.1002/prot.21796
38. Pons C, Solernou A, Perez-Cano L, Grosdidier S, Fernandez-Recio J: Optimization of pyDock for the new CAPRI challenges: Docking of homology-based models, domain-domain assembly and protein-RNA binding. Proteins 2010, 78: 3182–3188. 10.1002/prot.22773
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.