Protein docking prediction using predicted protein-protein interface
© Li and Kihara; licensee BioMed Central Ltd. 2012
Received: 26 August 2011
Accepted: 10 January 2012
Published: 10 January 2012
Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein docking prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within the top ranks among alternative conformations.
We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies from case to case, the challenge is to develop a method which does not deteriorate but improves docking results by using a binding site prediction which may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pairwise protein docking prediction algorithm, LZerD, which we have developed earlier. PI-LZerD starts by performing docking prediction using the provided protein-protein binding interface prediction as constraints, which is followed by a second round of docking with updated docking interface information to further improve the docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves the docking prediction accuracy as compared with docking without using binding site prediction or using the binding site prediction as post-filtering.
We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy over alternative methods in the series of benchmark experiments including docking using actual docking interface site predictions as well as unbound docking cases.
Many important cellular processes, such as gene expression regulation and transport, are carried out by protein complexes [1–3]. The importance and the abundance of protein interactions and complexes have recently been further highlighted by large-scale protein-protein interaction maps revealed for many organisms [4–7]. The tertiary structure of proteins is necessary for understanding the underlying molecular mechanism of protein interaction; however, it is often difficult to obtain complex structures by experimental methods, e.g. X-ray crystallography or NMR. Thus, experimentally solved protein complex structures account for only a small fraction of the known protein complexes confirmed by biochemical experiments. Therefore, an important task in bioinformatics is to develop efficient and accurate computational methods for predicting protein-protein docking conformations.
Many protein-protein docking methods have been developed in the past employing various ideas and techniques [8–20]. Typically a docking prediction for a pair of proteins produces a few thousand docking conformations (docking decoys), which are subject to ranking using a scoring function. Conformational search algorithms employed include the Fast Fourier Transform (FFT) [16, 17, 21], geometric hashing [18, 22], Monte Carlo algorithms, genetic algorithms [23, 24], and Langevin dynamics. For scoring a docking decoy, usually several terms are combined, which include physics-based scores and those concerning geometrical shape complementarity [18, 27, 28]. Clustering of docking decoys has also been shown to be effective in selecting near native conformations [29–31]. Some of the recent docking algorithms have more elaborate procedures, for example, considering alternative conformations of flexible protein chains or post-docking optimization steps [14, 33]. Nevertheless, despite significant efforts in developing methods, it is still difficult to identify and rank the correct conformations in the top ranks among hundreds of decoys [18, 27, 34], as is also evidenced by results from the recent Critical Assessment of Prediction of Interactions (CAPRI), a community-wide experiment on the comparative evaluation of protein-protein docking methods.
The accuracy of docking prediction could improve when a part, even if not all, of the protein-protein interface (PPI) residues are known. PPI residues for a pair of interacting proteins can be identified by experiments including point mutation such as alanine scanning [35–38], chemical modification of residues [39, 40], NMR, hydrogen/deuterium exchange, and disulfide cross-linking. If several PPI residues are known, they can be simply used for filtering, i.e. to select docking decoys which have the known PPI residues at their docking interface [44, 45]. Alternatively, known PPI residues from interacting proteins can be incorporated as distance constraints. However, experimental methods are time consuming. This is particularly true if identification of a whole PPI region of an interacting protein pair is attempted or if an investigation of many interacting proteins in a network is planned.
PPI residues can also be predicted by computational methods, which capture sequence and structural features of PPI regions. A number of PPI site prediction methods have been developed. Sequence features used for PPI site prediction include amino acid residue propensity [46–52], sequence conservation [53–57], and correlated mutation [58–60]. Structural information used includes hydrophobic patches, the secondary structure propensity, atom group propensity, relative accessible surface area, geometrical surface shape, the crystallographic B-factor, and energetic characteristics of PPI residues [62, 63]. Current protein interface prediction methods choose one or a combination of these features to construct scoring functions for machine learning techniques [51, 55, 56, 64–67]. The development of PPI site prediction methods has been surveyed in recent review articles [68, 69]. The obvious advantage of the computational methods over experimental methods is that the former can be performed much faster than the latter. However, the problem of computational prediction methods is that they are not always accurate. For example, the meta-PPISP method, one of the state-of-the-art methods, predicts PPI residues on average with a precision of 50% at a coverage of 50% for enzyme-inhibitor complexes. Moreover, the prediction accuracy varies depending on the target proteins and thus it is difficult to estimate the accuracy for individual cases. Therefore, computational PPI residue prediction cannot be reliably used for simple post-filtering of docking decoys. A naive use of PPI residue prediction for post-filtering may actually decrease the prediction accuracy, as we will show in Results.
Here, we present a novel protein docking algorithm, PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), which utilizes imperfect PPI residue prediction for guiding protein-protein docking. PI-LZerD performs iterative improvement of docking results starting from an initial run of docking that uses potentially inaccurate PPI prediction as restraints. The base docking algorithm is LZerD (Local 3D Zernike descriptor-based Docking algorithm), which we have developed previously. The idea of using additional predicted information for aiding protein docking has been explored in a few previous works. In those works, PPI information is used for post-filtering docking decoys [16, 71–73] or incorporated as an additional scoring term [14, 45, 74, 75]. Compared to these related works, the current work differs significantly in its design and in some important aspects. First, we have developed a novel algorithm which is specifically designed to utilize imperfect PPI prediction. Thus, we do not use PPI information simply for post-filtering. Second, we perform a thorough investigation of how the accuracy of PPI prediction affects the docking prediction accuracy. PI-LZerD is shown to consistently improve docking predictions when actual PPI predictions are used for unbound docking cases. The datasets used and the developed PI-LZerD program are made freely available to the academic community.
We start with a brief explanation of the original LZerD pairwise protein-protein docking algorithm. As will be explained in the next section, PI-LZerD performs an iterative use of a modified version of LZerD. LZerD takes two protein tertiary structures (Protein Data Bank, PDB, files) as input (termed a ligand and a receptor protein) and outputs on the order of 30,000 to 50,000 docking decoys ranked by a scoring function. The geometric hashing algorithm is used for the docking conformational search.
The scoring function is a weighted sum of the following terms: a van der Waals term, in which the repulsive and attractive parts are considered separately; an electrostatics term, which considers repulsive/attractive and short-range/long-range contributions separately; a hydrogen and disulfide bond term; two solvation terms [80, 81]; and a knowledge-based atom contact term. Weighting factors for the linear combination of the terms were trained on two datasets, the protein-protein docking benchmark 2.0, which contains 84 pairwise unbound-unbound and bound-unbound docking structures, and 851 protein-protein dimeric complexes compiled by Huang and Zou. The combination of weight values was determined by using logistic regression with the interface root mean square deviation (iRMSD) between predicted decoys and the native structure as the target function to be optimized.
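The linear combination described above can be illustrated with a short sketch. The term names and weight values below are hypothetical placeholders chosen for illustration only; the trained weights and the exact decomposition of the terms are not given in the text.

```python
from typing import Dict

# Hypothetical term names and weights (NOT the trained LZerD values).
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "vdw_attractive": 1.0,
    "vdw_repulsive": 0.5,
    "electrostatics": 0.3,
    "hbond_disulfide": 0.8,
    "solvation_1": 0.2,
    "solvation_2": 0.2,
    "atom_contact": 0.6,
}


def weighted_score(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Linear combination of score terms for one decoy (lower is better)."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing score terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)
```

Decoys would then be sorted in ascending order of this value, since a lower score is treated as better.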
We modified the LZerD algorithm so that additional information on a PPI region can be used to restrict the docking search space. Figure 1B illustrates the two methods of restricting the conformational search space in geometric hashing. Given a set of (predicted) PPI residues in a ligand or a receptor protein, each surface point is classified as either PPI (points within the gray ellipsoid in Figure 1B) or non-PPI depending on whether the closest atom to the point belongs to a PPI residue or not. In geometric hashing, two base points (two crosses) are selected to define a reference coordinate system, based on which the other local points are transformed. Base points are selected only from the PPI surface points for both ligand and receptor proteins. Then, in the voting stage, matching points between the ligand and receptor are counted either only from the PPI surface points (i.e. matches are only considered within the predicted PPI regions; triangles in the region in gray in Figure 1B) or from all the surface points (triangles and squares) including non-PPI points. The former seeks geometrical complementarity of the two proteins only at the predicted PPI regions, while the latter explores a wider surface area to identify well-fitting docking conformations in the vicinity of the predicted PPI regions. PI-LZerD uses these two search areas in different stages of the docking conformation search. The more permissive search area is considered for the initial LZerD run and the more restricted searches are performed for the subsequent iterations.
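As a minimal sketch of the surface point classification and of the two voting modes, assuming surface points and atoms are given as plain coordinate tuples (the function names and data layout are illustrative and are not taken from the LZerD code):

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float, float]


def classify_surface_points(
    surface_points: Sequence[Point],
    atom_coords: Sequence[Point],
    atom_residue_ids: Sequence[int],
    ppi_residues: Set[int],
) -> List[bool]:
    """Label a surface point True if its closest atom belongs to a PPI residue."""
    labels = []
    for p in surface_points:
        closest = min(range(len(atom_coords)),
                      key=lambda i: math.dist(p, atom_coords[i]))
        labels.append(atom_residue_ids[closest] in ppi_residues)
    return labels


def voting_point_indices(labels: Sequence[bool], restrictive: bool) -> List[int]:
    """Points counted in the voting stage.

    restrictive=True  : only PPI surface points (later iterations).
    restrictive=False : all surface points (initial, permissive run).
    Base points are drawn from the PPI points in both modes.
    """
    if restrictive:
        return [i for i, is_ppi in enumerate(labels) if is_ppi]
    return list(range(len(labels)))
```

The brute-force nearest-atom search is quadratic; a spatial index would be preferable for real protein surfaces.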
The 1000 decoys are subject to clustering by considering the similarity of docking interface regions. For a given pair of docking decoys, common atoms between the two PPI regions from the two decoys are selected. Then, the RMSD is computed for the common atoms only when the common atoms account for more than 60% of all interface atoms of both PPI sites (if the common atoms do not exceed 60%, the two decoys are not clustered together). We call it the common interface RMSD (ciRMSD) of two docking decoys. The ciRMSD is more suitable for the PI-LZerD algorithm than the conventional coordinate RMSD or the ligand RMSD, since it focuses on capturing the similarity of docking interface regions.
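The ciRMSD can be sketched as follows. Two points are assumptions rather than statements from the text: the 60% criterion is read as applying to each decoy's interface separately, and the coordinates of the two decoys are assumed to be in the same reference frame so that no superposition is needed. The atom key is a hypothetical identifier.

```python
import math
from typing import Dict, Optional, Tuple

Coord = Tuple[float, float, float]
AtomKey = Tuple[str, int, str]  # hypothetical: (chain, residue number, atom name)


def ci_rmsd(
    iface_a: Dict[AtomKey, Coord],
    iface_b: Dict[AtomKey, Coord],
    min_fraction: float = 0.6,
) -> Optional[float]:
    """Common interface RMSD of two decoys, or None if they share too few atoms."""
    if not iface_a or not iface_b:
        return None
    common = iface_a.keys() & iface_b.keys()
    # Common atoms must exceed 60% of the interface atoms of BOTH decoys.
    if (len(common) <= min_fraction * len(iface_a)
            or len(common) <= min_fraction * len(iface_b)):
        return None
    squared = sum(math.dist(iface_a[k], iface_b[k]) ** 2 for k in common)
    return math.sqrt(squared / len(common))
```

Returning `None` for pairs below the overlap threshold lets a clustering routine treat them as never belonging to the same cluster.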
Once the ciRMSD is computed for all the pairs of decoys, 60 decoys are selected by considering the physics-based score and the cluster size of the decoys. First, the decoy with the lowest score (the lower, the better) is selected and close decoys with a ciRMSD ≤ 4.0 Å are discarded from the pool. This process is repeated until 30 decoys are identified. Next, an additional 30 decoys are selected based on the cluster size. For each of the decoys, the number of the other decoys within 4.0 Å ciRMSD is computed. Then, the largest cluster (i.e. the center decoy with the largest number of close decoys) is selected. If several clusters of the same size are found, the one whose center decoy has the lowest physics-based score is selected. All the decoys in the cluster are removed, and the process is repeated until 30 representative decoys are selected. Consequently, 30 decoys are selected based on the lowest energy and 30 more decoys are selected based on the cluster size. It was shown that combining the energy value and the cluster size can find more hits than using a single metric alone (Additional file 1, Figure S1).
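The two selection stages can be sketched as two greedy loops. This is an illustration under assumptions: the distance callback returns `None` for pairs that are not comparable, neighbor counts in the second stage are recomputed on the remaining pool after each removal, and the pool passed to each function is left to the caller (the text does not say whether the second stage reuses the full set of 1000 decoys).

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical callback: ciRMSD between decoys i and j, or None if not comparable.
Distance = Callable[[int, int], Optional[float]]


def _close(dist: Distance, i: int, j: int, cutoff: float) -> bool:
    d = dist(i, j)
    return d is not None and d <= cutoff


def select_by_energy(scores: Sequence[float], dist: Distance,
                     n: int = 30, cutoff: float = 4.0) -> List[int]:
    """Repeatedly take the lowest-score decoy and discard its close neighbors."""
    pool = sorted(range(len(scores)), key=lambda i: scores[i])
    picked: List[int] = []
    while pool and len(picked) < n:
        best = pool[0]
        picked.append(best)
        pool = [i for i in pool[1:] if not _close(dist, best, i, cutoff)]
    return picked


def select_by_cluster_size(scores: Sequence[float], dist: Distance,
                           n: int = 30, cutoff: float = 4.0) -> List[int]:
    """Repeatedly take the center of the largest cluster (ties: lowest score)."""
    pool = list(range(len(scores)))
    picked: List[int] = []
    while pool and len(picked) < n:
        members = {c: [i for i in pool if i != c and _close(dist, c, i, cutoff)]
                   for c in pool}
        center = max(pool, key=lambda c: (len(members[c]), -scores[c]))
        picked.append(center)
        removed = set(members[center]) | {center}
        pool = [i for i in pool if i not in removed]
    return picked
```

The second loop recomputes all neighbor lists on every pass, which is simple but slow; caching a pairwise ciRMSD matrix would be the natural optimization for 1000 decoys.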
The selected 60 decoys are passed to the subsequent process. For each of the 60 docking decoys, PPI residues are extracted. PPI residues are defined as those which have a heavy atom closer than 5.0 Å to any atom of the docking partner. The decoys do not necessarily have the same PPI region as the initially provided PPI information because the modified LZerD has explored the vicinity of the input PPI in the docking conformation search. Using the identified PPI residues as the updated constraint, the modified LZerD is run for the second time. In this round, only the PPI surface points are considered at the voting stage in the geometric hashing (the restrictive search). From the resulting docking decoys, the top 1000 lowest energy docking decoys are clustered based on ciRMSD, and the cluster centers are sorted by the physics-based score. Since the modified LZerD is run for each of the 60 decoys, a total of 60 LZerD runs are performed.
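The interface definition used here (a heavy atom closer than 5.0 Å to any atom of the partner) can be sketched as below. The atom representation is a simplified, hypothetical tuple rather than a parsed PDB record.

```python
import math
from typing import Iterable, Set, Tuple

# Hypothetical atom record: (residue id, element symbol, xyz coordinates).
Atom = Tuple[int, str, Tuple[float, float, float]]


def interface_residues(protein: Iterable[Atom], partner: Iterable[Atom],
                       cutoff: float = 5.0) -> Set[int]:
    """Residues of `protein` with a heavy atom closer than `cutoff` to any partner atom."""
    partner_xyz = [xyz for _, _, xyz in partner]
    found: Set[int] = set()
    for res_id, element, xyz in protein:
        if element == "H" or res_id in found:
            continue  # skip hydrogens and residues already identified
        if any(math.dist(xyz, q) < cutoff for q in partner_xyz):
            found.add(res_id)
    return found
```

Calling the function once with the receptor as `protein` and once with the ligand as `protein` gives the updated PPI residue sets for both sides of a decoy.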
In addition to the 60 runs of the modified LZerD, we run the original LZerD without using predicted PPI information, followed by post-filtering using the predicted PPI residues (the naive-filtering method; the left branch of Figure 2). In the naive-filtering method, docking decoys are sorted not by the physics-based score but by the agreement of the docking interface residues with the predicted PPI residues. Therefore, the overall procedure produces 61 runs of docking predictions, i.e. 61 ranked lists of docking decoys. To make the final ranking of docking decoys, first, the top ranked docking decoys from each of the 61 lists are ranked by the physics-based score, and then the decoys at each subsequent rank from the 61 lists are ranked in the same way. Thus, the decoys from all the lists are first sorted by their ranks in each list and then sorted by the physics-based score. If identical decoys appear, the one which is ranked lower in the entire final list is removed (it is not common but possible that identical docking decoys appear in different LZerD runs).
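The merging of the ranked lists can be sketched as a sort on (rank within list, physics-based score) followed by duplicate removal. How identical decoys are recognized is not specified in the text, so a hypothetical decoy identifier is assumed here.

```python
from typing import Hashable, List, Sequence, Tuple

# Hypothetical decoy record: (decoy identifier, physics-based score).
Decoy = Tuple[Hashable, float]


def merge_ranked_lists(ranked_lists: Sequence[Sequence[Decoy]]) -> List[Decoy]:
    """Merge per-run rankings: sort by rank in each list, then by score."""
    tagged = [
        (rank, score, decoy_id)
        for decoys in ranked_lists
        for rank, (decoy_id, score) in enumerate(decoys)
    ]
    tagged.sort(key=lambda t: (t[0], t[1]))  # lower rank first, then lower score
    seen = set()
    merged: List[Decoy] = []
    for _, score, decoy_id in tagged:
        if decoy_id in seen:
            continue  # drop the copy that appears lower in the final list
        seen.add(decoy_id)
        merged.append((decoy_id, score))
    return merged
```

With 61 input lists, the first 61 entries of the merged list (before duplicate removal) are the top decoys of each run, ordered by score.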
The first dataset we use for benchmarking PI-LZerD is the protein-protein docking benchmark version 3.0 with 124 bound cases. The average length of the proteins is 256 residues and the number of docking interface residues of the proteins ranges from 10 to 70 with an average of 25.
To investigate how the accuracy of PPI prediction affects the docking prediction, we first use "simulated" PPI predictions as input. The actual PPI regions of a ligand and a receptor protein are shifted by 5, 10, 12, and 15 residues in two opposite directions on the protein surface along the major axis of the PPI region. To shift a PPI region on the surface, n PPI residues (n = 5, 10, 12, 15) at one end of the PPI site along the axis are removed and the same number of residues are added on the opposite side of the PPI site. Thus, the shifting of PPI regions is done geometrically rather than along the protein sequence (Additional file 1, Figure S2). By combining two shifted PPI regions from a ligand and a receptor protein, four test cases are made for each protein complex (because the PPI region on each protein is shifted in two opposite directions). Protein complexes are removed from the dataset if one of the proteins has a PPI region smaller than the number of shifted residues. The total numbers of tested protein complexes with 5, 10, 12, and 15 residue shifts are 124 (124 × 4 = 496 test cases), 122 (488 cases), 118 (472 cases), and 104 (416 cases), respectively. Since four different combinations of shifted PPI regions of a ligand and a receptor are tested, the number of tested cases is four times the number of protein complexes, as shown in the parentheses.
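A geometric shift of this kind could be sketched as follows. Several details are assumptions made for illustration: each surface residue is represented by a single coordinate, the major axis is taken as the first principal component of the PPI residue coordinates (estimated by power iteration), and the added residues are the non-PPI surface residues on the leading side that are nearest to the PPI residue at the leading end.

```python
import math
from typing import Dict, Sequence, Set, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def major_axis(points: Sequence[Vec], iterations: int = 200) -> Tuple[Vec, Vec]:
    """Centroid and first principal axis of a point set (power iteration)."""
    n = len(points)
    center = tuple(sum(p[k] for p in points) / n for k in range(3))
    centered = [_sub(p, center) for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(3)]
           for i in range(3)]
    v = (1.0, 0.9, 0.8)  # arbitrary start vector
    for _ in range(iterations):
        w = tuple(sum(cov[i][j] * v[j] for j in range(3)) for i in range(3))
        norm = math.sqrt(_dot(w, w))
        if norm == 0.0:
            break
        v = tuple(x / norm for x in w)
    return center, v


def shift_ppi(surface: Dict[int, Vec], ppi: Set[int], n: int,
              direction: int = 1) -> Set[int]:
    """Remove n PPI residues at the trailing end and add n at the leading end."""
    if n > len(ppi):
        raise ValueError("PPI region is smaller than the requested shift")
    center, axis = major_axis([surface[r] for r in ppi])

    def proj(r: int) -> float:
        return direction * _dot(_sub(surface[r], center), axis)

    ordered = sorted(ppi, key=proj)
    kept = set(ordered[n:])
    front = surface[ordered[-1]]  # PPI residue at the leading end
    candidates = [r for r in surface if r not in ppi and proj(r) > 0]
    candidates.sort(key=lambda r: math.dist(surface[r], front))
    return kept | set(candidates[:n])
```

Calling `shift_ppi` with `direction=1` and `direction=-1` for both the receptor and the ligand yields the four combinations per complex described above.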
We also test PI-LZerD using actual PPI predictions from a state-of-the-art PPI prediction method, meta-PPISP. Meta-PPISP is a meta server which combines predictions by three methods, Promate, PINUP, and cons-PPISP. The benchmark dataset is selected from the iPFAM database, a subset of the PFAM database, which provides multiple sequence alignments (MSAs) of interacting proteins. We used iPFAM because meta-PPISP needs an MSA as input. The iPFAM entries were pruned using the following criteria: (1) PFAM families with 20 to 100 seed sequences were selected. (2) PFAM families consisting of local domain sequences were replaced with their corresponding full-length sequences from UniProt. A representative PDB structure was then selected from each PFAM family given by the association in iPFAM. (3) Protein structures that do not have any observable interacting partners in their PDB files were removed. (4) Proteins whose PDB entries contain non-standard amino acids, and obsolete PDB files, were filtered out. (5) PDB structures with antibody-antigen and protein-DNA/RNA interactions were removed. (6) Protein complexes with more than two chains were removed. (7) Complexes were eliminated if they are classified as monomers bound by crystal contacts in the PQS definition. (8) Proteins with a size between 75 and 300 amino acids were selected. (9) In the final dataset, PFAM families with redundant representative structures with ≥35% sequence identity were filtered out. Given that MSAs in PFAM may not have the PDB structure as a part of the alignment, we employed MUSCLE (ver. 3.6) with default parameters to compute MSAs from PFAM unaligned sequences and one sequence from the selected PDB structure. The final dataset includes 127 protein complexes. Using the prediction output of the meta-PPISP server, residues which have a meta-PPISP score of 0.1 or higher are identified as PPI residues.
The executable program of PI-LZerD for Linux is freely available to academic institutions at our website, http://kiharalab.org/PI-LZerD. The datasets used in this study are also available at the same webpage. The program requires a computer with at least 1.5 GB RAM running Linux. The combined docking and scoring time is on average about 1-2 hours for small proteins (about 400 points on the receptor and ligand), and it may be longer for larger proteins. This timing was measured on a computer with a dual-core 2.1 GHz processor and 8 GB RAM. In addition, the pairwise docking program, LZerD, which is the base of PI-LZerD, is also made available at http://kiharalab.org/proteindocking.
An obvious approach to using predicted PPI information for protein docking prediction is to select docking decoys with a PPI site that agrees well with the provided PPI information. This approach, termed the naive post-filtering method, was tested on datasets with five different accuracy levels of PPI prediction. In addition to the set of accurate PPI information, we used PPI sites shifted by 5, 10, 12, and 15 residues. For each protein complex with PPI information, we run the original LZerD to produce the top 1000 scoring docking decoys. Then, for each docking decoy, the fraction of the overlap between the residues in the provided PPI information and the PPI region of the docking decoy is computed for both the ligand and the receptor proteins, and the average of the two is used for sorting decoys.
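The naive post-filtering step can be sketched as below. The text does not state the denominator of the overlap fraction; this sketch assumes it is the number of residues in the provided PPI information, which is one plausible reading. The decoy record layout is hypothetical.

```python
from typing import Hashable, List, Sequence, Set, Tuple

# Hypothetical decoy record: (identifier, receptor interface, ligand interface).
DecoyInterfaces = Tuple[Hashable, Set[int], Set[int]]


def overlap_fraction(predicted: Set[int], decoy_interface: Set[int]) -> float:
    """Fraction of the provided PPI residues found at the decoy interface (assumed definition)."""
    if not predicted:
        return 0.0
    return len(predicted & decoy_interface) / len(predicted)


def naive_filter_rank(decoys: Sequence[DecoyInterfaces],
                      predicted_receptor: Set[int],
                      predicted_ligand: Set[int]) -> List[Hashable]:
    """Sort decoys by the receptor/ligand average agreement, best first."""
    def agreement(decoy: DecoyInterfaces) -> float:
        _, rec_iface, lig_iface = decoy
        return 0.5 * (overlap_fraction(predicted_receptor, rec_iface)
                      + overlap_fraction(predicted_ligand, lig_iface))

    return [d[0] for d in sorted(decoys, key=agreement, reverse=True)]
```

Because the ranking ignores the physics-based score entirely, a shifted or wrong PPI prediction moves the decoys at the wrong site to the top, which is consistent with the sensitivity to PPI accuracy reported for this method.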
Next we examine the performance of PI-LZerD on the dataset of simulated PPI predictions. This experiment is for understanding the effect of various levels of inaccuracy in PPI predictions on the docking results. In the later sections we discuss the results using actual PPI predictions on bound and unbound docking cases. The full implementation of PI-LZerD (Figure 2, PI-LZerD-2) was compared with four other variations of LZerD, namely, the original LZerD without PPI information (the base LZerD), the original LZerD followed by post-clustering without using PPI information, LZerD with naive post-filtering with the PPI information, and PI-LZerD using PPI information with only one iteration of the modified LZerD (PI-LZerD-1). PI-LZerD-1 clusters the output docking decoys using the ciRMSD.
As the accuracy of the PPI information starts to deteriorate, the docking prediction accuracy by the naive post-filtering quickly drops relative to the others. When 5 residue shifted PPI information was used, the post-filtering method still showed the highest number of successful cases up to the 100 ranks (Figures 5C & 5D). When PPI regions were further shifted by 10 residues, PI-LZerD clearly outperformed the post-filtering method. The performance of the post-filtering method went down as low as the base LZerD which did not use the PPI information. It is also noticed that the PI-LZerD-2 performed better than PI-LZerD-1.
Figures 5G & 5H show that when the 12 residue shifted PPI regions were used, the naive filtering method performed even worse than the base LZerD. In contrast, remarkably, PI-LZerD-2 managed to successfully use the inaccurate PPI information, showing a higher accuracy than the base LZerD. The accuracy of PI-LZerD-1 is now comparable to the base LZerD when 2.5 Å iRMSD threshold was used (Figure 5G) but better for 4.0 Å iRMSD threshold (Figure 5H). Finally, with 15 residue shifted PPI regions (Figures 5I & 5J) PI-LZerD-2 still remained superior to the base LZerD while the accuracy by the naive post-filtering went further down. It is worth mentioning that the prediction accuracy by PI-LZerD-2 stays almost the same with 5, 10, 12, and 15 shifted PPI regions. Importantly, the stability of the prediction by PI-LZerD was observed only for PI-LZerD-2 but not PI-LZerD-1. This indicates that the two iterations of modified LZerD run are necessary to effectively explore the vicinity of specified PPI region to find the lowest energy conformation.
In Additional file 1, Figures S3 and S4, we analyzed the same results by classifying the shifted PPI sites by their accuracy. In Additional file 1, Figure S3, the protein complexes are classified by the average sensitivity of the shifted PPI sites of the receptor and the ligand proteins, while they are classified based on the fnat of the shifted PPI sites of the receptor and the ligand proteins in Additional file 1, Figure S4. Essentially the same trend was observed in Additional file 1, Figures S3 & S4 as in Figure 5. Using the naive post-filtering, near perfect prediction accuracy can be achieved only when the correct PPI information is provided. However, its results quickly deteriorate as the accuracy of the PPI site information drops. In contrast, PI-LZerD can take advantage of PPI information even when it is not very accurate. For the range of PPI site information accuracy tested, PI-LZerD always showed better performance than the base LZerD without PPI information. It is very important that employing additional information (in this case PPI site prediction) does not deteriorate prediction results even if the quality of the information is not high, which is successfully achieved by PI-LZerD.
On this dataset, PI-LZerD-2 consistently performed the best at every rank cutoff (x-axis) with both 2.5 Å and 4.0 Å (Figures 6C & 6D) iRMSD thresholds. Within the top 10 predictions, PI-LZerD-2 made at least one hit for 51.2% of the cases, while the base LZerD and the naive post-filtering obtained hits for 42.5% and 31.5% of the cases, respectively, with the 2.5 Å iRMSD cutoff (Figure 6C). Within the rank of 100, the successful cases for the three methods increased to 72.4%, 55.1%, and 38.6%, respectively. Thus, PI-LZerD-2 improved the success rate over the base LZerD by 8.7 and 17.3 percentage points within the ranks of 10 and 100, respectively. When 4.0 Å is used for the iRMSD cutoff (Figure 6D), PI-LZerD-2 obtained at least one hit for 33.1/59.8/85.0/95.3% of the cases within the top 1/10/100/1000 predictions, respectively. The naive post-filtering performed consistently worse than the base LZerD. An important conclusion from these results is that blind PPI site predictions cannot be used for improving docking prediction with the post-filtering procedure. On average it will only deteriorate prediction accuracy.
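The success rates quoted above count a case as successful when at least one decoy within the rank cutoff has an iRMSD at or below the threshold. A minimal sketch of that bookkeeping, assuming each test case is given as a list of iRMSD values in rank order (the input layout is an assumption):

```python
from typing import Sequence


def success_rate(ranked_irmsds: Sequence[Sequence[float]], rank_cutoff: int,
                 irmsd_threshold: float = 2.5) -> float:
    """Percentage of cases with at least one hit within the top `rank_cutoff` decoys."""
    if not ranked_irmsds:
        raise ValueError("no test cases given")
    hits = sum(
        1 for case in ranked_irmsds
        if any(value <= irmsd_threshold for value in case[:rank_cutoff])
    )
    return 100.0 * hits / len(ranked_irmsds)
```

Evaluating the function at rank cutoffs of 1, 10, 100, and 1000 for each method gives the kind of curves compared in the success rate figures.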
We again observe the same trend as in the previous experiments: PI-LZerD-2 showed a consistently better success rate than the base LZerD at each rank cutoff (Figures 7C & 7D). At the rank cutoffs of 10, 100, and 1000, PI-LZerD-2 made successful predictions within 2.5 Å iRMSD (Figure 7C) for 9.32%, 23.73%, and 44.92% of the cases, while the success rate of the base LZerD was 7.63%, 20.34%, and 38.98%. With the 4.0 Å iRMSD cutoff (Figure 7D), the success rate of PI-LZerD-2/the base LZerD was 16.95%/11.86%, 39.83%/29.66%, and 61.02%/53.39% at the rank cutoffs of 10, 100, and 1000. The naive post-filtering again performed worse than the base LZerD at most of the rank cutoff values.
Using this test set, we have also examined the effect of using a different number of decoys in the second round of LZerD runs in PI-LZerD. As shown in the illustration of the PI-LZerD algorithm (Figure 2), we use the top 30 lowest energy decoys and another 30 decoys with the largest clusters, thus 60 decoys, as the sources of updated PPI sites. We compared prediction results using 50 (i.e. 25 lowest energy decoys and 25 largest cluster decoys), 80, and 100 decoys in Additional file 1, Figure S5. The results show that using 60 docking decoys performs best overall among the settings tested when the cutoff of 2.5 Å is used. When the cutoff of 4.0 Å is used to define near native decoys, all of them showed similar performance.
The first example is the human cdk2 kinase complex with the cell cycle-regulatory protein ckshs1 (PDB ID: 1BUH). The best predictions within the top 50 using PI-LZerD/naive post-filtering/LZerD were 1.03 Å (8)/9.09 Å (24)/9.91 Å (17) iRMSD, respectively. The ranks of the decoys are shown in parentheses. The second example (Figure 9B) is the monoclonal antibody Fab D44.1 complexed with lysozyme (1MLC). The best predictions using PI-LZerD-2/naive post-filtering/LZerD were 0.89 Å (34)/8.37 Å (9)/14.35 Å (22) iRMSD, respectively. The ligand protein position predicted by the naive post-filtering method (shown in red) indicates where the shifted PPI site information pointed. Thus, PI-LZerD managed to find the near native docking pose (green) from the originally provided wrong PPI site information. The near native pose (iRMSD ≤ 4.0 Å) was not found among the top 50 lowest energy score decoys.
The next two examples are taken from the iPFAM dataset where actual PPI predictions by meta-PPISP were used (Figure 6). Figure 9C is a complex of adenovirus single-stranded DNA-binding proteins (1ADU). The PPI site prediction by meta-PPISP is reasonable for one protein (sensitivity: 0.77) but completely missed the correct PPI site for the other protein (sensitivity and specificity of 0.0). PI-LZerD-2 managed to identify a 1.04 Å iRMSD conformation (blue) while the naive post-filtering method made a substantially wrong prediction (red). The LZerD energy function failed to identify the near native conformation within the top 50 ranks (yellow). Figure 9D is a complex of methionine synthase (1BMT). The best PI-LZerD-2 prediction is at 2.31 Å iRMSD, while the post-filtering method and the base LZerD predictions are at iRMSDs of 14.4 Å and 13.0 Å, respectively. The PPI predictions for both chains are much worse than average.
The last two examples are from unbound docking experiments using meta-PPISP predictions. The first example is the prediction for the α-1-antitrypsin precursor and trypsinogen complex (1OPH). The best iRMSD predictions by PI-LZerD, the post-filtering, and the base LZerD were 3.76 Å, 5.71 Å, and 10.28 Å, respectively. In the last example, the complex of human factor VIII and human monoclonal BO2C11 Fab (1IQD), PI-LZerD-2 again identified a near-native pose (an iRMSD of 2.91 Å) (Figure 9E). The base LZerD found lower energy decoys at a very different position, with an iRMSD of 10.28 Å.
The performance of docking prediction with CPORT and PI-LZerD is compared in Figures 10C & 10D. Overall, for both iRMSD thresholds of 2.5 Å (Figure 10C) and 4.0 Å (Figure 10D), PI-LZerD-2 showed a higher success rate at each rank cutoff (x-axis). For example, PI-LZerD-2 obtained 14 successful cases out of 57 complexes (24.6%) within 2.5 Å when the top 100 scoring decoys are considered, while CPORT had 9 successful cases (15.8%) at the same cutoff (Figure 9A). Using a 4.0 Å iRMSD threshold value, PI-LZerD-2 and CPORT obtained 23 (40.4%) and 21 (36.8%) successful cases within the top 100 decoys, respectively.
We have developed PI-LZerD, a pairwise docking algorithm that uses imperfect PPI prediction to improve docking accuracy. In the series of experiments, we showed that PI-LZerD successfully improved docking results even when accuracy of PPI information is significantly low. Unlike the post-filtering whose success largely depends on the accuracy of provided PPI information, PI-LZerD can use imperfect PPI prediction to improve prediction by exploring docking poses in the neighborhood of provided PPI prediction. PI-LZerD identifies matches of two proteins at local surface regions that only partially overlap with the provided PPI prediction. In addition, employing two iterations of docking searches (PI-LZerD-2) is shown to be more effective than one round of docking (PI-LZerD-1) because the two iterations enable exploring further from the provided PPI site prediction. Improvement of the average docking accuracy by PI-LZerD over LZerD was observed consistently in the series of benchmark experiments including docking using actual PPI site predictions as well as unbound docking cases.
While this work focused on pairwise docking, the same procedure can be applied to multiple protein-protein docking algorithms [94–100]. As protein interactions and their networks have become a very important research focus in systems biology, the procedure developed here will be valuable for providing a physical picture of such interactions.
The authors gratefully acknowledge David La for helping to prepare the benchmark dataset from the iPFAM database. We also thank Vishwesh Venkatraman and Yifeng D. Yang for providing the physics-based scoring function. We have used in part the Moffett clusters at the Purdue University Rosen Center for Advanced Computing. This work has been supported by grants from the National Institutes of Health (R01GM075004, R01GM097528). DK also acknowledges grants from the National Science Foundation (DMS0800568, EF0850009, IIS0915801).
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.