Skip to main content
  • Research article
  • Open access
  • Published:

Protein-protein docking using region-based 3D Zernike descriptors

Abstract

Background

Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur.

Results

We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation of using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing, which are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface C-α RMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases.

Conclusion

We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction. Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods.

Background

Protein-protein interactions are a pivotal component of many biological processes and mediate a diverse variety of functions that include signal transduction, antibody-antigen complex, transport, and gene expression regulation, to name a few [1–4]. Although detailed structural information of a protein complex is necessary for the understanding of the underlying mechanisms, such details are often difficult to obtain through traditional experimental means (X-ray crystallography, NMR). To this end, a number of computational protein docking methods have been developed [5, 6], which employ various schemes for describing both geometric and energetic complementarity at the docking interface and explore the docking conformational space [7–11].

Although significant efforts have been made to develop protein docking prediction methods, recent Critical Assessment of PRediction of Interactions (CAPRI) [12] experiments have shown that plenty of room for improvement still remains. It is often discussed that one of the weaknesses of current docking methods is that they are not fully capable of handling the inherent flexibility of protein chains, which induces varying degrees of conformational change upon docking [13, 14]. Indeed, accounting for full flexibility of proteins is still a very challenging task for any computational method, especially for cases where the main-chain of a protein undergoes substantial conformational change (a root mean square deviation (RMSD) of a few Angstroms) upon docking. Such large fluctuations of loop regions or domain motions associated with several proteins is difficult to predict even using current molecular dynamics simulation [15] or ab initio protein structure prediction approaches [16–18]. Therefore, some of the earliest works on protein docking treat interacting proteins as rigid-bodies using geometrical complementarity at the docking interface to guide the search for putative binding modes [9, 19, 20].

The importance of effective methods of capturing geometrical information has been recently emphasized because shape complementarity forms the basis of most docking schemes, including those that accommodate a reasonable amount of conformational change. This can be attributed to the following reasons: (1) There are many protein complexes which undergo only subtle conformational change upon docking, which are practically well approximated by rigid-body docking methods [21]. Enzyme-inhibitors, an important class of protein complexes, fall into this category. (2) Flexibility of the protein can be handled to some extent by introducing a degree of softness in the surface representation [22–24] or by applying energy optimization [15] or side chain conformation search. (3) To account for chain flexibility in the docking prediction, pair-wise docking of simulated conformational ensembles of the interacting proteins can be performed, while treating each pre-determined conformation as a rigid body (cross-docking) [15]. (4) More broadly, effective protein surface shape representation has a wide variety of other applications, including structure-based function prediction [25–28], binding-ligand prediction [29, 30], and fast protein database search [31, 32]. Accordingly, there has been a renewed interest in shape driven docking [33–37], with more advanced descriptors being introduced. A brief overview of previous studies relating to protein shape representation and their application to docking is given below.

Connolly published pioneering works on detecting shape complementarity, the solid angle approach [38] and its improved variant called shape distributions [39]. These works attempted to distinguish regions with similar concavity but with different shapes. Proceeding along the same lines, a recently published method, Context Shapes (CS) [33] uses Boolean values to represent local shape features by applying a ray tracing scheme. Each context shape captures the local shape within a sphere centered on a surface point. The description is in the form of a binary vector with each bit assigned 1/0 depending on its location, i.e. inside or outside the surface. A rotational search is then performed to identify docking orientations that are ranked in terms of the buried surface area. Some approaches have also used surface normals for defining docking orientations and shape complementarity [40, 41]. For example, the S C statistic uses the product of the normals of the points at the interface weighted by the distance between them to quantify the geometrical match [42]. In the approach proposed by Bordner and Gorin [43], surface normal vectors are anti-aligned and rotated along the common axis provided by the contacting normal. PatchDock combines geometric hashing and pose clustering to identify matching patterns across geometric patches (concave, convex, flat) in the participating surfaces [44]. The algorithm relies on the use of transformation invariant shape signatures (local reference frames) that are defined using pairs of points and their associated normals. The patch based comparison is also used in another docking program 3D-GARDEN [45] that compares the triangular facets of the ligand and receptor surfaces to define initial docking orientations. 3D Grid-based representations, on the other hand, provide a discretized form of the protein surface and volume and have been a popular choice in several docking approaches [19, 23, 46, 47]. While minor conformational changes can be handled implicitly, the surface complementarity of the docked structures can be assessed in terms of the grid overlap, which can be evaluated by applying Fast Fourier transforms (FFT).

More recently, several publications have featured the use of spherical harmonics and its extension, the 3D Zernike descriptors (3DZDs), which have been successfully applied to comparing shapes of proteins and ligands [31, 32, 48, 49]. HEX [50], for example, uses spherical polar basis functions to model surface shapes. It also avoids the use of expensive grid-based calculations employed in FFT based methods and instead uses the expansion coefficients of spherical harmonics to calculate correlations of the ligand and receptor surface overlaps. Spherical harmonics, however, are not rotationally invariant and make use of Wigner matrices [51] to identify rotationally invariant regions. In contrast, 3DZD corrects this drawback while providing a more compact shape definition. Our previous studies [25, 31, 32, 52, 53] have shown that the rotation invariant descriptor effectively captures protein surface shape similarity on both global and local levels. To our knowledge, no known application of the 3DZD to docking has been reported in literature.

In this paper, we introduce, for the first time, the application of the 3DZD [25, 54] for docking. The present work uses 3DZD for capturing protein shape complementarity and applies it to protein-protein docking by matching local regions around points across arbitrary rotations. The docking algorithm named LZerD (L ocal 3D Zer nike descriptor-based D ocking program) LZerD uses a geometric hashing scheme [10] and incorporates a novel geometric scoring function that serves as an efficient first stage filter. The method does not use any binding site information and is moderately tolerant to conformational changes. The efficacy of the method is demonstrated using the ZDOCK benchmark datasets [21, 55] and results are compared with other approaches [33, 33, 47, 50, 56]. We first show that 3DZD is able to capture shape complementary of docking interfaces effectively and by applying 3DZD to a docking scoring function it improves docking prediction results. We further show that LZerD has a better performance than others in unbound docking predictions while staying comparable for bound-bound docking cases.

Results

Capturing local shape complementarity using 3D Zernike descriptors

To begin with, we demonstrate that the 3DZD sufficiently captures the shape complementarity of protein docking interfaces and can also quantifies the complementarity. As explained in the Methods section, the 3DZD is a series expansion of a three dimensional (3D) function (i.e. in this work protein surface is represented by a 3D function). Hence, two surfaces are compared by the series of coefficients assigned to each term in the 3DZD. Here, we compute the correlation coefficient of the 3DZDs of two surfaces to quantify the similarity. A strong advantage of 3DZD is that it is rotation and translation invariant, i.e. the same 3DZD is obtained for an object in any pose. While our previous studies used the 3DZD for capturing similarity of global and local shapes [25, 31, 32, 57], this study focuses on their use in capturing the complementarity across local surface regions and for protein docking prediction.

The 3DZD of two objects with perfect shape complementarity are identical with a corresponding correlation coefficient of 1.0, as illustrated for a hypothetical case in the Additional file 1. Table 1 lists fifteen protein complexes, for which the 3DZDs of the docking interfaces of bound and unbound proteins are compared. Five complexes are taken from each of the three complex categories in ZDOCK benchmark 2.0, i.e. the enzyme/inhibitor, the antigen/antibody, and the others. Docking interfaces of both bound and unbound proteins are compared. The interface is defined as the set of surface grid points within 4.5 Ã… of any atom of the other protein. Also shown is the change in accessible surface areas upon the complex formation as computed by the NACCESS program [58]. Overall, a high correlation is observed among the 3DZD of the docking interface regions for both bound and unbound complexes; all fifteen cases show a correlation coefficient higher than 0.9 for the bound cases and for all but two of the unbound docking interfaces. SA-barstar complex (PDB: 1AY7) is shown in Figure 1, which has a significant 3DZD correlation of 0.93. There are five cases in Table 1 where the unbound interface shows a higher correlation coefficient than the bound interface with the difference of 0.01 or 0.02 (1ATN). To further examine this effect, we rotated the docking interface of two proteins, 1AHW and 2PCC and computed the differences in the correlation coefficients across the different orientations (taken at 30 degree intervals along the x, y and z axis). We found that these are within the error range (within 0.02) and can be attributed to the discretization of the molecular surface i.e. caused by placing proteins in different orientations on the 3D grid (the size of the grid is set to 0.6 Ã…). Note that the 3DZD is rotation invariant from mathematical point of view, but in practice causes errors due to the voxelization of protein surface. The effect of the variance of the 3DZD upon rotation of the protein has been examined in our previous paper (See Figure 2 in the paper by Sael et al. [31]).

Table 1 Correlation coefficient of 3DZD for bound and unbound interfaces for some protein complexes from the ZDOCK Benchmark datasets.
Figure 1
figure 1

The interface region for the protein complex 1AY7(ribonuclease SA complex with barstar). Magnitudes of 3DZDs of docking interface regions of ribonuclease SA (1AY7-A, interface shape and corresponding 3DZD are shown in black) and barstar (1AY7-B, the interface is shown in white and the corresponding 3DZD is in gray). 3DZD of the order n = 20 is used. Although small discrepancies can be seen in terms of the magnitudes, the high correlation coefficient of 0.95 indicates the similarity of the regions across the interface. The difference of the 3DZDs is emphasized by striped regions.

Figure 2
figure 2

The correlation coefficient of 3DZD and the angle between normal vectors sampled at surface points of (a), the bound and (b), unbound interface of Ras-RasGAP complex. The PDB ID is 1WQ1 and 6A21/1WER for the bound and the unbound proteins, respectively.

As the interface is not known in advance in blind docking predictions, the 3DZDs are computed for local spherical regions centered at selected surface points, then combinations of compatible spheres across the ligand and receptor proteins are sought. The surface normals associated with the points are also combined with the 3DZD to be able to capture both local direction and shape of surface. Figure 2 shows examples of the 3DZD correlation and the angle between the normals calculated for points less than 2.5 Ã… apart in the bound and unbound interfaces of the Ras-RasGAP complex (PDB: 1WQ1). As is shown later, the docking prediction for this complex turned out to be successful by using these two variables to describe the local shape features of the ligand and the receptor.

Docking prediction results

In what follows, we report results of docking predictions by our algorithm, LZerD. To rank the predictions, LZerD uses a combination of four scoring terms, a reward and a penalty term derived from the 3DZD and normal vectors, the buried surface area, and a penalty term for atomic clashes measured in terms of the excluded volume. Docking conformation search is performed using an efficient geometric hashing scheme that incorporates a kd-tree nearest neighbor algorithm. Weighting factors for combining the four scoring terms are optimized on a training dataset which consists of 29 bound-bound protein complexes taken from ZDOCK benchmark dataset 0.0 and 1.0. The resulting scoring function is then tested on ZDOCK benchmark dataset ver. 2.0. Please refer to the Methods section for details.

First, the results for the training dataset, on which the weighting factors are optimized, are shown (Table 2). In the second part, results of the unbound docking using LZerD are reported for 84 Benchmark 2.0 complexes are compared (Table 3). Predictions by LZerD are further analyzed with respect to previously published results of other geometry based docking approaches:

Table 2 Comparison of the combination of the scoring terms.
Table 3 Comparison for the Benchmark 2.0 unbound complexes.
  1. 1)

    ZDOCK(PSC) [47] - The FFT based algorithm incorporates a pairwise shape complementarity (PSC) term that takes into consideration close atomic contacts (within a distance cutoff) between the ligand and the receptor with an added penalty term to account for clashes.

  2. 2)

    PatchDock [56] - Geometric hashing based scheme with a grid based scoring function.

  3. 3)

    Context Shapes [33] - Local shape representation in terms of binary values and a scoring function based on the buried surface area.

  4. 4)

    HEX [50] - Uses spherical polar Fourier correlations to perform a six degree search with a corresponding steric complementarity score (Table 4).

Table 4 Comparison with HEX on unbound cases of Benchmark 2.0.

Prediction results on the training set

Table 2 shows the performance of LZerD for the bound-bound cases used in the training set. A near-native structure or a hit is defined as one with an interface RMSD (iRMSD) of 2.5 Ã… or lower (better) to the native complex. For the 29 cases considered, a hit for 18 complexes are ranked within the top 100. Table 2 also details the effectiveness of adding the scoring terms. The first column shows the results the combined weighted score (the LZerD score) as defined in Eqn. 11 with weights as shown in Table 5 in Methods, the second uses both 3DZD and normal vectors (Eqn. 6), and the last uses only the angle between the normals. In addition, we also compare results by using the buried surface area (Eqn. 7), the excluded volume term (Eqn. 8), and the combination of the two (the last column in Table 2). The weights for combining the buried surface area and the excluded volume terms are optimized in the same way as the weights for the full LZerD score on the same dataset (Table 5). Comparison of the performance shows that the full LZerD score obtains better (i.e. lower) ranks in more cases (20 cases) than the combination of the buried surface area and the excluded volume terms (6 cases), indicating the usefulness of the 3DZD term. The combination of the four terms to yield the LZerD score is justified by the result that it produces the top ranked hits in 20 cases.

Table 5 Parameters computed for the four scoring terms.

Comparison with ZDOCK, Context Shapes, and PatchDock

To assess the performance of the algorithm, LZerD results were compared with those of ZDOCK(PSC) [47], Context Shapes (CS) [33], and PatchDock [56]. Comparison was performed on the bound-bound docking test set (Additional file 2) and also on the unbound test set of ZDOCK Benchmark 2.0 (Table 3). Results on the bound test set for CS, ZDOCK (PSC), and PatchDock are taken from the original article of CS [33] (Table 2 in their article). As in their study, only the top ranking 3600 predictions are considered. As for the unbound test set, ZDOCK decoys (produced by running ZDOCK in the dense mode with a 6° sampling) for the Benchmark 2.0 dataset (84 complexes) was downloaded from the ZDOCK website http://zlab.bu.edu/zdock/benchmark.shtml. For LZerD and ZDOCK, the top ranking 54000 conformations are considered. The two algorithms are compared in two ways (Table 3). First, the re-ranking performance of the LZerD scoring function, when applied to the ZDOCK decoys, is examined. Next, the docking predictions by LZerD are compared with the ZDOCK results to assess the conformation searching capability and scoring performance. Results for CS and PatchDock are taken from the original article of CS [33] (Table 4 in their article).

For the docking predictions of 76 bound-bound cases (Additional file 3), CS has a better performance than others at each rank cutoff of 100, 500, 1000, and 2000. LZerD comes a close second followed by ZDOCK and PatchDock. This order is also gauged by the number of "wins" and the MLR values (Eqn. 16), which yields 20, 56, 96 and 115 for CS, LZerD, ZDOCK, and PatchDock, respectively.

For the unbound test set (Table 3), ZDOCK and LZerD show significantly better performance than CS and PatchDock. It would seem that CS is better tuned for bound-bound docking but is not as well suited for handling unbound cases. In comparing the re-ranked ZDOCK decoys the LZerD score (the second block from the right in Table 3) with the original ranking produced by ZDOCK (the block on the left), similar success rates are seen for ranks below 500. They also have the same number of wins, i.e. cases when either the LZerD score or ZDOCK (PSC) has a better rank than the other. However, the LZerD score performs better when ranks above 500 are considered with as many as 32 and 38 cases in the top 1000 and 2000, respectively, as compared with 29 and 33 for ZDOCK. Comparison between the docking predictions by LZerD (the right block) and ZDOCK shows that the former has a better performance. LZerD obtains a better rank in nine more cases (24 vs. 33) and has more successful cases in the top 100 and 2000 and a tied performance for ranks below 1000. The two last rows of Table 3 show head-to-head comparison between the LZerD Rerank or LZerD with the other methods. These comparisons clearly indicate the superior performance of LZerD Reranking and LZerD Docking over the others. The head-to-head comparison between LZerD and LZerD Rerank shows that the former is better than the latter, implying LZerD's better conformation sampling scheme.

Figure 3 shows examples of complex predictions by LZerD for four cases. 1AHW is a relatively easy case, where both LZerD and ZDOCK (PSC) have more than 20 hits in the top 2000 decoys (Figure 3A). LZerD obtains a hit ranked in the top five and significantly improves the ranking of corresponding ZDOCK decoys from 268 to 15. 1UDI (Figure 3B) and 1WQ1 (Figure 3C) are relatively difficult cases, where only a few hits are obtained within 2000 ranks by the two docking algorithms. The last case, 1SBB, is a difficult target, where both LZerD and ZDOCK failed to obtain any hits within the top 54000 ranks. LZerD generates a correct hit with an iRMSD of 2.5 Ã… at the rank 81598 (Figure 3D2), but the best prediction within top 54000 has an iRMSD of 10.24 Ã… (Figure 3D1). Therefore, the problem of LZerD for this case is the scoring function rather than the conformation sampling.

Figure 3
figure 3

Near-native docking configurations identified by LZerD for unbound-unbound complexes. Top ranking hits are shown for A, tissue factor with inhibitory Fab. The PDB ID of bound complex: 1AHW; unbound structures: 1FGN_LH (RMSD to bound structure: 1.25 Å) & 1TFH_A(RMSD: 0.75 Å). iRMSD of the prediction shown is 1.34 Å, which is ranked fifth. B, uracil-DNA glycosylase with uracil glycosylase inhibitor protein complex. Bound: 1UDI. Unbound structures: 1UDH (RMSD to bound: 0.47 Å) & 2UGI_B (RMSD: 0.88 Å). iRMSD: 2.36 Å, Rank: 59. C, Ras-RasGAP complex. Bound: 1WQ1. Unbound structures: 6Q21_D (RMSD: 0.79 Å) &1WER (RMSD: 0.85 Å). iRMSD: 1.87 Å; Rank: 141. D. T-cell receptor β chain with superantigen SEB. Bound: 1SBB. Unbound: 1BEC (RMSD: 0.50 Å) & 1SE4 (RMSD: 0.83 Å). D1 is the best prediction within the top 54000 rank, which has an iRMSD of 10.24 Å. D2 is a hit with an iRMSD: 2.13 Å found at the rank 81598. The unbound receptor and ligand complex are shown in green and pale blue while the predicted ligand is shown in magenta.

Coparison with HEX

Table 4 shows the comparison between the results for LZerD with another shape-based docking algorithm, HEX [50] on unbound cases of Benchmark 2.0. The docking results for HEX are taken from Table 3 of the paper by Ritchie et al. [50]. In keeping with these results, we computed the ligand RMSD (lRMSD) and the MLR value, and excluded complexes for which no hit (lRMSD of less than 10 Ã…) was found within the top 2000 predictions. The results for ZDOCK are the same as in Table 3 (decoys downloaded from ZDOCK Benchmark website were analyzed).

Out of the 84 benchmark cases, all three methods failed in 25. In the remaining 59 cases, LZerD obtains a better rank in 22 while HEX and ZDOCK have corresponding values of 18 and 20, respectively. LZerD also manages to obtain the lowest MLR score of 164 as compared to ZDOCK and HEX. LZerD also outperforms the other two methods in terms of the number of cases with hits ranked within the top 100, 500, and 2000 decoys.

Discussion

In this paper, we have proposed a novel protein docking algorithm, LZerD that employs the 3DZD for the first time for capturing protein surface shape complementarity of docking interfaces. Previously we have shown that the 3DZD is a powerful tool for identifying both global [31, 32] and local [25, 32] surface similarity among proteins. In this work, we have further shown that 3DZD is also useful for capturing protein local surface complementarity, which is readily applicable for protein docking. LZerD uses geometric hashing as the core of the search algorithm and takes advantage of the rotation-invariant characteristics of the 3DZD. Our experiments show that the descriptor is effective in improving docking predictions when considered as one of the terms in the scoring function.

The results on the ZDOCK benchmark datasets have been compared with the other available shape-based docking algorithms, ZDOCK, Context Shapes, PatchDock, and HEX. On the bound benchmark dataset, LZerD showed a better performance than the two methods (ZDOCK and PatchDock) but slightly worse than Context Shapes (Additional file 2). However, on the unbound benchmark dataset (Tables 3 &4), LZerD clearly outperforms all the other approaches. Hits obtained by Context Shapes, which performed well for the bound docking set, decreased sharply for the unbound benchmark dataset. In contrast, LZerD improves its performance relative to the others for the unbound cases indicating that the 3DZD is effective in handling a certain range of the flexibility of the protein surface that needs to be considered in protein docking. Shape-based approaches which can effectively handle a certain range of flexibility (an RMSD of 1-2 Ã…) is very important in the computational docking problem because such method is an indispensable component in a flexible docking prediction in combination with explicit conformational sampling scheme, such as "ab initio" structure modeling methods (e.g. ROSETTA [59], TASSER [17], CABS [60], TOUCHSTONE [16],) and molecular dynamics approaches [61, 62], that are aimed at handling alternative conformation explicitly. Since even the state-of-the-art ab initio modeling methods are still not capable of precise modeling of protein structures, it is docking methods' task to be able to handle the small range of flexibility.

While the current study only uses geometric terms for scoring, the inclusion of more interaction energy based terms such as desolvation and electrostatics are expected to make further improvement. As a future work, we are following a cross docking approach using LZerD based on an ensemble of simulated structures of individual proteins to further consider flexibility of protein chains.

Conclusion

We showed that the 3D Zernike descriptors are effective in capturing shape complementarity at the protein-protein docking interface. Employing the 3D Zernike descriptors, we developed a novel shape-based protein docking algorithm, LZerD. Comparison with existing shape-based docking algorithms showed that LZerD has a better performance than the other existing methods in unbound docking while remaining competitive for bound docking cases.

Methods

Protein surface representation

The protein surface is defined implicitly as the sum of atom-centered Gaussians [43] and takes the form of G(x) = σ:

(1)

where x is a point on the surface in the three dimensional space and x i the center of an atom in the protein. The degree of smoothness of the resulting isosurface is altered by varying the value of the parameter σ, the effect of which can be seen in Figure 4. The molecular surface definition of the form of Eqn. 1 has been commonly used for representing drug molecules [63], protein-protein interaction [64], and docking [65]. The smoothing parameter has the advantage of allowing a certain amount of flexibility to be incorporated into the representation which for this study is set to 0.35. 0.35 provides sufficient amount of surface detail and being simpler is faster to calculate as compared to the other forms proposed [64]. We used the Marching Cubes algorithm as implemented in the GNU Triangulated Surface (GTS) library http://gts.sourceforge.net to compute the protein surface, which is then discretized by placing it inside a cubic grid. Each grid cell (voxel) is then assigned 1 if it is on the surface and 0, otherwise.

Figure 4
figure 4

Molecular surfaces shown for varying degrees of smoothness σ. Larger values reveal more detail while those at the lower end produce more spherical surfaces. The parameter σ is set to 0.3, 0.4, and 0.5 from left to right.

Shape representation using 3D Zernike descriptors

The protein surface defined in Cartesian coordinates is represented by the 3D Zernike descriptors (3DZD). The 3DZD are a series expansion of a 3D function, f(x), (i.e. protein surface represented by Eqn. 1 in our case), which allows a compact representation of a 3D function. Desirable properties of 3DZD such as transformation invariance and minimum information redundancy (orthonormality) make them well suited for shape matching. Here we provide a brief description of the 3DZD. For more mathematical details, please refer to the paper by Canterakis [66] and also refer to our previous papers for technical procedure for computing the 3DZD for proteins [31, 32].

A protein surface is represented as a 3D function, f(x), by first placing the protein structure onto a 3D grid, and voxels overlapping with the protein surface are marked with a value of 1 and others are with 0. The size of the grid is set to 0.6 Ã…. The 3DZD are derived from the Zernike-Canterakis polynomials[66]. For the order n and repetition m, they are given by

(2)

where (ϑ, φ) are the spherical harmonics[67] ((ϑ, φ) are the polar coordinates) of the lthdegree with l ≤ n, m ∈ [-l, l], and (n-l) is even and non-negative. The radial polynomial R nl (r) where r is the radius is defined so that are orthonormal polynomials when written in Cartesian coordinates. For a 3D function f(x) where x ∈ ℜ3 the 3D Zernike moments are given by

(3)

The above equation is expressed as a linear combination of geometric moments of order n where M rst denotes the geometrical moment of the object normalized to fit in the unit sphere and is a set of complex coefficients. The moments are however not rotationally invariant. They are therefore collected into (2l+1)- dimensional vectors and the norms (||·||) of the vectors define the rotationally invariant 3D Zernike descriptors (Eqn. 4).

(4)

We use n = 10, which yields a total of 36 invariants for the 3DZD. Matching local shapes is thus reduced to comparing the vector of numbers associated with each local surface. The strength of complementarity between two such vectors (p = 36 values in vectors x and y) is given by the correlation coefficient:

(5)

During the docking process, 3DZD are computed for spherical patches of 6 Ã… radius centered on a set of evenly distributed points on the protein surface.

Overview of docking algorithm

A general outline of the docking procedure, LZerD, is shown in Figure 5. Starting with the atomic coordinates of a ligand protein and a receptor protein, evenly distributed points at a minimum separation of 1.8 Ã… are extracted from the corresponding protein surfaces. For each point, a corresponding normal vector is calculated. In all, each point is labeled by its coordinates in the 3D space, the surface normal, and a numeric vector of 3DZD that captures shape features in the region bounded by a sphere of set radius. Geometric hashing is then used to find partial matches between the two surfaces based on the signatures associated with the points. The alignment transformations calculated for the matching point sets are then applied to the ligand protein, following which, the scoring function examines the amount of overlap (with an added penalty for atom collisions). Another term that measures the local geometric complementarity as reflected by the angle between the normals and the shape correlation of the 3DZD vector is also calculated. The terms are then combined with appropriate weighting factors (determined using a genetic algorithm) to give a final score for each orientation. The individual components of the docking algorithm are described in detail in the following sections.

Figure 5
figure 5

Outline of the docking algorithm, LZerD. Although biological knowledge of docking interface regions can be used, we have not used such additional information in this study.

Geometric hashing based point matching

Given two protein surfaces (a ligand protein and a receptor protein) represented by a set of points with corresponding labels, the first task is to quickly identify matches between the two sets of features. We have used geometric hashing [10], a computer vision based approach, for identifying candidate matches. The algorithm proceeds in two stages. In the preprocessing (first) stage, points from the ligand protein are stored in a transformation invariant form in a hash table. For this, an orthonormal coordinate frame is defined using three non-collinear points and the coordinates of all the other points are expressed in this local frame. The calculated invariant vectors are then discretized and stored in a two-dimensional hash table. In the recognition stage, points from the receptor protein are used to calculate candidate frames. For each such coordinate basis and the receptor points encoded in it, corresponding ligand points are identified (those points must satisfy geometrical constraints) in the hash table constructed earlier.

A naïve implementation of the hash table based indexing is however found to be inefficient due to the non-uniform distribution of the data in the hash space. This results in longer search times as hash entry lists have different lengths. Efforts to tackle this problem have attempted the use of rehashing techniques [68] and self-organizing maps [69]. Here, a commonly used data structure for nearest neighbor search, the k d-tree (k is the dimension), which partitions point sets recursively into a small number of cells with no cell containing too many objects, is employed. Unlike hashing, the method is adaptive in that, the partitions are dependent on the data points to be stored. We have used the approximate nearest neighbor (ANN) library, which, for a given query point, retrieves all neighbors within a specified radius and allows more efficient search than conventional kd-tree [70] approaches. These points are further checked for compatibility criteria before they are added to the pool of matches.

Given N L and N R points on the ligand and receptor proteins, respectively, the geometric hashing scheme is as follows:

Define a reference frame using two points, A and B, and a direction where and 0 are the associated normals. For building the Cartesian frame, point A is taken as the origin, as the x-axis, as the y-axis and the z-axis as the cross-product of the other two axes. The orthonormal basis (transformation matrix) so formed includes a rotation and a translation to move a given point to the origin as shown in Figure 6. In choosing a frame, certain constraints are applied. Most values were taken from a previously published study [9], except for two values, dmin and dmax, which are described below. Values ranging from 2.5 to 5 Ã… for dmin and 7.5 to 12 Ã… for dmax were tested on the training set proteins. Considering the number of hits and the computational efficiency, we chose 4 Ã… for dmin and 9 Ã… for dmax.

Figure 6
figure 6

A coordinate frame is calculated using points A and B and the average of their normals as shown in (a). The change of coordinates from (X, Y, Z)-space to (U, V, N)-space is given by the rotation matrix (U, V, N) (the 3 × 3 matrix represented in the left upper corner in the 4 × 4 matrix in (b)) and the translation vector to the origin (assuming the point A is taken as the origin, (-AU, -AV, -AN) shown at the bottom of the 4 × 4 matrix is the translation vector). Coordinates of point C can be expressed in the new coordinate system by using a matrix vector multiplication.

  1. i.

    d min <d Ab <d max where d min and d max are the minimum and maximum allowable distances between two point are set to d min = 4 Ã… and d max = 9 Ã…

  2. ii.

    TorsionAng le(a n , A, B, b n ) < 1.4 radians

  3. iii.

    ∠(a n , b n )< 1.2 radians where ∠denotes the angle

  4. iv.

    0.87 radians < ∠(a n , B), ∠(b n , B) < 1.2 radians and |∠(a n , B) - ∠(b n , B)|< 1.2 radians (1.4, 1.2, and 0.87 radians are approximately equivalent to 80.2, 68.8, and 49.8 degrees)

  5. 1)

    The coordinates of all other points in the ligand are then recorded in this reference frame created in the previous step. Thus, for a ligand defined by P points, the other P-2 points (two points used for the construction of the frame) are transformed into this reference frame. Only those points C for which d AC , d BC < 15 Ã… are considered. The triangle (A, B, C) thus formed is weighted by the edge lengths along with the other labels associated with the vertices.

  6. 2)

    Enter all the transformed points of the ligand protein in each ligand reference frame into the k d- tree where k = 3 i.e. x, y, z coordinates of the ligand point. Each such point has a reference to the coordinate frame in which it was transformed.

  7. 3)

    Compute the coordinate frames and new positions of the receptor points as shown in Steps 1 and 2.

  8. 4)

    For every coordinate frame for the receptor protein:

  9. i.

    For every transformed point in the current receptor frame

  10. 1.

    Perform a range search to retrieve geometrically similar points of the ligand (points recorded in different reference frames are stored in the kd-tree). This search (using methods in the ANN library) locates all nearest neighbors of the current receptor point within a radius bound of 2.5 Ã….

  11. 2.

    For each ligand point located within this bound, compare the labels (3DZD, normals, torsion angles, and point distances) of the points (receptor vs ligand) and those of the corresponding reference frames. If the labels are compatible, then increment a vote counter for the current ligand-receptor basis pair. Criteria for comparison are as follows:

  12. a)

    The lengths of the triangle edges should be compatible i.e.

where r i , r j , l i , and l j are vertices of the triangles and i, j = 1, 2, 3.

  1. b)

    A ligand point i is compatible with a receptor point j if the correlation between their 3DZD exceed a threshold value i.e. Correlations(Z i , Z j ) ≥ 0.65

  2. c)

    Compatibility of pairwise normal angles formed between any two of the ligand or the receptor < 0.8 radian for 1 ≤ i, j ≤ 3

  3. d)

    Torsion angles formed by any two points and their corresponding normals should be compatible across the ligand and the receptor

< 0.8 radian for 1 ≤ i, j ≤ 3

(0.8 radian is approximately equivalent to 45.8 degrees)

  1. e)

    Compatibility of the angle between the normals and reference frame axes for the ligand and receptor. < 0.8 radian for i = 1, 2 where is the unit vector in the direction of l1l2 for the ligand and r1r2 for the receptor.

  2. 3.

    For all the receptor-ligand reference frames that exceed a threshold limit of votes (point pairs with matching labels). The threshold is set to 10 votes.

  3. a)

    For each matching list of point pairs that exceed the minimum list of votes, calculate the alignment transformation that overlays these points [71].

  4. b)

    Apply the transformation to the ligand protein. This transformation is actually a composition of three transforms as shown in Figure 7.

Figure 7
figure 7

Given a receptor R and a ligand L in their original orientation, a random orientation T 0 is first applied to the ligand. Selection of a three-point basis (two points + the average of the normals of the two points) for both the ligand (L" → TIL) and receptor (TIR) allows coordinates to be expressed in the new reference frame. Alignment of the matched set of coordinates in the hash space then produces an additional transformation (L"' → T2L). To move into the original coordinate system, the transformation matrix Z is applied to the ligand protein.

  1. c)

    Calculate the number of clashes as indicated by the excluded volume and the extent of overlap given by the contact surface area. If the excluded volume exceeds the threshold value of 500 Ã…3, discard the transformation.

  2. d)

    Output transformation and corresponding scores.

  3. ii.

    Clear the voting list and proceed with the next reference frame.

  4. 5)

    For all candidate orientations, evaluate the weighted score based on Equations 6-8 described in the subsequent sections. Output the ranked list.

Shape-based scoring function

The scoring function is based on geometric criteria and is composed of the following terms:

  1. 1)

    Term based on the orientation of the surface normals and the local shape correlation defined by the 3DZD. The term is further split into a reward and a penalty.

  2. 2)

    The buried surface area

  3. 3)

    The excluded volume

These individual terms are then combined with appropriate weighting factors (weight optimization performed using a genetic algorithm) to produce a single score with a higher value indicating a more favorable solution.

Term based on normals and 3DZD local shape correlation

This measure of the geometric surface complementarity is based on the orientation of the normals at the interface and the shape correlation between the 3DZD vectors. The score for a given ligand orientation is calculated using the following equation:

(6)

In the above equation, -1 ≤ η ij ≤ 1 is the dot product of the normals while-1 ≤ C ij ≤ 1 is the correlation between the 3DZD vectors as calculated by Eqn. 5. The score ZN(η ij , C ij ) is computed for every point on the ligand protein i and its closest counterpart j on the receptor and summed up for a given docking conformation. The first part of the equation represents the reward score given to cases where a given pair i and j have normals with opposing angles (π radians, i.e. the scalar product is -1) and the 3DZD correlation close to 1 (i.e. perfectly complementary to each other). The maximum reward value of 1.0 is assigned to the term A when C ij = 1 and η ij = -1. The reward value is suitably reduced for cases where the correlation and the angle values are poor. The second part of the equation deals with penalization that occurs when the scalar product of the normals yields a positive value or if the correlation is negative (indicating a poor match between the shapes). The term 1-A has a maximum value of 0.98 when C ij = -1 and η ij = 1. Both scores are weighted by an exponential distance term. The reward term and the penalty term is weighted separately with different weights (Table 5). Although these scores reflect the local shape complementarity, they do not adequately explain the extent of overlap of the surfaces and the resultant clashes that may occur due to the ligand orientation. Two additional terms have therefore been added to account for such cases.

Area of overlap

For a given orientation of the ligand, the amount of overlap is estimated by the buried surface area. The accessible area buried between the interacting components is given by:

(7)

Where SASA are the solvent accessible surface areas of the receptor R, the ligand L , and the complex RL , respectively. The areas were estimated using the boolean look-up tables approach by LeGrand and Merz [72]. Although larger values generally indicate a more stable association, they can often be misleading owing to the large interpenetration of the partners. A penalty term is also included to account for the steric clashes that may occur.

Excluded volume

Atom pairs at the interface that are closer than a cutoff distance (3 Ã… according to the CAPRI criteria [73]) are said to clash. Owing to the proximity, these clashing atoms have a repulsive effect which can be measured by the excluded volume term, for which an analytical expression has been proposed [74]. For two atoms, a and b, with van der Waals radii R a and R b , the distance between the atoms, d ab , the corresponding overlap volume V ab can be calculated as

(8)

where

(9)

The overall excluded volume is given by the sum of the volume overlaps over the receptor, a ∈ R, and the ligand atoms, b ∈ L:

(10)

Optimization of weights using Genetic Algorithm

Given the four scoring terms, the goal is to find a set of weighting factors w, c, such that the fitness function (Eqn. 12) is maximized [75]:

(11)

Here s i is individual scoring terms, w i is the set of weights for each term, and c i is the corresponding offset value. Eqn. 11 defines a ranking order on the set of docking predictions to be assessed. The form of Eqn. 11 was found to be superior to the linear weighting scheme [34] and has been duplicated in this study. A genetic algorithm (GA) based approach was used to determine an optimal set of weights and offset values for the terms. Starting with a population of randomly generated solutions, the GA iteratively attempts to improve the quality of the solutions using evolutionary schemes such as mutation and crossover with a suitably defined fitness function to guide the search.

The current implementation of the GA uses a(μ, λ) evolutionary strategy with self-adaptation [76] where the best μ individuals from a population of λ offspring are chosen as parents for the next generation. The fitness function is based on the Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC) [77]. The BEDROC metric is motivated by the fact that there are more false positives among the predictions and places more emphasis on hits located near the top of an ordered list as compared to those at the end. The metric has been shown to provide better discrimination of hits and non-hits in comparison to other measures such as average rank and the area under the ROC curve [78]. For a set of N predictions, with n hits, the BEDROC metric applies a continuously decreasing exponential weighting for the ranks of the hits and is calculated as:

(12)

Here r i is the rank of the hit and β = 160.9 is an exponent pre-factor defined such that 80% of the corresponding BEDROC score was based on the top-ranked 1% percent of the predictions. Given the discrepancies in ratio of hits to non-hits across different proteins, the average of the BEDROC values over the training set cases i was chosen as the final fitness.

(13)

Following is the algorithm flow for finding the weights based on GA:

  1. 1)

    Generate an initial population P = 30. Each individual in P is represented by 16 values, corresponding to the 4 feature weights, 4 offset values, and 8 mutation probabilities.

  2. 2)

    For a predefined number of generations K = 250 do

  3. 1.

    Create a new population using crossover and mutation. For each pair of parent individuals, two new offspring are created. Thus, for n = 30 individuals in P, n2 = 900 offspring are created. The blend crossover (Eqn. 14) operator as implemented in OpenBeagle [79] was used. Given two parents with mutation parameters , random values , and parameter α = 0.5 the children are recombined using the blend crossover operator as follows:

    (14)

As different mutation strengths were used for each parameter, a Gaussian mutation operator [76] is applied (Eqn. 15). The update rule for mutation parameters g i and offspring x i for β = 0.01 and random values drawn from a Gaussian distribution (zero mean and unit variance) N(0,1) is given by

(15)
  1. i.

    First the mutation probabilities are recombined (crossover) and then mutated.

  2. ii.

    The weights and offset values are then recombined using the blend crossover operator, followed by mutation using the newly created set of mutation probabilities.

  3. 2.

    Assess the fitness of the newly created population

  4. i.

    Use each individual set of weights (w i , c i ) ∈ P to score and rank the sets of predictions for the roteins in the training set.

  5. ii.

    Evaluate the BEDROC value (Eqn. 12) for each protein based on the ranks of the hits.

  6. iii.

    Calculate the fitness for each individual as given by the average BEDROC value.

  7. 3.

    Select the n best set of parameters from the n2 offspring. These will be the parents for the next generation.

  8. 3)

    At the end of K such cycles, record the best performing set of parameters for the current training set.

All the reported ranks for the LZerD predictions are based on the set of weights (Eqn. 11; Table 5) identified by the genetic algorithm (GA) trained on the data set listed in Table 2. The signs of the weights computed for the 3DZD-normal penalty, the buried surface area, and the excluded volume may seem counterintuitive i.e. they have positive, negative, and positive values when they are supposed to contribute as a penalty, a reward, and a penalty, respectively. However, they do function as desired, as for a vast majority of the cases, the computed terms are well below the offset values, thus effectively reversing the signs.

Datasets and experimental setup

The training and testing procedure of the docking prediction is illustrated in Figure 8. The weighting factors (Eqn. 11) are determined based on a training set of 29 bound-bound protein complexes taken from ZDOCK benchmark datasets 0.0[80] and 1.0 [55]. For each of the 29 protein complexes in the training set, LZerD generated around 50,000-250,000 predictions, which were then classified as hits (predicted complex conformations with an interface RMSD ≤ 2.5 Å) and non-hits. Since the number of predictions was significantly high, the training set was split into 20 subsets each containing 2000 randomly chosen predictions for each protein complex. The subsets were chosen such that 75% of the hits for each complex were included for training. For each of the 20 sets the GA was run 10 times and the best set of weighting factors over each run was recorded. The 200 sets of values were then applied to the whole training dataset covering all predictions and the best performing set of weighting factors was chosen based on the one that yielded the largest average BEDROC score (Eqn. 12). The BEDROC score of the 200 training GA runs is provided as the Additional file 3.

Figure 8
figure 8

Implementation of the training phase using the GA. Twenty splits of the training data set are considered where each split is a collection of 2000 randomly selected predictions that include 75% of the hits from the generated data for each protein complex. Training is repeated 10 times on each split and the best set of weights from each split are recorded. The set of weights that gives the best average BEDROC on the entire training data set is applied to the test set predictions.

The weighting factors obtained were then applied to a test set of (84 bound and unbound complexes) taken from the ZDOCK Benchmark 2.0 [21]. Protein complexes in ZDOCK Benchmark 2.0 that are common to the ZDOCK Benchmark 0.0 and 1.0 were removed from the training set thus ensuring an independent test set.

Evaluation Criteria

All docking predictions were evaluated in terms of the interface (any atom within 10 Å of the other protein is considered to be part of the interface) RMSD between the bound and unbound interface C-α with values under 2.5 Å considered a hit[73]. In addition to the interface RMSDs, the ligand RMSDs (LRMSD) i.e. RMSD of the backbone atoms of the bound and unbound ligand proteins have also been calculated. According to the CAPRI criteria [73], LRMSD values less than 10 Å fall in the acceptable range. Another criterion, based on the mean of the logarithm of rank of the first hit[50] has also been used in the assessment

(16)

Here N is the number of complexes and χ refers to maximum value of the rank to be considered and is set to 1000. The value of MLR ranges between 1 (all rank 1 hits) to χ (no hits within the threshold).

Availability and requirements

The docking program LZerD is written in C++. LZerD is freely available to academic institutions and Linux executables can be downloaded from our website, http://kiharalab.org/proteindocking/. The program requires a computer with at least 1.5 GB RAM operated by Linux OS. As the program requires some preprocessing of the PDB files, a shell script has been provided to automate the procedure. Thus, the user is only required to provide the ligand and receptor protein (PDB) files to be docked as input to the script. Output is in the form of PDB files of top ranking ligand orientations. The average times combining both docking and scoring range are about 1-2 hours for small proteins (about 400 points on the receptor and ligand) and it may take longer for larger proteins. This timing is reported on a computer with a dual-core 2.1 GHz processor with 8 GB RAM.

Abbreviations

3D:

three dimensional

3DZD:

3D Zernike descriptors

LZerD:

Local 3D Zernike descriptor-based Docking program.

References

  1. Russell RB, Alber F, Aloy P, Davis FP, Korkin D, Pichaud M, Topf M, Sali A: A structural perspective on protein-protein interactions. Curr Opin Struct Biol 2004, 14: 313–324. 10.1016/j.sbi.2004.04.006

    Article  CAS  PubMed  Google Scholar 

  2. Aloy P, Russell RB: Ten thousand interactions for the molecular biologist. Nat Biotechnol 2004, 22: 1317–1321. 10.1038/nbt1018

    Article  CAS  PubMed  Google Scholar 

  3. Salwinski L, Eisenberg D: Computational methods of analysis of protein-protein interactions. Curr Opin Struct Biol 2003, 13: 377–382. 10.1016/S0959-440X(03)00070-8

    Article  CAS  PubMed  Google Scholar 

  4. Szilagyi A, Grimm V, Arakaki AK, Skolnick J: Prediction of physical protein-protein interactions. Phys Biol 2005, 2: S1–16. 10.1088/1478-3975/2/2/S01

    Article  CAS  PubMed  Google Scholar 

  5. Ritchie DW: Recent progress and future directions in protein-protein docking. Curr Protein Pept Sci 2008, 9: 1–15. 10.2174/138920308783565741

    Article  CAS  PubMed  Google Scholar 

  6. Halperin I, Ma B, Wolfson H, Nussinov R: Principles of docking: An overview of search algorithms and a guide to scoring functions. Proteins 2002, 47: 409–443. 10.1002/prot.10115

    Article  CAS  PubMed  Google Scholar 

  7. Tovchigrechko A, Wells CA, Vakser IA: Docking of protein models. Protein Sci 2002, 11: 1888–1896. 10.1110/ps.4730102

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  8. Chen R, Li L, Weng Z: ZDOCK: an initial-stage protein-docking algorithm. Proteins 2003, 52: 80–87. 10.1002/prot.10389

    Article  CAS  PubMed  Google Scholar 

  9. Fischer D, Lin SL, Wolfson HL, Nussinov R: A geometry-based suite of molecular docking processes. J Mol Biol 1995, 248: 459–477.

    CAS  PubMed  Google Scholar 

  10. Wolfson H, Rigoutsos I: Geometric hashing: an overview. IEEE Computational Science Engineering 1997, 4: 10–21. 10.1109/99.641604

    Article  Google Scholar 

  11. Ritchie DW, Kemp GJ: Protein docking using spherical polar Fourier correlations. Proteins 2000, 39: 178–194. 10.1002/(SICI)1097-0134(20000501)39:2<178::AID-PROT8>3.0.CO;2-6

    Article  CAS  PubMed  Google Scholar 

  12. Lensink MF, Mendez R, Wodak SJ: Docking and scoring protein complexes: CAPRI. Proteins 3rd edition. 2007, 69: 704–718. 10.1002/prot.21804

    Article  CAS  PubMed  Google Scholar 

  13. Bonvin AM: Flexible protein-protein docking. Curr Opin Struct Biol 2006, 16: 194–200. 10.1016/j.sbi.2006.02.002

    Article  CAS  PubMed  Google Scholar 

  14. Andrusier N, Mashiach E, Nussinov R, Wolfson HJ: Principles of flexible protein-protein docking. Proteins 2008, 73: 271–289. 10.1002/prot.22170

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  15. Krol M, Chaleil RA, Tournier AL, Bates PA: Implicit flexibility in protein docking: cross-docking and local refinement. Proteins 2007, 69: 750–757. 10.1002/prot.21698

    Article  CAS  PubMed  Google Scholar 

  16. Kihara D, Lu H, Kolinski A, Skolnick J: TOUCHSTONE: an ab initio protein structure prediction method that uses threading-based tertiary restraints. Proc Natl Acad Sci USA 2001, 98: 10125–10130. 10.1073/pnas.181328398

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  17. Lee SY, Skolnick J: Benchmarking of TASSER_2.0: an improved protein structure prediction algorithm with more accurate predicted contact restraints. Biophys J 2008, 95: 1956–1964. 10.1529/biophysj.108.129759

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  18. Misura KM, Baker D: Progress and challenges in high-resolution refinement of protein structure models. Proteins 2005, 59: 15–29. 10.1002/prot.20376

    Article  CAS  PubMed  Google Scholar 

  19. Katchalski-Katzir E, Shariv I, Eisenstein M, Friesem AA, Aflalo C, Vakser IA: Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques. Proc Natl Acad Sci USA 1992, 89: 2195–2199. 10.1073/pnas.89.6.2195

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  20. Gabb HA, Jackson RM, Sternberg MJ: Modelling protein docking using shape complementarity, electrostatics and biochemical information. J Mol Biol 1997, 272: 106–120. 10.1006/jmbi.1997.1203

    Article  CAS  PubMed  Google Scholar 

  21. Mintseris J, Wiehe K, Pierce B, Anderson R, Chen R, Janin J, Weng Z: Protein-Protein Docking Benchmark 2.0: an update. Proteins 2005, 60: 214–216. 10.1002/prot.20560

    Article  CAS  PubMed  Google Scholar 

  22. Jiang F, Kim SH: "Soft docking": matching of molecular surface cubes. J Mol Biol 1991, 219: 79–102. 10.1016/0022-2836(91)90859-5

    Article  CAS  PubMed  Google Scholar 

  23. Palma PN, Krippahl L, Wampler JE, Moura JJ: BiGGER: a new (soft) docking algorithm for predicting protein interactions. Proteins 2000, 39: 372–384. 10.1002/(SICI)1097-0134(20000601)39:4<372::AID-PROT100>3.0.CO;2-Q

    Article  CAS  PubMed  Google Scholar 

  24. Zacharias M: ATTRACT: protein-protein docking in CAPRI using a reduced protein model. Proteins 2005, 60: 252–256. 10.1002/prot.20566

    Article  CAS  PubMed  Google Scholar 

  25. Sael L, Kihara D: Protein surface representation and comparison: New approaches in structural proteomics. In Biological Data Mining. Edited by: Chen J, Lonardi S. Boca Raton, Florida, USA: Chapman & Hall/CRC Press; 2009:89–109.

    Google Scholar 

  26. Li B, Turuvekere S, Agrawal M, La D, Ramani K, Kihara D: Characterization of local geometry of protein surfaces with the visibility criterion. Proteins 2007, 71: 670–683. 10.1002/prot.21732

    Article  Google Scholar 

  27. Tseng YY, Dundas J, Liang J: Predicting Protein Function and Binding Profile via Matching of Local Evolutionary and Geometric Surface Patterns. J Mol Biol 2009, 387: 451–464. 10.1016/j.jmb.2008.12.072

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  28. Laskowski RA, Watson JD, Thornton JM: ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Res 2005, 33: W89-W93. 10.1093/nar/gki414

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  29. Kahraman A, Morris RJ, Laskowski RA, Thornton JM: Shape variation in protein binding pockets and their ligands. J Mol Biol 2007, 368: 283–301. 10.1016/j.jmb.2007.01.086

    Article  CAS  PubMed  Google Scholar 

  30. Kihara D, Sael L, Chikhi R: Local surface shape-based protein function prediction using Zernike descriptors. Biophys J 2009, 96: 650a. 10.1016/j.bpj.2008.12.3435

    Article  Google Scholar 

  31. Sael L, Li B, La D, Fang Y, Ramani K, Rustamov R, Kihara D: Fast protein tertiary structure retrieval based on global surface shape similarity. Proteins 2008, 72: 1259–1273. 10.1002/prot.22030

    Article  CAS  PubMed  Google Scholar 

  32. Sael L, La D, Li B, Rustamov R, Kihara D: Rapid comparison of properties on protein surface. Proteins 2008, 73: 1–10. 10.1002/prot.22141

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  33. Shentu Z, Al HM, Bystroff C, Zaki MJ: Context shapes: Efficient complementary shape matching for protein-protein docking. Proteins 2008, 70: 1056–1073. 10.1002/prot.21600

    Article  CAS  PubMed  Google Scholar 

  34. Bernauer J, Aze J, Janin J, Poupon A: A new protein-protein docking scoring function based on interface residue properties. Bioinformatics 2007, 23: 555–562. 10.1093/bioinformatics/btl654

    Article  CAS  PubMed  Google Scholar 

  35. Grunberg R, Leckner J, Nilges M: Complementarity of structure ensembles in protein-protein binding. Structure 2004, 12: 2125–2136. 10.1016/j.str.2004.09.014

    Article  CAS  PubMed  Google Scholar 

  36. Segal D, Eisenstein M: The effect of resolution-dependent global shape modifications on rigid-body protein-protein docking. Proteins 2005, 59: 580–591. 10.1002/prot.20432

    Article  CAS  PubMed  Google Scholar 

  37. Zhang Q, Sanner M, Olson AJ: Shape complementarity of protein-protein complexes at multiple resolutions. Proteins 2009, 75: 453–467. 10.1002/prot.22256

    Article  PubMed Central  PubMed  Google Scholar 

  38. Connolly ML: Shape complementarity at the hemoglobin alpha 1 beta 1 subunit interface. Biopolymers 1986, 25: 1229–1247. 10.1002/bip.360250705

    Article  CAS  PubMed  Google Scholar 

  39. Connolly ML: Shape distributions of protein topography. Biopolymers 1992, 32: 1215–1236. 10.1002/bip.360320911

    Article  CAS  PubMed  Google Scholar 

  40. Hou T, Wang J, Chen L, Xu X: Automated docking of peptides and proteins by using a genetic algorithm combined with a tabu search. Protein Eng 1999, 12: 639–648. 10.1093/protein/12.8.639

    Article  CAS  PubMed  Google Scholar 

  41. Gardiner EJ, Willett P, Artymiuk PJ: Protein docking using a genetic algorithm. Proteins 2001, 44: 44–56. 10.1002/prot.1070

    Article  CAS  PubMed  Google Scholar 

  42. Lawrence MC, Colman PM: Shape complementarity at protein/protein interfaces. J Mol Biol 1993, 234: 946–950. 10.1006/jmbi.1993.1648

    Article  CAS  PubMed  Google Scholar 

  43. Bordner AJ, Gorin AA: Protein docking using surface matching and supervised machine learning. Proteins 2007, 68: 488–502. 10.1002/prot.21406

    Article  CAS  PubMed  Google Scholar 

  44. Schneidman-Duhovny D, Inbar Y, Nussinov R, Wolfson HJ: PatchDock and SymmDock: servers for rigid and symmetric docking. Nucleic Acids Res 2005, 33: W363-W367. 10.1093/nar/gki481

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  45. Lesk VI, Sternberg MJ: 3D-Garden: a system for modelling protein-protein complexes based on conformational refinement of ensembles generated with the marching cubes algorithm. Bioinformatics 2008, 24: 1137–1144. 10.1093/bioinformatics/btn093

    Article  CAS  PubMed  Google Scholar 

  46. Jiang F, Kim SH: "Soft docking": matching of molecular surface cubes. J Mol Biol 1991, 219: 79–102. 10.1016/0022-2836(91)90859-5

    Article  CAS  PubMed  Google Scholar 

  47. Chen R, Weng Z: A novel shape complementarity scoring function for protein-protein docking. Proteins 2003, 51: 397–408. 10.1002/prot.10334

    Article  CAS  PubMed  Google Scholar 

  48. Mak L, Grandison S, Morris RJ: An extension of spherical harmonics to region-based rotationally invariant descriptors for molecular shape description and comparison. J Mol Graph Model 2007, 26: 1035–1045. 10.1016/j.jmgm.2007.08.009

    Article  PubMed  Google Scholar 

  49. Gramada A, Bourne PE: Multipolar representation of protein structure. BMC Bioinformatics 2006, 7: 242. 10.1186/1471-2105-7-242

    Article  PubMed Central  PubMed  Google Scholar 

  50. Ritchie DW, Kozakov D, Vajda S: Accelerating and focusing protein-protein docking correlations using multi-dimensional rotational FFT generating functions. Bioinformatics 2008, 24: 1865–1873. 10.1093/bioinformatics/btn334

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  51. Edmonds AR: Angular momentum in quantum mechanics. Princeton: Princeton University Press; 1957.

    Google Scholar 

  52. La D, Esquivel-Rodriguez J, Venkatraman V, Li B, Sael L, Ueng S, Ahrendt S, Kihara D: 3D-SURFER: software for high-throughput protein surface comparison and analysis. Bioinformatics 2009, 25: 2843–2844. 10.1093/bioinformatics/btp542

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  53. Sael L, Kihara D: Characterization and classification of local protein surfaces using self-organizing map. International Journal of Knowledge Discovery in Bioinformatics (IJKDB) 2010, in press.

    Google Scholar 

  54. Novotni M, Klein R: 3D Zernike descriptors for content based shape retrieval. ACM Symposium on Solid and Physical Modeling, Proceedings of the eighth ACM symposium on Solid modeling and applications 2003, 216–225. full_text

    Chapter  Google Scholar 

  55. Chen R, Mintseris J, Janin J, Weng Z: A protein-protein docking benchmark. Proteins 2003, 52: 88–91. 10.1002/prot.10390

    Article  CAS  PubMed  Google Scholar 

  56. Schneidman-Duhovny D, Inbar Y, Polak V, Shatsky M, Halperin I, Benyamini H, Barzilai A, Dror O, Haspel N, Nussinov R, Wolfson HJ: Taking geometry to its edge: fast unbound rigid (and hinge-bent) docking. Proteins 2003, 52: 107–112. 10.1002/prot.10397

    Article  CAS  PubMed  Google Scholar 

  57. Venkatraman V, Sael L, Kihara D: Potential for protein surface shape analysis using spherical harmonics and 3D Zernike descriptors. Cell Biochem Biophys 2009, 54: 23–32. 10.1007/s12013-009-9051-x

    Article  CAS  PubMed  Google Scholar 

  58. Hubbard SJ, Thornton JM: NACCESS. London: University College London, Department of Biochemistry and Molecular Biology; 1993.

    Google Scholar 

  59. Das R, Baker D: Macromolecular modeling with rosetta. Annu Rev Biochem 2008, 77: 363–382. 10.1146/annurev.biochem.77.062906.171838

    Article  CAS  PubMed  Google Scholar 

  60. Kolinski A: Protein modeling and structure prediction with a reduced representation. Acta Biochim Pol 2004, 51: 349–371.

    CAS  PubMed  Google Scholar 

  61. Schlick T: Molecular modeling and simulation. New York: Springer-Verlag; 2002.

    Book  Google Scholar 

  62. Brooks BR, Brooks CL III, Mackerell AD Jr, Nilsson L, Petrella RJ, Roux B, Won Y, Archontis G, Bartels C, Boresch S, Caflisch A, Caves L, Cui Q, Dinner AR, Feig M, Fischer S, Gao J, Hodoscek M, Im W, Kuczera K, Lazaridis T, Ma J, Ovchinnikov V, Paci E, Pastor RW, Post CB, Pu JZ, Schaefer M, Tidor B, Venable RM, Woodcock HL, Wu X, Yang W, York DM, Karplus M: CHARMM: the biomolecular simulation program. J Comput Chem 2009, 30: 1545–1614. 10.1002/jcc.21287

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  63. Grant JA, Pickup BT: A Gaussian Description of Molecular Shape. Journal of Physical Chemistry 1995, 99: 3503–3510. 10.1021/j100011a016

    Article  CAS  Google Scholar 

  64. Gabdoulline RR, Wade RC: Effective charges for macromolecules in solvent. Journal of Physical Chemistry 1996, 100: 3868–3878. 10.1021/jp953109f

    Article  CAS  Google Scholar 

  65. Lesk VI, Sternberg MJ: 3D-Garden: a system for modelling protein-protein complexes based on conformational refinement of ensembles generated with the marching cubes algorithm. Bioinformatics 2008, 24: 1137–1144. 10.1093/bioinformatics/btn093

    Article  CAS  PubMed  Google Scholar 

  66. Canterakis N: 3D Zernike moments and Zernike affine invariants for 3D image analysis and recognition. Proc 11th Scandinavian Conference on Image Analysis 1999, 85–93.

    Google Scholar 

  67. Dym H, McKean H: Fourier series and integrals. San Diego: Academic Press; 1972.

    Google Scholar 

  68. Lifshits M, Blayvas I, Goldenberg R, Rivlin E, Rudzsky M: Rehashing for Baysian geometric hasing. Proceedings of the 17th International Conference on ICPR'04 2004, 3: 99–102.

    Google Scholar 

  69. Bebis G, Georgiopoulos M, Lobo ND: Using self-organizing maps to learn geometric hash functions for model-based object recognition. Ieee Transactions on Neural Networks 1998, 9: 560–570. 10.1109/72.668897

    Article  CAS  PubMed  Google Scholar 

  70. Arya S, Mount DM, Netanyahu NS, Silverman R, Wu AY: An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. Journal of the Acm 1998, 45: 891–923. 10.1145/293347.293348

    Article  Google Scholar 

  71. Umeyama S: Least-Squares Estimation of Transformation Parameters Between 2 Point Patterns. Ieee Transactions on Pattern Analysis and Machine Intelligence 1991, 13: 376–380. 10.1109/34.88573

    Article  Google Scholar 

  72. Legrand S, Merz K: Rapid approximation to molecular surface area via the use of Boolean logic and look-up tables. J Comp Chem 1993, 14: 349–352. 10.1002/jcc.540140309

    Article  CAS  Google Scholar 

  73. Janin J: Assessing predictions of protein-protein interaction: the CAPRI experiment. Protein Sci 2005, 14: 278–283. 10.1110/ps.041081905

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  74. Eyal E, Najmanovich R, McConkey BJ, Edelman M, Sobolev V: Importance of solvent accessibility and contact surfaces in modeling side-chain conformations in proteins. Journal of Computational Chemistry 2004, 25: 712–724. 10.1002/jcc.10420

    Article  CAS  PubMed  Google Scholar 

  75. Sebag M, Aze J, Lucas N: ROC-based evolutionary learning: Application to medical data mining. Artificial Evolution 2004, 2936: 384–396.

    Article  Google Scholar 

  76. Meyer-Nieberg S, Hans-Geor B: Self-adaptation in evolutionary algorithms. In Parameter Setting in Evolutionary Algorithms. Berlin/Heidelberg: Springer; 2007:47–75. full_text

    Chapter  Google Scholar 

  77. Truchon JF, Bayly CI: Evaluating virtual screening methods: good and bad metrics for the "early recognition" problem. J Chem Inf Model 2007, 47: 488–508. 10.1021/ci600426e

    Article  CAS  PubMed  Google Scholar 

  78. Truchon JF, Bayly CI: Evaluating virtual screening methods: good and bad metrics for the "early recognition" problem. J Chem Inf Model 2007, 47: 488–508. 10.1021/ci600426e

    Article  CAS  PubMed  Google Scholar 

  79. Gagne C, Parizeau M: Genericity in evolutionary computation software tools: principles and cas-study. Int J Artif Intell Tools 2006, 15: 173–194. 10.1142/S021821300600262X

    Article  Google Scholar 

  80. Chen R, Weng Z: Docking unbound proteins using shape complementarity, desolvation, and electrostatics. Proteins 2002, 47: 281–294. 10.1002/prot.10092

    Article  CAS  PubMed  Google Scholar 

Download references

Acknowledgements

The authors gratefully acknowledge useful discussions with David La during the course of this work. We thank Paul Tuck for proofreading the manuscript. This work has been supported by grants from the National Institutes of Health (R01GM075004). DK also acknowledges grants from NIH (U24 GM077905) and National Science Foundation (DMS0604776, DMS0800568, EF0850009, IIS0915801).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Daisuke Kihara.

Additional information

Authors' contributions

VV implemented the algorithms, designed and conducted the experiments, and drafted the paper. YDY participated in the development of geometric hashing. LS developed programs for computing the 3DZD and analyzed protein docking interfaces. DK conceived of the study, participated in its design and coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.

Authors’ original submitted files for images

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Venkatraman, V., Yang, Y.D., Sael, L. et al. Protein-protein docking using region-based 3D Zernike descriptors. BMC Bioinformatics 10, 407 (2009). https://doi.org/10.1186/1471-2105-10-407

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/1471-2105-10-407

Keywords