A multilayer dynamic perturbation analysis method for predicting ligand–protein interactions

Gu, Lin; Li, Bin; Ming, Dengming

doi:10.1186/s12859-022-04995-2

Research
Open access
Published: 02 November 2022

A multilayer dynamic perturbation analysis method for predicting ligand–protein interactions

Lin Gu¹,
Bin Li¹ &
Dengming Ming¹

BMC Bioinformatics volume 23, Article number: 456 (2022) Cite this article

1451 Accesses
1 Citations
1 Altmetric
Metrics details

Abstract

Background

Ligand–protein interactions play a key role in defining protein function, and detecting natural ligands for a given protein is thus a very important bioengineering task. In particular, with the rapid development of AI-based structure prediction algorithms, batch structural models with high reliability and accuracy can be obtained at low cost, giving rise to the urgent requirement for the prediction of natural ligands based on protein structures. In recent years, although several structure-based methods have been developed to predict ligand-binding pockets and ligand-binding sites, accurate and rapid methods are still lacking, especially for the prediction of ligand-binding regions and the spatial extension of ligands in the pockets.

Results

In this paper, we proposed a multilayer dynamics perturbation analysis (MDPA) method for predicting ligand-binding regions based solely on protein structure, which is an extended version of our previously developed fast dynamic perturbation analysis (FDPA) method. In MDPA/FDPA, ligand binding tends to occur in regions that cause large changes in protein conformational dynamics. MDPA, examined using a standard validation dataset of ligand-protein complexes, yielded an averaged ligand-binding site prediction Matthews coefficient of 0.40, with a prediction precision of at least 50% for 71% of the cases. In particular, for 80% of the cases, the predicted ligand-binding region overlaps the natural ligand by at least 50%. The method was also compared with other state-of-the-art structure-based methods.

Conclusions

MDPA is a structure-based method to detect ligand-binding regions on protein surface. Our calculations suggested that a range of spaces inside the protein pockets has subtle interactions with the protein, which can significantly impact on the overall dynamics of the protein. This work provides a valuable tool as a starting point upon which further docking and analysis methods can be used for natural ligand detection in protein functional annotation. The source code of MDPA method is freely available at: https://github.com/mingdengming/mdpa.

Peer Review reports

Introduction

Proteins are ubiquitous in the cellular environment. They are not only the main components of cells but also the vital workforce to maintain the biochemical function of life. Proteins achieve their functions by interacting with a variety of molecules in cells, including some macromolecules (DNA, RNA, membranes) and small molecules (catalytic substrates, nucleotides, peptides, and artificial chemicals, often called ligands) [1, 2]. These interactions usually occur at several key amino acids in proteins, known as ligand-binding sites(LBSs), so the identification and characterization of protein LBSs and their associated ligands is key to understanding enzyme catalytic mechanism, disease pathogenesis and provide indispensable knowledge for enzyme engineering and drug design [3,4,5,6]. Furthermore, identifying protein LBSs is typically an indispensable starting step in virtual screening calculations, such as finding inhibitors or screening enzymes that bind to specific substrates [7].

Over the years, biological experiments have provided a wealth of protein structural data spanning all domains of life, which are available from the Protein Data Bank (PDB [8]), providing valuable, highly accurate protein LBS data. The rapid accumulation of experimental protein LBS data has been compiled into different databases, such as PoSSuM [9], BioLip [10], PLIC [11], sc-PDB [12], PDBbind [13], eModel-DBD [14], MOAD [15], etc., for different applications, giving birth to many computational methods for detecting and analyzing protein LBSs [16]. In recent years, not only has the number of known protein structures characterized experimentally increased rapidly, but advanced protein structure prediction programs are giving an ever-increasing number of new structures with reliable accuracy at an unprecedented rate [17, 18], which poses a great new challenge for the computational characterization of protein LBSs. At present, a variety of computational methods have been developed in the field of protein LBS prediction. According to their main algorithms, they can be divided into geometric-, energy-, template-/consensus-, knowledge-based methods and machine-learning methods; see references [19, 20] for details. Here, we mainly limited ourselves to a few geometric- or structure-based dynamic approaches that are closely related to this study, with particular emphasis on those that are freely available and freestanding.

As an early geometric method, POCKET [21] and the derivative LigSite [22] used geometric grids to explore protein-solvent-protein events and predicted those grids that begin and end with “protein” and into which solvents can be inserted as pockets. SURFNET [23] and PASS [24] used a probing ball randomly rotating on protein surface to search candidate pockets. FPOCKET [25] is a stand-alone geometric method that produces grid points based on Voronoi tessellation. For a given protein, FPOCKET usually generates many pockets which cover most protein LBSs, but they are not always in the top pocket. However, due to its easy-to-use, stand-alone and fast, it is very popular in the field and has been used as a built-in component by other programs, such as P2Rank [19]. Other methods introduce energy calculation on the basis of geometric representation. Q-SiteFinder [26] places methyl probes (-CH3) at grid points, calculates van der Waals interaction energies between probe atoms and neighboring protein atoms, and then clusters low-energy probes into candidate pockets as predictions. SiteHound [27] calculates the interaction forces between grid points and the protein, and clusters those points with higher interaction energies as candidate pockets. Then comes the consensus method MetaPocket 2.0 [28], a popular web server that combines the results produced by the above mentioned methods, including LigSite, PASS, Q-SiteFinder, Surfnet, FPOCKET and other three methods to improve the overall prediction success rate. Other consensus methods include COACH [29], G-LoSA [30], Libra [31], bSiteFinder [32], eMatchSite [33], etc. The template-based or consensus methods were reported to be the most successful and useful LBS prediction tools in CASP [34], which computationally determine conserved sites for certain protein families that bind target ligands based on sequence alignments [35]. Machine learning methods classify LBSs or protein functional residues based on structural, physico chemical features of biomolecular sequences/structures in the training sets [36,37,38,39,40,41]. In recent years, just like their development in protein structure prediction, machine learning and deep learning methods have also developed the fastest in the field of ligand-binding pocket and LBS prediction, as driven by the rapid accumulation of large amounts of experimental data [19, 42,43,44,45].

We have developed a template-free, structure-based and stand-alone method to predict protein LBS based on the protein dynamics perturbation analysis (DPA, and its Fast version FDPA) [46, 47]. The algorithm is based on the observation that external interactions, such as ligand binding, tend to occur in protein regions where interactions cause large conformational distribution changes, as measured by the allosteric potential D_x [46], which is the Kullback–Leibler divergence between protein conformational distributions with and without interaction. The MSMS program [48] was used to generate a layer of test points with a 1.5 Å radius on the surface of the protein, and those points with higher D_x values were selected and clustered to form prediction regions. In DPA, proteins were treated as some elastic network structures, and the external interactions between test points and neighboring protein atoms were simulated by connecting springs. The method was evaluated using 305 proteins in GOLD test set [49] of the time, and DPA yielded 287 protein-pocket predictions (rate of 94%), of which 250 predictions (rate of 87%) gave at least one correct LBS, while FDPA predicted 267 (rate of 86%) and 251 (rate of 94%), respectively. FDPA costs about 3 s to make a prediction for a protein of 130 aa with 2.2 GHz Intel Core i7 processor. We noted that since the DPA/FDPA algorithms only use a set of two-dimensional test points that are closely parallel to the protein surface (with an average distance of 1.5 Å from the protein surface), they cannot give the spatial distribution of target compounds (ligand or substrate) inside the pocket. However, in addition to LBSs and pockets, the spatial extensions of ligands within pockets are also important in practice [50] and their predictions have also received increasing attention over the years [51,52,53,54,55].

In this study, we presented a new version of DPA, called the multilayer dynamics perturbation analysis (MDPA), to predict both the protein LBSs and the spatial extension of ligands within the predicted pocket. The method was evaluated with a popular CCDC/Astex dataset for ligand-binding site prediction [56] and compared with the popular algorithms FPOCK [25] and CAVIAR [57]. The output of the algorithm might be used as input parameters for docking programs such as AutoDock [58], Vina [59], and others, and may also help optimize the conformation of the target ligand.

Methods

Implementation of MDPA

MDPA is an extension of the previously developed fast DPA or FDPA method. See the original work[47] for details of the FDPA. Here, only the principle and calculation process are outlined, with special emphasis on the extension of the method. This method is based on the observation that interactions introduced at LBSs tend to cause large perturbations to the protein conformational distribution, thereby making the protein activity susceptible to modulation by external molecular interactions at particular LBSs[46]. The conformational distribution of a protein containing N atoms, \(P\left(\mathbf{X}\right)\propto exp(-U\left(\mathbf{X}\right)/{k}_{B}T)\), is a probability function of the protein configuration \(\mathbf{X}=(\overrightarrow{{\mathbf{r}}_{1}},\overrightarrow{{\mathbf{r}}_{2}},\cdots ,\overrightarrow{{\mathbf{r}}_{{\varvec{N}}}})\), where \(\overrightarrow{{\mathbf{r}}_{\mathbf{j}}}\) is the coordinate of jth atom. In DPA/FDPA, the potential energy \(U(\mathbf{X})\) was modeled in harmonic approximation as a quadratic function \(U(\mathbf{X})=E(\Delta \mathbf{X})=\frac{1}{2}{\Delta {\varvec{X}}}^{T}\mathbf{H}\Delta \mathbf{X}\), where \(\Delta \mathbf{X}=\mathbf{X}-{\mathbf{X}}_{0}\) measures the protein deformation from its equilibrium configuration \({\mathbf{X}}_{0}\), and \(\mathbf{H}\), called the Hessian matrix, is the second partial mass-weighted derivative matrix of the potential energy \(U\) evaluated at \({\mathbf{X}}_{0}\) with respect to local coordinate changes. Harmonic approximation of the potential energy function U is usually valid because, under normal conditions, biomacromolecules must keep their structural fluctuations within a narrow range to perform their function correctly [60].

The key step of DPA/FDPA was to introduce external test interactions at random position “a” on the protein surface. For a given protein conformation \(\mathbf{X}\), the algorithm determined the new configuration distribution \({P}^{({\varvec{a}})}(\mathbf{X})\) with the external interaction at “a” and measured its difference from unperturbed distribution \(P(\mathbf{X})\) based on the Kullback–Leibler divergence, giving the D-value: \({D}^{(a)}=\int d\mathbf{X}\mathbf{l}\mathbf{n}\left(\frac{{P}^{({\varvec{a}})}(\mathbf{X})}{P(\mathbf{X})}\right){P}^{({\varvec{a}})}(\mathbf{X})\) [61]. In practice, the configuration distribution \(P(\mathbf{X})\) was approximated by Boltzmann distribution with the potential energy \(U(\mathbf{X})\) modeled using the \({C}_{\alpha }\)-based elastic network model [62, 63], and an analytic solution of D^(a) was derived (depending on the protein equilibrium conformation \({\mathbf{X}}_{0}\)). In DPA/FDPA, a set of random test points, also called surface points, denoted as \({\mathbf{L}}_{1}=\left\{{a}_{j}^{(1)},j=\mathrm{1,2},...,{S}_{1}\right\},\) was generated by the MSMS program [48], and those points with higher D-value (\({D}^{({a}^{(1)})})\) were screened out and grouped into small clusters \({O}^{(1)},{P}^{(1)},...\), by using the OPTICS[64] clustering algorithm. Each cluster represented a predicted pocket, and protein residues near the cluster were then predicted as LBSs.

In DPA/FDPA, the set of test points generated by MSMS usually forms a closed surface parallel to the protein surface, which in turn makes the predicted pocket consisting of selected surface points very close (1.5 Å) to protein residues in any case. This often differs from what we observed when small molecules bind to proteins. In fact, except for a few marginal atoms, the entire small molecule tends to be located in the center of the binding pocket, some distance from the surface residues. Considering the importance of this ligand spatial extension in pockets, here we introduce a simple generalization of FDPA, called multilayer DPA or MDPA, for predicting protein LBSs and ligand spatial extension in pockets. MDPA treats the surface points \({\mathbf{L}}_{1}\) as a layer of virtual alpha-carbon atoms covering the protein structure, and then used MSMS program to generate a new set of surface points \({\mathbf{L}}_{2}=\left\{{a}_{j}^{(2)},j=\mathrm{1,2},...,{S}_{2}\right\}\) outside \({\mathbf{L}}_{1}\), using complex structure composed of the protein and \({\mathbf{L}}_{1}\). The D-values \({D}^{({a}^{(2)})}\) for \({\mathbf{L}}_{2}\) surface points \(a_{j}^{\left( 2 \right)} \;^{\prime } s\) are determined using a similar analytic solution formula in FDPA, and those with higher D-values were screened out, and grouped into clusters \({O}^{(2)},{P}^{(2)},...\). Compared to pockets \(\left\{{O}^{(1)},{P}^{(1)},...\right\}\) predicted based on layer 1 surface points, pockets \(\left\{{O}^{(2)},{P}^{(2)},...\right\}\) are predicted to be farther from the protein surface, with an average distance of twice the average distance from \({\mathbf{L}}_{1}\) to the protein surface, i.e., twice the diameter of the probe sphere. Repeating this process, the third layer of surface points \({\mathbf{L}}_{3}\) was built on top of \({\mathbf{L}}_{2}\), giving predicted pockets \(\left\{{O}^{(3)},{P}^{(3)},...\right\}\). Higher-layered pockets can also be generated in this way. In this study, only the pockets generated by the first three layers of test points were used, which was made after considering the balance between the prediction accuracy requirement and the computational load. Finally, the predicted pockets of all layers are collected, merged, and reclustered according to their spatial connectivity to obtain the final pocket predictions: \({O}_{M}, {P}_{M}\)… Therefore, in many cases, a final predicted pocket, say \({O}_{M}\), may be the result of connection of pockets from multiple layers such that, say \({O}_{M}= {O}^{(1)}+{O}^{(2)}+{P}^{(3)}+...\), and each pocket was re-ranked according to the averaged D-value of test points in it. The prediction of protein LBSs was straightforward by simply finding amino acids whose Cα atom is within a distance of 6 Å from a point in the rank-1 predicted cluster (denoted as cluster \({O}_{m}\)). LBSs predicted by lower-ranked pockets, if any, were also analyzed for comparison. The predicted clusters and protein structures were visualized by PyMOL (The PyMOL Molecular Graphics System, Version 2.5.2, Schrödinger, LLC.).

Validation dataset

To verify the accuracy of the algorithm in predicting ligand-binding pockets, the CCDC/Astex dataset of 85 protein-ligand complexes were used in this study [56]. The dataset consists of a diverse, high-quality test set originally developed to evaluate protein-ligand docking programs. Most structures in the dataset have little global conformational similarity, but some of them are selected from several major protein families, including 11 kinases, 9 nuclear receptors, 5 serine proteases, and 3 members of the phosphodiesterase family. This dataset was originally based on the statistical analysis and classification of all protein-ligand complexes in the PDB database. Starting from the sequence of protein, the structure of the ligand, and the structure of the complex, the clustering of proteins with similar sequences was used to display as many different proteins as possible. Characterization of the complexes was accomplished by extracting the structure of the ligand, based on the number of heavy atoms and rotatable bonds in the ligand, and determining whether it has therapeutic or agrochemical use in the complex. 90% of these complexes were related protein targets, contain similar drug complexes, and contain high-quality experimental structures that facilitate the evaluation of the ligand binding modes.

Validation of prediction LBSs and ligand spatial extension inside the binding pocket

To evaluate the prediction accuracy, we selected the set of protein residues \({R}_{P}\) whose Cα atoms were within 6 Å of any point in the rank-1 cluster \({O}_{m}\) and defined \({R}_{P}\) as the prediction LBSs. The prediction was compared with the set of LBSs \({R}_{L}\) found in the ligand-protein complex as collected from the SITE sections of the examined PDB files. The intersection of R_p and R_L defined a set of residues \({R}_{P\cap L}={R}_{P}\cap {R}_{L}\), recoding residues found in both the prediction and experimental determined LBSs. The overlap of the prediction with the experiment LBSs was then evaluated with the conventional prediction precision and recall: Precision = \({R}_{P\cap L}/{R}_{P}\), Recall = \({R}_{P\cap L}/{R}_{L}\).

Compared with DPA/FDPA, the new output of MDPA was a prediction of the spatial extension of ligands inside the predicted binding pocket. To evaluate the quality of predictions, we defined an evaluation strategy similar to evaluating LBS predictions. Let S_L be the set of atoms of a ligand found in the complex, and S_l be the subset of S_L whose elements are ligand atoms located within 3.5 Å of any test point of the predicted pocket \({O}_{M}\). Let \({O}_{m}\) be the subset of \({O}_{M}\), whose elements are test points located within 3.5 Å of any ligand atom. The overlap of the prediction with the spatial extension of the experimental ligand within the binding pocket was then assessed using precision and recall as follows: Precision = \({O}_{m}/ {O}_{M}\), and Recall = \({S}_{l}/ {S}_{L}\).

Compared with FPOCKET method

As a fast geometric stand-alone method, FPOCKET has become one of the most popular tools for analyzing protein structural pockets in recent years. The method is based on the determination of alpha spheres by Voronoi tessellation [25]. An alpha sphere is a sphere that contacts four atoms at the protein boundary, and the radius of the alpha sphere reflects the local curvature defined by the four atoms. For a protein, very small spheres are located inside the protein, large spheres are located outside the protein, and cracks and cavities correspond to spheres of intermediate radii. Therefore, spheres of different radius sizes are filtered to solve the pocket detection problem. The algorithm first inputs the PDB structure, then returns the pre-filtered set of spheres, and finally identifies groups of closely connected alpha spheres, returns the pocket information, and scores each pocket. FPOCKET typically gives a large number of predicted pockets, covering most known LBSs and ligand molecules, however, LBSs and ligands are not always associated with the top predicted pockets. For quantitative comparisons, similar precision and recall are also defined to evaluate the predictions of FPOCKET's top-ranked pockets, where test points are replaced with alpha spheres.

Results

MDPA detects regions causing large perturbation to protein dynamics

The lysozyme from turkey egg-white (PDB code: 1JEF [65]) has been used as an example to illustrate the performance of DPA. Like many other lysozymes from different organisms, this protein consists of two domains connected by a helix linker, with its substrate polysaccharide bound deep in between the two lobes (Fig. 1A). Experimental data and simulation calculations show that the enzyme is constantly switching between open and fully closed conformations by adjusting the distance between the two lobes [66, 67]. As mentioned above, from random protein surface points, DPA selects a subset of surface points that most perturb the protein conformational distribution, which are then predicted to be associated with the binding site for protein substrates (Fig. 1). In the case of lysozyme, the interactions introduced at those surface points predicted by DPA should intuitively have the highest efficiency in blocking the key intrinsic opening/closing motion of the enzyme. The original DPA algorithm only considers the first layer (~ 1.5 Å from the protein surface) surface points (Fig. 1B), and it has the obvious disadvantage that it provides little information about the position and spatial extension of the substrate in the binding pocket (Fig. 1C, F). In MDPA, multiple layers of surface points are created on the protein surface, each layer is added to the top of the previous layer, then a subset of surface points with high DPA values are selected for each layer, and finally, all selected surface points are collected and clustered to form predicted binding region (Fig. 1C,D,E,G). For turkey egg-white lysozyme, both DPA and MDPA predicted the same ligand binding sites, namely D52, Q57, I58, N59, W62, W63, A107, and W108, giving a recall rate of 62% and an accuracy rate of 83% compared to the sites listed in the SITE entries of the PDB file. Compared to DPA (Additional file 1: Table S1), since it uses multiple layers of surface points, the main improvement of MDPA lies in the overlap between the predicted spatial region and the physical occupancy of the bound ligand, from 67% in DPA increased to 100% in MDPA, with a slight sacrifice in accuracy from 87 to 82%.

As a second example, we applied MDPA to detect the ligand-binding pocket in the main protease of the recent global pandemic coronavirus. The COVID-19 caused by the novel coronavirus SARS-CoV-2 virus, coupled with the lack of targeted drugs, urgently needs to find new antiviral drugs. The main protease (M^pro) of coronaviruses has been identified as a key drug target for the development of inhibitors to block viral RNA protein processing. The overall structure of the protease (PDB code 6Y2F[68]) can be divided into 3 consecutive and mutually contacting domains (domain I, II, and III) corresponding to residues 10 to 99, 100 to 182, and 198 to 303, respectively. Figure 2 shows the 3 predicted inhibitor-binding regions in the structure given by MDPA, with the rank-1 predicted region located between domains I and II, and the 2nd and 3rd prediction regions between domains II and III. Compared to recently reported α-ketoamide inhibitors[68, 69], in which Cys145 mediates the nucleophilic reaction and Gly143 and Glu166 form hydrogen bonds with the inhibitor, the predicted rank-1 region perfectly matches where the α-ketoamide inhibitor binds within the pocket. The prediction identified important binding sites such as Glu166, His164, Cys145, Gly143, His41, and Arg188, which accounts for half of the listed ligand-binding residues. The predicted binding region overlapped with the α-ketoamide inhibitor with 89% accuracy and 81% recall.

Evaluation of MDPA for prediction of ligand-binding sites

One of the key features of protein pockets is the distribution of ligand-binding sites within them, which largely determines protein function. To evaluate the ability of MDPA to predict ligand-binding sites, MDPA was applied to the protein structures of the CCDC/Astex dataset of 85 protein-ligand complexes. The residues identified by 3-layered MDPA predictions were then compared to the ligand-binding sites listed in the SITE records for each complex structure (Table 1). MDPA made at least one prediction ligand-binding region for 94% of the listed complexes in the dataset, with the exception of the four cases where the ligand-binding region was embedded inside the internal protein cavities. Compared to the listed ligand binding sites in the PDB entry, the predictions were at least 30% accurate in 65% of cases and at least 50% accurate in 25% of cases; the recall rate of residues is at least 30% in 87% of cases and at least 50% in 71% of cases.

Table 1 Validation of MDPA method using the CCDC/Astex dataset

Full size table

Evaluation of MDPA for prediction of ligand spatial extension within the pocket

Compared with previous DPA studies, the development of the MDPA algorithm aims to improve the prediction of the spatial extension of ligands in protein pockets and provide useful information for subsequent ligand prediction and ligand-protein docking calculations. Any medium to large-sized ligand usually adopts a specific spatial distribution within the pocket, with parts of the ligand structure close to the protein and others away from the protein. Both DPA and MPA identify surface point clusters that impose high perturbation on the overall dynamics of the protein through interactions with neighboring protein amino acids, and according to this method, it is easier to detect interacting point sets closer to the protein surface. A key step in MDPA is to expand the collection of highly perturbing points at increasing distances from the protein surface in a layer-by-layer manner. Therefore, it would be interesting to verify the correlation between the direction of the predicted region increase and the spatial extension of ligands in the pocket. As an example, Fig. 3 shows that MPDA successfully computed the growth of the predicted region along the direction of the spatial distribution of the ligand 6-substituted 2-naphthamidine inhibitor in the human urokinase pocket (PDB code: 1OWE[70]). In this case, the predicted overlap of the binding region with the protein inhibitor increased from 30% in DPA to 100% in MDPA. To evaluate its ability to predict the spatial distribution of ligands in the pocket, MDPA was applied to the CCDC/Astex dataset of 85 protein-ligand complexes, and the overlap ratios of the predicted regions with the natural ligands found in the complexes were calculated (Table 1). Calculations showed that the predicted binding region of MDPA overlapped with the natural ligand in 82% of the complexes. Among them, the prediction accuracy is at least 30% in 87% of cases and at least 50% in 62% of cases, with the recall rate at least 30% in 95% of cases and at least 50% in 80% of cases.

Discussions

Compared to FPOCKET and CAVIAR

For comparison, we selected two free, standalone protein pocket prediction programs, FPOCKET [25] and CAVIAR, to examine MDPA by comparing their predictions of protein pockets of the complexes in the dataset (Additional file 1: Tables S2 and S3). FPOCKET identifies cavities through spherical probes, while CAVIAR detects pockets according to the grid algorithm, so most binding sites can be presented by a large number of pockets. Although calculations show that FPOCKET and CAVIAR can effectively identify the ligand-binding pockets of most proteins, the overlap rate between the predicted point sets and the bound ligand of the complexes is relatively small, with an average of < 10%, which provides little information about the ligand-binding position and spatial extension inside the pockets. Compared with FPOCKET and CAVIAR (Table 2), MDPA has higher accuracy in predicting binding sites (60%) and ligand spatial extension (71%), avoiding redundancy. We also noted that if the pocket is completely embedded inside the protein, MDPA does not generate any surface points in this region with the MSMS program, and its predicted binding region has zero overlap with the corresponding ligand (Fig. 4, PDB code: 1R1H[71]). Therefore, MDPA does not work for pockets buried in the protein core, whereas both FPOCKET and CAVIAR may correctly predict the embedded binding pocket. In this sense, MDPA provides a better description of the ligand-binding pose in pockets located on the protein’s outer surface, and this information is expected to be useful for further molecular docking calculations and other ligand–protein interaction characterizations.

Table 2 MDPA is compared with FPOCKET and CAVIAR for the prediction of binding pockets of the structures in the CCDC/Astex dataset

Full size table

Combining deep-learning models to improve the prediction accuracy of ligand binding pockets

Data-driven methods have long been used to predict important ligand binding residues in proteins and physicochemical properties involving ligand-protein interactions [19]. Recent years have seen an increasing number of deep-learning models reported to predict ligand-binding pockets or ligand-binding sites [19, 42, 72,73,74,75]. For example, Zhang and colleagues [76] developed a free and standalone deep-learning model called DeepbindPoc, which uses a 3D convolutional neural network to rank the predicted pockets identified by FPOCKET to screen for the pockets most likely to bind natural ligands. An interesting feature of DeepbindPoc is that the algorithm uses the mol2vec tool to add information about natural ligands to the training dataset, so it has the advantage of ranking near-native pockets for proteins with unknown binding sites but whose ligands are known in advance, and it may also help predict natural ligands for a given protein. Combining DeepbindPoc with MDPA, DeepbindPoc helps reprioritize the predicted regions identified by MDPA by using natural ligands(Additional file 1: Table S4). Of the 85 complexes, the highest-scoring pockets given by DeepbindPoc were consistent with the rank-1 pockets predicted by MDPA in 51% of cases, while the remaining 48% were different. In the cases where the two do not match, there are pros and cons between MDPA prediction ranking and DeepbindPoc scoring. For example, in the case of the influenza virus neuraminidase-inhibitor complex (PDB code: 1L7F [77]), DeepbindPoc gave the highest score to the fourth-ranked MDPA pocket, which does bind the ligand BCX-1812, while the rank-1 MDPA pocket has no binding ligand. However, in most cases (72%), the pockets screened by DeepbindPoc score do not seem as reasonable as the top-ranked pockets predicted by MDPA. The DeepbindPoc scoring function provides some references for MDPA-predicted pockets that do not bind to natural ligands. Therefore, screening candidate ligands of these pockets using tools such as DeepbindPoc is worth further study.

Effects of protein conformation changes on protein pocket detection

Just as flexible docking or flexible ligand recognition [78, 79] must take into account a certain degree of flexibility in the pocket side chain or backbone, it may be useful to examine the effect of protein conformational changes on the prediction of binding pockets. As an example, the human lysozyme was examined here, where structural changes in the binding pocket and different substrate binding modes have been characterized (PDB code: 1LZS [80]). The typical opening/closing conformational changes are simulated by the lowest frequency normal mode solved in MDPA (Fig. 5), where the entire structure rotates around a central axis connecting the two lobs.

Since no direct measurement data were available for the magnitude of the conformational change defined by the normal mode, a series of 12 conformations were generated, evenly distributed through a full cycle, with amplitude \(A\mathrm{cos}(i\pi /12), i=\mathrm{1,2},\dots ,12\), where A is an arbitrary value. To determine the maximum possible value of A, the Procheck[81] program was used to examine the distribution of unreasonable dihedral angles in the resulting conformations. Specifically, the largest A value was chosen under the condition that the Procheck program reports that at least 90% of the dihedral angles are within a reasonable area. MDPA was then applied to each conformation, and the predictions were compared with ligands for which binding sites were found (Table 3). The pocket was given a variety of flexibility from open to closed conformations, which mimics the conformational sampling of a flexible pocket. In the absence of conformational changes, the ligand-binding site predictions achieved 20% accuracy and 100% recall, and the ligand-spatial extension prediction achieved 80% accuracy and 100% recall. For comparison, a large conformational change \(A\) =200 and a medium-sized conformational change \(A=100\) were applied to the protein (Table 3). Calculations showed that the predictions given by MDPA vary slightly with conformational changes. When \(A=100\), the precision rate of LBS prediction reaches 31% in conformation 4 and 8, and the prediction accuracy of ligand spatial extension increases to 86%. In contrast, for the conformations derived by setting \(A\) =200, the highest ligand spatial extension prediction accuracy is only 83%. The calculations revealed that properly increasing the flexible deformation of protein conformation can improve the accuracy of ligand spatial extension prediction. This may be particularly useful for predicting ligand-binding regions within relatively large binding pockets.

Table 3 Comarison of MDPA predictions of human lysozyme (PDB code: 1LZS) undergoing different conformation changes by applying the lowest frequency normal mode motion with different amplitudes

Full size table

On the other hand, we also noticed that, in many cases, ligand binding might lead to protein conformation changes that are not well described by the low-frequency normal mode motions described above, such as induced-fit conformational changes that cause strong inelastic deformations. In this case, it might be difficult to make accurate LBS predictions using the apo protein structures. Therefore, comparing LBS predictions based on apo structures with those based on complex structures may be another topic worthy of further study.

Conclusion

In this study, a multilayer dynamic perturbation analysis method (MDPA) was developed to predict ligand-binding pockets in proteins. This structural-based method is a direct improvement on our previously established method. The method has been validated using a standard ligand-protein complex dataset, with an LBS prediction accuracy of at least 30% in 65% of cases and recall of at least 30% in 87% of cases. One of the key features of MDPA is that MDPA is designed to predict the spatial extension of ligands within the binding pocket. The predicted binding region overlapped with the natural ligand in 82% of the complexes in the test dataset, with a prediction accuracy of at least 30% in 87% of the cases and a recall of at least 30% in 95% of cases, indicating that MDPA can give a reasonable prediction for protein pickets.

The combination of MDPA with deep learning methods based on ligand-protein interaction big data (such as DeepbindPoc) provides a good starting point for MDPA to study further biomolecular functions, including the prediction of natural ligands and their involved physicochemical interactions.

Availability of data and materials

All data is provided; the MDPA program code is also uploaded to https://github.com/mingdengming/mdpa

References

Lins L, Thomas A, Brasseur R. Analysis of accessible surface of residues in proteins. Protein Sci. 2003;12(7):1406–17.
Article CAS PubMed PubMed Central Google Scholar
DeWitte RS, Ishchenko AV, Shakhnovich EI. SMoG: de novo design method based on simple, fast, and accurate free energy estimates. 2. Case studies in molecular design. J Am Chem Soc. 1997;119(20):4608–17.
Article CAS Google Scholar
Meyers J, Brown N, Blagg J. Mapping the 3D structures of small molecule binding sites. J Cheminformatics. 2016. https://doi.org/10.1186/s13321-016-0180-0.
Article Google Scholar
Monzon AM, et al. Conformational diversity analysis reveals three functional mechanisms in proteins. PLoS Comput Biol. 2017;13(2): e1005398.
Article PubMed PubMed Central Google Scholar
Shen Q, et al. Proteome-scale investigation of protein allosteric regulation perturbed by somatic mutations in 7000 cancer genomes. Am J Hum Genet. 2017;100(1):5–20.
Article CAS PubMed Google Scholar
Bhagavat R, et al. An augmented pocketome: detection and analysis of small-molecule binding pockets in proteins of known 3D structure. Structure. 2018;26(3):499-512 e2.
Article CAS PubMed Google Scholar
Sun J, Xia Y, Ming D. Whole-genome sequencing and bioinformatics analysis of apiotrichum mycotoxinivorans: Predicting putative zearalenone-degradation enzymes. Front Microbiol. 2020;11:1866.
Article PubMed PubMed Central Google Scholar
Rose PW, et al. The RCSB protein data bank: views of structural biology for basic and applied research and education. Nucleic Acids Res. 2015;43(D1):D345–56.
Article CAS PubMed Google Scholar
Ito J, et al. PoSSuM: a database of similar protein-ligand binding and putative pockets. Nucleic Acids Res. 2012;40(D1):D541–8.
Article CAS PubMed Google Scholar
Yang J, Roy A, Zhang Y. BioLiP: a semi-manually curated database for biologically relevant ligand-protein interactions. Nucleic Acids Res. 2013;41(D1):D1096–103.
Article CAS PubMed Google Scholar
Anand P, et al. PLIC: protein-ligand interaction clusters. Database (Oxford), 2014. 2014(0): p. bau029
Desaphy J, et al. sc-PDB: a 3D-database of ligandable binding sites–10 years on. Nucleic Acids Res. 2015;43(D1):D399–404.
Article CAS PubMed Google Scholar
Liu Z, et al. PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics. 2015;31(3):405–12.
Article CAS PubMed Google Scholar
Naderi M, Govindaraj RG, Brylinski M. eModel-BDB: a database of comparative structure models of drug-target interactions from the binding database. Gigascience. 2018. https://doi.org/10.1093/gigascience/giy091.
Article PubMed PubMed Central Google Scholar
Smith RD, et al. Updates to binding MOAD (mother of all databases): polypharmacology tools and their utility in drug repurposing. J Mol Biol. 2019;431(13):2423–33.
Article CAS PubMed PubMed Central Google Scholar
Vajda S, Guarnieri F. Characterization of protein-ligand interaction sites using experimental and computational methods. Curr Opin Drug Discov Devel. 2006;9(3):354–62.
CAS PubMed Google Scholar
Baek M, et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021. https://doi.org/10.1126/science.abj8754.
Article PubMed PubMed Central Google Scholar
Jumper J, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021. https://doi.org/10.1038/s41586-021-03819-2.
Article PubMed PubMed Central Google Scholar
Krivak R, Hoksza D. P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure. J Cheminform. 2018;10(1):39.
Article PubMed PubMed Central Google Scholar
Zhao J, Cao Y, Zhang L. Exploring the computational methods for protein-ligand binding site prediction. Comput Struct Biotechnol J. 2020;18:417–26.
Article CAS PubMed PubMed Central Google Scholar
Levitt DG, Banaszak LJ. POCKET: a computer graphics method for identifying and displaying protein cavities and their surrounding amino acids. J Mol Graph. 1992;10(4):229–34.
Article CAS PubMed Google Scholar
Hendlich M, Rippmann F, Barnickel G. LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. J Mol Graph Model. 1997;15(6):359–63.
Article CAS PubMed Google Scholar
Laskowski RA. SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J Mol Graph. 1995;13(5):323–30.
Article CAS PubMed Google Scholar
Brady GP, Stouten PFW. Fast prediction and visualization of protein binding pockets with PASS. J Comput Aided Mol Des. 2000;14(4):383–401.
Article CAS PubMed Google Scholar
Le Guilloux V, Schmidtke P, Tuffery P. Fpocket: an open source platform for ligand pocket detection. BMC Bioinformatics. 2009;10:168.
Article PubMed PubMed Central Google Scholar
Laurie AT, Jackson RM. Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bioinformatics. 2005;21(9):1908–16.
Article CAS PubMed Google Scholar
Ghersi D, Sanchez R. EasyMIFS and SiteHound: a toolkit for the identification of ligand-binding sites in protein structures. Bioinformatics. 2009;25(23):3185–6.
Article CAS PubMed PubMed Central Google Scholar
Zhang Z, et al. Identification of cavities on protein surface using multiple computational approaches for drug binding site prediction. Bioinformatics. 2011;27(15):2083–8.
Article CAS PubMed Google Scholar
Yang J, Roy A, Zhang Y. Protein-ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment. Bioinformatics. 2013;29(20):2588–95.
Article CAS PubMed PubMed Central Google Scholar
Lee HS, Im W. Ligand binding site detection by local structure alignment and its performance complementarity. J Chem Inf Model. 2013;53(9):2462–70.
Article CAS PubMed Google Scholar
le Hung V, et al. LIBRA: ligand binding site recognition application. Bioinformatics. 2015;31(24):4020–2.
CAS PubMed Google Scholar
Gao J, et al. bSiteFinder, an improved protein-binding sites prediction server based on structural alignment: more accurate and less time-consuming. J Cheminform. 2016;8:38.
Article PubMed PubMed Central Google Scholar
Brylinski M. Local alignment of ligand binding sites in proteins for polypharmacology and drug repositioning. Methods Mol Biol. 2017;1611:109–22.
Article CAS PubMed PubMed Central Google Scholar
Gallo Cassarino T, Bordoli L, Schwede T. Assessment of ligand binding site predictions in CASP10. Proteins. 2014;82(Suppl 2):154–63.
Article CAS PubMed Google Scholar
Caffrey DR, et al. Are protein-protein interfaces more conserved in sequence than the rest of the protein surface? Protein Sci. 2004;13(1):190–202.
Article CAS PubMed PubMed Central Google Scholar
Gutteridge A, Bartlett GJ, Thornton JM. Using a neural network and spatial clustering to predict the location of active sites in enzymes. J Mol Biol. 2003;330(4):719–34.
Article CAS PubMed Google Scholar
Ofran Y, Rost B. ISIS: interaction sites identified from sequence. Bioinformatics. 2007;23(2):e13–6.
Article CAS PubMed Google Scholar
Kauffman C, Karypis G. LIBRUS: combined machine learning and homology information for sequence-based ligand-binding residue prediction. Bioinformatics. 2009;25(23):3099–107.
Article CAS PubMed PubMed Central Google Scholar
Qiu ZJ, Wang XC. Improved prediction of protein ligand-binding sites using random forests. Protein Pept Lett. 2011;18(12):1212–8.
Article CAS PubMed Google Scholar
Chen P, Huang JHZ, Gao X. LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone. Bmc Bioinform. 2014. https://doi.org/10.1186/1471-2105-15-S15-S4.
Article Google Scholar
Shrihari S, Pinak C. Prediction of active site cleft using support vector machines. J Chem Inf Model. 2010;50(12):2266–73.
Article Google Scholar
Jimenez J, et al. DeepSite: protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics. 2017;33(19):3036–42.
Article CAS PubMed Google Scholar
Pu LM, et al. DeepDrug3D: classification of ligand-binding pockets in proteins with a convolutional neural network. Plos Comput Biol. 2019;15(2):e1006718.
Article CAS PubMed PubMed Central Google Scholar
Zhang HP, et al. DeepBindPoc: a deep learning method to rank ligand binding pockets using molecular vector representation. Peerj. 2020;8:e8864–e8864.
Article PubMed PubMed Central Google Scholar
Simonovsky M, Meyers J. deeplytough: learning structural comparison of protein binding sites. J Chem Inf Model. 2020;60(4):2356–66.
Article CAS PubMed Google Scholar
Ming D, Wall ME. Interactions in native binding sites cause a large change in protein dynamics. J Mol Biol. 2006;358(1):213–23.
Article CAS PubMed Google Scholar
Ming D, Cohn JD, Wall ME. Fast dynamics perturbation analysis for prediction of protein functional sites. BMC Struct Biol. 2008;8:5.
Article PubMed PubMed Central Google Scholar
Sanner MF, Olson AJ, Spehner JC. Reduced surface: an efficient way to compute molecular surfaces. Biopolymers. 1996;38(3):305–20.
Article CAS PubMed Google Scholar
Jones G, et al. Development and validation of a genetic algorithm for flexible docking. J Mol Biol. 1997;267(3):727–48.
Article CAS PubMed Google Scholar
Feinstein WP, Brylinski M. Calculating an optimal box size for ligand docking and virtual screening against experimental and predicted binding pockets. J Cheminform. 2015;7:18.
Article PubMed PubMed Central Google Scholar
Hocker HJ, Rambahal N, Gorfe AA. LIBSA–a method for the determination of ligand-binding preference to allosteric sites on receptor ensembles. J Chem Inf Model. 2014;54(2):530–8.
Article CAS PubMed PubMed Central Google Scholar
Heo L, et al. GalaxySite: ligand-binding-site prediction by using molecular docking. Nucleic Acids Res. 2014;42(W1):W210–4.
Article CAS PubMed PubMed Central Google Scholar
Yang Y, Qian J, Ming D. Docking polysaccharide to proteins that have a Tryptophan box in the binding pocket. Carbohydr Res. 2015;414:78–84.
Article CAS PubMed Google Scholar
Sayama M, et al. Probing the hydrophobic binding pocket of G-protein-coupled lysophosphatidylserine receptor GPR34/LPS1 by docking-aided structure-activity analysis. J Med Chem. 2017;60(14):6384–99.
Article CAS PubMed Google Scholar
Desaphy J, et al. Encoding protein-ligand interaction patterns in fingerprints and graphs. J Chem Inf Model. 2013;53(3):623–37.
Article CAS PubMed Google Scholar
Hartshorn MJ, et al. Diverse, high-quality test set for the validation of protein-ligand docking performance. J Med Chem. 2007;50(4):726–41.
Article CAS PubMed Google Scholar
Marchand JR, et al. CAVIAR: a method for automatic cavity detection, description and decomposition into subcavities. J Comput Aided Mol Des. 2021;35(6):737–50.
Article CAS PubMed Google Scholar
Morris GM, et al. AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. J Comput Chem. 2009;30(16):2785–91.
Article CAS PubMed PubMed Central Google Scholar
Eberhardt J, et al. AutoDock Vina 1.2.0: new docking methods, expanded force field, and python bindings. J Chem Inf Model. 2021;61(8):3891–8.
Article CAS PubMed Google Scholar
Ma J. Usefulness and limitations of normal mode analysis in modeling dynamics of biomolecular complexes. Structure. 2005;13(3):373–80.
Article CAS PubMed Google Scholar
Ming D, Wall ME. Allostery in a coarse-grained model of protein dynamics. Phys Rev Lett. 2005;95(19): 198103.
Article PubMed Google Scholar
Atilgan AR, et al. Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys J. 2001;80(1):505–15.
Article CAS PubMed PubMed Central Google Scholar
Tirion MM. Large amplitude elastic motions in proteins from a single-parameter atomic analysis. Phys Rev Lett. 1996;77(9):1905–8.
Article CAS PubMed Google Scholar
Ankerst M, Breunig MM, Kriegel HP, Sander J. OPTICS: ordering points to identify the clustering structur. In: Proceedings of the ACM SIGMON International Conference on Management of Data. 1999; 28: 49–60
Harata K, Muraki M. X-ray structure of turkey-egg lysozyme complex with tri-N-acetylchitotriose. Lack of binding ability at subsite A. Acta Crystallogr D Biol Crystallogr. 1997;53(6):650–7.
Article CAS PubMed Google Scholar
McHaourab HS, et al. Conformation of T4 lysozyme in solution. Hinge-bending motion and the substrate-induced conformational transition studied by site-directed spin labeling. Biochemistry. 1997;36(2):307–16.
Article CAS PubMed Google Scholar
Goto NK, et al. What is the average conformation of bacteriophage T4 lysozyme in solution? a domain orientation study using dipolar couplings measured by solution NMR11Edited by P E Wright. J Mole Biol. 2001;308(4):745–64.
Article CAS Google Scholar
Zhang L, et al. Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved & #x3b1;-ketoamide inhibitors. Science. 2020;368(6489):409–12.
Article CAS PubMed PubMed Central Google Scholar
Sabbah DA, et al. An updated review on SARS-CoV-2 main proteinase (M(Pro)): protein structure and small-molecule inhibitors. Curr Top Med Chem. 2021;21(6):442–60.
Article CAS PubMed Google Scholar
Wendt MD, et al. Identification of novel binding interactions in the development of potent, selective 2-naphthamidine inhibitors of urokinase synthesis, structural analysis, and SAR of N-phenyl amide 6-substitution. J Med Chem. 2004;47(2):303–24.
Article CAS PubMed Google Scholar
Oefner C, et al. Structural analysis of neprilysin with various specific and potent inhibitors. Acta Crystallogr D Biol Crystallogr. 2004;60(Pt 2):392–6.
Article PubMed Google Scholar
Saberi Fathi SM, Tuszynski JA. A simple method for finding a protein’s ligand-binding pockets. BMC Struct Biol. 2014;14:18.
Article PubMed PubMed Central Google Scholar
Pu L, et al. DeepDrug3D: classification of ligand-binding pockets in proteins with a convolutional neural network. PLoS Comput Biol. 2019;15(2): e1006718.
Article CAS PubMed PubMed Central Google Scholar
Aggarwal R, et al. deeppocket: ligand binding site detection and segmentation using 3D convolutional neural networks. J Chem Inf Model. 2021
Kandel J, Tayara H, Chong KT. PUResNet: prediction of protein-ligand binding sites using deep residual neural network. J Cheminform. 2021;13(1):65.
Article CAS PubMed PubMed Central Google Scholar
Zhang H, et al. DeepBindPoc: a deep learning method to rank ligand binding pockets using molecular vector representation. PeerJ. 2020;8: e8864.
Article PubMed PubMed Central Google Scholar
Smith BJ, et al. Structural studies of the resistance of influenza virus neuramindase to inhibitors. J Med Chem. 2002;45(11):2207–12.
Article CAS PubMed Google Scholar
Shin WH, Seok C. GalaxyDock: protein-ligand docking with flexible protein side-chains. J Chem Inf Model. 2012;52(12):3225–32.
Article CAS PubMed Google Scholar
Ollikainen N, de Jong RM, Kortemme T. Coupling protein side-chain and backbone flexibility improves the re-design of protein-ligand specificity. PLoS Comput Biol. 2015;11(9): e1004335.
Article PubMed PubMed Central Google Scholar
Song H, et al. Structural changes of active site cleft and different saccharide binding modes in human lysozyme co-crystallized with hexa-N-acetyl-chitohexaose at pH 4.0. J Mol Biol. 1994;244(5):522–40.
Article CAS PubMed Google Scholar
Laskowski RA, et al. PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr. 1993;26(2):283–91.
Article CAS Google Scholar

Download references

Acknowledgements

We are grateful to the High-Performance Computing Center of Nanjing Tech University for supporting the computational resources.

Funding

This work was supported, in part, by the National Key Research and Development Program of China, 2019YFA0905700, 2021YFC2102700.

Author information

Authors and Affiliations

College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Biotech Building Room B1-404, 30 South Puzhu Road, Jiangbei New District, Nanjing City, 211816, Jiangsu, People’s Republic of China
Lin Gu, Bin Li & Dengming Ming

Authors

Lin Gu
View author publications
You can also search for this author in PubMed Google Scholar
Bin Li
View author publications
You can also search for this author in PubMed Google Scholar
Dengming Ming
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

D.M. designed the study and made the funding acquisition. L.G. performed MDPA calculations and validations. B.L. did the DBP calculation and validations. D.M and L.G. wrote the main manuscript. L.G. prepared figures and tables. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Dengming Ming.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1.

Prediction results of DPA for protein structures in CCDC/Asterx dataset; Table S2. Prediction results of FPOCKET for protein structures in CCDC/Asterx dataset; Table S3. Prediction results of CAVIAR for protein structures in CCDC/Asterx dataset; Table S4. Selection of optimal pockets for protein structures in CCDC/Asterx dataset by combining DeepbindPoc; Figure S1. MDPA calculation results of SARS-COV-2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Gu, L., Li, B. & Ming, D. A multilayer dynamic perturbation analysis method for predicting ligand–protein interactions. BMC Bioinformatics 23, 456 (2022). https://doi.org/10.1186/s12859-022-04995-2

Download citation

Received: 09 May 2022
Accepted: 19 October 2022
Published: 02 November 2022
DOI: https://doi.org/10.1186/s12859-022-04995-2

A multilayer dynamic perturbation analysis method for predicting ligand–protein interactions

Abstract

Background

Results

Conclusions

Introduction

Methods

Implementation of MDPA

Validation dataset

Validation of prediction LBSs and ligand spatial extension inside the binding pocket

Compared with FPOCKET method

Results

MDPA detects regions causing large perturbation to protein dynamics

Evaluation of MDPA for prediction of ligand-binding sites

Evaluation of MDPA for prediction of ligand spatial extension within the pocket

Discussions

Compared to FPOCKET and CAVIAR

Combining deep-learning models to improve the prediction accuracy of ligand binding pockets

Effects of protein conformation changes on protein pocket detection

Conclusion

Availability of data and materials

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher's Note

Supplementary Information

Additional file 1: Table S1.

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Bioinformatics

Contact us