- Open Access
Protein-ligand binding region prediction (PLB-SAVE) based on geometric features and CUDA acceleration
© Lo et al.; licensee BioMed Central Ltd. 2013
- Published: 8 March 2013
Protein-ligand interactions are key processes in triggering and controlling biological functions within cells. Prediction of protein binding regions on the protein surface assists in understanding the mechanisms and principles of molecular recognition. In silico geometrical shape analysis plays a primary step in analyzing the spatial characteristics of protein binding regions and facilitates applications of bioinformatics in drug discovery and design. Here, we describe the novel software, PLB-SAVE, which uses parallel processing technology and is ideally suited to extract the geometrical construct of solid angles from surface atoms. Representative clusters and corresponding anchors were identified from all surface elements and were assigned according to the ranking of their solid angles. In addition, cavity depth indicators were obtained by proportional transformation of solid angles and cavity volumes were calculated by scanning multiple directional vectors within each selected cavity. Both depth and volume characteristics were combined with various weighting coefficients to rank predicted potential binding regions.
Two test datasets from LigASite, each containing 388 bound and unbound structures, were used to predict binding regions using PLB-SAVE and two well-known prediction systems, SiteHound and MetaPocket2.0 (MPK2). PLB-SAVE outperformed the other programs with accuracy rates of 94.3% for unbound proteins and 95.5% for bound proteins via a tenfold cross-validation process. Additionally, because the parallel processing architecture was designed to enhance the computational efficiency, we obtained an average of 160-fold increase in computational time.
In silico binding region prediction is considered the initial stage in structure-based drug design. To improve the efficacy of biological experiments for drug development, we developed PLB-SAVE, which uses only geometrical features of proteins and achieves a good overall performance for protein-ligand binding region prediction. Based on the same approach and rationale, this method can also be applied to predict carbohydrate-antibody interactions for further design and development of carbohydrate-based vaccines. PLB-SAVE is available at http://save.cs.ntou.edu.tw.
- Solid Angle
- Binding Region
- Query Protein
- Compute Unify Device Architecture
- Depth Indicator
The study of protein binding site prediction assists in understanding the mechanisms and principles of molecular recognition, provides information for drug design and vaccine development, and enables more detailed annotation of function in protein databases and in the construction of visual displays of protein-protein interaction networks [1, 2]. In recent years, various in silico methods for prediction of protein-protein and protein-ligand binding sites have been developed , but as the number of known protein structures and protein-complex structures has grown exponentially in the last decade, a fast and effective algorithm to identify binding regions of a protein is still urgently needed. An especially important application is carbohydrate vaccine development. This has gained much attention in recent years as a new strategy against pathogen infection and cancers, and the prediction of binding pockets between a glycan and antibody could be very valuable in the development of carbohydrate-based therapeutics . The binding affinity of a carbohydrate-based antibody is normally weaker than that of a protein-based antibody. A tool for predicting properties of carbohydrate binding sites could therefore provide sufficient information for the development of carbohydrate-based vaccines. Historically, several different approaches based on geometric characteristics, physicochemical properties, or combinations of these have been used to predict regions of protein interaction. For example, an algorithm using surface complementarity, calculated from the Connolly surfaces and geometric characteristics of proteins, has been used to model protein-protein interactions , and physical shape characteristics are frequently used to analyze and identify surface interfaces such as accessible surface areas [5, 6], sequence conservation [7, 8], and amino acid composition . In addition, a number of different approaches have used Fourier-based concepts, transforming a three-dimensional grid onto a set of orthogonal basis functions, and calculating overlapping areas using Fast Fourier Transform techniques [10–12]. Another approach is to consider the physicochemical properties of interface residues using statistical methods to predict binding sites. For example, aliphatic and aromatic residues are found at interface regions at a higher frequency compared with charged residues, and several methods have exploited this observation by examining the specific composition of amino acids in surface regions to predict binding sites [13–15]. Although most previous methods for predicting protein binding regions have adopted similar approaches for analyzing protein-protein interfaces and protein-ligand binding regions, these two major types of binding exhibit different characteristics such as binding architecture and binding region size . Here, we designed an improved prediction system for protein-ligand binding, in which the query proteins are assumed to be rigid and their geometric characteristics such as solid angle, cavity depth, and volume are considered. In keeping with most existing algorithms, we also used shape complementary as the primary filter to rank all potential binding regions. In addition, we considered a grid-based construction of structure for surface residue identification and used parallel processing mechanisms for more efficient computation on geometric features. Thus, irregularly shaped cavities and pockets on the protein surface can be efficiently identified and placed in a rank order of potential protein-ligand binding regions.
In our approach, we used the concept of the solid angle and its associated features as the main geometric attributes for analysis of protein-ligand binding potential. Connolly proposed the solid angle approach to examine protein surface binding characteristics such that if two three-dimensional shapes fit together, then the sum of their two solid angles equals 4π in three-dimensional space . There are two main methods for computing solid angles: the first approach uses the Gauss-Bonnet theorem to find solid angles subtended by surface regions; the second approach calculates the steradian formed by a virtual sphere on the protein surface, and then divides this by the square of the radius of the virtual sphere. Both methods calculate the solid angle of a specified surface region. Several researchers adopted the solid angle approach, and valuable results have been published in the fields of protein docking [18, 19] and structure alignment . Due to the huge number of atoms on a protein surface and the resulting demand on computational power and time for solid angle calculations, we used Compute Unified Device Architecture (CUDA) technology (NVIDIA Corporation, Santa Clara, CA) to enhance execution speed of the proposed algorithms. CUDA is a parallel computing architecture that utilizes graphics processing units (GPUs) for general-purpose computing. GPUs were originally employed to speed up graphics display and could quickly and easily generate multiple threads. In addition, floating point operations and memory bandwidth performance are much faster with GPUs than with central processing units (CPUs), as the multi-core architecture allows each thread to perform an identical computing task simultaneously . Since the introduction of CUDA in 2007, harnessing the power of the GPU has become easier, and recently, numerous GPU-based algorithms have been proposed in bioinformatics for sequence alignment [21–24], protein docking , surface area calculations [26, 27], molecular dynamic simulations , and in systems biology . Here, we use CUDA architecture to reduce computational time and develop an effective prediction system to identify binding regions by evaluating the geometric features of solid angle, depth, and volume of a cavity on a protein surface. Based on performance comparisons with other methods and validation of the predictions via experimental data, our algorithm, PLB-SAVE, is effective for detecting protein-ligand binding regions, and we believe it has considerable potential in drug and vaccine development.
Grid-based surface structure construction
The imported protein structure file, in PDB format , contains complete spatial coordinate information obtained by X-ray crystallography, NMR spectroscopy, cryo-electron microscopy, or in silico prediction methods. In this step, the coordinates of atoms and their corresponding van der Waals radii are transformed into corresponding volumetric pixels (voxels) within a grid structure. This facilitates rapid identification of protein surfaces and allows efficient calculation of solid angles for each atom. After discretization processes, the query protein is represented as a set of discrete voxels that are categorized as inside (buried) or outside (surface) portions of the query protein.
Solid angle computation
Identifying surface anchor residues and clustering
Because we are trying to identify binding cavities in the query protein, only those surface voxels possessing solid angles in the highest 20% were clustered into representative groups in order. Two surface voxels would be clustered into the same group if they are neighboring voxels located within a threshold distance of 20 Å and both voxels have high solid angles at a similar level. The surface voxel with the largest solid angle within the selected cluster is deemed the representative anchor for the group.
Geometric characteristics calculation
After the assignment of clustered groups and representative anchors, the algorithm calculates additional geometric characteristics for each group, including cavity depth and volume of the identified anchor regions. These selected characteristics are required to be rotation- and translation-invariant, and most importantly, must be feasible and efficient for protein-ligand binding analysis. The efficacious geometric characteristics are described below.
Cavity depth calculation
Cavity volume calculation
Binding region prediction
RV(v i ) is the ranked value for anchor voxel v i , CD(v i )avg is the value of average depth for v i , CDmax is the maximum depth of the query protein, CV(v i ) is the volume of v i , CVmax is the maximum volume of the query protein, and the sum of both weighting coefficients, and , is equal to 1.
Parallel computing architecture by CUDA
The CUDA Toolkit, version 4.0 (Nvidia Corporation) and Visual Studio 2010 (Microsoft Corporation, Redmond, WA) were used to implement PLB-SAVE on an Intel® Core™ i7-2600 Processor operating at 3.40 GHz, with a 16 GB DDR3 memory and a GeForce GTX 580 graphics card (Nvidia Corporation) using the Microsoft Windows 7 operating system. In order to compare performance, PLB-SAVE was implemented onto two platforms: one with CPU architecture alone, and another with CUDA-computing architecture. Two datasets contain various sizes of proteins will be evaluated through two different computing architectures individually.
Experimental datasets and measurements
The protein structure datasets used for testing included two types of bound and unbound proteins, collected from LigASite version 9.5 (http://www.bigre.ulb.ac.be/Users/benoit/LigASite/index.php) . Each dataset contained 388 representative and non-redundant protein structures, and the binding sites of each protein were also provided for method validation. Five evaluation parameters were calculated to compare the performance with other prediction systems, including sensitivity, specificity, accuracy, positive predictive value (PPV), and Matthew's correlation coefficient (MCC). These parameters were calculated using the following equations:
where TP is the number of true binding sites correctly predicted by our system to be binding sites; FP is the number of non-binding sites incorrectly predicted to be binding sites; TN is the number of non-binding sites correctly predicted not binding sites; FN is the number of true binding sites incorrectly predicted as non-binding sites. In this study, if the top 1 to top 3 predicted binding regions are indeed located at the true binding pocket sites, the prediction is claimed as a successful trial and the numbers of predicted binding and non-binding sites will be applied to evaluate all measurements.
Performance of PLB-SAVE
Performance of PLB-SAVE evaluated under tenfold cross-validation.
To demonstrate the superior performance of PLB-SAVE, we compared the prediction results with two existing methods: SiteHound  and MetaPocket v2.0 (MPK2) . SiteHound identified ligand binding sites by computing the interactions between a chemical probe and a protein structure, and it used the profiles of the affinity map and total interaction energy to rank predicted binding sites. MPK2 integrated eight approaches including LIGSITECSC , PASS , QsiteFinder , SURFNET , Fpocket , GHECOM, ConCavity , and POCASA , and combined predicted pocket sites from eight methods through consensus pocket analysis to improve the prediction success rate.
Prediction results of PLB-SAVE, SiteHound, and MPK2 using the APO dataset (388 proteins).
PLB-SAVE (373 proteins)
SiteHound (373 proteins)
PLB-SAVE (342 proteins)
MPK2 (342 proteins)
Prediction results of PLB-SAVE, SiteHound and MPK2 using the HOLO dataset.
PLB-SAVE (374 proteins)
SiteHound (374 proteins)
PLB-SAVE (339 proteins)
MPK2 (339 proteins)
Computational performance by CUDA
The use of the geometric construction of solid angles in molecular modeling was originally proposed as early as 1986 by Connolly. It is powerful and is frequently applied to verify the uneven nature of binding surfaces in three-dimensional space. Here, we included consideration of two additional geometric features of the surface anchor residues--depth and volume of the potential cavities-based on their ranked solid angles. We developed an efficient and effective identification system for predicting protein-ligand binding regions using a novel approach based on the combinatorial capabilities of CUDA parallel processing technology. The designed program, PLB-SAVE, included algorithms for calculating solid angles, clustering processes, anchor determination, and derived geometric features. The protein-ligand binding regions identified by PLB-SAVE on protein surfaces were mostly found to have a concave structure based on previous observations. Thus, all possible interactively combined anchors from the query protein can be identified for the potential application of drug and vaccine design strategies. Binding sites between the antibody and antigen are crucial for the efficacy of the protective effect. Recently, carbohydrate-based vaccines have gained increasing attention due to the serotypes of various bacterial or viral strains. As well as the glycans exposed on the surface of cancer cells, carbohydrates have been developed as targets to be neutralized by an antibody or for inducing antibody-dependent cell-mediated cytotoxicity for cancer therapy . Carbohydrate-based vaccines are therefore expected to specifically protect hosts against the infection and eliminate cancer cells by immunotherapy. Thus, prediction of the ligand-binding site, such as a carbohydrate- or a glycan-binding site, would contribute considerably to the field of vaccine development. This research not only emphasizes accurate identification of protein-ligand binding regions, but also provides a practical example of use of the CUDA parallel computing architecture. Two test datasets, which included 388 unbound and bound proteins, were evaluated using our software, PLB-SAVE, and two other well-known programs, SiteHound and MPK2. The results show that our algorithm achieved an average accuracy rate of 95% for correctly identifying protein-ligand binding regions on two unbound and bound proteins, and performed an average of 160 times faster on these test datasets. PLB-SAVE can therefore be used as one of the first prediction tools for protein surface analysis and protein-ligand binding region detection for application in drug and vaccine development.
This work was supported by the Center of Excellence for Marine Bioenvironment and Biotechnology in National Taiwan Ocean University and National Science Council, Taiwan (Grant Nos. NSC 101-2321-B-019-001 and NSC 100-2627-B-019-006 to T.-W. Pai), and by an award from the Clinical Trial and Research Center of Excellence, Department of Health, Taiwan (Grants No. DOH101-TD-B-111-004).
The funding for publication of this article is provided by the Center of Excellence for Marine Bioenvironment and Biotechnology in National Taiwan Ocean University and National Science Council, Taiwan, R.O.C.
This article has been published as part of BMC Bioinformatics Volume 14 Supplement 4, 2013: Special Issue on Computational Vaccinology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S4
- Astronomo RD, Burton DR: Carbohydrate vaccines: developing sweet solutions to sticky situations?. Nat Rev Drug Discov. 2010, 9 (4): 308-324. 10.1038/nrd3012.View ArticlePubMedGoogle Scholar
- Bonetta L: Protein-protein interactions: Interactome under construction. Nature. 2010, 468 (7325): 851-854. 10.1038/468851a.View ArticlePubMedGoogle Scholar
- Leis S, Schneider S, Zacharias M: In silico prediction of binding sites on proteins. Curr Med Chem. 2010, 17 (15): 1550-1562. 10.2174/092986710790979944.View ArticlePubMedGoogle Scholar
- Walls PH, Sternberg MJ: New algorithm to model protein-protein recognition based on surface complementarity. Applications to antibody-antigen docking. J Mol Biol. 1992, 228 (1): 277-297. 10.1016/0022-2836(92)90506-F.View ArticlePubMedGoogle Scholar
- Chung JL, Wang W, Bourne PE: Exploiting sequence and structure homologs to identify protein-protein binding sites. Proteins. 2006, 62 (3): 630-640.View ArticlePubMedGoogle Scholar
- Bradford JR, Westhead DR: Improved prediction of protein-protein binding sites using a support vector machines approach. Bioinformatics. 2005, 21 (8): 1487-1494. 10.1093/bioinformatics/bti242.View ArticlePubMedGoogle Scholar
- Neuvirth H, Raz R, Schreiber G: ProMate: a structure based prediction program to identify the location of protein-protein binding sites. J Mol Biol. 2004, 338 (1): 181-199. 10.1016/j.jmb.2004.02.040.View ArticlePubMedGoogle Scholar
- Liang SD, Zhang C, Liu S, Zhou YQ: Protein binding site prediction using an empirical scoring function. Nucleic Acids Res. 2006, 34 (13): 3698-3707. 10.1093/nar/gkl454.PubMed CentralView ArticlePubMedGoogle Scholar
- Soga S, Shirai H, Kobori M, Hirayama N: Use of amino acid composition to predict ligand-binding sites. J Chem Inf Model. 2007, 47 (2): 400-406. 10.1021/ci6002202.View ArticlePubMedGoogle Scholar
- Chen R, Li L, Weng Z: ZDOCK: an initial-stage protein-docking algorithm. Proteins. 2003, 52 (1): 80-87. 10.1002/prot.10389.View ArticlePubMedGoogle Scholar
- Katchalski-Katzir E, Shariv I, Eisenstein M, Friesem AA, Aflalo C, Vakser IA: Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques. Proc Natl Acad Sci USA. 1992, 89 (6): 2195-2199. 10.1073/pnas.89.6.2195.PubMed CentralView ArticlePubMedGoogle Scholar
- Kozakov D, Brenke R, Comeau SR, Vajda S: PIPER: an FFT-based protein docking program with pairwise potentials. Proteins. 2006, 65 (2): 392-406. 10.1002/prot.21117.View ArticlePubMedGoogle Scholar
- Janin J, Seraphin B: Genome-wide studies of protein-protein interaction. Current opinion in structural biology. 2003, 13 (3): 383-388. 10.1016/S0959-440X(03)00064-2.View ArticlePubMedGoogle Scholar
- Bahadur RP, Chakrabarti P, Rodier F, Janin J: A dissection of specific and non-specific protein-protein interfaces. J Mol Biol. 2004, 336 (4): 943-955. 10.1016/j.jmb.2003.12.073.View ArticlePubMedGoogle Scholar
- Bahadur RP, Zacharias M: The interface of protein-protein complexes: analysis of contacts and prediction of interactions. Cellular and molecular life sciences: CMLS. 2008, 65 (7-8): 1059-1072. 10.1007/s00018-007-7451-x.View ArticlePubMedGoogle Scholar
- Burgoyne NJ, Jackson RM: Predicting protein interaction sites: binding hot-spots in protein-protein and protein-ligand interfaces. Bioinformatics. 2006, 22 (11): 1335-1342. 10.1093/bioinformatics/btl079.View ArticlePubMedGoogle Scholar
- Connolly ML: Measurement of Protein Surface Shape by Solid Angles. J Mol Graphics. 1986, 4 (1): 3-10.1016/0263-7855(86)80086-8.View ArticleGoogle Scholar
- Hendrix DK, Kuntz ID: Surface solid angle-based site points for molecular docking. Pac Symp Biocomput. 1998, 317-326.Google Scholar
- Shentu Z, Al Hasan M, Bystroff C, Zaki MJ: Context shapes: Efficient complementary shape matching for protein-protein docking. Proteins. 2008, 70 (3): 1056-1073.View ArticlePubMedGoogle Scholar
- Lanzavecchia S, Cantele F, Bellon PL: Alignment of 3D structures of macromolecular assemblies. Bioinformatics. 2001, 17 (1): 58-62. 10.1093/bioinformatics/17.1.58.View ArticlePubMedGoogle Scholar
- Vouzis PD, Sahinidis NV: GPU-BLAST: Using graphics processors to accelerate protein sequence alignment. Bioinformatics. 2011Google Scholar
- Manavski SA, Valle G: CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment. BMC Bioinformatics. 2008, 9 (Suppl 2): S10-10.1186/1471-2105-9-S2-S10.PubMed CentralView ArticlePubMedGoogle Scholar
- Schatz MC, Trapnell C, Delcher AL, Varshney A: High-throughput sequence alignment using Graphics Processing Units. BMC Bioinformatics. 2007, 8: 474-10.1186/1471-2105-8-474.PubMed CentralView ArticlePubMedGoogle Scholar
- Trapnell C, Schatz MC: Optimizing Data Intensive GPGPU Computations for DNA Sequence Alignment. Parallel Comput. 2009, 35 (8): 429-440. 10.1016/j.parco.2009.05.002.PubMed CentralView ArticlePubMedGoogle Scholar
- Ritchie DW, Venkatraman V: Ultra-fast FFT protein docking on graphics processors. Bioinformatics. 2010, 26 (19): 2398-2405. 10.1093/bioinformatics/btq444.View ArticlePubMedGoogle Scholar
- Dynerman D, Butzlaff E, Mitchell JC: CUSA and CUDE: GPU-accelerated methods for estimating solvent accessible surface area and desolvation. J Comput Biol. 2009, 16 (4): 523-537. 10.1089/cmb.2008.0157.View ArticlePubMedGoogle Scholar
- Juba D, Varshney A: Parallel, stochastic measurement of molecular surface area. J Mol Graph Model. 2008, 27 (1): 82-87. 10.1016/j.jmgm.2008.03.001.View ArticlePubMedGoogle Scholar
- Friedrichs MS, Eastman P, Vaidyanathan V, Houston M, Legrand S, Beberg AL, Ensign DL, Bruns CM, Pande VS: Accelerating molecular dynamic simulation on graphics processing units. J Comput Chem. 2009, 30 (6): 864-872. 10.1002/jcc.21209.PubMed CentralView ArticlePubMedGoogle Scholar
- Dematte L, Prandi D: GPU computing for systems biology. Brief Bioinform. 2010, 11 (3): 323-333. 10.1093/bib/bbq006.View ArticlePubMedGoogle Scholar
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res. 2000, 28 (1): 235-242. 10.1093/nar/28.1.235.PubMed CentralView ArticlePubMedGoogle Scholar
- Huang B, Schroeder M: LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation. BMC structural biology. 2006, 6: 19-10.1186/1472-6807-6-19.PubMed CentralView ArticlePubMedGoogle Scholar
- Dessailly BH, Lensink MF, Orengo CA, Wodak SJ: LigASite--a database of biologically relevant binding sites in proteins with known apo-structures. Nucleic Acids Res. 2008, 36 (Database): D667-673.PubMed CentralView ArticlePubMedGoogle Scholar
- Hernandez M, Ghersi D, Sanchez R: SITEHOUND-web: a server for ligand binding site identification in protein structures. Nucleic Acids Res. 2009, 37 (Web Server): W413-416. 10.1093/nar/gkp281.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang Z, Li Y, Lin B, Schroeder M, Huang B: Identification of cavities on protein surface using multiple computational approaches for drug binding site prediction. Bioinformatics. 2011, 27 (15): 2083-2088. 10.1093/bioinformatics/btr331.View ArticlePubMedGoogle Scholar
- Brady GP, Stouten PF: Fast prediction and visualization of protein binding pockets with PASS. Journal of computer-aided molecular design. 2000, 14 (4): 383-401. 10.1023/A:1008124202956.View ArticlePubMedGoogle Scholar
- Laurie AT, Jackson RM: Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bioinformatics. 2005, 21 (9): 1908-1916. 10.1093/bioinformatics/bti315.View ArticlePubMedGoogle Scholar
- Laskowski RA: SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. Journal of molecular graphics. 1995, 13 (5): 323-330. 10.1016/0263-7855(95)00073-9. 307-328View ArticlePubMedGoogle Scholar
- Le Guilloux V, Schmidtke P, Tuffery P: Fpocket: an open source platform for ligand pocket detection. BMC bioinformatics. 2009, 10: 168-10.1186/1471-2105-10-168.PubMed CentralView ArticlePubMedGoogle Scholar
- Kawabata T: Detection of multiscale pockets on protein surfaces using mathematical morphology. Proteins. 2010, 78 (5): 1195-1211. 10.1002/prot.22639.View ArticlePubMedGoogle Scholar
- Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA: Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput Biol. 2009, 5 (12): e1000585-10.1371/journal.pcbi.1000585.PubMed CentralView ArticlePubMedGoogle Scholar
- Yu J, Zhou Y, Tanaka I, Yao M: Roll: a new algorithm for the detection of protein pockets and cavities with a rolling probe sphere. Bioinformatics. 2010, 26 (1): 46-52. 10.1093/bioinformatics/btp599.View ArticlePubMedGoogle Scholar
- Matthay KK, George RE, Yu AL: Promising therapeutic targets in neuroblastoma. Clin Cancer Res. 2012, 18 (10): 2740-2753. 10.1158/1078-0432.CCR-11-1939.PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.