KVFinder: steered identification of protein cavities as a PyMOL plugin
- Saulo HP Oliveira†2, 3,
- Felipe AN Ferraz†1, 2,
- Rodrigo V Honorato1, 2,
- José Xavier-Neto1,
- Tiago JP Sobreira1 and
- Paulo SL de Oliveira1
© Oliveira et al.; licensee BioMed Central Ltd. 2014
Received: 20 February 2014
Accepted: 9 June 2014
Published: 17 June 2014
The characterization of protein binding sites is a major challenge in computational biology. Proteins interact with a wide variety of molecules, and understanding these complex interactions is essential for gaining deeper knowledge of protein function. Shape complementarity is known to be important in determining protein-ligand interactions. Furthermore, such structural features of binding sites have been shown to assist medicinal chemists during lead discovery and optimization.
We developed KVFinder, a highly versatile and easy-to-use tool for cavity prospection and spatial characterization. KVFinder is a geometry-based method with an innovative customization of the search space. This feature makes cavity segmentation possible, which, alongside a large set of customizable parameters, allows detailed cavity analyses. Although the main focus of KVFinder is the steered prospection of cavities, we tested it against a benchmark dataset of 198 known drug targets in order to validate our software and compare it with some widely accepted methods. Using the one-click mode, KVFinder performed better than most of the other methods, trailing only hybrid prospection methods. When using just one of KVFinder's customizable features, we were able to outperform all other compared methods. KVFinder is also user-friendly, being available both as a PyMOL plugin and as a command-line version.
KVFinder presents novel usability features, providing fully customizable and highly detailed cavity prospection on proteins alongside a friendly graphical interface. KVFinder is freely available at http://lnbio.cnpem.br/bioinformatics/main/software/.
Keywords: KVFinder; Protein cavities; Volume calculation; PyMOL plugin
Proteins perform their biological functions mainly by interacting with other molecules, ranging from small ions to macromolecules such as proteins and nucleic acids. Experience demonstrates that interactions between protein binding sites and their ligands depend on the physical properties displayed at contact interfaces. These interactions are highly specific and vary across protein domain and ligand classes, thus restricting the efficient interaction of a given protein to a few types of ligands. Such strong specificity results from a high level of spatial complementarity between binding sites and ligands.
The development of computational methods to predict and characterize binding sites in proteins has been an active research theme, as shown by the large number of theoretical methods developed for this purpose. The published algorithms fall into three distinct categories: those based on geometry, on energy, or on evolutionary principles. Geometry-based methods locate cavities by analyzing the molecular surface, generally using a 3D grid, spheres or tessellation techniques, and make up the majority of available software. Examples include LIGSITE, CAST, SURFNET, PASS, SCREEN, POCASA, PocketPicker, Fpocket, POCKET, CavitySearch, DoGSite, TRAPP, PHECON and VOIDOO.
Energy-based methods identify cavities by analyzing the energetic interaction between the target protein and a probe, usually a chemical group. Examples of this approach are GRID, CS-Map, DrugSite, Q-SiteFinder and PocketFinder. Methods based on evolutionary principles rely on searching for conserved residues in sequence alignments and on information from known active-site profiles. Examples of algorithms using this approach are ConSurf and Rate4Site. Meta-servers that combine more than one approach have also been published, such as MetaPocket, FINDSITE and LIGSITEcsc.
Geometry-based methods present some advantages over other approaches. In contrast to evolutionary approaches, they do not rely on prior knowledge and are therefore independent of the number of available sequences. They are also more straightforward than energy-based methods, which depend heavily on force field parameterization and scoring functions.
Here we introduce KVFinder, a geometrical grid-based method able to identify and analyze different kinds of protein cavities, including pockets, tunnels and shallow crevices. It offers several distinctive capabilities, such as search space segmentation, a user-friendly interface and fully customizable parameters.
Parameter customization is designed to address some of the major limitations of geometry-based methods. Grid-based methods are sensitive to grid spacing; in KVFinder, spacing is a user-defined parameter which, combined with the space segmentation capability, makes it possible to generate fine, high-resolution representations of cavities. Another common problem in geometry-based methods is defining the cavity ceiling, which our method controls directly through customizable probe sizes. KVFinder's space segmentation capability opens multiple possibilities for cavity analysis, because it allows relevant subpockets to be studied and characterized individually, e.g., subsites in enzymes, or the joint cofactor and substrate binding site of protein kinases.
Finally, a special concern in this project is usability, as we noted that many of the available methods fall short in this respect. KVFinder has basic and advanced usage modes, and is available not only on the command line (better suited for quick or high-throughput analyses), but also as a PyMOL plugin with a user-friendly GUI for Linux and Windows.
Once cavity points are defined on the grid, the next step separates them into distinct cavities. Cavity points are considered to belong to the same cavity only when they are spatially connected. For this step, we use a recursive implementation of the depth-first search (DFS) algorithm. Treating each cavity point as a node, we define an accessible node as any cavity point on the same row, column or diagonal as the current node, with a grid coordinate difference no greater than one. A new search is started from every cavity point that has not yet been assigned to a cavity. Every point visited recursively during a search is marked with the same label, identifying it as part of the same cavity.
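The connectivity rule above can be sketched in a few lines. This is a minimal Python illustration, not KVFinder's own code: it assumes cavity points are given as integer grid indices `(i, j, k)`, and it replaces recursion with an explicit stack, an iterative depth-first traversal that yields the same partition into cavities without hitting a recursion-depth limit on large cavities. The name `label_cavities` is hypothetical.

```python
import itertools

# The 26 neighbours of a grid point: every offset in {-1, 0, 1}^3 except (0, 0, 0),
# i.e. points on the same row, column or diagonal at most one grid step away.
NEIGHBOUR_OFFSETS = [d for d in itertools.product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def label_cavities(cavity_points):
    """Map each cavity grid point (i, j, k) to an integer cavity label (1, 2, ...)."""
    points = set(cavity_points)
    labels = {}
    next_label = 1
    for seed in sorted(points):          # sorted only to make label numbers reproducible
        if seed in labels:               # already reached from an earlier seed
            continue
        labels[seed] = next_label
        stack = [seed]                   # explicit stack: iterative depth-first search
        while stack:
            i, j, k = stack.pop()
            for di, dj, dk in NEIGHBOUR_OFFSETS:
                nb = (i + di, j + dj, k + dk)
                if nb in points and nb not in labels:
                    labels[nb] = next_label
                    stack.append(nb)
        next_label += 1
    return labels
```

For example, `label_cavities({(0, 0, 0), (1, 1, 1), (5, 5, 5)})` assigns label 1 to the first two points, which touch diagonally, and label 2 to the isolated point.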
For each cavity found, KVFinder performs a spatial characterization based on its grid points. Cavity volume is calculated as the sum of the grid-sized cubes (voxels) that make up the cavity. Surface area is computed by summing the grid-sized squares formed by grid points on the cavity surface. KVFinder also provides a user-defined volume threshold, which discards any cavity smaller than a given volume.
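A minimal sketch of these measurements, assuming cavity points are integer grid indices and `spacing` is the grid spacing in Å. The volume follows the text directly (one cube of side `spacing` per point). For the surface area, the text only says that grid-sized squares on the cavity surface are summed; counting voxel faces that border non-cavity space is our assumption of one way to do this, and KVFinder's exact rule may differ. The function names and the `volume_cutoff` parameter are hypothetical.

```python
from collections import defaultdict

# Offsets to the six face-sharing neighbours of a voxel.
FACE_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def cavity_volume(points, spacing):
    """Volume in Å^3: one cube of side `spacing` per cavity grid point."""
    return len(points) * spacing ** 3


def cavity_area(points, spacing):
    """Area in Å^2 (assumed rule): one square of side `spacing` per voxel
    face that borders a non-cavity voxel. `points` should be a set."""
    exposed = sum(
        1
        for i, j, k in points
        for di, dj, dk in FACE_OFFSETS
        if (i + di, j + dj, k + dk) not in points
    )
    return exposed * spacing ** 2


def characterize(labels, spacing, volume_cutoff=0.0):
    """Group points by cavity label and keep cavities whose volume reaches the cutoff."""
    groups = defaultdict(set)
    for point, label in labels.items():
        groups[label].add(point)
    results = {}
    for label, points in groups.items():
        volume = cavity_volume(points, spacing)
        if volume >= volume_cutoff:      # smaller cavities are suppressed
            results[label] = {"volume": volume, "area": cavity_area(points, spacing)}
    return results
```

With a 1 Å spacing, two face-adjacent voxels give 2 Å³ of volume and 10 Å² of exposed area under this rule.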
Every grid-based method is sensitive to grid spacing. Denser grids give richer spatial representations, which are useful for detailed study of cavities, but at a higher computational cost. To address this trade-off, KVFinder makes grid spacing an input parameter, letting the user balance performance against precision.
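The cost side of this trade-off is easy to quantify. Assuming a rectangular search box sampled at every multiple of the spacing along each axis (a simplification; the helper `grid_point_count` is hypothetical), the number of grid points grows roughly with the inverse cube of the spacing, so halving the spacing multiplies the work by about eight.

```python
import math


def grid_point_count(box_lengths, spacing):
    """Number of grid points in a box with the given side lengths (Å),
    sampled every `spacing` Å along each axis."""
    count = 1
    for length in box_lengths:
        # floor(length / spacing) + 1 points per axis; the small epsilon guards
        # against round-off such as 1.2 / 0.4 == 2.9999999999999996
        count *= math.floor(length / spacing + 1e-9) + 1
    return count


for h in (1.0, 0.5, 0.25):
    print(h, grid_point_count((20.0, 20.0, 20.0), h))
```

For a 20 Å cube this gives 9,261 points at 1.0 Å, 68,921 at 0.5 Å and 531,441 at 0.25 Å, which is why restricting the search region matters when a fine spacing is chosen.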
Given KVFinder's definition of a cavity as the space between two molecular surfaces, each defined by a probe, it is no surprise that probe size has a major impact on the results. The Probe In is preset to 1.40 Å, the approximate radius of a water molecule, which defines a solvent-accessible surface. Therefore, the calculated cavity does not comprise every empty space within the protein, only the regions accessible to molecules of a given size. A different Probe In size can be used for solvents other than water. Choosing an optimal radius for the Probe Out, however, is not straightforward, as it may vary substantially with the characteristics of the analyzed protein. This probe defines a ceiling for the cavity: if the Probe Out is small enough to enter the cavity, the cavity will appear shallower, and depending on how deep the Probe Out can roll into it, the cavity may even disappear. Conversely, a very large Probe Out can demand more computational time.
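The extracted text does not spell out how the two probe surfaces are built, so the following is only a simplified sketch of the two-probe idea, not KVFinder's implementation. It assumes that atoms are given as `((x, y, z), vdW radius)` pairs; that a probe "fits" at a grid point when its sphere overlaps no atom; that the region a probe reaches is everything covered by the probe sphere at any such point (a morphological opening of the empty space); and that the cavity is what Probe In reaches but Probe Out does not. All function names are hypothetical, and the cubic grid is centered on the origin for simplicity.

```python
import itertools
import math


def to_xyz(index, spacing):
    """Cartesian coordinates (Å) of an integer grid index."""
    return tuple(c * spacing for c in index)


def probe_centers(grid, atoms, probe, spacing):
    """Grid points where a probe sphere fits without overlapping any atom."""
    return {
        p for p in grid
        if all(math.dist(to_xyz(p, spacing), xyz) >= vdw + probe for xyz, vdw in atoms)
    }


def ball_offsets(radius, spacing):
    """Integer offsets within `radius` Å of the origin, nearest first."""
    m = math.ceil(radius / spacing)
    span = range(-m, m + 1)
    offsets = [o for o in itertools.product(span, repeat=3)
               if math.hypot(*to_xyz(o, spacing)) <= radius]
    return sorted(offsets, key=lambda o: o[0] ** 2 + o[1] ** 2 + o[2] ** 2)


def swept_region(grid, centers, probe, spacing):
    """Grid points covered by the probe when placed at any of its accessible centers."""
    offsets = ball_offsets(probe, spacing)
    return {
        p for p in grid
        if any((p[0] + a, p[1] + b, p[2] + c) in centers for a, b, c in offsets)
    }


def two_probe_cavity(atoms, n=12, spacing=0.6, probe_in=1.4, probe_out=4.0):
    """Grid points reached by Probe In but not by Probe Out, on a cube of side
    2*n*spacing. Probe Out cannot be placed outside the grid, so points near
    the grid edge may be over-reported in this sketch."""
    span = range(-n, n + 1)
    grid = set(itertools.product(span, repeat=3))
    inside = swept_region(grid, probe_centers(grid, atoms, probe_in, spacing), probe_in, spacing)
    outside = swept_region(grid, probe_centers(grid, atoms, probe_out, spacing), probe_out, spacing)
    return inside - outside
```

With two 1.7 Å atoms at `(-2, 0, 0)` and `(2, 0, 0)`, the groove point at grid index `(0, 2, 0)` (1.2 Å from the line joining the atoms) is reachable by the 1.4 Å Probe In but not by a 4 Å Probe Out, so it is reported as cavity. Setting `probe_out` equal to `probe_in` leaves nothing, mirroring how a smaller Probe Out makes cavities shallower until they disappear.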
To establish an optimal value for the ceiling probe, we ran a simulation on 198 known drug targets, screening different probe sizes. Each run performed a cavity search in whole-protein mode, varying the Probe Out size from 2 Å to 8 Å in steps of 0.5 Å. A prediction was considered correct when the center of mass of the cavity lay within 4 Å of any ligand atom. Cavities were ranked by volume, and the top three cavities were analyzed. To evaluate KVFinder as a cavity detection tool, we used the same benchmark dataset and compared the results with those of other methods.
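The success criterion used in this benchmark can be written down directly. The sketch below assumes each cavity is a list of grid-point coordinates in Å and takes its center of mass as the unweighted mean of those points; since every point stands for the same voxel volume, ranking by volume is equivalent to ranking by point count. The function names and `PROBE_OUT_VALUES` are hypothetical.

```python
import math


def center_of_mass(points):
    """Unweighted mean of the cavity grid-point coordinates (Å)."""
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))


def is_correct_prediction(cavities, ligand_atoms, top_n=3, cutoff=4.0):
    """True if any of the `top_n` largest cavities has its center of mass
    within `cutoff` Å of at least one ligand atom."""
    ranked = sorted(cavities, key=len, reverse=True)[:top_n]
    return any(
        math.dist(center_of_mass(cavity), atom) <= cutoff
        for cavity in ranked
        for atom in ligand_atoms
    )


# Probe Out radii screened: 2.0, 2.5, ..., 8.0 Å (13 values).
PROBE_OUT_VALUES = [2.0 + 0.5 * i for i in range(13)]
```

Under these assumptions, the success rate for a given Probe Out size is the fraction of benchmark structures for which `is_correct_prediction` holds.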
A new feature introduced by KVFinder is space segmentation: the prospected region can be either user-defined or the whole protein. With this feature, the user can split a cavity into subpockets and obtain a separate spatial characterization for each part. By making search space restriction easy to use, KVFinder opens new possibilities for detailed cavity analysis. Detailed space definition can be a valuable asset in ligand binding studies, because it allows analysis of the space occupied by different parts of the ligand. Space segmentation also addresses the resolution sensitivity that affects all grid-based cavity search methods: by restricting the search to an area of interest, a higher-resolution representation can be obtained at a much lower computational cost. The search space is defined by an interactive box created in the PyMOL plugin.
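Splitting a cavity into subpockets with search boxes can be illustrated in a few lines. This sketch assumes axis-aligned boxes given by their minimum and maximum corners in Å and cavity points given as Å coordinates; it is not the plugin's box code, and `subpocket_volumes` and the box names are hypothetical.

```python
def in_box(point, box_min, box_max):
    """True if the point lies inside the axis-aligned box (bounds inclusive)."""
    return all(lo <= c <= hi for c, lo, hi in zip(point, box_min, box_max))


def subpocket_volumes(points, boxes, spacing):
    """Volume (Å^3) of the cavity points falling in each named box."""
    return {
        name: sum(1 for p in points if in_box(p, lo, hi)) * spacing ** 3
        for name, (lo, hi) in boxes.items()
    }
```

For instance, three points at 0.5 Å spacing split between a box `"A"` holding two of them and a box `"B"` holding one give volumes of 0.25 Å³ and 0.125 Å³.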
Results and discussion
The effect of the Probe Out size
Comparing KVFinder to other methods
Success rate of binding site prediction by different software for 198 known drug target structures
KVFinder provides an efficient geometrical characterization of protein cavities. In the blind site prospection test, it achieved a 76% success rate, outperforming the other methods compared. KVFinder's main focus, however, is its innovative steered search approach, which relies on a large set of customizable parameters to make complex and detailed cavity analyses possible. The user can split a cavity into subpockets, define the cavity ceiling, adapt the output to the protein topology and adjust the resolution of the spatial representation. All these features are accessible through an easy-to-use graphical interface. KVFinder is a powerful asset that provides the tools needed to gain a deeper understanding of protein cavities.
Availability and requirements
Project name: KVFinder
Project home page: http://lnbio.cnpem.br/bioinformatics/main/software/
Operating system(s): Ubuntu 12.04, Windows 7 SP 1
Programming language: C, C++, Python/Tk
Other requirements: PyMOL v1.4.1 on Linux, PyMOL v1.3 on Windows
Aldehyde dehydrogenase 1
Substrate entry channel
This work was supported by grants from São Paulo Research Foundation (2009/06433-0 and 2008-52695-4) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior.
- Bohacek RS, McMartin C: Modern computational chemistry and drug discovery: structure generating programs. Curr Opin Chem Biol. 1997, 1: 157-161. 10.1016/S1367-5931(97)80004-X.
- Henrich S, Salo-Ahen OM, Huang B, Rippmann FF, Cruciani G, Wade RC: Computational approaches to identifying and characterizing protein binding sites for ligand design. J Mol Recognit. 2010, 23: 209-219.
- Hendlich M, Rippmann F, Barnickel G: LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. J Mol Graph Model. 1997, 15: 359-363, 389. 10.1016/S1093-3263(98)00002-3.
- Liang J, Edelsbrunner H, Woodward C: Anatomy of protein pockets and cavities: measurement of binding site geometry and implications for ligand design. Protein Sci. 1998, 7: 1884-1897. 10.1002/pro.5560070905.
- Laskowski RA: SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J Mol Graph. 1995, 13: 323-330, 307-328. 10.1016/0263-7855(95)00073-9.
- Brady GP, Stouten PF: Fast prediction and visualization of protein binding pockets with PASS. J Comput Aided Mol Des. 2000, 14: 383-401. 10.1023/A:1008124202956.
- Nayal M, Honig B: On the nature of cavities on protein surfaces: application to the identification of drug-binding sites. Proteins. 2006, 63: 892-906. 10.1002/prot.20897.
- Yu J, Zhou Y, Tanaka I, Yao M: Roll: a new algorithm for the detection of protein pockets and cavities with a rolling probe sphere. Bioinformatics. 2010, 26: 46-52. 10.1093/bioinformatics/btp599.
- Weisel M, Proschak E, Schneider G: PocketPicker: analysis of ligand binding-sites with shape descriptors. Chem Cent J. 2007, 1: 7. 10.1186/1752-153X-1-7.
- Le Guilloux V, Schmidtke P, Tuffery P: Fpocket: an open source platform for ligand pocket detection. BMC Bioinformatics. 2009, 10: 168. 10.1186/1471-2105-10-168.
- Levitt DG, Banaszak LJ: POCKET: a computer graphics method for identifying and displaying protein cavities and their surrounding amino acids. J Mol Graph. 1992, 10: 229-234. 10.1016/0263-7855(92)80074-N.
- Ho CM, Marshall GR: Cavity search: an algorithm for the isolation and display of cavity-like binding regions. J Comput Aided Mol Des. 1990, 4: 337-354. 10.1007/BF00117400.
- Volkamer A, Griewel A, Grombacher T, Rarey M: Analyzing the topology of active sites: on the prediction of pockets and subpockets. J Chem Inf Model. 2010, 50: 2041-2052. 10.1021/ci100241y.
- Kokh DB, Richter S, Henrich S, Czodrowski P, Rippmann F, Wade RC: TRAPP: a tool for analysis of transient binding pockets in proteins. J Chem Inf Model. 2013, 53: 1235-1252. 10.1021/ci4000294.
- Kawabata T, Go N: Detection of pockets on protein surfaces using small and large probe spheres to find putative ligand binding sites. Proteins. 2007, 68: 516-529. 10.1002/prot.21283.
- Kleywegt GJ, Jones TA: Detection, delineation, measurement and display of cavities in macromolecular structures. Acta Crystallogr Sect D: Biol Crystallogr. 1994, 50: 178-185. 10.1107/S0907444993011333.
- Goodford PJ: A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J Med Chem. 1985, 28: 849-857. 10.1021/jm00145a002.
- Bradford JR, Westhead DR: Improved prediction of protein-protein binding sites using a support vector machines approach. Bioinformatics. 2005, 21: 1487-1494. 10.1093/bioinformatics/bti242.
- An J, Totrov M, Abagyan R: Comprehensive identification of "druggable" protein ligand binding sites. Genome Inform. 2004, 15: 31-41.
- Laurie AT, Jackson RM: Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bioinformatics. 2005, 21: 1908-1916. 10.1093/bioinformatics/bti315.
- An J, Totrov M, Abagyan R: Pocketome via comprehensive identification and classification of ligand binding envelopes. Mol Cell Proteomics. 2005, 4: 752-761. 10.1074/mcp.M400159-MCP200.
- Armon A, Graur D, Ben-Tal N: ConSurf: an algorithmic tool for the identification of functional regions in proteins by surface mapping of phylogenetic information. J Mol Biol. 2001, 307: 447-463. 10.1006/jmbi.2000.4474.
- Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N: Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics. 2002, 18 (Suppl 1): S71-S77. 10.1093/bioinformatics/18.suppl_1.S71.
- Huang B: MetaPocket: a meta approach to improve protein ligand binding site prediction. OMICS. 2009, 13: 325-330. 10.1089/omi.2009.0045.
- Brylinski M, Skolnick J: A threading-based method (FINDSITE) for ligand-binding site prediction and functional annotation. Proc Natl Acad Sci U S A. 2008, 105: 129-134. 10.1073/pnas.0707684105.
- Huang B, Schroeder M: LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation. BMC Struct Biol. 2006, 6: 19. 10.1186/1472-6807-6-19.
- Schrodinger LLC: The PyMOL Molecular Graphics System, Version 1.3r1. 2010.
- Masuya M, Doi J: Detection and geometric modeling of molecular surfaces and cavities using digital mathematical morphological operations. J Mol Graph. 1995, 13: 331-336. 10.1016/0263-7855(95)00071-2.
- Matheron G: Random Sets and Integral Geometry. 1975, New York: John Wiley & Sons.
- Serra J: Image Analysis and Mathematical Morphology. 1983, Orlando: Academic Press, Inc.
- Tarjan R: Depth-first search and linear graph algorithms. SIAM J Comput. 1972, 1: 146-160. 10.1137/0201010.
- Zhang Z, Li Y, Lin B, Schroeder M, Huang B: Identification of cavities on protein surface using multiple computational approaches for drug binding site prediction. Bioinformatics. 2011, 27: 2083-2088. 10.1093/bioinformatics/btr331.
- Kawabata T: Detection of multiscale pockets on protein surfaces using mathematical morphology. Proteins. 2010, 78: 1195-1211. 10.1002/prot.22639.
- Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA: Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput Biol. 2009, 5: e1000585. 10.1371/journal.pcbi.1000585.
- Sobreira TJ, Marletaz F, Simoes-Costa M, Schechtman D, Pereira AC, Brunet F, Sweeney S, Pani A, Aronowicz J, Lowe CJ, Davidson B, Laudet V, Bronner M, de Oliveira PS, Schubert M, Xavier-Neto J: Structural shifts of aldehyde dehydrogenase enzymes were instrumental for the early evolution of retinoid-dependent axial patterning in metazoans. Proc Natl Acad Sci U S A. 2011, 108: 226-231. 10.1073/pnas.1011223108.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.