A robust and efficient algorithm for the shape description of protein structures and its application in predicting ligand binding sites
© Xie and Bourne; licensee BioMed Central Ltd. 2007
Published: 22 May 2007
An accurate description of protein shape derived from protein structure is necessary to establish an understanding of protein-ligand interactions, which in turn will lead to improved methods for protein-ligand docking and binding site analysis. Most current shape descriptors characterize only the local properties of protein structure using an all-atom representation and are slow to compute. We need new shape descriptors that have the ability to capture both local and global structural information, are robust for application to models and low quality structures and are computationally efficient to permit high throughput analysis of protein structures.
We introduce a new shape description that requires only the Cα atoms to represent the protein structure, thus making it both fast and suitable for use on models and low quality structures. The notion of a geometric potential is introduced to quantitatively describe the shape of the structure. This geometric potential is dependent on both the global shape of the protein structure as well as the surrounding environment of each residue. When applying the geometric potential for binding site prediction, approximately 85% of known binding sites can be accurately identified with above 50% residue coverage and 80% specificity. Moreover, the algorithm is fast enough for proteome-scale applications. Proteins with fewer than 500 amino acids can be scanned in less than two seconds.
The reduced representation of the protein structure combined with the geometric potential provides a fast, quantitative description of protein-ligand binding sites with potential for use in large-scale predictions, comparisons and analysis.
The 3D structure of a protein is an essential component in elucidating biological functions at the molecular level. Protein-ligand binding sites and their interactions with binding partners provide strong correlations between structure and function and are thus critical for addressing a wide range of fundamental and practical problems in biology. Knowledge of protein-ligand binding sites provides not only critical clues in elucidating the relationships to evolution, structure and function, but also contributes to drug discovery. Knowledge of such sites may be used to identify and validate drug targets, prioritize and optimize drug leads, rationalize small molecule screening and docking, guide medicinal chemistry efforts and computationally evaluate ADME/Tox properties of preclinical drugs. To derive knowledge of the ligand binding site from the exponentially increasing amount of structural data, it is critical to develop a sensitive and robust algorithm that can identify and characterize the ligand binding sites of proteins on a proteome-wide scale.
Shape descriptors representing protein structure, such as depth [1, 2], surface curvature, extreme elevation, solid angle, surface area and volume, have been used extensively to identify, study and compare protein-ligand interactions, protein-protein interactions and the respective binding sites. For example, the extreme elevation approach is used for geometric alignment during protein docking. The match of small molecules to protein binding sites has been studied using the molecular shape complementarities of solid angles [5, 8]. Besides predictions of ligand orientations, one of the biggest challenges in any docking study is to obtain an accurate estimate of the binding affinity while including the intrinsic flexibility of the protein and the ligand. Soft docking provides a solution to these problems. The adaptive scoring function for soft docking requires a defined "hard" and "soft" interaction range between the protein and the ligand. Furthermore, the accuracy in estimating binding affinity can be dramatically improved with the docking score index (DSI) from multiple-ligand, multiple-protein docking. The use of a virtual ligand has been proposed to extend the DSI scheme for genome-wide high-throughput screening. The success of the proposed DSI method depends critically on the generation of the virtual ligand, which is a negative image of the ligand binding site. How to define such a virtual ligand, or equivalently the boundary of the ligand binding site, is still an open question. Geometry-based methods are very useful in detecting pockets and cavities within the protein structure [6, 12–17], and can be applied independently or combined with other evolutionary [18–21] or physics-based methods [22, 23]. Although these existing methods [2, 6, 12–17] can locate the binding pockets accurately, the definition of the pocket boundary remains rather poor.
This inaccurate description limits further application to protein-ligand docking and functional site comparison. Moreover, the geometrical measurement of pockets and cavities using shape descriptors such as volume and curvature alone is not a good indicator for distinguishing true binding pockets from false positives.
Nevertheless, geometric constraints have been used extensively to assess the similarity between functional sites. Most of these studies focus on the local shape of the protein using distance, curvature and side-chain orientation, all of which are sensitive to conformational changes of either the side chains or the backbone. To extend the scope of functional site comparison algorithms, it is necessary to develop topological and geometric invariants that are less sensitive to the flexibility and uncertainty inherent in the protein structure, yet still provide a useful metric. In summary, there are several drawbacks in using conventional shape descriptors for the protein structure when applied to ligand binding studies. First, most of these measurements capture only the local properties of the protein structure and do not distinguish between ligand binding and non-binding sites. Second, some shape descriptors require an all-atom representation of the protein structure, making the algorithm computationally intensive. Finally, existing algorithms are sensitive to conformational changes in the protein structure and are intolerant of the uncertainty inherent in homology models and low resolution structures. Given these shortcomings, we propose a new method for describing protein shape that is scalable to a large data set of proteins yet robust enough to handle the intrinsic flexibility of proteins. The method inherently provides the location and boundary of any binding pocket, thus offering a new approach to the study of protein-ligand interactions.
Details are given in the Methods section, but in summary the topological relationships among Cα atoms in the protein are established using Delaunay tessellation of these atoms in 3D space. From the Delaunay tessellation of the reduced representation of the protein structure, shape descriptors such as the direction of the Cα atom relative to the surface can be determined. The notion of a geometric potential is further introduced to quantitatively characterize the shape formed by the set of Cα atoms. The geometric potential is analogous to the hydrophobicity or electrostatic potential in that it depends on both the global shape of the protein structure and the surrounding environment of the residue. The geometric potential has been successfully applied in a new algorithm for ligand binding site comparison [L. Xie and P.E. Bourne, "Detecting evolutionary linkages across fold and functional space with sequence order independent profile-profile alignments," submitted]. Here we focus on a detailed description of the geometric potential algorithm and its application to predicting the ligand binding site.
Features of the geometric potential
Boundary sensitivity and specificity
Comparison to other algorithms
General concept of the geometric potential
The current implementation of the geometric potential is computationally straightforward and conceptually analogous to a residue's electrostatic potential or hydrophobicity, each of which depends on both the global shape of the protein structure and the surrounding environment of the residue. Suppose that the protein is an insulator with a homogeneous hydrophilic surface, surrounded by solvent with positive charge, and that each residue, represented by its Cα atom, carries a unit of negative charge. The electrostatic potential then essentially reflects the residue's geometrical characteristics at the surface and in the pocket or cavity. The electrostatic potential is most negative in a closed cavity, less negative in an open pocket, almost neutral on a flat surface, and most positive on a convex surface. Thus, the geometric potential can, in theory, be computed using rigorous energy-based methods such as Poisson-Boltzmann for the electrostatic free energy on a protein structure with a reduced representation of residues and charges. In other words, if non-specific interactions such as van der Waals interactions are used to predict ligand binding sites [23, 31], they provide almost identical information to that derived from geometric properties. However, the direct energy-based method usually requires an all-atom representation of the protein, and accurate estimation of the interaction energy is not trivial. Consequently, it is possible to integrate the geometric and topological properties characterized by the geometric potential with energy-based physical potentials into a unified framework to study protein-ligand binding.
Relationship to other algorithms
The convex hull and related α-shape algorithm (see Methods) have been applied by others to identify ligand binding pockets [6, 15]. Other approaches usually require an all-atom representation of the protein structure. Here, with one simple parameter, the radius of the circumscribed sphere from the Delaunay tessellation, a Cα atom representation is sufficient. As a result, the algorithm is theoretically two orders of magnitude faster than all-atom approaches because the time complexity of 3D Delaunay tessellation is O(n²), where n is the number of input points. Moreover, as shown, the algorithm is not sensitive to conformational changes of the protein, especially those of the side chains.
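As a rough check of the "two orders of magnitude" estimate, suppose an all-atom model contains about ten times as many points as the Cα model (an illustrative ratio assumed here, not a figure taken from this article) and that the quadratic bound dominates the running time T:

```latex
\[
\frac{T_{\text{all-atom}}}{T_{C\alpha}} \approx \frac{(10\,n)^2}{n^2} = 100
\]
```

Under that assumption the ratio is a factor of 100; a different atoms-per-residue ratio would change the estimate accordingly.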
The geometric potential provides a more robust quantitative measurement of the geometrical and topological properties of the pocket and cavity, taking into account both the global and local environment surrounding the amino acid residue, and hence offers advantages over more conventional measurements such as depth [1, 2], travel depth, surface curvature, surface area and volume. Recent studies have shown that depth is a more important attribute than curvature, volume and other shape descriptors in distinguishing druggable binding sites. However, the depth or the travel depth of an amino acid residue in a pocket cannot indicate whether the residue resides in a narrow or a wide open pocket of the same depth. Moreover, depth cannot distinguish flat from convex surfaces if they have the same depth. The geometric potential distinguishes these cases. The depth used in our studies to initialize the geometric potential is quite simple and not as accurate as the travel depth proposed by Coleman et al. However, it is straightforward to incorporate the travel depth concept into our approach by replacing the distance to the closest plane with the travel depth when computing the geometric potential. Other geometric or topological measurements that are able to distinguish ligand binding and non-binding sites, such as the distance to the centroid of the protein and closeness centrality, can also be used to initialize the geometric potential.
Limitations of the algorithm and future work
Ligand binding is primarily a physical phenomenon, depending on fundamental thermodynamics and kinetics. Therefore, the ligand binding site may be best studied with an energy-based method, such as protein-ligand docking [33, 34], grid potential mapping, or solvent/fragment mapping [35–37]. However, docking is not only computationally time consuming, but also inaccurate in estimating the binding energy. Geometric properties of the protein structure, such as pockets and cavities, provide rational constraints to address the docking problem. Another important constraint on the ligand binding site comes from evolution. The identification of conserved residues from sequence analysis significantly reduces the search space, thus aiding the location of the ligand binding site. Therefore, the identification of the location and boundary of the ligand binding site is best achieved by integrating protein features associated with geometry, evolution and energy. While not yet completed, the concept of the geometric potential as described here provides a quantitative framework to combine these sources of information.
The approach presented here is extensible and can be used to predict protein-protein binding sites. The majority of these sites are formed from relatively flat surfaces and thus their geometric potentials are less distinguishable than those of pockets and cavities. Solvent vectors have previously been used to define protein-protein interaction interfaces. The global direction of the residue from our algorithm provides a more robust method than the solvent vector to cluster the residues involved in protein-protein ligand binding and further define its boundary. Moreover, if the real hydrophobicity and/or the electrostatic energy are used as the initial value for the geometric potential, it is possible to quantitatively distinguish the protein-protein binding site.
We introduce a new efficient and robust algorithm that quantitatively characterizes the geometric properties of the protein structure. The geometric potential is dependent on both the global shape of the protein structure as well as the surrounding environment of the residue. In this sense it is analogous to the hydrophobicity or electrostatic potential. When applying the geometric potential to ligand binding site prediction, the top three predictions contain more than 75% of the known binding sites and provide at least 50% coverage of the ligand binding site residues. Approximately 85% of known ligand binding sites can be accurately identified with above 50% residue coverage and 80% specificity for all predicted binding sites. Moreover, the algorithm is fast enough for proteome-scale applications. Proteins with fewer than 500 amino acids can be scanned in less than two seconds. The algorithm provides a framework for integrating evolution, energy, and further geometry-based parameters to study protein-ligand interactions on a proteome-wide scale.
A data set of protein-ligand binding sites is built from protein chains in the Protein Data Bank (PDB) with known 3D structures and bound ligands. Only small organic molecules are considered as ligands; DNA, RNA, peptides and metals are excluded. The non-redundant set of protein chains is selected with sequence identities less than 90%. Only X-ray structures are included in our test data set. The final benchmark data set contains a total of 5263 enzyme and non-enzyme polypeptide chains, classified as well as possible by the presence or absence of EC numbers.
For each protein-ligand complex, the residues involved in ligand binding are those where any of the atoms in a residue are within a 10.0 Å radius of any ligand atom and the line segment connecting these two atoms does not intersect with other protein atoms. There are 7,570 binding sites found in the 5263 chains involving 48,819 binding residues and 1,414,293 non-binding residues, respectively. In addition, 54,826 residue clusters are randomly generated from non-binding site residues as negative controls. To generate the clusters, one of the solvent accessible residues that are not involved in the ligand binding on the protein is randomly selected as the center. Then it, with all of its neighboring solvent accessible residues within a 10.0 Å radius, defines the cluster. Clusters are selected so as not to overlap.
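The binding-residue definition above (a distance cut-off plus an unobstructed line of sight) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the occluding-atom radius of 1.5 Å and the choice to ignore atoms of the same residue as occluders are all assumptions made here.

```python
import math

def segment_hits_sphere(a, b, center, radius):
    """True if the segment a-b passes within `radius` of `center`."""
    ab = [b[k] - a[k] for k in range(3)]
    ac = [center[k] - a[k] for k in range(3)]
    len2 = sum(x * x for x in ab)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, sum(ab[k] * ac[k] for k in range(3)) / len2))
    closest = [a[k] + t * ab[k] for k in range(3)]
    return math.dist(closest, center) < radius

def binding_residues(residues, ligand_atoms, cutoff=10.0, atom_radius=1.5):
    """residues: dict residue_id -> list of (x, y, z) atom coordinates.

    atom_radius is an assumed occluder size; the text does not specify one.
    """
    atoms = [(rid, xyz) for rid, coords in residues.items() for xyz in coords]
    site = set()
    for rid, p in atoms:
        for q in ligand_atoms:
            if math.dist(p, q) > cutoff:
                continue
            # Assumption: only atoms of other residues can block the line of sight.
            blocked = any(
                other_rid != rid and segment_hits_sphere(p, q, other, atom_radius)
                for other_rid, other in atoms
            )
            if not blocked:
                site.add(rid)
                break
    return site
```

For example, with one ligand atom at (-5, 0, 0), a residue atom at the origin is counted as binding, while a second residue directly behind it at (3, 0, 0) is shadowed and excluded.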
Overview of the algorithm
The algorithm consists of the following steps (Figure 1).
Step 1. Representation of the protein structure
The protein structure is represented by Cα atoms only, making it computationally efficient and applicable to low resolution structures and homology models on a proteome-wide scale.
Step 2. Delaunay tessellation of the Cα atoms
The structure is Delaunay tessellated using a convex hull algorithm implemented in the Qhull package. As a result, the structure is partitioned into a set of tetrahedra. A unique circumscribed sphere is defined for each tetrahedron such that its four vertices lie on the surface of the sphere. In doing so the following determinant is obeyed:
where $(x_i, y_i, z_i)$, with i = 1, 2, 3 and 4, are the coordinates of the vertices of the tetrahedron from the Delaunay tessellation.
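The textbook determinant condition for a point (x, y, z) to lie on the sphere through four non-coplanar points, written with the coordinates defined above, is given below. It is offered as the standard form of this condition and is an assumption about the equation the text refers to, not a copy of it.

```latex
\[
\begin{vmatrix}
x^2 + y^2 + z^2 & x & y & z & 1 \\
x_1^2 + y_1^2 + z_1^2 & x_1 & y_1 & z_1 & 1 \\
x_2^2 + y_2^2 + z_2^2 & x_2 & y_2 & z_2 & 1 \\
x_3^2 + y_3^2 + z_3^2 & x_3 & y_3 & z_3 & 1 \\
x_4^2 + y_4^2 + z_4^2 & x_4 & y_4 & z_4 & 1
\end{vmatrix} = 0
\]
```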
The effect of Delaunay tessellation is to generate a convex hull surrounding the Cα atoms of the protein.
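The circumscribed sphere of each tetrahedron is the quantity the later steps threshold on, so a small helper for it may be useful. The sketch below is a plain-Python illustration with a hypothetical function name; it obtains the center by subtracting the sphere equation at the first vertex from the other three and solving the resulting 3×3 linear system with Cramer's rule. It does not reproduce how Qhull computes the sphere.

```python
import math

def _det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def circumsphere(p1, p2, p3, p4):
    """Return (center, radius) of the sphere through four non-coplanar points."""
    # For each vertex p: 2 (p - p1) . c = |p|^2 - |p1|^2, where c is the center.
    rows, rhs = [], []
    for p in (p2, p3, p4):
        rows.append([2.0 * (p[k] - p1[k]) for k in range(3)])
        rhs.append(sum(p[k] ** 2 - p1[k] ** 2 for k in range(3)))
    d = _det3(rows)
    if abs(d) < 1e-12:
        raise ValueError("points are (nearly) coplanar")
    center = []
    for col in range(3):
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]  # Cramer's rule: replace one column by the right-hand side
        center.append(_det3(m) / d)
    return tuple(center), math.dist(center, p1)
```

A tetrahedron with vertices (1, 0, 0), (-1, 0, 0), (0, 1, 0) and (0, 0, 1) has its circumscribed sphere centered at the origin with radius 1.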
Step 3. Determination of the environmental boundary of the protein structure
The environmental boundary is defined as an outside layer that contains all of the Cα atoms of the protein (red solid lines in Figure 1). It is computed by iteratively peeling off the tetrahedra that include edges longer than 30.0 Å layer by layer starting from the convex hull. The value of 30.0 Å is an empirical estimate for the maximum size of a ligand binding pocket. As a result, some of the triangles on the original convex hull are removed and the Cα atoms of the protein are surrounded with the newly formed triangles resulting from the removal of tetrahedra with edge length longer than 30.0 Å and the remaining triangles on the convex hull. This set of triangles forms the environmental boundary which contains both the protein and any potential ligand binding pockets. All of the remaining tetrahedra form a constrained Delaunay tessellation.
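The layer-by-layer peeling can be sketched as below, assuming the tessellation is already available as a list of vertex-index 4-tuples. The function name and data layout are hypothetical; a tetrahedron is treated as exposed when one of its triangular faces is shared with no other remaining tetrahedron, which is one reasonable reading of "starting from the convex hull".

```python
import math
from itertools import combinations

def peel_long_tetrahedra(points, tetrahedra, max_edge=30.0):
    """Iteratively drop exposed tetrahedra that have an edge longer than max_edge.

    points: list of (x, y, z); tetrahedra: list of 4-tuples of point indices.
    Returns the tetrahedra that remain (the constrained tessellation).
    """
    def longest_edge(tet):
        return max(math.dist(points[a], points[b]) for a, b in combinations(tet, 2))

    def faces(tet):
        return list(combinations(sorted(tet), 3))

    remaining = set(range(len(tetrahedra)))
    while True:
        # Count how many remaining tetrahedra share each triangular face.
        face_count = {}
        for t in remaining:
            for f in faces(tetrahedra[t]):
                face_count[f] = face_count.get(f, 0) + 1
        # Peel the current outer layer: exposed tetrahedra with an over-long edge.
        layer = {
            t for t in remaining
            if longest_edge(tetrahedra[t]) > max_edge
            and any(face_count[f] == 1 for f in faces(tetrahedra[t]))
        }
        if not layer:
            return [tetrahedra[t] for t in sorted(remaining)]
        remaining -= layer
```

The faces that end up belonging to exactly one remaining tetrahedron then form the triangles of the environmental boundary.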
Step 4. Determination of the protein boundary for the structure
The tetrahedra whose circumscribed sphere has a radius larger than 7.5 Å can be further removed from the constrained Delaunay tessellation defined in step 3. A new boundary (blue and purple solid lines in Figure 1), which still contains all of the Cα atoms of the protein, is formed from the new triangles resulting from the removal of the tetrahedra and the remaining triangles on the convex hull. This boundary is called the protein boundary. The cut-off value of 7.5 Å is derived from the parameterization procedure described below and is based on the separation in size of circumscribed spheres (and hence tetrahedra) that define the surface versus the interior of the protein. The tetrahedra removed from the constrained Delaunay tessellation are candidates for the virtual atom, which is a circumscribed sphere outside the protein boundary but inside the environmental boundary. Protein space is thus partitioned into three parts defined by the protein and environmental boundaries: the Cα atoms of the protein inside the protein boundary, the virtual atoms inside the environmental boundary but outside the protein boundary, and the region occupied by solvent outside the environmental boundary. Note that the protein and environmental boundaries may overlap.
Step 5. Computation of geometric measurements
Associated with each Cα atom is a vector describing the distance and direction to the environmental boundary. To compute the distance and direction, the closest plane(s) to the Cα atom, determined from the triangles on the boundary, are first selected. Then the boundary distance P, which will be used in the next step, is the distance from the Cα atom to the closest plane, and the boundary direction is the normal vector of the closest plane. If there is more than one closest plane that has the same distance to the Cα atom (for example, the distances from the Cα atom on the environmental boundary to all its intersected planes are 0.0), the average of the normal vectors of these closest planes is taken as the atom's direction. The distance and direction of the Cα atom to the protein boundary is computed in the same way as to the environmental boundary.
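The closest-plane rule, including the averaging of normals when several planes tie, can be sketched as follows. The function names are hypothetical, and the use of an interior reference point (for instance the centroid of the Cα atoms) to orient each normal outward is an assumption made here, since the text does not say how the sign of the normal is chosen.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def plane_of(triangle):
    """Unit normal n and offset d of the plane n.x = d through a triangle."""
    a, b, c = triangle
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    n = [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(_dot(n, n))
    n = [x / length for x in n]
    return n, _dot(n, a)

def boundary_distance_and_direction(atom, triangles, interior, tol=1e-6):
    """Return (P, direction) for one atom given the boundary triangles."""
    candidates = []
    for tri in triangles:
        n, d = plane_of(tri)
        # Assumption: flip the normal so that it points away from the interior point.
        if _dot(n, interior) - d > 0:
            n, d = [-x for x in n], -d
        candidates.append((abs(_dot(n, atom) - d), n))
    p = min(dist for dist, _ in candidates)
    # Average the normals of all planes tied for closest, then renormalize.
    closest = [n for dist, n in candidates if dist - p <= tol]
    mean = [sum(n[k] for n in closest) / len(closest) for k in range(3)]
    norm = math.sqrt(_dot(mean, mean))
    direction = [x / norm for x in mean] if norm > 0 else mean
    return p, direction
```

The same routine applies unchanged to the protein boundary by passing that boundary's triangles.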
Step 6. Computation of geometric potential
The value of the geometric potential (GP) at each Cα atom depends on the atom's distance to the environmental boundary and the distances and directions to neighboring Cα atoms that are located on the protein boundary and unobstructed by other residues inside the protein boundary. This can be described as follows:
where $P$ is the distance of a given Cα atom to the environmental boundary. The index $i$ indicates the i-th neighboring Cα atom that is located on the protein boundary within a 10.0 Å radius and is unobstructed by other residues inside the protein boundary. $P_i$, $D_i$ and $\alpha_i$ are, respectively, its distance to the environmental boundary, its distance to the given Cα atom and its direction relative to the given Cα atom. The formula is similar to that proposed by Mancera et al. to calculate the hydrophobicity of the binding site. In fact, if $P$ is substituted with the value for the hydrophobicity of the residue, the geometric potential is equivalent to the hydrophobicity in the binding site. Other geometric or topological measurements for $P$ are possible, such as the distance to the centroid of the protein, travel depth and closeness of residues to other residues in protein residue interaction networks. Finally, the value of the geometric potential for each Cα atom is normalized to lie between 0.0 and 100.0.
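To make the roles of the variables concrete, the sketch below assumes the form GP = P + Σ_i [P_i / (D_i + 1.0)] × [(cos α_i + 1.0) / 2.0]. This form is consistent with the variables defined above but is an assumption here and should be checked against the published equation. The min-max scaling to the 0 to 100 range is likewise an assumption, since the normalization procedure is not specified, and the function names are hypothetical.

```python
import math

def geometric_potential(p, neighbors):
    """Assumed form: GP = P + sum_i P_i / (D_i + 1) * (cos(alpha_i) + 1) / 2.

    p: boundary distance of the atom itself.
    neighbors: iterable of (p_i, d_i, alpha_i) with alpha_i in radians.
    """
    return p + sum(
        p_i / (d_i + 1.0) * (math.cos(alpha_i) + 1.0) / 2.0
        for p_i, d_i, alpha_i in neighbors
    )

def normalize(values, top=100.0):
    """Min-max scale a list of potentials to [0, top] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [top * (v - lo) / (hi - lo) for v in values]
```

Under this form a neighbor that is close (small D_i), deeply buried (large P_i) and facing the atom (α_i near 0) contributes most, while a neighbor facing directly away (α_i = π) contributes nothing.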
Step 7. Construction of the virtual ligand and prediction of the ligand binding site
The tetrahedra that were removed in step 4 and labeled as candidates for virtual atoms are further processed to construct the final virtual ligand. First, the tetrahedra whose circumscribed sphere's center is outside the environmental boundary are removed. This procedure guarantees that all virtual atoms are within the environmental boundary. Second, the remaining tetrahedra are considered virtual atoms if the radii of their circumscribed spheres are larger than 7.5 Å. The cut-off value is derived from the parameterization procedure described below and is based on the separation in size of spheres (and hence tetrahedra) that define the surface versus the interior of the protein. Virtual atoms are then clustered: two virtual atoms fall into the same cluster if their circumscribed spheres overlap. Each virtual atom cluster is considered a virtual ligand. The negative image of the virtual ligand is the predicted ligand binding site, identified in the same manner as the binding site for a known ligand: the predicted Cα atoms are those within a 10.0 Å radius of any virtual atom for which the line segment connecting the two does not intersect any other sphere. Moreover, an overall geometric potential for each predicted binding site is calculated as the average of the geometric potentials of all Cα atoms within the site. The average geometric potential is used to rank the predicted binding sites.
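The clustering and ranking parts of this step can be sketched with a small union-find, assuming each virtual atom is given as a (center, radius) pair. The function names are hypothetical, and ranking sites from highest to lowest average geometric potential is an assumption; the text states only that the average is used for ranking.

```python
import math

def cluster_virtual_atoms(spheres):
    """Group spheres (center, radius) so that overlapping spheres share a cluster.

    Returns a sorted list of clusters, each a list of sphere indices.
    """
    parent = list(range(len(spheres)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(spheres)):
        for j in range(i + 1, len(spheres)):
            (ci, ri), (cj, rj) = spheres[i], spheres[j]
            if math.dist(ci, cj) < ri + rj:  # spheres overlap
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(spheres)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

def rank_sites(sites, gp):
    """Order predicted sites by mean geometric potential, highest first (assumed)."""
    return sorted(sites, key=lambda s: sum(gp[r] for r in s) / len(s), reverse=True)
```

Each cluster returned by `cluster_virtual_atoms` corresponds to one virtual ligand; the residues lining it would then be collected with the distance and line-of-sight rule described above before ranking.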
Parameterization of the algorithm
The algorithm requires only one simple parameter: the radius of the circumscribed sphere that distinguishes spheres within the protein boundary from spheres within the environmental boundary but outside the protein boundary. We define a tetrahedron as solid if all of its edges connect residues considered to be in contact. Two amino acid residues indexed i and j are defined to be in contact if |i - j| = 1 or, when |i - j| > 1, if they have at least one pair of atoms $A_i$ and $A_j$ whose distance $D_{ij}$ satisfies the following condition:
$D_{ij} \le R_i + R_j + 2.0\,R_a$ (3)
where $R_i$ and $R_j$ are the van der Waals radii of atoms $A_i$ and $A_j$, respectively, and $R_a$ is the radius of the water molecule (a value of 1.4 Å is used).
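The contact rule of formula 3 translates directly into code. In the sketch below the function names and the (coordinate, van der Waals radius) input layout are hypothetical; the water radius of 1.4 Å is the value given in the text.

```python
import math

WATER_RADIUS = 1.4  # Å, the value of R_a used in the text

def atoms_in_contact(xyz_i, xyz_j, r_i, r_j, r_water=WATER_RADIUS):
    """Formula 3: D_ij <= R_i + R_j + 2.0 * R_a."""
    return math.dist(xyz_i, xyz_j) <= r_i + r_j + 2.0 * r_water

def residues_in_contact(i, j, atoms_i, atoms_j):
    """atoms_i, atoms_j: lists of ((x, y, z), vdw_radius) for residues i and j."""
    if abs(i - j) == 1:
        return True  # sequence neighbors are always considered in contact
    return any(
        atoms_in_contact(a, b, ra, rb)
        for a, ra in atoms_i
        for b, rb in atoms_j
    )
```

With two atoms of radius 1.7 Å the threshold is 1.7 + 1.7 + 2.8 = 6.2 Å, so a pair 6.0 Å apart is in contact and a pair 7.0 Å apart is not.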
As shown in Figure 2, over 99.0% of solid tetrahedra have a sphere radius less than 7.5 Å. Figure 2 further shows that the radius distributions of the solid and non-solid tetrahedra are distinct and are best separated at around 7.0 Å. On the other hand, the average radius between two contacting amino acid residues, as defined in formula 3, is around 6.0 Å from a statistical analysis of PDB structures (data not shown). Including a water molecule in a tetrahedron formed from four residues requires a sphere of at least 7.5 Å. Thus, 7.5 Å is selected as the cut-off value for the virtual atoms, so that only pockets that bind ligands larger than a water molecule are considered.
Performance of the algorithm is evaluated by comparison to the reference binding sites on a protein by protein basis. The performance is measured by two criteria: rank accuracy and boundary sensitivity/specificity. The rank accuracy is the rank of correctly predicted sites over all predicted sites of a protein. The sensitivity and specificity are used to evaluate the residue coverage of the predicted binding site in a protein. They are defined as follows:
Sensitivity = true positives/(true positives + false negatives) × 100
Specificity = true negatives/(false positives + true negatives) × 100
where true positives and false positives are the numbers of correctly and incorrectly predicted binding site residues in a protein, respectively, and true negatives and false negatives are the numbers of correctly and incorrectly predicted non-binding site residues, respectively, as shown in Figure 3. Thus, both the sensitivity and the specificity of a 100% accurate prediction will be 100.
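The two boundary measures can be computed from sets of residue identifiers as in the following sketch. The function name and set-based inputs are hypothetical, and returning 0.0 when a denominator is empty is a convention chosen here, not one stated in the text.

```python
def boundary_metrics(predicted, reference, all_residues):
    """Return (sensitivity, specificity) in percent for one predicted site.

    predicted, reference: residue ids in the predicted and known binding site.
    all_residues: every residue id in the protein.
    """
    predicted, reference, all_residues = set(predicted), set(reference), set(all_residues)
    tp = len(predicted & reference)                  # binding residues found
    fp = len(predicted - reference)                  # non-binding residues predicted as binding
    fn = len(reference - predicted)                  # binding residues missed
    tn = len(all_residues - predicted - reference)   # non-binding residues left out
    sensitivity = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    specificity = 100.0 * tn / (fp + tn) if fp + tn else 0.0
    return sensitivity, specificity
```

For a ten-residue protein with known site {0, 1, 2, 3} and prediction {2, 3, 4}, this gives a sensitivity of 50 and a specificity of 5/6 × 100 ≈ 83.3.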
We are grateful for financial support from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank and NIH grant GM63208. The RCSB Protein Data Bank is supported by funds from the National Science Foundation (NSF), the National Institute of General Medical Sciences (NIGMS), the Office of Science, Department of Energy (DOE), the National Library of Medicine (NLM), the National Cancer Institute (NCI), the National Center for Research Resources (NCRR), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), National Institute of Neurological Disorders and Stroke (NINDS), and the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). We appreciate the input from three anonymous reviewers and Drs. Jenny Gu and Zhanyang Zhu in our laboratory for suggestions on improving the manuscript.
This article has been published as part of BMC Bioinformatics Volume 8, Supplement 4, 2007: The Second Automated Function Prediction Meeting. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/8?issue=S4.
- Coleman RG, Sharp KA: Travel depth, a new shape descriptor for macromolecules: application to ligand binding. J Mol Biol 2006, 362: 441–458. 10.1016/j.jmb.2006.07.022View ArticlePubMedGoogle Scholar
- Nayal M, Honig B: On the nature of cavities on protein surfaces: application to the identification of drug bidning sites. Proteins: Struct Funct Bioinform 2006, 63: 892–906. 10.1002/prot.20897View ArticleGoogle Scholar
- Coleman RG, Burr MA, Sourvaine DL, Cheng AC: An intuitive approach to measuring protein surface curvature. Proteins: Struct Funct Bioinform 2005, 61: 1068–1074. 10.1002/prot.20680View ArticleGoogle Scholar
- Agarwal PK, Edelsbrunner H, Harer J, Wang Y: Extreme elevation on a 2-manifold. Symp Comp Geo 2004, 20: 357–365.Google Scholar
- Hendrix DK, Kuntz ID: Surface solid angle-based site points for molecular docking. Pac Symp Biocomput: 1998 1998, 317–326.Google Scholar
- Liang J, Edelsbrunner H, Woodward C: Anatomy of protein pockets and cavities: measurement of binding site geometry and implications for ligand design. Protein Sci 1998, 7(9):1884–1897.PubMed CentralView ArticlePubMedGoogle Scholar
- Wang Y, Agarwal PK, Brown P, Edelsbrunner H, Rudolph J: Coase and reliable geometric alignment for protein docking. Pac Symp Biocomput 2005, 10: 64–75.Google Scholar
- Norel R, Wolfson HJ, Nussinov R: Small molecule recognition: solid angles surface representation and molecular shape complementarity. Comb Chem High Throughput Screen 1999, 2(4):223–237.PubMedGoogle Scholar
- May A, Zacharias M: Accounting for global protein deformability during protein-protein and protein-ligand docking. Biochim Biophys Acta 2005, 1754: 225–231.View ArticlePubMedGoogle Scholar
- Fukunishi Y, Mikami Y, Takedomi K, Yamanouchi M, Shima H, Nakamura H: Classification of chemical compounds by protein-compound docking for use in designing a focused library. J Med Chem 2006, 49: 523–533. 10.1021/jm050480aView ArticlePubMedGoogle Scholar
- Fukunishi Y, Kubota S, Kanai C, Nakamura H: A virtual active compound produced from the negative image of a ligand-binding pocket, and its application to in-silico drug screening. J Comput-Aided Mol Des 2006, in press.Google Scholar
- Levitt D, Banaszak L: POCKET: a computer graphics method for identifying and displaying protein cavities and their surronding amino acids. J Mol Graph 1992, 10: 229–234. 10.1016/0263-7855(92)80074-NView ArticlePubMedGoogle Scholar
- Ben-Shimon A, Eisenstein M: Looking at enzymes from the inside out: the proximity of catalytic residues to the molecular centroid can be used for detection of active sites and enzyme-ligand interfaces. J Mol Biol 2005, 351: 309–326. 10.1016/j.jmb.2005.06.047View ArticlePubMedGoogle Scholar
- Brady GPJ, Stouten PF: Fast prediction and visualization of protein binding pockets with PASS. J Comput Aided Mol Des 2000, 14(4):383–401. 10.1023/A:1008124202956View ArticlePubMedGoogle Scholar
- Peters KP, Fauck J, Frommel C: The automatic search for ligand binding sites in proteins of known three-dimensional structure using only geometric criteria. J Mol Biol 1996, 256(1):201–213. 10.1006/jmbi.1996.0077View ArticlePubMedGoogle Scholar
- Laskowski RA: SURFNET: a program for visualizing molecular surfaces, cavities, and inter-molecular interactions. J Mol Graph 1995, 13: 323–330. 307–308 307–308 10.1016/0263-7855(95)00073-9View ArticlePubMedGoogle Scholar
- Hendlich M, Rippmann F, Barnickel G: LIGSITE: Automatic adn efficient detection of potential small molecule-binding sites in proteins. J Mol Graph Model 1997, 15: 359–363. 10.1016/S1093-3263(98)00002-3View ArticlePubMedGoogle Scholar
- Lichtarge O, Bourne HR, Cohen FE: An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol 1996, 257: 342–358. 10.1006/jmbi.1996.0167
- La D, Sutch B, Livesay DR: Predicting protein functional sites with phylogenetic motifs. Proteins: Struct Funct Bioinform 2005, 58: 309–320. 10.1002/prot.20321
- Nimrod G, Glaser F, Steinberg D, Ben-Tal N, Pupko T: In silico identification of functional regions in proteins. Bioinformatics 2005, 21(Suppl 1):i328-i337. 10.1093/bioinformatics/bti1023
- Glaser F, Morris RJ, Najmanovich RJ, Laskowski RA, Thornton JM: A method for localizing ligand binding pockets in protein structures. Proteins: Struct Funct Bioinform 2006, 62(2):479–488. 10.1002/prot.20769
- Laurie ATR, Jackson RM: Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bioinformatics 2005, 21(9):1908–1916. 10.1093/bioinformatics/bti315
- An J-H, Totrov M, Abagyan R: Pocketome via comprehensive identification and classification of ligand binding envelopes. Mol Cell Proteomics 2005, 4(6):752–761. 10.1074/mcp.M400159-MCP200
- Campbell SJ, Gold ND, Jackson RM, Westhead DR: Ligand binding: functional site location, similarity and docking. Curr Opin Struct Biol 2003, 13: 389–395. 10.1016/S0959-440X(03)00075-7
- Watson JD, Laskowski RA, Thornton JM: Predicting protein function from sequence and structural data. Curr Opin Struct Biol 2005, 15: 275–284. 10.1016/j.sbi.2005.04.003
- Delaunay B: Sur la sphère vide. Izvestia Akademii Nauk SSSR, Otdelenie Matematicheskikh i Estestvennykh Nauk 1934, 7: 793–800.
- Gunasekaran K, Nussinov R: How different are structurally flexible and rigid binding sites? Sequence and structural features discriminating proteins that do and do not undergo conformational change upon ligand binding. J Mol Biol 2006, 365: 257–273. 10.1016/j.jmb.2006.09.062
- Gutteridge A, Thornton J: Conformational changes observed in enzyme crystal structures upon substrate binding. J Mol Biol 2005, 346(1):21–28. 10.1016/j.jmb.2004.11.013
- Dundas J, Zheng O, Tseng J, Binkowski B, Turpaz Y, Liang J: CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues. Nucleic Acids Res 2006, 34: W116-W118. 10.1093/nar/gkl282
- Honig B, Nicholls A: Classical electrostatics in biology and chemistry. Science 1995, 268(5214):1144–1149. 10.1126/science.7761829
- An J-H, Totrov M, Abagyan R: Comprehensive identification of "druggable" protein ligand binding sites. Genome Informatics 2004, 15(2):31–41.
- Amitai G, Shemesh A, Sitbon E, Shklar M, Netanely D, Venger I, Pietrokovski S: Network analysis of protein structures identifies functional residues. J Mol Biol 2004, 344: 1135–1146. 10.1016/j.jmb.2004.10.055
- Glick M, Robinson DD, Grant GH, Richards WG: Identification of ligand binding sites on proteins using a multi-scale approach. J Am Chem Soc 2002, 124(10):2337–2344. 10.1021/ja016490s
- Bliznyuk A, Gready J: Simple method for locating possible ligand binding sites on protein surfaces. J Comput Chem 1999, 20(9):983–988. 10.1002/(SICI)1096-987X(19990715)20:9<983::AID-JCC9>3.0.CO;2-R
- Verdonk ML, Cole JC, Watson P, Gillet V, Willett P: SuperStar: Improved knowledge-based interaction fields for protein binding sites. J Mol Biol 2001, 307: 841–859. 10.1006/jmbi.2001.4452
- Silberstein M, Dennis S, Brown L, Kortvelyesi T, Clodfelter K, Vajda S: Identification of substrate binding sites in enzymes by computational solvent mapping. J Mol Biol 2003, 332(5):1095–1113. 10.1016/j.jmb.2003.08.019
- Ruppert J, Welch W, Jain A: Automatic identification and representation of protein binding sites for molecular docking. Protein Sci 1997, 6: 524–533.
- Jones S, Thornton JM: Analysis of protein-protein interaction sites using surface patches. J Mol Biol 1997, 272(1):121–132. 10.1006/jmbi.1997.1234
- Deshpande N, Addess KJ, Bluhm WF, Merino-Ott JC, Townsend-Merino W, Zhang Q, Knezevich C, Xie L, Chen L, Feng Z, et al.: The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res 2005, 33(Database issue):D233-D237.
- Barber CB, Dobkin DP, Huhdanpaa H: The Quickhull algorithm for convex hulls. ACM Transactions On Mathematical Software 1996, 22: 469–483. 10.1145/235815.235821
- Kelly MD, Mancera RL: A new method for estimating the importance of hydrophobic groups in the binding site of a protein. J Med Chem 2005, 48: 1069–1078. 10.1021/jm049524q
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.