- Methodology article
- Open Access
PocketMatch: A new algorithm to compare binding sites in protein structures
© Yeturu and Chandra; licensee BioMed Central Ltd. 2008
- Received: 09 July 2008
- Accepted: 17 December 2008
- Published: 17 December 2008
Recognizing similarities and deriving relationships among protein molecules is a fundamental requirement in present-day biology. Similarities can be present at various levels which can be detected through comparison of protein sequences or their structural folds. In some cases similarities obscure at these levels could be present merely in the substructures at their binding sites. Inferring functional similarities between protein molecules by comparing their binding sites is still largely exploratory and not as yet a routine protocol. One of the main reasons for this is the limitation in the choice of appropriate analytical tools that can compare binding sites with high sensitivity. To benefit from the enormous amount of structural data that is being rapidly accumulated, it is essential to have high throughput tools that enable large scale binding site comparison.
Here we present a new algorithm PocketMatch for comparison of binding sites in a frame invariant manner. Each binding site is represented by 90 lists of sorted distances capturing shape and chemical nature of the site. The sorted arrays are then aligned using an incremental alignment method and scored to obtain PMScores for pairs of sites. A comprehensive sensitivity analysis and an extensive validation of the algorithm have been carried out. A comparison with other site matching algorithms is also presented. Perturbation studies where the geometry of a given site was retained but the residue types were changed randomly, indicated that chance similarities were virtually non-existent. Our analysis also demonstrates that shape information alone is insufficient to discriminate between diverse binding sites, unless combined with chemical nature of amino acids.
A new algorithm has been developed to compare binding sites in accurate, efficient and high-throughput manner. Though the representation used is conceptually simplistic, we demonstrate that along with the new alignment strategy used, it is sufficient to enable binding comparison with high sensitivity. Novel methodology has also been presented for validating the algorithm for accuracy and sensitivity with respect to geometry and chemical nature of the site. The method is also fast and takes about 1/250 th second for one comparison on a single processor. A parallel version on BlueGene has also been implemented.
- Random Perturbation
- Amino Acid Type
- Residue Type
- Ligand Type
- Bind Site Residue
Much of present day biology is dependent on sequence-structure-function relationships in protein molecules, insights obtained for one protein heavily influencing understanding of other proteins in the family. Recognizing similarities and deriving relationships therefore is a fundamental objective in bioinformatics. Some of these similarities are obvious at the sequence level while some are detected at the structure level [1, 2]. It is in fact well established now that the conservation at the structure level of related proteins can be higher and hence much more detectable than at the sequence level . In this context, there are a number of examples in the literature, which illustrate that structures often convey the 'meaning', more efficiently than sequences, here 'meaning' referring to the 'function' of the protein. On the other hand, there are also a number of instances, which illustrate that a particular 'function' is achieved by proteins whose sequences and structures are dis-similar. For example, at least three different proteins with different folds and architectures recognize mannose and exhibit mannose-mediated physiology . In other words, structures also fail to convey the 'meaning' in many cases. We do not yet know if this failure is because of our inability to recognize any similarities in such seemingly dis-similar proteins or it is simply because no similarities actually exist among them. What ultimately matters for a protein molecule however, is its function and not what means it uses to achieve it . A given function could be conserved simply by having similarities in some elements of the structure, such as the binding site residues . A classic example is the large family of serine proteases which are classified into different sequence and structural families, but all come under the functional class of serine proteases due to the presence of the catalytic triad . Comparison of binding sites differ from comparison of whole structures for two main reasons (a) the binding sites are small containing only a few residues and (b) these residues are often not contiguous in sequence. Alignment of two sites containing discrete sets of atoms involves evaluation of a huge number of mappings. This makes it important to have efficient algorithms with low time and space complexities that are capable of identifying and ranking different extents of similarities appropriately. With several structural genomics projects as well as advances in computational methods for structure prediction, the structural databases are growing at a rapid pace, providing experimental structures of thousands and confident homology models of millions of protein molecules [8, 9]http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml. Algorithms for identification of binding pockets with reasonable confidence have also been developed [10–13]. The need for large scale comparisons of binding sites is hence accentuated. Some methods are already available for such a purpose based on ideas well established in the field of image processing. For example SitesBase  that uses 'Geometric hashing' involves selection of triads of points representing atomic types and positions in each site and comparing the triangles formed by triads; PINTS  uses a depth first traversal strategy, adopted to find common set of nodes between a pair of graphs connected by similar pattern of edges; Spherical harmonics based algorithm , captures distribution of points representing the site in terms of coefficients of a square integrable function on a unit sphere; CavBase  identifies maximal common sub-graphs between pairs of sites; SPASM and RIGOR  compares distribution of residues from the centroid of the binding sites. Each of the methods have their own merits and demerits, warranting exploration of newer methods for site comparison. Here we present a new algorithm called PocketMatch for representing a binding site in a frame invariant manner and comparison of pairs of sites based on alignment of sorted sequences of distances between pairs of points representing sites.
A new efficient and accurate algorithm PocketMatch has been developed for comparing binding sites in protein structures. A comparison between a pair of sites of an average size of 50 atoms takes only 1/250 th of a second on a single processor. The algorithm has been used for large scale database searches and all-vs-all comparisons. A typical database search of a query site against those from a large dataset comprising about 20000 sites takes in the order of 4 minutes to complete on the same machine with MPI-C version using 4 processors.
Comparison of a pair of binding sites involves three aspects, (a) representation of each site as sorted lists of distances between chosen points, (b) alignment of two sets of distance lists and (c) choosing a scoring scheme for arriving at a final score. Our representation scheme for the site is based on capturing the geometry of a 3D object in a 1D representation by a set of all pairs of distances between points. Such a set of distances would become a frame-invariant representation, a highly desirable scheme for any general shape comparison method. Two sets of distances can be compared for similarity by considering a suitable mapping between distances whose dissimilarity is bounded by a small amount. As the number of such mappings of distances from one set to the other would be huge, we represent them as sorted sequences in ascending order. An alignment strategy for comparison of two sequences of distances is also presented. The different steps involved are described.
Representation of binding site
Step-1 A set of residues whose one or more atoms surround a given crystallographic ligand within a specified distance (4Å as default) from each atom of the ligand is taken as the binding site. We chose to consider atoms of the complete residues corresponding to each of these atoms for shape-representation. For any 3D object represented by a set of points, the set of all pairs of distances between the points encodes the shape of the object. These points can then be flagged with specific chemical properties. 20 amino acids were considered in 5 groups – Group-0:(A,V,I,L,G,P); Group-1:(K,R,H); Group-2:(D,E,Q,N); Group-3:(Y,F,W); Group-4:(C,S,T).
Grouping is implemented as a user defined parameter enabling use of other types of grouping.
where indicates coordinates of the i th atom of the side chain and n is the number of atoms in the side chain.
Step-3 Binning of distances and representation format.
(a) Compute distances between all pairs of points and bin the distances into 90 sets corresponding to group-type-pair and point-type-pairs.
(i) There are 5 types of residue-groups. Therefore 5 * (5 - 1)/2 + 5 → 15 pairs of groups are possible.
(ii) There are 3 types of points. Therefore 3 * (3 - 1)/2 + 3 → 6 pairs of point types are possible.
(iii) Considering both residue-group and point-type information, total number of possible sets of distances would be 15 * 6 ⇒ 90.
Where, NGP: Number of pairs of group-types, NTP: Number of pairs to point-types, ND i : Number of distances in the i th bin, d j : distance between j th pair of points.
Step-4 The distances in each list were sorted in ascending order.
Alignment of a pair of distance-sets
To compute similarity score between two binding sites, each of the 90 sets of one site has to be compared with its corresponding set of the other site. In the next step, similarity between sets was computed using a greedy strategy for alignment of sorted distance-sequences.
Step-5: Let the given pair of sorted(ascending) distance lists be S1 and S2 where each element is indexed by S[i] for the list, S. Let the threshold for alignment of two distances be τ i.e., |S1[i] - S2[j]| ≤ τ. Let m = |S1| and n = |S2| denote the cardinalities of the sets S1 and S2 respectively. Let the variable counter hold the number of matched distances between two sequences. Then alignment strategy is as indicated in the (Sub-routine 1) below.
Sub-routine 1 Alignment of a pair of sorted distance sequences
i = 0; j = 0; counter = 0;
while (i ≤ m) ∧ (j ≤ n) do
if |S1[i] - S2[j]| ≤ τ then
i ← i + 1; j ← j + 1
counter ← counter + 1
if S1[i] <S2[j] then
i ← i + 1;
j ← j + 1;
Scoring for similarity between sites
where |S| indicates cardinality of the set.
This measure of similarity, referred to as PMScore is used as the default scoring scheme in this study. A variant of this scoring scheme PMScore min , in which the denominator was taken as the minimum (|S1|, |S2|), was also explored, since it emphasises local structural similarity ignoring the relative size mismatch between the two sites being compared. However we observed that the comparison was too insensitive with such a scheme and did not use it further. τ is an important parameter that governs the alignment, which decides when a given pair of distances constitute a match. Different values of τ (please see section on sensitivity analysis) were tried and a default of 0.5Å was used in this study.
Different datasets used in the study.
Final set used in the study
One representative ligand of each type is considered in each protein; filtered to remove sites for small and covalently bound ligands; retained proteins common with the SCOP database
Filtered to remove sites for small, covalently bound ligand and all suggested by Jackson and co-workers (10) to contribute to noise; considered sites for all ligand types for the remaining
Curated dataset of CIT,MTX,MK1 and PGA
Ligands for varying sizes and types from PDBBind with multiple sites for a ligand in different proteins
Multiple sites for each ligand in the same protein; filtered to remove small and covalently bound ligands
Based on SCOP classification
where N is the dimension of the matrix.
Figure 1(c) indicates superposition to show similarities or XOR (and automatically also dis-similarities) of two matrices of the same dimension. The red and blue dots in the 2 matrices would superpose well only for protein pairs showing both SCOP family as well as binding site similarity. Where SCOP similarity is 0 and PMScore is also 0, a blank space will be present in both and hence will remain as blank space in the superposition also. Figure 1(c) illustrates superposed matrices, at a threshold of 50% for which the dis-similarity and similarity scores are 0.0284 and 0.9716 respectively.
PocketMatch matrices can be constructed at different thresholds, and the above excerise of superposition and computing XOR repeated for each threshold. Figure 1(d) illustrates XOR between the PMScore and the SCOP matrices as a function of PMScore threshold. As expected the XOR values start at a value close to 1 when the threshold is very low (at 0%) and inch towards 0 when the thresholds are increased to 100%. A sharp decline in XOR can be seen at around a threshold of 40%. This means that when a pair of sites match with a score of 40% or higher, they exhibit the same SCOP classification. At lower thresholds, for example at 20%, i.e., when two sites are considered similar to each other even when they match to only 20%, the correspondence of such a metric with the SCOP matrix is poor. This analysis serves to validate two points, (a) that higher PMScores are reflective of true-positives in the context of the family level SCOP classification and (b) that lower PMScores are reflective of true-negatives by the same logic. The graph also shows that a score of 40% is discriminatory enough to identify a true-positive.
Similarities in tetramers
Using a curated dataset of known similarities
Sensitivity with respect to random perturbation of positions of site points
For a small perturbation of say < 2.0Å the PMScores between the original and the perturbed site was seen to be very high with all scoring schemes for both the ligands. This indicates that the algorithm is robust enough to recognize two sites as similar even when minor perturbations to the extent of 2Å are present. When the perturbation was increased to 5Å the PMScores showed lower values as expected. With scoring scheme-1(green, τ = 1.0) which was most liberal in terms of its similarity definition, the PMScores varied very little even with high perturbation where as when scheme-5(yellow, τ = 0.01) was used, PMScores moved to near zero even for small perturbations of about 0.1Å. The 2 nd , 3 rd and 4 th scoring schemes corresponding to τ of 0.5, 0.25 and 0.125 respectively, showed in-between trends with a reasonable balance between sensitivity and robustness. The figure shows that the trend followed by the various schemes is consistent for different sizes of ligands. Given that the average coordinate error in crystal structures is in the order 0.5Å we have used the scoring scheme-2(red) as the default scheme for large scale analysis. τ is however implemented as a user defined parameter. This analysis indicates that the scores obtained are (a) reflective of the extent of similarities, (b) resistant to minor perturbations in the site, (c) scoring schemes are self consistent validating the basic logic used in the algorithm and (d) perturbed sites where some atoms have moved even up to 5Å are recognizable as similar to their original sites, albeit with lower PMScores, because of retaining the overall nature of the site and high similarities with respect to remaining atoms in the site.
Sensitivity with respect to random perturbation of residue types of site points
Comparison of PocketMatch with other site matching methods
A benchmarking exercise of PocketMatch vis-a-vis the established binding site matching algorithms. SitesBase , SuMo , ProFunc  and a spherical harmonics method , was carried out, to place PocketMatch in context of other methods previously reported in literature.
SitesBase utilizes geometric hashing that works by evaluating many superpositions of triads of atoms between two sites. All atoms within 5 Å are considered to constitute a binding site. An atom-atom score is computed based on number of matching atoms in the sense of least RMSD. Empirically determined low and high scores are 25 and 40 respectively. Each score is flagged by Poisson-Index (PI) a measure of indicating the possibility of reaching that score by chance. The reported time for comparison of 82-atom pocket against a database of 33168 sites is 20 minute . SitesBase is available on the web http://www.modelling.leeds.ac.uk/sb/ as a database search engine with pre-defined proteins and sites, but not available for comparing user-defined structures. Hence some randomly chosen examples that were already available in the pre-computed database of SitesBase were considered for comparison with PocketMatch. The binding site search utility in ProFunc , a web service http://www.ebi.ac.uk/thornton-srv/databases/ProFunc/ for comprehensive analysis of proteins, employs the algorithm – JESS . Jess uses a backtracking and depth first search method that works by determining appropriate correspondence between atoms in the pair of sites based on distance constraints. Ligand binding site templates are generated as triads of residues within 5Å each and having at least one non-hydrophobic residue flanking the location of a ligand in a PDB structure. Two sites are evaluated for similarity based on superposition (RMSD) and number of matches. A pure shape based method by  is based on representation of distribution of points from the centre of binding site by real spherical harmonic expansion coefficients. Binding sites are determined by SURFNET  as a collection of spheres filling the cavity. A ball of 1.4Å is then rotated around the collection to generate surface points. It is the shape of these points as a whole that is captured by the spherical harmonic coefficients and compared against other sets of points and hence other sites. The algorithm is not automatically available on the web or for local installation. Hence a dataset of 40 proteins reported by the authors was considered for comparison with PocketMatch. SuMo  is based on identification of common subgraphs between pairs of sites by greedy algorithmic strategy which may not lead to maximal common subgraph always. The nodes in the graphs are triangles where each triangle corresponds to a functional group in the binding site. Between a pair of sites, the triangles are matched followed by actual superposition of atoms of the sites by least squares optimization. The software is available at http://sumo-pbil.ibcp.fr/cgi-bin/sumo-welcome/. The implementation of this method on the web allows comparison only with their pre-computed database of proteins and sites, but not available for comparing user-defined structures. In addition to these the align module of PyMol has also been used as another method of matching sites. PyMol considers sequence similarity and uses a least squares fit of the two sites, and reports number of residues matched and an RMSD for the matched residues.
Comparison of PocketMatch with other site matching algorithms- ProFunc, SuMo, SitesBase and PyMol
Score/Max.score = Ratio
Pairs of proteins belonging to same SCOP families
65/78 = 83.33
74/79 = 93.67
111/111 = 100
100/130 = 76.92
52/177 = 29.38
97/177 = 54.80
67/72 = 93.06
57/95 = 60
46/53 = 86.79
30/54 = 55.55
Pairs of proteins belonging to different SCOP families
47/86 = 54.65
44/94 = 46.81
Isoniazid, a front-line drug used in the treatment of tuberculosis, is a pro-drug after activation, has been well known to target the inhA-encoded enoyl-ACP reductase (1ZID), thus inhibiting mycolic acid biosynthesis in Mycobacterium tuberculosis. Recently, a surprising observation of binding of isoniazid-adduct to dihydrofolate reductase has been reported  (PDB:2CIG). PocketMatch identified the isoniazid-adduct pockets in the two sites to be similar with a score of 56%, while the other methods failed to identify significant similarities between the two proteins.
It is no surprise that methods vary in their performance especially when the two sites are not highly similar, since different representations capture different aspects of the binding site. PocketMatch using a frame invariant sorted string of distances between point pairs captures the site by utilizing all available points in the site and hence reflects the shape and size of the site efficiently at the same time also reflecting the nature of amino acids, leading it to perform with higher sensitivity in detecting binding site similarities when some elements in the sites are different.
The benchmarking exercises have been useful in validating PocketMatch against those already available in the literature. Besides detecting obvious similarities, such comparison studies reveal interesting differences in the performance of different methods. Two major issues that feature in the comparisons are the sensitivity of similarity detection and speed. PocketMatch has an advantage in both these factors. The performance of PocketMatch was found to be good with different datasets used in this study. An added advantage with our method compared to others is that, scores reflecting local similarities PMScore min as well as global site similarities PMScore can both be obtained at the same time, thus readily yielding information about part similarities as well. This can be attributed in part to the fact that changes between two sites such as correlated mutations that maintain the same chemistry and distance in that pair are scored high by PocketMatch but not by some other methods. Similarly, amino acids at a binding site superposing well in structure but formed by dis-similar polypeptide segments also score high in PocketMatch.
All vs all comparison – Site geometry vs residue types
A new algorithm has been developed to compare binding sites in accurate, efficient and high-throughput manner, where sites are represented as 90 lists of sorted distances flagged with residue type information. This representation therefore captures both the shape and chemical nature of amino acid types at the site. Extensive validation has been performed using different datasets. Sensitivity analysis has also been performed to analyze the performance of the algorithm with respect to perturbations of two types – (a) in the actual atomic positions of atoms of the site and (b) in the amino acid type at the site. Several scoring schemes have been analyzed by virtue of which a scoring function with a good balance between sensitivity and ability to detect similarities has been identified and recommended as the default scoring scheme. Perturbation studies where the geometry of a given site was retained but the residue types were changed randomly, indicated that chance similarities were virtually non-existent. Our analysis also suggests that shape information alone is insufficient to discriminate between diverse binding sites. However, combining shape information with chemical grouping of amino acids at the site enables discrimination between different types of sites.
PocketMatch has been validated on a variety of datasets – PDBBind, a set of tetrameric proteins and a curated dataset containing known similarities of four ligand types (Table 1). To eliminate noise in the datasets, sites corresponding to small ligands with less than 6 non-hydrogen atoms and covalently bound ligands were not considered here. The PDBBind dataset  contains a comprehensive curated set of 1091 protein-ligand complexes determined crystallographically. Using this, two sub-datasets were derived – Table 1-i: A dataset which has only one ligand site for one ligand type in each protein, amounting to 786 proteins and 893 sites and Table 1-ii: A dataset in which ligands suggested to contribute to noise by Jackson and co-workers  were removed, but all sites for all ligand types were considered, amounting to 456 proteins and 1146 sites corresponding to 289 ligand types. Tetramers (Table 1-iv), obtained from PQS server, containing 3768 proteins has been curated to yield a dataset of 1525 proteins having 11301 sites has been chosen for studying the sensitivity of PocketMatch with respect to recognizing known highly similar sites. Another dataset representing multiple sites for four known ligands was curated (Table 1-iii). 51 sites from 27 different proteins in PDBBind for Citrate (CIT), Methotrexate (MTX), Indinavir (MK1) and phosphoglycerate (PGA) were chosen for the dataset. A distance metric measuring the dis-similarity between these sites was computed using PocketMatch and their clustering was studied.
Scheme for Validation
We have validated the algorithm based on (a) SCOP equivalences between pairs of proteins at 4 th level of SCOP , (b) similarities among multiple sites for a given ligand in the tetramer dataset Table 1-iv and (c) clusters formed by sites whose similarities have been previously identified by independent analysis. To compare with SCOP, we constructed a matrix, M1 where each of the rows and columns correspond to list proteins and each element corresponds to a score assigned to that pair of proteins, i and j. A score of 1.0 was assigned to those pairs that had the same SCOP class. All other pairs were assigned a score of 0. Then we constructed another matrix, M2 of the same dimension where a score of 1.0 was assigned to those pairs whose top scoring site-pair had a PMScore greater than a threshold and score of 0 for all others. The two matrices were then compared using the XOR, a logical operator to determine the extent of agreement between the fold and the site metrics in finding similarities in all pairs of proteins represented in the matrices. Understandably, the same comparison can also be done through the XNOR operator, the complement of XOR, which will yield the same information.
Schemes for sensitivity analysis
Sensitivities of PocketMatch (a) with respect to geometry and (b) with respect to specificity of residue types at the site, were studied. For (a), we applied tiny perturbations to coordinates to each of the points representing sites, leading to generation of altered structures within bounded a RMSD and computed PMScores between pairs of perturbed and reference sites. These plots are obtained for various sizes of binding sites. For (b), we randomly assigned residue group information to each of site points and computed PMScores.
Method for random perturbation to positions of site-points
Our method for random perturbation applies random displacement from a uniform distribution to each of the points C α , C β , and C centroid corresponding to each residue as described in (Sub-routine 2).
Sub-routine 2 Performing random perturbation to positions of points of the site
a Let the 'net required RMSD' be ρ and number of intermediate randomly perturbed point-sets be K.
b In order to generate, K perturbed sets of points,
for EACH δ = 0 till ρ in steps of ; do
- Perturb the actual set of points to generate a new point-set at an RMSD of δ.
- Let each point be represented by = (x1, x2, x3) and let the random vector be = (r1, r2, r3).
- Initialize each r i to ; where random() function generates a random number between 0 and RAND_MAX from 'uniform distribution'.
- Normalise by reinitializing each .
- Apply the present displacement, * δ to each atom which set each
- These steps generate a perturbed set of points with a net RMSD of δ.
c For each of the perturbed point-set generated, compute RMSD with respect to the unperturbed version and measure PMScores.
Method for random perturbation of types of site-residues
Our method for random perturbation of group numbers assigned to residues of site involves assignment of random integers between 0 and 4 from a 'uniform' distribution to the residues represented by three points, C α , C β , and C centroid as described in (Sub-routine 3).
Sub-routine 3 Random perturbation of types of site residues
a Get C α 's alone for a given site represented in C α , C β , and C centroid – format.
b To each of the C α 's assign a random number between 0 and 4 using
c Copy back the modified group information of C α atoms back to the file respresenting site by C α , C β , and C centroid points.
d The PMScore is then computed for the perturbed site with respect to unperturbed version.
The software PocketMatch was developed on gcc (GCC) 3.4.3 20041212version 4.1.2 on Linux 2.6.9-5.ELsmp machine. A parallelized version of the software using MPI-C libraries was also developed and implemented both on a standard Quad-core machine (Intel (R) Core (TM) 2 Quad CPU @ 2.4 GHz; Address space: 32 bits physical, 48 bits virtual; 4GB RAM) and also on a IBM BlueGene cluster and tested with 512 processors. Matlab version 7.1 from Mathworks http://www.mathworks.com/ was used for generating plots and histograms. Cluster tree was constructed using neighbour-joining method of phylip-3.67 http://evolution.genetics.washington.edu/phylip.html and viewed using PhyloDraw . PyMol http://www.pymol.org version 0.99rc1 was used for visualizing various structures.
The software can be accessed at http://proline.physics.iisc.ernet.in/pocketmatch/
Support for the Centre of Excellence in Bioinformatics by Department of biotechnology(DBT), Govt. of India and facilities at the Supercomputer Education and Research Centre of this institute are gratefully acknowledged. Financial support from DBT computational genomics initiative is also acknowledged.
- Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology 1995, 247: 536–540.PubMedGoogle Scholar
- Orengo CA, Pearl FM, Bray JE, Todd AE, Martin AC, Lo Conte L, Thornton JM: The CATH Database provides insights into protein structure/function relationships. Nucleic Acids Res 1999, 27: 275–279. 10.1093/nar/27.1.275PubMed CentralView ArticlePubMedGoogle Scholar
- Holm L, Sander C: Mapping the protein universe. Science 1996, 273(5275):595–603. 10.1126/science.273.5275.595View ArticlePubMedGoogle Scholar
- Ramachandraiah G, Chandra NR: Sequence and structural determinants of mannose recognition. Proteins 2000, 39(4):358–364. 10.1002/(SICI)1097-0134(20000601)39:4<358::AID-PROT80>3.0.CO;2-MView ArticlePubMedGoogle Scholar
- Prasad T, Subramanian T, Hariharaputran S, Chaitra HS, Chandra N: Extracting hydrogen-bond signature patterns from protein structure data. Appl Bioinformatics 2004, 3(2–3):125–135. 10.2165/00822942-200403020-00007View ArticlePubMedGoogle Scholar
- Russell R, Sasieni P, Sternberg M: Supersites within superfolds. Binding site similarity in the absence of homology. Journal of Molecular Biology 1998, 282: 903–18. 10.1006/jmbi.1998.2043View ArticlePubMedGoogle Scholar
- Dodson G, Wlodawer A: Catalytic triads and their relatives. Trends Biochem Sci 1998, 23(9):347–352. 10.1016/S0968-0004(98)01254-7View ArticlePubMedGoogle Scholar
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov I, Bourne P: The Protein Data Bank. Nucleic Acids Research 2000, 28: 235–242. 10.1093/nar/28.1.235PubMed CentralView ArticlePubMedGoogle Scholar
- Pieper U, Eswar N, Braberg H, Madhusudhan MS, Davis FP, Stuart AC, Mirkovic N, Rossi A, Marti-Renom MA, Fiser A, Webb B, Greenblatt D, Huang CC, Ferrin TE, Sali A: MODBASE, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Res 2004, (32 Database):217–222. 10.1093/nar/gkh095Google Scholar
- Glaser F, Morris RJ, Najmanovich RJ, Laskowski RA, Thornton JM: A method for localizing ligand binding pockets in protein structures. Proteins 2006, 62(2):479–488. 10.1002/prot.20769View ArticlePubMedGoogle Scholar
- Laurie AT, Jackson RM: Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bioinformatics 2005, 21(9):1908–1916. 10.1093/bioinformatics/bti315View ArticlePubMedGoogle Scholar
- Huang B, Schroeder M: LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation. BMC Struct Biol 2006, 6: 19–19. 10.1186/1472-6807-6-19PubMed CentralView ArticlePubMedGoogle Scholar
- Kalidas Y, Chandra N: PocketDepth: A new depth based algorithm for identification of ligand binding sites in proteins. J Struct Biol 2008, 161(1):31–42. 10.1016/j.jsb.2007.09.005View ArticlePubMedGoogle Scholar
- Gold ND, Jackson RM: Fold Independent Structural Comparisons of Protein-Ligand Binding Sites for Exploring Functional Relationships. Journal of Molecular Biology 2006, 355: 1112–1124. 10.1016/j.jmb.2005.11.044View ArticlePubMedGoogle Scholar
- Stark A, Russell RB: Annotation in three dimensions. PINTS: Patterns in Non-homologous Tertiary Structures. Nucleic Acids Research 2003, 31: 3341–3344. 10.1093/nar/gkg506PubMed CentralView ArticlePubMedGoogle Scholar
- Morris RJ, Najmanovich RJ, Kahraman A, Thornton JM: Real spherical harmonic expansion coefficients as 3D shape descriptors for protein binding pocket and ligand comparisons. Bioinformatics 2005, 21: 2347–2355. 10.1093/bioinformatics/bti337View ArticlePubMedGoogle Scholar
- Schmitt S, Kuhn D, Klebe G: A new method to detect related function among proteins independent of sequence and fold homology. J Mol Biol 2002, 323(2):387–406. 10.1016/S0022-2836(02)00811-2View ArticlePubMedGoogle Scholar
- Kleywegt GJ: Recognition of Spatial Mofits in Protein Structures. Journal of Molecular Biology 1999, 285: 1887–1897. 10.1006/jmbi.1998.2393View ArticlePubMedGoogle Scholar
- Jambon M, Imberty A, Deléage G, Geourjon C: A new bioinformatic approach to detect common 3D sites in protein structures. Proteins 2003, 52(2):137–145. 10.1002/prot.10339View ArticlePubMedGoogle Scholar
- Laskowski RA, Watson JD, Thornton JM: ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Res 2005, (33 Web Server):89–93. 10.1093/nar/gki414Google Scholar
- Gold ND, Jackson RM: SitesBase: a database for structur-based protein-ligand binding site comparisons. Nucleic Acids Research 2006, 34: D231-D234. 10.1093/nar/gkj062PubMed CentralView ArticlePubMedGoogle Scholar
- Barker JA, Thornton JM: An algorithm for constraint-based structural template matching: application to 3D templates with statistical analysis. Bioinformatics 2003, 19: 1644–1649. 10.1093/bioinformatics/btg226View ArticlePubMedGoogle Scholar
- Laskowski RA: SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J Mol Graph 1995, 13(5):323–330. 10.1016/0263-7855(95)00073-9View ArticlePubMedGoogle Scholar
- Argyrou A, Vetting MW, Aladegbami B, Blanchard JS: Mycobacterium tuberculosis dihydrofolate reductase is a target for isoniazid. Nat Struct Mol Biol 2006, 13(5):408–413. 10.1038/nsmb1089View ArticlePubMedGoogle Scholar
- Wang R, Fang X, Lu Y, Wang S: The PDBBind Database: Collection of Binding Affinities for Protein-Ligand Complexes with known Three-dimensional Structure. J Med Chem 2004, 47: 2977–2980. 10.1021/jm030580lView ArticlePubMedGoogle Scholar
- Choi JH, Jung HY, Kim HS, Cho HG: PhyloDraw: a phylogenetic tree drawing system. Bioinformatics 2000, 16(11):1056–1058. 10.1093/bioinformatics/16.11.1056View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.