Ligand binding site superposition and comparison based on Atomic Property Fields: identification of distant homologues, convergent evolution and PDB-wide clustering of binding sites
© Totrov; licensee BioMed Central Ltd. 2011
Published: 15 February 2011
A new binding site comparison algorithm using optimal superposition of the continuous pharmacophoric property distributions is reported. The method demonstrates high sensitivity in discovering both distantly homologous and convergent binding sites. Good quality of superposition is also observed on multiple examples. Using the new approach, a measure of site similarity is derived and applied to clustering of ligand binding pockets in the PDB.
Experimental structural biology efforts are uncovering protein structures at an unprecedented rate. There is a need to understand relationships and discover similarities between the solved structures. While fold comparisons are routinely performed to identify homologies that are at or beyond the limit of the sequence comparison methods, some functional relationships can only be detected at the level of binding sites. Ultimately, it is the configuration of these sites, rather than overall sequence or fold, that determines the enzymatic or signal transduction activity of a protein.
Most existing methods for binding site comparison are based on some form of coarse-grained representation of the geometry and properties of the pocket as a set of points or centers. Using a variety of algorithms, correspondence between the two sets is established. The FLAP algorithm first generates GRID molecular interaction fields, which are used to detect locations where interactions of chemical groups with particular pharmacophoric features would be most favorable. Four-point pharmacophores are constructed from these points and used for target site matching. PocketMatch is an algorithm for comparison of binding sites in a frame-invariant manner, based on representation of the sites by sorted lists of distances capturing the shape and chemical nature of the site. Lists are compared using a special alignment algorithm and the PMScore function. IsoCleft detects 3D atomic similarities between binding sites using a graph-matching method. The protein functional surfaces methodology attempts to optimize global shape and local physicochemical ‘texture’ match between a pair of surfaces using object recognition techniques. Often, a search algorithm is combined with a specially compiled database of binding sites. For example, the CPASS database comprises ligand-defined binding sites found in the Protein Data Bank (PDB), and the CPASS algorithm compares these ligand-defined sites to determine similarity without maintaining sequence connectivity. Similarly, SURFACE is a database of protein surface regions, with functional surface patches defined by sets of residues, and searches performed by matching the residue sets. CavBase is a dataset of cavities extracted from the PDB and searchable using an algorithm that matches pseudocenters analogous to pharmacophoric points. The Superimposé webserver implements several superposition and comparison methods in an on-line format and allows detection of similarities between binding sites or entire proteins.
A searchable database for comparing protein-ligand binding sites for the analysis of structure-function relationships has been reported, including a comparison method based on geometric hashing, which identifies the maximum common sub-graph of atomic features. Med-SuMo rapidly compares protein surfaces represented by triplets of chemical groups. Standard 3-, 4- and 5-point pharmacophores extracted from binding pockets identified by icmPocketFinder across human PDB protein structures were used to create a virtual library of sites in the human pocketome; querying the library with a pharmacophore of a methyl-lysine binding site retrieved interesting non-trivial hits. Of note, another perspective on the pocket comparison problem, which is to detect principal differences between related sites, was taken by several groups [15–17].
Discretized representation of the continuous pocket surface by amino-acid residues, chemical groups, pharmacophoric points or similar descriptors allows very rapid comparison but may not always be adequate to capture distant similarities. Pharmacophoric points are well-suited to represent highly localized interaction centers, such as hydrogen bond donors and acceptors. Hydrophobic interactions and shape complementarity, on the other hand, are continuously distributed properties that lend themselves poorly to point representation. Moreover, to detect distant pocket similarities, ‘fuzzy’ matching may be needed because some of the discrete features may disappear, appear or change. These issues can be partially overcome by increasing the number of representative points and allowing partial matches.
In the present work, the Atomic Property Fields (APF) approach is adapted to the problem of binding site/pocket superposition. The resulting pocket superposition method is tested on multiple distantly similar pocket examples. The method also produces a score characterizing the degree of similarity of the pockets. The utility of the APF site superposition as a site comparison method is evaluated by calculating a complete distance matrix for the set of over 5000 binding sites in the sc-PDB binding site database. Finally, clustering of this available slice of the pocketome is performed.
Adaptation of the APF ligand superposition method to binding site superposition
The original APF ligand superposition protocol consists of (I) generation of grids with 7 APF potential components from the template (static) ligand and (II) optimization of the target ligand in the grid APF potentials combined with internal force-field energy of the ligand. Monte-Carlo with gradient minimization after each random step is used as a global energy optimizer. Six variables controlling overall position of the ligand as well as torsions around rotatable bonds are optimized.
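The search strategy described above (a random step, then gradient minimization, then an acceptance test) can be illustrated with a minimal Python sketch. It is not the ICM implementation: the two-variable `toy_energy` function is a hypothetical stand-in for the combined APF and force-field energy, the Metropolis acceptance rule is an assumption, and every parameter value (`jump`, `temperature`, step counts) is chosen only for illustration.

```python
import math
import random


def toy_energy(x, y):
    """Hypothetical rugged surface standing in for the APF pseudo-energy."""
    return (x * x + y * y) / 10.0 - math.cos(3.0 * x) - math.cos(3.0 * y)


def numerical_gradient(f, x, y, h=1e-5):
    """Central-difference gradient of f at (x, y)."""
    gx = (f(x + h, y) - f(x - h, y)) / (2.0 * h)
    gy = (f(x, y + h) - f(x, y - h)) / (2.0 * h)
    return gx, gy


def local_minimize(f, x, y, step=0.05, iters=500):
    """Plain fixed-step gradient descent to the nearest local minimum."""
    for _ in range(iters):
        gx, gy = numerical_gradient(f, x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


def monte_carlo_minimization(f, start, n_steps=300, jump=1.5,
                             temperature=0.5, seed=0):
    """Random step, local minimization, then Metropolis acceptance (assumed)."""
    rng = random.Random(seed)
    x, y = local_minimize(f, *start)
    e = f(x, y)
    best = (e, x, y)
    for _ in range(n_steps):
        # Random perturbation of the current minimized state.
        tx = x + rng.uniform(-jump, jump)
        ty = y + rng.uniform(-jump, jump)
        # Gradient minimization after each random step.
        tx, ty = local_minimize(f, tx, ty)
        te = f(tx, ty)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if te <= e or rng.random() < math.exp(-(te - e) / temperature):
            x, y, e = tx, ty, te
            if e < best[0]:
                best = (e, x, y)
    return best


if __name__ == "__main__":
    best_e, best_x, best_y = monte_carlo_minimization(toy_energy, (4.0, -4.0))
    print(f"lowest energy {best_e:.4f} at ({best_x:.3f}, {best_y:.3f})")
```

In the actual protocol the optimized variables are the six rigid-body degrees of freedom plus the torsions around rotatable bonds; the two toy coordinates here only show the control flow of the optimizer.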
Distance matrix calculation and clustering
The APF pseudo-energy, or score, EAPF for the optimal superposition reflects the similarity of the atomic property distributions of the two binding pockets. It can be used directly for ranking of the database binding sites by their similarity to a query. However, for some other applications such as clustering, it is necessary to derive a similarity measure that behaves like a distance rather than like a ranking score. In particular, for a pair of non-identical sites it has to be a positive value that increases as the sites become more dissimilar, and it has to become zero for identical pairs. EAPF, on the other hand, is always negative, and its value for identical sites varies depending on the size and composition of the site. To convert EAPF to a normalized dot-product-like measure with the correct asymptotic behavior, we used the following formula:
SAPF = tanh((EAPF-E0)/∆0),
where E0 and ∆0 are empirical parameters. Next, a distance-like measure is obtained from the dot-product-like one:
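As a rough illustration of this conversion, the Python sketch below applies the tanh formula and then maps the result to a distance. The values of `E0` and `DELTA0` are placeholders, chosen only so that more negative scores give higher similarity. The final step uses the chord distance sqrt(2(1 − S)) between unit vectors, which is an assumed stand-in and should not be read as the formula used in this work. The site names and scores in the usage lines are invented.

```python
import math

# Hypothetical parameter values, for illustration only.
E0 = -200.0      # assumed offset
DELTA0 = -150.0  # assumed scale; negative so that lower E_APF gives higher S


def apf_similarity(e_apf, e0=E0, delta0=DELTA0):
    """S_APF = tanh((E_APF - E0) / DELTA0), the formula given in the text."""
    return math.tanh((e_apf - e0) / delta0)


def distance_from_similarity(s):
    """ASSUMED conversion: chord distance between unit vectors with cosine s."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - s)))


def rank_by_score(scores):
    """Rank site names by raw E_APF, most negative (most similar) first."""
    return sorted(scores, key=scores.get)


if __name__ == "__main__":
    # Invented scores of three database sites against one query.
    scores = {"site_A": -900.0, "site_B": -350.0, "site_C": -60.0}
    for name in rank_by_score(scores):
        s = apf_similarity(scores[name])
        d = distance_from_similarity(s)
        print(f"{name}: E={scores[name]:8.1f}  S={s:+.3f}  D={d:.3f}")
```

The raw score suffices for ranking hits against a single query, whereas the bounded, distance-like value is what a clustering procedure would take as input.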
Results and discussion
Interestingly, a super-cluster emerged around GTP- and ATPases, grouping together other phosphatases, phosphorylases and phosphodiesterases, very likely due to common features associated with phosphate binding. Rossmann fold-based NAD- and FAD-oxidases/reductases and SAM methyltransferases formed another large, loose super-cluster, having in common the adenine binding sub-site.
Similarly, FAD and NAD cofactors in UDP-galactose 4-epimerase and D-amino acid oxidase share the same binding mode for the common nucleotide, and this homology is successfully detected despite the very different portions that coordinate flavin and nicotinamide (Fig 7b).
Sensitive and accurate binding site comparison is a technology with multiple important applications. Binding site databases could be screened for putative off-target sites for known or candidate drugs, either to discover and avoid side-effects or to find new applications. Functional annotation of ‘orphan’ pockets on newly resolved protein structures could be aided by identification of similar sites of known function. Initial drug design leads for new target proteins may be suggested by ligands binding similar sites in well-studied proteins. In contrast to previously reported methods, APF BSS utilizes a continuous similarity measure and an optimization algorithm, which may identify and successfully superimpose distantly related sites missed by point-based approaches. Promising results in PDB-wide site comparisons illustrate the sensitivity and accuracy of APF BSS.
The author wishes to acknowledge stimulating discussions with Ruben Abagyan. This work was partially supported by the NIH grant 1R43GM74343.
This article has been published as part of BMC Bioinformatics Volume 12 Supplement 1, 2011: Selected articles from the Ninth Asia Pacific Bioinformatics Conference (APBC 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S1.
References
1. Baroni M, Cruciani G, Sciabola S, Perruccio F, Mason JS: A common reference framework for analyzing/comparing proteins and ligands. Fingerprints for Ligands and Proteins (FLAP): theory and application. J Chem Inf Model 2007, 47(2):279–294. 10.1021/ci600253e
2. Goodford PJ: A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J Med Chem 1985, 28(7):849–857. 10.1021/jm00145a002
3. Yeturu K, Chandra N: PocketMatch: a new algorithm to compare binding sites in protein structures. BMC Bioinformatics 2008, 9: 543. 10.1186/1471-2105-9-543
4. Najmanovich R, Kurbatova N, Thornton J: Detection of 3D atomic similarities and their use in the discrimination of small molecule protein-binding sites. In Bioinformatics. Volume 24. Oxford, England; 2008:i105–111. 10.1093/bioinformatics/btn263
5. Binkowski TA, Joachimiak A: Protein functional surfaces: global shape matching and local spatial alignments of ligand binding sites. BMC Struct Biol 2008, 8: 45. 10.1186/1472-6807-8-45
6. Powers R, Copeland JC, Germer K, Mercier KA, Ramanathan V, Revesz P: Comparison of protein active site structures for functional annotation of proteins and drug design. Proteins 2006, 65(1):124–135. 10.1002/prot.21092
7. Ferre F, Ausiello G, Zanzoni A, Helmer-Citterich M: SURFACE: a database of protein surface regions for functional annotation. Nucleic Acids Research 2004, 32(Database issue):D240–244. 10.1093/nar/gkh054
8. Schmitt S, Kuhn D, Klebe G: A new method to detect related function among proteins independent of sequence and fold homology. J Mol Biol 2002, 323(2):387–406. 10.1016/S0022-2836(02)00811-2
9. Bauer RA, Bourne PE, Formella A, Frommel C, Gille C, Goede A, Guerler A, Hoppe A, Knapp EW, Poschel T, et al.: Superimpose: a 3D structural superposition server. Nucleic Acids Research 2008, 36(Web Server issue):W47–54. 10.1093/nar/gkn285
10. Gold ND, Jackson RM: A searchable database for comparing protein-ligand binding sites for the analysis of structure-function relationships. J Chem Inf Model 2006, 46(2):736–742. 10.1021/ci050359c
11. Brakoulias A, Jackson RM: Towards a structural classification of phosphate binding sites in protein-nucleotide complexes: an automated all-against-all structural comparison using geometric matching. Proteins 2004, 56(2):250–260. 10.1002/prot.20123
12. Jambon M, Andrieu O, Combet C, Deleage G, Delfaud F, Geourjon C: The SuMo server: 3D search for protein functional sites. In Bioinformatics. Volume 21. Oxford, England; 2005:3929–3930. 10.1093/bioinformatics/bti645
13. An J, Totrov M, Abagyan R: Pocketome via comprehensive identification and classification of ligand binding envelopes. Mol Cell Proteomics 2005, 4(6):752–761. 10.1074/mcp.M400159-MCP200
14. Campagna-Slater V, Arrowsmith AG, Zhao Y, Schapira M: Pharmacophore screening of the protein data bank for specific binding site chemistry. J Chem Inf Model 2010, 50(3):358–367. 10.1021/ci900427b
15. Sheridan RP, Holloway MK, McGaughey G, Mosley RT, Singh SB: A simple method for visualizing the differences between related receptor sites. J Mol Graph Model 2002, 21(3):217–225. 10.1016/S1093-3263(02)00166-3
16. Kastenholz MA, Pastor M, Cruciani G, Haaksma EE, Fox T: GRID/CPCA: a new computational tool to design selective ligands. J Med Chem 2000, 43(16):3033–3044. 10.1021/jm000934y
17. Pastor M, Cruciani G: A novel strategy for improving ligand selectivity in receptor-based drug design. J Med Chem 1995, 38(23):4637–4647. 10.1021/jm00023a003
18. Totrov M: Atomic property fields: generalized 3D pharmacophoric potential for automated ligand superposition, pharmacophore elucidation and 3D QSAR. Chem Biol Drug Des 2008, 71(1):15–27.
19. Giganti D, Guillemain H, Spadoni JL, Nilges M, Zagury JF, Montes M: Comparative evaluation of 3D virtual ligand screening methods: impact of the molecular alignment on enrichment. J Chem Inf Model 2010, 50(6):992–1004. 10.1021/ci900507g
20. Grigoryan AV, Kufareva I, Totrov M, Abagyan RA: Spatial chemical distance based on atomic property fields. J Comput Aided Mol Des 2010, 24(3):173–182. 10.1007/s10822-009-9316-x
21. Kellenberger E, Muller P, Schalon C, Bret G, Foata N, Rognan D: sc-PDB: an annotated database of druggable binding sites from the Protein Data Bank. J Chem Inf Model 2006, 46(2):717–727. 10.1021/ci050372x
22. Abagyan R, Totrov M: Biased probability Monte Carlo conformational searches and electrostatic calculations for peptides and proteins. J Mol Biol 1994, 235(3):983–1002. 10.1006/jmbi.1994.1052
23. Totrov M, Abagyan R: Detailed ab initio prediction of lysozyme-antibody complex with 1.6 A accuracy. Nat Struct Biol 1994, 1(4):259–263. 10.1038/nsb0494-259
24. Abagyan R, Totrov M, Kuznetsov D: ICM-A new method for protein modeling and design: Applications to. J Comp Chem 1994, 15(5):488–506. 10.1002/jcc.540150503
25. Abagyan R: ICM user manual. 2009. [http://www.molsoft.com/man/]
26. Michener CD, Sokal RR: A quantitative approach to a problem in classification. Evolution 1957, 11: 130–162. 10.2307/2406046
27. Marchler-Bauer A, Anderson JB, Cherukuri PF, DeWeese-Scott C, Geer LY, Gwadz M, He S, Hurwitz DI, Jackson JD, Ke Z, et al.: CDD: a Conserved Domain Database for protein classification. Nucleic Acids Research 2005, 33(Database issue):D192–196. 10.1093/nar/gki069
28. Lo Conte L, Ailey B, Hubbard TJ, Brenner SE, Murzin AG, Chothia C: SCOP: a structural classification of proteins database. Nucleic Acids Research 2000, 28(1):257–259. 10.1093/nar/28.1.257
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.