Knowledge-based annotation of small molecule binding sites in proteins
© Thangudu et al; licensee BioMed Central Ltd. 2010
Received: 16 March 2010
Accepted: 1 July 2010
Published: 1 July 2010
The study of protein-small molecule interactions is vital for understanding protein function and for practical applications in drug discovery. To benefit from the rapidly increasing structural data, it is essential to improve the tools that enable large scale binding site prediction with greater emphasis on their biological validity.
We have developed a new method for the annotation of protein-small molecule binding sites, using inference by homology, which allows us to extend annotation onto protein sequences without experimental data available. To ensure biological relevance of binding sites, our method clusters similar binding sites found in homologous protein structures based on their sequence and structure conservation. Binding sites which appear evolutionarily conserved among non-redundant sets of homologous proteins are given higher priority. After binding sites are clustered, position specific score matrices (PSSMs) are constructed from the corresponding binding site alignments. Together with other measures, the PSSMs are subsequently used to rank binding sites to assess how well they match the query and to better gauge their biological relevance. The method also facilitates a succinct and informative representation of observed and inferred binding sites from homologs with known three-dimensional structures, thereby providing the means to analyze conservation and diversity of binding modes. Furthermore, the chemical properties of small molecules bound to the inferred binding sites can be used as a starting point in small molecule virtual screening. The method was validated by comparison to other binding site prediction methods and to a collection of manually curated binding site annotations. We show that our method achieves a sensitivity of 72% at predicting biologically relevant binding sites and can accurately discriminate those sites that bind biological small molecules from non-biological ones.
A new algorithm has been developed to predict binding sites with high accuracy in terms of their biological validity. It also provides a common platform for function prediction, knowledge-based docking and for small molecule virtual screening. The method can be applied even for a query sequence without structure. The method is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi.
The physical interactions between proteins and other molecules in protein crystal structures provide crucial insights into protein function. It is precisely these structures that enable researchers to study interactions in atomic detail, and find out, for example, how a specific mutation in a protein affects its function, or how a few atom modifications in a small molecule might lead to a more effective drug. With the large number of available crystal structures (nearly 60,000 currently in the RCSB Protein Data Bank), it is of great importance to improve the tools available for study of these interactions.
Moreover, a powerful method of inference can be used to predict function and interactions. It is based on the observation that homologous proteins have similar functions and often interact with their small molecules in a similar manner. Thus it is possible to infer protein-small molecule interactions even if there are no crystal structures available for a particular protein of interest, as long as there are structures of sufficiently close homologs. Recent estimates suggest that the majority of Entrez Protein sequences have homologs with a known structure [1, 2], thereby providing a reasonable chance to find relevant interactions via structures for protein sequences.
Homology inference methods, although powerful, have certain limitations. Common descent does not necessarily imply similarity in function or interactions; and annotations transferred from one protein to a homolog may result in incorrect functional or interolog assignment at larger evolutionary distances [3–6]. To verify and guide annotations, it is often essential to ensure close evolutionary relationships, and at the same time characterize the details of interactions in terms of binding site similarity. Current binding site prediction methods can be subdivided into several major categories: those which use evolutionary conservation of binding site motifs [7–9], those which use information about a structure of a complex [10–12], and docking and other methods [13, 14]. Structure-based methods use detailed knowledge of the protein structure to identify binding sites on the basis of the physico-chemical properties of individual residues, their electrostatic contribution, and their location in the 3D structure [15–26].
A number of methods and servers have been developed for predicting protein function by identifying similarities in sequence and structural features of binding pockets in homologous proteins, or evolutionary constraints on residues, or by using threading and other approaches [20, 28–39]. The main goal of these methods is to provide functional annotation for proteins out to the most distant homology relationships. FINDSITE, for example, looks for structural templates with bound small molecules for a query protein using threading. The templates are superimposed and the centers of mass of the bound small molecules are clustered to annotate putative binding sites on the query. Threading-based methods, although capable of recognizing distant functional relations, are limited by the complexity of model building and the low reliability of function transfer associated with distant homology [41, 42].
Firestar predicts functionally important residues based on PSI-BLAST alignments between the query sequence and structures with functional information derived from the PDB and the Catalytic Site Atlas. PHUNCTIONER uses sequence profiles based on clustered sequences with matching GO terms; potential binding sites are detected from sequence conservation. This method is capable of inferring the location of highly conserved small molecule binding sites, but its predictions may be questionable when the conservation of a site is caused by factors other than binding.
Transitive annotation of small molecule binding sites is also possible by detection of functional domains in the query protein sequence through BLAST heuristics and mapping the functionally important residues and/or features from the domain family members [30, 46].
There are a few other methods that directly detect small molecule binding sites via geometric analysis of protein structures. These methods include LIGSITEcsc, CAST, PASS, SURFNET, SCREEN, and ConCavity. All of these algorithms attempt to identify solvent-accessible pockets formed by surface residues on the protein and to rank those pockets (for example, by volume), so that the most highly ranked pockets can be assigned as the predicted (putative) small molecule binding sites. LIGSITEcsc, SURFNET, and ConCavity use a more complex ranking function that takes into account the conservation of binding site residues. These geometric methods are reasonably accurate, achieving success rates of 60-70% in correctly identifying small molecule binding sites. In their evaluation of LIGSITEcsc, the authors showed that their algorithm outperformed the other three methods on a test set of 48 structures. The SCREEN method identifies binding sites geometrically and also computes feature vectors that are used by machine learning techniques. SCREEN is included in a suite of powerful modeling tools for functional annotation.
Recently we have developed a new database and method called "IBIS" (Inferred Biomolecular Interaction Server, http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.html) which enables researchers to conveniently study biomolecular interactions that have been observed in protein structures and, through inference by homology, to formulate predictions/hypotheses for biomolecular interactions, even if the data for specific biomolecules are not available. Therefore, IBIS can be considered a resource for functional annotation of proteins that have relevant homologs in the PDB. An input protein sequence may or may not have a structure itself; if not, it is assigned to the most closely related structure(s) using BLAST. IBIS can identify and infer a protein's interaction partners together with the locations of the corresponding binding sites on the protein query. It provides annotations of binding sites for proteins, small molecules (chemicals), nucleic acids, peptides and ions. In this paper we describe the method used in IBIS to annotate protein-small molecule interactions. To ensure biological relevance of binding sites, IBIS clusters similar binding sites found in homologous proteins based on conservation of sequence and structure of the binding site residues. Binding sites which appear evolutionarily conserved among non-redundant sets of homologous proteins are given higher priority. Additionally, binding site clusters are validated by comparing them with available binding site annotations from a manually curated subset of the CDD database [55, 56], and sites with non-biological small molecules are excluded. After binding sites are clustered, position specific score matrices (PSSMs) are constructed from the corresponding binding site alignments. Together with other measures, the PSSMs are subsequently used to rank binding sites to assess how well they match the query, and to gauge the biological relevance of binding sites with respect to the query.
A critical difference between our method and others is that IBIS pays particular attention to ensuring the biological relevance of binding sites and the homology between the unknown query sequence and the known structures of protein complexes. Our method might miss some remote similarities that could be detected, for example, by FINDSITE, but in exchange IBIS's top-ranked annotations should be considered highly reliable. Unlike other methods, IBIS does not filter out similar structures to speed up the search process, but accounts for all structures so that interesting small molecule binding complexes are easily accessible. Our method derives the actual binding sites from observed structures and groups them to account for variations in the binding site residues due to differences in small molecule size and conformation. This is essential for proteins that are important drug targets, as they have often been co-crystallized with a great variety of inhibitors. The clustering (grouping) of binding sites by similarity is very important because it identifies the distinct binding modes and allows for an easier interpretation of the results, particularly given the great growth in the amount of structure data over the last several years. As we have shown, it is possible to do the clustering automatically and in a biologically meaningful way.
Annotation of protein chains with observed and inferred binding sites
One of the important features of this method is that it does not exclude redundant sequences bound to different small molecules. For example, to account for all specific interactions of the various drugs targeting the kinase ATP-binding site, it is imperative to consider all the protein sequences, even if they are identical.
We validated the IBIS method by comparing the obtained annotations to the manually curated CDD annotations and to other methods that use the geometry of binding pockets and/or the sequence conservation of binding sites. It should be mentioned that since the IBIS method is based on different types of structural evidence, the notion of false positives might not be valid in many cases.
Validation of the IBIS method using the Conserved Domain Database
To test the ability of our method to infer biologically relevant binding sites, a validation procedure was implemented using the manually curated Conserved Domain Database (CDD) alignments and the functional features recorded in it as a standard of truth. Manually curated functional site annotations in CDD have been extracted from the published literature or derived from manual interpretation of individual three-dimensional structures. Altogether, 49% of the proteins with observed small molecule binding sites have conserved small molecule binding site annotations in CDD, whereas over 55% of the proteins with inferred binding sites have at least one site overlapping a CDD-annotated binding site.
Because a number of proteins have no CDD annotations at all, an IBIS inferred binding site that does not overlap a CDD annotation may still be biologically relevant.
Validation of ranking scheme: discriminating between biological and non-biological chemicals
We used the same set of protein queries (604 chains) to evaluate our method using structures that contained both biological and non-biological small molecules (see Additional file 1 Table S1). Our goal is to assess how well our ranking scheme distinguishes between two groups of binding sites: those containing biological versus non-biological small molecules. An inferred binding site is deemed non-biological if all of its bound small molecules are non-biological. To assess the separation between the two groups, we applied linear discriminant analysis, which constructs a discriminant function that divides the parameter space into regions so as to separate the groups as distinctly as possible. The method computes the posterior probability of group membership for each observation and assigns the observation to the group with the highest probability. As a result, a classification matrix is produced, which gives the fraction of observations correctly assigned to each group by the discriminant function. In our case, a good classification would be indicated by high fractions of both correctly predicted biological binding site clusters and correctly predicted non-biological binding site clusters. We found that our method correctly classifies 87% of biological clusters and 85% of non-biological clusters.
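A minimal two-class linear discriminant analysis in pure Python is sketched below to make the posterior-probability and classification-matrix steps concrete. It assumes two hypothetical per-cluster features (for example, normalized conservation and sequence-PSSM scores) and toy data; the paper does not state which features or software were used for the discriminant analysis. The pooled (shared) covariance and priors proportional to group sizes are the standard LDA assumptions.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of 2-D feature vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(2)]

def pooled_covariance(groups, means):
    """Pooled within-group 2x2 covariance matrix, shared by all groups in LDA."""
    dof = sum(len(g) for g in groups) - len(groups)
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for g, m in zip(groups, means):
        for v in g:
            d = (v[0] - m[0], v[1] - m[1])
            for a in range(2):
                for b in range(2):
                    cov[a][b] += d[a] * d[b]
    return [[cov[a][b] / dof for b in range(2)] for a in range(2)]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def fit_lda(groups):
    """Return (group means, inverse pooled covariance, group priors)."""
    means = [mean_vector(g) for g in groups]
    inv = inverse_2x2(pooled_covariance(groups, means))
    total = sum(len(g) for g in groups)
    priors = [len(g) / total for g in groups]
    return means, inv, priors

def posteriors(x, model):
    """Posterior group probabilities from the linear discriminant scores."""
    means, inv, priors = model
    scores = []
    for m, prior in zip(means, priors):
        w = (inv[0][0] * m[0] + inv[0][1] * m[1],
             inv[1][0] * m[0] + inv[1][1] * m[1])  # inverse covariance times mean
        scores.append(x[0] * w[0] + x[1] * w[1]
                      - 0.5 * (m[0] * w[0] + m[1] * w[1]) + math.log(prior))
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # softmax, shifted for stability
    z = sum(exps)
    return [e / z for e in exps]

def classification_matrix(groups, model):
    """Row g: fraction of true group g assigned to each predicted group."""
    matrix = []
    for g in groups:
        counts = [0] * len(groups)
        for x in g:
            post = posteriors(x, model)
            counts[post.index(max(post))] += 1  # highest posterior wins
        matrix.append([c / len(g) for c in counts])
    return matrix

bio = [(2.0, 1.8), (2.2, 2.1), (1.9, 2.4), (2.5, 2.0)]     # toy "biological" clusters
nonbio = [(0.5, 0.4), (0.8, 0.9), (0.3, 0.7), (0.6, 0.2)]  # toy "non-biological" clusters
model = fit_lda([bio, nonbio])
print(classification_matrix([bio, nonbio], model))  # [[1.0, 0.0], [0.0, 1.0]]
```

The diagonal of the resulting matrix corresponds to the two fractions reported in the text (correctly classified biological and non-biological clusters); with this well-separated toy data both are 1.0.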
Validation of IBIS method by comparison with geometric methods
To further validate the prediction ability of our method, we compared it with several widely used geometry- and energy-based approaches discussed in a recent study, including LIGSITEcsc, PASS, Q-SiteFinder, and Surfnet. We used the 44 of the 48 proteins from this study that have structure homologs with at least 30% sequence identity and also have both small molecule-bound and unbound structures.
Prediction sensitivity (%) of the top three predictions by different geometric approaches and their comparison to IBIS.
All of these approaches, although they perform reasonably well, are limited by the need to differentiate true positives from false positives. Introducing sequence conservation does not necessarily improve prediction accuracy and could be a source of error, leading to overprediction of the binding site area. IBIS, on the other hand, predicts only a handful of small molecule binding sites with a high probability of being biologically relevant. On average, our method predicts 4 'biologically relevant' binding sites per protein chain, and over half of all predicted sites map to CDD curator annotations.
Knowledge-based docking using IBIS, an example
To demonstrate the effectiveness of IBIS as a knowledge-based prediction system, we compared our method with an established reverse docking approach. Cai and coworkers employed a reverse docking method to find a potential target protein for the natural product N-trans-caffeoyltyramine in the genome of Helicobacter pylori. Initially, all potential binding proteins of N-trans-caffeoyltyramine were screened from a database of potential drug targets with known structures from the Protein Data Bank using the reverse docking approach TarFisDock. Only two proteins from the H. pylori genome were found by the TarFisDock method: diaminopimelate decarboxylase (DC) and peptide deformylase (PDF). After enzymatic validation, only PDF was found to be a probable drug target. The crystal structure of the complex of N-trans-caffeoyltyramine with PDF suggested highly selective binding in the PDF binding pocket.
Recently, it was estimated that over two-thirds of all protein sequences in the GenBank database have at least one structure homolog [1, 2]. As the ongoing structural genomics initiative continues to close the sequence-structure gap, our method might be very useful for annotating proteins with unknown function and structure. Moreover, the location of putative binding sites provides guidance for protein docking methods in drug design. We have assessed the reliability of our method by direct comparison with binding site annotations from the literature and manual curation, and have shown that in the great majority of cases the method detects the manually annotated binding site cluster and ranks it first or second. This is achievable for several reasons: requiring a sufficient level of similarity between the unknown query and its homologs with known binding sites, accurately clustering small molecule binding sites using a reasonable similarity measure, and applying a deliberately designed ranking scheme that distinguishes non-biological from biologically relevant binding sites.
We have also compared our method with several widely used geometry- and energy-based approaches to predicting small molecule binding sites. As we have shown, the performance of our prediction method is comparable to that of popular geometric approaches. Moreover, one advantage of our method is that it can be applied even to a query sequence without a structure, which is not the case for binding site prediction methods that explicitly rely on specific features of binding pocket geometry.
Using remote homology for functional inference is often based on the general assumption that there is a positive correlation between small molecule binding site similarity and overall sequence similarity. However, the relationship is much more complicated, with many examples of strikingly similar binding sites with low (<30%) overall sequence identity and also of very weakly similar binding sites with high overall sequence identity. Likewise, similarities of small molecule binding sites across different protein folds, although providing new insights, lead to new challenges in deciphering their functional relevance. Large-scale automated function prediction methods are often limited by the lack of sufficient understanding of biological function and also by the quality of structure data. Hence, in the IBIS approach we strive to limit the false positive rate by employing a conservative threshold of at least 30% sequence identity over the structurally superimposed regions of homologs. Often, the crystal state of a protein-small molecule complex may correspond to a global minimum of free energy in which biologically relevant interactions are difficult to distinguish from non-specific contacts. For example, a recent estimate suggests that some 20% of dimeric structures in the PDB may be crystallization artifacts. The elaborate scoring scheme of our method, based on recurrence and evolutionary conservation, along with the list of non-biological small molecules, tends to de-emphasize artifactual interactions and ranks such sites near or at the bottom of the list.
Furthermore, the chemical properties of small molecules bound to the inferred binding sites can be used as a starting point in small molecule virtual screening. Mapping IBIS small molecules to the PubChem compound database accomplishes a preliminary step in small molecule virtual screening by clustering similar chemicals into structurally unique compounds. The functional groups of small molecules binding in a common binding site of evolutionarily related proteins are likely to be conserved. Recently it was shown that the sequence and structure conservation of binding site residues contacting these anchor functional groups is significantly higher than that of residues contacting variable regions. IBIS thus provides a common platform for function prediction, knowledge-based docking, and small molecule virtual screening.
Finding small molecule binding sites that specify protein function is of great importance in drug development. Here we proposed a method to decipher the function of an unknown protein by interlinking sequence conservation with structural diversity of its homologs. To facilitate validation of the inferred binding sites from homologs, we developed an elaborate scoring scheme that can accurately distinguish biologically relevant sites. The method has been implemented as a web server, IBIS (Inferred Biomolecular Interaction Server - http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi) to facilitate accurate, efficient and high-throughput function prediction.
We used the NCBI Molecular Modeling Database (MMDB) as a source of data on protein complexes. The automated MMDB processing of PDB files includes steps such as deposition of the protein sequences into GenBank, deposition of small molecules into PubChem, addition of corresponding links to these databases in the MMDB records, addition of links to citations and references in PubMed, and Entrez indexing for quick searching.
Below we describe different steps of processing, including defining observed interactions from structures, inferred interactions from homologs, clustering binding sites and their ranking in terms of biological relevance with respect to the query protein.
Defining observed interactions
In the current release of the Molecular Modeling Database (MMDB), there are about 28000 entries with bound small molecules, containing about 39000 small molecules bound to about 56000 protein chains in total. A small molecule is defined as any non-polypeptide, non-nucleic acid molecule in the structure complex, or any molecule with a sufficient number of non-standard amino acids or nucleotides and without an assigned GenBank identifier from NCBI. All the small molecules are standardized in the PubChem database and have valid substance and compound identifiers. In this work we do not consider small molecules with fewer than 5 heavy atoms or with a molecular weight outside the range of 70-800 Da. Small molecules such as metal ions often play a role as crystallization agents, and therefore ions are not considered in this paper.
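The size filters above can be expressed as a simple predicate. This is a hypothetical illustration, not IBIS code: the `SmallMolecule` record and its fields are assumptions, the 70-800 Da bounds are treated as inclusive (the text does not say), and ions are flagged explicitly rather than detected.

```python
from dataclasses import dataclass

MIN_HEAVY_ATOMS = 5           # molecules with fewer heavy atoms are dropped
MW_RANGE_DA = (70.0, 800.0)   # bounds treated as inclusive (assumption)

@dataclass
class SmallMolecule:
    """Hypothetical record for a bound small molecule."""
    name: str
    heavy_atoms: int
    mol_weight: float          # Da
    is_ion: bool = False

def passes_size_filters(mol: SmallMolecule) -> bool:
    """True if the molecule survives the ion, atom-count and weight filters."""
    if mol.is_ion:
        return False
    if mol.heavy_atoms < MIN_HEAVY_ATOMS:
        return False
    low, high = MW_RANGE_DA
    return low <= mol.mol_weight <= high

candidates = [
    SmallMolecule("ATP", 31, 507.18),
    SmallMolecule("acetate", 4, 59.04),
    SmallMolecule("Zn2+", 1, 65.38, is_ion=True),
    SmallMolecule("glycerol", 6, 92.09),
]
print([m.name for m in candidates if passes_size_filters(m)])  # ['ATP', 'glycerol']
```

Note that glycerol, a common cryoprotectant, passes these filters, which is why size cuts alone cannot remove all non-biological molecules.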
The filters by atom count and molecular weight only partially remove non-biological small molecules (i.e. buffers, salts, detergents, solvents, and ions added for the purpose of crystallization and/or purification). These non-biological molecules sometimes mimic natural small molecules and tend to bind in functional/active sites of proteins. For validation purposes we used a list of potential non-biological small molecules which has been collected from the literature (see Additional file 1 Table S1) [30, 66, 67].
We define a protein residue to be in contact with a small molecule if there is at least one (heavy) atom of the residue within 4.0 Å of some atom from the small molecule. For most pairs of atoms, this threshold corresponds to the sum of their van der Waals radii plus a tolerance of about 0.5 Å to allow for coordinate errors in structure determination. For manual curation of the Conserved Domain Database (CDD), a similar contact definition is used for defining protein-small molecule contacts. We retain only those protein-small molecule complexes which have at least five interacting protein residues. We define "binding site" as a set of residues on a given protein chain which are in contact with a given small molecule. Each MMDB entry is analyzed, and all pairs of biomolecules consisting of a protein chain and small molecule in contact with that chain are retained for further analysis.
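A brute-force sketch of this contact definition follows. It assumes atoms are given as simple tuples with a residue identifier, an element symbol and Cartesian coordinates (a hypothetical input format); protein hydrogens are skipped, and "within 4.0 Å" is interpreted as a distance of at most 4.0 Å.

```python
import math

CONTACT_CUTOFF = 4.0     # Å
MIN_SITE_RESIDUES = 5    # complexes with fewer contacting residues are discarded

def binding_site(protein_atoms, ligand_atoms, cutoff=CONTACT_CUTOFF):
    """Residues with at least one heavy atom within `cutoff` of any ligand atom.

    protein_atoms: iterable of (residue_id, element, (x, y, z))
    ligand_atoms:  iterable of (element, (x, y, z))
    """
    ligand_xyz = [xyz for _, xyz in ligand_atoms]
    site = set()
    for res_id, element, xyz in protein_atoms:
        if element == "H" or res_id in site:
            continue  # skip hydrogens and residues already known to be in contact
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_xyz):
            site.add(res_id)
    return site

def keep_complex(site, min_residues=MIN_SITE_RESIDUES):
    """Retain only complexes with at least five interacting residues."""
    return len(site) >= min_residues

protein = [(1, "C", (3.0, 0.0, 0.0)), (2, "N", (0.0, 3.9, 0.0)),
           (3, "O", (0.0, 0.0, 4.5)), (4, "H", (2.0, 0.0, 0.0)),
           (4, "C", (6.0, 0.0, 0.0))]
ligand = [("C", (0.0, 0.0, 0.0))]
site = binding_site(protein, ligand)
print(sorted(site), keep_complex(site))  # [1, 2] False
```

Residue 4 is excluded because only its hydrogen lies within the cutoff. The all-pairs loop is quadratic; for whole structures a spatial grid or k-d tree would normally be used instead.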
It should be mentioned that a small molecule can be bound to a single domain or multiple domains which could come from more than one protein chain in the PDB record. Almost half of all the small molecules in the PDB are bound by more than one domain, with fewer than 75% of all contacts made to any single domain. However, using domains as structural units would necessitate automatic domain decomposition methods in many cases [68, 69], and the domain boundaries chosen could affect the results. To circumvent the potential technical difficulties in using domains as the structural unit in recording the observed/physical interactions, we use only complete protein chains for defining protein-small molecule interactions. Small molecules binding to multiple protein chains entail even more technical difficulties. For example, simultaneous superposition of multiple chains would need to be checked to ensure similarity of binding sites. Therefore, when multiple chains are involved in a binding site, if one of the chains includes 75% or more of the contacts, then we define only one binding site and assign it only to that particular chain. Otherwise, we define separate binding sites on each of the chains. The latter situation is relatively rare, as only about 15% of the proteins in the current PDB release have small molecule interactions that fall into this category.
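The 75% rule for small molecules that contact several chains can be sketched as below. Here "contacts" are assumed to be counted as contacting residues per chain, which the text does not state explicitly; the function name and input format are hypothetical.

```python
from collections import Counter

DOMINANT_FRACTION = 0.75

def assign_binding_sites(contacts):
    """Split one small molecule's contacts into per-chain binding sites.

    contacts: list of (chain_id, residue_id) pairs in contact with the molecule.
    Returns {chain_id: set of residue_ids}.
    """
    if not contacts:
        return {}
    per_chain = Counter(chain for chain, _ in contacts)
    chain, count = per_chain.most_common(1)[0]
    if count / len(contacts) >= DOMINANT_FRACTION:
        # One dominant chain: a single binding site, assigned to that chain only
        return {chain: {res for c, res in contacts if c == chain}}
    sites = {}
    for c, res in contacts:  # otherwise, a separate binding site on each chain
        sites.setdefault(c, set()).add(res)
    return sites

mostly_a = [("A", i) for i in range(8)] + [("B", 1), ("B", 2)]  # 80% on chain A
shared = [("A", 1), ("A", 2), ("A", 3), ("B", 5), ("B", 6)]     # 60% on chain A
for contacts in (mostly_a, shared):
    sites = assign_binding_sites(contacts)
    print({c: sorted(r) for c, r in sites.items()})
```

The first call yields a single site on chain A (the two chain-B residues are dropped); the second yields separate sites on chains A and B.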
Inferring interactions from homologs
1) Collecting homologs with bound small molecules
To infer interactions based on homology, we collect proteins which are structurally similar to a given query protein and have at least 30% sequence identity to the query (we refer to them as "homologous structure neighbors"). Structure neighbors for all PDB/MMDB entries have been pre-computed by the VAST algorithm and stored in the PubVast database. Then we retrieve observed interactions for all structure neighbors (including the query protein). No sequence redundancy filter is applied to remove structures because there are often many structures of the same protein with different bound small molecules, and we may wish to study any of these cases. Since the alignments may contain gaps, we retain only those instances where at least 75% of the residue contacts with the small molecule occur within the structure alignment footprint of the query and neighbor.
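The two homology filters in this step (at least 30% sequence identity, and at least 75% of the neighbor's binding site residues inside the alignment footprint) are sketched below together with the transfer of the site onto query positions. The input format (0-based aligned residue index pairs from a structure alignment such as VAST) and the choice to compute identity over aligned pairs only are assumptions for illustration; `infer_site` is a hypothetical name.

```python
MIN_IDENTITY = 0.30    # minimum sequence identity to the query
MIN_FOOTPRINT = 0.75   # fraction of site residues inside the alignment footprint

def infer_site(query_seq, neighbor_seq, aligned_pairs, neighbor_site):
    """Transfer a neighbor's binding site onto the query, or return None.

    aligned_pairs: (query_index, neighbor_index) pairs, 0-based (hypothetical format).
    neighbor_site: set of neighbor residue indices contacting the small molecule.
    """
    if not aligned_pairs or not neighbor_site:
        return None
    # Sequence identity computed over the structurally aligned positions only
    matches = sum(query_seq[q] == neighbor_seq[n] for q, n in aligned_pairs)
    if matches / len(aligned_pairs) < MIN_IDENTITY:
        return None
    neighbor_to_query = {n: q for q, n in aligned_pairs}
    mapped = {neighbor_to_query[r] for r in neighbor_site if r in neighbor_to_query}
    # Require most of the binding site to fall inside the alignment footprint
    if len(mapped) / len(neighbor_site) < MIN_FOOTPRINT:
        return None
    return mapped

query, neighbor = "ACDEFGHIKL", "ACDEYGHIKM"
full = [(i, i) for i in range(10)]
print(infer_site(query, neighbor, full, {2, 3, 5, 6}))      # site transferred
print(infer_site(query, neighbor, full[:5], {2, 3, 5, 6}))  # None: only 2 of 4 aligned
```

In the second call the aligned region still has 80% identity, but only half of the site lies in the footprint, so the site is not transferred.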
2) Measuring binding site similarity
3) Clustering of binding sites
Binding sites are clustered by optimizing an objective function in which T is the temperature factor, S(i, j) is the similarity score between binding site i and binding site j in each cluster, C represents a cluster, |C| is the number of binding sites in cluster C, and N is the total number of binding sites clustered. The temperature T is a constant parameter that is chosen so as to correctly balance the energy-like and entropy-like terms in the function.
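The defining equation is not reproduced in this copy of the text. Purely as an illustration, the sketch below assumes the hard-partition form of the information-based clustering functional of Slonim et al. (PNAS 2005), which uses exactly the symbols listed above: F = (1/N) Σ_C (1/|C|) Σ_{i,j∈C} S(i, j) + T Σ_C (|C|/N) log(|C|/N). The first, energy-like term rewards high within-cluster similarity; the second, entropy-like term penalizes splitting the sites into many clusters. Whether IBIS uses this exact form is an assumption, and the code only scores a given partition; it does not search for the best one.

```python
import math

def clustering_score(clusters, similarity, temperature):
    """Score a hard partition of binding sites (higher is better).

    Assumed form (information-based clustering, hard assignments):
      F = (1/N) * sum_C (1/|C|) * sum_{i,j in C} S(i, j)
          + T * sum_C (|C|/N) * log(|C|/N)
    clusters: list of lists of binding-site indices
    similarity: callable S(i, j)
    """
    n = sum(len(c) for c in clusters)
    energy, entropy_term = 0.0, 0.0
    for c in clusters:
        size = len(c)
        # energy-like term: within-cluster similarity, normalized by cluster size
        energy += sum(similarity(i, j) for i in c for j in c) / size
        # entropy-like term: p * log(p) <= 0, so many small clusters are penalized
        p = size / n
        entropy_term += p * math.log(p)
    return energy / n + temperature * entropy_term

def toy_similarity(i, j):
    """Hypothetical similarity: sites {0, 1} and {2, 3} form two tight groups."""
    return 1.0 if (i < 2) == (j < 2) else 0.0

for partition in ([[0, 1], [2, 3]], [[0, 1, 2, 3]], [[0], [1], [2], [3]]):
    print(partition, round(clustering_score(partition, toy_similarity, 0.1), 3))
```

With T = 0.1 the two-group partition scores highest: merging everything lowers the energy-like term, while splitting into singletons is penalized by the entropy-like term.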
Biological relevance of binding sites and their ranking with respect to the query protein
All binding site clusters are ranked in terms of their predicted biological relevance and similarity to the query. First we assess the evolutionary conservation of binding site clusters. Sites which recur in sufficiently diverse protein complexes are ranked higher. Clusters that have only one non-redundant member (after members with more than 90% identity are removed) are considered "singletons" and are not assigned any score (they are ranked at the bottom of the list). A "conservation score" is computed in order to measure the diversity of cluster members and how well the binding site is conserved across the homologs. To do this, positional conservation in the binding site multiple sequence alignment is calculated using the Shannon entropy measure with the Henikoff-Henikoff sequence weights. Sequence weights are estimated using the complete sequences of neighbors aligned with the query protein.
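The conservation measure can be sketched as follows. The Henikoff position-based weights and the weighted Shannon entropy are standard; how IBIS combines per-column entropies into one conservation score is not stated, so averaging log(20) - H over the binding site columns is an assumption, as is skipping gaps when computing column frequencies.

```python
import math
from collections import Counter

def henikoff_weights(alignment):
    """Position-based weights (Henikoff & Henikoff 1994), normalized to sum 1.

    alignment: equal-length aligned sequences; '-' is counted as a symbol here.
    Each column gives a sequence 1 / (r * s), where r is the number of distinct
    symbols in the column and s is the count of that sequence's symbol.
    """
    weights = [0.0] * len(alignment)
    for pos in range(len(alignment[0])):
        column = [seq[pos] for seq in alignment]
        counts = Counter(column)
        for k, aa in enumerate(column):
            weights[k] += 1.0 / (len(counts) * counts[aa])
    total = sum(weights)
    return [w / total for w in weights]

def weighted_entropy(column, weights):
    """Shannon entropy (natural log) of one column, ignoring gaps."""
    freq = Counter()
    for aa, w in zip(column, weights):
        if aa != "-":
            freq[aa] += w
    total = sum(freq.values())
    if total == 0:
        return 0.0
    return -sum(f / total * math.log(f / total) for f in freq.values())

def conservation_score(alignment, site_columns):
    """Mean of log(20) - H over binding site columns (assumed combination)."""
    weights = henikoff_weights(alignment)  # computed from the full aligned sequences
    entropies = [weighted_entropy([seq[c] for seq in alignment], weights)
                 for c in site_columns]
    return sum(math.log(20) - h for h in entropies) / len(entropies)

alignment = ["ACDEG", "ACDFG", "ACKEA"]
print(henikoff_weights(alignment))
print(conservation_score(alignment, [0, 1]), conservation_score(alignment, [2, 4]))
```

Fully conserved columns reach the maximum value log(20); variable columns score lower. Down-weighting near-duplicate sequences keeps a cluster dominated by many copies of one protein from appearing artificially conserved.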
To account for evolutionary closeness of a given binding site cluster to the query we use the sequence-PSSM score and the average sequence identity between the query and all cluster members calculated over the whole structure-structure alignment (not just binding sites). A position specific score matrix (PSSM) is constructed based on the binding site multiple alignment using the implicit pseudo-count method of Gribskov, McLachlan and Eisenberg . The aligned binding site region of the query protein is then scored against the PSSM and a sequence-PSSM score is calculated. A higher sequence-PSSM score points to a higher probability of this site being a biologically relevant site for the query.
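A sketch of a Gribskov-style profile follows, in which each PSSM entry is the frequency-weighted average of substitution scores, M(p, a) = Σ_b f(p, b) Y(b, a). Because every residue type inherits a score through the substitution matrix Y, residues never observed in a column still receive finite scores, which is the implicit pseudo-count effect. The three-letter alphabet and matrix values are a toy table chosen for illustration, not BLOSUM or Dayhoff values, and unweighted column frequencies are used for brevity (sequence weights could be substituted).

```python
from collections import Counter

# Toy substitution scores for a three-letter alphabet (illustrative values only)
ALPHABET = "DEK"
TOY_SUBST = {
    ("D", "D"): 6, ("D", "E"): 2, ("D", "K"): -1,
    ("E", "D"): 2, ("E", "E"): 5, ("E", "K"): 1,
    ("K", "D"): -1, ("K", "E"): 1, ("K", "K"): 5,
}

def gribskov_pssm(site_alignment, subst=TOY_SUBST, alphabet=ALPHABET):
    """Profile M[p][a] = sum_b f(p, b) * Y(b, a), with f the column frequencies.

    Unobserved residue types still get finite scores via Y (implicit pseudo-counts).
    """
    pssm = []
    for p in range(len(site_alignment[0])):
        counts = Counter(seq[p] for seq in site_alignment if seq[p] != "-")
        n = sum(counts.values())
        pssm.append({a: sum(c / n * subst[(b, a)] for b, c in counts.items())
                     for a in alphabet})
    return pssm

def sequence_pssm_score(pssm, query_site):
    """Sum of profile scores for the query's aligned binding site residues."""
    return sum(row[a] for row, a in zip(pssm, query_site) if a != "-")

site_alignment = ["DE", "DE", "DK"]
pssm = gribskov_pssm(site_alignment)
print(sequence_pssm_score(pssm, "DE"), sequence_pssm_score(pssm, "KD"))
```

A query whose binding site residues resemble the cluster members ("DE", about 9.67) scores well above a dissimilar one ("KD", about 0), even though K never occurs in the first column.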
To rank the larger interfaces more highly we also calculate the average number of interfacial contacts which the binding site makes in the complex of the corresponding homolog. All components of the ranking score are then normalized and all clusters are ranked with respect to the Z-scores. Any cluster with all members binding non-biological small molecules is disregarded.
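Finally, the ranking step can be sketched as below. The text says that the components are normalized and clusters ranked by Z-score; summing the per-component Z-scores with equal weights, and placing singletons after all scored clusters in input order, are assumptions. The dictionary keys are hypothetical names for the four components described above (conservation score, sequence-PSSM score, average sequence identity, and average number of interfacial contacts).

```python
from statistics import mean, pstdev

# Hypothetical keys for the four ranking components described above
COMPONENTS = ("conservation", "pssm_score", "avg_identity", "avg_contacts")

def rank_clusters(clusters):
    """Return cluster ids ordered best first.

    clusters: list of dicts with an 'id', the COMPONENTS scores, and boolean
    'singleton' and 'all_nonbiological' flags.
    """
    kept = [c for c in clusters if not c["all_nonbiological"]]  # others disregarded
    scored = [c for c in kept if not c["singleton"]]
    totals = {c["id"]: 0.0 for c in scored}
    if scored:
        for comp in COMPONENTS:
            values = [c[comp] for c in scored]
            mu, sd = mean(values), pstdev(values)
            for c in scored:
                # Z-score per component; equal weighting is an assumption
                totals[c["id"]] += (c[comp] - mu) / sd if sd > 0 else 0.0
    ranked = sorted(scored, key=lambda c: totals[c["id"]], reverse=True)
    singletons = [c["id"] for c in kept if c["singleton"]]  # bottom of the list
    return [c["id"] for c in ranked] + singletons

def cluster(cid, scores, singleton=False, nonbio=False):
    """Helper to build a toy cluster record."""
    return dict(zip(COMPONENTS, scores), id=cid,
                singleton=singleton, all_nonbiological=nonbio)

toy = [cluster("c1", (2.0, 10.0, 0.8, 12)), cluster("c2", (1.0, 5.0, 0.5, 8)),
       cluster("c3", (1.5, 7.0, 0.6, 10)),
       cluster("s1", (0, 0, 0, 0), singleton=True),
       cluster("n1", (3.0, 12.0, 0.9, 15), nonbio=True)]
print(rank_clusters(toy))  # ['c1', 'c3', 'c2', 's1']
```

The non-biological cluster "n1" is dropped even though its raw scores are the highest, mirroring the rule that clusters whose members all bind non-biological small molecules are disregarded.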
The authors are grateful to Aron Marchler-Bauer, Dachuan Zhang, and Jessica Fong. This research was supported by the Intramural Research Program of the NIH, National Library of Medicine.
- Wang Y, Addess KJ, Chen J, Geer LY, He J, He S, Lu S, Madej T, Marchler-Bauer A, Thiessen PA, et al.: MMDB: annotating protein sequences with Entrez's 3D-structure database. Nucleic Acids Res 2007, (35 Database):D298–300. 10.1093/nar/gkl952
- Fukuchi S, Homma K, Sakamoto S, Sugawara H, Tateno Y, Gojobori T, Nishikawa K: The GTOP database in 2009: updated content and novel features to expand and deepen insights into protein structures and functions. Nucleic Acids Res 2009, (37 Database):D333–337. 10.1093/nar/gkn855
- Bork P, Koonin EV: Predicting functions from protein sequences--where are the bottlenecks? Nat Genet 1998, 18(4):313–318. 10.1038/ng0498-313
- Gerlt JA, Babbitt PC: Can sequence determine function? Genome Biol 2000, 1(5):REVIEWS0005. 10.1186/gb-2000-1-5-reviews0005
- Hegyi H, Gerstein M: The relationship between protein structure and function: a comprehensive survey with application to the yeast genome. J Mol Biol 1999, 288(1):147–164. 10.1006/jmbi.1999.2661
- Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, Vidal M, Gerstein M: Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. Genome Res 2004, 14(6):1107–1118. 10.1101/gr.1774904
- Capra JA, Singh M: Predicting functionally important residues from sequence conservation. Bioinformatics 2007, 23(15):1875–1882. 10.1093/bioinformatics/btm270
- Zhang T, Zhang H, Chen K, Shen S, Ruan J, Kurgan L: Accurate sequence-based prediction of catalytic residues. Bioinformatics 2008, 24(20):2329–2338. 10.1093/bioinformatics/btn433
- Fischer JD, Mayer CE, Soding J: Prediction of protein functional residues from sequence by probability density estimation. Bioinformatics 2008, 24(5):613–620. 10.1093/bioinformatics/btm626
- Burgoyne NJ, Jackson RM: Predicting protein interaction sites: binding hot-spots in protein-protein and protein-ligand interfaces. Bioinformatics 2006, 22(11):1335–1342. 10.1093/bioinformatics/btl079
- Ota M, Kinoshita K, Nishikawa K: Prediction of catalytic residues in enzymes based on known tertiary structure, stability profile, and sequence conservation. J Mol Biol 2003, 327(5):1053–1064. 10.1016/S0022-2836(03)00207-9
- Liang S, Zhang C, Liu S, Zhou Y: Protein binding site prediction using an empirical scoring function. Nucleic Acids Res 2006, 34(13):3698–3707. 10.1093/nar/gkl454
- Campbell SJ, Gold ND, Jackson RM, Westhead DR: Ligand binding: functional site location, similarity and docking. Curr Opin Struct Biol 2003, 13(3):389–395. 10.1016/S0959-440X(03)00075-7
- Thibert B, Bredesen DE, del Rio G: Improved prediction of critical residues for protein function based on network and phylogenetic analyses. BMC Bioinformatics 2005, 6:213. 10.1186/1471-2105-6-213
- Bray T, Chan P, Bougouffa S, Greaves R, Doig AJ, Warwicker J: SitesIdentify: a protein functional site prediction tool. BMC Bioinformatics 2009, 10(1):379. 10.1186/1471-2105-10-379
- Brylinski M, Prymula K, Jurkowski W, Kochanczyk M, Stawowczyk E, Konieczny L, Roterman I: Prediction of functional sites based on the fuzzy oil drop model. PLoS Comput Biol 2007, 3(5):e94. 10.1371/journal.pcbi.0030094
- Brylinski M, Skolnick J: A threading-based method (FINDSITE) for ligand-binding site prediction and functional annotation. Proc Natl Acad Sci USA 2008, 105(1):129–134. 10.1073/pnas.0707684105
- Jones S, Thornton JM: Analysis of protein-protein interaction sites using surface patches. J Mol Biol 1997, 272(1):121–132. 10.1006/jmbi.1997.1234
- Landgraf R, Xenarios I, Eisenberg D: Three-dimensional cluster analysis identifies interfaces and functional residue clusters in proteins. J Mol Biol 2001, 307(5):1487–1502. 10.1006/jmbi.2001.4540
- Pazos F, Sternberg MJ: Automated prediction of protein function and detection of functional sites from structure. Proc Natl Acad Sci USA 2004, 101(41):14754–14759. 10.1073/pnas.0404569101
- Teichmann SA, Murzin AG, Chothia C: Determination of protein function, evolution and interactions by structural genomics. Curr Opin Struct Biol 2001, 11(3):354–363. 10.1016/S0959-440X(00)00215-3
- Panchenko AR, Kondrashov F, Bryant S: Prediction of functional sites by analysis of sequence and structure conservation. Protein Sci 2004, 13(4):884–892. 10.1110/ps.03465504
- Bartlett GJ, Porter CT, Borkakoti N, Thornton JM: Analysis of catalytic residues in enzyme active sites. J Mol Biol 2002, 324(1):105–121. 10.1016/S0022-2836(02)01036-7
- Bate P, Warwicker J: Enzyme/non-enzyme discrimination and prediction of enzyme active site location using charge-based methods. J Mol Biol 2004, 340(2):263–276. 10.1016/j.jmb.2004.04.070
- Greaves R, Warwicker J: Active site identification through geometry-based and sequence profile-based calculations: burial of catalytic clefts. J Mol Biol 2005, 349(3):547–557. 10.1016/j.jmb.2005.04.018View ArticlePubMed
- Marti-Renom MA, Rossi A, Al-Shahrour F, Davis FP, Pieper U, Dopazo J, Sali A: The AnnoLite and AnnoLyze programs for comparative annotation of protein structures. BMC Bioinformatics 2007, 8(Suppl 4):S4. 10.1186/1471-2105-8-S4-S4View ArticlePubMedPubMed Central
- Chelliah V, Chen L, Blundell TL, Lovell SC: Distinguishing structural and functional restraints in evolution in order to identify interaction sites. J Mol Biol 2004, 342(5):1487–1504. 10.1016/j.jmb.2004.08.022View ArticlePubMed
- Laurie AT, Jackson RM: Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bioinformatics 2005, 21(9):1908–1916. 10.1093/bioinformatics/bti315View ArticlePubMed
- Huang B, Schroeder M: LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation. BMC Struct Biol 2006, 6: 19. 10.1186/1472-6807-6-19View ArticlePubMedPubMed Central
- Snyder KA, Feldman HJ, Dumontier M, Salama JJ, Hogue CW: Domain-based small molecule binding site annotation. BMC Bioinformatics 2006, 7: 152. 10.1186/1471-2105-7-152View ArticlePubMedPubMed Central
- Lopez G, Valencia A, Tress ML: firestar--prediction of functionally important residues using structural templates and alignment reliability. Nucleic Acids Res 2007, (35 Web Server):W573–577. 10.1093/nar/gkm297
- Qin S, Zhou HX: meta-PPISP: a meta web server for protein-protein interaction site prediction. Bioinformatics 2007, 23(24):3386–3387. 10.1093/bioinformatics/btm434
- Skolnick J, Brylinski M: FINDSITE: a combined evolution/structure-based approach to protein function prediction. Brief Bioinform 2009.
- Hernandez M, Ghersi D, Sanchez R: SITEHOUND-web: a server for ligand binding site identification in protein structures. Nucleic Acids Res 2009, (37 Web Server):W413–416. 10.1093/nar/gkp281
- Talavera D, Laskowski RA, Thornton JM: WSsas: a web service for the annotation of functional residues through structural homologues. Bioinformatics 2009, 25(9):1192–1194. 10.1093/bioinformatics/btp116
- Ivanisenko VA, Pintus SS, Grigorovich DA, Kolchanov NA: PDBSiteScan: a program for searching for active, binding and posttranslational modification sites in the 3D structures of proteins. Nucleic Acids Res 2004, (32 Web Server):W549–554. 10.1093/nar/gkh439
- Chang DT, Weng YZ, Lin JH, Hwang MJ, Oyang YJ: Protemot: prediction of protein binding sites with automatically extracted geometrical templates. Nucleic Acids Res 2006, (34 Web Server):W303–309. 10.1093/nar/gkl344
- Jambon M, Andrieu O, Combet C, Deleage G, Delfaud F, Geourjon C: The SuMo server: 3D search for protein functional sites. Bioinformatics 2005, 21(20):3929–3930. 10.1093/bioinformatics/bti645
- Shulman-Peleg A, Nussinov R, Wolfson HJ: SiteEngines: recognition and comparison of binding sites and protein-protein interfaces. Nucleic Acids Res 2005, (33 Web Server):W337–341. 10.1093/nar/gki482
- Brylinski M, Skolnick J: FINDSITE: a threading-based approach to ligand homology modeling. PLoS Comput Biol 2009, 5(6):e1000405. 10.1371/journal.pcbi.1000405
- Wilson CA, Kreychman J, Gerstein M: Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. J Mol Biol 2000, 297(1):233–249. 10.1006/jmbi.2000.3550
- Rost B: Enzyme function less conserved than anticipated. J Mol Biol 2002, 318(2):595–608. 10.1016/S0022-2836(02)00016-5
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389–3402. 10.1093/nar/25.17.3389
- Porter CT, Bartlett GJ, Thornton JM: The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res 2004, (32 Database):D129–133. 10.1093/nar/gkh028
- Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al.: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 2004, (32 Database):D258–261.
- Marchler-Bauer A, Bryant SH: CD-Search: protein domain annotations on the fly. Nucleic Acids Res 2004, (32 Web Server):W327–331. 10.1093/nar/gkh454
- Liang J, Edelsbrunner H, Woodward C: Anatomy of protein pockets and cavities: measurement of binding site geometry and implications for ligand design. Protein Sci 1998, 7(9):1884–1897. 10.1002/pro.5560070905
- Brady GP Jr, Stouten PF: Fast prediction and visualization of protein binding pockets with PASS. J Comput Aided Mol Des 2000, 14(4):383–401. 10.1023/A:1008124202956
- Laskowski RA: SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J Mol Graph 1995, 13(5):323–330. 10.1016/0263-7855(95)00073-9
- Nayal M, Honig B: On the nature of cavities on protein surfaces: application to the identification of drug-binding sites. Proteins 2006, 63(4):892–906. 10.1002/prot.20897
- Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA: Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput Biol 2009, accepted for publication.
- Petrey D, Fischer M, Honig B: Structural relationships among proteins with different global topologies and their implications for function annotation strategies. Proc Natl Acad Sci USA 2009, 106(41):17377–17382. 10.1073/pnas.0907971106
- Shoemaker BA, Zhang D, Thangudu RR, Tyagi M, Fong JH, Marchler-Bauer A, Bryant SH, Madej T, Panchenko AR: Inferred Biomolecular Interaction Server--a web server to analyze and predict protein interacting partners and binding sites. Nucleic Acids Res 2010, (38 Database):D518–524. 10.1093/nar/gkp842
- Berman H, Henrick K, Nakamura H, Markley JL: The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res 2007, (35 Database):D301–303. 10.1093/nar/gkl971
- Marchler-Bauer A, Anderson JB, DeWeese-Scott C, Fedorova ND, Geer LY, He S, Hurwitz DI, Jackson JD, Jacobs AR, Lanczycki CJ, et al.: CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res 2003, 31(1):383–387. 10.1093/nar/gkg087
- Marchler-Bauer A, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR, Gwadz M, et al.: CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Res 2009, (37 Database):D205–210. 10.1093/nar/gkn845
- Huang B: MetaPocket: a meta approach to improve protein ligand binding site prediction. OMICS 2009, 13(4):325–330. 10.1089/omi.2009.0045
- Wass MN, Sternberg MJE: Prediction of ligand binding sites using homologous structures and conservation at CASP8. Proteins 2009, 77(S9):147–151.
- Cai J, Han C, Hu T, Zhang J, Wu D, Wang F, Liu Y, Ding J, Chen K, Yue J, et al.: Peptide deformylase is a potential target for anti-Helicobacter pylori drugs: reverse docking, enzymatic assay, and X-ray crystallography validation. Protein Sci 2006, 15(9):2071–2081. 10.1110/ps.062238406
- Li H, Gao Z, Kang L, Zhang H, Yang K, Yu K, Luo X, Zhu W, Chen K, Shen J, et al.: TarFisDock: a web server for identifying drug targets with docking approach. Nucleic Acids Res 2006, (34 Web Server):W219–224. 10.1093/nar/gkl114
- Kinjo AR, Nakamura H: Comprehensive structural classification of ligand-binding motifs in proteins. Structure 2009, 17(2):234–246. 10.1016/j.str.2008.11.009
- Krissinel E: Crystal contacts as nature's docking solutions. J Comput Chem 31(1):133–143. 10.1002/jcc.21303
- Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Bryant SH: PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res 2009, (37 Web Server):W623–633. 10.1093/nar/gkp456
- Chen J, Anderson JB, DeWeese-Scott C, Fedorova ND, Geer LY, He S, Hurwitz DI, Jackson JD, Jacobs AR, Lanczycki CJ, et al.: MMDB: Entrez's 3D-structure database. Nucleic Acids Res 2003, 31(1):474–477. 10.1093/nar/gkg086
- Sayers EW, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, et al.: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2009, (37 Database):D5–15. 10.1093/nar/gkn741
- Bashton M, Nobeli I, Thornton JM: Cognate ligand domain mapping for enzymes. J Mol Biol 2006, 364(4):836–852. 10.1016/j.jmb.2006.09.041
- Wang R, Fang X, Lu Y, Yang CY, Wang S: The PDBbind database: methodologies and updates. J Med Chem 2005, 48(12):4111–4119. 10.1021/jm048957q
- Koczyk G, Berezovsky IN: Domain Hierarchy and closed Loops (DHcL): a server for exploring hierarchy of protein domain structure. Nucleic Acids Res 2008, (36 Web Server):W239–245. 10.1093/nar/gkn326
- Holland TA, Veretnik S, Shindyalov IN, Bourne PE: Partitioning protein structures into domains: why is it so difficult? J Mol Biol 2006, 361(3):562–590. 10.1016/j.jmb.2006.05.060
- Madej T, Gibrat JF, Bryant SH: Threading a database of protein cores. Proteins 1995, 23(3):356–369. 10.1002/prot.340230309
- Slonim N, Atwal GS, Tkacik G, Bialek W: Information-based clustering. Proc Natl Acad Sci USA 2005, 102(51):18297–18302. 10.1073/pnas.0507432102
- Gribskov M, McLachlan AD, Eisenberg D: Profile analysis: detection of distantly related proteins. Proc Natl Acad Sci USA 1987, 84(13):4355–4358. 10.1073/pnas.84.13.4355
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.