SuperLigands – a database of ligand structures derived from the Protein Data Bank
© Michalsky et al; licensee BioMed Central Ltd. 2005
Received: 03 February 2005
Accepted: 19 May 2005
Published: 19 May 2005
Currently, the PDB contains approximately 29,000 protein structures comprising over 70,000 experimentally determined three-dimensional structures of over 5,000 different low molecular weight compounds. Information about these PDB ligands can be very helpful in the field of molecular modelling and prediction, particularly for the prediction of protein binding sites and function.
Here we present an Internet accessible database delivering PDB ligands in the MDL Mol file format which, in contrast to the PDB format, includes information about bond types. Structural similarity of the compounds can be detected by calculation of Tanimoto coefficients and by three-dimensional superposition. Topological similarity of PDB ligands to known drugs can be assessed via Tanimoto coefficients.
SuperLigands supplements the set of existing resources of information about small molecules bound to PDB structures. Allowing for three-dimensional comparison of the compounds as a novel feature, this database represents a valuable means of analysis and prediction in the field of biological and medical research.
Protein modelling and structure prediction as well as binding and interaction prediction have become very valuable instruments for researchers in biology and medicine. In order to build reasonable and useful models, as much information as possible has to be incorporated into the protein modelling process. To refine protein models, chemical as well as spatial information about ligand structures can be considered, specifically to optimise side-chain conformations around binding sites (Evers et al.).
Several databases delivering structures and additional information about ligand molecules from the Protein Data Bank (PDB; Berman et al.) have been provided on the Internet. Ligand Depot (Feng et al.) comprises chemical and structural information for small molecules found in the PDB and also provides a graphical interface for performing chemical substructure searches. Idealized three-dimensional structures and additional information about PDB ligands can be retrieved via the search interface of the E-MSD macromolecular structure relational database (Boutselakis et al.). Besides many other features, Relibase (Hendlich et al.) allows for two-dimensional similarity and substructure searches among the ligands as well as for sequence similarity searches among the corresponding proteins. LigBase (Stuart et al.) is a database of ligand binding sites aligned with related protein structures and sequences. A variety of information about ligands bound to macromolecules deposited in the PDB can be retrieved from further sources such as HIC-Up (Kleywegt and Jones), PDBsum (Laskowski) and the IMB Jena Image Library of Biological Macromolecules (Reichert and Sühnel). The latter can be searched by geometrical properties of the ligand binding sites.
Information contained in these databases can help identify ligands which are likely to bind to a given protein structure. The opposite question, namely how to find target proteins for a certain ligand, was addressed by Paul et al., who extracted a collection of protein active sites from the PDB and scanned it with the aid of a docking algorithm. Further data collections emphasize the link between binding affinities and structures of protein-ligand complexes and, inter alia, provide experimentally measured binding data, e.g. PLD (Puvanendrampillai and Mitchell), LPDB (Roche et al.) and PDBbind (Wang et al.).
For modelling and simulation purposes, chemical and spatial information about protein ligands is vitally important. Addressing this fact, SuperLigands is a collection of small molecule structures contained in the PDB, facilitating comparison of the molecules regarding their two-dimensional similarity. As spatial comparison of compounds can deliver valuable information in addition to this, SuperLigands also allows for three-dimensional superpositions. Spatial coordinates of the compounds can be retrieved as MDL Mol files, which include information about multiple bonds.
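To illustrate why the Mol format matters here, the following minimal Python sketch reads the bond block of a V2000 Mol file and returns the bond types that PDB connectivity records lack. It is an illustration only, not part of SuperLigands: the function names, the toy three-atom fragment and its coordinates are assumptions made for the example, and only the fixed-width counts, atom and bond fields of the V2000 layout are handled.

```python
def parse_molfile_bonds(molfile_text):
    """Return (atom_symbols, bonds) from a V2000 Mol block.

    bonds is a list of (atom_1, atom_2, bond_type) with 1-based atom
    indices; bond_type 1 = single, 2 = double, 3 = triple, 4 = aromatic.
    """
    lines = molfile_text.splitlines()
    counts = lines[3]                      # fourth line is the counts line
    n_atoms = int(counts[0:3])             # columns 1-3: number of atoms
    n_bonds = int(counts[3:6])             # columns 4-6: number of bonds
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    # Element symbol sits in columns 32-34 of each atom line.
    symbols = [line[31:34].strip() for line in atom_lines]
    # Bond lines carry three 3-character fields: atom 1, atom 2, bond type.
    bonds = [(int(b[0:3]), int(b[3:6]), int(b[6:9])) for b in bond_lines]
    return symbols, bonds


def atom_line(x, y, z, symbol):
    """Format one fixed-width V2000 atom line (hypothetical helper)."""
    return "%10.4f%10.4f%10.4f %-3s 0  0  0  0  0  0  0  0  0  0  0  0" % (
        x, y, z, symbol)


# Hypothetical toy fragment C-C=O, built programmatically so that the
# fixed column widths are guaranteed to be correct.
EXAMPLE_MOL = "\n".join([
    "toy fragment",                        # line 1: molecule name
    "  hand-written sketch",               # line 2: program line
    "",                                    # line 3: comment line
    "%3d%3d  0  0  0  0  0  0  0  0999 V2000" % (3, 2),
    atom_line(0.0, 0.0, 0.0, "C"),
    atom_line(1.5, 0.0, 0.0, "C"),
    atom_line(2.1, 1.1, 0.0, "O"),
    "%3d%3d%3d  0" % (1, 2, 1),            # single bond C1-C2
    "%3d%3d%3d  0" % (2, 3, 2),            # double bond C2=O3
    "M  END",
])

symbols, bonds = parse_molfile_bonds(EXAMPLE_MOL)
print(symbols, bonds)
```

The third field of each bond line is what distinguishes the C=O double bond from the C-C single bond in this toy fragment.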
Construction and content
Native conformations of small molecules contained in the PDB and additional information were collected from the PDB, Ligand Depot and MSD and deposited in the database SuperLigands. The database has been designed as a MySQL relational database and supplemented with a user-friendly web interface. Database queries are performed and HTML pages are generated via PHP scripts. The freely available MDL® Chime plug-in is used to display molecules; it allows the user to manipulate the view and to store the displayed molecule in the MDL Mol file format.
In order to enable fast two-dimensional searches, 960-bit binary fingerprints (MDL MACCS Keys; Durant et al.) were calculated and stored in the database for all ligands. Tanimoto coefficients are calculated via a PHP script. Here, all 960 MDL keys are included and equally weighted. The Tanimoto coefficient for two structures a and b is defined as follows: T(a,b) = Nab/(Na + Nb - Nab), where Na and Nb are the numbers of bits set in the fingerprints of structures a and b, respectively, and Nab is the number of bits which are set in both fingerprints.
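The Tanimoto definition above can be sketched in a few lines of Python. This is only an illustration of the formula, not the PHP script used by the database: fingerprints are assumed to be stored as Python integers whose binary digits are the key bits, and returning 0.0 for two empty fingerprints is a convention chosen for the sketch.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient T = Nab / (Na + Nb - Nab) for two bit fingerprints.

    fp_a and fp_b are non-negative integers; each set bit is one key.
    """
    n_a = bin(fp_a).count("1")             # bits set in fingerprint a
    n_b = bin(fp_b).count("1")             # bits set in fingerprint b
    n_ab = bin(fp_a & fp_b).count("1")     # bits set in both fingerprints
    denominator = n_a + n_b - n_ab
    # Two empty fingerprints: return 0.0 by convention (an assumption).
    return n_ab / denominator if denominator else 0.0


# Toy 4-bit fingerprints: a has 3 bits set, b has 2, and both are shared.
print(tanimoto(0b1011, 0b0011))            # 2 / (3 + 2 - 2)
```

Because all keys are weighted equally, the coefficient depends only on the three bit counts, which is what makes it cheap to evaluate against every ligand in the database.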
Three-dimensional superposition of two different PDB ligands is performed in the following way: each conformation of one molecule occurring in the PDB is superposed with each conformation of the second molecule. Those two instances matching best are displayed. The best match is defined by maximizing the score defined by
where RMSD is the Root Mean Square Deviation of the superposed atoms. For completeness, the PDB codes, chain identifiers and positions in the PDB files of the matched conformations are returned, as well as the atom numbers of both ligands, the number of superposed atoms, the number of superposed atoms of the same type and the RMSD of the superposition. For detailed information regarding the superposition algorithm see Thimm et al.
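The all-against-all search over conformations can be sketched as follows. This Python example is an assumption-laden illustration, not the published algorithm: the scoring function is passed in as a parameter because the score used by SuperLigands is not reproduced here, the atom matching and the optimal rotation and translation are omitted, and the stand-in score (negative RMSD of coordinates that are already paired and superposed) is chosen only to make the sketch runnable.

```python
import math
from itertools import product


def rmsd(coords_a, coords_b):
    """RMSD over atom coordinates that are already paired and superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def best_pair(conformers_a, conformers_b, score):
    """Score every conformer of A against every conformer of B.

    Returns (best_score, index_in_a, index_in_b) for the highest score.
    """
    best = None
    pairs = product(enumerate(conformers_a), enumerate(conformers_b))
    for (i, conf_a), (j, conf_b) in pairs:
        value = score(conf_a, conf_b)
        if best is None or value > best[0]:
            best = (value, i, j)
    return best


# Hypothetical toy data: one instance of ligand A, two instances of ligand B.
ligand_a = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
ligand_b = [
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],    # 1.0 away from A in z
    [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)],    # 0.1 away from A in z
]

# Stand-in score (assumption): a smaller RMSD gives a higher score.
result = best_pair(ligand_a, ligand_b, lambda a, b: -rmsd(a, b))
print(result)
```

The cost grows with the product of the numbers of instances of the two ligands, which stays small for the instance counts quoted in this article.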
Utility
SuperLigands can be searched by hetero-ID (i.e. the three-letter PDB code for hetero-compounds), name, molecular formula or PDB identifier. In the results table, the hetero-ID and names of the compounds are given. Moreover, the molecular structure is displayed in one cell of the table, where it can be rotated by the user and different display options can be chosen. Further information such as molecular formula, atom numbers and occurrence in the PDB can also be retrieved.
The database SuperLigands contains compounds defined by 'HETATM' records in PDB files. Some of these molecules may be bound to pseudopeptides but can also occur as separate ligands. When such a molecule is encountered, the user is notified and provided with a list of pseudopeptides in which this molecule is bound.
The user can search the database for molecules that have a significant two-dimensional similarity to a given ligand or assess the three-dimensional similarity of two compounds by superposing them with each other. Such similarity queries can be performed starting from the search results tables or directly using separate forms. Using such a form, the Tanimoto coefficient of two given structures can also be retrieved.
A typical example of a query to SuperLigands is a search for tobramycin, known as an anti-infective and antibacterial drug, starting in the main form. Searching the database for similar compounds in the next step returns the drug kanamycin as the PDB ligand with the highest Tanimoto similarity (98.6%). Now, a three-dimensional superposition of all instances of tobramycin (32 atoms, two instances) and kanamycin (33 atoms, four instances) occurring in the PDB can be performed. The best fitting structures are superposed and then displayed. In this case, the best fitting instances are tobramycin in PDB entry 1m4d and kanamycin in 1m4i, with an RMSD of 0.14 Å (32 atoms superposed). By navigating through the website, the topological and spatial similarities of PDB ligands can be obtained easily. For example, tobramycin and ribostamycin (31 atoms, four instances) have a Tanimoto coefficient of 96.5%, while the RMSD of their best superposition is only 0.95 Å (25 atoms superposed). In turn, geneticin (34 atoms, five instances) delivers a Tanimoto coefficient of only 81.1% but a much better RMSD (0.16 Å, 30 atoms superposed).
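The figures quoted in this example show that a two-dimensional and a three-dimensional ranking can disagree. The short Python sketch below simply sorts the three comparisons against tobramycin by each criterion; the tuple layout and variable names are illustrative, and ranking by RMSD alone deliberately ignores the number of superposed atoms.

```python
# (ligand, Tanimoto in %, RMSD in Angstrom, superposed atoms), as quoted above
# for the comparisons with tobramycin.
hits = [
    ("kanamycin", 98.6, 0.14, 32),
    ("ribostamycin", 96.5, 0.95, 25),
    ("geneticin", 81.1, 0.16, 30),
]

# Two-dimensional ranking: highest Tanimoto coefficient first.
by_tanimoto = [name for name, *_ in sorted(hits, key=lambda h: -h[1])]

# Three-dimensional ranking: lowest RMSD first.
by_rmsd = [name for name, *_ in sorted(hits, key=lambda h: h[2])]

print(by_tanimoto)
print(by_rmsd)
```

Ribostamycin and geneticin swap places between the two lists, which is the effect described in the example.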
Statistics: comparison of PDB ligands with drugs
Recently, a database containing 2,396 drug molecules and having the same design as SuperLigands has been created (SuperDrug Database; Goede et al.). To answer the question of how many drugs or drug-like molecules are bound to PDB structures, Tanimoto coefficients have been calculated for all pairwise combinations of molecules from SuperLigands and the SuperDrug Database. A set of 5,040 PDB ligands has been incorporated into these calculations. If two molecules with a Tanimoto coefficient of 100% (or greater than 95%; 90%) are considered identical or very similar, this analysis reveals that 413 (771; 1,457) of 5,040 PDB ligands are drugs or drug-like compounds.
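A pairwise analysis of this kind can be sketched as follows. The Python code is an illustrative reconstruction under stated assumptions, not the authors' pipeline: fingerprints are modelled as integers, each ligand is assigned its highest Tanimoto coefficient against any drug, and the function name and toy fingerprints are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two integer-encoded bit fingerprints."""
    n_ab = bin(fp_a & fp_b).count("1")
    denominator = bin(fp_a).count("1") + bin(fp_b).count("1") - n_ab
    return n_ab / denominator if denominator else 0.0


def count_drug_like(ligand_fps, drug_fps):
    """Count ligands whose best match to any drug is identical, > 0.95, > 0.90.

    The three counts are cumulative, as in the figures quoted in the text.
    """
    best = [max(tanimoto(lig, drug) for drug in drug_fps) for lig in ligand_fps]
    identical = sum(1 for t in best if t == 1.0)
    above_95 = sum(1 for t in best if t > 0.95)
    above_90 = sum(1 for t in best if t > 0.90)
    return identical, above_95, above_90


# Hypothetical toy data: one drug with 40 bits set and four ligands sharing
# 40, 39, 37 and 30 of those bits (coefficients 1.0, 0.975, 0.925 and 0.75).
drugs = [(1 << 40) - 1]
ligands = [(1 << 40) - 1, (1 << 39) - 1, (1 << 37) - 1, (1 << 30) - 1]

print(count_drug_like(ligands, drugs))
```

For the data sizes in the text this amounts to roughly 5,040 × 2,396 coefficient evaluations, each of which needs only bit counts.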
Table: Percentage of PDB ligands and drugs violating certain numbers of Lipinski Rules (by number of violated Lipinski Rules).
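A tabulation of this kind could be produced along the following lines. The Python sketch assumes the commonly cited rule-of-five thresholds (molecular weight above 500, logP above 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors); whether exactly these thresholds were applied for the table is an assumption, and the descriptor values in the toy input are hypothetical.

```python
from collections import Counter


def lipinski_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Number of violated rule-of-five criteria (0 to 4)."""
    return sum([
        mol_weight > 500,      # molecular weight above 500
        log_p > 5,             # calculated logP above 5
        h_donors > 5,          # more than 5 hydrogen-bond donors
        h_acceptors > 10,      # more than 10 hydrogen-bond acceptors
    ])


def violation_percentages(molecules):
    """Percentage of molecules violating 0, 1, 2, 3 or 4 rules."""
    counts = Counter(lipinski_violations(*m) for m in molecules)
    total = len(molecules)
    return {k: 100.0 * counts.get(k, 0) / total for k in range(5)}


# Hypothetical descriptor tuples: (weight, logP, donors, acceptors).
toy_set = [
    (300.0, 2.0, 1, 3),        # no violation
    (600.0, 6.0, 2, 4),        # weight and logP violated
    (700.0, 1.0, 0, 12),       # weight and acceptors violated
    (250.0, 0.5, 6, 2),        # donors violated
]

print(violation_percentages(toy_set))
```

Running the same function over the ligand set and the drug set separately would give the two columns needed to compare them.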
SuperLigands is a collection of PDB ligands freely accessible via a user-friendly web site. Molecular coordinates can be retrieved as MDL Mol files, supplementing the connectivity records contained in PDB files with bond types, which are necessary for modelling and simulation purposes. The database can be searched for compounds similar to a given ligand by comparison of Tanimoto coefficients. As stated by Thimm et al. and shown in the example in the section Utility, spatial comparison of small molecules can reveal more similarities, and thus similar kinds of interaction, than a pure two-dimensional topology comparison. With the aid of SuperLigands, such three-dimensional comparisons can be performed easily. Moreover, the topological similarity of PDB ligand structures to known drugs can be assessed by calculation of Tanimoto coefficients.
The database presented here supplements the set of existing resources of information about small molecules bound to PDB structures. As novel features, three-dimensional comparison of molecules as well as topology comparison of PDB ligands with known drugs are made possible. Thus, SuperLigands represents a valuable means of analysis and prediction in the field of biological and medical research.
Availability and requirements
The database is publicly accessible at http://bioinf.charite.de/superligands/. For visualisation, the free browser plug-in MDL® Chime is required. Chime runs on Windows systems with Microsoft Internet Explorer (6.0 or 5.5 SP2) or Netscape 4.75, 4.79 or on Mac OS 9.0 or 8.6 with Netscape 4.75 (please see http://www.mdlchime.com/products/framework/chime/sys_req.jsp for detailed information).
This work was supported by the BMBF (German Federal Ministry of Education and Research).
- Evers A, Gohlke H, Klebe G: Ligand-supported Homology Modelling of Protein Binding-sites using Knowledge-based Potentials. J Mol Biol 2003, 334: 327–345. doi:10.1016/j.jmb.2003.09.032
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28: 235–242. doi:10.1093/nar/28.1.235
- Feng Z, Chen L, Maddula H, Akcan O, Oughtred R, Berman HM, Westbrook J: Ligand Depot: a data warehouse for ligands bound to macromolecules. Bioinformatics 2004, 20: 2153–2155. doi:10.1093/bioinformatics/bth214
- Boutselakis H, Dimitropoulos D, Fillon J, Golovin A, Henrick K, Hussain A, Ionides J, John M, Keller PA, Krissinel E, McNeil P, Naim A, Newman R, Oldfield T, Pineda J, Rachedi A, Copeland J, Sitnov A, Sobhany S, Suarez-Uruena A, Swaminathan J, Tagari M, Tate J, Tromm S, Velankar S, Vranken W: E-MSD: the European Bioinformatics Institute Macromolecular Structure Database. Nucleic Acids Res 2003, 31: 458–462. doi:10.1093/nar/gkg065
- Hendlich M, Bergner A, Gunther J, Klebe G: Relibase: design and development of a database for comprehensive analysis of protein-ligand interactions. J Mol Biol 2003, 326: 607–620. doi:10.1016/S0022-2836(02)01408-0
- Stuart AC, Ilyin VA, Sali A: LigBase: a database of families of aligned ligand binding sites in known protein sequences and structures. Bioinformatics 2002, 18: 200–201. doi:10.1093/bioinformatics/18.1.200
- Kleywegt GJ, Jones TA: Databases in protein crystallography. Acta Cryst D Biol Cryst 1998, 54: 1119–1131. doi:10.1107/S0907444998007100
- Laskowski RA: PDBsum: summaries and analyses of PDB structures. Nucleic Acids Res 2001, 29: 221–222. doi:10.1093/nar/29.1.221
- Reichert J, Sühnel J: The IMB Jena Image Library of Biological Macromolecules: 2002 update. Nucleic Acids Res 2002, 30: 253–254. doi:10.1093/nar/30.1.253
- Paul N, Kellenberger E, Bret G, Müller P, Rognan D: Recovering the True Targets of Specific Ligands by Virtual Screening of the Protein Data Bank. Proteins 2004, 54: 671–680. doi:10.1002/prot.10625
- Puvanendrampillai D, Mitchell JBO: Protein Ligand Database (PLD): additional understanding of the nature and specificity of protein-ligand complexes. Bioinformatics 2003, 19: 1856–1857. doi:10.1093/bioinformatics/btg243
- Roche O, Kiyama R, Brooks CL: Ligand-protein database: linking protein-ligand complex structures to binding data. J Med Chem 2001, 44: 3592–3598. doi:10.1021/jm000467k
- Wang R, Fang X, Lu Y, Wang S: The PDBbind Database: Collection of Binding Affinities for Protein-Ligand Complexes with Known Three-Dimensional Structures. J Med Chem 2004, 47: 2977–2980. doi:10.1021/jm030580l
- Durant JL, Leland BA, Henry DR, Nourse JG: Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci 2002, 42: 1273–1280. doi:10.1021/ci010132r
- Thimm M, Goede A, Hougardy S, Preissner R: Comparison of 2D Similarity and 3D Superposition. Application to Searching a Conformational Drug Database. J Chem Inf Comput Sci 2004, 44: 1816–1822. doi:10.1021/ci049920h
- Goede A, Dunkel M, Mester N, Frommel C, Preissner R: SuperDrug: a conformational drug database. Bioinformatics Advance Access published February 2, 2005, PMID: 15691861.
- Lipinski CA, Lombardo F, Dominy BW, Feeney PJ: Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev 2001, 46: 3–26. doi:10.1016/S0169-409X(00)00129-0
- Pfizer Statement on New Information Regarding Cardiovascular Safety of Celebrex [http://pfizer.com/are/investors_releases/2004pr/mn_2004_1217.cfm]
- Ray WA, Griffin MR, Stein CM: Cardiovascular toxicity of valdecoxib. N Engl J Med 2004, 351: 2767. doi:10.1056/NEJMc045711
- Pfizer Statement on Status of Bextra [http://www.pfizer.com/are/news_releases/2005pr/mn_2005_0510.html]
- The Protein Data Bank [http://www.rcsb.org/pdb/]
- Ligand Depot [http://ligand-depot.rutgers.edu/]
- The Macromolecular Structure Database [http://www.ebi.ac.uk/msd/]
- HIC-Up, the Hetero-compound Information Centre - Uppsala [http://xray.bmc.uu.se/hicup/]
- The IMB Jena Image Library of Biological Macromolecules [http://www.imb-jena.de/IMAGE.html]
- Ligand-protein database [http://lpdb.scripps.edu/]
- The PDBbind Database [http://www.pdbbind.org/]
- SuperDrug Database [http://bioinf.charite.de/superdrug/]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.