Designing and implementing chemoinformatic approaches in TDR Targets Database: linking genes to chemical compounds in tropical disease causing pathogens
BMC Bioinformatics volume 11, Article number: O10 (2010)
Information about chemical compounds and their activity against whole organisms or specific molecular targets is available from the literature and from specialized databases. However, few resources effectively integrate such large chemical datasets with genome data and provide a mechanism to link active compounds to potential target genes. Here, we showcase the integration of chemoinformatic tools for querying chemical datasets and linking chemicals to genes in the TDR Targets database (tdrtargets.org), a web-accessible resource that integrates a wide range of functional genomic datasets from tropical disease pathogens and provides a ranking mechanism for identifying and prioritising novel therapeutic targets [1].
Materials and methods
Chemical datasets were obtained from three resources: DrugBank, PubChem and StARlite (ChEMBL). A pipeline was developed to calculate, for each molecule, a number of properties (molecular weight; number of flexible bonds; polar surface area; hydrogen-bond donors and acceptors; and predicted octanol/water partition coefficient) and descriptors (InChI, the IUPAC standard open chemical identifier; SMILES; and molecular formula), to facilitate querying and linking to other databases. We also calculated a number of binary fingerprints and molecular statistics to accelerate searches.
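As a rough illustration of one step in such a pipeline, the sketch below derives a molecular weight from a molecular formula. It is not the pipeline used here: the toolkit behind the actual calculations is not named in this abstract, and the function names (`parse_formula`, `molecular_weight`) and the small atomic-mass table are assumptions made for the example. It handles only flat formulas without brackets or charges.

```python
import re
from typing import Dict

# Average atomic masses in g/mol for a few elements common in drug-like
# molecules. Illustrative subset only (assumption for this sketch).
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904,
}

# One element symbol (capital letter, optional lowercase letter)
# followed by an optional integer count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Dict[str, int]:
    """Return element counts for a flat formula such as 'C9H8O4'."""
    counts: Dict[str, int] = {}
    pos = 0
    for match in _TOKEN.finditer(formula):
        # Reject anything the tokenizer had to skip (brackets, charges, ...).
        if match.start() != pos:
            raise ValueError(f"unexpected characters in formula: {formula!r}")
        symbol, digits = match.groups()
        if symbol not in ATOMIC_MASS:
            raise ValueError(f"element not in table: {symbol}")
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
        pos = match.end()
    if pos != len(formula) or not counts:
        raise ValueError(f"could not parse formula: {formula!r}")
    return counts


def molecular_weight(formula: str) -> float:
    """Sum average atomic masses over the parsed element counts."""
    return sum(ATOMIC_MASS[el] * n for el, n in parse_formula(formula).items())
```

For example, `molecular_weight("C9H8O4")` (the formula of aspirin) evaluates to roughly 180.16 g/mol with the masses above. Properties such as polar surface area or the partition coefficient depend on the full structure rather than the formula and would need a cheminformatics toolkit.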
A dataset of 504,020 chemicals, enriched in drugs and drug-like compounds, has been integrated into TDRTargets.org and can be queried using: a textual search on molecular descriptors or chemical properties; a substructure search to find molecules containing the query structure; and a similarity search to find similar molecules (using Tanimoto distance) (see Figure 1). In the StARlite database, 438,791 compounds are associated with 3,512 known druggable targets, and 2,224 of these targets could be linked to 3,043 pathogen targets based on sequence similarity. These relationships are available at TDRTargets.org.
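The similarity search and the fingerprint-based speed-up can be sketched as follows. This is a minimal illustration, assuming fingerprints are stored as sets of "on" bit positions; the fingerprint type, the threshold and all names (`tanimoto`, `passes_screen`, `similarity_search`, `LIBRARY`) are hypothetical and not taken from TDR Targets. The Tanimoto coefficient is the number of shared bits divided by the number of bits set in either fingerprint, and the corresponding distance is one minus that value.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # positions of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared bits / bits set in either fingerprint."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def passes_screen(query: Fingerprint, candidate: Fingerprint) -> bool:
    """Cheap pre-filter for substructure search.

    If the fingerprint encodes substructure features, a molecule can only
    contain the query when every query bit is also set in the molecule.
    Survivors still need an exact substructure match.
    """
    return query <= candidate


def similarity_search(
    query: Fingerprint, library: Dict[str, Fingerprint], threshold: float = 0.7
) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs at or above threshold, best first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data with made-up bit positions (hypothetical molecules).
QUERY: Fingerprint = frozenset({1, 4, 7, 9})
LIBRARY: Dict[str, Fingerprint] = {
    "mol_a": frozenset({1, 4, 7, 9}),
    "mol_b": frozenset({1, 4, 7, 12}),
    "mol_c": frozenset({2, 3, 5}),
}
```

With this toy library, `similarity_search(QUERY, LIBRARY, threshold=0.5)` keeps `mol_a` (identical bits) and `mol_b` (three of five bits shared) and drops `mol_c`. The subset screen is one common way precomputed fingerprints accelerate substructure queries, since it avoids running an exact match against every molecule.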
A comprehensive collection of chemical data can be queried in TDRTargets.org in various ways, including by chemical properties, structure and descriptors. More importantly, compounds of interest can also be linked to novel target genes in tropical disease-causing parasitic organisms, based on the sequence similarity of those genes to known targets of these compounds.
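The linking step amounts to a two-hop join: compound to known target, then known target to similar pathogen genes. The sketch below shows that join under simple assumptions. The identifiers are invented, the function name `link_compounds_to_pathogen_genes` is hypothetical, and the `homologs` mapping stands in for the output of whatever sequence-similarity search is used; this abstract does not specify the tool or cutoffs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def link_compounds_to_pathogen_genes(
    compound_targets: Iterable[Tuple[str, str]],
    homologs: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Map each compound to pathogen genes similar to its known targets.

    compound_targets: (compound_id, known_target_id) pairs.
    homologs: known_target_id -> pathogen gene ids judged similar by sequence.
    Compounds whose targets have no pathogen homolog are left out.
    """
    links = defaultdict(set)
    for compound, target in compound_targets:
        for gene in homologs.get(target, ()):
            links[compound].add(gene)
    # Sort gene ids so the output is deterministic.
    return {compound: sorted(genes) for compound, genes in links.items()}


# Invented identifiers, for illustration only.
COMPOUND_TARGETS = [
    ("CMPD-1", "TARGET-A"),
    ("CMPD-1", "TARGET-B"),
    ("CMPD-2", "TARGET-B"),
    ("CMPD-3", "TARGET-C"),  # TARGET-C has no pathogen homolog below
]
HOMOLOGS = {
    "TARGET-A": ["pathogen_gene_1"],
    "TARGET-B": ["pathogen_gene_2"],
}
```

Here `link_compounds_to_pathogen_genes(COMPOUND_TARGETS, HOMOLOGS)` links `CMPD-1` to both pathogen genes and `CMPD-2` to `pathogen_gene_2`, while `CMPD-3` is dropped because its only known target has no similar pathogen sequence.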
Agüero F, et al.: Genomic-scale prioritization of drug targets: the TDR Targets database. Nat Rev Drug Discov 2008, 7(11):900–907. 10.1038/nrd2684
This work was funded by the “Special Programme for Research and Training in Tropical Diseases (UNICEF/UNDP/World Bank/WHO)”. María Paula Magariños is supported by the Fogarty International Center (Grant Number D43TW007888). The content is solely the responsibility of the authors and does not necessarily represent the official views of the Fogarty International Center or the National Institutes of Health.
Cite this article
Magariños, M.P., Overington, J., Carmona, S. et al. Designing and implementing chemoinformatic approaches in TDR Targets Database: linking genes to chemical compounds in tropical disease causing pathogens. BMC Bioinformatics 11 (Suppl 10), O10 (2010). https://doi.org/10.1186/1471-2105-11-S10-O10
- Tropical Disease
- Polar Surface Area
- Potential Target Gene
- Genomic Dataset
- Comprehensive Collection