  • Oral presentation
  • Open access

Designing and implementing chemoinformatic approaches in TDR Targets Database: linking genes to chemical compounds in tropical disease causing pathogens


Information about chemical compounds and their activity against whole organisms or specific molecular targets is available from the literature or from specialized databases. However, few resources effectively integrate such large chemical datasets with genome data and provide a mechanism to link active compounds to potential target genes. Here, we showcase the integration of chemoinformatic tools for querying chemical datasets and linking chemicals to genes in the TDR Targets database, a web-accessible resource that integrates a wide range of functional genomic datasets from tropical disease pathogens and provides a ranking mechanism for identifying and prioritising novel therapeutic targets [1].

Materials and methods

Chemical datasets were obtained from three different resources: DrugBank, PubChem and StARlite (ChEMBL). A pipeline was developed to calculate, for each molecule, a number of properties (molecular weight; number of flexible bonds; polar surface area; number of hydrogen bond donors and acceptors; and predicted octanol/water partition coefficient) and descriptors (InChI, IUPAC's standard and open chemical identifier; SMILES; and molecular formula), to facilitate querying and linking to other databases. We have also calculated a number of binary fingerprints and molecular statistics to accelerate searches.
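As an illustration of how precomputed binary fingerprints can accelerate searches, the following minimal sketch shows a fingerprint screen: a molecule can only contain a query substructure if every bit set in the query fingerprint is also set in the molecule fingerprint, so molecules failing this test can be discarded before any expensive structure matching. The feature bits, fingerprint values and function names below are hypothetical and chosen for illustration only; they are not the fingerprints used in TDR Targets.

```python
# Hypothetical 5-bit fingerprints (illustrative only, not the TDR Targets scheme):
# bit 0 = aromatic ring, bit 1 = carboxylic acid, bit 2 = ester,
# bit 3 = amide, bit 4 = hydroxyl
LIBRARY = {
    "aspirin": 0b00111,         # aromatic ring, carboxylic acid, ester
    "paracetamol": 0b11001,     # aromatic ring, amide, hydroxyl
    "salicylic acid": 0b10011,  # aromatic ring, carboxylic acid, hydroxyl
}


def passes_screen(molecule_fp: int, query_fp: int) -> bool:
    """Return True if every bit set in the query is also set in the molecule.

    This is a necessary, not sufficient, condition for a substructure match.
    """
    return (molecule_fp & query_fp) == query_fp


def screen(library: dict, query_fp: int) -> list:
    """Return the names of molecules that survive the fingerprint screen."""
    return [name for name, fp in library.items() if passes_screen(fp, query_fp)]


# Query: aromatic ring + carboxylic acid
candidates = screen(LIBRARY, 0b00011)
print(candidates)
```

Only the surviving candidates would then need to be passed to a full substructure matcher.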


A dataset of 504,020 chemicals, enriched in drugs and drug-like compounds, has been integrated into TDR Targets and can be queried using: a textual search on molecular descriptors or chemical properties; a substructure search to find molecules containing the query structure; and a similarity search to find similar molecules (using Tanimoto distance) (see Figure 1). In the StARlite database, 438,791 compounds are associated with 3,512 known druggable targets, and 2,224 of these could be linked to 3,043 pathogen targets based on sequence similarity. These relationships are available in TDR Targets.
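A similarity search of this kind can be sketched as follows. The Tanimoto coefficient between two binary fingerprints is the number of bits set in both divided by the number of bits set in either; the corresponding distance is one minus the coefficient. The fingerprints, the 0.5 threshold and the function names are assumptions made for this example and do not describe the actual TDR Targets implementation.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as Python integers."""
    common = bin(fp_a & fp_b).count("1")  # bits set in both fingerprints
    total = bin(fp_a | fp_b).count("1")   # bits set in at least one fingerprint
    return common / total if total else 0.0


def similarity_search(library: dict, query_fp: int, threshold: float = 0.5) -> list:
    """Return (name, coefficient) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(fp, query_fp)) for name, fp in library.items()]
    hits = [hit for hit in scored if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy fingerprints, for illustration only
library = {
    "aspirin": 0b00111,
    "paracetamol": 0b11001,
    "salicylic acid": 0b10011,
}

for name, score in similarity_search(library, query_fp=0b00111):
    print(f"{name}: similarity {score:.2f}, distance {1 - score:.2f}")
```

Storing fingerprints as integers keeps the comparison to a pair of bitwise operations, which is why precomputing them makes similarity queries over large collections practical.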

Figure 1


A comprehensive collection of chemical data can be queried in various ways in TDR Targets, including by chemical properties, structure and descriptors. More importantly, compounds of interest can also be linked to novel target genes in tropical disease-causing parasitic organisms, based on sequence similarity to the known targets of these compounds.
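The linking step can be pictured as a two-stage join: compound to known target, then known target to pathogen gene through a sequence similarity hit. The sketch below assumes that similarity hits are available as (known target, pathogen gene, E-value) tuples and applies an arbitrary E-value cut-off; the identifiers, the cut-off and the function name are hypothetical, and the abstract does not specify how similarity was actually thresholded.

```python
from collections import defaultdict


def link_compounds_to_pathogen_genes(compound_targets, similarity_hits, max_evalue=1e-5):
    """Map each compound to pathogen genes similar to its known targets.

    compound_targets: dict of compound id -> list of known target ids
    similarity_hits:  iterable of (known_target, pathogen_gene, evalue)
    max_evalue:       assumed significance cut-off (illustrative value)
    """
    # Stage 1: index significant hits by known target
    homologs = defaultdict(set)
    for known_target, pathogen_gene, evalue in similarity_hits:
        if evalue <= max_evalue:
            homologs[known_target].add(pathogen_gene)

    # Stage 2: follow compound -> known target -> pathogen gene
    links = defaultdict(set)
    for compound, targets in compound_targets.items():
        for target in targets:
            links[compound] |= homologs.get(target, set())

    # Drop compounds with no linked pathogen gene
    return {compound: sorted(genes) for compound, genes in links.items() if genes}


# Hypothetical identifiers, for illustration only
compound_targets = {
    "compound_1": ["known_target_A"],
    "compound_2": ["known_target_A", "known_target_B"],
    "compound_3": ["known_target_C"],
}
similarity_hits = [
    ("known_target_A", "pathogen_gene_1", 1e-40),
    ("known_target_B", "pathogen_gene_2", 1e-12),
    ("known_target_B", "pathogen_gene_3", 0.5),   # not significant, ignored
    ("known_target_C", "pathogen_gene_4", 2.0),   # not significant, ignored
]

links = link_compounds_to_pathogen_genes(compound_targets, similarity_hits)
print(links)
```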


  1. Agüero F, et al.: Genomic-scale prioritization of drug targets: the TDR Targets database. Nat Rev Drug Discov 2008, 7(11):900–907. 10.1038/nrd2684



This work was funded by the “Special Programme for Research and Training in Tropical Diseases (UNICEF/UNDP/World Bank/WHO)”. María Paula Magariños is supported by the Fogarty International Center (Grant Number D43TW007888). The content is solely the responsibility of the authors and does not necessarily represent the official views of the Fogarty International Center or the National Institutes of Health.

Author information


Corresponding author

Correspondence to María Paula Magariños.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Magariños, M.P., Overington, J., Carmona, S. et al. Designing and implementing chemoinformatic approaches in TDR Targets Database: linking genes to chemical compounds in tropical disease causing pathogens. BMC Bioinformatics 11 (Suppl 10), O10 (2010).
