Volume 11 Supplement 10

Highlights from the Sixth International Society for Computational Biology (ISCB) Student Council Symposium

Open Access

Designing and implementing chemoinformatic approaches in TDR Targets Database: linking genes to chemical compounds in tropical disease causing pathogens

  • María Paula Magariños1,
  • John Overington2,
  • Santiago Carmona1,
  • Dhanasekaran Shanmugam3,
  • Maria Doyle4,
  • Stuart Ralph4,
  • Greg Crowther5,
  • Christiane Hertz-Fowler6,
  • Solomon Nwaka7,
  • Matt Berriman6,
  • David Roos3,
  • Wes Van Voorhis5 and
  • Fernán Agüero1
BMC Bioinformatics 2010, 11(Suppl 10):O10


Published: 07 December 2010


Information about chemical compounds and their activity against whole organisms or specific molecular targets is available from the literature or from specialized databases. However, there are few resources that effectively integrate such large chemical datasets with genome data and provide a mechanism to link active compounds to potential target genes. Here, we showcase the integration of chemoinformatic tools for querying chemical datasets and linking chemicals to genes in the TDR Targets database (tdrtargets.org), a web-accessible resource that integrates a wide range of functional genomic datasets from tropical disease pathogens and provides a ranking mechanism for identifying and prioritising novel therapeutic targets [1].

Materials and methods

Chemical datasets were obtained from three different resources: DrugBank, PubChem and StARlite (ChEMBL). A pipeline was developed to calculate a number of properties (molecular weight; number of flexible bonds; polar surface area; H bond donors/acceptors; and predicted octanol/water partition coefficient) and descriptors (InChI, IUPAC's open standard chemical identifier; SMILES; and molecular formula) for each molecule, to facilitate querying and linking to other databases. We have also calculated a number of binary fingerprints and molecular statistics to accelerate searches.
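One common way binary fingerprints accelerate substructure searches is as a cheap pre-screen: a molecule can only contain the query structure if every fingerprint bit set for the query is also set for the molecule. The following is a minimal sketch of that idea, not the pipeline used in TDR Targets. It assumes fingerprints are stored as Python integers used as bit sets; the function names, compound identifiers and bit positions are hypothetical.

```python
from typing import Dict, Iterable, List


def fingerprint_from_bits(bits: Iterable[int]) -> int:
    """Build an integer bit set from the positions of the bits that are on."""
    fp = 0
    for position in bits:
        fp |= 1 << position
    return fp


def screen_candidates(query_fp: int, library: Dict[str, int]) -> List[str]:
    """Return identifiers whose fingerprint contains every bit of the query.

    Molecules that fail this test cannot contain the query substructure,
    so the expensive atom-by-atom match is only needed for the survivors.
    """
    return [
        compound_id
        for compound_id, candidate_fp in library.items()
        if (query_fp & candidate_fp) == query_fp
    ]


# Hypothetical library: identifiers and bit positions are made up.
library = {
    "cmpd_A": fingerprint_from_bits([0, 3, 5, 9]),
    "cmpd_B": fingerprint_from_bits([3, 5]),
    "cmpd_C": fingerprint_from_bits([1, 2, 9]),
}
query = fingerprint_from_bits([3, 5])
survivors = screen_candidates(query, library)
print(survivors)
```

Passing the screen is necessary but not sufficient: the survivors still require a full substructure match, because different structural features can set the same bits.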


A dataset of 504,020 chemicals, enriched in drugs and drug-like compounds, has been integrated into TDRTargets.org and can be queried using: a textual search on molecular descriptors or chemical properties; a substructure search to find molecules containing the query structure; and a similarity search to find similar molecules (using Tanimoto distance) (see Figure 1). In the StARlite database, 438,791 compounds are associated with 3,512 known druggable targets, and 2,224 of these targets could be linked to 3,043 pathogen targets based on sequence similarity. These relationships are available at TDRTargets.org.
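For two binary fingerprints, the Tanimoto coefficient is the number of bits set in both divided by the number of bits set in either; the corresponding distance is one minus this value. The sketch below illustrates a similarity search built on that definition. It is not the TDR Targets implementation: fingerprints are assumed to be Python integers used as bit sets, and the 0.7 threshold, function names and example fingerprints are hypothetical.

```python
from typing import Dict, List, Tuple


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits in either."""
    shared = bin(fp_a & fp_b).count("1")
    either = bin(fp_a | fp_b).count("1")
    # Two empty fingerprints are scored 0.0 here; this is a convention
    # chosen for the sketch, not something stated in the abstract.
    return shared / either if either else 0.0


def similarity_search(
    query_fp: int, library: Dict[str, int], threshold: float = 0.7
) -> List[Tuple[str, float]]:
    """Return (identifier, score) pairs at or above the threshold, best first."""
    scored = [(cid, tanimoto(query_fp, fp)) for cid, fp in library.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints written as binary literals.
library = {
    "cmpd_A": 0b001111,  # identical to the query
    "cmpd_B": 0b000111,  # shares 3 of 4 bits with the query
    "cmpd_C": 0b110000,  # shares no bits with the query
}
query = 0b001111
hits = similarity_search(query, library)
print(hits)
```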

Figure 1


The comprehensive collection of chemical data in TDRTargets.org can be queried in various ways, including by chemical properties, structure and descriptors. More importantly, one can also link compounds of interest to novel target genes in tropical disease-causing parasitic organisms based on sequence similarity to known targets of these compounds.
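The linking step can be thought of as a two-hop join: from a compound to its known targets, and from each known target to the pathogen genes that resemble it in sequence. The sketch below illustrates that join under stated assumptions. It takes sequence similarity hits as precomputed (known target, pathogen gene, E-value) tuples and applies an E-value cutoff; the abstract does not state the similarity criterion used, so the cutoff, the function name and all identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def link_compounds_to_pathogen_genes(
    compound_targets: Dict[str, Set[str]],
    similarity_hits: Iterable[Tuple[str, str, float]],
    max_evalue: float = 1e-20,  # assumed cutoff, not taken from the abstract
) -> Dict[str, Set[str]]:
    """Map each compound to pathogen genes similar to any of its known targets."""
    # Index the accepted hits by known target for constant-time lookup.
    genes_by_target = defaultdict(set)
    for known_target, pathogen_gene, evalue in similarity_hits:
        if evalue <= max_evalue:
            genes_by_target[known_target].add(pathogen_gene)

    links = {}
    for compound_id, targets in compound_targets.items():
        genes = set()
        for target in targets:
            genes |= genes_by_target.get(target, set())
        if genes:  # compounds with no linked pathogen gene are omitted
            links[compound_id] = genes
    return links


# Hypothetical data: every identifier and E-value below is made up.
compound_targets = {
    "cmpd_1": {"known_T1"},
    "cmpd_2": {"known_T2", "known_T3"},
    "cmpd_3": {"known_T4"},
}
similarity_hits = [
    ("known_T1", "pathogen_gene_a", 1e-50),
    ("known_T2", "pathogen_gene_b", 1e-30),
    ("known_T3", "pathogen_gene_b", 1e-5),  # too weak, ignored
    ("known_T4", "pathogen_gene_c", 1e-3),  # too weak, ignored
]
links = link_compounds_to_pathogen_genes(compound_targets, similarity_hits)
print(links)
```

In this toy example, cmpd_3 has no link because its only known target has no sufficiently similar pathogen gene.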



This work was funded by the “Special Programme for Research and Training in Tropical Diseases (UNICEF/UNDP/World Bank/WHO)”. María Paula Magariños is supported by the Fogarty International Center (Grant Number D43TW007888). The content is solely the responsibility of the authors and does not necessarily represent the official views of the Fogarty International Center or the National Institutes of Health.

Authors’ Affiliations

Instituto de Investigaciones Biotecnológicas, Universidad de San Martín
European Bioinformatics Institute, EMBL Outstation
University of Pennsylvania
University of Melbourne
University of Washington
Wellcome Trust Sanger Institute


  1. Agüero F, et al.: Genomic-scale prioritization of drug targets: the TDR Targets database. Nat Rev Drug Discov 2008, 7(11):900–907. doi:10.1038/nrd2684


© Magariños et al; licensee BioMed Central Ltd. 2010

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.