Drug2Gene: an exhaustive resource to explore effectively the drug-target relation network
© Roider et al.; licensee BioMed Central Ltd. 2014
Received: 18 November 2013
Accepted: 3 March 2014
Published: 11 March 2014
Information about drug-target relations is at the heart of drug discovery. There are now dozens of databases providing drug-target interaction data with varying scope, and focus. Therefore, and due to the large chemical space, the overlap of the different data sets is surprisingly small. As searching through these sources manually is cumbersome, time-consuming and error-prone, integrating all the data is highly desirable. Despite a few attempts, integration has been hampered by the diversity of descriptions of compounds, and by the fact that the reported activity values, coming from different data sets, are not always directly comparable due to usage of different metrics or data formats.
We have built Drug2Gene, a knowledge base, which combines the compound/drug-gene/protein information from 19 publicly available databases. A key feature is our rigorous unification and standardization process which makes the data truly comparable on a large scale, allowing for the first time effective data mining in such a large knowledge corpus. As of version 3.2, Drug2Gene contains 4,372,290 unified relations between compounds and their targets most of which include reported bioactivity data. We extend this set with putative (i.e. homology-inferred) relations where sufficient sequence homology between proteins suggests they may bind to similar compounds. Drug2Gene provides powerful search functionalities, very flexible export procedures, and a user-friendly web interface.
Drug2Gene v3.2 has become a mature and comprehensive knowledge base providing unified, standardized drug-target related information gathered from publicly available data sources. It can be used to integrate proprietary data sets with publicly available data sets. Its main goal is to be a ‘one-stop shop’ to identify tool compounds targeting a given gene product or for finding all known targets of a drug. Drug2Gene with its integrated data set of public compound-target relations is freely accessible without restrictions at http://www.drug2gene.com.
KeywordsDrug-target relations Compound-protein relations Drug development Drug discovery Drug repositioning Knowledge base Tool compounds Biological effect Bioactivity
High-throughput screening techniques caused a dramatic increase in drug-target related information not only within pharmaceutical companies but also in public databases. For instance, as of September, 2013, ChEMBL  contained 12,077,491 bioactivity evidences, 1,324,941 compounds, and 9,356 protein targets . BindingDB  grew from around 20,000 drug-target binding activities in 2007 to 620,000 as of January 2013  and the number of relations in the current version 3 of DrugBank has expanded by more than 50% compared to the previous release . Adequate consolidation and exploration of this compound-gene relation space can have direct impact on the different phases of the drug discovery process by speeding up the identification of tool compounds or by facilitating the repositioning of known drugs . To address this need there exist now numerous databases like STITCH , SuperTarget , SLAP , Dr. PIAS , PROMISCUOUS , DrugMap Central , PiHelper , and ChemMapper  that offer different levels of representation, curation and annotation of relational data. For example STITCH is an online resource that focuses on interactions between proteins and chemicals. STITCH currently integrates connections between more than 300,000 compounds and 2,600,000 proteins from 1,133 organisms, all extracted from source databases like, ChEMBL, and BindingDB. In addition this data is enriched with protein-protein interaction and biological pathway information.
The availability of such a considerable number of projects built upon the source databases confirms the need of consolidation and improved representation of the compound-target data. However, the scope and functionality of the derived databases is oftentimes specific (single purpose applications) and their data content may not be standardized or normalized. The main motivation to create Drug2Gene is to provide the largest, standardized, and unified compound-target knowledge base that eliminates redundancy to eventually enable effective data mining of the drug-target space.
The Drug2Gene building process integrates data from 19 public bio- and chemo informatics resources, some of which are integrated for the first time in a relation-centered knowledge base. It removes redundancy in the relational, target, and compound namespaces and also facilitates the comparison of experimental data from different sources by standardizing bioactivity data. Drug2Gene enriches the combined dataset with additional homology-based relations through the use of gene homology groups from NCBI HomoloGene . This relational data is paired with powerful search functionality and a user-friendly web interface. Drug2Gene aims at being not only a starting point for, but also a continuous companion during the drug discovery process. Its relation-centered design concept keeps the resource sparse on additional information about interacting partners and instead provides external links to primary sources of compound/drug and gene/protein information.
Construction and content
The need for data integration
Drug2Gene data source statistics
Unique relations in DB
Ligand Expo 
NCBI PubChem Compound 
NCBI PubChem Substance 
PubChem Bioassay 
Unified relations shared among two or more DBs
Total counts from all source databases
NCBI Gene/Entrez Gene 
Used for Unification/Data Integration
Homology inferred relations from NCBI HomoloGene 
Total relations in Drug2Gene including homology-inferred relations
Drug2Gene entry: a compound, a gene, and the relation between them
Integration of primary databases into Drug2Gene entries
In its current version (3.2) Drug2Gene has integrated 19 publicly available source databases (Table 1). Our integration process extracts all relations from these databases through separate automated parsing routines tuned for the format of each source database flat file. In total, we extracted 4,945,372 primary database relations between 2,312,423 compound/drug entries and 202,465 target entries (Table 1). These data were analyzed for redundancy and combined with references to the primary sources in the next stage of the integration process - the unification.
The unification process has two major phases – id-mapping of target entries and detection and merging of chemical entries. The id-mapping of all target entries to gene entries is based on a common index, the Entrez Gene Index from NCBI . Entities mapping to the same NCBI gene are thereby merged while relations whose target cannot be unambiguously traced to a verified gene or gene product are removed. Thus only protein and nucleotide target entries are imported, while pseudogenes, metabolic pathways, disease, and other more general types of target entities are removed. As there are many published or tested chemical compounds that lack unified identifiers such as Chemical Abstracts Service Numbers (CAS Number) or International Chemical Identifiers (InChI), compound unification is a more complex process, and depends on the information that is available about the compounds in each source database. If chemical structures are available they are extracted from Structure-Data Files (SDF)  and passed through a series of purging rules (i.e. removal of free water molecules and small independent ions, elimination of repeating molecules from polymers, ignoring information about isotope composition, dipolar bonds, and stereoisomerism). Purged structures that are identical are subsequently merged. In contrast, if no chemical structure is available we try to map the provided compound names to the entries created by our SDF integration through cross-references between the source databases. For example, if the compound aspirin from a given database has a cross-reference to the entry acetylsalicylic acid in another database, then aspirin is merged with acetylsalicylic acid. Finally, for compounds without available chemical structure, or cross-reference between source databases, we try to use other available identifiers such as InChIKeys, brand names, compound formulas, Simplified Molecular-Input Line-Entry System (SMILES), or synonymous names for an exhaustive search which is performed in a hierarchical order (see Additional file 1: Table S1). If any of the identifiers matches a value from an already processed compound then the two entries are merged. The compounds that cannot be merged by above steps are integrated as separate compound entries.
As a result of our gene mapping and compound unification, evidences for a relation either collapse to a single Drug2Gene entry if they refer to the same compound and gene, or they create a unique Drug2Gene entry with its respective evidence(s). The merging procedure behind every composite entry is reflected in the unification confidence provided to the user – with ‘high confidence’ assigned to SDF/InChI based, ‘medium confidence’ for cross-reference-based, and ‘low confidence’ for name-based merging. In addition, all relations are subject to a reliability scoring system based on the type of evidence supporting the relation in the source databases. For example, computationally predicted evidences have a score of 4, measured interactions – 9, and highly curated interactions (e.g. those of approved drugs) - 10. If the interaction has more than one evidence, the highest score is attributed to the entire relation.
Integrated entries are stored in Drug2Gene’s relational database schema designed to meet the requirements of the combined dataset as well as to normalize data coming from different source databases (Additional file 2: Figure S1). Drug2Gene is regularly updated every six months.
Creation of homology-based relations
If a gene in an existing relation has an ortholog or paralog with >80% amino acid sequence identity between the encoded protein sequences (according to NCBI HomoloGene) then a so called “homology-inferred relation” to the same compound(s) is created for the corresponding gene. Through this procedure we extend the scope of imported relations by adding probable drug targets as homology-based relations can suggest overlooked side effects (interactions) for a given compound or extend the available set of active compounds for a given gene/protein (see Case studies). All relations inferred by homology are clearly earmarked with a red asterisk in the user interface of Drug2Gene. In queries they can be intentionally disregarded or specifically selected through two dedicated search indices (see User interface).
Standardization of evidences and classification of relations
Besides the identification and merging of redundant protein and compound data, another prerequisite for efficient data-mining of compound-gene relations is the direct comparability of binding measurements. To this end the data fields “Activity Type”, “Activity Value”, and “Activity Units” in Drug2Gene are subjected to standardization.
We first normalize the types of activity found in the source databases. For example, “Inhibitory Concentration” and active concentration “AC” are replaced by half maximal inhibitory concentration “IC50” if the evidence refers to inhibition. Once the activity types are standardized we convert the measurements to standard units according to International System of Units (SI). For instance, if the data source reports values in micromoles, then they are converted to the molar equivalents such as 207.2 μM to 0.0002072 M. In searches or for data exports only those standardized values are used. Changed values are explicitly denoted in the interface with a SI icon, which displays the original values upon click.
Activity measurements having constants like half maximal inhibitory concentration (IC50) or the binding affinity (Ki) are not self-explanatory for users without experience in the field of biochemistry, medicinal chemistry, or pharmacology. For a more user-friendly searching over those different activity types, we evaluated and assigned them discrete and simple binding strength categories (“strong”, “medium”, “weak”, or “no binding”) which we call “Activity Strength”. To this end, the binding strength of each evidence is evaluated according to its activity value. While arbitrary, the applied cutoffs reflect best practice values typically used in pharmaceutical industry. For instance, IC50 values = < 10E-08 M get classified as “strong”, values between 10E-08 and 10E-06 M – classified as “medium”, and values between 10E-06 M and 10E-05 M as “weak”. IC50 values > 10E-05 M are considered unspecific and are set to “no” activity. Once all evidences of a relation have been categorized, the relation will inherit the “strongest” term assigned to any of the evidences. We refer to this latter categorization as the “Relation Strength”. Now, a user can just filter for all “strong” interactions, or all relations with “Relation Strength” > “weak” thus abstracting from the different types of bioactivity measurements.
Results of data integration
Genes and compounds not participating in relations are also deposited in Drug2Gene to make sure users always identify a compound or gene in the database even if only to learn there are no relations known so far. Therefore we integrated all gene entries and all compound entries from genetic (e.g. NCBI Gene, HGNC) and chemical (ChEBI, NCBI PubChem) resources (see Additional file 1: Table S2). If these entries are counted, the total number of targets before unification equals 12,168,716 and that of compounds 124,848,656. After integration they are reduced to 11,232,904 (~ 8% reduction) and 28,730,299 (~77% reduction) respectively (see Additional file 3: Figure S2). These normalized compound and gene namespaces are used during updates of the database and for integration of new relational sources.
Utility and discussion
The “Main search” mode permits building complex queries combining up to ten search fields in more than 40 different indices that characterize compound-target relations. Search fields can be combined with Boolean operators (“AND”, “OR”, “AND NOT”). Each index is searched with a specific set of logical operators (“=”, “!=”, “>”, “<”, “>=”, “<=”), defined by its type of data. For text indices the operators “=” (equal) and “!=” (not equal) are available with “strict” (results should match all terms of the search field) or “fulltext” extension (results match any of the terms provided). Some indices are enhanced with an auto-complete function (Additional file 4: Table S3), which reveals their content to the user by suggesting actual values from the indices upon typing. Above the search field the user can also find a help icon which opens an info box showing the type of values that can be entered for each search index. Each field can be expanded to a multiline mode to enter a list of search terms (e.g. all gene IDs of a pathway or a list of compounds of a screen). The use of the main search is illustrated with short examples in the “Case studies” section.
The “Alternative search” has one input field for query strings, either manually written to build more complex and powerful queries or queries saved from previous sessions. The query syntax is simplistic and intuitive, allowing for searches unavailable or too cumbersome to create using the “Simple search” functionality. The current query string created either automatically by using the “Simple search” or explicitly with the “Alternative search” is always displayed at the top of the “Hit list page”.
The “Results table” has six columns by default, that present the essential relational information about interacting compounds and genes/proteins together with the available activity information. The number and type of columns in the table is configurable through the “select result columns” link above the upper left corner of the table. Additional columns such as the compound structure can thus easily be added. The first column - “Relation ID”, contains the internal Drug2Gene ID of entries and provides a “Show entry” button (Figure 4, C-4) that leads to the “Final page” which displays, depending on the user choice, either a relation, gene, or compound centric view of the data (Figure 3). A red asterisk (*) next to the ID denotes that the relation has been inferred by homology. If entries are selected with the left-hand check box, they can be exported through the export link in one of the available formats (see “Import and export functionality”). The last column in the “Results table” comprises the evidences ordered into subcategories such as activity values, displayed in base SI units. If the activity data had to be standardized after entering the database, an interactive SI icon is displayed in front of it. Evidences and relations can be flagged as true or false by registered users with the respective rights. Each flag can be supported by comments that are visible for all users of the database. This option is developed with the idea to improve and verify the publicly available relational data through a crowd-sourcing effort.
The “Final page” (Figure 3) displays all available fields for the database entry as well as all the available information for the interacting partners and their relations in one of the three views available through the “Show entry” button.
Import and export functionality
Import and export of data are indispensable features for a combined knowledge base like Drug2Gene. The import allows the integration of new data sources and their immediate comparison with the existing datasets. At the same time the flexible export guarantees further usability of data beyond the knowledge base.
The data import functionalities of Drug2Gene are situated in the “Management” link on the navigation bar at the top of each page. They are accessible to registered users and allow the import of relational, gene, and compound data in XML, TSV, or SDF format. Sample files are provided for each format. The maximum upload file size is 20 MB and 50 files are allowed per single upload. Batch file import and access to an import API without those restrictions are available by request. Upon import all relations are automatically subject to the unification and standardization procedures of the database. The normalized namespace of compounds and targets (including those without known interactions) is used as a reference during the import of new relations. All these features were developed to handle large datasets. They are constantly used within our IT infrastructure for the import of proprietary datasets.
The export dialog on the “Hit-list Page” (Additional file 3: Figure S3) provides three types of options: export format (XML, TSV, XLS, and SDF in a compressed or uncompressed file), selection of result columns for the export file, and selection of the range of exported results (all results or just those selected by the user). An entry-oriented export is also available in each view of the “Final page” (Figure 3). If the “Compound Structure” checkbox is selected, the molecular structure of compounds participating in the selected relations is included in the export file. The export of compounds to files in SDF format is very useful as the structures can further be used with chemoinformatics programs within various drug development workflows. Structures of individual compounds are also accessible in mol file format through the “download mol file” link in the “Compound Structure” column of the “Results table” and the corresponding category on each view of the “Final page”.
The search section of Drug2Gene provides a user-friendly and flexible query building interface. It empowers scientists of different research backgrounds to effectively search for drug-target related data. The usefulness of Drug2Gene is illustrated by several exemplary workflows described in this section. More detailed step-by-step instructions together with the corresponding query strings are presented in Additional file 5.
Searching interactions by compound name, formula, or standard identifiers
For example, searching in the compound name index for “zoledronate” returns 59 relations between 55 unique genes and 2 compounds (zoledronate and its salt zoledronate trisodium hydrate). Adding the ‘compound structure’ column to the “Result table” allows visual verification of compound identities. The hit list includes evidences from 7 relational databases – CGDCP, ChEMBL, CTD, DrugBank, PharmGKB, Pubchem BioAssay, and TTD. Their activity types and values are now standardized (“SI” icon in front of the evidence) and directly comparable in the “Results table”. Click on the SI icons to see the original values from the primary resources. If the user prefers only chemically identical matches in the result list (e.g. only the salt), the systematic name or any of the standard identifiers (“InChI”, “InChIKey”, or “SMILES” indices) should be used as a search term. For example a search for the zoledronate salt ZK-thiazolidinone using its InChIKey returns exactly one interaction. InChIKeys are fixed-length character strings fully representing a molecular compound structure. As strings (in contrast to compound structures) can be used in internet search engines they are a very useful start point for chemists or chemoinformaticians to search for a certain compound structure in Drug2Gene or the internet.
In contrast to zoledronate, 1-bromo-2,3-dihydroindole is a compound with no relation data reported, still a user finds the compound in Drug2Gene to distinguish between “compound was not found” and “compound was found but no relations available”.
Finding all known compounds for a pathway
Researchers often need to quickly understand which compounds are available for a set of genes (e.g. a signaling pathway or the key enzymes of a metabolic pathway).
We start a search for compounds modulating the “Tie2 Signaling” pathway. Pathway Commons provides a list of Gene IDs for this pathway , which we enter in the list option searching the ‘Entrez Gene ID’ index of Drug2Gene. The query returns the overwhelming number of 10,646 relations between 6,750 unique compounds and 18 genes. The data is derived from 15,462 evidences coming from 11 primary databases. Most evidences have no bioactivity data (8,684), whereas 6,778 evidences are based on IC50 (3,466), percent inhibition (428), Ki (164), Kd (268), or other activity types (2,452) still relating 4,916 unique compounds to 11 unique genes. Filtering by “Relation Strength” = “Strong” helps to focus conveniently only on the 826 highly active compounds. We may further narrow down the results by filtering for subnanomolar interactions using “Activity Value” < = “1E-09” referring to IC50, EC50, Ki, or Kd (other notations are also accepted, e.g. “0.000000001”, “0.1E-08”, or “10E-10”). This filter returns 93 relations between 86 unique compounds and 6 unique genes. The data (incl. structures and bioactivities) can now be exported as a file in SDF format and loaded into a chemo-informatics workbench to analyze further by chemical properties (e.g. compound similarity clustering, rule of 5, etc.). Another helpful use case showcasing Drug2Gene’s functionality is to explore the target space of a list of compounds, e.g. to evaluate high throughput screening results or to search a whole cluster of structurally similar compounds. We are not aware of any other resource giving this depth of information within a quick query.
Drug repositioning and predicting side effects
Following the slogan “Drug promiscuity for one can open polypharmacological opportunities for the other”, a very common task in drug repositioning (finding novel applications for already approved drugs) is the exhaustive search for interactions between a given compound and all its intended as well as originally unintended targets (i.e. off-targets). Drug2Gene reduces considerably the time required for gathering such information, which normally is spread between many relational sources.
Thalidomide, a hypnosedative drug, originally prescribed to pregnant women for morning sickness tragically led to fetal malformation of the limbs due to its teratogenicity. The teratogenic effect arises supposedly through inhibition of angiogenesis. However, later Thalidomide was found to be effective against leprosy and in treating multiple myeloma . Searching Drug2Gene for “thalidomide” or its analogs (“pomalidomide”, “lenalidomide”), reports 206 relations and 169 targets including CRBN, the gene conveying thalidomides teratogenic effect. When filtered for “Organism Name” = “Homo sapiens” AND “Relation Strength” > = “Weak” (i.e. at least weak binding to human targets) Drug2Gene selects only 20 relations between four compounds and 14 targets where bioactivity has been convincingly evidenced. Some of these targets are interesting starting points to elucidate the mechanism of thalidomide in e.g. leprosy or multiple myeloma.
Another way in which Drug2Gene can help identify novel avenues for a compound or predict side effects is through its added value of homology-inferred relations. An example illustrating this utility can be found in Additional file 5 (Case study 3). It demonstrates the use of Drug2Gene for the identification of novel inhibitors for metalloproteinase enzymes in human.
Comparison with other knowledge bases
To our knowledge, the most comprehensive database until now, STITCH, contains over 300,000 compounds in interactions as well as protein interaction data from 1133 species . Other aggregated knowledge bases (e.g. SLAP  and PiHelper ) offer smaller interaction datasets and restrict their scope to specific subsets of compounds or biological species or optimize their selection of source databases to additional types of data like protein-protein interactions, metabolic pathways, or drug side-effects. In contrast, Drug2Gene aims at providing the most comprehensive set of compound-gene/compound-protein interactions. Currently it contains over 4 million interactions between some 700,000 compounds and more than 40,000 proteins from 1493 organisms (mostly from human, mouse, and rat - see Additional file 3: Figure S4). Importantly, aside from hosting public data, Drug2Gene also allows the upload and integration of proprietary drug-target information. In addition, Drug2Gene provides a unique combination of data content, search functionalities, data export functionalities, and homology based predictions. Its most differentiating feature is the strict unification and standardization of all data allowing for the first time to mine effectively and non-redundantly over the full chemical and biological space including the bioactivity-based evidences. Algorithms and decision rules implemented within the database make use of this standardization by evaluating each bioactivity data to classify the binding strength. With this categorization in place, even inexperienced users without detailed knowledge about IC50, Ki, or Kd may easily differentiate between affine or weak compounds.
Considering the search functionalities, Drug2Gene provides all major types of search options available in other databases - an all text index as well as separate indices for each data category. Instead of several single entity-oriented search interfaces, Drug2Gene has a unique relation-centered interface, which allows the combination of search terms related to different types of entities - genetic, chemical, and experimental. This innovative approach enables the effective harnessing of specific subsets of relations through step-by-step query building. The consecutive filtering of initial search results using binding strength classification or detailed numerical evaluations like IC50 < 10 nM allows the user to assess the effect of each additional criteria and adjust the query according to it. The search supports the full list of Boolean operators (including “AND NOT”) for query construction. Also very complex queries can be built using the “Alternative search” tab or by interrogating Drug2Gene for a full gene and/or compound set using the multiline search mode.
Another distinguishing feature of Drug2Gene is the very flexible export of results available for result sets, for selected entries, or in entry-oriented mode. The export options completely release the data, including compound structures, for use in custom pipelines or special purpose (e.g. chemo-informatics) workflows.
Unlike alternative resources, homology-inferred relations in Drug2Gene are displayed whenever protein similarity suggests a compound may also bind to a hitherto unrelated protein. They can be searched through the indices “Homolog Organism” and “Homolog Similarity” and accessed independently of the drug target relations underlying their creation.
Drug2Gene in its current version 3.2 still lacks some features that can be found in other resources, e.g. (sub-) structure-based similarity search many of which we will address with the next release of Drug2Gene. For instance, in the next release 4.0 the effectiveness of searching will be enhanced by allowing to draw and search by chemical structure as well as by structural similarity using Tanimoto distance. Regarding the representation of hit lists we are going to add a graphical network view that allows users to quickly and intuitively navigate through the interaction data, to identify common interaction targets, and to identify possible side-effects.
We have integrated 19 publicly available source databases and developed a user friendly web-interface that allows not only simple but also highly complex database searches, covering a broad range of user requirements. The current version of Drug2Gene (version 3.2) hosts over 28 million compounds, some 11 million genes, and more than 4 million compound-target interactions. All integrated bioassay data is strictly standardized and normalized, facilitating comparative meta-analyses. This extensive data collection is further extended by homology inferred interactions that can be used e.g. for identifying novel applications for known drugs or predicting unwanted side effects.
On top of this unprecedented data set, we provide extensive features for not only exporting the data in various formats to facilitate subsequent incorporation into third-party scientific research tools but also for uploading user defined datasets.
All of the above mentioned transform the Drug2Gene knowledge base into a ‘one-stop shop’ for identifying tool compounds for genes or pathways or finding all known targets of a drug or to evaluate the biology of the compound hit list of a high-throughput drug screen.
Availability and requirements
The resource is freely available at http://drug2gene.com. Crowd-sourcing features require registration to enable flagging, annotation of Drug2Gene entries, or upload of own data sets. Authenticated access is provided upon motivated request through the contact section of the site. Due to hardware limitations, a limit of 10,000 entries per single export for guest users exists. Download of larger sets is available upon request (for registered users). All other functionalities are completely accessible without limitation or registration.
- CAS Number:
Chemical Abstracts Service Number
International Chemical Identifier
International System of Units (abbreviated SI from French, Le Systeme international d’unites)
Simplified Molecular-Input Line-Entry System.
The authors thank the developers of all Drug2Gene source databases for making their interaction data available. This work was funded by Bayer HealthCare AG.
- Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP: ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2011, 40: D1100-D1107.View ArticlePubMed CentralPubMedGoogle Scholar
- ChEMBL 17 release notes. ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_17/chembl_17_release_notes.txt
- Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK: BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res. 2007, 35: D198-D201. 10.1093/nar/gkl999.View ArticlePubMed CentralPubMedGoogle Scholar
- BindingDB Website - Info. http://www.bindingdb.org/bind/info.jsp,
- Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V, Djoumbou Y, Eisner R, Guo AC, Wishart DS: DrugBank 3.0: a comprehensive resource for “Omics” research on drugs. Nucleic Acids Res. 2010, 39: D1035-D1041.View ArticlePubMed CentralPubMedGoogle Scholar
- Wassermann AM, Bajorath J: BindingDB and ChEMBL: online compound databases for drug discovery. Expert Opin Drug Discov. 2011, 6: 683-687. 10.1517/17460441.2011.579100.View ArticlePubMedGoogle Scholar
- Kuhn M, Szklarczyk D, Franceschini A, von Mering C, Jensen LJ, Bork P: STITCH 3: zooming in on protein-chemical interactions. Nucleic Acids Res. 2011, 40: D876-D880.View ArticlePubMed CentralPubMedGoogle Scholar
- Hecker N, Ahmed J, von Eichborn J, Dunkel M, Macha K, Eckert A, Gilson MK, Bourne PE, Preissner R: SuperTarget goes quantitative: update on drug-target interactions. Nucleic Acids Res. 2011, 40: D1113-D1117.View ArticlePubMed CentralPubMedGoogle Scholar
- Chen B, Ding Y, Wild DJ: Assessing drug target association using semantic linked data. PLoS Comput Biol. 2012, 8: e1002574-10.1371/journal.pcbi.1002574.View ArticlePubMed CentralPubMedGoogle Scholar
- Sugaya N, Kanai S, Furuya T: Dr. PIAS 2.0: an update of a database of predicted druggable protein-protein interactions. Database J Biol Databases Curation. 2012, 2012: bas034-Google Scholar
- Von Eichborn J, Murgueitio MS, Dunkel M, Koerner S, Bourne PE, Preissner R: PROMISCUOUS: a database for network-based drug-repositioning. Nucleic Acids Res. 2010, 39: D1060-D1066.View ArticlePubMed CentralPubMedGoogle Scholar
- Fu C, Jin G, Gao J, Zhu R, Ballesteros-villagrana E, Wong STC: DrugMap Central: an on-line query and visualization tool to facilitate drug repositioning studies. Bioinformatics. 2013, 29: 1834-1836. 10.1093/bioinformatics/btt279.View ArticlePubMed CentralPubMedGoogle Scholar
- Aksoy BA, Gao J, Dresdner G, Wang W, Root A, Jing X, Cerami E, Sander C: PiHelper: an open source framework for drug-target and antibody-target data. Bioinformatics. 2013, 29: 2071-2072. 10.1093/bioinformatics/btt345.View ArticlePubMed CentralPubMedGoogle Scholar
- Gong J, Cai C, Liu X, Ku X, Jiang H, Gao D, Li H: ChemMapper: a versatile web server for exploring pharmacology and chemical structure association based on molecular 3D similarity method. Bioinformatics. 2013, 29: 1827-1829. 10.1093/bioinformatics/btt270.View ArticlePubMedGoogle Scholar
- NCBI Resource Coordinators: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2012, 41: D8-D20.View ArticlePubMed CentralGoogle Scholar
- Nicola G, Liu T, Gilson MK: Public domain databases for medicinal chemistry. J Med Chem. 2012, 55: 6987-7002. 10.1021/jm300501t.View ArticlePubMed CentralPubMedGoogle Scholar
- The NCI Cancer Gene Data Curation Pilot. https://wiki.nci.nih.gov/display/caBIO/caBIO+Data+Sources#caBIODataSources-CancerGeneIndexProject,
- Davis AP, Murphy CG, Johnson R, Lay JM, Lennon-Hopkins K, Saraceni-Richards C, Sciaky D, King BL, Rosenstein MC, Wiegers TC, Mattingly CJ: The Comparative Toxicogenomics Database: update 2013. Nucleic Acids Res. 2012, 41: D1104-D1114.View ArticlePubMed CentralPubMedGoogle Scholar
- Sharman JL, Mpamhanga CP, Spedding M, Germain P, Staels B, Dacquet C, Laudet V, Harmar AJ, NC-IUPHAR: IUPHAR-DB: new receptors and tools for easy searching and visualization of pharmacological data. Nucleic Acids Res. 2010, 39: D534-D538.View ArticlePubMed CentralPubMedGoogle Scholar
- Molecular Imaging and Contrast Agent Database (MICAD). http://www.ncbi.nlm.nih.gov/books/NBK5330/,
- Roth BL, Lopez E, Patel S, Kroeze WK: The Multiplicity of Serotonin Receptors: Uselessly Diverse Molecules or an Embarrassment of Riches?. Neuroscientist. 2000, 6: 252-262. 10.1177/107385840000600408.View ArticleGoogle Scholar
- McDonagh EM, Whirl-Carrillo M, Garten Y, Altman RB, Klein TE: From pharmacogenomic knowledge acquisition to clinical applications: the PharmGKB as a clinical pharmacogenomic biomarker resource. Biomark Med. 2011, 5: 795-806. 10.2217/bmm.11.94.View ArticlePubMed CentralPubMedGoogle Scholar
- Zhu F, Shi Z, Qin C, Tao L, Liu X, Xu F, Zhang L, Song Y, Liu X, Zhang J, Han B, Zhang P, Chen Y: Therapeutic target database update 2012: a resource for facilitating target-oriented drug discovery. Nucleic Acids Res. 2012, 40: D1128-D1136. 10.1093/nar/gkr797.View ArticlePubMed CentralPubMedGoogle Scholar
- The UniProt Consortium: Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2012, 40: D71-D75.View ArticlePubMed CentralGoogle Scholar
- Feng Z, Chen L, Maddula H, Akcan O, Oughtred R, Berman HM, Westbrook J: Ligand Depot: a data warehouse for ligands bound to macromolecules. Bioinformatics. 2004, 20: 2153-2155. 10.1093/bioinformatics/bth214.View ArticlePubMedGoogle Scholar
- Seal RL, Gordon SM, Lush MJ, Wright MW, Bruford EA: genenames.org: the HGNC resources in 2011. Nucleic Acids Res. 2011, 39: D514-D519. 10.1093/nar/gkq892.View ArticlePubMed CentralPubMedGoogle Scholar
- Laskowski RA: PDBsum new things. Nucleic Acids Res. 2009, 37: D355-D359. 10.1093/nar/gkn860.View ArticlePubMed CentralPubMedGoogle Scholar
- De Matos P, Alcántara R, Dekker A, Ennis M, Hastings J, Haug K, Spiteri I, Turner S, Steinbeck C: Chemical Entities of Biological Interest: an update. Nucleic Acids Res. 2010, 38: D249-D254. 10.1093/nar/gkp886.View ArticlePubMed CentralPubMedGoogle Scholar
- Bolton EE, Wang Y, Thiessen PA, Bryant SH: Chapter 12 PubChem: Integrated Platform of Small Molecules and Biological Activities. Annu Reports Comput Chem. 2008, 4: 217-241.View ArticleGoogle Scholar
- Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Zhou Z, Han L, Karapetyan K, Dracheva S, Shoemaker BA, Bolton E, Gindulyte A, Bryant SH: PubChem’s BioAssay Database. Nucleic Acids Res. 2012, 40: D400-D412. 10.1093/nar/gkr1132.View ArticlePubMed CentralPubMedGoogle Scholar
- Dalby A, Nourse JG, Hounshell WD, Gushurst AKI, Grier DL, Leland BA, Laufer J: Description of several chemical structure file formats used by computer programs developed at molecular design limited. J Chem Inf Comput Sci. 1992, 32: 244-255. 10.1021/ci00007a012.View ArticleGoogle Scholar
- Cerami EG, Gross BE, Demir E, Rodchenkov I, Babur Ö, Anwar N, Schultz N, Bader GD, Sander C: Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res. 2011, 39: D685-D690. 10.1093/nar/gkq1039.View ArticlePubMed CentralPubMedGoogle Scholar
- Xu M, Hou Y, Sheng L, Peng J: Therapeutic effects of thalidomide in hematologic disorders: a review. Front Med. 2013, 7: 290-300. 10.1007/s11684-013-0277-z.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.