Contextualization of drug-mediator relations using evidence networks
© The Author(s). 2017
Published: 31 May 2017
Genomic analysis of drug response can provide unique insights into therapies that can be used to match the “right drug to the right patient.” However, discovering such therapeutic insights from genomic data is not straightforward and remains an area of active investigation. EDDY (Evaluation of Differential DependencY) is a statistical test that leverages genomic data to detect differential genetic dependencies. EDDY has been used with the Cancer Therapeutics Response Portal (CTRP), a dataset of drug-response measurements for more than 400 small molecules, and RNA-seq data of cell lines in the Cancer Cell Line Encyclopedia (CCLE) to find potential drug-mediator pairs. Mediators were identified as genes whose statistical dependencies within annotated pathways changed significantly between drug-sensitive and drug-non-sensitive cell lines, and the results are presented in a public web portal (EDDY-CTRP). However, the limited interpretability of drug-mediator pairs currently hinders further exploration of these potentially valuable results.
In this study, we address this challenge by constructing evidence networks built with protein and drug interactions from the STITCH and STRING interaction databases. STITCH and STRING are sister databases that catalog known and predicted drug-protein interactions and protein-protein interactions, respectively. Using these two databases, we have developed a method to construct evidence networks to “explain” the relation between a drug and a mediator.
We applied this approach to drug-mediator relations discovered in the EDDY-CTRP analysis and identified evidence networks for ~70% of drug-mediator pairs; in most of these pairs, the mediator was not a known direct target of the drug. The constructed evidence networks enable researchers to contextualize each drug-mediator pair with current research and knowledge. Using evidence networks, we improved the interpretability of the EDDY-CTRP results by linking drugs and mediators through genes associated with both the drug and the mediator.
We anticipate that these evidence networks will help inform EDDY-CTRP results and enhance the generation of important insights into drug sensitivity that will lead to improved precision medicine applications.
Response to a drug within a cancer cell involves complex protein signaling processes that depend on the molecular context of the cell and the properties of the individual drug. Transcriptomic data of cancer cell lines coupled with drug response data constitute a rich data set for studying drug response and its underlying molecular mechanisms. However, the scale of these data presents many unique analytical challenges. Data-driven approaches generate a large number of associations and observations that can stand as testable hypotheses. We have utilized such genomic data in the development of a unique algorithm, EDDY (Evaluation of Differential DependencY), which uses gene expression data from two conditions to construct differential dependency networks of given gene sets between the conditions.
Through statistical interrogation of gene dependencies within an annotated pathway from a gene network catalog such as Reactome, EDDY repeatedly constructs networks from resampled RNA-seq data for each of two conditions. The divergence between the two resulting distributions of networks can then be assessed for significance through a permutation test.
EDDY was used with data integrated from the Cancer Therapeutics Response Portal (CTRP) and the Cancer Cell Line Encyclopedia (CCLE) [4–6]. The CTRP dataset contains drug-response measurements for more than 400 small molecules applied to CCLE cell lines. For each compound, cell lines were classified as either sensitive or non-sensitive for analysis by EDDY in order to identify: 1) pathways enriched with differential dependency between sensitive and non-sensitive cell lines for each compound, and 2) differential dependency networks (DDNs) that capture how gene dependency was rewired. We then identified the genes, termed “mediators”, that played a significantly different role (based on gene dependency networks) between cell lines that were sensitive to a drug and cell lines that were non-sensitive. The details of this analysis and its results have been published in a separate article. We will refer to this analysis as EDDY-CTRP throughout this manuscript.
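To make the resample-then-permute logic concrete, the following is a minimal Python sketch under loudly stated assumptions. EDDY’s own network-learning step is not reproduced; a hypothetical stand-in, `network_structure`, places an undirected edge between two genes whenever their absolute Pearson correlation in a subsample reaches a threshold. The divergence between the two conditions’ distributions of structures is measured here with the Jensen-Shannon divergence, and significance is assessed by shuffling condition labels. All function names, the subsampling fraction and the threshold are illustrative, not EDDY’s actual parameters.

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def network_structure(samples, threshold):
    """Hypothetical stand-in for EDDY's network learning: an undirected
    edge (i, j) whenever |r| between genes i and j reaches `threshold`."""
    columns = list(zip(*samples))  # one tuple of expression values per gene
    return frozenset(
        (i, j)
        for i in range(len(columns))
        for j in range(i + 1, len(columns))
        if abs(pearson(columns[i], columns[j])) >= threshold
    )


def structure_distribution(samples, n_resamples, rng, threshold, frac=0.7):
    """Empirical distribution of network structures over random subsamples."""
    size = max(3, int(frac * len(samples)))
    counts = Counter(
        network_structure(rng.sample(samples, size), threshold)
        for _ in range(n_resamples)
    )
    return {s: c / n_resamples for s, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in set(p) | set(q)}

    def kl_to_m(a):
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def dependency_permutation_test(cond_a, cond_b, n_perm=100, n_resamples=30,
                                threshold=0.5, seed=0):
    """Observed divergence and a permutation p-value (condition labels shuffled)."""
    rng = random.Random(seed)

    def divergence(a, b):
        return js_divergence(structure_distribution(a, n_resamples, rng, threshold),
                             structure_distribution(b, n_resamples, rng, threshold))

    observed = divergence(cond_a, cond_b)
    pooled, hits = list(cond_a) + list(cond_b), 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if divergence(pooled[: len(cond_a)], pooled[len(cond_a):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Each sample is a list of expression values, one per gene in the pathway; the `+ 1` terms keep the permutation p-value strictly positive.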
We predict that these drug-mediator pairs have potential as testable hypotheses on drug sensitivity and for the discovery of novel drug targets. However, the limited interpretability of these results is a bottleneck for further experimental validation. Currently, to understand these drug-mediator pairs further, a researcher must manually search the current literature, which is often a slow and inefficient process. Furthermore, the sheer volume of peer-reviewed research makes it difficult for researchers to reliably find the most pertinent data to inform these hypotheses.
Multiple databases that catalog drug-protein and protein-protein interactions, such as Pathway Commons, STITCH, STRING and BioGRID [8–11], can help alleviate this problem. However, these databases offer no easy way to explore possible drug-mediator relationships. Some of them allow researchers to query for both a drug and a gene, but the presentation of the relationships makes no effort to show how the drug and gene may be related and often displays many irrelevant genes.
In this study, we attempt to improve the interpretability of the EDDY-CTRP results by contextualizing the drug-mediator pairs with current research using evidence networks generated from the STRING and STITCH knowledge-bases. For this study, we define evidence networks as sub-networks of knowledge-bases that present the most relevant intermediate nodes that have established functional associations with the drug and mediator based on prior research. We chose to use STRING and STITCH as our knowledge-bases for their comprehensive volume of data and for their distinction of different types of evidence into separate association scores.
STRING and STITCH databases
The edges of the STRING and STITCH networks were downloaded as flat files and reconstructed into networks. Each edge in the STRING and STITCH databases carries a score reflecting how much evidence supports it and how compelling that evidence is. These scores are further broken down into sub-scores according to the type of source the evidence came from. Specific descriptions of the evidence for each edge were downloaded separately as a PostgreSQL database, which was later queried to annotate the generated evidence networks.
To maximize the accuracy of chemical matching, all drugs were reduced to SMILES strings (a string representation of a compound’s molecular structure). The SMILES strings were then converted into their respective InChIKeys, fixed-length hashed identifiers that are specific to a chemical structure. The InChIKeys were then queried against the STITCH database, which also stores InChIKeys for most of its catalogued chemicals. Matching on InChIKeys rather than on names minimized the number of drugs incorrectly matched to other drugs that share the same aliases. As in STITCH, all InChIKeys were reduced to their non-stereospecific forms, and all salt forms of a compound were treated as one compound. All InChIKey conversions were done using the RDKit open-source cheminformatics software package.
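As an illustration of the reconstruction step, the following minimal Python sketch parses a whitespace-separated links file into scored edges. The layout is an assumption to check against the actual download: a header row naming the columns, two node-identifier columns, then integer evidence scores on a 0 to 1000 scale, which are rescaled here to the 0 to 1 range used by the weight function. The function name `load_scored_edges`, the column names and the toy rows are hypothetical.

```python
import io


def load_scored_edges(handle):
    """Parse a links file into (node_a, node_b, {score_name: score}) tuples.

    Assumed layout: header row, two node-ID columns, then integer scores
    on a 0-1000 scale, which are rescaled to 0-1.
    """
    score_names = handle.readline().split()[2:]
    edges = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        scores = {name: int(value) / 1000
                  for name, value in zip(score_names, fields[2:])}
        edges.append((fields[0], fields[1], scores))
    return edges


# Toy input with hypothetical identifiers and column names.
toy = io.StringIO("node1 node2 experimental textmining combined_score\n"
                  "C1 P1 700 0 712\n")
print(load_scored_edges(toy))
# [('C1', 'P1', {'experimental': 0.7, 'textmining': 0.0, 'combined_score': 0.712})]
```

Keeping every channel score, not only the combined one, is what later allows evidence types to be re-weighted individually.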
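Matching on non-stereospecific keys can be approximated by comparing only the first 14-character block of a standard InChIKey, which hashes the molecular skeleton (connectivity) and leaves out stereochemistry. The sketch below assumes the keys were already computed (for example with RDKit) from salt-stripped structures, since a salt’s counterion would otherwise change the first block; it is one possible way to perform the reduction, not necessarily the exact procedure used here. The function names, STITCH-style identifiers and placeholder keys are hypothetical.

```python
def connectivity_block(inchikey):
    """First InChIKey block: a hash of the skeleton, ignoring stereochemistry."""
    return inchikey.split("-")[0]


def match_drugs(drug_keys, stitch_keys):
    """Map drug names to database IDs that share the same connectivity block.

    drug_keys:   {drug_name: InChIKey}
    stitch_keys: {stitch_id: InChIKey}
    Returns {drug_name: sorted list of matching IDs}; empty if none match.
    """
    index = {}
    for stitch_id, key in stitch_keys.items():
        index.setdefault(connectivity_block(key), set()).add(stitch_id)
    return {name: sorted(index.get(connectivity_block(key), set()))
            for name, key in drug_keys.items()}
```

Stereoisomers that differ only in the second block collapse onto the same entries, mirroring the non-stereospecific matching described above.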
Construction of evidence networks
The evidence network is a network made of a compound as a starting node, a mediator as a terminal node, and a set of genes connecting the two. It suggests an evidence-supported explanation for why and how the mediator gene is interacting with the compound.
Evidence networks were generated using a modified Yen’s K Shortest Paths algorithm with the edge weight function $w_{\text{edge}} = 1 - S_{\text{edge}}$, where $S_{\text{edge}}$ is the evidence score described above. Because all scores range from 0 to 1, edges with higher scores receive lower weights and are therefore preferred over edges with lower scores. To generate an evidence network, shortest paths between a compound and a mediator were iteratively found and added to the network until either no more paths from the compound to the mediator remained or the sub-network contained at least N distinct nodes, where N is an arbitrary threshold. N was not a strict upper limit, as the last path added to the sub-network could contribute two or more new nodes, pushing the total number of distinct nodes over the threshold. Instead, N served simply as a stopping condition, chosen to prevent the generation of evidence networks too overwhelming for users to interpret. Dijkstra’s Shortest Path algorithm with a Fibonacci heap was used as the supporting shortest-path algorithm in the modified Yen’s K Shortest Paths algorithm, as shown in Algorithm 1.
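The procedure above can be sketched in Python as follows. This is the textbook form of Yen’s algorithm driving the stopping rule described above, not a reproduction of the authors’ modified version in Algorithm 1, and it uses Python’s binary-heap `heapq` in place of a Fibonacci heap. The undirected adjacency-dictionary representation and all function names are assumptions made for illustration.

```python
import heapq
from itertools import count


def build_graph(scored_edges):
    """Undirected adjacency map with edge weight w = 1 - S (S in [0, 1])."""
    graph = {}
    for a, b, score in scored_edges:
        weight = 1.0 - score
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def dijkstra(graph, source, target, banned_nodes=frozenset(), banned_edges=frozenset()):
    """Cheapest source->target path avoiding banned nodes/edges, or None."""
    tie = count()  # tie-breaker so the heap never compares node names
    heap = [(0.0, next(tie), source)]
    dist, prev, visited = {source: 0.0}, {}, set()
    while heap:
        d, _, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            path = [u]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, w in graph.get(u, {}).items():
            if v in visited or v in banned_nodes or (u, v) in banned_edges:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, next(tie), v))
    return None


def yen_k_shortest(graph, source, target):
    """Yield (cost, path) for loopless paths in order of non-decreasing cost."""
    first = dijkstra(graph, source, target)
    if first is None:
        return
    accepted, seen = [first[1]], {tuple(first[1])}
    candidates, tie = [], count()
    yield first
    while True:
        last = accepted[-1]
        for i in range(len(last) - 1):
            spur, root = last[i], last[: i + 1]
            # Block the next hop of every accepted path sharing this root.
            banned_edges = {(p[i], p[i + 1]) for p in accepted
                            if len(p) > i + 1 and p[: i + 1] == root}
            found = dijkstra(graph, spur, target, frozenset(root[:-1]), banned_edges)
            if found is None:
                continue
            full = root[:-1] + found[1]
            if tuple(full) not in seen:
                seen.add(tuple(full))
                cost = sum(graph[a][b] for a, b in zip(full, full[1:]))
                heapq.heappush(candidates, (cost, next(tie), full))
        if not candidates:
            return
        cost, _, path = heapq.heappop(candidates)
        accepted.append(path)
        yield cost, path


def evidence_network(graph, drug, mediator, n_threshold):
    """Add successive shortest paths until >= n_threshold distinct nodes
    are collected or no drug-to-mediator paths remain."""
    nodes, edges = set(), set()
    for _, path in yen_k_shortest(graph, drug, mediator):
        nodes.update(path)
        edges.update(zip(path, path[1:]))
        if len(nodes) >= n_threshold:
            break
    return nodes, edges
```

Because the generator yields paths lazily, the stopping rule only has to stop consuming it. For example, with `n_threshold=4` on a graph where three two-hop paths of increasing cost join a drug and a mediator, the loop stops after the second path, once four distinct nodes have been collected.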
Extensions of evidence networks
In addition to the methods described above, two extensions of evidence networks were explored. The first extension was single-source evidence networks, constructed from a mediator and the k closest drugs found in the STITCH/STRING database. These evidence networks were initially employed when EDDY-CTRP identified a drug-mediator pair but the drug could not be found in the STITCH database. If a homologous compound could be identified, evidence networks could then be constructed using the homolog as a substitute for the original drug. We envision a variety of additional applications for this extension. For example, when a mediator appears to play a role in resistance to a compound, identifying drugs that are known to interact, directly or indirectly, with this mediator might suggest a possible combination therapy with the original compound.
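One way to find the k closest drugs for a single-source evidence network is a single-source Dijkstra search from the mediator that stops once k compound nodes have been settled. The sketch below assumes an adjacency-dictionary graph with edge weights w = 1 - S and a hypothetical predicate `is_compound` that distinguishes chemical nodes from proteins; the text does not specify how closeness was computed, so treat this as an illustration rather than the method used.

```python
import heapq


def k_closest_compounds(graph, mediator, k, is_compound):
    """Return up to k (distance, compound) pairs nearest to `mediator`.

    graph: {node: {neighbor: weight}} with weight = 1 - evidence score.
    is_compound: hypothetical predicate separating chemicals from proteins.
    """
    dist = {mediator: 0.0}
    heap = [(0.0, mediator)]
    settled, found = set(), []
    while heap and len(found) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u != mediator and is_compound(u):
            found.append((d, u))  # compounds are settled in order of distance
        for v, w in graph.get(u, {}).items():
            if v not in settled and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return found
```

Each returned compound could then serve as the starting node of an ordinary evidence network toward the mediator.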
In order to preserve the pathway-specific context of the mediator, a second extension of evidence networks was explored which constructed evidence subnetworks for each direct neighbor of a mediator in the original differential dependency network. The direct neighbors of a mediator were defined as genes that were directly connected to the mediator in either of the condition-specific dependency networks for a given mediator. To merge the direct neighbor evidence subnetworks, a set was created containing all distinct nodes from each subnetwork with the addition of the original mediator. Then, for each pair of nodes in this merged set, we checked the STITCH/STRING network to see if an edge existed between the nodes and included it if it did. If there were no paths between a direct neighbor and the drug, the direct neighbor was still included as a node in the network. Since this extension required building an evidence subnetwork for each direct neighbor, the resulting network, while potentially increased in density, often related more clearly to the original DDN and, thereby, its associated pathway. This “pathway-weighted” contextualization aims to extend the filtering of evidence networks, relating the biological context of the mediator’s original DDN to the compound and its known targets.
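The merging step described above amounts to taking the subgraph of the STITCH/STRING network induced by the union of the per-neighbor subnetworks, the direct neighbors themselves and the mediator. The sketch below assumes the per-neighbor evidence subnetworks have already been built and are supplied as node sets; the function name and input layout are hypothetical.

```python
def merge_neighbor_subnetworks(graph, mediator, neighbor_subnetworks):
    """Merge per-neighbor evidence subnetworks into one pathway-weighted network.

    graph: {node: {neighbor: weight}} for the full STITCH/STRING network.
    neighbor_subnetworks: {direct_neighbor: set of nodes in its drug-to-neighbor
        evidence subnetwork}; an empty set means no path to the drug was found.
    Returns (nodes, edges) of the induced subgraph on the merged node set.
    """
    nodes = {mediator}
    for neighbor, sub_nodes in neighbor_subnetworks.items():
        nodes.add(neighbor)  # kept even when no drug path exists
        nodes.update(sub_nodes)
    ordered = sorted(nodes)
    # Check every pair of merged nodes for an edge in the full network.
    edges = {
        (a, b)
        for i, a in enumerate(ordered)
        for b in ordered[i + 1:]
        if b in graph.get(a, {})
    }
    return nodes, edges
```

Checking every pair is quadratic in the merged node count, which stays small because each subnetwork is already capped by the threshold N.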
EDDY-CTRP evidence networks
[Table: Distribution of the number of intermediate genes in the shortest path between compound and mediator pairs. Columns: # of intermediate genes in shortest path; # of pairs.]
We note that 102 evidence networks consisted of a direct relation between compound and mediator, and only 34 of these mediators were intended targets as defined in the CTRP data and annotation. This indicates that STITCH contains drug-target relations that are not included in the CTRP database, and that the EDDY-CTRP analysis was able to discover those relations. Most of the evidence networks were for drug-mediator pairs in which the mediator was not a direct target of the drug (according to the CTRP annotation) but had some known functional association with the drug (based on the STITCH/STRING databases).
To help the user find the most direct path from the drug to the gene of interest, that path can be highlighted by clicking an on-screen button. To explore alternative paths, the user can highlight the next shortest paths in turn by repeatedly clicking the on-screen button labeled “Next Shortest.” Data channel weights are also included in the interface so that users can weight different types of evidence according to their preferences. For example, a user who does not find text-mining evidence compelling can set the text-mining weight to “LOW” or “NONE,” and the edge weights and shortest paths will be recalculated and redrawn accordingly.
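The text does not state how the interface recombines channel scores after re-weighting, so the following Python sketch is only one plausible scheme, labeled as an assumption: each channel sub-score is scaled by a user-chosen multiplier and the channels are combined as if independent, $S = 1 - \prod_i (1 - m_i s_i)$, which resembles STRING’s own independence-based combination without its prior correction. The `LEVELS` mapping and function names are hypothetical.

```python
# Hypothetical mapping from interface settings to channel multipliers.
LEVELS = {"NONE": 0.0, "LOW": 0.5, "HIGH": 1.0}


def combined_score(channel_scores, channel_levels):
    """Combine user-weighted channel scores, assuming independent channels.

    channel_scores: {channel: sub-score in [0, 1]}
    channel_levels: {channel: "NONE" | "LOW" | "HIGH"}; missing means "HIGH".
    """
    miss = 1.0  # probability that no channel supports the edge
    for channel, score in channel_scores.items():
        multiplier = LEVELS[channel_levels.get(channel, "HIGH")]
        miss *= 1.0 - multiplier * score
    return 1.0 - miss


def edge_weight(channel_scores, channel_levels):
    """Weight used by the shortest-path search: w = 1 - S."""
    return 1.0 - combined_score(channel_scores, channel_levels)
```

Setting a channel to “NONE” removes its contribution entirely, so edges supported only by text mining become weight-1 edges that the shortest-path search will avoid where alternatives exist.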
Evidence network corroborates DAPK3’s role as a mediator for TG-101348
BRD-8899:BRCA2 evidence network discovers novel interaction between STK33 and DNA repair mechanism
Single-source evidence network of CCNH discovers potential alternative compounds for OSI-027
Pathway-weighted contextualization of the BRD-8899:BRCA2 develops additional features of possible STK33 role in homologous repair
Returning to the BRD-8899 – BRCA2 mediator pair, supplementing the evidence network with nearest neighbors from the original DDN yields greater contextual clarity (Fig. 3) compared to Fig. 5c. The ROCK2 association from the original evidence network, which did not relate to the homologous repair pathway, is no longer present, but new connections related to RAD52, RAD50, NBN and ATM have been added. ATM and RAD50 have direct links to STK33, the target of the compound, and RAD52 and NBN link to STK33 via CDK2. These additional links suggest possible means by which STK33, and thereby BRD-8899, influences homologous repair. As an essentiality mediator, BRCA2 plays a more significant role in the non-sensitive network, which manifests as two non-sensitive-specific (blue) edges in the DDN to NBN and RAD50. By including the two condition-specific nodes, the evidence network acquires a condition-specific bias. We develop this idea further in the discussion below.
To support the statistical inferences made by EDDY, we drew on the STITCH/STRING databases, which provided an abundance of evidence that could be filtered using different approaches. The naïve shortest-paths strategy prioritized the strength of support for edges while minimizing the distance between compound and mediator. However, this approach often risked losing the relevance of the original biological pathway used in the EDDY discovery: the combination of a promiscuous compound with a pleiotropic gene could potentially produce numerous unrelated networks. Merging the mediator network with those of its nearest neighbors aimed to maintain the evidentiary focus of the original EDDY inference.
In this project, we have improved the interpretability of the EDDY-CTRP results by generating evidence networks of the most relevant intermediate genes using the STRING and STITCH knowledge-bases. With these evidence networks and our Cytoscape.js-based user interface, we expect that the EDDY-CTRP results can be used to form hypotheses based on these contextualized drug-mediator pairs.
Besides facilitating drug-mediator pair interpretation, evidence networks can be used more flexibly, as when single-source evidence networks were employed to identify candidate compounds that interact with a mediator. Furthermore, integrating network information for a mediator’s neighbors can preserve the pathway context of the original DDN. We furthered this synthesis of DDN and evidence network information by incorporating condition-specific neighbor nodes.
In the future, we hope to generalize these evidence networks so they can be used with other knowledge bases and with other drug-gene pairs. Other methods, such as high-throughput drug screening, generate drug-gene hypotheses similar to those of EDDY-CTRP and would benefit from an algorithmic approach to contextualizing these hypotheses with current research. In future iterations, we aim to replace Yen’s K Shortest Paths with alternative algorithms such as Eppstein’s K Shortest Paths in order to speed up the generation of evidence networks. With faster supporting algorithms, it could be possible to create an interface that would allow researchers to query any drug-gene pair of interest and receive an evidence network generated on the fly.
Research reported in this publication and publication charges were supported by the National Cancer Institute of the National Institutes of Health under Award Number U01CA168397. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Hai Tran was supported by the Helios Education Foundation through the Helios Scholars at TGen summer internship program in biomedical research at the Translational Genomics Research Institute (TGen) in Phoenix, Arizona.
Availability of data and materials
The datasets generated and/or analyzed during the current study are available in the EDDY-CTRP repository, http://biocomputing.tgen.org/software/EDDY/CTRP
HT designed and implemented the analysis, collected the data, and drafted the manuscript. GS contributed to the concept, data interpretation, manuscript draft, and critical revisions. JK contributed to the data interpretation. SK contributed to the concept, data interpretation, manuscript draft, and critical revisions. All authors have read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
About this supplement
This article has been published as part of BMC Bioinformatics Volume 18 Supplement 7, 2017: Proceedings of the Tenth International Workshop on Data and Text Mining in Biomedical Informatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-18-supplement-7.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Jung S, Kim S. EDDY: a novel statistical gene set test method to detect differential genetic dependencies. Nucleic Acids Res. 2014;42(7):e60.
- Speyer G, Kiefer J, Dhruv H, Berens M, Kim S. Knowledge-Assisted Approach to Identify Pathways With Differential Dependencies. Pac Symp Biocomput 2016. 2016;21:33–44.
- Fabregat A, Sidiropoulos K, Garapati P, Gillespie M, Hausmann K, Haw R, Jassal B, Jupe S, Korninger F, McKay S, et al. The Reactome pathway Knowledgebase. Nucleic Acids Res. 2016;44:D481–7.
- Barretina J. The Cancer Cell Line Encyclopedia enables predictive modeling of anticancer drug sensitivity. Nature. 2012;483:602–7.
- Seashore-Ludlow B. Harnessing Connectivity in a Large-Scale Small-Molecule Sensitivity Dataset. Cancer Discovery. 2015;5:1210–23.
- Basu A, Bodycombe NE, et al. An Interactive Resource to Identify Cancer Genetic and Lineage Dependencies Targeted by Small Molecules. Cell. 2013;154:1151–61.
- Speyer G, Mahendra D, Tran HJ, Kiefer J, Schreiber S, Clemon P, Dhruv H, Berens M, Kim S. Differential pathway dependency discovery associated with drug response across cancer cell lines. Pac Symp Biocomput 2017. 2017;22:497–508.
- Cerami E, Gross B, Demir E, Rodchenkov I, Babur O, Anwar N, Schultz N, Bader G, Sander C. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res. 2011;39:D685–90.
- Kuhn M. STITCH 4: integration of protein-chemical interactions with user data. Nucleic Acids Res. 2014;42:D401–7.
- Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015;43:D447–52.
- Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006;34 Suppl 1:D535–9.
- RDKit, Open-Source Cheminformatics http://rdkit.org. Accessed 24 Apr 2017.
- Yen J. Finding the K Shortest Loopless Paths in a Network. Manag Sci. 1971;17(11):712–6.
- Dijkstra E. A note on two problems in connexion with graphs. Numer Math. 1959;1(4):269–71.
- Fredman ML, Tarjan RE. Fibonacci heaps and their uses in improved network optimization algorithms. J Assoc Comput Mach. 1987;34(3):596–615.
- Franz M, Lopes CT, Huck G. Cytoscape.js: a graph theory library for visualisation and analysis. Bioinformatics. 2016;32:309–11.
- Nea S. Physical and functional interactions between STAT3 and ZIP kinase. Int Immunol. 2005;17(12):1543–52.
- Faried LS. Inhibition of the mammalian target of rapamycin (mTOR) by rapamycin increases chemosensitivity of CaSki cells to paclitaxel. Eur J Cancer. 2006;42(7):934–47.
- Meng H. SNS-032 inhibits mTORC1/mTORC2 activity in acute myeloid leukemia cells and has synergistic activity with perifosine against Akt. J Hematol Oncol. 2013;6:18.
- Kea L. Aloe-emodin suppresses prostate cancer by targeting the mTOR complex 2. Carcinogenesis. 2012;33(7):1406–11.
- Eppstein D. Finding the k Shortest Paths. SIAM J Comput. 1999;28(2):652–73.