- Methodology article
- Open Access
Novel deep learning model for more accurate prediction of drug-drug interaction effects
BMC Bioinformatics volume 20, Article number: 415 (2019)
Abstract
Background
Predicting the effect of drug-drug interactions (DDIs) precisely is important for safer and more effective drug co-prescription. Many computational approaches to predict the effect of DDIs have been proposed, with the aim of reducing the effort of identifying these interactions in vivo or in vitro, but room remains for improvement in prediction performance.
Results
In this study, we propose a novel deep learning model to predict the effect of DDIs more accurately. The proposed model uses autoencoders and a deep feed-forward network that are trained using the structural similarity profiles (SSP), Gene Ontology (GO) term similarity profiles (GSP), and target gene similarity profiles (TSP) of known drug pairs to predict the pharmacological effects of DDIs. The results show that adding GSP and TSP increases prediction accuracy over using SSP alone, and that the autoencoder is more effective than PCA for reducing the dimensions of each profile. Our model showed better performance than the existing methods, and identified a number of novel DDIs that are supported by medical databases or existing research.
Conclusions
We present a novel deep learning model for more accurate prediction of DDIs and their effects, which may assist in future research to discover novel DDIs and their pharmacological effects.
Background
Combination drug therapies are becoming a promising approach for several diseases including cancer, hypertension, asthma and AIDS, since they can increase drug efficacy, decrease drug toxicity or reduce drug resistance [1]. However, the combination of drugs may result in interactions between drugs (drug-drug interactions, DDIs), which are a major cause of adverse drug events (ADEs) [2, 3]. It is estimated that DDIs are associated with 30% of all reported ADEs [4]. In addition, ADEs due to critical DDIs have led to the withdrawal of drugs from the market [5]. Therefore, precise prediction of the effect of DDIs is important for safer and improved prescription to patients.
DDIs can be identified with in vivo models using high-throughput screening [6]. However, the cost of such procedures is relatively high, and testing large numbers of drug combinations is not practical [7]. To reduce the number of drug combinations that must be tested, numerous computational approaches have been proposed [8,9,10,11,12,13,14,15].
In some of these computational approaches, drug-target networks are constructed, and DDIs are detected by measuring the strength of network connections [13], or by identifying drug pairs that share drug targets or drug pathways using the random walk algorithm [14].
Other major categories of these computational approaches are based on the structural and side effect similarities of drug pairs. For example, Gottlieb et al. proposed the Inferring Drug Interactions (INDI) method, which predicts novel DDIs from chemical and side effect similarities of known DDIs [8], and Vilar et al. used similarities of fingerprints, target genes, and side effects of drug pairs [9, 10]. Cheng et al. constructed features from Simplified Molecular-Input Line-Entry System (SMILES) data and side effect similarity of drug pairs, and applied support vector machines to predict DDIs [11]. Zhang et al. constructed a network of drugs based on structural and side effect similarities, and applied a label propagation algorithm to identify DDIs [12]. Recently, Ryu et al. proposed DeepDDI, a computational framework that calculates structural similarity profiles (SSP) of DDIs, reduces features using principal component analysis (PCA), and feeds them to a feed-forward deep neural network [15]. The framework produces 86 labeled pharmacological DDI effects, so DeepDDI is essentially a multi-class (multi-label) classification model.
To increase the classification accuracy, in the present study we propose a novel deep learning based model that uses additional features from target genes and their known functions. We constructed target similarity profiles (TSP) and Gene Ontology (GO) term similarity profiles (GSP), as well as SSP. Because the input size is too large when combining TSP, GSP, and SSP, we used an autoencoder [16] to reduce the feature dimensions. Our autoencoder model is trained to minimize the difference between input and output and, at the same time, to minimize the prediction error for DDI labels. Our model showed improved classification accuracy, and we were able to identify novel DDIs with their pharmacological effects.
Results
We developed a novel deep learning model to predict pharmacological effects of DDIs. This model uses an autoencoder to reduce the dimensions of three similarity profiles of drug pairs, and uses a deep feed-forward network that predicts DDI type from the reduced similarity profiles. The three similarity profiles are calculated using the chemical structures (SSP), target genes (TSP), and target genes’ biological/molecular function (GSP) of known drug pairs. The entire process is depicted in Fig. 1, and detailed descriptions are provided in the Methods section.
To train our model, we downloaded 396,454 known DDIs of 177 types, together with SMILES and target gene information for drugs, from DrugBank [17]. Functional Interaction (FI) networks were downloaded from BioGRID [18]. The FI networks are composed of 22,032 genes. The GO database was downloaded from the Gene Ontology Consortium [19, 20]. The GO database is composed of 45,106 GO terms, of which we used the 29,692 GO terms in biological processes. Drugs with no target gene information were excluded, and DDI types with fewer than five DDIs were excluded. Finally, 188,258 DDIs of 106 types (Additional file 1: Table S1) and 1597 drugs were used for the experiments.
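The two exclusion rules above can be sketched as a small filter. This is a minimal illustration, not the authors' preprocessing code: the function name `filter_ddis`, the tuple layout, and the order in which the two rules are applied are assumptions made for the example.

```python
from collections import Counter

def filter_ddis(ddis, targets, min_per_type=5):
    """ddis: list of (drug_a, drug_b, ddi_type); targets: dict drug -> set of target genes.

    Hypothetical helper; the order of the two filters is an assumption.
    """
    # Keep only pairs in which both drugs have at least one known target gene.
    with_targets = [
        (a, b, t) for a, b, t in ddis
        if targets.get(a) and targets.get(b)
    ]
    # Count how many of the remaining DDIs fall under each type.
    counts = Counter(t for _, _, t in with_targets)
    # Drop DDI types with fewer than `min_per_type` examples.
    return [(a, b, t) for a, b, t in with_targets if counts[t] >= min_per_type]
```

Applying the target filter first means a DDI type can fall below the five-example threshold because some of its drugs lack target annotations; reversing the order would keep slightly more types.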
Our model was trained using different combinations of SSP, TSP, and GSP. The accuracy, macro precision, macro recall, micro precision, micro recall, and the area under the precision-recall curve (AUPRC) were calculated using 5-fold cross-validation. These performance metrics are as follows:
where n and l indicate the number of samples and the number of DDI types, respectively; y_i is the predicted value for the true DDI type (as recorded in the DrugBank database) of sample i; and TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively.
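The metric equations themselves are not reproduced in this text. Written in their standard form with the symbols defined above, they would read as follows. The per-type index j, the indicator x_i, and the 0.5 cut-off in the accuracy term are assumptions introduced here for illustration (the cut-off mirrors the 0.5 threshold used in the Discussions section); TN does not enter any of these five expressions.

```latex
\begin{align*}
\text{Accuracy} &= \frac{1}{n}\sum_{i=1}^{n} x_i,
  \qquad x_i = \begin{cases} 1 & \text{if } y_i \ge 0.5 \\ 0 & \text{otherwise} \end{cases} \\
\text{Macro precision} &= \frac{1}{l}\sum_{j=1}^{l} \frac{TP_j}{TP_j + FP_j} \\
\text{Macro recall} &= \frac{1}{l}\sum_{j=1}^{l} \frac{TP_j}{TP_j + FN_j} \\
\text{Micro precision} &= \frac{\sum_{j=1}^{l} TP_j}{\sum_{j=1}^{l} \left(TP_j + FP_j\right)} \\
\text{Micro recall} &= \frac{\sum_{j=1}^{l} TP_j}{\sum_{j=1}^{l} \left(TP_j + FN_j\right)}
\end{align*}
```

Macro averaging weights every DDI type equally, so rare types influence the score as much as common ones, whereas micro averaging pools the counts and is dominated by the most frequent types.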
Figure 2 shows that incorporating TSP and GSP increases the classification accuracy. Models using GSP alone, TSP alone, or GSP and TSP together did not achieve good classification accuracy (< 0.5). TSP and GSP also increase classification performance in terms of AUPRC. Figure 3 shows cost curves for an autoencoder and the deep feed-forward networks; while the deep feed-forward networks for TSP and GSP converge, their costs remain relatively large. Although GSP and TSP are not good similarity measures on their own, they improved prediction performance when combined with SSP.
Figs. 4 and 5 show that SSP reduced with the autoencoder (yellow in Fig. 2) yields better results than SSP reduced with PCA [15]. The proposed model also shows better performance than baseline methods such as SVM and Random Forest. The hyper-parameters for SVM and Random Forest are provided in Table 1. For the proposed model and that of Ryu et al. [15] in Figs. 2, 4, and 5, the number of features was reduced to 200 using the autoencoder or PCA, whereas the features for SVM and Random Forest were not reduced.
To examine the performance of each method in more detail, we compared the results for each DDI type. With the proposed model, classification accuracy was equal or higher for 101 of 106 DDI types in both comparisons (Figs. 6 and 7).
Discussion
Among the true positive predictions in the 5-fold cross-validation results, we selected drug pairs for which the predicted value of a DDI type other than the ground truth in DrugBank v5.1.1 was greater than or equal to 0.5, and listed them in Additional file 1: Table S2. Of 580 such drug pairs, 86 (14.8%) were supported by other databases or existing studies. Of these 86 supported pairs, the 12 with a prediction score > 0.8 are shown in Table 2. The types of the first three DDIs in Table 2 were 100, 100, and 76 in DrugBank v5.1.1, but they were updated to 86, 86, and 18 in DrugBank v5.1.2, and our prediction scores were very high for these three DDIs.
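The selection rule described above can be illustrated with a short sketch. The function name `candidate_new_effects` is hypothetical, and treating a prediction as a true positive when the score of the recorded type is at least 0.5 is an assumption made for this example.

```python
def candidate_new_effects(scores, true_type, threshold=0.5):
    """scores: per-type prediction scores for one drug pair; true_type: index of the recorded type.

    Hypothetical helper: returns (type, score) for every other type scoring >= threshold.
    """
    # Only consider pairs whose recorded type was itself predicted (assumed true-positive rule).
    if scores[true_type] < threshold:
        return []
    return [
        (ddi_type, score)
        for ddi_type, score in enumerate(scores)
        if ddi_type != true_type and score >= threshold
    ]
```

Because the output layer uses an independent sigmoid per DDI type, several types can exceed the threshold for the same drug pair, which is what makes this kind of secondary prediction possible.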
Our work has two potential limitations. First, DDIs in DrugBank are mostly inferred pharmacokinetic interactions, so the DDIs predicted by the proposed model, as well as their clinical consequences, should be validated. Second, the optimal values for hyper-parameters such as the learning rate, number of hidden units/layers, and drop-out rate were obtained by iterative experiments in our setting, so the experimental results may differ under other settings, such as a different dataset version or experimental environment. We recommend that potential users of the proposed model identify their own optimal hyper-parameters through cross-validation.
Conclusion
In this study, we propose a novel deep learning model for more accurate prediction of the pharmacological effects of DDIs. The proposed model is trained using three similarity profiles, SSP, TSP, and GSP, of each drug. Those similarity profiles are reduced using autoencoders and fed into a deep feed-forward network to predict the type of each DDI. The proposed model showed improved classification accuracy over existing models. We found that GSP and TSP can increase the prediction performance. We also predicted new effects of numerous DDIs, many of which were supported by a number of databases or previous studies.
Methods
Similarity measures
We used three similarity measures, each represented as a profile: the structural similarity profile (SSP), the target gene similarity profile (TSP), and the Gene Ontology (GO) term similarity profile (GSP).
SSP for drug A is a vector of structural similarity values between A and the rest of the drugs. The structural similarity between two drugs is the Tanimoto coefficient [24] between their binary vectors (fingerprints) converted from their SMILES [25]. SSP of drug A can be represented as SSP_A = {SS_AA, SS_AB, SS_AC, …}, where SS_AX is the Tanimoto coefficient between drugs A and X.
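As a minimal sketch of the profile construction just described, the Tanimoto coefficient of two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The example below represents each fingerprint as the set of its on-bit indices; the function names and this representation are assumptions for illustration, and the conversion from SMILES to fingerprints is not shown.

```python
def tanimoto(fp_a, fp_b):
    """fp_a, fp_b: sets holding the indices of the bits set to 1 in each fingerprint."""
    shared = len(fp_a & fp_b)
    total = len(fp_a | fp_b)
    # Two all-zero fingerprints have no defined ratio; returning 0.0 is a convention chosen here.
    return shared / total if total else 0.0

def structural_similarity_profile(drug, fingerprints, drug_order):
    """Return the SSP of `drug`: its Tanimoto coefficient against every drug in `drug_order`."""
    return [tanimoto(fingerprints[drug], fingerprints[other]) for other in drug_order]
```

For example, fingerprints with on-bits {1, 2, 3} and {2, 3, 4} share two of four distinct bits, giving a coefficient of 0.5. Keeping `drug_order` fixed across all drugs ensures that position k of every profile refers to the same reference drug.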
TSP for drug A is a vector of target gene similarity values between A and the rest of the drugs. A target gene similarity between drugs A and B is calculated with the following formula:
where G_A and G_B are the sets of target genes for drugs A and B, and d(x, y) is the distance between genes x and y in the FI network. In short, the target gene similarity between drugs A and B is the ratio of gene pairs that have a shorter distance than the maximum distance t_A. TSP of drug A can be represented as TSP_A = {TS_AA, TS_AB, TS_AC, …}.
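The verbal description above, the fraction of gene pairs (x, y) with x in G_A and y in G_B whose network distance is within t_A, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the network is treated as an unweighted graph with hop-count distances, how t_A is derived is not given in this text so it is passed in as a parameter, and a distance equal to t_A is counted as within range.

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest-path hop counts from `source` in an unweighted graph (dict: node -> neighbors)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def target_similarity(genes_a, genes_b, graph, t_a):
    """Fraction of pairs (x, y), x in genes_a and y in genes_b, with d(x, y) <= t_a.

    `t_a` is supplied by the caller; the inclusive comparison is an assumption.
    """
    pairs = len(genes_a) * len(genes_b)
    if pairs == 0:
        return 0.0
    within = 0
    for x in genes_a:
        dist = bfs_distances(graph, x)
        # Genes unreachable from x have no entry in `dist` and count as out of range.
        within += sum(1 for y in genes_b if y in dist and dist[y] <= t_a)
    return within / pairs
```

On a four-gene chain g1-g2-g3-g4 with G_A = {g1, g2}, G_B = {g3, g4}, and t_A = 1, only the pair (g2, g3) is within range, so the similarity is 1/4. For a network of tens of thousands of genes, caching the distances per source gene would avoid repeating the search for every drug pair.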
GSP is calculated in the same way as TSP, except that genes and the FI network are replaced with GO terms and the GO graph, respectively. GSP of drug A can be represented as GSP_A = {GS_AA, GS_AB, GS_AC, …}, where GS_AB is defined analogously to TS_AB. The length of the SSP, TSP, and GSP of a drug is 1597, which is the same as the total number of drugs.
Model for prediction of DDI type
The model for prediction of DDI type is composed of three autoencoders and one deep feed-forward network. The autoencoders are used to reduce the dimensions of SSP, TSP, and GSP. The three autoencoders share the same architecture: input and output layers of size 3194 (= 1597 × 2), and three hidden layers of sizes 1000, 200, and 1000, respectively. The reduced profile pairs are concatenated and fed to the deep feed-forward network. The deep feed-forward network has an input layer of size 600, six hidden layers of size 2000, and an output layer of size 106, which is the same as the number of DDI types.
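The layer sizes quoted above fit together as follows; the sketch checks the dimension arithmetic and gives a rough parameter count. It assumes plain fully connected layers with biases and ignores batch-normalization parameters, and the helper name `dense_params` is hypothetical.

```python
N_DRUGS = 1597

def dense_params(sizes):
    """Weights plus biases for a stack of fully connected layers with the given sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

# One autoencoder: the two drugs' profiles concatenated (2 x 1597) in, the same size out.
autoencoder_sizes = [2 * N_DRUGS, 1000, 200, 1000, 2 * N_DRUGS]
code_size = min(autoencoder_sizes)  # bottleneck layer used as the reduced profile pair

# Classifier: three bottleneck codes concatenated, six hidden layers, one output per DDI type.
classifier_sizes = [3 * code_size] + [2000] * 6 + [106]

autoencoder_params = dense_params(autoencoder_sizes)
classifier_params = dense_params(classifier_sizes)
```

Under these assumptions each autoencoder has about 6.8 million parameters and the feed-forward network about 21.4 million, so most of the capacity sits in the 2000-unit hidden layers of the classifier.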
The batch size of the input is 256, and the learning rates of the autoencoder and feed-forward network are 0.001 and 0.0001, respectively. The activation functions for the autoencoder and feed-forward network are sigmoid and ReLU [26], respectively. We used sigmoid as the activation function for the output layer of the feed-forward network. The number of epochs is 850, and we used Adam as the optimizer for the feed-forward network and RMSprop for the autoencoder [27]. To avoid overfitting, we applied dropout with a drop rate of 0.3 and batch normalization to the feed-forward network and autoencoders.
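For readers reproducing the setup, the hyper-parameters listed above can be collected in a single configuration object. The class and field names below are hypothetical; only the values come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    """Hyper-parameter values as stated in the text; field names are illustrative."""
    batch_size: int = 256
    epochs: int = 850
    autoencoder_lr: float = 0.001
    classifier_lr: float = 0.0001
    autoencoder_optimizer: str = "rmsprop"
    classifier_optimizer: str = "adam"
    autoencoder_activation: str = "sigmoid"
    classifier_hidden_activation: str = "relu"
    classifier_output_activation: str = "sigmoid"
    dropout_rate: float = 0.3
    batch_norm: bool = True

config = TrainingConfig()
```

Freezing the dataclass keeps a run's settings immutable, which makes it easier to log them alongside results when searching for dataset-specific values through cross-validation.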
For each epoch, the three autoencoders are first trained independently to minimize the difference between input and output. The feed-forward network is then trained with the reduced profile pairs as input, and this training minimizes the sum of the costs from the three autoencoders and the feed-forward network. The autoencoders are therefore trained twice in each epoch, once for reconstruction and once jointly with the feed-forward network, so that they encode the profiles in a way that predicts the DDI type more accurately.
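The summed cost used in the second training step can be written out as a small sketch. The text does not name the individual cost functions, so the mean squared error for reconstruction and the binary cross-entropy for the sigmoid outputs used below are assumptions, as are the function names.

```python
import math

def mse(x, x_hat):
    """Mean squared reconstruction error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over sigmoid outputs, clipped for numerical safety."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)

def joint_cost(inputs, reconstructions, y_true, y_pred):
    """Sum of the three autoencoder costs and the feed-forward network cost."""
    reconstruction = sum(mse(x, x_hat) for x, x_hat in zip(inputs, reconstructions))
    return reconstruction + binary_cross_entropy(y_true, y_pred)
```

Because the classification term depends on the encoder weights through the reduced profiles, minimizing this sum pushes the encoders toward codes that are useful for predicting the DDI type, not only for reconstruction.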
Availability of data and materials
DrugBank, https://www.drugbank.ca/releases/latest
Abbreviations
- ADEs: Adverse drug events
- DDIs: Drug-drug interactions
- GO: Gene Ontology
- GSP: GO term similarity profiles
- NSCLC: Non-small cell lung cancer
- SMILES: Simplified Molecular-Input Line-Entry System
- SSP: Structural similarity profiles
- TSP: Target gene similarity profiles
References
Foucquier J, Guedj M. Analysis of drug combinations: current methodological landscape. Pharmacol Res Perspect. 2015;3(3):e00149.
Edwards IR, Aronson JK. Adverse drug reactions: definitions, diagnosis, and management. Lancet. 2000;356(9237):1255–9.
Tatonetti NP, Fernald GH, Altman RB. A novel signal detection algorithm for identifying hidden drug-drug interactions in adverse event reports. J Am Med Inform Assoc. 2011;19(1):79–85.
Pirmohamed M, Orme M. Drug interactions of clinical importance. In: Davies’s textbook of adverse drug reactions. 1998. p. 888–912.
Onakpoya IJ, Heneghan CJ, Aronson JK. Post-marketing withdrawal of 462 medicinal products because of adverse drug reactions: a systematic review of the world literature. BMC Med. 2016;14(1):10.
Gao H, Korn JM, Ferretti S, Monahan JE, Wang Y, Singh M, Zhang C, Schnell C, Yang G, Zhang Y. High-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response. Nat Med. 2015;21(11):1318.
Fang H-B, Chen X, Pei X-Y, Grant S, Tan M. Experimental design and statistical analysis for three-drug combination studies. Stat Methods Med Res. 2017;26(3):1261–80.
Gottlieb A, Stein GY, Oron Y, Ruppin E, Sharan R. INDI: a computational framework for inferring drug interactions and their associated recommendations. Mol Syst Biol. 2012;8(1):592.
Vilar S, Uriarte E, Santana L, Tatonetti NP, Friedman C. Detection of drug-drug interactions by modeling interaction profile fingerprints. PLoS One. 2013;8(3):e58321.
Vilar S, Uriarte E, Santana L, Lorberbaum T, Hripcsak G, Friedman C, Tatonetti NP. Similarity-based modeling in large-scale prediction of drug-drug interactions. Nat Protoc. 2014;9(9):2147.
Cheng F, Zhao Z. Machine learning-based prediction of drug–drug interactions by integrating drug phenotypic, therapeutic, chemical, and genomic properties. J Am Med Inform Assoc. 2014;21(e2):e278–86.
Zhang P, Wang F, Hu J, Sorrentino R. Label propagation prediction of drug-drug interactions based on clinical side effects. Sci Rep. 2015;5:12339.
Huang J, Niu C, Green CD, Yang L, Mei H, Han J-DJ. Systematic prediction of pharmacodynamic drug-drug interactions through protein-protein-interaction network. PLoS Comput Biol. 2013;9(3):e1002998.
Park K, Kim D, Ha S, Lee D. Predicting pharmacodynamic drug-drug interactions through signaling propagation interference on protein-protein interaction networks. PLoS One. 2015;10(10):e0140816.
Ryu JY, Kim HU, Lee SY. Deep learning improves prediction of drug–drug and drug–food interactions. Proc Natl Acad Sci. 2018;115(18):E4304–11.
Rumelhart DE, Hinton GE, Williams RJ. Learning internal representations by error propagation. California Univ San Diego La Jolla Inst for Cognitive Science; 1985.
Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2017;46(D1):D1074–82.
Chatr-Aryamontri A, Oughtred R, Boucher L, Rust J, Chang C, Kolas NK, O'Donnell L, Oster S, Theesfeld C, Sellam A. The BioGRID interaction database: 2017 update. Nucleic Acids Res. 2017;45(D1):D369–79.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25.
Consortium GO. Expansion of the gene ontology knowledgebase and resources. Nucleic Acids Res. 2016;45(D1):D331–8.
Phansalkar S, Desai AA, Bell D, Yoshida E, Doole J, Czochanski M, Middleton B, Bates DW. High-priority drug-drug interactions for use in electronic health records. J Am Med Inform Assoc. 2012;19(5):735–43.
Drew BJ, Ackerman MJ, Funk M, Gibler WB, Kligfield P, Menon V, Philippides GJ, Roden DM, Zareba W, American Heart Association Acute Cardiac Care Committee of the Council on Clinical Cardiology, the Council on Cardiovascular Nursing, et al. Prevention of torsade de pointes in hospital settings: a scientific statement from the American Heart Association and the American College of Cardiology Foundation. Circulation. 2010;121(8):1047–60.
Drye LT, Spragg D, Devanand DP, Frangakis C, Marano C, Meinert CL, Mintzer JE, Munro CA, Pelton G, Pollock BG. Changes in QTc interval in the citalopram for agitation in Alzheimer's disease (CitAD) randomized trial. PLoS One. 2014;9(6):e98426.
Rogers DJ, Tanimoto TT. A computer program for classifying plants. Science. 1960;132(3434):1115–8.
Rogers D, Hahn M. Extended-connectivity fingerprints. J Chem Inf Model. 2010;50(5):742–54.
Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. In: Proceedings of the fourteenth international conference on artificial intelligence and statistics: 2011. p. 315–23.
Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.
Acknowledgements
Not applicable.
Funding
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education [NRF-2016R1D1A1B03934135], and the Ministry of Science and ICT [NRF-2019R1A2C3005212]. Funding for open access charge: National Research Foundation of Korea (NRF). The funders did not participate in the design of the study; the collection, analysis, or interpretation of data; or the writing of the manuscript.
Author information
Contributions
GL, CP and JA conceived and designed the experiments; GL and CP performed the experiments; GL, CP and JA analyzed the data and wrote the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file
Additional file 1:
Table S1. DDI types. Table S2. Prediction of DDI (prediction score ≥ 0.5). (XLSX 59 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Lee, G., Park, C. & Ahn, J. Novel deep learning model for more accurate prediction of drug-drug interaction effects. BMC Bioinformatics 20, 415 (2019). https://doi.org/10.1186/s12859-019-3013-0
DOI: https://doi.org/10.1186/s12859-019-3013-0
Keywords
- Drug-drug interaction
- Deep learning
- Autoencoder
- Similarity profile