In silico drug repositioning using deep learning and comprehensive similarity measures
BMC Bioinformatics volume 22, Article number: 293 (2021)
Abstract
Background
Drug repositioning, meaning finding new uses for existing drugs, can accelerate the research and development of new drugs. Various computational methods have been presented to predict novel drug–disease associations for drug repositioning based on similarity measures among drugs and diseases. However, some known associations between drugs and diseases have not been utilized by previous studies.
Methods
In this work, we develop a deep gated recurrent units model to predict potential drug–disease interactions using comprehensive similarity measures and the Gaussian interaction profile kernel. More specifically, the similarity measure is used to derive discriminative features for drugs based on their chemical fingerprints. Meanwhile, the Gaussian interaction profile kernel is employed to obtain effective features of diseases based on known disease–disease associations. Then, a deep gated recurrent units model is developed to predict potential drug–disease interactions.
Results
The performance of the proposed model is evaluated on two benchmark datasets under ten-fold cross-validation. To further verify its predictive ability, case studies predicting new potential indications of drugs were carried out.
Conclusion
The experimental results show that the proposed model is a useful tool for predicting new indications for drugs or new treatments for diseases, and that it can accelerate drug repositioning and related drug research and discovery.
Background
Although impressive advances have been made in life sciences, technology and genomics over the past years, bringing a new drug to patients still takes about 15 years and 800 million to one billion dollars [1,2,3]. The traditional drug research and development (R&D) process requires testing for side effects and safety through cellular model systems, extensive animal models and clinical trials. The average cost of new drug discovery has increased significantly, and more than 90% of drug candidates fail during development, which makes pharmaceutical R&D tremendously expensive, time-consuming and risky [3, 4]. This in turn leads to a small number of new drugs on the market at high prices. Drug repositioning, or drug repurposing, which identifies new clinical indications for approved drugs, has been used as an important strategy to maximize the potential usage of existing drugs and to increase the number of new drugs [5, 6]. Compared with the traditional drug R&D process, drug repositioning has two major advantages. First, the safety of approved drugs has already been verified by rigorous clinical trials: repositioning candidates have passed the tests required in de novo drug R&D, so these drugs are considered safe to use. Second, drug repositioning shortens the process of drug discovery and preparation, which saves time and money.
In recent years, the establishment of online public databases on pharmacochemical properties, chemical structures of drug molecules, drug–drug interactions, disease–disease interactions, related genomic sequences and side effects, such as KEGG [8], OMIM [9], CMap [10], DrugBank [11], STITCH [12] and ChEMBL [13], has promoted the study of drug–disease interactions and drug repositioning [7]. The goal of drug repositioning is to find potential indications for existing approved drugs and to apply the newly identified drug candidates to the clinical treatment of diseases other than the originally targeted one. By integrating data from these various sources, many machine learning methods have been developed to date [14,15,16,17,18,19,20,21,22,23,24,25].
For instance, Chiang et al. built a ‘guilt-by-association’ network-based model to predict potential drug–disease associations. This method assumes that if two diseases have similar treatment profiles, then a drug used for only one of the two diseases may also be used for the other, thus recommending a new use of the drug. However, this approach is biased toward older drugs with multiple different uses and diseases with many different treatments [26]. Gottlieb et al. [27] demonstrated a method for large-scale prediction of drug indications, named PREDICT, which uses comprehensive drug–drug and disease–disease similarity measures to obtain discriminative features. Napolitano et al. [28] proposed a multi-class Support Vector Machine (SVM) classifier to predict novel drug–disease interactions, defining drug similarities from combined drug datasets. Moreover, some network-based methods have also been put forward in recent years [29, 30]. Wu et al. [31] introduced a weighted drug–disease heterogeneous network to predict new uses of drugs by clustering based on experimentally proven drug–target interactions and gene–disease relationships. Wang et al. [32] also constructed a heterogeneous network that integrates drug targets, diseases and drugs into a unified framework and ranks candidate drugs for each disease by an iterative approach. Martinez et al. [33] proposed DrugNet to perform drug–disease and disease–drug prioritization based on a network-based prioritization method, which can integrate many types of data from complex networks involving interconnected drugs, proteins and diseases.
More recently, some recommendation-system-based methods have been developed for computational drug discovery [34, 35]. Luo et al. [5] presented the MBiRW model to identify new interactions for known drugs, which applies comprehensive similarity measures and the Bi-Random walk algorithm. Thereafter, Nagaraj et al. [4] developed a novel drug discovery strategy, DrugPredict, which combines a computational model with biological testing in cell lines in order to rapidly identify novel drug candidates for epithelial ovarian cancer. Their work exploited unique repositioning opportunities offered by a vast amount of data on disease genomics, phenomics, treatments and genetic pathways [4]. Matrix factorization methods have also been used to identify novel drug–disease interactions. They take one input matrix and produce two related matrices whose product approximates the original input matrix, e.g. kernel Bayesian matrix factorization and collaborative matrix factorization. Most existing methods rely on the properties of some important drugs or diseases to derive the drug similarity and disease similarity measures. However, some known interactions between drugs and diseases have not been utilized by previous studies, even though they carry valuable information that can be exploited to improve similarity measures.
In this study, we propose a deep learning model for potential Drug–Disease Interactions Prediction, named DDIPred. It applies a gated recurrent neural network to predict new indications of existing drugs using comprehensive similarity measures and Gaussian interaction profile kernel features. The workflow of this study is shown in Fig. 1. More specifically, the similarity measures are calculated based on drug chemical structures, disease phenotypes and known drug–disease interactions. Furthermore, the Gaussian interaction profile (GIP) kernel is applied to derive effective features of drugs and diseases from known drug–disease interactions. Truncated singular value decomposition (TSVD) is then used to reduce the dimensionality of the two combined feature sets [17]. Finally, we feed these discriminative features into a deep gated recurrent units (GRU) model to learn and predict novel drug–disease interactions, that is, potential new uses of existing drugs. The performance of the proposed model is evaluated on two gold standard datasets under ten-fold cross-validation, and case studies are carried out to further verify its predictive ability. Experimental results demonstrate the strong capability of the proposed model to discover potential new uses of drugs.
Materials and methodology
In this section, the datasets used in this study are introduced first. Then, based on the basic hypothesis that similar drugs have similar indications, we propose a novel deep learning approach that integrates comprehensive similarity measures and the Gaussian interaction profile kernel with a GRU model to predict potential drug–disease interactions. We present the details of the similarity measures, the Gaussian interaction profile kernel and the implementation of the GRU model. We also describe the comparison models, experimental methods and evaluation criteria in this section.
Benchmark datasets
To evaluate the performance of our model, we selected two widely used benchmark datasets, Fdataset and Cdataset. The gold standard dataset Fdataset is obtained from Gottlieb et al.’s work [27] and is made up of multiple data sources. More concretely, this dataset contains 1933 known associations between 593 drugs from DrugBank [36] and 313 diseases registered in OMIM [9] (Online Mendelian Inheritance in Man). We also used a second benchmark dataset, Cdataset, which was first presented in Luo et al.’s paper [5]. It contains 2532 drug–disease associations between 663 drugs and 409 diseases. Each dataset consists of three matrices: the drug–drug similarity matrix \(S_{D} \in R^{m \times m}\), the disease–disease similarity matrix \(S_{d} \in R^{n \times n}\) and the drug–disease interaction matrix \(I \in R^{m \times n}\). \(S_{D}\) and \(S_{d}\) are symmetric matrices, and each row or column represents the similarity between a drug and the other drugs, or between a disease and the other diseases, respectively. The details of the similarity calculation are given in the next section. The m rows of matrix \(I\) correspond to m drugs and the n columns to n diseases; when drug \(D_{i}\) is associated with disease \(d_{j}\), the element \(I\left( {i,j} \right)\) is set to 1, otherwise to 0. The interacting drug–disease pairs are used as positive samples, and the same number of pairs without known interaction are randomly selected as negative samples. The details of these two datasets are shown in Table 1.
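The construction of the interaction matrix and of the balanced sample set can be illustrated with a small sketch. It is not the authors' code: the function names, the toy indices and the fixed random seed are assumptions made for illustration, and only the Python standard library is used.

```python
import random

def build_interaction_matrix(m, n, known_pairs):
    """Return an m x n 0/1 matrix I with I[i][j] = 1 for each known (drug i, disease j) pair."""
    I = [[0] * n for _ in range(m)]
    for i, j in known_pairs:
        I[i][j] = 1
    return I

def sample_balanced_pairs(I, seed=0):
    """Return (positives, negatives): every known pair plus an equal number of
    randomly chosen pairs that have no known interaction."""
    rng = random.Random(seed)
    positives = [(i, j) for i, row in enumerate(I) for j, v in enumerate(row) if v == 1]
    unknown = [(i, j) for i, row in enumerate(I) for j, v in enumerate(row) if v == 0]
    negatives = rng.sample(unknown, len(positives))
    return positives, negatives

# Toy example: 4 drugs, 3 diseases, 3 known associations (made-up indices).
I = build_interaction_matrix(4, 3, [(0, 0), (1, 2), (3, 1)])
positives, negatives = sample_balanced_pairs(I, seed=42)
```

Note that the sampled negatives are unlabeled pairs rather than confirmed non-interactions, so some of them may correspond to associations that are simply not yet known.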
Similarity measures
Following the description above, the drug similarity is calculated based on chemical structure information, which comes from drug-related properties [5]. More concretely, the similarity between two drugs is calculated with the Chemistry Development Kit [37] from their 2D chemical fingerprints, which are derived from the Simplified Molecular Input Line Entry Specification (SMILES) [38] of all drugs downloaded from DrugBank. Moreover, the correlation between the similarity of two drugs and their common diseases is analyzed, and similarity values that are not discriminative are set close to 0. The similarities are adjusted using the logistic regression function, which has been used to modify disease–gene association similarity in [39]. The function can be defined as follows:
where x represents the similarity value, and a and b are adjusting parameters. Then, the drugs are clustered based on known drug–disease associations using a graph clustering method, ClusterONE [40], which has been employed to detect valuable modules for drug repositioning [5, 31, 41]. The cohesiveness of a cluster M is defined by ClusterONE as follows:
\[f\left( M \right) = \frac{{C_{in} \left( M \right)}}{{C_{in} \left( M \right) + C_{bound} \left( M \right) + P\left( M \right)}}\]
where \(C_{in} \left( M \right)\) indicates the total weight of edges within a set of vertices M, \(C_{bound} \left( M \right)\) stands for the total weight of edges connecting this set to the rest of the graph, and P(M) is the penalty term [5].
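The two quantities in this subsection can be sketched in a few lines of Python. The logistic form 1 / (1 + exp(a·x + b)) is an assumption modeled on the logistic adjustment described in [39], not a formula confirmed by this text, and the parameter values below are illustrative only. The cohesiveness function follows the ratio of internal weight to internal weight plus boundary weight plus penalty.

```python
import math

def logistic_adjust(x, a, b):
    """Assumed logistic form 1 / (1 + exp(a*x + b)). With a < 0 it pushes small
    similarity values toward 0 and large ones toward 1."""
    return 1.0 / (1.0 + math.exp(a * x + b))

def cohesiveness(c_in, c_bound, penalty):
    """Internal edge weight divided by internal + boundary weight + penalty term."""
    return c_in / (c_in + c_bound + penalty)

# Illustrative parameter values only (assumptions, not taken from the paper).
a, b = -15.0, math.log(9999.0)
low = logistic_adjust(0.1, a, b)    # a weak similarity is driven close to 0
high = logistic_adjust(0.9, a, b)   # a strong similarity stays close to 1
score = cohesiveness(c_in=6.0, c_bound=2.0, penalty=2.0)
```

A cluster with many internal edges and few boundary edges gets a cohesiveness close to 1, which is the property the clustering step tries to maximize.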
Gaussian interaction profile kernel
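For diseases, we adopted the Gaussian interaction profile kernel [42] to obtain the representation of disease–disease associations [43], based on the assumption that diseases with a similar interaction pattern with drugs are likely to show similar interaction behavior with new drugs [42]. A similar assumption can also be applied to drugs. Suppose (\(D_{i}\), \(D_{j}\)) indicates two different drugs, while (\(d_{i}\), \(d_{j}\)) represents two different diseases. Their Gaussian interaction profile kernel similarity KG can be calculated as follows:
\[KG\left( {d_{i} ,d_{j} } \right) = \exp \left( { - \alpha_{d} \left\| {I\left( {d_{i} } \right) - I\left( {d_{j} } \right)} \right\|^{2} } \right)\]
\[\alpha_{d} = {\alpha_{d}}^{{\prime}} \Big/ \left( {\frac{1}{{n_{d} }}\sum\limits_{k = 1}^{{n_{d} }} {\left\| {I\left( {d_{k} } \right)} \right\|^{2} } } \right)\]
Here, \(I\left( {d_{i} } \right)\) denotes the interaction profile of disease \(d_{i}\), i.e. the corresponding column of \(I\), and the kernel for drugs is defined analogously on the rows of \(I\). For simplicity, \({\alpha_{d}}^{{\prime}}\) is set to 0.5, and \(n_{d}\) stands for the number of diseases, as inspired by [42]. Then, the matrix decomposition algorithm TSVD was further applied to reduce the dimension of these features.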
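As a rough illustration of this step, the sketch below computes a Gaussian interaction profile kernel in the form given in [42] and then reduces the result with a truncated SVD. It assumes NumPy is available; the function names, the toy matrix and the target dimension k = 2 are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def gip_kernel(profiles, alpha_prime=0.5):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    profiles: (k, p) 0/1 array, one interaction profile per entity.
    The bandwidth is alpha_prime divided by the mean squared profile norm.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    alpha = alpha_prime / np.mean(sq_norms)
    # Squared Euclidean distance between every pair of profiles.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-alpha * np.maximum(sq_dist, 0.0))

def truncated_svd(features, k):
    """Keep the projection of each row onto the top-k singular directions."""
    U, s, _ = np.linalg.svd(features, full_matrices=False)
    return U[:, :k] * s[:k]

# Toy interaction matrix: 3 drugs (rows) x 4 diseases (columns), made-up values.
I = np.array([[1, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 1]])
KG_disease = gip_kernel(I.T)   # disease profiles are the columns of I
KG_drug = gip_kernel(I)        # the same kernel applied to drug profiles
reduced = truncated_svd(KG_disease, k=2)
```

Two diseases with identical interaction profiles get a kernel value of 1, and the value decays as their profiles diverge. The sketch does not guard against an all-zero interaction matrix, for which the bandwidth would be undefined.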
Implementation of gated recurrent units neural network
To overcome several known defects of the standard recurrent neural network (RNN) model, a series of improved models have been proposed in the deep learning field. Among them, the long short-term memory (LSTM) [44, 45] and similar variant models perform best and are widely used in many fields [46,47,48]. The main reason for their effectiveness is the introduction of gating mechanisms. The Gated Recurrent Units (GRU) model was proposed by Cho et al. [49]; it has only a reset gate and an update gate, and it exposes its full memory content at each time step. We follow a calculation process similar to that in [50].
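The update gate \(u_{t}\) is calculated by:
\[u_{t} = \sigma_{g} \left( {W_{z} i_{t} + U_{z} h_{t - 1} + b_{z} } \right)\]
here, \(\sigma_{g}\) is the sigmoid function, \(i_{t}\) indicates the input vector of the GRU, \(h_{t - 1}\) stands for the previous output of the model, and \(W_{z}\), \(U_{z}\) and \(b_{z}\) are the forward matrix, recurrent matrix and bias of the update gate, respectively. Similarly to the update gate, the reset gate \(r_{t}\) can be defined as follows:
\[r_{t} = \sigma_{g} \left( {W_{r} i_{t} + U_{r} h_{t - 1} + b_{r} } \right)\]
where the parameters are defined as above. Moreover, the candidate memory state \(c_{t}\) can be computed by:
\[c_{t} = \sigma_{h} \left( {W_{c} i_{t} + U_{c} \left( {r_{t} *h_{t - 1} } \right) + b_{c} } \right)\]
where \(\sigma_{h}\) is the tanh function and \(*\) means element-wise multiplication. Finally, the memory state \(h_{t}\) of the GRU model is defined as:
\[h_{t} = \left( {1 - u_{t} } \right)*h_{t - 1} + u_{t} *c_{t}\]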
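The gate computations described in this subsection can be sketched as a single GRU time step in NumPy. This is a minimal illustration rather than the Keras implementation used in the study. It assumes the convention of [50], in which the new state interpolates between the previous state and the candidate, and the parameter names for the reset gate and the candidate state (`W_r`, `U_r`, `b_r`, `W_c`, `U_c`, `b_c`) as well as the toy dimensions are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(i_t, h_prev, p):
    """One GRU time step. `p` maps illustrative names to the forward matrices (W_*),
    recurrent matrices (U_*) and biases (b_*) of the update gate (z), the reset
    gate (r) and the candidate state (c)."""
    u_t = sigmoid(p["W_z"] @ i_t + p["U_z"] @ h_prev + p["b_z"])          # update gate
    r_t = sigmoid(p["W_r"] @ i_t + p["U_r"] @ h_prev + p["b_r"])          # reset gate
    c_t = np.tanh(p["W_c"] @ i_t + p["U_c"] @ (r_t * h_prev) + p["b_c"])  # candidate state
    # Interpolate between the previous state and the candidate state.
    return (1.0 - u_t) * h_prev + u_t * c_t

def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights and zero biases, for demonstration only."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in ("z", "r", "c"):
        p["W_" + g] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        p["U_" + g] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        p["b_" + g] = np.zeros(hidden_dim)
    return p

params = init_params(input_dim=4, hidden_dim=3)
h = np.zeros(3)
for i_t in np.ones((5, 4)):   # a toy sequence of five input vectors
    h = gru_step(i_t, h, params)
```

Because each new state is a convex combination of the previous state and a tanh output, every component of `h` stays strictly between -1 and 1 when the initial state is zero.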
In practice, the GRU model is implemented with the Keras framework [51]. Considering the limited scale of the problem, we set the number of hidden neurons in the GRU input layer to 128 and add a Dense (fully connected) layer after the output layer as the classifier to produce the final prediction probability. The sigmoid function is employed as the activation function, and can be expressed as follows:
\[\sigma \left( x \right) = \frac{1}{{1 + e^{ - x} }}\]
Before the activation layer, we applied Dropout to reduce overfitting and enhance the model’s robustness [52]. The dropout rate was set to 0.25. Binary cross-entropy was used as the loss function, which corresponds to the sigmoid activation function. Furthermore, the loss function has a significant influence on the performance of a machine learning model. The binary cross-entropy can be defined as:
\[L\left( {t,p} \right) = - \left( {t\log p + \left( {1 - t} \right)\log \left( {1 - p} \right)} \right)\]
where p and t denote the prediction output and the true label value. Moreover, we used the Adam optimizer to update the weights of the model. Adam integrates the advantages of both RMSProp and AdaGrad and is popular in this field [53].
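To make the training objective concrete, the following sketch implements the sigmoid activation, the binary cross-entropy for one sample, and a single Adam update for a scalar parameter, using only the standard library. It is a toy illustration, not the Keras training loop of the study. The hyperparameter defaults are the ones suggested in [53], while the learning rate of 0.05, the 200 steps and the single-logit setup are assumptions chosen to keep the example short. Dropout is not shown.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_cross_entropy(t, p, eps=1e-12):
    """Loss for one sample with true label t in {0, 1} and predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
    return -(t * math.log(p) + (1 - t) * math.log(1.0 - p))

def adam_step(theta, grad, m, v, step, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns the new (theta, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** step)             # bias correction
    v_hat = v / (1 - beta2 ** step)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

# Fit a single logit w so that sigmoid(w) approaches the label t = 1.
w, m, v, t = 0.0, 0.0, 0.0, 1
loss_before = binary_cross_entropy(t, sigmoid(w))
for step in range(1, 201):
    grad = sigmoid(w) - t   # derivative of the loss with respect to w
    w, m, v = adam_step(w, grad, m, v, step, lr=0.05)
loss_after = binary_cross_entropy(t, sigmoid(w))
```

The gradient `sigmoid(w) - t` is the reason binary cross-entropy pairs naturally with a sigmoid output: the derivative of the loss with respect to the logit reduces to the prediction error.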
Performance evaluation metrics
To comprehensively evaluate the performance of our model, we follow widely used evaluation indicators and strategies [54, 55]. Ten-fold cross-validation was applied to evaluate the performance of DDIPred. In each validation, all data are randomly divided into ten equal parts. Nine parts are taken as training data, and the remaining part is taken as test data. To guarantee an unbiased comparison, we confirmed that there is no overlap between training data and test data. The final validation result is the mean value over the ten folds with standard deviations. We use widely adopted evaluation criteria, including accuracy (Acc), true positive rate (TPR), true negative rate (TNR), positive predictive value (PPV) and Matthews Correlation Coefficient (MCC), defined as:
\[Acc = \frac{TP + TN}{{TP + TN + FP + FN}}\]
\[TPR = \frac{TP}{{TP + FN}}\]
\[TNR = \frac{TN}{{TN + FP}}\]
\[PPV = \frac{TP}{{TP + FP}}\]
\[MCC = \frac{TP \times TN - FP \times FN}{{\sqrt {\left( {TP + FP} \right)\left( {TP + FN} \right)\left( {TN + FP} \right)\left( {TN + FN} \right)} }}\]
where TN stands for the true negative number, TP represents the true positive number, FN denotes the false negative number and FP indicates the false positive number. The Receiver Operating Characteristic (ROC) curve and the area under the ROC curve (AUC) are also adopted to evaluate the performance. Considering the specificity of the research task, the predicted top-N ranked results are more valuable for related drug development or disease treatment research, so we also test the performance of the model based on the count of accurately retrieved true drug–disease interactions.
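The criteria above can be computed directly from the confusion counts. The sketch below is a minimal illustration on made-up labels and scores. The function names are assumptions, and the code does not handle degenerate cases in which a denominator is zero.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def evaluate(tp, tn, fp, fn):
    """Accuracy, TPR, TNR, PPV and MCC from the confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Acc": (tp + tn) / (tp + tn + fp + fn),
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "MCC": (tp * tn - fp * fn) / denom,
    }

def top_n_hits(scores, labels, n):
    """Count the true interactions among the n highest-scoring candidates."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:n])

# Made-up labels, predictions and scores for eight drug-disease pairs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]
metrics = evaluate(*confusion_counts(y_true, y_pred))
hits = top_n_hits(scores, y_true, n=3)
```

In a ten-fold setting, these metrics would be computed once per fold and then averaged, with the standard deviation reported alongside the mean.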
Results and discussion
In this study, we propose a deep learning model to predict potential drug–disease interactions, which can advance the discovery of new uses of existing drugs or new treatments of diseases. In this section, we systematically evaluate the performance of the model. First, we evaluate the prediction capability of DDIPred on two benchmark datasets. Then, we compare it with other state-of-the-art models under the same experimental conditions. Furthermore, we present case studies to verify the practicability of the proposed method.
Drug–disease interactions prediction capability evaluation
First, the drug–disease interaction prediction capability of DDIPred is evaluated on the two benchmark datasets Fdataset and Cdataset. The details of the ten-fold cross-validation are listed in Tables 2 and 3 for Cdataset and Fdataset, respectively. The average values over the ten folds are taken as the final reported results, as shown in Fig. 2.
As shown in Table 2, the mean accuracy of ten-fold cross-validation on Cdataset is 81.48% with a standard deviation of 1.48%, the mean TPR is 80.59% with a standard deviation of 2.86%, the mean TNR is 83.01% with a standard deviation of 2.71%, the mean PPV is 80.03% with a standard deviation of 2.88%, and the mean MCC is 63.06% with a standard deviation of 2.99%. These cross-validation results show that our model has clear predictive ability for the associations between drugs and diseases.
The ten-fold cross-validation performance of DDIPred on Fdataset is shown in Table 3. The average accuracy on Fdataset is 77.83% with a standard deviation of 2.43%, the average TPR is 77.13% with a standard deviation of 4.37%, the average TNR is 79.22% with a standard deviation of 3.48%, the average PPV is 76.57% with a standard deviation of 4.06%, and the mean MCC is 55.80% with a standard deviation of 4.93%. The performance of DDIPred on this dataset is slightly weaker than on Cdataset, but the results are still acceptable, which means the model is competent for the drug–disease association prediction task.
Comparison with other state-of-the-art methods
We further compared the proposed model with other state-of-the-art methods on the same datasets under the same experimental conditions, including previous studies and the widely used Support Vector Machine (SVM) model. The comparison results are reported in Tables 4 and 5 and Fig. 3.
We compared the AUC of our model with that of previous studies, including DrugNet [33] and HGBI [32]. Because different studies use different evaluation indicators, we only compared the AUC value reported in every study, which best reflects the performance of a model. As shown in Table 4 and Fig. 3, DrugNet obtained an AUC of 0.804 on Cdataset and an AUC of 0.778 on Fdataset. HGBI performed better than DrugNet, with AUCs of 0.858 and 0.829 on Cdataset and Fdataset, respectively. DDIPred achieved AUCs of 0.871 and 0.838 on Cdataset and Fdataset, respectively, so our model performs best on both datasets.
Furthermore, we compared our model with the widely used machine learning model SVM, which is often used as a baseline and usually performs well in various fields. The feature input, ten-fold cross-validation sets, evaluation metrics and other experimental conditions are exactly the same for DDIPred and the SVM model. The parameters of the SVM are determined by grid search. The results are shown in Table 5. Our model significantly improves on the SVM in all indicators.
Case studies
To further examine the capability of the proposed model in predicting new associations between drugs and diseases, a drug or disease is selected as the case to be tested. The feature of the tested drug or disease and the feature of each disease or drug are combined as test data. Then, these data are fed into the trained model to obtain prediction scores. Finally, all candidates are ranked based on their prediction scores. Zoledronic acid (DrugBank Accession Number: DB00399) and Dexamethasone (DrugBank Accession Number: DB01234) were selected for our case studies. Zoledronic acid is usually used to treat bone metastasis pain and hypercalcemia of malignancy. It can also help to prevent skeletal fractures in multiple myeloma and prostate cancer patients. Dexamethasone has anti-inflammatory, anti-immune, anti-toxin, antipyretic and other effects, and has a considerable impact on metabolism. The prediction results are shown in Tables 6 and 7. Our model found the diseases most relevant to the target drugs: both confirmed indications and new potential candidate diseases are successfully predicted.
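The ranking procedure used in the case studies can be sketched as follows. The scoring function here is only a stand-in for the trained model, and the feature vectors and disease names are hypothetical placeholders rather than data from the study.

```python
def rank_candidates(query_feature, candidate_features, score_fn):
    """Pair the query with every candidate, score each pair, and sort by score."""
    scored = []
    for name, feature in candidate_features.items():
        pair = list(query_feature) + list(feature)   # concatenated pair feature
        scored.append((name, score_fn(pair)))
    return sorted(scored, key=lambda item: item[1], reverse=True)

def toy_score(pair):
    """Stand-in for a trained model: dot product of the two halves of the pair feature."""
    half = len(pair) // 2
    return sum(a * b for a, b in zip(pair[:half], pair[half:]))

# Hypothetical feature vectors for one query drug and three candidate diseases.
drug_feature = [0.2, 0.8]
disease_features = {
    "disease_A": [0.1, 0.9],
    "disease_B": [0.9, 0.1],
    "disease_C": [0.5, 0.5],
}
ranking = rank_candidates(drug_feature, disease_features, toy_score)
```

The same function covers the reverse direction, ranking candidate drugs for a query disease, by swapping which side supplies the query feature.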
Conclusion
In this work, we proposed a novel deep learning model, DDIPred, which uses comprehensive similarity measures, the Gaussian interaction profile kernel and gated recurrent neural networks to predict potential drug–disease associations. It may find new indications of existing drugs and can accelerate the process of drug research and development. The similarity measure matrix is used to derive discriminative features for drugs based on their chemical fingerprints. Meanwhile, the Gaussian interaction profile kernel is employed to obtain effective features for diseases based on known disease–disease associations. Then, we implemented a competitive deep learning GRU model to deal with the prediction task. Our model achieved AUCs of 0.871 and 0.838 on the benchmark datasets Cdataset and Fdataset, respectively, and outperforms the compared state-of-the-art models on many indicators. We further carried out case studies to verify the predictive ability of our model. The experimental results show that the proposed method is a powerful tool for predicting new indications for drugs or new treatments for diseases, and that it can be regarded as a useful guide for drug repositioning and drug discovery.
Availability of data and materials
The datasets used and/or analysed during the current study are available at: https://github.com/haichengyi/DDIPred.
Abbreviations
R&D: Research and development
GIP: Gaussian interaction profile kernel
TSVD: Truncated Singular Value Decomposition
GRU: Gated Recurrent Units
SVM: Support Vector Machine
OMIM: Online Mendelian Inheritance in Man
SMILES: Simplified Molecular Input Line Entry Specification
LSTM: Long short-term memory
Acc: Accuracy
TPR: True positive rate
TNR: True negative rate
PPV: Positive predictive value
MCC: Matthews Correlation Coefficient
ROC: Receiver operating characteristic curve
AUC: The area under the receiver operating characteristic curve
References
1. Ashburn TT, Thor KB. Drug repositioning: identifying and developing new uses for existing drugs. Nat Rev Drug Discov. 2004;3:673.
2. Booth B, Zemmel R. Prospects for productivity. Nat Rev Drug Discov. 2004;3:451.
3. Dudley JT, Deshpande T, Butte AJ. Exploiting drug–disease relationships for computational drug repositioning. Brief Bioinform. 2011;12(4):303–11.
4. Nagaraj AB, Wang QQ, Joseph P, Zheng C, Chen Y, Kovalenko O, Singh S, Armstrong A, Resnick K, Zanotti K. Using a novel computational drug-repositioning approach (DrugPredict) to rapidly identify potent drug candidates for cancer treatment. Oncogene. 2018;37(3):403–14.
5. Luo H, Wang J, Li M, Luo J, Peng X, Wu FX, Pan Y. Drug repositioning based on comprehensive similarity measures and Bi-Random walk algorithm. Bioinformatics. 2016;32(17):2664.
6. Luo H, Li M, Wang S, Liu Q, Li Y, Wang J. Computational drug repositioning using low-rank matrix approximation and randomized algorithms. Bioinformatics. 2018;34(11):1904–12.
7. Chen X, Sun YZ, Zhang DH, Li JQ, Yan GY, An JY, You ZH. NRDTD: a database for clinically or experimentally supported non-coding RNAs and drug targets associations. Database. 2017;2017:bax057.
8. Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 2009;38(suppl_1):D355–60.
9. Hamosh A, Scott AF, Amberger J, Bocchini C, Valle D, Mckusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005;33(1):514–7.
10. Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ, Lerner J, Brunet JP, Subramanian A, Ross KN. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science. 2006;313(5795):1929–35.
11. Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V. DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs. Nucleic Acids Res. 2011;39(Database issue):D1035.
12. Kuhn M, Szklarczyk D, Pletscher-Frankild S, Blicher TH, Von MC, Jensen LJ, Bork P. STITCH 4: integration of protein-chemical interactions with user data. Nucleic Acids Res. 2014;42(Database issue):401–7.
13. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, Mcglinchey S, Michalovich D, Al-Lazikani B. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012;40(Database issue):1100–7.
14. Meng FR, You ZH, Chen X, Zhou Y, An JY. Prediction of drug–target interaction networks from the integration of protein sequences and drug chemical structures. Molecules. 2017;22(7):1119.
15. Luo H, Chen J, Shi L, Mikailov M, Zhu H, Wang K, He L, Yang L. DRAR-CPI: a server for identifying drug repositioning potential and adverse drug reactions via the chemical–protein interactome. Nucleic Acids Res. 2011;39(suppl_2):W492–8.
16. Guo ZH, You ZH, Huang DS, Yi HC, Chen ZH, Wang YB. A learning based framework for diverse biomolecule relationship prediction in molecular association network. Commun Biol. 2020;3(1):118.
17. Yi HC, You ZH, Huang DS, Li X, Jiang TH, Li LP. A deep learning framework for robust and accurate prediction of ncRNA-protein interactions using evolutionary information. Mol Ther Nucleic Acids. 2018;11:337–44.
18. Yi HC, You ZH, Cheng L, Zhou X, Jiang TH, Li X, Wang YB. Learning distributed representations of RNA and protein sequences and its application for predicting lncRNA-protein interactions. Comput Struct Biotechnol J. 2020;18:20–6.
19. He T, Bai L, Ong Y. Manifold regularized stochastic block model. In: 2019 IEEE 31st international conference on tools with artificial intelligence (ICTAI). 2019. p. 800–7.
20. He T, Chan KCC. Discovering fuzzy structural patterns for graph analytics. IEEE Trans Fuzzy Syst. 2018;26(5):2785–96.
21. He T, Chan KCC. MISAGA: an algorithm for mining interesting subgraphs in attributed graphs. IEEE Trans Cybern. 2018;48(5):1369–82.
22. He T, Chan KCC. Measuring boundedness for protein complex identification in PPI networks. IEEE/ACM Trans Comput Biol Bioinf. 2019;16(3):967–79.
23. He T, Liu Y, Ko TH, Chan KCC, Ong YS. Contextual correlation preserving multi-view featured graph clustering. IEEE Trans Cybern. 2020;50(10):4318–31.
24. Yi HC, You ZH, Huang DS, Guo ZH, Chan KC, Li Y. Learning representations to predict intermolecular interactions on large-scale heterogeneous molecular association network. iScience. 2020;23(7):101261.
25. Yi HC, You ZH, Guo ZH. Construction and analysis of molecular association network by combining behavior representation and node attributes. Front Genet. 2019;10:1106.
26. Chiang AP, Butte AJ. Systematic evaluation of drug–disease relationships to identify leads for novel drug uses. Clin Pharmacol Ther. 2009;86(5):507–10.
27. Gottlieb A, Stein GY, Ruppin E, Sharan R. PREDICT: a method for inferring novel drug indications with application to personalized medicine. Mol Syst Biol. 2011;7(1):496.
28. Francesco N, Yan Z, Moreira VM, Roberto T, Juha K, Mauro DA, Dario G. Drug repositioning: a machine-learning approach through data integration. J Cheminform. 2013;5(1):30.
29. Iorio F, Bosotti R, Scacheri E, Belcastro V, Mithbaokar P, Ferriero R, Murino L, Tagliaferri R, Brunetti-Pierri N, Isacchi A. Discovery of drug mode of action and drug repositioning from transcriptional responses. Proc Natl Acad Sci. 2010;107(33):14621–6.
30. Cheng F, Liu C, Jiang J, Lu W, Li W, Liu G, Zhou W, Huang J, Tang Y. Prediction of drug–target interactions and drug repositioning via network-based inference. PLoS Comput Biol. 2012;8(5):e1002503.
31. Wu C, Gudivada RC, Aronow BJ, Jegga AG. Computational drug repositioning through heterogeneous network clustering. BMC Syst Biol. 2013;7(5):1–9.
32. Wang W, Yang S, Zhang X, Li J. Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics. 2014;30(20):2923–30.
33. Martínez V, Navarro C, Cano C, Fajardo W, Blanco A. DrugNet: Network-based drug–disease prioritization by integrating heterogeneous data. Artif Intell Med. 2015;63(1):41–9.
34. Zeng X, Zhu S, Liu X, Zhou Y, Nussinov R, Cheng F. deepDR: a network-based deep learning approach to in silico drug repositioning. Bioinformatics. 2019;35(24):5191–8.
35. Chen H, Cheng F, Li J. iDrug: Integration of drug repositioning and drug-target prediction via cross-network embedding. PLoS Comput Biol. 2020;16(7):e1008040.
36. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 2008;36(Database issue):901–6.
37. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E. The Chemistry Development Kit (CDK): an open-source Java library for chemo- and bioinformatics. J Chem Inf Comput Sci. 2003;43(2):493–500.
38. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci. 1988;28(1):31–6.
39. Vanunu O, Magger O, Ruppin E, Shlomi T, Sharan R. Associating genes and protein complexes with disease via network propagation. PLoS Comput Biol. 2010;6(1):e1000641.
40. Nepusz T, Yu H, Paccanaro A. Detecting overlapping protein complexes in protein–protein interaction networks. Nat Methods. 2012;9(5):471.
41. Yu L, Huang J, Ma Z, Zhang J, Zou Y, Gao L. Inferring drug-disease associations based on known protein complexes. BMC Med Genomics. 2015;8(2):S2.
42. van Laarhoven T, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics. 2011;27(21):3036–43.
43. Chen X, Jiang ZC, Xie D, Huang DS, Zhao Q, Yan GY, You ZH. A novel computational model based on superdisease and miRNA for potential miRNA–disease association prediction. Mol BioSyst. 2017;13(6):1202–12.
44. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.
45. Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. 1999.
46. Shen Z, Bao W, Huang DS. Recurrent neural network for predicting transcription factor binding sites. Sci Rep. 2018;8(1):15270.
47. Yi HC, You ZH, Zhou X, Cheng L, Li X, Jiang TH, Chen ZH. ACP-DL: a deep learning long short-term memory model to predict anticancer peptides using high-efficiency feature representation. Mol Ther Nucleic Acids. 2019;17:1–9.
48. Wang YB, You ZH, Yang S, Yi HC, Chen ZH, Zheng K. A deep learning-based method for drug-target interaction prediction based on long short-term memory neural network. BMC Med Inform Decis Mak. 2020;20(2):49.
49. Cho K, Van Merriënboer B, Bahdanau D, Bengio Y. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259. 2014.
50. Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555. 2014.
51. Chollet F. Keras: The python deep learning library. Astrophysics Source Code Library. 2018.
52. Gal Y, Hron J, Kendall A. Concrete dropout. 2017. arXiv preprint arXiv:1705.07832.
53. Kingma DP, Ba J. Adam: a method for stochastic optimization. 2014. arXiv preprint arXiv:1412.6980v3.
54. Yi HC, You ZH, Guo ZH, Huang DS, Chan KCC. Learning representation of molecules in association network for predicting intermolecular associations. IEEE/ACM Trans Comput Biol Bioinform. 2020. https://doi.org/10.1109/TCBB.2020.2973091.
55. Yi HC, You ZH, Wang MN, Guo ZH, Wang YB, Zhou JR. RPI-SE: a stacking ensemble learning framework for ncRNA-protein interactions prediction using sequence information. BMC Bioinform. 2020;21(1):60.
Acknowledgements
The authors would like to thank all the editors and anonymous reviewers for their constructive advice.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 22 Supplement 3, 2021: Proceedings of the 2019 International Conference on Intelligent Computing (ICIC 2019): bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-22-supplement-3.
Funding
Publication costs are sponsored in part by the National Outstanding Youth Science Foundation of NSFC under Grant 61722212, in part by the National Natural Science Foundation of China under Grants 61873212, 61861146002, and 61732012. The funders have no role in study design, data collection, data analysis, data interpretation, or writing of the manuscript.
Author information
Contributions
HCY, ZHY conceived the algorithm, carried out analyses, prepared the data sets, carried out experiments, and wrote the manuscript; LW, XRS, XZ and THJ designed, performed, and analyzed experiments and wrote the manuscript; All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Yi, HC., You, ZH., Wang, L. et al. In silico drug repositioning using deep learning and comprehensive similarity measures. BMC Bioinformatics 22, 293 (2021). https://doi.org/10.1186/s12859-020-03882-y
Keywords
 Drug repositioning
 Drug–disease interaction
 Gated recurrent units
 Gaussian interaction profile kernel
 Machine learning