In silico drug repositioning using deep learning and comprehensive similarity measures

Abstract

Background

Drug repositioning, meaning finding new uses for existing drugs, can accelerate the research and development of new drugs. Various computational methods have been presented to predict novel drug–disease associations for drug repositioning based on similarity measures among drugs and diseases. However, previous studies have not fully utilized the known associations between drugs and diseases.

Methods

In this work, we develop a deep gated recurrent units model to predict potential drug–disease interactions using comprehensive similarity measures and the Gaussian interaction profile kernel. More specifically, the similarity measure is used to extract discriminative features for drugs based on their chemical fingerprints. Meanwhile, the Gaussian interaction profile kernel is employed to obtain efficient features of diseases based on known disease–disease associations. Then, a deep gated recurrent units model is developed to predict potential drug–disease interactions.

Results

The performance of the proposed model is evaluated on two benchmark datasets under tenfold cross-validation. To further verify its predictive ability, case studies predicting new potential indications of drugs were carried out.

Conclusion

The experimental results indicate that the proposed model is a useful tool for predicting new indications for drugs or new treatments for diseases, and that it can accelerate drug repositioning and related drug research and discovery.

Background

Although impressive advances have been witnessed in life sciences, technology and genomics over the past years, bringing a new drug to patients still takes ~ 15 years and 800 million to one billion dollars [1,2,3]. The traditional drug research and development (R&D) process requires testing for side effects and safety through cellular model systems, extensive animal models and clinical trials. The average cost of new drug discovery has increased significantly and more than 90% of drug candidates fail during development, which makes pharmaceutical R&D tremendously expensive, time-consuming and high-risk [3, 4]. This in turn leads to a small number of new drugs on the market and to high prices. Drug repositioning, or drug repurposing, which identifies new clinical indications for approved drugs, has been used as an important strategy to maximize the potential usage of existing drugs and increase the number of new drugs [5, 6]. Compared with the traditional drug R&D process, drug repositioning has two major advantages. The first is that the safety of approved drugs has been verified by rigorous clinical trials: the repositioning candidates have already passed the necessary tests of de novo drug R&D, so these drugs are considered safe to use. The other is that drug repositioning shortens the process of drug discovery and preparation, which saves time and money.

In recent years, the establishment of online public databases on pharmacochemical properties, chemical structures of drug molecules, drug–drug interactions, disease–disease interactions, related genomic sequences and side effects, such as KEGG [8], OMIM [9], CMap [10], DrugBank [11], STITCH [12] and ChEMBL [13], has promoted the study of drug–disease interactions and drug repositioning [7]. The goal of drug repositioning is to find potential indications for existing approved drugs and to apply the newly identified drug candidates to the clinical treatment of diseases other than the originally targeted one. By integrating data from these various sources, many machine learning methods have been developed to date [14,15,16,17,18,19,20,21,22,23,24,25].

For instance, Chiang et al. proposed a ‘guilt-by-association’ network-based model to predict potential drug–disease associations. This method assumes that if two diseases have similar treatment profiles, then a drug used for only one of them can also be used for the other, thus recommending a new use for the drug. However, this approach tends to favor older drugs with multiple different uses and diseases with many different treatments [26]. Gottlieb et al. [27] demonstrated a method for large-scale prediction of drug indications, named PREDICT, which uses comprehensive drug–drug and disease–disease similarity measures to obtain discriminative features. Napolitano et al. [28] proposed a multi-class Support Vector Machine (SVM) classifier to predict novel drug–disease interactions, defining drug similarities from combined drug datasets. Moreover, some network-based methods have also been put forward in recent years [29, 30]. Wu et al. [31] introduced a weighted drug–disease heterogeneous network to predict new uses of drugs by clustering, based on experimentally proven drug–target interactions and gene–disease relationships. Wang et al. [32] also constructed a heterogeneous network that integrates drug targets, diseases and drugs into a unified framework and ranks candidate drugs for each disease by an iterative approach. Martinez et al. [33] proposed DrugNet to perform drug–disease and disease–drug prioritization with a network-based prioritization method, which can integrate many types of data from complex networks involving interconnected drugs, proteins and diseases.

More recently, some recommendation-system-based methods have been developed for computational drug discovery [34, 35]. Luo et al. [5] presented the MBiRW model to identify new interactions for known drugs, which applies comprehensive similarity measures and the Bi-Random walk algorithm. Thereafter, Nagaraj et al. [4] developed a novel drug discovery strategy, DrugPredict, which combines a computational model with biological testing in cell lines in order to rapidly identify novel drug candidates for epithelial ovarian cancer. Their work exploited unique repositioning opportunities rendered by a vast amount of data on disease genomics, phenomics, treatments and genetic pathways [4]. Matrix factorization methods have also been used to identify novel drug–disease interactions; they take one input matrix and produce two related matrices whose product approximates the original input matrix, e.g. kernel Bayesian matrix factorization and collaborative matrix factorization. Most existing methods rely on the properties of some important drugs or diseases to derive the drug similarity and disease similarity measures. However, previous studies have not made use of some known interactions between drugs and diseases, which contain valuable information that can be exploited to improve similarity measures.

In this study, we propose a deep learning model for potential Drug–Disease Interactions Prediction, named DDIPred. It applies a gated recurrent neural network to predict new indications of existing drugs using comprehensive similarity measures and Gaussian interaction profile kernel features. The workflow of this study is shown in Fig. 1. More specifically, the similarity measures are calculated based on drug chemical structures, disease phenotypes and known drug–disease interactions. Furthermore, the Gaussian interaction profile (GIP) kernel is applied to extract effective features of drugs and diseases from known drug–disease interactions. Truncated singular value decomposition (TSVD) is then used to reduce the dimensionality of the two combined features [17]. Finally, we feed these discriminative features into a deep gated recurrent units (GRU) model to learn and predict novel drug–disease interactions, which represent potential new uses of existing drugs. The performance of the proposed model is evaluated on two gold standard datasets under tenfold cross-validation, and case studies are conducted to further verify its predictive ability. Experimental results demonstrate that the proposed model has a strong capability to discover potential new uses of drugs.

Fig. 1 The workflow of DDIPred

Materials and methodology

In this section, the datasets used in this study are introduced first. Then, based on the basic hypothesis that similar drugs have similar indications, we propose a novel deep learning approach that integrates comprehensive similarity measures and the Gaussian interaction profile kernel with a GRU model to predict potential drug–disease interactions. We present the details of the similarity measures, the Gaussian interaction profile kernel and the implementation of the GRU model. We also describe the comparison models, experimental methods and evaluation criteria in this section.

Benchmark datasets

To evaluate the performance of our model, we selected two widely used benchmark datasets, Fdataset and Cdataset. The gold standard dataset Fdataset is obtained from Gottlieb et al.’s work [27] and is made up of multiple data sources. More concretely, this dataset contains 1933 known associations between 593 drugs from DrugBank [36] and 313 diseases registered in OMIM [9] (Online Mendelian Inheritance in Man). We also used another benchmark dataset, Cdataset, which was first presented in Luo et al.’s paper [5]. It contains 2532 drug–disease associations involving 409 diseases and 663 drugs. Each dataset consists of three matrices: the drug–drug similarity matrix \(S_{D} \in R^{m \times m}\), the disease–disease similarity matrix \(S_{d} \in R^{n \times n}\) and the drug–disease interaction matrix \(I \in R^{m \times n}\). \(S_{D}\) and \(S_{d}\) are symmetric matrices, and each row or column represents the similarity between one drug and the other drugs, or between one disease and the other diseases, respectively. The details of the similarity calculation are given in the next section. The m rows of matrix \(I\) correspond to m drugs and the n columns to n diseases; when drug \(D_{i}\) is associated with disease \(d_{j}\), the element \(I\left( {i,j} \right)\) is set to 1, otherwise it is set to 0. The interacting drug–disease pairs are used as positive samples, and the same number of pairs without known interaction are randomly selected as negative samples. The details of these two datasets are shown in Table 1.
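The construction of the interaction matrix and the balanced sampling of negatives described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the fixed random seed and the toy indices are assumptions made for the example.

```python
import random

def build_interaction_matrix(n_drugs, n_diseases, known_pairs):
    """Return an m x n 0/1 matrix with 1 at every known (drug, disease) pair."""
    matrix = [[0] * n_diseases for _ in range(n_drugs)]
    for i, j in known_pairs:
        matrix[i][j] = 1
    return matrix

def sample_balanced_pairs(matrix, seed=0):
    """Return (positives, negatives) with as many random negatives as positives."""
    positives, unknown = [], []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            (positives if value == 1 else unknown).append((i, j))
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    negatives = rng.sample(unknown, len(positives))
    return positives, negatives

# Toy example: 3 drugs, 4 diseases, 3 known associations (made-up indices).
I = build_interaction_matrix(3, 4, [(0, 1), (1, 3), (2, 0)])
positives, negatives = sample_balanced_pairs(I)
```

Note that pairs without a known interaction are treated as negatives only by convention; some of them may be true but undiscovered associations.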

Table 1 The details of the two drug–disease associations benchmark datasets

Similarity measures

Following the description above, drug similarity is calculated from chemical structure information, which is one of the drug-related properties [5]. More concretely, the similarity between two drugs is calculated with the Chemistry Development Kit [37] from their 2D chemical fingerprints, which are derived from the Simplified Molecular Input Line Entry Specification (SMILES) [38] strings of all drugs downloaded from DrugBank. Moreover, the correlation between the similarity of two drugs and their common diseases is analyzed, and similarity values that are not discriminative are set close to 0. The similarities are adjusted using the logistic regression function, which has been used to modify disease–gene association similarity in [39]. The function is defined as follows:

$$L\left( x \right) = \frac{1}{{1 + e^{ax + b} }}$$
(1)

where x represents the similarity value, and a and b are adjusting parameters. Then, the drugs are clustered based on known drug–disease associations by using a graph clustering method, ClusterONE [40], which has been employed to detect valuable modules for drug repositioning [5, 31, 41]. The cohesiveness of a cluster M is defined by ClusterONE as follows:

$$f\left( M \right) = \frac{{C_{in} \left( M \right)}}{{(C_{in} \left( M \right) + C_{bound} \left( M \right) + P\left( M \right))}}$$
(2)

where \(C_{in} \left( M \right)\) indicates the total weight of edges within a set of vertices M, \(C_{bound} \left( M \right)\) stands for the total weight of edges connecting this set to the rest of the graph, and P(M) is the penalty term [5].
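Equations (1) and (2) can be illustrated with a short sketch. The parameter values for a and b and the toy graph are assumptions chosen only for illustration (the text does not state the values used); a negative a is assumed so that larger raw similarities map to larger adjusted values.

```python
import math

def logistic_adjust(x, a, b):
    """Eq. (1): L(x) = 1 / (1 + exp(a * x + b))."""
    return 1.0 / (1.0 + math.exp(a * x + b))

def cohesiveness(weights, members, penalty=0.0):
    """Eq. (2): f(M) = C_in / (C_in + C_bound + P).

    weights: dict mapping an undirected edge (u, v) to its weight.
    members: set of vertices forming the cluster M.
    """
    c_in = c_bound = 0.0
    for (u, v), w in weights.items():
        inside = (u in members) + (v in members)
        if inside == 2:      # both endpoints in M: internal edge
            c_in += w
        elif inside == 1:    # exactly one endpoint in M: boundary edge
            c_bound += w
    return c_in / (c_in + c_bound + penalty)

# Illustrative parameters only (assumed, not taken from the paper).
a, b = -15.0, 7.5
low, high = logistic_adjust(0.2, a, b), logistic_adjust(0.8, a, b)

# Toy weighted graph: two internal edges and one boundary edge for M.
edges = {("d1", "d2"): 1.0, ("d2", "d3"): 1.0, ("d3", "d4"): 0.5}
score = cohesiveness(edges, {"d1", "d2", "d3"})
```

With these assumed parameters, weak similarities are pushed towards 0 and strong ones towards 1, which matches the stated goal of suppressing non-discriminative values.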

Gaussian interaction profile kernel

For diseases, we adopted the Gaussian interaction profile kernel [42] to obtain the representation of disease–disease associations [43]. It is based on the assumption that diseases with a similar interaction pattern with drugs are likely to show similar interaction behavior with new drugs [42]. A similar assumption can also be applied to drugs. Suppose (\(D_{i}\), \(D_{j}\)) indicates two different drugs, while (\(d_{i}\), \(d_{j}\)) represents two different diseases. The Gaussian interaction profile kernel similarity KG of two diseases can be calculated as follows:

$$KG_{disease} \left( {d_{i} , d_{j} } \right) = {\text{exp}}\left( { - \alpha_{d} \left\| {y_{{d_{i} }} - y_{{d_{j} }} } \right\|^{2} } \right)$$
(3)
$$\alpha_{d} = \frac{{{\alpha_{d}}^{{\prime}} }}{{\left( {\frac{1}{{n_{d} }}\mathop \sum \nolimits_{i = 1}^{{n_{d} }} \left\| {y_{{d_{i} }} } \right\|^{2} } \right)}}$$
(4)

Here, \(y_{{d_{i} }}\) denotes the interaction profile of disease \(d_{i}\), i.e. the column of \(I\) that records its known associations with all drugs. For simplicity, \({\alpha_{d}}^{{\prime}}\) is set to 0.5, following [42], and \(n_{d}\) stands for the number of diseases. Then, the matrix decomposition algorithm TSVD was further applied to reduce the dimension of these features.
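A minimal sketch of the kernel in Eqs. (3) and (4) is given below. It assumes that the norm is the squared Euclidean distance between binary interaction profiles and that the profile of a disease is its column of the interaction matrix; the function name and the toy matrix are made up for illustration, and the TSVD step is not shown.

```python
import math

def gip_kernel(profiles, alpha_prime=0.5):
    """Gaussian interaction profile kernel between interaction profiles."""
    n = len(profiles)
    # Eq. (4): normalise the bandwidth by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    alpha = alpha_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-alpha * sq_dist)  # Eq. (3)
    return kernel

# Toy interaction matrix (rows = drugs, columns = diseases); values are made up.
I = [[1, 0, 1],
     [1, 0, 0],
     [0, 1, 1]]
disease_profiles = [list(col) for col in zip(*I)]  # one column of I per disease
K = gip_kernel(disease_profiles)
```

Passing the rows of the matrix instead of its columns gives the analogous kernel for drugs.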

Implementation of gated recurrent units neural network

To overcome several known defects of the standard recurrent neural network (RNN) model, a series of improved models has been proposed in the deep learning field. Among them, long short-term memory (LSTM) [44, 45] and similar variants are among the best-performing models and are widely used in many fields [46,47,48]. The main reason for their effectiveness is the introduction of gating mechanisms. The Gated Recurrent Unit (GRU) was proposed by Cho et al. [49]; it has only a reset gate and an update gate, and all memory content is fully exposed at each timestep. We follow a calculation process similar to that in [50].

The update gate \(z_{t}\) is calculated by:

$$z_{t} = sigmoid\left( {W_{z} i_{t} + U_{z} h_{t - 1} - b_{z} } \right)$$
(5)

Here, \(i_{t}\) indicates the input vector of the GRU, \(h_{t - 1}\) stands for the previous output of the model, and \(W_{z}\), \(U_{z}\) and \(b_{z}\) are the forward matrix, recurrent matrix and bias of the update gate, respectively. Similarly, the reset gate is computed as follows:

$$r_{t} = sigmoid\left( {W_{r} i_{t} + U_{r} h_{t - 1} - b_{r} } \right)$$
(6)

where the parameters are defined analogously to those above. Moreover, the candidate memory state \(c_{t}\) can be computed by:

$$c_{t} = \sigma_{h} \left( {W_{h} i_{t} + U_{h} \left( {r_{t} *h_{t - 1} } \right) - b_{h} } \right)$$
(7)

where \(\sigma_{h}\) is the tanh function and \(*\) means an element-wise multiplication. Finally, the memory state \(h_{t}\) of the GRU model is defined as:

$$h_{t} = \left( {1 - z_{t} } \right)h_{t - 1} + z_{t} c_{t}$$
(8)
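The recurrence in Eqs. (5)–(8) can be written as a single time step in plain Python. This is an illustrative sketch, not the Keras implementation used in the study: it keeps the bias subtraction exactly as the equations are written, and the function names, parameter layout and toy values are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, i_t, U, h, b):
    """Return W i_t + U h - b, using the sign convention of Eqs. (5)-(7)."""
    return [
        sum(w * x for w, x in zip(W[k], i_t))
        + sum(u * y for u, y in zip(U[k], h))
        - b[k]
        for k in range(len(b))
    ]

def gru_step(i_t, h_prev, params):
    """One GRU time step; params maps 'z', 'r', 'h' to (W, U, b) triples."""
    Wz, Uz, bz = params["z"]
    Wr, Ur, br = params["r"]
    Wh, Uh, bh = params["h"]
    z = [sigmoid(v) for v in affine(Wz, i_t, Uz, h_prev, bz)]    # Eq. (5)
    r = [sigmoid(v) for v in affine(Wr, i_t, Ur, h_prev, br)]    # Eq. (6)
    gated = [rk * hk for rk, hk in zip(r, h_prev)]               # r_t * h_{t-1}
    c = [math.tanh(v) for v in affine(Wh, i_t, Uh, gated, bh)]   # Eq. (7)
    return [(1 - zk) * hk + zk * ck for zk, hk, ck in zip(z, h_prev, c)]  # Eq. (8)

# Toy check with all-zero parameters: z_t = 0.5 and c_t = 0,
# so the new state is half of the previous state.
zeros_w = [[0.0, 0.0], [0.0, 0.0]]
params = {name: (zeros_w, zeros_w, [0.0, 0.0]) for name in ("z", "r", "h")}
h_next = gru_step([1.0, -1.0], [0.4, -0.2], params)
```

In a trained model the weights are learned; the all-zero setting is used here only because its output is easy to verify by hand.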

In practice, the GRU model is implemented with the Keras framework [51]. Considering the limited scale of the problem, we set the number of hidden neurons in the GRU input layer to 128 and add a Dense (fully connected) layer after the output layer as the classifier to reduce the output to the final prediction probability. The sigmoid function is employed as the activation function, which can be expressed as follows:

$$\upsigma = {\text{sigmoid}}\left( x \right) = \frac{1}{{\left( {1 + e^{ - x} } \right)}}$$
(9)

Before the activation layer, we applied Dropout to reduce overfitting and enhance the model’s robustness [52]. The dropout rate was set to 0.25. Binary cross-entropy was used as the loss function, which corresponds to the sigmoid activation function; the choice of loss function has a significant influence on the performance of a machine learning model. The binary cross-entropy is defined as:

$$L\left( {{\text{t}},{\text{p}}} \right) = - \left( {\left( {1 - {\text{t}}} \right) \times \log \left( {1 - {\text{p}}} \right) + {\text{t}} \times {\text{log}}\left( {\text{p}} \right)} \right)$$
(10)

where p and t denote the predicted output and the true label, respectively. Moreover, we used the Adam optimizer to update the weights of the model. Adam integrates the advantages of both RMSProp and AdaGrad and is popular in this field [53].
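As a small worked example of the loss, the standard binary cross-entropy for a single sample, which weights log(p) by t and log(1 − p) by (1 − t), can be sketched as below. The clipping constant eps and the helper names are assumptions added to keep the logarithm finite; they are not taken from the paper.

```python
import math

def binary_cross_entropy(t, p, eps=1e-7):
    """Cross-entropy between a 0/1 label t and a predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(t * math.log(p) + (1 - t) * math.log(1.0 - p))

def mean_loss(labels, probs):
    """Average the per-sample loss over a batch."""
    return sum(binary_cross_entropy(t, p) for t, p in zip(labels, probs)) / len(labels)

# A confident correct prediction is penalised far less than a confident wrong one.
confident_right = binary_cross_entropy(1, 0.9)
confident_wrong = binary_cross_entropy(1, 0.1)
batch_loss = mean_loss([1, 0], [0.5, 0.5])
```

An uninformative prediction of 0.5 costs log 2 regardless of the label, which is a convenient reference value when reading training curves.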

Performance evaluation metrics

In order to comprehensively evaluate the performance of our model, we follow widely used evaluation indicators and strategies [54, 55]. Tenfold cross-validation was applied to evaluate the performance of DDIPred. In each validation, all data are randomly divided into ten equal parts; nine parts are taken as training data and the remaining part as test data. To guarantee an unbiased comparison, we confirmed that there is no overlap between training data and test data. The final validation result is the mean value over the ten folds, reported with standard deviations. We use widely adopted evaluation criteria, including accuracy (Acc), true positive rate (TPR), true negative rate (TNR), positive predictive value (PPV) and Matthews Correlation Coefficient (MCC), defined as:

$${\text{Acc}} = \frac{TN + TP}{{TN + TP + FN + FP}}$$
(11)
$${\text{TPR}} = \frac{TP}{{TP + FN}}$$
(12)
$${\text{TNR }} = \frac{TN}{{TN + FP}}$$
(13)
$${\text{PPV}} = \frac{TP}{{TP + FP}}$$
(14)
$${\text{MCC}} = \frac{TP \times TN - FP \times FN}{{\sqrt {\left( {TP + FP} \right)\left( {TP + FN} \right)\left( {TN + FP} \right)\left( {TN + FN} \right)} }}$$
(15)

where TN stands for the number of true negatives, TP for the number of true positives, FN for the number of false negatives and FP for the number of false positives. The Receiver Operating Characteristic (ROC) curve and the area under the ROC curve (AUC) are also adopted to evaluate the performance. Considering the specificity of the research task, the top-N ranked predictions are more valuable for related drug development or disease treatment research, so we also test the performance of the model based on the count of accurately retrieved true drug–disease interactions.
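The criteria in Eqs. (11)–(15) follow directly from the four confusion counts, as the sketch below shows. The function names and the toy labels are illustrative assumptions; returning 0.0 for an empty denominator is a convention chosen here, not something specified in the text.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for 0/1 labels and 0/1 predictions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def ratio(num, den):
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def evaluate(tp, tn, fp, fn):
    """Return Acc, TPR, TNR, PPV and MCC as defined in Eqs. (11)-(15)."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Acc": ratio(tn + tp, tn + tp + fn + fp),
        "TPR": ratio(tp, tp + fn),
        "TNR": ratio(tn, tn + fp),
        "PPV": ratio(tp, tp + fp),
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
    }

# Toy fold: 4 positives and 4 negatives, with one error in each class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = evaluate(*confusion_counts(y_true, y_pred))
```

In a tenfold setting, this function would be called once per fold and the resulting values averaged.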

Results and discussion

In this study, we propose a deep learning model to predict potential drug–disease interactions, which can advance the discovery of new uses of existing drugs or new treatments of diseases. In this section, we systematically evaluate the performance of the model. First, we evaluate the prediction capability of DDIPred on two benchmark datasets. Then, we compare it with other state-of-the-art models under the same experimental conditions. Furthermore, we conduct case studies to verify the practicability of the proposed method.

Drug–disease interactions prediction capability evaluation

First, the drug–disease interaction prediction capability of DDIPred is evaluated on the two benchmark datasets, Fdataset and Cdataset. The details of tenfold cross-validation are listed in Tables 2 and 3 for Cdataset and Fdataset, respectively. The average values over the ten folds are taken as the final reported results, as shown in Fig. 2.

Table 2 The tenfold cross-validation details on Cdataset
Table 3 The tenfold cross-validation details on Fdataset
Fig. 2 The performance of DDIPred on two benchmark datasets

As shown in Table 2, the mean accuracy of tenfold cross-validation on Cdataset is 81.48% with standard deviation 1.48%, the mean TPR is 80.59% with standard deviation 2.86%, the mean TNR is 83.01% with standard deviation 2.71%, the mean PPV is 80.03% with standard deviation 2.88% and the mean MCC of DDIPred on Cdataset is 63.06% with standard deviation 2.99%. The rigorous cross-validation results show that our model has clear predictive ability for the associations between drugs and diseases.

The tenfold cross-validation performance of DDIPred on Fdataset is shown in Table 3. The average accuracy on Fdataset is 77.83% with standard deviation 2.43%, and the average TPR is 77.13% with standard deviation 4.37%, the average TNR is 79.22% with standard deviation 3.48%, the average PPV is 76.57% with standard deviation 4.06% and the mean MCC of DDIPred on Fdataset is 55.80% with standard deviation 4.93%. The performance of DDIPred on this dataset is slightly weaker than on the Cdataset, but it still has acceptable results, which means it is competent for the drug–disease associations prediction task.

Comparison with other state-of-the-art methods

We further compared the proposed model with other state-of-the-art methods on the same datasets under the same experimental conditions, including previous studies and the widely used machine learning model Support Vector Machine (SVM). The comparison results are reported in Tables 4 and 5 and Fig. 3.

Table 4 Comparison of the AUC of previous studies and DDIPred on datasets
Table 5 Comparing the tenfold cross-validation performance of DDIPred and SVM on two gold standard datasets
Fig. 3 The performance of DDIPred and the comparison method on two benchmark datasets: (a) the ROC and AUC of DDIPred on Cdataset; (b) the ROC and AUC of SVM on Cdataset; (c) the ROC and AUC of DDIPred on Fdataset; (d) the ROC and AUC of SVM on Fdataset

We compared the AUC of our model with those of previous studies, including DrugNet [33] and HGBI [32]. Considering the differences in experimental evaluation indicators between studies, we only compared the AUC value reported in every study, which best reflects overall model performance. As shown in Table 4 and Fig. 3, DrugNet obtained an AUC of 0.804 on Cdataset and an AUC of 0.778 on Fdataset. HGBI performed better than DrugNet, with AUCs of 0.858 and 0.829 on Cdataset and Fdataset, respectively. The AUCs of DDIPred are 0.871 and 0.838 on Cdataset and Fdataset, respectively, so our model performs best on both datasets.

Furthermore, we compared our model with the widely used machine learning model SVM, which is often used as a baseline and usually performs well in various fields. The feature input, tenfold cross-validation splits, evaluation metrics and other experimental conditions are exactly the same for DDIPred and the SVM model. The parameters of the SVM are determined by grid search. The results are shown in Table 5. Our model improves on the SVM in all indicators.

Case studies

In order to further examine the capability of the proposed model in predicting new associations between drugs and diseases, a drug or a disease is selected as the case to be examined. The features of the tested drug or disease are combined with the features of each candidate disease or drug to form the test data. These data are then fed into the trained model to obtain prediction scores. Finally, all candidates are ranked by prediction score. Zoledronic acid (DrugBank Accession Number: DB00399) and Dexamethasone (DrugBank Accession Number: DB01234) were selected for our case studies. Zoledronic acid is usually used to treat bone metastasis pain and hypercalcemia of malignancy. It can also help to prevent skeletal fractures in multiple myeloma and prostate cancer patients. Dexamethasone has anti-inflammatory, anti-immune, anti-toxin, antipyretic and other effects, and has a considerable impact on metabolism. The prediction results are shown in Tables 6 and 7. Our model found the diseases most relevant to the target drugs: both confirmed indications and new potential candidate diseases are successfully predicted.
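The ranking procedure used in the case studies can be sketched as follows. This is only an illustration: the scoring function below is a hypothetical placeholder standing in for the trained model, the feature vectors and disease names are made up, and combining features by concatenation is an assumption, since the text only says the features were "combined".

```python
def rank_candidates(query_feature, candidate_features, score_fn, top_n=10):
    """Score the query paired with every candidate and return the top_n by score."""
    scored = [
        (name, score_fn(query_feature + feature))  # concatenate the two feature vectors
        for name, feature in candidate_features.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

def toy_score(pair_feature):
    """Hypothetical scorer used in place of the trained model."""
    return sum(pair_feature) / len(pair_feature)

# Made-up features for one query drug and three candidate diseases.
drug = [0.2, 0.8]
diseases = {
    "disease_A": [0.9, 0.7],
    "disease_B": [0.1, 0.1],
    "disease_C": [0.5, 0.5],
}
top = rank_candidates(drug, diseases, toy_score, top_n=2)
```

In practice, the top-ranked candidates would then be checked against known indications and the literature, as done for the two drugs above.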

Table 6 Predicted diseases most relevant to Zoledronic acid
Table 7 Predicted diseases most relevant to Dexamethasone

Conclusion

In this work, we proposed a novel deep learning model, DDIPred, which uses comprehensive similarity measures, the Gaussian interaction profile kernel and gated recurrent neural networks to predict potential drug–disease associations. It may find new indications of existing drugs and can accelerate the process of drug research and development. The similarity measure matrix is used to extract discriminative features for drugs based on their chemical fingerprints. Meanwhile, the Gaussian interaction profile kernel is employed to obtain efficient features for diseases based on known disease–disease associations. Then, we implemented a competitive deep learning GRU model to deal with the prediction task. Our model achieved AUCs of 0.871 and 0.838 on Cdataset and Fdataset, respectively, and outperforms the compared state-of-the-art models on many indicators. We further conducted case studies to verify the predictive ability of our model. The experimental results indicate that the proposed method is a powerful tool for predicting new indications for drugs or new treatments for diseases, and can be regarded as a useful guide for drug repositioning and drug discovery.

Availability of data and materials

The datasets used and/or analysed during the current study are available at: https://github.com/haichengyi/DDIPred.

Abbreviations

R&D: Research and development
GIP: Gaussian interaction profile kernel
TSVD: Truncated Singular Value Decomposition
GRU: Gated Recurrent Units
SVM: Support Vector Machine
OMIM: Online Mendelian Inheritance in Man
SMILES: Simplified Molecular Input Line Entry Specification
LSTM: Long short-term memory
Acc: Accuracy
TPR: True positive rate
TNR: True negative rate
PPV: Positive predictive value
MCC: Matthews Correlation Coefficient
ROC: Receiver operating characteristic curve
AUC: The area under the receiver operating characteristic curve

References

  1. 1.

    Ashburn TT, Thor KB. Drug repositioning: identifying and developing new uses for existing drugs. Nat Rev Drug Discov. 2004;3:673.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  2. 2.

    Booth B, Zemmel R. Prospects for productivity. Nat Rev Drug Discov. 2004;3:451.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  3. 3.

    Dudley JT, Deshpande T, Butte AJ. Exploiting drug–disease relationships for computational drug repositioning. Brief Bioinform. 2011;12(4):303–11.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  4. 4.

    Nagaraj AB, Wang QQ, Joseph P, Zheng C, Chen Y, Kovalenko O, Singh S, Armstrong A, Resnick K, Zanotti K. Using a novel computational drug-repositioning approach (DrugPredict) to rapidly identify potent drug candidates for cancer treatment. Oncogene. 2018;37(3):403–14.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  5. 5.

    Luo H, Wang J, Li M, Luo J, Peng X, Wu FX, Pan Y. Drug repositioning based on comprehensive similarity measures and Bi-Random walk algorithm. Bioinformatics. 2016;32(17):2664.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  6. 6.

    Luo H, Li M, Wang S, Liu Q, Li Y, Wang J. Computational drug repositioning using low-rank matrix approximation and randomized algorithms. Bioinformatics. 2018;34(11):1904–12.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  7. 7.

    Chen X, Sun Y-Z, Zhang D-H, Li J-Q, Yan G-Y, An J-Y, You Z-H: NRDTD: a database for clinically or experimentally supported non-coding RNAs and drug targets associations. Database. 2017;2017:bax057.

  8. 8.

    Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 2009;38(suppl_1):D355–60.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  9. 9.

    Hamosh A, Scott AF, Amberger J, Bocchini C, Valle D, Mckusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005;33(1):514–7.

    Google Scholar 

  10. 10.

    Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ, Lerner J, Brunet J-P, Subramanian A, Ross KN. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science. 2006;313(5795):1929–35.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  11. 11.

    Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V. DrugBank 30: a comprehensive resource for ‘Omics’ research on drugs. Nucleic Acids Res. 2011;39(Database issue):D1035.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  12. 12.

    Kuhn M, Szklarczyk D, Pletscher-Frankild S, Blicher TH, Von MC, Jensen LJ, Bork P. STITCH 4: integration of protein-chemical interactions with user data. Nucleic Acids Res. 2014;42(Database issue):401–7.

    Article  CAS  Google Scholar 

  13. 13.

    Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, Mcglinchey S, Michalovich D, Al-Lazikani B. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012;40(Database issue):1100–7.

    Article  CAS  Google Scholar 

  14. 14.

    Meng F-R, You Z-H, Chen X, Zhou Y, An J-Y. Prediction of drug–target interaction networks from the integration of protein sequences and drug chemical structures. Molecules. 2017;22(7):1119.

    PubMed Central  Article  CAS  Google Scholar 

  15. 15.

    Luo H, Chen J, Shi L, Mikailov M, Zhu H, Wang K, He L, Yang L. DRAR-CPI: a server for identifying drug repositioning potential and adverse drug reactions via the chemical–protein interactome. Nucleic Acids Res. 2011;39(suppl_2):W492–8.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  16. 16.

    Guo Z-H, You Z-H, Huang D-S, Yi H-C, Chen Z-H, Wang Y-B. A learning based framework for diverse biomolecule relationship prediction in molecular association network. Commun Biol. 2020;3(1):118.

    PubMed  PubMed Central  Article  Google Scholar 

  17. 17.

    Yi H-C, You Z-H, Huang D-S, Li X, Jiang T-H, Li L-P. A deep learning framework for robust and accurate prediction of ncRNA-protein interactions using evolutionary information. Mol Ther Nucleic Acids. 2018;11:337–44.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  18. 18.

    Yi H-C, You Z-H, Cheng L, Zhou X, Jiang T-H, Li X, Wang Y-B. Learning distributed representations of RNA and protein sequences and its application for predicting lncRNA-protein interactions. Comput Struct Biotechnol J. 2020;18:20–6.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  19. 19.

    He T, Bai L, Ong Y. Manifold regularized stochastic block model. In: 2019 IEEE 31st international conference on tools with artificial intelligence (ICTAI). 2019. P. 800–7.

  20. 20.

    He T, Chan KCC. Discovering fuzzy structural patterns for graph analytics. IEEE Trans Fuzzy Syst. 2018;26(5):2785–96.

    Article  Google Scholar 

  21. 21.

    He T, Chan KCC. MISAGA: an algorithm for mining interesting subgraphs in attributed graphs. IEEE Trans Cybern. 2018;48(5):1369–82.

    PubMed  Article  PubMed Central  Google Scholar 

  22. 22.

    He T, Chan KCC. Measuring boundedness for protein complex identification in PPI networks. IEEE/ACM Trans Comput Biol Bioinf. 2019;16(3):967–79.

    CAS  Article  Google Scholar 

  23. 23.

    He T, Liu Y, Ko TH, Chan KCC, Ong YS. Contextual correlation preserving multiview featured graph clustering. IEEE Trans Cybern. 2020;50(10):4318–4331.

  24. 24.

    Yi H-C, You Z-H, Huang D-S, Guo Z-H, Chan KC, Li Y. Learning representations to predict intermolecular interactions on large-scale heterogeneous molecular association network. iScience. 2020;23(7):101261.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  25. 25.

    Yi H-C, You Z-H, Guo Z-H. Construction and analysis of molecular association network by combining behavior representation and node attributes. Front Genet. 2019;10:1106.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  26. 26.

    Chiang AP, Butte AJ. Systematic evaluation of drug–disease relationships to identify leads for novel drug uses. Clin Pharmacol Ther. 2009;86(5):507–10.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  27. 27.

    Gottlieb A, Stein GY, Ruppin E, Sharan R. PREDICT: a method for inferring novel drug indications with application to personalized medicine. Mol Syst Biol. 2011;7(1):496.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  28. 28.

    Napolitano F, Zhao Y, Moreira VM, Tagliaferri R, Kere J, D'Amato M, Greco D. Drug repositioning: a machine-learning approach through data integration. J Cheminform. 2013;5(1):30.

  29.

    Iorio F, Bosotti R, Scacheri E, Belcastro V, Mithbaokar P, Ferriero R, Murino L, Tagliaferri R, Brunetti-Pierri N, Isacchi A. Discovery of drug mode of action and drug repositioning from transcriptional responses. Proc Natl Acad Sci. 2010;107(33):14621–6.

  30.

    Cheng F, Liu C, Jiang J, Lu W, Li W, Liu G, Zhou W, Huang J, Tang Y. Prediction of drug–target interactions and drug repositioning via network-based inference. PLoS Comput Biol. 2012;8(5):e1002503.

  31.

    Wu C, Gudivada RC, Aronow BJ, Jegga AG. Computational drug repositioning through heterogeneous network clustering. BMC Syst Biol. 2013;7(5):1–9.

  32.

    Wang W, Yang S, Zhang X, Li J. Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics. 2014;30(20):2923–30.

  33.

    Martínez V, Navarro C, Cano C, Fajardo W, Blanco A. DrugNet: network-based drug–disease prioritization by integrating heterogeneous data. Artif Intell Med. 2015;63(1):41–9.

  34.

    Zeng X, Zhu S, Liu X, Zhou Y, Nussinov R, Cheng F. deepDR: a network-based deep learning approach to in silico drug repositioning. Bioinformatics. 2019;35(24):5191–8.

  35.

    Chen H, Cheng F, Li J. iDrug: integration of drug repositioning and drug-target prediction via cross-network embedding. PLoS Comput Biol. 2020;16(7):e1008040.

  36.

    Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 2008;36(Database issue):901–6.

  37.

    Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E. The Chemistry Development Kit (CDK): an open-source Java library for chemo-and bioinformatics. J Chem Inf Comput Sci. 2003;43(2):493–500.

  38.

    Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci. 1988;28(1):31–6.

  39.

    Vanunu O, Magger O, Ruppin E, Shlomi T, Sharan R. Associating genes and protein complexes with disease via network propagation. PLoS Comput Biol. 2010;6(1):e1000641.

  40.

    Nepusz T, Yu H, Paccanaro A. Detecting overlapping protein complexes in protein–protein interaction networks. Nat Methods. 2012;9(5):471.

  41.

    Yu L, Huang J, Ma Z, Zhang J, Zou Y, Gao L. Inferring drug-disease associations based on known protein complexes. BMC Med Genomics. 2015;8(2):S2.

  42.

    van Laarhoven T, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics. 2011;27(21):3036–43.

  43.

    Chen X, Jiang Z-C, Xie D, Huang D-S, Zhao Q, Yan G-Y, You Z-H. A novel computational model based on super-disease and miRNA for potential miRNA–disease association prediction. Mol BioSyst. 2017;13(6):1202–12.

  44.

    Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.

  45.

    Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. 1999.

  46.

    Shen Z, Bao W, Huang D-S. Recurrent neural network for predicting transcription factor binding sites. Sci Rep. 2018;8(1):15270.

  47.

    Yi H-C, You Z-H, Zhou X, Cheng L, Li X, Jiang T-H, Chen Z-H. ACP-DL: a deep learning long short-term memory model to predict anticancer peptides using high-efficiency feature representation. Mol Ther Nucleic Acids. 2019;17:1–9.

  48.

    Wang Y-B, You Z-H, Yang S, Yi H-C, Chen Z-H, Zheng K. A deep learning-based method for drug-target interaction prediction based on long short-term memory neural network. BMC Med Inform Decis Mak. 2020;20(2):49.

  49.

    Cho K, Van Merriënboer B, Bahdanau D, Bengio Y. On the properties of neural machine translation: encoder-decoder approaches. 2014. arXiv preprint arXiv:1409.1259.

  50.

    Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. 2014. arXiv preprint arXiv:1412.3555.

  51.

    Chollet F. Keras: the Python deep learning library. Astrophysics Source Code Library. 2018.

  52.

    Gal Y, Hron J, Kendall A. Concrete dropout. 2017. arXiv preprint arXiv:1705.07832.

  53.

    Kingma DP, Ba J. Adam: a method for stochastic optimization. 2014. arXiv preprint arXiv:1412.6980v3.

  54.

    Yi H-C, You Z-H, Guo Z-H, Huang D-S, Chan KCC. Learning representation of molecules in association network for predicting intermolecular associations. IEEE/ACM Trans Comput Biol Bioinform. 2020. https://doi.org/10.1109/TCBB.2020.2973091.

  55.

    Yi H-C, You Z-H, Wang M-N, Guo Z-H, Wang Y-B, Zhou J-R. RPI-SE: a stacking ensemble learning framework for ncRNA-protein interactions prediction using sequence information. BMC Bioinform. 2020;21(1):60.

Acknowledgements

The authors would like to thank the editors and anonymous reviewers for their constructive advice.

About this supplement

This article has been published as part of BMC Bioinformatics Volume 22 Supplement 3, 2021: Proceedings of the 2019 International Conference on Intelligent Computing (ICIC 2019): bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-22-supplement-3.

Funding

Publication costs are sponsored in part by the National Outstanding Youth Science Foundation of NSFC under Grant 61722212, and in part by the National Natural Science Foundation of China under Grants 61873212, 61861146002, and 61732012. The funders had no role in study design, data collection, data analysis, data interpretation, or writing of the manuscript.

Author information

Contributions

HCY and ZHY conceived the algorithm, carried out analyses, prepared the data sets, carried out experiments, and wrote the manuscript. LW, XRS, XZ and THJ designed, performed, and analyzed experiments and wrote the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhu-Hong You.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Yi, HC., You, ZH., Wang, L. et al. In silico drug repositioning using deep learning and comprehensive similarity measures. BMC Bioinformatics 22, 293 (2021). https://doi.org/10.1186/s12859-020-03882-y

Keywords

  • Drug repositioning
  • Drug–disease interaction
  • Gated recurrent units
  • Gaussian interaction profile kernel
  • Machine learning