NeuRank: learning to rank with neural networks for drug–target interaction prediction
BMC Bioinformatics volume 22, Article number: 567 (2021)
Abstract
Background
Experimental verification in the drug discovery process is expensive and time-consuming. Therefore, the demand for more efficient and effective identification of drug–target interactions (DTIs) has recently intensified.
Results
We treat the prediction of DTIs as a ranking problem and propose a neural network architecture, NeuRank, to address it. We also assume that similar drug compounds are likely to interact with similar target proteins; thus, our model incorporates drug and target similarities, which are very effective at improving the prediction of DTIs. We then develop NeuRank from a pointwise model to a pairwise and, further, a listwise model.
Conclusion
Finally, results from extensive experiments on five public data sets (DrugBank, Enzymes, Ion Channels, G-Protein-Coupled Receptors, and Nuclear Receptors) show that, in identifying DTIs, our models achieve better performance than other state-of-the-art methods.
Introduction
In drug discovery, experimental verification of Drug–Target Interactions (DTIs) is so expensive and time-consuming that only a small fraction of DTIs has been verified [1,2,3,4,5,6]. Therefore, there is a great need for effective and efficient computational methods for identifying DTIs.
Recently, with the rapid development of high-throughput techniques, a great deal of drug–target interaction data has been generated [7, 8]. Traditional experimental verification limits the speed at which new drugs can be identified [9,10,11]. To meet the increasing need for rapid and effective drug discovery, machine learning methods have become more and more widely applied to detect potential DTIs from verified DTI information [12,13,14,15,16]. Matrix Factorization (MF) [17], one of the most successful methods in recommender systems [18], has been widely extended to DTI prediction. For example, Cobanoglu et al. [19] adopted Probabilistic Matrix Factorization (PMF) [20] to identify potential drug–target associations between chemicals and targets; Gönen [21] developed MF by adopting chemical and genomic kernels to predict DTI networks; Liu et al. [9] added neighborhood regularization to logistic MF to predict the probability that a drug will interact with a target. However, most existing MF-based methods consider only a linear and shallow relation between a drug and a target, which is insufficient to capture the complicated relationship between them.
Recently, great success has been achieved with deep learning models in Computer Vision (CV) [22, 23], Natural Language Processing (NLP) [24, 25], and recommender systems [26,27,28]. The goal of deep learning models is to capture the higher-order relations between input data through their hidden layers [3, 29, 30]. To overcome the limitation of traditional MF-based methods, many researchers have tried to apply deep learning models to the prediction of DTIs. For example, Wang et al. [31] adopted Restricted Boltzmann Machines (RBM) [32] to predict DTIs; Gao et al. [33] proposed a neural network combined with a two-way attention network to provide biological insights that interpret the drug–target predictions; Altae-Tran et al. [34] integrated Long Short-Term Memory (LSTM) and graph Convolutional Neural Networks (CNN) to obtain meaningful information from a few data points. Compared with MF, deep learning models have a greater ability to capture deep representations from raw input data.
Although many deep learning models have been proposed to predict potential DTIs, little effort has been devoted to exploring ranking learning in the prediction of DTIs. To comply with the DTI prediction setting, Peska et al. [35] extended Bayesian Personalized Ranking (BPR) [36], which has shown excellent performance in various learning tasks; Yuan et al. [37] designed a ranking-based ensemble learning method, DrugE–Rank, which builds on multiple well-known similarity-based methods to improve prediction performance. However, these methods, based on traditional machine learning techniques such as MF and k-Nearest Neighbor (kNN), are insufficient to capture the drug–target latent structures, because they do not consider any deep interactions between latent features.
Inspired by the good performance of deep learning models in various tasks, we designed a neural network architecture, NeuRank, to predict DTIs, in which we treat the identification of DTIs as a ranking task. Deep learning models are powerful and flexible for learning useful representations. Based on the Multilayer Perceptron (MLP) architecture, we added a new interaction module for drugs and targets to better model their relationship. Then, for better performance, we developed our model from a pointwise to a pairwise and further to a listwise method. In the pairwise method, we assume that the observed DTIs, which have been experimentally verified, are more trustworthy and more important than the unknown ones. Thus, we model the relative ordering of each pair of targets to make predictions, and learn to rank by optimizing a pairwise loss function to find the correct ranking for all targets. In the listwise method, we seek to maximize the top-one probability of targets in the ranking list.
Many works have shown that drugs with similar chemical structures have similar therapeutic functions [38,39,40]. This information is used to enrich latent factors and strengthen the representation ability of the models. For example, Zheng et al. [38] proposed a model, Multiple Similarities Collaborative Matrix Factorization (MSCMF), which first learns low-rank features and then combines them with weighted similarity matrices over drugs and targets for prediction; Zhang et al. [41] adopted drug feature-based and disease semantic similarities as constraints for drugs and diseases; Laarhoven et al. [42] applied the nearest neighbor algorithm, using chemical similarity and interaction information about known compounds, to construct interaction scores for drugs. Methods that use similarity information are able to make better predictions than methods without any additional information. Thus, to better build drug–drug and target–target relationships, a similarity calculation method is used to learn the links between these data.
Our contributions are summarized as follows:

(1)
We solve the DTI prediction problem from a ranking learning perspective, using neural networks, which have a strong ability to capture nonlinearity in raw data and learn deep features;

(2)
To better predict DTIs, especially for new drugs and targets, we added drug–drug and target–target similarities to our model;

(3)
For different applications, we developed three neural networks from pointwise to pairwise learning and further to listwise learning.
The rest of the paper is organized as follows: “Related work” section briefly reviews the background and some related work. “Proposed methods” section presents our proposed models in detail. “Experiments” section describes the experimental results for several data sets to show the performance of our models. “Conclusion” section gives the conclusion and provides future directions.
Related work
First, we discuss the problem to be solved and define the notations used in the rest of the paper. Then, we introduce two MF-based methods that are closely related to our model: a traditional one, Collaborative Matrix Factorization (CMF), and a pairwise ranking one, BPR.
Problem definition
Given a DTI matrix, \(\varvec{Y} \in {\mathbb {R}}^{n\times m}\), with a set of n drugs, \(\varvec{D}\), a set of m targets, \(\varvec{T}\), and elements \(y_{dt} \in \left\{ 0,1 \right\}\). If drug d has been experimentally verified to interact with target t, then \(y_{dt}=1\); otherwise, \(y_{dt}=0\). \(\varvec{P}\in {\mathbb {R}}^{n\times k}\) and \(\varvec{Q} \in {\mathbb {R}}^{m\times k}\) denote the low-rank latent features of drugs and targets, respectively, where k denotes the number of latent features. \(\varvec{p}_d\) and \(\varvec{q}_t\) denote the latent features of drug d and target t, respectively. The goal of MF for DTIs is to learn \(\varvec{P}\) and \(\varvec{Q}\) to reconstruct \(\varvec{Y}\):

\(\min _{\varvec{P},\varvec{Q}} \sum _{(d,t)\in \varvec{V}} \left( y_{dt}-\varvec{p}_d\varvec{q}_t^T\right) ^2 + \lambda \left( \left\| \varvec{P}\right\| ^2_F + \left\| \varvec{Q}\right\| ^2_F\right)\)

where \(\varvec{V}\) denotes the set of interactions that have been experimentally verified; \(\left\| \cdot \right\| ^2_F\) denotes the squared Frobenius norm; and \(\lambda\) denotes a regularization coefficient.
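As a concrete illustration, the MF objective described above can be sketched in plain Python (function and variable names are ours, not from the paper; a real implementation would minimize this loss over \(\varvec{P}\) and \(\varvec{Q}\) by gradient descent):

```python
def mf_loss(Y, P, Q, verified, lam):
    """Squared reconstruction error over verified (drug, target) pairs,
    plus Frobenius-norm regularization on both latent-feature matrices."""
    err = 0.0
    for (d, t) in verified:
        # inner product p_d . q_t reconstructs y_dt
        pred = sum(P[d][f] * Q[t][f] for f in range(len(P[d])))
        err += (Y[d][t] - pred) ** 2
    reg = sum(x * x for row in P for x in row) + sum(x * x for row in Q for x in row)
    return err + lam * reg
```

For example, with one drug, one target, and k = 2, `mf_loss([[1.0]], [[1.0, 0.0]], [[0.5, 0.0]], [(0, 0)], 0.1)` combines the squared error 0.25 with the regularization term 0.125.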
CMF
CMF, proposed in [38], adopts multiple kinds of drug–drug and target–target similarities. The objective function of CMF is defined as follows:

\(\min _{\varvec{P},\varvec{Q}} \left\| \varvec{Y}-\varvec{P}\varvec{Q}^T\right\| ^2_F + \lambda \left( \left\| \varvec{P}\right\| ^2_F + \left\| \varvec{Q}\right\| ^2_F\right) + \lambda _d \left\| \varvec{S}^d-\varvec{P}\varvec{P}^T\right\| ^2_F + \lambda _t \left\| \varvec{S}^t-\varvec{Q}\varvec{Q}^T\right\| ^2_F\)

where \(\lambda\), \(\lambda _d\), and \(\lambda _t\) denote regularization coefficients; \(\varvec{S}^d \in {\mathbb {R}}^{n\times n}\) denotes the similarity matrix for drugs, and \(\varvec{S}^t \in {\mathbb {R}}^{m\times m}\) denotes the similarity matrix for targets.
The first term, MF, learns low-rank latent features, \(\varvec{P}\) and \(\varvec{Q}\), to reconstruct \(\varvec{Y}\); the second term is L2 regularization to prevent the model from overfitting; the last two terms are regularizations, which minimize the squared error between \(\varvec{S}^d\) and \(\varvec{PP}^T\), and between \(\varvec{S}^t\) and \(\varvec{QQ}^T\). The key idea is that the similarity between drugs or targets should be approximated by the inner product of the corresponding two feature vectors.
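The similarity-regularization idea, i.e. penalizing the squared error between a similarity matrix and the inner products of the corresponding latent features, can be sketched as follows (a minimal plain-Python illustration; names are ours):

```python
def sim_reg(S, P):
    """|| S - P P^T ||_F^2 : squared Frobenius distance between a
    similarity matrix S and the pairwise inner products of latent rows of P."""
    n, k = len(P), len(P[0])
    total = 0.0
    for i in range(n):
        for j in range(n):
            inner = sum(P[i][f] * P[j][f] for f in range(k))  # (P P^T)_ij
            total += (S[i][j] - inner) ** 2
    return total
```

With orthonormal latent features `P = [[1, 0], [0, 1]]`, the penalty is exactly the squared off-diagonal similarity mass, since \(\varvec{PP}^T\) is the identity.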
BPR
DTIs provide only very few verified instances for training; therefore, it is inherently difficult to uncover the interaction probability between drugs and targets. Instead of directly predicting the absolute probability of DTIs, BPR uses a pairwise ranking loss to model the relative order between observed and unobserved interactions.
Based on BPR, which has shown promising power in personalized recommendations, Peska et al. [35] developed a DTI prediction model. The key idea of BPR is that observed interactions should be ranked higher than unobserved ones [36]. The goal of BPR for DTI prediction is to learn the probability that a drug will interact with a target. BPR aims to maximize the posterior probability that, for drug d, target t is ranked above target i: \(p\left( \varvec{\theta } \mid t>_d i\right)\), where \(\varvec{\theta }\) is the set of learning parameters. The posterior probability is defined as follows:

\(p\left( \varvec{\theta } \mid t>_d i\right) \propto p\left( t>_d i \mid \varvec{\theta }\right) p\left( \varvec{\theta }\right)\)
Then, the probability that drug d interacts with target t rather than with i is defined as follows:

\(p\left( t>_d i \mid \varvec{\theta }\right) = \sigma \left( {\widehat{y}}_{dt}-{\widehat{y}}_{di}\right)\)

where \(\sigma (x)=1/\left( 1+\exp (-x)\right)\) is the sigmoid function, and \({\widehat{y}}_{dt}\) and \({\widehat{y}}_{di}\) are the predicted scores for targets t and i with drug d, respectively. \({\widehat{y}}_{dt}\), estimated by MF, linearly combines drug and target features as follows:

\({\widehat{y}}_{dt} = \varvec{p}_{d}\varvec{q}_{t}^T\)
where \(\varvec{p}_{d}\) and \(\varvec{q}_{t}\) denote the latent features of drug, d, and target, t, respectively.
Finally, based on Bayesian inference, the objective function of BPR, which minimizes the pairwise ranking loss over all pair instances, is defined as follows:

\(\min _{\varvec{\theta }} -\sum _{(d,t,i)\in \varvec{F}} \ln \sigma \left( {\widehat{y}}_{dt}-{\widehat{y}}_{di}\right) + \lambda \left\| \varvec{\theta }\right\| ^2\)

where \(\varvec{F}=\left\{ (d,t,i) \mid d \in \varvec{D} \wedge t \in \varvec{V}_d^+ \wedge i \in \varvec{V}_d^- \right\}\) denotes that drug d tends to interact with target t rather than with i; given a drug d, \(\varvec{V}_d^+=\{t \in \varvec{T} \mid y_{dt}=1 \}\) denotes the set of targets that have been experimentally verified to interact with d, and \(\varvec{V}_d^-\) is the rest; \(\lambda\) is the regularization parameter.
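A minimal sketch of this pairwise objective, assuming the predicted scores \({\widehat{y}}\) are supplied as a dictionary keyed by (drug, target) pairs (our representation, not the paper's):

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def bpr_loss(triples, y_hat, lam, theta):
    """Negative log-likelihood that verified targets outrank unverified
    ones for each (d, t, i) triple, plus L2 regularization on parameters."""
    loss = -sum(math.log(sigmoid(y_hat[(d, t)] - y_hat[(d, i)]))
                for (d, t, i) in triples)
    return loss + lam * sum(w * w for w in theta)
```

The larger the score gap between a verified target t and an unverified target i, the closer \(\sigma ({\widehat{y}}_{dt}-{\widehat{y}}_{di})\) is to 1 and the smaller the loss.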
Both CMF and BPR are MF-based methods, which are linear in nature. Therefore, when compared with nonlinear methods, they have limited performance [27, 43]. Inspired by the idea from BPR of ranking learning in DTI prediction and by the good performance of NeuMF [43] in recommender systems, we developed a neural network to promote DTI prediction from a ranking perspective.
Proposed methods
Methods for one-class data, i.e. data with only positive examples, are classified into three categories: pointwise, pairwise, and listwise methods. Pointwise methods directly optimize the absolute value of the binary interaction. Pairwise ranking methods assume that drugs have a higher possibility of interacting with verified targets than with unverified ones. Listwise ranking methods seek to maximize the top-one probability of targets in the ranking list.
In this section, we build our NeuRank to learn simultaneously the latent features of DTIs and similarity information. First, we introduce in detail the framework of the pointwise method, NeuRank. Then, we develop our model from pointwise to pairwise learning and further to listwise learning. The purpose of our models is to predict the probability that a drug will interact with a target from observed DTIs.
Framework
Pointwise methods, which consider unobserved interactions to be inherently negative, combine the latent features of drugs and targets to predict the score used to rank. Figure 1 illustrates the network framework of NeuRank, which consists of the following five layers: input, embedding, interaction, hidden, and prediction.
Input and embedding layers The role of the embedding layer is to transfer drug and target IDs from the input layer to the latent representation space and map the sparse features to dense features as follows:

\(\varvec{p}_d = \varvec{d}\varvec{P}, \quad \varvec{q}_t = \varvec{t}\varvec{Q}\)

where \(\varvec{P}\in {\mathbb {R}}^{n\times k}\) and \(\varvec{Q} \in {\mathbb {R}}^{m\times k}\) denote the embedding matrices for drugs and targets, respectively; \(\varvec{d}\) and \(\varvec{t}\) denote the one-hot encoding representations of the ID of a drug and a target, respectively; and their embedding vectors are \(\varvec{p}_d \in {\mathbb {R}}^{1\times k}\) and \(\varvec{q}_t\in {\mathbb {R}}^{1\times k}\), respectively.
Interaction layer The role of the interaction layer is to model the interactions between drugs and targets in the shallow layer. The interaction layer, which captures the low-rank relations between drugs and targets, is defined as follows:

\(\varvec{h}_0 = f\left( \varvec{p}_d, \varvec{q}_t\right)\)

where \(f(\cdot )\) denotes the interaction function between \(\varvec{p}_d\) and \(\varvec{q}_t\), such as concatenation, element-wise product, or element-wise sum. We chose the element-wise product as our interaction function.
Hidden layers The role of the hidden layers is to learn nonlinear correlations between drugs and targets. Hidden layers give neural networks a powerful ability to model the high-order relationships between features as follows:

\(\varvec{h}_l = a\left( \varvec{W}_l\varvec{h}_{l-1} + \varvec{b}_l\right)\)

where \(\varvec{W}_l\), \(\varvec{b}_l\), \(\varvec{h}_{l}\), and \(a(\cdot )\) denote the weight, bias, output, and activation function of the lth (\(0< l\le L\)) layer, respectively, with \(\varvec{h}_0\) the output of the interaction layer. The ReLU function is used as our activation function.
Prediction layer The role of the prediction layer is to compute the probability that a drug will interact with a target. The output, \({\widehat{y}}_{dt }\), is defined as follows:
where \(\sigma (\cdot )\) denotes the sigmoid function.
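Putting the five layers together, a single forward pass can be sketched as follows. The final linear weight vector `w_out` is our assumption for mapping the last hidden vector to a scalar score, since the exact form of the prediction layer is not fully specified here:

```python
import math

def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]

def neurank_forward(p_d, q_t, layers, w_out):
    """Sketch of one forward pass: element-wise product interaction,
    ReLU hidden layers (weight matrix W, bias b per layer), then a
    sigmoid over a final linear score (w_out is our assumption)."""
    h = [a * b for a, b in zip(p_d, q_t)]              # interaction layer
    for W, b in layers:                                 # hidden layers
        h = relu([sum(W[r][c] * h[c] for c in range(len(h))) + b[r]
                  for r in range(len(W))])
    score = sum(w * x for w, x in zip(w_out, h))        # linear score
    return 1.0 / (1.0 + math.exp(-score))               # sigmoid prediction
```

For instance, embeddings `[1, 2]` and `[3, 0.5]` give the interaction vector `[3, 1]`; an identity hidden layer and weights `[1, -1]` then yield \(\sigma (2)\).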
In NeuRank, the square loss function is used to evaluate loss and the L2 norm is used to regularize all learning parameters:
where \(\varvec{\Theta }\) denotes the learning parameter set of NeuRank.
Pairwise NeuRank
To make predictions, pairwise methods model the relative ordering of each pair of targets. In contrast to the pointwise method, pairwise methods assume that observed interactions are more trustworthy than unobserved ones. Accordingly, we develop NeuRank into a pairwise learning model, pNeuRank. The network framework of pNeuRank is illustrated in Fig. 2.
In pNeuRank, we assume that an experimentally verified target that interacts with a drug will be assigned a higher value than an unverified target. Thus, the objective function is defined as follows:
where \(\varvec{F}=\left\{ (d,t,i) \mid d \in \varvec{D} \wedge t \in \varvec{V}_d^+ \wedge i \in \varvec{V}_d^- \right\}\) denotes that drug d tends to interact more with target t than with i; \(\lambda _p\), \(\lambda _d\), and \(\lambda _t\) are the regularization parameters; and \(\varvec{\Theta }_p\) denotes the learning parameter set of pNeuRank.
In pNeuRank, the first four layers (input, embedding, interaction, and hidden) are the same as in the NeuRank framework above. The key difference is the final output layer, \({\widehat{y}}_{dti}\), defined as follows:

\({\widehat{y}}_{dti} = \sigma \left( {\widehat{y}}_{dt}-{\widehat{y}}_{di}\right)\)

where \({\widehat{y}}_{dt}\) is the output of the final hidden layer for an observed interaction between drug d and target t; \({\widehat{y}}_{di}\) is the output for an unobserved interaction between drug d and target i; and \(\sigma (\cdot )\) denotes the sigmoid function, which bounds the gap between the two values.
Listwise NeuRank
Finally, we design a listwise framework, lNeuRank, to predict potential DTIs. In lNeuRank, we seek to maximize the top-one probability of targets in the ranking list. The framework is shown in Fig. 3: in the training list of \(\left( K+1\right)\) targets, there is one positive instance, and there are K negative instances sampled for drug d. \({\varvec{q}}_i^{-}\), where \(i \in \left[ 1,K\right]\), denotes the embeddings of the negative instances.
Similarly, in lNeuRank, the first four layers (input, embedding, interaction, and hidden) are the same as in the previous NeuRank framework. The key difference is the final output layer, \({\widehat{y}}_{dt}\), defined as follows:
where \(x_{dt}\) denotes the output of the final hidden layer. We chose the softmax function to map the results from the hidden layer to a prediction. The probability \({\hat{y}}_{dt}\) that target t ranks top-one for drug d is defined as follows:

\({\hat{y}}_{dt} = \frac{\exp \left( x_{dt}\right) }{\sum _{j=1}^{K+1}\exp \left( x_{dj}\right) }\)
Then, the loss is evaluated by cross entropy, which is used to measure the distance between the true list and the list predicted by the ranking model, defined as follows:

where \(l_d^+\) and \(l_d^-\) denote the verified and unverified interaction lists of drug d, respectively; and \(\varvec{\Theta }_l\) denotes the learning parameter set of lNeuRank.
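The top-one probability and the listwise cross-entropy loss for a single drug can be sketched as follows (a simplified illustration in which only the verified target carries label 1; names are ours):

```python
import math

def top_one_probs(scores):
    """Softmax over the candidate list: the probability that each target
    ranks top-one for the drug."""
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def listwise_loss(scores, pos_index):
    """Cross entropy against the true list, where only the verified
    target (at pos_index) has label 1."""
    return -math.log(top_one_probs(scores)[pos_index])
```

When the model cannot separate the positive instance from a single negative one (equal scores), the loss is \(\ln 2\); it shrinks as the positive score pulls ahead.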
Similarity information
Based on the assumption that similar drugs will interact with similar targets, and vice versa, we added drug–drug and target–target similarity networks to our model. The chemical structure similarity between compounds and the sequence similarity between target proteins are critical for improving the prediction of DTIs, especially when few DTIs are available. Therefore, to predict interactions for new drugs/targets, we added this similarity information to our models. Similarity regularization is defined as follows:

where \(\Omega \left( \cdot \right)\) is the function that measures the distance between the predicted and true similarities. A function that measures the distance from the true values is shown in the following:
Finally, the objective function is defined as follows:
where \({\mathcal {L}}_i\) is the loss function of NeuRank (Eq. 11), pNeuRank (Eq. 12), or lNeuRank (Eq. 16), respectively.
Sampling for imbalance data
Only a small fraction of DTIs has been verified, which causes an imbalanced-data problem: the number of unknown DTIs is much larger than the number of known DTIs. Training a model on such imbalanced data leads to poor performance.
To alleviate this problem, negative sampling, an effective method, is used. In general, the number of negative samples is proportional to the number of positive samples for each drug/target. Negative DTIs are randomly selected from the set of unobserved DTIs with equal probability.
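A minimal sketch of this sampling scheme (function and parameter names are ours):

```python
import random

def sample_negatives(pos_targets, all_targets, ratio, rng=random):
    """Draw `ratio` unobserved targets per observed one, uniformly at
    random and without replacement, for a single drug."""
    unobserved = [t for t in all_targets if t not in pos_targets]
    n = min(len(unobserved), ratio * len(pos_targets))
    return rng.sample(unobserved, n)
```

Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible across training runs.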
Experiments
First, we introduce the data sets used in our experiments; then, we present the baselines used for comparison with our models and the metrics adopted for evaluation; finally, we describe the experiments and analyze the results in detail.
Experimental setting
Data sets We performed experiments on five public data sets: DrugBank, Nuclear Receptors, G-Protein-Coupled Receptors (GPCRs), Ion Channels, and Enzymes. The first data set, which contains information on drugs and targets created and maintained by the University of Alberta and The Metabolomics Innovation Centre, is available at the DrugBank Database. As both a bioinformatics and a cheminformatics resource, DrugBank combines detailed drug (i.e. chemical, pharmacological, and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information [44]. The remaining data sets, whose observed DTIs were extracted from the public databases KEGG BRITE [45], BRENDA [46], SuperTarget [47], and DrugBank [48], are available at: http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/. The drug chemical structure information is retrieved from KEGG LIGAND [45], and the three-dimensional structures of target proteins are retrieved from PDB [49]. Each data set contains three types of information: 1) verified DTIs; 2) drug similarities; and 3) target similarities [50]. Table 1 lists statistics on the verified DTIs in all the data sets.
Drug–drug similarities are computed by SIMCOMP [51], which uses a graph method to model the size of the common substructures between two compounds. Target–target similarities are computed by the normalized Smith–Waterman score [52], which measures the similarity between the amino acid sequences of two proteins.
Evaluation metrics Following previous works [1, 9, 35, 38], two popular metrics, the Area Under the Precision–Recall curve (AUPR) and the Area Under the receiver operating characteristic Curve (AUC), are used for performance evaluation in the prediction of DTIs. To evaluate our proposed methods, we used 10-fold Cross Validation (CV) and compared them with the baseline approaches. In 10-fold CV, the data set is randomly divided into 10 equal-sized subsets. Of the 10 subsets, a single subset is retained as the validation data for testing the model; the remaining 9 subsets are used as training data. CV is then repeated 10 times, with each of the 10 subsets used exactly once as the validation data. The 10 results are then averaged to produce a single estimate. An AUC score is estimated in each repetition of CV; finally, the average score over all 10 repetitions is reported. The AUPR score is estimated in the same way.
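The k-fold split described above can be sketched generically as follows (an illustration, not the authors' code; `seed` fixes the shuffle for reproducibility):

```python
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle indices once, then yield (train, test) index lists so that
    each item appears in exactly one test fold across the k splits."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]       # k near-equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds if f is not folds[i] for j in f]
        yield train, test
```

Averaging a metric such as AUC over the k `(train, test)` splits yields the single estimate reported in the tables.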
In DTIs tasks, the main purposes are to effectively detect potential DTIs and discover new drugs. Thus, we conducted CV under the following two different settings:
\(CV_{dt}\): CV on drug–target pairs In this case, we randomly chose 90% of the drug–target pairs in \(\varvec{Y}\) as training data and the remaining 10% as testing data;
\(CV_{nd}\): CV on new drugs In this case, we randomly chose 90% of the rows in \(\varvec{Y}\) as training data and the remaining 10% as testing data;
Baseline approaches To illustrate the effectiveness of our models, we compared our models with the following methods:

PMF, the probabilistic MF, uses dot products on the latent features of drugs and targets to make predictions [19];

CMF, the state-of-the-art MF-based method, models not only DTIs but also drug–drug and target–target similarities [38];

BRDTI, the state-of-the-art BPR-based method, extends the BPR method by adding similarity information and target bias [35];

RBM, a shallow neural-network-based method for DTI prediction, whose visible units encode observed types of DTIs and whose hidden units represent latent features describing DTIs [31];

DeepDTIs, the state-of-the-art deep learning method, uses Deep Belief Networks (DBN) to predict DTIs, without taking similarity information into consideration [29].
Parameter settings Our models have seven key parameters: the latent feature size (k), the learning rate (\(\tau\)), the number of hidden layers (l), the batch size (b), one regularization parameter for the learning parameters (\(\lambda\)), and two regularization parameters for the similarity information (\(\lambda _d\) and \(\lambda _t\)). These parameters were determined by grid search on the validation error. In the grid search, k is chosen from \(\{8, 16, 32, 64, 128\}\); \(\tau\) is chosen from \(\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}\}\); l is chosen from \(\{1, 2, 3, 4, 5\}\); b is chosen from \(\{64, 128, 256, 512\}\); and \(\lambda\), \(\lambda _d\), and \(\lambda _t\) are chosen from \(\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 1\}\). The Adam optimizer is used to optimize our objective function.
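The grid search described above can be sketched generically as follows, with an `evaluate` callback standing in for training and validating a model under the given parameters (names are ours):

```python
import itertools

def grid_search(param_grid, evaluate):
    """Try every combination in the grid; keep the parameter setting with
    the lowest validation error returned by `evaluate`."""
    best, best_err = None, float("inf")
    keys = list(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        err = evaluate(params)          # train + validate with these params
        if err < best_err:
            best, best_err = params, err
    return best, best_err
```

In practice `evaluate` would run one cross-validated training pass per combination, which is why grids are kept coarse.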
Results and analysis
Overall performance First, we conducted experiments to verify the performance of our methods on the different data sets. Table 2 shows the AUC and AUPR scores obtained by all the methods under the setting \(CV_{dt}\).
As shown in Table 2, in most cases, all our models outperform the other baseline approaches on the same data set. Also, lNeuRank attains the best AUC and AUPR values on the large data sets (DrugBank, Enzymes, and Ion Channels). On DrugBank, Enzymes, and Ion Channels, in terms of AUC, lNeuRank scores 2.81%, 5.21%, and 2.86% higher, respectively, than the best baseline method, DeepDTIs; in terms of AUPR, lNeuRank scores 0.94%, 1.14%, and 0.18% higher, respectively, than DeepDTIs. These results indicate that, on the large data sets, our neural network models make high-quality predictions.
From the results shown in Table 2, we conclude the following: (1) on the large data sets, lNeuRank > pNeuRank > NeuRank, which indicates that large data sets contain sufficient ranking information for our models to learn accurate features; (2) on the two smallest data sets (GPCRs and Nuclear Receptors), our models achieve worse results than DeepDTIs, and the common trend in all cases is NeuRank > pNeuRank > lNeuRank; the most likely reason is that both data sets are too small to contain enough information for a ranking comparison of DTIs; (3) PMF and CMF exhibit inferior performance on all data sets, indicating that the inner product is insufficient to capture the complex relations between drug and target; (4) BRDTI achieves higher AUPR values than CMF, and pNeuRank higher than NeuRank, over all data sets, illustrating that adding pairwise information can boost the performance of the models; (5) on all data sets, RBM has the worst results, indicating that shallow networks without similarity information do not make good predictions; (6) NeuRank and pNeuRank capture the nonlinear correlations of latent features via their deep learning strategies; therefore, NeuRank and pNeuRank generally outperform PMF and BRDTI, respectively. Because our models capture the nonlinear correlations of the features, they consistently outperform all other baselines. In summary, within the same data set, our methods outperform other competitive approaches, which suggests that deep learning is an effective tool for extracting more meaningful features to detect true DTIs.
Effect of similarity information Next, we study how similarity information benefits the prediction of DTIs under the setting \(CV_{nd}\). In this experiment, we set the same value for both \(\lambda _d\) and \(\lambda _t\). The results obtained under the setting \(CV_{nd}\) for new drugs are shown in Table 3. The best results are shown in bold.
The results in Table 3 show that our methods, compared with the other methods under different settings, yield the best AUC and AUPR values, indicating that our method, with similarity information, achieves consistently accurate predictions across all data sets. Compared with the performance in the setting \(CV_{dt}\), after including similarity metrics, our models, BRDTI, and CMF achieve comparable results in the setting \(CV_{nd}\), indicating that adding similarity information to the models is very effective for finding new DTIs. Therefore, considering multiple similarities is clearly critical for optimal prediction performance.
To further illustrate the effect of similarity information on the prediction of DTIs, we conducted experiments on the DrugBank data set. In these experiments, we randomly selected one interaction of each drug as testing data and the remainder as training data. Then, we ranked all unobserved DTIs with our trained models. We compared NeuRank with its simplified version without similarity information and selected three examples. The experimental results are shown in Table 5.
From Table 5, it is seen that, compared with the simplified version without similarity information, the predictions of NeuRank are always more accurate. Without similarity information, the simplified version not only incorrectly predicts a target in the top-4 results in the first case, but also achieves worse results in the other cases. In summary, similarity regularization brings a strong improvement to our method.
Effect of hidden layer depth (l) In addition, we studied the impact of the hidden layer depth on the prediction of DTIs for our models. In this experiment, the number of hidden layers varies from one to five in steps of one under the setting \(CV_{dt}\) on all data sets. Figure 4 shows the AUC and AUPR performance as the depth changes.
As seen in Fig. 4, on the large data sets, DrugBank and Enzymes, the performance of NeuRank remains stable as depth increases; on the small data sets, Ion Channels, GPCRs, and Nuclear Receptors, the performance of NeuRank decreases as depth increases. Deep neural networks have a strong ability to express features; however, on the small data sets, too many parameters easily lead to overfitting. Therefore, we conclude that a sensible number of hidden layers is indeed helpful for improving the model.
Effect of embedding size (k) Finally, we illustrate the effect different embedding sizes (latent feature sizes) have on prediction under the setting \(CV_{dt}\) in our proposed models. For simplicity, we conducted experiments on the two largest data sets, DrugBank and Enzymes, and used AUC for evaluation. In this experiment, the embedding size was selected within the range \(\{8, 16, 32, 64, 128\}\). The effect embedding size has on the performance of our models is shown in Table 4.
As seen from Table 4, our methods achieve the best results when \(k=32\). As k increases, there is a clear increasing trend in the AUC values until the maximum is reached at \(k=32\); then, at \(k=64\), there is a slight decrease. Thus, an embedding size that is too large causes the model to overfit, while one that is too small causes it to underfit. Consequently, an appropriate size is important for the model to learn meaningful and accurate features and perform well.
Conclusion
Prediction of DTIs plays an important role in the drug discovery process. We proposed three novel methods, NeuRank, pNeuRank, and lNeuRank, to predict the interaction probability. Our models are neural network architectures, which have a powerful ability to effectively learn nonlinear and deep features for predicting DTIs. In addition, especially for new drugs and targets, similarity information is added to our models for better performance. Experimental results show that, compared with baseline approaches, our methods achieve better performance and higher quality. Moreover, our methods can provide useful hits for further biological study in drug discovery and development.
In future work, first, we plan to integrate more biological information to further improve our models; second, because similarity computation plays a critical role in learning accurate latent features, we plan to explore other nonlinear techniques to combine similarity matrices for drugs and targets; finally, for wider application, we will try to incorporate our models with other deep learning models.
Availability of data and materials
The DrugBank Database is available at: http://www.drugbank.ca. The Nuclear Receptors, G-Protein-Coupled Receptors (GPCRs), Ion Channels, and Enzymes data sets are available at: http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/.
Abbreviations
DTIs: Drug–target interactions

AUC: Area under the receiver operating characteristic curve

AUPR: Area under the precision–recall curve
References
Ezzat A, Zhao P, Wu M, Li XL, Kwoh CK. Drug-target interaction prediction with graph regularized matrix factorization. IEEE/ACM Trans Comput Biol Bioinf. 2016;14(3):646–56.
Sachdev K, Gupta MK. A comprehensive review of feature based methods for drug target interaction prediction. J Biomed Inform. 2019;93:103159.
You J, McLeod RD, Hu P. Predicting drugtarget interaction network using deep learning model. Comput Biol Chem. 2019;80:90–101.
Meng FR, You ZH, Chen X, Zhou Y, An JY. Prediction of drugtarget interaction networks from the integration of protein sequences and drug chemical structures. Molecules. 2017;22(7):1119.
Chen H, Zhang Z. A semisupervised method for drugtarget interaction prediction with consistency in networks. PLoS ONE. 2013;8(5):62975.
Shaikh N, Sharma M, Garg P. An improved approach for predicting drugtarget interaction: proteochemometrics to molecular docking. Mol BioSyst. 2016;12(3):1006–14.
Chen B, Li M, Wang J, Shang X, Wu FX. A fast and high performance multiple data integration algorithm for identifying human disease genes. BMC Med Genomics. 2015;8(3):1–11.
Volkamer A, Rarey M. Exploiting structural information for drugtarget assessment. Future Med Chem. 2014;6(3):319–31.
Liu Y, Wu M, Miao C, Zhao P, Li XL. Neighborhood regularized logistic matrix factorization for drugtarget interaction prediction. PLoS Comput Biol. 2016;12(2):1004760.
Che J, Chen L, Guo ZH, Wang S, et al. Drug target group prediction with multiple drug networks. Combin Chem High Throughput Screen. 2020;23(4):274–84.
Zhou M, Chen Y, Xu R. A drugside effect contextsensitive network approach for drug target prediction. Bioinformatics. 2019;35(12):2100–7.
Chen R, Liu X, Jin S, Lin J, Liu J. Machine learning for drugtarget interaction prediction. Molecules. 2018;23(9):2208.
Wang W, Yang S, Li J. Drug target predictions based on heterogeneous graph inference. In: Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, 2013;53–64.
Zong N, Kim H, Ngo V, Harismendy O. Deep mining heterogeneous networks of biomedical linked data to predict novel drugtarget associations. Bioinformatics. 2017;33(15):2337–44.
Hao M, Bryant SH, Wang Y. Predicting drugtarget interactions by dualnetwork integrated logistic matrix factorization. Sci Rep. 2017;7(1):1–11.
Zhang W, Chen Y, Li D. Drugtarget interaction prediction through label propagation with linear neighborhood information. Molecules. 2017;22(12):2056.
Koren Y, Bell R, Volinsky C. Matrix factorization techniques for recommender systems. Computer. 2009;42(8):30–7.
Li K, Zhou X, Lin F, Zeng W, Wang B, Alterovitz G. Sparse online collaborative filtering with dynamic regularization. Inf Sci. 2019;505:535–48.
Cobanoglu MC, Liu C, Hu F, Oltvai ZN, Bahar I. Predicting drugtarget interactions using probabilistic matrix factorization. J Chem Inf Model. 2013;53(12):3399–409.
Mnih A, Salakhutdinov RR. Probabilistic matrix factorization. In: Advances in Neural Information Processing Systems, 2008;1257–1264.
Gönen M. Predicting drugtarget interactions from chemical and genomic kernels using bayesian matrix factorization. Bioinformatics. 2012;28(18):2304–10.
Voulodimos A, Doulamis N, Doulamis A, Protopapadakis E. Deep learning for computer vision: a brief review. Comput Intell Neurosci. 2018;2018:1–13.
Simonyan K, Zisserman A. Very deep convolutional networks for largescale image recognition. Comput Vis Pattern Recognit, 2014;1556.
Deselaers T, Hasan S, Bender O, Ney H. A deep learning approach to machine transliteration. In: Proceedings of the 4th workshop on statistical machine translation, 2009;233–241.
Otter DW, Medina JR, Kalita JK. A survey of the usages of deep learning for natural language processing. IEEE Trans Neural Netw Learn Syst, 2020;1–21. https://doi.org/10.1109/TNNLS.2020.2979670
Chen M, Li Y, Zhou X. Conet: Cooccurrence neural networks for recommendation. Futur Gener Comput Syst. 2021;124:308–14.
Chen M, Zhou X. Deeprank: Learning to rank with neural networks for recommendation. KnowlBased Syst. 2020;209:106478.
Li K, Zhou X, Lin F, Zeng W, Alterovitz G. Deep probabilistic matrix factorization framework for online collaborative filtering. IEEE Access. 2019;7:56117–28.
Wen M, Zhang Z, Niu S, Sha H, Yang R, Yun Y, Lu H. Deeplearningbased drugtarget interaction prediction. J Proteome Res. 2017;16(4):1401–9.
Lu S, Chen H, Zhou X, Wang B, Wang H, Hong Q. Graphbased collaborative filtering with mlp. Math Prob Eng. 2018;2018.
Wang Y, Zeng J. Predicting drugtarget interactions using restricted Boltzmann machines. Bioinformatics. 2013;29(13):126–34.
Salakhutdinov R, Mnih A, Hinton G. Restricted Boltzmann machines for collaborative filtering. In: Proceedings of the 24th international conference on machine learning, 2007;791–798.
Gao KY, Fokoue A, Luo H, Iyengar A, Dey S, Zhang P. Interpretable drug target prediction using deep neural representation. In: Proceedings of the 27th international joint conference on artificial intelligence, 2018:2018;3371–3377.
AltaeTran H, Ramsundar B, Pappu AS, Pande V. Low data drug discovery with oneshot learning. ACS Cent Sci. 2017;3(4):283–93.
Peska L, Buza K, Koller J. Drugtarget interaction prediction: a Bayesian ranking approach. Comput Methods Programs Biomed. 2017;152:15–21.
Rendle S, Freudenthaler C, Gantner Z, SchmidtThieme L. Bpr: Bayesian personalized ranking from implicit feedback. In: Proceedings of the 25th conference on uncertainty in artificial intelligence, 2012;452–461.
Yuan Q, Gao J, Wu D, Zhang S, Mamitsuka H, Zhu S. Drugerank: improving drugtarget interaction prediction of new candidate drugs or targets by ensemble learning to rank. Bioinformatics. 2016;32(12):18–27.
Zheng X, Ding H, Mamitsuka H, Zhu S. Collaborative matrix factorization with multiple similarities for predicting drugtarget interactions. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, 2013;1025–1033.
Wang L, You ZH, Chen X, Xia SX, Liu F, Yan X, Zhou Y, Song KJ. A computationalbased method for predicting drugtarget interactions by using stacked autoencoder deep neural network. J Comput Biol. 2018;25(3):361–73.
Cheng F, Liu C, Jiang J, Lu W, Li W, Liu G, Zhou W, Huang J, Tang Y. Prediction of drugtarget interactions and drug repositioning via networkbased inference. PLoS Comput Biol. 2012;8(5):1002503.
Zhang W, Yue X, Lin W, Wu W, Liu R, Huang F, Liu F. Predicting drugdisease associations by using similarity constrained matrix factorization. BMC Bioinformatics. 2018;19(1):1–12.
Van Laarhoven T, Marchiori E. Predicting drugtarget interactions for new drug compounds using a weighted nearest neighbor profile. PLoS ONE. 2013;8(6):66952.
He X, Liao L, Zhang H, Nie L, Hu X, Chua TS. Neural collaborative filtering. In: Proceedings of the 26th international conference on world wide web, 2017;173–182 .
Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z, Assempour N, Iynkkaran I, Liu Y, Maciejewski A, Gale N, Wilson A, Chin L, Cummings R, Le D, Pon A, Knox C, Wilson M. Drugbank 5.0: a major update to the drugbank database for 2018. Nucleic acids research 46(D1), 2018;1074–1082.
Kanehisa M, Goto S, Hattori M, AokiKinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M. From genomics to chemical genomics: new developments in kegg. Nucleic acids research 34(suppl\_1), 2006;354–357
Schomburg I, Chang A, Ebeling C, Gremse M, Heldt C, Huhn G, Schomburg D. Brenda, the enzyme database: updates and major new developments. Nucl Acids Res 32(suppl\_1), 2004;431–433
Günther S, Kuhn M, Dunkel M, Campillos M, Senger C, Petsalaki E, Ahmed J, Urdiales EG, Gewiess A, Jensen LJ, et al. Supertarget and matador: resources for exploring drugtarget relationships. Nucl Acids Res 36(suppl\_1), 2007;919–922 .
Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M. Drugbank: a knowledgebase for drugs, drug actions and drug targets. Nucl Acids Res 36(suppl\_1), 2008;901–906.
Rose PW, Prlić A, Altunkaya A, Bi C, Bradley AR, Christie CH, Costanzo LD, Duarte JM, Dutta S, Feng Z et al. The rcsb protein data bank: integrative view of protein, gene and 3d structural information. Nucl Acids Res, 2016;1000.
Yamanishi Y, Araki M, Gutteridge A, Honda W, Kanehisa M. Prediction of drugtarget interaction networks from the integration of chemical and genomic spaces. Bioinformatics. 2008;24(13):232–40.
Hattori M, Okuno Y, Goto S, Kanehisa M. Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J Am Chem Soc. 2003;125(39):11853–65.
Smith TF, Waterman MS, et al. Identification of common molecular subsequences. J Mol Biol. 1981;147(1):195–7.
Acknowledgements
The authors thank Michael McAllister for proofreading this paper.
Funding
No funding was obtained for this study.
Author information
Authors and Affiliations
Contributions
FL and WZ initialized the research project and designed the experiments; XW performed the experiments and wrote the paper; XZ designed the software and performed the experiments. All authors reviewed the manuscript. All authors read and approved the final manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Wu, X., Zeng, W., Lin, F. et al. NeuRank: learning to rank with neural networks for drug–target interaction prediction. BMC Bioinformatics 22, 567 (2021). https://doi.org/10.1186/s12859-021-04476-y
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12859-021-04476-y