Skip to main content


L2,1-GRMF: an improved graph regularized matrix factorization method to predict drug-target interactions

Article metrics

  • 523 Accesses



Predicting drug-target interactions is time-consuming and expensive. It is important to present the accuracy of the calculation method. There are many algorithms to predict global interactions, some of which use drug-target networks for prediction (ie, a bipartite graph of bound drug pairs and targets known to interact). Although these algorithms can predict some drug-target interactions to some extent, there is little effect for some new drugs or targets that have no known interaction.


Since the datasets are usually located at or near low-dimensional nonlinear manifolds, we propose an improved GRMF (graph regularized matrix factorization) method to learn these flow patterns in combination with the previous matrix-decomposition method. In addition, we use one of the pre-processing steps previously proposed to improve the accuracy of the prediction.


Cross-validation is used to evaluate our method, and simulation experiments are used to predict new interactions. In most cases, our method is superior to other methods. Finally, some examples of new drugs and new targets are predicted by performing simulation experiments. And the improved GRMF method can better predict the remaining drug-target interactions.


With advances in drug discovery technologies, the existing methods can identify drug targets to some extent. But drug development is a high-cost, inefficient problem [1]. For drug developers, there has been a great deal of interest in the repositioning of drugs. This repositioning has some potential to reduce risk time and cost [2]. A crucial element for the repositioning of medicines is online biological databases such as KEGG [3], DrugBank [4], STITCH [5] and ChEMBL [6], which store a large number of current drug-target interactions. It is worth noting that there are still many interactions that have not been found [7]. Therefore, the advances of drug-target prediction technology is accelerated, and more and more prediction methods are proposed [8]. These computations, which reasonably predict new and unexplored interactions, have greatly facilitated the drug discovery process, making the process more credible. Recent research shows that there are three popular methods for predicting drug-target interactions, such as ligand-based methods [9], docking-based methods [10], and chemogenomic approaches [11]. Of course, we can also use the opposition-based learning particle swarm optimization to predict interactions, such as SNP-SNP interactions [12]. Moreover, the potential gene-gene interactions network can be identified by LNDriver [13].

Recently, many researchers have used matrix decomposition methods to solve drug-target interaction problems. The main methods are Bayesian matrix factorization, KBMF2K [14] and collaborative matrix factorization method, CMF [15]. A high-dimensional drug-target interaction matrix is decomposed into a plurality of low-dimensional matrices, and these matrices have characteristics of the original matrices, which is the principle of these methods. However, in theories, the above methods of matrix factorization still have some room for improvement. [16].

Using chemogenomic approaches to predict drug-target interactions is an effective method. The reason is that the first two methods have their own drawbacks. If a docking simulation is used, the three-dimensional structure of a target protein must be available. Furthermore, for ligand-based methods, if there are few or no target proteins known, this would be a problem that cannot be ignored. [9]. The advantage of using chemical genomics is that the information from the drugs and targets is used simultaneously for prediction [17]. New interactions are inferred by calculating the similarity of the chemical structures between drugs and the similarity of the genomic sequences between the targets. In this paper, the drug similarity and the target similarity are based on the construction methods in previous studies, which are based on the characteristics of the drug and the characteristics of the target. Its advantage is that we are better able to compare it with other methods, which is universal. However, if the same construction method of the drug similarity and the target similarity is used, this may affect the final results.

Two separate models are used to train drug target pairs, one based on the drug side and the other based on the target side. Thus, the final results are solved by predicting these two aspects. In this paper, to avoid over-fitting and sparing the target, the L2,1-norm is added in our method, which can eliminate some unattached target pairs [18]. Ten-fold cross-validation is used to evaluate the performance of our method.

We present the experimental results in Results. In Datasets, we conducted a case study. And we summarize this paper in Cross-validation experiments. In Interaction prediction under CVd, we clearly introduced the methods, including specific iteration formulas and algorithms.



Four datasets are used to experiment: the nuclear receptor (NR), the G protein-coupled receptor (GPCR), the ion channel (IC) and the enzyme (E). The size of these four datasets is different. Nuclear receptors are one of the most abundant transcriptional regulators in metazoans. NR includes some steroid hormones, vitamin D and quinone. In recent years, nuclear receptors have received widespread attention. For example, they are closely related to the development of diseases such as diabetes and fatty liver. Among them, PPAR-g agonist thiazolidinedione rosiglitazone can effectively improve insulin sensitivity in diabetic patients. GPCRs are one of the target enzymes that are important proteins in cell signaling and have so far been found as therapeutic drugs. The total number of targets is about 500, and GPCR targets account for the vast majority of receptors therein. In recent years, indications for targeting GPCR drugs are expanding from traditional areas such as allergies, hypertension, anesthesia and schizophrenia to new areas such as obesity. An ion channel is a pore-forming protein that traverses the channel by allowing an ion of a particular type to rely on an electrochemical gradient. ICs are small pores in the cell membrane that allow ions to enter and exit the cell. Therefore, most of them have become the targets of some mainstream drugs. Enzymes are macromolecular biocatalysts. Some common drugs use enzymes as targets, and some effects on enzymes such as inhibition, induction, activation or reactivation are exerted. In addition, drugs like this are mostly enzyme inhibitors. According to statistics, half of the top 20 drugs in the world are enzyme inhibitors. It is worth noting that some drugs are enzymes themselves, such as pepsin and trypsin.

Each dataset contains three matrices, Y, Sd and St. Matrix Y represents the drug-target interactions. It is worth noting that this matrix is an adjacency matrix. If it is known that the drug di is related to the target tj, Yij is 1, otherwise Yij is 0. The matrix Sd represents the chemical pairing structural similarity [19] and the matrix St represents the genome sequence similarity of the target pair [20]. Table 1 lists the specific information for the four datasets. More information about the datasets are published in

Table 1 Drugs, Targets, and Interactions in Each Dataset

Cross-validation experiments

We compare the existing matrix decomposition methods CMF (Collaborative matrix factorization), GRMF (Graph regularized matrix factorization), WGRMF (Weighted graph regularized matrix factorization) and our proposed method and compare WKNKN preprocessing on these methods. We use cross-validation experiments on these methods. In this paper, we use a ten-fold cross-validation (CV). The original dataset Y is divided into ten subsets, each of which is tested once and the rest as a training set. The cross-validation is repeated five times, one subset is selected each time as a test set, and the average cross-validation recognition accuracy rate of five times is taken as a result.

To verify the effect of the prediction, we use the evaluation index which has been widely used before, the AUPR (Area under the Precision-Recall curve) [21]. There is also an evaluation scale called AUC (Area under the receiver operating characteristic curve). We can use this method when forecasting. In our experiments, ten AUPR values are calculated for each ten-fold cross-validation, an average is obtained and we repeat five times, so we take the average of the five AUPRs as the final result [22]. In general, the AUPR value is less than the AUC value. The AUPR value is above 0.3, so the experimental results are reasonable.

We test two aspects [23], one is CVd which is based on the drug-interaction profiles and the other is CVt, which is based on the target-interaction profiles. CVd is used to test the ability to predict new drugs, CVt is used to test the ability to predict new targets. In addition, we perform a convergence analysis of each method using the NR and GPCR datasets as examples, and each method is subjected to 100 iterations. When the number of iterations is about 20, our method achieves convergence. It is worth noting that we have different tolerances for errors, considering the size and type of the datasets. Generally speaking, as long as the error is within a reasonable range, this is acceptable. Figures 1 and 2 show the convergence of different methods on the NR and GPCR datasets, respectively.

Fig. 1

Comparison of convergence about three methods on the NR dataset

Fig. 2

Comparison of convergence about three methods on the GPCR dataset

Interaction prediction under CVd

Table 2 lists the experimental results at CVd. And Standard deviations are given in parentheses. Under the NR dataset, the L2,1-GRMF (L2,1-norm Graph regularized matrix factorization) method is superior to the GRMF method and is almost the same as the GRMF method after adding the WKNKN. Importantly, our improved method L2,1-GRMF, with the addition of WKNKN, has seen significant improvements. Moreover, after adding the weight matrix to L2,1-GRMF and using WKNKN, the accuracy of prediction is also improved. Figure 3 shows the PR curves on the CVd side of each method on the NR dataset.

Table 2 AUPR Results for Interaction Prediction Under CVd
Fig. 3

PR curves for different methods are plotted together, providing a visual comparison between their prediction performances. The PR curves on the CVd side of each method on the NR dataset. a WKNKN is not used, the PR curves for each method. b WKNKN is used, the PR curves for each method

However, on the GPCR dataset, we run our method and find that it is not outperform the previous method, and initially estimate that there is a problem with the dataset itself. Figure 4 shows the PR curves on the CVd side of each method on the GPCR dataset. We observe that using the weight matrix when performing CVd experiments is higher than the AUPR value obtained without using the weight matrix. In addition, the L2,1-WGRMF (Weighted L2,1-norm graph regularized matrix factorization) method using WKNKN is superior to any other method in the IC dataset, slightly better than the WGRMF method using WKNKN. Figure 5 shows the PR curves on the CVd side of each method on the IC dataset. In the E dataset, the best method is L2,1-WGRMF but the AUPR score drops instead after applying WKNKN. In other words, in the E dataset, the preprocessing step will actually have a negative effect on the forecast result. Figure 6 shows the PR curves on the CVd side of each method on the E dataset. In general, not all methods use WKNKN to improve AUPR scores, which have a positive effect on most datasets and negative effects on some datasets. In practice, the negative impact of the WKNKN method is unavoidable on some datasets. One important reason is that the WKNKN method assigns an inaccurate value to the 0 element of the matrix Y on the E dataset. When we add the L2,1-GRMF method to make more accurate predictions, these inaccurate values will reduce the prediction accuracy.

Fig. 4

PR curves for different methods are plotted together, providing a visual comparison between their prediction performances. The PR curves on the CVd side of each method on the GPCR dataset. a WKNKN is not used, the PR curves for each method. b WKNKN is used, the PR curves for each method

Fig. 5

PR curves for different methods are plotted together, providing a visual comparison between their prediction performances. The PR curves on the CVd side of each method on the IC dataset. a WKNKN is not used, the PR curves for each method. b WKNKN is used, the PR curves for each method

Fig. 6

PR curves for different methods are plotted together, providing a visual comparison between their prediction performances. The PR curves on the CVd side of each method on the E dataset. a WKNKN is not used, the PR curves for each method. b WKNKN is used, the PR curves for each method

Interaction prediction under CVt

We can see in Table 3 that under most datasets, the AUPR value of CVt is generally higher than the AUPR value of CVd. This shows that hiding the interactions of the target can still get a better prediction result. But hiding the drug interactions and the prediction result will be greatly reduced. And standard deviations are given in parentheses. It is worth noting that in most datasets, the CMF method has lower AUPR values than any other method, and its AUPR value is far less than our method, especially in the NR dataset.

Table 3 AUPR Results for Interaction Prediction Under CVt


Among the NR, GPCR and IC datasets, the superior methods are the L2,1-GRMF method using the preprocessing steps, and our improved method has some improvement on all three datasets. Figures 7, 8, 9 and 10 show the PR curves on the CVt side of each method on the NR, GPCR, IC and E datasets, respectively. On the E dataset, it is still the best GRMF method. We can also see that some instances are ignored after using the weight matrix, whereas the GRMF method does not use the weight matrix W. Therefore, based on the previous conclusions, the information of the target is more important than the information of the drug. Therefore, using the GRMF method, the AUPR value is higher than the AUPR value using WGRMF.

Fig. 7

PR curves for different methods are plotted together, providing a visual comparison between their prediction performances. The PR curves on the CVt side of each method on the NR dataset. a WKNKN is not used, the PR curves for each method. b WKNKN is used, the PR curves for each method

Fig. 8

PR curves for different methods are plotted together, providing a visual comparison between their prediction performances. The PR curves on the CVt side of each method on the GPCR dataset. a WKNKN is not used, the PR curves for each method. b WKNKN is used, the PR curves for each method

Fig. 9

PR curves for different methods are plotted together, providing a visual comparison between their prediction performances. The PR curves on the CVt side of each method on the IC dataset. a WKNKN is not used, the PR curves for each method. b WKNKN is used, the PR curves for each method

Fig. 10

PR curves for different methods are plotted together, providing a visual comparison between their prediction performances. The PR curves on the CVt side of each method on the E dataset. a WKNKN is not used, the PR curves for each method. b WKNKN is used, the PR curves for each method

On most datasets, the L2,1-norm does play a key role in predicting the results. The L2,1-norm can provide a sparse solution for the final result. Compared with the CMF method, the L2,1-norm also promotes the final convergence. Therefore, the overall performance of the L2,1-GRMF method and L2,1-WGRMF is superior to other methods.

Case study

In this section, we conduct a simulation experiment. First, we erase some of the known drug targets in the original dataset. That is, those elements that are originally 1 in the original matrix become 0. This process is performed randomly by the computer. In the second step, we perform the experiment. We examine the results of the experiment and see if the erased condition is successfully predicted.

The experimental procedure we implement is that in the NR dataset, ten drugs with the interaction of the target estrogen receptor alpha (KEGG ID: hsa2099) are removed. This target is the main cause of breast cancer. After the experiment is done, we count the experimental results. We predict five of the hidden interactions. At the same time, we also predict a portion of new drugs and take the most reliable top five new drugs stated in Table 4. Among them, the sixth drug Testosterone is the drug with the highest correlation with this target.

Table 4 Predicted Drugs for estrogen receptor alpha, NR Dataset

In IC dataset, for the drug Diazoxide (KEGG ID: D00294), a blood pressure lowering drug. We also use a similar approach. Before using the L2,1-GRMF method, we eliminate twenty of them in the matrix Y. Because the GPCR dataset is larger than the NR dataset and there are many targets associate with this drug, we have removed twenty interactions here. After conducting simulation experiments, we successfully predicted twelve known targets and eight new targets. We then list the top twenty targets in Table 5. The first 12 are known targets and the remaining part is our prediction of a new target.

Table 5 Predicted Targets for Diazoxide, IC Dataset

For these two cases, the similarity of the estrogen receptor alpha to its nearest neighbor target is less than 0.02 in the matrix St. In the matrix Sd, the similarity of Diazoxide to its nearest neighbor is 0.3, which is also quite low. Therefore, we are more difficult to make predictions. Thus, this shows that our proposed L2,1-GRMF method is excellent and reliable results can be obtained when predicting some challenging drugs and targets. Of course, there are still some limitations to the two methods proposed. If we add a weight matrix, the time required for the experiment will multiply. Compared with other methods, our time complexity is relatively high. In addition, the method does not predict new drugs and new targets without any interaction.


In this paper, we propose two improved matrix decomposition methods, L2,1-GRMF and L2,1-WGRMF. Both methods are used to predict drug-target interactions. We use cross-validation to calculate AUPR values and predict on the drug side (CVd) and the target side (CVt), respectively. We compare them with the most advanced matrix factorization methods currently available. In most cases, our improved methods can provide the best results, which means that the predictive performance is improved with the use of the L2,1-norm.

WKNKN preprocessing steps are used to help the experimental results. In addition, it can also be used as an independent method to predict the interactions of drug-target. Considering that the dimensions of the data are relatively small, so the drug-target interactions contained in each dataset are also limited. And our approach applies to these datasets.

In the future, we expect more and more known interaction of drug targets will be found, providing more valuable datasets for our prediction. We will explore more effective prediction methods to solve drug-target interaction problems. For example, we can use matrix factorization of hyper-graph method to improve the reliability of predictive interactions.



Co-matrix factorization is an effective method to predict the interactions of drug-target [15]. The objective function of CMF method is

$$ {\min}_{\mathbf{A},\mathbf{B}}={\left\Vert \mathbf{Y}-{\mathbf{A}\mathbf{B}}^{\mathrm{T}}\right\Vert}_F^2+{\lambda}_l\left({\left\Vert \mathbf{A}\right\Vert}_F^2+{\left\Vert \mathbf{B}\right\Vert}_F^2\right)+{\lambda}_d{\left\Vert {\mathbf{S}}^{\mathbf{d}}-{\mathbf{A}\mathbf{A}}^{\mathrm{T}}\right\Vert}_F^2+{\lambda}_t{\left\Vert {\mathbf{S}}^{\mathbf{t}}-{\mathbf{BB}}^{\mathrm{T}}\right\Vert}_F^2, $$

where W represents a weight matrix, Wij = 1 when Yij is known, Wij = 0 otherwise. Obviously, the last two items of the objective function are regularization terms. We use L to represent the objection function in Eq. (1), ai represents the i-th vector of A, and bj represents the j-th vector of B. Two update rules are used to solve ∂L/∂a = 0 and ∂L/∂b = 0. Finally, the two update rules are executed using least square until convergence:

$$ \mathbf{A}=\left(\mathbf{YB}+{\lambda}_d{\mathbf{S}}^{\mathbf{d}}\mathbf{A}\right){\left({\mathbf{B}}^{\mathrm{T}}\mathbf{B}+{\lambda}_l{\mathbf{I}}_{\mathrm{k}}+{\lambda}_d{\mathbf{AA}}^{\mathrm{T}}\right)}^{-1}, $$
$$ \mathbf{B}=\left({\mathbf{Y}}^{\mathrm{T}}\mathbf{A}+{\lambda}_t{\mathbf{S}}^{\mathbf{t}}\mathbf{B}\right){\left({\mathbf{A}}^{\mathrm{T}}\mathbf{A}+{\lambda}_l{\mathbf{I}}_{\mathrm{k}}+{\lambda}_t{\mathbf{B}}^{\mathrm{T}}\mathbf{B}\right)}^{-1}. $$

In summary, after the potential feature matrices A and B are updated, the predicted score matrix can be obtained by multiplying A and B. This predicted score matrix can be used to predict new drug-target interactions by comparing with the original drug-target interactions matrix Y.


In the GRMF method, the benefits of regularization items is that it can avoid over-fitting [20]. The objective function of GRMF is as follows:

$$ {\min}_{\mathbf{A},\mathbf{B}}={\left\Vert \mathbf{Y}-{\mathbf{A}\mathbf{B}}^{\mathrm{T}}\right\Vert}_F^2+{\lambda}_l\left({\left\Vert \mathbf{A}\right\Vert}_F^2+{\left\Vert \mathbf{B}\right\Vert}_F^2\right)+{\lambda}_d\mathrm{Tr}\left({\mathbf{A}}^{\mathrm{T}}\overset{\sim }{{\mathbf{L}}_{\mathbf{d}}}\mathbf{A}\right)+{\lambda}_t\mathrm{Tr}\left({\mathbf{B}}^{\mathrm{T}}\overset{\sim }{{\mathbf{L}}_{\mathbf{t}}}\mathbf{B}\right), $$

Then, matrix A and B are initialized. The SVD (singular value decomposition) method is used to decompose matrix YRn × m into URn × k, SkRk × k, and VRn × k. In matrix Y, the largest possible number of singular values is min(n, m), so k max  = min(n, m). Finally, the square root of Sk can be obtained, where \( \mathbf{A}={\mathbf{US}}_{\mathbf{k}}^{1/2} \), \( \mathbf{B}={\mathbf{VS}}_{\mathbf{k}}^{1/2} \).

Next, the least square method is used to update A and B. This objective function in Eq. (4) can be replaced by L. These two update rules are used to solve ∂L/∂a = 0 and ∂L/∂b = 0. Finally, the two update rules are executed by using least square until convergence.


Like CMF, the weight matrix W in WGRMF is the same as W in CMF. Behind the weight matrix, either to prevent unknown interactions, the purpose is to help find the latent feature matrix A and B. The objective function of WGRMF method is as follows

$$ {\min}_{\mathbf{A},\mathbf{B}}={\left\Vert \mathbf{W}\odot \left(\mathbf{Y}-{\mathbf{A}\mathbf{B}}^{\mathrm{T}}\right)\right\Vert}_F^2+{\lambda}_l\left({\left\Vert \mathbf{A}\right\Vert}_F^2+{\left\Vert \mathbf{B}\right\Vert}_F^2\right)+{\lambda}_d\mathrm{Tr}\left({\mathbf{A}}^{\mathrm{T}}\overset{\sim }{{\mathbf{L}}_{\mathbf{d}}}\mathbf{A}\right)+{\lambda}_t\mathrm{Tr}\left({\mathbf{B}}^{\mathrm{T}}\overset{\sim }{{\mathbf{L}}_{\mathbf{t}}}\mathbf{B}\right). $$

This objective function in Eq. (5) can be replaced by L, where ai represents the i-th vector of A, and bj represents the j-th vector of B. These two update rules are used to solve ∂L/∂a = 0 and ∂L/∂b = 0. Finally, the two update rules are executed by using least square until convergence. However, it is worth noting that the update rules here are not the same as the update rules in GRMF. In GRMF, the rules are matrix updates, but in WGRMF the rules are row updates.

Our proposed methods

Here, our improved approach is used to solve the prediction of drug-target interactions problem. WKNKN (weighted K nearest known neighbors) [20] as a preprocessing step is used to solve unknown missing value problems. Two methods are proposed, Graph Regularization Matrix factorization based on L2,1-norm, and a variant called L2,1-WGRMF, both of which are used to predict drug-target interactions. Figure 11 shows a flow chart of the proposed method.

Fig. 11

A brief flow chart of the L2,1-GRMF method. It includes the process of inputting the original datasets to the final generation of the predicted score matrix


Sparsification of the drug similarity matrix and target similarity matrix

Graph regularization terms are used to fully consider the internal structure of the similarity matrix Sd and St. In addition, the graph regularization terms can keep the internal structure of the matrices unchanged. We derive a p-nearest neighbor graph from each drug and target similarity matrix [24] Sd and St in this work. Therefore, given a drug similarity matrix Sd, a p-nearest neighbor graph [25] N can be generated as

$$ {\forall}_{i,j},\kern0.5em {N}_{ij}=\left\{\begin{array}{l}1,\kern1.4em j\in {N}_p(i)\kern0.3em \&\kern0.3em i\in {N}_p(j)\kern0.1em \\ {}0,\kern1.3em j\notin {N}_p(i)\kern0.3em \&\kern0.3em i\notin {N}_p(j)\\ {}0.5,\kern0.7em otherwise,\end{array}\right. $$

where N is used to sparsify the matrix Sd, which can be written as

$$ {\forall}_{i,j},\kern0.3em \hat{S}={N}_{ij}{S}_{ij}^d. $$

This result is for a sparse drug similarity matrix. Similarly, the target similarity matrix St can be obtained in the same way. We use the Euclidean distance to calculate the nearest neighbor. In general, Euclidean distance will give better results because it represents the true distance.

Graph regularization helps to facilitate the study the manifold from learning drugs and target spaces. In the original space, there are points that are close to each other, and when the manifold learning is performed, the points are also close to each other in learning.

Low-rank approximation

The idea of low rank approximation (LRA) is applied to GRMF [26]. It decomposes the target matrix Y into two low-rank latent feature matrices A and B, i.e., Y ≈ ABT [27]. And the objective function of GRMF can be written as the following optimization problem:

$$ {\min}_{\mathbf{A},\mathbf{B}}={\left\Vert \mathbf{Y}-{\mathbf{A}\mathbf{B}}^{\mathrm{T}}\right\Vert}_F^2, $$

where F is Frobenius norm. In addition, the number of potential features of A and B is represented by k.


In general, the Tikhonov and graph regularization terms can be used to avoid over-fitting and enhance generalization capability. Here is the objective function of L2,1-GRMF:

$$ {\min}_{\mathbf{A},\mathbf{B}}={\left\Vert \mathbf{Y}-{\mathbf{A}\mathbf{B}}^{\mathrm{T}}\right\Vert}_F^2+{\lambda}_l\left({\left\Vert \mathbf{A}\right\Vert}_F^2+{\left\Vert \mathbf{B}\right\Vert}_F^2\right)+{\lambda}_l{\left\Vert \mathbf{B}\right\Vert}_{2,1}+{\lambda}_d\sum \limits_{i,r=1}^n\hat{S_{ir}^d}{\left\Vert {a}_i-{a}_r\right\Vert}^2+{\lambda}_t\sum \limits_{j,q=1}^m\hat{S_{jq}^t}{\left\Vert {b}_j-{b}_q\right\Vert}^2, $$

where λl, λd and λt are positive parameters, ai is the i-th rows of A, and bj is the j-th rows of B, n is the number of drugs, and m is the number of targets. The first term is an approximate model of the matrix Y. The second term is the Tikhonov regularization. Its main purpose is to minimize the norms of A, B. The third term is the L2,1-norm applied on B to increase the target matrix sparsity and discard unwanted target pairs. Considering that we are more concerned with certain drugs, we use the L2,1-norm to sparse the potential feature matrix of the target, so that we can better predict new drugs. However, while the L2,1-norm is added to A, some of the more important drugs may be lost. The last two terms are graph regularization of drugs and targets, respectively. Moreover, the drug-target model can be rewritten as:

$$ {\min}_{\mathbf{A},\mathbf{B}}={\left\Vert \mathbf{Y}-{\mathbf{A}\mathbf{B}}^{\mathrm{T}}\right\Vert}_F^2+{\lambda}_l\left({\left\Vert \mathbf{A}\right\Vert}_F^2+{\left\Vert \mathbf{B}\right\Vert}_F^2\right)+{\lambda}_l{\left\Vert \mathbf{B}\right\Vert}_{2,1}+{\lambda}_d\mathrm{Tr}\left({\mathbf{A}}^{\mathrm{T}}{\mathbf{L}}_{\mathbf{d}}\mathbf{A}\right)+{\lambda}_t\mathrm{Tr}\left({\mathbf{B}}^{\mathrm{T}}{\mathbf{L}}_{\mathbf{t}}\mathbf{B}\right), $$

where Tr() is the trace of the matrix, \( {\mathbf{L}}_{\mathbf{d}}={\mathbf{D}}^{\mathbf{d}}-\hat{{\mathbf{S}}^{\mathbf{d}}} \) is the graph Laplacian for \( \hat{{\mathbf{S}}^{\mathbf{d}}} \), \( {\mathbf{L}}_{\mathbf{t}}={\mathbf{D}}^{\mathbf{t}}-\hat{{\mathbf{S}}^{\mathbf{t}}} \) is the graph Laplacian for \( \hat{{\mathbf{S}}^{\mathbf{t}}} \). Please refer to [28] for more details on rewriting graph regularization. We know that the known normalized Laplacian is better than unknown, so we replace Ld and Lt with \( \overset{\sim}{{\mathbf{L}}_{\mathbf{d}}}={\left({\mathbf{D}}^{\mathbf{d}}\right)}^{-1/2}{\mathbf{L}}_{\mathbf{d}}{\left({\mathbf{D}}^{\mathbf{d}}\right)}^{-1/2} \) and \( \overset{\sim}{{\mathbf{L}}_{\mathbf{t}}}={\left({\mathbf{D}}^{\mathbf{t}}\right)}^{-1/2}{\mathbf{L}}_{\mathbf{t}}{\left({\mathbf{D}}^{\mathbf{t}}\right)}^{-1/2} \). The function can be written as:

$$ {\min}_{\mathbf{A},\mathbf{B}}={\left\Vert \mathbf{Y}-{\mathbf{A}\mathbf{B}}^{\mathrm{T}}\right\Vert}_F^2+{\lambda}_l\left({\left\Vert \mathbf{A}\right\Vert}_F^2+{\left\Vert \mathbf{B}\right\Vert}_F^2\right)+{\lambda}_l{\left\Vert \mathbf{B}\right\Vert}_{2,1}+{\lambda}_d\mathrm{Tr}\left({\mathbf{A}}^{\mathrm{T}}\overset{\sim }{{\mathbf{L}}_{\mathbf{d}}}\mathbf{A}\right)+{\lambda}_t\mathrm{Tr}\left({\mathbf{B}}^{\mathrm{T}}\overset{\sim }{{\mathbf{L}}_{\mathbf{t}}}\mathbf{B}\right). $$

We use the minimization of the objective function to predict the outcome of the interactions, but this could lead to unsatisfactory results. Because there are many zeros that have not been found. Therefore, we use WKNKN pre-processing method to solve this problem.

Initialization of A and B

For the input matrix Y, SVD (Singular Value Decomposition) method is used to obtain the initial value of matrix A and matrix B:

$$ \left[\mathbf{U},\mathbf{S},\mathbf{V}\right]= SVD\left(\mathbf{Y},k\right),\mathbf{A}={\mathbf{US}}_{\mathbf{k}}^{\mathbf{1}/\mathbf{2}},\mathbf{B}={\mathbf{VS}}_{\mathbf{k}}^{\mathbf{1}/\mathbf{2}}. $$

Among them, Sk is a diagonal matrix and contains the k largest singular values. In matrix Y, the number of singular values is kmax = min(n, m). According to the SVD method, kmax is the maximum possible number.

Optimization algorithm

In this paper, we can update A and B by using the least square method. Let the partial derivative of A be equal to 0, the partial derivative of B be equal to 0, the objective function in Eq. (11) can be replaced by L, that is, ∂L/A = 0 and ∂L/B = 0. The two update rules are executed by using least square until convergence. When we perform the L2,1-GRMF method, λl, λd and λt are determined by the cross-validation on the training set to the optimal parameter values. We use grid search, λl {2−2, 2−1, 20, 21}. Then we choose the optimal parameters from this set. Derivation process is as follows:

$$ \mathbf{A}=\left(\mathbf{YB}-{\lambda}_d\overset{\sim }{{\mathbf{L}}_{\mathbf{d}}}\mathbf{A}\right){\left({\mathbf{B}}^{\mathrm{T}}\mathbf{B}+{\lambda}_l{\mathbf{I}}_{\mathbf{k}}\right)}^{-1}, $$
$$ \mathbf{B}=\left({\mathbf{Y}}^{\mathrm{T}}\mathbf{A}-{\lambda}_t\overset{\sim }{{\mathbf{L}}_{\mathbf{t}}}\mathbf{B}\right){\left({\mathbf{A}}^{\mathrm{T}}\mathbf{A}+{\lambda}_l{\mathbf{I}}_{\mathbf{k}}+{\lambda}_l{\mathbf{DI}}_{\mathbf{k}}\right)}^{-1}, $$

where D is a diagonal matrix with the i-th diagonal element as dii = 1/2(B)i2. The specific algorithm of L2,1-GRMF is as follows:



A variant of L2,1-GRMF, called L2,1-WGRMF, is obtained here by adding a weight matrix W to the L2,1-GRMF. The advantage is that it helps to determine the latent feature matrices A and B of the drug-target matrix Y. So, we write the objective function that contains W as follows:

$$ {\min}_{\mathbf{A},\mathbf{B}}={\left\Vert \mathbf{W}\odot \left(\mathbf{Y}-{\mathbf{A}\mathbf{B}}^{\mathrm{T}}\right)\right\Vert}_F^2+{\lambda}_l\left({\left\Vert \mathbf{A}\right\Vert}_F^2+{\left\Vert \mathbf{B}\right\Vert}_F^2\right)+{\lambda}_l{\left\Vert \mathbf{B}\right\Vert}_{2,1}+{\lambda}_d\mathrm{Tr}\left({\mathbf{A}}^{\mathrm{T}}{\mathbf{L}}_{\mathbf{d}}\mathbf{A}\right)+{\lambda}_t\mathrm{Tr}\left({\mathbf{B}}^{\mathrm{T}}{\mathbf{L}}_{\mathbf{t}}\mathbf{B}\right). $$

Let objective function be set to F such that ∂F/∂ai = 0 and ∂F/∂bj = 0. The update rules are used to obtain A and B until convergence

$$ {\forall}_i=1\dots n,\kern0.9000001em {a}_i=\left(\sum \limits_{j=1}^m{W}_{ij}{Y}_{ij}{b}_j-{\lambda}_d{\left(\overset{\sim }{{\mathbf{L}}_{\mathbf{d}}}\right)}_{i\ast}\mathbf{A}\right){\left(\sum \limits_{j=1}^m{W}_{ij}{b}_j^{\mathrm{T}}{b}_j+{\lambda}_l{\mathbf{I}}_{\mathbf{k}}\right)}^{-1}, $$
$$ {\forall}_j=1\dots m,\kern0.8000001em {b}_j=\left(\sum \limits_{i=1}^n{W}_{ij}{Y}_{ij}{a}_i-{\lambda}_t{\left(\overset{\sim }{{\mathbf{L}}_t}\right)}_{j\ast}\mathbf{B}\right){\left(\sum \limits_{i=1}^n{W}_{ij}{a}_i^{\mathrm{T}}{a}_i+{\lambda}_l{\mathbf{I}}_{\mathbf{k}}+{\lambda}_l{\mathbf{DI}}_{\mathbf{k}}\right)}^{-1}. $$



Area under the precision-recall curve


Collaborative matrix factorization method




Graph regularized matrix factorization


L2,1-norm Graph regularized matrix factorization


Weighted L2,1-norm graph regularized matrix factorization


Low rank approximation


Singular value decomposition


Weighted graph regularized matrix factorization


Weighted K nearest known neighbors


  1. 1.

    Novac N. Challenges and opportunities of drug repositioning. Trends Pharmacol Sci. 2013;34(5):267–72.

  2. 2.

    Hurle MR, Yang L, Xie Q, Rajpal DK, Sanseau P, Agarwal P. Computational drug repositioning: from data to therapeutics. Clin Pharmacol Ther. 2013;93(4):335–41.

  3. 3.

    Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017;45(Database issue):D353–61.

  4. 4.

    Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V. DrugBank 3.0: a comprehensive resource for ‘omics’ research on drugs. Nucleic Acids Res. 2011;39(Database issue:D1035.

  5. 5.

    Kuhn M, Szklarczyk D, Pletscherfrankild S, Blicher TH, Mering CV, Jensen LJ, Bork P. STITCH 4: integration of protein–chemical interactions with user data. Nucleic Acids Res. 2014;42(Database issue):401–7.

  6. 6.

    Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, Mcglinchey S, Michalovich D, Allazikani B. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012;40(Database issue):1100–7.

  7. 7.

    Yonan AL, Palmer AA, Smith KC, Feldman I, Lee HK, Yonan JM, Fischer SG, Pavlidis P, Gilliam TC. Bioinformatic analysis of autism positional candidate genes using biological databases and computational gene network prediction. Genes Brain Behav. 2003;2(5):303–20.

  8. 8.

    Klipp E, Wade RC, Kummer U. Biochemical network-based drug-target prediction. Curr Opin Biotechnol. 2010;21(4):511–6.

  9. 9.

    Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, Shoichet BK. Relating protein pharmacology by ligand chemistry. Nat Biotechnol. 2007;25(2):197–206.

  10. 10.

    Cheng AC, Coleman RG, Smyth KT, Cao Q, Soulard P, Caffrey DR, Salzberg AC, Huang ES. Structure-based maximal affinity model predicts small-molecule druggability. Nat Biotechnol. 2007;25(1):71–5.

  11. 11.

    Yamanishi Y, Araki M, Gutteridge A, Honda W, Kanehisa M. Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics. 2008;24(13):i232–40.

  12. 12.

    Shang J, Sun Y, Li S, Liu JX, Zheng CH, Zhang J. An improved opposition-based learning particle swarm optimization for the detection of SNP-SNP interactions. Biomed Res Int. 2015;2015:524821.

  13. 13.

    Wei PJ, Zhang D, Xia J, Zheng CH. LNDriver: identifying driver genes by integrating mutation and expression data based on gene-gene interaction network. Bmc Bioinformatics. 2016;17(Suppl 17):467.

  14. 14.

    Gönen M. Predicting drug–target interactions from chemical and genomic kernels using Bayesian matrix factorization. Bioinformatics. 2012;28(18):2304–10.

  15. 15.

    Zheng X, Ding H, Mamitsuka H, Zhu S. Collaborative matrix factorization with multiple similarities for predicting drug-target interactions. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2013. p. 1025–33.

  16. 16.

    Mei JP, Kwoh CK, Yang P, Li XL, Zheng J. Drug–target interaction prediction by learning from local information and neighbors. Bioinformatics. 2013;29(2):238–45.

  17. 17.

    Ge SG, Xia J, Sha W, Zheng CH. Cancer subtype discovery based on integrative model of multigenomic data. IEEE/ACM Transactions on Computational Biology & Bioinformatics. 2017;14(5):1115–21.

  18. 18.

    Wang DQ, Zheng CH, Gao YL, Liu JX, Wu SS, Shang JL. L21-iPaD: an efficient method for drug-pathway association pairs inference. In: IEEE international conference on bioinformatics and biomedicine; 2017. p. 664–9.

  19. 19.

    Takahashi Y, Fujishima S, Kato H. Chemical data mining based on structural similarity. Journal of Computer Chemistry Japan. 2003;2(4):119–26.

  20. 20.

    Ezzat A, Zhao P, Wu M, Li X-L, Kwoh C-K. Drug-target interaction prediction with graph regularized matrix factorization. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB). 2017;14(3):646-56.

  21. 21.

    Davis J, Goadrich M. The relationship between precision-recall and ROC curves. In: ICML '06 : proceedings of the international conference on machine learning, New York, Ny, Usa; 2006. p. 233–40.

  22. 22.

    Li J, Fine JP. Weighted area under the receiver operating characteristic curve and its application to gene selection. J R Stat Soc. 2010;59(4):673.

  23. 23.

    Pahikkala T, Airola A, Pietilä S, Shakyawar S, Szwajda A, Tang J, Aittokallio T. Toward more realistic drug–target interaction predictions. Brief Bioinform. 2015;16(2):325–37.

  24. 24.

    Schuffenhauer A, Floersheim P, Acklin P, Jacoby E. Similarity metrics for ligands reflecting the similarity of the target proteins. J Chem Inf Comput Sci. 2003;43(2):391.

  25. 25.

    Wang B, Pan F, Hu KM, Paul JC. Manifold-ranking based retrieval using k -regular nearest neighbor graph. Pattern Recogn. 2012;45(4):1569–77.

  26. 26.

    Liberty E, Woolfe F, Martinsson PG, Rokhlin V, Tygert M. Randomized algorithms for the low-rank approximation of matrices. Proc Natl Acad Sci U S A. 2007;104(51):20167–72.

  27. 27.

    Wang J, Liu J-X, Zheng C-H, Wang Y-X, Kong X-Z, Wen C-G. A Mixed-Norm Laplacian Regularized Low-Rank Representation Method for Tumor Samples Clustering. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB). 2019;16(1):172-82.

  28. 28.

    Gu Q, Zhou J, Ding CHQ. Collaborative filtering: weighted nonnegative matrix factorization incorporating user and item graphs. SDM:199-210. In: Siam international conference on data mining, SDM 2010, April 29–may 1, 2010, Columbus, Ohio, Usa; 2010. p. 199–210.

Download references


Not applicable.


Publication consts are founded by the National Natural Science Foundation of China under grant Nos. 61872220, 61572284, and 61701279.

Availability of data and materials

The datasets that support the findings of this study are available in

About this supplement

This article has been published as part of BMC Bioinformatics Volume 20 Supplement 8, 2019: Decipher computational analytics in digital health and precision medicine. The full contents of the supplement are available online at

Author information

ZC and YLG jointly contributed to the design of the study. ZC designed and implemented the L2,1-GRMF and L2,1-WGRMF method, performed the experiments, and drafted the manuscript. JXL gave statistical and computational advice to the project, and participated in designing evaluation criteria. LYD and SSY contributed to the data analysis. All authors read and approved the final manuscript.

Correspondence to Jin-Xing Liu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark


  • Drug-target interaction prediction
  • Graph regularization
  • L2,1-norm
  • Matrix factorization
  • Manifold learning