 Research
 Open Access
L_{2,1}-GRMF: an improved graph regularized matrix factorization method to predict drug-target interactions
BMC Bioinformatics volume 20, Article number: 287 (2019)
Abstract
Background
Predicting drug-target interactions experimentally is time-consuming and expensive, so accurate computational methods are important. Many algorithms predict interactions globally, and some of them use drug-target networks for prediction (i.e., a bipartite graph of drugs and targets that are known to interact). Although these algorithms can predict drug-target interactions to some extent, they are of little use for new drugs or targets that have no known interactions.
Results
Since the datasets usually lie on or near low-dimensional nonlinear manifolds, we propose an improved GRMF (graph regularized matrix factorization) method that learns these manifolds in combination with the previous matrix-decomposition method. In addition, we use a previously proposed preprocessing step to improve the accuracy of the prediction.
Conclusions
Cross-validation is used to evaluate our method, and simulation experiments are used to predict new interactions. In most cases, our method is superior to other methods. Finally, some examples of new drugs and new targets are predicted through simulation experiments, and the improved GRMF method predicts the remaining drug-target interactions better.
Background
With advances in drug discovery technologies, existing methods can identify drug targets to some extent. However, drug development remains costly and inefficient [1]. Drug developers have therefore shown great interest in drug repositioning, which has the potential to reduce risk, time and cost [2]. A crucial resource for drug repositioning is online biological databases such as KEGG [3], DrugBank [4], STITCH [5] and ChEMBL [6], which store a large number of known drug-target interactions. It is worth noting that many interactions have still not been found [7]. As a result, the development of drug-target prediction technology has accelerated, and more and more prediction methods have been proposed [8]. These computational methods, which reasonably predict new and unexplored interactions, have greatly facilitated the drug discovery process and made it more reliable. Recent research shows that there are three popular classes of methods for predicting drug-target interactions: ligand-based methods [9], docking-based methods [10], and chemogenomic approaches [11]. Other kinds of interactions can also be predicted computationally; for example, opposition-based learning particle swarm optimization has been used to detect SNP-SNP interactions [12], and potential gene-gene interaction networks can be exploited by LNDriver [13].
Recently, many researchers have used matrix decomposition methods to solve drug-target interaction problems. The main methods are the Bayesian matrix factorization method KBMF2K [14] and the collaborative matrix factorization method CMF [15]. The principle of these methods is that a high-dimensional drug-target interaction matrix is decomposed into several low-dimensional matrices that retain the characteristics of the original matrix. In theory, however, these matrix factorization methods still have room for improvement [16].
Using chemogenomic approaches to predict drug-target interactions is effective because the first two classes of methods have their own drawbacks. A docking simulation requires the three-dimensional structure of the target protein to be available. Ligand-based methods, in turn, perform poorly when few or no ligands are known for a target protein [9]. The advantage of chemogenomic approaches is that information from the drugs and from the targets is used simultaneously for prediction [17]. New interactions are inferred from the similarity of the chemical structures of the drugs and the similarity of the genomic sequences of the targets. In this paper, the drug similarity and the target similarity are constructed as in previous studies, from the characteristics of the drugs and of the targets, respectively. The advantage is that our method can be compared directly with other methods that use the same similarities. However, using the same construction method for the drug similarity and the target similarity may affect the final results.
Two separate models are used to train on drug-target pairs, one based on the drug side and the other based on the target side. The final results are obtained by combining the predictions from these two sides. In this paper, to avoid overfitting and to sparsify the target feature matrix, the L_{2,1}-norm is added to our method, which can discard some unwanted target pairs [18]. Ten-fold cross-validation is used to evaluate the performance of our method.
We present the experimental results in Results and conduct a case study in Discussion. We summarize the paper in Conclusions. In Methods, we describe the methods in detail, including the specific iteration formulas and algorithms.
Results
Datasets
Four datasets are used in the experiments: the nuclear receptor (NR), the G protein-coupled receptor (GPCR), the ion channel (IC) and the enzyme (E) datasets. The four datasets differ in size. Nuclear receptors are among the most abundant transcriptional regulators in metazoans. Ligands of NRs include some steroid hormones, vitamin D and quinone. In recent years, nuclear receptors have received widespread attention; for example, they are closely related to the development of diseases such as diabetes and fatty liver. The PPARγ agonist rosiglitazone, a thiazolidinedione, can effectively improve insulin sensitivity in diabetic patients. GPCRs are important proteins in cell signaling and one of the most important classes of therapeutic drug targets found so far. The total number of targets is about 500, and GPCR targets account for the vast majority of the receptors among them. In recent years, indications for GPCR-targeting drugs have been expanding from traditional areas such as allergies, hypertension, anesthesia and schizophrenia to new areas such as obesity. An ion channel is a pore-forming protein in the cell membrane that allows ions of a particular type to enter and exit the cell along their electrochemical gradient. Many ion channels have therefore become targets of mainstream drugs. Enzymes are macromolecular biocatalysts. Some common drugs use enzymes as targets and act on them through inhibition, induction, activation or reactivation; most such drugs are enzyme inhibitors. According to statistics, half of the top 20 drugs in the world are enzyme inhibitors. It is worth noting that some drugs, such as pepsin and trypsin, are enzymes themselves.
Each dataset contains three matrices: Y, S^{d} and S^{t}. The matrix Y is the adjacency matrix of the drug-target interactions: if the drug d_{i} is known to interact with the target t_{j}, Y_{ij} is 1; otherwise Y_{ij} is 0. The matrix S^{d} contains the pairwise chemical structure similarities of the drugs [19], and the matrix S^{t} contains the pairwise genomic sequence similarities of the targets [20]. Table 1 lists the specific information for the four datasets. More information about the datasets is available at https://github.com/cuizhensdws/L21GRMF.
Cross-validation experiments
We compare the existing matrix decomposition methods CMF (Collaborative matrix factorization), GRMF (Graph regularized matrix factorization) and WGRMF (Weighted graph regularized matrix factorization) with our proposed methods, and we also compare the effect of WKNKN preprocessing on these methods. All methods are evaluated with ten-fold cross-validation (CV). The original dataset Y is divided into ten subsets; each subset is used once as the test set while the remaining subsets form the training set. The ten-fold cross-validation is repeated five times, and the average over the five repetitions is taken as the result.
To assess prediction performance, we use a widely used evaluation index, the AUPR (Area under the Precision-Recall curve) [21]. Another common evaluation index is the AUC (Area under the receiver operating characteristic curve), which can also be used to assess predictions. In our experiments, ten AUPR values are calculated for each ten-fold cross-validation and averaged; this is repeated five times, and the average of the five AUPRs is taken as the final result [22]. In general, the AUPR value is lower than the AUC value. In our experiments the AUPR values are above 0.3, so we consider the experimental results reasonable.
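As an illustration of how an AUPR value can be obtained from a score matrix, the following minimal sketch uses the average-precision estimator, which is one common way to approximate the area under the PR curve. It is not taken from the authors' code; the function name `aupr` is hypothetical, and tied scores are ranked in their original order.

```python
import numpy as np

def aupr(y_true, scores):
    """Approximate the area under the precision-recall curve (average precision).

    y_true: 0/1 labels of the test pairs; scores: predicted interaction scores.
    Assumption: this estimator is illustrative and may differ from the one
    used for the reported results.
    """
    y_true = np.asarray(y_true, dtype=float).ravel()
    scores = np.asarray(scores, dtype=float).ravel()
    order = np.argsort(-scores, kind="stable")      # rank pairs by descending score
    hits = y_true[order]
    n_pos = hits.sum()
    if n_pos == 0:
        return 0.0                                  # no positives: AUPR undefined, return 0
    tp = np.cumsum(hits)                            # true positives at each cut-off
    precision = tp / np.arange(1, hits.size + 1)    # precision at each cut-off
    # Average the precision values at the ranks where a positive is retrieved.
    return float((precision * hits).sum() / n_pos)

def repeated_cv_score(fold_scores):
    """Average per-fold AUPRs within each repetition, then across repetitions."""
    return float(np.mean([np.mean(rep) for rep in fold_scores]))
```

Here `fold_scores` would be a list of five lists, each holding the ten per-fold AUPR values of one repetition.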
We test two settings [23]: CVd, which is based on the drug interaction profiles, and CVt, which is based on the target interaction profiles. CVd tests the ability to predict interactions for new drugs, and CVt tests the ability to predict interactions for new targets. In addition, we perform a convergence analysis of each method using the NR and GPCR datasets as examples, running each method for 100 iterations. Our method converges after about 20 iterations. It is worth noting that we allow different error tolerances depending on the size and type of the dataset; as long as the error is within a reasonable range, it is acceptable. Figures 1 and 2 show the convergence of the different methods on the NR and GPCR datasets, respectively.
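The two settings can be sketched as follows, assuming that in CVd the complete interaction profiles (rows of Y) of the test drugs are hidden and in CVt the complete profiles (columns of Y) of the test targets are hidden. The function name `cv_folds` and its parameters are hypothetical and only illustrate the splitting step.

```python
import numpy as np

def cv_folds(Y, mode="cvd", n_folds=10, seed=0):
    """Yield (Y_train, test_idx) pairs for one round of n-fold cross-validation.

    mode="cvd": hide whole rows (drug interaction profiles).
    mode="cvt": hide whole columns (target interaction profiles).
    """
    rng = np.random.default_rng(seed)
    n_items = Y.shape[0] if mode == "cvd" else Y.shape[1]
    perm = rng.permutation(n_items)                 # random assignment to folds
    for test_idx in np.array_split(perm, n_folds):
        Y_train = Y.copy()
        if mode == "cvd":
            Y_train[test_idx, :] = 0                # test drugs look like new drugs
        else:
            Y_train[:, test_idx] = 0                # test targets look like new targets
        yield Y_train, test_idx
```

Repeating the loop with five different seeds gives the five repetitions described above; the hidden entries of Y serve as ground truth for the AUPR of each fold.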
Interaction prediction under CVd
Table 2 lists the experimental results under CVd, with standard deviations given in parentheses. On the NR dataset, the L_{2,1}-GRMF (L_{2,1}-norm Graph regularized matrix factorization) method is superior to the GRMF method and performs almost the same as the GRMF method with WKNKN. Importantly, our improved method L_{2,1}-GRMF improves significantly when WKNKN is added. Moreover, after adding the weight matrix to L_{2,1}-GRMF and using WKNKN, the prediction accuracy also improves. Figure 3 shows the PR curves on the CVd side of each method on the NR dataset.
However, on the GPCR dataset, our method does not outperform the previous methods; our initial assessment is that this is caused by the dataset itself. Figure 4 shows the PR curves on the CVd side of each method on the GPCR dataset. We observe that, in the CVd experiments, the AUPR value obtained with the weight matrix is higher than that obtained without it. In addition, the L_{2,1}-WGRMF (Weighted L_{2,1}-norm graph regularized matrix factorization) method with WKNKN is superior to every other method on the IC dataset, and slightly better than the WGRMF method with WKNKN. Figure 5 shows the PR curves on the CVd side of each method on the IC dataset. On the E dataset, the best method is L_{2,1}-WGRMF, but its AUPR score drops after applying WKNKN. In other words, on the E dataset the preprocessing step has a negative effect on the prediction result. Figure 6 shows the PR curves on the CVd side of each method on the E dataset. In general, WKNKN does not improve the AUPR score of every method: it has a positive effect on most datasets and a negative effect on some. In practice, this negative effect is unavoidable on some datasets. One important reason is that WKNKN assigns inaccurate values to the 0 elements of the matrix Y on the E dataset. When L_{2,1}-GRMF is then applied to make more accurate predictions, these inaccurate values reduce the prediction accuracy.
Interaction prediction under CVt
Table 3 shows that, on most datasets, the AUPR value under CVt is generally higher than the AUPR value under CVd. This shows that good prediction results can still be obtained when the interactions of targets are hidden, whereas the prediction results drop considerably when the interactions of drugs are hidden. Standard deviations are given in parentheses. It is worth noting that on most datasets the CMF method has lower AUPR values than any other method, and its AUPR value is far lower than that of our method, especially on the NR dataset.
Discussion
On the NR, GPCR and IC datasets, the best method is L_{2,1}-GRMF with the preprocessing step, and our improved method achieves some improvement on all three datasets. Figures 7, 8, 9 and 10 show the PR curves on the CVt side of each method on the NR, GPCR, IC and E datasets, respectively. On the E dataset, GRMF is still the best method. We can also see that some instances are ignored when the weight matrix is used, whereas the GRMF method does not use the weight matrix W. Based on the previous conclusions, the information of the targets is more important than the information of the drugs. Therefore, the AUPR value obtained with GRMF is higher than that obtained with WGRMF.
On most datasets, the L_{2,1}-norm plays a key role in the prediction results. The L_{2,1}-norm provides a sparse solution for the final result. Compared with the CMF method, the L_{2,1}-norm also promotes convergence. Therefore, the overall performance of the L_{2,1}-GRMF and L_{2,1}-WGRMF methods is superior to that of the other methods.
Case study
In this section, we conduct a simulation experiment. First, we erase some of the known drug-target interactions in the original dataset; that is, some elements that are 1 in the original matrix are set to 0. The erased elements are chosen randomly. Second, we run the prediction method on the modified matrix. Finally, we examine the results and check whether the erased interactions are successfully predicted.
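The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the helper names `hide_interactions` and `recovered` are hypothetical, and `scores` stands for the predicted score matrix produced by any of the factorization methods.

```python
import numpy as np

def hide_interactions(Y, n_hide, seed=0):
    """Randomly set n_hide known interactions (entries equal to 1) to 0."""
    rng = np.random.default_rng(seed)
    known = np.argwhere(Y == 1)                             # (row, col) of known pairs
    picked = known[rng.choice(len(known), size=n_hide, replace=False)]
    Y_masked = Y.copy()
    Y_masked[picked[:, 0], picked[:, 1]] = 0
    return Y_masked, picked

def recovered(scores, Y_masked, hidden, top_n):
    """Return the hidden pairs that appear among the top_n ranked unknown pairs."""
    cand = np.argwhere(Y_masked == 0)                       # candidate (unknown) pairs
    order = np.argsort(-scores[cand[:, 0], cand[:, 1]], kind="stable")
    top = {tuple(map(int, p)) for p in cand[order[:top_n]]}
    return [tuple(map(int, p)) for p in hidden if tuple(map(int, p)) in top]
```

Candidate pairs that rank highly but were not hidden correspond to the newly predicted interactions discussed in the case study.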
In the NR dataset, ten known interactions between drugs and the target estrogen receptor alpha (KEGG ID: hsa2099) are removed. This target plays a major role in breast cancer. After the experiment, we count the results: five of the hidden interactions are recovered. At the same time, we also predict a number of new drugs and list the five most reliable ones in Table 4. Among them, the sixth drug, Testosterone, is the drug with the highest correlation with this target.
In the IC dataset, we use a similar approach for the drug Diazoxide (KEGG ID: D00294), a blood-pressure-lowering drug. Before applying the L_{2,1}-GRMF method, we remove twenty of its interactions from the matrix Y. Because this dataset is larger than the NR dataset and many targets are associated with this drug, we remove twenty interactions here. After conducting the simulation experiment, we successfully predict twelve known targets and eight new targets. We list the top twenty targets in Table 5: the first twelve are known targets and the remaining ones are newly predicted targets.
In these two cases, the similarity of the estrogen receptor alpha to its nearest neighbor target in the matrix S^{t} is less than 0.02, and the similarity of Diazoxide to its nearest neighbor in the matrix S^{d} is 0.3, which is also quite low. Predictions are therefore more difficult in these cases. This shows that the proposed L_{2,1}-GRMF method can obtain reliable results even for challenging drugs and targets. Of course, the two proposed methods still have some limitations. If a weight matrix is added, the running time of the experiment increases several-fold. Compared with other methods, the time complexity of our methods is relatively high. In addition, the method cannot make predictions for new drugs and new targets that have no known interactions at all.
Conclusions
In this paper, we propose two improved matrix decomposition methods, L_{2,1}-GRMF and L_{2,1}-WGRMF. Both methods are used to predict drug-target interactions. We use cross-validation to calculate AUPR values and make predictions on the drug side (CVd) and the target side (CVt), respectively. We compare the proposed methods with state-of-the-art matrix factorization methods. In most cases, our improved methods provide the best results, which indicates that the L_{2,1}-norm improves predictive performance.
The WKNKN preprocessing step is used to improve the experimental results. It can also be used as an independent method to predict drug-target interactions. Because the dimensions of the data are relatively small, the drug-target interactions contained in each dataset are also limited, and our approach is suited to datasets of this kind.
In the future, we expect that more and more drug-target interactions will be discovered, providing more valuable datasets for prediction. We will explore more effective prediction methods for drug-target interaction problems. For example, matrix factorization with hypergraph regularization could be used to improve the reliability of the predicted interactions.
Methods
CMF
Collaborative matrix factorization is an effective method for predicting drug-target interactions [15]. The objective function of the CMF method is
where W is a weight matrix with W_{ij} = 1 when Y_{ij} is known and W_{ij} = 0 otherwise. The last two terms of the objective function are regularization terms. We use L to denote the objective function in Eq. (1); a_{i} is the ith row vector of A, and b_{j} is the jth row vector of B. Two update rules are obtained by solving ∂L/∂a = 0 and ∂L/∂b = 0. The two update rules are then applied alternately, in a least squares manner, until convergence:
In summary, after the latent feature matrices A and B have been updated, the predicted score matrix is obtained as the product AB^{T}. New drug-target interactions are predicted by comparing this score matrix with the original drug-target interaction matrix Y.
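For reference, the CMF objective is commonly written in the following form. This is a sketch consistent with the description above and with the formulation in [15], not a verbatim copy of Eq. (1); the symbol ⊙ (element-wise product) and the parameter names λ_{l}, λ_{d} and λ_{t} are notation assumed here.

\[
\min_{\mathbf{A},\mathbf{B}} \; \left\| \mathbf{W} \odot \left( \mathbf{Y} - \mathbf{A}\mathbf{B}^{T} \right) \right\|_F^2 + \lambda_l \left( \|\mathbf{A}\|_F^2 + \|\mathbf{B}\|_F^2 \right) + \lambda_d \left\| \mathbf{S}^{d} - \mathbf{A}\mathbf{A}^{T} \right\|_F^2 + \lambda_t \left\| \mathbf{S}^{t} - \mathbf{B}\mathbf{B}^{T} \right\|_F^2
\]

In this form, the last two terms encourage the inner products of the latent drug and target vectors to approximate the drug and target similarities.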
GRMF
In the GRMF method, the benefit of the regularization terms is that they help to avoid overfitting [20]. The objective function of GRMF is as follows:
Then, the matrices A and B are initialized. The SVD (singular value decomposition) method is used to decompose the matrix Y ∈ R^{n × m} into U ∈ R^{n × k}, S_{k} ∈ R^{k × k} and V ∈ R^{m × k}. The largest possible number of singular values of Y is min(n, m), so k_{max} = min(n, m). Finally, the square root of S_{k} is taken, and \( \mathbf{A}={\mathbf{US}}_{\mathbf{k}}^{1/2} \), \( \mathbf{B}={\mathbf{VS}}_{\mathbf{k}}^{1/2} \).
Next, the least squares method is used to update A and B. Let L denote the objective function in Eq. (4). Two update rules are obtained by solving ∂L/∂A = 0 and ∂L/∂B = 0, and they are applied alternately until convergence.
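As a worked illustration, assume the GRMF objective has the trace form used for graph regularization in [20], with normalized graph Laplacians \( \tilde{\mathbf{L}}_d \) and \( \tilde{\mathbf{L}}_t \) (this form is an assumption here, since Eq. (4) is not reproduced above):

\[
L = \left\| \mathbf{Y} - \mathbf{A}\mathbf{B}^{T} \right\|_F^2 + \lambda_l \left( \|\mathbf{A}\|_F^2 + \|\mathbf{B}\|_F^2 \right) + \lambda_d \operatorname{Tr}\left( \mathbf{A}^{T} \tilde{\mathbf{L}}_d \mathbf{A} \right) + \lambda_t \operatorname{Tr}\left( \mathbf{B}^{T} \tilde{\mathbf{L}}_t \mathbf{B} \right)
\]

Setting the partial derivatives to zero gives

\[
\frac{\partial L}{\partial \mathbf{A}} = -2\left( \mathbf{Y} - \mathbf{A}\mathbf{B}^{T} \right)\mathbf{B} + 2\lambda_l \mathbf{A} + 2\lambda_d \tilde{\mathbf{L}}_d \mathbf{A} = 0
\quad\Rightarrow\quad
\mathbf{A} = \left( \mathbf{Y}\mathbf{B} - \lambda_d \tilde{\mathbf{L}}_d \mathbf{A} \right) \left( \mathbf{B}^{T}\mathbf{B} + \lambda_l \mathbf{I}_k \right)^{-1}
\]

\[
\frac{\partial L}{\partial \mathbf{B}} = -2\left( \mathbf{Y} - \mathbf{A}\mathbf{B}^{T} \right)^{T}\mathbf{A} + 2\lambda_l \mathbf{B} + 2\lambda_t \tilde{\mathbf{L}}_t \mathbf{B} = 0
\quad\Rightarrow\quad
\mathbf{B} = \left( \mathbf{Y}^{T}\mathbf{A} - \lambda_t \tilde{\mathbf{L}}_t \mathbf{B} \right) \left( \mathbf{A}^{T}\mathbf{A} + \lambda_l \mathbf{I}_k \right)^{-1}
\]

The matrices A and B on the right-hand sides are the values from the previous iteration, which is why the rules are iterated until convergence rather than solved in one step.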
WGRMF
The weight matrix W in WGRMF is the same as W in CMF. Its purpose is to prevent unknown interactions from contributing to the determination of the latent feature matrices A and B. The objective function of the WGRMF method is as follows:
Let L denote the objective function in Eq. (5), where a_{i} is the ith row vector of A and b_{j} is the jth row vector of B. Two update rules are obtained by solving ∂L/∂a = 0 and ∂L/∂b = 0, and they are applied alternately until convergence. It is worth noting that these update rules differ from those of GRMF: in GRMF the rules update whole matrices, whereas in WGRMF they update one row at a time.
Our proposed methods
Here, our improved approach is used to solve the drug-target interaction prediction problem. WKNKN (weighted K nearest known neighbors) [20] is used as a preprocessing step to address the problem of unknown (missing) values. Two methods are proposed: graph regularized matrix factorization based on the L_{2,1}-norm (L_{2,1}-GRMF) and a weighted variant called L_{2,1}-WGRMF. Both are used to predict drug-target interactions. Figure 11 shows a flow chart of the proposed method.
L_{2,1}-GRMF
Sparsification of the drug similarity matrix and target similarity matrix
Graph regularization terms are used to take full account of the internal structure of the similarity matrices S^{d} and S^{t} and to preserve this structure in the learned representation. In this work, we derive a p-nearest neighbor graph from each of the drug and target similarity matrices S^{d} and S^{t} [24]. Given a drug similarity matrix S^{d}, a p-nearest neighbor graph N [25] can be generated as
where N is used to sparsify the matrix S^{d}, which can be written as
The result is a sparse drug similarity matrix. The sparse target similarity matrix is obtained from S^{t} in the same way. We use the Euclidean distance to determine the nearest neighbors; in general, the Euclidean distance gives better results because it represents the true distance.
Graph regularization helps to learn the manifolds on which the drug and target spaces lie. Points that are close to each other in the original space remain close to each other in the learned space.
Low-rank approximation
The idea of low-rank approximation (LRA) is applied to GRMF [26]. It decomposes the interaction matrix Y into two low-rank latent feature matrices A and B, i.e., Y ≈ AB^{T} [27]. The objective function of GRMF can be written as the following optimization problem:
where ‖·‖_{F} is the Frobenius norm and k is the number of latent features of A and B.
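The sparsification step can be sketched as follows. The weighting used here (1 for mutual neighbors, 0.5 when only one item lists the other as a neighbor, 0 otherwise) follows the p-nearest neighbor construction commonly used with GRMF [20] and is an assumption, as is the ranking of neighbors by similarity value; with a distance measure such as the Euclidean distance, only the ranking step would change. The function name `sparsify_similarity` is hypothetical.

```python
import numpy as np

def sparsify_similarity(S, p=5):
    """Return a sparsified similarity matrix N * S built from a p-nearest neighbor graph.

    Assumes p < number of items. The diagonal of the result is 0, which does not
    change the unnormalized graph Laplacian D - S_hat.
    """
    n = S.shape[0]
    S_off = S.astype(float).copy()
    np.fill_diagonal(S_off, -np.inf)               # an item is not its own neighbor
    nn = np.argsort(-S_off, axis=1)[:, :p]         # indices of the p most similar items
    member = np.zeros((n, n), dtype=bool)          # member[i, j]: j is in N_p(i)
    member[np.repeat(np.arange(n), nn.shape[1]), nn.ravel()] = True
    # 1 if i and j are mutual neighbors, 0.5 if one-sided, 0 otherwise.
    N = 0.5 * (member.astype(float) + member.T.astype(float))
    return N * S
```

Because N is symmetric by construction, the sparsified matrix stays symmetric whenever S is symmetric, which the graph Laplacian requires.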
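Written out, the low-rank approximation problem described above takes the standard form below; the display is reconstructed from the definitions Y ≈ AB^{T} and ‖·‖_{F} given in the text, with n drugs, m targets and k latent features.

\[
\min_{\mathbf{A},\mathbf{B}} \; \left\| \mathbf{Y} - \mathbf{A}\mathbf{B}^{T} \right\|_F^2, \qquad \mathbf{A} \in \mathbb{R}^{n \times k}, \; \mathbf{B} \in \mathbb{R}^{m \times k}
\]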
Regularization
In general, Tikhonov and graph regularization terms can be used to avoid overfitting and to enhance generalization capability. The objective function of L_{2,1}-GRMF is:
where λ_{l}, λ_{d} and λ_{t} are positive parameters, a_{i} is the ith row of A, b_{j} is the jth row of B, n is the number of drugs, and m is the number of targets. The first term is the approximation model of the matrix Y. The second term is the Tikhonov regularization, whose main purpose is to minimize the norms of A and B. The third term is the L_{2,1}-norm applied to B, which increases the sparsity of the target feature matrix and discards unwanted target pairs. Because we are more concerned with certain drugs, we use the L_{2,1}-norm to sparsify the latent feature matrix of the targets, so that new drugs can be predicted better. If the L_{2,1}-norm were also applied to A, however, some of the more important drugs might be lost. The last two terms are the graph regularization terms of the drugs and the targets, respectively. Moreover, the drug-target model can be rewritten as:
where Tr(·) is the trace of a matrix, \( {\mathbf{L}}_{\mathbf{d}}={\mathbf{D}}^{\mathbf{d}}-\hat{{\mathbf{S}}^{\mathbf{d}}} \) is the graph Laplacian of \( \hat{{\mathbf{S}}^{\mathbf{d}}} \), \( {\mathbf{L}}_{\mathbf{t}}={\mathbf{D}}^{\mathbf{t}}-\hat{{\mathbf{S}}^{\mathbf{t}}} \) is the graph Laplacian of \( \hat{{\mathbf{S}}^{\mathbf{t}}} \), and D^{d} and D^{t} are the corresponding diagonal degree matrices. Please refer to [28] for more details on rewriting the graph regularization. The normalized graph Laplacian is known to perform better than the unnormalized one, so we replace L_{d} and L_{t} with \( \overset{\sim}{{\mathbf{L}}_{\mathbf{d}}}={\left({\mathbf{D}}^{\mathbf{d}}\right)}^{-1/2}{\mathbf{L}}_{\mathbf{d}}{\left({\mathbf{D}}^{\mathbf{d}}\right)}^{-1/2} \) and \( \overset{\sim}{{\mathbf{L}}_{\mathbf{t}}}={\left({\mathbf{D}}^{\mathbf{t}}\right)}^{-1/2}{\mathbf{L}}_{\mathbf{t}}{\left({\mathbf{D}}^{\mathbf{t}}\right)}^{-1/2} \). The function can be written as:
Minimizing this objective function directly on Y could lead to unsatisfactory predictions, because many of the zeros in Y correspond to interactions that have not yet been discovered. Therefore, we use the WKNKN preprocessing method to address this problem.
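Putting the five terms described above together, the objective with normalized Laplacians can be sketched in the following form. This is a reconstruction from the term-by-term description, not a verbatim copy of the published equation; in particular, using λ_{l} as the coefficient of the L_{2,1}-norm term is an assumption based on the fact that only three parameters are named.

\[
\min_{\mathbf{A},\mathbf{B}} \; \left\| \mathbf{Y} - \mathbf{A}\mathbf{B}^{T} \right\|_F^2 + \lambda_l \left( \|\mathbf{A}\|_F^2 + \|\mathbf{B}\|_F^2 \right) + \lambda_l \|\mathbf{B}\|_{2,1} + \lambda_d \operatorname{Tr}\left( \mathbf{A}^{T} \tilde{\mathbf{L}}_d \mathbf{A} \right) + \lambda_t \operatorname{Tr}\left( \mathbf{B}^{T} \tilde{\mathbf{L}}_t \mathbf{B} \right)
\]

\[
\|\mathbf{B}\|_{2,1} = \sum_{j=1}^{m} \left\| \mathbf{b}_j \right\|_2
\]

Because the L_{2,1}-norm sums the Euclidean norms of the rows of B, minimizing it drives entire rows, that is, the latent vectors of individual targets, toward zero, which is the row-sparsity effect described in the text.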
Initialization of A and B
For the input matrix Y, the SVD (singular value decomposition) method is used to obtain the initial values of the matrices A and B:
Here, S_{k} is a diagonal matrix containing the k largest singular values. The number of singular values of Y is at most k_{max} = min(n, m), so k cannot exceed k_{max}.
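The initialization can be sketched with NumPy as follows, using the relations A = U S_{k}^{1/2} and B = V S_{k}^{1/2} given in the GRMF section. The function name `init_factors` is hypothetical.

```python
import numpy as np

def init_factors(Y, k):
    """Initialize A (n x k) and B (m x k) from the truncated SVD of Y (n x m)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)   # s is sorted in descending order
    k = min(k, s.size)                                 # k cannot exceed min(n, m)
    root = np.sqrt(s[:k])                              # diagonal of S_k^{1/2}
    A = U[:, :k] * root                                # A = U S_k^{1/2}
    B = Vt[:k, :].T * root                             # B = V S_k^{1/2}
    return A, B
```

With this choice, AB^{T} equals the best rank-k approximation of Y, so the iterations start from a point that already fits the known interactions.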
Optimization algorithm
In this paper, we update A and B using the least squares method. Let L denote the objective function in Eq. (11); the update rules are obtained by setting the partial derivatives with respect to A and B to zero, that is, ∂L/∂A = 0 and ∂L/∂B = 0. The two update rules are applied alternately until convergence. When running the L_{2,1}-GRMF method, the optimal values of λ_{l}, λ_{d} and λ_{t} are determined by cross-validation on the training set. We use a grid search with λ_{l} ∈ {2^{−2}, 2^{−1}, 2^{0}, 2^{1}} and choose the optimal parameters from this set. The derivation is as follows:
where D is a diagonal matrix whose ith diagonal element is d_{ii} = 1/(2‖(B)^{i}‖_{2}), with (B)^{i} denoting the ith row of B. The specific algorithm of L_{2,1}-GRMF is as follows:
L_{2,1}-WGRMF
A variant of L_{2,1}-GRMF, called L_{2,1}-WGRMF, is obtained by adding a weight matrix W to L_{2,1}-GRMF. The advantage is that W helps to determine the latent feature matrices A and B of the drug-target matrix Y. The objective function that contains W is written as follows:
Let F denote this objective function and set ∂F/∂a_{i} = 0 and ∂F/∂b_{j} = 0. The resulting update rules are applied to A and B until convergence.
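The alternating updates can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' algorithm: it assumes the trace-form objective with λ_{l} as the coefficient of the L_{2,1}-norm term, for which ∂L/∂A = 0 and ∂L/∂B = 0 give A(B^{T}B + λ_{l}I) = YB − λ_{d}L̃_{d}A and B(A^{T}A + λ_{l}I) = Y^{T}A − λ_{l}DB − λ_{t}L̃_{t}B. The function names, default parameter values and the small constant `eps` are hypothetical.

```python
import numpy as np

def normalized_laplacian(S_hat):
    """Return D^{-1/2} (D - S_hat) D^{-1/2} for a symmetric, non-negative S_hat."""
    d = S_hat.sum(axis=1)
    L = np.diag(d) - S_hat
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)   # isolated nodes get 0
    return d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]

def l21_grmf(Y, Ld, Lt, k=10, lam_l=0.5, lam_d=0.1, lam_t=0.1, n_iter=20, eps=1e-8):
    """Alternating least squares sketch; returns latent matrices A (n x k), B (m x k)."""
    # SVD initialization: A = U S_k^{1/2}, B = V S_k^{1/2}.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    k = min(k, s.size)
    A = U[:, :k] * np.sqrt(s[:k])
    B = Vt[:k, :].T * np.sqrt(s[:k])
    I = np.eye(k)
    for _ in range(n_iter):
        # Solve A (B^T B + lam_l I) = Y B - lam_d Ld A for the new A.
        rhs_a = Y @ B - lam_d * (Ld @ A)
        A = np.linalg.solve(B.T @ B + lam_l * I, rhs_a.T).T
        # Diagonal of D: d_ii = 1 / (2 ||b_i||_2); eps guards against zero rows.
        d = 1.0 / (2.0 * np.linalg.norm(B, axis=1) + eps)
        # Solve B (A^T A + lam_l I) = Y^T A - lam_l D B - lam_t Lt B for the new B.
        rhs_b = Y.T @ A - lam_l * d[:, None] * B - lam_t * (Lt @ B)
        B = np.linalg.solve(A.T @ A + lam_l * I, rhs_b.T).T
    return A, B
```

The predicted score matrix is then `A @ B.T`. A fixed number of iterations is used here for simplicity; a tolerance on the change of the objective value could be used instead as the stopping criterion.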
Abbreviations
AUPR: Area under the precision-recall curve
CMF: Collaborative matrix factorization method
CV: Cross-validation
GRMF: Graph regularized matrix factorization
L_{2,1}-GRMF: L_{2,1}-norm graph regularized matrix factorization
L_{2,1}-WGRMF: Weighted L_{2,1}-norm graph regularized matrix factorization
LRA: Low-rank approximation
SVD: Singular value decomposition
WGRMF: Weighted graph regularized matrix factorization
WKNKN: Weighted K nearest known neighbors
References
1. Novac N. Challenges and opportunities of drug repositioning. Trends Pharmacol Sci. 2013;34(5):267–72.
2. Hurle MR, Yang L, Xie Q, Rajpal DK, Sanseau P, Agarwal P. Computational drug repositioning: from data to therapeutics. Clin Pharmacol Ther. 2013;93(4):335–41.
3. Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017;45(Database issue):D353–61.
4. Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V. DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Nucleic Acids Res. 2011;39(Database issue):D1035.
5. Kuhn M, Szklarczyk D, Pletscher-Frankild S, Blicher TH, Mering CV, Jensen LJ, Bork P. STITCH 4: integration of protein–chemical interactions with user data. Nucleic Acids Res. 2014;42(Database issue):401–7.
6. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012;40(Database issue):1100–7.
7. Yonan AL, Palmer AA, Smith KC, Feldman I, Lee HK, Yonan JM, Fischer SG, Pavlidis P, Gilliam TC. Bioinformatic analysis of autism positional candidate genes using biological databases and computational gene network prediction. Genes Brain Behav. 2003;2(5):303–20.
8. Klipp E, Wade RC, Kummer U. Biochemical network-based drug-target prediction. Curr Opin Biotechnol. 2010;21(4):511–6.
9. Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, Shoichet BK. Relating protein pharmacology by ligand chemistry. Nat Biotechnol. 2007;25(2):197–206.
10. Cheng AC, Coleman RG, Smyth KT, Cao Q, Soulard P, Caffrey DR, Salzberg AC, Huang ES. Structure-based maximal affinity model predicts small-molecule druggability. Nat Biotechnol. 2007;25(1):71–5.
11. Yamanishi Y, Araki M, Gutteridge A, Honda W, Kanehisa M. Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics. 2008;24(13):i232–40.
12. Shang J, Sun Y, Li S, Liu JX, Zheng CH, Zhang J. An improved opposition-based learning particle swarm optimization for the detection of SNP-SNP interactions. Biomed Res Int. 2015;2015:524821.
13. Wei PJ, Zhang D, Xia J, Zheng CH. LNDriver: identifying driver genes by integrating mutation and expression data based on gene-gene interaction network. BMC Bioinformatics. 2016;17(Suppl 17):467.
14. Gönen M. Predicting drug–target interactions from chemical and genomic kernels using Bayesian matrix factorization. Bioinformatics. 2012;28(18):2304–10.
15. Zheng X, Ding H, Mamitsuka H, Zhu S. Collaborative matrix factorization with multiple similarities for predicting drug-target interactions. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2013. p. 1025–33.
16. Mei JP, Kwoh CK, Yang P, Li XL, Zheng J. Drug–target interaction prediction by learning from local information and neighbors. Bioinformatics. 2013;29(2):238–45.
17. Ge SG, Xia J, Sha W, Zheng CH. Cancer subtype discovery based on integrative model of multigenomic data. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2017;14(5):1115–21.
18. Wang DQ, Zheng CH, Gao YL, Liu JX, Wu SS, Shang JL. L21-iPaD: an efficient method for drug-pathway association pairs inference. In: IEEE International Conference on Bioinformatics and Biomedicine; 2017. p. 664–9.
19. Takahashi Y, Fujishima S, Kato H. Chemical data mining based on structural similarity. Journal of Computer Chemistry Japan. 2003;2(4):119–26.
20. Ezzat A, Zhao P, Wu M, Li XL, Kwoh CK. Drug-target interaction prediction with graph regularized matrix factorization. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2017;14(3):646–56.
21. Davis J, Goadrich M. The relationship between precision-recall and ROC curves. In: ICML '06: Proceedings of the International Conference on Machine Learning, New York, NY, USA; 2006. p. 233–40.
22. Li J, Fine JP. Weighted area under the receiver operating characteristic curve and its application to gene selection. J R Stat Soc. 2010;59(4):673.
23. Pahikkala T, Airola A, Pietilä S, Shakyawar S, Szwajda A, Tang J, Aittokallio T. Toward more realistic drug–target interaction predictions. Brief Bioinform. 2015;16(2):325–37.
24. Schuffenhauer A, Floersheim P, Acklin P, Jacoby E. Similarity metrics for ligands reflecting the similarity of the target proteins. J Chem Inf Comput Sci. 2003;43(2):391.
25. Wang B, Pan F, Hu KM, Paul JC. Manifold-ranking based retrieval using k-regular nearest neighbor graph. Pattern Recogn. 2012;45(4):1569–77.
26. Liberty E, Woolfe F, Martinsson PG, Rokhlin V, Tygert M. Randomized algorithms for the low-rank approximation of matrices. Proc Natl Acad Sci U S A. 2007;104(51):20167–72.
27. Wang J, Liu JX, Zheng CH, Wang YX, Kong XZ, Wen CG. A Mixed-Norm Laplacian Regularized Low-Rank Representation Method for Tumor Samples Clustering. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2019;16(1):172–82.
28. Gu Q, Zhou J, Ding CHQ. Collaborative filtering: weighted nonnegative matrix factorization incorporating user and item graphs. In: SIAM International Conference on Data Mining, SDM 2010, April 29–May 1, 2010, Columbus, Ohio, USA; 2010. p. 199–210.
Acknowledgements
Not applicable.
Funding
Publication costs are funded by the National Natural Science Foundation of China under grant Nos. 61872220, 61572284, and 61701279.
Availability of data and materials
The datasets that support the findings of this study are available in https://github.com/cuizhensdws/L21GRMF.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 20 Supplement 8, 2019: Decipher computational analytics in digital health and precision medicine. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume20supplement8.
Author information
Contributions
ZC and YLG jointly contributed to the design of the study. ZC designed and implemented the L_{2,1}-GRMF and L_{2,1}-WGRMF methods, performed the experiments, and drafted the manuscript. JXL gave statistical and computational advice to the project and participated in designing the evaluation criteria. LYD and SSY contributed to the data analysis. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Cui, Z., Gao, YL., Liu, JX. et al. L_{2,1}-GRMF: an improved graph regularized matrix factorization method to predict drug-target interactions. BMC Bioinformatics 20 (Suppl 8), 287 (2019). https://doi.org/10.1186/s1285901927687
DOI: https://doi.org/10.1186/s1285901927687
Keywords
 Drug-target interaction prediction
 Graph regularization
 L_{2,1}-norm
 Matrix factorization
 Manifold learning