RCMF: a robust collaborative matrix factorization method to predict miRNA-disease associations
BMC Bioinformatics volume 20, Article number: 686 (2019)
Abstract
Background
Predicting miRNA-disease associations (MDAs) is time-consuming and expensive, and there is an urgent need to improve the accuracy of prediction results. It is therefore crucial to develop novel computational methods for predicting new MDAs. Although some existing methods can effectively predict novel MDAs, they still have shortcomings. In particular, when the disease matrix is processed, its sparsity is an important factor affecting the final results.
Results
A robust collaborative matrix factorization (RCMF) method is proposed to predict novel MDAs. The L_{2,1}-norm is introduced into our method, which achieves a higher AUC value than other advanced methods.
Conclusions
Five-fold cross validation is used to evaluate our method, and simulation experiments are used to predict novel associations on the Gold Standard Dataset. Our prediction accuracy is better than that of other existing advanced methods. Therefore, our approach is effective and feasible for predicting novel MDAs.
Background
miRNAs are a class of short non-coding RNAs, generally 19 to 25 nt in length. They usually regulate gene expression and protein production [1,2,3,4,5,6,7]. Since the first two miRNAs, lin-4 and let-7, were discovered in 1993 and 2000, respectively [8, 9], thousands of miRNAs have been detected by biologists in organisms ranging from nematodes to humans [10, 11]. The latest version of the miRNA database miRBase contains 26,845 entries, including more than 2000 human miRNAs [12,13,14]. With the development of bioinformatics, more researchers are focusing on the function of miRNAs. miRNAs play an important role in biological processes such as proliferation, cell differentiation, viral infection, and signal transduction [15]. Moreover, some miRNAs are closely related to human diseases [16,17,18]. For example, mir-433 upregulates the expression of GRB2, a known tumor-associated protein, in gastric cancer [19]. In every pediatric brain tumor type, mir-25, mir-129, and mir-142 are differentially expressed [20]. Considering the strong association between miRNAs and diseases, their potential associations should be explored [15, 21]. In medicine, this can promote the diagnosis and treatment of some complex diseases [22,23,24,25]. However, predicting MDAs is time-consuming and expensive. Only a few novel associations are discovered and used in clinical medicine each year, and most associations remain undiscovered. Therefore, there is an urgent need to improve the accuracy of prediction results.
Previous studies have shown that functionally similar miRNAs tend to be associated with similar diseases [26, 27]. Based on this observation, more and more computational methods and models have been proposed for identifying novel miRNA-disease associations (MDAs) [13]. However, each of these methods has some shortcomings. For example, Jiang et al. proposed an improved disease-gene prediction model [28]. They introduced the principle and use of the hypergeometric distribution and analyzed its actual effect in the prediction model. Different datasets were used to predict novel MDAs, including known human miRNA-disease data, miRNA functional similarity data and disease semantic similarity data. The shortcoming of this model is its excessive dependence on neighbor miRNA data. Chen et al. proposed HGIMDA (Heterogeneous Graph Inference for miRNA-Disease Association) to predict novel MDAs [29]. Known miRNA-disease associations, miRNA functional similarity, disease semantic similarity, and Gaussian interaction profile kernel similarity for diseases and miRNAs are integrated into this method, which improves the accuracy of the algorithm to some extent. Researchers have also considered the functional relationship between miRNA targets and disease genes in PPI (Protein–Protein Interaction) networks. Shi et al. proposed a computational method to predict MDAs by performing a random walk [30]. They used PPIs, miRNA-target interactions and disease-gene associations to identify potential MDAs. However, the model depended strongly on miRNA-target interactions, which have high rates of false-positive and false-negative results [31]. Considering this disadvantage, Chen et al. proposed the RWRMDA (Random Walk with Restart for MiRNA-Disease Association) model [32]. Their approach was to map all miRNAs to a miRNA functional similarity network and then run random walk with restart until a stable probability was obtained [33]. Finally, all candidate miRNAs were sorted according to the stable probability. This was the first global network-based method. Xuan et al. proposed the HDMP method [33], which is based on weighted k-nearest-neighbors. The phenotype similarity and semantic similarity between diseases were used to calculate the miRNA functional similarity matrix. However, the simple ranking of k-nearest-neighbors was not always reliable for prediction. Chen et al. therefore proposed a ranking-based KNN method called RKNNMDA to identify potential MDAs [34]. The neighbors previously sorted by similarity were re-ranked to obtain better prediction results.
Recently, matrix factorization methods have been used to identify novel MDAs. Their advantage is that they can better handle missing associations. Shen et al. proposed a model based on collaborative matrix factorization to predict novel MDAs [10]. A matrix factorization method takes one input matrix and seeks two other matrices whose product approximates the input matrix. Gao et al. proposed a dual-network sparse graph regularized matrix factorization method (DNSGRMF) to predict novel MDAs and obtained better experimental results [35]. However, this method does not necessarily solve the overfitting problem very well. Chen et al. developed the computational model ELLPMDA (Ensemble Learning and Link Prediction for miRNA-Disease Association) to predict novel MDAs [36]. The miRNA-disease associations, miRNA functional similarity, disease semantic similarity and Gaussian interaction profile kernel similarity for miRNAs and diseases were integrated into a similarity network, on which ensemble learning was applied. Three classical similarity-based algorithms were combined to obtain better prediction results. However, even such an excellent method has shortcomings; for example, excessive ensemble learning introduces more noise. Gao et al. proposed a Nearest Profile-based Collaborative Matrix Factorization (NPCMF) method to predict potential miRNA-disease associations [37]. More importantly, this method has achieved the highest prediction accuracy so far.
In this paper, a simple yet effective matrix factorization model is proposed. Its main function is to predict new MDAs based on existing MDAs. Because missing associations have a negative impact on the predictions, a preprocessing step, weighted K nearest known neighbors (WKNKN), is used to address this problem [38, 39]. The L_{2,1}-norm is introduced into the collaborative matrix factorization (CMF) method; it can avoid overfitting and eliminate some unattached disease pairs [40, 41]. We also use Gaussian interaction profile kernel similarity to obtain the network similarity of miRNAs and the network similarity of diseases. As a result, the final prediction accuracy is greatly improved. Five-fold cross validation is used to evaluate our experimental results, and our proposed method is superior to the other methods. In addition, a simulation experiment is conducted to predict novel associations.
Materials
MDAs dataset
The miRNA-disease association data are obtained from HMDD [42] and comprise 383 diseases, 495 miRNAs and 5430 experimentally confirmed human miRNA-disease associations. This is a Gold Standard Dataset. The dataset contains three matrices: Y ∈ ℝ^{n × m}, S_{m} ∈ ℝ^{n × n} and S_{d} ∈ ℝ^{m × m}. Y is an adjacency matrix with n miRNAs as rows and m diseases as columns. If miRNA m(i) is associated with disease d(j), the entry Y(m(i), d(j)) is 1; otherwise it is 0. The matrix Y is used as the original input matrix. Y is decomposed into two latent feature matrices, and the product of the two latent feature matrices is used to approximate Y. Table 1 lists the specific information for the dataset.
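As an illustration of the adjacency matrix described above, the following minimal sketch builds a binary Y from a list of known (miRNA, disease) index pairs. The function name `build_association_matrix` and the toy pairs are hypothetical and are not taken from the HMDD files; only the counts 495, 383 and 5430 come from the text.

```python
import numpy as np

def build_association_matrix(pairs, n_mirnas, n_diseases):
    """Return a binary n x m matrix Y with Y[i, j] = 1 for each known pair (i, j)."""
    Y = np.zeros((n_mirnas, n_diseases), dtype=int)
    for i, j in pairs:
        Y[i, j] = 1
    return Y

# Toy example with made-up indices: 4 miRNAs, 3 diseases, 3 known associations.
toy_pairs = [(0, 0), (1, 2), (3, 1)]
Y_toy = build_association_matrix(toy_pairs, n_mirnas=4, n_diseases=3)

# Density implied by the dataset sizes reported in the text.
n, m, known = 495, 383, 5430
density = known / (n * m)  # fraction of entries of Y equal to 1
```

With the reported sizes, fewer than 3% of the entries of Y are nonzero, which is why the handling of unknown entries matters for this dataset.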
MiRNA functional similarity
Based on the assumption that functionally similar miRNAs are associated with similar diseases, Wang et al. proposed a method for calculating miRNA functional similarity scores [26]. The miRNA functional similarity scores are downloaded from http://www.cuilab.cn/files/images/cuilab/misim.zip. The matrix S_{m} represents the miRNA functional similarity network, and the functional similarity score between miRNAs m(i) and m(j) is denoted S_{m}(m(i), m(j)). The S_{m} matrix is also used as an input matrix, representing the functional similarity of miRNA pairs. Each miRNA has a similarity score of 1 with itself.
Disease semantic similarity
In this work, a Directed Acyclic Graph (DAG) is used to describe each disease. DAG(DD) = (d, T(DD), E(DD)) describes disease DD, where T(DD) is the node set and E(DD) is the corresponding set of links [26]. S_{d} represents the disease semantic similarity network. The semantic value of disease DD in DAG(DD) is defined as:
where Δ is the semantic contribution factor. The semantic contribution of disease DD to itself is 1. Based on previous research [43], we set Δ to 0.5. The further a disease term is from DD in the DAG, the smaller its semantic contribution score, and disease terms in the same layer contribute the same score to the semantic value of disease DD. Finally, if two diseases d(i) and d(j) share a larger part of their DAGs, they have a greater similarity score. The disease semantic similarity can be defined as follows:
where S_{d} is the disease semantic similarity matrix. The S_{d} matrix is also used as an input matrix, together with Y and S_{m}. As in the S_{m} matrix, each disease has a semantic similarity score of 1 with itself. The two feature matrices obtained by decomposing Y are therefore constrained by the S_{m} matrix and the S_{d} matrix.
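The semantic similarity described above can be sketched in code. This is a minimal illustration that assumes the formulation commonly attributed to Wang et al. [26]: a term's contribution is 1 for the disease itself and is multiplied by Δ at each step up the DAG (taking the maximum over paths), and the similarity of two diseases is the summed contribution of their shared terms divided by the sum of both semantic values. The toy DAG and the names `semantic_values` and `semantic_similarity` are hypothetical.

```python
def semantic_values(disease, parents, delta=0.5):
    """Semantic contribution of `disease` and each of its ancestors.

    `parents` maps a term to the list of its direct parent terms.
    """
    values = {disease: 1.0}
    frontier = [disease]
    while frontier:
        term = frontier.pop()
        for parent in parents.get(term, []):
            candidate = delta * values[term]
            # Keep the largest contribution over all paths to this ancestor.
            if candidate > values.get(parent, 0.0):
                values[parent] = candidate
                frontier.append(parent)
    return values

def semantic_similarity(d_i, d_j, parents, delta=0.5):
    """Shared contribution of common DAG terms over the two semantic values."""
    vi = semantic_values(d_i, parents, delta)
    vj = semantic_values(d_j, parents, delta)
    shared = set(vi) & set(vj)
    numerator = sum(vi[t] + vj[t] for t in shared)
    return numerator / (sum(vi.values()) + sum(vj.values()))

# Toy DAG (made up): C has parents A and B, D has parent B, A and B share a root.
toy_parents = {"C": ["A", "B"], "D": ["B"], "A": ["root"], "B": ["root"]}
sim_cd = semantic_similarity("C", "D", toy_parents)
sim_cc = semantic_similarity("C", "C", toy_parents)
```

In this toy DAG the two diseases share the terms B and root, so their similarity is positive but below 1, while each disease has similarity 1 with itself, as stated in the text.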
Methodology
Problem formalization
Formally, predicting the associations Y(m(i), d(j)) between miRNA m(i) and disease d(j) is treated as a matrix factorization problem. First, the input association matrix Y is decomposed into two low-rank latent feature matrices A (for miRNAs) and B (for diseases). Then, constraints are added to the two low-rank matrices [44]. Specifically, the L_{2,1}-norm is applied to the latent feature matrix B (for diseases). Finally, A and B are obtained by applying update rules, and the prediction matrix is derived from the product of A and B. The stronger the association between a miRNA and a disease, the higher the corresponding score in the prediction matrix. The miRNA-disease pairs Y(m(i), d(j)) are therefore ranked from high to low by score.
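The final ranking step can be sketched as follows. This is a minimal illustration under the assumption that the prediction matrix is AB^{T} with miRNAs as rows; the function name `rank_mirnas_for_disease` and the toy factor matrices are hypothetical.

```python
import numpy as np

def rank_mirnas_for_disease(scores, disease_index, top_k=None):
    """Return miRNA row indices sorted by descending predicted score for one disease."""
    column = scores[:, disease_index]
    order = np.argsort(-column, kind="stable")
    return order if top_k is None else order[:top_k]

# Toy latent factors (made up): 3 miRNAs, 2 diseases, rank k = 2.
A_toy = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
B_toy = np.array([[0.9, 0.1], [0.2, 0.8]])
scores_toy = A_toy @ B_toy.T  # prediction matrix, same shape as Y

ranking_d0 = rank_mirnas_for_disease(scores_toy, disease_index=0)
ranking_d1 = rank_mirnas_for_disease(scores_toy, disease_index=1)
```

For each disease, candidate miRNAs near the top of the ranking that are not yet known associations are the ones proposed as novel MDAs.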
Robust collaborative matrix factorization (RCMF)
The traditional CMF is an effective method for predicting novel MDAs [10]. CMF is based on collaborative filtering. The objective function of CMF is given as follows:
where ‖⋅‖_{F} is the Frobenius norm, and λ_{l}, λ_{d} and λ_{t} are non-negative parameters.
Although B is a low-rank matrix and is already sparse to some degree, it is not sparse enough. To obtain a better B, we use the L_{2,1}-norm to constrain the latent feature matrix B of the diseases. Because the L_{2,1}-norm induces row sparsity, it can better remove the meaningless elements of B. Without such a constraint, B may overfit, which reduces the accuracy of predicting novel MDAs.
To overcome this problem, a robust collaborative matrix factorization method named RCMF is proposed to predict MDAs. The L_{2,1}-norm is introduced into the RCMF method to address overfitting [45, 46]. In the dataset used in our experiments, the number of diseases is smaller than the number of miRNAs, and we are more concerned with which miRNAs are likely to be associated with a given disease. Therefore, we apply the L_{2,1}-norm to the latent feature matrix B of the diseases to make B sparse. The advantage is that more miRNAs can be accurately matched to a disease, which improves the accuracy of prediction. The interaction matrix Y is decomposed into two matrices A and B, where AB^{T} ≈ Y. RCMF uses two collaborative regularization terms to constrain A and B. Specifically, these two regularization terms require the latent feature vectors of similar miRNAs (or diseases) to be similar and those of dissimilar miRNAs (or diseases) to be dissimilar [38], so that S_{m} ≈ AA^{T} and S_{d} ≈ BB^{T}. Therefore, the objective function of RCMF can be written as:
where ‖⋅‖_{F} is the Frobenius norm, ‖⋅‖_{2, 1} is the L_{2,1}-norm, and λ_{l}, λ_{d} and λ_{t} are non-negative parameters. Based on previous research [38], grid search is used to select the optimal parameters, where λ_{l} ∈ {2^{−2}, 2^{−1}, 2^{0}, 2^{1}} and λ_{d}/λ_{t} ∈ {0, 10^{−4}, 10^{−3}, 10^{−2}, 10^{−1}}. The first term constructs an approximate model of the matrix Y in order to find the latent feature matrices A and B. The second term is a Tikhonov regularization that minimizes the norms of A and B. The third term applies the L_{2,1}-norm to B, which increases the sparsity of the disease matrix and eliminates undesired disease pairs. The last two regularization terms minimize the squared error between S_{m} (S_{d}) and AA^{T} (BB^{T}).
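The five terms described above can be evaluated numerically as in the following minimal sketch. It assumes squared Frobenius norms for the fit and similarity terms, and it exposes the weight of the L_{2,1} term as a separate, hypothetical parameter `lam_21`, because the weight used by the authors is not recoverable from the text. The function names are likewise assumptions.

```python
import numpy as np

def l21_norm(M):
    """L_{2,1}-norm: sum of the Euclidean norms of the rows of M."""
    return float(np.sqrt((M ** 2).sum(axis=1)).sum())

def rcmf_objective(Y, A, B, S_m, S_d, lam_l, lam_d, lam_t, lam_21):
    """Evaluate an RCMF-style objective with the five terms described in the text."""
    fit = np.linalg.norm(Y - A @ B.T, "fro") ** 2                 # approximation of Y
    tikhonov = lam_l * (np.linalg.norm(A, "fro") ** 2
                        + np.linalg.norm(B, "fro") ** 2)          # norms of A and B
    row_sparsity = lam_21 * l21_norm(B)                           # L_{2,1} term on B (assumed weight)
    mirna_term = lam_d * np.linalg.norm(S_m - A @ A.T, "fro") ** 2
    disease_term = lam_t * np.linalg.norm(S_d - B @ B.T, "fro") ** 2
    return fit + tikhonov + row_sparsity + mirna_term + disease_term

# Toy check: identity factors reproduce identity inputs exactly, so only the
# Tikhonov and L_{2,1} terms contribute.
I2 = np.eye(2)
value = rcmf_objective(I2, I2, I2, I2, I2,
                       lam_l=0.5, lam_d=0.1, lam_t=0.1, lam_21=0.25)
```

Unlike the squared Frobenius norm, the L_{2,1} term grows linearly in each row norm, which is what pushes entire rows of B toward zero.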
Initialization of A and B
A and B are initialized by applying SVD (Singular Value Decomposition) to the input MDA matrix Y. The initialization formula can be written as:
where S_{k} is a diagonal matrix that contains the k largest singular values.
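A minimal sketch of an SVD-based initialization follows. It assumes the square-root split commonly used in CMF-style methods, A = U S_{k}^{1/2} and B = V S_{k}^{1/2}, so that AB^{T} is the best rank-k approximation of Y; the exact formula used by the authors is not present in the text, and the function name `svd_init` is hypothetical.

```python
import numpy as np

def svd_init(Y, k):
    """Initialize A (n x k) and B (m x k) from the k largest singular triplets of Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    root = np.sqrt(s[:k])          # square roots of the k largest singular values
    A0 = U[:, :k] * root           # scale each of the k columns of U
    B0 = Vt[:k, :].T * root        # scale each of the k columns of V
    return A0, B0

# Toy check on a rank-1 matrix: a rank-1 initialization reproduces it.
Y_rank1 = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, 1.0])
A0, B0 = svd_init(Y_rank1, k=1)
```

Splitting the singular values evenly between the two factors keeps A and B on a comparable scale, which suits an objective that penalizes both norms with the same weight.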
Optimization algorithm
A and B are updated using least squares in this study. First, let F denote the objective function of the RCMF method. Then, ∂F/∂A and ∂F/∂B are each set to 0, and A and B are updated alternately by least squares until convergence. Figure 1 shows the convergence of the RCMF method on the dataset used in the experiment, where the x-axis represents the number of iterations and the y-axis represents the error. As can be seen from Fig. 1, the curve flattens after about 50 iterations, which indicates that our method converges after about 50 iterations. In addition, λ_{l}, λ_{d} and λ_{t} are determined automatically: the optimal parameter values are obtained by cross validation on the training set. The update rules of A and B can be written as:
where D is a diagonal matrix whose ith diagonal element is d_{ii} = 1/(2‖B^{i}‖_{2}), with B^{i} denoting the ith row of B. Based on these update rules, we first calculate the maximum time complexity required to perform the iterative steps, and then conclude that the final time complexity of the RCMF method is O(nmk), where n is the number of miRNAs, m is the number of diseases and k is the number of singular values in the SVD. Therefore, the algorithm of RCMF is as follows:
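The sketch below covers only the diagonal matrix D defined above, not the full update rules or the complete algorithm, which are not recoverable from this text. It computes d_{ii} = 1/(2‖B^{i}‖_{2}) row by row; the small constant `eps` that guards against division by zero for all-zero rows is an assumption, as is the function name.

```python
import numpy as np

def l21_reweighting_diagonal(B, eps=1e-12):
    """Diagonal entries d_ii = 1 / (2 * ||B^i||_2), one per row of B."""
    row_norms = np.sqrt((B ** 2).sum(axis=1))
    return 1.0 / (2.0 * np.maximum(row_norms, eps))   # eps is an assumed safeguard

B_toy = np.array([[3.0, 4.0], [0.0, 5.0]])
d = l21_reweighting_diagonal(B_toy)

# With D fixed, trace(B^T D B) equals half of the L_{2,1}-norm of B.
weighted_quadratic = float(np.sum(d[:, None] * B_toy * B_toy))
l21_value = float(np.sqrt((B_toy ** 2).sum(axis=1)).sum())
```

This identity is the usual reason for introducing D: holding D fixed turns the non-smooth L_{2,1} term into a weighted quadratic term, which is compatible with least-squares updates.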
Results
Cross validation experiments
In this study, our method is compared with the previous advanced methods CMF [10], HDMP [33], WBSMDA [47], MKRMDA [31], HAMDA [48] and ELLPMDA [36]. For each method, 5-fold cross validation is conducted 100 times. The WKNKN preprocessing step is performed before running our method. This addresses the problem of missing unknown values and also improves the accuracy of prediction to some extent.
AUC (Area Under the Curve), the area under the ROC (Receiver Operating Characteristic) curve, is a widely used indicator of the predictive performance of a method and is also used to evaluate our approach in this study. Its value cannot exceed 1. An AUC of 0.5 corresponds to random guessing, so values between 0.5 and 1 are normal and reasonable, whereas a method with an AUC below 0.5 has no practical value. Before running cross validation, miRNA-disease pairs are randomly removed from the input MDA matrix Y. This increases the difficulty of prediction and therefore gives a more comprehensive assessment of our approach [49]. This setting is called CVp (Cross Validation pairs).
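The evaluation setting can be sketched as follows. This is a minimal illustration, assuming that CVp hides a random fold of (miRNA, disease) pairs by setting them to 0 in a copy of Y, and that AUC is computed as the probability that a positive pair is scored above a negative one. The function names are hypothetical, and the pairwise AUC shown here is intended for small examples only.

```python
import numpy as np

def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

def cvp_folds(shape, n_folds=5, seed=0):
    """Randomly split all (row, column) index pairs of a matrix into n_folds test sets."""
    rng = np.random.default_rng(seed)
    flat = rng.permutation(shape[0] * shape[1])
    return [np.unravel_index(chunk, shape) for chunk in np.array_split(flat, n_folds)]

# Toy usage: hide the first fold of a made-up 3 x 4 association matrix.
Y_toy = np.array([[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]])
folds = cvp_folds(Y_toy.shape, n_folds=5, seed=0)
rows, cols = folds[0]
Y_train = Y_toy.copy()
Y_train[rows, cols] = 0            # the hidden pairs become the test set

auc_perfect = auc_score([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3])
auc_partial = auc_score([1, 1, 0, 0], [0.8, 0.2, 0.5, 0.1])
```

After training on `Y_train`, the predicted scores at `(rows, cols)` would be compared with the original entries of `Y_toy` at the same positions to obtain the AUC of that fold.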
Association prediction under CVp
Table 2 lists the experimental results under CVp. The average AUC over 100 runs of 5-fold cross validation is used as the final AUC score. AUC is known to be insensitive to skewed class distributions [50]. The gold standard miRNA-disease dataset used in this study is highly unbalanced, with far more negative samples than positive ones. Thus, AUC is a more suitable measure than other metrics for comparing the methods. Table 2 reports the AUC value of each method, with the highest AUC in bold and standard deviations given in parentheses.
As listed in Table 2, our proposed method RCMF achieves an AUC of 0.9345 on the Gold Standard Dataset, which is 1.52% higher (an absolute difference of 0.0152) than ELLPMDA with an AUC of 0.9193. The AUC value of the WBSMDA method is the lowest, and our method is 11.6% higher than it. Our method is also 6.48% higher than the traditional CMF method. Therefore, our proposed method performs better than the other existing methods compared here. Figure 2 visually shows the AUC level of each method.
Comprehensive prediction for novel MDAs
A simulation experiment is conducted in this subsection. Two cases are tested with our method: Esophageal Neoplasms and Liver Neoplasms. Esophageal Neoplasms is very common in many areas of China, especially in northern China [51]. More information about the disease is available at http://www.omim.org/entry/133239. For Esophageal Neoplasms, the 30 miRNAs associated with it are removed. Then, the simulation is conducted to obtain the final prediction score matrix. Based on the predicted scores for this disease, the miRNAs associated with it are ranked from high to low. We then count how many of the removed miRNAs are successfully predicted and how many novel associations are predicted. Twenty known associations are successfully predicted and five novel associations are predicted. Three of the five novel associations are confirmed by dbDEMC [52]. It is worth noting that hsa-mir-215 has the highest correlation with Esophageal Neoplasms. Fassan et al. reported in 2011 that hsa-mir-215 is related to Esophageal Neoplasms. They performed qRT-PCR and ISH analyses on two independent series of endoscopic biopsies (qRT-PCR) and esophagectomy specimens (ISH) [53]. In particular, hsa-mir-215 is significantly overexpressed during the pathogenesis of Esophageal Neoplasms. Kojima et al. reported in 2015 that hsa-mir-184 is related to Esophageal Neoplasms. They conducted miRNA expression analysis by microarray [54]. Compared with normal samples, hsa-mir-184 is underexpressed in diseased samples. Table 3 lists the experimental results, with the known associations in bold.
The other case is Liver Neoplasms. It is the fifth most common cancer and the third most common cause of death from cancer worldwide [55]. More information about the disease is available at http://www.omim.org/entry/114550. For Liver Neoplasms, fifteen miRNAs associated with it are removed from the dataset before running our method. Then, based on the predicted scores for this disease, the miRNAs associated with it are ranked from high to low. Twelve known associations are successfully predicted, and three novel associations are predicted. All three are confirmed by dbDEMC. Naoki et al. reported in 2012 that hsa-mir-200b, hsa-mir-15b and hsa-mir-183 are related to Liver Neoplasms. In particular, hsa-mir-200b, hsa-mir-15b, and hsa-mir-183 are significantly overexpressed during the pathogenesis of Liver Neoplasms [56]. Table 4 lists the experimental results.
According to the above simulation results, most known miRNAs are predicted, and some previously unknown miRNAs are confirmed by dbDEMC. Therefore, our method can be used to predict novel MDAs with good accuracy.
Discussion
Sensitivity analysis from WKNKN
As mentioned earlier in this study, there are some missing unknown associations in the matrix Y, so the WKNKN method is used to minimize the error. K represents the number of nearest known neighbors and p represents a decay term, where p ≤ 1. The parameters K and p are fixed before running the RCMF method. The sensitivity analyses of these two parameters are given in Figs. 3 and 4, respectively. It can be clearly seen from the figures that the AUC tends to be stable when K = 5 and p = 0.7. Furthermore, to verify more fully the sensitivity of the AUC to these two parameters, their joint sensitivity analysis is shown in Fig. 5.
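The role of K and p can be illustrated with a simplified WKNKN-style sketch. It assumes the general scheme described for this preprocessing step in [38]: each row (or column) profile is estimated from its K most similar neighbors, the rth neighbor is weighted by p^{r} times its similarity, the miRNA-side and disease-side estimates are averaged, and known entries are never lowered. Details such as how "known" neighbors are selected are simplified here, and the function name `wknkn` is hypothetical.

```python
import numpy as np

def wknkn(Y, S_m, S_d, K=5, p=0.7):
    """Fill zeros of Y with similarity-weighted profiles of the K nearest neighbors."""
    n, m = Y.shape
    Y = Y.astype(float)
    Y_m = np.zeros_like(Y)   # estimate from similar miRNAs (rows)
    Y_d = np.zeros_like(Y)   # estimate from similar diseases (columns)

    for i in range(n):
        neighbors = [j for j in np.argsort(-S_m[i], kind="stable") if j != i][:K]
        weights = np.array([p ** r * S_m[i, j] for r, j in enumerate(neighbors)])
        norm = S_m[i, neighbors].sum()
        if norm > 0:
            Y_m[i] = weights @ Y[neighbors] / norm

    for j in range(m):
        neighbors = [c for c in np.argsort(-S_d[j], kind="stable") if c != j][:K]
        weights = np.array([p ** r * S_d[j, c] for r, c in enumerate(neighbors)])
        norm = S_d[j, neighbors].sum()
        if norm > 0:
            Y_d[:, j] = Y[:, neighbors] @ weights / norm

    # Average both estimates and keep known associations at their original value.
    return np.maximum(Y, (Y_m + Y_d) / 2.0)

# Toy data (made up): miRNA 1 has no known association but is similar to miRNA 0.
Y_toy = np.array([[1, 0], [0, 0], [0, 1]])
S_m_toy = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]])
S_d_toy = np.eye(2)
Y_filled = wknkn(Y_toy, S_m_toy, S_d_toy, K=1, p=0.7)
```

A larger K draws on more neighbors, while a smaller p discounts distant neighbors more strongly, which is the trade-off the sensitivity analysis examines.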
Robust analysis of our method
The L_{2,1}-norm can increase the robustness of the algorithm, mainly through its handling of outliers in the dataset. To illustrate RCMF's ability to learn a subspace and to verify its robustness, we apply RCMF to a synthetic dataset composed of 200 two-dimensional data points. All data points are distributed in a one-dimensional subspace, i.e., a straight line (y = x). Both RCMF and CMF are applied to the synthetic dataset for comparison. Specifically, we add different numbers of noise points to the synthetic dataset to compare RCMF and CMF. Figures 6 and 7 show the data distributions with 0, 20, 40, 60 and 80 noise points. As can be seen from Fig. 6, both RCMF and CMF remain stable when there are no noise points in the dataset. It can be seen from Fig. 7 that, as the number of noise points increases, CMF cannot maintain stability and gradually shifts. In contrast, RCMF maintains the same state as for the original data points owing to the L_{2,1}-norm. Even as the number of noise points keeps increasing, RCMF remains largely unaffected by the outliers. This indicates that RCMF is robust.
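The intuition behind this robustness can be sketched without running RCMF itself. The following illustration does not reproduce the experiment in Figs. 6 and 7; it only generates a comparable synthetic dataset and compares how much of the total loss is attributable to the noise points under a squared Frobenius loss and under an L_{2,1} loss. The small jitter on the clean points, the range of the noise points and all function names are assumptions.

```python
import numpy as np

def make_line_dataset(n_points=200, n_noise=20, seed=0):
    """Points near the line y = x, followed by uniformly scattered noise points."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(-1.0, 1.0, size=n_points)
    clean = np.column_stack([t, t]) + rng.normal(scale=0.01, size=(n_points, 2))
    noise = rng.uniform(-5.0, 5.0, size=(n_noise, 2))
    return np.vstack([clean, noise]), n_points

def residuals_to_line(X):
    """Residual of each point after projection onto the direction (1, 1) / sqrt(2)."""
    u = np.array([1.0, 1.0]) / np.sqrt(2.0)
    return X - np.outer(X @ u, u)

def outlier_share(X, n_clean):
    """Fraction of the total loss caused by the noise rows under each loss."""
    row_norms = np.linalg.norm(residuals_to_line(X), axis=1)
    share_l21 = row_norms[n_clean:].sum() / row_norms.sum()
    share_fro = (row_norms[n_clean:] ** 2).sum() / (row_norms ** 2).sum()
    return share_fro, share_l21

X, n_clean = make_line_dataset(n_points=200, n_noise=20, seed=0)
share_fro, share_l21 = outlier_share(X, n_clean)
```

Because the squared loss penalizes each residual quadratically, the noise points account for a larger share of it than of the L_{2,1} loss, so a fit driven by the squared loss is pulled further toward the outliers.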
Conclusions
Abnormal expression of miRNAs has a crucial impact on the development of complex human diseases, and more and more diseases have been confirmed by biologists to be closely related to miRNAs. In this paper, a novel computational model is proposed to predict MDAs. The most valuable contribution is that the L_{2,1}-norm is added to CMF. The AUC value is used as a reliable indicator to evaluate our approach, and our method produces excellent results.
More importantly, WKNKN is used as a preprocessing method. This step plays a crucial role in predicting MDAs: the best predictions are achieved by dealing with missing unknown associations.
In the future, more novel MDAs will be predicted, more datasets will become available, and more valuable MDA information will be published in public databases. There are many other methods to predict MDAs. We hope that RCMF will be helpful for MDA prediction and for relevant miRNA research in computational biology. In future work, we will continue to study more effective methods to predict novel MDAs.
Availability of data and materials
The datasets that support the findings of this study are available at https://github.com/cuizhensdws/L21GRMF.
Abbreviations
AUC: Area under the curve
CMF: Collaborative matrix factorization method
CV: Cross-validation
CVp: Cross validation pairs
MDAs: MiRNA-disease associations
RCMF: Robust collaborative matrix factorization method
SVD: Singular value decomposition
References
 1.
Ambros V. microRNAs: tiny regulators with great potential. Cell. 2001;107(7):823–6.
 2.
Ambros V. The functions of animal microRNAs. Nature. 2004;431(7006):350.
 3.
Ibrahim R, Yousri NA, Ismail MA, ElMakky NM. miRNA and gene expression based cancer classification using selflearning and cotraining approaches. arXiv preprint arXiv:14014589; 2014.
 4.
Katayama Y, Maeda M, Miyaguchi K, Nemoto S, Yasen M, Tanaka S, Mizushima H, Fukuoka Y, Arii S, Tanaka H. Identification of pathogenesisrelated microRNAs in hepatocellular carcinoma by expression profiling. Oncol Lett. 2012;4(4):817–23.
 5.
Meister G, Tuschl T. Mechanisms of gene silencing by doublestranded RNA. Nature. 2004;431(7006):343.
 6.
Deng SP, Zhu L, Huang DS. Mining the bladder cancerassociated genes by an integrated strategy for the construction and analysis of differential coexpression networks. BMC genomics. 2015;16(Suppl 3):S4.
 7.
Hu Y, Liu JX, Gao YL, Li SJ, Wang J. Differentially expressed genes extracted by the tensor robust principal component analysis (TRPCA) method. Complexity. 2019;2019:113.
 8.
Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin4 encodes small RNAs with antisense complementarity to lin14. Cell. 1993;75(5):843–54.
 9.
Reinhart BJ, Slack FJ, Basson M, Pasquinelli AE, Bettinger JC, Rougvie AE, Horvitz HR, Ruvkun G. The 21nucleotide let7 RNA regulates developmental timing in Caenorhabditis elegans. Nature. 2000;403(6772):901.
 10.
Shen Z, Zhang YH, Han K, Nandi AK, Honig B, Huang DS. miRNAdisease association prediction with collaborative matrix factorization. Complexity. 2017;2017:19.
 11.
Lu M, Zhang Q, Deng M, Miao J, Guo Y, Gao W, Cui Q. An analysis of human microRNA and disease associations. PLoS One. 2008;3(10):e3420.
 12.
Chen X, Yan GY. Semisupervised learning for potential human microRNAdisease associations inference. Sci Rep. 2014;4:5501.
 13.
Bandyopadhyay S, Mitra R, Maulik U, Zhang MQ. Development of the human cancer microRNA network. Silence. 2010;1(1):6.
 14.
Kozomara A, GriffithsJones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 2013;42(D1):D68–73.
 15.
Yi HC, You ZH, Huang DS, Li X, Jiang TH, Li LP. A deep learning framework for robust and accurate prediction of ncRNAprotein interactions using evolutionary information. Mol Ther Nucleic Acids. 2018;11:337–44.
 16.
Peng C, Zou L, Huang DS. Discovery of relationships between long noncoding RNAs and genes in human diseases based on tensor completion. IEEE Access. 2018;6:59152–62.
 17.
Bao W, Jiang Z, Huang DS. Novel human microbedisease association prediction using network consistency projection. BMC Bioinformatics. 2017;18(16):543.
 18.
Huang DS, Zheng CH. Independent component analysisbased penalized discriminant method for tumor classification using gene expression data. Bioinformatics. 2006;22(15):1855–62.
 19.
Luo H, Zhang H, Zhang Z, Zhang X, Ning B, Guo J, Nie N, Liu B, Wu X. Downregulated miR9 and miR433 in human gastric carcinoma. J Exp Clin Cancer Res. 2009;28(1):82.
 20.
Patel V, Williams D, Hajarnis S, Hunter R, Pontoglio M, Somlo S, Igarashi P. miR17∼ 92 miRNA cluster promotes kidney cyst growth in polycystic kidney disease. Proc Natl Acad Sci. 2013;110(26):10765–70.
 21.
Cui Z, Liu JX, Gao YL, Zhu R, Yuan SS. LncRNAdisease associations prediction using bipartite local model with nearest profilebased association inferring. IEEE journal of biomedical and health informatics; 2019.
 22.
Zheng CH, Huang DS, Zhang L, Kong XZ. Tumor clustering using nonnegative matrix factorization with gene selection. IEEE Trans Inf Technol Biomed. 2009;13(4):599–607.
 23.
Sethupathy P, Collins FS. MicroRNA target site polymorphisms and human disease. Trends Genet. 2008;24(10):489–97.
 24.
Li J, Liu Y, Xin X, Kim TS, Cabeza EA, Ren J, Nielsen R, Wrana JL, Zhang Z. Evidence for positive selection on a number of microRNA regulatory interactions during recent human evolution. PLoS Genet. 2012;8(3):e1002578.
 25.
Yu N, Gao YL, Liu JX, Shang J, Zhu R, Dai LY. Codifferential gene selection and clustering based on graph regularized multiview NMF in cancer genomic data. Genes. 2018;9(12):586.
 26.
Wang D, Wang J, Lu M, Song F, Cui Q. Inferring the human microRNA functional similarity and functional network based on microRNAassociated diseases. Bioinformatics. 2010;26(13):1644–50.
 27.
Gao MM, Cui Z, Gao YL, Li F, Liu JX. Dual Sparse Collaborative Matrix Factorization Method Based on Gaussian Kernel Function for Predicting LncRNADisease Associations. In: International Conference on Intelligent Computing. Cham, Springer; 2019. p. 318–26.
 28.
Jiang Q, Hao Y, Wang G, Juan L, Zhang T, Teng M, Liu Y, Wang Y. Prioritization of disease microRNAs through a human phenomemicroRNAome network. BMC Syst Biol. 2010;4(1):S2.
 29.
Chen X, Yan CC, Zhang X, You ZH, Huang YA, Yan GY. HGIMDA: heterogeneous graph inference for miRNAdisease association prediction. Oncotarget. 2016;7(40):65257.
 30.
Shi H, Xu J, Zhang G, Xu L, Li C, Wang L, Zhao Z, Jiang W, Guo Z, Li X. Walking the interactome to identify human miRNAdisease associations through the functional link between miRNA targets and disease genes. BMC Syst Biol. 2013;7(1):101.
 31.
Chen X, Niu YW, Wang GH, Yan GY. MKRMDA: multiple kernel learningbased Kronecker regularized least squares for MiRNA–disease association prediction. J Transl Med. 2017;15(1):251.
 32.
Chen X, Liu MX, Yan GY. RWRMDA: predicting novel human microRNAdisease associations. Mol BioSyst. 2012;8(10):2792–8.
 33.
Xuan P, Han K, Guo M, Guo Y, Li J, Ding J, Liu Y, Dai Q, Li J, Teng Z. Prediction of microRNAs associated with human diseases based on weighted k most similar neighbors. PLoS One. 2013;8(8):e70204.
 34.
Chen X, Wu QF, Yan GY. RKNNMDA: rankingbased KNN for MiRNAdisease association prediction. RNA Biol. 2017;14(7):952–62.
 35.
Gao MM, Cui Z, Gao YL, Liu JX, Zheng CH. Dualnetwork sparse graph regularized matrix factorization for predicting miRNA–disease associations. Molecular omics. 2019;15(2):130–7.
 36.
Chen X, Zhou Z, Zhao Y. ELLPMDA: ensemble learning and link prediction for miRNAdisease association prediction. RNA Biol. 2018;15(6):807–18.
 37.
Gao YL, Cui Z, Liu JX, Wang J, Zheng CH. NPCMF: nearest profilebased collaborative matrix factorization method for predicting miRNAdisease associations. BMC Bioinformatics. 2019;20(1):353.
 38.
Ezzat A, Zhao P, Wu M, Li XL, Kwoh CK. Drugtarget interaction prediction with graph regularized matrix factorization. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB). 2017;14(3):646–56.
 39.
Yin MM, Cui Z, Gao MM, Liu JX, Gao YL. LWPCMF: logistic weighted profilebased collaborative matrix factorization for predicting MiRNAdisease associations. IEEE/ACM transactions on computational biology and bioinformatics; 2019.
 40.
Cui Z, Gao YL, Liu JX, Wang J, Shang J, Dai LY. The computational prediction of drugdisease interactions using the dualnetwork L 2, 1CMF method. BMC Bioinformatics. 2019;20(1):5.
 41.
Liu JX, Wang DQ, Zheng CH, Gao YL, Wu SS, Shang JL. Identifying drugpathway association pairs based on L 2, 1integrative penalized matrix decomposition. BMC Syst Biol. 2017;11(6):119.
 42.
Li Y, Qiu C, Tu J, Geng B, Yang J, Jiang T, Cui Q. HMDD v2. 0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 2013;42(D1):D1070–4.
 43.
Liu Y, Zeng X, He Z, Zou Q. Inferring microRNAdisease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Trans Comput Biol Bioinform. 2017;14(4):905–15.
 44.
Yuan L, Zhu L, Guo WL, Zhou X, Zhang Y, Huang Z, Huang DS. Nonconvex penalty based lowrank representation and sparse regression for eQTL mapping. IEEE/ACM Trans Comput Biol Bioinform (TCBB). 2017;14(5):1154–64.
 45.
Cui Z, Gao YL, Liu JX, Dai LY, Yuan SS. L_{2,1}-GRMF: an improved graph regularized matrix factorization method to predict drug-target interactions. BMC Bioinformatics. 2019;20(8):287.
 46.
Yu N, Gao YL, Liu JX, Wang J, Shang J. Hypergraph regularized NMF by L_{2,1}-norm for clustering and com-abnormal expression genes selection. In: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Madrid: IEEE; 2018. p. 578–82.
 47.
Chen X, Yan CC, Zhang X, You ZH, Deng L, Liu Y, Zhang Y, Dai Q. WBSMDA: within and between score for MiRNA-disease association prediction. Sci Rep. 2016;6:21106.
 48.
Chen X, Niu YW, Wang GH, Yan GY. HAMDA: hybrid approach for MiRNA-disease association prediction. J Biomed Inform. 2017;76:50–8.
 49.
Ezzat A, Wu M, Li XL, Kwoh CK. Computational prediction of drug-target interactions using chemogenomic approaches: an empirical survey. Brief Bioinform. 2018;8:1337–57.
 50.
Ezzat A, Wu M, Li XL, Kwoh CK. Drug-target interaction prediction via class imbalance-aware ensemble learning. BMC Bioinformatics. 2016;17(19):509.
 51.
Wang LD, Zhou FY, Li XM, Sun LD, Song X, Jin Y, Li JM, Kong GQ, Qi H, Cui J. Genome-wide association study of esophageal squamous cell carcinoma in Chinese subjects identifies a susceptibility locus at PLCE1. Nat Genet. 2010;42(9):759.
 52.
Yang Z, Ren F, Liu C, He S, Sun G, Gao Q, Yao L, Zhang Y, Miao R, Cao Y. dbDEMC: a database of differentially expressed miRNAs in human cancers. BMC Genomics. 2010;11(Suppl 4):S5.
 53.
Fassan M, Volinia S, Palatini J, Pizzi M, Baffa R, De Bernard M, Battaglia G, Parente P, Croce CM, Zaninotto G. MicroRNA expression profiling in human Barrett's carcinogenesis. Int J Cancer. 2011;129(7):1661–70.
 54.
Kojima M, Sudo H, Kawauchi J, Takizawa S, Kondou S, Nobumasa H, Ochiai A. MicroRNA markers for the diagnosis of pancreatic and biliary-tract cancers. PLoS One. 2015;10(2):e0118220.
 55.
Taniguchi K, Roberts LR, Aderca IN, Dong X, Qian C, Murphy LM, Nagorney DM, Burgart LJ, Roche PC, Smith DI. Mutational spectrum of β-catenin, AXIN1, and AXIN2 in hepatocellular carcinomas and hepatoblastomas. Oncogene. 2002;21(31):4863.
 56.
Oishi N, Kumar MR, Roessler S, Ji J, Forgues M, Budhu A, Zhao X, Andersen JB, Ye QH, Jia HL. Transcriptomic profiling reveals hepatic stem-like gene signatures and interplay of miR-200c and epithelial-mesenchymal transition in intrahepatic cholangiocarcinoma. Hepatology. 2012;56(5):1792–803.
Acknowledgements
Not applicable.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 20 Supplement 25, 2019: Proceedings of the 2018 International Conference on Intelligent Computing (ICIC 2018) and Intelligent Computing and Biomedical Informatics (ICBI) 2018 conference: bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume20supplement25.
Funding
Publication costs are funded by the NSFC under grant Nos. 61872220 and 61572284, and by the Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, under grant No. ADXXBZ201704.
Author information
Contributions
ZC and JXL jointly contributed to the design of the study. YLG designed and implemented the RCMF method, performed the experiments, and drafted the manuscript. CHZ participated in the design of the study and performed the statistical analysis. JW contributed to the data analysis. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Cui, Z., Liu, JX., Gao, YL. et al. RCMF: a robust collaborative matrix factorization method to predict miRNA-disease associations. BMC Bioinformatics 20, 686 (2019). https://doi.org/10.1186/s1285901932600
DOI: https://doi.org/10.1186/s1285901932600
Keywords
 MiRNA-disease association prediction
 L_{2,1}-norm
 Collaborative regularization
 Matrix factorization