Prediction of lncRNA-disease association based on a Laplace normalized random walk with restart algorithm on heterogeneous networks
BMC Bioinformatics volume 23, Article number: 5 (2022)
Abstract
Background
A growing body of evidence shows that long noncoding RNAs (lncRNAs) play important roles in the development and progression of complex human diseases. Predicting human lncRNA-disease associations is therefore a challenging and urgent task in bioinformatics for the study of complex human diseases.
Results
In this work, a global network-based computational framework called LRWRHLDA is proposed as a universal network-based method. First, four isomorphic networks were constructed: the lncRNA similarity network, the disease similarity network, the gene similarity network and the miRNA similarity network. Then, six heterogeneous association networks (known lncRNA-disease, lncRNA-gene, lncRNA-miRNA, disease-gene, disease-miRNA and gene-miRNA associations) were used to build a multi-layer network. Finally, the Laplace normalized random walk with restart algorithm on this global network is used to predict the relationship between lncRNAs and diseases.
Conclusions
Tenfold cross validation is used to evaluate the performance of LRWRHLDA. LRWRHLDA achieves an AUC of 0.98402, which is higher than that of the other compared methods. Furthermore, LRWRHLDA can predict lncRNAs related to isolated diseases (and diseases related to isolated lncRNAs). The results for colorectal cancer, lung adenocarcinoma, stomach cancer and breast cancer have been verified by other studies. The case studies indicate that our method is effective.
Background
A disease is an abnormal life process that arises when homeostasis is disrupted after the body is damaged by a pathogenic cause under certain conditions. Many studies have confirmed that there are complex cross-regulation relationships among diseases, genes, lncRNAs, and miRNAs [1,2,3,4].
Many studies have shown that, although protein-coding sequences make up less than 2% of the human genome, most nucleotides are detectably transcribed under certain conditions [5]. Among the various types of non-protein-coding transcripts, long noncoding RNAs (lncRNAs) and microRNAs (miRNAs) have attracted increasing attention. lncRNAs are defined as noncoding RNAs longer than 200 nucleotides [6]; miRNAs are RNA molecules about 19–25 nucleotides long that exist widely in eukaryotes [7].
lncRNAs play an important role in a variety of biological mechanisms, such as epigenetic regulation, chromatin remodeling, gene transcription, protein transport and cell transportation [8]. The functions of lncRNAs can be divided into the following categories: transcription interference; induction of chromatin remodeling and nucleosome modification; regulation of alternative splicing; generation of endogenous siRNAs; regulation of protein activity; structural or tissue function; alteration of protein localization; and acting as precursors of small RNAs [5, 9, 10].
Many researchers have found that abnormal expression or function of lncRNAs is closely related to the occurrence of human diseases, including cancers and degenerative neurological diseases, which seriously endanger human health. For example, overexpression of the lncRNA HOTAIR increases breast cancer cell proliferation [11, 12]. The lncRNA AFAP1-AS1 is abnormally expressed in cholangiocarcinoma, gallbladder cancer, hepatocellular carcinoma, gastric cancer, colorectal cancer and esophageal cancer [13]. The lncRNA HOXA-AS2 may be a biomarker for the treatment of gastric cancer [14]. There is a close correlation between the lncRNA PCGEM1 and osteoarthritis [15]. Therefore, lncRNAs can serve as important biomarkers for the diagnosis of diseases.
lncRNA-disease associations can be identified by biological experiments or by computational prediction. For example, based on biological experiments, Faghihi et al. [16] found that the expression of BACE1-AS can promote the rapid feed-forward regulation of β-secretase in Alzheimer's disease. Applying RT-PCR and Northern blot analysis, Hu et al. [17] confirmed that H19 may become a new target for colon cancer antitumor therapy. The results of biological experiments are reliable; however, the experiments are time-consuming and costly.
Recently, computational models, which can integrate various data resources, have attracted increasing attention for identifying lncRNA-disease associations. For instance, based on a semi-supervised learning framework, the Laplacian regularized least squares for lncRNA-disease association model (LRLSLDA) was proposed to predict potential disease-related lncRNAs [18]. Integrating genome, regulome and transcriptome data, a naive Bayesian classifier was proposed to identify cancer-related lncRNAs [19]. Similarly, based on disease-gene cluster association scores, a machine learning method was proposed to predict potential lncRNA-disease associations [20]. Combining incremental principal component analysis (IPCA) and the random forest (RF) algorithm, a machine learning model called IPCARF was applied to predict lncRNA-disease associations [21].
Matrix factorization has also been widely used to find lncRNA-disease associations. For instance, a dual-network integrated logistic matrix factorization model with Bayesian optimization (DNILMF-LDA) has been used to predict lncRNA-disease associations [22]. In addition, weighted graph regularized collaborative matrix factorization (WGRCMF), dual sparse collaborative matrix factorization (DSCMF) and multi-label fusion collaborative matrix factorization (MLFCMF) were applied to build models for predicting lncRNA-disease associations [23,24,25].
Based on the hypothesis that lncRNAs with similar functions may be related to diseases with similar phenotypes, some researchers have proposed several computational methods based on biological networks to predict disease-related lncRNAs.
In addition, by integrating the lncRNA similarity network, the disease similarity network and the lncRNA-disease association network, the BPLLDA model, which is based on paths of fixed lengths in a heterogeneous lncRNA-disease association network, was proposed to predict lncRNA-disease associations [26]. Furthermore, several random walk models on such heterogeneous networks were proposed to predict the relationship between lncRNAs and diseases [27,28,29]. For example, Sun et al. [27] proposed a random walk with restart method on a lncRNA functional similarity network (RWRlncD). Gu et al. [28] proposed a global network-based random walk with restart algorithm on lncRNA seed nodes and disease seed nodes to predict the relationship between lncRNAs and diseases (GrwLDA). Using a heterogeneous network built from the lncRNA, disease and gene similarity networks, the MHRWR model applies the random walk with restart algorithm on the global network [29].
Building on the random walk with restart model, this paper proposes a new computational model, based on a Laplacian normalized random walk with restart algorithm on a heterogeneous network, to predict associations between lncRNAs and diseases. First, the disease semantic similarity, lncRNA functional similarity, gene functional similarity and miRNA functional similarity are calculated. Then, based on the known associations, the Gaussian interaction profile kernel similarities of lncRNAs and diseases, and of miRNAs and genes, are calculated. For each of the four node types (lncRNAs, diseases, miRNAs and genes), the functional or semantic similarity is integrated with the corresponding Gaussian interaction profile kernel similarity to construct an isomorphic network. The Laplace normalized random walk with restart algorithm on the resulting heterogeneous network is then used to predict potential lncRNA-disease associations. Our method obtains an AUC of 0.98402 in tenfold cross validation, and its performance is superior to that of other similar methods. Moreover, case studies on colorectal cancer, lung adenocarcinoma, stomach cancer and breast cancer also demonstrate the reliability of our model.
Methods
Experimental data sources
In this paper, the lncRNA-disease associations are drawn mainly from the LncRNADisease database [30, 31], the EVLncRNAs database [32], the Lnc2Cancer database [33] and the MNDR v3.1 database [34], among others. Similarly, the lncRNA-miRNA associations come from the integrated data of the DIANA-LncBase database [35], the LncAcTdb 2.0 database [36], the MiRcode database [37] and the StarBase database [38]. The lncRNA-gene associations come from the integrated data of the LncRNADisease database [30, 31], the LncAcTdb 2.0 database [36] and the LncRNA2Target v2.0 database [39]. The miRNA-disease associations come from the integrated data of the MNDR v3.1 database [34], the HMDD database [40] and the MiR2Disease database [41]. The miRNA-gene associations come from the MiRTarBase database [42]. The gene-disease associations come from the integrated data of the DisGeNET database [43], the CREEDS database [44] and the DISEASES database [45].
Because different databases may use different names for the same biomolecule, we performed error correction and data cleaning on the datasets obtained from the databases (mainly deleting duplicate, erroneous and missing entries). In addition, the names of biomolecules of the same type from different databases were unified. To make the data more comprehensive and to further improve the accuracy and scope of the prediction, the union of the related data from the above databases was used.
For lncRNAs, the intersection of the lncRNA sets in the three association datasets obtained from all databases (lncRNA-disease, lncRNA-gene and lncRNA-miRNA) was used to construct the lncRNA similarity network, which leaves 814 lncRNAs in this work (Fig. 1). Finally, 2476 miRNAs, 7986 genes and 217 diseases were retained for study. We also summarize some basic characteristics of each X-Y association dataset (e.g., the average degree) in Table 1, where X and Y each stand for one of lncRNA, disease, gene and miRNA.
Calculate the similarity matrix
LncRNA functional similarity matrix
Similar to the method of Sun et al. [27], the functional similarity of two lncRNAs is computed as follows.
Suppose lncRNA l_{1} is associated with the disease group D_{1} (\(D_{1} = \{ d_{1i} \mid 1 \le i \le a\}\)), and lncRNA l_{2} is associated with the disease group D_{2} (\(D_{2} = \{ d_{2j} \mid 1 \le j \le b\}\)). The similarity between disease d_{11} and the disease group D_{2} is defined as follows:
where \(Sim(d_{11} ,d_{2} )\) is the disease semantic similarity of diseases d_{11} and d_{2}. Then, the functional similarity between lncRNA l_{1} and l_{2} is defined as:
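The two definitions above follow the best-match-average scheme of Sun et al. [27]. Assuming the usual form of that scheme applies here, they can be written as \(S(d, D) = \max_{d' \in D} Sim(d, d')\) and \(LS(l_{1}, l_{2}) = \frac{\sum_{i=1}^{a} S(d_{1i}, D_{2}) + \sum_{j=1}^{b} S(d_{2j}, D_{1})}{a + b}\). The Python sketch below illustrates the computation under that assumption; the function names and the toy similarity values are hypothetical and are not taken from the LRWRHLDA code.

```python
def disease_to_group(d, group, sim):
    """S(d, D): best semantic match of disease d within disease group D."""
    return max((sim(d, other) for other in group), default=0.0)


def functional_similarity(group1, group2, sim):
    """Best-match average over the disease groups of two lncRNAs."""
    if not group1 or not group2:
        return 0.0
    total = sum(disease_to_group(d, group2, sim) for d in group1)
    total += sum(disease_to_group(d, group1, sim) for d in group2)
    return total / (len(group1) + len(group2))


# Toy semantic similarities (made-up numbers, for illustration only).
toy = {
    frozenset(["dA", "dB"]): 0.5,
    frozenset(["dA", "dC"]): 0.2,
    frozenset(["dB", "dC"]): 0.4,
}


def toy_sim(x, y):
    """Symmetric lookup; a disease is fully similar to itself."""
    return 1.0 if x == y else toy[frozenset([x, y])]


# lncRNA 1 is linked to {dA, dB}; lncRNA 2 is linked to {dB, dC}.
ls = functional_similarity(["dA", "dB"], ["dB", "dC"], toy_sim)
```

In this toy case the four best matches are 0.5, 1.0, 1.0 and 0.4, so the averaged similarity is 2.9 / 4. The same routine would apply to miRNAs, since the miRNA functional similarity is described with the same construction.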
Disease semantic similarity matrix
The Disease Ontology (DO) provides an open-source ontology for the integration of biomedical data associated with human disease [46]. The terms in DO are diseases or disease-related concepts that are organized in a directed acyclic graph (DAG). Applying the method of Wang et al. [47, 48], the semantic similarity of diseases is calculated as follows.
Given a disease d, its DAG can be expressed as DAG(d) = (Ans(d), E(d)), where Ans(d) represents the node set, including d itself and its ancestor nodes, and E(d) represents the set of edges that directly link parent nodes to child nodes. That is, E(d) denotes the relationships between different diseases. Based on the DAG, the contribution of disease term d to the semantic value of disease T, and the semantic value of disease T itself, can be computed in the following two steps:
where \(\Delta\) is the semantic contribution attenuation factor, whose value ranges from 0 to 1. As the distance between disease d and its ancestor diseases increases, the contribution of these ancestor diseases to the semantic value of disease d gradually decreases. The semantic similarity between disease d_{1} and disease d_{2} is calculated by Eq. (5):
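In the DAG-based method of Wang et al. that this subsection cites, the contribution of a term t to a disease T is usually defined as 1 when t = T and otherwise as the maximum of \(\Delta\) times the contribution of any child of t; the semantic value of T is the sum of all contributions, and the similarity of two diseases is the summed contribution of their shared ancestors divided by the sum of their semantic values. The sketch below assumes that standard form. The `parents` mapping, the term names and the value \(\Delta = 0.5\) are illustrative assumptions, not values confirmed by the text.

```python
def semantic_contributions(term, parents, delta=0.5):
    """Contribution of `term` and each of its ancestors to `term` itself."""
    contrib = {term: 1.0}
    stack = [term]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            value = delta * contrib[node]
            # Keep the largest contribution over all paths to this ancestor.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                stack.append(parent)
    return contrib


def semantic_similarity(t1, t2, parents, delta=0.5):
    """Shared-ancestor contributions divided by the two semantic values."""
    c1 = semantic_contributions(t1, parents, delta)
    c2 = semantic_contributions(t2, parents, delta)
    shared = set(c1) & set(c2)
    numerator = sum(c1[t] + c2[t] for t in shared)
    return numerator / (sum(c1.values()) + sum(c2.values()))


# Hypothetical three-level ontology fragment: child -> list of parents.
toy_parents = {
    "cancer": ["disease"],
    "lung cancer": ["cancer"],
    "breast cancer": ["cancer"],
}
sim = semantic_similarity("lung cancer", "breast cancer", toy_parents)
```

Here each cancer has semantic value 1 + 0.5 + 0.25 = 1.75, the shared ancestors contribute 1.5 in total, and the similarity is 1.5 / 3.5 = 3/7.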
MiRNA functional similarity matrix
Similar to the method of Wang et al. [47], the functional similarity of two miRNAs is defined as follows.
Assume that miRNA m_{1} is associated with the disease group D_{3} (\(D_{3} = \{ d_{3k} \mid 1 \le k \le c\}\)) and miRNA m_{2} is associated with the disease group D_{4} (\(D_{4} = \{ d_{4z} \mid 1 \le z \le e\}\)). The similarity of a disease d_{31} and the disease group D_{4} is defined as follows:
and the functional similarity between miRNAs m_{1} and m_{2} is computed by Eq. (7):
Gene function similarity matrix
The Gene Ontology (GO) database is the world's largest informatics resource on the functions of genes [49]. For a GO node A, DAG = (Ans(A), E(A)) is its directed acyclic graph, where Ans(A) represents the set of all ancestors of node A (including node A itself) and E(A) represents the set of edges connecting the nodes in the DAG. For any GO node t that is an ancestor of A, or t = A, the contribution \(S_{A} (t)\) of t to A is defined by Eq. (8):
where \(\Delta\) is the semantic contribution attenuation factor, whose value ranges from 0 to 1. As the distance between node A and its ancestor nodes increases, the contribution of these ancestors to the semantic value of A gradually decreases. The semantic contribution \(S_{V} (A)\) of node A is defined as follows:
Then the semantic similarity of nodes A and B is calculated by Eq. (10):
The similarity of a GO node g and a GO node set \(G = \{ go_{1}, go_{2}, \ldots, go_{f} \}\) is defined as:
Assuming that the GO term sets annotating genes G_{1} and G_{2} are \(GO_{1} = \{ go_{11}, go_{12}, \ldots, go_{1m} \}\) and \(GO_{2} = \{ go_{21}, go_{22}, \ldots, go_{2n} \}\), respectively, the similarity of the two genes G_{1} and G_{2} is calculated by Eq. (12) [50]:
Gaussian interaction profile kernel similarity for lncRNAs and diseases
Because the matrices LS, DS, MS and GS contain many zeros, they are sparse, which may make the prediction results inaccurate. To avoid this, we introduce the Gaussian interaction profile kernel similarity [51, 52].
First, the m × n matrix LD represents the association matrix of lncRNAs and diseases, whose elements are only 0 and 1: if lncRNA l_{i} is related to disease d_{j}, then LD(i, j) = 1; otherwise LD(i, j) = 0.
In the same way, we define the lncRNA-miRNA association matrix LM, the lncRNA-gene association matrix LG, the disease-gene association matrix DG, the miRNA-gene association matrix MG and the miRNA-disease association matrix MD.
The Gaussian interaction profile kernel similarity of lncRNAs l_{i} and l_{j} is defined as follows:
where IP(l_{i}) is a binary vector that represents the ith row of the lncRNA-disease association matrix LD, and m is the number of lncRNAs. \(r^{\prime}_{l}\) is a regulation parameter for the kernel bandwidth \(r_{l}\); following previous research, it is set to 1.
Similarly, the Gaussian interaction profile kernel similarity of diseases d_{i} and d_{j} is defined as:
where IP(d_{i}) is a binary vector that represents the ith column of the lncRNA-disease association matrix LD, and n is the number of diseases. \(r^{\prime}_{d} = 1\) is a regulation parameter for the kernel bandwidth \(r_{d}\).
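In the cited formulation [51, 52], the Gaussian interaction profile kernel is usually written as \(\exp(-r_{l} \lVert IP(l_{i}) - IP(l_{j}) \rVert^{2})\) with bandwidth \(r_{l} = r^{\prime}_{l} / \left(\frac{1}{m}\sum_{i=1}^{m} \lVert IP(l_{i}) \rVert^{2}\right)\), and analogously for diseases using the columns of LD. The sketch below assumes that form; the function name `gip_kernel` and the small matrix are hypothetical.

```python
import math


def gip_kernel(profiles, r_prime=1.0):
    """Gaussian interaction profile kernel between rows of a 0/1 matrix."""
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    bandwidth = r_prime / mean_sq_norm  # r = r' / mean squared profile norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-bandwidth * sq_dist)
    return kernel


# Toy lncRNA-disease association matrix: 3 lncRNAs (rows) x 3 diseases (columns).
LD = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
lnc_kernel = gip_kernel(LD)                                  # rows are lncRNA profiles
disease_kernel = gip_kernel([list(col) for col in zip(*LD)])  # columns are disease profiles
```

The disease kernel reuses the same function on the transposed matrix, which mirrors the row and column roles described above.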
Gaussian interaction profile kernel similarity for miRNAs and genes
The Gaussian interaction profile kernel similarities of miRNAs and genes are calculated in the same way as those of lncRNAs and diseases, except that the association matrix MG is used. Accordingly, IP(m_{i}) is a binary vector that represents the ith row of the matrix MG, and h is the number of miRNAs; \(r^{\prime}_{m}\) = 1 is a regulation parameter for the kernel bandwidth \(r_{m}\). IP(g_{i}) is a binary vector that represents the ith column of the matrix MG, and k is the number of genes; \(r^{\prime}_{g}\) = 1 is a regulation parameter for the kernel bandwidth \(r_{g}\).
Integration of similarities between lncRNAs, miRNAs, genes, and diseases
We integrate the lncRNA functional similarity (disease semantic similarity, miRNA functional similarity, gene functional similarity) with the Gaussian interaction profile kernel similarity for lncRNAs (diseases, miRNAs, genes) as follows:
where NL is the set of lncRNAs with no functional similarity to any other lncRNA, ND is the set of diseases with no semantic similarity to any other disease, NM is the set of miRNAs with no functional similarity to any other miRNA, and NG is the set of genes with no functional similarity to any other gene. By definition, LL, DD, MM and GG are symmetric.
The heterogeneous network
Based on the integrated lncRNA similarity matrix LL, disease similarity matrix DD, miRNA similarity matrix MM and gene similarity matrix GG, four isomorphic networks were constructed: the lncRNA similarity network, the disease similarity network, the gene similarity network and the miRNA similarity network (Fig. 2). In addition, a heterogeneous network connecting these four similarity networks was built from the six association matrices LD, LM, LG, MD, MG and DG (Fig. 3).
The random walk with restart
On this heterogeneous network, the random walk with restart (RWR) used to predict lncRNA-disease associations is defined as follows [53]:
where P^{0} is the initial probability vector and P^{t} is the probability vector whose ith element is the probability of finding the random walker at node i at step t. λ is the restart probability, whose value ranges from 0 to 1. W is the probability transition matrix, and W_{ij} denotes the transition probability from node i to node j. When the L_{1} norm of the difference between P^{t+1} and P^{t} is less than 10^{-6}, the walk is considered to have reached a stable state, which yields the stable probability \(P^{\infty }\).
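The standard random walk with restart update is \(P^{t+1} = (1 - \lambda) W^{T} P^{t} + \lambda P^{0}\), where the transpose reflects the convention that W_{ij} is the probability of moving from node i to node j. The following sketch iterates that update until the L_{1} change falls below the tolerance quoted above. It assumes this standard form; the two-node matrix is a toy example and `max_iter` is a safety bound added for the sketch.

```python
def random_walk_with_restart(W, p0, restart=0.7, tol=1e-6, max_iter=10000):
    """Iterate P <- (1 - restart) * W^T P + restart * P0 until the L1 change < tol."""
    n = len(p0)
    p = list(p0)
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(W[i][j] * p[i] for i in range(n)) + restart * p0[j]
            for j in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if change < tol:
            break
    return p


# Toy network: two nodes that always step to each other; the walk restarts at node 0.
W = [
    [0.0, 1.0],
    [1.0, 0.0],
]
p_inf = random_walk_with_restart(W, [1.0, 0.0], restart=0.5)
```

For this toy case the fixed point can be solved by hand: p_0 = 0.5 p_1 + 0.5 and p_1 = 0.5 p_0, giving (2/3, 1/3). The toy matrix is row-stochastic, so the probabilities keep summing to 1; a Laplacian-normalized W does not guarantee this in general, and the scores are then best read as a ranking.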
The probability transition matrix W is constructed in this paper as follows:
The matrix W consists of four intra-transition matrices and twelve inter-transition matrices. W_{LL} is the intra-transition matrix of the lncRNA similarity network. W_{DD}, W_{MM} and W_{GG} are analogous to W_{LL} and represent the intra-transition matrices of the disease, miRNA and gene similarity networks, respectively. W_{LM} is defined as the transition matrix from the lncRNA network to the miRNA network. W_{LG}, W_{LD}, W_{ML}, W_{MG}, W_{MD}, W_{GL}, W_{GM}, W_{GD}, W_{DL}, W_{DM} and W_{DG} are defined analogously to W_{LM}.
Laplacian normalization
Given a matrix A = A(i, j), the diagonal matrix D is defined such that D(i, j) equals the sum of the ith row of A if i = j, and D(i, j) = 0 otherwise. The Laplace normalization of matrix A is then defined as [54, 55]:
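A common form of this normalization for a square similarity matrix is \(D^{-1/2} A D^{-1/2}\), that is, each entry A(i, j) divided by \(\sqrt{D(i,i)\,D(j,j)}\). The sketch below assumes that symmetric form for a square matrix; how the paper treats the rectangular inter-network matrices is not covered here, and the function name is hypothetical.

```python
import math


def laplacian_normalize(A):
    """Return D^{-1/2} A D^{-1/2} for a square matrix A, with D the row-sum diagonal."""
    n = len(A)
    degrees = [sum(row) for row in A]
    normalized = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Leave entries at zero when a node has no weight at all.
            if degrees[i] > 0 and degrees[j] > 0:
                normalized[i][j] = A[i][j] / math.sqrt(degrees[i] * degrees[j])
    return normalized


# Toy symmetric similarity matrix with row sums 2 and 4.
A = [
    [1.0, 1.0],
    [1.0, 3.0],
]
L = laplacian_normalize(A)
```

Unlike row normalization, this keeps a symmetric input symmetric, which is one reason it is often preferred for similarity networks.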
Therefore, W_{LM} and W_{LL} can be obtained by the following two steps:
The probability of transition from l_{i} to m_{j} is as follows:
The probability of transition from l_{i} to l_{j} is as follows:
where P_{LM} (P_{LG}, P_{LD}) is the parameter that represents the transition probability from the lncRNA similarity network to the miRNA (gene, disease) similarity network, and its value ranges from 0 to 1. Moreover, P_{LM} = P_{ML}, P_{LG} = P_{GL}, P_{LD} = P_{DL}, P_{MG} = P_{GM}, P_{MD} = P_{DM} and P_{GD} = P_{DG}. The other intra-transition and inter-transition matrices are defined similarly. Applying the Laplacian normalization, all elements of the probability transition matrix W can be obtained. The initial probability vector P^{0} is calculated as follows:
Here, the parameters P_{L}, P_{M}, P_{G} and 1 - P_{L} - P_{M} - P_{G} represent the importance of the lncRNA, miRNA, gene and disease similarity networks, respectively. Their values range from 0 to 1. U_{L0} represents the initial probability of the lncRNA similarity network: equal probabilities are assigned to all seed nodes in the lncRNA similarity network, and the elements of U_{L0} sum to 1. The initial probabilities U_{M0} and U_{G0} are defined in the same way. U_{D0} represents the initial probability of the disease similarity network: for the disease d under investigation, the initial probability of d is 1 and that of every other disease is 0.
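From the description above, P^{0} can be read as the four weighted blocks P_{L} U_{L0}, P_{M} U_{M0}, P_{G} U_{G0} and (1 - P_{L} - P_{M} - P_{G}) U_{D0} stacked into one vector. The sketch below builds such a vector. The block order (lncRNA, miRNA, gene, disease), the function names and the toy seed indices are assumptions made for illustration; the default weights are the values reported later in the performance evaluation.

```python
def uniform_over_seeds(size, seeds):
    """Vector of length `size` with equal probability on each seed index."""
    vec = [0.0] * size
    for s in seeds:
        vec[s] = 1.0 / len(seeds)
    return vec


def initial_vector(n_l, n_m, n_g, n_d, lnc_seeds, mir_seeds, gene_seeds, disease,
                   p_l=0.4, p_m=0.1, p_g=0.1):
    """Stack the weighted initial probabilities of the four networks."""
    p_d = 1.0 - p_l - p_m - p_g
    blocks = [
        (p_l, uniform_over_seeds(n_l, lnc_seeds)),
        (p_m, uniform_over_seeds(n_m, mir_seeds)),
        (p_g, uniform_over_seeds(n_g, gene_seeds)),
        (p_d, uniform_over_seeds(n_d, [disease])),  # all mass on the query disease
    ]
    return [weight * x for weight, block in blocks for x in block]


# Toy sizes: 3 lncRNAs, 2 miRNAs, 2 genes, 2 diseases; disease index 1 is the query.
p0 = initial_vector(3, 2, 2, 2, lnc_seeds=[0, 2], mir_seeds=[1], gene_seeds=[0], disease=1)
```

If a network has no seed nodes for the query disease, its block stays at zero in this sketch and the vector then sums to less than 1; how the original implementation handles that case is not stated in the text.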
Finally, the Laplace normalized random walk with restart algorithm is used to compute the scores of candidate lncRNAs (see Fig. 3). The method is called LRWRHLDA (the Laplace normalized random walk with restart algorithm in heterogeneous networks to predict lncRNA-disease associations).
Results
Performance evaluation
In this paper, tenfold cross validation is used to evaluate the performance of our model. All known lncRNA-disease associations are randomly divided into ten folds. In each experiment, nine folds are used as training samples and the remaining fold is used as test samples. After each run, predicted scores are generated, and the test samples are ranked together with the unknown lncRNA-disease pairs. A test sample is counted as a true positive (TP) when its predicted relevance score is greater than the threshold, and as a false negative (FN) otherwise. Similarly, an unknown lncRNA-disease pair is counted as a false positive (FP) when its predicted relevance score is greater than the threshold, and as a true negative (TN) otherwise. The true positive rate (TPR), the false positive rate (FPR), recall and precision are then calculated as follows:
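By their standard definitions, \(TPR = recall = TP / (TP + FN)\), \(FPR = FP / (FP + TN)\) and \(precision = TP / (TP + FP)\). The sketch below computes these rates for one threshold, and also computes AUC through its pair-ranking interpretation (the fraction of test/unknown pairs in which the test sample scores higher), which equals the area under the ROC curve. The function names and the toy scores are hypothetical.

```python
def confusion_rates(test_scores, unknown_scores, threshold):
    """TPR, FPR, precision and recall at a single score threshold."""
    tp = sum(1 for s in test_scores if s > threshold)
    fn = len(test_scores) - tp
    fp = sum(1 for s in unknown_scores if s > threshold)
    tn = len(unknown_scores) - fp
    recall = tp / (tp + fn)
    return {
        "TPR": recall,
        "FPR": fp / (fp + tn),
        # Convention chosen for this sketch: precision is 1.0 when nothing is predicted.
        "precision": tp / (tp + fp) if (tp + fp) else 1.0,
        "recall": recall,
    }


def auc_by_ranking(test_scores, unknown_scores):
    """AUC as the probability that a test sample outranks an unknown pair."""
    wins = 0.0
    for pos in test_scores:
        for neg in unknown_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(test_scores) * len(unknown_scores))


# Toy scores: three held-out known associations and four unknown pairs.
test_scores = [0.9, 0.6, 0.2]
unknown_scores = [0.7, 0.3, 0.1, 0.05]
rates = confusion_rates(test_scores, unknown_scores, threshold=0.5)
auc = auc_by_ranking(test_scores, unknown_scores)
```

At threshold 0.5 the toy data give TP = 2, FN = 1, FP = 1 and TN = 3. Sweeping the threshold over all observed scores traces out the ROC and PR curves.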
Finally, the receiver operating characteristic (ROC) curve and the precision-recall (PR) curve are drawn, as shown in Fig. 4. The area under the ROC curve (AUC) and the area under the PR curve (AUPR) are used to evaluate the performance of our method; both range from 0 to 1. With the parameters set to P_{LM} = P_{LG} = P_{LD} = P_{MG} = P_{MD} = P_{GD} = 0.2, P_{L} = 0.4, P_{M} = 0.1, P_{G} = 0.1 and λ = 0.7, the results of the ten experiments are shown in Table 2.
Comparison with different prediction methods using tenfold cross validation
To compare with other models, the data in this paper were applied to the BPLLDA model [26], the RWRlncD model [27], the GrwLDA model [28] and the MHRWR model [29].
The ROC curves of LRWRHLDA, RWRlncD, GrwLDA, BPLLDA and MHRWR under tenfold cross validation are plotted in Fig. 5.
As can be seen, LRWRHLDA has an AUC of 0.98402 and outperforms RWRlncD (0.53625), GrwLDA (0.83276), BPLLDA (0.87148) and MHRWR (0.97169). In summary, LRWRHLDA performs better than the other models in lncRNA-disease association prediction.
The area under the PR curve (AUPR) is also used to evaluate the LRWRHLDA, BPLLDA [26], RWRlncD [27], GrwLDA [28] and MHRWR [29] models, to avoid overestimating the performance of these methods (see Fig. 6).
Fig. 6 shows that the AUPR value of LRWRHLDA is also higher than those of the other models.
Effects of parameters
There are ten parameters in our model: the transition probabilities between networks, P_{LM}, P_{LG}, P_{LD}, P_{MG}, P_{MD} and P_{GD}; the subnetwork weights P_{L}, P_{M} and P_{G}; and the restart probability λ. Because of the large number of parameters and our limited computing resources, we fixed nine of these parameters at arbitrary values and examined only the impact of the restart probability λ under tenfold cross validation. The results are shown in Table 3. Based on the AUC index, the parameter λ has little influence on the performance of LRWRHLDA when λ = 0.7. Based on the AUPR index, the AUPR value reaches its maximum when λ equals 0.9. Overall, Table 3 shows that the restart probability λ affects the performance of our model.
Case study
Case studies on predicted lncRNA-disease associations
It is known that lncRNAs play critical roles in the development of many diseases. To evaluate the ability of LRWRHLDA to infer potential lncRNA-disease associations, we use all known lncRNA-disease associations in LD as training data and assess the associations predicted by our model.
The stable probability \(P^{\infty }\) can be used as a measure of proximity to the seed lncRNAs. If \(P^{\infty }\)(lncRNA i) > \(P^{\infty }\)(lncRNA j), then lncRNA i is in closer proximity to the seed lncRNAs than lncRNA j in the lncRNA similarity network. All candidate lncRNAs can therefore be ranked according to \(P^{\infty }\), and the top-ranked lncRNAs can be expected to have a high probability of being associated with the disease of interest. The novel lncRNA-disease associations are ranked according to the stable probability of LRWRHLDA. To validate the predictions, we use the literature or the following databases: LncRNADisease [30], LncRNADisease v2.0 [31], MNDR v3.1 [34] and lnCAR [56]. Specifically, we list the top 10 lncRNAs associated with four diseases: colorectal cancer, lung adenocarcinoma, stomach cancer and breast cancer. The top 10 results according to \(P^{\infty }\) are shown in Table 4 (for detailed results, see Additional file 1: Table S1).
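The ranking step described above can be sketched as follows: candidate lncRNAs are those not already known to be associated with the disease, and they are sorted by their stable probability. The function name, the lncRNA labels and the scores are hypothetical placeholders, not results from the paper.

```python
def top_candidates(scores, known, k=10):
    """Rank lncRNAs not already linked to the disease by their stable probability."""
    candidates = [(name, score) for name, score in scores.items() if name not in known]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in candidates[:k]]


# Hypothetical stable probabilities for one disease of interest.
stable_probability = {"lncA": 0.30, "lncB": 0.25, "lncC": 0.05, "lncD": 0.40}
already_known = {"lncD"}  # known association, so not a candidate
ranking = top_candidates(stable_probability, already_known, k=2)
```

Known associations are excluded before ranking because they act as seeds and would otherwise dominate the top of the list.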
Colorectal cancer is the third most common cancer diagnosed in the US. While the incidence and mortality of colorectal cancer have decreased owing to effective cancer screening, the number of young patients diagnosed with colon cancer has increased for reasons that are currently unclear [57]. Lung adenocarcinoma is one of the main types of lung cancer and belongs to non-small cell carcinoma. It occurs mainly in women and non-smokers [58]. Stomach cancer is the fifth most common cancer and the third most common cause of cancer death globally [59]. The vast majority of stomach cancers are adenocarcinomas, which have no obvious symptoms in the early stage. Their symptoms often resemble those of chronic gastric diseases such as gastritis and gastric ulcers and are easily overlooked. Moreover, the early diagnosis rate of stomach cancer is still low. Breast cancer is a malignant tumor that occurs in the epithelial tissue of the breast. Breast cancer has become a major public health problem, and its cause is not yet fully understood. Worldwide, breast cancer is an important cause of suffering and premature mortality among women [60].
In Table 4, besides the lncRNA-disease associations already present in the databases, six potential associations were confirmed in the literature: ENST00000535511-colorectal cancer, RP4-colorectal cancer, CTNNAP1-colorectal cancer, LINC01021-colorectal cancer, GMDS-AS1-lung adenocarcinoma and LINC01207-lung adenocarcinoma. These results demonstrate the predictive performance of the proposed method.
Case studies on predicted novel diseases and novel lncRNAs
Each disease in turn is treated as a novel disease: all of its related lncRNAs are removed, and the model is used to predict potential lncRNAs related to that disease. All candidate lncRNAs were ranked according to \(P^{\infty }\), and lncRNAs with high scores were expected to be related to the investigated disease d. Based on \(P^{\infty }\), the top 10 results are listed in Table 5 (for detailed results, see Additional file 2: Table S2).
Analogously, the stable probability \(P^{\infty }\) can also be used as a measure of proximity to the seed diseases. All candidate diseases were ranked according to \(P^{\infty }\), and diseases with high scores were expected to be related to the investigated lncRNA. To evaluate the ability of our model to make predictions for new lncRNAs, we analyzed two lncRNAs, H19 and HOTAIR. For each lncRNA, all of its related diseases were removed before predicting potential diseases. The top 10 results according to \(P^{\infty }\) are shown in Table 6 (for detailed results, see Additional file 3: Table S3).
Table 5 shows that thirty-five of the forty top-ten lncRNA associations for the four cancers were validated by the databases or the literature. The other five cancer-lncRNA associations (colorectal cancer-CARL, stomach cancer-AF117829.1, breast cancer-AP003486.1, lung adenocarcinoma-AC018413.1 and lung adenocarcinoma-TUBB2A) have not been confirmed by the databases or the literature. This suggests that our method can predict additional lncRNA-disease associations.
In Table 6, all top ten associated diseases were validated by the databases in both cases. In summary, LRWRHLDA achieves favorable performance in predicting novel disease-associated lncRNAs and novel lncRNA-associated diseases.
Discussion
Many studies have shown that lncRNAs have an important influence on the physiological processes of diseases. Because traditional biological experiments are time-consuming and costly, it is necessary to develop computational models to predict associations between lncRNAs and diseases.
In this paper, a new model, LRWRHLDA, based on the Laplace normalized random walk with restart algorithm on a heterogeneous network, was constructed to predict potential lncRNA-disease associations. Tenfold cross validation was applied to evaluate the prediction performance of our method. Compared with state-of-the-art prediction methods, our method achieves better performance in terms of AUC. Moreover, case studies on colorectal cancer, lung adenocarcinoma, stomach cancer and breast cancer further demonstrate that it could be a useful method for predicting potential relationships between lncRNAs and diseases.
However, our method has some limitations. First, since the model has 10 parameters, selecting and tuning them remains difficult. Second, because our model is based on four networks, the network contains many nodes, and the more nodes there are, the longer the random walk takes. In the future, we will continue to improve the model.
Conclusion
In this study, we proposed an effective method, LRWRHLDA, which is based on the Laplace normalized random walk with restart algorithm on a heterogeneous network, to predict potential lncRNA-disease associations. First, a heterogeneous network was constructed from the lncRNA, disease, miRNA and gene similarity networks and the association networks that link them. Then, the probability transition matrix was calculated by Laplace normalization. Finally, the potential lncRNA-disease associations were predicted by the random walk with restart over the heterogeneous network. Furthermore, LRWRHLDA can predict lncRNAs related to isolated diseases (and diseases related to isolated lncRNAs). Our method was evaluated by tenfold cross validation and case studies in comparison with other methods. The results show that our method has higher prediction accuracy.
Availability of data and materials
The datasets supporting the conclusions of this article are included within the article and its additional files. The code (executable code and source code) and data for this study are available at https://github.com/wang124/LRWRHLDA.git.
Abbreviations
lncRNAs: Long noncoding RNAs
miRNA: MicroRNA
LRWRHLDA: Prediction of potential lncRNA-disease associations based on the Laplace normalized random walk with restart algorithm in heterogeneous networks
ROC: Receiver operating characteristic
TPR: True positive rate
FPR: False positive rate
AUC: Area under the ROC curve
PR: Precision-recall
AUPR: Area under the precision-recall curve
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61772027, 61772028) and the Key Research and Development Plan of Zhejiang Province (2021C02039).
Funding
This research was partly sponsored by the National Natural Science Foundation of China (No. 61772027, 61772028) and the Key Research and Development Plan of Zhejiang Province (2021C02039). The funding bodies played no role in the design of the study, in the collection, analysis, or interpretation of data, or in writing the manuscript.
Author information
Contributions
LW, MS, QD and PH designed the study. LW and MS carried out analyses and wrote the program. LW and PH wrote the paper. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1. Stable probabilities of lncRNAs obtained by running LRWRHLDA for the four cancers based on the LD matrix.
Additional file 2. Stable probabilities of lncRNAs obtained by running LRWRHLDA after deleting the lncRNAs related to each cancer.
Additional file 3. Stable probabilities of lncRNAs obtained by running LRWRHLDA after deleting the cancers related to the lncRNAs.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Wang, L., Shang, M., Dai, Q. et al. Prediction of lncRNA-disease association based on a Laplace normalized random walk with restart algorithm on heterogeneous networks. BMC Bioinformatics 23, 5 (2022). https://doi.org/10.1186/s12859-021-04538-1
DOI: https://doi.org/10.1186/s12859-021-04538-1
Keywords
 lncRNA-disease associations
 Similarity network
 Heterogeneous network
 LRWRHLDA
 Tenfold cross validation
 AUC