
Prediction of lncRNA-disease association based on a Laplace normalized random walk with restart algorithm on heterogeneous networks

Abstract

Background

Growing evidence shows that long non-coding RNAs (lncRNAs) play important roles in the development and progression of complex human diseases. Predicting human lncRNA-disease associations is therefore a challenging and urgent task in bioinformatics research on complex human diseases.

Results

In this work, we propose LRWRHLDA, a global network-based computational framework. First, four isomorphic networks were constructed: an lncRNA similarity network, a disease similarity network, a gene similarity network and a miRNA similarity network. Then, six association networks (known lncRNA-disease, lncRNA-gene, lncRNA-miRNA, disease-gene, disease-miRNA and gene-miRNA associations) were used to link them into a multi-layer heterogeneous network. Finally, a Laplace normalized random walk with restart algorithm on this global network is used to predict the relationship between lncRNAs and diseases.

Conclusions

Ten-fold cross validation is used to evaluate the performance of LRWRHLDA. LRWRHLDA achieves an AUC of 0.98402, which is higher than that of the compared methods. Furthermore, LRWRHLDA can predict lncRNAs related to isolated diseases (and diseases related to isolated lncRNAs). The predictions for colorectal cancer, lung adenocarcinoma, stomach cancer and breast cancer have been verified by other studies. The case studies indicate that our method is effective.


Background

Disease is an abnormal life process that arises when homeostasis is disrupted after the body is damaged by a pathogenic factor under certain conditions. Many studies have confirmed that complex cross-regulation relationships exist among diseases, genes, lncRNAs, and miRNAs [1,2,3,4].

Many studies have shown that although less than 2% of the human genome encodes proteins, most nucleotides are detectably transcribed under certain conditions [5]. Among the various types of non-protein-coding transcripts, long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) have attracted increasing attention. lncRNAs are defined as non-coding RNAs longer than 200 nucleotides [6]; miRNAs are RNA molecules of about 19–25 nucleotides that exist widely in eukaryotes [7].

lncRNAs play an important role in a variety of biological mechanisms, such as epigenetic regulation, chromatin remodeling, gene transcription, protein transport and cell transport [8]. Their functions can be divided into the following categories: transcriptional interference; induction of chromatin remodeling and nucleosome modification; regulation of alternative splicing; generation of endogenous siRNAs; regulation of protein activity; structural or organizational functions; alteration of protein localization; and serving as precursors of small RNAs [5, 9, 10].

Many researchers have found that abnormal expression or function of lncRNAs is closely related to the occurrence of human diseases, including cancers and degenerative neurological diseases, which seriously endanger human health. For example, overexpression of the lncRNA HOTAIR increases breast cancer cell proliferation [11, 12]. The lncRNA AFAP1-AS1 is abnormally expressed in cholangiocarcinoma, gallbladder cancer, hepatocellular carcinoma, gastric cancer, colorectal cancer and esophageal cancer [13]. The lncRNA HOXA-AS2 may be a biomarker for the treatment of gastric cancer [14]. The lncRNA PCGEM1 is closely correlated with osteoarthritis [15]. Therefore, lncRNAs can be used as important biomarkers for the diagnosis of diseases.

lncRNA-disease associations can be identified by biological experimental verification or by computational model prediction. For example, based on biological experiments, Faghihi et al. [16] found that the expression of BACE1-AS can promote the rapid feed-forward regulation of β-secretase in Alzheimer’s disease. Using RT-PCR and Northern blot analysis, Hu et al. [17] confirmed that H19 may become a new target for colon cancer anti-tumor therapy. The results of biological experiments are reliable, but the experiments are time-consuming and costly.

Recently, computational models, which can integrate various data resources, have attracted increasing attention for identifying lncRNA-disease associations. For instance, based on a semi-supervised learning framework, the Laplacian regularized least squares model for lncRNA-disease association (LRLSLDA) was proposed to predict potential disease-related lncRNAs [18]. Integrating genome, regulome and transcriptome data, a naive Bayesian classifier was proposed to identify cancer-related lncRNAs [19]. Similarly, based on disease-gene cluster association scores, a machine learning method was proposed to predict potential lncRNA-disease associations [20]. Combining incremental principal component analysis (IPCA) and the random forest (RF) algorithm, a machine learning model called IPCARF was applied to predict lncRNA-disease associations [21].

Matrix factorization has also been widely used to find lncRNA-disease associations. For instance, the dual-network integrated logistic matrix factorization and Bayesian optimization model (DNILMF-LDA) has been used to predict lncRNA-disease associations [22]. In addition, weighted graph regularized collaborative matrix factorization (WGRCMF), dual sparse collaborative matrix factorization (DSCMF) and multi-label fusion collaborative matrix factorization (MLFCMF) were applied to construct models for predicting lncRNA-disease associations [23,24,25].

Based on the hypothesis that lncRNAs with similar functions may be related to diseases with similar phenotypes, some researchers have proposed several calculation methods based on biological networks to predict disease-related lncRNAs.

In addition, by integrating the lncRNA similarity network, the disease similarity network and the lncRNA-disease association network, the BPLLDA model, which is based on paths of fixed lengths in a heterogeneous lncRNA-disease association network, was proposed to predict lncRNA-disease associations [26]. Furthermore, several random walk models on such heterogeneous networks were proposed to predict the relationship between lncRNAs and diseases [27,28,29]. For example, Sun et al. [27] proposed the random walk with restart method on an lncRNA functional similarity network (RWRlncD). Gu et al. [28] proposed a global network-based random walk with restart algorithm on lncRNA seed nodes and disease seed nodes (GrWLDA). Based on a heterogeneous network built from the lncRNA, disease and gene similarity networks, the MHRWR model applies the random walk with restart algorithm on the global network [29].

Building on the random walk with restart model, this paper proposes a new computational model based on a Laplacian normalized random walk with restart algorithm on a heterogeneous network to predict associations between lncRNAs and diseases. First, the disease semantic similarity (and the lncRNA, gene and miRNA functional similarities) is calculated. Then, based on the known associations, the Gaussian interaction profile kernel similarities of lncRNAs and diseases (and of miRNAs and genes) are calculated. The lncRNA functional similarity (disease semantic similarity, miRNA functional similarity, gene functional similarity) is integrated with the Gaussian interaction profile kernel similarity for lncRNAs (diseases, miRNAs, genes) to construct the isomorphic networks. The Laplace normalized random walk with restart algorithm on the heterogeneous network is then used to predict potential lncRNA-disease associations. Our method obtains an AUC of 0.98402 in ten-fold cross validation, which is superior to that of other similar methods. Moreover, case studies on colorectal cancer, lung adenocarcinoma, stomach cancer and breast cancer also demonstrate the reliability of our model.

Methods

Experimental data sources

In this paper, the lncRNA-disease associations mainly come from the LncRNADisease database [30, 31], EVLncRNAs database [32], Lnc2Cancer database [33] and MNDR v3.1 database [34], among others. Similarly, the lncRNA-miRNA associations come from the integrated data of the DIANA-LncBase database [35], LncAcTdb 2.0 database [36], MiRcode database [37] and StarBase database [38]. The lncRNA-gene associations come from the integrated data of the LncRNADisease database [30, 31], LncAcTdb 2.0 database [36] and LncRNA2Target v2.0 database [39]. The miRNA-disease associations come from the integrated data of the MNDR v3.1 database [34], HMDD database [40] and MiR2Disease database [41]. The miRNA-gene associations come from the MiRTarBase database [42]. The gene-disease associations come from the integrated data of the DisGeNET database [43], CREEDS database [44] and DISEASES database [45].

Because different databases may use different names for the same biomolecule, we performed error correction and data cleaning on the data sets obtained from the databases (mainly deleting duplicate, erroneous and empty records). In addition, the names of biomolecules of the same type from different databases were unified. To make the data more comprehensive and to improve the accuracy and scope of the prediction, we took the union of the related data from the above databases.

For lncRNAs, the intersection of the lncRNA-disease, lncRNA-gene and lncRNA-miRNA association sets obtained from all databases was used to construct the lncRNA similarity network, which leaves 814 lncRNAs in this work (Fig. 1). In the end, 2476 miRNAs, 7986 genes and 217 diseases were retained for the study. Some basic characteristics of each X–Y association dataset (e.g., the average degree) are summarized in Table 1, where X and Y each stand for lncRNA, disease, gene or miRNA.

Fig. 1

Numbers of lncRNA, disease, miRNA and gene nodes ultimately retained. A lncRNA. B Disease. C miRNA. D Gene

Table 1 The basic characteristics of the X–Y association dataset

Calculate the similarity matrix

LncRNA functional similarity matrix

Following the method of Sun et al. [27], the functional similarity of two lncRNAs is computed as follows:

Supposing lncRNA l1 is associated with the disease group D1 (\(D_{1} = \{ d_{1i} |1 \le i \le a\}\)), and lncRNA l2 is associated with the disease group D2 (\(D_{2} = \{ d_{2j} |1 \le j \le b\}\)), the similarity between disease d11 and a disease group D2 is defined as follows:

$$S(d_{11} ,D_{2} ) = \mathop {\max }\limits_{{d_{2} \in D_{2} }} (Sim(d_{11} ,d_{2} )),$$
(1)

where \(Sim(d_{11} ,d_{2} )\) is the disease semantic similarity of diseases d11 and d2. Then, the functional similarity between lncRNA l1 and l2 is defined as:

$$LS(l_{1} ,l_{2} ) = \frac{{\sum\limits_{1 \le i \le a} {S(d_{1i} ,D_{2} ) + \sum\limits_{1 \le j \le b} {S(d_{2j} ,D_{1} )} } }}{a + b}.$$
(2)
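Eqs. (1) and (2) amount to a symmetric best-match average over two disease groups. The following is a minimal sketch of that calculation, assuming the disease semantic similarities are already stored in a square NumPy array indexed by disease; the function names and the toy matrix are illustrative and are not taken from the paper.

```python
import numpy as np

def best_match_sum(sim, group_a, group_b):
    """Sum over d in group_a of max_{d' in group_b} Sim(d, d'), i.e. Eq. (1) summed."""
    block = sim[np.ix_(group_a, group_b)]
    return float(block.max(axis=1).sum())

def functional_similarity(sim, group_a, group_b):
    """Eq. (2): symmetric best-match average between two disease groups."""
    if not group_a or not group_b:
        return 0.0  # assumption: no associated diseases means no functional similarity
    total = best_match_sum(sim, group_a, group_b) + best_match_sum(sim, group_b, group_a)
    return total / (len(group_a) + len(group_b))

# Toy disease semantic similarity matrix for three diseases (made-up values).
toy_sim = np.array([[1.0, 0.5, 0.2],
                    [0.5, 1.0, 0.4],
                    [0.2, 0.4, 1.0]])

# lncRNA l1 is linked to diseases {0, 1}, lncRNA l2 to diseases {1, 2}.
ls_12 = functional_similarity(toy_sim, [0, 1], [1, 2])
```

The same pattern should carry over to the miRNA and gene functional similarities defined later, since they share the form of Eq. (2) and differ only in the underlying pairwise similarity.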

Disease semantic similarity matrix

The Disease Ontology (DO) provides an open-source ontology for the integration of biomedical data associated with human disease [46]. The terms in DO are diseases or disease-related concepts, organized in a directed acyclic graph (DAG). Applying the method of Wang et al. [47, 48], the semantic similarity of diseases is calculated as follows:

Given a disease d, its DAG can be expressed as DAG(d) = (Ans(d), E(d)), where Ans(d) is the node set containing d and its ancestor nodes, and E(d) is the set of direct edges from parent nodes to child nodes in the DAG; that is, E(d) describes the relationships between the diseases. Based on the DAG, the contribution of a disease term d to the semantic value of disease T, and the semantic value of disease T itself, are computed in the following two steps:

$$D_{T} (d) = \left\{ \begin{array}{ll} 1 & if\;d = T, \\ \max \{ \Delta *D_{T} (d^{\prime})|d^{\prime} \in children\;of\;d\} & if\;d \ne T, \end{array} \right.$$
(3)
$$DV(T) = \sum\limits_{d \in Ans(T)} {D_{T} (d)} ,$$
(4)

where \(\Delta\) is the semantic contribution attenuation factor, with a value between 0 and 1. As the distance between disease T and an ancestor disease increases, the contribution of that ancestor to the semantic value of T decreases. The semantic similarity between diseases d1 and d2 is calculated by Eq. (5):

$$DS(d_{1} ,d_{2} ) = \frac{{\sum\limits_{{d \in (Ans(d_{1} ) \cap Ans(d_{2} ))}} {(D_{{d_{1} }} (d) + D_{{d_{2} }} (d))} }}{{DV(d_{1} ) + DV(d_{2} )}}.$$
(5)
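The DAG-based calculation in Eqs. (3) to (5) can be sketched as follows. This is an illustration under stated assumptions: the ontology is given as a hypothetical `parents` dictionary mapping each term to its direct parents, and \(\Delta\) is set to 0.5, a value the text above does not specify. The toy terms are invented.

```python
def semantic_contributions(parents, term, delta=0.5):
    """D_T(d) for every d in Ans(T): 1 for T itself, otherwise the largest
    delta-attenuated contribution reachable through any child (Eq. 3)."""
    contrib = {term: 1.0}
    frontier = [term]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, ()):
            value = delta * contrib[node]
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value   # keep the maximum over all paths
                frontier.append(parent)   # re-propagate the improved value upward
    return contrib

def semantic_similarity(parents, t1, t2, delta=0.5):
    """Eq. (5): shared-ancestor contributions over the two semantic values (Eq. 4)."""
    c1 = semantic_contributions(parents, t1, delta)
    c2 = semantic_contributions(parents, t2, delta)
    shared = set(c1) & set(c2)
    numerator = sum(c1[d] + c2[d] for d in shared)
    return numerator / (sum(c1.values()) + sum(c2.values()))

# Toy ontology: A and B are children of root, C is a child of A.
toy_parents = {"A": ["root"], "B": ["root"], "C": ["A"]}
ds_ab = semantic_similarity(toy_parents, "A", "B")
ds_ac = semantic_similarity(toy_parents, "A", "C")
```

In this toy ontology, A and C share both A and the root, so they score higher than A and B, which share only the root.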

MiRNA functional similarity matrix

Following the method of Wang et al. [47], the functional similarity of two miRNAs is defined as follows:

Assume that miRNA m1 is associated with the disease group D3 (\(D_{3} = \{ d_{3k} |1 \le k \le c\}\)) and miRNA m2 is associated with the disease group D4 (\(D_{4} = \{ d_{4z} |1 \le z \le e\}\)). The similarity between a disease d31 and the disease group D4 is defined as follows:

$$S(d_{31} ,D_{4} ) = \mathop {\max }\limits_{{d_{4} \in D_{4} }} (Sim(d_{31} ,d_{4} )),$$
(6)

and the functional similarity between miRNA m1 and m2 is computed by Eq. (7):

$$MS(m_{1} ,m_{2} ) = \frac{{\sum\limits_{1 \le k \le c} {S(d_{3k} ,D_{4} ) + \sum\limits_{1 \le z \le e} {S(d_{4z} ,D_{3} )} } }}{c + e}.$$
(7)

Gene function similarity matrix

The Gene Ontology (GO) database is the world’s largest informatics resource on the functions of genes [49]. For a GO node A, DAG = (Ans(A), E(A)) is its directed acyclic graph, where Ans(A) is the set of all ancestors of node A (including A itself) and E(A) is the set of edges connecting the nodes in the DAG. For any GO node t that is an ancestor of A or equal to A, the contribution \(S_{A} (t)\) of t to A is defined by Eq. (8):

$$S_{A} (t) = \left\{ \begin{array}{ll} 1 & if\;t = A, \\ \max \{ \Delta *S_{A} (t^{\prime})|t^{\prime} \in children\;of\;t\} & if\;t \ne A, \end{array} \right.$$
(8)

where \(\Delta\) is the semantic contribution attenuation factor, with a value between 0 and 1. As the distance between GO node A and an ancestor node increases, the contribution of that ancestor to the semantic value of A decreases. The semantic value \(S_{V} (A)\) of node A is defined as follows:

$$S_{V} (A) = \sum {_{t \in Ans(A)} } S_{A} (t).$$
(9)

Then the semantic similarity of nodes A and B is calculated by Eq. (10):

$$S_{GO} (A,B) = \frac{{\sum\limits_{{t \in Ans(A) \cap Ans(B)}} {(S_{A} (t) + S_{B} (t))} }}{{S_{V} (A) + S_{V} (B)}}.$$
(10)

The similarity between a GO node g and a GO node set \(G = \{ go_{1} ,go_{2} , \ldots ,go_{f} \}\) is defined as:

$$S(g,G) = \mathop {\max }\limits_{1 \le i \le f} (S_{GO} (g,go_{i} )).$$
(11)

Assuming that the GO term sets annotating genes G1 and G2 are \(GO_{1} = \{ go_{11} ,go_{12} , \ldots ,go_{1m} \}\) and \(GO_{2} = \{ go_{21} ,go_{22} , \ldots ,go_{2n} \}\), respectively, the similarity of the two genes G1 and G2 is calculated by Eq. (12) [50]:

$$GS(G_{1} ,G_{2} ) = \frac{{\sum\limits_{1 \le i \le m} {S(go_{1i} ,GO_{2} ) + \sum\limits_{1 \le j \le n} {S(go_{2j} ,GO_{1} )} } }}{m + n}.$$
(12)

Gaussian interaction profile kernel similarity for lncRNAs and diseases

The matrices LS, DS, MS and GS contain many zeros. This sparsity may make the prediction results inaccurate. To avoid this, we introduce the Gaussian interaction profile kernel similarity [51, 52].

First, the m × n matrix LD denotes the association matrix of lncRNAs and diseases, whose elements are either 0 or 1: if lncRNA li is related to disease dj, then LD (i, j) = 1; otherwise LD (i, j) = 0.

In the same way, we define the lncRNA-miRNA association matrix LM, the lncRNA-gene association matrix LG, the disease-gene association matrix DG, the miRNA-gene association matrix MG and the miRNA-disease association matrix MD.

The Gaussian interaction profile kernel similarity of lncRNAs li and lj is defined as follows:

$$GaL(l_{i} ,l_{j} ) = \exp ( - r_{l} ||IP(l_{i} ) - IP(l_{j} )||^{2} ),$$
(13)
$$r_{l} = r^{\prime}_{l} /(\frac{1}{m}\sum\limits_{i = 1}^{m} {||IP(l_{i} )} ||).$$
(14)

where IP (li) is a binary vector representing the ith row of the lncRNA-disease association matrix LD, and m is the number of lncRNAs. \(r^{\prime}_{l}\) is the parameter that regulates the kernel bandwidth \(r_{l}\); following previous research, it is set to 1.

Similarly, the Gaussian interaction profile kernel similarity of disease di and dj is defined as:

$$GaD(d_{i} ,d_{j} ) = \exp ( - r_{d} ||IP(d_{i} ) - IP(d_{j} )||^{2} ),$$
(15)
$$r_{d} = r^{\prime}_{d} /(\frac{1}{n}\sum\limits_{i = 1}^{n} {||IP(d_{i} )} ||).$$
(16)

where IP (di) is a binary vector representing the ith column of the lncRNA-disease association matrix LD, and n is the number of diseases. \(r^{\prime}_{d} = 1\) is the parameter that regulates the kernel bandwidth \(r_{d}\).
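As an illustration of Eqs. (13) to (16), the sketch below computes the kernel for all row profiles of a binary association matrix at once; passing the transpose gives the column (disease) kernel. The function name `gip_kernel` and the `squared_norm` switch are assumptions made for this example. By default the bandwidth follows Eqs. (14) and (16) as printed, while `squared_norm=True` averages the squared norms instead, which is the form used in the original Gaussian interaction profile kernel formulation.

```python
import numpy as np

def gip_kernel(assoc, r_prime=1.0, squared_norm=False):
    """Gaussian interaction profile kernel between the rows of a 0/1 matrix."""
    assoc = np.asarray(assoc, dtype=float)
    norms = np.linalg.norm(assoc, axis=1)           # ||IP(x_i)|| for each row
    if squared_norm:
        norms = norms ** 2
    mean_norm = norms.mean()
    if mean_norm == 0:
        # All profiles are empty, hence identical: every distance is zero.
        return np.ones((assoc.shape[0], assoc.shape[0]))
    r = r_prime / mean_norm                          # bandwidth, Eq. (14) / Eq. (16)
    sq = (assoc ** 2).sum(axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * assoc @ assoc.T
    dist2 = np.maximum(dist2, 0.0)                   # guard against round-off
    return np.exp(-r * dist2)                        # Eq. (13) / Eq. (15)

# Toy lncRNA-disease association matrix LD (3 lncRNAs x 3 diseases, made up).
toy_LD = np.array([[1, 0, 1],
                   [1, 0, 1],
                   [0, 1, 0]])
GaL = gip_kernel(toy_LD)      # lncRNA kernel from the rows of LD
GaD = gip_kernel(toy_LD.T)    # disease kernel from the columns of LD
```

Two lncRNAs with identical association profiles get similarity 1, and the similarity decays as the profiles diverge.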

Gaussian interaction profile kernel similarity for MiRNAs and genes

The Gaussian interaction profile kernel similarities of miRNAs and genes are calculated in the same way as those of lncRNAs and diseases, except that the association matrix MG is used. Accordingly, IP (mi) is a binary vector representing the ith row of MG, and h is the number of miRNAs; \(r^{\prime}_{m}\) = 1 is the parameter that regulates the kernel bandwidth \(r_{m}\). IP (gi) is a binary vector representing the ith column of MG, and k is the number of genes; \(r^{\prime}_{g}\) = 1 is the parameter that regulates the kernel bandwidth \(r_{g}\).

Integration of similarities between lncRNAs, miRNAs, genes, and diseases

We integrate the lncRNA functional similarity (disease semantic similarity, miRNA functional similarity, gene functional similarity) with the Gaussian interaction profile kernel similarity for lncRNAs (diseases, miRNAs, genes) as follows:

$$LL = \left\{ \begin{gathered} GaL(l_{i} ,l_{j} )\quad if\;l_{i} \;or\;l_{j} \in NL,\; \hfill \\ LS(l_{i} ,l_{j} )\quad \quad \quad \quad \quad \;\;\;\,else. \hfill \\ \end{gathered} \right.$$
(17)
$$DD = \left\{ \begin{gathered} GaD(d_{i} ,d_{j} )\quad \;if\;d_{i} \;or\;d_{j} \in ND,\quad \hfill \\ DS(d_{i} ,d_{j} )\quad \quad \quad \quad \quad \quad \;\;\,\,else. \hfill \\ \end{gathered} \right.$$
(18)
$$MM = \left\{ \begin{gathered} GaM(m_{i} ,m_{j} )\quad if\;m_{i} \;or\;m_{j} \in NM, \hfill \\ MS(m_{i} ,m_{j} )\quad \quad \quad \quad \quad \quad \quad \,else. \hfill \\ \end{gathered} \right.$$
(19)
$$GG = \left\{ \begin{gathered} GaG(g_{i} ,g_{j} )\quad if\;g_{i} \;or\;g_{j} \in NG, \hfill \\ GS(g_{i} ,g_{j} )\quad \quad \quad \quad \quad \quad \;\;\;else. \hfill \\ \end{gathered} \right.$$
(20)

where NL is the set of lncRNAs with no functional similarity to any other lncRNA, ND is the set of diseases with no semantic similarity to any other disease, NM is the set of miRNAs with no functional similarity to any other miRNA, and NG is the set of genes with no functional similarity to any other gene. By definition, LL, DD, MM and GG are symmetric.
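The case distinction in Eqs. (17) to (20) can be written as a single masked selection. The sketch below assumes that both similarity matrices are square, symmetric and non-negative, and it identifies the isolated set (NL, ND, NM or NG) as the nodes whose off-diagonal functional similarities are all zero; the function name and toy values are illustrative.

```python
import numpy as np

def integrate_similarity(func_sim, gip_sim):
    """Use the Gaussian kernel value wherever either node is isolated, else the functional one."""
    func_sim = np.asarray(func_sim, dtype=float)
    gip_sim = np.asarray(gip_sim, dtype=float)
    off_diag = func_sim.copy()
    np.fill_diagonal(off_diag, 0.0)                  # ignore self-similarity
    isolated = ~off_diag.any(axis=1)                 # membership in NL / ND / NM / NG
    use_gip = isolated[:, None] | isolated[None, :]  # "l_i or l_j in NL"
    return np.where(use_gip, gip_sim, func_sim)

# Toy example: the third node has no functional similarity to the others.
toy_LS = np.array([[1.0, 0.6, 0.0],
                   [0.6, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
toy_GaL = np.array([[1.0, 0.3, 0.2],
                    [0.3, 1.0, 0.1],
                    [0.2, 0.1, 1.0]])
toy_LL = integrate_similarity(toy_LS, toy_GaL)
```

Because the mask is symmetric, the integrated matrix stays symmetric whenever both inputs are.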

The heterogeneous network

Based on the integrated lncRNA similarity matrix LL, disease similarity matrix DD, miRNA similarity matrix MM and gene similarity matrix GG, four isomorphic networks were constructed: the lncRNA similarity network, the disease similarity network, the gene similarity network and the miRNA similarity network, as shown in Fig. 2. In addition, a heterogeneous network connecting these four similarity networks was built from the six association matrices LD, LM, LG, MD, MG and DG, as shown in Fig. 3.

Fig. 2

Construction of similarity network of lncRNAs, diseases, miRNAs and genes

Fig. 3

Construction of the heterogeneous network and ranking of lncRNAs by their stable probabilities under LRWRHLDA

The random walk with restart

The random walk with restart (RWR) on the heterogeneous network, used to predict lncRNA-disease associations, is defined as follows [53]:

$$P^{t + 1} = (1 - \lambda )WP^{t} + \lambda P^{0} ,$$
(21)

where \(P^{0}\) is the initial probability vector and \(P^{t}\) is the probability vector whose ith element is the probability of finding the random walker at node i at step t. λ is the restart probability, with a value between 0 and 1. W is the probability transition matrix, and \(W_{ij}\) denotes the transition probability from node i to node j. When the L1 norm of the difference between \(P^{t + 1}\) and \(P^{t}\) is less than \(10^{ - 6}\), the walk is considered to have reached a stable state, and the stable probability \(P^{\infty }\) is obtained.
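The iteration in Eq. (21) with the stated stopping rule can be sketched in a few lines. One assumption is made explicit here: the matrix passed in is applied to a column vector, so each column should hold the outgoing probabilities of one node. If W is stored with rows as source nodes, as the definition of \(W_{ij}\) suggests, its transpose would be passed instead. The function name `rwr` and the 3-node matrix are illustrative.

```python
import numpy as np

def rwr(M, p0, lam=0.7, tol=1e-6, max_iter=1000):
    """Iterate P^{t+1} = (1 - lam) * M @ P^t + lam * P^0 until the L1 change is below tol."""
    p = p0.astype(float).copy()
    for _ in range(max_iter):
        p_next = (1.0 - lam) * (M @ p) + lam * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p  # not converged within max_iter; returned as the best estimate

# Toy 3-node network whose columns each sum to 1 (made-up values).
toy_M = np.array([[0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5],
                  [0.5, 0.5, 0.0]])
toy_p0 = np.array([1.0, 0.0, 0.0])
p_inf = rwr(toy_M, toy_p0, lam=0.7)

# Closed-form fixed point of Eq. (21), for comparison only.
p_exact = 0.7 * np.linalg.solve(np.eye(3) - 0.3 * toy_M, toy_p0)
```

With a restart probability of 0.7 the update shrinks the error by a factor of 0.3 per step, so the loop stops after a handful of iterations on this toy input.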

The probability transition matrix W is constructed in this paper as follows:

$$W = \left( {\begin{array}{*{20}c} {W_{LL} } & {W_{LM} } & {W_{LG} } & {W_{LD} } \\ {W_{ML} } & {W_{MM} } & {W_{MG} } & {W_{MD} } \\ {W_{GL} } & {W_{GM} } & {W_{GG} } & {W_{GD} } \\ {W_{DL} } & {W_{DM} } & {W_{DG} } & {W_{DD} } \\ \end{array} } \right).$$
(22)

The matrix W consists of four intra-transition matrices and twelve inter-transition matrices. WLL is the intra-transition matrix of the lncRNA similarity network. WDD, WMM and WGG are defined like WLL and are the intra-transition matrices of the disease, miRNA and gene similarity networks, respectively. WLM is the transition matrix from the lncRNA network to the miRNA network. WLG, WLD, WML, WMG, WMD, WGL, WGM, WGD, WDL, WDM and WDG are defined similarly to WLM.

Laplacian normalization

Given a matrix A = A (i, j), the diagonal matrix D is defined as follows: D (i, i) equals the sum of the ith row of A, and D (i, j) = 0 for i ≠ j. The Laplace normalization of A is then defined as [54, 55]:

$$\overrightarrow {A} (i,j) = \frac{A(i,j)}{{\sqrt {D(i,i)D(j,j)} }}.$$
(23)
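A minimal sketch of Eq. (23) for a square similarity matrix follows. Rows whose sum is zero are mapped to zero rather than divided by zero; this handling is an assumption of the example, chosen to mirror the "else 0" branches in the piecewise definitions. The function name is illustrative.

```python
import numpy as np

def laplacian_normalize(A):
    """Eq. (23): A(i, j) / sqrt(D(i, i) * D(j, j)), with D the diagonal matrix of row sums."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    nonzero = d > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(d[nonzero])   # zero rows keep a factor of 0
    return A * inv_sqrt[:, None] * inv_sqrt[None, :]

# Toy symmetric matrix: node 0 is linked to nodes 1 and 2, node 3 is isolated.
toy_A = np.array([[0.0, 1.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0]])
toy_A_norm = laplacian_normalize(toy_A)
```

Unlike plain row normalization, this scaling keeps a symmetric input symmetric, because each entry is divided by the degrees of both endpoints.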

Therefore, WLM and WLL can be obtained by the following two steps:

The probability of transition from li to mj is as follows:

$$\overrightarrow {LM} (i,j) = \left\{ \begin{gathered} \frac{LM(i,j)}{{\sqrt {\sum\limits_{i} {LM} (i,j)\sum\limits_{j} {LM} (i,j)} }}\quad if\;\sum\limits_{i} {LM} (i,j)\sum\limits_{j} {LM} (i,j) \ne 0,\; \hfill \\ 0\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad else.\quad \hfill \\ \end{gathered} \right.$$
(24)
$$W_{LM} (i,j) = \left\{ \begin{gathered} P_{LM} *\frac{{\overrightarrow {LM} (i,j)}}{{\sum\limits_{j} {\overrightarrow {LM} (i,j)} }}\quad if\;\sum\limits_{j} {\mathop {LM}\limits^{ \to } (i,j)} \ne 0, \hfill \\ 0\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \,else. \hfill \\ \end{gathered} \right.$$
(25)
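Eqs. (24) and (25) can be read as two passes over the association matrix: a Laplacian-style scaling by row and column sums, followed by row normalization scaled by the jump probability \(P_{LM}\) from the lncRNA layer to the miRNA layer. The sketch below follows that reading; the function names are illustrative, and 0.2 is used for the jump probability only because it is the setting reported later in the Results.

```python
import numpy as np

def bipartite_laplacian(B):
    """Eq. (24): B(i, j) / sqrt(row_sum(i) * col_sum(j)), and 0 where that product is 0."""
    B = np.asarray(B, dtype=float)
    denom = np.sqrt(np.outer(B.sum(axis=1), B.sum(axis=0)))
    out = np.zeros_like(B)
    np.divide(B, denom, out=out, where=denom > 0)
    return out

def inter_transition(B, p_jump):
    """Eq. (25): row-normalize the scaled matrix and multiply by the jump probability."""
    scaled = bipartite_laplacian(B)
    row_sum = scaled.sum(axis=1, keepdims=True)
    out = np.zeros_like(scaled)
    np.divide(p_jump * scaled, row_sum, out=out, where=row_sum > 0)
    return out

# Toy lncRNA-miRNA association matrix LM (made up); the third lncRNA has no miRNA links.
toy_LM = np.array([[1, 1, 0],
                   [0, 1, 0],
                   [0, 0, 0]])
toy_W_LM = inter_transition(toy_LM, p_jump=0.2)
```

Each lncRNA with at least one miRNA link sends a total probability of exactly `p_jump` to the miRNA layer, and an lncRNA without such links sends none.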

The probability of transition from li to lj is as follows:

$$\overrightarrow {LL} (i,j) = \left\{ \begin{gathered} \frac{LL(i,j)}{{\sqrt {\sum\limits_{i} {LL} (i,j)\sum\limits_{j} {LL} (i,j)} }}\quad if\;\sum\limits_{i} {LL} (i,j)\sum\limits_{j} {LL} (i,j) \ne 0, \hfill \\ 0\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \;{\kern 1pt} {\kern 1pt} else. \hfill \\ \end{gathered} \right.\;$$
(26)
$$W_{LL} (i,j) = \left\{ \begin{array}{ll}
\frac{{\overrightarrow {LL} (i,j)}}{{\sum\limits_{j} {\overrightarrow {LL} (i,j)} }} & if\;\sum\limits_{j} {\overrightarrow {LM} (i,j)} = 0,\;\sum\limits_{j} {\overrightarrow {LG} (i,j)} = 0,\;\sum\limits_{j} {\overrightarrow {LD} (i,j)} = 0, \\
(1 - P_{LM} )*\frac{{\overrightarrow {LL} (i,j)}}{{\sum\limits_{j} {\overrightarrow {LL} (i,j)} }} & if\;\sum\limits_{j} {\overrightarrow {LM} (i,j)} \ne 0,\;\sum\limits_{j} {\overrightarrow {LG} (i,j)} = 0,\;\sum\limits_{j} {\overrightarrow {LD} (i,j)} = 0, \\
(1 - P_{LG} )*\frac{{\overrightarrow {LL} (i,j)}}{{\sum\limits_{j} {\overrightarrow {LL} (i,j)} }} & if\;\sum\limits_{j} {\overrightarrow {LM} (i,j)} = 0,\;\sum\limits_{j} {\overrightarrow {LG} (i,j)} \ne 0,\;\sum\limits_{j} {\overrightarrow {LD} (i,j)} = 0, \\
(1 - P_{LD} )*\frac{{\overrightarrow {LL} (i,j)}}{{\sum\limits_{j} {\overrightarrow {LL} (i,j)} }} & if\;\sum\limits_{j} {\overrightarrow {LM} (i,j)} = 0,\;\sum\limits_{j} {\overrightarrow {LG} (i,j)} = 0,\;\sum\limits_{j} {\overrightarrow {LD} (i,j)} \ne 0, \\
(1 - P_{LM} - P_{LG} )*\frac{{\overrightarrow {LL} (i,j)}}{{\sum\limits_{j} {\overrightarrow {LL} (i,j)} }} & if\;\sum\limits_{j} {\overrightarrow {LM} (i,j)} \ne 0,\;\sum\limits_{j} {\overrightarrow {LG} (i,j)} \ne 0,\;\sum\limits_{j} {\overrightarrow {LD} (i,j)} = 0, \\
(1 - P_{LM} - P_{LD} )*\frac{{\overrightarrow {LL} (i,j)}}{{\sum\limits_{j} {\overrightarrow {LL} (i,j)} }} & if\;\sum\limits_{j} {\overrightarrow {LM} (i,j)} \ne 0,\;\sum\limits_{j} {\overrightarrow {LG} (i,j)} = 0,\;\sum\limits_{j} {\overrightarrow {LD} (i,j)} \ne 0, \\
(1 - P_{LG} - P_{LD} )*\frac{{\overrightarrow {LL} (i,j)}}{{\sum\limits_{j} {\overrightarrow {LL} (i,j)} }} & if\;\sum\limits_{j} {\overrightarrow {LM} (i,j)} = 0,\;\sum\limits_{j} {\overrightarrow {LG} (i,j)} \ne 0,\;\sum\limits_{j} {\overrightarrow {LD} (i,j)} \ne 0, \\
(1 - P_{LM} - P_{LG} - P_{LD} )*\frac{{\overrightarrow {LL} (i,j)}}{{\sum\limits_{j} {\overrightarrow {LL} (i,j)} }} & if\;\sum\limits_{j} {\overrightarrow {LM} (i,j)} \ne 0,\;\sum\limits_{j} {\overrightarrow {LG} (i,j)} \ne 0,\;\sum\limits_{j} {\overrightarrow {LD} (i,j)} \ne 0.
\end{array} \right.$$
(27)

where \(P_{LM}\) (\(P_{LG}\), \(P_{LD}\)) is the parameter representing the transition probability from the lncRNA similarity network to the miRNA (gene, disease) similarity network, with a value between 0 and 1. In addition, \(P_{LM} = P_{ML}\), \(P_{LG} = P_{GL}\), \(P_{LD} = P_{DL}\), \(P_{MG} = P_{GM}\), \(P_{MD} = P_{DM}\) and \(P_{GD} = P_{DG}\). The other intra-transition and inter-transition matrices are defined similarly. Applying the Laplacian normalization, all elements of the probability transition matrix W can be obtained. \(P^{0}\) is calculated as follows:

$$P^{0} = \left( {\begin{array}{*{20}c} {P_{L} *U_{L0} } \\ {P_{M} *U_{M0} } \\ {P_{G} *U_{G0} } \\ {(1 - P_{L} - P_{M} - P_{G} )*U_{D0} } \\ \end{array} } \right).$$
(28)

Here the parameters \(P_{L}\), \(P_{M}\), \(P_{G}\) and \(1 - P_{L} - P_{M} - P_{G}\) represent the importance of the lncRNA, miRNA, gene and disease similarity networks, respectively; each takes a value between 0 and 1. \(U_{L0}\) is the initial probability vector of the lncRNA similarity network: equal probabilities are assigned to all seed nodes in the lncRNA similarity network, and the entries of \(U_{L0}\) sum to 1. The initial probabilities \(U_{M0}\) and \(U_{G0}\) are defined in the same way. \(U_{D0}\) is the initial probability vector of the disease similarity network: for the query disease d, the initial probability of d is 1 and that of every other disease is 0.

Finally, the Laplace normalized random walk with restart algorithm is used to compute the scores of candidate lncRNAs (see Fig. 3). The method is called LRWRHLDA (Laplace normalized random walk with restart algorithm in heterogeneous networks for predicting lncRNA-disease associations).
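The stacked vector in Eq. (28) can be assembled as sketched below. The helper names, the dictionary of seed indices and the network sizes are hypothetical; the weights 0.4, 0.1 and 0.1 are the settings reported in the Results. Treating the seeds of each layer as the nodes known to be associated with the query disease is an assumption of this example, since the text above only refers to them as seed nodes.

```python
import numpy as np

def uniform_over_seeds(n_nodes, seeds):
    """Equal probability on every seed node, zero elsewhere; sums to 1 if seeds exist."""
    u = np.zeros(n_nodes)
    seeds = list(seeds)
    if seeds:
        u[seeds] = 1.0 / len(seeds)
    return u

def initial_vector(sizes, seeds, disease, p_l=0.4, p_m=0.1, p_g=0.1):
    """Eq. (28): weighted lncRNA, miRNA, gene and disease blocks, stacked in that order."""
    n_l, n_m, n_g, n_d = sizes
    u_d = np.zeros(n_d)
    u_d[disease] = 1.0                               # the query disease gets all the mass
    return np.concatenate([
        p_l * uniform_over_seeds(n_l, seeds["lncRNA"]),
        p_m * uniform_over_seeds(n_m, seeds["miRNA"]),
        p_g * uniform_over_seeds(n_g, seeds["gene"]),
        (1.0 - p_l - p_m - p_g) * u_d,
    ])

# Toy sizes (4 lncRNAs, 3 miRNAs, 5 genes, 2 diseases) and made-up seed indices.
toy_seeds = {"lncRNA": [0, 2], "miRNA": [1], "gene": [0, 3, 4]}
toy_P0 = initial_vector((4, 3, 5, 2), toy_seeds, disease=1)
```

When every layer has at least one seed, the blocks sum to 1 in total, so the vector can be used directly as the restart distribution.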

Results

Performance evaluation

In this paper, ten-fold cross validation is used to evaluate the performance of our model. All known lncRNA-disease interactions are randomly divided into ten folds. In each experiment, nine folds are used as training samples and the remaining fold is used as test samples. After each run, predicted scores are generated, and the test samples are ranked together with the unknown lncRNA-disease interactions. A test sample is counted as a true positive (TP) when its predicted relevance score is greater than the threshold; otherwise it is counted as a false negative (FN). Similarly, an unknown lncRNA-disease interaction is counted as a false positive (FP) when its predicted relevance score is greater than the threshold; otherwise it is counted as a true negative (TN). The true positive rate (TPR), the false positive rate (FPR), recall and precision are then calculated as follows:

$$TPR = recall = \frac{TP}{{TP + FN}},$$
(29)
$$FPR = \frac{FP}{{FP + TN}},$$
(30)
$$precision = \frac{TP}{{TP + FP}}.$$
(31)
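For a single threshold, Eqs. (29) to (31) reduce to counting four cases; sweeping the threshold over all scores traces the ROC curve, whose area can be approximated with the trapezoidal rule. The sketch below is an illustration on made-up scores and is not the evaluation code used in the paper. Returning a precision of 1 when nothing is predicted positive is a convention assumed here.

```python
import numpy as np

def rates(scores, labels, threshold):
    """TPR, FPR and precision at one threshold (Eqs. 29 to 31)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pred = scores > threshold
    tp = int(np.sum(pred & labels))
    fn = int(np.sum(~pred & labels))
    fp = int(np.sum(pred & ~labels))
    tn = int(np.sum(~pred & ~labels))
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 1.0   # assumed convention for no positives
    return tpr, fpr, precision

def roc_auc(scores, labels):
    """Area under the ROC curve by sweeping thresholds from high to low."""
    cuts = np.concatenate(([np.inf], np.unique(scores)[::-1], [-np.inf]))
    points = [rates(scores, labels, c)[:2] for c in cuts]
    tpr = np.array([p[0] for p in points])
    fpr = np.array([p[1] for p in points])
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy scores: label 1 marks a held-out known association, 0 an unknown pair.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_rates = rates(toy_scores, toy_labels, threshold=0.5)
toy_auc = roc_auc(toy_scores, toy_labels)
```

On this toy input three of the four positive-negative pairs are ranked correctly, which matches the area of 0.75 obtained from the sweep.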

Finally, the receiver operating characteristic (ROC) curve and the precision-recall (PR) curve are drawn, as shown in Fig. 4. The area under the ROC curve (AUC) and the area under the PR curve (AUPR) are used to evaluate the performance of our method. Both AUC and AUPR range from 0 to 1. With the parameters set to \(P_{LM} = P_{LG} = P_{LD} = P_{MG} = P_{MD} = P_{GD} = 0.2\), \(P_{L} = 0.4\), \(P_{M} = 0.1\), \(P_{G} = 0.1\) and λ = 0.7, the results of the ten experiments are shown in Table 2.

Fig. 4

The performance of LRWRHLDA by ten-fold cross validation. A The average ROC curve. B The average PR curve

Table 2 The AUC (AUPR) value for each experiment and mean AUC (AUPR) value

Comparison with different predicted methods using ten-fold cross validation

To compare with other models, the data in this paper were also applied to the BPLLDA model [26], the RWRlncD model [27], the GrWLDA model [28] and the MHRWR model [29].

The ROC curves of LRWRHLDA, RWRlncD, GrWLDA, BPLLDA and MHRWR under ten-fold cross validation are plotted in Fig. 5.

Fig. 5

The ROC curve and AUC of LRWRHLDA, RWRlncD, GrWLDA, BPLLDA and MHRWR in predicting lncRNA-disease associations by the ten-fold cross validation

As can be seen, LRWRHLDA has an AUC of 0.98402 and outperforms RWRlncD (0.53625), GrWLDA (0.83276), BPLLDA (0.87148) and MHRWR (0.97169). In summary, LRWRHLDA performs better than the other models in lncRNA-disease association prediction.

The area under the PR curve (AUPR) is also used to evaluate the LRWRHLDA model, the BPLLDA model [26], the RWRlncD model [27], the GrWLDA model [28] and the MHRWR model [29], to avoid overestimating the performance of these methods (see Fig. 6).

Fig. 6 The PR curves and AUPR values of LRWRHLDA, RWRlncD, GrwLDA, BPLLDA and MHRWR in predicting lncRNA-disease associations by ten-fold cross validation

It can be seen from Fig. 6 that the AUPR value of LRWRHLDA is also higher than those of the other models.

Effects of parameters

There are ten parameters in our model: the transition probabilities between networks, PLM, PLG, PLD, PMG, PMD and PGD; the subnet weights PL, PM and PG; and the restart probability λ. Because of the large number of parameters and our limited computing resources, we arbitrarily fixed nine of these parameters and only examined the impact of the restart probability λ using ten-fold cross validation. The results are shown in Table 3. As can be seen, in terms of AUC, the parameter λ has little influence on the performance of LRWRHLDA when λ = 0.7. In terms of AUPR, the maximum value is reached when λ is equal to 0.9. Taken together, the results in Table 3 show that the restart probability λ does affect the performance of our model.
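
To make the role of λ concrete, the sketch below runs a random walk with restart on a single small symmetric network. It is a simplified illustration only: it assumes the common update p(t+1) = (1 − λ)·W·p(t) + λ·p(0) and the symmetric normalization W = D^(−1/2)·A·D^(−1/2) as one usual form of Laplace normalization, and it omits the multi-layer heterogeneous network, the transition probabilities and the subnet weights of LRWRHLDA. The function names and the toy graph are hypothetical.

```python
import numpy as np

def laplace_normalize(adj):
    """Symmetric normalization D^(-1/2) A D^(-1/2) of a non-negative matrix."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])  # isolated nodes stay 0
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]

def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until it stabilizes."""
    w = laplace_normalize(adj)
    p0 = np.zeros(w.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)   # equal initial probability on seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (w @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:   # L1 change below tolerance
            return p_next
        p = p_next
    return p

# Toy path graph 0 - 1 - 2 - 3 with node 0 as the only seed.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
stable = random_walk_with_restart(path, seeds=[0], restart=0.7)
```

A larger λ keeps the walker closer to the seed nodes, so scores concentrate on their immediate neighbourhood; a smaller λ lets information spread further through the network.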

Table 3 The AUC and AUPR values when λ takes different values from 0.1 to 0.9, with the other parameters fixed

Case study

Case studies on predicted lncRNA-disease associations

It is known that lncRNAs play critical roles in the development of many diseases. To evaluate the ability of LRWRHLDA to infer potential lncRNA-disease associations, we use all known lncRNA-disease associations in LD as training data and then assess the associations predicted by our model.

The stable probability \(P^{\infty }\) can be used as a measure of proximity to the seed lncRNAs. If \(P^{\infty }\) (lncRNA i) > \(P^{\infty }\) (lncRNA j), then lncRNA i is closer to the seed lncRNAs than lncRNA j in the lncRNA similarity network. As a result, all candidate lncRNAs can be ranked according to \(P^{\infty }\), and the top-ranked lncRNAs can be expected to have a high probability of being associated with the disease of interest. The novel lncRNA-disease associations are therefore ranked according to the stable probability produced by LRWRHLDA. To validate the predictions, we use the literature or the following databases: LncRNADisease [30], LncRNADisease v2.0 [31], MNDR v3.1 [34] and lnCAR [56]. Specifically, we list the top 10 lncRNAs associated with four diseases: colorectal cancer, lung adenocarcinoma, stomach cancer and breast cancer. The top 10 results according to \(P^{\infty }\) are shown in Table 4 (for detailed results, see Additional file 1: Table-S1).

Table 4 The top 10 potential lncRNAs predicted for four cancers by LRWRHLDA

Colorectal cancer is the third most common cancer diagnosed in the US. While the incidence and mortality rate of colorectal cancer have decreased due to effective cancer screening measures, the number of young patients diagnosed with colon cancer has increased for reasons that are not yet clear [57]. Lung adenocarcinoma is one of the main types of lung cancer and belongs to the non-small cell carcinomas. It occurs mainly in women and non-smokers [58]. Stomach cancer is the fifth most common cancer and the third most common cause of cancer death globally [59]. The vast majority of stomach cancers are adenocarcinomas, with no obvious symptoms in the early stage. The symptoms are often similar to those of chronic gastric diseases such as gastritis and gastric ulcers, and are easily ignored. Moreover, the current early diagnosis rate of stomach cancer is still low. Breast cancer is a malignant tumor that occurs in the epithelial tissue of the breast. At present, breast cancer has become a major public health problem, and its cause is not yet fully understood. Worldwide, breast cancer is an important cause of human suffering and premature mortality among women [60].

In Table 4, apart from the lncRNA-disease associations that already exist in the databases, six potential lncRNA-disease associations were confirmed in the literature: ENST00000535511-colorectal cancer, RP4-colorectal cancer, CTNNAP1-colorectal cancer, LINC01021-colorectal cancer, GMDS-AS1-lung adenocarcinoma and LINC01207-lung adenocarcinoma. These results demonstrate the predictive performance of the proposed method.

Case studies on predicted novel diseases and novel lncRNAs

Each disease is in turn treated as a novel disease: all of its known related lncRNAs are removed, and the model is then used to predict potential lncRNAs related to the disease. All candidate lncRNAs are ranked according to \(P^{\infty }\), and lncRNAs with high scores are expected to be potentially related to the investigated disease d. The top 10 results according to \(P^{\infty }\) are listed in Table 5 (for detailed results, see Additional file 2: Table-S2).
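
The novel-disease setting described above amounts to two small operations: hiding every known association of the chosen disease, and ranking the lncRNAs by their resulting scores. The sketch below illustrates both under stated assumptions: the association matrix has lncRNAs as rows and diseases as columns, the scores stand in for \(P^{\infty }\) values produced by some prediction model, and the names `mask_disease`, `top_k_lncrnas` and the toy identifiers are hypothetical.

```python
import numpy as np

def mask_disease(ld, disease_idx):
    """Return a copy of the lncRNA-disease matrix with one disease made 'novel'.

    Rows are lncRNAs and columns are diseases (assumed layout). Every known
    association of the chosen disease is set to 0; the input is not modified.
    """
    masked = np.array(ld, dtype=float, copy=True)
    masked[:, disease_idx] = 0.0
    return masked

def top_k_lncrnas(scores, names, k=10):
    """Rank lncRNAs by descending score and return the names of the top k."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    return [names[i] for i in order[:k]]

# Toy data: three lncRNAs, two diseases; disease 1 is treated as novel.
ld = np.array([[1, 0],
               [0, 1],
               [1, 1]])
ld_novel = mask_disease(ld, disease_idx=1)

# Placeholder scores standing in for the stable probabilities of the lncRNAs.
ranking = top_k_lncrnas([0.1, 0.5, 0.3], ["lnc_a", "lnc_b", "lnc_c"], k=2)
```

The novel-lncRNA setting is the mirror image: a row of the matrix is zeroed instead of a column, and diseases are ranked instead of lncRNAs.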

Table 5 The top 10 lncRNAs predicted by LRWRHLDA for four cancers treated as novel diseases

Analogously, the stable probability \(P^{\infty }\) can also be used as a measure of proximity to the seed diseases. All candidate diseases are ranked according to \(P^{\infty }\), and diseases with high scores are expected to be potentially related to the investigated lncRNA. To evaluate the ability of our model to make predictions for novel lncRNAs, we analyzed two lncRNAs, H19 and HOTAIR. For each lncRNA, all of its known related diseases are removed before potential diseases are predicted. The top 10 results according to \(P^{\infty }\) are shown in Table 6 (for detailed results, see Additional file 3: Table-S3).

Table 6 The top 10 diseases predicted by LRWRHLDA for H19 and HOTAIR treated as novel lncRNAs

From Table 5, thirty-five of the forty top-ten lncRNA associations for the four cancers were validated by the databases or the literature. The other five cancer-lncRNA associations, colorectal cancer-CARL, stomach cancer-AF117829.1, breast cancer-AP003486.1, lung adenocarcinoma-AC018413.1 and lung adenocarcinoma-TUBB2A, have not yet been confirmed by the databases or the literature. This suggests that our method can predict additional lncRNA-disease associations.

From Table 6, in both cases, all of the top ten associated diseases were validated by the databases. In summary, LRWRHLDA achieves favorable performance in predicting novel disease-associated lncRNAs and novel lncRNA-associated diseases.

Discussion

Many studies have shown that lncRNAs have an important influence on the physiological processes of diseases. Because traditional biological experiments are time-consuming and costly, it is necessary to develop computational models to predict the associations between lncRNAs and diseases.

In this paper, a new model, LRWRHLDA, based on the Laplace normalized random walk with restart algorithm on a heterogeneous network, was constructed to predict potential lncRNA-disease associations. Ten-fold cross validation was applied to evaluate the prediction performance of our method. Compared with state-of-the-art prediction methods, our method achieves better performance in terms of AUC. Moreover, case studies of colorectal cancer, lung adenocarcinoma, stomach cancer and breast cancer further demonstrate that it could be a useful method for predicting potential relationships between lncRNAs and diseases.

However, our method has some limitations. Firstly, since the model has ten parameters, their selection and adjustment remain difficult. Secondly, because our model is based on four networks, the global network contains a large number of nodes, and the more nodes there are, the longer the random walk takes. In the future, we will continue to improve the model.

Conclusion

In this study, we proposed an effective method, LRWRHLDA, which is based on the Laplace normalized random walk with restart algorithm on a heterogeneous network, to predict potential lncRNA-disease associations. First, a heterogeneous network based on the lncRNA, disease, miRNA and gene similarity networks and their association networks was constructed. Then, the probability transition matrix was calculated by Laplace normalization. Finally, the potential lncRNA-disease associations were predicted by the random walk with restart over the heterogeneous network. Furthermore, LRWRHLDA can predict lncRNAs related to isolated diseases (and diseases related to isolated lncRNAs). Our method was evaluated by ten-fold cross validation and case studies, and compared with other methods. The results show that our method has higher prediction accuracy.

Availability of data and materials

The datasets supporting the conclusions of this article are included within the article and its additional files. The code (executable code and source code) and data for this study are available at https://github.com/wang-124/LRWRHLDA.git.

Abbreviations

lncRNAs: Long non-coding RNAs

miRNA: MicroRNA

LRWRHLDA: Prediction of the potential lncRNA-disease associations based on Laplace normalized random walk with restart algorithm in heterogeneous networks

ROC: Receiver operating characteristic

TPR: True positive rate

FPR: False positive rate

AUC: Area under the ROC curve

PR: Precision-recall

AUPR: Area under the precision-recall curve

References

1. Moreau Y, Tranchevent LC. Computational tools for prioritizing candidate genes: boosting disease gene discovery. Nat Rev Genet. 2012;13(8):523–36.
2. Rupaimoole R, Slack FJ. MicroRNA therapeutics: towards a new era for the management of cancer and other diseases. Nat Rev Drug Discov. 2017;16(3):203–22.
3. Bhan A, Soleimani M, Mandal SS. Long noncoding RNA and cancer: a new paradigm. Cancer Res. 2017;77(15):3965–81.
4. Dai LY, Liu JX, Zhu R, Wang J, Yuan SS. Logistic weighted profile-based bi-random walk for exploring MiRNA-disease associations. J Comput Sci Technol. 2021;36(2):276–87.
5. Jarroux J, Morillon A, Pinskaya M. History, discovery, and classification of lncRNAs. Adv Exp Med Biol. 2017;1008:1–46.
6. Li J, Li Z, Zheng W, Li X, Wang Z, Cui Y, et al. LncRNA-ATB: an indispensable cancer-related long noncoding RNA. Cell Prolif. 2017;50(6):e12381.
7. Lu TX, Rothenberg ME. MicroRNA. J Allergy Clin Immunol. 2018;141(4):1202–7.
8. Geisler S, Coller J. RNA in unexpected places: long non-coding RNA functions in diverse cellular contexts. Nat Rev Mol Cell Biol. 2013;14(11):699–712.
9. Ma L, Bajic VB, Zhang Z. On the classification of long non-coding RNAs. RNA Biol. 2013;10(6):925–33.
10. Li Z, Ho IHT, Li X, Xu D, Wu WKK, Chan MTV, et al. Long non-coding RNAs in the spinal cord injury: novel spotlight. J Cell Mol Med. 2019;23(8):4883–90.
11. Xue X, Yang YA, Zhang A, Fong KW, Kim J, Song B, et al. LncRNA HOTAIR enhances ER signaling and confers tamoxifen resistance in breast cancer. Oncogene. 2016;35(21):2746–55.
12. Gupta RA, Shah N, Wang KC, et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature. 2010;464(7291):1071–6.
13. Ji D, Zhong X, Jiang X, Leng K, Xu Y, Li Z, et al. The role of long non-coding RNA AFAP1-AS1 in human malignant tumors. Pathol Res Pract. 2018;214(10):1524–31.
14. Wang J, Su Z, Lu S, Fu W, Liu Z, Jiang X, et al. LncRNA HOXA-AS2 and its molecular mechanisms in human cancer. Clin Chim Acta. 2018;485:229–33.
15. Zhao Y, Xu J. Synovial fluid-derived exosomal lncRNA PCGEM1 as biomarker for the different stages of osteoarthritis. Int Orthop. 2018;42(12):2865–72.
16. Faghihi MA, Modarresi F, Khalil AM, Wood DE, Sahagan BG, Morgan TE, et al. Expression of a noncoding RNA is elevated in Alzheimer’s disease and drives rapid feed-forward regulation of beta-secretase. Nat Med. 2008;14(7):723–30.
17. Hu Q, Wang YB, Zeng P, Yan GQ, Xin L, Hu XY. Expression of long non-coding RNA (lncRNA) H19 in immunodeficient mice induced with human colon cancer cells. Eur Rev Med Pharmacol Sci. 2016;20(23):4880–4.
18. Chen X, Yan GY. Novel human lncRNA-disease association inference based on lncRNA expression profiles. Bioinformatics. 2013;29(20):2617–24.
19. Zhao T, Xu J, Liu L, Bai J, Xu C, Xiao Y, et al. Identification of cancer-related lncRNAs through integrating genome, regulome and transcriptome features. Mol Biosyst. 2015;11(1):126–36.
20. Yuan Q, Guo X, Ren Y, Wen X, Gao L. Cluster correlation based method for lncRNA-disease association prediction. BMC Bioinform. 2020;21(1):180.
21. Zhu R, Wang Y, Liu JX, Dai LY. IPCARF: improving lncRNA-disease association prediction using incremental principal component analysis feature selection and a random forest classifier. BMC Bioinform. 2021;22(1):175.
22. Li Y, Li J, Bian N. DNILMF-LDA: prediction of lncRNA-disease associations by dual-network integrated logistic matrix factorization and bayesian optimization. Genes (Basel). 2019;10(8):608.
23. Liu JX, Cui Z, Gao YL, Kong XZ. WGRCMF: a weighted graph regularized collaborative matrix factorization method for predicting novel LncRNA-disease associations. IEEE J Biomed Health Inform. 2021;25(1):257–65.
24. Liu JX, Gao MM, Cui Z, Gao YL, Li F. DSCMF: prediction of LncRNA-disease associations based on dual sparse collaborative matrix factorization. BMC Bioinform. 2021;22(Suppl 3):241.
25. Gao MM, Cui Z, Gao YL, Wang J, Liu JX. Multi-label fusion collaborative matrix factorization for predicting LncRNA-disease associations. IEEE J Biomed Health Inform. 2021;25(3):881–90.
26. Xiao X, Zhu W, Liao B, Xu J, Gu C, Ji B, et al. BPLLDA: predicting lncRNA-disease associations based on simple paths with limited lengths in a heterogeneous network. Front Genet. 2018;9:411.
27. Sun J, Shi H, Wang Z, Zhang C, Liu L, Wang L, et al. Inferring novel lncRNA-disease associations based on a random walk model of a lncRNA functional similarity network. Mol Biosyst. 2014;10(8):2074–81.
28. Gu C, Liao B, Li X, Cai L, Li Z, Li K, et al. Global network random walk for predicting potential human lncRNA-disease associations. Sci Rep. 2017;7(1):12442.
29. Zhao X, Yang Y, Yin M. MHRWR: prediction of lncRNA-disease associations based on multiple heterogeneous networks. IEEE/ACM Trans Comput Biol Bioinform. 2020;PP.
30. Chen G, Wang Z, Wang D, Qiu C, Liu M, Chen X, et al. LncRNADisease: a database for long-non-coding RNA-associated diseases. Nucleic Acids Res. 2013;41(Database issue):D983–6.
31. Bao Z, Yang Z, Huang Z, Zhou Y, Cui Q, Dong D. LncRNADisease 2.0: an updated database of long non-coding RNA-associated diseases. Nucleic Acids Res. 2019;47(D1):D1034–7.
32. Zhou B, Ji B, Liu K, Hu G, Wang F, Chen Q, et al. EVLncRNAs 2.0: an updated database of manually curated functional long non-coding RNAs validated by low-throughput experiments. Nucleic Acids Res. 2021;49(D1):D86–91.
33. Gao Y, Shang S, Guo S, Li X, Zhou H, Liu H, et al. Lnc2Cancer 3.0: an updated resource for experimentally supported lncRNA/circRNA cancer associations and web tools based on RNA-seq and scRNA-seq data. Nucleic Acids Res. 2021;49(D1):D1251–8.
34. Ning L, Cui T, Zheng B, Wang N, Luo J, Yang B, et al. MNDR v3.0: mammal ncRNA-disease repository with increased coverage and annotation. Nucleic Acids Res. 2021;49(D1):D160–4.
35. Paraskevopoulou MD, Georgakilas G, Kostoulas N, Reczko M, Maragkakis M, Dalamagas TM, et al. DIANA-LncBase: experimentally verified and computationally predicted microRNA targets on long non-coding RNAs. Nucleic Acids Res. 2013;41(Database issue):D239–45.
36. Wang P, Li X, Gao Y, Guo Q, Wang Y, Fang Y, et al. LncACTdb 2.0: an updated database of experimentally supported ceRNA interactions curated from low- and high-throughput experiments. Nucleic Acids Res. 2019;47(D1):D121–7.
37. Jeggari A, Marks DS, Larsson E. MiRcode: a map of putative microRNA target sites in the long non-coding transcriptome. Bioinformatics. 2012;28(15):2062–3.
38. Li JH, Liu S, Zhou H, Qu LH, Yang JH. StarBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res. 2014;42(Database issue):D92–7.
39. Cheng L, Wang P, Tian R, Wang S, Guo Q, Luo M, et al. LncRNA2Target v2.0: a comprehensive database for target genes of lncRNAs in human and mouse. Nucleic Acids Res. 2019;47(D1):D140–4.
40. Huang Z, Shi J, Gao Y, Cui C, Zhang S, Li J, et al. HMDD v3.0: a database for experimentally supported human microRNA-disease associations. Nucleic Acids Res. 2019;47(D1):D1013–7.
41. Jiang Q, Wang Y, Hao Y, Juan L, Teng M, Zhang X, et al. MiR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res. 2009;37(Database issue):D98–104.
42. Huang HY, Lin YC, Li J, Huang KY, Shrestha S, Hong HC, et al. MiRTarBase 2020: updates to the experimentally validated microRNA-target interaction database. Nucleic Acids Res. 2020;48(D1):D148–54.
43. Piñero J, Ramírez-Anguita JM, Saüch-Pitarch J, Ronzano F, Centeno E, Sanz F, et al. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res. 2020;48(D1):D845–55.
44. Wang Z, Monteiro CD, Jagodnik KM, Fernandez NF, Gundersen GW, Rouillard AD, et al. Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd. Nat Commun. 2016;7:12846.
45. Pletscher-Frankild S, Pallejà A, Tsafou K, Binder JX, Jensen LJ. DISEASES: text mining and data integration of disease-gene associations. Methods. 2015;74:83–9.
46. Schriml LM, Mitraka E, Munro J, Tauber B, Schor M, Nickle L, et al. Human Disease Ontology 2018 update: classification, content and workflow expansion. Nucleic Acids Res. 2019;47(D1):D955–62.
47. Wang D, Wang J, Lu M, Song F, Cui Q. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics. 2010;26(13):1644–50.
48. Li J, Gong B, Chen X, Liu T, Wu C, Zhang F, et al. DOSim: an R package for similarity between diseases based on disease ontology. BMC Bioinform. 2011;12:266.
49. The Gene Ontology Consortium. The gene ontology resource: 20 years and still GOing strong. Nucleic Acids Res. 2019;47(D1):D330–8.
50. Wang JZ, Du Z, Payattakool R, Yu PS, Chen CF. A new method to measure the semantic similarity of GO terms. Bioinformatics. 2007;23(10):1274–81.
51. Laarhoven TV, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug-target interaction. Bioinformatics. 2011;27(21):3036–43.
52. Ganegoda GU, Li M, Wang W, Feng Q. Heterogeneous network model to infer human disease-long intergenic non-coding RNA associations. IEEE Trans Nanobiosci. 2015;14(2):175–83.
53. Li Y, Patra JC. Genome-wide inferring gene–phenotype relationship by walking on the heterogeneous network. Bioinformatics. 2010;26(9):1219–24.
54. Wen Y, Han G, Anh VV. Laplacian normalization and bi-random walks on heterogeneous networks for predicting lncRNA-disease associations. BMC Syst Biol. 2018;12(Suppl 9):122.
55. Zhao ZQ, Han GS, Yu ZG, Li J. Laplacian normalization and random walk on heterogeneous networks for disease-gene prioritization. Comput Biol Chem. 2015;57:21–8.
56. Zheng Y, Xu Q, Liu M, Hu H, Xie Y, Zuo Z, et al. LnCAR: a comprehensive resource for lncRNAs from cancer arrays. Cancer Res. 2019;79(8):2076–83.
57. Thanikachalam K, Khan G. Colorectal cancer and nutrition. Nutrients. 2019;11(1):164.
58. Song Q, Shang J, Yang Z, Zhang L, Zhang C, Chen J, et al. Identification of an immune signature predicting prognosis risk of patients in lung adenocarcinoma. J Transl Med. 2019;17(1):70.
59. Smyth EC, Nilsson M, Grabsch HI, van Grieken NC, Lordick F. Gastric cancer. Lancet. 2020;396(10251):635–48.
60. Coughlin SS. Epidemiology of breast cancer in women. Adv Exp Med Biol. 2019;1152:9–29.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61772027, 61772028), key research and development plan of Zhejiang Province (2021C02039).

Funding

This research is partly sponsored by the National Natural Science Foundation of China (No. 61772027, 61772028), key research and development plan of Zhejiang Province (2021C02039). The funding bodies did not play any roles in the design of the study, in the collection, analysis, or interpretation of data, or in writing the manuscript.

Author information

Contributions

LW, MS, QD and PH designed the study. LW and MS carried out analyses and wrote the program. LW and PH wrote the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Ping-an He.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1. In this file we provide the stable probability results of lncRNA obtained when LRWRHLDA is run for the four cancers based on the LD matrix.

Additional file 2. In this file we provide the stable probability results of lncRNA obtained when LRWRHLDA is run after the lncRNAs related to each cancer are deleted.

Additional file 3. In this file we provide the stable probability results of lncRNA obtained when LRWRHLDA is run after the cancers related to each lncRNA are deleted.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Wang, L., Shang, M., Dai, Q. et al. Prediction of lncRNA-disease association based on a Laplace normalized random walk with restart algorithm on heterogeneous networks. BMC Bioinformatics 23, 5 (2022). https://doi.org/10.1186/s12859-021-04538-1

  • DOI: https://doi.org/10.1186/s12859-021-04538-1

Keywords

  • lncRNA-disease associations
  • Similarity network
  • Heterogeneous network
  • LRWRHLDA
  • Ten-fold cross validation
  • AUC