A novel target convergence set based random walk with restart for prediction of potential LncRNA-disease associations

Li, Jiechen; Li, Xueyong; Feng, Xiang; Wang, Bing; Zhao, Bihai; Wang, Lei

doi:10.1186/s12859-019-3216-4

Research article
Open access
Published: 03 December 2019

Jiechen Li^1,2,
Xueyong Li¹,
Xiang Feng^1,2,
Bing Wang³,
Bihai Zhao² &
…
Lei Wang ORCID: orcid.org/0000-0002-5065-8447^1,2

BMC Bioinformatics volume 20, Article number: 626 (2019)

Abstract

Background

In recent years, lncRNAs (long non-coding RNAs) have been shown to be closely related to the occurrence and development of many serious human diseases. However, most lncRNA-disease associations remain undiscovered because traditional biological experiments are costly and time-consuming. Hence, efficient and reasonable computational models are urgently needed to predict potential associations between lncRNAs and diseases.

Results

In this manuscript, a novel prediction model called TCSRWRLD is proposed to predict potential lncRNA-disease associations based on an improved random walk with restart. In TCSRWRLD, a heterogeneous lncRNA-disease network is first constructed by combining the integrated similarity of lncRNAs and the integrated similarity of diseases. Then, each lncRNA/disease node in the heterogeneous network establishes a node set called TCS (Target Convergence Set), which consists of the top 100 disease/lncRNA nodes with the minimum average network distance to the disease/lncRNA nodes that have known associations with it. Finally, an improved random walk with restart is run on the heterogeneous network to infer potential lncRNA-disease associations. The major contribution of this manuscript is the concept of TCS, which speeds up the convergence of TCSRWRLD: the walker can stop its random walk once the walking probability vectors at the nodes in the TCS, rather than at all nodes in the whole network, have reached a stable state. Simulation results show that TCSRWRLD achieves an AUC of 0.8712 in Leave-One-Out Cross Validation (LOOCV), which is higher than the AUCs of the compared state-of-the-art models on the same dataset. Moreover, case studies of lung cancer and leukemia also demonstrate the satisfactory prediction performance of TCSRWRLD.

Conclusions

Both the comparative results and the case studies demonstrate that TCSRWRLD performs well in predicting potential lncRNA-disease associations, which suggests that TCSRWRLD may be a useful addition to future bioinformatics research.

Background

For many years, the genetic information of organisms was considered to be stored only in protein-coding genes, and RNAs were thought to be merely intermediaries in the process by which DNA encodes proteins [1, 2]. However, recent studies have shown that protein-coding genes account for only a small part (less than 2%) of the human genome, and that more than 98% of the human genome does not encode proteins and yields a large number of ncRNAs (non-coding RNAs) [3, 4]. In addition, as the complexity of biological organisms increases, so does the importance of ncRNAs in biological processes [5, 6]. Generally, ncRNAs can be divided into two major categories, small ncRNAs and long ncRNAs (lncRNAs), according to the length of the transcript: small ncRNAs consist of fewer than 200 nucleotides and include microRNAs, transfer RNAs and so on, whereas lncRNAs consist of more than 200 nucleotides [7,8,9]. In 1990, the first two lncRNAs, H19 and Xist, were discovered through gene mapping. Since gene mapping is extremely time-consuming and labor-intensive, research on lncRNAs progressed at a relatively slow pace for a long time [10, 11]. In recent years, with the rapid development of high-throughput gene sequencing technologies, more and more lncRNAs have been found in eukaryotes and other species [12, 13]. Moreover, studies have shown that lncRNAs play important roles in various physiological processes such as cell differentiation and death, epigenetic regulation and so on [8, 14, 15]. At the same time, growing evidence has further shown that lncRNAs are closely linked to diseases that pose a serious threat to human health [16,17,18], which means that lncRNAs may be used as potential biomarkers in disease treatment in the future [19].

With the discovery of a large number of new lncRNAs, many lncRNA-related databases such as LncRNADisease [20], lncRNAdb [21], NONCODE [22] and Lnc2Cancer [23] have been established. However, the number of known associations between lncRNAs and diseases in these databases is still very limited, owing to the high cost and time consumption of traditional biological experiments. Thus, it is meaningful to develop mathematical models that predict potential lncRNA-disease associations quickly and on a large scale. Based on the assumption that similar diseases tend to be associated with similar lncRNAs [24, 25], a good number of computational models for inferring potential lncRNA-disease associations have been proposed. For instance, Chen et al. proposed a computational model called LRLSLDA [26] for predicting potential lncRNA-disease associations by adopting Laplacian regularized least squares. Ping and Wang et al. constructed a prediction model that extracts feature information from bipartite interactive networks [27]. Zhao and Wang et al. developed a computational model based on the Distance Correlation Set to uncover potential lncRNA-disease associations by integrating known associations among three kinds of nodes (disease nodes, miRNA nodes and lncRNA nodes) into a complex network [28]. Chen et al. proposed an lncRNA-disease association prediction model based on a heterogeneous network by considering the influence of the path length between nodes on the similarity of nodes in the heterogeneous network [29,30,31]. In addition, a network traversal method called RWR (Random Walk with Restart) has recently emerged in computational biology, with applications including the prediction of potential miRNA-disease associations [32, 33], drug-target associations [34] and lncRNA-disease associations [35,36,37].

Inspired by the ideas in the literature above, in this paper a computational model called TCSRWRLD is proposed to discover potential lncRNA-disease associations. In TCSRWRLD, a heterogeneous network is first constructed by combining known lncRNA-disease associations with the lncRNA integrated similarity and the disease integrated similarity, which overcomes a drawback of traditional RWR-based approaches, namely that they cannot start the walking process when there are no known lncRNA-disease associations. Then, each node in the heterogeneous network establishes its own TCS (Target Convergence Set) according to network distance, which reflects the specificity of different nodes in the walking process and makes the prediction more accurate and less time-consuming. Moreover, for a given walker, when its TCS has reached the final convergence state, there may still be some nodes that are not included in its TCS but are actually associated with it; to ensure that nothing is omitted from the prediction results, each node in the heterogeneous network further establishes its own GS (Global Set) as well. Finally, to evaluate the prediction performance of TCSRWRLD, cross validation is implemented based on the known lncRNA-disease associations downloaded from the LncRNADisease database (2017 version), and TCSRWRLD achieves AUCs of 0.8323, 0.8597, 0.8665 and 0.8712 under the frameworks of 2-fold CV, 5-fold CV, 10-fold CV and LOOCV respectively. In addition, case studies of leukemia and lung cancer show that 5 and 7 of the top 10 predicted lncRNAs have been confirmed by recent evidence to be associated with leukemia and lung cancer respectively, which further supports the prediction performance of TCSRWRLD.

Results

In order to verify the performance of TCSRWRLD in predicting potential lncRNA-disease associations, LOOCV, 2-fold CV, 5-fold CV and 10-fold CV were implemented on TCSRWRLD. Then, based on the 2017 version of the dataset downloaded from the LncRNADisease database, we obtained the Precision-Recall curve (P-R curve) of TCSRWRLD. In addition, based on the 2017 version of the dataset downloaded from the LncRNADisease database and the 2016 version of the dataset downloaded from the Lnc2Cancer database, we compared TCSRWRLD with the state-of-the-art prediction models KATZLDA, PMFILDA [38] and Ping’s model. After that, we analyzed the influence of key parameters on the prediction performance of TCSRWRLD. Finally, case studies of leukemia and lung cancer were performed to validate the feasibility of TCSRWRLD.

Cross validation

In this section, the ROC (Receiver Operating Characteristic) curve and the AUC (Area Under the ROC Curve) are adopted to measure the performance of TCSRWRLD in different cross validations. Here, TPR (True Positive Rate, or sensitivity) is the proportion of held-out known lncRNA-disease associations whose scores are higher than a given score cutoff, and FPR (False Positive Rate, or 1 - specificity) is the proportion of candidate associations without known evidence whose scores are also higher than that cutoff; the ROC curve is obtained by connecting the corresponding pairs of TPR and FPR on the graph. As illustrated in Fig. 1, simulation results show that TCSRWRLD achieves AUCs of 0.8323, 0.8597, 0.8665 and 0.8712 in the frameworks of 2-fold CV, 5-fold CV, 10-fold CV and LOOCV respectively, which indicates that TCSRWRLD performs well in predicting potential lncRNA-disease associations.
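As a rough illustration of how an AUC value is obtained from prediction scores, the sketch below ranks candidates by score and integrates the ROC curve with the trapezoidal rule. It uses the standard definitions of TPR and FPR; the function name `roc_auc`, the toy scores and the assumption that all scores are distinct are illustrative and not taken from the paper.

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve; labels are 1 for known associations, 0 otherwise.

    Assumes distinct scores (ties are not given special treatment).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    ranked = labels[np.argsort(-scores, kind="stable")]  # highest score first
    # Lowering the cutoff one candidate at a time traces out the curve.
    tpr = np.concatenate([[0.0], np.cumsum(ranked) / ranked.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1.0 - ranked) / (1.0 - ranked).sum()])
    # Trapezoidal rule over the (FPR, TPR) points.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy example: two positives and two negatives, one positive ranked below a negative.
auc = roc_auc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(auc)
```

In the toy example, three of the four positive/negative pairs are ordered correctly, which matches the pairwise interpretation of the AUC.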

Moreover, in order to further estimate the prediction performance of TCSRWRLD, we also obtain the P-R curve of TCSRWRLD. Unlike the AUC, the AUPR (Area Under the Precision-Recall curve) summarizes the precision, i.e. the ratio of true positives to all positive predictions, at every given recall rate. As illustrated in Fig. 2, simulation results show that TCSRWRLD achieves an AUPR of 0.5007.

Comparison with other related methods

The results above show that TCSRWRLD achieves satisfactory prediction performance. In this section, we compare TCSRWRLD with several classical prediction models to further demonstrate its performance. First, based on the 2017 version of the dataset downloaded from the LncRNADisease database, we compare TCSRWRLD with the state-of-the-art models KATZLDA, PMFILDA and Ping’s model. As shown in Fig. 3, TCSRWRLD achieves an AUC of 0.8712 in LOOCV, which is higher than the AUCs of 0.8257, 0.8702 and 0.8346 achieved by KATZLDA, Ping’s model and PMFILDA in LOOCV respectively.

Moreover, in order to examine whether TCSRWRLD performs well on a different dataset, we also adopt the 2016 version of the dataset downloaded from the Lnc2Cancer database, which consists of 98 human cancers, 668 lncRNAs and 1103 confirmed associations between them, to compare TCSRWRLD with KATZLDA, PMFILDA and Ping’s model. As illustrated in Fig. 4, TCSRWRLD achieves an AUC of 0.8475 in LOOCV, which is higher than the AUCs of 0.8204 and 0.8374 achieved by KATZLDA and PMFILDA respectively, but lower than the AUC of 0.8663 achieved by Ping’s model.

Analysis on effects of parameters

In TCSRWRLD, there are some key parameters, namely $ {\gamma}_l^{\prime } $, $ {\gamma}_d^{\prime } $ and ∂. As for $ {\gamma}_l^{\prime } $ and $ {\gamma}_d^{\prime } $ in Equation (11) and Equation (5) respectively, it is already known that the model achieves the best performance when the values of $ {\gamma}_l^{\prime } $ and $ {\gamma}_d^{\prime } $ are both set to 1 [39]. Hence, in order to estimate the effect of the key parameter ∂ on the prediction performance of TCSRWRLD, we vary ∂ from 0.1 to 0.9 and use the AUC in LOOCV as the basis for parameter selection in this section. As illustrated in Table 1, TCSRWRLD achieves the highest AUC in LOOCV when ∂ is set to 0.4. Moreover, the AUC remains stable across the different values of ∂, which means that TCSRWRLD is not sensitive to the value of ∂.

Table 1 AUCs achieved by TCSRWRLD in LOOCV while the parameter ∂ is set to different values from 0.1 to 0.9

Case studies

Cancer is considered one of the most dangerous diseases to human health because it is hard to treat [40]. At present, the incidence of various cancers is high not only in developing countries, where medical development is relatively backward, but also in developed countries, where the medical level is already very high. Hence, in order to further evaluate the performance of TCSRWRLD, case studies of two dangerous cancers, lung cancer and leukemia, are carried out in this section. The incidence of lung cancer has remained high in recent years, and the number of lung cancer deaths per year is about 1.8 million, which is the highest of any cancer type. Moreover, the five-year survival rate after the diagnosis of lung cancer is only about 15%, which is much lower than that of other cancers [41]. Recently, growing evidence has shown that lncRNAs play crucial roles in the occurrence and development of lung cancer [42]. As illustrated in Table 2, when TCSRWRLD is used to predict lung cancer-related lncRNAs, 7 of the top 10 predicted candidate lncRNAs have been confirmed by the latest experimental evidence. Additionally, leukemia, a blood-related cancer [43], has also been found to be closely related to a variety of lncRNAs in recent years. As illustrated in Table 2, when TCSRWRLD is used to predict leukemia-related lncRNAs, 5 of the top 10 predicted candidate lncRNAs have been confirmed by recent experimental results as well. Thus, these case studies suggest that TCSRWRLD may have great value in predicting potential lncRNA-disease associations.

Table 2 Evidences of top 10 potential leukemia-related lncRNAs and lung cancer-related lncRNAs predicted by TCSRWRLD

Discussion

Since it is very time-consuming and labor-intensive to verify associations between lncRNAs and diseases through traditional biological experiments, establishing computational models to infer potential lncRNA-disease associations has become a hot topic in bioinformatics, and such models can help researchers gain a deeper understanding of diseases at the lncRNA level. In this manuscript, a novel prediction model called TCSRWRLD is proposed. A heterogeneous network is first constructed by combining the disease integrated similarity, the lncRNA integrated similarity and known lncRNA-disease associations, which allows TCSRWRLD to overcome a shortcoming of traditional RWR-based prediction models, namely that the random walk process cannot be started when there are no known lncRNA-disease associations. Then, on the newly constructed heterogeneous network, a random walk based prediction model is designed using the concepts of TCS and GS. In addition, based on the 2017 version of the dataset downloaded from the LncRNADisease database, a variety of simulations have been implemented, and the results show that TCSRWRLD achieves AUCs of 0.8323, 0.8597, 0.8665 and 0.8712 under the frameworks of 2-fold CV, 5-fold CV, 10-fold CV and LOOCV respectively. Additionally, the case studies of lung cancer and leukemia also show that TCSRWRLD is reliable in predicting potential lncRNA-disease associations. The current version of TCSRWRLD still has some limitations. For example, its prediction performance could be further improved if more known lncRNA-disease associations were added to the experimental datasets. In addition, a more accurate MeSH database would help us obtain more accurate disease semantic similarity scores, which are also very important for the calculation of lncRNA functional similarity. These problems will be the focus of our future research.

Conclusion

In this paper, the main contributions are as follows: (1) A heterogeneous lncRNA-disease network is constructed by integrating three networks: the known lncRNA-disease association network, the disease-disease similarity network and the lncRNA-lncRNA similarity network. (2) Based on the newly constructed heterogeneous lncRNA-disease network, the concept of network distance is introduced to establish the TCS (Target Convergence Set) and GS (Global Set) for each node in the network. (3) Based on the concepts of TCS and GS, a novel random walk model is proposed to infer potential lncRNA-disease associations. (4) Through comparison with state-of-the-art prediction models and through case studies, TCSRWRLD is shown to perform well in uncovering potential lncRNA-disease associations.

Methods and materials

Known disease-lncRNA associations

First, we downloaded the 2017 version of the known lncRNA-disease associations from the LncRNADisease database (http://www.cuilab.cn/lncrnadisease). After removing duplicated associations and picking out the lncRNA-disease associations from the raw data, we obtained 1695 known lncRNA-disease associations (see Additional file 1) involving 828 different lncRNAs (see Additional file 2) and 314 different diseases (see Additional file 3). Hence, we can construct a 314 × 828 dimensional lncRNA-disease association adjacency matrix A, in which A(i, j) = 1 if and only if there is a known association between the disease d_i and the lncRNA l_j in the LncRNADisease database, and A(i, j) = 0 otherwise. For convenience of description, let N_L = 828 and N_D = 314, so that the dimension of the lncRNA-disease association adjacency matrix A can be represented as N_D × N_L. In the same way, we obtain a 98 × 668 dimensional lncRNA-cancer association adjacency matrix from the 2016 version of the known lncRNA-disease associations in the Lnc2Cancer database (see Additional file 4).
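The construction of A can be sketched as follows. This is a minimal illustration on toy data; the function name `build_association_matrix` and the disease and lncRNA labels are hypothetical and do not come from the database.

```python
import numpy as np

def build_association_matrix(pairs, diseases, lncrnas):
    """Return the N_D x N_L binary matrix A with A[i, j] = 1 for known pairs."""
    d_index = {d: i for i, d in enumerate(diseases)}
    l_index = {l: j for j, l in enumerate(lncrnas)}
    A = np.zeros((len(diseases), len(lncrnas)), dtype=int)
    for disease, lncrna in set(pairs):  # set() drops duplicated associations
        A[d_index[disease], l_index[lncrna]] = 1
    return A

# Toy data with hypothetical labels; the last pair duplicates the first.
diseases = ["d1", "d2", "d3"]
lncrnas = ["l1", "l2"]
pairs = [("d1", "l1"), ("d2", "l1"), ("d2", "l2"), ("d1", "l1")]
A = build_association_matrix(pairs, diseases, lncrnas)
print(A.shape, int(A.sum()))
```

Rows index diseases and columns index lncRNAs, which matches the N_D × N_L convention used in the text.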

Similarity of diseases

Semantic similarity of diseases

In order to estimate the semantic similarity between different diseases, based on the concept of DAGs (Directed Acyclic Graph) of different diseases proposed by Wang et al. [44, 45], we can calculate the disease semantic similarity through calculating the similarity between compositions of DAGs of different diseases as follows:

Step 1

For all 314 diseases obtained from the LncRNADisease database, the corresponding MeSH descriptors can be downloaded from the MeSH database of the National Library of Medicine (http://www.nlm.nih.gov/). As illustrated in Fig. 5, based on the information in its MeSH descriptors, a DAG can be established for each disease.

Step 2

For any given disease d, Let its DAG be DAG(d) = (d, D(d), E(d)), where D(d) represents a set of nodes consisting of the disease d itself and its ancestral disease nodes, and E(d) denotes a set of directed edges pointing from ancestral nodes to descendant nodes.

Step 3

For any given disease d and one of its ancestor nodes t in DAG(d), the semantic contributions of the ancestor node t to the disease d can be defined as follows:

$$ {D}_d(t)=\begin{cases}1 & \text{if } t=d\\ \max \left\{\varDelta \ast {D}_d\left({t}^{\prime}\right) \mid {t}^{\prime}\in \text{children of } t\right\} & \text{if } t\ne d\end{cases} $$

(1)

Here, Δ is the attenuation factor, with a value between 0 and 1, used to calculate the disease semantic contribution; according to previously reported experimental results, the most appropriate value for Δ is 0.5.

Step 4

For any given disease d, let its DAG be DAG(d), then based on the concept of DAG, the semantic value of d can be defined as follows:

$$ D(d)={\sum \limits}_{t_i\in DAG(d)}{D}_d\left({t}_i\right) $$

(2)

Taking the disease DSN (Digestive System Neoplasms) illustrated in Fig. 5 as an example, according to Equation (1), the semantic contribution of DSN to itself is 1. Since neoplasms by site and digestive system diseases are located in the second layer of the DAG of DSN, the semantic contribution of each of them to DSN is 0.5 × 1 = 0.5. Moreover, since neoplasms is located in the third layer of the DAG of DSN, its semantic contribution to DSN is 0.5 × 0.5 = 0.25. Hence, according to Equation (2), the semantic value of DSN is 2.25 (= 1 + 0.5 + 0.5 + 0.25).

Step 5

For any two given diseases d_i and d_j, based on the assumption that the more similar the structures of their DAGs, the higher the semantic similarity between them will be, the semantic similarity between d_i and d_j can be defined as follows:

$$ DisSemSim\left(i,j\right)= DisSemSim\left({d}_i,{d}_j\right)=\frac{\sum_{t\in \left( DAG\left({d}_i\right)\cap DAG\left({d}_j\right)\right)}\left({D}_{d_i}(t)+{D}_{d_j}(t)\right)}{D\left({d}_i\right)+D\left({d}_j\right)} $$

(3)
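Equations (1) to (3) can be sketched in a few lines. The example below is an illustration only: it encodes the DSN DAG described in the text as a child-to-parents dictionary, adds a hypothetical second disease ("disease X") purely to exercise Equation (3), and uses function names that are not from the paper.

```python
def semantic_contributions(disease, parents, delta=0.5):
    """Map every node t in DAG(disease) to D_d(t), following Equation (1)."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, ()):
            value = delta * contrib[node]
            # Keep the maximum over all children of the ancestor.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)
    return contrib

def semantic_value(contrib):
    """Equation (2): sum of the contributions of all nodes in the DAG."""
    return sum(contrib.values())

def semantic_similarity(c_i, c_j):
    """Equation (3): shared contributions over the two semantic values."""
    shared = c_i.keys() & c_j.keys()
    numerator = sum(c_i[t] + c_j[t] for t in shared)
    return numerator / (semantic_value(c_i) + semantic_value(c_j))

# Child -> parents edges; "disease X" is hypothetical.
parents = {
    "digestive system neoplasms": ["neoplasms by site", "digestive system diseases"],
    "neoplasms by site": ["neoplasms"],
    "disease X": ["neoplasms by site"],
}
dsn = semantic_contributions("digestive system neoplasms", parents)
other = semantic_contributions("disease X", parents)
dsn_value = semantic_value(dsn)               # 2.25, as in the worked example
similarity = semantic_similarity(dsn, other)  # (1.0 + 0.5) / (2.25 + 1.75)
print(dsn_value, similarity)
```

The two diseases share the ancestors neoplasms by site and neoplasms, so the numerator of Equation (3) is (0.5 + 0.5) + (0.25 + 0.25) = 1.5.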

Gaussian interaction profile kernel similarity of diseases

Based on the assumption that similar diseases tend to be more likely associated with similar lncRNAs, according to above newly constructed lncRNA-disease association adjacency matrix A, for any two given diseases d_i and d_j, the Gaussian interaction profile kernel similarity between them can be obtained as follows:

$$ GKD\left({d}_i,{d}_j\right)=\mathit{\exp}\left(-{\gamma}_d{\left\Vert IP\left({d}_i\right)- IP\left({d}_j\right)\right\Vert}^2\right) $$

(4)

$$ {\gamma}_d={\gamma}_d^{\hbox{'}}/\left({\sum \limits}_{k=1}^{N_D}{\left\Vert IP\left({d}_k\right)\right\Vert}^2\right) $$

(5)

Here, IP(d_t) denotes the vector consisting of the elements in the t-th row of the lncRNA-disease adjacency matrix A, and γ_d is the parameter that controls the kernel bandwidth, obtained from the new bandwidth parameter $ {\gamma}_d^{\prime } $ by computing the average number of lncRNA-disease associations for all the diseases. In addition, inspired by the method proposed by O. Vanunu et al. [46], we adopt a logistic function to optimize the Gaussian interaction profile kernel similarity between diseases, and based on Equation (4), we further obtain an N_D × N_D dimensional adjacency matrix FKD as follows:

$$ FKD\left(i,j\right)=\frac{1}{1+{e}^{\left(-12 GKD\left(i,j\right)+\log (9999)\right)}} $$

(6)
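A minimal sketch of Equations (4) to (6) is given below. It follows Equation (5) as printed (the bandwidth is divided by the sum of the squared profile norms) and assumes that log in Equation (6) is the natural logarithm; the function names and the toy matrix are illustrative.

```python
import numpy as np

def gaussian_profile_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel; one interaction profile per row."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    gamma = gamma_prime / sq_norms.sum()  # Equation (5) as printed
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, clipped at 0 against round-off.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def logistic_transform(gkd):
    """Equation (6), assuming the natural logarithm."""
    return 1.0 / (1.0 + np.exp(-12.0 * gkd + np.log(9999.0)))

# Toy association matrix: 3 diseases (rows) x 3 lncRNAs (columns).
A = np.array([[1, 0, 1],
              [1, 0, 1],
              [0, 1, 0]])
GKD = gaussian_profile_kernel(A)  # disease kernel from the rows of A
FKD = logistic_transform(GKD)
print(np.round(GKD, 4))
```

The same `gaussian_profile_kernel` function applied to the transpose of A gives a kernel over the columns, i.e. over lncRNA interaction profiles.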

Integrated similarity of diseases

Based on the disease semantic similarity and the disease Gaussian interaction profile kernel similarity obtained above, an N_D × N_D dimensional integrated disease similarity adjacency matrix KD can be obtained as follows:

$$ KD\left(i,j\right)=\frac{DisSemSim\left(i,j\right)+ FKD\left(i,j\right)}{2} $$

(7)

Similarity of LncRNAs

Functional similarity of LncRNAs

We can obtain the disease groups corresponding to two given lncRNAs l_i and l_j from the known lncRNA-disease associations. Based on the assumption that similar diseases tend to be associated with similar lncRNAs, we define the functional similarity of two given lncRNAs l_i and l_j as the semantic similarity between their corresponding disease groups. The specific calculation process is as follows:

For any two given lncRNAs l_i and l_j, let DS(i) = {d_k | A(k, i) = 1, k∈[1, N_D]} and DS(j) = {d_k | A(k, j) = 1, k∈[1, N_D]}, then the functional similarity between l_i and l_j can be calculated according to the following steps [31]:

Step 1

For any given disease group DS(k) and disease d_t∉DS(k), we first calculate the similarity between d_t and DS(k) as follows:

$$ S\left({d}_t, DS(k)\right)={\max}_{d_s\in DS(k)}\left\{ DisSemSim\left({d}_t,{d}_s\right)\right\} $$

(8)

Step 2

Therefore, based on above Equation (8), we define the functional similarity between l_i and l_j as FuncKL(i, j), which can be calculated as follows:

$$ FuncKL\left(i,j\right)=\frac{\sum_{d_t\in DS(i)}S\left({d}_t, DS(j)\right)+{\sum}_{d_t\in DS(j)}S\left({d}_t, DS(i)\right)}{\mid DS(i)\mid +\mid DS(j)\mid } $$

(9)

Here, |DS(i)| and |DS(j)| represent the numbers of diseases in DS(i) and DS(j) respectively. According to Equation (9), an N_L × N_L dimensional lncRNA functional similarity matrix FuncKL can finally be obtained.
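The functional similarity can be sketched as below, normalising by the sizes of the two disease groups DS(i) and DS(j) as described in the text. The function name, the toy matrices, and the choice to assign a similarity of 0 to lncRNAs without any known disease are assumptions made for illustration.

```python
import numpy as np

def functional_similarity(A, dis_sem_sim):
    """lncRNA functional similarity from A (N_D x N_L) and disease semantic similarity."""
    n_l = A.shape[1]
    groups = [np.flatnonzero(A[:, j]) for j in range(n_l)]  # DS(j) for each lncRNA
    func = np.zeros((n_l, n_l))
    for i in range(n_l):
        for j in range(n_l):
            gi, gj = groups[i], groups[j]
            if len(gi) == 0 or len(gj) == 0:
                continue  # assumption: 0 when a lncRNA has no known disease
            sub = dis_sem_sim[np.ix_(gi, gj)]
            # Row maxima give S(d_t, DS(j)) for d_t in DS(i), Equation (8);
            # column maxima give S(d_t, DS(i)) for d_t in DS(j).
            total = sub.max(axis=1).sum() + sub.max(axis=0).sum()
            func[i, j] = total / (len(gi) + len(gj))
    return func

# Toy data: 3 diseases, 2 lncRNAs, and a made-up semantic similarity matrix.
A = np.array([[1, 0],
              [1, 1],
              [0, 1]])
sim = np.array([[1.0, 0.5, 0.2],
                [0.5, 1.0, 0.4],
                [0.2, 0.4, 1.0]])
FuncKL = functional_similarity(A, sim)
print(np.round(FuncKL, 3))
```

For the toy data, DS(0) = {d_0, d_1} and DS(1) = {d_1, d_2}, so the off-diagonal entry is (0.5 + 1.0 + 1.0 + 0.4) / 4.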

Gaussian interaction profile kernel similarity of lncRNAs

Based on the assumption that similar lncRNAs tend to be more likely associated with similar diseases, according to above newly constructed lncRNA-disease association adjacency matrix A, for any two given lncRNAs l_i and l_j, the Gaussian interaction profile kernel similarity between them can be obtained as follows:

$$ FKL\left({l}_i,{l}_j\right)=\mathit{\exp}\left(-{\gamma}_l{\left\Vert IP\left({l}_i\right)- IP\left({l}_j\right)\right\Vert}^2\right) $$

(10)

$$ {\gamma}_l={\gamma}_l^{\hbox{'}}/\left({\sum \limits}_{k=1}^{N_L}{\left\Vert IP\left({l}_k\right)\right\Vert}^2\right) $$

(11)

Here, IP(l_t) denotes the vector consisting of the elements in the t-th column of the lncRNA-disease adjacency matrix A, and γ_l is the parameter that controls the kernel bandwidth, obtained from the new bandwidth parameter $ {\gamma}_l^{\prime } $ by computing the average number of lncRNA-disease associations for all the lncRNAs. Based on Equation (10), we can thus obtain an N_L × N_L dimensional lncRNA Gaussian interaction profile kernel similarity matrix FKL as well.

Integrated similarity of lncRNAs

Based on the lncRNA functional similarity and the lncRNA Gaussian interaction profile kernel similarity obtained above, an N_L × N_L dimensional integrated lncRNA similarity adjacency matrix KL can be obtained as follows:

$$ KL\left(i,j\right)=\frac{FuncKL\left(i,j\right)+ FKL\left(i,j\right)}{2} $$

(12)

Construction of computational model TCSRWRLD

The establishment of heterogeneous network

By combining the N_D × N_D dimensional integrated disease similarity adjacency matrix KD and the N_L × N_L dimensional integrated lncRNA similarity adjacency matrix KL with the N_D × N_L dimensional lncRNA-disease association adjacency matrix A, we can construct a new (N_L + N_D) × (N_L + N_D) dimensional integrated matrix AA as follows:

$$ AA\left(i,j\right)=\left[\begin{array}{cc} KL\left(i,j\right)& {A}^T\left(i,j\right)\\ {}A\left(i,j\right)& KD\left(i,j\right)\end{array}\right] $$

(13)

According to Equation (13), we can construct a corresponding heterogeneous lncRNA-disease network consisting of N_D different disease nodes and N_L different lncRNA nodes, in which there is an edge between any given pair of nodes i and j if and only if AA(i, j) > 0.
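The block structure of Equation (13) maps directly onto `numpy.block`. The sketch below uses small made-up matrices; the function name `build_heterogeneous_matrix` is illustrative. With this layout, the first N_L indices refer to lncRNA nodes and the remaining N_D indices to disease nodes.

```python
import numpy as np

def build_heterogeneous_matrix(KL, KD, A):
    """Assemble AA = [[KL, A^T], [A, KD]] as in Equation (13)."""
    return np.block([[KL, A.T],
                     [A, KD]])

# Toy sizes: N_L = 2 lncRNAs, N_D = 3 diseases (values are made up).
KL = np.array([[1.0, 0.3],
               [0.3, 1.0]])
KD = np.array([[1.0, 0.2, 0.0],
               [0.2, 1.0, 0.5],
               [0.0, 0.5, 1.0]])
A = np.array([[1, 0],
              [0, 1],
              [0, 0]])
AA = build_heterogeneous_matrix(KL, KD, A)
edges = AA > 0  # an edge exists between nodes i and j iff AA(i, j) > 0
print(AA.shape, int(edges.sum()))
```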

Establishment of TCS (target convergence set)

Before the implementation of random walk, for each node in above newly constructed heterogeneous lncRNA-disease network, as illustrated in Fig. 6, it will establish its own TCS first according to the following steps:

Step 1

For any given lncRNA node l_j, we define its original TCS as the set of all disease nodes that have known associations with it, i.e., the original TCS of l_j is TCS₀(l_j) = {d_k | A(k, j) = 1, k∈[1, N_D]}. Similarly, for a given disease node d_i, we can define its original TCS as TCS₀(d_i) = {l_k | A(i, k) = 1, k∈[1, N_L]}.

Step 2

After the original TCS has been established, for any given lncRNA node l_j, ∀d_k∈TCS₀(l_j), and ∀t∈[1, N_D], then we can define the network distance ND(k, t) between d_k and d_t as follows:

$$ ND\left(k,t\right)=\frac{1}{KD\left(k,t\right)} $$

(14)

According to above Equation (14), for any disease nodes d_k∈TCS₀(l_j) and ∀t∈[1, N_D], obviously it is reasonable to deduce that the smaller the value of ND(k, t), the higher the similarity between d_t and d_k would be, i.e., the higher the possibility that there is potential association between d_t and l_j will be.

Similarly, for any given disease node d_i, ∀l_k∈TCS₀(d_i) and ∀t∈[1, N_L], we can define the network distance ND(k, t) between l_k and l_t as follows:

$$ ND\left(k,t\right)=\frac{1}{KL\left(k,t\right)} $$

(15)

According to above Equation (15), for any lncRNA nodes l_k∈TCS₀(d_i) and ∀t∈[1, N_L], obviously it is reasonable to deduce that the smaller the value of ND(k, t), the higher the similarity between l_t and l_k will be, i.e., the higher the possibility that there is potential association between l_t and d_i will be.

Step 3

According to Equation (14) and Equation (15), for any given disease node d_i or any given lncRNA node l_j, we define the TCS of d_i as the set of the top 100 lncRNA nodes in the heterogeneous lncRNA-disease network that have the minimum average network distance to the lncRNA nodes in TCS₀(d_i), and the TCS of l_j as the set of the top 100 disease nodes in the heterogeneous lncRNA-disease network that have the minimum average network distance to the disease nodes in TCS₀(l_j). Note that the 100 lncRNA nodes in TCS(d_i) may or may not belong to TCS₀(d_i), and the 100 disease nodes in TCS(l_j) may or may not belong to TCS₀(l_j).
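The three steps above can be sketched as a single function. This is an illustration under stated assumptions: zero similarity is treated as infinite distance, a node without known associations receives an empty TCS, and the function name and toy similarity matrix are not from the paper.

```python
import numpy as np

def target_convergence_set(seed_idx, similarity, size=100):
    """Indices of the `size` nodes with the smallest mean network distance to the seeds.

    seed_idx   -- indices in the original TCS (nodes with known associations)
    similarity -- KD (for an lncRNA node) or KL (for a disease node)
    """
    seed_idx = np.asarray(seed_idx, dtype=int)
    if seed_idx.size == 0:
        return np.array([], dtype=int)  # assumption: no seeds, no TCS
    similarity = np.asarray(similarity, dtype=float)
    with np.errstate(divide="ignore"):
        distance = 1.0 / similarity[seed_idx, :]  # Equations (14)/(15); 0 -> inf
    mean_distance = distance.mean(axis=0)
    return np.argsort(mean_distance, kind="stable")[:size]

# Toy disease similarity matrix (made-up values) and an lncRNA known to be
# associated with diseases 0 and 1.
KD = np.array([[1.0, 0.8, 0.1, 0.5],
               [0.8, 1.0, 0.2, 0.4],
               [0.1, 0.2, 1.0, 0.3],
               [0.5, 0.4, 0.3, 1.0]])
tcs = target_convergence_set([0, 1], KD, size=3)
print(sorted(tcs.tolist()))
```

In the toy example, disease 2 is far from both seeds (mean distance 7.5) and is therefore left out of the three-node TCS, while disease 3 is included even though it is not a seed.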

Random walk in the heterogeneous LncRNA-disease network

The method of random walk simulates the process of random walker’s transition from one starting node to other neighboring nodes in the network with given probability. Based on the assumption that similar diseases tend to be more likely associated with similar lncRNAs, as illustrated in Fig. 7, the process of our prediction model TCSRWRLD can be divided into the following major steps:

Step 1

For a walker, before it starts its random walk across the heterogeneous lncRNA-disease network, it will first construct a transition probability matrix W as follows:

$$ W\left(i,j\right)=\frac{AA\left(i,j\right)}{\sum_{k=1}^{N_D+{N}_L} AA\left(i,k\right)} $$

(16)
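Equation (16) is a row normalisation of AA. A minimal sketch follows; leaving all-zero rows (isolated nodes) as zeros is an assumption made here to avoid division by zero and is not discussed in the text.

```python
import numpy as np

def transition_matrix(AA):
    """Row-normalise AA so that each non-empty row of W sums to 1 (Equation (16))."""
    AA = np.asarray(AA, dtype=float)
    row_sums = AA.sum(axis=1, keepdims=True)
    # Assumption: rows that sum to zero are left as all zeros.
    return np.divide(AA, row_sums, out=np.zeros_like(AA), where=row_sums > 0)

# Toy integrated matrix (made-up values).
AA = np.array([[1.0, 0.5, 1.0],
               [0.5, 1.0, 0.0],
               [1.0, 0.0, 1.0]])
W = transition_matrix(AA)
print(W.sum(axis=1))
```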

Step 2

In addition, for any node £_i in the heterogeneous lncRNA-disease network, whether £_i is a lncRNA node l_i or a disease node d_i, it can obtain an initial probability vector P_i (0) for itself as follows:

$$ {P}_i(0)={\left({p}_{i,1}(0),{p}_{i,2}(0),\dots, {p}_{i,j}(0),\dots {p}_{i,{N}_D+{N}_L}(0)\right)}^T $$

(17)

$$ {p}_{i,j}(0)=W\left(i,j\right),\kern0.36em j=1,2,\dots, {N}_D+{N}_L $$

(18)

Step 3

Next, the walker will randomly select a node §_i in the heterogeneous lncRNA-disease network as the starting node to initiate its random walk, where §_i may be an lncRNA node l_i or a disease node d_i. After the initiation of the random walk process, supposing that currently the walker has arrived at the node Γ_i from the previous hop node Γ_j after t-1 hops during its random walk across the heterogeneous lncRNA-disease network, then here and now, whether Γ_i is a lncRNA node l_i or a disease node d_i, and Γ_j is a lncRNA node l_j or a disease node d_j, the walker can further obtain a walking probability vector P_i(t) as follows:

$$ {P}_i(t)=\left(1-\partial \right)\ast {W}^T\ast {P}_j\left(t-1\right)+\partial \ast {P}_i(0) $$

(19)

Here, ∂ (0 < ∂ < 1) is a parameter that the walker uses to adjust the walking probability vector at each hop. Moreover, let the newly obtained walking probability vector be P_i(t) = $ {\left({p}_{i,1}(t),{p}_{i,2}(t),\dots, {p}_{i,j}(t),\dots, {p}_{i,{N}_D+{N}_L}(t)\right)}^T $, and suppose that p_{i,k}(t) = max{$ {p}_{i,1}(t),{p}_{i,2}(t),\dots, {p}_{i,k}(t),\dots, {p}_{i,{N}_D+{N}_L}(t) $}; then the walker chooses the node ψ_k as its next-hop node, where ψ_k may be an lncRNA node l_k or a disease node d_k. In particular, since the walker can be regarded as having arrived at the starting node §_i from §_i itself after 0 hops, at §_i it obtains two probability vectors: the initial probability vector P_i(0) and the walking probability vector P_i(1). Likewise, at each intermediate node Γ_i, the walker obtains the initial probability vector P_i(0) and the walking probability vector P_i(t).
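A single update of Equation (19), followed by the choice of the next-hop node, might look as follows. This is a sketch under assumptions: the names `rwr_step` and `next_hop` are hypothetical, the matrix W holds made-up values, and the restart value 0.5 is chosen only for illustration.

```python
def rwr_step(W, p_prev, p0, restart):
    """One update of Equation (19): (1 - restart) * W^T p_prev + restart * p0."""
    n = len(W)
    # (W^T p_prev)[j] = sum over i of W[i][j] * p_prev[i]
    wt_p = [sum(W[i][j] * p_prev[i] for i in range(n)) for j in range(n)]
    return [(1 - restart) * wt_p[j] + restart * p0[j] for j in range(n)]


def next_hop(p):
    """Index of the largest entry of p, i.e. the node chosen as the next hop."""
    return max(range(len(p)), key=lambda j: p[j])


# Toy row-stochastic transition matrix (illustrative values only).
W = [
    [0.0, 0.25, 0.75],
    [0.5, 0.0, 0.5],
    [0.75, 0.25, 0.0],
]
p0 = W[0]  # Equation (18): the initial vector of node 0 is row 0 of W
# At the starting node the walker arrives "from itself", so p_prev = p0.
p1 = rwr_step(W, p_prev=p0, p0=p0, restart=0.5)
print(next_hop(p1))  # 2
```

Because every row of W sums to 1, the update keeps the entries of the vector summing to 1 as long as the inputs do.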

Step 4

Based on Equation (19), suppose that the walker has arrived at the node Γ_i from the previous-hop node Γ_j after t-1 hops, and let the walking probability vectors obtained at Γ_i and Γ_j be P_i(t) and P_j(t-1), respectively. If the L1 norm of their difference satisfies ‖P_i(t) − P_j(t − 1)‖₁ ≤ 10⁻⁶, we regard the walking probability vector P_i(t) as having reached a stable state at the node Γ_i. Once the walking probability vectors at every disease node and lncRNA node in the heterogeneous lncRNA-disease network have reached a stable state, we denote these stable vectors by $ {P}_1\left(\infty \right),{P}_2\left(\infty \right),\dots, {P}_{N_D+{N}_L}\left(\infty \right) $ and obtain a stable walking probability matrix S(∞) as follows:

$$ S\left(\infty \right)=\begin{bmatrix}{S}_1& {S}_2\\ {}{S}_3& {S}_4\end{bmatrix}={\left({P}_1\left(\infty \right),{P}_2\left(\infty \right),\dots, {P}_{N_D+{N}_L}\left(\infty \right)\right)}^T $$

(20)

Here, S₁ is an N_L×N_L matrix, S₂ is an N_L×N_D matrix, S₃ is an N_D×N_L matrix, and S₄ is an N_D×N_D matrix. The blocks S₂ and S₃ relate lncRNA nodes to disease nodes, so they are the final result matrices we need, and potential lncRNA-disease associations can be predicted from the scores in these two matrices.
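The stopping rule and the block structure of S(∞) can be sketched as below. Note the simplification: Equation (19) is applied here as a per-node fixed-point iteration, which is the common vectorised form of random walk with restart, rather than the hop-by-hop walk described above. The names `stable_vector` and `split_blocks`, the `max_iter` safeguard, and the toy values are assumptions.

```python
def stable_vector(W, p0, restart, tol=1e-6, max_iter=10_000):
    """Iterate Equation (19) until the L1 change between updates is <= tol."""
    n = len(W)
    p = list(p0)
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(W[i][j] * p[i] for i in range(n)) + restart * p0[j]
            for j in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) <= tol:
            return nxt
        p = nxt
    return p  # not converged within max_iter; returned as-is


def split_blocks(S, n_l):
    """Split S into S1 (L x L), S2 (L x D), S3 (D x L) and S4 (D x D).

    Assumes the first n_l rows and columns correspond to lncRNA nodes.
    """
    S1 = [row[:n_l] for row in S[:n_l]]
    S2 = [row[n_l:] for row in S[:n_l]]
    S3 = [row[:n_l] for row in S[n_l:]]
    S4 = [row[n_l:] for row in S[n_l:]]
    return S1, S2, S3, S4


# Toy network: 2 lncRNA nodes followed by 1 disease node (made-up weights).
W = [
    [0.0, 0.25, 0.75],
    [0.5, 0.0, 0.5],
    [0.75, 0.25, 0.0],
]
# Row i of S is the stable vector of node i, started from row i of W.
S = [stable_vector(W, W[i], restart=0.5) for i in range(len(W))]
S1, S2, S3, S4 = split_blocks(S, n_l=2)
```

With a restart weight of 0.5 the L1 change at least halves on every update, so the tolerance of 10⁻⁶ is reached after a few dozen iterations in this toy case.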

From the steps described above, for each node Γ_i in the heterogeneous lncRNA-disease network, the stable walking probability vector obtained by the walker at Γ_i is P_i(∞) = $ {\left({p}_{i,1}\left(\infty \right),{p}_{i,2}\left(\infty \right),\dots, {p}_{i,j}\left(\infty \right),\dots, {p}_{i,{N}_D+{N}_L}\left(\infty \right)\right)}^T $. For convenience, we denote the set of all N_D+N_L nodes in the heterogeneous lncRNA-disease network as the Global Set (GS), and rewrite the stable walking probability vector P_i(∞) as $ {P}_i^{GS}\left(\infty \right) $. The walker will not stop its random walk until the (N_D+N_L)-dimensional walking probability vector at each node in the network has reached a stable state, which is very time-consuming when N_D+N_L is large. Hence, to reduce the execution time and speed up the convergence of TCSRWRLD, we use the concept of TCS proposed in the section above: while constructing the walking probability vector P_i(t) = $ {\left({p}_{i,1}(t),{p}_{i,2}(t),\dots, {p}_{i,j}(t),\dots, {p}_{i,{N}_D+{N}_L}(t)\right)}^T $ at the node Γ_i, we keep p_{i,j}(t) unchanged if the jth of these N_D+N_L nodes belongs to the TCS of Γ_i, and set p_{i,j}(t) = 0 otherwise. The walking probability vector obtained by the walker at Γ_i thus becomes $ {P}_i^{TCS}(t) $, and the stable walking probability vector at Γ_i becomes $ {P}_i^{TCS}\left(\infty \right) $. Compared with $ {P}_i^{GS}\left(\infty \right) $, the stable state of $ {P}_i^{TCS}\left(\infty \right) $ can be reached by the walker much more quickly.
However, some nodes that are not in the TCS of Γ_i may actually be associated with the target node. Therefore, to avoid omissions, during simulation we construct a novel stable walking probability vector $ {P}_i^{ANS}\left(\infty \right) $ by combining $ {P}_i^{GS}\left(\infty \right) $ with $ {P}_i^{TCS}\left(\infty \right) $ to predict potential lncRNA-disease associations as follows:

$$ {P}_i^{ANS}\left(\infty \right)=\frac{{P}_i^{GS}\left(\infty \right)+{P}_i^{TCS}\left(\infty \right)}{2} $$

(21)
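The TCS masking rule and the averaging in Equation (21) can be illustrated with a short sketch. The function names `mask_to_tcs` and `combine` and the toy vectors are assumptions. In the method itself the mask is applied to P_i(t) at every update, so the TCS vector below is a stand-in and not the result of an actual walk.

```python
def mask_to_tcs(p, tcs):
    """Keep p[j] when node j is in the TCS, otherwise set it to 0."""
    keep = set(tcs)
    return [x if j in keep else 0.0 for j, x in enumerate(p)]


def combine(p_gs, p_tcs):
    """Equation (21): element-wise mean of the GS and TCS stable vectors."""
    return [(a + b) / 2 for a, b in zip(p_gs, p_tcs)]


# Toy vectors for illustration only.
p_gs = [0.5, 0.25, 0.125, 0.125]
p_tcs = mask_to_tcs(p_gs, tcs=[0, 2])
p_ans = combine(p_gs, p_tcs)
print(p_ans)  # [0.5, 0.125, 0.125, 0.0625]
```

Entries outside the TCS are not discarded: they keep half of their GS score, which is how the combination avoids omitting associations that fall outside the TCS.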

Availability of data and materials

The datasets generated and/or analysed during the current study are available in the LncRNADisease repository, http://www.cuilab.cn/lncrnadisease.

Abbreviations

10-Fold CV: 10-fold cross-validation
2-Fold CV: 2-fold cross-validation
5-Fold CV: 5-fold cross-validation
AUC: Area under the ROC curve
AUPR: Area under the precision-recall curve
FPR: False positive rate
GS: Global Set
H19: Long non-coding RNA H19
lncRNAs: Long non-coding RNAs
LOOCV: Leave-One-Out Cross Validation
ncRNAs: Non-coding RNAs
P-R curve: Precision-recall curve
ROC: Receiver-operating characteristic
RWR: Random walk with restart
TCS: Target Convergence Set
TCSRWRLD: The novel computational model, based on improved random walk with restart, proposed to infer potential lncRNA-disease associations
TPR: True positive rate
Xist: Long non-coding RNA Xist

Acknowledgements

The authors thank the anonymous referees for suggestions that helped improve the paper substantially.

Funding

This research was partly sponsored by the National Natural Science Foundation of China (No. 61873221, No. 61672447) and the Natural Science Foundation of Hunan Province (No. 2018JJ4058, No. 2019JJ70010, No. 2017JJ5036). Publication costs were funded by the National Natural Science Foundation of China (No. 61873221, No. 61672447). The funder of the manuscript is Lei Wang (L.W.), whose contributions are stated in the Contributions section. The funding bodies played no role in the design of the study, in the collection, analysis, and interpretation of data, or in writing the manuscript.

Author information

Authors and Affiliations

College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, Hunan, People’s Republic of China
Jiechen Li, Xueyong Li, Xiang Feng & Lei Wang
Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, XiangTan, People’s Republic of China
Jiechen Li, Xiang Feng, Bihai Zhao & Lei Wang
School of Electrical and Information Engineering, Anhui University of Technology, Anhui, 243002, Maanshan, People’s Republic of China
Bing Wang

Contributions

JCL conceived the study. JCL, XF and LW improved the study based on the original model. XYL, BW and BHZ implemented the algorithms corresponding to the study. LW supervised the study. JCL and LW wrote the manuscript. All authors reviewed and improved the manuscript.

Corresponding author

Correspondence to Lei Wang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

The known lncRNA-disease associations used to construct the known lncRNA-disease network. We list the 1695 known lncRNA-disease associations collected from the LncRNADisease dataset, which is the latest version in the database.

Additional file 2.

The names of the 828 known lncRNAs included in the 1695 known lncRNA-disease associations collected from the LncRNADisease dataset, which is the latest version in the database.

Additional file 3.

The names of the 314 known diseases included in the 1695 known lncRNA-disease associations collected from the LncRNADisease dataset, which is the latest version in the database.

Additional file 4.

The 98 known human cancers, 668 lncRNAs, and 1103 confirmed associations between them, collected from the Lnc2Cancer database.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Li, J., Li, X., Feng, X. et al. A novel target convergence set based random walk with restart for prediction of potential LncRNA-disease associations. BMC Bioinformatics 20, 626 (2019). https://doi.org/10.1186/s12859-019-3216-4

Received: 08 February 2019
Accepted: 12 November 2019
Published: 03 December 2019
DOI: https://doi.org/10.1186/s12859-019-3216-4
