 Research article
 Open Access
 Published:
Prediction of lncRNAdisease associations by integrating diverse heterogeneous information sources with RWR algorithm and positive pointwise mutual information
BMC Bioinformatics volume 20, Article number: 87 (2019)
Abstract
Background
Long noncoding RNAs play an important role in human complex diseases. Identification of lncRNAdisease associations will gain insight into diseaserelated lncRNAs and benefit disease diagnoses and treatment. However, using experiments to explore the lncRNAdisease associations is expensive and time consuming.
Results
In this study, we developed a novel method to identify potential lncRNAdisease associations by Integrating Diverse Heterogeneous Information sources with positive pointwise Mutual Information and Random Walk with restart algorithm (namely IDHIMIRW). IDHIMIRW first constructs multiple lncRNA similarity networks and disease similarity networks from diverse lncRNArelated and diseaserelated datasets, then implements the random walk with restart algorithm on these similarity networks for extracting the topological similarities which are fused with positive pointwise mutual information to build a largescale lncRNAdisease heterogeneous network. Finally, IDHIMIRW implemented random walk with restart algorithm on the lncRNAdisease heterogeneous network to infer potential lncRNAdisease associations.
Conclusions
Compared with other stateoftheart methods, IDHIMIRW achieves the best prediction performance. In case studies of breast cancer, stomach cancer, and colorectal cancer, 36/45 (80%) novel lncRNAdisease associations predicted by IDHIMIRW are supported by recent literatures. Furthermore, we found lncRNA LINC01816 is associated with the survival of colorectal cancer patients. IDHIMIRW is freely available at https://github.com/NWPU903PR/IDHIMIRW.
Background
Long noncoding RNAs (lncRNAs) are the biggest part of noncoding RNAs with at least 200 nucleotides and no observed potential to encode proteins [1, 2]. To date, 15,778 lncRNA genes and 27,908 lncRNA transcripts have been annotated in human genome by the GENCODE v27. Increasing evidences have revealed that lncRNAs have key roles in gene regulations, affecting cellular proliferation, survival, migration and genomic stability [3,4,5,6,7]. Therefore, there is no surprise that mutation and dysregulation of lncRNAs could contribute to the development of various human complex diseases [8,9,10], such as HOTAIR in breast cancer [11] and MALAT1 in earlystage nonsmall cell lung cancer [12]. On the other hand, lncRNAs can drive many important cancer phenotypes through their interactions with other cellular macromolecules including DNA, protein, and RNA [4]. For example, lncRNA PCGEM1 and PRNCR1 are associated with androgen receptor in prostate cancer cells [6]. And lncRNA PTCSC3 could be a tumor suppressor in thyroid cancer cells by interacting with miR5745p [13].
In recent years, the number of experimentally verified lncRNAdisease associations is gradually increasing. Several databases for lncRNA functions and disease associations have been published, such as LncRNAdb [14], LncRNADisease [15], Lnc2Cancer [16] and NONCODE [17]. However, known lncRNAdisease associations still involve a small part of lncRNAs and diseases. Computational methods have been developed to predict the potential lncRNAdisease associations that can be used as candidates for biological experiment verifications, which would greatly reduce the experiment cost and save time for finding new lncRNAdisease associations. Existing computational methods can mainly be categorized into machine learningbased methods [18,19,20,21,22,23,24,25,26,27,28,29] and networkbased methods [30,31,32,33,34,35,36,37,38,39,40,41]. The machine learningbased methods, such as LRLSLDA [18], LDAP [26], and MFLDA [27], have been developed to predict the potential lncRNAdisease associations. LRLSLDA [18] combined optimal classifiers in lncRNA space and disease space into a single classifier to predict lncRNAdisease associations based on lncRNA expression profiles and known lncRNAdisease associations. But how to combine the classifiers reasonably needs to further study. LDAP [26] employed two lncRNA similarity measures and five disease similarity measures to calculate lncRNA similarities and disease similarities, respectively, then used the bagging SVM to predict lncRNAdisease associations. However, this method suffered from fusing multiple similarities effectively. Fu et al. [27] developed a lncRNAdisease associations prediction model (MFLDA) with matrix factorization by integrating seven relational data sources between six object types (e.g. lncRNAs, miRNAs, genes, Gene Ontology, Disease Ontology, and drugs). Yet, MFLDA can only predict the potential lncRNAdisease associations which share both lncRNAs and diseases with known associations in training set.
The networkbased methods, such as RWRlncD [30], RWRHLD [32], KATZLDA [33] and GrwLDA [40], use lncRNAdisease association, disease similarity, lncRNA similarity, and other molecular similarity to construct the lncRNA similarity networks, or lncRNAdisease heterogeneous network, then implement global network models (such as random walk and various propagation algorithms) to predict potential lncRNAdisease associations [10]. RWRlncD [30] constructed a lncRNA similarity network based on known lncRNAdisease associations, i.e., each lncRNA in their network has at least one known lncRNAdisease association, for predicting potential lncRNAdisease associations. So, the major limitation of RWRlncD is that it cannot predict lncRNAdisease associations for lncRNAs and diseases without any known lncRNAdisease associations. RWRHLD [32] calculated lncRNA similarities and disease similarities based on crosstalk between lncRNAs and miRNAs and directed acyclic graph in the disease ontology, respectively. One weakness of RWRHLD is that lncRNAs interacting with similar miRNAs do not always mean related with similar diseases, and only a small fraction of lncRNAmiRNA interactions is used [25]. KATZLDA [33] integrated lncRNA expression similarity, lncRNA functional similarity, Gaussian interaction profile kernel similarity for diseases and lncRNAs, disease semantic similarity, and known lncRNAdisease associations to build a lncRNAdisease heterogeneous network, then used KATZ algorithm to calculate potential association probability of each lncRNAdisease pair. GrwLDA [40] introduced a global network random walk method to predict potential lncRNAdiseases association by integrating disease semantic similarity, lncRNA functional similarity and known lncRNAdisease associations. Overall, the results of existing networkbased methods show that integrating diverse lncRNArelated and diseaserelated information can boost the prediction accuracy of the lncRNAdisease association. However, most existing methods are limited to a small number of lncRNAs and diseases. For example, the network built in RWRHLD involves 697 lncRNAs and 126 diseases, while the network built in GrwLDA just involves 78 lncRNAs and 113 diseases. In addition, most existing methods calculate the lncRNA/disease similarities only on those that have at least one known lncRNAdisease association.
To address the aforementioned issues (or limitations) and further improve the prediction accuracy, we proposed a novel networkbased method, namely IDHIMIRW, to predict the potential lncRNAdisease associations by constructing a largescale lncRNAdisease heterogeneous network with Random Walk with Restart (RWR) algorithm and the positive pointwise mutual information (PPMI). Instead of constraining lncRNA and disease on those with at least one known lncRNAdisease association, IDHIMIRW calculates the lncRNA similarities for all the lncRNAs involved in lncRNA expression profiles, lncRNAmiRNA interactions, and lncRNAprotein interactions, and also calculates the diseases similarities for all the diseases involved in disease ontology, diseasemiRNA associations, and diseasegene associations. Then, IDHIMIRW uses the RWR algorithm on each similarity network to capture network topological structural features for measuring the lncRNA/disease topological similarity through the PPMI. By integrating the lncRNA/disease topological similarity, and introducing the known lncRNAdisease association information, a largescale lncRNAdisease heterogeneous network is built. Finally, the random walk with restart on heterogeneous network (RWRH) algorithm [42] is applied on the lncRNAdisease heterogeneous network to predict the potential lncRNAdisease associations. The computational results show that IDHIMIRW cannot only better predict the known lncRNAdisease associations, but also can effectively predict the potential lncRNAdisease associations, providing more candidates for experimental verification. Most of the new predicted lncRNAdisease associations are supported by recent literatures. By analyzing nine unvalidated lncRNAs, we found that six lncRNAs were differentially expressed in corresponding cancers. We also found that lncRNA LINC01816 is associated with the survival of colorectal cancer patients, which provides evidence that this lncRNA is diseaserelated.
Results
In this section, we first introduced the evaluation method and metrices for evaluating the performance of the IDHIMIRW method. Then, we compared our IDHIMIRW method with other existing stateofthe art methods on a smallscale lncRNAdisease heterogeneous network, explored the predictive power of IDHIMIRW on a largescale lncRNAdisease heterogeneous network, and discussed the effect of different parameters. In the end, we analyzed several predicted potential lncRNAdisease associations with our IDHIMIRW.
Evaluation method and metrices
The leaveoneout cross validation (LOOCV) test method was used to evaluate the performance of the IDHIMIRW method. In LOOCV test method, each known lncRNAdisease association in the dataset is singled out in turn as a test sample, and the remaining lncRNAdisease associations are used as training samples. That is, for a given disease d_{i}, each known lncRNA associated with d_{i} is left out in turn as a test sample, and corresponding association edge between test lncRNA and d_{i} is removed, and the remaining lncRNAs associated with d_{i} are considered as training samples.
The area under the receiver operating characteristic (ROC) curve (AUC) and the area under the precisionrecall (PR) curve (AUPR) were used as evaluation metrices in our experiments. The ROC curve is the plot of the truepositive rate (TPR, or Recall) versus the falsepositive rate (FPR) at different rank cutoffs. The PR curve is the plot of the ratio of true positives among all positive predictions for each given recall rate.
Comparison with other methods
We compared our IDHIMIRW method with other six stateoftheart methods of LRLSLDA [18], LNCSIM [19], RWRlncD [30], IRWRLDA [34], KATZLDA [33] and GrwLDA [40] on the smallscale lncRNAdisease heterogeneous network (HNet_{S}) which contains 362 lncRNAs, 370 diseases, and 2169 known lncRNAdisease associations. Most existing methods often built this smallscale lncRNAdisease heterogeneous network in which each lncRNA (or disease) has at least an associated disease (or lncRNA) to predict the potential lncRNAdisease associations. LRLSLDA [18] and LNCSIM [19] adopt the semisupervised learning frameworks with Laplacian regularized least squares. RWRlncD [30], IRWRLDA [34], KATZLDA [33] and GrwLDA [40] are the networkbased methods. All methods were executed on a win10 system pc with i7–6700 CPU and 16.0G memory. Figure 1 shows the AUC and AUPR values of IDHIMIRW and other six methods. IDHIMIRW achieved a better performance than other six methods in terms of AUC and AUPR. The AUC of IDHIMIRW is 0.866, which is 0.337, 0.108, 0.350, 0.245, 0.197 and 0.061 higher than that of LRLSLDA, LNCSIM, RWRlncD, IRWRLDA, KATZLDA and GrwLDA, respectively. The AUCPR of IDHIMIRW is 0.318, which is 0.143, 0.213, 0.296, 0.172, 0.194 and 0.166 higher than that of LRLSLDA, LNCSIM, RWRlncD, IRWRLDA, KATZLDA and GrwLDA, respectively. The recall values of seven methods at different rank cutoffs are listed in Table 1, from which we can see that the recall value of IDHIMIRW is higher than that of other six existing methods at 10, 20, 50, and 100 ran cutoff. These results show that our IDHIMIRW can effectively predict the lncRNAdisease associations.
To further evaluate the performance of IDHIMIRW for predicting the associated lncRNAs for new diseases without any known lncRNA association information, we removed all the known lncRNA associations for the query disease in the smallscale lncRNAdisease heterogeneous network. Due to RWRlncD implemented the RWR algorithm on an lncRNA similarity network, we just compared our IDHIMIRW method with other five methods of LRLSLDA, LNCSIM, IRWRLDA, KATZLDA and GrwLDA for predicting the associated lncRNAs of the query diseases. The comparison results are shown in Fig. 2, which shows that our IDHIMIRW method can better predict the associated lncRNAs for the new disease than other existing prediction methods.
Effectiveness of introducing multiple information sources
In order to illustrate the effectiveness of introducing multiple information sources, we collected 7637 lncRNAs and 6453 diseases from EMBLEBI (EMTAB5214), starBase v2.0 [43], NPInter v3.0 [44], RAID v2.0 [45], Diseases ontology [46], HMDD v2.0 [47], and DisGeNet [48] to construct a largescale lncRNAdisease heterogeneous network (HNet_{L}) by introducing 2169 known lncRNAdisease associations, then implemented our IDHIMIRW method on HNet_{L}. Additional files 1 and 2 provided the data processing procedure for lncRNAs and diseases. The results of IDHIMIRW on HNet_{S} and HNet_{L} heterogeneous networks in LOOCV test are listed in Table 2, from which we can see that introducing more lncRNAs and diseases can effectively improve the predictive performance of IDHIMIRW and can predict the potential lncRNAs/diseases for new disease/lncRNA without any known disease/lncRNA association information. All these results show that IDHIMIRW can obtain a more reliable performance for predicting lncRNAdisease associations.
Effectiveness of using the topological similarity network to construct the lncRNAdisease heterogeneous network
In order to evaluate the effectiveness of using the topological similarity network to construct the lncRNAdisease heterogeneous network for improving the predictive performance, we designed another method of IDHIAVG by adopting the strategy of averaging three lncRNA similarity matrices of LncNet1, LncNet2 and LncNet3 to form the lncRNA integration network (i.e., LncINet), averaging of three disease similarity matrices of DisNet1, DisNet2, and DisNet3 to form the disease integration network (i.e., DisINet). IDHIAVG combines these two integration similarity networks of LncINet and DisINet with known lncRNAdisease bipartite network to construct the lncRNAdisease heterogeneous network on which RWRH algorithm is implemented to predict the potential lncRNAdisease associations. The compared results of IDHIAVG and IDHIMIRW on the smallscale lncRNAdisease heterogeneous network (HNet_{S}) and largescale ncRNAdisease heterogeneous network (HNet_{L}) in LOOCV test are shown in Table 3. We can see the AUC and AUPR values of IDHIMIRW are higher than that of IHDIAVG. These results demonstrate that the strategy of using RWR and PPMI to form lncRNA/disease topological similarity networks and further constructing the lncRNAdisease heterogeneous network is effective. It can improve the performance of predicting lncRNAdisease associations.
The effect of parameters
There are four main parameters in our method, which are the restart probability α in RWR, and the restart probability β, jumping probability γ, parameter η in RWRH. η is used to weight the importance of lncRNA topological similarity subnetwork and disease topological similarity subnetwork. To evaluate the effect of parameters, we implemented our IDHIMIRW on HNet_{L} heterogeneous network in LOOCV test with different α, β, γ, and η values (varying from 0.1 to 0.9 with scale 0.1). Additional file 3 shows the AUC and AUPR values of IDHIMIRW with different parameters. We can see that the performance of IDHIMIRW is robust to the value of these four parameters. Additional file 4 presents the AUC and AUPR values of IDHIMIRW on HNet_{S} heterogeneous network in LOOCV test. In this work, we selected α = 0.9, γ = 0.9, η = 0.2, and β = 0.6.
Case studies and the potential lncRNAdisease associations analysis
We used breast cancer, stomach cancer, and colorectal cancer as the cases to predict their potential associated lncRNAs with our IDHIMIRW. For a given disease, all known lncRNAs associated with this given disease were considered as the seed nodes, and other remaining lncRNAs (i.e., without known association with the given disease) were considered as the candidates associated with the given disease. By implementing our IDHIMIRW algorithm on the largescale lncRNAdisease heterogeneous network, and according to the lncRNAdisease associations ranking scores from large to small, we extract top 15 potential association lncRNAs for each cancer. These top potential association lncRNAs are listed in Additional files 5, 6, and 7.
For breast cancer which is one of most common cancers and the second leading cause of cancer death [49], 13 out of 15 potential association lncRNAs are supported by recent literatures. For example, Diego ChaconCortes et al. [50] investigated six SNPs (i.e. rs1888138, rs7336610, rs9589207, rs17735387, rs4248505, rs1428) in the lncRNA MIR17HG, and identified significant association between rs4248505 at the allele level and rs4248505/ rs7336610 at the haplotype level susceptibility to breast cancer, which means that lncRNA MIR17HG plays the main role in the pathophysiology of breast cancer. Fu et al. [51] found lncRNA SNHG1, SNORD28 and snomiR28 are all significantly upregulated in breast tumors. LncRNA can be used as the biomarkers and therapeutic targets in combatting breast cancer [52].
For stomach cancer (or gastric cancer) which is the third leading cause of cancer mortality in the world [53, 54], 11 out of 15 potential association lncRNAs can be supported by recent literatures. For example, Hu et al. [55] discovered that lncRNA CRNDE increases gastric cancer cell viability and promotes proliferation by targeting miR145. Pan et al. [56] found that lncRNA DANCR is activated by SALL4 and promotes the proliferation and invasion of gastric cancer cells. Specially, lncRNA LINC01816 (also known as LOC100133985) associated with stomach cancer has been confirmed by Tian et al. [57]. LncRNA LINC01816 is downregulated and might be protective factor in gastric cancer.
For colorectal cancer which is the third most commonly diagnosed cancer in males and the second in females [58], 12 out of 15 potential association lncRNAs can be supported by recent literatures. For example, Zhao et al. [59] found that lncRNA SNHG1 promotes cell proliferation by affecting P53 in colorectal cancer. Zhang et al. [60] found that lncRNA CYTOR (also known as LINC00152) downregulated by miR376c3p restricts viability and promotes apoptosis of colorectal cancer cells.
To further discover the evidences for the predicted lncRNAs associated with cancers, we analyzed the RNAseq and clinical data from TCGA for breast cancer, stomach cancer and colorectal cancer. For colorectal cancer, the RNASeq data including 19,676 protein coding genes, 15,513 lncRNA genes in 41 normal samples and 474 tumor samples were downloaded from TCGA. Using DESeq2 [61] algorithm, we found 1230 significantly upregulated lncRNAs and 568 downregulated lncRNAs by setting log2FC > 1 (or < − 1), FDR < 0.001. Among three unvalidated lncRNA, lncRNA SNHG7 (14th) is significantly upregulated in tumor samples (Fig. 3a). Meanwhile, we downloaded the clinical data of 448 tumor samples, and KaplanMeier survival analysis shows that lncRNA LINC01816 (10th) can divided the 448 colorectal cancer patients into high and lowrisk groups with different survival times (Fig. 3b). The results of RNAseq and clinical data analysis for breast cancer and stomach cancer are shown in.
Additional files 8 and 9. 5/6 unvalidated lncRNAs are significantly differentially expressed in corresponding cancers.
In summary, 36 (13 for breast cancer, 11 for stomach cancer, 12 for colorectal cancer) out of 45 potential association lncRNAs have been supported by recent literatures. By analyzing the nine unvalidated potential association lncRNAs, we found that six lncRNAs are differentially expressed in corresponding cancers, and lncRNA LINC01816 is associated with the survival of patients with colorectal cancer. Results of these three case studies show that IDHIMIRW can effectively predict the new association lncRNAs for a disease.
Discussion
LncRNAs play important roles in the development of human complex diseases. More and more attentions have been paid to discover the lncRNA functions related with human complex disease. Most previous computational methods only focus on the smallscale lncRNAdisease heterogeneous network (i.e., involving small numbers of lncRNAs and diseases) to predict the lncRNAdisease associations. To address this issue, IDHIMIRW was developed to predict the potential lncRNAdisease associations based on a largescale lncRNAdisease heterogeneous network (containing 7637 lncRNAs and 6453 diseases). Instead of calculating similarities of lncRNAs and diseases only involving in known lncRNAdisease associations, IDHIMIRW used three lncRNArelated information (i.e., lncRNA expression profiles, lncRNAmiRNA interactions, and lncRNAprotein interactions) to form three lncRNA similarity networks, and three diseaserelated information (i.e., disease semantic similarity, diseasemiRNA associations, and diseasegene associations) to form three disease similarity networks. Furthermore, instead of directly fusing those similarity networks, IDHIMIRW applied the RWR algorithm on each lncRNA/disease similarity network to capture the topological similarity, and the PPMI to generate lncRNA/disease topological similarity network. The largescale lncRNAdisease heterogeneous network was constructed by combing the lncRNA topological similarity network, disease topological similarity network, and the known lncRNAdisease bipartite graph. Then, the RWRH algorithm was used to prioritize candidate lncRNAs for each query disease. Our experiment results show that IDHIMIRW achieves a better performance than other existing methods. We evaluated the effectiveness of introducing multiple information sources and capturing topological similarities, Tables 2 and 3 show that those strategies are effective for improving the performance of predicting lncRNAdisease associations. In addition, more novel lncRNAdisease associations predicted by IDHIMIRW are supported by recent literatures, which means that IDHIMIRW can effectively predict the novel association lncRNAs for a query disease. All the predicted lncRNAdisease associations are provided in Additional file 10.
Although IDHIMIRW can effectively predict potential lncRNAdisease associations, there are still several issues need to be further addressed in the future. First, IDHIMIRW used three lncRNArelated and three diseaserelated information to generate similarity matrices, we still expect to integrate more information (e.g., lncRNA GO annotations and disease MeSH annotation) to better predict lncRNAdisease association. Second, the averaging strategy was used to integrate the lncRNA/disease topological similarity matrices, we expect to design better integration approaches in future work to measure the different contributions of multiple lncRNA/disease similarities.
Conclusions
In this study, we proposed a novel networkbased method (namely IDHIMIRW) for identifying potential lncRNAdisease associations. We built a largescale lncRNAdisease heterogeneous network by integrating multiple lncRNArelated information (i.e. lncRNA expression profiles, lncRNAmiRNA interactions, and lncRNAprotein interactions), multiple diseaserelated information (i.e. disease semantic similarity, diseasemiRNA associations, and diseasegene associations), and known lncRNAdisease association information using RWR and PPMI. Our experimental results show that IDHIMIRW can achieve higher performance than other stateoftheart methods, and we found lncRNA LINC01816 is associated with the survival of colorectal cancer patients. These results indicate that IDHIMIRW will contribute to the identification of potential lncRNAdisease associations.
Methods
Datasets
We collected lncRNA expression profile, lncRNAmiRNA interaction, and lncRNAprotein interaction data for constructing the lncRNA similarity networks, and Diseases Ontology (DO) information, diseasemiRNA association, and diseaseprotein association data for constructing the disease similarity networks. All lncRNAs are annotated by ensembl gene ID, and all diseases are annotated by Disease Ontology ID.
LncRNA expression profiles were downloaded from EMBLEBI (EMTAB5214), which includes the expression profiles in 53 human tissue samples. LncRNAmiRNA interactions and lncRNAprotein interactions were collected from starBase v2.0 [43], NPInter v3.0 [44], and RAID v2.0 [45] databases. Diseases ontology terms were collected from the Disease ontology [46]. DiseasesmiRNAs associations were collected from HMDD v2.0 [47]. Diseasegene associations were collected from DisGeNet [48]. Known lncRNAdisease associations were collected from lncRNAdisease [15], lnc2Cancer [16], and GeneRIF [62]. Details and statistics of these data are shown in Additional file 11.
An overview of the IDHIMIRW algorithm
Our IDHIMIRW algorithm consists of the following four steps. Step 1, build three lncRNA similarity networks (i.e., LncNet1, LncNet2, LncNet3) based on lncRNA expression profiles, lncRNAmiRNA interactions, and lncRNAprotein interactions, and also build three disease similarity networks (i.e., DisNet1, DisNet2, DisNet3) based on disease ontology, diseasemiRNA associations, and diseasegene associations. Step 2, form the lncRNA topological similarity network (LncTSNet) and disease topological similarity network (DisTSNet) by fusing lncRNA and disease multiple topological similarities obtained through implementing RWR on lncRNA similarity network (LncNet1, LncNet2, LncNet3) and disease similarity network (DisNet1, DisNet2, DisNet3), respectively. Step 3, construct a largescale lncRNAdisease heterogeneous network by integrating lncRNA topological similarity network (LncTSNet), disease topological similarity network (DisTSNet), and known lncRNAdisease associations. Step 4, implement RWRH on the lncRNAdisease heterogeneous network for predicting the potential lncRNAdisease associations. The flowchart of IDHIMIRW is shown in Fig. 4.
Building lncRNA/disease similarity networks
By calculating the Pearson correlation coefficient of any lncRNA pair with expression profiles and fixing the Pvalue threshold (< 0.01), we built the LncNet1 lncRNA similarity weighted network. Based on Gaussian interaction profile kernel similarity [18, 63] of lncRNAmiRNA and lncRNAprotein interactions, we computed the Gaussian interaction profile kernel similarity between any pair of lncRNA l_{i} and lncRNA l_{j}, then built the LncNet2 and LncNet3 lncRNA similarity weighted networks, respectively. Gaussian interaction profile kernel similarity between lncRNA l_{i} and lncRNA l_{j} is calculated.
where, the interaction profile IP(l_{i}) is the binary vector of lncRNAmiRNA (or lncRNAprotein) interactions encoding the presence or absence of interactions between lncRNA l_{i} and miRNA (or protein) in the lncRNAmiRNA (or lncRNAprotein) interaction dataset, κ_{l} controls the kernel bandwidth, and N_{l} is the total number of lncRNAs.
Based on the structure of a directed acyclic graph (DAG) in Disease Ontology, we used the function “doSim” form R package “DOSE” [64] to obtain the similarity between any disease pair, then built the DisNet1 disease similarity weighted network. Based on Gaussian interaction profile kernel similarity of diseasemiRNA and diseasegene associations, we computed the Gaussian interaction profile kernel similarity between any pair of disease d_{i} and d_{j}, then built the DisNet2 and DisNet3 disease similarity weighted networks, respectively.
where, the interaction profile IP(d_{i}) is the binary vector of diseasemiRNA (or diseasegene) associations encoding the presence or absence of associations between d_{i} and miRNA (or gene) in the diseasemiRNA (or diseasegene) association dataset. κ_{d} controls the kernel bandwidth, and N_{d} is the total number of diseases.
Generating lncRNA/disease topological similarity networks
Instead of directly fusing six similarity networks (i.e., LncNet1, LncNet2, LncNet3, DisNet1, DisNet2, and DisNet3), we captured the network topological structural features by implementing the RWR algorithm on each similarity network. The RWR algorithm is a network diffusion algorithm, which has been extensively applied to analyze the complex biological network [65,66,67,68,69]. By considering both local and global topological connectivity patterns within network, the RWR algorithm can fully exploit the direct or indirect relation between nodes [65]. The RWR algorithm can be formulated as:
where, S^{t} is the distribution matrix in which the (i, j)th element denotes the distribution probability of node j being visited from node i after t iterations in the random walk process and S^{0} is the initial distribution matrix in which S^{0}(i, i) = 1, S^{0}(i, j) = 0, ∀j ≠ i. α is restart probability controlling the relative influence of local and global topological information. B is the weighted adjacency matrix of lncRNA (or disease).
When the L1 norm of ΔS = S^{t + 1} − S^{t}is less than a small positive ε (we set ε = 10^{−10}), we can obtain a stationary distribution matrix S, which was referred as the diffusion state of each node [70]. The element S(i, j) in diffusion state matrix S represents the probability of RWR starting node i and ending up at node j in equilibrium. When the diffusion states of two nodes are close, which suggests that they may have similar positions with respect to other nodes in the network and they probably share similar functions.
Motivated by Gligorijevic et.al. [69], we then calculated the topological similarity of each node pair by using PPMI, which is defined as:
The matrix MI is a nonsymmetric matrix, thus we use the average of MI(i, j) and MI(j, i) to represent the topological similarity of node i and node j. After obtaining three lncRNA topological similarity matrices \( {X}_L^1 \), \( {X}_L^2 \), \( {X}_L^3 \) of LncNet1, LncNet2, LncNet3, and three disease topological similarity matrices \( {X}_D^1 \), \( {X}_D^2 \), \( {X}_D^3 \) of DisNet1, DisNet2, DisNet3, we can form the integration lncRNA topological similarity matrix \( {X}_L^{\prime } \) by averaging three lncRNA topological similarity matrices, and the disease topological similarity matrix \( {X}_D^{\prime } \) by averaging three disease topological similarity matrices, that is, \( {X}_L^{\prime }=\left({X}_L^1+{X}_L^2+{X}_L^3\right)/3 \), \( {X}_D^{\prime }=\left({X}_D^1+{X}_D^2+{X}_D^3\right)/3 \). Thus, we generated the lncRNA topological similarity network LncTSNet, and disease topological similarity network DisTSNet.
Constructing the lncRNAdisease heterogeneous network
By integrating the LncTSNet and DisTSNet networks with known lncRNAdisease bipartite network, we can construct the lncRNAdisease heterogeneous network whose adjacency matrix can be defined as:
where, A_{L} and A_{D} represent the weighted adjacency matrices of LncTSNet and DisTSNet, respectively; A_{LD} is the adjacency matrix of the lncRNAdisease bipartite graph; A_{DL} represents the transpose of A_{LD}. If there is association between lncRNA i and disease j in known lncRNAdisease associations, A_{LD}(i, j) = 1, otherwise, A_{LD}(i, j) = 0.
Implementing RWRH algorithm for predicting lncRNAdisease associations
To predict the association between lncRNA and disease, we adopted the RWRH (random walk with restart on heterogeneous network) algorithm [42] to prioritize candidate lncRNAs associated with a given disease. The RWRH algorithm is wellknown heterogeneous networkbased algorithm to infer the genephenotype relationship. It can effectively capture the complementarity of two kinds of node within heterogeneous network, which is widely used to predict the association problem [42, 71, 72]. The RWRH algorithm on the lncRNAdisease heterogeneous network can be formulated as:
where, p^{t} is a probability vector in which the ith element holds the probability of finding the random walker at node i at step t; β ∈ (0, 1) is restart probability; p^{0} is the initial probability vector for lncRNAdisease heterogeneous network which is defined as \( {p}^0=\left[\begin{array}{c}\eta \ast {u}_0\\ {}\left(1\eta \right)\ast {v}_0\end{array}\right] \). u_{0} and v_{0} represent the initial probability of LncTSNet and DisTSNet, respectively. The initial probability u_{0} of LncTSNet network is set such that all the seed nodes are assigned to the equal probabilities with the sum of probabilities equal to 1. Similarity, the initial probability v_{0} of DisTSNet network is given. The parameter η ∈ (0, 1) is used to weight the importance of each subnetwork.
\( M=\left[\begin{array}{cc}{M}_L& {M}_{LD}\\ {}{M}_{DL}& {M}_D\end{array}\right] \) is the transition matrix of the lncRNAdisease heterogenous network, where M_{L} and M_{D} are the intrasubnetwork transition matrices, M_{LD} and M_{DL} are the intersubnetwork transition matrices. Let γ be the jumping probability, that is, the probability of random walker jumping from lncRNA network to disease network or vice versa. Thus, the transition probability M_{L}(i, j) from lncRNA l_{i} to lncRNA l_{j} and the transition probability M_{D} (i, j) from disease d_{i} to disease d_{j} are defined as
The transition probability from lncRNA l_{i} to disease d_{j} and the transition probability from disease d_{i} to lncRNA l_{j} are described as:
After some steps, the steady state probability vector p^{∗} = p^{∞} can be obtained by performing the iteration until the difference between p^{t} and p^{t + 1} (measured by the L_{1} norm) fall below 10^{−10}. p^{∗} gives the ranking score of every lncRNA for a query disease. The lncRNAs with maximum in p^{∗} are considered as the most probable associated lncRNAs of the query disease.
Abbreviations
 AUC:

The area under the receiver operating characteristic curve
 AUPR:

The area under the precisionrecall curve
 DAG:

Directed acyclic graph
 DO:

Disease ontology
 FPR:

Falsepositive rate
 lncRNAs:

Long noncoding RNAs
 LOOCV:

Leaveoneout cross validation; ROC: receiver operating characteristic
 PPMI:

Positive pointwise mutual information
 PR:

Precisionrecall
 RWR:

Random walk with restart
 RWRH:

Random walk with restart on heterogeneous network
 TPR:

Truepositive rate
References
 1.
Quinn JJ, Chang HY. Unique features of long noncoding RNA biogenesis and function. Nat Rev Genet. 2016;17(1):47–62.
 2.
Rinn JL, Chang HY. Genome regulation by long noncoding RNAs. Annu Rev Biochem. 2012;81:145–66.
 3.
Huarte M. The emerging role of lncRNAs in cancer. Nat Med. 2015;21(11):1253–61.
 4.
Schmitt AM, Chang HY. Long noncoding RNAs in Cancer pathways. Cancer Cell. 2016;29(4):452–63.
 5.
Quinodoz S, Guttman M. Long noncoding RNAs: an emerging link between gene regulation and nuclear organization. Trends Cell Biol. 2014;24(11):651–63.
 6.
Yang L, Lin C, Jin C, Yang JC, Tanasa B, Li W, Merkurjev D, Ohgi KA, Meng D, Zhang J, et al. lncRNAdependent mechanisms of androgenreceptorregulated gene activation programs. Nature. 2013;500(7464):598–602.
 7.
Lee S, Kopp F, Chang TC, Sataluri A, Chen B, Sivakumar S, Yu H, Xie Y, Mendell JT. Noncoding RNA NORAD regulates genomic stability by sequestering PUMILIO proteins. Cell. 2016;164(1–2):69–80.
 8.
Yan X, Hu Z, Feng Y, Hu X, Yuan J, Zhao SD, Zhang Y, Yang L, Shan W, He Q, et al. Comprehensive genomic characterization of long noncoding RNAs across human cancers. Cancer Cell. 2015;28(4):529–40.
 9.
Wapinski O, Chang HY. Long noncoding RNAs and human disease. Trends Cell Biol. 2011;21(6):354–61.
 10.
Chen X, Yan CC, Zhang X, You ZH. Long noncoding RNAs and complex diseases: from experimental results to computational models. Brief Bioinform. 2017;18(4):558–76.
 11.
Gupta RA, Shah N, Wang KC, Kim J, Horlings HM, Wong DJ, Tsai MC, Hung T, Argani P, Rinn JL, et al. Long noncoding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature. 2010;464(7291):1071–6.
 12.
Ji P, Diederichs S, Wang W, Boing S, Metzger R, Schneider PM, Tidow N, Brandt B, Buerger H, Bulk E, et al. MALAT1, a novel noncoding RNA, and thymosin beta4 predict metastasis and survival in earlystage nonsmall cell lung cancer. Oncogene. 2003;22(39):8031–41.
 13.
Fan M, Li X, Jiang W, Huang Y, Li J, Wang Z. A long noncoding RNA, PTCSC3, as a tumor suppressor and a target of miRNAs in thyroid cancer cells. Exp Ther Med. 2013;5(4):1143–6.
 14.
Quek XC, Thomson DW, Maag JL, Bartonicek N, Signal B, Clark MB, Gloss BS, Dinger ME. lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs. Nucleic Acids Res. 2015;43(Database issue):D168–73.
 15.
Chen G, Wang Z, Wang D, Qiu C, Liu M, Chen X, Zhang Q, Yan G, Cui Q. LncRNADisease: a database for longnoncoding RNAassociated diseases. Nucleic Acids Res. 2013;41(Database issue):D983–6.
 16.
Ning S, Zhang J, Wang P, Zhi H, Wang J, Liu Y, Gao Y, Guo M, Yue M, Wang L, et al. Lnc2Cancer: a manually curated database of experimentally supported lncRNAs associated with various human cancers. Nucleic Acids Res. 2016;44(D1):D980–5.
 17.
Zhao Y, Li H, Fang S, Kang Y, Wu W, Hao Y, Li Z, Bu D, Sun N, Zhang MQ, et al. NONCODE 2016: an informative and valuable data source of long noncoding RNAs. Nucleic Acids Res. 2016;44(D1):D203–8.
 18.
Chen X, Yan GY. Novel human lncRNAdisease association inference based on lncRNA expression profiles. Bioinformatics. 2013;29(20):2617–24.
 19.
Chen X, Yan CC, Luo C, Ji W, Zhang Y, Dai Q. Constructing lncRNA functional similarity network based on lncRNAdisease associations and disease semantic similarity. Sci Rep. 2015;5:11338.
 20.
Huang YA, Chen X, You ZH, Huang DS, Chan KC. ILNCSIM: improved lncRNA functional similarity calculation model. Oncotarget. 2016;7(18):25902–14.
 21.
Chen X, Huang YA, Wang XS, You ZH, Chan KC. FMLNCSIM: fuzzy measurebased lncRNA functional similarity calculation model. Oncotarget. 2016;7(29):45948–58.
 22.
Liu MX, Chen X, Chen G, Cui QH, Yan GY. A computational framework to infer human diseaseassociated long noncoding RNAs. PLoS One. 2014;9(1):e84408.
 23.
Chen X. Predicting lncRNAdisease associations and constructing lncRNA functional similarity network based on the information of miRNA. Sci Rep. 2015;5:13186.
 24.
Zhao T, Xu J, Liu L, Bai J, Xu C, Xiao Y, Li X, Zhang L. Identification of cancerrelated lncRNAs through integrating genome, regulome and transcriptome features. Mol BioSyst. 2015;11(1):126–36.
 25.
Wang J, Ma R, Ma W, Chen J, Yang J, Xi Y, Cui Q. LncDisease: a sequence based bioinformatics tool for predicting lncRNAdisease associations. Nucleic Acids Res. 2016;44(9):e90.
 26.
Lan W, Li M, Zhao K, Liu J, Wu FX, Pan Y, Wang J. LDAP: a web server for lncRNAdisease association prediction. Bioinformatics. 2017;33(3):458–60.
 27.
Fu G, Wang J, Domeniconi C, Yu G. Matrix factorizationbased data fusion for the prediction of lncRNAdisease associations. Bioinformatics. 2018;34(9):1529–37.
 28.
Cheng L, Hu Y, Sun J, Zhou M, Jiang Q. DincRNA: a comprehensive webbased bioinformatics toolkit for exploring disease associations and ncRNA function. Bioinformatics. 2018;34(11):1953–6.
 29.
Yu G, Wang Y, Wang J, Fu G, Guo M, Domeniconi C: Weighted matrix factorization based data fusion for predicting lncRNAdisease associations. In: IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2018: 572–577.
 30.
Sun J, Shi H, Wang Z, Zhang C, Liu L, Wang L, He W, Hao D, Liu S, Zhou M. Inferring novel lncRNAdisease associations based on a random walk model of a lncRNA functional similarity network. Mol BioSyst. 2014;10(8):2074–81.
 31.
Yang X, Gao L, Guo X, Shi X, Wu H, Song F, Wang B. A network based method for analysis of lncRNAdisease associations and prediction of lncRNAs implicated in diseases. PLoS One. 2014;9(1):e87797.
 32.
Zhou M, Wang X, Li J, Hao D, Wang Z, Shi H, Han L, Zhou H, Sun J. Prioritizing candidate diseaserelated long noncoding RNAs by walking on the heterogeneous lncRNA and disease network. Mol BioSyst. 2015;11(3):760–9.
 33.
Chen X. KATZLDA: KATZ measure for the lncRNAdisease association prediction. Sci Rep. 2015;5:16840.
 34.
Chen X, You ZH, Yan GY, Gong DW. IRWRLDA: improved random walk with restart for lncRNAdisease association prediction. Oncotarget. 2016;7(36):57919–31.
 35.
Cheng L, Shi H, Wang Z, Hu Y, Yang H, Zhou C, Sun J, Zhou M. IntNetLncSim: an integrative network analysis method to infer human lncRNA functional similarity. Oncotarget. 2016;7(30):47864–74.
 36.
Yu G, Fu G, Lu C, Ren Y, Wang J. BRWLDA: birandom walks for predicting lncRNAdisease associations. Oncotarget. 2017;8(36):60429–46.
 37.
Wang P, Guo Q, Gao Y, Zhi H, Zhang Y, Liu Y, Zhang J, Yue M, Guo M, Ning S, et al. Improved method for prioritization of disease associated lncRNAs based on ceRNA theory and functional genomics data. Oncotarget. 2017;8(3):4642–55.
 38.
Yao Q, Wu L, Li J, Yang LG, Sun Y, Li Z, He S, Feng F, Li H, Li Y. Global prioritizing disease candidate lncRNAs via a multilevel composite network. Sci Rep. 2017;7:39516.
 39.
Ding L, Wang M, Sun D, Li A. TPGLDA: novel prediction of associations between lncRNAs and diseases via lncRNAdiseasegene tripartite graph. Sci Rep. 2018;8(1):1065.
 40.
Gu C, Liao B, Li X, Cai L, Li Z, Li K, Yang J. Global network random walk for predicting potential human lncRNAdisease associations. Sci Rep. 2017;7(1):12442.
 41.
Zhang J, Zhang Z, Chen Z, Deng L. Integrating multiple heterogeneous networks for novel LncRNAdisease association inference. IEEE/ACM Trans Comput Biol Bioinform. 2017.
 42.
Li Y, Patra JC. Genomewide inferring genephenotype relationship by walking on the heterogeneous network. Bioinformatics. 2010;26(9):1219–24.
 43.
Li JH, Liu S, Zhou H, Qu LH, Yang JH. starBase v2.0: decoding miRNAceRNA, miRNAncRNA and proteinRNA interaction networks from largescale CLIPSeq data. Nucleic Acids Res. 2014;42(Database issue):D92–7.
 44.
Hao Y, Wu W, Li H, Yuan J, Luo J, Zhao Y, Chen R: NPInter v3.0: an upgraded database of noncoding RNAassociated interactions. Database (Oxford) 2016, 2016.
 45.
Yi Y, Zhao Y, Li C, Zhang L, Huang H, Li Y, Liu L, Hou P, Cui T, Tan P, et al. RAID v2.0: an updated resource of RNAassociated interactions across organisms. Nucleic Acids Res. 2017;45(D1):D115–8.
 46.
Schriml LM, Arze C, Nadendla S, Chang YW, Mazaitis M, Felix V, Feng G, Kibbe WA. Disease ontology: a backbone for disease semantic integration. Nucleic Acids Res. 2012;40(Database issue):D940–6.
 47.
Li Y, Qiu C, Tu J, Geng B, Yang J, Jiang T, Cui Q. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 2014;42(Database issue):D1070–4.
 48.
Pinero J, Bravo A, QueraltRosinach N, GutierrezSacristan A, DeuPons J, Centeno E, GarciaGarcia J, Sanz F, Furlong LI. DisGeNET: a comprehensive platform integrating information on human diseaseassociated genes and variants. Nucleic Acids Res. 2017;45(D1):D833–9.
 49.
Cancer Genome Atlas N: Comprehensive molecular portraits of human breast tumours. Nature 2012, 490(7418):61–70.
 50.
ChaconCortes D, Smith RA, Lea RA, Youl PH, Griffiths LR. Association of microRNA 1792 cluster host gene (MIR17HG) polymorphisms with breast cancer. Tumour Biol. 2015;36(7):5369–76.
 51.
Yu F, Bracken CP, Pillman KA, Lawrence DM, Goodall GJ, Callen DF, Neilsen PM. p53 represses the oncogenic SnoMiR28 derived from a SnoRNA. PLoS One. 2015;10(6):e0129190.
 52.
Lin A, Li C, Xing Z, Hu Q, Liang K, Han L, Wang C, Hawke DH, Wang S, Zhang Y, et al. The LINKA lncRNA activates normoxic HIF1alpha signalling in triplenegative breast cancer. Nat Cell Biol. 2016;18(2):213–24.
 53.
Siegel RL, Miller KD, Jemal A. Cancer statistics, 2016. CA Cancer J Clin. 2016;66(1):7–30.
 54.
Ge S, Xia X, Ding C, Zhen B, Zhou Q, Feng J, Yuan J, Chen R, Li Y, Ge Z, et al. A proteomic landscape of diffusetype gastric cancer. Nat Commun. 2018;9(1):1012.
 55.
Hu CE, Du PZ, Zhang HD, Huang GJ. Long noncoding RNA CRNDE promotes proliferation of gastric Cancer cells by targeting miR145. Cell Physiol Biochem. 2017;42(1):13–21.
 56.
Pan L, Liang W, Gu J, Zang X, Huang Z, Shi H, Chen J, Fu M, Zhang P, Xiao X, et al. Long noncoding RNA DANCR is activated by SALL4 and promotes the proliferation and invasion of gastric cancer cells. Oncotarget. 2018;9(2):1915–30.
 57.
Tian X, Zhu X, Yan T, Yu C, Shen C, Hong J, Chen H, Fang JY. Differentially expressed lncRNAs in gastric Cancer patients: a potential biomarker for gastric Cancer prognosis. J Cancer. 2017;8(13):2575–86.
 58.
Torre LA, Bray F, Siegel RL, Ferlay J, LortetTieulent J, Jemal A. Global cancer statistics, 2012. CA Cancer J Clin. 2015;65(2):87–108.
 59.
Zhao Y, Qin ZS, Feng Y, Tang XJ, Zhang T, Yang L. Long noncoding RNA (lncRNA) small nucleolar RNA host gene 1 (SNHG1) promote cell proliferation in colorectal cancer by affecting P53. Eur Rev Med Pharmacol Sci. 2018;22(4):976–84.
 60.
Zhang YH, Fu J, Zhang ZJ, Ge CC, Yi Y. LncRNALINC00152 downregulated by miR376c3p restricts viability and promotes apoptosis of colorectal cancer cells. Am J Transl Res. 2016;8(12):5286–97.
 61.
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNAseq data with DESeq2. Genome Biol. 2014;15(12):550.
 62.
Lu Z, Cohen KB, Hunter L. GeneRIF quality assurance as summary revision. Pac Symp Biocomput. 2007:269–80.
 63.
van Laarhoven T, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drugtarget interaction. Bioinformatics. 2011;27(21):3036–43.
 64.
Yu G, Wang LG, Yan GR, He QY. DOSE: an R/Bioconductor package for disease ontology semantic and enrichment analysis. Bioinformatics. 2015;31(4):608–9.
 65.
Luo Y, Zhao X, Zhou J, Yang J, Zhang Y, Kuang W, Peng J, Chen L, Zeng J. A network integration approach for drugtarget interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun. 2017;8(1):573.
 66.
Cao M, Pietras CM, Feng X, Doroschak KJ, Schaffner T, Park J, Zhang H, Cowen LJ, Hescott BJ. New directions for diffusionbased network prediction of protein function: incorporating pathways with confidence. Bioinformatics. 2014;30(12):i219–27.
 67.
Navlakha S, Kingsford C. The power of protein interaction networks for associating genes with diseases. Bioinformatics. 2010;26(8):1057–63.
 68.
Liao CS, Lu K, Baym M, Singh R, Berger B. IsoRankN: spectral methods for global alignment of multiple protein networks. Bioinformatics. 2009;25(12):i253–8.
 69.
Gligorijevic V, Barot M, Bonneau R. DeepNF: deep network fusion for protein function prediction. Bioinformatics. 2018;34(22):3873–81.
 70.
Cho H, Berger B, Peng J. Diffusion component analysis: unraveling functional topology in biological networks. Res Comput Mol Biol. 2015;9029:62–4.
 71.
Chen X, Liu MX, Yan GY. Drugtarget interaction prediction by random walk on the heterogeneous network. Mol BioSyst. 2012;8(7):1970–8.
 72.
Valdeolivas A, Tichit L, Navarro C, Perrin S, Odelin G, Levy N, Cau P, Remy E, Baudot A. Random walk with restart on multiplex and heterogeneous biological networks. Bioinformatics. 2018.
Acknowledgements
Not applicable.
Funding
This work was supported by the National Natural Science Foundation of China under Grant No. 61873202, No. 61473232 and No. 91430111; and the National Library of Medicine grants of United States under Grant No. R00LM011673. The funding bodies did not play any roles in the design of the study, in the collection, analysis, or interpretation of data, or in writing the manuscript.
Availability of data and materials
IDHIMIRW is available at https://github.com/NWPU903PR/IDHIMIRW, and the datasets used and/or analyzed during the current study are available from the corresponding references.
Author information
Affiliations
Contributions
XNF collected the dataset, performed the experiments, and wrote the initial manuscript. SWZ and SL conceived and designed the experiments. XNF, SYZ and KZ analyzed the results. XNF and SYZ developed the codes. SWZ revised the manuscript. All authors participated in the definition of the process, the discussion of relevant aspects, and approved the final manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional files
Additional file 1:
LncRNA data processing procedure. (TIF 1447 kb)
Additional file 2:
Disease data processing procedure. (TIF 1340 kb)
Additional file 3:
AUPR values of IDHIMIRW on the largescale lncRNAdisease heterogeneous with different parameters in LOOCV test. (A) AUC values with different α. (B) AUC values with different γ. (C) AUC values with different η. (D) AUC values with different β. (E) AUPR values with different α. (F) AUPR values with different γ. (G) AUPR values with different η. (H) AUPR values with different β. (TIF 3520 kb)
Additional file 4:
AUC and AUPR values of IDHIMIRW on the smallscale lncRNAdisease heterogeneous with different parameters in LOOCV test. (A) AUC values with different α. (B) AUC values with different γ. (C) AUC values with different η. (D) AUC values with different β. (E) AUPR values with different α. (F) AUPR values with different γ. (G) AUPR values with different η. (H) AUPR values with different β. (TIF 3705 kb)
Additional file 5:
The top 15 predicted associated lncRNAs for breast cancer. (XLSX 9 kb)
Additional file 6:
The top 15 predicted associated lncRNAs for stomach cancer. (XLSX 9 kb)
Additional file 7:
The top 15 predicted associated lncRNAs for colorectal cancer. (XLSX 9 kb)
Additional file 8:
The results of RNASeq data analysis for breast cancer. (A) heatmap of top 200 most significantly dysregulated lncRNA expression values. (B) heatmap of lncRNA AL157395.1 expression values. (C) boxplot of lncRNA AL157395.1 expression in normal and tumor samples. (D) heatmap of lncRNA AP001528.1 expression values. (E) boxplot of lncRNA AP001528.1 expression in normal and tumor samples. (TIF 9850 kb)
Additional file 9
The results of RNASeq data analysis for stomach cancer. (A) heatmap of top 200 most significantly dysregulated lncRNA expression values. (B) heatmap of lncRNA KCNQ1OT1 expression values. (C) boxplot of lncRNA KCNQ1OT1 expression in normal and tumor samples. (D) heatmap of lncRNA DLEU2 expression values. (E) boxplot of lncRNA DLEU2 expression in normal and tumor samples. (F) heatmap of lncRNA LINC00299 expression values. (G) boxplot of lncRNA LINC00299 expression in normal and tumor samples. (TIF 9211 kb)
Additional file 10:
The predicted lncRNAdisease associations. (TXT 180 kb)
Additional file 11:
Details and statistics of collected data. (DOCX 34 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Fan, XN., Zhang, SW., Zhang, SY. et al. Prediction of lncRNAdisease associations by integrating diverse heterogeneous information sources with RWR algorithm and positive pointwise mutual information. BMC Bioinformatics 20, 87 (2019). https://doi.org/10.1186/s128590192675y
Received:
Accepted:
Published:
Keywords
 Long noncoding RNA
 Disease
 lncRNAdisease association
 Heterogeneous network
 Random walk with restart algorithm