 Research
 Open Access
A novel algorithm based on bi-random walks to identify disease-related lncRNAs
BMC Bioinformatics volume 20, Article number: 569 (2019)
Abstract
Background
There is evidence that lncRNAs are associated with distinct and diverse biological processes. The dysfunction or mutation of lncRNAs is implicated in a wide range of diseases. An accurate computational model can benefit the diagnosis of diseases and help us gain a better understanding of the underlying molecular mechanisms. Although many related algorithms have been proposed, there is still much room to improve their accuracy.
Results
We developed a novel algorithm, BiWalkLDA, to predict disease-related lncRNAs in three real datasets, which have 528 lncRNAs, 545 diseases and 1216 interactions in total. To compare performance with other algorithms, a leave-one-out validation test was performed for BiWalkLDA and three other existing algorithms: SIMCLDA, LDAP and LRLSLDA. Additional tests were designed to analyze the effects of the parameters α, β, l and r, which can help users select suitable values for their own applications. In a case study of prostate cancer, eight of the top ten disease-related lncRNAs reported by BiWalkLDA were previously confirmed in the literature.
Conclusions
In this paper, we develop an algorithm, BiWalkLDA, to predict lncRNA-disease associations by using bi-random walks. It constructs a lncRNA-disease network by integrating interaction profiles and gene ontology information, and it addresses the cold-start problem by using the interaction profiles of neighboring lncRNAs. Bi-random walks were then applied to three real biological datasets. Results show that our method outperforms other algorithms in predicting lncRNA-disease associations in terms of both accuracy and specificity.
Background
It has been reported that only about 1.5% of the human genome encodes proteins, and that humans have only about twice as many protein-coding genes as worm and fruit fly [1]. However, 74.7% of the human genome is covered by primary transcripts [2]. This implies that non-coding RNAs play major roles in the regulation of gene expression. The presence or absence of some non-coding RNAs can down- or up-regulate a cascade of gene expression, which makes them potential drug targets for the treatment of disease. Many researchers have put effort into discovering the functions of long non-coding RNAs (lncRNAs). Recent studies have found strong associations between lncRNAs and diseases. Many lncRNAs play functional roles in diverse biological processes, such as cell proliferation, RNA binding complexes, immune surveillance, neuronal processes, morphogenesis and gametogenesis [3]. Their dysfunction may cause various diseases. For example, HOTAIR induces androgen-independent androgen receptor (AR) activation, which plays a central role in establishing an oncogenic cascade that drives prostate cancer progression. It is also a cause of AR-mediated transcription programs in the absence of androgen [4]. Therefore, predicting lncRNA function offers a new way to understand regulatory mechanisms and disease pathology. There is an urgent demand for fast and accurate algorithms to predict lncRNA-disease associations.
Many computational tools have recently been developed to predict potential lncRNA-disease associations and functional patterns in biological networks [5–10]. These computational methods fall into three main categories. The first is based on matrix factorization, which can be seen as a linear model of latent factors. In these methods, a latent factor is generated for each lncRNA and each disease, and the dot product of two latent factors represents their similarity. The objective of matrix factorization is to learn the latent factors that minimize the prediction error. These methods have recently been widely used to predict lncRNA-disease relationships. For example, MFLDA reduces high-dimensional heterogeneous data sources to low-rank matrices via matrix tri-factorization, which helps to explore and exploit their intrinsic and shared structure [11]. SIMCLDA translates lncRNA-disease association prediction into a recommendation problem, which can be solved with inductive matrix completion (IMC) [12]. However, matrix factorization carries a risk of overfitting and can have high time complexity. The second category is based on the idea of "guilt-by-association". These methods are guided by the assumption that similar diseases or lncRNAs have similar connection patterns. If disease (A) and lncRNA (A) are known to be related, and disease (A) and disease (B) are very similar, we can infer that disease (B) may also be related to lncRNA (A). The performance of these algorithms therefore depends heavily on the accuracy of the similarity measures. Many "guilt-by-association" algorithms have been proposed. For example, RWRlncD infers potential human lncRNA-disease associations by implementing random walk with restart on a lncRNA functional similarity network [13].
IRWRLDA predicts novel lncRNA-disease associations by integrating known lncRNA-disease associations, disease semantic similarity, and various lncRNA similarity measures, and makes predictions with an improved random walk with restart [14]. The third category focuses on classification. Features are extracted from the complex network, and binary classifiers are then applied to predict whether a connection exists between a lncRNA and a disease. A typical algorithm of this kind is LRLSLDA, which constructs a cost function in the lncRNA and disease spaces and makes predictions by combining several classifiers in those spaces into a single classifier [15]. LDAP predicts potential lncRNA-disease associations by using a bagging SVM classifier based on lncRNA similarity and disease similarity [16].
In this paper, we propose a novel algorithm, BiWalkLDA, to predict potential lncRNA-disease associations. The design of BiWalkLDA is guided by the "guilt-by-association" assumption. To construct a more accurate similarity network, we integrate two types of data: interaction profiles and gene ontology. Furthermore, our method is designed to address the cold-start problem. BiWalkLDA uses a bi-random walk algorithm to predict lncRNA-disease associations based on the similarity networks we construct. The experiments were carried out on three real datasets downloaded from the LncRNADisease database [17]. Algorithm performance was evaluated by leave-one-out cross validation (LOOCV). Results show that BiWalkLDA outperforms four other state-of-the-art algorithms and is robust to different datasets and parameters in predicting novel lncRNA-disease associations.
Methods
Construction of disease similarity networks
Association patterns are commonly used to calculate disease similarity [14, 18]. In that case, disease similarity depends on known lncRNA-disease associations. Because such prior knowledge is limited, we use gene ontology (GO) as additional information. Gene ontology information was obtained from previous work [12], which downloaded associations between human genes and gene ontology terms from the Ensembl database [19] and derived disease-gene associations from the DisGeNET database [20]. For each disease, we can thus obtain the corresponding GO set. We then use Jaccard similarity to measure the similarity between two such sets. The calculation is shown in the following formula:
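Written out, assuming the standard Jaccard index and using \(S_{GO}\) (the name the later text gives to the GO-based similarity) for the result:

\[ S_{GO}(d_{i}, d_{j}) = \frac{\left| {GO}_{d_{i}} \cap {GO}_{d_{j}} \right|}{\left| {GO}_{d_{i}} \cup {GO}_{d_{j}} \right|} \]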
where \({GO}_{d_{i}}\) and \({GO}_{d_{j}}\) are the sets of gene ontology terms of diseases \(d_{i}\) and \(d_{j}\), respectively. Like previous algorithms, we also construct a disease similarity network from known lncRNA-disease associations. The construction can be divided into two steps. (1) We construct an adjacency matrix \(A_{n_{l} \times n_{d}}\), where \(n_{l}\) is the number of lncRNAs and \(n_{d}\) is the number of diseases. \(A_{ij}=1\) means that the \(i\)-th lncRNA is associated with disease \(d_{j}\); otherwise \(A_{ij}=0\). (2) Given the matrix \(A\), we let \(IP(d_{i})\) denote the \(i\)-th column of \(A\), which is the interaction profile of disease \(d_{i}\). \(IP(d_{i})\) is a binary vector of length \(n_{l}\) and represents the association pattern of disease \(d_{i}\). We then calculate the similarity between two diseases with a Gaussian kernel:
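In the form commonly used for Gaussian interaction profile kernels, which is assumed here, and with \(S_{GKD}\) as the name the later text uses for this similarity:

\[ S_{GKD}(d_{i}, d_{j}) = \exp\left( -\gamma_{d} \, \left\| IP(d_{i}) - IP(d_{j}) \right\|^{2} \right) \]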
where \(\gamma_{d}\) is the kernel bandwidth, which is calculated as follows:
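A common choice, assumed here, normalizes by the mean squared norm of the disease interaction profiles:

\[ \gamma_{d} = 1 \Big/ \left( \frac{1}{n_{d}} \sum_{i=1}^{n_{d}} \left\| IP(d_{i}) \right\|^{2} \right) \]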
Here \(n_{d}\) is the number of diseases. We have now constructed \(S_{GKD}\) from known lncRNA-disease associations and \(S_{GO}\) from disease-related GO sets. We then use a simple linear model to fuse the two similarity networks.
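Consistent with the two limiting cases of α described next, and writing \(S_{d}\) for the fused disease similarity, the linear fusion can be written as:

\[ S_{d} = \alpha \, S_{GO} + (1 - \alpha) \, S_{GKD} \]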
Here α is a hyperparameter that controls the relative weight of \(S_{GKD}\) and \(S_{GO}\). If α=1, disease similarity is calculated from gene ontology information only. If α=0, disease similarity is calculated from known disease-lncRNA associations only. When the association matrix is sparse, a larger α is preferable so that more similarity information is drawn from gene ontology. This makes the algorithm more robust.
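The construction of the disease similarity network can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Jaccard index is the standard one, the Gaussian bandwidth is taken as the inverse of the mean squared profile norm, and the function names, the toy matrix `A` and the GO identifiers are hypothetical. Plain lists are used to keep the sketch dependency-free.

```python
import math

def jaccard(a, b):
    """Jaccard index of two sets; two empty sets score 0 (an assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def gaussian_profile_similarity(profiles):
    """Gaussian kernel between binary interaction profiles (one list per node)."""
    sq_norms = [sum(x * x for x in p) for p in profiles]
    mean_sq_norm = sum(sq_norms) / len(profiles)
    # Bandwidth: inverse of the mean squared profile norm (assumed form).
    gamma = 1.0 / mean_sq_norm if mean_sq_norm > 0 else 1.0
    n = len(profiles)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((x - y) ** 2 for x, y in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim

def fuse(s_go, s_gkd, alpha):
    """alpha = 1 keeps only GO similarity; alpha = 0 keeps only profile similarity."""
    return [[alpha * g + (1.0 - alpha) * k for g, k in zip(row_go, row_gkd)]
            for row_go, row_gkd in zip(s_go, s_gkd)]

# Toy data: 3 lncRNAs (rows) x 3 diseases (columns); GO identifiers are placeholders.
A = [[1, 1, 0],
     [0, 1, 0],
     [0, 0, 1]]
go_sets = [{"GO:a", "GO:b"}, {"GO:b", "GO:c"}, {"GO:d"}]

disease_profiles = [list(col) for col in zip(*A)]   # disease profiles are columns of A
S_GO = [[jaccard(x, y) for y in go_sets] for x in go_sets]
S_GKD = gaussian_profile_similarity(disease_profiles)
S_d = fuse(S_GO, S_GKD, alpha=0.2)
```

Passing the rows of `A` instead of its columns to `gaussian_profile_similarity` would give the corresponding lncRNA similarity under the same assumptions.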
Construction of lncRNA similarity network
Similar to the previous process, we calculate lncRNA Gaussian similarity from known disease-lncRNA associations. First, we use \(IP(l_{i})\), the \(i\)-th row of \(A\), to represent the interaction profile of lncRNA \(l_{i}\). \(IP(l_{i})\) is a binary vector of length \(n_{d}\) and represents the association pattern of lncRNA \(l_{i}\). The lncRNA Gaussian similarity is then calculated with the following formula:
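By analogy with the disease kernel, and assuming the same kind of bandwidth normalization, the lncRNA similarity \(S_{GKL}\) (the name used later in the text) can be written as:

\[ S_{GKL}(l_{i}, l_{j}) = \exp\left( -\gamma_{l} \, \left\| IP(l_{i}) - IP(l_{j}) \right\|^{2} \right), \qquad \gamma_{l} = 1 \Big/ \left( \frac{1}{n_{l}} \sum_{i=1}^{n_{l}} \left\| IP(l_{i}) \right\|^{2} \right) \]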
where \(\gamma_{l}\) is the kernel bandwidth and \(n_{l}\) is the number of lncRNAs.
Calculation of interaction profiles for new lncRNAs
In the prediction process, a lncRNA with very few known disease associations is difficult to predict accurately. The same problem is common in industry, for example when recommending products to a new user, and is known as the cold-start problem. There are two ways to address it. The first is to include additional information in the definition of node similarity. The second is to use prior information, e.g. diseases with many connections are more likely to interact with a new, unknown lncRNA. We considered using additional data such as lncRNA sequence information to measure similarity between lncRNAs. However, lncRNA sequences are long (> 300 bp), which makes it difficult to find an appropriate algorithm to measure their similarity, and sequence information is hard to collect for some new lncRNAs. In this paper, we therefore mainly deal with the cold-start problem in the second way, as described in detail below. First, we calculate the interaction profile of a new lncRNA as the mean of its neighbors' interaction profiles. Taking lncRNA \(l_{i}\) as an example, its neighbors must satisfy the following formula:
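One reading of this condition, assumed here, compares each pairwise similarity with the mean over all lncRNA pairs and excludes \(l_{i}\) itself (a per-lncRNA mean would be another plausible reading):

\[ N(lnc_{i}) = \left\{ l_{j} \;:\; j \neq i, \;\; S_{GKL}(l_{i}, l_{j}) > \frac{1}{n_{l}^{2}} \sum_{p=1}^{n_{l}} \sum_{q=1}^{n_{l}} S_{GKL}(l_{p}, l_{q}) \right\} \]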
Here, \(n_{l}\) is the number of lncRNAs. In other words, if the similarity between \(l_{i}\) and \(l_{j}\) is larger than the mean similarity, \(l_{j}\) is defined as a neighbor of \(l_{i}\). \(IP(l_{i})\) is then set to the mean of its neighbors' interaction profiles.
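Written out from this description, with \(N(lnc_{i})\) denoting the neighbor set:

\[ IP(l_{i}) = \frac{1}{\left| N(lnc_{i}) \right|} \sum_{l_{j} \in N(lnc_{i})} IP(l_{j}) \]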
Here \(N(lnc_{i})\) is the set of neighbors of lncRNA \(l_{i}\) and \(|N(lnc_{i})|\) is the size of \(N(lnc_{i})\). Note that our approach differs from the traditional way of dealing with the cold-start problem. Typically, the traditional method fills in the profile of a new lncRNA with the mean of all other lncRNAs' interaction profiles, which amounts to predicting by popularity. In contrast, BiWalkLDA uses local topological structure to predict missing interactions. Given a new lncRNA, we first find all its similar (or nearest) lncRNAs, which are likely to share common disease interactors with the node of interest. The key point is therefore the definition of the similarity function. Unlike other algorithms, we assume that lncRNAs sparsely connected to diseases contribute more to the given node, meaning that they are likely to share common disease nodes. For example, suppose an inactive user didn't buy Harry Potter, although the book is a best seller. How likely is a new user to buy the book? In our model, new users are more likely to learn from inactive users.
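The neighbor-based fill-in described above can be sketched as follows. This is an illustration only: the threshold is assumed to be the mean of all pairwise similarities, the lncRNA itself is excluded from its own neighbor set, and the function name and toy similarity values are hypothetical.

```python
def fill_new_profile(i, profiles, sim):
    """Return the mean interaction profile of the neighbors of lncRNA i."""
    n = len(profiles)
    # Threshold: mean of all pairwise similarities (assumed reading of "the mean").
    threshold = sum(sum(row) for row in sim) / (n * n)
    neighbors = [j for j in range(n) if j != i and sim[i][j] > threshold]
    if not neighbors:
        return list(profiles[i])       # no neighbor to learn from; keep the profile
    width = len(profiles[i])
    return [sum(profiles[j][k] for j in neighbors) / len(neighbors)
            for k in range(width)]

# Toy data: lncRNA 0 is new (no known diseases), 1 is sparse, 2 is well connected.
profiles = [[0, 0, 0, 0],
            [1, 0, 0, 0],
            [1, 1, 1, 1]]
sim = [[1.0, 0.6, 0.1],
       [0.6, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
new_profile = fill_new_profile(0, profiles, sim)
```

In this toy case the new lncRNA borrows only from the sparsely connected lncRNA 1, which mirrors the idea that new nodes learn from inactive ones rather than from the most popular ones.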
The bi-random walk algorithm
Based on the lncRNA similarity network and the disease similarity network, we use the bi-random walk algorithm to predict potential lncRNA-disease associations. We first explain the idea of the algorithm. BiWalkLDA makes predictions based on the assumption that similar diseases or lncRNAs have similar connection patterns. For example, if we know that lncRNA (i) is associated with disease (j) and lncRNA (i) is very similar to lncRNA (j), we can infer that lncRNA (j) may also be associated with disease (j). So far we have constructed the disease and lncRNA similarity networks. The bi-random walk algorithm in effect constructs a linear model based on similarity. Suppose we want to predict the relationship between lncRNA (i) and disease (j).
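Following the description below of summing over every disease \(k\), this prediction can be written as:

\[ a_{ij} = \sum_{k=1}^{n_{d}} a_{ik} \, sim_{d}(k, j) \]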
Here \(a_{ij}\) represents the likelihood that lncRNA (i) and disease (j) are related, and \(sim_{d}(k,j)\) represents the similarity of disease (k) and disease (j). The calculation therefore traverses every disease \(k\) and sums \(a_{ik} \cdot sim_{d}(k,j)\). It can be seen as a linear model based on similarity. Because we want to keep part of the original \(a_{ij}\), the formula can be written as below:
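Assuming the retained share is weighted by \(1-\beta\), with β the decay factor introduced later in this section, and writing \(a_{ij}^{0}\) (notation introduced here) for the original score:

\[ a_{ij} = \beta \sum_{k=1}^{n_{d}} a_{ik} \, sim_{d}(k, j) + (1 - \beta) \, a_{ij}^{0} \]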
Note that the similarity must be normalized to ensure that \(a_{ij}\) is always less than 1. The above formula makes predictions based on disease similarity. Similarly, we can make predictions based on lncRNA similarity and then combine the two results into a final prediction. The whole algorithm can be divided into three steps: (1) we predict new scores based on disease similarity and lncRNA similarity with the random walk algorithm; (2) we use the mean of the two scores as the result of this round of prediction; (3) steps (1) and (2) are repeated until the maximum number of iterations is reached. We now describe the algorithm in detail. We row-normalize both the lncRNA similarity network and the disease similarity network. This is because random walk is in effect a linear prediction model based on similarity, and the similarity should be normalized so that the predicted scores lie between 0 and 1.
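Row normalization of the disease similarity can be written as follows, assuming \(D_{S_{d}}\) denotes the diagonal matrix of row sums and the normalized matrix keeps the name \(S_{d}\):

\[ S_{d} \leftarrow D_{S_{d}}^{-1} \, S_{d} \]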
Here \(D_{S_{d}}(i, i)\) is the sum of the \(i\)-th row of \(S_{d}\). Similarly, we normalize the lncRNA similarity with the following formula:
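In the same assumed notation, with \(S_{l}\) the normalized lncRNA similarity used in the iteration later in this section:

\[ S_{l} = D_{S_{GKL}}^{-1} \, S_{GKL} \]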
Here \(D_{S_{GKL}}(i, i)\) is the sum of the \(i\)-th row of \(S_{GKL}\). The adjacency matrix \(A\) also needs to be initialized. The scores of all known lncRNA-disease associations are set to \(1/n\), where \(n\) is the total number of known lncRNA-disease associations. The scores of all unobserved associations are set to zero.
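Since \(A\) is binary, this initialization is equivalent to dividing \(A\) by the sum of its entries (a reconstruction from the description above):

\[ S_{ini}(i, j) = \frac{A_{ij}}{\sum_{p=1}^{n_{l}} \sum_{q=1}^{n_{d}} A_{pq}} \]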
Here \(S_{ini}\) represents the initial probabilities, which sum to 1. Because predictions based on the two similarity networks may differ in importance, we introduce two parameters, \(l\) and \(r\), as the maximal numbers of iterations of the left and right random walks on these two networks. The more iterations a walk is given, the more weight the prediction through that similarity network receives. The iterative process can be described by the following formula:
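A reconstruction consistent with the definitions that follow is given here under explicit assumptions: the update has the usual bi-random walk form, the left walk multiplies by \(S_{l}\) and the right walk by \(S_{d}\) (which is what the dimensions of \(A\) allow), \(S_{ini}^{0} = S_{ini}\), and when only one walk is still active at step \(t\) its score alone is used. Which of \(l\) and \(r\) bounds which network is also an assumption.

\[ R_{l} = \beta \, S_{l} \, S_{ini}^{t-1} + (1 - \beta) \, S_{ini} \quad (t \le l), \qquad R_{d} = \beta \, S_{ini}^{t-1} \, S_{d} + (1 - \beta) \, S_{ini} \quad (t \le r), \qquad S_{ini}^{t} = \frac{R_{l} + R_{d}}{2} \]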
Here \(S_{d}\) and \(S_{l}\) represent the disease and lncRNA similarity networks, and \(S_{ini}\) represents the initial scores of all disease-lncRNA associations. β is the decay factor, which controls how much of the initial information is retained. \(R_{l}\) is the score of the random walk on the lncRNA similarity network and \(R_{d}\) is the score of the random walk on the disease similarity network. In the iterative function, we use the average of \(R_{d}\) and \(R_{l}\) as \(S_{ini}^{t}\) in step \(t\). This process can be seen as combining lncRNA similarity and disease similarity to make predictions. When the number of iterations reaches max(l, r), \(S_{ini}^{t}\) is the final result, which represents the likelihood of every lncRNA-disease association. The pseudocode of the bi-random walk algorithm is given in Algorithm 1.
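The iteration described in this section can be sketched in plain Python. This is not the authors' Algorithm 1 but a minimal illustration under assumptions: the left walk multiplies the score matrix by the row-normalized lncRNA similarity, the right walk by the row-normalized disease similarity, the retained initial scores are weighted by 1 − β, and a step in which only one walk is still active uses that walk alone. The names `bi_random_walk`, `left_steps` and `right_steps`, and the toy matrices, are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def row_normalize(m):
    """Divide each row by its sum; all-zero rows are left unchanged."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total for v in row] if total > 0 else list(row))
    return out

def blend(walked, s_ini, beta):
    """Element-wise beta * walked + (1 - beta) * s_ini."""
    return [[beta * w + (1.0 - beta) * s for w, s in zip(rw, rs)]
            for rw, rs in zip(walked, s_ini)]

def bi_random_walk(A, S_l, S_d, beta=0.8, left_steps=6, right_steps=6):
    """Score every lncRNA-disease pair; A has lncRNAs as rows, diseases as columns."""
    n_known = sum(map(sum, A))
    s_ini = [[v / n_known for v in row] for row in A]   # known pairs get 1/n
    S_l, S_d = row_normalize(S_l), row_normalize(S_d)
    score = s_ini
    for t in range(1, max(left_steps, right_steps) + 1):
        walks = []
        if t <= left_steps:    # left walk: propagate over lncRNA similarity
            walks.append(blend(matmul(S_l, score), s_ini, beta))
        if t <= right_steps:   # right walk: propagate over disease similarity
            walks.append(blend(matmul(score, S_d), s_ini, beta))
        # Average whichever walks ran this step (both, or the one still active).
        score = [[sum(vals) / len(walks) for vals in zip(*rows)]
                 for rows in zip(*walks)]
    return score

# Toy data: lncRNA 2 has no known disease but is similar to lncRNA 0.
A = [[1, 0],
     [0, 1],
     [0, 0]]
S_l = [[1.0, 0.1, 0.8],
       [0.1, 1.0, 0.1],
       [0.8, 0.1, 1.0]]
S_d = [[1.0, 0.2],
       [0.2, 1.0]]
scores = bi_random_walk(A, S_l, S_d, beta=0.8, left_steps=2, right_steps=2)
```

In the toy example, lncRNA 2 receives a higher score for disease 0 than for disease 1 because it resembles lncRNA 0, which is the guilt-by-association behaviour the text describes.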
Data and materials
Known lncRNA-disease associations were downloaded from the LncRNADisease database [17], which is commonly used as the gold standard for lncRNA-disease association prediction. In the experiments, we used three datasets downloaded from three versions of the database, released in 2012, 2014 and 2015 (Table 1). The 2012 version contains 276 experimentally confirmed interactions between 112 lncRNAs and 150 diseases. The 2014 version contains 319 interactions between 131 lncRNAs and 169 diseases. The 2015 version contains 621 interactions between 285 lncRNAs and 226 diseases.
Results
We use leave-one-out cross validation (LOOCV) to test the performance of BiWalkLDA. LOOCV is a widely used strategy for evaluating algorithms. In each round, one known association is held out as the test sample, and all other known lncRNA-disease associations form the training set. All unobserved associations are treated as the candidate set and scored by BiWalkLDA. A ranked list is generated from the predicted scores. True positive rates (TPR, sensitivity) and false positive rates (FPR, 1 − specificity) are then calculated at different thresholds, and the receiver-operating characteristic (ROC) curve is plotted from them. We use the area under the ROC curve (AUC) as the evaluation criterion, which reflects global prediction accuracy. An AUC close to 1 indicates near-perfect prediction, while an AUC of 0.5 indicates purely random performance.
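The LOOCV loop can be sketched as below. This is one common way to turn leave-one-out ranks into an AUC-like number, namely the average share of unobserved pairs ranked below each held-out pair; it is an assumption that this matches the thresholding procedure used in the experiments. The function names and the popularity baseline are hypothetical, and the sketch assumes the matrix contains at least one zero entry.

```python
def loocv_auc(A, score_fn):
    """Average share of unobserved pairs ranked below each held-out known pair."""
    known = [(i, j) for i, row in enumerate(A) for j, v in enumerate(row) if v == 1]
    candidates = [(i, j) for i, row in enumerate(A) for j, v in enumerate(row) if v == 0]
    total = 0.0
    for i, j in known:
        train = [list(row) for row in A]
        train[i][j] = 0                       # hide the test association
        scores = score_fn(train)
        held_out = scores[i][j]
        below = sum(scores[p][q] < held_out for p, q in candidates)
        ties = sum(scores[p][q] == held_out for p, q in candidates)
        total += (below + 0.5 * ties) / len(candidates)   # ties count half
    return total / len(known)

def popularity_scorer(train):
    """Hypothetical baseline: score each pair by how many lncRNAs its disease has."""
    col_counts = [float(sum(col)) for col in zip(*train)]
    return [list(col_counts) for _ in train]

A = [[1, 1, 0],
     [1, 0, 0],
     [1, 0, 1]]
auc = loocv_auc(A, popularity_scorer)
```

Any scoring function that maps a training matrix to a score matrix of the same shape can be passed as `score_fn`, so different predictors can be compared under the same protocol.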
The effects of parameters
The effects of α
In the construction of disease similarity, we use a linear model to fuse \(S_{GO}\) and \(S_{GKD}\), where α is a hyperparameter that controls their relative weight. If α = 1, disease similarity is calculated from gene ontology information only. If α = 0, it is calculated from known disease-lncRNA associations only. BiWalkLDA uses gene ontology information as a supplement to \(S_{GKD}\), which strengthens the generalization ability of the algorithm. To test the performance of the algorithm under different α values, we varied α from 0 to 1 in steps of 0.1 and ran BiWalkLDA for each value. The experimental results are shown in Fig. 1. When α=0.1, BiWalkLDA obtains the best results on dataset1 and dataset2. On dataset3, it peaks at α=0.3. Small changes in α do not have much impact on the results. We therefore recommend setting α between 0.1 and 0.3 when using BiWalkLDA. The experimental results show that fusing \(S_{GKD}\) and \(S_{GO}\) can improve the accuracy of the algorithm. Meanwhile, the algorithm achieves good performance even when only the GO similarity network is used, which indicates that it still works in the absence of disease-lncRNA association information.
The effects of β
β is the decay factor in the bi-random walk algorithm. It determines how much of the initial information is retained in each iteration. If β=0, all initial information is retained. If β=1, all of the information is used to predict new scores in each round. Setting β to either 0 or 1 is therefore inappropriate and results in poor performance. To test the performance of the algorithm under different β values, we varied β from 0 to 1 in steps of 0.1 and ran BiWalkLDA for each value. The experimental results are shown in Fig. 2. When 0.1 ≤ β ≤ 0.9, the results vary only slightly, which indicates that BiWalkLDA is robust to β. BiWalkLDA achieves its best AUC at β=0.8 on dataset1 and dataset2 and at β=0.7 on dataset3. Intuitively, if the initial data are sufficient, a smaller β is more appropriate. Because dataset3 contains more known lncRNA-disease associations, its optimal β is smaller than that of the other datasets. Finally, we set β = 0.8 as the default for all three datasets.
The effects of l and r
In the bi-random walk algorithm, the parameters l and r limit the number of random walk steps on the disease and lncRNA similarity networks, respectively. They can be understood as the importance of disease similarity and lncRNA similarity. A larger l means that predictions rely more on disease similarity. Different values of l and r make the algorithm applicable to different data. If l or r equals 0, the algorithm degenerates into a single random walk, which means that predictions are made through lncRNA similarity or disease similarity alone instead of combining them. To test the performance of the algorithm under different l and r values, we performed a grid search over combinations of l and r, each varied from 1 to 7 in steps of 1. We then used BiWalkLDA to make predictions and calculated the AUC values by LOOCV. The experimental results are shown in Table 2. They show that BiWalkLDA performs well when l and r are relatively close, which suggests that predictions through lncRNA similarity and disease similarity are equally important. Finally, we set l=6 and r=6 as the defaults for all three datasets.
Comparison with other algorithms
To test the performance of BiWalkLDA, we compared it with three state-of-the-art computational methods for lncRNA-disease association prediction (LDAP, LRLSLDA and SIMCLDA) on three datasets. The results are measured by the AUC value and the number of correctly retrieved associations. Because code is available for only a limited number of methods, we also compare our algorithm with KATZHMDA, which was developed to predict disease-microbe associations. LRLSLDA uses Laplacian normalization and constructs a cost function in the lncRNA and disease spaces; it then makes predictions by minimizing the cost function to obtain the optimal classifier [15]. LDAP fuses different data sources and makes predictions with a bagging SVM classifier [16]. SIMCLDA predicts lncRNA-disease associations based on inductive matrix completion [12]. KATZHMDA integrates known microbe-disease associations and Gaussian interaction profile kernel similarity for microbes and diseases and makes predictions with the Katz measure [21]. On dataset1, BiWalkLDA obtained an AUC of 0.8268, which is higher than the others (LRLSLDA: AUC=0.7217, KATZHMDA: AUC=0.6510, LDAP: AUC=0.6987, SIMCLDA: AUC=0.7949), as shown in Fig. 3a. In addition to AUC, we use the number of correctly retrieved associations to measure performance. If a predicted association is ranked within the top k% of the candidate set, it is regarded as correctly retrieved under the threshold k. The number of correctly retrieved associations thus reflects the accuracy of the algorithm in the top k%, whereas AUC reflects its global performance. The experimental results are shown in Fig. 3b. BiWalkLDA retrieves more correct associations in the top 10%. However, LRLSLDA performs better at lower thresholds in terms of the number of correctly retrieved associations. This result indicates that BiWalkLDA is more inclined to make globally optimal predictions.
This phenomenon can be explained as follows. (1) BiWalkLDA processes samples with little information separately, which may significantly increase the AUC value but may also lead to incorrect predictions. (2) BiWalkLDA produces a more comprehensive ranking by drawing on both the lncRNA similarity network and the disease similarity network. This means that only samples that meet both inference criteria are given a high ranking. If a sample conforms to the similarity inference for lncRNAs but not to the one for diseases, it is not ranked highly. This problem could be addressed by using a nonlinear algorithm. The results on the other two datasets are similar, so we do not discuss them one by one (Figs. 4 and 5). BiWalkLDA also achieves the best results there (AUC 0.8510 on dataset2 and AUC 0.8473 on dataset3) and is robust to different parameter choices.
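The "correctly retrieved association" count used in the comparison above can be sketched as follows. The rounding of the top-k% cutoff with a ceiling is an assumption, and the ranks and candidate count are made-up illustrative values, not results from the experiments.

```python
import math

def correctly_retrieved(ranks, n_candidates, k_percent):
    """Count held-out associations ranked within the top k% of the candidate list."""
    cutoff = math.ceil(n_candidates * k_percent / 100.0)
    return sum(1 for r in ranks if r <= cutoff)

# Hypothetical ranks (1 = best) of five held-out associations among 1000 candidates.
ranks = [1, 4, 12, 57, 300]
counts = {k: correctly_retrieved(ranks, n_candidates=1000, k_percent=k)
          for k in (1, 5, 10)}
```

Counting at several thresholds in this way separates how an algorithm behaves at the very top of the ranking from its global behaviour as summarized by AUC.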
De novo lncRNA-disease prediction
In the construction of disease similarity, we combine Gaussian similarity and gene ontology similarity. Fusing multiple similarity networks not only improves the performance of the algorithm but also strengthens the generalization ability of BiWalkLDA. To assess this, we conducted de novo lncRNA-disease association prediction on dataset1. In de novo prediction, all known lncRNA associations of each queried disease \(d_{i}\) are removed, and the different computational methods are then applied to the prediction problem. Note that the gene ontology information of the disease is still known. The experimental results are shown in Fig. 6. The performance of BiWalkLDA has only a slight drop (AUC: 0.8364) and remains much higher than that of the other algorithms. The result shows that BiWalkLDA can make good predictions even when disease-lncRNA association information is absent, and that combining heterogeneous data sources helps to deal with missing data. Note that the AUC of LDAP is only 0.4762, which is lower than the 0.5 expected from random guessing. This is because we are testing the algorithms in the absence of data. LDAP treats the problem as a classification problem and uses a bagging SVM classifier to make predictions. If data are severely lacking, the learned features will be inaccurate and classification will be poor. There are two reasons for the good performance of BiWalkLDA: (1) gene ontology information is used as supplementary data, and (2) BiWalkLDA uses neighborhood information to predict connections for new, unknown lncRNAs.
Case studies
We conducted a case study on prostate cancer to test the performance of BiWalkLDA in predicting new lncRNAs for a given disease. Prostate cancer (PC) is the most commonly diagnosed malignancy and the third leading cause of cancer death among men in developed countries. Predicting prostate cancer-related lncRNAs will help us understand the mechanism of prostate cancer and provide researchers with a set of high-probability candidate lncRNAs. We used dataset3 to make predictions and confirmed the results by searching the related literature. The prediction results of BiWalkLDA are shown in Table 3. Eight prostate cancer-related lncRNAs (H19, MALAT1, HOTAIR, MEG3, PVT1, GAS5, NEAT1, UCA1) among the top ten candidates have been confirmed by previous studies. Long non-coding RNA H19 and H19-derived microRNA-675 (miR-675) were significantly downregulated in the metastatic prostate cancer cell line M12 compared with the non-metastatic prostate epithelial cell line P69 [22]. MALAT1 was upregulated in human prostate cancer tissues and cell lines [23]. HOTAIR, an androgen-repressed lncRNA, is markedly upregulated following androgen deprivation therapies and in castration-resistant prostate cancer [4]. MEG3 was significantly decreased in prostate cancer tissues relative to adjacent normal tissues [24]. The region surrounding rs378854, which has been identified as a novel functional prostate cancer-specific genetic variant, interacts with the MYC and PVT1 promoters [25]. GAS5 promotes the apoptosis of prostate cells, and its exonic sequence, i.e. GAS5 lncRNA, is sufficient to mediate this activity [26]. Nuclear enriched abundant transcript 1 (NEAT1) was identified as the most significantly overexpressed lncRNA in prostate cancer by using a combination of chromatin immunoprecipitation (ChIP) and RNA-sequencing data [27]. UCA1 was abnormally upregulated in tumor tissues from prostate cancer patients, and patients with high UCA1 levels had a significantly poorer prognosis [28].
These successful predictions for prostate cancer show that BiWalkLDA can help find new relationships between lncRNAs and diseases based on historical data.
Conclusion
Many recent studies suggest that lncRNAs are strongly associated with various complex human diseases and that they play important roles in the regulation of gene expression and in post-transcriptional modification. Predicting lncRNA-disease associations can help us understand the biological mechanisms of disease and reduce the cost of experimental verification. However, discovering the relationship between lncRNAs and diseases by means of computational models is still a very challenging problem, and computational tools are much in demand. Although many computational models have been proposed, their prediction accuracy still has much room for improvement. To improve on existing algorithms, we present a novel algorithm, BiWalkLDA, based on bi-random walks for the prediction of lncRNA-disease associations. It integrates gene ontology and interaction profile data to calculate disease similarity, and it addresses the cold-start problem by using the local structure of a lncRNA's neighbors. Four state-of-the-art computational methods and BiWalkLDA were applied to predict lncRNA-disease associations on three different datasets. Results show that BiWalkLDA is superior to the other existing algorithms in terms of both accuracy and recall. Many problems remain to be addressed. Existing models are based on small-scale datasets. Although these algorithms can achieve high accuracy, their results are often repetitive, and they cannot be applied to large-scale data. In future work, we will consider developing more effective algorithms to solve this problem.
Availability of data and material
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Abbreviations
AR: Androgen receptor
IMC: Inductive matrix completion
LOOCV: Leave-one-out cross validation
ROC: Receiver-operating characteristics
References
1. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature. 2001; 3(6822):346.
2. Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi AM, et al. Landscape of transcription in human cells. Nature. 2012; 489(7414):101.
3. Guttman M, Amit I, Garber M, French C, Lin MF, Feldser D, et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature. 2009; 458(7235):223.
4. Zhang A, Zhao J, Kim J, et al. LncRNA HOTAIR enhances the androgen-receptor-mediated transcriptional program and drives castration-resistant prostate cancer. Cell Rep. 2015; 13(1):209–21.
5. Hu J, Gao Y, Zheng Y, Shang X. KF-finder: Identification of key factors from host-microbial networks in cervical cancer. BMC Syst Biol. 2018; 12(S4):54.
6. Hu J, Gao Y, He J, Zheng Y, Shang X. WebNetCoffee: a web-based application to identify functionally conserved proteins from Multiple PPI networks. BMC Bioinformatics. 2018; 19(1):422.
7. Hu J, Zheng Y, Shang X. MiteFinderII: a novel tool to identify miniature inverted-repeat transposable elements hidden in eukaryotic genomes. BMC Med Genomics. 2018; 11(S5):101.
8. Hu J, Shang X. Detection of Network Motif Based on a Novel Graph Canonization Algorithm from Transcriptional Regulation Networks. Molecules. 2017; 22(12):2194.
9. Hu J, Wang J, Li J, Lin J, Liu T, Zhong Y, Liu J, Zheng Y, Gao Y, He J, Shang X. MD-SVM: A novel SVM-based algorithm for the motif discovery of transcription factor binding sites. BMC Bioinformatics. 2019; 20(S7). https://doi.org/10.1186/s1285901927353.
10. Peng J, Guan J, Shang X. Predicting Parkinson’s disease genes based on node2vec and autoencoder. Front Genet. 2019; 10. https://doi.org/10.3389/fgene.2019.00226.
11. Fu G, Wang J, Domeniconi C, Yu G. Matrix factorization based data fusion for the prediction of lncRNA-disease associations. Bioinformatics. 2017; 34(9):1529–37.
12. Lu C, Yang M, Luo F, Wu FX, Li M, Pan Y, et al. Prediction of lncRNA-disease associations based on inductive matrix completion. Bioinformatics. 2018; 34(19):3357–64. https://doi.org/10.1093/bioinformatics/bty327.
13. Sun J, Shi H, Wang Z, Zhang C, Liu L, Wang L, et al. Inferring novel lncRNA-disease associations based on a random walk model of a lncRNA functional similarity network. Mol Biosyst. 2014; 10(8):2074–081.
14. Chen X, You ZH, Yan GY, Gong DW. IRWRLDA: improved random walk with restart for lncRNA-disease association prediction. Oncotarget. 2016; 7(36):57919–31.
15. Chen X, Yan GY. Novel human lncRNA-disease association inference based on lncRNA expression profiles. Bioinformatics. 2013; 29(20):2617–24.
16. Lan W, Li M, Zhao K, Liu J, Wu FX, Pan Y, et al. LDAP: a web server for lncRNA-disease association prediction. Bioinformatics. 2017; 33(3):458–60.
17. Chen G, Wang Z, Wang D, Qiu C, Liu M, Chen X, et al. LncRNADisease: a database for long-non-coding RNA-associated diseases. Nucleic Acids Res. 2013; 41(Database issue):D983–D986.
18. Chen X. KATZLDA: KATZ measure for the lncRNA-disease association prediction. Sci Rep. 2014; 5:16840.
19. Aken BL, Ayling S, Barrell D, Clarke L, Curwen V, Fairley S, et al. The Ensembl gene annotation system. Database J Biol Databases Curation. 2016; 2016:baw093. https://doi.org/10.1093/database/baw093.
20. Bauer-Mehren A, Rautschka M, Sanz F, Furlong LI. DisGeNET. Bioinformatics. 2010; 26(22):2924–292.
21. Chen X, Huang YA, You ZH, Yan GY, Wang XS. A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases. Bioinformatics. 2016; 33(5):733–9.
22. Zhu M, Chen Q, Liu X, Sun Q, Zhao X, Deng R, et al. LncRNA H19/miR-675 axis represses prostate cancer metastasis by targeting TGFBI. FEBS J. 2015; 281(16):3766–75.
23. Ren S, Liu Y, Xu W, Sun Y, Lu J, Wang F, et al. Long noncoding RNA MALAT1 is a new potential therapeutic target for castration resistant prostate cancer. J Urol. 2013; 190(6):2278–87.
24. Luo G, Wang M, Wu X, Tao D, Xiao X, Wang L, et al. Long noncoding RNA MEG3 inhibits cell proliferation and induces apoptosis in prostate cancer. Cell Physiol Biochem. 2015; 37(6):2209.
25. Meyer KB, Maia AT, O’Reilly M, Ghoussaini M, Prathalingam R, Porter-Gill P, et al. A functional variant at a prostate cancer predisposition locus at 8q24 is associated with PVT1 expression. PLoS Genet. 2011; 7(7):e1002165.
26. Pickard MR, Mourtada-Maarabouni M, Williams GT. Long noncoding RNA GAS5 regulates apoptosis in prostate cancer cell lines. Biochim Biophys Acta. 2013; 1832(10):1613–23.
27. Chakravarty D, Sboner A, Nair SS, Giannopoulou E, Li R, Hennig S, et al. The oestrogen receptor alpha-regulated lncRNA NEAT1 is a critical modulator of prostate cancer. Nat Commun. 2014; 5:5383.
28. Na XY, Liu ZY, Ren PP, Yu R, Shang XS. Long noncoding RNA UCA1 contributes to the progression of prostate cancer and regulates proliferation through KLF4-KRT6/13 signaling pathway. Int J Clin Exp Med. 2015; 8(8):12609–16.
Acknowledgments
Many thanks go to Dr. Bolin Chen and Dr. Jiajie Peng for discussion.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 20 Supplement 18, 2019: Selected articles from the Biological Ontologies and Knowledge bases workshop 2018. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume20supplement18.
Funding
Publication costs were funded by the National Natural Science Foundation of China (Grant No. 61702420). This project was also funded by the National Natural Science Foundation of China (Grant Nos. 61332014, 61702420 and 61772426); the China Postdoctoral Science Foundation (Grant No. 2017M613203); the Natural Science Foundation of Shaanxi Province (Grant No. 2017JQ6037); the Fundamental Research Funds for the Central Universities (Grant No. 3102018zy032); and the Top International University Visiting Program for Outstanding Young Scholars of Northwestern Polytechnical University.
Author information
Contributions
JH designed the computational framework. YG, JL, YZ, and JW performed all the analyses of the data and wrote the manuscript. XS is the major coordinator, who contributed substantial time and effort to the discussion of this project. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Xuequn Shang.
Ethics declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Hu, J., Gao, Y., Li, J. et al. A novel algorithm based on bi-random walks to identify disease-related lncRNAs. BMC Bioinformatics 20, 569 (2019). https://doi.org/10.1186/s1285901931283
Keywords
 LncRNA-disease association
 Bi-random walks
 Gene ontology
 Interaction profile