 Methodology
 Open Access
 Published:
A multinetwork integration approach for measuring disease similarity based on ncRNA regulation and heterogeneous information
BMC Bioinformatics volumeÂ 23, ArticleÂ number:Â 89 (2022)
Abstract
Background
Measuring similarity between complex diseases has significant implications for revealing the pathogenesis of diseases and development in the domain of biomedicine. It has been consentaneous that functional associations between diseaserelated genes and semantic associations can be applied to calculate disease similarity. Currently, more and more studies have demonstrated the profound involvement of noncoding RNA in the regulation of genome organization and gene expression. Thus, taking ncRNA into account can be useful in measuring disease similarities. However, existing methods ignore the regulation functions of ncRNA in biological process. In this study, we proposed a novel deeplearning method to deduce disease similarity.
Results
In this article, we proposed a novel method, ImpAESim, a framework integrating multiple networks embedding to learn compact feature representations and disease similarity calculation. We first utilize three different diseaserelated information networks to build up a heterogeneous network, after a network diffusion process, RWR, a compact feature learning model composed of classic Auto Encoder (AE) and improved AE model is proposed to extract constraints and lowdimensional feature representations. We finally obtain an accurate and lowdimensional feature representation of diseases, then we employed the cosine distance as the measurement of disease similarity.
Conclusion
ImpAESim focuses on extracting a lowdimensional vector representation of features based on ncRNA regulation, and geneâ€“gene interaction network. Our method can significantly reduce the calculation bias resulted from the sparse disease associations which are derived from semantic associations.
Background
Human complex diseases are often related with each other through shared causes or pathology. Knowledge of how various diseases are related can facilitate deepening the understanding of their etiology and pathogenesis [1, 2]. Studying the relationships can make contributions to predict disease causing genes [3, 4], inferring miRNA function associations [5, 6], and identifying novel therapeutic drugs for diseases [7, 8]. Various aspects including pathogenesis and phenotypes can be exploited to calculate the similarity of pairwise diseases. Current methods for measuring disease similarity can be classified as semanticbased [5, 9] and functionalbased [10,11,12]. Semanticbased methods are widely used for measuring similarity between diseasesassociated ontological terms, such as Gene Ontology [13] and human phenotype ontology (HPO) [14] in biomedical and bioinformatics domain. Semantic association between diseases are documented in the ontology such as Disease Ontology (DO) [9]. For measuring similarity of semantic associations, Resnikâ€™s method calculates disease similarity based on the information content (IC) of the most informative common ancestor (MICA) between two terms, Wang et al.â€™s method calculate similarity between terms considering multiple common ancestors [15]. It has been successfully employed in measuring disease similarity between medical subject headings (MeSH) terms and inferring microRNA function network [16]. Le et al. constructed disease similarity network based on semantic similarity measures on phenotype ontology database and integrated them with several kinds of gene/protein networks [17]. MultiSourcDSim proposed by Deng et al. compute the similarity between diseases by integrating multiple biological datasets including genedisease associations, GO biological processdisease associations and symptomdisease associations [18].
Functionbased methods for calculating similarity of terms incorporate genome information. Mathur and Dinakarpandian presented a processsimilarity based (PSB) method by involving the associations based on Gene Ontology [13]. Cheng et al. utilized gene interactions in the comprehensive gene functional network to calculate disease similarity (SemFunSim) [19]. In contrast with aforementioned methods which ignore that genes could also be associated based on intermediate nodes in the gene functional network, InfDisSim presented by Hu et al. models the information flow to the network in order that the entire network could be fully utilized [20]. Keller et al. revealed hidden relationships between diseases based on common associated genes as well as genes associated with a common set of diseases by investigating formal concepts [21]. Carson et al. assumed that if a gene or gene sets is related to only one pair of diseases, the similarity between these two diseases would be higher than that of a pair of diseases sharing gene associations with many other diseases [22]. However, it is worth noticing that many of these methods calculate disease similarity based on a single metric or a single data source, which could lead to a biased conclusion lacking of comprehensive assessment. Moreover, noncoding RNA have been demonstrated that they play a major part in many significant biological process, but existing methods have not taken this into account.
Noncoding RNAs have been considered as key regulators of gene expression, genome stability and defense against foreign genetic elements. The majority of the human genome transcripts are noncoding RNAs, in particular, miRNAs and lncRNAs [23, 24], which are involved in a plethora of cellular processes including either cis or transregulation of proteincoding genes and alternative splicing. In this work, we developed a novel method, called ImpAESim, to calculate disease similarity with taking these noncoding RNAs into account by integrating multiple disease information networks. Many existing methods have been proposed to ensemble multiple networks into one network, such as kernelbased [25], Bayesian inferencebased [26], weighted averaging or summingbased approaches [27], deep learning models [28, 29], and network representation learning (NRL) methods [30], these aforementioned methods fuse different networks into one integrated network and extract feature representations. However, they may induce information loss in the process of summarizing different networks into one. To solve this problem, multinetwork embedding methods have been proposed, such as Mashup [31] which captured lowdimensional feature representations of genes based on multiple networks by utilizing a matrix factorizationbased approach. However, matrix factorizationbased approaches are a kind of linear and limited approach, it is difficult to capture complex and highdimensional nonlinear structure in integrated networks.
To address above problems, we proposed a novel method, named ImpAESim (disease similarity calculation based on an improved AutoEncoder model), ImpAESim not only integrates diverse information from heterogeneous data sources (e.g., diseasegene associations, lncRNAgene associations, miRNAgene associations) but also copes with the noisy and highdimensional nature of largescale biological data by utilizing an improved AutoEncoder (AE) model to learn lowdimensional but informative vector representations of disease features. Then by measuring the distance between pairwise diseases we finally obtained the disease similarity. To this end, ImpAESim is not only a novel method to calculate disease similarity but also provide a new aspect to enrich human understanding of the heterogeneity and relevance of diseases.
Results
Effectiveness
FigureÂ 1 shows the distribution of similarity scores calculated by ImpAESim, SemFunSim and NCRR. After normalization, the similarity score of 1,390,206 disease pairs of 1,405,326 range from 0.3 to 0.8. In order to further analyze the performance of the proposed method, ImpAESim was compared with disease similarity methods SemFunSim and NCRR. During the experiment, the parameters of these methods are selected according to the original paper. To clearly study the density curves, the disease pairs with similarity score under 0.2 and 0.3 are omitted for SemFunSim and NCRR, respectively. SemFunSim and NCRR both are similarity methods utilizing disease terms and â€˜is_Aâ€™ relationships from Disease Ontology database. From this aspect, similarity score of many disease pairs may be 0 because they have no relationships according to semantic terms. Therefore, in the figure of distribution density curves of SemFunSim and NCRR, they spread wider than the density curve of ImpAESim, this indicates that the results of SemFunSim and NCRR are loosely structured, which is not beneficial to study the relationships of different diseases. To further test the efficiency of ImpAESim. We randomly selected five diseases from the disease set as the query diseases, and a list comprising of a top5 most similar diseases to each query disease generated by ImpAESim. The results were recorded in Table 1. Take Hyperbilirubinemia for example, ImpAESim has discovered that porphyria was similar or related to it with the given disease set. Many studies on these two diseases have revealed their close relationship, such as hyperbilirubinemia is observed in erythropoietic porphyrias [32]. For coronary artery disease and familial benign chronic pemphigus, it has been detected that mutations in exons of ATP2C1 gene in the patients of familial benign chronic pemphigus [33], Nassa et al. found that ATP2C1 may induce coronary artery disease [34].
Case study
Three diseases Mitochondrial complex I deficiency, Mitochondrial complex II deficiency, and Mitochondrial complex III deficiency were selected as the targets and analyzed for further evaluation of the effectiveness of our method. Besides, two nonrelated diseases Precocious puberty and Celiac disease were selected as the contrasts.Â TableÂ 2 presents the similarity score of each disease pair measured by our method.
It is known that Mitochondrial complex I deficiency, Mitochondrial complex II deficiency, and Mitochondrial complex III deficiency are all clinically belong to certain congenital disorder of metabolism. According to the international statistical classification of disease and related health problems (ICD11) released by world health organization, these three diseases are all found to be the children of the term Inborn errors of energy metabolism. Moreover, in KEGG pathway maps, they all corresponds to pathway Oxidative phosphorylation (hsa00190), while Mitochondrial complex II deficiency also corresponds to pathway Citrate cycle (TCA cycle, hsa00020) and Mitochondrial complex III deficiency corresponds to pathway Cardiac muscle contraction (hsa04260). As a contrast, precocious puberty is a type of endocrine disease and celiac disease is a kind of digestive system disease. Both of them have not been found to have any associations with the above three targeted diseases.
Discussion
Existing methods for calculating disease similarity most focus on semantic associations, disease gene associations, and gene functional networks. These methods mostly depend on ontology, which are not reliable due to the differences between disease terms from various databases. However, noncoding RNAs such as lncRNAs and miRNAs are also very important in understanding the mechanism of complex diseases. In this article, we proposed a novel method, ImpAESim, a framework integrating multiple networks embedding to learn compact feature representations and disease similarity calculation. We first utilized three different diseaserelated information networks to build up a heterogeneous network, after a network diffusion process, RWR, a compact feature learning model composed of classic AE and improved AE is proposed to extract constraints and lowdimensional feature representations. We finally obtained an accurate and lowdimensional feature representation of diseases, then we employed the cosine distance as the measurement of disease similarity. This work may facilitate relevant studies and can be further improved to attain more accurate results.
Conclusions
Complex diseases are not simply caused by a single gene, single mRNA transcript or single protein but the effect of their functionalcollaborations. Measuring similarity between complex diseases has significant implications for revealing the etiology and pathogenesis of diseases and further research in the development of biomedicine, which can also support identifying potential therapeutic drugs for diseases.
It has been consentaneous that functional associations between diseaserelated genes and semantic associations can be applied to calculate disease similarity. Currently, more and more studies have demonstrated a profound involvement of noncoding RNA in the regulation of genome organization and gene expression. Noncoding RNA seem to operate at several biological levels such as epigenetic processes that control differentiation and development. Thus, taking noncoding RNA into account can be useful in measuring disease similarities.
The results of ImpAESim lead us to a further direction in complex disease research. In this paper, we focus on the problem of computing disease similarity with disease associated noncoding RNAs and compact feature learning, which can improve the accuracy of similarity calculation by solving the problem raised by sparse disease associations.
Methods
Work frame
ImpAESim contains two main parts, (1) multiple networks embedding based on random walk with restart (RWR) and AutoEncoder, (2) disease similarity calculation. In the network embedding process, we first utilized a network diffusion algorithm (RWR) to capture single network topological information and transform it into feature representations of each disease node. However, due to the noisy and highdimensional character of biological network, we need further apply the AutoEncoder (AE), a deeplearning model to learn the features with extracted constraints from different input networks. Then the lowdimensional feature representations for each disease are obtained by integrating the constraints and hidden vector of the AutoEncoder by the proposed improved AE model (ImpAE). Intuitively, the lowdimensional vector representations encode the association information and topological context of each disease in the heterogeneous network. After obtaining the lowdimensional feature representation of diseases, ImpAESim calculates the cosine distance as the measurement of disease similarity. The workflow of ImpAESim is presented in Fig.Â 2.
Data collection
The heterogeneous network input to ImpAESim is constructed based on the following known biomedical entities: diseasegene associations, genemiRNA associations, genelncRNA associations. Thus, a total of four types of nodes and four types of edges, representing different diseasesrelated information, were collected from the public databases and used to construct the heterogeneous network for the following work. We collected diseaserelated genes from KEGG DISEASE Database, it contains 6310 genedisease associations between 5236 genes and 1907 diseases. We obtained genes regulated by lncRNAs from LncRNA2Target [35]. LncRNA2Target is a database to provide a comprehensive resource of lncRNAtarget relationships inferred from lowthroughput experiments or lncRNA knockdown or overexpression experiments followed by microarray/RNAseq. Associations between miRNA target genes and miRNAs are downloaded from miRDB database [36]. All the targets in miRDB were predicted by MirTarget, which was developed by analyzing thousands of miRNAtarget interactions from highthroughput sequencing experiments. In addition, we excluded those isolated nodes which means we only keep the nodes with at least one edge in the network. Finally, we got 1677 diseases, 2963 genes, 50 lncRNAs and 2728 miRNAs, with 4647 associations between diseases and genes, 170,226 associations between diseases and miRNAs, 5640 associations between diseases and lncRNAs 660,401 associations between genes and miRNAs, 10,154 associations between genes and lncRNAs.
ImpAESim algorithm
Network embedding and compact feature learning
To compile various curated diseaserelated information, we constructed a heterogeneous network which includes three diverse networks (diseasegene association network, diseaselncRNA association network, and diseasemiRNA association network, as shown in Fig.Â 3). First we utilized a network diffusion algorithm, random walk with restart, RWR, to capture the topological information of each network and transform it into feature representations of nodes. RWR introduces a predefined restart probability at the initial node for every iteration, which can take into consideration of global topological connectivity patterns within the network to fully exploit the latent direct or indirect relations between nodes. Formally, let A denote the weighted adjacency matrix of a molecular interaction network with n diseases. Matrix B is defined as a transition matrix, in which B_{i,j} denotes the probability of a transition from node i to node j, which means,
Then, \(p_{i}^{t}\) denotes an ndimensional feature vector of disease i in which each element stores the probability of a node being visited from node i after t iterations in the random walk process. Thus, the RWR process from node i can be defined as:
where \(e_{i}\) indicates the ndimensional standard basis vector with \(e_{i} \left( i \right) = 1\) and \(e_{i} \left( j \right) = 0,{ }\forall i \ne j\), and \(u_{r}\) denotes the predefined restart probability, after a range of iterating process, we can obtain a stationary distribution \(p_{i}^{\infty }\) of RWR process.
Then each of the disease information network obtained after RWR process is fed into the original autoencoder. For a pair of disease nodes i and j, we first utilized Pearson correlation coefficient (PCC) to measure the pairwise similarity between them. Let x_{i} and x_{j} denotes the feature vectors of node i and j, the PCC of them can be indicated as:
Constraints extraction using ImpAE
After obtaining the distances between all pairs of disease nodes from each network, two thresholds for positivelink and negativelink are set to extract both constraints. Let \(T_{1} { }\) and \(T_{2}\) denote the positivelink threshold and negativelink threshold, respectively. Thus, if the PCC value of a pair of nodes is larger than \(T_{1}\), the pair is considered as a positivelink constraint, and if the PCC value is smaller than \(T_{2}\), the pair is considered as a negativelink constraint.
By compiling the both kinds of constraints, we can get a set of positivelink constraints which means each pair of nodes in it is strongly associated and a set of negativelink constraints which means each pair of nodes in it are unrelated. As a result, the size of constraint sets is much smaller than that of the original network. Thus, the constraints can be considered as the correlation of different networks for the following work.
Constraints integration using ImpAE
AE is a typical unsupervised deep learning model which aims to learn a new encoding representation of input data with a superiority in dimensionality reduction. In this work, we proposed a novel optimized autoencoder model named ImpAE to learn the lowdimensional feature representation based on integrating correlations of different networks. The input of ImpAE includes lowdimension feature vector and constraints obtained from former layer. Because the constraints are derived from different networks, before feeding into the ImpAE, we need to take the intersection of the constraints in case that they may conflict with each other. As shown in Fig.Â 1, the output of RWR process is first input to original AE, then the output of original AE is fed to ImpAE.
Original autoencoder model is composed of two parts, encoder and decoder. The â€˜encoderâ€™ operation converts the original highdimensional data to lowdimensional feature vectors, and the â€˜decoderâ€™ operation recovers the input data from the lowdimensional feature vectors. The output lowdimensional feature vectors are considered as a compact representation of the original input data. Let \(x_{{\text{i}}}\) be the ith input vector indicating the node representation of the network, and f, g be the activation functions of the hidden layer and the output layer, respectively. Then the output representation of hidden layer and output layer can be indicated as follows,
where Î©â€‰=â€‰(W, b, \(W^{\prime}\), d) are the parameters, f and g are activate functions, here we chose the sigmoid function. Then the optimization goal is to minimize the reconstruction error between the reconstructed vector \(y_{{\text{i}}}\) and the input vector \(x_{{\text{i}}}\),
After obtaining the constraints from the original AE, the lowdimensional feature vectors and constraints are fed into ImpAE to learn the new representation of the feature vectors. Intuitively, if node i node j are a pair of positivelink constraint, the distance between them should be smaller after encoding. On the contrary, if they are a pair of negativelink constraint, the distance between them should be larger after encoding. Let \(h_{{\text{i}}}\) and h_{j} be the output of encoding operator in the AE, which represent the lowdimensional feature vectors of disease i and j. let \(x_{{\text{i}}}\) and x_{j} denotes the original feature vectors of disease i and j which are the input of the encoding operator. Let d(\(h_{{\text{i}}} ,\) h_{j}) and d(\(x_{{\text{i}}} ,\) x_{j}) indicate the error score between disease i and j in the encoding space and original space, respectively. From the hypothesis we mentioned above, d(\(h_{{\text{i}}} ,h_{{\text{j}}}\)) should be smaller than d(\(x_{{\text{i}}} ,\) x_{j}) if node i and j are a pair of positivelink constraints, d((\(h_{{\text{i}}} ,\) h_{j}) should be larger than d(\(x_{i} ,\) x_{j}) if node i and j are a pair of negativelink constraints. Hence, we add a penalty on the loss function if disease pair (i, j) is a positivelink constraint and we add a reward on the loss function if disease pair (i, j) is a negativelink constraint. Therefore, the loss function for modeling constraints is defined as follows:
where P, N indicates the constraints sets of positivelink constraints and negativelink constraints, respectively. \(\gamma_{1}\), \(\gamma_{2}\) are the weight coefficients restraining the influence of penalty and reward, respectively.
To integrate the constraints we proposed an improved autoencoder model named ImpAE, which combined Eqs.Â (6) and (7) then jointly minimizes the following loss function:
The loss function is constituted of two parts, the first part measures the squared error between output and input node features, the second part measures the error score of constraints in the hidden layer.
Disease similarity calculation
After obtaining lowdimensional feature representations of diseases by ImpAE, we calculated the disease similarity defined as the measurement of cosine distance of their feature vectors \(W_{{d_{i} }} = \left\{ {W_{1,1} ,W_{1,2} , \ldots ,W_{1,i} , \ldots ,W_{1,N} } \right\}\) as following:
The ImpAESim algorithm
The ImpAESim algorithm mainly contains two parts, a multinetwork embedding algorithm for compact feature learning and a disease similarity calculation method based on distance measurement of feature vectors. In the compact feature learning part, we first ran the RWR process on each of the diseaserelated information network to learn the topological structure information, then we trained the original AE and ImpAE model to learn the lowdimensional representations of disease features. As the iterations increase, the model tends to be stable eventually. Then the cosine distance of disease feature vectors is computed as the disease similarity. The pseudocode for ImpAESim is shown in Algorithm 1.
Availability of data and materials
All the datasets used in this paper could be downloaded from https://github.com/mymymaya/ImpAESim
Abbreviations
 AE:

Auto encoder
 RWR:

Random walk with restart
 SEMD:

Spondyloepimetaphyseal dysplasia
 PMDS:

Persistent Mullerian duct syndrome
 MODY:

Maturity onset diabetes of the young
 GSD:

Glycogen storage disease
 VSD:

Ventricular septal defect
 SMS:

Smithâ€“Magenis syndrome
 DJS:

Dubinâ€“Johnson syndrome
 ILL:

Infantile liver failure
 ADHD:

Adrenocorticotropic hormone deficiency
 HD:

Hematological disease
 FBCP:

Familial benign chronic pemphigus
 LEP:

Leptin
 SLC:

Sideroblastic
 SCDO:

Spondylocostal dysostosis
 CPHD:

Combined pituitary hormone deficiency
 CRMO:

Chronic recurrent multifocal osteomyelitis
References
Zhao T, Hu Y, Valsdottir LR, Zang T, Peng J. Identifying drugâ€“target interactions based on graph convolutional network and deep neural network. Brief Bioinform. 2021;22(2):2141â€“50.
Zhao T, Lyu S, Lu G, Juan L, Zeng X, Wei Z, Hao J, Peng J. SC2disease: a manually curated database of singlecell transcriptome for human diseases. Nucleic Acids Res. 2021;49(D1):D1413â€“9.
Lage K, Karlberg EO, StĂ¸rling ZM, Olason PI, Pedersen AG, Rigina O, Hinsby AM, TĂ¼mer Z, Pociot F, Tommerup N. A human phenomeinteractome network of protein complexes implicated in genetic disorders. Nat Biotechnol. 2007;25(3):309â€“16.
Wu X, Liu Q, Jiang R. Align human interactome with phenome to identify causative genes and networks underlying disease families. Bioinformatics. 2009;25(1):98â€“104.
Wang D, Wang J, Lu M, Song F, Cui Q. Inferring the human microRNA functional similarity and functional network based on microRNAassociated diseases. Bioinformatics. 2010;26(13):1644â€“50.
Zhao T, Hu Y, Cheng L. DeepDRM: a computational method for identifying diseaserelated metabolites based on graph deep learning approaches. Brief Bioinform. 2020. https://doi.org/10.1093/bib/bbaa212.
Gottlieb A, Stein GY, Ruppin E, Sharan R. PREDICT: a method for inferring novel drug indications with application to personalized medicine. Mol Syst Biol. 2011;7(1):496.
Zhao T, Liu J, Zeng X, Wang W, Li S, Zang T, Peng J, Yang Y. Prediction and collection of proteinâ€“metabolite interactions. Brief Bioinform. 2021. https://doi.org/10.1093/bib/bbab014.
Li J, Gong B, Chen X, Liu T, Wu C, Zhang F, Li C, Li X, Rao S, Li X. DOSim: an R package for similarity between diseases based on disease ontology. BMC Bioinform. 2011;12(1):266.
Mathur S, Dinakarpandian D. Automated ontological gene annotation for computing disease similarity. Summit Transl Bioinform. 2010;2010:12.
Suthram S, Dudley JT, Chiang AP, Chen R, Hastie TJ, Butte AJ. Networkbased elucidation of human disease similarities reveals common functional modules enriched for pluripotent drug targets. PLoS Comput Biol. 2010;6(2):e1000662.
Mathur S, Dinakarpandian D. Finding disease similarity based on implicit semantic similarity. J Biomed Inform. 2012;45(2):363â€“71.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25â€“9.
Robinson PN, Mundlos S. The human phenotype ontology. Clin Genet. 2010;77(6):525â€“34.
Wang JZ, Du Z, Payattakool R, Yu PS, Chen CF. A new method to measure the semantic similarity of GO terms. Bioinformatics. 2007;23(10):1274â€“81.
Leacock C, Chodorow M. Combining local context and WordNet similarity for word sense identification. WordNet Electron Lex Database. 1998;49(2):265â€“83.
Le DH, Dang VT. Ontologybased disease similarity network for disease gene prediction. Vietnam J Comput Sci. 2016;3(3):197â€“205.
Deng L, Ye D, Zhao J, Zhang J. MultiSourcDSim: an integrated approach for exploring disease similarity. BMC Med Inform Decis Mak. 2019;19(6):1â€“10.
Cheng L, Li J, Ju P, Peng J, Wang Y. SemFunSim: a new method for measuring disease similarity by integrating semantic and gene functional association. PLoS ONE. 2014;9(6):e99415.
Hu Y, Zhou M, Shi H, Ju H, Jiang Q, Cheng L. InfDisSim: a novel method for measuring disease similarity based on information flow. In: 2016 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE; 2016. p. 20â€“6.
Raza K. Formal concept analysis for knowledge discovery from biological data. Int J Data Min Bioinform. 2017;18(4):281â€“300.
Carson MB, Liu C, Lu Y, Jia C, Lu H. A disease similarity matrix based on the uniqueness of shared genes. BMC Med Genom. 2017;10(1):27â€“32.
Filipowicz W, Bhattacharyya SN, Sonenberg N. Mechanisms of posttranscriptional regulation by microRNAs: are the answers in sight? Nat Rev Genet. 2008;9(2):102â€“14.
Mercer TR, Dinger ME, Mattick JS. Long noncoding RNAs: insights into functions. Nat Rev Genet. 2009;10(3):155â€“9.
Yu G, Rangwala H, Domeniconi C, Zhang G, Zhang Z. Predicting protein function using multiple kernels. IEEE/ACM Trans Comput Biol Bioinf. 2014;12(1):219â€“33.
Wong AK, Krishnan A, Yao V, Tadych A, Troyanskaya OG. IMP 2.0: a multispecies functional genomics portal for integration, visualization and prediction of protein functions and networks. Nucleic Acids Res. 2015;43(W1):W128â€“33.
Tsuda K, Shin H, SchĂ¶lkopf B. Fast protein classification with multiple networks. Bioinformatics. 2005;21(suppl_2):ii59â€“65.
Peng J, Xue H, Wei Z, Tuncali I, Hao J, Shang X. Integrating multinetwork topology for gene function prediction using deep neural networks. Brief Bioinform. 2021;22(2):2096â€“105.
Peng J, Wang Y, Guan J, Li J, Han R, Hao J, Wei Z, Shang X. An endtoend heterogeneous graph representation learningbased framework for drugâ€“target interaction prediction. Brief Bioinform. 2021. https://doi.org/10.1093/bib/bbaa430.
Peng J, Guan J, Hui W, Shang X. A novel subnetwork representation learning method for uncovering diseasedisease relationships. Methods. 2021;192:77â€“84.
Cho H, Berger B, Peng J. Compact integration of multinetwork topology for functional analysis of genes. Cell Syst. 2016;3(6):540.e5548.e5.
Balwani M, Desnick RJ. The porphyrias: advances in diagnosis and treatment. Blood J Am Soc Hematol. 2012;120(23):4496â€“504.
Li X, Zhang D, Ding J, Li L, Wang Z. Identification of ATP2C1 mutations in the patients of Haileyâ€“Hailey disease. BMC Med Genet. 2020;21(1):1â€“11.
Nassa G, Giurato G, Cimmino G, Rizzo F, Ravo M, Salvati A, Nyman TA, Zhu Y, Vesterlund M, LehtiĂ¶ J. Splicing of platelet resident premRNAs upon activation by physiological stimuli results in functionally relevant proteome modifications. Sci Rep. 2018;8(1):1â€“12.
Cheng L, Wang P, Tian R, Wang S, Guo Q, Luo M, Zhou W, Liu G, Jiang H, Jiang Q. LncRNA2Target v2.0: a comprehensive database for target genes of lncRNAs in human and mouse. Nucleic Acids Res. 2019;47(1):140â€“4.
Chen Y, Wang X. miRDB: an online database for prediction of functional microRNA targets. Nucleic Acids Res. 2020;48(D1):D127â€“31.
Acknowledgements
The authors thank the anonymous referees for their many useful suggestions.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 23 Supplement 1, 2022: Selected articles from the Biological Ontologies and Knowledge bases workshop 2020. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume23supplement1.
Funding
Publication costs are funded by the National Natural Science Foundation of China [No.: 62072082]. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author information
Authors and Affiliations
Contributions
TYZ helped revise this paper. NYZ wrote this paper and did the experiments. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Zhang, N., Zang, T. A multinetwork integration approach for measuring disease similarity based on ncRNA regulation and heterogeneous information. BMC Bioinformatics 23 (Suppl 1), 89 (2022). https://doi.org/10.1186/s12859022046131
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12859022046131
Keywords
 Noncoding RNA
 Disease similarity
 Semantic association
 Gene functional network