Combined embedding model for MiRNA-disease association prediction

Background: Cumulative evidence from biological experiments has confirmed that miRNAs play significant roles in the diagnosis and treatment of complex diseases. However, traditional medical experiments are time-consuming and costly, which limits their ability to find unconfirmed miRNA-disease interactions. Discovering potential miRNA-disease associations will therefore contribute to understanding the pathogenesis of diseases and benefit disease therapy. Although existing methods based on various computational algorithms perform favorably in searching for potential miRNA-disease interactions, there is still room to improve their results.

Results: In this article, we present a novel combined embedding model to predict miRNA-disease associations (CEMDA). The combined embedding of a miRNA and a disease is composed of a pair embedding and a node embedding. Previous heterogeneous network methods are node-centric and simply compute the similarity between a miRNA and a disease. In contrast, our method fuses pair embedding to focus on capturing the features behind the relative information, and thus models the fine-grained pairwise relationship better than approaches in which each node has only a single embedding. First, we construct a heterogeneous network from supported miRNA-disease pairs, disease semantic similarity and miRNA functional similarity. Given this heterogeneous network, we find all the associated context paths of each confirmed miRNA and disease. The meta-paths are linked by nodes and then input to a gated recurrent unit (GRU) to directly learn more accurate similarity measures between miRNA and disease. A multi-head attention mechanism weights the hidden state of each meta-path, and the similarity information transmission mechanism in a meta-path of miRNA and disease is obtained through multiple network layers.
Second, the pair embedding of a miRNA and a disease is fed to a multi-layer perceptron (MLP), which focuses on the more important segments of the pairwise relationship. Finally, we combine meta-path based node embedding and pair embedding through a cost function to learn and predict miRNA-disease associations. The source code and datasets that verify our results are available at https://github.com/liubailong/CEMDA.

Conclusions: CEMDA achieves AUCs of 93.16% and 92.03% in leave-one-out cross validation and fivefold cross validation, respectively, outperforming the compared methods. Case studies on lung, breast, prostate and pancreatic cancers show that 48, 50, 50 and 50 of the top 50 predicted miRNAs, respectively, are confirmed in HMDD V2.0. These results further demonstrate the feasibility and effectiveness of our method.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04092-w.

Keywords: MiRNA and disease interactions, Meta-path, Pair embedding, Node embedding, Combined embedding

Background
Microribonucleic acids (miRNAs), small non-coding RNA molecules of about 21-22 nucleotides, play an important role at the post-transcriptional level and in cellular processes [1]. Experiments have confirmed that miRNAs are involved in, and can aid the diagnosis and treatment of, heart conditions [2], cardiovascular diseases, malignancies, mental disorders and diabetes. For instance, medical experiments show that mir-33 controls cholesterol homeostasis [3]. Hence, it is essential for medical researchers to identify disease-related miRNAs. Many medical technologies, e.g., microarrays and PCR, have been used to explore miRNA-disease associations [4]. However, traditional medical experiments are costly and time-consuming. Therefore, many researchers have devised computational methods to find unidentified miRNA-disease interactions and compensate for these drawbacks [5,6] of traditional experimental methods.
Many innovative computational approaches have recently been developed to discover miRNA-disease interactions. They can be roughly classified into two categories: similarity-based methods and machine learning-based methods. Similarity-based methods rest on the presumption that miRNAs with similar functions are closely associated with similar diseases. For instance, Jiang et al. proposed the first method that combines disease phenotype information with miRNA information to predict miRNA-disease interactions [7]. Nevertheless, this approach had shortcomings: it used the number of overlapping target genes of two miRNAs as the criterion for the miRNA functional similarity score, which is inadequate because it ignores indirect neighbors. Xuan et al. scored unlabeled miRNAs according to functional similarity, miRNA clusters and miRNA families; however, the miRNA similarity network they used restricted their performance [8]. Chen et al. applied the random walk algorithm to miRNA-disease association prediction [9]. However, this method had limitations in constructing the miRNA functional similarity network, which made it unable to predict new diseases without confirmed related miRNAs. Chen et al. then integrated within-scores and between-scores to rank unverified miRNA-disease associations [10]. Besides, without using any known miRNA-disease associations, Zhao et al. constructed a miRNA-lncRNA-disease network (DCSMDA), which integrated miRNA-lncRNA associations and lncRNA-disease associations to indirectly predict miRNA-disease interactions [11].
In summary, similarity-based methods construct a network model and use different measures of the similarity between nodes in the network to predict miRNA-disease interactions; most of them are limited by the quality of the constructed network and by incomplete relationships between nodes.
Besides similarity-based methods, exploring potential miRNA-disease interactions with machine learning algorithms is another significant approach in this field. Unlike similarity-based methods, which directly calculate the similarity between nodes in the network, machine learning-based studies focus on extracting inherent features and devising effective classification algorithms to find miRNA-disease associations. For example, Jiang et al. randomly sampled negative samples from the unverified miRNA-disease pairs and applied an SVM as the prediction classifier [12]. In contrast, Chen et al. designed a semi-supervised classifier that requires no negative samples [13]. To address data insufficiency and data noise, Liang et al. devised an objective function based on the L1-norm [14]. Chen et al. selected discriminative features based on their occurrence frequency [15]. Further, Zhao et al. combined multiple weak classifiers with boosting to strengthen classification [16]. In addition, matrix decomposition [17,18] and collaborative filtering [19] are both useful for revealing miRNA-disease relations. For instance, Mao et al. devised a genomic data fusion method that employed Bayesian Probabilistic Matrix Factorization to fuse data from multiple sources (MDBPMF). Their model gave a good approximation of the matrix and its generalization was assessed on unseen data [20]. Considerable effort has also gone into predicting miRNA-disease associations, motivated by the promising development of autoencoders [21], node embedding [22], deep learning and structural deep network embedding (SDNE) [23].
Although current approaches perform favorably in predicting unconfirmed miRNA-disease interactions, there is still room to improve experimental performance. On the one hand, many papers have shown that previous node-centric methods simply compute similarity with a metric such as the inner product or Euclidean distance [24], ignoring the hidden relative information between two nodes. On the other hand, some methods are largely limited in obtaining intrinsic information and discriminative features from miRNA-disease associations. Moreover, some methods are not suitable for new diseases without confirmed miRNAs.
Node-centric methods fail to consider the hidden relative information between two nodes. Therefore, we introduce the concept of a "pair", which we believe can better capture the hidden relative features between two nodes. To obtain effective relative features between two nodes, their features must be transformed jointly, which we call "pair embedding". For instance, Fig. 1 visualizes embeddings of miRNAs and diseases in which each miRNA is assigned a single embedding. The names of most diseases contain keywords related to body organs, which can serve as features representing the disease type. Suppose the miR-21 cluster is related to multiple disease types, such as pancreatic cancers [25] and breast cancers, whereas the miR-17 cluster [26], regarded as an oncogene, is overexpressed only in lung cancers. Since every miRNA has a single embedding, it has to be placed at a single best point among all of its disease types. Thus, when predicting, lung cancers are associated with miR-17 rather than miR-21. In fact, however, miR-21 has been confirmed to be related to lung cancers in clinical trials [27]. On the other hand, as shown in Fig. 2, if each miRNA-disease pair is embedded so that it independently captures its associated features, the ("Target disease", miR-21) pair may be associated more closely with the valid pairs related to "lung cancers" than the ("Target disease", miR-17) pair is. In short, pair embedding can capture the hidden features behind a pairwise relationship more precisely than node embedding.
Meta-paths are links formed by a series of nodes, which can be used to preserve associations between nodes and explore the structural information in heterogeneous networks. Shi et al. proposed an algorithm that reveals relationships by performing random walks [28]; they used miRNA-target associations and disease-gene interactions to identify potential miRNA-disease associations. However, the model depended strongly on the previous nodes to predict the next node in the network [29]; it ignored that each node contributes differently to a meta-path and could not optimize the path step by step. Unlike Shi's work, we develop a novel Combined Embedding model for MiRNA-Disease Associations prediction (CEMDA) to learn the similarity features of miRNAs and diseases. We believe that pair embedding can better capture the features between two nodes, and the MLP enables us to model the fine-grained pairwise relationship within each confirmed miRNA-disease pair. We construct a heterogeneous network from the identified miRNA-disease pairs, disease semantic similarity and miRNA functional similarity. Based on this heterogeneous network, we find all the associated context paths of each confirmed miRNA and disease. The associated context paths are then linked by nodes, and we employ meta-path based node embedding to obtain the features that contribute most to the meta-paths during model training. The parameters are optimized through iterative training to improve prediction. To emphasize the associated meta-paths, a multi-head attention mechanism weights the hidden state of each sequence and compensates for the dependency loss of the meta-paths during training. In this way, the similarity information transmission mechanism in a meta-path of miRNA and disease is obtained through multiple network layers.
Finally, we combine the pair embedding and node embedding, which predicts the fine-grained relationships in the heterogeneous network better than a single embedding. CEMDA is also suitable for new diseases without known associated miRNAs. By combining the pair embedding of miRNA-disease pairs with meta-path based node embedding, our method outperforms other state-of-the-art methods: in global LOOCV and fivefold cross validation, CEMDA achieves AUCs of 93.16% and 92.03%, respectively. Furthermore, three kinds of case studies on breast, lung, pancreatic, prostate and colorectal cancers show that our approach performs remarkably well.

Results
First, we present the experimental methods and evaluation criteria. Second, we compare CEMDA with five classical methods and analyze the results. Finally, we carry out three kinds of case studies to verify the performance of our approach.

Experimental approaches and evaluation criteria
We collected 5430 experimentally verified miRNA-disease interactions from HMDD V2.0 [30] as the dataset for prediction, and applied global LOOCV and fivefold cross validation. In global LOOCV, each verified miRNA-disease pair is used in turn as the test sample, and the remaining pairs are used as training samples. In fivefold cross validation, the known associations are randomly divided into five equal-sized groups; four groups serve as the training set and the remaining group as the test set. We repeat fivefold cross validation 50 times to reduce randomness and report the averaged results. We extract all meta-paths of length less than 4, because we found that longer meta-paths contribute little to performance while greatly increasing the computational cost.
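The two validation protocols above can be sketched as split generators. This is a minimal Python sketch, assuming the known associations are a list of (miRNA index, disease index) tuples; `evaluate` is a hypothetical callback standing in for training CEMDA and scoring the held-out pairs (ranking them against unverified pairs is not shown), and averaging the per-fold scores is our assumption about how the 50 repeats are aggregated.

```python
import random


def global_loocv_splits(known_pairs):
    """Global LOOCV: each verified pair is held out once as the test sample."""
    for i, pair in enumerate(known_pairs):
        yield known_pairs[:i] + known_pairs[i + 1:], [pair]


def fivefold_splits(known_pairs, n_folds=5, seed=0):
    """Shuffle the verified pairs, cut them into n_folds near-equal groups,
    and use each group once as the test set (the rest is training data)."""
    rng = random.Random(seed)
    shuffled = list(known_pairs)
    rng.shuffle(shuffled)
    folds = [shuffled[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        train = [p for j, fold in enumerate(folds) if j != k for p in fold]
        yield train, folds[k]


def repeated_fivefold(known_pairs, evaluate, repeats=50, n_folds=5):
    """Repeat fivefold CV with different shuffles and average the fold scores.

    `evaluate(train, test)` is a hypothetical callback returning a score
    such as the AUC on the test fold.
    """
    scores = []
    for r in range(repeats):
        for train, test in fivefold_splits(known_pairs, n_folds, seed=r):
            scores.append(evaluate(train, test))
    return sum(scores) / len(scores)
```

With the 5430 HMDD V2.0 associations, each fold holds exactly 1086 pairs.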
We use the area under the receiver operating characteristic curve (AUC) as the standard criterion for evaluating the performance of the compared approaches.
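The AUC can be computed without drawing the ROC curve, through its equivalence with the Mann-Whitney U statistic: it is the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair, with ties counted as one half. The sketch below is a generic implementation, not the authors' evaluation code; in global LOOCV the positives would be held-out verified pairs and the negatives the unverified candidate pairs.

```python
def roc_auc(scores, labels):
    """AUC via average ranks (Mann-Whitney U); labels are 1 (positive) or 0."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the group of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```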

Comparisons of CEMDA with pair embedding and without pair embedding
We compared CEMDA with and without pair embedding under global LOOCV and fivefold cross validation. The results in Figs. 5 and 6 demonstrate that pair embedding improves performance under both strategies, which means that pair embedding plays an important role in CEMDA. First, pair embedding models the fine-grained pairwise relationship better than the previous setting in which each node has only a single embedding. Second, pair embedding provides incentives for the associated nodes in the meta-path: the feature information of each miRNA-disease pair is obtained by the multi-layer perceptron to enhance similarity information transmission. We also find that performance improves as the meta-path length increases, up to a point. Longer meta-paths contain more related nodes, which bring richer information and more abundant features to model training; in other words, the method can integrate more long-term dependencies between nodes. However, Figs. 7 and 8 show that as the meta-path length increases further, the performance of CEMDA drops distinctly, because longer meta-paths contain more repeated information across their segments, which contributes less to performance. After many trials, we set the maximum meta-path length to L = 3.

Influence of projection dimensions
We compared the influence of several projection dimensions Z in Formula (11) on the results of CEMDA under global LOOCV and fivefold cross validation. Figure 9 shows the AUC values of CEMDA under different projection dimensions Z for both strategies. In Formula (11), we used five projection dimensions: 32, 64, 128, 256 and 512. The AUC shows a slight upward trend as the projection dimension increases up to 256. At a projection dimension of 512, however, performance decreased slightly during training because of the large amount of computation and data noise. We therefore selected a projection dimension of 256.

Case studies
Three kinds of case studies are carried out to further validate the predicted miRNA-disease interactions. In the first case study, we used HMDD V2.0 as the dataset to discover unverified miRNAs associated with lung cancers and breast cancers. We then compared the candidate miRNAs with two public databases, dbDEMC [32] and PhenomiR [33], to validate their accuracy. Lung cancer has been reported to be among the deadliest diseases, causing a large number of deaths worldwide [34]. Biomedical studies have found that patients whose lung cancer is detected early have a higher survival rate. Medical experiments have shown that miRNAs play a major role in the diagnosis and treatment of lung cancers [35]. In Table 1, the first column lists the top 1-25 candidates and the second column lists the top 26-50. Among them, 48 of the top 50 candidates are confirmed to be related to lung cancers by biological experimental results supported by the two public databases. … Table 1, has been shown to promote proliferation in non-small cell cancers [36]. Thus, our prediction model offers a new perspective for research.
Breast cancers are widespread neoplasms with high mortality among women worldwide. Deaths from breast neoplasms are expected to reach three million in the future [37]. Biological experiments have validated that miR-142-3p is related to breast cancers. We applied CEMDA to predict miRNAs related to breast cancers and list the top 50 in Table 2. All of the top 50 miRNAs are supported by the above-mentioned databases. Hsa-mir-140, which ranks first, has been shown to promote the spread of breast neoplasm cells [38]. These findings indicate that CEMDA provides strong evidence for breast neoplasm prediction.
In the second case study, we examine whether our approach is suitable for new diseases without confirmed related miRNAs from biological experiments. We first selected prostate cancer because it is one of the most common cancers in men worldwide; it has been reported that over one hundred thousand men died of prostate diseases in one country in 2018 [39]. We set all associations between miRNAs and prostate cancers in HMDD V2.0 to zero and then ran CEMDA to predict the related miRNAs for prostate cancers. The results in Additional file 1: Table S1 indicate that all of the top 50 miRNAs were verified by dbDEMC and PhenomiR. To assess further new diseases, we next studied pancreatic cancers; the results are given in Additional file 1: Table S2. All of the top 50 predicted miRNAs were also included in HMDD, dbDEMC and PhenomiR. These cases indicate that CEMDA is suitable for new diseases without confirmed related miRNAs. Finally, in the third case study, we examined whether CEMDA trained with data from an older version of HMDD could verify miRNA-disease pairs newly added in a later version of HMDD. We used HMDD 3.0 [40], dbDEMC and PhenomiR to validate the predictions. The results for colorectal cancers are given in Additional file 1: Table S3; all of the top 50 miRNAs are supported by HMDD 3.0, dbDEMC and PhenomiR.
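The new-disease setting amounts to zeroing one disease's column of the association matrix before training and then ranking all miRNAs for that disease. A minimal sketch, assuming A is stored as a list of rows (miRNAs by diseases); the predicted scores would come from the trained model, and `count_confirmed` is a hypothetical helper that counts how many top candidates appear in an evidence set such as dbDEMC or PhenomiR.

```python
def mask_disease(A, disease_idx):
    """Copy of A with every known association of one disease set to 0,
    so the model sees that disease as having no confirmed miRNAs."""
    masked = [row[:] for row in A]
    for row in masked:
        row[disease_idx] = 0
    return masked


def top_k_candidates(scores, k=50):
    """miRNA indices sorted by predicted score for the disease, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def count_confirmed(candidates, evidence):
    """How many ranked candidates appear in an external evidence set."""
    return sum(1 for c in candidates if c in evidence)
```

Copying the rows keeps the original matrix intact, so the same A can be reused for the other case studies.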
Based on the outcomes of the three case studies, we conclude that our approach is effective in predicting unverified miRNA-disease interactions.

Discussion
Compared with five classical approaches under global LOOCV and fivefold cross validation, CEMDA shows better prediction performance. Moreover, three kinds of case studies on five diseases also support our approach's results. First, we extract all meta-path instances of each confirmed miRNA-disease pair in the miRNA-disease heterogeneous network to capture complicated associations. Meta-paths are linked by nodes and then input to a GRU to learn more accurate similarity measures between miRNA and disease. Considering that different nodes contribute differently to a meta-path, a multi-head attention mechanism weights the hidden state of each meta-path, and the similarity information transmission mechanism in a meta-path of miRNA and disease is obtained through multiple network layers. Second, an MLP is used to obtain the relative information in each confirmed miRNA-disease pair. By applying pair embedding, which captures the features behind pairwise relationships, we obtain fine-grained associations. Finally, meta-path based node embedding and pair embedding are combined to integrate node and edge information from meta-path instances. In conclusion, CEMDA achieves excellent prediction performance by modeling fine-grained pairwise relationships and considering the contributions of different nodes in the miRNA-disease heterogeneous network.

Methods
The framework for predicting miRNA-disease associations with CEMDA is presented in Fig. 9. First, several similarity measures are used to compute the miRNA integrated similarity and the disease integrated similarity. Second, we build the heterogeneous network from experimentally confirmed miRNA-disease associations, miRNA integrated similarity and disease integrated similarity. Third, we develop a novel combined embedding model that extracts associated information to predict unidentified miRNA-disease associations. The model consists of pair embedding of miRNA-disease pairs, meta-path based node embedding, and prediction of miRNA-disease associations with the combined embedding. Pair embedding employs the MLP to attend to important segments of the pairwise relationship. The initial representations of miRNAs and diseases, which have different dimensions, are projected into the same vector space. The associated context paths are serialized by nodes, and a GRU is used to learn the node features that contribute most to the meta-paths. A multi-head attention mechanism weights the hidden state of each sequence, and the information of the entire meta-path is obtained through multiple network layers. We define a loss function that combines pair embedding and meta-path based node embedding to obtain the final representations of miRNAs and diseases.

MiRNA and disease association network structure
HMDD V2.0 is a widely used database of experimentally supported miRNA-disease interactions. In this article, we use the adjacency matrix $A \in \mathbb{R}^{m\times n}$ to represent the supported miRNA-disease associations, where m and n are the numbers of miRNAs and diseases, respectively. The element $A_{ij}$ equals 1 if miRNA $r_i$ is associated with disease $d_j$, and 0 otherwise. We construct the matrix from the HMDD V2.0 dataset, which contains 5430 associations between 495 miRNAs and 383 diseases; thus m = 495 and n = 383. The adjacency matrix A is used to construct the miRNA-disease association network.
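Building A from a list of verified pairs is a direct translation of this definition. A minimal sketch, assuming the HMDD entries have already been mapped to 0-based (miRNA index, disease index) tuples; that mapping step is not shown.

```python
def build_adjacency(pairs, m, n):
    """Binary association matrix: A[i][j] = 1 iff miRNA i is linked to disease j.

    `pairs` holds 0-based (miRNA index, disease index) tuples; duplicates
    collapse to a single 1.
    """
    A = [[0] * n for _ in range(m)]
    for i, j in pairs:
        if not (0 <= i < m and 0 <= j < n):
            raise IndexError(f"pair ({i}, {j}) lies outside a {m} x {n} matrix")
        A[i][j] = 1
    return A
```

For HMDD V2.0 one would call it with m = 495 and n = 383 and expect 5430 ones, assuming the 5430 associations are distinct pairs.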

Disease integrated similarity network construction
To make the experimental model more accurate and reliable, we compute the disease integrated similarity $SD(d_i, d_j)$ by combining disease semantic similarity with disease Gaussian interaction profile kernel similarity:

$$SD(d_i,d_j)=\begin{cases}SS(d_i,d_j), & \text{if } d_i \text{ and } d_j \text{ have semantic similarity}\\ GD(d_i,d_j), & \text{otherwise}\end{cases}\tag{1}$$

where $GD(d_i, d_j)$ represents the disease Gaussian interaction profile kernel similarity and $SS(d_i, d_j)$ represents the combined semantic similarity of diseases $d_i$ and $d_j$. The underlying assumption is that two diseases sharing more ancestor subject headings are more similar in semantics.

For the first disease semantic similarity, we use the MeSH-based definition of Wang et al. Any disease D can be represented by a directed acyclic graph DAG(D), which contains D, the set of its ancestor disease nodes, and the edges pointing from each parent node to its child node. The contribution of disease d in DAG(D) is defined as

$$D_D(d)=\begin{cases}1, & \text{if } d = D\\ \max\{\Delta \cdot D_D(d') \mid d' \in \text{children of } d\}, & \text{if } d \neq D\end{cases}$$

where Δ is the semantic attenuation contribution factor ($0 < \Delta < 1$). Following Xuan et al. [8], we set Δ = 0.5. The semantic value of disease D is the sum of the semantic contributions of D and all its ancestor nodes:

$$DV(D)=\sum_{d \in T(D)} D_D(d)$$

where T(D) denotes all ancestor nodes of disease D in DAG(D), including D itself. The first disease semantic similarity between $d_i$ and $d_j$ is then

$$SS1(d_i,d_j)=\frac{\sum_{t \in T(d_i)\cap T(d_j)}\left(D_{d_i}(t)+D_{d_j}(t)\right)}{DV(d_i)+DV(d_j)}$$

Xuan et al. [8] defined a second semantic value of disease D. Supposing that some more specific diseases may contribute more to disease D, they define the semantic contribution of disease d as

$$D2_D(d)=-\log\left(\frac{\text{number of DAGs including } d}{\text{number of diseases}}\right)$$

The semantic similarity $SS2(d_i, d_j)$ is then the proportion of the contributions of the two diseases and their common ancestor nodes:

$$SS2(d_i,d_j)=\frac{\sum_{t \in T(d_i)\cap T(d_j)}\left(D2_{d_i}(t)+D2_{d_j}(t)\right)}{DV2(d_i)+DV2(d_j)}, \qquad DV2(D)=\sum_{d \in T(D)} D2_D(d)$$

Finally, the two semantic similarities are arithmetically averaged to give the disease semantic similarity:

$$SS(d_i,d_j)=\frac{SS1(d_i,d_j)+SS2(d_i,d_j)}{2}$$

According to Formula (1), we then calculate the disease integrated similarity network $SD(d_i, d_j)$.
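The two semantic similarities and their average can be computed directly on a MeSH-style DAG. This is a minimal sketch, assuming the DAG is given as a hypothetical `parents` mapping from each disease to its parent headings and that "number of diseases" means the size of the supplied disease list; the toy diseases in the usage check are invented for illustration.

```python
import math


def dag_nodes(disease, parents):
    """T(D): the disease itself plus all of its ancestors in the DAG."""
    seen, stack = {disease}, [disease]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def wang_contributions(disease, parents, delta=0.5):
    """D_D(d): 1 for D itself; for an ancestor d, the max over its children d'
    inside DAG(D) of delta * D_D(d'). Values only grow, so a node is pushed
    again whenever its value improves."""
    contrib, stack = {disease: 1.0}, [disease]
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            value = delta * contrib[node]
            if value > contrib.get(p, 0.0):
                contrib[p] = value
                stack.append(p)
    return contrib


def xuan_contributions(disease, dag_sets):
    """D2_D(d) = -log(#DAGs containing d / #diseases)."""
    n = len(dag_sets)
    return {d: -math.log(sum(d in s for s in dag_sets.values()) / n)
            for d in dag_sets[disease]}


def overlap_similarity(ci, cj):
    """Contribution of the shared ancestors divided by the two semantic values."""
    shared = ci.keys() & cj.keys()
    denom = sum(ci.values()) + sum(cj.values())
    return sum(ci[t] + cj[t] for t in shared) / denom if denom > 0 else 0.0


def semantic_similarity(di, dj, parents, diseases, delta=0.5):
    """SS = (SS1 + SS2) / 2 for two diseases that appear in `diseases`."""
    dag_sets = {x: dag_nodes(x, parents) for x in diseases}
    ss1 = overlap_similarity(wang_contributions(di, parents, delta),
                             wang_contributions(dj, parents, delta))
    ss2 = overlap_similarity(xuan_contributions(di, dag_sets),
                             xuan_contributions(dj, dag_sets))
    return (ss1 + ss2) / 2
```

In the toy case where diseases A and B share a single parent C, SS1 is 1/3, while SS2 is 0 because C appears in every DAG and so carries no specificity; their average is 1/6.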

MiRNA integrated similarity network structure
According to Wang et al.'s study, miRNAs with similar functions are often associated with diseases with similar semantics [42]. We calculate the miRNA integrated similarity SM by merging the miRNA functional similarity FS and the Gaussian interaction profile kernel similarity GM as follows:

$$SM(r_i,r_j)=\begin{cases}FS(r_i,r_j), & \text{if } r_i \text{ and } r_j \text{ have functional similarity}\\ GM(r_i,r_j), & \text{otherwise}\end{cases}\tag{8}$$

where $FS(r_i, r_j)$ ($i, j \in [1, 495]$) represents the miRNA functional similarity between $r_i$ and $r_j$, and $GM(r_i, r_j)$ represents the Gaussian interaction profile kernel similarity of miRNAs $r_i$ and $r_j$. The miRNA functional similarity $FS(r_i, r_j)$ was downloaded from Wang et al.'s study.
Following Zhao et al. [16], the Gaussian interaction profile kernel similarity between miRNAs $r_i$ and $r_j$ is calculated as

$$GM(r_i,r_j)=\exp\left(-\alpha_r\left\|IV(r_i)-IV(r_j)\right\|^2\right)$$

where $IV(r_i)$ and $IV(r_j)$ are the i-th and j-th rows of matrix A, respectively. The parameter $\alpha_r$ controls the kernel bandwidth:

$$\alpha_r=\frac{\alpha_{r0}}{\frac{1}{m}\sum_{k=1}^{m}\left\|IV(r_k)\right\|^2}$$

where the initial kernel bandwidth parameter $\alpha_{r0}$ is set to 1.
Finally, we obtain the miRNA integrated similarity network SM according to Formula (8).
To sum up, we combine the miRNA-disease association network, the miRNA integrated similarity network and the disease integrated similarity network to construct the miRNA-disease heterogeneous network. We define it as an undirected graph G = (V, E), where V consists of miRNA nodes (M) and disease nodes (D), and E contains three edge types: M → D or D → M indicates that a miRNA is associated with a disease, M → M indicates that two miRNA nodes are similar, and D → D indicates an edge between two similar disease nodes.
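The kernel and the merging rule can be sketched as follows. This is a minimal sketch under two assumptions not stated in the text: the disease kernel GD is computed the same way on the columns of A, and "has functional (or semantic) similarity" is read as "the similarity value is nonzero". The parameter name `bandwidth0` plays the role of $\alpha_{r0}$.

```python
import math


def gip_kernel(profiles, bandwidth0=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    K[i][j] = exp(-alpha * ||IV_i - IV_j||^2), with
    alpha = bandwidth0 / mean_k ||IV_k||^2.
    """
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq = sum(sq_norms) / len(profiles)
    alpha = bandwidth0 / mean_sq if mean_sq > 0 else 0.0
    return [[math.exp(-alpha * sum((a - b) ** 2 for a, b in zip(ri, rj)))
             for rj in profiles] for ri in profiles]


def transpose(A):
    """Columns of A as rows (the disease interaction profiles)."""
    return [list(col) for col in zip(*A)]


def integrate(primary, fallback):
    """SM (or SD): keep the primary similarity (FS or SS) where it is
    available, treated here as nonzero, and use the GIP kernel elsewhere."""
    return [[p if p != 0 else f for p, f in zip(prow, frow)]
            for prow, frow in zip(primary, fallback)]
```

Usage under these assumptions: `SM = integrate(FS, gip_kernel(A))` and `SD = integrate(SS, gip_kernel(transpose(A)))`.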

Meta-path instances extraction from MiRNA and disease heterogeneous network
There are one or more paths between a miRNA and a related disease in the miRNA-disease heterogeneous network. Meta-paths represent the indirect and composite connections between a miRNA and a disease, which help reveal the information and complex structure in miRNA-disease associations. A confirmed miRNA-disease association can be connected by different meta-path instances. For convenience, we explain meta-path instances below. First, we define a meta-path P of length L as a sequence of the form $m \to N_1 \to \cdots \to N_i \to \cdots \to d$, where m and d come from a verified miRNA-disease pair in HMDD V2.0 and $N_i \in \{M, D\}$. Different meta-path types help explain why two nodes are closely related, because the paths from one node to another can be of multiple types, which carry different semantics. For example, the meta-path type D → D → M indicates that if a disease is associated with a miRNA, then another disease similar to that disease is potentially associated with the miRNA. The meta-path type D → M → M indicates that if a miRNA is associated with a disease, then another miRNA similar to that miRNA is potentially associated with the disease. There are different meta-path instances of length L between an identified pair m and d, as shown in Fig. 10. For example, the confirmed pair $m_2$ and $d_2$ has instances of different lengths; one meta-path instance … Finally, all meta-path instances of the confirmed miRNA-disease pairs in the network are extracted.
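Extracting meta-path instances can be sketched as a bounded depth-first search over the heterogeneous graph. This is a minimal sketch under stated assumptions: length is counted in edges, instances are simple paths (no repeated node), and M-M and D-D edges are kept only where the integrated similarity reaches a hypothetical `threshold`, since the text does not say how the similarity networks are sparsified.

```python
def build_hetero_graph(A, SM, SD, threshold=0.5):
    """Undirected graph over nodes ("m", i) and ("d", j).

    Edge types: M-D for known associations in A, and M-M or D-D where the
    integrated similarity reaches `threshold` (a hypothetical cut-off).
    """
    m, n = len(A), len(A[0])
    graph = {("m", i): set() for i in range(m)}
    graph.update({("d", j): set() for j in range(n)})

    def link(u, v):
        graph[u].add(v)
        graph[v].add(u)

    for i in range(m):
        for j in range(n):
            if A[i][j]:
                link(("m", i), ("d", j))
    for kind, S in (("m", SM), ("d", SD)):
        for i in range(len(S)):
            for k in range(i + 1, len(S)):
                if S[i][k] >= threshold:
                    link((kind, i), (kind, k))
    return graph


def meta_path_instances(graph, mirna, disease, max_len=3):
    """All simple paths mirna -> ... -> disease with 1..max_len edges."""
    found = []

    def dfs(path):
        node = path[-1]
        if node == disease:
            found.append(list(path))
            return
        if len(path) - 1 == max_len:
            return
        for nxt in sorted(graph[node]):
            if nxt not in path:
                path.append(nxt)
                dfs(path)
                path.pop()

    dfs([mirna])
    return found


def meta_path_type(path):
    """Node-type signature of an instance, e.g. "M -> M -> D -> D"."""
    return " -> ".join("M" if kind == "m" else "D" for kind, _ in path)
```

With `max_len=3` this matches the choice of meta-paths shorter than 4 made in the experiments.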

Linear transformations of MiRNAs and diseases
We take the i-th row of the miRNA similarity matrix SM as the initial feature of the i-th miRNA. Similarly, we take the j-th row of the disease similarity matrix SD as the initial feature of the j-th disease. Because these initial features differ in dimension, we project the features of miRNAs and diseases into the same vector space with linear transformations.
We project the feature of a miRNA r into the Z-dimensional space as follows:

$$h_r = W_R\, x_r$$

Similarly, the initial feature of disease d is projected into the Z-dimensional space as follows:

$$h_d = W_D\, x_d$$

where $h_r$ and $h_d$ are the projected features of miRNA r and disease d, respectively, and $x_r$ and $x_d$ are their initial features. $W_R \in \mathbb{R}^{Z\times m}$ is a linear transformation matrix that projects the 495-dimensional miRNA feature into Z-dimensional space, and $W_D \in \mathbb{R}^{Z\times n}$ is a linear transformation matrix that projects the 383-dimensional disease feature into Z-dimensional space.
In Fig. 9, the shaded nodes are the transformed representations of the initial miRNA and disease features.
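The projection is a plain matrix-vector product. A minimal sketch; the uniform random initialization is a hypothetical choice, since in the model $W_R$ and $W_D$ are learned during training.

```python
import random


def init_projection(z_dim, in_dim, seed=0):
    """Random Z x in_dim matrix (W_R or W_D); learned during training in practice."""
    rng = random.Random(seed)
    scale = in_dim ** -0.5
    return [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(z_dim)]


def project(W, x):
    """h = W x: maps an initial similarity-row feature into Z dimensions."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]
```

With m = 495, n = 383 and Z = 256, $W_R$ has 256 × 495 = 126,720 entries and $W_D$ has 256 × 383 = 98,048 entries.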

MLP encoder of miRNA-disease interactions
Given a miRNA embedding $h_r \in \mathbb{R}^Z$ and a disease embedding $h_d \in \mathbb{R}^Z$, combined as $\mathrm{Com}(h_r, h_d) \in \mathbb{R}^{4Z}$, we use a multi-layer perceptron (MLP) to embed the miRNA-disease interaction $(h_r, h_d)$ into a Z-dimensional vector. We denote this pair embedder by $g(r, d)$. The miRNA embedding and disease embedding are first combined to form the initial input of the MLP.
Here ∘ denotes element-wise vector multiplication, $\mathrm{ReLU}(x) = \max(0, x)$, and $g(r, d) \in \mathbb{R}^X$. We apply dropout to the hidden layers and take the output of the last MLP layer as the pair embedding. We implement g(·) as a two-layer MLP with 100 hidden units per layer.
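The pair embedder can be sketched as follows. The text only states that $\mathrm{Com}(h_r, h_d)$ lies in $\mathbb{R}^{4Z}$ and uses element-wise multiplication, so the concatenation $[h_r; h_d; h_r \circ h_d; |h_r - h_d|]$ below is a hypothetical choice, as are the ReLU on the output layer, the uniform initialization and inverted dropout scaling. The two layers of 100 units and dropout on the hidden layer follow the text.

```python
import random


def combine(h_r, h_d):
    """Hypothetical Com(h_r, h_d) in R^{4Z}: [h_r; h_d; h_r * h_d; |h_r - h_d|]."""
    return (list(h_r) + list(h_d)
            + [a * b for a, b in zip(h_r, h_d)]
            + [abs(a - b) for a, b in zip(h_r, h_d)])


def dense_relu(W, b, x):
    """ReLU(W x + b)."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + bi) for row, bi in zip(W, b)]


def dropout(v, q, rng):
    """Inverted dropout: keep each unit with probability 1 - q, rescale by 1/(1 - q)."""
    return [x / (1 - q) if rng.random() >= q else 0.0 for x in v]


class PairEmbedder:
    """g(r, d): two ReLU layers of `hidden` units applied to Com(h_r, h_d)."""

    def __init__(self, z_dim, hidden=100, seed=0):
        rng = random.Random(seed)
        self.layers = []
        for in_dim in (4 * z_dim, hidden):
            s = in_dim ** -0.5
            W = [[rng.uniform(-s, s) for _ in range(in_dim)] for _ in range(hidden)]
            self.layers.append((W, [0.0] * hidden))

    def __call__(self, h_r, h_d, train=False, q=0.5, rng=None):
        x = combine(h_r, h_d)
        for idx, (W, b) in enumerate(self.layers):
            x = dense_relu(W, b, x)
            if train and idx < len(self.layers) - 1:  # dropout on hidden layers only
                x = dropout(x, q, rng or random.Random())
        return x  # last-layer output = pair embedding g(r, d)
```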

Validity of pair embedding
Recall that one limitation of node embedding is that it inadvertently makes a miRNA and a disease similar to each other if they frequently appear together within meta-paths, whether or not the miRNA is associated with the disease. We therefore introduce a pair validity classifier $\pi: \mathbb{R}^X \to \mathbb{R}$ that discriminates whether a miRNA-disease pair is valid, trained with a binary cross-entropy loss. π(·) is a two-layer MLP with ReLU activation.
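A validity classifier of this kind can be sketched with the standard binary cross-entropy on the logit $\pi(g(r,d))$, i.e. $-[y\log\sigma(\pi) + (1-y)\log(1-\sigma(\pi))]$ averaged over pairs. The exact loss formula, the hidden width of π and how invalid pairs are sampled are not given in the text, so the mean reduction, the 100-unit hidden layer and the labels in the usage check are assumptions.

```python
import math
import random


def bce_with_logit(logit, label):
    """-[y log s(l) + (1 - y) log(1 - s(l))], with s the sigmoid, computed stably."""
    return max(logit, 0.0) - label * logit + math.log1p(math.exp(-abs(logit)))


class PairValidityClassifier:
    """pi: R^X -> R, a 2-layer MLP (ReLU hidden layer, linear output logit)."""

    def __init__(self, in_dim, hidden=100, seed=0):
        rng = random.Random(seed)
        s1, s2 = in_dim ** -0.5, hidden ** -0.5
        self.W1 = [[rng.uniform(-s1, s1) for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-s2, s2) for _ in range(hidden)]
        self.b2 = 0.0

    def __call__(self, g):
        h = [max(0.0, sum(w * x for w, x in zip(row, g)) + b)
             for row, b in zip(self.W1, self.b1)]
        return sum(w * x for w, x in zip(self.w2, h)) + self.b2


def validity_loss(pi, pair_embeddings, labels):
    """Mean BCE; label 1 = valid (known) pair, 0 = invalid (sampled) pair."""
    losses = [bce_with_logit(pi(g), y) for g, y in zip(pair_embeddings, labels)]
    return sum(losses) / len(losses)
```

Computing the loss from the logit avoids evaluating log(0) when the sigmoid saturates.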

Multi-head attention embedding of meta-path
Meta-paths are linked by a series of nodes, which can be employed to preserve the important structure information in heterogeneous networks. According to a meta-path instance p connecting the confirmed miRNA r with disease d , the measurable features of the connection are implied in the sequences of p . The sequence of p is represented as {X 1 , X 2 , · · · X n−1 , X n } , where, X 1 = h r , X n = h d . Considering that different nodes in the meta path have different importance to the meta path, GRU can learn important nodes with the contributions to the sequence, which is suitable for sequential data learning. We use a GRU to generate a Z-dimensional vector for p . GRU calculates the hidden state h t with h t−1 and X t as input, t ∈ [1, n] , which is shown as follows.
where $\sigma$ is the sigmoid function and $W_{zx}, W_{rx}, W_{hx} \in \mathbb{R}^{X \times Z}$. We apply dropout to the hidden-state update vector $g_t$ as follows: where $d(\cdot)$ is the dropout function, defined as follows: where $q$ is the dropout rate and mask is a vector sampled from a Bernoulli distribution with success probability $1 - q$.
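Because the GRU and dropout equations are not reproduced above, the following sketch uses the textbook GRU gating with dropout applied only to the candidate update vector $g_t$, which is one plausible reading of the description. Bias terms are omitted, the initial state $h_0 = 0$ is assumed, and the update convention $h_t = (1 - z_t) \circ h_{t-1} + z_t \circ d(g_t)$ is an assumption. Weight matrices are stored as rows (out_dim x in_dim), the transpose of the $X \times Z$ shapes stated in the text. The class name and the inference-time scaling by $1 - q$ are hypothetical choices.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Matrix-vector product with W stored as rows (out_dim x in_dim)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class GRUWithCandidateDropout:
    """Hypothetical GRU sketch (biases omitted) with dropout d(.) on g_t only.

    ASSUMED update: h_t = (1 - z_t) * h_{t-1} + z_t * d(g_t).
    """

    def __init__(self, in_dim, z_dim, q=0.2, seed=0):
        self.rng = random.Random(seed)
        self.q = q  # dropout rate
        self.z_dim = z_dim

        def rand(rows, cols):
            return [[self.rng.uniform(-0.2, 0.2) for _ in range(cols)] for _ in range(rows)]

        self.W_zx, self.W_rx, self.W_hx = rand(z_dim, in_dim), rand(z_dim, in_dim), rand(z_dim, in_dim)
        self.W_zh, self.W_rh, self.W_hh = rand(z_dim, z_dim), rand(z_dim, z_dim), rand(z_dim, z_dim)

    def dropout(self, v, training):
        if training:
            # mask_i ~ Bernoulli(1 - q): each unit is kept with probability 1 - q
            return [x if self.rng.random() < 1.0 - self.q else 0.0 for x in v]
        # At inference, scale by the keep probability so the expectation matches (assumption)
        return [(1.0 - self.q) * x for x in v]

    def step(self, x_t, h_prev, training=False):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W_zx, x_t), matvec(self.W_zh, h_prev))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W_rx, x_t), matvec(self.W_rh, h_prev))]  # reset gate
        reset_h = [ri * hi for ri, hi in zip(r, h_prev)]  # r_t * h_{t-1}
        g = [math.tanh(a + b) for a, b in zip(matvec(self.W_hx, x_t), matvec(self.W_hh, reset_h))]
        g = self.dropout(g, training)  # d(g_t)
        return [(1.0 - zi) * hi + zi * gi for zi, hi, gi in zip(z, h_prev, g)]

    def run(self, sequence, training=False):
        """Return the n x Z matrix of hidden states for the sequence X_1, ..., X_n."""
        h = [0.0] * self.z_dim  # assumed initial state h_0 = 0
        states = []
        for x_t in sequence:
            h = self.step(x_t, h, training)
            states.append(h)
        return states
```

Running `run` over the node sequence of a meta-path instance returns the $n \times Z$ state matrix that is then aggregated by attentive pooling.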
After the GRU processes meta-path instance $p$, we obtain an embedding matrix $h \in \mathbb{R}^{n \times Z}$. A Z-dimensional vector is then extracted by aggregating $h$ with attentive pooling. The contribution of each node in the meta-path instance is measured as follows: where $M \in \mathbb{R}^Z$ is a trainable attention parameter vector, $i \in [1, n]$ and $j \in [1, n]$.
The extracted vector is formed as a weighted sum of the row vectors of the matrix $h$ as follows: To stabilize the learning of the attention parameters, we extend the attention mechanism to multi-head attention, performing attention $K$ times independently and averaging the outputs as follows: where $\Vert$ indicates concatenation and $\alpha_i^k$ is the normalized attention coefficient of node $i$ in the $k$-th attention head.
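The attentive pooling and its multi-head extension can be sketched as below. The scoring function is an assumption: each node's score is taken as the dot product $M \cdot h_i$, normalized with a softmax over $i \in [1, n]$. Since the text mentions both averaging and concatenation, `multi_head_pool` supports both through a hypothetical `mode` argument; all function names are illustrative.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(H, M):
    """Pool the n x Z GRU states H into one Z-vector.

    ASSUMPTION: alpha_i = softmax_i(M . h_i).
    """
    alpha = softmax([sum(m * h for m, h in zip(M, h_i)) for h_i in H])
    pooled = [sum(a * h_i[k] for a, h_i in zip(alpha, H)) for k in range(len(H[0]))]
    return pooled, alpha


def multi_head_pool(H, heads, mode="mean"):
    """Pool once per head vector M_k (K = len(heads)), then average
    (mode='mean') or concatenate (mode='concat') the K outputs."""
    outputs = [attentive_pool(H, M)[0] for M in heads]
    if mode == "concat":
        return [x for out in outputs for x in out]
    return [sum(col) / len(outputs) for col in zip(*outputs)]
```

Averaging keeps the output Z-dimensional regardless of $K$, whereas concatenation yields a $KZ$-dimensional vector.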

Attention-aware fusion of multiple meta-path instances to represent miRNA-disease associations
The meta-path instances connecting a confirmed miRNA $r$ and disease $d$ may have different lengths. Meta-path instances of the same length, which we call a meta-path type, still contribute differently to the connection between $r_i$ and $d_j$ because their node sequences differ. For example, Fig. 10 shows the instances $m_2 \to m_4 \to d_3 \to d_4$ and $m \to m_4 \to d_1 \to d_4$, which carry different related information. To merge the global information of the different meta-path instances of the same length into a single representation of the connection between $r$ and $d$, we combine them with an attention mechanism as follows:
where $att_p \in \mathbb{R}^Z$ is the attention parameter for meta-path instance $p$, and $e_p$ indicates the contribution of meta-path instance $p$ to the connection between $r_i$ and $d_j$. These contributions are normalized with the softmax function over all meta-path instances with meta-path type $P$. For all $p \in P$, the comprehensive representation of the connection between $r_i$ and $d_j$ is obtained as the weighted sum of all meta-path instances, as shown in Formula (29).
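Instance-level fusion within one meta-path type reduces to a softmax-weighted sum. In this sketch the contribution score $e_p$ is assumed to be the dot product of an attention vector `att` (standing in for $att_p$) with the instance vector $h_p$; `fuse_instances` is a hypothetical helper, not the authors' implementation.

```python
import math


def fuse_instances(instance_vecs, att):
    """Fuse the Z-dim vectors h_p of all instances of one meta-path type P.

    ASSUMPTION: e_p = att . h_p, softmax-normalised over the instances in P.
    Returns the fused connection vector and the normalised weights.
    """
    scores = [sum(a * h for a, h in zip(att, h_p)) for h_p in instance_vecs]
    m = max(scores)  # stabilise the softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * h_p[k] for w, h_p in zip(weights, instance_vecs))
             for k in range(len(att))]
    return fused, weights
```

For two instances of the same length, such as the pair shown in Fig. 10, passing their pooled vectors returns one fused vector and two weights that sum to 1, with the larger weight on the instance that scores higher against `att`.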

Attention-aware fusion of multiple meta-paths to represent miRNA-disease associations
We denote the meta-path types by $P_i$, $i \in [1, N]$, and the features of the association between a confirmed miRNA $r_i$ and disease $d_j$ under meta-path type $P_i$ by $h_{P_i} \in \mathbb{R}^Z$. Since meta-path types of different lengths contribute differently, an attention mechanism is employed to obtain the final representation as follows:
where $att_{P_i} \in \mathbb{R}^Z$ is the attention parameter for meta-path type $P_i$ (that is, for each path length). $w_{P_i}$ indicates the contribution of meta-path type $P_i$ to the connection, and $w'_{P_i}$ is $w_{P_i}$ normalized with the softmax function over all meta-path types. Thus, $h^p_{r,d} \in \mathbb{R}^Z$ represents all meta-paths, weighted by path-length attention.
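Putting the two attention levels together, the following sketch first fuses the instances of each meta-path type and then fuses the resulting type vectors $h_{P_i}$ with weights $w'_{P_i}$. Dot-product scoring at both levels and keying meta-path types by path length are assumptions; `fuse_meta_paths` and its argument names are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(vectors, weights):
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(len(vectors[0]))]


def fuse_meta_paths(instances_by_type, att_instance, att_type):
    """Two-level attention sketch (ASSUMED dot-product scoring at both levels).

    instances_by_type: meta-path type P_i (keyed here by path length) -> list of h_p.
    att_instance: instance-level attention vector; att_type: P_i -> att_{P_i}.
    Returns h^p_{r,d} and the normalised type weights w'_{P_i}.
    """
    types = sorted(instances_by_type)
    h_types = []
    for P in types:
        inst = instances_by_type[P]
        alpha = softmax([dot(att_instance, h_p) for h_p in inst])
        h_types.append(weighted_sum(inst, alpha))  # h_{P_i}
    w = softmax([dot(att_type[P], h_P) for P, h_P in zip(types, h_types)])  # w'_{P_i}
    return weighted_sum(h_types, w), dict(zip(types, w))
```

Keeping a separate attention vector per type lets the model learn, for example, that shorter meta-paths carry more reliable evidence than longer ones, if the data support it.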
Through the mechanisms above, the interaction between miRNA $r$ and disease $d$ is represented with the salient information carried by its meta-paths.

Predicting MiRNA-disease associations with combined embedding
Finally, we obtain the final representation $h^P_u$ of the miRNA and disease, which contains all the information about the miRNA-disease association. The parameters $W_R$, $W_D$, $att_p$ and $att_{P_i}$ are trained so that the learned features are as accurate as possible. The primary training objective is to make the distance between two related nodes in the miRNA-disease heterogeneous network as small as possible. At the same time, we want the pair embedding and the meta-path-based node embedding to be similar. We therefore predict miRNA-disease associations with the combined embedding. The cross-entropy loss for the meta-path-based node embedding is as follows: where $\mathcal{P}$ is the set of positive pairs with supported relationships. The parameters can be learned by minimizing the following loss function. We combine the above two loss functions into the final loss function as follows:
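The final objective can be sketched as the sum of two binary cross-entropy terms: one on node-embedding scores for supported pairs (and sampled negatives), and one on pair-validity scores. The dot-product node score, the negative pairs and the weight `lam` are assumptions introduced for illustration; with `lam = 1.0` the two losses are simply added.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bce_from_scores(pos_scores, neg_scores):
    """Mean binary cross-entropy: positives labelled 1, negatives labelled 0.

    log(1 - sigmoid(s)) is computed as log_sigmoid(-s).
    """
    total = -sum(log_sigmoid(s) for s in pos_scores)
    total -= sum(log_sigmoid(-s) for s in neg_scores)
    return total / (len(pos_scores) + len(neg_scores))


def combined_loss(node_pairs_pos, node_pairs_neg, pair_scores_pos, pair_scores_neg, lam=1.0):
    """Final objective = node-embedding loss + lam * pair-validity loss.

    node_pairs_*: (h_r, h_d) embedding tuples, scored by the ASSUMED dot product h_r . h_d.
    pair_scores_*: raw classifier outputs pi(g(r, d)).
    lam is a hypothetical weight; lam = 1.0 adds the two losses directly.
    """
    node_loss = bce_from_scores([dot(r, d) for r, d in node_pairs_pos],
                                [dot(r, d) for r, d in node_pairs_neg])
    pair_loss = bce_from_scores(pair_scores_pos, pair_scores_neg)
    return node_loss + lam * pair_loss
```

Minimizing this sum pushes related miRNA-disease nodes toward high similarity while the pair-validity term penalizes pairs that merely co-occur in meta-paths.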