The proposed method is named bipartite graph-based collaborative matrix factorization (BGCMF) and consists of two major steps. First, the Gaussian interaction profile (GIP) kernel and the nearest neighbour profile (NP) are used to process the original miRNA matrix and disease matrix to obtain their network information. At the same time, WKNKN is applied to the original interaction matrix \({\mathbf{Y}}\) to minimize the error. Second, the BG algorithm is used to obtain the prediction matrix \({\mathbf{Y}}_{{1}}\), and collaborative matrix factorization (CMF) is used to obtain the prediction matrix \({\mathbf{Y}}_{2}\). Finally, the prediction matrix \({\mathbf{Y}}_{{{\text{predict}}}}\) is obtained by combining the two improved models. The flowchart of BGCMF is shown in Fig. 4.

### MiRNA functional similarity

Based on the hypothesis that functionally similar miRNAs tend to be associated with phenotypically similar diseases, Wang et al. [10] presented a method for computing miRNA functional similarity. The functional similarity score matrix can be downloaded from http://www.cuilab.cn/files/images/cuilab/misim.zip. Here, the miRNA functional similarity matrix is denoted by \({\mathbf{S}}_{m} \in {\mathbb{R}}^{n \times n}\), and the value of the entry \({\mathbf{S}}_{m} \left( {M\left( i \right),M\left( j \right)} \right)\) measures the closeness between miRNAs \(M\left( i \right)\) and \(M\left( j \right)\).

### Disease semantic similarity

A directed acyclic graph (DAG) is used to describe the relationships among various diseases. A disease \(D\) can be described by \(DAG\left( D \right) = \left( {D,T\left( D \right),E\left( D \right)} \right)\), where \(T\left( D \right)\) is the node set containing the ancestor nodes of \(D\) and \(D\) itself, and \(E\left( D \right)\) is the set of all direct edges between child nodes and parent nodes. The semantic value of disease \(D\) is defined as follows:

$$SV1\left( {\text{D}} \right) = \sum\limits_{d \in T\left( D \right)} {D1_{D} } \left( d \right),$$

(4)

$$D1_{D} \left( d \right) = \begin{cases} 1 & \text{if } d = D, \\ \max \left\{ {\Delta \cdot D1_{D} \left( {d^{\prime}} \right) \mid d^{\prime} \in \text{children of } d} \right\} & \text{if } d \ne D, \end{cases}$$

(5)

where \(\Delta\) represents the semantic contribution factor and \(D1_{D} \left( d \right)\) is the contribution of disease \(d\) to the semantic value of disease \(D\). The contribution of \(D\) to itself is 1, and the contribution of its ancestor nodes decreases with increasing distance from \(D\). When two diseases share a larger part of their DAGs, they obtain a greater similarity score. \(SV1\left( {d_{i} } \right)\) and \(SV1\left( {d_{j} } \right)\) represent the semantic values of \(d_{i}\) and \(d_{j}\), respectively. Thus, the semantic similarity score of two diseases \(d_{i}\) and \(d_{j}\) can be calculated as follows:

$$S_{d} \left( {d_{i} ,d_{j} } \right) = \frac{{\sum\nolimits_{{t \in T\left( {d_{i} } \right) \cap T\left( {d_{j} } \right)}} {\left( {D1_{{d_{i} }} \left( t \right) + D1_{{d_{j} }} \left( t \right)} \right)} }}{{SV1\left( {d_{i} } \right) + SV1\left( {d_{j} } \right)}}.$$

(6)
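
As an illustration, Eqs. (4)–(6) can be sketched in Python on a toy DAG. The term names, the `parents` dictionary layout, and the value \(\Delta = 0.5\) are assumptions made for this example only; the text above does not fix them.

```python
def semantic_contributions(disease, parents, delta=0.5):
    """Contribution of `disease` and each of its ancestors, as in Eq. (5).

    `parents` maps a term to the list of its direct parents (hypothetical layout).
    """
    contrib = {disease: 1.0}            # a disease contributes 1 to itself
    frontier = [disease]
    while frontier:
        next_frontier = []
        for node in frontier:
            for parent in parents.get(node, []):
                value = delta * contrib[node]
                # keep the maximum over all children of `parent`
                if value > contrib.get(parent, 0.0):
                    contrib[parent] = value
                    next_frontier.append(parent)
        frontier = next_frontier
    return contrib


def semantic_similarity(d_i, d_j, parents, delta=0.5):
    """Semantic similarity of two diseases, as in Eq. (6)."""
    c_i = semantic_contributions(d_i, parents, delta)
    c_j = semantic_contributions(d_j, parents, delta)
    shared = set(c_i) & set(c_j)        # T(d_i) intersected with T(d_j)
    numerator = sum(c_i[t] + c_j[t] for t in shared)
    # the denominator is the sum of the two semantic values, Eq. (4)
    return numerator / (sum(c_i.values()) + sum(c_j.values()))


# Toy DAG (assumed): A is the root, B and C are children of A, D is a child of B.
toy_parents = {"B": ["A"], "C": ["A"], "D": ["B"]}
similarity_bc = semantic_similarity("B", "C", toy_parents)
```

In this toy DAG, B and C share only the root A, so their similarity is \(1/3\), whereas a disease compared with itself scores 1.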

### Gaussian interaction profile kernel for miRNAs and diseases

Following previous work [38], the Gaussian interaction profile (GIP) kernel computes the similarity of miRNAs and of diseases from the topological structure of the known miRNA–disease association network [26]. Given two miRNAs \(m_{i}\) and \(m_{j}\) and two diseases \(d_{i}\) and \(d_{j}\), the network similarity between them can be calculated with the following formulas:

$$GIP_{miRNA} \left( {m_{i} ,m_{j} } \right) = \exp \left( { - \gamma \left\| {{\mathbf{Y}}\left( {m_{i} } \right) - {\mathbf{Y}}\left( {m_{j} } \right)} \right\|^{2} } \right),$$

(7)

$$GIP_{disease} \left( {d_{i} ,d_{j} } \right) = \exp \left( { - \gamma \left\| {{\mathbf{Y}}\left( {d_{i} } \right) - {\mathbf{Y}}\left( {d_{j} } \right)} \right\|^{2} } \right),$$

(8)

where \(\gamma\) is an adjustable parameter that controls the bandwidth of the kernel. \({\mathbf{Y}}\left( {m_{i} } \right)\) and \({\mathbf{Y}}\left( {m_{j} } \right)\) are the interaction profiles of miRNAs \(m_{i}\) and \(m_{j}\), respectively. Similarly, \({\mathbf{Y}}\left( {d_{i} } \right)\) and \({\mathbf{Y}}\left( {d_{j} } \right)\) are the interaction profiles of diseases \(d_{i}\) and \(d_{j}\), respectively. Then, the network similarity matrices \({\mathbf{K}}_{m}\) of miRNAs and \({\mathbf{K}}_{d}\) of diseases are obtained by combining the original matrices \({\mathbf{S}}_{m}\) and \({\mathbf{S}}_{d}\) with the corresponding GIP kernel similarities, as follows:

$${\mathbf{K}}_{m} = \alpha {\mathbf{S}}_{m} + \left( {1 - \alpha } \right){\mathbf{GIP}}_{miRNA} ,$$

(9)

$${\mathbf{K}}_{d} = \alpha {\mathbf{S}}_{d} + \left( {1 - \alpha } \right){\mathbf{GIP}}_{disease} ,$$

(10)

where \(\alpha\) is an adjustable parameter in the range [0, 1]. \({\mathbf{K}}_{m}\) is the integrated miRNA similarity matrix, a linear combination of the miRNA Gaussian interaction profile kernel similarity \({\mathbf{GIP}}_{miRNA}\) and the miRNA functional similarity matrix \({\mathbf{S}}_{m}\). Likewise, \({\mathbf{K}}_{d}\) is the integrated disease similarity matrix, a linear combination of the disease Gaussian interaction profile kernel similarity \({\mathbf{GIP}}_{disease}\) and the disease semantic similarity matrix \({\mathbf{S}}_{d}\). BGCMF achieves the highest AUC value when \(\alpha\) is equal to 0.5. The sensitivity analysis of \(\alpha\) is shown in Fig. 5.
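
Eqs. (7)–(10) can be sketched with NumPy as follows. The sketch assumes that rows of \({\mathbf{Y}}\) correspond to miRNAs and columns to diseases, and it uses \(\gamma = 1\) purely as a placeholder, since no value of \(\gamma\) is given here. The function names are illustrative.

```python
import numpy as np


def gip_kernel(profiles, gamma=1.0):
    """GIP kernel between the rows of `profiles`, as in Eqs. (7)-(8)."""
    sq_norms = np.sum(profiles ** 2, axis=1)
    # pairwise squared Euclidean distances between interaction profiles
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)   # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)


def integrate_similarity(S, gip, alpha=0.5):
    """Linear combination of a prior similarity and a GIP kernel, Eqs. (9)-(10)."""
    return alpha * S + (1.0 - alpha) * gip


# Toy association matrix (assumed): 3 miRNAs x 3 diseases.
Y_toy = np.array([[1.0, 0.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
gip_mirna = gip_kernel(Y_toy)        # rows of Y are miRNA profiles
gip_disease = gip_kernel(Y_toy.T)    # columns of Y are disease profiles
# An identity matrix stands in for S_m, which would normally come from MISIM.
K_m_toy = integrate_similarity(np.eye(3), gip_mirna, alpha=0.5)
```

The first two miRNAs have identical profiles, so their GIP similarity is 1 regardless of \(\gamma\).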

### Bipartite graph method

Based on the assumption that similar miRNAs interact with similar diseases, the interaction profile of a new miRNA candidate can be inferred from the known interactions of its neighbours, that is, the miRNAs most similar to it. Therefore, we introduce the nearest neighbour profile (NP) into our method [39]. The profiles of a new miRNA \(m_{i}\) and a new disease \(d_{i}\) are calculated as follows:

$${\mathbf{N}}_{m} \left( {m_{i} } \right) = {\mathbf{K}}_{m} \left( {m_{i} ,m_{nearest} } \right) \times {\mathbf{Y}}\left( {m_{nearest} } \right),$$

(11)

$${\mathbf{N}}_{d} \left( {d_{i} } \right) = {\mathbf{K}}_{d} \left( {d_{i} ,d_{nearest} } \right) \times {\mathbf{Y}}\left( {d_{nearest} } \right),$$

(12)

where \(m_{nearest}\) and \(d_{nearest}\) are the miRNA most similar to \(m_{i}\) and the disease most similar to \(d_{i}\), respectively. \({\mathbf{N}}_{m} \left( {m_{i} } \right)\) and \({\mathbf{N}}_{d} \left( {d_{i} } \right)\) are the association profiles of the miRNA and the disease, respectively. The NP process in this method can be divided into four steps. First, remove the self-similarity from the similarity matrices \({\mathbf{K}}_{m}\) and \({\mathbf{K}}_{d}\). Next, find the nearest neighbour of each miRNA and each disease. Then, ignore all other miRNA similarities and disease similarities. Finally, obtain the miRNA nearest neighbour matrix \({\mathbf{N}}_{m}\) and the disease nearest neighbour matrix \({\mathbf{N}}_{d}\).
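
The four NP steps can be sketched as below. This is one possible reading, in which the nearest neighbour matrix keeps a single nonzero similarity per row, so that multiplying it by the association profiles reproduces Eqs. (11) and (12). The function name and the toy values are assumptions.

```python
import numpy as np


def nearest_neighbour_matrix(K):
    """Keep, for each row of a similarity matrix, only its nearest neighbour."""
    K = np.array(K, dtype=float)                 # work on a copy
    np.fill_diagonal(K, -np.inf)                 # step 1: remove self-similarity
    nearest = np.argmax(K, axis=1)               # step 2: nearest neighbour per row
    N = np.zeros_like(K)                         # step 3: ignore all other similarities
    rows = np.arange(K.shape[0])
    N[rows, nearest] = K[rows, nearest]          # step 4: nearest neighbour matrix
    return N


# Toy integrated miRNA similarity and association matrix (assumed values).
K_m_toy = np.array([[1.0, 0.8, 0.2],
                    [0.8, 1.0, 0.5],
                    [0.2, 0.5, 1.0]])
Y_toy = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])
N_m_toy = nearest_neighbour_matrix(K_m_toy)
# Row i equals K_m(m_i, m_nearest) * Y(m_nearest), as in Eq. (11).
np_profiles_mirna = N_m_toy @ Y_toy
```

For diseases, the same function would be applied to \({\mathbf{K}}_{d}\) and the result multiplied by \({\mathbf{Y}}^{T}\), under the assumption that columns of \({\mathbf{Y}}\) index diseases.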

### Weighted profile

The weighted profile (WP) was proposed as a simple predictive model in [39]. The idea of the weighted profile is to take a similarity-weighted average of the profiles of all other miRNAs or diseases to obtain the prediction matrix. For instance, the WPs for a new miRNA \(m_{i}\) and a new disease \(d_{i}\) are computed as:

$$\widehat{{\mathbf{Y}}}\left( {m_{i} } \right) = \frac{{\sum\nolimits_{j = 1}^{{n_{m} }} {{\mathbf{N}}_{m} \left( {m_{i} ,m_{j} } \right) \times {\mathbf{Y}}\left( {m_{j} } \right)} }}{{\sum\nolimits_{j = 1}^{{n_{m} }} {{\mathbf{N}}_{m} \left( {m_{i} ,m_{j} } \right)} }},$$

(13)

$$\widehat{{\mathbf{Y}}}\left( {d_{i} } \right) = \frac{{\sum\nolimits_{j = 1}^{{n_{d} }} {{\mathbf{N}}_{d} \left( {d_{i} ,d_{j} } \right) \times {\mathbf{Y}}\left( {d_{j} } \right)} }}{{\sum\nolimits_{j = 1}^{{n_{d} }} {{\mathbf{N}}_{d} \left( {d_{i} ,d_{j} } \right)} }},$$

(14)

where \({\mathbf{N}}_{m}\) and \({\mathbf{N}}_{d}\) are the nearest neighbour matrices constructed for miRNAs and diseases, and \({\mathbf{Y}}\left( {m_{j} } \right)\) and \({\mathbf{Y}}\left( {d_{j} } \right)\) are the association profiles of miRNA \(m_{j}\) and disease \(d_{j}\), respectively. The BG algorithm first obtains the neighbour information of miRNAs and diseases, and the predictions from the miRNA side and the disease side are then averaged to obtain the BG prediction matrix \({\mathbf{Y}}_{1}\):

$${\mathbf{Y}}_{1} = \frac{{\widehat{{\mathbf{Y}}}\left( {m_{i} } \right) + \widehat{{\mathbf{Y}}}\left( {d_{j} } \right)}}{2}.$$

(15)
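
A minimal NumPy sketch of Eqs. (13)–(15) follows. It assumes rows of \({\mathbf{Y}}\) are miRNAs and columns are diseases, and it adds a small `eps` guard against rows of \({\mathbf{N}}\) that sum to zero; that guard is not part of the formulas above.

```python
import numpy as np


def weighted_profile(N, profiles, eps=1e-12):
    """Similarity-weighted average of `profiles`, as in Eqs. (13)-(14)."""
    row_sums = N.sum(axis=1, keepdims=True)
    # eps avoids division by zero for rows without any neighbour (assumption)
    return (N @ profiles) / np.maximum(row_sums, eps)


def bg_prediction(N_m, N_d, Y):
    """Average of the miRNA-side and disease-side predictions, Eq. (15)."""
    Y_from_mirnas = weighted_profile(N_m, Y)        # shape: miRNAs x diseases
    Y_from_diseases = weighted_profile(N_d, Y.T).T  # transposed back to miRNAs x diseases
    return 0.5 * (Y_from_mirnas + Y_from_diseases)


# Toy inputs (assumed): 2 miRNAs, 2 diseases, one known association.
N_m_toy = np.array([[0.0, 1.0], [1.0, 0.0]])
N_d_toy = np.array([[0.0, 0.5], [0.5, 0.0]])
Y_toy = np.array([[1.0, 0.0], [0.0, 0.0]])
Y1_toy = bg_prediction(N_m_toy, N_d_toy, Y_toy)
```

In this toy case the single known association is propagated to the neighbouring miRNA and the neighbouring disease, each receiving a score of 0.5 after averaging.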

### BGCMF for miRNA-disease association prediction

The traditional collaborative matrix factorization (CMF) method is effective in predicting the underlying interactions between miRNAs and diseases [29]. The objective function of CMF method is defined as:

$$\min_{{{\mathbf{A}},{\mathbf{B}}}} \left\| {{\mathbf{Y}} - {\mathbf{AB}}^{T} } \right\|_{F}^{2} + \lambda_{l} \left( {\left\| {\mathbf{A}} \right\|_{F}^{2} + \left\| {\mathbf{B}} \right\|_{F}^{2} } \right) + \lambda_{d} \left\| {{\mathbf{S}}_{m} - {\mathbf{AA}}^{T} } \right\|_{F}^{2} + \lambda_{t} \left\| {{\mathbf{S}}_{d} - {\mathbf{BB}}^{T} } \right\|_{F}^{2} ,$$

(16)

where \(\lambda_{l}\), \(\lambda_{d}\), and \(\lambda_{t}\) are non-negative parameters and \(\left\| \cdot \right\|_{F}\) represents the Frobenius norm. In this formula, the first term seeks the low-rank matrices \({\mathbf{A}}\) and \({\mathbf{B}}\) that reconstruct \({\mathbf{Y}}\). The second term is the Tikhonov regularization term. The last two terms are regularization terms that require the latent feature vectors of similar miRNAs/diseases to be similar and those of dissimilar miRNAs/diseases to be dissimilar. However, traditional CMF does not take into account the network relationship between miRNAs and diseases, which reduces the accuracy of predicting miRNA–disease associations (MDAs). Therefore, we introduce the integrated similarity matrices \({\mathbf{K}}_{m}\) of miRNAs and \({\mathbf{K}}_{d}\) of diseases, which incorporate the Gaussian interaction profile kernel, into CMF [40]. The objective function can be rewritten as:

$$\min_{{{\mathbf{A}},{\mathbf{B}}}} \left\| {{\mathbf{Y}} - {\mathbf{AB}}^{T} } \right\|_{F}^{2} + \lambda_{l} \left( {\left\| {\mathbf{A}} \right\|_{F}^{2} + \left\| {\mathbf{B}} \right\|_{F}^{2} } \right) + \lambda_{d} \left\| {{\mathbf{K}}_{m} - {\mathbf{AA}}^{T} } \right\|_{F}^{2} + \lambda_{t} \left\| {{\mathbf{K}}_{d} - {\mathbf{BB}}^{T} } \right\|_{F}^{2} ,$$

(17)

where \(\left\| \cdot \right\|_{F}\) is the Frobenius norm and \(\lambda_{l}\), \(\lambda_{d}\) and \(\lambda_{t}\) are positive parameters. In this study, the three parameters are set by cross-validation. A grid search is adopted to select the optimal parameters among these values: \(\lambda_{l} \in \left\{ {2^{ - 2} ,2^{ - 1} ,2^{0} ,2^{1} ,2^{2} } \right\}\), \(\lambda_{d} /\lambda_{t} \in \left\{ {2^{ - 6} ,2^{ - 5} ,2^{ - 4} ,2^{ - 3} ,2^{ - 2} ,2^{ - 1} ,2^{0} ,2^{1} ,2^{2} } \right\}\). The association matrix \({\mathbf{Y}}\) is decomposed into two low-rank matrices \({\mathbf{A}}\) and \({\mathbf{B}}\), where \({\mathbf{Y}} \approx {\mathbf{AB}}^{T}\). Tikhonov regularization is adopted to minimize the norms of both \({\mathbf{A}}\) and \({\mathbf{B}}\). The roles of the third and fourth terms are to minimize the squared errors of the approximations \({\mathbf{K}}_{m} \approx {\mathbf{AA}}^{T}\) and \({\mathbf{K}}_{d} \approx {\mathbf{BB}}^{T}\), respectively.
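
To make the four terms of Eq. (17) concrete, the objective can be evaluated with a few lines of NumPy. The function and variable names are illustrative, and the two lists only enumerate the candidate values quoted above; whether \(\lambda_{d}\) and \(\lambda_{t}\) are searched jointly or independently is not specified here.

```python
import numpy as np


def cmf_objective(Y, A, B, K_m, K_d, lam_l, lam_d, lam_t):
    """Value of the objective in Eq. (17) for given factors A and B."""
    def fro2(M):
        return np.sum(M ** 2)                        # squared Frobenius norm

    reconstruction = fro2(Y - A @ B.T)               # first term
    tikhonov = lam_l * (fro2(A) + fro2(B))           # second term
    mirna_term = lam_d * fro2(K_m - A @ A.T)         # third term
    disease_term = lam_t * fro2(K_d - B @ B.T)       # fourth term
    return reconstruction + tikhonov + mirna_term + disease_term


# Candidate values for the grid search, as listed in the text.
lam_l_grid = [2.0 ** p for p in range(-2, 3)]        # 2^-2 ... 2^2
lam_dt_grid = [2.0 ** p for p in range(-6, 3)]       # 2^-6 ... 2^2
```

With identity matrices for every argument, only the Tikhonov term is nonzero, which gives a quick sanity check of an implementation.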

### Initialization of \({\mathbf{A}}\) and \({\mathbf{B}}\)

In the CMF method, the first step is to initialize \({\mathbf{A}}\) and \({\mathbf{B}}\) from the adjacency matrix \({\mathbf{Y}}\). We use singular value decomposition (SVD) to decompose the input matrix \({\mathbf{Y}} \in {\mathbb{R}}^{n \times m}\) into \({\mathbf{U}} \in {\mathbb{R}}^{n \times k}\), \({\mathbf{S}}_{k} \in {\mathbb{R}}^{k \times k}\) and \({\mathbf{V}} \in {\mathbb{R}}^{m \times k}\). Then, matrix \({\mathbf{A}}\) and matrix \({\mathbf{B}}\) are obtained by the following formula:

$$\left[ {{\mathbf{U,S,V}}} \right] = SVD\left( {{\mathbf{Y}},k} \right),\quad {\mathbf{A}} = {\mathbf{US}}_{k}^{1/2} ,\quad {\mathbf{B}} = {\mathbf{VS}}_{k}^{1/2} ,$$

(18)

where \({\mathbf{S}}_{k}\) is a diagonal matrix containing the \(k\) largest singular values and \(k\) is the number of singular values retained.
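
The initialization in Eq. (18) can be sketched with `numpy.linalg.svd`. The sketch computes a full thin SVD and truncates it to the \(k\) largest singular values, which is an assumption about how \(SVD\left( {{\mathbf{Y}},k} \right)\) is realized; the function name is illustrative.

```python
import numpy as np


def svd_init(Y, k):
    """Initialize A (n x k) and B (m x k) from a rank-k truncated SVD of Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)  # singular values in descending order
    sqrt_s = np.sqrt(s[:k])
    A = U[:, :k] * sqrt_s        # A = U S_k^{1/2}
    B = Vt[:k].T * sqrt_s        # B = V S_k^{1/2}
    return A, B


# Toy rank-1 association matrix (assumed): A @ B.T recovers it exactly for k = 1.
Y_toy = np.outer([1.0, 2.0], [1.0, 0.0, 1.0])
A_toy, B_toy = svd_init(Y_toy, k=1)
```

By construction \({\mathbf{AB}}^{T} = {\mathbf{US}}_{k} {\mathbf{V}}^{T}\), the best rank-\(k\) approximation of \({\mathbf{Y}}\) in the Frobenius norm.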

### Alternating least squares

In this study, alternating least squares (ALS) is used to optimize \({\mathbf{A}}\) and \({\mathbf{B}}\) until convergence. Let \(L\) denote the objective function of BGCMF. Then, the update rules for \({\mathbf{A}}\) and \({\mathbf{B}}\) are obtained by setting \(\partial L/\partial {\mathbf{A}} = 0\) and \(\partial L/\partial {\mathbf{B}} = 0\), respectively. Moreover, the optimal values of \(\lambda_{l}\), \(\lambda_{d}\) and \(\lambda_{{\text{t}}}\) are automatically obtained through a fivefold cross-validation experiment. The iterative formulas for \({\mathbf{A}}\) and \({\mathbf{B}}\) are represented by:

$${\mathbf{A}} = \left( {{\mathbf{YB}} + \lambda_{d} {\mathbf{K}}_{m} {\mathbf{A}}} \right)\left( {{\mathbf{B}}^{T} {\mathbf{B}} + \lambda_{l} {\mathbf{I}}_{k} + \lambda_{d} {\mathbf{A}}^{T} {\mathbf{A}}} \right)^{ - 1} ,$$

(19)

$${\mathbf{B}} = \left( {{\mathbf{Y}}^{T} {\mathbf{A}} + \lambda_{{\text{t}}} {\mathbf{K}}_{d} {\mathbf{B}}} \right)\left( {{\mathbf{A}}^{T} {\mathbf{A}} + \lambda_{l} {\mathbf{I}}_{k} + \lambda_{{\text{t}}} {\mathbf{B}}^{T} {\mathbf{B}}} \right)^{ - 1} .$$

(20)
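
The alternating updates can be sketched as below. The sketch uses the \(k \times k\) Gram matrices \({\mathbf{A}}^{T} {\mathbf{A}}\) and \({\mathbf{B}}^{T} {\mathbf{B}}\) inside the inverses so that the products are dimensionally consistent, and it weights the disease-side terms with \(\lambda_{t}\). The fixed iteration count stands in for a convergence test and, like the function names, is an assumption.

```python
import numpy as np


def als_step(Y, A, B, K_m, K_d, lam_l, lam_d, lam_t):
    """One alternating update of A and then B."""
    I_k = np.eye(A.shape[1])
    # Update A with B fixed; the current A appears on the right-hand side.
    A = (Y @ B + lam_d * K_m @ A) @ np.linalg.inv(
        B.T @ B + lam_l * I_k + lam_d * A.T @ A)
    # Update B with the new A fixed; the current B appears on the right-hand side.
    B = (Y.T @ A + lam_t * K_d @ B) @ np.linalg.inv(
        A.T @ A + lam_l * I_k + lam_t * B.T @ B)
    return A, B


def cmf_als(Y, A, B, K_m, K_d, lam_l, lam_d, lam_t, n_iter=50):
    """Repeat the alternating updates for a fixed number of iterations."""
    for _ in range(n_iter):
        A, B = als_step(Y, A, B, K_m, K_d, lam_l, lam_d, lam_t)
    return A, B
```

Because \(\lambda_{l} > 0\), each matrix being inverted is positive definite, so the inverse exists. With \(\lambda_{d} = \lambda_{t} = 0\) each update reduces to ordinary ridge regression, which offers a simple check that the reconstruction objective does not increase.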

Finally, the prediction matrix \({\mathbf{Y}}_{{{\text{predict}}}}\) is obtained by combining the prediction \({\mathbf{Y}}_{1}\) of the BG algorithm with the prediction \({\mathbf{Y}}_{2}\) of the optimized CMF model.