 Research article
 Open Access
ANMDA: anti-noise based computational model for predicting potential miRNA-disease associations
BMC Bioinformatics volume 22, Article number: 358 (2021)
Abstract
Background
A growing body of research has shown that microRNAs (miRNAs) regulate the function of target genes and are closely related to various diseases. Developing computational methods to uncover more potential miRNA-disease associations can provide clues for further functional research.
Results
Inspired by previous work, we observe that noise hidden in the data can degrade prediction performance, and we propose an anti-noise algorithm (ANMDA) to predict potential miRNA-disease associations. First, we calculate the similarity among miRNAs and among diseases to construct features, and we obtain positive samples from the Human MicroRNA Disease Database version 2.0 (HMDD v2.0). Then, we apply k-means to the undetected miRNA-disease associations and sample negative examples equally from the k clusters. Next, we construct several data subsets by sampling with replacement and feed them to the light gradient boosting machine (LightGBM) method. Finally, a voting method is applied to predict potential miRNA-disease relationships. ANMDA achieves an area under the receiver operating characteristic curve (AUROC) of 0.9373 ± 0.0005 in five-fold cross-validation, which is superior to several published methods. In addition, in the case study we analyze the miRNA-disease associations predicted with high probability and compare them with the data in HMDD v3.0. The results show that ANMDA is a novel and practical algorithm that can be used to infer potential miRNA-disease associations.
Conclusion
The results indicate that the noise hidden in the data has an obvious impact on predicting potential miRNA-disease associations. We believe ANMDA can achieve better results on this task as more methods are used to deal with the data noise.
Background
MicroRNAs (miRNAs) are a class of endogenous, small, single-stranded non-coding RNAs (ncRNAs) that can specifically bind to the 3'UTR (3' untranslated region) of target mRNAs [1]. Research shows that miRNAs are involved in many cellular activities, including cell proliferation, apoptosis, and stem cell differentiation [2, 3]. It is reported that 48,860 different mature miRNA sequences have been found in 271 organisms, of which 2654 mature miRNA sequences come from humans [4].
miRNA-related malfunctions are associated with various types of human diseases, including tumors, neurodegeneration, and diabetic cardiomyopathy [5,6,7]. Therefore, uncovering miRNA-disease associations can provide valuable clues for disease diagnosis at an early stage [8]. Based on the hypothesis that miRNAs with similar functions tend to be related to similar diseases [9], much effort has been devoted in recent years to developing computational methods for miRNA-disease association prediction [10].
In general, four main types of methods have been proposed to predict potential miRNA-disease associations.
The first type consists of score function-based algorithms. Jiang et al. [11] integrated a miRNA functional interaction network and a disease similarity network and then implemented a scoring method to predict the associations. Chen et al. [12] proposed a model that calculates within-scores and between-scores for miRNA-disease association probabilities (WBSMDA) by integrating miRNA functional similarity, disease semantic similarity, and Gaussian kernel functions. The challenge for these methods is to use more effective features and to design a reasonable score function.
The second type consists of network-based algorithms. Shi et al. [13] connected miRNAs and diseases through a gene function network and applied the random walk algorithm for the final prediction. You et al. [14] constructed a heterogeneous graph with many paths by using weighted matrices and designed a path-based algorithm for prediction (PBMDA). Qu et al. [15] built a reliable heterogeneous network and used KATZ to predict miRNA-disease associations (KATZMDA). The challenge for these methods is to integrate different data sources into reliable networks and to analyze the network function.
The third type is mainly based on machine learning algorithms. Chen et al. [16] proposed a ranking-based k-nearest neighbor method for miRNA-disease association prediction (RKNNMDA). RKNNMDA searched for miRNAs and diseases by k-nearest neighbors and re-ranked them with a support vector machine (SVM). Ha et al. [17] used a matrix completion approach to predict miRNA-disease associations (PMAMCA). Zhu et al. [18] used biased heat conduction (BHCMDA) to pay more attention to unpopular nodes and improve the final results. Recently, ensemble learning methods have been applied to this problem with considerable success. For instance, Zhao et al. [19] adopted the adaptive boosting algorithm for prediction (ABMDA). By adapting the weighting coefficients of residual samples, the algorithm relearned the residual samples and obtained better results. Zhou et al. [20] combined gradient boosting decision trees with logistic regression (GBDT-LR) to predict potential pairs. Yao et al. [21] used a random forest to select 100 important features and predicted miRNA-disease associations based on the selected features (IRFMDA100). Peng et al. [22] approached association inference with ensemble learning and kernel ridge regression (EKRRMDA). However, the training cost of ensemble learning methods is often high.
The last type consists of deep learning-based methods. Because convolutional neural networks (CNNs) can effectively capture latent information between features, Peng et al. [23] used autoencoders for dimensionality reduction and then applied a CNN to predict miRNA-disease associations (MDACNN). To extract dense and high-dimensional representations of diseases and miRNAs, Ji et al. [24] used a deep autoencoder framework (AEMDA). Further, to use the information of all miRNA-disease pairs during the pre-training process, Chen et al. [25] adopted a deep-belief network (DBNMDA) to predict the associations. Li et al. [26] applied fully connected graph convolutional networks, which combine graph-related techniques with CNNs, to rank the potential pairs (FCGCNMDA). However, deep learning may be better suited to larger datasets.
Although much progress has been made in this field, the noise hidden in the data remains a largely unaddressed problem. Because some researchers [19,20,21, 23, 25, 26] regard undetected miRNA-disease pairs as negative samples and randomly choose some of them to feed into their algorithms, the algorithms may be influenced by unreliable negative samples.
This paper proposes a novel anti-noise algorithm to predict potential miRNA-disease associations (ANMDA). In this method, we first analyze the interference caused by the noise, then use the k-means algorithm to pick negative samples, subsample the data to smooth the noise, and finally apply the Light Gradient Boosting Machine (LightGBM) to make predictions.
The main contributions are as follows: (1) We examine the noise hidden in the data from a new perspective. (2) We subsample the data to smooth the noise and reduce its influence. (3) We apply an effective algorithm (LightGBM) to further deal with the noise. The results demonstrate that ANMDA outperforms several published methods.
Results
Experiment design
To validate the performance of ANMDA, we design different experiments to demonstrate the effect of subsampling for noise smoothing and the superiority of LightGBM. In our study, all experiments are run with five-fold cross-validation repeated 100 times, and the evaluation metrics are the same as in other works: the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPR), precision, recall, and F1-score.
Performance evaluation on ANMDA
We evaluate the performance of ANMDA and compare its results with those of 6 other published methods: WBSMDA, BHCMDA, EKRRMDA, MDACNN, FCGCNMDA, and DBNMDA. The main characteristics of each method are shown in Table 1. WBSMDA is a classic method. BHCMDA and EKRRMDA are recently published machine learning methods, and EKRRMDA, as an ensemble learning method, is the more comparable to ANMDA. The deep learning-based models MDACNN, FCGCNMDA, and DBNMDA are also included.
The AUROCs of ANMDA and the other 6 published methods are shown in Fig. 1. As can be seen, ANMDA achieves the best performance among these methods. Moreover, the standard deviation of ANMDA is 0.0005, which means that ANMDA is more stable than other methods such as WBSMDA (0.0009) and DBNMDA (0.0026).
To further show the performance of ANMDA, we reproduce ABMDA, GBDT-LR, and IRFMDA100 for comparison, because they use similar feature construction and data construction and all of them are ensemble learning algorithms. To make the comparison fair and convincing, we test these methods on the same data. The results are shown in Fig. 2. The ROC curve and the precision-recall curve show that ANMDA outperforms ABMDA, GBDT-LR, and IRFMDA100, with higher AUROC and AUPR and a lower standard deviation. Table 2 shows the performance of the different methods over 100 repetitions of five-fold cross-validation.
Effect of subsampling for noise smoothing
To evaluate the influence of subsampling for noise smoothing, we compare the results obtained with and without it. The results are shown in Fig. 3.
Noisy_KNN and Noisy_MLP denote applying k-Nearest Neighbor (kNN) and Multilayer Perceptron (MLP) directly to the data, respectively. Smooth_Noisy_KNN and Smooth_Noisy_MLP denote applying kNN and MLP to the data with subsampling for noise smoothing, respectively.
The results demonstrate that the performance of both algorithms improves after subsampling for noise smoothing. Specifically, averaged over kNN and MLP, the AUROC increases by 2.35% and the AUPR increases by 3.75%.
The superiority of LightGBM in noise resistance
To reveal the noise resistance ability of each algorithm, we compare the performance of the methods (LightGBM, kNN, and MLP) on the dataset. The results are shown in Fig. 4.
Noisy_KNN, Noisy_MLP, and Noisy_LGB denote applying kNN, MLP, and LightGBM directly to the data, respectively. LightGBM performs better than the other two algorithms, which suggests that it is more robust to the noise in the data.
Case study
Further, we use ANMDA to predict undetected miRNA-disease pairs that are not recorded in the Human MicroRNA Disease Database version 2.0 (HMDD v2.0). Then, we verify the results against HMDD v3.0, which records more newly discovered miRNA-disease associations. The top 200 miRNA-disease associations predicted by ANMDA are shown in Additional file 1.
Two kinds of case studies are carried out to assess the prediction ability of ANMDA. In the first, we rank all of the undetected pairs and then verify the top 50 associations predicted by ANMDA against HMDD v3.0. The results are shown in Additional file 2: Table 1. In the second, we apply ANMDA to predict miRNAs associated with prostate neoplasm, gastric neoplasm, colorectal carcinoma, melanoma, and hepatocellular carcinoma. For each disease, the top 10 predicted miRNA-disease associations are selected based on their probabilities. The results are shown in Additional file 2: Table 2.
In conclusion, the case studies indicate that ANMDA can predict potential miRNA-disease associations with high accuracy.
Discussion
In this work, we systematically analyze the noise hidden in the data and propose a novel and practical algorithm, ANMDA, to handle it. The main reasons for its effectiveness are as follows: (1) By subsampling for noise smoothing, we extract several subsets from the data. In this way, the noise is spread across the subsets, which reduces the interference that aggregated noise causes when the algorithm judges positive samples. Averaging the prediction results of the subsets further decreases the influence of the noise. (2) The residual is mainly caused by the noise hidden in the data, and LightGBM, which is based on GBDT, fits the residual in each iteration and thereby improves the final prediction.
However, ANMDA also has some limitations. First, the high computational cost of training is an important problem. For instance, it takes about 300 min to finish 100 repetitions of five-fold cross-validation on an Intel Xeon E3-1231 CPU, with 1.5 GB of memory usage. In addition, the current sampling method for discovering reliable negative samples is a common one, so there is still room for improvement.
Conclusion
This paper proposes a novel method (ANMDA) to predict potential miRNA-disease associations. The experimental results confirm that ANMDA achieves better results than other published methods. In the case study, several miRNA-disease associations predicted by ANMDA are supported by HMDD v3.0. Therefore, ANMDA is effective and can provide a reference for researchers. In follow-up work, we plan to use feature selection to accelerate the training process and to look for more reliable negative samples. Further, biological experiments can also be conducted to verify the prediction results of ANMDA.
Methods
The framework of ANMDA is shown in Fig. 5.
First, the features are constructed from the miRNA functional similarity, the disease semantic similarity, and Gaussian kernel functions. Second, we visualize the noise to reveal its effect on the data. Based on HMDD v2.0, we construct positive samples and apply k-means to the undetected pairs to select negative samples. Then, we subsample the data to smooth the noise. Finally, each subset is fed to LightGBM, and a voting rule is used to decide the final prediction.
miRNA-disease associations
HMDD records experimentally supported human miRNA and disease associations. The current version of HMDD is 3.0. Because most researchers [12,13,14,15,16,17,18,19,20,21,22, 25, 26] choose HMDD v2.0 to test their methods, we also use it to validate ANMDA. In total, we obtained 5430 experimentally verified associations involving 495 miRNAs and 383 diseases [27].
Feature construction
We construct the features by integrating miRNA functional similarity, disease semantic similarity, and Gaussian kernel functions, in a way similar to several other methods [14, 16, 18,19,20,21,22, 24,25,26].
Disease semantic similarity
Based on the idea that "functionally similar miRNAs may be associated with similar diseases, and vice versa" [28], we calculate the semantic similarity of two diseases according to how much they share in common [29].
First, according to the MeSH (Medical Subject Headings) tree structure, the relationships between diseases can be represented as a layered directed acyclic graph (DAG). Each vertex is composed of the tree numbers and the heading of one disease. A directed edge in the DAG represents the relationship between two diseases. Diseases with a more general heading (such as neoplasm) sit at an upper layer of the DAG and are called ancestor nodes. Vertices at a lower layer of the DAG, called child nodes, correspond to diseases with a more specific definition. Given a disease d_{i}, its DAG is defined as follows:
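The displayed formula does not appear in this text. Assuming it takes the usual triple form, with P and S as defined in the sentence that follows, Eq. (1) would read:

```latex
\mathrm{DAG}(d_i) = \bigl(d_i,\; P(d_i),\; S(d_i)\bigr) \tag{1}
```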
where P(d_{i}) represents the set of vertices in the DAG and S(d_{i}) represents the set of edges in the DAG.
Therefore, the similarity based on the semantic value between two diseases can be measured according to their positions in the DAG. The more information two diseases share in common, the more similar they are. To be specific, the semantic similarity between disease d_{i} and disease d_{j} can be calculated as follows:
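A form of Eq. (2) consistent with the symbols defined immediately afterwards (D, V, and the vertices shared by the two DAGs) is sketched here as an assumption rather than a verbatim reproduction:

```latex
SS(d_i, d_j) =
\frac{\sum_{d \in P(d_i) \cap P(d_j)} \bigl( D_{d_i}(d) + D_{d_j}(d) \bigr)}
     {V(d_i) + V(d_j)} \tag{2}
```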
Here, D_{di}(d) is the semantic contribution of disease d to disease d_{i}, where d ranges over the vertices shared by the DAGs of disease d_{i} and disease d_{j}, and V(d_{i}) is the semantic value of disease d_{i}.
To calculate D_{di}(d), we assume that diseases at different layers of the DAG contribute differently to the semantic value of disease d_{i} [38]. We therefore introduce a semantic contribution factor: the contribution of d_{i} to itself is defined as 1, and a disease located at an upper layer of the DAG contributes less to the semantic value of disease d_{i}. The contribution of disease d to the semantic value of disease d_{i} can then be calculated by the formula:
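The recursive definition commonly used for this layer-based contribution can be written as follows. The symbol Δ for the semantic contribution factor is an assumption introduced here, and its numerical value is not stated in this text:

```latex
D_{d_i}(d) =
\begin{cases}
1, & d = d_i \\[4pt]
\max\bigl\{ \Delta \cdot D_{d_i}(d') \;\big|\; d' \in \text{children of } d \bigr\}, & d \neq d_i
\end{cases} \tag{3}
```

With 0 < Δ < 1, each step up the DAG from d_{i} multiplies the contribution by Δ, so more general ancestors contribute less.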
In addition, to avoid assigning the same semantic value to two diseases that lie in the same layer but occur in different numbers of DAGs, a second definition of the contribution of disease d to the semantic value of disease d_{i} is used:
In the formula, N_{d} is the number of DAGs that contain disease d, and N is the total number of diseases. Based on the contribution of each disease d in the DAG to disease d_{i}, the semantic value V(d_{i}) of disease d_{i} can be calculated by the formula:
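Using the counts N_{d} and N described in the next sentence, an information-content form of this second contribution would be (a reconstruction under that assumption):

```latex
D_{d_i}(d) = -\log\!\left( \frac{N_d}{N} \right) \tag{4}
```

A disease that appears in few DAGs (small N_{d}) is more specific and therefore receives a larger contribution.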
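Summing the contributions over all vertices of the DAG of d_{i} gives the semantic value. Written out with the vertex set P(d_{i}) from Eq. (1), and assuming a plain sum:

```latex
V(d_i) = \sum_{d \in P(d_i)} D_{d_i}(d) \tag{5}
```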
As shown in Eqs. (3) and (4), there are two ways to calculate D_{di}(d). Thus, two semantic similarities (SS_{1} and SS_{2}) are calculated according to Eq. (2). Here, the final semantic similarity is calculated as follows:
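The text does not say how SS_{1} and SS_{2} are combined. A simple choice, assumed here for illustration, is their arithmetic mean:

```latex
SS(d_i, d_j) = \frac{SS_1(d_i, d_j) + SS_2(d_i, d_j)}{2} \tag{6}
```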
miRNA functional similarity
Previous studies combine disease phenotype similarity, semantic similarity, and the miRNA-disease network to calculate miRNA functional similarity [30, 31].
For two miRNAs m_{i} and m_{j}: (1) According to the miRNA-disease network, let MD_{i} = {md_{1}, md_{2}, …, md_{ni}} be the set of all diseases associated with m_{i}, and MD_{j} = {md_{1}, md_{2}, …, md_{nj}} the set of all diseases associated with m_{j}. (2) We calculate the semantic value of each disease in MD_{i} and MD_{j}. (3) Finally, the functional similarity of m_{i} and m_{j} is calculated as follows:
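A best-match-average form that agrees with the quantities n_{i}, n_{j}, and S(md, MD) explained in the next sentence is given here as a reconstruction. The summation indices k and l are introduced for illustration:

```latex
FSM(m_i, m_j) =
\frac{\sum_{k=1}^{n_i} S(md_k, MD_j) + \sum_{l=1}^{n_j} S(md_l, MD_i)}
     {n_i + n_j},
\qquad
S(md, MD) = \max_{md' \in MD} SS(md, md') \tag{7}
```

In the first sum md_k runs over MD_{i}, and in the second md_l runs over MD_{j}.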
Here, n_{i} is the number of diseases associated with m_{i}, n_{j} is the number of diseases associated with m_{j}, and S(md, MD) is the maximum semantic similarity between the disease md and any disease in the other set MD.
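The best-match averaging just described can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a precomputed disease-by-disease semantic similarity matrix. The function name, argument names, and the toy matrix are hypothetical and do not come from the ANMDA code.

```python
import numpy as np

def functional_similarity(ss, diseases_i, diseases_j):
    """Functional similarity of two miRNAs from disease semantic similarities.

    ss         : (n_diseases, n_diseases) semantic similarity matrix
    diseases_i : indices of the diseases associated with miRNA m_i (MD_i)
    diseases_j : indices of the diseases associated with miRNA m_j (MD_j)
    """
    # Pairwise similarities between MD_i (rows) and MD_j (columns).
    sub = ss[np.ix_(diseases_i, diseases_j)]
    best_for_i = sub.max(axis=1)  # S(md, MD_j) for every md in MD_i
    best_for_j = sub.max(axis=0)  # S(md, MD_i) for every md in MD_j
    return (best_for_i.sum() + best_for_j.sum()) / (len(diseases_i) + len(diseases_j))

# Toy similarity matrix for three diseases (made-up values).
ss = np.array([[1.0, 0.2, 0.5],
               [0.2, 1.0, 0.1],
               [0.5, 0.1, 1.0]])

fsm = functional_similarity(ss, [0, 1], [2])          # m_i ~ {d0, d1}, m_j ~ {d2}
fsm_self = functional_similarity(ss, [0, 1], [0, 1])  # identical disease sets
```

Two miRNAs with identical disease sets obtain a similarity of 1, because every disease finds itself as its best match.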
Disease and miRNA similarity
As mentioned above, the Gaussian interaction kernel function is used for computing the disease and miRNA similarity [32].
In the miRNA-disease association network, the binary interaction profile vector IP(x_{i}) represents the interaction information of a disease or a miRNA. Therefore, the Gaussian interaction profile kernel similarity for diseases or miRNAs is defined as follows:
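The standard Gaussian interaction profile kernel from the cited work [32], written with the symbols used in this section, is given here as a reconstruction:

```latex
GS_x(x_i, x_j) = \exp\!\bigl( -\gamma_x \,\lVert IP(x_i) - IP(x_j) \rVert^2 \bigr) \tag{8}
```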
In the formula, x can represent disease d or miRNA m, IP(x_{i}) is the interaction information of disease d_{i} or miRNA m_{i}. IP(x_{j}) is the interaction information of disease d_{j} or miRNA m_{j}.
γ_{x} is a parameter controlling the kernel bandwidth and can be calculated by normalizing γ_{x}’ by the average number of related miRNAs (diseases) per disease (miRNA). The specific formula is as follows:
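For binary profiles the squared norm of IP(x_{i}) equals the number of associations of x_{i}, so the normalization described above can be written as follows. The symbol n_{x}, the number of diseases or miRNAs, is an assumed name:

```latex
\gamma_x = \frac{\gamma_x'}{\dfrac{1}{n_x} \sum_{i=1}^{n_x} \lVert IP(x_i) \rVert^2} \tag{9}
```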
Here, we set γ_{x}’ to a value of 1 based on the previous study [33], so that we can have a better comparison.
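The kernel and its bandwidth normalization can be illustrated with a short NumPy sketch. It assumes a binary association matrix with miRNAs as rows and diseases as columns. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    # Bandwidth: gamma' divided by the mean squared norm of the profiles.
    gamma = gamma_prime / sq_norms.mean()
    # Squared Euclidean distance between every pair of rows.
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative rounding errors
    return np.exp(-gamma * d2)

# Toy association matrix: 3 miRNAs (rows) x 3 diseases (columns).
A = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])

gs_mirna = gip_kernel(A)      # miRNA-miRNA similarity
gs_disease = gip_kernel(A.T)  # disease-disease similarity
```

In this toy case the miRNA profiles have squared norms 2, 1, and 1, so γ = 1 / (4/3) = 0.75, and the first two miRNAs, which differ in exactly one position, obtain a similarity of exp(-0.75).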
Integrated similarity for diseases and miRNAs
To deal with the problem that some diseases have no semantic similarity and some miRNAs have no functional similarity, we adopt the following rule. If SS(d_{i}, d_{j}) (the semantic similarity of diseases d_{i} and d_{j}) exists, the final similarity of these two diseases is the average of their Gaussian interaction profile kernel similarity and their semantic similarity; otherwise, it is only GS_{d}(d_{i}, d_{j}) (the Gaussian interaction profile kernel similarity). In the same way, if FSM(m_{i}, m_{j}) (the functional similarity of miRNAs m_{i} and m_{j}) exists, the final similarity of these two miRNAs is the average of their Gaussian interaction profile kernel similarity and their functional similarity; otherwise, it is only GS_{m}(m_{i}, m_{j}) (the Gaussian interaction profile kernel similarity).
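The rule stated in this paragraph can be written piecewise. The names S_d and S_m for the integrated similarities are assumptions introduced here, since the text does not name them:

```latex
S_d(d_i, d_j) =
\begin{cases}
\dfrac{SS(d_i, d_j) + GS_d(d_i, d_j)}{2}, & \text{if } SS(d_i, d_j) \text{ exists} \\[8pt]
GS_d(d_i, d_j), & \text{otherwise}
\end{cases} \tag{10}

S_m(m_i, m_j) =
\begin{cases}
\dfrac{FSM(m_i, m_j) + GS_m(m_i, m_j)}{2}, & \text{if } FSM(m_i, m_j) \text{ exists} \\[8pt]
GS_m(m_i, m_j), & \text{otherwise}
\end{cases} \tag{11}
```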
Noise visualization
From HMDD v2.0, we download 5430 miRNA-disease associations as positive samples. According to the research in AEMDA [24], there are 12,034 known pairs in HMDD v3.0. Therefore, if negative samples are chosen randomly from the undetected pairs, we estimate that about 3.59% of them are actually noise, that is, true associations labeled as negative.
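One way to arrive at a figure of this size is the back-of-the-envelope calculation below. It uses the miRNA and disease counts given for HMDD v2.0 and assumes, for illustration only, that every HMDD v3.0 pair falls inside the same 495 x 383 grid and that v3.0 contains all v2.0 pairs. The variable names are hypothetical.

```python
n_mirnas, n_diseases = 495, 383
known_v2 = 5430    # associations in HMDD v2.0 (positive samples)
known_v3 = 12034   # known pairs in HMDD v3.0, as reported via AEMDA [24]

all_pairs = n_mirnas * n_diseases        # every possible miRNA-disease pair
undetected = all_pairs - known_v2        # candidate "negative" pairs
newly_confirmed = known_v3 - known_v2    # pairs confirmed only later (assumed)

noise_rate = newly_confirmed / undetected
print(f"{undetected} undetected pairs, noise rate {noise_rate:.2%}")
# prints "184155 undetected pairs, noise rate 3.59%"
```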
To illustrate the impact of the noise, we design the experiment as follows:

1. First, we extract 200 positive samples and 200 negative samples as noise-free data from the UCI ML Breast Cancer Wisconsin (Diagnostic) dataset [34].
2. Then, we deliberately change the labels of 7 positive samples in the noise-free data to negative to simulate the noise hidden in the data, forming the noisy data. The two situations are shown in Fig. 6. The red dots represent the noise hidden in the data. The blue dots and the black ones represent positive samples and negative samples, respectively. The decision boundaries differ between the two situations because of the noise.
3. Further, we keep 200 positive samples and 200 negative samples in the noisy data to make sure the experiment is rigorous.
4. Finally, we apply the logistic regression algorithm to both the noise-free and the noisy data to demonstrate the interference caused by the noise. The results are listed in Table 3.
These experiments indicate that the noise hidden in the data affects the final results of miRNA-disease association prediction to a certain extent. Specifically, the noise lies close to the positive samples, which interferes with an algorithm's ability to identify positive samples.
Method for negative samples selection
Inspired by ABMDA [19], we use the k-means algorithm [35] to select negative samples. Specifically, we group all undetected miRNA-disease pairs into 23 clusters by k-means. Similar pairs fall into the same cluster, so the noise tends to concentrate in the same clusters and is more easily distinguished. We then extract an equal number of samples from each cluster as negative samples, so that the noise is reduced to some extent.
Anti-noise computational model for miRNA-disease association prediction
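The cluster-then-sample step can be sketched as follows. To keep the example dependency-free it uses a small NumPy implementation of Lloyd's k-means as a stand-in for a library k-means. The function names, the synthetic feature matrix, and the handling of small or empty clusters are assumptions, not details from the paper.

```python
import numpy as np

def kmeans_labels(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster label for every row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # copy of k rows
    for _ in range(n_iter):
        # Squared distance from every sample to every center: shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels

def sample_negatives(X_undetected, n_neg, k=23, seed=0):
    """Draw roughly n_neg row indices, taking the same number from each cluster."""
    rng = np.random.default_rng(seed)
    labels = kmeans_labels(X_undetected, k, seed=seed)
    per_cluster = n_neg // k
    chosen = []
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        take = min(per_cluster, len(idx))  # small clusters give what they have
        if take == 0:
            continue
        chosen.extend(rng.choice(idx, size=take, replace=False))
    return np.array(chosen, dtype=int)

# Synthetic stand-in for the feature vectors of undetected pairs.
rng = np.random.default_rng(0)
X_undetected = rng.normal(size=(500, 4))
neg_idx = sample_negatives(X_undetected, n_neg=46, k=23)
```

Sampling per cluster rather than uniformly keeps any single dense region of undetected pairs from dominating the negative set.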
To further resist the noise, we propose a subsampling method for noise smoothing motivated by Ho [36]. In detail, we construct several subsets by sampling with replacement from the original data.
Then, we feed each subset to LightGBM [37], which is an ensemble algorithm based on GBDT [38]. In each learning iteration, the base model of LightGBM learns the residual from the previous iteration, which improves performance. In addition, LightGBM uses two important techniques: Gradient-based One-Side Sampling (GOSS) for data samples and Exclusive Feature Bundling (EFB) for features. GOSS keeps the examples with large gradients and randomly samples the examples with small gradients, which reduces the training cost. EFB bundles many mutually exclusive features into fewer dense features, which further reduces the cost of computing on zero feature values.
The eventual result is an average of each subset’s prediction result. The detailed steps of the ANMDA are shown in Fig. 7.
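The subsample-and-average scheme can be sketched independently of the base learner. In the sketch below, `bagged_scores` draws bootstrap subsets and averages the scores, while `centroid_scorer` is a toy stand-in learner used only so that the example runs without extra packages. It is not LightGBM. The function names, the number of subsets, and the synthetic data are all assumptions.

```python
import numpy as np

def bagged_scores(X_train, y_train, X_test, fit_predict, n_subsets=10, seed=0):
    """Average the scores of models trained on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    total = np.zeros(len(X_test))
    for _ in range(n_subsets):
        idx = rng.integers(0, n, size=n)  # sampling with replacement
        total += fit_predict(X_train[idx], y_train[idx], X_test)
    return total / n_subsets

def centroid_scorer(X_tr, y_tr, X_te):
    """Toy stand-in learner (NOT LightGBM): relative distance to class centroids."""
    pos = X_tr[y_tr == 1].mean(axis=0)
    neg = X_tr[y_tr == 0].mean(axis=0)
    d_pos = np.linalg.norm(X_te - pos, axis=1)
    d_neg = np.linalg.norm(X_te - neg, axis=1)
    return d_neg / (d_pos + d_neg + 1e-12)  # in [0, 1]; higher means more positive

# With LightGBM installed, fit_predict could instead wrap
# lightgbm.LGBMClassifier().fit(X_tr, y_tr).predict_proba(X_te)[:, 1].

# Synthetic, well-separated two-class data for the demonstration.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 1.0, (100, 5)), rng.normal(-2.0, 1.0, (100, 5))])
y = np.array([1] * 100 + [0] * 100)

scores = bagged_scores(X, y, X, centroid_scorer, n_subsets=5)
labels = (scores >= 0.5).astype(int)
```

Averaging scores is a soft form of voting: a mislabeled sample appears in only some of the resamples, so its pull on the averaged prediction is diluted.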
Availability of data and materials
The data and materials are available from https://github.com/BioInfoLeo/ANMDA
Abbreviations
miRNA: MicroRNA
ANMDA: Anti-noise algorithm for predicting miRNA-disease associations
LightGBM: Light gradient boosting machine
HMDD: Human microRNA disease database
ncRNA: Non-coding RNA
ROC: Receiver operating characteristic
PR: Precision-recall
AUROC: Area under the receiver operating characteristic curve
AUPR: Area under the precision-recall curve
kNN: k-Nearest neighbor
MLP: Multilayer perceptron
DAG: Directed acyclic graph
GOSS: Gradient-based one-side sampling
EFB: Exclusive feature bundling
References
1. Stark A, Brennecke J, Bushati N. Animal microRNAs confer robustness to gene expression and have a significant impact on 3'UTR evolution. Cell. 2005;123(6):1133–46.
2. Hayashita Y, Osada H, Tatematsu Y. A polycistronic microRNA cluster, miR-17-92, is overexpressed in human lung cancers and enhances cell proliferation. Cancer Res. 2005;65(21):9628–32.
3. Hatfield SD, Shcherbata HR, Fischer KA. Stem cell division is regulated by the microRNA pathway. Nature. 2005;435(7044):974–8.
4. Kozomara A, Birgaoanu M, Griffiths-Jones S. miRBase: from microRNA sequences to function. Nucleic Acids Res. 2019;47:D155–62.
5. Toxopeus E, Lynam-Lennon N, Biermann K. Tumor microRNA-126 controls cell viability and associates with poor survival in patients with esophageal adenocarcinoma. Exp Biol Med. 2019;244(14):1210–9.
6. Sharma S, Lu HC. microRNAs in neurodegeneration: current findings and potential impacts. J Alzheimers Dis Parkinsonism. 2018;8(1):420.
7. Pofi R, Giannetta E, Galea N, Francone M, Campolo F, Barbagallo F, et al. Diabetic cardiomiopathy progression is triggered by miR122-5p and involves extracellular matrix: a 5-year prospective study. JACC Cardiovascular Imaging. 2020.
8. Li L, Masica D, Ishida M. Human bile contains microRNA-laden extracellular vesicles that can be used for cholangiocarcinoma diagnosis. Hepatology. 2014;60(3):896–907.
9. Perez-Iratxeta C, Wjst M, Bork P. G2D: a tool for mining genes associated with disease. BMC Genet. 2005;6:45.
10. Chen X, Xie D, Zhao Q. MicroRNAs and complex diseases: from experimental results to computational models. Brief Bioinform. 2019;20:515–39.
11. Jiang Q, Hao Y, Wang G. Prioritization of disease microRNAs through a human phenome-microRNAome network. BMC Syst Biol. 2010;4:S2.
12. Chen X, Yan CC, Zhang X. WBSMDA: within and between score for MiRNA-disease association prediction. Sci Rep. 2016;6:21106.
13. Shi H, Xu J, Zhang G. Walking the interactome to identify human miRNA-disease associations through the functional link between miRNA targets and disease genes. BMC Syst Biol. 2013;7:101.
14. You Z, Huang ZA, Zhu ZX. PBMDA: A novel and effective path-based computational model for miRNA-disease association prediction. PLoS Comput Biol. 2017;13(3):e1005455.
15. Qu Y, Zhang HX, Liang C. KATZMDA: prediction of miRNA-disease associations based on KATZ model. IEEE Access. 2018;6:3943–50.
16. Chen X, Wu QF, Yan GY. RKNNMDA: ranking-based KNN for miRNA-disease association prediction. RNA Biol. 2017;14(7):952–62.
17. Ha J, Park C, Park S. PMAMCA: prediction of microRNA-disease association utilizing a matrix completion approach. BMC Syst Biol. 2019;13:33.
18. Zhu X, Wang X, Zhao H. BHCMDA: A new biased heat conduction based method for potential MiRNA-Disease association prediction. Front Genet. 2020;11:384.
19. Zhao Y, Chen X, Yin J. Adaptive boosting-based computational model for predicting potential miRNA-disease associations. Bioinformatics. 2019;35(22):4730–8.
20. Zhou S, Wang SL, Wu Q. Predicting potential miRNA-disease associations by combining gradient boosting decision tree with logistic regression. Comput Biol Chem. 2020;85:107200.
21. Yao DJ, Zhan XJ, Kwoh CK. An improved random forest-based computational model for predicting novel miRNA-disease associations. BMC Bioinform. 2019;20:624.
22. Peng LH, Zhou LQ, Chen X. A computational study of potential miRNA-disease association inference based on ensemble learning and kernel ridge regression. Front Bioeng Biotechnol. 2020;8:40.
23. Peng JJ, Hui WW, Li QQ. A learning-based framework for miRNA-disease association identification using neural networks. Bioinformatics. 2019;35(21):4364–71.
24. Ji C, Gao Z, Ma X, Wu Q, Ni J, Zheng C. AEMDA: Inferring miRNA-disease associations based on deep autoencoder. Bioinformatics. 2020;29:btaa670.
25. Chen X, Li TH, Zhao Y. Deep-belief network for predicting potential miRNA-disease associations. Brief Bioinform. 2020:bbaa186.
26. Li J, Li Z, Nie R. FCGCNMDA: predicting miRNA-disease associations by applying fully connected graph convolutional networks. Mol Genet Genomics. 2020;295(5):1197–209.
27. Li Y, Qiu C, Tu J. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 2013;42(D1):D1070–4.
28. Hsu JB, Chiu CM, Hsu SD. miRTar: an integrated system for identifying miRNA-target interactions in human. BMC Bioinformatics. 2011;12:300.
29. Resnik P. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 1, IJCAI’95. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1995, pp. 448–453.
30. Wang D, Wang J, Lu M. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics. 2010;26(13):1644–50.
31. Xuan P, Han K, Guo M. Correction: Prediction of microRNAs Associated with Human Diseases Based on Weighted k Most Similar Neighbors. PLoS One. 2013;8(9):10.1371.
32. Van Laarhoven T, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug-target interaction. Bioinformatics. 2011;27(21):3036–43.
33. Chen X, Yan GY. Novel human lncRNA-disease association inference based on lncRNA expression profiles. Bioinformatics. 2013;29(20):2617–24.
34. The UCI ML Breast Cancer Wisconsin (Diagnostic) dataset. https://goo.gl/U2Uwz2
35. Hartigan JA, Wong MA. A K-means clustering algorithm. J Roy Stat Soc: Ser C (Appl Stat). 1979;28(1):100–8.
36. Ho TK. The random subspace method for constructing decision forests. Pattern Anal Mach Intell. 1998;20(8):832–44.
37. Ke G, Meng Q, Finley T. LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst. 2017;30:3146–54.
38. Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Stat. 2001;29(5):1189–232.
Acknowledgements
We would like to thank anonymous reviewers for their comments and suggestions.
Funding
This work was partially supported by grants from the National Key R&D Program of China (2019YFA0110802 and 2019YFA0802800), the Fundamental Research Funds for the Central Universities. The funding bodies did not play any roles in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Author information
Contributions
XJC, XYH and ZRJ designed the experiments and analyzed the data. XJC, XYH performed the experiments. XJC, XYH and ZRJ wrote the paper. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1. The top 200 miRNA-disease associations predicted by ANMDA
Additional file 2. The case studies of ANMDA
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Chen, XJ., Hua, XY. & Jiang, ZR. ANMDA: anti-noise based computational model for predicting potential miRNA-disease associations. BMC Bioinformatics 22, 358 (2021). https://doi.org/10.1186/s12859-021-04266-6
Keywords
 miRNA-disease association
 k-means
 Noise smoothing
 Light gradient boosting machine