Inferring plant microRNA functional similarity using a weighted protein-protein interaction network
© Meng et al. 2015
Received: 3 March 2015
Accepted: 20 October 2015
Published: 4 November 2015
MiRNAs play a critical role in the response of plants to abiotic and biotic stress. However, the functions of most plant miRNAs remain unknown. Inferring these functions from miRNA functional similarity would thus be useful. This study proposes a new method, called PPImiRFS, for inferring miRNA functional similarity.
The functional similarity of miRNAs was inferred from the functional similarity of their target gene sets. A protein-protein interaction network, with edges weighted by semantic similarity computed from Gene Ontology terms, was constructed to infer the functional similarity between two target genes belonging to two different miRNAs. The functional similarity score was calculated from the weighted shortest path between the two target genes through the whole network. On Arabidopsis thaliana data, the proposed method was more effective and reliable than two previous methods (miRFunSim and GOSemSim). Additionally, miRNAs responding to the same type of stress had higher functional similarity than miRNAs responding to different types of stress.
For the first time, a protein-protein interaction network with semantic similarity weights generated using Gene Ontology terms was employed to calculate the functional similarity of plant miRNAs. A novel method based on calculating the weighted shortest path between two target genes was introduced.
MicroRNAs (miRNAs) are single-stranded noncoding RNAs, typically ~22 nucleotides long, that act in post-transcriptional regulation by triggering the degradation of target messenger RNAs or inhibiting their translation [1, 2]. In plants, miRNA biogenesis is a multistep process. First, the miRNA gene is transcribed by RNA polymerase II into a primary miRNA (pri-miRNA). The pri-miRNA is then processed into a hairpin precursor miRNA by the endoribonuclease DICER-LIKE1 (DCL1), the plant Dicer homolog. Next, the loop region of the precursor miRNA is removed to form a miRNA duplex (miRNA:miRNA*). Finally, the miRNA* strand is degraded, and the other strand, the mature miRNA, is incorporated into the RNA-induced silencing complex (RISC).
As regulators of gene expression, miRNAs are involved in many plant biological processes, such as development, nutrient homeostasis, biotic stress responses, abiotic stress responses and pathogen responses. Previous studies have shown that groups of miRNAs are involved in many biological processes [4, 5]; miRNAs involved in the same biological process are therefore expected to have identical or similar functions. Currently, few miRNAs have functional annotations, and the functions of some miRNAs are only partly known. Research on miRNA function has therefore received increasing attention. In recent years, biologists have compared the functions of miRNA genes and predicted the potential functions of uncharacterized miRNAs from their relationships with miRNAs that have known molecular functions or are associated with a specific stressor.
To date, only a few computational models are available for inferring the functional similarity among miRNAs. In one report, functional similarity scores of human miRNAs were computed from human miRNA-disease association data. This method measures the similarity of miRNA-associated diseases structured as directed acyclic graphs, which is analogous to inferring the similarity of protein-coding genes from the semantic similarity of their Gene Ontology (GO) terms. In another study, a random walk on the Online Mendelian Inheritance in Man (OMIM) disease similarity network was used to predict potential disease-miRNA associations, under the assumption that functionally related miRNAs are often associated with phenotypically similar diseases. Both methods make full use of the associations among phenotypically similar diseases and performed very well on human data, but no disease similarity network data exist for plants, which prevents these strategies from being applied to plants. Because miRNAs participate in biological processes by regulating their target transcripts, the functional similarity of miRNAs can instead be inferred from the associations of their target genes, and several computational methods have been proposed on this basis. The simplest method uses the Jaccard similarity measure, that is, the proportion of common target genes among all target genes regulated by two miRNAs. However, each plant miRNA regulates only a small number of target genes, and the target gene sets of most plant miRNA pairs do not intersect, so most Jaccard scores are zero. This method is therefore also unsuitable for plants. A systematic method for studying the functional similarity of human miRNAs has also been proposed.
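The zero-score problem of the Jaccard measure is easy to see in a minimal Python sketch on target gene sets. The miRNA target sets below are invented placeholders, not real predictions.

```python
def jaccard(targets_a: set, targets_b: set) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = targets_a | targets_b
    if not union:
        return 0.0
    return len(targets_a & targets_b) / len(union)


# Hypothetical target sets (placeholder gene IDs).
mir_a = {"g1", "g2"}
mir_b = {"g3", "g4", "g5"}
mir_c = {"g2", "g6"}

print(jaccard(mir_a, mir_b))  # 0.0: no shared targets
print(jaccard(mir_a, mir_c))  # one shared target out of three: about 0.333
```

With small, mostly disjoint target sets, the first case is the typical one for plant miRNA pairs, so the score carries no information about how related the two miRNAs are.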
In the systematic method for human miRNAs mentioned above, the functional similarity between two miRNAs was quantified by measuring the semantic similarity of the GO terms annotated to their target genes. Another study introduced the concept of a co-regulating functional module. For each pair of miRNAs, the GO categories of their target gene sets were used to assess the significance of their co-regulated target genes with a hypergeometric test, and the co-regulating functional modules were established using a protein-protein interaction network (PPIN). miRNAs that shared at least one co-regulating functional module were considered to have similar functions. The shortcoming of this method is that its output is binary (0 or 1), so it cannot measure the degree of similarity. Another method used a target gene network to measure the functional similarity of miRNAs. It considered both the target site accessibility and the interaction context of the target genes in a functional gene network whose edges were weighted by the semantic similarity of the target genes' GO terms. Because GO annotations are incomplete, such a network may be less realistic than networks supported by experimental data, such as PPINs. PPINs have been widely used to predict protein function, protein complexes, and gene functional similarity. Furthermore, the functional similarity scores of human miRNAs have been computed by quantifying the associations between miRNAs from their targeting propensities and protein connectivity in an integrated PPIN.
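The hypergeometric test used to build co-regulating functional modules asks how surprising it is to find k genes of a given GO category among n co-regulated target genes when K of the N background genes carry that category. The standard-library sketch below computes the upper-tail p-value P(X ≥ k). How N, K, n and k were chosen in the original study is not given here, and the counts in the example are invented.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """Upper-tail p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background genes, K: background genes in the GO category,
    n: genes drawn (e.g. co-regulated targets), k: drawn genes in the category.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(K, n)
    # math.comb returns 0 when asked for more items than available
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(max(k, 0), upper + 1))
    return hits / comb(N, n)


# Invented counts: 10 background genes, 3 in the category, 4 drawn, 3 of them in it.
print(hypergeom_sf(3, 10, 3, 4))  # 7/210, about 0.0333
```

Exact integer binomial coefficients keep the sum free of rounding until the final division.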
Most existing computational methods have been designed specifically for humans, and, as discussed above, few of them can be applied to plants. It is therefore necessary to develop an effective and stable computational method for calculating the functional similarity scores of plant miRNAs. This study proposes a novel computational method, called PPImiRFS, that obtains the functional similarity scores of miRNA pairs from a PPIN whose edges carry semantic similarity weights generated from GO terms, combined with graph-theoretic properties of the network. The method is available for download at our supporting website: https://github.com/kobe-liudong/PPImiRFS. The miRNA families, miRNA clusters and experimentally verified miRNAs associated with biotic and abiotic stress responses in Arabidopsis thaliana (A. thaliana) were used to evaluate and validate the performance of our method. Furthermore, a comparative analysis showed that our method was more effective and reliable than two widely used computational methods (miRFunSim and GOSemSim).
A. thaliana miRNA and mRNA
All A. thaliana mature miRNA sequences, miRNA families and miRNA genome coordinates were downloaded from miRBase (Release 21, June 2014). This release contains 427 mature sequences, 47 families containing more than one miRNA and 30 clusters (with 10 kb as the maximum inter-miRNA distance for two miRNA genes to be clustered together). The candidate A. thaliana mRNAs, comprising all transcribed sequences, were obtained from the TAIR database (Release 10). The families and clusters of A. thaliana miRNAs are presented in Additional file 1 and Additional file 2, respectively.
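The 10 kb clustering rule can be illustrated with a small sketch that groups miRNA loci by genome coordinates. It assumes that the distance is measured from the end of one locus to the start of the next on the same chromosome and that strand is ignored; both are assumptions, and the loci and coordinates below are hypothetical.

```python
from itertools import groupby

MAX_GAP = 10_000  # 10 kb maximum inter-miRNA distance


def cluster_mirnas(loci, max_gap=MAX_GAP):
    """Group miRNA loci into clusters of at least two genes.

    loci: iterable of (name, chromosome, start, end).
    Neighbouring loci on the same chromosome join one cluster when the gap
    between the end of one and the start of the next is <= max_gap.
    Strand is ignored (an assumption).
    """
    clusters = []
    ordered = sorted(loci, key=lambda rec: (rec[1], rec[2]))
    for _, records in groupby(ordered, key=lambda rec: rec[1]):
        current, last_end = [], None
        for name, _, start, end in records:
            if last_end is not None and start - last_end > max_gap:
                if len(current) > 1:
                    clusters.append(current)
                current, last_end = [], None
            current.append(name)
            last_end = end if last_end is None else max(last_end, end)
        if len(current) > 1:
            clusters.append(current)
    return clusters


# Hypothetical coordinates, for illustration only.
loci = [
    ("miR-A", "Chr1", 1_000, 1_100),
    ("miR-B", "Chr1", 5_000, 5_100),
    ("miR-C", "Chr1", 30_000, 30_100),
    ("miR-D", "Chr2", 100, 200),
    ("miR-E", "Chr2", 9_000, 9_100),
]
print(cluster_mirnas(loci))  # [['miR-A', 'miR-B'], ['miR-D', 'miR-E']]
```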
A. thaliana miRNAs in response to stress
Because there is no publicly available database of A. thaliana miRNAs that respond to abiotic and biotic stress, we compiled 126 experimentally verified stress-responsive A. thaliana miRNAs, covering 12 types of abiotic stress and 3 types of biotic stress, from 25 reports. The reports and the 126 miRNAs, together with the types of stress to which they respond, are listed in Additional file 3.
Topological characteristics of the A. thaliana PPIN
Proteins No. (%) | Interactions No. (%)
7,115 (64.8 %) | 70,699 (79.9 %)
2,807 (25.6 %) | 6,204 (7.0 %)
2,776 (25.3 %) | 5,619 (6.4 %)
6,943 (63.2 %) | 16,463 (18.6 %)
4,172 (38.0 %) | 9,480 (10.7 %)
Total: 10,985 (100 %) | 88,484 (100 %)
Construction of the WPPIN
The edge weights of the PPIN were computed as the functional similarity of the two genes linked by each edge, based on the semantic similarity of their GO terms. These weights were calculated with GOSemSim, an R package that implements several semantic similarity measures and supports 19 species, including A. thaliana, human, mouse, and yeast. For PPImiRFS, we used the geneSim function in the GOSemSim package to calculate the semantic similarity between two genes; geneSim uses a graph-based semantic similarity measure. The GO data used in the experiment were obtained and processed with GOSemSim (version 2.14.0). Because GO consists of three orthogonal ontologies, molecular function (MF), biological process (BP), and cellular component (CC), we calculated the semantic similarity weights separately for each ontology and constructed three WPPINs.
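As a rough illustration of this step, the sketch below turns an interaction list and precomputed GO similarity scores into three weighted adjacency maps, one per ontology. It assumes the scores were already computed (for example with GOSemSim's geneSim in R) and stored keyed by unordered protein pair. The data layout, the function name build_wppins, and the choice to drop edges without a score are illustrative assumptions, not the PPImiRFS implementation.

```python
from collections import defaultdict

ONTOLOGIES = ("MF", "BP", "CC")


def build_wppins(interactions, go_sim):
    """Build one weighted, undirected PPIN per GO ontology.

    interactions: iterable of (protein_a, protein_b) pairs.
    go_sim: {ontology: {frozenset({a, b}): similarity in [0, 1]}} (hypothetical layout).
    Returns {ontology: {node: {neighbor: weight}}}.
    """
    pairs = list(interactions)  # may be a generator; it is reused per ontology
    networks = {}
    for ont in ONTOLOGIES:
        adj = defaultdict(dict)
        for a, b in pairs:
            if a == b:
                continue  # ignore self-interactions
            w = go_sim.get(ont, {}).get(frozenset((a, b)))
            if w is None:
                continue  # assumption: edges without a GO score are left out
            adj[a][b] = w
            adj[b][a] = w
        networks[ont] = dict(adj)
    return networks


interactions = [("p1", "p2"), ("p2", "p3"), ("p3", "p3")]
go_sim = {
    "MF": {frozenset(("p1", "p2")): 0.8},
    "BP": {frozenset(("p1", "p2")): 0.5, frozenset(("p2", "p3")): 0.4},
}
nets = build_wppins(interactions, go_sim)
print(nets["BP"])  # {'p1': {'p2': 0.5}, 'p2': {'p1': 0.5, 'p3': 0.4}, 'p3': {'p2': 0.4}}
```

Dropping unscored edges shrinks the network; assigning them a default weight instead is an equally plausible choice that this section does not settle.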
Prediction of the miRNA target genes
Results of A. thaliana miRNA target prediction
No. of miRNAs | No. of targets
Functional similarity of target gene sets
A novel network-based weighted shortest path method was proposed to calculate the functional similarity between two target gene sets.
where n is the number of edges in the shortest path. When more than one shortest path connects gene i and gene j in the WPPIN, the max function sets F_{i,j} to the largest of the values computed for these paths by the average accumulated weight method. F_{i,j} is equal to 1 when gene i and gene j are the same gene.
where m and n are the numbers of target genes of miRNA i and miRNA j, respectively, and n′ and m′ are the numbers of target genes that are not included in the WPPIN.
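The two formulas are not reproduced in this section, so the following Python sketch is only one reading of the description, under explicit assumptions: (1) a shortest path is one with the fewest edges, and F_{i,j} is the maximum, over all such paths, of the accumulated edge weight divided by n; (2) genes with no connecting path score 0; (3) the set-level score is a best-match average over the target genes present in the WPPIN, so it is divided by (m − m′) + (n − n′). Assumption (3) in particular is hypothetical and may differ from the published equation, and the function names are illustrative rather than part of PPImiRFS.

```python
from collections import deque


def gene_similarity(adj, i, j):
    """Hypothetical F_{i,j}: max over all fewest-edge paths i -> j of the
    average edge weight (accumulated weight / number of edges n).
    Returns 1.0 if i == j and 0.0 if no path exists (assumption)."""
    if i == j:
        return 1.0
    if i not in adj or j not in adj:
        return 0.0
    dist = {i: 0}
    best = {i: 0.0}  # largest accumulated weight over shortest paths from i
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if u == j:
            break  # BFS order: every predecessor of j was processed already
        for v, w in adj[u].items():
            if v not in dist:
                dist[v] = dist[u] + 1
                best[v] = best[u] + w
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                best[v] = max(best[v], best[u] + w)
    if j not in dist:
        return 0.0
    # all shortest paths share the same length n, so max average = max sum / n
    return best[j] / dist[j]


def set_similarity(targets_i, targets_j, adj):
    """Hypothetical set-level score: best-match average over targets present
    in the WPPIN, i.e. divided by (m - m') + (n - n')."""
    ti = [g for g in set(targets_i) if g in adj]
    tj = [h for h in set(targets_j) if h in adj]
    if not ti or not tj:
        return 0.0
    row = sum(max(gene_similarity(adj, g, h) for h in tj) for g in ti)
    col = sum(max(gene_similarity(adj, g, h) for g in ti) for h in tj)
    return (row + col) / (len(ti) + len(tj))


def undirected(edges):
    """Build a symmetric adjacency dict from (a, b, weight) triples."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj


toy = undirected([
    ("a", "b", 0.9), ("b", "d", 0.1),  # shortest path a-b-d, average 0.5
    ("a", "c", 0.6), ("c", "d", 0.6),  # shortest path a-c-d, average 0.6
    ("a", "e", 1.0), ("e", "f", 1.0), ("f", "d", 1.0),  # a-e-f-d is longer, ignored
])
print(gene_similarity(toy, "a", "d"))  # 0.6
print(set_similarity(["a", "q"], ["d", "f"], toy))  # "q" is outside the network
```

Because every shortest path between two genes has the same length n, maximizing the average weight is the same as maximizing the accumulated weight, which lets a single breadth-first search track the best sum per node instead of enumerating all shortest paths.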
Results and discussion
Functional similarity of the miRNAs in the same family or cluster
Statistical analysis results of functional similarity of the intrafamily, interfamily and randomly selected miRNAs
Statistical analysis results of functional similarity of the intracluster, intercluster and randomly selected miRNAs
To verify our results, we applied the two other methods (miRFunSim and GOSemSim) to the same experiment; the results for the family and cluster data are shown in Fig. 2c and d. With these methods as well, the functional similarity scores differ significantly among the three classes of miRNA pairs. The corresponding statistical analyses for the family and cluster data are shown in Tables 3 and 4, respectively.
In summary, both methods yield the same pattern as PPImiRFS, which supports the utility of PPImiRFS.
Functional similarity of miRNAs responding to identical types of stress
Performance evaluation of PPImiRFS
Comparison with existing similar methods
Several computational methods have recently been proposed for quantifying the functional similarity of miRNAs. In this section, we compared PPImiRFS with two of them, miRFunSim and GOSemSim. miRFunSim calculates the functional similarity between miRNAs from PPI data but uses only the structural features of the PPI network, and one report has found that weighted PPI networks are more effective than unweighted ones. Because GO data are incomplete, the results of GOSemSim contain many null values, which degrades its performance. PPImiRFS considers both the structural features of the PPI network and GO similarity weights, which may allow it to overcome the deficiencies of both methods.
Figure 6 compares PPImiRFS with the other methods when ClusterONE is used for clustering. PPImiRFS outperformed both previous methods, except for a slightly lower sensitivity. Figure 7 shows the comparison when Connected Component Cluster is used. Although the network constructed with GOSemSim achieved the highest precision and sensitivity, this is because very few clusters were predicted, including an implausibly large cluster containing 393 miRNAs. Most of the miRNAs in the benchmark clusters fell into this very large cluster, which gives a very high sensitivity, and most members of the remaining clusters belonged to the same family, and therefore to the same benchmark cluster, which gives a relatively high precision. Therefore, the network computed with GOSemSim is not actually better than those generated with PPImiRFS and miRFunSim.
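Why one oversized cluster inflates sensitivity can be seen with the sensitivity (Sn) and positive predictive value (PPV) measures often used to compare predicted clusters with benchmark complexes. This section does not define the precision and sensitivity behind Figs. 6 and 7, so reading PPV as "precision" here is an assumption, and the families below are made up.

```python
def sensitivity_ppv(benchmark, predicted):
    """benchmark, predicted: lists of sets of miRNA names.

    T[i][j] = |benchmark_i ∩ predicted_j|
    Sn  = sum_i max_j T[i][j] / sum_i |benchmark_i|
    PPV = sum_j max_i T[i][j] / sum_j sum_i T[i][j]
    """
    T = [[len(b & p) for p in predicted] for b in benchmark]
    total_bench = sum(len(b) for b in benchmark)
    sn = sum(max(row, default=0) for row in T) / total_bench if total_bench else 0.0
    cols = list(zip(*T))  # one tuple per predicted cluster
    total_hit = sum(sum(col) for col in cols)
    ppv = sum(max(col) for col in cols) / total_hit if total_hit else 0.0
    return sn, ppv


families = [{"a1", "a2", "a3"}, {"b1", "b2"}, {"c1", "c2"}]  # hypothetical benchmark
giant = [{"a1", "a2", "a3", "b1", "b2"}, {"c1", "c2"}]       # one oversized cluster
tidy = [{"a1", "a2"}, {"b1", "b2"}, {"c1", "c2"}]             # misses a3

print(sensitivity_ppv(families, giant))  # (1.0, 0.714...)
print(sensitivity_ppv(families, tidy))   # (0.857..., 1.0)
```

The oversized cluster reaches perfect sensitivity while losing PPV, which mirrors the effect described for the GOSemSim network.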
In conclusion, PPImiRFS quantified the relationships between miRNAs more effectively and reliably than the two similar methods compared here.
Top 5 prediction results for miRNAs responding to high-salt conditions and TMV-Cg stress
Availability of PPImiRFS
To our knowledge, most of the existing methods mentioned above have not been released as publicly available software packages, which limits their use. In this study, we not only introduced a novel computational method but also implemented it as a publicly available software package. The package is composed of a main program, data pre-processing programs, and A. thaliana data. PPImiRFS is a console application written in C++, and the data pre-processing programs are implemented in Perl and R. The target gene sets of the miRNAs of the species of interest and the WPPIN data are required before the software can be run. The current version includes the necessary datasets for A. thaliana, and datasets for additional species will be integrated into future versions. Users who wish to apply the current version to other species can generate the required datasets with the programs provided. To use the software, users only need to supply a file listing their miRNA pairs of interest; the functional similarity scores of these pairs are calculated automatically and written to an output file. On a PC with a 2.5 GHz CPU and 2 GB of RAM, the software required 0.13 h, 1.11 h and 6.00 h for input files of 100, 1000 and 5000 miRNA pairs, respectively. The software is available at https://github.com/kobe-liudong/PPImiRFS.
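The pairs-in, scores-out workflow can be sketched as follows. PPImiRFS itself is a C++ console application whose file formats are documented in its repository; the tab-separated layout, the function name score_pairs, and the stand-in scorer below are hypothetical.

```python
import csv
import io


def score_pairs(pairs_in, scores_out, score):
    """Read 'miRNA_a<TAB>miRNA_b' rows and write 'miRNA_a<TAB>miRNA_b<TAB>score'.

    score: callable (a, b) -> float, e.g. a set-level similarity function.
    Scores are cached per unordered pair, so (a, b) and (b, a) are computed once.
    Returns the number of rows written.
    """
    cache = {}
    writer = csv.writer(scores_out, delimiter="\t", lineterminator="\n")
    written = 0
    for row in csv.reader(pairs_in, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        a, b = row[0].strip(), row[1].strip()
        key = frozenset((a, b))
        if key not in cache:
            cache[key] = score(a, b)
        writer.writerow([a, b, f"{cache[key]:.4f}"])
        written += 1
    return written


calls = []


def dummy_score(a, b):
    """Stand-in scorer for illustration; always returns 0.5."""
    calls.append((a, b))
    return 0.5


src = io.StringIO("ath-miR156a\tath-miR156b\nath-miR156b\tath-miR156a\nath-miR398a\tath-miR399a\n")
dst = io.StringIO()
print(score_pairs(src, dst, dummy_score), len(calls))  # 3 2
print(dst.getvalue(), end="")
```

Caching by unordered pair relies on the similarity score being symmetric; whether PPImiRFS does this internally is not stated here.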
In this study, we proposed a novel computational method to quantify the functional similarity between a pair of plant miRNAs based on a PPIN with GO term semantic similarity weights. For the convenience of other researchers, we implemented the method as a publicly available software package for local use. With the proposed method, miRNAs responding to the same type of stress (abiotic or biotic) showed higher functional similarity than miRNAs not responding to the same type of stress. Comparing the functional similarity scores of intrafamily, interfamily and randomly selected miRNA pairs, and of intracluster, intercluster and randomly selected pairs, showed that miRNAs in the same family or cluster have higher functional similarity scores. These results suggest that our method can correctly identify the functional similarities and differences between miRNAs in different groups. Furthermore, in a comparison with similar computational methods, our method achieved the most effective and reliable performance.
Quantifying the functional similarity of miRNAs with this method relies on a PPIN and on predicted target gene sets. The available plant PPIN has very low coverage and is often associated with high rates of false positives and false negatives, and the predicted targets also often have high false positive rates. The performance of our method should therefore improve as the quality of plant PPINs increases and as more accurate target prediction methods become available. Lastly, PPImiRFS can be applied to any plant species for which a PPIN and GO data are available.
The current study was supported by the National Natural Science Foundation of China (Nos. 31272167, 61472061, and 31471880).
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Ruvkun G. Glimpses of a tiny RNA world. Science. 2001;294(5543):797–9.
- Ambros V. A hierarchy of regulatory genes controls a larva-to-adult developmental switch in C. elegans. Cell. 1989;57(1):49–57.
- Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004;116(2):281–97.
- Hwang H, Mendell J. MicroRNAs in cell proliferation, cell death, and tumorigenesis. Br J Cancer. 2006;94(6):776–80.
- Xu P, Guo M, Hay BA. MicroRNAs and the regulation of cell death. Trends Genet. 2004;20(12):617–24.
- Wang D, Wang J, Lu M, Song F, Cui Q. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics. 2010;26(13):1644–50.
- Wang JZ, Du Z, Payattakool R, Philip SY, Chen C-F. A new method to measure the semantic similarity of GO terms. Bioinformatics. 2007;23(10):1274–81.
- Chen H, Zhang Z. Prediction of Associations between OMIM Diseases and MicroRNAs by Random Walk on OMIM Disease Similarity Network. The Scientific World Journal. 2013. doi: 10.1155/2013/204658.
- Shalgi R, Lieber D, Oren M, Pilpel Y. Global and local architecture of the mammalian microRNA–transcription factor regulatory network. PLoS Comput Biol. 2007;3(7):e131.
- Yu G, Xiao C-L, Bo X, Lu C-H, Qin Y, Zhan S, et al. A new method for measuring functional similarity of microRNAs. J Integrated Omics. 2010;1(1):49–54.
- Xu J, Li C-X, Li Y-S, Lv J-Y, Ma Y, Shao TT, et al. MiRNA–miRNA synergistic network: construction via co-regulating functional modules and disease miRNA topological features. Nucleic Acids Res. 2011;39(3):825–36.
- Xu Y, Guo M, Liu X, Wang C, Liu Y. Inferring the soybean (Glycine max) microRNA functional network based on target gene network. Bioinformatics. 2014;30(1):94–103.
- Chua HN, Sung W-K, Wong L. Exploiting indirect neighbors and topological weight to predict protein function from protein–protein interactions. Bioinformatics. 2006;22(13):1623–30.
- Liu G, Wong L, Chua HN. Complex discovery from weighted PPI networks. Bioinformatics. 2009;25(15):1891–7.
- Wang Q, Sun J, Zhou M, Yang H, Li Y, Li X, et al. A novel network-based method for measuring the functional relationship between gene sets. Bioinformatics. 2011;27(11):1521–8.
- Sun J, Zhou M, Yang H, Deng J, Wang L, Wang QH. Inferring potential microRNA-microRNA associations based on targeting propensity and connectivity in the context of protein interaction network. PLoS One. 2013;8(7):e69719.
- Griffiths-Jones S, Saini HK, Van Dongen S, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008;36 Suppl 1:D154–8.
- Zhou M, Sun J, Wang QH, Song LQ, Zhao G, Wang HZ, et al. Genome‐wide analysis of clustering patterns and flanking characteristics for plant microRNA genes. FEBS J. 2011;278(6):929–40.
- Rhee SY, Beavis W, Berardini TZ, Chen G, Dixon D, Doyle A, et al. The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res. 2003;31(1):224–8.
- Dai X, Zhao PX. psRNATarget: a plant small RNA target analysis server. Nucleic Acids Res. 2011;39 Suppl 2:W155–9.
- Fahlgren N, Howell MD, Kasschau KD, Chapman EJ, Sullivan CM, Cumbie JS, et al. High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes. PLoS One. 2007;2(2):e219.
- Couto FM, Silva M, Coutinho P. Measuring semantic similarity between Gene Ontology terms. Data Knowl Eng. 2007;61(1):137–52.
- Schlicker A, Domingues FS, Rahnenführer J, et al. A new measure for functional similarity of gene products based on Gene Ontology. BMC Bioinform. 2006;7(12):302.
- Brandão MM, Dantas LL, Silva-Filho MC. AtPIN: Arabidopsis thaliana protein interaction network. BMC Bioinform. 2009;10(1):454.
- Lin M, Zhou X, Shen X, Mao C, Chen X. The predicted Arabidopsis interactome resource and network topology-based systems biology analyses. Plant Cell. 2011;23(3):911–22.
- Stark C, Breitkreutz B-J, Reguly T, Boucher L, Breitkreutz A, Tyers M. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006;34 Suppl 1:D535–9.
- Aranda B, Achuthan P, Alam-Faruque Y, Armean I, Bridge A, Derow C, et al. The IntAct molecular interaction database in 2010. Nucleic Acids Res. 2010;38 Suppl 1:D525–31.
- Yu G, Li F, Qin Y, Bo X, Wu Y, Wang SQ. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics. 2010;26(7):976–8.
- Meng J, Shi L, Luan Y. Plant microRNA-Target Interaction Identification Model Based on the Integration of Prediction Tools and Support Vector Machine. PLoS One. 2014;9(7):e103181.
- Bartel DP. MicroRNAs: target recognition and regulatory functions. Cell. 2009;136(2):215–33.
- Zhang YF, Zhang R, Su B. Diversity and evolution of MicroRNA gene clusters. Sci China C Life Sci. 2009;52(3):261–6.
- Price T, Pena FI, Cho YR. Survey: Enhancing protein complex prediction in PPI networks with GO similarity weighting. Interdiscip Sci. 2013;5(3):196–210.
- Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, et al. clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinform. 2011;12(1):14.
- Nepusz T, Yu HY, Paccanaro A. Detecting overlapping protein complexes in protein-protein interaction networks. Nat Methods. 2012;9(5):471–2.
- Li XL, Wu M, Kwoh CK, Ng SK. Computational approaches for detecting protein complexes from protein interaction networks: a survey. BMC Genomics. 2010;11 Suppl 1:19.
- Khraiwesh B, Zhu J-K, Zhu J. Role of miRNAs and siRNAs in biotic and abiotic stress responses of plants. Biochim Biophys Acta. 2012;1819(2):137–48.
- Liu H-H, Tian X, Li Y-J, Wu C-A, Zheng C-C. Microarray-based analysis of stress-regulated microRNAs in Arabidopsis thaliana. RNA. 2008;14(5):836–43.
- Liang G, He H, Yu D. Identification of nitrogen starvation-responsive microRNAs in Arabidopsis thaliana. PLoS One. 2012;7(11):e48951.
- Vidal EA, Araus V, Lu C, Parry G, Green PJ, Coruzzi GM, et al. Nitrate-responsive miR393/AFB3 regulatory module controls root system architecture in Arabidopsis thaliana. Proc Natl Acad Sci. 2010;107(9):4477–82.
- Tagami Y, Inaba N, Kutsuna N, Kurihara Y, Watanabe Y. Specific enrichment of miRNAs in Arabidopsis thaliana infected with Tobacco mosaic virus. DNA Res. 2007;14(5):227–33.
- Moldovan D, Spriggs A, Yang J, Pogson BJ, Dennis ES, Wilson IW. Hypoxia-responsive microRNAs and trans-acting small interfering RNAs in Arabidopsis. J Exp Bot. 2009;61(1):165–77.
- Hsieh L-C, Lin S-I, Shih AC-C, Chen J-W, Lin W-Y, Tseng CY, et al. Uncovering small RNA-mediated responses to phosphate deficiency in Arabidopsis by deep sequencing. Plant Physiol. 2009;151(4):2120–32.
- Zhao M, Ding H, Zhu JK, Zhang F, Li WX. Involvement of miR169 in the nitrogen‐starvation responses in Arabidopsis. New Phytol. 2011;190(4):906–15.