TAM: A method for enrichment and depletion analysis of a microRNA category in a list of microRNAs
© Lu et al. 2010
Received: 11 March 2010
Accepted: 9 August 2010
Published: 9 August 2010
Skip to main content
© Lu et al. 2010
Received: 11 March 2010
Accepted: 9 August 2010
Published: 9 August 2010
MicroRNAs (miRNAs) are a class of important gene regulators. The number of identified miRNAs has been increasing dramatically in recent years. An emerging major challenge is the interpretation of the genome-scale miRNA datasets, including those derived from microarray and deep-sequencing. It is interesting and important to know the common rules or patterns behind a list of miRNAs, (i.e. the deregulated miRNAs resulted from an experiment of miRNA microarray or deep-sequencing).
For the above purpose, this study presents a method and develops a tool (TAM) for annotations of meaningful human miRNAs categories. We first integrated miRNAs into various meaningful categories according to prior knowledge, such as miRNA family, miRNA cluster, miRNA function, miRNA associated diseases, and tissue specificity. Using TAM, given lists of miRNAs can be rapidly annotated and summarized according to the integrated miRNA categorical data. Moreover, given a list of miRNAs, TAM can be used to predict novel related miRNAs. Finally, we confirmed the usefulness and reliability of TAM by applying it to deregulated miRNAs in acute myocardial infarction (AMI) from two independent experiments.
TAM can efficiently identify meaningful categories for given miRNAs. In addition, TAM can be used to identify novel miRNA biomarkers. TAM tool, source codes, and miRNA category data are freely available at http://cmbi.bjmu.edu.cn/tam.
MicroRNAs (miRNAs) are one class of newly identified important cellular components . At the posttranscriptional level, miRNAs normally act as negative gene regulators by binding to the 3'UTR of target mRNAs through base pairing, which results in the cleavage of target mRNAs or translation inhibition . Increasing evidences suggest that miRNAs play crucial roles in nearly all important biological processes, including cell growth, proliferation, differentiation, development, and apoptosis , and that miRNA dysfunctions are associated with various diseases . Since their discovery, the number of identified miRNAs has been increasing dramatically and various high-throughput techniques related to miRNAs are continuously being developed. Microarrays, for example, generate experimental data at rates that exceed knowledge growth. To mine meaningful information of miRNAs, a number of tools and databases have been presented [4–12]. Among these resources, the tools for searching for the gene sets (i.e. KEGG pathways and Gene Ontology) that may be affected by one or multiple miRNAs represent some of the most important tools in miRNA bioinformatics [6, 10, 11]. A common point of these methods is that they obtain the meaningful gene sets by enrichment analysis of the in-silico predicted miRNA targets. The first limitation of these methods is the high false positives and high false negatives of the predicted miRNA targets . The second limitation of these methods is that they perform analysis based on target genes and only focus on significantly enriched gene sets and therefore may fail to find some functions or biological processes associated with the inputted miRNAs. For example, miR-18a is known to be related to apoptosis , but these methods fail to find the pathway "apoptosis" for miR-18a. Finally, it seems difficult for those methods to find novel miRNAs that are related to the inputted miRNAs. Therefore, for a list of miRNAs, for example the upregulated and/or downregulated miRNAs from a miRNA microarray experiment, novel methods are needed to find the patterns behind these miRNAs.
Most of the current tools for miRNA functional annotation are based on predicted miRNA targets, mainly, because of the lack of miRNA knowledge resources. However, functional resources for protein-coding genes are easily available. Therefore, for protein-coding genes, a large number of programs for the annotation of lists of genes have been developed  because various gene resources such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway http://www.genome.jp/kegg/ and the Online Mendelian Inheritance in Man (OMIM) compendium http://www.ncbi.nlm.nih.gov/omim/ are available for protein-coding genes. Developing miRNA annotation tools should become more feasible as meaningful miRNA resources are collected. In this study, TAM, a web-accessible program for this purpose is presented. In TAM, miRNAs are integrated into different categories according to the miRNA family, genome locations, functions, associated diseases, and tissue specificity. TAM then evaluates the statistical significance (i.e., overrepresentation or underrepresentation) of each miRNA category among lists of miRNAs using the hypergeometric test. TAM is also able to search for novel miRNAs related to a given list of miRNAs. Finally, we applied TAM to the upregulated miRNAs and downregulated miRNAs in acute myocardial infarction (AMI). As expected, different meaningful miRNA categories have been identified for upregulated and downregulated miRNAs, respectively. This suggested that TAM could be an efficient method and tool for the annotation of meaningful miRNA categories for a list of miRNAs. TAM represents an alternative tool for the processing of outputs of high throughput miRNA experiments.
Options provided by the TAM tool
miRNA categories integrated according to miRNA conservation
miRNA categories integrated according to miRNA genome location
miRNA categories integrated according to their function
miRNA associated disease
miRNA categories integrated according to the associated disease
miRNA tissue specificity
miRNA categories integrated according to their tissue specificity
Significant miRNA categories of upregulated and downregulated miRNAs in AMI obtained by TAM
(miR-31; miR-18a;miR-18b; miR-214;miR-223;miR-923; miR-711;miR-199a)
(miR-499;miR-29b;miR-126; miR-1;miR-181d;miR-181c; miR-451;miR-26b)
Akt pathway;cell cycle; HIV latency;Hormones regulation; stem cell regulation;immune; inflammation;oncogenic
Cardiogenesis; Hormones regulation; tumor suppressor; Muscle development
Cancer; hypertrophic cardiomyopathy; atrophic muscular disorders;
Cancer; Cardiac Arrhythmias; Cardiomegaly; Coronary Artery Disease; polycythemia vera;
To valid our method, we applied TAM to the deregulated miRNAs of AMI from another independent miRNA expression profiling experiment of AMI rat model by Rooij et al.. In their study, Rooij et al. identified 39 upregulated miRNAs and 46 downregulated miRNAs, respectively. As a result, although the deregulated miRNAs from Rooij et al.' experiment seem quite different from those of Shi et al.', the enriched miRNA categories identified by TAM have a good consistency across these two independent experiments. For example, the upregulated miRNAs from Rooij et al.' experiment are also enriched in miR-199a cluster (P = 4.33 × 10-3), miR-199 family (P = 4.33 × 10-3), cell cycle (P = 6.37 × 10-3), stem cell regulation (P = 1.82 × 10-6), inflammation (P = 3.14 × 10-3), and onco-miRNAs (P = 5.73 × 10-5). For HMDD category, the upregulated miRNAs are also enriched in various cancer, hypertrophic cardiomyopathy (P = 0.04) and atrophic muscular disorders (P = 4.54 × 10-12); the downregulated miRNAs from Rooij et al.' experiment are also enriched in miR-29a cluster (P = 7.37 × 10-3), miR-29b cluster (P = 7.37 × 10-3), hormones regulation (P = 2.14 × 10-7), miRNA tumor suppressor (P = 9.23 × 10-3). For HMDD category, the downregulated miRNAs are also enriched in various cancer, and polycythemia vera (P = 7.01 × 10-3).
As discussed previously, one of the limitations of target-based pathway enrichment analysis of miRNAs is that it can not predict novel miRNAs related to the inputted miRNAs. For TAM, it is very easy to perform this kind of analysis because TAM integrated miRNAs directly but not integrated miRNAs through miRNA targets. In the enriched miRNA category, the other miRNAs that are not included in the input miRNA list could be potential novel miRNAs related to the inputted miRNAs. For example, TAM analysis showed that the 16 deregulated miRNAs in AMI from Shi et al.'s study are enriched in the function of muscle development (P = 0.04). Among the 11 miRNAs in this category, two (miR-1 and miR-499) are included in the 16 inputted miRNAs. The other 9 miRNAs (miR-24, miR-124, miR-133a, miR-23a, miR-133b, miR-206, miR-221, miR-222, and miR-208b) in this category are predicted to be potential novel AMI related miRNAs. We confirmed four (miR-24, miR-133a, miR-221, and miR-222) of the nine miRNAs (44.4%) are related to AMI based on the deregulated miRNAs from another independent study by Rooij et al.. The results indicate that TAM is a highly reliable tool for predicting novel miRNAs that are related to inputted miRNAs.
As the rapid development of high-throughput biological techniques, it is increasingly important to mine meaningful patterns for a given list of miRNAs. As described above, TAM represents one important tool for this purpose. Unlike tools based on in-silico predicted miRNA targets, TAM integrated miRNAs into groups directly based on miRNA annotations. Therefore, TAM represents a new class of methods for the above purpose and represents an alternative tool for the annotations of a given list of miRNAs. Furthermore, TAM is able to predict novel miRNAs that are related to the inputted miRNAs. This enables users to find novel miRNA biomarkers for their experiments. In addition, TAM is highly dependent on the data of integrated miRNA sets and will be improved greatly when more miRNA annotation data becomes available in the future.
In this study, we presented a method to identify overrepresented and/or underrepresented miRNA categories for a given list of miRNAs. Moreover, an online tool, TAM, for annotations of human miRNAs based on various miRNA sets is developed. After applying TAM to deregulated miRNAs in AMI, we show that the upregulated miRNAs and the downregulated miRNAs in AMI are enriched in different and even opposite miRNA categories, which is helpful for the understanding of AMI. In addition, TAM can be used to predict novel miRNAs that are mostly related to the input miRNAs. TAM is useful for providing potential clues for miRNAs of interest. Furthermore, TAM is scalable and will grow and improve as more miRNA resources become available. In addition, TAM can be easily reconfigured for use with other species.
miRNA sets are defined as groups of miRNAs that have meaningful relationships. If any two miRNAs have meaningful relationships, for example they are associated with the same diseases, they are then integrated into one miRNA set. Here, miRNA sets were collected according to miRNA family, genome locations, function, associated diseases, and tissue specificity. Studies have indicated that miRNAs in one family are most likely derived from duplications of common ancestor miRNAs [18, 19], and tend to act together in various functional processes [20, 21]. Therefore, miRNAs in one family can be considered as one miRNA set. The miRNA family data from the miRBase database was downloaded  and utilized in this study.
miRNAs are not located randomly in the genome but tend to exist in clusters . MiRNAs in a cluster are likely to be co-transcribed and have similar expression patterns . Therefore, these clustered miRNAs may be involved in similar biological processes. In this study, miRNA clusters were identified by grouping miRNAs that were within a distance of 50 kb in the chromosomes, according to the observation of Baskerville and Bartel . The integrated miRNAs were also manually integrated into different sets according to their functions, as reported in publications. For example, miRNAs that were associated with the immune system were collected from a recent review paper published in Cell . The miRNA sets were generated by miRNA-associated diseases based on the Human MicroRNA Disease Database (HMDD, http://cmbi.bjmu.edu.cn/hmdd), a database for miRNA disease associations . The tissue-specific index values of miRNA were obtained from the study of Lu et al., and tissue-specific miRNA sets were generated by collecting miRNAs with tissue specificity index values of greater than or equal to 0.7. Finally, according to the methods described above, 257 miRNA sets were generated. These miRNA sets are available for download at the TAM website.
Finally, the P values for all miRNA sets are adjusted by Bonferroni and FDR corrections.
Project name: TAM.
Project home page: http://cmbi.bjmu.edu.cn/tam.
Operating system: Platform independent.
Programming language: Python.
Other requirements: Apache 1.22, Jquery, Extjs, and Django.
License: GPL v3.
This work was supported by the Natural Science Foundation of China (Grant No. 30900829) and was supported by Doctoral Fund of Ministry of Education of China (Grant No. 20090001120040).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.