miRNAminer: A tool for homologous microRNA gene search
BMC Bioinformatics volume 9, Article number: 39 (2008)
MicroRNAs (miRNAs), present in most metazoans, are small non-coding RNAs that control gene expression by negatively regulating translation through binding to the 3'UTR of mRNA transcripts. Experimental and computational methods have previously been used to construct miRNA gene repositories that conform to careful submission guidelines.
We developed an algorithm, miRNAminer, for homologous conserved miRNA gene search in several animal species. Given a search query, candidate homologs from different species are tested for known miRNA properties, such as secondary structure, energy, alignment and conservation, in order to assess their fidelity. Applying miRNAminer to seven mammalian species, we identified several hundred high-confidence homologous miRNAs, increasing the total collection of miRbase miRNAs in these species by more than 50%. miRNAminer uses stringent criteria and exhibits high sensitivity and specificity.
We present miRNAminer, the first web server for homologous miRNA gene search in animals. miRNAminer can be used to identify conserved homolog miRNA genes and can also be used prior to depositing miRNAs in public databases. miRNAminer is available at http://pag.csail.mit.edu/mirnaminer.
MicroRNAs (miRNAs) are short, ~22 nt non-coding RNAs that control gene expression. miRNAs bind to the 3'UTRs of their regulated mRNA transcripts to facilitate mRNA degradation or translation inhibition [1, 2]. miRNAs are present in most metazoans and are thought to regulate a diverse range of biological processes [3, 4]. The evolution of miRNA genes is unique: they mostly emerge through duplication events; most often exhibit unidirectional evolution; are generally gained and not lost; show several cases of rapid evolution in primates [8, 9]; rarely change, owing to functional constraints; and show relatively rare evolutionary acquisition events (as reflected in their relatively small numbers).
miRNA predictions based on computational methods, which followed initial extensive cloning efforts, rely on the secondary structure of the miRNA, its phylogenetic conservation and thermodynamic stability [11, 12]. miRNA gene repositories are constantly expanding, with more than 3500 miRNAs reported in more than 30 animal species (Sanger miRbase database, Version 10.0 [13, 14]). However, even this comprehensive repository is apparently far from complete, as shown by the very few miRNAs listed for dog (6) and chimpanzee (83) compared to human (533), to name two examples. Since these differences cannot be accounted for merely by species-specific miRNAs, we saw the need for a computational tool for miRNA homologous searches.
We present miRNAminer, a tool for automatic identification of homolog miRNAs based on a user-defined query miRNA. The tool exploits numerous characteristics of miRNAs: high conservation of precursor sequences, very high conservation of mature sequences (particularly in the seed region, nt 2–8), and hairpin secondary structure with high folding energy and base pairing. miRNAminer first uses BLAST to select candidate matches and ranks them according to their e-values. It then employs a series of rigorous filters to improve specificity.
An input query consists of a precursor miRNA, a mature miRNA, a set of filter threshold values and the number of best-fitted results requested in the output. We designed miRNAminer's algorithm to maximize specificity of matches, because the intended application of miRNAminer is to identify homolog matches after a miRNA has been experimentally confirmed. We estimated the default values presented below so that each filter by itself selects 95% of known miRNAs in training genomes (criteria were also based on ).
miRNAminer's algorithm follows these steps:
(i) Use BLAST to find matches in target genomes (the whole precursor miRNA from the query is used);
(ii) Filter with e-value threshold (default 0.05 per chromosome);
(iii) Extend the match by adding flanking nucleotides (default 50) upstream and downstream from the match (Ensembl genome database; ). Examine all possible extensions of the match within threshold length (default min 70 nt, max 180 nt);
(iv) Filter with RNA secondary folding energy threshold (default -25 kcal/mol; RNAfold with options "-p -d2 -noLP");
(v) Filter with minimal base-pairing threshold (default 55% pairing; with 20 gap penalty and 0.5 extension penalty);
(vi) Filter with requirement for hairpin-shape secondary structure;
(vii) Filter with alignment of precursor sequences (default 56% identity);
(viii) Filter with alignment of mature miRNA sequences (default 80% identity);
(ix) Filter with maximum number of mismatches in mature miRNA sequences (default 3 nt);
(x) Filter with conservation of seed (2–8 nt, required 100% conservation);
(xi) Filter with position of mature miRNA on the hairpin (max 4 nt overlap of mature sequence and hairpin loop).
miRNAminer's output includes a detailed analysis of the identified genomic region(s) that passed the selected threshold criteria (Figure 1). Currently, miRNAminer supports searches in 10 metazoan genomes. We will regularly add genomes upon their release. After the query is issued, results are usually available within a minute (though this depends on the number of results requested) and can either be viewed on screen or sent by email.
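Part of the filter cascade listed above can be illustrated with a short Python sketch. It is not miRNAminer's code (the tool itself is written in Java), and it covers only the length, folding-energy, base-pairing, hairpin-shape, mature-identity, mismatch and seed filters. It assumes that BLAST and RNAfold have already been run, so each candidate arrives with a precomputed dot-bracket structure and folding energy, and that the candidate's mature sequence is already aligned to the query without gaps. All names (`Thresholds`, `Candidate`, `passes_filters`) are hypothetical, and the pairing measure used here (fraction of paired positions in the dot-bracket string) is a simplifying assumption rather than the alignment-based measure the paper describes.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Default values taken from the step list; field names are hypothetical."""
    min_len: int = 70                  # minimal precursor length (nt)
    max_len: int = 180                 # maximal precursor length (nt)
    max_energy: float = -25.0          # folding energy threshold (kcal/mol)
    min_pairing: float = 0.55          # minimal base-pairing fraction
    min_mature_identity: float = 0.80  # minimal mature miRNA identity
    max_mature_mismatches: int = 3     # maximal mismatches in mature miRNA
    require_seed: bool = True          # require 100% seed conservation


@dataclass
class Candidate:
    precursor: str   # extended genomic match
    structure: str   # dot-bracket structure, assumed precomputed by RNAfold
    energy: float    # folding energy in kcal/mol, assumed precomputed
    mature: str      # candidate mature sequence, gap-free aligned to the query


def pairing_fraction(structure: str) -> float:
    """Fraction of positions that are base-paired (simplified measure)."""
    if not structure:
        return 0.0
    return sum(c in "()" for c in structure) / len(structure)


def is_single_hairpin(structure: str) -> bool:
    """True if every '(' precedes every ')': one stem with one terminal loop.

    Bracket balance is not verified; the structure is assumed well-formed.
    """
    if "(" not in structure or ")" not in structure:
        return False
    return structure.rfind("(") < structure.find(")")


def count_mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def seed_conserved(query_mature: str, cand_mature: str) -> bool:
    """Compare nt 2-8 (1-based), i.e. the slice [1:8] in 0-based indexing."""
    return query_mature[1:8] == cand_mature[1:8]


def passes_filters(c: Candidate, query_mature: str,
                   t: Thresholds = Thresholds()) -> bool:
    """Apply the filters in order; reject at the first one that fails."""
    if not query_mature:
        raise ValueError("query mature sequence must not be empty")
    if not (t.min_len <= len(c.precursor) <= t.max_len):
        return False
    if c.energy > t.max_energy:        # less negative means less stable
        return False
    if pairing_fraction(c.structure) < t.min_pairing:
        return False
    if not is_single_hairpin(c.structure):
        return False
    mismatches = count_mismatches(query_mature, c.mature)
    identity = 1.0 - mismatches / len(query_mature)
    if identity < t.min_mature_identity or mismatches > t.max_mature_mismatches:
        return False
    if t.require_seed and not seed_conserved(query_mature, c.mature):
        return False
    return True
```

Rejecting at the first failed filter mirrors the sequential description in the text; the precursor-alignment filter and the mature-position filter are omitted because they need a gapped alignment and loop coordinates that this sketch does not model.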
We used miRNAminer to perform a comprehensive homology search for miRNA precursors in seven species (human, chimpanzee, mouse, rat, dog, cow and opossum). For the search, we used all 2925 vertebrate miRNAs listed in the Sanger miRNA registry (release 9.0 of October 2006). Table 1 shows the summary information of miRNAs listed in the Sanger registry and of new, or non-registered, miRNAs identified by our method. We identified 790 non-miRbase registered miRNAs with major contributions to chimp (P. troglodytes), dog (C. familiaris) and cow (B. taurus), vastly increasing their known miRNA repertoire and possibly opening new research facets in these species (see Additional file 1). Table 2 presents the miRNA candidates that our method identifies in human (H. sapiens). Notably, 22 new candidate miRNAs were identified in human despite many previous exhaustive human miRNA identification studies [8, 21], possibly due to the recent identification of non-human/primate miRNAs, the updated assembly of the human genome (Ensembl H. sapiens genome release 42, used here, was updated in October 2006), and the modification of search parameters as implemented in miRNAminer.
We compared our predicted human miRNAs (Table 2) with those of other prediction methods. We found that 18% and 36% of our miRNAs are contained within the RNAmicro [23, 5] and Berezikov 2005 databases, respectively, out of almost 3000 and 1000 miRNAs in each set, respectively. The overlap is not extensive; however, even when miRNAs derived from algorithms using very similar search parameters are compared, only about 50% overlap is seen. One of our identified miRNAs, which is not reported by any other study, was recently identified experimentally (present in miRbase version 10.0 and not 9.0), increasing confidence in our unique miRNAs. Notably, even though miRNAs from all species were used to search for human homologs, the candidate miRNAs discovered are homologs of genes in two species only, M. musculus and R. norvegicus, indicating better miRbase coverage for mouse and rat than for other species. Two examples of the non-registered human homolog miRNA genes are presented in Figure 2. The miRbase mouse miR-764 sequence, which has no known registered homologs, was used as input for a miRNAminer search (with default parameters). The output reported a homolog (presumably hsa-miR-764; see Table 2), which is located in the second intron of human serotonin receptor 2C (HTR2C; NM_000868; Figure 2A). The mouse miRNA homolog is located in an intron of the same gene (HTR2C; NM_008312), suggesting an evolutionarily conserved co-expression of the miRNA and its host gene [27–29]. Second in the list of non-registered human homolog miRNA genes (sorted according to BLAST e-values) is miR-763. This miRNA spans the longest complementary sequence in the list, has the lowest RNA folding energy (Table 2) and shows high conservation between many species (Figure 2C). Interestingly, human miR-763 is harboured in an intron of the high mobility group AT-hook 2 oncogene (HMGA2; for review see [30, 31]). Recent studies showed that disrupting the interplay between miRNAs and HMGA2 increases oncogenesis [32–34].
To regulate their targets, miRNAs bind to 'seed' regions in the 3'UTR, typically 6–7 nt long (nt 2 to nt 7 or 8 of the miRNA [15, 35, 36]; also see ). The host gene of miR-763, possibly co-expressed with the miRNA [27–29], has a conserved binding site for its own harboured miRNA (nt 2 to 8 of the miRNA bind position 2192 in the 3'UTR, which is conserved in human/mouse/rat). It is tempting to speculate on a negative feedback regulatory role between the newly identified human miR-763 and its oncogenic host when co-expressed in the same spatio-temporal context. Data from Expressed Sequence Tags (ESTs) support this possibility (Figure 2C). Other identified miRNAs presented in Table 2 show high species conservation (for example, miR-670) or are located in exons (for example, miR-711) or exon-intron junctions (for example, miR-762). Interestingly, a recent deep-sequencing study confirmed four of our predicted human miRNAs (miR-760, 708, 543, and 665, available in miRbase version 10.0). Conclusively confirming the presence of the identified candidates in the studied species requires experimental verification. However, the candidates identified by our method are close homologs of known miRNAs and as such are not required to meet criteria as stringent as those for annotating novel miRNAs. In this study we looked at homolog genes, which are genes related to each other by descent from a common ancestral DNA sequence. We do not distinguish between orthologs, genes in different species that evolved from a common ancestral gene by speciation, and paralogs, genes separated by a gene duplication event. We also cannot rule out that similar miRNAs in different species have developed independently. Our tool, which is based on evolutionary conservation, can only detect evolutionarily conserved miRNA genes.
We are currently improving our algorithm to include multiple alignments of vertebrate miRNA sequences in order to better refine the boundaries of the miRNA precursor sequence.
For searches with relaxed parameters (reduced stringency) we suggest initially making the following modifications: (i) do not 'Require seed conservation in mature miRNA (nt 2–8)' (uncheck box); (ii) increase 'maximal number of gaps in miRNA precursor alignment' from 10 (default) to 15; (iii) decrease 'minimal mature miRNA identity' from 0.8 (default) to 0.7; (iv) decrease 'minimal base pairing percentage in miRNA precursor' from 55 (default) to 40; and (v) change 'minimal/maximal length of precursor sequence (nt)' from 70/180 (default) to 50/250. To view miRNAs other than the top candidate, we suggest increasing the 'number of results to report' from 1 (default) to 5. The parameters (i–v) above are listed in the order that would output an increasing total number of identified miRNAs. For example, reducing mature miRNA identity from 0.8 (default) to 0.7 increases the number of identified miRNAs from 22 to 24 (9%) in human and from 31 to 36 (16%) in mouse. On the other hand, we found that changing the length of the miRNA precursor from 70–180 nt (default) to 50–250 nt added only 1 miRNA in human and none in mouse. This, however, might change when run in combination with other modified parameters. Overall, each of the modified parameters listed above, applied independently, results in an average increase of 11% in identified miRNAs when tested on seven mammalian species.
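The suggested relaxations can be written down as a small configuration sketch, together with the percentage arithmetic behind the quoted increases. The dictionary keys are hypothetical labels chosen for illustration and are not miRNAminer's actual parameter names; the values are the defaults and relaxed settings stated above.

```python
# Default settings as stated in the text; key names are hypothetical.
DEFAULTS = {
    "require_seed_conservation": True,
    "max_precursor_alignment_gaps": 10,
    "min_mature_identity": 0.8,
    "min_base_pairing_percent": 55,
    "min_precursor_length": 70,
    "max_precursor_length": 180,
    "results_to_report": 1,
}

# Suggested relaxed settings (i)-(v), plus the larger result count.
RELAXED_OVERRIDES = {
    "require_seed_conservation": False,
    "max_precursor_alignment_gaps": 15,
    "min_mature_identity": 0.7,
    "min_base_pairing_percent": 40,
    "min_precursor_length": 50,
    "max_precursor_length": 250,
    "results_to_report": 5,
}


def relaxed(defaults: dict, overrides: dict) -> dict:
    """Return a new parameter set with the overrides applied."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {**defaults, **overrides}


def percent_increase(before: int, after: int) -> float:
    """Relative increase in the number of identified miRNAs, in percent."""
    if before <= 0:
        raise ValueError("'before' must be positive")
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    # Relax only the mature-identity threshold, as in the example in the text.
    params = relaxed(DEFAULTS, {"min_mature_identity": 0.7})
    print(params["min_mature_identity"])
    print(round(percent_increase(22, 24)))  # human: 22 -> 24
    print(round(percent_increase(31, 36)))  # mouse: 31 -> 36
```

Applying one override at a time, as in the usage lines, matches the way the text reports the effect of each parameter independently.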
We estimated miRNAminer's sensitivity (Table 3) and specificity. The sensitivity, on seven mammalian species, is 0.88. Sensitivity for a species is the proportion of the species' miRNAs with known homologs that are detected by miRNAminer using miRNAs from all other species. We used only miRNAs that miRbase lists for more than one species. Sensitivity measures are higher in chimp (0.94), mouse (0.88) and rat (0.91) than in human (0.85). To estimate specificity, we used miRNAminer to search for miRNA homologs in C. elegans, which has a large evolutionary distance from the studied mammals. We treated as false positives all hits reported by miRNAminer that were not identified as homologs by previous studies. This conservative treatment may overestimate the number of false positives. Using 1375 miRNAs from the seven studied mammalian species, miRNAminer detected, in C. elegans, two known homologs (let-7 and mir-124) and reported only five false positives.
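The sensitivity definition above amounts to a simple ratio, sketched below in Python. The counts passed to `sensitivity` in the usage lines are made up for illustration, since the per-species counts are in Table 3 and not reproduced here. The per-query false-positive rate is a derived summary of the C. elegans experiment (five false positives from 1375 query miRNAs) and is not a figure reported in the text.

```python
def sensitivity(detected: int, known: int) -> float:
    """Share of miRNAs with known homologs that the search recovers."""
    if known <= 0:
        raise ValueError("'known' must be positive")
    if not 0 <= detected <= known:
        raise ValueError("'detected' must lie between 0 and 'known'")
    return detected / known


def false_positives_per_query(false_positives: int, queries: int) -> float:
    """False-positive hits per query miRNA (a derived summary, see lead-in)."""
    if queries <= 0:
        raise ValueError("'queries' must be positive")
    return false_positives / queries


if __name__ == "__main__":
    # Hypothetical counts: 88 of 100 miRNAs with known homologs detected.
    print(sensitivity(88, 100))
    # C. elegans experiment from the text: 5 false positives, 1375 queries.
    print(round(false_positives_per_query(5, 1375), 4))
```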
Several approaches to identify miRNA homologs have been previously described, both in plants and in animals [5, 41, 42]. However, the only tool that is available as a web service, microHARVESTER, is targeted at plants. miRNAminer is the first available miRNA gene homolog search tool for animal genomes.
Availability and requirements
Project name: miRNAminer; Project home page: http://pag.csail.mit.edu/mirnaminer; Operating system: Platform independent; Programming language: Java; License: Open source, see http://opensource.org/licenses/mit-licence.php; Code is available upon request. miRNAs identified using miRNAminer will be incorporated in next miRbase versions, see http://microrna.sanger.ac.uk.
Bushati N, Cohen SM: microRNA Functions. Annu Rev Cell Dev Biol 2007, 23: 175–205. 10.1146/annurev.cellbio.23.090506.123406
Carthew RW: Gene regulation by microRNAs. Curr Opin Genet Dev 2006, 16(2):203–208. 10.1016/j.gde.2006.02.012
Hornstein E, Mansfield JH, Yekta S, Hu JK, Harfe BD, McManus MT, Baskerville S, Bartel DP, Tabin CJ: The microRNA miR-196 acts upstream of Hoxb8 and Shh in limb development. Nature 2005, 438(7068):671–674. 10.1038/nature04138
Hornstein E, Shomron N: Canalization of development by microRNAs. Nat Genet 2006, 38 Suppl: S20–4. 10.1038/ng1803
Hertel J, Stadler PF: Hairpins in a Haystack: recognizing microRNA precursors in comparative genomics data. Bioinformatics 2006, 22(14):e197–202. 10.1093/bioinformatics/btl257
Sempere LF, Cole CN, McPeek MA, Peterson KJ: The phylogenetic distribution of metazoan microRNAs: insights into evolutionary complexity and constraint. J Exp Zoolog B Mol Dev Evol 2006, 306(6):575–588. 10.1002/jez.b.21118
Hertel J, Lindemeyer M, Missal K, Fried C, Tanzer A, Flamm C, Hofacker IL, Stadler PF: The expansion of the metazoan microRNA repertoire. BMC Genomics 2006, 7: 25. 10.1186/1471-2164-7-25
Bentwich I, Avniel A, Karov Y, Aharonov R, Gilad S, Barad O, Barzilai A, Einat P, Einav U, Meiri E, Sharon E, Spector Y, Bentwich Z: Identification of hundreds of conserved and nonconserved human microRNAs. Nat Genet 2005, 37(7):766–770. 10.1038/ng1590
Berezikov E, Thuemmler F, van Laake LW, Kondova I, Bontrop R, Cuppen E, Plasterk RH: Diversity of microRNAs in human and chimpanzee brain. Nat Genet 2006, 38(12):1375–1377. 10.1038/ng1914
Saunders MA, Liang H, Li WH: Human polymorphism at microRNAs and microRNA target sites. Proc Natl Acad Sci U S A 2007, 104(9):3300–3305. 10.1073/pnas.0611347104
Lai EC, Tomancak P, Williams RW, Rubin GM: Computational identification of Drosophila microRNA genes. Genome Biol 2003, 4(7):R42. 10.1186/gb-2003-4-7-r42
Lim LP, Glasner ME, Yekta S, Burge CB, Bartel DP: Vertebrate microRNA genes. Science 2003, 299(5612):1540. 10.1126/science.1080372
Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ: miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 2006, 34(Database issue):D140–4. 10.1093/nar/gkj112
miRbase database contains all published miRNA sequences, genomic locations and associated annotation.[http://microrna.sanger.ac.uk]
Lewis BP, Burge CB, Bartel DP: Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 2005, 120(1):15–20. 10.1016/j.cell.2004.12.035
Ambros V, Bartel B, Bartel DP, Burge CB, Carrington JC, Chen X, Dreyfuss G, Eddy SR, Griffiths-Jones S, Marshall M, Matzke M, Ruvkun G, Tuschl T: A uniform system for microRNA annotation. Rna 2003, 9(3):277–279. 10.1261/rna.2183803
Ensembl database produces and maintains automatic annotations on selected eukaryotic genomes.[http://www.ensembl.org]
RNA secondary structure prediction of the RNAfold program[http://www.tbi.univie.ac.at/RNA]
Berezikov E, van Tetering G, Verheul M, van de Belt J, van Laake L, Vos J, Verloop R, van de Wetering M, Guryev V, Takada S, van Zonneveld AJ, Mano H, Plasterk R, Cuppen E: Many novel mammalian microRNA candidates identified by extensive cloning and RAKE analysis. Genome Res 2006, 16(10):1289–1298. 10.1101/gr.5159906
Mineno J, Okamoto S, Ando T, Sato M, Chono H, Izu H, Takayama M, Asada K, Mirochnitchenko O, Inouye M, Kato I: The expression profile of microRNAs in mouse embryos. Nucleic Acids Res 2006, 34(6):1765–1771. 10.1093/nar/gkl096
Berezikov E, Guryev V, van de Belt J, Wienholds E, Plasterk RH, Cuppen E: Phylogenetic shadowing and computational identification of human microRNA genes. Cell 2005, 120(1):21–24. 10.1016/j.cell.2004.12.031
Terai G, Komori T, Asai K, Kin T: miRRim: A novel system to find conserved miRNAs with high sensitivity and specificity. Rna 2007.
Cai X, Cullen BR: The imprinted H19 noncoding RNA is a primary microRNA precursor. Rna 2007, 13(3):313–316. 10.1261/rna.351707
Baskerville S, Bartel DP: Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. Rna 2005, 11(3):241–247. 10.1261/rna.7240905
Liang Y, Ridzon D, Wong L, Chen C: Characterization of microRNA expression profiles in normal human tissues. BMC Genomics 2007, 8: 166. 10.1186/1471-2164-8-166
Rodriguez A, Griffiths-Jones S, Ashurst JL, Bradley A: Identification of mammalian microRNA host genes and transcription units. Genome Res 2004, 14(10A):1902–1910. 10.1101/gr.2722704
Fedele M, Pierantoni GM, Visone R, Fusco A: Critical role of the HMGA2 gene in pituitary adenomas. Cell Cycle 2006, 5(18):2045–2048.
Reeves R: Molecular biology of HMGA proteins: hubs of nuclear function. Gene 2001, 277(1–2):63–81. 10.1016/S0378-1119(01)00689-8
Lee YS, Dutta A: The tumor suppressor microRNA let-7 represses the HMGA2 oncogene. Genes Dev 2007, 21(9):1025–1030. 10.1101/gad.1540407
Mayr C, Hemann MT, Bartel DP: Disrupting the pairing between let-7 and Hmga2 enhances oncogenic transformation. Science 2007, 315(5818):1576–1579. 10.1126/science.1137999
Shell S, Park SM, Radjabi AR, Schickel R, Kistner EO, Jewell DA, Feig C, Lengyel E, Peter ME: Let-7 expression defines two differentiation stages of cancer. Proc Natl Acad Sci U S A 2007, 104(27):11400–11405. 10.1073/pnas.0704372104
Brennecke J, Stark A, Russell RB, Cohen SM: Principles of microRNA-target recognition. PLoS Biol 2005, 3(3):e85. 10.1371/journal.pbio.0030085
Krek A, Grun D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M, Rajewsky N: Combinatorial microRNA target predictions. Nat Genet 2005, 37(5):495–500. 10.1038/ng1536
Nielsen CB, Shomron N, Sandberg R, Hornstein E, Kitzman J, Burge CB: Determinants of targeting by endogenous and exogenous microRNAs and siRNAs. Rna 2007, 13(11):1894–1910. 10.1261/rna.768207
Landgraf P, Rusu M, Sheridan R, Sewer A, Iovino N, Aravin A, Pfeffer S, Rice A, Kamphorst AO, Landthaler M, Lin C, Socci ND, Hermida L, Fulci V, Chiaretti S, Foa R, Schliwka J, Fuchs U, Novosel A, Muller RU, Schermer B, Bissels U, Inman J, Phan Q, Chien M, Weir DB, Choksi R, De Vita G, Frezzetti D, Trompeter HI, Hornung V, Teng G, Hartmann G, Palkovits M, Di Lauro R, Wernet P, Macino G, Rogler CE, Nagle JW, Ju J, Papavasiliou FN, Benzing T, Lichter P, Tam W, Brownstein MJ, Bosio A, Borkhardt A, Russo JJ, Sander C, Zavolan M, Tuschl T: A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 2007, 129(7):1401–1414. 10.1016/j.cell.2007.04.040
Arteaga-Vazquez M, Caballero-Perez J, Vielle-Calzada JP: A family of microRNAs present in plants and animals. Plant Cell 2006, 18(12):3355–3369. 10.1105/tpc.106.044420
Dezulian T, Remmert M, Palatnik JF, Weigel D, Huson DH: Identification of plant microRNA homologs. Bioinformatics 2006, 22(3):359–360. 10.1093/bioinformatics/bti802
Wang X, Zhang J, Li F, Gu J, He T, Zhang X, Li Y: MicroRNA identification based on sequence and structure alignment. Bioinformatics 2005, 21(18):3610–3614. 10.1093/bioinformatics/bti562
Weber MJ: New human and mouse microRNA genes found by homology search. Febs J 2005, 272(1):59–73. 10.1111/j.1432-1033.2004.04389.x
UCSC genome browser contains the reference sequences and working draft assemblies for a large collection of genomes.[http://genome.ucsc.edu]
NS conceived the study. SA, AK and NS planned and designed the algorithm and web-server. SA and AK wrote the code. NS analyzed the output. SA, AK and NS wrote the paper. All authors read and approved the manuscript.
Shay Artzi and Adam Kiezun contributed equally to this work.
Electronic supplementary material
Additional file 1: Non-miRbase registered miRNAs. A list of 790 miRNAs that were identified using miRNAminer. These miRNAs add more than 50% to the total count of miRNAs in the seven mammalian species tested: human, chimpanzee, mouse, rat, dog, cow and opossum, and are available at: http://web.mit.edu/nshomron/www/miRNAminer_SM1.zip (ZIP 67 KB)
Cite this article
Artzi, S., Kiezun, A. & Shomron, N. miRNAminer: A tool for homologous microRNA gene search. BMC Bioinformatics 9, 39 (2008). https://doi.org/10.1186/1471-2105-9-39