miRNAminer: A tool for homologous microRNA gene search
- Shay Artzi†1,
- Adam Kiezun†1 and
- Noam Shomron1
https://doi.org/10.1186/1471-2105-9-39
© Artzi et al; licensee BioMed Central Ltd. 2008
Received: 28 September 2007
Accepted: 23 January 2008
Published: 23 January 2008
Abstract
Background
MicroRNAs (miRNAs), present in most metazoans, are small non-coding RNAs that control gene expression by negatively regulating translation through binding to the 3'UTR of mRNA transcripts. Previously, experimental and computational methods were used to construct miRNA gene repositories in accordance with careful submission guidelines.
Results
We developed an algorithm, miRNAminer, for homologous conserved miRNA gene search in several animal species. Given a search query, candidate homologs from different species are tested for known miRNA properties, such as secondary structure, energy, alignment and conservation, in order to assess their fidelity. Applying miRNAminer to seven mammalian species, we identified several hundred high-confidence homologous miRNAs, increasing the total collection of miRbase miRNAs in these species by more than 50%. miRNAminer uses stringent criteria and exhibits high sensitivity and specificity.
Conclusion
We present miRNAminer, the first web server for homologous miRNA gene search in animals. miRNAminer can be used to identify conserved homolog miRNA genes and can also be used prior to depositing miRNAs in public databases. miRNAminer is available at http://pag.csail.mit.edu/mirnaminer.
Background
MicroRNAs (miRNAs) are short, ~22 nt non-coding RNAs that control gene expression. miRNAs bind to the 3'UTRs of their regulated mRNA transcripts to facilitate mRNA degradation or translation inhibition [1, 2]. miRNAs are present in most metazoans and are thought to regulate a diverse range of biological processes [3, 4]. The evolution of miRNA genes is distinctive: they mostly emerge through duplication events [5]; most often exhibit unidirectional evolution [6]; are generally gained and not lost [7]; show several cases of rapid evolution in primates [8, 9]; rarely change, owing to functional constraints [10]; and show relatively rare evolutionary acquisition events (as reflected in their relatively small numbers).
miRNA predictions based on computational methods, which followed initial extensive cloning efforts, rely on the secondary structure of the miRNA, its phylogenetic conservation and thermodynamic stability [11, 12]. miRNA gene repositories are constantly expanding, with more than 3500 miRNAs reported in more than 30 animal species (Sanger miRbase database, Version 10.0 [13, 14]). However, even this comprehensive repository is apparently far from complete, as shown by the very few miRNAs listed for dog (6) and chimpanzee (83) compared to human (533), to name two examples. Since these differences cannot be explained merely by species-specific miRNAs, we saw the need for a computational tool for miRNA homologous searches.
Implementation
We present miRNAminer, a tool for automatic identification of homolog miRNAs based on a user-defined query miRNA. The tool exploits numerous characteristics of miRNAs: high conservation of precursor sequences, very high conservation of mature sequences (particularly in the seed region, nt 2–8 [15]), and hairpin secondary structure with high folding energy and base pairing. miRNAminer first uses BLAST [16] to select candidate matches and ranks them according to their e-values. It then applies a series of rigorous filters to improve specificity.
An input query consists of a precursor miRNA, a mature miRNA, a set of filter threshold values and the number of best-fitted results requested in the output. We designed miRNAminer's algorithm to maximize the specificity of matches, because the intended application of miRNAminer is to identify homolog matches after a miRNA has been experimentally confirmed. We estimated the default values presented below so that each filter by itself selects 95% of known miRNAs in training genomes (the criteria were also based on [17]).
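The rank-then-filter flow described above can be sketched as follows. This is a minimal illustration, not the tool's actual Java implementation: the `Candidate` record, the field names and the function names are hypothetical, and the threshold defaults are the ones the article quotes when it discusses relaxed searches. The folding-energy filter is left out because its default value is not given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Filter thresholds; defaults follow the values the article lists as defaults."""
    require_seed_conservation: bool = True
    max_precursor_gaps: int = 10
    min_mature_identity: float = 0.8
    min_base_pairing_pct: float = 55.0
    min_precursor_len: int = 70
    max_precursor_len: int = 180
    num_results: int = 1


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one BLAST hit after folding and alignment."""
    name: str
    e_value: float
    precursor_len: int
    precursor_gaps: int
    mature_identity: float    # fraction in [0, 1]
    base_pairing_pct: float   # percent of paired bases in the hairpin
    seed_conserved: bool      # nt 2-8 of the mature miRNA match the query


def passes_filters(c: Candidate, t: Thresholds) -> bool:
    """Return True only if the candidate satisfies every threshold."""
    if t.require_seed_conservation and not c.seed_conserved:
        return False
    return (
        t.min_precursor_len <= c.precursor_len <= t.max_precursor_len
        and c.precursor_gaps <= t.max_precursor_gaps
        and c.mature_identity >= t.min_mature_identity
        and c.base_pairing_pct >= t.min_base_pairing_pct
    )


def select_homologs(candidates, t=Thresholds()):
    """Rank hits by e-value (best first), filter, keep the top num_results."""
    ranked = sorted(candidates, key=lambda c: c.e_value)
    return [c for c in ranked if passes_filters(c, t)][: t.num_results]


# Made-up hits, for illustration only.
hits = [
    Candidate("hit_a", 1e-5, 90, 2, 0.95, 70.0, True),
    Candidate("hit_b", 1e-20, 60, 0, 1.00, 80.0, True),  # precursor too short
    Candidate("hit_c", 1e-12, 85, 1, 0.90, 62.0, True),
]
print([c.name for c in select_homologs(hits)])
```

Here `hit_b` has the best e-value but is rejected by the length filter, so the single reported result is `hit_c`.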
A sample output of miRNAminer. Indicated are: search start and end times (rows 1 and 24, respectively) and the species and assembly searched (rows 2–3); whether a match (a miRNA homolog), or matches, that passed the input criteria were found (rows 4–5); information about the quality of the homolog miRNA match such as BLAST e-value, genomic location, sequence, RNA fold and energy, pairing, length and alignment with input sequence (rows 6–19). A hyperlink to the genomic locus of the miRNA homolog is also provided through Ensembl ContigView [18] or UCSC Genome Browser [43] (rows 20–23) and a copy of the users' input data (row 25).
Results
Known registered miRbase miRNAs and new candidates identified by miRNAminer.
Genome | miRbase 9.0 | Newly identified | Sum |
---|---|---|---|
Human | 474 | 22 | 496 |
Chimpanzee | 83 | 251 | 334 |
Mouse | 373 | 31 | 404 |
Rat | 234 | 74 | 308 |
Dog | 6 | 228 | 234 |
Cow | 98 | 131 | 229 |
Opossum | 107 | 53 | 160 |
Total | 1375 | 790 | 2165 |
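The "more than 50%" increase stated in the abstract can be checked directly from the table. The short sketch below copies the counts from the rows above and computes the relative increase per genome and overall; the variable and function names are illustrative only.

```python
# genome -> (miRbase 9.0 entries, newly identified by miRNAminer), from the table
counts = {
    "Human": (474, 22),
    "Chimpanzee": (83, 251),
    "Mouse": (373, 31),
    "Rat": (234, 74),
    "Dog": (6, 228),
    "Cow": (98, 131),
    "Opossum": (107, 53),
}


def percent_increase(known, new):
    """Newly identified miRNAs as a percentage of the registered ones."""
    return 100.0 * new / known


total_known = sum(known for known, _ in counts.values())
total_new = sum(new for _, new in counts.values())

for genome, (known, new) in counts.items():
    print(f"{genome}: {known + new} total, +{percent_increase(known, new):.0f}%")
print(f"Overall: {total_known + total_new} total, "
      f"+{percent_increase(total_known, total_new):.0f}%")
```

The totals reproduce the last row of the table (1375 registered, 790 new, 2165 in sum), and the overall increase comes to about 57%. Most of it comes from the sparsely annotated genomes such as dog and chimpanzee.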
Candidate non-registered miRNAs identified by miRNAminer in human.
Original miRNA | Original length (nt) | Human chr | Human position | Human length (nt) | e-value | RNA fold: base pair % | RNA fold: ΔG (kcal/mol) | Identity with original: mature % | Identity with original: precursor % |
---|---|---|---|---|---|---|---|---|---|
mmu-mir-759 | 98 | 13 | 52282180 | 96 | 3.39E-048 | 72.9 | -32 | 100 | 100 |
mmu-mir-763 | 120 | 12 | 64538060 | 120 | 4.99E-048 | 76.7 | -57 | 95 | 95 |
mmu-mir-760 | 119 | 1 | 94084955 | 120 | 9.15E-048 | 66.7 | -55 | 100 | 95 |
mmu-mir-708 | 109 | 11 | 78790709 | 109 | 1.66E-041 | 62.4 | -50 | 100 | 94 |
mmu-mir-543 | 76 | 14 | 100568079 | 72 | 7.11E-027 | 61.1 | -22 | 100 | 95 |
rno-mir-543 | 80 | 14 | 100568079 | 72 | 7.46E-027 | 61.1 | -22 | 100 | 96 |
mmu-mir-670 | 100 | 11 | 43537789 | 89 | 7.67E-025 | 71.9 | -36 | 100 | 91 |
mmu-mir-762 | 76 | 16 | 30812726 | 72 | 1.45E-024 | 77.8 | -54 | 91 | 94 |
mmu-mir-764 | 108 | X | 113780174 | 96 | 3.83E-024 | 66.7 | -42 | 95 | 91 |
mmu-mir-675 | 84 | 11 | 1974559 | 84 | 5.96E-022 | 71.4 | -53 | 95 | 92 |
mmu-mir-711 | 82 | 3 | 48591339 | 74 | 1.24E-014 | 59.5 | -31 | 91 | 88 |
mmu-mir-665 | 94 | 14 | 100411119 | 86 | 3.10E-014 | 65.1 | -39 | 91 | 87 |
rno-mir-664 | 59 | 1 | 218440516 | 70 | 9.59E-012 | 60 | -26 | 95 | 92 |
mmu-mir-322 | 95 | X | 133508324 | 91 | 2.72E-009 | 76.9 | -47 | 95 | 85 |
rno-mir-322 | 95 | X | 133508327 | 87 | 2.72E-009 | 80.5 | -45 | 95 | 85 |
mmu-mir-718 | 88 | X | 152938565 | 70 | 9.78E-009 | 71.4 | -39 | 90 | 87 |
mmu-mir-709 | 88 | 3 | 186851919 | 71 | 1.21E-005 | 62 | -27 | 100 | 70 |
mmu-mir-466 | 73 | 2 | 35362302 | 78 | 4.60E-005 | 61.5 | -21 | 95 | 72 |
rno-mir-292 | 82 | 19 | 58982746 | 70 | 8.80E-004 | 65.7 | -31 | 86 | 77 |
mmu-mir-669 | 97 | 17 | 69204319 | 71 | 1.33E-003 | 62 | -22 | 90 | 86 |
mmu-mir-705 | 82 | 22 | 45887964 | 72 | 2.70E-003 | 58.3 | -22 | 100 | 58 |
mmu-mir-207 | 79 | 17 | 73276373 | 70 | 4.10E-003 | 60 | -25 | 96 | 62 |
mmu-mir-720 | 64 | 3 | 165541839 | 71 | 3.05E-002 | 59.2 | -21 | 100 | 89 |
mmu-mir-761 | 76 | 1 | 110433694 | 71 | 4.76E-002 | 62 | -31 | 81 | 84 |
Two examples of miRNAs not registered in miRbase that were identified using the miRNAminer web server. (A) Human miR-764 was identified using the miRbase mouse miR-764 sequence as input (with default parameters) for a miRNAminer search. The output reported a homolog (presumably hsa-miR-764), which is located in the second intron of the human serotonin receptor 2C gene (HTR2C; NM_000868). The mouse miRNA homolog is located in an intron of the same gene (HTR2C; NM_008312), suggesting an evolutionarily conserved co-expression of the miRNA and its host gene [27-29]. High conservation is seen in this region (mountain-like graph derived from the UCSC Genome Browser 17-species multiZ alignment; [43]). Black rectangles represent exons (shorter rectangles in C are UTRs), lines are introns and dark-grey rectangles are miRNA genes. (B) The RNA secondary structures of the identified human (top) and mouse (bottom) miR-764 exhibit similar thermodynamic stability (41.8/49.9 kcal/mol, respectively) and similar structures (mature miRNA region is underlined). The human miR-764 homolog was also identified by Berezikov et al. [21]. (C) Human miR-763, not registered in miRbase, is highly conserved among vertebrate species and can potentially bind its own host gene. Top: a schematic, not-to-scale representation of the HMGA2 transcript (NM_003483; human miR-763 is in dark grey; conservation plot as shown in A). Expressed Sequence Tags (ESTs; light-grey bars) are evidence for the expression of this particular genomic region. ESTs from top to bottom: BM715067 (isolated from eye-related tissue); BJ997562 (isolated from Wilms tumor tissue); BU39975 (isolated from eye-related tissue); AI935081 (tissue source unknown). Right: the potential binding site of miR-763 in the HMGA2 3'UTR (nt 2–8 of the miRNA; positions 2192–2198) is conserved in human, mouse and rat.
For searches with relaxed parameters (reduced stringency) we suggest initially making the following modifications: (i) do not 'Require seed conservation in mature miRNA (nt 2–8)' (uncheck the box); (ii) increase 'maximal number of gaps in miRNA precursor alignment' from 10 (default) to 15; (iii) decrease 'minimal mature miRNA identity' from 0.8 (default) to 0.7; (iv) decrease 'minimal base pairing percentage in miRNA precursor' from 55 (default) to 40; and (v) change 'minimal/maximal length of precursor sequence (nt)' from 70/180 (default) to 50/250. To view candidate miRNAs other than the top one, we suggest increasing the 'number of results to report' from 1 (default) to 5. The parameters (i–v) above are listed in the order that would output an increasing total number of identified miRNAs. For example, reducing mature miRNA identity from 0.8 (default) to 0.7 increases the number of identified miRNAs from 22 to 24 (9%) in human and from 31 to 36 (16%) in mouse. On the other hand, we found that changing the length of the miRNA precursor from 70–180 nt (default) to 50–250 nt added only 1 miRNA in human and none in mouse. This, however, might change when run in combination with other modified parameters. Overall, each of the modified parameters listed above, applied independently, results in an average increase of 11% in identified miRNAs when tested on seven mammalian species.
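The default and relaxed settings above can be written down as plain data, which makes it easy to apply the suggested relaxations one step at a time. The dictionary keys below are illustrative names, not the web form's actual field identifiers, and applying the steps cumulatively is one possible reading of the suggestion; the values and the worked percentages are taken from the text.

```python
# Default settings as quoted in the text (key names are hypothetical).
DEFAULTS = {
    "require_seed_conservation": True,
    "max_precursor_gaps": 10,
    "min_mature_identity": 0.8,
    "min_base_pairing_pct": 55,
    "precursor_length_range": (70, 180),
    "number_of_results": 1,
}

# Suggested relaxations (i)-(v), in the order given in the text.
RELAXATIONS = [
    ("require_seed_conservation", False),
    ("max_precursor_gaps", 15),
    ("min_mature_identity", 0.7),
    ("min_base_pairing_pct", 40),
    ("precursor_length_range", (50, 250)),
]


def relaxed(settings, upto):
    """Return a copy of settings with the first `upto` relaxations applied."""
    out = dict(settings)
    out.update(RELAXATIONS[:upto])
    return out


def relative_increase_pct(before, after):
    """Percentage increase in the number of identified miRNAs."""
    return 100.0 * (after - before) / before


# Worked examples from the text: lowering mature identity from 0.8 to 0.7.
print(f"human: {relative_increase_pct(22, 24):.0f}%")
print(f"mouse: {relative_increase_pct(31, 36):.0f}%")
print(relaxed(DEFAULTS, 3))
```

The two printed percentages match the 9% and 16% quoted above. `relaxed(DEFAULTS, 3)` applies modifications (i) to (iii) and leaves the base-pairing and length limits at their defaults.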
Sensitivity of miRNAminer.
Genome | Found | Not found | Sensitivity |
---|---|---|---|
Human | 179 | 31 | 0.85 |
Chimpanzee | 63 | 4 | 0.94 |
Mouse | 184 | 24 | 0.88 |
Rat | 154 | 15 | 0.91 |
Dog | 5 | 0 | 1.00 |
Cow | 58 | 14 | 0.81 |
Opossum | 71 | 8 | 0.90 |
Total | 714 | 96 | 0.88 |
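The sensitivity column is consistent with the usual definition, found / (found + not found). The sketch below recomputes it from the table's counts; the names are illustrative, and the formula is assumed from the standard definition of sensitivity rather than stated explicitly in this section.

```python
# genome -> (known miRNAs found, known miRNAs not found), from the table
rows = {
    "Human": (179, 31),
    "Chimpanzee": (63, 4),
    "Mouse": (184, 24),
    "Rat": (154, 15),
    "Dog": (5, 0),
    "Cow": (58, 14),
    "Opossum": (71, 8),
}


def sensitivity(found, not_found):
    """Fraction of known miRNAs that the search recovered."""
    return found / (found + not_found)


total_found = sum(found for found, _ in rows.values())
total_missed = sum(missed for _, missed in rows.values())

for genome, (found, missed) in rows.items():
    print(f"{genome}: {sensitivity(found, missed):.2f}")
print(f"Total: {sensitivity(total_found, total_missed):.2f}")
```

The totals come to 714 found and 96 not found, giving the overall sensitivity of 0.88 reported in the last row.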
Conclusion
Several approaches to identify miRNA homologs have been previously described, both in plants [40], and in animals [5, 41, 42]. However, the only tool that is available as a web service, microHARVESTER [40], is targeted for plants. miRNAminer is the first available miRNA gene homolog search tool for animal genomes.
Availability and requirements
Project name: miRNAminer; Project home page: http://pag.csail.mit.edu/mirnaminer; Operating system: Platform independent; Programming language: Java; License: Open source, see http://opensource.org/licenses/mit-licence.php; Code is available upon request. miRNAs identified using miRNAminer will be incorporated in upcoming miRbase versions, see http://microrna.sanger.ac.uk.
References
- Bushati N, Cohen SM: microRNA Functions. Annu Rev Cell Dev Biol 2007, 23: 175–205. 10.1146/annurev.cellbio.23.090506.123406
- Carthew RW: Gene regulation by microRNAs. Curr Opin Genet Dev 2006, 16(2):203–208. 10.1016/j.gde.2006.02.012
- Hornstein E, Mansfield JH, Yekta S, Hu JK, Harfe BD, McManus MT, Baskerville S, Bartel DP, Tabin CJ: The microRNA miR-196 acts upstream of Hoxb8 and Shh in limb development. Nature 2005, 438(7068):671–674. 10.1038/nature04138
- Hornstein E, Shomron N: Canalization of development by microRNAs. Nat Genet 2006, 38 Suppl: S20–4. 10.1038/ng1803
- Hertel J, Stadler PF: Hairpins in a Haystack: recognizing microRNA precursors in comparative genomics data. Bioinformatics 2006, 22(14):e197–202. 10.1093/bioinformatics/btl257
- Sempere LF, Cole CN, McPeek MA, Peterson KJ: The phylogenetic distribution of metazoan microRNAs: insights into evolutionary complexity and constraint. J Exp Zoolog B Mol Dev Evol 2006, 306(6):575–588. 10.1002/jez.b.21118
- Hertel J, Lindemeyer M, Missal K, Fried C, Tanzer A, Flamm C, Hofacker IL, Stadler PF: The expansion of the metazoan microRNA repertoire. BMC Genomics 2006, 7: 25. 10.1186/1471-2164-7-25
- Bentwich I, Avniel A, Karov Y, Aharonov R, Gilad S, Barad O, Barzilai A, Einat P, Einav U, Meiri E, Sharon E, Spector Y, Bentwich Z: Identification of hundreds of conserved and nonconserved human microRNAs. Nat Genet 2005, 37(7):766–770. 10.1038/ng1590
- Berezikov E, Thuemmler F, van Laake LW, Kondova I, Bontrop R, Cuppen E, Plasterk RH: Diversity of microRNAs in human and chimpanzee brain. Nat Genet 2006, 38(12):1375–1377. 10.1038/ng1914
- Saunders MA, Liang H, Li WH: Human polymorphism at microRNAs and microRNA target sites. Proc Natl Acad Sci U S A 2007, 104(9):3300–3305. 10.1073/pnas.0611347104
- Lai EC, Tomancak P, Williams RW, Rubin GM: Computational identification of Drosophila microRNA genes. Genome Biol 2003, 4(7):R42. 10.1186/gb-2003-4-7-r42
- Lim LP, Glasner ME, Yekta S, Burge CB, Bartel DP: Vertebrate microRNA genes. Science 2003, 299(5612):1540. 10.1126/science.1080372
- Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ: miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 2006, 34(Database issue):D140–4. 10.1093/nar/gkj112
- miRbase database contains all published miRNA sequences, genomic locations and associated annotation. [http://microrna.sanger.ac.uk]
- Lewis BP, Burge CB, Bartel DP: Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 2005, 120(1):15–20. 10.1016/j.cell.2004.12.035
- NCBI BLAST [http://www.ncbi.nlm.nih.gov/BLAST]
- Ambros V, Bartel B, Bartel DP, Burge CB, Carrington JC, Chen X, Dreyfuss G, Eddy SR, Griffiths-Jones S, Marshall M, Matzke M, Ruvkun G, Tuschl T: A uniform system for microRNA annotation. RNA 2003, 9(3):277–279. 10.1261/rna.2183803
- Ensembl database produces and maintains automatic annotations on selected eukaryotic genomes. [http://www.ensembl.org]
- RNA secondary structure prediction of the RNAfold program [http://www.tbi.univie.ac.at/RNA]
- Supplementary Material [http://web.mit.edu/nshomron/www/miRNAminer_SM1.zip]
- Berezikov E, van Tetering G, Verheul M, van de Belt J, van Laake L, Vos J, Verloop R, van de Wetering M, Guryev V, Takada S, van Zonneveld AJ, Mano H, Plasterk R, Cuppen E: Many novel mammalian microRNA candidates identified by extensive cloning and RAKE analysis. Genome Res 2006, 16(10):1289–1298. 10.1101/gr.5159906
- Mineno J, Okamoto S, Ando T, Sato M, Chono H, Izu H, Takayama M, Asada K, Mirochnitchenko O, Inouye M, Kato I: The expression profile of microRNAs in mouse embryos. Nucleic Acids Res 2006, 34(6):1765–1771. 10.1093/nar/gkl096
- RNAmicro [http://www.tbi.univie.ac.at/~jana/software/RNAmicro.html]
- Berezikov E, Guryev V, van de Belt J, Wienholds E, Plasterk RH, Cuppen E: Phylogenetic shadowing and computational identification of human microRNA genes. Cell 2005, 120(1):21–24. 10.1016/j.cell.2004.12.031
- Terai G, Komori T, Asai K, Kin T: miRRim: A novel system to find conserved miRNAs with high sensitivity and specificity. RNA 2007.
- Cai X, Cullen BR: The imprinted H19 noncoding RNA is a primary microRNA precursor. RNA 2007, 13(3):313–316. 10.1261/rna.351707
- Baskerville S, Bartel DP: Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA 2005, 11(3):241–247. 10.1261/rna.7240905
- Liang Y, Ridzon D, Wong L, Chen C: Characterization of microRNA expression profiles in normal human tissues. BMC Genomics 2007, 8: 166. 10.1186/1471-2164-8-166
- Rodriguez A, Griffiths-Jones S, Ashurst JL, Bradley A: Identification of mammalian microRNA host genes and transcription units. Genome Res 2004, 14(10A):1902–1910. 10.1101/gr.2722704
- Fedele M, Pierantoni GM, Visone R, Fusco A: Critical role of the HMGA2 gene in pituitary adenomas. Cell Cycle 2006, 5(18):2045–2048.
- Reeves R: Molecular biology of HMGA proteins: hubs of nuclear function. Gene 2001, 277(1–2):63–81. 10.1016/S0378-1119(01)00689-8
- Lee YS, Dutta A: The tumor suppressor microRNA let-7 represses the HMGA2 oncogene. Genes Dev 2007, 21(9):1025–1030. 10.1101/gad.1540407
- Mayr C, Hemann MT, Bartel DP: Disrupting the pairing between let-7 and Hmga2 enhances oncogenic transformation. Science 2007, 315(5818):1576–1579. 10.1126/science.1137999
- Shell S, Park SM, Radjabi AR, Schickel R, Kistner EO, Jewell DA, Feig C, Lengyel E, Peter ME: Let-7 expression defines two differentiation stages of cancer. Proc Natl Acad Sci U S A 2007, 104(27):11400–11405. 10.1073/pnas.0704372104
- Brennecke J, Stark A, Russell RB, Cohen SM: Principles of microRNA-target recognition. PLoS Biol 2005, 3(3):e85. 10.1371/journal.pbio.0030085
- Krek A, Grun D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M, Rajewsky N: Combinatorial microRNA target predictions. Nat Genet 2005, 37(5):495–500. 10.1038/ng1536
- Nielsen CB, Shomron N, Sandberg R, Hornstein E, Kitzman J, Burge CB: Determinants of targeting by endogenous and exogenous microRNAs and siRNAs. RNA 2007, 13(11):1894–1910. 10.1261/rna.768207
- Landgraf P, Rusu M, Sheridan R, Sewer A, Iovino N, Aravin A, Pfeffer S, Rice A, Kamphorst AO, Landthaler M, Lin C, Socci ND, Hermida L, Fulci V, Chiaretti S, Foa R, Schliwka J, Fuchs U, Novosel A, Muller RU, Schermer B, Bissels U, Inman J, Phan Q, Chien M, Weir DB, Choksi R, De Vita G, Frezzetti D, Trompeter HI, Hornung V, Teng G, Hartmann G, Palkovits M, Di Lauro R, Wernet P, Macino G, Rogler CE, Nagle JW, Ju J, Papavasiliou FN, Benzing T, Lichter P, Tam W, Brownstein MJ, Bosio A, Borkhardt A, Russo JJ, Sander C, Zavolan M, Tuschl T: A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 2007, 129(7):1401–1414. 10.1016/j.cell.2007.04.040
- Arteaga-Vazquez M, Caballero-Perez J, Vielle-Calzada JP: A family of microRNAs present in plants and animals. Plant Cell 2006, 18(12):3355–3369. 10.1105/tpc.106.044420
- Dezulian T, Remmert M, Palatnik JF, Weigel D, Huson DH: Identification of plant microRNA homologs. Bioinformatics 2006, 22(3):359–360. 10.1093/bioinformatics/bti802
- Wang X, Zhang J, Li F, Gu J, He T, Zhang X, Li Y: MicroRNA identification based on sequence and structure alignment. Bioinformatics 2005, 21(18):3610–3614. 10.1093/bioinformatics/bti562
- Weber MJ: New human and mouse microRNA genes found by homology search. FEBS J 2005, 272(1):59–73. 10.1111/j.1432-1033.2004.04389.x
- UCSC genome browser contains the reference sequences and working draft assemblies for a large collection of genomes. [http://genome.ucsc.edu]
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.