Accurate microRNA target prediction correlates with protein repression levels
- Manolis Maragkakis†1, 2,
- Panagiotis Alexiou†1, 3,
- Giorgio L Papadopoulos1,
- Martin Reczko1, 4,
- Theodore Dalamagas5,
- George Giannopoulos5, 6,
- George Goumas7,
- Evangelos Koukis7,
- Kornilios Kourtis7,
- Victor A Simossis1,
- Praveen Sethupathy8,
- Thanasis Vergoulis5, 6,
- Nectarios Koziris7,
- Timos Sellis5, 6,
- Panagiotis Tsanakas7 and
- Artemis G Hatzigeorgiou1, 9
© Maragkakis et al; licensee BioMed Central Ltd. 2009
Received: 15 April 2009
Accepted: 18 September 2009
Published: 18 September 2009
MicroRNAs are small endogenously expressed non-coding RNA molecules that regulate target gene expression through translation repression or messenger RNA degradation. MicroRNA regulation is performed through pairing of the microRNA to sites in the messenger RNA of protein coding genes. Since experimental identification of miRNA target genes poses difficulties, computational microRNA target prediction is one of the key means in deciphering the role of microRNAs in development and disease.
DIANA-microT 3.0 is an algorithm for microRNA target prediction that is based on several parameters calculated individually for each microRNA and that combines conserved and non-conserved microRNA recognition elements into a final prediction score, which correlates with fold changes in protein production. In addition, for each predicted interaction the program reports a signal to noise ratio and a precision score, which can be used as an indication of the false positive rate of the prediction.
Recently, several computational target prediction programs were benchmarked on a set of microRNA target genes identified by the pSILAC method. In this assessment, DIANA-microT 3.0 achieved the highest precision among the most widely used microRNA target prediction programs, reaching approximately 66%. The DIANA-microT 3.0 prediction results are available online through a user-friendly web server at http://www.microrna.gr/microT
MicroRNAs (miRNAs) are short, endogenously expressed RNA molecules that regulate gene expression by binding directly, and preferentially, to the 3' untranslated region (3'UTR) of protein coding genes [1]. Each miRNA is 19-24 nucleotides long and is processed from a longer transcript referred to as the primary transcript (pri-miRNA). In the cell nucleus, these transcripts are processed into short, approximately 70-nucleotide stem-loop structures known as pre-miRNAs. In the cytoplasm, pre-miRNAs are processed into mature miRNAs by the endonuclease Dicer, which cleaves the pre-miRNA stem-loop into two complementary short RNA molecules. One of these molecules is incorporated into the RNA-induced silencing complex (RISC) and guides the complex to the target mRNA, thereby inhibiting translation or inducing mRNA degradation [2]. Since their initial identification, miRNAs have been found to confer a novel layer of genetic regulation in a wide range of biological processes. miRNAs were first identified in 1993 [3] through classical genetic techniques in C. elegans, but it was not until 2001 that they were found to be widespread and abundant in cells [4–6]. This finding served as the primary impetus for the development of the first computational miRNA target prediction programs. DIANA-microT [7] and TargetScan [8] were the first algorithms to predict miRNA targets in humans, and they led to the identification of an initial set of experimentally supported mammalian targets. Such targets are now collected in TarBase [9], which contains more than one thousand entries for human and mouse miRNAs.
DIANA-microT 3.0, the algorithm described here, uses the above-mentioned features and classifies as putative miRNA recognition elements (MREs) those sites that have seven, eight or nine consecutive Watson-Crick (W-C) base pairs with the miRNA driver sequence, starting from position 1 or 2 of the miRNA 5' end. For sites with additional base pairing involving the 3' end of the miRNA, a single G:U wobble pair or binding of only six consecutive nucleotides to the driver sequence is allowed. Briefly, the DIANA-microT 3.0 algorithm consists of (figure 1a): a) alignment of the miRNA driver sequence on the 3'UTR of a protein coding gene, b) identification of putative MREs based on specific binding rules, c) scoring of individual MREs according to their binding type and conservation profile, and d) calculation of an overall miRNA target gene (miTG) score as the weighted sum of all MRE scores lying on the 3'UTR. The program is designed to use up to 27 different species to estimate MRE conservation scores, and it combines both conserved and non-conserved MREs in a final miTG score (figure 1c), which correlates with fold changes in protein expression. Additionally, since the algorithm calculates all weights and scores independently for each miRNA, it allows the signal to noise ratio (SNR) to be calculated at different miTG score cut-offs, providing precision scores that serve as an indication of the false positive rate of the predicted interactions.
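A minimal sketch of an SNR calculation at several miTG score cut-offs is given below. It is not the DIANA-microT 3.0 implementation: it assumes that the SNR at a cut-off is the number of genes predicted for the real miRNA with a miTG score at or above that cut-off, divided by the mean number predicted for its mock miRNAs, and the function names and toy scores are hypothetical.

```python
from statistics import mean
from typing import Sequence


def count_at_or_above(scores: Sequence[float], cutoff: float) -> int:
    """Number of predicted target genes with miTG score >= cutoff."""
    return sum(1 for s in scores if s >= cutoff)


def snr_at_cutoff(real_scores: Sequence[float],
                  mock_scores: Sequence[Sequence[float]],
                  cutoff: float) -> float:
    """Assumed SNR: real-miRNA target count / mean mock-miRNA target count.

    real_scores: miTG scores of the genes predicted for the real miRNA.
    mock_scores: one list of miTG scores per mock miRNA.
    Returns inf when no mock miRNA has a target at or above the cut-off.
    """
    signal = count_at_or_above(real_scores, cutoff)
    noise = mean(count_at_or_above(m, cutoff) for m in mock_scores)
    return signal / noise if noise > 0 else float("inf")


# Toy scores, invented for illustration only.
real = [10.0, 8.0, 7.0, 5.0, 3.0]
mocks = [[6.0, 2.0], [9.0, 4.0, 1.0]]
for cutoff in (5.0, 9.5):
    print(cutoff, snr_at_cutoff(real, mocks, cutoff))  # 5.0 4.0, then 9.5 inf
```

Under this assumed definition, 1 - 1/SNR equals the fraction of predictions above the cut-off that are not accounted for by the mock background, which is one way an SNR can be turned into a precision-like score.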
In general, miRNAs can repress protein expression in two ways: via mRNA degradation or via repression of mRNA translation. Until recently, high-throughput experiments could only measure miRNA-mediated changes at the mRNA level (degradation), allowing the characterization of only a subset of direct miRNA targets [17, 18]. However, two groups [12, 19] have now independently developed methods to characterize miRNA-mediated gene expression changes at both the mRNA and the protein level. Selbach et al. [19] used microarrays and pulsed stable isotope labeling with amino acids in cell culture (pSILAC) to determine the genes targeted by each of five over-expressed miRNAs in HeLa cells. Using this set of experimentally supported targets, the authors performed a comparative assessment of several target prediction programs. The benchmark revealed that the simplest prediction method, searching the 3'UTRs of genes for sequences complementary to the miRNA seed region, achieved a precision (the fraction of predicted targets that were actually downregulated) of 44%, while only three of the prediction programs (including an initial version of DIANA-microT 3.0) achieved significantly higher precision: PicTar [20] and TargetScanS [10] reached approximately 62%, compared to approximately 66% for DIANA-microT 3.0.
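The seed-match baseline described above can be sketched in a few lines of Python. The sketch assumes that the seed is miRNA positions 2-7 and that sequences are RNA strings over A, C, G and U; the function names are illustrative, and this is not the code used in the benchmark.

```python
from typing import List

# Maps each RNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (5'->3' in, 5'->3' out)."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_match_sites(mirna: str, utr: str, first: int = 2, last: int = 7) -> List[int]:
    """0-based start positions in the 3'UTR of sites fully complementary to
    miRNA positions first..last (1-based, inclusive; 2-7 by assumption)."""
    site = reverse_complement(mirna[first - 1:last])
    k = len(site)
    return [i for i in range(len(utr) - k + 1) if utr[i:i + k] == site]


let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # human let-7a mature sequence
print(reverse_complement(let7a[1:7]))               # UACCUC
print(seed_match_sites(let7a, "AAUACCUCAAUACCUCG"))  # [2, 10]
```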
Identification of putative miRNA binding sites through sequence alignment
The program identifies the highest-scoring alignment between every nine-nucleotide window of the 3'UTR and the miRNA driver sequence using a dynamic programming algorithm. The alignment is based on the following binding rules. First, a minimum of six consecutive matches (Watson-Crick (W-C) or G:U) is required. If the six matches are W-C and the binding starts at position 1 or 2 of the miRNA driver sequence, the MRE is considered a 6mer. A 7mer (8mer, 9mer) has seven (eight, nine) consecutive W-C matches starting at position 1 or 2 of the miRNA driver. A single G:U wobble pair is allowed as long as there are at least six W-C pairs, yielding 7mers, 8mers and 9mers, each with one wobble base pair.
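The binding rules can be illustrated with a simplified classifier. Unlike the program, which finds the best alignment with dynamic programming, this sketch assumes an ungapped duplex between the first nine nucleotides of the driver sequence and a nine-nucleotide 3'UTR window, does not check the additional 3' pairing that 6mers require according to the algorithm description, and ranks alternative sites by number of W-C pairs and then by absence of a wobble. The labels (such as '8mer+GU') and function names are hypothetical.

```python
from typing import List, Optional

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_types(driver: str, window: str) -> List[str]:
    """Pair type ('WC', 'GU' or 'X') at driver positions 1, 2, ... (ungapped).

    The miRNA (5'->3') pairs antiparallel with its target, so the 3'UTR
    window, written 5'->3', is read from its 3' end.
    """
    types = []
    for m, t in zip(driver, reversed(window)):
        if (m, t) in WATSON_CRICK:
            types.append("WC")
        elif (m, t) in WOBBLE:
            types.append("GU")
        else:
            types.append("X")
    return types


def classify_site(driver: str, window: str) -> Optional[str]:
    """Label a 9-nt window as '6mer', '7mer'..'9mer' or '7mer+GU'..'9mer+GU'."""
    types = pair_types(driver[:9], window)
    candidates = []
    for start in (0, 1):  # binding must start at driver position 1 or 2
        run: List[str] = []
        for t in types[start:]:
            if t == "WC" or (t == "GU" and "GU" not in run):
                run.append(t)  # at most one G:U wobble per site
            else:
                break
        if run and run[-1] == "GU":
            run.pop()  # a trailing wobble does not lengthen the site
        wc, gu = run.count("WC"), run.count("GU")
        if gu == 0 and wc >= 6:
            candidates.append((wc, 0, f"{wc}mer"))
        elif gu == 1 and wc >= 6:
            candidates.append((wc, -1, f"{len(run)}mer+GU"))
    # Assumed ranking: more W-C pairs first, then no wobble over a wobble.
    return max(candidates)[2] if candidates else None
```

For example, with the let-7a driver UGAGGUAGU, the window ACUACCUCA is fully complementary and is labelled '9mer', while CCUACCUCA loses the ninth pair and is labelled '8mer'.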
Filtering of putative miRNA binding sites based on binding energy
Mock miRNAs are artificial miRNA sequences created independently for each real miRNA. They are designed to have approximately the same number of predicted MREs as the corresponding real miRNA and are generated as follows. First, all 3'UTR sequences are scanned for sites perfectly complementary to each possible six-nucleotide motif (hexamer), excluding motifs corresponding to positions 1-6, 2-7 and 3-8 of real miRNAs. The 60 hexamers whose number of complementary sites is closest to that of the seed of the real miRNA are chosen, and each is used as the seed of one mock miRNA. The rest of each mock miRNA sequence is produced by randomly shuffling the remaining nucleotides of the real miRNA.
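The mock-miRNA procedure can be sketched as follows. The sketch assumes that the seed of a miRNA is positions 2-7, that the chosen hexamer is placed at those positions in the mock, that site counts include overlapping occurrences and that ties in site count are broken alphabetically; the function and parameter names are hypothetical.

```python
import random
from collections import Counter
from itertools import product
from typing import Iterable, List, Optional

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def make_mock_mirnas(real: str, all_real: Iterable[str], utrs: Iterable[str],
                     n: int = 60, rng: Optional[random.Random] = None) -> List[str]:
    """Build n mock miRNAs for `real` (seed assumed at positions 2-7)."""
    rng = rng or random.Random()
    # Count every 6-nt substring of the 3'UTRs once (overlapping occurrences).
    kmers = Counter(u[i:i + 6] for u in utrs for i in range(len(u) - 5))

    def sites(hexamer: str) -> int:
        # Number of 3'UTR sites perfectly complementary to the hexamer.
        return kmers[reverse_complement(hexamer)]

    # Exclude hexamers at positions 1-6, 2-7 and 3-8 of any real miRNA.
    excluded = {m[s:s + 6] for m in all_real for s in (0, 1, 2)}
    candidates = ["".join(p) for p in product("ACGU", repeat=6)]
    candidates = [h for h in candidates if h not in excluded]
    target = sites(real[1:7])
    # Closest site counts first; ties broken alphabetically for reproducibility.
    chosen = sorted(candidates, key=lambda h: (abs(sites(h) - target), h))[:n]

    mocks = []
    for hexamer in chosen:
        rest = list(real[0] + real[7:])  # nucleotides outside the seed
        rng.shuffle(rest)
        mocks.append(rest[0] + hexamer + "".join(rest[1:]))
    return mocks
```

Each mock keeps the length and the non-seed base composition of the real miRNA, so only the seed differs in a controlled way.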
miRNA Recognition Elements score (MRE score)
miRNA target gene score
Binding category weights
Columns: estimated weights ($w_b$) | multiplication weights ($mw_b$) | overall DIANA weights ($Dw_b = mw_b / mw_{9mer}$)
Overall DIANA weights $Dw_b$: 1.00 = 4/4; 0.75 = 3/4; 0.50 = 2/4; 0.25 = 1/4
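The table does not state here which binding category each weight belongs to, so the category-to-multiplication-weight mapping below is hypothetical, as is the exact way the weights enter the score. The sketch only shows how $Dw_b = mw_b / mw_{9mer}$ normalises the weights and how a weighted sum of MRE scores on one 3'UTR yields a miTG score.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class MRE:
    category: str  # binding category label (hypothetical)
    score: float   # MRE score from binding type and conservation


def diana_weights(mw: Dict[str, float], top: str = "9mer") -> Dict[str, float]:
    """Dw_b = mw_b / mw_9mer for every binding category b."""
    return {b: w / mw[top] for b, w in mw.items()}


def mitg_score(mres: Iterable[MRE], weights: Dict[str, float]) -> float:
    """Assumed miTG score: weighted sum of the MRE scores on one 3'UTR."""
    return sum(weights[m.category] * m.score for m in mres)


# Hypothetical category -> multiplication weight mapping, mirroring 4/4..1/4.
mw = {"9mer": 4, "8mer": 3, "7mer": 2, "6mer": 1}
dw = diana_weights(mw)  # {'9mer': 1.0, '8mer': 0.75, '7mer': 0.5, '6mer': 0.25}
utr_mres = [MRE("8mer", 4.0), MRE("7mer", 2.0), MRE("6mer", 1.0)]
print(mitg_score(utr_mres, dw))  # 4.25
```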
miTG score threshold assessment
The precision of a prediction is defined as the ratio of correct positive predictions to all positive predictions:

$$\text{precision} = \frac{TP}{TP + FP}$$

where $TP$ is the number of true positives and $FP$ the number of false positives. In the case of DIANA-microT 3.0, the average number of miTGs predicted for the mock miRNAs provides an estimate of the number of false positive targets. The number of true positive miTGs can therefore be estimated by subtracting the average number of miTGs predicted for the mock miRNAs from the total number of miTGs predicted for the real miRNA. In detail, the precision for miRNA $r$ at miTG score $s$ is calculated by

$$P_r(s) = \frac{W_r - \overline{M}_r}{W_r}$$

where $W_r$ is the number of miTGs of the real miRNA $r$ with miTG scores from $s$ to $s + \Delta s$, $\overline{M}_r$ is the average number of miTGs of the mock miRNAs corresponding to miRNA $r$ with miTG scores from $s$ to $s + \Delta s$, and $\Delta s$ is a specified miTG score window ($\Delta s = 3$).
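The windowed precision estimate, that is, the number of real-miRNA miTGs in a score window minus the average mock count in the same window, divided by the real count, can be computed directly from lists of miTG scores. This minimal sketch assumes a half-open window [s, s + Δs) and returns NaN when the real miRNA has no miTGs in the window; the function names are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def count_in_window(scores: Sequence[float], s: float, ds: float) -> int:
    """Number of miTGs with score in the half-open window [s, s + ds)."""
    return sum(1 for x in scores if s <= x < s + ds)


def precision_at(real_scores: Sequence[float],
                 mock_scores: Sequence[Sequence[float]],
                 s: float, ds: float = 3.0) -> float:
    """(W_r - mean mock count) / W_r for one window; NaN if W_r is zero."""
    w = count_in_window(real_scores, s, ds)
    if w == 0:
        return math.nan
    m_bar = mean(count_in_window(m, s, ds) for m in mock_scores)
    return (w - m_bar) / w
```

The estimate is not clamped, so it becomes negative if the mock miRNAs have more miTGs in a window than the real miRNA.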
The human and mouse miRNA sequences used by DIANA-microT 3.0 were downloaded from miRBase Build 10.0 [22].
The 3'UTR sequences of genes were downloaded from Ensembl release 48 [23]. Where several transcripts of the same gene have different 3'UTR sequences, only the longest 3'UTR was kept.
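Keeping only the longest 3'UTR per gene amounts to a single pass over the transcripts. The sketch assumes the records have already been extracted as (gene, transcript, 3'UTR sequence) tuples; identifiers and names are hypothetical, and ties are resolved by keeping the first transcript seen.

```python
from typing import Dict, Iterable, Tuple


def longest_utr_per_gene(
        records: Iterable[Tuple[str, str, str]]) -> Dict[str, Tuple[str, str]]:
    """Keep, for every gene, the transcript with the longest 3'UTR.

    records: (gene_id, transcript_id, utr_sequence) tuples.
    Returns gene_id -> (transcript_id, utr_sequence); on equal lengths the
    first transcript seen is kept.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for gene_id, transcript_id, utr in records:
        if gene_id not in best or len(utr) > len(best[gene_id][1]):
            best[gene_id] = (transcript_id, utr)
    return best
```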
Multiple Alignment Files (MAFs)
The multiple genome alignment files were downloaded from the UCSC Genome Browser [24]. For human (hg18), the alignment to 16 vertebrate genomes was used; for mouse (mm9), the alignment to 29 vertebrate genomes was used.
Signal to Noise Ratio (SNR) assessment
Receiver Operating Characteristics (ROC) analysis on proteomics data
Correlation of miTG score to the repression of protein production
Discussion and conclusion
Number of miTGs predicted in common by programs
Percentage of common predictions among programs
Recently, the rapid growth in the discovery rate of novel miRNA sequences due to the extensive use of deep sequencing technology, and the finding that miRNAs can undergo A-to-I RNA editing, have underlined the need for a web-based program that allows miRNA target prediction for user-defined miRNA sequences. DIANA-microT 3.0 is one of the few programs offering such a service, providing the scientific community with a tool for the analysis of miRNA-dependent processes. This tool can be accessed through the DIANA-microT web server [26] at http://www.microrna.gr/microT, which combines an optimized prediction algorithm and several features with a user-friendly interface that assists in the identification of interactions of interest.
As already mentioned, DIANA-microT 3.0 takes into account both conserved and non-conserved MREs. This enables the algorithm to predict targets of viral miRNAs. Targets of viral miRNAs are generally not expected to be conserved, which limits the ability of conservation-dependent algorithms to identify them. Since DIANA-microT 3.0 accepts non-conserved MREs, it can successfully handle viral miRNA sequences.
Funding: Aristeia Award from the General Secretariat for Research and Technology, Greece
- Bartel DP: MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 2004, 116(2):281–297. doi:10.1016/S0092-8674(04)00045-5
- Liu J, Carmell MA, Rivas FV, Marsden CG, Thomson JM, Song JJ, Hammond SM, Joshua-Tor L, Hannon GJ: Argonaute2 is the catalytic engine of mammalian RNAi. Science 2004, 305(5689):1437–1441. doi:10.1126/science.1102513
- Lee RC, Feinbaum RL, Ambros V: The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 1993, 75(5):843–854. doi:10.1016/0092-8674(93)90529-Y
- Lagos-Quintana M, Rauhut R, Lendeckel W, Tuschl T: Identification of novel genes coding for small expressed RNAs. Science 2001, 294(5543):853–858. doi:10.1126/science.1064921
- Lau NC, Lim LP, Weinstein EG, Bartel DP: An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 2001, 294(5543):858–862. doi:10.1126/science.1065062
- Lee RC, Ambros V: An extensive class of small RNAs in Caenorhabditis elegans. Science 2001, 294(5543):862–864. doi:10.1126/science.1065329
- Kiriakidou M, Nelson PT, Kouranov A, Fitziev P, Bouyioukos C, Mourelatos Z, Hatzigeorgiou A: A combined computational-experimental approach predicts human microRNA targets. Genes Dev 2004, 18(10):1165–1178. doi:10.1101/gad.1184704
- Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB: Prediction of mammalian microRNA targets. Cell 2003, 115(7):787–798. doi:10.1016/S0092-8674(03)01018-3
- Papadopoulos GL, Reczko M, Simossis VA, Sethupathy P, Hatzigeorgiou AG: The database of experimentally supported targets: a functional update of TarBase. Nucleic Acids Res 2009, 37(Database issue):D155–158. doi:10.1093/nar/gkn809
- Lewis BP, Burge CB, Bartel DP: Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 2005, 120(1):15–20. doi:10.1016/j.cell.2004.12.035
- Brennecke J, Stark A, Russell RB, Cohen SM: Principles of microRNA-target recognition. PLoS Biol 2005, 3(3):e85. doi:10.1371/journal.pbio.0030085
- Baek D, Villen J, Shin C, Camargo FD, Gygi SP, Bartel DP: The impact of microRNAs on protein output. Nature 2008, 455(7209):64–71. doi:10.1038/nature07242
- Kertesz M, Iovino N, Unnerstall U, Gaul U, Segal E: The role of site accessibility in microRNA target recognition. Nat Genet 2007, 39(10):1278–1284. doi:10.1038/ng2135
- Long D, Lee R, Williams P, Chan CY, Ambros V, Ding Y: Potent effect of target structure on microRNA function. Nat Struct Mol Biol 2007, 14(4):287–294. doi:10.1038/nsmb1226
- Grimson A, Farh KK, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP: MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 2007, 27(1):91–105. doi:10.1016/j.molcel.2007.06.017
- Gaidatzis D, van Nimwegen E, Hausser J, Zavolan M: Inference of miRNA targets using evolutionary conservation and pathway analysis. BMC Bioinformatics 2007, 8:69. doi:10.1186/1471-2105-8-69
- Lim LP, Lau NC, Garrett-Engele P, Grimson A, Schelter JM, Castle J, Bartel DP, Linsley PS, Johnson JM: Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 2005, 433(7027):769–773. doi:10.1038/nature03315
- Sood P, Krek A, Zavolan M, Macino G, Rajewsky N: Cell-type-specific signatures of microRNAs on target mRNA expression. Proc Natl Acad Sci USA 2006, 103(8):2746–2751. doi:10.1073/pnas.0511045103
- Selbach M, Schwanhausser B, Thierfelder N, Fang Z, Khanin R, Rajewsky N: Widespread changes in protein synthesis induced by microRNAs. Nature 2008, 455(7209):58–63. doi:10.1038/nature07228
- Lall S, Grun D, Krek A, Chen K, Wang YL, Dewey CN, Sood P, Colombo T, Bray N, Macmenamin P, et al.: A genome-wide map of conserved microRNA targets in C. elegans. Curr Biol 2006, 16(5):460–471. doi:10.1016/j.cub.2006.01.050
- Rehmsmeier M, Steffen P, Hochsmann M, Giegerich R: Fast and effective prediction of microRNA/target duplexes. RNA 2004, 10(10):1507–1517. doi:10.1261/rna.5248604
- Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ: miRBase: tools for microRNA genomics. Nucleic Acids Res 2008, 36(Database issue):D154–158.
- Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, et al.: Ensembl 2008. Nucleic Acids Res 2008, 36(Database issue):D707–714.
- Karolchik D, Hinrichs AS, Kent WJ: The UCSC Genome Browser. Curr Protoc Bioinformatics 2007, Chapter 1:Unit 1.4.
- Sethupathy P, Megraw M, Hatzigeorgiou AG: A guide through present computational approaches for the identification of mammalian microRNA targets. Nat Methods 2006, 3(11):881–886. doi:10.1038/nmeth954
- Maragkakis M, Reczko M, Simossis VA, Alexiou P, Papadopoulos GL, Dalamagas T, Giannopoulos G, Goumas G, Koukis E, Kourtis K, et al.: DIANA-microT web server: elucidating microRNA functions through target prediction. Nucleic Acids Res 2009, 37(Web Server issue):W273–276. doi:10.1093/nar/gkp292
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.