mirCoX: a database of miRNA-mRNA expression correlations derived from RNA-seq meta-analysis
© Giles et al.; licensee BioMed Central Ltd. 2013
Published: 9 October 2013
Experimentally validated co-expression correlations between miRNAs and genes are a valuable resource to corroborate observations about miRNA/mRNA changes after experimental perturbations, as well as compare miRNA target predictions with empirical observations. For example, when a given miRNA is transcribed, true targets of that miRNA should tend to have lower expression levels relative to when the miRNA is not expressed.
We processed publicly available human RNA-seq experiments obtained from NCBI's Sequence Read Archive (SRA) to identify miRNA-mRNA co-expression trends and summarized them in terms of their Pearson's Correlation Coefficient (PCC) and significance.
We found that sequence-derived parameters from TargetScan and miRanda were predictive of co-expression, and that TargetScan- and miRanda-derived gene-miRNA pairs tend to have anti-correlated expression patterns in RNA-seq data compared to controls. We provide this data for download and as a web application available at http://wrenlab.org/mirCoX/.
This database of empirically established miRNA-mRNA transcriptional correlations will help to corroborate experimental observations and could be used to help refine and validate miRNA target predictions.
MicroRNAs (miRNAs) are small (19-22 nt) non-coding RNAs that can interfere with mRNA translation by base-pairing with mRNAs to form double-stranded RNA or by promoting the loss of polyadenylation, leading to inhibition of translation or degradation of the mRNA by the cellular machinery [1, 2]. Less often, dsRNA formed by miRNA-target complexes can target gene promoters and actually enhance transcription of target genes, sometimes termed RNAa (RNA activation) . Through these mechanisms, a single miRNA can potentially alter the expression levels of hundreds, even thousands, of mRNA transcripts  and long non-coding RNAs  and therefore exert considerable regulatory control over cellular processes. The diverse regulatory behavior of microRNAs also has been found to play an important role in a wide variety of pathologies, including cancer and cardiovascular disease [6, 7].
Predicting miRNA target genes has been a topic of active research , and works by calculating sequence similarity metrics between miRNAs and their putative target sequences, often in the 3' UTR of genes [9–14], although other features have predictive value . However, because miRNA-target base pairing is rarely characterized by perfect complementarity, even in the "seed" region, sequence-based miRNA target prediction algorithms are prone to many false positives and have poor inter-algorithm agreement [16, 17]. Experimental validation of miRNA-target interactions can be conducted, but is prohibitively costly and time-consuming to do for large-scale assessments of miRNA target prediction efficacy. Because of the biological effects of miRNAs on their targets, expression data is something that can potentially assist sequence-based methods. Therefore, several methods have been developed to prioritize predicted miRNA-target interactions in silico.
Several groups [18–20] have used expression data from paired miRNA-mRNA microarrays to generate correlations between miRNAs and their putative targets, with the idea that miRNA-mRNA pairs with a negative correlation are more likely to be legitimate interactions . Similarly, Gennarino et al developed HOCTAR, a method that re-ranks sequence-based predictions for intragenic miRNAs by using expression of a host gene as a proxy for expression of the miRNA, thus avoiding the necessity of using specialized miRNA arrays but at the cost of being restricted to intragenic miRNAs [21, 22].
Although previous expression-based miRNA-target interaction prioritization approaches have been successful, they have several drawbacks. Typically, the miRNA and mRNA expression profiles are determined by different types of arrays, leading to the possibility of technology-specific artifacts or batch effects. Although microRNAs can target other non-coding RNAs and methods are being actively developed to discover such interactions [5, 23, 24], ncRNAs are poorly represented on most array platforms. Furthermore, microarrays are not as quantitative and require probes for each transcript of interest, as compared with next-generation sequencing technology, which is probe agnostic and can provide information about all expressed transcripts, including isoforms.
To overcome these drawbacks, we generated a database of expression correlations between microRNAs and mRNAs by using total RNA sequencing (RNA-seq) experiments from NCBI's Sequence Read Archive (SRA). We then integrated sequence-based miRNA target predictions from miRanda and TargetScan databases together with RNA-seq derived expression correlations to prioritize these predicted miRNA-target pairs by co-expression and created a publicly-available web server for users to query these expression correlations. We anticipate this will be potentially useful for two different types of users. First, biologists would be interested from the standpoint of interpreting correlations in their own experiments. For example, if they knocked out Gene X and observed miRNA Y was highly expressed in that experiment, a natural question to ask is whether or not X and Y are normally anti-correlated. If so, it lends itself to the hypothesis that Gene X might somehow repress miRNA Y, either directly or indirectly. Second, bioinformaticists would potentially be interested in downloading the data to either help train their sequence-based miRNA target prediction algorithms or to use the correlations as corroborating evidence of effect. Finally, we would like to note that, although the empirical correlations we are reporting here may be suggestive of effect, this resource itself is not intended to predict miRNA-mRNA target pairs.
Selection and pre-processing of RNA-seq experiments
RNA co-expression data was obtained from processing RNA-seq datasets available for download in NCBI's Sequence Read Archive (SRA). The Bioconductor package SRAdb was installed  and its companion database was downloaded, current as of January 2013, to obtain experiment information from SRA. The database was queried to find RNA-seq runs which met the following criteria: 1) Taxon ID 9606 (human), 2) "RANDOM" library selection, to ensure that certain varieties of transcripts were not artificially enriched or depleted by the selection method, as for example via poly(A) tail selection; 3) "TRANSCRIPTOMIC" library sources and "RNA-Seq" library strategy to eliminate specialized procedures such as cap analysis of gene expression (CAGE), and 4) paired-end reads. The selected run accessions were then downloaded from the SRA using the Aspera software and converted to FASTQ using the "fastq-dump" program from NCBI's SRA Toolkit .
Reads were trimmed first for Illumina adapters, then by quality, using the "fastq-mcf" program from the ea-utils package. The Bowtie2 aligner  was used to map each set of reads in FASTQ format to the GRCh37/hg19 reference genome using the "--sensitive" parameter. Samtools  was used to convert Bowtie2's SAM output to sorted BAM. Using the Kent source utilities , these mappings were then converted to BigWig, and transcript coverage was quantified using the "bigWigAverageOverBed" program, where transcript coordinates were obtained from the UCSC knownGenes (for genes and ncRNAs) and from mirBase (for microRNAs). To obtain runs with adequate sequencing depth and quality, runs which had fewer than 10 million mapped reads, fewer than 60% mapped reads, or which failed two or more FASTQC tests were discarded from further analysis. Overall, two expression matrices (of knownGene IDs and mirBase IDs) containing raw mapped counts were constructed using 141 runs (see Additional File 1 for a list of runs used).
Transcript-miRNA correlations and miRNA-target information
Experimentally validated target predictions for each miRNA were obtained from miRecord , and computational predictions were obtained from TargetScan version 6.2  and miRanda (August 2010 release) [9, 11]. miRanda divides its predicted targets into conserved and nonconserved targets, and we analyzed these two groups separately. For miRecord, only 215 of 1582 miR-target pairs were mappable to miR-knownGene pairs. The reason for this relatively low mapping rate is that frequently the RefSeq IDs for genes in miRecords did not correspond to a RefSeq ID for a knownGene transcript.
Web application implementation
The above data were assembled into a MySQL database, and a PHP front-end was implemented to serve user searches on particular transcripts, genes, or miRNAs. The application is hosted on a LAMP server. This web interface is accessible by Internet Explorer version 10 or greater and modern versions of Firefox and Chrome.
miR-target correlations in experimentally validated and computationally predicted subsets
# miR-gene pairs
miRanda and TargetScan score parameters can predict co-expression
# Conserved Sites
# Nonconserved Sites
# Conserved 7mer-1a sites
# Conserved 7mer-m8 sites
# Conserved 8mer sites
# Nonconserved 7mer-1a sites
# Nonconserved 7mer-m8 sites
# Nonconserved 8mer sites
RNA-sequencing data contains information about expression patterns for the entire transcriptome at the time of measurement. Consequently, this data is an excellent means of exploring expression correlations among non-coding transcripts, microRNAs, and genes. We provide a method and interface to explore detected miRNA-mRNA correlations in light of the commonly accepted hypothesis that miRNA-mRNA pairing affects expression and/or translation of the mRNA, often as a means of repression but also as a means of activating transcription. Positive or negative correlations alone, of course, do not prove causation, as there can be a number of different factors involved in mRNA degradation. But strong correlations, particularly in the absence of other strong miRNA correlations with the same mRNA, can be considered strongly suggestive of an influence. And the more expression samples gathered for analysis, the more statistical confidence in trends that can be gained, and rare transcripts will become less of a problem.
The correlations we detected were statistically significant, yet much smaller in magnitude than we had anticipated prior to the study. There are a number of potential reasons this might be. First, since it has been observed that miRNAs, to achieve effective translational repression, frequently act in combination ., the transcription of one miRNA simply does not correlate strongly enough. This is analogous to the situation whereby multiple transcription factors frequently act in combination in order to achieve transcriptional activation . Second, it is possible that the repressive effect is more pronounced on the translational level. If so, this would suggest transcriptional approaches are not well-suited to address this problem. Third, it's possible that there is another layer of regulation not accounted for by simple correlations, for example the competing endogenous RNA hypothesis, whereby some transcripts may exist to "soak up" multiple miRNAs of the same type and keep them from repressing their targets . Finally, although probably least likely, if the bulk of transcripts being detected are pre-miRNAs that have not yet been processed (which we would expect to correlate with mature miRNA levels), then it's possible that there's another regulatory layer that keeps them from being processed until some further signal is given.
In summary, RNA-seq experiments provide us with a unique overview of the whole transcriptome in a single experiment, which can be used to detect transcript-transcript correlations and potentially detect whether or not one type of transcript with known regulatory effects is influencing the other. This work provides a resource to examine predicted correlations among two classes of RNA that are known to interact, with miRNAs able to degrade mRNA expression. It has the potential to help corroborate sequence-based predictions of miRNA-mRNA interactions, estimate the efficiency of using a miRNA with the specific intention of degrading a target mRNA, and for comparison of general co-expression trends to specific experimental observations.
mirCoX, a web application described in this article, is available at http://wrenlab.org/mirCoX
Publication of this work has been supported in part by the National Institutes of Health (NIH, grants # 1P20GM103636 and 8P20GM103456 to JDW) and by the Indian Council of Medical Research's Viral Disease Network Program (VIR/8/2011-ECD-1 to RGD).
This article has been published as part of BMC Bioinformatics Volume 14 Supplement 14, 2013: Proceedings of the Tenth Annual MCBIOS Conference. Discovery in a sea of data. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S14.
- Aravin A, Tuschl T: Identification and characterization of small RNAs involved in RNA silencing. Febs Lett. 2005, 579: 5830-40. 10.1016/j.febslet.2005.08.009.View ArticlePubMedGoogle Scholar
- Giraldez AJ, Mishima Y, Rihel J, Grocock RJ, Van Dongen S, Inoue K, Enright AJ, Schier AF: Zebrafish MiR-430 promotes deadenylation and clearance of maternal mRNAs. Science. 2006, 312: 75-9. 10.1126/science.1122689.View ArticlePubMedGoogle Scholar
- Huang V, Qin Y, Wang J, Wang X, Place RF, Lin G, Lue TF, Li L-C: RNAa is conserved in mammalian cells. Plos One. 2010, 5: e8848-10.1371/journal.pone.0008848.PubMed CentralView ArticlePubMedGoogle Scholar
- Lim LP, Lau NC, Garrett-Engele P, Grimson A, Schelter JM, Castle J, Bartel DP, Linsley PS, Johnson JM: Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature. 2005, 433: 769-73. 10.1038/nature03315.View ArticlePubMedGoogle Scholar
- Juan L, Wang G, Radovich M, Schneider BP, Clare SE, Wang Y, Liu Y: Potential roles of microRNAs in regulating long intergenic noncoding RNAs. Bmc Med Genomics. 2013, 6 (Suppl 1): S7-10.1186/1755-8794-6-S1-S7.PubMed CentralView ArticlePubMedGoogle Scholar
- Majid S, Dar Aa, Saini S, Yamamura S, Hirata H, Tanaka Y, Deng G, Dahiya R: MicroRNA-205-directed transcriptional activation of tumor suppressor genes in prostate cancer. Cancer. 2010, 116: 5637-49. 10.1002/cncr.25488.PubMed CentralView ArticlePubMedGoogle Scholar
- Soifer HS, Rossi JJ, Saetrom Pl: MicroRNAs in disease and potential therapeutic applications. Mol Ther J Am Soc Gene Ther. 2007, 15: 2070-9. 10.1038/sj.mt.6300311.View ArticleGoogle Scholar
- Witkos TM, Koscianska E, Krzyzosiak WJ: Practical Aspects of microRNA Target Prediction. Curr Mol Med. 2011, 11: 93-109. 10.2174/156652411794859250.PubMed CentralView ArticlePubMedGoogle Scholar
- Betel D, Koppal A, Agius P, Sander C, Leslie C: Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites. Genome Biol. 2010, 11: R90-10.1186/gb-2010-11-8-r90.PubMed CentralView ArticlePubMedGoogle Scholar
- Enright AJ, John B, Gaul U, Tuschl T, Sander C, Marks DS: MicroRNA targets in Drosophila. Genome Biol. 2003, 5: R1-10.1186/gb-2003-5-1-r1.PubMed CentralView ArticlePubMedGoogle Scholar
- John B, Enright AJ, Aravin A, Tuschl T, Sander C, Marks DS: Human MicroRNA targets. Plos Biol. 2004, 2: e363-10.1371/journal.pbio.0020363.PubMed CentralView ArticlePubMedGoogle Scholar
- Lewis BP, Burge CB, Bartel DP: Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005, 120: 15-20. 10.1016/j.cell.2004.12.035.View ArticlePubMedGoogle Scholar
- Krek A, Grün D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M, Rajewsky N: Combinatorial microRNA target predictions. Nat Genet. 2005, 37: 495-500. 10.1038/ng1536.View ArticlePubMedGoogle Scholar
- Chandra V, Girijadevi R, Nair AS, Pillai SS, Pillai RM: MTar: a computational microRNA target prediction architecture for human transcriptome. BMC Bioinformatics. 2010, 11 (Suppl 1): S2-10.1186/1471-2105-11-S1-S2.PubMed CentralView ArticlePubMedGoogle Scholar
- Grimson A, Farh KK-H, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP: MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell. 2007, 27: 91-105. 10.1016/j.molcel.2007.06.017.PubMed CentralView ArticlePubMedGoogle Scholar
- Didiano D, Hobert O: Perfect seed pairing is not a generally reliable predictor for miRNA-target interactions. Nat Struct Mol Biol. 2006, 13: 849-51. 10.1038/nsmb1138.View ArticlePubMedGoogle Scholar
- Saito T, Saetrom P: MicroRNAs-targeting and target prediction. New Biotechnol. 2010, 27: 243-9. 10.1016/j.nbt.2010.02.016.View ArticleGoogle Scholar
- Huang JC, Babak T, Corson TW, Chua G, Khan S, Gallie BL, Hughes TR, Blencowe BJ, Frey BJ, Morris QD: Using expression profiling data to identify human microRNA targets. 2007, 4: 1045-1049.Google Scholar
- Hausser J, Berninger P, Rodak C, Jantscher Y, Wirth S, Zavolan M: MirZ: an integrated microRNA expression atlas and target prediction resource. Nucleic Acids Res. 2009, 37: W266-72. 10.1093/nar/gkp412.PubMed CentralView ArticlePubMedGoogle Scholar
- Su W-L, Kleinhanz RR, Schadt EE: Characterizing the role of miRNAs within gene regulatory networks using integrative genomics techniques. Mol Syst Biol. 2011, 7: 490-PubMed CentralView ArticlePubMedGoogle Scholar
- Gennarino VA, Sardiello M, Mutarelli M, Dharmalingam G, Maselli V, Lago G, Banfi S: HOCTAR database: a unique resource for microRNA target prediction. Gene. 2011, 480: 51-8. 10.1016/j.gene.2011.03.005.PubMed CentralView ArticlePubMedGoogle Scholar
- Gennarino VA, Sardiello M, Avellino R, Meola N, Maselli V, Anand S, Cutillo L, Ballabio A, Banfi S: MicroRNA target prediction by expression analysis of host genes. Genome Res. 2009, 19: 481-90.PubMed CentralView ArticlePubMedGoogle Scholar
- Jeggari A, Marks DS, Larsson E: miRcode: a map of putative microRNA target sites in the long non-coding transcriptome. Bioinforma Oxf Engl. 2012, 28: 2062-3. 10.1093/bioinformatics/bts344.View ArticleGoogle Scholar
- Paraskevopoulou MD, Georgakilas G, Kostoulas N, Reczko M, Maragkakis M, Dalamagas TM, Hatzigeorgiou AG: DIANA-LncBase: experimentally verified and computationally predicted microRNA targets on long non-coding RNAs. Nucleic Acids Res. 2013, 41: D239-45. 10.1093/nar/gks1246.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhu Y, Stephens RM, Meltzer PS, Davis SR: SRAdb: query and use public next-generation sequencing data from within R. BMC Bioinformatics. 2013, 14: 19-10.1186/1471-2105-14-19.PubMed CentralView ArticlePubMedGoogle Scholar
- NCBI SRA Toolkit. [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=std]
- Langmead B, Salzberg SL: Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012, 9: 357-9. 10.1038/nmeth.1923.PubMed CentralView ArticlePubMedGoogle Scholar
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinforma Oxf Engl. 2009, 25: 2078-9. 10.1093/bioinformatics/btp352.View ArticleGoogle Scholar
- Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D: BigWig and BigBed: enabling browsing of large distributed datasets. Bioinforma Oxf Engl. 2010, 26: 2204-2207. 10.1093/bioinformatics/btq351.View ArticleGoogle Scholar
- Smyth G: Limma: linear models for microarray data. Bioinforma Comput Biol Solutions Using R Bioconductor. 2005, New York: Springer, 397-420.View ArticleGoogle Scholar
- Xiao F, Zuo Z, Cai G, Kang S, Gao X, Li T: miRecords: an integrated resource for microRNA-target interactions. Nucleic Acids Res. 2009, 37: D105-10. 10.1093/nar/gkn851.PubMed CentralView ArticlePubMedGoogle Scholar
- Valencia-Sanchez MA, Liu J, Hannon GJ, Parker R: Control of translation and mRNA degradation by miRNAs and siRNAs. Genes Dev. 2006, 20: 515-24. 10.1101/gad.1399806.View ArticlePubMedGoogle Scholar
- Yeang CH, Jaakkola T: Modeling the combinatorial functions of multiple transcription factors. J Comput Biol. 2006, 13: 463-80. 10.1089/cmb.2006.13.463.View ArticlePubMedGoogle Scholar
- Salmena L, Poliseno L, Tay Y, Kats L, Pandolfi PP: A ceRNA hypothesis: the Rosetta Stone of a hidden RNA language?. Cell. 2011, 146: 353-8. 10.1016/j.cell.2011.07.014.PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.