Prediction of novel miRNAs and associated target genes in Glycine max
© Joshi et al; licensee BioMed Central Ltd. 2010
Published: 18 January 2010
Small non-coding RNAs (21 to 24 nucleotides) regulate a number of developmental processes in plants and animals by silencing genes using multiple mechanisms. Among these, the most conserved classes are microRNAs (miRNAs) and small interfering RNAs (siRNAs), both of which are produced by RNase III-like enzymes called Dicers. Many plant miRNAs play critical roles in nutrient homeostasis, developmental processes, abiotic stress and pathogen responses. Currently, only 70 miRNA have been identified in soybean.
We utilized Illumina's SBS sequencing technology to generate high-quality small RNA (sRNA) data from four soybean (Glycine max) tissues, including root, seed, flower, and nodules, to expand the collection of currently known soybean miRNAs. We developed a bioinformatics pipeline using in-house scripts and publicly available structure prediction tools to differentiate the authentic mature miRNA sequences from other sRNAs and short RNA fragments represented in the public sequencing data.
The combined sequencing and bioinformatics analyses identified 129 miRNAs based on hairpin secondary structure features in the predicted precursors. Out of these, 42 miRNAs matched known miRNAs in soybean or other species, while 87 novel miRNAs were identified. We also predicted the putative target genes of all identified miRNAs with computational methods and verified the predicted cleavage sites in vivo for a subset of these targets using the 5' RACE method. Finally, we also studied the relationship between the abundance of miRNA and that of the respective target genes by comparison to Solexa cDNA sequencing data.
Our study significantly increased the number of miRNAs known to be expressed in soybean. The bioinformatics analysis provided insight on regulation patterns between the miRNAs and their predicted target genes expression. We also deposited the data in a soybean genome browser based on the UCSC Genome Browser architecture. Using the browser, we annotated the soybean data with miRNA sequences from four tissues and cDNA sequencing data. Overlaying these two datasets in the browser allows researchers to analyze the miRNA expression levels relative to that of the associated target genes. The browser can be accessed at http://digbio.missouri.edu/soybean_mirna/.
Identification of the complete set of miRNAs and other small regulatory RNAs in organisms is essential with regard to our understanding of genome organization, genome biology, and evolution . There are three important classes of endogenous small RNAs in plants, animal or fungi: micro RNAs (miRNAs), short interfering RNAs (siRNAs) and piwi-interacting RNAs (piRNAs). In plants, there are no known piRNA.
MicroRNAs (miRNAs) are small 18-24 nucleotide regulatory RNAs that play very important roles in post-transcriptional gene regulation by directing degradation of mRNAs or facilitating repression of targeted gene translation [4, 5]. While siRNA are processed from longer double stranded RNA molecules and represent both strands of the RNA, miRNAs originate from hairpin precursors formed from one RNA strand [6, 7]. The hairpin precursors (pre-miRNA) are typically around ~60-70 bp in animals, but somewhat larger, ~90-140 bp in plants. In plants, helped by RNA polymerase II, miRNA gene is first transcribed into pri-miRNA. The pri-miRNAs are cleaved to miRNA precursors (pre-miRNA), which form a characteristic hairpin structure, catalyzed by Dicer-like enzyme (DCL1) [7, 8]. The pre-miRNA is further cleaved to a miRNA duplex (miRNA:miRNA*), a short double-stranded RNA (dsRNA) . The dsRNA is then exported to cytoplasm by exportin-5. Helped by AGO1, single-strand mature miRNA will form a RNA-protein complex, named RNA-induced silencing complex (RISC), which negatively regulates gene expression by inhibiting gene translation or degrading mRNAs by perfect or near-perfect complement to target mRNAs [10, 11].
Although some soybean miRNA were previously identified , the number was small and, therefore, the identification of all soybean miRNAs is far from complete. The aim of this study is to expand the collection of miRNAs expressed in soybean by using a deep sequencing approach with the Illumina Solexa platform. Towards this, we generated Solexa cDNA sequencing data for root, nodule and flower tissues since they are all relevant soybean organs to various studies in legume biology and due to their impact on soybean yield. One of the legume-specific traits is the symbiosis existing between the legume root and soil bacteria leading to the nodule. We think the small RNA content of soybean nodules needs to be established since research in other legume species showed a role for small RNA in nodule development [13, 14]. Root tissue is another important organ to analyze due to its role in nutrient-water absorption, which is clearly important to soybean yield. Finally, we selected flower for its direct impact on soybean seed yield. We constructed the small RNA libraries, prepared from these four different soybean tissues and each library was sequenced individually, generating a total of over one million sequences per library. We developed a bioinformatics pipeline using in-house developed scripts and other publicly available RNA structure prediction tools to differentiate the authentic mature miRNA sequences from other small RNAs and short RNA fragments represented in the sequencing data. We also conducted a detailed analysis of predicted miRNA target genes and correlated the miRNA expression data to that of the corresponding target genes using Solexa cDNA sequencing data.
Sources of sequences and assemblies
Illumina's SBS sequencing technology was utilized to generate high-quality reads from four different soybean tissues, including root, seed, flower, and nodule. The Gmax1.01 release version genomic sequences and gene model predictions of Williams 82 soybean genome were acquired from Phytozome  and used as a reference genome.
Small RNA library construction and SBS sequencing
Soybean (Glycine max L. Merr.) cultivar Williams 82 were planted in 5 L pots containing a mixture of two parts Metro-Mix 360 (Scotts-Sierra Horticultural Products Co., Marysville, Ohio) and 1 part vermiculite (Strong-Lite Medium Vermiculite, Sun Gro Horticulture Co, Seneca Illinois). Plants were grown under controlled environmental conditions in a greenhouse, with a temperature regime of 22 ± 3°C/day and 20 ± 3°C/night, and relative humidity ranging from 45% to 65% throughout the day/night cycle. Sunlight was supplemented with metal halide lamps, set to a 15-h day, 9-h night cycle (lights on at 700 h). The soil mixture was kept moist by an automated drip irrigation system, which delivered nutrient solution twice a day (younger plants) or three times a day (older plants). The nutrient solution contained the following concentrations of mineral salts: 1 mM KNO3, 0.4 mM Ca(NO3)2, 0.1 mM MgSO4, 0.15 mM KH2PO4 and 25 μM CaCl2, 25 μM H3BO3, 2 μm MnSO4, 2 μM ZnSO4, 0.5 μM CuSO4, 0.5 μM H2MoO4, 0.1 μM NiSO4, 1 μM Fe(III)-/N/,/N'-/ethylenebis [2-(2-hydroxyphenyl)-glycine]. Flowers were collected at -2 to 2 days after anthesis. Seeds were harvest at 20 days post anthesis. Root tissues were collected from 3 weeks-old soybean seedlings, which were grown with nitrogen and 1/2 lullien  solution in cassions. To collect nodules, soybean plants were grown in aeropontic growth champers for 7 days in 1/2 × Lullien medium , with no nitrogen and inoculated with Bradyrhizobium japonicum USDA110 to induce nodule formation. Nodules were collected 7, 14, and 21 days after inoculation and pooled for RNA isolation. Total RNA of different tissues were isolated by TRIzol reagents (Invitrogen). Small RNA libraries were constructed and sequenced as previously described .
Bioinformatics analyses of sequencing data
The sequencing reads were generated for all four libraries along with base quality score information. The sequences were initially trimmed for adapter sequences with an in-house trimming procedure and raw abundance read counts were calculated for all unique sequences for each library individually. This final set of reads and corresponding read counts for each library were all combined to generate a list of unique sequences, which are referred to as sequence tags henceforth. These sequence tags were then mapped to the soybean genome assembly using the stand-alone version of Megablast software , and the corresponding matching chromosome number, positions and strand were recorded for those mapped to the genome with no mis-matches in sequence tags. The sequence tags not passing this mapping criterion were excluded from further analysis.
Statistics of sequenced tags matching genome, tRNA and rRNA
Tags with genome hits
Distinct tags with genome hits
Tags with genome hits unique to library
Tags with tRNA hits
Distinct tags with tRNA hits
Tags with rRNA hits
Distinct tags with rRNA hits
# Sum_use (tags with non tRNA, rRNA genomic hits)
where n_base is a million (1,000,000); #sum_use is number of tags found to hit genomic positions of non-tRNA and non-rRNA regions. To ensure that we were working with good quality and expressed small RNA sequences, we further removed any sequence tags where the sum of TPM abundance from all the four libraries was <20 TPM . At the conclusion of these filtering steps, 113,399 sequence tags were considered in our further analysis.
The candidate miRNA and miRNA* should come from opposite stem-arms and must be entirely within the arm of the hairpin;
miRNA::miRNA* duplex mismatches were restricted to four or fewer;
The frequency of asymmetric bulges is restricted to less than one and the size should be less than 2 bases.
Following these stringent filtering criteria, we identified 129 putative miRNA sequence tags that were compared against the downloaded miRBase , containing known miRNA sequences for soybean and other genomes, using FASTA to identify the already known miRNA and differentiate them from the novel miRNA. We identified a total of 129 putative miRNA sequences, of which 42 matched the already known miRNAs in miRBase while 87 were novel miRNAs.
miRNA family assignment
The 129 predicted miRNA sequences were further assigned to miRNA families using sequence similarity to other known miRNA (including already known soybean miRNA) in the miRBase database. The putative miRNA sequences, along with the known soybean miRNA sequences, were used for multiple alignments using ClustalW  and the miRNA families were assigned based on the dendogram tree. 42 miRNAs were assigned families based on matches to known miRNAs, while the 87 novel miRNAs will get new families using miRBase Registry.
miRNA target gene prediction and experimental validation
The 129 predicted miRNA sequences were later utilized for miRNA target identification using an in-house plant target prediction program following standard rules of miRNA-mRNA interactions [26, 27]. The filtering criteria were: a mismatch is given a score of 1, a wobble (G:U mismatch) is given a score of 0.5, and a bulge is given a score of 2. The final entry score was set to 4. At the same time, positions 10 and 11 of miRNA must perfectly match to its target, and there should be no more than one mismatch in miRNA position 2 to 9. Overall, we identified 603 target genes for 78 of the identified miRNA. Further analysis showed that target genes could be predicted for ~73% of known miRNAs and ~54% of novel miRNAs. No targets could be predicted for the remaining ~26% of known miRNA and ~45% of novel miRNA using the above stringent score cutoff. To validate the miRNA-cleavage site, PolyA RNA was extracted and a 5'-RACE reaction was performed using a Gene Racer core kit (Invitrogen, Carlsbad, CA) following previously published methods .
Data display and data integration
We deposited the data in a soybean genome browser based on the architecture provided by USCS genome browser. The Gmax1.01 release version genomic sequences and gene model predictions were downloaded from Phytozome and formatted to be used as annotations for our soybean genome browser. The mapped chromosome number and position information of the four small RNA library sequence tags was converted into a BED format and uploaded to the browser as user tracks. The final putative miRNA list was also converted to BED format and also color-coded based on expression pattern, as well as miRNA length.
Solexa transcriptome data analysis and integration
We also generated Solexa cDNA sequencing data for three out of the four soybean tissues (root, nodule and flower) that were used for small RNA library construction. In addition, we also have Solexa cDNA reads derived from mRNA isolated from green pods. The raw sequencing reads for the four tissues were aligned to the soybean reference genome acquired from Phytozome using the MAQ-0.6.6 version software . The aligned read positions were further converted into the WIG format and uploaded to the soybean genome browser as user tracks for the four tissues individually. The transcriptomic data for the libraries were normalized against the total number of soybean reads identified in each tissue and analysed to generate the expression abundance values for all the genes in soybean. The expression values of the miRNA were compared against the miRNA target gene abundance values to provide some valuable insight into the predicted target gene regulation by the respective miRNA. We calculated Spearman correlation coefficients between the two and looked specifically into any miRNA-target gene pairs with a strong negative correlation.
Overview of small RNA sequencing results
Novel and known miRNA identification
miRNA target gene prediction
Soybean Genome Browser and Solexa transcriptomics data integration
The soybean genome browser developed utilizing the architecture provided by the UCSC Genome Browser allows for incorporation of the small RNA data, along with soybean transcriptome sequencing data. Overlaying these two datasets along side the soybean gene model predictions facilitates a complete view of the small RNA library distribution along entire chromosomes in one view. It also allows biologists to compare the miRNA expression levels to that of the predicted target genes to get valuable insight into the regulation patterns.
Sequencing of four small RNA libraries in soybean which generated over one million sequencing reads per library and its subsequent bioinformatics analyses to identify the authentic known and novel miRNA added 87 new miRNA to the list of known soybean miRNAs. This study encompasses many more soybean tissues than those examined by earlier studies and also provides a unique opportunity to study the relationship between the miRNA expression levels and their regulation of the corresponding target genes utilizing the Solexa cDNA sequencing data derived from the same tissues at the same time. The visualization of the small RNA libraries data alongside the transcriptomics data in the genome browser can help biologists to better understand the dynamics of gene regulation. The many hypothesis generated from this relationship can help advance our understanding of miRNA target gene regulation in the future. As more miRNAs are discovered and their target genes identified, finding biological roles for these interactions enables deeper understanding of gene regulation and of the roles of both the miRNA and its target genes in plant development.
TJ and DX were supported by United Soybean Board. ZY, ML and GS were supported by a grant from the National Science Foundation, Plant Genome Research Program, #DBI-0421620). Work on legume small RNAs in the labs of PJG, DJS and BCM was supported by USDA award 2006-03567. We would also like to thank Robin Kramer for setting up and maintaining the genome browser locally, where the data of this study were deposited.
This article has been published as part of BMC Bioinformatics Volume 11 Supplement 1, 2010: Selected articles from the Eighth Asia-Pacific Bioinformatics Conference (APBC 2010). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/11?issue=S1.
- Brodersen P, Voinnet O: The diversity of RNA silencing pathways in plants. Trends Genet 2006, 22: 268–280. 10.1016/j.tig.2006.03.003View ArticlePubMedGoogle Scholar
- Lippman Z, Martienssen R: The role of RNA interference in heterochromatic silencing. Nature 2004, 431: 364–370. 10.1038/nature02875View ArticlePubMedGoogle Scholar
- Zhang BH, Pan XP, Cannon CH, Cobb GP, Anderson TA: Identification and characterization of new plant microRNAs using EST analysis. Cell Res 2005, 15: 336–360. 10.1038/sj.cr.7290302View ArticlePubMedGoogle Scholar
- Carrington JC, Ambros V: Role of microRNAs in plant and animal development. Science 2003, 301: 336–338. 10.1126/science.1085242View ArticlePubMedGoogle Scholar
- Glazov EA, Cottee PA, Barris WC, Moore RJ, Dalrymple BP, Tizard ML: A microRNA catalog of the developing chicken embryo identified by a deep sequencing approach. Genome Res 2008, 18: 957–64. 10.1101/gr.074740.107PubMed CentralView ArticlePubMedGoogle Scholar
- He L, Hannon GJ: MicroRNAs: Small RNAs with a big role in gene regulation. Nat Rev Genet 2004, 5: 522–531. 10.1038/nrg1379View ArticlePubMedGoogle Scholar
- Chapman EJ, Carrington JC: Specialization and evolution of endogenous small RNA pathways. Nat Rev Genet 2007, 8: 884–896. 10.1038/nrg2179View ArticlePubMedGoogle Scholar
- Meyers BC, Axtell MJ, Bartel B, Bartel DP, Baulcombe D, Bowman JL, Cao X, Carrington JC, Chen X, Green PJ, et al.: Criteria for annotation of plant MicroRNAs. Plant Cell 2008, 20: 3186–3190. 10.1105/tpc.108.064311PubMed CentralView ArticlePubMedGoogle Scholar
- Hannon GJ: RNA interference. Nature 2002, 418: 244–251. 10.1038/418244aView ArticlePubMedGoogle Scholar
- Matzke M, Matzke AJM, Kooter JM: RNA: guiding gene silencing. Science 2001, 293: 1080–1083. 10.1126/science.1063051View ArticlePubMedGoogle Scholar
- Zhu JK: Reconstituting plant miRNA biogenesis. PNAS 2008, 105: 9851–9852. 10.1073/pnas.0805207105PubMed CentralView ArticlePubMedGoogle Scholar
- Subramanian S, Fu Y, Sunkar R, Barbazuk WB, Zhu JK, Yu O: Novel and nodulation-regulated microRNAs in soybean roots. BMC Genomics 2008, 9: 160–174. 10.1186/1471-2164-9-160PubMed CentralView ArticlePubMedGoogle Scholar
- Boualem A, Laporte P, Jovanovic M, Laffont C, Plet J, Combier JP, Niebel A, Crespi M, Frugier F: MicroRNA166 controls root and nodule development in Medicago truncatula. Plant J 2008, 54: 876–887. 10.1111/j.1365-313X.2008.03448.xView ArticlePubMedGoogle Scholar
- Combier JP, Frugier F, de Billy F, Boualem A, El-Yahyaoui F, Moreau S, Vernie T, Ott T, Gamas P, Crespi M, et al.: MtHAP2–1 is a key transcriptional regulator of symbiotic nodule development regulated by microRNA169 in Medicago truncatula. Genes Dev 2006, 20: 3084–3088. 10.1101/gad.402806PubMed CentralView ArticlePubMedGoogle Scholar
- Lullien V, Barker DG, de Lajudie P, Huguet T: Plant gene expression in effective and ineffective root nodules of alfalfa (Medicago sativa). Plant Mol Biol 1987, 9: 469–478. 10.1007/BF00015878View ArticlePubMedGoogle Scholar
- Nobuta K, Lu C, Shrivastava R, Pillay M, De Paoli E, Accerbi M, Arteaga-Vazquez M, Sidorenko L, Jeong DH, Yen Y, et al.: Distinct size distribution of endogenous siRNAs in maize: Evidence from deep sequencing in the mop1–1 mutant. PNAS 2008, 105: 14958–14963. 10.1073/pnas.0808066105PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang Z, Schwartz S, Wagner L, Miller W: A greedy algorithm for aligning DNA sequences. J Comput Biol 2000, 7: 203–14. 10.1089/10665270050081478View ArticlePubMedGoogle Scholar
- Lowe TM, Eddy SR: tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence. Nucl Acids Res 1997, 25: 955–964. 10.1093/nar/25.5.955PubMed CentralView ArticlePubMedGoogle Scholar
- Chan PP, Lowe TM: GtRNAdb: A database of transfer RNA genes detected in genomic sequence. Nucl Acids Res 2008, (37 Database):D93-D97.Google Scholar
- Lagesen K, Hallin PF, Rodland E, Staerfeldt HH, Rognes T, Ussery DW: RNammer: consistent annotation of rRNA genes in genomic sequences. Nucleic Acids Res 2007, 35: 3100–8. 10.1093/nar/gkm160PubMed CentralView ArticlePubMedGoogle Scholar
- Meyers BC, Tej SS, Vu TH, Haudenschild CD, Agrawal V, Edberg SB, Ghazal H, Decola S: The use of MPSS for whole-genome transcriptional analysis in Arabidopsis. Genome Res 2004, 14: 1641–1653. 10.1101/gr.2275604PubMed CentralView ArticlePubMedGoogle Scholar
- Markham NR, Zuker M: DINAMelt web server for nucleic acid melting prediction. Nucleic Acids Res 2005, 33: W577-W581. 10.1093/nar/gki591PubMed CentralView ArticlePubMedGoogle Scholar
- Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ: miRBase: tools for microRNA genomics. Nucleic Acids Res 2008, (36 Database):D154-D158.Google Scholar
- Higgins D, Thompson J, Gibson T, Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 1994, 22: 4673–4680. 10.1093/nar/22.22.4673PubMed CentralView ArticlePubMedGoogle Scholar
- Allen E, Xie Z, Gustafson AM, Carrington JC: microRNA-directed phasing during transacting siRNA biogenesis in plants. Cell 2005, 121: 207–221. 10.1016/j.cell.2005.04.004View ArticlePubMedGoogle Scholar
- Schwab R, Palatnik JF, Riester M, Schommer C, Schmid M, Weigel D: Specific effects of microRNAs on the plant transcriptome. Developmental cell 2005, 8: 517–27. 10.1016/j.devcel.2005.01.018View ArticlePubMedGoogle Scholar
- Maq: Mapping and Assembly with Qualities[http://maq.sourceforge.net/index.shtml]
- Hogg RV, Craig AT: Introduction to Mathematical Statistics. 5th edition. New York: Macmillan; 1995:338–400.Google Scholar
- Baskerville S, Bartel DP: Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA 2005, 11: 241–247. 10.1261/rna.7240905PubMed CentralView ArticlePubMedGoogle Scholar
- Wang YP, Li KB: Correlation of expression profiles between microRNAs and mRNA targets using NCI-60 data. BMC Genomics 2009, 10: 218–231. 10.1186/1471-2164-10-218PubMed CentralView ArticlePubMedGoogle Scholar
- Dugas DV, Bartel B: MicroRNA regulation of gene expression in plants. Current Opinion in Plant Biology 2004, 7: 512–520. 10.1016/j.pbi.2004.07.011View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.