miRPlant: an integrated tool for identification of plant miRNA from RNA sequencing data
© An et al.; licensee BioMed Central Ltd. 2014
Received: 31 March 2014
Accepted: 1 August 2014
Published: 12 August 2014
Small RNA sequencing is commonly used to identify novel miRNAs and to determine their expression levels in plants. There are several miRNA identification tools for animals, such as miRDeep, miRDeep2 and miRDeep*. miRDeep-P was developed to identify plant miRNA using miRDeep’s probabilistic model of miRNA biogenesis, but it depends on several third-party tools and lacks a user-friendly interface. The objective of our miRPlant program is to predict novel plant miRNA with improved accuracy while providing a user-friendly interface.
We have developed a user-friendly plant miRNA prediction tool called miRPlant. Using 16 plant miRNA datasets from four different plant species, we show that miRPlant improves accuracy by at least 10% compared to miRDeep-P, which is the most popular plant miRNA prediction tool. Furthermore, miRPlant uses a Graphical User Interface for data input and output, and identified miRNAs are shown with all RNAseq reads in a hairpin diagram.
We have developed miRPlant, which extends miRDeep* to various plant species by adopting strategies suited to plants for identifying hairpin excision regions and for filtering hairpin structures. miRPlant does not require any third-party tools such as mapping or RNA secondary structure prediction tools. miRPlant is also the first plant miRNA prediction tool that dynamically plots the miRNA hairpin structure with small reads for identified novel miRNAs. This feature will enable biologists to visualize novel pre-miRNA structure and the location of small RNA reads relative to the hairpin. Moreover, miRPlant can be easily used by biologists with limited bioinformatics skills.
miRPlant and its manual are freely available at http://www.australianprostatecentre.org/research/software/mirplant or http://sourceforge.net/projects/mirplant/.
miRNA is a class of non-coding endogenous small RNA that post-transcriptionally regulates target genes. miRDeep-P is one of the most commonly used computational plant miRNA identification tools and is based on the miRDeep algorithm.
The most challenging problem in identifying novel plant miRNA is to find a suitable genomic region as a miRNA precursor candidate (to test whether it forms a hairpin), because the majority of precursor miRNAs in plants are between 100 and 200 bp long, which is much longer than those in animals. Approaches using a shorter candidate region may result in false negatives if the precursor miRNA is longer and more variable than the predicted precursor region. Conversely, using a longer candidate precursor region to test whether it forms a hairpin structure may result in a non-complementary match for the mature miRNA within the candidate precursor miRNA. Thus, in miRPlant, after small RNA sequencing reads are mapped to the genome, genomic regions around mapped reads are extended by 200 bp to determine whether they form hairpin structures. To ensure detection of short plant miRNA precursors, we also scan 100 bp regions to see whether a hairpin can be detected. This strategy can detect bona fide miRNAs that would otherwise be missed if only the longer (200 bp) precursor candidate length was used.
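The two-length excision strategy described above can be illustrated with a minimal sketch. It is not miRPlant's actual code: the function name `candidate_windows`, the 0-based half-open coordinates, and the choice to try the read on either arm of the hairpin (extending downstream and upstream) are assumptions made for illustration. Only the 200 bp and 100 bp extension lengths come from the text.

```python
def candidate_windows(genome_length, read_start, read_end, flanks=(200, 100)):
    """Return candidate precursor windows around one mapped read.

    Coordinates are 0-based and half-open: (start, end).
    Assumption (not from the paper): for each flank size, the read is tried
    on the 5' arm (extend downstream) and on the 3' arm (extend upstream).
    """
    windows = []
    for flank in flanks:
        # Read on the 5' arm: extend past the read's 3' end.
        downstream = (read_start, min(genome_length, read_end + flank))
        # Read on the 3' arm: extend before the read's 5' start.
        upstream = (max(0, read_start - flank), read_end)
        for window in (downstream, upstream):
            # Clipping at chromosome ends can produce duplicates; keep one copy.
            if window not in windows:
                windows.append(window)
    return windows


# A 21 nt read mapped at positions 500..521 of a 1000 bp sequence.
for start, end in candidate_windows(1000, 500, 521):
    print(start, end, end - start)
```

Each window would then be passed to a secondary structure predictor, and a read would be retained as a candidate if any window, long or short, folds into an acceptable hairpin.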
- Filter out reads whose length falls outside the 10-23 bp range, or whose read quality is below the threshold set by the user.
- Aggregate identical reads into one.
- Map aggregated reads to the reference genome without mismatches. miRPlant uses a Java implementation of the bowtie alignment algorithm. BAM format is used to store mapped reads. Please note that the attribute “XS” in the BAM file is used to record the copy number of the read, as introduced by miRDeep*.
- Gather the sequences in the reference genome flanking the RNAseq read (the precursor miRNA region) and determine whether the genomic region forms a hairpin structure using the RNA secondary structure algorithm.
- Use the miRDeep model to calculate a score for each predicted miRNA to measure the strength of the prediction. A higher score equates to a higher probability that the predicted miRNA is true.
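The first two steps above (length and quality filtering, then aggregation of identical reads) can be sketched as follows. This is an illustrative sketch under stated assumptions, not miRPlant's implementation: the function name `filter_and_collapse`, the input shape (sequence plus per-base Phred scores), and the use of the minimum per-base score as the quality criterion are all hypothetical. The 10-23 length range is taken from the text.

```python
from collections import Counter


def filter_and_collapse(reads, min_len=10, max_len=23, min_quality=20):
    """Filter reads by length and quality, then count identical sequences.

    reads: iterable of (sequence, qualities), where qualities is a list of
    per-base Phred scores.
    Assumption (not from the paper): a read fails the quality filter if any
    base scores below min_quality.
    """
    counts = Counter()
    for sequence, qualities in reads:
        # Step 1a: discard reads outside the allowed length range.
        if not (min_len <= len(sequence) <= max_len):
            continue
        # Step 1b: discard reads containing a low-quality base.
        if qualities and min(qualities) < min_quality:
            continue
        # Step 2: identical reads collapse into one entry with a copy number.
        counts[sequence] += 1
    return counts


reads = [
    ("TGACAGAAGAGAGTGAGCAC", [30] * 20),
    ("TGACAGAAGAGAGTGAGCAC", [32] * 20),
    ("ACGT", [30] * 4),                          # too short
    ("TTGACAGAAGATAGAGAGCAC", [30] * 20 + [5]),  # one low-quality base
]
for sequence, copies in filter_and_collapse(reads).items():
    print(sequence, copies)
```

The copy number kept for each collapsed sequence plays the role that the text assigns to the “XS” attribute once the aggregated reads are mapped and stored in BAM format.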
Results and discussion
Comparison table (ATH, MTR, PPE)
A. thaliana (number of known miRNAs: 121)
M. truncatula (number of known miRNAs: 196)
P. persica (number of known miRNAs: 75)
Availability and requirements
Project name: miRPlant.
Project home page: http://www.australianprostatecentre.org/research/software/mirplant.
Operating system(s): Windows, Linux, Mac OS.
Programming language: Java.
Other requirements: JRE.
License: GNU General Public License.
Any restrictions to use by non-academics: None.
This work was supported by The Commonwealth Government of Australia, Department of Health and Queensland State Government, Smart Futures Premier Fellowship - Colleen C Nelson.
- Pritchard CC, Cheng HH, Tewari M: MicroRNA profiling: approaches and considerations. Nat Rev Genet. 2012, 13 (5): 358-369. 10.1038/nrg3198.
- Yang X, Li L: miRDeep-P: a computational tool for analyzing the microRNA transcriptome in plants. Bioinformatics. 2011, 27 (18): 2614-2615.
- Friedlander MR, Chen W, Adamidi C, Maaskola J, Einspanier R, Knespel S, Rajewsky N: Discovering microRNAs from deep sequencing data using miRDeep. Nat Biotechnol. 2008, 26 (4): 407-415. 10.1038/nbt1394.
- Meyers BC, Axtell MJ, Bartel B, Bartel DP, Baulcombe D, Bowman JL, Cao X, Carrington JC, Chen X, Green PJ, Griffiths-Jones S, Jacobsen SE, Mallory AC, Martienssen RA, Poethig RS, Qi Y, Vaucheret H, Voinnet O, Watanabe Y, Weigel D, Zhu JK: Criteria for annotation of plant MicroRNAs. Plant Cell. 2008, 20 (12): 3186-3190. 10.1105/tpc.108.064311.
- An J, Lai J, Lehman ML, Nelson CC: miRDeep*: an integrated application tool for miRNA identification from RNA sequencing data. Nucleic Acids Res. 2013, 41 (2): 727-737. 10.1093/nar/gks1187.
- Friedlander MR, Mackowiak SD, Li N, Chen W, Rajewsky N: miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res. 2012, 40 (1): 37-52. 10.1093/nar/gkr688.
- Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10 (3): R25. 10.1186/gb-2009-10-3-r25.
- Hofacker IL: Vienna RNA secondary structure server. Nucleic Acids Res. 2003, 31 (13): 3429-3431. 10.1093/nar/gkg599.
- Zhu QH, Spriggs A, Matthew L, Fan L, Kennedy G, Gubler F, Helliwell C: A diverse set of microRNAs and microRNA-like small RNAs in developing rice grains. Genome Res. 2008, 18 (9): 1456-1465. 10.1101/gr.075572.107.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.