isomiR-SEA: an RNA-Seq analysis tool for miRNAs/isomiRs expression level profiling and miRNA-mRNA interaction sites evaluation
© Urgese et al. 2016
Received: 27 August 2015
Accepted: 19 February 2016
Published: 31 March 2016
Massive parallel sequencing of transcriptomes, revealed the presence of many miRNAs and miRNAs variants named isomiRs with a potential role in several cellular processes through their interaction with a target mRNA. Many methods and tools have been recently devised to detect and quantify miRNAs from sequencing data. However, all of them are implemented on top of general purpose alignment methods, thus providing poorly accurate results and no information concerning isomiRs and conserved miRNA-mRNA interaction sites.
To overcome these limitations we present a novel algorithm named isomiR-SEA, that is able to provide users with very accurate miRNAs expression levels and both isomiRs and miRNA-mRNA interaction sites precise classifications. Tags are mapped on the known miRNAs sequences thanks to a specialized alignment algorithm developed on top of biological evidence concerning miRNAs structure. Specifically, isomiR-SEA checks for miRNA seed presence in the input tags and evaluates, during all the alignment phases, the positions of the encountered mismatches, thus allowing to distinguish among the different isomiRs and conserved miRNA-mRNA interaction sites.
isomiR-SEA performances have been assessed on two public RNA-Seq datasets proving that the implemented algorithm is able to account for more reliable and accurate miRNAs expression levels with respect to those provided by two compared state of the art tools. Moreover, differently from the few methods currently available to perform isomiRs detection, the proposed algorithm implements the evaluation of isomiRs and conserved miRNA-mRNA interaction sites already in the first alignment phases, thus avoiding any additional filtering stages potentially responsible for the loss of useful information.
miRNAs are nowadays recognized as a class of evolutionally conserved, single-stranded, 19–23 nucleotides (nt) long non-coding RNAs acting, both in plants and animals, as post-transcriptional regulators. Genes codifying for miRNAs can be localized at different sites into the genome even if, in humans, the majority of them have been detected into the transcripts intronic regions. The involvement of miRNAs in almost every physiological process such as cell fate determination, proliferation, cell death and in many cellular activities among which immune response, insulin secretion, neurotransmitter synthesis, circadian rhythm and viral replication is widely proven [1–4]. Furthermore, recent researches have shown deregulated miRNAs expression in many pathologies such as cardiovascular diseases , autoimmune disorders  and cancers with implications in numerous oncogenic or tumor suppressor pathways [7–9].
Recently, thanks to the advent of Next Generation Sequencing (NGS) technologies, it is possible to carefully detect and quantify the amount of both known and novel miRNAs in RNA-Seq samples, accounting also for single nucleotide polymorphisms (SNPs) [10–13]. The high quality of data coming from NGS technologies has been exploited to gain novel insights into miRNAs variants called isomiRs. isomiRs are miRNAs variants that originate from miRNAs loci as consequence of specific processes as exoribonucleases or nucleotidyl transferase activity, RNA editing, SNPs or imprecise cleavage by the ribonucleases Drosha and Dicer . Accordingly to Neilsen and colleagues research , isomiRs can be distinguished into three main classes: 3’ isomiRs, 5’ isomiRs and polymorphic (SNP) isomiRs. Specifically the first have been identified in different studies [16–18] as the most common and abundant variants in both plants and animals. Recently the functionality of 3’ isomiRs has been investigated with regard to compensatory base pairing at the 3’ end  and distinct silencing functions attributed to different miRNA family members with shared seed sequences but divergent 3 ends . Fernandez and colleagues  proven the variation of isomiRs expression levels in Drosophila Melanogaster, depending on the developmental stage and the specific tissue, thus suggesting a role for these miRNAs variants in different biological processes. In 2012 Li and colleagues  pointed out different isomiRs expression patterns in normal and tumour gastric tissues calling for isomiRs relevance during miRNAs analyses. Moreover, results from these studies seem to account for an impact of isomiRs on target selection, miRNA stability or a different loading into the RISC complex. In particular, with respect to isomiRs target selection, evidences of an impact of miRNA sequence variations on target mRNA selection have been recently experimentally validated by Tan and colleagues .
Many software tools have been developed during the last decade to extract miRNAs expression levels from RNA-Seq data. Among them, miRDeep [28, 29], miRanalyzer [30, 31], miRExpress , miRTRAP , DSAP , mirTools , MIReNA , miRNAkey  and mireap  have been recently compared showing quite different performances in terms of sensitivity and accuracy levels as well as poorly overlapping miRNAs output sets . Being developed on top of general purpose alignment algorithms as Bowtie  or BWA , these tools are not able to evaluate the encountered mismatches accordingly to their positions on miRNA sequence. This accounts for reduced sensitivity and accuracy levels in miRNAs detection and for their incapability in distinguishing between the different isomiRs and modifications in miRNA-mRNA interaction sites. For instance, a complete and uncorrupted seed sequence is fundamental to ensure the miRNA:mRNA complex stability . Differently from state of the art tools, isomiR Seed Extension Aligner (isomiR-SEA) exploits a miRNA specific alignment algorithm, able to correlate the encountered mismatches with their positions on the miRNA sequence and then to produce accurate miRNAs expression level quantification and both isomiRs and miRNA-mRNA interaction sites precise classification. All this information is collected during the alignment procedure, thus avoiding additional filtering stages potentially responsible for the loss of useful data. It is worth noting that the evaluation of the impact of isomiRs on the conserved miRNA-mRNA interaction sites has two main consequences. On the one hand it allows to clarify the impact of isomiRs on miRNAs functionalities thus gaining novel insights into miRNAs regulative behaviour. On the other hand, on the basis of the observed results, it is possible to decide whether to include or not detected isomiRs in the calculated miRNAs expression level.
In the “ Implementation ” Section additional algorithmic implementation details are provided. isomiR-SEA performances, assessed on two different public datasets, will be presented and discussed in the “ Results and discussion ”. This section is organized as follows: The details concerning the experimental set-up adopted to perform the analysis are given in the “ Testing procedure ” Subsection. In “ isomiR-SEA isomiRs and interaction sites detection ” Subsection are provided results obtained on the first dataset, that highlight the variety of information extracted by isomiR-SEA. The comparisons, in terms of computed expression levels between isomiR-SEA and two state of the art tools, performed on the second dataset, are shown in isomiR-SEA comparisons . Finally in “ Computational costs ” the computational costs proper of the three tools are discussed. In the “ Conclusions ” Section the final remarks deriving from the proposed work are highlighted.
The isomiR-SEA tool exploits a miRNA-tailored alignment procedure, named miR-SEA , that implements an accurate miRNA model derived from experimental evidences . The model is built upon the main features characterizing the seed sequence (red sequence in Fig. 1), which is nowadays recognized to play a fundamental role in target recognition and, as consequence, in genes regulation [43, 44]. isomiR-SEA keeps track of the results achieved at each step of the alignment procedure and analyses them in order to recognize miRNAs variants and characterize miRNA-mRNA interaction sites.
As a consequence of the addition or deletion of nucleotides and the presence of SNPs, remarkable effects on miRNAs target selection can be observed . Then the conservation in the tag sequence of miRNA-mRNA interaction sites is checked during isomiR-SEA alignment procedure. Figure 1 reports on the four interaction sites evaluated by isomiR-SEA: i) The seed sequence, a region located at nucleotides 2–7 shared by all the miRNAs belonging to the same family and known to be the most important interaction site between miRNA and its mRNA targets; ii) the 3’-supplementary site, a sequence corresponding to nucleotides 13–16 on the miRNA structure and characterized as sensitive to pairing geometry preferring at least 3–4 contiguous Watson-Crick pairs not interrupted by bulges, mismatches or wobbles; iii) the central site, a region on the miRNA corresponding to nucleotides 4–16 and iv) the offset site, at nucleotide 8 on the miRNA sequence. Moreover, in order to further improve the accuracy of the calculated expression levels and to correctly distinguish among miRNAs belonging to the same family, ambiguously mapped tags (i.e. tags assigned to more than one miRNA), are evaluated on the basis of their alignment scores and assigned to a unique miRNA.
A. Parameters setting and input files pre-processing
Twelve parameters allow users to trigger isomiR-SEA behaviour. Specifically, tags can be aligned on one or more species by imposing a minimum alignment length and the start/end positions where the seed has to be searched into the tag sequence. Moreover users can decide which alignments to discard by changing the value of a specific parameter introduced to evaluate miRNA-tag identity.
Two input files have to be provided to isomiR-SEA tool for the detection of miRNAs expression levels as shown in Fig. 3 a. The first file contains miRNAs reference sequences while the second stores the input tags table of counts (i.e. table reporting the occurrence in the sample of each tag) on which the analysis has to be carried out (green cylinders in Fig. 3).
miRNAs reference sequences have been downloaded from miRBase , a well known miRNAs repository currently at version 21. The file containing miRNAs sequences, downloaded from miRBase database, has to be processed in the so called mature DB seed-based sorting step, where all the seeds are firstly extracted from the miRNAs reference sequences, then classified depending on the species and finally grouped together. The obtained groups, sharing the same seed sequence, are then stored in an optimized data structure allowing its efficient access during the alignment procedure.
The second input file is the standard tags counts file. Generally this file is obtained by means of freely available tools performing the following steps: i) Trimming of adapter sequences on raw fastq or fasta reads, output of sequencing machines; ii) grouping in unique sequences called tags; iii) calculation of the occurrence of reads supporting the various tags. The output file, reporting tag sequences followed by their occurrence, is stored by isomiR-SEA in the main memory.
B. isomiR-SEA alignment procedure
Each miRNA seed is searched, in this phase, into each stored tags (as reported in block b of Fig. 3). Tags harbouring the seed are extended without gaps allowed in both directions (ungapped extension). This procedure is done for all the miRNAs sharing the same detected seed in a position compatible with miRNA structure . Tags not satisfying these conditions are currently discarded and will be not considered in the further steps of the analysis increasing, in this way, the computational speed. As future work, we are planning to re-map these tags on pre-miRNA sequences by adopting an accurate local aligner implementing DGS model .
The selected tags are further extended in 3’ direction allowing for the presence of a second mismatch. At this point, the distance of the second encountered mismatch with respect to the previous found, is evaluated. Tags characterized by two mismatches in the first 10 aligned nucleotides are discarded, whereas the others are further extended until the end of the alignment or a third mismatch. This threshold parameter can be modified by the user. However the default value has been chosen according to experimental insights .
These two scores can be adopted to select the most reliable alignment in case of tags ambiguously assigned to more than a miRNA.
As last step, the isomiRs, as well as the interaction sites, are annotated according to the position of the mismatches found during the alignment procedure.
C. isomiR-SEA output files generation
Nine output files are generated at each isomiR-SEA run in order to allow users to deeply investigate the performed alignments. Output files provide information both from a tag and from a miRNA view point.
In particular, if a tag has been mapped on a unique miRNA its information will be stored in a unique tag file, while if the tag aligned on more than a miRNA its data will be reported in a specific file storing the so called ambiguous tags. Some of these ambiguous tags will be then re-evaluated using the quality parameters (a l i g n score and m i R t a g diff ) and assigned to a unique miRNA. These tags are then stored in a third file.
These tag-related files provide users with all the isomiRs and miRNA-mRNA interaction sites detected starting from the same tag, the unique or multiple miRNAs on which the tag has been mapped and the two mapping parameters values previously described in Eqs. 1 and 2. Furthermore, in case of multi-mapped tags, indications relative to the different detected miRNAs are reported, such as the number of families to which they belong, the number of miRNAs on which the tag has been mapped and the scoring distance of the second best alignment detected from the best one reported.
From a miRNA point of view, two main outputs have been adopted to summarize the detected expression levels, according to the format used by the most known miRNAs discovery tools. The first file aggregates for the different miRNAs the information collected from the unique mapped tags whereas the second file, derived from the selected tags, reports on those tags that have been re-evaluated thanks to the alignment score parameters. Even in this case the information related to isomiRs and interaction sites is reported, together with data concerning the number of supporting reads and the computed alignment scores.
It is worth noting however that users are free to either consider only the miRNAs expression files to acquire the overall expression of the detected miRNAs and isomiRs, or to look at the tags files introducing the preferred filtering stages or grouping data according to their needs.
Concerning the implementation, isomiR-SEA tool has been developed in C++ programming language, taking advantage of the open source SeqAn library . This library provides users with algorithms and data structures for the analysis of biological sequences that ensure high performances and, at the same time, ease of integration in analysis pipelines.
Results and discussion
We report here the results of isomiR-SEA performance assessment on two datasets extensively analysed in previous researches by Somel and colleagues  [GEO:GSE18069] and Li et al.  [GEO:GSE31617, GEO:GSE24952] for which miRNAs and mRNAs sequencing data have been freely provided on GEO database repository . These two studies proven respectively the regulative roles of miRNAs in gene expression modulation throughout the entire life span of both humans and macaques and the general or cancer-specific involvement of different genes and miRNAs in the carcinomas of bladder, kidney and testis.
The experiments performed allowed us to show the capability of isomiR-SEA tool in providing a detailed picture of miRNAs, isomiRs and conserved miRNA-mRNA interaction sites characterizing the samples under investigation.
Finally isomiR-SEA performances have been compared with two state of the art tools, i.e. miRanalyzer [30, 31] and miRExpress . In the comparison, both miRNAs expression levels quality and computational times have been evaluated.
Both datasets have been analysed by: i) Adopting miRNAs reference databases, the miRBase release 11 (characterized by 6211 mature miRNAs from 74 species), used in the aforementioned works, and the latest release 21 that stores 35828 mature miRNAs sequences from 223 species; ii) using as an input the datasets reads trimmed allowing 2 or 3 mismatches in the adapter sequence; iii) setting the isomiR-SEA running parameters, i.e. seed length and minimum alignment threshold, respectively to 6 (from nt 2 to 7) and 11. In the following, where not explicitly stated, the reported results were obtained using human miRBase 21 as reference. Furthermore results reported in this section have been produced from reads trimmed allowing a maximum number of mismatches equal to 3 and collapsed to get the tags table of counts. Further details are reported in Additional file 1: Text S1.
isomiR-SEA isomiRs and interaction sites detection
Concerning Somel dataset, miRNAs reads were collected from the superior frontal gyrus of 12 healthy humans and 12 rhesus macaques of different ages, selected in a cohort of individuals. In agreement with previous Somel et. al researches , we will refer in the following to human developmental and ageing stage samples, respectively for data collected from <20 years old and >20 years old humans. This threshold has been fixed at 4 years for macaques.
Since the considered dataset includes two different species, in order to perform a more specific evaluation of human miRNAs expression levels, the reads alignment has been performed by adopting as reference the union of human and macaque miRBase sequences. This allowed to discard from the computed human miRNAs expression levels those tags better aligned on macaques miRNAs.
Results shown in the following have been obtained by adopting this experimental set-up and are discussed in this section to highlight trends associated to exact miRNAs, to different isomiRs, conserved miRNA-mRNA interaction sites and overall reads counts. In particular we report the results of the analysis of these behaviours in relation to ageing and developmental stages in both the species, pointing out the differences and similarities between them where present.
The account of isomiR types and their percentage is important in order to check the impact of the detected mismatches in the stability of miRNA-mRNA interaction sites (as it will be shown in few examples). These mismatches can compromise the effectiveness of miRNA activity. After this evaluation, the user can decide to discard those tags supporting isomiRs that can lead to unstable or weak interactions, thus obtaining a lower miRNA expression count.
Expression trends depending on the life stage are pointed out also in Fig. 4 b relatively to miR-154-3p: The percentage of mirna_exact reads in humans decreases during the developmental stage until a minimum equal to 64.5 % for Sample h09-13.99 at 14 years old and starts growing again during ageing. Furthermore, an interesting behaviour in iso_snp can be identified during the different stages: Their percentages grow until the transition point while they become smaller in the ageing stage.
Coming back to Fig. 4 d, it can be seen for miR-664a-5p, various isomiRs trends. In particular, human samples are characterized by an higher average percentage equal to 15.1 % of iso_snp-iso_3p with respect to macaques, where this percentage is 8.2 %. A similar behaviour, even if less evident, can be also detected for those isoforms classified as iso_5p-iso_snp-iso_3p. Furthermore, a relatively small number of mirna_exact reads contribute to the total reads number. This isomiRs spectrum is conserved in both species independently from age and developmental stage. The inter-species average values for mirna_exact reads are indeed equal to 0.5 % for both humans and macaques.
miR-664a-5p has not been identified by Somel as implicated in regulative processes occurring during development or ageing. The reduced percentages of exact mapped reads provided by isomiR-SEA could explain missed interaction sites and as a consequence justify miR-664a-5p limited regulative role. Two state of the art tools, i.e. miRanalyzer and miRExpress, which performances will be extensively discussed in the following subsection, provided for the considered miRNA an expression level higher than the one calculated by isomiR-SEA: Tags attributed by these tools to miR-664a-5p probably derive from isomiRs and so the produced expression level can not be correctly correlated to miRNA regulative functions.
The percentage of reads attributed to miR-34c-5p isomiRs (see Fig. 4 e) increases during ageing in humans, whereas a quite stable behaviour can be pointed out in macaques. miR-34c-5p, that has been proven to be differentially expressed during the developmental stage in humans, acts by down-regulating a group of genes involved in neuronal functions . The analysis of the miRNA-mRNA interaction sites in humans, as already observed for miR-29a-3p, highlighted the presence in the tags of almost all the interaction sites explaining miRNA capability in down-regulating target genes consistently with . In fact this miRNA is mainly identified by tags with exact mapping or tags with mismatches in 3p region that do not impact the interaction sites.
Finally, Fig. 4 f describes let-7e-5p isomiRs spectrum. An increasing percentage of reads attributed to iso_snp can be detected in older human samples, whereas quite stable trends are associated to the other identified isomiRs. In order to investigate to what extent miRNAs variations determining isomiRs impact on miRNA-mRNA interactions sites, we report in Fig. 7 b the percentages of reads accounting for let-7e-5p conserved interaction sites. A reduction of the percentage of reads attributed to all the three possible interaction modes is observed as age increases. The loss of interaction sites can be attributed to the presence of SNPs in the miRNA region corresponding to the central site thus highlighting a possible role of SNPs in target selection.
The second analysed dataset is composed of 27 patient-matched tumor-normal miRNA-Seq samples with ages varying from 30–70. In particular 10 samples derive from Transitional Cell Carcinoma (TCC), 7 from Testicular Germ Cell Tumor (TGCT) and the last 10 from clear cell Renal Cell Carcinoma (ccRCC) .
The results obtained by isomiR-SEA on this dataset have been compared to those provided by three state of the art tools, namely miRanalyzer [30, 31], miRExpress  and miRDeep2 [28, 29], running with the default configurations. The proposed analysis has been performed in order to show how the strategies implemented by the tools impact on the computation of miRNAs expression levels. In particular, miR-1244 has been selected over the other miRNAs because its alignment results in several isomiRs, and for this reason, miR-1244 is characterized by high variability in the expression levels provided by the various tools. Moreover, its observed low reads counts allowed to easily analyse the tags alignments and highlight the different characteristics of isomiR-SEA and the other two tools.
Referring to Sample T27N, all the tools have been able to attribute to miR-1244 those reads characterized by the presence of a complete seed sequence with exception of miRanalyzer that discarded those alignments having different nucleotides inserted in tag 5’-end, even in the presence of a complete seed sequence. On the other hand deletions in tag 3’-end in presence of complete seeds did not caused tags to be discarded. Furthermore, the first two tags characterized by the lack of a complete seed (reported with red background) are detected by all isomiR-SEA competitor tools. isomiR-SEA instead, being implemented to report tags alignments characterized by the presence of a complete seed sequence, discarded the last three alignments that present mismatches or deletions in the seed. The last red alignment highlights miRExpress sensitivity to mismatches: The tag has been indeed discarded because of a mismatch in the 5’-end.
In Sample T04N it is possible to note how tags discarded by isomiR-SEA, because they are missing the complete seed sequence, are instead identified by competitor tools even in presence of only two seed nucleotides. miRanalyzer discarded, also in this case, tags with nucleotides inserted in 5’-end even if characterized by a complete seed sequence. miRExpress as well as miRDeep2, reported all the tags containing seed sequence with exception of the fifth, probably because of the high number of mismatches encountered during its alignment on the whole miRNA sequence.
In Sample B05N miRExpress filtered out 5 out of 6 tags harbouring the seed because of a high number of mismatches. However, being these mismatches localized in the 3’-end of tag sequences, they do not account for the loss of interaction sites. As a consequence they cannot be considered responsible for changes in miRNA regulative behaviour and so could be counted, more properly, among miR-1244 supporting tags. Conversely, miRDeep2 detected all the alignments characterized by a complete seed sequence. The two green alignments in position one and three are detected by isomiR-SEA tool but then filtered out because their alignment size was shorter than the aforementioned threshold of 4 nucleotides. Indeed miRNA length is equal to 26 nucleotides whereas tags are respectively 21 and 20 nucleotides long.
However, being isomiR-SEA able (differently from the compared tools) to report in detail during all the alignment phases encountered matches/mismatches and insertions/deletions and to annotate the identified isomiRs, the combinations among them and the conserved interaction sites, the users can select: i) Which tags filter out, ii) which kind of isomiRs consider for miRNA expression evaluation and iii) how to group the data and perform statistics on the basis of the observed results.
All the alignments of Sample K07N are discarded by isomiR-SEA because tags missed the seed. Whereas concerning Sample B20N, two tags harbouring the seed have been detected by isomiR-SEA even in presence of SNPs.
With respect to Fig. 11 a, isomiR-SEA results account for a relevant number of miRNAs characterized by expression levels two order of magnitude higher than those provided by miRExpress tool. These miRNAs, detected as more expressed by isomiR-SEA tool, are supported by reads harbouring in their sequences the seed, but in the most of the cases present sequence variations with respect to the exact miRNA sequence. They are evaluated as isomiRs by isomiR-SEA. Otherwise, probably because of the high number of mismatches detected with respect to the miRNA sequence, they are not considered by miRExpress due to the inability proper of this tool to detect iso-SNP variants (as showed in Fig. 10).
The LCC relative to miRanalyzer and isomiR-SEA read counts are reported in Fig. 11 b. Differently from what was previously highlighted in relation to miRExpress and isomiR-SEA comparison, it is not possible here to detect a significant imbalance in the expression of the miRNAs detected by the two tools on the considered dataset. However miRanalyzer produces slightly higher expression levels because, as shown in the analysis reported in Fig. 10, it counts among miRNAs supporting tags also sequences characterized by the absence of seed sequence.
It is worth noting that, in the reported analysis, all the isomiRs were counted by isomiR-SEA in order to evaluate the miRNAs expression, filtering out only those tags with an incomplete seed sequence. However, differently from miRanalyzer and miRExpress, the user can decide to weight the different isomiRs in the expression evaluation or totally exclude some of them. Moreover, the prevalence of blue dots accounts for reads not exactly attributed to miRNAs and probably deriving from isomiRs. Thus, in order to discriminate between the different isomiRs and miRNAs and to correctly evaluate the conservation of miRNA-mRNA interaction sites and, as a consequence, the miRNA regulatory activity, it is necessary to use a tool able to provide accurate information about tags mapping.
Different miRNAs expression levels can be furthermore pointed out when the couple miRExpress:miRanalyzer LCCs is considered, as shown in Additional file 1: Text S4. In this case, it is not possible provide information about isomiRs and thus the tags supporting the different miRNAs have been considered as the exact miRNAs sequence and tags with mismatches in the seed have been also kept. The same analysis is reported in Additional file 1: Text S5 for Somel dataset.
As it is possible to note from Fig. 12 the most expensive tool, from a computational point of view, is miRExpress that required about 976 h to complete the test. Computational times significantly lower can be pointed out for miRanalyzer, that took about 34 h to finish the analysis. Further lower computational costs have been observed for isomiR-SEA that required about 25 h of computation.
In this paper we proposed a novel methodology and tool, named isomiR-SEA, able to provide users with a complete and accurate picture of the miRNAs, isomiRs and conserved miRNA:mRNA interaction sites characterizing the tissue under examination. Differently from state of the art tools that rely on general purpose alignment algorithms, isomiR-SEA is built on top of miRNAs specific features, among them, the seed sequence. Its work-flow basically consists in miRNAs seed sequence match followed by a series of ungapped extension in both 5’ and 3’ directions. Being able to keep track of all the performed alignment steps, the tool can evaluate all miRNAs variants, namely isomiRs, and to characterize miRNAs interaction sites.
Concerning isomiRs, isomiR-SEA provides the expression levels of iso_5p, iso_3p, iso_snp, iso_multi_snp or combinations of them and allows to distinguish which among supplementary, offset and central interaction sites are conserved in the tags.
isomiR-SEA provides as output two main types of files: The first contains information organized from a tag point of view whereas the second reports results sorted according to the specific miRNA. The proper format of both of them allows to easily process data for i) performing further statistical analyses, ii) filtering specific isomiRs, iii) implementing an isomiR-weighted expression analysis of the detected miRNAs.
Tests performed on the labelled data by Somel and colleagues  and Li et al.  allowed to highlight isomiR-SEA potential in extracting novel information related to well known miRNAs behaviours as well as its possible applications in different miRNAs studies, such as multi-species, developmental or cancer studies. isomiR-SEA results have been moreover compared to those provided by two well known tools, i.e. miRanalyzer and miRExpress, to prove the proper advantages of a miRNA specific alignment procedure. Finally the reduced computational costs make isomiR-SEA tool a viable solution for the analysis of large dataset and, as a consequence, a great alternative to design wide miRNAs studies.
Availability and requirements
Project name: isomiR-SEAProject home page: http://eda.polito.it/isomir-sea Operating systems: Linux, Windows, Mac OS XProgramming language: C, C++License: freely available for academic purposes
Computational resources were provided by HPC@POLITO , a project of Academic Computing within the Department of Control and Computer Engineering at the Politecnico di Torino.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004; 116(2):281–297.View ArticlePubMedGoogle Scholar
- Bartel DP. MicroRNAs: target recognition and regulatory functions. Cell. 2009; 136(2):215–233.View ArticlePubMedPubMed CentralGoogle Scholar
- Dong H, Lei J, Ding L, Wen Y, Ju H, Zhang X. MicroRNA: function, detection, and bioanalysis. Chemical Rev. 2013; 113(8):6207–6233.View ArticleGoogle Scholar
- Ha M, Kim VN. Regulation of microRNA biogenesis. Nat Rev Mol Cell Biol. 2014; 15(8):509–524.View ArticlePubMedGoogle Scholar
- Small EM, Frost RJ, Olson EN. MicroRNAs add a new dimension to cardiovascular disease. Circulation. 2010; 121(8):1022–1032.View ArticlePubMedPubMed CentralGoogle Scholar
- Pauley KM, Cha S, Chan EK. MicroRNA in autoimmunity and autoimmune diseases. J Autoimmun. 2009; 32(3):189–194.View ArticlePubMedPubMed CentralGoogle Scholar
- McManus MT. MicroRNAs and cancer. In: Seminars in cancer biology. vol. 13. Academic Press Elsevier: 2003. p. 253–258.Google Scholar
- Di Leva G, Croce CM. Roles of small RNAs in tumor formation. Trends Mol Med. 2010; 16(6):257–267.View ArticlePubMedPubMed CentralGoogle Scholar
- Iorio MV, Croce CM. MicroRNA dysregulation in cancer: diagnostics, monitoring and therapeutics. A comprehensive review. EMBO Mol Med. 2012; 4(3):143–159.View ArticlePubMedPubMed CentralGoogle Scholar
- Metzker ML. Sequencing technologies–the next generation. Nat Rev Genet. 2009; 11(1):31–46.View ArticlePubMedGoogle Scholar
- Lee CY, Chiu YC, Wang LB, Kuo YL, Chuang EY, Lai LC, et al.Common applications of next-generation sequencing technologies in genomic research. Transl Cancer Res. 2013; 2(1):33–45.Google Scholar
- Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009; 10(1):57–63.View ArticlePubMedPubMed CentralGoogle Scholar
- Ozsolak F, Milos PM. RNA sequencing: advances, challenges and opportunities. Nat Rev Genet. 2011; 12(2):87–98.View ArticlePubMedPubMed CentralGoogle Scholar
- Starega-Roslan J, Witkos TM, Galka-Marciniak P, Krzyzosiak WJ. Sequence Features of Drosha and Dicer Cleavage Sites Affect the Complexity of IsomiRs. Int J Mol Sci. 2015; 16(4):8110–8127.View ArticlePubMedPubMed CentralGoogle Scholar
- Neilsen CT, Goodall GJ, Bracken CP. IsomiRs–the overlooked repertoire in the dynamic microRNAome. Trends Genet. 2012; 28(11):544–549.View ArticlePubMedGoogle Scholar
- Wyman SK, Knouf EC, Parkin RK, Fritz BR, Lin DW, Dennis LM, et al.Post-transcriptional generation of miRNA variants by multiple nucleotidyl transferases contributes to miRNA transcriptome complexity. Genome Res. 2011; 21(9):1450.View ArticlePubMedPubMed CentralGoogle Scholar
- Lee LW, Zhang S, Etheridge A, Ma L, Martin D, Galas D, et al.Complexity of the microRNA repertoire revealed by next-generation sequencing. Rna. 2010; 16(11):2170–2180.View ArticlePubMedPubMed CentralGoogle Scholar
- Burroughs AM, Ando Y, de Hoon MJ, Tomaru Y, Nishibu T, Ukekawa R, et al.A comprehensive survey of 3’ animal miRNA modification events and a possible role for 3’ adenylation in modulating miRNA targeting effectiveness. Genome Res. 2010; 20(10):1398–1410.View ArticlePubMedPubMed CentralGoogle Scholar
- Kozlowska E, Krzyzosiak WJ, Koscianska E. Regulation of huntingtin gene expression by miRNA-137,-214,-148a, and their respective isomiRs. Int J Mol Sci. 2013; 14(8):16999–17016.View ArticlePubMedPubMed CentralGoogle Scholar
- Moore MJ, Scheel TK, Luna JM, Park CY, Fak JJ, Nishiuchi E, et al.miRNA-target chimeras reveal miRNA 3 [prime]-end pairing as a major determinant of Argonaute target specificity. Nature Commun. 2015; 6:6.Google Scholar
- Fernandez-Valverde SL, Taft RJ, Mattick JS. Dynamic isomiR regulation in Drosophila development. RNA. 2010; 16(10):1881–1888.View ArticlePubMedPubMed CentralGoogle Scholar
- Li SC, Liao YL, Ho MR, Tsai KW, Lai CH, Lin Wc. miRNA arm selection and isomiR distribution in gastric cancer. BMC genomics. 2012; 13(Suppl 1):S13.View ArticlePubMedPubMed CentralGoogle Scholar
- Tan GC, Chan E, Molnar A, Sarkar R, Alexieva D, Isa IM, et al. 5 isomiR variation is of functional and evolutionary importance. Nucleic Acids Res. 2014; 42:gku656.Google Scholar
- Shin C, Nam JW, Farh KKH, Chiang HR, Shkumatava A, Bartel DP. Expanding the microRNA targeting code: functional sites with centered pairing. Mol Cell. 2010; 38(6):789–802.View ArticlePubMedPubMed CentralGoogle Scholar
- Reyes-Herrera P, Ficarra E. One decade of development and evolution of microRNA target prediction algorithms. Genomics, Proteomics & Bioinforma. 2012; 10(5):254–263.View ArticleGoogle Scholar
- Peterson SM, Thompson JA, Ufkin ML, Sathyanarayana P, Liaw L, Congdon CB. Common features of microRNA target prediction tools. Frontiers in Gen. 2014; 5:5.Google Scholar
- Friedman RC, Farh KKH, Burge CB, Bartel DP. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 2009; 19(1):92–105.View ArticlePubMedPubMed CentralGoogle Scholar
- Friedlander MR, Chen W, Adamidi C, Maaskola J, Einspanier R, Knespel S, et al.Discovering microRNAs from deep sequencing data using miRDeep. Nature Biotechnol. 2008; 26(4):407–415.View ArticleGoogle Scholar
- Friedländer MR, Mackowiak SD, Li N, Chen W, Rajewsky N. miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res. 2012; 40(1):37–52.View ArticlePubMedPubMed CentralGoogle Scholar
- Hackenberg M, Sturm M, Langenberger D, Falcon-Perez JM, Aransay AM. miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments. Nucleic Acids Res. 2009; 37(suppl 2):W68—W76.PubMedPubMed CentralGoogle Scholar
- Hackenberg M, Rodriguez-Ezpeleta N, Aransay AM. miRanalyzer: an update on the detection and analysis of microRNAs in high-throughput sequencing experiments. Nucleic Acids Res. 2011; 39(suppl 2):W132—W138.PubMedPubMed CentralGoogle Scholar
- Wang WC, Lin FM, Chang WC, Lin KY, Huang HD, Lin NS. miRExpress: analyzing high-throughput sequencing data for profiling microRNA expression. BMC Bioinforma. 2009; 10(1):328. Available from: http://mirexpress.mbc.nctu.edu.tw/index.php. Accessed 01 Oct 2015.View ArticleGoogle Scholar
- Hendrix D, Levine M, Shi W. miRTRAP, a computational method for the systematic identification of miRNAs from high throughput sequencing data. Genome Biol. 2010; 11(4):R39.View ArticlePubMedPubMed CentralGoogle Scholar
- Huang PJ, Liu YC, Lee CC, Lin WC, Gan RRC, Lyu PC, et al.DSAP: deep-sequencing small RNA analysis pipeline. Nucleic Acids Res. 2010; 38(suppl 2):W385—W391.PubMedPubMed CentralGoogle Scholar
- Zhu E, Zhao F, Xu G, Hou H, Zhou L, Li X, et al.mirTools: microRNA profiling and discovery based on high-throughput sequencing. Nucleic Acids Res. 2010; 38(suppl 2):W392—W397.PubMedPubMed CentralGoogle Scholar
- Mathelier A, Carbone A. MIReNA: finding microRNAs with high accuracy and no learning at genome scale and from deep sequencing data. Bioinforma. 2010; 26(18):2226–2234.View ArticleGoogle Scholar
- Ronen R, Gan I, Modai S, Sukacheov A, Dror G, Halperin E, et al.miRNAkey: a software for microRNA deep sequencing analysis. Bioinforma. 2010; 26(20):2615–2616.View ArticleGoogle Scholar
- Qibin L, Jian W. MIREAP: microRNA discovery by deep sequencing. Available from: http://sourceforge.net/projects/mireap/. Accessed 02 May 2015.
- Li Y, Zhang Z, Liu F, Vongsangnak W, Jing Q, Shen B. Performance comparison and evaluation of software tools for microRNA deep-sequencing data analysis. Nucleic Acids Res. 2012; 40(10):4298–4305.View ArticlePubMedPubMed CentralGoogle Scholar
- Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012; 9(4):357–359.View ArticlePubMedPubMed CentralGoogle Scholar
- Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinforma. 2009; 25(14):1754–1760.View ArticleGoogle Scholar
- Urgese G, Paciello G, Isella C, Medico E, Macii E, Ficarra E, et al. miR-SEA: miRNA Seed Extension based Aligner Pipeline for NGS Expression Level Extraction. In: Proc. 2nd Int. Work-Conf. Bioinform. Biomed. Eng.(IWBBIO): 7–9 April 2014; Granada. Copicentro Granada SL: 2014. p. 1015–1026.Google Scholar
- Vasudevan S, Tong Y, Steitz JA. Switching from repression to activation: microRNAs can up-regulate translation. Science. 2007; 318(5858):1931–1934.View ArticlePubMedGoogle Scholar
- Orom UA, Nielsen FC, Lund AH. MicroRNA-10a binds the 5 UTR of ribosomal protein mRNAs and enhances their translation. Molecular cell. 2008; 30(4):460–471.View ArticlePubMedGoogle Scholar
- Doring A, Weese D, Rausch T, Reinert K. SeqAn an efficient, generic C++ library for sequence analysis. BMC Bioinforma. 2008; 9(1):11.View ArticleGoogle Scholar
- Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006; 34(suppl 1):D140–D144.View ArticlePubMedPubMed CentralGoogle Scholar
- Urgese G, Paciello G, Acquaviva A, Ficarra E, Graziano M, Zamboni M. Dynamic Gap Selector: A Smith Waterman Sequence Alignment Algorithm with Affine Gap Model Optimisation. In: Proc. 2nd Int. Work-Conf. Bioinform. Biomed. Eng. (IWBBIO): 7–9 April 2014; Granada. Copicentro Granada SL: 2014. p. 1347–1358.Google Scholar
- Somel M, Guo S, Fu N, Yan Z, Hu HY, Xu Y, et al.MicroRNA, mRNA, and protein expression link development and aging in human and macaque brain. Genome Res. 2010; 20(9):1207–1218.View ArticlePubMedPubMed CentralGoogle Scholar
- Li X, Chen J, Hu X, Huang Y, Li Z, Zhou L, et al.Comparative mRNA and microRNA expression profiling of three genitourinary cancers reveals common hallmarks and cancer-specific molecular events. PLoS ONE. 2011; 6(7):e22570.View ArticlePubMedPubMed CentralGoogle Scholar
- Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al.NCBI GEO: archive for functional genomics data sets-update. Nucleic Acids Res. 2013; 41(D1):D991—D995.View ArticlePubMedPubMed CentralGoogle Scholar
- The HPC Polito Project. Available from: http://www.hpc.polito.it. Accessed 02 Oct 2015.