Alternative splicing detection workflow needs a careful combination of sample prep and bioinformatics analysis
© Carrara et al.; licensee BioMed Central Ltd. 2015
Published: 1 June 2015
RNA-Seq provides remarkable power in the area of biomarkers discovery and disease characterization. Two crucial steps that affect RNA-Seq experiment results are Library Sample Preparation (LSP) and Bioinformatics Analysis (BA). This work describes an evaluation of the combined effect of LSP methods and BA tools in the detection of splice variants.
Different LSPs (TruSeq unstranded/stranded, ScriptSeq, NuGEN) allowed the detection of a large common set of splice variants. However, each LSP also detected a small set of unique transcripts that are characterized by a low coverage and/or FPKM. This effect was particularly evident using the low input RNA NuGEN v2 protocol.
A benchmark dataset, in which synthetic reads as well as reads generated from standard (Illumina TruSeq 100) and low input (NuGEN) LSPs were spiked-in was used to evaluate the effect of LSP on the statistical detection of alternative splicing events (AltDE). Statistical detection of AltDE was done using as prototypes for splice variant-quantification Cuffdiff2 and RSEM-EBSeq. As prototype for exon-level analysis DEXSeq was used. Exon-level analysis performed slightly better than splice variant-quantification approaches, although at most only 50% of the spiked-in transcripts was detected. The performances of both splice variant-quantification and exon-level analysis improved when raising the number of input reads.
Data, derived from NuGEN v2, were not the ideal input for AltDE, especially when the exon-level approach was used. We observed that both splice variant-quantification and exon-level analysis performances were strongly dependent on the number of input reads. Moreover, the ribosomal RNA depletion protocol was less sensitive in detecting splicing variants, possibly due to the significant percentage of the reads mapping to non-coding transcripts.
The application of next-generation sequencing (NGS) to transcriptomics analysis, namely RNA-Seq, has allowed many advances in the characterization and quantification of transcripts. Recently, several developments in RNA-Seq methods have provided an advance in the complete characterization of RNA molecules . These developments included improvements in transcription start site mapping, strand-specific measurements, gene fusion detection, small/long non-coding RNA characterization and detection of alternative splicing events . Further improvements in RNA-Seq methods are allowing transcript quantification from very small amounts of cellular materials or single cells [2–6]. In this work we focused on two of the major steps in RNA-Seq experiments: Library Sample Preparation (LSP) and Bioinformatics Analysis (BA), and their interplay. NGS applications require specific LSP in which fragmented DNA or cDNA molecules are attached to adapters, PCR amplified and sequenced . Since different LSPs can have a significant impact on downstream analysis and interpretation of RNA-Seq results , therefore it is evident that robust and unbiased library preparation methods are critical. Nevertheless it has also become clear that LSPs contain biases that compromise the quality of NGS datasets, which can lead to erroneous interpretations . The LSPs available on the market belong to two main classes: i) unstranded and ii) stranded (PolyA + selected, rRNA depleted or low input RNA).
The choice of LSPs does not represent the only critical step in RNA-Seq. Indeed, the sequencing data need to be converted into transcript information (transcript structure, transcript quantification, etc.), and this step requires an accurate selection of the bioinformatics and statistical analysis techniques to be used. The approaches used to quantify known transcripts, i.e. transcripts annotated on the reference genome, and not yet characterized transcripts, i.e. transcripts not associated with an annotation on the reference genome, are different and characterized by different criticalities . In this work we focused only on the annotated splice variants. The BA pipelines for the detection of differentially expressed transcripts are characterized by multiple steps , and each of them has an influence on the final results. BA pipelines for differential expression can be divided in two categories: i) differential expression based on splice variant quantification, and ii) exon-based differential expression. This work focused on the interplay of LSP and BA on the statistical detection of AltDE. In detail, we investigated the effect of different LSPs (NuGEN v2, TruSeq unstranded/stranded, ScriptSeq), as well as the effect of PolyA + selection versus ribosomal depletion, on splice variant detection. Furthermore, we compared NuGEN low input protocol with standard TruSeq protocol using BA tools for splice variant-quantification (Cuffdiff  and RSEM-EBSeq [12, 13]) and for exon-level quantification (DEXSeq ).
We analysed low input RNA (NuGEN) LSP and standard/high input RNA LSP (Illumina TrueSeq and Epicentre ScriptSeq). Sequencing data generated using TrueSeq unstranded PolyA + (100 ngs input total RNA) was used as reference to simplify the comparisons among LSPs. This was because 100 ng of total RNA input material represents the RNA quantity that can be at best obtained from a wide range of biological samples, e.g. animal models, biopsies, FACS sorted cell populations etc. Furthermore, it represents one of the cheaper RNA-Seq experiments available on the market.
Library sample preparation (LSP) effects on splice variants detection and splice variants characterization
Library preparation information
RNA concentration (ng)
Poly A +
tss (PolyA + ) tss_total (rRNA depl)
A) Number of splice variants detected using Cufflinks  starting from 80 million reads generated by different Library sample preparation.
% of single/multiple mapped reads
% of mapped reads
Splice variants (FPKM > 0.1, Coverage > 0)
Ts100 and ts1000, which were the two libraries prepared with TrueSeq unstranded using respectively 100 ng and 1000 ng of input material, allowed the detection of nearly the same number of transcripts (given the chosen minimal threshold of FPKM > 0.1). A possible explanation for the lack of an apparent advantage in using high versus low, PolyA+ RNA input in TruSeq unstranded libraries might be due to transcript quantification sensitivity. Transcripts quantification was done using Cufflinks, because Steijger et al.  showed that it provided a good sensitivity with respect to other methods. However, Cufflinks sensitivity was significantly reduced at very low coverage . Thus, it was speculated that, since PolyA+ mRNAs represented a tiny subset of the total input RNA, the increment in input material from 100 ng to 1000 ng was not enough to bring low expressed splice variants in the range of sensitivity of Cufflinks for transcript quantification, given as threshold FPKM > 0.1.
It was notable that the increase of input material affected the overall library yield (Additional file 3). The optimal starting material amount to obtain the higher library yield using TrueSeq (TS) was approx. 200 ng of total RNA. Further increase in the input material significantly reduces the library yield.
Effect of PolyA + selection versus rRNA depletion. All LSPs allowed the detection of a similar number of transcripts (Figure 1) except for tss_total, generated using total RNA upon ribosomal depletion. In tss_total, the percentage of transcripts was slightly below 46% even if the total percentage of mapped reads did not differ from the other experiments (Table 2). For this LSP, 41% of the reads were mapped on multiple locations on the genome. This increase in multiple mapping reads was most probably due to non-coding genes, which often are represented in multiple copies in the genome, e.g. tRNAs, miRNA, lincRNA, etc. Coding transcripts undetectable in tss_total were characterized by low coverage/FPKM distributions in ts100, while the transcripts detected both by tss_total and ts100 showed similar coverage and FPKM distributions (Figure 1). We think that, coding transcripts characterized by low expression might not be sampled in tss_total because of the significant reduction of reads mapping to these single copy genes with respect to the other LSPs.
We further investigated this point by analysing the raw count distribution for exons belonging to the transcripts detected by NuGEN and for those transcripts in common with ts100 (Figure 3D). From this analysis it was clear that exons belonging to transcripts detected by NuGEN, are characterized by low exon coverage (Figure 3D, black boxes). This was particularly evident for the nu05 sample, where the mean of its exon-counts distribution was not shown since the majority of the exons have 0 counts (Figure 3D, black boxes). Instead, a mean value lower than 10 count was observed in samples nu2 and nu100 (Figure 3D, green/violet boxes). For exons detected by both nu and ts100, the exon counts distribution was lower for nu05, nu2, and nu100 (Figure 3D, green boxes) with respect to ts100 (Figure 3D violet boxes). The presence of lower coverage for transcripts/exons detected by NuGEN could represent a critical issue in splice variant differential expression, since it might affect the bioinformatics quantification of the transcripts/exons.
Furthermore, to investigate the dependency of the BA approaches on gene-specific splice variants complexity we constructed a synthetic dataset that also included complex expression composition of splice variants for the same gene (Additional file 6). Synthetic reads were characterized by having a uniform distribution over all transcripts and 58 differentially expressed transcripts between C and T groups were generated (Figure 6, 7).
Spikes-in were distributed in both C and T background datasets with respect to two disjoint sets of isoforms, to generate skipping and insertion events resembling biological environmental situation. The differences between backgrounds were negligible (Additional file 2 and 7) therefore we postulated that swapping spikes-in between C and T backgrounds would not affect the alternative splicing detection, as we indeed observed in our experiments (not shown).
All datasets are available at GEO repository with the ID: GSE58001.
Splice variants differential expression analysis
The identification of differentially expressed splice variants was investigated on the above mentioned datasets using Cuffdiff  and RSEM-EBSeq [12, 13], as prototypic for direct splice variants-quantification approaches, and DEXSeq , as prototypic for exon-level analysis.
The selection of two approaches based on splice variant was motivated by the different normalizations used by Cuffdiff and RSEM-EBSeq. Cuffdiff normalization is based on FPKM , while the combination of RSEM and EBSeq allows to use raw counts  to estimate differential expression between splice variants.
In case of Cuffdiff2, there was an increment in the detected transcripts that was correlated to the number of reads used to generate the spikes-in; this was observable both in TS and NU datasets (Figure 8A, orange bar). Cuffdiff2 detected a greater number of differentially expressed transcripts than Cuffdiff1 in all datasets except for 20TS (Figure 8A, blue and orange bars).
Dillies and Soneson [20, 21] have shown that FPKM does not represent the best normalization approach for differential expression analysis. Cuffdiff2 offers two alternative library normalization approaches other than FPKM: geometric and quartile. The geometric normalization is also used in the DESeq. We have repeated the Cuffdiff2 analysis on the 80TS dataset using both geometric and quartile normalization. Although there were some differences in the number of the transcripts detected, the overall number of detected spikes-in was not affected by the different normalization procedures (Additional file 8). The analysis based on RSEM-EBSeq (a tool that works with raw counts) provided a number of detected spikes-in greater than those detected by Cuffdiff2. The difficulties in detecting transcripts differentially expressed in NU datasets were still present also using RSEM-EBSeq but at a lower extent with respect to Cufflink. A similar improvement was not observable for the synthetic transcripts. However, the false positive detection was 6 times greater than Cuffdiff (Figure 8C violet bars).
The exon-level analysis, performed using DEXSeq, was quite inefficient with respect to Cuffdiff (both versions) and REM-EBSeq for the NU dataset (Figure 8A, green bar) and in general in the samples characterized by a low number of input spike-in reads. DEXSeq detected a higher number of differential expressed transcripts compared to Cuffdiff in 80TS (Figure 8A green bar). However, DEXSeq detection rate was lower than RSEM-EBseq (Figure 8A violet bar). Indeed 81% of true spikes-in detected in 80TS by RSEM-EBSeq were associated with a false discovery rate approx. 10 times bigger than the one observed using DEXSeq (Figure 8D).
The experiments performed on the synthetic dataset revealed inferior detection efficiency for all methods (Figure 8B). The best results were obtained by Cuffdiff1, which detected approximately 34% of the total true positive splice variants.
In this work we present the first comparative evaluation of the combined effect of Library Sample Preparation and Bioinformatics Analysis on alternative splicing detection.
Library Sample Preparations using as starting material at least 100 ng of total RNA and undergoing PolyA + enrichment (ss, tss, ts100, ts100), showed a similar behaviour for common detected transcripts. The transcripts detection was significantly impaired for the low expressed transcripts when comparing ribosomal depletion versus PolyA + selection protocols.
Transcripts that were specifically detected only by a single LSP showed poor coverage and they were probably not very informative for splice variant detection, because of the non-uniform coverage at exon-level. As for NuGEN low input protocol, the number of LSP-specific transcripts increased with the rise of the amount of total RNA input. However, those LSP-specific transcripts were characterized by low coverage and in general by very low exon-level counts. FPKM estimation for those transcripts could be misleading, since it had a very similar behaviour to that observed for the transcripts in common with the LSP based on the TruSeq protocol. Nu05, nu2 and nu100 showed a lower coverage with respect to ts100 for the common transcripts as well as for the exon count. The experiments on the benchmark datasets revealed that the lower exon counts generated from NU datasets (20/40/80NU) negatively affected the ability of exon-level based approach (DEXSeq) to detect alternative splicing events. On the other side, when a high number of input reads was used and the preparation was done using the TruSeq protocol (e.g. 80TS dataset), the exon-level based approach provided the best results. Exon-level analysis provided also the best compromise between detection and false discovery rate using the TruSeq protocol.
All our experiments were based on 50-bp paired-end sequences. This choice was a compromise between the performance of the experiment and the sequencing costs. However, the optimal solution, as suggested in the 2011 ENCODE white paper on RNA-seq (https://genome.ucsc.edu/ENCODE/protocols/dataStandards/ENCODE_RNAseq_Standards_V1.0.pdf), would have been 76-bp paired-end sequencing, since the average insert size in the majority of LSP is 150-bp. The use of longer reads could be particularly useful in approaches like TopHat2/Cufflinks, where read splitting is used to detect intron-exon junctions.
Our results indicate that a low input protocol, such as NuGEN v2, is not suitable for alternative splicing analysis due to the limited coverage at exon-level. Moreover the performances of both splice variants-quantification approaches and exon-level analyses were in general comparable. However, it was notable that, for high number of input spike-in reads, exon-level analysis provided a higher detection rate of alternative splicing events with a reduced level of noise.
Furthermore, we provide to the research community a dataset that can be re-used as benchmark to compare the performance of software devoted to the identification of alternative splicing events. However, because of the intrinsic characteristics of the short sequencing reads, spikes-in generated by extracting reads mapping to a transcript do not necessarily guarantee that they come from that specific transcript. It was impossible to discriminate between different transcripts when reads mapped to exons that were shared between two or more of them. However, since our benchmark dataset contains also synthetic spike-in the combination of the two different types provides a new benchmarking tool to be used to compare software dedicated to alternative splicing detection.
Materials and methods
RNA isolation and purification
Total RNA was extracted from D1 mouse cell line . Total RNA was extracted with Trizol Reagent (Invitrogen) followed by RNeasy micro clean-up procedure (Qiagen) as per manufacturer's instructions. Total RNA integrity was assessed by Agilent 2100 Bioanalyzer (Agilent) and the RNA Integrity Number (RIN) was calculated; RNA sample had a RIN = 9.5.
The following steps were repeated for all library preparations: 1). ERCC RNA Spike in Control Mix 1 (Ambion) was added to D1 total RNA as a first step of each protocol. 2). Two additional rounds of purification of the cDNA libraries were done using Agencourt Ampure XP SPRI beads (Beckman Courter) to remove > 600 bp double stranded cDNA. 3). The length distribution of the cDNA libraries was monitored using DNA 1000 kits on the Agilent Bioanalyzer. 4). Libraries were subjected to an indexed PE sequencing run of 2x51 cycles on an Illumina HiSeq 2000.
Illumina TruSeq RNA
Illumina TruSeq RNA Sample Preparation kit version 1 (Low Sample Protocol) was used with slight modifications. Briefly, PolyA + containing RNA molecules were purified using polyT oligo-attached magnetic beads. Thermal fragmentation followed after two rounds of enrichment for PolyA + mRNA. cDNA was synthesized using reverse transcriptase (Superscript II) and random primers. This was followed by second strand cDNA synthesis, end repair process, adenylation of 3' ends and ligation of the adapters. The products were then purified and enriched with 12 cycles of PCR to create the cDNA library.
NuGEN Ovation RNA-Seq system version 2 - Dedicated Read Barcode
Total RNA was processed for cDNA synthesis using Ovation RNA-Seq system version 2 (NuGEN Technologies) according to the manufacturer's protocol. Briefly, first strand cDNA was prepared from total RNA using a unique first strand DNA/RNA chimeric primer mix and reverse transcriptase (RT). The primers have a DNA portion that hybridizes either to the 5' portion of the poly(A) sequence or randomly across the transcript. A DNA/RNA heteroduplex double-stranded cDNA was generated by fragmentation of the mRNA within the cDNA/mRNA complex, allowing the DNA polymerase to synthesize a second strand. The DNA then underwent SPIA amplification. SPIA cDNA were sheared to get a size range of 25 bp to 400 bp with the bulk of the material at 150 bp. This was done by sonication (Covaris model S2) with duty cycle 10, intensity 5 and cycle/burst 100 for 300 s. 100 ng and 500 ng of the sheared DNA were used for library preparation using the Encore Rapid Library Systems (NuGEN Technologies) according to manufacturer's protocol; the fragmented DNA underwent end repair, adaptor ligation (with 6 bases indexing tags), and final repair to produce the final library. 4 µl of each purified library underwent 10 cycles of PCR amplification using Illumina TruSeq PCR reagents.
Epicentre ScriptSeq v1
PolyA + containing mRNA molecules were selected using RiboMinus™ Eukaryote kit (Life Technologies) for RNA-Seq according to manufacturer's instructions. The RNA samples were chemically fragmented using the StarScript Reverse Transcriptase Buffer and the cDNA Synthesis Primer was annealed to the RNA. 5′ end-tagged cDNA (equivalent to the 3′ end of the original RNA) was produced by random-primed cDNA synthesis. This was followed by 3′-Terminal Tagging of the cDNA using the Terminal-Tagging Oligo (TTO) which randomly annealed to the cDNA, including to the 3′ end of the cDNA and served as template for the extension of the cDNA by DNA polymerase. The resulting di-tagged were purified using Qiagen MinElute PCR Purification Kit. Enrichment of the purified di-tagged cDNAs was done with 12 cycles of PCR.
Illumina TruSeq Stranded Total RNA
Illumina TruSeq Stranded Total RNA Sample Preparation kit (Low Sample Protocol) was used with slight modifications. The removal of ribosomal RNA was done using Ribo-Zero Gold rRNA removal beads (Epicentre), which deplete samples of both cytoplasmic and mitochondrial ribosomal RNA. After depletion, the RNA was purified and fragmented using divalent cations and thermal fragmentation. First strand cDNA synthesis was performed using reverse transcriptase (Superscript II) and random primers. This was followed by second strand cDNA synthesis using DNA Polymerase I and RNase H and dUTP in place of dTTP. Libraries were prepared as described above for the TruSeq RNA protocol except for the end repair process.
Illumina TruSeq Stranded mRNA
Illumina TruSeq Stranded mRNA Sample Preparation kit (Low Sample Protocol) was used with slight modifications. Briefly, polyA containing mRNA molecules were selected using polyT oligo-attached magnetic beads. Fragmentation and library preparation was done as described above for the TruSeq Stranded Total RNA protocol.
The common background of the spikes-in dataset was made using paired-end reads generated preparing, with the TruSeq unstranded protocol, 5 libraries, starting with 1000 ng of total RNA extracted from the D1 cell (C1-C5), and 5 libraries starting with 100 ng of total RNA D1 cells (T1-T5) (Additional file 1). The true positive set (TP) of transcripts was defined in the following way: exon counts for samples C1-C5 and T1-T5 were loaded in R using DEXseq package  and UCSC mm9 annotation (28232 genes). Genes characterized by at least three splice variants were selected (6582). Then we selected genes having at least one transcript characterized by at least one exon discriminating it from the other splice variants, for a total of 6313 genes. The genes were further filtered, removing all transcripts characterized by having, for the discriminating exons, less than 10 counts in total in C1-C5 and T1-T5 samples (leaving 2970 genes). Out of the 2970 genes 27 were selected, after the inspection of a set of more than a hundred randomly chosen one, and from them one of the splice variant was used for spike-in experiment (Additional file 4). Sequences (generated with NuGEN v2, using as input 2 ng input total RNA, and with TruSeq unstranded, using 100 ng input total RNA), were used to construct three datasets made respectively of 20, 40 and 80 million reads named 20NU, 40NU, 80NU and 20TS, 40TS, and 80TS accordingly. Each dataset was mapped against the mouse genome (mm9) and the reads mapping to the 27 transcripts were extracted and spiked in C1-C5 or in T1-T5 to simulate transcripts up and down-regulation within two experimental conditions (Additional file 4).
It is worth to remark that the selection of the transcripts to be spiked-in is a time-consuming process, due to the complexity in the identification of different types of alternative splicing events (e.g. 5'-end extension, 3'-end reduction, cassette skipping/insertion) for genes having more than one expressed splice variant. Specifically, we selected, as putative targets for spike-in, only those set of splice variant that were expressed both in C and T datasets. This restriction was meant to avoid those cases in which the alternative splicing detection problem reduces to differential gene expression identification.
Out of the 2970 transcripts 58 were selected, with a similar procedure as above and for each transcript we spiked-in a specific number of reads (Additional file 5). Since spiking-in the identical amount of synthetic reads in all samples of a replication group would represent an oversimplified experimental design, we selected the number of reads to be spiked-in according to a normal distribution, as could be observed in biological replications. We selected the number of reads to be spiked in each sample on the basis of a normal distribution for 105 elements having a mean equal to the number of reads to be spiked-in, e.g. 100, and a standard deviation 10 times smaller than the mean. For example, we decided to spike-in 100 reads in the 5 replicated of ts100 background (C1-C5), thus 5 random values from a normal distribution of 105 elements with mean 100 and standard deviation 10 were chosen (e.g. 86, 112, 81, 98, 89) and used to define the number of spike-in reads to be placed in each of the 5 replicates. The required synthetic paired-end reads 2x51nt were constructed to guarantee a uniform distribution both at exon and exon-exon junction level. Reads were then associated with a quality score of 40 and used to generate fastq files. Scripts used to generate the synthetic data set are available upon request.
Splice variants quantification and statistical detection of alternative spliced variants
Nu05, nu2, nu100, ts100, ts1000, ss, tss, tss_total, C-1-C5 and T1-T5 fastq data were mapped with STAR (2.3.1n)  using default settings. For nu05, nu2, nu100, ts100, ts1000, ss, tss and tss_total splice variant quantification was done with Cufflinks (1.3.0 or 2.2.0) . Exon-level quantification was done using DEXSeq (1.10.8)  and exon-exon junction quantification was done with subjunc function of the Rsubread (1.14.2)  Bioconductor package. Cuffdiff 1 (1.3.0), Cuffdiff2 (2.2.0) , RSEM (1.2.15) , EBSeq (1.4.0)  prototypic BA based on splice variant-quantification, were used for detection of alternative spliced variants between C1-C5 and T1-T5 groups using mm9 UCSC annotation. All Cuffdiff and RSEM-EBSeq analyses were run with standard parameters, unless in the case of the evaluation of the effect of different library normalization procedures done only with Cuffdiff 2 (library-norm-method geometric and library-norm-method quartile). Splice variants were considered differentially expressed if characterized by q-value ≤ 0.05 and FDR ≤ 0.05. for RSEM-EBSeq. Exon-level analysis made use of DEXSeq . Splice variants were considered differentially expressed if at least one splice variant-specific exon was detected as differentially expressed between C1-C5 and T1-T5 groups with a Benjamini & Hochberg adjusted p-value ≤ 0.05.
We wish to thank Prof. Paola Ricciardi-Castagnoli for providing D1 cells.
This study was funded by grants from the Epigenomics Flagship Project EPIGEN, MIUR-CNR, European Seventh framework program, Health.2012.1.2-1, NGS-PTL grant n. 306242 and by the core fund of Singapore Immunology Network, Agency for Science, Technology and Research (A*STAR), Singapore.
Publication of this article has been funded by NGS-PTL grant no. 306242.
This article has been published as part of BMC Bioinformatics Volume 16 Supplement 9, 2015: Proceedings of the Italian Society of Bioinformatics (BITS): Annual Meeting 2014: Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/16/S9.
- Ozsolak F, Milos PM: RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics. 2011, 12 (2): 87-98. 10.1038/nrg2934.PubMed CentralView ArticlePubMedGoogle Scholar
- Hashimshony T, Wagner F, Sher N, Yanai I: CEL-Seq: Single-Cell RNA-Seq by Multiplexed Linear Amplification. Cell Reports. 2012, 2 (3): 666-673. 10.1016/j.celrep.2012.08.003.View ArticlePubMedGoogle Scholar
- Islam S, Kjallquist U, Moliner A, Zajac P, Fan JB, Lonnerberg P, Linnarsson S: Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Research. 2011, 21 (7): 1160-1167. 10.1101/gr.110882.110.PubMed CentralView ArticlePubMedGoogle Scholar
- Picelli S, Faridani OR, Bjorklund AK, Winberg G, Sagasser S, Sandberg R: Full-length RNA-seq from single cells using Smart-seq2. Nature Protocols. 2014, 9 (1): 171-181. 10.1038/nprot.2014.006.View ArticlePubMedGoogle Scholar
- Qiu S, Luo S, Evgrafov O, Li R, Schroth GP, Levitt P, Knowles JA, Wang K: Single-neuron RNA-Seq: technical feasibility and reproducibility. Frontiers in Genetics. 2012, 3: 124-PubMed CentralView ArticlePubMedGoogle Scholar
- Sengupta S, Bolin JM, Ruotti V, Nguyen BK, Thomson JA, Elwell AL, Stewart R: Single read and paired end mRNA-Seq Illumina libraries from 10 nanograms total RNA. Journal of Visualized Experiments : JoVE. 2011, e3340-56Google Scholar
- van Dijk EL, Jaszczyszyn Y, Thermes C: Library preparation methods for next-generation sequencing: Tone down the bias. Experimental Cell Research. 2014, 322 (1): 12-20. 10.1016/j.yexcr.2014.01.008. Mar 10View ArticlePubMedGoogle Scholar
- Sun Z, Asmann YW, Nair A, Zhang Y, Wang L, Kalari KR, Bhagwate AV, Baker TR, Carr JM, Kocher JP, et al: Impact of library preparation on downstream analysis and interpretation of RNA-Seq data: comparison between Illumina PolyA and NuGEN Ovation protocol. PloS One. 2013, 8 (8): e71745-10.1371/journal.pone.0071745.PubMed CentralView ArticlePubMedGoogle Scholar
- Vijay N, Poelstra JW, Kunstner A, Wolf JB: Challenges and strategies in transcriptome assembly and differential gene expression quantification. A comprehensive in silico assessment of RNA-seq experiments. Molecular Ecology. 2013, 22 (3): 620-634. 10.1111/mec.12014.View ArticlePubMedGoogle Scholar
- Alamancos GP, Agirre E, Eyras E: Methods to study splicing from high-throughput RNA Sequencing data. Methods Mol Biol. 2014, 11 (26): 357-397.View ArticleGoogle Scholar
- Trapnell C, Hendrickson DG, Sauvageau M, Goff L, Rinn JL, Pachter L: Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature Biotechnology. 2013, 31 (1): 46-53.View ArticlePubMedGoogle Scholar
- Li B, Dewey CN: RSEM:accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011, 12: 323-10.1186/1471-2105-12-323.PubMed CentralView ArticlePubMedGoogle Scholar
- Leng N, Dawson JA, Thomson JA, Ruotti V, Rissman AI, Smits BM, Haag JD, Gould MN, Stewart RM, Kendziorski C: EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics. 2013, 29 (8): 1035-1043. 10.1093/bioinformatics/btt087.PubMed CentralView ArticlePubMedGoogle Scholar
- Anders S, Reyes A, Huber W: Detecting differential usage of exons from RNA-seq data. Genome Research. 2012, 22 (10): 2008-2017. 10.1101/gr.133744.111.PubMed CentralView ArticlePubMedGoogle Scholar
- Winzler C, Rovere P, Rescigno M, Granucci F, Penna G, Adorini L, Zimmermann VS, Davoust J, Ricciardi-Castagnoli P: Maturation stages of mouse dendritic cells in growth factor-dependent long-term cultures. J Exp Med. 1997, 185 (2): 317-328. 10.1084/jem.185.2.317.PubMed CentralView ArticlePubMedGoogle Scholar
- Ramirez F, Dundar F, Diehl S, Gruning BA, Manke T: deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Research. 2014, 42 (Web Server): W187-191.PubMed CentralView ArticlePubMedGoogle Scholar
- Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L: Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols. 2012, 7 (3): 562-578. 10.1038/nprot.2012.016.PubMed CentralView ArticlePubMedGoogle Scholar
- Steijger T, Abril JF, Engstrom PG, Kokocinski F, RGASP Consortium, Hubbard TJ, Guigo R, Harrow J, Bertone P: Assessment of transcript reconstruction methods for RNA-seq. Nature Methods. 2013, 10 (12): 1177-1184. 10.1038/nmeth.2714.View ArticlePubMedGoogle Scholar
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 2008, 5 (7): 621-628. 10.1038/nmeth.1226.View ArticlePubMedGoogle Scholar
- Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J, et al: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Briefings in Bioinformatics. 2013, 14 (6): 671-683. 10.1093/bib/bbs046.View ArticlePubMedGoogle Scholar
- Soneson C, Delorenzi M: A comparison of methods for differential expression analysis of RNA-seq data. BMC bioinformatics. 2013, 14: 91-10.1186/1471-2105-14-91.PubMed CentralView ArticlePubMedGoogle Scholar
- Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR: STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013, 29 (1): 15-21. 10.1093/bioinformatics/bts635.PubMed CentralView ArticlePubMedGoogle Scholar
- Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology. 2010, 28 (5): 511-515. 10.1038/nbt.1621.PubMed CentralView ArticlePubMedGoogle Scholar
- Liao Y, Smyth GK, Shi W: The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Research. 2013, 41 (10): e108-10.1093/nar/gkt214.PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.