RNA-combine: a toolkit for comprehensive analyses on transcriptome data from different sequencing platforms

Dong, Xuemin; Dong, Shanshan; Pan, Shengkai; Zhan, Xiangjiang

doi:10.1186/s12859-021-04549-y

Software
Open access
Published: 06 January 2022

RNA-combine: a toolkit for comprehensive analyses on transcriptome data from different sequencing platforms

Xuemin Dong^1,3,
Shanshan Dong²,
Shengkai Pan¹ &
…
Xiangjiang Zhan^1,4

BMC Bioinformatics volume 23, Article number: 26 (2022) Cite this article

6564 Accesses
2 Citations
14 Altmetric
Metrics details

Abstract

Background

Understanding the transcriptome has become an essential step towards the full interpretation of the biological function of a cell, a tissue or even an organ. Many tools are available for either processing, analysing transcriptome data, or visualizing analysis results. However, most existing tools are limited to data from a single sequencing platform and only several of them could handle more than one analysis module, which are far from enough to meet the requirements of users, especially those without advanced programming skills. Hence, we still lack an open-source toolkit that enables both bioinformatician and non-bioinformatician users to process and analyze the large transcriptome data from different sequencing platforms and visualize the results.

Results

We present a Linux-based toolkit, RNA-combine, to automatically perform the quality assessment, downstream analysis of the transcriptome data generated from different sequencing platforms, including bulk RNA-seq (Illumina platform), single cell RNA-seq (10x Genomics) and Iso-Seq (PacBio) and visualization of the results. Besides, this toolkit is implemented with at least 10 analysis modules more than other toolkits examined in this study. Source codes of RNA-combine are available on GitHub: https://github.com/dongxuemin666/RNA-combine.

Conclusion

Our results suggest that RNA-combine is a reliable tool for transcriptome data processing and result interpretation for both bioinformaticians and non-bioinformaticians.

Background

“How can scientists better understand the workings of a cell? Studying the transcriptome, RNA expressed from the genome, reveals a more complex picture of the gene expression behind it all” [1]. In this regard, understanding the transcriptome comprises a prerequisite for full understanding of the biological function of a cell, a tissue and even an organ [1]. Recently, in light of the development of high throughput sequencing technologies (e.g. RNA-seq, Iso-Seq), researchers are able to profile whole transcriptome of an organ, tissue or even a single cell of a species. To mine these data, an increasing number of bioinformatics tools (software) have been developed. Transcriptome analyses involve multiple modules (e.g. expression level calculation, identification of differentially expressed genes, variant calling, co-expression network construction etc.) and the processing of data from different sequencing platforms (e.g. bulk RNA-seq data generated from Illumina platform, scRNA-seq from 10x Genomics platform, Iso-Seq from PacBio platform). Many current tools, however, have been proven both time and labor consuming and not friendly for users, especially non-bioinformaticians, mainly due to the following five issues. First, most existing tools are limited to the data analysis from a single sequencing platform. Second, most tools have only one analysis module. For example, the most recently published tool, RASflow [2], is only designed for the gene expression difference analysis. Third, most tools are developed using different advanced computer languages, (e.g. RNASeqR [3] based on R, scGEAToolbox [4] based on Matlab), which are challenging for non-bioinformatician users. Fourth, the input formats vary among different tools. Lastly, the results generated from most tools can’t be visualized.

Here, we present a Linux-based command-line toolkit, RNA-combine, for the comprehensive analysis of bulk RNA-seq, scRNA-seq as well as Iso-Seq data, performing analyses for multiple modules (e.g. read preprocessing, read alignment, transcript quantification, detection of differential expression and annotation, alternative splicing, result visualization, etc.). This toolkit represents the assembly of a wide range of routine and customized transcriptome analysis workflows and is free from the abovementioned issues and thus is friendly and easily implemented for users.

Implementation

Our RNA-combine consists of 16 modules and could process transcriptome data (bulk RNA-seq, scRNA-seq, and Iso-Seq) from three sequencing platforms (Illumina, 10x Genomics, PacBio). All analysis modules are scripted in the form of bash command lines, which can be easily customized and launched by users.

The usage of each module is provided in the user manual and the scheme of our toolkit is shown in Fig. 1. Moreover, the organization of all modules and their function descriptions in RNA-combine is shown in Additional file 1: Figure S1.

Introduction of analysis modules

Bulk RNA-seq data analysis

The workflow of bulk RNA-seq data analysis starts with raw sequencing file preprocessing, including removing rRNA sequences using Sortmerna [5], removing adapters and trimming low quality bases using Trimmomatic [6], aligning reads to reference genome using Hisat2 [7], counting reads on genes to calculate gene expression levels using featureCounts [8]. The aligned sequence files and/or gene count matrix can be used as inputs for downstream analysis modules: variant (INDEL and SNP) calling, DEG analysis, function annotation, gene co-expression analysis and splicing site detection.

In the variant calling module, two methods are introduced: the one is conducted by GATK [9], which has been widely used as one of the most reliable methods in the variant identification from RNA-seq data [10]; the other is Strelka2 [11], which shows better sensitivity and precision in variant calling compared with GATK [12]. Notably, we apply three or more methods in each of the three modules: DEG, splicing site detection, gene co-expression analysis. This approach integrates the advantages of different methods. For instance, in DEG module, we use DESeq2 [13], limma [14], edgeR [15] and T-Test which differ from each other in both gene expression normalization and DEG identification: DESeq2 normalizes gene expression with a “geometric” normalization strategy and detects DEGs using an exact test; edgeR uses the weighted mean of log ratios for normalization and an exact test for DEG identification; limma adopts a quantile normalization approach and an empirical Bayesian analysis for DEG detection, whereas T-Test method applies a T-Test on RPKM (Reads Per Kilobase of transcript per Million mapped reads)/FPKM (Fragments Per Kilobase of exon model per Million mapped fragments) normalized data. In the module of splicing site detection, three methods with different strategies are adopted: DEXseq [16] based on exon boundaries, StringTie [17] based on transcripts, and rMATS [18] based on splicing events. In the module of gene co-expression, we use linear-association based Pearson and Spearman coefficients, both of which have been widely used in constructing gene co-expression networks. We also apply a recently published method PMI [19], which measures nonlinearly direct dependencies based on the part mutual information among variables (gene expression). We further adopt SSN [20], which can construct a sample-specific network relying on the gene expression profiles of control samples and process one case sample. We introduce clusterProfiler [21] to the functional enrichment (GO pathways and KEGG pathways) module. It is noted that with the aligned sequence files and gene count matrix, users can customize workflows to meet specific analysis requirements.

scRNA-seq data analysis

Cellranger [22] is used for preprocessing raw fastq files, including the read alignment and generation of feature-barcode matrices. The feature-barcode matrices can be passed onto downstream analyses.

Currently, the library construction used for scRNA-seq often causes doublet artifacts (i.e. two cells have the same barcode, thus mistakenly regarded as one cell), which may generate biased results. To overcome this limitation, we apply Scrublet [23] in the toolkit, which can identify such artifacts and exclude them before further analyses. Then, we utilize SCANPY [24] to conduct a sequence of processing and analysis, including the quality control, dimension reduction, cell clustering, cell type annotation, trajectory inference, providing users with the visualization of the number of genes in cells, percentage of mitochondrial genes in cells, cell cluster atlases, cell connectivity, and cell trajectories. To assign each cell cluster to a cell type, we develop a new method, based on two classical cell marker databases, CellMarker [25] and PanglaoDB [26]. Taken the DEG list of a cell cluster of interest (against with other cell clusters) as the input, our method searches related cell type information for each DEG in the two databases, with the cell type showing the largest hit number considered as the true one.

Iso-Seq data analysis

We use one of the most well-established pipeline-PacBio’s IsoSeq V3 [27] to process raw sequencing data, including polishing, hierarchical clustering, iterative merging to obtain consensus full-length transcripts. For the read alignment, we suggest users to use local alignment methods (e.g. Minimap2 [28] in our toolkit), since the length of Iso-Seq reads is much longer than those generated by RNA-seq.

Interface of the input format

For omics (e.g. transcriptome) analysis, one of the most labor-intensive and time-consuming processes is the formatting of input files for different tools. In our toolkit, we develop a layer of interface that could automatically format inputs for user-specified tools, guaranteeing that these tools could be linked smoothly. Under this environment, bioinformaticians and non-bioinformaticians could quickly link different tools to a reliable processing workflow without extra commands. More importantly, the users just need to input the workflow with raw transcriptome sequencing data and reference genomes. Our toolkit will start from initial preprocessing and load the modules of interest. For instance, in the analysis of bulk RNA-seq data, one can set up a workflow starting from raw sequencing data preprocessing, to the calculation of expression level of each gene or transcript, identification of differentially expressed gene or transcript, detection of alternative splicing event and differential splicing site, construction of co-expression network and sample-specific network, variant calling (SNPs, INDELs), and biological pathway annotation.

Visualization

Because each module of transcriptomic analysis involves thousands of genes and multiple dimensions (e.g. genes, networks etc.), the data presentation with graphical visualization is in need to facilitate the result interpretation. However, this has been neglected in most of the previous tools. To address this issue, we develop a visualization plugin under the environment of R and Python, which realizes the visualization of the results for each module and can generate qualified graphs for publication.

Backtracking and recording

During the processing of large datasets (e.g. transcriptomes of multiple samples), spurious interruption might occur due to unexpected factors (e.g. interruption of power supply, storage overload). This abolishes the entire workflow and users have to spend a lot efforts to manually check all the processes. To minimize this effect as much as possible, we develop a Backtracking Executor plugin in our toolkit. It automatically records the interrupted processes (i.e. error logs) in a user-specified directory. Based on these log files, it is easy for users to resume data processing from the interrupted points. As a consequence, the plugin enables users to keep tracking the processes and thus greatly reducing the efforts spent on debugging.

Results and discussion

Case studies

In this section, we focused on the assessments of reliability and automation of RNA-combine using several published transcriptome data. We first tested the workflow of the bulk RNA-seq analysis with a transcriptome dataset from three normal breast tissues and three breast tumor tissues [29]. Starting from the fastq files, RNA-combine first removed contaminated sequences, aligned sequences to reference genome, counted read numbers on genes, producing a gene count matrix, which was then passed for the differential expression analysis, co-expression analysis and splicing site detection. The results of differentially expressed genes between tumor and normal conditions and the enriched functional terms were shown in Fig. 2a, c, and the sample distances were shown in Fig. 2b. Among the identified differentially expressed genes, MUC1, which has been approved by FDA as a diagnostic marker to monitor clinical course of patients with breast tumor during treatment [30], was significantly upregulated in the tumor samples, suggesting that our toolkit was reliable. Genes specifically co-expressed with MUC1 in either tumor or normal tissues were shown in Fig. 2d. Moreover, three genes with differential splicing sites (exon-based, transcript-based and event-based) identified between tumor and normal samples were shown in Fig. 3.

Next, using the pipeline of scRNA-seq data analysis in RNA-combine we analyzed a scRNA-seq dataset of 3k PBMCs (peripheral blood mononuclear cell) from a healthy donor generated from 10x Genomics (https://support.10xgenomics.com). The gene-barcode matrix was used for doublet detection, cell clustering, cell type annotation and cell connectivity analysis. The number of 35 candidates were detected as potential doublets among 2700 cells (Fig. 4a) and thus were discarded. The clustering results were shown in Fig. 4b. These cell clusters were further annotated according to classical cell markers, such as NKG7 for natural killer cells and T cells [31], PPBP for Megakaryocytes [32], CD79A for B cells [33] (Fig. 4c). The cell connectivity analysis revealed two main branches (lymphocytes and myeloid cells), as shown in Fig. 4d.

We used the public Alzheimer’s Iso-Seq dataset (1 percentage of total samples) [34] to test the pipeline of Iso-Seq. This test produces a total of 2245 high-quality full-length transcripts. Estimated running time of each module in all of the above analyses is shown in Additional file 1: Table S1.

Overall, these results suggested that our toolkit worked smoothly and automatically for real transcriptome data from different sequencing platforms and it could process the data with raw sequencing fastq files and reference genomes as inputs.

Compared with other methods

We compared the features of RNA-combine with four toolkits (RNASeqR, RASflow, scGEAToolbox, NASQAR [35]) published in recent 2 years (Fig. 5). Among them, RNASeqR and RASflow could only process bulk RNA-seq data, and scGEAToolbox is designed for scRNA-seq data only. Although NASQAR can process both bulk RNA-seq and scRNA-seq data, it can’t process Iso-Seq data.

Regarding the analysis module, our toolkit covers all the analysis modules implemented in RNASeqR, RASflow, scGEAToolbox, and most modules in NASQAR (except for metagenomic differential analysis and gene enrichment analysis), and it also includes nine extra modules (i.e. rRNA removal, alternative splicing site detection, variant detection, gene-co-expression network construction, scRNA gene-barcode matrix production, doublet detection, cell type annotation, PacBio sequence alignment and full-length transcript analysis) that are absent in the abovementioned software. One major difference between RNA-combine and other toolkits is that it implements the modules of raw data preprocessing, which enables users to use raw fastq files as input directly. Another difference is the development of the input formatting interface, which automatically formats inputs for different analysis modules, therefore enabling end-to-end analysis from raw sequencing files to direct graphic visualization of results. This is important because input formatting is both time- and labor-consuming, especially for users without advanced programming skills. RNA-combine is a Linux-based toolkit, enabling users to parallel jobs, thus speeding up the analysis processes. Collectively, RNA-combine is more comprehensive on data analysis and more user-friendly.

Conclusions

We conclude that RNA-combine presented in this study is a user-friendly, reliable toolkit for the comprehensive analysis of transcriptome data generated from different sequencing platforms. It enables users to set up a customized workflow to analyze data from multiple platforms and generate analysis reports and result visualization. Importantly, this toolkit empowers researchers without advanced bioinformatics skills to analyze their data by working with human-readable configuration files. We will continue to provide user supports and feature enhancements in future releases.

Availability and requirements

Project name: RNA-combine
Project home page: The software and usage guideline are available online at https://github.com/dongxuemin666/RNA-combine
Operating system(s): Linux
Programming language: Python, R, Shell
Other requirements: R 3.6, Python 3.7
License: MIT, licenses for dependent packages, tools and methods are shown in Additional file 1: Table S2
Any restrictions to use by non-academics: license needed

Availability of data and materials

Bulk RNA-seq data analysis: six samples (SRR868857, SRR868862, SRR868865, SRR868869, SRR868873, SRR868877) from NCBI Sequence Read Archive. scRNA-seq data analysis: scRNA data of 3k PBMCs from a healthy donor provided by 10x (http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k). Iso-Seq data analysis: from the website https://downloads.pacbcloud.com/public/dataset.

Abbreviations

scRNA-seq:: Single cell RNA sequencing
SNP:: Single nucleotide polymorphism
INDEL:: Insertion or deletion of bases
rRNA:: Ribosomal RNA
DEG:: Differentially expressed gene
GO:: Gene Ontology
KEGG:: Kyoto Encyclopedia of Genes and Genomes
SSN:: Sample-specific network
FDA:: Food and Drug Administration

References

Adams J. Transcriptome: connecting the genome to gene function. Nat Educ. 2008;1(1):195.
Google Scholar
Zhang X, Jonassen I. RASflow: an RNA-seq analysis workflow with Snakemake. BMC Bioinform. 2020;21(1):110.
Article Google Scholar
Chao KH, Hsiao YW, Lee YF, Lee CY, Lai LC, Tsai MH, et al. RNASeqR: an R package for automated two-group RNA-Seq analysis workflow. IEEE/ACM Trans Comput Biol Bioinform. 2019. https://doi.org/10.1109/TCBB.2019.2956708.
Article Google Scholar
Cai JJ. scGEAToolbox: a Matlab toolbox for single-cell RNA sequencing data analysis. Bioinformatics. 2020;36(6):1948–9.
CAS Google Scholar
Evguenia K, Laurent N, Hélène T. SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics. 2012;28(24):3211–7.
Article Google Scholar
Bolger AM, Marc L, Bjoern U. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.
Article CAS Google Scholar
Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37(8):907–15.
Article CAS Google Scholar
Yang L, Gordon KS, Wei S. featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30(7):923–30.
Article Google Scholar
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–303.
Article CAS Google Scholar
Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, et al. From FastQ data to high confidence variant calls: the genome analysis toolkit best practices pipeline. Curr Protoc Bioinformatics. 2013;43(1110):11.10.1-11.10.33.
Google Scholar
Kim S, Scheffler K, Halpern AL, Bekritsky MA, Noh E, et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat Methods. 2018;15(8):591–4.
Article CAS Google Scholar
Foox J, Tighe SW, Nicolet CM, Zook JM, Byrska-Bishop M, et al. Performance assessment of DNA sequencing platforms in the ABRF Next-Generation Sequencing Study. Nat Biotechnol. 2021;39(9):1129–40.
Article CAS Google Scholar
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550.
Article Google Scholar
Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):47.
Article Google Scholar
Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40.
Article CAS Google Scholar
Anders S, Reyes A, Huber W. Detecting differential usage of exons from RNA-seq data. Genome Res. 2012;22(10):2008–17.
Article CAS Google Scholar
Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33(3):290–5.
Article CAS Google Scholar
Shen S, Park JW, Lu ZX, Lin L, Henry MD, Wu YN, et al. rMATS: Robust and flexible detection of differential alternative splicing from replicate RNA-seq data. Proc Natl Acad Sci. 2014;111(51):5593–601.
Article Google Scholar
Zhao J, Zhou Y, Zhang X, Chen L. Part mutual information for quantifying direct associations in networks. Proc Natl Acad Sci. 2016;113(18):5130–5.
Article CAS Google Scholar
Liu X, Wang Y, Ji H, Aihara K, Chen L. Personalized characterization of diseases using sample-specific networks. Nucleic Acids Res. 2016;44(22):e164.
Article Google Scholar
Yu G, Wang LG, Y H, Y HQ. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012;16(5):284–7.
Article CAS Google Scholar
10x Genomics. Cellranger; 2021. https://github.com/10XGenomics/cellranger.
Wolock SL, Lopez R, Klein AM. Scrublet: computational identification of cell doublets in single-cell transcriptomic data. Cell Syst. 2019;8(4):281–91.
Article CAS Google Scholar
Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19(1):1–5.
Article Google Scholar
Zhang X, Lan Y, Xu J, Quan F, Zhao E, Deng C, et al. Cell Marker: a manually curated resource of cell markers in human and mouse. Nucleic Acids Res. 2019;47(D1):D721–8.
Article CAS Google Scholar
Franzén O, Gan LM, Björkegren JLM. PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data. Database. 2019;2019:baz046.
Article Google Scholar
Armin T, Elizabeth T. IsoSeq V3; 2020. https://github.com/PacificBiosciences/IsoSeq.
Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–100.
Article CAS Google Scholar
Brunner AL, Li J, Guo X, Sweeney RT, Varma S, Zhu SX, et al. A shared transcriptional program in early breast neoplasias despite genetic and clinical distinctions. Genome Biol. 2014;15(5):R71.
Article Google Scholar
Hayes DF, Sekine H, Ohno T, Abe M, Keefe K, Kufe DW. Use of a murine monoclonal antibody for detection of circulating DF3 antigen levels in breast cancer patients. J Clin Invest. 1985;75(5):1671–8.
Article CAS Google Scholar
Turman MA, Yabe T, Mcsherry C, Bach FH, Houchins JP. Characterization of a novel gene (NKG7) on human chromosome 19 that is expressed in natural killer cells and T cells. Hum Immunol. 1993;36(1):34–40.
Article CAS Google Scholar
Zhang C, Gadue P, Scott E, Atchison M, Poncz M. Activation of the megakaryocyte-specific gene platelet basic protein (PBP) by the Ets family factor PU.1. J Biol Chem. 1997;272(42):26236–46.
Article CAS Google Scholar
Mason DY, Cordell JL, Brown MH, Borst J, Stein H. CD79a: a novel marker for B-cell neoplasms in routinely processed tissue samples. Blood. 1995;86(4):1453–9.
Article CAS Google Scholar
Pacbcloud. Alzheimer8M; 2020. https://downloads.pacbcloud.com/public/dataset/IsoSeq_sandbox/2020_Alzheimer8M_subset.
Yousif A, Drou N, Rowe J, Khalfan M, Gunsalus KC. NASQAR: a web-based platform for high-throughput sequencing data analysis and visualization. BMC Bioinform. 2020;21(1):267.
Article Google Scholar

Download references

Acknowledgements

We thank L. Hu, Y.K. Chen for their helpful comments on this study, W. Wu and D.N. Tang for their help on software testing.

Funding

This work has been supported by the National Natural Science Foundation of China (32125005, 31930013, 32070416), the Strategic Priority Program of the Chinese Academy of Sciences (XDB31000000) and Youth Innovation Promotion Association of Chinese Academy of Sciences (2020086). None of these funding agencies played any roles in the design of the study and collection, analysis, and interpretation of data and in writing of this manuscript.

Author information

Authors and Affiliations

Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing, 100101, China
Xuemin Dong, Shengkai Pan & Xiangjiang Zhan
Key Laboratory of Biomedical Information Engineering of Ministry of Education, and Institute of Molecular Genetics, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, 710049, China
Shanshan Dong
Sino-Danish College, University of Chinese Academy of Sciences, Beijing, 100049, China
Xuemin Dong
CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China
Xiangjiang Zhan

Authors

Xuemin Dong
View author publications
You can also search for this author in PubMed Google Scholar
Shanshan Dong
View author publications
You can also search for this author in PubMed Google Scholar
Shengkai Pan
View author publications
You can also search for this author in PubMed Google Scholar
Xiangjiang Zhan
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

XJZ and SKP conceived the project. XJZ, SKP and XMD designed the software. XMD developed the software and analysed the data. SKP and XMD wrote the manuscript. XJZ, SKP, SSD, XMD reviewed the manuscript. All authors read and approved the final draft.

Corresponding authors

Correspondence to Shengkai Pan or Xiangjiang Zhan.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional file

Additional file 1.

Estimated running time for each module, organization of all modules and functions, licenses for dependent packages of RNA-combine.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Dong, X., Dong, S., Pan, S. et al. RNA-combine: a toolkit for comprehensive analyses on transcriptome data from different sequencing platforms. BMC Bioinformatics 23, 26 (2022). https://doi.org/10.1186/s12859-021-04549-y

Download citation

Received: 03 April 2021
Accepted: 22 December 2021
Published: 06 January 2022
DOI: https://doi.org/10.1186/s12859-021-04549-y

RNA-combine: a toolkit for comprehensive analyses on transcriptome data from different sequencing platforms

Abstract

Background

Results

Conclusion

Background

Implementation

Introduction of analysis modules

Bulk RNA-seq data analysis

scRNA-seq data analysis

Iso-Seq data analysis

Interface of the input format

Visualization

Backtracking and recording

Results and discussion

Case studies

Compared with other methods

Conclusions

Availability and requirements

Availability of data and materials

Abbreviations

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher's Note

Additional file

Additional file 1.

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Bioinformatics

Contact us