circRNAprofiler: an R-based computational framework for the downstream analysis of circular RNAs
BMC Bioinformatics volume 21, Article number: 164 (2020)
Circular RNAs (circRNAs) are a newly appreciated class of non-coding RNA molecules. Numerous tools have been developed for the detection of circRNAs, however computational tools to perform downstream functional analysis of circRNAs are scarce.
We present circRNAprofiler, an R-based computational framework that runs after circRNAs have been identified. It allows to combine circRNAs detected by multiple publicly available annotation-based circRNA detection tools and to analyze their expression, genomic context, evolutionary conservation, biogenesis and putative functions.
Overall, the circRNA analysis workflow implemented by circRNAprofiler is highly automated and customizable, and the results of the analyses can be used as starting point for further investigation in the role of specific circRNAs in any physiological or pathological condition.
Next-generation sequencing has revealed the expression of thousands of circular RNAs (circRNAs) in eukaryotic cells. At this time, no general function of circRNAs has been uncovered, but several lines of evidence indicate that circRNAs can act as microRNA sponges, sequester RNA binding proteins (RBPs), regulate transcription or provide a template for protein translation . CircRNAs are formed during pre-mRNA splicing, by a back-splicing event of one or two exons. They contain a unique exon-exon junction (also called back-splice junction (BSJ)), not present in the linear host transcript. Bioinformatic tools take advantage of this unique feature and use the BSJ-spanning reads to identify circRNAs from RNA sequencing data. Over a dozen of these circRNA detection tools have been developed, each differing in specificity, sensitivity, and criteria to define a circRNA. Due to these differences the use of multiple tools for circRNA detection is recommended to have the most accurate circRNA prediction.
With increasing numbers of circRNAs detected by these bioinformatics tools, the need for pipelines to analyze expression and putative functions of circRNAs became imperative. This led to the development of tools that cover different aspects of in silico circRNA analysis, from prediction to first functional insights (e.g. Circtools, CircPro and CircInteractome) [2,3,4]. These tools have specific strengths and limitations. For instance, Circtools combines circRNA data with positional data from eCLIP experiments to explore RBP sponge functions of circRNAs . Circtools is also useful to model changes in circRNA expression relative to that of the host gene. CircPro enables users to discover circRNAs with protein-coding potential . Limitations include restrictions on the use of specific aligners to map reads, installation of additional software, and compatibility with only one circRNA detection tool.
Here we present circRNAprofiler, an R-based framework that only requires an R-installation and offers 15 modules for a comprehensive in silico analysis of circRNAs. This computational framework allows to combine and analyze circRNAs detected by multiple publicly available annotation-based circRNA detection tools. It covers different aspects of circRNA analysis ranging from differential expression analysis, genomic context, evolutionary conservation, biogenesis to functional analysis. The pipeline used by circRNAprofiler is highly automated and customizable. Furthermore, circRNAprofiler includes additional functions for data visualization to facilitate interpretation of the results.
The workflow of circRNAprofiler is shown in Fig. 1. Detailed information on each module can be found in the package vignettes. Following, we only give a brief overview of the circRNA analysis workflow implemented by circRNAprofiler.
In module 1, the project folder is set up to start the analysis. The project folder contains project files and the circRNA prediction results. In detail, project files include the genome-annotation file, the file containing information about the experimental design and optional files containing user specifications that are used to customize the analysis to execute.
In module 2, circRNAs detected by one or more of the annotation-based circRNA detection tools MapSplice2 , NCLscan , CircMarker , CircExplorer2 , KNIFE , UROBORUS  are imported in R working environment.
In module 3, results of different circRNA detection tools are merged to create a unique data set ready to enter the downstream analyses.
In module 4, circRNAs are filtered on user-defined read counts to eliminate false positives.
In module 5, a pairwise comparison is executed to identify differentially expressed circRNAs.
Module 6 is used for conservation analysis of BSJ-coordinates across different species. The liftOver  utility from UCSC is used for this purpose.
In module 7, the internal exonic composition of circRNAs, as well as their flanking introns, are annotated. The genomic features are extracted from the user-provided gene annotations.
Module 8 offers the possibility to incorporate a set of random BSJs as an independent dataset. This may, for instance, be used for enrichment analysis in microRNA/RBP binding sites in a set of circRNAs.
In Modules 9, internal circRNA sequences (i.e. the exonic sequences between the BSJs) are extracted.
In Module 10, back-spliced junction sequences only are extracted.
In Module 11, the user can extract sequences flanking BSJs. These are: i) sequences of the introns flanking BSJs, ii) sequences from a user-defined genomic window surrounding the back-spliced junctions and iii) sequences of the back-spliced exons only.
Sequences extracted with Module 9–11 can enter the downstream screenings.
In module 12, sequences from module 9–11 are screened for the presence of RBP motifs (from ATtRACT or MEME databases), de novo motifs or motifs specified by the user.
In module 13, circRNA sequences are screened for miRNA binding sites. The user can select stringency criteria for miRNA binding prediction.
In module 14, SNPs of genome-wide association studies (GWAS catalog ) are annotated in the selected regions flanking BSJs to identify SNPs affecting circRNAs biogenesis.
In module 15, repetitive elements from the RepeatMasker database  are annotated in regions flanking the BSJs to investigate circRNAs biogenesis.
As a practical example, total RNA-sequencing data from human control hearts, dilated cardiomyopathy (DCM) tissues and hypertrophic cardiomyopathy (HCM) tissues (n = 3/group) were analyzed with 3 different circRNA detection tools: CircMarker , MapSplice2  and NCLscan  (see Additional File 1 for detailed method).
Analysis of the identified circRNAs – circRNAprofiler (development version 1.1.16)
After circRNA detection, the identified BSJ-spanning reads were analyzed with circRNAprofiler. The following experimental workflow is an example, but the pipeline used by circRNAprofiler is highly automated and customizable (i.e. different settings can be applied by the user).
Module 1 - set up project folder
First, the project folder is set up, which contains project files and the circRNA prediction results from CircMarker (cm), MapSplice2 (ms), and NCLscan (ns).
Module 2 - import predicted circRNAs
Files containing the detected circRNAs are imported in the R working environment. MapSplice2 predicted 7499 putative circRNAs, NCLscan identified 16,771 putative circRNAs, and CircMarker predicted 39,251 different circRNAs to be expressed in the heart samples (Additional File 1, Figure S1A; Additional File 2, sheet 1).
Module 3 - merge commonly identified circRNAs
The detected circRNAs were grouped, and we found that a total of 41,558 unique circRNAs were identified by the three detection tools (Additional File 2, sheet 2). Of these, a small subset of circRNAs (186 circRNAs) were derived from the antisense strand of the reported gene (i.e. antisense circRNAs, see Additional File 2, sheet 3). In that case, the strand from which the circRNA arises (i.e. reported in the prediction results) is different from the strand of which the gene is transcribed (i.e. reported in the genome annotation file). Interestingly, the back-spliced junction coordinates overlap with exon coordinates of the reported gene. This might be explained by technical artifacts or by the presence of a gene transcribed from the opposite strand that is not annotated. Furthermore, the expression of this subset of circRNAs is rather low and rather variable across samples. Due to the ambiguous nature of these predictions, the detected antisense circRNAs are excluded from the circRNA analysis pipeline.
Module 4 - filter circRNAs
CircRNA prediction results were filtered based on read counts (≥5 present in at least 3 samples of 1 condition) and found that 1458 circRNAs survived the filtering step with the majority of them (73%) detected by all 3 prediction tools (Additional File 1, Figure S1B; Additional File 2, sheet 4). The Na+/Ca2+ exchanger gene, Solute Carrier Family 8 Member A1 (SLC8A1) is considered to be the most abundantly expressed circRNA in the human heart [16,17,18]. Our analysis, however, revealed that CircMarker detected 3 circRNAs arising from the genes SYCP2, AL132709, and LGMN, which were predicted to be expressed approximately to the same level as SLC8A1, or even higher in case of circSYCP2 (Additional File 2, sheet 4). Since this high level of circSYCP2, circAL132709 and circLGMN expression was surprising to us, we verified their expression, together with circSLC8A1 experimentally, by RT-PCR in 4 human heart samples, with and without RNAse R treatment, using our previously described methods  (Additional File 1, Figure S1C). We did not detect a clear and specific amplicon at 35 and 40 PCR cycles for these three circRNAs, while circSLC8A1 was readily detectable already at 28 PCR cycles. This suggests that these 3 circRNAs are not as highly expressed as circSLC8A1 in the heart. However, we can not rule out that problems with primer efficiencies or design might underlie the lack of amplification for circAL132709 and circLGMN. To search for a positive control for our primers, we tested the same primers in an extensive human tissue panel of 22 different organs (Clontech, Cat No. 636643). We detected circAL132709 in adult liver and circSYCP2 faintly in fetal liver at 40 PCR cycles (Additional File 1, Figure S1D), indicating that the primers for circSYCP2 and circAL132709 are able to detect the specified circRNA. Therefore, we conclude that circSYCP2 and circAL132709, which are predicted to be highly expressed by CircMarker, are most likely not, or only lowly expressed in the heart. Overall, this suggests to us that the underlying algorithm of CircMarker is more prone to false-positive predictions and the results should be handled carefully.
Module 5 - differential circRNA expression analysis
Differential expression analysis on the 1458 filtered circRNAs revealed that 9 out of 1458 circRNAs were differentially expressed in DCM patients compared to control samples, 97 circRNAs were differently expressed in HCM patients compared to control samples, and 53 circRNAs were differentially expressed in HCM compared to DCM patients (Additional File 1, Figure S2A-C; Additional File 2, sheet 5–7). The helper function for the R Bioconductor package DESeq2  was used. As a cut-off, we used an absolute log2FC = 1 and an adjusted p-value≤0.05. Together with TTN circRNAs known to be deregulated in DCM and HCM patients  we found an additional single-exon circRNA to be highly expressed and dysregulated in DCM compared to control samples. This circRNA arises from the ALPK2 gene, which is reported to be highly expressed in the heart (GTEx portal).
Module 6 - conservation analysis of the BSJ coordinates
We mapped the BSJ coordinates of the 1458 filtered circRNAs to mouse genome coordinates and found that approximately 50% of the human BSJ coordinates were conserved to mice (Additional File 2, sheet 8).
Module 7 and 8 - analysis of the genomic context of circRNAs
Structural characterization of the 1458 filtered circRNAs revealed that introns flanking the predicted BSJs are longer than introns flanking randomly generated back-spliced exons (Additional File 1, Figure S2D). This is commonly observed by others as well . Furthermore, we did not find a difference in the length of the predicted back-spliced exons compared to the set of randomly generated back-spliced exons (Additional File 1, Figure S2E). Of note, the set of randomly generated back-spliced exons is a set of exons, that are normally not back-spliced. This set is solely created to be used for comparative purposes in the different analyses (e.g., exon-length, miRNA binding, RBP binding). We analyzed the number of different circRNAs per host gene and found that approximately half of the circRNAs-producing genes (676 genes) produced a single circRNA, while the other half generated between 2 to 46 different circRNAs. The two genes that produced the highest number of circRNAs were TTN and RyR2, which generated a total of 46 and 29 putative circRNAs, respectively (Additional File 1, Figure S2F). Most back-spliced exons were predicted to comprise between 1 and 6 exons (Additional File 1, Figure S2G).
Module 9, 10 and 11 - retrieval of internal circRNA sequences, BSJ sequences and sequences flanking the BSJ
In these modules, the user can extract sequences of interest for further downstream analysis. The package is designed to be able to analyze one or multiple circRNAs in the same run. We selected the single-exon circRNA arising from the ALPK2 gene for further analysis because we found it to be differentially expressed in DCM hearts compared to control hearts. Therefore, we retrieved the circRNA sequence of the gene ALPK2 (module 9), as well as the sequences surrounding the BSJ (module 11) of ALPK2.
Module 12 - RBP analysis on the internal circRNA sequence and on sequences surrounding the BSJs
CircRNAprofiler can be used to analyze RBP motifs on the internal circRNA sequences. This analysis is useful to investigate whether the circRNA(s) in question is/are likely to bind or scavenge certain RBPs. RBP analysis of the circALPK2 exonic sequence revealed multiple enriched RBP motifs compared to the remaining subset of 1457 filtered circRNAs (Additional File 1, Figure S3A; Additional File 3, sheet 1–3).
To investigate circRNAs biogenesis, it may be useful to assess putative RBP binding sites in sequences flanking the BSJ. Therefore, we extracted sequences from a defined genomic window surrounding the predicted BSJ of circALPK2 (module 11) and analyzed these sequences for the presence of RBP motifs (default settings). A subset of 200 nucleotides were taken from the flanking introns and 10 nucleotides from the back-spliced exon. Interestingly, we found a ~ 4-fold enrichment in the introns surrounding circALPK2 for RBM24, a heart and skeletal muscle-specific splicing factor  (Additional File 1, Figure S3B; Additional File 3, sheet 4–6) compared to the remaining subset of 1457 filtered circRNAs. A similar result was observed when this circRNA was compared to a subset of sequences flanking randomly generated BSJs (data not shown). This suggests that RBM24 may be involved in the biogenesis of this particular circRNA.
Module 13 - miRNA binding site analysis on the internal circRNA sequence
Next, we analyzed the internal circRNA sequence (retrieved from module 9) of circALPK2 for the presence of potential miRNA binding sites. The miRNA sequences we analyzed were downloaded from a study in which miRNA deep sequencing was performed on human left atria (LA) and ventricles (LV) under normal physiologic conditions . From this study, we filtered the miRNAs based on their expression level in the LV (average expression of at least 10 reads in the 4 control samples). A subset of 361 miRNAs survived the filtering step and were used in our miRNA binding site analysis. As cut-off, we used ≥6 total matches between the seed region of the miRNAs and the seed site in the target sequence of which at least 5 are canonical Watson-Crick matches and 1 can be a non-canonical match (G:U). With this setting, we found several potential miRNAs binding sites within ALPK2 circRNA sequence, including a total of 48 putative sites for miR-130b-5p (Additional File 1, Figure S3C; Additional File 4).
Module 14 - GWAS SNPs analysis on the intron flanking BSJs
We analyzed flanking introns (retrieved from module 11) of the 1458 filtered circRNAs for the presence of diseased-associated GWAS SNPs that may affect circRNAs biogenesis. In our analysis, we only considered GWAS SNPs associated with cardiac and vascular traits reported in the GWAS Catalog . We found that the 241 GWAS SNPs are located within introns flanking the predicted circRNAs (Additional File 5). A further step could be to assess whether the risk allele is present in the genome of the analyzed samples (control and diseased patients). The presence of the risk allele, coupled with differential expression of the corresponding circRNAs, might be the first step to shed light on whether these SNPs might affect circRNAs biogenesis.
Module 15 - repetitive elements analysis on the intron flanking BSJs
Finally, analysis of the introns flanking the predicted BSJs of the 1458 filtered circRNAs reveals that approximately half of them contain complementary Alu repeats (Additional File 6). The presence of complementary Alu repeats in introns is a known mechanism in the formation of circRNAs .
circRNAprofiler provides an R-based framework that allows to combine circRNAs detected by multiple annotation-based detection tools and to perform structural and functional circRNA analysis. The results of all the aforementioned analyses can serve as starting point for further investigation in the role of specific circRNAs in heart disease. Above all, these analyses described here provides an example of how this R-package can be used in the study of circRNA expression, function, or biogenesis of any physiological or pathological entity. We intend to develop additional functions in the future to provide a comprehensive and up-to-date framework for the characterization of circRNAs.
Availability and requirements
Project name: circRNAprofiler
Project home page: https://github.com/Aufiero/circRNAprofiler.
Operating system(s): Platform independent
Programming language: R
Other requirements: R v 3.6.0 or higher
Any restrictions to use by non-academics: No limitations.
Availability of data and materials
Raw RNA sequencing data are available at NCBI BioProject accession number PRJNA533243.
All data generated or analyzed during this study are included in this published article (and its supplementary information files).
RNA-binding motif protein 20
RNA-binding motif protein 24
RNA binding proteins
Genome-wide association study
Li X, Yang L, Chen L-L. The biogenesis, functions, and challenges of circular RNAs. Mol Cell. 2018;71:428–42. https://doi.org/10.1016/j.molcel.2018.06.034.
Jakobi T, Uvarovskii A, Dieterich C. circtools - a one-stop software solution for circular RNA research. Bioinformatics. 2018. https://doi.org/10.1093/bioinformatics/bty948.
Meng X, Chen Q, Zhang P, Chen M. CircPro: an integrated tool for the identification of circRNAs with protein-coding potential. Bioinformatics. 2017;33:3314–6. https://doi.org/10.1093/bioinformatics/btx446.
Dudekula DB, Panda AC, Grammatikakis I, De S, Abdelmohsen K, Gorospe M. CircInteractome: a web tool for exploring circular RNAs and their interacting proteins and microRNAs. RNA Biol. 2016;13:34–42. https://doi.org/10.1080/15476286.2015.1128065.
Wang K, Singh D, Zeng Z, Coleman SJ, Huang Y, Savich GL, He X, Mieczkowski P, Grimm SA, Perou CM, MacLeod JN, Chiang DY, Prins JF, Liu J. MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 2010;38:e178. https://doi.org/10.1093/nar/gkq622.
Chuang T-J, Wu C-S, Chen C-Y, Hung L-Y, Chiang T-W, Yang M-Y. NCLscan: accurate identification of non-co-linear transcripts (fusion, trans-splicing and circular RNA) with a good balance between sensitivity and precision. Nucleic Acids Res. 2016;44:e29. https://doi.org/10.1093/nar/gkv1013.
Li X, Chu C, Pei J, Măndoiu I, Wu Y. CircMarker: a fast and accurate algorithm for circular RNA detection. BMC Genomics. 2018;19:572. https://doi.org/10.1186/s12864-018-4926-0.
Zhang X-O, Dong R, Zhang Y, Zhang J-L, Luo Z, Zhang J, Chen L-L, Yang L. Diverse alternative back-splicing and alternative splicing landscape of circular RNAs. Genome Res. 2016;26:1277–87. https://doi.org/10.1101/gr.202895.115.
Szabo L, Morey R, Palpant NJ, Wang PL, Afari N, Jiang C, Parast MM, Murry CE, Laurent LC, Salzman J. Statistically based splicing detection reveals neural enrichment and tissue-specific induction of circular RNA during human fetal development. Genome Biol. 2015;16:126. https://doi.org/10.1186/s13059-015-0690-5.
Song X, Zhang N, Han P, Moon B-S, Lai RK, Wang K, Lu W. Circular RNA profile in gliomas revealed by identification tool UROBORUS. Nucleic Acids Res. 2016;44:e87. https://doi.org/10.1093/nar/gkw075.
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. https://doi.org/10.1186/s13059-014-0550-8.
Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–40. https://doi.org/10.1093/bioinformatics/btp616.
Kuhn RM, Haussler D, Kent WJ. The UCSC genome browser and associated tools. Brief Bioinform. 2013;14:144–61. https://doi.org/10.1093/bib/bbs038.
MacArthur J, Bowler E, Cerezo M, Gil L, Hall P, Hastings E, Junkins H, McMahon A, Milano A, Morales J, Pendlington ZM, Welter D, Burdett T, Hindorff L, Flicek P, Cunningham F, Parkinson H. The new NHGRI-EBI catalog of published genome-wide association studies (GWAS catalog). Nucleic Acids Res. 2017;45:D896–901. https://doi.org/10.1093/nar/gkw1133.
Smit AFA, Hubley R, Green P. 2013–2015., RepeatMasker Open-4.0., (n.d.). http://www.repeatmasker.org/.
Khan MAF, Reckman YJ, Aufiero S, van den Hoogenhof MMG, van der Made I, Beqqali A, Koolbergen DR, Rasmussen TB, van der Velden J, Creemers EE, Pinto YM. RBM20 regulates circular RNA production from the Titin gene. Circ Res. 2016;119:996–1003. https://doi.org/10.1161/CIRCRESAHA.116.309568.
Werfel S, Nothjunge S, Schwarzmayr T, Strom T-M, Meitinger T, Engelhardt S. Characterization of circular RNAs in human, mouse and rat hearts. J Mol Cell Cardiol. 2016;98:103–7. https://doi.org/10.1016/j.yjmcc.2016.07.007.
Tan WLW, Lim BTS, Anene-Nzelu CGO, Ackers-Johnson M, Dashi A, See K, Tiang Z, Lee DP, Chua WW, Luu TDA, Li PYQ, Richards AM, Foo RSY. A landscape of circular RNA expression in the human heart. Cardiovasc Res. 2017;113:298–309. https://doi.org/10.1093/cvr/cvw250.
Ragan C, Goodall GJ, Shirokikh NE, Preiss T. Insights into the biogenesis and potential functions of exonic circular RNA. Sci Rep. 2019;9:2048. https://doi.org/10.1038/s41598-018-37037-0.
Poon KL, Tan KT, Wei YY, Ng CP, Colman A, Korzh V, Xu XQ. RNA-binding protein RBM24 is required for sarcomere assembly and heart contractility. Cardiovasc Res. 2012;94:418–27. https://doi.org/10.1093/cvr/cvs095.
Kakimoto Y, Tanaka M, Kamiguchi H, Hayashi H, Ochiai E, Osawa M. MicroRNA deep sequencing reveals chamber-specific miR-208 family expression patterns in the human heart. Int J Cardiol. 2016;211:43–8. https://doi.org/10.1016/j.ijcard.2016.02.145.
Jeck WR, Sorrentino JA, Wang K, Slevin MK, Burd CE, Liu J, Marzluff WF, Sharpless NE. Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA. 2013;19:141–57. https://doi.org/10.1261/rna.035667.112.
This work was supported by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO-836.12.002, NWO-821.02. 021) and by the Cardiovasculair Onderzoek Nederland (CVON): CVON-ARENA-2011-11.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary methods and Figures. This file contains supplementary methods and figures.
circRNAs detected in human hearts. This file contains circRNAs detected in human control hearts (C1, C2, C3), dilated cardiomyopathy (D1, D2, D3) tissues and hypertrophic cardiomyopathy (H1, H2, H3) tissues with 3 different circRNA detection tools: CircMarker (cm), MapSplice2 (ms) and NCLscan (ns). Results of the circRNA differential expression analysis. Human circRNAs conserved to mouse.
RBP analysis. This file contains the results of the RBP analysis in the ALPK2 circRNA sequence and in the region flanking the BSJ of circALKP2.
miRNA binding sites analysis. This file contains the results of the miRNA binding sites analysis in the ALPK2 circRNA sequence.
GWAS SNPs analysis. This file contains the results of GWAS SNPs analysis on the intron flanking BSJs.
Repetitive elements analysis. This file contains the results of the repetitive elements analysis on the intron flanking BSJs.
About this article
Cite this article
Aufiero, S., Reckman, Y.J., Tijsen, A.J. et al. circRNAprofiler: an R-based computational framework for the downstream analysis of circular RNAs. BMC Bioinformatics 21, 164 (2020). https://doi.org/10.1186/s12859-020-3500-3
- Differential expression analysis
- Sequence analysis
- Functional prediction