CIRCUS: a package for Circos display of structural genome variations from paired-end and mate-pair sequencing data
© Naquin et al.; licensee BioMed Central Ltd. 2014
Received: 12 February 2014
Accepted: 10 June 2014
Published: 18 June 2014
Detection of large genomic rearrangements, such as large indels, duplications or translocations, is now commonly achieved by next-generation sequencing (NGS) approaches. Several tools have recently been developed to analyze NGS data, but the resulting files are difficult to interpret without an additional visualization step. Circos (Genome Res, 19:1639–1645, 2009), a Perl script, is a powerful visualization tool, but it requires setting up numerous configuration files with a large number of parameters. R packages such as RCircos (BMC Bioinformatics, 14:244, 2013) or ggbio (Genome Biol, 13:R77, 2012) provide functions to display genomic data as circular Circos-like plots. However, these tools are very general and lack the functions needed to filter, format and adjust specific input genomic data.
We implemented an R package called CIRCUS to analyze genomic structural variations. It generates both the data and the configuration files that Circos needs to produce graphs. Only a few R prerequisites are required. Options are available to deal with heterogeneous data, various chromosome numbers and multi-scale analysis.
CIRCUS allows fast and versatile analysis of genomic structural variants with Circos plots for users with limited coding skills.
NGS has become a widely used tool for detecting large-scale genome variations. Genomic DNA is first fragmented before sequencing. Genomic libraries can then be produced and sequenced from one end or both ends of the fragments, commonly referred to as single-end or paired-end sequencing, respectively. Paired-end or mate-pair sequencing strongly facilitates the detection of genomic rearrangements and is therefore the preferred method for this type of analysis. Two reads of a fragment that align to abnormal positions on a chromosome, or to two different chromosomes, may indicate a structural variation. A list of these variations is difficult to analyze, since one variation often joins two positions that were originally remote. There is no genome browser that permits visualization of these distant genomic events. Visualization tools [1–3] have been developed that display each variation as a link between positions on a circular ideogram. The most commonly used one, Circos, is very flexible, but it requires installing Perl modules, some familiarity with the operating system to run it effectively, and knowledge of the parameters used in its configuration files. To circumvent this last difficulty, the variant detection program SVDetect [4] provides a script that converts its output into a format readable by Circos, together with a limited tutorial set of configuration files. Recently, the RCircos package [5] was proposed to obtain Circos-like plots in an R environment [6]. However, the user cannot zoom into a chromosome and has to program the functions needed to generate input data. Circos, ggbio and RCircos are very powerful tools; however, they were designed to manage a wide variety of analyses, and this flexibility makes them rather complex to handle.
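As an illustration of the idea that read pairs mapping to abnormal positions may indicate a structural variation, the following is a minimal sketch in R. It is not part of CIRCUS: the function name `classify_pairs`, the column names, and the `frag_size` and `tolerance` values are assumptions made for the example, and read orientation is ignored for brevity.

```r
# Hypothetical helper (not a CIRCUS function): label each read pair by how it maps.
# pairs: data.frame with columns chr1, pos1, chr2, pos2, one row per read pair.
# frag_size and tolerance are illustrative values chosen for this example.
classify_pairs <- function(pairs, frag_size = 300, tolerance = 100) {
  same_chr <- pairs$chr1 == pairs$chr2
  span <- abs(pairs$pos2 - pairs$pos1)
  ifelse(!same_chr, "inter_chromosomal",
         ifelse(span > frag_size + tolerance, "abnormal_distance", "concordant"))
}

# Three made-up read pairs: one normal, one too far apart, one across chromosomes.
pairs <- data.frame(
  chr1 = c("chr1", "chr1", "chr1"),
  pos1 = c(1000, 5000, 20000),
  chr2 = c("chr1", "chr1", "chr2"),
  pos2 = c(1280, 95000, 300),
  stringsAsFactors = FALSE
)
pairs$status <- classify_pairs(pairs)
print(pairs)
```

Dedicated detection tools apply considerably more elaborate criteria (orientation, clustering of pairs, mapping quality); the sketch only shows why such events naturally take the form of links between two positions.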
To provide fast visual analyses of structural genome variations, we have developed a wrapper of Circos for the R language, which supports a subset of Circos functionalities and shelters the user from managing the large number of parameters in Circos configuration files. This software, CIRCUS, can parse output files from several structural variant detection tools and write all the files needed to run Circos, with options that allow quick and flexible image production.
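One of the files Circos needs is a karyotype file, in which each chromosome is declared on a line of the form `chr - ID LABEL START END COLOR`. The sketch below shows how such lines could be produced from R. It is an assumption-laden illustration rather than the CIRCUS implementation: `karyotype_lines` is a hypothetical helper, the chromosome names and lengths are made up, and the color name must exist in the Circos color definitions in use.

```r
# Hypothetical helper (not a CIRCUS function): build Circos karyotype lines.
# names: chromosome identifiers; lengths: chromosome lengths in bp.
# color: a Circos color name, assumed here to be defined in the user's setup.
karyotype_lines <- function(names, lengths, color = "grey") {
  sprintf("chr - %s %s 0 %.0f %s", names, names, lengths, color)
}

# Made-up example: a chromosome and a plasmid.
lines <- karyotype_lines(c("chrA", "chrB"), c(4600000, 120000))
out_file <- file.path(tempdir(), "karyotype_example.txt")
writeLines(lines, out_file)
cat(readLines(out_file), sep = "\n")
```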
The concentric rings can be divided into two main parts. The first one contains one or two regions called “view(s)” within a chromosome of interest. The second was designed to display a set of entire chromosomes as well as an optional pseudo-chromosome (referred to as NM for “No Match” chromosome). The NM pseudo-chromosome can be used to display the links for which one of the two reads does not map to the reference genome, which can indicate an integration site of a foreign DNA fragment. The relative size of these two parts can be adjusted. Inside the inner ring of the image, links are painted with color gradients according to user-defined values. Only links with at least one foot in the view(s) are displayed.
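The rule that only links with at least one foot in the view(s) are displayed can be sketched as follows. This is a hedged illustration, not the package code: `in_views` and `filter_links` are hypothetical helpers, the link table is made up, both feet are assumed to lie on the chromosome of interest, and the view coordinates are borrowed from the Figure 4B call in the listing further down.

```r
# Hypothetical helper: TRUE where a position falls inside any of the views.
# views: list of c(start, end) vectors on the chromosome of interest.
in_views <- function(pos, views) {
  inside <- rep(FALSE, length(pos))
  for (v in views) {
    inside <- inside | (pos >= v[1] & pos <= v[2])
  }
  inside
}

# Hypothetical helper: keep links with at least one foot inside a view.
filter_links <- function(links, views) {
  keep <- in_views(links$pos1, views) | in_views(links$pos2, views)
  links[keep, , drop = FALSE]
}

# Made-up links; the third has neither foot in a view and is dropped.
links <- data.frame(pos1 = c(935000, 10000, 2000000),
                    pos2 = c(4060000, 950000, 3000000))
views <- list(c(930000, 1002000), c(4050000, 4100000))
kept <- filter_links(links, views)
print(kept)
```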
The core and peripheral functions have many input arguments to allow flexibility; to simplify their use, almost all have default values. At the end of the process, a log file is created showing all arguments used in function calls and the primary data used for the image display.
The aims of CIRCUS are to focus on structural genome variations and to allow non-bioinformaticians to visualize their data in a straightforward way. CIRCUS is therefore an R wrapper that uses only part of the Circos functionalities. Ideogram skeleton components (thickness, ticks) are fixed, as are most of the graphic parameters of the tracks, which depend on the kind of data displayed: coverage is drawn in histogram style, CNVs in heatmap style, annotations in highlight style, and links are simple lines with a fixed thickness. Within this fixed framework, the user can decide which tracks to display and can set up zoom criteria. A typical analysis may include an iterative view of the links from each chromosome against all others, followed by zooms on regions of interest. The borders of each event can thus be precisely delineated, whatever the size of the corresponding DNA fragment. Insertions of foreign sequences, such as mobile elements, can be located through links to the NM pseudo-chromosome.
Results and discussion
This study has been performed using the following functions, called mainly with default parameters.
Creating the karyotype file:
> create_karyotype(file = "K12_MG1655.fasta")
Computing reads coverage by bins of 10 kb:
> coverage_adapt(file = "CX1313.sam", win = 10000,
    chr_file = "karyotype.txt")
Adapting and coloring gene annotation:
> tab_annot_adapt(file = "K12_MG1655.gtf", coln = c(2,4,5,7,3))
> annot_paint(file = "K12_MG1655_annot.circus")
Processing and coloring NM links:
> NM_adapt(file = "CX1313.sam")
> NM_paint(file = "CX1313_NM.circus", fragSize = 330, threshold = 50)
Adapting CNV data from FREEC results:
> FREEC_CNV_adapt(file = "CX1313_CNVs")
> CNV_paint(file = "CX1313_CNVs_CNV.circus")
Adapting and coloring genome variations from SVDetect results:
> SVD_links_adapt(file = "CX1313.links.filtered")
> links_paint(file = "CX1313.links.filtered_links.circus",
    conv = data.frame(c("INS_FRAGMT", "INV_INS_FRAGMT"), c(2,3)))
Producing the image of Figure 3:
> chromosome_image(chr = "K12_MG1655.fa", view1 = c(1, NA),
    chr_file = "karyotype.txt",
    feat_file = "K12_MG1655_annot_painted.circus", coverage = 1,
    NM_file = "CX1313_NM_painted.circus",
    links_file = "CX1313.links.filtered_links_painted.circus",
> links_paint(file = "CX1313contig.links.filtered_links.circus",
    conv = data.frame(c("TRANSLOC", "INV_TRANSLOC"), c(2,4)))
> chromosome_image(chr = "K12_MG1655.fa", view1 = c(1, NA),
    chr_file = "karyotype.txt",
    feat_file = "K12_MG1655_annot_painted.circus", coverage = 1,
    image_file = "Figure 4A",
    links_file = "CX1313contig.links.filtered_links_painted.circus",
    CNV_file = "CX1313_CNV_painted.circus")
Producing the image of Figure 4B:
> chromosome_image(chr = "K12_MG1655.fa", view1 = c(930000, 1002000),
    view2 = c(4050000, 4100000), chr_file = "karyotype.txt",
    feat_file = "K12_MG1655_annot_painted.circus", coverage = 1,
    image_file = "Figure 4B",
    links_file = "CX1313contig.links.filtered_links_painted.circus",
    NM_file = "CX1313_NM_painted.circus", flag_view_outer = FALSE,
    CNV_file = "CX1313_CNV_painted.circus")
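The coverage step in the listing above summarizes read depth in windows of 10 kb. What such a binning involves can be sketched as follows, under stated assumptions: `bin_coverage` is a hypothetical helper and not the `coverage_adapt` implementation, the input is a made-up vector of per-position depths, the mean is only one possible summary, and windows without any data point are simply omitted.

```r
# Hypothetical helper (not a CIRCUS function): mean depth per fixed-size window.
# pos: 1-based positions on one chromosome; depth: read depth at each position.
# win: window size in bp (10 kb here, as in the coverage_adapt call above).
bin_coverage <- function(pos, depth, win = 10000) {
  bin <- (pos - 1) %/% win                 # 0-based window index of each position
  means <- tapply(depth, bin, mean)        # one mean per non-empty window
  starts <- as.numeric(names(means)) * win
  data.frame(start = starts, end = starts + win, value = as.numeric(means))
}

# Made-up depths: three positions in the first window, one in each of the next two.
pos <- c(1, 5000, 10000, 10001, 25000)
depth <- c(10, 20, 30, 40, 50)
cov <- bin_coverage(pos, depth)
print(cov)
```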
The CIRCUS package is a simple solution for both biologists and bioinformaticians who want to display structural variants of genomes. Because CIRCUS allows a programmer to easily add adaptors, its canvas may also be suitable for other applications, such as Hi-C, as long as events can be represented by links.
Availability and requirements
CIRCUS is available at https://www.imagif.cnrs.fr/plateforme-36-Plateforme_de_Sequencage_a_Haut_Debit.html.
CIRCUS is an R package and requires the installation of the Circos software. It may also require the SAMtools and BEDtools packages, as well as Python, to display read coverage and NM links.
The authors thank Mireille Ansaldi for providing the data used for the figures. We are grateful to Yan Jaszczyszyn, Erwin van Dijk, Hélène Auger and Maximilian Haussler for their comments on the manuscript.
The work is supported by the Centre National de la Recherche Scientifique and the IMAGIF sequencing platform.
- Krzywinski MI, Schein JE, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA: Circos: an information aesthetic for comparative genomics. Genome Res. 2009, 19: 1639-1645. 10.1101/gr.092759.109.
- Yin T, Cook D, Lawrence M: ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biol. 2012, 13: R77. 10.1186/gb-2012-13-8-r77.
- Ekdahl S, Sonnhammer ELL: ChromoWheel: a new spin on eukaryotic chromosome visualization. Bioinformatics. 2004, 20 (4): 576-577. 10.1093/bioinformatics/btg448.
- Zeitouni B, Boeva V, Janoueix-Lerosey I, Loeillet S, Legoix-ne P, Nicolas A, Delattre O, Barillot E: SVDetect: a tool to identify genomic structural variations from paired-end and mate-pair sequencing data. Bioinformatics. 2010, 26: 1895-1896. 10.1093/bioinformatics/btq293.
- Zhang H, Meltzer P, Davis S: RCircos: an R package for Circos 2D track plots. BMC Bioinformatics. 2013, 14: 244. 10.1186/1471-2105-14-244.
- R Core Team: R: A Language and Environment for Statistical Computing. 2014, Vienna, Austria: R Foundation for Statistical Computing, http://www.R-project.org.
- Ye K, Schulz MH, Long Q, Apweiler R, Ning Z: Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics. 2009, 25 (21): 2865-2871. 10.1093/bioinformatics/btp394.
- Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008, 18: 821-829. 10.1101/gr.074492.107.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.