DensityMap: a genome viewer for illustrating the densities of features

Guizard, Sébastien; Piégu, Benoît; Bigot, Yves

doi:10.1186/s12859-016-1055-0

Software
Open access
Published: 06 May 2016

BMC Bioinformatics volume 17, Article number: 204 (2016)

Abstract

Background

Several tools are available for visualizing genomic data. Some, such as GBrowse and JBrowse, are very efficient for small genomic regions, but they are not suitable for entire genomes. Others, like PhenoGram and CViT, can be used to visualize whole genomes, but are not designed to display very dense genomic features (e.g. interspersed repeats). We have therefore developed DensityMap, a lightweight Perl program that can display the densities of several features (genes, ncRNA, CpG, etc.) along chromosomes on the scale of the whole genome. A critical advantage of DensityMap is that it uses GFF annotation files directly to compute the densities of features without needing additional information from the user. The resulting picture is readily configurable, and the colour scales used can be customized to best fit the data plotted.

Results

DensityMap runs on Linux with few requirements, so that users can easily and quickly visualize the distributions and densities of genomic features for an entire genome. The input is GFF3-formatted data representing chromosomes (linkage groups or pseudomolecules) and sets of features, which are used to calculate the density maps. In practice, DensityMap uses a tiling window to compute the density of one or more features and the number of bases covered by these features along chromosomes. The densities are represented by colour scales that can be customized to highlight critical points. DensityMap can compare the distributions of features: it draws several chromosomal density maps in a single image, each of which describes a different genomic feature. It can also use the genome nucleotide sequence to compute and plot a density map of the GC content along chromosomes.

Conclusions

DensityMap is a compact, easy-to-use tool for displaying the distribution and density of all types of genomic features within a genome. It is flexible enough to visualize the densities of several types of features in a single representation. The images produced are readily configurable and their SVG format ensures that they can be edited.

Background

Visualizing the ever-increasing amounts of DNA sequence data for genomic purposes is becoming a great challenge [1]. One solution is to develop genome browsers. The first, and probably the most popular, was the UCSC Genome Browser, which was released in 2002 and used to display human genomic data [2]. Several others, including GBrowse, JBrowse, ABrowse and Annot-J [3], are now available. They are ergonomically more efficient than the original and include new functions, such as collaborative annotation with Web Apollo [4]. These browsers are useful for displaying discrete chromosome regions but are not suitable for visualizing whole chromosomes.

Other tools have been developed for visualizing whole chromosomes. One of the most widely used is Circos [5, 6], which represents chromosomes by arranging them on a circle. It can also be used to plot annotations, quantitative data and relationships between parts of different chromosomes or genomes [7]. However, Circos representations become dense as their complexity increases, which makes them harder to read. Two new programs designed to simplify visualization of whole chromosome sequences were released recently. PhenoGram [8] represents chromosomes and uses ideograms, lines, and different coloured symbols to locate information like phenotypes, genes, CNVs, SNPs, etc. While the PhenoGram web interface is user-friendly, it requires the input files to be in a specific tabulated format rather than a standard format like the Generic Feature Format (GFF), the most common format for annotation files. It also cannot display the density of a specific feature at a given position in a chromosome. CViT (Chromosome Visualization Tool) [9] circumvents these limitations. It can represent chromosome contents from a GFF file, is readily configurable and the output image can be customized. CViT can also plot the densities of some features along chromosomes using histograms placed beside the chromosome representation. This tool produces reliable images when the features are not too dense but becomes limited when the density of a feature like interspersed repeats or DNA motifs is high. CViT must also use a GFF file that contains the density of a feature for a given set of windows along a chromosome. As CViT is not designed to compute these densities, the GFF file must be revised each time the window width is changed. We have therefore developed a program, DensityMap.pl, inspired by CViT, which can produce maps that include the densities of one or more types of features along chromosomes on the scale of the whole genome.

Implementation

DensityMap is a Perl script that is run from the command line and uses the GD::SVG Perl package to produce SVG pictures. DensityMap computes a representation of the density of a feature on chromosomes using one GFF file (GFF2, GFF2.5 or GFF3) describing a chromosome as input. The program plots as many density maps along a chromosome as there are features specified. For each feature, it can plot a density map for the plus strand, the minus strand, the plus and minus strands separately, the two strands combined, or the plus, minus and combined strands together. Density is computed using a tiling window without overlap whose length is fixed by the user or automatically computed to produce an output image that fits the maximum image size. All this information can be set by the user in the command line. DensityMap also automatically calculates the density of a feature for each pixelized region of a chromosome, whatever the representation scale used. The way the density of a feature varies along a chromosome is represented using a colour scale from 0 to 100 %. A single colour scale can be used for all features investigated or each feature can have its own colour scale. Like CViT, DensityMap.pl produces visualizations that are fully configurable in Scalable Vector Graphics (SVG) format. This makes it easy to edit high-quality images for publication. The program also includes graphical options for configuring almost all elements (margins, map width, scale, etc.) of the image. The options are shown in Table 1.
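As an illustration of how a density between 0 and 100 % could be turned into a colour, the following Python sketch interpolates linearly between two end colours. It is not taken from DensityMap, which is written in Perl; the function names `density_to_rgb` and `rgb_to_hex` and the white-to-red default scale are assumptions made for this example, and the program's actual colour scales may be built differently.

```python
def density_to_rgb(density, low=(255, 255, 255), high=(255, 0, 0)):
    """Map a density in percent (0-100) to an RGB triple by linear interpolation.

    `low` is the colour for 0 % and `high` the colour for 100 %
    (hypothetical defaults: white to red).
    """
    # Clamp so that out-of-range inputs still give a valid colour.
    fraction = min(max(density, 0.0), 100.0) / 100.0
    return tuple(round(lo + (hi - lo) * fraction) for lo, hi in zip(low, high))


def rgb_to_hex(rgb):
    """Format an RGB triple as the #rrggbb string accepted by SVG fill attributes."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


if __name__ == "__main__":
    for value in (0, 20, 100):
        print(value, rgb_to_hex(density_to_rgb(value)))
```

Giving each feature its own pair of end colours would correspond to the per-feature colour scales described above.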

Table 1 DensityMap options

The program computes the size of the output image according to the number of chromosomes (GFF files), the number of features to represent, the number of strands to plot and the window size. If the user chooses automatic scale computing, the program calculates a window size that gives an image that lies within the maximum image size defined by the user. The program asks the user to check the output picture size before processing the data. It then builds the image by adding the various graphical elements (background, title, scale) and processes the data for plotting the chromosome strands. It sequentially opens the GFF files and filters the features (third column of the GFF file) selected by the user with the option -ty (types). The intervals are collected, sorted by their start positions and merged to remove overlaps. Lastly, the program computes the densities, (number of bases covered by the feature / window size) × 100, and then draws them within the image. A synopsis of the main algorithm and functions is supplied in Additional file 1 and a manual in Additional file 2.
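The filter, sort, merge and density steps described above can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the DensityMap source: the function names are hypothetical, GFF coordinates are taken to be 1-based and inclusive, and the last window is divided by the full window size exactly as in the formula above, even when the chromosome ends inside it.

```python
def intervals_from_gff(lines, feature_type):
    """Collect (start, end) pairs for features whose type (column 3) matches."""
    intervals = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5 and fields[2] == feature_type:
            intervals.append((int(fields[3]), int(fields[4])))
    return intervals


def merge_intervals(intervals):
    """Sort intervals by start position and merge the overlapping ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


def window_densities(intervals, chrom_length, window_size):
    """Return, for each non-overlapping window, the percentage of bases covered."""
    n_windows = -(-chrom_length // window_size)  # ceiling division
    covered = [0] * n_windows
    for start, end in merge_intervals(intervals):
        # Convert 1-based inclusive coordinates to 0-based half-open ones.
        pos = start - 1
        end = min(end, chrom_length)
        while pos < end:
            index = pos // window_size
            overlap_end = min(end, (index + 1) * window_size)
            covered[index] += overlap_end - pos
            pos = overlap_end
    return [100.0 * count / window_size for count in covered]


if __name__ == "__main__":
    gff = [
        "##gff-version 3",
        "chr1\t.\tgene\t1\t50\t.\t+\t.\tID=g1",
        "chr1\t.\tgene\t41\t100\t.\t+\t.\tID=g2",
        "chr1\t.\texon\t1\t10\t.\t+\t.\tID=e1",
        "chr1\t.\tgene\t151\t175\t.\t-\t.\tID=g3",
    ]
    genes = intervals_from_gff(gff, "gene")
    print(window_densities(genes, chrom_length=200, window_size=100))
```

Merging before counting matters: without it, the 10 bases shared by the first two genes in the example would be counted twice and the first window would exceed 100 %.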

Although the main purpose of DensityMap is to plot whole-genome data, it can be useful to compare specific loci in several sequences. This can be done using the --region_file option. The user has to provide a BED file describing the region of interest on each sequence. This is a tabular file composed of three columns: the first designates the sequence, the second the start position of the region and the third the end position of the region. In addition to the density map, the program produces a CSV file, also in tabular format, that contains the densities computed for all features, windows and sequences.
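To make the two file formats concrete, the Python sketch below reads a three-column region file and writes a density table as CSV. Both functions are hypothetical helpers written for this example; in particular, the CSV column names are an assumption, since the layout of the file written by DensityMap is not described here.

```python
import csv
import io


def read_regions(bed_lines):
    """Parse three-column BED lines into {sequence: (start, end)}."""
    regions = {}
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        seq, start, end = line.split()[:3]
        regions[seq] = (int(start), int(end))
    return regions


def densities_to_csv(rows):
    """Serialize (sequence, feature, window_start, window_end, density) rows.

    The header names are illustrative only.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(
        ["sequence", "feature", "window_start", "window_end", "density_percent"]
    )
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(read_regions(["chr2L\t1000\t5000", "chrX\t200\t900"]))
    print(densities_to_csv([("chr2L", "gene", 1, 100, 42.5)]))
```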

Results

We have used DensityMap to examine two examples based on data on the genome of Drosophila melanogaster (available at http://flybase.org). The first (Fig. 1) illustrates the capacity of DensityMap to represent features that occur very frequently in a genome. It covers the genes, exons, regions coding for ncRNAs and the GC content of the D. melanogaster chromosomes. The image produced shows that genes cover very large regions of the chromosomes, are absent from the centromeres and are less frequent on the Y chromosome. As expected, the distribution of exons agrees with that of the genes. The representation of the GC content shows that the centromeres are GC-poor while the regions covered by genes are GC-enriched. The terminal regions of the X chromosome differ from the rest of the chromosome in that they are very GC-rich. The image also shows that ncRNAs are evenly distributed throughout the chromosomes, except for the centromeres, chromosome Y and a few regions where the ncRNA density is over 10 %.

The second example illustrates the ability of DensityMap to produce images describing features that occur at extreme (high or low) densities. We looked at the distributions and densities of three kinds of transposable elements (TEs): LTR and LINE retrotransposons and rolling-circle transposons. Rolling-circle transposons like helitrons are present in this genome, but they are much less abundant than LTR or LINE retrotransposons. These features were visualized with colour scales that were appropriate for features present at low density (Fig. 2). The default program setting rounds values down using a floor method, which transforms values between 0 and 1 to 0. In this case, however, we selected the ceiling method, which rounds values between 0 and 1 up to 1 so that they are visualized. The densities of the LTR and LINE retrotransposons can also be visualized. Their distributions in the D. melanogaster genome are similar, except that LTRs are very dense in the inner regions of the Y chromosome while most LINEs are present at one end. The TEs in chromosomes 2 and 3 are clustered in the telomeres. A large intra-chromosomal region is devoid of repeated elements. Rolling-circle transposons are concentrated at the ends of chromosomes 2 and 3 and the arms of the Y chromosome. The red windows seem to indicate helitron hotspots. Helitrons are also present in the inner regions of chromosomes but their densities are very low. There are two hotspots of these TEs on the X chromosome, one in each telomere; they are absent from most of the other regions. The density of helitrons in most regions of chromosome 4 is over 10 %.
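The effect of the two rounding methods on low-density windows can be shown with a short Python sketch. The function name `round_density` and its `method` argument are assumptions made for this example and do not correspond to a documented DensityMap option.

```python
import math


def round_density(density, method="floor"):
    """Round a density in percent to an integer step of the colour scale."""
    if method == "floor":
        return math.floor(density)
    if method == "ceil":
        return math.ceil(density)
    raise ValueError("method must be 'floor' or 'ceil'")


if __name__ == "__main__":
    for value in (0.0, 0.3, 12.7):
        print(value, round_density(value, "floor"), round_density(value, "ceil"))
```

With the floor method a window that is 0.3 % covered is drawn like an empty window, whereas the ceiling method moves it to the first non-zero step of the scale; a window with no coverage stays at 0 in both cases.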

Conclusion

The development of sequencing technologies has led to improvements in genome sequence models, which have become better adapted and much more varied. This, in turn, has led to the development of tools for analysing genome models, such as genome browsers. While these tools are most useful for viewing small regions of chromosomes, very few provide an overall view of the complete genome. CViT and PhenoGram provide two solutions, but they also have limitations: they either require non-standard annotation file formats or are not designed to deal with very dense annotation files such as those describing repeated sequences. DensityMap automatically computes the densities of features in a series of windows along chromosomes, and does so for a complete genome. It is very flexible; it can be used to analyse not just very dense annotations but also low-density annotations by applying the computing and graphical options provided. It is also efficient: it plotted density maps of the total repeats (satellites, TEs and simple sequence repeats; 5 295 850 features) of the human genome in 2 min 14 s on a computer equipped with an Intel(R) Xeon(R) W3670 CPU @ 3.20GHz and 16 GB of RAM. DensityMap is very simple to install and run, and so is a good way to obtain a global view of genomic data. To make DensityMap easier to use for people unfamiliar with the Linux command line, we developed a web graphical user interface for online DensityMap analysis.

Availability and requirements

Project name: DensityMap.pl
Project home page: https://github.com/sguizard/DensityMap
Graphical user interface: http://chicken-repeats.inra.fr/launchDM_form.php
Operating system(s): Linux
Programming language: Perl
Other requirements: Perl module GD::SVG
License: GNU GPL v3
Restrictions on its non-academic use: None

Abbreviations

BED: browser extensible data
bp: base pair
CNV: copy number variation
CPU: central processing unit
CSV: comma-separated values
DNA: deoxyribonucleic acid
GFF: generic feature format
LINE: long interspersed nuclear element
LTR: long terminal repeat
Mbp: mega base pair
RAM: random access memory
SNP: single nucleotide polymorphism
SVG: scalable vector graphics
TE: transposable element

References

1. Batley J, Edwards D. Genome sequence data: Management, storage, and visualization. Biotechniques. 2009;46:333–5.
2. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, et al. The Human Genome Browser at UCSC. Genome Res. 2002;12:996–1006.
3. Wang J, Kong L, Gao G, Luo J. A brief introduction to web-based genome browsers. Brief Bioinformatics. 2013;14:131–43.
4. Lee E, Helt G, Reese JT, Munoz-Torres MC, Childers CP, et al. Web Apollo: a web-based genomic annotation editing platform. Genome Biol. 2013;14:R93.
5. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, et al. Circos: An information aesthetic for comparative genomics. Genome Res. 2009;19:1639–45.
6. An J, Lai J, Sajjanhar A, Batra J, Wang C, et al. J-Circos: an interactive Circos plotter. Bioinformatics. 2015;31:1463–5.
7. Pont C, Murat F, Guizard S, Flores R, Foucrier S, et al. Wheat syntenome unveils new evidences of contrasted evolutionary plasticity between paleo- and neoduplicated subgenomes. Plant J. 2013;76:1030–44.
8. Wolfe D, Dudek S, Ritchie MD, Pendergrass S. Visualizing genomic information across chromosomes with PhenoGram. BioData Mining. 2013;6:18.
9. Cannon EKS, Cannon SB. Chromosome visualization tool: a whole genome viewer. Int J Plant Genomics. 2011;2011:373875.

Acknowledgements

We thank Jérome Salse, Hadi Quesneville and Claire Lemaitre for advice and discussions during program development. Dr Owen Parkes edited the English text.

Funding

This work was funded by the Région Centre (AviGeS Project), C.N.R.S., I.N.R.A., the Groupements de Recherche CNRS 3546 (Elements Génétiques Mobiles) and 3604 (Modèles Aviaires), and the Ministère de l’Education Nationale, de la Recherche et de la Technologie.

Author information

Authors and Affiliations

UMR INRA-CNRS 7247, PRC, Centre INRA Val de Loire, 37380, Nouzilly, France
Sébastien Guizard, Benoît Piégu & Yves Bigot

Corresponding author

Correspondence to Yves Bigot.

Additional information

Competing interest

The authors declare that they have no competing interest.

Authors’ contributions

SG developed the DensityMap program. BP and YB helped with program design and publication editing. All authors read and approved the final manuscript.

Authors’ information

Sébastien Guizard holds a doctoral fellowship jointly funded by I.N.R.A. (PHASE department)/Région Centre, and a training grant for the Ecole doctorale “Santé, Sciences Biologiques et Chimie du Vivant” of the University PRES Centre Val de Loire.

Benoît Piégu is a C.N.R.S. engineer based at the INRA Centre, Tours.

Yves Bigot is C.N.R.S. Research Director at the INRA Centre, Tours.

Additional files

Additional file 1:

Synopsis of the main program and functions. (DOCX 6 kb)

Additional file 2:

DensityMap Manual. (DOCX 239 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Guizard, S., Piégu, B. & Bigot, Y. DensityMap: a genome viewer for illustrating the densities of features. BMC Bioinformatics 17, 204 (2016). https://doi.org/10.1186/s12859-016-1055-0

Received: 19 January 2016
Accepted: 13 April 2016
Published: 06 May 2016
DOI: https://doi.org/10.1186/s12859-016-1055-0