jvenn: an interactive Venn diagram viewer
BMC Bioinformatics volume 15, Article number: 293 (2014)
Venn diagrams are commonly used to display list comparisons. In biology, they are widely used to show the differences between gene lists originating from different differential analyses, for instance. They thus allow comparison between different experimental conditions or between different methods. However, when the number of input lists exceeds four, the diagram becomes difficult to read. Alternative layouts and dynamic display features can improve its use and readability.
jvenn is an open source component for web environments helping scientists to analyze their data. The library package, which comes with full documentation and an example, is freely available at http://bioinfo.genotoul.fr/jvenn.
High-throughput biology has led to an increasing amount of data, with more and more complex experimental designs. The analysis of these data often produces lists of biological identifiers, such as gene names or OTUs (Operational Taxonomic Units), obtained from different methods (for differential analysis) or from different experimental conditions. Venn diagrams are a common visualization chart that makes it possible to spot shared and unshared identifiers, providing insight into the similarities between lists.
In a Venn diagram, each list is represented by a transparent shape. Shape overlaps contain the elements shared between lists or, more often, the corresponding counts. In proportional Venn diagrams, the size of a shape is proportional to the number of elements in the corresponding list or list intersection. Venn diagrams with up to four lists are easy to read and understand, but Venn diagrams with more than four lists are much harder to interpret. To solve this problem, the Edwards-Venn representation introduces new shapes that provide a clearer view, as shown in the example of Figure 1.
Many Venn diagram software packages are already available. The first six rows of Table 1 present the main packages with their main features (maximum number of input lists, input data formats, Venn diagram layouts, application types and output formats). The table gives insight into several aspects of Venn diagram production and highlights that, up to now, no web application has handled up to six lists. VENNTURE is the only application able to produce such diagrams, but it only implements the Edwards layout and runs only under MS-Windows OS, producing static MS-PowerPoint and MS-Excel files. Proportional Venn diagrams can only display a very limited number of lists, three at most. The only feature available in other software but not in jvenn is the proportional diagram. This is justified by the fact that jvenn was designed to display up to six lists and that the proportional diagram is not suited to visualizing more than three lists.
This section presents the main features of the jvenn library, including the kind of inputs it accepts, the different types of charts it displays, the types of the outputs and how it can be integrated in websites or directly used on our example web page.
The jvenn library accepts three different input formats: “Lists”, “Intersection counts” and “Count lists”. Examples are presented in Table 2, where the lists are named “sample1” and “sample2” and the elements of each list are given in the “data” field. For “Intersection counts”, each list is given a label (“A” or “B”) which is used to match the list with its count. Finally, “Count lists” provide a count for each element of a list. Hence, with “Count lists”, the figures presented in the diagram correspond to the sums of the counts of all elements shared between lists. They can be particularly useful to present OTU read counts. For “Lists” and “Count lists”, jvenn computes the intersection counts and displays the chart. For “Intersection counts”, the intersection counts are provided by the user.
Venn diagrams are commonly used to present up to six lists, but for six lists the intersection areas obtained when using a proportional layout are often too small to display the figures. To display five- or six-list diagrams in a user-friendly manner, jvenn implements several features. First, the layout can be switched between the standard layout and the Edwards-Venn layout (Figure 1), which gives a clearer graphical representation for six-list diagrams. To enhance the readability of the classical six-list Venn chart, some count values are not shown and some are displayed outside the chart, with lines linking each count to its corresponding area. However, this is still not enough to show all figures. Therefore, a switch button panel (right side of Figure 2) was added. It makes it possible to switch the different lists on and off and to display the corresponding intersection counts. When the number of characters in an intersection count exceeds the space available to display it, the value is replaced by a question mark. When the mouse is moved over this question mark, the value pops up. To emphasize the lists involved in an intersection area, jvenn highlights the intersecting shapes when the mouse is moved over the area, fading the others out.
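To make the difference between the “Lists” and “Count lists” formats concrete, the following Python sketch computes the figure shown in each region of a two-list diagram. It is an illustration only and does not reproduce jvenn's code or API: the function `region_counts`, the `values` key and the OTU identifiers are hypothetical, and summing an element's counts across all the lists that contain it is an assumed reading of “sums of counts of all elements shared between lists”.

```python
from collections import defaultdict


def region_counts(series, weighted=False):
    """Map each diagram region (a frozenset of list names) to its figure.

    Each entry of `series` has a "name", a "data" list of identifiers and,
    for count lists, a "values" list of per-identifier counts (assumed key).
    """
    membership = defaultdict(set)   # identifier -> names of the lists containing it
    weight = defaultdict(int)       # identifier -> counts summed over all lists
    for entry in series:
        values = entry.get("values", [1] * len(entry["data"]))
        for identifier, value in zip(entry["data"], values):
            membership[identifier].add(entry["name"])
            weight[identifier] += value
    figures = defaultdict(int)
    for identifier, names in membership.items():
        # Plain lists count each identifier once; count lists add up its counts.
        figures[frozenset(names)] += weight[identifier] if weighted else 1
    return dict(figures)


# Hypothetical input in the spirit of the "Lists" format.
plain = [
    {"name": "sample1", "data": ["Otu1", "Otu2", "Otu3"]},
    {"name": "sample2", "data": ["Otu2", "Otu3", "Otu4", "Otu5"]},
]
# Hypothetical input in the spirit of the "Count lists" format.
counted = [
    {"name": "sample1", "data": ["Otu1", "Otu2", "Otu3"], "values": [5, 10, 1]},
    {"name": "sample2", "data": ["Otu2", "Otu3", "Otu4", "Otu5"], "values": [3, 2, 7, 4]},
]

both = frozenset({"sample1", "sample2"})
print(region_counts(plain)[both])                   # 2 shared identifiers
print(region_counts(counted, weighted=True)[both])  # 16 = (10 + 3) + (1 + 2)
```

With “Intersection counts”, no such computation is needed, since the user supplies the figure of each region directly.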
The extra charts presented under the Venn diagram ease the verification and comparison of multiple lists. The list size graph allows users to check the homogeneity of the input list sizes. The intersection size graph can be used to compare the compactness of multiple Venn diagrams.
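The quantities behind these two charts can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not jvenn's implementation: the function names and the gene identifiers are hypothetical, and the intersection size graph is assumed to report how many identifiers occur in exactly one, two, ... or all of the lists.

```python
from collections import Counter


def list_sizes(lists):
    """Number of distinct identifiers in each input list."""
    return {name: len(set(items)) for name, items in lists.items()}


def sharing_profile(lists):
    """Number of identifiers found in exactly 1, 2, ..., n of the lists."""
    occurrences = Counter()
    for items in lists.values():
        occurrences.update(set(items))      # each list counts once per identifier
    profile = Counter(occurrences.values())
    return {k: profile.get(k, 0) for k in range(1, len(lists) + 1)}


# Hypothetical gene lists.
example = {
    "A": ["g1", "g2", "g3"],
    "B": ["g2", "g3", "g4"],
    "C": ["g3", "g5"],
}

print(list_sizes(example))       # {'A': 3, 'B': 3, 'C': 2}
print(sharing_profile(example))  # {1: 3, 2: 1, 3: 1}
```

A profile dominated by its highest keys indicates that most identifiers are shared by most lists, which is what makes the profiles of two diagrams directly comparable.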
Scientists are usually interested in extracting identifier lists for some intersections. Therefore, jvenn implements a one-click function which retrieves the names of the corresponding sets and the identifiers. To find an identifier, one can use a dynamic search box; the shapes containing the matching identifiers are highlighted when this tool is used.
jvenn can also be directly used as a web application, which is available at http://bioinfo.genotoul.fr/jvenn/example.html (Figure 3). The performance of the jvenn web application depends on the client browser. Using the current version on a standard Linux computer (one CPU, 4 GB of RAM), it displays a six-list diagram of 10,000 identifiers in two seconds.
M.A. Dillies and colleagues have compared seven methods for normalization and search for differentially expressed genes in RNA-seq data. Their study is designed to provide a set of best practices to help biologists with their data processing. Table 2 of the cited article is the contingency table of the differentially expressed genes obtained from the seven methods, where each count in the table corresponds to the intersection of two lists obtained from two different methods. The raw data table, kindly provided by the team, contains 5,277 lines and seven columns. The columns correspond to the different methods presented in the “Differential expression analysis” section of their article. The data in the table were filtered (p < 0.05) to retrieve the gene name lists corresponding to each method. As jvenn handles six lists at most, six of the seven lists were selected for further processing: we left out the median normalization method because, on the one hand, this method is very similar to several other methods (as shown in the article) and, on the other hand, we believe that the median is a poor estimate of the sequencing length, which is the bias that normalization methods try to correct. The lists were uploaded to the jvenn application and a Venn diagram using the Edwards layout was obtained, which is shown in Figure 1.
The same analysis was performed with VENNTURE, the only other tool able to generate a six-list Edwards Venn diagram. First, the software package was installed on a computer running under MS-Windows OS. The six gene lists were loaded into an MS-Excel spreadsheet and VENNTURE was run using the spreadsheet as input, generating a static MS-PowerPoint file containing the diagram and an MS-Excel file with all the intersection contents.
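The logic behind these two functions can be sketched as follows in Python. This is an illustration under stated assumptions rather than jvenn's actual code: `members` and `search` are hypothetical names, the identifiers are made up, and the case-insensitive substring matching used for the search is an assumption.

```python
def members(lists, names):
    """Identifiers found in every list of `names` (non-empty) and in no other list."""
    names = set(names)
    inside = set.intersection(*(set(lists[n]) for n in names))
    outside = set().union(*(set(items) for n, items in lists.items() if n not in names))
    return sorted(inside - outside)


def search(lists, query):
    """Map each identifier matching `query` to the set of lists that contain it."""
    query = query.lower()
    hits = {}
    for name, items in lists.items():
        for identifier in items:
            if query in identifier.lower():
                hits.setdefault(identifier, set()).add(name)
    return hits


# Hypothetical gene lists named after three of the compared methods.
gene_lists = {
    "FQ": ["G1", "G2", "G3"],
    "UQ": ["G2", "G3", "G4"],
    "TMM": ["G3", "G5"],
}

print(members(gene_lists, ["FQ", "UQ"]))  # ['G2']: G3 is excluded because TMM also has it
print(search(gene_lists, "g2"))           # G2 maps to the lists FQ and UQ
```

The set of list names returned by `search` identifies the region to highlight, which corresponds to the shapes highlighted in the diagram.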
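The filtering step can be pictured with the following Python sketch. It assumes that each cell of the raw table holds the p-value obtained for one gene with one method; the table layout, the gene names, the column label “Med” for the median method and the function `significant_lists` are all hypothetical and only serve the illustration.

```python
def significant_lists(table, methods, alpha=0.05):
    """Build one gene list per method: genes whose p-value is strictly below alpha."""
    return {
        method: [gene for gene, row in table.items() if row[method] < alpha]
        for method in methods
    }


# Hypothetical excerpt: one row per gene, one p-value per method.
raw_table = {
    "geneA": {"DESeq": 0.01, "TMM": 0.20, "Med": 0.03},
    "geneB": {"DESeq": 0.04, "TMM": 0.049, "Med": 0.50},
    "geneC": {"DESeq": 0.30, "TMM": 0.05, "Med": 0.001},
}

# The median method is left out, as in the analysis described above.
kept_methods = ["DESeq", "TMM"]
gene_name_lists = significant_lists(raw_table, kept_methods)
print(gene_name_lists)  # {'DESeq': ['geneA', 'geneB'], 'TMM': ['geneB']}
```

Note that a gene with a p-value of exactly 0.05 is not retained, since the threshold is strict.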
The list overlaps, as produced by jvenn, are given in Figure 1 (Edwards layout) and Figure 2 (standard layout). The highest counts are located in the central areas of the graph, showing that the corresponding methods share large portions of their gene lists. The jvenn statistics show that the different methods produce gene lists of very different sizes (minimum 417, maximum 1,249) and that most of the genes are shared between methods: 1,069 genes out of 1,347 are common to at least four methods. In a very intuitive manner, the chart also points out that the results are strongly consensual, since there are many zeros in the peripheral areas. Only a few genes (114) are specific to a single list (24 for FQ, 27 for UQ and 63 for DESeq, which appears to be the least restrictive method, as shown in the barplot below the Venn diagram, and also the most different from the others). Genes that are in two lists only are also very few (47: 13 for DESeq and TMM, 5 for UQ and FQ, 15 for TMM and UQ, 8 for FQ and DESeq and 6 for DESeq and UQ). Note that all these numbers are easily read from the chart and that the strong consensus between the lists is also clearly shown by the upper-side figure (“Number of elements: specific or shared by several lists”). Such findings are not easily shown using only contingency tables.
The largest count over all list overlaps is 484, which is the number of genes found to be differentially expressed by DESeq, TMM, UQ and FQ. As shown in Figure 3, this list is very easily retrieved from the web application in a single click, providing the biologist with a large list of highly consensual genes to study.
On the other hand, if the biologist is interested in one specific gene, this gene can easily be tracked using the search box at the top of Figure 3. As no specific gene is of interest in the seminal work, we simply picked one of the 5,277 genes at random (G002562) and entered it in the search box. It was found to be one of the five genes specific to FQ and UQ.
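As a quick consistency check, the totals quoted in this section can be recomputed from the individual counts. The short Python sketch below only uses numbers stated in the text; the variable names are ours.

```python
# Counts quoted in the text for genes specific to a single list.
single_list = {"FQ": 24, "UQ": 27, "DESeq": 63}

# Counts quoted in the text for genes found in exactly two lists.
two_lists = {
    ("DESeq", "TMM"): 13,
    ("UQ", "FQ"): 5,
    ("TMM", "UQ"): 15,
    ("FQ", "DESeq"): 8,
    ("DESeq", "UQ"): 6,
}

total_genes = 1347
shared_by_four_or_more = 1069

specific_total = sum(single_list.values())
pair_total = sum(two_lists.values())
consensus_share = round(100 * shared_by_four_or_more / total_genes, 1)

print(specific_total)   # 114
print(pair_total)       # 47
print(consensus_share)  # 79.4 (percent of genes common to at least four methods)
```

Both sums match the totals given above, and about four genes out of five are shared by at least four methods.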
Performing the same analysis with VENNTURE is also possible but a bit harder: the 484 genes shared by DESeq, TMM, UQ and FQ can easily be found in the intersection spreadsheet output by VENNTURE, but the diagram does not allow searching for gene G002562. Thus, this gene has to be found using the MS-Excel text search in the intersection spreadsheet, which is less handy than a dynamic and interactive search. Moreover, the additional statistics are not provided by the tool.
jvenn makes it possible to compare up to six lists and updates the diagram automatically when the content of the lists is modified. Compared to VENNTURE, it does not require the local installation of a new program and it gives access to a dynamic diagram providing simple tools to extract gene lists and perform searches. jvenn’s statistics charts give a simple and quick overview of the sizes of the different lists and of their overlaps, and make it possible to compare different Venn diagrams. These features are not available in the VENNTURE software package.
For biologists using different techniques in their experiments or in their statistical analyses, jvenn makes it possible to quickly extract the shared identifiers. When comparing different methods applied to extract differentially expressed genes, these features ease the analysis.
Availability and requirements
Project name: jvenn
Project home page: http://bioinfo.genotoul.fr/jvenn
Project demo site: http://bioinfo.genotoul.fr/jvenn/example.html
Operating system(s): Platform independent
Other requirements: Web browser
License: GNU GPL
Any restrictions to use by non-academics: GNU GPL
Venn J: On the diagrammatic and mechanical representation of propositions and reasonings. Philos Mag J Sci. 1880, 9: 1-18. 10.1080/14786448008626791.
Edwards AWF: Cogwheels of the Mind: The Story of Venn Diagrams. 2004, Baltimore: Johns Hopkins University Press
Martin B, Chadwick W, Yi T, Park S-S, Lu D, Ni B, Gadkaree S, Farhang K, Becker KG, Maudsley S: Vennture-a novel venn diagram investigational tool for multiple pharmacological dataset analysis. PLoS ONE. 2012, 7 (5): e36911-10.1371/journal.pone.0036911.
Chen H: VennDiagram: Generate High-resolution Venn and Euler Plots. 2013, [http://cran.r-project.org/web/packages/VennDiagram/index.html]
Hulsen T, de Vlieg J, Alkema W: Biovenn a web application for the comparison and visualization of biological lists using area-proportional venn diagrams. BMC Genomics. 2008, 9: 488-10.1186/1471-2164-9-488.
Oliveros J: An Interactive Tool for Comparing Lists with Venn Diagrams. 2007, [http://bioinfogp.cnb.csic.es/tools/venny/index.html]
The Canvasxpress Venn Diagram Functionalities. [http://canvasxpress.org/venn.html]
The Google Chart API. [https://developers.google.com/chart/]
Bianchia L, Gagliardi A, Campanella G, Landi C, Capaldo A, Carleo A, Armini A, Leo VD, Piomboni P, Focarelli R, Bini L: A methodological and functional proteomic approach of human follicular fluid en route for oocyte quality evaluation. J Proteomics. 2013, 90: 61-76.
Aravindraja C, Viszwapriya D, Pandian SK: Ultradeep 16s rrna sequencing analysis of geographically similar but diverse unexplored marine samples reveal varied bacterial community composition. PLOS one. 2013, 8 (10): e76724-10.1371/journal.pone.0076724.
Mariette J, Escudie F, Allias N, Salin G, Noirot C, Thomas S, Klopp C: Ng6: Integrated next generation sequencing storage and processing environment. BMC Genomics. 2012, 13: 462-10.1186/1471-2164-13-462.
Mariette J, Noirot C, Nabihoudine I, Bardou P, Hoede C, Djari A, Cabau C, Klopp C: RNAbrowse: RNA-seq de Novo Assembly Results Browser. PLOS one. 2014, 9 (5): e96821-10.1371/journal.pone.0096821.
Clemente HS, Jamet E: Wallprotdb, a database resource for plant cell wall proteomics. [http://www.polebio.lrsv.ups-tlse.fr/WallProtDB/]
Oscar W, Mitchell S, Ian H: Visualizing next-generation sequencing data with jbrowse. Brief Bioinform. 2013, 14 (2): 172-177. 10.1093/bib/bbr078. [http://bib.oxfordjournals.org/content/14/2/172.full.pdf+html]
Lopes C, Franz M, Kazi F, Donaldson S, Morris Q, Bader G: Cytoscape web: an interactive web-based network browser. Bioinformatics. 2010, 26 (18): 2347-2348. 10.1093/bioinformatics/btq430.
Deu-Pons J, Schroeder MP, Lopez-Bigas N: jheatmap: an interactive heatmap viewer for the web. Bioinformatics. 2014, 30 (12): 1757-1758. 10.1093/bioinformatics/btu094.
Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J, Guernec G, Jagla B, Jouneau L, Laloë D, Gall CL, Schaëffer B, Crom SL, Guedj M, Jaffrézic F, The French StatOmique Consortium: A comprehensive evaluation of normalization methods for illumina high-throughput rna sequencing data analysis. Brief Bioinform. 2012, 14 (6): 671-683.
We would like to acknowledge all our users for providing us useful feedback on the system and for pointing out features worth developing. We thank the reviewers and Nathalie Villa-Vialaneix for their insightful and constructive comments. We also thank Julie Aubert and the French StatOmique Consortium for providing us the data used in the “Results” section.
The authors declare that they have no competing interests.
JM conceived and designed the project. JM, PB, FE and CD implemented the project. CK evaluated software capabilities, and provided feedback on implementation. JM and CK wrote the manuscript. All authors read and approved the final manuscript.
Philippe Bardou and Jérôme Mariette contributed equally to this work.