Intervene: a tool for intersection and visualization of multiple gene or genomic region sets

Background: A common task for scientists is comparing lists of genes or genomic regions derived from high-throughput sequencing experiments. While several tools exist to intersect and visualize sets of genes, similar tools dedicated to the visualization of genomic region sets are currently limited.
Results: To address this gap, we have developed the Intervene tool, which provides an easy and automated interface for the effective intersection and visualization of genomic region or list sets, thus facilitating their analysis and interpretation. Intervene contains three modules: venn to generate Venn diagrams of up to six sets, upset to generate UpSet plots of multiple sets, and pairwise to compute and visualize intersections of multiple sets as clustered heat maps. Intervene and its interactive web companion (a Shiny app) generate publication-quality figures for the interpretation of genomic region and list sets.
Conclusions: Intervene and its web application companion provide an easy command line and an interactive web interface to compute intersections of multiple genomic and list sets. They can plot intersections using easy-to-interpret visual approaches. Intervene is designed to meet the needs of both computer scientists and biologists. The source code is freely available at https://bitbucket.org/CBGR/intervene, with the web application available at https://asntech.shinyapps.io/intervene.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1708-7) contains supplementary material, which is available to authorized users.


CHAPTER 1 Introduction
Intervene is a tool for intersection and visualization of multiple genomic region and gene sets (or lists of items).
Intervene provides an easy and automated interface for effective intersection and visualization of genomic region sets or lists of items, thus facilitating their analysis and interpretation. Intervene contains three modules:
• venn to compute Venn diagrams of up to 6 sets
• upset to compute UpSet plots of multiple sets
• pairwise to compute and visualize intersections of genomic sets as a clustered heatmap
Intervene gives users the flexibility to choose figure colors, labels, size, quality, and file type, so that figures meet publication standards.

Prerequisites
Intervene requires the following Python modules and R packages:

Install BEDTools
Intervene uses pybedtools, a Python wrapper for BEDTools, so BEDTools must be installed before using Intervene. We recommend installing the latest version of the tool. Please read the installation instructions at https://github.com/arq5x/bedtools2 to install BEDTools, and make sure it is accessible through your PATH variable.
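Before running Intervene, you can confirm from Python that BEDTools is reachable through your PATH. The sketch below is our own illustration, not part of Intervene: the helper name `bedtools_version` is hypothetical, and it only assumes that `bedtools --version` prints a version string.

```python
import shutil
import subprocess


def bedtools_version():
    """Return the output of `bedtools --version`, or None if bedtools is not on PATH."""
    exe = shutil.which("bedtools")  # searches the directories listed in PATH
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = bedtools_version()
    if version is None:
        print("bedtools was not found on PATH; please install BEDTools first.")
    else:
        print("Found", version)
```

If this reports that bedtools is missing even though it is installed, the directory containing the binary is probably not listed in PATH for the shell that launches Python.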

Install required Python modules
Intervene takes care of the installation of all the required Python modules. If you already have a working installation of Python, the easiest way to install the required Python modules is to install Intervene using pip. If you're setting up Python for the first time, we recommend installing it using the Anaconda Python distribution (http://continuum.io/downloads). It comes with several helpful scientific and data processing libraries and is available for Windows, Mac OS X and Linux.
If you want to install the required Python modules manually, you can use the following commands.

Install Pandas
Install it from PyPI: pip install pandas
Or install it with conda: conda install pandas
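After installing the Python modules, a quick standard-library check can report which ones are still missing. In this minimal sketch the helper `missing_modules` is hypothetical, and the module list contains only the two modules named in this guide (pybedtools and pandas); it is not Intervene's full dependency list.

```python
import importlib.util

# Only the modules named in this guide; Intervene's full dependency
# list is defined by its own installation files (assumption).
REQUIRED_MODULES = ["pybedtools", "pandas"]


def missing_modules(names):
    """Return the top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Try: pip install " + " ".join(missing))
    else:
        print("All listed modules are importable.")
```

`find_spec` only locates a module without importing it, so the check is fast and has no side effects.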

Install required R packages
Intervene requires three R packages: UpSetR and corrplot for visualization, and Cairo to generate high-quality vector and bitmap figures. To install these, open R/RStudio and use the following command:
install.packages(c("UpSetR", "corrplot", "Cairo"))

Install Intervene
You can install a stable version of Intervene from PyPI using pip, or a development version using git from our Bitbucket repository at https://bitbucket.org/CBGR/intervene.

Install using pip
You can install Intervene either from PyPI using pip or from the source. Please make sure you have already installed the Python libraries mentioned above, which are required to run Intervene.
To run Intervene from the source directory on the bundled example data, type:
./intervene/intervene venn -i intervene/example_data/ENCODE_hESC/*.bed
./intervene/intervene upset -i intervene/example_data/ENCODE_hESC/*.bed
./intervene/intervene pairwise -i intervene/example_data/dbSUPER_mm9/*.bed
These subcommands will save the results in a folder named Intervene_results in the current working directory. If you wish to save the results in a specific folder, you can type:
intervene <module_name> --test --output ~/path/to/your/results/folder

CHAPTER 4 Intervene modules
Intervene provides three types of plots to visualize intersections of genomic region and list sets: pairwise heatmaps of N genomic region sets, classic Venn diagrams of up to six genomic region or list sets, and UpSet plots.
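As background for the modules below, two genomic regions on the same chromosome intersect when their coordinate ranges share at least one base. The following sketch assumes BED's 0-based, half-open coordinates and counts the regions of one set that overlap any region of another. It is an illustration only: Intervene delegates the interval arithmetic to BEDTools through pybedtools, and the function names here are hypothetical.

```python
from collections import defaultdict


def parse_bed(lines):
    """Read (chrom, start, end) from BED lines, skipping headers and blanks."""
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def count_overlapping(a, b):
    """Count regions in `a` that share at least one base with a region in `b`.

    BED intervals are half-open [start, end), so [100, 200) and [200, 300)
    are adjacent but do not overlap.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in b:
        by_chrom[chrom].append((start, end))
    hits = 0
    for chrom, start, end in a:
        # Linear scan for clarity; a production tool would sort or index intervals.
        if any(start < b_end and b_start < end for b_start, b_end in by_chrom[chrom]):
            hits += 1
    return hits
```

For example, with `a` holding chr1:100-200, chr1:500-600 and chr2:0-50, and `b` holding chr1:150-300 and chr2:50-80, only the first region of `a` overlaps `b`; chr2:0-50 merely touches chr2:50-80.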

Venn diagram module
Once you have installed Intervene, you can type:
Usage: intervene venn [options]
Note: A detailed summary of available options is given below.
Help: intervene venn --help
Example: intervene venn -i path/to/BED/files/*.bed
This will save the results in a folder named Intervene_results in the current working directory. If you wish to save the results in a specific folder, you can type:
intervene venn -i path/to/BED/files/*.bed --output ~/results/path

Summary of options
--type {genomic,list} Type of input sets: genomic regions or lists of genes/SNPs. Default is genomic.
--names Comma-separated list of names used as labels for the input files. If it is not set, file names will be used as labels. For example: --names=A,B,C,D,E,F
--filenames Use file names as labels instead. Default is False.
--colors Comma-separated list of matplotlib-valid colors. E.g., --colors=r,b,k
-o, --output Output folder path where results will be stored. Default is current working directory.
--test Run the program on test data.
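Each region of a Venn diagram counts the items that lie inside one particular combination of sets and outside all the others, so N sets produce 2^N - 1 regions (63 for the six-set maximum). The sketch below computes these counts for list sets. It is not Intervene's code, and the bit-string keys are our own labelling convention; for genomic region sets the counts also depend on how overlapping intervals are merged, which this sketch does not handle.

```python
from itertools import product


def venn_regions(sets):
    """Return the size of every exclusive Venn region.

    Keys are bit strings: position i is '1' if the region lies inside
    sets[i] and '0' if it lies outside it. The all-zero key is skipped,
    so N sets give 2**N - 1 regions.
    """
    universe = set().union(*sets)
    regions = {}
    for bits in product("01", repeat=len(sets)):
        key = "".join(bits)
        if "1" not in key:
            continue  # items outside every set are not drawn
        members = {
            item for item in universe
            if all((item in s) == (bit == "1") for s, bit in zip(sets, key))
        }
        regions[key] = len(members)
    return regions
```

With A = {TP53, MYC, EGFR}, B = {MYC, EGFR, KRAS} and C = {EGFR, BRCA1}, the key "111" counts EGFR alone and "110" counts MYC alone, and the seven region sizes add up to the five distinct genes.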

UpSet plot module
Once you have installed Intervene, you can type:
Usage: intervene upset [options]
Note: A detailed summary of available options is given below.
Help: You can also see the list of options by typing the following in the terminal:
intervene upset --help
Example: intervene upset -i path/to/BED/files/*.bed
This will save the results in a folder named Intervene_results in the current working directory. If you wish to save the results in a specific folder, you can type:
intervene upset -i path/to/BED/files/*.bed --output ~/results/path
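An UpSet plot shows the same exclusive intersections as a Venn diagram, but as a bar chart sorted by size, which stays readable for many sets. The sketch below derives those bars from named list sets by grouping each element under the exact combination of sets that contain it. The function name and the `min_size` filter are hypothetical; the actual plotting is done by the UpSetR package, and this code is not part of it.

```python
from collections import Counter


def upset_bars(named_sets, min_size=1):
    """Exclusive intersections sorted largest first, as (set_names, size) pairs.

    Every element is counted exactly once, under the tuple of names of
    all sets that contain it. Ties are broken by fewer sets, then by name.
    """
    counts = Counter()
    for element in set().union(*named_sets.values()):
        combo = tuple(name for name, members in named_sets.items() if element in members)
        counts[combo] += 1
    bars = [(combo, n) for combo, n in counts.items() if n >= min_size]
    bars.sort(key=lambda bar: (-bar[1], len(bar[0]), bar[0]))
    return bars
```

Because each element lands in exactly one bar, the bar heights always sum to the number of distinct elements, which is a handy sanity check on the input.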

Pairwise intersection module
Once you have installed Intervene, you can type:
Usage: intervene pairwise [options]
Note: A detailed summary of available options is given below.
Help: intervene pairwise --help
Example: intervene pairwise -i path/to/BED/files/*.bed --type jaccard --htype tribar
This will save the results in a folder named Intervene_results in the current working directory. If you wish to save the results in a specific folder, you can type:
intervene pairwise -i path/to/BED/files/*.bed --type jaccard --htype tribar --output ~/results/path
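With --type jaccard, the pairwise module is expected to report a Jaccard statistic for each pair of region sets. Assuming the base-pair definition used by BEDTools' jaccard tool (length of the intersection divided by length of the union, after merging overlapping intervals within each set), the computation can be sketched as follows. The region-set layout (a dict from chromosome to (start, end) tuples) and all function names are assumptions made for illustration.

```python
def merge(intervals):
    """Merge overlapping or book-ended half-open intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def intersect_bp(a, b):
    """Bases shared by two sorted, merged interval lists (two-pointer sweep)."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        if a[i][1] < b[j][1]:  # advance whichever interval ends first
            i += 1
        else:
            j += 1
    return total


def jaccard(a, b):
    """bp-level Jaccard |A and B| / |A or B| summed over all chromosomes."""
    inter = size_a = size_b = 0
    for chrom in set(a) | set(b):
        ma, mb = merge(a.get(chrom, [])), merge(b.get(chrom, []))
        size_a += sum(end - start for start, end in ma)
        size_b += sum(end - start for start, end in mb)
        inter += intersect_bp(ma, mb)
    union = size_a + size_b - inter
    return inter / union if union else 0.0


def pairwise_jaccard(named):
    """Symmetric matrix of Jaccard values, ready to be clustered and drawn."""
    names = list(named)
    return names, [[jaccard(named[x], named[y]) for y in names] for x in names]
```

For A = chr1:0-100 plus chr1:200-300 and B = chr1:50-250, the sets share 100 bp out of a 300 bp union, giving J = 1/3; every diagonal entry of the matrix is 1.0.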

Summary of options
To list all available options, type: intervene pairwise --help

Pairwise module examples
In this example, we performed pairwise intersections of super-enhancers in 24 mouse cell and tissue types from dbSUPER (Khan and Zhang, 2016) and showed the fraction of overlap as a heatmap.
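A fraction-of-overlap heatmap is generally not symmetric: the share of set A's regions that touch set B usually differs from the share of B's regions that touch A. The sketch below builds such a matrix under one plausible definition, the fraction of regions in the row set that overlap at least one region in the column set. We have not confirmed that this is exactly the statistic Intervene plots, the function names are hypothetical, and the clustering of rows and columns is omitted.

```python
from collections import defaultdict


def overlaps_any(region, index):
    """True if a (chrom, start, end) region shares a base with any indexed interval."""
    chrom, start, end = region
    return any(start < e and s < end for s, e in index.get(chrom, ()))


def fraction_matrix(named):
    """M[i][j] = fraction of regions in set i that overlap >= 1 region in set j."""
    names = list(named)
    indexes = {}
    for name in names:
        idx = defaultdict(list)  # chromosome -> list of (start, end)
        for chrom, start, end in named[name]:
            idx[chrom].append((start, end))
        indexes[name] = idx
    matrix = []
    for x in names:
        regions = named[x]
        row = []
        for y in names:
            hits = sum(overlaps_any(r, indexes[y]) for r in regions)
            row.append(hits / len(regions) if regions else 0.0)
        matrix.append(row)
    return names, matrix
```

Reading the matrix row by row answers "how much of this cell type's super-enhancer set is shared with each other cell type", which is why the two triangles of the heatmap can differ.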

Usage instructions
To use this module, upload a correctly formatted .csv or text file, encoded in binary. Before uploading the file, choose the correct separator: if the names in each column are separated by a comma (','), choose comma; by a semicolon (';'), choose semicolon; or by tabs, choose tab. Header names (first row) will be used as set names.
The UpSet module accepts three types of input.

List type data
List data is a correctly formatted csv/text file containing lists of names. Each column represents a set, and each row represents an element (e.g., a gene name or SNP). Header names (first row) will be used as set names.
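Reading a list-type file amounts to splitting each row on the chosen separator and collecting the non-empty cells of each column under its header. The sketch below shows this with Python's csv module; the separator names mirror the comma, semicolon and tab choices described above, but `read_list_sets` and `SEPARATORS` are hypothetical and not part of the web application.

```python
import csv
import io

# Hypothetical mapping from the separator choices to delimiter characters.
SEPARATORS = {"comma": ",", "semicolon": ";", "tab": "\t"}


def read_list_sets(text, separator="comma"):
    """Parse list-type input: one column per set, header row = set names."""
    rows = list(csv.reader(io.StringIO(text), delimiter=SEPARATORS[separator]))
    if not rows:
        return {}
    header, body = [name.strip() for name in rows[0]], rows[1:]
    sets = {name: set() for name in header}
    for row in body:
        for name, value in zip(header, row):
            if value.strip():  # shorter columns are padded with empty cells
                sets[name].add(value.strip())
    return sets
```

Columns may have different lengths; the empty cells that pad the shorter columns are simply skipped.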

Binary type data
In the binary input file, each column represents a set, and each row represents an element. If a name is in the set, it is represented as a 1; otherwise, it is represented as a 0.
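List-type and binary-type inputs carry the same information, so one can be converted into the other. The sketch below assumes that the first column of a binary file holds the element names (a layout the text above does not specify) and that every other column holds 0/1 membership flags; both helper functions are hypothetical.

```python
def sets_to_binary(named_sets):
    """Convert {set_name: elements} into a header and sorted 0/1 membership rows."""
    names = list(named_sets)
    elements = sorted(set().union(*named_sets.values()))
    header = ["Name"] + names  # assumed first column: element name
    rows = [[e] + [int(e in named_sets[n]) for n in names] for e in elements]
    return header, rows


def binary_to_sets(header, rows):
    """Inverse conversion; flags may be ints or strings such as '1' read from a CSV."""
    names = header[1:]
    sets = {n: set() for n in names}
    for row in rows:
        element, flags = row[0], row[1:]
        for n, flag in zip(names, flags):
            if int(flag) == 1:
                sets[n].add(element)
    return sets
```

A round trip through both functions returns the original sets, which makes it easy to check a hand-edited binary file against its list-type source.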