Comparative analysis of ChIP-exo peak-callers: impact of data quality, read duplication and binding subtypes

Sharma, Vasudha; Majumdar, Sharmistha

doi:10.1186/s12859-020-3403-3

BMC Bioinformatics

Table 1 Peak-callers used for comparison in this study along with their key features and output formats


Tool	Key feature	Output
MACS, 2008 [20]	1. Uses the bimodal distribution of reads to model fragment length 2. Uses a dynamic Poisson distribution to compare test and control samples	1. Peak position 2. p-value (based on pileup height at the peak summit) and q-value (against a random Poisson distribution with local lambda)
GEM, 2012 [15]	1. Uses a generative probabilistic model to assign positions to the reads after each iteration 2. Reciprocally links binding event discovery and motif discovery 3. Resolves closely spaced binding events	1. Binding events file (location, IP strength, fold enrichment, p-value from the binomial test when control data are available or from the Poisson test when they are not, divergence of the IP reads from the empirical read distribution, fraction of noise, k-mer group, p-value associated with the k-mer, and strand) 2. Motif files 3. K-mer set memory motifs 4. HTML output 5. Read distribution file 6. Spatial distribution between primary and secondary motifs
Peakzilla, 2013 [18]	1. Estimates all parameters from the data itself 2. Uses the bimodal distribution of reads to calculate fragment length and predict binding sites 3. Resolves closely spaced binding events	1. Peak file with exact position, summit, score (based on how well the read distribution in a peak fits the bimodal tag distribution, and on a chi-square test), FDR and fold enrichment 2. Negative peaks in the presence of a control
Genetrack, 2008 [16]	1. Rapid data smoothing using Gaussian smoothing 2. Peak detection by selecting the highest peak at a local maximum, with an exclusion zone of up to a few hundred bp 3. Combines strand information in a composite value 4. Requires manual pairing of border peaks	1. GFF file with chromosome, peak exclusion zone, tag sum, strand information and standard deviation of reads in the peak exclusion zone
MACE, 2014 [17]	1. Normalizes sequencing data and corrects it for biases 2. Improves the signal-to-noise ratio by reducing noise 3. Detects border peaks using the Chebyshev inequality and pairs them using the Gale-Shapley stable matching algorithm	1. BED file containing border pairs of the binding event, the method for detecting each border pair and the corresponding p-value (composite p-value of the two borders in a pair)
Exoprofiler, 2015 [10]	1. Useful to detect different types of footprints 2. The peaks are scanned against the motif database to find the highest scoring peaks 3. High scoring peaks are then used to calculate 5′ ChIP-exo coverage of reads relative to the TFBS center to find the protein-DNA crosslink boundaries	1. Heat map of 5′ ChIP-exo coverage 2. Footprint profile of 5′ coverage of all reads 3. Footprint profile of the 5′ coverage of reads on both strands matching the scanned motif (output of motif permutation)
ChExMix, 2018 [11]	1. Probabilistic mixture model for characterizing different modes of DNA-protein interactions 2. Expectation Maximization (EM) algorithm for estimating binding subtype probability for each binding event	1. Event subtype file (reports total read count, signal fraction, binding coordinate, fold enrichment, event subtype, binding sequence, log₂ p-value (log likelihood score of subtype specificity for a motif hit)) 2. Motif file 3. Peak-peak distance histogram 4. Peak-motif distance histogram
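
To make the Genetrack row above more concrete, the following is a minimal sketch of the two steps it lists: Gaussian smoothing of tag counts, then greedy selection of the highest remaining position with an exclusion zone around each call. It assumes per-base 5′ tag counts for a single strand on one chromosome. The function names (gaussian_smooth, call_peaks), the parameters (sigma, exclusion, min_height) and the toy data are hypothetical choices for illustration and are not Genetrack's actual interface or defaults.

```python
import math


def gaussian_smooth(counts, sigma=5.0):
    """Smooth per-base tag counts with a Gaussian kernel truncated at 3*sigma."""
    width = int(3 * sigma)
    kernel = [math.exp(-0.5 * (d / sigma) ** 2) for d in range(-width, width + 1)]
    total = sum(kernel)
    kernel = [k / total for k in kernel]  # normalize so the kernel sums to 1

    n = len(counts)
    smoothed = [0.0] * n
    for i, c in enumerate(counts):
        if c == 0:
            continue  # positions without tags contribute nothing
        for j, k in enumerate(kernel):
            pos = i + j - width
            if 0 <= pos < n:
                smoothed[pos] += c * k
    return smoothed


def call_peaks(smoothed, exclusion=20, min_height=0.0):
    """Greedy peak calling: repeatedly take the highest unblocked position,
    then block every position within +/- exclusion bp of it."""
    order = sorted(range(len(smoothed)), key=lambda i: smoothed[i], reverse=True)
    blocked = [False] * len(smoothed)
    peaks = []
    for i in order:
        if smoothed[i] <= min_height:
            break  # everything left is at or below the height threshold
        if blocked[i]:
            continue
        peaks.append(i)
        lo = max(0, i - exclusion)
        hi = min(len(smoothed), i + exclusion + 1)
        for p in range(lo, hi):
            blocked[p] = True
    return sorted(peaks)


# Toy example (made-up data): a strong pile-up at 50, a weaker one 8 bp away,
# and an isolated pile-up at 150.
counts = [0] * 200
counts[50] = 10
counts[58] = 3
counts[150] = 6

peaks = call_peaks(gaussian_smooth(counts, sigma=3.0), exclusion=20)
print(peaks)
```

In this toy input the weaker pile-up at position 58 lies inside the exclusion zone of the stronger one at 50, so it is absorbed rather than reported separately. This illustrates why the exclusion width matters when binding events are closely spaced, and the sketch does not attempt the strand pairing that the table lists as a manual step.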


ISSN: 1471-2105
