diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data
© Lun and Smyth. 2015
Received: 11 May 2015
Accepted: 22 July 2015
Published: 19 August 2015
Chromatin conformation capture with high-throughput sequencing (Hi-C) is a technique that measures the in vivo intensity of interactions between all pairs of loci in the genome. Most conventional analyses of Hi-C data focus on the detection of statistically significant interactions. However, an alternative strategy involves identifying significant changes in the interaction intensity (i.e., differential interactions) between two or more biological conditions. This is more statistically rigorous and may provide more biologically relevant results.
Here, we present the diffHic software package for the detection of differential interactions from Hi-C data. diffHic provides methods for read pair alignment and processing, counting into bin pairs, filtering out low-abundance events and normalization of trended or CNV-driven biases. It uses the statistical framework of the edgeR package to model biological variability and to test for significant differences between conditions. Several options for the visualization of results are also included. The use of diffHic is demonstrated with real Hi-C data sets. Performance against existing methods is also evaluated with simulated data.
On real data, diffHic is able to successfully detect interactions with significant differences in intensity between biological conditions. It also compares favourably to existing software tools on simulated data sets. These results suggest that diffHic is a viable approach for differential analyses of Hi-C data.
Keywords: Hi-C, Genomic interaction, Differential analysis
Most analyses of Hi-C data have focused on identifying “significant” interactions from a single sample [2, 3]. This is challenging because apparent interactions can arise from non-specific ligation and from a variety of other uninteresting technical causes, so a rigorous analysis requires a precise quantitative understanding of these artifacts. Identifying biologically interesting interactions from a single sample requires elaborate modeling of the background signal in Hi-C experiments in order to correct for systematic biases due to GC content, mappability and fragment length. Such modeling inevitably involves assumptions and approximations. Furthermore, the interaction space for any single sample will be dominated by conserved features such as topologically associating domains. These may not be of scientific interest when interactions specific to a particular cell type or experimental condition are being sought. An alternative approach is to identify interactions that are significantly different across two or more biological conditions [5–7]. These differential interactions (DIs) are likely to be scientifically relevant because they are directly associated with the biological conditions being studied. A differential analysis is also technically simpler because it involves a like-for-like comparison, where the intensity of the same interaction is compared between samples. Because the same genome is present in all samples, sequence-related genomic biases will be largely constant between conditions and will therefore tend to cancel out during testing. It follows that interaction-specific biases due to GC content, mappability and similar causes will be substantially mitigated.
Although several studies have performed custom analyses to detect differential interactions from Hi-C data [5, 6], only a couple of publicly available software packages can perform this type of analysis [7, 8]. HOMER is a command-line software suite that tests for DIs, assuming binomially distributed counts and using a background model that takes sequence-based and compartmental biases into account. However, HOMER is limited to comparisons between two libraries and does not consider the variability between biological replicates; the binomial assumption means that its tests account only for sequencing variability. HiBrowse is a user-friendly web tool implemented in Python that can make comparisons between two experimental conditions. It uses the edgeR package to estimate biological variability between replicates. However, because HiBrowse is implemented as a web tool, it is not practical for high-throughput analyses of large-scale datasets.
Here, we present the diffHic package for rigorous detection of differential interactions. Unlike previous tools, diffHic is able to accommodate complex experimental designs, including paired or blocked designs and those with more than two groups. It does this by using the generalized linear model functionality of edgeR. diffHic also estimates biological variability between replicates using quasi-likelihood methods that robustly control the type I error and false discovery rates. In addition, diffHic includes functionality to consolidate results at different resolutions while maintaining rigorous error rate control.
In the diffHic pipeline, read pairs are aligned to a reference genome, processed for quality control and counted into bin pairs across the interaction space. Low-abundance bin pairs are filtered out and the remaining bin pairs are normalized with non-linear methods to eliminate complex biases between libraries. Bin pairs are tested for significant differences between conditions using the latest methods in the edgeR package. Careful attention is given to filtering and normalization steps that are sometimes overlooked in existing analysis pipelines. In particular, diffHic provides new normalization methods to remove trended biases that are abundance-dependent. diffHic also implements methods to remove simple scaling biases between libraries and methods to remove genomic biases between interactions and between libraries. diffHic can efficiently handle large datasets.
This article outlines the functionality of the diffHic package. The practical use of the diffHic package is demonstrated with some real Hi-C data sets, for which a number of DIs are successfully detected between conditions. Simulated data is also generated to show that diffHic provides improved sensitivity and error rate control for DI detection, compared to the HOMER software suite.
diffHic is implemented as an R package. The code is written primarily in R, with time-critical functions written in C++ for greater speed. It makes use of a number of core Bioconductor packages such as GenomicRanges, Rsamtools and BSgenome, in addition to edgeR. The pipeline takes a set of name-sorted BAM files as input and processes them into HDF5 files prior to further analysis. A helper script written in Python is also provided to facilitate read alignment. The analysis can be run interactively through an R session, or it can be automated for batch jobs.
Results and discussion
Introduction to the real data sets
The diffHic pipeline can be applied to any Hi-C data set containing biological replicates across multiple conditions, where the aim is to detect DIs between conditions. In the following sections, the use of diffHic will be demonstrated on three Hi-C data sets, each obtained from the NCBI Gene Expression Omnibus (accession numbers are given in parentheses). The first data set is taken from a study on human prostate epithelial cells overexpressing the ERG protein or a GFP control (GSE37752). The aim of the differential analysis in this study is to detect ERG-induced changes in chromatin structure. The second data set is taken from a study on human embryonic stem cells (ESCs) and lung myofibroblasts (GSE35156), where the aim is to detect changes between cell types. The final data set is taken from a study on mouse neural stem cells before and after deletion of the Rad21 gene (GSE49017), which aims to identify changes due to the loss of cohesin activity. Two biological replicates are present for each condition in all studies.
Read alignment and processing
The first step in a Hi-C data analysis is read alignment to a reference genome. However, this is complicated by the presence of chimeric reads. Recall that a proximity ligation step is performed to construct the Hi-C library. This involves ligating together two interacting DNA fragments from different parts of the genome. A chimeric read is generated when sequencing of the ligation product is performed across the ligation junction. This means that the 5′ and 3′ segments of the read are derived from distinct genomic loci. Correct alignment of the 5′ end is more important than that of the 3′ end as the location of the latter is already provided by the mate read. Naïvely performing local alignment of full-length reads will be suboptimal as there is no preference for the proper alignment of the 5′ end.
The diffHic package uses a pre-splitting strategy to perform chimeric read alignment. This approach takes advantage of the known “signature” sequence around the ligation junction. The ligation signature is easily derived from the known recognition sequence of the restriction enzyme used for the initial digestion of the chromatin. For example, the HindIII enzyme has a recognition sequence of AAGCTT and leaves a 4 bp 5′ overhang; filling in this overhang and blunt-end ligating two fragments produces the ligation signature AAGCTAGCTT. Each read sequence containing this signature is split into 5′ and 3′ segments at the centre of the signature, using the Cutadapt program. Each segment of each read in each pair is then independently aligned to the reference genome using Bowtie2. This pre-splitting approach outperforms the naïve approach for simulated chimeric reads (Additional file 1: Section 1, Table S1). For both chimeric and non-chimeric reads, pre-splitting also outperforms the “iterative mapping” approach, where each read is truncated to a 5′ subsequence and gradually extended from the 3′ end until it aligns uniquely. Similar differences are observed when these non-naïve strategies are applied to real Hi-C libraries (Additional file 1: Section 1, Table S2).
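The derivation of the ligation signature and the split at its centre can be sketched in Python as follows. This is a minimal illustration, not diffHic's implementation (which delegates the splitting to Cutadapt): the function names `ligation_signature` and `split_read` are hypothetical, the cut offset and overhang length must be supplied by the user, and only exact, complete matches to the signature are detected.

```python
def ligation_signature(site: str, cut: int, overhang: int) -> str:
    """Junction sequence formed when two filled-in, blunt-ended fragments ligate.

    site: recognition sequence (e.g. "AAGCTT" for HindIII)
    cut: offset of the cut on the top strand (1 for A^AGCTT)
    overhang: length of the 5' overhang (4 for HindIII)
    """
    # The upstream fragment ends with the bases before the cut plus the
    # filled-in overhang; the downstream fragment starts at the cut position.
    return site[:cut + overhang] + site[cut:]


def split_read(read: str, signature: str):
    """Split a read at the centre of the first exact ligation signature, if any."""
    idx = read.find(signature)
    if idx < 0:
        return read, None  # no junction found: treat the read as non-chimeric
    mid = idx + len(signature) // 2
    return read[:mid], read[mid:]
```

For HindIII, `ligation_signature("AAGCTT", 1, 4)` yields `AAGCTAGCTT`, matching the signature given above; for a 4-cutter such as MboI (`^GATC`), the same rule gives `GATCGATC`. Both segments retain half of the signature, so each can still be aligned to the genome independently.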
Once reads are aligned into BAM files, a number of quality control steps can be applied to remove artifacts. The sizes of the sequencing fragments are estimated by computing the distance of each read to the nearest restriction site in the direction of the read, and summing those distances for both reads in the pair. Fragments with sizes above a default threshold of 600 bp are assumed to result from non-specific cleavage and are discarded. Inward-facing read pairs less than 1 kbp apart are also discarded, to avoid dangling ends from inefficient ligation of (incompletely digested) restriction fragments. Similarly, outward-facing read pairs less than 25 kbp apart are discarded to avoid self-ligation products from those fragments.
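A minimal sketch of the fragment-size and orientation filters is shown below. It assumes restriction sites are given as sorted cut positions per chromosome (including the chromosome start and end), and that each read is summarized by the position of its 5′ end and its strand. The `Read` class, `dist_to_site`, `keep_pair` and their parameter names are hypothetical; the default thresholds mirror the values quoted above (600 bp, 1 kbp and 25 kbp).

```python
import bisect
from dataclasses import dataclass


@dataclass
class Read:
    chrom: str
    pos: int       # coordinate of the read's 5' end
    forward: bool  # True if the read aligned to the + strand


def dist_to_site(read, sites):
    """Distance from the 5' end to the nearest restriction site in the read's direction.

    `sites` maps each chromosome to a sorted list of cut positions. This sketch
    assumes the chromosome start and end are included, so a site always exists.
    """
    s = sites[read.chrom]
    if read.forward:
        # First site at or downstream of the 5' end.
        return s[bisect.bisect_left(s, read.pos)] - read.pos
    # Last site at or upstream of the 5' end.
    return read.pos - s[bisect.bisect_right(s, read.pos) - 1]


def keep_pair(r1, r2, sites, max_frag=600, min_inward=1000, min_outward=25000):
    # Estimated sequencing fragment size: sum of both reads' distances to a site.
    if dist_to_site(r1, sites) + dist_to_site(r2, sites) > max_frag:
        return False
    # Orientation filters only apply to intra-chromosomal pairs on opposite strands.
    if r1.chrom != r2.chrom or r1.forward == r2.forward:
        return True
    left, right = sorted((r1, r2), key=lambda r: r.pos)
    gap = right.pos - left.pos
    if left.forward:
        # + then -: inward-facing, a possible dangling end.
        return gap >= min_inward
    # - then +: outward-facing, a possible self-ligation product.
    return gap >= min_outward
```

For example, with sites at 0, 1000, 2000, 50000 and 100000 on chr1, a forward read at 900 paired with a reverse read at 1100 has an estimated fragment size of 200 bp but faces inward only 200 bp apart, so it is discarded as a likely dangling end.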
For the real data, reads were aligned using the pre-splitting strategy to the appropriate reference genome for each study: mm10 for mouse and hg19 for human. Read pairs were ignored if the 5′ segment of either read was unmapped, had a mapping quality (MAPQ) score below 10 or was marked as a potential PCR duplicate with the MarkDuplicates tool in the Picard suite v1.117 (http://broadinstitute.github.io/picard). Quality control was applied to all remaining read pairs, as described. Any technical replicates were pooled into a single library. Approximately 25–55% of read pairs were retained in the final libraries.
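The read-level criteria above can be expressed with the standard SAM flag bits (0x4 for an unmapped segment, 0x400 for a marked PCR duplicate) and the MAPQ field. This is an illustrative sketch with hypothetical function names, showing the logic of the filter rather than the code used to produce these results.

```python
# Bit flags defined by the SAM specification.
UNMAPPED = 0x4
DUPLICATE = 0x400


def segment_ok(flag: int, mapq: int, min_mapq: int = 10) -> bool:
    """True if a 5' segment is mapped, not a marked duplicate and has MAPQ >= min_mapq."""
    return not (flag & UNMAPPED) and not (flag & DUPLICATE) and mapq >= min_mapq


def keep_read_pair(seg1, seg2, min_mapq=10):
    """seg1 and seg2 are (flag, mapq) tuples for the 5' segments of the two reads."""
    return (segment_ok(*seg1, min_mapq=min_mapq)
            and segment_ok(*seg2, min_mapq=min_mapq))
```

A pair is kept only if both 5′ segments pass; a single unmapped, low-quality or duplicate segment causes the whole pair to be ignored.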
Counting into bin pairs
The bin size is a critical parameter that determines the desired resolution of the analysis. Larger bins will contain more reads and provide larger counts, increasing precision and power for downstream hypothesis testing. This is often necessary for Hi-C data where read pairs are sparsely distributed across the interaction space. In contrast, smaller bins have lower counts but achieve greater spatial resolution, i.e., adjacent regions in the interaction space can be distinguished. This is important for detecting sharp events such as looping interactions, where the use of larger bins would result in “contamination” by irrelevant counts in the neighbouring space. Traditionally, bin sizes from 100 kbp to 1 Mbp have been used [1, 5, 6, 20], though sizes below 10 kbp are feasible with higher-resolution studies [19, 21]. Analyses with different sizes can be consolidated later for comprehensive detection of DIs.
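Counting into bin pairs amounts to mapping each read's position to a fixed-width bin and tallying the resulting pairs of bins. The sketch below assumes read pairs are given as (chromosome, position) tuples and uses a hypothetical `count_bin_pairs` function; it counts a single library and would be repeated for each library to build a count matrix.

```python
from collections import Counter


def count_bin_pairs(pairs, width):
    """Count read pairs into pairs of fixed-width genomic bins.

    `pairs` is an iterable of ((chrom1, pos1), (chrom2, pos2)). Each bin is
    identified by (chrom, index); the two bins of a pair are sorted so that
    (a, b) and (b, a) are counted together, since interactions are unordered.
    """
    counts = Counter()
    for (c1, p1), (c2, p2) in pairs:
        b1, b2 = (c1, p1 // width), (c2, p2 // width)
        counts[tuple(sorted((b1, b2)))] += 1
    return counts
```

With `width=1_000_000` this corresponds to the 1 Mbp bin pairs used below; reducing `width` trades larger counts for finer spatial resolution.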
For the real data sets, pairs of 1 Mbp bins were used for counting. This ensures that the counts are sufficiently large, albeit at the cost of spatial resolution. In addition, bin pairs with one or more bins on chromosome Y were discarded. This avoids spurious detection of DIs between conditions due to sex differences. diffHic is also capable of performing higher-resolution analyses – some results with smaller bin sizes (20–100 kbp) are presented throughout Additional file 1.
Filtering out low-abundance bin pairs
Filtering is recommended to remove low-abundance bin pairs prior to further analysis. This decreases the severity of the multiple testing correction; avoids loss of accuracy for statistical approximations at low, discrete counts; and reduces computational work. In edgeR’s statistical framework, the filter statistic for each bin pair is the average log-count-per-million (CPM), i.e., the average abundance across all libraries. This is (roughly) independent of the p-value under the null hypothesis, i.e., that there is no difference in counts between conditions. Any bin pair with an average abundance below a specified threshold value can be discarded. The aim is to enrich for false nulls without affecting the type I error rate for true nulls.
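A simplified version of the filter statistic is sketched below. It assumes the average abundance can be approximated by a single log2-CPM computed from the pooled counts of a bin pair across libraries, with a small prior count added to avoid the log of zero. edgeR's own average abundance calculation is more involved, so `ave_log_cpm` and `filter_bin_pairs` are hypothetical approximations. The point they illustrate is that the statistic ignores which library each count came from, and therefore carries no information about the direction of any difference between conditions.

```python
import math


def ave_log_cpm(counts, lib_sizes, prior_count=2.0):
    """Approximate average abundance (log2-CPM) of one bin pair across libraries.

    Counts are pooled across libraries and a small prior count is added to
    avoid taking the log of zero. This simplifies edgeR's calculation.
    """
    return math.log2((sum(counts) + prior_count) / sum(lib_sizes) * 1e6)


def filter_bin_pairs(count_rows, lib_sizes, min_abundance):
    """Return the indices of bin pairs whose average abundance reaches the threshold."""
    return [b for b, row in enumerate(count_rows)
            if ave_log_cpm(row, lib_sizes) >= min_abundance]
```

Swapping the counts between libraries, e.g. `[10, 30]` versus `[30, 10]`, leaves the statistic unchanged, which is why filtering on it does not bias the subsequent test.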
A number of different filtering approaches are implemented in diffHic. The simplest method uses the median abundance of all inter-chromosomal bin pairs as an estimate of the non-specific ligation rate, and only retains bin pairs with abundances above this estimate. This is motivated by the organization of chromosomes into self-contained territories, which limits the number of genuine contacts that can occur between chromosomes. Another strategy involves fitting a trend to the abundance of intra-chromosomal bin pairs against genomic distance, i.e., the distance between the bins in each bin pair. A bin pair is only retained if its abundance is greater than the fitted value of the trend. This assumes that most interactions are driven by the compaction of the linear genome into the nucleus, which is largely uninteresting. Finally, bin pairs corresponding to high-abundance “peaks” in the two-dimensional interaction space can also be identified. This approach regards diffuse interactions as uninteresting and selects for sharp events instead.
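The distance-trend filter can be sketched with an ordinary least-squares line of abundance against log-distance. The straight-line fit and the function name `trend_filter` are simplifying assumptions made for illustration; in practice a more flexible trend would be fitted. Distances must be positive.

```python
import math


def trend_filter(distances, abundances):
    """Keep intra-chromosomal bin pairs whose abundance lies above a distance trend.

    distances: positive genomic distances between the two bins of each bin pair
    abundances: average log2-CPM values of the same bin pairs
    A least-squares line of abundance against log10(distance) stands in for the
    trend. Returns the indices of retained bin pairs.
    """
    x = [math.log10(d) for d in distances]
    n = len(x)
    mx = sum(x) / n
    my = sum(abundances) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, abundances))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [i for i, (xi, a) in enumerate(zip(x, abundances))
            if a > slope * xi + intercept]
```

Because abundance typically decays with distance, the fitted line absorbs the expected compaction-driven signal, and only bin pairs that stand out relative to other bin pairs at a similar distance are retained.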
The choice of filtering approach for each analysis depends on the interactions of interest. For example, if the researcher is interested in looping interactions, the peak-based approach may be more useful. In this paper, the simple non-specific method was used for filtering in each real data set. This avoids strong assumptions regarding the definition of “interesting”, as non-specific ligation is obviously uninformative and should be removed. Specifically, filtering was performed to only retain bin pairs with average abundances that were five-fold higher than the estimated non-specific ligation rate. This removes the majority of low-abundance bin pairs that are dominated by non-specific ligation, as these are unlikely to be genuine (differential) interactions. Note that the choice of five-fold is arbitrary – other values can be used so long as the majority of low-abundance bin pairs are removed. Obviously, excessively high thresholds are not ideal as power will be lost from removal of genuine DIs.
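The non-specific filter used for the real data can be sketched as follows. It assumes abundances are average log2-CPM values and that every inter-chromosomal bin pair, including those with zero counts, is represented in the input (otherwise the median would be biased upwards). The function name is hypothetical; the five-fold threshold becomes an additive log2(5) on the log scale.

```python
import math
import statistics


def nonspecific_filter(abundances, is_inter, fold=5.0):
    """Retain bin pairs well above the estimated non-specific ligation rate.

    abundances: average log2-CPM of each bin pair
    is_inter: True for inter-chromosomal bin pairs
    The median inter-chromosomal abundance estimates the non-specific rate;
    a bin pair is kept if it exceeds that rate at least `fold`-fold.
    """
    background = statistics.median(
        a for a, inter in zip(abundances, is_inter) if inter)
    threshold = background + math.log2(fold)
    return [i for i, a in enumerate(abundances) if a >= threshold]
```

Raising `fold` removes more bin pairs; as noted above, the value is arbitrary, provided that most of the low-abundance bin pairs dominated by non-specific ligation are removed.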
Normalization for library-specific biases
The iterative correction strategy of Imakaev et al. is also implemented in diffHic. This method factorizes out genomic biases from the interaction intensities, yielding “true” contact probabilities that can be compared between interactions. This method facilitates comparisons between different interactions and can also be used to remove condition-specific genomic biases if these are considered to be important for a particular dataset.
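The idea behind iterative correction can be illustrated with a small matrix-balancing sketch. This is not a port of the published algorithm or of diffHic's implementation: the function name is hypothetical, the matrix is a dense symmetric list of lists, and each pass rescales every bin by the square root of its coverage relative to the mean coverage (a damped form of symmetric balancing, chosen here because it converges reliably on small examples). The returned bias vector satisfies `matrix[i][j] == corrected[i][j] * bias[i] * bias[j]`.

```python
import math


def iterative_correction(matrix, iterations=200):
    """Balance a symmetric contact matrix so that all bins have similar coverage.

    Returns (corrected, bias). Bins with zero coverage are left unchanged;
    at least one bin must have non-zero coverage.
    """
    n = len(matrix)
    w = [[float(v) for v in row] for row in matrix]
    bias = [1.0] * n
    for _ in range(iterations):
        cov = [sum(row) for row in w]
        nonzero = [c for c in cov if c > 0]
        mean = sum(nonzero) / len(nonzero)
        # Per-bin scaling factor: square root of coverage relative to the mean.
        f = [math.sqrt(c / mean) if c > 0 else 1.0 for c in cov]
        for i in range(n):
            for j in range(n):
                w[i][j] /= f[i] * f[j]
        # Accumulate the factors so the original matrix can be reconstructed.
        bias = [b * fi for b, fi in zip(bias, f)]
    return w, bias
```

After convergence all rows of the corrected matrix have (approximately) equal sums, so the remaining differences between entries reflect contact frequency rather than per-bin biases such as mappability.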
It should be stressed that these normalization strategies do not alter the counts directly. Rather, they compute offsets that are used in fitting generalized linear models (GLMs). For all downstream steps, the offsets computed by the loess-based method (to remove trended biases) were used for the Sofueva et al. data set, while those computed by multi-dimensional smoothing (to remove CNV biases) were used for the Rickman et al. and Dixon et al. data sets. This corrects for the presence of CNVs in the immortalized cell lines that were used in the latter analyses. In all cases, normalization was only applied to bin pairs that remained after filtering.
Modelling complex experimental designs
The values $o_{bi}$ are offsets that incorporate the sequencing depth and other normalization factors. The offset $o_{bi}$ is equal to the logarithm of the total number of unfiltered read pairs for sample $i$, modulated by any normalization factors computed by the methods described in the previous section. The offsets are computed automatically by the diffHic normalization functions and are usually invisible to users. They provide a flexible mathematical means by which bin-specific, condition-specific and sample-specific adjustments can be incorporated into the analysis.
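How offsets enter the model can be sketched as below, assuming the standard log-linear form used by edgeR's GLMs, in which the expected count $\mu_{bi}$ for bin pair $b$ in library $i$ satisfies $\log \mu_{bi} = \mathbf{x}_i^T \boldsymbol{\beta}_b + o_{bi}$, and assuming the offset is the log library size plus the log of a multiplicative normalization factor. The function names and the `norm_factors` layout are hypothetical.

```python
import math


def glm_offsets(lib_sizes, norm_factors):
    """Offsets o[b][i] = log(N_i) + log(f_bi) on the natural-log scale.

    lib_sizes: total number of unfiltered read pairs N_i in each library
    norm_factors[b][i]: multiplicative normalization factor for bin pair b in
    library i (1.0 everywhere means library-size scaling only)
    """
    return [[math.log(n) + math.log(f) for n, f in zip(lib_sizes, row)]
            for row in norm_factors]


def expected_count(design_row, beta, offset):
    """Mean of a log-linear GLM: mu_bi = exp(x_i^T beta_b + o_bi)."""
    return math.exp(sum(x * b for x, b in zip(design_row, beta)) + offset)
```

With an intercept of log(1e-5) and a group effect of log(2), a library of 1 million read pairs in the baseline group has an expected count of 10, while a library of 2 million read pairs in the second group has an expected count of 40. The offsets absorb the depth difference, so the coefficient only has to explain the two-fold biological change.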
Modelling technical and biological variability
Testing for significant differences
Results can also be consolidated for easier interpretation. If multiple analyses were performed with different bin sizes, smaller bin pairs can be nested within larger “parent” bin pairs. The p-values of both nested and parent bin pairs can be combined using Simes’ method, yielding a single combined p-value that represents the overall evidence for a DI within the parent. The genomic coordinates of the parent bin pair can then be reported, along with the combined p-value and its FDR-adjusted value. This avoids redundant results from reporting multiple nested bin pairs individually. Similarly, adjacent bin pairs in the interaction space can be clustered together and reported as a single event to reduce redundancy. This is demonstrated in Additional file 1: Figure S2 for a high-resolution analysis using 20 kbp bin pairs.
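Simes' method and the Benjamini-Hochberg FDR adjustment are both short enough to sketch directly. The function names are hypothetical; in a consolidation step, one would combine each parent's p-value with those of its nested bin pairs using `simes`, then apply `bh_adjust` across all parents.

```python
def simes(pvalues):
    """Combine p-values with Simes' method: min over i of n * p_(i) / i."""
    p = sorted(pvalues)
    n = len(p)
    return min(n * pi / (i + 1) for i, pi in enumerate(p))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    n = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running = 1.0
    for rank_from_top, i in enumerate(order):
        rank = n - rank_from_top  # 1-based rank in ascending order
        running = min(running, pvalues[i] * n / rank)
        adjusted[i] = running
    return adjusted
```

Simes' method is driven by the strongest evidence among the nested tests, so a parent containing a single strongly differential nested bin pair still receives a small combined p-value, and only one result per parent enters the FDR correction.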
Comparison with existing tools
It should be mentioned that this is not the first time that edgeR has been used to analyze Hi-C data. The HiBrowse pipeline uses edgeR to detect DIs between groups in the presence of biological replicates. However, HiBrowse is limited in that it does not account for trended NB dispersions, complex experimental designs or non-linear normalization schemes. diffHic can naturally accommodate these aspects of the differential analysis, as it uses the latest GLM-based methods in edgeR. diffHic can also account for variable dispersions across bin pairs through the QL framework [11, 12]. Finally, HiBrowse is a web tool that is somewhat inconvenient for high-throughput use, whereas diffHic can be easily run on local systems.
Intended use and future directions
The diffHic package should be used to detect DIs between two or more biological conditions in a Hi-C experiment. This provides an alternative to conventional analysis strategies that aim to detect “significant” interactions within each sample. The differential analysis may yield more relevant results when the aim of the study is to detect changes in chromatin organization. We anticipate that diffHic – and differential analyses in general – will complement the existing conventional methods, such that the most appropriate analysis strategy can be selected based on the research question. Future development of diffHic will aim to accommodate other types of chromatin conformation data, such as DNase Hi-C and Capture-C.
The diffHic package provides a comprehensive and rigorous pipeline for detecting DIs from Hi-C data. Functions are available for alignment and processing; read counting into bin pairs; filtering of low-abundance bin pairs; normalization to remove trended and CNV-driven biases; statistical analyses to model biological variability and to test for significance; and visualization of detected features. A demonstration with real data provides some examples of the types of DIs that can be detected with this approach. Analyses of simulated data indicate that diffHic provides better performance than the existing HOMER software. These results suggest that diffHic may be a useful alternative to conventional methods for Hi-C data analysis, especially for researchers who want to conduct differential analyses.
Availability and requirements
The diffHic package is part of the open-source Bioconductor project and can be installed by following the standard Bioconductor installation procedures, as described at http://www.bioconductor.org/packages/release/bioc/html/diffHic.html. diffHic is freely available under version 3 of the GNU General Public License. It is platform independent and can be used on any system that can run R and Bioconductor.
All statistical analyses reported in this article were run on a Dell Precision laptop with an Intel i7 processor and 16 GB of RAM. Analyses were performed using CentOS 6.6, R v3.2.0, Bioconductor v3.1, diffHic v1.0.0 and edgeR v3.10.0. Read alignments were run separately on a Linux server using Bowtie2 v2.2.4. Excluding the Bowtie2 alignments, all analyses ran in less than an hour using one core.
Project name: diffHic
Project home page: http://www.bioconductor.org/packages/release/bioc/html/diffHic.html
Operating systems: UNIX, Windows, MacOS
Programming language: R version 3.2.0 or higher, C++
Other requirements: diffHic depends on the Bioconductor packages GenomicRanges, Rsamtools, Biostrings, BSgenome, IRanges, S4Vectors, GenomeInfoDb, BiocGenerics, rhdf5, edgeR, limma, csaw, locfit, methods.
License: GPL-3
Any restrictions to use by non-academics: none
ATLL was funded by the University of Melbourne (Elizabeth and Vernon Puzey scholarship). GKS was funded by the National Health and Medical Research Council (Program Grant 1054618 and Fellowship). This study was undertaken with Victorian State Government Operational Infrastructure Support and Australian Government NHMRC IRIIS.
- Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009; 326(5950):289–93.
- Imakaev M, Fudenberg G, McCord RP, Naumova N, Goloborodko A, Lajoie BR, et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat Methods. 2012; 9(10):999–1003.
- Yaffe E, Tanay A. Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat Genet. 2011; 43(11):1059–65.
- Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012; 485(7398):376–80.
- Rickman DS, Soong TD, Moss B, Mosquera JM, Dlabal J, Terry S, et al. Oncogene-mediated alterations in chromatin conformation. Proc Natl Acad Sci U S A. 2012; 109(23):9083–088.
- Seitan VC, Faure AJ, Zhan Y, McCord RP, Lajoie BR, Ing-Simmons E, et al. Cohesin-based chromatin interactions enable regulated gene expression within preexisting architectural compartments. Genome Res. 2013; 23(12):2066–077.
- Paulsen J, Sandve GK, Gundersen S, Lien TG, Trengereid K, Hovig E. HiBrowse: multi-purpose statistical analysis of genome-wide chromatin 3D organization. Bioinformatics. 2014; 30(11):1620–22.
- Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010; 38(4):576–89.
- Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010; 26(1):139–40.
- McCarthy DJ, Chen Y, Smyth GK. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res. 2012; 40(10):4288–97.
- Lund SP, Nettleton D, McCarthy DJ, Smyth GK. Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Stat Appl Genet Mol Biol. 2012; 11(5):Article 8.
- Lun ATL, Chen Y, Smyth GK. It’s DE-licious: a recipe for differential expression analyses of RNA-seq experiments using quasi-likelihood methods in edgeR. Technical report, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Melbourne. 2015. http://www.statsci.org/smyth/pubs/QLedgeRPreprint.pdf.
- Huber W, Carey VJ, Gentleman R, Anders S, Carlson M, Carvalho BS, et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods. 2015; 12(2):115–21.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009; 25(16):2078–9.
- Fischer B, Pau G. Rhdf5: HDF5 interface to R. 2015. R package version 2.12.0. http://www.bioconductor.org/packages/release/bioc/html/rhdf5.html.
- Sofueva S, Yaffe E, Chan WC, Georgopoulou D, Vietri Rudan M, Mira-Bontenbal H, et al. Cohesin-mediated interactions organize chromosomal domain architecture. EMBO J. 2013; 32(24):3119–29.
- Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011; 17(1):10–12.
- Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012; 9(4):357–9.
- Jin F, Li Y, Dixon JR, Selvaraj S, Ye Z, Lee AY, et al. A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature. 2013; 503(7475):290–4.
- Belton JM, McCord RP, Gibcus JH, Naumova N, Zhan Y, Dekker J. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods. 2012; 58(3):268–76.
- Rao SS, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014; 159(7):1665–80.
- Lun AT, Smyth GK. De novo detection of differentially bound regions for ChIP-seq data using peaks and windows: controlling error rates correctly. Nucleic Acids Res. 2014; 42:95.
- Bourgon R, Gentleman R, Huber W. Independent filtering increases detection power for high-throughput experiments. Proc Natl Acad Sci U S A. 2010; 107(21):9546–51.
- Bickmore WA. The spatial organization of the human genome. Annu Rev Genomics Hum Genet. 2013; 14:67–84.
- Lin YC, Benner C, Mansson R, Heinz S, Miyazaki K, Miyazaki M, et al. Global changes in the nuclear positioning of genes and intra- and interdomain genomic interactions that orchestrate B cell fate. Nat Immunol. 2012; 13(12):1196–204.
- Lun AT, Smyth GK. csaw: detection of differentially bound regions in ChIP-seq data with sliding windows, with methods for normalization and proper FDR control. R package version 1.2.1. http://bioconductor.org/packages/release/bioc/html/csaw.html.
- Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010; 11(3):25.
- Loader C. Locfit: local regression, likelihood and density estimation. 2013. R package version 1.5-9.1. http://CRAN.R-project.org/package=locfit.
- Chen Y, Lun ATL, Smyth GK. Differential expression analysis of complex RNA-seq experiments using edgeR. In: Datta S, Nettleton DS, editors. Statistical analysis of next generation sequence data. New York: Springer; 2014. p. 51–74.
- Phipson B, Lee S, Majewski IJ, Alexander WS, Smyth GK. Empirical Bayes in the presence of exceptional cases, with application to microarray data. Technical report, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Melbourne. 2015. http://www.statsci.org/smyth/pubs/RobustEBayesPreprint.pdf.
- Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008; 18(9):1509–17.
- Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Royal Stat Soc B. 1995; 57:289–300.
- Simes RJ. An improved Bonferroni procedure for multiple tests of significance. Biometrika. 1986; 73(3):751–4.
- Ma W, Ay F, Lee C, Gulsoy G, Deng X, Cook S, et al. Fine-scale chromatin interaction maps reveal the cis-regulatory landscape of human lincRNA genes. Nat Methods. 2015; 12(1):71–8.
- Hughes JR, Roberts N, McGowan S, Hay D, Giannoulatou E, Lynch M, et al. Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experiment. Nat Genet. 2014; 46(2):205–12.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.