Multiscale structural community organisation of the human genome
 Rasha E. Boulos^{1, 2},
 Nicolas Tremblay^{1, 3},
 Alain Arneodo^{1, 4},
 Pierre Borgnat^{1} and
 Benjamin Audit^{1}Email author
DOI: 10.1186/s128590171616x
© The Author(s) 2017
Received: 23 August 2016
Accepted: 28 March 2017
Published: 11 April 2017
Abstract
Background
Structural interaction frequency matrices between all genome loci are now experimentally achievable thanks to highthroughput chromosome conformation capture technologies. This ensues a new methodological challenge for computational biology which consists in objectively extracting from these data the structural motifs characteristic of genome organisation.
Results
We deployed the fast multiscale community mining algorithm based on spectral graph wavelets to characterise the networks of intrachromosomal interactions in human cell lines. We observed that there exist structural domains of all sizes up to chromosome length and demonstrated that the set of structural communities forms a hierarchy of chromosome segments. Hence, at all scales, chromosome folding predominantly involves interactions between neighbouring sites rather than the formation of links between distant loci.
Conclusions
Multiscale structural decomposition of human chromosomes provides an original framework to question structural organisation and its relationship to functional regulation across the scales. By construction the proposed methodology is independent of the precise assembly of the reference genome and is thus directly applicable to genomes whose assembly is not fully determined.
Keywords
Chromosome interaction network Multiscale community mining Structural domain hierarchical organisation Spectral graph wavelets Human genomeBackground
Here we propose a novel method to analyse HiC data that allows a multiscale identification of structural domains. Because it does not rely on the specific assembly of the reference genome, this method does not look for structural domains limited to chromosome intervals thereby relaxing our preconception about the nature of structural domains. Moreover, due to polymorphisms within a species or to chromosome rearrangements characteristic of cancer cells [35], the assembly of the reference genome does not necessarily corresponds to the true assembly for a cell line under investigation. In these situations, reduced sensitivity to genome assembly is likely to avoid erroneous structural domain predictions. A HiC colocalisation frequency matrix is positive and symmetric, it can thus be interpreted as the adjacency matrix of the genome interaction network where the nodes are the chromosome loci (typically nonoverlapping windows) and the edges reflect the colocalisation frequency between these regions. This justifies the use of concepts and tools from graph theory to analyse HiC data [28, 30, 33, 34, 36–38]. This representation depends on genome assembly only up to the scale used to define the HiC matrix, the columns/rows of the HiC matrix can be permuted without affecting the output of graph algorithms. In graph theory, a set of nodes that share more connections between themselves than with the rest of the graph is called a community [39]. Hence we reformulate the question of structural domain mining as a search for community in the HiC interaction network. Note that Markov graph clustering was already experienced to delineate subsegments within large A/Blike chromosomal domains obtained in a first step [30] and that extensions of graph stochastic block models were also applied to HiC matrices of human chromosome 4 and a segment of human chromosome 6 [33]. In order not to privilege any particular scale in the analysis, we performed the multiscale partitioning of the full intrachromosomal interaction networks into structural communities using a multiscale community mining algorithm based on graph wavelets [40].
Methods
Chromatin conformation capture data and topologically associating domains

Embryonic stem cell line H1 (H1 ES) and foetal lung fibroblast cell line IMR90 HiC data for which TADs are available [11], allowing a direct comparison of our structural communities with what is considered as reference structural domains in the literature. HiC matrices at resolution 20 kb and 40 kb for two replicates in each cell lines as well as TADs predictions in these cell lines were downloaded from the GEO database under accession number GSE35156. These data are based on the hg18 assembly version of the human genome.

Myelogenous leukemia cell line K562 and lymphoblastoid cell line GM06990 HiC data [10] for the analysis of the structural conservation between cell lines. HiC matrices at resolution 100 kb for the two cell lines were downloaded from the GEO database under accession number GSE18199. These matrices are based on the hg18 assembly version of the human genome.

Cervical cancer cell line HeLaS3 HiC data [19] where the HiC experiments were performed on synchronised cells during mitosis and G1 allowing a study of the community structure during the cell cycle. The HiC reads alignment files to the human genome (hg19 assembly version) for the two stages of the cell cycle were downloaded from the ArrayExpress database under accession number EMTAB1948.
HiC intrachromosomal colocalisation frequency matrices for nonoverlapping 100 kb loci correspond to the downloaded matrices that were downsampled to 100 kb when necessary or were constructed from the alignment files (Fig. 1). Unexpectedly low and unexpectedly high interacting loci that are likely to introduce noise were removed (Additional file 1: Table S1). The remaining 100 kb loci were concatenated resulting in new masked positions.
We compared the structuralcommunities described in this work to the TADs [11] that are considered as a reference for the structural description of HiC data. TADs were identified in H1 ES and IMR90 cell lines at both 20 and 40 kb resolutions [11]. Given our adopted resolution of 100 kb, we used the TADs dataset obtained at the 40 kb resolution, and we assigned each TAD border to the corresponding 100 kb pixel keeping only TADs larger than 200 kb (3 pixels). This led to a database of 2 993 (resp. 2 263) TADs in H1 ES (resp. IMR90), with 3 905 (resp. 3 096) distincts borders in H1 ES (resp. IMR90).
In this work one focus is to question the existence of a TADlike structuration of the human genome in the intermediary scale range from the described TAD typical size up to the chromosome length. A second objective is to address the possible conservation of these structural motifs across cell lines. This led us to include the K562 and GM06990 datasets from the original HiC study [10]. These datasets are less resolutive than more recent ones in IMR90 and H1 ES cell lines [11] due to a limited sequencing depth and were analysed at best at 100 kb resolution by the original authors. This explains why we chose 100 kb as the resolution for all the analysis presented in our manuscript. However to check whether lower or higher resolution has significant impact on the results, the IMR90 dataset was also analysed at resolutions 40 and 200 kb.
Multiscale community mining using graph wavelets
We used the multiscale community mining algorithm based on spectral graph wavelets that we previously described and benchmarked against two other multiscale community mining methods from the literature [40]. The purpose of detecting communities at different scales using graph wavelets instead of, say, cutting a hierarchical clustering at different levels, is to fit as close to the data as possible. Cutting a hierarchical clustering impose a hierarchical structure to the set of community obtained at the different scales (cutting levels). When using wavelets, we do not suppose beforehand that the data have a hierarchical structure: a community at a coarse scale does not necessarily have to contain communities found at a finer scale.
Our community mining algorithm [40] relies on the precise construction of graph wavelets in order to introduce the notion of scale [41] (Supplementary text: Graph wavelet transform and community mining and Figures S1 to S4 in Additional file 1). A graph wavelet centred on a node a is a function on the nodes of the graph whose values capture the proximity of each node to node a given a scale s of observation. As such, the set of graph wavelets at scale s characterise the local graph structure around each node over a “distance” controlled by the scale parameter s, as illustrated in Additional file 1: Figures S1 and S3. At a fixed scale, the similarity between the neighbourhood of 2 nodes (a and b) can be quantified as the correlation (\(\mathcal {C}^{(s)}(a,b)\), Additional file 1: Equation (S11)) between the wavelets centred on each of the two nodes at that scale. Computing the correlation distance \(\mathcal {D}^{(s)}(a,b)=1\mathcal {C}^{(s)}(a,b)\) (Additional file 1: Equation (S12)) between all pairs of nodes results in a distance matrix capturing the similarity of node neighbourhood, which can in turn be used as the input of a hierarchical clustering algorithm in order to partition the nodes into communities for the scale of observation s (Additional file 1: Figure S4). To sum up, at each scale and for each intrachromosomal interaction network, the community mining algorithm amounts to (i) compute the matrix of correlation distance \(\mathcal {D}^{(s)}(a,b)\), (ii) apply averagelinkage hierarchical clustering [42, 43], and (iii) finally cut the resulting dendrogram following the method prescribed in [40]. This results in a set of structural communities for a given scale and a given chromosome.
We used the fast implementation of this procedure [40]. On the one hand, the graph wavelet transform is computed using the fast algorithm proposed in [41] (Additional file 1: Equation (S17)). On the other hand, instead of computing the wavelets on the n nodes of the graph which requires n wavelet transforms of Dirac functions (using Additional file 1: Equation (S17) n times), the matrix of correlations between wavelets at scale s is approximated by the correlation between η (≪n) wavelet transforms of random Gaussian functions on the graph (Supplementary text: Graph wavelet transform and community mining in Additional file 1). Importantly, the fast implementation of our multiscale community mining protocol is applicable to large networks with \(\gtrsim 10\;000\) nodes [40], allowing to consider its future application to intrachromosomal interaction networks at high resolution (∼10 kb) in mammals [20] but also to full genome interaction networks at the resolution used in the present work (100 kb). Note that graph spectral clustering can also be considered for these large interaction network settings thanks to recent algorithmic developments [44, 45].
Comparing sets of genomic domains

Mean best mutual coverage: We define the mutual coverage m _{ c } between two domains \(d_{1} \in \mathcal {D}_{1}\) and \(d_{2} \in \mathcal {D}_{2}\) as their intersection length \(L_{d_{1} \cap d_{2}}\) divided by the maximum length of the two domain lengths \(L_{d_{1}}\) and \(L_{d_{2}}\): \(m_{c}(d_{1},d_{2}) = L_{d_{1} \cap d_{2}}/\max (L_{d_{1}},L_{d_{2}})\).
The maximal value 1 of m _{ c } is obtained when the two domains d _{1} and d _{2} are identical. Then, for each domain \(d_{1} \in \mathcal {D}_{1}\), we define its best mutual coverage with \(\mathcal {D}_{2}\) domains (\({bm}_{c_{\mathcal {D}_{2}}}\)) as its maximal mutual coverage with \(\mathcal {D}_{2}\) domains: \({bm}_{c_{\mathcal {D}_{2}}} (d_{1}) = \max _{d_{2} \in \mathcal {D}_{2}}\)
(m _{ c }(d _{1},d _{2})). Sorting the \(\mathcal {D}_{1}\) domains by size, we compute the mean best mutual coverage with \(\mathcal {D}_{2}\) of groups of 50 \(\mathcal {D}_{1}\) domains that we plot as a function of the mean length of the domains in the group. This results in an average mean best mutual coverage curve between domains in \(\mathcal {D}_{1}\) and \(\mathcal {D}_{2}\) as a function of \(\mathcal {D}_{1}\) domain size.

We say that a domain d has a match in \(\mathcal {D}_{2}\) if \({bm}_{c_{\mathcal {D}_{2}}}(d) \geq 0.8\). \(P_{\mathcal {D}_{2}}(\mathcal {D})\) is then defined as the proportion of domains \(d \in \mathcal {D}\) that have a match in \(\mathcal {D}_{2}\). Sorting the \(\mathcal {D}_{1}\) domains by size, we consider them in groups \(\mathcal {D}\) of 50 domains and plot \(P_{\mathcal {D}_{2}} (\mathcal {D})\) as a function of the mean length of the domains in \(\mathcal {D}\). This results in a matching proportion curve of domains in \(\mathcal {D}_{1}\) and \(\mathcal {D}_{2}\) as a function of \(\mathcal {D}_{1}\) domain size.

We say that a border b has a match in \(\mathcal {B}_{2}\) when there is a border in \(\mathcal {B}_{2}\) less than 100 kb away from b i.e ± 1 pixel away. \(P_{\mathcal {B}_{2}}(\mathcal {B})\) is then defined as the proportion of borders \(b \in \mathcal {B}\) that have a match in \(\mathcal {B}_{2}\). Sorting the \(\mathcal {B}_{1}\) borders according to their associated lengths (see below), we consider them in groups \(\mathcal {B}\) of 100 borders and plot \(P_{\mathcal {B}_{2}} (\mathcal {B})\) as a function of the average associated length of the borders in \(\mathcal {B}\). This results in a matching proportion curve of borders in \(\mathcal {B}_{1}\) and \(\mathcal {B}_{2}\) as a function of \(\mathcal {B}_{1}\) border associated length.
Results and discussion
Waveletbased community detection in the DNA interaction network
Structural communities correspond to genome intervals
The interaction frequencies outside the diagonal blocks characterising the structural compartimentalisation as described in [11] are not negligible (look for instance at the region around [82,89] Mb in IMR90 that highly interacts with the region around [92,93] Mb in Fig. 1). This suggests that structural communities may not necessarily reduce to intervals along the genome. Hence for each non trivial community (community of size >1), we computed the proportion P _{ int } of the largest set of successive 100 kb loci covered by the community over the size of the community: P _{ int }=1 when all the nodes of the community constitute an interval of the masked genome and P _{ int }=1/N where N is the size of the community when the community do not contain any pair of consecutive loci of the genome. Considering P _{ int }≥0.95 as a criterion for a community to constitute an interval along the genome, we observed for the 2 cell lines that more than 99% of the communities correspond to intervals of the genome. This property for the communities remains true for all the scales and whatever the size of the communities. This is consistent with the fact that at all scales, genomic neighbours tend to strongly colocalise resulting in higher frequency of interactions. These results demonstrate that the strongest motifs of structural organisation involve contiguous genomic segments. We will refer to the communities forming a genomic interval as intervalcommunities. We only kept the communities that correspond to an interval (P _{ int }≥0.95) reducing them to their main interval. This allowed us to adopt a simple representation of the structuralcommunities obtained across scales (Fig. 2). The differences observed between the resulting community size distributions in IMR90 and H1 ES (Fig. 3 a) are visible in this representation. We clearly see a first range of scales (s≤20) where the intervalcommunities reduce to singletons in IMR90 (Fig. 2 a) and not in H1 ES (Fig. 2 b). Above this critical scale, non trivial intervalcommunities appear in IMR90. Note that the mean size of the intervalcommunities for this first meaningful partitioning in IMR90 is larger than the ones observed in H1 ES for its first meaningful partitioning (smallest scale). This results in a lack of small non trivial intervalcommunities in IMR90.
A hierarchical organisation of the genome
The representation in Fig. 2 reveals the hierarchical organisation of the communities. Across scales, small communities merge together to form bigger communities at larger scales. Hence the community borders present at the smallest scale progressively disappear at some larger scale allowing the emergence of bigger communities. Importantly, the conservation of borders from large scales to small scales is very high. For each pair of scales s2>s1, we computed the proportion of borders at the larger scale s2 that are also present at the smaller scale s1. This proportion is close to 1 regardless of the scales (Additional file 1: Figure S5). The fact that the borders are conserved across scales means that there is no “new” structure that emerges and that only existent ones merge together, i.e. small structures are nested into bigger ones. This is consistent with the results of recent studies suggesting that TADs hierarchically coassociate to form larger structures [16, 29, 46].
Number of structural communities
Cell line  N  N (filtered)  Remaining  Distinct 

communities  borders  
H1 ES  12 343  65  12 278  5 751 
IMR90  8 852  25  8 827  6 824 
GM06990  10 279  60  10 219  6 967 
K562  13 383  30  13 353  8 273 
HeLaS3 G1  6 752  36  6 716  4 108 
HeLaS3 M  1 059  4  1 055  885 
Are intervalcommunities structural domains?
To test the robustness of the waveletbased community detection method with respect to the possible absence of a community structure over some range of scales, we compared the intervalcommunities obtained for the HiC datasets in synchronised HeLaS3 cells during G1 and M phase, respectively (Methods). The original study [19] showed that the highly compartmentalised organisation described before from non synchronous cells [10, 11, 13, 15, 16, 20, 26, 27] was restricted to interphase and that during a cell cycle, chromosomes transit from a decondensed and spatially organised state during interphase to a highly condensed and morphologically reproducible metaphase chromosome state. In the former phase, the HiC interaction maps display similar plaid patterns of regional enrichment or depletion of long range interactions (as the one shown in Fig. 1) while the maps in mitotic cells change and the plaid patterns disappear [19]. For HeLaS3 G1 (resp. mitosis) dataset, we obtained 6 716 (resp. 1055) non trivial communities and 4 108 (resp. 885) distinct borders (Table 1). For the mitosis HeLaS3 HiC dataset, we obtained 1 059 communities from which we filtered out 4 resulting in 885 distincts borders (Table 1). Consistently with non synchronous cells, G1 cells present a hierarchical structure into intervalcommunities that increase in size across scales (Additional file 1: Figures S6 and S7). Small scale singletons hierarchically group to form large intervalcommunities at larger scales. As discussed above, the length distribution of the G1 HeLaS3 intervalcommunities is similar to the intervalcommunities size distribution obtained in the 3 other differentiated cell line datasets (Fig. 3 b). In contrast, metaphase chromosomes do not present a hierarchical structural organisation. More specifically, chromosomes 16, 21 and 22 do not present any structure (each node constitutes a community on the full available range of scales, Additional file 1: Figure S6). In the 19 other autosomes, at small scales each node is a singleton and above a critical scale a sharp discontinuity of the community sizes distribution is observed: nodes are abruptly grouped in a small number (2–5) of communities (Additional file 1: Figure S7). For 12 out of these 19 chromosomes, when divided in two communities, these communities correspond to the two chromosomal arms, as illustrated for chromosome 17 in Additional file 1: Figure S7. These results demonstrate that the waveletbased community detection method does not produce misleading intermediate scale communities when no structuration exists in that scale range.
To strengthen this point in a noisy situation, we simulated a structural interaction matrix between 2000 nodes (comparable to the largest human chomosomes at resolution 100 kb) organised in fully connected intervalcommunities with no specific organisation at scales larger than the community size: the matrix is built as a series of 40 pairs of domains of size 20 nodes and 30 nodes with internal domain interaction set to 60, with the two first (resp. second) subdiagonals set to 80 (resp. 70) to assure connectivity and with an additive Poisson noise over all interaction pairs of mean value λ=50 (Additional file 1: Figure S8 Left). When applying the graph wavelet community mining protocol, we recovered only trivial singleton communities at small and large scales. However in the intermediate scale range, we nicely recovered all the 20 and 30 nodes on a range of scales that depends on their size (Additional file 1: Figure S8 Right). This example shows that the method does not produce a fake hierarchical domain organisation by merging existing domains even in a noisy situation.
Are TADs intervalcommunities?
To test the robustness of our methodology with regards to the binning resolution, we reproduced the analysis of the IMR90 data at a finer (40 kb, total running time 8h48mn) and a coarser (200 kb, total running time 31 mn) resolution (Additional file 1: Figure S10). The lengths of intervalcommunities determined at these 2 resolutions nicely reproduce the distribution obtained at the 100 kb resolution (Additional file 1: Figure S10A). Intervalscommunities of size \(\gtrsim 12\) Mb strongly match between these intervals communities datasets (recovery proportion ≃70−90%, Additional file 1: Figure S10B). For smaller community size, recovery proportion between datasets decreases with community size in the same manner for the 3 resolution pairs. This can be understood when noting that the isolation strength of community borders significantly weakens when decreasing the genomic distance below ∼1 Mb (Fig. 4). Finally, the proportion of TADs that have a match in the intervalcommunity database is similar at 40 kb resolution than at 100 kb resolution (Additional file 1: Figure S11). This demonstrates that the results do not depend on the choice of the 100 kb resolution and further underlines that the lower structural domain recovery rate generally observed for small domain sizes (≲1 Mb) is likely related to the weaker isolation strength of structural domain borders over short distances (≲1 Mb).
Conservation of structural communities across cell lines
Conclusions
We introduced a fast multiscale community mining algorithm based on spectral graph wavelets [40] to identify structural motifs from highthroughput chromatin conformation capture data (HiC) [10]. HiC data were represented as intrachromosomal interaction networks and structural motifs were delineated as communities of these networks. The novelty of this approach relies on the combination of a multiscale procedure and a representation of the data that is independent of the exact assembly of the reference genome over length scales larger than the window size used to construct the interaction network. The proposed methodology has no a priori on the size and on the nature of the structural motifs. The application of this protocol to 6 HiC datasets led to a database of several thousands structural communities (Table 1). The database of intervalcommunities in mitotic HeLaS3 cells that were described not to present a TADlike structural organisation [19], does not contain any intermediary scale structural communities, illustrating the robustness of the proposed methodology with regards to the absence of structural motifs. Consistently with the recent usage of HiC data for genome sequence assembly [47, 48], we observed that structuralcommunities in unsynchronised and G1 cells form hierarchies of chromosome intervals of length ranging from the resolution (100 kb) to the chromosome lengths (\(\gtrsim 10\) Mb) (Fig. 3). The prevalence of intervalcommunities underlines that chromosome folding is mainly driven by interactions between neighbouring loci, at all scales of observation. This constitutes a justification that TADlike structural motifs indeed correspond to chromosome intervals. For domains significantly larger than the resolution of the analysis (\(\gtrsim 600\) kb), a majority of the TADs [11] are recovered as intervalcommunities (Fig. 5) and, whatever the intervalcommunity length, their borders present an insulatorlike behaviour (Fig. 4) as expected for TADlike structural motifs. Hence intervalcommunities capture similar structural organisation patterns as TADs but over the full chromosome range of scales.
This novel multiscale structural decomposition of human chromosomes provides an original framework to question structural organisation and its relationship to functional regulation. It allowed us to reformulate the question of structural domain conservation between different cell lines across the scales: a high level of structural conservation between cell lines up to the largest scales becomes apparent. For example, ∼ 65% of the differentiated cell lines intervalcommunities larger than 600 kb were also found to be structuralcommunities in H1 ES cell line (Fig. 6 a). It was previously noted that there likely exists some links between structural domains and replication domains [23, 25, 27, 49] including the socalled replication timing Udomains [24, 50]. Udomains are bordered by early replicating master replication initiation zones that present similar insulating properties as the ones observed for TADs and intervalcommunities borders (Fig. 4 and Additional file 1: Figure S9) [24]. In Human ES cells, master replication initiation zones are enriched in CTCF and pluripotent transcription factors NANOG and OCT4 that were recently shown to contribute to the overall folding of embryonic stem cells genome via specific longrange contacts [51, 52], and appear to be fundamental determinants of pluripotency maintenance [53, 54]. In particular they are at the heart of the socalled consolidation phenomenon [17, 23, 55, 56] corresponding to early to late transitions from embryonic stem cells to differentiated cells coinciding with the emergence of compact heterochromatin at the nuclear periphery [54]. ES cell line are characterised by smaller replication Udomains [24]. Here we observed in H1 ES cell line an excess of intervalcommunities in the range of scales from ∼500 kb to ∼1.5 Mb as compared to the differentiated cell lines (Fig. 3 b). These domains not observed in differentiated cell lines might be subject to some structural consolidation scenario during cell differentiation, similar to the one described for replication timing domains. For example, the strutural community border present in H1 ES and absent in IMR90 at position ∼84 Mb in Fig. 1 correspond to a replication timing Udomain border specific of ES cell line. Further analysis of the structural consolidation scenario is likely to shed a new light on the role of structural organisation in the epigenetically regulated chromatin reorganisation that underlies the loss of pluripotency and lineage commitment [54]. It was shown that master origins of replication conserved between 6 cell lines are encoded in the DNA sequence via a local enrichment in nucleosome excluding energy barriers [57, 58]. This raises the question whether borders of the conserved structural community borders (Fig. 6) might be specified by a similar genetic mechanism.
A recent HiC experimental study at much higher (kb) resolution has provided some refined partitioning of the human genome by TADs of mean size ∼180 kb [20], much closer to the estimate ∼100kb previously reported in Drosophila [16]. Interestingly, as in Drosophila, these refined TADs seem to have some specific epigenetic chromatin identity that can change dramatically their functional identity in different cell types [16, 20, 59]. Detecting intervalcommunities at higher resolution can provide better quantification of the chromatin state blocks as epigenetic communities. The waveletbased community detection method provides us with a tool to investigate further the existence of some underlying rules for the association of structural/functional domains across scales. The robustness of the proposed protocol with respect to rearranged genomes is a key property to pursue this research.
Abbreviations
 DNA:

Deoxyribonucleic acid
 HiC:

Highthroughput 3C
 H1 ES:

Embryonic stem cell line H1
 TAD:

Topologically associating domains
 3C:

Chromatin conformation capture
Declarations
Funding
This work was supported by Centre national de la Recherche Scientifique (CNRS), Ecole Normale Supérieure de Lyon (ENS de Lyon) and Agence National de la Recherche (ANR14CE270001 GRAPHSIP). BA acknowledges support from Science and Technology Commission of Shanghai Municipality (15520711500), Fondation pour la Recherche Médicale (DEI20151234404) and Agence National de la Recherche (ANR LightComb 20162018). The funding bodies had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Availability of data and materials
The structural domain predictions supporting the conclusions of this article as well as the version of the multiscale community detection (MSCD) algorithm code used for all analyses and a wrapper MATLAB script allowing to reproduce the described pipeline can be downloaded at http://perso.enslyon.fr/benjamin.audit/StructuralCommunities/.
The latest version of MSCD MATLAB source code is freely available at http://www.gipsalab.fr/extasciitildenicolas.tremblay/index.php?page=downloads.
Authors’ contributions
All authors designed the study. REB performed the analysis. NT and PB contributed to elaborate analysis tools. AA, PB and BA supervised the study. BA wrote the manuscript with contributions from all coauthors. All authors read and approved the final manuscript.
Competing interests
All authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
 Cook PR. The organization of replication and transcription. Science. 1999; 284(5421):1790–5.View ArticlePubMedGoogle Scholar
 Cremer T, Cremer C. Chromosome territories, nuclear architecture and gene regulation in mammalian cells. Nat Rev Genet. 2001; 2:292–301.View ArticlePubMedGoogle Scholar
 Berezney R. Regulating the mammalian genome: the role of nuclear architecture. Adv Enzyme Regul. 2002; 42:39–52.View ArticlePubMedGoogle Scholar
 Misteli T. Beyond the sequence: cellular organization of genome function. Cell. 2007; 128(4):787–800. doi:10.1016/j.cell.2007.01.028.View ArticlePubMedGoogle Scholar
 Fraser P, Bickmore W. Nuclear organization of the genome and the potential for gene regulation. Nature. 2007; 447(7143):413–7. doi:10.1038/nature05916.View ArticlePubMedGoogle Scholar
 Arneodo A, Vaillant C, Audit B, Argoul F, d’AubentonCarafa Y, Thermes C. Multiscale coding of genomic information: From DNA sequence to genome structure and function. Phys Rep. 2011; 498:45–188.View ArticleGoogle Scholar
 Luger K, Mäder AW, Richmond RK, Sargent DF, Richmond TJ. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature. 1997; 389(6648):251–60.View ArticlePubMedGoogle Scholar
 Dekker J, Rippe K, Dekker M, Kleckner N. Capturing chromosome conformation. Science. 2002; 295:1306–11.View ArticlePubMedGoogle Scholar
 de Wit E, de Laat W. A decade of 3C technologies: insights into nuclear organization. Genes Dev. 2012; 26(1):11–24. doi:10.1101/gad.179804.111.View ArticlePubMedPubMed CentralGoogle Scholar
 LiebermanAiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, Sandstrom R, Bernstein B, Bender MA, Groudine M, Gnirke A, Stamatoyannopoulos J, Mirny LA, Lander ES, Dekker J. Comprehensive mapping of longrange interactions reveals folding principles of the human genome. Science. 2009; 326(5950):289–93. doi:10.1126/science.1181369.View ArticlePubMedPubMed CentralGoogle Scholar
 Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, Hu M, Liu JS, Ren B. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012; 485(7398):376–80. doi:10.1038/nature11082.View ArticlePubMedPubMed CentralGoogle Scholar
 Hou C, Li L, Qin ZS, Corces VG. Gene density, transcription, and insulators contribute to the partition of the Drosophila genome into physical domains. Mol Cell. 2012; 48(3):471–84. doi:10.1016/j.molcel.2012.08.031.View ArticlePubMedPubMed CentralGoogle Scholar
 Kalhor R, Tjong H, Jayathilaka N, Alber F, Chen L. Genome architectures revealed by tethered chromosome conformation capture and populationbased modeling. Nat Biotechnol. 2012; 30(1):90–8. doi:10.1038/nbt.2057.View ArticleGoogle Scholar
 Moindrot B, Audit B, Klous P, Baker A, Thermes C, de Laat W, Bouvet P, Mongelard F, Arneodo A. 3D chromatin conformation correlates with replication timing and is conserved in resting cells. Nucleic Acids Res. 2012; 40(19):9470–81. doi:10.1093/nar/gks736.View ArticlePubMedPubMed CentralGoogle Scholar
 Nora EP, Lajoie BR, Schulz EG, Giorgetti L, Okamoto I, Servant N, Piolot T, van Berkum NL, Meisig J, Sedat J, Gribnau J, Barillot E, Bluthgen N, Dekker J, Heard E. Spatial partitioning of the regulatory landscape of the Xinactivation centre. Nature. 2012; 485(7398):381–5. doi:10.1038/nature11049.View ArticlePubMedPubMed CentralGoogle Scholar
 Sexton T, Yaffe E, Kenigsberg E, Bantignies F, Leblanc B, Hoichman M, Parrinello H, Tanay A, Cavalli G. Threedimensional folding and functional organization principles of the Drosophila genome. Cell. 2012; 148(3):458–72. doi:10.1016/j.cell.2012.01.010.View ArticlePubMedGoogle Scholar
 Takebayashi SI, Dileep V, Ryba T, Dennis JH, Gilbert DM. Chromatininteraction compartment switch at developmentally regulated chromosomal domains reveals an unusual principle of chromatin folding. Proc Natl Acad Sci USA. 2012; 109(31):12574–9. doi:10.1073/pnas.1207185109.View ArticlePubMedPubMed CentralGoogle Scholar
 Zhang Y, McCord RP, Ho YJ, Lajoie BR, Hildebrand DG, Simon AC, Becker MS, Alt FW, Dekker J. Spatial organization of the mouse genome and its role in recurrent chromosomal translocations. Cell. 2012; 148(5):908–21. doi:10.1016/j.cell.2012.02.002.View ArticlePubMedPubMed CentralGoogle Scholar
 Naumova N, Imakaev M, Fudenberg G, Zhan Y, Lajoie BR, Mirny LA, Dekker J. Organization of the mitotic chromosome. Science. 2013; 342(6161):948–53. doi:10.1126/science.1236083.View ArticlePubMedPubMed CentralGoogle Scholar
 Rao SSP, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, Sanborn AL, Machol I, Omer AD, Lander ES, Aiden EL. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014; 159(7):1665–80. doi:10.1016/j.cell.2014.11.021.View ArticlePubMedGoogle Scholar
 Fudenberg G, Mirny LA. Higherorder chromatin structure: bridging physics and biology. Curr Opin Genet Dev. 2012; 22(2):115–24. doi:10.1016/j.gde.2012.01.006.View ArticlePubMedPubMed CentralGoogle Scholar
 Gibcus JH, Dekker J. The hierarchy of the 3D genome. Mol Cell. 2013; 49(5):773–82. doi:10.1016/j.molcel.2013.02.011.View ArticlePubMedPubMed CentralGoogle Scholar
 Ryba T, Hiratani I, Lu J, Itoh M, Kulik M, Zhang J, Schulz TC, Robins AJ, Dalton S, Gilbert DM. Evolutionarily conserved replication timing profiles predict longrange chromatin interactions and distinguish closely related cell types. Genome Res. 2010; 20(6):761–70. doi:10.1101/gr.099655.109.View ArticlePubMedPubMed CentralGoogle Scholar
 Baker A, Audit B, Chen CL, Moindrot B, Leleu A, Guilbaud G, Rappailles A, Vaillant C, Goldar A, Mongelard F, d’AubentonCarafa Y, Hyrien O, Thermes C, Arneodo A. Replication fork polarity gradients revealed by megabasesized Ushaped replication timing domains in human cell lines. PLoS Comput Biol. 2012; 8(4):1002443. doi:10.1371/journal.pcbi.1002443.View ArticleGoogle Scholar
 Boulos RE, Julienne H, Baker A, Chen CL, Petryk N, Kahli M, d’AubentonCarafa Y, Goldar A, Jensen P, Hyrien O, Thermes C, Arneodo A, Audit B. From the chromatin interaction network to the organization of the human genome into replication N/Udomains. New J Phys. 2014; 16:115014.View ArticleGoogle Scholar
 Le Dily F, Baù D, Pohl A, Vicent GP, Serra F, Soronellas D, Castellano G, Wright RHG, Ballare C, Filion G, MartiRenom MA, Beato M. Distinct structural transitions of chromatin topological domains correlate with coordinated hormoneinduced gene regulation. Genes Dev. 2014; 28(19):2151–62. doi:10.1101/gad.241422.114.View ArticlePubMedPubMed CentralGoogle Scholar
 Pope BD, Ryba T, Dileep V, Yue F, Wu W, Denas O, Vera DL, Wang Y, Hansen RS, Canfield TK, Thurman RE, Cheng Y, Gulsoy G, Dennis JH, Snyder MP, Stamatoyannopoulos JA, Taylor J, Hardison RC, Kahveci T, Ren B, Gilbert DM. Topologically associating domains are stable units of replicationtiming regulation. Nature. 2014; 515(7527):402–5. doi:10.1038/nature13986.View ArticlePubMedPubMed CentralGoogle Scholar
 Chen H, Chen J, Muir LA, Ronquist S, Meixner W, Ljungman M, Ried T, Smale S, Rajapakse I. Functional organization of the human 4D nucleome. Proc Natl Acad Sci USA. 2015; 112(26):8002–7. doi:10.1073/pnas.1505822112.View ArticlePubMedPubMed CentralGoogle Scholar
 Fraser J, Ferrai C, Chiariello AM, Schueler M, Rito T, Laudanno G, Barbieri M, Moore BL, Kraemer DCA, Aitken S, Xie SQ, Morris KJ, Itoh M, Kawaji H, Jaeger I, Hayashizaki Y, Carninci P, Forrest ARR, FANTOM Consortium, Semple CA, Dostie J, Pombo A, Nicodemi M. Hierarchical folding and reorganization of chromosomes are linked to transcriptional changes in cellular differentiation. Mol Syst Biol. 2015; 11(12):852.View ArticlePubMedPubMed CentralGoogle Scholar
 Liu L, Zhang Y, Feng J, Zheng N, Yin J, Zhang Y. GeSICA: genome segmentation from intrachromosomal associations. BMC Genomics. 2012; 13:164. doi:10.1186/1471216413164.View ArticlePubMedPubMed CentralGoogle Scholar
 Filippova D, Patro R, Duggal G, Kingsford C. Identification of alternative topological domains in chromatin. Algorithms Mol Biol. 2014; 9:14. doi:10.1186/17487188914.View ArticlePubMedPubMed CentralGoogle Scholar
 LévyLeduc C, Delattre M, MaryHuard T, Robin S. Twodimensional segmentation for analyzing HiC data. Bioinformatics. 2014; 30(17):386–92. doi:10.1093/bioinformatics/btu443.View ArticleGoogle Scholar
 Cabreros I, Abbe E, Tsirigos A. Detecting Community Structures in HiC Genomic Data. 2015. arXiv:1509.05121 [qbio.GN]. http://arxiv.org/abs/1509.05121.
 Chen J, 3rd Hero AO, Rajapakse I. Spectral identification of topological domains. Bioinformatics. 2016; 32(14):2151–158. doi:10.1093/bioinformatics/btw221.
 Negrini S, Gorgoulis VG, Halazonetis TD. Genomic instability–an evolving hallmark of cancer. Nat Rev Mol Cell Biol. 2010; 11(3):220–8. doi:10.1038/nrm2858.View ArticlePubMedGoogle Scholar
 Botta M, Haider S, Leung IXY, Lio P, Mozziconacci J. Intra and interchromosomal interactions correlate with CTCF binding genome wide. Mol Syst Biol. 2010; 6:426. doi:10.1038/msb.2010.79.View ArticlePubMedPubMed CentralGoogle Scholar
 Sandhu KS, Li G, Poh HM, Quek YLK, Sia YY, Peh SQ, Mulawadi FH, Lim J, Sikic M, Menghi F, Thalamuthu A, Sung WK, Ruan X, Fullwood MJ, Liu E, Csermely P, Ruan Y. Largescale functional organization of longrange chromatin interaction networks. Cell Rep. 2012; 2(5):1207–19. doi:10.1016/j.celrep.2012.09.022.View ArticlePubMedPubMed CentralGoogle Scholar
 Boulos RE, Arneodo A, Jensen P, Audit B. Revealing longrange interconnected hubs in human chromatin interaction data using graph theory. Phys Rev Lett. 2013; 111:118102.View ArticlePubMedGoogle Scholar
 Fortunato S. Community detection in graphs. Phys Rep. 2010; 486:75–174.View ArticleGoogle Scholar
 Tremblay N, Borgnat P. Graph wavelets for multiscale community mining. IEEE Trans Signal Process. 2014; 62(20):5227–39. doi:10.1109/TSP.2014.2345355.View ArticleGoogle Scholar
 Hammond DK, Vandergheynst P, Gribonval R. Wavelets on graphs via spectral graph theory. Appl Comput Harmon Anal. 2011; 30:129–50.View ArticleGoogle Scholar
 King B. Stepwise clustering procedures. J Amer Statist Assoc. 1967; 62(317):86–101.View ArticleGoogle Scholar
 Jain A, Murty M, Flynn P. Data clustering: A review. ACM Comput Surv (CSUR). 1999; 31(3):264–323.View ArticleGoogle Scholar
 Tremblay N, Puy G, Borgnat P, Gribonval R, Vandergheynst P. Accelerated spectral clustering using graph filtering of random signals. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, 2025 March 2016. Shangai: IEEE: 2016. doi:10.1109/ICASSP.2016.7472447.Google Scholar
 Tremblay N, Puy G, Gribonval R, Vandergheynst P. Compressive spectral clustering. In: Proceedings of the Thirtythird International Conference on Machine Learning (ICML 2016), New York. JMLR Workshop and Conference Proceedings 2016;48:1002–11.
 Sexton T, Cavalli G. The role of chromosome domains in shaping the functional genome. Cell. 2015; 160(6):1049–59. doi:10.1016/j.cell.2015.02.040.View ArticlePubMedGoogle Scholar
 Kaplan N, Dekker J. Highthroughput genome scaffolding from in vivo DNA interaction frequency. Nat Biotechnol. 2013; 31(12):1143–47. doi:10.1038/nbt.2768.View ArticlePubMedPubMed CentralGoogle Scholar
 MarieNelly H, Marbouty M, Cournac A, Flot JF, Liti G, Parodi DP, Syan S, Guillén N, Margeot A, Zimmer C, Koszul R. Highquality genome (re)assembly using chromosomal contact data. Nat Commun. 2014; 5:5695. doi:10.1038/ncomms6695.View ArticlePubMedPubMed CentralGoogle Scholar
 Boulos RE, Drillon G, Argoul F, Arneodo A, Audit B. Structural organization of human replication timing domains. FEBS Lett. 2015; 589(20 Pt A):2944–57. doi:10.1016/j.febslet.2015.04.015.View ArticlePubMedGoogle Scholar
 Audit B, Baker A, Chen CL, Rappailles A, Guilbaud G, Julienne H, Goldar A, d’AubentonCarafa Y, Hyrien O, Thermes C, Arneodo A. Multiscale analysis of genomewide replication timing profiles using a waveletbased signalprocessing algorithm. Nat Protoc. 2013; 8(1):98–110. doi:10.1038/nprot.2012.145.View ArticlePubMedGoogle Scholar
 Denholtz M, Bonora G, Chronis C, Splinter E, de Laat W, Ernst J, Pellegrini M, Plath K. Longrange chromatin contacts in embryonic stem cells reveal a role for pluripotency factors and polycomb proteins in genome organization. Cell Stem Cell. 2013; 13(5):602–16. doi:10.1016/j.stem.2013.08.013.View ArticlePubMedGoogle Scholar
 de Wit E, Bouwman BAM, Zhu Y, Klous P, Splinter E, Verstegen MJAM, Krijger PHL, Festuccia N, Nora EP, Welling M, Heard E, Geijsen N, Poot RA, Chambers I, de Laat W. The pluripotent genome in three dimensions is shaped around pluripotency factors. Nature. 2013; 501(7466):227–31. doi:10.1038/nature12420.View ArticlePubMedGoogle Scholar
 Julienne H, Zoufir A, Audit B, Arneodo A. Epigenetic regulation of the human genome: coherence between promoter activity and largescale chromatin environment. Front Life Sci. 2013; 7(12):44–62. doi:10.1080/21553769.2013.832706.View ArticleGoogle Scholar
 Julienne H, Audit B, Arneodo A. Embryonic stem cell specific “master” replication origins at the heart of the loss of pluripotency. PLoS Comput Biol. 2015; 11(2):1003969. doi:10.1371/journal.pcbi.1003969.View ArticleGoogle Scholar
 Hiratani I, Ryba T, Itoh M, Yokochi T, Schwaiger M, Chang CW, Lyou Y, Townes TM, Schubeler D, Gilbert DM. Global reorganization of replication domains during embryonic stem cell differentiation. PLoS Biol. 2008; 6(10):245. doi:10.1371/journal.pbio.0060245.View ArticleGoogle Scholar
 Hiratani I, Ryba T, Itoh M, Rathjen J, Kulik M, Papp B, Fussner E, BazettJones DP, Plath K, Dalton S, Rathjen PD, Gilbert DM. Genomewide dynamics of replication timing revealed by in vitro models of mouse embryogenesis. Genome Res. 2010; 20(2):155–69. doi:10.1101/gr.099796.109.View ArticlePubMedPubMed CentralGoogle Scholar
 Drillon G, Audit B, Argoul F, Arneodo A. Ubiquitous human ‘master’ origins of replication are encoded in the DNA sequence via a local enrichment in nucleosome excluding energy barriers. J Phys Condens Matter. 2015; 27(6):064102. doi:10.1088/09538984/27/6/064102.View ArticlePubMedGoogle Scholar
 Drillon G, Audit B, Argoul F, Arneodo A. Evidence of selection for an accessible nucleosomal array in human. BMC Genomics. 2016; 17:526.View ArticlePubMedPubMed CentralGoogle Scholar
 Ciabrelli F, Cavalli G. Chromatindriven behavior of topologically associating domains. J Mol Biol. 2015; 427:248–58. doi:10.1016/j.jmb.2014.09.013.View ArticleGoogle Scholar