Dynamic epigenetic mode analysis using spatial temporal clustering
© The Author(s) 2016
Published: 23 December 2016
Differentiation of human embryonic stem cells requires precise control of gene expression that depends on specific spatial and temporal epigenetic regulation. Recently available temporal epigenomic data derived from cellular differentiation processes provide an unprecedented opportunity to characterize fundamental properties of epigenomic dynamics and to reveal the regulatory roles of epigenetic modifications.
This paper presents a spatial temporal clustering approach, named STCluster, which exploits the temporal variation of epigenomes to characterize dynamic epigenetic modes during cellular differentiation. The approach identifies significant spatial temporal patterns of epigenetic modifications along human embryonic stem cell differentiation and clusters regulatory sequences by these patterns.
The results show that this approach is effective in capturing epigenetic modification patterns associated with specific cell types. In addition, STCluster allows straightforward identification of coherent epigenetic modes shared by multiple cell types, which makes it possible to establish the most conserved epigenetic signatures during cellular differentiation.
An epigenome consists of DNA methylation together with chemical modifications and variants of histones and other proteins that package the genome [1, 2]. These epigenetic modifications contribute crucially to the maintenance of chromatin structure and to the regulation of gene expression. The modifications interact in various ways and act combinatorially to orchestrate gene expression in different cell types. When heritable from one cell generation to the next, epigenetic information can bring about lasting changes in gene expression [5, 6].
Embryonic development is a complex process that requires precise gene regulation to govern developmental decisions during cellular differentiation [7, 8]. However, how gene expression is regulated and maintained along developmental transitions remains to be understood. It is well accepted that transcription factors binding to cis-regulatory sequences coordinately regulate gene expression in response to various environmental cues. In contrast, the regulatory functions of the epigenetic modifications that accompany embryogenesis are largely unexplored. To investigate the mechanisms of epigenetic regulation during cellular differentiation, extensive research efforts have produced genome-wide maps of epigenetic modifications at multiple developmental time points. Human embryonic stem cells were differentiated into a variety of precursor cell types [3, 10], including mesendoderm, trophoblast-like cells, mesenchymal stem cells and neural progenitor cells. Mouse embryonic stem cells were also differentiated into mesendoderm cells. The availability of these temporal epigenomic data provides a unique opportunity to characterize fundamental properties of epigenome dynamics and to reveal the regulatory roles of epigenetic modifications.
To establish combinatorial patterns of epigenetic modifications, previous computational methods primarily used the spatial information of epigenetic marks. For example, Chromasig was designed to study histone modification patterns using correlations of histone signals. CoSBI also used correlations within 5 Kbp genomic segments to search exhaustively for histone codes. ChromHMM applied a hidden Markov model to annotate genomic sequences by the co-occurrence of multiple epigenetic marks. RFECS was developed based on random forests. AWNFR explored epigenomic landscapes using wavelet transforms. Although these methods successfully identify combinatorial epigenetic modes from spatial epigenomic information, there is still an urgent need to exploit newly produced temporal information to study the dynamic patterns and functions of epigenetic modifications.
In this study, we developed a spatial temporal clustering approach that exploits the temporal variation of epigenomes along the differentiation process, aiming to characterize the dynamic properties of epigenetic modifications. The approach identifies significant spatial temporal patterns of epigenetic modifications during embryogenesis and clusters regulatory sequences by these patterns. The results may shed light on how epigenetic modifications evolve over time and how their spatial temporal patterns regulate gene expression during cellular differentiation.
In mammals, studying the epigenetic mechanisms of early embryonic development often requires access to embryonic cell types. In recent studies, to analyze early human developmental decisions, human embryonic stem cells (hESCs) were differentiated into trophoblast-like cells, mesendoderm, mesenchymal stem cells, and neural progenitor cells [3, 10]. The first three states represent developmental events that mirror critical developmental decisions in the embryo. Mesenchymal stem cells are fibroblastoid cells that are capable of expansion and multi-lineage differentiation to bone, cartilage, adipose, muscle, and connective tissues. In these cell types, genome-wide maps of the main epigenetic marks have been generated using ChIP-seq. The profiled epigenetic modifications are H3K4me1/2/3, H3K36me3, H3K9me3, H3K27me3, H3K79me1, H2AK5ac, H2BK120ac, H2BK5ac, H3K18ac, H3K23ac, H3K27ac, H3K4ac, H3K9ac and H4K8ac. RNA expression profiles of these five cell types were also generated using Affymetrix GeneChip arrays. We downloaded these datasets from the website of the NIH Roadmap Epigenome Project (http://www.epigenomebrowser.org/).
General scheme of the STCluster algorithm
Step 1. Data transformation
The whole human genome was divided into non-overlapping 200 bp bins. For each epigenetic modification map, we first computed the summary tag count of every bin. Then, in each cell type, the raw sequence read counts of each epigenetic modification were normalized by the total number of reads and arcsine-transformed, to remove noise resulting from spurious tag counts in the ChIP-seq experiments. Further, we divided the whole genome into 5 Kbp genomic regions. In this way, for each cell type, the profile of the epigenetic modifications in region i is represented as a matrix R_i, where i ranges from 1 to N and N is the total number of genomic regions under consideration, as shown in Fig. 1a. Each R_i has K rows, one per epigenetic modification, and B columns, one per 200 bp bin (B = 25 for a 5 Kbp region). The column vectors correspond to the combinatorial epigenetic modification tag counts within individual genomic bins, and the row vectors correspond to the contiguous genomic landscape of individual epigenetic modifications.
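The transformation in Step 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `transform_counts` is hypothetical, the input is assumed to be a K x (number of bins) array of per-bin tag counts for one cell type, and the arcsine transformation is assumed to be the common arcsine square-root form applied to read fractions, which the text does not specify.

```python
import numpy as np

def transform_counts(bin_counts, bins_per_region=25):
    """Normalize per-bin tag counts and cut them into region matrices.

    bin_counts: array of shape (K, n_bins), one row per modification,
                one column per 200 bp bin (assumed input layout).
    Returns an array of shape (N, K, B): one K x B matrix per region.
    """
    counts = np.asarray(bin_counts, dtype=float)
    # Normalize each modification by its total number of reads.
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty tracks
    fractions = counts / totals
    # Assumption: arcsine square-root transform of the read fractions.
    transformed = np.arcsin(np.sqrt(fractions))
    # Group consecutive bins into regions (25 bins of 200 bp = 5 Kbp).
    k, n_bins = transformed.shape
    n_regions = n_bins // bins_per_region
    trimmed = transformed[:, : n_regions * bins_per_region]
    return trimmed.reshape(k, n_regions, bins_per_region).transpose(1, 0, 2)
```

Calling `transform_counts(counts)[i]` then yields the matrix for region i, with modifications as rows and bins as columns. Trailing bins that do not fill a complete region are dropped in this sketch.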
Step 2. Construct co-occurrence graph for each cell type
In this step, we computed the correlation coefficients of epigenetic modification pairs in each region and then constructed the corresponding co-occurrence graph for each cell type. Given the processed and organized epigenetic modification data of a cell type, the correlation coefficient of every pair of histone modifications was calculated in every region to obtain a coefficient matrix. If the coefficient of a pair exceeds a given threshold, the two epigenetic modifications are regarded as coherent in that region, and the region is added to the region set of the pair. Based on the coefficient matrix, we constructed the co-occurrence graph, modeled as an undirected graph G = (V, E), where V is the set of all histone modifications. For any two epigenetic modification types h_i and h_j (i ≠ j), an edge e ∈ E exists between vertices h_i and h_j if they are correlated in at least one region. Each edge in the co-occurrence graph is associated with its region set. Figure 1b shows an example of a co-occurrence graph. Here, we set the correlation coefficient threshold to 0.9 to obtain high-quality spatial clusters.
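A co-occurrence graph of this kind could be built as in the sketch below. It assumes Pearson correlation (the text only says "correlation coefficients") and stores the graph as a dictionary from an unordered pair of modifications to its region set; the function name `build_cooccurrence_graph` and this representation are illustrative choices, not taken from the paper.

```python
import numpy as np
from itertools import combinations

def build_cooccurrence_graph(regions, marks, threshold=0.9):
    """Return {frozenset({mark_a, mark_b}): set of region indices}.

    regions: iterable of K x B matrices (one per genomic region).
    marks:   list of K modification names, in row order.
    """
    edges = {}
    for idx, region in enumerate(regions):
        region = np.asarray(region, dtype=float)
        # Rows with zero variance have an undefined correlation; skip them.
        std = region.std(axis=1)
        with np.errstate(invalid="ignore", divide="ignore"):
            corr = np.corrcoef(region)  # assumption: Pearson correlation
        for a, b in combinations(range(len(marks)), 2):
            if std[a] > 0 and std[b] > 0 and corr[a, b] > threshold:
                pair = frozenset((marks[a], marks[b]))
                edges.setdefault(pair, set()).add(idx)
    return edges
```

An edge exists exactly when its region set is non-empty, so the dictionary keys play the role of E and the values hold the region set attached to each edge.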
Step 3. Mine spatial clusters from co-occurrence graph
The co-occurrence graph compactly represents all the correlated epigenetic modifications in the different regions. It can be used to mine potential spatial clusters corresponding to each developmental stage and to filter out most of the unrelated data. The STCluster algorithm applies a depth-first search (DFS) strategy on the co-occurrence graph to mine all the spatial clusters. A typical spatial cluster represents a group of genomic regions that share spatial epigenetic patterns. To retain only significant epigenetic states, we set the minimum number of histone modifications to 5 and the minimum percentage of regions to 0.1%. For each cell type, we identified a set of spatial clusters, as shown in Fig. 1c.
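One way to read this step is as a depth-first enumeration of cliques in the co-occurrence graph, where a clique is extended only while the intersection of its edge region sets stays large enough. The sketch below follows that reading; it is an assumption about the mining procedure rather than the published algorithm, and the names `mine_spatial_clusters`, `min_marks` and `min_regions` are hypothetical. The 0.1% threshold would correspond to `min_regions` set to 0.1% of the number of regions N.

```python
def mine_spatial_clusters(edges, min_marks=5, min_regions=1):
    """Enumerate maximal spatial clusters by depth-first search.

    edges: {frozenset({mark_a, mark_b}): set of region indices}.
    Returns a list of (marks, regions) pairs of frozensets, where all
    marks are pairwise correlated in every listed region.
    """
    marks = sorted({m for pair in edges for m in pair})
    found = []

    def dfs(clique, regions, start):
        if regions is not None and len(clique) >= min_marks:
            found.append((frozenset(clique), frozenset(regions)))
        for i in range(start, len(marks)):
            cand = marks[i]
            support = regions
            for member in clique:
                pair = edges.get(frozenset((cand, member)), set())
                support = pair if support is None else support & pair
                if len(support) < min_regions:
                    break  # prune: too few shared regions
            else:
                dfs(clique + [cand], support, i + 1)

    dfs([], None, 0)
    # Keep only clusters not dominated by a larger cluster that covers
    # at least the same regions.
    return [
        (m, r) for m, r in found
        if not any(m < m2 and r <= r2 for m2, r2 in found)
    ]
```

Because the region support can only shrink as modifications are added, pruning a branch as soon as it falls below `min_regions` discards no valid cluster.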
Step 4. Identify spatial-temporal clusters from spatial clusters
After obtaining the maximal spatial cluster set for every cell type, we used these sets to mine the maximal spatial-temporal clusters. This is accomplished by enumerating the subsets of the time points (Fig. 1d), using a process similar to the clique mining used for spatial clusters. The regions in each spatial-temporal cluster exhibit similar changes of epigenetic modifications during the cellular differentiation process. Spatial-temporal clusters indicate specific conserved chromatin signatures that are shared by multiple time points along embryonic stem cell differentiation.
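The enumeration over subsets of time points might look like the following sketch. It assumes that a spatial-temporal cluster is obtained by intersecting both the modification sets and the region sets of spatial clusters from the selected cell types, which is one plausible interpretation of the description above; the function name and parameters are hypothetical.

```python
from itertools import combinations

def mine_spatial_temporal_clusters(clusters_by_cell, min_marks=5,
                                   min_regions=1, min_cells=2):
    """Intersect spatial clusters across subsets of cell types.

    clusters_by_cell: {cell_type: [(marks, regions), ...]} where marks
                      and regions are frozensets.
    Returns {tuple of cell types: set of (marks, regions)}.
    """
    cells = sorted(clusters_by_cell)
    results = {}
    for size in range(min_cells, len(cells) + 1):
        for subset in combinations(cells, size):
            current = set(clusters_by_cell[subset[0]])
            for cell in subset[1:]:
                merged = set()
                for marks_a, regions_a in current:
                    for marks_b, regions_b in clusters_by_cell[cell]:
                        marks = marks_a & marks_b
                        regions = regions_a & regions_b
                        if len(marks) >= min_marks and len(regions) >= min_regions:
                            merged.add((marks, regions))
                current = merged
                if not current:
                    break  # nothing is shared by this subset
            if current:
                results[subset] = current
    return results
```

A pattern reported for a subset of all five cell types would correspond to a signature conserved throughout differentiation, while patterns that appear only for smaller subsets are specific to those cell types. The exhaustive enumeration is exponential in the number of time points, which is manageable for the five cell types considered here.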
Results and Discussion
Identifying combinatorial epigenetic states during differentiation
To investigate combinatorial epigenetic states during the differentiation of embryonic stem cells, we applied STCluster to the genome-wide epigenetic modification maps of five cell types: H1, Mesendoderm, Trophoblast-like cells, Mesenchymal stem cells and Neuronal progenitor cells. STCluster first grouped genomic regions by the spatial patterns of their epigenetic modifications to identify spatial clusters. For each cell type, we set the minimum number of histone modifications to 5 and the minimum percentage of regions to 0.1%, which allows us to capture patterns that involve at least five epigenetic modifications and recur across at least 0.1% of the human genome. With this parameter setting, we identified 3344, 667, 4726, 1422 and 1984 spatial clusters in H1, Mesendoderm, Trophoblast-like cells, Mesenchymal stem cells and Neuronal progenitor cells, respectively.
The top 10 co-occurred epigenetic modifications in different cell types during the differentiation process
Mesenchymal stem cells
Neuronal progenitor cells
H3K4me2,H3K4me1, H2aK5ac,H3K27ac, H2bK5ac,H3K23ac, H3K9me3,H3K79me1, H3K4ac,H3K9ac, H2bK120ac
H3K4me2,H3K4me1, H3K27ac,H3K23ac, H3K9ac,H3K18ac
H2aK5ac,H2bK5ac, H3K27me3,H3K23ac, H3K9me3,H3K79me1, H3K4ac,H3K9ac, H2bK120ac,H3K18ac
H3K4me2,H3K36me3, H3K27ac,H2bK5ac, H3K23ac,H3K9me3, H3K4ac,H3K9ac, H2bK120ac
H3K4me1,H3K36me3, H2aK5ac,H3K27ac, H2bK5ac,H3K27me3, H3K23ac,H3K79me1, H3K9ac,H2bK120ac
H2aK5ac,H2bK5ac, H3K23ac,H3K79me1, H3K4ac,H2bK120ac
H3K4me2,H3K4me1, H3K27ac,H3K23ac, H3K9ac,H2bK120ac
H3K4me1,H2bK5ac, H3K27me3,H3K23ac, H3K9me3,H3K79me1, H3K4ac,H3K9ac, H2bK120ac,H3K18ac
H3K4me2,H3K36me3, H2bK5ac,H3K23ac, H3K4ac,H2bK120ac
H3K4me1,H3K36me3, H3K27ac,H2bK5ac, H3K27me3,H3K9me3, H3K79me1,H3K9ac, H2bK120ac
H2bK5ac,H3K27me3, H3K23ac,H3K79me1, H3K4ac,H3K9ac
H2aK5ac,H2bK5ac, H3K79me1,H3K4ac, H2bK120ac,H3K18ac
H3K27ac,H2bK5ac, H3K27me3,H3K23ac, H3K9me3,H3K18ac
H3K36me3,H2aK5ac, H2bK5ac,H3K23ac, H3K9me3,H3K4ac H3K9ac
H3K4me2,H2aK5ac, H3K23ac,H3K9me3, H3K4ac,H2bK120ac
H3K4me2,H3K4me1, H2aK5ac,H3K27ac, H2bK5ac,H3K27me3, H3K23ac,H3K79me1, H3K4ac,H3K9ac,H3K18ac
H3K4me1,H3K36me3, H3K27ac,H3K23ac, H3K79me1,H3K4ac, H3K9ac,H2bK120ac, H3K18ac
H3K27ac,H2bK5ac, H3K27me3,H3K23ac, H3K9me3,H2bK120ac
H3K36me3,H2aK5ac, H2bK5ac,H3K23ac, H3K4ac,H3K9ac
H3K4me2,H2aK5ac, H3K23ac,H3K9me3, H3K9ac,H2bK120ac
H2bK5ac,H3K27me3, H3K23ac,H3K79me1, H3K4ac,H3K18ac
H3K27ac,H3K27me3, H3K9me3,H3K4ac, H2bK120ac,H3K18ac
H3K27ac,H2bK5ac, H3K27me3,H3K23ac, H3K9me3,H3K9ac
H3K4me2,H3K36me3, H2bK5ac,H3K23ac, H3K9me3,H3K9ac
H3K27ac,H2bK5ac, H3K27me3,H3K9me3, H3K9ac,H2bK120ac
H2aK5ac,H2bK5ac, H3K9me3,H3K79me1, H3K4ac,H2bK120ac
H3K4me2,H3K27ac, H3K9me3,H3K4ac, H2bK120ac,H3K18ac
H3K27ac,H2bK5ac, H3K27me3,H3K23ac, H3K9me3,H3K79me1
H3K36me3,H2aK5ac, H2bK5ac,H3K9me3, H3K4ac,H2bK120ac
H3K27ac,H2bK5ac, H3K27me3,H3K9me3, H3K4ac,H2bK120ac
H2aK5ac,H2bK5ac, H3K23ac,H3K79me1, H3K4ac,H3K9ac
H3K4me2,H3K4me1, H3K27ac,H3K27me3, H3K9me3,H3K18ac
H3K27ac,H2bK5ac, H3K27me3,H3K23ac, H3K9me3,H3K4ac
H3K4me2,H3K36me3, H2bK5ac,H3K23ac, H3K9me3,H3K4ac
H3K4me2,H3K4me1, H3K23ac,H3K9me3, H3K9ac,H2bK120ac
H3K4me2,H2aK5ac, H3K27ac,H2bK5ac, H3K27me3,H3K23ac, H3K9me3,H3K79me1, H3K4ac,H3K9ac,H3K18ac
H3K4me2,H3K4me1, H3K27ac,H3K27me3, H3K4ac,H3K18ac
H3K4me2,H3K27ac, H3K27me3,H3K23ac, H3K9me3,H3K9ac
H3K4me2,H3K36me3, H2bK5ac,H3K23ac, H3K4ac,H3K9ac, H2bK120ac
H3K4me2,H3K4me1, H3K23ac,H3K9me3, H3K4ac,H2bK120ac
H3K4me2,H3K4me1, H2aK5ac,H3K27ac, H2bK5ac,H3K23ac, H3K9me3,H3K79me1, H3K4ac,H3K9ac H3K18ac
H3K4me2,H3K4me1, H3K27ac,H3K27me3, H3K4ac,H2bK120ac
H3K4me1,H3K27ac, H3K27me3,H3K23ac, H3K9me3,H3K79me1
H3K4me2,H3K4me1, H3K9me3,H3K4ac, H3K9ac,H2bK120ac
H3K4me2,H3K4me1, H3K36me3,H2aK5ac, H3K27me3,H3K23ac, H3K79me1,H3K9ac, H2bK120ac
H2aK5ac,H2bK5ac, H3K9me3,H3K79me1, H3K4ac,H3K9ac
H3K4me2,H3K27ac, H3K27me3,H3K4ac, H2bK120ac,H3K18ac
H3K4me1,H3K27ac, H3K27me3,H3K23ac, H3K9me3,H3K79me1
H3K4me2,H3K36me3, H2bK5ac,H3K23ac, H3K9me3,H2bK120ac
H3K4me2,H3K4me1, H2aK5ac,H3K27ac, H3K9me3,H3K9ac, H2bK120ac
Identifying conserved epigenetic states during the differentiation of ES cells
Detailed information on all identified spatial temporal clusters is listed in Additional file 1. The results indicate that conserved epigenetic states exist during the differentiation process. We observed high co-occurrence and stable patterns of H3K4me2 with H3K23ac, H3K18ac, H3K27ac and H3K9ac at all five stages of the differentiation process. This observation is consistent with the previous finding that H3K4me2 is one of the backbone epigenetic modifications, along with H3K27ac and H3K9ac [18, 24]. In contrast, some epigenetic modification patterns are coherent only in certain cell types. For example, the variation pattern of the epigenetic modifications <H3K27me3, H3K9me3, H3K79me1, H3K4ac, H3K9ac> is shared by four cell types but not by Mesendoderm. The pattern <H3K4me2, H3K23ac, H3K27ac, H3K27me3, H3K79me1, H3K18ac> is shared only by H1, Mesendoderm and Trophoblast-like cells, and not by Mesenchymal stem cells or Neuronal progenitor cells. Notably, the identified spatial temporal clusters reveal more details of the differentiation process.
Analyzing the regulatory roles of epigenetic modifications during differentiation
Identifying epigenomic dynamics is important for understanding the mechanisms of gene regulation. Our knowledge of the temporal patterns of epigenetic modifications and their consequences is still limited. There is an urgent need to develop new computational approaches that exploit complex epigenomic landscapes and discover significant signatures in them. In this study, we developed a spatial temporal clustering algorithm to explore the epigenomic landscapes of five cell types during embryonic stem cell differentiation. Using this approach, we identified spatial temporal patterns of epigenetic modifications in early embryogenesis. Unlike previous computational methods, our approach is designed to investigate dynamic epigenetic landscapes as well as combinatorial epigenetic modes. The experimental results demonstrate that the proposed STCluster algorithm successfully captures dynamic epigenetic modification patterns associated with specific cell types. In addition, STCluster allows straightforward identification of epigenetic conservation across multiple developmental stages during cell differentiation.
The authors are grateful to the NIH Roadmap Epigenome Project for providing the genome-wide data used in this work. We thank the anonymous reviewers for their useful comments on the manuscript.
This article has been published as part of BMC Bioinformatics Volume 17 Supplement 17, 2016: Proceedings of the 27th International Conference on Genome Informatics: bioinformatics. The full contents of the supplement are available online at http://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-17-supplement-17.
This work and the publication costs were supported in part by the National Natural Science Foundation of China (61272380, 61300100, 61303096), Shanghai Natural Science Foundation (13ZR1451000, 13ZR1454600, 15ZR1400900), Chen Guang project supported by Shanghai Municipal Education Commission and Shanghai Education Development Foundation, and the Fundamental Research Funds for the Central Universities (16D111208).
Availability of data and materials
Datasets used in this study can be accessed at the following URL http://www.tongjidmb.com/human/index.html.
YLG and JHG have designed the clustering algorithm. YLG and HT have implemented and tested the method. GBZ and CRY have coordinated data preprocessing. All authors have read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
No ethics approval was required for this work, as all results reported have already been published elsewhere.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
1. Bernstein BE, Meissner A, Lander ES. The mammalian epigenome. Cell. 2007; 128(4):669–81.
2. Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K. High-resolution profiling of histone methylations in the human genome. Cell. 2007; 129(4):823–37.
3. Gifford CA, Ziller MJ, Gu H, Trapnell C, Donaghey J, Tsankov A, Shalek AK, Kelley DR, Shishkin AA, Issner R, et al. Transcriptional and epigenetic dynamics during specification of human embryonic stem cells. Cell. 2013; 153(5):1149–1163.
4. Chen T, Dent SY. Chromatin modifiers and remodellers: regulators of cellular differentiation. Nat Rev Genet. 2014; 15(2):93–106.
5. Kouzarides T. Chromatin modifications and their function. Cell. 2007; 128(4):693–705.
6. Zhou VW, Goren A, Bernstein BE. Charting histone modifications and the functional organization of mammalian genomes. Nat Rev Genet. 2011; 12(1):7–18.
7. Vastenhouw NL, Schier AF. Bivalent histone modifications in early embryogenesis. Curr Opin Cell Biol. 2012; 24(3):374–86.
8. Young RA. Control of the embryonic stem cell state. Cell. 2011; 144(6):940–54.
9. Tsankov AM, Gu H, Akopian V, Ziller MJ, Donaghey J, Amit I, Gnirke A, Meissner A. Transcription factor binding dynamics during human es cell differentiation. Nature. 2015; 518(7539):344–9.
10. Xie W, Schultz MD, Lister R, Hou Z, Rajagopal N, Ray P, Whitaker JW, Tian S, Hawkins RD, Leung D, et al. Epigenomic analysis of multilineage differentiation of human embryonic stem cells. Cell. 2013; 153(5):1134–1148.
11. Yu P, Pan G, Yu J, Thomson JA. Fgf2 sustains nanog and switches the outcome of bmp4-induced human embryonic stem cell differentiation. Cell Stem Cell. 2011; 8(3):326–34.
12. Xu RH, Chen X, Li DS, Li R, Addicks GC, Glennon C, Zwaka TP, Thomson JA. Bmp4 initiates human embryonic stem cell differentiation to trophoblast. Nat Biotechnol. 2002; 20(12):1261–1264.
13. Vodyanik MA, Yu J, Zhang X, Tian S, Stewart R, Thomson JA, Slukvin II. A mesoderm-derived precursor for mesenchymal stem and endothelial cells. Cell Stem Cell. 2010; 7(6):718–29.
14. Chen G, Gulbranson DR, Hou Z, Bolin JM, Ruotti V, Probasco MD, Smuga-Otto K, Howden SE, Diol NR, Propson NE, et al. Chemically defined conditions for human ipsc derivation and culture. Nat Methods. 2011; 8(5):424–9.
15. Yu P, Xiao S, Xin X, Song CX, Huang W, McDee D, Tanaka T, Wang T, He C, Zhong S. Spatiotemporal clustering of the epigenome reveals rules of dynamic gene regulation. Genome Res. 2013; 23(2):352–64.
16. Hon G, Ren B, Wang W. Chromasig: a probabilistic approach to finding common chromatin signatures in the human genome. PLoS Comput Biol. 2008; 4(10):1000201.
17. Ucar D, Hu Q, Tan K. Combinatorial chromatin modification patterns in the human genome revealed by subspace clustering. Nucleic Acids Res. 2011; 39(10):4063–75.
18. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011; 473(7345):43–9.
19. Rajagopal N, Xie W, Li Y, Wagner U, Wang W, Stamatoyannopoulos J, Ernst J, Kellis M, Ren B. Rfecs: a random-forest based algorithm for enhancer identification from chromatin state. PLoS Comput Biol. 2013; 9(3):1002968.
20. Nguyen N, Vo A, Won KJ. A wavelet-based method to exploit epigenomic language in the regulatory region. Bioinformatics. 2014; 30(7):908–14.
21. Furey TS. Chip–seq and beyond: new and improved methodologies to detect and characterize protein–dna interactions. Nat Rev Genet. 2012; 13(12):840–52.
22. Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A, Kellis M, Marra MA, Beaudet AL, Ecker JR, et al. The nih roadmap epigenomics mapping consortium. Nat Biotechnol. 2010; 28(10):1045–1048.
23. Pinello L, Xu J, Orkin SH, Yuan GC. Analysis of chromatin-state plasticity identifies cell-type–specific regulators of h3k27me3 patterns. Proc Natl Acad Sci. 2014; 111(3):344–53.
24. Wang Z, Zang C, Rosenfeld JA, Schones DE, Barski A, Cuddapah S, Cui K, Roh TY, Peng W, Zhang MQ, et al. Combinatorial patterns of histone acetylations and methylations in the human genome. Nat Genet. 2008; 40(7):897–903.
25. Karlić R, Chung HR, Lasserre J, Vlahoviček K, Vingron M. Histone modification levels are predictive for gene expression. Proc Natl Acad Sci. 2010; 107(7):2926–931.