### Gene-level Summary Score

Gene annotation was based on Refseq; we obtained 17,772 genes in total. For four of the five modifications (H3K4me2, H3K4me3, H3K27me3 and H3K9me3), the tag counts peak near the TSS; therefore the average was taken over the regions from -2 kb to +2 kb with respect to the TSS. The tag counts for H3K36me3 are highest around the 3'-end of the coding region of a gene; thus, we took the average tag counts over this area as a gene's corresponding score. Genes with more than 50% repetitive sequences in either of these two regions were not used in further analysis. Our preliminary data manipulations led to the elimination of sites with poor ChIP-seq coverage, resulting 17,469 (98.3%) genes used in further analysis.

### Clustering

With the gene-specific summary scores, each gene is associated with an

*m*-dimensional vector, where

*m* is the number of histone modification marks. As an initial guess for the number of domain types, we clustered the

*m*-dimensional vectors using the

*k*-means average agglomeration clustering method. The optimal cluster number

*k* was selected using the gap statistic [

26], defined as

where
is the observed within-cluster sum of squares around the clusters means for one run, and *E*(.) represents the mean value for 1000 random bootstrap permutations.

### Hidden Markov model

We chose HMM to infer domain locations, where the hidden state at a given gene represented the associated domain type, and the emission variables are the *m*-dimensional vector summarizing the local histone modification pattern. We assumed that our emissions followed a multivariate Gaussian distribution.

When dealing with ChIP-seq data, researchers often like to assume a Poisson distribution for the counts mapped to each bin. This was not appropriate for our analysis for two main reasons: (1) we fit our model on non-integer summary scores to examine domain structure at the gene level and (2) the multi-dimensionality of our study. By the central limit theory, even if our raw counts followed a Poisson distribution, an average score of these counts (say over a promoter region) would follow a Gaussian distribution. To evaluate all five modifications simultaneously, we assumed that together they were from a multivariate Gaussian emission distribution with no covariance structure. We also checked the validity of our assumption by comparing distribution of the score data to its corresponding Poisson and Gaussian distributions. For these reasons, we assumed that our score data followed a multivariate Gaussian distribution.

We used the expectation-maximization (EM) algorithm to estimate the model parameters, and then used the Viterbi algorithm to infer the maximum likely state configuration [

38]. One technical problem is that the EM algorithm can only achieve local optimization, and the results are dependent on the initial condition. One possible approach to overcome this problem is to repeat the procedure many times, each with a randomly selected initial guess. However, we found that it is more efficient to choose a particular initial guess based on the clustering results; that is, using the cluster means and variances as initial guesses for the model parameters associated with the hidden states. To test whether our clustering method-based prior led to the optimal model (in addition to being more efficient), we compared its resulting log-likelihood to that of 100 models where we randomly selected our prior parameters from the semi-conjugate hierarchical model:

where *μ*
_{
ks
} is the prior mean for modification *k* (*k = 1, 2, ..., 5*) and state *s* (*s = 1, 2, 3*), *σ*
_{
ks
}
^{
2
} is the initial variance for modification *k* and state *s*, *θ*
_{
k
} is the sample mean for modification *k*, *s*
^{
2
}
_{
k
} is the sample variance for modification *k*, and *ν*
_{
k
} is the degrees of freedom for modification *k* such that *E(σ*
_{
ks
}
^{
2
}
*)=(ν*
_{
k
}
*-2)*
^{
-1
}
*= s*
^{
2
}
_{
k
}. We found that using the results from *k*-means clustering to pick our prior results in a much higher final log-likelihood than this hierarchal prior (Additional file 13: Figure S7).

We also used the Viterbi algorithm, an approximation of the Forward-Backward algorithm, to assign a state to each gene. To determine the accuracy of the Viterbi method, we compared its corresponding hidden maximum likelihood estimate (MLE) state assignment to that obtained by the forward-backward algorithm, which assigns states based on posterior probabilities, also known as a maximum a posteriori (MAP) estimate. These two algorithms were compared on the 615 gene chromosome 19 and produced similar hidden states. The Viterbi algorithm was used to assign gene state as it is more computationally efficient.

The number of hidden states was equal to the optimal number of clusters from our gap statistic results. For the mouse data considered in this paper, our mouse data, the optimal number was *K = 3*, where *K* is the number of clusters. However, the gap statistic for *K = 2* was similar, so we compared both setups in our analysis.

### Significance of detected domains

To determine statistical significance of a domain, we first calculated a likelihood ratio test statistic for each domain

*j (j = 1,2, ..., n)*:

where x
_{i} is the observed *m*-dimensional vector of histone modifications for gene *i (i = 1, 2, n*
_{
j
}), *n*
_{
j
} is the number of genes in domain *j*, X
_{
i
}
*| H*
_{
0
}
*~N(μ*
_{
0
}
*, Σ*
_{
0
}
*)* and X
_{
i
}
*| HMM ~N(μ*
_{
s
}
*, Σ*
_{
s
}
*)*. Note that *μ*
_{
0
} and *Σ*
_{
0
} are the *m*-dimensional mean vector and diagonal variance matrix of the entire dataset whereas *μ*
_{
s
} and *Σ*
_{
s
} are their individual state-based counterparts. To calculate *λ*
_{
j
}
*(*
X
*)* for each domain, the *μ*
_{
s
} and *Σ*
_{
s
} of the corresponding maximum likelihood estimate state was used. Based on likelihood theory, λ_{j}(**X**) ~ χ^{2}
_{df} where the degrees of freedom (*df*) is the difference in parameters between models. Thus, *df = 2*S*m-2*m*, where *S* is the number of states in the model. To correct for multiple hypothesis testing, the significant domains were selected such that the FDR = 0.01.

### Numerical simulations

We simulated histone modification data corresponding to prescribed domain configurations and assessed the accuracy of our model. We assumed that the histone modification data for each gene was normally distributed with the mean and variances estimated from the real data (Tables 3 &4). To further explore the model robustness against data noise, we also repeated the above simulations with variances equal to twice the observed variances. We applied our HMM inference procedure to the simulated data. The accuracy of our model was quantified as the percentage of correctly assigned states.

### Gene Ontology analysis

To test for functional coherence, we examined the accordance of GO memberships within a domain via a new statistic for each significant domain. For domain *j*, we calculated the level of GO accordance, *Y*
_{
j
}, with Fleiss' Kappa [27], which is a generalization of the standard kappa for more than two raters (or in our case, ontologies). We then calculated the average accordance for these domains, and compared it to a distribution made under the null hypothesis (i.e., the hypothesis that domains are independent of GO memberships).

For the null distribution, we took into account the fact that neighboring genes often share GO annotations. To this end, we selected *1000*n* random sections of the genome (of corresponding equal length in terms of number of genes to our significant domains), calculated their *Y*
_{
j
}'s and averaged them to get a null distribution for our average accordance. The p-values were evaluated as the proportion of permuted means that are larger than the observed mean accordance. Thus the minimum possible p-value was 0.001.

### Gene expression data analysis

To account for variability in gene activity, expression data in the ES cell line was normalized across data from 23 cell lines [10, 25] to get a Z-statistic for each gene. We then calculated the within-domain variance for each significant domain and compared the average of these values to that of a random permutation.

To determine the significance of our within domain variance, we randomly selected 1000**n* sections of the genome (each with an equal number of genes as its corresponding significant domains) and calculated the average within domain variances.

### Epigenetic domains in NP cells

The histone modification data in NP cells were obtained from a published dataset [10, 25]. The ChIP-seq data in NP cells were normalized against those in ES cells by a negative-binomial regression as recommended [39]. We assumed that the model parameters in the NP cells are identical to those in the ES cells, and inferred the hidden states in NP cells by using the Viterbi algorithm again. We determined domain changes by comparing the results in NP and ES cells. A domain is called changed if at least one gene changed state between the two cell lines.