Prediction of regulatory elements in mammalian genomes using chromatin signatures
© Won et al; licensee BioMed Central Ltd. 2008
Received: 23 August 2008
Accepted: 18 December 2008
Published: 18 December 2008
A recent genome-scale survey of epigenetic states in mammalian genomes has shown that promoters and enhancers are correlated with distinct chromatin signatures, providing a pragmatic way to systematically map these regulatory elements in the genome. With the rapid accumulation of chromatin modification profiles for various organisms and cell types, this chromatin-based approach promises to uncover many new regulatory elements, but computational methods to effectively extract information from these datasets are still limited.
We present here a supervised learning method to predict promoters and enhancers based on their unique chromatin modification signatures. We trained hidden Markov models (HMMs) on the histone modification data for known promoters and enhancers, and then used the trained HMMs to identify promoter- or enhancer-like sequences in the human genome. Using a simulated annealing (SA) procedure, we searched for the most informative combination of histone marks and the optimal window size.
Compared with previous methods, the HMM method can capture complex patterns of histone modifications, particularly from weak signals. Cross-validation and scanning of the ENCODE regions showed that our method outperforms the previous profile-based method in mapping promoters and enhancers. We also showed that including more histone marks can further boost the performance of our method. This observation suggests that the HMM is robust and capable of integrating information from multiple histone marks. To further demonstrate the usefulness of our method, we applied it to genome-wide ChIP-Seq data from three mouse cell lines and correctly predicted active and inactive promoters with positive predictive values of more than 80%. The software is available at http://nash.ucsd.edu/chromatin.tar.gz.
Transcriptional regulation in eukaryotic cells requires highly orchestrated interactions between transcription factors (TFs), their co-factors, RNA polymerase and chromatin [1, 2]. Several classes of regulatory elements, including promoters, enhancers, silencers and insulators, are involved in this process. Systematic and precise mapping of these elements in the genome is essential for understanding the transcriptional programs responsible for temporal and tissue-specific gene expression. A high-throughput experimental approach has recently been used to tackle this problem: chromatin immunoprecipitation followed by microarray hybridization (ChIP-chip) [3, 4] or large-scale sequencing (ChIP-Seq) [5–8]. Currently, this approach is still limited by the availability of antibodies that specifically recognize individual TFs at different regulatory elements. Another approach involves comparative genomic analysis of related genomes [9, 10] and clustering of multiple sequence motifs [11–13]. This approach has been successfully applied to a number of eukaryotic genomes, including yeast, Drosophila and mammalian genomes (see, for example, the review [14]). These methods rely either on precise alignment of regulatory elements across multiple genomes, which is not possible for all elements, or on prior knowledge of a set of cooperative TFs, which is not always available.
Recently, a chromatin-based approach to mapping regulatory elements has been proposed. This approach exploits the observation that transcriptional promoters and enhancers are associated with distinct chromatin signatures. Specifically, active promoters are characterized by tri-methylation of Lys4 on histone H3 (H3K4me3), whereas active enhancers are associated with mono-methylation of this residue and a much reduced or absent tri-methylation signal. The mechanisms underlying the different chromatin signatures at these two classes of cis-regulatory sequences are not yet clear, but the characteristic signatures provide a pragmatic way to systematically identify these elements in the genome without prior knowledge of the underlying sequences. Compared with the other methods, this chromatin-based approach has several advantages. First, it requires no prior knowledge of the sequence features of promoters or enhancers. Second, chromatin modification profiles can be obtained for most organisms, because existing antibodies specifically recognize the characteristic histone modifications in different species. Third, the approach does not assume that promoters or enhancers are evolutionarily conserved, and can therefore identify fast-evolving regulatory elements in the genome.
To better exploit these chromatin signatures, we developed a method coupling HMMs with simulated annealing (SA) (an HMM-SA procedure) to identify promoters and enhancers. The HMM extracts more information from the chromatin modification profiles, is less sensitive to measurement noise in any individual histone mark, and, within the HMM-SA procedure, automatically selects the most informative combination of histone marks as well as the optimal window size. In each run of SA, we trained HMMs [17, 18] using the 105 promoters and 73 enhancers determined by ChIP-chip experiments on RNAPII, TAF1 and p300. Within each HMM, the histone patterns are regarded as continuous observation densities emitted from the HMM states, and the number of histone patterns is the input dimension of the HMM. The HMM-SA procedure searched for the combination of histone modifications and the window size that best discriminate promoters from enhancers. We then used the trained HMMs to predict promoters and enhancers across the entire ENCODE regions. Below, we describe this method and compare its performance with that of the previous method. We also demonstrate that including more histone marks further boosts the performance of our method, which distinguishes it from the profile-based method. In addition, we show the usefulness of our method in predicting promoter activity in the mouse genome using histone modification data generated by ChIP-Seq.
Results and discussion
Finding the most informative combination of histone modifications and the optimal window size
To characterize the chromatin signatures of promoters and enhancers, one needs to define the histone modifications that discriminate between different regulatory elements. Because chromatin signals from ChIP-chip analysis typically span thousands of base pairs, a small window may not fully capture the chromatin signature, while a large window may include non-informative regions that degrade prediction accuracy. An appropriate window size is therefore critical for predicting promoters and enhancers from histone modification patterns. To find the most informative combination of marks and the optimal window size, we coupled the hidden Markov model (HMM) with simulated annealing (SA) (see Methods).
To compare with the profile-based method, we used the 105 promoters and 73 enhancers determined by ChIP-chip experiments on RNAPII, TAF1 and p300 in the Heintzman et al. study. The data were divided into two equal sets, one for training and one for evaluation. The HMM-SA procedure started with a random combination of histone marks and a random window size chosen from 1, 2, 4, 6, 8, 10 and 12 kb, centered on the TSSs or p300 peaks. We conducted 100 independent simulations and collected the final combination of histone marks and window size from each run.
The results of 100 HMM-SA runs.
Combination of histone marks | Number of times used | Prediction rate (promoter/enhancer)
H4ac, H3ac, H3K4me1, H3K4me2, H3K4me3, H3
H3K4me1, H3K4me2, H3K4me3, H3
H4ac, H3ac, H3K4me1, H3K4me2, H3
H4ac, H3K4me1, H3K4me2, H3K4me3,
H3K4me1, H3K4me2, H3K4me3, H3
H3ac, H3K4me1, H3K4me2, H3K4me3, H3
H3K4me1, H3K4me2, H3K4me3
H4ac, H3K4me1, H3K4me2, H3
H4ac, H3ac, H3K4me1, H3K4me2, H3K4me3
H4ac, H3K4me1, H3K4me2
H4ac, H3ac, H3K4me1, H3K4me3, H3
H3K4me1, H3K4me2, H3
H3ac, H3K4me1, H3K4me2
H3ac, H3K4me1, H3K4me2, H3
Occurrence of each histone modification in the most informative combinations found by the 100 HMM-SA runs.
Cross-validation shows that the HMM method predicts enhancers more accurately than the profile-based method
Comparison of cross-validation results for predicting promoters and enhancers.
Method | Promoter PPV^a (standard deviation) | Enhancer PPV (standard deviation)
HMM method using 6 histone signatures^b
HMM method using 2 histone signatures^c
Heintzman et al. using 6 histone signatures
Heintzman et al. using 2 histone signatures
Analysis of the trained HMMs
After validating the HMMs by cross-validation, we examined the probability density distribution of each state in the HMMs. Three-state HMMs with no backward transitions were trained separately on promoters and on enhancers. Such left-to-right structures have been widely used in speech recognition to capture the temporal pattern of speech. In this configuration, the second state usually corresponds to the location of the TSS or p300 peak, while the first and third states correspond to the upstream and downstream chromatin modification profiles, respectively. Figure S1 (A–F) (see Additional files 1, 2, 3, 4, 5, 6) shows the Gaussian mixture probability densities of the three states for every chromatin mark on promoters and enhancers. Promoters and enhancers show clearly different probability density distributions, reflecting the distinct chromatin modification patterns in these regions. For example, the probability density of H3K4me3 peaked in the high ChIP-chip ratio range for promoters but in the low ratio range for enhancers (Figure S1(E), see Additional file 5). In addition, the state-specific densities for promoters suggested that the HMM also captured the shape of the chromatin modification profiles. The probability density function of the third state for H3K4me3 peaked around a ChIP-chip log ratio of 2.5, whereas the first and second states peaked at low ChIP-chip ratios with lower probability. This is indeed a bimodal pattern with higher ChIP-chip ratios downstream of the TSS, consistent with the finding of the previous study.
Promoter prediction in the ENCODE regions
Comparison of PPV = TP/(TP+FP) in promoter predictions using the annotated TSS sites.
Untreated HeLa cells: Total predictions (TP+FP)
Heintzman et al.: < 1.0 × 10^-16
HMM (c1 = 2.205): < 1.0 × 10^-16
HMM (c1 = 1.6): < 1.0 × 10^-16
HMM (c1 = 0.5): < 1.0 × 10^-16
IFNγ-treated HeLa cells: Total predictions (TP+FP)
Heintzman et al.: < 1.0 × 10^-16
HMM (c1 = 2.1): < 1.0 × 10^-16
HMM (c1 = 1.367): 2.8 × 10^-3
HMM (c1 = 0.5): < 1.0 × 10^-16
Comparison of active promoter predictions.
Untreated HeLa cells: Expression-supported predictions
Heintzman et al.^b
HMM (c1 = 1.95)
HMM (c1 = 1.6)
HMM (c1 = 0.5)
IFNγ-treated HeLa cells: Expression-supported predictions
Heintzman et al.
HMM (c1 = 1.853)
HMM (c1 = 1.367)
HMM (c1 = 0.5)
Enhancer prediction in the ENCODE regions
We also used the trained HMMs to predict enhancers in the ENCODE regions. With the same total number of predictions, the HMM method and the profile-based method shared 319 (82.01%) of 389 predicted enhancers in the untreated cells and 243 (75.00%) of 324 in the treated cells. To compare the performance of the two methods, we checked how many predictions were supported (within 2.5 kb) by nearby p300 and TRAP220 binding sites or DNase I hypersensitive sites (DHSs). p300 is a transcriptional co-activator [22, 23], and TRAP220 is a component of the Mediator complex [22, 23]; both have been shown to bind enhancers as well as promoters. DHSs are nucleosome-free regions that often mark enhancers. To avoid confusion with promoters, we considered only p300, TRAP220 and DHS sites that are distal (> 2.5 kb) from any TSS.
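The distance-based support test described above can be sketched in Python. This is a minimal illustration under our own assumptions, not the authors' code: sites and predictions are integer coordinates on a single chromosome, and the helper names (`near`, `distal_sites`, `evaluate`) are hypothetical. Evidence sites within 2.5 kb of any TSS are discarded, a prediction counts as supported if an evidence site lies within 2.5 kb of it, and sensitivity is the fraction of evidence sites that have a prediction nearby.

```python
from bisect import bisect_left

WINDOW = 2500  # bp; used both as the distal cutoff and as the support radius


def near(sorted_positions, x, radius=WINDOW):
    """True if any position in sorted_positions lies within radius of x."""
    i = bisect_left(sorted_positions, x - radius)
    return i < len(sorted_positions) and sorted_positions[i] <= x + radius


def distal_sites(sites, tss):
    """Keep only evidence sites farther than WINDOW from every TSS."""
    tss_sorted = sorted(tss)
    return [s for s in sites if not near(tss_sorted, s)]


def evaluate(predictions, evidence):
    """Return (supported predictions, recovered sites, sensitivity, PPV)."""
    ev_sorted = sorted(evidence)
    pred_sorted = sorted(predictions)
    supported = sum(near(ev_sorted, p) for p in predictions)
    recovered = sum(near(pred_sorted, e) for e in evidence)
    sensitivity = recovered / len(evidence) if evidence else 0.0
    ppv = supported / len(predictions) if predictions else 0.0
    return supported, recovered, sensitivity, ppv
```

For instance, with TSSs at 10,000 and 50,000, a p300 site at 11,000 is dropped as proximal, while sites at 30,000 and 80,000 are kept as distal evidence.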
Comparison of enhancer predictions in the untreated HeLa cells.
Supporting evidence | Heintzman et al. (389 predictions in total) | HMM method (389 predictions in total)
Distal p300 (n = 94) | 77 (sensitivity = 81.91%) | 82 (sensitivity = 87.23%)
Distal DHS (n = 587) | 165 (sensitivity = 28.11%) | 179 (sensitivity = 30.49%)
Distal TRAP220 (n = 77) | 43 (sensitivity = 55.84%) | 47 (sensitivity = 61.04%)
Any distal evidence (DHS, p300 or TRAP220) | 206 (PPV = 52.96%) | 213 (PPV = 54.76%)
In the treated cells, only p300 binding data were available, so they were used to evaluate the predictions of the two methods. Whereas 104 of the 318 predictions by Heintzman et al. overlapped with p300 sites (sensitivity = 104/147 = 70.75%, PPV = 32.70%), 109 of the 288 HMM predictions were supported by p300 (sensitivity = 76.22%, PPV = 37.85%). Again, the HMM method outperformed the profile-based method on this test set.
Including additional histone marks can further improve the performance of the HMM method
Recently, Hon et al. conducted the same ChIP-chip experiments for additional histone modifications (H3K9ac, H3K18ac, H3K27me3 and H3K27ac) in the ENCODE regions. A robust method should perform better when additional data are included. We applied the HMM method to this larger dataset and evaluated its performance as above. After training the HMM predictor on all ten histone marks with a 2 kb window, the same window size as for the six-mark dataset, we predicted promoters and enhancers in the ENCODE regions. We observed a significant improvement in promoter prediction (Figure 6a). The HMM method using 10 histone marks stayed close to the ideal line even after the other methods had reached a plateau. For example, the HMM made 291 correct predictions out of 341 (PPV = 291/341 = 85.34%) using 10 histone marks, but only 264 correct predictions out of 337 (PPV = 78.34%) using 6 histone marks. The improvement became more pronounced as more predictions were made.
Using more histone marks also improved the performance of the HMM method on enhancer prediction (Figure 6b). For example, 232 enhancers were correctly predicted (PPV = 54.46%) using the 10 histone marks, compared with 226 (PPV = 53.05%) among the same total number of predictions (426) using the 6 histone marks. The improvement was smaller than that for promoters, possibly because the evidence for true enhancers (p300/TRAP220 binding and DHSs) is less direct than that for promoters (the annotated TSSs were determined using full-length cDNAs).
Prediction of active and inactive promoters using genome-wide ChIP-Seq data
Compared with ChIP-chip, ChIP-Seq is more cost-effective and probably also less noisy for mapping chromatin modifications at the genome-wide scale. We investigated how well our method works with the ChIP-Seq data generated by Mikkelsen et al. in three mouse cell lines: embryonic stem (ES) cells, neural progenitor cells (NPCs) and embryonic fibroblasts (MEFs). We first compared the patterns of four histone marks, H3K4me3, H3K9me3, H3K27me3 and H3K36me3, around TSSs, because these four marks were measured in all three cell lines. We assigned each promoter to one of four groups based on the gene expression level measured in the same study and averaged the sequencing read counts of each group around the TSS. Active and inactive promoters exhibit distinct patterns for all marks except H3K9me3 (Figure S2, see Additional file 7). The strong H3K4me3 signal at active promoters and the strong H3K27me3 signal at inactive promoters are consistent with the known functions of these marks. H3K36me3, a mark of transcriptional elongation, shows a broad pattern around the TSS.
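Averaging read counts per expression group can be sketched as follows, assuming each promoter's reads have already been counted in equal-length bins centred on its TSS. The data layout and the name `average_profiles` are illustrative assumptions, not taken from the study.

```python
from collections import defaultdict


def average_profiles(profiles, groups):
    """Average binned read-count profiles within each expression group.

    profiles: dict promoter_id -> list of read counts per bin around the TSS
              (all lists have the same length).
    groups:   dict promoter_id -> group label (e.g. an expression class).
    Returns:  dict group label -> list of mean read counts per bin.
    """
    sums = {}
    counts = defaultdict(int)
    for pid, profile in profiles.items():
        g = groups[pid]
        if g not in sums:
            sums[g] = [0.0] * len(profile)
        for i, value in enumerate(profile):
            sums[g][i] += value
        counts[g] += 1
    return {g: [s / counts[g] for s in total] for g, total in sums.items()}
```

Running this separately for each mark gives one averaged curve per group and mark, the kind of summary compared in Figure S2.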
Predicted active and inactive promoters in the mouse genome.
RefSeq-supported PPV | Predicted promoters not present in the expression measurement | Expression-supported PPV
RefSeq-supported PPV | Predicted promoters not present in the expression measurement | Expression-supported PPV
We present here an HMM method to predict promoters and enhancers using their characteristic histone modification patterns. We used an HMM-SA procedure to automatically select the most informative combination of histone modifications and the optimal window size. We showed that the more histone marks are considered, the better the HMM performs. We compared the HMM method with the best prediction results of the profile-based method in the Heintzman et al. study. Cross-validation showed that the HMM method performed better than the profile-based method, especially in enhancer classification (Table 3). This suggests that the HMM method is better able to learn complicated patterns, particularly the weak signals around enhancers. Because correct identification of distal enhancers is critical for deciphering transcriptional regulation, this feature gives the HMM an edge over the profile-based method.
We also found that a window size of 2 kb gave the best balance between including sufficiently strong signals and excluding non-informative ones that undermine prediction accuracy. However, the improvement from using a 2 kb window instead of a 10 kb window was small compared with the improvement from using the HMM (Table 1). This suggests that the improvement in classification comes mainly from the HMM's ability to capture the characteristic patterns of multiple histone marks.
We demonstrated that the HMM method outperforms the previously developed profile-based method in predicting promoters and enhancers from chromatin signatures, particularly on the independent test dataset from HeLa cells treated with IFNγ. The profile-based method performed well for small numbers of predictions, reaching its maximum number of true positives (TPs) at about 230 promoter predictions (Table 4 and Figure 6). Beyond 230 predictions, its TPs barely increased with the number of predictions. In contrast, the HMM method kept making correct predictions and outperformed the profile-based method even more clearly (Figure 6). The improvement in enhancer prediction is not significant (Figure 6), which may be due to limited knowledge of enhancer positions in the genome: we evaluated prediction accuracy only with DHSs and the binding sites of p300 and TRAP220, which may miss many enhancers.
The HMM method is also less sensitive to noise in individual histone modifications. As shown in Figure 2(A), the profile-based method failed to find a TSS where the H3K4me3 signal is weak, whereas the HMM method predicted this TSS by using all the histone marks. In Figure 2(B), the HMM method predicted an enhancer that is supported by both p300 and DHS sites; a weak H3K4me3 signal may have caused the profile-based method to miss this site. Figure 2(C) shows the opposite case: an H3K4me3 signal stronger than that of typical enhancers prevented the profile-based method from identifying a DHS site as an enhancer, whereas the HMM method was not affected.
In the present work, we did not further distinguish sub-clusters of promoters and enhancers as Heintzman et al. did, in order to avoid overfitting. Promoters and enhancers very likely have distinct histone modification patterns depending on their functional state (active, repressed or poised). As histone modification data become available for more histone marks and for the entire human genome, it will be possible to train separate or refined HMMs for promoters and enhancers in different functional states, which should further improve the performance of our model.
We also demonstrated the success of our approach in analyzing ChIP-Seq data. By including chromatin marks characteristic of transcription, our method successfully predicted promoter activity. If annotated enhancers are available for training the HMMs, it is straightforward to extend these predictions to enhancers. With the fast accumulation of chromatin modification data, we believe that our method will provide a useful tool for systematically mapping regulatory elements.
The histone modification data were obtained from the Heintzman et al. study. The averaged profiles and individual histone marks are shown in Figure 1, comparing the histone patterns of promoters and enhancers. We followed their smoothing procedure: data were grouped into 100 bp bins and the values of the probes within each bin were averaged, so that, for example, a 2 kb histone pattern consists of 20 bins. Regions not covered by probes were linearly interpolated if the uncovered region was shorter than 1000 bp. Heintzman et al. studied histone modifications in both untreated HeLa cells and HeLa cells treated with IFNγ. To design a classifier, HMMs were trained separately on promoters, enhancers and background. Previous studies demonstrated that p300 and related acetyltransferases are present at enhancers and promoters. Heintzman et al. determined 124 and 182 p300 binding sites in the untreated and treated HeLa cells, respectively. We used 74 p300 binding sites in the untreated cells after removing those within 2.5 kb of the known 5' ends of genes. These sites were enriched in DNaseI hypersensitive sites (69.7%), and over 60% of them were conserved across species. This evidence strongly supports the view that distal p300 binding sites represent a subset of enhancers. As promoter training data, Heintzman et al. used 106 active promoters in the untreated cells, centered at annotated RefSeq TSSs. In the current study, one promoter and one enhancer were removed from the Heintzman et al. training set because they included many unprobed regions.
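The binning and gap-filling step can be sketched as follows. This is a minimal sketch, assuming probes are given as (position, value) pairs on one chromosome and that the size of an uncovered region is measured as the number of empty 100 bp bins times 100 bp; both the gap measure and the function name `bin_and_interpolate` are our assumptions rather than the original implementation.

```python
BIN = 100       # bp per bin
MAX_GAP = 1000  # only uncovered stretches shorter than this (bp) are interpolated


def bin_and_interpolate(probes, start, end):
    """Average probe values in 100 bp bins over [start, end) and fill short gaps.

    probes: iterable of (position, value) pairs on one chromosome.
    Returns one value per bin; bins inside long uncovered stretches stay None.
    """
    n_bins = (end - start) // BIN
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pos, value in probes:
        b = (pos - start) // BIN
        if 0 <= b < n_bins:
            sums[b] += value
            counts[b] += 1
    bins = [s / c if c else None for s, c in zip(sums, counts)]

    i = 0
    while i < n_bins:
        if bins[i] is not None:
            i += 1
            continue
        j = i
        while j < n_bins and bins[j] is None:
            j += 1
        # bins[i:j] are empty: interpolate only if the stretch is short and
        # flanked by covered bins on both sides.
        if i > 0 and j < n_bins and (j - i) * BIN < MAX_GAP:
            left, right = bins[i - 1], bins[j]
            for k in range(i, j):
                frac = (k - i + 1) / (j - i + 1)
                bins[k] = left + frac * (right - left)
        i = j
    return bins
```

A 2 kb window therefore yields 20 bins, matching the example in the text; gaps at the window edges are left unfilled because there is nothing to interpolate from.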
Whereas Heintzman et al. tested only a window size of 10 kb centered on TSSs and p300 binding sites, we tested various window sizes. The candidate window sizes for the HMM-SA procedure were 1, 2, 4, 6, 8, 10 and 12 kb. Once HMM-SA had selected the optimal window size of 2 kb, the entire training dataset of 105 promoters and 73 enhancers was used to train HMMs to predict promoters and enhancers in the ENCODE regions. The histone patterns in the cells treated with IFNγ were used as an independent test set.
The HMM classifier
The observation density of state j is modeled as a mixture of M Gaussians:
$$b_j(\mathbf{x}) = \sum_{m=1}^{M} c_{jm}\, G[\mathbf{x}, \boldsymbol{\mu}_{jm}, \mathbf{U}_{jm}]$$
where x is the vector being modeled and c_jm is the mixture coefficient for the m-th Gaussian in state j; G[x, μ_jm, U_jm] is the Gaussian density with mean vector μ_jm and covariance matrix U_jm. The forward-backward algorithm was used to estimate the transition probabilities and the mixture coefficients in each state. We trained three HMMs separately, for promoters, enhancers and background. The number of states was set to Q = (number of bins)/k with k = 8, so that it varies with the length of the data, with a minimum of Q = 3. The background HMM was designed with the minimum number of states (Q = 3). Each state is a mixture of three Gaussian components (M = 3) to capture the complex histone modification patterns; models with larger M did not improve prediction performance (data not shown).
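The scoring side of this classifier can be sketched with the forward algorithm in log space. This is a minimal sketch under stated assumptions, not the released software: covariance matrices are taken to be diagonal (the text allows a full U_jm), each state may only stay or advance by one (no backward transitions, as in the three-state models described earlier), parameters are assumed to be already trained, and Baum-Welch re-estimation is omitted. All function names are hypothetical.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Stable log(sum(exp(v))); -inf for an empty or all -inf input."""
    if not values:
        return NEG_INF
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def n_states(n_bins, k=8, q_min=3):
    """Q = (number of bins) // k, never fewer than q_min states."""
    return max(q_min, n_bins // k)


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (a simplification)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - mi) ** 2 / v)
               for xi, mi, v in zip(x, mean, var))


def log_emission(x, mixture):
    """log b_j(x) for one state; mixture is a list of (c_jm, mean, variances)."""
    return logsumexp([math.log(c) + log_gauss_diag(x, mu, var)
                      for c, mu, var in mixture])


def log_likelihood(obs, stay, mixtures):
    """Forward algorithm, in log space, for a left-to-right HMM.

    obs:      list of per-bin vectors (one value per histone mark).
    stay:     stay[j] = P(j -> j); 1 - stay[j] = P(j -> j + 1); the last
              state should have stay = 1.
    mixtures: mixtures[j] is the Gaussian mixture of state j.
    """
    q = len(stay)
    # The chain always starts in the first (upstream) state.
    alpha = [log_emission(obs[0], mixtures[0])] + [NEG_INF] * (q - 1)
    for x in obs[1:]:
        new_alpha = []
        for j in range(q):
            terms = []
            if stay[j] > 0:
                terms.append(alpha[j] + math.log(stay[j]))
            if j > 0 and stay[j - 1] < 1:
                terms.append(alpha[j - 1] + math.log(1 - stay[j - 1]))
            new_alpha.append(logsumexp(terms) + log_emission(x, mixtures[j]))
        alpha = new_alpha
    return logsumexp(alpha)  # log P(obs | model), summed over final states
```

Scoring a window against the promoter, enhancer and background HMMs in this way and differencing the log-likelihoods would give log-odds scores of the kind that are compared with cutoffs below.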
The log-odds score reflects how strong a signal is compared with the background. If the log-odds score is below a cutoff (c1 for promoters, c2 for enhancers), the signal is regarded as background. The number of predictions therefore depends on these cutoff values, and Figures 3 and 6 were plotted by varying them.
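The cutoff logic can be sketched as follows. The log-odds equations themselves are not reproduced in this text, so we assume, for illustration only, that each score is the log-likelihood of the promoter or enhancer HMM minus that of the background HMM. `classify` is a hypothetical helper; when both scores pass their cutoffs it keeps the higher one, mirroring the rule used for overlapping promoter and enhancer calls during scanning.

```python
def classify(ll_promoter, ll_enhancer, ll_background, c1, c2):
    """Label one window from its three HMM log-likelihoods.

    Assumed scores: log-odds of each element model against the background
    model. A score below its cutoff (c1 for promoters, c2 for enhancers)
    is treated as background.
    """
    lo_p = ll_promoter - ll_background
    lo_e = ll_enhancer - ll_background
    candidates = []
    if lo_p >= c1:
        candidates.append((lo_p, "promoter"))
    if lo_e >= c2:
        candidates.append((lo_e, "enhancer"))
    if not candidates:
        return "background", None
    score, label = max(candidates)
    return label, score
```

Raising c1 or c2 makes fewer windows pass, which is how sweeping the cutoffs traces out curves of correct predictions versus total predictions.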
When scanning the ENCODE regions, we smoothed the results by averaging three adjacent log-odds scores and took the peaks of the log-odds of promoter over enhancer. This smoothing reduced fluctuations of the log-odds along the chromosome, especially at the boundaries of unprobed regions. If multiple predictions were made within 1.5 kb, only the prediction with the highest log-odds was kept; likewise, if a promoter and an enhancer were predicted within 1.5 kb of each other, only the prediction with the higher log-odds was kept. We examined the percentage of promoters and enhancers correctly predicted while varying the cutoff values c1 and c2 (Figure 3, Figure 6 and Table 4). Using six histone marks, the HMM predictor made the same number of predictions as the profile-based method when c1 = 2.205 (untreated) and c1 = 2.1 (treated). To compare enhancer predictions, we used c2 = 0.25 (untreated) and c2 = 0.0 (treated) (Table 6).
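The smoothing and de-duplication steps can be sketched as below. Assumptions: log-odds are computed at evenly spaced positions along one chromosome, and "keep only the highest prediction within 1.5 kb" is implemented as greedy suppression (take the strongest call, drop any other call within 1.5 kb, repeat). The function names are hypothetical.

```python
def smooth3(scores):
    """Average each log-odds score with its immediate neighbours (3-point window)."""
    n = len(scores)
    out = []
    for i in range(n):
        window = scores[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out


def candidate_calls(positions, scores, cutoff, label):
    """Positions whose smoothed log-odds reach the cutoff, as (pos, score, label)."""
    return [(p, s, label) for p, s in zip(positions, scores) if s >= cutoff]


def suppress(calls, min_dist=1500):
    """Greedy suppression: keep the strongest call and drop any call within
    min_dist bp of one already kept (promoter and enhancer calls alike)."""
    kept = []
    for call in sorted(calls, key=lambda c: c[1], reverse=True):
        if all(abs(call[0] - k[0]) > min_dist for k in kept):
            kept.append(call)
    return sorted(kept)
```

Passing promoter and enhancer candidates to `suppress` together applies the same 1.5 kb rule across the two classes.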
Search for the most informative histone modification combination
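Here E_current is the evaluation score of the candidate move and E is that of the current state. The move is accepted with probability
$$P(\text{accept}) = \begin{cases} 1, & E_{current} > E \\ \exp\left[(E_{current} - E)/T\right], & \text{otherwise} \end{cases} \qquad (3)$$
That is, if E_current is greater than the previous value E, the move is always accepted; otherwise, it is accepted with a probability that decreases as T decreases.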
To adapt SA to our model, we hybridized HMMs with SA. Initially, SA randomly selected a candidate combination of histone modifications and a window size from 1, 2, 4, 6, 8, 10 and 12 kb. An HMM was trained with the candidate combination and evaluated by E_current, defined as
$$E_{current} = 100 \times \text{sensitivity}_{\text{promoter}} + 100 \times \text{sensitivity}_{\text{enhancer}}$$
The combination is accepted with the probability given in equation (3): a move is always accepted if E_current > E; otherwise, it is accepted with a probability that decreases as the temperature T decreases. The next move is made by randomly adding or removing one or two histone patterns and by increasing or decreasing the window size by one 2 kb step. This procedure is repeated while the temperature T is lowered; in the simulation we used $T = 0.9^{\text{iteration}}$.
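The HMM-SA loop can be sketched in Python as follows. This is a minimal sketch, not the authors' code: `evaluate` is a hypothetical stand-in for training HMMs on one half of the data and returning E_current on the other half, and we interpret the window move as stepping to the adjacent candidate size (1, 2, 4, ..., 12 kb). The acceptance rule is the Metropolis criterion of equation (3) and the schedule is T = 0.9^iteration.

```python
import math
import random

MARKS = ["H4ac", "H3ac", "H3K4me1", "H3K4me2", "H3K4me3", "H3"]
WINDOWS_KB = [1, 2, 4, 6, 8, 10, 12]


def accept(e_new, e_old, t, rng):
    """Always accept improvements, otherwise with probability exp((E_new - E)/T)."""
    return e_new > e_old or rng.random() < math.exp((e_new - e_old) / t)


def neighbour(marks, w_idx, rng):
    """Toggle one or two randomly chosen marks and step the window size once."""
    marks = set(marks)
    for _ in range(rng.choice([1, 2])):
        m = rng.choice(MARKS)
        if m in marks:
            if len(marks) > 1:  # never leave the combination empty
                marks.remove(m)
        else:
            marks.add(m)
    w_idx = min(len(WINDOWS_KB) - 1, max(0, w_idx + rng.choice([-1, 1])))
    return frozenset(marks), w_idx


def anneal(evaluate, n_iter=200, seed=0):
    """Return (best E, best mark set, best window in kb) over one SA run.

    evaluate(marks, window_kb) is a hypothetical callback returning
    E = 100 * sens_promoter + 100 * sens_enhancer.
    """
    rng = random.Random(seed)
    marks = frozenset(rng.sample(MARKS, rng.randint(1, len(MARKS))))
    w_idx = rng.randrange(len(WINDOWS_KB))
    e = evaluate(marks, WINDOWS_KB[w_idx])
    best = (e, marks, WINDOWS_KB[w_idx])
    for it in range(1, n_iter + 1):
        t = 0.9 ** it  # cooling schedule T = 0.9^iteration
        cand_marks, cand_w = neighbour(marks, w_idx, rng)
        e_new = evaluate(cand_marks, WINDOWS_KB[cand_w])
        if accept(e_new, e, t, rng):
            marks, w_idx, e = cand_marks, cand_w, e_new
            if e > best[0]:
                best = (e, marks, WINDOWS_KB[w_idx])
    return best
```

Because T shrinks geometrically, late iterations behave almost like greedy hill climbing, and repeating the run with different seeds corresponds to the independent simulations whose outcomes are tallied.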
In the HMM-SA procedure, the 105 promoters and 73 enhancers in the training dataset were divided into two halves: one half was used to train the HMMs and the other half to calculate E_current. The training set (52 promoters and 36 enhancers) and the evaluation set were fixed within each run. We set the maximum number of iterations to 200 to give SA a sufficient burn-in period; in fact, most simulations converged in fewer than 100 iterations. We recorded the results of 30 independent simulations.
We validated the promoter predictions in the ENCODE regions by counting how many predicted promoters were supported by annotated RefSeq TSSs. Three adjacent log-odds scores (equations (1) and (2)) were averaged, and if multiple peaks of promoters or enhancers were found within 1.5 kb, only the one with the highest log-odds was selected. A prediction was considered correct if its center was within D = 2.5 kb of the closest annotated TSS of a gene. When multiple predicted sites were supported by the same TSS or the same enhancer evidence, we merged these predictions; predicted sites not within this distance were all counted as FPs. Heintzman et al. made 208 promoter predictions in total; because two of them referred to the same gene, we treated them as one, giving 207 predictions. We defined PPV = TP/(TP+FP).
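The TSS-based scoring can be sketched as follows, assuming single-chromosome integer coordinates; `closest_tss` and `promoter_ppv` are hypothetical names. Predictions supported by the same TSS are merged into one TP, and every unsupported prediction counts as an FP, as described above.

```python
from bisect import bisect_left

D = 2500  # bp; maximum distance from a prediction to the closest annotated TSS


def closest_tss(tss_sorted, x):
    """Annotated TSS closest to x (tss_sorted must be sorted and non-empty)."""
    i = bisect_left(tss_sorted, x)
    neighbours = tss_sorted[max(0, i - 1):i + 1]
    return min(neighbours, key=lambda t: abs(t - x))


def promoter_ppv(predictions, tss):
    """Return (TP, FP, PPV), with PPV = TP / (TP + FP).

    TP counts distinct TSSs hit within D, so several predictions supported
    by the same TSS are merged into a single true positive.
    """
    tss_sorted = sorted(tss)
    hit = set()
    fp = 0
    for p in predictions:
        t = closest_tss(tss_sorted, p)
        if abs(t - p) <= D:
            hit.add(t)
        else:
            fp += 1
    tp = len(hit)
    return tp, fp, (tp / (tp + fp) if tp + fp else 0.0)
```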
To compare the performance of the two methods, we plotted ROC curves for promoter predictions in both untreated and treated cells (Figure 5). We defined FN as the number of active promoters missed by our predictions. Defining true negatives is less straightforward. We divided the entire ENCODE regions into non-overlapping 2.5 kb segments; 9928 of these segments had no annotated TSS within ± 2.5 kb. We defined TN as the number of such segments that did not contain any predicted promoter. Sensitivity and specificity were then given by TP/(TP+FN) and TN/(FP+TN), respectively.
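The TN definition can be made concrete with a short sketch. We read "no annotated TSS within ± 2.5 kb" as no TSS inside the segment extended by 2.5 kb on each side; this reading, the single-chromosome coordinates and the function names are all our assumptions.

```python
SEG = 2500  # bp; segment length and TSS exclusion radius


def negative_segments(region_start, region_end, tss):
    """Starts of non-overlapping 2.5 kb segments with no annotated TSS within
    2.5 kb of the segment (our reading of 'within +/- 2.5 kb')."""
    segments = []
    for s in range(region_start, region_end - SEG + 1, SEG):
        if not any(s - SEG <= t < s + 2 * SEG for t in tss):
            segments.append(s)
    return segments


def true_negatives(neg_segments, predicted_promoters):
    """Number of negative segments that contain no predicted promoter."""
    return sum(
        not any(s <= p < s + SEG for p in predicted_promoters)
        for s in neg_segments
    )


def sensitivity_specificity(tp, fn, fp, tn):
    """Sensitivity = TP/(TP+FN); specificity = TN/(FP+TN)."""
    return tp / (tp + fn), tn / (fp + tn)
```

Sweeping the cutoff c1 changes TP, FP and TN together, and each resulting (1 - specificity, sensitivity) pair is one point on the ROC curve.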
ChIP-Seq data in the three mouse cell lines
Mikkelsen et al. generated genome-wide maps of chromatin modifications in three mouse cell lines: embryonic stem (ES) cells, neural progenitor cells (NPCs) and embryonic fibroblasts (MEFs). Four chromatin marks, H3K4me3, H3K9me3, H3K27me3 and H3K36me3, were measured in all these cell lines. We trained an HMM classifier on the chromatin modification patterns around TSSs in the ES cells and tested it in all three cell lines. Based on the gene expression measured by Mikkelsen et al., we randomly selected 200 active and 200 inactive promoters in the ES cells as the training set. Because there were only four chromatin marks, we used all of them in the HMM. As in the analysis of the ChIP-chip data, we first used a 2 kb window to locate TSSs in the genome (see above). Considering the broad pattern of H3K36me3 that distinguishes active from inactive promoters (Figure S2, see Additional file 7), we then used a 10 kb window to classify the predicted promoters as active or inactive. A background HMM was trained on the sequencing reads mapped to chromosome 1.
We evaluated the classification performance of our method using gene expression and RefSeq annotation, restricted to predictions that could be unambiguously assigned to a gene, namely those located within 2.5 kb of an annotated TSS. Mikkelsen et al. conducted replicate measurements of gene expression in the same cell lines (GEO accession GSE8024), covering 13482 unique genes. The numbers of active and inactive genes in each cell line were counted using the majority rule across replicate experiments, and genes with marginal expression levels or conflicting calls were excluded (Table 7).
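The majority rule can be sketched as follows. The call encoding ('P' present, 'A' absent, 'M' marginal, as produced by common microarray detection-call software) is our assumption; the text does not specify how replicate calls are represented. In this sketch a gene is called active or inactive only when that call is a strict majority of its replicates; marginal majorities and ties are excluded.

```python
from collections import Counter


def majority_call(calls):
    """Combine replicate calls ('P', 'A' or 'M') for one gene.

    Returns 'active', 'inactive', or None when the gene should be excluded
    (no strict majority, or the majority call is marginal).
    """
    label, n = Counter(calls).most_common(1)[0]
    if n * 2 <= len(calls) or label == "M":
        return None
    return "active" if label == "P" else "inactive"


def count_calls(gene_calls):
    """gene_calls: dict gene -> list of replicate calls; returns (n_active, n_inactive)."""
    results = [majority_call(c) for c in gene_calls.values()]
    return results.count("active"), results.count("inactive")
```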
Abbreviations
HMM: Hidden Markov Model
TSS: Transcription Start Site
PPV: Positive Predictive Value
DHS: DNaseI Hypersensitive Site
ROC: Receiver Operating Characteristic
We are grateful to Gary Hon and Nathaniel Heintzman for insightful discussion. This work was supported in part by NIH (to WW).
- Levine M, Tjian R: Transcription regulation and animal diversity. Nature 2003, 424: 147–51. 10.1038/nature01763
- Bernstein BE, Meissner A, Lander ES: The mammalian epigenome. Cell 2007, 128: 669–81. 10.1016/j.cell.2007.01.033
- Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E, Volkert TL, Wilson CJ, Bell SP, Young RA: Genome-wide location and function of DNA binding proteins. Science 2000, 290: 2306–9. 10.1126/science.290.5500.2306
- Iyer VR, Horak CE, Scafe CS, Botstein D, Snyder M, Brown PO: Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 2001, 409: 533–8. 10.1038/35054095
- Euskirchen GM, Rozowsky JS, Wei CL, Lee WH, Zhang ZD, Hartman S, Emanuelsson O, Stolc V, Weissman S, Gerstein MB, Ruan Y, Snyder M: Mapping of transcription factor binding regions in mammalian cells by ChIP: comparison of array- and sequencing-based technologies. Genome Res 2007, 17: 898–909. 10.1101/gr.5583007
- Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K: High-resolution profiling of histone methylations in the human genome. Cell 2007, 129: 823–37. 10.1016/j.cell.2007.05.009
- Johnson DS, Mortazavi A, Myers RM, Wold B: Genome-wide mapping of in vivo protein-DNA interactions. Science 2007, 316: 1497–502. 10.1126/science.1141319
- Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, Giannoukos G, Alvarez P, Brockman W, Kim TK, Koche RP, Lee W, Mendenhall E, O'Donovan A, Presser A, Russ C, Xie X, Meissner A, Wernig M, Jaenisch R, Nusbaum C, Lander ES, Bernstein BE: Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 2007, 448: 553–60. 10.1038/nature06008
- Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS, Haussler D: Ultraconserved elements in the human genome. Science 2004, 304: 1321–5. 10.1126/science.1098119
- Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES, Kellis M: Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 2005, 434: 338–45. 10.1038/nature03441
- Zhou Q, Wong WH: CisModule: de novo discovery of cis-regulatory modules by hierarchical mixture modeling. Proc Natl Acad Sci USA 2004, 101: 12114–9. 10.1073/pnas.0402858101
- Gupta M, Liu JS: De novo cis-regulatory module elicitation for eukaryotic genomes. Proc Natl Acad Sci USA 2005, 102: 7079–84. 10.1073/pnas.0408743102
- Blanchette M, Bataille AR, Chen X, Poitras C, Laganiere J, Lefebvre C, Deblois G, Giguere V, Ferretti V, Bergeron D, Coulombe B, Robert F: Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res 2006, 16: 656–68. 10.1101/gr.4866006
- Elnitski L, Jin VX, Farnham PJ, Jones SJ: Locating mammalian transcription factor binding sites: a survey of computational and experimental techniques. Genome Res 2006, 16: 1455–64. 10.1101/gr.4140006
- Heintzman ND, Stuart RK, Hon G, Fu Y, Ching CW, Hawkins RD, Barrera LO, Van Calcar S, Qu C, Ching KA, Wang W, Weng Z, Green RD, Crawford GE, Ren B: Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet 2007, 39: 311–318. 10.1038/ng1966
- Kirkpatrick S, Gelatt CD Jr, Vecchi MP: Optimization by simulated annealing. Science 1983, 220: 671–80. 10.1126/science.220.4598.671
- Rabiner LR: A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 1989, 77: 257–86. 10.1109/5.18626
- Durbin R, Eddy S, Krogh A, Mitchison G: Biological Sequence Analysis. Cambridge University Press, Cambridge; 1998.
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome Res 2002, 12(6): 996–1006.
- Carninci P, Sandelin A, Lenhard B, Katayama S, Shimokawa K, Ponjavic J, Semple CA, Taylor MS, Engstrom PG, Frith MC, Forrest AR, Alkema WB, Tan SL, Plessy C, Kodzius R, Ravasi T, Kasukawa T, Fukuda S, Kanamori-Katayama M, Kitazume Y, Kawaji H, Kai C, Nakamura M, Konno H, Nakano K, Mottagui-Tabar S, Arner P, Chesi A, Gustincich S, Persichetti F, Suzuki H, Grimmond SM, Wells CA, Orlando V, Wahlestedt C, Liu ET, Harbers M, Kawai J, Bajic VB, Hume DA, Hayashizaki Y: Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet 2006, 38: 626–35. 10.1038/ng1789
- Kim TH, Barrera LO, Zheng M, Qu C, Singer MA, Richmond TA, Wu Y, Green RD, Ren B: A high-resolution map of active promoters in the human genome. Nature 2005, 436: 876–80. 10.1038/nature03877
- Hatzis P, Talianidis I: Dynamics of enhancer-promoter communication during differentiation-induced gene activation. Mol Cell 2002, 10: 1467–77. 10.1016/S1097-2765(02)00786-4
- Wang Q, Carroll JS, Brown M: Spatial and temporal recruitment of androgen receptor and its coactivators involves chromosomal looping and polymerase tracking. Mol Cell 2005, 19: 631–42. 10.1016/j.molcel.2005.07.018
- Felsenfeld G: Chromatin unfolds. Cell 1996, 86: 13–9. 10.1016/S0092-8674(00)80073-2
- Hon G, Hawkins D, Harp LF, Ye Z, Ching KA, Antosiewicz JE, Stewart R, Thomson JA, Ren B: Differential roles of promoters, enhancers, and insulators in cell-type specific gene expression. 2007, in press.
- Hon G, Ren B, Wang W: ChromaGibbs: A Gibbs sampling approach to finding common chromatin modification patterns. 2007, in press.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.