Comparative analysis of chromatin landscape in regulatory regions of human housekeeping and tissue specific genes
© Ganapathi et al; licensee BioMed Central Ltd. 2005
Received: 17 November 2004
Accepted: 26 May 2005
Published: 26 May 2005
Global regulatory mechanisms involving chromatin assembly and remodelling in the promoter regions of genes are implicated in eukaryotic transcription control, especially for genes subject to spatial and temporal regulation. The potential to utilise global regulatory mechanisms for controlling gene expression might depend upon the architecture of the chromatin in and around the gene. In-silico analysis can yield important insights into this aspect, because it allows comparison of two or more classes of genes that each comprise a large number of genes.
In the present study, we carried out a comparative analysis of chromatin characteristics in terms of the scaffold/matrix attachment regions, nucleosome formation potential and the occurrence of repetitive sequences, in the upstream regulatory regions of housekeeping and tissue specific genes. Our data show that putative scaffold/matrix attachment regions are more abundant and nucleosome formation potential is higher in the 5' regions of tissue specific genes as compared to the housekeeping genes.
The differences in chromatin features between the two groups of genes indicate that chromatin organisation is involved in the control of gene expression. Global regulatory mechanisms mediated through chromatin organisation can reduce the need to invoke gene specific regulators to maintain the active or silenced state of gene expression. This could partially explain the relatively low number of genes estimated for the human genome.
Eukaryotic gene transcription is known to be orchestrated largely by protein factors such as activators, co-activators and co-repressors [1]. However, nucleosomal organisation, non-passive structural scaffolds and the global structure of chromatin are increasingly recognised as major players in the regulation of gene expression. Two measurable parameters that may influence interactions with the transcription machinery are the ability of sequences to position nucleosomes and their ability to anchor to the nuclear matrix, which provides a spatial context for regulation of expression [2, 3]. This level of regulation may differ distinctly between genes that are expressed constitutively and genes that show tissue specific expression. The latter would require an open chromatin configuration in some tissues and a repressive organisation in others. In this study, we examined whether the potential to control gene expression through chromatin organisation, as a global regulatory mechanism, differs between housekeeping and tissue specific genes (Hkg and Tsg, respectively). To address this question, we carried out an in-silico comparison of chromatin related organisational features in the 5' and 3' regulatory regions of the two groups of genes.
Results and discussion
The chromatin landscape of a region plays a major role in determining and modulating the expression status of neighbouring genes [4]. The role of chromatin in the 5' regulatory regions of genes in transcriptional regulation has been studied extensively [5, 6]. In the present study, we used two distinct sets of genes that differ predominantly in their spatial expression, namely housekeeping and tissue specific genes, to understand how chromatin organisation in the 5' region contributes to regulation.
Analysis of scaffold/matrix associated sequences
Table: Distribution of putative S/MARs in housekeeping and tissue specific genes. Columns: putative S/MARs in 5' regions (%); putative S/MARs in 3' regions (%). Rows: presence of S/MAR; absence of S/MAR.
The observation that the 5' regulatory regions of Hkg are less enriched in S/MARs than those of Tsg might be related to the distribution of housekeeping genes in the genome. Housekeeping genes are clustered along chromosomes, so they would often reside in distinct chromatin domains together with other housekeeping genes that show co-ordinated expression [9, 10]. The preferential absence of S/MARs in the 3' regions of Hkg further supports this hypothesis. Tissue specific genes, on the other hand, are known to be dispersed in gene dense as well as heterochromatic regions [9, 11]. To maintain a tissue specific expression profile, they may need to be shielded from the effects of positive and negative cis-acting elements in adjacent regions. The boundary element (insulator) model has been proposed earlier in this context. S/MARs can function as boundary elements, and their co-localisation with insulators such as the Drosophila gypsy element has been reported [12, 13]. They also act as boundary elements in in vitro systems by shielding against position effects [14]. Earlier reports have suggested a role for S/MARs in maintaining tissue specific gene expression [15]. More recently, the 5'-HS4 chicken β-globin insulator has been shown to be a CTCF-dependent, nuclear matrix-associated element [16]. Hence, the over-representation of S/MARs in the Tsg set might be associated with a boundary element function.
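One way to ask whether the presence or absence of a predicted S/MAR is associated with gene class is a 2 × 2 contingency test. The sketch below computes Pearson's chi-square statistic (one degree of freedom, no continuity correction) and its P-value using only the Python standard library. It illustrates the idea under assumptions and does not reproduce the analysis behind the S/MAR table. The choice of test, the function name `chi_square_2x2` and the table layout are all hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Hypothetical layout:
                 S/MAR present   S/MAR absent
        Hkg            a               b
        Tsg            c               d
    Returns (chi2, p) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df, chi2 = Z**2, so the upper tail is P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

Here a and b are the Hkg counts with and without a predicted S/MAR, and c and d are the corresponding Tsg counts. The closed form for one degree of freedom avoids a SciPy dependency. If any expected count is small, an exact test would be the safer choice.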
Our results on the prediction performance of the two programs differ considerably from previous reports. MAR Finder has been described as an under-predictor, yet in our dataset it predicts more S/MARs than ChrClass, which has been described as an over-predictor. This may be because we used an advanced version of MAR Finder that adds new parameters and features through the "New MAR Rules" option.
Analysis of nucleosomal organisation
The nucleosome is the primary template for local and global changes in the chromatin structure of a chromosome [17]. Chromatin structure and nucleosomal organisation over promoter regions play a major role in regulating expression of the downstream gene(s) [6, 17]. The distribution of nucleosomes depends on the occurrence of nucleosome destabilising elements as well as nucleosome forming sequences. We analysed both of these parameters in this study.
Nucleosome destabilising elements
Nucleosome destabilising/excluding elements such as poly (dA.dT) and (CCGNN)n in promoter regions have been implicated in maintaining constitutive gene expression [18–21]. At the functional level, poly (dA.dT) elements are known to increase the accessibility of the HIS3, URA3 and Ilv1 promoters in yeast to their cognate transcription factors. Studies in yeast and mammalian systems show that the accessibility of these sequences to transcription factors improves as the poly (dA.dT) repeat gets longer. Likewise, the longer the (CCGNN)n motif, the more strongly it excludes nucleosomes [19–21]. (CCGNN)n sequences have also been shown to promote meiotic recombination and to activate HIS4 expression by generating open chromatin [22].
Table: Distribution of poly (dA.dT) repeats of various lengths in the 5' upstream regions of housekeeping and tissue specific genes. Columns: poly (dA.dT) stretch (bp); no. of repeat stretches in the two classes; no. of genes with repeats in 5' region (%); P-value.
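The poly (dA.dT) stretches compared here can be tallied with a short scan. The sketch below assumes that upstream sequences are plain strings. It reports maximal pure runs of A, or of T, that are longer than 10 bp, the cut-off given in the Methods. An A-run on one strand is a T-run on the other, so scanning one strand for both bases covers both strands. The function names are hypothetical, and this is not the authors' in-house program.

```python
import re
from collections import Counter

# Pure poly(dA.dT): an uninterrupted run of A, or of T, on the scanned strand.
# ">10 bp" is read here as a minimum length of 11 bp.
POLY_DA_DT = re.compile(r"A{11,}|T{11,}")


def poly_da_dt_stretches(seq: str) -> list[tuple[int, int]]:
    """Return (0-based start, length) for each maximal pure A or T run longer than 10 bp."""
    return [(m.start(), m.end() - m.start()) for m in POLY_DA_DT.finditer(seq.upper())]


def stretch_length_counts(seqs) -> Counter:
    """Tally qualifying stretches by length across a set of upstream sequences."""
    counts = Counter()
    for seq in seqs:
        for _, length in poly_da_dt_stretches(seq):
            counts[length] += 1
    return counts


def genes_with_stretch(seqs) -> int:
    """Number of sequences that contain at least one qualifying stretch."""
    return sum(1 for seq in seqs if poly_da_dt_stretches(seq))
```

The regex is greedy and is tried from left to right, so each run is reported once at its full length. Runs such as AATT, which mix the two bases, are not counted as pure stretches.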
In Hkg, 670 (CCGNN)2–5 repeats were detected, compared with 430 in Tsg. (CCGNN)2 was the most prevalent, and neither sequence set contained uninterrupted runs of more than five units. Shorter runs (2–5 units) have not been studied for nucleosome exclusion, but they might still help destabilise the histone octamer. Many of them also form part of longer interrupted stretches. A t-test for the difference in the distribution of (CCGNN)2–5 between Hkg and Tsg gave a significant P-value of 1.71E-06.
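A minimal sketch of how (CCGNN)2–5 runs might be counted. It assumes that N stands for any of A, C, G or T and that only the given strand is scanned; how the original in-house program handled strands is not stated. Runs are found from left to right without overlap, so a run in a different phase that overlaps an earlier match is not reported separately. The function name `ccgnn_runs` is hypothetical.

```python
import re

# One CCGNN unit is CCG followed by any two bases (N = A, C, G or T).
# A run of n tandem, uninterrupted units is written (CCGNN)n.
CCGNN_RUN = re.compile(r"(?:CCG[ACGT]{2}){2,}")


def ccgnn_runs(seq: str, min_units: int = 2, max_units: int = 5) -> list[tuple[int, int]]:
    """Return (0-based start, n_units) for tandem (CCGNN)n runs with min_units <= n <= max_units.

    Only the given strand is scanned; matches are non-overlapping, left to right.
    """
    runs = []
    for m in CCGNN_RUN.finditer(seq.upper()):
        n_units = (m.end() - m.start()) // 5  # each unit is exactly 5 bp
        if min_units <= n_units <= max_units:
            runs.append((m.start(), n_units))
    return runs
```

Counting `len(ccgnn_runs(s))` for each upstream sequence gives per-gene counts for the two groups. A two-sample test, such as the t-test reported above, can then be applied to those counts.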
Nucleosome formation potential scores and expression level of genes
Levitsky et al (2001) used Recon to examine the nucleosome formation potential of three classes of human genes that differ in their spatial expression: Hkg, Tsg and widely expressed genes [2]. Their report was based on a small sample of around 200 genes. It showed differences in nucleosome formation potential among the three classes in the 50 bp upstream of the transcription start site. In this study, we examined nucleosome formation potential in the 2000 bp upstream (5') regions of Hkg and Tsg using the complete set of 1083 genes. We also examined how this potential correlates with gene expression levels.
Table: t-test P-values for the difference in the distribution of nucleosome formation potential scores between housekeeping and tissue specific genes, given for the score intervals -1.2 to -1, -1 to -0.8, 0.8 to 1 and 1 to 1.2.
Table: Correlation coefficients of total expression levels (log10) with nucleosome formation potential scores in housekeeping (Hkg) and tissue specific genes (Tsg), for the score intervals -1.2 to -1, -1 to -0.8, 0.8 to 1 and 1 to 1.2.
Table: Comparison of the level of correlation between nucleosome formation potential scores and contrasting expression levels of genes, for the score intervals -1.2 to -1, -1 to -0.8, 0.8 to 1 and 1 to 1.2.
Our data support the conclusion of Levitsky et al (2001) [2] that chromatin in the 5' region plays a major role in determining whether a gene is expressed ubiquitously or only in a restricted set of tissues. The abundance of nucleosome exclusion elements and the low Recon scores in the 5' regions of Hkg reflect a low propensity for nucleosome assembly. The expression analysis suggests that chromatin contributes to extreme differences in expression level within certain classes of genes, such as the housekeeping genes. However, the relationship is not linear across the total, wider range of expression levels. Nucleosomes might also fine-tune expression levels in ways this analysis cannot detect, because the low and high expression classes compared are far apart. The difference in nucleosome formation potential between the two sets might reflect differences in accessibility: to basal transcription factors for Hkg, and to gene/tissue specific transcription factors for Tsg. This would fit the different spatial and temporal expression patterns of the two groups.
Analysis of repetitive sequences
Table: Distribution of Alu repeats in the 5' upstream regions of housekeeping (Hkg) and tissue specific genes (Tsg), in terms of the number of copies and the base pairs covered by Alu repeats. Columns: no. of copies; % of the total sequences covered by the repeat.
Table: t-test P-values for the difference in the distribution of Alu repeats in the 5' upstream regions of housekeeping and tissue specific genes, for the number of repeats and the repeat content (bp).
Genes with high expression levels are clustered in genomic regions known as ridges. These gene rich regions also have a high (G+C) content, an abundance of SINEs and genes with short introns [9]. Eisenberg and Levanon [28] have reported significantly shorter introns and an overall compact gene structure in Hkg compared with non-Hkg, and we used their gene list for our analysis. The enrichment of SINEs in the 5' regions of Hkg suggests that Hkg might be located in the ridge regions of the genome. Hkg and Tsg differ in gene compactness, GC content and the length of intronic and intergenic sequences. More recently, it has been suggested that these contrasting attributes might be involved in chromatin mediated regulation that maintains the distinct expression patterns of the two gene sets [29]. Alu elements have also recently been shown to harbour transcription factor binding sites, and such regulatory elements might influence chromatin structure and gene expression [30].
We have demonstrated that the regulatory regions of housekeeping and tissue specific genes have differential chromatin architecture with respect to S/MAR binding, nucleosome positioning potential and repetitive sequences. This has potential implications for regulation of gene expression in eukaryotic genomes.
In this study, the 5' and 3' flanking regions of genes were analysed for various attributes of chromatin organisation. The list of human housekeeping genes (Hkg) was retrieved from http://www.compugen.co.il/supp_info/Housekeeping_genes.html [28, 36]. These 532 genes had been categorised as housekeeping because of their ubiquitous and high expression levels in 47 tissues. The list and expression levels of the human tissue specific genes were obtained from Eli Eisenberg (personal communication). The 566 genes expressed in only a single tissue were taken as tissue specific genes (Tsg) and analysed. We could unambiguously retrieve the sequences of 525 Hkg and 558 Tsg from human genome build 33 (NCBI). Approximately 2000 bp of the 5' and 3' flanking regions of each of these genes was taken for analysis.
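Extracting the flanks requires attention to strand. For a gene on the minus strand, the 5' flank lies at higher genomic coordinates and must be reverse-complemented. The sketch below assumes that the chromosome sequence is held as a string and that gene coordinates are 0-based and half-open; both are assumptions. It illustrates the idea only and does not reproduce the retrieval from NCBI build 33. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T, N in either case)."""
    return seq.translate(COMPLEMENT)[::-1]


def flanking_regions(chrom_seq: str, start: int, end: int, strand: str,
                     flank: int = 2000) -> tuple[str, str]:
    """Return (five_prime_flank, three_prime_flank), each read 5'->3' on the gene's strand.

    start/end: 0-based, half-open gene coordinates on the + strand (assumed convention).
    Flanks are truncated at the chromosome ends.
    """
    left = chrom_seq[max(0, start - flank):start]  # genomic left of the gene
    right = chrom_seq[end:end + flank]             # genomic right of the gene
    if strand == "+":
        return left, right
    if strand == "-":
        # On the minus strand the 5' flank lies at higher genomic coordinates.
        return reverse_complement(right), reverse_complement(left)
    raise ValueError("strand must be '+' or '-'")
```

Reading both flanks 5' to 3' on the gene's own strand keeps position 0 of the 3' flank, and the last base of the 5' flank, adjacent to the gene. Downstream scans can then treat genes on both strands in the same way.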
Scaffold/matrix associated regions (S/MAR) analysis
MAR Finder was used to predict S/MARs [37, 38], with all default options and the "New MAR Rules" option selected. The ChrClass program was also used for S/MAR prediction [39, 40].
Nucleosome organisation and gene expression correlation analysis
The upstream regions (2000 bp) were scanned with in-house programs for two nucleosome exclusion elements [18, 20]: pure poly (dA.dT) stretches longer than 10 bp, and [5' (CCGNN) 3']2–5. Recon was used to evaluate the nucleosome formation potential of the sequences [2, 41]. The Recon score outputs for the 5' regions were binned in intervals of 0.2 over the range -3.2 to +3.2. Recon scores around +1 and -1 indicate strong nucleosome formation and exclusion potential, respectively. Scores in the four intervals of relevance (0.8 to 1, 1 to 1.2, -0.8 to -1 and -1 to -1.2) were used for all analyses. Promoter region information was not retrieved for these genes. The 2000 bp region upstream of the gene start site was therefore split into 400, 800, 1200 and 1600 bp windows and analysed.
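The binning and windowing described above can be sketched as follows. The sketch rests on three assumptions. First, Recon scores are available as a hypothetical per-position list ordered outward from the gene start. Second, the 400 to 1600 bp windows are nested windows adjacent to the gene start, with the full 2000 bp added for comparison. Third, the intervals are half-open, for example [0.8, 1.0). Recon's actual output format and the interval convention used in the study may differ, and the function names are hypothetical.

```python
import math
from collections import Counter

BIN_WIDTH = 0.2
LOW, HIGH = -3.2, 3.2
N_BINS = round((HIGH - LOW) / BIN_WIDTH)  # 32 bins
# Lower edges of the four intervals of relevance, taken as half-open bins:
# [-1.2, -1.0), [-1.0, -0.8), [0.8, 1.0), [1.0, 1.2)
RELEVANT_LOWER_EDGES = (-1.2, -1.0, 0.8, 1.0)
# Nested windows measured from the gene start (hypothetical reading of the Methods).
WINDOWS = (400, 800, 1200, 1600, 2000)


def lower_edge(score: float) -> float:
    """Lower edge of the 0.2-wide bin containing score; bins span -3.2 to +3.2."""
    if not LOW <= score <= HIGH:
        raise ValueError(f"score {score} outside [{LOW}, {HIGH}]")
    # Round before flooring to absorb floating-point noise at bin edges.
    i = math.floor(round((score - LOW) / BIN_WIDTH, 9))
    i = min(i, N_BINS - 1)  # a score of exactly +3.2 goes into the last bin
    return round(LOW + i * BIN_WIDTH, 1)


def window_profiles(scores: list[float]) -> dict[int, dict[float, int]]:
    """Count scores in each relevant bin for nested windows next to the gene start.

    Assumes scores[0] is the position adjacent to the gene start and the list runs
    upstream; this layout is hypothetical and Recon's own output may differ.
    """
    profiles = {}
    for w in WINDOWS:
        hist = Counter(lower_edge(s) for s in scores[:w])
        profiles[w] = {edge: hist.get(edge, 0) for edge in RELEVANT_LOWER_EDGES}
    return profiles
```

Rounding the bin index before flooring prevents scores that sit exactly on an edge, such as -1.0 or 1.0, from dropping into the bin below because of binary floating-point error.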
Recon scores in the 400 bp window were used to examine the correlation between nucleosome formation potential and expression levels in the two sets of genes. In each set, genes with expression levels below 500 Affymetrix expression units were classified as low expression genes, and genes above 5000 units as high expression genes. We required this minimum ten-fold difference in expression level for a gene to be classified as high or low. This criterion yielded 33 low expression and 35 high expression genes in Hkg, and 416 low expression and 24 high expression genes in Tsg.
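A sketch of the classification and correlation step, under stated assumptions. Expression values are taken to be in Affymetrix units, and genes between 500 and 5000 units are left unclassified. The per-gene nucleosome measure is any chosen summary of the 400 bp window, for instance the fraction of positions in one score interval; this choice is hypothetical. The helper names are also hypothetical, and Pearson's r is computed directly rather than through a statistics library.

```python
import math
from typing import Optional

LOW_CUTOFF = 500    # Affymetrix expression units
HIGH_CUTOFF = 5000  # ten-fold above LOW_CUTOFF


def expression_class(level: float) -> Optional[str]:
    """'low' below 500 units, 'high' above 5000 units, None for genes in between."""
    if level < LOW_CUTOFF:
        return "low"
    if level > HIGH_CUTOFF:
        return "high"
    return None


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def log_expression_correlation(levels: list[float], scores: list[float]) -> float:
    """Correlate log10 expression with a per-gene nucleosome measure (hypothetical choice)."""
    return pearson_r([math.log10(v) for v in levels], scores)
```

Applying `log_expression_correlation` separately to Hkg and Tsg, once per score interval, gives one coefficient per cell of a correlation table. Log-transforming expression first keeps the few very highly expressed genes from dominating the coefficient.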
Repetitive sequence analysis
RepeatMasker (version 20040306-web) was used to calculate the repeat content of the 2000 bp upstream sequences of the two groups of genes.
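The percentage of sequence covered by a repeat family, as reported for Alu, can be computed by merging overlapping hits so that no base is counted twice. The sketch below assumes that repeat coordinates have already been parsed from the RepeatMasker output; the parsing step is not shown. The 1-based inclusive coordinates and the function names are assumptions.

```python
def covered_bp(intervals: list[tuple[int, int]]) -> int:
    """Bases covered by at least one (start, end) interval, 1-based inclusive (assumed)."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end + 1:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend current block
            continue
        if cur_end is not None:
            total += cur_end - cur_start + 1  # close the previous block
        cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def percent_covered(intervals_per_seq: list[list[tuple[int, int]]],
                    seq_lengths: list[int]) -> float:
    """Percentage of all bases in a sequence set covered by repeat hits."""
    covered = sum(covered_bp(iv) for iv in intervals_per_seq)
    return 100.0 * covered / sum(seq_lengths)
```

Calling `percent_covered` on the Hkg and Tsg sets separately gives the coverage for each set. The number of hits per sequence gives the per-gene copy number.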
List of abbreviations
S/MAR: scaffold/matrix attachment regions
Tsg: tissue specific genes
df: degrees of freedom
MG acknowledges the financial support provided by the Council of Scientific and Industrial Research (CSIR), India. SKB thanks the Council of Scientific and Industrial Research, India, and VB thanks the Indian Council of Medical Research (ICMR) for financial assistance through a grant. The authors wish to acknowledge Dr. Beena Pillai, Dr. Rakesh Sharma, Dr. Neeraj Pandey and Dr. Mitali Mukerji for their valuable discussions and suggestions. We would also like to acknowledge Samira for carefully checking the manuscript and helping with it.
- Lemon B, Tjian R: Orchestrated response: a symphony of transcription factors for gene control. Genes Dev 2000, 14: 2551–2569. 10.1101/gad.831000
- Levitsky VG, Podkolodnaya OA, Kolchanov NA, Podkolodny NL: Nucleosome formation potential of eukaryotic DNA: calculation and promoters analysis. Bioinformatics 2001, 17: 998–1010. 10.1093/bioinformatics/17.11.998
- Bode J, Benham C, Knopp A, Mielke C: Transcriptional augmentation: modulation of gene expression by scaffold/matrix-attached regions (S/MAR elements). Crit Rev Eukaryot Gene Expr 2000, 10: 73–90.
- Grewal SI, Moazed D: Heterochromatin and epigenetic control of gene expression. Science 2003, 301: 798–802. 10.1126/science.1086887
- Boeger H, Griesenbeck J, Strattan JS, Kornberg RD: Nucleosomes unfold completely at a transcriptionally active promoter. Mol Cell 2003, 11: 1587–1598. 10.1016/S1097-2765(03)00231-4
- Wolffe AP: Transcriptional activation. Switched-on chromatin. Curr Biol 1994, 4: 525–528. 10.1016/S0960-9822(00)00114-7
- Glazko GV, Rogozin IB, Glazkov MV: Comparative study and prediction of DNA fragments associated with various elements of the nuclear matrix. Biochim Biophys Acta 2001, 1517: 351–364.
- Glazko GV, Koonin EV, Rogozin IB, Shabalina SA: A significant fraction of conserved noncoding DNA in human and mouse consists of predicted matrix attachment regions. Trends Genet 2003, 19: 119–124. 10.1016/S0168-9525(03)00016-7
- Versteeg R, van Schaik BD, van Batenburg MF, Roos M, Monajemi R, Caron H, Bussemaker HJ, van Kampen AH: The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. Genome Res 2003, 13: 1998–2004. 10.1101/gr.1649303
- Lercher MJ, Urrutia AO, Hurst LD: Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet 2002, 31: 180–183. 10.1038/ng887
- de Laat W, Grosveld F: Spatial organisation of gene expression: the active chromatin hub. Chromosome Res 2003, 11: 447–459. 10.1023/A:1024922626726
- Byrd K, Corces VG: Visualization of chromatin domains created by the gypsy insulator of Drosophila. J Cell Biol 2003, 162: 565–574. 10.1083/jcb.200305013
- Nabirochkin S, Ossokina M, Heidmann T: A nuclear matrix/scaffold attachment region co-localizes with the gypsy retrotransposon insulator sequence. J Biol Chem 1998, 273: 2473–2479. 10.1074/jbc.273.4.2473
- Kim JM, Kim JS, Park DH, Kang HS, Yoon J, Baek K, Yoon Y: Improved recombinant gene expression in CHO cells using matrix attachment regions. J Biotechnol 2004, 107: 95–105. 10.1016/j.jbiotec.2003.09.015
- Bonifer C, Yannoutsos N, Kruger G, Grosveld F, Sippel AE: Dissection of the locus control function located on the chicken lysozyme gene domain in transgenic mice. Nucleic Acids Res 1994, 22: 4202–4210.
- Yusufzai TM, Felsenfeld G: The 5'-HS4 chicken beta-globin insulator is a CTCF-dependent nuclear matrix-associated element. Proc Natl Acad Sci U S A 2004, 101: 8620–8624. 10.1073/pnas.0402938101
- Khorasanizadeh S: The nucleosome: from genomic organisation to genomic regulation. Cell 2004, 116: 259–272. 10.1016/S0092-8674(04)00044-3
- Suter B, Schnappauf G, Thoma F: Poly(dA.dT) sequences exist as rigid DNA structures in nucleosome-free yeast promoters in vivo. Nucleic Acids Res 2000, 28: 4083–4089. 10.1093/nar/28.21.4083
- Koch KA, Thiele DJ: Functional analysis of a homopolymeric (dA-dT) element that provides nucleosomal access to yeast and mammalian transcription factors. J Biol Chem 1999, 274: 23752–23760. 10.1074/jbc.274.34.23752
- Wang YH, Griffith JD: The [(G/C)3NN]n motif: a common DNA repeat that excludes nucleosomes. Proc Natl Acad Sci U S A 1996, 93: 8863–8867. 10.1073/pnas.93.17.8863
- Iyer V, Struhl K: Poly(dA:dT), a ubiquitous promoter element that stimulates transcription via its intrinsic DNA structure. Embo J 1995, 14: 2570–2579.
- Kirkpatrick DT, Wang YH, Dominska M, Griffith JD, Petes TD: Control of meiotic recombination and gene expression in yeast by a simple repetitive DNA sequence that excludes nucleosomes. Mol Cell Biol 1999, 19: 7661–7671.
- Grover D, Mukerji M, Bhatnagar P, Kannan K, Brahmachari SK: Alu repeat analysis in the complete human genome: trends and variations with respect to genomic composition. Bioinformatics 2004, 20: 813–817. 10.1093/bioinformatics/bth005
- Jordan IK, Rogozin IB, Glazko GV, Koonin EV: Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends Genet 2003, 19: 68–72. 10.1016/S0168-9525(02)00006-9
- Brahmachari SK, Meera G, Sarkar PS, Balagurumoorthy P, Tripathi J, Raghavan S, Shaligram U, Pataskar S: Simple repetitive sequences in the genome: structure and functional significance. Electrophoresis 1995, 16: 1705–1714. 10.1002/elps.11501601283
- Grover D, Majumder PP, Rao CB, Brahmachari SK, Mukerji M: Nonrandom distribution of alu elements in genes of various functional categories: insight from analysis of human chromosomes 21 and 22. Mol Biol Evol 2003, 20: 1420–1424. 10.1093/molbev/msg153
- Vijaya S, Steffen DL, Robinson HL: Acceptor sites for retroviral integrations map near DNase I-hypersensitive sites in chromatin. J Virol 1986, 60: 683–692.
- Eisenberg E, Levanon EY: Human housekeeping genes are compact. Trends Genet 2003, 19: 362–365. 10.1016/S0168-9525(03)00140-9
- Vinogradov AE: Compactness of human housekeeping genes: selection for economy or genomic design? Trends Genet 2004, 20: 248–253. 10.1016/j.tig.2004.03.006
- Oei SL, Babich VS, Kazakov VI, Usmanova NM, Kropotov AV, Tomilin NV: Clusters of regulatory signals for RNA polymerase II transcription associated with Alu family repeats and CpG islands in human promoters. Genomics 2004, 83: 873–882. 10.1016/j.ygeno.2003.11.001
- Nemeth A, Langst G: Chromatin higher order structure: opening up chromatin for transcription. Brief Funct Genomic Proteomic 2004, 2: 334–343.
- Pennisi E: Human genome. A low number wins the GeneSweep Pool. Science 2003, 300: 1484. 10.1126/science.300.5625.1484b
- Brahmachari SK, Ramesh N, Shouche YS, Mishra RK, Bagga R, Meera G: Unusual DNA Structures: Sequence Requirements and Role in Transcriptional Control. In Structure and Methods. Volume 2. Edited by: Sarma RH, Sarma MH. New York: Adenine Press; 1990:33–49.
- Conrad M, Brahmachari SK, Sasisekharan V: DNA structural variability as a factor in gene expression and evolution. Biosystems 1986, 19: 123–126. 10.1016/0303-2647(86)90024-9
- Gilbert N, Boyle S, Fiegler H, Woodfine K, Carter NP, Bickmore WA: Chromatin architecture of the human genome: gene-rich domains are enriched in open chromatin fibers. Cell 2004, 118: 555–566. 10.1016/j.cell.2004.08.011
- List of housekeeping genes [http://www.compugen.co.il/supp_info/Housekeeping_genes.html]
- Singh GB, Kramer JA, Krawetz SA: Mathematical model to predict regions of chromatin attachment to the nuclear matrix. Nucleic Acids Res 1997, 25: 1419–1425. 10.1093/nar/25.7.1419
- MAR Finder [http://futuresoft.org/MAR-Wiz/]
- Rogozin IB, Glazko GV, Glazkov MV: Computer prediction of sites associated with various elements of the nuclear matrix. Brief Bioinform 2000, 1: 33–44.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.