Genome-scale study of the importance of binding site context for transcription factor binding and gene regulation
© Westholm et al; licensee BioMed Central Ltd. 2008
Received: 27 June 2008
Accepted: 17 November 2008
Published: 17 November 2008
The rate of mRNA transcription is controlled by transcription factors that bind to specific DNA motifs in promoter regions upstream of protein coding genes. Recent results indicate that not only the presence of a motif but also motif context (for example the orientation of a motif or its location relative to the coding sequence) is important for gene regulation.
In this study we present ContextFinder, a tool that is specifically aimed at identifying cases where motif context is likely to affect gene regulation. We used ContextFinder to examine the role of motif context in S. cerevisiae both for DNA binding by transcription factors and for effects on gene expression. For DNA binding we found significant patterns of motif location bias, whereas motif orientations did not seem to matter. Motif context appears to affect gene expression even more than it affects DNA binding, as biases in both motif location and orientation were more frequent in promoters of co-expressed genes. We validated our results against data on nucleosome positioning, and found a negative correlation between preferred motif locations and nucleosome occupancy.
We conclude that the requirement for stable binding of transcription factors to DNA and their subsequent function in gene regulation can impose constraints on motif context.
Regulation of gene expression enables cells to respond to external signals such as nutrient availability, stress and signalling molecules, and also allows cells in multicellular organisms to differentiate into different cell types. Gene expression is regulated on many different levels such as chromatin structure, splicing of RNA and post-translational protein modifications, but the most important regulatory step takes place at the level of transcription. The rate of transcription is controlled by transcription factors (TFs) that bind to specific DNA sequences (called motifs in the following) in promoter regions upstream of the transcribed sequences. TFs bound to their designated DNA sites can regulate transcription by interacting with the basal transcription machinery or with co-factors, by modifying chromatin structure or by blocking or facilitating access to the DNA for other TFs. The motifs bound by TFs are thus important components in the regulation of gene expression, as they determine which genes different TFs will regulate. Binding sites for many TFs have been characterized [1, 2] and several computational approaches have been developed to identify conserved DNA motifs in promoters of co-regulated genes [3–9]. However, the mere presence of a TF-binding motif in a promoter is not sufficient to guarantee it is bound by this TF in vivo. In fact, most TF-binding motifs found in promoters have no documented effects on gene expression.
An additional level of complexity comes from the presence of multiple distinct motifs in the same promoter. This can increase the number of possible gene expression patterns, and enables cells to fine-tune the response to different conditions. Moreover, since different TFs can modulate each other's DNA binding and/or activity, the location of different motifs with respect to each other (the promoter context) is also important. Several previous studies [3, 10–17] have examined the combinatorial aspects of gene regulation. However, interest has recently focused on the importance of motif context, i.e. how geometric constraints such as the location or orientation of a motif can affect gene expression. Genome-wide localization studies have shown patterns of localization of TFs to motifs closely upstream of transcription start sites . When the overall distribution of motifs in promoters bound by TFs was plotted, enrichment within a region a few hundred bp upstream of the start codon was found . Some de novo motif finding tools (e.g. [4, 20]) used conservation of location as a selection criterion when searching for novel motifs in promoters of co-regulated genes. In an effort to predict gene expression patterns from promoter sequence in yeast , motif context in the form of location and orientation was also included in the model. Regulation was modelled separately for groups of co-expressed genes ("regulons"). However, a later study  showed that including motif context into the models did not improve predictions of gene expression. Another study  modelled the influence of motifs in different contexts on yeast gene expression, without partitioning genes into different sets. Both  and  took into account the orientations of the motifs and their locations relative to the start codons. The model used in  also included combinations of motifs. In  the location of motifs was analyzed on a global scale, in promoters of genes sharing functional annotations in human and mouse. 
One study  also examined the importance of motif context for combinatorial gene regulation, by studying distances between pairs of motifs. A recent study  presented a motif finding approach where the discovered motifs were further characterized in terms of location and orientation bias. However, none of the above studies has carried out an examination on a global scale where patterns of motif location and orientation relative to the coding sequence were correlated with TF-DNA interactions as well as with gene expression.
In addition to factors such as the locations and orientations of TF-binding motifs, nucleosome occupancy in promoters is also an important predictor of the biological effects of these sites. In most cases, nucleosomes inhibit transcription by blocking access to DNA so that TFs and the basal transcription machinery cannot bind. Consistent with this, promoters of highly transcribed genes are usually depleted of nucleosomes as compared to genes with lower expression [13, 25, 26]. Moreover, active TF binding sites that are bound by TFs are usually depleted of nucleosomes as compared to inactive (cryptic) sites [27, 28].
Except for one study , the studies mentioned above were carried out in yeast. The yeast S. cerevisiae has been the organism of choice when studying regulation of gene expression in eukaryotes. There are several reasons for this, such as the availability of genome wide data on mRNA transcription (for example [29–31]) and TF-DNA interactions [19, 32], the availability of knockout mutants for all yeast genes, including all TFs, and the fact that yeast has a compact genome with small and well-defined promoters.
In this study we have carried out a genome-scale examination of the importance of motif context for both TF-DNA interactions and gene expression in S. cerevisiae. This was done using ContextFinder, a new tool we have designed to identify cases where motif context is likely to be important for gene regulation. For the purpose of this study, we define motif context as the location and the orientation of the motif relative to the start codon, since the distance between transcription start site and start codon is usually fixed in yeast [33–35], and since the position of the start codon is always known (see Methods). It is worth pointing out that the problem investigated in this study differs from the one discussed in previous studies [12, 22], where the aim was to model gene expression, and information about motif context was included in the models. Here, instead of modelling gene expression, we are interested in finding and characterizing cases where motif location and orientation appear to be important for gene regulation, irrespective of the details of this regulation. Our approach is thus related to those used in  and . However, our study differs in two aspects. The first aspect is the data. Tabach et al.  primarily used groups of genes sharing a functional annotation to approximate co-regulation, and also investigated the effects of the locations of six specific motifs on gene expression. The study by Elemento et al.  examined the orientation and location of 23 yeast motifs in connection with gene expression data. In contrast, we have examined 150 yeast motifs both in co-expressed promoters and in promoters bound by the same TF. The data used in our study cover a wider range of yeast motifs and are closely connected to the biological function of the motifs in terms of both TF binding and gene regulation. Consequently, basing analysis on these data is likely to provide a more accurate picture of the effects of motif context.
The other new aspect in our work is methodological: The method used in  was based on performing separate tests for motif enrichment within different regions of the promoter. This results in many p-values (one for each region), without any obvious statistical interpretation with regard to the overall bias in motif location. Moreover, that study did not consider motif orientation. The method used in  used a randomization test to provide a single p-value for location bias. However, no significance measure was provided for the orientation bias. Instead, orientation bias was reported if one orientation of a motif contained significant information about gene expression (compared to a threshold) but not the other orientation. A drawback of that approach is that the two orientations are not compared directly to each other, but only to the significance threshold. In contrast the method presented here fits a model to the motif distribution and specifically looks for differences in orientation and location between a set of active promoters and a background set of promoters. Two p-values are returned, one for bias in location and one for bias in orientation, making the results easy to interpret.
We have developed a method, implemented in a program called ContextFinder that can identify cases where motif context is likely to be important for gene regulation. The basic idea behind ContextFinder is to look for differences between a selected set of promoters (for example promoters bound by a given TF or promoters of co-expressed genes) and a control set (typically all other promoters except the selected set). The differences of interest to us are the locations and orientations of a specific motif. This tool is then used together with experimental data to study how common location and orientation bias is, for DNA binding and for regulation of gene expression.
Data and Procedure
ContextFinder takes as input a selected set of promoters, a control set and a motif. The underlying assumption is that motifs found in the selected set are biologically active in some way (for instance, by binding TFs and/or regulating gene expression) while motifs in the control set are not. We proceed to determine if the distribution of motifs in the selected set of promoters is significantly different from the control set. This is done by fitting a model to the data in which the motif frequency depends on the set that the promoter belongs to, the location within the promoter, the orientation of the motif and interactions between these factors. Significance in the form of p-values for location bias (difference in location between the selected set and the control set) and orientation bias (difference in orientation between the two sets) are then computed from the model. For a detailed description of the procedure, see the Methods section. A web interface to ContextFinder is available at .
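The input to such a model can be pictured as a table of counts per promoter set, motif orientation and location bin. The following is a minimal Python sketch of that tabulation, not the ContextFinder implementation (which is written in R). The function name `count_table`, the input layout and the 0-based distance convention are assumptions made for illustration; only the 100 bp bin size is taken from the Methods.

```python
from collections import Counter

BIN_SIZE = 100  # bp per location bin (the Methods mention bins of 100 bp)


def count_table(promoters, hits):
    """Tabulate motif-containing promoters per (group, orientation, bin).

    promoters: dict name -> (group, length_bp); group is "selected" or "control"
    hits: iterable of (name, orientation, distance_bp), where orientation is
          "+" or "-" and distance_bp is the 0-based distance upstream of the
          start codon (an assumed convention)
    Returns (y, n): y[(group, orientation, bin)] is the number of promoters
    with at least one hit in that cell; n[(group, bin)] is the number of
    promoters long enough to reach that bin.
    """
    y = Counter()
    seen = set()
    for name, orientation, distance in hits:
        group, length = promoters[name]
        cell = (name, orientation, distance // BIN_SIZE)
        # Count each promoter at most once per orientation and bin.
        if distance < length and cell not in seen:
            seen.add(cell)
            y[(group, orientation, distance // BIN_SIZE)] += 1
    n = Counter()
    for group, length in promoters.values():
        # A 250 bp promoter is available in bins 0, 1 and 2.
        for b in range((length + BIN_SIZE - 1) // BIN_SIZE):
            n[(group, b)] += 1
    return y, n
```

Because promoters have variable length, tracking the number of available promoters per bin separately from the hit counts keeps short promoters from being mistaken for a depletion of motifs far upstream.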
To carry out a genome-wide study of motif context that goes beyond looking at a few individual examples, we used a comprehensive list of known yeast motifs  together with sets of genes derived from data describing DNA binding of TFs [19, 32] and gene co-expression data . All motifs were tested against all sets of genes in order to identify cases where a known motif is enriched in a given set of promoters. ContextFinder was then applied to all such cases (in total, 280 for the TF binding data and 23 for the gene expression data). We focused our studies on protein encoding genes for two reasons. Firstly, the vast majority of all TFs are involved in regulating such genes, which accounts for much of the complexity in gene expression. Secondly, it is easy to define the location of a motif by using the start of the open reading frame as a point of reference, even if the transcription start site has not been mapped for a given gene.
Motif location is important for DNA binding of transcription factors
Table 1. Frequencies of location and orientation bias in motif-experiment pairs. [Table body not preserved; surviving labels: column "Nr of pairs examined", row "DNA binding (only unique promoters)".]
Cases of divergently transcribed genes, where the DNA binding data from the shared promoter region is mapped to both genes, are a potential problem. In such cases a certain motif may be important only for regulation of one of the two genes, and it is its position with respect to the coding region of that gene that matters. The contribution from the other gene will obscure patterns of location and orientation bias. To avoid this problem, we also performed the analysis on a subset of the DNA binding data, where only promoters that were mapped to a single gene were considered. The results are shown in the second row of Table 1. Although fewer motif-experiment instances were examined in this case, the overall results were similar.
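Table 2. Significance of location and orientation bias for selected motif-experiment pairs. [Numeric columns, including "Nr selected promoters" and the p-values, not preserved; the motif-experiment pairs listed in the table follow.]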
ABF1_Lee reduce (cgtnnnnnntga) in ABF1_YPD
RAP1_YPD (ccrtaca) in RAP1_YPD
GCN4_Lee reduce (gagtca) in GCN4_YPD
MBP1_lee reduce (acgcgt) in MBP1_YPD
GAL4_lee reduce (cggnnnnnnnnnnnccg) in GAL4_RAFF
PAC_ESR reduce (cgatgag) in group 4
rRPE_ESR reduce (aaaattt) in group 4
RAP1_YPD (ccrtaca) in group 1
MBP1_lee reduce (acgcgt) in group 30
Motif location and orientation is important for effects on gene expression
A different question from the effects of motif position or orientation on TF binding is whether sets of co-expressed genes also show a bias for location or orientation for TF-binding motifs that are shared by these genes. It should be noted that for a given TF to regulate its target genes, it not only has to be able to bind to the DNA, but also has to interact correctly with other molecules, such as the basic transcription machinery and various co-factors. These interactions may introduce additional constraints on motif location or orientation. We therefore expected location or orientation bias for TF-binding motifs to be even more common among promoters of co-regulated genes than among promoters that simply bind a given TF. As shown in Table 1, this is indeed the case. Thus, out of the 23 motif-group pairs that we examined, we found that 57% (13 pairs) exhibit location bias and 22% (5 pairs) orientation bias (Table 1, third row). These numbers are higher than those associated with just DNA binding (see above). In particular, we note that orientation bias seems to be more common among co-expressed genes, as it was not seen when looking at just DNA binding. These results are in accordance with , where location and orientation bias were also frequently correlated with co-expression. Below we discuss some examples of sets of co-regulated genes that show location and/or orientation bias for TF-binding motifs (the corresponding p-values are shown in Table 2):
In addition to the examples discussed above, location and/or orientation bias was found for the following TF binding motifs: Fkh1/2, Hap4, Msn2/4, Rpn4, and Yap1. The complete results can be found in additional file 2: Table S2.
Preferred motif locations are negatively correlated with nucleosome occupancy
It is becoming increasingly clear that the mere presence of a TF-binding motif in a promoter is not sufficient for correct gene regulation by that TF in vivo, but that the promoter context within which a motif is found also may have a significant effect. The short motifs recognized by TFs, typically six bp or less, are ubiquitously found in genomes, but only a small fraction of these motifs have been shown to be involved in gene regulation. Genome-wide location studies  have shown clear patterns of location bias in motifs bound by TFs in vivo. The study by Nguyen et al.  showed, for a few selected examples, that the same TF binding motif can have different effects on gene expression depending on the location and orientation of the motif. Tabach et al.  showed that promoters of genes sharing functional annotations in the human and the mouse are often enriched for motifs in a region close to the transcription start site. Moreover, the study by Elemento et al.  found that location and orientation bias was common among yeast motifs (but interestingly not P. falciparum motifs) in promoters of co-expressed genes. On the other hand, Yuan et al.  found that including information about motif context in their model did not improve predictions concerning gene expression. However, as pointed out by the authors themselves, this does not necessarily mean that motif context is biologically unimportant. The lack of predictive power when motif context was included in the model could be explained by increased model complexity, which makes training a general model more difficult. This is especially true for large scale models that intend to cover all the regulatory events in a cell, such as the one used in . Thus, the question of how the promoter context influences the biological effects of TF-binding motifs is still largely unsolved. 
Our study presents the first genome-scale examination where both motif location and orientation are correlated with TF-DNA interactions as well as with co-expression data. For this, we have developed a new tool, ContextFinder. It is specifically aimed at finding and characterizing biologically significant differences in motif context on a genome-wide scale.
ContextFinder is based on a sound statistical framework (see Methods) and works with a wide range of data. ContextFinder does not require any parameter tuning; all that is required is one or several sequence motifs, a set of promoters that has been chosen for study, and a control set to which this set is compared. The set of promoters can be obtained from DNA binding data, expression data, or in some other way. The output of the method is the significance, in the form of p-values, of biases in motif location and orientation. Estimating the performance of ContextFinder is difficult since in general we cannot tell whether a given location or orientation bias is "true" or "false" in the sense that it reflects a biologically important preference that has been selected during evolution. Given that our statistical model is sound, we expect a false discovery rate of 5%. Thus, we expect the majority of the instances of location and orientation bias that are found by ContextFinder to be "true" positives. It is harder to estimate the number of false negatives, since there are a number of possible error sources. One comes from the pre-selection step where we only consider cases of motifs significantly enriched in a given set of genes. This means that we may remove some "true" positives from the subsequent analysis. Another source of error is the lack of a sufficient number of motif occurrences to obtain good statistics. For small sets of promoters, or for long and specific motifs, such scarcity of data can lead to "false" negatives. For these reasons, we expect our procedure to be rather conservative.
An overview of how common location and orientation bias is when our method is applied to sets of promoters chosen either from TF-DNA interaction data or from gene expression data is shown in Table 1. Although these numbers depend on the experimental details in each case, they can still provide an estimate of how important motif context is for DNA binding by TFs and their effects on gene expression, respectively. Our results suggest that motif location (but not motif orientation) frequently is important for DNA binding by TFs. Most TFs with location constraints seem to have a preference for motifs that are located 101–400 bp upstream of the coding sequence, which is close to the transcription start sites (located approximately 70 bp upstream of the start codon). This may indicate that, for many TFs, interactions with the basal transcriptional machinery are required for stable binding to DNA. However, some TFs, such as Gal4, seem to prefer motifs further upstream.
Unlike the case with DNA binding, when we examined sets of co-expressed genes, we also found bias in the orientation of TF binding motifs. Location bias was also more common among promoters of co-expressed genes, than among promoters that simply share the fact that they bind the same TF. These results seem intuitive, since the activity of a TF in gene regulation involves not only its binding to DNA (which as we have seen above imposes constraints on motif location), but also interaction with other molecules and complexes such as the basal transcriptional machinery or co-factors: This may introduce additional constraints on the location and orientation of the motif.
It should be noted that by using ContextFinder on DNA binding data together with expression data it is possible to draw conclusions concerning the likely source(s) of any context biases found for a given motif. For example, if a motif context is important already for DNA binding, and does not change in the expression data, it is likely that the motif context is required for stable DNA binding. On the other hand, in cases where motif context is important only for gene expression, but not for DNA binding of a TF, we can infer that it is the processes subsequent to DNA binding by the TF that require a specific motif context. Finally, there may exist cases where some context bias is seen in DNA binding, with further constraints apparently being imposed for the TF to be active in gene regulation. Rap1 and Mbp1 are examples of this.
We have validated our results against two independent data sources. The first is global gene expression data from yeast deletion strains that lack individual TFs . Our results show that there is a significant difference in the effects of these TF deletions on the expression of genes containing binding motifs for the given TF in either the preferred location or in other locations. We have further shown that there is an anti-correlation between motif occurrence and nucleosome occupancy, so that TF-binding motifs in preferred locations are depleted of nucleosomes as compared to motifs in other locations. Similar results were obtained in  for a few examples (Abf1, Reb1 and Mbp1) where the motifs clustered to a region within 80–100 bp of the transcription start site. Since we used additional data to distinguish between biologically active and cryptic motifs, we found many more cases of anti-correlation between nucleosome occupancy and the locations of motifs (see additional file 3: Table 3), also for motifs that are preferentially located further upstream than 100 bp. We interpret the anti-correlation between motif occurrence and nucleosome occupancy, as well as the observed differences in gene expression that correlate with the locations or orientations of motifs, as evidence that motif context in these cases has biological relevance.
There are several possible mechanisms by which motif context could affect DNA binding or activity of individual TFs. Since all TFs studied here tend to bind within 600 bp upstream of the start codon (and most within 400 bp), interactions with the basic pol II transcription machinery are likely to be important. The cases of orientation bias that we found for sets of co-expressed genes could also be due to interactions with the pol II complex or with co-factors, which require the TF to be positioned in a certain way. It is also possible that the induced changes in DNA conformation that are needed for gene regulation, such as DNA bending or unwinding, may impose constraints on the locations and orientations of TF-binding sites. One obvious case is binding of TBP to the TATA-box, a motif which shows strict orientation bias. As for the effects of nucleosome positioning, the region immediately (1–200 bp) upstream of the transcription start site is usually depleted of nucleosomes [26, 28]. Since this region is also enriched for many TF binding sites (e. g. Abf1, Reb1, Mbp1) it may be the case that the ability to bind DNA, which is determined by nucleosome positioning, is the reason why motif context is important for these TFs. However, this does not apply to other TFs, such as Gal4, Rap1 and Swi4, whose binding sites are found further upstream in regions with high nucleosome occupancy. Thus, it is likely that several different mechanisms contribute to the observed biases in motif location and orientation.
In this paper we have presented a new method to identify constraints on motif location and orientation that may be imposed by the need for stable DNA binding and/or the regulatory functions of transcription factors. Our method is based on a generalized linear model, and outputs p-values describing the significance of any biases in motif location and orientation.
We used this method to analyse 303 cases of motifs enriched in experimentally selected groups of yeast promoters. Bias in motif location was found to be common for motifs that were enriched in promoters identified as being bound by a specific TF in TF-DNA interaction experiments, whereas bias in both location and orientation was found for motifs enriched in promoters of co-expressed genes. Furthermore, motifs in preferred locations were depleted of nucleosomes, compared to motifs in other locations. These results suggest that motif context is likely to be an important mechanism responsible for TF specificity in gene regulation.
We conclude that when using motif information to predict gene regulatory relationships, information about motif locations and orientations may have to be considered in addition to the mere presence or absence of a motif. We provide the first generally available method to find and characterize biases in motif context, which may easily be accessed through a web interface.
Modelling binding site occurrences
Here y_{g,o,l} is the number of promoters containing the motif, n_{g,l} is the number of available promoters, μ is the intercept, α_g is the effect of promoters belonging to the group g, β_o is the effect of motif orientation o, and χ_l is the effect of the location l. The model also contains interaction effects: (αβ)_{g,o} between group and orientation, (αχ)_{g,l} between group and location and (βχ)_{o,l} between orientation and location. After the data has been fitted to the model, the null hypothesis that each coefficient is equal to zero is tested, using the residual deviance. For each coefficient, the residual deviance follows a χ2 distribution (with the same number of degrees of freedom as the coefficient), which enables us to compute a p-value . The coefficients of interest to us are (αβ)_{g,o} (orientation bias, indicating differences in orientation between the two sets of promoters) and (αχ)_{g,l} (location bias, indicating differences in location). These coefficients were considered significant if the corresponding p-value was below a given threshold. Since many pairs of motifs and promoter groups were considered, the p-values were adjusted for multiple hypothesis testing . The threshold used in our analysis corresponds to a false discovery rate of 0.05.
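The model equation itself is not reproduced in this text. From the symbols defined above, together with the Poisson assumption and the generalized linear model stated elsewhere in the paper, one consistent form would be the following log-linear model. The log link and the use of log n_{g,l} as an offset are assumptions made for this reconstruction; the original notation may differ.

```latex
y_{g,o,l} \sim \mathrm{Poisson}\left(\lambda_{g,o,l}\right), \qquad
\log \lambda_{g,o,l} = \log n_{g,l} + \mu + \alpha_g + \beta_o + \chi_l
  + (\alpha\beta)_{g,o} + (\alpha\chi)_{g,l} + (\beta\chi)_{o,l}
```

Under this form, a non-zero (αχ)_{g,l} means that the location profile of the motif differs between the selected and control sets, and a non-zero (αβ)_{g,o} means that the strand preference differs.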
As a test of whether it was reasonable to assume a Poisson distribution, we checked for over-dispersion. Dispersion values were computed by dividing the residual deviance from the full model by the degrees of freedom . In ~95% of the cases the dispersion was below 2. For the 5% of cases with higher dispersion, the p-values of the coefficients were adjusted accordingly . This procedure did not change the overall results significantly. The program ContextFinder implements this method (in R). A web interface to the program is available at  and the source code is available upon request (and in the process of submission to BioConductor ).
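The dispersion check described above can be sketched in a few lines of Python. This is an illustration rather than the ContextFinder code: the function names are hypothetical, the fitted values are assumed to come from an already fitted Poisson model, and the adjustment applied to the p-values of over-dispersed cases is not shown because the text does not specify it.

```python
import math


def poisson_deviance(observed, fitted):
    """Residual deviance of a Poisson model: 2 * sum(y*log(y/mu) - (y - mu))."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        # The y*log(y/mu) term is taken as 0 when y == 0 (its limit).
        term = y * math.log(y / mu) if y > 0 else 0.0
        total += term - (y - mu)
    return 2.0 * total


def dispersion(observed, fitted, n_parameters):
    """Residual deviance divided by the residual degrees of freedom."""
    residual_df = len(observed) - n_parameters
    return poisson_deviance(observed, fitted) / residual_df


def is_overdispersed(value, threshold=2.0):
    """Flag cases above the dispersion level of 2 mentioned in the text."""
    return value > threshold
```

A dispersion close to 1 is what a Poisson model predicts; values well above 1 indicate more variability than the model accounts for, which makes unadjusted p-values too optimistic.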
All available yeast promoter regions were retrieved from the RSAT database . The promoter regions ranged from the start codon and 800 bp upstream or until the next ORF was reached, resulting in sequences of variable length. Since the distance between start codon and transcription start site is usually fixed (at around 70 bp) in S. cerevisiae [33–35], we used the start codon, which is easier to locate, instead of the transcription start site. This is not likely to have a major effect on the results, particularly since we use bins of 100 bp in our analysis. As the set of motifs to analyze, we used a list of 150 putative TF binding sites (represented as IUPAC strings) from , along with a few additional motifs, such as the PDS element .
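To make the location and orientation bookkeeping concrete, here is a minimal Python sketch of scanning one promoter for an IUPAC motif on both strands and assigning each hit to a 100 bp bin. It is not taken from ContextFinder. The function names, the choice to measure distance from the 3' end of the match to the start codon, and the handling of palindromic motifs are assumptions made for illustration.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "[ag]", "y": "[ct]", "s": "[cg]", "w": "[at]",
    "k": "[gt]", "m": "[ac]", "b": "[cgt]", "d": "[agt]",
    "h": "[act]", "v": "[acg]", "n": "[acgt]",
}
COMPLEMENT = str.maketrans("acgtrykmswbdhvn", "tgcayrmkswvhdbn")


def reverse_complement(motif):
    """Reverse complement of an IUPAC string, e.g. ccrtaca -> tgtaygg."""
    return motif.lower().translate(COMPLEMENT)[::-1]


def to_regex(motif):
    """Compile an IUPAC motif; the lookahead also reports overlapping hits."""
    body = "".join(IUPAC[ch] for ch in motif.lower())
    return re.compile("(?=(" + body + "))")


def scan(promoter, motif, bin_size=100):
    """Return (orientation, distance_bp, bin) for every motif occurrence.

    promoter: upstream sequence written 5'->3', ending just before the
              start codon (assumed layout)
    """
    seq = promoter.lower()
    forward = motif.lower()
    hits = []
    for orientation, pattern in (("+", forward), ("-", reverse_complement(forward))):
        if orientation == "-" and pattern == forward:
            continue  # palindromic motif: orientation is undefined
        for match in to_regex(pattern).finditer(seq):
            end = match.start() + len(pattern)
            distance = len(seq) - end  # bp between motif and start codon
            hits.append((orientation, distance, distance // bin_size))
    return hits
```

Palindromic motifs such as acgcgt read the same on both strands, so a sketch like this can only report location for them, not orientation.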
To identify promoters that are bound by a specific TF, ChIP-chip data from 350 experiments (using different TFs and growth conditions) from Lee et al.  and Harbison et al.  were used. For each experiment, all promoters with p-values below 0.01 were considered to be bound by the TF, and all other promoters were used as the control set.
To examine promoters of co-regulated genes, the grouping of genes from  was used. The genes in that study were clustered on expression data from two studies: response to different types of environmental stress  and progression through the cell cycle . This resulted in 49 sets of genes. For each set, the promoters in all other sets were used as the control set.
The next step was to find motif-experiment pairs that could be used for further analysis, i.e. where the motif was significantly enriched among the selected promoters. Motif enrichment was tested using a one-sided hypergeometric test on the number of selected promoters with and without the motif, compared to the number of control promoters with and without the motif. Since the number of tested motif-experiment pairs was large, the threshold for motif enrichment was set to a rather strict value of 1e-8. This resulted in 292 motif-experiment pairs from the DNA binding data, and 26 motif-gene set pairs from the gene expression data. These pairs were then tested for context dependence.
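The enrichment test can be illustrated with a short, self-contained Python sketch of a one-sided hypergeometric test. The function name and argument layout are hypothetical; only the test itself and the 1e-8 threshold come from the text.

```python
from math import comb


def hypergeom_enrichment_p(selected_with, selected_total, control_with, control_total):
    """One-sided hypergeometric p-value for motif enrichment.

    Probability of seeing at least `selected_with` motif-containing promoters
    when `selected_total` promoters are drawn without replacement from the
    pooled selected and control promoters.
    """
    population = selected_total + control_total
    with_motif = selected_with + control_with
    upper = min(with_motif, selected_total)
    # Exact integer arithmetic; converted to a float only at the end.
    tail = sum(
        comb(with_motif, k) * comb(population - with_motif, selected_total - k)
        for k in range(selected_with, upper + 1)
    )
    return tail / comb(population, selected_total)


ENRICHMENT_THRESHOLD = 1e-8  # strict cut-off quoted in the text
```

A motif-experiment pair would be passed on to the context analysis only if the returned p-value falls below `ENRICHMENT_THRESHOLD`.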
When groups of promoters are analyzed together for motif context there is a risk that the results will be misleading if the promoters are highly conserved. Thus, if there is high sequence conservation among a group of promoters, the location and orientation bias that we may find will not be informative, since such bias would be detected for almost any sequence present in the promoters. To handle this, we checked for conservation for each analyzed motif-experiment pair that had a significant location or orientation bias, by aligning all selected promoters containing the motif of interest. The alignment was done in ClustalW (implemented in the R-library dna, ), using default parameters (gap opening penalty 15 and gap extension penalty 6). Twelve cases from the DNA binding data and three from the expression data with highly conserved promoters were removed from the subsequent analysis. See the additional file 1: Table S1 and additional file 2: Table S2.
Validation against other datasets
To check whether motif position had any effect on gene expression, microarray data from yeast deletion strains  was used. T-tests were used to compare expression of genes whose promoters contained the motif of interest in the preferred location, against expression of genes whose promoters contained the motif in some other location.
where p_i is the number of available promoters at i base pairs upstream of the start codon and M_i is defined as above. We then computed the correlation between n and k for each motif-experiment pair. The correlations for cases with and without location bias were then compared using Wilcoxon's rank-sum test.
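The final comparison step can be sketched as follows. This is a plain Python illustration of a two-sided Wilcoxon rank-sum test using the normal approximation, with mid-ranks for ties and no tie or continuity correction. These simplifications are assumptions; the exact variant used in the study is not stated. The inputs would be the per-pair correlations for cases with and without location bias.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (w, z, p): the rank sum of sample x, its z-score and the
    two-sided p-value.
    """
    pooled = sorted((value, index) for index, value in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = mid_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w, z, p
```

For small samples or many ties an exact or tie-corrected test would be preferable; the normal approximation is used here only to keep the sketch free of external dependencies.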
The authors wish to thank Lars Rönnegård for helpful discussions. This work was supported by a grant from the WCN program of the Knut and Alice Wallenberg Foundation to JK and HR, by grants from the Swedish Foundation for Strategic Research to JK and HR, and by a grant from the Swedish Research Council VR to HR.
- Sandelin A, Alkema W, Engstrom P, Wasserman WW, Lenhard B: JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res 2004, 32(Database issue):D91–94. 10.1093/nar/gkh012
- Wingender E, Chen X, Fricke E, Geffers R, Hehl R, Liebich I, Krull M, Matys V, Michael H, Ohnhauser R, et al.: The TRANSFAC system on gene expression regulation. Nucleic Acids Res 2001, 29(1):281–283. 10.1093/nar/29.1.281
- Bussemaker HJ, Li H, Siggia ED: Regulatory element detection using correlation with expression. Nat Genet 2001, 27(2):167–171. 10.1038/84792
- Hughes JD, Estep PW, Tavazoie S, Church GM: Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 2000, 296(5):1205–1214. 10.1006/jmbi.2000.3519
- Liu X, Brutlag DL, Liu JS: BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac Symp Biocomput 2001, 127–138.
- Pavesi G, Mauri G, Pesole G: An algorithm for finding signals of unknown length in DNA sequences. Bioinformatics 2001, 17(Suppl 1):S207–214.
- Siggia ED: Computational methods for transcriptional regulation. Curr Opin Genet Dev 2005, 15(2):214–221. 10.1016/j.gde.2005.02.004
- Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture. Nat Genet 1999, 22(3):281–285. 10.1038/10343
- Tompa M, Li N, Bailey TL, Church GM, De Moor B, Eskin E, Favorov AV, Frith MC, Fu Y, Kent WJ, et al.: Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol 2005, 23(1):137–144. 10.1038/nbt1053
- Andersson CR, Hvidsten TR, Isaksson A, Gustafsson MG, Komorowski J: Revealing cell cycle control by combining model-based detection of periodic expression with novel cis-regulatory descriptors. BMC Syst Biol 2007, 1:45. 10.1186/1752-0509-1-45
- Banerjee N, Zhang MQ: Identifying cooperativity among transcription factors controlling the cell cycle in yeast. Nucleic Acids Res 2003, 31(23):7024–7031. 10.1093/nar/gkg894
- Beer MA, Tavazoie S: Predicting gene expression from sequence. Cell 2004, 117(2):185–198. 10.1016/S0092-8674(04)00304-6
- Bernstein BE, Liu CL, Humphrey EL, Perlstein EO, Schreiber SL: Global nucleosome occupancy in yeast. Genome Biol 2004, 5(9):R62. 10.1186/gb-2004-5-9-r62
- Hvidsten TR, Wilczynski B, Kryshtafovych A, Tiuryn J, Komorowski J, Fidelis K: Discovering regulatory binding-site modules using rule-based learning. Genome Res 2005, 15(6):856–866. 10.1101/gr.3760605
- Kato M, Hata N, Banerjee N, Futcher B, Zhang MQ: Identifying combinatorial regulation of transcription factors and binding motifs. Genome Biol 2004, 5(8):R56. 10.1186/gb-2004-5-8-r56
- Pilpel Y, Sudarsanam P, Church GM: Identifying regulatory networks by combinatorial analysis of promoter elements. Nat Genet 2001, 29(2):153–159. 10.1038/ng724
- Yu X, Lin J, Masuda T, Esumi N, Zack DJ, Qian J: Genome-wide prediction and characterization of interactions between transcription factors in Saccharomyces cerevisiae. Nucleic Acids Res 2006, 34(3):917–927. 10.1093/nar/gkj487
- Rada-Iglesias A, Ameur A, Kapranov P, Enroth S, Komorowski J, Gingeras TR, Wadelius C: Whole-genome maps of USF1 and USF2 binding and histone H3 acetylation reveal new aspects of promoter structure and candidate genes for common human disorders. Genome Res 2008, 18(3):380–392. 10.1101/gr.6880908
- Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J, et al.: Transcriptional regulatory code of a eukaryotic genome. Nature 2004, 431(7004):99–104. 10.1038/nature02800
- Pavesi G, Zambelli F, Pesole G: WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences. BMC Bioinformatics 2007, 8:46. 10.1186/1471-2105-8-46
- Yuan Y, Guo L, Shen L, Liu JS: Predicting gene expression from sequence: a reexamination. PLoS Comput Biol 2007, 3(11):e243. 10.1371/journal.pcbi.0030243
- Nguyen DH, D'Haeseleer P: Deciphering principles of transcription regulation in eukaryotic genomes. Mol Syst Biol 2006, 2.
- Tabach Y, Brosh R, Buganim Y, Reiner A, Zuk O, Yitzhaky A, Koudritsky M, Rotter V, Domany E: Wide-scale analysis of human functional transcription factor binding reveals a strong bias towards the transcription start site. PLoS ONE 2007, 2(8):e807. 10.1371/journal.pone.0000807
- Elemento O, Slonim N, Tavazoie S: A universal framework for regulatory element discovery across all genomes and data types. Mol Cell 2007, 28(2):337–350. 10.1016/j.molcel.2007.09.027
- Lee CK, Shibata Y, Rao B, Strahl BD, Lieb JD: Evidence for nucleosome depletion at active regulatory regions genome-wide. Nat Genet 2004, 36(8):900–905. 10.1038/ng1400
- Lee W, Tillo D, Bray N, Morse RH, Davis RW, Hughes TR, Nislow C: A high-resolution atlas of nucleosome occupancy in yeast. Nat Genet 2007, 39(10):1235–1244. 10.1038/ng2117
- Liu X, Lee CK, Granek JA, Clarke ND, Lieb JD: Whole-genome comparison of Leu3 binding in vitro and in vivo reveals the importance of nucleosome occupancy in target site selection. Genome Res 2006, 16(12):1517–1528. 10.1101/gr.5655606
- Yuan GC, Liu YJ, Dion MF, Slack MD, Wu LF, Altschuler SJ, Rando OJ: Genome-scale identification of nucleosome positions in S. cerevisiae. Science 2005, 309(5734):626–630. 10.1126/science.1112178
- Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO: Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell 2000, 11(12):4241–4257.
- Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, et al.: Functional discovery via a compendium of expression profiles. Cell 2000, 102(1):109–126. 10.1016/S0092-8674(00)00015-5
- Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 1998, 9(12):3273–3297.
- Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, et al.: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 2002, 298(5594):799–804. 10.1126/science.1075090
- David L, Huber W, Granovskaia M, Toedling J, Palm CJ, Bofkin L, Jones T, Davis RW, Steinmetz LM: A high-resolution map of transcription in the yeast genome. Proc Natl Acad Sci USA 2006, 103(14):5320–5325. 10.1073/pnas.0601091103
- Hurowitz EH, Brown PO: Genome-wide analysis of mRNA lengths in Saccharomyces cerevisiae. Genome Biol 2003, 5(1):R2. 10.1186/gb-2003-5-1-r2
- Miura F, Kawaguchi N, Sese J, Toyoda A, Hattori M, Morishita S, Ito T: A large-scale full-length cDNA analysis to explore the budding yeast transcriptome. Proc Natl Acad Sci USA 2006, 103(47):17846–17851. 10.1073/pnas.0605645103
- Context Finder [http://contextfinder.lcb.uu.se/]
- Boorsma A, Foat BC, Vis D, Klis F, Bussemaker HJ: T-profiler: scoring the activity of predefined groups of genes using gene expression data. Nucleic Acids Res 2005, 33(Web Server issue):W592–595. 10.1093/nar/gki484
- Yarragudi A, Parfrey LW, Morse RH: Genome-wide analysis of transcriptional dependence and probable target sites for Abf1 and Rap1 in Saccharomyces cerevisiae. Nucleic Acids Res 2007, 35(1):193–202. 10.1093/nar/gkl1059
- McCullagh P, Nelder JA: Generalized Linear Models. 2nd edition. Chapman & Hall/CRC; 1989.
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 1995, 57:289–300.
- Hastie TJ, Pregibon D: Generalized linear models. In Statistical Models in S. Edited by: Chambers JM, Hastie TJ. Wadsworth & Brooks/Cole; 1992.
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, et al.: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 2004, 5(10):R80. 10.1186/gb-2004-5-10-r80
- van Helden J: Regulatory sequence analysis tools. Nucleic Acids Res 2003, 31(13):3593–3596. 10.1093/nar/gkg567
- Boorstein WR, Craig EA: Regulation of a yeast HSP70 gene by a cAMP responsive transcriptional control element. EMBO J 1990, 9(8):2543–2553.
- Statistical Libraries [http://popgen.unimaas.nl/~jlindsey/rcode.html]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.