- Research article
- Open Access
Flanking sequence context-dependent transcription factor binding in early Drosophila development
- Jessica L Stringham†1,
- Adam S Brown†2,
- Robert A Drewell2, 3, 4 and
- Jacqueline M Dresch5, 6
© Stringham et al.; licensee BioMed Central Ltd. 2013
- Received: 29 May 2013
- Accepted: 24 September 2013
- Published: 4 October 2013
Gene expression in the Drosophila embryo is controlled by functional interactions between a large network of protein transcription factors (TFs) and specific sequences in DNA cis-regulatory modules (CRMs). The binding site sequences for any TF can be experimentally determined and represented in a position weight matrix (PWM). PWMs can then be used to predict the location of TF binding sites in other regions of the genome, although there are limitations to this approach as currently implemented.
In this proof-of-principle study, we analyze 127 CRMs and focus on four TFs that control transcription of target genes along the anterio-posterior axis of the embryo early in development. For all four of these TFs, there is some degree of conserved flanking sequence that extends beyond the predicted binding regions. A potential role for these conserved flanking sequences may be to enhance the specificity of TF binding, as the abundance of these sequences is greatly diminished when we examine only predicted high-affinity binding sites.
Expanding PWMs to include sequence context-dependence will increase the information content in PWMs and facilitate a more efficient functional identification and dissection of CRMs.
- Transcription factor
- Binding site
- Position weight matrix
- Cis-regulatory module
The control of gene expression during development in Drosophila and other metazoans is tightly directed by cis-acting regulatory sequences in the genome. These DNA sequences modulate expression of target genes by binding protein transcription factors (TFs). Contact between a TF and DNA sequence is mediated through the TF’s DNA binding domain(s) in a sequence-dependent manner [2-4]. Each TF has one or more of a variety of different DNA binding domains, including zinc fingers and homeoboxes [5-9]. Significant efforts have been undertaken to comprehend the organization of DNA sequence at known binding regions and to understand how this organization influences the ability of a TF to bind.
A host of computational tools have been developed that seek to streamline the discovery of de novo TF binding regions using PWMs [10, 14, 19, 20]. However, a major limitation of PWMs is their potential to lose information content during construction. The lengths of PWMs are often determined based on an optimal alignment between minimal sequences of varying length, potentially eliminating bordering regions crucial to determining a TF’s binding preference [21-23]. Extending PWMs may therefore serve to increase their information content, and thus their predictive power. One limitation resulting from the experimental approaches used to isolate TF-bound DNA fragments [25, 26] is that there may be additional, but non-contiguous, bases that are fundamentally important to TF binding initiation (or transient TF-DNA binding) but are not represented in the experimental data and are therefore not taken into account during traditional PWM construction. A potential explanation for this lack of information content in canonical PWM construction is the omission of secondary binding by TFs with multiple DNA binding domains. For example, in Drosophila the HUNCHBACK TF has two distinct C2H2-type zinc-finger binding domains. If multiple DNA binding domains contact sequences separately, then each domain may contribute to the overall binding of the TF. Accordingly, in the case where there are two binding domains, one of the DNA binding regions may be either: a) discarded because it fails to meet minimal fragment size requirements or b) incorporated into a combined alignment along with the sequences representing regions bound by the other binding domain. Either of these scenarios may lower the information content of the PWM. The first scenario may result in a PWM that does not include all nucleotides necessary for in vivo binding (i.e. a PWM representing the actual binding region may be longer than that which is constructed from the current experimental data). The second, on the other hand, points to an even larger problem in PWM construction: a TF may have two different modes of binding, in which case a single unique PWM is insufficient to predict all of its DNA binding regions. There is strong evidence to support this possibility in the case of mammalian DNA binding proteins.
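The notion of PWM information content used above can be made concrete with a short Python sketch. It is an illustration only, under stated assumptions: information content is taken to be the relative entropy (in bits) of each column against the A(0.3):C(0.2):G(0.2):T(0.3) background quoted in this article, and the function names and example columns are hypothetical rather than taken from the published analysis.

```python
import math

# Background base frequencies quoted in the article's Methods.
BACKGROUND = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

def column_information(freqs, background=BACKGROUND):
    """Relative entropy (bits) of one PWM column against the background."""
    return sum(p * math.log2(p / background[b])
               for b, p in freqs.items() if p > 0)

def pwm_information(pwm, background=BACKGROUND):
    """Total information content of a PWM given as a list of frequency columns."""
    return sum(column_information(col, background) for col in pwm)

# Made-up columns: a two-column core and the same core plus one biased flank column.
core = [{"A": 0.9, "C": 0.03, "G": 0.03, "T": 0.04},
        {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85}]
biased_flank = {"A": 0.6, "C": 0.1, "G": 0.1, "T": 0.2}
extended = core + [biased_flank]

print(pwm_information(core), pwm_information(extended))
```

A flanking column that matches the background contributes zero bits, whereas any column with a base bias contributes a positive amount, which is why adding biased flanking positions raises the total.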
To address the limitations of PWMs, we align and analyze predicted binding regions for four well-studied TFs in 127 cis-regulatory modules (CRMs) that are essential to direct gene expression along the anterio-posterior axis in early Drosophila development. Our analysis indicates that the current PWMs for all four TFs examined exclude significant biases towards a given base, or bases, in specific positions in the neighboring sequences and that the information content of these PWMs can be improved by including these additional sequences.
Cis-regulatory module and flanking sequences
The PWMs we use for CAUDAL (CAD), HUNCHBACK (HB), KNIRPS (KNI) and KRUPPEL (KR) are as previously described. For our analysis, we run PATSER with default settings (i.e. the total number of pseudo-counts is set to 1 and the background sequence A/T content is 0.3 and C/G content is 0.2). To determine score (ln(p-value)) cutoffs, we first observe the distribution of scores PATSER assigns, using each of the four individual PWMs, to all the known binding regions used to construct each of the original respective PWMs. The cutoffs used are calculated by taking the 75th and 50th percentile cutoff of all these scores, and are referred to as ‘strong’ and ‘weak’ cutoffs respectively. We then run PATSER on each of the original 127 CRMs (excluding flanking regions) with each of the four PWMs to predict binding regions. Only those regions scoring above the respective cutoff are used for further analysis and are referred to as core PWM predicted binding regions (PWM-PBRs). Note that scores are all negative, so ‘scoring above’ refers to PATSER outputting a ln(p-value) less than or equal to the cutoff. The strong cutoff is more stringent, only predicting binding regions that receive a score less than or equal to that obtained from the top 25% of known binding region scores (108 CAD, 157 HB, 79 KNI, and 18 KR sites), representing binding regions that are most similar to the consensus core binding region for the given TF. The weak cutoff is less stringent, predicting binding regions that receive a score less than or equal to that obtained from the top 50% of known binding region scores (430 CAD, 450 HB, 359 KNI, and 127 KR sites), representing binding regions that span a larger range of similarity to the consensus core binding region for the given TF. One should note that these cutoff scores are TF-specific and are different for each of the four TFs analyzed. In cases where overlapping binding regions were identified, both regions are included in all subsequent analyses. Lists of all the known binding regions for each TF, their corresponding ln(p-value) obtained using PATSER, and whether they fall into the 75th percentile, 50th percentile, or neither are available in the Additional file 3: Table S1.
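The cutoff logic described above can be sketched in Python as follows. This is a hedged illustration rather than the procedure actually used: the nearest-rank percentile rule, the function names and the example ln(p-value) list are all assumptions, and PATSER itself is not invoked.

```python
import math

def percentile_cutoff(ln_p_values, top_fraction):
    """ln(p-value) cutoff that keeps the best `top_fraction` of known sites.

    Scores are negative and more negative means a better match, so the
    values are ranked from most to least negative (nearest-rank rule,
    which is an assumption of this sketch).
    """
    ranked = sorted(ln_p_values)
    k = max(1, math.ceil(top_fraction * len(ranked)))
    return ranked[k - 1]

def passes(ln_p, cutoff):
    """'Scoring above' the cutoff means ln(p-value) <= cutoff."""
    return ln_p <= cutoff

# Made-up ln(p-values) for known binding regions of one TF.
known_scores = [-12.1, -10.4, -9.8, -9.0, -8.7, -8.1, -7.5, -7.0]

strong_cutoff = percentile_cutoff(known_scores, 0.25)  # top 25% of known sites
weak_cutoff = percentile_cutoff(known_scores, 0.50)    # top 50% of known sites

# Made-up scores for candidate regions predicted in a CRM.
candidates = [-11.0, -9.5, -8.0]
strong_sites = [s for s in candidates if passes(s, strong_cutoff)]
weak_sites = [s for s in candidates if passes(s, weak_cutoff)]
print(strong_cutoff, weak_cutoff, strong_sites, weak_sites)
```

Because the strong cutoff is always at least as negative as the weak one, every strong site is also a weak site, which matches the nested site counts reported above.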
For each TF and cutoff score, given the list of PBRs (including flanking regions on both sides of the core PWM-PBRs), we run a Chi-squared test on each position with the null hypothesis that at any given location, the distribution of A/C/G/T is exactly the same as the genome-wide distribution, A(0.3):C(0.2):G(0.2):T(0.3). We note here that the overall nucleotide frequency in the 127 extended CRMs, A(0.2845):C(0.2155):G(0.2155):T(0.2845), is not significantly different from the genome-wide distribution (χ2 test, p-value > 0.95). We analyze the results of these tests at three different significance levels: α = 0.05, α = 0.01 and α = 0.001. We choose more than one significance level to control the familywise error rate for multiple comparisons. A simple Bonferroni correction leads to a corrected alpha value obtained by dividing alpha by the number of Chi-squared tests (i.e. in the n = 25 case, with 50 flanking positions tested, α = 0.05/50 = 0.001 is the Bonferroni corrected value corresponding to α = 0.05). Thus, although the three alpha values stated can be interpreted without a Bonferroni correction, α = 0.001 can also be interpreted as a Bonferroni corrected alpha corresponding to α = 0.05. The Chi-squared values obtained are shown for n = 25 in Figures 3 and 4 (top bars on each graph) and actual values for each nucleotide are in the Additional file 4: Table S2. Note that in Figures 3 and 4, the color-coding corresponds to the smallest alpha value of those analyzed at which the null hypothesis is rejected.
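The per-position test can be illustrated with a small Python sketch. It is not the code behind the web application; it assumes that all aligned regions have equal length and contain only A, C, G and T, and it compares each statistic against the standard Chi-squared critical values for 3 degrees of freedom instead of computing exact p-values. All names are hypothetical.

```python
from collections import Counter

# Null distribution quoted in the text.
BACKGROUND = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

# Chi-squared critical values for 3 degrees of freedom, strictest level first.
CRITICAL = [(0.001, 16.266), (0.01, 11.345), (0.05, 7.815)]

def chi_squared(column, background=BACKGROUND):
    """Goodness-of-fit statistic for one alignment column (iterable of bases)."""
    counts = Counter(column)
    n = len(column)
    return sum((counts.get(b, 0) - n * p) ** 2 / (n * p)
               for b, p in background.items())

def smallest_alpha(stat):
    """Smallest analyzed alpha at which the null is rejected, or None."""
    for alpha, critical in CRITICAL:
        if stat > critical:
            return alpha
    return None

def per_position(aligned):
    """Chi-squared statistic for every column of equal-length aligned regions."""
    return [chi_squared(col) for col in zip(*aligned)]

# Bonferroni correction for 50 flanking positions (n = 25 on each side).
bonferroni_alpha = 0.05 / 50

# Made-up alignment: the first column is strongly biased towards A.
aligned = ["AC", "AG", "AT", "AA", "AC", "AT", "AG", "AA", "AT", "AA"]
stats = per_position(aligned)
print([(round(s, 2), smallest_alpha(s)) for s in stats], bonferroni_alpha)
```

Reporting the smallest alpha at which each position is rejected mirrors the color-coding convention described for Figures 3 and 4.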
The web application that is used to run this analysis is freely available for non-commercial use at: http://drewell.sites.hmc.edu/projects/sequence_context_grapher.html.
JASPAR database search
Using the alignment produced by the bioinformatic analysis for the ‘weak’ (50th percentile) cutoff described above, a single PWM was constructed from a portion of the flanking sequence of HB showing statistically significant nucleotide bias (-9 to -1 relative to the core PWM-PBR) for use in a JASPAR alignment search. The top 5 Drosophila melanogaster TFs that give similarity scores to this PWM (similarity scores > 86%) are manually annotated for expression pattern in early (stages 1-5) embryos.
Testing predictions for expanded HB PWMs
When expanding the HB PWM, we use a 2-step process. First, we choose an initial cutoff score and predict and align all binding sites using the original core PWM as described in ‘Bioinformatic Analysis’. The PWM-PBRs identified with the original core PWM using this initial cutoff score are then extended by including flanking regions of interest (ranging from -25 to +25). An extended PWM is constructed from the base frequencies of these extended PWM-PBRs. Next, PATSER is used with the extended PWM to determine the score for each of the extended PWM-PBRs constructed from the set of predicted HB binding sites with the initial cutoff score. Computing the percentiles of these scores, in the same way as is described for the original PWM analysis in ‘Bioinformatic Analysis’, we obtain a secondary cutoff score. Lists of the extended PWM-PBRs, their corresponding ln(p-value) obtained using PATSER with the corresponding extended HB PWM, and whether they fall into the 0th, 25th, 50th or 75th percentile are available in the Additional file 5: Table S3. We run PATSER again with the extended PWM and secondary cutoff score on the IAB7b CRM, which contains one known functional HB binding site. This allows us to compare the location of the predicted binding sites obtained using extended HB PWMs of varying lengths to the known HB binding site, and so determine which PWMs result in the lowest number of false positive and false negative predictions.
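The step in which an extended PWM is built from the base frequencies of the extended PWM-PBRs can be sketched as below. This is a simplified Python illustration with made-up sequences. Two things are assumptions of the sketch: a single pseudo-count is distributed according to the A/T 0.3 and C/G 0.2 background, and sites are scored with a plain log-odds sum. That sum is not the ln(p-value) that PATSER reports.

```python
import math

BACKGROUND = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

def frequency_matrix(aligned, pseudocount=1.0, background=BACKGROUND):
    """Column-wise base frequencies of equal-length aligned sequences.

    One pseudo-count in total is spread over the four bases in proportion
    to the background (an assumption of this sketch).
    """
    n = len(aligned)
    return [
        {b: (col.count(b) + pseudocount * background[b]) / (n + pseudocount)
         for b in "ACGT"}
        for col in zip(*aligned)
    ]

def log_odds_score(seq, matrix, background=BACKGROUND):
    """Sum of ln(frequency / background) over all positions of `seq`."""
    return sum(math.log(col[b] / background[b]) for b, col in zip(seq, matrix))

# Made-up extended regions (flank + core + flank), already aligned.
extended_sites = ["TACGTTTTTTGA", "AACATTTTTTGC", "TACGTTTTTAGA"]
extended_pwm = frequency_matrix(extended_sites)

for site in extended_sites:
    print(site, round(log_odds_score(site, extended_pwm), 3))
```

Scoring the same sites that were used to build the matrix, as in the loop above, corresponds to the step from which the secondary cutoff percentiles are derived.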
Analyzing expanded PWMs using ChIP-seq datasets
To analyze the predictive power of the extended PWMs, we first choose initial cutoff scores representing the 25th and 50th percentile scores and generate PWM-PBRs for each of the four TFs as described in ‘Bioinformatic Analysis’. We then extend those core PWM-PBRs (core) to include the core and all highly significant (χ2 test, α < 0.001) flanking sequence context-dependent biases (extended), as well as the core with -25 to +25 flanking regions used to generate the PWM-PBR (full). We use a secondary cutoff score corresponding to the 0th percentile as described in ‘Testing Predictions for Expanded HB PWMs’. ChIP-seq datasets for each TF from the BDGP are filtered to include only those peaks with more than 100 bp of sequence. We run PATSER with each of the three different PWMs for each TF on their respective TF peaks and score a true positive prediction when the PWM predicts at least one TF binding site. For each ChIP-seq peak, we calculate the nucleotide distribution within the peak and create 10 ‘scrambled peaks’, random DNA sequences of the same length and nucleotide distribution. We then run PATSER with each of the three different PWMs for each TF on these scrambled peaks and score a false positive prediction when the PWM predicts at least one TF binding site. Both the true positive and false positive results for the 25th and 50th percentile initial cutoff scores at the 0th, 25th, 50th, and 75th percentile secondary cutoff scores are available in the Additional file 6: Table S4.
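The scrambled-peak control can be sketched in Python as follows. The sketch makes two assumptions that the text does not confirm: scrambling is done by shuffling each peak, which preserves its length and exact base composition, and the PATSER run is replaced by a hypothetical `predicts_site` callable that returns True when at least one site is predicted.

```python
import random

def scrambled_peaks(peak, n=10, seed=0):
    """Return `n` shuffled copies of `peak` with identical length and composition."""
    rng = random.Random(seed)
    scrambled = []
    for _ in range(n):
        bases = list(peak)
        rng.shuffle(bases)
        scrambled.append("".join(bases))
    return scrambled

def hit_rate(sequences, predicts_site):
    """Fraction of sequences in which at least one binding site is predicted."""
    return sum(1 for seq in sequences if predicts_site(seq)) / len(sequences)

# Hypothetical stand-in for a PWM scan: reports a site if a fixed motif occurs.
def predicts_site(seq, motif="TTTTTT"):
    return motif in seq

# Made-up peaks (real peaks in the analysis are longer than 100 bp).
peaks = ["ACGCATTTTTTGACGA", "GGCATCGATCGGATCA"]
controls = [s for peak in peaks for s in scrambled_peaks(peak)]

true_positive_rate = hit_rate(peaks, predicts_site)
false_positive_rate = hit_rate(controls, predicts_site)
print(true_positive_rate, false_positive_rate)
```

Fixing the random seed keeps the control set reproducible, so true and false positive rates for the core, extended and full PWMs can be compared on identical scrambled sequences.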
Previous evidence for sequence context-dependence in TF binding has pointed toward the existence of nucleotide biases at positions in close proximity to a region experimentally verified or computationally predicted to bind a TF [5, 37]. To test this idea, we analyze 127 CRMs that are active during early Drosophila development for predicted binding regions for four TFs using PATSER (see Methods for details). These four TFs, CAUDAL (CAD), HUNCHBACK (HB), KNIRPS (KNI) and KRUPPEL (KR), are all critical for normal development and are present in spatially restricted patterns along the anterio-posterior axis in early embryogenesis. A number of in vivo confirmed minimal binding regions have been characterized for these TFs and the existing PWMs for each of these factors range in size from 7 to 9 bp (Figure 1) [30, 31]. Of greatest importance for this study, their current canonical PWMs have been shown to have greater predictive power for experimentally validated TF binding regions when compared to other published PWMs. If context-dependent biases are present in sequences near these characterized binding regions, we predict that these bases would be evolutionarily conserved. By combining PATSER analysis with EvoPrinterHD analysis, we are able to identify several examples of extended regions of sequence conservation surrounding evolutionarily conserved TF binding regions, including portions of the even-skipped stripe 5 CRM and the paired stripe 2 CRM (Figure 2). In all cases within the depicted portions of the even-skipped stripe 5 CRM and paired stripe 2 CRM, predicted TF binding regions are flanked by substantial extended sequence conservation on one or both sides. This presents a testable hypothesis: that these regions of extended conservation contain flanking bases that are functionally important for robust TF binding.
To address the hypothesis that there may be sequence context-dependent binding for the four TFs, we investigate the sequences 25 bp up- and downstream of defined core PWM-predicted binding regions (PWM-PBRs) (described in detail in Methods). Alignment of the core PWM-PBRs and their flanking regions for each individual TF does indeed reveal a statistically significant enrichment of certain bases outside of the core PWM-PBRs (Chi-squared test, α < 0.05). A very clear example of this enrichment with high statistical significance (Chi-squared test, α = 0.001) is found at binding regions for HB. Using the weak cutoff value (see Methods for details), beyond the HB core PWM-PBR there is context-dependent bias at the first two and the 7th nucleotide downstream (+1, +2 and +7) of the core PWM-PBR (Figure 3). In addition, there are four clusters of context-dependent bias upstream of the HB core PWM-PBR at positions -1, -3 to -5, -8 to -9 and -19 (Figure 3). KNI seems to follow a similar pattern to HB, with nucleotide enrichment bias at 5 positions downstream and 7 upstream of the core PWM-PBR (Figure 3). This enrichment bias is also seen for CAD and KR (Figure 4), but is not as prevalent. CAD and KR display only short stretches of sequence with robust context-dependent bias, and in both cases these sequences are largely contiguous to the core PWM-PBRs (Figure 4). For all four TFs, the enrichment biases at positions neighboring the defined core PWM-PBRs could be incorporated into expanded PWMs.
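The position numbering used above (-25 to -1 upstream, +1 to +25 downstream of the core PWM-PBR) can be sketched in Python. This is an illustration under assumptions: the function names and coordinates are hypothetical, only the forward strand is handled, and sites whose window would run off the available sequence are simply skipped.

```python
def extended_region(sequence, core_start, core_len, flank=25):
    """Core PWM-PBR plus `flank` bases on each side, or None if out of range."""
    lo = core_start - flank
    hi = core_start + core_len + flank
    if lo < 0 or hi > len(sequence):
        return None
    return sequence[lo:hi]

def position_labels(core_len, flank=25):
    """Labels for each column: -25..-1, core1..coreN, +1..+25."""
    upstream = [str(-i) for i in range(flank, 0, -1)]
    core = ["core%d" % i for i in range(1, core_len + 1)]
    downstream = ["+%d" % i for i in range(1, flank + 1)]
    return upstream + core + downstream

# Made-up example: a 7 bp core at index 25 of a 57 bp sequence.
sequence = "A" * 25 + "GGGTTTT" + "C" * 25
region = extended_region(sequence, core_start=25, core_len=7)
labels = position_labels(core_len=7)
print(len(region), labels[24], labels[25], labels[32])
```

Keeping the labels in a list parallel to the alignment columns makes it straightforward to report biases at positions such as +1, +2 and +7 directly from column indices.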
A further testable explanation for the fact that all four TFs exhibit context-dependent biases at their extended binding regions may be that these TFs have multiple DNA binding domains, each of which contacts different nucleotides independently, but together act to increase the TF’s binding affinity for target sequences. If this is the case, these secondary DNA binding domains may enhance, but not replace, the function of the primary, canonical binding domain in the TF protein. To assess this hypothesis, we compare the binding regions predicted by PATSER using both strong and weak cutoff scores (see Methods). While many of the nucleotide position biases persist at either cutoff score, we find a general decrease in the number and significance of context-dependent biases for all four TFs when we only consider strong sites (Figures 3 and 4 and Additional file 4: Table S2). For example, context-dependent biases within 15 bp of the KNI PWM-PBR are greatly reduced when we only consider strong sites: of the 12 enriched biases found with the weak cutoff, only one remains (Figure 3). The range of decrease across the four TFs is variable (Figures 3 and 4). Overall, these results suggest two separate binding regimes: (1) if a TF has secondary, non-contiguous binding, stronger core binding regions may overcome a paucity of context-dependent biases in the flanking sequences, whereas weaker core binding regions may depend more heavily on these biases; (2) if a TF exhibits contiguous biases, these biases may simply suggest a larger canonical core binding region. However, in some cases, there are nucleotides that are found to be significant only when the strong cutoff score is used. For example, two new biases are detected at +20 and -23 relative to the KNI PWM-PBR (Figure 3). This finding supports a hypothesis that strong and weak core binding regions may in fact have two different functional roles that allow the TF to bind in a different mode, again raising the question of whether it is valid to represent TF binding with one unique PWM.
Taken together, these data suggest that current PWMs may not be optimal to explain the complexity of TF binding. Although we only test four TFs in this study, we demonstrate that all four TFs exhibit context-dependent biases towards given nucleotides both contiguously with the defined minimal binding region and non-contiguously in flanking DNA sequences, thus providing a foundation for this to be explored more broadly. An additional intriguing question for future study will be to investigate whether the context-dependent bias persists at predicted TF binding regions in other genomic regions that are not characterized as CRMs. By taking these secondary context-dependencies into account, we propose that the information content of PWMs can be expanded in many cases. This expansion would not only provide better predictions of true TF binding regions in the genome, but may also help improve estimates of relative binding affinities at specific sites, allowing one to understand the molecular basis for the difference between weak and strong binding sites. The ability to identify novel CRMs and decipher the sequence organization at CRMs relies heavily on a concrete understanding of TF binding preferences. Improving the information content of PWMs and our comprehension of TF binding events will contribute to these continued efforts.
Availability of supporting data
The data sets supporting the results of this article are included within the article (and its Additional file 3: Table S1, Additional file 4: Table S2 and Additional file 5: Table S3, Additional file 1: Datasets S1 and Additional file 2: Datasets S2, and Supporting legends).
The research in this paper was supported by funding to R.A.D. from the National Institutes of Health (GM090167), the National Science Foundation (IOS-0845103) and Howard Hughes Medical Institute Undergraduate Science Education Program grants (520051213 and 52006301) to the Biology department at Harvey Mudd College. J.M.D. was funded as a Teaching and Research Postdoctoral Fellow, supported in part by NSF Grant DMS-0839966, and through research funds provided by Amherst College.
- Ptashne M: Gene regulation by proteins acting nearby and at a distance. Nature. 1986, 6081: 697-701.
- Badis G, Berger MF, Philippakis AA, Talukder S, Gehrke AR, Jaeger SA, Chan ET, Metzler G, Vedenko A, Chen X, et al: Diversity and complexity in DNA recognition by transcription factors. Science. 2009, 324: 1720-1723. 10.1126/science.1162327.
- Mitchell PJ, Tjian R: Transcriptional regulation in mammalian cells by sequence-specific DNA binding proteins. Science. 1989, 245: 371-378. 10.1126/science.2667136.
- Ptashne M, Gann A: Transcriptional activation by recruitment. Nature. 1997, 6625: 569-577.
- Jolma A, Yan J, Whitington T, Toivonen J, Nitta KR, Rastas P, Morgunova E, Enge M, Taipale M, Wei G, et al: DNA-binding specificities of human transcription factors. Cell. 2013, 152: 327-339. 10.1016/j.cell.2012.12.009.
- Kadonaga JT: Regulation of RNA polymerase II transcription by sequence-specific DNA binding factors. Cell. 2004, 116: 247-257. 10.1016/S0092-8674(03)01078-X.
- Mlodzik M, Fjose A, Gehring WJ: Isolation of caudal, a Drosophila homeo box-containing gene with maternal expression, whose transcripts form a concentration gradient at the pre-blastoderm stage. EMBO J. 1985, 4: 2961-2969.
- Rothe M, Nauber U, Jäckle H: Three hormone receptor-like Drosophila genes encode an identical DNA-binding finger. EMBO J. 1989, 8: 3087-3094.
- Sommer RJ, Retzlaff M, Goerlich K, Sander K, Tautz D: Evolutionary conservation pattern of zinc-finger domains of drosophila segmentation genes. Proc Natl Acad Sci USA. 1992, 89: 10782-10786. 10.1073/pnas.89.22.10782.
- Bailey TL, Bodén M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS: MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009, 37: W202-W208. 10.1093/nar/gkp335.
- Berg OG, von Hippel PH: Selection of DNA binding sites by regulatory proteins: statistical-mechanical theory and application to operators and promoters. J Mol Biol. 1987, 193: 723-743. 10.1016/0022-2836(87)90354-8.
- Djordjevic M, Sengupta AM, Shraiman BI: A biophysical approach to transcription factor binding site discovery. Genome Res. 2003, 13: 2381-2390. 10.1101/gr.1271603.
- Hertz GZ, Hartzell GW, Stormo GD: Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Comput Appl Biosci. 1990, 6: 81-92.
- Hertz GZ, Stormo GD: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics. 1999, 15: 563-577. 10.1093/bioinformatics/15.7.563.
- Morozov AV, Havranek JJ, Baker D, Siggia ED: Protein-DNA binding specificity predictions with structural models. Nucleic Acids Res. 2005, 33: 5781-5798. 10.1093/nar/gki875.
- Whitington T, Frith MC, Johnson J, Bailey TL: Inferring transcription factor complexes from ChIP-seq data. Nucleic Acids Res. 2011, 39: e98. 10.1093/nar/gkr341.
- Zhu LJ, Christensen RG, Kazemian M, Hull CJ, Enuameh MS, Basciotta MD, Brasefield JA, Zhu C, Asriyan Y, Lapointe DS, et al: FlyFactorSurvey: a database of Drosophila transcription factor binding specificities determined using the bacterial one-hybrid system. Nucleic Acids Res. 2011, 39: D111-D117. 10.1093/nar/gkq858.
- Gershenzon NI, Stormo GD, Ioshikhes IP: Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites. Nucleic Acids Res. 2005, 33: 2290-2301. 10.1093/nar/gki519.
- Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K, et al: TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 2006, 34: D108-D110. 10.1093/nar/gkj143.
- Sandelin A, Alkema W, Engstrom P, Wasserman W, Lenhard B: JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 2004, 32: D91-D94. 10.1093/nar/gkh012.
- Aerts S: Computational strategies for the genome-wide identification of cis-regulatory elements and transcriptional targets. Curr Top Dev Biol. 2012, 98: 121-145.
- Van Loo P, Marynen P: Computational methods for the detection of cis-regulatory modules. Briefings Bioinformatics. 2009, 10: 509-524. 10.1093/bib/bbp025.
- Van Nimwegen E: Finding regulatory elements and regulatory motifs: a general probabilistic framework. BMC Bioinforma. 2007, 8 (6): S4.
- Stormo GD: Maximally efficient modeling of DNA sequence motifs at all levels of complexity. Genetics. 2011, 187: 1219-1224. 10.1534/genetics.110.126052.
- Boyle AP, Song L, Lee BK, London D, Keefe D, Birney E, Iyer VR, Crawford GE, Furey TS: High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Res. 2011, 21: 456-464. 10.1101/gr.112656.110.
- Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S, Bernstein BE, Bickel P, Brown JB, Cayting P, et al: ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 2012, 22: 1813-1831. 10.1101/gr.136184.111.
- McQuilton P, St Pierre SE, Thurmond J, Consortium F: FlyBase 101--the basics of navigating FlyBase. Nucleic Acids Res. 2012, 40: D706-D714. 10.1093/nar/gkr1030.
- Gallo SM, Gerrard DT, Miner D, Simich M, Des Soye B, Bergman CM, Halfon MS: REDfly v3.0: Toward a comprehensive database of transcriptional regulatory elements in drosophila. Nucleic Acids Res. 2011, 21: 456-464.
- Odenwald WF, Rasband W, Kuzin A, Brody T: EVOPRINTER, a multigenomic comparative tool for rapid identification of functionally important DNA. Proc Natl Acad Sci USA. 2005, 102: 14700-14705. 10.1073/pnas.0506915102.
- Noyes MB, Christensen RG, Wakabayashi A, Stormo GD, Brodsky MH, Wolfe SA: Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell. 2008, 133: 1277-1289. 10.1016/j.cell.2008.05.023.
- Ho MC, Johnsen H, Goetz SE, Schiller BJ, Bae E, Tran DA, Shur ASA JM, Rau C, Bender W, Fisher WW, et al: Functional evolution of cis-regulatory modules at a homeotic gene in Drosophila. PLoS Genet. 2009, 5: e1000709. 10.1371/journal.pgen.1000709.
- Herold J, Kurtz S, Giegerich R: Efficient computation of absent words in genomic sequences. BMC Bioinforma. 2008, 9: doi:10.1186/1471-2105-1189-1167
- Portales-Casamar E, Thongjuea S, Kwon AT, Arenillas D, Zhao X, Valen E, Yusuf D, Lenhard B, Wasserman WW, Sandelin A: JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 2010, 38: D105-D110. 10.1093/nar/gkp950.
- Tomancak P, Beaton A, Weiszmann R, Kwan E, Shu S, Lewis SE, Richards S, Ashburner M, Hartenstein V, Celniker SE, et al: Systematic determination of patterns of gene expression during Drosophila embryogenesis. Genome Biol. 2002, 3: RESEARCH0088.
- Starr MO, Ho MC, Gunther EJM, Tu Y-K, Shur AS, Goetz SE, Borok MJ, Kang V, Drewell RA: Molecular dissection of cis-regulatory modules at the Drosophila bithorax complex reveals critical transcription factor signature motifs. Dev Biol. 2011, 359: 290-302. 10.1016/j.ydbio.2011.07.028.
- MacArthur S, Li XY, Li J, Brown JB, Chu HC, Zeng L, Grondona BP, Hechmer A, Simirenko L, Keränen SV, et al: Developmental roles of 21 Drosophila transcription factors are determined by quantitative differences in binding to an overlapping set of thousands of genomic regions. Genome Biol. 2009, 10: R80. 10.1186/gb-2009-10-7-r80.
- Siggers T, Duyzend MH, Reddy J, Khan S, Bulyk ML: Non-DNA-binding cofactors enhance DNA-binding specificity of a transcriptional regulatory complex. Mol Syst Biol. 2011, 7: 555.
- Borok MJ, Tran DA, Ho MC, Drewell RA: Dissecting the regulatory switches of development: lessons from enhancer evolution in Drosophila. Development. 2010, 137: 5-13. 10.1242/dev.036160.
- Mihaly J, Barges S, Sipos L, Maeda R, Cleard F, Hogga I, Bender W, Gyurkovics H, Karch F: Dissecting the regulatory landscape of the Abd-B gene of the bithorax complex. Development. 2006, 133 (15): 2983-2993. 10.1242/dev.02451.
- Zhou J, Ashe H, Burks C, Levine M: Characterization of the transvection mediating region of the abdominal-B locus in Drosophila. Development. 1999, 126 (14): 3057-3065.
- Filippova GN, Fagerlie S, Klenova EM, Myers C, Dehner Y, Goodwin G, Neiman PE, Collins SJ, Lobanenkov VV: An exceptionally conserved transcriptional repressor, CTCF, employs different combinations of zinc fingers to bind diverged promoter sequences of avian and mammalian c-myc oncogenes. Mol Cell Biol. 1996, 16 (6): 2802-2813.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.