- Research article
- Open Access
Novel computational analysis of protein binding array data identifies direct targets of Nkx2.2 in the pancreas
© Hill et al; licensee BioMed Central Ltd. 2011
- Received: 28 September 2010
- Accepted: 25 February 2011
- Published: 25 February 2011
The creation of a complete genome-wide map of transcription factor binding sites is essential for understanding gene regulatory networks in vivo. However, current prediction methods generally rely on statistical models that imperfectly model transcription factor binding. Generation of new prediction methods that are based on protein binding data, but do not rely on these models may improve prediction sensitivity and specificity.
We propose a method for predicting transcription factor binding sites in the genome by directly mapping data generated from protein binding microarrays (PBM) to the genome and calculating a moving average of several overlapping octamers. Using this unique algorithm, we predicted binding sites for the essential pancreatic islet transcription factor Nkx2.2 in the mouse genome and confirmed >90% of the tested sites by EMSA and ChIP. Scores generated from this method more accurately predicted relative binding affinity than PWM based methods. We have also identified an alternative core sequence recognized by the Nkx2.2 homeodomain. Furthermore, we have shown that this method correctly identified binding sites in the promoters of two critical pancreatic islet β-cell genes, NeuroD1 and insulin2, that were not predicted by traditional methods. Finally, we show evidence that the algorithm can also be applied to predict binding sites for the nuclear receptor Hnf4α.
PBM-mapping is an accurate method for predicting Nkx2.2 binding sites and may be widely applicable for the creation of genome-wide maps of transcription factor binding sites.
- Transcription Factor Binding Site
- Electrophoretic Mobility Shift Assay
- Core Sequence
- Position Weight Matrix
- Relative Binding Affinity
The dynamic process of gene regulation is essential for embryonic development and cellular function. Gene regulation is primarily mediated by the combinatorial effects of transcription factors interacting with cis-regulatory elements such as promoters and enhancers. Therefore, accurate identification of transcription factor binding sites within the genome is necessary to understand a wide range of cellular processes from cell differentiation to homeostasis to cancer. However, identifying these sites within the genome remains a complex biological and computational question.
One of the challenges in predicting transcription factor binding sites is that identification of the strongest binding sequence, or consensus site, is not sufficient. Research analyzing genome wide transcription factor occupancy has shown that low affinity binding sites are also significantly occupied in both yeast and Drosophila[1, 2]. Furthermore, transcription factors from the same family have been shown to bind identical high affinity sites, but distinct low affinity sites [3, 4]. Therefore, identification of both high and low affinity sites will be essential to fully understand transcription factor specificity within the genome.
Current transcription factor motif algorithms generally rely on a statistical model, such as a position weight matrix (PWM), generated with information derived from homology between co-regulated promoters, conserved regions in orthologous genes, known binding sites, or in vitro binding assays . However, these methods have a low level of specificity and sensitivity [6–9]. This problem is due to both experimental and theoretical errors. Experimental errors include alignment of a limited set of binding sites , resulting in a PWM that is information poor, and non-physiological conditions used in in vitro binding assays . Theoretical errors stem from assumptions used as the basis of the PWM model. PWMs assume that contributions by individual bases within a binding site are independent and additive, and that the binding energy contribution is proportional to their frequency in the position weight matrix. However, all three of these assumptions have been called into question. Interdependencies have been demonstrated at least in a subset of transcription factors resulting in contributions to binding affinity that are both interdependent and non-additive [10, 12, 13]. Position weight matrix scores have also been experimentally shown to be a poor predictor of binding affinities in both eukaryotic and prokaryotic systems [7, 14].
Nkx2.2 is a homeodomain transcription factor expressed in the ventral neural tube, intestine and pancreas . A consensus sequence (T(t/c)AAGT(a/g)(c/g)TT) has been identified by SELEX and a corresponding PWM was generated and deposited in the TRANSFAC database . However, experiences in our lab and others have shown that the predictive power of this PWM is low. More recently, Berger et al.  generated a PWM for Nkx2.2 using protein binding microarray (PBM) technology. PBMs use a mathematically constructed set of oligos to quantitatively measure protein-DNA binding for all possible octamers. This should, in theory, result in a PWM that is more information rich than those constructed by other methods.
Here, we show the results of an in-depth analysis of the PBM data for Nkx2.2. The original Nkx2.2 consensus sequence contains an invariable "AAGT" core. We have identified an alternative low-affinity core sequence with a wobble in the first position to contain "GAGT". We also mapped the PBM data directly to the genome to identify putative Nkx2.2 binding sites. Using this method, we identified 111 novel Nkx2.2 binding sites within the proximal promoters of genes differentially expressed in wildtype and Nkx2.2 null pancreata and confirmed approximately 90% of these sites by EMSA and/or ChIP analysis. Six of the sites confirmed by EMSA and ChIP contain the alternative "GAGT" core sequence. We also show that using a moving average of E-scores from the protein binding microarrays to predict relative binding affinity outperformed both the TRANSFAC PWM and the PBM-based PWM. Since the PBM-based PWM and the method described in this study rely on the same input data, these results show that the assumptions used to generate PWMs do not accurately describe Nkx2.2 binding. Therefore, creating genome-wide maps by directly using experimental data will greatly increase the specificity and sensitivity of transcription factor binding site (TFBS) predictions over statistical models. Furthermore, these experiments revealed a gene battery that includes a large number of genes required for insulin secretion functions in the β-cell that is controlled by Nkx2.2. We demonstrate that this same method can be adapted to other transcription factors. Based on these findings, we propose that PBM-mapping can be used to create composite TFBS maps across the entire genome. Such a map would greatly aid in the identification of cis-regulatory elements and the understanding of gene regulation.
Identification of an alternative Nkx2.2 binding site core sequence
Nkx2.2 was previously shown to specifically bind a 10 base-pair sequence containing an invariable "AAGT" core sequence flanked by several less conserved bases . To date, only two in vivo binding sites (in the insulin2 and MafA promoters) identified using this sequence have been successfully verified [17, 18]. One possible explanation for the low predictive power of this consensus sequence is that it does not completely encompass the possible binding motifs for Nkx2.2 binding. More recently, Berger et al.  published a protein binding microarray (PBM) analyzing the binding affinity of the Nkx2.2 homeodomain. PBMs generate an enrichment score (E-score) with a range from -0.5 (low affinity) to 0.5 (high affinity) for every possible 8-base combination based on the relative intensity readouts from microarray data . Therefore, it has the potential to be more information rich than other methods because it allows for complete coverage of possible binding sequences and provides quantitative binding results. We used this data as a basis to identify additional Nkx2.2 DNA binding motifs.
Several combinations of two core sequences are possible within a single octamer. In order to identify the effect of two adjacent cores on Nkx2.2 binding, we analyzed these "dual-core" octamers (Additional File 1). Dual-core octamers for Nkx2.2 binding can be divided into 4 groups based on which core sequence (AAGT or GAGT) is included and their relative orientation (inline or reverse compliment). Interestingly, all octamers containing two cores in the reverse compliment orientation had E-scores > 0.45 while octamers with inline cores had E-scores < 0.37, independent of which cores were contained in the octamer.
The two identified core sequence motifs differ only in the first position. In order to determine whether significant enrichment could be achieved with the other two possible first position bases, we plotted a histogram of the number of occurrences of each possible base in the first position for all E-scores (Figure 1B). We found that there is a significant enrichment of only the AAGT and GAGT core sequences.
To experimentally test the alternative GAGT binding site, we performed Electrophoretic Mobility Shift Assay (EMSA) experiments using the previously published Nkx2.2 consensus sequence as a baseline probe [16, 20] (Figure 1C). We mutated this probe to contain the alternative GAGT core sequence (max E-score = 0.48593, PBM-mapping score = 0.43038) and compared binding to both the original consensus probe (max E-score = 0.49841, PBM-mapping score = 0.46186) and a probe with a deleted core sequence (max E-score = 0.11739, PBM-mapping score = 0.05029). The GAGT containing probe showed significant binding with in vitro translated Nkx2.2 (TNT Nkx2.2) or nuclear extract from pancreatic cell lines with or without exogenously expressed Nkx2.2, although binding was weaker than the AAGT containing probe. Taken together, these experiments show that GAGT represents an alternative core sequence for Nkx2.2 binding sites, although its relative binding affinity is lower than the canonical AAGT core sequence.
Identification of Novel Endogenous Nkx2.2 Binding Sites Using Protein Binding Microarray Mapping
We tested our algorithm by searching for Nkx2.2 binding sites throughout the genome. Complete analysis of the genome resulted in 3 × 106 predicted sites, which falls within range of the expected number of binding sites in the genome for a single transcription factor, based on statistical probability of the sequence occurring in the genome. In order to investigate sites that are most likely to be biologically relevant, we selected sites within regulatory regions from 2.5 kb upstream to 1 kb downstream of the transcription start site of 35 genes with expression levels significantly (P ≤ 0.05) changed in Nkx2.2 null embryos [23–25]. A total 111 putative binding sites were identified in 31 differentially expressed genes including 7 of the 8 genes with increased expression in the Nkx2.2 null pancreas, and 24 of the 27 genes with decreased expression in the Nkx2.2 null pancreas. Furthermore, genes with differential expression in the Nkx2.2 null embryo were more likely to have predicted sites within 500 bp upstream of the transcriptional start site (Additional File 2, P = .02).
Although EMSA analysis shows the sequence specificity of transcription factor binding, it does not take into account cell-specific chromatin states or cooperative factors that may affect binding in vivo. In order to confirm Nkx2.2 occupancy of these sites in β-cells, we performed chromatin immunoprecipitation (ChIP) using the βTC6 cell line (Figure 3B). We did not include the Nkx6.2 -1441, Nkx6.2 +1669, Irs4 +1495 and Tm4sf4 +912 sites in this analysis because they fell in low complexity regions, which hindered the design of appropriate primers. Precipitation with the Nkx2.2 monoclonal antibody resulted in significant enrichment of all predicted promoters that were tested (Figure 3B). The promoter for the housekeeping gene GapdH was used as a negative control. This data shows that promoters containing predicted sites are occupied in vivo. However, ChIP results cannot distinguish between multiple sites in of the Gcg, Ins2, Iapp, Nkx2.2, and Tm4sf4 promoters due to the close proximity of the sites.
PBM-mapping accurately predicts relative Nkx2.2 binding affinity in vitro
AAGT and GAGT core sequences contribute to Nkx2.2 binding in the NeuroD1 promoter
Identification of a novel Nkx2.2 binding site in the insulin promoter
PBM-mapping accurately predicts Hnf4α binding sites
To test the wider applicability of PBM-mapping for the general prediction of transcription factor direct targets, we applied a moving average of E-scores from 6 overlapping octamers to the nuclear receptor family member Hnf4α. Hnf4α has been shown to bind DNA as a homodimer to a site containing two direct repeats of the consensus binding site (consensus site: AGGTCA) . Therefore, Hnf4α represents a transcription factor that is structurally and functionally distinct from Nkx2.2. We applied PBM-mapping to 30 putative Hnf4α binding sites (18 positive and 12 negative) that were previously tested by EMSA analysis . At a threshold value of 0.26, PBM-mapping predicted 16 of the 18 (89%) confirmed binding sites and correctly did not predict any of the negative sites (Additional File 6). Adding or subtracting the number of overlapping octamers did not improve prediction scores for the analyzed data set (data not shown). This data suggests that PBM-mapping may be directly applicable to a wide range of transcription factors.
The identification of transcription factor binding sites is an important biological question. To date, the majority of methods to detect these sites have focused on creating statistical models, such as position weight matrices, of transcription factor specificities. However, these models are limited due to the fact that they must make generalized assumptions about transcription factor binding properties that are not completely understood. Other recent technologies have been developed such as ChIP-Seq to look at genomic transcription factor occupancy . However, these technologies do not identify the precise binding sites, are technically difficult, and limited by the lack of high quality antibodies for many transcription factors. Therefore, bioinformatic prediction of transcription factor binding sites remains a powerful and useful tool for understanding transcriptional regulation. In this paper, we propose a modified technique for creating genome-wide TFBS site maps using direct mapping of protein binding microarray data (PBM-mapping). This method is technically simpler than ChIP-seq methods and does not rely on the assumptions used in statistical models.
We have shown that PBM-mapping more accurately predicts relative binding affinity than previously reported TRANSFAC or PBM based PWMs. PWM inaccuracies have been attributed to both experimental and theoretical errors . Our studies support both of these limitations. The TRANSFAC PWM was developed from the alignment of 23 sequences enriched using SELEX experiments. The PBM-PWM was based on microarray experiments, which provide data for all possible octamers. In our experiments, the PBM-PWM was more highly correlated with observed relative binding affinity than the TRANSFAC PWM, most likely due to the increased information content in the PBM-PWM. However, PBM-mapping scores correlated to a greater extent with relative binding affinity than either PWM method, even though it was generated from the same data used for the PBM-PWM. Therefore, the PWM statistical model does not accurately model Nkx2.2 binding.
Numerous methods for generating PWMs exist and there are several statistical corrections that can be applied to the PWM model, however accurately testing and comparing all of these corrections is technically difficult and therefore were outside the scope of this study . Predictions based on the combined results of more than one PWM could also be attempted. However, these predictions would still be susceptible to the limitations of the PWM model to account for the influence of neighboring nucleotides and flanking regions.
A method for using the average E-score of 3 overlapping octamers surrounding a previously identified core sequence (AvgES) has been utilized for predicting binding sites in C. elegans. However, this method was not fully tested for accuracy and is biased by the assumption of a completely conserved 4-basepair long core sequence. Our results show that using an average E-score from an increased number of overlapping octamers improved the accuracy of transcription factor binding site prediction. Interestingly, an average E-score of 7 overlapping octamers resulted in the highest correlation with relative binding affinity. This represents the greatest number of overlapping octamers possible with at least one base pair common between all oligos. Nkx2.2 binding has been reported to be influenced by the flanking DNA sequence [16, 34]. Therefore, the increase in accuracy is most likely due to accounting for the flanking regions; however, it may also be due to reductions in the inherent noise present from microarray quantification . It remains possible that bases outside of the 7 octamer window can affect binding affinity, although using a window larger than 7 overlapping octamers resulted in decreased accuracy. This may be due to limitations of using 8 bp long motifs for generation of PBM data. It remains to be seen if increasing the length of DNA-binding sequences used in the microarray based experiments to generate E-scores would further increase the accuracy of binding affinity prediction.
We did not observe a large difference between the flanking regions of "AAGT" and "GAGT" containing sequences. Furthermore, both core sequences share common flanking regions in the octamers with the highest E-score (TCAAGT GG and TCGAGT GG). However, the magnitude of the effect that base substitutions in the flanking regions have of the overall score of the site differed between "AAGT" and "GAGT" containing sites. For example, conversion of "G" to "C" in the last position of the above mentioned octamers reduces the single octamer E-score of the "AAGT" site by 0.0038 and the "GAGT" site by 0.011, confirming that the bases in these positions are not independently additive, but are dependent on other bases in the binding site, even when these bases are not immediately adjacent.
We tested PBM-mapping using the homeodomain transcription factor Nkx2.2. Nkx2.2 has been shown to act as both an activator and a repressor during pancreatic islet formation [23, 35]. β-cells are completely absent in the Nkx2.2 null embryo . There was also a corresponding decrease in many, but not all, β-cell markers . However, it was unclear which of these genes were down-regulated due to direct transcriptional control by Nkx2.2 or because the β-cell population was absent. We used PBM-mapping to predict novel Nkx2.2 binding sites within bound promoters of genes differentially expressed between wildtype and Nkx2.2 null embryos. Interestingly, a large majority of differentially expressed genes had predicted sites, suggesting that between e12.5 and e13.5 the changes seen in the Nkx2.2 null pancreas are largely due to direct regulation by Nkx2.2 and not a cascade of downstream transcription factors. We were also able to predict binding sites in several β-cell specific genes, including a battery of genes encoding proteins present in secretory vesicles, such as insulin, IAPP and ChgB that appear to be coordinately activated by Nkx2.2.
Several studies have shown that relatively weak or secondary binding sites are biologically important [1, 2]. These sites may create genomic binding profiles dependent on protein concentrations or they may differentially bind closely related transcription factors that share a common primary binding motif . Careful analysis of the PBM data for Nkx2.2 revealed a previously unidentified alternative "GAGT" binding site motif for Nkx2.2. GAGT containing sites were also represented in our predicted sites (Gcg -432, Iapp -1355, Nkx2.2 -716, and Tm4sf4 -1723)--confirming the ability of the algorithm to predict secondary sites. This is the first secondary motif that has been identified for an Nkx2 family member, although a unique secondary motif has been identified for Nkx3.1 (GTAC). PBM-mapping does not predict that Nkx2.2 binds the GTAC core sequence; however this is not surprising since the homeodomain sequences between Nkx2 and Nkx3 family members are not well conserved and the two protein families are known to preferentially bind different core sequences . Further research is necessary to determine whether the GAGT motif identified in this study is unique to Nkx2.2 or is shared among several Nkx2 genes.
The importance of weak or secondary binding sites highlights the importance of finding a suitable threshold for determining positive prediction results. In our study, we set the threshold for putative Nkx2.2 binding sites at 0.37 based on the results from our EMSA analysis. However, this may not accurately reflect in vivo binding. For example, the Gcg -1080 site, which had an average E-score just below our threshold, did not show binding in EMSA analysis (Figure 3A), but appeared to be occupied in ChIP (Figure 3B). Therefore, it is possible that our threshold is overly stringent or that Nkx2.2 binding to weak sites is dependent on the presence of cofactors. The appropriate threshold may also vary between cell types due to different expression levels of the transcription factor. Furthermore, our results show that the appropriate threshold for different transcription factors will differ.
Although our studies to test PBM-mapping mainly focused on Nkx2.2, we believe that it could be widely applied to other transcription factor binding sites. Homeodomain proteins generally bind to a motif with a strong 4-5 base pair core with less conserved flanking sequences ; therefore, the PBM-mapping algorithm should be directly applicable to most homeodomain containing proteins. We also show that PBM-mapping can be applied to the nuclear receptor protein Hnf4α, but with a different threshold score and number of overlapping octamers than Nkx2.2. Not surprisingly, this suggests that the different DNA-binding site profiles associated with each class of protein will necessitate modifying the number of overlapping octamers used in each analysis. This will especially apply to transcription factor classes that contain shorter or longer binding sites. For example, several zinc finger proteins are thought to have binding sites that contains a 3-base pair core, while some bHLH factors have a 6 bp core sequence . A shorter core sequence may rely more heavily on flanking sequences for binding specificity, which would require increased octamer coverage, whereas a longer core may be more self-contained and require fewer octamers for predictive coverage. Threshold values may also need to be adjusted for each protein analyzed. Therefore, adapting this method to other transcription factor families may require a training set of known binding sites to determine an appropriate threshold for each transcription factor. If previously identified binding sites are not available, a training set can be generated from a subset of PBM-mapping predicted sites that span a wide range of PBM-mapping scores, as was done in this study. Although small adaptations may be necessary, the application of this method to other transcription factors merits further investigation.
Alternative Core Sequence Identification
All octamers with an E-score greater than 0.45 were selected from the protein binding microarray for Nkx2.2. Octamers containing "AAGT" or its reverse compliment "ACTT" were removed and the remaining octamers analyzed for common motifs.
Nkx2.2 and Hnf4α Binding Site Prediction
An algorithm to predict Nkx2.2 binding sites was developed and implemented in the Perl programming language as follows: Chromosome sequence data were retrieved from the UCSC genome browser website (http://genome.ucsc.edu). A moving average of the E-scores from the Nkx2.2 or Hnf4α protein binding microarrays was calculated for each subset of 7 overlapping octamers across the entire genome. Genomic locations of putative sites were then saved to a database. For Nkx2.2, results with an average E-score greater than 0.34 were originally considered positive, but subsequently modified to 0.37. For Hnf4α, a threshold value of 0.24 was used.
Electrophoretic Mobility Shift Assay (EMSA)
In vitro synthesized Nkx2.2 protein was made using the TNT Coupled Reticulolysate System (Promega). Nuclear extract was prepared from βTC6 cells using the Nuclear Extract kit (Active Motif). Probes were designed for each of the predicted binding sites analyzed (Additional File 7). Binding reactions were performed as described by . Shifted bands were quantified using the integrated mean of a fixed window for each of the shifts using Photoshop Extended CS3 (Adobe). Fraction bound was calculated in order to estimate the relative binding affinities using densitometry analysis of bound DNA compared to total DNA as measured by EMSA analysis.
Regression analysis was performed using Prism 5 (Graphpad). Linear regression was used for PBM-mapping score/fraction bound comparisons. Fraction bound is related to Kd by the equation Kd = [ligand](1-Fraction Bound)/Fraction Bound. Therefore, the PBM-score should be correlated with the Kd by the equation Kd = z(1-(ax + b))/(ax + b). Non-linear regression of PBM-score predictions of Kd were performed using this equation as the model to create a best-fit trendline.
Chromatin Immunoprecipitation (ChIP)
ChIP experiments were performed using the ChIP IT Express kit (Active Motif). βTC6 cells were grown in DMEM supplemented with 15% FBS. Approximately 1.5 × 107 cells were crosslinked in 1% paraformaldehyde for 10 min at room temperature. Chromatin was then extracted and sheared by sonication using a Diagnode BioRuptor (8min - 30sec ON/OFF) resulting in chromatin fragments from 200-800 bp long. The sheared chromatin was divided into 6 reactions and run independently. Immunoprecipitations were performed with 3 μg mouse anti-Nkx2.2 monoclonal antibody (DSHB). Normal mouse IgG (Millipore) was used as a negative control. Occupancy of the predicted sites was tested by Sybr-Green qPCR of two independent immunoprecipitations (primers are listed in Additional File 8).
We thank Joshua Levine for critical reading of the manuscript. This work was supported by NIH U01 DK072504 (β-Cell Biology Consortium, L.S. and K.R.A), β-Cell Biology Consortium "MATURE" CBP (L.S. and K.H.K), NIH R01 DK082590, the Naomi Berrie Diabetes Center (L.S.), Hormones: Biochemistry and Molecular Biology Training Grant T32 DK07328 (J.T.H.), and the Naomi Berrie Postdoctoral Fellowship (T.L.M.). We would also like to acknowledge support of the Columbia DERC core facilities.
- Segal E, Raveh-Sadka T, Schroeder M, Unnerstall U, Gaul U: Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature 2008, 451(7178):535–540.View ArticlePubMedGoogle Scholar
- Tanay A: Extensive low-affinity transcriptional interactions in the yeast genome. Genome Res 2006, 16(8):962–972.PubMed CentralView ArticlePubMedGoogle Scholar
- Badis G, Berger MF, Philippakis AA, Talukder S, Gehrke AR, Jaeger SA, Chan ET, Metzler G, Vedenko A, Chen X, et al.: Diversity and complexity in DNA recognition by transcription factors. Science 2009, 324(5935):1720–1723.PubMed CentralView ArticlePubMedGoogle Scholar
- Berger MF, Badis G, Gehrke AR, Talukder S, Philippakis AA, Pena-Castillo L, Alleyne TM, Mnaimneh S, Botvinnik OB, Chan ET, et al.: Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell 2008, 133(7):1266–1276.PubMed CentralView ArticlePubMedGoogle Scholar
- Elnitski L, Jin VX, Farnham PJ, Jones SJ: Locating mammalian transcription factor binding sites: a survey of computational and experimental techniques. Genome Res 2006, 16(12):1455–1464.View ArticlePubMedGoogle Scholar
- Frech K, Quandt K, Werner T: Finding protein-binding sites in DNA sequences: the next generation. Trends Biochem Sci 1997, 22(3):103–104.View ArticlePubMedGoogle Scholar
- Robison K, McGuire AM, Church GM: A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K-12 genome. J Mol Biol 1998, 284(2):241–254.View ArticlePubMedGoogle Scholar
- Stormo GD: DNA binding sites: representation and discovery. Bioinformatics 2000, 16(1):16–23.View ArticlePubMedGoogle Scholar
- Tompa M, Li N, Bailey TL, Church GM, De Moor B, Eskin E, Favorov AV, Frith MC, Fu Y, Kent WJ, et al.: Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol 2005, 23(1):137–144.View ArticlePubMedGoogle Scholar
- O'Flanagan RA, Paillard G, Lavery R, Sengupta AM: Non-additivity in protein-DNA binding. Bioinformatics 2005, 21(10):2254–2263.View ArticlePubMedGoogle Scholar
- Djordjevic M, Sengupta AM, Shraiman BI: A biophysical approach to transcription factor binding site discovery. Genome Res 2003, 13(11):2381–2390.PubMed CentralView ArticlePubMedGoogle Scholar
- Bulyk ML, Johnson PL, Church GM: Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res 2002, 30(5):1255–1261.PubMed CentralView ArticlePubMedGoogle Scholar
- Man TK, Stormo GD: Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res 2001, 29(12):2471–2478.PubMed CentralView ArticlePubMedGoogle Scholar
- Liu J, Stormo GD: Combining SELEX with quantitative assays to rapidly obtain accurate models of protein-DNA interactions. Nucleic Acids Res 2005, 33(17):e141.PubMed CentralView ArticlePubMedGoogle Scholar
- Sussel L, Kalamaras J, Hartigan-O'Connor DJ, Meneses JJ, Pedersen RA, Rubenstein JL, German MS: Mice lacking the homeodomain transcription factor Nkx2.2 have diabetes due to arrested differentiation of pancreatic beta cells. Development 1998, 125(12):2213–2221.PubMedGoogle Scholar
- Watada H, Mirmira RG, Kalamaras J, German MS: Intramolecular control of transcriptional activity by the NK2-specific domain in NK-2 homeodomain proteins. Proc Natl Acad Sci USA 2000, 97(17):9443–9448.PubMed CentralView ArticlePubMedGoogle Scholar
- Cissell MA, Zhao L, Sussel L, Henderson E, Stein R: Transcription factor occupancy of the insulin gene in vivo. Evidence for direct regulation by Nkx2.2. J Biol Chem 2003, 278(2):751–756.View ArticlePubMedGoogle Scholar
- Raum JC, Gerrish K, Artner I, Henderson E, Guo M, Sussel L, Schisler JC, Newgard CB, Stein R: FoxA2, Nkx2.2, and PDX-1 regulate islet beta-cell-specific mafA expression through conserved sequences located between base pairs -8118 and -7750 upstream from the transcription start site. Mol Cell Biol 2006, 26(15):5735–5743.PubMed CentralView ArticlePubMedGoogle Scholar
- Berger MF, Philippakis AA, Qureshi AM, He FS, Estep PW, Bulyk ML: Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat Biotechnol 2006, 24(11):1429–1435.PubMed CentralView ArticlePubMedGoogle Scholar
- Anderson KR, Torres CA, Solomon K, Becker TC, Newgard CB, Wright CV, Hagman J, Sussel L: Cooperative transcriptional regulation of the essential pancreatic islet gene NeuroD1 (beta2) by Nkx2.2 and neurogenin 3. J Biol Chem 2009, 284(45):31236–31248.PubMed CentralView ArticlePubMedGoogle Scholar
- Chen X, Hughes TR, Morris Q: RankMotif++: a motif-search algorithm that accounts for relative ranks of K-mers in binding transcription factors. Bioinformatics 2007, 23(13):i72–79.View ArticlePubMedGoogle Scholar
- Grove CA, De Masi F, Barrasa MI, Newburger DE, Alkema MJ, Bulyk ML, Walhout AJ: A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors. Cell 2009, 138(2):314–327.PubMed CentralView ArticlePubMedGoogle Scholar
- Doyle MJ, Loomis ZL, Sussel L: Nkx2.2-repressor activity is sufficient to specify alpha-cells and a small number of beta-cells in the pancreatic islet. Development 2007, 134(3):515–523.PubMed CentralView ArticlePubMedGoogle Scholar
- Prado CL, Pugh-Bernard AE, Elghazi L, Sosa-Pineda B, Sussel L: Ghrelin cells replace insulin-producing beta cells in two mouse models of pancreas development. Proc Natl Acad Sci USA 2004, 101(9):2924–2929.PubMed CentralView ArticlePubMedGoogle Scholar
- Anderson KR, White P, Kaestner KH, Sussel L: Identification of known and novel pancreas genes expressed downstream of Nkx2.2 during development. BMC Dev Biol 2009, 9: 65.PubMed CentralView ArticlePubMedGoogle Scholar
- Gerrish K, Van Velkinburgh JC, Stein R: Conserved transcriptional regulatory domains of the pdx-1 gene. Mol Endocrinol 2004, 18(3):533–548.View ArticlePubMedGoogle Scholar
- Brodbelt JS: Evaluation of DNA/Ligand interactions by electrospray ionization mass spectrometry. Annu Rev Anal Chem (Palo Alto Calif) 2010, 3: 67–87.View ArticleGoogle Scholar
- Wang LH, Chmelik R, Nirenberg M: Sequence-specific DNA binding by the vnd/NK-2 homeodomain of Drosophila. Proc Natl Acad Sci USA 2002, 99(20):12721–12726.PubMed CentralView ArticlePubMedGoogle Scholar
- Jiang G, Sladek FM: The DNA binding domain of hepatocyte nuclear factor 4 mediates cooperative, specific binding to DNA and heterodimerization with the retinoid X receptor alpha. J Biol Chem 1997, 272(2):1218–1225.View ArticlePubMedGoogle Scholar
- Kel AE, Niehof M, Matys V, Zemlin R, Borlak J: Genome wide prediction of HNF4alpha functional binding sites by the use of local and global sequence context. Genome Biol 2008, 9(2):R36.PubMed CentralView ArticlePubMedGoogle Scholar
- Johnson DS, Mortazavi A, Myers RM, Wold B: Genome-wide mapping of in vivo protein-DNA interactions. Science 2007, 316(5830):1497–1502.View ArticlePubMedGoogle Scholar
- Djordjevic M: SELEX experiments: new prospects, applications and data analysis in inferring regulatory pathways. Biomol Eng 2007, 24(2):179–189.View ArticlePubMedGoogle Scholar
- Das MK, Dai HK: A survey of DNA motif finding algorithms. BMC Bioinformatics 2007, 8(Suppl 7):S21.PubMed CentralView ArticlePubMedGoogle Scholar
- Hill JT, Chao CS, Anderson KR, Kaufman F, Johnson CW, Sussel L: Nkx2.2 activates the ghrelin promoter in pancreatic islet cells. Mol Endocrinol 2010, 24(2):381–390.PubMed CentralView ArticlePubMedGoogle Scholar
- Doyle MJ, Sussel L: Nkx2.2 regulates beta-cell function in the mature islet. Diabetes 2007, 56(8):1999–2007.PubMed CentralView ArticlePubMedGoogle Scholar
- Steadman DJ, Giuffrida D, Gelmann EP: DNA-binding sequence of the human prostate-specific homeodomain protein NKX3.1. Nucleic Acids Res 2000, 28(12):2389–2395.PubMed CentralView ArticlePubMedGoogle Scholar
- Noyes MB, Christensen RG, Wakabayashi A, Stormo GD, Brodsky MH, Wolfe SA: Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell 2008, 133(7):1277–1289.PubMed CentralView ArticlePubMedGoogle Scholar
- Newburger DE, Bulyk ML: UniPROBE: an online database of protein binding microarray data on protein-DNA interactions. Nucleic Acids Res 2009, (37 Database):D77–82.Google Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.