Quantitative analysis of EGR proteins binding to DNA: assessing additivity in both the binding site and the protein
© Liu and Stormo; licensee BioMed Central Ltd. 2005
Received: 13 April 2005
Accepted: 13 July 2005
Published: 13 July 2005
Recognition codes for protein-DNA interactions typically assume that the interacting positions contribute additively to the binding energy. While this is known to not be precisely true, an additive model over the DNA positions can be a good approximation, at least for some proteins. Much less information is available about whether the protein positions contribute additively to the interaction.
Using EGR zinc finger proteins, we measure the binding affinity of six different variants of the protein to each of six different variants of the consensus binding site. Both the protein and binding site variants include single and double mutations that allow us to assess how well additive models can account for the data. For each protein and DNA alone we find that additive models are good approximations, but over the combined set of data there are context effects that limit their accuracy. However, a small modification to the purely additive model, with only three additional parameters, improves the fit significantly.
The additive model holds very well for every DNA site and every protein included in this study, but clear context dependence in the interactions was detected. A simple modification to the independent model provides a better fit to the complete data.
Zinc finger proteins are the largest family of transcription factors in the human genome. The EGR sub-family of C2H2 zinc finger proteins has been extensively studied to determine the basis of DNA-protein binding specificity. The structure of the DNA-protein complex has been determined for the wild-type EGR1 (zif268) protein bound to its consensus site [1, 2] and for several other variants of the interaction [3–5]. From the structure, the interaction appears very modular with each protein containing several zinc finger domains and each finger interacting with adjacent 3 base-pair (or overlapping 4 base-pair) segments of the binding site. Analysis of binding sites for this family of proteins suggested there were simple rules that relate the sequence of the zinc finger protein to its preferred binding site sequence , and that those rules could be used to design proteins with desired specificities [7, 8]. Soon after, experimental techniques of in vitro randomization and selection were employed to greatly expand the collection of protein-DNA high affinity interactions [9–12]. Several reviews [4, 13–18] have analyzed the protein-DNA crystal structures, summarized the results of the in vitro selection experiments, described rules for predicting high affinity protein-DNA interacting pairs and assessed the success of those rules for designing proteins to recognize particular sequences. Most of the recognition rules that have been developed are qualitative, specifying the amino acid and base-pair combinations that are preferred at each position in the binding sites . Such rules can be effectively used to design proteins with preferred binding sites that are desired .
Despite the success of the qualitative recognition codes for designing proteins with desired preferred binding sites, the utility of such codes is still quite limited. If one compares the collection of known protein-DNA interacting pairs obtained in in vitro selection experiments, more than half of the fingers contain at least one amino acid/base-pair interaction that is not included in the code . Furthermore, the code only predicts the preferred binding site for each protein sequence, or preferred protein for each DNA binding site. But it does not, by its qualitative nature, attempt to predict differences in affinities to similar sequences. Because all of these proteins bind with limited specificity, sites that are very similar to the preferred binding site can often bind with only slightly reduced affinity. Therefore predicting the quantitative binding specificities is important for a comprehensive view of their functions.
Several quantitative binding models have been developed, either specifically for the zinc finger proteins or for general protein-DNA interactions [20–26]. In many cases such codes can predict the preferred binding sites as accurately as the qualitative codes, but the overall accuracy of the quantitative predictions is limited, undoubtedly for a combination of reasons. One reason is that there are limited data upon which to infer the model parameters using statistical approaches. Another reason is that many of the models are overly simplified, for instance assuming that each amino acid/base-pair contact is independent of any of the surrounding structure. We know, for instance, that the interactions of the protein and DNA are not completely additive [27, 28], and it is also known that both intermolecular and intramolecular interactions contribute to protein-DNA recognition [24]. But it has also been shown that models which are additive over the DNA positions can be reasonably good approximations, at least for some proteins [29, 30]. Most studies of additivity have focused on the DNA binding site, testing whether independent models for each base-pair fit the binding data well [29, 31, 32]. But equally important to the recognition codes is whether additivity holds within the protein. In one example from the EGR family, the contributions within the protein were shown to be approximately additive (within 0.5 kcal) for one pair of mutated amino acids . But very few studies have addressed the issue. Even though many variants of EGR family proteins have been used in SELEX and phage-display selection studies (see  for a summary), very few of the affinities have been quantified. Bulyk et al  did measure the affinity to each of 64 different binding sites for five different proteins, but the proteins were different at too many positions to be useful for determining additivity.
One needs to have a set of single mutations and their double mutant combinations in order to determine whether the contributions to binding are independent or not. Several structural studies have highlighted the substantial rearrangements that can occur at the protein-DNA interface and can cause single amino acid or base-pair substitutions to influence the interactions at neighboring positions [3, 15, 34, 35]. Such context effects may limit the predictive accuracy of simple recognition codes, although it is also possible that additivity can hold approximately even in the presence of such rearrangements. In the Mnt protein, a single amino acid change can alter the preferred binding site primarily at two adjacent positions, and more weakly over a longer distance [36, 37]. Nevertheless, a complete quantitative analysis of the adjacent positions that were primarily affected showed that the interaction was largely additive for a wide variety of amino acid substitutions .
In this study we analyze the additivity of the interaction in both the DNA binding sites and in the interacting positions of the protein. We measure binding affinities for each of six different proteins, with single and double mutations compared to the wild-type protein, to each of six different DNA sites, also with single and double mutations from the wild-type binding site. We show that for any specific protein or DNA an additive model fits the data quite well. However, there are clear context effects such that no single interaction model fits all of the protein-DNA combinations. But only a small modification to the additive model, with just three additional parameters, improves the fit significantly.
Results and discussion
Oligos applied in this study. I: Synthesized DNA templates bearing either the wild-type binding site (Zif_1) for zif268 or one of its variants (Zif_2 to Zif_6), used for generating DNA binding sites by PCR amplification, where KS-1 and SK-1 are the two primers (lowercase). II: Oligos employed to construct five zif268 variants with the QuikChange™ XL site-directed mutagenesis Kit (Stratagene) using pzif268 as a template.
SK*-1 gtggcggccgctctagaact (SK-1 was fluorescently labeled with either FAM, HEX, TAMRA, ROX, or CY5)
18Q_plus 5' CGCCGCTTTTCTcagTCGGATGAGCTTACCCGCC
18Q_minus 5' GGCGGGTAAGCTCATCCGActgAGAAAAGCGGCG
18D_plus 5' CGCCGCTTTTCTgatTCGGATGAGCTTACCCGCC
18D_minus 5' GGCGGGTAAGCTCATCCGAatcAGAAAAGCGGCG
21N_plus 5' CGCCGCTTTTCTCGCTCGGATaacCTTACCCGCC
21N_minus 5' GGCGGGTAAGgttATCCGAGCGAGAAAAGCGGCG
18Q_21N_plus 5' CGCCGCTTTTCTcagTCGGATaacCTTACCCGCC
18Q_21N_minus 5' GGCGGGTAAGgttATCCGActgAGAAAAGCGGCG
18D_21N_plus 5' CGCCGCTTTTCTgatTCGGATaacCTTACCCGCC
18D_21N_minus 5' GGCGGGTAAGgttATCCGAatcAGAAAAGCGGCG
Relative binding constants of six DNA binding sites for wild-type zif268 and its five derivatives, with the wild-type operator of zif268 used as the reference. Each value was obtained from 5 or more independent determinations; standard deviations are given in parentheses.
Experimentally determined association constants (10⁶ M⁻¹) for each indicated DNA binding site binding to its corresponding protein. Each value is the mean of 5 or more independent determinations, and the standard deviations are shown in parentheses.
Absolute Ka (10⁶ M⁻¹) for six DNA binding sites and six variants of zif268, derived from the combination of Table 2 and Table 3.
From the binding data we can assess the additivity of the interaction for both the protein and the DNA. In a perfectly additive interaction the binding energy for each sequence would be the sum of the independent contributions at each position. For example, for any protein j, the binding energy to any DNA sequence XY, would be the sum of the interactions with base X and base Y:
$$\Delta G_j(X_8 Y_9) = \Delta G_j(X_8) + \Delta G_j(Y_9). \qquad (1)$$
The important assumption of the additive model is that the interaction energy at position 8, for example, doesn't depend on which base occurs at position 9. We do not expect additivity to hold precisely [30, 27, 28], but it can be a very good approximation, at least for some proteins [27, 29]. Previously, studies of additivity have focused on whether the positions in the DNA binding site contribute independently to the binding of a particular protein. Using the data of Table 4 we can also determine whether the positions in the protein contribute additively to the binding of a particular DNA site. That is, we can reverse the symbols of equation 1 to refer to the binding of a particular DNA sequence, i, to a protein sequence UV:
$$\Delta G_i(U_{-1} V_3) = \Delta G_i(U_{-1}) + \Delta G_i(V_3). \qquad (2)$$
Of course, we have not measured affinities to all possible DNA sequences or for all possible protein sequences, but because we have both single and double mutants in both the protein and the DNA, and have measured the binding affinities of all combinations, we can determine how well additivity holds on both sides, the DNA and the protein, at least for this limited set of variants.
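As a rough illustration of the additivity test expressed by equations 1 and 2, the sketch below converts relative binding constants into free energy differences and compares a double mutant with the sum of its two single mutants. The function names, the default temperature (273.15 K, an assumption based on the binding reactions being equilibrated on ice) and the numeric values are hypothetical and are not data from this study.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def delta_g(k_rel, temp_k=273.15):
    """Free energy difference (kcal/mol) relative to the reference site or protein,
    computed from a relative binding constant as -RT ln(K_rel)."""
    return -R_KCAL * temp_k * math.log(k_rel)


def additivity_deviation(k_single_a, k_single_b, k_double, temp_k=273.15):
    """Observed double-mutant free energy minus the additive prediction
    (the sum of the two single-mutant free energies)."""
    predicted = delta_g(k_single_a, temp_k) + delta_g(k_single_b, temp_k)
    observed = delta_g(k_double, temp_k)
    return observed - predicted


# Hypothetical relative binding constants (reference = 1.0).
# Here the double mutant equals the product of the singles, so the
# interaction is perfectly additive and the deviation is ~0 kcal/mol.
dev = additivity_deviation(0.2, 0.5, 0.1)
print(f"deviation from additivity: {dev:.3f} kcal/mol")
```

A deviation near zero indicates that the two positions contribute independently; a large positive or negative value would point to a context effect between them.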
We cannot actually measure the binding affinities to single positions because they always occur in some context. But we can find the "best fit" values for the independent interactions, and then determine how well the total data fits the additive model using those values. One method to obtain the best fit independent parameters is to apply multiple linear regression to the total data [31, 32]. However, we have argued previously  that a better criterion is to minimize the difference in total free energy between the observed data and the model.
Given the best fit independent parameters we can calculate the specificity information, I spec , of each position independently . For example the specificity information for the protein or DNA ω at the first interacting position is
Several interesting results are evident in Figure 2. As stated above, the proteins vary considerably in their specificity, with RE (shown as "ER" in the figure) showing large discrimination between the different DNA sites, whereas QE and DE are fairly non-specific. The same holds for the different DNA sites, where CG is much more specific than CC or AC. It is interesting that every DNA site prefers R at position -1 of the protein, showing that it contributes to the total affinity of each protein as well as to the specificity of some proteins. The small degree of mutual information, the "M" in each logo, means that every interaction fits well with an additive model. Not only do the DNA positions contribute very additively, as has been shown previously for this family of proteins , but the contributions of the amino acids in the protein are also largely additive. The conclusion that additive models are good approximations to the true data holds for every DNA site and every protein included in the analysis. However, it is also true that there is not a single set of additive parameters that fit well for every case. This is consistent with the context effects previously noted for this family [15, 34]. For example, R prefers to bind to G over A or C, but the magnitude of that preference is much larger if position +3 is an E instead of N. And an N at position +3 always prefers an A over C in the binding site, but that preference is much weaker with an R at position -1 than with a Q or D. Similarly, E at position +3 prefers a C very strongly in the context of an R, but is quite non-specific with either a Q or D at position -1. Similar effects, but of smaller magnitude, can be seen in the context effects of the DNA sites. These results show that additive models can be good approximations not only for the DNA sites in binding to any particular protein as has been seen before , but also for the proteins in binding to any particular DNA site. 
But the results also show that additivity for specific proteins and DNA sites is not sufficient to generate a general recognition code because context effects can still be important when both the DNA and protein can be variable. The small amounts of mutual information observed for any specific protein or DNA can be reinforced to give much larger amounts when measured over combinations of both components.
Because the data in Figure 3 are in probabilities (if divided by 1000), the information specificity can be calculated more easily than in equation (4):
$$I_{\mathrm{spec}}(\alpha) = \log_2 N_\alpha - H_\alpha \qquad (5)$$
Information for the position dependence. The diagonal is the specificity information for each of positions -1, 3, 8, and 9. The upper half of the matrix is the specificity information for each of the pairs of positions, and the lower half is the mutual information between pairs of positions.
$$M(\alpha,\beta) = I_{\mathrm{spec}}(\alpha,\beta) - \bigl(I_{\mathrm{spec}}(\alpha) + I_{\mathrm{spec}}(\beta)\bigr) \qquad (6)$$
Those values are shown in the lower half of Table 5. From the standard model of interaction between the DNA and protein we would expect there to be very little mutual information for any of the 2D datasets of Figure 3D–G, and that expectation is met. But we do expect high mutual information for the datasets in Figure 3B and 3C because those are the interacting positions. Just as we get high mutual information for positions that interact in RNA structures , we expect to see compensating changes between the amino acids and base-pairs that interact. That expectation is met for the combination of protein position +3 and base-pair position 8 (Figure 3C) where there is a clear preference for E binding to C and for N binding to A. In that case the mutual information is 0.19 bits, which is the main contribution to the total information of that pair, 0.24 bits. However, protein position -1 and base-pair position 9 also interact but show little mutual information because R is the preferred amino acid for each different DNA sequence and G is the preferred base-pair for each different protein. That pair has high specificity information, 1.09 bits, but it is very additive with only 0.02 bits of mutual information.
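Equations 5 and 6 can be sketched in a few lines of code. The example below assumes that N is the number of categories in a probability table and that H is the Shannon entropy in bits; the function names and the two toy joint tables are hypothetical and are not taken from Figure 3 or Table 5.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def i_spec(probs):
    """Specificity information as in equation 5: log2(N) - H."""
    return math.log2(len(probs)) - entropy(probs)


def mutual_information(joint):
    """Mutual information as in equation 6, from a joint probability table
    (rows = states of position alpha, columns = states of position beta)."""
    row_marginal = [sum(row) for row in joint]
    col_marginal = [sum(col) for col in zip(*joint)]
    flat = [p for row in joint for p in row]
    return i_spec(flat) - (i_spec(row_marginal) + i_spec(col_marginal))


# Hypothetical joint tables for two positions with two states each.
independent = [[0.25, 0.25], [0.25, 0.25]]  # no coupling between positions
coupled = [[0.5, 0.0], [0.0, 0.5]]          # perfectly compensating changes

m_independent = mutual_information(independent)
m_coupled = mutual_information(coupled)
print(m_independent, m_coupled)
```

With these definitions the mutual information reduces to the familiar H(α) + H(β) − H(α,β), so it is zero when the two positions vary independently and grows as their states become coupled.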
The EGR family of proteins is an ideal case to study the effectiveness of a recognition code for protein-DNA interactions. The collection of crystal structures along with a large number of examples from selection experiments provides a wealth of information for determining the relationship between the protein sequence and the affinity for different DNA sequences. Simple qualitative models that predict the preferred interactions can be very effective and useful for designing new TFs [14, 19]. Quantitative models, that predict relative binding affinities to multiple DNA sites, are more challenging but some success has been achieved by statistical approaches as well as by structure based approaches [20–26]. Most current models of this type assume independence of the contributions to binding between the positions in the interactions. In this work we show that additive models can be a good approximation for any particular EGR protein and also for binding to any particular DNA site; additivity holds well for both the DNA and protein side of the interaction. But we also show that there is not a universal set of parameters that work for all proteins or all DNA sites, rather there is context dependence in the interactions. However, at least in the cases studied here, a simple addition to the independent model that divides sites into two classes provides a much better fit. This holds promise that, even though additivity does not hold precisely, it may still be possible to determine an additive recognition code by identifying a small set of classes that cover the entire set of interactions. How many classes will be needed is unknown at this time. The 36 combinations in our study required only two classes to give a very good fit but this is still far from a comprehensive analysis. The total number of adjacent amino acid pairs is 400 and the number of di-nucleotide combinations is 16, so there are 6400 possible combinations of the two. 
Quantitative analyses that cover all possible combinations of even a single zinc finger are impossible at this time. But more thorough sampling of the space of high affinity interactions, followed by quantitative binding assays, will provide much valuable information regarding the nature of recognition codes. While a completely additive model for the interaction of the protein and DNA is not correct, it may be that only relatively minor modifications are needed to make significantly better predictions.
By determining the binding affinities of single and double mutants in both the DNA binding site and in the protein we were able to assess the degree of additivity in both halves of the interaction. Although only a limited number of combinations were tested, we find that for every DNA sequence and for every protein sequence an additive model is a good approximation to the real binding data. However, when all of the data are considered together there are clear context effects that are not well fit by a single additive model. A slightly more complex model does provide a good fit to the observed data, suggesting that quite simple models may still be employed to predict quantitative binding interactions of proteins with DNA. Further data are needed to determine how well these findings generalize to more variations and to other protein families.
Construction of wild-type zif268 DNA binding domain (DBD) and its variants
A plasmid containing the DNA binding domain of wild-type zif268 was obtained from Gendaq Limited . The portion of zif268 cDNA encoding the three zinc-finger DBD (cDNA nucleotides 996–1262, amino acids 331–420) was amplified by PCR and subcloned into expression vector pET-28a-c(+) (Novagen) to create His-tagged fusion protein. The resulting construct, denoted pzif268, was verified by DNA sequencing. Five zif268 mutants with alterations in the base-contacting residues in finger one of zif268 DBD were constructed with QuikChange™ XL site-directed mutagenesis Kit (Stratagene) using pzif268 as a template: 3 single substitution mutants R18Q, R18D, E21N, and two double substitution mutants R18Q/E21N and R18D/E21N. The mutagenic primers containing the desired mutations used to create the five mutants are shown in Table 1. The resulting plasmids p18Q, p18D, p21N, p18Q21N and p18D21N were verified by DNA sequencing. Hereafter, the proteins are referred to by their amino acids at positions -1 and +3: RE (wild-type), QE, DE, RN, QN and DN.
Expression and purification of His-tagged-zif268 fusion protein and its variants
E. coli BL21 cells bearing pzif268 or one of its derivatives were grown in 2xYT medium at 37°C with constant shaking at 250 rpm. IPTG was added to a final concentration of 1 mM when OD600 reached 0.6–1.0. Cells were harvested 3 hrs after IPTG induction by centrifugation at 4000 rpm for 20 min. The pellets were then resuspended in 15 ml of lysis buffer (50 mM Tris-HCl pH 8.0, 300 mM NaCl, 10 mM DTT and 1 tablet of protease inhibitor cocktail (Roche)) and lysed by sonication. The lysate was then centrifuged at 6000 rpm for 20 min to remove insoluble material. The His-tagged fusion protein was purified by Ni-resin chromatography similar to that described previously . The eluate was collected as 2 ml fractions. Fractions were analyzed on a 12% SDS-PAGE gel, followed by silver staining. Finally the fractions were pooled and dialysed against dialysis buffer (30 mM Tris-HCl pH 8.0, 50 mM NaCl, 3 mM DTT) at 4°C, followed by concentration with a Centricon filter (Amicon), and kept at -80°C until use. The protein concentration was determined with a BioRad assay kit.
Multiple quantitative fluorescence relative affinity (QuMFRA) assay to determine the relative binding constants
The relative binding constants of each protein to different binding sites were determined by the QuMFRA assay  with some modifications. Double-stranded oligonucleotide binding sites used in this study were generated by PCR. In each PCR reaction, a synthesized oligo containing either the wild-type binding site (zif1) of zif268 or one of its variants (Table 1) was used as the template, and the two primers were KS and SK (Table 1). The SK primer was labeled with one of the following four fluorophores: FAM, HEX, TAMRA, or ROX . The PCR products were dissolved in TS buffer (10 mM Tris-HCl pH 8.0, 50 mM NaCl) after purification and precipitation with 1/10 vol of 3 M NaAc and an equal volume of isopropanol. The concentration of DNA was determined using a method similar to that described previously .
The competitive binding assay  was performed by mixing 4 different fluorophore-labeled DNA binding sites with a fixed amount of His-tagged zinc finger protein in 1x reaction buffer (30 mM Tris-HCl pH 8.0, 50 mM NaCl, 0.1 mg/ml BSA, 3 mM DTT, 20 µM ZnSO4, 5 µg/ml poly(dI-dC)), in which the fluorophore-labeled zif1 served as an internal reference in each reaction. The reaction was equilibrated for 1 hr on ice before being electrophoresed on a 10% polyacrylamide gel. Each of the 4 fluorophore-labeled PCR products was also loaded individually onto the same gel. After electrophoresis, the gels were scanned by a Typhoon Variable Scanner (Molecular Dynamics, Sunnyvale, CA) to obtain the fluorescent intensities of the separated bands (bound and unbound) at 4 different emission wavelengths using the same machine settings as employed by Man and Stormo . For each separated band, the resultant fluorescence intensities at the four emission wavelengths make up the output vector . Using the fluorescence intensities of the 4 individual fluorophore-labeled DNAs at each emission wavelength we obtain the emission matrix E . The input mixture of the 4 DNAs in each band, represented as the vector , was computed by a program developed for this study using the Gaussian elimination algorithm from the following relationship:
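A minimal sketch of this unmixing step is given below. It assumes a linear model in which the output vector of measured intensities equals the emission matrix E multiplied by the input vector of DNA amounts, and it solves that system by Gaussian elimination with partial pivoting. The function name `solve_linear`, the matrix entries and the DNA amounts are hypothetical; this is not the program developed for the study.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without modifying the inputs.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("emission matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


# Hypothetical emission matrix: row = emission wavelength,
# column = signal per unit amount of one fluorophore-labeled DNA.
emission = [
    [1.0, 0.2, 0.0, 0.0],
    [0.1, 1.0, 0.3, 0.0],
    [0.0, 0.1, 1.0, 0.2],
    [0.0, 0.0, 0.1, 1.0],
]
true_amounts = [1.0, 2.0, 0.5, 3.0]  # hypothetical DNA amounts in one band

# Simulate the output vector for that band, then recover the amounts.
output = [sum(e * x for e, x in zip(row, true_amounts)) for row in emission]
recovered = solve_linear(emission, output)
print(recovered)
```

Applying the same solve to the bound and unbound bands of a lane gives the amount of each of the four DNAs in each band.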
From the amount of each DNA in the bound and unbound bands of each lane, the relative binding affinity can be calculated by the following formula, where the wild-type binding site of zif268 (zif1) serves as the reference:
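$$\frac{K_b^{\mathrm{test}}}{K_b^{\mathrm{ref}}} = \frac{[P \cdot D]_{\mathrm{test}}\,[D]_{\mathrm{ref}}}{[D]_{\mathrm{test}}\,[P \cdot D]_{\mathrm{ref}}}$$
$$\frac{K_b^{\mathrm{test}}}{K_b^{\mathrm{ref}}} = \frac{I_{P\text{-}D}^{\mathrm{test}}\, I_{D}^{\mathrm{ref}}}{I_{D}^{\mathrm{test}}\, I_{P\text{-}D}^{\mathrm{ref}}}$$
where $I_{P\text{-}D}$ and $I_D$ are the intensities of the specified DNAs in the bound and unbound bands, respectively.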
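The ratio above translates directly into code. The sketch below uses a hypothetical function name and hypothetical band intensities, and assumes the intensities have already been separated into per-DNA contributions for the bound and unbound bands.

```python
def relative_affinity(bound_test, unbound_test, bound_ref, unbound_ref):
    """Relative binding constant K_test / K_ref from the bound and unbound
    band intensities of the test site and the reference site."""
    return (bound_test * unbound_ref) / (unbound_test * bound_ref)


# Hypothetical intensities: the reference site is half bound,
# the test site is one fifth bound.
ratio = relative_affinity(bound_test=200.0, unbound_test=800.0,
                          bound_ref=500.0, unbound_ref=500.0)
print(ratio)
```

Because both sites compete for the same protein in the same lane, the free protein concentration cancels and does not need to be known.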
Determination of the absolute binding constant of a zinc finger protein to a binding site by Scatchard analysis
Scatchard analysis  was applied here to determine the absolute association constant, Ka, of a zinc finger protein to a binding site. Specifically, a fixed amount of purified His-tagged zinc finger protein, [P]total, was mixed with increasing amounts of Cy5-labeled DNA generated by PCR in 1x reaction buffer for 1 hr on ice. The bound and unbound DNA were separated by electrophoresis on a 10% polyacrylamide gel, as above, and the gels were scanned by a Typhoon Variable Scanner using an excitation wavelength of 633 nm and an emission wavelength of 670 nm. From the following relationship
it can be seen that the association constant for the particular combination of protein and DNA, K a (P,D), can be obtained from a plot of at multiple DNA concentrations. At least five independent determinations were made for each protein.
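A minimal sketch of such a fit is shown below. It assumes the textbook Scatchard form for a fixed amount of protein, in which the ratio of bound to free DNA equals Ka multiplied by ([P]total minus bound DNA), so that a straight-line fit of bound/free against bound has slope −Ka. The function name, the concentration units (µM, so that Ka is in 10⁶ M⁻¹) and the synthetic data are hypothetical and are not measurements from this study.

```python
def scatchard_fit(bound, free):
    """Least-squares fit of bound/free = Ka * (P_total - bound).
    Returns (Ka, P_total) from the slope and intercept of the line."""
    xs = list(bound)
    ys = [b / f for b, f in zip(bound, free)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ka = -slope                 # slope of the Scatchard line is -Ka
    p_total = intercept / ka    # intercept is Ka * P_total
    return ka, p_total


# Synthetic, noise-free data generated from Ka = 2.0 per µM and
# P_total = 0.1 µM (hypothetical values).
true_ka, true_p = 2.0, 0.1
bound = [0.01, 0.02, 0.04, 0.06, 0.08]                 # µM bound DNA
free = [b / (true_ka * (true_p - b)) for b in bound]   # µM free DNA

ka, p_total = scatchard_fit(bound, free)
print(ka, p_total)
```

With real gel data the points scatter around the line, and the fitted intercept also gives an estimate of the concentration of active protein.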
We thank Gendaq for giving us DNA phage coding for zif268. We thank Takis Benos for help with subcloning and David Granas for some statistical analyses of the SELEX and phage-display data. This work was supported by NIH grant GM28755.
- Pavletich NP, Pabo CO: Zinc finger-DNA recognition: crystal structure of a Zif268-DNA complex at 2.1 A. Science 1991, 252: 809–17.
- Elrod-Erickson M, Rould MA, Nekludova L, Pabo CO: Zif268 protein-DNA complex refined at 1.6 A: a model system for understanding zinc finger-DNA interactions. Structure 1996, 4: 1171–80. 10.1016/S0969-2126(96)00125-6
- Elrod-Erickson M, Benson TE, Pabo CO: High-resolution structures of variant Zif268-DNA complexes: implications for understanding zinc finger-DNA recognition. Structure 1998, 6: 451–64. 10.1016/S0969-2126(98)00047-1
- Choo Y, Klug A: Physical basis of a protein-DNA recognition code. Curr Opin Struct Biol 1997, 7: 117–25. 10.1016/S0959-440X(97)80015-2
- Wolfe SA, Nekludova L, Pabo CO: DNA recognition by Cys2His2 zinc finger proteins. Annu Rev Biophys Biomol Struct 2000, 29: 183–212. 10.1146/annurev.biophys.29.1.183
- Desjarlais JR, Berg JM: Toward rules relating zinc finger protein sequences and DNA binding site preferences. Proc Natl Acad Sci U S A 1992, 89: 7345–9.
- Desjarlais JR, Berg JM: Redesigning the DNA-binding specificity of a zinc finger protein: a data base-guided approach. Proteins 1992, 12: 101–4. 10.1002/prot.340120202
- Desjarlais JR, Berg JM: Use of a zinc-finger consensus sequence framework and specificity rules to design specific DNA binding proteins. Proc Natl Acad Sci U S A 1993, 90: 2256–60.
- Choo Y, Klug A: Selection of DNA binding sites for zinc fingers using rationally randomized DNA reveals coded interactions. Proc Natl Acad Sci U S A 1994, 91: 11168–72.
- Choo Y, Klug A: Toward a code for the interactions of zinc fingers with DNA: selection of randomized fingers displayed on phage. Proc Natl Acad Sci U S A 1994, 91: 11163–7.
- Desjarlais JR, Berg JM: Length-encoded multiplex binding site determination: application to zinc finger proteins. Proc Natl Acad Sci U S A 1994, 91: 11099–103.
- Rebar EJ, Pabo CO: Zinc finger phage: affinity selection of fingers with new DNA-binding specificities. Science 1994, 263: 671–3.
- Nagaoka M, Sugiura Y: Artificial zinc finger peptides: creation, DNA recognition, and gene regulation. J Inorg Biochem 2000, 82: 57–63. 10.1016/S0162-0134(00)00154-9
- Pabo CO, Peisach E, Grant RA: Design and selection of novel Cys2His2 zinc finger proteins. Annu Rev Biochem 2001, 70: 313–40. 10.1146/annurev.biochem.70.1.313
- Wolfe SA, Greisman HA, Ramm EI, Pabo CO: Analysis of zinc fingers optimized via phage display: evaluating the utility of a recognition code. J Mol Biol 1999, 285: 1917–34. 10.1006/jmbi.1998.2421
- Suzuki M, Gerstein M, Yagi N: Stereochemical basis of DNA recognition by Zn fingers. Nucleic Acids Res 1994, 22: 3397–405.
- Pabo CO, Nekludova L: Geometric analysis and comparison of protein-DNA interfaces: why is there no simple code for recognition? J Mol Biol 2000, 301: 597–624. 10.1006/jmbi.2000.3918
- Benos PV, Lapedes AS, Stormo GD: Is there a code for protein-DNA recognition? Probab(ilistical)ly. Bioessays 2002, 24: 466–75. 10.1002/bies.10073
- Liu Q, Xia Z, Zhong X, Case CC: Validated zinc finger protein designs for all 16 GNN DNA triplet targets. J Biol Chem 2002, 277: 3850–6. 10.1074/jbc.M110669200
- Benos PV, Lapedes AS, Stormo GD: Probabilistic code for DNA recognition by proteins of the EGR family. J Mol Biol 2002, 323: 701–27. 10.1016/S0022-2836(02)00917-8
- Paillard G, Lavery R: Analyzing protein-DNA recognition mechanisms. Structure (Camb) 2004, 12: 113–22. 10.1016/j.str.2003.11.022
- Suzuki M, Brenner SE, Gerstein M, Yagi N: DNA recognition code of transcription factors. Protein Eng 1995, 8: 319–28.
- Mandel-Gutfreund Y, Margalit H: Quantitative parameters for amino acid-base interaction: implications for prediction of protein-DNA binding sites. Nucleic Acids Res 1998, 26: 2306–12. 10.1093/nar/26.10.2306
- Gromiha M, Siebers JG, Selvaraj S, Kono H, Sarai A: Intermolecular and intramolecular readout mechanisms in protein-DNA recognition. J Mol Biol 2004, 337: 285–94. 10.1016/j.jmb.2004.01.033
- Kono H, Sarai A: Structure-based prediction of DNA target sites by regulatory proteins. Proteins 1999, 35: 114–31. 10.1002/(SICI)1097-0134(19990401)35:1<114::AID-PROT11>3.0.CO;2-T
- Yoshida T, Nishimura T, Aida M, Pichierri F, Gromiha MM, Sarai A: Evaluation of free energy landscape for base-amino acid interactions using ab initio force field and extensive sampling. Biopolymers 2002, 61: 84–95. 10.1002/1097-0282(2001)61:1<84::AID-BIP10045>3.0.CO;2-X
- Man TK, Stormo GD: Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res 2001, 29: 2471–8. 10.1093/nar/29.12.2471
- Bulyk ML, Johnson PL, Church GM: Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res 2002, 30: 1255–61. 10.1093/nar/30.5.1255
- Benos PV, Bulyk ML, Stormo GD: Additivity in protein-DNA interactions: how good an approximation is it? Nucleic Acids Res 2002, 30: 4442–51. 10.1093/nar/gkf578
- Man TK, Yang JS, Stormo GD: Quantitative modeling of DNA-protein interactions: effects of amino acid substitutions on binding specificity of the Mnt repressor. Nucleic Acids Res 2004, 32: 4026–32. 10.1093/nar/gkh729
- Lee ML, Bulyk ML, Whitmore GA, Church GM: A statistical model for investigating binding probabilities of DNA nucleotide sequences using microarrays. Biometrics 2002, 58: 981–8. 10.1111/j.0006-341X.2002.00981.x
- Stormo GD, Schneider TD, Gold L: Quantitative analysis of the relationship between nucleotide sequence and functional activity. Nucleic Acids Res 1986, 14: 6661–79.
- Elrod-Erickson M, Pabo CO: Binding studies with mutants of Zif268. Contribution of individual side chains to binding affinity and specificity in the Zif268 zinc finger-DNA complex. J Biol Chem 1999, 274: 19281–5. 10.1074/jbc.274.27.19281
- Miller JC, Pabo CO: Rearrangement of side-chains in a Zif268 mutant highlights the complexities of zinc finger-DNA recognition. J Mol Biol 2001, 313: 309–15. 10.1006/jmbi.2001.4975
- Wolfe SA, Grant RA, Elrod-Erickson M, Pabo CO: Beyond the "recognition code": structures of two Cys2His2 zinc finger/TATA box complexes. Structure (Camb) 2001, 9: 717–23. 10.1016/S0969-2126(01)00632-3
- Raumann BE, Knight KL, Sauer RT: Dramatic changes in DNA-binding specificity caused by single residue substitutions in an Arc/Mnt hybrid repressor. Nat Struct Biol 1995, 2: 1115–22. 10.1038/nsb1295-1115
- Silbaq FS, Ruttenberg SE, Stormo GD: Specificity of Mnt 'master residue' obtained from in vivo and in vitro selections. Nucleic Acids Res 2002, 30: 5539–48. 10.1093/nar/gkf684
- Isalan M, Choo Y: Rapid, high-throughput engineering of sequence-specific zinc finger DNA-binding proteins. Methods Enzymol 2001, 340: 593–609.
- Liu J, Zuber P: The ClpX protein of Bacillus subtilis indirectly influences RNA polymerase holoenzyme composition and directly stimulates sigma-dependent transcription. Mol Microbiol 2000, 37: 885–97. 10.1046/j.1365-2958.2000.02053.x
- Teare JM, Islam R, Flanagan R, Gallagher S, Davies MG, Grabau C: Measurement of nucleic acid concentrations using the DyNA Quant and the GeneQuant. Biotechniques 1997, 22: 1170–4.
- Hamilton TB, Borel F, Romaniuk PJ: Comparison of the DNA binding characteristics of the related zinc finger proteins WT1 and EGR1. Biochemistry 1998, 37: 2051–8. 10.1021/bi9717993
- Stormo GD, Fields DS: Specificity, free energy and information content in protein-DNA interactions. Trends Biochem Sci 1998, 23: 109–13. 10.1016/S0968-0004(98)01187-6
- Schneider TD, Stephens RM: Sequence logos: a new way to display consensus sequences. Nucleic Acids Res 1990, 18: 6097–100.
- Gorodkin J, Heyer LJ, Brunak S, Stormo GD: Displaying the information contents of structural RNA alignments: the structure logos. Comput Appl Biosci 1997, 13: 583–6.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.