An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system
© AlQuraishi et al. 2015
Received: 4 March 2015
Accepted: 11 November 2015
Published: 19 November 2015
Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions.
We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualizing, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis.
This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
Keywords: Protein-DNA, Database, Helix-turn-helix, Transcription factors, Structure, PWM
Protein-DNA interactions are among the most fundamental molecular interactions in the cell, underlying transcriptional regulation, chromosome replication, repair, and segregation, nucleosome positioning, and many other processes. Owing to their central role in biology, protein-DNA interactions have been extensively analyzed and modeled using a variety of computational approaches. These approaches have traditionally been either sequence-based or structure-based. Sequence-based methods model the DNA-binding affinity of a protein using its known DNA binding sites and range in complexity from simple models such as consensus sequences and position-weight matrices (PWMs) to complex models like Variable-Order Bayesian Networks and Feature Motif Models [1–6]. Data from experimental methods such as DNA footprinting [7, 8], SELEX, ChIP-seq [10, 11], and microarrays are used to derive such models. In contrast, structure-based methods predict the DNA binding affinity of a protein from its molecular structure (obtained either computationally or by experimental methods such as X-ray crystallography and NMR) and its predicted orientation vis-a-vis different DNA sequences, by employing an energy function to compute the protein-DNA binding energy [13–17]. The energy functions that have been used in structure-based methods are derived either from theory or from statistics of interatomic contacts in crystallized protein-DNA structures.
Many databases have been developed that address the particular needs of the sequence- and structure-based approaches. On the sequence side, DNA-binding site databases such as TRANSFAC, JASPAR, and others provide access to raw binding site data and simple models of protein-DNA binding affinity like PWMs. Specialty databases that include quantitative binding-affinity data also exist, such as ProNIT and UniPROBE. On the structure side, databases like the Protein Data Bank (PDB) provide general access to protein structures, and specialty databases such as NPIDB and BIPA provide culled resources containing only protein-DNA complexes.
While these databases have proven satisfactory for addressing the needs of computational methods that fall squarely into one category or another, the development of algorithmic techniques that utilize both sequence and structural data necessitates an integrative database that couples protein-DNA structural complexes with their binding affinity. In particular, merely curating structural and binding affinity data is insufficient. For algorithms to exploit the association between structural properties and quantitative binding affinity, a correspondence must be established between every DNA basepair position in a protein-DNA structural complex and the protein’s experimentally-determined binding affinity for different nucleotides at that position. In this way, supervised machine learning algorithms can use structural properties as inputs and binding affinity as output to learn models that can predict protein-DNA interactions. To our knowledge, none of the databases currently combining structural and binding affinity data, including TFinDit and 3d-footprint, provide such a correspondence.
By integrating binding information from dozens of sources, presenting a unified probabilistic formulation to describe the DNA-binding affinity of proteins, mapped directly onto the atomic structures of aligned protein-DNA complexes, and creating a unified coordinate system to analyze and compare these structures, we have constructed a database that will be a valuable and unique resource for researchers.
Construction and content
Curation of protein-DNA structures
PDB search settings used to retrieve all HTH-DNA structures. Indented rows indicate sub-fields
Contains DNA/RNA hybrid
Structural families used as target queries to retrieve HTH-DNA structures. Indented rows indicate sub-fields, and multiple columns under “Setting” indicate a hierarchical choice
- DNA/RNA-binding 3-helical bundle (core: 3-helices; bundle, closed or partly opened, right-handed twist; up-and down)
- lambda repressor-like DNA-binding domains
- Arc Repressor Mutant, subunit A
- 434 Repressor (Amino-terminal Domain)
- Factor For Inversion Stimulation; Chain: A
- Chromosomal Replication Initiator Protein DnaA; Chain: A
- Trp Operon Repressor; Chain A
- Tetracycline Repressor; domain 2
- Putative cytoplasmic protein
- Apoptosis Regulator Bcl-x
Elimination of pathological structures
Complexes with three types of structural pathologies were eliminated: (i) the DNA is single-stranded instead of double-stranded, (ii) the complex is missing backbone atoms, specifically Cα atoms for proteins and C1’, C2’, C3’, C4’, and C5’ atoms for DNA, and (iii) the protein contains non-standard amino acid residues. The elimination of such pathologies streamlines the analysis and ensures that only atomically accurate structures are considered.
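The three exclusion criteria above can be sketched as a simple filter. This is a minimal illustration, not the pipeline used to build the database: the function name `is_pathological`, the input layout (residue names paired with sets of atom names), and the use of plain apostrophes in PDB atom names such as `C1'` are all assumptions made for the example.

```python
# Hypothetical sketch of the three pathology checks described in the text.
STANDARD_AA = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
# Backbone carbons required for every nucleotide (atom naming is an assumption).
DNA_BACKBONE = {"C1'", "C2'", "C3'", "C4'", "C5'"}


def is_pathological(protein_residues, dna_residues, n_dna_strands):
    """Return True if a complex should be excluded.

    protein_residues: list of (residue_name, set_of_atom_names)
    dna_residues:     list of set_of_atom_names, one per nucleotide
    n_dna_strands:    number of DNA strands in the complex
    """
    # (i) single-stranded DNA
    if n_dna_strands < 2:
        return True
    for name, atoms in protein_residues:
        # (iii) non-standard amino acid, or (ii) missing C-alpha atom
        if name not in STANDARD_AA or "CA" not in atoms:
            return True
    for atoms in dna_residues:
        # (ii) missing DNA backbone carbon
        if not DNA_BACKBONE <= atoms:
            return True
    return False


good_protein = [("ALA", {"N", "CA", "C", "O"}), ("GLY", {"N", "CA", "C", "O"})]
good_dna = [set(DNA_BACKBONE) | {"P"}]
```

A complex passes only if all three checks succeed, so `is_pathological(good_protein, good_dna, 2)` is false, while replacing a residue with, say, selenomethionine (`MSE`) would flag it.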
Elimination of false positive structures
Elimination of redundant structures
After the final set of protein-DNA complexes was selected, we used a sequence of processing steps to generate a uniform set of PDB files that can be readily used in computational analysis. First, we processed all dsDNA molecules to conform to a standardized format in which the two strands of DNA are treated as separate chains, the chains are ordered in a 5′ to 3′ orientation, all overhangs are removed, and the basepairs are aligned so that they are physically matched. Since many structures in the PDB do not conform to this standard, we developed scripts to reformat all PDB files in the database accordingly. Second, we extracted protein chains with multiple HTH domains and single HTH domains that span multiple chains, and formatted these protein chains so that each individual HTH domain is spanned by a single chain in an individual PDB file, along with its cognate DNA molecule. Finally, we processed the final set of PDB files with the PDB2PQR [33, 34] utility to carry out the protonation and dewatering steps. PDB2PQR was run with default settings using the AMBER molecular mechanics force field.
Curation of PWMs and structure mapping
We curated experimentally-determined DNA binding sites for each of the protein-DNA structural complexes in the database. The set of binding sites was compiled from several data repositories such as TRANSFAC along with primary sources [9, 12, 16, 18–20, 36–74]. All the DNA binding sites in the database are based on experimental assays. In some instances, the same experiment was reported in two or more of the data repositories we used. When possible (e.g. by checking the original PMID reference from which the experiment is derived), we removed such redundant entries to ensure that each binding site entry in the database corresponds to a unique experiment. Multiple distinct experiments reporting on the same binding site were retained, however. The experimental assay and, when available, quality ratings of binding sites included in the original data repository are cited in the database (e.g. TRANSFAC quality scores). Using these DNA binding sites we generated an experimentally-derived PWM for each of the protein-DNA complexes in the database. The PWMs were derived by setting the probability of every nucleotide at every position to its empirically-observed relative frequency in the database. For positions for which we did not have any data, we used a uniform distribution over the four nucleotides as a non-informative prior. We also used Laplace smoothing to mitigate errors due to small sample size. Since the orientation and length of the binding sites varied between and within data sources, manual and automated alignment methods were used in constructing the PWMs, which were then mapped onto the protein-DNA structures so that for every basepair position in every protein-DNA complex, we maintain a probability distribution over all four possible nucleotides.
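The PWM construction described above (relative frequencies, Laplace smoothing, and a uniform distribution where no data exist) can be sketched as follows. This is an illustrative reading of the text rather than the authors' code: the function name `build_pwm`, the use of `-` to mark positions without data, and a pseudocount of 1 are assumptions, and the binding sites are taken to be already aligned.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def build_pwm(sites, length, pseudocount=1.0):
    """Return one {nucleotide: probability} dict per basepair position.

    sites: pre-aligned binding-site strings; any character outside ACGT
           (e.g. '-') means "no observation at this position".
    """
    pwm = []
    for pos in range(length):
        counts = Counter(
            site[pos] for site in sites
            if pos < len(site) and site[pos] in NUCLEOTIDES
        )
        total = sum(counts.values())
        if total == 0:
            # No data at this position: uniform, non-informative distribution.
            column = {n: 1.0 / len(NUCLEOTIDES) for n in NUCLEOTIDES}
        else:
            # Laplace (add-pseudocount) smoothing of the relative frequencies.
            denom = total + pseudocount * len(NUCLEOTIDES)
            column = {n: (counts[n] + pseudocount) / denom for n in NUCLEOTIDES}
        pwm.append(column)
    return pwm


# Three toy sites; the fourth position has no observations.
example_sites = ["ACG-", "ACT-", "AAG-"]
example_pwm = build_pwm(example_sites, length=4)
```

With three observations of A at the first position and a pseudocount of 1, the smoothed probability of A is (3 + 1) / (3 + 4) = 4/7, and each of the other nucleotides gets 1/7. Every column sums to one, so each basepair position carries a proper distribution over the four nucleotides.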
We structurally superimposed all protein-DNA complexes in the database to establish an alignment between the DNA basepairs of one complex and those of another, and between the amino acid residues of the recognition helices of the proteins. While in general this is not possible for any two arbitrary DNA-binding proteins, proteins within the same structural family typically exhibit a conserved modality for binding. In particular, the HTH family of proteins uses a highly conserved mode of docking into the major groove of DNA [75–77]. This suggested that it would be possible to align all HTH-DNA complexes in the database such that the DNA molecules and recognition helices are superimposed. We developed a novel structural alignment algorithm for this purpose, and used it for a pairwise alignment of all complexes in the database.
We formulated the structural alignment problem as the following optimization problem. Let RMSD_DNA be the root mean square deviation (RMSD) between the backbone carbon atoms of two DNA molecules, and RMSD_HTH be the RMSD between the Cα atoms of two recognition helices. Then we defined the optimal alignment as the one (over all possible alignments) that minimizes RMSD_HTH subject to RMSD_DNA < δ. The parameter δ was set to 2 Å. We solved this problem using the following four-step algorithm.
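The objective itself, separate from the search procedure, can be illustrated with a short sketch. It assumes a set of candidate alignments has already been scored; the names `rmsd` and `pick_alignment` and the tuple layout are hypothetical, and the sketch does not reproduce the four-step algorithm that generates the candidates.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points, paired in order."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords_a, coords_b)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(coords_a))


def pick_alignment(candidates, delta=2.0):
    """Choose the candidate minimizing RMSD_HTH subject to RMSD_DNA < delta.

    candidates: iterable of (label, rmsd_dna, rmsd_hth) tuples.
    Returns the winning tuple, or None if no candidate satisfies the constraint.
    """
    feasible = [c for c in candidates if c[1] < delta]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c[2])


scored = [("a", 1.5, 3.0), ("b", 2.5, 0.5), ("c", 1.9, 1.2)]
best = pick_alignment(scored)
```

Here candidate `b` has the lowest helix RMSD but violates the DNA constraint (2.5 Å is not below δ = 2 Å), so `c` is selected. The DNA term acts purely as a feasibility filter and the helix term as the quantity being minimized.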
Canonical matching regions
Recognition helix-based alignment
The alignment procedure described so far yields a pairwise alignment between pairs of HTH-DNA complexes. We sought a multiple alignment that would yield a unified coordinate system across the database, where a DNA base (or amino acid residue) in one HTH-DNA complex would map to corresponding DNA bases (or amino acid residues) in all other HTH-DNA complexes. To obtain such a multiple alignment and its resulting unified coordinate system, the Affinity Propagation (AP) clustering algorithm was run on the complexes in the database, with the distance between any two complexes defined as the final RMSD value of the alignment obtained from the pairwise structural alignment step. The AP algorithm has the advantage of returning an exemplar for every cluster found. Exemplars are characterized by being the cluster member with the smallest distance to all other members of the cluster. Furthermore, the AP algorithm does not require an explicit specification of the number of clusters to be returned, but instead uses a soft parameter approach that enables biasing toward smaller or larger clusters. By varying this single soft parameter and rerunning the AP algorithm, a clustering configuration was found that yielded a single, large cluster, which included the majority of HTH-DNA complexes, and a set of smaller clusters, mostly comprising one HTH-DNA complex each. Inspection of the singleton clusters revealed that they were either false positives that were not detected during the earlier stages of our pipeline, or protein-DNA complexes in which the DNA molecule was substantially bent. Because the bent-DNA complexes deviated markedly in structure from most HTH-DNA complexes and formed only a small subset (nine proteins), they were excluded from the analysis used in deriving a unified coordinate system. However, they were retained in the database, as a separate set, to facilitate their future analysis. All false positives were removed entirely.
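The notion of an exemplar used above, the cluster member with the smallest distance to all other members, can be made concrete with a small sketch. This is not an implementation of Affinity Propagation, which selects exemplars by message passing; it only computes the exemplar of an already-formed cluster under the assumption that "smallest distance" means smallest summed pairwise RMSD. The function name, complex labels, and distance values are hypothetical.

```python
def cluster_exemplar(members, dist):
    """Return the member whose summed distance to all other members is smallest.

    members: list of complex identifiers
    dist:    nested dict, dist[a][b] = pairwise alignment RMSD between a and b
    """
    return min(
        members,
        key=lambda m: sum(dist[m][other] for other in members if other != m),
    )


# Toy symmetric distance table for three hypothetical complexes.
toy_dist = {
    "cplx1": {"cplx2": 1.0, "cplx3": 2.0},
    "cplx2": {"cplx1": 1.0, "cplx3": 1.2},
    "cplx3": {"cplx1": 2.0, "cplx2": 1.2},
}
toy_exemplar = cluster_exemplar(["cplx1", "cplx2", "cplx3"], toy_dist)
```

The summed distances are 3.0, 2.2, and 3.2 respectively, so `cplx2` is the exemplar. Using a real member as the reference, rather than an averaged structure, means every other complex can be aligned to actual coordinates.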
Using the exemplar of the cluster as a reference point, the pairwise alignments between every HTH-DNA complex and the exemplar complex were used to establish a multiple alignment. A correspondence between any two complexes can be found by first mapping to the exemplar complex, and then mapping to the other complex. For example, if the ith DNA base of complex 1 mapped to the jth base of the exemplar, and the jth base of the exemplar mapped to the kth base of complex 2, then the ith base of complex 1 maps to the kth base of complex 2. Using this scheme, a single unified multiple alignment was determined. In addition, all HTH-DNA complexes other than the exemplar were affine transformed so that their DNA molecules and recognition helices are superimposed on the exemplar complex, to prepare the final database.
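The transitive mapping through the exemplar can be sketched with plain dictionaries. The sketch assumes each pairwise alignment is stored as a one-to-one map from a complex's base index to the exemplar's base index; the function name and the index values are hypothetical.

```python
def map_via_exemplar(complex1_to_exemplar, complex2_to_exemplar):
    """Map base positions of complex 1 onto complex 2 through the exemplar.

    Both arguments are dicts {position_in_complex: position_in_exemplar},
    assumed one-to-one. Positions without a counterpart are omitted.
    """
    # Invert the second alignment: exemplar position -> complex 2 position.
    exemplar_to_complex2 = {j: k for k, j in complex2_to_exemplar.items()}
    return {
        i: exemplar_to_complex2[j]
        for i, j in complex1_to_exemplar.items()
        if j in exemplar_to_complex2
    }


# Complex 1 bases 0..2 align to exemplar bases 3..5;
# complex 2 bases 0..2 align to exemplar bases 4..6.
c1 = {0: 3, 1: 4, 2: 5}
c2 = {0: 4, 1: 5, 2: 6}
c1_to_c2 = map_via_exemplar(c1, c2)
```

Base 1 of complex 1 and base 0 of complex 2 both sit at exemplar position 4, so they correspond; base 0 of complex 1 has no partner in complex 2 and is dropped. Only one alignment per complex needs to be stored, since any pair can be related by composing two such maps.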
Utility and discussion
List of the final set of structures in the database. Some PDB files contain multiple non-redundant HTH domains, which were treated as separate structures
Recognition helix residues
Figure: Distribution of DNA binding site data sources (axis: fraction of DNA binding sites)
Figure: Distribution of source organisms for DNA binding sites (axis: fraction of DNA binding sites)
The database is available in downloadable form for programmatic use, and as a web service for interactive use. Users are able to browse and search for HTH-DNA complexes using all available fields, including gene and protein names, motif types, and source organism. For each entry, graphical and numerical representations of the PWM are readily accessible on the website, in addition to information describing the mapping of the structure to the unified coordinate system.
Protein-specific statistics of the HTH-DNA binding interface
Global statistics on HTH-DNA interactions
Development of sequence-structure algorithms
In addition to interactive use, the major utility of this database is to provide numeric access to the statistics of HTH-DNA interactions using a unified coordinate system that links structural and sequence information. Without this mapping, it is not possible to use supervised machine learning methods that take structural information as input and PWM information as output. We previously used this database in precisely this fashion to derive de novo and statistical protein-DNA potentials that rely on combining structural and sequence data [26, 27]. These algorithms improved protein-DNA prediction performance beyond existing algorithms, and this improvement was shown to be due in part to the integration of structural and sequence information.
The database described in this work will facilitate a number of unique applications. First, the coupling of structural information with binding affinity data enables the statistical analysis of the structural basis of protein-DNA biochemical affinity. Second, the unified coordinate system enables a comparison of the similarities and differences of the steric and physico-chemical environments at the interface of HTH-DNA binding at single-residue resolution. Third, the standardization of all complexes in the database facilitates machine learning and data-driven applications that require structured and standardized data sets. Taken together, these features will enable the exploration of sequence- and structure-based approaches to protein-DNA modeling.
Availability and requirements
The service and database are freely available for academic use at http://ProteinDNA.hms.harvard.edu.
ICP: Iterative closest points
NMR: Nuclear magnetic resonance
PDB: Protein data bank
PWM: Position weight matrix
RMSD: Root mean square deviation
SELEX: Systematic evolution of ligands by exponential enrichment
We thank K. Arya and G. Cooperman for customizing the DMTCP checkpointing software for our purposes, and the three anonymous reviewers for their helpful comments. Wolfram Research provided the Mathematica software environment used for the analysis. This work was supported by the U.S. Department of Energy Office of Science [grant number DE-FG02-05ER64136]; the Stanford Genome Training Program [grant number T32 HG00044] from the National Human Genome Research Institute; the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy [grant number DE-AC02-05CH11231]; and the National Institutes of Health [grant number P50 GM107618-01].
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Ben-Gal I, Shani A, Gohr A, Grau J, Arviv S, Shmilovici A, et al. Identification of transcription factor binding sites with variable-order Bayesian networks. Bioinformatics. 2005;21(11):2657–66.
- Salama RA, Stekel DJ. Inclusion of neighboring base interdependencies substantially improves genome-wide prokaryotic transcription factor binding site prediction. Nucleic Acids Res. 2010;38(12):e135.
- Sharon E, Lubliner S, Segal E. A feature-based approach to modeling protein-DNA interactions. PLoS Comput Biol. 2008;4(8):e1000154.
- Reid JE, Evans KJ, Dyer N, Wernisch L, Ott S. Variable structure motifs for transcription factor binding sites. BMC Genomics. 2010;11:30.
- Stormo GD. Maximally efficient modeling of DNA sequence motifs at all levels of complexity. Genetics. 2011;187(4):1219–24.
- Maienschein-Cline M, Dinner AR, Hlavacek WS, Mu F. Improved predictions of transcription factor binding sites using physicochemical features of DNA. Nucleic Acids Res. 2012;40(22):e175.
- Galas DJ, Schmitz A. DNAse footprinting: a simple method for the detection of protein-DNA binding specificity. Nucleic Acids Res. 1978;5(9):3157–70.
- Gallo SM, Gerrard DT, Miner D, Simich M, Des Soye B, Bergman CM, et al. REDfly v3.0: toward a comprehensive database of transcriptional regulatory elements in Drosophila. Nucleic Acids Res. 2011;39(Database issue):D118–123.
- Jagannathan V, Roulet E, Delorenzi M, Bucher P. HTPSELEX--a database of high-throughput SELEX libraries for transcription factor binding sites. Nucleic Acids Res. 2006;34(Database issue):D90–94.
- Yang JH, Li JH, Jiang S, Zhou H, Qu LH. ChIPBase: a database for decoding the transcriptional regulation of long non-coding RNA and microRNA genes from ChIP-Seq data. Nucleic Acids Res. 2013;41(Database issue):D177–187.
- Johnson DS, Mortazavi A, Myers RM, Wold B. Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007;316(5830):1497–502.
- Newburger DE, Bulyk ML. UniPROBE: an online database of protein binding microarray data on protein-DNA interactions. Nucleic Acids Res. 2009;37:D77–82.
- Angarica VE, Perez AG, Vasconcelos AT, Collado-Vides J, Contreras-Moreira B. Prediction of TF target sites based on atomistic models of protein-DNA complexes. BMC Bioinformatics. 2008;9(1):436.
- Donald JE, Chen WW, Shakhnovich EI. Energetics of protein-DNA interactions. Nucleic Acids Res. 2007;35(4):1039–47.
- Moroni E, Caselle M, Fogolari F. Identification of DNA-binding protein target sequences by physical effective energy functions: free energy analysis of lambda repressor-DNA complexes. BMC Struct Biol. 2007;7:61.
- Morozov AV, Havranek JJ, Baker D, Siggia ED. Protein-DNA binding specificity predictions with structural models. Nucleic Acids Res. 2005;33(18):5781–98.
- Liu LA, Bradley P. Atomistic modeling of protein–DNA interaction specificity: progress and applications. Curr Opin Struct Biol. 2012;22(4):397–405.
- Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, et al. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 2006;34(Database issue):D108–110.
- Portales-Casamar E, Thongjuea S, Kwon AT, Arenillas D, Zhao X, Valen E, et al. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 2010;38(Database issue):D105–110.
- Kumar MDS, Bava KA, Gromiha MM, Prabakaran P, Kitajima K, Uedaira H, et al. ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions. Nucleic Acids Res. 2006;34:D204–6.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The protein data bank. Nucleic Acids Res. 2000;28(1):235–42.
- Kirsanov DD, Zanegina ON, Aksianov EA, Spirin SA, Karyagina AS, Alexeevski AV. NPIDB: nucleic acid--protein interaction database. Nucleic Acids Res. 2013;41(D1):D517–23.
- Lee S, Blundell TL. BIPA: a database for protein-nucleic acid interaction in 3D structures. Bioinformatics. 2009;25(12):1559–60.
- Turner D, Kim R, Guo JT. TFinDit: transcription factor-DNA interaction data depository. BMC Bioinformatics. 2012;13:220.
- Contreras-Moreira B. 3D-footprint: a database for the structural analysis of protein-DNA complexes. Nucleic Acids Res. 2010;38(Database issue):D91–97.
- AlQuraishi M, McAdams HH. Direct inference of protein–DNA interactions using compressed sensing methods. Proc Natl Acad Sci. 2011;108(36):14819–24.
- AlQuraishi M, McAdams HH. Three enhancements to the inference of statistical protein-DNA potentials. Proteins: Struct Funct Bioinf. 2013;81(3):426–42.
- Aravind L, Anantharaman V, Balaji S, Babu MM, Iyer LM. The many faces of the helix-turn-helix domain: Transcription regulation and beyond. FEMS Microbiol Rev. 2005;29(2):231–62.
- Vaquerizas JM, Kummerfeld SK, Teichmann SA, Luscombe NM. A census of human transcription factors: function, expression and evolution. Nat Rev Genet. 2009;10(4):252–63.
- Lu XJ, Olson WK. 3DNA: a versatile, integrated software system for the analysis, rebuilding and visualization of three-dimensional nucleic-acid structures. Nat Protoc. 2008;3(7):1213–27.
- Gajiwala KS, Burley SK. Winged helix proteins. Curr Opin Struct Biol. 2000;10(1):110–6.
- Mo Y, Vaessen B, Johnston K, Marmorstein R. Structure of the elk-1-DNA complex reveals how DNA-distal residues affect ETS domain recognition of DNA. Nat Struct Biol. 2000;7(4):292–7.
- Dolinsky TJ, Nielsen JE, McCammon JA, Baker NA. PDB2PQR: an automated pipeline for the setup of Poisson–Boltzmann electrostatics calculations. Nucleic Acids Res. 2004;32 suppl 2:W665–7.
- Dolinsky TJ, Czodrowski P, Li H, Nielsen JE, Jensen JH, Klebe G, et al. PDB2PQR: expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Res. 2007;35 suppl 2:W522–5.
- Ponder JW, Case DA. Force fields for protein simulations. Adv Protein Chem. 2003;66:27–85.
- Kazakov AE, Cipriano MJ, Novichkov PS, Minovitsky S, Vinogradov DV, Arkin A, et al. RegTransBase - a database of regulatory sequences and interactions in a wide range of prokaryotic genomes. Nucleic Acids Res. 2007;35:D407–12.
- Halfon MS, Gallo SM, Bergman CM. REDfly 2.0: an integrated database of cis-regulatory modules and transcription factor binding sites in Drosophila. Nucleic Acids Res. 2008;36(Database issue):D594–598.
- Munch R, Hiller K, Barg H, Heldt D, Linz S, Wingender E, et al. PRODORIC: prokaryotic database of gene regulation. Nucleic Acids Res. 2003;31(1):266–9.
- Gama-Castro S, Jimenez-Jacinto V, Peralta-Gil M, Santos-Zavaleta A, Penaloza-Spinola MI, Contreras-Moreira B, et al. RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation. Nucleic Acids Res. 2008;36(Database issue):D120–124.
- Sierro N, Makita Y, de Hoon M, Nakai K. DBTBS: a database of transcriptional regulation in Bacillus subtilis containing upstream intergenic conservation information. Nucleic Acids Res. 2008;36(Database issue):D93–96.
- Down TA, Bergman CM, Su J, Hubbard TJ. Large-scale discovery of promoter motifs in Drosophila melanogaster. PLoS Comput Biol. 2007;3(1):e7.
- Palaniswamy SK, James S, Sun H, Lamb RS, Davuluri RV, Grotewold E. AGRIS and AtRegNet. A platform to link cis-regulatory elements and transcription factors into regulatory networks. Plant Physiol. 2006;140(3):818–29.
- Bulow L, Engelmann S, Schindler M, Hehl R. AthaMap, integrating transcriptional and post-transcriptional data. Nucleic Acids Res. 2009;37(Database issue):D983–986.
- Yellaboina S, Ranjan S, Chakhaiyar P, Hasnain SE, Ranjan A. Prediction of DtxR regulon: Identification of binding sites and operons controlled by Diphtheria toxin repressor in Corynebacterium diphtheriae. BMC Microbiol. 2004;4:38.
- Franks AH, Griffiths AA, Wake RG. Identification and Characterization of New DNA-Replication Terminators in Bacillus subtilis. Mol Microbiol. 1995;17(1):13–23.
- Griffiths AA, Wake RG. Search for additional replication terminators in the Bacillus subtilis 168 chromosome. J Bacteriol. 1997;179(10):3358–61.
- Griffiths AA, Andersen PA, Wake RG. Replication terminator protein-based replication fork-arrest systems in various Bacillus species. J Bacteriol. 1998;180(13):3360–7.
- Sugisaki H, Kanazawa S. New restriction endonucleases from Flavobacterium okeanokoites (FokI) and Micrococcus luteus (MluI). Gene. 1981;16(1–3):73–8.
- Falvey E, Grindley NDF. Contacts between gamma-delta-resolvase and the gamma-delta-Res site. EMBO J. 1987;6(3):815–21.
- Moskowitz IP, Heichman KA, Johnson RC. Alignment of recombination sites in Hin-mediated site-specific DNA recombination. Genes Dev. 1991;5(9):1635–45.
- Rosandic M, Paar V, Basar I, Gluncic M, Pavin N, Pilas I. CENP-B box and pJ alpha sequence distribution in human alpha satellite higher-order repeats (HOR). Chromosome Res. 2006;14(7):735–53.
- Tronche F, Yaniv M. HNF1, a homeoprotein member of the hepatic transcription regulatory network. Bioessays. 1992;14(9):579–87.
- Liston DR, Johnson PJ. Analysis of a ubiquitous promoter element in a primitive eukaryote: early evolution of the initiator element. Mol Cell Biol. 1999;19(3):2380–8.
- Shen WF, Montgomery JC, Rozenfeld S, Moskow JJ, Lawrence HJ, Buchberg AM, et al. AbdB-like Hox proteins stabilize DNA binding by the Meis1 homeodomain proteins. Mol Cell Biol. 1997;17(11):6448–58.
- Kostelidou K, Thomas CM. The hierarchy of KorB binding at its 12 binding sites on the broad-host-range plasmid RK2 and modulation of this binding by IncC1 protein. J Mol Biol. 2000;295(3):411–22.
- Garcia-Castellanos R, Mallorqui-Fernandez G, Marrero A, Potempa J, Coll M, Gomis-Ruth FX. On the transcriptional regulation of methicillin resistance - MecI repressor in complex with its operator. J Biol Chem. 2004;279(17):17888–96.
- Colloms SD, van Luenen HG, Plasterk RH. DNA binding activities of the Caenorhabditis elegans Tc3 transposase. Nucleic Acids Res. 1994;22(25):5548–54.
- Prakash P, Yellaboina S, Ranjan A, Hasnain SE. Computational prediction and experimental verification of novel IdeR binding sites in the upstream sequences of Mycobacterium tuberculosis open reading frames. Bioinformatics. 2005;21(10):2161–6.
- Wilson DS, Guenther B, Desplan C, Kuriyan J. High resolution crystal structure of a paired (Pax) class cooperative homeodomain dimer on DNA. Cell. 1995;82(5):709–19.
- Hughes KT, Gaines PCW, Karlinsey JE, Vinayak R, Simon MI. Sequence-Specific Interaction of the Salmonella Hin Recombinase in Both Major and Minor Grooves of DNA. EMBO J. 1992;11(7):2695–705.
- Hoey T, Levine M. Divergent homeo box proteins recognize similar DNA sequences in Drosophila. Nature. 1988;332(6167):858–61.
- White CE, Winans SC. The quorum-sensing transcription factor TraR decodes its DNA binding site by direct contacts with DNA bases and by detection of DNA flexibility. Mol Microbiol. 2007;64(1):245–56.
- Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, et al. Transcriptional regulatory code of a eukaryotic genome. Nature. 2004;431(7004):99–104.
- Chen SF, Gunasekera A, Zhang XP, Kunkel TA, Ebright RH, Berman HM. Indirect readout of DNA sequence at the primary-kink site in the CAP-DNA complex: Alteration of DNA binding specificity through alteration of DNA kinking. J Mol Biol. 2001;314(1):75–82.
- Koudelka GB, Lam CY. Differential recognition of OR1 and OR3 by bacteriophage 434 repressor and Cro. J Biol Chem. 1993;268(32):23812–7.
- Koudelka GB, Harrison SC, Ptashne M. Effect of non-contacted bases on the affinity of 434 operator for 434 repressor and Cro. Nature. 1987;326(6116):886–8.
- Schumacher MA, Lau AOT, Johnson PJ. Structural basis of core promoter recognition in a primitive eukaryote. Cell. 2003;115(4):413–24.
- Smale ST, Jain A, Kaufmann J, Emami KH, Lo K, Garraway IP. The initiator element: a paradigm for core promoter heterogeneity within metazoan protein-coding genes. Cold Spring Harb Symp Quant Biol. 1998;63:21–31.
- Lo K, Smale ST. Generality of a functional initiator consensus sequence. Gene. 1996;182(1–2):13–22.
- Javahery R, Khachi A, Lo K, Zenziegregory B, Smale ST. DNA-Sequence Requirements for Transcriptional Initiator Activity in Mammalian-Cells. Mol Cell Biol. 1994;14(1):116–27.
- Huerta AM, Francino MP, Morett E, Collado-Vides J. Selection for unequal densities of sigma(70) promoter-like signals in different regions of large bacterial genomes. PLoS Genet. 2006;2(11):1740–50.
- Fischer SEJ, van Luenen HGAM, Plasterk RHA. Cis requirements for transposition of Tc1-like transposons in C. elegans. Mol Gen Genet. 1999;262(2):268–74.
- Rodgers DW, Harrison SC. The complex between phage 434 repressor DNA-binding domain and operator site OR3: structural differences between consensus and non-consensus half-sites. Structure. 1993;1(4):227–40.
- van Luenen HGAM, Plasterk RHA. Target site choice of the related transposable elements Tc1 and Tc3 of Caenorhabditis-elegans. Nucleic Acids Res. 1994;22(3):262–9.
- Wintjens R, Rooman M. Structural classification of HTH DNA-binding domains and protein-DNA interaction modes. J Mol Biol. 1996;262(2):294–313.
- Suzuki M, Gerstein M. Binding geometry of alpha-helices that recognize DNA. Proteins Struct Funct Genet. 1995;23(4):525–35.
- Pabo CO, Nekludova L. Geometric analysis and comparison of protein-DNA interfaces: Why is there no simple code for recognition? J Mol Biol. 2000;301(3):597–624.
- Besl PJ, McKay ND. A method for registration of 3-D shapes. IEEE T Pattern Anal. 1992;14(2):239–56.
- Frey BJ, Dueck D. Clustering by passing messages between data points. Science. 2007;315(5814):972–6.