TFinDit: transcription factor-DNA interaction data depository
© Turner et al.; licensee BioMed Central Ltd. 2012
Received: 18 April 2012
Accepted: 23 August 2012
Published: 3 September 2012
One of the crucial steps in the regulation of gene expression is the binding of transcription factors to specific DNA sequences. Structural knowledge of the binding affinity and specificity between transcription factors and their target sites has important implications for our understanding of the mechanisms of gene regulation. Because transcription factors have unique functions and binding specificity, there is a need for a transcription factor-specific, structure-based database and corresponding web service to facilitate structural bioinformatics studies of transcription factor-DNA interactions, such as the development of knowledge-based interaction potentials, transcription factor-DNA docking, binding-induced conformational changes, and the thermodynamics of protein-DNA interactions.
TFinDit is a relational database and a web search tool for studying transcription factor-DNA interactions. The database contains annotated transcription factor-DNA complex structures and related data, such as unbound protein structures, thermodynamic data, and binding sequences for the corresponding transcription factors in the complex structures. TFinDit also provides a user-friendly interface and allows users to either query individual entries or generate datasets through culling the database based on one or more search criteria.
TFinDit is a specialized structural database with annotated transcription factor-DNA complex structures and other preprocessed data. We believe that this database/web service can facilitate the development and testing of TF-DNA interaction potentials and TF-DNA docking algorithms, and the study of protein-DNA recognition mechanisms.
Keywords: Transcription factor; Database; Binding site prediction; Interaction potential
Transcription factors (TFs) represent a distinct group of DNA-binding proteins. They bind DNA in a sequence-specific manner while tolerating a certain degree of variation at particular positions [1]. Although regulation of gene expression is a complicated biological process, one key step is the binding of TFs to their DNA binding sites. At the genome level, identification of the DNA target sites of transcription factors has been considered one of the grand challenges in post-genomic bioinformatics. The complex structures in the Protein Data Bank (PDB) provide fine details about macromolecular interactions [2]. Knowledge of TF-DNA interactions can help us better understand the mechanisms of protein-DNA recognition and, more importantly, guide the design of new therapeutics for diseases in which transcription factors play critical roles [3–5]. Even though the number of TF-DNA complex structures in PDB has increased steadily owing to technical advances in solving complex structures, these structures still cover only a small percentage of all annotated transcription factors and their target DNA sites. At the same time, computational studies have made notable progress in modeling protein-DNA interactions. These include the development of knowledge-based protein-DNA interaction potentials [6–8], investigation of binding affinity and specificity [9, 10], and protein-DNA docking studies [11–13]. Recently, structure-based TF binding site prediction has received much-deserved attention owing to its ability to consider the interdependence between positions in TF binding sites and the contribution of flanking sequences to binding specificity. The development of more accurate interaction potentials makes these structure-based methods feasible and more appealing for the computational prediction of TF binding sites [8, 11, 14].
The paramount importance of transcription factors in gene regulation has attracted significant interest and effort in developing TF resources, either for one specific genome, such as RegulonDB for E. coli K-12 [15] and EDGEdb for C. elegans [16], or for one specific kingdom, such as JASPAR for eukaryotes [17] and RegTransBase for bacteria [18]. The TF resources currently available across the tree of life are listed in a recent survey [19]. Most of these resources contain either manually annotated or computationally predicted TFs, while others combine both annotation approaches. Although these TF resources contain large amounts of data that are valuable for studying the diversity and evolution of transcription factors, they are not designed for structural bioinformatics studies of TF-DNA interactions.
On the other hand, several databases/web servers covering general protein-nucleic acid interactions have been developed. These include AANT [20], ProNIT [21], NPIDB [22], PDA [23], BIPA [24], hPDI [25], 3D-footprint [26], PDIdb [27], ccPDB [28] and others. While each database/web server offers search options on certain aspects of general protein-nucleic acid interactions, the unique characteristics of transcription factors and the pressing goal of structure-based TF binding site prediction call for a TF-specific database/web server, especially because transcription factors are not well classified and annotated in PDB. In addition, previous studies have revealed different interaction “modes” between transcription factors and other types of DNA-binding proteins [29, 30]. To the best of our knowledge, no TF-specific structural database/web service is currently available.
We developed TFinDit (for Transcription Factor-DNA interaction Data depository) to facilitate structural bioinformatics studies of TF-DNA interactions. TFinDit offers annotated TF-DNA complex structures and other useful information, such as unbound TF structures, thermodynamic data for TF-DNA complexes, and automatic mapping between TF-DNA complexes and known TF binding sites. TFinDit also provides a web interface with multiple search options. Users can generate datasets tailored to their research needs in studying TF-DNA interactions, such as bound-unbound TF pairs, DNA binding sites, and thermodynamic data for wild-type and/or mutant TFs and DNA, or they can focus on the structural details of one specific TF-DNA complex. The framework of TFinDit can easily be extended to include further useful information as it is identified.
Construction and content
Computationally, TFinDit has two major components: a relational database built on MySQL 5.0.45, and a web server that provides an interface through which users search the database and view the search results. The web server is developed with a combination of PHP 5.1.6, Java JDK v1.6.0, Python 2.4.3, and Apache Web Server 2.3.3.
The database contains all TF-DNA complexes from PDB [2]. Collecting TF-DNA complexes from PDB is not trivial because the classification of some DNA-binding proteins in PDB is ambiguous. For example, two transcription factors, Escherichia coli SigmaE region 4 (PDB ID 2H27) [31] and the ribbon-helix-helix domain of Escherichia coli PutA (PDB ID 2RBF) [32], are classified in PDB as “transferase” and “oxidoreductase”, respectively. We therefore first developed an in-house program that automatically identifies transcription factors in PDB by combining information from Gene Ontology (GO) terms [33], PDB keywords, and UniProt keywords [34]. The annotation procedure is shown in Additional file 1: Figure S1. The script and related files are available for download from the TFinDit site (Resources tab).
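The idea of combining several annotation sources can be illustrated with a minimal Python sketch. It is not the authors' in-house program, whose actual rules are given in Additional file 1: Figure S1. The class `ChainAnnotation`, the functions `tf_evidence` and `is_transcription_factor`, the keyword stem, and the rule that a hit in any single source is sufficient are all assumptions made for illustration. The only external identifier used is GO:0003700, the Gene Ontology term for DNA-binding transcription factor activity. The two example records are hypothetical and do not describe real PDB entries.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Illustrative marker terms only (assumption); the real rules are in Figure S1.
TF_GO_TERMS = {"GO:0003700"}  # DNA-binding transcription factor activity
TF_KEYWORD_STEMS = ("transcription",)  # assumed keyword stem, matched case-insensitively


@dataclass
class ChainAnnotation:
    """Annotations gathered for one protein chain (hypothetical record layout)."""
    pdb_id: str
    pdb_classification: str
    go_terms: Set[str] = field(default_factory=set)
    pdb_keywords: Set[str] = field(default_factory=set)
    uniprot_keywords: Set[str] = field(default_factory=set)


def _has_tf_keyword(keywords: Set[str]) -> bool:
    """True if any keyword contains one of the assumed TF stems."""
    return any(stem in kw.lower() for kw in keywords for stem in TF_KEYWORD_STEMS)


def tf_evidence(chain: ChainAnnotation) -> List[str]:
    """List the annotation sources that flag this chain as a transcription factor."""
    evidence = []
    if chain.go_terms & TF_GO_TERMS:
        evidence.append("GO")
    if _has_tf_keyword(chain.pdb_keywords):
        evidence.append("PDB keywords")
    if _has_tf_keyword(chain.uniprot_keywords):
        evidence.append("UniProt keywords")
    return evidence


def is_transcription_factor(chain: ChainAnnotation) -> bool:
    """Assumed rule: a hit in any one source is enough."""
    return bool(tf_evidence(chain))


# Hypothetical chain whose PDB classification alone would hide its TF role.
mislabeled = ChainAnnotation(
    pdb_id="XXXX",
    pdb_classification="TRANSFERASE",
    uniprot_keywords={"Sigma factor", "Transcription regulation"},
)
# Hypothetical DNA-binding enzyme with no TF evidence in any source.
enzyme = ChainAnnotation(
    pdb_id="YYYY",
    pdb_classification="HYDROLASE",
    pdb_keywords={"HYDROLASE/DNA"},
)

print(mislabeled.pdb_id, tf_evidence(mislabeled))
print(enzyme.pdb_id, tf_evidence(enzyme))
```

Returning the list of supporting sources, rather than a bare yes/no, makes it easy to review borderline chains by hand, which matters when the PDB classification itself is misleading.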
Another important preprocessing step is the mapping of TF structures to entries in other databases. These include databases of TF binding sites (RegulonDB and JASPAR) [15, 17] and ProNIT, a thermodynamic database for protein-nucleic acid interactions [21]. Among the 1391 bound TF chains in the current release, 307 (about 22%) have ProNIT entries and 433 (about 31%) have annotated binding sequences from RegulonDB/JASPAR. After preprocessing, all the data are stored in a relational database. The same procedure will be used for future updates, in which newly identified entries and related data will be added to the database (Figure 1). We plan to update the database every two to three months.
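To make the relational layout and dataset culling more concrete, the following Python sketch stores TF chains alongside their mapped thermodynamic entries and binding sequences, then selects chains by the kind of data available. It is only an illustration: the TFinDit schema is not described here, so the table names, column names, and the `cull` function are hypothetical, the rows are placeholder toy data, and SQLite (via the standard `sqlite3` module) stands in for MySQL.

```python
import sqlite3

# Hypothetical schema: one row per bound TF chain, plus mapped external data.
SCHEMA = """
CREATE TABLE tf_chain (
    chain_id INTEGER PRIMARY KEY,
    pdb_id   TEXT NOT NULL,
    chain    TEXT NOT NULL
);
CREATE TABLE pronit_entry (
    entry_id INTEGER PRIMARY KEY,
    chain_id INTEGER NOT NULL REFERENCES tf_chain(chain_id),
    delta_g  REAL
);
CREATE TABLE binding_site (
    site_id  INTEGER PRIMARY KEY,
    chain_id INTEGER NOT NULL REFERENCES tf_chain(chain_id),
    source   TEXT,
    sequence TEXT
);
"""


def build_demo_db():
    """Create an in-memory database filled with placeholder toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO tf_chain VALUES (?, ?, ?)",
        [(1, "AAAA", "A"), (2, "BBBB", "A"), (3, "CCCC", "B")],
    )
    conn.executemany(
        "INSERT INTO pronit_entry VALUES (?, ?, ?)",
        [(1, 1, -10.5), (2, 2, -8.0)],
    )
    conn.executemany(
        "INSERT INTO binding_site VALUES (?, ?, ?, ?)",
        [(1, 1, "RegulonDB", "TTGACA"), (2, 3, "JASPAR", "CACGTG")],
    )
    return conn


def cull(conn, need_thermo=False, need_sites=False):
    """Return (pdb_id, chain) pairs that satisfy the requested data criteria."""
    sql = "SELECT DISTINCT c.pdb_id, c.chain FROM tf_chain c"
    if need_thermo:
        # Inner join keeps only chains with at least one thermodynamic entry.
        sql += " JOIN pronit_entry p ON p.chain_id = c.chain_id"
    if need_sites:
        # Inner join keeps only chains with at least one annotated binding site.
        sql += " JOIN binding_site b ON b.chain_id = c.chain_id"
    sql += " ORDER BY c.pdb_id, c.chain"
    return conn.execute(sql).fetchall()


conn = build_demo_db()
print(cull(conn))                                     # every chain
print(cull(conn, need_thermo=True))                   # chains with thermodynamic data
print(cull(conn, need_thermo=True, need_sites=True))  # chains with both kinds of data
```

Keeping the external mappings in separate tables keyed by chain means that a chain with no ProNIT entry or binding sequence simply has no rows there, and each search criterion becomes one additional join.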
Utility and discussion
TFinDit is a specialized structural database with annotated transcription factor-DNA complex structures and other related data. We believe that this database/web service can facilitate structural bioinformatics studies, especially in the development of TF-DNA interaction potentials, the testing of TF-DNA docking algorithms, and the study of protein-DNA recognition mechanisms.
Availability and requirements
The service is available at http://bioinfozen.uncc.edu/tfindit
Abbreviations
PDB: Protein Data Bank
RMSD: Root Mean Square Deviation
We would like to thank Ms. Akshita Dutta and Ms. Rosario I. Corona for their help with the project. This work was supported by the National Science Foundation #DBI0844749 to JTG.
References
- Pan Y, Tsai CJ, Ma B, Nussinov R: Mechanisms of transcription factor selectivity. Trends Genet 2010, 26: 75–83. doi:10.1016/j.tig.2009.12.003
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28: 235–242. doi:10.1093/nar/28.1.235
- Kuntz ID: Structure-based strategies for drug design and discovery. Science 1992, 257: 1078–1082. doi:10.1126/science.257.5073.1078
- Darnell JE Jr: Transcription factors as targets for cancer therapy. Nat Rev Cancer 2002, 2: 740–749. doi:10.1038/nrc906
- Sankpal UT, Goodison S, Abdelrahim M, Basha R: Targeting Sp1 transcription factors in prostate cancer therapy. Med Chem 2011, 7: 518–525. doi:10.2174/157340611796799203
- Liu Z, Mao F, Guo JT, Yan B, Wang P, Qu Y, Xu Y: Quantitative evaluation of protein-DNA interactions using an optimized knowledge-based potential. Nucleic Acids Res 2005, 33: 546–558. doi:10.1093/nar/gki204
- Robertson TA, Varani G: An all-atom, distance-dependent scoring function for the prediction of protein-DNA interactions from structure. Proteins 2007, 66: 359–374.
- Xu B, Yang Y, Liang H, Zhou Y: An all-atom knowledge-based energy function for protein-DNA threading, docking decoy discrimination, and prediction of transcription-factor binding profiles. Proteins 2009, 76: 718–730. doi:10.1002/prot.22384
- Ashworth J, Baker D: Assessment of the optimization of affinity and specificity at protein-DNA interfaces. Nucleic Acids Res 2009, 37: e73. doi:10.1093/nar/gkp242
- Luscombe NM, Thornton JM: Protein-DNA interactions: amino acid conservation and the effects of mutations on binding specificity. J Mol Biol 2002, 320: 991–1009. doi:10.1016/S0022-2836(02)00571-5
- Liu Z, Guo JT, Li T, Xu Y: Structure-based prediction of transcription factor binding sites using a protein-DNA docking approach. Proteins 2008, 72: 1114–1124. doi:10.1002/prot.22002
- van Dijk M, Bonvin AM: Pushing the limits of what is achievable in protein-DNA docking: benchmarking HADDOCK's performance. Nucleic Acids Res 2010, 38: 5634–5647. doi:10.1093/nar/gkq222
- van Dijk M, van Dijk AD, Hsu V, Boelens R, Bonvin AM: Information-driven protein-DNA docking using HADDOCK: it is a matter of flexibility. Nucleic Acids Res 2006, 34: 3317–3325. doi:10.1093/nar/gkl412
- Angarica VE, Perez AG, Vasconcelos AT, Collado-Vides J, Contreras-Moreira B: Prediction of TF target sites based on atomistic models of protein-DNA complexes. BMC Bioinforma 2008, 9: 436. doi:10.1186/1471-2105-9-436
- Gama-Castro S, Salgado H, Peralta-Gil M, Santos-Zavaleta A, Muniz-Rascado L, Solano-Lira H, Jimenez-Jacinto V, Weiss V, Garcia-Sotelo JS, Lopez-Fuentes A, et al.: RegulonDB version 7.0: transcriptional regulation of Escherichia coli K-12 integrated within genetic sensory response units (Gensor Units). Nucleic Acids Res 2011, 39: D98-D105. doi:10.1093/nar/gkq1110
- Barrasa MI, Vaglio P, Cavasino F, Jacotot L, Walhout AJ: EDGEdb: a transcription factor-DNA interaction database for the analysis of C. elegans differential gene expression. BMC Genomics 2007, 8: 21. doi:10.1186/1471-2164-8-21
- Portales-Casamar E, Thongjuea S, Kwon AT, Arenillas D, Zhao X, Valen E, Yusuf D, Lenhard B, Wasserman WW, Sandelin A: JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res 2010, 38: D105-D110. doi:10.1093/nar/gkp950
- Kazakov AE, Cipriano MJ, Novichkov PS, Minovitsky S, Vinogradov DV, Arkin A, Mironov AA, Gelfand MS, Dubchak I: RegTransBase–a database of regulatory sequences and interactions in a wide range of prokaryotic genomes. Nucleic Acids Res 2007, 35: D407-D412. doi:10.1093/nar/gkl865
- Charoensawan V, Wilson D, Teichmann SA: Genomic repertoires of DNA-binding transcription factors across the tree of life. Nucleic Acids Res 2010, 38: 7364–7377. doi:10.1093/nar/gkq617
- Hoffman MM, Khrapov MA, Cox JC, Yao J, Tong L, Ellington AD: AANT: the Amino Acid-Nucleotide Interaction Database. Nucleic Acids Res 2004, 32: D174-D181. doi:10.1093/nar/gkh128
- Kumar MD, Bava KA, Gromiha MM, Prabakaran P, Kitajima K, Uedaira H, Sarai A: ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions. Nucleic Acids Res 2006, 34: D204-D206. doi:10.1093/nar/gkj103
- Spirin S, Titov M, Karyagina A, Alexeevski A: NPIDB: a database of nucleic acids-protein interactions. Bioinformatics 2007, 23: 3247–3248. doi:10.1093/bioinformatics/btm519
- Kim R, Guo JT: PDA: an automatic and comprehensive analysis program for protein-DNA complex structures. BMC Genomics 2009, 10(Suppl 1): S13. doi:10.1186/1471-2164-10-S1-S13
- Lee S, Blundell TL: BIPA: a database for protein-nucleic acid interaction in 3D structures. Bioinformatics 2009, 25: 1559–1560. doi:10.1093/bioinformatics/btp243
- Xie Z, Hu S, Blackshaw S, Zhu H, Qian J: hPDI: a database of experimental human protein-DNA interactions. Bioinformatics 2010, 26: 287–289. doi:10.1093/bioinformatics/btp631
- Contreras-Moreira B: 3D-footprint: a database for the structural analysis of protein-DNA complexes. Nucleic Acids Res 2010, 38: D91-D97. doi:10.1093/nar/gkp781
- Norambuena T, Melo F: The Protein-DNA Interface database. BMC Bioinforma 2010, 11: 262. doi:10.1186/1471-2105-11-262
- Singh H, Chauhan JS, Gromiha MM, Raghava GP: ccPDB: compilation and creation of data sets from Protein Data Bank. Nucleic Acids Res 2012, 40: D486-D489. doi:10.1093/nar/gkr1150
- Contreras-Moreira B, Sancho J, Angarica VE: Comparison of DNA binding across protein superfamilies. Proteins 2010, 78: 52–62. doi:10.1002/prot.22525
- Kim R, Corona RI, Hong B, Guo JT: Benchmarks for flexible and rigid transcription factor-DNA docking. BMC Struct Biol 2011, 11: 45. doi:10.1186/1472-6807-11-45
- Lane WJ, Darst SA: The structural basis for promoter −35 element recognition by the group IV sigma factors. PLoS Biol 2006, 4: e269. doi:10.1371/journal.pbio.0040269
- Zhou Y, Larson JD, Bottoms CA, Arturo EC, Henzl MT, Jenkins JL, Nix JC, Becker DF, Tanner JJ: Structural basis of the transcriptional regulation of the proline utilization regulon by multifunctional PutA. J Mol Biol 2008, 381: 174–188. doi:10.1016/j.jmb.2008.05.084
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al.: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25: 25–29. doi:10.1038/75556
- Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, et al.: The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res 2006, 34: D187-D191. doi:10.1093/nar/gkj161
- Zhang Y, Skolnick J: Scoring function for automated assessment of protein structure template quality. Proteins 2004, 57: 702–710. doi:10.1002/prot.20264
- Zhang Y, Skolnick J: TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res 2005, 33: 2302–2309. doi:10.1093/nar/gki524
- Xu J, Zhang Y: How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics 2010, 26: 889–895. doi:10.1093/bioinformatics/btq066
- Liu J, Perumal NB, Oldfield CJ, Su EW, Uversky VN, Dunker AK: Intrinsic disorder in transcription factors. Biochemistry 2006, 45: 6873–6888. doi:10.1021/bi0602718
- Minezaki Y, Homma K, Kinjo AR, Nishikawa K: Human transcription factors contain a high fraction of intrinsically disordered regions essential for transcriptional regulation. J Mol Biol 2006, 359: 1137–1149. doi:10.1016/j.jmb.2006.04.016
- Dunker AK, Uversky VN: Drugs for 'protein clouds': targeting intrinsically disordered transcription factors. Curr Opin Pharmacol 2010, 10: 782–788. doi:10.1016/j.coph.2010.09.005
- Wang G, Dunbrack RL Jr: PISCES: a protein sequence culling server. Bioinformatics 2003, 19: 1589–1591. doi:10.1093/bioinformatics/btg224
- Fraenkel E, Rould MA, Chambers KA, Pabo CO: Engrailed homeodomain-DNA complex at 2.2 A resolution: a detailed view of the interface and comparison with other engrailed structures. J Mol Biol 1998, 284: 351–361. doi:10.1006/jmbi.1998.2147
- Berman HM, Westbrook J, Feng Z, Iype L, Schneider B, Zardecki C: The Nucleic Acid Database. Acta Crystallogr D: Biol Crystallogr 2002, 58: 889–898. doi:10.1107/S0907444902003487
- Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM: CATH–a hierarchic classification of protein domain structures. Structure 1997, 5: 1093–1108. doi:10.1016/S0969-2126(97)00260-8
- Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247: 536–540.
- Contreras-Moreira B, Branger PA, Collado-Vides J: TFmodeller: comparative modelling of protein-DNA complexes. Bioinformatics 2007, 23: 1694–1696. doi:10.1093/bioinformatics/btm148
- Zhao H, Yang Y, Zhou Y: Structure-based prediction of DNA-binding proteins by structural alignment and a volume-fraction corrected DFIRE-based energy function. Bioinformatics 2010, 26: 1857–1863. doi:10.1093/bioinformatics/btq295
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.