DBBP: database of binding pairs in protein-nucleic acid interactions
© Park et al.; licensee BioMed Central Ltd. 2014
Published: 3 December 2014
Interaction of proteins with other molecules plays an important role in many biological activities. As many structures of protein-DNA complexes and protein-RNA complexes have been determined in the past years, several databases have been constructed to provide structure data of the complexes. However, the information on the binding sites between proteins and nucleic acids is not readily available from the structure data since the data consists mostly of the three-dimensional coordinates of the atoms in the complexes.
We analyzed the large body of structure data for hydrogen-bonding interactions between proteins and nucleic acids and developed a database called DBBP (DataBase of Binding Pairs in protein-nucleic acid interactions, http://bclab.inha.ac.kr/dbbp). DBBP contains 44,955 hydrogen bonds (H-bonds) of protein-DNA interactions and 77,947 H-bonds of protein-RNA interactions.
Analysis of the huge amount of structure data of protein-nucleic acid complexes is labor-intensive, yet provides useful information for studying protein-nucleic acid interactions. DBBP provides the detailed information of hydrogen-bonding interactions between proteins and nucleic acids at various levels from the atomic level to the residue level. The binding information can be used as a valuable resource for developing a computational method aiming at predicting new binding sites in proteins or nucleic acids.
Protein-nucleic acid interactions play an important role in many biological activities. Site-specific DNA-binding proteins or transcription factors (TFs) play important roles in gene regulation by forming protein complexes. These protein-DNA complexes may bind alone or in combination near the genes whose expression they control. For example, DNA-binding proteins may regulate the expression of a target gene, so protein-DNA interactions are important for DNA replication, transcription and gene regulation in general.
Protein-RNA interactions also play important roles in many aspects of gene expression. For instance, ribonucleoprotein particles (RNPs) bind to RNA in the post-transcriptional regulation of gene expression, and tRNAs bind to aminoacyl-tRNA synthetases to properly translate the genetic code into amino acids. Through their interactions with RNA, RNA-binding proteins are essential for RNA splicing, metabolism, stability, localization, transport, translation, and degradation. Therefore, identifying the amino acids involved in DNA/RNA binding and the (ribo)nucleotides involved in amino acid binding is important for understanding the mechanisms of gene regulation.
As the number of resolved structures of protein-DNA/RNA complexes has increased substantially over the past few years, a large amount of structure data is now available in several databases [7–10]. However, data on the binding sites between proteins and nucleic acids is not readily available from the structure data, which consists mostly of the three-dimensional coordinates of the atoms in the complexes. A recent database called the Protein-RNA Interface Database (PRIDB) provides information on protein-RNA interfaces by showing interacting amino acids and ribonucleotides in the primary sequences. However, it does not provide the binding sites on the interacting partners of the amino acids and ribonucleotides in protein-RNA interfaces.
In this study we performed a broad analysis of the structures of protein-DNA/RNA complexes and built a database called DBBP (DataBase of Binding Pairs in protein-nucleic acid interactions). The database shows hydrogen-bonding interactions between proteins and nucleic acids at the atomic level, which is not readily available in any other database, including the Protein Data Bank (PDB). The binding pairs of hydrogen bonds provided by the database will help researchers determine DNA (or RNA) binding sites in proteins and protein binding sites in DNA or RNA molecules. The database can also be used as a resource for developing computational methods that predict new binding sites in proteins or nucleic acids. The rest of the paper presents the structure and interface of the database.
Materials and methods
The protein-DNA/RNA complexes determined by X-ray crystallography were selected from PDB. As of February 2013, there were 2,568 protein-DNA complexes and 1,355 protein-RNA complexes in PDB. After extracting complexes with a resolution of 3.0 Å or better, 2,138 protein-DNA complexes (called the DS1 data set) and 651 protein-RNA complexes (the DS2 data set) remained.
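The selection step described above can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `Complex` record, its field names, and the example entries are hypothetical, and the method label "X-RAY DIFFRACTION" is assumed to be how the experimental method is recorded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Complex:
    pdb_id: str                  # hypothetical placeholder identifier
    method: str                  # experimental method label (assumed wording)
    resolution: Optional[float]  # in angstroms; None if not reported


def select_complexes(entries: List[Complex], max_resolution: float = 3.0) -> List[Complex]:
    """Keep X-ray structures with a resolution of max_resolution or better."""
    return [
        e for e in entries
        if e.method == "X-RAY DIFFRACTION"
        and e.resolution is not None
        and e.resolution <= max_resolution  # smaller value = better resolution
    ]


# Made-up entries for illustration only.
entries = [
    Complex("AAAA", "X-RAY DIFFRACTION", 2.1),
    Complex("BBBB", "X-RAY DIFFRACTION", 3.4),
    Complex("CCCC", "SOLUTION NMR", None),
    Complex("DDDD", "X-RAY DIFFRACTION", 3.0),
]
selected = select_complexes(entries)
print([e.pdb_id for e in selected])
```

Here "3.0 Å or better" is read as a resolution value less than or equal to 3.0, so the entry at exactly 3.0 Å is kept.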
Binding sites in protein-nucleic acid interactions
Different studies [9, 12–14] have defined slightly different criteria for a binding site in protein-nucleic acid interactions. For example, in RNABindR [15, 16] and BindN, an amino acid with an atom within a distance of 5 Å from any other atom of a ribonucleotide was considered to be an RNA-binding amino acid.
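A distance-based criterion of this kind can be sketched as follows. This is an illustrative reading of the 5 Å rule, not code from RNABindR or BindN; the function name and the coordinate lists are assumptions, and atoms are represented simply as (x, y, z) tuples in angstroms.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def is_binding_residue(residue_atoms: Iterable[Point],
                       nucleotide_atoms: Iterable[Point],
                       cutoff: float = 5.0) -> bool:
    """True if any residue atom lies within `cutoff` angstroms of any nucleotide atom."""
    nucleotide_atoms = list(nucleotide_atoms)
    return any(
        math.dist(a, b) <= cutoff
        for a in residue_atoms
        for b in nucleotide_atoms
    )


# Made-up coordinates: the closest pair is 4.5 angstroms apart.
residue = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
near_nucleotide = [(6.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
far_nucleotide = [(20.0, 0.0, 0.0)]

print(is_binding_residue(residue, near_nucleotide))
print(is_binding_residue(residue, far_nucleotide))
```

A pure distance cutoff like this accepts any close contact, which is why an H-bond criterion is stricter.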
As the criterion for a binding site between proteins and nucleic acids, we use a hydrogen bond (H-bond), which is stricter than the distance criteria. The locations of hydrogen atoms (H) were inferred from the surrounding atoms since hydrogen atoms are invisible in purely X-ray-derived structures. H-bonds between proteins and nucleic acids were identified by finding all proximal atom pairs between H-bond donors (D) and acceptors (A) that satisfy the following geometric criteria: (1) the hydrogen-acceptor (H-A) distance < 2.5 Å, (2) the donor-hydrogen-acceptor (D-H-A) angle > 90°, (3) the donor-acceptor (D-A) distance < 3.9 Å, (4) the H-A-AA angle > 90°, where AA is an acceptor antecedent. These are the most commonly used criteria for H-bonds. In particular, the criteria of H-A distance < 2.5 Å and D-H-A angle > 90° are essential for H-bonds. Complexes containing no H-bond between the protein and the nucleic acid were eliminated from the DS1 and DS2 data sets. As a result, we gathered 2,068 protein-DNA complexes (DS3) and 637 protein-RNA complexes (DS4).
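The four geometric criteria can be expressed as a single test on the coordinates of the donor (D), hydrogen (H), acceptor (A) and acceptor antecedent (AA). The sketch below is an illustration under stated assumptions, not the program used to build DBBP: function names and example coordinates are hypothetical, and hydrogen positions are assumed to have been placed already.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]


def angle_deg(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at vertex b, formed by the points a-b-c."""
    v1 = tuple(x - y for x, y in zip(a, b))
    v2 = tuple(x - y for x, y in zip(c, b))
    dot = sum(x * y for x, y in zip(v1, v2))
    cos = dot / (math.dist(a, b) * math.dist(c, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error


def is_hbond(donor: Point, hydrogen: Point, acceptor: Point, antecedent: Point) -> bool:
    """Apply the four criteria listed in the text."""
    return (
        math.dist(hydrogen, acceptor) < 2.5                   # (1) H-A distance
        and angle_deg(donor, hydrogen, acceptor) > 90.0       # (2) D-H-A angle
        and math.dist(donor, acceptor) < 3.9                  # (3) D-A distance
        and angle_deg(hydrogen, acceptor, antecedent) > 90.0  # (4) H-A-AA angle
    )


# Made-up coordinates (angstroms): a nearly linear D-H...A arrangement.
D, H = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
A, AA = (2.9, 0.0, 0.0), (3.5, 1.1, 0.0)

print(is_hbond(D, H, A, AA))                 # all four criteria hold
print(is_hbond(D, H, (4.0, 0.0, 0.0), AA))   # H-A distance is 3.0, too long
```

Because the four conditions are combined with `and`, a candidate pair is rejected as soon as any one criterion fails.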
The probability of binding amino acid
Results and discussion
Hydrogen bonds in protein-nucleic acid interactions
Atoms of amino acids involved in H-bonding interactions with nucleic acids.
Atoms of nucleotides involved in H-bonding interactions with amino acids.
If an atom of DNA acts as a hydrogen acceptor, an atom of protein should be a hydrogen donor. Hence, the number of DNA acceptors (41,298) is the same as the number of protein donors (41,298), and the number of DNA donors (3,657) is the same as the number of protein acceptors (3,657). Likewise, the number of RNA acceptors (59,796) is the same as the number of protein donors (59,796) and the number of RNA donors (18,151) is the same as the number of protein acceptors (18,151).
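The donor and acceptor counts above can be checked against the total numbers of H-bonds reported for DBBP, since every H-bond has exactly one donor and one acceptor. The short check below uses only the figures quoted in the text; the dictionary layout and key names are illustrative.

```python
# Counts quoted in the text; key names are illustrative.
counts = {
    "protein-DNA": {"protein_donors": 41298, "protein_acceptors": 3657, "total_hbonds": 44955},
    "protein-RNA": {"protein_donors": 59796, "protein_acceptors": 18151, "total_hbonds": 77947},
}

for name, c in counts.items():
    # Each H-bond has the protein as either donor or acceptor, never both.
    assert c["protein_donors"] + c["protein_acceptors"] == c["total_hbonds"]
    donor_share = c["protein_donors"] / c["total_hbonds"]
    print(f"{name}: protein acts as donor in {donor_share:.1%} of H-bonds")
```

In both data sets the protein side is the hydrogen donor in the majority of H-bonds, and this bias is more pronounced for DNA than for RNA.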
DBBP shows binding pairs at various levels, from the atomic level to the residue level. When it shows detailed information on H-bonds, it shows the donor and acceptor of each H-bond. The same type of atom can act as a hydrogen donor or acceptor depending on the context. We generated XML files for the binding sites of protein-DNA/RNA complexes. Users of the database can access the XML file for a complex via its PDB ID.
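The relationship between the atomic level and the residue level can be illustrated by collapsing atom-level H-bond records into counts of amino acid and nucleotide pairs. The record layout and the example H-bonds below are hypothetical and are not taken from DBBP or its XML files.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical atom-level record:
# (amino acid, protein atom, nucleotide, nucleic acid atom, role of the protein atom)
HBond = Tuple[str, str, str, str, str]

hbonds = [
    ("ARG", "NH1", "G", "O6", "donor"),
    ("ARG", "NH2", "G", "N7", "donor"),
    ("LYS", "NZ", "A", "OP1", "donor"),
    ("ASP", "OD1", "C", "N4", "acceptor"),
]


def residue_level(records: Iterable[HBond]) -> Counter:
    """Count H-bonds per (amino acid, nucleotide) pair, dropping atom names."""
    return Counter((aa, nt) for aa, _atom, nt, _na_atom, _role in records)


def protein_role_counts(records: Iterable[HBond]) -> Counter:
    """Count how often the protein atom is the donor or the acceptor."""
    return Counter(role for *_rest, role in records)


print(residue_level(hbonds))
print(protein_role_counts(hbonds))
```

Keeping the atom-level records as the primary data and deriving residue-level summaries from them means both views stay consistent with each other.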
From an extensive analysis of the structure data of protein-DNA/RNA complexes extracted from PDB, we have identified hydrogen bonds (H-bonds). Analysis of the large amount of structure data for H-bonds is labor-intensive, yet provides useful information for studying protein-nucleic acid interactions. The protein-DNA complexes contain 44,955 H-bonds, which have 3,657 hydrogen acceptors (HA) and 41,298 hydrogen donors (HD) in amino acids, and 41,298 HA and 3,657 HD in nucleotides. The protein-RNA complexes contain 77,947 H-bonds, which have 18,151 HA and 59,796 HD in amino acids, and 59,796 HA and 18,151 HD in nucleotides. Using the data of H-bonding interactions, we developed a database called DBBP (DataBase of Binding Pairs in protein-nucleic acid interactions). DBBP provides detailed information on H-bonding interactions between proteins and nucleic acids at various levels. Such information is not readily available in any other database, including PDB, but will help researchers determine DNA (or RNA) binding sites in proteins and protein binding sites in DNA or RNA molecules. It can also be used as a resource for developing computational methods that predict new binding sites in proteins or nucleic acids. The database is available at http://bclab.inha.ac.kr/dbbp.
This work was funded by the Ministry of Science, ICT and Future Planning (2012R1A1A3011982) and the Ministry of Education (2010-0020163) of Republic of Korea. The cost of the article was funded by the Ministry of Science, ICT and Future Planning (2012R1A1A3011982).
This article has been published as part of BMC Bioinformatics Volume 15 Supplement 15, 2014: Proceedings of the 2013 International Conference on Intelligent Computing (ICIC 2013). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/15/S15.
- Simicevic J, Deplancke B: DNA-centered approaches to characterize regulatory protein-DNA interaction complexes. Molecular Biosystems. 6 (3): 462-468.
- Berger MF, Bulyk ML: Universal protein-binding microarrays for the comprehensive characterization of the DNA-binding specificities of transcription factors. Nature Protocols. 2009, 4 (3): 393-411. 10.1038/nprot.2008.195.
- Licatalosi DD, Mele A, Fak JJ, Ule J, Kayikci M, Chi SW, Clark TA, Schweitzer AC, Blume JE, Wang XN, Darnell JC, Darnell RB: HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature. 456 (7221): 464-U22.
- Varani G, Nagai K: RNA recognition by RNP proteins during RNA processing. Annual Review of Biophysics and Biomolecular Structure. 1998, 27: 407-445. 10.1146/annurev.biophys.27.1.407.
- Moras D: Aminoacyl-tRNA synthetases. Current Opinion in Structural Biology. 1992, 2: 138-142. 10.1016/0959-440X(92)90189-E.
- van Kouwenhove M, Kedde M, Agami R: MicroRNA regulation by RNA-binding proteins and its implications for cancer. Nature Reviews Cancer. 2011, 11 (9): 644-656. 10.1038/nrc3107.
- Contreras-Moreira B: 3D-footprint: a database for the structural analysis of protein-DNA complexes. Nucleic Acids Research. 2010, 38 (suppl 1): D91-D97.
- Hoffman MM, Khrapov MA, Cox JC, Yao J, Tong L, Ellington AD: AANT: the Amino Acid-Nucleotide Interaction Database. Nucleic Acids Research. 2004, 32 (suppl 1): D174-D181.
- Lewis BA, Walia RR, Terribilini M, Ferguson J, Zheng C, Honavar V, Dobbs D: PRIDB: a protein-RNA interface database. Nucleic Acids Research. 2011, 39: D277-D282. 10.1093/nar/gkq1108.
- Xie Z, Hu S, Blackshaw S, Zhu H, Qian J: hPDI: a database of experimental human protein-DNA interactions. Bioinformatics. 2010, 26 (2): 287-289. 10.1093/bioinformatics/btp631.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Research. 2000, 28: 235-242. 10.1093/nar/28.1.235.
- Allers J, Shamoo Y: Structure-based analysis of Protein-RNA interactions using the program ENTANGLE. Journal of Molecular Biology. 2001, 311: 75-86. 10.1006/jmbi.2001.4857.
- Norambuena T, Melo F: The Protein-DNA Interface database. BMC Bioinformatics. 2010, 11:
- Kirsanov DD, Zanegina ON, Aksianov EA, Spirin SA, Karyagina AS, Alexeevski AV: NPIDB: nucleic acid-protein interaction database. Nucleic Acids Research. 2013, 41 (D1): D517-D523. 10.1093/nar/gks1199.
- Terribilini M, Lee JH, Yan CH, Jernigan RL, Honavar V, Dobbs D: Prediction of RNA binding sites in proteins from amino acid sequence. RNA-a Publication of the RNA Society. 2006, 12 (8): 1450-1462. 10.1261/rna.2197306.
- Terribilini M, Sander JD, Lee JH, Zaback P, Jernigan RL, Honavar V, Dobbs D: RNABindR: a server for analyzing and predicting RNA-binding sites in proteins. Nucleic Acids Research. 35: W578-W584.
- Wang LJ, Brown SJ: BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences. Nucleic Acids Research. 2006, 34: W243-W248. 10.1093/nar/gkl298.
- Torshin IY, Weber IT, Harrison RW: Geometric criteria of hydrogen bonds in proteins and identification of 'bifurcated' hydrogen bonds. Protein Engineering. 2002, 15 (5): 359-363. 10.1093/protein/15.5.359.
- Elkayam E, Kuhn CD, Tocilj A, Haase AD, Greene EM, Hannon GJ, Joshua-Tor L: The Structure of Human Argonaute-2 in Complex with miR-20a. Cell. 2012, 150: 100-110. 10.1016/j.cell.2012.05.017.
- McDonald IK, Thornton JM: Satisfying Hydrogen-Bonding Potential in Proteins. Journal of Molecular Biology. 1994, 238 (5): 777-793. 10.1006/jmbi.1994.1334.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.