Identification of single-stranded and double-stranded DNA binding proteins based on protein structure

Abstract

Background

Protein-DNA interactions are essential for many biological processes, but the structural mechanisms underlying them are not fully understood. DNA binding proteins can be classified into double-stranded DNA binding proteins (DSBs) and single-stranded DNA binding proteins (SSBs), which take part in different biological functions. DSBs usually act as transcription factors that regulate gene expression, while SSBs usually play roles in DNA replication, recombination and repair. Understanding the binding specificity of a DNA binding protein therefore helps in studying its function.

Results

In this paper, we investigated the differences between DSBs and SSBs in terms of surface tunnels and OB-fold domain information. We detected the largest clefts on the protein surfaces and derived several features from them for distinguishing the potential interfaces of SSBs and DSBs. We also compared each protein structure with each of six OB-fold protein templates and used the maximal alignment score (TM-score) as the OB-fold feature of the protein. Based on these features, we constructed a support vector machine (SVM) classification model that automatically distinguishes the two kinds of proteins, with prediction accuracies of 87%, 83% and 83% on the HOLO set, APO set and mixed set, respectively.

Conclusions

We found that SSBs and DSBs have different ranges of tunnel lengths and tunnel curvatures; moreover, the alignment results with OB-fold templates were also found to be a discriminative feature. Experimental results with 10-fold cross validation indicate that the new feature set is effective for describing DNA binding proteins. The evaluation results on both bound (DNA-bound) and non-bound (DNA-free) proteins show satisfactory performance of our method.

Background

DNA binding proteins recognize and bind to DNA, and they play vital roles in many biological processes such as DNA replication, recombination, repair, transcription, translation and maintenance of telomeres [1-4]. There are two kinds of DNA, single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA). Accordingly, DNA binding proteins are usually divided into single-stranded DNA-binding proteins (SSBs) and double-stranded DNA-binding proteins (DSBs). SSBs bind ssDNA with high affinity and low specificity, and are mainly involved in DNA replication, recombination and repair. DSBs, in contrast, bind to particular dsDNA sequences to modulate transcription, to cleave DNA molecules, or to take part in chromosome packaging and transcription in the cell nucleus. Although SSBs and DSBs have each been studied separately [5-7], little attention has been paid to what gives SSBs and DSBs such different binding specificities.

With the development of biotechnology, a large number of proteins have been sequenced. However, SSBs have been shown to have little sequence conservation [8]. Even though DSBs involved in similar functions may have conserved subsequences, DSBs with different functions seem to share few common subsequences. Therefore, it is hard to distinguish SSBs from DSBs by sequence alone. Since the structure of a molecule determines its biological function, structural information is expected to provide insight into the binding mechanisms of SSBs and DSBs. Thanks to the progress of the structural genomics project [9], more and more high-resolution 3D structures of DSBs and SSBs are now available, which makes it possible to investigate the structural differences between SSBs and DSBs that are responsible for their binding specificity. The results of such an investigation can also help to annotate, or refine the annotation of, proteins with known structures but unknown or poorly understood functions. In fact, as of Jan. 25, 2013, the Protein Data Bank (PDB) [10] contained 3390 structures of DNA binding proteins (see Additional file 1). Only about 30% and 5% of them are annotated as DSBs and SSBs, respectively, and it is not clear whether the remaining ones are DSBs or SSBs. Therefore, a computational method is required to annotate a DNA binding protein as a DSB or an SSB automatically. To address this question, this work characterizes the structural differences between DSBs and SSBs and then constructs a model that can automatically refine the annotations of DNA binding proteins.

The surface of a protein is generally irregular, containing many clefts and grooves of varying shapes and sizes [11]. Previous studies have shown that a large cleft provides an increased opportunity for the protein to interact with other molecules, particularly small ligands [12, 13]. Some studies have therefore used particularly large and deep clefts to characterize the binding sites of proteins [11, 13, 14]. We hypothesize that, for DNA binding proteins, the properties of surface clefts may also play an important role in dsDNA/ssDNA binding specificity.

Previous results have shown that although the sequences of different SSBs are very different, their structures contain well-conserved elements: most SSBs contain one or more OB (oligonucleotide/oligosaccharide binding)-fold domains [6, 15-18]. A typical OB-fold has a five-stranded beta-sheet coiled to form a closed beta-barrel, which is capped by an alpha-helix located between the third and fourth strands. The OB-fold plays a critical role in binding ssDNA. Although the OB-fold is not necessarily unique to SSBs, we expect it to be an important descriptor for distinguishing SSBs from DSBs.

In this paper, we investigate the structural differences between the collected SSBs and DSBs and extract structure-based features related to surface clefts and OB-folds. Based on these features, we construct a computational model that automatically classifies a DNA binding protein as a DSB or an SSB using the widely used support vector machine (SVM). The promising performance suggests that our method will be useful for protein function annotation and refinement.

Methods

Data sets

We first extracted the structures of all 3390 DNA binding proteins from the PDB (Jan. 25, 2013 release) according to their annotations, comprising 1039 DSBs (HOLO 890, APO 149), 158 SSBs (HOLO 70, APO 88) and 2193 unknowns. We then used PISCES (http://dunbrack.fccc.edu/PISCES.php) [19] to obtain a non-redundant set in which every structure was solved either by NMR or by X-ray crystallography with a resolution better than 3 Å, the pairwise sequence identity is less than 30%, and the chain length is greater than 40 amino acid residues. As a result, we obtained 204 DSBs (HOLO 154 and APO 50), 75 SSBs (HOLO 37 and APO 38) and 727 unknowns (Additional file 2). For simplicity, we call the set of protein-DNA bound structures the HOLO set and the set of unbound structures the APO set, and the proteins in these sets are denoted DSB_holo, SSB_holo, DSB_apo and SSB_apo hereinafter.
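The structure-level part of these criteria can be illustrated with a minimal Python sketch. The `Chain` record and its field names are hypothetical and introduced only for this example; the sequence-identity culling (below 30%) is performed by PISCES and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chain:
    """Hypothetical record describing one protein chain."""
    pdb_id: str
    method: str                  # "NMR" or "X-RAY"
    resolution: Optional[float]  # in angstroms; None for NMR structures
    length: int                  # number of amino acid residues


def passes_quality_filter(chain, max_resolution=3.0, min_length=40):
    """Apply the resolution and chain-length criteria stated in the text."""
    if chain.length <= min_length:
        return False  # chain must be longer than 40 residues
    if chain.method == "NMR":
        return True   # NMR structures are accepted without a resolution cut
    # X-ray structures need a resolution better (smaller) than 3 angstroms.
    return chain.resolution is not None and chain.resolution < max_resolution
```

For instance, a 120-residue X-ray chain at 2.5 Å would be kept, while the same chain at 3.5 Å would be discarded.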

Features on clefts

The protein surface has a complex and irregular shape with concave, convex and flat regions, through which the protein interacts with its external environment. Clefts, pockets and cavities are generally considered to be the active sites on protein surfaces, so studying them is meaningful for understanding protein function.

Since it has been reported that a large cleft provides an increased opportunity for the protein to interact with other molecules [12, 13], and particularly large and deep clefts have been used to characterize the binding sites of proteins [11], we consider that the large clefts on the surface of DNA binding proteins may also play an important role in dsDNA/ssDNA binding. In other words, we expect the large clefts on an SSB to be narrow enough to prevent it from binding dsDNA.

Several tools have been developed to detect clefts in protein structures, such as HOLE [20], MOLE [21, 22], MolAxis [23] and CAVER [24, 25]. In this work, we applied the CAVER 3.0 package to detect the clefts and compute the indexes of the largest clefts (also called tunnels in this work) on the protein surfaces, in order to investigate whether they can be used to distinguish the potential interfaces of SSBs and DSBs. Concretely, we obtained three indexes for each detected tunnel: length, curvature and bottleneck radius.

Length: indicating the length of the path from the start point to the end point along the tunnel axis.

Curvature: indicating the curvature of the tunnel, calculated as Curvature = Length/Distance, where Distance is the length of the straight line from the start point to the end point of the tunnel. The greater the curvature, the more curved the tunnel.

Bottleneck radius: indicating the radius of the narrowest part of the tunnel, also representing the radius of the largest possible ball that can be centered at a given point of the tunnel axis without colliding with the input structure.
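The three indexes can be illustrated with a short Python sketch. It assumes, purely for illustration, that a tunnel is available as an ordered list of axis points with the radius of the largest fitting sphere at each point; this representation and the function name are hypothetical and do not describe CAVER's actual output format.

```python
import math


def tunnel_features(axis_points, radii):
    """Return (length, curvature, bottleneck_radius) for one tunnel.

    axis_points: ordered (x, y, z) points along the tunnel axis.
    radii: radius of the largest fitting sphere at each axis point.
    """
    # Length: sum of the segment lengths along the tunnel axis.
    length = sum(math.dist(p, q) for p, q in zip(axis_points, axis_points[1:]))
    # Distance: straight line from the start point to the end point.
    distance = math.dist(axis_points[0], axis_points[-1])
    if distance == 0:
        raise ValueError("start and end points coincide")
    curvature = length / distance
    # Bottleneck radius: the narrowest part of the tunnel.
    bottleneck_radius = min(radii)
    return length, curvature, bottleneck_radius
```

With this definition a perfectly straight tunnel has curvature 1, and the value grows as the path bends.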

The protein surface contains many tunnels of varying shapes and sizes, and the CAVER package returns as many tunnels as possible. For the reason mentioned above, we only consider the largest one, defined as the tunnel that maximizes (Length*Bottleneck Radius). For example, for protein 1A73, CAVER detects 27 tunnels, shown in Figure 1, and their indexes are listed in Table 1. According to this criterion, tunnel number 25 (Figure 2) is considered the largest tunnel.
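The selection criterion can be sketched in a few lines of Python. The dictionary keys are hypothetical, and the numbers are made up for illustration; they are not the entries of Table 1.

```python
def largest_tunnel(tunnels):
    """Return the tunnel with the largest Length * Bottleneck Radius product."""
    if not tunnels:
        raise ValueError("no tunnels detected")
    return max(tunnels, key=lambda t: t["length"] * t["bottleneck_radius"])


# Made-up values for illustration only (not taken from Table 1).
example_tunnels = [
    {"id": 1, "length": 12.0, "bottleneck_radius": 1.1},
    {"id": 2, "length": 30.0, "bottleneck_radius": 0.9},
    {"id": 3, "length": 18.0, "bottleneck_radius": 1.4},
]
best = largest_tunnel(example_tunnels)
```

Note that the product favours tunnels that are both long and wide, so neither the longest nor the widest tunnel is necessarily selected.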

Figure 1

All detected tunnels of protein 1A73. The CAVER package detects 27 tunnels in protein 1A73; the graph shows the 3D structure of all tunnels in different colours on the protein surface.

Table 1 Index values for all tunnels of 1A73

Figure 2

The largest tunnel (tunnel 25) of protein 1A73. The tunnel shown in red is the largest tunnel in terms of maximizing (Length*Bottleneck Radius).

Feature on OB-fold domain

The OB-fold is a small structural motif that was first characterized in 1992 in four proteins that bind either oligonucleotides or oligosaccharides [26]. Typically, the OB-fold comprises a five-stranded β-sheet coiled to form a closed β-barrel and capped by an α-helix located between the third and fourth β-strands [27-30]. Although OB-folds have since been observed at protein-protein interfaces as well, the nucleic acid-binding superfamily is the largest among the OB-folds, and proteins containing OB-folds are involved almost whenever single-stranded DNA or RNA is present or requires manipulation [8]. Since OB-folds are conserved and play important roles in SSB-ssDNA binding, we extract a feature indicating whether a protein contains an OB-fold, in the hope that this feature can distinguish SSBs from DSBs.

Because OB-folds have evolved into several variants even though they are highly conserved, we chose chain A of six typical proteins (PDB: 1QUQ [31], 1V1Q [32], 4GS3 [33], 3ULL [34], 1O7I [35], 1JMC [36]), shown in Figure 3, as OB-fold templates. As Figure 3 shows, these proteins contain nothing except OB-fold domains. Moreover, each chain of the first five proteins contains exactly one OB-fold domain. Since 1JMC_A contains two OB-fold domains, we use only one of them as the template.

Figure 3

Six templates of the OB-fold domain. They show structural similarity but different topologies, and their pairwise sequence similarity is below 30%.

For an unknown protein, we use the protein structure alignment package TM-align [37] to compare its structure with each of the templates, and we use the maximal alignment score (TM-score) as the OB-fold feature of the protein.

Classification model and evaluation

In this work, we used a support vector machine (SVM) to build the classification model. The SVM classifiers were implemented using the SVM package of Matlab 2012a with the Gaussian radial basis function (RBF) as the kernel.
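For readers unfamiliar with the kernel, the Gaussian RBF is commonly written as k(x, y) = exp(-||x - y||^2 / (2σ^2)). The Python sketch below evaluates this common form; the width parameter `sigma` and its default value are assumptions, since the kernel width used in this work is not reported, and the code is not the Matlab implementation.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel between two feature vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))
```

The kernel equals 1 for identical feature vectors and decays towards 0 as the vectors move apart, so proteins with similar tunnel and TM-score features are treated as similar.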

To evaluate the prediction performance, we used several measures: accuracy, sensitivity, specificity, F-measure, Matthews correlation coefficient (MCC) and the area under the receiver operating characteristic curve (AUC). Let TP (true positive) be the number of proteins correctly predicted as SSBs, FP (false positive) the number of proteins incorrectly predicted as SSBs, TN (true negative) the number of proteins correctly predicted as DSBs, and FN (false negative) the number of proteins incorrectly predicted as DSBs. The accuracy (ACC), sensitivity (SN), specificity (SP), F-measure (F1) and MCC are defined as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP} \tag{1}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{2}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{3}$$

$$F\text{-measure} = \frac{2 \times TP}{2 \times TP + FP + FN} \tag{4}$$

$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP) \times (TP + FN) \times (TN + FP) \times (TN + FN)}} \tag{5}$$
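Equations (1) to (5) translate directly into code. The following Python sketch computes them from the four confusion-matrix counts, using the standard square root in the MCC denominator; the function name and the convention of returning 0 for an undefined MCC are choices made for this example.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute ACC, SN, SP, F1 and MCC from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fn + tn + fp)
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention for this sketch: report 0 when the MCC is undefined.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "F1": f1, "MCC": mcc}
```

Here SSBs are the positive class, so SN measures how many SSBs are recovered and SP how many DSBs are recovered.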

We use 10-fold cross validation to evaluate the classification performance. Because the classes are imbalanced, in each fold we repeat the following 15 times: we randomly down-sample the training data so that it contains equal numbers of SSBs and DSBs, and train a classifier on it. The class label of each test protein is then assigned by a voting strategy over these iterations. To the best of our knowledge, there is no existing computational method for distinguishing SSBs from DSBs, so we also train a random classifier as the baseline in each test.
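The evaluation scheme can be sketched in Python as follows. This is one plausible reading of the procedure, not the original Matlab code: `fit` is a hypothetical callable that trains any classifier on a list of (features, label) pairs and returns a prediction function, and the fixed random seeds are assumptions made for reproducibility.

```python
import random
from collections import Counter


def ten_fold_indices(n_items, k=10, rng=None):
    """Shuffle item indices and split them into k disjoint folds."""
    if rng is None:
        rng = random.Random(0)
    indices = list(range(n_items))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def balanced_vote_predict(train, test_x, fit, n_rounds=15, rng=None):
    """Predict one label by voting over classifiers trained on balanced samples.

    train: list of (features, label) pairs from the training folds.
    fit: callable taking such a list and returning a predict(x) function.
    """
    if rng is None:
        rng = random.Random(0)
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append((x, y))
    # Down-sample every class to the size of the smallest class.
    n_per_class = min(len(group) for group in by_label.values())
    votes = Counter()
    for _ in range(n_rounds):
        balanced = [s for group in by_label.values()
                    for s in rng.sample(group, n_per_class)]
        predict = fit(balanced)
        votes[predict(test_x)] += 1
    # With two classes and an odd number of rounds, ties cannot occur.
    return votes.most_common(1)[0][0]
```

Down-sampling in every round lets all DSBs contribute over the 15 rounds, while each individual classifier still sees a balanced training set.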

Results and discussion

Investigation of the distinguishing ability of the features

Using CAVER 3.0, we detected 990 tunnels in the HOLO set (865 for DSBs, 125 for SSBs) and 1168 tunnels in the APO set (757 for DSBs, 411 for SSBs). According to the maximizing criterion described above, we selected one maximal tunnel for each protein. As a result, we obtained 37 tunnels for bound (DNA-bound) SSBs, 38 tunnels for unbound (DNA-free) SSBs, 154 tunnels for bound DSBs and 51 tunnels for unbound DSBs, each with three feature values. Using TM-align, we aligned every protein with each of the six OB-fold templates shown in Figure 3 and took the maximal alignment score as the TM-score of the protein. To investigate the distinguishing ability of the features, we statistically analysed the distribution of each feature (Figure 4). The bottleneck radius shows little difference between DSBs and SSBs in either bound or unbound form. DNA binding proteins in bound form tend to have a larger bottleneck radius than those in unbound form, which may be because the protein usually needs to widen the tunnel to bind DNA. SSBs tend to have smaller tunnel lengths and curvatures than DSBs, and tunnel length appears to separate DSBs from SSBs better than tunnel curvature does. Moreover, with either feature it seems easier to differentiate DSBs and SSBs in bound form than in unbound form. As expected, SSBs obtain much higher TM-scores than DSBs when compared with the OB-fold templates, indicating that most SSBs have OB-fold-like domains. In conclusion, TM-score, tunnel length and tunnel curvature are usable features for constructing a model that distinguishes SSBs from DSBs, while bottleneck radius lacks distinguishing ability.

Since the distributions of tunnel length and tunnel curvature are very similar, we further investigated the correlation between these two features. The results, listed in Table 2, show that they are positively correlated with each other.

Figure 4

Feature distributions of different kinds of DNA-binding proteins. The graphs show box plots of the four features for the HOLO and APO datasets: (a) tunnel bottleneck radius, (b) tunnel length, (c) tunnel curvature and (d) TM-score.

Table 2 Correlation of tunnel length and tunnel curvature

This table shows the values of Pearson coefficient and P-value between tunnel length and curvature. The columns of Pearson coefficient and P-value correspond to the pairs of DSBs/SSBs in HOLO set and APO set, respectively.
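For reference, the Pearson coefficient between two feature lists can be computed as in the minimal Python sketch below. It covers only the coefficient itself; the P-value is not computed here, and the function name is chosen for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)
```

A value close to 1 for tunnel length versus curvature would indicate that the two features carry largely the same information.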

Validation of the differentiating features

We performed validation experiments on the HOLO set and the APO set using one, two or three features to construct the classification models. The validation performances are shown in Tables 3 and 4, respectively. The tables show that the TM-score feature recognizes SSBs with high accuracy, while the tunnel length and curvature features recognize DSBs with high accuracy, meaning that the distinguishing abilities of TM-score and length/curvature are complementary. The model constructed with the length feature performs better than the one constructed with curvature, and better than or nearly equal to the one constructed with both length and curvature. This further confirms that the curvature feature is redundant with the length feature, and that adding redundant features to the classification model does not necessarily improve it. Compared with single-feature models, performance improves significantly when TM-score is used together with one or more other features, showing that constructing classification models from complementary features is preferable for discriminating DSBs from SSBs.

Table 3 Performance on HOLO set
Table 4 Performance on APO set

Independent test on APO set

In many cases, it is easier to collect information on DNA binding proteins in bound form than in unbound form, yet we often need to know whether an unknown unbound protein is an SSB or a DSB. We therefore trained the classifier on the HOLO set and tested it on the APO set. The results, listed in Table 5, show that the structural information on tunnels and OB-folds does reflect the differences between SSBs and DSBs and can thus be used as discriminant features to build the classification model.

Table 5 Performance of the independent test

Prediction on mixed set

In practice, the available dataset often includes both bound and unbound proteins, and we need to know whether an unknown DNA binding protein is an SSB or a DSB. We therefore performed validation experiments on the mixed set using one, two or three features to construct the classification models. The results are listed in Table 6. Among the single-feature models, the TM-score feature still recognizes SSBs with high accuracy. The best multi-feature model, with an accuracy of 0.8251, MCC of 0.6632, SN of 0.8605 and SP of 0.7904, performs much better than the single-feature models. We therefore trained the classifier on the mixed set and used it to predict the 727 unknown proteins. The classification results are listed in Additional file 2.

Table 6 Performance on mixed set

Conclusion

Despite many similar properties, dsDNA and ssDNA are distinct entities that are recognized differently by specialized dsDNA and ssDNA binding proteins, respectively. The binding interfaces of SSBs and DSBs are thus expected to differ in their geometrical features, consistent with the different nature of dsDNA and ssDNA [29, 38, 39]. While the sequence and structural properties of DSB and SSB binding interfaces have been studied during the last decade [28, 40], little research has addressed distinguishing between them computationally. In this study, we investigated the surface tunnel features of SSBs and DSBs and found that they have different ranges of tunnel lengths and tunnel curvatures; moreover, the alignment results with OB-fold templates were also found to be a discriminative feature of SSBs and DSBs. On this basis, we made a first attempt at a method that computationally distinguishes SSBs from DSBs using these discriminant features, and obtained satisfactory results.

The protein surface features should also be useful for analysing other types of molecular interactions, such as protein-ligand, protein-RNA and protein-protein complexes, and for studying a variety of proteins, multiple binding sites or a specific family of proteins. These problems would require modelling interface surfaces with different characteristics, such as compatibility, size and cooperativity between surfaces, so new surface features in addition to the solid angle may be needed.

Abbreviations

DSBs: double-stranded DNA binding proteins

SSBs: single-stranded DNA binding proteins

ssDNA: single-stranded DNA

dsDNA: double-stranded DNA

OB-fold: OB (oligonucleotide/oligosaccharide binding)-fold

ACC: accuracy

SN: sensitivity

SP: specificity

F1: F-measure

MCC: Matthews correlation coefficient

AUC: area under the receiver operating characteristic curve

TP: true positive

FP: false positive

TN: true negative

FN: false negative

References

  1. Zeng T, Li J, Liu J: Distinct interfacial biclique patterns between ssDNA-binding proteins and those with dsDNAs. Proteins: Structure, Function, and Bioinformatics. 2011, 79 (2): 598-610. 10.1002/prot.22908.

  2. Shazman S, Elber G, Mandel-Gutfreund Y: From face to interface recognition: a differential geometric approach to distinguish DNA from RNA binding surfaces. Nucleic Acids Research. 2011, 39 (17): 7390-7399. 10.1093/nar/gkr395.

  3. Attaiech L, Olivier A, Mortier-Barrière I, Soulet AL, Granadel C, Martin B: Role of the single-stranded DNA-binding protein SsbB in pneumococcal transformation: maintenance of a reservoir for genetic plasticity. PLoS Genetics. 2011, 7 (6): e1002156-10.1371/journal.pgen.1002156.

  4. Richard DJ, Bolderson E, Cubeddu L, Wadsworth RI, Savage K, Sharma GG: Single-stranded DNA-binding protein hSSB1 is critical for genomic stability. Nature. 2008, 453 (195): 677-681.

  5. Shlyakhtenko LS, Lushnikov AY, Miyagi A, Lyubchenko YL: Specificity of binding of single-stranded DNA-binding protein to its target. Biochemistry. 2012, 51 (7): 1500-1509. 10.1021/bi201863z.

  6. Wakamatsu T, Kitamura Y, Kotera Y, Nakagawa N, Kuramitsu S, Masui R: Structure of RecJ exonuclease defines its specificity for single-stranded DNA. The Journal of Biological Chemistry. 2010, 285 (13): 9762-9769. 10.1074/jbc.M109.096487.

  7. Edsö JR, Gustafsson C, Cohn M: Single- and double-stranded DNA binding proteins act in concert to conserve a telomeric DNA core sequence. Genome Integrity. 2011, 2 (1): 2-9. 10.1186/2041-9414-2-2.

  8. Theobald DL, Mitton-Fry RM, Wuttke DS: Nucleic acid recognition by OB-fold proteins. Annu Rev Biophys Biomol Struct. 2003, 32: 115-133. 10.1146/annurev.biophys.32.110601.142506.

  9. Montelione GT, Anderson S: Structural genomics: keystone for a human proteome project. Nature Structural Biology. 1999, 6 (1): 11-12. 10.1038/4878.

  10. Berman HM, Westbrook J, Feng Z, Nakagawa N, Kuramitsu S, Masui R: The protein data bank. Nucleic Acids Research. 2000, 28 (1): 235-242. 10.1093/nar/28.1.235.

  11. Laskowski RA, Luscombe NM, Swindells MB, Thornton JM: Protein clefts in molecular recognition and function. Protein and Peptide Letters. 1996, 5 (12): 2438-2452.

  12. Glaser F, Morris RJ, Najmanovich RJ, Laskowski RA, Thornton JM: A method for localizing ligand binding pockets in protein structures. Proteins. 2006, 62 (2): 479-488.

  13. Qvist J, Davidovic M, Hamelberg D, Halle B: A dry ligand-binding cavity in a solvated protein. PNAS. 2008, 105 (17): 6296-6301. 10.1073/pnas.0709844105.

  14. Sonavane Shrihari CP: Cavities in protein-DNA and protein-RNA interfaces. Nucleic Acids Research. 2009, 37 (14): 4613-4620. 10.1093/nar/gkp488.

  15. Marceau AH, Bahng S, Massoni SC, Georgel NP, Sandler SJ, Marians KJ: Structure of the SSB-DNA polymerase III interface and its role in DNA replication. The EMBO Journal. 2011, 30 (20): 4236-4247. 10.1038/emboj.2011.305.

  16. Hollis T, Stattel JM, Walther DS, Richardson CC, Ellenberger T: Structure of the gene 2.5 protein, a single-stranded DNA binding protein encoded by bacteriophage T7. PNAS. 2001, 98 (17): 9557-9562. 10.1073/pnas.171317698.

  17. Evans RJ, Davies DR, Bullard JM, Christensen J, Green LS, Guiles JW: Structure of PolC reveals unique DNA binding and fidelity determinants. PNAS. 2008, 105 (52): 20695-20700. 10.1073/pnas.0809989106.

  18. Pretto DI, Tsutakawa S, Brosey CA, Castillo A, Chagot ME, Smith JA: Structural dynamics and ssDNA binding activity of the three N-terminal domains of the large subunit of Replication Protein A from small angle X-ray scattering. Biochemistry. 2010, 49 (13): 2880-2889. 10.1021/bi9019934.

  19. Wang G, Dunbrack RL: PISCES: a protein sequence culling server. Bioinformatics. 2003, 19 (12): 1589-1591. 10.1093/bioinformatics/btg224.

  20. Smart OS, Neduvelil JG, Wang X, Wallace BA, Sansom MS: HOLE: a program for the analysis of the pore dimensions of ion channel structural models. J Mol Graph. 1996, 14 (6): 354-360. 10.1016/S0263-7855(97)00009-X.

  21. Berka K, Hanák O, Sehnal D, Banáš P, Navrátilová V, Jaiswal D, Otyepka M: MOLEonline 2.0 interactive web-based analysis of biomacromolecular channels. Nucleic Acids Research. 2012, 40 (W1): W222-W227. 10.1093/nar/gks363.

  22. Petřek M, Košinová P, Koča J, Otyepka M: MOLE: a voronoi diagram based explorer of molecular channels, pores, and tunnels. Structure. 2007, 15 (11): 1357-1363. 10.1016/j.str.2007.10.007.

  23. Yaffe E, Fishelovitch D, Wolfson HJ, Halperin D, Nussinov R: MolAxis: a server for identification of channels in macromolecules. Nucleic Acids Research. 2008, 36 (suppl 2): W210-W215.

  24. Petrek M, Otyepka M, Banas P, Kosinova P, Koca J: CAVER: a new tool to explore routes from protein clefts, pockets and cavities. BMC Bioinformatics. 2006, 7 (1): 316-10.1186/1471-2105-7-316.

  25. Chovancova E, Pavelka A, Benes P, Strnad O, Brezovsky J, Kozlikova B: CAVER 3.0: a tool for the analysis of transport pathways in dynamic protein structures. PLoS Computational Biology. 2012, 8 (10): e1002708-10.1371/journal.pcbi.1002708.

  26. Murzin AG: OB(oligonucleotide/oligosaccharide binding)-fold: common structural and functional solution for non-homologous sequences. The EMBO Journal. 1993, 12 (3): 861-867.

  27. Yu EY, Wang F, Lei M, Lue NF: A proposed OB-fold with a protein-interaction surface in Candida albicans telomerase protein Est3. Nature Structural & Molecular Biology. 2008, 15 (9): 985-989. 10.1038/nsmb.1471.

  28. Bochkarev A, Bochkareva E: From RPA to BRCA2: lessons from single-stranded DNA binding by the OB-fold. Current Opinion in Structural Biology. 2004, 14 (1): 36-42. 10.1016/j.sbi.2004.01.001.

  29. Skowyra A, Macneil SA: Identification of essential and non-essential single-stranded DNA-binding proteins in a model archaeal organism. Nucleic Acids Research. 2012, 40 (3): 1077-1090. 10.1093/nar/gkr838.

  30. Kerr ID, Wadsworth RIM, Cubeddu L, Blankenfeldt W, Naismith JH, White MF: Insights into ssDNA recognition by the OB fold from a structural and thermodynamic study of Sulfolobus SSB protein. The EMBO Journal. 2003, 22 (11): 2561-2570. 10.1093/emboj/cdg272.

  31. Bochkarev A, Bochkareva E, Frappier L, Edwards AM: The crystal structure of the complex of replication protein A subunits RPA32 and RPA14 reveals a mechanism for single-stranded DNA binding. The EMBO journal. 1999, 18 (16): 4498-4504. 10.1093/emboj/18.16.4498.

  32. Liu JH, Chang TW, Huang CY, Chen SU, Wu HN, Chang MC: Crystal structure of PriB, a primosomal DNA replication protein of Escherichia coli. Journal of Biological Chemistry. 2004, 279 (48): 50465-50471. 10.1074/jbc.M406773200.

  33. Liebschner D, Brzezinski K, Dauter M, Dauter Z, Nowak M, Kur J: Dimeric structure of the N-terminal domain of PriB protein from Thermoanaerobacter tengcongensis solved ab initio. Acta Crystallographica Section D: Biological Crystallography. 2012, 68 (12): 1680-1689. 10.1107/S0907444912041637.

  34. Yang C, Curth U, Urbanke C, Kang C: Crystal structure of human mitochondrial single-stranded DNA binding protein at 2.4 Å resolution. Nature Structural & Molecular Biology. 1997, 4 (2): 153-157. 10.1038/nsb0297-153.

  35. Kerr ID, Wadsworth RI, Cubeddu L, Blankenfeldt W, Naismith JH, White MF: Insights into ssDNA recognition by the OB fold from a structural and thermodynamic study of Sulfolobus SSB protein. The EMBO journal. 2003, 22 (11): 2561-2570. 10.1093/emboj/cdg272.

  36. Bochkarev A, Pfuetzner RA, Edwards AM, Frappier L: Structure of the single-stranded-DNA-binding domain of replication protein a bound to DNA. Nature. 1997, 176-181.

  37. Zhang Y, Skolnick J: TM-align: a protein structure alignment algorithm. Nucleic Acids Research. 2005, 33 (7): 2302-2309. 10.1093/nar/gki524.

  38. Paytubi S, McMahon SA, Graham S, Liu H, Botting CH, Makarova KS: Displacement of the canonical single-stranded DNA-binding protein in the Thermoproteales. Proceedings of the National Academy of Sciences. 2012, 109 (7): E398-E405. 10.1073/pnas.1113277108.

  39. Morgan HP, Estibeiro P, Wear MA, Max KE, Heinemann U, Cubeddu L: Sequence specificity of single-stranded DNA-binding proteins: a novel DNA microarray approach. Nucleic Acids Research. 2007, 35 (10): e75-10.1093/nar/gkm040.

  40. Kozlov AG, Jezewska MJ, Bujalowski W, Lohman TM: Binding specificity of escherichia coli single-stranded DNA binding protein for the χ subunit of DNA pol III holoenzyme and pria helicase. Biochemistry. 2010, 49 (17): 3555-3566. 10.1021/bi100069s.

Acknowledgements

This work is supported by the grants from the National Science Foundation of China (61272274), Program for New Century Excellent Talents in Universities (NCET-10-0644), the Open Research Fund of State Key Laboratory of Hybrid Rice (Wuhan University) (KF201301) and the Fundamental Research Funds for the Central Universities (No. 2012211020204).

Declarations

The publication costs for this article were funded by the National Science Foundation of China (61272274).

This article has been published as part of BMC Bioinformatics Volume 15 Supplement 12, 2014: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2013): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/15/S12.

Author information

Corresponding author

Correspondence to Juan Liu.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

W.W., J.L. contributed to the software design and testing. W.W. and X.Z. implemented the software. W.W. and J.L. wrote this paper. All authors read and approved the final manuscript.

Electronic supplementary material

Additional file 1: This file contains the complete list of PDB codes for DNA-binding proteins set. (DOCX 24 KB)

Additional file 2: This file describes the classified results of the unknown proteins by the mixed set classifier. (XLSX 52 KB)

Rights and permissions

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Wang, W., Liu, J. & Zhou, X. Identification of single-stranded and double-stranded dna binding proteins based on protein structure. BMC Bioinformatics 15 (Suppl 12), S4 (2014). https://doi.org/10.1186/1471-2105-15-S12-S4

  • DOI: https://doi.org/10.1186/1471-2105-15-S12-S4
