The PD-(D/E)XK superfamily revisited: identification of new members among proteins involved in DNA metabolism and functional predictions for domains of (hitherto) unknown function
© Kosinski et al; licensee BioMed Central Ltd. 2005
Received: 13 April 2005
Accepted: 12 July 2005
Published: 12 July 2005
The PD-(D/E)XK nuclease superfamily, initially identified in type II restriction endonucleases and later in many enzymes involved in DNA recombination and repair, is one of the most challenging targets for protein sequence analysis and structure prediction. Typically, the sequence similarity between these proteins is so low, that most of the relationships between known members of the PD-(D/E)XK superfamily were identified only after the corresponding structures were determined experimentally. Thus, it is tempting to speculate that among the uncharacterized protein families, there are potential nucleases that remain to be discovered, but their identification requires more sensitive tools than traditional PSI-BLAST searches.
The low degree of amino acid conservation hampers the possibility of identification of new members of the PD-(D/E)XK superfamily based solely on sequence comparisons to known members. Therefore, we used a recently developed method HHsearch for sensitive detection of remote similarities between protein families represented as profile Hidden Markov Models enhanced by secondary structure. We carried out a comparison of known families of PD-(D/E)XK nucleases to the database comprising the COG and PFAM profiles corresponding to both functionally characterized as well as uncharacterized protein families to detect significant similarities. The initial candidates for new nucleases were subsequently verified by sequence-structure threading, comparative modeling, and identification of potential active site residues.
In this article, we report identification of the PD-(D/E)XK nuclease domain in numerous proteins implicated in interactions with DNA but with unknown structure and mechanism of action (such as putative recombinase RmuC, DNA competence factor CoiA, a DNA-binding protein SfsA, a large human protein predicted to be a DNA repair enzyme, predicted archaeal transcription regulators, and the head completion protein of phage T4) and in proteins for which no function was assigned to date (such as YhcG, various phage proteins, novel candidates for restriction enzymes). Our results contributes to the reduction of "white spaces" on the sequence-structure-function map of the protein universe and will help to jump-start the experimental characterization of new nucleases, of which many may be of importance for the complete understanding of mechanisms that govern the evolution and stability of the genome.
The PD-(D/E)XK superfamily of Mg2+-dependent nucleases groups together protein domains initially identified in structurally characterized type II restriction enzymes (REases) (reviews: [1, 2]) and later found in diverse enzymes involved in DNA replication, repair, and recombination, including phage λ exonuclease , bacterial enzymes exerting ssDNA nicking in the context of methyl-directed and very-short-patch DNA repair: MutH  and Vsr , Tn7 transposase TnsA , archaeal Holliday junction resolvases (AHJRs) Hjc/Hje [7–9], phage T7 endonuclease I , the XPF/Rad1/Mus81 family of nucleases that cleaves branched structures generated during DNA repair, replication, and recombination , and RecB nuclease .
We have previously used the comparisons of sequence profiles to detect novel members of the PD-(D/E)XK superfamily [16, 18, 19, 23–25], as well as applied the sequence-structure threading to map the sequences of REases onto the crystal structures of known members [21, 26–28]. However, both approaches have shortcomings: the profile-profile analysis is unable to find very remote relationships when families include only a few members (as is often the case in the PD-(D/E)XK superfamily) or if the compared proteins contain large unrelated insertions, and the threading analysis works only if an experimentally solved structure is available for one of the potential homologs. Therefore, we decided to use an intermediate approach, in which the compared sequence profiles are combined with the structural information, which can be obtained either from the crystal structure (if it is available for one of the members of the considered families) or predicted by bioinformatics methods, thereby allowing protein fold recognition without the explicit reference to an experimentally solved protein structure. We used the recently developed method HHsearch that allows comparison of sequence alignments together with secondary structures represented as Hidden Markov Models (HMM)  to compare the library of PD-(D/E)XK nuclease families obtained previously  with the set of domains in the Clusters of Orthologous Groups (COG and KOG) and PFAM  databases. Our aim was to identify the previously unknown PD-(D/E)XK domains in families of proteins that are either completely uncharacterized, or whose function is known (or predicted), but the structure and phylogenetic relationships are unclear.
Identification of new candidate PD-(D/E)XK nucleases with profile HHM searches
In order to carry out a systematic search for new PD-(D/E)XK nucleases, we prepared a set of multiple sequence alignments corresponding to previously identified families [18, 23, 32](see Methods). For each family we generated a profile HMM that included the sequences and predicted secondary structure (see Methods). These profile HMMs were compared with HHsearch  to a database of profile HMMs corresponding to multiple sequence alignments obtained from the COG, KOG, and PFAM databases, also with predicted secondary structures (see Methods for details). It is noteworthy, that searches initiated with most of the known PD-(D/E)XK nucleases in the query dataset identified families containing sequences of other known PD-(D/E)XK nucleases, thereby validating our approach and providing useful threshold values for confident identification of true members of PD-(D/E)XK superfamily. For instance, AtF16A14.4 profile detected herpesvirus alkaline exonuclease (pfam01771) with the e-value of 10-3, Pfu-HJR detected Mrr with the e-value of 10-2, YcjD detected Vsr with the e-value of 10-5, etc. It must be noted, that some of the "best hits" to true positives exhibited poor scores, e.g. Mrr detected HJR (pfam01870) with the e-value of 1.2. As expected, our HHM-HHM comparisons revealed also numerous other potential homologs, not included in the query dataset. Based on these results, we generated a preliminary list of candidate new PD-(D/E)XK subfamilies. For further analysis, we retained only such proteins, which have not been reported as members of the PD-(D/E)XK superfamily in earlier analyses carried out by ourselves or other groups (e.g. ).
New members of the PD-(D/E)XK superfamily Columns I and II contain accession numbers of Pfam and COG database entries corresponding to protein families analyzed in this work. Typically, COGs contain only subsets of seqeunces from the corresponding Pfam entries. Previously described molecular and/or cellular function and the newly predicted function is shown. * indicates only a general functional prediction based on the fact that a given protein family is found to possess a nuclease domain with a seemingly complete set of catalytic residues.
E. coli RmuC
transmembrane helix, coiled-coil regions, unknown alpha/beta domain
limiting inversions at short-inverted repeats
cleaves DNA structures arising during recombination of short- inverted repeats
S. pneumoniae CoiA
unknown C-terminal domain
DNA uptake process and recombination
degradation of one DNA strand during DNA uptake
E. coli SfsA
OB-fold N-terminal domain
sugar fermentation stimulation
H. sapiens KIAA1018
TPR-like domain, Rad18-like CCHC zinc finger, uknown 300aa domain
DNA repair, protein binding
E. coli YhcG
~200aa unknown N-terminal domain
Phage phi ETA orf25
TnsA or T7 EndoI
degradation of the host DNA upon lytic infection, production of recombinogenic fragments
A. tumefaciens AGRL2570
C-terminal domain with wHTH fold
DNA binding, DNA cleavage
Phage PSA gp51
P. abyssi PAB2104
second inactive AHJRlike domain, HTH_3 domain
DNA cleavage or nicking*
S. typhimurium STM4490
putative inactivated nuclease domain
T. volcanium TVN1166
N-terminal membrane helix
membrane-associated guardian against foreign DNA
A. tumefaciens AGR_C_8
P. abyssi PAB1046
unknown ~250aa N-terminal domain
DNA repair or recombination
Phage T4 gp4
TnsA or T7 EndoI
important for the phage T4 head assembly process 
determination if DNA is packed properly into the phage head
P. aerophilum PAE1662
type II REase
COG1322/pfam02646 (RmuC family)
COG4469/pfam06054 (CoiA family)
COG4469/pfam06054 is represented by the CoiA protein from Streptococcus pneumoniae, which has been implicated in the DNA uptake process and recombination, but without any clues as to the molecular mechanism of its action . The profile HMM analysis identified a relationship (P-value 10-2) between the central region of CoiA (aa 56–209) and COG4636, a family of predicted nucleases abundant in Cyanobacteria . This prediction was also confirmed by the fold-recognition analysis, which aligned this domain to the recently solved structure of the COG4636 member (Structural Genomics target Tt1808 from Thermus thermophilus Hb8, 1wdj in the PDB, annotated as "to be published") (top matches in FFAS, score -11.0, and mGenTHREADER, score 0.369) and the structure of the Hef nuclease (1j22 in the PDB, INBGU top match, score 8.38), which both belong to the PD-(D/E)XK superfamily. While the template structures exhibit unorthodox configurations of catalytic residues (migration of the Lys residue in COG4636 members and topological crossover and migration of the Glu residue in Hef and its homologs), members of the CoiA/COG4469/pfam06054 family exhibit a more typical PD-EXQ variant of the active site, which has been observed in REase BglII . Additionally, CoiA and its homologs possess a conserved Arg residue at the same position as the swapped Lys residue in COG4636. It will be interesting to determine if this residue is involved in substrate binding or catalysis. We hypothesize that in competent pneumococcus cells, the CoiA nuclease degrades one strand of the double-stranded DNA, while the other strand is imported inside the cell. Alternatively, CoiA may be involved in the incorporation of the foreign DNA into the chromosome, but this process is more likely to be carried out by the general homologous recombination machinery, such as RecA.
COG1489/pfam03749 (SfsA family)
COG1489/pfam03749 is represented by the SfsA protein from E. coli, which has been implicated in sugar fermentation stimulation by the so far unknown mechanism and was shown to non-specifically bind to DNA . The profile HMM analysis confidently identified a relationship (P-value 10-5) between the C-terminal domain of SfsA (aa 80–234) and the YraN family of previously predicted endonucleases . This finding was confirmed by the fold-recognition analysis, which confidently identified the Vsr nuclease, a member of the PD-(D/E)XK superfamily (1vsr and 1cw0 in the PDB ) as the best template for the sequence of the SfsA C-terminal domain (scores: INBGU: 17.65, mGenTHREADER: 0.428, FUGUE: 3.4). Interestingly, while the Vsr nuclease exhibits a highly unorthodox active site, with the (D/E)XK half-motif replaced by FXH, SfsA and its homologs (including the YraN family) exhibit a typical version of the nuclease catalytic motif, ID-EVK. The fold-recognition analysis revealed also that the N-terminal domain of SfsA (aa 1–80) is a member of the OB-fold superfamily of nucleic acid binding proteins . Thus, both domains of SfsA are likely to be involved in DNA binding, while the C-terminal domain is likely to possess an unforeseen nuclease activity. It will be interesting to determine whether the N-terminal or C-terminal domains or both are important for the activity of sugar fermentation stimulation and how does this activity relate to the predicted nuclease function. Or perhaps SfsA could be a hydrolase involved in another type of reaction than its homologs from the PD-(D/E)XK superfamily? Deletion mutagenesis based on our prediction and site-directed mutagenesis of the PD-(D/E)XK active site may help to address these questions.
KOG2143 (KIA001018 family)
KOG2143 groups together 4 large (>1000 aa) eukaryotic proteins, including the KIA001018 protein from H. sapiens (PMID: 10231032), which has been annotated as a DNA binding protein engaged in DNA repair, probably due to a presence of an easily identified Rad18-like CCHC zinc finger in the N-terminus . However, no suggestions concerning the mechanism of its action have been made. PSI-BLAST searches revealed that KOG2143 has homologs in other Eukaryota (plants and fungi) as well as in several bacteria (shorter versions, about 550 aa). The C-terminal domain (aa 935–1016 in KIA001018) shows significant similarity to HJR families (HHsearch P-values ranging from 10-3 to 10-5) and exhibits the orthodox version of the PD-(D/E)XK motif, perfectly conserved in all members of KOG2143 as well as in their prokaryotic relatives. The fold-recognition analysis identified AHJR (1gef) as one of the possible templates, but with low confidence (mGenTHREADER, score 0.406). The analysis of predicted secondary structures and the preliminary tertiary model of KIA001018 suggest that the putative nuclease domain in KOG2143 and related proteins corresponds to the minimal core of the PD-(D/E)XK fold, i.e. only 4 β-strands and 2 α-helices  (Figure 1). Combination of reciprocal HHsearches and fold-recognition carried out for the remaining part of the KIA001018 sequence revealed that KOG2143 members possess a long (300–400 aa) α-helical domain, most probably comprising 5–7 tetratricopeptide-like repeats (TPRs), an unknown domain (~300 aa) with no clear relationship to any protein families and no confident prediction from FR, and the afore-mentioned Rad18-like CCHC zinc finger. The unknown domain and the zinc finger are missing from the shorter prokaryotic members. Such composition of domains suggests that KIA001018 is a DNA binding protein with endonuclease activity, possibly engaged in DNA repair, with a potential to bind other proteins via TPR domains. The analysis of genomic neighborhood did not reveal any obvious associations with other conserved proteins, which could shed more light on the possible function of KIA001018 and its homologs.
COG4804/pfam06250 (YhcG family)
COG4804/pfam06250 is represented by the uncharacterized and functionally not annotated protein YhcG from E. coli, whose homologs can be found in Prokaryota – mostly in Bacteria, but also in Archaea (a few homologs are found in Methanosarcinales) and bacteriophages. The profile HMM analysis confidently identified a relationship (P-value 10-4) between the C-terminal domain of YhcG (aa 235–375) and pfam01939 (DUF91)/COG1637, a family of predicted RecB-like nucleases . Both YhcG and DUF91 families exhibit the same orthodox endonuclease motif: E-D-E(I/L/V)K. Nonetheless, our profile HMM analysis indicates that the endonuclease domain of DUF91 is more similar to YraN-like HJRs rather than to the RecB-like nucleases (P-values 10-6 for YraN and 10-4 for DUF91). This is also confirmed by the fold-recognition analysis for YhcG, which indicates the AHJR structures 1hh1 and 1gef/1ipi as possible templates, however with low scores (INBGU: 1gef, score: 7.7; 1hh1, score: 6.08; FUGUE: 1ipi, score: 2.97; 3DPSSM: 1hh1, score: 7.7). COG4804/pfam06250 members possess an additional N-terminal domain (~200 aa), predicted to be mainly α-helical, but lacking any confident matches to known structures, according to fold-recognition servers. The analysis of genomic neighborhood did not reveal any obvious associations of the yhcG family with conserved genes. In Rickettsiae 5 YhcG paralogs could be found in two gene clusters. Only in Methanosarcina acetivorans an YhcG homolog was found associated with a type I restriction-modification gene cluster, also with another predicted PD-(D/E)XK superfamily member from COG3586 (a putative endonuclease family, whose members were previously identified )
Pfam06356 (φH_25 family)
Pfam06356 (DUF1064) groups together functionally uncharacterized and not annotated proteins from tailed bacteriophages and bacteria (Proteobacteria and Firmicutes), and is represented by the hypothetical product encoded by ORF25 from S. aureus phage φH. The profile HMM analysis identified this family as a relative (P-value from 10-3 to 10-5) of YcjD-like proteins – putative members of the PD-(D/E)XK superfamily identified previously . The fold-recognition analysis identified TnsA (1f1z) as one of the possible structural templates (mGenTHREADER: score: 0.302, FUGUE: score: 2.56). The structure of phage T7 Endonuclease I, 1fzr, had slightly better FR scores, but exhibited numerous insertions and deletions in the alignment. Nonetheless, proteins from pfam06356 exhibit a more typical version of the catalytic motif, namely AD-DXK (more similar to T7 Endo I than to TnsA). PSI-BLAST searches using pfam06356 members as queries revealed also a low similarity to the PD-(D/E)XK domain in the Res subunit of Type III restriction-modification systems  (alignment only in the endonuclease core region with sequence identity 20% and lower) and to COG3372/pfam5626 (DUF790), an uncharacterized family predicted to belong to the PD-(D/E)XK superfamily in this work, see below. The genomic analysis reveals that pfam06356 members are typically found only in phage genomes or in prophages in bacterial genomes, however, without any strongly conserved neighbors. In only one case, S. aureus subsp. aureus COL prophage L54a, the gene coding a putative DNA:m6 A MTase (M.SauCOLORF346P according to the REBASE nomenclature ) was found in a putative operon with a pfam06356 member – hypothetical protein SACOL0347, suggesting that together they may form a novel RM system. We speculate that most members of this family may serve a purpose similar to that of phylogenetically unrelated Endonuclease II of phage T4, namely the degradation of the host DNA upon lytic infection as well as production of recombinogenic fragments 
COG5482 includes a few functionally uncharacterized and not annotated proteins (~230 aa) mainly from Proteobacteria (only two homologs were found in species from other taxa: Clostridium acetobutylicum and Symbiobacterium thermophilum). The profile HMM analysis suggested that they are remotely related to AHJRs (P-value ~10-3). The fold-recognition analysis has also identified the AHJR structures (1gef/1ipi and 1hh1) as the best templates for modeling of COG5482 sequences, with very good agreement of alignments reported by different servers despite moderate scores (mGenTHREADER: 1hh1, score: 0.543; 1gef, score: 0.413; FUGUE: 1hh1, score: 3.23; 1ipi, score: 3.89; 3DPSSM: 1hh1, score: 3.5; SAM: 1gef/1ipi, scores: 0.17 (0.41). The catalytic domain exhibits the consensus motif: E-CD-ELK. Interestingly, the C-terminal 100 aa was predicted to form a separate domain of the winged helix-turn-helix (wHTH) fold, typically involved in DNA binding , which may therefore dictate the target specificity of this putative nuclease. The precise role of these proteins remains to be determined experimentally.
Pfam06190 (gp51 family)
Pfam06190 (DUF944) groups together very small (~100), functionally uncharacterized and not annotated proteins, exemplified by the gp51 protein from the Listeria monocytogenes phage PSA. Homologs of this protein can be found mainly in other phages and in a few bacteria from different taxa, frequently within putative prophages. The profile HMM comparison revealed a remote relationship of Pfam06190 members to AHJRs (P-value ~10-4), while sequence analysis revealed an orthodox version of the catalytic motif, (P/S/C)D-EXK. All fold-recognition servers identified PD-(D/E)XK enzymes as the best templates with high scores (e.g. AHJRs (1hh1, 1gef) identified by INBGU, score 25.16, RecU/PrfA recombinase (1y1o, 1rzn) identified by FFAS, score -10.8). Structure prediction suggests that the members of the gp51 family exhibit a minimal form of the central β-sheet, with only four strands, flanked by three α-helices. However, they seem to possess an additional β-hairpin similar to the element, which in RecU forms a dimerization interface (our unpublished analysis of the 1y1o structure). Interestingly, the analysis of genomic neighborhood reveals that gp51 and its homologs are frequently associated with primases or helicases, which suggests they may be involved in DNA replication. We predict that gp51 and its relatives are recombinases involved in resolution of branched intermediates of phage DNA undergoing replication and/or recombination, similarly to T7 endonuclease I.
COG1395 (PAB2104 family)
COG1395 groups together archaeal sequences of functionally uncharacterized proteins represented by the PAB2104 protein from Pyrococcus abyssi, annotated as putative transcription regulators. As might have been expected based on the database annotation, the profile HMM analysis identified with high confidence (P-value ~10-10) the helix-turn-helix motif (similar to that one from HTH_3 (Xre) family like proteins: pfam01381, smart00530, cd00093) in the central part of the COG1395 sequences. However, using HHsearch we found that the N – terminal domain is related to the family of AHJRs (pfam01870) (P-value ~10-6). The fold-recognition analysis confidently confirmed AHJR (1gef) as the best template for modeling of the N-terminal domain. Preliminary modeling of the N-terminal domain (data not shown) suggests that in PAB2104 and its homologs the catalytic D/E residue from the orthodox (D/E)XK half-motif migrated to the α-helix following the β-strand in which that half-motif was transformed into the "K(I/V)L" form. Interestingly, the profile HMM analysis predicted that the C-terminal domain in COG1395 members may be homologous to the PD-(D/E)XK nucleases as well. This prediction was also supported by the fold recognition analysis, which found AHJR 1hh1 (scores: FFAS: -6.66, INBGU: 13.34, mGenTHREADER: 0.447, SPARKS: -1.74, FUGUE: 4.33, 3DPSSM: 0.061) as the potentially best template structure. However, despite the apparent structural conservation and the presence of hydrophobic residues that confer protein stability, the typical active site is not conserved in this domain (Figure 2). This suggests that the C-terminal domain was generated by intragenic duplication (or fusion of the N-terminal module with a PD-(D/E)XK domain from another source) and then underwent degeneration. Variants of the PD-(D/E)XK domains that lost the active site but presumably retained the ability to bind nucleic acids have been already described (review: ).
Thus, COG1395 represents a new family of archaeal putative nucleases with a novel domain architecture: PD-(D/E)XK-HTH-(inactivated)PD-(D/E)XK. It is noteworthy that a related architecture, namely wHTH-PD-(D/E)XK, was observed in the Type IIE REase NaeI, in which the PD-(D/E)XK domain and the wHTH domain, despite the unrelated folds, bind two copies of the same sequence, but only the copy bound to the nuclease domain is cleaved . A PD-(D/E)XK-wHTH fusion (albeit with a reversed order of domains) was also observed in the TnsA transposase . On the other hand, a variant of a tandemly duplicated PD-(D/E)XK domain with a degenerated C-terminal repeat was found in the Sau3AI enzyme [46, 47], another Type IIE REase, which also binds as a dimer two copies of the same sequence, but cleaves only the one bound to the pair of "active" N-terminal domains. Tandemly duplicated PD-(D/E)XK domains have been also found in Type IIS restriction enzymes that act as monomers, bind only one asymmetrical DNA target, and use different domains for the cleavage of each strand of the substrate (JMB, JK, and Arvydas Lubys, unpublished data). It will be interesting to study the mode of action of COG1395 members, i.e. whether they act as monomers and dimers, how many copies of the target sequence they bind and whether they act similarly to Type IIE enzymes (like NaeI or Sau3AI) or as nickases, which cleave only one strand of the substrate using the C-terminal active domain.
COG4127 groups together 4 uncharacterized proteins, of which only one, STM4490 from S. typhimurium, has been annotated as a predicted restriction endonuclease. PSI-BLAST searches revealed that COG4127 members have numerous homologs in the non-redundant database and are related (E-value ~ 10-10) to the Mrr family of REases . The relationship between the "extended" COG4127 family and the Mrr family was confirmed by the profile HMM analysis (P-value ~10-14). The fold-recognition analysis identified AHJRs (FFAS: 1hh1, score: -13.5; 1gef/1ipi, score: -7.58; 3DPSSM: 1hh1, score: 3.4; SAM-T02: 1gef/1ipi, score: 0.54) as the best templates for modeling of the C-terminal part of the protein, but failed to suggest a good structural match for the N-terminal domain of a similar size. It is noteworthy that some members of COG4127 contain only the N-terminal domain, but either lack the C-terminal Mrr-like domain or possess another, apparently unrelated domain. Analysis of sequence conservation and predicted secondary structures suggested that the N-terminal domain also resembles the PD-(D/E)XK nucleases, but without the hallmark active site, e.g. like in the afore-mentioned COG1395. However, we were unable to confirm the relationship of the N-terminal domain of COG4127 to known PD-(D/E)XK nucleases either by fold-recognition or by HHsearches. Thus, the structure and function of the N-terminal domain remains to be determined experimentally; the availability of members of COG4127 that already lack the C-terminal domain may be particularly useful.
Interestingly, analyses of genomic context revealed that a few members of COG4127 are associated with putative Type I RM systems (R/S/M.CcrMORF620P in Caulobacter crescentus, R/S/M.XorKORF3462P and R/S/M.XorKORF3457P in Xanthomonas oryzae and a putative type I RM system in Methylobacillus flagellatus not yet included in REBASE) or with a Type III RM system (R/M.DetORF1112P in Dehalococcoides ethenogenes). Additionally, another gene cluster from D. ethenogenes contains two representatives of COG4127 (one of them contains only the N-terminal domain) and a member of the SfsA family (predicted to be a nuclease in this work). Another COG4127 member from Azotobacter vinelandii was found in the neighborhood of the putative Mod subunit and a putative protein annotated as a "virulence protein" (COG3943).
COG4741 consists of functionally uncharacterized proteins mostly from Archaea, of which only one, the hypothetical product of the locus TVN1166 from T. volcanium is annotated as "a predicted secreted endonuclease distantly related to AHJRs". The profile HMM analysis confidently (P-value ~10-5) identified the relationship of this family to the YraN subfamily of PD-(D/E)XK enzymes . The predicted nuclease domain encompasses 120 C-terminal residues and corresponds to the absolutely minimal core, with only 4 β-strands and 2 α-helices that serve as a scaffold for a well-conserved E-(V/I)D-E(V/I)K motif. The N-terminus reveals a strongly hydrophobic stretch of residues predicted to form a transmembrane helix, which could be either used as a leader peptide to guide the translocation of the nuclease domains through the membrane or it could anchor it to the membrane. Programs for the prediction of transmembrane protein topology HMMTOP , TMAP , and TMPRED  predicted that the nuclease domain has a cytosolic orientation. We speculate that COG4741 members could be released to the environment as toxic agents against other cells, like endonuclease colicins (review: ), or be used to guard the cell against the uptake of foreign DNA and/or to cleave the encountered nucleic acids to produce (oligo)nucleotides that can be used by the host.
COG5321/pfam06319 (DUF1052) comprises functionally uncharacterized and not annotated short (~160 aa) proteins found almost exclusively in Proteobacteria. The profile HMM analysis confidently identified their relationship to AHJRs (P-value ~10-7). The fold-recognition analysis confirmed the AHJR structures (1gef and 1hh1) as the best templates, however with low scores (data not shown). The reason for this low confidence of fold-recognition could be due to the strong divergence of the C-terminal part of the domain and poor consensus of secondary structure prediction. Nonetheless, COG5321/pfam06319 members exhibit the conserved orthodox AD-E(V/I/C)K motif associated with the characteristic pattern of predicted secondary structures in the N-terminal part of the common fold, which is a strong indication that they are genuine members of the PD-(D/E)XK superfamily, active as nucleases. However, the analysis of genomic neighborhood did not provide any specific clues as to their specific function.
COG3372/pfam5626 (DUF790) is represented by the uncharacterized product of PAB1046 gene from Pyrococcus abyssi. Members of this family are found almost exclusively in Archaea and Cyanobacteria. The profile HMM analysis identified their relationship (P-value ~10-4) to Pfam06356 (DUF1064) family of putative endonucleases (identified in this work, see above) and a more remote similarity to AHJRs (P-value ~10-2). All fold-recognition servers with the exception of FUGUE identified with high scores the structure of bacteriophage T7 Endonuclease I (1fzr, 1m0d) as the best templates for modeling of the C-terminal domain of COG3372 (INBGU: 1fzr, score: 60.96; 1hh1, score: 29.12; 1gef, 19.81; FFAS: 1m0d, score -8.15; mGenTHREADER: 1fzr, score: 0.529; 1hh1, score: 0.459; SAM: 1m0d, score: 1.6*10-4; 1fzr, score: 5.4*10-4; 1ob8, score: 0.53; SPARKS: 1m0d, score: -3.62; 1ob8, score: -3.28; 1gef, score: -2.8; 1hh1, score: -2.53; FUGUE: 1hh1, score: 4.48; 1fzr, score: 4.26; 1j24, score: 3.01; 3DPSSM: 1fzr, score: 0.53; 1gef, score: 0.6). However, the fold-recognition failed to identify any confident structural matches for the N-terminal domain (size ~250 aa). Analysis of the genomic neighborhood revealed that COG3372 members are usually associated with hypothetical proteins from COG1061 (SSL2: DNA or RNA helicases of superfamily II). This suggests they may be involved in DNA repair or recombination, but does not exclude their potential selfish character.
The gp4 family is represented by a small (150 aa) protein gp4 from the T4 bacteriophage, annotated as a head completion protein important for the final stages of bacteriophage head assembly process and predicted to be important for the DNA-mediated attachment of independently assembled head and fibers . To our knowledge, only a general functional prediction has been made for this protein and sequences homologous to gp4 have not been grouped into a PFAM or COG family. We identified them by PSI-BLAST, using a Pfam06356 (DUF1064) representative as a query, with a very low E-value of 3.2. The HHsearch analysis carried out for the gp4 family confirmed their relationship to Pfam06356 (DUF1064) (P-value 10-3) as well as to COG3372/pfam5626 (P-value 10-4) and gp51 families (P-value 10-3). Exhaustive PSI-BLAST searches (5 iterations) confidently reported structures of bacteriophage T7 endonuclease I (1fzr, 1m0d) as best templates for the gp4 tertiary structure (E-value = 10-12). The consensus of fold-recognition analysis confirmed the prediction pf the PD-(D/E)XK fold, with TnsA structures (1f1z, 1t0f) as well as T7 Endonuclease I (1fzr, 1m0d) reported as the best structural matches. gp4 is positioned in a cluster of genes encoding structural proteins important for the formation of infectious phage particles and to our knowledge has not been reported to interact with the DNA, but to be necessary for the assembly of protein components of the viral particles. The attachment of independently assembled head and fibers is however DNA-dependent (the process does not occur if there is no DNA loaded to the phage head) (review: ). Thus, it is possible that gp4 may be involved in determination if the DNA is packed properly or in some yet unknown process involving the DNA cleavage upon the attachment of the tail and fibers to the head, for instance when one end of the packaged DNA descends into the tail.
PAE1662 from Pyrobaculum aerophilum str. IM2 is an uncharacterized, functionally not annotated protein. We found it during exhaustive PSI-BLAST searches using Pfam06356 (DUF1064) representative as a query, with E-value only about 3.4, but also with a hallmark motif E-AD-ELK. We found a characteristic pattern of predicted secondary structure elements associated with the putative nuclease motif, while the fold-recognition servers reported that PAE1662 is similar to several structures of PD-(D/E)XK superfamily members (mGenTHREADER: 1w36C (RecC), score: 0.651; 1hh1, score 0.594; 1ob8, score: 0.538; SAM: 1hh1, score: 3.8; 1t0f, score: 4.2; 1f1z, score 6.3; 3DPSSM: score: 1hh1, score: 1.5). The consensus predictor Pcons selected 1hh1 (AHJR) as a preferred template. Interestingly, PAE1662 is associated with a gene encoding a putative DNA:m5C MTase (annotated as M.PaeIMORF1659P in the REBASE database ). In this context it is noteworthy that a reciprocal PSI-BLAST search revealed no homologs of PAE1662. Altogether, the association with a DNA MTase, the lack of homologs detectable by database searches and a remote relationship to the PD-(D/E)XK nucleases detected by fold-recognition suggest that PAE1662 is a novel putative type II REase, which to date has not been annotated as such in REBASE.
We have carried out a sequence/structure profile HMM search using a new method  to identify new members of the PD-(D/E)XK superfamily. Our results revealed the presence of this highly diverged nuclease domain in families of proteins implicated in DNA metabolism but with unknown structure (such as a putative recombinase RmuC, DNA competence factor CoiA, a DNA-binding protein SfsA, a large human protein predicted to be a DNA repair enzyme, predicted archaeal transcription regulators, and the head completion protein of phage T4), and in proteins for which no function or structure was assigned to date (such as YhcG, various phage proteins, and novel candidates for restriction enzymes). The initial predictions were validated by protein fold-recognition, leading to preliminary structural models, which were used as platforms for identification of the potential active sites. Thus far, all known members of the PD-(D/E)XK fold were found to be nucleases, mostly acting on the DNA, or at least have been implicated in nucleic acid metabolism. It cannot be excluded that some of the newly reported members may be hydrolases acting on other substrates (e.g. the SfsA protein involved in stimulation of the sugar fermentation), but we speculate that most of them would cleave DNA. The predictions reported in this article will facilitate the search for the possible substrates.
Our predictions contribute to the reduction of "white spaces" on the sequence-structure-function map of the protein universe and will help to jump-start the experimental characterization of the cellular function of these putative nucleases, as well as the molecular mechanisms of their interactions with the DNA. That we identified several members of the PD-(D/E)XK superfamily with very low scores suggests that more strongly diverged members still await discovery. Our analysis has provided a set of new sequence profiles that may be used to search for even more members of this important group of enzymes and will help to select targets for experimental analyses. For instance, it would be interesting to determine high-resolution structures of the presumably "minimal" members of the superfamily (such as the predicted nuclease domain in the C-terminus of the giant protein KIA001018) or proteins with interesting combinations of domains (such as the fusion of predicted active and inactive nuclease domains with the HTH domain in PAB2104). The availability of the complete catalog of nucleases and the knowledge of their mechanisms of action (and interaction) in the cell under different conditions is essential for the complete understanding of mechanisms that govern the evolution and stability of the genome. Our analysis provides a small, but important step towards this aim.
Protein structure prediction
Secondary structure prediction and tertiary fold-recognition was carried out via the GeneSilico meta-server gateway at http://genesilico.pl/meta/. Secondary structure was predicted using PSIPRED , PROFsec , PROF , SABLE , JNET , JUFO , and SAM-T02 . Solvent accessibility for the individual residues was predicted with SABLE  and JPRED . The fold-recognition analysis (attempt to match the query sequence to known protein structures) was carried out using FFAS03 , SAM-T02 , 3DPSSM , BIOINBGU , FUGUE , mGenTHREADER , and SPARKS . Fold-recognition alignments reported by these methods were compared, evaluated, and ranked by the Pcons server . In the cases, where the active site could not be unambiguously identified from the analysis of alignments, we carried out homology modeling and tried to identify the potential catalytic residues based on structural considerations. Accordingly, fold-recognition alignments to the structures of selected templates were used as a starting point for homology modeling using the "FRankenstein's Monster" approach , comprising cycles of model building, evaluation, realignment in poorly scored regions and merging of best scoring fragments. The modeling protocol was essentially identical to that published in .
Construction and comparison of sequence-structure profile HMMs
A set of previously identified members of the PD-(D/E)XK superfamily (Pfu-HJR, GI:5689160; McrC, GI:585466; Mrr, GI:127320; XisH, GI:1613875; YcjD, GI:1175688; A putative nuclease from Mycobacterium, GI:15609145; PAB0571, GI:7450881, Yor305wp, GI:6324881; YraN, GI: 1176818; AtF16A14.4, GI:8778385; GP69, GI:465349) were used as seeds in PSI-BLAST  searches of the non-redundant (nr) database. For each sequence, the search was carried out in two versions: "conservative", with the expectation (e) value threshold for the retrieval of related sequences set to 10-6 and the maximum number of iterations set to 6, and "aggressive", with the e-value threshold of 10-2 and the maximum number of iterations set to 12. The final blast (blunt-end master-slave) alignments together with the predicted secondary structure were used to generate a set of query profile HMMs using HHmake from the HHsearch package . The profile HMMs corresponding to all COG, KOG , PFAM , PDB , CDD  and SMART  entries were downloaded from the home site of HHsearch (the Department of Developmental Biology (MPI), http://protevo.eb.tuebingen.mpg.de/). Comparison of the profile HMMs (sequence+structure) was carried out using HHsearch , with default parameters.
product of an open reading frame,
This analysis was funded by the MNiI (grant 3P04A01124 to JMB). JMB was also supported by the EMBO/HHMI Young Investigator Award. JK was supported by the NIH (grant 1R21CA97899-01A from the NCI). MF was supported by the NIH (Fogarty International Center grant R03 TW007163-01).
- Aggarwal AK: Structure and function of restriction endonucleases. Curr Opin Struct Biol 1995, 5: 11–19. 10.1016/0959-440X(95)80004-KView ArticlePubMedGoogle Scholar
- Bujnicki JM: Molecular phylogenetics of restriction endonucleases. Nucleic Acids and Molecular Biology. In Restriction Endonucleases. Volume 14. Edited by: Pingoud A. Edited by: Gross HJ. Berlin, Springer-Verlag; 2004:63–87.View ArticleGoogle Scholar
- Kovall RA, Matthews BW: Structural, functional, and evolutionary relationships between lambda-exonuclease and the type II restriction endonucleases. Proc Natl Acad Sci U S A 1998, 95: 7893–7897. 10.1073/pnas.95.14.7893PubMed CentralView ArticlePubMedGoogle Scholar
- Ban C, Yang W: Structural basis for MutH activation in E.coli mismatch repair and relationship of MutH to restriction endonucleases. Embo J 1998, 17: 1526–1534. 10.1093/emboj/17.5.1526PubMed CentralView ArticlePubMedGoogle Scholar
- Tsutakawa SE, Muto T, Kawate T, Jingami H, Kunishima N, Ariyoshi M, Kohda D, Nakagawa M, Morikawa K: Crystallographic and functional studies of very short patch repair endonuclease. Mol Cell 1999, 3: 621–628. 10.1016/S1097-2765(00)80355-XView ArticlePubMedGoogle Scholar
- Hickman AB, Li Y, Mathew SV, May EW, Craig NL, Dyda F: Unexpected structural diversity in DNA recombination: the restriction endonuclease connection. Mol Cell 2000, 5: 1025–1034. 10.1016/S1097-2765(00)80267-1View ArticlePubMedGoogle Scholar
- Nishino T, Komori K, Tsuchiya D, Ishino Y, Morikawa K: Crystal structure of the archaeal holliday junction resolvase Hjc and implications for DNA recognition. Structure (Camb) 2001, 9: 197–204. 10.1016/S0969-2126(01)00576-7View ArticleGoogle Scholar
- Bond CS, Kvaratskhelia M, Richard D, White MF, Hunter WN: Structure of Hjc, a Holliday junction resolvase, from Sulfolobus solfataricus. Proc Natl Acad Sci U S A 2001, 98: 5509–5514. 10.1073/pnas.091613398PubMed CentralView ArticlePubMedGoogle Scholar
- Middleton CL, Parker JL, Richard DJ, White MF, Bond CS: Substrate recognition and catalysis by the Holliday junction resolving enzyme Hje. Nucleic Acids Res 2004, 32: 5442–5451. 10.1093/nar/gkh869PubMed CentralView ArticlePubMedGoogle Scholar
- Hadden JM, Convery MA, Declais AC, Lilley DM, Phillips SE: Crystal structure of the Holliday junction resolving enzyme T7 endonuclease I. Nat Struct Biol 2001, 8: 62–67. 10.1038/83067View ArticlePubMedGoogle Scholar
- Nishino T, Komori K, Ishino Y, Morikawa K: X-ray and biochemical anatomy of an archaeal XPF/Rad1/Mus81 family nuclease: similarity between its endonuclease domain and restriction enzymes. Structure (Camb) 2003, 11: 445–457. 10.1016/S0969-2126(03)00046-7View ArticleGoogle Scholar
- Singleton MR, Dillingham MS, Gaudier M, Kowalczykowski SC, Wigley DB: Crystal structure of RecBCD enzyme reveals a machine for processing DNA breaks. Nature 2004, 432: 187–193. 10.1038/nature02988View ArticlePubMedGoogle Scholar
- Venclovas C, Timinskas A, Siksnys V: Five-stranded beta-sheet sandwiched with two alpha-helices: a structural link between restriction endonucleases EcoRI and EcoRV. Proteins 1994, 20: 279–282. 10.1002/prot.340200308View ArticlePubMedGoogle Scholar
- Newman M, Strzelecka T, Dorner LF, Schildkraut I, Aggarwal AK: Structure of restriction endonuclease BamHI and its relationship to EcoRI. Nature 1994, 368: 660–664. 10.1038/368660a0View ArticlePubMedGoogle Scholar
- Lukacs CM, Kucera R, Schildkraut I, Aggarwal AK: Understanding the immutability of restriction enzymes: crystal structure of BglII and its DNA substrate at 1.5 A resolution. Nat Struct Biol 2000, 7: 134–140. 10.1038/72405View ArticlePubMedGoogle Scholar
- Pingoud V, Sudina A, Geyer H, Bujnicki JM, Lurz R, Luder G, Morgan R, Kubareva E, Pingoud A: Specificity changes in the evolution of Type II restriction endonucleases: a biochemical and bioinformatic analysis of restriction enzymes that recognize unrelated sequences. J Biol Chem 2005, 280: 4289–4298. 10.1074/jbc.M409020200View ArticlePubMedGoogle Scholar
- Skirgaila R, Grazulis S, Bozic D, Huber R, Siksnys V: Structure-based redesign of the catalytic/metal binding site of Cfr10I restriction endonuclease reveals importance of spatial rather than sequence conservation of active centre residues. J Mol Biol 1998, 279: 473–481. 10.1006/jmbi.1998.1803View ArticlePubMedGoogle Scholar
- Bujnicki JM, Rychlewski L: Identification of a PD-(D/E)XK-like domain with a novel configuration of the endonuclease active site in the methyl-directed restriction enzyme Mrr and its homologs. Gene 2001, 267: 183–191. 10.1016/S0378-1119(01)00405-XView ArticlePubMedGoogle Scholar
- Pingoud V, Kubareva E, Stengel G, Friedhoff P, Bujnicki JM, Urbanke C, Sudina A, Pingoud A: Evolutionary relationship between different subgroups of restriction endonucleases. J Biol Chem 2002, 277: 14306–14314. 10.1074/jbc.M111625200View ArticlePubMedGoogle Scholar
- Tamulaitis G, Solonin AS, Siksnys V: Alternative arrangements of catalytic residues at the active sites of restriction enzymes. FEBS Lett 2002, 518: 17–22. 10.1016/S0014-5793(02)02621-2View ArticlePubMedGoogle Scholar
- Feder M, Bujnicki JM: Identification of a new family of putative PD-(D/E)XK nucleases with unusual phylogenomic distribution and a new type of the active site. BMC Genomics 2005, 6: 21. 10.1186/1471-2164-6-21PubMed CentralView ArticlePubMedGoogle Scholar
- Bujnicki JM: Crystallographic and bioinformatic studies on restriction endonucleases: inference of evolutionary relationships in the "midnight zone" of homology. Curr Protein Pept Sci 2003, 4: 327–337. 10.2174/1389203033487072View ArticlePubMedGoogle Scholar
- Bujnicki JM, Rychlewski L: Grouping together highly diverged PD-(D/E)XK nucleases and identification of novel superfamily members using structure-guided alignment of sequence profiles. J Mol Microbiol Biotechnol 2001, 3: 69–72.PubMedGoogle Scholar
- Bujnicki JM, Rychlewski L: The herpesvirus alkaline exonuclease belongs to the restriction endonuclease PD-(D/E)XK superfamily: insight from molecular modeling and phylogenetic analysis. Virus Genes 2001, 22: 219–230. 10.1023/A:1008131810233View ArticlePubMedGoogle Scholar
- Pingoud V, Conzelmann C, Kinzebach S, Sudina A, Metelev V, Kubareva E, Bujnicki JM, Lurz R, Luder G, Xu SY, Pingoud A: PspGI, a type II restriction endonuclease from the extreme thermophile Pyrococcus sp.: structural and functional studies to investigate an evolutionary relationship with several mesophilic restriction enzymes. J Mol Biol 2003, 329: 913–929. 10.1016/S0022-2836(03)00523-0View ArticlePubMedGoogle Scholar
- Chmiel AA, Bujnicki JM, Skowronek KJ: A homology model of restriction endonuclease SfiI in complex with DNA. BMC Struct Biol 2005, 5: 2. 10.1186/1472-6807-5-2PubMed CentralView ArticlePubMedGoogle Scholar
- Chmiel AA, Radlinska M, Pawlak SD, Krowarsch D, Bujnicki JM, Skowronek KJ: A theoretical model of restriction endonuclease NlaIV in complex with DNA, predicted by fold recognition and validated by site-directed mutagenesis and circular dichroism spectroscopy. Protein Eng Des Sel 2005, 18: 181–189. 10.1093/protein/gzi019View ArticlePubMedGoogle Scholar
- Pawlak SD, Radlinska M, Chmiel AA, Bujnicki JM, Skowronek KJ: Inference of relationships in the 'twilight zone' of homology using a combination of bioinformatics and site-directed mutagenesis: a case study of restriction endonucleases Bsp6I and PvuII. Nucleic Acids Res 2005, 33: 661–671. 10.1093/nar/gki213PubMed CentralView ArticlePubMedGoogle Scholar
- Soding J: Protein homology detection by HMM-HMM comparison. Bioinformatics 2005, 21: 951–960. 10.1093/bioinformatics/bti125View ArticlePubMedGoogle Scholar
- Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA: The COG database: an updated version includes eukaryotes. BMC Bioinformatics 2003, 4: 41. 10.1186/1471-2105-4-41PubMed CentralView ArticlePubMedGoogle Scholar
- Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bradley P, Bork P, Bucher P, Cerutti L, Copley R, Courcelle E, Das U, Durbin R, Fleischmann W, Gough J, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McDowall J, Mitchell A, Nikolskaya AN, Orchard S, Pagni M, Ponting CP, Quevillon E, Selengut J, Sigrist CJ, Silventoinen V, Studholme DJ, Vaughan R, Wu CH: InterPro, progress and status in 2005. Nucleic Acids Res 2005, 33 Database Issue: D201–5.Google Scholar
- Aravind L, Makarova KS, Koonin EV: SURVEY AND SUMMARY: Holliday junction resolvases and related nucleases: identification of new families, phyletic distribution and evolutionary trajectories. Nucleic Acids Res 2000, 28: 3417–3432. 10.1093/nar/28.18.3417PubMed CentralView ArticlePubMedGoogle Scholar
- Slupska MM, Chiang JH, Luther WM, Stewart JL, Amii L, Conrad A, Miller JH: Genes involved in the determination of the rate of inversions at short inverted repeats. Genes Cells 2000, 5: 425–437. 10.1046/j.1365-2443.2000.00341.xView ArticlePubMedGoogle Scholar
- Pieper U, Pingoud A: A mutational analysis of the PD...D/EXK motif suggests that McrC harbors the catalytic center for DNA cleavage by the GTP-dependent restriction enzyme McrBC from Escherichia coli. Biochemistry 2002, 41: 5236–5244. 10.1021/bi0156862View ArticlePubMedGoogle Scholar
- Pestova EV, Morrison DA: Isolation and characterization of three Streptococcus pneumoniae transformation-specific loci by use of a lacZ reporter insertion vector. J Bacteriol 1998, 180: 2701–2710.PubMed CentralPubMedGoogle Scholar
- Takeda K, Akimoto C, Kawamukai M: Effects of the Escherichia coli sfsA gene on mal genes expression and a DNA binding activity of SfsA. Biosci Biotechnol Biochem 2001, 65: 213–217. 10.1271/bbb.65.213View ArticlePubMedGoogle Scholar
- Tsutakawa SE, Jingami H, Morikawa K: Recognition of a TG mismatch: the crystal structure of very short patch repair endonuclease in complex with a DNA duplex. Cell 1999, 99: 615–623. 10.1016/S0092-8674(00)81550-0View ArticlePubMedGoogle Scholar
- Murzin AG: OB(oligonucleotide/oligosaccharide binding)-fold: common structural and functional solution for non-homologous sequences. Embo J 1993, 12: 861–867.PubMed CentralPubMedGoogle Scholar
- Nagase T, Ishikawa K, Suyama M, Kikuno R, Hirosawa M, Miyajima N, Tanaka A, Kotani H, Nomura N, Ohara O: Prediction of the coding sequences of unidentified human genes. XIII. The complete sequences of 100 new cDNA clones from brain which code for large proteins in vitro. DNA Res 1999, 6: 63–70.View ArticlePubMedGoogle Scholar
- Bujnicki JM: Understanding the evolution of restriction-modification systems: clues from sequence and structure comparisons. Acta Biochim Pol 2001, 48: 935–967.PubMedGoogle Scholar
- Davies GP, Martin I, Sturrock SS, Cronshaw A, Murray NE, Dryden DT: On the structure and operation of type I DNA restriction enzymes. J Mol Biol 1999, 290: 565–579. 10.1006/jmbi.1999.2908View ArticlePubMedGoogle Scholar
- Roberts RJ, Vincze T, Posfai J, Macelis D: REBASE--restriction enzymes and DNA methyltransferases. Nucleic Acids Res 2005, 33(Database Issue):D230–2. 10.1093/nar/gki029PubMed CentralView ArticlePubMedGoogle Scholar
- Carlson K, Kosturko LD: Endonuclease II of coliphage T4: a recombinase disguised as a restriction endonuclease? Mol Microbiol 1998, 27: 671–676. 10.1046/j.1365-2958.1998.00728.xView ArticlePubMedGoogle Scholar
- Brennan RG: The winged-helix DNA-binding motif: another helix-turn-helix takeoff. Cell 1993, 74: 773–776. 10.1016/0092-8674(93)90456-ZView ArticlePubMedGoogle Scholar
- Huai Q, Colandene JD, Topal MD, Ke H: Structure of NaeI-DNA complex reveals dual-mode DNA recognition and complete dimer rearrangement. Nat Struct Biol 2001, 8: 665–669. 10.1038/90366View ArticlePubMedGoogle Scholar
- Bujnicki JM: A model of structure and action of Sau3AI restriction endonuclease that comprises two MutH-like endonuclease domains within a single polypeptide. Acta Microbiol Pol 2001, 50: 219–231.PubMedGoogle Scholar
- Friedhoff P, Lurz R, Luder G, Pingoud A: Sau3AI, a monomeric type II restriction endonuclease that dimerizes on the DNA and thereby induces DNA loops. J Biol Chem 2001, 276: 23581–23588. 10.1074/jbc.M101694200View ArticlePubMedGoogle Scholar
- Kolade OO, Carr SB, Kuhlmann UC, Pommer A, Kleanthous C, Bouchcinsky CA, Hemmings AM: Structural aspects of the inhibition of DNase and rRNase colicins by their immunity proteins. Biochimie 2002, 84: 439–446. 10.1016/S0300-9084(02)01451-7View ArticlePubMedGoogle Scholar
- Milpetz F, Argos P, Persson B: TMAP: a new email and WWW service for membrane-protein structural predictions. Trends Biochem Sci 1995, 20: 204–205. 10.1016/S0968-0004(00)89009-XView ArticlePubMedGoogle Scholar
- Hofmann K, Stoffel W: TMBASE - A database of membrane spanning proteins segments. Biol Chem Hoppe Seyler 1993, 347: 166.Google Scholar
- James R, Penfold CN, Moore GR, Kleanthous C: Killing of E coli cells by E group nuclease colicins. Biochimie 2002, 84: 381–389. 10.1016/S0300-9084(02)01450-5View ArticlePubMedGoogle Scholar
- Leiman PG, Kanamaru S, Mesyanzhinov VV, Arisaka F, Rossmann MG: Structure and morphogenesis of bacteriophage T4. Cell Mol Life Sci 2003, 60: 2356–2370. 10.1007/s00018-003-3072-1View ArticlePubMedGoogle Scholar
- Kurowski MA, Bujnicki JM: GeneSilico protein structure prediction meta-server. Nucleic Acids Res 2003, 31: 3305–3307. 10.1093/nar/gkg557PubMed CentralView ArticlePubMedGoogle Scholar
- McGuffin LJ, Bryson K, Jones DT: The PSIPRED protein structure prediction server. Bioinformatics 2000, 16: 404–405. 10.1093/bioinformatics/16.4.404View ArticlePubMedGoogle Scholar
- Rost B, Yachdav G, Liu J: The PredictProtein server. Nucleic Acids Res 2004, 32: W321–6.PubMed CentralView ArticlePubMedGoogle Scholar
- Ouali M, King RD: Cascaded multiple classifiers for secondary structure prediction. Protein Sci 2000, 9: 1162–1176.PubMed CentralView ArticlePubMedGoogle Scholar
- Adamczak R, Porollo A, Meller J: Accurate prediction of solvent accessibility using neural networks-based regression. Proteins 2004, 56: 753–767. 10.1002/prot.20176View ArticlePubMedGoogle Scholar
- Cuff JA, Barton GJ: Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins 2000, 40: 502–511. 10.1002/1097-0134(20000815)40:3<502::AID-PROT170>3.0.CO;2-QView ArticlePubMedGoogle Scholar
- Meiler J, Baker D: Coupled prediction of protein secondary and tertiary structure. Proc Natl Acad Sci U S A 2003, 100: 12105–12110. 10.1073/pnas.1831973100PubMed CentralView ArticlePubMedGoogle Scholar
- Karplus K, Karchin R, Draper J, Casper J, Mandel-Gutfreund Y, Diekhans M, Hughey R: Combining local-structure, fold-recognition, and new fold methods for protein structure prediction. Proteins 2003, 53 Suppl 6: 491–496. 10.1002/prot.10540View ArticlePubMedGoogle Scholar
- Rychlewski L, Jaroszewski L, Li W, Godzik A: Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci 2000, 9: 232–241.PubMed CentralView ArticlePubMedGoogle Scholar
- Kelley LA, MacCallum RM, Sternberg MJ: Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol 2000, 299: 499–520. 10.1006/jmbi.2000.3741View ArticlePubMedGoogle Scholar
- Fischer D: Hybrid fold recognition: combining sequence derived properties with evolutionary information. Pacific Symp Biocomput 2000, 119–130.Google Scholar
- Shi J, Blundell TL, Mizuguchi K: FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol 2001, 310: 243–257. 10.1006/jmbi.2001.4762View ArticlePubMedGoogle Scholar
- Jones DT: GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J Mol Biol 1999, 287: 797–815. 10.1006/jmbi.1999.2583View ArticlePubMedGoogle Scholar
- Zhou H, Zhou Y: Single-body residue-level knowledge-based energy score combined with sequence-profile and secondary structure information for fold recognition. Proteins 2004, 55: 1005–1013. 10.1002/prot.20007View ArticlePubMedGoogle Scholar
- Lundstrom J, Rychlewski L, Bujnicki JM, Elofsson A: Pcons: a neural-network-based consensus predictor that improves fold recognition. Protein Sci 2001, 10: 2354–2362. 10.1110/ps.08501PubMed CentralView ArticlePubMedGoogle Scholar
- Kosinski J, Cymerman IA, Feder M, Kurowski MA, Sasin JM, Bujnicki JM: A "FRankenstein's monster" approach to comparative modeling: merging the finest fragments of Fold-Recognition models and iterative model refinement aided by 3D structure evaluation. Proteins 2003, 53 Suppl 6: 369–379. 10.1002/prot.10545View ArticlePubMedGoogle Scholar
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389–3402. 10.1093/nar/25.17.3389PubMed CentralView ArticlePubMedGoogle Scholar
- Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR: The Pfam protein families database. Nucleic Acids Res 2004, 32: D138–41. 10.1093/nar/gkh121PubMed CentralView ArticlePubMedGoogle Scholar
- Deshpande N, Addess KJ, Bluhm WF, Merino-Ott JC, Townsend-Merino W, Zhang Q, Knezevich C, Xie L, Chen L, Feng Z, Green RK, Flippen-Anderson JL, Westbrook J, Berman HM, Bourne PE: The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res 2005, 33(Database Issue):D233–7. 10.1093/nar/gki057PubMed CentralView ArticlePubMedGoogle Scholar
- Marchler-Bauer A, Anderson JB, DeWeese-Scott C, Fedorova ND, Geer LY, He S, Hurwitz DI, Jackson JD, Jacobs AR, Lanczycki CJ, Liebert CA, Liu C, Madej T, Marchler GH, Mazumder R, Nikolskaya AN, Panchenko AR, Rao BS, Shoemaker BA, Simonyan V, Song JS, Thiessen PA, Vasudevan S, Wang Y, Yamashita RA, Yin JJ, Bryant SH: CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res 2003, 31: 383–387. 10.1093/nar/gkg087PubMed CentralView ArticlePubMedGoogle Scholar
- Letunic I, Copley RR, Schmidt S, Ciccarelli FD, Doerks T, Schultz J, Ponting CP, Bork P: SMART 4.0: towards genomic data integration. Nucleic Acids Res 2004, 32: D142–4. 10.1093/nar/gkh088PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.