- Research article
- Open Access
Structure and computational analysis of a novel protein with metallopeptidase-like and circularly permuted winged-helix-turn-helix domains reveals a possible role in modified polysaccharide biosynthesis
© Das et al.; licensee BioMed Central Ltd. 2014
Received: 20 July 2013
Accepted: 4 March 2014
Published: 19 March 2014
CA_C2195 from Clostridium acetobutylicum is a protein of unknown function. Sequence analysis predicted that part of the protein contained a metallopeptidase-related domain. There are over 200 homologs of similar size in large sequence databases such as UniProt, with pairwise sequence identities in the range of ~40-60%. CA_C2195 was chosen for crystal structure determination for structure-based function annotation of novel protein sequence space.
The structure confirmed that CA_C2195 contained an N-terminal metallopeptidase-like domain. The structure revealed two extra domains: an α+β domain inserted in the metallopeptidase-like domain and a C-terminal circularly permuted winged-helix-turn-helix domain.
Based on our sequence and structural analyses using the crystal structure of CA_C2195 we provide a view into the possible functions of the protein. From contextual information from gene-neighborhood analysis, we propose that rather than being a peptidase, CA_C2195 and its homologs might play a role in biosynthesis of a modified cell-surface carbohydrate in conjunction with several sugar-modification enzymes. These results provide the groundwork for the experimental verification of the function.
CA_C2195 from Clostridium acetobutylicum [UniProtKB:Q97H19_CLOAB] is a novel 434-residue protein of unknown function. Initial sequence analysis suggested that this protein could be a metallopeptidase. A PSI-BLAST [1] search against UniProt revealed over 200 similar proteins of unknown function, with pairwise sequence identities to CA_C2195 of 40-60%. We present here the crystal structure of CA_C2195, determined as part of the Protein Structure Initiative program to extend structural coverage of novel protein sequence space and so provide structure-based function assignment [2, 3]. CA_C2195 was specifically targeted by the Joint Center for Structural Genomics (JCSG) in an effort to increase the structural coverage of proteins in Pfam [4] clan CL0035 of metallopeptidases (Peptidase MH/MC/MF), which has ~64000 protein sequences (including CA_C2195) in 12 families (Pfam v27.0, March 2013) but only limited (~0.2%), biased structural coverage. The families that form this clan contain many sequences, are functionally diverse, and are important in numerous biological processes. For example, recombinant bacterial carboxypeptidase G2 is used in cancer therapy to hydrolyze methotrexate [5] and is being tested in prodrug therapy, and human aspartoacylase is implicated in Canavan’s disease in the brain [6]. There are also non-peptidase homologs of these proteins: some have active catalytic domains but perform distinct, albeit related, enzymatic functions, such as the glutaminyl-peptide cyclotransferase. In other cases the homologous domains are not catalytically active and instead mediate protein-protein interactions, as in the transferrin receptor proteins 1 and 2. JCSG has determined ~20 structures to date from clan CL0035 (see http://www.topsan.org/Groups/Zinc_Peptidase). Proteins in these families [7, 8] have a broad phylogenetic spread across all kingdoms of life and show substantial sequence divergence.
The structure of CA_C2195 revealed that it is composed of three domains. Our sequence and structure analysis led to the assignment of these three domains of CA_C2195 and its homologs to new Pfam families (using standard Pfam protocols) [4], to be released in the next Pfam update, version 28.0: the N-terminal metallopeptidase-like domain to DUF4910 (Domain of Unknown Function, [Pfam:PF16254]), which is distantly related by sequence to the Peptidase_M28 family [Pfam:PF04389] in clan CL0035 (MEROPS [9] M28 family in the peptidase MH clan); the insert domain to DUF2172 [Pfam:PF09940] (a reassignment of the existing entry); and the C-terminal wHTH to HTH_47 [Pfam:PF16221]. We believe that our results may aid in the design of structure-based biochemical experiments to further explore the biology of these proteins, similar to other recent efforts on proteins of unknown function [10–15]. A recent study suggests that many DUF proteins are likely to be essential [16].
Results and discussion
Summary of crystal parameters, data collection and refinement statistics for PDB 3k9t
Unit cell parameters (Å): a = 153.78, b = 153.78, c = 168.38
Resolution range (Å)
No. of observations
No. of unique reflections
Mean I/σ(I)
R merge on I† (%)
R meas on I‡ (%)
R p.i.m. on I‡‡ (%)
Model and refinement statistics
Resolution range (Å)
No. of reflections (total)
No. of reflections (test)
Data set used in refinement
|F| > 0
R cryst ¶
R free ¶
Restraints (RMSD observed)
Bond angles (°)
Bond lengths (Å)
Average isotropic B value†† (Å²)
ESU‡‡‡ based on R free (Å)
Protein residues / atoms: 422 / 3386
Waters / Zn / Cl / Imd / MRD: 221 / 1 / 2 / 1 / 4
N-terminal metallopeptidase-like domain (DUF4910)
All known Peptidase_M28 members bind two Zn ions, which are described as “co-catalytic” because both participate in catalysis. In contrast, CA_C2195 has one bound Zn ion. An earlier study found that HmrA [PDB:3ram] [25], a Peptidase_M20 [Pfam:PF01546] protein (M20 and M28 peptidases are both in the MH clan and closely related to each other), also contained only one Zn ion, and that this might have been enough to change its specificity from that of an exopeptidase (aminopeptidase or carboxypeptidase, the predominant specificities in both M20 and M28) to that of an endopeptidase. Although HmrA contains only one Zn ion (it is not fully clear whether this reflects the physiological state or is an artifact of crystallization, in which case two Zn ions should be present), all five Zn-coordinating residues expected in Peptidase_M20 are conserved, which is not the case in CA_C2195. In CA_C2195, only the residues that bind the single Zn ion have been retained.
CA_C2195 does not possess the conventional Peptidase_M28 active site, as both of the essential, invariant active site residues have been replaced: Ser191 replaces the conserved Asp and Pro225 replaces the conserved Glu. Ser191 is conserved as Ser in 73 of the 82 homologs that were aligned and is either Ala or Gly in the remaining 9. Pro225 is conserved as Pro in 81 of the homologs and is Val in 1. All enzymes in Peptidase_M28, the closest known peptidase family by structure and sequence, have these residues conserved. There are over 550 non-peptidase M28 homologs in MEROPS, but only a few have been characterized. Those that have been characterized have evolved different functions, for example, the transferrin receptor proteins 1 and 2, and glutaminyl-peptide cyclotransferase. The glutaminyl-peptide cyclotransferase also has all five Zn-binding residues and both active site residues (Asp and Glu) conserved [26]; therefore, CA_C2195 is unlikely to have comparable catalytic activity. Transferrin in blood serum binds iron, which is internalized once transferrin docks to its receptor [27].
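The conservation counts quoted above amount to tallying the residues found in one column of the homolog alignment. The following Python sketch illustrates that tally under stated assumptions: the column strings are hypothetical stand-ins rather than the real alignment, the 5/4 split between Ala and Gly at position 191 is invented for illustration (the text gives only their combined count of 9), and `column_conservation` is an illustrative helper name, not part of any tool used in the study.

```python
from collections import Counter


def column_conservation(column):
    """Return (most common residue, fraction of sequences carrying it)."""
    counts = Counter(column)
    residue, n = counts.most_common(1)[0]
    return residue, n / len(column)


# Hypothetical alignment columns built from the counts given in the text.
# The Ala/Gly split (5/4) is an assumption; only their sum (9) is reported.
ser191_column = "S" * 73 + "A" * 5 + "G" * 4
pro225_column = "P" * 81 + "V" * 1

for name, column in [("Ser191", ser191_column), ("Pro225", pro225_column)]:
    residue, fraction = column_conservation(column)
    print(f"{name}: {residue} in {fraction:.1%} of {len(column)} homologs")
```

A tally like this only summarizes identity at a single position; it does not weight sequences by their evolutionary relatedness.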
Insert domain (DUF2172)
C-terminal wHTH domain (HTH_47)
As mentioned above, crystal packing analysis predicts a trimer as the oligomeric form in solution, which is supported by size-exclusion chromatography coupled with static light scattering. The trimeric assembly is formed by the interaction of residues in the wHTH domain (loop residues 362–368 and helix residues 389–393) with loop residues 62–64 in the insert domain and loop residues 302–305 and 293–294 in the metallopeptidase-like domain. Some of these residues forming the assembly in all 3 domains show high conservation, indicating that these are likely to be the key binding residues in the protein interaction interface. In particular, a substantial portion of the surface on one side of the wHTH appears to be responsible for mediating the monomer protein interactions in the oligomeric state, covering the majority of the highly conserved residues. These observations strongly suggest that the wHTH functions in mediating protein interactions in the oligomeric state.
Conserved gene neighborhoods point to a potential role in modified carbohydrate biosynthesis
Gene neighborhood analysis
N-acetylneuraminic acid synthase + SAF sugar-binding (condenses phosphoenolpyruvate and N-acetylmannosamine)
ATP-grasp amino acyl ligase
Sugar phosphate nucleotidyltransferase
DUF3880 + Glycosyltransferase
Nucleoside-diphosphate sugar epimerase
Peptidase-like (peptidase_MH superfamily)
Methyltransferase + Glycosyltransferase (currently annotated as: MAF_flag10, DUF115)
Acyl carrier protein
Aminosugar N-acetyltransferase + HAD Phosphatase
This linkage between a gene coding for a peptidase-like protein and a carbohydrate biosynthetic system could be explained in at least three alternative ways: 1) the CA_C2195 protein and its homologs are post-translationally glycosylated; 2) the DUF4910 domain cleaves target proteins alongside their modification by glycosylation; 3) the DUF4910 domain actually participates in the biosynthesis of a sugar-derived metabolite by catalyzing a reaction biochemically distinct from the classical peptidase reaction. Circumstantial evidence supports the third alternative. First, as discussed above, the CA_C2195-like genes do not seem to preserve the conventional metallopeptidase active site. Moreover, these genes are usually embedded in the middle of an operon with genes for carbohydrate-modifying enzymes on either side. Second, these operons do not show any linked genes coding for other potential target proteins. Third, in several cases these operons contain genes for a transmembrane carbohydrate export protein (related to the O-antigen and teichoic acid export proteins) and a transmembrane sugar pyruvyltransferase (Table 2, Additional file 1, Additional file 2). These proteins suggest that the modified carbohydrate is unlikely to be used to modify intracellular proteins; rather, it is likely to be translocated to the cell surface and used as part of a surface polysaccharide/lipopolysaccharide. In light of these observations, it is possible that DUF4910 is involved in modification of the sugar-derived metabolites, perhaps via transacylation of a peptide/glutamine to an amino sugar. In principle, these domains could also act in an amidase reaction for deacylation of a sugar amide, but this would imply that they utilize distinctive active site residues (see above).
TMPRED (http://www.ch.embnet.org/software/TMPRED_form.html) predicts one significant transmembrane helix in CA_C2195 (residues 192–213, inside to outside, score 557), but this segment is buried in the metallopeptidase-like domain and is therefore incorrectly predicted to be transmembrane. Phobius [34] predicts most of the protein to be extracellular, with a dip where the possible transmembrane helix might be. SignalP [35] fails to predict a signal peptide, so it is unknown how this protein reaches the periplasm or whether it is extracellular.
The crystal structure of CA_C2195 and subsequent sequence-structure-function analysis show that CA_C2195 (and ~200 homologs, ranging in sequence identity from 40-60%) is a three-domain protein, which includes a C-terminal wHTH domain and a DUF2172 domain inserted in the DUF4910 metallopeptidase-like domain. The presence of the PA domain-like DUF2172 domain shows similarity in domain architecture to some members of the Peptidase_M28 family [PDB:2ek8]. However, the presence of a C-terminal wHTH domain in CA_C2195 shows similarity to domain architectures found in Peptidase_M24 [PDB:1boa]. Analysis of sequence conservation reveals a cluster of non-sequential, highly conserved residues on the surface of the structure of CA_C2195, which are likely to be functionally important; some of those in the wHTH are involved in forming the protein interaction interface in the oligomeric form. It is possible that these proteins do not have any metallopeptidase activity, because they lack the catalytic residues expected from other characterized members of this peptidase clan. Based on gene neighborhood analysis, we propose that CA_C2195 and its homologs could be involved in the biosynthesis of modified carbohydrates. Given the importance of cell surface polysaccharides in inter-organismal interactions, further characterization of the biochemical activity of this protein is likely to be of interest in the case of pathogens that encode a CA_C2195-like gene, such as Brucella and Campylobacter.
Protein production and crystallization of CA_C2195 were carried out by standard JCSG protocols [36–38]. Data collection was performed at SSRL beamline 9–2. The crystal structure was determined by MAD phasing using a seleno-methionine-derivatized protein. X-ray data collection, processing, structure solution, tracing, crystallographic refinement and model building were performed using BLU-ICE [39], MOSFLM [40]/SCALA [41], SHELXD [42]/AUTOSHARP [43], ARP/wARP [44], REFMAC [45] and COOT [46]. To find homologs for sequence conservation analysis, PSI-BLAST was used to search the Uniref90 database in 3 iterations with an e-value cutoff of 0.0001, retaining a maximum of 150 homologs with 35-95% sequence identity; the alignment was built with MAFFT, and conservation was scored with the Bayesian calculation method and the JTT evolutionary substitution model, as implemented in CONSURF [47]. Figure 2 was prepared using Chimera (http://www.cgl.ucsf.edu/chimera) and all others were prepared using PyMOL [48]. The topology diagrams in Figure 7C are from PDBsum [49]. Gene neighborhoods were comprehensively analyzed using a custom Perl script with the CA_C2195 gene or its homologs as anchors. This script uses either the PTT file (downloadable from the NCBI ftp site) or, in the case of whole-genome shotgun sequences, the Genbank file to extract 20 gene neighbors on the 3’ and 5’ sides of a given query gene. The protein sequences of all neighbors were clustered using the BLASTCLUST program (ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html) to identify related sequences in gene neighborhoods. Each cluster of homologous proteins was then assigned an annotation based on its domain architecture or conserved shared domain, detected using Pfam models and in-house profiles run with RPS-BLAST [50]. This allowed an initial annotation of gene neighborhoods and their grouping based on conservation of neighborhood associations.
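The neighbor-extraction step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' Perl script: the `Gene` record, the `flanking_genes` function and the toy genome are hypothetical, the genes are assumed to have been parsed already from a PTT or Genbank file, and the replicon is treated as linear (no wrap-around for circular chromosomes).

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    locus: str   # locus tag, e.g. "CA_C2195"
    start: int   # start coordinate on the replicon
    strand: str  # "+" or "-"


def flanking_genes(genes: List[Gene], anchor_locus: str, flank: int = 20) -> List[Gene]:
    """Return up to `flank` genes on each side of the anchor, in genome order."""
    ordered = sorted(genes, key=lambda g: g.start)
    positions = [i for i, g in enumerate(ordered) if g.locus == anchor_locus]
    if not positions:
        raise ValueError(f"anchor {anchor_locus!r} not found")
    idx = positions[0]
    before = ordered[max(0, idx - flank):idx]    # truncated at the replicon start
    after = ordered[idx + 1:idx + 1 + flank]     # truncated at the replicon end
    return before + after


# Toy genome of 100 evenly spaced genes (hypothetical data).
toy_genome = [Gene(f"g{i}", i * 1000, "+") for i in range(100)]
window = flanking_genes(toy_genome, "g50")
print(len(window), window[0].locus, window[-1].locus)
```

The protein sequences of the genes returned for every anchor would then be pooled for clustering and domain annotation.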
In further analysis, care was taken to ensure that genes were unidirectional on the same strand of DNA and shared a putative common promoter before being counted as a single operon. If they were head to head on opposite strands, they were examined for potential bidirectional promoter sharing patterns. A total of 4789 representative bacterial and archaeal genomes were analyzed for the detection of CA_C2195 orthologs. These genomes spanned representatives of all currently known major lineages of bacteria and archaea. Of these, 229 genomes were identified as having CA_C2195 orthologs with gene neighborhoods, and further analysis was performed on this subset of genomes. Within this subset, conserved gene neighborhood associations were detected in 10 major bacterial clades, namely actinobacteria, firmicutes, cyanobacteria, planctomycetes, bacteroidetes, nitrospirae, alphaproteobacteria, betaproteobacteria, epsilonproteobacteria and spirochaetes. Using a simulation with sampling without replacement and an average genome size of 4000 genes, we found that the probability of the genes described above coming together in such neighborhoods by chance alone was p < 10⁻⁹. For all bioinformatics analyses that were performed using homologs within a family for comparison, the chosen sequences were well over the inclusion threshold for the family as built.
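The same-strand criterion for counting genes as one operon can be illustrated with a short Python sketch. It is a simplification under stated assumptions: `same_strand_run` is a hypothetical helper, the strand list is toy data, and only strand orientation is checked. The shared-promoter requirement described above is not modeled.

```python
from typing import List, Tuple


def same_strand_run(strands: List[str], anchor: int) -> Tuple[int, int]:
    """Return (first, last) indices of the contiguous run of genes
    that lie on the same strand as the gene at index `anchor`."""
    strand = strands[anchor]
    first = anchor
    while first > 0 and strands[first - 1] == strand:
        first -= 1
    last = anchor
    while last < len(strands) - 1 and strands[last + 1] == strand:
        last += 1
    return first, last


# Strands of consecutive genes in genome order (hypothetical data);
# the anchor gene sits at index 4.
strands = list("--+++++-+")
first, last = same_strand_run(strands, anchor=4)
print(f"candidate operon spans gene indices {first} to {last}")
```

Genes just outside the run that lie head to head on the opposite strand would be the ones to examine separately for bidirectional promoter sharing.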
Availability of supporting data
Atomic coordinates and experimental structure factors for CA_C2195 have been deposited in the Protein Data Bank (http://www.wwpdb.org) with PDB accession code 3k9t (DOI:10.2210/pdb3k9t/pdb).
We are grateful to the Sanford Burnham Medical Research Institute and University of California, San Diego, for hosting the DUF Annotation Jamboree in June 2013, which allowed the authors to collaborate on this work. We would like to thank all the other participants of this workshop for their intellectual contributions to this work: Herbert Axelrod, Yuanyuan Chang, Ruth Y. Eberhardt, William Hwang, Lukasz Jaroszewski, Padmaja Natarajan, Marco Punta, Daniel Rigden, Mayya Sedova, Anna Sheydina and John Wooley. We would like to thank Mayya Sedova for assistance in preparing some of the figures. We thank members of the JCSG High-Throughput Structural Biology pipeline for their contribution to this work. This work was supported in part by National Institutes of Health, USA, grant U54 GM094586 from the NIGMS Protein Structure Initiative to JCSG; National Science Foundation grants IIS-0646708 and IIS-1153617; UK Medical Research Council grant MC_U105192716 to AGM; intramural funds of the National Library of Medicine, USA, to LA; NIH grant R01GM101457 to AG; Howard Hughes Medical Institute to RDF; and Wellcome Trust grant WT077044/Z/05/Z for funding for open access charges. Portions of this research were carried out at the Stanford Synchrotron Radiation Lightsource, a Directorate of SLAC National Accelerator Laboratory and an Office of Science User Facility operated for the U.S. Department of Energy Office of Science by Stanford University. The SSRL Structural Molecular Biology Program is supported by the DOE Office of Biological and Environmental Research, and by the National Institutes of Health, National Institute of General Medical Sciences (including P41GM103393). The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of DOE, NSF, NIGMS or NIH.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
- Dessailly BH, Nair R, Jaroszewski L, Fajardo JE, Kouranov A, Lee D, Fiser A, Godzik A, Rost B, Orengo C: PSI-2: structural genomics to cover protein domain family space. Structure. 2009, 17 (6): 869-881. 10.1016/j.str.2009.03.015.
- Norvell JC, Berg JM: Update on the protein structure initiative. Structure. 2007, 15 (12): 1519-1522. 10.1016/j.str.2007.11.004.
- Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer EL, Tate J, Punta M: Pfam: the protein families database. Nucleic Acids Res. 2014, 42 (1): D222-D230.
- Rowsell S, Pauptit RA, Tucker AD, Melton RG, Blow DM, Brick P: Crystal structure of carboxypeptidase G2, a bacterial enzyme with applications in cancer therapy. Structure. 1997, 5 (3): 337-347. 10.1016/S0969-2126(97)00191-3.
- Bitto E, Bingman CA, Wesenberg GE, McCoy JG, Phillips GN: Structure of aspartoacylase, the brain enzyme impaired in Canavan disease. Proc Natl Acad Sci USA. 2007, 104 (2): 456-461. 10.1073/pnas.0607817104.
- Rawlings ND, Barrett AJ: Introduction: metallopeptidases and their clans. Handbook of Proteolytic Enzymes. Edited by: Rawlings ND, Salvesen GS. 2013, Amsterdam: Elsevier, 325-370. 3rd edition.
- Auld DS: Catalytic Mechanisms for metallopeptidases. Handbook of Proteolytic Enzymes. Edited by: Rawlings ND, Salvesen GS. 2013, Amsterdam: Elsevier, 370-396. 3rd edition.
- Rawlings ND, Barrett AJ, Bateman A: MEROPS: the database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res. 2012, 40 (Database issue): D343-D350.
- Hermann JC, Marti-Arbona R, Fedorov AA, Fedorov E, Almo SC, Shoichet BK, Raushel FM: Structure-based activity prediction for an enzyme of unknown function. Nature. 2007, 448 (7155): 775-779. 10.1038/nature05981.
- Das D, Lee WS, Grant JC, Chiu HJ, Farr CL, Vance J, Klock HE, Knuth MW, Miller MD, Elsliger MA, Deacon AM, Godzik A, Lesley SA, Kornfeld S, Wilson IA: Structure and function of the DUF2233 domain in bacteria and in the human mannose 6-phosphate uncovering enzyme. J Biol Chem. 2013, 288 (23): 16789-16799. 10.1074/jbc.M112.434977.
- Das D, Hervé M, Feuerhelm J, Farr CL, Chiu HJ, Elsliger MA, Knuth MW, Klock HE, Miller MD, Godzik A, Lesley SA, Deacon AM, Mengin-Lecreulx D, Wilson IA: Structure and function of the first full-length murein peptide ligase (Mpl) cell wall recycling protein. PLoS One. 2011, 6 (3): e17624. 10.1371/journal.pone.0017624.
- Das D, Moiani D, Axelrod HL, Miller MD, McMullan D, Jin KK, Abdubek P, Astakhova T, Burra P, Carlton D, Chiu HJ, Clayton T, Deller MC, Duan L, Ernst D, Feuerhelm J, Grant JC, Grzechnik A, Grzechnik SK, Han GW, Jaroszewski L, Klock HE, Knuth MW, Kozbial P, Krishna SS, Kumar A, Marciano D, Morse AT, Nigoghossian E, Okach L, et al: Crystal structure of the first eubacterial Mre11 nuclease reveals novel features that may discriminate substrates during DNA repair. J Mol Biol. 2010, 397 (3): 647-663. 10.1016/j.jmb.2010.01.049.
- Das D, Hyun H, Lou Y, Yokota H, Kim R, Kim SH: Crystal structure of a novel single-stranded DNA binding protein from Mycoplasma pneumoniae. Proteins. 2007, 67 (3): 776-782. 10.1002/prot.21340.
- Shin DH, Hou J, Chandonia JM, Das D, Choi IG, Kim R, Kim SH: Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center. J Struct Funct Genomics. 2007, 8 (2–3): 99-105.
- Goodacre NF, Gerloff DL, Uetz P: Protein domains of unknown function are essential in bacteria. mBio. 2013, 5 (1): e00744-13. 10.1128/mBio.00744-13.
- Diederichs K, Karplus PA: Improved R-factors for diffraction data analysis in macromolecular crystallography. Nat Struct Biol. 1997, 4 (4): 269-275. 10.1038/nsb0497-269.
- Cruickshank DW: Remarks about protein structure precision. Acta Crystallogr D Biol Crystallogr. 1999, 55 (Pt 3): 583-601.
- Weiss MS, Hilgenfeld R: On the use of the merging R factor as a quality indicator for X-ray data. J Appl Crystallogr. 1997, 30: 203-205. 10.1107/S0021889897003907.
- Weiss MS, Metzner HJ, Hilgenfeld R: Two non-proline cis peptide bonds may be important for factor XIII function. FEBS Lett. 1998, 423 (3): 291-296. 10.1016/S0014-5793(98)00098-2.
- Krissinel E, Henrick K: Inference of macromolecular assemblies from crystalline state. J Mol Biol. 2007, 372 (3): 774-797. 10.1016/j.jmb.2007.05.022.
- Krissinel E, Henrick K: Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr D Biol Crystallogr. 2004, 60 (Pt 12 Pt 1): 2256-2268.
- Aravind L, Anantharaman V, Balaji S, Babu MM, Iyer LM: The many faces of the helix-turn-helix domain: transcription regulation and beyond. FEMS Microbiol Rev. 2005, 29 (2): 231-262.
- Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJ, Chothia C, Murzin AG: Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 2008, 36 (Database issue): D419-D425.
- Botelho TO, Guevara T, Marrero A, Arede P, Fluxa VS, Reymond JL, Oliveira DC, Gomis-Ruth FX: Structural and functional analyses reveal that Staphylococcus aureus antibiotic resistance factor HmrA is a zinc-dependent endopeptidase. J Biol Chem. 2011, 286 (29): 25697-25709. 10.1074/jbc.M111.247437.
- Schilling S, Cynis H, von Bohlen A, Hoffmann T, Wermann M, Heiser U, Buchholz M, Zunkel K, Demuth HU: Isolation, catalytic properties, and competitive inhibitors of the zinc-dependent murine glutaminyl cyclase. Biochemistry. 2005, 44 (40): 13415-13424. 10.1021/bi051142e.
- Cheng Y, Zak O, Aisen P, Harrison SC, Walz T: Structure of the human transferrin receptor-transferrin complex. Cell. 2004, 116 (4): 565-576. 10.1016/S0092-8674(04)00130-8.
- Luo X, Hofmann K: The protease-associated domain: a homology domain associated with multiple classes of proteases. Trends Biochem Sci. 2001, 26 (3): 147-148. 10.1016/S0968-0004(00)01768-0.
- Finn RD, Clements J, Eddy SR: HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011, 39 (Web Server issue): W29-W37.
- Liu S, Widom J, Kemp CW, Crews CM, Clardy J: Structure of human methionine aminopeptidase-2 complexed with fumagillin. Science. 1998, 282 (5392): 1324-1327.
- Aravind L: Guilt by association: contextual information in genome analysis. Genome Res. 2000, 10 (8): 1074-1077. 10.1101/gr.10.8.1074.
- Huynen M, Snel B, Lathe W, Bork P: Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res. 2000, 10 (8): 1204-1210. 10.1101/gr.10.8.1204.
- Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de Crécy-Lagard V, Diaz N, Disz T, Edwards R, Fonstein M, Frank ED, Gerdes S, Glass EM, Goesmann A, Hanson A, Iwata-Reuyl D, Jensen R, Jamshidi N, Krause L, Kubal M, Larsen N, Linke B, McHardy AC, Meyer F, Neuweger H, Olsen G, Olson R, Osterman A, Portnoy V, et al: The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 2005, 33 (17): 5691-5702. 10.1093/nar/gki866.
- Kall L, Krogh A, Sonnhammer EL: Advantages of combined transmembrane topology and signal peptide prediction--the Phobius web server. Nucleic Acids Res. 2007, 35 (Web Server issue): W429-W432.
- Petersen TN, Brunak S, von Heijne G, Nielsen H: SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods. 2011, 8 (10): 785-786. 10.1038/nmeth.1701.
- Klock HE, Koesema EJ, Knuth MW, Lesley SA: Combining the polymerase incomplete primer extension method for cloning and mutagenesis with microscreening to accelerate structural genomics efforts. Proteins. 2008, 71 (2): 982-994. 10.1002/prot.21786.
- Lesley SA, Kuhn P, Godzik A, Deacon AM, Mathews I, Kreusch A, Spraggon G, Klock HE, McMullan D, Shin T, Vincent J, Robb A, Brinen LS, Miller MD, McPhillips TM, Miller MA, Scheibe D, Canaves JM, Guda C, Jaroszewski L, Selby TL, Elsliger MA, Wooley J, Taylor SS, Hodgson KO, Wilson IA, Schultz PG, Stevens RC: Structural genomics of the Thermotoga maritima proteome implemented in a high-throughput structure determination pipeline. Proc Natl Acad Sci USA. 2002, 99 (18): 11664-11669. 10.1073/pnas.142413399.
- Elsliger MA, Deacon AM, Godzik A, Lesley SA, Wooley J, Wuthrich K, Wilson IA: The JCSG high-throughput structural biology pipeline. Acta Crystallogr Sect F: Struct Biol Cryst Commun. 2010, 66 (Pt 10): 1137-1142.
- McPhillips TM, McPhillips SE, Chiu HJ, Cohen AE, Deacon AM, Ellis PJ, Garman E, Gonzalez A, Sauter NK, Phizackerley RP, Soltis SM, Kuhn P: Blu-Ice and the Distributed Control System: software for data acquisition and instrument control at macromolecular crystallography beamlines. J Synchrotron Radiat. 2002, 9 (Pt 6): 401-406.
- Collaborative Computing Project: The CCP4 suite: programs for protein crystallography. Acta Crystallogr. Sect. D Biol. Crystallogr. 1994, 50: 760-763. 10.1107/S0907444994003112.
- Collaborative Computing Project N: The CCP4 suite: programs for protein crystallography. Acta Crystallogr D Biol Crystallogr. 1994, 50 (Pt 5): 760-763.
- Sheldrick GM: A short history of SHELX. Acta Crystallogr A. 2008, 64 (Pt 1): 112-122.
- Vonrhein C, Blanc E, Roversi P, Bricogne G: Automated structure solution with autoSHARP. Methods Mol Biol. 2007, 364: 215-230.
- Langer G, Cohen SX, Lamzin VS, Perrakis A: Automated macromolecular model building for X-ray crystallography using ARP/wARP version 7. Nat Protoc. 2008, 3 (7): 1171-1179. 10.1038/nprot.2008.91.
- Winn MD, Murshudov GN, Papiz MZ: Macromolecular TLS refinement in REFMAC at moderate resolutions. Methods Enzymol. 2003, 374: 300-321.
- Emsley P, Cowtan K: Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr. 2004, 60 (Pt 12 Pt 1): 2126-2132.
- Ashkenazy H, Erez E, Martz E, Pupko T, Ben-Tal N: ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic Acids Res. 2010, 38 (Web Server issue): W529-W533.
- DeLano WL: The PyMOL Molecular Graphics System. 2008, Palo Alto, CA, USA: DeLano Scientific LLC.
- Laskowski RA: PDBsum new things. Nucleic Acids Res. 2009, 37 (Database issue): D355-D359.
- Schaffer AA, Wolf YI, Ponting CP, Koonin EV, Aravind L, Altschul SF: IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics. 1999, 15 (12): 1000-1011. 10.1093/bioinformatics/15.12.1000.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.