Linear array of conserved sequence motifs to discriminate protein subfamilies: study on pyridine nucleotide-disulfide reductases
© Avila et al; licensee BioMed Central Ltd. 2007
Received: 17 October 2006
Accepted: 16 March 2007
Published: 16 March 2007
The pyridine nucleotide-disulfide reductases (PNDR) are a large and heterogeneous protein family divided into two classes (I and II), which reflect the divergent evolution of its characteristic disulfide redox active site. However, not all PNDR members fit into these categories, which suggests the need for further studies to achieve a more comprehensive classification of this complex family.
A workflow to improve the clustering of protein families based on the linear array of conserved motifs was designed. The method was applied to the large PNDR family and found two main groups, which correspond to PNDR classes I and II. However, two other separate protein clusters, previously classified as class I in most databases, are outgrouped: the peroxide reductases (NAOX, NAPE) and the type II NADH dehydrogenases (NDH-2). Two novel PNDR classes are therefore proposed: class III for NAOX/NAPE and class IV for NDH-2. Through knowledge-driven analyses of biochemical and functional data on the new class IV, a linear array of motifs putatively related to Cu(II)-reductase activity was detected in a specific subset of NDH-2.
The results presented are a novel contribution to the classification of the complex and large PNDR protein family, supporting its reclusterization into four classes. The linear array of motifs detected within the class IV PNDR subfamily could be useful as a signature for a particular subgroup of NDH-2.
Sequence information and protein function
One of the main aims of computational biology is to infer protein function from sequence information. Clustering proteins into families that share functional characteristics and derive from a common ancestor is a key purpose of comparative sequence analyses. However, algorithms used to explore sequence similarity and to retrieve homologous proteins, such as BLAST or FASTA, are not sensitive enough to detect evolutionarily divergent members of large protein families, while PSI-BLAST increases sensitivity at the expense of specificity. In this scenario, an efficient strategy to detect more divergent sequences within protein families is the use of sequence motifs. These are highly conserved regions across a subset of proteins sharing the same function. In general, they play an important role in protein function and folding. Furthermore, several motifs may be arranged into fingerprints, which improve the detection of remote homologous proteins by reducing the "noise" that accompanies local alignment algorithms.
Databases of motifs and domains
The most widely used biological databases that explore and classify proteins according to their composition in patterns, motifs or domains are PROSITE, Pfam, BLOCKS, PRINTS and InterPro. InterPro is at present the database that includes the widest and most comprehensive classification of protein families, since it integrates the information about domains and motifs from most of the other databases. While all of these resources share a common interest in protein family classification using sequence similarity as a key factor, the focus of each database is different. In fact, the specific methods underpinning each of them are optimal for different purposes. Some databases use all-encompassing domain/profile-based approaches and are good at detecting members of divergent superfamilies (e.g., Pfam). Others use motif-based approaches that correspond to functional sites and are appropriate for detecting members of more specific subfamilies (e.g., PROSITE and PRINTS). Several of these databases also include structural information useful for the identification of globular protein domains. In general, the protein families proposed are quite large, having many members with rather different biochemical functions and activities. Therefore, it is important to try new ways to achieve an improved functional assignment within the protein families.
The PNDR protein family
The pyridine nucleotide disulfide reductases (PNDR) are a large and heterogeneous protein family with a characteristic disulfide redox-active site together with the NAD(P)H and FAD binding sites. In InterPro the PNDR family includes around 10,000 proteins divided into two main subfamilies: class I (IPR001100) with 6,701 sequences and class II (IPR000103) with 2,809 sequences. These two large classes are further divided into other subfamilies that incorporate subsets of proteins with some specific motifs or domains. For each subset, a different InterPro accession number is given. More restrictive motifs for placing a protein in the PNDR family are the "active site class I" and "active site class II", bearing InterPro IDs IPR012999 (1,608 sequences) and IPR008255 (701 sequences), respectively.
Since most proteins have multiple domains and motifs and InterPro gives a particular assignment for each of them, the highly conserved modules (domains or motifs) impose a strong bias on protein classification. In the case of the PNDR superfamily, all of the proteins include one NAD+ binding motif and two FAD binding motifs. These highly conserved motifs bring together a large number of nucleotide oxidoreductases that often have quite different functional activities. In contrast to all the InterPro PNDR class I and class II hits, only a very small subset of proteins is included in PROSITE and in PRINTS. The PROSITE active site pattern defined as PNDR class I (PS00076) includes 124 proteins and class II (PS00573) 73 proteins. The PRINTS signature defined as PNDR class I (PR00411) includes a true set of 102 sequences and class II (PR00469) 41 sequences.
Type II NADH dehydrogenase (NDH-2) is a member of the PNDR family that catalyzes electron transfer from NAD(P)H to quinones without energy transduction. A large number of organisms, ranging from archaea to eukaryotes, possess NDH-2 in addition to the canonical rotenone-sensitive type I NADH dehydrogenase. The NDH-2 enzyme could be considered a redundant protein, but it acquires other roles in certain organisms under some conditions. For example, in Escherichia coli, NDH-2 has Cu(II)-reductase activity, rendering the cells more tolerant of very high or very low copper concentrations in the culture media; in Azotobacter vinelandii, it protects the nitrogenase complex against O2 inhibition; in Methylococcus capsulatus, it mediates electron transfer to the membrane-bound methane monooxygenase. This plasticity could be associated with the presence of some specific functional motifs.
Linear array of motifs to classify the PNDR protein family
Considering that only a small proportion of the proteins clustered as PNDR class I and PNDR class II in InterPro have a PROSITE PNDR active site signature, a revision of the classes of this superfamily was performed. To achieve this, we applied a workflow to the whole PNDR family that resulted in the identification of two new outgroups: class III and class IV. The specific linear array of motifs provides a good fingerprint to describe each class. Furthermore, one of these fingerprints aided the analysis of PNDR class IV, leading to the discovery of a new linear motif that could explain the observed Cu(II)-reductase activity in a subset of the NDH-2 proteins.
Results and Discussion
Identification of four classes within the PNDR family
In order to achieve a comprehensive classification of the PNDR family the following workflow was designed and performed:
1.- All the sequences annotated as PNDR in the UniProt/SwissProt database were extracted and grouped into eleven initial protein groups or clusters based on their biochemical function: AHPF, bacterial alkyl hydroperoxide reductases; DHNA, NADH dehydrogenases or alkyl hydroperoxide reductases; DLDH, lipoamide dehydrogenase; GSHR, glutathione reductase; MERA, mercuric reductase; NAOX, NADH oxidase; NAPE, NADH peroxidase; NDH-2, NADH dehydrogenase-2; TRXB, prokaryotes, archaea and lower eukaryotes thioredoxin reductases; TRXR, higher eukaryotes thioredoxin reductases; TYTR, trypanothione reductase.
2.- For each of the eleven groups a multiple sequence alignment (MSA) was built and an HMM profile was derived. The number of proteins included in each MSA was: 5 AHPF, 41 DLDH, 25 GSHR, 13 MERA, 36 TRXB, 8 TRXR, 5 TYTR. UniProt/SwissProt does not have enough DHNA, NAOX, NAPE or NDH-2 sequences to construct an HMM profile. In order to obtain at least 5 candidates for these groups, the most referenced sequences in UniProt/TrEMBL were retrieved manually.
3.- To enrich the MSA of the eleven groups, new sequences were extracted from the UniProt/TrEMBL database using a two-way search method: (i) the database was scanned with the eleven HMM profiles; (ii) the newly found sequences were then compared with all the groups using a second round of BLAST and each was assigned to a given group according to the lowest BLAST E-value found. The number of proteins in each final cluster after filtering out redundancy was: 43 AHPF, 21 DHNA, 97 DLDH, 71 GSHR, 60 MERA, 61 NAOX, 11 NAPE, 56 NDH-2, 99 TRXB, 69 TRXR, 20 TYTR.
4.- Each cluster was submitted to MEME for the detection of conserved blocks. All MEME parameters were left at their defaults except for the maximum number of motifs, which was set to 10, based on the number of motifs present in the PNDRDTASE I (PR00411) and the PNDRDTASE II (PR00469) fingerprints from the PRINTS database.
5.- The groups were then globally analyzed by comparing all the conserved blocks using the LAMA tool. Sequences with most of their blocks in common were regrouped together and sent back to step 4 until stable groups were reached.
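The assignment rule in step 3(ii) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the tuple layout of the hits and the example E-values are hypothetical, and parsing of actual BLAST output is not shown.

```python
from typing import Dict, Iterable, Tuple


def assign_by_lowest_evalue(
    hits: Iterable[Tuple[str, str, float]]
) -> Dict[str, str]:
    """Map each sequence ID to the group of its lowest-E-value hit.

    hits: (sequence_id, group_name, e_value) tuples. This layout is an
    assumption for illustration, not the format used by the authors.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for seq_id, group, evalue in hits:
        # Keep the hit only if it beats the best E-value seen so far.
        if seq_id not in best or evalue < best[seq_id][1]:
            best[seq_id] = (group, evalue)
    return {seq_id: group for seq_id, (group, _) in best.items()}


# Invented example values, used only to exercise the function.
example_hits = [
    ("seqA", "GSHR", 1e-80),
    ("seqA", "TYTR", 1e-95),
    ("seqB", "NDH-2", 1e-40),
    ("seqB", "NAOX", 1e-12),
]
assignment = assign_by_lowest_evalue(example_hits)
```

Here each sequence ends up in the single group with the smallest E-value; how ties were handled in the original workflow is not stated, and this sketch simply keeps the first hit encountered.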
Phylogenetic analysis of the PNDR sequences
PNDR classes III and IV
The NADH peroxidases (new PNDR class III) are structural homologues of GSHR but they do not contain a C-x(2,4)-C redox active motif. Studies on NAPE from Enterococcus faecalis demonstrated that a modified single Cys residue (Cys-sulfenic acid, Cys42-SOH) plays the catalytic redox role. NAOX from E. faecalis shows similar mechanistic and structural features.
The type II NADH dehydrogenases (new PNDR class IV) correspond to a group of proteins widespread in nature. Only some of them have a C-x-x-C motif, and there is little information about its biochemical function. In NDH-2 from E. coli the C-x-x-C motif seems to be involved in copper binding and it has been related to the Cu(II)-reductase activity of this enzyme.
The fingerprint-based PNDR clustering proposed above is in agreement with the classical classification criteria, which assume that these enzymes originated by divergent evolution from an ancestral FAD/NAD(P)H oxidase and acquired their disulfide reductase activities independently. The segregation of the PNDR family into four classes with clear differences in their disulfide redox active sites supports an independent evolutionary path.
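The motifs mentioned in these two paragraphs, C-x(2,4)-C and C-x-x-C, are written in PROSITE-style notation. The minimal sketch below shows how such a pattern can be turned into a regular expression and used to scan a sequence. The helper names and the toy sequence are hypothetical, and only the pattern elements that appear here (literal residues, x, and repeat counts) are handled.

```python
import re
from typing import List, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Supports literal residues, 'x' (any residue) and repeat counts such
    as x(3) or x(2,4); other PROSITE syntax is deliberately rejected.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"([A-Za-z])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        token = "." if residue.lower() == "x" else residue.upper()
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched text) for each match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


# Toy sequence invented for illustration; it is not a real NDH-2.
toy = "MKTACGGCALLSAVKW"
cxxc_hits = find_motif(toy, "C-x-x-C")
```

Note that `re.finditer` reports non-overlapping matches only, which is enough to decide whether a sequence carries the motif at all.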
Specific linear motifs detected in NDH-2 PNDR class IV
Using this signature as a search motif, many other NDH-2s from bacteria that resist high copper concentrations were retrieved: Pseudomonas putida (57% ID against E. coli NDH-2), Pseudomonas syringae (54% ID), Ralstonia sp. (48% ID), Salmonella enterica serovar Typhimurium (97% ID), Erwinia amylovora (80% ID), Cupriavidus necator (50% ID) [29–33]. Since the pairwise identity (ID) in the full sequence alignment ranges from 99% to 22%, the conserved region found is considered statistically significant and can therefore be related to a specific function within the NDH-2 subfamily. High conservation of this region reflects a relatively high selection pressure throughout evolution. Further experimental studies will be needed to test the functional assignment predicted for the NDH-2 proteins.
The wide use of protein sequence analysis, together with the exponential growth of protein databases, justifies the demand for new approaches to protein comparison and functional classification. The presented strategy proves to be efficient for protein clustering based on the assignment of fingerprints (i.e. linear arrays of conserved motifs and domains), since it allows divergent members to be recognized within large protein families. The proposed method is applied to the large and complex PNDR protein family, improving its classification and detecting two previously undescribed outgroups. Moreover, a specific array of motifs detected in a subset of proteins within the class IV PNDR subfamily is proposed as a new signature for NDH-2 proteins that may have copper binding-redox activity.
In order to automate data processing, standard bioinformatics tools were used in addition to a few in-house Perl scripts.
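The percent identity (ID) values quoted above come from pairwise comparisons within an alignment. The text does not state which denominator convention was used, so the sketch below is an assumption: it counts identical residues over the columns where neither sequence has a gap. The function name and the toy aligned strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of the same alignment.

    Assumption: columns with a gap ('-') in either row are ignored, and
    the remaining columns form the denominator. Other conventions (for
    example dividing by the shorter sequence length) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned fragments invented for illustration.
identity = percent_identity("AC-DE", "ACWDQ")
```

In this toy case four columns are compared and three match, so the value is 75%.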
Construction and matching up of PNDR family fingerprints
Initial protein clusters were built using a keyword search in the UniProt/SwissProt database. MSAs were generated using ClustalX with default parameters. HMM profiles were created with the HMMER package. Initial clusters were further enriched by implementing a reciprocal best-match approach (i.e. extracting sequences from UniProt/TrEMBL based on an HMM profile search and performing the reverse BLAST search of each sequence against all clusters). Conserved blocks within each cluster were detected by MEME, setting the maximum number of blocks to 10 and leaving all other parameters at their defaults. Block-to-block comparison was performed using LAMA. Clusters having more than 8 blocks in common were fused together.
An unrooted tree for the PNDR family was generated with four randomly selected sequences from each of the initial clusters, i.e. AHPF, DHNA, DLDH, GSHR, MERA, NAOX, NAPE, NDH-2, TRXB, TRXR, TYTR. The neighbor-joining analysis was performed from an MSA with default parameters and with 1000 bootstrap trials using ClustalX. The results were visualized with TreeView. The maximum parsimony method was carried out using the PHYLIP package.
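The fusion rule stated above (clusters with more than 8 blocks in common are merged) can be sketched with a small union-find structure. This is a hedged illustration: the function name, the input layout and the cluster labels A to D with their shared-block counts are invented, and treating the merge as transitive is an assumption, since the text does not say how chains of similar clusters were handled.

```python
from typing import Dict, List, Set, Tuple


def fuse_clusters(
    names: List[str],
    shared_blocks: Dict[Tuple[str, str], int],
    threshold: int = 8,
) -> List[Set[str]]:
    """Merge clusters that share more than `threshold` conserved blocks.

    shared_blocks maps a pair of cluster names to the number of blocks
    they have in common (for example, counted from block comparisons).
    """
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Follow parent links to the representative, compressing paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), n_shared in shared_blocks.items():
        if n_shared > threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical cluster labels and counts, not results from the paper.
example_names = ["A", "B", "C", "D"]
example_shared = {("A", "B"): 9, ("B", "C"): 10, ("C", "D"): 3, ("A", "D"): 8}
fused = fuse_clusters(example_names, example_shared)
```

With these invented counts, A, B and C end up in one group and D stays alone, because 8 shared blocks does not exceed the threshold.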
Designing functional protein signature related to copper metabolism in NDH-2
NCBI's non-redundant database (nrdb) was scanned for NDH-2s using MAST fed with the PNDR class IV fingerprint, with an E-value cut-off of 10^-25. Only sequences bearing a C-x-x-C motif were kept, and redundancy was further removed based on a 90% sequence identity criterion. Motifs were extracted using the PRATT and MEME automatic algorithms, and further refined by manual inspection of the MSA. Pattern-based searches were performed using the ScanProsite standalone tool. The LOGO representation of sequence conservation was created with WebLogo.
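The three filters described above (E-value cut-off, presence of a C-x-x-C motif, and removal of redundancy at 90% identity) can be combined as in the following sketch. Several points are assumptions made only for illustration: the function names, the tuple layout of the hits, the greedy order of redundancy removal, the use of "greater than or equal to 90%" as the redundancy test, and above all the `crude_identity` helper, which uses Python's `difflib` as a rough stand-in for an alignment-based identity. The sequences and E-values are invented.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Tuple

CXXC = re.compile(r"C..C")  # regex form of the C-x-x-C motif


def crude_identity(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; NOT an alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def filter_hits(
    hits: Iterable[Tuple[str, str, float]],
    max_evalue: float = 1e-25,
    max_identity: float = 0.90,
    identity: Callable[[str, str], float] = crude_identity,
) -> List[Tuple[str, str]]:
    """Keep significant, C-x-x-C-bearing, non-redundant sequences.

    hits: (sequence_id, sequence, e_value) tuples (assumed layout).
    Hits are visited from best to worst E-value, and a hit is dropped
    when it is too similar to a sequence that was already kept.
    """
    kept: List[Tuple[str, str]] = []
    for seq_id, seq, evalue in sorted(hits, key=lambda h: h[2]):
        if evalue > max_evalue or not CXXC.search(seq):
            continue
        if any(identity(seq, other) >= max_identity for _, other in kept):
            continue
        kept.append((seq_id, seq))
    return kept


# Invented toy hits, used only to exercise each filter once.
toy_hits = [
    ("h1", "MKTACGGCALLSAVKWTTPE", 1e-60),
    ("h2", "MKTACGGCALLSAVKWTTPE", 1e-55),  # duplicate of h1
    ("h3", "MSTLLPAGGAVVDEKRW", 1e-40),     # no C-x-x-C motif
    ("h4", "MQRSCPLCDDEGHIKNYW", 1e-10),    # E-value above cut-off
    ("h5", "GHWCNPCRRYFQDLIMEV", 1e-30),
]
kept_ids = [seq_id for seq_id, _ in filter_hits(toy_hits)]
```

In practice the `identity` argument would be replaced by a function that computes identity from a proper pairwise alignment.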
List of abbreviations used
MSA: multiple sequence alignment
HMM: hidden Markov model
TRXR: higher eukaryotes thioredoxin reductases
TRXB: prokaryotes, archaea and lower eukaryotes thioredoxin reductases
AHPF: bacterial alkyl hydroperoxide reductases
DHNA: NADH dehydrogenases or alkyl hydroperoxide reductases
This research was supported by grants PIP 6399 from Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), PICT 22221 from Agencia Nacional de Promoción Científica y Tecnológica (ANPCyT), and 26/D-313 from Secretaría de Ciencia y Técnica de la Universidad Nacional de Tucumán (CIUNT) all in Argentina. CLA is a CONICET fellow. The authors are grateful to Ms. Amelia Campos for the English revision of the manuscript.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215: 403–410.
- Pearson WR: Using the FASTA program to search proteins and DNA sequence database. Methods Mol Biol 1994, 25: 365–389.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389–3402. 10.1093/nar/25.17.3389
- Lesk AM: Computational Molecular Biology. Volume 17. Edited by: Lesk AM. Oxford University Press, Oxford; 1988:In 26.
- Hofmann K, Bucher P, Falquet L, Bairoch A: The PROSITE database, its status in 1999. Nucleic Acids Res 1999, 27: 215–219. 10.1093/nar/27.1.215
- Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL, Marshall M, Sonnhammer EL: The Pfam Protein Families Database. Nucleic Acids Res 2002, 30: 276–280. 10.1093/nar/30.1.276
- Henikoff S, Henikoff JG, Pietrokovski S: Blocks: a non-redundant database of proteins alignments derived from multiple compilations. Bioinformatics 1999, 15: 471–479. 10.1093/bioinformatics/15.6.471
- Attwood TK, Bradley P, Flower DR, Gaulton A, Maudling N, Mitchell AL, Moulton G, Nordle A, Paine K, Taylor P, Uddin A, Zygouri C: PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Research 2003, 31: 400–402. 10.1093/nar/gkg030
- Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bradley P, Bork P, Bucher P, Cerutti L, Copley R, Courcelle E, Das U, Durbin R, Fleischmann W, Gough J, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McDowall J, Mitchel A, Nikolskaya AN, Orchard S, Pagni M, Ponting CP, Quevillon E, Selengut J, Sigrist CJ, Silventoinen V, Studholme DJ, Vaughan R, Wu CH: InterPro, progress and status in 2005. Nucleic Acids Res 2005, 33: D201–205. 10.1093/nar/gki106
- Argyrou A, Blanchard JS: Flavoprotein disulfide reductases: advances in chemistry and function. Prog Nucleic Acid Res Mol Biol 2004, 78: 89–142.
- Melo AM, Bandeiras TM, Teixeira M: New insights into type II NAD(P)H:quinone oxidoreductases. Microbiol Mol Biol Rev 2004, 68: 603–616. 10.1128/MMBR.68.4.603-616.2004
- Rapisarda VA, Rodriguez Montelongo L, Farias RN, Massa EM: Characterization of an NADH-linked cupric reductase activity from the Escherichia coli respiratory chain. Arch Biochem Biophys 1999, 370: 143–150. 10.1006/abbi.1999.1398
- Rodriguez-Montelongo L, Volentini SI, Farias RN, Massa EM, Rapisarda VA: The Cu(II)-reductase NADH dehydrogenase-2 of Escherichia coli improves the bacterial growth in extreme copper concentrations and increases the resistance to the damage caused by copper and hydroperoxide. Arch Biochem Biophys 2006, 451: 1–7. 10.1016/j.abb.2006.04.019
- Bertsova YV, Bogachev AV, Skulachev VP: Noncoupled NADH:ubiquinone oxidoreductase of Azotobacter vinelandii is required for diazotrophic growth at high oxygen concentrations. J Bacteriol 2001, 183: 6869–6874. 10.1128/JB.183.23.6869-6874.2001
- Cook SA, Shiemke AK: Evidence that a type-2 NADH:quinone oxidoreductase mediates electron transfer to particulate methane monooxygenase in Methylococcus capsulatus. Arch Biochem Biophys 2002, 398: 32–40. 10.1006/abbi.2001.2628
- Baldi P, Chauvin Y, Hunkapiller T, McClure MA: Hidden Markov models of biological primary sequence information. Proc Natl Acad Sci USA 1994, 91: 1059–1063. 10.1073/pnas.91.3.1059
- Bailey TL, Williams N, Misleh C, Li WW: MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 2006, (Web Server):W369–373. 10.1093/nar/gkl198
- Pietrokovski S: Searching databases of conserved sequence regions by aligning protein multiple-alignments. Nucleic Acids Res 1996, 24: 3836–3845. 10.1093/nar/24.19.3836
- Stehle T, Ahmed SA, Claiborne A, Schulz GE: Structure of NADH peroxidase from Streptococcus faecalis 10C1 refined at 2.16 Å resolution. J Mol Biol 1991, 221: 1325–1344.
- Mallett TC, Parsonage D, Claiborne A: Equilibrium analyses of the active-site asymmetry in enterococcal NADH oxidase: role of the cysteine-sulfenic acid redox center. Biochemistry 1999, 38: 3000–3011. 10.1021/bi9817717
- Rapisarda VA, Chehin RN, De Las Rivas J, Rodriguez-Montelongo L, Farias RN, Massa EM: Evidence for Cu(I)-thiolate ligation and prediction of a putative copper-binding site in the Escherichia coli NADH dehydrogenase-2. Arch Biochem Biophys 2002, 405: 87–94. 10.1016/S0003-9861(02)00277-1
- Bailey TL, Gribskov M: Combining evidence using p-values: application to sequence homology searches. Bioinformatics 1998, 14: 48–54. 10.1093/bioinformatics/14.1.48
- Jonassen I: Efficient discovery of conserved patterns using a pattern graph. Comput Appl Biosci 1997, 13: 509–522.
- Schneider TD, Stephens RM: Sequence logos: a new way to display consensus sequences. Nucleic Acids Res 1990, 18: 6097–6100. 10.1093/nar/18.20.6097
- Eggink G, Engel H, Vriend G, Terpstra P, Witholt B: Rubredoxin reductase of Pseudomonas oleovorans. Structural relationship to other flavoprotein oxidoreductases based on one NAD and two FAD fingerprints. J Mol Biol 1990, 212: 135–142. 10.1016/0022-2836(90)90310-I
- Ma K, Wang K: Binding of copper(II) ions to the polyproline II helices of PEVK modules of the giant elastic protein titin as revealed by ESI-MS, CD, and NMR. Biopolymers 2003, 70: 297–309. 10.1002/bip.10477
- Arnesano F, Banci L, Bertini I, Mangani S, Thompsett AR: A redox switch in CopC: an intriguing copper trafficking protein that binds copper(I) and copper(II) at different sites. Proc Natl Acad Sci USA 2003, 100: 3814–3819. 10.1073/pnas.0636904100
- Fisher N, Rich PR: A motif for quinone binding sites in respiratory and photosynthetic systems. J Mol Biol 2000, 296: 1153–1162. 10.1006/jmbi.2000.3509
- Chen X, Shi J, Chen Y, Xu X, Xu S, Wang Y: Tolerance and biosorption of copper and zinc by Pseudomonas putida CZ1 isolated from metal-polluted soil. Can J Microbiol 2006, 52: 308–316. 10.1139/W05-157
- Zhang L, Koay M, Maher MJ, Xiao Z, Wedd AG: Intermolecular Transfer of Copper Ions from the CopC Protein of Pseudomonas syringae. Crystal Structures of Fully Loaded Cu(I)Cu(II) Forms. J Am Chem Soc 2006, 128: 5834–5850. 10.1021/ja058528x
- Konstantinidis KT, Isaacs N, Fett J, Simpson S, Long DT, Marsh TL: Microbial diversity and resistance to copper in metal-contaminated lake sediment. Microb Ecol 2003, 45: 191–202. 10.1007/s00248-002-1035-y
- Lim SY, Joe MH, Song SS, Lee MH, Foster JW, Park YK, Choi SY, Lee IS: CuiD is a crucial gene for survival at high copper environment in Salmonella enterica serovar Typhimurium. Mol Cells 2002, 14: 177–184.
- Zhang Y, Jock S, Geider K: Genes of Erwinia amylovora involved in yellow color formation and release of a low-molecular-weight compound during growth in the presence of copper ions. Mol Gen Genet 2000, 264: 233–240. 10.1007/s004380000290
- Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTALX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 1997, 25: 4876–4882. 10.1093/nar/25.24.4876
- Eddy SR: HMMER: Profile hidden Markov models for biological sequence analysis. 2001. [http://hmmer.wustl.edu/]
- Page RDM: TREEVIEW: An application to display phylogenetic trees on personal computers. Computer Applications in the Biosciences 1996, 12: 357–358.
- PHYLIP package on POWER [http://power.nhri.org.tw]
- Gattiker A, Gasteiger E, Bairoch A: ScanProsite: a reference implementation of a PROSITE scanning tool. Applied Bioinformatics 2002, 1: 107–108.
- Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: A sequence logo generator. Genome Research 2004, 14: 1188–1190. 10.1101/gr.849004
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.