The Hotdog fold: wrapping up a superfamily of thioesterases and dehydratases
© Dillon and Bateman; licensee BioMed Central Ltd. 2004
Received: 30 April 2004
Accepted: 12 August 2004
Published: 12 August 2004
The Hotdog fold was initially identified in the structure of Escherichia coli FabA and subsequently in 4-hydroxybenzoyl-CoA thioesterase from Pseudomonas sp. strain CBS. Since that time structural determinations have shown a number of other apparently unrelated proteins also share the Hotdog fold.
Using sequence analysis we unify a large superfamily of HotDog domains. Membership includes numerous prokaryotic, archaeal and eukaryotic proteins involved in several related, but distinct, catalytic activities, from metabolic roles such as thioester hydrolysis in fatty acid metabolism, to degradation of phenylacetic acid and the environmental pollutant 4-chlorobenzoate. The superfamily also includes FapR, a non-catalytic bacterial homologue that is involved in transcriptional regulation of fatty acid biosynthesis.
We have defined 17 subfamilies, with some characterisation. Operon analysis has revealed numerous HotDog domain-containing proteins to be fusion proteins, where two genes, once separate but adjacent open-reading frames, have been fused into one open-reading frame to give a protein with two functional domains. Finally we have generated a Hidden Markov Model library from our analysis, which can be used as a tool for predicting the occurrence of HotDog domains in any protein sequence.
The HotDog domain is both an ancient and ubiquitous motif, with members found in the three branches of life.
Results and Discussion
Classifying the HotDog superfamily into subfamilies.
Number of Members
Proteobacteria : 67
Cytosolic long-chain acyl-CoA thioester hydrolase/ Brain acyl-CoA hydrolase
2 × HotDog
Fatty acid metabolism
Cytoplasmic acetyl-CoA hydrolase (including brown fat inducible thioesterase)
2 × HotDog, START
Fatty acid metabolism
Fatty acid metabolism
Probable medium chain acyl-CoA hydrolase
Fatty acid metabolism
(3R)-hydroxymyristoyl – (acyl-carrier-protein) dehydratase/ β-hydroxy-acyl ACP dehydratase (FabZ)
Lpx C in Q8KBX0
Fatty acid biosynthesis, lipid A biosynthesis (Q8KBX0)
Coronafacic acid (CFA) dehydratase
(R) specific enoyl-CoA hydratase (phaJ)
PHA synthase (phaC)
ADH short, SCP
Nodulation protein N
Cell envelope maintenance?
FabA-like dehydratases/ synthases
β-hydroxydecanoyl ACP dehydratase (FabA)
Unsaturated fatty acid biosynthesis
Omega-3 polyunsaturated fatty acid synthase (pfaC)
2 × HotDog, various numbers of BKAS N-term, BKAS-C-term & Acyl-transf. in Q93CG6, Q8YWH0, and Q9S1Z9
Polyunsaturated fatty acid biosynthesis
Fatty acid synthase
Acyl-transf., BKAS-N & BKAS-C
Fatty acid biosynthesis
Fatty acid synthesis
Fatty acid synthesis
E. coli acyl-CoA thioesterase II (tesB)
2 × HotDog, cNMP in Q8GYW7
Fatty acid metabolism
Human thioesterase II/ Peroxisomal acyl-CoA thioesterase
2 × HotDog
Fatty acid metabolism. Role in HIV infection?
4-hydroxybenzoyl-CoA thioesterase from Arthrobacter sp. strain SU and TM1
HAD domain in Q89YN2
Hypothetical protein bh3175
DRTGG, 2 × CBS
Phenylacetic acid degradation protein I
Phenylacetic acid metabolism
A. tumefaciens C58 3-hydroxyacyl-CoA dehydrogenase
3HCDH_N, 3HCDH in Q8UJY0 and Q92NF5
Fatty acid metabolism
E. coli YiiD protein
Transcription factor FapR
Transcriptional regulation of fatty acid metabolism
Metazoa: 10 Proteobacteria: 1
Mesenchymal stem cell protein DSCD75
Hypothetical protein ybaW
Hypothetical protein RSp0367
2 × AMP-bind
Proteobacteria: 4 Firmicutes: 1
4-hydroxybenzoyl-CoA thioesterase from Pseudomonas sp. strain CBS-3 and DJ-12
Degradation of aromatic compounds?
The Pfam database  contains a Thioesterase superfamily with 697 members, each member containing a 4HBT domain (accession: PF03061) corresponding to the HotDog domain. The SCOP database  contains a thioesterase/thiol ester dehydrase-isomerase superfamily, divided into 5 families, namely the 4HBT-like, beta-hydroxydecanoyl thiol ester dehydrase, Thioesterase II (TesB), MaoC dehydratase and PaaI/YdiI-like families.
Acyl-CoA thioesterase subfamily
The largest subfamily comprises over a hundred acyl-CoA thioesterases that are widespread among prokaryotes, with members also found in eukaryotes. This group of enzymes catalyzes the hydrolysis of acyl-CoA thioesters to free fatty acids and coenzyme A (CoA-SH). The subfamily includes thioesterases with activity towards medium and long chain acyl-CoAs (medium chain acyl-CoA hydrolase and cytosolic long-chain acyl-CoA hydrolase/brain acyl-CoA hydrolase (BACH) respectively) and also cytoplasmic acetyl-CoA hydrolase (CACH), which hydrolyzes acetyl-CoA to acetate and CoA-SH. Brown-fat-inducible thioesterase (BFIT), a cold-induced protein found in brown adipose tissue (BAT), is also included in this group. Both BFIT and CACH possess a StAR-related lipid-transfer (START) domain that is involved in lipid binding, consistent with the role of BFIT and CACH in lipid metabolism. Duplication of the HotDog domain and recruitment of the START domain seem to be mammalian innovations.
FabZ like dehydratase subfamily
Members of this subfamily are found in a wide range of bacteria and sporadically in eukaryotes. In E. coli the products of the fab operon catalyze the four sequential reactions necessary for each round of fatty acid elongation. The third step in each cycle of fatty acid elongation is the dehydration of the β-hydroxyacyl-ACP intermediate by β-hydroxyacyl-[acyl carrier protein] dehydratase (FabZ) to give trans-2-decenoyl-ACP. FabZ is effective at dehydrating both short-chain and long-chain saturated and unsaturated pathway intermediates.
This subfamily also contains a dehydratase component of the coronafacic acid (CFA) biosynthetic cluster encoded by the cfa2 gene [20, 21]. CFA is the polyketide constituent of a phytotoxin called coronatine, which is a virulence factor of Pseudomonas syringae, a plant pathogen that causes disease in many agriculturally important plants.
MaoC dehydratase-like subfamily
The maoC gene forms an operon with the maoA gene in E. coli and encodes an enoyl-CoA hydratase involved in supplying (R)-3-hydroxyacyl-CoA from the fatty acid oxidation pathway to polyhydroxyalkanoate (PHA) biosynthetic pathways in fadB mutant E. coli strains. It was identified through its homology to P. aeruginosa (R)-specific enoyl-CoA hydratase (PhaJ1). PHAs are polyesters of (R)-hydroxyalkanoic acids, synthesized by numerous bacteria as an intracellular carbon and energy storage material when carbon sources are in excess, with intermediates of fatty acid metabolism such as enoyl-CoA, (S)-3-hydroxyacyl-CoA, and 3-ketoacyl-CoA acting as precursors for PHA biosynthesis. The crystal structure of the (R)-specific enoyl-CoA hydratase (PhaJ) from Aeromonas caviae has shown that this enzyme also contains a Hotdog fold. The C-terminal HotDog domain of E. coli MaoC is most likely responsible for its enoyl-CoA hydratase activity. MaoC also contains an N-terminal short-chain dehydrogenase domain, involved in catalysing dehydrogenation of a variety of aliphatic and aromatic aldehydes using NADP as a cofactor. This subfamily also includes the human 17 β-hydroxysteroid dehydrogenase (17 β HSD) type 4, one of four different human 17 β HSDs that catalyze the redox reactions at position C17 of steroid molecules, one of the final steps in androgen and estrogen biosynthesis [24, 25]. We also include a NodN-like sub-subfamily here that is found in another cluster containing several other MaoC proteins. Rhizobium and related species form nodules on the roots of their legume hosts, a symbiotic process that requires production of Nod factors, which are signal molecules involved in root hair deformation and meristematic cell division. The nodulation gene products, including NodN, are involved in producing the Nod factors; however, the role played by NodN is unclear.
This subfamily contains a large number of proteins about which very little is known, with the exception of the YbgC protein. The YbgC protein of the tol-pal cluster in the gamma-proteobacterium Haemophilus influenzae has been shown to catalyze the hydrolysis of short-chain aliphatic acyl-CoA thioesters. The tol-pal cluster is well conserved across Gram-negative bacteria and is important for the maintenance of cell envelope integrity. We therefore hypothesize that uncharacterized members of this subfamily are thioesterases.
The Asp17 residue is conserved in YbgC from Haemophilus influenzae and Pseudomonas aeruginosa, along with the backbone amide NH of Tyr24, suggestive of a nucleophilic attack mechanism very similar to the Pseudomonas sp. strain CBS-3 thioesterase mechanism discussed below in the 4HBT class I section.
The dehydration of the β-hydroxyacyl-ACP protein intermediate during the third step in each cycle of fatty acid elongation can be catalyzed by β-hydroxydecanoyl-ACP dehydratase/isomerase (FabA), as well as by FabZ, to give trans-2-decenoyl-ACP. FabA is uniquely able to isomerise trans-2-decenoyl-ACP to cis-3-decenoyl ACP, initiating unsaturated fatty acid biosynthesis  and is specific for acyl ACPs of 9–11 carbons in length.
Polyketides are a large and structurally diverse class of natural products, produced mainly by soil-dwelling bacteria such as Pseudomonas spp. and Streptomyces spp. They include clinically useful drugs such as the antibiotic erythromycin A and the immunosuppressants FK506 and rapamycin. The biosynthesis of polyketides is very similar to that of fatty acids, and polyketide synthases (PKSs) have been classified as type I or type II according to their similarity to fatty acid synthases (FASs). Most bacteria and plants use a highly conserved type II FAS system, which uses a distinct enzyme for each reaction. This is in contrast to the mammalian type I system (also used by fungi and some mycobacteria), which uses one multifunctional polypeptide to catalyze all reactions [29, 30]. The HotDog domain is found in type II fatty acid synthesis in bacteria (FabA/FabZ), but also in a small number of bacterial type I polyketide synthases, which are composed of several modules, such as β-keto-acyl synthases and omega-3 polyunsaturated fatty acid synthase (PfaC). The marine bacteria Shewanella sp. SCRC-2738, Moritella marina strain MP-1 and Photobacterium profundum strain SS9 contain an eicosapentaenoic acid (EPA) biosynthetic cluster (pfaA-D), responsible for the synthesis of this omega-3 polyunsaturated fatty acid (PUFA) [32, 33]. The PfaC protein contains two HotDog domains (see Figure 2 for the domain organisation found in P. profundum), which are also found in the eukaryotic marine protist Schizochytrium, suggesting that the PUFA synthetic cluster has undergone lateral gene transfer.
This subfamily also includes several fatty acid synthase proteins from bacteria, such as Mycobacterium bovis fatty acid synthase. This multifunctional protein is capable of catalysing de novo synthesis and chain elongation of fatty acids  and has a very similar domain architecture to the polyunsaturated fatty acid synthases, as it contains an acyl-transferase, β-keto acyl synthase N and C-terminal domains (see Figure 2).
The catalytic residues of FabA's bifunctional active site are His70 and Asp84. His70 is conserved in the FabZ dehydratase, but Asp84 is replaced with glutamate. This replacement may be responsible for FabZ's inability to catalyze the isomerization reaction.
Fat subfamily Acyl-ACP thioesterases
In plants, fatty acid synthesis occurs in the stroma of plastids, where the acyl chains are bound to the acyl carrier protein (ACP) during extension cycles. Acyl-ACP thioesterases terminate fatty acid synthesis in plants by hydrolysing the thioester bond between an acyl moiety and the ACP. In higher plants acyl-ACP thioesterases have been classified into two gene classes, fatA and fatB, based on sequence similarity and substrate specificities [37, 38]. Arabidopsis FatA displays highest activity towards oleoyl-ACP whereas Arabidopsis FatB is most active towards palmitoyl-ACP. This subfamily contains both FatA and FatB members. The proteins in this subfamily range in length from 240 to 400 amino acids and therefore we hypothesized that they might contain two HotDog domains, located in the N- and C-terminal halves. By splitting the sequence of proteins from this subfamily into an N-terminal half and a C-terminal half we were readily able to detect the relationship to other subfamilies using PSI-BLAST with query proteins such as Q899Q1 and Q42714, confirming our hypothesis.
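The half-splitting step described above can be sketched in a few lines of Python. This is only an illustration: the exact split point used in the analysis is not stated, so a midpoint split is assumed, and the function names, the toy sequence and the FASTA record names are hypothetical.

```python
from typing import Tuple


def split_halves(sequence: str) -> Tuple[str, str]:
    """Split a protein sequence at its midpoint (assumed split point).

    Whitespace is removed first; for odd lengths the extra residue
    goes to the C-terminal half.
    """
    seq = "".join(sequence.split()).upper()
    mid = len(seq) // 2
    return seq[:mid], seq[mid:]


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed line width."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Toy sequence (not a real Fat protein), used only to show the mechanics.
toy = "MKTAYIAKQR" * 3
n_half, c_half = split_halves(toy)
n_record = to_fasta("toy_Nterm", n_half)
c_record = to_fasta("toy_Cterm", c_half)
```

Each record could then be submitted as a separate query to a profile search such as PSI-BLAST, so that a hit to either half is reported independently of the other.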
This subfamily contains the E. coli medium chain length acyl-CoA thioesterase II encoded by the tesB gene, which is a close homolog of the human thioesterase II (hTE) enzyme. hTE catalyzes the hydrolysis of palmitoyl-CoA to CoA and palmitate and was identified as a human T cell protein that binds to the myristoylated HIV-1 Nef protein, correlating with Nef-mediated CD4 down-regulation. hTE could regulate targeting of the cytoplasmic Nef protein to the plasma membrane, which depends on a lipid modification (a myristoylation anchor), and recombinant hTE shows maximal activity with myristoyl-CoA. However, further studies have shown that hTE localizes to peroxisomes [40, 41], dependent on a C-terminal peroxisomal targeting sequence, SKL, and that coexpression of Nef and hTE results in relocation of Nef to peroxisomes, so the role of Nef and hTE during HIV infection remains unresolved.
The catalytic site of E. coli thioesterase II was identified by site-directed mutagenesis and involves a hydrogen-bonded triad of Asp204, Thr228, and Gln278, which synergistically activate a water molecule for nucleophilic attack on the thioester carbonyl carbon of medium chain length acyl-CoA substrates. This is a novel reaction mechanism for a thioesterase and differs from the nucleophilic mechanisms used by β-hydroxydecanoyl dehydratase and by the 4HBT thioesterases of Pseudomonas and Arthrobacter discussed below. This subfamily is found in bacteria and eukaryotes.
4HBT class II subfamily
This subfamily includes 4-hydroxybenzoyl-CoA thioesterase (4HBT) from Arthrobacter sp. strains SU and TM1, encoded by the fcbC gene. The Pseudomonas thioesterase uses the Asp17 residue to mediate the hydrolysis reaction, as discussed below in the 4HBT class I section. Gln58 from Arthrobacter corresponds to the Asp17 residue in Pseudomonas, but inspection of the Arthrobacter strain SU active site has revealed the catalytic base (or nucleophile) to be Glu73, on the opposite side of the substrate binding pocket to Asp17. In addition, the Pseudomonas thioesterase dimers form a tetramer with their long α-helices facing inwards, in contrast to the Arthrobacter thioesterase, where the dimers form a tetramer with their long α-helices facing outwards. In both the Pseudomonas and Arthrobacter thioesterases, the 4-hydroxyphenacyl moieties are oriented so that the thioester C=O interacts with the α-helical N-terminus by hydrogen bonding to a backbone amide NH (Tyr24 in Pseudomonas and Gly65 in Arthrobacter), and it is this contact that polarizes the C=O for nucleophilic attack. While the structure of the Arthrobacter sp. strain SU thioesterase displays a similar Hotdog-fold topology to the 4HBT class I Pseudomonas enzyme, the enzymes differ at the level of catalytic platform, CoA binding site and quaternary structure [3, 42]. This is not an unexpected finding, as Todd et al. found that 12 of the 31 superfamilies they analyzed displayed positional variation for residues playing equivalent catalytic roles.
A surprising inclusion in this subfamily is the ComA2 protein from Bacillus subtilis. ComA is a response regulator and transcription factor that, together with the histidine kinase ComP, constitutes a two-component signal transduction system required for the development of competence. The comA locus is composed of two ORFs. ComA2 is cotranscribed with ComA1; ComA1 is required for competence while ComA2 is not, and so the role of the HotDog domain in this protein remains a mystery.
The phenylacetic acid (PA) catabolic pathway in E. coli has been characterised and found to contain 14 genes, allowing catabolism of this aromatic compound into likely Krebs cycle intermediates. The paa operon in E. coli encodes PaaI, which is probably a thioesterase involved in the catabolism of PA. Catabolism of PA in E. coli begins with an activation step in which phenylacetyl-CoA ligase, PaaK, converts phenylacetate into phenylacetyl-CoA. 4-chlorobenzoate-CoA ligase catalyzes a similar reaction at the first step of the 4-chlorobenzoate degradation pathway. The thioesterase, PaaI, may be involved in a reaction similar to the last step in the degradation of 4-chlorobenzoate (see 4HBT class I below); however, this remains to be demonstrated.
This small subfamily is restricted to firmicutes. FapR is a highly conserved transcriptional regulator found in many gram-positive organisms, including all species of Bacillus . It controls expression of genes involved in type II fatty acid and phospholipid biosynthesis, by binding to a consensus promoter sequence of the fap regulon and acting as a negative regulator. Malonyl-CoA, an intermediate in the lipid biosynthetic pathway, controls FapR. The HotDog domain has likely retained its substrate specificity for malonyl-CoA, but appears to have lost its catalytic ability, in common with the ligand binding domain of other transcriptional regulators. FapR contains a helix-turn-helix motif at the N-terminus (see Figure 2), which is similar to the DeoR transcriptional regulator family (data not shown), consistent with its role as a DNA binding protein.
4HBT class I subfamily
The crystal structure of 4HBT from the soil-inhabiting bacterium Pseudomonas sp. strain CBS-3  has helped define the HotDog domain. A lot of attention has been focused on this microorganism because of its ability to survive on 4-chlorobenzoate (4CBA) as its only source of carbon . 4CBA is a by-product of microbial degradation of industrial pollutants such as DDT and polychlorinated biphenyl herbicides  and this strain of Pseudomonas may be used as a bioremediation agent for degrading 4CBA. Pseudomonas sp. strain CBS-3 contains an fcb operon responsible for hydrolytic dechlorination of 4CBA, with 4CBA-CoA ligase (FcbA), 4CBA-CoA dehalogenase (FcbB), and 4HBT (FcbC) catalyzing sequential reactions that result in the degradation of 4CBA to 4-hydroxybenzoate. The thioesterase catalyzes the third step in the degradation pathway, which is the hydrolysis of the 4-hydroxybenzoyl-CoA thioester moiety to give 4-hydroxybenzoate and CoA .
4HBT from Pseudomonas sp. strain DJ-12  is also found in this subfamily. The organization of the fcb operon in strain DJ-12 is different from that observed in strain CBS-3. The fcb genes are organised as B-A-C in both strains but strain DJ-12 has three ORFs between A and C called T1, T2, and T3 that are unique to this strain. These three genes are similar to the C4-dicarboxylate transport system in Rhodobacter capsulatus, suggesting that they may encode membrane proteins involved in the uptake of 4CBA . This is in contrast to the gene organisation observed in the 4HBT class II, where Arthrobacter sp. strain SU and strain TM1 have an A-B-C order . There is a duplication of the cluster in strain SU, where it is found on a plasmid, whereas only one copy exists in strain TM1, where it is located chromosomally. Both operons contain a T gene located at the end of the cluster, possibly involved in 4CBA uptake.
Bacillus halodurans C-125 contains a gene called BH1999, encoding a novel gentisyl-CoA thioesterase, which catalyzes the hydrolysis of gentisyl-CoA (2,5-dihydroxybenzoyl-CoA) [4, 52] to yield gentisate (2,5-dihydroxybenzoate). BH1999 is found in a gentisate oxidation pathway gene cluster in B. halodurans. Gentisate has been implicated as an intermediate in the degradation of several industrial aromatic compounds.
Gentisyl-CoA thioesterase and 4HBT from Pseudomonas perform different physiological functions but remain in the same subfamily because they are closely related. The active site residues Asp16 and Asp31 of gentisyl-CoA thioesterase align with Asp17 and Asp32 of 4HBT. These are crucial residues that are proposed to function in nucleophilic catalysis and substrate binding respectively. Loss of Asp17 in the Pseudomonas enzyme effectively halts catalysis, while loss of the corresponding Asp16 residue in the Bacillus halodurans enzyme only reduces its catalytic rate by 230-fold, perhaps indicating that the hydrolysis reaction does not proceed through the Asp16-mediated equivalent of the nucleophilic attack mechanism previously proposed for Asp17 [53, 4]. Asp17 in Pseudomonas strain CBS-3 has been suggested to participate in nucleophilic catalysis rather than general base catalysis on the basis of three observations: the Asp17 carboxylate lies 3.2 Å from the substrate thioester C=O, its trajectory is aligned for attack, and no water molecule is present near the reaction centre [9, 53]. Asp32 in Pseudomonas interacts with the benzoyl OH of 4-hydroxybenzoyl-CoA and perhaps Asp31 plays a similar role.
Other subfamilies/ members
In the above sections we have described the 11 subfamilies that have some functional characterization. In this section we describe the other 6 subfamilies that have no functional characterization, except they are associated with other domains or have been structurally characterized.
The CBS associated subfamily contains the hypothetical protein BH3175 from Bacillus halodurans. The BH3175 protein contains two homologous copies of the CBS domain . Scott et al. have recently shown that tandem pairs of CBS domains act as sensors of cellular energy status by binding AMP, ATP, or S-adenosyl methionine and mutations in CBS domains impair this binding in several hereditary disorders . Although we do not know the substrate or activity of this subfamily of the HotDog superfamily, we can suggest that this step is regulated in an energy dependent manner by the CBS domains.
3-hydroxyacyl-CoA dehydrogenase is an enzyme involved in fatty acid metabolism, catalyzing the oxidation of 3-hydroxyacyl-CoA to 3-oxoacyl-CoA. The hydroxyacyl-CoA dehydrogenase-associated subfamily includes 3-hydroxyacyl-CoA dehydrogenase from Agrobacterium tumefaciens strain C58, which contains a HotDog domain at its C-terminus, while the two domains associated with 3-hydroxyacyl-CoA dehydrogenase activity (3HCDH_N and 3HCDH) are located at the N-terminus and in the central portion of this protein. The combination of activities may allow substrate to be passed from one domain to the next.
Other subfamilies in the superfamily include the YiiD protein from E. coli, in which the HotDog domain is fused to an acetyltransferase domain. The human mesenchymal stem cell protein DSCD75 and its counterpart in mouse also contain a HotDog domain. A structural proteomics project has shown that the conserved hypothetical E. coli protein YbaW contains a Hotdog fold. Finally, the Ralstonia solanacearum hypothetical protein RSp0367 is a member of another subfamily; it contains a HotDog domain and two AMP-binding domains, which are found in proteins involved in ATP-dependent covalent binding of AMP to their substrate.
Domain fusion events
Within the FabZ subfamily the LpxC deacetylase domain (UDP-3-O-acyl N-acetylglucosamine deacetylase) is fused to the FabZ-like HotDog domain in Chlorobium tepidum (see Figure 3a). LpxC catalyzes the N-deacetylation of UDP-3-O-acyl N-acetylglucosamine, the second and committed step in the biosynthesis of lipid A, which anchors lipopolysaccharide (LPS) in the outer membranes of most Gram-negative bacteria. The unfused proteins are found adjacent in operons from several species of chlamydia and cyanobacteria.
In the 4HBT class II subfamily we observed that the order of the operon is ligase(A)-dehalogenase(B)-thioesterase(C). In Bacteroides thetaiotaomicron there is a Rosetta protein that contains a haloacid dehalogenase-like hydrolase (HAD) domain and a HotDog domain (see Figure 3b). This domain architecture mirrors the fcb operon structure in Arthrobacter, i.e. it represents a fusion of the fcbB and fcbC gene products to form a novel protein in B. thetaiotaomicron.
The final domain fusion is in the 3-hydroxyacyl-CoA dehydrogenase from Agrobacterium tumefaciens strain C58, which possesses the HotDog domain, 3HCDH_N domain (3-hydroxyacyl-CoA dehydrogenase, NAD binding) and 3HCDH (3-hydroxyacyl-CoA dehydrogenase, C-terminal domain) domain (see Figure 3c). This may represent a fusion of the PaaC and PP3281 proteins in the gamma-proteobacterium Pseudomonas putida 2440 phenylacetic acid degradation operon.
These fusion events suggest that the domain fusion process can occur in a simple scheme with two distinct phases. Firstly, two proteins are recombined into adjacent positions in an operon. Secondly, the two genes are then fused by a process of mutation that removes the stop codon at the end of the first gene and maintains reading frame through the second gene [60, 61].
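The second phase of this scheme can be illustrated with a toy Python sketch. It is not taken from the analysis itself: the sequences, the replacement codon `CAA` and all function names are hypothetical. It shows why loss of the stop codon only yields a two-domain protein when the first gene plus the intergenic spacer spans a whole number of codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(dna):
    """Split a DNA string into complete codons, reading from position 0."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def orf_length(dna):
    """Count sense codons from position 0 up to the first in-frame stop."""
    count = 0
    for codon in codons(dna):
        if codon in STOP_CODONS:
            break
        count += 1
    return count


def fuse_by_stop_loss(gene_a, spacer, gene_b, sense_codon="CAA"):
    """Model a mutation that turns gene_a's stop codon into a sense codon."""
    if gene_a[-3:] not in STOP_CODONS:
        raise ValueError("gene_a must end with a stop codon")
    return gene_a[:-3] + sense_codon + spacer + gene_b


def keeps_frame(gene_a, spacer):
    """True if read-through from gene_a enters gene_b in gene_b's own frame."""
    return (len(gene_a) + len(spacer)) % 3 == 0


# Hypothetical adjacent ORFs: M-A-E-stop and M-K-L-stop.
gene_a = "ATGGCTGAATAA"
gene_b = "ATGAAACTGTGA"

before = orf_length(gene_a + "GGC" + gene_b)                   # separate genes
after = orf_length(fuse_by_stop_loss(gene_a, "GGC", gene_b))   # in-frame spacer
shifted = orf_length(fuse_by_stop_loss(gene_a, "GG", gene_b))  # frameshifted
```

With the three-base spacer the reading frame runs through both genes, whereas the two-base spacer shifts the frame and the toy translation terminates early instead of covering the second gene.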
The MASIA program was used to search for HotDog domain motifs in the aligned sequences of the 17 subfamilies. The various motifs are found in Additional file 5. It must also be noted that the PROSITE database release 18.29 contains a consensus sequence motif (PS01328), called the 4-hydroxybenzoyl-CoA thioesterase family active site, and this is found in 29 Swiss-Prot, TrEMBL and TrEMBL-NEW entries cross-referenced with PS01328. This consensus pattern, [QR]-[IV]-x(4)-[TC]-D-x(2)-G-[IV]-V-x-[HF]-x(2)-[FY], where D is the active site residue, is found in the YbgC-like subfamily and in the 4HBT-I subfamily. Of the 29 members, 19 are found in the YbgC-like group and 3 in the smaller 4HBT-I group. The remaining 7 proteins are scattered in various clusters consisting of hypothetical or unknown proteins. Using MASIA, we have found that this motif is present throughout the YbgC and 4HBT-I subfamilies, extending the number of proteins containing this motif to 107. We have also identified an HGG motif in the 4HBT-II and PaaI subfamilies. This motif is HGGAS-x-ALAE in the 4HBT-II subfamily and HGG-x-IF-x-LAD in PaaI members. The active site residue, Glu73, is known for 4-hydroxybenzoyl-CoA thioesterase from Arthrobacter sp. strain SU; however, the active site for E. coli PaaI is not known, and we suggest that it is Asp61 in the HGG motif above, which is 100% conserved in all members of this subfamily (see Additional file 6).
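As an illustration of how a PROSITE-style consensus pattern such as PS01328 can be scanned against a sequence, the following Python sketch converts the pattern into a regular expression. It is not the MASIA method; it assumes a hyphen separates G and [IV], as PROSITE syntax requires, handles only the syntax elements used here (x, x(n), x(n,m), [..] and {..}, but not the `<` and `>` anchors), and uses an invented toy sequence.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into Python regex syntax."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate the residue specification from an optional (n) or (n,m).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":                       # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"    # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


PS01328 = "[QR]-[IV]-x(4)-[TC]-D-x(2)-G-[IV]-V-x-[HF]-x(2)-[FY]"
regex = prosite_to_regex(PS01328)

# Invented sequence that carries the motif starting at residue 3.
toy = "MS" + "QIAAAATDAAGIVAHAAF" + "KL"
match = re.search(regex, toy)

# 1-based position of the conserved D (eighth position of the motif).
active_site_position = match.start() + 8 if match else None
```

A regular expression of this kind reports exact pattern matches only; profile-based tools tolerate substitutions that a strict pattern would miss, which may explain why alignment-based searches recover more family members than the pattern alone.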
We have defined and analyzed the HotDog domain superfamily, finding 18 different domain architectures and defining 17 subfamilies. We have also investigated the domain organisation and the role that this plays in generating functionally diverse enzymatic and nonenzymatic functions based on the HotDog fold. Domain duplication, domain recruitment and incremental mutation have been key to the evolution of this superfamily. We have also looked at gene context and operon structures and found many examples of fusion proteins, in which the HotDog domain has been fused to another protein to generate functional diversity. The large number of subfamilies we have found, the diverse range of activities these proteins participate in and the taxonomic distribution of the HotDog domain indicate an ancient superfamily that has diverged substantially to fulfil numerous roles in the cell.
Our analysis may help with further experimental investigation of members of this superfamily. Some members of this superfamily, such as the P. falciparum FabZ enzyme, have been proposed as targets for new anti-malarial drugs, as FabZ homologues are not found in humans. Finally, our analysis identified hundreds of novel proteins, such as human mesenchymal stem cell protein DSCD75 and the Ralstonia solanacearum hypothetical protein RSp0367, as probable enzymes potentially involved in lipid metabolism. Given that the large majority of proteins in this family are involved in bacterial lipid metabolism, we suggest that the HotDog domain evolved in bacteria first and may then have been transferred to eukaryotes and archaea on several occasions. Since then, duplication and mutation have allowed it to fill a variety of roles.
All PSI-BLAST searches were carried out using default inclusion thresholds and searched against the Swiss-Prot and TrEMBL sequence database (SWISS-PROT release 42.12 and TrEMBL release 25.12).
To define subfamilies we clustered the results of an all-against-all search of the 1357 HotDog domain proteins using NCBI BLASTP and single linkage clustering at an E-value of 10⁻¹⁵.
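Single linkage clustering of pairwise hits can be sketched with a union-find structure in Python. This is a minimal illustration rather than the pipeline used for the analysis: the input is assumed to be (query, subject, E-value) triples already parsed from BLASTP output, hits at or below the threshold are assumed to link two proteins, and the identifiers and E-values are invented.

```python
def single_linkage(ids, hits, threshold=1e-15):
    """Group ids into clusters connected by any hit with E-value <= threshold."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent pointers to the cluster root, halving paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        if evalue <= threshold:
            root_q, root_s = find(query), find(subject)
            if root_q != root_s:
                parent[root_s] = root_q   # merge the two clusters

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), set()).add(i)
    # Largest clusters first, members sorted for reproducible output.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Invented identifiers and E-values.
proteins = ["P1", "P2", "P3", "P4", "P5"]
blast_hits = [
    ("P1", "P2", 1e-40),
    ("P2", "P3", 1e-20),
    ("P3", "P4", 1e-5),    # too weak to link the two groups
    ("P4", "P5", 1e-30),
]
subfamilies = single_linkage(proteins, blast_hits)
```

Here P1 and P3 end up in the same cluster although they share no direct hit, because each links to P2. This chaining is the defining property of single linkage and is what allows distant members of a subfamily to be grouped through intermediates.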
We would like to acknowledge Robert Finn's help in generating Figure 1 and Alexey Murzin and Chris Ponting for helpful discussions. We would also like to thank David J. Studholme for critical reading of the manuscript. Finally we would like to thank the referees for their helpful comments. AB and SCD are funded by the Wellcome Trust.
- Leesong M, Henderson BS, Gillig JR, Schwab JM, Smith JL: Structure of a dehydratase-isomerase from the bacterial pathway for biosynthesis of unsaturated fatty acids: two catalytic activities in one active site. Structure 1996, 4: 253–264. 10.1016/S0969-2126(96)00030-5
- Benning MM, Wesenberg G, Liu R, Taylor KL, Dunaway-Mariano D, Holden HM: The three-dimensional structure of 4-hydroxybenzoyl-CoA thioesterase from Pseudomonas sp. Strain CBS-3. J Biol Chem 1998, 273: 33572–33579. 10.1074/jbc.273.50.33572
- Thoden JB, Zhuang Z, Dunaway-Mariano D, Holden HM: The structure of 4-hydroxybenzoyl-CoA thioesterase from Arthrobacter sp. strain SU. J Biol Chem 2003, 278: 43709–43716. 10.1074/jbc.M308198200
- Zhuang Z, Song F, Takami H, Dunaway-Mariano D: The BH1999 protein of Bacillus halodurans C-125 is gentisyl-coenzyme A thioesterase. J Bacteriol 2004, 186: 393–399. 10.1128/JB.186.2.393-399.2004
- Li J, Derewenda U, Dauter Z, Smith S, Derewenda ZS: Crystal structure of the Escherichia coli thioesterase II, a homolog of the human Nef binding enzyme. Nat Struct Biol 2000, 7: 555–559. 10.1038/76776
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389–3402. 10.1093/nar/25.17.3389
- Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 2003, 31: 365–370. 10.1093/nar/gkg095
- Hisano T, Tsuge T, Fukui T, Iwata T, Miki K, Doi Y: Crystal structure of the (R)-specific enoyl-CoA hydratase from Aeromonas caviae involved in polyhydroxyalkanoate biosynthesis. J Biol Chem 2003, 278: 617–624. 10.1074/jbc.M205484200
- Thoden JB, Holden HM, Zhuang Z, Dunaway-Mariano D: X-ray crystallographic analyses of inhibitor and substrate complexes of wild-type and mutant 4-hydroxybenzoyl-CoA thioesterase. J Biol Chem 2002, 277: 27468–27476. 10.1074/jbc.M203904200
- Yee A, Pardee K, Christendat D, Savchenko A, Edwards AM, Arrowsmith CH: Structural proteomics: toward high-throughput structural biology as a tool in functional genomics. Acc Chem Res 2003, 36: 183–189. 10.1021/ar010126g
- Bertone P, Gerstein M: Integrative data mining: the new direction in bioinformatics. IEEE Eng Med Biol Mag 2001, 20: 33–40. 10.1109/51.940042
- Chance MR, Bresnick AR, Burley SK, Jiang JS, Lima CD, Sali A, Almo SC, Bonanno JB, Buglino JA, Boulton S, Chen H, Eswar N, He G, Huang R, Ilyin V, McMahan L, Pieper U, Ray S, Vidal M, Wang LK: Structural genomics: a pipeline for providing structures for the biologist. Protein Sci 2002, 11: 723–738. 10.1110/ps.4570102
- Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR: The Pfam protein families database. Nucleic Acids Res 2004, 32(Database issue):D138–141. 10.1093/nar/gkh121
- Andreeva A, Howorth D, Brenner SE, Hubbard TJ, Chothia C, Murzin AG: SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res 2004, 32(Database issue):D226–229. 10.1093/nar/gkh039
- HMMER: profile HMMs for protein sequence analysis [http://hmmer.wustl.edu]
- Hunt MC, Alexson SE: The role Acyl-CoA thioesterases play in mediating intracellular lipid metabolism. Prog Lipid Res 2002, 41: 99–130. 10.1016/S0163-7827(01)00017-0
- Adams SH, Chui C, Schilbach SL, Yu XX, Goddard AD, Grimaldi JC, Lee J, Dowd P, Colman S, Lewin DA: BFIT, a unique acyl-CoA thioesterase induced in thermogenic brown adipose tissue: cloning, organization of the human gene and assessment of a potential link to obesity. Biochem J 2001, 360: 135–142. 10.1042/0264-6021:3600135
- Ponting CP, Aravind L: START: a lipid-binding domain in StAR, HD-ZIP and signalling proteins. Trends Biochem Sci 1999, 24: 130–132. 10.1016/S0968-0004(99)01362-6
- Heath RJ, Rock CO: Roles of the FabA and FabZ beta-hydroxyacyl-acyl carrier protein dehydratases in Escherichia coli fatty acid biosynthesis. J Biol Chem 1996, 271: 27795–27801. 10.1074/jbc.271.44.27795
- Rangaswamy V, Mitchell R, Ullrich M, Bender C: Analysis of genes involved in biosynthesis of coronafacic acid, the polyketide component of the phytotoxin coronatine. J Bacteriol 1998, 180: 3330–3338.PubMed CentralPubMedGoogle Scholar
- Rangaswamy V, Jiralerspong S, Parry R, Bender CL: Biosynthesis of the Pseudomonas polyketide coronafacic acid requires monofunctional and multifunctional polyketide synthase proteins. Proc Natl Acad Sci U S A 1998, 95: 15469–15474. 10.1073/pnas.95.26.15469PubMed CentralView ArticlePubMedGoogle Scholar
- Park SJ, Lee SY: Identification and characterization of a new enoyl coenzyme A hydratase involved in biosynthesis of medium-chain-length polyhydroxyalkanoates in recombinant Escherichia coli. J Bacteriol 2003, 185: 5391–5397. 10.1128/JB.185.18.5391-5397.2003PubMed CentralView ArticlePubMedGoogle Scholar
- Anderson AJ, Haywood GW, Dawes EA: Biosynthesis and composition of bacterial poly(hydroxyalkanoates). Int J Biol Macromol 1990, 12: 102–105. 10.1016/0141-8130(90)90060-NView ArticlePubMedGoogle Scholar
- Penning TM, Bennett MJ, Smith-Hoog S, Schlegel BP, Jez JM, Lewis M: Structure and function of 3 alpha-hydroxysteroid dehydrogenase. Steroids 1997, 62: 101–111. 10.1016/S0039-128X(96)00167-5View ArticlePubMedGoogle Scholar
- Leenders F, Dolez V, Begue A, Moller G, Gloeckner JC, de Launoit Y, Adamski J: Structure of the gene for the human 17beta-hydroxysteroid dehydrogenase type IV. Mamm Genome 1998, 9: 1036–1041. 10.1007/s003359900921View ArticlePubMedGoogle Scholar
- Baev N, Schultze M, Barlier I, Ha DC, Virelizier H, Kondorosi E, Kondorosi A: Rhizobium nodM and nodN genes are common nod genes: nodM encodes functions for efficiency of nod signal production and bacteroid maturation. J Bacteriol 1992, 174: 7555–7565.PubMed CentralPubMedGoogle Scholar
- Zhuang Z, Song F, Martin BM, Dunaway-Mariano D: The YbgC protein encoded by the ybgC gene of the tol-pal gene cluster of Haemophilus influenzae catalyzes acyl-coenzyme A thioester hydrolysis. FEBS Lett 2002, 516: 161–163. 10.1016/S0014-5793(02)02533-4View ArticlePubMedGoogle Scholar
- Sturgis JN: Organisation and evolution of the tol-pal gene cluster. J Mol Microbiol Biotechnol 2001, 3: 113–122.PubMedGoogle Scholar
- Smith S: The animal fatty acid synthase: one gene, one polypeptide, seven enzymes. Faseb J 1994, 8: 1248–1259.PubMedGoogle Scholar
- Raetz CR, Dowhan W: Biosynthesis and function of phospholipids in Escherichia coli. J Biol Chem 1990, 265: 1235–1238.PubMedGoogle Scholar
- Hopwood DA, Sherman DH: Molecular genetics of polyketides and its comparison to fatty acid biosynthesis. Annu Rev Genet 1990, 24: 37–66. 10.1146/annurev.ge.24.120190.000345View ArticlePubMedGoogle Scholar
- Metz JG, Roessler P, Facciotti D, Levering C, Dittrich F, Lassner M, Valentine R, Lardizabal K, Domergue F, Yamada A, Yazawa K, Knauf V, Browse J: Production of polyunsaturated fatty acids by polyketide synthases in both prokaryotes and eukaryotes. Science 2001, 293: 290–293. 10.1126/science.1059593View ArticlePubMedGoogle Scholar
- Allen EE, Bartlett DH: Structure and regulation of the omega-3 polyunsaturated fatty acid synthase genes from the deep-sea bacterium Photobacterium profundum strain SS9. Microbiology 2002, 148: 1903–1913.View ArticlePubMedGoogle Scholar
- Fernandes ND, Kolattukudy PE: Cloning, sequencing and characterization of a fatty acid synthase-encoding gene from Mycobacterium tuberculosis var. bovis BCG. Gene 1996, 170: 95–99. 10.1016/0378-1119(95)00842-XView ArticlePubMedGoogle Scholar
- Jones A, Davies HM, Voelker TA: Palmitoyl-acyl carrier protein (ACP) thioesterase and the evolutionary origin of plant acyl-ACP thioesterases. Plant Cell 1995, 7: 359–371. 10.1105/tpc.7.3.359PubMed CentralView ArticlePubMedGoogle Scholar
- Ohlrogge JB, Jaworski JG: Regulation of Fatty Acid Synthesis. Annu Rev Plant Physiol Plant Mol Biol 1997, 48: 109–136. 10.1146/annurev.arplant.48.1.109View ArticlePubMedGoogle Scholar
- Salas JJ, Ohlrogge JB: Characterization of substrate specificity of plant FatA and FatB acyl-ACP thioesterases. Arch Biochem Biophys 2002, 403: 25–34. 10.1016/S0003-9861(02)00017-6View ArticlePubMedGoogle Scholar
- Naggert J, Narasimhan ML, DeVeaux L, Cho H, Randhawa ZI, Cronan JE Jr, Green BN, Smith S: Cloning, sequencing, and characterization of Escherichia coli thioesterase II. J Biol Chem 1991, 266: 11044–11050.PubMedGoogle Scholar
- Liu LX, Margottin F, Le Gall S, Schwartz O, Selig L, Benarous R, Benichou S: Binding of HIV-1 Nef to a novel thioesterase enzyme correlates with Nef-mediated CD4 down-regulation. J Biol Chem 1997, 272: 13779–13785. 10.1074/jbc.272.21.13779View ArticlePubMedGoogle Scholar
- Jones JM, Nau K, Geraghty MT, Erdmann R, Gould SJ: Identification of peroxisomal acyl-CoA thioesterases in yeast and humans. J Biol Chem 1999, 274: 9216–9223. 10.1074/jbc.274.14.9216View ArticlePubMedGoogle Scholar
- Cohen GB, Rangan VS, Chen BK, Smith S, Baltimore D: The human thioesterase II protein binds to a site on HIV-1 Nef critical for CD4 down-regulation. J Biol Chem 2000, 275: 23097–23105. 10.1074/jbc.M000536200View ArticlePubMedGoogle Scholar
- Zhuang Z, Gartemann KH, Eichenlaub R, Dunaway-Mariano D: Characterization of the 4-hydroxybenzoyl-coenzyme A thioesterase from Arthrobacter sp. strain SU. Appl Environ Microbiol 2003, 69: 2707–2711. 10.1128/AEM.69.5.2707-2711.2003PubMed CentralView ArticlePubMedGoogle Scholar
- Todd AE, Orengo CA, Thornton JM: Evolution of function in protein superfamilies, from a structural perspective. J Mol Biol 2001, 307: 1113–1143. 10.1006/jmbi.2001.4513View ArticlePubMedGoogle Scholar
- Solomon JM, Lazazzera BA, Grossman AD: Purification and characterization of an extracellular peptide factor that affects two different developmental pathways in Bacillus subtilis. Genes Dev 1996, 10: 2014–2024.View ArticlePubMedGoogle Scholar
- Weinrauch Y, Guillen N, Dubnau DA: Sequence and transcription mapping of Bacillus subtilis competence genes comB and comA, one of which is related to a family of bacterial regulatory determinants. J Bacteriol 1989, 171: 5362–5375.PubMed CentralPubMedGoogle Scholar
- Ferrandez A, Minambres B, Garcia B, Olivera ER, Luengo JM, Garcia JL, Diaz E: Catabolism of phenylacetic acid in Escherichia coli. Characterization of a new aerobic hybrid pathway. J Biol Chem 1998, 273: 25974–25986. 10.1074/jbc.273.40.25974View ArticlePubMedGoogle Scholar
- Schujman GE, Paoletti L, Grossman AD, de Mendoza D: FapR, a bacterial transcription factor involved in global regulation of membrane lipid biosynthesis. Dev Cell 2003, 4: 663–672. 10.1016/S1534-5807(03)00123-0View ArticlePubMedGoogle Scholar
- Klages U, Markus A, Lingens F: Degradation of 4-chlorophenylacetic acid by a Pseudomonas species. J Bacteriol 1981, 146: 64–68.PubMed CentralPubMedGoogle Scholar
- Haggblom MM: Microbial breakdown of halogenated aromatic pesticides and related compounds. FEMS Microbiol Rev 1992, 9: 29–71.View ArticlePubMedGoogle Scholar
- Dunaway-Mariano D, Babbitt PC: On the origins and functions of the enzymes of the 4-chlorobenzoate to 4-hydroxybenzoate converting pathway. Biodegradation 1994, 5: 259–276.View ArticlePubMedGoogle Scholar
- Chae JC, Kim Y, Kim YC, Zylstra GJ, Kim CK: Genetic structure and functional implication of the fcb gene cluster for hydrolytic dechlorination of 4-chlorobenzoate from Pseudomonas sp. DJ-12. Gene 2000, 258: 109–116. 10.1016/S0378-1119(00)00419-4View ArticlePubMedGoogle Scholar
- Gescher J, Zaar A, Mohamed M, Schagger H, Fuchs G: Genes coding for a new pathway of aerobic benzoate metabolism in Azoarcus evansii. J Bacteriol 2002, 184: 6301–6315. 10.1128/JB.184.22.6301-6315.2002PubMed CentralView ArticlePubMedGoogle Scholar
- Zhuang Z, Song F, Zhang W, Taylor K, Archambault A, Dunaway-Mariano D, Dong J, Carey PR: Kinetic, Raman, NMR, and site-directed mutagenesis studies of the Pseudomonas sp. strain CBS3 4-hydroxybenzoyl-CoA thioesterase active site. Biochemistry 2002, 41: 11152–11160. 10.1021/bi0262303View ArticlePubMedGoogle Scholar
- Bateman A: The structure of a domain common to archaebacteria and the homocystinuria disease protein. Trends Biochem Sci 1997, 22: 12–13. 10.1016/S0968-0004(96)30046-7View ArticlePubMedGoogle Scholar
- Scott JW, Hawley SA, Green KA, Anis M, Stewart G, Scullion GA, Norman DG, Hardie DG: CBS domains form energy-sensing modules whose binding of adenosine ligands is disrupted by disease mutations. J Clin Invest 2004, 113: 274–284. 10.1172/JCI200419874PubMed CentralView ArticlePubMedGoogle Scholar
- Birktoft JJ, Holden HM, Hamlin R, Xuong NH, Banaszak LJ: Structure of L-3-hydroxyacyl-coenzyme A dehydrogenase: preliminary chain tracing at 2.8-A resolution. Proc Natl Acad Sci U S A 1987, 84: 8262–8266.PubMed CentralView ArticlePubMedGoogle Scholar
- Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D: Detecting protein function and protein-protein interactions from genome sequences. Science 1999, 285: 751–753. 10.1126/science.285.5428.751View ArticlePubMedGoogle Scholar
- Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA: Protein interaction maps for complete genomes based on gene fusion events. Nature 1999, 402: 86–90.View ArticlePubMedGoogle Scholar
- Coggins BE, Li X, McClerren AL, Hindsgaul O, Raetz CR, Zhou P: Structure of the LpxC deacetylase with a bound substrate-analog inhibitor. Nat Struct Biol 2003, 10: 645–651. 10.1038/nsb948View ArticlePubMedGoogle Scholar
- Sali A: Functional links between proteins. Nature 1999, 402: 23–26. 10.1038/46915View ArticlePubMedGoogle Scholar
- Doolittle RF: Do you dig my groove? Nat Genet 1999, 23: 6–8. 10.1038/12597View ArticlePubMedGoogle Scholar
- Zhu H, Schein CH, Braun W: MASIA: recognition of common patterns and properties in multiple aligned protein sequences. Bioinformatics 2000, 16: 950–951. 10.1093/bioinformatics/16.10.950View ArticlePubMedGoogle Scholar
- Hulo N, Sigrist CJ, Le Saux V, Langendijk-Genevaux PS, Bordoli L, Gattiker A, De Castro E, Bucher P, Bairoch A: Recent improvements to the PROSITE database. Nucleic Acids Res 2004, 32(Database issue):D134–137. 10.1093/nar/gkh044PubMed CentralView ArticlePubMedGoogle Scholar
- Sharma SK, Kapoor M, Ramya TN, Kumar S, Kumar G, Modak R, Sharma S, Surolia N, Surolia A: Identification, characterization, and inhibition of Plasmodium falciparum beta-hydroxyacyl-acyl carrier protein dehydratase (FabZ). J Biol Chem 2003, 278: 45661–45671. 10.1074/jbc.M304283200View ArticlePubMedGoogle Scholar
- Ciria R, Abreu-Goodger C, Morett E, Merino E: GeConT: gene context analysis. Bioinformatics 2004.Google Scholar
- GeConT Home Page[http://www.ibt.unam.mx/biocomputo/gecont.html]
- Pfam Home Page[http://www.sanger.ac.uk/Software/Pfam/]
- MASIA 2.0 Home Page[http://184.108.40.206/masia/]
- Kraulis P: MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J Appl Crystallogr 1991, 24: 946–950. 10.1107/S0021889891004399View ArticleGoogle Scholar
- Merritt EA, Bacon DJ: Raster3D: Photorealistic Molecular Graphics. Methods Enzymol 1997, 277: 505–524. 10.1016/S0076-6879(97)77028-9View ArticlePubMedGoogle Scholar
- Schein CH, Ozgun N, Izumi T, Braun W: Total sequence decomposition distinguishes functional modules, "molegos" in apurinic/apyrimidinic endonucleases. BMC Bioinformatics 2002, 3: 37. 10.1186/1471-2105-3-37PubMed CentralView ArticlePubMedGoogle Scholar
- Katoh K, Misawa K, Kuma K, Miyata T: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 2002, 30: 3059–3066. 10.1093/nar/gkf436PubMed CentralView ArticlePubMedGoogle Scholar
- Goodstadt L, Ponting CP: CHROMA: consensus-based colouring of multiple alignments for publication. Bioinformatics 2001, 17: 845–846. 10.1093/bioinformatics/17.9.845View ArticlePubMedGoogle Scholar
- Cuff JA, Barton GJ: Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins 2000, 40: 502–511. 10.1002/1097-0134(20000815)40:3<502::AID-PROT170>3.0.CO;2-QView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.