Functional relevance of dynamic properties of Dimeric NADP-dependent Isocitrate Dehydrogenases

Background Isocitrate Dehydrogenases (IDHs) are important enzymes present in all living cells. Three subfamilies of functionally dimeric IDHs (subfamilies I, II, III) are known. Subfamily I comprises well-studied bacterial IDHs, like that of Escherichia coli. Subfamily II has predominantly eukaryotic members, but it also has several bacterial members, many being pathogens or endosymbionts. Subfamily III IDHs are NAD-dependent. The eukaryotic-like subfamily II IDHs from pathogenic bacteria, such as Mycobacterium tuberculosis IDH1, are expected to have regulation similar to that of bacteria which use the glyoxylate bypass to survive starvation. Yet they are structurally different from IDHs of subfamily I, such as the E. coli IDH.
Results We have used phylogeny, structural comparisons and molecular dynamics simulations to highlight the similarities and differences between NADP-dependent dimeric IDHs, with an emphasis on regulation. Our phylogenetic study indicates that an additional subfamily (IV) may also be present. Variation in sequence and structure in an aligned region may indicate functional importance concerning regulation in bacterial subfamily I IDHs. Correlation in movement of prominent loops seen from molecular dynamics may explain the adaptability and diversity of the predominantly eukaryotic subfamily II IDHs.
Conclusion This study discusses possible regulatory mechanisms operating in various IDHs and implications for regulation of eukaryotic-like bacterial IDHs such as that of M. tuberculosis, which may provide avenues for intervention in disease.

Isocitrate Dehydrogenases are important enzymes essential for survival of all organisms. In humans, mutations in IDHs have been associated with diseases like Glioblastoma [2]. IDH is also important for applications in biotechnology, drug design against pathogens and for general understanding of biochemistry and systems biology.
IDHs are functionally either monomers or dimers. The functionally monomeric type has an active site completely defined by a single protein chain, while the functionally dimeric type has active sites contributed to by residues from both chains. Examples of the functionally monomeric type are the Azotobacter vinelandii IDH [3] [PDB:1ITW] and Corynebacterium glutamicum IDH [PDB:2B0T]. Bacteria such as Mycobacterium tuberculosis [4] and Vibrio [5] have both a dimeric type IDH (IDH1) and a monomeric type IDH (IDH2). Functionally dimeric IDHs are more abundant and diverse. In this study, unless otherwise mentioned, references to IDH from Mycobacterium, Vibrio or any such bacterium refer to the dimeric type IDH.
Previous studies [6,7] have classified dimeric NADP-dependent IDHs into two groups: Subfamily I (S1-IDH) and Subfamily II (S2-IDH), while NAD-dependent IDHs have been classified as Subfamily III (S3-IDH). There are several unclassified IDHs which do not fall into these three subfamilies. Phylogenetic analysis of increasingly available data [8][9][10] tends to indicate that cofactor-specificity is not a monophyletic property; i.e., NAD-dependent IDHs may be found in all subgroups and are ancestral to all dimeric IDHs. NADP-dependent IDHs are not found in subfamily III, while the functionally monomeric IDHs are all NADP-dependent. S1-IDHs are homodimers with two active sites, active in soluble dimeric form, and are found in Prokaryotes. Most are NADP-dependent, such as Escherichia coli IDH [11] and Bacillus subtilis IDH [12]. Some are NAD-dependent, such as Acidithiobacillus thiooxidans IDH [PDB:2D4V] [13] and Hydrogenobacter thermophilus IDH [14].
IDHs have various functions in the biochemistry of organisms. Anaerobic bacteria use NAD-dependent IDHs for diverse purposes such as glutamate biosynthesis [18]. In aerobic organisms, IDHs catalyze an irreversible step in the Tricarboxylic Acid cycle (TCA) or Krebs cycle, responsible for respiration. Eukaryotic mitochondria use NAD-dependent IDHs of subfamily III for this purpose. Aerobic bacteria dependent on the Glyoxylate bypass for survival during conditions of glucose starvation have NADP-dependent IDHs that perform this role [8].
To open the Glyoxylate bypass, IDH is inactivated by kinase phosphorylation in enteric bacteria such as Escherichia coli [19,20], but not in others like Bacillus subtilis [21]. This specificity is facilitated by the interaction of kinase AceK with the AceK Recognition Segment (ARS) of E. coli IDH [20,22]. Eukaryotic NADP-dependent IDHs replenish pathways concerned with lipid synthesis [23] and oxidative stress repair [24] with NADPH or oxoglutarate. Eukaryotic cells contain at least two kinds of NADP-IDH isoenzymes: cytosolic and mitochondrial. Fungi, plants and various protists may have localized IDH isoenzymes for organelles like chloroplasts, glyoxysomes, peroxisomes etc. This functional diversity in subfamily II implies that the enzymes have evolved diverse catalytic rates and mechanisms of regulation [25].
Regulation by phosphorylation has not been shown to exist in eukaryotic subfamily II IDHs. However, the dimeric NADP-dependent IDH from the pathogenic bacterium Mycobacterium tuberculosis [4,26,27] (M.tb IDH or MtIDH1) has been shown to be phosphorylated [26] during the persistent stage. M.tb IDH is closer in sequence identity to Eukaryotic IDHs and belongs to subfamily II. The closest homologous resolved structure in the Protein Data Bank [28] belongs to its host, i.e. Human cytosolic IDH, sharing 65.4% identity with MtIDH1. The recently identified Sinorhizobium IDH [PDB:3US8] is a subfamily II bacterial IDH and has a higher identity at 72.4%, but is not included in this study.
NADP-dependent IDH1 from Mycobacterium tuberculosis takes part in the TCA cycle, and the organism has a functional glyoxylate bypass. An attempt [26] was made to compare its function with that of Escherichia coli IDH, and to identify the kinase responsible for deactivating IDH1 by phosphorylation. The kinase PknG was seen to be the most likely candidate. It phosphorylated Serine 213 in M.tb IDH1. To decipher the mechanism of deactivation, a homology model of the M.tb IDH1 [27] was constructed.
This structure revealed that the residue targeted for phosphorylation by the kinase PknG is in a different location from that of E. coli IDH [29]. E. coli IDH gets phosphorylated at Serine 113, which is located within the active site cavity and takes part in anchoring the substrate isocitrate. M.tb IDH1 seems to have a remote buried target, where the target Serine, while located close to the active site, does not have a direct role to play in catalysis. Moreover, the mechanism of access to this Serine by any kinase attempting to phosphorylate the residue is unclear.
The mechanism of access to this residue cannot be explained by simulation of the model structure alone, and the need was felt to compare the results with other IDH structures to understand the significance of differences in atomic motions. The current study therefore concentrates mainly on dimeric NADP-dependent IDHs from subfamilies I and II and additionally subfamily IV (Table 1), with an emphasis on regulation in dimeric M.tb IDH.

Methods
We first extend earlier phylogenetic studies [6,[8][9][10]30] using a larger number of sequences and combine this with structural information. Representative dimeric IDH structures were first aligned using the structural alignment tool STAMP [31] to ensure that functional residues (see Table 1 for a representative list) were aligned. This alignment was then subjected to CLUSTALW [32] realignment, preserving gaps, using the Jalview [33] interface [see Additional file 1]. This was done to ensure that catalytic and important scaffold residues remained aligned as subsequent sequences were added to the initial set.
Full-length reviewed protein sequence ids provided by the ExPASy Enzyme database [34] [EC:1.1.1.42] from UniProt [35] and Protein Data Bank [28] structures were used. BLAST was run on each of these sequences using the UniProt web interface to identify similar sequences. We also added eukaryotic NAD-dependent IDHs, yielding a dataset consisting of 111 dimeric IDH sequences [see Additional File 2].
Average distance (UPGMA) and neighbor joining methods [36] were initially used through the Jalview interface to generate phylogenetic trees (Figure 1). The average distance method tree for dimeric IDH sequences shows four groups of IDHs. While this method yields clustering information about the phenetic similarities or differences between the sequences, it does not necessarily trace the evolutionary pathway [37].
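The average distance (UPGMA) clustering step can be illustrated with a minimal sketch. It assumes a precomputed symmetric distance matrix; the function name `upgma`, the labels and the toy distances are hypothetical and are not taken from the original analysis, which used Jalview.

```python
def upgma(names, dist):
    """Cluster labelled items by average linkage (UPGMA).

    names: list of labels; dist: symmetric distance matrix (list of lists).
    Returns a nested tuple describing the order in which clusters merged.
    """
    # Each cluster id maps to (subtree, number of leaves in it)
    clusters = {i: (names[i], 1) for i in range(len(names))}
    # Pairwise distances, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters
        (a, b), _ = min(d.items(), key=lambda kv: kv[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances
        new_d = {}
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        # Drop every entry that involved the two merged clusters
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new_d)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

As the text notes, a tree built this way reflects phenetic similarity only; it assumes a constant rate of change and does not by itself trace the evolutionary pathway.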
The IDH dataset is characterized by large variation in sequence identity (15% and above). Yet the overall structures and distinct scaffold and active site residues are conserved. Rate heterogeneity estimation was therefore used with the Maximum likelihood method to account for conserved residues. The required α shape parameter for gamma-distribution for 8 categories was estimated using tree-puzzle [38], and highly similar sequences reported by the program were reduced to one representative.
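The sequence identity values quoted for this dataset depend on how identity is counted. The sketch below assumes identity is taken over aligned columns in which neither sequence has a gap, which is one common convention; the original analysis may have used another. The function name and the toy sequences in the usage note are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two rows of a sequence alignment.

    Columns where either sequence has a gap are skipped (an assumption;
    other conventions divide by the alignment or shorter-sequence length).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue  # ignore gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared
```

For example, `percent_identity("MKT-AV", "MRTQAV")` compares five ungapped columns, four of which match.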
The program ProML in Phylip [39] was used to calculate the final tree (Figure 2), and the coefficient of varia-
At most four representative crystal structures were chosen from each group seen in the phylogenetic tree (Table 1), making a total of 9 structures, four each from subfamily I and II and one belonging to neither. An additional homology model of dimeric IDH from Mycobacterium tuberculosis [27] (subfamily II) was also included. The sequence alignment of these 10 structures is shown in Figure 3.

Molecular dynamics
In order to examine the consequences of the phylogenetic and structural variations, molecular dynamics simulations were carried out. The structures given in Table 1 were used for this analysis. Ligands, cofactors and divalent ions were removed to make comparisons easier.
AMBER version 9 [42] with the ff99 [43] forcefield was used. Protonation states were assigned to each structure using PDB2PQR [44] with PROPKA [45] at pH 7.0. With the exception of ApIDH, the IDH structures used lacked disulphide bonds. The protein structures were solvated with the TIP3P [46] water model in a truncated octahedral box with a 10 Å buffer, and neutralizing ions were added. Periodic boundary conditions were used. Each system contained approximately 800-830 residues and ~20000 water molecules.
All systems were first minimized with solute restraints for 500 steepest descent (SD) and 500 conjugate gradient (CG) steps, followed by minimization without restraints for an additional 1500 SD and 3000 CG steps.
Table caption (fragment): At most four representatives of each type (I, II and IV) of NADP-dependent IDH were chosen for simulation, in addition to the model of MtIDH1. Ligands and metal ions were removed, as they are different in each case. Uniprot sequences may be longer than PDB lengths given here, due to unresolved terminal residues. These residues were not modelled. Monomeric IDHs (M) were simulated but results are not discussed here. The data of the monomeric type is provided here for completeness and comparison purposes.
The systems were subsequently heated to 300 K at constant volume. An equilibration run was carried out for 250 ps under constant pressure (NPT) conditions with isotropic box scaling for pressure regulation. The particle mesh Ewald method [47] was used to model the electrostatics. Kinetic and total energy of the system were monitored to ensure stability during equilibration. The root mean squared deviation (RMSD) of atomic coordinates relative to the starting minimized structure was also monitored at this stage. SHAKE [48] was used to enable a timestep of 2 fs. The Langevin thermostat [49] was used.
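The RMSD monitored during equilibration can be sketched as follows. This is a minimal illustration that assumes the two coordinate sets are already superposed and list the same atoms in the same order; it performs no fitting, and the function name is hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two coordinate sets.

    coords_a, coords_b: lists of (x, y, z) tuples for the same atoms.
    No superposition is done here (an assumption of this sketch).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in size")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```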
Simulations were run for 20 ns, and some were extended if required for up to 30 ns to ensure stability.
A window of 15 ns, showing the least variability in the RMSD plots, was chosen from each of these simulations. Standard fluctuation analysis and correlation analysis were used to analyse these simulations, using the
Figure caption (fragment): Table 1. Numbers correspond to residues given in Table 2. The numbers are 1-9 and A-F. Colors correspond to those given in structure markers in other figures. Some C-terminal residues of Thermus thermophilus TtIDH are not shown, as this IDH is longer than other IDHs and the extra region doesn't align with the other IDH sequences.
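Choosing the window with the least RMSD variability can be sketched as a sliding-window search. This is a minimal illustration that assumes "least variability" means the lowest standard deviation of the RMSD trace; the original selection may have been made by inspection of the plots. The function name is hypothetical, and the window length is given in frames rather than nanoseconds.

```python
import statistics

def most_stable_window(rmsd_trace, window):
    """Return (start index, standard deviation) of the flattest window.

    rmsd_trace: RMSD value per saved frame; window: window length in frames.
    Ties are resolved in favour of the earliest window.
    """
    if window < 1 or window > len(rmsd_trace):
        raise ValueError("window must fit inside the trace")
    best_start = 0
    best_sd = float("inf")
    for start in range(len(rmsd_trace) - window + 1):
        sd = statistics.pstdev(rmsd_trace[start:start + window])
        if sd < best_sd:
            best_start, best_sd = start, sd
    return best_start, best_sd
```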
Subfamily III consists of heterodimeric NAD-dependent IDHs, along with a few bacterial members. An additional group, whose members were previously classified as outliers [7,8], is found to be closer to subfamily III. A resolved structure of Thermus thermophilus IDH (Figure 5) belongs to this group. The structure and alignment show homodimers with 480-500 residues per chain with a unique extended C-terminal region of approximately 100 residues. This suggests that the clade may be regarded as a distinct subfamily IV.
Maximum likelihood analysis shows notable differences. NAD-dependent bacterial IDHs are grouped with subfamily III by phenetic clustering, whereas maximum likelihood analysis places them closer to subfamily I. These may be considered outliers, as they are most likely homodimers like those of subfamily I but do not seem to be part of subfamily I. Subfamily III IDHs are mostly NAD-dependent eukaryotic heterodimers, and some of these outliers may share close common ancestors with them.
Subfamily IV shows two subgroups. One subgroup contains Rickettsia IDH and other bacterial IDHs, while the other has Thermus thermophilus IDH and several putative thermophilic sequences.
Sequence alignment shows regions of conservation and regions where insertions or gaps are prominent between the different subfamilies (Figure 3, Figure 4 and Figure 5). These variable regions will be referred to as: Complementary region 1 (CR1), Phosphorylation loop (Phos-loop), Clasp domain (clasp), ARS-like [52], NADP discriminating loop, nucleotide binding loop and Complementary region 2 (CR2).
The homodimeric IDHs of subfamilies I, II and IV have two active sites present symmetrically, each formed from residues contributed by the larger domain of one subunit and the smaller central domain of the other subunit. These homodimers may be described as pseudo 3D-domain-swapped dimers [54,55], as a single subunit is not known to be independently active [4]. It has been speculated that higher order oligomers, such as tetramers [7,30], may exist; however, they retain the homodimer as a basic unit. The prominent cross-over domain forming the interaction between the two subunits is called the clasp domain, as it resembles two hands, each representing a subunit, clasped together (see Figure 4 and Figure 5 for comparative structures).
Subfamily III IDHs form heterodimeric units with one active site and one regulatory site. Yeast NAD-dependent IDH [56] [PDB:3BLV], [PDB:3BLW], [PDB:3BLX] is represented by two sequences in Uniprot [Uniprot: IDH1_YEAST] and [Uniprot:IDH2_YEAST]. Two heterodimers associate by their clasp domains to form tetramers and two such tetramers associate to form the octamer, which is the biological unit in yeast. The clasp domain (C) is usually formed by at least one β-sheet between the two subunits.
The distinctly different shape of this domain in each subfamily helps to immediately distinguish structurally the four subfamilies of dimeric IDHs. Subfamily IV IDH subunits are longer than other dimeric IDHs. The extra length is accounted for by a long C-terminal region forming a larger clasp-like structure (C2) with motif ββ-α-β-α-ββ, as seen in T. thermophilus (Figure 5). Without the longer C-terminal region, the subfamily IV homodimeric IDHs structurally resemble subfamily III heterodimeric IDHs. The clasp region is known to play a role in higher order oligomer formation and signalling [7,56].
The various regions which show variations in sequence length are highlighted in the alignment (see Figure 3 and the corresponding color-coded region in Figure 4 and Figure 5). The function of these regions is not apparent from sequence or structural examination, but they clearly classify the different subfamilies. These features may modulate the rate and regulation of the enzyme through the diversity of roles they play in the biochemical cycles of their corresponding organisms.
As an example, the ARS-like region differs greatly in length and associated structure within subfamily I. At least five types can be identified, of which three can be structurally represented (Figure 6). These can be correlated with the bacterial family and the role and associated mode of regulation of IDH in these bacteria. The variation in length is not seen in subfamily II, and this region is reduced in subfamilies III and IV.
Simulations reveal the dynamic properties of these enzymes and their modes of action. The role in modulation of the enzyme by these regions may be inferred from their dynamic behaviour, allowing us to probe the mechanism of the enzyme further.

Simulations
The major regions of fluctuation correspond mostly to the variable regions in the alignment (Figure 6). Sharp peaks are observed in E. coli (Figure 7) and other S1-IDHs [see Additional file 4: S4 A-D], while broader regions corresponding to the three loops show movement in the α-helix regions for subfamily II [see Additional File 4: S4 E-I]. The third loop, or nucleotide-binding loop, is more mobile in Eukaryotic IDHs than in bacterial IDHs within subfamily II, corresponding to the longer loop in the alignment (Figure 3). These regions are known to have higher crystal B-factors [15,57,58] in several structures in comparison with other regions within the protein, implying that they are characterized by higher mobility.
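The per-residue fluctuation underlying such plots is commonly computed as a root mean squared fluctuation (RMSF) about the mean position. The sketch below assumes one representative atom per residue (for example Cα) and snapshots that are already superposed on a common reference; the function name is hypothetical.

```python
import math

def rmsf(frames):
    """Root mean squared fluctuation of each atom about its mean position.

    frames: list of snapshots, each a list of (x, y, z) tuples per atom.
    Snapshots are assumed to be superposed beforehand.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Mean position of atom i over all snapshots
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared displacement from that mean position
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result
```

Peaks in the returned list mark mobile residues, which is the sense in which the loops are described as regions of high fluctuation.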
Correlation plots of the two subfamilies, subfamily I and subfamily II (Figure 8 and Figure 9, also [see Additional
map of PmIDH (Figure 9). The nucleotide-binding loop (371-392) also shows similar correlations. Other negatively correlated regions include the N-terminal residues of both subunits with each other, suggesting a correlated hinged open-close motion. This hints at the possibility that each active site functions in tandem.
Figure caption (fragment): Figure 3. Note the difference in Clasp region, the three loops and the ARS-like region. Subfamily I IDHs have α-helices (β-α-β pattern from each subunit). Subfamily II have all β (β-ββ-β) greek-key motif [57,58]. Images were made using Chimera [80].
Figure caption (fragment): Figure 3. The sequentially central homologous clasp region (C1) in subfamilies III and IV is reduced to a two-strand anti-parallel sheet (ββ) (residues 148-160 in TtIDH), and is similar in both. C-terminal forms a larger domain over the clasp (C2). Images were made with Chimera [80].
Positive correlations are seen as expected near the diagonal and in domains which are sequentially distant, but structurally close and associated, such as regions 605-684 and 190-270 both of which refer to the same region on the different subunits. Most of these correlations are either completely absent or very subdued in S1 type IDHs.
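The positive and negative correlations discussed here are entries of a normalized cross-correlation matrix of atomic displacements, C(i,j) = <Δri·Δrj> / sqrt(<Δri²><Δrj²>). The sketch below is a minimal illustration of that formula, assuming superposed snapshots and one atom per residue; the function name is hypothetical and this is not the code used for the published maps.

```python
import math

def cross_correlation(frames):
    """Normalized cross-correlation matrix of atomic displacements.

    frames: list of snapshots, each a list of (x, y, z) tuples per atom.
    Entries range from -1 (anti-correlated) to +1 (correlated).
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    # Mean position of each atom over the trajectory
    means = [
        tuple(sum(f[i][k] for f in frames) / n_frames for k in range(3))
        for i in range(n_atoms)
    ]
    # Displacement of each atom from its mean, per snapshot
    disp = [
        [tuple(f[i][k] - means[i][k] for k in range(3)) for i in range(n_atoms)]
        for f in frames
    ]

    def mean_dot(i, j):
        # Time average of the dot product of displacements of atoms i and j
        return sum(
            sum(d[i][k] * d[j][k] for k in range(3)) for d in disp
        ) / n_frames

    variances = [mean_dot(i, i) for i in range(n_atoms)]
    matrix = []
    for i in range(n_atoms):
        row = []
        for j in range(n_atoms):
            denom = math.sqrt(variances[i] * variances[j])
            # Immobile atoms have zero variance; report 0.0 for them
            row.append(mean_dot(i, j) / denom if denom > 0 else 0.0)
        matrix.append(row)
    return matrix
```

Two atoms that always move in opposite directions give an entry of -1, which is the signature read as an open-close motion in the text.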
Among subfamily II IDHs, the movement of the NADP-binding loop is pronounced in mitochondrial enzymes, such as PmIDH and YmIDH, and subdued in HcIDH [see Additional file 4: S5]. The Mycobacterium MtIDH1 model was constructed based upon pig PmIDH as a template. However, the correlations of the loops are smaller in the MtIDH1 model than in PmIDH. The NADP discriminating loop, in particular, has much smaller correlations. The cytosolic Human IDH shows very low negatively correlated motion for the NADP discrimination loop with respect to the central domain, in both the active [PDB:1T0L] and inactive [PDB:1T09] forms, whereas in both PmIDH and YmIDH this correlation is very strong (~1.0). The nucleotide-binding loop has less movement in MtIDH and TmIDH than in the Eukaryotic IDHs, as the loop is shorter in the prokaryotes, as can be seen in the alignment in Figure 3.
The loops are subject to large domain motions. Principal component analysis (PCA) of the simulation data was used to see trends in the relative domain motions. The first principal component shows a very high contribution compared to the second and the third in subfamily II IDHs, while the difference is much smaller in subfamily I. In the stable sampled window (15 ns), this difference is subdued, but still discernible [see Additional file 4: S6].
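The contribution of the first principal component can be expressed as the largest eigenvalue of the coordinate covariance matrix divided by its trace. The sketch below is a toy illustration that uses power iteration in place of a full eigendecomposition, which is a simplification chosen here and not the method of the original analysis. The function name and starting vector are hypothetical.

```python
import math

def first_pc_fraction(samples, iterations=200):
    """Fraction of total variance captured by the first principal component.

    samples: list of equal-length coordinate vectors (one per snapshot).
    Uses power iteration, which can fail to converge if the start vector
    happens to be orthogonal to the leading eigenvector.
    """
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(dim)]
    centered = [[s[k] - mean[k] for k in range(dim)] for s in samples]
    # Population covariance matrix
    cov = [
        [sum(c[a] * c[b] for c in centered) / n for b in range(dim)]
        for a in range(dim)
    ]
    total = sum(cov[k][k] for k in range(dim))
    if total == 0:
        return 0.0
    # Deliberately non-uniform start vector
    v = [1.0 / (k + 1) for k in range(dim)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the leading eigenvalue for unit-length v
    cv = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
    top = sum(v[a] * cv[a] for a in range(dim))
    return top / total
```

A value close to 1 indicates that a single collective motion dominates, as described for subfamily II; values nearer to 1/dim indicate motion spread over many components.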
A porcupine plot [59] of the PCA movements (Figure 10) shows domain motion, which is extensive in S2-IDHs but attenuated in S1-IDHs. The overall RMSD and gyration plots show two relatively stable regions in S2-IDHs, implying an open and a closed form, but only one region in S1-IDHs. The transition to a more open form is seen in S2-type IDHs, while bacterial types prefer the closed form. The porcupine plot of motions along the first principal component highlights this transition. Subfamily II IDHs have a pronounced open-close motion, which appears to compensate for the hindrance to entry into the active site that results from the large loops.
The motions of the loops appear to effectively open and close the active site (Figure 10). The Complementary regions 1 and 2 (CR1 and CR2) are so named because they may explain the differences in the hinge-like motion between subfamilies I and II. Subfamily I has a larger CR1 and a correspondingly smaller CR2. In contrast, subfamily II has a larger CR2 and a correspondingly smaller CR1, while subfamily IV is short in both regions. While sequentially distant, these two regions are structural neighbours of each other. They are located close to the hinge region, and may modulate the differences in motion between subfamilies I and II.
Figure caption (fragment): Cyanobacteria like Nostoc IDH_ANASP have the longest ARS-like sequence, which is not structurally resolved yet. The shortest S1-type, IDH_STRMU (Streptococcus mutans) may be NAD-dependent. S2-IDHs have conserved structure, represented by Pig PmIDH. The residues may differ, however, as the alignment between PmIDH and Mycobacterium tuberculosis IDH_MYCTU shows here. The MtIDH sequence has a stretch of glutamates (-EEE-) and is richer in acidic residues. The shortest length is seen in TtIDH, as well as S3-IDHs. Image was made using Chimera [80] and Jalview [33].
The results show that the modes of action of subfamily I and subfamily II are distinctly different. Although the enzyme has the same basic function, these differences correlate with its overall function in the biochemical pathway of the organism. The loop movements in subfamily II may be exploited for regulation by modulation of the enzyme in eukaryotes, where the enzyme is not involved in respiration, while the ARS region may be

Phylogeny
Subfamily II IDHs include Eukaryotic IDHs and some bacterial IDHs. Thermotoga maritima and Desulfotalea IDHs, along with some others such as those of Clostridia, form one basal group of bacterial S2-IDHs. The other group of bacterial S2-IDHs consists of alphaproteobacterial IDHs and Actinobacterial IDHs from Bifidobacteria and Actinomycetales. These are closer to the isozymes of Eukaryotes, and many organisms within this subgroup are either endosymbionts or cellular pathogens.
The alphaproteobacterial members, such as Rhizobium IDH [60], the recently resolved Sinorhizobium meliloti [PDB:3US8], Brucella, Bradyrhizobium and Paracoccus have IDHs most closely related to their Eukaryotic homologs, while Actinobacteria like Mycobacteria are more distant. This similarity is in agreement with the Endosymbiont theory of evolution [61,62], which states that mitochondria evolved from alphaproteobacterial endosymbionts sharing a close common ancestor with Rhizobia and Rickettsia.
Figure 9 Correlation map for S2-IDH. Normalized Correlation map representative for dimeric S2-IDH (Sus scrofa mitochondrial). S2-IDH map has been annotated. Colored circles within the lower triangle region, representing negative correlations, show the general movement indicated in the inset image, with the color bars corresponding to the color codes in Figure 3, Figure 4 and Figure 7. The regions highlighted in the upper triangle of the matrix show the positive correlations of the loops with each other (green) and the central region (blue). This graph was plotted using Bio3d [52] and the structure image was made in Chimera [80].
The phylogenetic analysis answers an immediate question: what is the reason for the similarity between M. tuberculosis IDH1 and host IDH? This similarity is not a result of gene exchange between host and parasite, and a clear pathway can be traced through evolution. Many of these bacteria, such as Rhizobium, show close common ancestry with eukaryotic mitochondria, while others like Rickettsia have an NAD-dependent IDH of subfamily IV which appears to be close to the subfamily III IDHs present in mitochondria. Most α-proteobacteria have subfamily II NADP-dependent IDHs, while some have NAD-dependent IDHs which are close to subfamily III or IV. This implies that IDH is one of several proteins, such as kinases [63], within the proteome of these organisms which can be termed eukaryotic-like. Eukaryotic-like genes may aid pathogenesis [64] and endosymbiosis.

Activity regulation
Some important active site residues are listed in Table 2 and can be grouped as those interacting with substrate isocitrate and those involved in interactions with the cofactor. Residues associated with isocitrate binding [65,66] are conserved in most IDHs. Among them, S113 and T105 in E. coli IDH are involved in anchoring the substrate isocitrate within the active site. S113 is also the target of phosphorylation in E.coli regulation [66,67]. The Phos loop is the loop between and including these two residues. This loop is considerably larger in S2-group IDHs, hindering kinase phosphorylation [15,57,58]. The larger loop in subfamily II has a prominent α-helix (see alignment in Figure 3 and color-coded regions in Figure 4).
Residues K344 and Y345 in E. coli IDH are NADP-binding residues found to have a strong role in cofactor specificity [10]. The double mutant K344D, Y345I makes the enzyme NAD-specific, incapable of using NADP as a cofactor [68]. The loop on which these residues are present is thus called the NADP-Discriminating loop, and the residues in this position can be used to distinguish NADP specificity from NAD specificity, making this a useful classification criterion [69].
The replacement of positively charged K with negatively charged D is thought to change the interaction with the electronegative phosphate of NADP [68]. This mutation (KY to DI) mimics the residues found in NAD-dependent IDHs in subfamily III and IMDH [68]. Most NADP-dependent IDHs from subfamily I and IV have K and Y, while those of subfamily II have R and H. Monomeric type IDHs and some subfamily I IDHs have K and H, responsible for high NADP-specificity [70]. There are however IDHs with DI in all four subfamilies, mostly at the basal level. The third loop or the nucleotide-binding loop has residues which anchor and guide the nucleotide base of the cofactor [10].
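The residue pairs described above can be written as a simple lookup. This is an illustrative sketch only: it encodes just the four pairs named in the text, the table and function names are hypothetical, and it is not a validated predictor of cofactor specificity.

```python
# Residue pair at the two discriminating positions (K344 and Y345 in EcIDH)
# mapped to the cofactor preference described in the text.
DISCRIMINATING_PAIRS = {
    ("K", "Y"): "NADP (typical of subfamilies I and IV)",
    ("R", "H"): "NADP (typical of subfamily II)",
    ("K", "H"): "NADP (monomeric type and some subfamily I)",
    ("D", "I"): "NAD",
}

def predict_cofactor(first_residue, second_residue):
    """Look up the likely cofactor from the discriminating-loop residue pair.

    Returns "unclassified" for any pair not listed in the text.
    """
    key = (first_residue.upper(), second_residue.upper())
    return DISCRIMINATING_PAIRS.get(key, "unclassified")
```

For instance, the E. coli wild type pair (K, Y) maps to NADP, while the K344D, Y345I mutant pair (D, I) maps to NAD.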
The three loops are therefore important for modulating the activity of the enzyme, and may provide clues to its mechanisms of activity. These loops may regulate the entry of substrate on their own, help guide the substrate and cofactor to the active site, or discriminate between similar cofactors such as NADP and NAD, and thus contribute towards tuned regulation, depending on the function of the enzyme within the biochemical pathways of the organism.
Known regulation mechanisms for NADP IDHs include transcription control [71], inhibition by NAD(P)H or ATP (TCA feedback), concerted inhibition by glyoxylate and oxaloacetate [72], phosphorylation by kinase [11], glutathione inhibition [73], specific changes in secondary structure as in Human cytosolic IDH [15] or allosteric regulation as in yeast subfamily III IDH [56]. In eukaryotes, these can be quite different in each case, as isoenzymes may be present for different tasks.
The three loops i.e., the Phos loop, NADP discriminating loop and third nucleotide-binding loop, are prominent with α-helices in subfamily II IDHs. Eukaryotic IDHs have evolved as paralogs within the same cell, within different organelles, and adapted to different biochemical feedback mechanisms. Modulation of the movement of these loops is likely to affect the activity of these enzymes.
Mitochondrial subfamily II IDHs (PmIDH and YmIDH) show anti-correlated motions in all three loops with the domains, while cytosolic IDH (HcIDH) does not show the correlation in the NADP-discrimination loop. However, the first loop shows anti-correlated movement. The cytosolic enzyme may be subjected to feedback concerning the substrate isocitrate.
In mitochondria, the NADP-dependent isoenzymes of subfamily II compete with efficient NAD-dependent subfamily III enzymes for isocitrate. The substrate is plentiful in the mitochondria, thus rendering the relative availability of the cofactors NADP and NAD the regulating factor to which subfamily II IDHs may respond.
Sequence lengths within subfamily I are variable. E. coli IDH has a length of 416 residues and B. subtilis IDH is 423 residues long, while Nostoc sp. [Uniprot:IDH_NOSS1] has 471 residues. Most of these differences are incorporated in the ARS in E. coli or the ARS-like region [22]. The ARS region in E. coli IDH plays a role in assisting the AceK kinase to phosphorylate its target S113 [22,74]. The same region in B. subtilis IDH forms a fairly rigid helical hairpin structure which prevents AceK from acting on BsIDH [21].
Subfamily I may be divided into subgroups by their variable regions alone (Figure 6). Assuming the variable region is defined as EcIDH 239-275, the lengths of this region correlate with different families of bacteria. Gram-negative proteobacteria such as E. coli, Burkholderia pseudomallei, Helicobacter pylori and Coxiella burnetii share the structure seen in EcIDH and BpIDH, which is ~36 residues long. These may follow the classic regulation with kinase AceK seen in E. coli (Class A [22]). IDHs from Gram-positive bacteria like B. subtilis [21], as well as the NAD-dependent Acidithiobacillus thiooxidans IDH [13], show a large helix hairpin of ~49 residues (Class C [22]). Archaea such as Aeropyrum pernix [75], Sulfolobus tokodaii and Archaeoglobus fulgidus [76] have a short loop with a short helix, of ~37 residues (Class D [22]). In Nostoc, the region is ~84 residues long. Nostoc [Uniprot:IDH_NOSS1] requires IDH for a different role, i.e. nitrogen fixation [77], so the regulation process is likely to differ. Aquifex aeolicus IDH has 32 residues, representing another type of system. The Streptococcus mutans sequence has the shortest region in S1.
Subfamily II IDHs do not show large variations in length of the ARS-like region. S4-IDHs have a very short length. This indicates that the region may have little direct influence in actual enzymatic activity, but may serve in protein-protein interactions concerned with bacterial regulation, as seen in E.coli IDH [20].
Within subfamily II, bacterial IDHs are differentiated from the Eukaryotic ones by the length of the nucleotide-binding loop region. The nucleotide-binding loop has a conserved α-helix with a conserved threonine and aspartate (T390 and D392 in EcIDH) and residues around them which contribute to cofactor binding [10] and specificity [69]. The nucleotide-binding loop is longer in subfamily II IDHs than in subfamily I, and within subfamily II, bacterial IDHs have shorter loops than eukaryotic IDHs. This makes the helix more mobile in eukaryotic IDHs than in bacterial IDHs.

Conclusions
Implications for Mycobacterium tuberculosis
NADP-dependent IDHs take part in the TCA cycle, and there is provision for a glyoxylate bypass. The ARS region has been shown to play a role in regulation of IDHs in E. coli, and the variation in structure of this region implies similar roles in other IDHs as well. Subfamily II bacterial NADP-dependent IDHs with a functional glyoxylate cycle, such as Mycobacterium tuberculosis IDH1 [78], perform a function in the bacterial cell similar to that of subfamily I bacterial IDHs. This implies that they may also utilize the ARS-like region, as in similar bacterial IDHs.

                  …      …      …      …      …      …      …      …      TmIDH  TtIDH  AvIDH  CgIDH
Subfamily         I      I      I      I      II     II     II     II     II     IV     M      M
Phos loop start   T105   T96    T112   T107   T78    T78    T77    T78    T77    T90    S86    S85
Phos loop end     S113   S104   S120   S115   S95    S95    S94    S95    S94    S98    S132   S130
Isocitrate Binding

Active site residues in Isocitrate Dehydrogenases. The residues in S1, S2 and S4 align properly in structural alignment. Functionally monomeric IDHs (type M) are also included for comparison. In monomeric IDHs, the respective residues don't appear in the same sequence. They do not have a Phos loop. Serine residues (such as S86 in AvIDH) play a similar role to threonines in dimeric IDHs and are indicated in italic font. N-loop refers to the NADP binding loop.
Metabolic flux analysis [79] of the pathway indicates that inactivation of IDH is required for the glyoxylate cycle to function. The kinase responsible for inactivation, PknG, and its target, S213, were determined previously [26]. A previous study [27] attempted to decipher the effects of phosphorylation of the target serine in comparison with other likely targets. However, it was also found that the target serine remained buried throughout the short 5 ns simulation, and extending the simulation to 30 ns did not result in any exposure of the residue.
The serine residue lies below the variable region helix of the model structure. Correlation plots of all S2-IDHs show a square region, containing the ARS-like region and the adjacent helix, which has high positive correlations and negligible or no negative correlations. For the MtIDH1 model, this same square contains prominent negative correlations, and S213 seems to show this tendency as well, with respect to the corresponding residues in the other subunit (Figure 11). Compared with the template PmIDH used, this tendency for movement may be attributed to a greater proportion of acidic residues, such as a stretch of three glutamates, both on the surface of the modelled structure and mainly in these loops, and also to the replacement of bulky aromatic residues such as W with the smaller polar residue T at a critical position near S213. The large proportion of negative charges may lead to frustration in the region.
Figure 11 Correlation map for MtIDH1. The region around S213, including the ARS-like region just above it, shows negative correlations not seen in any S2-type IDH simulated here. The ARS-like region in particular shows negative correlations, and so do S213 and its immediate vicinity. This movement may be biologically relevant, as it does not appear in any other IDH simulation, particularly S2-IDHs, and is unlikely to be obtained by chance.
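Correlation maps of this kind are commonly computed as a normalized cross-correlation of atomic displacements, C_ij = <Δr_i · Δr_j> / sqrt(<Δr_i²> <Δr_j²>), where the average runs over trajectory frames. The following is a minimal sketch of that general technique, not the procedure used in this study, whose exact method is not described in this section. The function names and the toy trajectory are hypothetical, and the frames are assumed to be already superposed on a common reference.

```python
import math

def cross_correlation_map(frames):
    """Normalized displacement cross-correlation between atoms.

    frames: list of frames, each a list of [x, y, z] positions (one per atom),
    assumed to be superposed on a common reference beforehand.
    Every atom must move at least slightly, otherwise the normalization
    divides by zero.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    # Mean position of each atom over the trajectory.
    mean = [[sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
            for i in range(n_atoms)]
    # Displacement of each atom from its mean position, per frame.
    disp = [[[frame[i][k] - mean[i][k] for k in range(3)] for i in range(n_atoms)]
            for frame in frames]
    # Frame-averaged dot products <dr_i . dr_j>.
    cov = [[sum(sum(d[i][k] * d[j][k] for k in range(3)) for d in disp) / n_frames
            for j in range(n_atoms)] for i in range(n_atoms)]
    # Normalize so that values lie between -1 and +1.
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n_atoms)]
            for i in range(n_atoms)]

def toy_trajectory(n_frames=50):
    """Hypothetical three-atom trajectory: atoms 0 and 1 move together,
    atom 2 moves in the opposite direction."""
    frames = []
    for t in range(n_frames):
        m = (math.sin(0.3 * t), math.cos(0.2 * t), 0.1 * math.sin(0.7 * t))
        a = [0.0 + m[0], 0.0 + m[1], 0.0 + m[2]]
        b = [5.0 + m[0], 0.0 + m[1], 0.0 + m[2]]
        c = [10.0 - m[0], 0.0 - m[1], 0.0 - m[2]]
        frames.append([a, b, c])
    return frames

corr = cross_correlation_map(toy_trajectory())
```

In the toy trajectory, the entry for atoms 0 and 1 is close to +1 (correlated motion) and the entry for atoms 0 and 2 is close to -1 (anti-correlated motion). In a real map computed over Cα atoms of both subunits, a block of negative values such as the one described for the region around S213 would appear in the same way.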
Homology modelling, MD simulations and phylogenetic analysis of an important class of enzymes in the metabolic pathway provide clues towards the possible mechanism of phosphorylation and functional inactivation of M. tuberculosis IDH in persistent bacteria, leading to the opening of the shunt pathway. Selective, biologically relevant movements of the ARS-like region and the nucleotide-binding loop need to be explored further in the context of regulation and performance of the enzymes.

Additional material
Additional file 1: Alignment of isocitrate dehydrogenases. This file was used as input for obtaining the phylogeny trees in Figures 1 and 2 and is in PHYLIP format (can be viewed using a text viewer). The list of IDH sequences used is provided in Additional file 2.
Additional file 2: List of sequences with their UniProt IDs, used for the phylogeny of Isocitrate dehydrogenases and other members of the β-decarboxylase family.
Additional file 3: Alignment of Isocitrate dehydrogenases and other members of the β-decarboxylase family. This file is in PHYLIP format (can be viewed using a text viewer). The list of sequences used is provided in Additional file 2.