Structure and phylogenetic distribution of the genes coding for AK, ASDH and HD in Proteobacteria
The amino acid sequences of E. coli AK, ASDH, and HD were used as queries to probe the protein database of completely sequenced proteobacterial genomes with the BLASTP option of the BLAST program [13], in order to retrieve the most similar sequences. For this purpose, 58 proteobacterial genomes were selected and, in most cases, only one strain per species was taken into account. The data obtained are schematically reported in Figure 2, where a phylogenetic tree constructed using the RpoD sequences of the 58 proteobacteria is shown together with the number and structure of all the retrieved AK and HD coding genes. The asd genes were not included in Figure 2, since just one copy of this gene was retrieved from the 58 proteobacteria. The analysis of the data reported in Figure 2 revealed that:
a) in all the α-, β- and δ\ε-proteobacterial genomes a single, monofunctional, stand-alone copy of the gene coding for AK or HD was detected; neither duplicated copies nor fusion events involving these genes were found.
b) multiple as well as fused copies of AK and HD were found only in γ-proteobacteria, where the scenario is (apparently) more complex and intriguing. Indeed, the genes coding for AK (1 to 5 copies) and HD (1 to 2 copies) vary in both structure and copy number. Moreover, the complexity of these genes appears to increase in parallel with the evolutionary branching of γ-proteobacteria, with enterobacteria and Vibrionaceae showing the highest number of redundant and fused copies of AK and HD. This phylogenetic distribution strongly suggests that the duplication of AK coding genes and their fusion to HD took place within γ-proteobacteria, or soon after the divergence of the γ-proteobacterial ancestor from α-, β- and δ\ε-proteobacteria.
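The tallying of monofunctional and fused genes per genome can be illustrated with a minimal sketch. It assumes that BLASTP hits are available as (query, subject, e-value) tuples, that subject identifiers have the form "genome|protein", and that a protein hit by both the AK and the HD query is a candidate AK-HD fusion. The identifiers, the e-value cut-off and the rows themselves are hypothetical placeholders, not data from this study.

```python
from collections import defaultdict

# Hypothetical BLASTP hits: (query id, subject id, e-value).
# Subject ids are assumed to be "genome|protein"; these rows are made up.
HITS = [
    ("AK_query", "genomeA|p1", 1e-80),
    ("HD_query", "genomeA|p1", 1e-60),
    ("AK_query", "genomeA|p2", 1e-50),
    ("HD_query", "genomeB|p7", 1e-70),
    ("AK_query", "genomeB|p9", 1e-40),
    ("AK_query", "genomeB|p3", 0.5),  # weak hit, removed by the cut-off
]


def classify(hits, max_evalue=1e-5):
    """Count AK, HD and candidate AK-HD fused proteins per genome.

    max_evalue is an assumed significance cut-off, not a value
    taken from the paper.
    """
    # Collect, for every subject protein, the set of queries that hit it.
    queries_per_protein = defaultdict(set)
    for query, subject, evalue in hits:
        if evalue <= max_evalue:
            queries_per_protein[subject].add(query)

    summary = defaultdict(lambda: {"AK": 0, "HD": 0, "AK-HD": 0})
    for subject, queries in queries_per_protein.items():
        genome = subject.split("|", 1)[0]
        if {"AK_query", "HD_query"} <= queries:
            label = "AK-HD"  # hit by both queries: candidate fusion
        elif "AK_query" in queries:
            label = "AK"
        elif "HD_query" in queries:
            label = "HD"
        else:
            continue  # hit by an unrelated query; ignored here
        summary[genome][label] += 1
    return {genome: dict(counts) for genome, counts in summary.items()}


if __name__ == "__main__":
    for genome, counts in sorted(classify(HITS).items()):
        print(genome, counts)
```

In practice a shared hit would still need to be checked at the domain level (for example, that the AK and HD matches cover different regions of the protein) before a gene is called bifunctional.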
A model for the evolution of the AK and HD coding genes
On the basis of the phylogenetic distribution of stand-alone and bifunctional genes of the CP, we propose a plausible model for the evolution of these genes, and its timing, that explains the extant scenario. The model, which is schematically reported in Figure 3, predicts that the proteobacterial ancestor possessed a single copy of the hom, ask and asd genes. During evolution, this organization was maintained in proteobacteria belonging to the α-, β- and δ\ε-subdivisions. One of the crossroads in the evolution of these genes is the branching point between β- and γ-proteobacteria. It appears quite possible that a first duplication of the ask gene took place in the ancestor of γ-proteobacteria, generating two redundant copies that underwent evolutionary divergence. The finding that no bacterium (with the exception of Vibrio strains, see below) shows two copies of monofunctional ask genes strongly suggests that this duplication event and the subsequent fusion to hom might have occurred within a relatively short evolutionary time, giving rise to an ancestral bifunctional gene, which might have had the function of the extant metL and thrA. This sort of "gene duplication-gene fusion coupling" is quite similar to that recently described for the evolution of the γ-proteobacterial hisN and hisB histidine biosynthetic genes [6, 7, 9]. Finally, a paralogous duplication of this bifunctional ancestral gene, followed by evolutionary divergence (which very likely concerned the regulatory mechanism rather than the catalytic activity), led to the extant metL and thrA genes. On the basis of the phylogenetic distribution of the bifunctional genes (Figure 3), this "final" step might have occurred just before the separation between "clusters" 1 and 2 of the γ-proteobacterial subdivision.
The biological significance of this cascade of duplication and fusion events might be explained by the "patchwork" hypothesis on the origin and evolution of metabolic pathways [14]. According to this idea, metabolic pathways may have been assembled through the recruitment of primitive enzymes that could react with a wide range of chemically related substrates. Such relatively slow, unspecific enzymes may have enabled primitive cells containing small genomes to overcome their limited coding capabilities [4]. Paralogous gene duplication event(s) followed by evolutionary divergence might have permitted the appearance of enzymes with increased and narrower specificity and/or the diversification of function. In this way, an ancestral enzyme belonging to a given metabolic route is "recruited" to serve a single pathway or other (novel) pathways. In addition, this process may permit the evolution and refinement of regulatory mechanisms coincident with the development of new pathways and/or the refinement of pre-existing ones.
In our opinion, the evolutionary model proposed here to explain the origin and evolution of the extant metL and thrA genes is in full agreement with the Jensen hypothesis. The cascade of gene duplications and fusions involving the ask and hom genes might actually represent a mechanism for refining the feedback regulation that controls the activity of the enzymes they code for.
Phylogenetic analysis
If the evolutionary model proposed here is correct, one should expect that the fused copies of AK (AKI and AKII) and HD (HDI and HDII) share a higher degree of sequence similarity with each other than with AKIII and HD, respectively, and that they cluster together in a phylogenetic tree. In order to check this hypothesis, the AK and HD amino acid sequences were aligned using the program ClustalW [15], and the multiple alignments obtained were used to draw the phylogenetic trees shown in Figures 4 and 5. The analysis of the AK tree (Figure 4) showed that all the α-, β- and δ\ε-proteobacterial sequences form a single cluster separated from the γ-proteobacterial ones. Moreover, the γ-proteobacterial AKI, AKII, and AKIII sequences form three distinct, separate clusters, with AKIII representing the root of the others. A similar situation can be observed in the HD tree (Figure 5): the α-, β- and δ\ε-proteobacterial HD sequences form a single distinct cluster, while HDI and HDII form two closely related clusters.
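The expectation that the fused copies are more similar to each other than to AKIII can be checked, in its simplest form, by computing pairwise identity over an existing alignment. The sketch below assumes that the aligned sequences are already available as equal-length strings with "-" as the gap character. The three strings are invented toy placeholders, not real AK sequences, and percent identity over ungapped columns is only one of several possible similarity measures.

```python
def percent_identity(a, b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned strings used purely for illustration (not real sequences).
ALIGNMENT = {
    "AKI": "MKVLKFGGTS",
    "AKII": "MKVLKFGGSS",
    "AKIII": "M-IVVKFGGT",
}

if __name__ == "__main__":
    names = sorted(ALIGNMENT)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            value = percent_identity(ALIGNMENT[first], ALIGNMENT[second])
            print(f"{first} vs {second}: {value:.1f}%")
```

Under the model, the AKI/AKII value should exceed both the AKI/AKIII and the AKII/AKIII values; a tree-building method applied to the same alignment addresses the clustering part of the prediction.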
The topology of the phylogenetic trees obtained fits well with the proposed evolutionary model and indicates that horizontal transfer of these genes occurred rarely and did not strongly influence the evolution of the AK and HD domains. However, even though the evolutionary model reported in Figure 3 is in agreement with the gene structure and phylogenetic analyses, the following exceptions have to be explained:
1) The absence of lysC and metL in a group of enterobacteria (Buchnera aphidicola strains, Candidatus Blochmannia floridanus, Wigglesworthia glossinidia) and in Haemophilus influenzae, the absence of bifunctional genes in H. ducreyi, and the lack of hom in Coxiella burnetii, Rickettsia prowazekii, the Wolbachia endosymbiont of Drosophila melanogaster and Bdellovibrio bacteriovorus. This is very likely due to the absence of the corresponding metabolic route(s), which, in turn, is correlated with the parasitic lifestyle of these proteobacteria. Such a lifestyle may allow the bacteria to acquire essential compounds directly from the metabolic activities of their host, and adaptation to this environmental condition might have caused the loss of entire metabolic routes or parts thereof.
2) The increase in the number of AK copies in Vibrio strains with respect to other γ-proteobacteria, which is probably related to the high rate of genomic rearrangement typical of these species.
3) The absence of bifunctional ask-hom genes in Pseudomonas and Methylococcus capsulatus, which, in spite of their taxonomic position within γ-proteobacteria, exhibit the same structure and organization pattern as bacteria belonging to the α-, β- and δ\ε-subdivisions. This is not an isolated example; in fact, the same situation has been recorded for other biosynthetic pathways, such as histidine biosynthesis [6, 7]. The reason(s) for such a structure and organization are still unclear.
4) The fusion of ask to lysA in Xanthomonadaceae, which represents an exception to this general model. In these bacteria the paralogous duplication of the ask gene gave rise to two copies, one of which fused to hom, whereas the other underwent another fusion event with lysA, a gene coding for DAPDC activity. The biological significance of the latter fusion might lie in the spatial colocalization of the products of the two modules and in a faster feedback inhibition of the first enzyme (AK) by the end product of the pathway (lysine), whose last biosynthetic step is catalyzed by the enzyme coded for by lysA.
Analysis of gene organization
If the proposed model and its biological significance are correct, i.e. if the duplication and fusion events and the subsequent evolutionary divergence allowed the three copies of AK and the two of HD to narrow their specificity and to become increasingly sensitive to specific regulatory signals, then it is plausible to assume that the ancestral copy of AK (AKIII) might serve different metabolic pathways and hence might have been under the control of multiple regulatory signals (e.g. the availability of DAP, lysine, threonine or methionine). On the other hand, the expression of the bifunctional genes, thrA and metL, once they were channelled towards the biosynthesis of threonine and methionine, should have become increasingly dependent on more specific signals (for example, the concentration of the final product of that route).
In general, it is plausible that once a "new" gene introgresses and becomes part of a pre-existing metabolic pathway, it will become co-regulated with the other genes belonging to the same pathway. In some cases, co-regulation of genes of the same biosynthetic route is achieved by organizing the genes in operons, even though co-regulation may also be obtained through the construction of a regulon. This is particularly true for fused genes; as reported in previous works based on the analysis of the histidine biosynthetic pathway in γ-proteobacteria, the appearance of fused genes (specific for a single pathway) often parallels their presence within operons [6, 7, 9]. This raises the question of whether the structure and distribution of the duplicated and fused copies of the ask and hom genes might somehow be correlated with their organization in the proteobacterial genome. Therefore, we analysed the organization of all the genes of lys, met and thr biosynthesis in all the 58 proteobacteria. The data obtained revealed that:
1. Genes involved in DAP\lysine biosynthesis are scattered throughout the chromosome(s) of all the 58 proteobacteria taken into account (data not shown).
2. In addition to the ask, asd and hom genes, the other two genes involved in threonine biosynthesis (thrB and thrC) are scattered on the chromosome of bacteria belonging to the α-, β- and δ\ε subdivisions (except for Bordetella strains, which possess a hom-thrC operon) (Figure 6). The γ-proteobacterial scenario is completely different: in agreement with the hypothesis mentioned above, in all the organisms possessing a bifunctional thrA gene, it is embedded within a tricistronic operon with the same relative gene order (thrABC), which also suggests that this operon was constructed only once during evolution.
3. The organization of the methionine biosynthetic genes in proteobacteria partly reflects that exhibited by the lys or thr genes. In fact, in the α-, β- and δ\ε branches all the met biosynthetic genes are scattered on the chromosome(s) (Figure 7). This organization is also shared by γ-proteobacteria; the only exception is the bifunctional metL, which is clustered with metB to form a bicistronic metLB operon.
Thus, no bifunctional gene of the CP is located outside operons. The data obtained strongly suggest that the appearance of genes coding for enzymes specific for a single metabolic pathway coincides with their presence within a polycistronic transcriptional unit that includes all (or at least some) of the other genes of that route. Concerning the timing of operon construction, the comparative analysis of Figures 2, 5, and 6 revealed that the "gene duplication-gene fusion coupling" occurring in γ-proteobacteria appears to coincide with gene clustering and the formation of operons of different lengths.
Analysis of microarray experiments data
In order to elucidate the correlation between the structure and organization of the lys, met, and thr genes and their expression within the cell, we analyzed microarray data from E. coli and P. aeruginosa, which differ in the structure and organization of their CP genes. Microarray data were downloaded as supplemental material to published papers (see Additional File 1: Additional References for the Expression compendium); only normalized and filtered data were used. Where not already in that form, values were transformed into the base-2 logarithm of the ratio between the wild type (untreated) and the mutant (treated) expression levels.
For each of the three metabolic pathways, we carried out a pairwise comparison of the expression patterns of the genes by calculating the Pearson correlation coefficient for each pair.
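The two computational steps described above, the log2 transformation of expression ratios and the pairwise Pearson correlation between gene profiles, can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the function names are ours, and the input layout (one list of log2 ratios per gene, in the same order of experiments) is an assumption.

```python
import math


def log2_ratio(wild_type, mutant):
    """Base-2 logarithm of the wild type (untreated) / mutant (treated) ratio."""
    return math.log2(wild_type / mutant)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(profiles):
    """Correlate every unordered pair of genes in a {gene: profile} mapping."""
    genes = sorted(profiles)
    return {
        (first, second): pearson(profiles[first], profiles[second])
        for i, first in enumerate(genes)
        for second in genes[i + 1:]
    }
```

Given a dictionary of profiles for the genes of one pathway, `pairwise_correlations` returns one coefficient per gene pair, which is the kind of matrix a figure of pairwise co-expression would summarize.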
Data obtained are reported in Figure 8, whose analysis revealed:
1. A low co-regulation of the methionine biosynthetic genes (Figure 8a). Most of these genes are scarcely co-expressed and appear to be expressed independently of each other. The fact that metL and metB show a very high correlation coefficient relative to the other met genes is in agreement with their operonic organization.
2. The three E. coli thrABC genes (Figure 8b) are highly co-expressed, with correlation coefficients > 0.84. This is in agreement with their organization in a compact operon.
3. The trend of the lysine pathway genes in the γ-proteobacterium E. coli (Figure 8c) is quite surprising; although the lys genes are scattered throughout the E. coli chromosome, they show a high degree of co-expression, with correlation coefficient values often > 0.8. It is not clear how these genes can be highly co-expressed in the absence of an operonic organization. However, it is known [16] that lysine biosynthetic genes are regulated by the so-called LYS element (lysine-specific RNA element), which is located in their regulatory regions and is able to repress or allow their transcription in response to lysine concentration. The high co-expression of the lysine biosynthetic genes might be due to this mechanism.
The same analysis was carried out on the lysine, methionine and threonine biosynthetic genes of Pseudomonas aeruginosa, whose structure and organization pattern is the same as that of the α-, β-, and δ\ε subdivisions of proteobacteria. The data obtained (reported in Figure 8) showed that, overall, there is a low degree of co-expression between genes belonging to the same pathway; this is particularly pronounced for the methionine genes, where in some cases the correlation coefficient assumes negative values (Figure 8e), and for the lysine genes, whereas the thr biosynthetic genes were more strongly correlated with one another. The low degree of co-expression of the P. aeruginosa genes is in agreement with their scattering on the bacterial genome.