Reconstruction of a draft genome-scale metabolic model for E. coli Nissle 1917
The GEM reconstruction process involved multiple steps based on comparative genomics, manual curation, and validation following published reconstruction protocols [5, 27] (Fig. 1A). A high-quality genome (CP022686.1 from 2018, as compared to CP007799.1 from 2014 used by Kim et. al.) of EcN was used for reconstruction, with a length of 5,055,316 bp and annotated with a total of 5045 genes. Additionally, EcN harbours two plasmids, pMUT1 and pMUT2 [28]. First, we compared the genome of EcN against 55 related strains with high-quality GEMs available, herafter referred to as reference strains, using bidirectional best blastp hits [3]. Of the 5045 EcN genes compared, 1783 were common among all strains and 196 were unique to EcN (Fig. 1D, Additional file 1: Data S1). We observed that the genome of EcN shared the highest number of homologous genes with uropathogenic E. coli strains (UPEC) such as CFT073 (87.0% of EcN genes) and ABU 83972 (86.8% of EcN genes), with which it may share a common ancestor [29, 30, 31, 32]. CFT073 has hitherto repeatedly been compared to EcN and ABU 83972, to identify potential causes for the difference in pathogenicity between these highly related strains [24, 29, 33]. The commonly used strain E. coli K-12 MG1655 has 3693 (73.2%) homologous genes. Several horizontally acquired DNA regions that are, at least partially, shared with other E. coli strains such as CFT073, are not present in MG1655. These regions may contribute to the difference in phenotypic traits of EcN [34]. EcN contains regions that are mostly shared with extraintestinal pathogenic E. coli (ExPEC) strains (Fig. 1D). EcN specific sequences (selected regions among genetic islands I to IV) were previously observed to be shared among many ExPEC strains, but less frequently observed in intestinal pathogenic E. coli (InPEC) and non-pathogenic strains [34]. The lowest number of homologous genes was shared with all Shigella strains (59.0–66.5%).
Using the gene homology analysis between EcN and all reference strains, a draft GEM was reconstructed based on the corresponding reference GEMs. This draft model consisted of 1487 genes, 2867 reactions, and 2069 metabolites (Fig. 1B). Of these reactions, 90.9% (2607) originated from the E. coli K-12 MG1655 model iML1515, which was used as the base for the construction of this model as this is one of the most highly curated E. coli models. One of the subsystems of which multiple reactions were added from the other reference strains was tRNA charging (22 reactions). When comparing the sequence context of tRNA loci in EcN versus K-12, 15 out of 37 tRNA loci varied due to the integration of horizontally acquired genetic information, including asnT, argW, ileY, leuX, pheV, serX, and thrW [34]. The genes encoding for these seven tRNA synthetases, among others, did not have homology with the K-12 genome and were added from models other than iML1515 during the construction of the draft model (Fig. 1C). The remaining 260 reactions added from other models belonged to subsystems such as various forms of transport systems (65 reactions), alternate carbon metabolism (16), and metabolism of various amino acids (Fig. 1C). Several of the 260 reactions were unassigned (26) or removed during manual curation (63).
Furthermore, an additional comparison between nucleotide sequences of all EcN genes (BLASTn) did not identify any additional genes to be added. Finally, the draft EcN model was compared to iDK1463 and 211 reactions only present in the latter were imported along with 30 additional genes (Fig. 1C, Additional file 2: Data S2) [26].
Manual curation of EcN draft model to remove network inconsistencies
A thorough manual curation of the draft model was performed in multiple stages, which included removing inconsistencies, correcting mass and charge balances, gap filling with phenotype microarray data, and expansion of secondary metabolite biosynthetic pathways (Fig. 1B). In general, the reconstruction process followed the standard protocols [5, 27]. First, 48 reaction duplications caused by differences in directionality, metabolites, and gene rules across reference models were removed (e.g., MTHFD and MTHFD_1). Additionally, 14 reactions causing futile energy-generating cycles were corrected. Subsequently, mass and charge imbalances were identified and 53 such inconsistencies were manually resolved (6 remaining). One of the cases involved a mass imbalance of two reactions (ACGAL6PI & ACGAL6PISO) involving the metabolite N-acetyl-D-galactosamine-6-phosphate (GalNac-6-P). The conversion of GalNAc-6-P to Tag-6-P is performed in two steps: deacetylation of GalNAc-6-P to GalN-6-P and deamination/isomerisation of GalN-6-P to Tag-6-P [35]. Both original reactions were replaced by two reactions: ACGAL6PDA (deacetylation step) and GALAM6PISO (deamination/isomerisation step) with gene rules agaA and agaS, respectively (Additional file 3: Fig S1).
In the next stage of gap filling, we used an algorithm available in cobrapy [36] that evaluates a minimal set of reactions required to be added to make a pathway feasible [37]. All blocked pathways in the model were checked for gap filling using the panreactome of all the reference GEMs. Gap filling suggested that a malate decarboxylating oxidoreductase (‘MALDDH’) reaction was missing in the model, preventing growth on D-malate. Upon addition of this reaction, growth was restored in the model. However, no homologous gene could be identified in EcN that could be linked to this reaction. Presence of this ability without an identifiable genomic background gives the opportunity to discover new genes related to D-malate metabolism [38].
The previous genome-scale reconstructions of E. coli strains contained biosynthetic pathways for secondary metabolites like enterobactin. EcN produces a wider range of secondary metabolites that also includes salmochelins (glucosylated enterobactin), yersiniabactin, aerobactin, and colibactin (Additional file 3: Fig S2) [15]. Here, we expanded the representation of biosynthetic pathways for enterobactin, salmochelins, aerobactin, yersiniabactin, and colibactin. This process involved defining new reactions representing various biochemical steps common to non-ribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) assembly, similar to the descriptions in the GEM of Streptomyces coelicolor [39] (Additional file 3: Fig S2). The various chemical steps involved in these pathways were added from available pathways for enterobactin, colibactin, salmochelins, and aerobactin in the MetaCyc database. In the case of yersiniabactin, however, the entire pathway was reconstructed using the literature [40]. In addition to biosynthetic reactions, we also included reactions involved in the transport of secondary metabolites and various metal ion import reactions related to them. In total, iHM1533 included 177 curated reactions and 110 metabolites to represent secondary metabolism of EcN (Additional file 2: Data S2).
Next, all metabolites, reactions, and genes were annotated with identifiers from external databases, such as KEGG, BioCyc, MetaNetX, SEED, and CHEBI. Inclusion of these identifiers is important for interoperability between various knowledge bases and multiple omics data types. The locus tag, gene ID, and protein ID were added for all genes, when available in the genbank file, and all disconnected genes and metabolites were removed from the model. Lastly, the biomass objective function was updated using BOFdat (Biomass Objective Function from experimental data) [41]. In the first step, the stoichiometric coefficients for the major macromolecules were defined. Genomic (CP022686.1), transcriptomic [42], and growth data [26] of EcN was used, while MG1655 data was used for the macromolecular composition, proteomics, lipidomics, and maintenance [41], as EcN data was lacking. Coenzymes and inorganic ions were identified in the EcN model and included in the biomass objective function in the second step. The newly defined function was used to update the existing iDK1643 biomass objective function and get the final GEM iHM1533 (Additional file 4).
The curated GEM restored the in silico growth of 1.05 h–1 on synthetic minimal medium with 15 mM glucose as a carbon source (when using iDK1463 and iML1515 respectively 1.07 and 1.33 h–1). The predicted growth rate is higher than the 0.79 ± 0.02 h–1 described in literature on this media [25]. Although the biomass function was adjusted using BOFdat, additional experimental data of EcN is required to further increase accuracy. Additionally, various factors such as constraints on total protein, catalytic rates, and transcriptional regulation were not included in the model.
Phenotype characterisation of EcN
EcN naturally resides in the large intestine. Successful colonisation of the gut environment is dependent on various factors, including nutrient specificity and efficiency. Therefore, we were interested in the range of nutrients EcN can metabolise. Here, we generated phenotype microarray data for this strain. Four different Biolog plates were used, PM 1-4, consisting of 190 carbon, 95 nitrogen, 59 phosphorus, and 35 sulfur nutrient sources. We analysed the phenotype microarray data using DuctApe software that assigns activity indices between 0 and 9 to represent growth on each nutrient [41]. An activity index of 3 and above was considered as growth. We found that EcN could utilise 92 of the 190 carbon sources that included 47 carbohydrates (of 71 total), 24 carboxylic acids (59 total), 14 amino acids (30 total), and others (Fig. 2A/B, Additional file 5: Data S3). For the nitrogen sources, 45 nutrients facilitated growth of EcN that included 19 amino acids (33 total), all 12 of the peptides, and others. EcN could utilise 51 of the 59 phosphorus nutrients including all 7 inorganic and 44 organic sources (52 total). EcN was also capable of growth on all 5 of the inorganic and 21 of the 30 organic sulfur nutrient sources.
Overall, EcN has a wide range of nutrient sources it can utilise. In the large intestine, various nutrients are supplied by ingested food, epithelial and bacterial cell debris, and the mucus lining of the epithelium. This includes, e.g., gluconate from muscle tissues and sugar acids like galacturonate from pectin [42]. The mucus lining is a combination of glycoproteins and glycolipids that provide nutrients to intestinal bacteria, including N-acetylglucosamine, N-acetylgalactosamine, galactose, fucose, sialic acid (N-acetylneuraminic acid), and lesser amounts of glucuronate and galacturonate [43]. EcN is capable of growing on all these compounds as a carbon source (Additional file 5: Data S3). The capacity of EcN to metabolise this range of carbohydrates makes sense when put into perspective with its origin in the gut.
We compared our dataset with previously published Biolog data of EcN (Fig. 2B) [26]. In our experiment, EcN was capable of growth on six carbon sources (adonitol, dulcitol, pectin, Tween 20, Tween 40, and a-hydroxybutyric acid) that did not permit growth of EcN in the published dataset (Additional file 5: Data S3). However, the latter could grow on b-hydroxybutyric acid, which was not the case in our work. For the non-carbon source profiling PM plates, a different carbon source was used in the published experiment, 2 M succinate/200 µM citrate, compared to 2 M pyruvate used in our experiment. On succinate, EcN was capable of metabolising 57 nitrogen, 50 phosphorus, and 35 sulfur nutrient sources. In comparison, EcN could metabolise on pyruvate 45 nitrogen (40 shared, 5 only with pyruvate, 17 only with succinate), 51 phosphorus (49 shared, 2 on pyruvate, 1 on succinate), and 26 sulfur (17 shared, 9 only with succinate) sources (Additional file 5: Data S3). We note that the comparison of growth prediction here against Kim et al. might also be affected by the cut-off of the activity index used for what is considered growth. Overall, we found that EcN can consume a wide range of nutrients of which many can be found in the gut and that utilisation of nitrogen and sulfur sources specifically can be dependent on the carbon source. This dataset can help with understanding EcN’s role in the gut microbiome and aid future genome engineering endeavours.
Phenotypic characterisation of EcN largely corresponded to iHM1533 model predictions
Next, the phenotype microarray data was compared with growth predictions by iHM1533 to validate the accuracy of the reconstructed model (Fig. 2C). A total of 231 nutrient sources out of 379 on the Biolog plates could be linked to exchange reactions present in the model. EcN was able to grow on 20 nutrients that did not allow growth in the model (Additional file 5: Data S3). Gap filling was used to identify reactions that would enable the model to grow on these compounds from the panreactome of the reference strains, but no reactions were found that could restore growth [37]. For several pathways, e.g., methionine and phenylalanine metabolism, reactions had previously been added from the E. coli reference strains to the model (Fig. 1C). However, the pathways needed for growth on these compounds are not present in any of the models upon which iHM1533 was built, as none of them can grow on these compounds themselves [3]. As a result, these pathways could not be complemented by the reference stain panreactome. The model predicted growth on sucrose, while Biolog data showed the opposite. As EcN does not contain the genes for sucrose catabolism, the reactions SUCtpp, SUCptspp, SUCR, and FFSD were removed from the model [44].
Similar to iDK1463, reactions that are reported to be inactive under non-stressful aerobic conditions (ARGAGMt7pp, CELBpts, CLBtex, DTARTD, LCARS, PTRCORNt7pp, SUCTARTtpp, TARTD, and TARTRt7pp) were deactivated by constraining the flux to zero for this analysis [26]. Additionally, the catabolism of 3-hydroxyphenylacetic acid (3HPA) and 4-hydroxyphenylacetatic acid (4HPA) was deactivated by restricting flow through the reactions HPAtex and 3hoxpactex. Based on homology, genes required for the metabolism of these aromatic compounds are present in EcN, but Biolog data showed an inability of growth in normal aerobic conditions [45].
With the corrected model, growth on a total of 82.3% of nutrients was predicted correctly, which is just above the score of iDK1463 (81.8%) and in line with predictions in other recently published models (68–84%) [6, 46, 47]. We found 122 true positives and 68 true negatives, while 21 nutrients were incorrectly predicted to accommodate growth by the model and growth on 20 nutrients was still not supported in the model (Additional file 3: Table S1). Adonital and pectin, which only sustained growth in our Biolog dataset but not in the previously reported work [26], were among the 20 false-negative predictions. The false-positives included five nitrogen sources that showed very slow growth with an activity between 1 and 2.5 in the Biolog data set (l-Tyrosine, N-Acetyl-d-Glucosamine, N-Acetyl-d-Galactosamine, N-Acetyl-d-Mannosamine, and allantoin). All five sources did enable growth when succinate was used as a carbon source instead of pyruvate (Additional file 5: Data S3). Generally, false-positive predictions may occur if an enzyme is transcriptionally repressed, which is not represented in a metabolic model. Alternatively, an enzyme does not catalyse the designated reaction at a high enough rate under specific growth conditions and is thereby observed as false-positive, as appears to be the case for these five sources [48]. In addition to the Biolog data, growth on various carbon sources described in literature was checked [13, 25, 49]. iHM1533 correctly predicted growth on all 21 carbon sources, including l-arabinose, l-fucose, d-galactose, d-gluconate, and d-mannose (Additional file 6: Data S4). Therefore, the iHM1533 model could correctly predict a large part of the nutrient sustaining growth correctly, which is a prerequisite for making good predictions on behaviour in the gut or as a cell factory.
13C-fluxomics data comparison of EcN to iHM1533 model flux predictions
After comparing the GEM predictions with the Biolog data, the predictive quality of internal fluxes was validated with previously published 13C-fluxomics data. Fluxes determined by 13C experiments with growth on glucose and gluconate were compared to model-predicted fluxes [25]. The media conditions and efflux of acetate were set and internal fluxes were predicted in a sensitivity analysis with flux variability analysis (FVA) in combination with a Parsimoneous enzyme usage Flux Balance Analysis (pFBA). The predicted in silico fluxes and pFBA results were mapped to the experimentally determined fluxes on glucose and gluconate (Fig. 3, Additional file 7: Data S5). The pFBA shows limitations of predicting the fluxes in the tested conditions, for example a higher flux on glucose in the glycolysis (PGI) and a lower flux in the PPP (G6PDH2r). However, when the constraints in the objective function optimisation (biomass formation) were relaxed in the FVA, the internal central carbon flux corresponds more accurately to the experimental data. All experimental data points were within the 90% FVA threshold, and a large part within > 97%. These analyses validate the predictive quality of the model and display its use in prediction of internal fluxes based on media composition.
Essential gene prediction of EcN in gut microbiome media using iHM1533
GEMs can be used to predict essential genes. iHM1533 predicted 156 genes and 230 reactions to be essential in anaerobic gut microbiome media (Additional file 8: Data S6). The reactions that were essential were linked, among others, to cofactor and prosthetic group biosynthesis (53), amino acid metabolism (66), and purine and pyrimidine biosynthesis (17). When compared to E. coli K-12 MG1655, one gene was found to be only essential in MG1655, zupT, and three were found to be only essential in EcN, pgsA, can and leuB. The latter two have been experimentally validated under aerobic conditions [26]. In Kim et al. (2021) the gene argF was additionally mentioned to be only essential in EcN, due to the lack of the gene argI in the genome. However we identified two genes in EcN encoding for an ornithine carbamoyltransferase [24, 50]. Both are annotated in the EcN genome as argF, with locus tags CIW80_16625 and CIW80_16605. During construction, only the first gene was included as argF. Therefore, the second locus tag was included in the model as argI and linked to the same reaction (OCBT).
In addition to single essential genes, we also investigated synthetic lethality. Synthetic lethality arises when knocking out a combination of two or more genes results in cell death while knocking out the individual genes does not. The predictions of synthetic double lethals encompassed a total of 158 gene combinations in aerobic conditions and 157 in anaerobic conditions (Additional file 8: Data S6). The difference was the combination of hemN and hemF, two oxidases catalysing the same step in heme biosynthesis. The knockout of hemN is lethal in anaerobic conditions, as the product of hemF is oxygen-dependent. In aerobic conditions this knockout is not lethal, except when combined with a hemF knockout [51]. Overall, the essential genes differ very little between anaerobic and aerobic conditions. Predictions of these essential genes could aid with and be targets in genome engineering studies in the future, for example in knockout studies or biocontainment strategies.
Quality assessment of iHM1533 using Memote test suite
The final product, iHM1533, was tested using the Memote web application, a standard in the community for metabolic model testing [52] (Additional file 3: Fig S3). The analysis showed an overall score of 89%, which is a combination of the score on consistency and metabolite, reaction, gene, and SBO annotation. The consistency score was 97%. The score was lower than for iML1515, because in the secondary metabolite pathways of EcN a rest group (R) was added to multiple metabolites to represent the protein domains bound to the metabolite, resulting in mass and charge imbalances in iHM1533. Metabolite and reaction annotations scores were 84% and 83%, respectively. Nearly all genes have NCBI identifiers, but lack some other annotations (refseq, kegg, ccds, and hprd), resulting in a score of 68%. SBO terms had a score of 86%.
Further, we compared the Memote score of iHM1533 and iDK1463. The consistency score is the same, as an improvement in this score by the manual curation of the mass and charge balance in iHM1533 is negatively compensated by the imbalances resulting from the rest groups in the secondary metabolites. The annotations added as the last step of the curation resulted in a higher score for iHM1533 in all metabolite, reaction, and gene annotation scores. The SBO annotation was not updated and thereby gave a lower score for iHM1533. Overall the total score is the same, as the SBO score counts double.
Prediction of manipulation targets for secondary metabolites produced by EcN
The updated and validated GEM was used to investigate relationships between primary and secondary metabolic pathways and predict engineering targets for the over-production of secondary metabolites. The parsimonious flux balance analysis showed that iHM1533 can produce all secondary metabolites, except for aerobactin under anaerobic gut microbiome media. The first step of aerobactin biosynthesis involved oxygenation of lysine and thus required oxygen (MetCyc: META:1.14.13.59-RXN) [53, 54]. Therefore we used gut microbiome media with oxygen in order to analyse all secondary metabolite fluxes.
An algorithm called flux scanning with enforced objective function (FSEOF) was applied to identify targets for enhanced secondary metabolite production, while also allowing flux through biomass objective function [55]. The predicted targets from FSEOF analysis included reactions from various primary metabolism subsystems such as glycolysis/gluconeogenesis, various amino acid pathways, nucleotide salvage pathway, alternate carbon metabolism, and cofactor and prosthetic group biosynthesis, among others (Fig. 4, Additional file 3: Table S2). A combined total of 219 reactions that influence secondary metabolite production were predicted (Additional file 9: Data S7). Only 29 of 219 reactions were predicted as common targets for all secondary metabolites with 7 of them from the pentose phosphate pathway subsystem. On the other hand, 98 targets were predicted for only one of the secondary metabolites, including 40 for colibactin (24 of these from membrane lipid metabolism and 3 from methionine metabolism); 10 for salmochelin (3 of these from cofactor and prosthetic group biosynthesis); 30 for aerobactin (7 of these from threonine and lysine metabolism, where lysine is a key substrate for aerobactin); 9 for yersiniabactin (4 of these from methionine metabolism); and 9 for enterobactin (4 of these from nucleotide salvage pathway).
Further, we calculated the slope of flux variation of secondary metabolites against predicted FSEOF target reaction. A positive slope denotes increased flux through secondary metabolite as the flux through target reaction is increased and a negative slope denotes a decrease in secondary metabolite flux as the flux through target reaction is increased (Fig. 4, Additional file 9: Data S7). Reactions such as FBA, PFK, DHORDfum, and SUCDi were among the largest negative slopes common for all secondary metabolites, whereas PPA was one of the few reactions with a positive slope. Although many reactions for cofactor and prosthetic group biosynthesis were predicted as targets, the value of their slope was very low, indicating a minor impact on secondary metabolite biosynthesis. A total of 43 reactions from 10 different amino acid metabolism subsystems were part of the predicted target with mostly positive slopes. Reactions from cysteine and methionine metabolism were largely predicted as targets for yersiniabactin and colibactin biosynthesis. Reactions from tyrosine, tryptophan, and phenylalanine metabolism were predicted as targets for yersiniabactin, enterobactin, and salmochelin biosynthesis. Reactions from threonine and lysine metabolism were predicted specifically for aerobactin, whereas reactions from glycine and serine metabolism were largely predicted for all secondary metabolites but aerobactin. In general, these predictions of targets align with the amino acid substrates required as precursors for respective secondary metabolites.
Lastly, we carried out a biomass sensitivity analysis to measure the impact of the transition from primary to secondary metabolism across the subsystems. For this purpose, we created series of conditions where biomass reaction bounds were constrained to series of values between 0 and 100% of the optimal flux, while keeping the secondary metabolite exchange reaction as the objective function. This allowed us to measure reaction fluxes across all reactions at various stages going from no secondary metabolite production (100% biomasss) to maximum secondary metabolite production (0% biomass). The theoretically allowed flux ranges for all reactions at each biomass constraint were calculated using FVA, whereas the optimal solution flux was calculated using pFBA (Additional file 10: Data S8). To visualize the impact of primary to secondary metabolism, we visualised 10 reactions with the lowest slopes and 10 reactions with the highest slopes in the FSEOF analysis for each secondary metabolite (Additional file 3: Fig S4-S8). For example, there were a total of 9 targets predicted for improving aerobactin production from the subsystem of threonine and and lysine metabolism. Lysine is the key amino acid used as a precursor in aerobactin biosynthetic pathway. The predicted fluxes through reactions from threonine and lysine metabolism such as DAPE, SDPDS, DHDPRy, DAPDC increased signifincantly as the model was being optimised for aerobactin production over biomass, whereas the fluxes through SDPTA and ASAD from the same subsystem kept decreasing (Additional file 3: Fig S6). Our analysis thus shows the impact on the fluxes at systems level when the cell shifts from biomass to secondary metabolite production. In summary, this model can be used to investigate the link between primary and secondary metabolism and reactions can be predicted that provide targets for manipulation of secondary metabolites production.