 Methodology article
 Open Access
 Published:
The effects of model complexity and size on metabolic flux distribution and control: case study in Escherichia coli
BMC Bioinformatics volume 22, Article number: 134 (2021)
Abstract
Background
Significant efforts have been made in building largescale kinetic models of cellular metabolism in the past two decades. However, most kinetic models published to date, remain focused around central carbon pathways or are built around ad hoc reduced models without clear justification on their derivation and usage. Systematic algorithms exist for reducing genomescale metabolic reconstructions to build thermodynamically feasible and consistently reduced stoichiometric models. However, it is important to study how network complexity affects conclusions derived from largescale kinetic models built around consistently reduced models before we can apply them to study biological systems.
Results
We reduced the iJO1366 Escherichia Coli genomescale metabolic reconstruction systematically to build three stoichiometric models of different size. Since the reduced models are expansions around the core subsystems for which the reduction was performed, the models are nested. We present a method for scaling up the flux profile and the concentration vector reference steadystates from the smallest model to the larger ones, whilst preserving maximum equivalency. Populations of kinetic models, preserving similarity in kinetic parameters, were built around the reference steadystates and their metabolic sensitivity coefficients (MSCs) were computed. The MSCs were sensitive to the model complexity. We proposed a metric for measuring the sensitivity of MSCs to these structural changes.
Conclusions
We proposed for the first time a workflow for scaling up the size of kinetic models while preserving equivalency between the kinetic models. Using this workflow, we demonstrate that model complexity in terms of networks size has significant impact on sensitivity characteristics of kinetic models. Therefore, it is essential to account for the effects of network complexity when constructing kinetic models. The presented metric for measuring MSC sensitivity to structural changes can guide modelers and experimentalists in improving model quality and guide synthetic biology and metabolic engineering. Our proposed workflow enables the testing of the suitability of a kinetic model for answering certain studyspecific questions. We argue that the modelbased metabolic design targets that are common across models of different size are of higher confidence, while those that are different could be the objective of investigations for model improvement.
Background
Kinetic models of cellular metabolism can provide comprehensive understanding on the dynamics of the cell and its response to environmental changes and perturbations. In depth understanding of cellular metabolism can allow metabolic engineers to tailor cells according to sought specifications and objectives. This could enable the design of cell factories where flux is directed towards the production of biofuels, pharmaceuticals or other specialty chemicals. To be useful though, a kinetic model should represent the dynamics of the cell accurately enough to provide the required studyspecific knowledge [1]. To date, important strides towards building large and genomescale kinetic models of metabolism have been made [2,3,4,5]. Despite the emergence of methodologies for building kinetic models, the research community knows that several challenges remain to be confronted.
With larger and better quality kinetic models, the mathematical representations become increasingly complex. Furthermore, the parameter sensitivities of systems biology models are in general “sloppy” [6]. We have noticed that metabolic models are often built around certain central carbon pathways or, ad hoc reduced models of genomescale metabolic network models (GEMs) [7]. Such models do not account for the full information contained in the GEMs and, the ad hoc reduced models do not come with explicit explanations and justifications on how the model was reduced. Several studies have built kinetic models around ad hoc reduced models and computed Metabolic Sensitivity Coefficients (MSCs) for the system [1, 2, 8,9,10]. MSCs are desirable outputs of the kinetic models as they give insight into control patterns of the cell, assuming that the model is correct and accurate. However, Palsson and Lee showed with smallscale models that network complexity significantly affected the numerical values and the interpretation of MSCs [11]. Their study showed that three different red cell metabolic models produced MSCs that have opposite signs. This suggested that the analysis of incomplete metabolic models could lead to misleading and inaccurate information.
It is important to preserve certain level of detail in a metabolic model in order for its predictions to be realistic. The model needs to include important carbon fluxes from the central carbon up to the biomass building blocks, which also includes the synthetic pathway of interest that we would like to engineer. The fact that energy and redox are used by the synthetic pathway of interest makes it important that we account for “all” the reactions and subsystems that carry significant energy and redox fluxes, such as a detailed tricarboxylic acid (TCA) cycle and electron transport chains (ETCs). Keeping such level of detail is essential in order to avoid making false conclusions from kinetic models.
However, nowadays algorithms for reducing GEMs in a more systematic and complete manner are starting to emerge [7, 12,13,14,15]. DRUM [14] and MinNW [15] are algorithms that allow the reduction of GEMs but, they do not conserve the feasible flux ranges of the model being reduced. The NetworkReducer algorithm aims to reduce the network around certain “protected” metabolites and reactions by iteratively removing reactions that do not obstruct their activity [13]. Nevertheless, the NetworkReducer does not consider alternative subnetworks that could characterize the GEM being reduced. Ataman et al. developed the redGEM and lumpGEM algorithms which allow reduction of GEMs around selected subsystems by retaining linkages and the information captured in GEMs [7, 12]. The algorithm performs consistency checks with the GEM to ensure that the reduced model is consistent in terms of flux profiles, essential genes and reactions, thermodynamic feasible ranges of metabolite concentrations and ranges of Gibbs free energy of reactions. The redGEM and lumpGEM algorithms can be used to build thermodynamically feasible models with different levels of complexity consistent with the GEM for the same chosen subsystems, whilst considering alternative subnetworks that could be feasible. These algorithms open up the possibility to investigate how MSCs are affected by model complexity for consistently reduced models by building kinetic models around them. For further discussions about available model reduction algorithms, we refer the reader to a recent review [16].
This study investigates the effect of kinetic model complexity – in terms of reaction network size – and its effect on metabolic engineering conclusions derived from MSCs. The elements of complexity that are introduced in the models here are based on the choice of reactions and pathways that are brought into the system before reduction. We used the redGEM and lumpGEM algorithms to reduce the E. coli iJO1366 GEM to three different models, namely D1, D2 and D3, encompassing 271, 307 and 327 enzymatic reactions and 160, 188 and 197 metabolites, respectively. The thermodynamic formulation of the stoichiometric models allowed integration of fluxomics and metabolomics data for aerobically grown E. coli (see Additional file 1) [17]. Due to the topological differences between the three models, we proposed a technique for scaling up the flux profile and concentration vector reference steadystates from D1 into the larger models D2 and D3. This scaleup procedure ensures physiological equivalency of the models by assuring that their steadystates are numerically similar. All the three models satisfy thermodynamic constraints and are consistent with the GEM. We used the Optimization and Risk Analysis of Complex Living Entities (ORACLE) workflow to construct populations of kinetic models for D1, D2 and D3 around their scaled reference steadystates. Due to the uncertainty in the kinetic parameters, their largely unknown ranges and the highdimensionality nature of the system, there are multiple possible parameterizations of the kinetic models. Hence, we consider populations of kinetic models in order to account for the multiple possible scenarios arising from uncertainty in parameterization, hereby reducing bias. We fixed kinetic parameters from the smaller model into the larger one to further ensure equivalency of the models and hence a fair comparison. As integral part of the ORACLE workflow, we compute the MSCs for the stable kinetic models. The nested nature [18, 19] of the reduced models allows us to methodically compare the MSCs across the three models. We demonstrate that, even when the model preserve certain minimum assumptions of the real world biological system, MSCs are sensitive to model complexity.
The methodology presented in this manuscript allowed us to study the sensitivity of systematically reduced models of aerobically grown E. coli. The models were specifically designed to be equivalent variants representing the central carbon metabolism, with only incremental changes in the model size that come along as we extend the level connectivity of the involved subsystems. However, this approach could be applied to models reduced in an ad hoc fashion, as well as other physiologies and organisms. The metrics presented can be used to assess the adequacy of a network for metabolic engineering based on MSCs. If the conclusions derived from MSCs—relevant for strain design of a given biological system—are very sensitive to model complexity, the modeler can decide to improve/reassess the model. Hence, this pipeline can serve as a tool to test and ameliorate model quality for metabolic engineering applications.
Results
Reduced E. coli models
We applied redGEM and lumpGEM algorithms [7, 12] to systematically derive nested, reduced, E. coli stoichiometric models (Methods) from the iJO1366 GEM [20]. We selected glycolysis, pentose phosphate pathway (PPP), tricarboxylic acid (TCA) cycle, glyoxylate cycle, pyruvate metabolism and electron transport chain (ETC) as the subsystems (as defined in the iJO1366 GEM [20]) around which reduction was performed to different degrees of connection D, similarly to Ataman et al. [7]. D corresponds to the distance between pairs of selected subsystems. The selected subsystems contain the 12 essential biomass precursors defined by Neidhart et al. [21] and capture the central carbon metabolism of E. coli. Reduced stoichiometric models D1, D2 and D3 interconnect the pairs of subsystems with up to one, two and three reactions, respectively (Fig. 1). Hence, the reactions added by the expansions are entirely based on graphsearch. The D1, D2 and D3 cores were connected to biomass production via lumped reactions, generated by the lumpGEM, to characterize the rest of the GEM (further discussion on lumped reactions around Fig. 2 later in this section).
The additional reactions in D2 include xylose isomerase (XYLI2), hexokinase dfructose (HEX7) and dfructose 6phosphate phosphatase (F6PP), that connect dglucose with dfructose 6phosphate via dfructose. D2 also includes the maltodextrin system which connects the dglucose to dglucose 1phophate via the maltodextrin phosphorylase and maltodextrin glucosidase reactions. In D2, dihydroxyacetone phosphate can react to methyglyoxal, which in turn can react to dLactate, providing increased connectivity between glycolysis and the pyruvate node. Additionally, pyruvate can react to 2succinyl5enolpyruvyl6hydroxy3cyclohexene1carboxylate, which can react to form 2oxoglutarate, thus connecting the TCA cycle with the pyruvate metabolism. D2 also includes three different ways to connect with two reactions from fumarate to laspartate, which—via argininosuccinate, adenylsuccinate and adenylosuccinate—further link the TCA cycle with the ETC. The adenylate kinase (ADK3), nucleosidediphosphate kinase (NDPK1) and nucleosidetriphosphatase (NTP3) enzymes provide D2 model with additional flexibility in the system’s energy metabolism.
D3 has additional reactions enabling the transformation of methylglyoxal into dlactate and llactate. Methylglyoxal is a hub metabolite that provides connectivity between upper glycolysis to the pyruvate node. The pyruvate and phosphoenolpyruvate nodes are connected to the TCA cycle via chorismate. Fruthermore, the glutamine and glutamate synthases provide additional flexibility in allowing conversion between lglutamate and lglutamine. In D3 the presence of AMP nucleosidase (AMPN) provides an additional connection between the PPP and the ETC. However, the expansion from D1 to D2 resulted in more central carbon metabolites that change in connectivity (Fig. 2A) than the expansion from D2 to D3 (Fig. 2B). Several hub metabolites like methylglyoxal, isochorismate, pyruvate, dlactate and 2oxoglutarate change in connectivity between the three models.
Thermodynamicbased variability analysis
Within the thermodynamic formulation [22] of the stoichiometric models D1, D2 and D3, we integrated fluxomics and metabolomics data for aerobically grown E. coli (see Additional file 2). Several assumptions were made on reaction directionalities, based on literature [17, 23,24,25,26], to further constrain the models (Methods). We performed a thermodynamicbased variability analysis (TVA) [27] on D1, D2 and D3 and we found they had 9, 17 and 18 bidirectional reactions, respectively (see Additional file 3). We noticed that allowable TVA ranges for fluxes and concentrations appeared to differ more between D1 and D2, than between D2 and D3. These differences generally occurred around regions where the network expansion added new branching points. For further discussions on this topic, we refer the reader to our supporting documentation (see Additional file 4).
Model equivalency
Despite the inclusion of omics data for aerobically grown E. coli, D1, D2 and D3 remained underdetermined systems, resulting in the existence of multiple alternative steadystates that can characterize the studied E. coli physiology. A representative steadystate is required for the flux profile and for the metabolite concentration vector, to build a kinetic model around the selected steadystate. Furthermore, in light of benchmarking the outputs of kinetic models, the models are required to themselves be as equivalent to each other as possible to allow for an unbiased comparison. Hence, their representative steadystates were kept similar so that the models describe the same operational state of the cell.
Scaling up steadystates
We sampled the flux and the concentration solution spaces for D1 and we used PCA to select representative steadystates (Methods). To preserve equivalency across the kinetic models, it was desirable that the flux profile and the concentration vector steadystates in D2 and D3 resemble the ones selected in D1. The nested nature of the core models generated with redGEM eased the transferability of steadystates across models, allowing us to preserve similar values for fluxes and concentrations for the overlapping reactions of the three models.
We connected the core models to the biomass building blocks (BBBs), as defined by Neidhart et al. [21], via lumped reactions generated with the lumpGEM algorithm by applying approaches developed by Ataman and Hatzimanikatis [12]. A lumped reaction is a reaction that collapses a subnetwork of reactions into one massbalanced reaction. D1–3 had 247, 189 and 196 lumped reactions, respectively. The models’ lumped reactions are indeed not the same across D1–3. Consequently, lumped reactions impose certain stoichiometric constraints that can require flux to pass through alternative metabolic routes within the models. For instance, a BBB can be produced by a completely different lumped reaction (Fig. 2C), as we can generate it via a different subnetwork of reactions in the systems with larger cores. Thus, having distinct lumped reactions results in the redistribution of the flux profiles across models. An example of this is the hub metabolite methylglyoxal that provides new alternatives for lumped reactions in D2 and D3, thus contributing to differences in flux distribution across the models.
We studied the lumped reactions in D1–3 and observed that 103 were common between the models (Fig. 2D). D1, D2 and D3 have 126, 57 and 66 lumped reactions that are unique to themselves. D1 requires considerably more lumped reactions in order to produce the BBBs from the core subsystems. If we consider the lumped reactions as subnetworks of reactions, 474, 453 and 458 reactions are used to build the lumped reactions of D1–3, respectively (Fig. 2E). Interestingly, 446 reactions are common between the pools of reactions that constitute the lumped reactions of D1–3. It may appear unexpected that D3 had more lumped reactions than D2. However, this can occur when more “shorter” lumped reactions—that are composed of lesser reactions—are required to produce a given BBB.
In order to ensure equivalency between D1–3, we proposed a procedure that uses a Mixed Integer Linear Programming (MILP) formulation that imposes similarity between the representative steadystates of the models (Methods). The D2 fluxes of central carbon reactions are within below one percent deviation from the reference flux of D1 (see Additional file 5), except for the exporter of dAlanine (DALAtex) that deviates by 8% (Table 1). The only central carbon fluxes in D3 that deviate from D2 reference flux with more than one percentage are transaldolase (TALA) and xylose isomerase (XYLI2) with 4.5% and 33.2% respectively (Table 1). Other larger deviations occur in transport (periplasm to cytoplasm and extracellular to periplasm) reactions that carry a considerably lower flux such as transporters of lserine, succinate, ltryptophan, ltyrosine, lvaline and zinc (Table 1).
The concentration profile of D2 is within one percent of D1 reference steadystate, except for ADP, CoA, Sdihydroorotate and lglutamine with 16%, 45%, 303% and 94% deviations from D1 (Table 2). On the other hand, the D3 metabolite concentration steadystate is within one percentage from the D2 metabolite concentration vector. The nested nature and the consistency of redGEM and lumpGEM algorithms in GEM reduction allowed the steadystates to be transferred and communicated between models efficiently.
Equivalence in kinetic parameters
We constructed kinetic models around the selected reference steadystates of D1–3 using the ORACLE workflow [3, 28,29,30]. Uniform Monte Carlo sampling of the degrees of saturation of the enzyme active sites allowed us to study the kinetic parameter space, as proposed by Wang et al. [28]. The local stability of the models generated was tested by verifying that the eigenvalues are not positive. We first sampled 50,000 stable kinetic models for D1. To ensure equivalency at kinetic parameter level between D1–3, we adapted the ORACLE workflow to allow fixing the sampled saturation states from one model to another (Methods). From the 50,000 stable D1 kinetic models, we found 96.1% (48,080) to be stable in D2, of which 98.4% (47,299) were stable in D3. We then computed the MSCs for these stable models in order to compare how MCAbased decisions are affected by metabolic network size.
Consistency in MCA across models
Ranking enzymes for flux control
Some fundamental cellular tasks for a given physiology include metabolite excretion, substrate uptake and cellular growth, μ. As we studied the physiology of optimally grown E. coli, we considered control over μ across models to assess the consistency in conclusions based on MSCs. The flux control coefficients (FCCs) of μ were ranked for D1–3 based on their absolute means across stable models. The models were compared pairwise in increasing order of size (i.e. D1 versus D2, and D2 versus D3) to assess the impact of systematic network expansion on MSCs (Fig. 3).
The cellular growth FCCs with respect to glucose6phosphate isomerase (PGI), phosphofructokinase (PFK) and ATP maintenance (ATPM) are the most consistent in terms of sign and magnitude when comparing D1 with D2 (Fig. 3A). Pyruvate kinase (PYK), fructose biphosphate aldolase (FBA) and 2oxogluterate dehydrogenase (AKGDH) are also in agreement in terms of sign but magnitude can differ significantly. Some enzymes have control in D1 but no control in D2, such as ribulose 5phosphate 3epimerase (RPE) and phosphoglycerate mutase (PGM). Others, vice versa, have control in D2 but no control in D1 such as phosphoglycerate kinase (PGK) and glucose 6phosphate dehydrogenase (G6PDH2r). Ribose5phosphate isomerase (RPI), on the other hand, has opposing control on cellular growth in the two models. Differences in FCCs of cellular growth between D1 and D2 suggest that the expansion of D1 to D2 significantly affects the control scheme. When we compare FCC values pairwise between D1 and D2, we note that numerical values can be relatively dissimilar (see Additional file 4 for supporting information). Hence, differences can still be observed after preserving model equivalency.
We then compared top cellular growth FCCs in D2 and D3, which are in great sign and magnitude agreement (Fig. 3B). PGI, PFK and PYK are the top three enzymes in terms of cellular growth control according to both D2 and D3. The consistency between these FCCs suggests that the expansion of D2 to D3 does not affect the control pattern as significantly as the network expansion from D1 to D2. An analogous analysis was carried out for the flux control of glucose uptake and, the excretions of acetate and formate (see Additional file 4 for supporting information), and we observed a similar trend. The differences in control patterns appear to be more significant when expanding from D1 to D2, but of lesser importance when expanding from D2 to D3. This finding could suggest that entire genomescale kinetic models are not necessary to capture the essential physiological features of a cell as long as the model reduction is done systematically around carefully selected subsystems that are pertinent to the study. However, this could also mean that D1 is possibly missing on some information for performing MCA around growth. It is difficult to draw more conclusions as we can only compare what is topologically shared between two models. Clearly, a studyspecific resolution criterion in terms of model size/complexity that has to be met needs to be established before a model is used for further analysis.
MCA consistency across reduced models
As the study above revealed, certain flux control patterns can change significantly between models due to network complexity. We tried to locate, analyze and understand the differences and the similarities in MSCs that occur due to the topological alterations in kinetic model complexity. According to MCA theory, the FCCs conform with the summation theory [31, 32]. We proposed a deviation index (DI) that provides a quantitative measure on how much a reaction’s FCCs differ between two models, postulated from the summation theory (Methods). The DI served as a metric to classify reactions with respect to their consistency in FCCs across the reduced models.
We estimated the DI of 271 common enzymatic reactions when expanding from D1 to D2 to predict deviations in FCCs for the system. Reactions with the lowest DI (0–25 percentile) were mostly from the central carbon metabolism (Fig. 4). The reactions with the highest DI (75–100 percentile) were mostly located in the ETC. The only central carbon metabolism reactions having a high DI were TALA, acetylCoA synthase (ACS), phosphoenolpyruvate synthase (PPS) and NAD malic enzyme (ME1). TALA produces dfructose 6phosphate and, PPS and ME1 involve transformation of pyruvate. dFructose 6phosphate and pyruvate are both central carbon metabolites around which the expansion adds reactions (Figs. 2, 4). ACS is only one reaction away topologically from pyruvate, around which the expansion adds a reaction (Figs. 2 and 4).
We repeated the above analysis for D2 and D3, where we analogously compute the DIs for the 307 common enzymatic reactions (Methods). Similar observations were made for the reactions having low DIs (0–25 percentile) as most were located in central carbon metabolism, within the subsystems around which reduction was performed (Fig. 5). The reactions with higher DIs (75–100 percentile) are predominantly located around the ETC, with the exception of several reactions pertaining to central carbon metabolism. As with the previous analysis of D1 versus D2, ME1 and ACS had high DIs. The D3 expansion adds reactions around pyruvate (Figs. 2, 5), which could explain this observation. Aspartate transaminase (ASPTA), phosphoenolpyruvate carboxykinase (PPCK) and succinate dehydrogenase (SUCDi) from the central carbon metabolism exhibited high DIs (Fig. 5). ASPTA is directly connected via 2oxoglutarate with NADPH glutamate synthase (GLUSy), which is a newly added reaction by the D3 expansion. PPCK is connected via polyenolpyruvate to 3phosphoshikimate 1carboxyvinyltransferase (PSCVT), another add by the D3 expansion. Furthermore, SUCDi is topologically connected with the added reaction ubiquinone lLactate dehydrogenase (lLACD2), as cofactors ubiquinone8 and ubiquinol8 partake in both reactions. Interestingly, periplasmic glucose dehydrogen (GLCDpp), where ubiquinone8 and ubiquinol8 also participate, has a high DI as well. GLCDpp possibly causes its neighbouring reactions gluconokinase (GNK) and dgluconate transport (GLCNt2rpp) to have high DIs too, due to stoichiometric coupling. These observations suggest that alterations in flux split ratios around important branching points—caused by network expansion—could result into higher DIs in reactions at their vicinities.
Importance of flux splitting nodes
Overall, lower DIs were observed for reactions having a higher flux, pertaining to the core central carbon metabolism around which the models D1–3 were reduced (Figs. 4, 5). Since the cores of the reduced models contain the 12 precursor metabolites for biomass, their control patterns were expected to be similar. Stephanopoulos and Vallino point out that metabolic pathways of organisms have evolved over time to resist flux alterations at branching points [33]. The control architecture of an organism is built such that it preserves the flux splitting ratios of essential metabolic nodes. However, if two models have differences in the number of reactions and/or in the flux splitting ratios around an important branching point, the control architecture of the two systems can differ considerably.
Since we studied optimally grown E. coli, it was expected that the D1 to D2 expansion with the addition of XYLI2, F6PP and HEX7 would have influence on control patterns: the flux splitting ratios around the essential biomass precursor dfructose 6phosphate is altered. dFructose 6phosphate is a critical metabolic node for producing cell wall biomass building blocks and is located relatively upstream in the process of glucose catabolism. Altering flux splitting ratios around dfructose 6phosphate will have direct implications on the fate of the carbon flow across the whole network, particularly due to its upstream location.
The expansion from D2 to D3 results in different flux splitting ratios around three biomass precursors: pyruvate, polyenolpyruvate and 2oxoglutarate. Again, we can expect flux control patterns across the models to differ as the proportion of carbon flow directed towards certain biomass building blocks is affected. However, within the central carbon metabolism, these precursors are located relatively downstream to the glucose uptake, compared to for instance dfructose 6phosphate. Consequently, we can expect that these flux splitting ratios have less impact on the growth control of the system than dfructose 6phosphate. If we were discussing the production of certain amino acids of interest, rather than just cellular growth, these ratios could be of higher importance to the analysis. The significance of a metabolic node is strongly subject to the scope of the study. Hence, it is difficult to imagine a “onesizefitsall” model due to the complexity of the problems encountered in metabolic engineering.
Indeed, the importance of a metabolic branching point is very studyspecific as objectives can vary significantly. Had we, for instance, been interested in the study of dlactate production, it would have been essential to include the metabolism of methylglyoxal, dlactate and llactate into the subsystems around which model reduction is performed. However, as we are not interested in the production of dlactate, we are not that concerned about the high DI of dlactate transporter (dLACt2pp) when comparing D2 and D3 (Fig. 5). Furthermore, if we were interested to produce dlactate, it would be essential to consider implication of attempting to deviate flux towards the metabolism of dlactate. If the redirection of flux towards dlactate imposes important changes in the flux splitting ratios of significant metabolic nodes of wildtype E. coli, it may be worth considering other organisms that cause fewer modifications in flux distribution [33, 34].
Study of uncertainty in MCA
The MSCs of D1–3 were further studied by comparing their absolute deviations in the FCCs. We considered with respect to the central carbon subsystems to find which central carbon enzymes contributed in most uncertainty across the networks. The FCCs of reactions in the glycolysis (Fig. 6A, B) appear to have most absolute deviation stemming from enzymes in the glycolysis and in the PPP. In both comparisons (Fig. 6A, B), glycolysis contributes the most to this deviation. However, in the expansion from D2 to D3 (Fig. 6B), this contribution to the deviation is of a considerably smaller magnitude than in the expansion from D1 to D2 (Fig. 6A). Again, the additional connections around dfructose 6phosphate (Figs. 1 and 2) when expanding from D1 to D2 could explain this. Differences in flux splitting ratio around dfructose 6phosphate affect the redistribution of the flux in the network and hence the control pattern. Generally, reactions with a larger flux exhibit less absolute deviations in their FCCs. This parallels the observation that central carbon reactions carrying higher flux are perhaps more rigid in control patterns.
We perform a parallel analysis on FCCs of PPP reactions and similar observations were made. In the expansions from D1 to D2 (Fig. 7A) and D2 to D3 (Fig. 7B), glycolysis contributed the most to the absolute deviation of the FCCs of PPP reactions. However, the contribution of the glycolysis enzymes was considerably smaller in magnitude in the expansion from D2 to D3 (Fig. 7B) than in the one from D1 to D2 (Fig. 7B). Again, reactions carrying higher flux have less absolute deviation in their FCCs between the pairs of models. We analyzed FCCs individually in terms of absolute deviation (see Additional file 6), for both pairs D1 and D2 as well as, D2 and D3. PGI, TPI and PFK were the top three central carbon enzymes that resulted in the most absolute difference in flux control across the network. From the PPP enzymes, RPI resulted in the most absolute deviation in flux control. We also recall that RPI had signwise opposing control on cellular growth in the comparison of D1 and D2 (Fig. 3A). Due to the highly nonlinear nature of the studied systems, it is difficult to make direct conclusions on the causality of the observed deviations in control patterns of the networks. Most of the deviations were observed amongst peripheral transport reactions, rather than central carbon metabolism (see Additional file 6). Nevertheless, we could still find metabolic engineering decisions relevant to our study, independent of the complexity based on MCA outputs (Fig. 3).
Discussion
Using a kinetic model that is lacking adequate complexity level can result in modelers making false prediction. However, systematic assessment of the impact of model complexity on the conclusions derived from a kinetic model of metabolism has not been carried out in literature to date. Many degrees of freedom exist as the exact metabolic flux distribution, metabolite concentration levels, kinetic mechanisms and the required kinetic mechanism parameters are not fully characterized for biological systems. This multiplicity in sources of uncertainty makes it difficult to study the impact of model complexity on kinetic model conclusion. Hence, model equivalence has to be preserved in order to study the effects of network size on these conclusions. Additionally, certain minimum level of model complexity, such as network parts carrying important carbon flux and redox potential, is required in order for the model predictions to be realistic. We hereby address these issues by demonstrating the effect of network size on conclusions derived from systematically reduced kinetic models, whilst conserving maximum model equivalence.
In this work we study the impact of model complexity on the metabolic engineering decisions derived from MSCs. The redGEM and the lumpGEM algorithms were used to consistently reduce the E. coli iJO1366 GEM. Omics data for the physiology of optimally grown E. coli was integrated into the reduced stoichiometric models. The nested nature of the reduced models assisted us in the development of a workflow allowing to preserve maximum equivalence between the flux profile and metabolite concentration steadystates. The ORACLE framework was used to generate populations of stable kinetic models around these reduced stoichiometric models. Our workflow ensured that we preserve equivalency amongst the populations of the kinetic parameters for the stable kinetic models. The MSCs were computed within the ORACLE framework for the populations of stable kinetic models. Analysis of the MSCs, revealed that we can derive contextspecific metabolic engineering conclusions that are independent of the model’s complexity, as long as the reduction is performed consistently.
The “usefulness” of a kinetic model is highly dependent on the objectives of the study being undertaken. We selected the subsystems for the GEM reduction such that we: (1) cover the essential biomass precursor metabolites according to Neidhart as we focused primarily on cellular growth control and, (2) that we capture the ETC essential to account for redox potentials.
To isolate and study the effect of model size on MSCs we sought to generate models that share the same backbone in terms of subsystems that carry significant fluxes and are central to carbon energy and redox metabolism but differ in the degree of connectivity of these subsystems. The motivation behind this selection for this study lays in our general approach for the systematic generation of context specific models: at first we include all the subsystems that are relevant to the study at hand, and then we investigate to what extent additional levels of connectivity impact the model sensitivity characteristics. The addition of reactions around dfructose 6phosphate when expanding from D1 to D2 appeared to significantly affect growth control patterns (Fig. 3A). However, the expansion from D2 to D3 had considerably less impact as top cellular growth FCCs are consistent (Fig. 3B), which suggests that D2 could be a reasonable model choice in terms of complexity for predicting growth control patterns. As dfructose 6phosphate is an essential precursor for cell wall fabrication, a network expansion affecting flux distribution around it can be expected to have significant impact on cellular growth control structure. Hence, it is essential to consider the importance of certain metabolic nodes with respect to the study goals in order to ensure no information is lost in the reduction. Again, importance of a metabolic node is strongly influenced by the nature and the objectives of the analysis.
The MCA summation theorem was used to postulate a deviation index (DI) that gave a numerical indication on the consistency of the FCCs with respect to a reaction. Most of the reactions around central carbon metabolism, carrying a higher carbon flux, appeared to have lower DIs. Flux control for reactions with larger fluxes were more robust, particularly if the number of connecting reactions did not change between models for the metabolites participating in the reaction. The larger DIs were noted in the ETC and peripheral reactions. Nevertheless, the consistency in the control patterns of network regions that carry larger carbon flux was consistent across the reduced models when the DI was low, suggesting that their MSCbased predictions are independent of the network complexity. Thus, the DI can be used to study the structural robustness of a kinetic model.
We could argue that the larger the kinetic model is, the more confident and robust the metabolic engineering decisions derived from the model are. However, using our methodology, a modeler can use the DI as an indicator of structural robustness of the system to assess the confidence and quality of the model’s metabolic engineering predictions. In this context we suggest that the experimental design should first target the steps that are consistently better targets across different sized models. Next, we should focus on identifying to what reactions the model predictions are more sensitive to in the larger models, allowing the identification of changes that are responsible for the redistribution of control across the system. Such investigation will serve as a focused analysis for modeling and experiments – understanding which new reactions and pathways, when added during size increase, impact the control distribution in the reference pathways can provide a great insight on how structural changes in a biological network change its function, beyond single enzyme over/downregulation.
Conclusions
Failing to preserve certain level of model complexity when constructing kinetic models can result in erroneous conclusions. Model reduction – whether ad hoc or systematic – is a necessary step when constructing kinetic models and could result in false predictions if not done appropriately. To our knowledge, we propose for the first time a workflow for systematically constructing largescale kinetic models and tailor their size to match the requirements of the studied biological system. We suggested the deviation index as a metric that highlights differences across model MSCs and serves as an indicator in testing kinetic model size adequacy. The nested nature of the reduced models enabled the maintenance of maximum equivalence between the steadystates and kinetic parameters of the populations of kinetic models. This allowed us to assess the impact of model size on the MSCbased conclusions. We showed that despite consistent model reduction and preserving model equivalency, control coefficients can be significantly affected by network size. However, our method can be used to study and assess the adequacy of models based on control coefficients. As systematic model reduction algorithms gain momentum in the field, we hope to pave a path towards building more robust and transferable kinetic models for the community. Classical statistical model assessment tools could be additional avenues to explore when studying structural robustness of models.
Methods
We developed a workflow for building consistently reduced kinetic models from a genomescale metabolic model (Fig. 8). We used the redGEM algorithm to construct core models of increasing network size from the E. coli iJO1366 genomescale model. The lumpGEM algorithm was used to generate lumped reactions for the biosynthesis of biomass building blocks (BBBs) for these models. We used thermodynamicbased variability analysis (TVA) [27] to study the flexibility of the models. We proposed a procedure for scaling up the flux and concentration steadystates from one model to another one using the MILP formulation. The ORACLE framework was enhanced, allowing us to keep parametric equivalency between the populations of kinetic models around the steadystates of the reduced models. These steps are further detailed below.
Model reduction
The stoichiometry of the core networks was defined with the redGEM algorithm, which reduces systematically genomescale model reconstructions of metabolism [7]. The E. coli iJO1366 genomescale model was reduced, with aerobic minimal media, glucose as the sole carbon source, and the selected starting subsystems corresponding to central carbon metabolism (glycolysis/gluconeogenesis, citric acid cycle, pentose phosphate pathway, pyruvate metabolism, and glyoxylate metabolism). We incorporated all the reactions that use metabolites of the quinone/quinol pools (ubiquinone, ubiquinol, menaquinone, menaquinol, 2dimethyl menaquinone and 2dimethyl menaquinol) as the electron transport chain subsystem in order to account for the energy metabolism of the system. redGEM allows the user to define a degree of connection, D, to define the level of connectivity of the core. D is an input parameter of the redGEM and lumpGEM. D corresponds to the number of reaction required to connect the pairs of metabolites between starting subsystems, as defined in [7]. For the purpose of this study we sought to generate nested models that all share the same topology of the central carbon metabolism, but differ only in terms of the reactions that connect the abovementioned subsystems. We generated core networks with a D of 1, 2 and 3, which gave rise to models D1, D2 and D3 respectively. The lumpGEM algorithm [12] was used to generate lumped reactions for the biosynthesis of the BBBs for these core networks. Lumped reactions are subnetworks of reactions composed of noncore reactions that can be used to produce a BBB. Alternative lumped reactions were kept for each of the BBBs. Reactions that could not carry flux were considered as blocked and were removed.
For some of intracellular metabolites, a corresponding transport reaction has not been biochemically characterized and does not appear in the E. coli iJO1366 and in our reduced model. However, these metabolites, unless they are highly polar or very large, are subject to passive diffusive transport through the cell membrane. Therefore, we explicitly added transport reactions for these metabolites that operate at least at basal level (10^{−6} mmol/(gDW*h)).
redGEM and lumpGEM algorithms [7, 12] have been made available under the following GitHub repository: https://github.com/EPFLLCSB/redgem.
Flux directionality assumptions
As we model the same aerobically grown E. coli physiology as in our previous study [35], we make these same directionality assumptions for several bidirectional reactions:

FBA in midlower glycolysis operates towards catabolism [23].

Magnesium and phosphate transporters are both set towards uptake [24, 25].

Acetate kinase (ACKr) and phosphotransacetylase (PTAr) operate towards acetate production, as acetate is a byproducts [17].

SuccinylCoA synthetase (SUCOAS) operates towards producing succinate [17].

Polyphosphate kinases (PPK2r, and PPKr) are set towards the polyphosphate polymerization [26].
These directionality assumptions were made in agreement with the modeled physiology. Nevertheless, our workflow could be applied to other physiologies and under other directionality assumptions.
Thermodynamic analysis
The available fluxomics and metabolomics data for the optimal growth of E. coli under aerobic conditions and minimal media was integrated in our models. The MILP formulation of the thermodynamicsbased flux analysis was used to implement these data into D1, D2 and D3. Since the models were used to build kinetic models, it was undesirable for reactions to be at thermodynamic equilibrium, which would result in them having equal backward and forward fluxes. We imposed MILP constraints to ensure that the thermodynamic displacement, Γ [28, 31, 36], is not at equilibrium. For reactions near equilibrium Γ ≈ 1. This constraint ensures that we do not have reactions that have net fluxes equal to zero in our system.
Software used for performing thermodynamicbased flux analysis and TVA has been published [22] and is available in MATLAB, and Python 3, on GitHub: respectively, https://github.com/EPFLLCSB/matTFA and https://github.com/EPFLLCSB/pytfa.
Maximum equivalency between steadystates
We sampled the flux space of D1 in order to characterize the solution space without violating physiological, thermodynamic and directionality constraints. The convexity of the solution space enabled us to efficiently sample using the ArtificialCentering HitandRun sampler in the COBRA Toolbox [37, 38]. We sampled 10,000 flux vectors and used Principal Component Analysis (PCA) [39] to select a mean reference state. Nine components covering most of the variance were retained from PCA to select a sample closest to the projected mean. Similarly, we applied this approach for the concentration solution space of this selected flux profile. We selected a unique steadystate for the metabolite concentrations and metabolic fluxes for the demonstration purpose of this study. However, other steadystates could have been used and, in metabolic engineering, alternative steadystates should be considered as they can significantly affect conclusions [35].
In order to make the comparison of the models equitable, we wanted to maintain most similar steadystates between the models. For instance, for D2 we would like the flux vector to be the equal possible to the one from D1. Topological differences in the models make it impossible to have numerically exactly the same flux distribution in larger model for the same reactions. Hence, we take the representative flux from D1 and apply it with percentage relaxation with upper and lower bounds, F^{ub}_{rxn,i} and F^{lb}_{rxn,i} respectively, into D2. Consequently, we use an MILP formulation to minimize the number of violations of flux boundaries that we are trying to impose. For each intracellular reaction that is shared between the two models we create a binary variable z_{rxn, i} so that when it is equal to 1, the constraints that we impose become inactive. We add for each of these reactions the following constraints:
where UB and LB are the upper and lower bounds of the net fluxes NF of the reactions. We minimize the sum of the binaries, z_{rxn, i}, in order to have minimal violation of the flux constraints:
Minimize:
Subject to:
We implied a 1% relaxation to apply and test how many flux constraints we can impose without violation (minimal number of active binary variables z_{rxn, i}). After applying the constraints that are not violating model boundaries of D2, we proceed to sampling the solution space. We selected a sample based on mean PCA as with the representative flux of D1. We then implied in a similar manner the concentration profile from D1 into D2 with a 1% relaxation and sampled the concentration space for this flux profile. We repeat this procedure when scaling up the flux and concentration steadystates from D2 into D3.
Constructing kinetic models
Populations of kinetic models of metabolism can be constructed with any framework that allows the construction of ensembles of models, as discussed in a recent review [40]. We used the ORACLE framework [2, 3, 28,29,30, 36, 41,42,43,44,45] to build 50,000 kinetic models around the steadystates for D1, D2 and D3. Available kinetic properties of enzymes from the literature [46] and the databases [47, 48] were incorporated. Reversible Hill kinetics [49] and convenience kinetics [50] were used for reactions with unknown kinetic mechanism (see Additional file 4 for information about kinetic mechanisms and Additional file 7 for their usage across model reactions). Kinetic mechanisms with no or partial information about their parameter values were sampled within the space of kinetic parameters in the form of degree of saturation of enzyme [28]. We parameterized a population of kinetic models, performed consistency tests [28, 42, 51] and computed the MSCs [28, 52]. For further details on the ORACLE workflow the reader is referred to [2, 3, 28,29,30, 36, 41,42,43,44,45].
We preserved equivalency between populations of kinetic models for D1–3 by fixing the degree of saturation of enzymes from less complex models into the more complex models. We wanted to preserve model equality so that we can fairly compare MSCs of the models. Within the ORACLE framework, we added a feature for fixing the degree of saturation of enzymes. For the parameters that were common between D1 and D2, we fixed the degrees of enzyme saturations from D1 models into D2 models and we sampled the rest of the D2specific parameters uniformly, until we found a stable model. Hence, we preserved equivalency of the kinetic parameters between D1 and D2. Analogously, we repeated this procedure to imply the degrees of enzymes saturations from D2 into D3. Our stratified sampling approach allows the systematic scaling up and sampling of the parameters that we introduce with each network expansion. Consequently, this stratified sampling approach ensures that we focus on uncertainty introduced by the network expansion alone rather than the other common network parts. Ensuring numerical similarity between the parameters that are shared between two models permits this. Other methods such as a topdown approach of transferring parameters—from D3 to D2, and then from D2 to D1—could be used but this would introduce additional uncertainty from the larger topologies into the smaller networks.
MSC deviation index
The FCC is a measure of response of a flux to a perturbation in level of enzyme. We compute a FCC, \(C_{{p_{k} }}^{{v_{i} }}\), as follows:
where v is the flux across a reaction i and p is the concentration perturbation of an enzyme k.
In MCA the FCCs conform with the summation theorem defined in literature [31, 32]. The theorem implies that all the metabolic fluxes are systemic properties of the model and that their control is shared by all the reactions within the system. The summation theorem makes the assumptions that: (1) the parameters for which we compute flux control coefficients are of first order with respect to the flux, and that (2) the sum of a flux’s control coefficients with respect to all the parameters of the system is equal to one. Hence, the summation theory can be defined as follows [53, 54]:
where m corresponds to the number of enzymes of the system that can control a flux v.
We proposed a deviation index (DI) derived from the summation theorem to quantify the discrepancies in control patterns of a flux between two different models. We define DI as:
The value of DI will be zero due the summation theory of MCA if we sum over all m enzymes of our system. However, if we perform this computation of DI for all the enzymes m, except the enzymatic reactions added to the system by a model expansion, we can obtain a deviation from zero. This happens if these enzymatic reactions that were added to the system by the expansion exhibit control over the flux v being studied. Hence, if the DI value is not zero for a model expansion, this could suggest that some of the added reactions are important in terms of control to the system.
Availability of data and materials
The datasets generated and/or analysed during the current study are available in the Zenodo repository, https://doi.org/10.5281/zenodo.4587693.
Abbreviations
 ACS:

AcetylCoA synthase
 ADP:

Adenosine diphosphate
 AKGDH:

2Oxogluterate dehydrogenase
 AMP:

Adenosine monophosphate
 AMPN:

Adenosine monophosphate nucleosidase
 ASPTA:

Aspartate transaminase
 ATP:

Adenosine triphosphate
 ATPM:

Adenosine triphosphate maintenance
 BBB:

Biomass building block
 COBRA:

Constraintbased reconstruction and analysis
 DI:

Deviation index
 DRUM:

Dynamic reduction of unbalanced metabolism
 ETC:

Electron transport chain
 FBA:

Flux balance analysis
 FCC:

Flux control coefficient
 GEM:

Genomescale metabolic reconstruction
 GNK:

Gluconokinase
 MCA:

Metabolic control analysis
 MILP:

Mixed integer linear programming
 MSC:

Metabolic sensitivity coefficients
 NAD:

Nicotinamide adenine dinucleotide
 NADPH:

Nicotinamide adenine dinucleotide phosphate hydrogen
 ORACLE:

Optimization and risk analysis of complex living entities
 PCA:

Principal component analysis
 PFK:

Phosphofructokinase
 PGI:

Glucose6phosphate isomerase
 PGK:

Phosphoglycerate kinase
 PGM:

Phosphoglycerate mutase
 PPCK:

Phosphoenolpyruvate carboxykinase
 PPP:

Pentose phosphate pathway
 PPS:

Phosphoenolpyruvate synthase
 PSCVT:

Polyenolpyruvate to 3phosphoshikimate 1carboxyvinyltransferase
 PYK:

Pyruvate kinase
 RPE:

Ribulose 5phosphate 3epimerase
 RPI:

Ribose5phosphate isomerase
 SUCOAS:

SuccinylCoA synthetase
 TALA:

Transaldolase
 TCA:

Tricarboxylic acid
 TPI:

Triosephosphate isomerase
 TVA:

Thermodynamicbased variability analysis
References
 1.
Saa PA, Nielsen LK. Construction of feasible and accurate kinetic models of metabolism: a Bayesian approach. Sci Rep. 2016;6:29635.
 2.
Andreozzi S, Chakrabarti A, Soh KC, Burgard A, Yang TH, Van Dien S, et al. Identification of metabolic engineering targets for the enhancement of 1,4butanediol production in recombinant E. coli using largescale kinetic models. Metab Eng. 2016;35:148–59. https://doi.org/10.1016/j.ymben.2016.01.009.
 3.
Chakrabarti A, Miskovic L, Soh KC, Hatzimanikatis V. Towards kinetic modeling of genomescale metabolic networks without sacrificing stoichiometric, thermodynamic and physiological constraints. Biotechnol J. 2013;8(9):1043–57. https://doi.org/10.1002/biot.201300091.
 4.
Khodayari A, Maranas CD. A genomescale Escherichia coli kinetic metabolic model kecoli457 satisfying flux data for multiple mutant strains. Nat Commun. 2016;7:13806.
 5.
Khodayari A, Zomorrodi AR, Liao JC, Maranas CD. A kinetic model of Escherichia coli core metabolism satisfying multiple sets of mutant flux data. Metab Eng. 2014;25:50–62.
 6.
Gutenkunst RN, Waterfall JJ, Casey FP, Brown KS, Myers CR, Sethna JP. Universally sloppy parameter sensitivities in systems biology models. PLoS Comput Biol. 2007;3(10):e189.
 7.
Ataman M, Gardiol DFH, Fengos G, Hatzimanikatis V. redGEM: systematic reduction and analysis of genomescale metabolic reconstructions for development of consistent core metabolic models. PLoS Comput Biol. 2017;13(7):e1005444.
 8.
Teleki A, Rahnert M, Bungart O, Gann B, Ochrombel I, Takors R. Robust identification of metabolic control for microbial lmethionine production following an easytouse puristic approach. Metab Eng. 2017;41:159–72.
 9.
Millard P, Smallbone K, Mendes P. Metabolic regulation is sufficient for global and robust coordination of glucose uptake, catabolism, energy production and growth in Escherichia coli. PLoS Comput Biol. 2017;13(2):e1005396.
 10.
Miskovic L, AlffTuomala S, Soh KC, Barth D, Salusjärvi L, Pitkänen JP, et al. A designbuildtest cycle using modeling and experiments reveals interdependencies between upper glycolysis and xylose uptake in recombinant S. cerevisiae and improves predictive capabilities of largescale kinetic models. Biotechnol Biofuels. 2017;10(1):166.
 11.
Palsson BO, Lee ID. Model complexity has a significant effect on the numerical value and interpretation of metabolic sensitivity coefficients. J Theor Biol. 1993;161(3):299–315.
 12.
Ataman M, Hatzimanikatis V. lumpGEM: systematic generation of subnetworks and elementally balanced lumped reactions for the biosynthesis of target metabolites. PLoS Comput Biol. 2017;13(7):e1005513.
 13.
Erdrich P, Steuer R, Klamt S. An algorithm for the reduction of genomescale metabolic network models to meaningful core models. BMC Syst Biol. 2015;9(1):48.
 14.
Baroukh C, MuñozTamayo R, Steyer JP, Bernard O. DRUM: a new framework for metabolic modeling under nonbalanced growth. Application to the carbon metabolism of unicellular microalgae. PLoS One. 2014;9(8):e104499.
 15.
Röhl A, Bockmayr A. A mixedinteger linear programming approach to the reduction of genomescale metabolic networks. BMC Bioinform. 2017;18(1):2.
 16.
Singh D, Lercher MJ. Network reduction methods for genomescale metabolic models. Cell Mol Life Sci. 2020;77(3):481–8.
 17.
McCloskey D, Gangoiti JA, King ZA, Naviaux RK, Barshop BA, Palsson BO, et al. A modeldriven quantitative metabolomics analysis of aerobic and anaerobic metabolism in E. coli K12 MG1655 that is biochemically and thermodynamically consistent. Biotechnol Bioeng. 2014;111(4):803–15.
 18.
Schuster S, Kahn D, Westerhoff HV. Modular analysis of the control of complex metabolic pathways. Biophys Chem. 1993;48(1):1–17.
 19.
Bruggeman FJ, Westerhoff HV, Hoek JB, Kholodenko BN. Modular response analysis of cellular regulatory networks. J Theor Biol. 2002;218(4):507–20.
 20.
Orth JD, Conrad TM, Na J, Lerman JA, Nam H, Feist AM, et al. A comprehensive genomescale reconstruction of Escherichia coli metabolism—2011. Mol Syst Biol. 2011;7(1):535.
 21.
Neidhardt FC, Ingraham JL, Schaechter M. Physiology of the bacterial cell: a molecular approach: Sinauer Associates Sunderland, MA; 1990.
 22.
Salvy P, Fengos G, Ataman M, Pathier T, Soh KC, Hatzimanikatis V. pyTFA and matTFA: a python package and a Matlab toolbox for thermodynamicsbased flux analysis. Bioinformatics. 2018;1:3.
 23.
Cooper R. Metabolism of methylglyoxal in microorganisms. Annu Rev Microbiol. 1984;38(1):49–68.
 24.
Nelson DL, Kennedy EP. Transport of magnesium by a repressible and a nonrepressible system in Escherichia coli. Proc Natl Acad Sci. 1972;69(5):1091–3.
 25.
Rosenberg H, Gerdes R, Chegwidden K. Two systems for the uptake of phosphate in Escherichia coli. J Bacteriol. 1977;131(2):505–11.
 26.
Kumble KD, Ahn K, Kornberg A. Phosphohistidyl active sites in polyphosphate kinase of Escherichia coli. Proc Natl Acad Sci. 1996;93(25):14391–5.
 27.
Henry CS, Broadbelt LJ, Hatzimanikatis V. Thermodynamicsbased metabolic flux analysis. Biophys J . 2007;92(5):1792–805.
 28.
Wang L, Birol I, Hatzimanikatis V. Metabolic control analysis under uncertainty: framework development and case studies. Biophys J. 2004;87:3750–63.
 29.
Wang LQ, Hatzimanikatis V. Metabolic engineering under uncertainty—II: analysis of yeast metabolism. Metab Eng. 2006;8(2):142–59. https://doi.org/10.1016/J.Yinben.2005.11.002.
 30.
Wang LQ, Hatzimanikatis V. Metabolic engineering under uncertainty. I: framework development. Metab Eng. 2006;8(2):133–41. https://doi.org/10.1016/J.Ymben.2005.11.003.
 31.
Heinrich R, Schuster S. The regulation of cellular systems. New York: Chapman & Hall; 1996.
 32.
Kacser H, Burns J, editors. The control of flux. Symp Soc Exp Biol; 1973.
 33.
Stephanopoulos G, Vallino JJ. Network rigidity and metabolic engineering in metabolite overproduction. Science. 1991;252(5013):1675–81.
 34.
Bailey JE. Toward a science of metabolic engineering. Science. 1991;252(5013):1668–75.
 35.
Hameri T, Fengos G, Ataman M, Miskovic L, Hatzimanikatis V. Kinetic models of metabolism that consider alternative steadystate solutions of intracellular fluxes and concentrations. Metab Eng. 2019;52:29–41.
 36.
Miskovic L, Hatzimanikatis V. Modelling of uncertainties in biochemical reactions. Biotechnol Bioeng. 2011;108:413–23.
 37.
Schellenberger J, Que R, Fleming RMT, Thiele I, Orth JD, Feist AM, et al. Quantitative prediction of cellular metabolism with constraintbased models: the COBRA Toolbox v2.0. Nat Protoc. 2011;6:1290–307. https://doi.org/10.1038/nprot.2011.308.
 38.
Becker SA, Feist AM, Mo ML, Hannum G, Palsson BO, Herrgard MJ. Quantitative prediction of cellular metabolism with constraintbased models: the COBRA Toolbox. Nat Protoc. 2007;2:727–38. https://doi.org/10.1038/nprot.2007.99.
 39.
Jolliffe I. Principal component analysis. Wiley StatsRef: statistics reference online. Wiley; 2014.
 40.
Saa PA, Nielsen LK. Formulation, construction and analysis of kinetic models of metabolism: a review of modelling frameworks. Biotechnol Adv. 2017;5:17.
 41.
Soh KS, Miskovic L, Hatzimanikatis V. From network models to network responses: integration of thermodynamic and kinetic properties of yeast genomescale metabolic networks. FEMS Yeast Res. 2012;12:129–43.
 42.
Miskovic L, Hatzimanikatis V. Production of biofuels and biochemicals: in need of an ORACLE. Trends Biotechnol. 2010;28(8):391–7.
 43.
Miskovic L, Tokic M, Savoglidis G, Hatzimanikatis V. Control theory concepts for modeling uncertainty in enzyme kinetics of biochemical networks. Ind Eng Chem Res. 2019;58(30):13544–54.
 44.
Tokic M, Hatzimanikatis V, Miskovic L. Largescale kinetic metabolic models of Pseudomonas putida KT2440 for consistent design of metabolic engineering strategies. Biotechnol Biofuels. 2020;13(1):1–19.
 45.
Hameri T, Boldi MO, Hatzimanikatis V. Statistical inference in ensemble modeling of cellular metabolism. PLoS Comput Biol. 2019;15(12):56.
 46.
Segel IH. Enzyme Kinetics. 1975.
 47.
Schomburg I, Chang A, Placzek S, Sohngen C, Rother M, Lang M, et al. BRENDA in 2013: integrated reactions, kinetic data, enzyme function data, improved disease classification: new options and contents in BRENDA. Nucleic Acids Res. 2013;41(Database issue):D764–72. https://doi.org/10.1093/nar/gks1049.
 48.
Wittig U, Kania R, Golebiewski M, Rey M, Shi L, Jong L, et al. SABIORKdatabase for biochemical reaction kinetics. Nucleic Acids Res. 2012;40(D1):D790–6. https://doi.org/10.1093/Nar/Gkr1046.
 49.
Hofmeyr J, CornishBowden A. The reversible Hill equation: how to incorporate cooperative enzymes into metabolic models. Comput Appl Biosci. 1997;13:377–85.
 50.
Liebermeister W, Klipp E. Bringing metabolic networks to life: convenience rate law and thermodynamic constraints. Theor Biol Med Model. 2006. https://doi.org/10.1186/17424682341.
 51.
Miskovic L, Tokic M, Fengos G, Hatzimanikatis V. Rites of passage: requirements and standards for building kinetic models of metabolic phenotypes. Curr Opin Biotechnol. 2015;36:1–8.
 52.
Hatzimanikatis V, Bailey JE. MCA has more to say. J Theor Biol. 1996;182(3):233–42.
 53.
Heinrich R, Rapoport TA. A linear steadystate treatment of enzymatic chains. Eur J Biochem. 1974;42(1):89–95. https://doi.org/10.1111/j.14321033.1974.tb03318.x.
 54.
Kacser Ha, Burns J, editors. The control of flux. Symp Soc Exp Biol; 1973.
Acknowledgements
Not applicable.
Funding
This work was supported by funding from the Ecole Polytechnique Fédérale de Lausanne (EPFL), the 2015/313 ERASysAPP RobustYeast Project funded through SystemsX.ch, the Swiss Initiative for Systems Biology evaluated by the Swiss National Science Foundation, and the Swiss National Science Foundation grant 315230_163423. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author information
Affiliations
Contributions
TH, GF and VH conceived the approach. TH developed the software and performed the analysis of the results. TH, GF, and VH wrote the manuscript. The author(s) read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1.
MATLAB models of D1, D2 and D3. Models used for obtaining thermodynamically feasible steadystates for constructing populations of kinetic models. Software for reading models is available on GitHub: https://github.com/EPFLLCSB/matTFA.
Additional file 2.
Fluxomics and metabolomics experimental data incorporated in the model for optimally grown E. coli. Table with the flux boundaries in mmol/gDW/h and the concentration boundaries in log(mM) that were applied as constraints to the D13 TFA models based on the experimental data reported by McCloskey et al.
Additional file 3.
Thermodynamicsbased variability analysis of models. Spreadsheet with the list of metabolites and reactions inside the models with variability analysis of flux and metabolite concentrations. The lumped reactions generated with the redGEM and lumpGEM algorithms begin with LMPD. Reactions added to the E. coli iJO1366 for passive diffusive transport start with TransFlux. The rest of the reactions come from E. coli iJO1366.
Additional file 4.
Document with additional figures, details about methods and further analysis carried out in this study. Thermodynamicbased variability analysis was performed to study the feasible ranges of metabolic flux across central carbon reactions for D13 models. Flux control coefficients for other reactions were analyzed to see how they are affected by model complexity. Mathematical formulations of the kinetic mechanisms describing the reactions in the kinetic models are also provided.
Additional file 5.
Flux and concentration steady states. Spreadsheet providing metabolic flux and concentration reference steadystates across the models with a comparative study of these steady states between pairs of models.
Additional file 6.
Analysis of absolute deviations in means of flux control coefficients for the entire systems. The mean flux control coefficient \(C_p^v\) is first computed based on the populations of 50’000 models for D13 for all enzymatic reactions with respect to all the enzymes. The absolute deviation between these means of control coefficients is compared pairwise for D1 and D2, and D2 and D3. The vertical corresponds to the reactions v and the horizontal to the enzymes p. A large deviation indicates a discrepancy in the prediction of the model for that given control coefficient \(C_p^v\).
Additional file 7.
Kinetic mechanism allocation. Reaction mechanisms assigned for the reactions of D1, D2 and D3 models. The assigned reaction mechanisms in this file refer to the kinetic mechanisms for which a mathematical description is given in Additional file 4.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Hameri, T., Fengos, G. & Hatzimanikatis, V. The effects of model complexity and size on metabolic flux distribution and control: case study in Escherichia coli. BMC Bioinformatics 22, 134 (2021). https://doi.org/10.1186/s1285902104066y
Received:
Accepted:
Published:
Keywords
 Metabolic networks
 Kinetic model
 Metabolic control analysis
 Model complexity
 Model reduction