LK-DFBA: a linear programming-based modeling strategy for capturing dynamics and metabolite-dependent regulation in metabolism

Background The systems-scale analysis of cellular metabolites, “metabolomics,” provides data ideal for applications in metabolic engineering. However, many of the computational tools for strain design are built around Flux Balance Analysis (FBA), which makes assumptions that preclude direct integration of metabolomics data into the underlying models. Finding a way to retain the advantages of FBA’s linear structure while relaxing some of its assumptions could allow us to account for metabolite levels and metabolite-dependent regulation in strain design tools built from FBA, improving the accuracy of predictions made by these tools. We designed, implemented, and characterized a modeling strategy based on Dynamic FBA (DFBA), called Linear Kinetics-Dynamic Flux Balance Analysis (LK-DFBA), to satisfy these specifications. Our strategy adds constraints describing the dynamics and regulation of metabolism that are strictly linear. We evaluated LK-DFBA against alternative modeling frameworks using simulated noisy data from a small in silico model and a larger model of central carbon metabolism in E. coli, and compared each framework’s ability to recapitulate the original system. Results In the smaller model, we found that we could use regression from a dynamic flux estimation (DFE) with an optional non-linear parameter optimization to reproduce metabolite concentration dynamic trends more effectively than an ordinary differential equation model with generalized mass action rate laws when tested under realistic data sampling frequency and noise levels. We observed detrimental effects across all tested modeling approaches when metabolite time course data were missing, but found these effects to be smaller for LK-DFBA in most cases. With the E. coli model, we produced qualitatively reasonable results with similar properties to the smaller model and explored two different parameterization structures that yield trade-offs in computation time and accuracy. Conclusions LK-DFBA allows for calculation of metabolite concentrations and considers metabolite-dependent regulation while still retaining many computational advantages of FBA. This provides the proof-of-principle for a new metabolic modeling framework with the potential to create genome-scale dynamic models and the potential to be applied in strain engineering tools that currently use FBA.

We describe here the implementation of our modified form of dynamic flux balance analysis (DFBA) [1]. It takes as input two sets of information. The first set comprises the constraints and objective from FBA: a stoichiometric matrix describing the relationship between metabolites and fluxes in the model, a set of upper and lower bounds on metabolic fluxes, and an objective function specifying the flux(es) the network tries to locally maximize or minimize. To these, we add metabolite concentration initial conditions, a time interval, a parameter describing the number of segments into which the simulation interval is to be evenly divided, and a list of regulatory interactions (and the corresponding parameters to describe them).
We developed a procedure in MATLAB to automatically translate a standard FBA model into an LK-DFBA model, and then solve the resulting optimization problem. This procedure works by taking as input the original FBA model, plus the inputs specified above. This code has been made publicly available on GitHub at https://github.com/gtStyLab/lk-dfba.

Discretizing the Time Interval
Following the basic template of DFBA [1], the simulation interval is divided into nT segments, as shown in Fig. S1E. By our convention, metabolite concentrations are represented at the time points separating the intervals, and fluxes are represented over the interval between time points. Initial conditions for metabolite concentrations specify the concentrations at the time point prior to the first interval.

Stoichiometry and Pooling Fluxes
The mass balances on a set of nm metabolites, x ⃑ , can be represented by the system of equations is the accumulation or depletion of metabolites in the system, S is the stoichiometric matrix describing the connectivity of nm metabolites and nv fluxes in the metabolic network, and v ⃑ is a vector of the nv enzymatic rates through the metabolic network, i.e. the flux distribution.
In FBA, dx ⃑ dt is assumed to be zero, and the linear equation 0 = Sv ⃑ applies. Combined with a linear objective function z = c T v ⃑ where c i specifies the weight of flux v i in the objective (e.g. to maximize growth rate, set so that c biomass = 1 and all other c i = 0) and bounds v ⃑ LB ≤ v ⃑ ≤ v ⃑ UB , an LP can be specified as max In LK-DFBA, we relax the steady-state assumption, working from dx ⃑ dt = Sv ⃑ Moving the dx ⃑ dt term to the right side and adding it to the solution vector, gives us the system shown in Fig. S1B, where A is the (nm × (nm+nv)) augmented stoichiometric matrix and w ⃑⃑ ⃑ is the ((nm+nv) × 1) augmented flux vector, and using the "pooling flux" nomenclature of Covert et al. in iFBA [2], v ⃑ p = dx ⃑ dt i.e., we will describe dx i dt as the pooling flux v ⃑ p,i for metabolite x i . This augmented stoichiometric constraint will apply over each segment of the discretized interval, producing a set of nm • nT constraint equations, 0 = Aw ⃑⃑ ⃑ (t k ) where w ⃑⃑⃑ (t k ) is the augmented flux vector w ⃑⃑⃑ evaluated at the interval ending at t k and nT is the number of segments into which the overall time interval has been discretized.

Difference Equations
Concentrations are explicitly represented in the model, and metabolite dynamics are incorporated by integrating metabolite concentrations over each interval using a difference equation and the corresponding pooling flux term (i.e. the to produce a series of nm • nT constraint equations, as shown in Fig. S1E.

Constant Constraints on Concentration and Flux Values
As in FBA, lower and upper bounds on system fluxes are provided and apply to the flux distribution at each interval. Typically, the upper bound constraints on irreversible internal system fluxes (and both bounds on reversible reaction fluxes) are expected to be inactive and are set at a large nominal value to guarantee the space is bounded. If upper bounds on system fluxes are known, they can be implemented into the constraints and represent saturation. Pooling fluxes are given nominal bounds as well; due to limitations on the product • combined with constraints on concentrations and elsewhere in the system, it is expected that the nominal bounds will not act as active constraints.
Like bounds on flux values, constraints bounding concentrations can be set by the user, but generally it is expected that concentrations are strictly positive. If a concentration is known to be at a fixed quantity, the upper and lower bound can be set accordingly.

Linearized Kinetics Constraints
The key feature of LK-DFBA is the addition of linear equations to describe constraints in which fluxes are controlled by metabolites, as is the case in circumstances ranging from mass action kinetics to allosteric regulation (on short time scales) or transcriptional regulation (on longer time scales). Any dependence of flux on metabolite concentrations is implemented in this manner, and this turns out to be a critical requirement for enabling relevant dynamics in the model. These constraints are specified by a list of nr mappings. Corresponding to each mapping is a pair of parameters ( , ) such that for mapping between "controller" metabolites { } and "target" fluxes { } , ∑ , ( +1 ) ≤ (∑ , ( )) + , where , is a target flux in { } , , is a controller metabolite in { } , and ( , ) are the parameters describing the linear kinetics constraint. When > 0, this interaction produces a promotional effect, and when < 0, this interaction has an inhibitory effect. Applied over the whole discretized time course, this produces a total of nr • nT kinetics constraint equations. Examples of these mappings are shown in Fig. S1C.
These constraints allow us to not only represent interactions such as allosteric regulation, but also to linearly approximate the dependence of enzyme activity on its substrate concentration. Consider the case of the positive regulator shown in Fig. S1C: we note that the profile produced by simultaneously considering the effect of the constant flux bounds constraints (" ") in conjunction with the constraint produced by mapping the enzyme substrate as a "regulator" of enzyme flux ≤ • + is comparable to the flux vs concentration profile of a simple Michaelis-Menten reaction mechanism. We refer to these types of "kinetics" constraints as "mass action" constraints, to differentiate them from "regulatory" constraints produced through mechanisms such as allostery. Our code includes a procedure to automatically generate mass action kinetics constraints from a stoichiometric matrix, giving the user the convenience of only needing to manually specify the regulatory constraints.
We note that these regulatory interactions are implemented as bounds on the controlled fluxes, rather than as equality constraints: this is a very fundamental difference from the behavior of ODE models, in which each equation reduces the dimension of the solution space. For linear equations, this would often create cases in which the system of equations would produce negative concentrations, for example by forcing an efflux term to exceed an influx term when a metabolite was already depleted. However, in LK-DFBA, this just leads to a situation in which the kinetics constraint is no longer active, and the other model constraints preclude blatantly unphysical behavior. As a result, LK-DFBA has a degree of both simplicity and flexibility that comes with advantages and disadvantages that we will explore in more depth in the Results S1 section.
In a model with nr regulatory constraints, each regulatory constraint adds two parameters ( , ) to the model, for a total of (2 • nr) parameters. In many cases, a single controller is paired with a single target, but certain cases may allow lumping multiple species together, allowing a reduction in the number of model parameters. For example, one might reduce the number of model parameters by choosing to constrain only the sum of the effluxes from a given metabolite, such as at branch point, rather than to introduce separate constraints for each of the individual fluxes. Our Results and Discussion includes an assessment of the tradeoffs between these two options to determine if this consolidation represents an improvement or an oversimplification.
In the work discussed here, we modeled regulatory kinetics constraints that correspond to regulation of fluxes via rapid, direct mechanisms such as allostery. However, LK-DFBA is not inherently restricted to modeling this type of regulation. By choosing a simulation interval over which transcriptional changes are relevant, changes in enzyme levels could easily be modeled as well. Capturing these other types of regulation may require modifications to reflect differences in the underlying mechanisms. For example, changes in target fluxes associated with transcriptional regulation may be subject to a time delay due to the intermediate biochemical steps necessary to produce the relevant changes in enzyme levels. Such a time delay could be introduced by shifting the linkage between controller metabolites and target fluxes from the adjacent time interval to instead a later time interval, with the exact offset specified using parameters set by the user or determined through parameter fitting.
To run a given simulation, the set of ( , ) parameter values is provided along with the map of controllers and targets. In practice, this will need to be determined via parameter fitting, as the linear equations in general are simplified approximations that do not directly correspond to intrinsic physical quantities. We consider this question in-depth in subsequent sections, where we provide several methods for determining these parameters and comment on their effectiveness and practicality.

Model Objective
As in FBA, LK-DFBA requires an objective. While there are several ways in which to construct this objective 37 we found that an objective that applied to the fluxes weighed equally at each time point (an "instantaneous" objective) was effective in producing stable, robust behavior. This style of objective function can be generated easily by expanding the original FBA objective to apply over each interval. We also tested an alternate "terminal" objective function (in which the objective function only looks at the concentration of a selected final time point), and found that this often led to degenerate solutions, erratic behavior, and numerical artifacts at intermediate time points. We discuss this more in the Results S1 section.
We also found it effective to add a modest penalty to the L2 norm of the solution vector during optimization, changing the problem objective to = ⃑ ⃑ + ⃑ ⃑ ⃑ ⃑ where = − and is a small penalty on the solution norm. This has the effect of imposing a parsimony preference on the solution vector ⃑ ⃑ , which has been shown to be an effective and reasonable strategy, and helped resolve some occasional observed issues with solution degeneracy [4]. While the resulting problem is technically now a (linearly-constrained) quadratic program (QP), we observed no appreciable increase in solution time, and this particular case still specifies a convex optimization (and therefore preserves the very desirable Strong Duality features of the LP). The option to instead use the linear objective remains, but our implementation defaults to the QP formulation.

The LK-DFBA Optimization Problem
Assembling the constraints described in the previous sections produces the following linearly-constrained QP for simulating metabolic time courses which we refer to as LK-DFBA. For

Test Models
To test our modeling and parameter fitting strategies, we used several models to produce "synthetic" datasets. The advantage of using these datasets as a point of comparison is that it allows us to produce idealized cases under which we can study the theoretical performance and limitations of our modeling strategy without being limited by practical concerns such as data sampling frequency, signal-to-noise ratio in the data, or limits in the cross-section of metabolites we can measure.

The Branched Pathway Model
Our first test model is a modified version of a popular, well-established in silico model from Biochemical Systems Theory (BST) describing a simple branched pathway with both positive and negative regulatory interactions [5]. Our modified version introduces several changes and is shown in Fig. 1 of the main text.  First, we replaced the two effluxes in the original model with a single, fixed-stoichiometry "biomass" reaction, which produces a biomass "metabolite" subject to a mass balance equation. This introduces some additional biological relevance (such reactions are ubiquitous in genome-scale models) and allows us to define a clear objective for the system.
Second, we modified the two regulatory interactions to change their targets. This allows us to simplify the model while still producing interesting dynamics for X1 via interactions with the branch fluxes v2 and v4. Originally, the negative feedback regulator controlled by metabolite X3 targeted the system input flux, v1. We changed its target from v1 to the branch flux of the opposing lower pathway, v4. We then left the input flux v1 at a constant value of vin. We also changed the positive regulatory interaction controlled by X4 to instead target v3, the first flux in the opposing upper branch. This allowed us to avoid introducing a parameter identifiability problem when X4 acts as a controller for flux v5 via both mass action kinetics and regulation (as would have been the result of combining the two branch outlet fluxes into a single biomass composition reaction).
Like the original BST model, we implement power-law kinetics, as shown in the equations of Fig. 1 of the main text. We produce several noiseless datasets by modifying the initial conditions, biomass equation stoichiometry, and kinetic rate constants; the conditions for these models are shown in Table S1.

Glycolysis and Pentose Phosphate Pathway in E. coli
While the branched pathway model is convenient as an initial test case, it lacks physiological significance and is too simple to capture some of the challenges we expect in real metabolic networks. To explore initial scale up and to better gauge the challenges of implementing LK-DFBA, we test a model of central carbon metabolism in Escherichia coli, specifically encapsulating glycolysis and the pentose phosphate pathway (PPP) [6].
The network structure is shown in Fig. S2, and a list of model abbreviations in Table S2 and S3. The model is a system of ODEs with empirically derived rate laws. Metabolite concentrations were generated using the procedure described by Dromms and Styczynski [7]. Briefly, noiseless data at high resolution were generated from the default model initial conditions and parameters in Copasi 4.14 (Build 89), with the exception that moieties such as ATP, ADP, and NADH, etc. were held at constant concentrations during simulation [6][7][8][9][10].
Reversible reactions were implemented by splitting them into two irreversible reactions representing the forward and reverse directions. For mass action constraints, the substrate of the original reaction was designated as the controller for the forward reaction, and the product was set as the controller for the reverse reaction. Fluxes such as Met and Trp synthesis, which are set to fixed values in the ODE model, were explicitly provided this information as well. We modeled the degradation reactions using equality constraints where the parameter was known and assigned to the model based on the value in the underlying ODE model, and set = 0 to produce a first-order kinetic rate law. By setting = 0, we avoid creating the potential for our linear kinetics constraints to create conflicts with non-negative concentration constraints that would result in an infeasible LP.

Parameter Optimization
The most general strategy is a standard parameter optimization approach. We constructed a fitness function from the weighted sum-of-squares error (SSE) between the provided data and model predictions, subject to an L2 regularization penalty on the fitted parameters. The SSE weights are specified by the user and can be used to reflect features such as differences in scale between metabolites or heuristics to enable attempts to more effectively recapitulate the behavior of certain metabolites. These weights can potentially be applied to concentrations, fluxes, or pooling fluxes, but we only used weights on concentrations in our work. Our implementation also allows the user to specify a regularization weight and reference vector for the regularization penalty.
This SSE fitness function was used to fit FBA models for methods based on non-linear optimization. In the case of the genetic algorithm, we improved convergence of the algorithm by introducing constraints on the parameter search space to remove areas where we anticipated poor parameter sensitivity. These restrictions are described in Fig. S3.
For larger systems, we found that it may be more tractable to perform multiple sequential optimization problems by fixing a subset of the parameter values and switching off between optimizing different parameters at each step. We provide an option for the user to specify multiple rounds of optimization, in which individual pairs of parameters can be set as fixed or fitted for a given round of optimization. This is accomplished by specifying a design matrix in which rows represent kinetics constraints, and columns specify individual optimization rounds. If a particular kinetics constraint (parameter pair) is fixed at its initial values for a particular round, its value for the corresponding column is set to 1; otherwise, if it is to be optimized, it is set to 0. For example, we simultaneously fit all 6 kinetics constraints in a single step by setting this matrix as a (6×1) matrix of zeros. For the E. coli model, we chose to optimize over individual constraints (i.e. individual ( , ) parameter pairs) in sequential order until we had cycled through fitting all kinetics constraints twice. (B) Under certain regions of parameter space when < 0 (inhibition), the initial concentration of the metabolite, x0, is outside the feasible space. This produces a conflicting constraint with the kinetics constraints, and the resulting LP is infeasible. We avoid this by restricting ( , ) such that vmin ≤  x0 + .
(C) For certain regions of parameter space, the kinetics constraint will be guaranteed inactive. In this situation, the fitness function will be insensitive to small changes in ( , ), making it difficult to optimize these parameters. We minimize this issue by restricting b such that when > 0, ≤ vmax and when < 0, ≥ vmin.

Simulating a Time Course with a Nominal Set of Parameters
We implemented LK-DFBA in MATLAB using the Gurobi solver library [13]. These codes take a model specified by the user (including an FBA model structure and the additional information for concentrations, regulation, and simulation interval, as described in the Methods section), generate the extended LP problem structure for the dynamic FBA problem, and solve the optimization using Gurobi. The results of this optimization are parsed into data matrices for the concentration and flux time course profiles, and are returned to the user. An example time course simulation is shown in Fig. S4. One behavior we observe is a change in active constraints over the time course, leading to shifts in the resulting flux distribution. We note here that an instantaneous shift in fluxes takes time to produce changes in concentrations due to the integration equations.
To demonstrate the necessity of including our linear kinetics constraints, we performed a simulation with a model containing no kinetics constraints. The result of this is shown in Fig. S5. After an initial transient period in which the metabolite pools are immediately depleted, the model quickly reverted to the steady-state flux distribution one would observe from an FBA optimization with no dynamics or regulation.
To produce the stable behavior shown in Fig. S4 and S5, we tested several options to determine the optimal configuration of the optimization problem. We explored a terminal and an instantaneous objective function, and determined that an instantaneous objective produced more stable behavior. The justification for this decision is shown in Fig. S6, in which the prevalence of degenerate solutions and inconsistent time course behavior led us to abandon the terminal objective function.
To combat degenerate solutions, we further explored penalties on the norm of the solution vector ⃑ ⃑ . These included secondary optimizations in which the optimal = ⃑ ⃑ was set as a constraint, and the L1-or L2-norm of ⃑ ⃑ was minimized, as well as schemes penalizing ( ( +1 ) − ( )) (not shown). The results of several regularization schemes are shown in Fig. S7. From this analysis, we concluded that the best solution was a single optimization using the instantaneous objective with a penalty on the L2-norm of ⃑ ⃑ , which we implemented as described in the Methods as objective = ⃑ ⃑ − ⃑ ⃑ ⃑ ⃑ .
In hindsight, the improved performance of the instantaneous objective function over the terminal objective is perhaps unsurprising. In a biological system, the organism lacks any foreknowledge of resource abundance, and instead is limited only to sensing the current state of its internal and external environment. The instantaneous objective better reflects this reality, and is justified both on a theoretical basis and on the practical basis demonstrated in Fig. S6 and S7.

Assessment of Five Model Types on Noiseless Branched Pathway Data
As described in the Methods section, we generated a set of medium resolution (nT = 100) noiseless ODE time course profiles for the branched pathway model from the 15 parameterizations shown in Table S1. We fit models for the five methods described in the Methods section: BST, MM, LK-DFBA (LR), LK-DFBA (LR+), and LK-DFBA (GA). The fitness function for the LK-DFBA (LR+) and LK-DFBA (GA) fitting methods was configured to fit only metabolite concentrations by assigning a weight of 0 to system and pooling flux values. We simulated model dynamics at high resolution (nT = 1000) and determined the prSSE for each case; the results of this analysis are shown in Fig. S8.  Table S1. Penalized relative sum-of-square errors (prSSE) are calculated as described in the Methods section of the main text.
For the noiseless, high-resolution data sets, we observe several trends. First, the BST method has the best performance followed by the MM model. This is to be expected, since the BST method's model equations are identical to those of the underlying ODE model. Second, the LK-DFBA (LR) method has the lowest performance, which is perhaps unsurprising given the number of approximations used in this method. Third, the LK-DFBA (LR+) method substantially improves on the LK-DFBA (LR) parameters, leading it to produce time course data similar in accuracy to the LK-DFBA (GA) method. While the LK-DFBA (GA) method outperforms the LK-DFBA (LR+) method in seven of the fifteen cases, the differences are relatively small, and this modest improvement comes at the cost of 5-6 hours of computational time, compared to the <10 minutes required for the LK-DFBA (LR+) method (which in this case includes performing multiple fits with random perturbations to the initial LK-DFBA (LR) guess). For this reason, we omit using the LK-DFBA (GA) in subsequent sections.

Performance in the Branched Pathway Model
In the Missing-X2 cases, the lack of data describing X2 dynamics led to poor optimization using the LK-DFBA (LR+) method: the parameters that best optimized the remaining data pushed the model to poorly approximating the time course of the unmeasured metabolite (which was still included in the calculation of prSSE), as shown in Fig. S9. Looking at the contribution of individual metabolites to the overall error, we indeed see that the largest contribution comes from the prSSE for predicting X2. This serves to demonstrate a point made by Goel et al. regarding error compensation and the advantages of performing parameter optimization over smaller independent subsets of the system via e.g. regression [14].

Fig. S10. Dynamic Flux Estimation in the E. coli model
The black trajectories are flux profiles from the ODE model, and the red trajectories are the dynamic flux distributions calculated using the DFE procedure described in the Methods section. In most cases, DFE failed to qualitatively capture the correct flux behaviors, making it difficult to use the regression method with any accuracy. We opted to instead use noise-added flux values from the ODE model for parameter regression.

Parameter Sensitivity of LK-DFBA in Branched Pathway Model
For each of the nT and CoV combinations in Fig. 3 of the main text, 50 replicate data sets were produced. We observe that some parameters in linear kinetics constraints are more sensitive than others when using the LK-DFBA (LR+) method, as shown in Fig.  S11. For the branched pathway model, parameter values for a1, a3, a6, b1, b3, and b6 are generally stable across different nT and CoV conditions (Fig. S11A, C, F, G, I, L). Values for a2, a4, a5, b2, b4, and b5 are more sensitive to changes in CoV (Fig. S11B, D, E, H, J, K). For all parameters, the variance in values increases as CoV increases, which is to be expected. These results indicate that in a few key constraints it may be important to keep certain parameters within a certain range of values in order to correctly capture metabolite dynamics, whereas other parameters have more flexibility to fluctuate.