Backgrounds: FBA and MOMA
Before introducing our new bi-level programming problem to identify optimal metabolic genes or reactions to delete for the maximization of targeted bio-productions, we first review the mathematical foundations of FBA [14] and MOMA [11]. FBA provides appropriate simplifications for metabolic flux analysis by assuming the balance of production and consumption fluxes at steady states of metabolic network models. Specifically, with the prior stoichiometry knowledge, FBA assumes that the weighted sum of network fluxes based on stoichiometric coefficients S is 0: , 1 ≤ i ≤ N, in which we assume that the network model has M reactions and N metabolites in total; S
ij
is the stoichiometric coefficient of metabolite i in reaction j; and v
j
denotes the flux value of reaction j. For wild-type strains, as mentioned above, a common assumption is that their steady-state flux values follow an optimal distribution that maximizes the biomass production rate. The steady-state flux distribution is approximately solved as a LP problem to maximize the biomass production flux: v
biom
subject to the FBA stoichiometry constraints, in which v
biom
is defined by summing up the metabolite precursors that contribute to the biomass production in FBA [11]. In OptKnock, the optimal gene knockout strategy is to remove genes or reactions by setting the corresponding v
j
to zero with the resulting knockout flux distribution maintaining biomass maximization assumption.
As stated in [11], engineered gene knockouts in laboratory usually cannot achieve the maximum growth states as they have not been exposed to the same evolutionary pressure as wild-type strains. Typically, mutant strains initially stay as close as possible to wild-type optimal steady states in terms of flux values. Computational simulations under the MOMA assumption constraining metabolic adjustment to be minimal have demonstrated better agreement with observed flux values in actual experiments [11]. Hence, flux distributions in mutated metabolic networks can be solved as a QP optimization problem to minimize the L2 distance between the knockout flux values to wild-type steady-state flux values:
where v
j
represents the flux value of reaction j in mutant strains and w
j
is the corresponding flux value in wild-type strains. The flux value for biomass production v
biom
is similarly defined as mentioned earlier. In addition, the glucose flux value v
glc
denotes the glucose consumption rate, which is often set to a fixed value v
glc_uptake
. Finally, and are the lower bound and upper bound for v
j
, which are determined by the availability of nutrients or the maximal fluxes that can be supported by enzymatic pathways [11].
New bi-level programming framework
Following the modeling strategy in OptKnock [13], we aim to derive optimal gene knockout strategies, which consequently remove corresponding reactions for desired biomedical overproduction while maintaining obligatory cellular conditions, for example, cell mortality. However, as it has been shown that the assumption of biomass maximization for steady-state cellular conditions may not correctly predict metabolic flux distributions for knockouts [11, 13], we replace the internal cellular objective of maximizing biomass yield in OptKnock [13] by the MOMA assumption [11], which has led to better predictions of steady-state flux allocations for genetically engineered strains. With this critical change from OptKnock, we formulate a novel bi-level programming model for gene knockouts in which the inner optimization problem is a QP problem.
Mathematically, we introduce binary variables y
j
∈ {0, 1}, 1 ≤ j ≤ M, denoting gene or reaction knockout strategies in which reaction j either is knocked out (y
j
= 0) or remains active (y
j
= 1). The identification of optimal knockout strategies y
j
under MOMA requires to solve the following bi-level programming problem:
in which K is the allowed maximum number of knockouts and v
chemical
corresponds to the reaction that produces the desired biochemical production target. Note that we do not count in the flux change for the target reaction in the inner problem as it would contradicts to our primal optimization for maximal biochemical overproduction.
Adaptive linearization strategy for an exact optimal solution
We emphasize that the nested inner optimization problem is a QP problem with respect to flux allocation v
j
in knockout strains. As this nested inner problem is convex, we can still get its dual problem and the strong duality condition still holds for the inner primal and dual problems. Following the similar direction of [13], we can develop a single-level equivalent formulation by enforcing the objective value of the inner primal problem equal to that of its dual problem. However, the resulting formulation will be a mixed integer quadratically constrained programming problem, which poses a huge computational challenge when solving real problems. Because of this major change due to the introduction of the inner QP problem under the MOMA assumption, the transformation in OptKnock to a typical single-level mixed integer linear programming (MILP) problem based on the linear programming (LP) duality theory is not directly applicable any more.
To derive efficient solution algorithms for our new bi-level programming gene knockout problem, we adopt a novel adaptive linearization solution strategy to tackle the computational complexity introduced by the inner QP problem. Specifically, we propose to adaptively represent the quadratic terms in the objective function of the inner problem using a set of linear functions as illustrated in Figure 1(A), which yields a LP approximation for the nested inner problem. With a given piecewise linearization of the inner problem, we can convert our new bi-level model into a single-level problem based on the LP strong duality. For the linearized problem, we can obtain the optimal solution similarly as in [13] by solving the transformed single-level MILP problem. In order to obtain the exact optimal solution to the original bi-level problem with the inner QP problem, we adaptively create necessary pieces on the fly to approximate the quadratic objective function until the solution converges.
The basic idea of adaptive piecewise linearization is illustrated in Figure 1(B-D). We denote the initial starting solution by v1, which can be represented by a convex combination of endpoints of piecewise segments for a given piecewise linearization. The corresponding quadratic objective function value at v1 is denoted by M1, which can be approximated linearly by as the convex combination of the corresponding objective function values at segment endpoints A, B, C and D. We iterate the procedures to solve the linearized single-level MILP problem and to adaptively add piecewise linear segments to better approximate the inner quadratic objective function as illustrated in Figure 1(B-D) until the optimal solution of the MILP problem achieves the desired precision with respect to the approximation of the inner QP objective function. This adaptive linearization strategy has the guarantee that the final solution converges to the exact optimal solution. More importantly, it is much more efficient than directly solving mixed integer quadratic constrained problem without linearization and hence it allows us to solve for large-scale metabolic networks.
With this basic understanding of our new bi-level model and adaptive piecewise linearization solution strategy, we describe the detailed algorithm in the following sections.
Piecewise linearized inner problem
The quadratic objective function of the inner problem, denoting the metabolic adjustment to wild-type steady-state flux allocations (w
j
) in MOMA, is the key obstacle to derive the efficient solution strategy. We propose to use piecewise linear functions to approximate this quadratic objective function. The basic idea of piecewise linearization is to assume that each reaction flux value v
j
can be discretized into a finite number of segments, each of which is precisely defined by its corresponding consecutive endpoints . Any arbitrary value v
j
can then be represented by a convex combination of these endpoints:
(1)
in which are the piecewise variables determining the convex representation and there are T - 1 segments with T endpoints between the corresponding lower and upper bounds for flux value v
j
: and . These piecewise variables satisfy the following constraints to guarantee the satisfaction of the flux constraints :
(2)
(3)
Similarly, as can be seen from Figure 1, the individual contribution from flux v
j
to the original quadratic objective function of the inner problem can be approximated as
(4)
With this convex approximation strategy, the inner problem with MOMA is transformed to a linear programming problem with respect to the piecewise variables :
Here, both w
j
and are constants and we have removed the constant terms in the original objective function. This linear approximation of the original inner objective function based on the MOMA criterion now enables the solution strategy to the bi-level programming problem by taking advantage of the LP strong duality property [16], for which the objective function values for the primal and dual problems of the approximated inner LP problem must be equal to each other at optimality if both of them are bounded. With this duality condition, the bi-level programming problem can be solved as a single-level MILP problem by including the dual problem formulation and enforcing that the primal and dual problems share the same objective function value as in [13].
We first give the dual problem of the linearized inner problem:
where a
j
is the corresponding dual variable associated with the constraints on new piecewise variables β; b
i
is the dual variable for stoichiometric constraints, c
j
and d
j
are the dual variables for upper bound constraints and lower bound constraints for flux values respectively, and µ
glc
and µ
biom
are the dual variables corresponding to the constraints for glucose and biomass flux values. The knockout variable y
j
is still in the inner dual problem coupling two cellular objectives in the original outer and inner problems. The products of single binary variable and continuous variable in the fourth and the fifth terms can be linearized using the big-M method. Together with LP duality constraint, we have the final single-level MILP problem as
This final single-level MILP problem can be solved effectively by professional solvers, such as CPLEX [17]. We note that our new MOMA-based knockout optimization problem has a larger problem size with a larger number of variables and constraints as multiple linear functions are used to approximate the inner quadratic function.
Adaptive strategy
We have shown that we can effectively solve the linearized bi-level programming problem in the previous section. However, due to the linearization of the original quadratic MOMA objective function, the obtained result for a given linearization scheme is an approximate solution but not exact. In addition, the closeness to the exact optimal solution is directly determined by the number of segments for each flux to approximate the quadratic function . In order to obtain the exact optimal solution to the original bi-level programming problem, we adopt an adaptive strategy, in which piecewise linearization is implemented adaptively from the coarse to fine levels. As the original inner problem is to minimize the quadratic MOMA objective function, which is convex. It is easy to prove that the approximate optimal solution for a given linearization will have each flux v
j
fall within one segment. In other words, for each flux v
j
, piecewise variables only have either one (at endpoints) or two adjacent non-zero values for the approximate solution as illustrated in Figure 1.
When we have only one non-zero value within all the piecewise variables , we obtain the exact optimal solution as the linearized objective function has the exact same value at these segment endpoints. This naturally leads to an adaptive solution strategy to solve the original bi-level programming problem. We start with a coarse linearization with a small number of segments for each flux v
j
and solve the single-level MILP problem for this given linearization. We can compute the objective function value difference for the inner problem for the obtained solution as:
(5)
Based on the differences and the state of vector β
j
for all flux values, we adaptively add new piecewise linear segments to better approximate the corresponding contributions from each reaction flux to the quadratic objective function in the inner problem. By repeating the above procedure as shown in Figure 1(B-D), we can iteratively solve the problem by adaptively improve the piecewise linearization from coarse to fine levels until adding pieces does not change the objective value. If every Δ
j
is less than a very small number ϵ and every maximum value in β
j
is larger than a constant number θ that is close to 1, we can say the algorithm has converged. To speedup the algorithm, the knockouts from previous iteration are used to get a low bound for the MILP problem. Algorithm 1 provides the pseudo code for our adaptive linearization solution strategy to identify optimal knockout strategy for biochemical overproduction under the MOMA constraint.
Algorithm 1 Adaptive bi-level MOMAKnock.
Initialize variables.
Initialize the piecewise linearization with k pieces
repeat
Solve the inner primal problem based on previous knockouts to get a low bound objL;
Solve the MILP problem with the low bound objL;
for Each flux j do
Compute Δ
j
.
if Δ
j
>ϵ or then
Add a segment point at ; ( and are nonzero)
end if
end for
until Added segments do not improve the objective function