Skip to main content

MOMO - multi-objective metabolic mixed integer optimization: application to yeast strain engineering

Abstract

Background

In this paper, we explore the concept of multi-objective optimization in the field of metabolic engineering when both continuous and integer decision variables are involved in the model. In particular, we propose a multi-objective model that may be used to suggest reaction deletions that maximize and/or minimize several functions simultaneously. The applications may include, among others, the concurrent maximization of a bioproduct and of biomass, or maximization of a bioproduct while minimizing the formation of a given by-product, two common requirements in microbial metabolic engineering.

Results

Production of ethanol by the widely used cell factory Saccharomyces cerevisiae was adopted as a case study to demonstrate the usefulness of the proposed approach in identifying genetic manipulations that improve productivity and yield of this economically highly relevant bioproduct. We did an in vivo validation and we could show that some of the predicted deletions exhibit increased ethanol levels in comparison with the wild-type strain.

Conclusions

The multi-objective programming framework we developed, called Momo, is open-source and uses PolySCIP (Available at http://polyscip.zib.de/). as underlying multi-objective solver. Momo is available at http://momo-sysbio.gforge.inria.fr

Background

Multi-objective (multi-criteria) optimization is a method used to tackle problems when several objective functions have to be optimized simultaneously. Particularly, multi-objective programming is a class of optimization problems that have more than one objective function over a set of constraints. Except under limited circumstances, it is not possible to find a single optimal solution which simultaneously optimizes all of the objective functions, i.e., in general, the resolution of such problems leads to a trade-off curve (known as “Pareto frontier”) such that at any point of the curve, a compromise is made in favor of some objective function.

Multi-objective optimization has been extensively exploited in science and engineering. It has become a powerful tool for solving real-world problems and is also used in numerous bioinformatics and computational biology as shown in [1]. Successful engineering of microbial catalysts often needs the optimization of more than one feature, for example, the maximization of the production of a desired product while minimizing the synthesis of a by-product or maximizing the production of a product that is an end-point metabolite and therefore competes with growth for the carbon source. Some examples of direct application of this type of framework in the study of metabolism include the method developed by Villaverde et al. [2], based on kinetic models to predict up/down enzymatic regulations in order to optimise the production of metabolites of interest. Another approach, by Wu et al. [3], is also based on kinetic models to predict enzyme regulation and incorporates constraints to account for resilience effects and cell viability. Also, the work of Patané et al. [4], uses multi-objective simulations in order to predict maximum theoretical improvements rates of ethanol production in different strains.

Nevertheless, the most common algorithms used to guide strain design do not focus on a multi-objective approach but instead, in general, work within a bi-level framework (OPTKNOCK is the more paradigmatic case [5, 6]), adopt a max-min approach (e.g. ROBUSTKNOCK [7]), or exploit a consensus framework (e.g. the recently described OPTPIPE method [8]), as recently reviewed by [9].

There are also some other more generic approaches such as the one of Hädicke and Klamt [10] that is based on constrained minimal cut sets that allow disabling or preserving some cell functionalities when computing a gene deletion strategy. This method is generic and can be used to emulate other approaches such as OPTKNOCK, but it does not handle the optimization of multiple simultaneous compounds.

The use of multi-objective optimization for strain engineering is usually performed exploring heuristic/meta-heuristic approaches. Examples of these studies are the work of Maia et al. [11], in which the authors model the problem of identifying deletions to increase production of a compound of interest using multi-objective optimization, and then use an evolutionary algorithm to obtain the solutions. The developed approach was validated by improving the capability of Escherichia coli to produce succinic acid. Another example of studies exploring multi-objective optimization by heuristic methods is the work of Costanza et al. [12] that uses such an approach to seek the deletions that may optimize multiple cellular functions simultaneously, thereby allowing what the authors called a “genetic design of the strains”.

Although heuristic methods are usually computationally less expensive than exact ones, they do not guarantee that an optimal solution is found. In this work, we present an exact integer-linear multi-objective optimization methodology that can be explored for strain engineering, thereby expanding the current set of tools available for this objective. In our framework, called MOMO, given a metabolic network and a set of target reactions, the user can identify possible reaction deletions that could improve the flux of those targets, always in a multi-objective setting. To the best of our knowledge, this is the first time that such a tool has been developed combining these features.

To validate the approach, MOMO was used to model ethanol production undertaken by S. cerevisiae cells, and deletion of reactions that could optimize ethanol production were predicted and tested in vivo. Simultaneous maximization of growth and ethanol were used as objective functions. Despite many years of utilization in the industrial biotechnology sector, alcoholic fermentation undertaken by yeast cells gained a further boost in recent years with an increase in the interest of exploring ethanol as a biofuel [13].

Previous works have also aimed at exploring non-linear multi-objective optimization formulations to model Saccharomyces cerevisiae production of ethanol [14, 15]. Although these two approaches provided some understanding concerning the control and optimization of the small metabolic system used, no strategies to improve ethanol production could be designed from the results obtained.

We start by presenting in “Implementation” section an overview of metabolic network modeling and of the most used method for metabolic engineering, namely OPTKNOCK that adopts a bi-level model (“Modeling and optimization of metabolic networks” section and “Bi-level model” section). In “MOMO method” section, we introduce a bi-objective approach that will form the core of the MOMO method we developed, together with a classical enumeration technique that is required when there is more than one optimal solution and all need to be listed. In “Yeast fermentations” section, we describe the experimental context in which the yeast fermentations were made. In “Results” section, we present the experimental results obtained in silico, and the experimental validation that was performed. Finally, in “Discussion” section we discuss such results and present some of the perspectives of this work. The Additional file 1 provides additional theoretical material on multi-objective optimization.

Implementation

In this section, we describe some of basic aspects related to modeling metabolic networks in steady-state conditions and briefly describe the well-known algorithm OPTKNOCK, this being an important feature to better understand the modelling behind MOMO which was based on this. We also present the model of MOMO and its implementation aspects as well as some details about the fermentations performed to validate our method.

Modeling and optimization of metabolic networks

Genome-scale metabolic models are linked with conservation of mass equations for each metabolite in the metabolic network. The quantitative metabolic capabilities of a given organism can therefore be calculated as follows. Let \(I=\{1,\dots,m\}\) index the metabolites and \(J=\{1,\dots,n\}\) index the reactions of a given metabolic network. Assume that xi denotes the concentration of metabolite i, and vj denotes the flux of reaction j. The system dynamics is then described by the following equations:

$$\begin{array}{*{20}l} \frac{d x_{i}}{dt} = \sum_{j=1}^{n} s_{{ij}} v_{j},~ \forall i \in I, \end{array} $$

where sij are the stoichiometric coefficients of metabolite i in reaction j. Observe that the stoichiometric coefficients sij can be combined into the so-called stoichiometric matrix S=[sij]m×n, where each row corresponds to a metabolite and each column to a reaction. In this paper, we consider the steady-state approximation in which the metabolite concentrations are assumed to be constant. The above equation therefore reduces to:

$$\begin{array}{*{20}l} \sum_{j=1}^{n} s_{{ij}} v_{j} = 0,~ \forall i \in I. \end{array} $$
(1)

Since the number of reactions (denoted by n) is higher that the number of metabolites (denoted by m) in the majority of the real-world metabolic networks, there can be an infinite number of reaction fluxes satisfying the system of Eq. (1). One therefore needs to select from the set of feasible solutions only those that ensure cell viability and, in this context, one option currently applied is to maximize the flux of the biomass reaction (vbiomass). This is obtained by solving a continuous linear programming problem. In order to formulate such a problem, assume that LBj and UBj denote the given lower bound and upper bound on the flux of reaction j, respectively. This optimization problem can therefore be written as:

$$\begin{array}{*{20}l} \mathbf{FBA} \quad \quad \underset{\mathbf{v}}{\text{max}}\quad &~v_{{biomass}} \\ \text{s.t.}~~~~ &\sum_{j=1}^{n} s_{{ij}} v_{j} = 0,~ \forall i \in I, \\ &LB_{j} \leq v_{j} \leq UB_{j},~ \forall j \in J. \\ \end{array} $$
(2)

The above mathematical program is an essential part of Flux Balance Analysis (FBA) [1622] which plays a key role in optimizing cell metabolism and enables us to investigate both the theoretical and operative capabilities of such metabolism.

Bi-level model

We now describe the mathematical formulation of the OPTKNOCK [5] procedure in detail. OPTKNOCK was developed to identify multiple reaction deletions that may lead to the overproduction of a target chemical while cellular growth is maximized. This hierarchical optimization model comprises two decision or competing strategies, namely a cellular objective and a chemical production objective. It therefore is formulated as a Bi-Level Mixed Integer Program (BLMIP). Binary variables, denoted by yj,jJ, are defined which enable to perform reaction removals inside the model by multiplying both LBj and UBj by (1−yj) in the lower bound and upper bound constraints as follows:

$$y_{j} = {\left\{ \begin{array}{ll} 1,~ \text{if reaction }{j} \text{ is removed},\\ 0,~ \text{otherwise}. \end{array} \right.}$$

Assume that K denotes the given number of knock-outs. Then the Bi-Level Mixed Binary (linear) Program (BLMBP), which is a hierarchy of two optimization problems, can be mathematically formulated as follows:

$$\begin{array}{*{20}l}{} \mathbf{BLP}\! \quad\!\! \underset{\mathbf{y}}{\text{max}}\!\quad &~v_{{prod}} \\ \text{s.t.}~~~ \\ &\left\{\!\begin{array}{ll} \underset{\mathbf{v}}{\text{max}}\!\quad v_{{biomass}} \\ \hspace{6pt} \text{s.t.}\\ \hspace{28pt} v_{{biomass}} \geq MLB, \\ \hspace{28pt} \sum_{j=1}^{n} s_{{ij}} v_{j} = 0,~ \forall i \in I, \\ \hspace{28pt} LB_{j} (1-y_{j}) \leq v_{j} \leq UB_{j} (1\,-\,y_{j}), \forall j\! \in\! J, \\ \hspace{28pt} v_{j} \in \mathbb{R},~ \forall j \in J,\\ \end{array}\right.\\ &\sum_{j=1}^{n}y_{j}=K, \\ &y_{j} \in \{0,1\},~ \forall j \in J. \end{array} $$

where vprod corresponds to the flux of a chemical of interest, the vectors v and y denote \((v_{1},\dots,v_{n})\) and \((y_{1},\dots,y_{n})\) respectively, and MBL represents a minimum level of biomass production. We may also need to fix a substrate uptake that in this model we suppose is specified through the vectors LB and UB.

There is also one variation of OPTKNOCK, known as SIMPLEOPTKNOCK and implemented in the COBRA toolbox [23] that does not rely on a bi-level model, but on solving many linear programming problems.

This method consists of solving J instances of a variant of (2). For each reaction, its upper and lower flux bounds are fixed to zero. Under these new constraints that simulate the deletion of that reaction, an optimization of the biomass is performed. The obtained biomass value (maximum) is then fixed (upper and lower flux bounds) and this new model is subsequently optimized for the bioproduct of interest. The algorithm output, as implemented in the COBRA Toolbox, reports, for each reaction knock-out, the maximum biomass and the maximum and minimum values for the target compound.

MOMO method

We now describe the MOMO method developed in this study. This is based on a bi-objective model, whose characteristics are presented in the next subsection, and on an enumeration technique that is discussed just afterwards. Finally, MOMO itself, some of its implementation details and a short comparison with OPTKNOCK is discussed in the last part of this section.

Bi-objective model

In this section, we propose and describe a novel formulation for predicting reaction removal strategies in order to overproduce a target chemical in metabolic networks.

The formulation is based on multi-objective programming, and could be used for more than two objectives. In the case-study of this paper, we however concentrate on only two. Those are growth on one hand, and target chemical overproduction on the other, knowing that in general the two compete inside a cell, i.e., producing more of a target chemical leads to less production of biomass. We use such formulation to propose reaction knock-outs that enable attaining a high value for the second objective while achieving an acceptable value of the first, that is, cell growth rate. Comparing with bi-level optimisation, conceptually speaking, in bi-objective optimization we give the same importance to both objectives and we can evaluate a posteriori the Pareto frontier to detect the more plausible points to be inspected further. Whether multi-objective optimization may be approached by solving many single objective problems, using a multi-objective approach directly is simpler from a practical point of view. A detailed description of multi-objective optimization (definitions and algorithms) is presented in the Appendix, along with the main references.

We briefly describe however here one key concept, namely of the Pareto-efficient frontier, whose precise definition is given in the Appendix. Informally speaking, the Pareto-efficient frontier represents a set of solution points that are optimal in the sense that we cannot improve one objective without worsening another one. One example of a Pareto-optimal frontier is given in Fig. 1, where the points A, B, C, D, and E are considered Pareto-optimal.

Fig. 1
figure1

An example of a Pareto-efficient frontier with two objectives, v1 and v2. The red dots A, B, C, D, and E are examples of optimal choices while the points F, G, and H represent non-optimal solutions since we can improve one of the objectives without worsening the other

Considering the notations of “Modeling and optimization of metabolic networks” section, we partition the set J into (JE,JI) in such a way that JE indexes the essential reactions which are required for the cell to meet the biological objectives and JI indexes the inessential reactions that are not necessary for the cell. The Bi-Objective Mixed Binary (linear) Program (BOMBP) can therefore be mathematically formulated as follows:

$$\begin{array}{*{20}l}{} \mathbf{BOP} \quad \quad \underset{\mathbf{y},\mathbf{v}}{\text{max}}\quad &~(v_{{prod}},v_{{biomass}}) \\ \text{s.t.}~~~ &\sum_{j=1}^{n} s_{{ij}} v_{j} = 0,~ \forall i \in I, \end{array} $$
(3)
$$\begin{array}{*{20}l} &LB_{j} \leq v_{j} \leq UB_{j},~ \forall j \in J_{E}, \end{array} $$
(4)
$$\begin{array}{*{20}l} &LB_{j} (1-y_{j}) \leq v_{j} \leq UB_{j} (1-y_{j}),~ \forall j \in J_{I}, \end{array} $$
(5)
$$\begin{array}{*{20}l} &\sum_{j=1}^{n}y_{j}=K, \end{array} $$
(6)
$$\begin{array}{*{20}l} &y_{j} \in \{0,1\},~ \forall j \in J_{I}, v_{j} \in \mathbb{R},~ \forall j \in J. \end{array} $$
(7)

Notice that in BOP, the binary variables yj are only attributed to inessential reactions. We highlight that instead of formulating the problem of identifying reaction knock-out strategies with a bi-level model which gives exactly one optimal solution, we formulate this problem using a bi-objective model (as presented above) which provides a number of optimal solutions on the trade-off Pareto-optimal frontier. Hence, we are able to analyze each optimal solution of the bi-objective model more carefully in order to find the most suitable reaction deletion scenarios. Also using constraint (4) it is possible to restrict the method to find points only in a specific range (i.e. excluding uninteresting points like zero biomass for example).

We describe next the technique used to enumerate all optimal solutions.

Enumeration

The models presented so far have the limitation of predicting only one set of K deletions per point in the Pareto frontier. Since we may have many sets of deletions leading to the same production rates, it would be interesting to enumerate all sets of possible deletions for each point in the Pareto frontier.

We therefore now describe a classical enumeration technique that can be applied in conjunction with the models presented so far in order to enumerate all equivalent deletions, that is, all deletion sets that allow the same production rates of the desired product and of biomass. We also present a modification of this technique in order to enumerate not all equivalent deletions, but a subset of these deletions that allows a flux distribution with a minimal change of the fluxes in comparison to a reference flux vector. In our case, we use as reference the flux distribution obtained by the FBA model since it represents the one of the wild-type organism. This idea of minimizing the distance between a reference flux distribution and the mutant flux distribution is known as minimization of metabolic adjustment (MOMA) [24].

Let Z be set of points of the Pareto frontier obtained by solving the BOP. For each \((v^{*}_{{prod}},v^{*}_{{biomass}}) \in Z^{*}\), we construct the following feasibility problem:

$$\begin{array}{*{20}l} \mathbf{ENUMP} \quad \quad \text{Find}\quad& v, y \\ \text{s.t.}~~~~ & v_{{prod}} = v^{*}_{{prod}}, \\ & v_{{biomass}} = v^{*}_{{biomass}}, \\ & (3)-(7). \end{array} $$
(8)

Let \((\bar {v},\bar {y})\) be one solution of ENUMP. The vector \(\bar {y}\) represents one set of deletions. Let ρ(y) be a function that takes a vector of integers and returns the set of indexes L such that for all lL,yl=1. Consider the following constraint:

$$ \sum_{j \in \rho(\bar{y})} y_{j} \leq K-1. $$
(9)

Adding Eq. 9 to ENUMP will exclude the set defined by \(\bar {y}\) from the set of solutions and also any other set that contains it. Solving the problem again will either give a new deletion set, or a certificate proving that no other equivalent solution exists. Iterating over these steps allows to enumerate all sets of equivalent deletions for each point in the Pareto frontier.

Observe that the amount of sets with equivalent deletions may grow considerably if K is large. One alternative is to enumerate only a subset of the deletions. For that we can add an objective function to ENUMP so that only those deletion sets that are closer to a reference flux set are enumerated. One possible objective function would be to consider the strategy introduced in [24], minimizing the distance between the wild-type and the mutant distribution through quadratic programming. Here we considered a similar idea using the Manhattan norm instead of the Euclidean one. This decision allows us to stay in a integer-linear model and also allows sparser solutions.

We can now define the new problem for the enumeration procedure as follows:

$$\begin{array}{*{20}l} \mathbf{ENUMPM} \quad \quad \underset{\mathbf{v,y}}{\text{min}} \quad & f(v) = \sum_{i=1}^{n} |v_{i} - \bar{\bar{v}}_{i}| \\ \text{s.t.}~~~ &~ v_{{prod}} = v^{i}_{{prod}}, \\ &~ v_{{biomass}} = v^{i}_{{biomass}}, \\ &~(3)-(7), \end{array} $$
(10)

where \(\bar {\bar {v}}_{i}\) is the optimal solution obtained from FBA problem optimising the biomass growth rate as defined in (2). The norm on the objective function of ENUMPM can be modeled using a classical reformulation technique where f(v) can be reformulated as:

$$ f(v) = \sum_{i=1}^{n} z_{i}, $$

where:

$$\begin{array}{*{20}l} z_{i} &\geq v_{i} - \bar{\bar{v}}_{i},\\ z_{i} &\geq -(v_{i} - \bar{\bar{v}}_{i}). \end{array} $$

For the enumeration, it suffices to add two constraints. The first is a constraint fixing the value of the objective function to the optimal one obtained by solving ENUMPM for the first time:

$$ f(v) = f^{*}(v), $$
(11)

where f(v) is the optimal value of ENUMPM. Then at each step, one should also add Eq. 9 as explained previously.

Using ENUMPM in the enumeration scheme presented here allows to obtain all deletion sets that enable the desired production rates, and, at the same time, have minimum difference in the flux distribution compared to the wild-type theoretical fluxes.

Implementation of MOMO

The ideas presented in the previous sections (“Bi-objective model” section “Enumeration” section) were implemented in the software MOMO. The latter is coded in C++ and uses SCIP [25] and PolySCIP [26] as solvers for the optimization models.

MOMO allows to solve instances of the FBA, BOP, ENUMP and ENUMPM models, using exact methods, and further allows for variations of those when there are more than two objective functions, which may include for instance maximizing and/or minimizing other bio-products. The software is open-source and available at http://momo-sysbio.gforge.inria.fr/.

We present below a sequence of steps that can be used with MOMO to predict knockout strategies for the overproduction of a desired product:

  1. 1.

    Solve an instance of FBA in order to obtain a reference flux distribution.

  2. 2.

    Identify the reactions that have a non-zero flux in the previous step and consider them as knock-out candidates.

  3. 3.

    Solve an instance of the BOP or of one of the multi-objective models to obtain points in the Pareto curve.

  4. 4.

    For each point, fix the values of the objective functions and enumerate all the knock-outs using ENUMP or ENUMPM.

Yeast fermentations

To biologically evaluate the proposed approach, we applied MOMO with the objective of finding gene deletions that could optimize the production of ethanol while also maximizing biomass in yeast.

The fermentations were performed using the wild-type strain Saccharomyces cerevisiae BY4741 (MATa, his3 Δ1, leu2 Δ0, met15 Δ0, ura3 Δ0) and W303 (MAT α ade2-1, his3-1,15 leu2-3,112 trp1-1 ura3-1). BY4741 was acquired from the Euroscarf collection while the W303 strain was kindly provided by Dr Dennis Winge (University of Utah). The mutant strains devoid of genes FUM1, NDI1, MDH1, ADE3, ALT1, PGI1, FUM1, SCS7ADO1, BAT2, LOT6, IDP1, LOT6, YND1, FRD1, IPT1 and SUR2 were derived from the BY4741 background (and were also acquired from the Euroscarf collection), while strains devoid of SDH2 or of SDH6 and SDH7 were kindly provided by Dr Dennis Winge (University of Utah).

For the very high gravity fermentations, a protocol similar to the one described in [27] was used. Briefly, the strains were cultivated over-night at 30 oC in YPD growth medium (20 g/L glucose, 20 g/L bactopeptone and 10 g/L yeast extract), after which they were inoculated in the fermentation medium YPF which differs from YPD by having 300 g/L glucose as well as 240 mg/L leucine, 80 mg/L histidine, 80 mg/L methionine and 80 mg/L uracil which were added to the medium to complement the auxotrophies of the BY4741 strain. The different fermentations were assessed measuring cell growth (through the increase in culture OD600nm) and accompanying the alterations in the concentration of glucose, ethanol and glycerol in the supernatant. For this, 10 μL of each supernatant sample were separated in an Aminex HPX-87 H Ion Exchange Chromatography column, eluted at room temperature with 0.005 M H2SO4 at a flow-rate of 0.6 mL min-1 during 30 min, using a refractive-index detector. Under such experimental conditions, glucose had a retention time of 8.3 min and ethanol of 19.4 min. Reproducibility and linearity of the method were tested, and the concentrations were estimated based on appropriate calibration curves.

Results

Multi-objective optimization-based modelling of ethanol production in S. cerevisiae

To simulate the yeast-based production of ethanol, the Yeast 5.01 model [28] was used constraining the flux of glucose uptake (r_1714, D-glucose exchange, lower-bound limited to -10), the electron transfer to O2 (corresponding to reaction r_0438, ‘ferrocytochrome-c:oxygen oxidoreductase’, upper-bound limited to 10) and the isomerization reaction between dihydroxyacetone phosphate and glyceraldeyde-3-phosphate (corresponding to reaction r_1054, ‘triose-phosphate isomerase’, upper-bound limited to 6). The constraint of the oxygen reduction reaction was necessary to predict ethanol production as otherwise all the glucose was channeled for biomass growth. Constraining the isomerization reaction was performed in order to predict the co-production of ethanol and glycerol, something that the non-modified Yeast 5.01 model could not do but that is well described to occur during yeast alcoholic fermentation due to the need of recycling NADH. Also, all the infinite bounds in the Yeast 5.01 model, -INF and +INF, were set to -1000 and +1000, respectively.

Under the established conditions and using biomass as the objective function, the predicted flux of ethanol was 8.8 mmol/g dry weight/h, while the predicted flux for glycerol production was of 2.2 mmol/g dry weight/h. When changing the objective function towards “maximization of ethanol production”, no flux for the biomass could be predicted, thus reflecting the competing nature that exists between ethanol production and growth (Fig. 2).

Fig. 2
figure2

a Pareto-optimal points obtained upon simulation of yeast alcoholic fermentation maximizing biomass and ethanol production. The fluxes obtained for ethanol and biomass production on the A-G points of the Pareto frontier are as follows: A(8.8;0.43); B(11.8;0.41);C(15.3;0.27);D(17.2;0.19);E(17.4;0.18); F(17.5;0.17); G(19.8;0.0). b Heatmap illustrating the difference of the fluxes between the wild-type and mutants corresponding to each Pareto point. Considering reactions related to glycerol production, glycolysis, TCA cycle, and ethanol production and utilization

Simultaneous maximization of ethanol production and biomass growth using MOMO resulted in the identification of 6 Pareto frontier points (Fig. 2a).

Point A predicts fluxes of ethanol and biomass growth identical to those predicted when biomass is used as the objective function (Fig. 2a) (8.83 and 0.44 respectively). Along the Pareto frontier, an increased amount of ethanol is predicted, achieving a flux of 17.5 (Fig. 2a) while the flux of the predicted biomass is about 0.18 (in fact the maximum predicted flux for ethanol is of 18.8 but with no biomass prediction).

It is clear from the analysis of the data shown in Fig. 2 that the maximization of ethanol observed along the Pareto frontier is accompanied by a significant decrease in biomass flux, with the exception of the point B in which ethanol flux increases to 11.8 but biomass growth decreases minimally compared with the results that are obtained when biomass is the sole objective function (Fig. 2a and b).

A closer examination of the distribution of the fluxes obtained upon implementation of the multi-objective setting for the different points of the Pareto frontier (in part shown in Fig. 2b and further detailed in Additional file 2): Table S1 shows that the predicted increase in ethanol production is, at first, obtained at the expense of eliminating glycerol production (whose associated reaction fluxes drop to zero) and of reducing the activity of various reactions that consume pyruvate, including pyruvate dehydrogenase and pyruvate carboxylase (Fig. 2b). On the other hand, the glycolytic reactions involved in pyruvate replenishing are predicted to increase their fluxes along the different Pareto frontier points as well as pyruvate decarboxylase and alcohol dehydrogenase, the enzymes driving ethanol synthesis from pyruvate (Fig 2b). The reactions diverting acetaldehyde from ethanol synthesis show a reduced flux, as do the reactions involved in ethanol consumption (Fig 2b).

Another important observation concerns the reduction in the flux of various enzymes involved in the TCA cycle and in amino acid biosynthesis, something that could be attributed to the reduction that is observed in biomass, this effect being much more marked for the last points of the Pareto frontier where the decrease in biomass is more evident (Fig. 2b). Overall, it can be seen that the distribution of fluxes around the yeast metabolic network during alcoholic fermentation in the bi-objective setting is first performed at the expense of reducing the formation of byproducts but not growth.

Identification of deletions leading to improved ethanol production

The use of an in silico simulation to identify gene deletions is one of the most widely adopted strategies to improve the microbe-based production of desired compounds. In this direction, we used MOMO to identify putative deletions that could lead to an improved ethanol production undertaken by yeast cells. It is important to note that the deletions found will be those with the potential to enable the predicted maximum flux of ethanol production (there is not a strict causality relation between the deletions and the maximum fluxes). For this, a set of possible deletions was generated for each of the points of the Pareto frontier (see Additional file 2: Table S2). This was accomplished by fixing the objective functions vethanol and vbiomass to the value of each point and then using the MOMO enumeration technique (as explained in “Enumeration” section and in Step 4 of “Implementation of MOMO” section).

A selected set of the results obtained is shown in Table 1 while the full list of deletions is shown in Additional file 2: Table S2. The highest number of candidate reactions for deletion was obtained for point G (317), which is expected since this is the point where no biomass is predicted therefore giving a higher flexibility for identification of candidate reaction deletions. A closer look into the sets of solutions obtained for all Pareto frontier points revealed possibilities for deleting reactions associated with the metabolism of glycerol, pyruvate (e.g. pyruvate dehydrogenase or acetaldeyde dehydrogenase), with the activity of Krebs cycle (e.g. malate dehydrogenase, fumarase, pyruvate dehydrogenase, oxoglutarate dehydrogenase), with the respiratory chain (e.g. succinate dehydrogenase, ferrocytochrome-c:oxygen oxidoreductase), with the biosynthesis of amino acids, with the metabolism of complex lipids including ergosterol, sphingolipids or ceramides or with the synthesis of glutathione.

Table 1 Sub-set of the deletions identified by MOMO for each point of the Pareto frontier shown in Fig 2

The proposed deletions of genes involved in glycerol metabolism as a means of improve ethanol production is of particular interest, since several studies have demonstrated that indeed modulation of the activity of this pathway can contribute, at different extents, to improve ethanol production in yeast (reviewed in [2931]). It can be reasoned that the reduced activity of pathways consuming pyruvate and acetaldeyde can contribute to decrease the diversion of pyruvate, creating a surplus of these metabolites that can then be channeled for ethanol synthesis. However, an eventual link between the metabolism of complex lipids or of glutathione with ethanol production cannot be intuitively established. The fact the set of solutions identified by MOMO comprises those expected to be obtained from a biological point of view shows that the modelling is functioning in an appropriate manner.

To validate the results obtained by MOMO we considered a reduced set of predictions, among the whole set generated by our software. We eliminated many candidate reactions identified at the Pareto frontier points showing a very small amount of biomass (points F and G) to avoid solutions pointing to increased ethanol production merely by compromising cellular growth. This is a relevant aspect since some of the solutions proposed as means to improve ethanol production could actually be not possible by compromising cellular viability. We also aimed to include, in the set of strains to be tested, those genes that could not be intuitively be linked to ethanol production. We ended up with a set of 19 different strains to be tested on the capacity of yeast cells to produce ethanol under very high gravity conditions (300g/L glucose as starting sugar concentration). Among the 19 mutant strains tested, four were demonstrated to exhibit increased ethanol levels in comparison with those exhibited by the wild-type strain (14.1% (v/v) ±0.5 ethanol for the BY4741 strain): Δidp1, devoid of the mitochondrial isocitrate dehydrogenase (14.6% ±0.6); Δlot6, devoid of a cytosolic FMN-dependent NAD(P):quinone reductase (16.1% ±0.9); Δipt1, devoid of a inositolphosphotransferase (14.9% ±0.7) and the W303_ Δsdh2 mutant devoid of the succinate dehydrogenase subunit Sdh2 (14.9% ±0.7, comparing with 14.4% ±0.3) (Table 1).

Discussion

The need to optimize more than one feature is a recognized challenge in microbial metabolic engineering. A very common example where it is necessary to deal with many features is the production of compounds, that fall outside of the microbe’s metabolic repertoire, that competes for the carbon source for growth. Although solutions can be envisaged to solve this problem for the bioprocess point of view (using, for example, continuous fermentation reactors where biomass is maintained at a given dilution rate), this is problematic from the modelling point of view since in the standard models only one objective function is used. In the specific case of end-point metabolites, maximization of its production often results in no growth and vice-versa. Other examples where more than one objective has to be envisaged is the need to reduce the synthesis of by-products that could be coupled to the production of the compound of interest but whose total elimination could be difficult or, in some cases, even impossible. This is the case of glycerol, which is a known product of ethanol production undertaken by S. cerevisiae required for maintenaince of redox potential and osmobalancing of yeasts cells, thereby rendering its elimination (to prevent carbon diversion) very difficult (as reviewed in [29]).

In this work we present MOMO, a framework that is able to consider metabolic networks in multi-objective settings and that also pinpoints for a selected set of reactions that could be deleted in order to reach different combinations of the various objectives selected by the user. Several other tools capable of performing identification of interesting knock-outs have also been developed along time (as reviewed in [9]), out of which OPTKNOCK or OPTGENE are particularly known [5, 32]. The main difference of MOMO in relation with other methods is that it is based on multi-objective otpimization.

One interesting aspect of the MOMO approach, in contrast with the bi-level model used by OPTKNOCK, is that in this tool we do not need to impose constraints on the biomass values since all the optimal solutions for different biomass values can be obtained. This is advantageous since it does not require to specify an a priori biomass growth rate, which could impact/bias the obtained solution and allows an a posteriori analysis of the optimal frontier and choice of the points to enumerate deletions.

In MOMO all models are solved using exact methods. This have both the positive aspect that guarantees global optimality and the negative aspect that in practice, depending on the number of objectives to optimize and the choice of K, the problem may be too hard to be solved in a reasonable time, since in the worst case, the space of solutions may grow in the order of nK. Our experiences show that, using a modern laptop, solving instances with K>3 and/or more than 3 objective functions may be prohibitive in practice. To illustrate this, in Table 2 we show the running time for the enumeration of all deletion sets of point G of the Pareto frontier of Fig. 2 for K=1,2,3.

Table 2 The mean wall-clock time (from 3 runs) that MOMO took to run the FBA (step 1), BOP (step 3) with two objectives and the enumeration procedure for K=1,2,3 for the point G of the Pareto front of Fig. 2 just to illustrate the difficulty of the problems

Other advantageous aspects of MOMO compared with other approaches is the integration of an enumeration framework that instead of giving one single deletion is able to enumerate candidate(s) that may provide an improved production of the target(s) compound(s). As described above, this approach is based on having a reference flux so we can identify the active reactions to be candidates for the knock-out strategies. This reference flux could come from empirical data or FBA simulations for example. In our case we considered the flux obtained from the FBA model optimizing for biomass. A potential problem may be that this model have many optimal solutions that are different enough to have an impact on the enumeration procedure. An alternative to reduce this impact is to replace the MOMA approach by the more computationally costly one known as ROOM [33] that considers only the presence/absence of flux in a reaction instead of its actual flux value. In our test case discussed in “Yeast fermentations” section, due to the flux constraints used, the impact of different optimal solutions was not too important in the enumeration (i.e. the majority of the deletions enumerated were the same regardless of the FBA solution considered as reference). There are some methods like SIMPLEOPTKNOCK that also have enumeration features. The drawback with SIMPLEOPTKNOCK is the fact that it enumerates all possible deletions ranking them trough a score (wild-type prediction of target production). This results in an unpractical number of deletions to try empirically and often the score is unrelated with the real impact of the deletion. On the other hand, our approach enumerates a reduced number of deletions, instead of all possible ones, thus enabling a more suitable identification of candidates for subsequent experimental approaches.

We have tested the effect of the reaction deletions predicted by MOMO for ethanol production, with particular emphasis on reactions whose link with ethanol synthesis was not intuitive and could not be easily anticipated.

Out of the 19 deletions tested, four (deletion of IDP1, IPT1, LOT6 and SDH2) led to improvement in the production titers, comparing with the titers obtained by wild-type cells; these improvements ranging from 2% obtained for the Δidp1 strain to 13% obtained for the Δlot6 mutant. Although these improvements might seem small, it is well known that even small increases in the final tier of ethanol represent highly significant add-values to the economy of the bioprocess, highly restrained by the high costs of the distillation step required to obtain ethanol from the fermentation broth [13]. IDP1 and SDH2 encode enzymes that participate in the Krebs cycle and therefore their inactivation may result in a lower flux of the cycle thereby increasing the amount of pyruvate that can then be channeled for increased production of ethanol. In the case of IPT1 and LOT6, which encode respectively a inositolphosphotransferase involved in synthesis of complex sphingolipids and a cytosolic FMN-dependent NAD(P) H:quinone reductase, the improvements that appear to result from their deletions in ethanol production are far less easy to understand. Under the conditions that we have used we could not detect a significant difference between growth of the Δipt1 and Δlot6 strains and of the wild-type strain BY4741 thereby indicating that the improved production exhibited by the mutants does not appear to result from their reduced fitness. Despite the molecular basis underlying the improved production phenotype may not be easily understood, the fact that MOMO has identified non-obvious target reactions is of particular high interest.

Conclusion

In this paper, a new method, called MOMO based on a multi-objective mixed integer optimization, was developed and explored with the aim of identifying deletion strategies when several objectives should be taken into account and the variables are both continuous (fluxes) and discrete (deletions). We tested the proposed approach for ethanol and biomass maximization in yeast, identifying all the possible single deletions belonging to the Pareto front. MOMO is implemented and freely available at http://momo-sysbio.gforge.inria.fr. It can be easily applied to other cases where several metabolic functions can be optimized.

Ongoing work includes the testing of other objectives to be minimized, such as toxicity, and of other specific linear combinations of reactions to be included in the model. MOMO significantly expands the metabolic engineering tools available for identifying the best mutant strains and can therefore provide a wider view of the solution space for improving the production of valuable compounds.

Availability and requirements

Project name: MOMO

Project home page: http://momo-sysbio.gforge.inria.fr/

Operating system(s): Linux

Programming language: C++

Other requirements: SCIP optimization suite v. 3.2.1

License: ZIB Academic

Any restrictions to use by non-academics: licence needed for SCIP optimization suite

Availability of data and materials

All data referenced and analyzed are public and available online.

Abbreviations

BLMBP:

Bi-Level Mixed Binary (linear) Program

BLMIP:

Bi-Level Mixed Integer Program

BOMBP:

Bi-Objective Mixed Binary (linear) Program

FBA:

Flux balance analysis

MOMA:

Minimization of metabolic adjustment

MOMO:

Multi-objective metabolic mixed integer optimization

References

  1. 1

    Handl J, Kell DB, Knowles J. Multiobjective optimization in bioinformatics and computational biology. IEEE/ACM Trans Comput Biol Bioinforma. 2007; 4(2):279–292. https://doi.org/10.1109/tcbb.2007.070203.

    CAS  Article  Google Scholar 

  2. 2

    Villaverde AF, Bongard SP, Mauch K, Balsa-Canto E, Banga JR. Metabolic engineering with multi-objective optimization of kinetic models. J Biotechnol. 2016; 222:1–8.

    CAS  PubMed  Article  Google Scholar 

  3. 3

    Wu W-H, Wang F-S, Chang MS. Multi-objective optimization of enzyme manipulations in metabolic networks considering resilience effects. BMC Syst Biol. 2011; 5(1):145.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  4. 4

    Patané A, Jansen G, Conca P, Carapezza G, Costanza J, Nicosia G. Multi-objective optimization of genome-scale metabolic models: the case of ethanol production. Ann Oper Res. 2019; 276(1):211–227.

    Article  Google Scholar 

  5. 5

    Burgard AP, Pharkya P, Maranas CD. Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol Bioeng. 2003; 84(6):647–57.

    CAS  PubMed  Article  Google Scholar 

  6. 6

    Chowdhury A, Zomorrodi AR, Maranas CD. Bilevel optimization techniques in computational strain design. Comput Chem Eng. 2015; 72:363–72.

    CAS  Article  Google Scholar 

  7. 7

    Tepper N, Shlomi T. Predicting metabolic engineering knockout strategies for chemical production: accounting for competing pathways. Bioinformatics. 2009; 26(4):536–43.

    PubMed  Article  CAS  Google Scholar 

  8. 8

    Hartmann A, Vila-Santa A, Kallscheuer N, Vogt M, Julien-Laferrière A, Sagot MF, et al.OptPipe - a pipeline for optimizing metabolic engineering targets. BMC Syst Biol. 2017; 11(1):143. https://doi.org/10.1186/s12918-017-0515-0.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  9. 9

    Maia P, Rocha M, Rocha I. In silico constraint-based strain optimization methods: the quest for optimal cell factories. Microbiol Mol Biol Rev. 2016; 80(1):45–67.

    PubMed  Article  Google Scholar 

  10. 10

    Hädicke O, Klamt S. Computing complex metabolic intervention strategies using constrained minimal cut sets. Metab Eng. 2011; 13(2):204–213. https://doi.org/10.1016/j.ymben.2010.12.004.

    PubMed  Article  CAS  Google Scholar 

  11. 11

    Maia P, Rocha I, Ferreira EC, Rocha M. Evaluating evolutionary multiobjective algorithms for the in silico optimization of mutant strains. In: Oct 2008 in 2008 8th IEEE International Conference on BioInformatics and BioEngineering. IEEE: 2008. https://doi.org/10.1109/bibe.2008.4696733.

  12. 12

    Costanza J, Carapezza G, Angione C, Lió P, Nicosia G. Robust design of microbial strains. Bioinformatics. 2012; 28(23):3097–104.

    CAS  PubMed  Article  Google Scholar 

  13. 13

    Mussatto SI, Dragone G, Guimarães PM, Silva JPA, Carneiro LM, Roberto IC, et al.Technological trends, global market, and challenges of bio-ethanol production. Biotechnol Adv. 2010; 28(6):817–830.

    CAS  PubMed  Article  Google Scholar 

  14. 14

    Sendín OH, Vera J, Torres NV, Banga JR. Model based optimization of biochemical systems using multiple objectives: a comparison of several solution strategies. Math Comput Model Dyn Syst. 2006; 12(5):469–87.

    Article  Google Scholar 

  15. 15

    Vera J, De Atauri P, Cascante M, Torres NV. Multicriteria optimization of biochemical systems by linear programming: application to production of ethanol by Saccharomyces cerevisiae. Biotechnol Bioeng. 2003; 83(3):335–43.

    CAS  PubMed  Article  Google Scholar 

  16. 16

    Edwards J, Palsson B. The Escherichia coli MG1655 in silico metabolic genotype: its definition, characteristics, and capabilities. Proc Nat Acad Sci. 2000; 97(10):5528–33.

    CAS  PubMed  Article  Google Scholar 

  17. 17

    Edwards JS, Palsson BO. Metabolic flux balance analysis and the in silico analysis of Escherichia coli K-12 gene deletions. BMC Bioinforma. 2000; 1(1):1.

    CAS  Article  Google Scholar 

  18. 18

    Orth JD, Thiele I, Palsson BØ. What is flux balance analysis?. Nat Biotechnol. 2010; 28(3):245–8.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  19. 19

    Ramakrishna R, Edwards JS, McCulloch A, Palsson BO. Flux-balance analysis of mitochondrial energy metabolism: consequences of systemic stoichiometric constraints. Am J Physiol Regul Integr Comp Physiol. 2001; 280(3):R695–704.

    CAS  PubMed  Article  Google Scholar 

  20. 20

    Raman K, Chandra N. Flux balance analysis of biological systems: applications and challenges. Brief Bioinforma. 2009; 10(4):435–449.

    CAS  Article  Google Scholar 

  21. 21

    Varma A, Palsson BO. Metabolic Flux Balancing: Basic Concepts, Scientific and Practical Use. Bio/technology. 1994; 12(10):994–8.

    CAS  Article  Google Scholar 

  22. 22

    Varma A, Palsson BO. Stoichiometric flux balance models quantitatively predict growth and metabolic by-product secretion in wild-type Escherichia coli W3110. Appl Environ Microbiol. 1994; 60(10):3724–31.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  23. 23

    Heirendt L, Arreckx S, Pfau T, Mendoza SN, Richelle A, Heinken A, et al.Creation and analysis of biochemical constraint-based models: the COBRA Toolbox v3.0. ArXiv e-prints arXiv:171004038. 2017.

  24. 24

    Segrè D, Vitkup D, Church GM. Analysis of optimality in natural and perturbed metabolic networks. Proc Nat Acad Sci. 2002; 99(23):15112–7.

    PubMed  Article  CAS  Google Scholar 

  25. 25

    Achterberg T. SCIP: solving constraint integer programs. Math Program Comput. 2009; 1(1):1–41.

    Article  Google Scholar 

  26. 26

    Borndörfer R, Schenker S, Skutella M, Strunk T. PolySCIP In: Greuel GM, Koch T, Paule P, Sommese A, editors. Mathematical Software – ICMS 2016, 5th International Conference, vol. 9725. Berlin: 2016. p. 259–264.

  27. 27

    Teixeira MC, Godinho CP, Cabrito TR, et al., Increased expression of the yeast multidrug resistance ABC transporter Pdr18 leads to increased ethanol tolerance and ethanol production in high gravity alcoholic fermentation. Microb Cell Fact. 2012; 11:98. https://doi.org/10.1186/1475-2859-11-98.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  28. 28

    Heavner BD, Smallbone K, Barker B, Mendes P, Walker LP. Yeast 5 – an expanded reconstruction of the Saccharomyces cerevisiae metabolic network. BMC Syst Biol. 2012; 6(1):55. https://doi.org/10.1186/1752-0509-6-55.

    PubMed  PubMed Central  Article  Google Scholar 

  29. 29

    Nevoigt E. Progress in metabolic engineering of Saccharomyces cerevisiae. Microbiol Mol Biol Rev. 2008; 72(3):379–412.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  30. 30

    Papapetridis I, van Dijk M, van Maris AJA, Pronk JT. Metabolic engineering strategies for optimizing acetate reduction, ethanol yield and osmotolerance in Saccharomyces cerevisiae. Biotechnol Biofuels. 2017; 10(1):107. https://doi.org/10.1186/s13068-017-0791-3.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  31. 31

    Bro C, Regenberg B, Förster J, Nielsen J. In silico aided metabolic engineering of Saccharomyces cerevisiae for improved bioethanol production. Metab Eng. 2006; 8(2):102–111. https://doi.org/10.1016/j.ymben.2005.09.00.

    CAS  PubMed  Article  Google Scholar 

  32. 32

    Patil KR, Rocha I, Förster J, Nielsen J. Evolutionary programming as a platform for in silico metabolic engineering. BMC bioinforma. 2005; 6(1):308.

    Article  CAS  Google Scholar 

  33. 33

    Shlomi T, Berkman O, Ruppin E. Regulatory on/off minimization of metabolic flux changes after genetic perturbations. Proc Nat Acad Sci. 2005; 102(21):7695–7700. https://doi.org/10.1073/pnas.0406346102.

    CAS  PubMed  Article  Google Scholar 

Download references

Funding

This work was partially supported by FCT and EraNet in Industrial Biotechnology (6th edition, contract ERA-IB-2-6/0003/2014) and by the European Union, FP 7 BacHBerry (www.bachberry.eu), Project No. FP7-613793. SV acknowledges support by Program Investigador FCT (IF/00653/2012) from FCT, co-funded by the European Social Fund (ESF) through the Operational Program Human Potential (POPH). Funding received by iBB-Institute for Bioengineering and Biosciences (UID/BIO/04565/2013, UIDB/04565/2020), IDMEC, under LAETA (project UIDB/50022/2020), and INESC-ID (UIDB/50021/2020), through Fundação para a Ciência e a Tecnologia (FCT) - Portuguese Foundation for Science and Technology, and from Programa Operacional Regional de Lisboa 2020 (Project N. 007317) is also acknowledged. This work was also supported by the São Paulo Research Foundation (FAPESP) grant #2017/05986-2. The funding bodies had no role in the design of the study, collection, analysis or interpretation of data, or in writing the manuscript.

Author information

Affiliations

Authors

Contributions

RA, MD, M-FS, NPM and SV contributed with the study conception and design of the method. RA and MD worked in the software implementation. JLS and NPM worked in the acquisition of data and conceived and planned the biological experiments. RA, MD, JLS, M-FS, NPM and SV conceived and planned the numerical experiments and also worked in the data analysis and interpretation of the results. RA, MD, JLS, M-FS, NPM and SV contributed in drafting and reviewing the article and also approved the final of the version to be published.

Corresponding authors

Correspondence to Nuno P. Mira or Susana Vinga.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

SV and M-FS are members of the Editorial Board of BMC Bioinformatics. The remaining authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1

Appendix S1. Additional theoretical material on multi-objective optimization.

Additional file 2

Table S1: Flux distribution at each Pareto point. Table S2: All deletion candidates enumerated by Momo.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Andrade, R., Doostmohammadi, M., Santos, J.L. et al. MOMO - multi-objective metabolic mixed integer optimization: application to yeast strain engineering. BMC Bioinformatics 21, 69 (2020). https://doi.org/10.1186/s12859-020-3377-1

Download citation

Keywords

  • Optimization
  • Metabolic Engineering
  • Systems Biology