Robust flux balance analysis of multiscale biochemical reaction networks
© Sun et al.; licensee BioMed Central Ltd. 2013
Received: 16 September 2012
Accepted: 8 July 2013
Published: 30 July 2013
Biological processes such as metabolism, signaling, and macromolecular synthesis can be modeled as large networks of biochemical reactions. Large and comprehensive networks, like integrated networks that represent metabolism and macromolecular synthesis, are inherently multiscale because reaction rates can vary over many orders of magnitude. They require special methods for accurate analysis because naive use of standard optimization systems can produce inaccurate or erroneously infeasible results.
We describe techniques enabling off-the-shelf optimization software to compute accurate solutions to the poorly scaled optimization problems arising from flux balance analysis of multiscale biochemical reaction networks. We implement lifting techniques for flux balance analysis within the openCOBRA toolbox and demonstrate our techniques using the first integrated reconstruction of metabolism and macromolecular synthesis for E. coli.
Our techniques enable accurate flux balance analysis of multiscale networks using off-the-shelf optimization software. Although we describe lifting techniques in the context of flux balance analysis, our methods can be used to handle a variety of optimization problems arising from analysis of multiscale network reconstructions.
where v_l, v_u ∈ R^n are lower and upper bounds on the fluxes and c represents a biologically motivated objective function. We refer to Orth et al. for details about FBA.
where C v ≤ d includes constraints equivalent to (2) for many pairs of fluxes.
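For orientation, the linear programs referred to as (1) and (3) can be written in the following standard form. This is a reconstruction from the symbols used in the surrounding text (S, C, d, c, v_l, v_u), not a copy of the original display equations; it assumes that S is the m×n stoichiometric matrix and that the objective is maximized. The pairwise form of constraint (2) is not reproduced here.

```latex
\begin{align}
\max_{v}\; c^{T} v \quad &\text{subject to}\quad S v = 0,\quad v_l \le v \le v_u, \tag{1}\\
\max_{v}\; c^{T} v \quad &\text{subject to}\quad S v = 0,\quad C v \le d,\quad v_l \le v \le v_u. \tag{3}
\end{align}
```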
Given the inherent multiscale nature of integrated reconstructed networks, the constraint matrices of the FBA linear programs (1) and (3) often contain entries that vary over many orders of magnitude. We say that the problems are poorly scaled. Conducting FBA for such networks has been unsatisfactory because even state-of-the-art linear programming solvers can produce inaccurate (or erroneously infeasible) results. In particular, for the E. coli Metabolic-Expression model, applying CPLEX and Gurobi to (3) with default settings (scaling enabled) has produced results with large constraint violations.
In the context of the simplex method for linear programming, the constraints (including bounds) form a polytope in n-space. The condition of a basis matrix associated with a vertex of the polytope provides a quantitative measure of either the “sharpness” or the “flatness” of the vertex. Poorly scaled constraints tend to create a polytope with very sharp and/or very flat vertices. To alleviate numerical difficulties for problem (1), linear programming systems typically compute diagonal row and column scaling matrices D_r ∈ R^(m×m) and D_c ∈ R^(n×n) such that the nonzero entries of the scaled constraint matrix D_r S D_c are of order 1. Scaling can improve the condition of many bases, but possibly at the expense of making other bases (including the optimal basis) more ill-conditioned. For some problems, such as (3), the scaled solution may satisfy the scaled constraints accurately, yet the solution v recovered by unscaling may violate S v = 0 significantly. We refer to Elble and Sahinidis for a comprehensive study of scaling and its effects on the performance of the simplex method.
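The effect of unscaling can be illustrated with a small sketch. The example below assumes one pass of geometric-mean row and column scaling, which is only one of several schemes a solver might use, and a toy 2×3 matrix whose entries are invented for illustration. It shows that a small residual in the scaled constraints becomes much larger once the solution is mapped back to the original variables.

```python
import math

def scale_factors(S):
    """Return the diagonals of D_r and D_c from one pass of
    geometric-mean scaling (an assumed, simplified scheme)."""
    m, n = len(S), len(S[0])
    row_f = []
    for i in range(m):
        nz = [abs(a) for a in S[i] if a != 0.0]
        row_f.append(1.0 / math.sqrt(min(nz) * max(nz)))
    col_f = []
    for j in range(n):
        nz = [abs(S[i][j]) * row_f[i] for i in range(m) if S[i][j] != 0.0]
        col_f.append(1.0 / math.sqrt(min(nz) * max(nz)))
    return row_f, col_f

def matvec(S, v):
    """Dense matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in S]

# Toy constraint matrix with one large entry (illustrative values only).
S = [[1.0, -1.0, 0.0],
     [0.0, 8.0e5, -1.0]]
row_f, col_f = scale_factors(S)
S_scaled = [[row_f[i] * S[i][j] * col_f[j] for j in range(3)]
            for i in range(2)]

# v satisfies S v = 0; v_hat is the corresponding scaled solution.
v = [1.0, 1.0, 8.0e5]
v_hat = [v[j] / col_f[j] for j in range(3)]
v_hat[2] += 1.0e-6  # small error left by a hypothetical solver

scaled_res = matvec(S_scaled, v_hat)
unscaled_res = matvec(S, [col_f[j] * v_hat[j] for j in range(3)])
print("scaled residual:  ", scaled_res)
print("unscaled residual:", unscaled_res)
```

In this toy case the second row is scaled down by roughly the square root of its largest entry, so unscaling multiplies that row's residual by the same factor.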
Here we apply lifting techniques to poorly scaled constraints to make the vertices of the “lifted” polytope more regular. Note that small entries in S and C do not constitute poor scaling unless all entries in a row or column are small. (There are no such rows and columns in our test data, but in general they would be scaled up to have maximum entry 1.) Our explicit aim is to reduce the magnitude of the largest matrix entries so that the reformulated constraints do not need scaling.
Mass balance constraints
Our implementation of lifting techniques uses a parameter τ, set to 1024 in our experiments. Constraints containing entries larger than τ are reformulated.
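The idea of reformulating a constraint with an entry larger than τ can be sketched as follows. This is a minimal illustration, not the toolbox code: it assumes a row stored as a dictionary from variable name to coefficient, and it splits each large coefficient into factors no larger than τ by introducing auxiliary variables. The function name `lift_row`, the auxiliary names `s0`, `s1`, ..., and the example coupling constraint are hypothetical.

```python
import math

TAU = 1024.0  # threshold quoted in the text

def lift_row(row, tau=TAU, prefix="s"):
    """Lift one constraint row given as {variable: coefficient}.

    Returns (lifted_row, equalities). Each equality {aux: 1, var: -k}
    encodes aux = k * var. Auxiliary names are unique only within this
    call; a full implementation would need globally unique names.
    """
    lifted, equalities, n_aux = {}, [], 0
    for var, coef in row.items():
        while abs(coef) > tau:
            # Split off a factor k <= tau and push it into a new equality.
            k = min(tau, math.sqrt(abs(coef)))
            aux = f"{prefix}{n_aux}"
            n_aux += 1
            equalities.append({aux: 1.0, var: -k})
            var, coef = aux, coef / k
        lifted[var] = coef
    return lifted, equalities

# Hypothetical coupling constraint: v1 - 8e5 * v2 <= 0.
coupling = {"v1": 1.0, "v2": -8.0e5}
lifted_row, aux_eqs = lift_row(coupling)
print(lifted_row)  # row in terms of v1 and the auxiliary variable s0
print(aux_eqs)     # defining equality for s0
```

Eliminating the auxiliary variable from the lifted row recovers the original constraint, while every stored coefficient now has magnitude at most τ.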
After a simplex solver has returned an allegedly optimal basic solution, the accuracy of satisfying the general linear constraints (S v = 0 and C v ≤ d in (3)) could be improved by applying a single step of classical iterative refinement, especially if extended precision were available. However, the refined basic solution could well lie outside its bounds, and further simplex iterations would be necessary. Ideally this difficulty would be handled by the simplex solver itself.
We note that more elaborate forms of iterative refinement have been used to improve the accuracy of linear programming solutions. Gleixner et al. describe an incremental precision-boosting procedure that solves a sequence of linear programs, each attempting to correct the error in the previous optimal solution. The Zoom procedure of Saunders and Tenenblat is an analogous strategy for interior methods.
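A single step of classical iterative refinement can be sketched on a toy linear system. The example below is only an illustration under stated assumptions: a 2×2 matrix stands in for a basis matrix, the "solver" is emulated by Cramer's rule followed by rounding to six significant digits, and exact rational arithmetic from Python's `fractions` module stands in for extended precision when forming the residual. It does not handle bounds, which is the difficulty noted above.

```python
from fractions import Fraction

def solve_rounded(A, b, digits=6):
    """Solve a 2x2 system by Cramer's rule, then round the result to
    `digits` significant digits to emulate a limited-precision solver."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x0 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    x1 = (A[0][0] * b[1] - A[1][0] * b[0]) / det
    return [float(f"{x0:.{digits}g}"), float(f"{x1:.{digits}g}")]

def residual_exact(A, b, x):
    """Residual r = b - A x, accumulated exactly and rounded once."""
    r = []
    for i in range(2):
        Ax_i = sum(Fraction(A[i][j]) * Fraction(x[j]) for j in range(2))
        r.append(float(Fraction(b[i]) - Ax_i))
    return r

A = [[4.0, 1.0], [1.0, 3.0]]  # toy stand-in for a basis matrix
b = [1.0, 2.0]

x = solve_rounded(A, b)            # inaccurate initial solution
r = residual_exact(A, b, x)        # residual in higher precision
d = solve_rounded(A, r)            # correction from the same solver
x_refined = [x[0] + d[0], x[1] + d[1]]
r_refined = residual_exact(A, b, x_refined)

print("residual before:", r)
print("residual after: ", r_refined)
```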
Implementation in the openCOBRA toolbox
Lifting techniques for poorly scaled reactions and coupling constraints have been implemented in the openCOBRA toolbox 2.05, a MATLAB package for constraint-based reconstruction and analysis of biochemical networks. Algorithm 1 summarizes the main steps. Our implementation makes efficient use of auxiliary variables by reusing them if possible. Suppose metabolite A participates in more than one reaction with large stoichiometric coefficients. We can use the same auxiliary variable to decompose all reactions involving metabolite A, thereby keeping problem size to a minimum.
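The reuse of auxiliary variables can be sketched at the level of reactions. The code below is a simplified illustration and not the toolbox implementation, which is written in MATLAB. It assumes reactions stored as dictionaries of stoichiometric coefficients, no coefficient larger than τ², and a single reversible conversion reaction per metabolite. The reaction names, metabolite names, and the `_hat` and `convert_` naming are all hypothetical.

```python
import math

TAU = 1024.0  # threshold quoted in the text

def lift_reactions(reactions, tau=TAU):
    """Lift large stoichiometric coefficients, sharing one auxiliary
    pseudo-metabolite (and one conversion reaction) per metabolite.

    reactions: {reaction_name: {metabolite: coefficient}}
    Assumes every |coefficient| <= tau**2 so one level suffices.
    """
    # Largest offending coefficient for each metabolite.
    largest = {}
    for stoich in reactions.values():
        for met, a in stoich.items():
            if abs(a) > tau:
                largest[met] = max(largest.get(met, 0.0), abs(a))

    lifted = {}
    for name, stoich in reactions.items():
        new = {}
        for met, a in stoich.items():
            if abs(a) > tau:
                k = math.sqrt(largest[met])
                new[met + "_hat"] = a / k  # reuse the shared auxiliary
            else:
                new[met] = a
        lifted[name] = new

    # One conversion reaction per metabolite: k met -> met_hat.
    # It is assumed reversible so that production is also covered.
    for met, amax in largest.items():
        k = math.sqrt(amax)
        lifted["convert_" + met] = {met: -k, met + "_hat": 1.0}
    return lifted

reactions = {
    "R1": {"A": -1.0, "B": -10000.0, "C": 1.0, "D": 1.0},
    "R2": {"E": -1.0, "B": -40000.0, "F": 1.0},
}
lifted = lift_reactions(reactions)
for name, stoich in lifted.items():
    print(name, stoich)
```

Both reactions draw on the same pseudo-metabolite, so the lifted network gains one extra metabolite and one extra reaction rather than one pair per large coefficient.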
Results and discussion
We use our implementation of lifting techniques to conduct FBA on two Metabolic-Expression models of E. coli. The models (ME76664 and ME76589) represent the function of almost 2000 E. coli genes and involve 62212 metabolites, with 6087 coupling constraints C v ≤ d to enforce consistency between the predicted steady states of both metabolism and macromolecular synthesis. The first model (ME76664) accounts for 76664 reactions, and the second (ME76589) accounts for 76589 reactions. Because of the dependencies between pairs of metabolic reactions and macromolecular synthesis reactions, the resulting flux balanced steady state v has reaction rates that vary by four orders of magnitude. Both models have about 41,000 large matrix entries (exceeding τ = 1024), with 1825 entries exceeding 10^5 and the largest entry being 8×10^5.
Conducting FBA on ME76664 using the CPLEX and Gurobi simplex and barrier solvers with default settings (including scaling) resulted in erroneous reports of infeasibility or “optimal” solutions that were significantly infeasible. Our own simplex solver SQOPT with scaling activated would solve the scaled problem well, but unscaling would magnify the infeasibilities.
FBA results for ME76664 before and after lifting
FBA results for ME76589 before and after lifting
FVA results (simplex solvers) for ME76664 before and after lifting
FVA results (barrier solver) for ME76664 before and after lifting
We described techniques that enable off-the-shelf optimization software to be applied to multiscale network reconstructions, such as integrated networks that represent both metabolism and macromolecular synthesis. The techniques enable accurate FBA and FVA of an integrated model of metabolism and macromolecular synthesis in E. coli, previously impossible because of numerical difficulties encountered by solvers.
As in silico biologists create increasingly complex models that capture more of the multiscale nature of biological systems, the optimization problems that arise during the analysis of these models will also become increasingly poorly scaled. We are aware of researchers resorting to specialized packages, such as the exact rational solver of Cook et al., that rely upon rational arithmetic to obtain exact solutions to the FBA and FVA linear programs. Such solvers are likely to be prohibitively slow for analyzing larger, more comprehensive reconstructed networks. A more practical approach is to employ quadruple-precision arithmetic, which is increasingly available in Fortran and C compilers and is valuable even when implemented in software. In the meantime, our techniques enable the constraint-based modeling community to analyze increasingly sophisticated and comprehensive models of biological systems with improved efficiency and reliability. They could also be combined with the refinement approach of Gleixner et al.
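The benefit of carrying more digits can be seen in a very small sketch. Python has no built-in quadruple-precision type, so the example below uses the standard `decimal` module with 34 significant digits as a software stand-in (34 is roughly the number of decimal digits carried by IEEE binary128). The two magnitudes are illustrative and are not taken from the models discussed here.

```python
from decimal import Decimal, getcontext

# 34 significant digits: a software stand-in for quadruple precision.
getcontext().prec = 34

big, small = 1.0e16, 1.0  # illustrative magnitudes only

# Double precision: the small term is lost when added to the large one.
double_result = (big + small) - big

# Higher precision: both terms are represented, so the small one survives.
quad_like_result = (Decimal(big) + Decimal(small)) - Decimal(big)

print("double:   ", double_result)
print("34 digits:", quad_like_result)
```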
Availability and requirements
Lifting techniques for poorly scaled reactions and coupling constraints have been implemented in the openCOBRA toolbox 2.05, a MATLAB package for constraint-based reconstruction and analysis of biochemical networks.
Project name: openCOBRA toolbox
Project home page: http://opencobra.sourceforge.net/
Operating system: platform independent
Programming language: MATLAB
Other requirements: MATLAB 2008a or higher
License: GNU GPLv3
Any restrictions to use by non-academics: A separate license must be acquired.
We are grateful to three referees for their insightful comments and suggestions. This work was supported by the Department of Energy (Offices of Advanced Scientific Computing Research and Biological and Environmental Research) as part of the Scientific Discovery Through Advanced Computing program, grant DE-FG02-09ER25917, by the National Institute of General Medical Sciences of the National Institutes of Health, award number U01GM102098, and by the Office of Naval Research, grant N00014-11-1-0067. The content is solely the responsibility of the authors and does not necessarily represent the official views of DOE, NIH, or ONR.
- Orth JD, Thiele I, Palsson BØ: What is flux balance analysis? Nat Biotechnol. 2010, 28 (3): 245-248. 10.1038/nbt.1614.
- Thiele I, Fleming RMT, Que R, Bordbar A, Diep D, Palsson BØ: Multiscale modeling of metabolism and macromolecular synthesis in E. coli and its application to the evolution of codon usage. PLoS One. 2012, 7 (9): e45635. 10.1371/journal.pone.0045635.
- Thiele I, Fleming RMT, Bordbar A, Schellenberger J, Palsson BØ: Functional characterization of alternate optimal solutions of Escherichia coli’s transcriptional and translational machinery. Biophys J. 2010, 98 (10): 2072-2081. 10.1016/j.bpj.2010.01.060.
- CPLEX mathematical programming solver. [http://www-01.ibm.com/software/integration/optimization/cplex-optimizer/]
- Gurobi mathematical programming solver. [http://www.gurobi.com/]
- Elble J, Sahinidis N: Scaling linear optimization problems prior to application of the simplex method. Comput Optimization Appl. 2012, 52: 345-371. 10.1007/s10589-011-9420-4.
- Albersmeyer J, Diehl M: The lifted Newton method and its application in optimization. SIAM J Optim. 2010, 20 (3): 1655-1684. 10.1137/080724885.
- Gouveia J, Parrilo PA, Thomas R: Lifts of convex sets and cone factorizations. arXiv:1111.3164. 2011.
- Moler CB: Iterative refinement in floating point. J ACM. 1967, 14 (2): 316-321. 10.1145/321386.321394.
- Gleixner A, Steffy D, Wolter K: Improving the accuracy of linear programming solvers with iterative refinement. ZIB-Report 12-19, Zuse Institute Berlin. 2012.
- Saunders MA, Tenenblat L: The Zoom strategy for accelerating and warm-starting interior methods. Presented at INFORMS Annual Meeting, Pittsburgh, PA. Nov 5-8, 2006. [http://www.stanford.edu/group/SOL/talks/saunders-tenenblat-INFORMS2006.pdf]
- Schellenberger J, Que R, Fleming RMT, Thiele I, Orth JD, Feist AM, Zielinski DC, Bordbar A, Lewis NE, Rahmanian S, et al: Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nature Protoc. 2011, 6 (9): 1290-1307. 10.1038/nprot.2011.308. [http://github.com/opencobra]
- Gill PE, Murray W, Saunders MA: SNOPT: An SQP algorithm for large-scale constrained optimization. SIAM Review. 2005, 47: 99-131. 10.1137/S0036144504446096. [SIGEST article]
- Savinell JM, Palsson BØ: Network analysis of intermediary metabolism using linear optimization. I. Development of mathematical formalism. J Theor Biol. 1992, 154 (4): 421-454. 10.1016/S0022-5193(05)80161-4.
- Feist A, Henry C, Reed J, Krummenacker M, Joyce A, Karp P, Broadbelt L, Hatzimanikatis V, Palsson BØ: A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol Syst Biol. 2007, 3: 1-18.
- Thiele I, Heinken A, Fleming RMT: A systems biology approach to studying the role of microbes in human health. Curr Opin Biotechnol. 2012, 21 (1): 4-12.
- Cook W, Koch T, Daniel E, Wolter K: An exact rational mixed-integer programming solver. Proceedings of the 15th international conference on Integer Programming and Combinatorial Optimization, IPCO’11. 2011, Berlin, Heidelberg: Springer-Verlag, 104-116. [http://dl.acm.org/citation.cfm?id=2018167]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.