DOTcvpSB, a software toolbox for dynamic optimization in systems biology

Background: Mathematical optimization aims to make a system or design as effective or functional as possible, computing the quality of the different alternatives using a mathematical model. Most models in systems biology have a dynamic nature, usually described by sets of differential equations. Dynamic optimization addresses this class of systems, seeking the computation of the optimal time-varying conditions (control variables) to minimize or maximize a certain performance index. Dynamic optimization can solve many important problems in systems biology, including optimal control for obtaining a desired biological performance, the analysis of network designs and computer aided design of biological units.
Results: Here, we present a software toolbox, DOTcvpSB, which uses a rich ensemble of state-of-the-art numerical methods for solving continuous and mixed-integer dynamic optimization (MIDO) problems. The toolbox has been written in MATLAB and provides an easy and user-friendly environment, including a graphical user interface, while ensuring a good numerical performance. Problems are easily stated thanks to the compact input definition. The toolbox also offers the possibility of importing SBML models, making it a powerful optimization companion to modelling packages in systems biology. It can also handle generic black-box models.
Conclusion: Here we illustrate the capabilities and performance of DOTcvpSB by solving several challenging optimization problems related to bioreactor optimization, optimal drug infusion to a patient and the minimization of intracellular oscillations. The results illustrate how the suite of solvers available allows the efficient solution of a wide class of dynamic optimization problems, including challenging multimodal ones. The toolbox is freely available for academic use.


Background
Optimization plays a key role in computational biology and bioinformatics [1,2]. Dynamic optimization, also known as open-loop optimal control, seeks the maximization or minimization of a suitable performance index (which characterizes the solution quality) of a dynamic system, taking into account possible equality or inequality constraints. The solution is represented by the optimal decision variables, which can be continuous (real numbers), discrete (integer numbers), or both. Continuous variables can be used to encode time-varying stimuli, while discrete variables usually represent events (like an on/off switch) or configurations. An overview of optimization in the context of computational systems biology was given by [3] and more recently by [4], the latter highlighting the need for robust and efficient dynamic optimization methods. Examples of relevant problems covered there include optimal control for modification of self-organized dynamics, optimal experimental design, dynamic flux balance analysis, the discovery of biological network design strategies and computational design of integrated biological circuits (synthetic biology).
A popular numerical approach for solving dynamic optimization problems is the control vector parameterization (CVP) method [5], which transforms the original problem into an outer non-linear programming (NLP) or mixed-integer non-linear programming (MINLP) problem, with an inner initial value problem (IVP). Solving the outer problem requires a suitable (MI)NLP solver. Since most biological systems are non-linear, the resulting optimization problems are frequently multimodal and very challenging to solve, so it is necessary to use proper global optimization methods [6].
This work presents DOTcvpSB, a user-friendly MATLAB dynamic optimization toolbox based on the CVP method, which provides an easy-to-use environment while ensuring a good numerical performance. Users only need to define their dynamic optimization problems via a simple and compact input file which is close to the standard mathematical notation. Advanced users can tweak many configuration options for the different solvers in order to fine-tune the solution process. Although other existing toolboxes and software packages allow the definition and solution of optimization problems in systems biology (e.g. COPASI [7], PottersWheel [8] or SBtoolbox2 [9], to name a few), they are restricted to problems where the decision variables are static (time-independent). DOTcvpSB allows the definition and solution of dynamic optimization problems where decision variables are time-dependent, thus reaching a much broader class of optimization problems.

Implementation
In this section, we first describe the class of problems considered and the framework chosen for its numerical solution. Next, we describe the organization and capabilities of the toolbox, highlighting its key features and modules.

Mixed-integer Optimal Control Problem
The mixed-integer optimal control problem, also called the mixed-integer dynamic optimization (MIDO) problem, considers the computation of time-dependent operating conditions (controls), discrete (binary or integer) decisions and time-independent parameters so as to minimize (or maximize) a performance index (or cost function) while satisfying a set of constraints coming from safety and/or quality demands and environmental regulations. Mathematically this is formulated as follows: find $u(t)$, $i(t)$, $p$ and $t_f$ so as to minimize (or maximize):

$$J = G_{t_f}(x, u, i, p, t_f) + \int_{t_0}^{t_f} F(x, u, i, p, t)\,dt \tag{1}$$

subject to:

$$f(\dot{x}, x, u, i, p, t) = 0, \qquad x(t_0) = x_0 \tag{2}$$

$$g_j(x, u, i, p, t) = 0, \qquad j = 1, \ldots, m_e \tag{3}$$

$$g_j(x, u, i, p, t) \le 0, \qquad j = m_e + 1, \ldots, m_e + m_i \tag{4}$$

$$u_L \le u \le u_U, \qquad i_L \le i \le i_U, \qquad p_L \le p \le p_U \tag{5}$$

where $x(t) \in \mathbb{R}^{n_x}$ is the vector of state variables, $u(t) \in \mathbb{R}^{n_u}$ is the vector of real valued control variables, $i(t) \in \mathbb{Z}^{n_i}$ is the vector of integer control variables, $p \in \mathbb{R}^{n_p}$ is the vector of time-independent parameters, $t_f$ is the final time of the process, $m_e$ and $m_i$ represent the number of equality and inequality constraints, respectively, $g$ collects all state constraints (pathway, pointwise and final time constraints), and $u_L$, $i_L$, $p_L$, $u_U$, $i_U$, $p_U$ correspond to the lower and upper bounds for the control variables and the time-independent parameters.

Control Vector Parameterization
DOTcvpSB is based on the control vector parameterization (CVP) framework to solve the class of problems stated above. The CVP methodology proceeds by dividing the time horizon of the control variables ($u(t)$ and $i(t)$) into a number of elements and then approximating the controls on each element by means of basis functions, usually low order polynomials. In this way the control variables are parameterized using $w_u \in \mathbb{R}^{\rho}$ and $w_i \in \mathbb{Z}^{\rho}$, which become decision variables. This parameterization transforms the original infinite dimensional problem into a finite dimensional (mixed-integer) non-linear programming problem that may be solved by a suitable (MI)NLP solver. Note that the evaluation of the objective function and constraints requires the solution of the system dynamics by solving an inner initial value problem (IVP).
If the outer (MI)NLP problem is convex, deterministic (gradient-based) local methods are the best alternative to solve it efficiently. In this regard, (mixed-integer) sequential quadratic programming methods, such as MISQP [10], can be considered the state-of-the-art. Nevertheless, in the presence of non-convexities, local methods usually converge to local minima, thus requiring the use of global optimization methods.
Global optimization methods can be roughly classified in two major groups: deterministic and stochastic methods. Certain deterministic global methods can guarantee global optimality for particular classes of problems, although the computational cost becomes prohibitive for problems of realistic size. They have been recently applied for the solution of MIDO problems [11,12]. Regarding stochastic methods, several works, as reviewed by [6], have illustrated their potential for dynamic optimization (DO) and, more recently, for mixed-integer dynamic optimization (MIDO) [13]. Stochastic methods usually locate the vicinity of global solutions with reasonable efficiency, but the price to pay is that global optimality cannot be guaranteed. Alternatives such as global-local hybrid methods have been presented both for DO [14] and MIDO [15], significantly improving the computational efficiency. Thus, we could summarize the current state-of-the-art in this domain by concluding that there is no silver bullet for global optimization of arbitrary MIDO problems. This is why DOTcvpSB includes a suite of optimization solvers, following a "Swiss Army knife" approach.
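The inner/outer structure described above can be sketched in a few lines. The following is a minimal illustration, not the toolbox's implementation: it assumes a hypothetical scalar system dx/dt = -x + u with running cost x^2 + u^2, a piecewise-constant control with ρ equal elements, and a hand-written Runge-Kutta integrator in place of a production IVP solver. All function names are illustrative.

```python
def rk4_step(f, y, u, h):
    """Advance the state list y by one classical Runge-Kutta step of size h."""
    k1 = f(y, u)
    k2 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k1)], u)
    k3 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k2)], u)
    k4 = f([yi + h * ki for yi, ki in zip(y, k3)], u)
    return [yi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def toy_dynamics(y, u):
    """Hypothetical system dx/dt = -x + u, augmented with the running cost."""
    x, _cost = y
    return [-x + u, x * x + u * u]


def objective(w, x0=1.0, t_f=1.0, steps_per_element=50):
    """Inner IVP: integrate over rho = len(w) equal elements, control held at w[k]."""
    y = [x0, 0.0]
    h = t_f / (len(w) * steps_per_element)
    for u in w:                              # one control element at a time
        for _ in range(steps_per_element):
            y = rk4_step(toy_dynamics, y, u, h)
    return y[1]                              # accumulated cost at t_f


# An outer NLP solver would call objective(w) repeatedly while adjusting w.
print(objective([0.0, 0.0, 0.0, 0.0]))
print(objective([1.0, 1.0, 1.0, 1.0]))
```

Integrating element by element keeps the control constant within every integration step, so the integrator never steps across a control discontinuity.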
Many of these optimization methods require the computation of the gradient of the objective and/or constraints with respect to the decision variables. Vassiliadis [5] proposed the use of first order parametric sensitivities to compute such information. The sensitivity equations result from a chain rule differentiation applied to the system defined in Eq. 2 with respect to the decision variables and may be solved in combination with the original system. For this purpose, the use of Backward Differentiation Formula (BDF) methods is very attractive since they are able to exploit the fact that the original system and the sensitivities share the same Jacobian.
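As a small worked illustration of first order sensitivities, consider a hypothetical scalar system dx/dt = f(x, w) = -x^2 + w with a single decision variable w. Differentiating with respect to w gives the sensitivity equation ds/dt = (∂f/∂x) s + ∂f/∂w = -2x s + 1 with s(0) = 0, where s = ∂x/∂w. The sketch below integrates both equations together and compares the result with a central finite difference; the system, the integrator and all names are assumptions made for illustration, not part of the toolbox.

```python
def rk4_step(f, y, w, h):
    """One classical Runge-Kutta step for the state list y."""
    k1 = f(y, w)
    k2 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k1)], w)
    k3 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k2)], w)
    k4 = f([yi + h * ki for yi, ki in zip(y, k3)], w)
    return [yi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def augmented_rhs(y, w):
    """Original ODE plus its forward sensitivity with respect to w."""
    x, s = y
    dfdx = -2.0 * x          # Jacobian of f with respect to the state
    dfdw = 1.0               # partial derivative of f with respect to w
    return [-x * x + w, dfdx * s + dfdw]


def integrate(w, x0=1.0, t_f=2.0, n=400):
    """Return [x(t_f), s(t_f)]; s(0) = 0 because x0 does not depend on w."""
    y = [x0, 0.0]
    h = t_f / n
    for _ in range(n):
        y = rk4_step(augmented_rhs, y, w, h)
    return y


w = 0.5
x_f, s_f = integrate(w)
eps = 1e-6
fd = (integrate(w + eps)[0] - integrate(w - eps)[0]) / (2.0 * eps)
print(s_f, fd)   # the two gradient estimates should agree closely
```

The sensitivity approach needs a single integration of the augmented system per gradient, whereas finite differences need two extra integrations per decision variable and are limited by the choice of perturbation size.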

Toolbox description
DOTcvpSB has been implemented in MATLAB (http://www.mathworks.com) following the scheme presented in Figure 1. The original dynamic optimization or mixed-integer dynamic optimization problem is solved numerically by the use of a suitable optimizer (outer iteration), which requires the solution of an IVP (inner iteration) that will in general consist of a set of ODEs plus a set of sensitivities to compute gradient information. The solution of the inner IVP is accomplished by calls to tailored solvers from the SUite of Nonlinear and DIfferential/ALgebraic equation Solvers (SUNDIALS) [16], more specifically CVODES. Since these simulations are the most computationally demanding task in the CVP method, our toolbox can automatically create compiled dynamically linked subroutines (known as MEX files in MATLAB) for the ODEs, Jacobian, and sensitivities, thus ensuring high performance.
Figure 1. General scheme for the solution of a DO problem in DOTcvpSB. Solution scheme based on CVP for mixed-integer dynamic optimization problems.

Key Features
The core capabilities of the toolbox can be summarized as follows:
• handling of a wide class of dynamic optimization problems, including constrained, unconstrained, fixed, and free terminal time problems described by ordinary differential equations (ODEs), as well as continuous and mixed-integer decision variables;
• the inner initial value problem (IVP) is solved using the state-of-the-art methods available in SUNDIALS [16];
• the outer (MI)NLP problem can be solved using a number of advanced solvers, including local deterministic methods, stochastic global optimization methods, and hybrid metaheuristics;
• in addition to the traditional single optimization approach, the toolbox also offers more sophisticated strategies, like multistart, successive re-optimization [17], and hybrid strategies [14];
• a graphical user interface (GUI) which makes the definition and editing of a problem easier and clearer;
• possibility of importing SBML models [18];
• many output options for the results, including detailed figures.

Description of main modules
The toolbox contains a number of modules (implemented as MATLAB functions) which can be grouped in two categories:
• utility modules: graphical user interface (GUI), simulation, and SBML-import modules;
• optimization modules: offering several optimization strategies, namely single optimization, multi-start, successive re-optimization, and hybrid optimization modules.

Utility modules
The utility modules offer several facilities for the definition, checking, and handling of problems. The toolbox can be operated through two equivalent approaches: by the use of the GUI, or directly from the command line (from where scripts with problem definitions can be created and executed). It also offers a module to import dynamic models from SBML files, and the imported models can be checked by a simulation module.
• Graphical User Interface (GUI) module: this module was developed in order to help users in the definition and execution of problems. With the help of this module, which follows an intuitive wizard-like approach, problem definitions and modifications are guided in an easy and convenient stepwise manner, especially suited to novice users.
• Simulation module: this module carries out the dynamic simulation of the user-defined dynamics (plus assigned initial conditions and controls), generating the corresponding state trajectories. This module is especially useful for checking the model correctness during the definition phase, which is particularly error-prone. Typical errors like those related to unit inconsistencies can be readily identified with this procedure.
• SBML to DOTcvpSB module: this module allows DOTcvpSB to import the system dynamics from SBML (Systems Biology Markup Language) models [18,19]. Once a dynamic model is imported, it is necessary to check the model correctness by simulation (previous module). If everything works correctly, the user can proceed with the definition of the other terms of the dynamic optimization problem (performance index, constraints) and, finally, with its numerical solution.

Optimization modules
The optimization modules offer a suite of four different optimization strategies, each one with different options for the optimization solvers, following the "Swiss Army knife" approach mentioned previously. All these modules are described in more detail below.
• Single optimization module: this module makes a single call to one of the optimization solvers, which can be either a local deterministic or global stochastic method (see available solvers below). This procedure can be sufficient for well conditioned, convex problems, or non-convex problems which are cheap to evaluate. In any case, it is recommended as the first strategy to try with any new problem.
• Multi-start optimization module: this module runs a selected optimization solver (typically a local one) repeatedly. The set of solutions (performance index values) obtained can then be analyzed (e.g. plotting a histogram) in order to check the multimodality of the problem.
• Successive re-optimization module: successive re-optimization can be used to speed up the convergence for problems where a high discretization level is desired (e.g. those where the control profiles behave wildly). This procedure runs several successive single optimizations, automatically refining the control discretization and tightening the NLP and IVP tolerances after each run.
• Hybrid optimization module: hybrid optimization is characterized by the combination of a stochastic global method plus a deterministic local method. This procedure provides a compromise between the robustness of global methods and the efficiency of local ones. This module is especially suited to difficult multimodal problems. In any case, tweaking the hybrid method requires a deep knowledge of the solvers, and this approach will almost always be more costly (in CPU time) than the single optimization procedures using local methods (the price to pay for the increased robustness).
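The idea behind the multi-start strategy can be illustrated on a toy problem. The sketch below is only an analogy under stated assumptions: it uses a hypothetical one-dimensional multimodal function in place of a real performance index, and a fixed-step gradient descent as a stand-in for a local deterministic solver such as those listed later. Counting how many distinct local solutions the repeated runs reach gives the kind of histogram described above.

```python
import random
from collections import Counter


def f(x):
    """Hypothetical multimodal objective with two local minima on [-2, 2]."""
    return x**4 - 3.0 * x**2 + x


def grad(x):
    """Analytical derivative of f."""
    return 4.0 * x**3 - 6.0 * x + 1.0


def local_search(x, step=0.01, iters=2000):
    """Fixed-step gradient descent, standing in for a local NLP solver."""
    for _ in range(iters):
        x -= step * grad(x)
    return x


rng = random.Random(0)                        # fixed seed for reproducibility
starts = [rng.uniform(-2.0, 2.0) for _ in range(20)]
minima = [local_search(x0) for x0 in starts]

histogram = Counter(round(x, 3) for x in minima)
best = min(minima, key=f)
print(histogram)                              # more than one key suggests multimodality
print(best, f(best))
```

If every run ends at the same point, a single local optimization is probably enough; several clusters of solutions are a sign that a global or hybrid strategy is worth the extra cost.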

Numerical optimization methods (NLP and MINLP solvers)
The toolbox provides interfaces to several state-of-the-art optimization solvers:
• local deterministic:
1. IPOPT [20] implements a primal-dual interior point method, and uses line searches based on filter methods;
2. FMINCON [21] is a part of the MATLAB Optimization Toolbox which uses sequential quadratic programming (SQP);
3. MISQP [10] solves mixed-integer non-linear programming problems by a modified sequential quadratic programming method;
• stochastic global:
1. DE [22] uses a population-based approach for minimizing the performance index;
2. SRES [23] uses an evolution strategy combined with an approach to balance objective and penalty functions;
• hybrid metaheuristics:
1. ACOmi [15] is inspired by the foraging behavior of ants, using MISQP for local searches;
2. MITS [13] is based on extensions of the Tabu Search metaheuristic, using MISQP for local searches.
The deterministic MISQP solver and all hybrid solvers are able to handle mixed-integer problems directly. Users can change solvers by simply changing an option in the input data structure, thus requiring no problem reformulation.

Numerical simulation method (IVP solvers)
Forward integration of the ODE, Jacobian, and sensitivities (when needed) is carried out by CVODES, a part of SUNDIALS [16], which is also able to perform simultaneous or staggered sensitivity analysis. The IVP can be solved with the Newton or functional iteration module and with the Adams or BDF linear multistep method (LMM). The Adams method is recommended for non-stiff problems, while BDF is recommended for stiff problems. Note that the sensitivity equations are provided analytically and that the error control strategy for the sensitivity variables can be enabled.
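Why the choice of method matters for stiff problems can be seen on the linear test equation dx/dt = -λx. The sketch below is a simplified illustration and does not reproduce what CVODES does internally: it compares forward Euler (used here as a stand-in for a method without strong stability properties) with backward Euler, which coincides with the first-order BDF formula. The values of λ and the step size are arbitrary assumptions chosen so that the step is large relative to 1/λ.

```python
def forward_euler(lam, x0, h, n):
    """Explicit update x_new = x + h * (-lam * x)."""
    x = x0
    for _ in range(n):
        x = x + h * (-lam * x)
    return x


def backward_euler(lam, x0, h, n):
    """Implicit update: solve x_new = x + h * (-lam * x_new) for x_new."""
    x = x0
    for _ in range(n):
        x = x / (1.0 + h * lam)
    return x


lam, h, n = 1000.0, 0.01, 20      # stiff decay rate, step much larger than 1/lam
print(forward_euler(lam, 1.0, h, n))    # amplification factor 1 - h*lam = -9 per step
print(backward_euler(lam, 1.0, h, n))   # amplification factor 1/(1 + h*lam) = 1/11 per step
```

The exact solution decays to zero almost immediately. The implicit scheme reproduces that behavior at this step size, while the explicit one would need a step smaller than 2/λ to remain stable.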

Recommended operating procedure
It should be noted that, for a general MIDO formulation, there is no a priori way to determine whether the resulting MINLP will be convex inside the search space considered, so the user cannot know in advance which optimization strategy to use. Thus, we recommend that, for any new problem, the user follows this protocol:
• Step 1: try solving the problem with the single optimization strategy and a local deterministic method, such as FMINCON or IPOPT for DO problems, or MISQP for MIDO problems, using a rather crude control discretization (e.g. 10 elements). After obtaining a solution, repeat with a different initial guess for the control variable. If a rather different solution is obtained, suspect multimodality and go to step 2 below. If not, solve the problem again using a finer discretization. For faster and more satisfactory results regarding control discretization, use the successive re-optimization module.
• Step 2: solve the problem using the multi-start optimization module. In general 100 runs is a sensible number for this task, but for costly problems the user might want to reduce this. Plotting a histogram of the resulting set of solutions will give a good view of the problem multimodality. For clearly multimodal problems, go to step 3. If not, stop, or go back to step 1 if e.g. more refined control levels are desired.

Results and discussion
This section illustrates the usage and performance of the different modules of DOTcvpSB considering several illustrative examples.

Importing and checking a SBML dynamic model
For illustrative purposes, a dynamic model of the cell cycle [24] was chosen and imported into the DOTcvpSB toolbox. The model is identified as BIOMD0000000005, Tyson1991_CellCycle_6var, 1831270, and can be downloaded as an '.xml' file from the BioModels database web page: http://www.ebi.ac.uk/biomodels/.
After importing it using function dotcvp_sbml2dotcvpsb, the user should perform a dynamic simulation using the simulation module to check the model. Figure 2 shows all state trajectories of the cdc2-cyclin model simulated with the constant parameters supplied in the above version.

Single optimization
Here we solve a relatively simple problem to illustrate the usage of the single optimization strategy with a local deterministic solver.
Figure: Importing a dynamic model from SBML to DOTcvpSB.

Successive re-optimization
Here we show how to use the successive re-optimization module in order to obtain refined optimal control profiles.
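The mechanics of refinement between successive runs can be sketched as follows. This is an assumption about one plausible way to do it, not a description of the toolbox internals: the piecewise-constant solution from a coarse run is copied onto a finer grid to serve as the initial guess of the next run, while the discretization level grows and the tolerance shrinks by fixed factors. The function names, the doubling factor and the tolerance shrink factor are all hypothetical.

```python
def refine(w, factor=2):
    """Copy each coarse control level onto `factor` finer elements (warm start)."""
    return [value for value in w for _ in range(factor)]


def schedule(rho0, tol0, runs, factor=2, tol_shrink=0.1):
    """Plan (number of elements, tolerance) for each successive run."""
    plan = []
    rho, tol = rho0, tol0
    for _ in range(runs):
        plan.append((rho, tol))
        rho *= factor          # finer control discretization
        tol *= tol_shrink      # tighter NLP and IVP tolerances
    return plan


coarse_solution = [1.0, 3.0]           # hypothetical result of a crude first run
print(refine(coarse_solution))         # initial guess for the next, finer run
for rho, tol in schedule(rho0=5, tol0=1e-3, runs=3):
    print(rho, tol)
```

Because the refined profile represents exactly the same control trajectory as the coarse one, each new run starts from a point that is already feasible and close to optimal at the previous resolution.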

Lee-Ramirez bioreactor
This example considers the optimal control of a fed-batch bioreactor for induced foreign protein production by recombinant bacteria. This problem was first presented by Lee et al [25], slightly modified by Tholudur et al [26], and later solved using a second order sensitivities approach [27]. The objective is to maximize the profitability of the process using the nutrient (u 1 ) and the inducer feeding rates (u 2 ) as control variables. Several different values for the ratio of the cost of inducer to the value of the protein production (Q) were published in the literature, but here we consider the particular case of Q = 2.5. Mathematically, the statement is to find the control trajectories that maximize the performance index at the fixed final time subject to where the state variables represent the reactor volume (x 1 ), the cell density (x 2 ), the nutrient concentration (x 3 ), the foreign protein concentration (x 4 ), the inducer concentration (x 5 ), the inducer shock factor on cell growth rate (x 6 ), and the inducer recovery factor on cell growth rate (x 7 ). The final time is specified as 10 h. The additional

Hybrid optimization
Here we solve a multimodal problem using the powerful hybrid strategy, where the adequate combination of a stochastic global and a deterministic local solver allows reaching the vicinity of the global solution in a reasonable computation time.
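The two-phase logic of a hybrid strategy can be shown on a toy problem. The sketch below makes strong simplifying assumptions: a hypothetical one-dimensional multimodal function replaces the performance index, plain random sampling stands in for a stochastic global solver such as DE, and fixed-step gradient descent stands in for a local solver such as MISQP. It is meant only to show the hand-over from the global phase to the local one.

```python
import random


def f(x):
    """Hypothetical multimodal objective with two local minima on [-2, 2]."""
    return x**4 - 3.0 * x**2 + x


def grad(x):
    """Analytical derivative of f."""
    return 4.0 * x**3 - 6.0 * x + 1.0


def global_phase(rng, n_samples=200, lo=-2.0, hi=2.0):
    """Crude stochastic search: keep the best of n_samples random points."""
    samples = [rng.uniform(lo, hi) for _ in range(n_samples)]
    return min(samples, key=f)


def local_phase(x, step=0.01, iters=2000):
    """Deterministic refinement by fixed-step gradient descent."""
    for _ in range(iters):
        x -= step * grad(x)
    return x


rng = random.Random(1)                 # fixed seed for reproducibility
x_rough = global_phase(rng)            # lands in the basin of the best minimum
x_best = local_phase(x_rough)          # converges quickly inside that basin
print(x_rough, x_best, f(x_best))
```

The global phase only needs to be accurate enough to pick the right basin, which is why it can be stopped early; the local phase then supplies the precision that a stochastic method would reach only slowly.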

Drug displacement problem with path constraint
Here we consider a modified formulation of the drug displacement problem (defined above) adding a state path constraint, which is set to ensure that the warfarin concentration in the patient's blood does not exceed a dangerous level. The constraint is defined as follows
This problem has been reported to be highly multimodal, so its solution should be approached with a suitable global method. Moreover, a combination of a global and a local method (hybrid approach) should be more efficient. To illustrate this, we solved this problem using (i) the global DE solver (in single optimization mode) and (ii) a hybrid combining the DE and MISQP solvers. Using ρ = 10 free time intervals, both approaches converged to a similar solution, with a performance index (infusion final time) of 266.09. In addition, the inequality and all equality constraints were violated by less than the pre-set tolerance of $10^{-8}$. But the hybrid approach was
Figure: Optimal trajectories for the drug displacement problem.
It should be mentioned that these results are again in very close agreement with those presented in the above cited literature. The optimal trajectories are shown in Figure 6.

Multistart and single optimization with a global method
The multistart strategy is a good way of checking the possible non-convexity of problems. When the multimodality of a problem has been confirmed, users can choose a global or a hybrid strategy to find a solution in the close vicinity of the global one. We illustrate all this here considering a challenging MIDO problem.

Phase resetting of a calcium oscillator problem: a mixed-integer dynamic optimization problem
We have considered a calcium oscillator model describing intracellular calcium spiking in hepatocytes induced by an extracellular increase in adenosine triphosphate (ATP) concentration, as originally proposed in [28] and later slightly modified and solved in [29,30]. The aim of the optimization is to minimize the intracellular oscillatory behavior with the help of two binary control variables ($i_1$, $i_2$). The values of these variables and the switching times from one mode to another, together with the time-independent parameter ($p_1$), are the decision variables. The performance index is formulated as the minimization of the state variables deviations with respect to
Figure: Optimal control of the Lee-Ramirez bioreactor.
Figure 6. Optimal trajectories for the drug displacement problem with path constraint. Optimal state trajectories (A) and control profile (B) for the drug displacement problem with path constraint on the warfarin concentration. Solution found with the hybrid global-local strategy.
These authors reported that the system is extremely sensitive to small perturbations in the stimulus.
We first solved this problem using the multistart module of the DOTcvpSB toolbox, using MISQP as local solver.
The control discretization level was set to a value of ρ = 5 with free transition times and two binary decision variables for the controls. The multistart number of runs was set to 100, with randomly generated initial values for all the decision variables in each run. The solutions found were spread over a wide range, a clear sign of multimodality. The histogram of these solutions is shown in Figure 7, where performance index values worse than 2500.00 are not shown. The best value (for the reduced cost term above) obtained by the multistart was 1641.03, which is still far from the published solution reported above.
In a second step, we solved this problem using the MITS hybrid strategy, while keeping all the other settings as stated above. The best solution found by MITS was 1542.50, which is very close to the value reported in [29]. The corresponding optimal trajectories are shown in Figure 8 where it can be seen how the optimal control policies rapidly cancel the oscillations.

Conclusion
Here we have presented DOTcvpSB, a MATLAB toolbox for solving dynamic optimization problems from the domain of systems biology. This toolbox is able to handle very general mixed-integer dynamic optimization formulations, thus providing the opportunity to state and solve complex problems, such as optimal control for obtaining a desired biological performance, dynamic analysis of network designs or computer aided design of biological units. Problems are easily defined via a compact input structure, or optionally using a graphical user interface.
This toolbox has been developed placing particular care in providing state-of-the-art solvers in order to ensure a good compromise between computational robustness and efficiency. DOTcvpSB offers two key and unique advantages:
• It incorporates a suite of local and global optimization solvers so as to handle a wide range of problems, including non-convex (multimodal) ones.
• It offers several optimization strategies, including single, multistart, successive re-optimization and hybrid methods. These strategies can be effectively used to enhance the solution of difficult multimodal problems.
The capabilities and performance of DOTcvpSB were successfully tested using several challenging benchmark problems taken from the open literature. The results confirmed that the toolbox was able to get excellent results in reasonable computation times, showing a good compromise between robustness and efficiency.
On the other hand, the Symbolic Math Toolbox is needed if automatic generation of sensitivities and Jacobian is desired (recommended). Users must install the SBML and libSBML toolboxes in order to be able to import SBML models.
License: The toolbox can be obtained and used for free for academic purposes, and is under the Creative Commons license. The conditions of the license can be found at: http://creativecommons.org/licenses/by-nc-nd/3.0/
Any restrictions to use by non-academics: Following the previous license.
Figure 8. Optimal trajectories for the calcium oscillator problem. Optimal state trajectories (A, B, C, D) (blue lines), desired states (dotted red lines), and control profiles (E, F) (green lines) for the calcium oscillator problem with two control variables.