MEIGO: an open-source software suite based on metaheuristics for global optimization in systems biology and bioinformatics

Egea, Jose A; Henriques, David; Cokelaer, Thomas; Villaverde, Alejandro F; MacNamara, Aidan; Danciu, Diana-Patricia; Banga, Julio R; Saez-Rodriguez, Julio

doi:10.1186/1471-2105-15-136

Software
Open access
Published: 10 May 2014

MEIGO: an open-source software suite based on metaheuristics for global optimization in systems biology and bioinformatics

Jose A Egea¹,
David Henriques^2,3,
Thomas Cokelaer³,
Alejandro F Villaverde²,
Aidan MacNamara³,
Diana-Patricia Danciu³,
Julio R Banga² &
…
Julio Saez-Rodriguez³

BMC Bioinformatics volume 15, Article number: 136 (2014) Cite this article

8364 Accesses
118 Citations
15 Altmetric
Metrics details

Abstract

Background

Optimization is the key to solving many problems in computational biology. Global optimization methods, which provide a robust methodology, and metaheuristics in particular have proven to be the most efficient methods for many applications. Despite their utility, there is a limited availability of metaheuristic tools.

Results

We present MEIGO, an R and Matlab optimization toolbox (also available in Python via a wrapper of the R version), that implements metaheuristics capable of solving diverse problems arising in systems biology and bioinformatics. The toolbox includes the enhanced scatter search method (eSS) for continuous nonlinear programming (cNLP) and mixed-integer programming (MINLP) problems, and variable neighborhood search (VNS) for Integer Programming (IP) problems. Additionally, the R version includes BayesFit for parameter estimation by Bayesian inference. The eSS and VNS methods can be run on a single-thread or in parallel using a cooperative strategy. The code is supplied under GPLv3 and is available at http://www.iim.csic.es/~gingproc/meigo.html. Documentation and examples are included. The R package has been submitted to BioConductor. We evaluate MEIGO against optimization benchmarks, and illustrate its applicability to a series of case studies in bioinformatics and systems biology where it outperforms other state-of-the-art methods.

Conclusions

MEIGO provides a free, open-source platform for optimization that can be applied to multiple domains of systems biology and bioinformatics. It includes efficient state of the art metaheuristics, and its open and modular structure allows the addition of further methods.

Background

Mathematical optimization plays a key role in systematic decision making processes, and is used virtually in all areas of science and technology where problems can be stated as finding the best among a set of feasible solutions. In bioinformatics and systems biology, there has been a plethora of successful applications of optimization during the last two decades (see reviews in [1–5]). Many problems in computational biology can be formulated as IP problems, such as sequence alignment, genome rearrangement and protein structure prediction problems [1, 3], or the design of synthetic biological networks [6]. Deterministic and stochastic/heuristic methods have been extensively applied to optimization problems in the area of machine learning [2]. In addition to combinatorial optimization, other important classes of optimization problems that have been extensively considered, especially in systems biology, are cNLP and mixed-integer dynamic optimization. Such problems arise in parameter estimation and optimal experimental design [5, 7].

A number of authors have stressed the need to use suitable global optimization methods due to the non-convex (multimodal) nature of many of these problems [4, 8, 9]. Roughly speaking, global optimization methods can be classified into exact and stochastic approaches. Exact methods can guarantee convergence to global optimality, but the associated computational effort is usually prohibitive for realistic applications. In contrast, stochastic methods are often able to locate the vicinity of the global solution in reasonable computation times, but without guarantees of convergence. Metaheuristics (i.e. guided heuristics) are a particular class of stochastic methods that have been shown to perform very well in a broad range of applications [5].

Motivated by this, we developed the software suite MEIGO (MEtaheuristics for systems biology and bIoinformatics Global Optimization) which provides state of the art metaheuristics (eSS and VNS) in open-source R (here with the addition of the Bayesian inference method BayesFit) and Matlab versions (it is also available in Python via a wrapper for the R version). MEIGO covers the most important classes of problems, namely (i) problems with real-valued (cNLPs) and mixed-integer decision variables (MINLPs), and (ii) problems with integer and binary decision variables (IPs). Furthermore, MEIGO allows the user to apply parallel computation using cooperative strategies [10]. MEIGO can optimize arbitrary objective functions that are handled as black-boxes. Thus, it is applicable to optimize complex systems that may involve solving inner problems (e.g. simulations or even other optimization problems) to obtain explicit values for the objective function and/or the possible constraints. For example, CellNOpt [11], SBToolbox [12], AMIGO [13] and Potterswheel [14] use eSS for dynamic model calibration. Some recent successful applications of eSS in the field of systems biology can be found in [15–26]. It has also been shown that eSS outperformed the various optimization methods available in the Systems Biology Toolbox [27].

Methods

Enhanced Scatter Search (eSS)

Scatter search [28] is a population-based metaheuristic which can be classified as an evolutionary optimization method. In contrast with other popular population-based metaheuristics like, for example, genetic algorithms, the population size, N, in scatter search is small, and the combinations among its members are performed systematically, rather than randomly. The current population is commonly named the “Reference Set” (RefSet). The improvement method, which consists of a local search to increase the convergence to optimal solutions, can be applied with more or less frequency to the members of this RefSet. A set of improvements has been implemented in the enhanced scatter search method. Among the most remarkable changes, we can mention the replacement method. Unlike in the original scatter search scheme, which uses a μ+λ replacement (i.e. the new population or RefSet will consist in the best N solutions selected from the previous RefSet members and the new offspring solutions), the enhanced scatter search uses a 1+1 replacement, similar to the strategy used in a very efficient evolutionary method, Differential Evolution [29]. This means that a RefSet member can only be replaced by a solution that has been generated combining by the former and another RefSet member. In other words, an offspring solution can only replace the RefSet member that generated it, and not any other. This strategy enhances diversity and prevents the search from premature stagnation by not allowing too similar solutions to be present in the RefSet at the same time. The “go-beyond” strategy to exploit combinations which explore promising directions has also been implemented. This strategy analyzes the search directions defined by a RefSet member and their offspring. If an offspring solution outperforms its corresponding RefSet member (i.e. the RefSet member that generates it), then the method considers that the explored direction is promising and a new solution is generated within such direction, exploring an area beyond the segment defined by the RefSet member and its offspring solution. The process is repeated until the new generated solutions do not outperform the previous ones and it favours intensification in the current iteration. Additionally, the use of memory is also exploited to select the most efficient initial points to perform local searches, to avoid premature convergence and to perturb solution vectors which are stuck in stationary points. More details about the enhanced scatter search scheme can be found in [30].

Variable Neighbourhood Search (VNS)

Variable Neighbourhood Search is a trajectory-based metaheuristic for global optimization. It was introduced by Mladenović and Hansen [31] and has gained popularity in recent years in the field of global optimization. VNS performs a local search by evaluating the objective function around an incumbent solution and repeats the procedure visiting different neighbourhoods to locate different local optima, among which the global optimum is expected to be found. One of the key points of the algorithm is the strategy followed to change the current neighbourhood. VNS usually seeks a new neighbourhood by perturbing a set of decision variables using a distance criterion. Once a new solution has been created in the new neighbourhood, a new local search is performed. The typical scheme consists of visiting neighbourhoods close to the current one (i.e. perturbing a small set of solutions), until no further improvement is achieved. Then, more distant neighbourhoods are explored. Apart from this basic scheme, we have implemented advanced strategies to avoid cycles in the search (e.g. not repeating the perturbed decision variables in consecutive neighbourhood searches) in order to increase the efficiency when dealing with large-scale problems (e.g. by allowing a maximum number of perturbed decision variables, like in the Variable Neighbourhood Decomposition Search strategy [32]). We have also modified the search aggressiveness to locate high quality solutions (even if they are not the global optimum) in short computational times if required. Other heuristics, like the “go-beyond” strategy (explained above), that is used to exploit promising directions during the local search, have been adapted from other metaheuristics for continuous optimization [30].

BayesFit

BayesFit is a Bayesian inference method for parameter estimation that uses Markov Chain Monte Carlo (MCMC) to sample the complete probability distributions of parameters. This accounts for both experimental error and model non-identifiability. It is available in the R version of MEIGO and has been adapted from the Python package BayesSB [33]. The sampling of the probability distributions uses a multi-start MCMC algorithm where the number of visits to a position in the parameter space is proportional to the posterior probability. The MCMC walk is punctuated by a Metropolis Hastings (M-H) criterion that allows more distant neighbourhoods to be explored, based on a probabilistic calculation.

Cooperation

The cooperation scheme implemented in MEIGO is based on the following idea: to run, in parallel, several implementations or threads of an optimization algorithm, which may have different settings and/or random initializations, and exchange information between them. Since the nature of the optimization algorithms implemented in MEIGO is essentially different, we distinguish between eSS (the population based method) and VNS (the trajectory based method), following the classification proposed in [34] (currently there is no cooperation scheme for BayesFit):

1.
Information available for sharing: the best solution found and, optionally for eSS, the RefSet, which contains information about the diversity of solutions.
2.
Threads that share information: all of them.
3.
Frequency of information sharing: the threads exchange information at a fixed interval τ.
4.
Number of concurrent programs: η.

Each of the η threads has a fixed degree of aggressiveness. “Conservative” threads have an emphasis on diversification (global search) and are used to increase the probability of finding a feasible solution, even if the parameter space is rugged or weakly structured. “Aggressive” threads have an emphasis on intensification (local search) and they speed up the calculations in smoother areas. Communication, which takes place at fixed time intervals, enables each thread to benefit from the knowledge gathered by the others. Thus this strategy has several degrees of freedom that have to be fixed: the time between communication (τ), the number of threads (η), and the strategy adopted by each thread. These adjustments should be chosen carefully depending on the particular problem we want to solve. Some guidelines for doing this can be found in [10] and in the Additional files 1, 2, 3, 4 and 5 accompanying this paper.

Implementation

MEIGO runs on Windows, Mac, and Linux, and provides implementations in both Matlab and R. So far, MEIGO includes: (i) eSS (Enhanced Scatter Search, [30]), for solving cNLP and MINLP problems, and (ii) VNS (Variable Neighbourhood Search), following the implementation described in [35], to solve IP problems (see Figure 1). The R version of MEIGO also includes the Bayesian parameter inference method BayesFit. Cooperative parallel versions (CeSS, CVNS), which can run on multicore PCs or clusters, are also included. Cooperation enhances the efficiency of the methods, not only in terms of speed, but also in terms of range: the threads running in parallel are completely independent so they can be customized to cover a wide range of search options, from aggressive to robust. In a sense the cooperation, as it has been designed, acts as a combination of different metaheuristics since each of the threads may present a different search profile. Four different kernel functions per method are included depending on the programming language chosen and the parallelization capabilities. Parallel computation in Matlab is carried out making use of the jpar tool [36]. Parallel computation in R can be performed using the package snowfall [37].

The methods implemented in MEIGO consider the objective functions to be optimized as black-boxes, with no requirements with respect to their structure. The user must provide a function that can be externally called for evaluation, accepting as input the variables to be estimated, and providing as output the objective value, ϕ, as a function of the input parameters. For constrained problems, the values of the constraints are also provided as output so that penalization functions can be calculated. For eSS and VNS, the user must define a set of compulsory fields (e.g. the name of the objective function, the bounds in the parameters, the maximum number of function evaluations). Further options take default values or can be changed. After each optimization, all the necessary results are stored in data files for further analysis with the tools provided by the host platforms. BayesFit is similarly robust to the form of the problem; in this case the likelihood function is provided by the user and this is incorporated into the calculation for the posterior probability for the parameter set, given the data.

Importantly, MEIGO is an open optimization platform in which other optimization methods can be implemented regardless of their nature (e.g. exact, heuristic, probabilistic, single-trajectory, population-based, etc.).

Illustrative examples

To illustrate the capabilities of the methods presented here, a set of optimization problems, including cases from systems biology and bioinformatics, have been solved and are presented as case studies. The examples include (i) a set of state of the art benchmark cases for global optimization (from the Competition on Large Scale Global Optimization, 2012 IEEE World Congress on Computational Intelligence), (ii) a metabolic engineering problem based on a constraint-based model of E. coli, (iii) training of logic models of signaling networks to phospho-proteomic data [38], and (iv) an additional toy logic model [22] to compare BayesFit to eSS. The corresponding code for these examples is included in the distribution of the MEIGO software.

Large-scale continuous global optimization benchmark

These are benchmark functions used in the Special Session on Evolutionary Computation for Large Scale Global Optimization, which was part of the 2012 IEEE World Congress on Computational Intelligence (CEC@WCCI-2012). These objective functions can be regarded as state-of-the-art benchmark functions to test numerical methods for large-scale (continuous) optimization. Information about the functions as well as computer codes can be downloaded from http://staff.ustc.edu.cn/~ketang/cec2012/lsgo_competition.htm. Some of these functions were previously solved in [10] using CeSS, a cooperative version of the Enhanced Scatter Search metaheuristic implemented in Matlab and available within MEIGO. Large-scale calibration of systems biology models were also presented and solved in that paper. Here we present the solution of 3 of these functions (i.e. f10, f17 and f20) using the R version of CeSS used by MEIGO. The convergence curves for the solution of these benchmark functions in R are coherent with those presented in [10], which were solved with Matlab, and the results are also competitive with the reference results for these functions presented in http://staff.ustc.edu.cn/~ketang/cec2012/lsgo_competition.htm. The convergence curves corresponding to these results are presented in Figures 2, 3 and 4.

Integer optimization benchmark problems

A set of integer optimization problems arising in process engineering and coded in AMPL (A Modeling Language for Mathematical Programming) were solved using the Matlab version of VNS and making use of the AMPL-Matlab interface files provided by Dr. Sven Leyffer, available at http://www.mcs.anl.gov/~leyffer/macminlp/. VNS solved all the problems and, in some cases, achieved a better solution than the best reported one. A summary of the tested problems is presented in Table 1. These benchmarks have been solved using the Matlab version of MEIGO under Windows only, since the dynamic library to access AMPL files runs on Windows.

Table 1 Summary of solutions for integer programming problems

Full size table

Metabolic engineering example

In this section we illustrate the application of the VNS algorithm to a metabolic engineering problem. Here VNS was used to find a set of potential gene knock-outs that will maximize the production of a given metabolite of interest. The objective function is given by flux-balance analysis (FBA) where a steady-state model is simulated by means of linear programming (LP). The mathematical formulation is similar to that presented in [41]. FBA assumes that cells have a biological objective that is often considered as growth rate maximization, minimization of ATP consumption or both.

In this example we considered a small steady-state model from E. coli central carbon metabolism, available at http://gcrg.ucsd.edu/Downloads/EcoliCore. Here the metabolite of interest is succinate and we considered the biological objective as biomass maximization. To solve the inner FBA problem we used openCOBRA (http://opencobra.sourceforge.net/) with Gurobi as an LP solver (http://www.gurobi.com/). For the problem encoding, 5 integer variables were chosen as decision variables, one for each possible gene knock-out. Each of these variables was allowed to vary from 0 (no knock-out) to 52, the total number of possible genes to be knocked-out. Repeated KOs were filtered by the objective function.

Additionally we also implemented and solved the problem with a genetic algorithm from the Matlab Global Optimization Toolbox. The point here was to cross-check the VNS results, not to perform an extensive comparison between the performances of GA and VNS. However we found that for our particular problem and encoding, VNS achieved the optimal solution more often (see Figures 5 and 6). The Wilcoxon rank sum test with continuity correction for comparing means provides a p-value of 0.068 (or 0.021 if we remove the outlier VNS solution) showing that the solutions provided by VNS are significantly better. Please note that the GA was used out of the box (with default settings). Results can vary when using other encodings and further tuning of the search parameters. In any case, the purpose was to illustrate how this class of problems can be easily solved using VNS.

Training of logic models of signalling networks to phospho-proteomic data

In this section we compare the performance of variable neighborhood search (VNS) and a discrete genetic algorithm (GA) implementation in training a logic model of a signalling network to phospho-proteomic data [38].

The problem is formulated as follows: one starts from a signed directed graph, containing the prior knowledge about a signaling network of interest. This graph contains directed edges among nodes (typically proteins) and their sign (activating or inhibitory). From this graph, one generates all possible AND and OR gates compatible with the graph. That means, if there are more than one edge arriving at a node, these are combined as OR and AND gates. Mathematically, this is encoded as an hyper graph, where edges with two or more inputs (hyperedges) represent a logical disjunction (AND gate). OR gates are encoded implicitly, by means of edges with only one input arriving at a node. See [38] for details.

To calibrate such models, the authors formulated the inference problem as a binary multi-objective problem, where the first objective corresponded to how well the model described the experimental data and the second consisted of a complexity penalty to avoid over-fitting:

θ (P) = θ_{f} (P) + α \cdot θ_{s} (P)

(1)

where $θ_{f} (P) = \frac{1}{n_{E}} \sum_{k = 1}^{s} \sum_{l = 1}^{m} \sum_{t = 1}^{n} {(B_{k, l, t}^{M} (P) - B_{k, l, t}^{E})}^{2}$ and $θ_{s} (P) = \frac{1}{v_{e}^{s}} \sum_{e = 1}^{r} v_{e} P_{e}$ such that B_k,l,t(P)∈{0,1} is the value (0 or 1) as predicted by computation of the model’s logical steady state [42] and $B_{k, l, t}^{E} \in [0, 1)$ is the data value for readout l at time t under the kth experimental condition. θ_f(P) is the mean squared error and α·θ_s(P) is the product between a tunable parameter α and a function denoting the model complexity (each hyper edge receives a penalty proportional to the number of inputs. E.g. an AND gates with 3 inputs is penalised 3 times as a single edge. OR gates arise implicitly from the combination of single input edges.).

Noticeably, the binary implementation of this problem contains redundant solutions in the search space. This can be addressed by compressing the search space into a reduced set containing only the smallest non-redundant combinations of hyperedges [38] (equivalent to the Sperner hypergraph). By doing this, the problem is transformed from a binary to an integer programming problem that was solved in [38] using a genetic algorithm.

Here, we implemented this benchmark by using the Matlab version of CellNetOptimizer (CNO or CellNOpt, available at http://www.cellnopt.org/downloads.html). The prior-knowledge network and data-set are also publicly available and thoroughly described at http://www.ebi.ac.uk/~cokelaer/cellnopt/data/ExtLiverPCB.html.

In order to assess the performance of both algorithms we solved each problem 100 times using VNS and the GA implementation from CNO. In the allowed time budget, VNS returned solutions that were on average better than those found by the GA (see Figure 7). The Welch Two Sample t-test for comparing means provides a p-value of 3.5·10⁻¹⁴, which clearly shows that VNS outperforms GA for this problem. Since both methods are sensitive to the tuning parameters, we tried to tune both algorithms fairly. Also, we note that the solution of this problem in its original, binary implementation can be solved using deterministic methods based either on Integer Linear Programming [43, 44] or Answer Set Programming [45].

Training of logic ODEs to in-silico generated data

Here, in order to demonstrate the additional information that can be derived from the probability distributions of optimized parameters, we compared the R implementations of BayesFit and eSS. The problem is again based on a logic model where, this time, the topology of the model is known and the goal is to optimize the parameters of the transfer functions used to generate a continuous simulation of the model. The parameters were optimized to reduce the distance between the model simulation and in silico generated data. This example is as described in section 6 from [22]; the only difference is that the model used here is the compressed model used to generate the in silico data in [22]. BayesFit produced a good fit to the data, comparable to that of eSS (Mean Squared Error: BayesFit, 0.007; eSS, 0.005). One of the advantages of estimating parameters by Bayesian inference is that parameter identifiability can be deduced from the marginal distributions for each parameter. For example Figure 8 shows two parameters of a single interaction between the species “egf” and “sos” in the model; these parameters n and k, control the shape of the transfer function between the two species [22]. From this figure, the covariation between the 2 parameters is evident. The best fit parameters (red line) lie in one region of high probability. However, there are additional correlated peaks in the marginal distributions of the two parameters, which suggests different parameter values could also produce a strong fit to the data.

Conclusions

Here, we present MEIGO, a free, open-source and flexible package to perform global optimization in R, Matlab, and Python. It includes advanced metaheuristic methods. Furthermore, its modular nature (Figure 1), enables the connection to existing optimization methods.

Availability and requirements

Project name: Metaheuristics for global optimization in systems biology and bioinformatics (MEIGO)Project home page:http://www.iim.csic.es/~gingproc/meigo.html Operating system(s): Windows, Linux, Mac OS X Programming language: Matlab 7.5 or higher and R 2.15 or higher Licence: GPLv3

References

Greenberg HJ, Hart WE, Lancia G: Opportunities for combinatorial optimization in computational biology. Informs J Computing. 2004, 16 (3): 211-231. 10.1287/ijoc.1040.0073.
Article Google Scholar
Larrañaga P, Calvo B, Santana R, Bielza C, Galdiano J, Inza I, Lozano JA, Armañanzas R, Santafé G, Pérez A, Robles V: Machine learning in bioinformatics. Brief Bioinform. 2006, 7: 86-112. 10.1093/bib/bbk007.
Article PubMed Google Scholar
Festa P: On some optimization problems in molecular biology. Math Biosci. 2007, 207 (2): 219-234. 10.1016/j.mbs.2006.11.012.
Article PubMed CAS Google Scholar
Banga JR: Optimization in computational systems biology. BMC Syst Biol. 2008, 2: 47-10.1186/1752-0509-2-47.
Article PubMed Central PubMed Google Scholar
Sun J, Garibaldi JM, Hodgman C: Parameter estimation using Metaheuristics in systems biology: a comprehensive review. IEEE/ACM Trans Comput Biol Bioinform. 2012, 9: 185-202.
Article PubMed Google Scholar
Marchisio M: Stelling J: Computational design tools for synthetic biology. Curr Opin Biotechnol. 2009, 20 (4): 479-85. 10.1016/j.copbio.2009.08.007.
Article PubMed CAS Google Scholar
Banga JR, Balsa-Canto E: Parameter estimation and optimal experimental design. Essays Biochem. 2008, 45: 195-10.1042/BSE0450195.
Article PubMed CAS Google Scholar
Moles CG, Mendes P, Banga JR: Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Res. 2003, 13 (11): 2467-2474. 10.1101/gr.1262503.
Article PubMed Central PubMed CAS Google Scholar
Ashyraliyev M, Fomekong-Nanfack Y, Kaandorp JA, Blom JG: Systems biology: parameter estimation for biochemical models. FEBS J. 2008, 276 (4): 886-902.
Article Google Scholar
Villaverde AF, Egea JA, Banga JR: A cooperative strategy for parameter estimation in large scale systems biology models. BMC Syst Biol. 2012, 6: 75-10.1186/1752-0509-6-75.
Article PubMed Central PubMed Google Scholar
Terfve C, Cokelaer T, MacNamara A, Henriques D, Goncalves E, Morris MK, van Iersel M, Lauffenburger DA, Saez-Rodriguez J: CellNOptR: a flexible toolkit to train protein signaling networks to data using multiple logic formalisms. BMC Syst Biol. 2012, 6: 133+-10.1186/1752-0509-6-133.
Article PubMed Central PubMed CAS Google Scholar
Schmidt H, Jirstrand M: Systems biology toolbox for MATLAB: a computational platform for research in systems biology. Bioinformatics. 2006, 22 (4): 514-515. 10.1093/bioinformatics/bti799.
Article PubMed CAS Google Scholar
Balsa-Canto E, Banga JR: AMIGO, a toolbox for advanced model identification in systems biology using global optimization. Bioinformatics. 2011, 27 (16): 2311-2313. 10.1093/bioinformatics/btr370.
Article PubMed Central PubMed CAS Google Scholar
Maiwald T, Eberhardt O, Blumberg J: Mathematical modeling of biochemical systems with PottersWheel. Computational Modeling of Signaling Networks, Series: Methods in Molecular Biology, Volume 880. Edited by: Liu X. 2012, Betterton MD. New York: Humana Press, 119-138.
Chapter Google Scholar
Balsa-Canto E, Alonso A, Banga JR: An iterative identification procedure for dynamic modeling of biochemical networks. BMC Syst Biol. 2010, 4: 11+-10.1186/1752-0509-4-11.
Article PubMed Central PubMed Google Scholar
Skanda D, Lebiedz D: An optimal experimental design approach to model discrimination in dynamic biochemical systems. Bioinformatics. 2010, 26 (7): 939-945. 10.1093/bioinformatics/btq074.
Article PubMed CAS Google Scholar
Yuraszeck TM, Neveu P, Rodriguez-Fernandez M, Robinson A, Kosik KS, Doyle FJ III: Vulnerabilities in the Tau network and the role of ultrasensitive points in Tau pathophysiology. PLoS Comput Biol. 2010, 6 (11): e1000997-10.1371/journal.pcbi.1000997.
Article PubMed Central PubMed Google Scholar
Jia G, Stephanopoulos G, Gunawan R: Parameter estimation of kinetic models from metabolic profiles: two-phase dynamic decoupling method. Bioinformatics. 2011, 27 (14): 1964-1970. 10.1093/bioinformatics/btr293.
Article PubMed CAS Google Scholar
Heldt F, Frensing T, Reichl U: Modeling the intracellular dynamics of influenza virus replication to understand the control of viral RNA synthesis. J Virol. 2012, 86 (15): 7806-7817. 10.1128/JVI.00080-12.
Article PubMed Central PubMed CAS Google Scholar
Higuera C, Villaverde AF, Banga JR, Ross J, Morán F: Multi-criteria optimization of regulation in metabolic networks. PLoS ONE. 2012, 7 (7): e41122-10.1371/journal.pone.0041122.
Article PubMed Central PubMed CAS Google Scholar
Jia G, Stephanopoulos G, Gunawan R: Incremental parameter estimation of kinetic metabolic network models. BMC Syst Biol. 2012, 6: 142+-10.1186/1752-0509-6-142.
Article PubMed Central PubMed Google Scholar
MacNamara A, Terfve C, Henriques D, Peñalver-Bernabé B, Saez-Rodriguez J: State-time spectrum of signal transduction logic models. Phys Biol. 2012, 9 (4): 045003-10.1088/1478-3975/9/4/045003.
Article PubMed Google Scholar
Sriram K, Rodriguez-Fernandez M, Doyle FJ III: Modeling cortisol dynamics in the neuro-endocrine axis distinguishes normal, depression, and post-traumatic stress disorder (PTSD) in humans. PLoS Comput Biol. 2012, 8 (2): e1002379-10.1371/journal.pcbi.1002379.
Article PubMed Central PubMed CAS Google Scholar
Sriram K, Rodriguez-Fernandez M, Doyle FJ: A detailed modular analysis of heat-shock protein dynamics under acute and chronic stress and its implication in anxiety disorders. PLoS ONE. 2012, 7 (8): e42958+-10.1371/journal.pone.0042958.
Article PubMed Central PubMed CAS Google Scholar
Freund S, Rath A, Barradas OP, Skerhutt E, Scholz S, Niklas J, Sandig V, Rose T, Heinzle E, Noll T, Pörtner R, Zeng AP, Reichl U: Batch-to-batch variability of two human designer cell lines – AGE1.HN and AGE1.HN.AAT – carried out by different laboratories under defined culture conditions using a mathematical model. Eng Life Sci. 2013, 00: 1-13.
CAS Google Scholar
Francis F, García MR, Middleton RH: A single compartment model of pacemaking in dissasociated Substantia nigra neurons. J Comput Neurosci. 2013, 35 (3): 295-316. 10.1007/s10827-013-0453-9.
Article PubMed Google Scholar
Egea JA, Schmidt H, Banga JR: A new tool for parameter estimation in nonlinear dynamic biological systems using global optimization. Poster. 9th International Conference on Systems Biology, ICSB 2008. Goteborg (Sweden), 22-28 August 2008. Available at http://www.iim.csic.es/gingproc/Poster_ICSB2008.pdf.,
Glover F, Laguna M, Martí R: Fundamentals of scatter search and path relinking. Control Cybernet. 2000, 39 (3): 653-684.
Google Scholar
Storn R, Price K: Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. J Global Optimization. 1997, 11 (4): 341-359. 10.1023/A:1008202821328.
Article Google Scholar
Egea JA, Martí R, Banga JR: An evolutionary method for complex-process optimization. Comput Oper Res. 2010, 37 (2): 315-324. 10.1016/j.cor.2009.05.003.
Article Google Scholar
Mladenović N, Hansen P: Variable neighborhood search. Comput Oper Res. 1997, 24: 1097-1100. 10.1016/S0305-0548(97)00031-2.
Article Google Scholar
Hansen P, Mladenović N, Perez-Brito D: Variable neighborhood decomposition search. J Heuristics. 2001, 7: 335-350. 10.1023/A:1011336210885.
Article Google Scholar
Eydgahi H, Chen WW, Muhlich JL, Vitkup D, Tsitsiklis JN, Sorger PK: Properties of cell death models calibrated and compared using Bayesian approaches. Mol Syst Biol. 2013, 9: 644-
Article PubMed Central PubMed CAS Google Scholar
Toulouse M, Crainic TG, Sanso B: Systemic behavior of cooperative search algorithms. Parallel Comput. 2004, 30: 57-79. 10.1016/j.parco.2002.07.001.
Article Google Scholar
Hansen P, Mladenović N, Moreno-Pérez JA: Variable neighbourhood search: methods and applications. Ann Oper Res. 2010, 175: 367-407. 10.1007/s10479-009-0657-6.
Article Google Scholar
Karbowski A, Majchrowski M, Trojanek P: jPar–a simple, free and lightweight tool for parallelizing Matlab calculations on multicores and in clusters. 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA 2008), May 13–16 2008, Trondheim (Norway),
Knaus J: Developing parallel programs using snowfall. 2010, [http://cran.r-project.org/web/packages/snowfall/vignettes/snowfall.pdf],
Google Scholar
Saez-Rodriguez J, Alexopoulos LG, Epperlein J, Samaga R, Lauffenburger DA, Klamt S, Sorger PK: Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction. Mol Syst Biol. 2009, 5: 331-
Article PubMed Central PubMed Google Scholar
Sandgren E: Nonlinear integer and discrete programming in mechanical design optimization. J Mech Des. 1990, 112 (2): 223-229. 10.1115/1.2912596.
Article Google Scholar
Harjunkoski I, Westerlund T, Pörn R, Skrifvars H: Different transformations for solving non–convex trim loss problems by MINLP. Eur J Oper Res. 1998, 105: 594-603. 10.1016/S0377-2217(97)00066-0.
Article Google Scholar
Burgard AP, Pharkya P, Maranas CD: Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol Bioeng. 2003, 84 (6): 647-657. 10.1002/bit.10803.
Article PubMed CAS Google Scholar
Klamt S, Saez-Rodriguez J, Lindquist J, Simeoni L, Gilles ED: A methodology for the structural and functional analysis of signaling and regulatory networks. BMC Bioinformatics. 2006, 7: 56-10.1186/1471-2105-7-56.
Article PubMed Central PubMed Google Scholar
Mitsos A, Melas IN, Siminelakis P, Chairakaki AD, Saez-Rodriguez J, Alexopoulos LG: Identifying drug effects via pathway alterations using an integer linear programming optimization formulation on phosphoproteomic data. PLoS Comput Biol. 2009, 5 (12): e1000591-10.1371/journal.pcbi.1000591.
Article PubMed Central PubMed Google Scholar
Sharan R, Karp RM: Reconstructing boolean models of signaling. Proceedings of the 16th Annual International Conference on Research in Computational Molecular Biology, RECOMB’12. 2012, Berlin, Heidelberg: Springer-Verlag, 261-271.
Google Scholar
Guziolowski C, Videla S, Eduati F, Thiele S, Cokelaer T, Siegel A, Saez-Rodriguez J: Exhaustively characterizing feasible logic models of a signaling network using Answer Set Programming. Bioinformatics. 2013, 29 (18): 2320-2326. 10.1093/bioinformatics/btt393.
Article PubMed Central PubMed CAS Google Scholar

Download references

Acknowledgements

The authors thank Alexandra Vatsiou for testing MEIGO. We acknowledge the funding received from the EU through projects “BioPreDyn” (FP7-KBBE grant 289434) and “NICHE” (FP7-ITN grant 289384), from Ministerio de Economía y Competitividad (and the FEDER) through the projects “Multi-Scales” (DPI2011-28112-C04-03 and DPI2011-28112-C04-04), and from the CSIC intramural project “BioREDES” (PIE-201170E018).

Author information

Authors and Affiliations

Department of Applied Mathematics and Statistics, Universidad Politécnica de Cartagena, 30202, Cartagena, Spain
Jose A Egea
(Bio)Process Engineering Group, Spanish National Research Council, IIM-CSIC, 36208, Vigo, Spain
David Henriques, Alejandro F Villaverde & Julio R Banga
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
David Henriques, Thomas Cokelaer, Aidan MacNamara, Diana-Patricia Danciu & Julio Saez-Rodriguez

Authors

Jose A Egea
View author publications
You can also search for this author in PubMed Google Scholar
David Henriques
View author publications
You can also search for this author in PubMed Google Scholar
Thomas Cokelaer
View author publications
You can also search for this author in PubMed Google Scholar
Alejandro F Villaverde
View author publications
You can also search for this author in PubMed Google Scholar
Aidan MacNamara
View author publications
You can also search for this author in PubMed Google Scholar
Diana-Patricia Danciu
View author publications
You can also search for this author in PubMed Google Scholar
Julio R Banga
View author publications
You can also search for this author in PubMed Google Scholar
Julio Saez-Rodriguez
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Julio R Banga or Julio Saez-Rodriguez.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

JSR and JRB conceived, designed and coordinated the project. JAE implemented the metaheuristic methods included in MEIGO. AFV and DH developed and implemented the methods for parallel computation. JAE, AFV and DH performed the numerical computations for the metaheuristic methods. AMN and DPD implemented and performed the computational test of the bayesian method. TC designed and developed the Python wrapper and helped with many technical aspects. All authors contributed to the writing of the manuscript. All authors read and approved the final manuscript.

Electronic supplementary material

12859_2013_6397_MOESM1_ESM.pdf

Additional file 1: MEIGO - Matlab’s users manual. The file includes the user’s manual of the Matlab version of MEIGO. (PDF 588 KB)

12859_2013_6397_MOESM2_ESM.pdf

Additional file 2: MEIGO - R’s user’s manual. The file includes the users manual of the R version of MEIGO. (PDF 582 KB)

12859_2013_6397_MOESM3_ESM.pdf

Additional file 3: Test of options for the integer programming benchmark problems. The file includes the test carried out with the different search options to solve the integer programming benchmark problems as well as the extracted conclusions. (PDF 126 KB)

12859_2013_6397_MOESM4_ESM.zip

Additional file 4: MEIGO Matlab version source code and examples. The file includes the source code of the Matlab version of MEIGO and the examples included in the users manual. (ZIP 4 MB)

12859_2013_6397_MOESM5_ESM.zip

Additional file 5: MEIGO R version source code and examples. The file includes the source code of the R version of MEIGO and the examples included in the users manual. (ZIP 2 MB)

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Authors’ original file for figure 4

Authors’ original file for figure 5

Authors’ original file for figure 6

Authors’ original file for figure 7

Authors’ original file for figure 8

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( https://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article

Egea, J.A., Henriques, D., Cokelaer, T. et al. MEIGO: an open-source software suite based on metaheuristics for global optimization in systems biology and bioinformatics. BMC Bioinformatics 15, 136 (2014). https://doi.org/10.1186/1471-2105-15-136

Download citation

Received: 23 October 2013
Accepted: 24 April 2014
Published: 10 May 2014
DOI: https://doi.org/10.1186/1471-2105-15-136

MEIGO: an open-source software suite based on metaheuristics for global optimization in systems biology and bioinformatics

Abstract

Background

Results

Conclusions

Background

Methods

Enhanced Scatter Search (eSS)

Variable Neighbourhood Search (VNS)

BayesFit

Cooperation

Implementation

Illustrative examples

Large-scale continuous global optimization benchmark

Integer optimization benchmark problems

Metabolic engineering example

Training of logic models of signalling networks to phospho-proteomic data

Training of logic ODEs to in-silico generated data

Conclusions

Availability and requirements

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding authors

Additional information

Competing interests

Authors’ contributions

Electronic supplementary material

Authors’ original submitted files for images

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Bioinformatics

Contact us