POPE: post optimization posterior evaluation of likelihood-free models
- Edward Meeds^{1},
- Michael Chiang^{2},
- Mary Lee^{3},
- Olivier Cinquin^{2},
- John Lowengrub^{3} and
- Max Welling^{1, 4}
Received: 9 February 2015
Accepted: 2 July 2015
Published: 20 August 2015
Abstract
Background
In many domains, scientists build complex simulators of natural phenomena that encode their hypotheses about the underlying processes. These simulators can be deterministic or stochastic, fast or slow, constrained or unconstrained, and so on. Optimizing the simulators with respect to a set of parameter values is common practice, resulting in a single parameter setting that minimizes an objective subject to constraints.
Results
We propose algorithms for post optimization posterior evaluation (POPE) of simulators. The algorithms compute and visualize all simulations that can generate results of the same or better quality than the optimum, subject to constraints. These optimization posteriors are desirable for a number of reasons, among which are easy interpretability, automatic parameter sensitivity and correlation analysis, and posterior predictive analysis. Our algorithms are simple extensions to an existing simulation-based inference framework called approximate Bayesian computation. POPE is applied to two biological simulators: a fast and stochastic simulator of stem-cell cycling and a slow and deterministic simulator of tumor growth patterns.
Conclusions
POPE allows the scientist to explore and understand the role that constraints, both on the input and the output, have on the optimization posterior. As a Bayesian inference procedure, POPE provides a rigorous framework for the analysis of the uncertainty of an optimal simulation parameter setting.
Keywords
Approximate Bayesian computation; Simulation-based science; Bayesian inference
Background
In science and industry alike, modelers express their expert knowledge by building a simulator of the phenomenon of interest. There is an enormous variety of such simulators, deterministic or stochastic, fast or slow, with or without constraints. For most simulators, e.g. driven by stochastic partial differential equations, it is impossible to write down an expression for the likelihood, which can make it highly challenging to optimize the simulator over its free parameters. This “blind optimization problem” is receiving increasing attention in the machine learning community [1–3].
However, even if the optimal parameter value θ ^{⋆} is found, this still leaves the scientist in the dark with respect to important questions such as: “Which parameters are correlated?”; “Which parameters are robust and which are sensitive?”; “Is my model overfitting, underfitting, or just right?” We believe that methods capable of handling these types of questions post optimization are essential to the field of simulation-based modeling. In this paper we propose a new Bayesian framework that allows the scientist to answer these questions by defining a posterior distribution over all parameters that can be interpreted as “the probability that the outcome of a simulation conducted at that parameter value will result in a value of the objective that is equally good or better than a certain value \(y^{\star }_{1}\), subject to certain constraints on both parameters as well as simulation outcomes”. This “Post Optimization Posterior Evaluation” (POPE) differs from standard ABC [4–6] in that standard ABC compares simulator outcomes with observations while POPE reasons about an optimization problem (subject to constraints). For instance, POPE can be meaningfully applied to an optimization problem without a single observation by asking which parameter values are expected to perform better than a certain threshold value on the objective. While different philosophically, POPE can be implemented by using one-sided kernels within an ABC framework.
POPE is not intended to be an optimization tool for likelihood-free models. While one can use the POPE framework to iteratively and adaptively optimize an objective, its core use is in quantifying and visualizing the full distribution over parameters, including their posterior interactions, that result in equally good or better objective values than some given \(y^{\star }_{1}\). One could for instance imagine using Bayesian optimization [3] or some other global optimization technique [7] to find a value for \(y^{\star }_{1}\) and then visualize the posterior distribution of parameters given that value. The posterior distribution we approximate using ABC sampling techniques is related to the concept of “probability of improvement” often used in Bayesian optimization [8] to measure how promising a parameter value is in terms of improving on the current best solution. However, in that context the probability of improvement includes, besides uncertainty due to the stochastic nature of simulation, also the uncertainty of a surrogate model’s ability to predict the value of y _{1}. In contrast, with POPE the probability of improving the current best solution is only determined by the noise in the simulation.
POPE addresses the requirements of simulation-based science by providing tools that have a number of properties that are beneficial to a scientist: 1) the posterior distribution over parameters has a clear and interpretable meaning and can be used to suggest alternative parameters to explore, 2) POPE can handle multiple objectives and constraints, 3) unlike most standard optimization methods, POPE can handle simulators with stochastic outputs and complicated input or output constraints, 4) POPE can handle multimodal posterior distributions, 5) as part of its computation POPE will generate posterior predictive samples that can be used to evaluate the model fit, and 6) by incorporating Gaussian process surrogate models it can handle expensive simulators.
In this paper we will develop POPE and apply it to two real-world cases: one fast stochastic simulator in the domain of stem cell biology and one slow deterministic simulator developed for cancer research.
Approximate Bayesian computation
In ABC, the goal is to sample from the posterior distribution \(\pi (\boldsymbol {\theta } | \mathbf {y}^{\star }_{1}, \ldots, \mathbf {y}^{\star }_{N}) \propto \pi (\mathbf {y}^{\star }_{1}, \ldots, \mathbf {y}^{\star }_{N} | \boldsymbol {\theta })\, \pi (\boldsymbol {\theta })\), where π(θ) is a prior distribution over parameters θ∈IR^{ D } and \(\pi (\mathbf {y}^{\star }_{1}, \ldots, \mathbf {y}^{\star }_{N} | \boldsymbol {\theta })\) is the likelihood of N data observations, where \(\mathbf {y}^{\star }_{n} \in \text {I\!R}^{J}\). The vector of J values can either be “raw” observations or, more typically, informative statistics of observations. In this paper we consider the case where N=1 (though all our methods apply equally to N>1) and will henceforth drop the subscripts. The unconventional superscript on y ^{⋆} is used to distinguish the observations from the simulator outputs y.
The ABC likelihood is estimated from S draws of the simulator, \(\hat {\pi }_{\boldsymbol {\epsilon }}(\mathbf {y}^{\star } | \boldsymbol {\theta }) = \frac {1}{S} \sum _{s=1}^{S} \pi _{\boldsymbol {\epsilon }}\left (\mathbf {y}^{\star } | \mathbf {y}^{(s)}\right)\), with \(\mathbf {y}^{(s)} \sim \pi (\mathbf {y} | \boldsymbol {\theta })\). Although this is an unbiased estimator of π _{ ε }(y ^{⋆}|θ), the ABC likelihood is an approximation to the true likelihood, since π _{ ε }(y ^{⋆}|θ)≈π(y ^{⋆}|θ). In other words, ε puts the “approximate” in ABC; samples are drawn from the true posterior only as ε→0. Common π _{ ε } kernels are the ε-tube \(\pi _{\boldsymbol {\epsilon }}(\mathbf {y}^{\star } | \mathbf {y}) \propto \prod _{j}\left [ \| y_{j}^{\star }-y_{j} \|_{1} \leq \boldsymbol {\epsilon }_{j} \right ]\) and the Gaussian kernel \(\pi _{\boldsymbol {\epsilon }}(\mathbf {y}^{\star } | \mathbf {y}) = \prod _{j}\mathcal {N} \left (y_{j}^{\star } | y_{j}, \boldsymbol {\epsilon }_{j}^{2} \right)\).
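The two kernels and the S-sample likelihood estimate above can be sketched in a few lines of Python; `simulate` is a hypothetical callable returning a vector of statistics, and the function names are ours:

```python
import math


def eps_tube_kernel(y_star, y, eps):
    # epsilon-tube kernel: 1 if every statistic lies within its tube, else 0
    return 1.0 if all(abs(ys - yj) <= e for ys, yj, e in zip(y_star, y, eps)) else 0.0


def gaussian_kernel(y_star, y, eps):
    # product of univariate Gaussians N(y_j^star | y_j, eps_j^2)
    p = 1.0
    for ys, yj, e in zip(y_star, y, eps):
        p *= math.exp(-0.5 * ((ys - yj) / e) ** 2) / (e * math.sqrt(2.0 * math.pi))
    return p


def abc_likelihood(theta, simulate, y_star, eps, kernel, S=1):
    # unbiased Monte Carlo estimate of pi_eps(y* | theta) from S simulator draws
    return sum(kernel(y_star, simulate(theta), eps) for _ in range(S)) / S
```

For a deterministic toy simulator a single draw (S=1) already recovers the exact kernel value; for stochastic simulators, increasing S reduces the variance of the estimate.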
Samples are drawn with Metropolis-Hastings using the joint proposal q(θ ^{′}|θ) π(y ^{′}|θ ^{′}), with acceptance probability \(\alpha = \min \left (1,\; \frac {\hat {\pi }_{\boldsymbol {\epsilon }}(\mathbf {y}^{\star } | \boldsymbol {\theta }')\, \pi (\boldsymbol {\theta }')\, q(\boldsymbol {\theta } | \boldsymbol {\theta }')}{\hat {\pi }_{\boldsymbol {\epsilon }}(\mathbf {y}^{\star } | \boldsymbol {\theta })\, \pi (\boldsymbol {\theta })\, q(\boldsymbol {\theta }' | \boldsymbol {\theta })} \right)\); i.e. parameters θ ^{′} are first proposed, then outputs y ^{′} are generated from the simulator with input parameters θ ^{′}.
When only the numerator is re-estimated at every iteration (and the denominator is carried over from the previous iteration), then this algorithm corresponds to pseudo-marginal (PM) sampling [9, 10]. PM sampling is asymptotically correct (taking for granted the approximation introduced by the kernel π _{ ε }) but can display very poor mixing properties. By resampling the denominator as well, we improve mixing at the cost of introducing a further approximation. This sampler is known as the marginal sampler [4, 6]. There is evidence that using a single simulation is adequate [11]; indeed, we set S=1 in our experiments and found no benefit to tuning S.
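The difference between the two samplers amounts to one line in the Metropolis-Hastings step. A minimal sketch, assuming a symmetric proposal so q cancels (the names `propose`, `log_prior`, and `abc_lik` are ours):

```python
import math
import random


def mh_step(theta, lik_hat, propose, log_prior, abc_lik, mode="pseudo-marginal"):
    # One ABC-MCMC Metropolis-Hastings step with a symmetric proposal.
    #   pseudo-marginal: carry over the stored estimate lik_hat (asymptotically
    #                    correct, but can mix poorly)
    #   marginal: re-estimate the current state's likelihood too (mixes better,
    #             but doubles the simulations and adds a further approximation)
    theta_p = propose(theta)
    lik_p = abc_lik(theta_p)                                     # numerator estimate
    lik_cur = abc_lik(theta) if mode == "marginal" else lik_hat  # denominator
    num = lik_p * math.exp(log_prior(theta_p))
    den = lik_cur * math.exp(log_prior(theta))
    if den == 0.0 or random.random() < num / den:
        return theta_p, lik_p  # accept
    return theta, lik_cur      # reject; marginal mode also refreshes the estimate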
For expensive simulators, even a single simulation per MH step can make ABC-MCMC infeasible.
Surrogate modeling—where the history of all simulations is stored in memory and used to build a surrogate of the simulator—may be the only option to make progress in that case.
Methods
In regular ABC the simulator generates output statistics y that are compared directly with observations y ^{⋆}. For optimization problems, however, the scientist may interpret y _{1} as a cost and \(y^{\star }_{1}\) as an estimate of the minimum cost. Other simulation statistics {y _{ j }},j=2..J may be constrained, e.g. \(\{y_{j} \leq y^{\star }_{j} \}.\) For instance, the cost could be some measure of misfit between simulator outcomes and desirable outcomes while constraints could represent domains within which certain simulation results should lie (constraints can of course also be incorporated into the cost function, but as we will see, it is sometimes beneficial to treat them separately). A first attempt to elicit a posterior distribution over parameters could be to define a Gibbs distribution p(y _{1})∝ exp(−β y _{1}), which we would treat as a likelihood similar to π _{ ε } and apply ABC, rejecting everything that does not satisfy the constraints. Unfortunately, we do not consider this a satisfactory solution because the posterior does not have a clear interpretation; for instance, simply scaling the arbitrary constant β would change the posterior.
A better solution is to define a new type of (one-sided) Heaviside kernel in ABC: \(\left [ y_{1} \leq y^{\star }_{1} \right ]\), which is 1 when the argument is satisfied and 0 otherwise. Note that this kernel is applied to both the objective y _{1} and the constraints {y _{ j }} alike. The quantity \(y^{\star }_{1}\) is given by the lowest value of the objective found by some optimization procedure (e.g. grid-search, black-box [7] or Bayesian optimization [3], etc.). The posterior samples produced by an ABC algorithm that uses this one-sided kernel have a very clean interpretation, namely they represent the probability that a simulation run at that parameter value will generate an equally good or better value for the objective while satisfying all the constraints. This distribution can be used to suggest new regions to explore (e.g. other modes, or regions that are farther away from constraint surfaces), and to visualize dependencies between parameters and their sensitivities.
Using the one-sided kernel [y≤y ^{⋆}] will cause the ABC sampler to get stuck when initialized in a region where y>y ^{⋆} because every proposed sample will get rejected. Even when initialized in a region where y≤y ^{⋆}, this kernel will make it very difficult to move between different “islands” (modes) in parameter space where these conditions hold. This problem is aggravated in high dimensions where \(\left [ \mathbf {y} \leq \mathbf {y}^{\star } \right ] = \prod _{j} \left [ y_{j} \leq y_{j}^{\star } \right ] \) and every condition needs to be satisfied for the likelihood to be non-zero. A one-sided ε-tube [y≤y ^{⋆}+ε] adds some relief but suffers the same problem for most useful values of ε.
A softer alternative is a one-sided Gaussian kernel \(K_{\boldsymbol {\epsilon }}\left (y;\; y^{\star } \right)\) that equals 1 when y≤y ^{⋆} and decays as exp(−(y−y ^{⋆})^{2}/2ε ^{2}) otherwise. By modifying ε we can control the relative importance or severity of the penalty, allowing us to use annealing schedules that adapt ε during the MCMC run in order to focus the sampling at modes when ε is small.
The one-sided kernels are easily modified for interval constraints on the outputs, setting the likelihood to 1 inside the feasible regions, with quadratic or linear penalties outside of them.
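The hard Heaviside kernel and its softened one-sided Gaussian relaxation can be sketched as follows (function names are ours):

```python
import math


def heaviside_kernel(y, y_star):
    # hard one-sided kernel: 1 iff every y_j <= y_j*
    return 1.0 if all(yj <= ys for yj, ys in zip(y, y_star)) else 0.0


def soft_one_sided_kernel(y, y_star, eps):
    # soft variant: exact (value 1) inside the feasible region, with a
    # quadratic log-penalty for each violated condition, so the chain can
    # traverse mildly infeasible regions between modes
    p = 1.0
    for yj, ys, e in zip(y, y_star, eps):
        if yj > ys:
            p *= math.exp(-0.5 * ((yj - ys) / e) ** 2)
    return p
```

As ε→0 the soft kernel recovers the hard Heaviside kernel, which is what makes annealing schedules over ε natural here.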
Modeling the simulator response
We may want to consider modeling the simulator response π(y|θ) if the outcome of the simulator is stochastic or the simulator is expensive to run. In the first case, we can reduce the variance of the Markov chain by learning a local response model for every state θ. For the second case, a global response model (a.k.a. a surrogate) over the entire θ-space is more appropriate because it stores and makes use of the entire simulation history to predict responses at new θ locations.
Local response models
The local response model approximates the simulator output at θ by a Gaussian, \(\pi (\mathbf {y} | \boldsymbol {\theta }) \approx \mathcal {N}\left (\mathbf {y} | \hat {\boldsymbol {\mu }}_{{\boldsymbol {\theta }}}, \hat {\boldsymbol {\Sigma }}_{\boldsymbol {\theta }}\right)\), where \(\hat {\boldsymbol {\mu }}_{{\boldsymbol {\theta }}}\) and \(\hat {\boldsymbol {\Sigma }}_{\boldsymbol {\theta }}\) are computed from the S simulations; under a one-sided kernel the likelihood then becomes a Gaussian CDF. In experiments we can limit the number of parameters by using a factorized model: \(\mathcal {N}\left (\mathbf {y} | \hat {\boldsymbol {\mu }}_{{\boldsymbol {\theta }}}, \hat {\boldsymbol {\Sigma }}_{\boldsymbol {\theta }}\right) \approx \prod _{j=1}^{J} \mathcal {N}\left (y_{j} | \hat {\mu }_{j}, \hat {\sigma }^{2}_{j}\right)\), resulting in a factorized product over CDFs as well. Modeling the response by only the first two moments may be inadequate due to multi-modality, asymmetric noise, etc. For such cases a conditional KDE (kernel density estimate) response model can be used. In [13] this approach is shown to be superior to conditional Gaussians for certain computational psychology models. Note that for Gaussian kernels the conditional KDE is very similar to kernel ABC, but has the additional flexibility of adaptively choosing bandwidths (rather than the fixed ε in kernel ABC).
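A minimal sketch of the factorized Gaussian response model under the one-sided kernel, using only the Python standard library (the function name is ours):

```python
import math
import statistics


def one_sided_cdf_likelihood(samples_per_stat, y_star):
    # Factorized Gaussian response model: fit N(mu_j, sigma_j^2) to the S
    # simulated values of each statistic; under the one-sided kernel the
    # likelihood integrates to a product of normal CDFs Phi((y_j* - mu_j)/sigma_j)
    p = 1.0
    for samples, ys in zip(samples_per_stat, y_star):
        mu = statistics.fmean(samples)
        sd = statistics.stdev(samples)
        p *= 0.5 * (1.0 + math.erf((ys - mu) / (sd * math.sqrt(2.0))))
    return p
```

When the fitted mean sits exactly at the threshold y ^{⋆}, the CDF evaluates to 0.5, matching the intuition that a symmetric-noise simulation at that parameter beats the threshold half the time.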
Global response models
For very expensive simulators it is impractical to run simulations at each parameter location in the MCMC run. In these cases it is worth the extra storage and computational overhead to learn a model of the simulator response surface, or surrogate. For global response models the Metropolis-Hastings procedure diverges from ABC-MCMC in that simulations are only performed if the surrogate is very uncertain; when the surrogate is confident, no simulations are performed.
The natural global extension of the Gaussian conditional model is the Gaussian process (GP). The GP has been used extensively for surrogate modeling [1, 2, 8, 14, 15], including more recent applications in accelerating ABC [16, 17]. In [16] GPs directly model the log-likelihood in successive waves of inference, each one eliminating regions of low posterior probability; this approach is capable of handling high-dimensional simulator outputs. In [17] each dimension of the simulator response is modeled by a GP, and the surrogate uncertainty is used explicitly to determine simulation locations (design points). The advantage of this approach is that CDFs can be computed directly from the GPs’ predictive distributions. A global extension of the conditional KDE is more complicated, but estimators such as the Nadaraya-Watson could provide the necessary modeling machinery. These extensions are beyond the scope of this paper.
MCMC for POPE
Algorithm 1 provides the pseudo-code for running a kernel ABC-MCMC version of POPE (easily modified to accommodate response models by plugging in the appropriate likelihood function for π _{ ε }(y ^{⋆}|y ^{(s)})). This is simply ABC-MCMC with one-sided kernel likelihoods. There are two possible modes for running POPE: marginal and pseudo-marginal. When running marginal MCMC, the state of the Markov chain only includes θ; as discussed earlier, this improves mixing at the cost of doubling the number of simulations per Metropolis-Hastings step and a less accurate posterior. On the other hand, pseudo-marginal MCMC can mix poorly, but uses fewer simulations and is more accurate. Choosing between the two modes is problem specific.
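A compact sketch of the pseudo-marginal variant with the hard one-sided kernel and S=1; `simulate` is a hypothetical callable mapping a parameter vector to a vector of statistics, and the Gaussian random-walk proposal is our simplification:

```python
import math
import random


def pope_mcmc(simulate, log_prior, y_star, theta0, n_steps=1000, step=0.1):
    # Pseudo-marginal ABC-MCMC (S=1) with the hard one-sided kernel [y <= y*]
    def one_sided(y):
        return 1.0 if all(yj <= ys for yj, ys in zip(y, y_star)) else 0.0

    theta, y = theta0, simulate(theta0)
    lik = one_sided(y)
    chain = []
    for _ in range(n_steps):
        theta_p = [t + random.gauss(0.0, step) for t in theta]  # symmetric proposal
        y_p = simulate(theta_p)
        a = one_sided(y_p) * math.exp(log_prior(theta_p) - log_prior(theta))
        if random.random() * lik < a:          # Metropolis-Hastings accept test
            theta, y, lik = theta_p, y_p, one_sided(y_p)
        chain.append((list(theta), list(y)))   # store {theta, y} for the analysis
    return chain
```

Note that once the chain reaches a feasible state, every stored state satisfies all the one-sided conditions, which is exactly the set of "equally good or better" simulations POPE is after.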
Adaptive POPE
In ABC, the choice of ε is crucial to both the MCMC mixing and the precision of the posterior distribution. There is an obvious trade-off between the two as large ε provides better mixing but poorer approximations to the target distribution. It is common in ABC to adapt ε using quantiles of the discrepancies (e.g. in Sequential Monte Carlo ABC [18]) or using a more complicated approach, for example based on the threshold acceptance curve [19], or to include ε as part of the state of the Markov chain [20].
We propose an online version of the quantile method (see function UpdateEpsilons in Algorithm 2), setting ε to a quantile of the exponential moving average (EMA) of the discrepancies or some minimum values ε ^{MIN}, whichever is greater. Minimum values ε ^{MIN} are set not only for computational reasons, but also to reflect the scientist’s intuition regarding the relative importance of the constraints. Because ε can fluctuate during the MCMC run, the chain can explore regions where some constraints are easily satisfied, but others are not, and vice versa. A quantile parameter β puts pressure on the chain to keep ε small.
For some problems we may not know certain objective values in y ^{⋆} before running POPE. For these cases simple adaptive MCMC procedures can estimate y ^{⋆} during the MCMC run. For deterministic simulators, y ^{⋆} can be updated after each simulation. For stochastic simulators we propose a local averaging procedure based on the EMA of y, similar to the adaptation of ε. The intuition behind this is that the best objective value y ^{⋆} at θ ^{⋆} is the expected value of the simulator response at θ ^{⋆}. An EMA of the simulation response approximates this expectation and we have found in our experiments with stochastic simulators that it performs well and conveniently fits into the POPE MCMC procedure (i.e. there is no need to set up an entirely different optimization procedure with complicated constraints on the input and outputs since these are already part of POPE). This is function UpdateObjectives in Algorithm 2.
These are adaptive MCMC algorithms that do not necessarily target the correct posterior distribution. The simplest way to correct this is to simply use a few MCMC runs to set ε or y ^{⋆} (if needed) or stop the adaptation altogether after a burnin period, from that point using non-adaptive ABC-MCMC. This is the approach we took in our experiments. Alternatively, the adaptation decay rate parameters δ and γ in Algorithm 2, could slowly increase to 1, at which point the adaptation ceases.
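The two adaptation rules share the same EMA machinery; a loose sketch follows. The paper's UpdateEpsilons uses a quantile of the EMA'd discrepancies, which we simplify here to a scalar β scaling, and all function names are ours:

```python
def ema(current, observed, decay):
    # exponential moving average shared by both adaptation rules
    return decay * current + (1.0 - decay) * observed


def update_epsilons(eps_min, ema_discrepancy, beta):
    # UpdateEpsilons sketch: track a beta-scaled EMA of the discrepancies,
    # floored at the scientist-chosen minima eps_min (the paper uses a
    # quantile of the EMA; the scalar beta here is a simplification)
    return [max(em, beta * d) for em, d in zip(eps_min, ema_discrepancy)]


def update_objective(y_star, y, gamma):
    # UpdateObjectives sketch for stochastic simulators: y* tracks an EMA of
    # the simulated objective, so an improvement must be consistently better
    # than the running best rather than a single lucky draw
    return min(y_star, ema(y_star, y, gamma))
```

As the decay parameters approach 1, both updates leave their targets unchanged, which is the mechanism for letting the adaptation cease after burn-in.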
Posterior analysis of MCMC results
Along with the posterior parameter distribution p(θ | y ^{⋆}), which is usually the main distribution of interest in a Bayesian analysis, we will also examine the posterior predictive distribution p(y|y ^{⋆}), which, though perhaps unintuitive, is the distribution of statistics (the predictions) generated by the simulator at parameters drawn from p(θ | y ^{⋆}). Posterior predictive distributions are used in statistics for model checking and model improvement [21], for example, and use the generative model with parameters from the posterior to generate pseudo or replicated data. Statistics of this data, defined by the statistician and considered important for the problem at hand, are compared to the statistics from the observations (the real data). One can then examine the bias and variance of the posterior predictive distributions with respect to the observations y ^{⋆}, or perform Bayesian t-tests (how probable are the observations y ^{⋆} under p(y|y ^{⋆})) (see [21], Chapter 6).
For ABC, the posterior analysis comes naturally, and, usually, for free. Using ABC-MCMC algorithms, statistics (judged important a priori by the scientist) are generated at each Metropolis-Hastings step. Simply storing the pairs {y,θ} from the MH step is sufficient to produce both p(y | y ^{⋆}) and p(θ | y ^{⋆}). In addition to the posterior predictive, visualizing the input-output posteriors, i.e. a joint p(y _{ j },θ _{ d } | y ^{⋆}) from the combined posterior predictive and posterior distribution, can lead to additional insight.
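Reading the posterior, the posterior predictive, and the probability of improvement off the same stored pairs might look like this (the function name is ours):

```python
def posterior_summaries(pairs, y_star, j=0, d=0):
    # pairs: the {theta, y} tuples stored at every Metropolis-Hastings step.
    # The same samples yield the marginal posterior of theta_d, the posterior
    # predictive of y_j, and the probability of matching or beating y_j*.
    theta_d = [theta[d] for theta, _ in pairs]
    y_j = [y[j] for _, y in pairs]
    p_improve = sum(1 for v in y_j if v <= y_star[j]) / len(y_j)
    return theta_d, y_j, p_improve
```

Pairing `theta_d` against `y_j` across samples gives the joint input-output posterior p(y _{ j },θ _{ d } | y ^{⋆}) mentioned above, at no extra simulation cost.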
Results and discussion
Case 1: stem-cell niche geometry in C. elegans
Minimizing the time it takes to develop an organ or to return to a desired steady state after perturbation is an important performance objective for biological systems [22, 23]. Control of the cycling speed of stem cells and of the timing of their differentiation is critical to optimize the dynamics of development and regeneration. This control is often exerted in part by stem cell niches. While stem cell niches are known to employ a number of molecular signals to communicate with stem cells [24], the impact of their geometry on stem cell behavior has received less attention. To begin to address this question, we ask here how niches should be shaped to minimize the amount of time to produce a given number of differentiated cells.
We consider a model organ inspired by the C. elegans germ line, which is similar to a number of other systems [25]. Cells reside within a tube-like structure; one end, defined by the niche, is closed, while the other is open and allows cells to exit. The set of possible positions that can be assumed by stem cells is constrained by the geometry of the niche; a dividing cell that is surrounded by neighbors pushes away one of its neighbors, which in turn might need to push away one of its own neighbors; cells pushed outside of the niche by one of these chain displacement reactions are forced to leave the cell cycle and differentiate. A simulator we developed tracks cell division and movement, and outputs the time it takes to produce N cells for a given geometry. This geometry is such that rows are defined along the main axis of the organ; each cell row has its own size, between 1 and 400 cells. There are several constraints put on the niche geometry to help the model remain realistic: the niche should hold fewer than 400 cells total, row size should monotonically increase along the niche axis, and the geometry should be “well-behaved” (i.e., there should not be large jumps in row size along the axis).
Experimental set-up
We performed several sets of experiments aimed at discovering the effects that realistic niche geometry constraints have on the time to 300 cells. We therefore define a single statistic y to be the time to N=300 cells for a niche of D rows; a niche geometry vector θ defines the simulator input parameters. In this study we set the number of rows in the niche to D=8. To enforce the monotonicity constraints, we define θ _{1}=1+g _{1} and θ _{ d }=θ _{ d−1}+g _{ d },∀d>1, i.e. we define niche geometries in terms of niche increment parameters g _{ d }≥0. With this set-up, we can change the prior constraints and observe the effects on the posterior predictive distribution p(y|y ^{⋆}).
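The increment parameterization above guarantees monotone geometries by construction; a small sketch (function names are ours):

```python
def geometry_from_increments(g):
    # Monotone niche geometry from non-negative increments:
    # theta_1 = 1 + g_1, theta_d = theta_{d-1} + g_d for d > 1
    theta = [1.0 + g[0]]
    for gd in g[1:]:
        theta.append(theta[-1] + gd)
    return theta


def satisfies_niche_prior(theta, tau_theta=400.0):
    # total-size constraint (tau_theta in the results table); monotonicity
    # holds by construction when all increments are non-negative
    return sum(theta) <= tau_theta
```

Sampling the g _{ d } from the prior and mapping them through `geometry_from_increments` thus only ever produces geometries satisfying the row-ordering constraint, leaving the total-size constraint as the remaining prior check.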
The likelihood is a one-sided kernel \(\pi \left (y^{\star }\,|\, y\right) \propto K_{\boldsymbol {\epsilon }_{y}}\left (y;\; y^{\star } \right)\), where ε _{ y }=0.01 (except for experiments D and E, below) and y ^{⋆}=27.05. For this problem we did not know y ^{⋆} a priori, so we ran 5 runs of marginal kernel ABC with S=1 and adapted y ^{⋆} (Algorithm 2). We set y ^{⋆}=27.05, the median value from the 5 runs (which produced values 26.99, 27.03, 27.05, 27.07, 27.28). The EMA approach to estimating y ^{⋆} was fairly robust for this problem: since the EMA produces a local average of y, any improvement upon y ^{⋆} must be consistently better. The parameter ε _{ y } can be interpreted as an error in the estimation of y ^{⋆}.
Stem-cell niche geometry experimental set-up and posterior predictive results
Experiment | M | y ^{⋆} | \(\tau _{g_{1}}\) | \(\tau _{g_{d}}\) | τ _{ θ } | Mean y | Median y | Mode y | P(y<27.05)
---|---|---|---|---|---|---|---|---|---
A | 1 | 27.05 | 399 | 399 | 400 | 27.042 | 27.037 | 27.029 | 0.53
A | 1 | 27.05 | 399 | 10 | 400 | 27.059 | 27.054 | 27.076 | 0.49
B | 1 | ∞ | 399 | 399 | 400 | 27.078 | 27.081 | 27.076 | 0.43
B | 1 | ∞ | 399 | 10 | 400 | 27.298 | 27.150 | 27.114 | 0.32
C | 1 | ∞ | 399 | 399 | 1500 | 30.159 | 30.184 | 30.224 | 0.00
C | 1 | 27.05 | 399 | 399 | 1500 | 27.322 | 27.227 | 27.150 | 0.24
D | 10 | 27.05 | 399 | 399 | 400 | 27.053 | 27.049 | 27.043 | 0.51
D | 10 | 27.05 | 399 | 10 | 400 | 27.056 | 27.053 | 27.050 | 0.47
Experiment A: realistic constraints on g _{ d }
Experiment B: removing constraint on time to 300 cells
Experiment C: increasing threshold on total niche cells
Experiment D: replacing statistics with average of replicates
Experiment E: sensitivity to ε _{ y }
Discussion
The results of experiments A-C demonstrate the relative importance of the input and output constraints on the posterior probability of y | y ^{⋆}. The most important constraints are \(\sum \theta _{d}\) and y<y ^{⋆}. Both have similar effects on the posterior predictive distribution. The constraint \(\tau _{g_{d}}\) has little effect on P(y<27.05 | y ^{⋆}), but does produce significantly different posterior geometries, mainly due to the prior constraints.
Experiments A-D illustrate the usefulness of POPE for exploring the roles constraints play in the optimization posterior. We found that the constraints on the prior over valid regions of θ had significant influence on the posterior, and played a role similar to the likelihood term. Using realistic constraints on changes in row sizes had very little detrimental effect on the time to 300 cells, compared to having no realistic constraint. More important was the constraint on total geometry size. We found very little difference in the posteriors when the statistics were averages of simulation replicates versus a single simulation. This makes sense if the simulation noise is taken into account when setting ε: when increasing the number of replicates in the average, ε should be decreased (from its setting at M=1) to account for the variance of the population mean, but this seems unnecessary since the posteriors change little while the number of simulations increases.
Experiment E explored the role ε _{ y } plays in POPE. We found that only large changes had significant effects on the posterior, though we emphasize that this is entirely problem specific. In the niche experiments, there is a large region of parameter space which satisfies all the constraints. For this problem, ε plays a less important role than in other problems where it is difficult to find any parameter values for which all constraints are satisfied. In those situations, ε plays a critical role in POPE since it enables mixing of the Markov chain. It is therefore useful to measure acceptance probabilities in a few preliminary runs to guide the scientist in setting ε. Once a satisfactory acceptance rate is achieved (e.g. 20 % to 40 %), one could fix ε and run experiments. Afterwards, samples that violate constraints can always be ignored in the analysis.
From a biological perspective, further simulation experiments with this stem-cell model could address whether giving cells some flexibility in the position at which they differentiate allows for more flexibility in the optimal geometry, perhaps allowing that geometry to also satisfy competing performance objectives. POPE offers a robust and consistent Bayesian framework for new experiments.
Case 2: spotted patterns in colon cancer tumors
A remarkable pattern of spots is visible in the tissue of colon cancer tumors when stained for markers indicating glycolytic activity. It is hypothesized that the spotted regions indicate localized areas of glycolytic cells, whereas surrounding areas are considered oxidative cells. Furthermore, it is thought that Wnt signaling (an important cell signaling pathway in development and healing) plays a critical role in reducing glycolytic activity [26], thereby resulting in significant changes in spot formation. Experiments blocking Wnt by overexpression of a dominant negative form of lymphoid enhancer factor (dnLEF-1) have shown that interfering with the Wnt pathway leads to fewer but larger spots and lighter background staining color than Mock tissue (tumors that have not received the dnLEF-1 intervention).
Based on these findings, a simulator of a mathematical model of reaction-diffusion equations was built that produces spatial and temporal dynamics of a population fraction of oxidative cells and glycolytic cells, as well as the activity of Wnt and a Wnt inhibitor. The Wnt and Wnt inhibitor equations are based on the Gierer-Meinhardt activator-inhibitor model, where Wnt is the activator which produces a factor that inhibits Wnt activity.
Simulation parameters θ for spotted patterns in colon cancer tumors
Parameter θ | Description | Mock | A | B | C | D | E |
---|---|---|---|---|---|---|---|
κ _{ W }>0 | Rate of nonlinear Wnt production | 4 | 0.442 | 0.951 | 2.44 | 0.399 | 0.315 |
\(\kappa _{W_{I}} > 0\) | Rate of Wnt inhibitor production | 1 | 27.4 | 0.484 | 0.161 | 0.486 | 0.188 |
μ _{ W }≥0 | Decay rate of Wnt | 2 | 0.642 | 0.179 | 0.791 | 0.545 | 0.936 |
\(\mu _{W_{I}} \geq 0\) | Decay rate of Wnt inhibitor | 4 | 2.36 | 1.30 | 1.10 | 0.569 | 1.064 |
a≥0 | Constant of inhibition | 10^{−8} | 0.4006 | 0.416 | 0.0384 | 0.00491 | 0.0284 |
b≥0 | Constant of inhibition by W _{ I } | 1 | 0.0125 | 7.94 | 20.05 | 0.616 | 0.640 |
S _{ W }≥0 | Rate of constitutive Wnt production | 1 | 0.00167 | 0.00351 | 17.75 | 0.00005 | 0.00009 |
1≥D _{ W }>0 | Diffusion coefficient of Wnt | 0.01 | 0.0180 | 0.00322 | 0.0955 | 0.0336 | 0.0810 |
1≥N>0 | Nutrient level | 1 | 0.818 | 0.897 | 0.984 | 0.959 | 0.970 |
Simulation statistics y for spotted patterns in colon cancer tumors
Statistic y | Feasible region | Mock (y ^{⋆}) | A | B | C | D | E |
---|---|---|---|---|---|---|---|
Avg. Spot Width | y _{1}>0.604 | 0.604 | 1 | 0.65 | 0.65 | 1 | 1.75 |
Number of Spots | y _{2}∈{2,3,4} | 5 | 3 | 4 | 2 | 3 | 2 |
Avg. Background | y _{3}<0.807 | 0.807 | 0.77 | 0.75 | 0.70 | 0.6 | 0.70 |
Avg. Wnt | y _{4}<5.67 | 5.67 | 3.25 | 1.50 | 0.75 | 1 | 2 |
The Mock values y ^{⋆} define the constraints on simulator statistics y. More precisely, they constrain the posterior to regions where \(\left [ y_{1} > y^{\star }_{1}\right ],\left [ y_{2} < y^{\star }_{2}\right ],\left [ y_{3} < y^{\star }_{3} \right ]\), and \(\left [ y_{4} < y^{\star }_{4} \right ]\), which correspond to the goal of producing different patterns from Mock. For example, the first constraint states that we want the spot widths from simulation to be greater than \(y^{\star }_{1} = 0.604\), the average width of spots for the Mock setting θ. Similarly, we want fewer than 5 spots, a background lighter than 0.807, and a Wnt level less than 5.67. Further constraints are added to avoid degenerate simulation results; for example, the likelihood is set to zero when no spots are detected.
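The combined constraint indicator for this case, including the no-spot degeneracy check, might look like the following sketch (the function name is ours, and the spot-count interval is simplified to the one-sided form of the constraints above):

```python
def mock_constraint_likelihood(y, y_star):
    # Hard one-sided likelihood for the tumor case: spot width must exceed the
    # Mock value, while spot count, background and Wnt level must fall below
    # theirs; degenerate simulations with no spots get zero likelihood
    width, n_spots, background, wnt = y
    if n_spots == 0:
        return 0.0
    ok = (width > y_star[0] and n_spots < y_star[1]
          and background < y_star[2] and wnt < y_star[3])
    return 1.0 if ok else 0.0
```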
This simulator is deterministic but expensive to evaluate, requiring roughly 30 seconds to complete for the 1D simulator used in our experiments, and 90 seconds for the 2D simulator, which was used only for generating 2D images. We ran 6 pseudo-marginal kernel ABC-MCMC chains of length 4000 with S=1. To initialize the chains, a short rejection sampling procedure was used to select θ _{0} for each random seed. This is necessary because many random configurations of θ result in degenerate simulation results (i.e. zero likelihood). Diffuse log-normal prior distributions were placed over θ _{1} to θ _{7} and weak Beta priors were put on D _{ W } and N. At least 100 initial samples were discarded from each chain; sometimes more if the chain had not yet reached a location where all the constraints were satisfied. In total there were 22257 samples in the posterior.
Discussion
This case study illustrates the usefulness of POPE for exploratory simulation analysis. As a first attempt at studying this simulator from an ABC perspective, POPE revealed several regions of parameter settings that produce qualitatively different images from Mock. Now experts can examine these various solutions to further develop the simulator or to increase the number of statistics. For example, some of the parameter settings in the posterior seem to be similar to the prior, indicating they have little influence on the posterior. If this does not match the intuition of the experts, the role these parameters have in the simulator can be re-evaluated. The J=4 statistics may also not be the most informative for the experts; based on our results, learning the statistics (using computer vision techniques applied to the images) or modifying the current statistics may improve the ability of the experts to learn more about the spot formation process.
This type of interaction between simulation model and cancer researchers is important; ongoing research with a modified version of this tumor metabolism simulator will include non-constant nutrient levels and various therapeutic regimes, which will improve our understanding of cancer metabolism, and in turn aid the development of new treatments or therapies.
Conclusions
In simulation-based science, simulators encode complex models of natural phenomena. Often scientists wish to find an optimal parameter setting—one that minimizes some cost function—for their simulator, subject to constraints on both the parameters and the other outputs of the simulator. However, a single optimal parameter setting, while useful, conveys limited information to scientists about parameter dependencies and sensitivities, and does not allow them to compare different models in terms of their goodness of fit. We have proposed simple extensions to likelihood-free inference that incorporate one-sided likelihood kernels into standard ABC algorithms, allowing scientists to run ABC post-optimization.
With POPE, scientists can answer these important questions about their optimized simulation model using a fully Bayesian approach. As a result, scientists can examine posterior predictive distributions and parameter correlations, and perform sensitivity analyses. These analyses could in turn uncover an “overfit” optimum, where minor changes to the parameters lead to dramatic changes in the cost function or quickly violate (biological) constraints. As a Bayesian inference procedure, POPE naturally incorporates parameter and simulator uncertainty, allowing it to be used to discover regions of parameter space that improve upon optimal settings.
We applied POPE to two case studies: one in an optimization setting (stem-cell niche geometry) and one in a non-optimization setting (spotting patterns in cancer tissue), showing its usefulness for general constraint-based likelihoods. These studies demonstrated that POPE naturally handles constraints on both the input parameters and the simulator output statistics, and works whether the simulator is very noisy or deterministic. The preliminary results on these case studies offer many avenues for future work.
It is natural to extend POPE with surrogate models so that it can be applied to expensive simulators. Although there is considerable excitement in the machine learning community about optimizing objectives that are hard to evaluate, such as those defined by simulators, there is almost no work on analyzing such problems “post optimization”. POPE is easily combined with black-box optimization using surrogates with Bayesian posterior inference.
Endnotes
^{1} Note that this is reminiscent of the “probability of improvement” used in Bayesian optimization [8]. However, that quantity differs because it includes the uncertainty of the surrogate function’s prediction of the value of y. In POPE, the posterior probability density is determined solely by the uncertainty due to noise in the simulation process.
^{2} We wish to clearly distinguish between the response models described here and a simulator as a model of natural phenomena. A response model is a conditional distribution of statistics y at parameter location θ and has its own set of parameters, such as mean and variance, or Gaussian process covariance parameters, that are of secondary interest and are useful computationally for inference. These should be distinguished from the simulator parameters θ, which are scientifically interesting in and of themselves.
Declarations
Acknowledgements
MC and OC thank support from the National Institutes of Health grants R01-GM102635 and R21-AG42125. JL, OC, MC, and ML thank the National Institutes for Health through grant P50-GM76516 for a Center of Excellence in Systems Biology at the University of California, Irvine, and P30-CA062203 for the Chao Comprehensive Cancer Center at the University of California, Irvine. JL also acknowledges partial support from the National Science Foundation, Division of Mathematics.
References
- Lizotte DJ. Practical Bayesian optimization. PhD thesis, University of Alberta; 2008.
- Osborne MA, Garnett R, Roberts SJ. Gaussian processes for global optimization. In: 3rd International Conference on Learning and Intelligent Optimization (LION3); 2009. p. 1–15.
- Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems 25; 2012. arXiv:1206.2944.
- Marjoram P, Molitor J, Plagnol V, Tavaré S. Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci. 2003;100(26):15324–8.
- Wilkinson R. Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Stat Appl Genet Mol Biol. 2013;12(2):129–42.
- Sisson SA, Fan Y. Likelihood-free Markov chain Monte Carlo. arXiv:1001.2058. 2010.
- Jones DR, Schonlau M, Welch WJ. Efficient global optimization of expensive black-box functions. J Glob Optim. 1998;13(4):455–92.
- Couckuyt I, Gorissen D, DeTurck F, Dhaene T. Inverse surrogate modeling: output performance space sampling. In: 13th AIAA/ISSMO Multidisciplinary Analysis Optimization Conference, Texas; 2010.
- Moral PD, Doucet A, O’Hagan A. Adaptive sequential Monte Carlo samplers. Technical report, University of Bordeaux; 2008.
- Andrieu C, Roberts G. The pseudo-marginal approach for efficient Monte Carlo computations. Ann Stat. 2009;37(2):697–725.
- Bornn L, Pillai N, Smith A, Woodard D. The use of a single pseudo-sample in approximate Bayesian computation. arXiv:1404.6298v4. 2014.
- Wood SN. Statistical inference for noisy nonlinear ecological dynamic systems. Nature. 2010;466(7310):1102–4.
- Turner BM, Sederberg PB. A generalized, likelihood-free method for posterior estimation. Psychon Bull Rev. 2014;21(2):227–50.
- Rasmussen CE. Gaussian processes to speed up hybrid Monte Carlo for expensive Bayesian integrals. Bayesian Stat. 2003;7:651–9.
- Kennedy M, O’Hagan A. Bayesian calibration of computer models (with discussion). J R Stat Soc Ser B. 2001;63:425–64.
- Wilkinson R. Accelerating ABC methods using Gaussian processes. AISTATS. 2014. arXiv:1401.1436v2.
- Meeds E, Welling M. GPS-ABC: Gaussian process surrogate approximate Bayesian computation. Uncertainty in AI. 2014. arXiv:1401.2838v1.
- Beaumont MA, Cornuet JM, Marin JM, Robert CP. Adaptive approximate Bayesian computation. Biometrika. 2009;96(4):983–90.
- Silk D, Filippi S, Stumpf MPH. Optimizing threshold-schedules for sequential approximate Bayesian computation: applications to molecular systems. Stat Appl Genet Mol Biol. 2013;12(5):603–18.
- Bortot P, Coles S, Sisson S. Inference for stereological extremes. J Am Stat Assoc. 2007;102:84–92.
- Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. 2nd ed. New York: Chapman and Hall/CRC; 2004. Chap. 6.
- Lander AD, Gokoffski KK, Man FYM, Nie Q, Calof AL. Cell lineages and the logic of proliferative control. PLoS Biol. 2009;7(1):608–19.
- Itzkovitz S, Blat IC, Jacks T, Clevers H, van Oudenaarden A. Optimality in the development of intestinal crypts. Cell. 2012;148(3):608–19.
- Li L, Xie T. Stem cell niche: structure and function. Annu Rev Cell Dev Biol. 2005;21:605–31.
- Cinquin O. Purpose and regulation of stem cells: a systems-biology view from the Caenorhabditis elegans germ line. J Pathol. 2009;217(2):608–19.
- Pate KT, Stringari C, Sprowl-Tanio S, Wang K, TeSlaa T, Hoverter NP, et al. Wnt signaling directs a metabolic program of glycolysis and angiogenesis in colon cancer. EMBO J. 2014. doi:10.15252/embj.201488598.
Copyright
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.