Volume 14 Supplement 10

## Selected articles from the 10th International Workshop on Computational Systems Biology (WCSB) 2013: Bioinformatics

# Mapping behavioral specifications to model parameters in synthetic biology

- Heinz Koeppl^{1, 2} (Email author)
- Marc Hafner^{3}
- James Lu^{4}

**14(Suppl 10)**:S9

**DOI:** 10.1186/1471-2105-14-S10-S9

© Koeppl et al; licensee BioMed Central Ltd. 2013

**Published:** 12 August 2013

## Abstract

With recent improvements of protocols for the assembly of transcriptional parts, synthetic biological devices can now be assembled more reliably according to a given design. The standardization of parts opens the way for *in silico* design tools that improve the construct and optimize devices with respect to given formal design specifications. The simplest such optimization is the selection of kinetic parameters and protein abundances such that the specified design constraints are robustly satisfied. In this work we address the problem of determining parameter values that fulfill specifications expressed in terms of a functional on the trajectories of a dynamical model. We solve this inverse problem by linearizing the forward operator that maps parameter sets to specifications, and then inverting it locally. This approach has two advantages over brute-force random sampling. First, the linearization approach allows us to map back intervals instead of points and second, every obtained value in the parameter region satisfies the specifications by construction. The method is general and can hence be incorporated in a pipeline for the rational forward design of arbitrary devices in synthetic biology.

## Introduction

Synthetic biology places emphasis on small, standardized molecular parts and devices, mostly operating at the transcriptional level [1, 2]. With standardization comes the need for rigorous quantitative characterization of such devices and for a compositional theory to reliably build larger systems from small canonical circuits. So far, most synthetic circuits implemented *in vivo* were constructed from a small number of components with topology and parameter values found by trial and error. The development of larger synthetic systems necessitates the use of appropriate design methodologies. *In silico* analyses can provide significant insights into the construction of complex synthetic systems, but due to the poor quantification of experimental and micro-environmental conditions, the predictive capability of *in silico* models for *in vivo* implementations remains limited. Apart from experimental limitations, modeling attempts to date most often make simplifying assumptions about the perturbations that a synthetic construct faces *in vivo*. For instance, only a few studies account for the large extrinsic noise [3–5] and in particular the one introduced by variations of plasmid copy number [6]. Incorporating those realistic *in vivo* constraints will make computational models more predictive, eventually enabling the upfront *in silico* optimization of transcriptional circuits. A first step toward this goal is to investigate the parameter dependency of certain behavioral properties of a circuit. In systems biology attempts have already been made to address this problem; however, they either rely on purely local measures [7, 8] such as those considered in classical sensitivity analysis [9, 10], or perform random parameter sampling [11] to determine parameter dependencies.

For a given circuit topology, kinetic parameters and other parameters that are involved in controlling the expression level of molecular species (e.g. promoter activity or number of ribosome binding sites) are important design parameters in synthetic biology. A major challenge is to find a set of parameters that satisfies the behavioral specification of a device [12]. Computer science offers various languages to formally define the proper functioning of a piece of code or hardware. Such specification languages of formal verification are used to check important behavioral properties, such as liveness, safety or fairness [13]. One convenient way to specify such properties is to use *temporal logic*, which is considered an extension of classical propositional reasoning, where propositional variables may change their truth values over time. A prominent example is linear temporal logic (LTL), where the truth value of the propositions is interpreted over a linear timeline [13]. Such techniques have already been applied to investigate the robustness of computational models in systems biology [14].

In this work we express behavioral specifications through *specification functionals*. These are mappings $\psi$ from an appropriate function space $\mathcal{X}$ of $n$-dimensional trajectories (e.g. $L_2([0,T],\mathbb{R}^n)$) to the $m$-dimensional reals, which form the feature space $\mathcal{F}$, and we choose the form

$$\psi(x) = \int_0^T g(t, x(t))\,\mathrm{d}t,$$

with a kernel $g$; a convolution-type kernel, for example, reads $g(t, x(t)) = h(T - t)\,x(t)$. In the following we will only require the map $x \to g(\cdot, x)$ to be once-differentiable. With this, we can define the forward map from a $p$-dimensional parameter space to the feature space as the composition $F \equiv \psi \circ \phi$, with $\phi: \mathbb{R}^p \to \mathcal{X}$. The trajectories $x \in \mathcal{X}$ are generated by the reaction rate equation

$$\frac{\mathrm{d}x(t)}{\mathrm{d}t} = N\,v(x(t), k) \qquad (1)$$

with the stoichiometric matrix $N \in \mathbb{Z}^{n \times q}$, the reaction flux vector $v: \mathbb{R}_{\ge 0}^{n} \times \mathbb{R}_{\ge 0}^{p} \to \mathbb{R}_{\ge 0}^{q}$ and $k \in \mathbb{R}_{\ge 0}^{p}$ the parameter set.
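The composition of a trajectory-generating map with a specification functional can be sketched as follows. This is a minimal illustration, not the sensor model of the paper: the two-species network, the weight `h(s) = exp(-s)`, the parameter values, the RK4 integrator and all function names are assumptions made for the example.

```python
import math

# Hypothetical two-species network: x[0] = mRNA, x[1] = protein.
# Columns of N: transcription, mRNA decay, translation, protein decay.
N = [[1, -1, 0, 0],
     [0, 0, 1, -1]]

def flux(x, k):
    """Mass-action flux vector v(x, k) for the toy network."""
    return [k[0], k[1] * x[0], k[2] * x[0], k[3] * x[1]]

def rhs(x, k):
    """Right-hand side N v(x, k) of the reaction rate equation (1)."""
    v = flux(x, k)
    return [sum(n_ij * v_j for n_ij, v_j in zip(row, v)) for row in N]

def simulate(k, x0=(0.0, 0.0), T=20.0, steps=2000):
    """phi: parameter set -> sampled trajectory, using classical RK4."""
    dt = T / steps
    ts, xs = [0.0], [list(x0)]
    x = list(x0)
    for i in range(steps):
        a = rhs(x, k)
        b = rhs([xi + 0.5 * dt * ai for xi, ai in zip(x, a)], k)
        c = rhs([xi + 0.5 * dt * bi for xi, bi in zip(x, b)], k)
        d = rhs([xi + dt * ci for xi, ci in zip(x, c)], k)
        x = [xi + dt / 6.0 * (ai + 2 * bi + 2 * ci + di)
             for xi, ai, bi, ci, di in zip(x, a, b, c, d)]
        ts.append((i + 1) * dt)
        xs.append(x)
    return ts, xs

def psi(ts, xs):
    """psi: trajectory -> features, trapezoidal rule for the integral of
    g(t, x(t)) = h(T - t) x(t) with the assumed weight h(s) = exp(-s)."""
    T = ts[-1]
    feats = [0.0] * len(xs[0])
    for j in range(len(ts) - 1):
        dt = ts[j + 1] - ts[j]
        w0, w1 = math.exp(-(T - ts[j])), math.exp(-(T - ts[j + 1]))
        for i in range(len(feats)):
            feats[i] += 0.5 * dt * (w0 * xs[j][i] + w1 * xs[j + 1][i])
    return feats

def forward(k):
    """Forward map F = psi o phi from parameters to features."""
    return psi(*simulate(k))

k_nominal = [2.0, 1.0, 1.0, 0.5]   # illustrative values only
print(forward(k_nominal))
```

Here the weight is applied to each state component separately, so the example has as many features as states; other kernels would only change `psi`.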

## Methods

The brute-force method of determining the parameter region that satisfies a certain behavioral specification $S\subseteq \mathcal{F}$ usually proceeds by Monte Carlo sampling of parameter sets, generating the corresponding trajectories according to (1), checking whether those satisfy *S* and finally retaining only those parameter sets that satisfy the specification *S*. There are two immediate downsides of this approach. First, most draws will be unsuccessful for high dimensional parameter spaces, for tight specifications, or for both. Different approaches using an optimized sampling [11, 17] have been developed to mitigate this problem, but they do not solve it as they require convergence of the sampling. Second, drawing parameter points in ${\mathbb{R}}^{p}$ does not provide guarantees that those points belong to a connected domain of consistent parameter sets. Here we provide first attempts to tackle both problems.
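The brute-force procedure described above can be sketched in a few lines. To keep the example short, a cheap analytic map stands in for solving (1); this map, the box-shaped specification and the uniform parameter ranges are all hypothetical choices, picked only to show how few draws survive a tight specification.

```python
import random

def forward(k):
    """Hypothetical stand-in for F: a cheap analytic map instead of solving (1)."""
    return (k[0] + k[1], k[0] * k[1])

def satisfies(f, spec):
    """Check whether the feature point f lies in the box-shaped specification S."""
    return all(lo <= fi <= hi for fi, (lo, hi) in zip(f, spec))

def monte_carlo(n_draws, spec, bounds, seed=0):
    """Draw parameter sets uniformly and keep those whose features lie in S."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        k = [rng.uniform(lo, hi) for lo, hi in bounds]
        if satisfies(forward(k), spec):
            accepted.append(k)
    return accepted

spec = [(0.9, 1.1), (0.2, 0.25)]       # assumed specification box S
bounds = [(0.0, 2.0), (0.0, 2.0)]      # assumed sampling ranges for k
n_draws = 20000
accepted = monte_carlo(n_draws, spec, bounds)
print(f"accepted {len(accepted)} of {n_draws} draws")
```

The accepted points come back as an unordered list, which illustrates the second downside: nothing in the output says whether they belong to one connected region.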

The approach we propose is to linearize the forward map $F$ around some point and then locally invert it. Hence, a small enough local patch in feature space can be mapped backward to a small patch in parameter space. By successively sampling expansion points in their neighborhoods (e.g. by the ball-walk algorithm [18]) we can systematically cover the entire specification $S$ and obtain the corresponding parameter region. A series expansion of $F$ around some initial parameter set $k^0$ reads

$$F(k^0 + \mathrm{d}k) = F(k^0) + \left.\frac{\partial F}{\partial k}\right|_{k^0} \mathrm{d}k + \mathcal{O}(\|\mathrm{d}k\|^2). \qquad (2)$$

With $\mathrm{d}f \equiv F(k^0 + \mathrm{d}k) - F(k^0)$ we see that a neighborhood $\mathrm{d}f$ in feature space to first order can be mapped backward using the Moore-Penrose pseudo-inverse

$$\mathrm{d}k = L^{\dagger}\,\mathrm{d}f,$$

where $L$ denotes the linearized forward map and hence is just the $m \times p$ matrix

$$L \equiv \left.\frac{\partial F}{\partial k}\right|_{k^0}. \qquad (3)$$

The pseudo-inverse $L^{\dagger}$ is defined even if the inverses of $L^{T}L$ and $LL^{T}$ do not exist. Such situations are encountered as soon as the number of specification features $m$ is less than the number of parameters, i.e. the dimension $p$ of the parameter space. Importantly, we can compute (3) efficiently using the variational equation for the system (1). Observe that

$$L = \int_0^T \frac{\partial g}{\partial x}\,S(t)\,\mathrm{d}t, \qquad S(t) \equiv \frac{\partial x(t)}{\partial k},$$

where $S(t)$ is the sensitivity of the trajectory with respect to $k$ around $k^0$. According to the variational equation the sensitivity obeys the following ordinary $n \times p$ matrix differential equation

$$\frac{\mathrm{d}S(t)}{\mathrm{d}t} = N \frac{\partial v}{\partial x}\,S(t) + N \frac{\partial v}{\partial k}, \qquad (4)$$

where we omit the dependence on $k^0$ for brevity. Note that (4) is equivalent to the transient sensitivity analysis of metabolic networks [9, 10], proposed as an extension of classical metabolic control analysis that only deals with steady state sensitivities. For a certain $k^0$ the sensitivity of the kernel $g$ is a constant $m \times n$ matrix that can be computed explicitly. Thus, by jointly solving (1) and (4) for some $k^0$ together with the integral for $L$ up to time $T$ we obtain the linearized map $L = L(T)$. Hence, for every sampled $k^0$ and associated feature point $f^0$ we propose to design a feature ball

$$\mathcal{B}_{f^0}(\delta) \equiv \{ f \in \mathcal{F} : \|f - f^0\| \le \delta \}$$

and to map it backward with $L^{\dagger}$. According to the singular value decomposition $L^{\dagger} = U \Sigma V$ with $\Sigma$ a diagonal matrix with non-negative entries [16], the backward transformation needs to be a sequence of a rotation, a scaling and another rotation and hence the image of $\mathcal{B}_{f^0}$ under $L^{\dagger}$ can only be an ellipsoid in the parameter space

$$\mathcal{B}_{k^0} \equiv \{ k^0 + L^{\dagger}(f - f^0) : f \in \mathcal{B}_{f^0}(\delta) \}.$$
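Jointly solving the state equation (1) and the sensitivity equation (4) to obtain $L = L(T)$ can be sketched for a one-species birth-death network. The network, the weight `h`, the horizon `T` and the zero initial conditions (in particular `S(0) = 0`, i.e. an initial state that does not depend on the parameters) are assumptions of this example, and the sensitivity equation is written in the standard variational form $\dot{S} = N\,\partial v/\partial x\,S + N\,\partial v/\partial k$.

```python
import math

T = 5.0  # time horizon (assumed)

def h(s):
    """Assumed convolution weight, so that g(t, x) = h(T - t) x."""
    return math.exp(-s)

def rhs(t, z, k):
    """Joint right-hand side for the state x, sensitivities S and accumulators.

    Toy birth-death network (hypothetical): N = [1, -1], v = [k1, k2 * x].
    """
    x, s1, s2, l1, l2, f = z
    w = h(T - t)                 # dg/dx for the convolution kernel
    return [
        k[0] - k[1] * x,         # (1): dx/dt = N v(x, k)
        -k[1] * s1 + 1.0,        # (4): dS/dt = N dv/dx S + N dv/dk, column k1
        -k[1] * s2 - x,          # (4): column k2
        w * s1,                  # dL/dt = dg/dx * S, column k1
        w * s2,                  # dL/dt = dg/dx * S, column k2
        w * x,                   # df/dt = g(t, x), accumulates the feature
    ]

def integrate(k, steps=2000):
    """Classical RK4 from zero initial conditions (x(0) = 0 and S(0) = 0)."""
    z, dt = [0.0] * 6, T / steps
    for i in range(steps):
        t = i * dt
        a = rhs(t, z, k)
        b = rhs(t + dt / 2, [zi + dt / 2 * ai for zi, ai in zip(z, a)], k)
        c = rhs(t + dt / 2, [zi + dt / 2 * bi for zi, bi in zip(z, b)], k)
        d = rhs(t + dt, [zi + dt * ci for zi, ci in zip(z, c)], k)
        z = [zi + dt / 6 * (ai + 2 * bi + 2 * ci + di)
             for zi, ai, bi, ci, di in zip(z, a, b, c, d)]
    return z

def forward_and_jacobian(k):
    """Return the feature F(k) and the 1 x 2 linearized map L = L(T)."""
    z = integrate(k)
    return z[5], [z[3], z[4]]

k0 = [2.0, 1.0]
f0, L = forward_and_jacobian(k0)

# Cross-check against central finite differences of F.
eps = 1e-5
L_fd = []
for j in range(2):
    kp, km = list(k0), list(k0)
    kp[j] += eps
    km[j] -= eps
    L_fd.append((forward_and_jacobian(kp)[0] - forward_and_jacobian(km)[0]) / (2 * eps))
print(L, L_fd)
```

One integration pass yields both the feature and its full parameter sensitivity, whereas the finite-difference check needs two extra simulations per parameter.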

Clearly, sampling a multivariate region with balls of the same dimension allows for a complete coverage of the region, something that can only be extrapolated when using pointwise sampling [11]. The question of how to efficiently sample a region with balls has been addressed in computational geometry and efficient randomized algorithms are available [18].
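Mapping a feature ball backward through the pseudo-inverse can be illustrated with a small linear example. The matrix `L` below is a made-up linearization with $m = 2$ features and $p = 3$ parameters, and the closed form $L^{T}(LL^{T})^{-1}$ is used for the pseudo-inverse, which assumes linearly independent rows.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def pinv_full_row_rank(L):
    """Pseudo-inverse L^T (L L^T)^{-1}; valid when the two rows of L are
    linearly independent."""
    Lt = transpose(L)
    return matmul(Lt, inverse_2x2(matmul(L, Lt)))

# Hypothetical linearized map with m = 2 features and p = 3 parameters.
L = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 2.0]]
L_pinv = pinv_full_row_rank(L)

delta = 0.1                 # radius of the feature ball (assumed)
k0 = [1.0, 1.0, 1.0]        # expansion point in parameter space (assumed)
ellipse, residuals = [], []
for j in range(36):
    angle = 2 * math.pi * j / 36
    # Point on the boundary of the feature ball, as a column vector.
    df = [[delta * math.cos(angle)], [delta * math.sin(angle)]]
    dk = matmul(L_pinv, df)                       # backward map
    ellipse.append([k0_i + dk_i[0] for k0_i, dk_i in zip(k0, dk)])
    back = matmul(L, dk)                          # forward again, linear model
    residuals.append(max(abs(back[i][0] - df[i][0]) for i in range(2)))
print(max(residuals))
```

The points in `ellipse` trace an ellipse inside a two-dimensional plane of the three-dimensional parameter space; the pseudo-inverse selects the minimum-norm preimage, so directions that leave the features unchanged to first order are not explored.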

Over a neighborhood of finite size, however, $L$ is not the best local approximation to $F(k)$ in some norm sense. More specifically, we can improve on $L$ if we are given additional samples of the neighborhood ${\mathcal{B}}_{{f}^{0}}\left(\delta \right)$. Suppose we draw another ${k}^{i}\in {\mathcal{B}}_{{k}^{0}}$; then we can construct a rank-one update to $L$

$$\tilde{L}^{i} = L + \frac{(\Delta F - L\,\Delta k)\,\Delta k^{T}}{\Delta k^{T} \Delta k}, \qquad (5)$$

with $\Delta F \equiv F(k^{i}) - F(k^{0})$ and $\Delta k \equiv k^{i} - k^{0}$. In particular, the rank-one term in (5) captures the nonlinear part of $F$. From (5) it follows that the matrix ${\tilde{L}}^{i}$ satisfies the consistency property

$$\tilde{L}^{i}\,\Delta k = \Delta F \qquad (6)$$

and thus approximates $F(k)$ locally. In fact, ${\tilde{L}}^{i}$ is the matrix closest to $L$, with respect to the Frobenius norm, that satisfies (6). Subsequently we will use this improved linear approximation to $F$ to bound the error that one incurs if one uses the pseudoinverse $L^{\dagger}$ for the backward map. This will also provide means to determine the maximal ball size $\delta$ to stay below a certain error bound. We quantify the error in the feature space by the backward map followed by a forward map. That is, we want to find a $\delta$ such that

$$\| F(k^{0} + L^{\dagger}(f - f^{0})) - f \| \le \varepsilon \qquad (7)$$

for all $f\in {\mathcal{B}}_{{f}^{0}}\left(\delta \right).$
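The rank-one correction and its consistency property can be checked numerically. The sketch below assumes that the update takes the standard Broyden form $\tilde{L} = L + (\Delta F - L\Delta k)\Delta k^{T} / (\Delta k^{T}\Delta k)$, which is the Frobenius-closest matrix to $L$ that maps $\Delta k$ to $\Delta F$; the forward map and the sample points are hypothetical.

```python
import math

def forward(k):
    """Hypothetical nonlinear forward map F: R^2 -> R^2 (not the sensor model)."""
    return [k[0] * k[1], k[0] + k[1] ** 2]

def jacobian(k):
    """Exact Jacobian of the toy map, playing the role of L at k0."""
    return [[k[1], k[0]], [1.0, 2.0 * k[1]]]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rank_one_update(L, k0, ki):
    """Broyden-type update: L + (dF - L dk) dk^T / (dk^T dk)."""
    dk = [a - b for a, b in zip(ki, k0)]
    dF = [a - b for a, b in zip(forward(ki), forward(k0))]
    r = [a - b for a, b in zip(dF, matvec(L, dk))]   # nonlinear remainder
    nrm2 = sum(d * d for d in dk)
    return [[L[i][j] + r[i] * dk[j] / nrm2 for j in range(len(dk))]
            for i in range(len(r))]

k0, ki = [1.0, 1.0], [1.2, 0.9]      # expansion point and one extra sample
L = jacobian(k0)
L_tilde = rank_one_update(L, k0, ki)

# Consistency check: L_tilde maps dk exactly onto dF.
dk = [a - b for a, b in zip(ki, k0)]
dF = [a - b for a, b in zip(forward(ki), forward(k0))]
secant_residual = max(abs(a - b) for a, b in zip(matvec(L_tilde, dk), dF))

# Size of the correction in the Frobenius norm.
frobenius = math.sqrt(sum((L_tilde[i][j] - L[i][j]) ** 2
                          for i in range(2) for j in range(2)))
print(secant_residual, frobenius)
```

The Frobenius norm of the correction equals the norm of the nonlinear remainder divided by the norm of $\Delta k$, which makes it a natural sampled estimate of how nonlinear $F$ is along that direction.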

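Assume a bound $\rho(\delta)$ for the Frobenius norm of the rank-one perturbation, i.e. $\|\tilde{L} - L\|_{F} \le \rho(\delta)$ in the local domain of interest. Note that $\rho(\delta)$ can, and needs to, be estimated by sampling. Given an ${f}^{i}\in {\mathcal{B}}_{{f}^{0}}\left(\delta \right)$ with $\mathrm{d}f \equiv f^{i} - f^{0}$, the maximal error of the inverse-forward map is

$$\| \tilde{L} L^{\dagger}\,\mathrm{d}f - \mathrm{d}f \| .$$

If $L$ has linearly independent rows, $LL^{\dagger}$ is the identity matrix and thereby the error simplifies to

$$\| (\tilde{L} - L) L^{\dagger}\,\mathrm{d}f \| \le \rho(\delta)\,\|L^{\dagger}\|\,\delta ,$$

which bounds the admissible ball size $\delta$ when relying on the pseudo-inverse

$$\rho(\delta)\,\|L^{\dagger}\|\,\delta \le \varepsilon .$$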
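Choosing the ball size from a sampled round-trip error can be sketched as follows. Instead of the bound via $\rho(\delta)$, this example measures the error of the backward map followed by the forward map directly on the boundary of the ball and halves $\delta$ until a tolerance is met. The forward map, the tolerance and the halving rule are assumptions of the example; with $m = p = 2$ the pseudo-inverse reduces to the ordinary inverse.

```python
import math

def forward(k):
    """Hypothetical nonlinear forward map F: R^2 -> R^2."""
    return [k[0] * k[1], k[0] + k[1] ** 2]

def jacobian_inverse(k):
    """Inverse of the 2 x 2 Jacobian; equals the pseudo-inverse when m = p."""
    a, b, c, d = k[1], k[0], 1.0, 2.0 * k[1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def roundtrip_error(k0, delta, n_angles=72):
    """Largest feature-space error of the backward map followed by the
    forward map, probed on the boundary of the ball of radius delta."""
    f0, L_inv = forward(k0), jacobian_inverse(k0)
    worst = 0.0
    for j in range(n_angles):
        a = 2 * math.pi * j / n_angles
        df = [delta * math.cos(a), delta * math.sin(a)]
        k = [k0[i] + L_inv[i][0] * df[0] + L_inv[i][1] * df[1] for i in range(2)]
        f = forward(k)
        err = math.hypot(f[0] - f0[0] - df[0], f[1] - f0[1] - df[1])
        worst = max(worst, err)
    return worst

def largest_radius(k0, tol, delta=0.5):
    """Halve delta until the sampled round-trip error is at most tol."""
    while roundtrip_error(k0, delta) > tol:
        delta /= 2
    return delta

k0, tol = [1.0, 1.0], 1e-3
delta = largest_radius(k0, tol)
print(delta, roundtrip_error(k0, delta))
```

Because the error is only probed at finitely many boundary points, the returned radius is an estimate rather than a guarantee; a denser probe or interior samples would tighten it at higher cost.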

## Results

In the model (9) of the synthetic sensor construct, the states $x_i$, $i = 1, \dots, 4$, denote the concentrations of mRNA, protein, protein-dimer and dimer-promoter complex, respectively. The quantities ${x}_{5}^{0}$ and $y(t)$ refer to the total number of promoters and the external inhibitor concentration, respectively. The nominal values and the meaning of the model parameters are summarized in Table 1. We remark that such continuous state-space models have their limitations for transcriptional circuits because they require several gene copies in order to neglect the discrete Boolean nature of a single gene.

**Table 1.** Nominal values and meaning of the kinetic parameters for the model of the synthetic sensor construct.

| Parameter | Meaning | Nominal value |
|---|---|---|
| $k_1$ | Basal transcription rate | 0.02 sec$^{-1}$ |
| $k_2$ | Active-promoter transcription rate | 0.4 sec$^{-1}$ |
| $k_3$ | mRNA degradation rate | 0.3 sec$^{-1}$ |
| $k_4$ | Protein translation rate | 3 (nM sec)$^{-1}$ |
| $k_5$ | Dimerization rate | 0.1 (nM sec)$^{-1}$ |
| $k_6$ | Dimer dissociation rate | 0.001 sec$^{-1}$ |
| $k_7$ | Inhibitor binding rate | 0.011 (nM sec)$^{-1}$ |
| $k_8$ | Inhibitor unbinding rate | 0.2 sec$^{-1}$ |
| $k_9$ | Dimer-promoter binding rate | 0.21 (nM sec)$^{-1}$ |
| $k_{10}$ | Dimer-promoter unbinding rate | 0.2 sec$^{-1}$ |
| $k_{11}$ | Protein degradation rate | 0.2 sec$^{-1}$ |

The specification comprises target values $x^{*}(s)$ for the monomer and the dimer concentration over two small time intervals for each. More specifically, […] where $w$ is the temporal weight function chosen to be […]. The weight functions $w_1$ and $w_2$, as well as the target values, are shown together with the trajectories for the nominal system (9) in Figure 3.

The two parameters considered are the inhibitor binding rate $k_7$ and the binding rate of the dimer to the promoter $k_9$. To assess the error incurred by the linearization we consider the reverse-forward mapping as described in (7). Hence, for various sizes of $\delta$ we perform the inverse mapping with $L^{\dagger}$ and the forward mapping with $F$. If the inverse map is exact we should obviously obtain a ball with the same $\delta$. Any deviation $\varepsilon$ thereof reflects the approximation of $F^{-1}$ by $L^{\dagger}$. In Figure 4 the images of ${\mathcal{B}}_{{f}^{0}}\left(\delta \right)$ under $L^{\dagger}$ and $F \circ L^{\dagger}$ are shown for various radii $\delta$.

For a suitably chosen $\delta$ a good trade-off between approximation accuracy and sampling coverage is achievable. A systematic sampling of a predetermined specification area $S$ would proceed by successively sampling overlapping balls with radii adapted to maintain $\varepsilon$ under a certain value as illustrated in Figure 5. In this example, the coverage of the region $S$ is above 98% using 50 balls of different radii. The lower left corner of the specification space (Figure 5A) maps to a strongly nonlinear region of the parameter space (upper right corner in Figure 5B) and therefore forces the use of smaller balls to keep the error in an acceptable range. In contrast, the upper right region of the specification space is more linear and larger balls can be used with limited relative error (Figure 5C).
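Covering a specification region with balls of varying radius, and estimating the resulting coverage, can be sketched as below. This is only a schematic: the region is taken to be the unit square, ball centers are picked at random among still-uncovered probe points rather than by the ball-walk algorithm, and `radius_at` is an invented rule that merely mimics smaller balls near the lower left corner. In the actual method the radius would follow from the error criterion on $\varepsilon$.

```python
import random

def radius_at(f):
    """Hypothetical radius rule: smaller balls near the lower left corner of
    S = [0, 1]^2, larger ones toward the upper right."""
    return 0.05 + 0.15 * (f[0] + f[1]) / 2.0

def cover(n_balls, n_probe=4000, seed=0):
    """Place up to n_balls balls on uncovered probe points and return the
    balls together with the fraction of probe points they cover."""
    rng = random.Random(seed)
    probes = [(rng.random(), rng.random()) for _ in range(n_probe)]
    covered = [False] * n_probe
    balls = []
    for _ in range(n_balls):
        uncovered = [p for p, c in zip(probes, covered) if not c]
        if not uncovered:
            break
        center = rng.choice(uncovered)
        r = radius_at(center)
        balls.append((center, r))
        for i, p in enumerate(probes):
            if (p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2 <= r * r:
                covered[i] = True
    return balls, sum(covered) / n_probe

balls, coverage = cover(50)
print(len(balls), coverage)
```

The coverage figure here is a Monte Carlo estimate over the probe points and depends entirely on the invented radius rule, so it should not be compared with the percentage reported for Figure 5.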

## Conclusion

We presented a novel method to determine the parameter region of a biochemical reaction network that is consistent with a certain dynamical, behavioral specification. We defined specifications in a novel and general way that requires only the specification map to be once differentiable with respect to the states of the underlying differential equations. We showed that by locally linearizing this map we can solve the desired inverse problem of finding a parameter region for a given specification. As regions, instead of points, are mapped back to parameter space, the scheme is in principle able to cover (given some regularity conditions) the feature and parameter space, something that is not possible with point-wise sampling. We also discussed means for estimating the size of the local neighborhood in order to guarantee certain approximation errors. The computational framework allows a very flexible definition of biologically relevant behavioral features and an efficient determination of the corresponding parameter region. Hence, the range of experimentally modifiable parameters, such as promoter binding strength, can be determined upfront before the experimental synthesis of a synthetic construct.

Throughout this work we only considered models based on ordinary differential equations, but the outlined framework can be extended to include stochastic dynamical models through the use of moment closure methods, for instance. In general, the specification functional will then involve the expectation operator and Monte Carlo sampling may be required to approximate it. Methods from stochastic sensitivity analysis [20] can be applied in order to perform the local inversion.

## Declarations

Publication of this article was supported by the Swiss National Science Foundation (SNSF) grant number PP00P2_128503.

This article has been published as part of BMC Bioinformatics Volume 14 Supplement 10, 2013: Selected articles from the 10th International Workshop on Computational Systems Biology (WCSB) 2013: Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S10.

## References

1. Nandagopal N, Elowitz MB: Synthetic biology: integrated gene circuits. Science (New York, NY). 2011, 333 (6047): 1244-8. doi:10.1126/science.1207084.
2. Lu TK, Khalil AS, Collins JJ: Next-generation synthetic gene networks. Nature Biotechnology. 2009, 27 (12): 1139-50. doi:10.1038/nbt.1591.
3. Bowsher CG, Swain PS: Identifying sources of variation and the flow of information in biochemical networks. Proceedings of the National Academy of Sciences of the United States of America. 2012, 109 (20): E1320-8. doi:10.1073/pnas.1119407109.
4. Hilfinger A, Paulsson J: Separating intrinsic from extrinsic fluctuations in dynamic biological systems. Proceedings of the National Academy of Sciences of the United States of America. 2011, 108 (29):
5. Zechner C, Ruess J, Krenn P, Pelet S, Peter M, Lygeros J, Koeppl H: Moment-based inference predicts bimodality in transient gene expression. Proceedings of the National Academy of Sciences of the United States of America. 2012, 109 (21): 8340-5. doi:10.1073/pnas.1200161109.
6. Bleris L, Xie Z, Glass D, Adadey A, Sontag E, Benenson Y: Synthetic incoherent feedforward circuits show adaptation to the amount of their genetic template. Molecular Systems Biology. 2011, 7 (519): 519.
7. Brown KS, Sethna JP: Statistical mechanical approach to models with many poorly known parameters. Physical Review E. 2003, 68: 021904.
8. Gutenkunst RN, Waterfall JJ, Casey FP, Brown KS, Myers CR, Sethna JP: Universally Sloppy Parameter Sensitivities in Systems Biology Models. PLoS Comput Biol. 2007, 3 (10): e189. doi:10.1371/journal.pcbi.0030189.
9. Hatzimanikatis V, Bailey JE: MCA has more to say. Journal of Theoretical Biology. 1996, 182 (3): 233-42. doi:10.1006/jtbi.1996.0160.
10. Ingalls BP, Sauro HM: Sensitivity analysis of stoichiometric networks: an extension of metabolic control analysis to non-steady state trajectories. Journal of Theoretical Biology. 2003, 222: 23-36. doi:10.1016/S0022-5193(03)00011-0.
11. Hafner M, Koeppl H, Hasler M, Wagner A: 'Glocal' robustness analysis and model discrimination for circadian oscillators. PLoS Computational Biology. 2009, 5 (10): e1000534. doi:10.1371/journal.pcbi.1000534.
12. Miller M, Hafner M, Sontag E, Davidsohn N, Subramanian S, Purnick PEM, Lauffenburger D, Weiss R: Modular Design of Artificial Tissue Homeostasis: Robust Control through Synthetic Cellular Heterogeneity. PLoS Comput Biol. 2012, 8 (7): e1002579. doi:10.1371/journal.pcbi.1002579.
13. Baier C, Katoen JP: Principles of Model Checking. 2008, The MIT Press, London.
14. Rizk A, Batt G, Fages F, Soliman S: Continuous valuations of temporal logic specifications with applications to parameter optimization and robustness measures. Theoretical Computer Science. 2011, 412 (26): 2827-2839. doi:10.1016/j.tcs.2010.05.008.
15. Engl H, Hanke M, Neubauer A: Regularization of inverse problems, Mathematics and its applications. 1996, Kluwer.
16. Hansen PC: Rank-Deficient and Discrete Ill-Posed Problems. 1998, Society for Industrial and Applied Mathematics.
17. Zamora-Sillero E, Hafner M, Ibig A, Stelling J, Wagner A: Efficient characterization of high-dimensional parameter spaces for systems biology. BMC Systems Biology. 2011, 5: 142. doi:10.1186/1752-0509-5-142.
18. Vempala S: Geometric Random Walks: A Survey. Combinatorial and Computational Geometry. 2005, 52: 573-612.
19. Hooshangi S, Thiberge S, Weiss R: Ultrasensitivity and noise propagation in a synthetic transcriptional cascade. Proceedings of the National Academy of Sciences of the United States of America. 2005, 102 (10): 3581-6. doi:10.1073/pnas.0408507102.
20. Sheppard P, Rathinam M, Khammash M: A pathwise-derivative approach to the computation of parameter sensitivities in discrete stochastic chemical systems. Journal of Chemical Physics. 2012, 136: 034115. doi:10.1063/1.3677230.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.