Mapping behavioral specifications to model parameters in synthetic biology

With recent improvements in protocols for the assembly of transcriptional parts, synthetic biological devices can now be assembled more reliably according to a given design. The standardization of parts opens the way for in silico design tools that improve constructs and optimize devices with respect to given formal design specifications. The simplest such optimization is the selection of kinetic parameters and protein abundances such that the specified design constraints are robustly satisfied. In this work we address the problem of determining parameter values that fulfill specifications expressed in terms of a functional on the trajectories of a dynamical model. We solve this inverse problem by linearizing the forward operator that maps parameter sets to specifications and then inverting it locally. This approach has two advantages over brute-force random sampling. First, the linearization allows us to map back intervals instead of points, and second, every value obtained in the parameter region satisfies the specifications by construction. The method is general and can hence be incorporated into a pipeline for the rational forward design of arbitrary devices in synthetic biology.


Introduction
Synthetic biology places emphasis on small, standardized molecular parts and devices, mostly operating at the transcriptional level [1,2]. With standardization comes the need for rigorous quantitative characterization of such devices and for a compositional theory to reliably build larger systems from small canonical circuits. So far, most synthetic circuits implemented in vivo have been constructed from a small number of components, with topology and parameter values found by trial and error. The development of larger synthetic systems necessitates appropriate design methodologies. In silico analyses can provide significant insights into the construction of complex synthetic systems, but due to the poor quantification of experimental and micro-environmental conditions, the predictive capability of in silico models for in vivo implementations remains limited. Apart from experimental limitations, modeling attempts to date mostly make simplifying assumptions about the perturbations that a synthetic construct faces in vivo. For instance, only a few studies account for the large extrinsic noise [3][4][5], in particular the noise introduced by variations in plasmid copy number [6]. Incorporating such realistic in vivo constraints will make computational models more predictive, eventually enabling the upfront in silico optimization of transcriptional circuits. A first step toward this goal is to investigate how certain behavioral properties of a circuit depend on its parameters. In systems biology, attempts have already been made to address this problem; however, they either rely on purely local measures [7,8], such as those considered in classical sensitivity analysis [9,10], or perform random parameter sampling [11] to determine parameter dependencies.
For a given circuit topology, kinetic parameters and other parameters that control the expression level of molecular species (e.g. promoter activity or the number of ribosome binding sites) are important design parameters in synthetic biology. A major challenge is to find a set of parameters that satisfies the behavioral specification of a device [12]. Computer science offers various languages to formally define the proper functioning of a piece of code or hardware. Such specification languages of formal verification are used to check important behavioral properties, such as liveness, safety or fairness [13]. One convenient way to specify such properties is temporal logic, an extension of classical propositional logic in which propositional variables may change their truth values over time. A prominent example is linear temporal logic (LTL), where the truth values of propositions are interpreted over a linear timeline [13]. Such techniques have already been applied to investigate the robustness of computational models in systems biology [14].
Mathematically, the design problem is an inverse problem and hence inherits the general feature of such problems, namely ill-posedness [15,16]. More specifically, for a given behavioral specification one aims to find the set of parameters that gives rise to such behavior. A simple example of a quantity in feature space is the concentration of a molecular species at particular time points. The problem is closely related to parameter optimization, and even more so to robust optimization, where an objective function (generally encoding some behavioral constraint, e.g. keeping model trajectories close to measurements) is optimized to yield the optimal parameter set. Ill-posedness refers to the observation that two nearby points in specification (behavioral feature) space may map to very distant points in parameter space, indicating that this mapping is generally not contractive but rather expansive. The inverse problem and the corresponding forward problem are illustrated in Figure 1.
In the current analysis we restrict ourselves to models obeying the reaction rate equation, which hence constitute a set of nonlinear ordinary differential equations. In general, connected domains may map to disconnected domains, for instance if the dynamical system contains bifurcation points (e.g. see Figure 1). For the proposed linearization approach we further restrict ourselves to connected domains in the respective image space. Moreover, we do not specify behavior through temporal logics but define general specification functionals. These are mappings ψ from an appropriate function space $\mathcal{X}$ of n-dimensional trajectories (e.g. $L^2([0,T],\mathbb{R}^n)$) to the m-dimensional reals, and we choose the form

$$\psi(x) = \int_0^T g\big(t, x(t)\big)\,dt,$$

with $x \in \mathcal{X}$ and the feature kernel $g : \mathbb{R}_{\ge 0} \times \mathbb{R}^n_{\ge 0} \to \mathcal{F}$, where $\mathcal{F} \subseteq \mathbb{R}^m$. A special and more tractable version of the kernel is the convolution, i.e. $g(t, x(t)) = h(T - t)\,x(t)$. In the following we only require the map $x \mapsto g(\cdot, x)$ to be once differentiable. With this, we can define the forward map from the p-dimensional parameter space to the feature space as the composition $F \equiv \psi \circ \phi$, with $\phi : \mathbb{R}^p \to \mathcal{X}$. The trajectories $x \in \mathcal{X}$ are generated by the reaction rate equation

$$\dot{x}(t) = N\,v\big(x(t), k\big), \qquad x(0) = x_0, \tag{1}$$

with the stoichiometric matrix $N \in \mathbb{Z}^{n \times q}$, the reaction flux vector $v : \mathbb{R}^n_{\ge 0} \times \mathbb{R}^p \to \mathbb{R}^q$ and the parameter vector $k \in \mathbb{R}^p$.
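To make the definitions concrete, the following sketch (assuming NumPy) evaluates a forward map $F = \psi \circ \phi$ for a hypothetical two-species gene-expression model (mRNA and protein, parameters $k_1, \dots, k_4$ stored as `k[0]..k[3]`) with the convolution kernel $h(\tau) = e^{-\tau} I$. The model, the rates and the kernel are illustrative assumptions, not the case study of this paper, and a fixed-step RK4 integrator with the trapezoidal rule stands in for whatever solver one would use in practice.

```python
import numpy as np

# Hypothetical toy model (not from the paper): x1 = mRNA, x2 = protein.
# Reactions: 0 -> x1, x1 -> 0, x1 -> x1 + x2, x2 -> 0 (mass action).
N = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])  # stoichiometric matrix, n x q


def flux(x, k):
    """Reaction flux vector v(x, k) of length q."""
    return np.array([k[0], k[1] * x[0], k[2] * x[0], k[3] * x[1]])


def simulate(k, x0, T, steps=2000):
    """phi(k): integrate dx/dt = N v(x, k) from x0 with classical RK4."""
    t = np.linspace(0.0, T, steps + 1)
    dt = t[1] - t[0]
    xs = np.empty((steps + 1, len(x0)))
    xs[0] = x0
    rhs = lambda x: N @ flux(x, k)
    for i in range(steps):
        a = rhs(xs[i])
        b = rhs(xs[i] + 0.5 * dt * a)
        c = rhs(xs[i] + 0.5 * dt * b)
        d = rhs(xs[i] + dt * c)
        xs[i + 1] = xs[i] + dt / 6.0 * (a + 2 * b + 2 * c + d)
    return t, xs


def h(lag):
    """Hypothetical convolution kernel h(T - t) = exp(-(T - t)) I (here m = n = 2)."""
    return np.exp(-lag) * np.eye(2)


def forward_map(k, x0, T):
    """F(k) = psi(phi(k)) with psi(x) = int_0^T h(T - t) x(t) dt (trapezoidal rule)."""
    t, xs = simulate(k, np.asarray(x0, dtype=float), T)
    integrand = np.array([h(T - ti) @ xi for ti, xi in zip(t, xs)])
    widths = np.diff(t)[:, None]
    return (0.5 * (integrand[1:] + integrand[:-1]) * widths).sum(axis=0)
```

A quick sanity check: started at its steady state $x^* = (k_1/k_2,\; k_1 k_3/(k_2 k_4))$ the trajectory is constant, so the functional reduces to $F(k) = x^*(1 - e^{-T})$.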

Methods
The brute-force method of determining the parameter region that satisfies a certain behavioral specification $S \subseteq \mathcal{F}$ usually proceeds by Monte Carlo sampling of parameter sets, generating the corresponding trajectories according to (1), checking whether they satisfy S, and finally retaining only those parameter sets that satisfy S. This approach has two immediate downsides. First, most draws will be unsuccessful for high-dimensional parameter spaces, for tight specifications, or for both. Approaches using optimized sampling [11,17] have been developed to mitigate this problem, but they do not solve it, as they require convergence of the sampling. Second, drawing parameter points in $\mathbb{R}^p$ provides no guarantee that those points belong to a connected domain of consistent parameter sets. Here we make first attempts to tackle both problems.
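For comparison, here is a minimal sketch of the brute-force baseline, assuming NumPy. To keep it short, a hypothetical closed-form forward map (the steady state of a toy gene-expression model) replaces the simulation of (1), the specification S is an axis-aligned box, and parameters are drawn log-uniformly; all of these are illustrative choices.

```python
import numpy as np


def steady_state_features(K):
    """Hypothetical forward map, evaluated row-wise for a (draws x 4) parameter array:
    steady-state mRNA k1/k2 and protein k1*k3/(k2*k4) of a toy expression model."""
    mrna = K[:, 0] / K[:, 1]
    protein = K[:, 2] * mrna / K[:, 3]
    return np.column_stack([mrna, protein])


def rejection_sample(f_lo, f_hi, n_draws, rng, k_lo=0.1, k_hi=10.0, p=4):
    """Brute-force baseline: draw k log-uniformly in [k_lo, k_hi]^p and keep the
    draws whose features fall inside the box S = [f_lo, f_hi]."""
    K = np.exp(rng.uniform(np.log(k_lo), np.log(k_hi), size=(n_draws, p)))
    feats = steady_state_features(K)
    inside = np.all((feats >= f_lo) & (feats <= f_hi), axis=1)
    return K[inside], inside.mean()


# Same seed, hence the same draws, for a loose and a tight specification box
loose = rejection_sample(np.array([0.5, 5.0]), np.array([2.0, 20.0]), 20000,
                         np.random.default_rng(1))
tight = rejection_sample(np.array([0.9, 9.0]), np.array([1.1, 11.0]), 20000,
                         np.random.default_rng(1))
print(f"acceptance: loose {loose[1]:.4f}, tight {tight[1]:.4f}")
```

The acceptance rate drops as S shrinks (the first downside), and nothing about the accepted points indicates whether they form a connected region (the second).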
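The main idea is to locally linearize the forward map F around some point and then to invert it locally. Hence, a small enough local patch in feature space can be mapped backward to a small patch in parameter space. By successively sampling expansion points in their neighborhoods (e.g. by the ball-walk algorithm [18]) we can systematically cover the entire specification S and obtain the corresponding parameter region. A series expansion of F around some initial parameter set $k_0$ reads

$$F(k_0 + dk) = F(k_0) + \frac{\partial F}{\partial k}\Big|_{k_0} dk + \mathcal{O}\big(\|dk\|^2\big).$$

Defining $df \equiv F(k_0 + dk) - F(k_0)$, we see that a neighborhood $df$ in feature space can, to first order, be mapped backward using the Moore-Penrose pseudoinverse, which we define with care as

$$dk = L^\dagger df, \qquad L^\dagger = \lim_{\varepsilon \to 0^+} \big(L^T L + \varepsilon I\big)^{-1} L^T = \lim_{\varepsilon \to 0^+} L^T \big(L L^T + \varepsilon I\big)^{-1}, \tag{2}$$

where L denotes the linearized forward map and hence is just the $m \times p$ matrix

$$L = \frac{\partial F}{\partial k}\Big|_{k_0}. \tag{3}$$

Note that the limit in (2) exists even if the inverses of $L^T L$ and $L L^T$ do not exist. Such situations are encountered, for instance, as soon as the number m of specification features is less than the number of parameters, i.e. the dimension p of the parameter space (then $L^T L$ is singular). Importantly, we can compute (3) efficiently using the variational equation for the system (1). Observe that

$$\frac{\partial F}{\partial k}\Big|_{k_0} = \int_0^T \frac{\partial g}{\partial x}\big(t, x(t; k_0)\big)\,\frac{\partial x}{\partial k}(t; k_0)\,dt,$$

where the last term in the integrand is just the sensitivity of the solution of (1) to perturbations in k around $k_0$. According to the variational equation, the sensitivity obeys the ordinary $n \times p$ matrix differential equation

$$\frac{d}{dt}\frac{\partial x}{\partial k} = N\left(\frac{\partial v}{\partial x}\,\frac{\partial x}{\partial k} + \frac{\partial v}{\partial k}\right), \qquad \frac{\partial x}{\partial k}(0) = 0, \tag{4}$$

where we omitted the explicit dependency on $k_0$ for brevity, and the initial condition assumes that $x_0$ does not depend on k. Note that (4) is equivalent to the transient sensitivity analysis of metabolic networks [9,10], proposed as an extension of classical metabolic control analysis, which deals only with steady-state sensitivities. For a given $k_0$, the sensitivity $\partial g / \partial x$ of the kernel is an $m \times n$ matrix that can be computed explicitly along the nominal trajectory. Thus, by jointly solving (1) and (4) for some $k_0$ together with

$$\dot{L}(t) = \frac{\partial g}{\partial x}\big(t, x(t)\big)\,\frac{\partial x}{\partial k}(t), \qquad L(0) = 0,$$

up to time T, we obtain the linearized map $L = L(T)$. Hence, for every sampled $k_0$ and associated feature point $f_0 = F(k_0)$ we propose to design a feature ball

$$B_{f_0}(\delta) = \big\{ f \in \mathcal{F} : \|f - f_0\| \le \delta \big\}$$

and map it backward using $L^\dagger$.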
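The joint integration of (1), (4) and the ODE for $L(t)$ can be sketched as follows, assuming NumPy. The two-species toy model, its hand-written Jacobians `dflux_dx` and `dflux_dk`, and the kernel $h(\tau) = e^{-\tau} I$ (for which $\partial g/\partial x = h(T - t)$) are hypothetical illustrations; the result is cross-checked against central finite differences of F.

```python
import numpy as np

# Hypothetical toy model: x1 = mRNA, x2 = protein, parameters k = (k1, ..., k4).
N = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
n, p = 2, 4


def flux(x, k):
    return np.array([k[0], k[1] * x[0], k[2] * x[0], k[3] * x[1]])


def dflux_dx(x, k):   # q x n Jacobian of v with respect to the states
    return np.array([[0.0, 0.0], [k[1], 0.0], [k[2], 0.0], [0.0, k[3]]])


def dflux_dk(x, k):   # q x p Jacobian of v with respect to the parameters
    return np.diag([1.0, x[0], x[0], x[1]])


def h(lag):           # convolution kernel, so dg/dx(t, x) = h(T - t)
    return np.exp(-lag) * np.eye(n)


def rk4(rhs, z, T, steps=1000):
    """Fixed-step classical Runge-Kutta for dz/dt = rhs(t, z) on [0, T]."""
    dt = T / steps
    for i in range(steps):
        t = i * dt
        a = rhs(t, z)
        b = rhs(t + dt / 2, z + dt / 2 * a)
        c = rhs(t + dt / 2, z + dt / 2 * b)
        d = rhs(t + dt, z + dt * c)
        z = z + dt / 6 * (a + 2 * b + 2 * c + d)
    return z


def forward_map(k, x0, T):
    """F(k): integrate x together with psi'(t) = h(T - t) x(t), psi(0) = 0."""
    def rhs(t, z):
        x = z[:n]
        return np.concatenate([N @ flux(x, k), h(T - t) @ x])
    return rk4(rhs, np.concatenate([x0, np.zeros(n)]), T)[n:]


def linearized_map(k, x0, T):
    """L = L(T): integrate x, the n x p sensitivity (4) and dL/dt = dg/dx * sensitivity."""
    def rhs(t, z):
        x, Sx = z[:n], z[n:n + n * p].reshape(n, p)
        dSx = N @ (dflux_dx(x, k) @ Sx + dflux_dk(x, k))   # variational equation (4)
        dL = h(T - t) @ Sx
        return np.concatenate([N @ flux(x, k), dSx.ravel(), dL.ravel()])
    z0 = np.concatenate([x0, np.zeros(n * p), np.zeros(n * p)])  # x0 independent of k
    return rk4(rhs, z0, T)[n + n * p:].reshape(n, p)


k0 = np.array([2.0, 1.0, 3.0, 0.5])
x0 = np.zeros(n)
L = linearized_map(k0, x0, 5.0)
eps = 1e-5
L_fd = np.column_stack([(forward_map(k0 + eps * e, x0, 5.0)
                         - forward_map(k0 - eps * e, x0, 5.0)) / (2 * eps)
                        for e in np.eye(p)])
print("max |L - L_fd| =", np.max(np.abs(L - L_fd)))
```

One integration of the augmented system yields all p columns of L, whereas finite differences need 2p forward simulations.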
Since the singular value decomposition $L^\dagger = U \Sigma V^T$ has orthogonal U and V and a diagonal Σ with non-negative entries [16], the backward transformation is a sequence of a rotation, a scaling and another rotation. Hence the image of $B_{f_0}(\delta)$ under $L^\dagger$ is an ellipsoid in parameter space (degenerate, of dimension at most m, if m < p),

$$B_{k_0} = \big\{ k_0 + L^\dagger (f - f_0) : f \in B_{f_0}(\delta) \big\}.$$

Covering a multivariate region with balls of the same dimension allows complete coverage of the region, something that can only be extrapolated when using pointwise sampling [11]. The question of how to efficiently cover a region with balls has been addressed in computational geometry, and efficient randomized algorithms are available [18].
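A short NumPy check of the pseudoinverse definition (2) and of the ellipsoid picture. An arbitrary full-row-rank $2 \times 4$ matrix serves as a hypothetical stand-in for L, so m < p and $L^T L$ is singular. The sketch builds $L^\dagger$ from the thin SVD of L, compares it with both regularized expressions at a small ε, and verifies that the boundary of a feature ball maps onto an ellipse whose semi-axes are δ divided by the singular values of L.

```python
import numpy as np

L = np.array([[1.0, 0.5, 0.0, -0.3],
              [0.2, -1.0, 0.7, 0.4]])        # hypothetical m x p map, m = 2 < p = 4
m, p = L.shape

# Pseudoinverse from the thin SVD L = U diag(s) Vt, i.e. L^+ = Vt^T diag(1/s) U^T
U, s, Vt = np.linalg.svd(L, full_matrices=False)
L_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

# Regularized forms from (2) at a small epsilon; L^T L itself is singular here
eps = 1e-8
L_tik_left = np.linalg.solve(L.T @ L + eps * np.eye(p), L.T)
L_tik_right = L.T @ np.linalg.solve(L @ L.T + eps * np.eye(m), np.eye(m))

# Map the boundary of the feature ball B_{f0}(delta) back to parameter space
delta = 0.1
theta = np.linspace(0.0, 2 * np.pi, 200, endpoint=False)
df = delta * np.stack([np.cos(theta), np.sin(theta)])   # m x 200 boundary points
dk = L_pinv @ df                                       # p x 200 image points

# In the coordinates Vt @ dk the image is an ellipse with semi-axes delta / s_i
ellipse_radius = np.linalg.norm(s[:, None] * (Vt @ dk), axis=0)
print("semi-axes:", delta / s, "max |dk|:", np.linalg.norm(dk, axis=0).max())
```

The longest semi-axis, $\delta / s_{\min}$, is where a small feature ball is stretched most, which is one concrete face of the ill-posedness discussed in the Introduction.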
We remark that L is not necessarily the best local linear approximation of F(k) in a given norm. More specifically, we can improve on L if we are given additional samples of the neighborhood of $k_0$. Suppose we draw another $k_i \in B_{k_0}$; then we can construct a rank-one update to L,

$$\tilde{L}_i = L + \frac{(\Delta F - L\,\Delta k)\,\Delta k^T}{\Delta k^T \Delta k}, \tag{5}$$

where $\Delta F \equiv F(k_i) - F(k_0)$ and $\Delta k \equiv k_i - k_0$. In particular, the rank-one term in (5) captures the nonlinear part of F. From (5) it follows that the matrix $\tilde{L}_i$ satisfies the consistency property

$$\tilde{L}_i\,\Delta k = \Delta F. \tag{6}$$

Thus, knowing how to construct rank-one updates over the domain of interest is equivalent to knowing F(k) locally. In fact, $\tilde{L}_i$ is the matrix closest to L with respect to the Frobenius norm that satisfies (6).

Subsequently we use this improved linear approximation of F to bound the error that one incurs when using the pseudoinverse $L^\dagger$ for the backward map. This also provides a means to determine the maximal ball size δ that keeps the error below a given bound. We quantify the error in feature space by the backward map followed by the forward map. That is, we want to find a δ such that

$$\big\| F\big(k_0 + L^\dagger (f - f_0)\big) - f \big\| \le \varepsilon \tag{7}$$

for all $f \in B_{f_0}(\delta)$. Now suppose we know a bound ρ(δ) for the Frobenius norm of the rank-one perturbation, i.e. $\|\tilde{L} - L\|_F \le \rho(\delta)$ in the local domain of interest. Note that ρ(δ) can, and in general must, be estimated by sampling. Given $f_i \in B_{f_0}(\delta)$ and $df_i \equiv f_i - f_0$, and approximating F locally by the updated linear map, $F(k_0 + L^\dagger df_i) \approx f_0 + \tilde{L} L^\dagger df_i$, the maximal error of the inverse-forward map is

$$\max_{\|\tilde{L} - L\|_F \le \rho(\delta)} \big\| \tilde{L} L^\dagger df_i - df_i \big\|,$$

which is known from robust least squares [16] to equal

$$\big\| L L^\dagger df_i - df_i \big\| + \rho(\delta)\,\big\| L^\dagger df_i \big\|.$$

Assuming that L has linearly independent rows, $L L^\dagger$ is the identity matrix, and the error simplifies to

$$\rho(\delta)\,\big\| L^\dagger df_i \big\| \le \rho(\delta)\,\delta\,\|L^\dagger\|_2.$$

This result provides one way to determine the radius δ of the feature ball when relying on the pseudoinverse: choose δ such that $\rho(\delta)\,\delta\,\|L^\dagger\|_2 \le \varepsilon$.
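The rank-one update (5), its consistency property (6) and the robust least-squares bound can be checked numerically with a few lines of NumPy. The matrix L, the step Δk and the "nonlinear" residual added to ΔF are hypothetical numbers chosen only for illustration; in practice ΔF would come from evaluating F at a sampled $k_i$. The hypothetical helper `ball_error_bound` evaluates $\rho\,\delta\,\|L^\dagger\|_2$, which is valid when L has full row rank.

```python
import numpy as np


def broyden_update(L, dk, dF):
    """Rank-one update (5): the Frobenius-closest matrix to L that maps dk to dF."""
    residual = dF - L @ dk                       # part of dF the linear model misses
    return L + np.outer(residual, dk) / (dk @ dk)


def worst_case_error(L, df, rho):
    """max over ||Lt - L||_F <= rho of ||Lt L^+ df - df|| (robust least squares)."""
    x = np.linalg.pinv(L) @ df                   # backward map of the feature step
    return np.linalg.norm(L @ x - df) + rho * np.linalg.norm(x)


def ball_error_bound(L, rho, delta):
    """Bound rho * delta * ||L^+||_2 for all df with ||df|| <= delta (full row rank L)."""
    return rho * delta / np.linalg.svd(L, compute_uv=False).min()


L = np.array([[1.0, 0.5, 0.0, -0.3],
              [0.2, -1.0, 0.7, 0.4]])
dk = np.array([0.1, -0.2, 0.05, 0.3])
dF = L @ dk + np.array([0.02, -0.01])            # hypothetical nonlinear contribution
Lt = broyden_update(L, dk, dF)
rho = np.linalg.norm(Lt - L)                     # Frobenius norm of the rank-one term

df = np.array([0.05, -0.03])
x = np.linalg.pinv(L) @ df
actual = np.linalg.norm(Lt @ x - df)
print(actual, worst_case_error(L, df, rho), ball_error_bound(L, rho, np.linalg.norm(df)))
```

The worst case is attained by the rank-one perturbation $\rho\,u\,x^T/\|x\|$ aligned with $x = L^\dagger df$, which is why the bound cannot be tightened without further information about F.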

Results
As a proof of concept, we applied our method to a simple synthetic sensor construct [19]. The system consists of several gene copies (e.g. introduced by plasmid transfection) expressing a protein that dimerizes and activates the gene by binding to its promoter. In the presence of the inhibitor (the input of the system), the dimer is trapped and cannot bind to the promoter. A schematic of the involved reactions is depicted in Figure 2.
The system is simulated with mass-action kinetics, yielding the ODE system (9), where the states $x_1, \dots, x_4$ denote the concentrations of mRNA, protein, protein dimer and dimer-promoter complex, respectively. The quantities $x_5^0$ and $y(t)$ denote the total number of promoters and the external inhibitor concentration, respectively. The nominal values and the meaning of the model parameters are summarized in Table 1. We remark that such continuous state-space models have limitations for transcriptional circuits, because they require several gene copies for the discrete, Boolean nature of a single gene to be negligible.
As behavioral specification, we require the dimer concentration to drop quickly after the inhibitor is introduced and to quickly regain a high level after the inhibitor is washed out of the medium. We also constrain the monomeric protein. The specification functionals are the integrals of the absolute difference between the monomer (respectively dimer) concentration and a target value $x^*(s)$, each evaluated over two short time intervals. More specifically, for $i \in \{2, 3\}$ (monomer and dimer) and $j \in \{1, 2\}$,

$$\psi_{ij}(x) = \int_0^T w_j(s)\,\big| x_i(s) - x_i^*(s) \big|\,ds,$$

where the temporal weight function $w_j$ localizes the integral on the j-th time interval. The actual time intervals for $w_1$ and $w_2$, as well as the target values, are shown together with the trajectories of the nominal system (9) in Figure 3.
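A possible implementation of such features, assuming NumPy and, for illustration only, that each weight $w_j$ is the indicator function of a window $[a_j, b_j]$, so that a feature is the integral of $|x_i(s) - x_i^*(s)|$ over that window. Trajectories are given on a time grid and linearly interpolated; the windows, targets and test trajectory below are hypothetical.

```python
import numpy as np


def window_deviation(t, x, target, a, b, n=2001):
    """int_a^b |x(s) - target(s)| ds, i.e. the weighted absolute deviation with
    w taken (hypothetically) as the indicator function of the window [a, b]."""
    s = np.linspace(a, b, n)
    dev = np.abs(np.interp(s, t, x) - target(s))    # linear interpolation of x
    return float(np.sum(0.5 * (dev[1:] + dev[:-1]) * np.diff(s)))


def specification_features(t, X, targets, windows, species):
    """One feature per (species, window) pair; X has shape (len(t), n_species)."""
    return np.array([window_deviation(t, X[:, i], targets[i], a, b)
                     for i in species for (a, b) in windows])


# Hypothetical example: two species with constant targets and two windows
t = np.linspace(0.0, 10.0, 101)
X = np.column_stack([t, 2.0 * t])
targets = {0: lambda s: np.full_like(s, 2.0), 1: lambda s: np.full_like(s, 4.0)}
features = specification_features(t, X, targets, windows=[(3.0, 5.0), (1.0, 3.0)],
                                  species=[0, 1])
print(features)
```

For these linear test trajectories the integrals can be done by hand ($\int_3^5 |s-2|\,ds = 4$ and $\int_1^3 |s-2|\,ds = 1$, doubled for the second species), which makes the example easy to verify. Note that $|\cdot|$ is not differentiable where a trajectory crosses its target; a smoothed deviation may be preferable when the features are linearized.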
For this case study we assume that we can design the binding rate $k_7$ of the inhibitor to the dimer and the binding rate $k_9$ of the dimer to the promoter. To assess the error incurred by the linearization, we consider the inverse-forward mapping described in (7).
To this end, for various sizes of δ we perform the inverse mapping with $L^\dagger$ and the forward mapping with F. If the inverse map were exact, we would recover the original ball of radius δ. Any deviation ε from it reflects the error of approximating $F^{-1}$ by $L^\dagger$. Figure 4 shows the images of $B_{f_0}(\delta)$ under $L^\dagger$ and $F \circ L^\dagger$ for various radii δ.
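The inverse-forward check can be sketched as follows, assuming NumPy. Since the sensor model's equations and parameter values (Table 1) are not reproduced here, a hypothetical two-parameter, two-feature map `F_toy` stands in for F, and a central finite-difference Jacobian stands in for the variational computation of L. The function returns the largest deviation ε over points on the boundary of $B_{f_0}(\delta)$, in the spirit of the contours in Figure 4.

```python
import numpy as np


def F_toy(k):
    """Hypothetical nonlinear forward map from two design parameters to two features."""
    return np.array([k[0] / (1.0 + k[1]), k[0] * k[1]])


def jacobian_fd(F, k, h=1e-6):
    """Central finite-difference Jacobian, standing in for the variational-equation L."""
    cols = [(F(k + h * e) - F(k - h * e)) / (2 * h) for e in np.eye(len(k))]
    return np.column_stack(cols)


def inverse_forward_error(F, k0, delta, n_boundary=360):
    """eps(delta) = max over |f - f0| = delta of |F(k0 + L^+ (f - f0)) - f| (2-D features)."""
    f0 = F(k0)
    L_pinv = np.linalg.pinv(jacobian_fd(F, k0))
    theta = np.linspace(0.0, 2 * np.pi, n_boundary, endpoint=False)
    df = delta * np.column_stack([np.cos(theta), np.sin(theta)])
    return max(np.linalg.norm(F(k0 + L_pinv @ d) - (f0 + d)) for d in df)


k0 = np.array([1.0, 1.0])
for delta in (0.01, 0.05, 0.1, 0.2):
    eps = inverse_forward_error(F_toy, k0, delta)
    print(f"delta = {delta:4.2f}  eps = {eps:.2e}  eps/delta = {eps / delta:.3f}")
```

For a smooth F the deviation grows roughly quadratically in δ, so the relative error ε/δ shrinks with the ball, which is the trade-off between accuracy and coverage.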
Since larger balls cover more of the specification but incur larger linearization errors, an intermediate δ achieves a good trade-off between approximation accuracy and sampling coverage. A systematic sampling of a predetermined specification area S would proceed by successively sampling overlapping balls with radii adapted to keep ε below a certain value, as illustrated in Figure 5. In this example, the coverage of the region S is above 98% using 50 balls of different radii. The lower left corner of the specification space (Figure 5A) maps to a strongly nonlinear region of the parameter space (upper right corner in Figure 5B) and therefore forces the use of smaller balls to keep the error within an acceptable range. In contrast, the upper right region of the specification space is closer to linear, and larger balls can be used with limited relative error (Figure 5C).
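The covering step can be illustrated with a simple greedy heuristic, assuming NumPy. This is our own illustrative choice, not the randomized ball-walk algorithm of [18]. The sketch assumes a box-shaped specification S, a hypothetical rule `radius_rule` standing in for "the largest δ whose error ε stays below tolerance", and a Monte Carlo probe cloud to estimate coverage. Balls are clipped to lie inside S, so every covered feature point satisfies the specification.

```python
import numpy as np


def greedy_ball_cover(lo, hi, radius_rule, max_balls=50, target=0.98,
                      n_probe=5000, seed=0):
    """Greedy covering of the box S = [lo, hi] by feature balls that stay inside S.

    radius_rule(c) is a hypothetical stand-in for 'largest delta with eps(delta) <= tol'.
    Coverage is estimated as the fraction of uniform probe points inside some ball."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    probes = rng.uniform(lo, hi, size=(n_probe, lo.size))
    covered = np.zeros(n_probe, dtype=bool)
    # admissible radius at each probe: error-driven rule, clipped to stay inside S
    dist_to_boundary = np.minimum(probes - lo, hi - probes).min(axis=1)
    allowed = np.minimum([radius_rule(pt) for pt in probes], dist_to_boundary)
    balls = []
    while len(balls) < max_balls and covered.mean() < target:
        candidates = np.flatnonzero(~covered)
        best = candidates[np.argmax(allowed[candidates])]   # largest admissible ball
        c, r = probes[best], allowed[best]
        balls.append((c, r))
        covered |= np.linalg.norm(probes - c, axis=1) <= r  # center always covered
    return balls, covered.mean()


rule = lambda c: 0.05 + 0.15 * c.mean()   # hypothetical: smaller balls near lower-left
balls, coverage = greedy_ball_cover([0.0, 0.0], [1.0, 1.0], rule, max_balls=50)
print(len(balls), "balls, estimated coverage", round(coverage, 3))
```

Each accepted feature ball would then be mapped back with $L^\dagger$ around its own expansion point; choosing the largest admissible ball first keeps the number of linearizations small.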

Conclusion
We presented a novel method to determine the parameter region of a biochemical reaction network that is consistent with a given dynamical, behavioral specification. We defined specifications in a novel and general way that requires only the specification map to be once differentiable with respect to the states of the underlying differential equations. We showed that by locally linearizing this map we can solve the desired inverse problem of finding a parameter region for a given specification. As regions, instead of points, are mapped back to parameter space, the scheme is in principle able (given some regularity conditions) to cover the feature and parameter space, something that is not possible with pointwise sampling. We also discussed means of estimating the size of the local neighborhood in order to guarantee certain approximation errors. The computational framework allows a very flexible definition of biologically relevant behavioral features and an efficient determination of the corresponding parameter region. Hence, the range of experimentally modifiable parameters, such as promoter binding strength, can be determined upfront, before the experimental synthesis of a synthetic construct. Throughout this work we only considered models based on ordinary differential equations, but the outlined framework can be extended to stochastic dynamical models, for instance through moment closure methods. In general, the specification functional will then involve the expectation operator, and Monte Carlo sampling may be required to approximate it. Methods from stochastic sensitivity analysis [20] can be applied in order to perform the local inversion.

Figure 4: Contours of $B_{f_0}(\delta)$ (blue) in feature space (first row) are mapped back to parameter space via $L^\dagger$ (second row) and mapped forward using F (red), for increasing δ (from left to right).

Table 1 (note): Values are based on [19] and slightly adapted to obtain a desired threshold behavior.