- Research
- Open access
- Published:
Modeling relaxation experiments with a mechanistic model of gene expression
BMC Bioinformatics volume 25, Article number: 270 (2024)
Abstract
Background
In the present work, we aimed at modeling a relaxation experiment which consists in selecting a subfraction of a cell population and observing the speed at which the entire initial distribution for a given marker is reconstituted.
Methods
For this we first proposed a modification of a previously published mechanistic two-state model of gene expression to which we added a state-dependent proliferation term. This results in a system of two partial differential equations. Under the assumption of a linear dependence of the proliferation rate with respect to the marker level, we could derive the asymptotic profile of the solutions of this model.
Results
In order to confront our model with experimental data, we generated a relaxation experiment of the CD34 antigen on the surface of TF1-BA cells, starting either from the highest or the lowest CD34 expression levels. We observed in both cases that after approximately 25 days the distribution of CD34 returns to its initial stationary state. Numerical simulations, based on parameter values estimated from the dataset, have shown that the model solutions closely align with the experimental data from the relaxation experiments.
Conclusion
Altogether our results strongly support the notion that cells should be seen and modeled as probabilistic dynamical systems.
Background
Cells are neither machines [1] nor simple information processing devices [2, 3]. Their specific complexity sometimes led to the idea that they should be treated differently that classical physico-chemical systems [4]. Nevertheless like all living systems cells are rooted within a physico-chemical reality which they can not escape. We therefore argue that cells should be seen and modelled as probabilistic dynamical systems.
One obvious sign that cells should indeed be seen as such lie in the possibility to perform so-called “relaxation” experiments. This consists in selecting a subfraction of a cell population (potentially down to one cell) and observing the speed at which the entire initial distribution for a given marker is reconstituted. Such relaxation experiments have already been published and analyzed on various cells and antigens.
Arguably the very first report to do so analyzed the distribution of the Sca1 antigen (Stem Cell Antigen 1) at the surface of EML cells, a multipotent mouse haematopoietic cell line. It was shown that it took more than 9 days before the three fractions (most Sca-1 negative, most Sca-1 positive and a central fraction) regenerated Sca-1 histograms similar to that of the parental (unsorted) population [5]. The authors proposed a phenomenological model which point toward discrete transitions in a dynamical system exhibiting multistability to quantitatively predict the relaxation dynamics of the sorted subpopulations [5]. For this they assumed the existence of two stable states, one of low and one of high Sca1 expression. Proliferation was assumed to be equal in both states.
Other studies have adopted a somewhat different approach with the knock-in of fluorescent reporter genes under the control of endogenous promoters [6, 7]. The first targeted promoter was that of Nanog in murine embryonic stem cells [6], an other gene classically considered as a stemness marker. Similarly to [5], the authors demonstrated that, although being in a Nanog low of in a Nanog high state is not biologically equivalent in term of fate, the transition between these two states can be adequately modelled using a fully probabilistic model, simulated using a Stochastic Simulation Algorithm [8].
The second targeted promoter was that of Tenascin-C in NIH 3T3 mouse fibroblasts [7]. In that case, the authors first proposed a phenomenological 2-states model, which proved to not correctly capture their data. They then turned toward a Langevin type stochastic differential equation to model the relaxation process. This led to an accurate prediction of the rates at which different phenotypes will arise from an isolated subpopulation of cells [7]. In contrast with [5], the authors assumed that each state had its own proliferation rate.
In the present work, we aimed at using a previously published mechanistic model of gene expression [9] to which we will add a stemness-dependant proliferation term, to fit relaxation data obtained by examining CD34 expression at the surface of TF1-BA cells.
Methods
Mathematical model
Case without proliferation
Throughout this work, we will use the classical two-state model (Fig. 1; see [9] and references therein), a refinement of the pioneer model introduced by [10].
This is the simplest model that accounts well for the specific nature of single-cell omics data (non-poisonnian [12], well fitted by Gamma distributions [13] and displaying a high proportion of zero counts [14]). More refined models with any number of possible gene configuration have been described [15] but their mathematical complexity makes them cumbersome to use for our purpose.
It is important to stress here that this is a mechanistic model, that differs from the phenomenological 2-states model described upper. Such models only considered a low and a high \(\chi\) state, without describing the protein production process. Importantly here stochasticity is described at the core of the modelling and does not need to be introduced as a additional term in the model. We recently proposed a piecewise deterministic Markov process (PDMP) version of that model which rigorously approximates the original molecular model [9]. Furthermore, a moment-based method has been proposed for estimating parameter values from a given experimental distribution assumed to arise from the functioning of a 2-states model [11]. We recall here the mathematical description of the model through the PDMP (piecewise deterministic Markov process) formalism
where \(E(t)\in \{0,1\}\) switching from 0 to 1 (resp. 1 to 0) at a rate \(k_{\text {on}}\) (resp. \(k_{\text {off}}\)). In this process, the protein quantity \(\chi (t)\) is structurally bounded by \(X_{max}=s/d\). From this process, we can derive the Chapman Kolmogorov or master equation in the form
see [16,17,18] for similar derivations. master equation of the process in the absence of proliferation reads. We recall that the boundary conditions simply reflects the no-flux boundary conditions stating \((s-d\chi ) n_{\text {on}}(t,\chi )=-d\chi n_{\text {off}}(t,\chi )=0\) for \(\chi =0,s/d.\) Moreover, because \(s-ds/d=-d.0,=0\), we only specify the boundary conditions when they give constraints on the densities. We define \(X_{\max }=s/d\) as the maximum value for the quantity of \(\text {CD34}\) in a cell. Scaling the space by \(X_{\max }\) allows us to consider the following system
with \(n_{\text {on}}(t,x)\) being the number of cells with a promoter in the on state at time t, with a (scaled) CD34 level x and \(n_{\text {off}}(t,x)\) being the number of cells with a promoter in the off state. The total number of cells, denoted as n(t, x), is given by \(n_{\text {on}}(t,x)+n_{\text {off}}(t,x)\). This is the quantity we considered to be measured.
Steady state of the model. The system is mass preserving and it converges to a steady state \(N_{\text {on},\text {off}}\) which is characterized by
And the solution if nonnegative. An interesting feature of this system is the fact that we have an explicit solution. We recall here the computations that can be found in [16] because they might help for understanding the computations for the model with proliferation. Indeed, summing up the equations, we get
Therefore, this quantity is constant on ]0, 1[. Using the boundary condition, we can see that 0 is the only admissible constant. Therefore, in this precise case, we have necessarily
Injecting in the equation we get
From this, we get
and quite remarkably, we have
If we choose \(C= \frac{\Gamma ((k_{\text {on}}+k_{\text {off}})/d)}{\Gamma (k_{\text {on}}/d)\Gamma (k_{\text {off}}/d)}\) we normalize this to 1 and get a \(\beta\) law \(B(k_{\text {on}}/d,k_{\text {off}}/d)\), so that we end up with
Case with proliferation
HSCs mostly reside in a quiescent state, although they can occasionally divide during homeostasis [19]. We therefore will consider \(\text {CD34}^+\) TF1-BA cells as immature slowly proliferating cells and \(\text {CD34}^-\) TF1-BA cells as mature highly proliferating cells. Therefore the proliferation rate will depend on the x variable representing the \(\text {CD34}\) content but not on the \(\text {on}/\text {off}\) status.
Moreover, we consider that cells keep their \(\text {on}/\text {off}\) status during a division. This is in line with the demonstration of a memory of transcriptional activity in mammalian cells [20, 21].
Since the system is structurally non-conservative, it makes no sense to look for steady state here. However, one can investigate for stable exponential profile, that is to look for positive solutions with shape
Such solution satisfy the system
In the sequel, we will focus on the normalized representant so that we will assume
We also introduce the adjoint eigenprofile. It can be obtained as exponentially growing solutions for the adjoint differential operators, this is the continuous equivalent of left and right eigenvector for the same eigenvalues in matrix analysis (\(MU=\lambda U, v^T M =\lambda V\) or equivalently \(M^T V=\lambda V\)).
We emphasize in particular the following property, for any initial condition of the system, one has
and moreover,
In the weighted norm \(\Vert (f_\text {on},f_\text {off})\Vert _{\phi }=\int _0^1 |f_\text {on}|\phi _\text {on}+|f_\text {off}|\phi _\text {off}\). In case \(\phi _{\text {on},\text {off}}\) is lower bounded, this implies classical \(L^1\) convergence. This lower bound is established below. For more details on this, we refer to [22] for an introduction to eigenvectors in this context. In particular, thanks to our normalization, the triplet \((\lambda ,N,\phi )\) is uniquely defined. Note that in the conservative case (\(r=0\)), \(\lambda =0\), N is given by the renormalized steady state and the adjoint eigenvector is simply the constant vector (1, 1). Note also that this guarantees that for any initial data, we have in case \(\lambda \ge 0\) and \(\phi _{\text {on},\text {off}}\) are lower bounded (as it will be established below)
And regarding the observations of the steady profile in Fig. 2, our normalized asymptotic profile should be \(N: x\longmapsto N_{\text {on}}(x)+N_{\text {off}}(x)\).
We assume that, for the initial dimensional system, the proliferation rate is linear, i.e. \(r:\chi \longmapsto \tilde{r}_0+\tilde{r}_1 \chi\). Scaling again the space by \(X_{\max }\), the proliferation term becomes, \(r:x \rightarrow r_0 + r_1 x\) with \(r_0 = \tilde{r}_0\) and \(r_1 = \tilde{r}_1 X_{\max }\). We assume that the constant proliferation rate is positive, \(r_0>0\). Conversely, to model the fact that \(\text {CD34}^-\) cells divide more frequently than \(\text {CD34}^+\) cells, we assume that the linear proliferation term, \(r_1\), is negative. However, to preserve the positivity of the proliferation rate, the constant \(r_1\) must satisfy the following condition, \(r_1 \in [-r_0,0]\).
We show in the Results section that, under this proliferation assumption, it is theoretically possible to derive the normalized asymptotic profiles \((N_{\text {on}},N_{\text {off}})\).
The biological setting
Relaxation experiments
Chronic Myeloid Leukemia (CML) is a myeloproliferative disorder arising at the hematopoietic stem cell (HSC) level. It is associated with the recurrent chromosomal (Philadelphia) translocation t(9;22)(q34;q11) which leads to the oncogenic chimeric gene that fuses Bcr and Abl genes and results in the expression of a constitutively active unique tyrosine kinase named BCR-ABL [23].
Véronique Maguer-Satta’s group has developed the TF1-BA cell line, a unique model of immature human hematopoietic cells (TF1) transformed with BCR-ABL by lentiviral infection. This model was shown to recapitulate early alterations following the BCR-ABL oncogene appearance as identified using primary samples of CML patients at diagnosis and in chronic phase [24].
We decided to analyze the relaxation dynamics for the CD34 antigen at the surface of those TF1-BA cells (Fig. 2). \(\text {CD34}\) is a transmembrane phosphoglycoprotein which is predominantly regarded as a marker of Haematopoietic Stem Cells (HSCs) [25]. We reasoned that \(\text {CD34}\) surface expression could therefore be seen as a proxy for stemness of our TF1-BA cells. Interestingly, one observes a relaxation in both directions: \(\text {CD34}^-\) cells are regenerated from \(\text {CD34}^+\) cells, as biologists would expect, but one also see that \(\text {CD34}^+\) cells are regenerated from \(\text {CD34}^-\) cells, establishing that stemness is not a fixed quality but the result of an underlying dynamical system as previously shown in other cellular systems ( [5, 6]; see upper)
Data processing
Two types of data were collected on days 2, 5, 9, 13, 19, 23, 26 and 30: cell counts and fluorescence distribution. The cell counts allow us to quantify proliferation whereas the fluorescence measure the distribution of CD34 expression.
Gating.
As usual for cytometric data, we initiate the analysis with a gating step. We use SSC-H and FSC-H data to distinguish viable and debris cells. FSC (Forward Scatter) data are generally assimilated to the size of the cells analyzed, and correspond to the light scattered along the laser path. SSC (Side Scattered) data, on the other hand, are usually linked to the granularity and correspond to the light scattered at a 90-degree angle. The “H” stands for Height and is one component of this type of data. Cell debris are characterized by relatively low size and high granularity relative to their size.
To select viable cells, we plot the values of SSC-H along those of FSC-H. An example of such a graph is provided in Fig. 3 using data from day 2 of the \(\text {CD34}^+\) cell experiment. Visually, viable cells can be identified as the cluster of points with high FSC-H and SSC-H values. Using the “FlowCal” python package [27], we draw an ellipse as a filter to select only these viable cells. At the bottom of Fig. 3, we represent fluorescence data with and without the gating phase (Ungated and Gated, respectively). Note that fluorescence distribution is only slightly affected by the removal of debris cells.
Shifting.
Even after gating, some cells exhibit a negative fluorescence level, which is inconsistent as these values are intended to represent the amount of proteins in each cell. To avoid this problem, we added a shifting step. This step occurs immediately after the gating process and consists in subtracting the minimum value of each distribution (for each sub-population and for each day) from all the values, bringing the minimum to 0. This transformation, once again, does not distort the fluorescence distribution. This corresponds to the interpretation of negative values as compensation of a baseline value.
Numerical simulations
Linking data to mathematical model: The cell counts are interpreted as snapshots of the total population \(\int _0^{ X_{\max }} (n_{\text {on}}(t,x)+n_{\text {off}}(t,x) )dx\). The fluorescence distribution is considered as a sample from the distribution \(\frac{n_{\text {on}}(t,x)+n_{\text {off}}(t,x)}{\int _0^{ X_{\max }} (n_{\text {on}}(t,x)+n_{\text {off}}(t,x) )dx}\). As we have no information on the repartition \(\text {on}/\text {off}\) for the initial data, we apply the following rule: for \(t_0\), starting of our simulation (DAY 2), we choose the repartition to be the same as in the steady distribution N. More precisely, we fix the proportion with the equation
Numerical scheme. For the Eq. (5), we use an explicit upwind scheme. Setting \(a_{\text {on}}: x \longmapsto d(1-x)\) and \(a_{\text {off}}: x \longmapsto -dx\), the scheme is given by
with \({n_{\text {on}/\text {off}}}_j^{n} = n_{\text {on}/\text {off}}(j \Delta x, n \Delta t)\), \(r_j = r(j \Delta x)\), \({a_{\text {on}/\text {off}}}_{j+1/2}^n = (a_{\text {on}/\text {off}}((j+1) \Delta x,n \Delta t)+a_{\text {on}/\text {off}}(j \Delta x, n \Delta t))/\Delta x\) and \({a_{\text {on}/\text {off}}}_{j-1/2}^n = (a_{\text {on}/\text {off}}(j \Delta x, n \Delta t)+a_{\text {on}/\text {off}}((j-1) \Delta x,n\Delta t))/\Delta x\). In the Results section, Fig. 7 illustrates a comparison between the result of the numerical scheme and the theoretical asymptotic profile of Eq. (5).
Estimation of the distance to the data.
To calibrate the parameter values of our system, we use our experimental data. Initially, in order to estimate the exponential growth rate of cells, we perform a linear regression analysis on the temporal evolution data of the cell count. To determine the values of other system parameters, we seek values that make our numerical results as close as possible to the experimental data. To characterize this notion of closeness between our numerical results and the data, we introduce the Kantorovich–Rubinstein distance. Given two probability distribution \(p_1,p_2\) on \(\mathbb {R}_+\), we define their cumulative distribution function \(P_i(x)=Pr(X<x, \text {under distribution }p_i)=\int _0^x p_i(dx).\) Using these functions we can define the Kantorovich Rubinstein (also known as Wasserstein) distance by
In our specific case, we want to compare at each step the (normalized) distribution generated by the model at time \(t_i\) (hereby denoted as \(model(t_i,dx)\) with cumulative distribution \(M(t_i,.)\)) and the corresponding distribution of the data at time \(t_i\) denoted as \(data(t_i,dx)\)with cumulative distribution \(D(t_i,.)\). We would therefore compute
Note that in our case the integral is in fact taken on the finite interval [0, 1] for scaled variables.
Considering the distribution profile of the data, we prefer to study this distance on a logarithmic scale. We therefore make the following change of variables \(y = \log (xX_{\max } +1)\), and we define the modified Kantorovich–Rubinstein distance as follows
with \(\tilde{D}\) and \(\tilde{M}\) the two cumulative distribution functions, defined below, in the new logarithmic scale. Note that, following this change of variable, this “distance” can be greater than 1.
We are looking for a function \(\tilde{m}\) that satisfies the following relation, for all \(b\in [0,\log (1+X_{max})]\)
In particular, the link between the corresponding densities is immediately given by
The space [0, 1] is discretized uniformly with \(J+1\) points, and this sequence is denoted \((x_j)_j\).
We also define the sequences \((y_j)_{j \in \{ 0,1,\ldots ,J\}}\) and \((\ell _j)_{j \in \{ 0,1,\ldots ,J-1 \}}\) as follows
and
Consequently, the estimator of the cumulative distribution function M is given by
where we have used (13) to estimate \(\tilde{m}\). We need to renormalize to ensure we are comparing probability distributions.
The estimator of the cumulative distribution function D is
with \(h_j(t) = \#\{d_i: \; \log (d_i(t) +1) \in [y_{j-1},y_j] \}\), where the operator \(\#\) corresponds to the cardinal of a set and the data \(d_i\) correspond to the fluorescence data obtained after data processing. These data, \(d_i\), are real numbers between 0 and \(X_{\max }\). The term \(\# \{d_k(t), \; \forall k \}\) corresponds to the number of cells present in the data on day t after the gating operation.
Therefore, the distance between the experimental data and the mathematical model is as follows
To calibrate the parameters of our model, we will minimize the sum of the modified Kantorovich–Rubinstein distance for the different days at our disposal and for the two experiments. The distance associated with \(\text {CD34}^+\) data is denoted \(\widehat{dist}_{\textrm{KT}}^{\log ,+}\), while that associated with \(\text {CD34}^-\) data is denoted \(\widehat{dist}_{\textrm{KT}}^{\log ,-}\). We also introduce the distance, denoted \(\widehat{dist}_{\textrm{KT}}^{\log ,\pm }\), corresponding to the sum of these two distances. Thereby we look for one set of parameters for fitting the two datasets jointly. Mathematically, the optimization problem is given by the following formula
The results of this optimization work are detailed in the Results section.
Results
Mathematical analysis—derivation of explicit normalized asymptotic profile \((N_{\text {on}},N_{\text {off}})\)
Under the assumption of a linear proliferation rate \(r(x)=r_0+r_1 x\), we obtain the following result
Theorem 1
Assume \(r(x)=r_0+r_1x\). Define the matrix M by
Denote s(M) the largest eigenvalue of M, then the left and right eigenvector
can be chosen positive. Moreover, the solution to (6) is given by
And
with C an arbitrary positive constant and \(\eta\) given by \(\eta = 1 +s(M)/k_{\text {on}}\).
In particular, we have
Proof
We notice that the existence and uniqueness (up to a multiplicative factor) of the triplet s(M), V, U is a straightforward consequence of the classical Perron Frobenius theorem which applies here because the off-diagonal entries of M are positive [28]. We introduce the ration \(\eta =\frac{V_{\text {on}}}{V_{\text {off}}}\) and notice that by construction, we have
Then, consider the system satisfied by \(\psi _{\text {on},\text {off}}=e^{r_1 x}\phi _{\text {on},\text {off}}\), where \(\phi _{on/off}\) is the solution of (7). We get
This system can be summarized as
We recognize the matrix \(r_0+M^t\). Therefore, we have a solution independent on x \(\psi _{\text {on},\text {off}}=V_{\text {on},\text {off}}\) and \(\lambda =r_0+s(M)\) and \(\phi _{\text {on},\text {off}}=e^{-\frac{r_1}{d} x}V_{\text {on},\text {off}}\). Similarly, if we denote \(P_{\text {on},\text {off}}=e^{-\frac{r_1}{d} x}N_{\text {on},\text {off}}\), we obtain
This can be condensed into
We can now proceed as for the conservative case and notice that since \(V^t(r_0-\lambda -M)=0\) multiplying the equation by V.
After the appropriate substitution, we obtain
This leads to, for a suitable renormalization constant \(C>0\),
We introduce then the notation \(\eta =\eta (r_1)=\frac{V_{\text {on}}}{V_{\text {off}}}\) and go back the N variables to write
In particular, we have
Going back to the definition of \(\eta\) we notice
\(\square\)
Simulation analysis
Estimation of exponential growth rate.
Using experimental data on cell numbers for different days, we estimate the exponential growth rate \(\lambda\). For both relaxation experiments, we perform a linear regression of the natural logarithm of the number of cells. In the case of \(\text {CD34}^+\) cell relaxation, the linear regression line is given by the slope \(\lambda ^+ \approx 0.418\), and for \(\text {CD34}^-\) cells, the slope is \(\lambda ^- \approx 0.422\). We estimate the parameter \(\lambda\) by the average of these two slopes, \(\lambda \approx 0.42\). Figure 4 shows that the estimate of the exponential growth rate is in good agreement with the experimental data.
Calibration of parameters.
A study of the maximum for each day and each experiment of the “PE-Cy7-A” fluorescence data reveals an \(X_{\max }\) close to 20,000. To reduce the numerical complexity of the 5-parameter optimization \((r_0,r_1,k_{\text {on}},k_{\text {off}},d)\), we will use the estimate of \(\lambda\) to reduce our optimization problem to just 4 parameters. Indeed, using the theoretical relationship (16) and the previous estimate of \(\lambda\), we can define the parameter \(r_0\) as a function of the other model parameters,
For the estimation of the other parameters \(r_1\), \(k_{\text {on}}\), \(k_{\text {off}}\) and d, we use the modified Kantorovich–Rubinstein distance minimization strategy, presented in the Method section and given by the following formula
To determine numerically this minimum, we calculate the modified Kantorovich–Rubinstein distance for the two relaxation experiments, for each point on a large grid of parameter values. We then adjust our grid to obtain the location of the minimum of the sum of two distances. This method gives us the following four parameter values \(r_1^\star = 0.426\), \(k_{\text {on}}^\star = 0.261\), \(k_{\text {off}}^\star = 19.178\) and \(d^\star =0.21\). Following this optimization strategy, the optimal choice for the function r is to choose \(r_1\) such that \(r(x) = r_0\times (1-x)\). Nevertheless, we will see in the next section that the choice of \(r_1\) is not decisive for a good fit between the model result and the experimental data.
Finally, using the relation, \(s= d X_{\max }\), we can calculate the value of the synthesis rate, \(s=4214\). All parameter value estimates are given in Table 1.
We compared those values to the litterature. The estimated half-life of CD34 in this model was estimated to be equal to log(2)/0.261, that is about 4 h. This is in the low range of the estimated distribution for proteins half-life [29]. Regarding the \(X_{\max }\) estimation, its value is in the range of observed value, slightly over the median that is \(1.6 \times 10^4\) [29]. The estimated \(k_{\text {on}}\) value gives an estimated frequency of 4 (1/0.261) bursts per hour on average, which is close from the expected range from a burst every 30 min to up to 10 h [30]. The \(k_{\text {off}}\) value display the expected ratio (kon \(<<\) koff an d1\(<<\) koff) in the case of a bursty regime [9]. Altogether all of our estimated parameters are thus in the expected range.
Profile likelihood.
To investigate the robustness of our parameter estimates and the significance of each parameter in minimizing the modified Kantorovich–Rubinstein distance, we employ an approach analogous to the profile likelihood concept [31, 32] in the context of our optimization problem.
First, we examine the influence of the parameter d. Let d be fixed, we calculate, in the same way, the triplet of parameters \((r_1,k_{\text {on}},k_{\text {off}})\) that minimizes the modified Kantorovich–Rubinstein distance under the fixed d constraint. These optimal parameters are, therefore, functions dependent on d, denoted as \(\widehat{r_1}\), \(\widehat{k_{\text {on}}}\), and \(\widehat{k_{\text {off}}}\). Mathematically, they are defined by the following relation
Once these functions have been calculated, we can determine the modified Kantorovich–Rubinstein distance associated with them, denoted by \(S_{d}\) and defined by the following equality,
For the argument of the function, we choose \(d/d^\star\), to study the distance associated with the relative variation of the optimal parameter. By definition of the function \(S_{d}\), it reaches its minimum at \(d=1\), corresponding to \(d=d^\star\). Similarly, we can define the functions \(S_{k_{\text {on}}}\), \(S_{k_{\text {off}}}\), and \(S_{r_1}\).
In Fig. 5.A, we plotted the \(S_{d}\), \(S_{k_{\text {on}}}\) and \(S_{k_{\text {off}}}\) functions. The impact of the relative variation of the two transition rates around the optimal value, on the modified Kantorovich–Rubinstein distance is quite similar. For the degradation rate, d, we note that a fine estimate of this is crucial to obtain good accuracy between the data and the mathematical model.
Conversely, the parameter \(r_1\) has a minor impact on the minimum of the modified Kantorovich–Rubinstein distance. Specifically, when \(r_1\) deviates from its optimal value, new optimal parameter values emerge, resulting in distances very close to the optimal distance. This result is illustrated in Fig. 5.B.
Comparison between model and experimental data. In Fig. 6 we compared data from relaxation experiments with the results of our model for the parameter values presented in Table 1.
To initialize our model on day 2, we use the Eq. (8), it follows this following initial conditions
where \(h_j\) corresponds to the number of cells on day 2 with fluorescence between \(X_{\max } \times x_j\) and \(X_{\max } \times x_{j+1}\), where \((x_j)_j\) corresponds to the uniform discretization of space [0, 1]. That is, the \((h_j)_j\) correspond to the heights of the histogram of the data renormalized by the maximum \(X_{\max }\).
Visually, Fig. 6.A,B reveals a high degree of proximity between the experimental data and the mathematical model. To quantify this closeness, we once again employ the modified Kantorovich–Rubinstein distance. In Fig. 6.C, we represent, by black squares, the temporal evolution of distance between the experimental data of the two relaxation experiments. Due to the antinomic nature of the two experiments, distances are considerable in the early days of the experiment. It then gradually decreases as the two distributions converge towards the stationary distribution. From day 26, both profiles reached the stationary stage. At this point, in the absence of noise, these two distributions are expected to be similar. Therefore, the minimum distance, \(\text {dist} = 0.142\) (illustrated by a dotted line), attained on day 26, corresponds to a reference distance to determine the proximity of two distributions.
In this figure, we also illustrate the distance between the model and the two relaxation experiments. The \(\text {CD34}^+\) cell relaxation experiment is represented by the red crosses, and the \(\text {CD34}^-\) cell relaxation experiment by the green crosses. To quantitatively assess the proximity of the model to experimental data, we employ the reference distance represented by the horizontal line. For the \(\text {CD34}^-\) cell relaxation experiment, we observe that the distances are always less than the reference distance, except on day 13 for which the distance is slightly greater. Concerning the \(\text {CD34}^+\) cell relaxation experiment, this time the distances are more regularly greater than the reference distance, but are still within an acceptable order of magnitude. These results show that our proposed model is very close to the experimental data.
Discussion
Although we had to infer a number of model parameters which could not be deduced from the literature (like for example the half-life of the CD34 protein), the overall fitting ability of our model proved to be quite satisfactory. Using Kantorovich distances, we indeed observed that the model-to-experiment distance was within the range of the experiment-to-experiment distance, so in the range of experimental variability.
We assumed that the proliferation rate would depend upon the level of expression of the very gene that is being modelled. In our case, that proved to be useful since we wanted to fit relaxation data obtained from CD34 expression. CD34 is a known marker for stemness and we hypothesized that, in line with the existing litterature [25], \(\text {CD34}^+\) cells would proliferate less that \(\text {CD34}^-\) more mature cells. We should nevertheless stress that such a behaviour can be true for normal hematopoietic stem cells, but can be questioned regarding cancer stem cells.
The methodology presented in this paper is applicable to any cell systems for which one can perform simultaneously relaxation experiments and proliferation measurements. Biological systems for which the half-life of the protein of interest is known should be preferred, since this will remove the need for estimating an important model parameter.
One of the difficulties we faced when comparing the model’s output with experimental data, lies in the need for common units By default, our model output is a value between 0 (no CD34 expressed) and 1 (maximum level of CD34 expression). FACS data are corrected fluorescent values, that can be negative in the raw acquisition dataset. We therefore processed the data with a gating phase, a shifting phase, and finally normalized them in order to obtain comparable values with the model.
It is crucial to emphasize that within a cell population displaying a stationary distribution of phenotypic states, no cell remains in a permanent state over time. Given a sufficiently long time, one can assume that all cells will have visited all possible states (i.e. all possible values for their surface CD34 expression). In other terms, in the state versus identity long standing debate [33], we clearly side with the view that stemness is an emerging dynamical property
Several points shall be investigated further. The first point that can be enriched is the form of the division rate \(r(x)=r_0-r_1 x\). The linear form ensures the explicit formulation of the stable distribution and facilitates the scaling by \(X_{max}\) but is not necessary for the existence of a profile. Moreover, we made strong assumptions here that the daughter cells have the same concentration of markers than the mother cell and that the division has no impact on the \(\text {on}/\text {off}\) status of the cell. This later is a reasonable assumption in the light of the existence of transcriptional memory [20, 21], but it might be gene-dependent.
One missing aspect of our model is the absence of any explicit death term. On the other hand, an expression independent death rate could immediately be considered by relaxation of the constraint of positive division rate (which would then correspond to a net growth rate). In terms of parameters, this would affect \(r_0\).
Another missing aspect of our model is the fact that the CD34 gene expression is modelled in isolation. It is quite obvious that in cells its expression level will be constrained by its positioning in a complex web of genes-to-genes interactions known as a Gene Regulatory Network (GRN). Inference of such GRNs is a notoriously difficult task (see e.g. [34]), and performing relaxation experiments from such complex objects is yet to be done.
One of the future goal of our work would be to assess its predictive ability. A promising lead would be to go further in the analysis in order to estimate the influence of parameters on the relaxation time. Mathematically, this could be analysed through the spectral gap which is beyond the scope of this work. It would be especially interesting to identify the effects of various parameters on it, in particular the parameter d which represents the degradation rate. Note that in this case, the distribution and the value of \(\lambda\) are expected to change. Interestingly, the model’s prediction in this case could be tested experimentally by modifying the endogenous CD34 protein stability.
Conclusion
In the present work, we proposed a revised two-state probabilistic model for gene expression which explicitly incorporates a proliferation term. This model was analysed and we obtained an analytical solution for our model’s steady state. The same model was then used for simulating the transient behavior of FACS-sorted cells leading to the progressive relaxation towards the steady state distribution. Altogether, our work shows that a two-state description for CD34 gene expression is well suited to explain the relaxation experiments. This support the notion that cells should be seen and modeled as probabilistic dynamical systems.
Availability of data and materials
The data described in this article can be freely and openly accessed at gitlab: https://gitlab.inria.fr/lepoutre_public/relaxation_modeling.
Abbreviations
- GRN:
-
Gene regulatory network
- PDMP:
-
Piecewise deterministic Markov process
References
Nicholson DJ. Is the cell really a machine? J Theor Biol. 2019;477:108–26. https://doi.org/10.1016/j.jtbi.2019.06.002.
Kupiec JJ. A probabilistic theory for cell differentiation, embryonic mortality and DNA c-value paradox. Specul Sci Technol. 1983;6(5):471–8.
Noble D. Genes and causation. Philos Transact A Math Phys Eng Sci. 2008;366(1878):3001–15. https://doi.org/10.1098/rsta.2008.0086.
Schrödinger E. What is life? The physical aspect of the living cell. Cambridge: Cambridge University Press; 1944.
Chang HH, Hemberg M, Barahona M, Ingber DE, Huang S. Transcriptome-wide noise controls lineage choice in mammalian progenitor cells. Nature. 2008;453(7194):544–7.
Kalmar T, Lim C, Hayward P, Munoz-Descalzo S, Nichols J, Garcia-Ojalvo J, Martinez Arias A. Regulated fluctuations in Nanog expression mediate cell fate decisions in embryonic stem cells. PLoS Biol. 2009;7(7):1000149. https://doi.org/10.1371/journal.pbio.1000149.
Sisan DR, Halter M, Hubbard JB, Plant AL. Predicting rates of cell state change caused by stochastic fluctuations using a data-driven landscape model. Proc Natl Acad Sci USA. 2012;109(47):19262–7. https://doi.org/10.1073/pnas.1207544109.
Gillespie DT. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys. 1976;22(4):403–34.
Herbach U, Bonnaffoux A, Espinasse T, Gandrillon O. Inferring gene regulatory networks from single-cell data: a mechanistic approach. BMC Syst Biol. 2017;11(1):105. https://doi.org/10.1186/s12918-017-0487-0.
Ko MS. A stochastic model for gene induction. J Theor Biol. 1991;153:181–94.
Peccoud J, Ycart B. Markovian modelling of gene product synthesis. Theor Popul Biol. 1995;48:222–34.
Raj A, Oudenaarden A. Nature, nurture, or chance: stochastic gene expression and its consequences. Cell. 2008;135(2):216–26.
Albayrak C, Jordi CA, Zechner C, Lin J, Bichsel CA, Khammash M, Tay S. Digital quantification of proteins and MRNA in single mammalian cells. Mol Cell. 2016;61(6):914–24. https://doi.org/10.1016/j.molcel.2016.02.030.
Sarkar A, Stephens M, Separating measurement and expression models clarifies confusion in single cell RNA-seq analysis. bioRxiv, 2020; 2020–0407030007 https://doi.org/10.1101/2020.04.07.030007
Herbach U. Stochastic gene expression with a multistate promoter: breaking down exact distributions. SIAM J Appl Math. 2019;79:1007–29.
Bressloff PC. Stochastic processes in cell biology, vol. 41. Cham: Springer; 2014.
Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. Stochastic MRNA synthesis in mammalian cells. PLoS Biol. 2006;4(10):309.
Karmakar R, Bose I. Graded and binary responses in stochastic gene expression. Phys Biol. 2004;1(4):197.
Laurenti E, Gottgens B. From haematopoietic stem cells to complex differentiation landscapes. Nature. 2018;553(7689):418–26. https://doi.org/10.1038/nature25022.
Fourneaux C, Racine L, Koering C, Dussurgey S, Vallin E, Moussy A, Parmentier R, Brunard F, Stockholm D, Modolo L, Picard F, Gandrillon O, Paldi A, Gonin-Giraud S. Differentiation is accompanied by a progressive loss in transcriptional memory. BMC Biol. 2024;22(1):58.
Phillips NE, Mandic A, Omidi S, Naef F, Suter DM. Memory and relatedness of transcriptional activity in mammalian cell lineages. Nat Commun. 2019;10(1):1208.
Perthame B. Transport equations in biology. Cham: Springer; 2006.
Chereda B, Melo JV. Natural course and biology of cml. Ann Hematol. 2015;94(Suppl 2):107–21. https://doi.org/10.1007/s00277-015-2325-z.
Laperrousaz B, Jeanpierre S, Sagorny K, Voeltzel T, Ramas S, Kaniewski B, Ffrench M, Salesse S, Nicolini FE, Maguer-Satta V. Primitive cml cell expansion relies on abnormal levels of bmps provided by the niche and on bmprib overexpression. Blood. 2013;122(23):3767–77. https://doi.org/10.1182/blood-2013-05-501460.
Sidney LE, Branch MJ, Dunphy SE Dua, Harminder S, Hopkinson, A. Concise review: evidence for cd34 as a common marker for diverse progenitors. Stem Cells. 2014;32(6):1380–9. https://doi.org/10.1002/stem.1661.
Vershik AM. Long history of the Monge–kantorovich transportation problem. Math Intell. 2013;35:1–9.
Castillo-Hair SM, Sexton JT, Landry BP, Olson EJ, Igoshin OA, Tabor JJ. Flowcal: a user-friendly, open source software tool for automatically converting flow cytometry data from arbitrary to calibrated units. ACS Synth Biol. 2016;5(7):774–80.
Horn RA, Johnson CR. Matrix analysis. Cambridge university press, 2012.
Schwanhausser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M. Global quantification of mammalian gene expression control. Nature. 2011;473(7347):337–42. https://doi.org/10.1038/nature10098.
Nicolas D, Phillips NE, Naef F. What shapes eukaryotic transcriptional bursting? Mol BioSyst. 2017;13(7):1280–90.
Venzon D, Moolgavkar S. A method for computing profile-likelihood-based confidence intervals. J Roy Stat Soc: Ser C. 1988;37(1):87–94.
Raue A, Kreutz C, Maiwald T, Bachmann J, Schilling M, Klingmüller U, Timmer J. Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics. 2009;25(15):1923–9.
Zipori D. The nature of stem cells: state rather than entity. Nat Rev Genet. 2004;5(11):873–8.
Ventre E, Herbach U, Espinasse T, Benoit G, Gandrillon O. One model fits all: Combining inference and simulation of gene regulatory networks. PLoS Comput Biol. 2023;19(3):1010962. https://doi.org/10.1371/journal.pcbi.1010962.
Acknowledgements
We thank Ulysse Herbach for his help with inferring 2-states parameter values. We also thank the BioSyL Federation and the LabEx Ecofect (ANR-11-LABX-0048) of the University of Lyon for inspiring scientific events.
Funding
Maxime Estavoyer is funded by ANR PLUME (ANR-21-CE13-0040). For this project, Marion Dufeu was funded by IXXI (http://www.ixxi.fr)
Author information
Authors and Affiliations
Contributions
ME, TL and GR performed the mathematical derivation of the profile. VMS, SL and TV produced the data. MD, OG and ME analyzed and shaped the data. MD and ME performed the numerical simulations. ME performed the parameter inference. ME, OG and TL interpreted the results. ME, OG and TL wrote the manuscript. All authors but TV read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
None
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Estavoyer, M., Dufeu, M., Ranson, G. et al. Modeling relaxation experiments with a mechanistic model of gene expression. BMC Bioinformatics 25, 270 (2024). https://doi.org/10.1186/s12859-024-05816-4
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12859-024-05816-4