Skip to main content

Recursive model for dose-time responses in pharmacological studies

Abstract

Background

Clinical studies often track dose-response curves of subjects over time. One can easily model the dose-response curve at each time point with Hill equation, but such a model fails to capture the temporal evolution of the curves. On the other hand, one can use Gompertz equation to model the temporal behaviors at each dose without capturing the evolution of time curves across dosage.

Results

In this article, we propose a parametric model for dose-time responses that follows Gompertz law in time and Hill equation across dose approximately. We derive a recursion relation for dose-response curves over time capturing the temporal evolution and then specify a regression model connecting the parameters controlling the dose-time responses with individual level proteomic data. The resultant joint model allows us to predict the dose-response curves over time for new individuals.

Conclusion

We have compared the efficacy of our proposed Recursive Hybrid model with individual dose-response predictive models at desired time points. We note that our proposed model exhibits a superior performance compared to the individual ones for both synthetic data and actual pharmacological data. For the desired dose-time varying genetic characterization and drug response values, we have used the HMS-LINCS database and demonstrated the effectiveness of our model for all available anticancer compounds.

Background

One of the most important goal of precision medicine is to predict sensitivity of an anticancer drug to a given patient. Although, patients are most often characterized by their gene expressions, a more precise characterization is obtained by studying their proteomic expressions [1]. Harvard Medical School Library of Integrated Network-Based Cellular Signatures (HMS-LINCS) [2] offers proteomic data for cancer cell lines measured at various time intervals post drug application along with observed apoptosis fractions over different drug concentration and time. Given the evolution of protein expression over drug concentration and time, we would like to predict the apoptosis fractions of individuals over the same drug concentration at 72 hours post application of drug as 72 hours is considered to be the steady-state for cell viability studies [3, 4].

Our data, therefore, consists of a collection of temporally varying dose-response curves for each individual and the predictors are also a collection of temporally varying expressions, for multiple proteins, observed at different drug concentrations. This may appear as a standard function-on-function concurrent regression [5], but several obstacles arise, such as– (a) number of functional predictors exceeds the number of cell lines, (b) protein expression curves are observed at different time points as compared to the dose-response curves with only little overlap. Consequently, standard parametric statistical models cannot be readily applied here. Turning to model-free procedures, Matlock et al. [6] used Random Forest (RF) methodology to analyze the same HMS-LINCS data. While their methodology alleviates the small sample-high dimension problem, it cannot make temporal prediction of dose-response curves in absence of predictor information at the predicting time point. More precisely, given the observed dose-response curve of an individual at 48 hours, it cannot predict the curve at 72 hours. Therefore, the RF model either needs to wait till 72 hours and observe the protein expression curves to predict response, or it requires extrapolating protein expression data to 72 hours and then make predictions.

Standard machine learning (ML) approaches also fail to explicitly take into account a few well established properties of dose-response curves and their temporal evolution. For instance, Haber’s law suggest a monotonic relationship between responses observed at two successive time points at a given dose [7] and this rule, in turn, induces the dose-response curve observed at a later time point to dominate or be dominated by the curves observed at earlier time points [8]. Such constraints cannot be easily built into ML algorithms. Furthermore, these models provide little insight into the steady-state properties of the dose-response curves. Regardless of these shortcomings, several studies have demonstrated superior predictive performance of RF based models in drug sensitivity predictions [911].

To alleviate some theoretical restrictions of the foregoing ML approaches, while borrowing the predictive strength of RF methodology, we offer a hybrid model that satisfies some physical laws that dose-response curves are expected to satisfy while retaining a flexible model-free relationship between predictors and responses. In particular, we propose a parametric model for dose-time responses that follows the Gompertz law in time and approximately follows the Hill equation across dose. We derive a recursion relation for dose-response curves over time capturing the temporal evolution and theoretically examine their steady-state behavior. We then specify an RF model connecting the parameters controlling the dose-time responses with individual level proteomic data. The resultant joint model allows us to predict dose-response curves over time for new individuals. The complete fitting code along with a synthetic example can be obtained from GitHub via: https://github.com/dhruba018/Dose_time_Response_Recursive_Model.

Results

We have evaluated the performance of our proposed recursive hybrid model using both synthetic data and HMS-LINCS database mentioned above. Note that, we were forced to limit our analysis to a single dataset since, to our knowledge, HMS-LINCS is the only publicly available source offering functional responses as well as predictors. Furthermore, dimensions of HMS-LINCS datasets are restricted to a handful of drugs and samples with a higher number of predictors in contrast to some common pharmacogenomics databases (e.g., CCLE [12] or GDSC [13]) that provides dose-response curves for hundreds of samples, but with static feature sets. Here, we use HMS-LINCS as our synthetic data generation baseline first and then directly for analysis.

Description of HMS-LINCS datasets

We used two distinct datasets from HMS-LINCS as our predictor and response sets. The predictor set consists of dose-time expression for 21 proteins and the response set contains the mean apoptosis fractions observed in 10 BRAF V600E/D melanoma cell lines over multiple doses and time points [2, 6]. Both protein expressions and apoptosis fractions are available post drug application at 7 dose levels ranging from 3.2 nM to 3.2 μM. However, while the protein expressions are available at 5 different time points (1,5,10,24 & 48 hours post drug application), the apoptosis data is available only for 24,48 & 72 hours post drug application. Apoptosis fraction was computed from the number of apoptotic cells at each dose-time point and the total initial number of cells normalized with DMSO control and then averaged over 4 replicates to produce the mean value. Both sets are available for 5 RAF/MEK inhibitor drugs. More detailed descriptions can be obtained from [2, 6].

For this study, we only use protein expressions observed at 24 & 48 hours, since those expressions match the temporal record of the responses (24,48 & 72 hours). For the 72 hours scenario, we use a time series model for data extrapolation (elaborate description is provided below in “Application on HMS-LINCS data” section). Moreover, only 14 out of 21 proteins have complete records in the covariate set, therefore, we only use these 14 proteins as our final predictors.

Simulation study

To demonstrate the efficacy of our proposed model, we have performed a simulation study involving two synthetic datasets with 7 subjects, 7 dose levels, and 8 time points each (m=7, D=7, T=8). The predictor set contains the expression of 14 different covariates at each dose-time point (P=14). For detailed explanation of the terms used here, look into the “Methods” section.

Synthetic data generation

We first form the slope coefficient matrix in Eq. (20) by using the difference between HMS-LINCS protein expressions \(x_{t, d, i}^{(p)}\) at 24 & 48 hours post application of the drug AZ-628 as the base βd,i and add random noise to create distinction between subjects. This yields a (βd,i)49 × 14 matrix per subject (for 7 time point differences and 7 doses). We also use the 24 hour expression data post AZ-628 application in cell line C32 as baseline covariates and add random noise to create our predictors per subject \(z_{t, d, i}^{(p)}, \quad p = 1, 2, \cdots, 14\).

$$\begin{array}{*{20}l} {} \beta_{d, i}^{(p)} &\!:=\! \frac{dx_{\cdot, d, i}^{(p)}}{dt} \,=\, x_{48, d, i}^{(p)} \,-\, x_{24, d, i}^{(p)} + \nu_{i}, \ \ \nu_{i} \sim \mathcal{U} \big(-0.01, 0.02 \big) \end{array} $$
(1)
$$\begin{array}{*{20}l} {}z_{0, d, i}^{(p)} &= x_{24, d, C32}^{(p)} + \nu'_{i}, \qquad \nu'_{i} \sim \mathcal{U} \big(-0.5, 0.5 \big) \\ {}z_{t, d, i}^{(p)} &= z_{t^{-}, d, i}^{(p)} + \beta_{d, i}^{(p)} \Big\vert_{\Delta t \, = \, t \, - \, t^{-}}, \qquad t = 1, 2, \cdots, 7 \end{array} $$
(2)

For the output, we create a response matrix V8 × 7 (D=7, T=8) per subject where dose-response values for the initial time epoch are extracted from the 4 parameter sigmoidal dose-response model g(d) in Eq. (8) assuming a, b and θ as fixed but ci (i.e., EC50) to be varying with each subject i. To generate the responses for the remaining 7 time points, we estimate the growth and scaling coefficients (αd,i, γd,i in (20)) as linear models of the slope coefficient vector βd,i with random weight vector sets and take the final estimates as the maxima of the estimates at current and immediately previous doses (following property (ii) of the recursive model in (14)). We assume the 7 dose levels to be linearly spread in the interval [0,1].

$$ {}\begin{aligned} {\alpha}_{d, i} &\!= w_{\alpha, i}^{(0)} + w_{\alpha, i}^{(1)} \beta_{d, i}^{(1)} + \cdots + w_{\alpha, i}^{(14)} \beta_{d, i}^{(14)} = w_{\alpha, i}^{(0)} + \boldsymbol{\beta}_{d, i} \boldsymbol{w}_{\alpha, i} \\ {\gamma}_{d, i} &\!= w_{\gamma, i}^{(0)} + w_{\gamma, i}^{(1)} \beta_{d, i}^{(1)} + \cdots + w_{\gamma, i}^{(14)} \beta_{d, i}^{(14)} = w_{\gamma, i}^{(0)} + \boldsymbol{\beta}_{d, i} \boldsymbol{w}_{\gamma, i} \\ \hat{\alpha}_{d, i} &\,=\, \max\!\big(\!\alpha_{d, i},\!\alpha_{d^{-}, i\!} \big), \ \ \hat{\gamma}_{d, i} = \max\!\big(\gamma_{d, i},\!\gamma_{d^{-}, i} \big), \ \ d = 2, \cdots, 7 \\ \end{aligned} $$
(3)

where \(w_{\cdot, i}^{(p)} \sim \mathcal {U}(0, 1)\) are randomly chosen weight values for protein p in subject i. We then use this \(\hat {\alpha }_{d, i},\ \hat {\gamma }_{d, i}\) in the one-step prediction relation in Eq. (21) to generate responses at t>0. Figure 1 illustrates the monotonicity of drug response surfaces over dose and time for a representative subject with different levels of additive noise in response. An additional section displays the dose-time response surfaces for all 7 subjects [see Additional file 1: Figures S1-S3].

$$ \begin{aligned} \tilde{\alpha}_{d, i} &= \hat{\alpha}_{d, i} \Big\vert_{\Delta t \, = \, t \, - \, t^{-}} \quad \tilde{\gamma}_{d, i} = \hat{\gamma}_{d, i} \Big\vert_{\Delta t \, = \, t \, - \, t^{-}} \\ v_{t, d, i} &= v_{t^{-}, d, i} \; e^{\, \tilde{\gamma}_{d, i} \left(1 \, - \, e^{-\tilde{\alpha}_{d, i}} \right)}, \quad t = 1, 2, \cdots, 7 \end{aligned} $$
(4)
Fig. 1
figure 1

Dose-time response surfaces for synthetic data under various additive noise conditions

Dose-time response prediction

For the simulation study, we assume that the responses are available for the 7 initial time epochs while we predict for the last epoch with both our proposed hybrid recursive model and a standard RF model for each individual. To train these RFs, we use the protein expressions and responses at t<7 as predictors and output respectively and predict for the expression values at t=7. We also analyze the effect of noise in response values. Two different scenarios are shown in Fig. 1 along with the noiseless case, where the additive noise values are sampled respectively from \(\mathcal {U}\!\left (-\delta, \delta \right)\) & \(\mathcal {N}\!\left (0, \textstyle {\frac {\delta ^{2}}{4}} \right)\hspace {-0.3em},\ \delta = 0.05\) and incorporated in (4). Figure 2 displays the predicted dose-response curves overlaid with the actual responses for all 3 scenarios. For objective performance measure, we also perform comparisons between mean square prediction errors (MSPE) for both hybrid and individual models in all 3 scenarios, as illustrated in Table 1.

Fig. 2
figure 2

Predicted synthetic data dose-response curves at t=7 overlaid with the actual dose-response curves

Table 1 Mean square prediction errors (MSPE) for recursive Hybrid Model and Individual RF Models for synthetic data

Application on HMS-LINCS data

As mentioned earlier, HMS-LINCS protein expression and mean apoptosis fraction sets contain data for 10 different cell lines at 7 different doses of 5 different drugs. We only keep protein expression data at 24 & 48 hours post drug application in our predictor set and while the response set contains the complete mean apoptosis fraction data. However, only 7 out of 10 cell lines have complete record on apoptosis fractions and these 7 cell lines form our training set. The remaining 3 cell lines with partial records is used to validate our model. Since we are using protein expressions at only 2 times points, we simply put their differences as predictors \(\beta _{d, i}^{(p)}\) for the RF regression models (in (17)) to predict αd, γd.

$$ \begin{aligned} {\beta}_{d, i}^{(p)} &= x_{48, d, i}^{(p)} \, - \, x_{24, d, i}^{(p)}, \quad p = 1, 2, \cdots, 14 \\ \hat{\alpha}_{d, i} &= \text{RF}_{\alpha}\left({\beta}_{d, i}^{(1)} \;, {\beta}_{d, i}^{(2)} \;, \cdots, \; {\beta}_{d, i}^{(14)} \right) \\ \hat{\gamma}_{d, i} &= \text{RF}_{\gamma}\left({\beta}_{d, i}^{(1)} \;, {\beta}_{d, i}^{(2)} \;, \cdots, \; {\beta}_{d, i}^{(14)} \right) \end{aligned} $$
(5)

Out of the 3 validation cell lines, 2 cell lines (MMAC-SF and SKMEL28) have apoptosis records for 48 & 72 hours, but the 24 hours records are missing. The other cell line (K2) has apoptosis records for 24 & 48 hours with missing 72 hours records. So, for the former two, we use the 48 hours data as baseline and generate 72 hours prediction while for K2 we use the 24 hours data as baseline and generate 48 hours prediction. A cartoon representation of the prediction procedure is shown in Fig. 3.

Fig. 3
figure 3

Prediction of dose-response functions of apoptosis fraction from dose-expression functions of multiple proteins

We perform the one-step ahead prediction in (21) for all 5 drugs and calculate the associated mean MSPE over 3 test cell lines, as shown in Table 2. Similar to the simulation study, we compare these results with mean MSPE for standard individual dose-time RF models to put the performance into perspective. For RF training, we use the observed apoptosis fraction at each dose at 72 hours as our responses and the set of protein expressions at the corresponding dose level at 72 hours as our feature set. Since the protein expressions are not available at 72 hours, we fit a cubic spline to the expression values at each dose level for the 5 time points available and extrapolate to 72 hours for the required feature set. We use this extrapolated expressions to predict the apoptosis fractions at each dose for cell lines MMAC-SF and SKMEL28. For K2, we use the available 48 hour covariate data to train the RFs and perform prediction. We also plot the predicted dose-response values from both models with the observed values in Fig. 4 for 3 representative cell line – drug combination scenarios.

Fig. 4
figure 4

Predicted dose-response curves from Hybrid model and individual RF models at 48/72 hours for 3 representative cell line – drug combinations. The observed dose-response curves are also overlaid

Table 2 Mean square prediction errors (MSPE) for recursive Hybrid Model and Individual RF Models for HMS-LINCS data

Discussion

From the MSPE results in Tables 1 and 2, we can infer that the hybrid model predictions fit the actual dose-response curves significantly better than the individual RF models. For synthetic data, the hybrid model shows a mammoth decrease in MSPE values in Table 1 even when noise is present. Specifically, the overall mean MSPE for hybrid model is 22 times less than the mean MSPE for individual models in noiseless scenario and 11 times less in noisy cases. For the HMS-LINCS results in Table 2, the overall mean MSPE achieved by our model is 0.0044, which is a staggeringly 2.5 times less than the mean MSPE produced by the individual models (0.0105). We can also reach the same conclusion from examining the predicted dose-response curve fits in Figs. 2 and 4. For synthetic data, the predicted curves closely follow the observed dose-response variations even in the presence of significant noise, as demonstrated in Fig. 2 for subject 7 (An additional section illustrates the dose-response predictions for all 7 subjects [see Additional file 1: Figures S4 - S6]). For HMS-LINCS data, we have displayed predictions for 3 representative cases (out of 15 cell line – drug combinations) which also demonstrates the efficacy of our proposed hybrid model. These results bolster our claim that the recursive hybrid model is a much superior predictor of the dose-time response behavior compared to the standard RF methodology.

One important property of the hybrid model is that– given an initial dose-response curve, all dose-response curves at the subsequent time epochs inherit the theoretical properties of the initial curve (following (13)). Therefore, our model cannot accommodate situations where the properties of dose-response curve changes with time. This results in the 3 most glaring mismatches between the observed and predicted dose-response curves for the cell line – drug combinations: MMAC-SF – PLX4720, MMAC-SF – Vemurafenib, and SKMEL28 – SB590885. For these cases, the observed dose-response curves at 72 hours are partially observed sigmoidals where we do not observe the upper asymptotes. Since the Gompertzian kinetics enforces an upper asymptote proportional to the upper asymptote observed at t=0 (following (16)), the model becomes ill-scaled. Due to this susceptibility of our model to misspecification, a naïve RF model outperforms our model in all three situations, as shown in Fig. 5 (An additional section shows the predictions for all 15 cases [see Additional file 1: Figures S7 - S9]). Furthermore, although in 12 out of 15 test cases, our hybrid model outperforms RF, there exists some mismatch between the observed and predicted data. The main reason for this discrepancy is the imprecision associated with the estimates of αd,i & γd,i. Observe that for all dose points, we are estimating the two Gompertzian parameters from only 3 time points. With such limited data, the point estimates are bound to have large uncertainties, reflected by the mismatches. However, an increase in time points can reduce the imprecisions as illustrated in simulation study, where we have included 8 time points and observed a much closer fit for the predicted curves from the hybrid model to the observed (synthetic) dose-response curves, even with the presence of noise (Fig. 2).

Fig. 5
figure 5

Predicted dose-response curves from Hybrid model and individual RF models at 72 hours for 3 cell line – drug combinations for which RF models outperform the Hybrid model. The observed dose-response curves are also overlaid

Model scalability

The limited nature of HMS-LINCS datasets is precisely the reason we decided on conceptualizing a hybrid model. As mentioned above, to our knowledge, HMS-LINCS is the only publicly available database providing both functional predictor and response data. Several standard ML approaches that have been demonstrated to work well with large pharmacological datasets in literature [10], from where we have chosen Random Forest as our regression model due to its superior predictive capabilities [9, 11] as well as efficient handling of large datasets [14]. Moreover, note that there are only a few functional predictors (i.e., 14) and therefore, in terms of scalability, we could handle a large number of samples with the same number of functional predictors using the standard function-on-function regression easily without much demand on the computational resources. The computational burden may appear to escalate if we increase the number of functional predictors while keeping the sample size fixed, but this will be significantly alleviated by the RF modeling that connects the Gompertzian parameters with the slopes of the predictors. Therefore, the posited model is easily scalable for both large sample-size and large feature-size without much demand on computational resources. The time aspect is also not problematic since majority of the studies report observations at 72 hours [3, 4]. Given this time horizon, if the temporal resolution is very high, we can always coarsen the resolution at hourly (or daily) scale. The real bottleneck is the number of dose levels which can enforce a large number of optimization operations since the Gompertzian parameters are estimated sequentially at each time point. However, in reality, we do not expect too many dose levels due to the cost involved in data collection at finely resolved dosages.

We, therefore, need to deal with only two effective dimensions of scalability– (a) increase in the number of subjects, and (b) increase in the number of predictors. The model is set up to be simultaneously scalable in both these dimensions. Moreover, the response model for each subject is modeled independently to obtain subject-specific Gompertzian parameters, and therefore, estimation for each subject is trivially parallelizable. Once the parameter estimates are obtained, an RF model is deployed to connect the covariates with response parameters. To put these above discussion into perspective, we can look into the execution time for hybrid model prediction. For our experiments, the synthetic data case roughly takes 1.1 sec to fit for 6 subjects and predict dose-time responses for subject 7, while for HMS-LINCS data, it takes 0.785 sec to fit the 7 training cell lines and predict temporal dose-responses for all 3 validation cell lines.

Conclusions

We have developed a recursive hybrid methodology to model dose-time responses of individuals characterized by a set of functional covariates. Instead of directly connecting the observed responses with the observed covariates, we have taken a stepwise approach where the responses are modeled separately according to a parametric specification and the parameters of the response models are connected to the covariates via RF regression models. Empirical results suggest that our model provides significant improvement, in terms of MSPE, as compared to a naïve RF model directly connecting responses with covariates. The main strength of our methodology is that it can incorporate additional information about the expected behavior of the responses while letting machine learning methodology drive the processes on which we have no prior information about their expected behavior. We have theoretically shown some desirable properties of our hybrid model and demonstrated its predictive capabilities.

However, all these properties are contingent on proper specification of initial condition. Since all the later dose-response curves inherit the properties of the initial curve, our model cannot account for temporal variation in dose-response curve properties leading to a naïve RF model outperforming it (see Fig. 5). Furthermore, all our predictions are predicated on the availability of a baseline dose-response curve. To handle the situations where an individual comes along with only a set of baseline covariates (say, gene expression), we propose to specify a regression model connecting the baseline dose-response curves in the training set with their respective covariates using either a functional regression approach [5] or a fully data-driven functional random forest approach [15]. We can then use the covariates of a new individual to predict the associated baseline dose-response curve. This will provide us with the necessary initial conditions to utilize our recursive framework to generate the entire collection of dose-response curves at prespecified time epochs. We propose to investigate this approach in future studies.

We also note that the current model is devised to demonstrate that incorporating some known biological properties can bring about significant improvement in predictions. The model is set up to be computationally fast and easily scalable as number of subjects and predictors increase. However, in order to achieve computational efficiency, we forego the opportunity to perform a full blown likelihood based inference. We are currently investigating a Bayesian hierarchical specification given by

$$\begin{array}{*{20}l} y_{t, d, i} &= \mu_{t, d, i} + \varepsilon_{t, d, i} \end{array} $$
(6)

with

$$ \begin{aligned} \mu_{t, d, i} &= \mu_{0, d, i} \, e^{\gamma_{d, i} \left(1 \, - \, e^{-\alpha_{d, i} t} \right)} + \eta_{t, d, i} \\ \mu_{0, d, i} &\sim \mathcal{TN} \left(\left[ a_{0} + \frac{b_{0} - a_{0}}{1 + \left(\frac{c_{0}}{d} \right)^{\!\theta_{0}}} \right]\!,\ \sigma_{0}^{2} \right) \end{aligned} $$
(7)

where \(\mathcal {TN}(\mu, \sigma ^{2})\) is a truncated Normal distribution on an appropriate compact support, while ε, η are independent Gaussian noises with appropriate supports that match the support of the response. This hierachical model in (6 - 7) offers a formal stochastic extension of the posited hybrid model and the subsequent posterior analysis will allow us to create posterior predictive bands for the test cases, thereby offering insights into the adequacy of our model.

Methods

The following sections provide a detailed analysis of our proposed Recursive Hybrid Model along with the desired underlying properties.

Model specification

Let yt,d,i be the mean apoptosis fraction at time t and dose d for subject i, d=1,2,,D; t=1,2,,T; i=1,2,,m. We suppress the subscript i for the moment and develop a dose-time model for each individual. A simple parametric model for dose-time responses can be written as

$$ \begin{aligned} y_{t, d} &= f(t) \, g(d) + \varepsilon \\ \text{where}\\ &f(t) = a_{0} \, e^{\gamma \left(1 \, - \, e^{-\alpha t} \right)} \\ &g(d) = a + \frac{b - a}{1 + \left(\frac{c}{d} \right)^{\!\theta}} \end{aligned} $$
(8)

where f(t) is a Gompertz model in time t with α controlling the growth rate and g(d) is the sigmoidal model in dose d with a being the lower asymptotic response at d=0, b being the upper asymptotic response at d=, c is often interpreted as EC50, θ is the Hill slope i.e., the slope at the steepest part of the sigmoidal curve and ε is usually assumed to be Gaussian noise. Cross-sectionally, the four parameter sigmoidal function g(d) is widely used in dose-response studies [1620] while longitudinally, the two parameter Gompertz model f(t) is arguably the most popular growth model in tumor modeling efforts [16, 2124]. Therefore, both g(d) and f(t) are well suited if we wish to model the corresponding marginal processes separately. However, assuming a separable product model describing the joint process enforces some unjustifiable assumptions. For instance, such a separable model induces dose invariant temporal growth rate which contradicts Haber’s law [7] and also empirical observations suggesting growth rate at higher dose is significantly different from that observed at small dose levels [6].

To remove separability, we can introduce dose-dependent parameters in f(t) and time-dependent parameters in g(d). However, specification of time-dependent parameter makes temporal prediction of responses quite challenging. Furthermore, inclusion of both time-dependent and dose-dependent parameters in Eq. (8) will make the model heavily overparametrized. Therefore, to guarantee the estimability of these parameters, we need to enforce some dependence among these parameters. We capture this dependence by specifying a temporally recursive model for dose-response curves which implicitly induces dependence among the parameters. The recursion relation enables us to generate temporal predictions without incorporating the foregoing dependence explicitly.

Gompertzian recursion

Broadly speaking, our strategy is to fit a sigmoidal dose-response curve at the first observed time epoch and then enforce the subsequent dose-response curves to follow Gompertzian law at each dose point. Gompertzian kinetics essentially suggests exponential growth with exponentially decaying growth coefficient. Several empirical studies have reported Gompertz model to best fit tumor growth data [23, 25, 26]. Now, in order to effectively reduce the size of tumor, the kill rate induced by the drug should mimic the tumor growth pattern and hence we expect the dynamics of apoptosis fraction to follow Gompertzian kinetics too. To illustrate our conceptualization of Gompertzian law, we drop the subscript d momentarily. We begin with Ricker’s parametrization [27] of Gompertz model given by

$$ n_{t} = n_{0} \, e^{\gamma \left(1 \, - \, e^{-\alpha t} \right)}, \quad \gamma \geq 0, \; \alpha \in \mathbb{R} $$
(9)

where nt is expected value of the trait (number, density, etc.) under consideration at time t, n0 is the initial value of the trait giving the starting point on the growth curve, α is the growth coefficient and γ controls the upper asymptote of the growth curve (n0 eγ) by scaling the curve vertically (i.e., the scaling coefficient). In our situation, the focal trait is the mean apoptosis fraction. Therefore, our temporal model for the apoptosis fraction is posited as

$$ y_{t} = y_{0} \, e^{\gamma \left(1 \, - \, e^{-\alpha t} \right)}, \quad \gamma \geq 0, \; \alpha \in \mathbb{R} $$
(10)

where y0 is viewed as the mean apoptosis fraction observed immediately after drug application. Now, we can insert the dose subscript d in (10) and specify a sigmoidal dose-response model for y0, i.e.

$$ y_{0, d} = a_{0} + \frac{b_{0} - a_{0}}{1 + \left(\frac{c_{0}}{d} \right)^{\!\theta_{0}}} $$
(11)

Inserting (11) in (10) yield a model for expected dose-response at time t and dose d as

$$ y_{t, d} = \left[ a_{0} + \frac{b_{0} - a_{0}}{1 + \left(\frac{c_{0}}{d} \right)^{\!\theta_{0}}} \right] e^{\gamma \left(1 \, - \, e^{-\alpha t} \right)} \\ $$
(12)

It is easy to see (12) yields the following recursion relation between the apoptosis fractions observed at two consecutive time points at each dose d

$$\begin{array}{@{}rcl@{}} y_{t+1, d} = y_{t,d} \, e^{\left[ \gamma \left(1 \, - \, e^{-\alpha} \right) e^{-\alpha t} \right]} \end{array} $$
(13)

Now, observe that if we assume α and γ to be dose invariant in (12), since \(e^{\gamma \left (1 \, - \, e^{-\alpha t} \right)}\) is a non-negative quantity, yt,d inherits the sigmoidal property from y0,d. Therefore, the collection of dose-response curves over time are sigmoidal in dose and Gompertzian in time. Furthermore, Eq. (13) connects two sigmoidal curves at consecutive time epochs with the Gompertzian law implicitly inducing temporal dependence among sigmoidal parameters. Evidently, the recursion relation allows prediction in time despite having implicit time-dependent sigmoidal parameters.

Properties of the recursive model

Observe that in Eq. (12), the assumption of dose-invariant Gompertzian parameters yields an unrealistic scenario. Particularly at d=0, the lower asymptotes of the sigmoidal curves shifts with time. It is hard to justify why apoptosis fraction will change with time even though no drug is administered. To alleviate such drawbacks, we make both growth and scaling coefficients dose-dependent. Consequently, (12) becomes

$$ \begin{aligned} y_{t, d} &= \left[ a_{0} + \frac{b_{0} - a_{0}}{1 + \left(\frac{c_{0}}{d} \right)^{\!\theta_{0}}} \right] e^{\gamma_{d} \big(1 \, - \, e^{-\alpha_{d} t} \big)} \end{aligned} $$
(14)

while the recursion relation between two consecutive time points at dose d is still given by (13) after incorporating αd, γd. To investigate the asymptotic property of (14), we impose the following reasonable constraints

  • Since apoptosis fraction is expected to increase as drug concentration increases, we assume that both αd\(\left (\in \mathbb {R}\; \forall d \geq 0 \right)\) and γd\(\left (\in \mathbb {R}^{+}\cup \{0\} \; \forall d\geq 0\right)\) are non-decreasing functions of dose d.

  • \({{\lim }_{d \, \to \, 0} {\gamma _{d}} = 0}\)

  • Apoptosis fraction is bounded in [0,1], therefore, we assume that \({{\lim }_{d \, \to \, \infty } {\alpha _{d}} = \alpha ^{*}}\) and \({{\lim }_{d \, \to \, \infty } {\gamma _{d}} = \gamma ^{*}}\) where α, γ are finite positive constants that satisfy the bounded property of apoptosis fractions.

Under these assumptions, we have the following properties of the dose-response curve at time t, obtained from (14).

  1. (i)

    At d=0, assumption (b) forces the asymptotes of the sigmoidal curves to be stationary at a0 regardless of time. This is justifiable since we do not expect to see any change in apoptosis fraction from the baseline a0 without the application of a drug, regardless of how much time has elapsed.

  2. (ii)

    Given a particular time t, assumption (a) makes \(\phantom {\dot {i}\!}e^{\gamma _{d} \big (1 \, - \, e^{-\alpha _{d} t} \big) }\) an increasing function of dose d. If y0,d is also an increasing function of d, then so is yt,d (following (14)) with lower asymptote at a0 and upper asymptote at \(b_{t} = b_{0} e^{\gamma ^{*} \big (1 \, - \, e^{-\alpha ^{*} t} \big)}\). From assumption (c), α, γ>0, therefore b0>a0bt>a0.

  3. (iii)

    Since bt is an increasing function of time, the upper asymptotes of the dose-response curves increases with time indicating the expected greater efficacy at higher dose levels. In general, since the dose-response curves at each time epoch is monotonically increasing in dose (assuming y0,d is increasing in dose), Eq. (14) suggests dose-response curves at later time points will dominate the curves observed at initial time points. Also, the fact that the apoptosis fraction is bounded above by 1 is easily realized by specifying \(b_{0} < e^{-\gamma ^{*}}\).

  4. (iv)

    Eq. (13) and assumption (c) suggest that

    $$\begin{array}{*{20}l} {\lim}_{t \to \infty} \left[ \frac{y_{t+1, d}}{y_{t, d}} \right] = 1 \end{array} $$
    (15)

    indicating that the area under the curve (AUC) of the dose response curves do not become arbitrarily large with time but achieves a theoretical steady-state. A graphical representation of the above properties of our proposed recursive model (Eq. (14)) is shown in Fig. 6.

    Fig. 6
    figure 6

    An illustration of the behavior of the recursive Hybrid model developed in Eq. (14) for various dose-time points. The dose-response curve at t=50 (green) cannot be distinguished from the curve at t=100 (indigo) demonstrating the asymptotic behavior of the dose-responses curves

Regression model for individuals

In the preceding sections, we have discussed the dose-time model for responses observed for a single individual. We have several such individuals, each characterized by a set of protein expressions observed over both dose and time. Let \(x_{t, d, i}^{(p)}\) denote the observed expression for protein p in subject i at dose d and time t, p=1,2,,P, t=1,2,,T, d=1,2,,D, i=1,2,,m. We now connect these \(x_{t, d, i}^{(p)}\) values with the individual dose-time model specified in (14) by introducing the subscript i, which yields

$$ \begin{aligned} y_{t, d, i} &= \left[ a_{0, i} + \frac{b_{0, i} - a_{0, i}}{1 + \left(\frac{c_{0, i}}{d} \right)^{\!\theta_{0, i}}} \right] e^{\gamma_{d, i} \left(1 \, - \, e^{-\alpha_{d, i} t} \right)} \\ \end{aligned} $$
(16)

and the recursion relation between consecutive time points for an individual i at dose d is still given by (13) after incorporating the parameters αd,i, γd,i.

For each individual, we begin with fitting a sigmoidal dose-response curve at t=0 to estimate a0,i, b0,i, c0,i, θ0,i. We then posit that, for each individual, the growth and scaling coefficients (αd,i & γd,i) are determined by the rate of change in protein expressions over time i.e., fit a temporal trend model for each \(x_{\cdot, d, i}^{(p)}\) time series and use the slope coefficients\(\beta _{d, i}^{(p)} \, \bigg (\,=\,\frac {dx_{\cdot, d, i}^{(p)}}{dt}\!\bigg)\) as the predictors in two Random Forest (RF) regression models in (17). However, since αd,i, γd,i are not observed, we plug their estimates in LHS.

$$ \begin{aligned} \alpha_{d, i} &= \text{RF}_{\alpha}\!\left(\beta_{d, i}^{(1)}, \beta_{d, i}^{(2)}, \cdots, \beta_{d, i}^{(P)} \right)\\ \gamma_{d, i} &= \text{RF}_{\gamma}\!\left(\beta_{d, i}^{(1)}, \beta_{d, i}^{(2)}, \cdots, \beta_{d, i}^{(P)} \right) \end{aligned} $$
(17)

The algorithm for the stepwise fitting procedure is given below.

Stepwise fitting algorithm

  • At t=0, fit a sigmoidal curve to the observed dose-responses for each individual i and obtain the estimates \(\hat {a}_{0, i}\), \(\hat {b}_{0, i}\), \(\hat {c}_{0, i}\), \(\hat {\theta }_{0, i}\) and hence obtain \(\hat {y}_{0, d, i}\).

  • For each individual in the training set at each dose level, obtain the least square estimates of αd,i & γd,i using the recursion relation in (13). That is

    $$\begin{array}{*{20}l} {}\left(\tilde{\alpha}_{d, i}, \, \tilde{\gamma}_{d, i} \right) \,=\, \underset{\alpha_{d, i}, \, \gamma_{d, i}}{\text{argmin}} \sum_{t = 1}^{T} \!\left[ y_{t, d, i} \,-\, \hat{y}_{t^{-}, d, i} \, e^{\gamma_{d, i} \left(e^{-\alpha_{d, i} t^{-}} - e^{-\alpha_{d, i} t} \right)} \right]^{2} \end{array} $$
    (18)

    where \(\hat {y}_{t^{-}, d, i}\) is the dose-time response at the immediate preceding time point t. Usually, we perform a one-step ahead prediction for a specific time epoch t, considering t=0 and t=1 in (18), which yields

    $$\begin{array}{*{20}l} {}\left(\tilde{\alpha}_{d, i}, \, \tilde{\gamma}_{d, i} \right) = \underset{\alpha_{d, i}, \, \gamma_{d, i}}{\text{argmin}} \sum_{t = 1}^{T} \!\left[ y_{t, d, i} \, - \, \hat{y}_{t^{-}, d, i} \, e^{\gamma_{d, i} \big(1 \, - \, e^{-\alpha_{d, i}} \big)} \right]^{2} \end{array} $$
    (19)

    To satisfy assumption (c), we take the final estimates to be \(\hat {\alpha }_{d, i} = \max \!\left (\tilde {\alpha }_{d, i},\ \tilde {\alpha }_{d^{-}, i} \right)\) and \(\hat {\gamma }_{d, i} = \max \!\left (\tilde {\gamma }_{d, i},\ \tilde {\gamma }_{d^{-}, i} \right)\) where \((\cdot)_{d^{-}, i}\phantom {\dot {i}\!}\) is the estimate of that parameter obtained at immediately preceding dose level d.

  • Fit a trend model to the time series of each protein expression for each subject at each dose level and obtain the slope coefficients \(\beta _{d, i}^{(1)}, \beta _{d, i}^{(2)}, \cdots, \beta _{d, i}^{(P)}\).

  • If the training set consists of m individuals, then at each dose d, define m×1 vectors \(\hat {\boldsymbol {\alpha }}_{d},\ \hat {\boldsymbol {\gamma }}_{d}\) and m×P matrix βd as

    $$ {}\begin{aligned} \hat{\boldsymbol{\alpha}}_{d} &= \left[\begin{array}{cccc} \hat{\alpha}_{d, 1} & \hat{\alpha}_{d, 2} & \cdots & \hat{\alpha}_{d, m} \end{array}\right]^{T} \\ \hat{\boldsymbol{\gamma}}_{d} &= \left[\begin{array}{cccc} \hat{\gamma}_{d, 1} & \hat{\gamma}_{d, 2} & \cdots & \hat{\gamma}_{d, m} \end{array}\right]^{T} \\ \boldsymbol{\beta}_{d} &= \left[\begin{array}{cccc} \beta_{d, 1}^{(1)} & \beta_{d, 1}^{(2)} & \cdots & \beta_{d, 1}^{(P)} \\ \beta_{d, 2}^{(1)} & \beta_{d, 2}^{(2)} & \cdots & \beta_{d, 2}^{(P)} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{d, m}^{(1)} & \beta_{d, m}^{(2)} & \cdots & \beta_{d, m}^{(P)} \end{array}\right] = \left[\begin{array}{c} \boldsymbol{\beta}_{d, 1} \\ \boldsymbol{\beta}_{d, 2} \\ \vdots \\ \boldsymbol{\beta}_{d, m} \end{array}\right] \end{aligned} $$
    (20)

    Use βd as the covariate matrix and \(\hat {\boldsymbol {\alpha }}_{d},\ \hat {\boldsymbol {\gamma }}_{d}\) as the output to train the RFs in (17) modeling the Gompertzian parameters.

  • To predict for a new individual, use the corresponding matrix \(\boldsymbol {\beta }_{d}^{\, \text {new}}\) as input to get \(\hat {\boldsymbol {\alpha }}_{d}^{\, \text {new}},\ \hat {\boldsymbol {\gamma }}_{d}^{\, \text {new}}\). Use the observed baseline dose response for this individual and generate temporal forecast using (16). For the one-step ahead prediction in (19), the recursion relation in (13) simplifies as

    $$\begin{array}{@{}rcl@{}} y_{t, d, i} = y_{t^{-}, d, i} \, e^{\, \gamma_{d, i} \big(1 \, - \, e^{-\alpha_{d, i}} \big)} \end{array} $$
    (21)

Abbreviations

AUC:

Area under the curve

HMS-LINCS:

Harvard medical school library of integrated network-based cellular signatures

MSPE:

Mean squared prediction error

RF:

Random forest

References

  1. Shruthi B, Vinodhkumar P, Selvamani M. Proteomics: A new perspective for cancer. Adv Biomed Res. 2016; 5(1):67. https://doi.org/10.4103/2277-9175.180636.

    Article  Google Scholar 

  2. Fallahi-Sichani M, Moerke NJ, Niepel M, Zhang T, Gray NS, Sorger PK. Systematic analysis of brafv600e melanomas reveals a role for jnk/c-jun pathway in adaptive resistance to drug-induced apoptosis. Mol Syst Biol. 2015; 11(3):797. https://doi.org/10.15252/msb.20145877.

    Article  Google Scholar 

  3. Śliwka L, Wiktorska K, Suchocki P, Milczarek M, Mielczarek S, Lubelska K, Cierpiał T, ŁyŻwa P, Kiełbasiński P, Jaromin A, et al. The comparison of mtt and cvs assays for the assessment of anticancer agent interactions. PloS one. 2016; 11(5):0155772.

    Article  Google Scholar 

  4. Riss TL, Moravec RA, Niles AL, Duellman S, Benink HA, Worzella TJ, Minor L. Cell viability assays. Assay Guidance Manual. 2016. https://www.ncbi.nlm.nih.gov/books/NBK144065/.

  5. Ramsay J, Ramsay J, Silverman BW. Functional Data Analysis, Springer Series in Statistics: Springer; 2005. https://books.google.com/books?id=mU3dop5wY_4C.

  6. Matlock K, Dhruba SR, Nazir M, Pal R. An investigation of proteomic data for application in precision medicine. In: Biomedical & Health Informatics (BHI), 2018 IEEE EMBS International Conference On. USA: IEEE: 2018. p. 377–80.

    Google Scholar 

  7. Bliss C. The relation between exposure time, concentration and toxicity in experiments on insecticides. Ann Entomol Soc Am. 1940; 33(4):721–66.

    Article  CAS  Google Scholar 

  8. Miller FJ, Schlosser PM, Janszen DB. Haber’s rule: a special case in a family of curves relating concentration and duration of exposure to a fixed level of response for a given endpoint. Toxicology. 2000; 149(1):21–34.

    Article  CAS  Google Scholar 

  9. Wan Q, Pal R. An ensemble based top performing approach for nci-dream drug sensitivity prediction challenge. PloS one. 2014; 9(6):101183.

    Article  Google Scholar 

  10. De Niz C, Rahman R, Zhao X, Pal R. Algorithms for drug sensitivity prediction. Algorithms. 2016; 9(4):77.

    Article  Google Scholar 

  11. Dhruba SR, Rahmanl R, Matlockl K, Ghosh S, Pal R. Dimensionality reduction based transfer learning applied to pharmacogenomics databases. In: 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). USA: IEEE: 2018. p. 1246–9.

    Google Scholar 

  12. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, Wilson CJ, Lehár J, Kryukov GV, Sonkin D, et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012; 483(7391):603–7.

    Article  CAS  Google Scholar 

  13. Yang W, Soares J, Greninger P, Edelman EJ, Lightfoot H, Forbes S, Bindal N, Beare D, Smith JA, Thompson IR, et al. Genomics of drug sensitivity in cancer (gdsc): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 2013; 41(D1):955–61.

    Article  CAS  Google Scholar 

  14. Genuer R, Poggi JM, Tuleau-Malot C, Villa-Vialaneix N. Random forests for big data. Big Data Res. 2017; 9:28–46.

    Article  Google Scholar 

  15. Rahman R, Dhruba SR, Ghosh S, Pal R. Functional random forests with applications in dose response predictions. Sci Rep. 2019; 9(1):1628.

    Article  Google Scholar 

  16. Pal R. Predictive Modeling of Drug Sensitivity. Cambridge: Academic Press; 2016.

    Google Scholar 

  17. Gadagkar SR, Call GB. Computational tools for fitting the hill equation to dose–response curves. J Pharmacol Toxicol Methods. 2015; 71:68–76.

    Article  CAS  Google Scholar 

  18. Gesztelyi R, Zsuga J, Kemeny-Beke A, Varga B, Juhasz B, Tosaki A. The hill equation and the origin of quantitative pharmacology. Arch Hist Exact Sci. 2012; 66(4):427–38.

    Article  Google Scholar 

  19. Knack I, Rohm K. Analysis of cooperative ligand-binding and steady-state kinetic data-computer-program for calculating parameters of hill equation with a variable hill coefficient. In: Hoppe-Seylers Zeitschrift Fur Physiologische Chemie. Genthiner Strasse 13, D-10785 Berlin, Germany: Walter de Gruyter & Co.: 1977. p. 262.

    Google Scholar 

  20. Leone FA, Baranauskas JA, Furriel RPM, Borin IA. Sigrafw: An easy-to-use program for fitting enzyme kinetic data. Biochem Mol Biol Educ. 2005; 33(6):399–403.

    Article  CAS  Google Scholar 

  21. Benzekry S, Lamont C, Beheshti A, Tracz A, Ebos JML, Hlatky L, Hahnfeldt P. Classical mathematical models for description and prediction of experimental tumor growth. PLoS Comput Biol. 2014; 10(8). https://doi.org/10.1371/journal.pcbi.1003800.

    Article  Google Scholar 

  22. Norton L. A gompertzian model of human breast cancer growth. Cancer Res. 1988; 48(24 Part 1):7067–71.

    CAS  PubMed  Google Scholar 

  23. Laird AK. Dynamics of tumour growth. Br j Cancer. 1964; 18(3):490.

    Article  Google Scholar 

  24. Casey AE. The experimental alteration of malignancy with an homologous mammalian tumor material: I. results with intratesticular inoculation. Am J Cancer. 1934; 21(4):760–75.

    Google Scholar 

  25. Chignola R, Foroni RI. Estimating the growth kinetics of experimental tumors from as few as two determinations of tumor size: implications for clinical oncology. IEEE Trans Biomed Eng. 2005; 52(5):808–15.

    Article  Google Scholar 

  26. Norton L, Simon R, Brereton HD, Bogden AE. Predicting the course of gompertzian growth. Nature. 1976; 264(5586):542.

    Article  CAS  Google Scholar 

  27. Ricker W. Growth rates and models. Fish Physiol. 1979; 8:673–743. Cambridge.

    Google Scholar 

Download references

Acknowledgments

Not applicable.

Funding

This work was supported by NIH grant R01GM122084. The publication costs of this article was funded by NIH grant R01GM122084.

Availability of data and materials

For the analysis of the recursive hybrid model, MATLAB codes (with the synthetic data example) are available in the following link: https://github.com/dhruba018/Dose_time_Response_Recursive_Model. The dose-time proteomic expression and corresponding apoptosis data are available through the LINCS data portal: http://lincsportal.ccs.miami.edu/dcic-portal.

About this supplement

This article has been published as part of BMC Bioinformatics Volume 20 Supplement 12, 2019: Selected original research articles from the Fifth International Workshop on Computational Network Biology: Modeling, Analysis and Control (CNB-MAC 2018): Bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-20-supplement-12.

Author information

Authors and Affiliations

Authors

Contributions

SRD, AR, SG and RP conceived of and designed the experiments. SRD, AR and RR performed the experiments. SRD, AR and SG analyzed the data. SRD, AR, RR, SG and RP wrote the paper. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Souparno Ghosh.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional file

Additional file 1

Supplementary information to recursive model for dose-time responses in pharmacological studies. Figure S1. Dose-time response surface for synthetic data without noise. Figure S2. Dose-time response surface for synthetic data corrupted with Uniform noise. Figure S3. Dose-time response surface for synthetic data corrupted with Gaussian noise. Figure S4. Predicted dose-response curves at t=7 overlaid with the observed dose-response curves for synthetic data without noise. Figure S5. Predicted dose-response curves at t=7 overlaid with the observed dose-response curves for synthetic data corrupted with Uniform noise. Figure S6. Predicted dose-response curves at t=7 overlaid with the observed dose-response curves for synthetic data corrupted with Gaussian noise. Figure S7. Predicted dose-response curves obtained from Hybrid model and individual RF models at 48 hours for cell line K2. The observed dose-response curves are also overlaid. Figure S8. Predicted dose-response curves obtained from Hybrid model and individual RF models at 72 hours for cell line MMAC-SF. The observed dose-response curves are also overlaid. Figure S9. Predicted dose-response curves obtained from Hybrid model and individual RF models at 72 hours for cell line SKMEL28. The observed dose-response curves are also overlaid. (PDF 2220 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Dhruba, S.R., Rahman, A., Rahman, R. et al. Recursive model for dose-time responses in pharmacological studies. BMC Bioinformatics 20 (Suppl 12), 317 (2019). https://doi.org/10.1186/s12859-019-2831-4

Download citation

  • Published:

  • DOI: https://doi.org/10.1186/s12859-019-2831-4

Keywords