 Research
 Open Access
 Published:
Recursive model for dosetime responses in pharmacological studies
BMC Bioinformatics volume 20, Article number: 317 (2019)
Abstract
Background
Clinical studies often track doseresponse curves of subjects over time. One can easily model the doseresponse curve at each time point with Hill equation, but such a model fails to capture the temporal evolution of the curves. On the other hand, one can use Gompertz equation to model the temporal behaviors at each dose without capturing the evolution of time curves across dosage.
Results
In this article, we propose a parametric model for dosetime responses that follows Gompertz law in time and Hill equation across dose approximately. We derive a recursion relation for doseresponse curves over time capturing the temporal evolution and then specify a regression model connecting the parameters controlling the dosetime responses with individual level proteomic data. The resultant joint model allows us to predict the doseresponse curves over time for new individuals.
Conclusion
We have compared the efficacy of our proposed Recursive Hybrid model with individual doseresponse predictive models at desired time points. We note that our proposed model exhibits a superior performance compared to the individual ones for both synthetic data and actual pharmacological data. For the desired dosetime varying genetic characterization and drug response values, we have used the HMSLINCS database and demonstrated the effectiveness of our model for all available anticancer compounds.
Background
One of the most important goal of precision medicine is to predict sensitivity of an anticancer drug to a given patient. Although, patients are most often characterized by their gene expressions, a more precise characterization is obtained by studying their proteomic expressions [1]. Harvard Medical School Library of Integrated NetworkBased Cellular Signatures (HMSLINCS) [2] offers proteomic data for cancer cell lines measured at various time intervals post drug application along with observed apoptosis fractions over different drug concentration and time. Given the evolution of protein expression over drug concentration and time, we would like to predict the apoptosis fractions of individuals over the same drug concentration at 72 hours post application of drug as 72 hours is considered to be the steadystate for cell viability studies [3, 4].
Our data, therefore, consists of a collection of temporally varying doseresponse curves for each individual and the predictors are also a collection of temporally varying expressions, for multiple proteins, observed at different drug concentrations. This may appear as a standard functiononfunction concurrent regression [5], but several obstacles arise, such as– (a) number of functional predictors exceeds the number of cell lines, (b) protein expression curves are observed at different time points as compared to the doseresponse curves with only little overlap. Consequently, standard parametric statistical models cannot be readily applied here. Turning to modelfree procedures, Matlock et al. [6] used Random Forest (RF) methodology to analyze the same HMSLINCS data. While their methodology alleviates the small samplehigh dimension problem, it cannot make temporal prediction of doseresponse curves in absence of predictor information at the predicting time point. More precisely, given the observed doseresponse curve of an individual at 48 hours, it cannot predict the curve at 72 hours. Therefore, the RF model either needs to wait till 72 hours and observe the protein expression curves to predict response, or it requires extrapolating protein expression data to 72 hours and then make predictions.
Standard machine learning (ML) approaches also fail to explicitly take into account a few well established properties of doseresponse curves and their temporal evolution. For instance, Haber’s law suggest a monotonic relationship between responses observed at two successive time points at a given dose [7] and this rule, in turn, induces the doseresponse curve observed at a later time point to dominate or be dominated by the curves observed at earlier time points [8]. Such constraints cannot be easily built into ML algorithms. Furthermore, these models provide little insight into the steadystate properties of the doseresponse curves. Regardless of these shortcomings, several studies have demonstrated superior predictive performance of RF based models in drug sensitivity predictions [9–11].
To alleviate some theoretical restrictions of the foregoing ML approaches, while borrowing the predictive strength of RF methodology, we offer a hybrid model that satisfies some physical laws that doseresponse curves are expected to satisfy while retaining a flexible modelfree relationship between predictors and responses. In particular, we propose a parametric model for dosetime responses that follows the Gompertz law in time and approximately follows the Hill equation across dose. We derive a recursion relation for doseresponse curves over time capturing the temporal evolution and theoretically examine their steadystate behavior. We then specify an RF model connecting the parameters controlling the dosetime responses with individual level proteomic data. The resultant joint model allows us to predict doseresponse curves over time for new individuals. The complete fitting code along with a synthetic example can be obtained from GitHub via: https://github.com/dhruba018/Dose_time_Response_Recursive_Model.
Results
We have evaluated the performance of our proposed recursive hybrid model using both synthetic data and HMSLINCS database mentioned above. Note that, we were forced to limit our analysis to a single dataset since, to our knowledge, HMSLINCS is the only publicly available source offering functional responses as well as predictors. Furthermore, dimensions of HMSLINCS datasets are restricted to a handful of drugs and samples with a higher number of predictors in contrast to some common pharmacogenomics databases (e.g., CCLE [12] or GDSC [13]) that provides doseresponse curves for hundreds of samples, but with static feature sets. Here, we use HMSLINCS as our synthetic data generation baseline first and then directly for analysis.
Description of HMSLINCS datasets
We used two distinct datasets from HMSLINCS as our predictor and response sets. The predictor set consists of dosetime expression for 21 proteins and the response set contains the mean apoptosis fractions observed in 10 BRAF ^{V600E/D} melanoma cell lines over multiple doses and time points [2, 6]. Both protein expressions and apoptosis fractions are available post drug application at 7 dose levels ranging from 3.2 nM to 3.2 μM. However, while the protein expressions are available at 5 different time points (1,5,10,24 & 48 hours post drug application), the apoptosis data is available only for 24,48 & 72 hours post drug application. Apoptosis fraction was computed from the number of apoptotic cells at each dosetime point and the total initial number of cells normalized with DMSO control and then averaged over 4 replicates to produce the mean value. Both sets are available for 5 RAF/MEK inhibitor drugs. More detailed descriptions can be obtained from [2, 6].
For this study, we only use protein expressions observed at 24 & 48 hours, since those expressions match the temporal record of the responses (24,48 & 72 hours). For the 72 hours scenario, we use a time series model for data extrapolation (elaborate description is provided below in “Application on HMSLINCS data” section). Moreover, only 14 out of 21 proteins have complete records in the covariate set, therefore, we only use these 14 proteins as our final predictors.
Simulation study
To demonstrate the efficacy of our proposed model, we have performed a simulation study involving two synthetic datasets with 7 subjects, 7 dose levels, and 8 time points each (m=7, D=7, T=8). The predictor set contains the expression of 14 different covariates at each dosetime point (P=14). For detailed explanation of the terms used here, look into the “Methods” section.
Synthetic data generation
We first form the slope coefficient matrix in Eq. (20) by using the difference between HMSLINCS protein expressions \(x_{t, d, i}^{(p)}\) at 24 & 48 hours post application of the drug AZ628 as the base β_{d,i} and add random noise to create distinction between subjects. This yields a (β_{d,i})_{49 × 14} matrix per subject (for 7 time point differences and 7 doses). We also use the 24 hour expression data post AZ628 application in cell line C32 as baseline covariates and add random noise to create our predictors per subject \(z_{t, d, i}^{(p)}, \quad p = 1, 2, \cdots, 14\).
For the output, we create a response matrix V_{8 × 7} (D=7, T=8) per subject where doseresponse values for the initial time epoch are extracted from the 4 parameter sigmoidal doseresponse model g(d) in Eq. (8) assuming a, b and θ as fixed but c_{i} (i.e., EC_{50}) to be varying with each subject i. To generate the responses for the remaining 7 time points, we estimate the growth and scaling coefficients (α_{d,i}, γ_{d,i} in (20)) as linear models of the slope coefficient vector β_{d,i} with random weight vector sets and take the final estimates as the maxima of the estimates at current and immediately previous doses (following property (ii) of the recursive model in (14)). We assume the 7 dose levels to be linearly spread in the interval [0,1].
where \(w_{\cdot, i}^{(p)} \sim \mathcal {U}(0, 1)\) are randomly chosen weight values for protein p in subject i. We then use this \(\hat {\alpha }_{d, i},\ \hat {\gamma }_{d, i}\) in the onestep prediction relation in Eq. (21) to generate responses at t>0. Figure 1 illustrates the monotonicity of drug response surfaces over dose and time for a representative subject with different levels of additive noise in response. An additional section displays the dosetime response surfaces for all 7 subjects [see Additional file 1: Figures S1S3].
Dosetime response prediction
For the simulation study, we assume that the responses are available for the 7 initial time epochs while we predict for the last epoch with both our proposed hybrid recursive model and a standard RF model for each individual. To train these RFs, we use the protein expressions and responses at t<7 as predictors and output respectively and predict for the expression values at t=7. We also analyze the effect of noise in response values. Two different scenarios are shown in Fig. 1 along with the noiseless case, where the additive noise values are sampled respectively from \(\mathcal {U}\!\left (\delta, \delta \right)\) & \(\mathcal {N}\!\left (0, \textstyle {\frac {\delta ^{2}}{4}} \right)\hspace {0.3em},\ \delta = 0.05\) and incorporated in (4). Figure 2 displays the predicted doseresponse curves overlaid with the actual responses for all 3 scenarios. For objective performance measure, we also perform comparisons between mean square prediction errors (MSPE) for both hybrid and individual models in all 3 scenarios, as illustrated in Table 1.
Application on HMSLINCS data
As mentioned earlier, HMSLINCS protein expression and mean apoptosis fraction sets contain data for 10 different cell lines at 7 different doses of 5 different drugs. We only keep protein expression data at 24 & 48 hours post drug application in our predictor set and while the response set contains the complete mean apoptosis fraction data. However, only 7 out of 10 cell lines have complete record on apoptosis fractions and these 7 cell lines form our training set. The remaining 3 cell lines with partial records is used to validate our model. Since we are using protein expressions at only 2 times points, we simply put their differences as predictors \(\beta _{d, i}^{(p)}\) for the RF regression models (in (17)) to predict α_{d}, γ_{d}.
Out of the 3 validation cell lines, 2 cell lines (MMACSF and SKMEL28) have apoptosis records for 48 & 72 hours, but the 24 hours records are missing. The other cell line (K2) has apoptosis records for 24 & 48 hours with missing 72 hours records. So, for the former two, we use the 48 hours data as baseline and generate 72 hours prediction while for K2 we use the 24 hours data as baseline and generate 48 hours prediction. A cartoon representation of the prediction procedure is shown in Fig. 3.
We perform the onestep ahead prediction in (21) for all 5 drugs and calculate the associated mean MSPE over 3 test cell lines, as shown in Table 2. Similar to the simulation study, we compare these results with mean MSPE for standard individual dosetime RF models to put the performance into perspective. For RF training, we use the observed apoptosis fraction at each dose at 72 hours as our responses and the set of protein expressions at the corresponding dose level at 72 hours as our feature set. Since the protein expressions are not available at 72 hours, we fit a cubic spline to the expression values at each dose level for the 5 time points available and extrapolate to 72 hours for the required feature set. We use this extrapolated expressions to predict the apoptosis fractions at each dose for cell lines MMACSF and SKMEL28. For K2, we use the available 48 hour covariate data to train the RFs and perform prediction. We also plot the predicted doseresponse values from both models with the observed values in Fig. 4 for 3 representative cell line – drug combination scenarios.
Discussion
From the MSPE results in Tables 1 and 2, we can infer that the hybrid model predictions fit the actual doseresponse curves significantly better than the individual RF models. For synthetic data, the hybrid model shows a mammoth decrease in MSPE values in Table 1 even when noise is present. Specifically, the overall mean MSPE for hybrid model is ∼ 22 times less than the mean MSPE for individual models in noiseless scenario and ∼ 11 times less in noisy cases. For the HMSLINCS results in Table 2, the overall mean MSPE achieved by our model is 0.0044, which is a staggeringly ∼ 2.5 times less than the mean MSPE produced by the individual models (0.0105). We can also reach the same conclusion from examining the predicted doseresponse curve fits in Figs. 2 and 4. For synthetic data, the predicted curves closely follow the observed doseresponse variations even in the presence of significant noise, as demonstrated in Fig. 2 for subject 7 (An additional section illustrates the doseresponse predictions for all 7 subjects [see Additional file 1: Figures S4  S6]). For HMSLINCS data, we have displayed predictions for 3 representative cases (out of 15 cell line – drug combinations) which also demonstrates the efficacy of our proposed hybrid model. These results bolster our claim that the recursive hybrid model is a much superior predictor of the dosetime response behavior compared to the standard RF methodology.
One important property of the hybrid model is that– given an initial doseresponse curve, all doseresponse curves at the subsequent time epochs inherit the theoretical properties of the initial curve (following (13)). Therefore, our model cannot accommodate situations where the properties of doseresponse curve changes with time. This results in the 3 most glaring mismatches between the observed and predicted doseresponse curves for the cell line – drug combinations: MMACSF – PLX4720, MMACSF – Vemurafenib, and SKMEL28 – SB590885. For these cases, the observed doseresponse curves at 72 hours are partially observed sigmoidals where we do not observe the upper asymptotes. Since the Gompertzian kinetics enforces an upper asymptote proportional to the upper asymptote observed at t=0 (following (16)), the model becomes illscaled. Due to this susceptibility of our model to misspecification, a naïve RF model outperforms our model in all three situations, as shown in Fig. 5 (An additional section shows the predictions for all 15 cases [see Additional file 1: Figures S7  S9]). Furthermore, although in 12 out of 15 test cases, our hybrid model outperforms RF, there exists some mismatch between the observed and predicted data. The main reason for this discrepancy is the imprecision associated with the estimates of α_{d,i} & γ_{d,i}. Observe that for all dose points, we are estimating the two Gompertzian parameters from only 3 time points. With such limited data, the point estimates are bound to have large uncertainties, reflected by the mismatches. However, an increase in time points can reduce the imprecisions as illustrated in simulation study, where we have included 8 time points and observed a much closer fit for the predicted curves from the hybrid model to the observed (synthetic) doseresponse curves, even with the presence of noise (Fig. 2).
Model scalability
The limited nature of HMSLINCS datasets is precisely the reason we decided on conceptualizing a hybrid model. As mentioned above, to our knowledge, HMSLINCS is the only publicly available database providing both functional predictor and response data. Several standard ML approaches that have been demonstrated to work well with large pharmacological datasets in literature [10], from where we have chosen Random Forest as our regression model due to its superior predictive capabilities [9, 11] as well as efficient handling of large datasets [14]. Moreover, note that there are only a few functional predictors (i.e., 14) and therefore, in terms of scalability, we could handle a large number of samples with the same number of functional predictors using the standard functiononfunction regression easily without much demand on the computational resources. The computational burden may appear to escalate if we increase the number of functional predictors while keeping the sample size fixed, but this will be significantly alleviated by the RF modeling that connects the Gompertzian parameters with the slopes of the predictors. Therefore, the posited model is easily scalable for both large samplesize and large featuresize without much demand on computational resources. The time aspect is also not problematic since majority of the studies report observations at 72 hours [3, 4]. Given this time horizon, if the temporal resolution is very high, we can always coarsen the resolution at hourly (or daily) scale. The real bottleneck is the number of dose levels which can enforce a large number of optimization operations since the Gompertzian parameters are estimated sequentially at each time point. However, in reality, we do not expect too many dose levels due to the cost involved in data collection at finely resolved dosages.
We, therefore, need to deal with only two effective dimensions of scalability– (a) increase in the number of subjects, and (b) increase in the number of predictors. The model is set up to be simultaneously scalable in both these dimensions. Moreover, the response model for each subject is modeled independently to obtain subjectspecific Gompertzian parameters, and therefore, estimation for each subject is trivially parallelizable. Once the parameter estimates are obtained, an RF model is deployed to connect the covariates with response parameters. To put these above discussion into perspective, we can look into the execution time for hybrid model prediction. For our experiments, the synthetic data case roughly takes ∼ 1.1 sec to fit for 6 subjects and predict dosetime responses for subject 7, while for HMSLINCS data, it takes ∼ 0.785 sec to fit the 7 training cell lines and predict temporal doseresponses for all 3 validation cell lines.
Conclusions
We have developed a recursive hybrid methodology to model dosetime responses of individuals characterized by a set of functional covariates. Instead of directly connecting the observed responses with the observed covariates, we have taken a stepwise approach where the responses are modeled separately according to a parametric specification and the parameters of the response models are connected to the covariates via RF regression models. Empirical results suggest that our model provides significant improvement, in terms of MSPE, as compared to a naïve RF model directly connecting responses with covariates. The main strength of our methodology is that it can incorporate additional information about the expected behavior of the responses while letting machine learning methodology drive the processes on which we have no prior information about their expected behavior. We have theoretically shown some desirable properties of our hybrid model and demonstrated its predictive capabilities.
However, all these properties are contingent on proper specification of initial condition. Since all the later doseresponse curves inherit the properties of the initial curve, our model cannot account for temporal variation in doseresponse curve properties leading to a naïve RF model outperforming it (see Fig. 5). Furthermore, all our predictions are predicated on the availability of a baseline doseresponse curve. To handle the situations where an individual comes along with only a set of baseline covariates (say, gene expression), we propose to specify a regression model connecting the baseline doseresponse curves in the training set with their respective covariates using either a functional regression approach [5] or a fully datadriven functional random forest approach [15]. We can then use the covariates of a new individual to predict the associated baseline doseresponse curve. This will provide us with the necessary initial conditions to utilize our recursive framework to generate the entire collection of doseresponse curves at prespecified time epochs. We propose to investigate this approach in future studies.
We also note that the current model is devised to demonstrate that incorporating some known biological properties can bring about significant improvement in predictions. The model is set up to be computationally fast and easily scalable as number of subjects and predictors increase. However, in order to achieve computational efficiency, we forego the opportunity to perform a full blown likelihood based inference. We are currently investigating a Bayesian hierarchical specification given by
with
where \(\mathcal {TN}(\mu, \sigma ^{2})\) is a truncated Normal distribution on an appropriate compact support, while ε, η are independent Gaussian noises with appropriate supports that match the support of the response. This hierachical model in (6  7) offers a formal stochastic extension of the posited hybrid model and the subsequent posterior analysis will allow us to create posterior predictive bands for the test cases, thereby offering insights into the adequacy of our model.
Methods
The following sections provide a detailed analysis of our proposed Recursive Hybrid Model along with the desired underlying properties.
Model specification
Let y_{t,d,i} be the mean apoptosis fraction at time t and dose d for subject i, d=1,2,⋯,D; t=1,2,⋯,T; i=1,2,⋯,m. We suppress the subscript i for the moment and develop a dosetime model for each individual. A simple parametric model for dosetime responses can be written as
where f(t) is a Gompertz model in time t with α controlling the growth rate and g(d) is the sigmoidal model in dose d with a being the lower asymptotic response at d=0, b being the upper asymptotic response at d=∞, c is often interpreted as EC_{50}, θ is the Hill slope i.e., the slope at the steepest part of the sigmoidal curve and ε is usually assumed to be Gaussian noise. Crosssectionally, the four parameter sigmoidal function g(d) is widely used in doseresponse studies [16–20] while longitudinally, the two parameter Gompertz model f(t) is arguably the most popular growth model in tumor modeling efforts [16, 21–24]. Therefore, both g(d) and f(t) are well suited if we wish to model the corresponding marginal processes separately. However, assuming a separable product model describing the joint process enforces some unjustifiable assumptions. For instance, such a separable model induces dose invariant temporal growth rate which contradicts Haber’s law [7] and also empirical observations suggesting growth rate at higher dose is significantly different from that observed at small dose levels [6].
To remove separability, we can introduce dosedependent parameters in f(t) and timedependent parameters in g(d). However, specification of timedependent parameter makes temporal prediction of responses quite challenging. Furthermore, inclusion of both timedependent and dosedependent parameters in Eq. (8) will make the model heavily overparametrized. Therefore, to guarantee the estimability of these parameters, we need to enforce some dependence among these parameters. We capture this dependence by specifying a temporally recursive model for doseresponse curves which implicitly induces dependence among the parameters. The recursion relation enables us to generate temporal predictions without incorporating the foregoing dependence explicitly.
Gompertzian recursion
Broadly speaking, our strategy is to fit a sigmoidal doseresponse curve at the first observed time epoch and then enforce the subsequent doseresponse curves to follow Gompertzian law at each dose point. Gompertzian kinetics essentially suggests exponential growth with exponentially decaying growth coefficient. Several empirical studies have reported Gompertz model to best fit tumor growth data [23, 25, 26]. Now, in order to effectively reduce the size of tumor, the kill rate induced by the drug should mimic the tumor growth pattern and hence we expect the dynamics of apoptosis fraction to follow Gompertzian kinetics too. To illustrate our conceptualization of Gompertzian law, we drop the subscript d momentarily. We begin with Ricker’s parametrization [27] of Gompertz model given by
where n_{t} is expected value of the trait (number, density, etc.) under consideration at time t, n_{0} is the initial value of the trait giving the starting point on the growth curve, α is the growth coefficient and γ controls the upper asymptote of the growth curve (n_{0} e^{γ}) by scaling the curve vertically (i.e., the scaling coefficient). In our situation, the focal trait is the mean apoptosis fraction. Therefore, our temporal model for the apoptosis fraction is posited as
where y_{0} is viewed as the mean apoptosis fraction observed immediately after drug application. Now, we can insert the dose subscript d in (10) and specify a sigmoidal doseresponse model for y_{0}, i.e.
Inserting (11) in (10) yield a model for expected doseresponse at time t and dose d as
It is easy to see (12) yields the following recursion relation between the apoptosis fractions observed at two consecutive time points at each dose d
Now, observe that if we assume α and γ to be dose invariant in (12), since \(e^{\gamma \left (1 \,  \, e^{\alpha t} \right)}\) is a nonnegative quantity, y_{t,d} inherits the sigmoidal property from y_{0,d}. Therefore, the collection of doseresponse curves over time are sigmoidal in dose and Gompertzian in time. Furthermore, Eq. (13) connects two sigmoidal curves at consecutive time epochs with the Gompertzian law implicitly inducing temporal dependence among sigmoidal parameters. Evidently, the recursion relation allows prediction in time despite having implicit timedependent sigmoidal parameters.
Properties of the recursive model
Observe that in Eq. (12), the assumption of doseinvariant Gompertzian parameters yields an unrealistic scenario. Particularly at d=0, the lower asymptotes of the sigmoidal curves shifts with time. It is hard to justify why apoptosis fraction will change with time even though no drug is administered. To alleviate such drawbacks, we make both growth and scaling coefficients dosedependent. Consequently, (12) becomes
while the recursion relation between two consecutive time points at dose d is still given by (13) after incorporating α_{d}, γ_{d}. To investigate the asymptotic property of (14), we impose the following reasonable constraints

Since apoptosis fraction is expected to increase as drug concentration increases, we assume that both α_{d}\(\left (\in \mathbb {R}\; \forall d \geq 0 \right)\) and γ_{d}\(\left (\in \mathbb {R}^{+}\cup \{0\} \; \forall d\geq 0\right)\) are nondecreasing functions of dose d.

\({{\lim }_{d \, \to \, 0} {\gamma _{d}} = 0}\)

Apoptosis fraction is bounded in [0,1], therefore, we assume that \({{\lim }_{d \, \to \, \infty } {\alpha _{d}} = \alpha ^{*}}\) and \({{\lim }_{d \, \to \, \infty } {\gamma _{d}} = \gamma ^{*}}\) where α^{∗}, γ^{∗} are finite positive constants that satisfy the bounded property of apoptosis fractions.
Under these assumptions, we have the following properties of the doseresponse curve at time t, obtained from (14).

(i)
At d=0, assumption (b) forces the asymptotes of the sigmoidal curves to be stationary at a_{0} regardless of time. This is justifiable since we do not expect to see any change in apoptosis fraction from the baseline a_{0} without the application of a drug, regardless of how much time has elapsed.

(ii)
Given a particular time t, assumption (a) makes \(\phantom {\dot {i}\!}e^{\gamma _{d} \big (1 \,  \, e^{\alpha _{d} t} \big) }\) an increasing function of dose d. If y_{0,d} is also an increasing function of d, then so is y_{t,d} (following (14)) with lower asymptote at a_{0} and upper asymptote at \(b_{t} = b_{0} e^{\gamma ^{*} \big (1 \,  \, e^{\alpha ^{*} t} \big)}\). From assumption (c), α^{∗}, γ^{∗}>0, therefore b_{0}>a_{0}⇒b_{t}>a_{0}.

(iii)
Since b_{t} is an increasing function of time, the upper asymptotes of the doseresponse curves increases with time indicating the expected greater efficacy at higher dose levels. In general, since the doseresponse curves at each time epoch is monotonically increasing in dose (assuming y_{0,d} is increasing in dose), Eq. (14) suggests doseresponse curves at later time points will dominate the curves observed at initial time points. Also, the fact that the apoptosis fraction is bounded above by 1 is easily realized by specifying \(b_{0} < e^{\gamma ^{*}}\).

(iv)
Eq. (13) and assumption (c) suggest that
$$\begin{array}{*{20}l} {\lim}_{t \to \infty} \left[ \frac{y_{t+1, d}}{y_{t, d}} \right] = 1 \end{array} $$(15)indicating that the area under the curve (AUC) of the dose response curves do not become arbitrarily large with time but achieves a theoretical steadystate. A graphical representation of the above properties of our proposed recursive model (Eq. (14)) is shown in Fig. 6.
Regression model for individuals
In the preceding sections, we have discussed the dosetime model for responses observed for a single individual. We have several such individuals, each characterized by a set of protein expressions observed over both dose and time. Let \(x_{t, d, i}^{(p)}\) denote the observed expression for protein p in subject i at dose d and time t, p=1,2,⋯,P, t=1,2,⋯,T, d=1,2,⋯,D, i=1,2,⋯,m. We now connect these \(x_{t, d, i}^{(p)}\) values with the individual dosetime model specified in (14) by introducing the subscript i, which yields
and the recursion relation between consecutive time points for an individual i at dose d is still given by (13) after incorporating the parameters α_{d,i}, γ_{d,i}.
For each individual, we begin with fitting a sigmoidal doseresponse curve at t=0 to estimate a_{0,i}, b_{0,i}, c_{0,i}, θ_{0,i}. We then posit that, for each individual, the growth and scaling coefficients (α_{d,i} & γ_{d,i}) are determined by the rate of change in protein expressions over time i.e., fit a temporal trend model for each \(x_{\cdot, d, i}^{(p)}\) time series and use the slope coefficients\(\beta _{d, i}^{(p)} \, \bigg (\,=\,\frac {dx_{\cdot, d, i}^{(p)}}{dt}\!\bigg)\) as the predictors in two Random Forest (RF) regression models in (17). However, since α_{d,i}, γ_{d,i} are not observed, we plug their estimates in LHS.
The algorithm for the stepwise fitting procedure is given below.
Stepwise fitting algorithm

At t=0, fit a sigmoidal curve to the observed doseresponses for each individual i and obtain the estimates \(\hat {a}_{0, i}\), \(\hat {b}_{0, i}\), \(\hat {c}_{0, i}\), \(\hat {\theta }_{0, i}\) and hence obtain \(\hat {y}_{0, d, i}\).

For each individual in the training set at each dose level, obtain the least square estimates of α_{d,i} & γ_{d,i} using the recursion relation in (13). That is
$$\begin{array}{*{20}l} {}\left(\tilde{\alpha}_{d, i}, \, \tilde{\gamma}_{d, i} \right) \,=\, \underset{\alpha_{d, i}, \, \gamma_{d, i}}{\text{argmin}} \sum_{t = 1}^{T} \!\left[ y_{t, d, i} \,\, \hat{y}_{t^{}, d, i} \, e^{\gamma_{d, i} \left(e^{\alpha_{d, i} t^{}}  e^{\alpha_{d, i} t} \right)} \right]^{2} \end{array} $$(18)where \(\hat {y}_{t^{}, d, i}\) is the dosetime response at the immediate preceding time point t^{−}. Usually, we perform a onestep ahead prediction for a specific time epoch t, considering t^{−}=0 and t=1 in (18), which yields
$$\begin{array}{*{20}l} {}\left(\tilde{\alpha}_{d, i}, \, \tilde{\gamma}_{d, i} \right) = \underset{\alpha_{d, i}, \, \gamma_{d, i}}{\text{argmin}} \sum_{t = 1}^{T} \!\left[ y_{t, d, i} \,  \, \hat{y}_{t^{}, d, i} \, e^{\gamma_{d, i} \big(1 \,  \, e^{\alpha_{d, i}} \big)} \right]^{2} \end{array} $$(19)To satisfy assumption (c), we take the final estimates to be \(\hat {\alpha }_{d, i} = \max \!\left (\tilde {\alpha }_{d, i},\ \tilde {\alpha }_{d^{}, i} \right)\) and \(\hat {\gamma }_{d, i} = \max \!\left (\tilde {\gamma }_{d, i},\ \tilde {\gamma }_{d^{}, i} \right)\) where \((\cdot)_{d^{}, i}\phantom {\dot {i}\!}\) is the estimate of that parameter obtained at immediately preceding dose level d^{−}.

Fit a trend model to the time series of each protein expression for each subject at each dose level and obtain the slope coefficients \(\beta _{d, i}^{(1)}, \beta _{d, i}^{(2)}, \cdots, \beta _{d, i}^{(P)}\).

If the training set consists of m individuals, then at each dose d, define m×1 vectors \(\hat {\boldsymbol {\alpha }}_{d},\ \hat {\boldsymbol {\gamma }}_{d}\) and m×P matrix β_{d} as
$$ {}\begin{aligned} \hat{\boldsymbol{\alpha}}_{d} &= \left[\begin{array}{cccc} \hat{\alpha}_{d, 1} & \hat{\alpha}_{d, 2} & \cdots & \hat{\alpha}_{d, m} \end{array}\right]^{T} \\ \hat{\boldsymbol{\gamma}}_{d} &= \left[\begin{array}{cccc} \hat{\gamma}_{d, 1} & \hat{\gamma}_{d, 2} & \cdots & \hat{\gamma}_{d, m} \end{array}\right]^{T} \\ \boldsymbol{\beta}_{d} &= \left[\begin{array}{cccc} \beta_{d, 1}^{(1)} & \beta_{d, 1}^{(2)} & \cdots & \beta_{d, 1}^{(P)} \\ \beta_{d, 2}^{(1)} & \beta_{d, 2}^{(2)} & \cdots & \beta_{d, 2}^{(P)} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{d, m}^{(1)} & \beta_{d, m}^{(2)} & \cdots & \beta_{d, m}^{(P)} \end{array}\right] = \left[\begin{array}{c} \boldsymbol{\beta}_{d, 1} \\ \boldsymbol{\beta}_{d, 2} \\ \vdots \\ \boldsymbol{\beta}_{d, m} \end{array}\right] \end{aligned} $$(20)Use β_{d} as the covariate matrix and \(\hat {\boldsymbol {\alpha }}_{d},\ \hat {\boldsymbol {\gamma }}_{d}\) as the output to train the RFs in (17) modeling the Gompertzian parameters.

To predict for a new individual, use the corresponding matrix \(\boldsymbol {\beta }_{d}^{\, \text {new}}\) as input to get \(\hat {\boldsymbol {\alpha }}_{d}^{\, \text {new}},\ \hat {\boldsymbol {\gamma }}_{d}^{\, \text {new}}\). Use the observed baseline dose response for this individual and generate temporal forecast using (16). For the onestep ahead prediction in (19), the recursion relation in (13) simplifies as
$$\begin{array}{@{}rcl@{}} y_{t, d, i} = y_{t^{}, d, i} \, e^{\, \gamma_{d, i} \big(1 \,  \, e^{\alpha_{d, i}} \big)} \end{array} $$(21)
Abbreviations
 AUC:

Area under the curve
 HMSLINCS:

Harvard medical school library of integrated networkbased cellular signatures
 MSPE:

Mean squared prediction error
 RF:

Random forest
References
Shruthi B, Vinodhkumar P, Selvamani M. Proteomics: A new perspective for cancer. Adv Biomed Res. 2016; 5(1):67. https://doi.org/10.4103/22779175.180636.
FallahiSichani M, Moerke NJ, Niepel M, Zhang T, Gray NS, Sorger PK. Systematic analysis of brafv600e melanomas reveals a role for jnk/cjun pathway in adaptive resistance to druginduced apoptosis. Mol Syst Biol. 2015; 11(3):797. https://doi.org/10.15252/msb.20145877.
Śliwka L, Wiktorska K, Suchocki P, Milczarek M, Mielczarek S, Lubelska K, Cierpiał T, ŁyŻwa P, Kiełbasiński P, Jaromin A, et al. The comparison of mtt and cvs assays for the assessment of anticancer agent interactions. PloS one. 2016; 11(5):0155772.
Riss TL, Moravec RA, Niles AL, Duellman S, Benink HA, Worzella TJ, Minor L. Cell viability assays. Assay Guidance Manual. 2016. https://www.ncbi.nlm.nih.gov/books/NBK144065/.
Ramsay J, Ramsay J, Silverman BW. Functional Data Analysis, Springer Series in Statistics: Springer; 2005. https://books.google.com/books?id=mU3dop5wY_4C.
Matlock K, Dhruba SR, Nazir M, Pal R. An investigation of proteomic data for application in precision medicine. In: Biomedical & Health Informatics (BHI), 2018 IEEE EMBS International Conference On. USA: IEEE: 2018. p. 377–80.
Bliss C. The relation between exposure time, concentration and toxicity in experiments on insecticides. Ann Entomol Soc Am. 1940; 33(4):721–66.
Miller FJ, Schlosser PM, Janszen DB. Haber’s rule: a special case in a family of curves relating concentration and duration of exposure to a fixed level of response for a given endpoint. Toxicology. 2000; 149(1):21–34.
Wan Q, Pal R. An ensemble based top performing approach for ncidream drug sensitivity prediction challenge. PloS one. 2014; 9(6):101183.
De Niz C, Rahman R, Zhao X, Pal R. Algorithms for drug sensitivity prediction. Algorithms. 2016; 9(4):77.
Dhruba SR, Rahmanl R, Matlockl K, Ghosh S, Pal R. Dimensionality reduction based transfer learning applied to pharmacogenomics databases. In: 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). USA: IEEE: 2018. p. 1246–9.
Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, Wilson CJ, Lehár J, Kryukov GV, Sonkin D, et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012; 483(7391):603–7.
Yang W, Soares J, Greninger P, Edelman EJ, Lightfoot H, Forbes S, Bindal N, Beare D, Smith JA, Thompson IR, et al. Genomics of drug sensitivity in cancer (gdsc): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 2013; 41(D1):955–61.
Genuer R, Poggi JM, TuleauMalot C, VillaVialaneix N. Random forests for big data. Big Data Res. 2017; 9:28–46.
Rahman R, Dhruba SR, Ghosh S, Pal R. Functional random forests with applications in dose response predictions. Sci Rep. 2019; 9(1):1628.
Pal R. Predictive Modeling of Drug Sensitivity. Cambridge: Academic Press; 2016.
Gadagkar SR, Call GB. Computational tools for fitting the hill equation to dose–response curves. J Pharmacol Toxicol Methods. 2015; 71:68–76.
Gesztelyi R, Zsuga J, KemenyBeke A, Varga B, Juhasz B, Tosaki A. The hill equation and the origin of quantitative pharmacology. Arch Hist Exact Sci. 2012; 66(4):427–38.
Knack I, Rohm K. Analysis of cooperative ligandbinding and steadystate kinetic datacomputerprogram for calculating parameters of hill equation with a variable hill coefficient. In: HoppeSeylers Zeitschrift Fur Physiologische Chemie. Genthiner Strasse 13, D10785 Berlin, Germany: Walter de Gruyter & Co.: 1977. p. 262.
Leone FA, Baranauskas JA, Furriel RPM, Borin IA. Sigrafw: An easytouse program for fitting enzyme kinetic data. Biochem Mol Biol Educ. 2005; 33(6):399–403.
Benzekry S, Lamont C, Beheshti A, Tracz A, Ebos JML, Hlatky L, Hahnfeldt P. Classical mathematical models for description and prediction of experimental tumor growth. PLoS Comput Biol. 2014; 10(8). https://doi.org/10.1371/journal.pcbi.1003800.
Norton L. A gompertzian model of human breast cancer growth. Cancer Res. 1988; 48(24 Part 1):7067–71.
Laird AK. Dynamics of tumour growth. Br j Cancer. 1964; 18(3):490.
Casey AE. The experimental alteration of malignancy with an homologous mammalian tumor material: I. results with intratesticular inoculation. Am J Cancer. 1934; 21(4):760–75.
Chignola R, Foroni RI. Estimating the growth kinetics of experimental tumors from as few as two determinations of tumor size: implications for clinical oncology. IEEE Trans Biomed Eng. 2005; 52(5):808–15.
Norton L, Simon R, Brereton HD, Bogden AE. Predicting the course of gompertzian growth. Nature. 1976; 264(5586):542.
Ricker W. Growth rates and models. Fish Physiol. 1979; 8:673–743. Cambridge.
Acknowledgments
Not applicable.
Funding
This work was supported by NIH grant R01GM122084. The publication costs of this article was funded by NIH grant R01GM122084.
Availability of data and materials
For the analysis of the recursive hybrid model, MATLAB codes (with the synthetic data example) are available in the following link: https://github.com/dhruba018/Dose_time_Response_Recursive_Model. The dosetime proteomic expression and corresponding apoptosis data are available through the LINCS data portal: http://lincsportal.ccs.miami.edu/dcicportal.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 20 Supplement 12, 2019: Selected original research articles from the Fifth International Workshop on Computational Network Biology: Modeling, Analysis and Control (CNBMAC 2018): Bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume20supplement12.
Author information
Authors and Affiliations
Contributions
SRD, AR, SG and RP conceived of and designed the experiments. SRD, AR and RR performed the experiments. SRD, AR and SG analyzed the data. SRD, AR, RR, SG and RP wrote the paper. All authors have read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file
Additional file 1
Supplementary information to recursive model for dosetime responses in pharmacological studies. Figure S1. Dosetime response surface for synthetic data without noise. Figure S2. Dosetime response surface for synthetic data corrupted with Uniform noise. Figure S3. Dosetime response surface for synthetic data corrupted with Gaussian noise. Figure S4. Predicted doseresponse curves at t=7 overlaid with the observed doseresponse curves for synthetic data without noise. Figure S5. Predicted doseresponse curves at t=7 overlaid with the observed doseresponse curves for synthetic data corrupted with Uniform noise. Figure S6. Predicted doseresponse curves at t=7 overlaid with the observed doseresponse curves for synthetic data corrupted with Gaussian noise. Figure S7. Predicted doseresponse curves obtained from Hybrid model and individual RF models at 48 hours for cell line K2. The observed doseresponse curves are also overlaid. Figure S8. Predicted doseresponse curves obtained from Hybrid model and individual RF models at 72 hours for cell line MMACSF. The observed doseresponse curves are also overlaid. Figure S9. Predicted doseresponse curves obtained from Hybrid model and individual RF models at 72 hours for cell line SKMEL28. The observed doseresponse curves are also overlaid. (PDF 2220 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Dhruba, S.R., Rahman, A., Rahman, R. et al. Recursive model for dosetime responses in pharmacological studies. BMC Bioinformatics 20 (Suppl 12), 317 (2019). https://doi.org/10.1186/s1285901928314
Published:
DOI: https://doi.org/10.1186/s1285901928314
Keywords
 Drug sensitivity prediction
 Pharmacogenomic studies
 HMSLINCS
 Joint dosetime modeling
 Recursive modeling
 Doseresponse curve
 Tumor growth model
 Gompertz law