Combination therapy design for maximizing sensitivity and minimizing toxicity

Matlock, Kevin; Berlow, Noah; Keller, Charles; Pal, Ranadip

doi:10.1186/s12859-017-1523-1

Research
Open access
Published: 22 March 2017

BMC Bioinformatics volume 18, Article number: 116 (2017)

Abstract

Background

The design of personalized targeted therapies involves modeling patient sensitivity to various drugs and drug combinations. The majority of studies evaluate the sensitivity of tumor cells to targeted drugs without modeling the effect of the drugs on normal cells. In this article, we model the drug responses of tumor and normal cells individually and use these models to design targeted combination therapies that maximize sensitivity over tumor cells and minimize toxicity over normal cells.

Results

The problem is formulated as maximizing sensitivity over tumor cell models while maintaining sensitivity below a threshold over normal cell models. We utilize the constrained structure of tumor proliferation models to design an accelerated lexicographic search algorithm for generating the optimal solution. For comparison purposes, we also designed two suboptimal search algorithms based on evolutionary algorithms and hill-climbing based techniques. Results over synthetic models and models generated from the Genomics of Drug Sensitivity in Cancer database show the ability of the proposed algorithms to arrive at optimal or close to optimal solutions in a significantly lower number of steps than exhaustive search. We also present a theoretical analysis of the expected number of comparisons required for the proposed lexicographic search, which compares favorably with the observed number of computations.

Conclusions

The proposed algorithms provide a framework for design of combination therapy that tackles tumor heterogeneity while satisfying toxicity constraints.

Background

The design of drug therapies for cancer has primarily been considered from the perspective of sensitivity prediction using genetic characterizations as the predictor variables [1–3]. Methodologies based on genetic characterization have severe limitations when the cancer type shows numerous aberrations among the samples; consequently, predicting sensitivity from similar steady state genetic characterizations provides limited accuracy. We have recently considered the modeling of tumor sensitivity using functional drug response data [4, 5], along with integrated modeling based on both functional and genetic characterization [6]. Models were designed based on the in vitro tumor response to a set of drugs with known targets. However, the combination therapy design was based on the model reflecting the average behavior of the tumor tissue [7].

In this article, we incorporate heterogeneity by considering the effect of a drug on various parts of the tumor and incorporate toxicity by considering the effect of the drug on normal cell types. Consider a solid tumor tissue where the biopsy sample can be divided into separate samples to explore the heterogeneity. We can pass each biopsy sample through a drug screen to create a probabilistic target inhibition map (PTIM) [5] model. Let ${MT}_{1}, {MT}_{2}, \cdots, {MT}_{k}$ denote the k models corresponding to the k spatial tumor biopsies.

For toxicity, the effect of the drug is not limited to the organ in which the tumor resides. To address this, we can pass normal cell cultures from different organs of the body through drug screens to create separate models of different organs to assess the toxicity of the drugs. Note that normal cells from the kidney, lungs, etc. of a specific cancer patient may not be readily available; the response to drugs can therefore be approximated by using drug screens on normal human cell lines of the kidney, lungs and other organs. The assumption is that variations in normal cell response to different drugs across patients are smaller than variations in tumor cell response across patients. The normal cell response to different drugs can, however, vary significantly between cells belonging to different organs in the body. Let ${MN}_{1}, {MN}_{2}, \cdots, {MN}_{p}$ denote the p models corresponding to p different normal cell types.

The goal of the combination therapy design will be to select a set of drugs that will maximize the sensitivity over heterogeneous tumor models ${MT}_{1}, {MT}_{2}, \cdots, {MT}_{k}$ and minimize sensitivity over normal cell type models ${MN}_{1}, {MN}_{2}, \cdots, {MN}_{p}$. Note that currently available combination therapy design techniques are model free and require multiple experimental iterations to arrive at the optimal strategy [8–15]. This article considers model based combination therapy design over multiple models of tumor and normal cell lines.

We utilize the constrained structure of tumor proliferation models to design an accelerated lexicographic search algorithm for generating the optimal solution. For comparison purposes, we also designed two suboptimal search algorithms based on evolutionary algorithms and hill-climbing based techniques. We test the performance of our algorithms on synthetic models and models generated from the Genomics of Drug Sensitivity in Cancer (GDSC) database [16]. Utilizing the model structure in the search process allows us to arrive at optimal or close to optimal solutions in a significantly lower number of steps than exhaustive search. The article also presents a theoretical analysis of the expected number of comparisons required for the proposed optimal lexicographic search, which compares favorably with the observed number of computations.

The paper is organized as follows: the model representation is discussed in the “Methods” section. The “Algorithms” section discusses the proposed lexicographic search algorithm along with the suboptimal genetic algorithm and hill climbing approaches. The “Results and discussion” section presents the results, followed by conclusions in the “Conclusions” section.

Methods

Model type

In this section, we provide a brief review of the model used to represent each spatial tumor biopsy or normal cell line. A Probabilistic Target Inhibition Map (PTIM) model provides an estimate of sensitivity for all possible target inhibitions. Consider the example PTIM model with 3 targets $k_{1}$, $k_{2}$, $k_{3}$ shown in Fig. 1, where the value in each cell represents the sensitivity corresponding to that specific inhibition. For instance, inhibition of $k_{3}$ alone will produce a sensitivity of 0.75. Since the commonly used targeted drugs inhibit oncogenes, we consider all the targets to be oncogenes, so that inhibiting more oncogenes can only cause the sensitivity to remain the same or increase. For instance, since the inhibition of $k_{3}$ alone produces a sensitivity of 0.75, all supersets of that inhibition ($[k_{3},k_{1}]$, $[k_{3},k_{2}]$, $[k_{3},k_{1},k_{2}]$) will have sensitivity ≥0.75. Similarly, any subset of a known inhibition will have sensitivity less than or equal to the observed value. Based on these two biological constraints and limited drug perturbation experiments, we can arrive at an inferred PTIM model that provides an estimate of sensitivity for all possible target inhibitions. The details of the model are available in [4–6], along with biological validation in [17]. Note that a PTIM can also be approximately represented as a tumor proliferation circuit, as shown in Fig. 2, where the tumor proliferation can be restricted by inhibiting at least one series block. For instance, inhibition of the $[k_{1},k_{2}]$ block will provide a sensitivity of 0.95, whereas inhibition of $k_{3}$ will provide a sensitivity of 0.75. Inhibiting more than the minimum will produce higher sensitivities, which are given by the original map shown in Fig. 1.
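The two monotonicity constraints above can be illustrated with a minimal Python sketch. It is not the PTIM inference procedure of [4–6]; it only shows how known sensitivities bound an unobserved inhibition set from below (via known subsets) and from above (via known supersets). The function name `sensitivity_bounds` and the dictionary layout are assumptions made for illustration, and the two known values are the ones quoted in the text for Fig. 1.

```python
from typing import Dict, FrozenSet, Tuple


def sensitivity_bounds(
    known: Dict[FrozenSet[str], float], query: FrozenSet[str]
) -> Tuple[float, float]:
    """Bound the sensitivity of `query` using the oncogene monotonicity rules.

    Lower bound: the largest sensitivity among known subsets of `query`.
    Upper bound: the smallest sensitivity among known supersets of `query`.
    Defaults of 0.0 and 1.0 apply when no known set is comparable.
    """
    lower = max((s for k, s in known.items() if k <= query), default=0.0)
    upper = min((s for k, s in known.items() if k >= query), default=1.0)
    return lower, upper


# Hypothetical observations taken from the values quoted in the text.
known = {
    frozenset({"k3"}): 0.75,
    frozenset({"k1", "k2"}): 0.95,
}

print(sensitivity_bounds(known, frozenset({"k1", "k3"})))  # bounded below by k3 alone
print(sensitivity_bounds(known, frozenset({"k1"})))        # bounded above by [k1, k2]
```

Any inferred map consistent with the constraints must place each unobserved inhibition inside the interval returned here.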

Structure of tumor and normal cell models

Based on the previously discussed model structure, each of the k tumor models will be represented as a probabilistic target inhibition map that can also be approximated by a circuit representation of series of parallel blocks as shown in Fig. 3.

In Fig. 3, the number of blocks for models ${MT}_{i}$ for $i=1,\cdots,k$ and ${MN}_{j}$ for $j=1,\cdots,p$ are denoted by $n_{Ti}$ and $n_{Nj}$ respectively. Every model is composed of five blocks connected in series. Each block, b, contains a set of targets $T_{bi}$ (up to a maximum of 5 targets) that are connected in parallel. Thus each model can have up to 25 targets.

Optimization objectives

For optimization, we consider both the worst case and best expected scenario. Let $O({MT}_{i},\phi)$ denote the sensitivity of tumor model i for $i=1,\cdots,k$ with inhibition $\phi$. Let $O({MN}_{i},\phi)$ denote the sensitivity of normal model i for $i=1,\cdots,p$ with inhibition $\phi$.

Worst case optimization (WCO): We desire high sensitivity over the tumor cell lines and low sensitivity over the normal cells, which can be formulated in the worst case scenario as maximizing the minimum sensitivity over the tumorous cells while maintaining the maximum sensitivity over the normal cells below a certain threshold $\theta_{1}$, i.e.

$$ \max_{\phi} \left(\min_{i} \left[O({MT}_{i},\phi)\right]\right) \quad \text{while} \quad \max_{i} \left[O({MN}_{i},\phi)\right] \leq \theta_{1} $$

Best expected optimization (BEO): In this scenario, our goal will be to maximize the average sensitivity over the tumorous cells while maintaining the average sensitivity over the normal cells below a threshold θ ₂ i.e. $ \max _{\phi } (\overline {O({MT}_{i}, \phi)})$ while $\overline {O({MN}_{i}, \phi)} \leq \theta _{2}$.
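As a minimal sketch of how the two objectives could be evaluated for a single candidate inhibition, the Python below treats each model as a function from an inhibition set to a sensitivity in [0, 1]. The function names `wco_score` and `beo_score`, the `None` return for an infeasible therapy, and the toy models are all assumptions for illustration and do not come from the article.

```python
from statistics import mean
from typing import Callable, FrozenSet, Optional, Sequence

Inhibition = FrozenSet[str]
Model = Callable[[Inhibition], float]  # inhibition set -> sensitivity in [0, 1]


def wco_score(tumors: Sequence[Model], normals: Sequence[Model],
              phi: Inhibition, theta1: float) -> Optional[float]:
    """Worst case objective: min tumor sensitivity, subject to max toxicity <= theta1."""
    if max(m(phi) for m in normals) > theta1:
        return None  # toxicity constraint violated
    return min(m(phi) for m in tumors)


def beo_score(tumors: Sequence[Model], normals: Sequence[Model],
              phi: Inhibition, theta2: float) -> Optional[float]:
    """Best expected objective: mean tumor sensitivity, subject to mean toxicity <= theta2."""
    if mean(m(phi) for m in normals) > theta2:
        return None  # toxicity constraint violated
    return mean(m(phi) for m in tumors)


# Hypothetical toy models (two tumor biopsies, two normal cell types).
tumors = [lambda phi: 0.9 if "k1" in phi else 0.2,
          lambda phi: 0.6 if "k1" in phi else 0.1]
normals = [lambda phi: 0.3 if "k1" in phi else 0.0,
           lambda phi: 0.5 if "k2" in phi else 0.1]

print(wco_score(tumors, normals, frozenset({"k1"}), theta1=0.4))
print(beo_score(tumors, normals, frozenset({"k1"}), theta2=0.25))
print(wco_score(tumors, normals, frozenset({"k1", "k2"}), theta1=0.4))
```

The optimization itself then amounts to searching over inhibitions for the largest feasible score, which is what the algorithms in the next section do.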

Algorithms

Lexicographic search algorithm

In order to find the optimal target set for our problem, we can exhaustively search through all possible target combinations for a given toxicity threshold. Normally, for T targets this would require searching through $2^{T}$ combinations, which is not computationally feasible for large T. However, we can exploit the monotonic relationships of PTIM models to reduce the number of search steps. If a set of targets $S_{1}$ has a toxicity (i.e. sensitivity over normal cell lines) greater than a given threshold $\theta_{1}$, then every superset of $S_{1}$ has a toxicity at least as large and therefore also exceeds $\theta_{1}$, so there is no need to search through the supersets of $S_{1}$. Note that this is only valid when all the targets are oncogenes.

To take advantage of this property, we perform a branching lexicographic search of the solution space. We can view the solution space as a directed graph where each node of the graph is a target set represented as a binary string with T bits. Each edge of the graph corresponds to turning on one bit to the right of the rightmost bit that is already set, creating a superset of that node. If the toxicity at a node exceeds the threshold, then there is no need to continue along the associated edges and we should instead trace back to the previous node. A recursive algorithm to perform this search is shown in Algorithm 1. A demo using four targets is shown in Fig. 4. Note that in Fig. 4, we assume that the sensitivity over normal cell lines exceeds the given threshold for target set [1100], and thus its supersets [1110], [1101] and [1111], marked by dotted lines, are excluded from the search process.
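The branching search described above can be sketched in Python as follows. This is an illustrative reading of the description, not a transcription of Algorithm 1: the function name, the tuple-of-indices representation of a target set, and the toy sensitivity and toxicity functions are assumptions. The toy toxicity is chosen so that the set containing targets 0 and 1 (the string [1100]) exceeds the threshold, mirroring the Fig. 4 scenario.

```python
from math import prod
from typing import Callable, Tuple

TargetSet = Tuple[int, ...]  # sorted indices of inhibited targets


def lexicographic_search(n_targets: int,
                         sensitivity: Callable[[TargetSet], float],
                         toxicity: Callable[[TargetSet], float],
                         theta: float):
    """Depth-first search over target sets, pruning supersets of toxic sets.

    Relies on toxicity being monotone: adding targets never lowers it.
    Returns (best feasible set, its sensitivity, number of nodes evaluated).
    """
    best_set: TargetSet = ()
    best_score = float("-inf")
    visited = 0

    def recurse(current: TargetSet, start: int) -> None:
        nonlocal best_set, best_score, visited
        visited += 1
        if toxicity(current) > theta:
            return  # every superset is at least as toxic, so stop here
        score = sensitivity(current)
        if score > best_score:
            best_set, best_score = current, score
        # Only add targets with a higher index than any already present.
        for t in range(start, n_targets):
            recurse(current + (t,), t + 1)

    recurse((), 0)
    return best_set, best_score, visited


# Hypothetical toy problem with four targets.
WEIGHTS = [0.5, 0.4, 0.3, 0.2]


def toy_sensitivity(s: TargetSet) -> float:
    return 1.0 - prod(1.0 - WEIGHTS[t] for t in s)


def toy_toxicity(s: TargetSet) -> float:
    return 0.5 * sum(1 for t in s if t in (0, 1))


best_set, best_score, visited = lexicographic_search(
    4, toy_sensitivity, toy_toxicity, theta=0.6)
print(best_set, round(best_score, 2), visited)
```

In this toy case the search evaluates 13 of the 16 possible sets, because the three supersets of [1100] are never visited.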

Lexicographic search analysis

In this section, we consider the stochastic analysis of the proposed lexicographic search to generate the expected number of searches. Let $CDF(l,\theta) = f(l,\theta)$ denote the probability that a normal cell model will have sensitivity ≤θ for a random inhibition of l targets. We then define $g_{P}(l,\theta)$ as the probability that we will not exceed sensitivity θ while targeting l random inhibitions in a set of P cells. For the worst case scenario, $g^{wco}_{P}(l,\theta)$ is the probability that the maximum sensitivity over the normal cells is below threshold θ. Assuming independence of the normal cell sensitivities, we have:

$$\begin{array}{*{20}l} g^{wco}_{P}(l,\theta) &= Pr(\max_{i}(O({MN}_{i})) \leq \theta) \\ &= Pr(O({MN}_{1}) \leq \theta, O({MN}_{2}) \leq \theta, \cdots, O({MN}_{P}) \leq \theta) \\ &= Pr(O({MN}_{1}) \leq \theta) \bullet Pr(O({MN}_{2}) \leq \theta) \cdots \bullet Pr(O({MN}_{P}) \leq \theta) \\ &= f(l,\theta)^{P} \end{array} $$

(1)

For the best expected scenario, let us consider the probability density function of observing a sensitivity of θ after l inhibitions, $PDF(l,\theta) = \frac {\partial f(l,\theta)}{\partial \theta }$. Let X denote the random variable with density $PDF(l,\theta)$ and Z denote the sum of P such random variables. The probability density function of Z, denoted by $q_{P}(l,\theta)$, can be calculated by convolving $PDF(l,\theta)$ with itself $P-1$ times and is given by

$$ q_{P}(l,\theta) = PDF(l,\theta) \ast PDF(l,\theta) \dots \ast PDF(l, \theta) $$

Let Y denote the random variable denoting the average sensitivity over P cell lines with l random inhibitions. The probability density function of Y is given by

$$\begin{array}{*{20}l} {pdf}_{Y}(\theta) &= h_{P}(l,\theta) = P \cdot q_{P}(l, P\theta) \end{array} $$

(2)

We can then estimate the cumulative distribution function of Y, $g^{beo}_{P}(l,\theta)$, by integrating across θ:

$$ g^{beo}_{P}(l,\theta) = \int_{0}^{\theta} h_{P}(l,u) du $$
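The two probabilities can be approximated numerically once the sensitivity distribution is discretized. The Python sketch below assumes the sensitivity of a single normal model (for a fixed number of inhibitions l) is given as a probability mass function on an evenly spaced grid over [0, 1]; this discretization, and the names `convolve`, `g_wco` and `g_beo`, are assumptions made for illustration rather than the procedure used in the article.

```python
from typing import List


def convolve(a: List[float], b: List[float]) -> List[float]:
    """Discrete convolution of two probability mass functions."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def g_wco(pmf: List[float], P: int, theta: float) -> float:
    """Pr(max sensitivity over P independent normal models <= theta) = f^P."""
    n_bins = len(pmf) - 1
    f = sum(p for k, p in enumerate(pmf) if k / n_bins <= theta + 1e-12)
    return f ** P


def g_beo(pmf: List[float], P: int, theta: float) -> float:
    """Pr(average sensitivity over P independent normal models <= theta)."""
    n_bins = len(pmf) - 1
    total = pmf
    for _ in range(P - 1):  # P - 1 convolutions give the distribution of the sum
        total = convolve(total, pmf)
    # total[s] is Pr(sum of grid indices == s); average <= theta iff s <= theta * P * n_bins.
    limit = theta * P * n_bins
    return sum(p for s, p in enumerate(total) if s <= limit + 1e-9)


# Hypothetical distribution: sensitivity is 0 or 1 with equal probability (grid 0, 0.5, 1).
pmf = [0.5, 0.0, 0.5]
print(g_wco(pmf, P=2, theta=0.5), g_beo(pmf, P=2, theta=0.5))
```

For this toy distribution the average-based constraint is satisfied more often than the maximum-based one, which is the expected ordering since the average never exceeds the maximum.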

Expected savings

Define $A_{i}$ to denote the event that the sensitivity over P normal models with i random inhibitions exceeds θ, i.e. $Pr(A_{i}) = 1 - g_{P}(i,\theta)$. Let $L_{i}$ denote the event of stopping at level i of the lexicographic search, where i represents the number of bits we are searching through. The probability of event $L_{i}$ is given by:

$$\begin{array}{*{20}l} P(L_{i}) & = P\left(A_{i} \cap A_{i-1}^{C} \cap A_{i-2}^{C} \dots \cap A_{1}^{C}\right) \\ & = P\left(A_{i} | \bigcap_{j=1}^{i-1} A_{j}^{C}\right) P\left(\bigcap_{j=1}^{i-1} A_{j}^{C}\right) \end{array} $$

(3)

We note that:

$$ P\left(A_{i} | \bigcap_{j=1}^{i-1} A_{j}^{C}\right) = 1 - P\left(A_{i}^{C} | \bigcap_{j=1}^{i-1} A_{j}^{C}\right) $$

(4)

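By applying Bayes’ theorem, and noting that $P\left(\bigcap_{j=1}^{i-1} A_{j}^{C}|A_{i}^{C}\right) = 1$ because, by the monotonicity of the models, staying below the threshold with i inhibitions implies staying below it with fewer inhibitions, we can simplify further: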

$$\begin{array}{*{20}l}{} P\left(A_{i}^{C} | A_{i-1}^{C} \cap A_{i-2}^{C} \cap \dots A_{1}^{C}\right) &= \frac{P\left(A_{i}^{C}\right)P\left(\bigcap_{j=1}^{i-1} A_{j}^{C}|A_{i}^{C}\right)}{P\left(\bigcap_{j=1}^{i-1} A_{j}^{C}\right)} \\ &= \frac{P\left(A_{i}^{C}\right)}{P\left(\bigcap_{j=1}^{i-1} A_{j}^{C}\right)} \end{array} $$

(5)

By combining Eqs. 3, 4 and 5, we have:

$$ P(L_{i}) = g(i-1,\theta) - g(i,\theta) $$

(6)

To find the expected savings, we note that by stopping at $L_{i}$, we search through $\sum _{j=0}^{i} {{T}\choose {j}}$ combinations. Thus, the expected savings E(S) is given by:

$$\begin{array}{*{20}l} E(S) &= \sum_{i=1}^{T}P\left(A_{i} \cap A_{i-1}^{C} \cap \dots A_{1}^{C}\right)\left[2^{T} - \sum_{j=0}^{i} {{T}\choose{j}}\right] \end{array} $$

(7)

$$\begin{array}{*{20}l} &= \sum_{i=1}^{T}\left[g(i-1,\theta) - g(i,\theta)\right]\left[2^{T} - \sum_{j=0}^{i} {{T}\choose{j}}\right] \end{array} $$

(8)
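Equation 8 is straightforward to evaluate once the values $g(i,\theta)$ are available. The Python sketch below takes them as a list indexed by the number of inhibitions; the function name `expected_savings` and the example values of g are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import comb
from typing import Sequence


def expected_savings(T: int, g: Sequence[float]) -> float:
    """Evaluate Eq. 8: sum over levels of P(L_i) times the combinations skipped.

    g[i] is the probability that toxicity stays at or below the threshold
    with i random inhibitions, for i = 0..T (so g[0] is normally 1).
    """
    total = 0.0
    for i in range(1, T + 1):
        p_stop = g[i - 1] - g[i]                          # P(L_i), Eq. 6
        searched = sum(comb(T, j) for j in range(i + 1))  # combinations up to level i
        total += p_stop * (2 ** T - searched)
    return total


# Hypothetical values of g for T = 3 targets.
print(expected_savings(3, [1.0, 0.5, 0.25, 0.0]))
```

With these values, stopping at level 1 has probability 0.5 and skips 4 of the 8 combinations, stopping at level 2 has probability 0.25 and skips 1, and stopping at level 3 skips none, giving 0.5·4 + 0.25·1 = 2.25.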

Genetic algorithm based search

Pareto optimality

We consider a multi-objective optimization scenario where we maximize sensitivity over tumor cells and minimize sensitivity over normal cells. For the worst case optimization scenario, if therapies $\phi_{1}$ and $\phi_{2}$ satisfy $\min_{1 \leq i \leq k}[O({MT}_{i},\phi_{1})] \geq \min_{1 \leq i \leq k}[O({MT}_{i},\phi_{2})]$ and $\max_{1 \leq i \leq p}[O({MN}_{i},\phi_{1})] \leq \max_{1 \leq i \leq p}[O({MN}_{i},\phi_{2})]$, with either $\min_{1 \leq i \leq k}[O({MT}_{i},\phi_{1})] > \min_{1 \leq i \leq k}[O({MT}_{i},\phi_{2})]$ or $\max_{1 \leq i \leq p}[O({MN}_{i},\phi_{1})] < \max_{1 \leq i \leq p}[O({MN}_{i},\phi_{2})]$, then therapy $\phi_{1}$ is considered to dominate $\phi_{2}$ in the multi-objective Pareto sense. The therapies that are not dominated by any other therapy form the Pareto efficient front.
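The dominance relation can be written compactly in Python. In this sketch each therapy is summarized by a pair (tumor score, toxicity), i.e. the minimum tumor sensitivity and the maximum normal sensitivity in the worst case setting; the function names and the example points are assumptions for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (tumor score to maximize, toxicity to minimize)


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical therapies.
therapies = [(0.9, 0.5), (0.8, 0.3), (0.7, 0.4), (0.9, 0.6)]
print(pareto_front(therapies))
```

Here (0.7, 0.4) is dominated by (0.8, 0.3) and (0.9, 0.6) by (0.9, 0.5), so only the first two therapies remain on the front.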

Algorithm

Genetic Algorithms (GA) are inspired by evolutionary theory where strong species have a higher opportunity to pass their genes to offspring via reproduction and weaker chromosomes are eliminated by natural selection [18, 19]. Each generation or population consists of diverse individuals or chromosomes and in our Genetic Algorithm based Combination Therapy design (GACT), each therapy ϕ is regarded as a chromosome comprised of different target inhibitions. These target inhibitions are binary variables with values of 0 (non-inhibited) or 1 (completely inhibited). In order to select the best solutions (therapies) for the next generation, the fitness of each solution is computed. The therapies with the best fitness (our Pareto front) will be selected as the parents of the next generation. During each reproduction process, crossover and mutation operators are applied for the purpose of generating new solutions from existing ones. Mutation is performed by randomly flipping inhibition values of our targets. Crossover is performed by randomly picking values between two different target sets. For example, if we take the two target sets [a ₁,a ₂,a ₃] and [b ₁,b ₂,b ₃] a crossover between them can be performed by considering:

$$[c_{1}, c_{2}, c_{3}] = [a_{1}, b_{2}, a_{3}] $$

Based on our starting set of targets (M), we form the initial population $P_{0}$ of N random target inhibition profiles. After calculating the fitness functions for the existing population, we calculate different Pareto front layers according to their dominance relationships. The top Pareto optimal points are selected to pairwise undergo crossover and mutation to form offspring. Here we have set the number of offspring to be at least twice the number of points in our Pareto fronts, with a minimum of Offmin=1000 and a maximum of nOffmax=15,000 offspring. After merging these offspring with their parent population $P_{t-1}$, we extract the top N therapies to generate population $P_{t}$. We iterate our algorithm until we have completed totalG generations or the number of offspring is greater than nOffmax. Note that evolutionary algorithms like GA do not guarantee convergence of the Pareto front, but the performance of our therapies will improve if the Pareto front moves in our desired direction with subsequent GA iterations. The detailed procedure for multi-objective GACT is shown as Algorithm 2. Figure 5 illustrates how the algorithm moves our Pareto front towards better solutions across successive iterations. After running the GACT, we consider the final Pareto front and pick the target set that provides the maximum sensitivity over tumor cell lines when the toxicity is below threshold $\theta_{1}$.
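The crossover and mutation operators described above can be sketched in Python as follows. This covers only the two operators, not the full GACT loop of Algorithm 2; the function names, the per-bit mutation rate parameter and the example parents are assumptions for illustration.

```python
import random
from typing import List


def crossover(a: List[int], b: List[int], rng: random.Random) -> List[int]:
    """Build a child by picking each target's inhibition from one of the two parents."""
    return [rng.choice((x, y)) for x, y in zip(a, b)]


def mutate(bits: List[int], rate: float, rng: random.Random) -> List[int]:
    """Flip each inhibition bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in bits]


# Hypothetical parents over four targets (1 = inhibited, 0 = not inhibited).
rng = random.Random(0)
parent_a = [1, 0, 1, 0]
parent_b = [0, 0, 1, 1]
child = mutate(crossover(parent_a, parent_b, rng), rate=0.1, rng=rng)
print(child)
```

Positions where both parents agree are preserved by crossover, so new values at those positions can only arise through mutation.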

Random restart hill climbing

We consider an additional suboptimal algorithm based on hill climbing to search the target space. Hill climbing is an iterative method for finding a local maximum of an arbitrary function. Given a starting point, the algorithm considers all the nearest neighbors and then selects the neighbor that provides the best solution for the given optimization criteria. These steps are repeated until no neighbor provides a better solution or a maximum number of iterations has been reached. While simple and effective at finding a local optimum, hill climbing will rarely find the global optimum for non-convex functions. To overcome this handicap, we restart the search from a new random position whenever the algorithm converges to a local optimum.

In our case, the starting point will be a random set of targets chosen using Latin hypercube sampling, and each neighbor will be created by inhibiting or un-inhibiting a single target. As shown in Algorithm 3, our optimization criterion changes depending on the toxicity of the current set. If the toxicity is greater than the given threshold, we choose the neighbor with the least toxicity; otherwise we pick the neighbor with the highest sensitivity whose toxicity is below the threshold. When no improvement can be found among the neighbors, we randomly choose a new set of targets. This is repeated until we have completed maxIter iterations.
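A minimal Python sketch of this procedure is given below. It is one reading of the description rather than a transcription of Algorithm 3, and it draws starting points uniformly at random instead of using Latin hypercube sampling, which is a simplification. The function name, the tuple-of-bits representation and the toy sensitivity and toxicity functions are assumptions for illustration.

```python
import random
from typing import Callable, Optional, Tuple

Bits = Tuple[int, ...]  # 1 = target inhibited, 0 = not inhibited


def random_restart_hill_climb(n_targets: int,
                              sensitivity: Callable[[Bits], float],
                              toxicity: Callable[[Bits], float],
                              theta: float, max_iter: int, seed: int = 0):
    """Return the best feasible target set found and its sensitivity."""
    rng = random.Random(seed)

    def random_start() -> Bits:
        # Uniform random start (an assumption; the text uses Latin hypercube numbers).
        return tuple(rng.randint(0, 1) for _ in range(n_targets))

    best: Optional[Bits] = None
    best_score = float("-inf")
    current = random_start()
    for _ in range(max_iter):
        tox = toxicity(current)
        if tox <= theta and sensitivity(current) > best_score:
            best, best_score = current, sensitivity(current)
        # Neighbors differ from the current set in exactly one target.
        neighbors = [current[:i] + (1 - current[i],) + current[i + 1:]
                     for i in range(n_targets)]
        if tox > theta:
            # Too toxic: move towards the least toxic neighbor.
            candidate = min(neighbors, key=toxicity)
            improved = toxicity(candidate) < tox
        else:
            # Feasible: move to the most sensitive neighbor that stays feasible.
            feasible = [n for n in neighbors if toxicity(n) <= theta]
            candidate = max(feasible, key=sensitivity, default=current)
            improved = sensitivity(candidate) > sensitivity(current)
        current = candidate if improved else random_start()
    return best, best_score


# Hypothetical toy problem with four targets.
WEIGHTS = [0.5, 0.4, 0.3, 0.2]


def toy_sensitivity(bits: Bits) -> float:
    score = 1.0
    for w, b in zip(WEIGHTS, bits):
        if b:
            score *= 1.0 - w
    return 1.0 - score


def toy_toxicity(bits: Bits) -> float:
    return 0.5 * (bits[0] + bits[1])


best, best_score = random_restart_hill_climb(
    4, toy_sensitivity, toy_toxicity, theta=0.6, max_iter=200)
print(best, round(best_score, 2))
```

In this toy problem a single climb can end in the local optimum (0, 1, 1, 1), which is why the restarts matter: only starts that reach a set containing target 0 but not target 1 lead to the best feasible set.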

Results and discussion

To evaluate the performance of our algorithms, we considered both synthetic models and models based on experimental datasets.

Synthetic model generation

The synthetic models are simulated using a proliferation network structure based on probabilistic target inhibition maps [5, 7]. Each cellular pathway, i, representing either a tumor or normal cell model, is modelled by connecting a set of blocks in series. The number of blocks for models ${MT}_{i}$ for $i=1,\cdots,k$ and ${MN}_{j}$ for $j=1,\cdots,p$ are denoted by $n_{Ti}$ and $n_{Nj}$ respectively. Each block, b, contains a set of targets $T_{bi}$ (up to a maximum of 5 targets) that are connected in parallel. Since the targets are in parallel, the effective inhibition for each block given a set of target inhibitions, $\phi$, is the minimum inhibition over the targets within the block. Thus, the effective inhibition of block b in model ${MT}_{i}$ with target inhibition $\phi$ is given by $\lambda({MT}_{i},b,\phi) = \min_{t \in T_{bi}} \phi(t)$, where $\phi(t)$ denotes the inhibition applied to target t.

Each block is also assigned a score, $S_{bi}$, drawn randomly from a uniform distribution with a minimum of 0.5 and a maximum of 1. Finally, the overall sensitivity of the pathway can be computed using the following equation, where we assume independence between the series blocks:

$$Sensitivity({MT}_{i},\phi) = 1-\prod_{b=1}^{n_{Ti}}(1-S_{bi} \lambda ({MT}_{i},b,\phi)) $$

Similarly, for normal cells:

$$Sensitivity({MN}_{i},\phi) = 1-\prod_{b=1}^{n_{Ni}}(1-S_{bi} \lambda ({MN}_{i},b,\phi)) $$

A representation of k tumor and p normal models as series of parallel target blocks is shown in Fig. 3.
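The sensitivity computation for a single synthetic pathway can be sketched in Python as follows. The function names, the dictionary representation of the inhibition (targets absent from the dictionary are treated as not inhibited) and the example pathway with its block scores are assumptions for illustration.

```python
from math import prod
from typing import Dict, List, Sequence


def block_inhibition(block_targets: Sequence[str], phi: Dict[str, float]) -> float:
    """Effective inhibition of a parallel block: the minimum over its targets."""
    return min(phi.get(t, 0.0) for t in block_targets)


def pathway_sensitivity(blocks: List[Sequence[str]], scores: Sequence[float],
                        phi: Dict[str, float]) -> float:
    """Sensitivity of a pathway made of series blocks with per-block scores."""
    return 1.0 - prod(1.0 - s * block_inhibition(b, phi)
                      for b, s in zip(blocks, scores))


# Hypothetical pathway: block [k1, k2] with score 0.9 in series with block [k3] with score 0.5.
blocks = [["k1", "k2"], ["k3"]]
scores = [0.9, 0.5]

print(pathway_sensitivity(blocks, scores, {"k1": 1.0}))             # block 1 not fully inhibited
print(pathway_sensitivity(blocks, scores, {"k1": 1.0, "k2": 1.0}))  # block 1 inhibited
print(pathway_sensitivity(blocks, scores, {"k1": 1.0, "k2": 1.0, "k3": 1.0}))
```

Inhibiting only one target of a parallel block has no effect, since the remaining target keeps the block active; inhibiting additional blocks can only raise the sensitivity, consistent with the monotonicity assumed for PTIM models.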

The synthetic model set consists of a total of 1000 synthetic pathways, 500 normal and 500 cancerous. A total of 25 targets are examined and all targets are equiprobable in both the cancer and normal pathways. We group the pathways into 100 groups where each group has 5 normal pathways and 5 cancerous pathways. From every group, we consider nNormal normal pathways and nTumor cancerous pathways.

GDSC data

In order to test our algorithms on biological functional data, we utilized the GDSC database [16] to generate a set of PTIM models for 20 different cell lines. These cell lines were segregated into two groups of 10: the first group is composed of breast cancer cell lines and the second of B-cell lymphoma cell lines. A list of the cell lines is shown in Table 1. For each cell line, we combined the IC50 values for 32 drugs with the corresponding drug panels to generate a PTIM model. The drug panels contained the $K_{d}$ values for 404 targets, and 62 of these targets were found to correspond to the PTIM model of at least one of the cell lines: 42 targets in the breast cell lines, 49 in the lymphoma cell lines, and 27 targets were found in both the breast and lymphoma cell lines.

Table 1 Cell lines used in GDSC dataset
