COMBImage2: a parallel computational framework for higher-order drug combination analysis that includes automated plate design, matched filter based object counting and temporal data mining

Background Pharmacological treatment of complex diseases using more than two drugs is commonplace in the clinic due to better efficacy, decreased toxicity and reduced risk for developing resistance. However, many of these higher-order treatments have not undergone any detailed preceding in vitro evaluation that could support their therapeutic potential and reveal disease related insights. Despite the increased medical need for discovery and development of higher-order drug combinations, very few reports from systematic large-scale studies along this direction exist. A major reason is lack of computational tools that enable automated design and analysis of exhaustive drug combination experiments, where all possible subsets among a panel of pre-selected drugs have to be evaluated. Results Motivated by this, we developed COMBImage2, a parallel computational framework for higher-order drug combination analysis. COMBImage2 goes far beyond its predecessor COMBImage in many different ways. In particular, it offers automated 384-well plate design, as well as quality control that involves resampling statistics and inter-plate analyses. Moreover, it is equipped with a generic matched filter based object counting method that is currently designed for apoptotic-like cells. Furthermore, apart from higher-order synergy analyses, COMBImage2 introduces a novel data mining approach for identifying interesting temporal response patterns and disentangling higher- from lower- and single-drug effects. COMBImage2 was employed in the context of a small pilot study focused on the CUSP9v4 protocol, which is currently used in the clinic for treatment of recurrent glioblastoma. For the first time, all 246 possible combinations of order 4 or lower of the 9 single drugs consisting the CUSP9v4 cocktail, were evaluated on an in vitro clonal culture of glioma initiating cells. Conclusions COMBImage2 is able to automatically design and robustly analyze exhaustive and in general higher-order drug combination experiments. Such a versatile video microscopy oriented framework is likely to enable, guide and accelerate systematic large-scale drug combination studies not only for cancer but also other diseases. Electronic supplementary material The online version of this article (10.1186/s12859-019-2908-0) contains supplementary material, which is available to authorized users.


Matched Filter
A linear two-dimensional matched filter is used to detect apoptotic-like cells. The detector corresponds to a sliding image patch of size N × N and therefore N 2 filter coefficients. In the form of a N 2 × 1 dimensional column vector r, the matched filter calculates the optimal test statistic for discrimination between the two hypotheses: Here, b denotes the background, s denotes the signal prototype to be detected and n ∼ N (0, C). Given a filter coefficient vector w and a particular local image patch r, the output of the filter is:

Decision Statistics
Using Bayes' theorem, the aforementioned probabilities can be derived as: Thus, inequality (3) becomes: The probability density function for the left part of inequality (5), under the assumption that r ∼ N (m, C), is: Here, m 0 = b and m 1 = s under H 0 and H 1 , respectively. By plugging (6) in (5), we get: SI -1 e − 1 2 (r−s) C −1 (r−s) The right part of inequality (7) represents the threshold τ , which determines H 1 over H 0 . The left part corresponds to the filter coefficient vector defined as:

Starting Points
For each training image i, the fraction x i in percent of the total image area covered by the observed prototypic objects is estimated as: Here, N obs i denotes the number of observed prototypic objects, α obj denotes the area covered by such a prototypic object and α i the total area of the image. Then, the initial detection threshold τ i for image i is determined as: where p 100−x i is the percentile of the "after filtering" intensity distribution that x i belongs to. In other words, the fraction x i of all pixels being above τ i (ideally) corresponds to the N obs i objects. After applying (9) and (10), to all n training images, a set of initial guesses is defined as: S τ = {τ 1 , τ 2 , · · · · · · , τ n } which is split into m user defined equidistant values. The distance between two such consecutive values used to define the step of the employed interval optimization search is defined as: Here, τ step is the step between two consecutive searches, while τ max and τ min denote the maximum and minimum values in S τ , respectively. For the case study, m = 100 was used.

Loss function
A detection threshold τ is evaluated on a set of n images by comparing the number of observed and predicted objects. The corresponding loss function f is defined as: Here, N obs and N pred are n-dimensional column vectors, denoting the number of observed/actual and predicted/detected objects, respectively.

Interval Optimization
The set of initial guesses S τ is exhaustively searched by minimizing f inside the shortest interval I bounding the initial minimum τ * init , thus defined as: where τ L and τ R are the closest left and right thresholds respectively, around τ * init among S τ , where f is increasing. For a particular threshold value τ k in I, the next evaluation at τ k+1 is calculated as: The optimal threshold τ * k is obtained by minimizing f in I, which is then further used for detecting and counting the prototypic-like objects in all filtered images. Thus, the corresponding optimization problem can be expressed as:    Figure S3: Histogram of estimated inter-plate variabilities used in the context of resampling based null hypothesis significance testing with 10000 simulations and 5% probability of false alarm. All inter-plate measurements with variability larger that this threshold were detected as outliers and they were excluded from all subsequent inter-plate analyses. Figure S4: COMBO-Pick generates randomized experimental layouts in 384-well format with the following conventions: (i) B1-O1 are used as blank wells ; (ii) A1-A24, B24-O24 and P1-P24 are not used as destination wells in order to avoid potential drying issues. Thus, only the inner beige part, which consists in total of 308 wells is used by COMBO-Pick for the generation of randomized layouts.

Graphics
(1) (3) (1) (3)     Figure S17: Results of employing K-means clustering in order to group the drug combination effects with respect to their temporal phenotypic behaviors. The clustering procedure was performed 100 times independently for K = {1, 2, · · · 10}. The distributions of the corresponding sum of squared errors (SSE) are shown as box plots on the left side of the figure. The final selection of K was determined by calculating the relative change in SSE when transitioning between consecutive values of K, as illustrated on the right subplot. The smallest K that resulted in SSE drop bigger than 20% compared to the previous value K − 1, was selected. Here, K = 2 was selected.  Figure S18: The choice of K for K-means clustering in the second hierarchical level, shown here, was performed similarly to the first hierarchical level, as described in fig. S17.