### Comparisons with existing tools

To evaluate the performance of *cellXpress*, we considered several free alternative biological image analysis software platforms (Figure 1). The functions of many of these platforms can be extended through third-party plugins or custom scripting/programming. However, most biological scientists have limited resources or expertise for developing such custom plugins or programs. Therefore, we considered only built-in functions or plugins that are bundled with the default installation packages. We chose to compare the performance of *cellXpress* (version pro 1.0) [27] with that of the Broad Institute's CellProfiler (version 2.0) [28] because the two have the most similar functions (Figure 1). We also included NIH's ImageJ (version 1.47) [29] with plugins from the Fiji package [30] because it is a standard image analysis tool that is widely used by biological scientists. We focused on evaluating the processing speed, cell segmentation accuracy, and profile clustering performance of these software packages.

### Applications to Kc167, HT29, and HeLa datasets

To evaluate the cell segmentation performance of *cellXpress*, we used two standard image benchmark datasets, Kc167 and HT29, which represent different cell types and numbers of image frames [31, 32]. The first dataset was collected from a *Drosophila melanogaster* cell line, Kc167. We used the dataset's DNA marker for detecting nuclear regions and its actin marker for detecting cellular regions. This dataset has three image frames (an image frame refers to an imaging position in a well), but we used only one of them for testing cell segmentation speed, to mimic the situation in which computation cannot be parallelized at the image-frame level. Each image has a resolution of 1000 × 1006 pixels, and there are ~200 cells per frame. The second dataset [33] was collected from a human colon cancer cell line, HT29. We used only the dataset's DNA marker for detecting nuclear regions and its actin marker for detecting cellular regions. The dataset was generated in an shRNA screen for finding mitotic gene regulators [33]. It has 56 image frames, and was used to test cell segmentation when computation can be parallelized at the image-frame level. Each image has a resolution of 512 × 512 pixels, and there are ~100 cells per frame. We followed the procedures recommended on the CellProfiler website [31], and used the original images and the provided pipeline without any further image pre-processing.

To evaluate the phenotypic-profiling performance of *cellXpress*, we used an image dataset from a previous high-throughput siRNA screen [34, 35] on HeLa cells stained for DNA, tubulin, and actin markers. The dataset was generated by transfecting HeLa cells with a genome-wide siRNA library for 48 hours, and was used to predict the functions of genes based on their knockdown phenotypes. There are four 670 × 510 pixel image frames per gene knockdown, each of which has around 50 cells. siRNAs for a non-human gene, *Renilla* luciferase (Rluc), were used as negative controls. We selected 32 genes, which can be categorized into four groups representing structural components of actins or microtubules, or the synthesis machineries for RNAs or proteins (Additional file 1). The RNA and protein synthesis genes were selected from genes encoding the subunits of RNA polymerase II and the ribosome, respectively. Microtubule structural components were selected from the α-tubulin, β-tubulin and γ-tubulin families. For structural components of actins, we included three actin isoforms (alpha, beta and gamma) and genes from the spectrin family, which are actin-crosslinking proteins that link the plasma membrane to the actin cytoskeleton [36].

### Evaluation criteria for segmentation accuracy

We used two segmentation accuracy criteria: the boundary and Rand error indices [37]. The boundary error index ({E}_{\mathsf{\text{boundary}}}) measures the average distance between the boundaries of the cellular masks obtained from manual and automated segmentation. Smaller boundary error index values indicate higher automated segmentation accuracy. We define the boundary error index between two sets of boundary pixels (B and {B}^{\prime}) from a manual segmentation mask (M) and an automated segmentation mask ({M}^{\prime}), respectively, to be:

{E}_{\mathsf{\text{boundary}}}\left(M,{M}^{\prime}\right)=\frac{1}{\left|B\right|}{\displaystyle \sum _{b\in B}}\underset{{b}^{\prime}\in {B}^{\prime}}{\text{min}}\left\{\parallel b-{b}^{\prime}{\parallel}^{2}\right\},

where b and {b}^{\prime} are individual pixels within sets B and {B}^{\prime}, respectively; \left|\cdot \right| is the cardinality operator; and \parallel \cdot \parallel is the Euclidean norm.
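
For illustration, the boundary error index can be sketched in a few lines of Python. This is a minimal sketch, not the script used for the evaluation: the function name, the representation of each boundary as a list of (row, column) pixel coordinates, and the `squared` switch are all assumptions. The default follows the formula as printed above, which squares the norm; `squared=False` gives the plain mean distance.

```python
import math


def boundary_error(manual_boundary, auto_boundary, squared=True):
    """Average, over manual boundary pixels b, of the distance to the
    nearest automated boundary pixel b' (squared distance by default)."""
    if not manual_boundary or not auto_boundary:
        raise ValueError("both boundary pixel sets must be non-empty")
    total = 0.0
    for r, c in manual_boundary:
        # Squared Euclidean distance to the closest automated boundary pixel.
        nearest_sq = min((r - r2) ** 2 + (c - c2) ** 2
                         for r2, c2 in auto_boundary)
        total += nearest_sq if squared else math.sqrt(nearest_sq)
    return total / len(manual_boundary)


# Hypothetical toy boundaries, given as (row, column) coordinates.
B = [(0, 0), (0, 1), (0, 2)]
B_prime = [(1, 0), (1, 1), (3, 2)]
print(boundary_error(B, B_prime))
print(boundary_error(B, B_prime, squared=False))
```

Note that the index is asymmetric: it averages over the manual boundary pixels only, so swapping the two arguments generally changes the value.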

We also used the Rand error index [37], which measures the frequency with which the two segmentation masks disagree over whether a pair of pixels belongs to the same or to different segmented cellular regions. Let the set of labelled regions in a manual segmentation mask be L=\left\{{R}_{i}\right\} and the set of labelled regions in an automated segmentation mask be {L}^{\prime}=\left\{{R}_{j}^{\prime}\right\}, where {R}_{i} and {R}_{j}^{\prime} are the *i*-th and *j*-th sets of connected pixels within the respective masks. Furthermore, we denote c as the number of pixel pairs in M that belong to the same set in L and the same set in {L}^{\prime}, and d as the number of pixel pairs in M that belong to different sets in L and different sets in {L}^{\prime}. Then, the Rand error index is:

{E}_{\mathsf{\text{Rand}}}\left(M,{M}^{\prime}\right)=1-\frac{c+d}{\binom{N}{2}},

where *N* is the total number of pixels in the segmentation mask *M*.
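
A minimal Python sketch of the Rand error index is given below for illustration. Instead of enumerating all pixel pairs, it counts agreeing pairs from the sizes of the overlaps between regions, which yields the same c and d as defined above. The function name and the input format (two equal-length flat sequences holding one region label per pixel) are assumptions, as is the treatment of background as one more labelled region.

```python
from collections import Counter
from math import comb


def rand_error(manual_labels, auto_labels):
    """Fraction of pixel pairs on which the two labellings disagree."""
    if len(manual_labels) != len(auto_labels):
        raise ValueError("masks must contain the same number of pixels")
    n = len(manual_labels)
    if n < 2:
        raise ValueError("at least two pixels are needed to form a pair")
    total_pairs = comb(n, 2)

    # Sizes of region overlaps and of the regions in each mask.
    joint = Counter(zip(manual_labels, auto_labels))
    manual_sizes = Counter(manual_labels)
    auto_sizes = Counter(auto_labels)

    # c: pairs placed in the same region by both masks.
    c = sum(comb(size, 2) for size in joint.values())
    same_manual = sum(comb(size, 2) for size in manual_sizes.values())
    same_auto = sum(comb(size, 2) for size in auto_sizes.values())
    # d: pairs placed in different regions by both masks
    # (inclusion-exclusion over the pairs joined by either mask).
    d = total_pairs - same_manual - same_auto + c

    return 1.0 - (c + d) / total_pairs


# Hypothetical six-pixel masks that disagree on one pixel.
manual = [1, 1, 1, 2, 2, 2]
auto = [1, 1, 2, 2, 2, 2]
print(rand_error(manual, auto))
```

Because only co-membership of pixel pairs is compared, the index does not depend on the label values themselves: relabelling the regions in either mask leaves it unchanged.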

### Generation of phenotypic profiles for HeLa dataset

To construct phenotypic profiles for HeLa cells, we first segmented the dataset using *cellXpress*. Actin and tubulin were used as cell markers and DNA as a nuclear marker for the watershed algorithm. Then, we measured the morphology, intensity, intensity ratio, and pixel-level intensity correlation features for actin and tubulin in the whole cell, nuclear and non-nuclear regions; and for DNA in the nuclear region only. In total, we measured 290 features for every cell (Additional file 2). Then, we constructed three different types of phenotypic profiles for the dataset. The first type of profile is based on the arithmetic mean of each feature across all cells that have been treated with a specific siRNA. The second type of profile is based on principal component analysis (PCA) [38]. We kept the number of principal components needed to explain 95% of the variation in our data, and used the score vectors as the phenotypic profiles. The last type of profile is the SVM-based "d-profile" [2] (see **Implementation Section**).

### Evaluation criteria for phenotypic profiling

To evaluate the performance of these three phenotypic profiling methods, we measured the intra-group and inter-group dissimilarities for the four groups of siRNAs (Additional file 1). Other criteria based on centroids or medoids of the groups are not suitable for this dataset, because most of the profiles have highly asymmetrical, non-Gaussian distributions. We computed the cosine dissimilarity between two profiles {g}_{r} and {g}_{s} as:

d\left({g}_{r},{g}_{s}\right)=1-\frac{{g}_{r}^{T}{g}_{s}}{\sqrt{\left({g}_{r}^{T}{g}_{r}\right)\left({g}_{s}^{T}{g}_{s}\right)}},

where {g}^{T} is the vector transpose of g. To determine the average 'compactness' of profiles within a group, we computed the average maximum intra-group dissimilarity score as:

{D}_{\mathsf{\text{intra}}}=\frac{1}{N}{\displaystyle \sum _{j=1}^{N}}\underset{{g}_{r},{g}_{s}\in {G}_{j}}{\text{max}}\left\{d\left({g}_{r},{g}_{s}\right)\right\},

where {G}_{j} is the set of all profiles in the *j*-th group, and N is the total number of groups.
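
As an illustration, the cosine dissimilarity and the average maximum intra-group dissimilarity can be sketched in Python as follows. This is a sketch under stated assumptions rather than the code used in this study: the function names, the use of plain lists of numbers as profiles, and the dictionary mapping a group name to its list of profiles are all hypothetical.

```python
import math
from itertools import combinations


def cosine_dissimilarity(g_r, g_s):
    """One minus the cosine of the angle between two profile vectors."""
    dot = sum(a * b for a, b in zip(g_r, g_s))
    norm = math.sqrt(sum(a * a for a in g_r) * sum(b * b for b in g_s))
    if norm == 0.0:
        raise ValueError("cosine dissimilarity is undefined for a zero vector")
    return 1.0 - dot / norm


def intra_group_dissimilarity(groups):
    """Mean, over groups, of the largest pairwise dissimilarity in each group."""
    if not groups:
        raise ValueError("at least one group is required")
    maxima = []
    for profiles in groups.values():
        # A group with a single profile contributes a maximum of 0.
        maxima.append(max((cosine_dissimilarity(a, b)
                           for a, b in combinations(profiles, 2)),
                          default=0.0))
    return sum(maxima) / len(maxima)


# Hypothetical two-feature profiles for two groups.
toy_groups = {
    "actin": [[1.0, 0.0], [0.0, 1.0]],    # orthogonal profiles: maximum of 1
    "tubulin": [[1.0, 1.0], [2.0, 2.0]],  # parallel profiles: maximum of 0
}
print(intra_group_dissimilarity(toy_groups))
```

Taking the maximum rather than the mean within each group makes the score sensitive to a single outlying profile, which is the sense in which it measures how compact a group is.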

To determine the average inter-group profile dissimilarity, we first sorted all pair-wise dissimilarities between profiles from two different groups, {G}_{j} and {G}_{k}, from the lowest to the highest, where {d}_{1}<{d}_{2}<{d}_{3}<{d}_{4}<\dots, and {d}_{i}=d\left({g}_{r},{g}_{s}\right) for all {g}_{r}\in {G}_{j} and {g}_{s}\in {G}_{k}. For an *n*-nearest neighbours analysis, we denote the set of the *n* lowest distances between two groups, {G}_{j} and {G}_{k}, as {W}_{jk}\left(n\right)=\left\{{d}_{1},{d}_{2},{d}_{3},\dots ,{d}_{n}\right\}. Then, the inter-group profile dissimilarity for the *n*-nearest neighbours is:

{D}_{\mathsf{\text{inter}}}=\frac{2}{N\left(N-1\right)}{\displaystyle \sum _{j=1}^{N}}{\displaystyle \sum _{k>j}}E\left({W}_{jk}\left(n\right)\right),

where *E*() is the mean operator. This evaluation is repeated for different values of *n*.
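
The *n*-nearest neighbours inter-group dissimilarity can be sketched in Python as below, purely for illustration. The function names and the dictionary of group name to list of profiles are hypothetical. The sketch counts each pair of groups once, which is how we read the 2/(N(N-1)) normalisation; that reading is an assumption. If two groups share fewer than *n* profile pairs, all available pairs are used.

```python
import math
from itertools import combinations, product


def cosine_dissimilarity(g_r, g_s):
    """One minus the cosine of the angle between two profile vectors."""
    dot = sum(a * b for a, b in zip(g_r, g_s))
    norm = math.sqrt(sum(a * a for a in g_r) * sum(b * b for b in g_s))
    return 1.0 - dot / norm


def inter_group_dissimilarity(groups, n):
    """Mean, over unordered group pairs, of the mean of the n smallest
    between-group dissimilarities."""
    if len(groups) < 2 or n < 1:
        raise ValueError("need at least two groups and n >= 1")
    pair_means = []
    for G_j, G_k in combinations(groups.values(), 2):
        # All dissimilarities between a profile in G_j and one in G_k,
        # sorted from lowest to highest.
        distances = sorted(cosine_dissimilarity(a, b)
                           for a, b in product(G_j, G_k))
        nearest = distances[:n]  # W_jk(n): the n lowest distances
        pair_means.append(sum(nearest) / len(nearest))
    return sum(pair_means) / len(pair_means)


# Hypothetical two-feature profiles for three groups.
toy_groups = {
    "A": [[1.0, 0.0]],
    "B": [[0.0, 1.0], [1.0, 1.0]],
    "C": [[1.0, 0.0], [1.0, 1.0]],
}
# Repeat the evaluation for several neighbourhood sizes.
for n in (1, 2):
    print(n, inter_group_dissimilarity(toy_groups, n))
```

Small values of *n* emphasise the closest profiles of two groups, that is, how nearly the groups touch, whereas larger values approach the mean of all between-group dissimilarities.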

### Computer software and hardware platforms

The evaluations were performed on a desktop computer with an Intel Core i7 3.07 GHz processor, 8 GB of memory, a 64-bit Windows 7 operating system, and Java version 7 Update 9 (build 1.7.0_09-b05). All image and data files were stored on a local hard drive. For the evaluation of processing speed and segmentation accuracy, we implemented a script in MATLAB version R2007b (MathWorks, USA) to compute and compare both the boundary and Rand error indices. For the evaluation of phenotypic profiling, we generated multidimensional scaling (MDS) plots for all the constructed profiles using the MASS [40] and rgl [41] libraries under the R computing environment (version 2.14.2).