Analytics and visualization tools to characterize single-cell stochasticity using bacterial single-cell movie cytometry data

Background Time-lapse microscopy live-cell imaging is essential for studying the evolution of bacterial communities at single-cell resolution. It allows capturing detailed information about the morphology, gene expression, and spatial characteristics of individual cells at every time instance of the imaging experiment. The image analysis of bacterial "single-cell movies" (videos) generates big data in the form of multidimensional time series of measured bacterial attributes. If properly analyzed, these datasets can help us decipher the bacterial communities' growth dynamics and identify the sources and potential functional role of intra- and inter-subpopulation heterogeneity. Recent research has highlighted the importance of investigating the role of biological "noise" in gene regulation, cell growth, cell division, etc. Single-cell analytics of complex single-cell movie datasets, capturing the interaction of multiple micro-colonies with thousands of cells, can shed light on essential phenomena for human health, such as the competition of pathogens and benign microbiome cells, the emergence of dormant cells (“persisters”), the formation of biofilms under different stress conditions, etc. However, highly accurate and automated bacterial bioimage analysis and single-cell analytics methods remain elusive, even though they are required before we can routinely exploit the plethora of data that single-cell movies generate. Results We present visualization and single-cell analytics using R (ViSCAR), a set of methods and corresponding functions, to visually explore and correlate single-cell attributes generated from the image processing of complex bacterial single-cell movies. They can be used to model and visualize the spatiotemporal evolution of attributes at different levels of the microbial community organization (i.e., cell population, colony, generation, etc.), to discover possible epigenetic information transfer across cell generations, infer mathematical and statistical models describing various stochastic phenomena (e.g., cell growth, cell division), and even identify and auto-correct errors introduced unavoidably during the bioimage analysis of a dense movie with thousands of overcrowded cells in the microscope's field of view. Conclusions ViSCAR empowers researchers to capture and characterize the stochasticity, uncover the mechanisms leading to cellular phenotypes of interest, and decipher a large heterogeneous microbial communities' dynamic behavior. ViSCAR source code is available from GitLab at https://gitlab.com/ManolakosLab/viscar. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04409-9.


ViSCAR visualization functions
In Table S1, we provide an overview of ViSCAR visualization functions. Each function can provide different levels of analysis. The first two columns (LT and DT) denote the type of input tree, i.e., the type of analyzed attribute(s), cell attributes for an LT, and cell life attributes for a DT. The rest of the columns represent the levels of analysis, spatial (in the sense of community organization) and temporal. The ✓ and ✗ symbols indicate whether a visualization capability is available or not. The * symbol indicates that the function produces multiple plot(s). The ~ symbol indicates that a visualization capability can be achieved only indirectly (i.e., not specified by the function's input arguments). Specifically, the user must first specify the subpopulation (using select_subtree function) to select a subtree, multiple subtrees, or just not connected nodes of an LT or DT based on multiple selection criteria) and then proceed to create the chosen plot for visual inspection. For example, if the user wants to create a scatter plot for a specific frame then first they must select the cell instants of that frame (using select_subtree) and then proceed with calling the function plot_dot_attr2 with the appropriate arguments.

Case Study I: Basic workflow and single-cell analytics
In this section we present additional data analysis and visualization operations that users can perform using ViSCAR functions at the cell population and single-cell levels.
In Figure S1, we show the scatter plot of cell length vs. cell area, using color to represent the cell's mean fluorescence intensity. We observe a strong correlation between cell length and cell area, as expected in rod-shaped bacteria. This observation is also confirmed by the very high Pearson correlation coefficient (0.96). In Figure S1b and S1c, we provide the violin plots of the cell birth length and cell division length, respectively. The observed bimodality is vanished in violin plot of birth length. Having computed the cell life attributes, we can detect cells subgroups based on several rules. Given the elongation rate (see Figure tade in main text), we can detect slowly growing cells, such as metabolically inactive cells, see Figure   S2. By using the mean and standard deviation method (< x -2s) while inspecting the corresponding histogram ( Figure S2a) we can detect slowly growing cells, since they are expected to be outliers in terms of their elongation rate. This analysis can reveal persister cells, which are slowly growing cells that live for many frames. However, in this dataset, the only one slowly growing cell detected (Σφάλμα! Το αρχείο προέλευσης της αναφοράς δεν βρέθηκε.) cannot be regarded as a persister, as it is tracked for just a few frames because this simple movie ends prematurely. Furthermore, since single-cell length typically increases exponentially [68,69], cells with abnormal growth curves can be spotted by inspecting the histogram of the RMSE of the exponential models fit to the cell length ( Figure S2c). "Abnormal" cells having RMSE > 0.04 were detected; their length time-series (trajectories) were then visualized and compared to the rest of the population ( Figure S2d). This kind of processing can reveal interesting patterns of phenotypes, such as stairstep growth behavior ( Figure S3a). The large fluctuations observed ( Figure  S3b) may be attributed to image analysis imperfections, such as the cell curvature over/underestimation during cell segmentation or cell tracking errors. with RMSE exponential fit > 0.04.
The plot_pdf_attr function can be used in the "auto" mode to find the distribution that best fits a cell ( In Figure S4a we present the best-fit distribution for the cell elongation rate k.
This distribution is the Normal with μ = 0.744 μm/h and σ = 0.188, BIC = 12.384, and ΔBIC values of 7 and 15 when compared to the Gamma and Lognormal distributions, respectively. The Gamma and Lognormal ( Figure S4b and S4c) distributions were then fitted explicitly to compare them to the best-fit (Normal) distribution. We used a common range for the y-axis of the plots to allow automatic visual comparison. The distribution in Figure   It is worth mentioning that the distribution of the cell growth rate k is not normal (symmetrical) in general [71]. Since the difference between the best-fit (Normal) and the Gamma distribution is relatively small (ΔBIC = 7), there is support for that claim in dataset 1. Tables and Figures of the data analysis for Case Study II.

Single-cell attributes variability
For generations:

Synthetic single-cell movie
The single-cell movie depicts the simultaneous growth of eight (8) S.
Typhimurium bacterial populations, each emanating from a single progenitor cell, and was created using an extended version of the CellModeller [50] software tool. By properly extending this tool, we have developed an Individual-Based Model (IBM) and a simulation that mimics QS molecules' (AI2) [76] diffusion in the microenvironment and shows single-cells that either swim or grow and divide, sensing the QS molecules. As a cell grows, it implements a genetic circuit (described as a system of differential equations) driven by QS that controls the interplay of the TTSS-1 with the virulence pathway [78]. This may eventually lead to a phenotypic switching of a cell from non-virulent to virulent (The development of the IBM is beyond the scope of this work). The key proteins participating in the cell's genetic network are the Lsr operon (LsrOP), LsrR and InvF, for QS and T3SS-1 respectively. The simulation also includes probes docked on the right side of an agar plate, synthesizing the QS AI2 signal. In this in silico experiment, we assume that the AI2 -producing probes are uniformly distributed on the surface, controlling AI2 synthesis, bacteria migration, and eventually phenotypic switching. The motile bacteria (initially lying on the plate's left side, opposite to the probes' area) can express TTSS-1 species but cannot produce AI2. As the motile bacteria sense AI2, they diffuse in the environment and swim towards the maximum gradient of AI 2 out . As they approach the probes' area, they start growing and, at some point, start