Correlation of cell membrane dynamics and cell motility

Background Essential events of cell development and homeostasis are revealed by the associated changes of cell morphology and therefore have been widely used as a key indicator of physiological states and molecular pathways affecting various cellular functions via cytoskeleton. Cell motility is a complex phenomenon primarily driven by the actin network, which plays an important role in shaping the morphology of the cells. Most of the morphology based features are approximated from cell periphery but its dynamics have received none to scant attention. We aim to bridge the gap between membrane dynamics and cell states from the perspective of whole cell movement by identifying cell edge patterns and its correlation with cell dynamics. Results We present a systematic study to extract, classify, and compare cell dynamics in terms of cell motility and edge activity. Cell motility features extracted by fitting a persistent random walk were used to identify the initial set of cell subpopulations. We propose algorithms to extract edge features along the entire cell periphery such as protrusion and retraction velocity. These constitute a unique set of multivariate time-lapse edge features that are then used to profile subclasses of cell dynamics by unsupervised clustering. Conclusions By comparing membrane dynamic patterns exhibited by each subclass of cells, correlated trends of edge and cell movements were identified. Our findings are consistent with published literature and we also identified that motility patterns are influenced by edge features from initial time points compared to later sampling intervals.


Background
Cellular populations exhibit phenotypic heterogeneity across various physiological and pathological processes. The causative factors range from biological noise to complex distinct states of cell functions. Different approaches have been reported to study cellular heterogeneity from different fronts. Morphological responses to perturbations in cellular environments have been characterized by patterns of signaling marker colocalization from high content images [1]. Cellular heterogeneity through FACS (fluorescence activated cell sorting) has been captured to provide a large number of cell read outs, but without any spatial information [2]. Earlier studies have profiled cell subpopulations from fluorescent images by computing dynamic features of the cells along with static features by using unsupervised clustering [3]. Cellular morphology is a highly dynamic entity and time-lapse high-content imaging of cells provides an unprecedented opportunity to understand the mechanisms of morphodynamics. Morphodynamics is defined as a correlation of cell morphology and the underlying functional activity with respect to time [4]. This concept has enabled the discovery of functionality of specific biomolecules and demanded new techniques for interpretability, accuracy, and speed. Extensive research has been performed in understanding and application of morphodynamics of cell edges. High throughput analysis of cell morphodynamics has been used to discover functions of specific proteins [5]. A series of studies using quantitative fluorescent speckle microscopy have revealed the power of computer assisted high throughput analysis of time-lapse microscopy images: an analysis of the number of speckles suggested distinct regulation of actin polymerizationdepolymerization dynamics in different intracellular regions [6,7]. The ratio of protrusive to inactive cell perimeter has been used as the measure of cell edge activity [8]. Difference of the cell membrane boundary was reported in the study of cell spread dynamics [9] and its role in actin transport for protruding lamellipodia [10], formation of filopodia downstream of SCAR (Suppressor of cAMP receptor) [11], and the role of cofilin as a promoter of actin polymerization leading to protrusion [12]. Alternatively protrusion rates are measured at multiple locations of the cell boundary. The morphological changes have been studied by placing markers in the cell boundary at regular intervals and tracking their displacement in orthogonal directions to the cell boundary [13]. Instead of direct displacement of tracking, cell boundaries can be analyzed with kymographs [14]. This technique involves high resolution time-lapse microscopy to capture subcellular motion which is widely used for relatively small sample sizes due to highly magnified imaging and for relatively short periods of time. However, these approaches are not suitable for high throughput applications due to computational complexity compounded by elaborate cell shapes and its ever changing dynamics.
In this work, we propose novel morphodynamics concepts to quantify the relationship between whole cell movement and edge dynamics. Whole cell movement as a function of space and time and its possible influence on protrusion retraction dynamics have not been studied in detail. Heterogeneous populations exhibiting characteristic protrusion and retraction patterns have been completely exploited by us in order to identify possible correlations with motility features. Such information is helpful in determining overall motility functions of cells in collective migration. Cell membrane movements are extracted and protrusion/retraction dynamics along the cell edges at different time points were obtained to correlate with whole cell motility features. An approach to extract such patterns from heterogeneous cell populations is presented. Our experiments show that the cells with similar kinetic profiles display different edge movements and that features observed in initial time points have profound influence in determining the type of motility patterns as the cell adapts to its motion.

Results and discussion
Dataset Cells used in this experiment were mouse macrophage cell lines IC-21 (American Type Culture Collection (ATCC) TIB-186) treated with solvent DMSO (Dimethyl sulphoxide). Cells were observed over a period of 120 minutes and 12-bit images with 0.5 µm 2 pixels were collected using Cellomics KineticScan at every 10 minutes giving a total of 12 snapshots. Data and statistical analysis were implemented in MATLAB R2008a (The Mathworks, Inc., USA) and R project [15].

Cell identification and tracking
Cells are bright objects protruding from a relatively uniform dark background in microscopic images. The purpose of segmentation is to identify cells accurately in an automated manner. Segmentation algorithms cluster image pixels based on their features into two groups representing objects of interest and background. Simple methods like thresholding do not work because they are not robust to noise and artifacts of images as well as images with overlapping cells [16]. Methods such as region growing, watershed, clustering and active contours have been attempted on cellular images [17][18][19]. However, these methods fail on images composed of overlapping or clustered cells. Cell segmentation is crucial to this work since tracking and subsequent analyses depend on the segmentation results. In our analyses, active contour without edges was used since it is not dependent on initialization, noise and boundary leakage by using intensity gradients [20,21]. The energy functional for regularization term is controlled by the length terms only and it was set according to the resolution of fluorescence intensity. The two phase level-set method is able to identify cells with maximum shape information since it handles sharp corners and cusps of the objects well. Thus, the original shapes of cells are retained yielding accurate features. Since dynamics of cell is dependent on geometric centroid, cell shape has to be accurately segmented. We subjectively evaluated segmentation results from different methods and confirmed that slight changes in the methods could dislodge the cell boundary by several pixels but did not affect the global boundary movement. Since we used run length of the boundary, minor boundary displacement did not affect the overall downstream analysis.
The spatiotemporal tracking method does not assume overlapping of cell boundaries between adjacent frames. It is able to handle dividing cells by using a set of heuristics. Four different scenarios are encountered during matching: (i) a cell in the current frame could match a cell in the proceeding frame (a successful match), (ii) no matching for cells in both frames (cells moving out of focus), (iii) one cell in current frame matches with more than one cell in the proceeding frame (possible cell division), and finally (iv) more than one cell in the current frame matches with only one cell in the proceeding frame (over segmentation). For differentiating case (iii) and case (iv), matching between second and third frame are checked to see whether a cell in multiple matches has its own unique characteristics. If a cell matches its counterparts in second and third frame, then we conclude that this cell has divided in the middle frame. If it has only one match in the third frame, then we conclude that this cell might have been over-segmented in the second frame. We used the same settings for weights as suggested by authors [22].

Classification of cell features
The classification of cells is done in two steps. First, the numbers of clusters are found by modeling features by using a Gaussian mixture model; second, unsupervised Kmeans clustering was used with the number of clusters, obtained with GMM model. Since underlying structure of distribution of cell features are unknown, unsupervised approaches are more suitable. However, K-means clustering requires to know the number of clusters a priori. In order to implement Gaussian mixture models, cell features have to be normally distributed. We used probability plots and chi-square goodness-of-fit to test for Gaussianity of features. Probability plot is a graphical method for determining whether sample data conforms to a hypothesized distribution decided upon visual examination. The data is sorted and plotted against the midpoint in the jump of the empirical cumulative distribution function (CDF) on Y-axis. The CDF F(z), describes the probability that a random variable z with a given probability distribution takes on a value less than or equal to a specific value. The midpoint is given by (j -0.5)/N for jth sorted value from a sample size of N. This plot also includes a reference line joining the first and third quartile and extrapolated out to the ends, which is useful for judging whether data follows a normal distribution. A departure from normality is indicated by presence of points away from the reference line ( Fig. 1). All features except one, the total path length conformed to normality test. Since the cluster membership did not change significantly by removing total path length, this feature was included to derive the cell classes. A chisquare goodness-of-fit test also showed that Gaussian mixture modeling is appropriate to represent heterogeneous cell populations ( Table 1).
The GMM model was computed for every possible number of subpopulations in the dataset (K = 2, 3, … 100). To eliminate the influence of convergence failures, each run was attempted up to 5 times with new initial conditions until convergence was reached. MDL criteria were used since it can lead to a consistent estimator even for large values of observations. For each value of K, MDL was calculated after convergence in the EM step. The optimal value of K corresponds to minimum MDL. In our dataset, this method identified four distinct subclasses from dynamic features. These classes were termed as (i) Class 1, (ii) Class 2, (iii) Class 3, and (iv)  Table 2.

Classification of edge features
The cell images sampled at 12 different time points provided a vector of values of protraction and retraction velocities respectively. This vector constitutes to an edge print of a cell, characterizing the membrane movement of the cell over time. For dataset with 12 time points, the features are computed using the adjacent frames. Finally we get a feature set of 11 protrusion features and 11 retraction features and thus 22 features in total. This set of measurements provides novel dynamic features to capture individual cell movements and membrane (edge) dynamics. This measurement does not necessarily inform about cell migration, since membrane retraction and protrusion without translocation can lead to high values. Reference sets for each cell class were estimated by K-means clustering. The initial centroids for K-means were obtained by performing the clustering phase on a random 10% sample of the data. Since the choice of initial cluster centroids is important, only 10% of randomly sampled data was used for K-means clustering. The centroids obtained from the subsamples (first phase) was used as seeds in the clusters for the second phase which used all the data. This procedure overcomes the problem of initialization in K-means clustering. About 1000 iterations were used each time to get the cluster centroids and members. K-means identified different number of sub-clusters in each of the cell classes ( Fig. 2a -2h).

Correlation of cell and edge features
To evaluate correlation between cell and edge features, Spearman's rank correlation (r) and multiple correlation analysis (R 2 ) were used on averaged dynamic and edge features over time. The Spearman rank correlation is a non-parametric measure of statistical dependence between two features using the ranks of features and is less sensitive to outliers. For this analysis, MATLAB function 'corrcoef' with type 'Spearman' was used. Correlation coefficient was computed for every pair of motility and edge features and the results were reported for statistically significant correlations at p < 0.05. The pvalues were computed by transforming the correlation to create a t-statistic ( t s N = − − r r 2 1 2 ; where r = correlation coefficient, N = number of samples) having N -2 degrees of freedom and under the assumption that features are normally distributed. Rank correlations indicated that both motility and edge features varied in the degree of their correlation among clusters of cell dynamics (Fig. 3). The correlation plots in Figs. 3a -3d show that the level of correlation varies among different classes. Multiple correlation measures the goodness-offit in linear regression; the 'speed' was the dependent variable and all other features (motility and edge features) were the predictors in regression analysis. This analysis showed strong positive correlation for all the features (R 2 = 0.97). In order to account for bias due to outliers in the regression analysis, we also performed Figure 1 Tests of normality of features: every cross in the plot corresponds to midpoint in the jump of empirical cumulative distribution function on Y axis to sorted data in X axis (number of cells=5415).
• Class 1: This class consists of cells with low speed and persistence. The pattern shows that active membrane ruffling may not translate into active cell movement. It might have even restricted the cells overall movement which is evident from the low total displacement feature. For example, NRK49F cells with defect in rho or adducin have been shown to have active lamellopodial ruffling, while being unable to migrate [23] (Fig. 2a and 2b).
• Class 2: Cells with medium speed and persistence showing positive correlation for protrusion and retraction. Similar protrusion and high retraction activity may be the reason for multiple peaks of edge features over the length of time ( Fig. 2c and 2d).
• Class 3: This class is represented by fast moving cells displaying high speed and persistence and is positively correlated with edge movement features. These cells also had the highest edge activity which may help in moving the cell over long distances with high persistence. When the static features of these cells were analyzed they had typical fan shaped morphology ( Fig. 2e and 2f).
• Class 4: These cells frequently change directionality as indicated by low persistence. Edge features are also positively correlated to dynamic features and this suggests that the frequent change in direction may be accompanied by a respective change in edge movements. Although the cells change direction, they travel within a limited radius more like in spiral motion. This can be seen from the low total displacement and mean path length compared to class 3. Even though, the cell speed is greater than Class 3, the cells do not travel in a constant direction (as indicated by low persistence) and tend to display a spiral or circular concentric motion (Fig. 2g and 2h).
In order to determine which features contributed to the diversity of correlation patterns, or rather influenced the type of motility pattern adapted by any cell, factor analysis was performed on all four sub-clusters. This method has been proven efficient in describing cell shape dynamics in cancer cells [24]. This method postulates the existence of a small number of latent factors which explains the systematic contribution of the original features. The number of factors that should be retained is suggested by the Kaiser criterion (factors with Eigenvalues more than or equal to one should be retained) [25]. For class 1 and class 2, six factors were retained which accounted for 91.6% and 90.1% of the variance respectively. For class 3 and class 4, seven factors were retained and they accounted for 88.2% and 89.0% of the variance respectively (Table 4). Factor 1 indicated the presence of high number of edge features. In particular, protrusion and retraction features extracted from initial six time points (Table 5 ). Factor 2 had predominantly cell dynamics features. The remaining factors contained edge features sampled from middle to end time points. These findings conclude that the motility patterns are decided largely by cell membrane features observed in the initial time points.

Conclusion
Non-genetic heterogeneity in cell populations arises from a combination of intrinsic and extrinsic factors. This heterogeneity has been measured for gene transcription, phosphorylation, cell morphology, drug perturbations, and used to explain various aspects of cellular physiology. Our understanding of individual players in cell migration process is increasing; but there remains a vital gap to be filled concerning how they are coordinated spatially and temporally. New techniques are needed which can quantify dynamic cell movements at the level of single cell resolution in an automated manner.
Here, we report multivariate analysis of different sets of motility features through a meaningful combination of both novel (edge) and existing (centroid based) dynamic features. The first set of measurements has been already proved to improve subpopulation analysis. The second set of features is a novel measurement of edge activity. These features capture pixel movement, either through protrusion or retraction frame by frame over the entire length of observation. Since these measurements are temporally sampled, it is suitable to study cell activity over time. These  features are unique and not necessarily a measurement of cell migration, as membrane protrusion-retraction is possible without translocation. Our data indicate different levels of correlation between sets of features, depending on the dynamic classes they belong to. This type of relationship was expected for this cell line due to its highly motile nature. Our findings compare well with previous literature [23]. Veronika et al. BMC Bioinformatics 2011, 12(Suppl 13):S19 http://www.biomedcentral.com/1471-2105/12/S13/S19 The introduction of edge features is the major contribution of this work since it captures edge activity of large number of cells from high throughput imaging platforms in a way that no other profiling methods we are aware of have previously demonstrated. Our profiling method was able to provide additional insights which might have been missed using population based cell migration techniques or classical motility assays. To conclude, we have identified heterogeneous edge patterns of related dynamic profiles and validated our correlation patterns by comparing with previous publications. The dynamic profiles were obtained from cell displacement data by GMM clustering. Edge prints from these subclasses were further used to characterize heterogeneity arising due to different edge movements. The patterns arising from statistical correlation analysis were validated by comparing with previous publications. We also provided statistical evidence that initial time point edge features influence the motility patterns that a cell adapts.

Segmentation and tracking of cells
Level-set was used to segment cells from images, independently at all the time frames [20]. The image gradient was used to stop the evolution of level-sets.
Touching cells were further separated by a markercontrolled watershed that uses initially segmented cells as shape markers for marking function [26]. The segmented cells in adjacent frames were correspondingly matched by spatiotemporal matching scheme that uses features like size, intensity, and spatial coordinates for matching [22]. The tracks of cells were subsequently corrected for mismatches and only those cells moving for the entire period of observation were included for further analysis.

Dynamic feature extraction
Dynamic features of cells are classified into two categories based on motility modes: features describing whole cell dynamics and features representing membrane (edge) dynamics. Two different methods were employed to extract the two sets of features.

Cell dynamics
A persistent random walk model was used to study directional migration of cells, in which the geometric centroid of a cell forms the basis for modeling cell motility [27]. A total of eight cell dynamics features were extracted: speed, persistence, chemotactic index (CI), total path length (TPL), total displacement (TD), random motility coefficient (RMC), mean path length (MPL) and persistence length (PrL) [3]. The set of subpopulations obtained from these features represent cell classes. The overview of the analysis is illustrated in the flowchart of Fig. 4 and Algorithm 1 summarises the different steps in the analysis.

Algorithm 1 Statistical analysis of features
Step 1: Dete ermine the number of clusters by GMM modeling Step 2: Perfo orm K-means clustering to find subclasses Step     ML estimates of parameters are obtained by using Expectation Maximization (EM) algorithm. In order to find the optimal number of classes, a minimum description length (MDL) estimator was employed [28]. MDL is an information theoretic model selection principle presumed as the most compact representation of data in the probabilistic network. MDL estimator finds the model orderK by the following criteria: . The penalty term in MDL includes the total number of features to avoid over-fitting of the model. where j m,t denotes the location of the mth marker of the boundary j t at time t. The protrusion p t and retraction r t features at each time point t are then computed and features are extracted thereof. Fig. 5 illustrates the steps involved in evaluating edge features.
Cells are classified by a set of protrusion and retraction features measured over all the time points. These features provide an idea about the activity level of a cell at respective time instances and are used to cluster the cells. Clustering was performed using K-means algorithm.