Volume 12 Supplement 13
Cell cycle phase classification in 3D in vivo microscopy of Drosophila embryogenesis
© Du et al; licensee BioMed Central Ltd. 2011
Published: 30 November 2011
Cell divisions play critical roles in disease and development. The analysis of cell division phenotypes in high content image-based screening and time-lapse microscopy relies on automated nuclear segmentation and classification of cell cycle phases. Automated identification of the cell cycle phase helps biologists quantify the effect of genetic perturbations and drug treatments. Most existing studies have dealt with 2D images of cultured cells. Few, if any, studies have addressed the problem of cell cycle classification in 3D image stacks of intact tissues.
We developed a workflow for the automated cell cycle phase classification in 3D time-series image datasets of live Drosophila embryos expressing the chromatin marker histone-GFP. After image acquisition by confocal laser scanning microscopy and 3D nuclear segmentation, we extracted 3D intensity, shape and texture features from interphase nuclei and mitotic chromosomes. We trained different classifiers, including support vector machines (SVM) and neural networks, to distinguish between five cell cycle phases (interphase and four mitotic phases) and achieved over 90% accuracy. As the different phases occur at different frequencies (58% of samples correspond to interphase), we devised a strategy to improve the identification of under-represented classes. To investigate which features are required for accurate classification, we performed feature reduction and selection. We were able to reduce the feature set from 42 to 9 features without affecting classifier performance. We observed a dramatic decrease in classification performance when the training and testing samples were derived from two different developmental stages: the nuclear divisions of the syncytial blastoderm and the cell divisions during gastrulation. Combining samples from both developmental stages produced a more robust and accurate classifier.
Our study demonstrates that automated cell cycle phase classification can be applied not only to 2D images of cultured cells but also to 3D images of live tissues. We could reduce the initial 3D feature set from 42 to 9 features without compromising performance. Robust classifiers for intact animals need to be trained with samples from different developmental stages and cell types. Cell cycle classification in live animals can be used for automated phenotyping and to improve the performance of automated cell tracking.
Cell divisions and their regulation play important roles in disease and development. The cell cycle can be divided into two main periods: interphase and mitosis. During interphase, the cell grows, duplicates its DNA and accumulates the nutrients and gene products required for mitosis. During mitosis, the cell divides, distributing its genomic DNA between the two daughter cells. Mitosis can be further subdivided into several distinct phases: prophase, metaphase, anaphase and telophase. These phases can be identified by their appearance in high-resolution microscopy images. Figure 1 shows examples of the typical appearance of the chromatin marker histone-GFP in different cell cycle phases. Automated cell cycle phase classification is an essential step in high-throughput image analysis of large cell populations, because it enables quantification of cell cycle progression, which is important for developmental biology, cancer research and drug discovery. For instance, measuring the duration of individual cell cycle phases under different genetic and drug treatment conditions can improve the understanding of biological mechanisms in oncological diseases and enhance the effectiveness of drug discovery and development . Cell cycle phase classification is also crucial for high-throughput image-based screens, such as the Mitocheck project, which aims to identify and characterize genes involved in cell division . Several bioimaging research groups have addressed this challenging problem [3–7]. Most studies involved 2D images [1, 3–6]. One study dealt with 3D images, but extracted cellular features from the most informative single slice . Dynamic features have been widely used for cell phase classification [1, 5–7]; however, as mentioned by , tracking algorithms become less reliable, and context information becomes less informative, when cells are densely packed and/or move rapidly.
In recent years, confocal laser scanning microscopy (CLSM) has become a common imaging modality for visualizing fluorescently labelled cells in 3D. Compared to conventional 2D microscopy, the extra dimension promises to enhance the understanding of biomolecular mechanisms. Another application of automated cell cycle phase identification is the improvement of cell tracking in time-lapse images. In live tissues, cells can move large distances, and significant displacements within short periods (e.g. one minute) are especially pronounced during mitosis in Drosophila embryos. Since cell cycle phases occur in a fixed order, tracking can be improved by using this prior biological knowledge. For the phase labels to serve as an independent input to tracking, however, the classifier itself must not depend on tracking. Therefore, it is essential to develop a cell phase classification algorithm that utilizes 3D image information and does not rely on dynamic features extracted by cell tracking. In this article, we present an automated cell cycle phase classification algorithm for 3D images of live Drosophila embryos.
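As an illustration of the fixed-order prior mentioned above, the following Python sketch checks whether two phase labels assigned to the same nucleus in consecutive frames form a permitted transition. The names `PHASES` and `is_allowed_transition` are hypothetical, and the rule that a nucleus advances by at most one phase per frame interval is an assumption rather than a finding of this study; at intervals of roughly one minute, a short phase could conceivably be skipped.

```python
# Canonical order of the five phases; telophase is followed by interphase.
PHASES = ("interphase", "prophase", "metaphase", "anaphase", "telophase")


def is_allowed_transition(prev_phase, next_phase):
    """Return True if next_phase may follow prev_phase between two frames.

    Hypothetical rule: a nucleus either stays in its phase or advances by
    exactly one step, with telophase wrapping around to interphase.
    """
    i = PHASES.index(prev_phase)
    j = PHASES.index(next_phase)
    return j == i or j == (i + 1) % len(PHASES)
```

A tracker could use such a check to penalize or reject candidate links whose labels violate the phase order. Note that at the telophase-to-interphase step one track splits into two daughter tracks, which a real linker would have to handle separately.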
Image stacks of Drosophila embryos were recorded at 55–60 s intervals using a Zeiss 5 Live inverted confocal laser scanning microscope. Each stack consisted of 66–70 slices of 1024 × 1024 pixels, with voxel dimensions in x/y/z of 0.1 × 0.1 × 0.44 µm.
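For orientation, the acquisition parameters above imply the following approximate physical dimensions (simple arithmetic; the z extent is approximate because it depends on how the slice spacing is counted):

$$1024 \times 0.1\,\mu\mathrm{m} = 102.4\,\mu\mathrm{m}\ (x, y), \qquad (66 \text{ to } 70) \times 0.44\,\mu\mathrm{m} \approx 29.0 \text{ to } 30.8\,\mu\mathrm{m}\ (z),$$

$$\frac{s_z}{s_x} = \frac{0.44}{0.1} = 4.4, \qquad s_x s_y s_z = 0.1 \times 0.1 \times 0.44 = 0.0044\,\mu\mathrm{m}^3.$$

The 4.4-fold coarser sampling along z is what the isotropic interpolation used before surface area estimation compensates for.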
Image processing, segmentation and creation of labelled datasets
The image stacks were first deconvolved using Huygens Professional  to enhance image quality. Then, interphase nuclei and mitotic chromosomes were segmented using a multi-level-set 3D segmentation algorithm . Samples of nuclei were obtained from movies of two embryos. The first embryo was recorded during the syncytial blastoderm stage and yielded 4606 samples representing the five phases of the nuclear division cycle (interphase, prophase, metaphase, anaphase, telophase). The second 3D time-series dataset was acquired after cellularization, during the cell divisions of the gastrulation stage, and yielded 3119 samples. For each sample, we calculated a set of 42 3D features (see below) and assigned one of the five cell cycle phase labels.
3D feature calculation
Humans recognize objects by their geometric and photometric characteristics. To mimic human vision, a set of 42 3D shape, texture and intensity features was carefully designed and extracted.
The volume $V$ equals the number of voxels inside the object, $n$, times the volume of a single voxel: $V = n \, s_x s_y s_z$, where $s_x$, $s_y$ and $s_z$ are the voxel dimensions.
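A minimal Python/NumPy sketch of the volume formula, assuming the segmented object is a boolean mask indexed as (z, y, x) and using the voxel dimensions reported for these stacks (0.44 µm in z, 0.1 µm in y and x) as the default spacing. The function name `object_volume` is hypothetical.

```python
import numpy as np


def object_volume(mask, spacing=(0.44, 0.1, 0.1)):
    """Return V = n * s_z * s_y * s_x for a boolean (z, y, x) mask.

    `spacing` is given in array axis order, in microns, so the result is
    in cubic microns.
    """
    n = int(np.count_nonzero(mask))        # number of object voxels
    return n * float(np.prod(spacing))     # times the volume of one voxel
```

For example, a 10 × 10 × 10 voxel cube gives 1000 × 0.0044 = 4.4 µm³ with the default spacing.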
The surface area $A$ is calculated using a voxel-based surface area estimation method . Prior to surface area calculation, the segmented image stacks were interpolated to isotropic voxels using shape-based interpolation .
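The cited voxel-based estimator is more elaborate than can be shown here. As a simpler stand-in, explicitly not the method used in this study, the Python/NumPy sketch below counts exposed voxel faces and weights each by its physical area, which also handles anisotropic voxels without interpolation. Face counting systematically overestimates the area of curved surfaces, so its values are not directly comparable with those of the cited method. The function name is hypothetical.

```python
import numpy as np


def face_count_surface_area(mask, spacing=(1.0, 1.0, 1.0)):
    """Approximate surface area by counting exposed voxel faces.

    `mask` is a boolean (z, y, x) array and `spacing` = (s_z, s_y, s_x).
    """
    m = np.pad(np.asarray(mask, dtype=bool), 1)   # pad so border faces count
    sz, sy, sx = spacing
    face_area = (sy * sx, sz * sx, sz * sy)       # face normal to axis 0, 1, 2
    area = 0.0
    for axis in range(3):
        # every object/background transition along this axis is one face
        n_faces = np.count_nonzero(np.diff(m.astype(np.int8), axis=axis))
        area += n_faces * face_area[axis]
    return area
```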
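Humans tend to identify nuclei by their round or spherical shape. Sphericity $\psi$ is defined as $\psi = \pi^{1/3}(6V)^{2/3}/A$, which equals 1 for a perfect sphere and is smaller for any other shape.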
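The sphericity formula translates directly into Python, taking $V$ and $A$ from whichever volume and surface area estimates are available. For a digitized object, $\psi$ inherits the bias of the surface area estimate, so a voxelized sphere will generally not score exactly 1. The function name is hypothetical.

```python
import math


def sphericity(volume, area):
    """psi = pi^(1/3) * (6V)^(2/3) / A; equals 1 for an ideal sphere."""
    return math.pi ** (1.0 / 3.0) * (6.0 * volume) ** (2.0 / 3.0) / area
```

For an ideal unit cube ($V = 1$, $A = 6$) this gives $\psi \approx 0.806$.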
The eccentricity features $E_1$ and $E_2$ are defined as the ratios of the square roots of the third and second eigenvalues, respectively, to the square root of the first eigenvalue: $E_1 = \sqrt{\lambda_3}/\sqrt{\lambda_1}$ and $E_2 = \sqrt{\lambda_2}/\sqrt{\lambda_1}$. The inverse square root of each eigenvalue is the corresponding equatorial radius of an ellipsoid fitted to the 3D object.
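The text does not state which matrix the eigenvalues come from. The NumPy sketch below assumes the covariance matrix of the object's (physically scaled) voxel coordinates, with eigenvalues sorted so that $\lambda_1 \ge \lambda_2 \ge \lambda_3$. Under that assumption, $E_1$ compares the shortest to the longest axis and $E_2$ the intermediate to the longest axis, both in $(0, 1]$. If the eigenvalues instead belong to the ellipsoid's quadratic-form matrix (whose inverse square roots are the radii) and are sorted the same way, $E_1$ is unchanged but $E_2$ differs. The function name is hypothetical.

```python
import numpy as np


def eccentricity_features(mask, spacing=(1.0, 1.0, 1.0)):
    """Return (E1, E2) from the covariance of voxel coordinates.

    Assumption: lambda_1 >= lambda_2 >= lambda_3 are eigenvalues of the
    3x3 covariance matrix of the object's voxel coordinates in (z, y, x).
    """
    coords = np.argwhere(mask) * np.asarray(spacing, dtype=float)
    cov = np.cov(coords, rowvar=False)               # 3x3 covariance matrix
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]     # descending eigenvalues
    e1 = np.sqrt(lam[2] / lam[0])                    # shortest vs longest axis
    e2 = np.sqrt(lam[1] / lam[0])                    # middle vs longest axis
    return float(e1), float(e2)
```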
Mean and standard deviation of distance from surface to centroid
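The voxels on the object surface are denoted as $(p_1, \dots, p_i, \dots, p_m)$, and their distances to the object centroid as $(d_1, \dots, d_i, \dots, d_m)$. The mean and standard deviation of the surface-to-centroid distances are defined as $\bar{d} = \frac{1}{m}\sum_{i=1}^{m} d_i$ and $\sigma_d = \sqrt{\frac{1}{m}\sum_{i=1}^{m} (d_i - \bar{d})^2}$.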
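A NumPy sketch of the surface-to-centroid statistics. Two choices here are assumptions: surface voxels are taken to be object voxels with at least one 6-connected background neighbour, and the standard deviation uses the population (1/m) normalization. The centroid is computed over all object voxels. Function names are hypothetical.

```python
import numpy as np


def surface_voxels(mask):
    """Object voxels with at least one 6-connected background neighbour."""
    m = np.pad(np.asarray(mask, dtype=bool), 1)   # background border
    interior = m.copy()
    for axis in range(3):
        for shift in (1, -1):
            # a voxel stays interior only if this neighbour is also object
            interior &= np.roll(m, shift, axis=axis)
    return (m & ~interior)[1:-1, 1:-1, 1:-1]


def surface_centroid_distance_stats(mask, spacing=(1.0, 1.0, 1.0)):
    """Mean and (population) standard deviation of surface-to-centroid distances."""
    s = np.asarray(spacing, dtype=float)
    centroid = np.argwhere(mask).mean(axis=0) * s
    surface = np.argwhere(surface_voxels(mask)) * s
    d = np.linalg.norm(surface - centroid, axis=1)
    return float(d.mean()), float(d.std())
```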
Mean and standard deviation of intensity
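Let the voxel intensities of a 3D object be denoted as $(I_1, \dots, I_i, \dots, I_n)$. The mean and standard deviation of intensity are defined as $\bar{I} = \frac{1}{n}\sum_{i=1}^{n} I_i$ and $\sigma_I = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (I_i - \bar{I})^2}$.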
3D texture features
Texture was described using Haralick texture features, which are based on the 2D grey-level co-occurrence matrix (GLCM) [12–14]. To calculate 3D texture features, the grey-tone spatial dependence matrices $P_k(i,j)$ ($k = 1, \dots, 13$; $i, j = 1, \dots, N_g$) are calculated in 13 directions instead of the 4 used in 2D, where $N_g$ denotes the number of grey levels (256 in our case). The 13 directions arise because a voxel has 26 neighbours in 3D, and opposite neighbours define the same direction for a symmetric co-occurrence matrix, just as the 8 neighbours of a pixel yield 4 directions in 2D. Displacement values of 1, 2, 4 and 8 were tested, and all gave similar classification results. To reduce computational cost and feature space dimensionality, we used a displacement of 1 only.
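The following Python/NumPy sketch enumerates the 13 directions and builds one symmetric, normalized co-occurrence matrix for a displacement of 1. It assumes the intensities have already been quantized to integers in [0, levels) and that only voxel pairs lying entirely inside the object mask are counted; how quantization and object boundaries were handled in the study is not specified here. All names are hypothetical.

```python
import itertools
import numpy as np


def directions_3d():
    """The 13 unique neighbour offsets (dz, dy, dx) at displacement 1.

    Of the 26 neighbours, d and -d give the same symmetric matrix, so only
    the lexicographically positive member of each pair is kept.
    """
    return [d for d in itertools.product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]


def _pair_slices(d, n):
    """Slices selecting positions a and b = a + d along an axis of length n."""
    if d >= 0:
        return slice(0, n - d), slice(d, n)
    return slice(-d, n), slice(0, n + d)


def glcm_3d(volume, mask, offset, levels=256):
    """Symmetric, normalized co-occurrence matrix P_k for one 3D offset."""
    vol = np.asarray(volume).astype(np.intp)
    msk = np.asarray(mask, dtype=bool)
    (az, bz), (ay, by), (ax, bx) = (
        _pair_slices(d, n) for d, n in zip(offset, vol.shape)
    )
    a, b = vol[az, ay, ax], vol[bz, by, bx]          # voxel and its neighbour
    valid = msk[az, ay, ax] & msk[bz, by, bx]        # both inside the object
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (a[valid], b[valid]), 1.0)     # accumulate pair counts
    counts = counts + counts.T                       # count (i, j) and (j, i)
    total = counts.sum()
    return counts / total if total > 0 else counts
```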
The following texture features were used in this study:
where $\mu_x$, $\mu_y$, $\sigma_x$ and $\sigma_y$ are the means and standard deviations of the marginal distributions $p_x$ and $p_y$.
$f_{13}$ = variance of $p_{x-y}$
For a given 3D object, we thus obtain 13 direction-specific grey-tone spatial dependence matrices and hence a set of 13 values for each of the texture features listed above. The mean and standard deviation of these 13 values served as the 3D texture features.
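Given the 13 matrices, the per-object texture vector consists of the mean and standard deviation of each Haralick feature across directions. The NumPy sketch below computes three features with their standard Haralick definitions (angular second moment, contrast, and the variance of $p_{x-y}$, the distribution of absolute grey-level differences) and then aggregates them. The study used a longer feature list, and the function names are hypothetical.

```python
import numpy as np


def haralick_subset(P):
    """Three Haralick features from one normalized GLCM P (indices 0..Ng-1)."""
    ng = P.shape[0]
    i, j = np.indices((ng, ng))
    asm = float(np.sum(P ** 2))                    # angular second moment
    contrast = float(np.sum((i - j) ** 2 * P))     # contrast
    # p_{x-y}(k): total probability of pairs with |i - j| = k
    p_diff = np.bincount(np.abs(i - j).ravel(), weights=P.ravel(), minlength=ng)
    k = np.arange(ng)
    mu = np.sum(k * p_diff)
    diff_var = float(np.sum((k - mu) ** 2 * p_diff))  # variance of p_{x-y}
    return np.array([asm, contrast, diff_var])


def texture_features_3d(glcms):
    """Mean and standard deviation of each feature over the 13 directions."""
    values = np.stack([haralick_subset(P) for P in glcms])  # (13, n_features)
    return np.concatenate([values.mean(axis=0), values.std(axis=0)])
```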
Deviation between intensity-weighted and geometrical centroids
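The geometric centroid of a 3D object is defined as $(\bar{x}, \bar{y}, \bar{z}) = \frac{1}{n}\sum_{i=1}^{n} (x_i, y_i, z_i)$, and the intensity-weighted centroid as $(\bar{x}_w, \bar{y}_w, \bar{z}_w) = \sum_{i=1}^{n} I_i (x_i, y_i, z_i) \big/ \sum_{i=1}^{n} I_i$, where $(x_i, y_i, z_i)$ and $I_i$ are the coordinates and intensity of the $i$-th object voxel. The deviation between the intensity-weighted and geometric centroids, $(d_x, d_y, d_z) = (\bar{x}_w - \bar{x}, \bar{y}_w - \bar{y}, \bar{z}_w - \bar{z})$, describes the intensity distribution within a 3D object. The motivation for this feature is to capture asymmetric intensity distributions, such as condensed heterochromatin at one end of an interphase nucleus.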
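A NumPy sketch of the centroid deviation, with coordinates in (z, y, x) array order and an optional physical voxel spacing. The sign convention (weighted minus geometric) and the function name are assumptions.

```python
import numpy as np


def centroid_deviation(image, mask, spacing=(1.0, 1.0, 1.0)):
    """Intensity-weighted centroid minus geometric centroid, as (dz, dy, dx)."""
    m = np.asarray(mask, dtype=bool)
    coords = np.argwhere(m) * np.asarray(spacing, dtype=float)  # (n, 3)
    w = np.asarray(image, dtype=float)[m]   # I_i, same C order as argwhere
    geometric = coords.mean(axis=0)
    weighted = (coords * w[:, None]).sum(axis=0) / w.sum()
    return weighted - geometric
```

A uniformly bright object yields a zero vector, while a bright spot at one end pulls the weighted centroid, and hence the deviation, towards that end.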
Feature reduction and classification
Visualization and validation of classification outputs
We created two datasets of nuclei detected in 3D images of early Drosophila embryos labelled with the live reporter histone-GFP, which visualizes progression through the phases of the division cycle (Figure 1). The first dataset contained 4606 samples in various phases of the nuclear divisions of the syncytial blastoderm stage, while the second contained 3119 samples of nuclei in proliferating epithelial cells during gastrulation. Syncytial blastoderm and gastrulation are separated by cellularization, which lasts approximately one hour. For each sample, we calculated 42 intensity, shape and texture features and assigned the respective cell cycle phase: interphase, prophase, metaphase, anaphase or telophase.
Comparison of cell cycle phase classification accuracy obtained with different classification models (columns) and feature reduction techniques (rows).
Cell cycle classification accuracy for a dataset of 3119 samples derived from the gastrulation stage using non-weighted SVM and 42 features. (Pred. = predicted)
Cell cycle classification accuracy for a dataset of 3119 samples derived from the gastrulation stage using weighted-SVM and 9 features. (Pred. = predicted)
Cell cycle classification accuracy for a dataset of 4606 samples derived from the syncytial blastoderm stage using weighted-SVM and 9 features. (Pred. = predicted)
Cell cycle phase classification performance for different training and testing datasets. We used a weighted SVM with 9 features.
We noticed that a large proportion of misclassified cells were assigned to neighbouring classes (see the confusion matrices in Tables 2, 3 and 4). For instance, 16 anaphase samples were misclassified as metaphase, and 10 as telophase (Table 3). This is not unexpected, as phenotypic transitions of chromosomes during cell cycle progression happen gradually and there are no clear morphological boundaries between mitotic phases. Both forward feature selection and backward feature reduction could reduce the feature set from 42 to 12 without compromising classification performance (Table 4). Feature selection had a slight advantage, as it was computationally more efficient (about two times faster).
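Greedy forward selection, the first of the two strategies compared above, can be sketched in a few lines of Python. The scoring function is a hypothetical placeholder: in practice it would be something like the cross-validated accuracy of the classifier trained on the candidate feature subset. Backward reduction runs the same loop in reverse, starting from all 42 features and removing the least useful one at each step. The function name is hypothetical.

```python
def forward_selection(n_features, score, k):
    """Greedy sequential forward selection of up to k feature indices.

    `score(subset)` is a user-supplied function returning a quality value
    (higher is better) for a list of feature indices.
    """
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        # add the feature whose inclusion gives the best score
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Each step evaluates every remaining feature once, so selecting k of n features costs on the order of k·n model evaluations. When k is much smaller than n, forward selection also trains on smaller subsets than backward reduction does, which is consistent with, though not proof of, the speed advantage reported above.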
Although nuclei at the syncytial and gastrula stages are visually similar, the overall classification accuracy was only 51.65% when syncytial samples were classified with a model trained on gastrula data, whereas 70.52% accuracy was achieved in the converse experiment (Table 5). This might be due to three differences between the datasets: first, they come from different developmental stages (syncytial nuclei are not separated by cell membranes); second, they are from different Drosophila embryos; third, the laser power and microscope settings might have differed between the two recordings. The results indicate that classifiers trained on the syncytium dataset cannot be used to classify cells at the cellular blastoderm stage, and vice versa. However, a unified classifier can be obtained by training on the combined datasets from both developmental stages. Using this unified classifier, we achieved over 90% classification accuracy on both datasets, as shown in the last two columns of Table 5. This result suggests that a robust classifier can be obtained when the training samples cover the variation present in the data to be classified.
3D image stacks naturally contain more information than 2D images. Therefore, it is conceivable that 3D features possess higher discriminative power than 2D features. Since this notion has not been thoroughly evaluated, and computing 2D features (especially texture features) is computationally less costly, it is worthwhile to address this issue in future research. One approach could involve producing 2D projections of 3D objects and testing the classification performance of 2D features extracted from these projections. Alternatively, features could be extracted from a single representative slice (e.g. the middle slice) as previously described .
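The two 2D alternatives mentioned above are straightforward to produce from a segmented 3D object. A minimal NumPy sketch, assuming a (z, y, x) stack and taking the middle slice to be the central slice of the object's z-extent (one of several reasonable choices); the function names are hypothetical.

```python
import numpy as np


def max_projection(stack):
    """Maximum-intensity projection along z (axis 0 of a (z, y, x) stack)."""
    return np.asarray(stack).max(axis=0)


def middle_slice(stack, mask):
    """The z-slice through the middle of the object's z-extent."""
    z = np.nonzero(np.asarray(mask, dtype=bool).any(axis=(1, 2)))[0]
    return np.asarray(stack)[(z.min() + z.max()) // 2]
```

2D features would then be computed from either image in place of the 3D object.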
3D live cell imaging is becoming a common technique for the study of dynamic cellular processes in 3D tissues. Accurate cell phase classification is one of the essential steps in automating 3D live cell image analysis. Starting from an initial set of 42 shape, intensity and texture features, we derived a reduced subset of 9 dominant features without affecting predictive performance. A weighted SVM was used to alleviate the problem of imbalanced training datasets. Over 90% classification accuracy was achieved on two datasets comprising over 7000 cells (nuclei). As in cultured cells, automated cell cycle classification in 3D tissues can be applied to the characterization of cell division phenotypes resulting from genetic perturbations in multicellular organisms such as Drosophila, zebrafish or C. elegans. Our method does not depend on dynamic features derived from cell tracking. As such, this approach can be used to improve the performance of automated cell tracking in live cell imaging.
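One common way to set the per-class weights of a weighted SVM is to make each weight inversely proportional to the class frequency, so that every class contributes the same total weight to training. The Python sketch below implements this heuristic. Whether the study used exactly these weights is not stated, so treat it as an assumption; the resulting values could, for example, be passed to LIBSVM's per-class `-wi` penalty option. The function name is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """w_c = N / (C * n_c): rarer classes receive proportionally larger weights."""
    counts = Counter(labels)
    n, c = len(labels), len(counts)
    return {cls: n / (c * cnt) for cls, cnt in counts.items()}
```

With 58% of samples in interphase, as reported for these data, interphase would receive the smallest weight and the rarer mitotic phases correspondingly larger ones.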
This article has been published as part of BMC Bioinformatics Volume 12 Supplement 13, 2011: Tenth International Conference on Bioinformatics – First ISCB Asia Joint Conference 2011 (InCoB/ISCB-Asia 2011): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S13.
- Chen X, Zhou X, Wong STC: Automated segmentation, classification, and tracking of cancer cell nuclei in time-lapse microscopy. IEEE Trans Biomed Eng 2006, 53: 762–766. doi:10.1109/TBME.2006.870201
- Neumann B, Walter T, Hériché JK, Bulkescher J, Erfle H, Conrad C, Rogers P, Poser I, Held M, Liebel U, Cetin C, Sieckmann F, Pau G, Kabbe R, Wünsche A, Satagopam V, Schmitz MHA, Chapuis C, Gerlich DW, Schneider R, Eils R, Huber W, Peters JM, Hyman AA, Durbin R, Pepperkok R, Ellenberg J: Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes. Nature 2010, 464: 721–727. doi:10.1038/nature08869
- Lu J, Liu T, Yang J: Automated cell phase classification for zebrafish fluorescence microscope images. In 20th International Conference on Pattern Recognition 2010, 2584–2587.
- Zhou X, Li F, Yan J, Wong STC: A novel cell segmentation method and cell phase identification using Markov model. IEEE Trans Inf Technol Biomed 2009, 13: 152–157.
- Wang M, Zhou X, Li F, Huckins J, King RW, Wong STC: Novel cell segmentation and online SVM for cell cycle phase identification in automated microscopy. Bioinformatics 2008, 24: 94–101. doi:10.1093/bioinformatics/btm530
- Wang M, Zhou X, King R, Wong S: Context based mixture model for cell phase identification in automated fluorescence microscopy. BMC Bioinformatics 2007, 8: 32. doi:10.1186/1471-2105-8-32
- Harder N, Mora-Bermúdez F, Godinez WJ, Ellenberg J, Eils R, Rohr K: Automated analysis of the mitotic phases of human cells in 3D fluorescence microscopy image sequences. Med Image Comput Comput Assist Interv 2006, 9: 840–848.
- Chinta R, Puah WC, Kriston-Vizi J, Wasser M: 3D segmentation for the study of cell cycle progression in live Drosophila embryos. In Proceedings of the First International Workshop on Medical Image Analysis and Description for Diagnosis Systems. Porto, Portugal; 2009: 43–51.
- Windreich G, Kiryati N, Lohmann G: Voxel-based surface area estimation: from theory to practice. Pattern Recognition 2003, 36: 2531–2541. doi:10.1016/S0031-3203(03)00173-0
- Herman GT, Zheng J, Bucholtz CA: Shape-based interpolation. IEEE Computer Graphics and Applications 1992, 12: 69–79.
- Haralick RM, Shanmugam K, Dinstein I: Textural features for image classification. IEEE Transactions on Systems, Man and Cybernetics 1973, 3: 610–621.
- Haralick RM: Statistical and structural approaches to texture. Proceedings of the IEEE 1979, 67: 786–804.
- Soh LK, Tsatsoulis C: Texture analysis of SAR sea ice imagery using gray level co-occurrence matrices. IEEE Transactions on Geoscience and Remote Sensing 1999, 37: 780–795. doi:10.1109/36.752194
- Reunanen J, Guyon I, Elisseeff A: Overfitting in making comparisons between variable selection methods. Journal of Machine Learning Research 2003, 3: 1371–1382.
- Jolliffe IT: Principal Component Analysis. New York: Springer-Verlag; 2002.
- Duda RO, Hart PE, Stork DG: Pattern Classification. 2nd edition. Wiley-Interscience; 2000.
- Blum AL, Langley P: Selection of relevant features and examples in machine learning. Artificial Intelligence 1997, 97: 245–271. doi:10.1016/S0004-3702(97)00063-5
- Vapnik V: The Nature of Statistical Learning Theory. 2nd edition. Springer; 1999.
- Chang CC, Lin CJ: LIBSVM: a library for support vector machines. 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
- Huang YM, Du SX: Weighted support vector machine for classification with uneven training class sizes. 2005, 7: 4365–4369.
- van der Maaten LJP: Matlab toolbox for dimensionality reduction. In Proceedings of the Belgian-Dutch Artificial Intelligence Conference. Volume 2007. Utrecht, The Netherlands; 2007: 439–440.
- Fukunaga K, Olsen DR: An algorithm for finding intrinsic dimensionality of data. IEEE Transactions on Computers 1971, 20: 176–183.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.