Volume 14 Supplement 10
Cell segmentation by multi-resolution analysis and maximum likelihood estimation (MAMLE)
© Chowdhury et al; licensee BioMed Central Ltd. 2013
Published: 12 August 2013
Cell imaging is becoming an indispensable tool for cell and molecular biology research. However, most processes studied are stochastic in nature, and require the observation of many cells and events. Ideally, extraction of information from these images ought to rely on automatic methods. Here, we propose a novel segmentation method, MAMLE, for detecting cells within dense clusters.
MAMLE executes cell segmentation in two stages. The first relies on state of the art filtering technique, edge detection in multi-resolution with morphological operator and threshold decomposition for adaptive thresholding. From this result, a correction procedure is applied that exploits maximum likelihood estimate as an objective function. Also, it acquires morphological features from the initial segmentation for constructing the likelihood parameter, after which the final segmentation is obtained.
We performed an empirical evaluation that includes sample images from different imaging modalities and diverse cell types. The new method attained very high (above 90%) cell segmentation accuracy in all cases. Finally, its accuracy was compared to several existing methods, and in all tests, MAMLE outperformed them in segmentation accuracy.
Single cell microscopy and subsequent analysis has gained much interest recently in areas ranging from studies of gene expression dynamics [1–3], to studies of cell aging [4, 5] to disease classification . However, as most processes in cells are stochastic in nature  their study requires high-throughput measurements and analysis. The manual extraction of the results from the raw image data is thus prohibitive, causing a need for accurate and robust methods of cell segmentation.
Most existing methods lack in generic applicability and require strong assumptions on cell features i.e. cell shape, size, etc. Additionally, their performance is highly sensitive to cell density and signal to noise ratio. One of the presently most successful and used cell image analysis tools is 'Cellprofiler' , an open source software platform for automated cell segmentation from microscopy images. Cell segmentation in Cellprofiler is performed in two steps. First, it separates objects from image background by thresholding. Next, the clumped objects are segmented again by considering intensity or shape as a feature for discrimination. Cellprofiler provides several alternatives for automated threshold selection and clumped cell segmentation. The major drawback of its segmentation algorithm is that its accuracy decreases significantly when cells are in large, dense clumps.
Another state of the art software tool is 'Schnitzcells' . Schnitzcells provides solutions for segmentation and tracking of Escherichia coli cells from images by confocal or phase contrast microscopy. The segmentation of cells in Schnitzcells is a multi-stepped process. First, it applies edge detection for generating initial segmentation. Next, it splits long or clumped cells. Finally, it considers too small objects as false positives and discards them. The major problem is the large number of parameters that, without proper tuning, cause the accuracy of the segmentation to decrease notably. Further, it has a limited scope of application, i.e. it only handles E. coli and Bacillus subtilis cells and often presents a significant number of false positives.
Finally, it is worth mentioning the cell segmentation algorithm for histopathology images, whose implementation was made available in the Farsight toolkit . This method exploits graph-cuts-based segmentation for segmenting foreground signals from the image background. Then, the nuclear seed points are detected by a multi-resolution edge detection method. Aside these, other methods for cell segmentation were proposed (see e.g. [11, 12]). In general, these split the overall segmentation task into three steps. First, a separation of foreground objects from image background is made. Next, a post processing step is applied to split the under-segmented clumped cells. Finally, false positives are discarded by some criteria.
Here, we propose a novel cell segmentation method, MAMLE, which maintains very high cell segmentation accuracy in dense cell clusters with low signal to noise ratio (SNR). Moreover, MAMLE requires very few assumptions on cell shape or size, thus, it can handle a wide range of cell types in different imaging modalities. MAMLE is novel in that i) it adopts a state of the art image denoising technique for improving SNR in image, ii) unifies multi-resolution edge detection and threshold decomposition to accomplish the initial segmentation iii) corrects the over-segmented and under-segmented cells based on likelihood estimate, which is shown to be adaptive to varying conditions. Above all, MAMLE innovates in that it learns different shape features on the fly and exploits the learnt parameters for cell segmentation correction. A properly combined usage of all features is implemented to obtain robust and accurate cell segmentation.
MAMLE is primarily targeted towards one of the most challenging cell types for automated segmentation, E. coli, a model organism in cell and molecular biology research [13–15]. The high division rate, the formation of dense colonies and the cells' morphology make the segmentation more challenging than for most other cell types. We first describe the method, after which we evaluate its cell segmentation accuracy and compare it with state of the art cell segmentation platforms. Next, the robustness of MAMLE is studied in its parametric space. In the end, we present our conclusions.
i) Image denoising: fluorescent images often have low SNR, which leads to cell detection artifacts. Hence, denoising filters are often applied as a pre-processing step for segmentation. MAMLE exploits a state of the art image denoising technique known as Block-Matching and 3D filtering (BM3D) . We opted for BM3D due to its balance between noise cancellation and edge preservation capability . BM3D splits a noisy 2D image into fixed size blocks (8x8) and searches for the blocks that match the reference block. The matching blocks are arranged into a 3D array, known as 'group'. On each group, a 3D transformation is executed and thresholded. Afterwards, BM3D inverse transforms the group and aggregates it with weights to augment the basic estimate. The augmented basic estimate acts as a pilot signal for subsequent steps. Finally, a collaborative Wiener filtering is executed on the noisy signal to obtain the noise removed image.
ii) Foreground and background segmentation: this step (second step in Figure 1) separates individual cell colonies from the image background. In fluorescence microscopy, cells are stained with fluorophores or they express a fluorescent protein. As a result, the image background appears darker than foreground objects (i.e. cells). Therefore, a block-wise Otsu threshold, followed by bilinear interpolation, is applied to separate each cell colony from the background [8, 17]. Phase contrast images, on the other hand, have different intensity profiles for foreground and background. For these, we use iterative range filtering to segment foreground objects from the image's background . In both cases, the extracted foreground mask is used to select the region of interest.
Each pixel in the fuzzy edge image () is a real valued integer within the range 0 to S, where S is the maximum scaling factor. A pixel on the most certain edges in the original gray scale images holds a value close to S, while a pixel on the smooth region holds a value close to zero in the fuzzy edge image.
The decomposition starts with the strongest threshold () that subdivides the foreground object, based on most certain or strongest edges. Afterwards, it lowers the threshold gradually to a predefined bound, unless the foreground object is already reduced to a size smaller than a predefined value (i.e. the expected maximum size of a cell). The threshold decomposition is recursive and the exact number of decomposition levels is specific for each of the detected objects. The details are listed in 'Algorithm 1'. The segmentation result is treated as an initial segmentation mask for the following steps. As noted, the initial segmentation results show several over and under-segmentation artifacts, which should be corrected.
Algorithm 1: ThresholdDecompositon()
Select the edges that are stronger than and label the image based on edge selection
- 4.FOR EACH labelled region
- a)IF area () > THEN
- v)Learning shape parameter: First, we need to acquire a morphological or shape feature from the initial segmentation. As the initial results are partially correct, they can be used to obtain different shape features. Afterwards, they can be categorized into classes, namely 'correct segmentation', 'under-segmentation' or 'over-segmentation', based on the morphological features. However, to maintain the classifier tractable, we treat each object as a simple polygon and consider its area, the major axis length and the minor axis length as the discriminant features for classification. The major (a) and the minor (b) axes are computed according to (9) and (10) .(9)(10)
Here, is the centroidal moment, is the area and , are two orthogonal eigenvalues of the polygon. The ideal cell shape and shape distribution are assumed as multivariate Gaussian distributions on the feature space. The parameter of this distribution can be estimated from the sample mean vector and the covariance matrix in the feature space of the initial segmentation results.
A closer look at the objective function () reveals that, given the considered metric, the minimization of the variance of the normalized distance from the mean vector () would maximize the likelihood function. This problem is usually realized as a Gaussian mixture model problem and is solved with the expectation maximization (EM) algorithm . However, we did not consider EM as a solution since: i) EM needs to know the number of existing mixtures, ii) EM does not have direct control on the shape of the distribution and, iii) since EM considers global optimization, there is no straight forward way to consider the case where a part of one cell is clumped with one or more cells. Thus, we formulate an iterative procedure for likelihood maximization that splits a clumped object into two parts by maximizing the likelihood only in one of the parts, disregarding the other.
After the split, the disregarded part is reconsidered for split and processed recursively, unless its size is already smaller than the average cell size. The split procedure is not completely flawless. Occasionally, it over-segments a single cell into multiple parts. Nevertheless, most over-segmented cells are re-merged in the subsequent merge procedure.
Similarly, it generates hypothesis list () for merging three objects and adds a single row in the hypothesis matrix and likelihood vector for each of the generated hypotheses. This can be extended further, beyond the third level. We did not find any evidence to justify such expansion. Hence, we limited the hypothesis generation process to level three. The generated hypothesis matrix has rows and columns, while the likelihood vector has rows. Given this, the maximum likelihood estimate for merging can be obtained by selecting the hypotheses that includes each of the identified objects exactly in one hypothesis and, at the same time, maximizes the total likelihood. This problem can be solved by solving a standard binary integer programming problem formulated in (19).
such that (19)
Here, is a binary column vector that indicates whether the respective hypothesis should be accepted or not and is a column vector of ones that restricts the inclusion of each candidate exactly in one hypothesis. However, the stated problem belongs to the class of NP-hard problems . Thus, there is no known polynomial time solution for solving it. We exploited a linear programming (LP)-based branch-and-bound technique to obtain an approximate solution . Finally, the selected hypotheses are accepted and the objects are merged to construct the final segmentation result.
E. coli DH5α-PRO strain containing a bacterial expression vector PROTET-K133 carrying the MS2-dimer (MS2d) fused with green fluorescent protein (MS2d-GFP) was used for this study [26, 27]. This vector has an inducible promoter P(LtetO-1), which is under tight regulation of anhydrotetrachycline (aTc, IBA GmbH, Göttingen, Germany). Constructs were generously provided by Dr. Ido Golding, University of Illinois. Cells were grown in LB medium, supplemented with kanamycin antibiotic for the selection of cell containing the PLtetO-1-Ms2d-GFP plasmid. For full induction of gene expression, cells were grown overnight at 37 °C with aeration to reach an optical density of OD600 ≈ 0.3-0.5. The cells were incubated with inducer aTc (100 ng/ml) for 45 minutes to attain full induction of MS2d-GFP. Following induction, a few micro litres of culture were placed between a cover-slip and a thick slab of 1% agarose containing LB. Microscopy was performed at room temperature (22 °C) using a Nikon Eclipse (TE-2000-U, Nikon, Tokyo, Japan) inverted confocal laser-scanning microscope equipped with a 100X magnification (1.5NA) objective. GFP fluorescence was measured using a 488 nm laser (Radius 405 laser, Coherent, Inc., Santa Clara, CA) and a 515/30 nm detection filter (100-120 detector gain).
In case of Staphylococcus aureus, cells were grown in LB medium. OD600 ≈ 0.3-0.5 cells were incubated with 0.5 mg ml-1 DNA binding stain, 4'6-diamidino-2-phenylindole (DAPI, Sigma) for 1 hour at 37 °C and centrifuged at 7000 RPM for 10 min. Cell pellet was diluted 1:100 time and few micro litres of cells were placed in a microscopic slide as mentioned above to perform the image acquisition. DAPI stain expression was measured using a 406 nm laser (Radius 405 laser, Coherent, Inc., Santa Clara, CA) and a 450/35 nm detection filter (100-120 detector gain).
Test results on confocal images of E. coli cells expressing a fluorescent protein, MS2d-GFP
Over -seg. (total/ %)
Under -seg. (total/ %)
FN (total/ %)
Segmentation accuracy (%)
Test results from phase contrast images of E. coli cells, from fluorescence images of S. aureus, and from epifluorescence images of E. coli.
Over seg. (total/ %)
Under-seg. (total/ %)
Phase contrast - E. coli
Epi - E. coli
Finally, we consider the usability of the novel method. In addition to accurate segmentation, the number of parameters and a robust performance in the parametric space are critical aspects of an accurate segmentation method. Ideally, an automated method should have as few free parameters as possible, and their tuning should be intuitive. Also, the optimal range of parameters should be large enough for tuning properly. Such 'parametric robustness' is what enables the segmentation method to be applicable to large scale analysis without the need for significant effort regarding the parameter tuning.
The proposed method has only 4 free parameters for tuning, namely, maximum cell size, scaling or resolution level, decomposition level and threshold window size, which affect the segmentation accuracy. The first three are intuitive in the sense that the maximum cell size can be estimated from knowledge of the phenotype of the cells (or a quick observation of the test samples) and microscope settings. The maximum scaling factor should be smaller than the cell width and, finally, the decomposition level should be less than the maximum scaling factor.
Conclusions and discussion
Automated cell segmentation with high accuracy is a major challenge as well as a necessity towards high throughput analysis in cell biology, whose research is increasingly relying on in vivo single-cell studies. Here, we presented a novel method for automatically segmenting cells within colonies from microscopy images. The segmentation scheme exploits image de-noising techniques in transform-domain followed by multi-resolution edge detection and threshold decomposition for generating initial segmentation results. Then, a machine learning procedure is carried out to learn morphological shape parameters from the initial segmentation. Next, a likelihood optimization based splitting and maximum likelihood estimate based merging steps are executed to construct the accurate segmentation result.
The method was primarily evaluated for segmenting GFP labelled E. coli cells, but it was also tested for different cell types and imaging modalities. The test set comprises both de novo data set as well as samples from publicly available off-the-shelf benchmark data set. The segmentation results were found highly accurate by manual inspection, and denote high segmentation accuracy when compared with existing methods. The main strength of MAMLE relies on its ability to segment dense cell colonies as well as it robustness across a wide range of imaging modalities of different cell types.
Relevantly, the parameter selection is limited to three parameters, whose setting is intuitive. Either knowledge of the cells' morphology or a quick observation of the images, along with knowledge on the magnification settings of the microscope suffice to introduce parameter values that lead to robust results. Nevertheless, the overall performance was found robust to sub-optimal parameter settings as well. A forth parameter, 'threshold window size', as discussed, should be obtained from the spatial distribution of cell background and foreground illumination levels.
In the future, MAMLE can be extended in several ways. One plausible improvement is to add the possibility of training the method beforehand and update the trained knowledge base at runtime, rather than building the entire knowledgebase at runtime. This may be of use to research groups that focus on a specific organism and desire close to optimal results without the need to test the method and its parameters for each study.
Availability and additional materials
List of abbreviations used
Block-Matching and 3D filtering
Green fluorescent protein
Multi-resolution Analysis and Maximum Likelihood Estimation
Non-deterministic Polynomial-time hard
Work supported by Academy of Finland, Finnish Funding Agency for Technology and Innovation (TEKES), and Tampere Doctoral Programme in Information Science and Engineering (TISE). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
The publication costs for this article were funded by the Finnish Funding Agency for Technology and Innovation (grant no. 40226/12).
This article has been published as part of BMC Bioinformatics Volume 14 Supplement 10, 2013: Selected articles from the 10th International Workshop on Computational Systems Biology (WCSB) 2013: Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S10.
- Golding I, Paulsson J, Zawilski SM, Cox EC: Real-time kinetics of gene activity in individual bacteria. Cell. 2005, 123 (6): 1025-1036. 10.1016/j.cell.2005.09.031.View ArticlePubMedGoogle Scholar
- Kandhavelu M, Mannerström H, Gupta A, Häkkinen A, Lloyd-Price J, Yli-Harja O, Ribeiro AS: In vivo kinetics of transcription initiation of the lar promoter in Escherichia coli. Evidence for a sequential mechanism with two rate limiting steps. BMC Syst Biol. 2011, 5: 149-10.1186/1752-0509-5-149.PubMed CentralView ArticlePubMedGoogle Scholar
- Taniguchi Y, Choi PJ, Li GW, Chen H, Babu M, Hearn J, Emili A, Xie XS: Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science. 2000, 329 (5991): 533-538.View ArticleGoogle Scholar
- Lindner AB, Madden M, Demarez A, Stewart EJ, Taddei F: Asymmetric segregation of protein aggregates is associated with cellular aging and rejuvenation. Proc Natl Acad Sci USA. 2008, 105: 3076-3081. 10.1073/pnas.0708931105.PubMed CentralView ArticlePubMedGoogle Scholar
- Lloyd-Price J, Häkkinen A, Kandhavelu M, Marques IJ, Chowdhury S, Lihavainen E, Yli-Harja O, Ribeiro AS: Asymmetric disposal of individual protein aggregates in Escherichia coli, one aggregate at a time. J Bacteriol. 2012, 194 (7): 1747-1752. 10.1128/JB.06500-11.PubMed CentralView ArticlePubMedGoogle Scholar
- Lihavainen E, Mäkelä J, Spelbrink JN, Ribeiro AS: Mytoe: Automatic analysis of mitochondrial dynamics. Bioinformatics. 2012, 7 (28): 1050-1051.View ArticleGoogle Scholar
- Kaern M, Elston TC, Blake WJ, Collins JJ: Stochasticity in gene expression: from theories to phenotypes. Nat Rev Genet. 2005, 6: 451-464. 10.1038/nrg1615.View ArticlePubMedGoogle Scholar
- Carpenter AE, Jones TR, Lamprecht MR, Clarke C, Kang IH, Friman O, Guertin DA, Chang JH, Lindquist RA, Moffat J, Golland P, Sabatini D: CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 2006, 7: R100-10.1186/gb-2006-7-10-r100.PubMed CentralView ArticlePubMedGoogle Scholar
- Young JW, Locke JC, Altinok A, Rosenfeld N, Bacarian T, Swain PS, Mjolsness E, Elowitz MB: Measuring Single-Cell Gene Expression Dynamics in Bacteria Using Fluorescence Time-lapse Microscopy. Nat Protoc. 2011, 7 (1): 80-88. 10.1038/nprot.2011.432.PubMed CentralView ArticlePubMedGoogle Scholar
- Al-Kofahi Y, Lassoued W, Lee W, Roysam B: Improved automatic detection and segmentation of cell nuclei in histopathology images. IEEE T Biomed Eng. 2010, 57 (4): 841-852.View ArticleGoogle Scholar
- Sliusarenko O, Heinritz J, Emonet T, Jacobs-Wagner C: High-throughput, subpixel-precision analysis of bacterial morphogenesis and intracellular spatio-temporal dynamics. Mol Microbiol. 2011, 80 (3): 612-627. 10.1111/j.1365-2958.2011.07579.x.PubMed CentralView ArticlePubMedGoogle Scholar
- Wang Q, Niemi J, Tan CM, You L, West M: Image segmentation and dynamic lineage analysis in single-cell fluorescence microscopy. Cytometry. 2010, 77 (1): 101-110.PubMed CentralPubMedGoogle Scholar
- Bernstein JA, Khodursky AB, Lin PH, Lin-Chao S, Cohen SN: Global analysis of mRNA decay and abundance in Escherichia coli at single-gene resolution using two-color fluorescent DNA microarrays. Proc Natl Acad Sci USA. 2002, 99: 9697-9702. 10.1073/pnas.112318199.PubMed CentralView ArticlePubMedGoogle Scholar
- Elowitz MB, Leibler S: A synthetic oscillatory network of transcriptional regulators. Nature. 2000, 403 (6767): 335-338. 10.1038/35002125.View ArticlePubMedGoogle Scholar
- Lutz R, Lozinski T, Ellinger T, Bujard H: Dissecting the functional program of Escherichia coli promoters: the combined mode of action of Lac repressor and AraC activator. Nucleic Acids Res. 2001, 29: 3873-3881. 10.1093/nar/29.18.3873.PubMed CentralView ArticlePubMedGoogle Scholar
- Dabov K, Foi A, Katkovnik V, Egiazarian K: Image denoising by sparse 3D transform-domain collaborative filtering. IEEE T Image Process. 2007, 16 (8): 2080-2095.View ArticleGoogle Scholar
- Otsu N: A threshold selection method from gray-level histograms. IEEE T Syst Man Cyb. 1979, 9 (1): 62-66.View ArticleGoogle Scholar
- Wang K, Wu J, Gao L, Pian Z, Guo L: Magnetic Resonance Images Edge Detection Based on Multi-scale Morphology. IEEE/ICME International Conference on Complex Medical Engineering. 2007, 744-747.Google Scholar
- Mukhopadhyay S, Chanda B: Multiscale morphological segmentation of gray-scale images. IEEE T Image Process. 2003, 12 (5): 533-549. 10.1109/TIP.2003.810757.View ArticleGoogle Scholar
- Canny J: A Computational Approach To Edge Detection. IEEE T Pattern Anal. 1986, 8 (6): 679-698.View ArticleGoogle Scholar
- Shih FYC, Mitchell OR: Threshold Decomposition of Gray-Scale Morphology into Binary Morphology. IEEE T Pattern Anal. 1989, 11 (1): 31-42. 10.1109/34.23111.View ArticleGoogle Scholar
- Prokop RJ, Reeve AP: A Survey of Moment-Based Techniques For Unoccluded Object Representation and Recognition. CVGIP: Graph Model Im. 1992, 54 (5): 438-460. 10.1016/1049-9652(92)90027-U.Google Scholar
- Duda RO, Hart PE, Stork DG: Pattern classification. 2001, John Wiley & Sons, 2Google Scholar
- Hillier FS, Lieberman GJ: Introduction to Operations Research. 2001, McGraw-HillGoogle Scholar
- Li F, Zhou X, Ma J, Wong ST: Multiple Nuclei Tracking Using Integer Programming for Quantitative Cancer Cell Cycle Analysis. IEEE T Med Imaging. 2010, 29 (1): 96-105.View ArticleGoogle Scholar
- Peabody DS, Lim F: Complementation of RNA Binding Site Mutations in Heterodimers of Bacteriophage MS2 Coat Protein. Nucleic Acids Res. 1996, 24: 2352-2359. 10.1093/nar/24.12.2352.PubMed CentralView ArticlePubMedGoogle Scholar
- Peabody DS: Role of the coat protein-RNA interaction in the life cycle of the bacteriophage MS2. Mol Gen Genet. 1997, 254: 358-364. 10.1007/s004380050427.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.