Segmentation and classification of two-channel C. elegans nucleus-labeled fluorescence images

Background: Aging is characterized by a gradual breakdown of cellular structures. Nuclear abnormality is a hallmark of progeria in humans. Analysis of age-dependent nuclear morphological changes in Caenorhabditis elegans is of great value to aging research, and this calls for an automatic image processing method that is suitable for both normal and abnormal structures.

Results: Our image processing method consists of nuclear segmentation, feature extraction and classification. First, to address the challenge of delineating individual nuclei that have fuzzy boundaries or lie in a clump, we developed an accurate nuclear segmentation method that uses fused two-channel images with seed-based cluster splitting and the k-means algorithm, and achieved high precision against the manual segmentation results. Next, we extracted three groups of nuclear features, from which five features were selected by minimum Redundancy Maximum Relevance (mRMR) for the classifiers. After comparing the classification performance of several popular techniques, we found that Random Forest, which achieved a mean class accuracy (MCA) of 98.69%, was the best classifier for our data set. Lastly, we demonstrated the method with two quantitative analyses of C. elegans nuclei, which led to the discovery of two possible longevity indicators.

Conclusions: We produced an automatic image processing method for two-channel C. elegans nucleus-labeled fluorescence images. It frees biologists from segmenting and classifying the nuclei manually.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1817-3) contains supplementary material, which is available to authorized users.


Level set method
The level set method assumes gray-level continuity or piecewise smoothness in the foreground. It is sensitive to gradient ambiguities at the boundary of and/or inside the nuclei. Our images contain nuclei of different ages, and some of them have low intensity and complicated internal textures, which make the gradient disordered. Thus, this method is not suitable for our images.

Gradient vector flow tracking
The method in [1] is based on gradient vector flow tracking. It can segment densely packed, touching, non-textured objects; however, it cannot segment our textured nuclei properly. We find that the method often over-segments textured nuclei, and sometimes even produces holes in the nuclei, as shown in Figure S2 (b), especially in the region marked by the red box. The primary reason is that the rough nuclear membrane texture in old worms produces disordered gradient flow magnitudes and directions, so the gradient flow fails to converge.

Graph cut method
The graph cut method, which depends on the region connectivity of the foreground, cannot provide the expected segmentation results because of the complicated nuclear textures. The method in [2] is based on the graph cut method. For comparison, we optimize the parameter described in that paper and find that, for images that contain both neuronal (small) and intestinal (large) nuclei, many nuclei are over-segmented, as shown in Figure S3.

S2 Differences between green-channel and red-channel images
The differences between green-channel and red-channel images are as follows:
1. Green-channel images show the nuclear membrane, while red-channel images show the chromosomes, which lie inside the membrane.
2. Green-channel images are clear, while some red-channel images contain many bubble-like features, as shown in Figure S4 (a-b). These bubbles can affect the segmentation results, especially when they stick to the nuclei.
3. Green-channel images have a higher signal-to-noise ratio (SNR) than red-channel images because the green fluorescent protein that we use has a much higher photoconversion efficiency. The highest intensity in Figure S4 (a) is clearly higher than that in Figure S4 (b); thus, the former has higher contrast.
4. Green-channel images of old worms do not contain strong fluorescent noise, while some red-channel images do. An example is shown in Figure S4 (c-d).
5. Green-channel images do not contain germ cell nuclei because the transgenic lmm-1::gfp is silenced in germ cells. Some red-channel images, however, contain these nuclei, which are not of interest for our aging studies. An example is shown in Figure S4 (e-f).
Therefore, green-channel images are more reliable than red-channel images for segmentation.
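
The comparison of peak intensity and SNR in item 3 can be illustrated with a minimal sketch. It assumes one common SNR definition (foreground-background mean difference divided by the background standard deviation), which is not necessarily the measure behind the statement above. The helper name contrast_stats and the synthetic images are hypothetical and stand in for real green-channel and red-channel data.

```python
import numpy as np


def contrast_stats(image, foreground_mask):
    """Return (peak intensity, simple SNR) for one channel.

    SNR is taken as (mean foreground - mean background) / std of background.
    This is an assumed definition chosen for illustration only.
    """
    fg = image[foreground_mask]
    bg = image[~foreground_mask]
    snr = (fg.mean() - bg.mean()) / bg.std()
    return float(image.max()), float(snr)


rng = np.random.default_rng(0)

# A square region standing in for a single nucleus (hypothetical mask).
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 20:40] = True

# Synthetic channels: a bright, low-noise "green" image and a dimmer,
# noisier "red" image. The numbers are made up for the demonstration.
green = rng.normal(10, 2, mask.shape) + 100 * mask
red = rng.normal(10, 5, mask.shape) + 40 * mask

green_peak, green_snr = contrast_stats(green, mask)
red_peak, red_snr = contrast_stats(red, mask)

print(f"green: peak={green_peak:.1f}, SNR={green_snr:.1f}")
print(f"red:   peak={red_peak:.1f}, SNR={red_snr:.1f}")
```

On real data the foreground mask would come from a segmentation or a manual annotation, and the same function could be applied to both channels of one image pair.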

S3 Classification parameters
We constructed five classifiers using Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), Decision Tree (DT) and Neural Net (NN). The classifiers were constructed using scikit-learn (http://scikit-learn.org/stable/), a machine learning library in Python. This section describes the details of the classifiers, including the ranges of the parameter grid, and the parameters and functions we used for each classifier.
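
The overall workflow can be sketched as follows, assuming a recent scikit-learn version. The parameter grids here are small hypothetical placeholders, not the ranges used in this work, and the data are synthetic with five features per sample to mirror the five mRMR-selected features. The scoring option "balanced_accuracy" (the mean of per-class recall) is used as a stand-in for the mean class accuracy; whether it matches the MCA computation exactly is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the nuclear feature matrix (data and class count
# are made up; only the number of features follows the text).
X, y = make_classification(n_samples=200, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)

# Hypothetical, deliberately tiny grids; NOT the ranges used in the paper.
candidates = {
    "SVM": (SVC(), {"C": [1, 10], "kernel": ["rbf", "linear"]}),
    "RF": (RandomForestClassifier(random_state=0), {"n_estimators": [20, 50]}),
    "kNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5]}),
    "DT": (DecisionTreeClassifier(random_state=0), {"max_depth": [3, None]}),
    "NN": (MLPClassifier(max_iter=500, random_state=0),
           {"hidden_layer_sizes": [(10,), (20,)]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 3-fold cross-validated grid search, scored by mean per-class recall.
    search = GridSearchCV(estimator, grid, cv=3, scoring="balanced_accuracy")
    search.fit(X, y)
    results[name] = (search.best_params_, search.best_score_)

for name, (params, score) in results.items():
    print(f"{name}: best params {params}, CV balanced accuracy {score:.3f}")
```

Each entry of results holds the best parameter combination and its cross-validated score, so the classifiers can be compared on the same footing.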

Range of parameter grid
We used GridSearchCV to find the optimal parameters. The parameter ranges for the grid search were: