Region-based progressive localization of cell nuclei in microscopic images with data adaptive modeling

Background Segmenting cell nuclei in microscopic images has become one of the most important routines in modern biological applications. With the vast amount of data, automatic localization, i.e. detection and segmentation, of cell nuclei is highly desirable compared to time-consuming manual processes. However, automated segmentation is challenging due to large intensity inhomogeneities in the cell nuclei and the background. Results We present a new method for automated progressive localization of cell nuclei using data-adaptive models that can better handle the inhomogeneity problem. We perform localization in a three-stage approach: first identify all interest regions with contrast-enhanced salient region detection, then process the clusters to identify true cell nuclei with probability estimation via feature-distance profiles of reference regions, and finally refine the contours of detected regions with regional contrast-based graphical model. The proposed region-based progressive localization (RPL) method is evaluated on three different datasets, with the first two containing grayscale images, and the third one comprising of color images with cytoplasm in addition to cell nuclei. We demonstrate performance improvement over the state-of-the-art. For example, compared to the second best approach, on the first dataset, our method achieves 2.8 and 3.7 reduction in Hausdorff distance and false negatives; on the second dataset that has larger intensity inhomogeneity, our method achieves 5% increase in Dice coefficient and Rand index; on the third dataset, our method achieves 4% increase in object-level accuracy. Conclusions To tackle the intensity inhomogeneities in cell nuclei and background, a region-based progressive localization method is proposed for cell nuclei localization in fluorescence microscopy images. The RPL method is demonstrated highly effective on three different public datasets, with on average 3.5% and 7% improvement of region- and contour-based segmentation performance over the state-of-the-art.


Background
Microscopic image analysis is becoming an enabling technology for modern systems-biology research, and cell nucleus segmentation is often the first step in the pipeline. Despite recent advances, the segmentation performance remains unsatisfactory in many cases. For example, on the popular public databases [1], the state-of-the-art segmentation accuracies are just around 85%.
The challenges of automated cell nucleus segmentation mainly arise from two imaging artifacts, as shown http://www.biomedcentral.com/1471-2105/14/173 Figure 1 An example image (left) and corresponding segmentation ground truth (right) from data set [1]. It can be seen that besides the pixel-wise inhomogeneity within a cell nucleus, some cell nuclei exhibit much lower intensities than the others; and although the background looks generally dark, it is indeed highly inhomogeneous, with some fairly bright areas and also a few noisy regions displaying very high intensities.

Related work
Numerous works have been conducted on segmenting various structures in cell images [2,3], and unsupervised approaches appear to dominate. For example, the morphological methods based on thresholding, k-means clustering or watershed [4][5][6][7][8] can be quite effective, as long as the objects exhibit good contrast with the background. Watershed methods are also effective in separating touching cells, although the results might deviate from the actual contours slightly. A more popular trend of unsupervised segmentation is the energy-based deformable models, based on active contours [9] or level sets [10][11][12][13][14][15]. Compared with modeling contours explicitly, level sets have the advantage of being non-parametric and free from topology constraints. It is also relatively easy to incorporate continuous object-level regularization into level sets, such as shape priors. Another type of energy-based model is based on graph search [16,17], graph cuts [18,19] or normalized cuts [20]. Such methods attempt to derive the segmentation with global constraints, using well-defined graphical structures to represent the spatial relationships between regions. Many of these methods require good initial seeds or contours. However, the usual initialization techniques, such as thresholding and watershed, would not handle images with high inhomogeneities well, hence causing extra or missing detection of cell regions. Such detection errors during initialization could propagate into the final segmentation outputs.
It has been shown that intensity inhomogeneities can be tackled by integrating convex Bayesian functional with the Chan-Vese model [14], and discrete region-competition [15] based on the piecewise-smooth Mumford-Shah model [21]. However, without performing cell detection explicitly, the deformable models might become very complicated in order to filter background regions with cell-like features while keeping cell regions with background-like features. To detect cells from inhomogeneous background, one way is to reconstruct the ideal image [22,23], which however, requires specific imaging modeling. Reconstruction can also be built into active contours with constrained iterative deconvolution without explicitly computing the inverse problem [24,25]; however, it requires the point-spread function of a microscope, which is measured or modeled. Another way is to enhance the objects using h-dome transformation [26]; however, it might have difficulties with inhomogeneous foreground. The inhomogeneity can also be reduced with reference-based intensity normalization [27]; however, the image-level normalization would not well handle the intra-image variation. In addition, shape-based nucleus detection has been proposed, with Laplacian of Gaussian (LoG) [28] or sliding band filter (SBF) [29]. While the latter method is less sensitive to low contrast and better representative of irregular shapes, the detection accuracy partially relies on validation from the corresponding cytoplasm image, which is not always available.
Different from the unsupervised approaches, classification-based methods have also been proposed to incorporate prior information from labeled images. These classifiers include Bayesian [30], k-nearest neighbor (kNN) [31], support vector machine (SVM) [32], and atlas-based approaches [33]. Since the apparent difference between the foreground and background is their intensities, simple intensity-based features, such as histograms [30], have been widely used. On the other hand, the effectiveness of classification-based methods depends highly on the separation of feature spaces between foreground and background. Therefore, approaches based on a bag of local classifiers [30], and more complex features such as the local Fourier transform (LFT) [31], spatial information [33], and combination of appearance, shape and context features [32], have been proposed.
While most such supervised approaches describe the pixel-or region-level features, there are methods that tackle intensity inhomogeneity by explicitly modeling the inter-cell variations as more structural features. One way is to perform color standardization within pixelwise classification [34] to account for the inter-image intensity http://www.biomedcentral.com/1471-2105/14/173 variations. To also address the intra-image variations, contrast information between an image region and the global foreground and local interest regions is computed [35]. A similar approach is to estimate foreground probabilities based on intensity distributions derived from global images and local detection outputs [36]. While both approaches introduce cell-adaptive features, the methods for global feature representation and local region detection might not work well with large feature overlapping between the foreground and the background. An additional false positive reduction step has also been proposed to remove bright background regions that are misidentified as cell nuclei [37]. However, this approach requires a learned classifier, whose performance could be affected by inter-image feature variation. More differently, registration-based approach has also been studied, by creating a template set from training images and segmenting the testing image based on best matches [38]; such templates however, might have difficulties capturing large varieties of object shapes and textures.

Our contribution
The contribution of our work is to localize the cell nuclei in images with high intensity inhomogeneity with various data-adaptive modeling techniques in a progressive manner. Specifically, we design a three-stage cell nucleus localization method that: (1) salient regions representing cell nuclei and cell clusters are extracted with image-adaptive contrast enhancement; (2) the clusters are further processed to identify true cell nuclei based on feature-distance profiles of reference regions with clusteradaptive probability estimates; and (3) the contours of detected cell nuclei are refined in a graphical model with region-adaptive contrast information. Figure 2 gives an overview of the proposed method.
We also design distinctive data-adaptive priors that can be categorized by the level of generalization: (1) globallevel features modeled as support vectors from training images; (2) image-level features representing the distribution of varying appearances of the nearby cell nuclei; and (3) region-level features computed at all three stages of the method for interest region detection, candidate validation and contour refinement. Being adaptive to the specific image or interest region, the image-and regionlevel features are especially effective in accommodating the intensity inhomogeneities.
Compared to localization methods based on global criteria (e.g. thresholding or feature-based classification), our approach is more capable of accommodating (1) intensity variations between cell nuclei (intra-and inter-images) and (2) feature overlapping between cell nuclei and background areas. Compared to the energy-based techniques that target pixel-wise segmentation (e.g. level sets and graph cuts), our method has a stronger focus on cell nuclei detection with explicit modeling of cell-specific characteristics, to effectively filter cell-like background regions and identify obscure cell nuclei.
We suggest that the proposed region-based progressive localization (RPL) method can be potentially extended to other localization problems, if the objects of interest can be modeled as regions with distinct features from the surrounding background. A similar three-stage approach would be used, and the application-specific modifications would mainly focus on the feature design. One example could be tumor localization in functional images.

Initial segmentation
While cell nuclei might appear similar to the background, there is always some degree of contrast between them. Such an observation motivates us to localize the cell nuclei by extracting salient regions. During initial segmentation, we do not have strict requirements about the extracted regions. In particular, if multiple cell nuclei are tightly connected, or cell nuclei are surrounded by high-intensity background and difficult to differentiate, identifying them as a single region is acceptable. We design a contrastenhanced salient region detector for initial segmentation.
Specifically, an iterative approach is developed based on the maximally stable extremal region (MSER) method Figure 2 The high-level flow chart of our proposed region-based progressive localization method. In this example, the cell nuclei and clusters are first extracted during initial segmentation with contrast-enhanced salient region detector, then missed or falsely detected cell nuclei are further processed in decluster processing with classification-based candidate identification and probability estimation via distance profile for candidate validation, and better contour delineation is finally achieved with regional contrast-based graphical model. For easier viewing, a quadrant of the original image is shown here, and similarly for Figure 3 [39]. Since MSER does not require any initial contour and the region stability is constrained by local regional information, it is easy to use and able to accommodate large intra-image variations. However, the effectiveness of MSER depends highly on the intensity contrast between the foreground and background. If the contrast is low, some regions would not be detected (e.g. Figure 3c). It is intuitive to add contrast enhancement. However, basic approaches such as intensity stretching would not work due to large intensity span. Instead, we design an iterative approach by alternating between the following two steps. First, interest regions {R} are detected using MSER, as shown in Figure 3c. Second, based on the detection result, the image is enhanced (Figure 3d) by: where {R} 0 and {R} 2 denote the minimum and mean intensities of the detected interest regions in I, and C is a scaling constant. The normalization factor 0.5({R} 0 +{R} 2 ) is chosen based on: (1) it should normally be smaller than C so that all pixels in I are scaled up with contrast between pixels increased proportionally; and (2) it should not be so small that the image becomes distorted from the original patterns with intensities capped at 255 for grayscale images. The iteration stops when the number of regions created does not change any further. With such a contrastenhanced approach, better region detection output can be seen in Figure 3e. The resultant regions are either single-level, or form a hierarchy of lower-and upper-level regions.
It is also observed that during each iteration, the parameter MaxVariation in MSER (using VLFeat library [40]) needs to vary for individual images to better accommodate the inter-image variations. Therefore, the parameter value is determined at runtime by first setting MaxVariation to v 1 then gradually reducing it by a certain step v until it reaches v 2 or the number of region levels is larger than one. Furthermore, while the resultant singleand upper-level regions are mostly cell nuclei, occasionally under-segmentation happens. In other words, a single cell nucleus could be divided into two nested regions and the upper-level region would become a undersegmented portion of the cell nucleus. To reduce such under-segmentation, we find that if the combined area of two nested regions is roughly elliptical with a suitable size, they can be merged as a single region. The shape and size criteria are determined using a linear-kernel binary SVM obtained from the training data. The overall process of initial segmentation is listed in Algorithm 1.

Decluster processing
As seen in Figure 3e, the detected single-and upper-level regions usually represent the cell nuclei, and lower-level regions usually represent the background with elevated intensities. However, the upper-level regions could contain false positives caused by bright background, and lower-level regions could also include undetected cell nuclei. It is also observed that such incorrect detections are mainly present among the two-level nested regions (i.e. clusters), while the single-level regions are normally true cell nuclei. Therefore, in the second stage, we focus on further processing on the detected clusters, with two objectives. First, we expect to identify any cell nucleus that has not been detected after the initial segmentation. Such cell nuclei typically exhibit similar intensities to the surrounding background, and hence would not be highlighted as salient regions. Second, we need to filter out high-intensity background regions, which usually have rounded or irregular shapes, and could be easily confused as cell nuclei. A two-step approach is designed, using candidate identification then candidate validation. An example output is shown in Figure 3f.
Formally, let U = {u i : i = 1, ..., N U } be a detected cluster, with N U pixels u i . Define the set of labels {F, B} representing the foreground (i.e. cell nuclei) and background respectively, and a foreground region as a connected component G x ⊂ U with ∀u i ∈ G x : l i = F. The problem is to label each pixel u i ∈ U as l i = {F, B}, with the object-level constraint that any detected foreground region G x should have suitable characteristics as a cell nucleus.

Candidate identification
In the first step, we try to identify a set of non-overlapping candidate foreground regions {G x } from each cluster U by labeling each pixel u i as foreground or background. We specify that any upper-level region enclosed in a cluster U is a candidate region G x . To identify more candidates from the cluster U itself, it is observed that to differentiate between F and B pixels, the texture feature in a local patch is more discriminative than pixel intensities. For example, compared with cell nuclei, the background usually has more homogeneous texture that might be dark or bright. In this work, we choose to use the scale-invariant feature transform (SIFT) descriptor [41], which describes the gradient distribution within a local patch and is invariant to scale, translation and rotation. SIFT feature of each pixel u i is computed, and then labeled using a binary SVM.
The SVM kernel is polynomial, with other default settings in LIBSVM [42]. A connected component of F pixels is identified as a candidate region G x .
While most of the candidate regions are true cell nuclei, some are actually bright background areas with round shapes (e.g. the first example in Figure 4). To filter out the false detections, we pass them to the candidate validation stage.

Candidate validation
In the second step, we validate if the identified candidate region G x in image I is a cell nucleus. There are two reasons that motivate this step. First, there might be misclassification during candidate identification due to inter-image intensity variations (e.g. different appearances of the cell nuclei between the two examples in Figure 4). The labeling performance could be improved based on reference information gathered from the testing image itself. Second, pixel-level labeling based on SIFT features has limited spatial information and often does not represent the overall region G x . We design a probability estimation via distance profile method to derive the probability Q(G x , F) of G x being a valid cell nucleus based on the feature-distance profiles of other reference cell nuclei, as detailed below.
Probability inference Although cell nuclei in an image could have varying characteristics, we expect that G x , if representing a true cell nucleus, should have similar features to the other cell nuclei in the same image, especially those spatially adjacent to G x , as can be seen from the examples in Figure 4. Therefore, if we have a set of determined cell nuclei in I, we can use them as references to validate G x . To cope with inter-image variations, we would only select references from the image I in which G x resides. This means we could not use the ground truths for reference construction. Instead, we use the singleand upper-level regions that are detected during initial segmentation as references.
We use these references by first creating a distance profile per reference, and computing the probability of G x being a cell nucleus based on its feature distance to each reference. Specifically, assume within an area near G x , there are K reference regions G = {G k : k = 1, ..., K}.
Here near is defined as both G x and G k being in the same quadrant of image I. Let f x describe the region-level feature of G x , and the feature distance between G x and G k as δ(f k , f x ) (details of f and δ in the next two subsections). Intuitively, the more similar G x and G k are, the more likely G x is a cell nucleus. However, since G k may be a false positive detection, decision based on direct feature distance δ(f k , f x ) might be error prone. Therefore, we devise an alternative hypothesis that, if δ(f k , f x ) is comparable with http://www.biomedcentral.com/1471-2105/14/173 we use the non-parametric kernel density estimation (KDE): where δ k,x is short for δ(f k , f x ), K(·) is the Gaussian kernel and h k is the bandwidth approximation following normal distribution of all data samples {∀k : δ(f k , f k )}. The density valuep 0 (δ k,x ) is then normalized by the maximum density of the distribution to obtain the comparability measure in terms of probabilityp(δ k,x ) ∈[ 0, 1]: With this model,p(δ k,x ) is larger when δ k,x approaches the Gaussian mean of the samples, which means that G x is more likely a cell nucleus if the distance between G x and G k is similar to how the other references {G k } vary from G k .
Next, by combining the estimatesp(δ k,x ) from all references {G k }, the final probability of G x being a cell nucleus is derived: The averaging operation helps to ensure that a single reference G k with very different features from G x would not affect the overall probability Q(G x , F) significantly.
Then, based on Q(G x , F), we define a thresholding rule to determine if G x is a valid cell nucleus: where G x denotes other candidate regions that are within the same cluster U as G x and x = x; α 1 and α 2 are http://www.biomedcentral.com/1471-2105/14/173 predefined thresholds. Examples of the density computation and probability derivation are shown in Figure 4, and the overall process of candidate validation is listed in Algorithm 2. Extract the appearance features f x ; for k = 1, ..., K do Compute the appearance distance δ k,x ; Derive KDEp 0 (δ k,x ) and probabilityp(δ k,x ); end Generate the final probability Q(G x , F); end for x = 1, ..., X do Obtain the labeling l(G x ); end Appearance feature We observe that a region tends to comprise patches of similar textures and repetitive patterns. Therefore, we choose to represent G x with bagof-features. First, the image I that contains G x is divided into a grid of patches {P}. Then for each patch, we represent its texture feature by its minimum, maximum, mean intensity, standard deviation, and a histogram of intensity differences between each pair of pixels. Each patch-wise feature is then assigned a feature word. A histogram summarizing the occurrence frequencies of such feature words in G x is defined as f x . Here each feature vector is normalized by the size of G x , to represent the percentages of various intensities and feature words in G x .
Note that if G x is a newly identified candidate during decluster processing, G x might only represent a small under-segmented portion of the actual cell nucleus due to the pixel-level labeling. Therefore, to have a good summary of the actual candidate feature, we first estimate an elliptical region G x o that is a minimum volume ellipsoid covering G x [43]. To avoid including many background pixels into G x o , we ensure G x o is part of the cluster U in which G x is detected: in place of G x as the detected candidate, from which f x is computed. The refined elliptical regions are shown in Figure 4c.
Appearance distance To compute the distances δ(f k , f x ) between two histogram features, the diffusion distance [44] is used. The diffusion distance models the distance between histogram-based descriptors as heat diffusion process on a temperature field. Compared to the bin-tobin histogram distances, such as Euclidean distance, the diffusion distance is able to measure cross-bin distances, avoiding explicit computation of histogram alignment. While the earth mover's distance (EMD) [45] has similar advantages, the computation of diffusion distance is much faster, with O(H) complexity only, where H is the number of histogram bins.

Contour refinement
At this stage, a detected region could contain a single or multiple cell nuclei, which could be under-segmented or include extra background. We thus expect to achieve better contour delineation of cell nuclei. Our idea is that, while the foreground and background are often inhomogeneous, there is always relatively good contrast between them in a local area. Therefore, by performing contour refinement for each detected cell region G individually, the foreground and background can be better differentiated by analyzing the localized contrast information. We employ a regional contrast-based graphical model for the contour refinement.
Specifically, a conditional random field (CRF) [46] with the following energy function is designed: where G denotes the detected region G plus its surrounding area of a fixed width (half of the short axis of G) (Figure 5b), and L denotes the labeling vector of all pixels in G. Then, the model attempts to refine the contour of G by relabeling each pixel u i ∈ G as l i = {F, B}.
Here η(l i ) is the unary contrast-based intensity term, η(l G ) combined with ϕ(·) is the contrast-based detection term with l G representing the detected region G, and φ(·) is the spatial term associating neighboring pixels u i and u i . The constant 0.5 is set to obtain equal contributions from the unary costs ( i η(l i ) + η(l G )) and the combined pairwise costs ( i ϕ(l i , l G ) + i,i φ(l i , l i )). Graph cut [47] method is used to derive the most probable labeling L that minimizes the energy function, to produce the final segmentation of cell nuclei from G. Here our customized http://www.biomedcentral.com/1471-2105/14/173 definition of the intensity term and inclusion of the detection term are the main distinctions from the other CRF constructs [35,48,49]. The contrast-based intensity term η(l i ) describes the unary costs of pixel u i labeled as l i ∈ {F, B}. Basically, the costs of l i = F and l i = B represent the inverse probabilities, and the probability pr(u i , F) of l i = F is computed by: where I G denotes the mean intensity of G, and pr(u i , F) follows a sigmoid probability distribution based on the contrast feature f i . We expect pixels with f i > λ G to more likely represent the foreground. λ G is computed based on f G and f G , which are the mean and minimum of all feature values {f i : u i ∈ G}, and is adjusted by γ G for a balance of foreground and background partitioning in G. The parameter γ G is calculated at runtime, by gradually increasing it from γ 1 to γ 2 with a step value γ , and choosing the smallest γ G ∈[γ 1 , γ 2 ] that does not cause the entire G to be labeled as B. With pr(u i , B) = 1 − pr(u i , F), the cost values for both labels are: Note that since λ G would be closer to f G in most cases with small γ G , it would cause portions of G to have pr(u i , F) < 0.5 (i.e. u i labeled as background), resulting in possible under-segmentation. It is however not advisable to lower λ G , due to considerable overlap between low-intensity areas in G and the background. Therefore, we introduce a second contrast-based detection term to encourage labeling of l i = F. An auxiliary node l G is first included to the graph with the following unary costs: (11) where N G is the number of pixels in G, and such a large cost of l G = B ensures l G is assigned 1. Then for each pixel u i , a pairwise cost ϕ(l i , l G ) is computed based on the contrast ν(I i , I G ) between I i and the mean intensity of G: with δ(l i − l G ) = 1 if l i = l G and 0 otherwise, and ν(I i , I G ) = 1 if I i > I G , or: where · denotes the average Euclidean distances of all such pairwise distances in G. In this way, pixels with pr(u i , F) ≈ 0.5 could be better labeled with the additional cost factor; and obvious background pixels would still obtain the correct B label, with ϕ(l i , l G ) much lower than η(l i ).
The spatial term φ(l i , l i ) then further enhances the delineation by encouraging spatial labeling consistencies between neighboring pixels u i and u i . A pairwise cost for l i = l i is thus defined as: where δ(·) and ν(·) follow Eq. (12). Such a cost function implies that pixels with more similar intensities would be more penalized if they take different labels.

Materials and evaluation methods
Three different datasets that are publicly available with segmentation ground truth are used in this study. Their main properties are summarized in Table 1. The images in the first two datasets were acquired with nuclear markers whereas the third dataset also includes the cytoplasm. Detailed information can be found in [1,50]. Among the three, dataset 1 has higher contrast between cell nuclei and background. Datasets 2 and 3 have large intensity inhomogeneity and considerable degree of intensity overlapping between the cell nuclei and the background.  inclusion of cytoplasm in dataset 3 poses more challenges. The images in dataset 3 are preprocessed to remove the pink areas and converted to grayscale. Figure 6 shows an example image after the preprocessing.
Most parameters used in this study are set to the same values for all three datasets: (1) in Initial Segmentation, v1 = 0.7, v2 = 0.4 and v = 0.1; (2) in Probability Inference, α 1 = 0.6 and α 2 = 0.4; (3) in Appearance Feature, the number of histogram bins is 64, and the number of feature words is 12; and (4) in Contour Refinement, γ 1 = 0.25, γ 2 = 1 and γ = 0.25. While these settings are chosen empirically, using a common setting for all three datasets suggests that the method is robust to different image acquisition and manual tuning of parameters can be minimal. There are only two dataset-specific parameters. One is the patch size in Appearance Feature, which is 8 × 8 pixels for datasets 1 and 2, and 4 × 4 pixels for dataset 3. The smaller size for dataset 3 is chosen due to its smaller cell nuclei compared to datasets 1 and 2. The other parameter is C in Initial Segmentation, which is set to 128 for datasets 1 and 2, and 64 for dataset 3. This ensures the contrast enhanced images in dataset 3 would not become too bright to cause distortion.
For dataset 1, four representative images are selected to train two SVM classifiers, for the cell-cluster differentiation and candidate identification. While testing is performed on all images to make the results directly comparable with the state-of-the-art [14,38], we note that the testing results are not sensitive to the selection of training data, with very similar testing results observed based on different training sets. Similar procedures are performed for dataset 2. For dataset 3, in order to have comparable performance evaluation with [32,35], half of the images are used for training (images # 2, 3, 4, 5, and 7) and the rest for testing.  After initial segmentation (S-1), decluster processing (S-2), and contour refinement (S-3).
We evaluate the localization of cell nuclei by two measures. First, performance of object-level detection is evaluated by recall (R), precision (P), and accuracy (A): then the detection is considered TP [51]; and correspondingly FN and FP are determined. Second, the segmentation performance is evaluated by both region-and contour-based measures, including Dice, normalized sum of distances (NSD) and Hausdorff distance (HD): Here F represents the foreground pixels identified, M is the ground truth mask, and D(u i ) is the minimal Euclidean distance of pixel u i to ∂M of the corresponding reference nuclei, with ∂ indicating the contour.
We have compared with popular cell imaging segmentation techniques, including Otsu thresholding, k-means clustering and watershed [8]. Furthermore, in view of the popularity of level set for cell imaging and our design on tackling the intensity inhomogeneities, we have experimented with a level set method that has a similar focus, using the authors' released code [52], with initial contours generated using watershed method. For all methods, postprocessing is conducted to remove isolated segments that are smaller than 1/10 of the average size of foreground regions detected in the image. In addition, we report direct performance comparisons with the state-of-the-art results reported on the same datasets [14,32,35,38], by including the same performance measures as used in these works.

Cell detection
We report the object-level detection results in Table 2.
Comparing the results at various stages of the methodology, the improvement is larger on dataset 2 than dataset 1, e.g. 8.3% increase in detection accuracy on dataset 2 vs 0.8% increase on dataset 1. This is because inhomogeneity is more prominent on dataset 2 while dataset 1 exhibits clearer contrast between the cell nuclei and the background in most images. In our evaluation, a detection is only considered as TP if the overlap ratio in Eq. (18) is at least 0.5. Therefore, a largely over-or under-segmented object would be counted as FN for the second stage, and corrected after the contour refinement. This explains why although cell nuclei are detected after the decluster processing, the recall results only improve significantly after the third stage. On dataset 3, the presence of cytoplasm causes many cell nuclei to clutter into one region during the initial segmentation; this leads to FN. The third stage better differentiates the cell nuclei and cytoplasm, and the improvement is significant with 18.7% and 3.4% increase in detection recall and precision. Figure 7a gives a better overview of the overlap ratios obtained After initial segmentation (S-1), decluster processing (S-2), and contour refinement (S-3). http://www.biomedcentral.com/1471-2105/14/173 from the final localization outputs. While most cell nuclei exhibit ratios not less than 0.5, less optimal results are observed on dataset 3 again due to the influence from the cytoplasm. The performance improvement introduced by the iterative process of interleaving interest region extraction and image enhancement are shown in Figure 8a. The higher recall (i.e. on average 3.6% increase) suggests that such an approach is especially useful for identifying foreground regions that originally display low contrast from the background. The benefits of having candidate validation are shown in Figure 8b. By filtering out interest regions that are very different from the reference regions, the detection precision thus improves by on average 2%. The recall improves by on average 0.7% only, mainly because of the same constraints imposed by the overlap ratio.
To evaluate the effect of the default threshold setting α 1 for candidate validation, the receiver operating characteristics (ROC) curves are plotted by varying the threshold value. The probability estimates Q(G x , F)/ max x Q(G x , F) from all candidate regions are included in the plot, and candidate regions with at least 0.5 overlapping ratio with the ground truth are marked as foreground class and the rest as the background class. As shown in Figure 9, the 0.6 threshold setting provides a good balance between the TP and FP detections, with close to maximum TP rates. Note that the numbers of true negatives here are small (about 1/5 of positive samples), hence the FP rates appear relatively high. Table 3 summarizes the region-and contour-based segmentation results. On datasets 2 and 3, the decluster processing improves the Dice measure by about 3% and 4%, due to better object-level labeling of candidate regions. The contour-based measures, however, are mainly enhanced at the third stage of the methodology, with on average more than half reduction in NSD and HD. This is attributed to better contour delineations based on the detection results from the first two stages. Besides the mean values listed in the table, the distributions of Hausdorff distances on the final localization results are also shown in Figure 7b.

Nucleus segmentation
To further evaluate the design of the graphical model for contour refinement, the foreground probabilities for all pixels of interest are computed with the intensity term, as summarized in Figure 10. While many pixels exhibit suitable probabilities, some background pixels, especially those in datasets 2 and 3, have larger foreground probabilities and would lead to misclassification. The pixel-level classification is improved by introducing the contrast-based detection and spatial terms, as shown in Figure 11.

Performance comparison
The localization results using the standard approaches are listed in Table 4, with example outputs shown in Figure 6. Compared to our proposed method, the level set and watershed techniques produce the second best results for dataset 1, especially with good contour-based measures. However, without explicitly handling highintensity background regions, both methods result in about 3% lower detection precision. On dataset 2, our proposed method demonstrates stronger advantages, with 8.5% increase in detection accuracy,10.2% increase in Dice coefficient and 10.8 decrease in HD over the second best approach (i.e. level set). Both the level set and watershed approaches face the following challenges: (1) difficulty separating cell nuclei from surrounding background areas with low contrast, and (2) incapability of classifying background regions that resemble cell nuclei. On dataset 3, the intensity inhomogeneities within the cell nuclei and the cytoplasm make it particularly difficult to achieve good segmentation. As a result, the watershed method tends to largely over-segment the cell nuclei, generating many clusters and cause low detection recall and more errors in contour delineation. The level set method based on localized energy optimization is quite effective in splitting the clusters, but is less optimal for areas with high similarity between the cell nuclei and cytoplasm. The thresholding method does not perform as well as the level set or watershed approaches, but it does outperform the clustering-based approach. Compared to level set, our method achieves 7% increase in detection accuracy, 2.7% increase in Dice coefficient and 1.8 decrease in HD. Tables 2, 3 and 4 show that our proposed method delivers better localization even using only the initial segmentation step. Higher performance margins are obtained with decluster processing and contour refinement, especially on datasets 2 and 3.
A comparison with the state-of-the-art results reported for the same datasets is summarized in Table 5. Our method achieves better results in most measures, as bold-faced in the table. On dataset 1, 0.93 more FP cell nuclei are detected compared to the level set method [14]. It is possible that such false detections are caused by accidental highlighting of background regions during the iterative image enhancement for the initial segmentation stage. However, our method exhibits overall much better detection performance with minimal numbers of FNs (3.68 fewer than [14]) and only 1.43 FPs. The accuracy of pixel-level segmentation on dataset 2 improves significantly, as indicated by the 5% increase in Dice and Rand indices over [14]. 4.1% performance improvement of object-level accuracy over [35] on dataset 3 is also obtained. These observations suggest that our method is indeed quite effective in handling the intensity inhomogeneity issue that is the major cause hindering satisfactory segmentation on datasets 2 and 3. The improvement on the contour-based measures, i.e. on average 0.03 NSD decrease and 1.45 HD decrease over [14], also demonstrate the suitability of boundary delineation using region-based designs, i.e. the salient region extraction and graphical model-based contour refinement.
Our method is currently implemented in Matlab, running on a standard PC with a 2.66-GHz dual core CPU and 3.6 GB RAM. The computational time is related to the number of cells and the size of cells in an image. On a 1344×1024 pixel image with about 40 cell nuclei, an average 35 s is needed for the entire localization process. This is faster than applying the level set method [52], which requires about 45 s with 10 iterations.

Conclusions
A fully automatic localization method for cell nuclei in microscopic images is presented in this paper. Intensity inhomogeneities in cell nuclei and the background often cause unsatisfactory localization performance. Not many works have been reported to address this problem in a robust manner. We propose a method that exploits various scales of data-adaptive information to tackle the intensity inhomogeneity. First, the regions of interest, i.e. cell nuclei or clusters, are extracted as salient regions with iterative contrast enhancement. Then with featurebased classification and reference-based probability inference, the clusters are further processed to detect more Comparison between our proposed region-based progressive localization method (RPL), Otsu thresholding (OT), k-means clustering (KM), watershed (WS) and level set [52] (LS). http://www.biomedcentral.com/1471-2105/14/173 cell nuclei and filter out spurious regions. Lastly, based on regional contrast information encoded in a graphical model, the pixel-level segmentation is enhanced to create the final contours. This region-based progressive localization (RPL) method has been successfully applied to three publicly available datasets, showing good object-level detection and region-and contour-based segmentation results. Compared to popular approaches in this problem domain such as level sets, our method achieved consistently better performance, with on average 5.2% increase in Dice coefficient and 6% increase in object-level detection accuracy. Our method also outperformed the stateof-the-art with on average 3.5% and 7% improvement of region-and contour-based segmentation measures. We also suggest that the proposed method is general in nature and can be applied to other localization problems, as long as the objects of interest can be modeled as salient regions with measurable contrast from the background. As a future study, we will investigate improving the graphical model for better contour delineation. A potential approach is to incorporate an additional term as the cost of difference between the model image and the measured image, as inspired by [24]. The model image could be derived as a convolution of a point-spread function of the microscope with an object intensity function defined based on the pixel labels. We will also investigate replacing the pixel-wise labeling with region-level processing for computational efficiency while maintaining the segmentation accuracy. Other future work could explore the applicability of the proposed method on other types of images. Images with nuclear membrane marker and different nuclear markers such as the green fluorescent protein (GFP), and those with higher resolution or dimension, are of particular interest. To accommodate the specific characteristics of these images, possible changes to the method are to design more comprehensive intensity and texture features to differentiate among cell structures and background, and to enhance the contour refinement with boundary constraints.