We present the proposed cell segmentation method in this section. Effective contour detection is the first step of most region-based image segmentation methods [20–22]. We start by introducing a structured random forest (SRF) based method for fast and accurate muscle contour detection. SRF is selected because of its: 1) fast prediction on high-dimensional data, 2) robustness to label noise [23], and 3) native support for outputs of arbitrary size. Next, a superpixel algorithm is used to generate region candidates. Finally, we present a hierarchical tree based method that selects the optimal candidate regions using a CRF. Fig. 2 shows the entire process.

### Contour detection

The random forest (RF) classifier is an ensemble learning technique that combines *t* decision trees into a forest \(\mathcal {F} = \left \{T_{j}\right \}_{j=1}^{t}\) [24]. Each tree *T*_{j} is trained independently, and the final classification is determined by majority voting over the outputs of all trees.
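The majority-voting step above can be sketched as follows (a minimal illustration; the function name `forest_predict` and the toy labels are ours, not part of the method):

```python
import numpy as np

def forest_predict(tree_predictions):
    """Majority vote over the per-tree class predictions.

    tree_predictions: array of shape (t, n_samples), one row of
    integer class labels per tree in the forest.
    """
    votes = np.asarray(tree_predictions)
    # For each sample, pick the most frequent label across the t trees.
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Three trees voting on four samples: the forest outputs the mode.
preds = forest_predict([[0, 1, 1, 0],
                        [0, 1, 0, 0],
                        [1, 1, 1, 0]])
print(preds)  # [0 1 1 0]
```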

However, conventional RF cannot capture the inherent contour structures inside local image patches, so it is difficult to obtain satisfactory contour detection performance [25]. In order to capture rich contour structures during training, we propose to deploy SRF [26], a variant of RF, to detect the muscle cell contours. SRF is trained with a set of training data \(D = \{(x,y) \in \mathcal {X} \times \mathcal {Y}\}\), where \(\mathcal {X} = \mathbb {R}^{(d \cdot d) \times c}\) is the feature space of a *d*×*d* image patch, so that each pixel in the image patch is described by a *c*-dimensional feature vector. The structured label \(y \in \mathcal {Y} \subseteq \mathbb {Z}^{d \cdot d}\) corresponding to *x* is a patch cropped from the ground truth image, which is a binary image having the value 1 at contour pixels and 0 otherwise.
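Extracting (feature patch, structured label) pairs of this form can be sketched as below, a toy sampler under our own naming (`sample_patches`) with random data standing in for real feature channels and ground-truth contours:

```python
import numpy as np

def sample_patches(image, gt_contours, d, n, rng):
    """Crop n (feature patch, structured label) pairs of size d x d.

    image: (H, W, c) feature channels; gt_contours: (H, W) binary
    ground-truth contour map (1 on contour pixels, 0 elsewhere).
    Patches are flattened to d*d rows, matching X = R^{(d*d) x c}.
    """
    H, W = gt_contours.shape
    X, Y = [], []
    for _ in range(n):
        r = rng.integers(0, H - d + 1)
        col = rng.integers(0, W - d + 1)
        X.append(image[r:r + d, col:col + d].reshape(d * d, -1))
        Y.append(gt_contours[r:r + d, col:col + d].reshape(d * d))
    return np.stack(X), np.stack(Y)

rng = np.random.default_rng(0)
img = rng.random((64, 64, 13))              # c = 13 channels, as in the text
gt = (rng.random((64, 64)) > 0.9).astype(int)
X, Y = sample_patches(img, gt, d=16, n=8, rng=rng)
print(X.shape, Y.shape)  # (8, 256, 13) (8, 256)
```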

To enable the training of SRF with structured labels, in node *i* where training data *D*_{i} falls, we adopt the mapping function proposed in [26] to map the structured label of each *x*∈*D*_{i} into a discrete space, which intrinsically considers the contour structure information. Then a split function *h*(*x*,*θ*)=**1**[*x*(*k*)<*τ*] splits and propagates the data *D*_{i}⊂*X*×*Y* to the left *L* (when *h*=0) or right *R* (*h*=1) subtree of node *i*, the same node splitting procedure as in RF. The parameters *τ* and *k* are determined by maximizing the standard information gain criterion *C*_{i} at node *i* [24]:

$$ C_{i} = H(D_{i}) - \sum\limits_{o \in \{L,R\}} \frac{|D_{i}^{o}|}{|D_{i}|}H\left(D_{i}^{o}\right), $$

(1)

where *H*(*D*_{i}) is the Gini impurity measure, \(H(D_{i}) = \sum _{l} c_{l}(1-c_{l})\), and *c*_{l} denotes the proportion of data in *D*_{i} with label *l*. After the data in *D*_{i} is propagated to the child nodes, the above steps are performed recursively until the leaf nodes are reached (i.e., the stopping criterion is satisfied [24]). The most representative structured label *y* (closest to the mean) in each leaf node is stored as its structured prediction [27].
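The split criterion in Eq. (1) can be made concrete with a small sketch (function names `gini` and `info_gain` are ours; labels here are the discrete codes produced by the mapping, not the structured labels themselves):

```python
import numpy as np

def gini(labels):
    """Gini impurity H(D) = sum_l c_l * (1 - c_l)."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1.0 - p)))

def info_gain(x_k, labels, tau):
    """Information gain C_i for the split h(x, theta) = 1[x(k) < tau]."""
    left = labels[x_k < tau]
    right = labels[x_k >= tau]
    n = len(labels)
    return gini(labels) - (len(left) / n) * gini(left) \
                        - (len(right) / n) * gini(right)

# A perfectly separating threshold recovers the full impurity as gain.
x = np.array([0.1, 0.2, 0.8, 0.9])
y = np.array([0, 0, 1, 1])
print(round(info_gain(x, y, tau=0.5), 3))  # 0.5
```

At each node, the training procedure searches over candidate (*k*,*τ*) pairs and keeps the one maximizing this gain.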

In practice, following [25], we utilize three color channels computed in the CIE-LAB color space, two gradient magnitude channels computed with varying amounts of blur (Gaussian blurs with *σ*=0 and *σ*=1.5), and eight orientation channels at two image scales to represent the features of image patches. In total, *c*=13 channels in \(\mathcal {X}\) are extracted using the optimized code from [28] available online^{Footnote 1}. To prevent overfitting when training SRF, each tree randomly selects a subset of the training samples and features. In the testing stage (see Fig. 2), since the prediction of each tree for each pixel is independent, we parallelize this stage using multi-threading [26].
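As one illustrative piece of this channel computation, the two gradient magnitude channels could be computed as below (a sketch under our own naming, not the optimized code of [28]; the full pipeline additionally stacks the 3 CIE-LAB color channels and 8 orientation channels to reach *c*=13):

```python
import numpy as np
from scipy import ndimage

def gradient_channels(gray, sigmas=(0.0, 1.5)):
    """Gradient-magnitude channels at several blur levels.

    gray: (H, W) single channel. Returns (H, W, len(sigmas)),
    one gradient-magnitude map per Gaussian blur sigma.
    """
    chans = []
    for s in sigmas:
        blurred = ndimage.gaussian_filter(gray, sigma=s)  # sigma=0 is a no-op
        gy, gx = np.gradient(blurred)
        chans.append(np.hypot(gx, gy))
    return np.stack(chans, axis=-1)

g = np.random.default_rng(1).random((32, 32))
feat = gradient_channels(g)
print(feat.shape)  # (32, 32, 2)
```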

### Region candidate generation

Based on the contour image produced by our SRF contour detector, region candidates can be generated using superpixel techniques, which group similar pixels in terms of color, edge strength (referring to our detected contour image), and spatial cues.

In this paper we use the well-known oriented watershed transform and ultrametric contour map (OWT-UCM) algorithm [29] to obtain region candidate maps, for three main reasons: 1) it handles large-scale images efficiently; 2) the regions in a map are well nested across different thresholds; 3) it guarantees that the boundaries of each region are closed and single-pixel wide. These characteristics facilitate the parallelization of the subsequent hierarchical tree based region selection algorithm. OWT-UCM takes a contour image as input and outputs an over-segmented region candidate map [30], as illustrated in Fig. 2. The next step is to select regions from this map using our proposed hierarchical tree-based region selection algorithm.
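The nesting property (reason 2) can be demonstrated on a toy boundary-strength map: thresholding at a higher level only merges regions, never splits them, so every coarse region is a union of fine regions. This sketch uses connected-component labeling as a stand-in for the real OWT-UCM machinery of [29]:

```python
import numpy as np
from scipy import ndimage

# Toy "UCM" on an 8x8 grid: pixel values are boundary strengths.
ucm = np.zeros((8, 8))
ucm[:, 3] = 0.3   # weak boundary splitting left / right halves
ucm[1, :4] = 0.8  # strong boundary splitting the left half again

def regions_at(ucm, thresh):
    """Regions whose separating boundaries are at most `thresh` strong."""
    interior = ucm <= thresh          # boundaries above thresh remain closed
    labels, n = ndimage.label(interior)
    return labels, n

_, n_fine = regions_at(ucm, 0.2)      # both boundaries active: 3 regions
_, n_coarse = regions_at(ucm, 0.5)    # only the strong boundary: 1 region
print(n_fine, n_coarse)  # 3 1
```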

### Hierarchical tree-based region selection

Given the over-segmented region candidate map, our region selection algorithm selects region candidates as the final segmentation by merging or discarding segments in the map.

First, we build a hierarchical tree structure from the region candidate map. The leaf nodes represent the initial regions of the region candidate map, and the regions are merged pairwise using a simple hierarchical clustering algorithm to construct the tree. The root node of the tree corresponds to the whole image region, and each node in the tree is a candidate region. The tree structure satisfies the “non-overlapping” criterion, a common property of hierarchical tree based segmentation methods [12, 31, 32].

Suppose there are *N* base candidate regions; the total number of nodes in the tree is then 2*N*−1. We denote by *R*={*R*_{1},*R*_{2},…,*R*_{2N−1}} the region candidate map consisting of the set of region candidates *R*_{i}. Our goal is to select nodes in the tree as our final muscle cell segments. We show that this can be achieved with the conditional random field (CRF) algorithm [33].
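The 2*N*−1 node count follows directly from pairwise merging: *N* leaves plus *N*−1 internal merge nodes. A quick sketch with agglomerative clustering (the random 5-dimensional vectors are placeholders for real region descriptors):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Each of the N base regions is summarized by a feature vector.
N = 6
rng = np.random.default_rng(2)
region_features = rng.random((N, 5))

# Agglomerative clustering merges regions pairwise; linkage returns
# one row per merge, i.e. N - 1 internal nodes above the N leaves.
Z = linkage(region_features, method="average")
n_nodes = N + Z.shape[0]
print(n_nodes)  # 11, i.e. 2N - 1
```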

CRF has been widely used in image segmentation. It is a probabilistic graphical model that maximizes a posterior probability given a defined energy function. In our method, the energy function is defined as

$$ E(R) = \sum\limits_{i=1}^{2N{-}1} U_{i}(R_{i}) + \sum\limits_{(i,j) \in \hat{R}} V_{i}\left(R_{i},R_{j}\right), $$

(2)

where \(\hat {R}\) is the subset of *R* that contains all adjacent regions (i.e., leaf nodes sharing a common parent node) among the leaves of the hierarchical tree. *U*_{i}(*R*_{i}) is the unary term for region *R*_{i}, a score evaluating the probability that *R*_{i} covers a complete cell segment. We adopt our previously developed method [12] to evaluate *U*_{i} by training a cell scoring classifier, which assigns a probability value indicating whether a segment is a good region candidate. In brief, a set of features based on multiple cues represents each candidate region, and a standard RF classifier is trained to classify the cell regions. *V*_{i}(*R*_{i},*R*_{j}) is the pairwise term evaluating the dissimilarity between two regions *R*_{i} and *R*_{j}. We define *V*_{i}(*R*_{i},*R*_{j}) as

$$ V_{i}\left(R_{i},R_{j}\right) = \mu e^{-B\left(R_{i},R_{j}\right)} \times L\left(R_{i},R_{j}\right), $$

(3)

where *B*(*R*_{i},*R*_{j}) is the boundary strength, *L*(*R*_{i},*R*_{j}) is the boundary length, and *μ* is a constant that trades off the contribution of the two terms. Both terms can be calculated from the single-pixel wide, closed region candidate maps generated by OWT-UCM [29].
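Evaluating Eqs. (2) and (3) for a given 0/1 labeling could look like the following toy sketch (names and data are ours; we assume the unary and pairwise terms contribute only for selected regions, an interpretation of the label dependence that the pylon model formalizes):

```python
import numpy as np

def pairwise_term(B, L, mu=1.0):
    """V(R_i, R_j) = mu * exp(-B(R_i, R_j)) * L(R_i, R_j), Eq. (3)."""
    return mu * np.exp(-B) * L

def energy(labels, unary, adj):
    """Toy evaluation of E(R), Eq. (2), for a given 0/1 labeling.

    labels: {i: 0 or 1}; unary: {i: U_i}; adj: {(i, j): (B, L)}
    with boundary strength B and boundary length L per adjacent pair.
    """
    E = sum(unary[i] for i, l in labels.items() if l == 1)
    E += sum(pairwise_term(B, L)
             for (i, j), (B, L) in adj.items()
             if labels[i] == 1 and labels[j] == 1)
    return E

labels = {0: 1, 1: 1, 2: 0}
unary = {0: 0.2, 1: 0.5, 2: 0.9}
adj = {(0, 1): (2.0, 10.0), (1, 2): (0.5, 4.0)}
print(round(energy(labels, unary, adj), 3))  # 0.7 + exp(-2) * 10 = 2.053
```

A strong boundary (large *B*) drives the pairwise penalty toward zero, so dissimilar adjacent regions are cheap to keep apart.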

The inference procedure minimizes the energy function *E* so as to assign a label to each region node (1 means the region is a complete cell segment and 0 otherwise) while satisfying the “non-overlapping” criterion, i.e., only one node along each root-to-leaf path can be labeled 1. We deploy the pylon model, a hierarchical CRF model, to minimize *E* [34]. However, the tree becomes very large as the number of initial segments increases. In the next section, we propose a strategy that divides the inference procedure into several sub-problems which can be parallelized using multi-core programming.
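The non-overlapping constraint admits a simple tree recursion, sketched below with unary costs only (names and the toy tree are ours; this is not the pylon model of [34], which additionally handles the pairwise terms via graph cuts):

```python
def best_selection(node, U, children):
    """Minimum-cost selection of non-overlapping tree nodes.

    At each node, either the node itself is selected (cost U[node])
    or the choice is delegated to its subtrees, so no ancestor and
    descendant are ever selected together.
    """
    kids = children.get(node, [])
    if not kids:
        return U[node], [node]
    child_cost, child_sel = 0.0, []
    for c in kids:
        cost, sel = best_selection(c, U, children)
        child_cost += cost
        child_sel += sel
    if U[node] <= child_cost:
        return U[node], [node]
    return child_cost, child_sel

# Toy tree: root 4 has children 3 (internal) and 2 (leaf); 3 has leaves 0, 1.
children = {4: [3, 2], 3: [0, 1]}
U = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.5, 4: 2.0}
cost, sel = best_selection(4, U, children)
print(cost, sorted(sel))  # 0.7 [2, 3]
```

Because each subtree is solved independently, this recursion also hints at why the inference can be split into parallel sub-problems, as developed in the next section.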