The salient watershed algorithm
Given an EM image to segment, the first step is to produce accurate boundarypreserving superpixels. While many algorithms exist for this purpose, the classical watershed algorithm [16] is a natural choice due to its ease of use, efficiency, and scalability. Unfortunately, the standard watershed algorithm suffers from two significant problems: oversegmentation and leakage. Oversegmentation can usually be corrected with postprocessing steps (such as region merging); however, to extract regions from EM images that correspond to precise cellular structures, fixing leakage in the initial segmentation is critical. While datasetdependent heuristics may help resolve leakage, this does not address the general problem of watershed leakage when segmenting images across different EM preparations and imaging conditions.
To tackle these issues, we propose a novel variant of the watershed algorithm called Salient Watershed. The steps of our algorithm are:

1.
Denoise the image. We use nonlocalmeans smoothing [22] to both reduce the impact of local noise when detecting boundaries and to reduce unnecessary oversegmentation. In particular, we preprocess the original input image I with a 3 × 3 pixelwide nonlocalmeans filter [22] to obtain I _{
nl
} (Figure 2B).

2.
Detect highconfidence boundaries. First, we apply the Canny edge detector [20] on I _{
nl
} to obtain \nabla {I}_{\mathit{\text{nl}}}^{\mathit{\text{canny}}}. Second, we compute the Pb detector [21] on I _{
nl
} for a coarse estimate of boundary probabilities {I}_{\mathit{\text{nl}}}^{\mathit{\text{pb}}}, and then we compute an edge map \nabla {I}_{\mathit{\text{nl}}}^{\mathit{\text{pb}}} by thresholding {I}_{\mathit{\text{nl}}}^{\mathit{\text{pb}}} at a conservative threshold (1/200). Third, we combine these edges into a hybrid salient edge map via pixelwise multiplication: \nabla {I}_{\mathit{\text{nl}}}^{\mathit{\text{salient}}}=\nabla {I}_{\mathit{\text{nl}}}^{\mathit{\text{pb}}}.\ast \nabla {I}_{\mathit{\text{nl}}}^{\mathit{\text{canny}}} (Figure 2C). It has been previously shown that the probabilistic Pb edge detector [21] by itself cannot adequately segment EM images without retraining on specific type of images [5]. Combining the Canny and Pb boundary detectors gives us the ability to find highlikelihood salient boundaries that retain precise edge localization without resorting to parameter tuning or retraining for different kinds of tissue samples.

3.
Elevate watershed levels where Canny and Pb coincide. Next, we compute the Euclidean distance transform on \nabla {I}_{\mathit{\text{nl}}}^{\mathit{\text{salient}}} to obtain {I}_{\mathit{\text{dist}}}^{\mathit{\text{salient}}} and then compute an enhanced edge map {I}^{\mathit{\text{enhance}}}={e}^{2\ast {I}_{\mathit{\text{dist}}}^{\mathit{\text{salient}}}} (Figure 2D). This step elevates the watershed along the intersection of Canny and Pb lines and provides an exponential falloff as the distance to these lines increases. It also helps bridge small gaps that may exist in the boundaries.

4.
Run watershed on the enhanced image. Finally, we apply the classical watershed algorithm on I ^{enhance} to obtain the final oversegmented image {I}_{\mathit{\text{Ws}}}^{\mathit{\text{enhance}}} (Figure 2E).
By incorporating the notion of edge saliency into the watershed computation, we ensure that salient boundaries are preserved. This addresses the leakage problem consistently. While this procedure adds additional computational complexity to the original watershed procedure, Salient Watershed is a more robust algorithm that can be applied to many EM datasets to produce a firstpass segmentation without tuning parameters.
This algorithm produces an initial (oversegmented) set of superpixels (regions), which are then further collapsed using an agglomerative merging algorithm, as described below.
The region merging algorithm
Region merging is often performed after superpixels are generated to collapse neighboring regions. There are three aspects to region merging: the features used to represent each region, a measure of similarity between regions in featurespace, and an objective function for merging regions. We describe each of these aspects below.
Each region is defined by a normalized intensity histogram and a set of normalized texture histograms computed using pixel values in the region. Texture is an important cue used by humans when manually segmenting and annotating EM images [23], and its use has become popular in many computer vision tasks today [24]. Varma and Zisserman [25] proposed an effective set of 38 filters (6 orientations × 3 scales × 2 oriented filters + 2 isotropic filters), but only recorded the maximum filter response across each orientation, leading to 8 total filter responses at each pixel. Each region is thus represented by a b × 9 feature matrix, where b = 32 is the number of bins in each histogram.
Most previous approaches compute the similarity between two regions in featurespace based on the Euclidean or Manhattan distances [26], by comparing means and standard deviations of feature vectors [27, 28], or using informationtheoretic measures [29]. The downside of these measures is that they treat each histogram bin independently and, as a result, two histograms that differ slightly in adjacent bins are treated as equally distant as two histograms that differ equally in farapart bins. To avoid this problem, we use the Earth Mover’s Distance (EMD) [30], which computes the minimum cost to transform one histogram to exactly match the other using transformation costs that depend on the linear distance between bins. EMD can be solved quickly using a constrained bipartite network flow routine [31]. Overall, the similarity between two adjacent regions r and r^{′} is defined as:
\begin{array}{ll}\phantom{\rule{6.5pt}{0ex}}s(r,{r}^{\prime})& =exp(min({r}_{\text{size}},{r}_{\text{size}}^{\prime}\left)\right)+exp(\text{EMD}({\text{Int}}_{r},{\text{Int}}_{{r}^{\prime}})\phantom{\rule{2em}{0ex}}\\ \phantom{\rule{1em}{0ex}}\alpha \sum _{i=1}^{8}\text{EMD}({\text{Text}}_{r,i},{\text{Text}}_{{r}^{\prime},i})),\phantom{\rule{2em}{0ex}}\end{array}
(1)
where the first term biases towards collapsing smaller regions; Int_{
r
} is the normalized intensity histogram of region r; Text_{r,i} is the i^{th} normalized texture histogram of region r; and α is a parameter to weigh the contribution of the texture component (we set α = 1/8). We use EMD to compute the similarity between both normalized features (intensity and texture), and thus born terms lie on roughly the same scale.
The final aspect of the algorithm is the merging optimization function [26]. We define a predicate that states that every region r should be "sufficiently" different compared to each of its neighbors. Formally:
\begin{array}{l}\mathcal{P}\left(r\right)=\left\{\begin{array}{ll}\text{true}& \phantom{\rule{2em}{0ex}}\text{if}\phantom{\rule{2.77626pt}{0ex}}s(r,{r}^{\prime})\le \tau ,\forall {r}^{\prime}\in N\left(r\right)\\ \text{false}& \phantom{\rule{2em}{0ex}}\text{otherwise},\end{array}\right.\end{array}
(2)
Algorithm 1 RegionMerging(I,L,NumSPs)
where N(r) are the regions adjacent to r. If this statement is true for region r, we call r an "island". We seek to find a segmentation such that holds for every region. In graphtheoretic terms, we start with the region adjacency graph G = (V,E), defined by nodes V (regions) and with edges E connecting adjacent regions. To merge two regions means to contract the edge between them; our goal is thus to find a set of edges whose contraction results in a graph satisfying for every region. We find such a set using a greedy agglomerative algorithm: we start with the regions produced by the Salient Watershed algorithm, and iteratively merge the pair of neighboring regions that are most similar. This process can stop either when the similarity between any two adjacent regions is < τ (at which point every region is guaranteed to be an island according to τ) or when the desired number of superpixels is met (as we do here). Pseudocode of the region merging algorithm is shown in Algorithm 1.
Comparing segmentations versus groundtruth
To evaluate performance, we performed experiments and collected three 1000 × 1000pixel EM images of the nervous system: 2 images were from the fruit fly ventral nerve cord fixed using a high pressure freezing (HPF) protocol, and 1 image was from the mouse cortex using a perfusion DABbased protocol (e.g. [32]). We manually segmented membranes, mitochondria, and other neuronal structures in these images (Figure 1A and 1B) and extracted groundtruth boundary matrices for each. We also collected two additional images of the mouse cortex using HPF, which we analyzed qualitatively.
To compare an algorithm’s segmentation P with the groundtruth Q, we use two standard metrics: the asymmetric partition distance (APD) and the symmetric partition distance (SPD) [33]. APD(P,Q) computes, over all regions r ∈ P, the maximum percentage of pixels in r that map onto a single groundtruth segment. SPD(P,Q) finds the maximal matching between regions in P and Q and computes the overall percentage of pixels that must be deleted from both images in order to make each pair of matched regions equivalent. APD penalizes "spillover" of segments across groundtruth boundaries, but does not penalize oversegmentation. On the other hand, SPD measures exact 11 correspondence between segmentations and does penalize oversegmentation. We report 1 SPD(P,Q) as a percentage, so in both measures higher percentages are better.