### Dataset and computing environments

We used two datasets for testing. The first is provided by Wang et al. [11], and it consists of genetic labeling chandelier cells imaged by the fMOST system with a resolution of 0.2 × 0.2 × 1 μm^{3}. The second is provided by Zhang et al. [23] and Jiang et al. [24], and it consists of pyramidal neurons at the same resolution. Two mice were used, one for the chandelier cells and the other for the pyramidal neurons. The mice were anesthetized deeply using a 1% solution of sodium pentobarbital and sacrificed by injection of ketamine. The semiautomatic neuron tracing used in the quantitative evaluation was performed with Amira software (version 6.1.1; FEI, Mérignac Cedex, France); the code of this method was written in MATLAB (R2019a, MathWorks) and run on a graphics workstation (Xeon CPU E5–2687 v3).

### Workflow of the proposed method

Figure 1 depicts the computational workflow, including its three stages. The skeleton optimization method is the second stage, and the first and third stages are the pre- and postprocessing of the neuronal morphology, respectively.

The first phase is to extract a branch from the digital skeleton of the tree structure (Steps 1–3 in Fig. 1). Specifically, we linearly interpolate the digital skeleton and select the branch to be adjusted. The second stage involves fine-tuning the points derived from the digital skeleton to the centerline of the neuronal fibers from the images (Steps 4–5 in Fig. 1). Specifically, we initially adjust the digital skeleton points that are far from the centerline of the neuronal fibers based on the image grayscale. Subsequently, provided that the neuron fiber shape is smooth and continuous, we describe this physical property with two shape restrictions. One restriction is designed based on the image grayscale: the difference in the signal brightness between the local voxels around the neuronal fiber centerline must be small. The other is designed based on geometry: the angle formed from three continuous points on the neuronal fiber centerline must not be extreme. These two restrictions are designed to finely adjust the digital skeleton points to the neuronal fiber centerline. Subsequently, we join the branches into a tree structure according to the connected relationships, which comprises the third stage (Step 6 in Fig. 1). Finally, the digital skeleton yields a digital neuron morphological file.

With regard to the above three stages, the second stage is a looping structure. See the flowchart in Fig. 1 for details.

### Skeleton point adjustment based on grayscale

According to the grayscale distribution of the voxels around the neuronal fibers, a preliminary adjustment of the digital skeleton points that are far away from the neuronal fiber centerline shown in the image is performed. We take advantage of the idea of using the mean shift to find the location of this centerline point. Because of anisotropy in the image stacks, we linearly interpolate the three-dimensional images in the *Z* direction to obtain images with the same resolution in the *X*, *Y* and *Z* directions. For a skeleton point *p*_{0}(*x*_{0}, *y*_{0}, *z*_{0}), take an adjustment in the *Z* direction as an example. We keep the *X* and *Y* coordinates unchanged and calculate the convolution of the voxels in the local range centered on *p*_{0} and the Gaussian kernel function in the *Z* direction. The point that has the largest convolution value is regarded as the centerline point *p*_{1} of the neuronal fibers from the images.

$$ {p}_1=\arg \underset{p}{\max } Gau\otimes G\left({p}_0\right) $$

(1)

Similarly, the *X* and *Y* coordinates of the skeleton point *p*_{1} are adjusted to the centerline point to finally obtain the adjusted point *p*_{2}.

### Skeleton point adjustment based on shape restrictions

To describe the tortuous neuronal fiber morphology with more accurate digital skeletons, it is necessary to fine-tune the skeleton points after the preliminary adjustment. The neuronal fibers are smooth and continuous, and this neuronal fiber property is shown in the images. Considering this property, for the point *p*_{2}, we designed an angle restriction and brightness restriction to judge whether *p*_{2} needs further fine adjustment. Specifically, for *p*_{2}, we determined its parent point *p*_{f}, the parent point of *p*_{f}, *p*_{ff}, and the child point *p*_{c}. The point *p*_{2} forms an angle ∠*p*_{f}*p*_{2}*p*_{c} with *p*_{f} and *p*_{c}. Because the neuron fibers are smooth, the angle ∠*p*_{f}*p*_{2}*p*_{c} tends to be close to 180°. We set the critical value of the angle *α*_{std} as an obtuse angle (135° is chosen in this dataset). If ∠*p*_{f}*p*_{2}*p*_{c} < *α*_{std}, *p*_{2} is considered to require further adjustment. For the brightness restriction, we compared the signal strength \( {G}_{p_2} \) at point *p*_{2} with the signal strength threshold *G*_{std}, thus designing *G*_{std} as follows:

$$ {G}_{std}=\max \left(\left({G}_{p_f}+{G}_{p_2}+{G}_{p_c}\right)/3,\kern0.5em {{G_B}_{ranch}}_{\_ mean}+{G}_{Branch\_ stdev}\right) $$

(2)

Here, \( {G}_{p_f} \) is the signal strength of *p*_{f}, taking the grayscale value of *p*_{f} in the same way as for \( {G}_{p_2} \) and \( {G}_{p_c} \). *G*_{Branch _ mean} and *G*_{Branch _ stdev} are the mean and standard deviation of the signal strength values of all points on the current branch. If \( {G}_{p_2}>{G}_{std} \), the signal strength of *p*_{2} is in accordance with that of the current branch. However, if \( {G}_{p_2}<{G}_{std} \), *p*_{2} is identified as a point that requires further adjustment. If its signal strength is small, this is contrary to the fact that the signal strength at the center of the neuronal fiber in images tends to be relatively strong.

According to these restrictions, we set out to find the centerline points of the neuronal fiber from the images. Prior to adjusting *p*_{2}, because *p*_{2} itself may be a biased point, we have to judge whether we need to find a new starting point other than *p*_{2} for subsequent computation. Here, if ∠*p*_{f}*p*_{2}*p*_{c} > *α*_{std} and \( {G}_{p_2}<{G}_{std} \), then there is only a problem with the brightness of point *p*_{2}. In this situation, the starting point *p*_{2 _ new} will be equal to *p*_{2}. If ∠*p*_{f}*p*_{2}*p*_{c} < *α*_{std}, this indicates that there is a serious deviation in the angle pertaining to *p*_{2}, and such a point is likely to have a problem with both brightness and angle. Therefore, we have to find a new starting point *p*_{2 _ new}. The assumption here is that \( \left\Vert \overrightarrow{p_2{p}_f}\right\Vert <\left\Vert \overrightarrow{p_2{p}_c}\right\Vert \). In the \( \overrightarrow{p_2{p}_c} \) direction, with *p*_{2} as the starting point, we draw \( \left\Vert \overrightarrow{p_2{p}_{rc}}\right\Vert =\left\Vert \overrightarrow{p_2{p}_f}\right\Vert \) and obtain a point *p*_{rc} on \( \overrightarrow{p_2{p}_c} \). The intersection of the vertical bisector of \( \overrightarrow{p_f{p}_{rc}} \) and \( \overrightarrow{p_f{p}_{rc}} \) is defined as the point *p*_{mid}. In the \( \overrightarrow{p_f{p}_{ff}} \) direction, we draw \( \left\Vert \overrightarrow{p_f{p}_s}\right\Vert =\left\Vert \overrightarrow{p_2{p}_f}\right\Vert \) and obtain a point *p*_{s} on the extension line of \( \overrightarrow{p_f{p}_{ff}} \) (the topological graph is shown in Figure S1 A of the supplementary materials). Then, *p*_{2 _ new} is located as:

$$ {p}_{2\_ new}=\left({G}_{p_{mid}}\cdot {p}_{mid}+{G}_{p_2}\cdot {p}_2+{G}_{p_s}\cdot {p}_s\right)/\left({G}_{p_{mid}}+{G}_{p_2}+{G}_{p_s}\right) $$

(3)

Here, \( {G}_{p_{mid}} \) represents the signal strength of *p*_{mid}, taking the grayscale value of *p*_{mid} in the same way as for \( {G}_{p_2} \) and \( {G}_{p_s} \). With the starting point *p*_{2 _ new}, we use the idea of the mean drift to search for the centerline points of the neuronal fiber from the images. The drift direction \( {\overset{\rightharpoonup }{\upsilon}}_2 \) in a spherical region *S*_{p} centered on *p*_{2 _ new} is calculated as follows:

$$ {\overset{\rightharpoonup }{\upsilon}}_2=\sum \limits_{p_i\in {S}_p}{a}_i\cdot {b}_i\cdot \overrightarrow{p_i{p}_{2\_ new}}/\left\Vert \overrightarrow{p_i{p}_{2\_ new}}\right\Vert $$

(4)

where *p*_{i} is the *ith* voxel in the spherical region *S*_{p}. \( {a}_{p_i} \) is the angle factor of *p*_{i}, \( {a}_{{\mathrm{p}}_i}=\sin {\theta_{p_i}}^2 \), where \( {\theta}_{p_i} \) is the angle between the vectors \( \overrightarrow{p_f{p}_{rc}} \) and \( \overrightarrow{p_{2\_ new}{p}_i} \), and \( {b}_{p_i} \) is the signal strength factor of *p*_{i}, \( {b}_{p_i}=\exp \left(-{\left({G}_{p_i}-\mu \right)}^2/2{\sigma}^2\right)-\exp \left(-{\left({G}_{p_{2\_ new}}-\mu \right)}^2/2{\sigma}^2\right) \), where , and \( \sigma =\left|\mu -{G}_{p_{2\_ new}}\right| \).

In the \( {\overset{\rightharpoonup }{\upsilon}}_2 \) direction, we search for the point

$$ {p}_3=\arg \max \left({\omega_1}^{\left\Vert \overrightarrow{p_i{p}_{2\_ new}}\right\Vert }{G}_{std}\sqrt{G_{p_i}/{G}_{std}}/{G}_{p_{2\_ new}}-{\omega}_2\cos \angle {p}_f{p}_{2\_ new}{p}_c\cos \angle {p}_f{p}_i{p}_c\right) $$

(5)

\( {G}_{p_i} \) is the signal strength at voxel *p*_{i}, taking the grayscale value of *p*_{i}. \( {G}_{p_{2\_ new}} \) is the signal strength of *p*_{2 _ new}. This means that *ω*_{1} and *ω*_{2} are the parameters that can be adjusted according to the image quality. When the parameter *ω*_{1} is set to a larger value, this means that we regard the grayscale value as a large influence on the points. However, when *ω*_{2} is set to a larger value, this means that we regard the angle deviation of the points as having a greater influence.

### Quantitative evaluation on the synthetic dataset

Since the goals of most automatic neuron tracing methods are different from ours, the semimanually traced neuron skeleton, which is regarded as the ground truth in most studies, is one of our targets that needs to be optimized. Therefore, we have to find a ground truth, rather than the semimanually traced results, to evaluate the performance of our methods. Here, we design a three-dimensional synthetic neuron fiber image to evaluate our methods (Fig. 2). First, we choose a helix curve (t∈(π,3π], x = 30 × sin(t) + 60, y = 30 × cos(t) + 60, z = 10 × t) as the centerline of the neuron fiber, which is the ground truth of the traced neuron skeleton. Then, in three-dimensional images, we use continuous voxels to represent this helix curve. Next, we choose the Gaussian function as the point spread function (PSF) of the imaging system and convolve it with the helix curve images. The result simulates the neuron fiber images. Based on these images, we semimanually trace the neuron fibers. Following a previous procedure [8, 11, 19], the semimanually traced neuron skeleton is completed interactively in Amira visualization and data analysis software. The skeleton is checked back-to-back by 3 people. The traced skeleton is regarded as the optimization target. Finally, we use our method to optimize the skeleton.

The noise of nearby neuron fibers and other fibers can affect the quality of the automatically traced neuron skeleton. To evaluate the performance of our method in such cases, we add Gaussian noise with different signal-to-noise ratio levels (Fig. 2c and d). For the case of nearby neurons, we set a straight line as representing another neuron whose distance from the helix curve changes (Fig. 2e and f).

In the research on automatic neuron tracing methods, there are various parameters used to evaluate the traced results from different perspectives [22, 25, 26]. Among these parameters, the significant spatial distance (SSD) is a parameter that can intuitively evaluate the distance between two neuron skeletons, and it is suitable for us to use to quantitatively evaluate the difference between the traced neuron skeleton and the ground truth. The SSD is defined as the average distance of the reciprocal minimal spatial distances between two neuron skeleton nodes. A larger SSD means that these two neuron skeletons are more different from each other. In Peng’s work [22], if the distance between two nodes is less than two pixels, it is difficult to determine the difference visually. As a result, when calculating the SSD, the minimum reciprocal distance is set to 2 pixels. However, in our experiment, the maximum reciprocal distance between the unoptimized skeleton and the ground truth is less than 2 voxels. Therefore, we set the minimum reciprocal distance to 1 voxel. In addition, we also provide the SSDs that the minimum reciprocal distances are set to 0 and 0.5 voxels.

### Process for a whole-brain three-dimensional image dataset

We used image blocks from TDat [27] and converted the three-dimensional image dataset, which is 32,266 × 54,600 × 10,730 voxels (more than 30 TB) with a depth of 16 bit/voxel, to the TDat data format. TDat divides a three-dimensional image dataset into many small three-dimensional image blocks. We then split the reconstructed digital skeleton into many fragments according to the range of each TDat image block, and the skeleton fragments in each block were calculated in turn. Finally, the fragments were connected to become a complete digital skeleton. This process was slightly different from the workflow for the local neuronal fibers. In summary, for the locally distributed neuronal fibers, we split the digital skeletons according to the branch structure. However, for the long-projection neuronal fibers, we split the digital skeletons according to the range of the blocks.

### Neuronal fiber radius estimation

In a real dataset, it is difficult to obtain the centerline of the neuron fiber images. Thus, we do not have a way to directly evaluate the difference between the traced neuron skeletons and the ground truth. However, according to the traced neuron skeleton, we can calculate the neuron morphology parameters to evaluate the differences in the sizes. Here, we choose the neuron fiber length and neuron fiber radius to evaluate the performance of our method.

The estimation process for the neuron fiber radius is as follows (see Fig. 5): Based on the topology of the digital skeleton, we calculate a cross-sectional plane of a skeleton point. Thereafter, with a series of image processing operations, this plane is used to segment the area of the neuronal fiber tubular structure from the cross-sectional image, and we estimate the neuronal fiber radius. Specifically, the skeleton point *p*_{3} and its parent point *p*_{f} and child point *p*_{c} can be approximately considered to be in the same circle with a very large radius. Moreover, the normal vector \( {\overset{\rightharpoonup }{\upsilon}}_3 \) of this circle at *p*_{3} can be regarded as parallel to \( \overrightarrow{p_f{p}_{rc}} \) (the topological graph is shown in the supplementary materials, Figure S1 C). Subsequently, the cross-sectional plane at *p*_{3} is defined by the normal vector \( {\overset{\rightharpoonup }{\upsilon}}_3 \), named *Area*_{3}. With regard to *Area*_{3}, we segment the neuronal fiber tubular structure with a series of image processing operations, including linear grayscale transformations, adaptive thresholding [28], open operations, and connected component operations on the cross-sectional image. Finally, we obtain the inner part of the cross-section of the neuronal fiber structure at *p*_{3}, named *Area*_{t}. Here, we estimate this segmented result with a circle and estimate the radius according to the formula of a circle \( r=\sqrt{S/\pi } \), where *S* = *Area*_{t}.

### Statistical analysis

The statistical significance analysis was performed using MATLAB, and statistical comparisons were performed using a two-tailed t-test. All measurements are listed in the form *mean* ± *std*. The confidence level was set to 0.05 (**p* < 0.05, ***p* < 0.01, ****p* < 0.0001).