### Image formation model

We use the image formation model and algorithms from [7] to simulate a confocal microscope with an ideal point spread function. Optimal reconstruction quality with this model requires that the refractive indices of the immersion and embedding media match the specification of the microscope lens; they should be adapted to the average refractive index of the imaged specimen. The signal *F*_{i}(**x**) for direction *i* (*i*=1: top, *i*=2: bottom) captured by the photomultiplier is modeled as the integral over a cone-shaped bundle of rays originating at the recording position **x**∈ℝ^{3}. Each ray is attenuated by the attenuation coefficients along its path, so that

F_i(\mathbf{x}) = \beta_i I(\mathbf{x}) \cdot \left( \int_{S} s_i(\mathbf{r}) \cdot e^{-\int_{0}^{\infty} \alpha(\mathbf{x}+\ell\mathbf{r})\,\mathrm{d}\ell} \,\mathrm{d}\mathbf{r} \right)^{2},

(1)

where \alpha: \mathbb{R}^{3} \supset \Omega \to \mathbb{R}_{\ge 0} are the spatially variant attenuation coefficients. *s*_{i}: *S*→{0,1} are the cone sensitivity functions for both directions, defined over the unit sphere *S*: *s*_{i}(**r**) is one for all ray directions within the cone and zero otherwise. I: \Omega \to \mathbb{R}_{\ge 0} denotes the attenuation-free intensities. The factors *β*_{i}∈ℝ_{+} can be used to additionally scale all intensities of the recordings; we use them to model photo-bleaching-induced signal attenuation in the second recording. Only the focused beam leads to significant bleaching, since the excitation energy drops quadratically with the distance to the focal point. The assumption of constant bleaching for the whole volume is a zero-order approximation of the true bleaching function, which is non-linear and specific to the marker molecules used. We fix *β*_{1}:=1 and optimize *β*_{2} alongside the real intensities *I* and the attenuation coefficients *α* as described in the next section.
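As an illustration of Eq. (1), the following sketch collapses the cone to a single axial ray per direction and evaluates the attenuated signal on a 1-D profile. The function names, the unit voxel spacing, and the degenerate-cone simplification are ours, not part of the published implementation:

```python
import numpy as np

# Illustrative 1-D sketch of Eq. (1): the cone is collapsed to a single
# axial ray per direction, voxel spacing is 1.

def transmission(alpha_path):
    """exp(-integral of the attenuation coefficients along the ray)."""
    return np.exp(-np.sum(alpha_path))

def simulated_signal(I, alpha, z, beta=1.0, direction=+1):
    """F_i(z) = beta_i * I(z) * T^2 for a single ray per direction."""
    path = alpha[:z + 1][::-1] if direction < 0 else alpha[z:]
    return beta * I[z] * transmission(path)**2

I = np.full(64, 1000.0)       # attenuation-free intensities
alpha = np.full(64, 0.01)     # uniform attenuation per voxel
F_top = simulated_signal(I, alpha, 32, beta=1.0, direction=-1)
F_bot = simulated_signal(I, alpha, 32, beta=0.8, direction=+1)
```

The squared transmission reflects that both the excitation and the emitted light are attenuated on their way through the specimen.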

### Energy formulation

We want to maximize the posterior probability for the attenuation coefficients *α*, the attenuation-free intensities *I*, and the factor *β*_{2} given the observed intensities of the two recorded data volumes *I*_{1} and *I*_{2}.

According to Bayes’ rule we get

\begin{aligned}
(\alpha^{*}, I^{*}, \beta_2^{*}) &= \arg\max_{\alpha, I, \beta_2} \frac{P(I_1, I_2 \mid \alpha, I, \beta_2)\, P(\alpha, I, \beta_2)}{P(I_1, I_2)} \\
&= \arg\max_{\alpha, I, \beta_2} P(I_1, I_2 \mid \alpha, I, \beta_2)\, P(\alpha, I, \beta_2)\,.
\end{aligned}

(2)

The evidence *P*(*I*_{1},*I*_{2}) of the recorded images is independent of the attenuation, the true intensities, and the bleaching; it can therefore be dropped from the maximization.

We have no prior knowledge about the expected intensities, but we want the attenuation coefficients to vary smoothly within local neighborhoods. Additionally, we prefer solutions with zero attenuation estimates in areas of insufficient data. This can be modeled in the prior probability as

P(\alpha) \sim e^{-\lambda \int_{\Omega} \psi\left(\|\nabla\alpha\|^{2}\right)\,\mathrm{d}\mathbf{x}} \cdot e^{-\mu \int_{\Omega} \sqrt{\alpha^{2}(\mathbf{x}) + \epsilon_{\text{sp}}^{2}}\,\mathrm{d}\mathbf{x}}\,,

(3)

where \Omega \subset \mathbb{R}^{3} is the recorded volume, *λ*,*μ*∈ℝ_{≥0} are weighting factors, and *ε*_{sp}∈ℝ_{+} is a small constant added for numerical stability. The loss function \psi: \mathbb{R} \to \mathbb{R} is either the identity, leading to quadratic regularization (Tikhonov-Miller, TM), or approximates total variation (TV) regularization when set to \psi_{\text{TV}}(x^{2}) := \sqrt{x^{2} + \epsilon_{\text{TV}}^{2}} with *ε*_{TV}∈ℝ_{+} a small constant. This approximation is closely related to the Huber norm. In all experiments we set *ε*_{sp}=*ε*_{TV}=10^{−10}.
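A minimal discrete sketch of the two prior terms in Eq. (3), with unit voxel spacing and `np.gradient` standing in for the actual discretization; the function names are ours, and the epsilon values follow the text:

```python
import numpy as np

# Discrete sketch of the smoothness and sparsity terms of Eq. (3).
EPS_TV = 1e-10
EPS_SP = 1e-10

def psi_tm(grad_sq):
    return grad_sq                        # identity loss: TM regularization

def psi_tv(grad_sq):
    return np.sqrt(grad_sq + EPS_TV**2)   # smooth TV approximation

def smoothness_energy(alpha, psi):
    gz, gy, gx = np.gradient(alpha)       # central differences inside
    return float(np.sum(psi(gz**2 + gy**2 + gx**2)))

def sparsity_energy(alpha):
    return float(np.sum(np.sqrt(alpha**2 + EPS_SP**2)))

alpha = np.zeros((8, 8, 8))
alpha[2:6, 2:6, 2:6] = 0.01               # a small attenuating block
E_tm = smoothness_energy(alpha, psi_tm)
E_tv = smoothness_energy(alpha, psi_tv)
E_sp = sparsity_energy(alpha)
```

The sparsity term is essentially the L1 norm of the attenuation field, smoothed near zero by *ε*_{sp}, which is what drives unsupported regions towards zero attenuation.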

We assume that, for all discrete recording positions **x**, the observed intensities are independent (except for their explicitly modeled common dependency on *α*, *I*, and *β*_{2}), resulting in

(\alpha^{*}, I^{*}, \beta_2^{*}) = \arg\max_{\alpha, I, \beta_2} P(\alpha) \prod_{\mathbf{x}\in\Omega'} \prod_{i=1}^{2} P\left(I_i(\mathbf{x}) \mid \alpha, I, \beta_2\right)\,,

(4)

where *Ω*^{′} is the discretized recorded volume.

The noise in the image intensities is dominated by Poisson-distributed shot noise due to the quantum nature of light, but the measured intensities also contain additive Gaussian-distributed read-out noise. The resulting noise model is the convolution of a Poisson process (scaled by a constant factor *m*∈ℝ_{+} and shifted by an offset *b*∈ℝ) with a zero-mean Gaussian distribution with standard deviation *σ*∈ℝ_{+}. We eliminate the offset *b* by subtracting the intensity at the histogram peak from each recording prior to further processing. For computational efficiency, we approximate the Poisson process by a Gaussian process with variable variance (the variance is proportional to the mean of the distribution), and we further assume that the measured intensities are a good estimator of this mean. At each position **x**∈*Ω* this leads to a convolution of two Gaussian distributions, which results in the combined Gaussian distribution

\begin{aligned}
P\left(I_i(\mathbf{x}) \mid \alpha, I, \beta_2\right) &\approx \frac{1}{\sqrt{2\pi m^{2} I_i(\mathbf{x})}}\, e^{-\frac{\left(I_i(\mathbf{x}) - F_i(\mathbf{x})\right)^{2}}{2 m^{2} I_i(\mathbf{x})}} \ast \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{\left(I_i(\mathbf{x}) - F_i(\mathbf{x})\right)^{2}}{2\sigma^{2}}} \\
&= \frac{1}{\sqrt{2\pi\left(m^{2} I_i(\mathbf{x}) + \sigma^{2}\right)}}\, e^{-\frac{\left(I_i(\mathbf{x}) - F_i(\mathbf{x})\right)^{2}}{2\left(m^{2} I_i(\mathbf{x}) + \sigma^{2}\right)}}\,.
\end{aligned}

(5)

Note that for *m*=0 and *σ*=1 the new model coincides with the pure Gaussian model presented in [7]. The actual value of the Poisson scaling *m* (the number of collected photons per intensity level) and the standard deviation *σ* of the Gaussian noise can be estimated during a microscope calibration phase. If they are unknown, one of them can be fixed to an arbitrary value (we always fixed *σ*=1) and the other adjusted to qualitatively obtain the optimal result. If additional sample information is available, e.g. when the recordings contain large homogeneous regions of different intensities, one can also try to estimate the parameters from the images themselves, as done in [10]. However, for biological samples this is rarely the case.
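The practical consequence of Eq. (5) is that each voxel's squared residual is weighted by the inverse combined variance *m*²*I*_{i}(**x**)+*σ*², so the same absolute residual counts less in bright, shot-noise-dominated voxels. A small illustrative sketch (names are ours):

```python
import numpy as np

# Per-voxel weighting implied by the combined noise model of Eq. (5).

def noise_variance(I_obs, m, sigma):
    return m**2 * I_obs + sigma**2

def data_residual(I_obs, F_sim, m, sigma):
    """Per-voxel contribution to E_data in Eq. (6)."""
    return (I_obs - F_sim)**2 / noise_variance(I_obs, m, sigma)

I_obs = np.array([100.0, 1000.0])     # observed intensities
F_sim = np.array([110.0, 1010.0])     # simulated intensities
r = data_residual(I_obs, F_sim, m=0.05, sigma=1.0)
```

Both voxels have a residual of 10 counts, but the ten-times-brighter voxel contributes roughly a third as much to the data term.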

The final energy formulation is obtained by replacing the maximization of the probability by a minimization of its negative logarithm

\begin{aligned}
\min_{\alpha, I, \beta_2} E(\alpha, I, \beta_2) := \min_{\alpha, I, \beta_2}\; & \underbrace{\sum_{i=1}^{2} \int_{\Omega} \frac{\left(I_i(\mathbf{x}) - F_i(\mathbf{x})\right)^{2}}{m^{2} I_i(\mathbf{x}) + \sigma^{2}}\,\mathrm{d}\mathbf{x}}_{E_{\text{data}}} \\
& + \lambda \underbrace{\int_{\Omega} \psi\left(\|\nabla\alpha(\mathbf{x})\|^{2}\right)\,\mathrm{d}\mathbf{x}}_{E_{\text{smooth}}} + \mu \underbrace{\int_{\Omega} \sqrt{\alpha^{2}(\mathbf{x}) + \epsilon_{\text{sp}}^{2}}\,\mathrm{d}\mathbf{x}}_{E_{\text{sparse}}} \\
\text{subject to}\quad & \beta_2 > 0 \;\wedge\; \forall \mathbf{x} \in \Omega: \alpha(\mathbf{x}) \ge 0\,.
\end{aligned}

(6)

To simplify the notation, we introduce the shorthands

\begin{aligned}
T_{\mathbf{r}}[\alpha](\mathbf{x}) &:= e^{-\int_{0}^{\infty} \alpha(\mathbf{x}+\ell\mathbf{r})\,\mathrm{d}\ell} \\
C_i[\alpha](\mathbf{x}) &:= \int_{S} s_i(\mathbf{r}) \cdot T_{\mathbf{r}}[\alpha](\mathbf{x})\,\mathrm{d}\mathbf{r} \\
F_i[\alpha, I, \beta_2](\mathbf{x}) &= \beta_i I(\mathbf{x})\, C_i^{2}[\alpha](\mathbf{x}) \\
D_i[\alpha, I, \beta_2](\mathbf{x}) &:= I_i(\mathbf{x}) - F_i[\alpha, I, \beta_2](\mathbf{x})
\end{aligned}

where *T*_{**r**} is the attenuation along the ray with direction **r**, *C*_{i} is the cone transmission for recording direction *i*, *F*_{i} are the simulated intensities, and *D*_{i} are the differences between the recordings and simulations. Variables in square brackets indicate dependencies on the corresponding optimization variables.

For the optimization we employ the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm with box constraints (L-BFGS-B) [11]. The solver minimizes the energy while respecting the positivity of the attenuations throughout the iterative optimization. L-BFGS-B implements a quasi-Newton method; we therefore need to provide the derivatives of the energy with respect to the unknown intensities *I*, the attenuation coefficients *α*, and the bleaching factor *β*_{2}. These are given by

\frac{\delta E(\alpha, I, \beta_2)}{\delta I(\mathbf{x})} = -2 \sum_{i=1}^{2} \frac{\beta_i D_i(\mathbf{x})\, C_i^{2}(\mathbf{x})}{m^{2} I_i(\mathbf{x}) + \sigma^{2}}

(7)

\begin{aligned}
\frac{\delta E(\alpha, I, \beta_2)}{\delta\alpha(\mathbf{x})} =\; & 4 \sum_{i=1}^{2} \beta_i \int_{S} s_i(\mathbf{r}) \int_{0}^{\infty} \frac{I(\mathbf{x}-\ell\mathbf{r})\, T_{\mathbf{r}}(\mathbf{x}-\ell\mathbf{r})\, C_i(\mathbf{x}-\ell\mathbf{r})\, D_i(\mathbf{x}-\ell\mathbf{r})}{m^{2} I_i(\mathbf{x}-\ell\mathbf{r}) + \sigma^{2}}\,\mathrm{d}\ell\,\mathrm{d}\mathbf{r} \\
& - 2\lambda\,\mathrm{div}\!\left(\psi'\left(\|\nabla\alpha(\mathbf{x})\|^{2}\right)\nabla\alpha(\mathbf{x})\right) + \mu \frac{\alpha(\mathbf{x})}{\sqrt{\alpha^{2}(\mathbf{x}) + \epsilon_{\text{sp}}^{2}}}
\end{aligned}

(8)

\frac{\partial E(\alpha, I, \beta_2)}{\partial\beta_2} = -2 \int_{\Omega} \frac{D_2(\mathbf{x})\, I(\mathbf{x})\, C_2^{2}(\mathbf{x})}{m^{2} I_2(\mathbf{x}) + \sigma^{2}}\,\mathrm{d}\mathbf{x}\,,

(9)

where the derivative of the loss function is \psi_{\text{TM}}^{\prime}(x^{2}) = 1 for the TM regularization and \psi_{\text{TV}}^{\prime}(x^{2}) = \frac{1}{2\sqrt{x^{2} + \epsilon_{\text{TV}}^{2}}} for the TV approximation. The detailed derivations are given in Additional file 1.
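The paper uses the FORTRAN L-BFGS-B solver; the same kind of bound-constrained quasi-Newton optimization can be sketched with SciPy's interface to that algorithm. The toy quadratic below merely stands in for *E*(*α*,*I*,*β*_{2}); the bounds mimic *α*(**x**)≥0 and *β*_{2}>0:

```python
import numpy as np
from scipy.optimize import minimize

# Toy bound-constrained quasi-Newton run. The unconstrained minimizer
# (-0.5, 2.0) violates the first lower bound, so L-BFGS-B must clip the
# first variable to 0, just as negative attenuations are clipped.
target = np.array([-0.5, 2.0])

def energy(x):
    return float(np.sum((x - target)**2))

def gradient(x):
    return 2.0 * (x - target)

res = minimize(energy, x0=np.array([1.0, 1.0]), jac=gradient,
               method='L-BFGS-B',
               bounds=[(0.0, None),     # alpha-like variable: >= 0
                       (1e-8, None)])   # beta2-like variable: > 0
```

As in the paper, the analytic gradient is supplied via `jac`; L-BFGS-B builds its low-rank Hessian approximation from these gradient evaluations.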

The bleaching factor *β*_{2} prevents a direct analytic optimization of the intensities at each quasi-Newton iteration as presented in [7]. Instead, we optimize the bleaching factor *β*_{2} and the intensities *I* within each quasi-Newton iteration in an inner fixed-point loop that alternates between the analytic computation of *I*^{[j]} given *β*_{2}^{[j−1]} and of *β*_{2}^{[j]} given *I*^{[j]} in inner iteration *j*, where

I^{[j]}(\mathbf{x}) = \frac{I_1(\mathbf{x})\, C_1^{2}(\mathbf{x})\left(m^{2} I_2(\mathbf{x}) + \sigma^{2}\right) + \beta_2^{[j-1]} I_2(\mathbf{x})\, C_2^{2}(\mathbf{x})\left(m^{2} I_1(\mathbf{x}) + \sigma^{2}\right)}{C_1^{4}(\mathbf{x})\left(m^{2} I_2(\mathbf{x}) + \sigma^{2}\right) + \left(\beta_2^{[j-1]}\right)^{2} C_2^{4}(\mathbf{x})\left(m^{2} I_1(\mathbf{x}) + \sigma^{2}\right)}

(10)

\beta_2^{[j]} = \frac{\int_{\Omega} \frac{I_2(\mathbf{x})\, I^{[j]}(\mathbf{x})\, C_2^{2}(\mathbf{x})}{m^{2} I_2(\mathbf{x}) + \sigma^{2}}\,\mathrm{d}\mathbf{x}}{\int_{\Omega} \frac{\left(I^{[j]}(\mathbf{x})\, C_2^{2}(\mathbf{x})\right)^{2}}{m^{2} I_2(\mathbf{x}) + \sigma^{2}}\,\mathrm{d}\mathbf{x}}\,.

(11)
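The inner fixed-point loop of Eqs. (10) and (11) can be sketched per voxel as follows. The toy data are generated with *β*_{2}=0.8 and unit cone transmissions, so the loop should recover that factor; this is only a consistency check under our simplifying assumptions (*m*=0, *σ*=1, *C*_{i}≡1):

```python
import numpy as np

def update_intensity(I1, I2, C1, C2, beta2, m, sigma):
    """Eq. (10): closed-form estimate of I given the previous beta2."""
    w1 = m**2 * I2 + sigma**2            # variance weight of recording 2
    w2 = m**2 * I1 + sigma**2            # variance weight of recording 1
    num = I1 * C1**2 * w1 + beta2 * I2 * C2**2 * w2
    den = C1**4 * w1 + beta2**2 * C2**4 * w2
    return num / den

def update_beta2(I2, I_est, C2, m, sigma):
    """Eq. (11): closed-form estimate of beta2 given the current I."""
    w = m**2 * I2 + sigma**2
    return np.sum(I2 * I_est * C2**2 / w) / np.sum((I_est * C2**2)**2 / w)

# consistency check: data generated with beta2 = 0.8, unit cones
I_true = np.array([500.0, 1000.0, 2000.0])
C1 = C2 = np.ones(3)
I1, I2 = I_true.copy(), 0.8 * I_true
beta2 = 1.0
for _ in range(30):                      # inner fixed-point loop
    I_est = update_intensity(I1, I2, C1, C2, beta2, m=0.0, sigma=1.0)
    beta2 = update_beta2(I2, I_est, C2, m=0.0, sigma=1.0)
```

In this noise-free setting the alternation contracts towards *β*_{2}=0.8 and *I*^{[j]}=*I*_{true}; in the full algorithm the loop runs inside every quasi-Newton iteration.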

### Implementation

The variational attenuation correction was implemented in C++ and run under Linux (Ubuntu 12.04) on an Intel Xeon E5-2680 (2.7 GHz) dual-processor system. For the optimization we used the publicly available FORTRAN implementation of the L-BFGS-B optimizer. One iteration on data sub-sampled to 80×80×80 voxels took 1.8 seconds on average, so a full reconstruction can be computed within a few minutes. The computational complexity scales linearly with the number of voxels processed in each iteration (Additional file 1: Figure S3), and the memory requirements scale linearly with the raw data volume. Both can be limited by sub-sampling the high-resolution raw data, which has two advantages: less computational resources are needed, and the weighted averaging during sub-sampling already reduces the image noise considerably. The cone transmission is computed in parallel for all ray directions, leading to a significant speed-up of the confocal microscope simulation. Depending on the non-deterministic computation order introduced by the scheduling, the results can deviate slightly from the numbers reported in the Results section; for real-world data we observed deviations of the estimated intensities of up to 3% after convergence of the algorithm. However, these differences are not visually recognizable.

#### Discrete derivative and integral computation

The gradients needed in the derivatives of the TM smoothness term and in the sparsity term are computed using central differences. For TV regularization we extended the numerical differentiation scheme from [12] to 3-D to obtain the divergence term in the derivative of the smoothness term; the corresponding equations are given in Additional file 1. The cone integrals are approximated as in [7] with a ray spacing of six degrees. This approximation of the cone integrals requires strong regularization to yield good reconstructions. We also experimented with an alternative ray integration scheme that uses thin rays instead of the incrementally widening conic rays of [7]. To still capture all attenuations, the ray sampling was increased so that the cone is sampled densely at its largest diameter with respect to the volume grid. The results, given in Additional file 1, show that the energy formulation leads to the desired solution but that the numerical approximations have a crucial influence on the resulting reconstruction.
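The discrete gradient and divergence operators can be sketched as below, assuming unit grid spacing and `np.gradient`'s boundary handling, which is simpler than the scheme of [12]; this is purely illustrative:

```python
import numpy as np

# Central-difference gradient and matching divergence on a 3-D grid.

def gradient3d(a):
    """Central-difference gradient, shape (3, *a.shape)."""
    return np.stack(np.gradient(a))

def divergence3d(v):
    """Divergence of a vector field v with shape (3, *grid.shape)."""
    return sum(np.gradient(v[k], axis=k) for k in range(3))

# on a linear ramp the discrete gradient is exact and div(grad) vanishes
z, y, x = np.indices((6, 6, 6)).astype(float)
a = 2.0 * z + 3.0 * y
g = gradient3d(a)
lap = divergence3d(g)       # div(grad a): a discrete Laplacian
```

In the TV case, `g` would additionally be scaled by *ψ*′(‖∇*α*‖²) before taking the divergence, yielding the weighted diffusion term of Eq. (8).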

### Data generation

#### Synthetic data

To quantitatively evaluate our method, we generated two synthetic datasets. The first consists of a solid sphere with a constant absorption coefficient of 0.006 per voxel; the interior 60% of the sphere was set to an intensity value of 4094. We added a smooth random texture with a variance of approximately 30% of the maximum of the corresponding quantity. This corresponds to the "well-posed" case in which intensity and absorption information is available in the whole domain. The second dataset is the well-known Shepp-Logan phantom [13], consisting of a set of overlapping ellipsoids with homogeneous intensities. We assigned absorption coefficients to the different regions, avoiding direct correlation with the intensities; some regions were assigned equal attenuations despite their intensity difference. During the simulation we applied an anisotropic Gaussian smoothing to reflect the microscope's point spread function and ensure Nyquist sampling. For both datasets, two recordings from opposite sides were simulated using Eq. (1), modeling absorption and photo bleaching (*β*_{2}=0.8, i.e., 20% signal loss between the recordings). Then Poisson noise with scaling *m*=0.05 and Gaussian noise with standard deviation *σ*=10 were applied. The datasets are shown in Figure 3.
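The geometry of the sphere phantom can be re-created roughly as below; the texture, PSF smoothing, and noise are omitted, the grid size and radius fraction are our own choices, and "interior 60%" is read here as 60% of the radius:

```python
import numpy as np

# Rough, illustrative re-creation of the solid-sphere phantom geometry:
# absorption 0.006 per voxel inside the sphere, intensity 4094 in its
# inner 60% (by radius). No texture, smoothing, or noise.

def sphere_phantom(n=64, radius_frac=0.45):
    z, y, x = np.indices((n, n, n)) - (n - 1) / 2.0
    r = np.sqrt(x**2 + y**2 + z**2)       # distance to the volume center
    R = radius_frac * n
    alpha = np.where(r <= R, 0.006, 0.0)
    I = np.where(r <= 0.6 * R, 4094.0, 0.0)
    return I, alpha

I, alpha = sphere_phantom()
```

Two opposing recordings would then be simulated from `I` and `alpha` via Eq. (1) before adding the mixed Poisson-Gaussian noise.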

#### Zebrafish

To show that the approach also copes well with real-world data, we tested it on samples from the ViBE-Z database, which consists of confocal recordings of whole zebrafish (*Danio rerio*) embryos fixed 72 h after fertilization. Sample preparation, recording setup, and image preprocessing are described in detail in [7]. The processing was performed on sub-sampled data with isotropic voxel extents of 8 *μ*m.

#### Arabidopsis thaliana

Finally, we tested the approach on recordings of the root tip of the model plant *Arabidopsis thaliana*. The samples were fixed 96 h after germination and the cell membranes were marked with an Alexa antibody stain. They were then embedded in SlowFade Gold Antifade (Invitrogen) and recorded from top and bottom using a confocal microscope equipped with a 40× oil-immersion objective. We applied the elastic registration algorithm from [7] to register the two views to each other, and performed a background subtraction prior to applying the attenuation correction. The embedding medium had a refractive index of 1.42, compared to the refractive index of 1.52 of the immersion oil for which the lens was adjusted. The attenuation correction was performed on sub-sampled data with isotropic voxel extents of 2 *μ*m.

### Parameter setup

We want all terms in the energy to have approximately the same influence on the optimization process, which leads to rough rules of thumb for the selection of *λ* and *μ*. Since all terms integrate over the whole image domain, the choice is independent of the number of voxels. The contribution of the data term is on the order of the squared expected intensity difference between recording and simulation, divided by the Poisson weights. The smoothness term's contribution is on the order of the magnitude of the expected attenuation gradient (TV) or its square (TM). Finally, the sparsity term's contribution is on the order of the expected attenuations. For example, for intensity data with an expected residual intensity difference of 5 (corresponding to the average noise intensity), pure Gaussian noise, expected attenuation coefficients of 0.005, and gradient magnitudes of 0.0005, initial choices of \lambda = \frac{5^{2}}{0.0005^{2}} = 10^{8} and \mu = \frac{5^{2}}{0.005} = 5000 (TM), resp. \lambda = \frac{5^{2}}{0.0005} = 5 \cdot 10^{4} and *μ*=5000 (TV), are appropriate. The approximate estimates for the expected attenuations and their gradients were empirically confirmed on real-world samples. For higher Poisson weighting *m* the factors have to be decreased accordingly. The optimal values depend on the image content and should be tuned for the specific type of data.
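The rule of thumb can be wrapped in a small helper; the function name and interface are ours, and the call below evaluates it for the quantities of the worked example (residual 5, expected attenuation 0.005, expected gradient magnitude 0.0005, pure Gaussian noise):

```python
# Rule-of-thumb initialization of the regularization weights: balance
# each prior term against the order of the data-term contribution.

def init_weights(residual, alpha_exp, grad_exp, tv=False):
    data = residual**2                  # order of the data-term contribution
    lam = data / (grad_exp if tv else grad_exp**2)
    mu = data / alpha_exp
    return lam, mu

lam_tm, mu_tm = init_weights(5.0, 0.005, 0.0005, tv=False)  # ~1e8, ~5000
lam_tv, mu_tv = init_weights(5.0, 0.005, 0.0005, tv=True)   # ~5e4, ~5000
```

These values are only starting points; as the text notes, they should be decreased for larger *m* and tuned for the specific type of data.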

For the synthetic phantoms we chose, for each fixed *m*, the optimal *λ* and *μ* minimizing the root mean squared error (rmse) between the true and estimated intensities. The parameters were determined with an exponential grid search over *λ*∈{0,10^{0},…,10^{9}} and *μ*∈{0,10^{2},10^{3},10^{4}} for the textured sphere phantom, and *λ*∈{0,10^{6},…,10^{12}} and *μ*∈{0,10^{3},…,10^{8}} for the Shepp-Logan phantom. For all experiments we set *σ*:=1. For the real-world data we used a conservative parameter set of *λ*=10^{7}, *μ*=0, and *m*=0.1 (Arabidopsis) or *λ*=10^{8}, *μ*=10^{4}, and *m*=0 (zebrafish) for all experiments with TM regularization. For the zebrafish experiments with TV regularization we set *λ*=5·10^{4}, *μ*=0, and *m*=0. For the real-world data we stopped the iterative process when the visually optimal reconstruction of the intensities was reached, which occurred after 3 to 20 iterations. For the textured sphere phantom we set a maximum of 50 iterations for TM regularization and ran the algorithm to convergence for TV regularization. All results reported for the Shepp-Logan phantom were obtained at convergence of the algorithm.