Problem definition
The proposed developments are founded on a natural model for the observed spectrum y: a spiky signal \(\mathbf{x}_{p}\) convolved with a peak shape p, superimposed onto a smooth baseline \(\mathbf{x}_{b}\),
$$ \mathbf{y} = \mathbf{x}_{b} + \mathbf{x}_{p} \ast p + \mathbf{e} $$
(1)
where e includes measurement and model errors. It is a common linear model with additive uncertainties. These quantities are represented by vectors of size n, where n is the number of m/z channels of the original spectrum y.
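To make the model concrete, here is a minimal Python sketch, not taken from the paper, that simulates Eq. 1 with illustrative choices (Gaussian peak shape, sinusoidal baseline, Gaussian noise); all sizes and parameter values are assumptions.

```python
# Minimal simulation of the model y = x_b + x_p * p + e of Eq. 1
# (all choices below are illustrative, not taken from the paper).
import numpy as np

rng = np.random.default_rng(0)
n = 500                                    # number of m/z channels

# Smooth baseline x_b (slowly varying)
t = np.linspace(0.0, 1.0, n)
x_b = 2.0 + 0.5 * np.sin(2.0 * np.pi * t)

# Spiky, positive signal x_p (a few random spikes)
x_p = np.zeros(n)
idx = rng.choice(n, size=15, replace=False)
x_p[idx] = rng.uniform(1.0, 5.0, size=15)

# Peak shape p: a normalized Gaussian of (assumed) width 4 channels
k = np.arange(-25, 26)
p = np.exp(-0.5 * (k / 4.0) ** 2)
p /= p.sum()

# Observed spectrum: baseline + convolved peaks + noise e
y = x_b + np.convolve(x_p, p, mode="same") + 0.02 * rng.standard_normal(n)
```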
The problem at stake is to recover both the signal of interest \(\mathbf{x}_{p}\) and the baseline \(\mathbf{x}_{b}\). It is a difficult task for at least three reasons.
- It is underdetermined since the number of unknowns is twice the number of data points.
- The convolution reduces the resolution due to peak enlargement and possible overlap.
- Measurement noise and possible model inadequacy induce additional uncertainties.
As a consequence, prior information about the expected signals \(\mathbf{x}_{p}\) and \(\mathbf{x}_{b}\) must be taken into account. In the following developments, \(\mathbf{x}_{b}\) is expected to be smooth while \(\mathbf{x}_{p}\) is expected to be spiky and positive. This knowledge will be included in the next sections.
\(\mathbf{x}_{b}\) smoothness
A simple way to account for the smoothness of \(\mathbf{x}_{b}\) is to penalize its fluctuations through
$$ \mathcal{P}_{b}\left(\mathbf{x}_{b}\right) = \frac{\mu}{2} \sum_{k=1}^{n-1} \left(\mathbf{x}_{b}\left[k+1\right] - \mathbf{x}_{b}\left[k\right]\right)^{2} = \frac{\mu}{2} \|\mathbf{Dx}_{b}\|^{2}_{2} $$
(2)
where D is a finite differences matrix of size (n−1)×n (given in Appendix “Smoothness and convolution matrix”) and μ>0.
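As an illustration, the penalty of Eq. 2 can be written with a sparse first-order difference matrix; the following sketch assumes NumPy/SciPy and the function names are ours.

```python
# Sparse (n-1) x n first-order difference matrix D and the penalty of Eq. 2.
import numpy as np
import scipy.sparse as sp

def difference_matrix(n):
    """D such that (D x)[k] = x[k+1] - x[k], size (n-1) x n."""
    return sp.diags([-np.ones(n - 1), np.ones(n - 1)], offsets=[0, 1],
                    shape=(n - 1, n))

def smoothness_penalty(x_b, mu):
    """P_b(x_b) = (mu / 2) ||D x_b||_2^2."""
    D = difference_matrix(len(x_b))
    return 0.5 * mu * np.sum((D @ x_b) ** 2)
```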
\(\mathbf{x}_{p}\) sparsity and positivity
In order to favor sparsity for \(\mathbf{x}_{p}\) (spiky property), an elastic-net penalty is introduced
$$ \mathcal{P}_{p}(\mathbf{x}_{p}) = \lambda_{1} \big\|\mathbf{x}_{p}\big\|_{1} + \frac{\lambda_{2}}{2} \big\|\mathbf{x}_{p}\big\|_{2}^{2} $$
(3)
The degree of sparsity is controlled by λ1. The coefficient λ2 is generally set to zero or to a very small value. This extra regularization is introduced because a small positive value can sometimes improve the convergence speed of the algorithm. In practice this is only needed for spectra of several thousand m/z channels and it always has a limited impact on the obtained solution.
In addition, the solution also requires positivity of \(\mathbf{x}_{p}\), i.e. positivity of each component \(\mathbf{x}_{p}[k]\). The l1 norm then simplifies to
$$\|\mathbf{x}_{p}\|_{1} = \sum_{k=1}^{n} \mathbf{x}_{p}\left[k\right] = \mathbb{1}_{n}^{t} \, \mathbf{x}_{p} $$
where \(\mathbb {1}_{n}\) is the n-dimensional column vector with each entry equal to 1. This substitution turns the convex, but nonsmooth, penalty \(\mathcal {P}_{p}(\mathbf {x}_{p})\) into a smooth quadratic one. This idea has already been used in [23, 28].
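The simplification can be checked numerically; the small sketch below (our own, with arbitrary values) verifies that for a nonnegative vector the elastic-net penalty of Eq. 3 coincides with its smooth quadratic form.

```python
# For x_p >= 0, ||x_p||_1 = 1_n^t x_p, so P_p(x_p) becomes a smooth quadratic.
import numpy as np

def elastic_net_penalty(x_p, lam1, lam2):
    return lam1 * np.sum(np.abs(x_p)) + 0.5 * lam2 * np.sum(x_p ** 2)

def smooth_penalty_nonneg(x_p, lam1, lam2):
    # valid only when every component of x_p is nonnegative
    return lam1 * np.sum(x_p) + 0.5 * lam2 * np.sum(x_p ** 2)

x_p = np.abs(np.random.default_rng(1).standard_normal(100))
assert np.isclose(elastic_net_penalty(x_p, 0.1, 1e-3),
                  smooth_penalty_nonneg(x_p, 0.1, 1e-3))
```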
Data-fidelity term
A common and natural way to quantify the data-model discrepancy founded on Eq. 1 relies on a squared Euclidean norm:
$$ \mathcal{J}_{\text{obs}}\left(\mathbf{x}_{b},\mathbf{x}_{p}\right) = \frac{1}{2} \big\|\mathbf{y} - \left(\mathbf{x}_{b} + \mathbf{Lx}_{p}\right) \big\|^{2}_{2} $$
(4)
where the n×n band matrix L represents the convolution with the peak shape p.
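The band matrix L is not given explicitly in this section; the sketch below is one possible construction (an assumption, restricted to an odd-length, centered peak shape p) such that L x_p reproduces a zero-padded "same" convolution.

```python
# One possible construction (an assumption) of the n x n band matrix L whose
# action matches np.convolve(x_p, p, mode="same") for an odd-length peak shape p.
import numpy as np
import scipy.sparse as sp

def convolution_matrix(p, n):
    half = len(p) // 2                     # assumes len(p) is odd and len(p) < n
    offsets = np.arange(len(p)) - half
    # the diagonal at offset d carries the coefficient p[half - d]
    diagonals = [np.full(n - abs(d), p[half - d]) for d in offsets]
    return sp.diags(diagonals, offsets=offsets, shape=(n, n))
```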
Complete objective
Using Eqs. 2, 3 and 4, we get a first expression of the complete objective:
$$ \mathcal{J}\left(\mathbf{x}_{b},\mathbf{x}_{p}\right) = \mathcal{J}_{\text{obs}}\left(\mathbf{x}_{b},\mathbf{x}_{p}\right) + \mathcal{P}_{b}\left(\mathbf{x}_{b}\right) + \mathcal{P}_{p}\left(\mathbf{x}_{p}\right) $$
(5)
that is to say:
$$\begin{array}{*{20}l} \mathcal{J}\left(\mathbf{x}_{b},\mathbf{x}_{p}\right) &= \frac{1}{2} \|\mathbf{y} - \left(\mathbf{x}_{b} + \mathbf{Lx}_{p}\right) \|^{2}_{2} \\ & \quad+ \frac{\mu}{2} \|\mathbf{D}\mathbf{x}_{b}\|^{2}_{2}+ \lambda_{1} \|\mathbf{x}_{p}\|_{1} + \frac{\lambda_{2}}{2} \|\mathbf{x}_{p}\|_{2}^{2} \end{array} $$
(6)
The minimization of this objective function
$$ \left(\widehat{\mathbf{x}}_{p},\widehat{\mathbf{x}}_{b}\right) = ~\underset{\mathbf{x}_{p}\ge0\,,\,\mathbf{x}_{b}}{\mathrm{arg\,min}}~\mathcal{J}\left(\mathbf{x}_{b},\mathbf{x}_{p}\right) $$
(7)
gives the desired solution \(\left (\widehat {\mathbf {x}}_{p},\widehat {\mathbf {x}}_{b}\right)\).
Elimination of \(\mathbf{x}_{b}\)
To find the solution of problem Eq. 7 we begin by solving it for the \(\mathbf{x}_{b}\) vector. This is an unconstrained quadratic problem and an analytical solution can be found. We shall first write down the gradient of \(\mathcal {J}\) with respect to \(\mathbf{x}_{b}\):
$$ \nabla_{\mathbf{x}_{b}}\mathcal{J} = \left(\mathbf{I}_{b} + \mu \mathbf{D}^{t}\mathbf{D}\right) \mathbf{x}_{b} - \left(\mathbf{y} - \mathbf{L}\mathbf{x}_{p}\right) $$
(8)
Solving \(\nabla _{\mathbf {x}_{b}} \mathcal {J} =0\) yields the minimizer:
$$\begin{array}{*{20}l} \widehat{\mathbf{x}}_{b} &=\,\,\left(\mathbf{I}_{b}+\mu \mathbf{D}^{t}\mathbf{D}\right)^{-1}\left(\mathbf{y}-\mathbf{L}\mathbf{x}_{p}\right) \\ &=\,\, \mathbf{B}_{\mu}^{-1}\left(\mathbf{y}-\mathbf{L}\mathbf{x}_{p}\right) \end{array} $$
(9)
where \(\mathbf {B}_{\mu }=\mathbf {I}_{b}+\mu \mathbf {D}^{t}\mathbf {D}\). This operation is always possible since \(\mathbf{B}_{\mu}\) is invertible (sum of the identity matrix and a positive semi-definite matrix).
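A direct way to implement Eq. 9 is to solve the sparse symmetric positive-definite system \(\mathbf{B}_{\mu}\mathbf{x}_{b}=\mathbf{y}-\mathbf{L}\mathbf{x}_{p}\) rather than forming the inverse; the following sketch (our own, using SciPy's sparse solver) illustrates this.

```python
# Baseline update of Eq. 9: x_b = B_mu^{-1} (y - L x_p), computed by a sparse
# solve of the SPD system B_mu x_b = y - L x_p (never forming the inverse).
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def baseline_from_peaks(y, L, x_p, mu, D):
    n = len(y)
    B_mu = (sp.eye(n) + mu * (D.T @ D)).tocsc()   # B_mu = I + mu D^t D
    return spla.spsolve(B_mu, y - L @ x_p)
```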
It is then possible to substitute \(\widehat {\mathbf {x}}_{b}\) for \(\mathbf{x}_{b}\) in objective Eq. 5 to obtain a reduced objective. After some algebra (the proof is given in Appendix “Derivation of reduced objective”), we have
$$\widetilde{\mathcal{J}}(\mathbf{x}_{p}) = \mathcal{J}\left(\widehat{\mathbf{x}}_{b},\mathbf{x}_{p}\right) + C $$
where C is a constant, and \(\widetilde {\mathcal {J}}(\mathbf {x}_{p})\) is defined by:
$$\begin{array}{@{}rcl@{}} \widetilde{\mathcal{J}}(\mathbf{x}_{p}) & = & \frac{1}{2} \mathbf{x}_{p}^{t} \, \left(\mathbf{L}^{t}\mathbf{A}_{\mu} \mathbf{L} + \lambda_{2} \mathbf{I}_{b} \right)\, \mathbf{x}_{p} \\ & & - \mathbf{x}_{p}^{t} \, \left(\mathbf{L}^{t}\mathbf{A}_{\mu} \mathbf{y} - \lambda_{1} \mathbb{1}_{n}\right) \end{array} $$
(10)
The matrix \(\mathbf{A}_{\mu}\) is defined by:
$$ \mathbf{A}_{\mu} = \mathbf{I}_{b}-\mathbf{B}_{\mu}^{-1} = \mathbf{I}_{b}-\left(\mathbf{I}_{b}+\mu \mathbf{D}^{t}\mathbf{D}\right)^{-1} $$
(11)
and its interpretation is discussed in detail in the “Analysis of the \(\mathbf{A}_{\mu}\) matrix” section.
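Because \(\mathbf{B}_{\mu}\) is sparse and banded, \(\mathbf{A}_{\mu}\) never needs to be formed explicitly: its action on a vector can be computed as \(\mathbf{v}-\mathbf{B}_{\mu}^{-1}\mathbf{v}\). A short sketch, under the same assumptions as above:

```python
# Apply A_mu = I - B_mu^{-1} (Eq. 11) to a vector without building the matrix.
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def apply_A_mu(v, mu, D):
    n = len(v)
    B_mu = (sp.eye(n) + mu * (D.T @ D)).tocsc()
    return v - spla.spsolve(B_mu, v)
```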
This quadratic form \(\widetilde {\mathcal {J}}(\mathbf {x}_{p})\) is our objective function and the solution \(\widehat {\mathbf {x}}_{p}\) is its minimizer subject to positivity:
$$ \widehat{\mathbf{x}}_{p} = \underset{\mathbf{x}_{p}\ge 0}{\mathrm{arg\,min}} \widetilde{\mathcal{J}}(\mathbf{x}_{p}) $$
(12)
Once this constrained quadratic problem is solved, we can retrieve \(\widehat {\mathbf {x}}_{b}\) using Eq. 9:
$$\widehat{\mathbf{x}}_{b} = \mathbf{B}_{\mu}^{-1} \left(\mathbf{y}-\mathbf{L} \, \widehat{\mathbf{x}}_{p}\right) $$
then the initial problem Eq. 7 is solved.
The gradient of \(\widetilde {\mathcal {J}}(\mathbf {x}_{p})\) is easily deduced from Eq. 10 and reads:
$$ \nabla_{\mathbf{x}_{p}}\widetilde{\mathcal{J}} = \left(\mathbf{L}^{t}\mathbf{A}_{\mu} \mathbf{L} + \lambda_{2} \mathbf{I}_{p}\right) \mathbf{x}_{p} - \left(\mathbf{L}^{t}\mathbf{A}_{\mu} \mathbf{y} - \lambda_{1} \mathbb{1}_{p}\right) $$
(13)
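The dedicated optimization procedure of the paper is described in a later section. Purely as an illustration of how Eqs. 12 and 13 fit together, here is a generic projected-gradient sketch; the step size, iteration count and function names are our assumptions, not the paper's solver.

```python
# Generic projected-gradient sketch for Eq. 12, using the gradient of Eq. 13.
# This is NOT the paper's solver, only an illustration of the reduced problem.
import numpy as np

def solve_reduced(L, A_mu_apply, y, lam1, lam2, n_iter=2000):
    n = L.shape[1]
    x_p = np.zeros(n)
    A_y = A_mu_apply(y)                              # A_mu y, computed once
    # crude step size: ||L^t A_mu L + lam2 I|| <= ||L||_1 ||L||_inf + lam2,
    # since the eigenvalues of A_mu lie in [0, 1)
    Labs = abs(L)
    step = 1.0 / (Labs.sum(axis=0).max() * Labs.sum(axis=1).max() + lam2)
    for _ in range(n_iter):
        grad = L.T @ A_mu_apply(L @ x_p) + lam2 * x_p - (L.T @ A_y - lam1)
        x_p = np.maximum(x_p - step * grad, 0.0)     # gradient step + projection
    return x_p
```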
Interpretation of the reduced criterion
We can pause here to interpret Eq. 12. If there were no baseline, a natural way to deconvolve the spectrum would be to solve:
$$\underset{\mathbf{x}_{p}\ge0}{\mathrm{arg\,min}} \frac{1}{2}\big\|\mathbf{y} - \mathbf{L}\mathbf{x}_{p}\big\|^{2}_{2} + \lambda_{1} \big\|\mathbf{x}_{p}\big\|_{1} $$
If we use the positivity constraint and expand this objective function we get the following equivalent problem:
$$ \underset{\mathbf{x}_{p}\ge0}{\mathrm{arg\,min}} \frac{1}{2}\mathbf{x}_{p}^{t} \mathbf{L}^{t}\mathbf{L}\mathbf{x}_{p} + \mathbf{x}_{p}^{t}\left(\lambda_{1} \mathbb{1}_{p} - \mathbf{L}^{t}\mathbf{y}\right) $$
(14)
Now if we look at Eq. 10 and set λ2=0 we get:
$$ \underset{\mathbf{x}_{p}\ge0}{\mathrm{arg\,min}} \frac{1}{2}\mathbf{x}_{p}^{t} \mathbf{L}^{t}\mathbf{A}_{\mu} \mathbf{L}\mathbf{x}_{p}+\mathbf{x}_{p}^{t}(\lambda_{1} \mathbb{1}_{p} - \mathbf{L}^{t}\mathbf{A}_{\mu} \mathbf{y}) $$
(15)
The two equations are very similar, apart from the fact that the operator \(\mathbf{A}_{\mu}\) is applied to the reconstructed peaks \(\mathbf{L}\mathbf{x}_{p}\) and to the raw spectrum y. It could be interpreted as the precision (inverse of the covariance) matrix in a correlated noise framework. Instead of this classical interpretation, and to better understand its role in our deconvolution context, we study the behavior of \(\mathbf{A}_{\mu}\) in the two limiting cases \(\mu \rightarrow 0\) and \(\mu \rightarrow \infty \).
Analysis of the \(\mathbf{A}_{\mu}\) matrix
Figure 1 shows the column \(j=\lfloor {\frac {n}{2}}\rfloor \) of \(\mathbf{A}_{\mu}\) for two different values of μ.
For small μ, the \(\mathbf{A}_{\mu}\) operator acts like a regularized second-order derivative. For a sufficiently small μ (such that the spectral radius satisfies \(\rho (\mu \mathbf {D}^{t}\mathbf {D})<1\)), a Taylor expansion gives:
$$\begin{array}{*{20}l} \mathbf{A}_{\mu} &\, =\,\, \mathbf{I}_{b}-\left(\mathbf{I}_{b}+\mu \mathbf{D}^{t}\mathbf{D}\right)^{-1} \\ &\, =\,\, \mathbf{I}_{b}-\left(\mathbf{I}_{b}-\mu \mathbf{D}^{t}\mathbf{D}+o(\mu)\right) \\ &\, =\,\, \mu \mathbf{D}^{t}\mathbf{D} +o(\mu) \end{array} $$
Hence, compared to Eq. 14, one can interpret Eq. 15 as a usual deconvolution applied to the second-order derivative \(\mathbf{A}_{\mu}\mathbf{y}\) of the initial spectrum y. The point spread function used is likewise the second-order derivative \(\mathbf{A}_{\mu}\mathbf{L}\) of the initial peak shape p introduced in Eq. 1. The overall effect of the \(\mathbf{A}_{\mu}\) operator is to cancel the slowly varying component of the signal. In other terms, the baseline is removed by differentiation.
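A quick numerical check of this small-μ behavior (our own, with dense matrices and an illustrative size):

```python
# Check that A_mu is close to mu * D^t D when mu is small (error of order mu^2).
import numpy as np

n, mu = 200, 1e-4
D = np.diff(np.eye(n), axis=0)             # dense (n-1) x n difference matrix
A_mu = np.eye(n) - np.linalg.inv(np.eye(n) + mu * D.T @ D)
print(np.abs(A_mu - mu * D.T @ D).max())   # of order mu**2, i.e. negligible
```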
The regularization strength increases as the baseline parameter μ increases. In the limit \(\mu \rightarrow \infty \) (proof given in Appendix “Expression of \( \underset{\mu \to \infty }{\lim }{\mathbf{A}}_{\mu } \)”) we get:
$$\lim\limits_{\mu\rightarrow\infty} \left[\mathbf{A}_{\mu}\right]_{i,j} = \delta_{i,j}-\frac{1}{n} $$
When applied to any vector y we get
$$\lim\limits_{\mu\rightarrow\infty} \mathbf{A}_{\mu}\mathbf{y} = \mathbf{y} - \bar{\mathbf{y}}\mathbb{1}_{n} $$
Thus the action of \(\mathbf{A}_{\mu}\) centers the signal y by subtracting the constant vector \(\bar {\mathbf {y}}\mathbb {1}_{n}\), where the scalar \(\bar {\mathbf {y}}\) is the mean value of the components of y. The same centering holds for the peak shape function p. It follows that in this limiting case the solved problem is the usual deconvolution procedure applied to the centered spectrum. The computed baseline is simply a constant equal to the spectrum mean.
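And the corresponding check of the large-μ limit, again with dense matrices for illustration only:

```python
# For large mu, A_mu y approaches the centered signal y - mean(y).
import numpy as np

n, mu = 200, 1e8
D = np.diff(np.eye(n), axis=0)
A_mu = np.eye(n) - np.linalg.inv(np.eye(n) + mu * D.T @ D)
y = np.random.default_rng(2).standard_normal(n)
print(np.abs(A_mu @ y - (y - y.mean())).max())   # shrinks as mu grows
```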
Debiasing
The l1 penalty acts as a soft threshold to select peaks ([29], Section 10). This leads to a bias in peak intensity estimation: the intensities are artificially reduced as the l1 penalty increases. We use the ideas introduced in [28] to obtain corrected peak intensities. This yields a two-stage resolution procedure: the first stage selects the peaks, the second corrects their intensities.
First stage: peak support selection
Given μ, λ1 and λ2, we solve the minimization problem Eq. 12. The obtained solution \(\widehat {\mathbf {x}}_{p}\) is a sparse vector containing peak intensities. These intensities are biased whenever the λ hyper-parameters are nonzero. However, we can use \(\widehat {\mathbf {x}}_{p}\) to define the peak support Ω. In the present work the peak support is simply defined by keeping the local maxima of \(\widehat {\mathbf {x}}_{p}\):
$$\begin{array}{@{}rcl@{}} \Omega & = & \left\{ i, \left(\left(\widehat{\mathbf{x}}_{p}[\!i]>\widehat{\mathbf{x}}_{p}[\!i-1]\right)\wedge\left(\widehat{\mathbf{x}}_{p}[\!i]\ge\widehat{\mathbf{x}}_{p}[\!i+1]\right)\right)\vee \right. \\ & & \left.\left(\left(\widehat{\mathbf{x}}_{p}[\!i]\ge\widehat{\mathbf{x}}_{p}[\!i-1]\right)\wedge\left(\widehat{\mathbf{x}}_{p}[\!i]>\widehat{\mathbf{x}}_{p}[\!i+1]\right)\right) \right\} \end{array} $$
(16)
The peak intensities will be corrected in the second stage, but their final positions are defined by condition Eq. 16. More complex procedures can be used to find the peak support, such as the one presented in [22] (Post-processing and thresholding). These more refined methods can be introduced in a straightforward way in our approach by using them in place of the basic condition Eq. 16.
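A direct implementation of condition Eq. 16 is straightforward; in the sketch below (our code) the two boundary channels are handled by zero padding, which is an assumption not specified in the text.

```python
# Peak support of Eq. 16: indices that are local maxima of x_p_hat
# (strict on at least one side). Boundaries are zero-padded (assumption).
import numpy as np

def peak_support(x_p_hat):
    v = np.pad(x_p_hat, 1)                 # zero padding at both ends
    left, mid, right = v[:-2], v[1:-1], v[2:]
    keep = ((mid > left) & (mid >= right)) | ((mid >= left) & (mid > right))
    return np.flatnonzero(keep)
```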
Second stage: peak intensity correction
In the second stage, since the peak support Ω is known, we no longer need the l1 regularization. Hence we simply drop it and solve the simplified optimization problem:
$$\begin{array}{*{20}l} \mathbf{x}_{p} &\,=\,\, {\mathrm{arg\,min}} \frac{1}{2} \mathbf{x}_{p}^{t} \mathbf{L}^{t}\mathbf{A}_{\mu} \mathbf{L} \mathbf{x}_{p} - \mathbf{x}_{p}^{t} \mathbf{L}^{t}\mathbf{A}_{\mu} \mathbf{y} \\ & \mathtt{s.t.} \quad \left\{\begin{array}{rcl} \mathbf{x}_{p}[\!k]=0 & \text{\texttt{if}} & k\notin\Omega \\ \mathbf{x}_{p}[\!k]\ge0 & \text{\texttt{if}} & k\in\Omega \end{array} \right. \end{array} $$
(17)
Despite its more complex appearance, this problem is no more complicated than Eq. 12. Solving it corrects the peak intensities by removing the bias induced by the previously used l1 penalty.
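In code, the second stage amounts to restricting L to the columns indexed by Ω and solving the same kind of nonnegative quadratic problem without the l1 and l2 terms. The sketch below reuses the generic projected-gradient scheme introduced earlier (again an illustration, not the paper's solver).

```python
# Debiasing stage of Eq. 17: nonnegative fit on the fixed support, no penalties.
import numpy as np

def debias_on_support(L, A_mu_apply, y, support, n_iter=2000):
    L_s = L[:, support]                    # keep only the selected peak columns
    A_y = A_mu_apply(y)
    Labs = abs(L_s)
    step = 1.0 / (Labs.sum(axis=0).max() * Labs.sum(axis=1).max())
    z = np.zeros(len(support))
    for _ in range(n_iter):
        grad = L_s.T @ A_mu_apply(L_s @ z) - L_s.T @ A_y
        z = np.maximum(z - step * grad, 0.0)
    x_p = np.zeros(L.shape[1])
    x_p[support] = z                       # components outside Omega stay at 0
    return x_p
```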
Boundary conditions
There is a possible improvement of the method concerning boundary conditions. As Eq. 2 suggests, a strong μ penalty forces the baseline to be constant. In Fig. 2 we see that this phenomenon is especially present at the domain boundaries.
We solved this problem by imposing baseline values at the boundaries. This corrected solution is also shown in Fig. 2, and we can see that it no longer suffers from boundary effects. Appendix “Boundary correction”, page 11, provides all the details on how to modify Eqs. 9 and 13 to introduce constraints on the baseline values \(\mathbf{x}_{b}\). These modified equations constitute our final model formulation.
Final model formulation
After boundary correction, the modified Eq. 9 is
$$ \mathbf{x}_{b} = \widetilde{\mathbf{B}}_{\mu}^{-1}\left(\widetilde{\mathbf{y}}-\mathbf{L}\mathbf{x}_{p}\right) $$
(18)
and the modified Eq. 10 is
$$\begin{array}{*{20}l} \widetilde{\mathcal{J}}_{2}(\mathbf{x}_{p}) =\quad & \frac{1}{2} \mathbf{x}_{p}^{t}\left(\lambda_{2} \mathbf{I}_{p}+ \mathbf{L}^{t}\widetilde{\mathbf{A}}_{\mu} \mathbf{L}\right) \mathbf{x}_{p}+ \\ & \, \mathbf{x}_{p}^{t}\left(\lambda_{1} \mathbf{\mathbb{1}}_{p} - \mathbf{L}^{t}\widetilde{\mathbf{A}}_{\mu}\widetilde{\mathbf{y}} - \mathbf{L}^{t} (\mathbf{y}-\widetilde{\mathbf{y}})\right) \end{array} $$
(19)
The solution \(\widehat {\mathbf {x}}_{p}\) is the unique minimizer of \(\widetilde {\mathcal {J}}_{2}\) subject to the positivity constraint:
$$ \widehat{\mathbf{x}}_{p} = \underset{\mathbf{x}_{p}\ge 0}{\mathrm{arg\,min}} \widetilde{\mathcal{J}}_{2}(\mathbf{x}_{p}) $$
(20)
The gradient is obtained by a straightforward computation:
$$ \nabla_{\mathbf{x}_{p}} \widetilde{\mathcal{J}}_{2} = \left(\lambda_{2} \mathbf{I}_{p}+ \mathbf{L}^{t}\widetilde{\mathbf{A}}_{\mu} \mathbf{L}\right) \mathbf{x}_{p} + \lambda_{1} \mathbf{\mathbb{1}}_{p} - \mathbf{L}^{t}\widetilde{\mathbf{A}}_{\mu}\widetilde{\mathbf{y}} - \mathbf{L}^{t} (\mathbf{y}-\widetilde{\mathbf{y}}) $$
(21)
As before, \(\widehat {\mathbf {x}}_{b}\) is computed from \(\widehat {\mathbf {x}}_{p}\) using Eq. 18. The explicit forms of \(\widetilde {\mathbf {B}}_{\mu }\), \(\widetilde {\mathbf {y}}\) and \(\widetilde {\mathbf {A}}_{\mu }\) are given in Appendix “Boundary correction”, respectively by Eqs. 31, 32 and 33.
Algorithm summary
For ease of reading, Algorithm 1 recapitulates the main steps of the proposed method in its final formulation. The optimization procedure used to efficiently solve the two minimization problems will be described in detail in the next section.
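To complement Algorithm 1, here is how the sketches above could be chained into a single two-stage driver. It omits the boundary correction of Eqs. 18-21, whose ingredients are defined in the appendix, and every function name refers to the illustrative code given earlier rather than to the paper's implementation.

```python
# Illustrative two-stage driver chaining the previous sketches (no boundary
# correction; the appendix quantities of Eqs. 18-21 are not reproduced here).
def deconvolve_spectrum(y, p, mu, lam1, lam2=0.0):
    n = len(y)
    D = difference_matrix(n)
    L = convolution_matrix(p, n).tocsc()   # CSC format allows column selection
    A_mu_apply = lambda v: apply_A_mu(v, mu, D)

    # Stage 1: sparse positive solution (Eq. 12) and its peak support (Eq. 16)
    x_p_sparse = solve_reduced(L, A_mu_apply, y, lam1, lam2)
    support = peak_support(x_p_sparse)

    # Stage 2: debiased intensities on the fixed support (Eq. 17)
    x_p = debias_on_support(L, A_mu_apply, y, support)

    # Baseline recovered from Eq. 9
    x_b = baseline_from_peaks(y, L, x_p, mu, D)
    return x_p, x_b
```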