In the first two subsections, we briefly describe the original Bayesian Piecewise Constant Regression method, explaining the hypotheses of the model and the estimation of its parameters via Bayesian regression. We emphasize the definitions of the original parameter estimators in order to show how we modify some of them in the later subsections.

A brief explanation of the dynamic program used to compute the estimators can be found in Additional file 2 (more details can be found in [11, 12]).

Regarding notation, we do not explicitly indicate the random variable to which a distribution refers when it is clear from the context. For example, *p*_{K}(*k*) ≡ *p*(*k*) or *f*_{Y, M}(*y*, *μ*) ≡ *f*(*y*, *μ*).

### Hypotheses of the model

Let *Y* ∊ ℝ^{n} be a random vector, such that each component (called *data point* or, if *Y* represents a quantity measured on part of the genome, *probe*) is conditionally normally distributed:

Y_i \mid \tilde{\mu}_i^0, \sigma^2 \;\sim\; \mathcal{N}(\tilde{\mu}_i^0, \sigma^2).

Suppose also that *Y* represents a noisy observation of a piecewise constant function, which consists of *k*_{0} horizontal segments. Then the segment level at a generic position *i* (\tilde{\mu}_i^0) does not assume a different value for each *i*; instead, the data are divided into *k*_{0} intervals (with boundaries 0 = t_0^0 < t_1^0 < \cdots < t_{k_0-1}^0 < t_{k_0}^0 = n) where \tilde{\mu}_{t_{q-1}^0+1}^0 = \cdots = \tilde{\mu}_{t_q^0}^0 =: \mu_q^0 for each *q* = 1,...,*k*_{0}. Hence, \mu_q^0 represents the level of the *q*^{th} segment. Our goal is to estimate the levels \mu^0 = (\mu_1^0, \ldots, \mu_{k_0}^0) of all the segments. To do that, we first need to estimate the number of segments *k*_{0} and the partition of the data *t*^{0}. From a Bayesian point of view, **μ**^{0}, **t**^{0} and *k*_{0} are treated as random variables, hence we denote them with the corresponding upper case letters (**M**, *T* and *K*). Moreover, because of their randomness, we need to define a prior distribution for each of them to complete the model.
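These hypotheses can be illustrated with a minimal simulation sketch; the values of *k*_{0}, the boundaries and the hyper-parameters below are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters, for illustration only
n, k0 = 100, 3                      # number of data points and of segments
t = np.array([0, 30, 70, n])        # boundaries: 0 = t_0 < t_1 < t_2 < t_3 = n
nu, rho2, sigma2 = 0.0, 1.0, 0.04   # mean/variance of the levels, noise variance

# Segment levels M_q ~ N(nu, rho2), i.i.d.
mu = rng.normal(nu, np.sqrt(rho2), size=k0)

# Per-position means: mu_tilde_i = mu_q for t_{q-1} < i <= t_q
mu_tilde = np.repeat(mu, np.diff(t))

# Observations Y_i | mu_tilde_i, sigma2 ~ N(mu_tilde_i, sigma2)
y = rng.normal(mu_tilde, np.sqrt(sigma2))
```

The vector `mu_tilde` is piecewise constant with *k*_{0} distinct levels, and `y` is the noisy observation of it.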

For the number of segments and the boundaries, we assume noninformative prior distributions:

p(k) = \frac{1}{k_{\max}}, \qquad k \in \mathbb{K}

p(\mathit{t} \mid k) = \binom{n-1}{k-1}^{-1}, \qquad \mathit{t} \in \mathbb{T}_{k,n},

where \mathbb{K} = {1,...,*k*_{max}} and \mathbb{T}_{k,n} is the subspace of \mathbb{N}_0^{k+1} such that *t*_{0} = 0, *t*_{k} = *n* and *t*_{q} ∊ {1,...,*n* - 1} for all *q* = 1,...,*k* - 1, in an ordered way and without repetitions.

For **M**, we assume that all its components are mutually independent and identically normally distributed,

\mathbf{M} \mid \nu, \rho^2, K = k \;\sim\; \mathcal{N}(\boldsymbol{\nu}, \rho^2 \mathbb{I}),

where \boldsymbol{\nu} ∊ ℝ^{k}, such that *ν*_{q} = *ν* for each *q* = 1,...,*k*, and \mathbb{I} ∊ ℝ^{k × k}, such that \mathbb{I}_{p,q} = *δ*_{p,q} for each *p*, *q* = 1,...,*k*.

Instead of these assumptions, we could assume a Cauchy distribution for each *Y*_{i} or M_{q} in order to model an observation whose noise has heavier tails, as previously done by Hutter [11, 12].

### Original estimation: the BPCR method

The statistical procedure consists of a sequence of estimations, owing to the relationships among the parameters.

BPCR estimates the number of segments with the MAP (Maximum A Posteriori) estimate given the sample point *y*,

\widehat{k}:=\underset{k\in \mathbb{K}}{\mathrm{arg}\mathrm{max}}p(k|\mathit{y}),

and, given \widehat{k}, also each boundary is estimated separately with its corresponding MAP estimate,

{\widehat{t}}_{p}:=\underset{h\in \{p,\mathrm{...},n-(\widehat{k}-p)\}}{\mathrm{arg}\mathrm{max}}\text{P}({T}_{p}=h|\mathit{y},\widehat{k})

for all *p* = 1,...,\widehat{k} - 1. Finally, the *r*^{th} moment of the level of the *m*^{th} segment is estimated with its posterior mean. Since its computation requires the number of segments and the partition of the data, we replace them with their estimates,

{\widehat{\mu}}_{m}^{r}:=E[{\text{M}}_{m}^{r}|\mathit{y},\widehat{\mathit{t}},\widehat{k}]

for all *m* = 1,...,\widehat{k}. When *r* = 1 and we assume that *Y* and **M** are normally distributed, the estimate turns out to be

{\widehat{\mu}}_{m}=\frac{{\rho}^{2}{\displaystyle {\sum}_{i={t}_{m-1}+1}^{{t}_{m}}{y}_{i}+{\sigma}^{2}\nu}}{({t}_{m}-{t}_{m-1}){\rho}^{2}+{\sigma}^{2}},

for all *m* = 1,...,\widehat{k}. When the sample contains only one segment, the Bayesian estimation of the posterior distribution of the levels should theoretically lead to a normal distribution close to a Dirac delta function centered at \widehat{\nu}, since the levels can assume only one value given the data. In fact, in this case, if we estimate *ρ*^{2} only from the data (without using any prior or other information), then this value will be close to zero (the variance of a constant random variable) and so the level will be estimated with \widehat{\nu}, the mean of the data (see Equation (9)).
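As a sketch, assuming the partition and the hyper-parameters are given, the posterior-mean level above can be computed as follows (the function name is hypothetical):

```python
import numpy as np

def level_posterior_mean(y, t, nu, rho2, sigma2):
    """Posterior mean of each segment level, given the boundaries
    t = (t_0 = 0, ..., t_k = n) and the hyper-parameters nu, rho2, sigma2."""
    y = np.asarray(y, dtype=float)
    mu_hat = np.empty(len(t) - 1)
    for m in range(1, len(t)):
        seg = y[t[m - 1]:t[m]]                      # data points of segment m
        mu_hat[m - 1] = (rho2 * seg.sum() + sigma2 * nu) / \
                        (len(seg) * rho2 + sigma2)
    return mu_hat
```

When *ρ*^{2} is large relative to *σ*^{2}, the estimate approaches the sample mean of the segment; when *ρ*^{2} → 0 it shrinks towards *ν*, consistent with the degenerate one-segment case discussed above.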

We can estimate the segment level {\tilde{\text{M}}}_{s} at a generic position *s*, using the fact that it belongs to some segment *m* and in this segment it is equal to the corresponding M_{m}. Then, summing over all the possible segments, we can compute its posterior distribution in the following way:

f({\tilde{\mu}}_{s}|\mathit{y},K={k}_{0})={\displaystyle \sum _{m=1}^{{k}_{0}}{\displaystyle \sum _{i=0}^{s-1}{\displaystyle \sum _{j=s}^{n}f({\mu}_{m},{T}_{m-1}=i,{T}_{m}=j|\mathit{y},K={k}_{0})}}}

and the corresponding estimate of {\tilde{\text{M}}}_{s} given \widehat{k} is

{\widehat{\tilde{\mu}}}_{s}:=E[{\tilde{\text{M}}}_{s}|\mathit{y},\widehat{k}],

for all *s* = 1,...,*n*. The vector \widehat{\tilde{\mathit{\mu}}} is called *Bayesian Regression Curve* (*BRC*).

The probability distributions defined previously depend on the hyper-parameters *ν*, *ρ*^{2} and *σ*^{2} (respectively, the mean and the variance of the segment levels and the variance of the noise). Hutter [11, 12] suggested the following estimators:

\widehat{\nu} := \frac{1}{n} \sum_{i=1}^{n} Y_i = \overline{Y}

\widehat{\rho}^{\,2} := \frac{1}{n} \sum_{i=1}^{n} (Y_i - \overline{Y})^2

\widehat{\sigma}^{\,2} := \frac{1}{2(n-1)} \sum_{i=1}^{n-1} (Y_{i+1} - Y_i)^2.
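A direct sketch of these three estimators (Equations (9)–(11)); the function name is hypothetical:

```python
import numpy as np

def hyper_estimators(y):
    """Hutter's hyper-parameter estimators: the mean nu and variance rho2 of
    the segment levels, and the noise variance sigma2 (half the mean squared
    first difference of consecutive data points)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    nu_hat = y.mean()
    rho2_hat = ((y - nu_hat) ** 2).sum() / n
    sigma2_hat = ((y[1:] - y[:-1]) ** 2).sum() / (2 * (n - 1))
    return nu_hat, rho2_hat, sigma2_hat
```

The factor 1/2 in `sigma2_hat` reflects that, away from the boundaries, *Y*_{i+1} - *Y*_{i} has variance 2*σ*^{2}.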

### Improved estimators of the number of segments

To understand the real meaning of the MAP estimator \widehat{K}, we need to introduce the theory of the construction of a generic Bayesian estimator.

In general, a Bayesian estimator is defined in the following way. Let us suppose that *Z* is a random variable whose distribution depends on an unknown parameter *θ*, which we want to estimate. Since we cannot exactly know the true value of the parameter, we consider it as a random variable Θ with a given prior probability distribution. In order to measure the goodness of the estimation, we define an error (or *loss function*) and we choose the estimator that minimizes the expected error given the sample *Z*,

\widehat{\Theta}:=\mathrm{arg}\underset{{\theta}^{\prime}}{\mathrm{min}}E[err(\Theta ,{\theta}^{\prime})|\mathit{Z}].

The 0–1 error (defined as 1 - *δ*_{θ, θ'}) is commonly used for a parameter which can assume only a discrete set of values. The estimator corresponding to this error is the MAP estimator,

\arg\min_{\theta'} E[1 - \delta_{\Theta, \theta'} \mid \mathit{Z}] = \arg\max_{\theta'} \sum_{\theta} \delta_{\theta, \theta'}\, p(\theta \mid \mathit{Z}) = \arg\max_{\theta'} p(\theta' \mid \mathit{Z}).

Obviously, different types of errors can lead to different estimators. In the following, we will use \widehat{K} to denote any estimator of *K*, and \widehat{K}_{01} to denote the estimator based on the 0–1 error.

Using the 0–1 error, we give the same penalty to each value different from the true value, whether it is close to or far away from the true one. To take into account the distance of the estimated value from the true one, we need to use other types of errors, which are based on different definitions of distance, such as,

absolute error := |*θ* - *θ'*|

squared error := (*θ* - *θ'*)^{2}.

If the parameter *θ* ∊ ℝ, then the estimators corresponding to these errors are the median and the mean of its posterior distribution, respectively. In our case, we denote these estimators of *k*_{0} with {\widehat{K}}_{1} and {\widehat{K}}_{2}.
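Given a discrete posterior *p*(*k* | *y*) over *k* = 1,...,*k*_{max}, the three estimators can be sketched as follows (function name hypothetical):

```python
import numpy as np

def k_estimators(post):
    """MAP (0-1 error), posterior median (absolute error) and posterior mean
    (squared error) of the number of segments, from p(k | y), k = 1..k_max."""
    post = np.asarray(post, dtype=float)
    k = np.arange(1, len(post) + 1)
    k_01 = int(k[np.argmax(post)])                          # mode (MAP)
    k_1 = int(k[np.searchsorted(np.cumsum(post), 0.5)])     # median
    k_2 = float((k * post).sum())                           # mean (not necessarily integer)
    return k_01, k_1, k_2
```

Note that the posterior mean is real-valued; if an integer number of segments is needed, it must be rounded.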

### Improved estimators of the boundaries

Similarly to the previous subsection, we derive alternative boundary estimators, by considering different types of errors. We denote the MAP boundary estimator defined in Equation (4) with {\widehat{\mathit{T}}}_{01}.

#### Meaning of the estimator {\widehat{\mathit{T}}}_{01}

The estimator {\widehat{\mathit{T}}}_{01} is defined in such a way that each component minimizes the 0–1 error of the corresponding boundary, separately. Explicitly, given the sample point *y* and the segment number *k*_{0}, its estimate is

{\widehat{\mathit{T}}}_{01}=\left(0,\mathrm{arg}\underset{{t}_{1}\in \mathbb{T}}{\mathrm{max}}p({t}_{1}|y,K={k}_{0}),\mathrm{...},\mathrm{arg}\underset{{t}_{{k}_{0}-1}\in \mathbb{T}}{\mathrm{max}}p({t}_{{k}_{0}-1}|y,K={k}_{0}),n\right),

where \mathbb{T} = {1,...,*n* - 1}. {\widehat{\mathit{T}}}_{01} may be regarded as an approximation of the Bayesian estimator that minimizes the error which counts the number of wrongly estimated boundaries:

\text{sum 0--1 error} = \sum_{p=1}^{k_0-1} \left( 1 - \delta_{t_p^0, t_p} \right) = k_0 - 1 - \sum_{p=1}^{k_0-1} \delta_{t_p^0, t_p},

that is

{\widehat{\mathit{T}}}_{\text{sum}}=\underset{\mathit{t}\in {\mathbb{T}}_{{k}_{0},n}}{\mathrm{arg}\mathrm{max}}{\displaystyle \sum _{p=1}^{{k}_{0}-1}p({t}_{p}|K={k}_{0},\mathit{Y}).}

#### Definition of the estimator {\widehat{\mathit{{\rm T}}}}_{\text{joint}}

A problem with the latter estimator (Equation (17)) is its computational complexity, because it requires the computation of all the ordered combinations of the boundaries. On the other hand, there are two reasons why {\widehat{\mathit{T}}}_{01} is not a suitable estimator of the boundaries. First, it does not take into account that the boundaries are dependent (they have to be ordered) and, second, in principle, it can have more than one component with the same value. As a consequence, a theoretically more correct way to estimate the boundaries is to minimize the 0–1 error with respect to the joint boundary probability distribution (we call this error the *joint 0–1 error*). Then, given *k*_{0} and *Y*, the boundary estimator becomes

{\widehat{\mathit{T}}}_{\text{joint}}=\underset{\mathit{t}\in {\mathbb{T}}_{{k}_{0},n}}{\mathrm{arg}\mathrm{max}}p(\mathit{t}|K={k}_{0},\mathit{Y}).

#### Definition of the estimators \widehat{\mathit{{\rm T}}}_{BinErr} and \widehat{\mathit{{\rm T}}}_{BinErrAk}

We must notice that the estimators considered until now have the same length as the true vector of the boundaries. In practice, the number of segments *k*_{0} is unknown, so we should use \widehat{k}. As a consequence, if \widehat{k} differs from *k*_{0}, then, strictly speaking, we cannot minimize the previous types of error because the vectors have different lengths.

A way to solve this issue is to map each boundary vector into a vector \mathit{\tau}\in {\mathbb{R}}_{0}^{n+1} in the following way:

\mathit{t} \mapsto \mathit{\tau} \text{ such that } \tau_i = \begin{cases} 1 & \text{if } \exists p \text{ such that } t_p = i \\ 0 & \text{otherwise}. \end{cases}

We denote with \mathcal{TT}_{k,n} the set of all the possible **τ** with *τ*_{0} = 1, *τ*_{n} = 1 and *k* - 1 of the other components equal to 1.
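The mapping from a boundary vector **t** to its binary representation **τ** is straightforward to sketch:

```python
def to_tau(t, n):
    """Map a boundary vector t = (t_0 = 0, ..., t_k = n) to the binary vector
    tau of length n + 1, with tau_i = 1 iff some boundary t_p equals i."""
    tau = [0] * (n + 1)
    for t_p in t:
        tau[t_p] = 1
    return tau
```

A vector **t** with *k* segments maps to a **τ** with exactly *k* + 1 components equal to 1 (including the fixed ones at positions 0 and *n*).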

Now, for the two new vectors *τ*^{0} and *τ*, we define the following binary error,

\text{binary error} = k_0 - 1 - \langle \mathit{\tau}^0, \mathit{\tau} \rangle = k_0 - 1 - \sum_{i=1}^{n-1} \tau_i^0 \tau_i.

Since the two-norm of the vectors involved is fixed, minimizing (20) is equivalent to minimizing the Euclidean distance between the two vectors, or the sum 0–1 error. Furthermore, error (20) is consistent with the Russell-Rao dissimilarity measure defined on the space of binary vectors. Its corresponding estimator is

{\widehat{\mathit{{\rm T}}}}_{\text{BinErr}}:=\underset{{\tau}^{\prime}\in \mathcal{T}{\mathcal{T}}_{{k}_{0},n}}{\mathrm{arg}\mathrm{min}}\text{E}\left[{k}_{0}-1-{\displaystyle \sum _{i=1}^{n-1}{\tau}_{i}{{\tau}^{\prime}}_{i}}|\mathit{Y},{k}_{0}\right]=\underset{{\tau}^{\prime}\in \mathcal{T}{\mathcal{T}}_{{k}_{0},n}}{\mathrm{arg}\mathrm{max}}\text{E}\left[{\displaystyle \sum _{i=1}^{n-1}{\tau}_{i}{{\tau}^{\prime}}_{i}}|\mathit{Y},{k}_{0}\right].

Since we do not know the real value of *k*_{0}, we should replace it with \widehat{k} to compute Equation (21). In doing so, we could amplify the error of the boundary estimation by adding the error of the segment number estimation. A way to attenuate this issue is to integrate out the number of segments in the conditional expected value. The estimator then becomes

{\widehat{\mathit{{\rm T}}}}_{\text{BinErrAk}}:=\underset{{\tau}^{\prime}\in \mathcal{T}{\mathcal{T}}_{\widehat{k},n}}{\mathrm{arg}\mathrm{max}}\text{E}\left[{\displaystyle \sum _{i=1}^{n-1}{\tau}_{i}{{\tau}^{\prime}}_{i}}|\mathit{Y}\right].
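Since the objective in the last two equations is linear in **τ'** and the number of interior ones is fixed, the argmax is obtained by placing the ones at the interior positions with the largest marginal posterior probabilities P(τ_{i} = 1 | *y*). A sketch, assuming those marginals have already been computed (e.g. by the dynamic program):

```python
import numpy as np

def binerr_boundaries(p_tau, k):
    """Boundary estimate minimizing the binary error: keep the k - 1 interior
    positions i in {1, ..., n-1} with the largest marginal probabilities
    p_tau[i] = P(tau_i = 1 | y), plus the fixed boundaries 0 and n."""
    p_tau = np.asarray(p_tau, dtype=float)            # length n + 1
    n = len(p_tau) - 1
    top = np.argsort(p_tau[1:n])[::-1][:k - 1] + 1    # best interior positions
    return np.concatenate(([0], np.sort(top), [n])).astype(int)
```

The same routine sketches both estimators: with marginals conditioned on \widehat{k} it gives {\widehat{\mathit{T}}}_{BinErr}, and with marginals averaged over *k* it gives {\widehat{\mathit{T}}}_{BinErrAk}.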

#### Improved regression curve

As we saw in the previous subsections, there are cases in which a parameter of interest can be estimated independently of other parameters by integration. The computation of the BRC (see Equations (7) and (8)) suggests averaging also over the number of segments, by considering the posterior probability of {\tilde{\text{M}}}_{s} given only the sample point *y*,

{\widehat{\tilde{\mu}}}_{s}:=E[{\tilde{\text{M}}}_{s}|\mathit{y}].

Unfortunately, the computation of this quantity requires time O({n}^{2}{k}_{\mathrm{max}}^{2}) (see section "The dynamic program" in Additional file 2), which can be a problem for large samples. This new type of {\tilde{\text{M}}}_{s} estimation is referred to as *Bayesian Regression Curve Averaging over k* (*BRCAk*).

The same procedure cannot be applied to the level estimation, because in that case we need to know the partition of the whole interval.

### Properties of the hyper-parameter estimators and definition of new estimators

In order to study the properties of the hyper-parameter estimators defined in Equations (9), (11) and (10), we first need to compute the moments of the data \mathit{Y} \mid \nu, \rho^2, \sigma^2. In the following, we will denote with *n*_{q} the number of data points in the *q*^{th} segment.

First, let us consider only the data which belong to the *q*^{th} segment. From the hypotheses of the model, we know that

Y_j \mid M_q, \sigma^2 \;\sim\; \mathcal{N}(\text{M}_q, \sigma^2), \qquad j = t_{q-1}+1, \ldots, t_q

M_q \mid \nu, \rho^2 \;\sim\; \mathcal{N}(\nu, \rho^2),

hence the marginal distribution of any two data points *Y*_{i} and *Y*_{j} belonging to the *q*^{th} segment is \mathcal{N}(**ν**, Σ), where

\Sigma =\left[\begin{array}{cc}{\sigma}^{2}+{\rho}^{2}& {\rho}^{2}\\ {\rho}^{2}& {\sigma}^{2}+{\rho}^{2}\end{array}\right].

It follows that the covariance between two data points, which belong to the same segment, is

Cov(*Y*_{i}, *Y*_{j} | *ν*, *ρ*^{2}, *σ*^{2}) = *ρ*^{2}, *i* ≠ *j*,

and

*E*[*Y*_{j} | *ν*, *ρ*^{2}, *σ*^{2}] = *ν*

Var(*Y*_{j} | *ν*, *ρ*^{2}, *σ*^{2}) = *σ*^{2} + *ρ*^{2},

for each *j* = 1,...,*n*, independently of the segment to which it belongs.

Furthermore, from the hypotheses of the model, given the segmentation *t*^{0}, data points belonging to different segments are independent.

#### Expected value and variance of the estimator \widehat{\nu}

The estimator of *ν* is defined as \widehat{\nu}=\overline{Y} (see Equation (9)). From Equation (26), we can see that this estimator is unbiased and its variance turns out to be

\text{Var}[\overline{Y}]=\frac{{\sigma}^{2}}{n}+{\rho}^{2}\frac{{\displaystyle {\sum}_{p=1}^{{k}_{0}}{n}_{p}^{2}}}{{n}^{2}}.

Hence, since {\sum}_{p=1}^{k_0} n_p^2 \ge n^2 / k_0, the variance is always at least \frac{\rho^2}{k_0}, even if we use a denser sampling, i.e. if we increase the number of data points in the interval on which we are estimating the piecewise constant function.
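This behaviour can be checked numerically with a small Monte Carlo sketch (the segment sizes and hyper-parameter values below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setting: k0 = 2 equal segments
n_p = np.array([50, 50])
n = n_p.sum()
nu, rho2, sigma2 = 0.0, 1.0, 0.25

# Monte Carlo variance of the sample mean Ybar
reps = 20000
levels = rng.normal(nu, np.sqrt(rho2), size=(reps, len(n_p)))
y = rng.normal(np.repeat(levels, n_p, axis=1), np.sqrt(sigma2))
var_mc = y.mean(axis=1).var()

# Theoretical value from the equation above
var_th = sigma2 / n + rho2 * (n_p ** 2).sum() / n ** 2
```

Here `var_th` = 0.5025, which exceeds *ρ*^{2}/*k*_{0} = 0.5 and stays above it no matter how many probes each segment contains.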

#### New definition of the estimator \widehat{{\sigma}^{2}} and its expected value

A circular version of the *σ*^{2} estimator defined in Equation (11) is

\widehat{{\sigma}^{2}}:=\frac{1}{2n}{\displaystyle \sum _{i=1}^{n}{({Y}_{i+1}-{Y}_{i})}^{2}},

where *Y*_{n+1}:= *Y*_{1}. Using the values of the moments of the data points, its expected value is

E[\widehat{\sigma^2}] = \sigma^2 + \rho^2 \frac{k_0}{n} \mathbb{I}_{\{k_0 \ge 2\}},

where we considered two cases in the computation: (a) when *k*_{0} = 1, *Y*_{1} and *Y*_{n} belong to the same segment (thus they are dependent); (b) when *k*_{0} ≥ 2, we supposed that the first and the last segments have different levels, so that *Y*_{1} and *Y*_{n} are independent. If the first and the last segments had the same level, then the two segments would be joined together and thus *Y*_{1} and *Y*_{n} would be dependent. In this case, the expected value would have the same formula but with *k*_{0} - 1 instead of *k*_{0}, since the number of segments would be *k*_{0} - 1. In any case, for *k*_{0} = 1 the estimator \widehat{{\sigma}^{2}} is unbiased, while for *k*_{0} ≠ 1 with *k*_{0} ≪ *n* it is almost unbiased.
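A sketch of this circular estimator:

```python
import numpy as np

def sigma2_circular(y):
    """Circular sigma^2 estimator: half the mean squared difference of
    consecutive data points, with Y_{n+1} := Y_1."""
    y = np.asarray(y, dtype=float)
    d = np.roll(y, -1) - y    # (Y_2 - Y_1, ..., Y_n - Y_{n-1}, Y_1 - Y_n)
    return (d ** 2).sum() / (2 * len(y))
```

Only the differences that cross a segment boundary involve two different levels, which is where the small *ρ*^{2} *k*_{0}/*n* bias term comes from.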

#### Expected value of the estimator \widehat{{\rho}^{2}}

The expected value of the estimator \widehat{{\rho}^{2}} (defined in Equation (10)) is

E[\widehat{{\rho}^{2}}]={\sigma}^{2}\left(1-\frac{1}{n}\right)+{\rho}^{2}\left(1-\frac{{\displaystyle {\sum}_{p=1}^{{k}_{0}}{n}_{p}^{2}}}{{n}^{2}}\right).

Note that when *k*_{0} = 1 (i.e. there is only one segment), E[\widehat{{\rho}^{2}}]={\sigma}^{2}\left(1-\frac{1}{n}\right). In this degenerate case, the variance of the segment levels *ρ*^{2} should be estimated as zero, but \widehat{{\rho}^{2}} estimates it with the variance of the data points.

Moreover, since {\displaystyle {\sum}_{p=1}^{{k}_{0}}{n}_{p}^{2}}\ge n (the equality holds only when *k*_{0} = *n*), we obtain that

{\sigma}^{2}\left(1-\frac{1}{n}\right)\le E[\widehat{{\rho}^{2}}]\le \left(1-\frac{1}{n}\right)\left({\sigma}^{2}+{\rho}^{2}\right).

Hence, if *n* is large the expected value is between *σ*^{2} and *σ*^{2} + *ρ*^{2}, so that, if *ρ*^{2} ≪ *σ*^{2}, the estimator is almost unbiased for *σ*^{2} (instead of *ρ*^{2}).

#### Definition of an alternative estimator of *ρ*^{2}: \widehat{{\rho}_{1}^{2}}

Since the covariance between data points belonging to the same segment is *ρ*^{2}, we can try to use a circular version of the estimator of the lag-one autocovariance of a stationary time series,

\widehat{{\rho}_{1}^{2}}:=\frac{1}{n}{\displaystyle \sum _{i=1}^{n}({Y}_{i}-\overline{Y})({Y}_{i+1}-\overline{Y}),}

where *Y*_{n+1}:= *Y*_{1}. The expected value of the estimator turns out to be

E[\widehat{\rho_1^2}] = \begin{cases} -\frac{\sigma^2}{n} & \text{if } k_0 = 1 \\ \frac{\rho^2}{n} \left( n - k_0 - \frac{\sum_{p=1}^{k_0} n_p^2}{n} \right) - \frac{\sigma^2}{n} & \text{if } k_0 \ge 2. \end{cases}

In the computation we considered two cases: *k*_{0} = 1 and *k*_{0} ≥ 2. When *k*_{0} = 1, *Y*_{1} and *Y*_{n} belong to the same segment and so they are dependent; when *k*_{0} ≥ 2, we suppose that the first and the last segments do not have the same level and so *Y*_{1} and *Y*_{n} are independent. If *k*_{0} ≥ 2 and the first and the last segments had the same level (an event with very low probability), then they would be joined together and so *Y*_{1} and *Y*_{n} would be dependent. In this case, the expected value of the estimator would have the same formula, but with *k*_{0} - 1 instead of *k*_{0}.

We can observe that, when *k*_{0} = 1, the expected value is negative, while, when *k*_{0} ≥ 2, it can be negative or positive. Moreover, the coefficient of *σ*^{2} is -\frac{1}{n}, so this term contributes little to the bias of the estimator.

The expected value can be negative because the estimator is a generic estimator of a covariance, which in general can be negative. To prevent a negative estimate, we can take its absolute value. In this way, when the quantity in (33) is negative, we use the same estimator but with the sign changed in one of the factors of each product, \widehat{{\rho}_{1}^{2}}=\frac{1}{n}{\sum}_{i=1}^{n}({Y}_{i}-\overline{Y})(\overline{Y}-{Y}_{i+1}), so the meaning of the estimator is unchanged. We are interested only in the absolute value of the estimate and not in its sign: we already know that the correlation is positive, and the negativity of the estimate is due only to the properties of the estimator. Our final definition of the estimator is then

\widehat{{\rho}_{1}^{2}}:=\frac{1}{n}\left|{\displaystyle \sum _{i=1}^{n}({Y}_{i}-\overline{Y})({Y}_{i+1}-\overline{Y})}\right|.
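The final estimator can be sketched as the absolute value of the circular lag-one autocovariance:

```python
import numpy as np

def rho2_autocov(y):
    """Final rho_1^2 estimator: absolute value of the circular lag-one
    autocovariance of the data, with Y_{n+1} := Y_1."""
    y = np.asarray(y, dtype=float)
    c = y - y.mean()
    return abs((c * np.roll(c, -1)).sum()) / len(y)
```

By construction the estimate is nonnegative, as required for a variance.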