We first briefly review Fisher's test for the detection of periodic time series. We then generalise the test for different robust spectral estimators and finally we modify the test for the detection of a specific frequency.

#### Testing of all frequencies

Suppose we have a power spectral estimate

*I*(

*ω*). If the spectral estimate is obtained by using the classical periodogram (see e.g. [

5,

8] or [

3]) at the (harmonic) normalised frequencies

where

*a* = [

*N*/2] and the brackets denote the integer part of the rational number, then Fisher's

*g*-statistic is defined as the maximum periodogram ordinate divided by the sum of all of the ordinates. Formally

where *q* = [(*N* - 1)/2] and we can analytically find the *p*-values for the statistic under the Gaussian assumption (see e.g. [8] or [6]). Other test statistics have been proposed for periodicity detection as well. For example, Chen [4] used a test called Hubert's C-test to complement the Fisher's test. The C-test tests whether a time series is a Gaussian white noise sequence. However, like Fisher's test, the C-test is based on the standard periodogram. Therefore, the C-test is not robust and even if the underlying sequence is Gaussian, a few outliers (e.g. measurement errors) can lead to incorrect results.

In [

5] we developed a highly robust periodicity detection method that utilises the equivalence between the periodogram and correlogram spectral estimators. Robustness is obtained by replacing the standard autocorrelation estimator with a rank-based alternative. We used this robust spectral estimator instead of the periodogram in Equation (

3). Since analytical results for the

*p*-values of this modified distribution-free test do not exist (as of yet) we used permutation tests and simulated distributions to estimate the

*p*-values. The rank-based method proposed in [

5] does not, however, easily generalise to non-uniform sampling. In practice many biological time series measurements, including microarrays, are conducted at time points that are of biological interest and not necessarily at fixed time intervals. To tackle this problem, we propose to use robust regression based methods in the estimation of the spectral content. This way we can also take non-uniform sampling into account. Further motivation for this approach can be found by recalling that yet another (but still equivalent) formulation of the standard periodogram is obtained by using the frequency representation of a signal

where the last term is omitted if

*N* is odd. Parameters

*a*
_{1i
}and

*a*
_{2j
}now represent the frequency content of a time series signal. Under the commonly invoked i.i.d. Gaussian assumption, the optimal, i.e. minimum variance unbiased (MVU), estimates of the parameters can be obtained by using ordinary least squares regression (or simply inverse) to solve

for [

*a*
_{0}
**a**
_{
1
}
^{
T
}
**a**
_{
2
}
^{
T
}
*a*
_{N/2}]

^{
T
}. where

**y** is the measured time series and again omitting

*a*
_{N/2 }(and the last column in the matrix) if

*N* is odd. Matrices

*A*
_{1} and

*A*
_{2} are defined as

We should also verify that the above design matrix is invertible (full column rank), which should be the case unless the measurement times *t*
_{
n
}(or frequencies) are "pathological". To elaborate, for any frequency *ω* it is possible to choose *t*
_{
n
}so that, e.g., the cos term in one column of *A*
_{1} is always equal to 1, making that column linearly dependent with the first column vector (vector of ones). It should, however, be unlikely that this happens in practice. Furthermore, given time instants *t*
_{
n
}, we can define the frequencies (perhaps slightly differently if needed) so that the columns are always linerly independent.

By choosing

*ω* according to Equation (

2) and assuming uniform sampling we can estimate the harmonic Fourier coefficients in

**a**
_{
1
}and

**a**
_{
2
}. Evaluating

yields a power spectral estimate *B*(*ω*) that coincides (see [38] p. 395) with the periodogram if the time series sampling is uniform. This explicitly shows that the periodogram can be obtained directly from a least squares estimate and is hence sensitive to different anomalies in data. On the other hand, the above regression based formulation opens up the possibility of developing alternative robust periodicity detection methods (and also robust spectrum estimation, see [30]) that can naturally handle non-uniform sampling. Since the dimensionality of the model matrix in Equation (5) is high and robust regression estimators do not necessarily yield optimum solutions easily for high dimensional models, we propose to find the parameters one frequency at a time. Therefore, for the purposes of robust estimation, we consider the following reduced model**y** = *X*(*ω*)**b** + **e**, (11)

where

**y** is the measured time series vector,

**b** is the unknown parameter vector and

**e** is a residual term. The matrix

*X*(

*ω*) is formed in the following way

where *ω* is defined similarly as in Equation (1) and *t*
_{
n
}is the real-valued index.

To estimate **b** at the different frequencies we propose to use robust regression methods. The different robust regression estimators we chose for evaluation are reviewed in subsection *The different estimators*. It is suggested in [30] that when calculating the coefficients one frequency at a time, the fitted sinusoidal of the latest iteration should be subtracted from the signal **y** (i.e. leaving the residual). This residual is then used in place of **y** when estimating the parameter **b** of the next frequency, thus avoiding the problems caused by the loss of orthogonality. In addition, the order in which the frequencies should be sought is based on an initial spectral estimate, which is obtained by regression of the coefficients without any subtraction. The strategy is then to process the frequencies according to the magnitude of the coefficients, in descending order. To expand, in the case of uniform sampling, sinusoidals at the Fourier frequencies constitute an orthonormal basis. Thus in the least squares case, it is irrelevant in which order we fit the sinusoidals, since they are independent and the resulting spectral estimate will be the same [8]. If we have non-uniform sampling or we use estimators other than least squares we can no longer use this property.

The dependence between the fitted sinusoidals and the problem of overfitting can be corrected, at least to a reasonably good degree [30], by removing the fitted signal and fitting to the residuals. We should also mention that the non-orthogonolity would not be a problem if all the sinusoidals were fitted simultaneously, but this is not a reasonable approach for the robust estimators (due to the problems of high dimensionality and non-convergence.

We can now use the robust power spectral estimate *B*(*ω*) in Equation (3) without the requirement of uniform time series sampling. However, if the time series sampling is not uniform, the notions of Nyquist frequency and harmonic Fourier frequencies are no longer clear. We can still approximate the sampling frequency as if the sampling was uniform, as explained in *Appendix A*, and thus approximate the Fourier frequencies as well. For more discussion on the subject, see [39]. *p*-values can be estimated using simulated distributions or permutation tests. The use of simulated distributions is now more awkward than in the case of the rank-based method of [**5**]. It was noted in [5] that the rank-based method is "distribution-free", which implies that it is sufficient to simulate the null hypothesis distribution for only one noise type (e.g., Gaussian) under the assumption that the noise is i.i.d. The proposed regression based methods are not distribution-free and we would have to generate the null hypothesis distribution for all the different imaginable noise types separately. Of course, independent of the method, the null hypothesis distribution must be generated separately for different time series lengths. Therefore, based on the preceding discussion we propose to use permutation tests with multiple correction method of Benjamini and Hochberg [40] in the same way as in [5] to find the significance values.

#### Testing of one frequency

If we have

*a priori* information about the frequency of the interesting periodic phenomenon, we do not need to seek periodicities at all the frequencies. There are different ways of modifying Equation (

3) to take this additional information into account. In [

5] we replaced max

_{1 ≤ l ≤ a
}
*I* (

*ω*
_{l}) with just the power spectral estimate at the index of interest to concentrate on just the interesting frequency. If we want to extend our search to other than the harmonic Fourier frequencies then the denominator in Equation (

3) loses its meaning. Therefore we propose to use the following modified statistic

where *g*
_{
m
}is the modified *g*-statistic, *ω*
_{
c
}is the frequency of interest and
and
correspond to the coefficients of the fitted sine and cosine terms (Equation 11) at this frequency. The first problem arises with *p*-value computation. If we use ordinary least squares (OLS) for the regression then we can just neglect the denominator in Equation (3) since, as will be explained later, we will be using permutation tests for *p*-value computation and the denominator is the same for all the permutations of a single time series; thus, there is no need for the scaling. However when using the robust regression based methods there is no guarantee that the sum of the terms is a constant for different permutations. On the other hand, in many cases the effect of the omission of the scaling factor in the denominator has little effect on periodicity detection. This is also verified with good results on real data in subsection *The methods in practice*. Moreover, this way we do not need to compute the whole spectral estimate but just at the frequency of interest, which can now be other than one of the harmonic Fourier frequencies as well. This is a huge benefit for the use of the regression methods since the robust ones are typically computationally time-consuming. To sum up, we propose to use the following procedure for finding periodicities at a known frequency:

1. For a time series, fit the model of Equation (11) at a chosen frequency *ω*
_{
c
}

2. Evaluate Equation (13).

3. Randomly permute the original time series and for each permutation repeat steps 1 and 2.

4. To estimate the null hypothesis distribution of the modified *g*-statistic, compose a histogram of the population of *g*-statistics generated in step 3.

5. Use the histogram as a distributional estimate to get a *p*-value for the original test statistic computed in step 2.

6. Repeat steps 1–5 for all the time series to get the necessary *p*-values.

7. use multiple correction for the obtained *p*-values.

Although the idea of robustly fitting sinusoidals of one frequency to data is not new. we have here plausibly incorporated it for periodicity detection by modifying Fisher's test. We would like to note that exact tests are also available for some robust tests (see e.g. [27]). However, since these tests require some knowledge of the underlying distributions and because they are not applicable to all robust methods considered here, we propose to use the general purpose non-parametric permutation tests.