The current article focuses on the two-component model, but other points of view exist: in reference [25] the two-component model is recast as a conceptually attractive one-group model allowing a continuum of effects.

Denote the pdf of the *p*-values by *f*, the proportion of unchanged genes by *p*_{0} and the pdf of the *p*-values for the changed genes by *f*_{1}. Then the pdf of the *p*-values may be written as

*f*(*x*) = *p*_{0} × 1 + (1 - *p*_{0})*f*_{1}(*x*) (8)

using the fact that *p*-values for the unchanged genes follow a uniform distribution over the interval [0,1]. This model is unidentifiable without further assumptions, e.g. that *p*-values in a vicinity of 1 represent only unchanged genes. From the non-negativity of pdf's, clearly

*f*(*x*) ≥ *p*_{0} (9)

This leads to the estimate based on the minimum of the estimated pdf [1]

see also Figure 6. In most cases the minimum in (10) will occur for some *x* close to or at 1. Hence (10) will in these cases agree well with an estimate of *f*(1). If one has reason to believe that *p*_{0} is close to 1, it may pay off to replace (10) by the 25th percentile, or simply to set the estimate equal to 1, in order to make overestimation more likely.
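The mixture (8) and the minimum-pdf estimate can be illustrated with a minimal Python/numpy sketch (the article's implementations are in R). The *Beta*(0.2, 1) alternative density and all numerical choices here are assumptions for the example, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p0 = 10000, 0.8                     # assumed true proportion of unchanged genes

# Null p-values are uniform on [0,1]; for f1 we assume a Beta(0.2, 1)
# density a*x^(a-1) concentrated near 0, purely for illustration.
pvals = np.concatenate([rng.uniform(size=int(N * p0)),
                        rng.beta(0.2, 1.0, size=N - int(N * p0))])

def p0_min_pdf(pvals, bins=20):
    """Estimate p0 as the minimum of a histogram estimate of the pdf, in the
    spirit of (10); by (9) the pdf lies above p0, and the minimum typically
    occurs near x = 1."""
    dens, _ = np.histogram(pvals, bins=bins, range=(0.0, 1.0), density=True)
    return min(dens.min(), 1.0)        # cap at 1 to keep it a proportion

est = p0_min_pdf(pvals)                # estimate of p0 (true value 0.8 here)
```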

### LSL

Let *R*(*α*) = #{*i* : *p*_{i} ≤ *α*}, the number of rejections given the cut-off *α*. In [36] the approximation

*N* - *R*(*α*) ≈ *E*[*N* - *R*(*α*)] ≈ *N*_{0}(1 - *α*)

for small *α* appears, where *N*_{0} = *Np*_{0} is the number of true null hypotheses. Consequently, (*N* - *R*(*p*_{(i)}))/(1 - *p*_{(i)}) = (*N* - *i*)/(1 - *p*_{(i)}) will approximate *N*_{0}, which led the pioneering authors to consider plotting 1 - *p*_{(i)} against *N* - *i*, thus giving them an estimate of *N*_{0}. In [26] the Lowest SLope estimator (LSL) of *N*_{0}, based on the slopes *S*_{i} = (1 - *p*_{(i)})/(*N* - *i* + 1), is presented. Starting from *i* = 1, the procedure stops at the first occurrence of *S*_{i_0} < *S*_{i_0-1}, and outputs the estimate
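The slope-based scan can be sketched as follows in Python (the article's implementation is the R function *fdr.estimate.eta0*). The final rounding convention, min(*N*, 1/*S*_{i} + 1), is the common LSL choice and is an assumption here.

```python
import numpy as np

def lsl_n0(pvals):
    """Lowest SLope estimate of N0 [26]: slopes S_i = (1 - p_(i))/(N - i + 1),
    scanning from i = 1 and stopping at the first i with S_i < S_{i-1}.
    The rounding min(N, 1/S_i + 1) follows the common LSL convention and is
    an assumption of this sketch."""
    p = np.sort(np.asarray(pvals))
    N = len(p)
    slopes = (1.0 - p) / (N - np.arange(1, N + 1) + 1)
    for i in range(1, N):
        if slopes[i] < slopes[i - 1]:
            return min(N, int(np.floor(1.0 / slopes[i] + 1.0)))
    return N

rng = np.random.default_rng(2)
pvals = np.concatenate([rng.uniform(size=800), rng.beta(0.2, 1.0, size=200)])
n0_hat = lsl_n0(pvals)
```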

In [12] the two above estimates are presented, derived and compared together with a method called the Mean of Differences Method (MD). MD and LSL are motivated by assuming independence and approximating the gaps *d*_{(i)} = *p*_{(i)} - *p*_{(i-1)} (define *p*_{(0)} ≡ 0 and *p*_{(N+1)} ≡ 1) with a *Beta*(1, *N*_{0}) distribution, which has expectation 1/(*N*_{0} + 1). This expectation may be estimated by the inverse of a mean of the form

MD proceeds downward and chooses *i*_{0} equal to the first *j* satisfying

Of these three methods LSL and MD give very similar results, and outperform their predecessor [12].
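The gap-based idea behind MD can be sketched in Python. Since the paper's exact stopping rule for *i*_{0} is given by the inequality above, this sketch simply averages the gaps over an assumed null-dominated upper range; the choice of that range is an illustrative assumption, not the published rule.

```python
import numpy as np

def md_n0(pvals, start_frac=0.5):
    """Sketch of the Mean of Differences idea [12]: gaps between ordered
    p-values are approximately Beta(1, N0), with mean 1/(N0 + 1).  Averaging
    gaps in an upper, null-dominated range and inverting the mean estimates
    N0.  Averaging the upper half (start_frac = 0.5) is an assumption here,
    not the paper's stopping rule."""
    p = np.concatenate([[0.0], np.sort(np.asarray(pvals)), [1.0]])  # p_(0), p_(N+1)
    gaps = np.diff(p)
    upper = gaps[int(start_frac * len(gaps)):]
    return max(0.0, 1.0 / upper.mean() - 1.0)

rng = np.random.default_rng(3)
pvals = np.concatenate([rng.uniform(size=900), rng.beta(0.2, 1.0, size=100)])
n0_hat = md_n0(pvals)          # true N0 is 900 in this simulation
```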

LSL is available as function *fdr.estimate.eta0* in package GeneTS [18] with the option method= "adaptive".

### The smoother

A method here referred to as the smoother appeared in [9]. This method, like all those presented, is based on a comparison of the empirical *p*-value distribution with the uniform distribution. There will likely be fewer *p*-values close to 1 in the empirical distribution than in the null distribution, which is uniform. The ratio of the proportion of *p*-values greater than some *η* to the expected proportion under the uniform distribution, 1 - *η*, gives a measure of the thinning of observed *p*-values compared to the null distribution. Thus, with *F*_{e} denoting the empirical distribution, the ratio {1 - *F*_{e}(*η*)}/{1 - *η*} will often be a good estimate of *p*_{0} for an astutely chosen threshold *η*. A spline is fitted to the function *p*_{0}(*η*) = {1 - *F*_{e}(*η*)}/{1 - *η*}, and the resulting function is evaluated at *η* = 1, yielding the estimate

(the tilde '~' above *p*_{0} denoting the spline-smoothed version of *p*_{0}(*η*)), which goes into the calculation of the FDR in (2); see Figure 15. Note the relationship with LSL: *p*_{0}(*p*_{(i)}) = (*N* - *i*)/{(*N* - *i* + 1)*S*_{i}*N*}. It can be shown that for fixed *η* this method offers a conservative point estimate of the FDR:

The q-value is estimated by combining (2), (7) and (11), and an implementation is provided as the function *qvalue* in package qvalue available on CRAN [18].
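The smoother can be sketched in Python as follows; the grid of thresholds and the smoothing settings are illustrative choices, not those of the *qvalue* package.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def p0_smoother(pvals, grid=None):
    """Smoother estimate of p0 [9]: compute p0(eta) = (1 - F_e(eta))/(1 - eta)
    on a grid of thresholds, fit a cubic smoothing spline, and evaluate the
    fitted curve at eta = 1.  Grid and smoothing factor are illustrative."""
    p = np.asarray(pvals)
    if grid is None:
        grid = np.arange(0.0, 0.95, 0.05)
    ratios = np.array([np.mean(p > eta) / (1.0 - eta) for eta in grid])
    spline = UnivariateSpline(grid, ratios, k=3)   # default smoothing factor
    return float(np.clip(spline(1.0), 0.0, 1.0))

rng = np.random.default_rng(4)
pvals = np.concatenate([rng.uniform(size=8000), rng.beta(0.2, 1.0, size=2000)])
est = p0_smoother(pvals)       # true p0 is 0.8 in this simulation
```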

In [30] the authors go to great lengths to prove that for fixed *η*, as above, the conservativeness remains under various forms of dependency, such as clumpy dependence.

### The Bootstrap LSE

Another approach, pioneered by Storey in [3], is to use a bootstrap least squares estimate (LSE), which chooses the value of *η* in *p*_{0}(*η*) that minimises the variation of the estimate over random samples of the original *p*-values. The bootstrap standard reference [37] provides more theoretical background. Generate *B* new samples of *p*-values *p**^{b} (*b* = 1, ..., *B*) by sampling with replacement from the observed ones, calculate a measure of the Mean Squared Error (MSE) of the corresponding estimates *p**^{b}_{0}(*η*) for a lattice of values of *η*, and choose the value minimising the MSE. More formally, the optimal *η* is obtained through

The version of the bootstrap used in this article uses more bootstrap samples *B* than the version available in qvalue (*B* = 500 instead of *B* = 100) and seems to perform better (data not shown).
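The bootstrap choice of *η* can be sketched in Python. This follows the published recipe in outline, using the minimum of *p*_{0}(*η*) on the original data as the plug-in target; exact details of the qvalue implementation may differ.

```python
import numpy as np

def p0_eta(p, eta):
    """p0(eta) = (1 - F_e(eta))/(1 - eta)."""
    return np.mean(p > eta) / (1.0 - eta)

def p0_bootstrap(pvals, etas=None, B=100, seed=0):
    """Bootstrap LSE sketch [3]: estimate the MSE of p0(eta) over B resamples,
    measured against min_eta p0(eta) on the original data (a proxy for the
    true p0), and keep the eta minimising it."""
    rng = np.random.default_rng(seed)
    p = np.asarray(pvals)
    if etas is None:
        etas = np.arange(0.0, 0.95, 0.05)
    target = min(p0_eta(p, e) for e in etas)       # plug-in proxy for p0
    mse = np.zeros(len(etas))
    for _ in range(B):
        pb = rng.choice(p, size=len(p), replace=True)
        mse += (np.array([p0_eta(pb, e) for e in etas]) - target) ** 2
    best = etas[np.argmin(mse)]
    return p0_eta(p, best), best

rng = np.random.default_rng(5)
pvals = np.concatenate([rng.uniform(size=4000), rng.beta(0.2, 1.0, size=1000)])
est, eta_star = p0_bootstrap(pvals)    # true p0 is 0.8 in this simulation
```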

The bootstrap LSE is available in the functions *qvalue* [18] and *p0.mom* (in package SAGx) [18, 38].

### SPLOSH

In [11] the log-transformed pdf log[*f*(*x*)] is estimated by a complex spline-based algorithm called the spacings LOESS histogram (SPLOSH). To obtain a stable estimate of the FDR near zero, l'Hospital's rule from mathematical analysis is used to approximate the ratio in (5) and to yield

,

where the numerator has been estimated as in (4). An R package with the same name is available [39].

The FDR estimate (6) is used with *F* obtained by the non-parametric estimate of the pdf.

Note that we can now calculate the posterior probability, given its *p*-value *x*, that a gene is a DEG as *p*_{1}(*x*) = 1 - *p*_{0}/*f*(*x*); compare (3).
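This posterior probability can be computed from any density estimate of *f*. In the Python sketch below a plain histogram stands in for SPLOSH's spline estimate of log *f*, purely for illustration.

```python
import numpy as np

def posterior_deg(pvals, p0, bins=20):
    """Posterior probability that a gene is a DEG given its p-value,
    p1(x) = 1 - p0/f(x).  Here f is estimated by a simple histogram
    (SPLOSH itself uses a spline estimate of log f)."""
    p = np.asarray(pvals)
    dens, _ = np.histogram(p, bins=bins, range=(0.0, 1.0), density=True)
    idx = np.minimum((p * bins).astype(int), bins - 1)
    f = np.maximum(dens[idx], p0)     # guard f >= p0 to keep p1 in [0, 1]
    return 1.0 - p0 / f

rng = np.random.default_rng(6)
pvals = np.concatenate([rng.uniform(size=8000), rng.beta(0.2, 1.0, size=2000)])
post = posterior_deg(pvals, p0=0.8)
```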

The method is available in R function *splosh* [40].

### BUM

In [10] the authors assume a beta-uniform (BUM) distribution, i.e. in (1) they replace *f*_{1} by a beta distribution,

*f*(*x*) = *λ* + (1 - *λ*)*ax*^{a-1} (12)

where, in addition to *λ*, which corresponds to *p*_{0}, the shape parameter *a* has to be estimated. Thanks to the simple form of the distribution it is possible to estimate the parameters through the maximum likelihood principle, i.e. by choosing the values that maximise the likelihood of observing the *p*-values that were actually observed. However, due to problems in identifying *p*_{0} with *λ*, the authors instead use an upper bound

which corresponds to *f*(1).

The FDR estimate (5) is used with *F* the cdf corresponding to (12).
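A maximum-likelihood fit of (12) can be sketched in Python with a generic optimiser; the starting values and bounds are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def bum_fit(pvals):
    """Maximum-likelihood fit of the beta-uniform mixture (12),
    f(x) = lam + (1 - lam) * a * x**(a - 1), returning lam, a and the
    upper bound on p0, f(1) = lam + (1 - lam) * a, as used in [10]."""
    p = np.clip(np.asarray(pvals), 1e-10, 1.0)   # avoid x = 0 in x**(a-1)

    def nll(theta):                              # negative log-likelihood
        lam, a = theta
        return -np.sum(np.log(lam + (1.0 - lam) * a * p ** (a - 1.0)))

    res = minimize(nll, x0=[0.5, 0.5], method="L-BFGS-B",
                   bounds=[(1e-6, 1 - 1e-6), (1e-6, 1 - 1e-6)])
    lam, a = res.x
    return lam, a, lam + (1.0 - lam) * a

rng = np.random.default_rng(7)
# Data drawn exactly from a BUM with lam = 0.8, a = 0.2, so f(1) = 0.84.
pvals = np.concatenate([rng.uniform(size=8000), rng.beta(0.2, 1.0, size=2000)])
lam, a, p0_ub = bum_fit(pvals)
```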

The authors provide R code for the application of their method [39].

A more intricate Hierarchical Bayes model based on the beta-uniform concept allowing for different parameter values in different intervals appears in [41]. The R function localFDR provides an implementation of the method [42].

### Poisson regression

In [28] it is suggested to estimate any empirical distribution by dividing the real axis into intervals and regarding the number of hits in each interval as the result of an inhomogeneous Poisson process, much like counting the number of cars arriving at a crossing during different time intervals. This method was used in [27] to model the distribution of a transformed test statistic, and it also appears in the function *locfdr*, which estimates a local FDR as a function of a test statistic. In our case, of course, the support of the distribution is the unit interval [0,1]. Then the expected number of hits in each subinterval of [0,1] can be modelled as a polynomial in the midpoints of the subintervals by a technique called Poisson regression (PRE). The approach taken here is to choose a polynomial of low degree so that the plateau representing the uniform distribution is well captured. In doing so the ability to capture the distribution at low *p*-values is sacrificed.

A more mathematical description now follows. The PRE method assumes that the counts *S*_{k} follow a Poisson distribution whose intensity is determined by the midpoint *t*_{k} of the interval *I*_{k}, see [28]. To be specific, in the current application it is assumed that the expected frequency of observations in an interval is given by

where *μ*^{o}_{k} are the smoothed observed frequencies in each interval *I*_{k}. In statistical jargon this is a Poisson regression model with *μ*^{o}_{k} as offset. This assumes independence between the counts in different intervals. Although this does not hold true, the model succeeds in capturing the essential features of distributions. Standard functions in e.g. R can fit this model. Normalising the curve by the total number of *p*-values we get an estimate of the pdf. Finally, smooth the pdf *f*(*x*) with a spline to obtain a more stable result, and use the estimate (10). An implementation of PRE is provided through the R function *p0.mom* in package SAGx [18, 38].

### SEP

The Successive Elimination Procedure (SEP) excludes and includes *p*_{i} successively such that the distribution of the included *p*-values tends to behave increasingly like a uniform [13]. Finally, an index set *J*_{final} will map to a set of *p*-values that represent the true null hypotheses. This yields the point estimate

with *N*_{J} = #*J*, the cardinality of the set *J*. The identification of *J*_{final} proceeds by an intricate optimisation algorithm where the objective function consists of two terms: one Kolmogorov-Smirnov score

for the empirical cdf *F*_{J} (based on *J*), to measure the distance to a uniform, and one penalty term to guard against overfitting

for some tuning parameter *λ*.
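The Kolmogorov-Smirnov term of the objective can be sketched in Python; the two samples below are illustrative only.

```python
import numpy as np

def ks_uniform(p_subset):
    """Kolmogorov-Smirnov distance between the empirical cdf of a set of
    p-values and the uniform cdf on [0,1], as used in SEP's objective [13]."""
    p = np.sort(np.asarray(p_subset))
    n = len(p)
    ecdf_hi = np.arange(1, n + 1) / n          # ecdf just after each point
    ecdf_lo = np.arange(0, n) / n              # ecdf just before each point
    return max(np.max(ecdf_hi - p), np.max(p - ecdf_lo))

rng = np.random.default_rng(9)
uni = rng.uniform(size=1000)                   # looks uniform: small score
skew = rng.beta(0.2, 1.0, size=1000)           # far from uniform: large score
```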

A local FDR is obtained from smoothed histogram estimates based on equidistant bins

,

where the function *h*_{0} refers to the *J*_{final} set and *h* to the total set of *p*-values.

The function *twilight* in package *twilight* provides an implementation of SEP [19].

### Moment generating function approach

The next approach is based on the moment generating function (mgf), a transform of a probability distribution that yields a function *M*(*s*) characteristic of the distribution, cf. the Fourier or Laplace transforms, e.g. [43]. Knowing the transform means knowing the distribution. It is defined as the expectation (the true mean) of the antilog transform of *s* times a random variable *X*, i.e. the expectation of *e*^{sX}, or in mathematical notation:

*M*(*s*) = ∫*e*^{sx}*f*(*x*)*dx*.

To calculate the mgf for p-values, we use the fact that the pdf is a mixture of pdf's (8). This yields the weighted sum of two transformed distributions:

*M*(*s*) = *p*_{0}∫*e*^{sx}*dx* + (1 - *p*_{0})∫*e*^{sx}*f*_{1}(*x*)*dx*,

where we have used the fact that the mgf of a uniform distribution over [0,1] equals *g*(*s*) = (*e*^{s}- 1)/*s*. Denoting the second transform by *M*_{1}(*s*) we finally have

*M*(*s*) = *p*_{0}*g*(*s*) + (1 - *p*_{0})*M*_{1}(*s*). (13)

Now, the idea is to estimate these mgf's and to solve for *p*_{0}. In the above equation *M*(*s*) can be estimated from an observed vector of *p*-values and *g*(*s*) can be calculated exactly, while *p*_{0} and *M*_{1}(*s*) cannot be estimated independently. The estimable transform is, given the observed *p*-values *p* = *p*_{1}, ..., *p*_{n}, estimated by

Then, one can solve (13) for *p*_{0}:

Let us do so for *s*_{n} > *s*_{n-1}, equate the two ratios defined by the right hand side in (14) and solve for *M*_{1}(*s*_{n}). This gives the recursion

If we can find a suitable start for this recursion we should be in a position to approximate the increasing function *M*_{1}(*s*) for *s* = *s*_{1} < *s*_{2} < ... < *s*_{m} in (0, 1]. Now, note that 1 ≤ *M*(*s*) for any mgf, with near equality for small values of *s*. It makes sense to start the recursion with some *M*_{1}(*s*_{1}) in *I* = [1, *M*(*s*_{1})]. In general, it will hold true that 1 ≤ *M*_{1}(*s*) < *M*(*s*) < *g*(*s*), since *f*_{1} puts weight on the lower range of the *p*-values at the expense of the higher range, the uniform puts equal weight, and *f*, being a mixture, lies somewhere in between. We can calculate *g*, *M* and *M*_{1} for an increasing series of values in (0, 1], e.g. for *s* = (0.01, 0.0101, 0.0102, ..., 1). The output from one data set appears in Figure 16. Since all the ratios (14) should be equal, a good choice of *M*_{1}(*s*_{1}) will be one that minimises the variation of the ratios. Standard one-dimensional optimisation will find the value in *I* that minimises the coefficient of variation (CV, standard deviation divided by mean)

where *s* = (*s*_{1}, ..., *s*_{n}). The CV will, in contrast to the variance, put both small and high values of the ratios on an equal footing and enable comparison.

Finally, these essentially equal ratios provide an estimate of *p*_{0}.
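A drastically simplified Python sketch of the idea: with *M*(*s*) estimated empirically and *g*(*s*) known, (13) is solved for *p*_{0} after approximating *M*_{1}(*s*) by its lower bound 1 (exact only if *f*_{1} puts all mass at 0). The recursion-plus-CV scheme above refines this crude choice; consistent with the convexity argument below, the bound makes the sketch over-estimate *p*_{0}.

```python
import numpy as np

def p0_mgf(pvals, s=0.5):
    """Simplified mgf estimate of p0: solve (13), p0 = (M - M1)/(g - M1),
    with the crude approximation M1(s) = 1 (its lower bound).  Since the
    solution is decreasing in M1 and the true M1(s) > 1, this over-estimates
    p0; the article's recursion-plus-CV scheme refines the choice of M1."""
    p = np.asarray(pvals)
    M = np.mean(np.exp(s * p))          # empirical mgf at s
    g = (np.exp(s) - 1.0) / s           # mgf of the uniform on [0,1]
    return float(np.clip((M - 1.0) / (g - 1.0), 0.0, 1.0))

rng = np.random.default_rng(10)
pvals = np.concatenate([rng.uniform(size=8000), rng.beta(0.2, 1.0, size=2000)])
est = p0_mgf(pvals)                     # true p0 is 0.8; est slightly above
```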

A heuristic convexity argument suggests that mgf over-estimates *p*_{0}, see the Additional file. Furthermore, the bias seems to decrease as *p*_{0} grows.

An implementation of mgf appears as function *p0.mom* in package SAGx [18, 38].

### Local FDR and FDR

The concept of a local false discovery rate originates from [1]. Let the (true) local FDR at *t* be defined as the probability that gene *i* is unchanged conditional upon its *p*-value equalling *t*, or in formulas: *LFDR*(*t*) = *Pr*(*gene i unchanged* | *p*_{i} = *t*) = *p*_{0}/*f*(*t*). The Averaging Theorem of [4] states that integrating the local FDR over the rejection region *R*, such as *R* = [0, 0.01], yields the FDR: *FDR*(*R*) = *E*[*LFDR*(*y*) | *y* ∈ *R*]. In [29] it is noted that the estimated q-value equals the mean of a local FDR

where the local FDR at the *i*^{th} ordered *p*-value *p*_{(i)} equals

,

where *N* denotes the total number of genes. This rephrases the theorem in terms of estimators. The local FDR is meant as an approximation of the probability that an individual gene *i* is a false positive, i.e. unchanged. As remarked in [29], the q-value does not estimate the probability that a gene is a false positive. Indeed, the theorem shows that it is the mean of that probability over all genes at least as extreme as *i*. Thus the q-value will tend to give a lower value than *LFDR*(*i*).

Under a wide range of models where *f*(*x*) is non-increasing, e.g. the BUM model, the expected local *LFDR*(*i*) will be non-decreasing, and hence the differences above should tend to increase, see the Additional file. Hence there is a need for enforcing monotonicity as in (7). One tool for enforcing monotonicity is the Pooling of Adjacent Violators (PAVA) algorithm [44]. This algorithm has an intuitive appeal, is less ad hoc than the local spline approach presented in [29], and is the non-parametric maximum likelihood estimate under the assumption of monotonicity. As an example of how it works, consider the series (1, 2, 4, 3, 5), which PAVA turns into the non-decreasing series (1, 2, 3.5, 3.5, 5) by pooling the violators of monotonicity (4, 3) and replacing them by their mean. Though not equivalent to the q-value from (2) and (7), the results from applying PAVA to the terms in (15) agreed rather well with the values obtained from the function *qvalue*. In the **Results** section this approach combined with the PRE estimate of *p*_{0} is referred to as pava FDR. We could have used mgf for calculating the FDR, but it was excluded due to the better overall performance of PRE.
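PAVA itself is only a few lines; this Python sketch implements the standard least-squares pooling and reproduces the worked example from the text.

```python
def pava(y):
    """Pooling of Adjacent Violators [44]: return the closest (least squares)
    non-decreasing sequence to y, obtained by repeatedly pooling adjacent
    violating runs and replacing them by their mean."""
    out = []                                   # stack of [block mean, block size]
    for v in y:
        out.append([v, 1])
        # Pool while the last two blocks violate monotonicity.
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, s2 = out.pop()
            m1, s1 = out.pop()
            out.append([(m1 * s1 + m2 * s2) / (s1 + s2), s1 + s2])
    result = []
    for mean, size in out:
        result.extend([mean] * size)
    return result

smoothed = pava([1, 2, 4, 3, 5])   # pools (4, 3) into (3.5, 3.5)
```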

The bootstrap LSE gives a very similar result to the smoother and thus was excluded from the comparison of FDR estimates.

The R function *pava.fdr* in package SAGx provides an implementation of pava FDR, and returns a list of estimates of FDR, LFDR and *p*_{0} [18, 38].

The reference [25] presents the theory behind the estimation of local false discovery rates provided by the R package *locfdr* [18, 25]. The method proceeds by transforming the test statistic *t* into *Z* = Φ^{-1}(*m*(*t*)), where Φ is the cdf corresponding to N(0,1) and *m* is a transform that, for the uninteresting/null class, renders the distribution of *Z* a N(0,1). Typically, *m* could equal the cdf of the *t*-distribution. Assuming the model (1), the method estimates the mixture density *f*^{t} using Poisson regression, and fits either a theoretical null sub-density (from now on suppressing the superscript *t* and denoting the pdf corresponding to Φ by *φ*) *f*^{+}_{0}(*z*) = *p*_{0}*φ*(*z*) around *z* = 0, or an empirical null distribution. Then the procedure estimates the local false discovery rate *fdr*(*z*) = *f*^{+}_{0}(*z*)/*f*(*z*).
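The z-transformation step can be sketched in Python; taking *m* as the cdf of a *t*-distribution with 10 degrees of freedom is an illustrative choice, not a prescription of *locfdr*.

```python
import numpy as np
from scipy.stats import norm, t as t_dist

# Transform t-statistics into z-scores that are N(0,1) under the null:
# Z = Phi^{-1}(m(t)), with m the cdf of a t-distribution (df = 10 assumed
# here for illustration).  Under the null the resulting z are exactly N(0,1).
rng = np.random.default_rng(11)
tstats = t_dist.rvs(df=10, size=5000, random_state=rng)   # null statistics
z = norm.ppf(t_dist.cdf(tstats, df=10))
```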