Microarray technology has become a standard experimental method in biomedical research. In the analysis of microarray data, one of the most fundamental tasks is identifying differentially expressed genes while controlling false positives and minimizing false negatives. This is a multiple hypothesis testing problem in which thousands or tens of thousands of genes are tested simultaneously. In such tests we often need to keep the proportion of false discoveries among the rejected hypotheses below a pre-specified level while maintaining maximal power. Thus, there is a trade-off between rejecting true null hypotheses (false discoveries, type I errors) and failing to reject false null hypotheses (false negatives, type II errors).

Traditional Bonferroni correction procedures are designed to control the Family-Wise Error Rate (FWER), the probability of making one or more type I errors among a family of hypothesis tests. However, these procedures may be excessively conservative for microarray analysis, where the number of hypotheses is very large and a substantial fraction of the genes are differentially expressed [1]. A more appropriate approach is to control the False Discovery Rate (FDR), the expected proportion of type I errors among all rejected hypotheses [2, 3]. This approach is particularly useful in exploratory analyses, where the objective is to maximize the discovery of true positives rather than to guard against even a single false positive.
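To make the contrast concrete, the sketch below compares the Bonferroni rule (reject when $p_i \le \alpha/m$) with the Benjamini-Hochberg step-up procedure, a standard FDR-controlling method, on fifteen illustrative p-values. The helper names `bonferroni_reject` and `bh_reject` are hypothetical, written only for this example.

```python
def bonferroni_reject(pvalues, alpha):
    """Reject H0_i when p_i <= alpha / m (controls the FWER at level alpha)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def bh_reject(pvalues, alpha):
    """Benjamini-Hochberg step-up (controls the FDR at level alpha for
    independent tests): find the largest rank k with p_(k) <= k * alpha / m
    and reject the k smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


pvals = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
         0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.0000]
print(sum(bonferroni_reject(pvals, 0.05)), sum(bh_reject(pvals, 0.05)))  # 3 4
```

Bonferroni rejects the three smallest p-values, whereas BH also rejects the fourth (0.0095 ≤ 4 × 0.05/15 ≈ 0.0133), a small illustration of the power gained by controlling the FDR instead of the FWER.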

A number of methods have been proposed to control the FDR given a population of hypothesis tests. These methods usually assume that the distribution of the test statistics, $f$, can be modeled by a mixture of two components [4]:

$$f(x) = \pi_0 f_0(x) + (1 - \pi_0) f_1(x), \qquad \pi_0 = m_0/m \tag{1}$$

Here, $f_0$ is the distribution of the test statistics under $H_0$, which, when the test statistics are p-values and the tests are independent, is by definition the uniform density on [0, 1], i.e. $f_0(x) = 1$; $f_1$ is the distribution of the test statistics under $H_1$; $m_0$ is the number of true null hypotheses; $m$ is the total number of hypotheses under consideration; and $\pi_0$ is the proportion of true null hypotheses. The methods proposed by Benjamini et al [2, 3] to control the FDR do not estimate $\pi_0$ (in effect, they assume $\pi_0 = 1$); therefore, they provide the strongest control of the FDR but have the lowest power compared with methods that do estimate $\pi_0$.
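The sketch below simulates p-values from mixture (1), assuming a uniform $f_0$ and, purely for illustration, a Beta(a, 1) alternative $f_1(x) = a x^{a-1}$ with $0 < a < 1$. It compares BH at level α, which ignores $\pi_0$, with an "oracle" that runs BH at level $\alpha/\pi_0$ using the true $\pi_0$ (unavailable in practice). For independent tests BH's FDR equals $\pi_0 \alpha$, so the gap between the two shows the power given up when $\pi_0$ is not estimated. The names `bh_count`, `sample_mixture` and `compare`, and all parameter values (m, π0, a, α, replicates, seed), are hypothetical choices.

```python
import random


def bh_count(pvalues, level):
    """Number of BH rejections at `level`: the largest k with p_(k) <= k * level / m."""
    m = len(pvalues)
    k = 0
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= rank * level / m:
            k = rank
    return k


def sample_mixture(m, pi0, a, rng):
    """(p-value, is_null) pairs from mixture (1): m0 = round(pi0 * m) nulls with
    f0 = Uniform(0, 1); the rest follow the assumed Beta(a, 1) alternative,
    drawn by inverse CDF as U ** (1 / a)."""
    m0 = round(pi0 * m)
    return ([(rng.random(), True) for _ in range(m0)]
            + [(rng.random() ** (1.0 / a), False) for _ in range(m - m0)])


def compare(m=1000, pi0=0.7, a=0.1, alpha=0.1, reps=200, seed=0):
    """Mean false discovery proportion (FDP) and mean number of rejections for
    BH at alpha (pi0 ignored) and for the oracle BH at alpha / pi0."""
    rng = random.Random(seed)
    totals = {"BH": [0.0, 0.0], "oracle": [0.0, 0.0]}
    for _ in range(reps):
        tests = sorted(sample_mixture(m, pi0, a, rng))  # ascending p-values
        pvals = [p for p, _ in tests]
        for name, level in (("BH", alpha), ("oracle", alpha / pi0)):
            k = bh_count(pvals, level)
            n_false = sum(is_null for _, is_null in tests[:k])  # rejected true nulls
            totals[name][0] += (n_false / k if k else 0.0) / reps
            totals[name][1] += k / reps
    return totals


results = compare()
for name, (fdp, rejections) in results.items():
    print(f"{name}: mean FDP = {fdp:.3f}, mean rejections = {rejections:.1f}")
```

With π0 = 0.7 and α = 0.1, the BH line should show a mean FDP near 0.07, while the oracle spends close to the full 0.1 budget and makes more rejections.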

In many applications where a considerable number of genes are differentially expressed, assuming $\pi_0 = 1$ may be too conservative, causing a loss of power. Several alternative methods have been proposed to estimate $\pi_0$ by pooling test statistics across genes and then to control the FDR based on the estimated $\pi_0$: the nonparametric empirical Bayes positive FDR (pFDR) criterion and its significance measure, the q-value, which is the pFDR analogue of the p-value [1, 5, 6]; bin-wise models [7–9]; the local FDR method [10]; parametric beta-uniform mixture models [11–14]; the Lowest Slope estimator (LSL) [15]; the Spacing LOESS Histogram (SPLOSH) method [16]; the nonparametric MLE method [17]; the moment generating function approach [18]; and the Poisson regression approach [18–20].
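As an example of the q-value approach, the sketch below implements a Storey-type estimator $\hat\pi_0(\lambda) = \#\{p_i > \lambda\} / (m(1 - \lambda))$ with a fixed λ = 0.5 (a tuning choice assumed here) and the step-up computation $q_{(i)} = \min_{j \ge i} \hat\pi_0\, m\, p_{(j)}/j$. The helpers `storey_pi0` and `qvalues` are hypothetical names, not the API of any package, and the simulated data (π0 = 0.8, Beta(a, 1) alternatives) are assumptions. Setting π0 = 1 reproduces BH-adjusted p-values, which makes the effect of estimating $\pi_0$ visible.

```python
import random


def storey_pi0(pvalues, lam=0.5):
    """Storey-type estimate #{p_i > lam} / (m * (1 - lam)), capped at 1.
    Null p-values spread evenly over (lam, 1]; most alternatives fall below lam."""
    m = len(pvalues)
    return min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))


def qvalues(pvalues, pi0):
    """q_(i) = min over j >= i of pi0 * m * p_(j) / j (capped at 1), computed
    from the largest p-value down. With pi0 = 1 these are BH-adjusted p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pi0 * m * pvalues[i] / rank)
        q[i] = running
    return q


rng = random.Random(2)
m, m0, a = 2000, 1600, 0.1  # true pi0 = 0.8; Beta(a, 1) alternatives (assumed)
p = [rng.random() for _ in range(m0)] + [rng.random() ** (1.0 / a) for _ in range(m - m0)]
pi0_hat = storey_pi0(p)
q = qvalues(p, pi0_hat)
bh = qvalues(p, 1.0)
print(f"pi0_hat = {pi0_hat:.3f}")
print("discoveries at 0.05:", sum(x <= 0.05 for x in q), "(q-value) vs",
      sum(x <= 0.05 for x in bh), "(BH)")
```

The estimate tends to be biased upward (conservative), because alternative p-values that exceed λ are counted as nulls. Since $\hat\pi_0 \le 1$, every q-value is at most the corresponding BH-adjusted p-value, so the q-value approach makes at least as many discoveries at any threshold.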

In these methods, one of the critical steps is estimating the proportion of true null hypotheses, $\pi_0$. When p-values are used, these estimates usually rely on the assumption that $f_0$ is uniform. This assumption, which is critical for inference methods that pool test statistics across genes [21], is valid when the test statistics are independent and identically distributed. Moreover, when correlations are only weak or "clumpy" (a large number of groups, each containing a small number of highly correlated genes, with no correlation between groups [21, 22]), the uniform assumption is not strongly violated and the methods remain adequate. However, in datasets with strong, large-scale correlations, the joint distribution of the test statistics is no longer the product of the marginal distributions, and the observed $f_0$ can deviate severely from uniform, making current $\pi_0$ estimation methods very unstable. Wu et al [14] also observed increased variance and bias in the estimates of $\pi_0$, as well as of the FDR, in datasets with strong local correlations.
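The instability can be reproduced with a small simulation in which every gene is null, so the true $\pi_0$ is 1. The z-statistics are standard normal but correlated through shared latent factors under three assumed settings: independent genes, "clumpy" blocks of 10 genes with within-block correlation 0.8, and a single factor shared by all genes with correlation 0.5. Every p-value stays marginally uniform, yet the spread of a Storey-type estimate (λ = 0.5, capped at 1) across replicates grows sharply under large-scale correlation. The factor model, block size, correlations, replicate count and helper names are illustrative assumptions.

```python
import math
import random
import statistics


def null_pvalues(m, rho, block, rng):
    """Two-sided p-values for m true-null genes. In each block of `block` genes,
    z_i = sqrt(rho) * w + sqrt(1 - rho) * e_i with a shared w ~ N(0, 1), so each
    z_i is N(0, 1) (each p_i is marginally uniform) but genes in the same block
    have correlation rho. block = m gives one factor shared by all genes."""
    p = []
    for start in range(0, m, block):
        w = rng.gauss(0.0, 1.0)
        for _ in range(min(block, m - start)):
            z = math.sqrt(rho) * w + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            p.append(math.erfc(abs(z) / math.sqrt(2.0)))  # P(|Z| >= |z|)
    return p


def storey_pi0(p, lam=0.5):
    """Storey-type estimate #{p_i > lam} / (m * (1 - lam)), capped at 1."""
    return min(1.0, sum(x > lam for x in p) / (len(p) * (1.0 - lam)))


def pi0_spread(rho, block, m=1000, reps=200, seed=0):
    """Mean and standard deviation of the estimate over replicates (true pi0 = 1)."""
    rng = random.Random(seed)
    est = [storey_pi0(null_pvalues(m, rho, block, rng)) for _ in range(reps)]
    return statistics.mean(est), statistics.stdev(est)


# name -> (rho, block size): independent, clumpy blocks of 10, one global factor
settings = {"independent": (0.0, 1), "clumpy": (0.8, 10), "global": (0.5, 1000)}
results = {name: pi0_spread(rho, block) for name, (rho, block) in settings.items()}
for name, (mean, sd) in results.items():
    print(f"{name:11s}: mean = {mean:.3f}, sd = {sd:.3f}")
```

Because the marginal distribution of every p-value is the same in all three settings, the growth in spread comes from the dependence alone.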

The effect of correlation on simultaneous significance tests has been discussed theoretically [23–25], and a number of permutation-based FDR control methods have been proposed, such as SAM [26], dChip [27], and the methods of Ge et al [28], Meinshausen et al [24] and Efron [25]. In these methods, $f_0$ is modeled empirically through permutations, which naturally accounts for the correlation between genes. However, like the methods of Benjamini et al [2, 3], these methods do not estimate $\pi_0$; therefore, in datasets with a large number of differentially expressed genes, their FDR control may be overly conservative, with a loss of power.
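A generic version of the permutation idea is sketched below. Shuffling the sample labels moves whole arrays together, so the gene-gene correlation is retained in the permutation null, and each gene's p-value is the fraction of pooled permuted |t| values at least as large as its observed |t|. This is a simplified sketch under assumed settings (pooled-variance t statistic, 50 permutations, simulated data with a shared array effect, hypothetical helper names), not the exact algorithm of SAM, dChip or any other cited method.

```python
import bisect
import math
import random
import statistics


def t_stat(row, labels):
    """Pooled-variance two-sample t statistic for one gene (labels are 0/1)."""
    x = [v for v, g in zip(row, labels) if g == 0]
    y = [v for v, g in zip(row, labels) if g == 1]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    sp2 = ss / (len(x) + len(y) - 2)
    return (my - mx) / math.sqrt(sp2 * (1 / len(x) + 1 / len(y)))


def permutation_pvalues(data, labels, n_perm, rng):
    """Pooled permutation p-values. Shuffling labels moves whole arrays, so the
    gene-gene correlation in `data` is carried into the empirical null."""
    observed = [abs(t_stat(row, labels)) for row in data]
    null = []
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)
        null.extend(abs(t_stat(row, perm)) for row in data)
    null.sort()
    n = len(null)
    # fraction of pooled null |t| values >= each observed |t|
    return [(n - bisect.bisect_left(null, t)) / n for t in observed]


rng = random.Random(3)
labels = [0] * 6 + [1] * 6
array_effect = [rng.gauss(0.0, 0.5) for _ in labels]  # shared by all genes on an array
data = [[array_effect[j] + rng.gauss(0.0, 1.0) + (3.0 if i < 20 and g == 1 else 0.0)
         for j, g in enumerate(labels)] for i in range(200)]  # genes 0-19 are DE
p = permutation_pvalues(data, labels, n_perm=50, rng=rng)
print("mean p, DE genes:", statistics.mean(p[:20]), " null genes:", statistics.mean(p[20:]))
```

Note that the empirical null here is built without any estimate of $\pi_0$, which is the limitation described above.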

We therefore propose two resampling schemes, similar to model averaging in bagging methods, to reduce the variance of the $\pi_0$ estimate in datasets with strong correlations between gene expression values. Our methods produce a more stable and more conservative estimate of $\pi_0$ and therefore provide stronger control of the False Discovery Rate with only a minor sacrifice of power.
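The two proposed schemes are not specified in this section, so the sketch below only illustrates the generic bagging-style idea of averaging $\pi_0$ estimates over resamples of the arrays; it is not either of the proposed schemes. Subsampling without replacement (`keep` arrays per group), the two-sample z-test with an assumed unit noise variance, the Storey-type estimator with λ = 0.5, the simulated one-factor data and all helper names are hypothetical choices.

```python
import math
import random
import statistics


def z_pvalues(data, labels):
    """Two-sided two-sample z-test p-values, assuming unit noise variance per gene."""
    xs = [j for j, g in enumerate(labels) if g == 0]
    ys = [j for j, g in enumerate(labels) if g == 1]
    se = math.sqrt(1.0 / len(xs) + 1.0 / len(ys))
    pvals = []
    for row in data:
        diff = sum(row[j] for j in ys) / len(ys) - sum(row[j] for j in xs) / len(xs)
        pvals.append(math.erfc(abs(diff / se) / math.sqrt(2.0)))
    return pvals


def storey_pi0(p, lam=0.5):
    """Storey-type estimate #{p_i > lam} / (m * (1 - lam)), capped at 1."""
    return min(1.0, sum(x > lam for x in p) / (len(p) * (1.0 - lam)))


def averaged_pi0(data, labels, n_resamples, keep, rng):
    """Hypothetical bagging-style scheme: average the pi0 estimate over random
    subsets of arrays, `keep` arrays per group drawn without replacement."""
    groups = {g: [j for j, h in enumerate(labels) if h == g] for g in sorted(set(labels))}
    estimates = []
    for _ in range(n_resamples):
        cols = [j for idx in groups.values() for j in rng.sample(idx, keep)]
        sub = [[row[j] for j in cols] for row in data]
        estimates.append(storey_pi0(z_pvalues(sub, [labels[j] for j in cols])))
    return statistics.mean(estimates)


rng = random.Random(4)
labels = [0] * 10 + [1] * 10
rho, m, n_de = 0.5, 1000, 200  # true pi0 = 0.8
factor = [rng.gauss(0.0, 1.0) for _ in labels]  # array-level factor: gene-gene correlation rho
data = [[math.sqrt(rho) * factor[j] + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
         + (1.5 if i < n_de and g == 1 else 0.0)
         for j, g in enumerate(labels)] for i in range(m)]
single = storey_pi0(z_pvalues(data, labels))
averaged = averaged_pi0(data, labels, n_resamples=50, keep=7, rng=rng)
print(f"single estimate = {single:.3f}, averaged over subsamples = {averaged:.3f}")
```

Whether, and by how much, averaging reduces the variance of the estimate depends on the resampling design; this sketch only shows the mechanics.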