In this study we have demonstrated that our RIR-based FDR estimation method significantly outperforms other popular approaches and provides very accurate FDR estimates, especially when only a small percentage of genes are differentially expressed. Among the other FDR evaluation methods compared, the BH and BY methods gave quite conservative results and failed to identify a number of truly differentially expressed genes in real microarray data, whereas the full-permutation (mix-all) approach appeared to report false positives as significant genes.

We found that one of the most critical steps in FDR evaluation is the generation of biologically relevant null data. Other theoretical and computational FDR estimation approaches either fail at this step or cannot easily incorporate it. We believe that our heuristic, resampling-based approach provides a significant improvement in FDR estimation and a realistic and intuitive framework for understanding FDR in practice. Other approaches in use are based on quite restrictive mathematical assumptions and/or computational constraints, which result in a biologically unrealistic framework for statistical estimation and discovery. In particular, the simple full-permutation strategy inflates both the pooled variance and the difference between the gene intensities, but it results in a liberal testing framework because, in such a null distribution, the inflation in the numerator of the test statistic (differential expression) is larger than that in the denominator (variance). On the other hand, strategies that shuffle across all conditions cannot be applied to microarray data with a small sample size, because the number of independent permutations is too small to provide any meaningful results.
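To make the small-sample limitation concrete, the following minimal sketch counts the distinct ways of reassigning condition labels in a two-condition design and the smallest permutation p-value that such a design can attain. The group sizes are illustrative assumptions rather than values from this study, and the function names are hypothetical.

```python
from math import comb


def distinct_label_assignments(n1: int, n2: int) -> int:
    """Number of distinct ways to assign n1 of the n1 + n2 arrays to condition 1."""
    return comb(n1 + n2, n1)


def min_permutation_p(n1: int, n2: int) -> float:
    """Smallest p-value a label-permutation test can attain for this design."""
    return 1.0 / distinct_label_assignments(n1, n2)


# Illustrative design (an assumption): three replicate arrays per condition.
print(distinct_label_assignments(3, 3))  # 20
print(min_permutation_p(3, 3))           # 0.05
```

With three replicates per condition there are only 20 distinct label assignments, so no gene can reach a permutation p-value below 0.05 from sample shuffling alone.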

In many microarray studies under controlled experimental conditions, one may expect fewer than 10% of the genes to be differentially regulated, and thus removal of the top 10% of genes from each local interval can be effective in generating a null distribution that excludes most of the differentially expressed genes. Our simulations show that removal of the top 5%, 10%, 20%, or even 50% of genes does not affect the null distribution (data not shown), but we admit that these choices are still subjective and may require a more extensive investigation. Our simulation studies have shown that removing the top 10% of genes produces results close to the true FDR in all four cases, with 5% to 50% of genes differentially expressed. Figure 3 compares the FDR evaluation methods on the simulated data, with the proportion of differentially expressed genes varying between 5% and 50%. In many microarray studies, the proportion of differentially expressed genes would be lower than this. As might be expected, the mix-all approach, which is not sensitive to variability across different intensity ranges in microarray data, performs quite well if the proportion of differentially expressed genes is high and a large number of genes do not follow the baseline error distribution. Overall, the larger this proportion, the better the mix-all approach performs. Note that with 5% and 10% of differentially expressed genes, the mix-all method performed worse than our RIR approach, giving more liberal (underestimated) FDR estimates. As Pounds and Cheng [6] reported, the FDR estimates of the mix-all approach are somewhat unstable for low FDR, which may be a critical region in real data applications.
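The trimming step described above can be sketched as follows. This is only an illustration under stated assumptions, not the RIR implementation itself: local intervals are taken to be equal-sized bins of genes ordered by intensity, "top" is taken to mean the largest absolute score within a bin, and the function name and parameters are hypothetical.

```python
import math
from typing import List, Sequence


def trim_top_fraction_per_interval(
    intensity: Sequence[float],
    score: Sequence[float],
    n_intervals: int,
    fraction: float,
) -> List[int]:
    """Return sorted indices of genes kept after dropping, within each
    intensity interval, the given fraction of genes with the largest |score|."""
    if len(intensity) != len(score):
        raise ValueError("intensity and score must have the same length")
    # Order genes by intensity so that contiguous chunks form local intervals.
    order = sorted(range(len(intensity)), key=lambda i: intensity[i])
    n = len(order)
    kept: List[int] = []
    for k in range(n_intervals):
        chunk = order[k * n // n_intervals:(k + 1) * n // n_intervals]
        # Small epsilon guards against floating-point overshoot before ceil.
        n_drop = math.ceil(fraction * len(chunk) - 1e-9)
        ranked = sorted(chunk, key=lambda i: abs(score[i]))
        kept.extend(ranked[:len(ranked) - n_drop])
    return sorted(kept)
```

Trimming within each interval, rather than over all genes at once, keeps the removal from concentrating in whichever intensity range happens to have the largest variability.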

Results from simulation studies can be considerably affected by certain predefined parameters and settings, for example, *δ* for the differential expression magnitude and *q* for the estimation of the null-gene proportion in our current study. We therefore examined the sensitivity of our results to these settings. First, we found that our results did not differ much for choices of *q* between 0.5 and 0.95 (data not shown). Second, although a more principled, cross-validated approach for choosing the *δ* value is yet to be developed, our current parameter value was chosen empirically from an actual microarray data analysis. We then used this value consistently in our simulation study, with the proportion of differentially expressed genes varying up to 50%, and found that this setting had little effect on the resulting null distribution.

We note that our RIR-based FDR estimate is derived for each threshold value *c* of the LPE z-score and that the ratio of *V*(*c*) to *R*(*c*) is calculated only when *R*(*c*) > 0, so that this effectively provides an estimate of *pFDR*(*Z > c*), the q-value. Thus, the RIR-based FDR evaluation can be considered a carefully designed resampling-based q-value estimation [7]. Note also that our RIR-based approach can be applied to microarray data analysis regardless of the preprocessing method used.
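The ratio described above can be illustrated with a short sketch. It assumes that *V*(*c*) is approximated by the number of resampled null z-scores exceeding *c*, rescaled to the number of observed genes, and that *R*(*c*) is the number of observed z-scores exceeding *c*. The function name, the `pi0` null-proportion argument, and the cap at 1 are illustrative assumptions rather than details taken from the method.

```python
from typing import Optional, Sequence


def estimate_fdr(
    observed_z: Sequence[float],
    null_z: Sequence[float],
    c: float,
    pi0: float = 1.0,
) -> Optional[float]:
    """Estimate V(c) / R(c) for the one-sided rejection region Z > c.

    Returns None when R(c) == 0, since the ratio is only defined
    when at least one gene is called significant.
    """
    if not null_z:
        raise ValueError("null_z must not be empty")
    r = sum(1 for z in observed_z if z > c)  # R(c): observed rejections
    if r == 0:
        return None
    null_exceed = sum(1 for z in null_z if z > c)
    # V(c): expected false positives, rescaled from the null sample size
    # to the number of observed genes and weighted by the null proportion.
    v = pi0 * null_exceed * len(observed_z) / len(null_z)
    return min(1.0, v / r)
```

Evaluating this function over a grid of thresholds gives one estimate per threshold; thresholds with no rejections simply return no value.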

In Table 2, the FDR estimates of several known genes from the SPLOSH and mix-all approaches were larger than those from RIR. This is somewhat contrary to the observation that the SPLOSH and mix-all approaches were more liberal than RIR, as seen in Figure 3 and Table 1. This may be because these genes lie in high-intensity regions and therefore have relatively low variability, so that their significance is higher under RIR, which accounts for such heterogeneous variability, than under the other methods, which do not.