Open Access

Generalized shrinkage F-like statistics for testing an interaction term in gene expression analysis in the presence of heteroscedasticity

BMC Bioinformatics 2011, 12:427

DOI: 10.1186/1471-2105-12-427

Received: 10 June 2010

Accepted: 1 November 2011

Published: 1 November 2011



Many analyses of gene expression data involve hypothesis tests of an interaction term between two fixed effects, typically tested using a residual variance. In expression studies, the issue of variance heteroscedasticity has received much attention, and previous work has focused on either between-gene or within-gene heteroscedasticity. However, in a single experiment, heteroscedasticity may exist both within and between genes. Here we develop flexible shrinkage error estimators considering both between-gene and within-gene heteroscedasticity and use them to construct F-like test statistics for testing interactions, with cutoff values obtained by permutation. These permutation tests are complicated, and several permutation tests are investigated here.


Our proposed test statistics are compared with other existing shrinkage-type test statistics through extensive simulation studies and a real data example. The results show that the choice of permutation procedures has dramatically more influence on detection power than the choice of F or F-like test statistics. When both types of gene heteroscedasticity exist, our proposed test statistics can control preselected type-I errors and are more powerful. Raw data permutation is not valid in this setting. Whether unrestricted or restricted residual permutation should be used depends on the specific type of test statistic.


The F-like test statistic that uses the proposed flexible shrinkage error estimator considering both types of gene heteroscedasticity and unrestricted residual permutation can provide a statistically valid and powerful test. Therefore, we recommend that it always be applied to test an interaction term in the analysis of real gene expression data.


The regulation of gene expression starts when a cell's DNA is transcribed into mRNA. The simultaneous expression profiles of many genes under different circumstances can provide insight into physiological processes. Using modern technologies in gene expression experiments such as oligonucleotide arrays [1] and cDNA spotted arrays [2], many scientists have made novel discoveries about complex biological processes of yeast [3, 4], drosophila [5], mice [6], humans [7], and other species. Recently one such study also included RNA-seq [8]. Statistical methodologies and issues involved in microarray data analysis have been widely reviewed [9–12], and it is expected that many of the same issues will need to be addressed with RNA-seq.

The analysis of variance (ANOVA) model is a popular statistical modeling method for the analysis of microarrays. Since its introduction by Kerr et al. [13], it has been extensively examined for use in this setting [14–21]. Kerr et al. constructed an ANOVA model that included the gene effect as a fixed effect. This model assumes identically and independently distributed residual errors across genes. The advantage of this model is that the large number of genes involved in a microarray experiment results in huge degrees of freedom for the error estimate, which can lead to a very powerful test. However, the common assumption of homoscedasticity may not hold true in this setting [22]. One alternative is to use an ANOVA model for each gene, but the resulting test statistics from gene-specific models may have limited power because the biological sample size for each gene in a microarray experiment is usually small.

To address this problem of limited power, researchers have proposed other methods for obtaining more information across genes, ranging from a simple equal-weighted average of a gene-specific error estimate and the global average of all gene-specific error estimates (the F2 statistic proposed by Wu et al. [19]) to empirical Bayesian modeling of all gene-specific errors [23–26]. Other variations [27–29] used different variance modeling strategies to address the heteroscedasticity problem, but no clear winner has emerged [30]. Huang and Liu [31] extended the test statistics proposed by Cui et al. [28] by assuming a normal distribution on the mean and then deriving an empirical Bayes likelihood ratio test. The resulting test statistic shrinks both the mean and variances.

In addition to the problem of between-gene heteroscedasticity, we must also be concerned with within-gene heteroscedasticity. For example, in the study of simple differential gene expression between a treatment group and a control group, the variance in the treatment arm may differ from that in the control arm. Some approaches to this problem include a general Bayesian framework to model heteroscedastic error in a single generalized linear mixed model setting [32] and a structural model placed on the error variances specific to each gene and treatment combination [33].

As gene expression studies become more popular, the complexity of the experiment increases. Instead of only simple treatment and control experiments, two or more factor experiments are being conducted. This increase in experiment complexity has led to many scientific questions involving the hypothesis testing of an interaction between two factors. For example, testing a probe by genotype interaction can result in inferences about polymorphism in the probe, such as single nucleotide polymorphism (SNP) and insertion-deletion (indel) [34–37]; testing a probe by sex interaction can imply that alternative splicing occurs between male and female subjects [38]; and in pharmacogenomic studies, testing the genotype-drug/treatment or genotype-disease interaction may be of interest [39]. Thus far, all the development of ANOVA methods for microarray studies has focused on tests of main effects.

Here, a generalized shrinkage estimator incorporating both within- and between-gene heteroscedasticities is developed (see Lehmann and Casella [40] for a review of shrinkage estimation). In any given experiment, both within-gene and between-gene heteroscedasticity may exist; thus, taking these possibilities into account should lead to an improved test statistic. Moreover, given the increasing complexity of recent studies and the burgeoning interest in hypotheses that involve interactions, we focus on an improved shrinkage-based F-test for interaction terms.


Here we develop new shrinkage estimates for the error term and show how to use these estimates to construct F-like statistics. We then estimate the null distribution of these statistics by using permutation tests.

Shrinkage error estimators

Shrinkage error estimators pull individual error estimates toward shrinkage targets, with the amount of shrinkage depending on the variability of individual error estimates [28, 40]. Let the gene-specific error estimates for all genes i and subgroups k be
$$\hat{\sigma}_{1,1}^2, \ldots, \hat{\sigma}_{1,K}^2, \ldots, \hat{\sigma}_{I,K}^2,$$

$i = 1, \ldots, I$, $k = 1, \ldots, K$, and let $\sigma_{i,k}^2$ be the true variance of gene i in group k. When the experimental design is balanced, $\hat{\sigma}_{i,k}^2$ is the residual mean square for gene i in group k and $\nu \hat{\sigma}_{i,k}^2 / \sigma_{i,k}^2 \sim \chi_\nu^2$, where $\nu$ represents the degrees of freedom for the error estimates.

The choices of shrinkage targets in microarray data include the following:
  1. Specific values for each gene-group combination
  2. Gene-specific values that are the same across all other groups
  3. Group-specific values that are the same across genes but different across groups
  4. A single point representing the underlying common error


Correspondingly, these targets are correct when (1) there are both within-gene and between-gene heteroscedasticity; (2) there is only between-gene heteroscedasticity; (3) there is only within-gene heteroscedasticity; and (4) all error variances are identical. We now develop a generalized shrinkage error estimator using these four shrinkage targets.

Let $X_{i,k} = \log \hat{\sigma}_{i,k}^2 - m \sim \log \sigma_{i,k}^2 + \log(\chi_\nu^2/\nu) - m$, where m is the mean of $\log(\chi_\nu^2/\nu)$. Then, using the asymptotic normal approximation of $X_{i,k}$, the distribution of the $X_{i,k}$ with different shrinkage targets for different gene i and group k combinations is
$$X_{i,k} \mid \theta_{i,k} \sim N(\theta_{i,k}, \sigma^2), \qquad \theta_{i,k} \sim N(\mu + \alpha_i + \beta_k, \tau^2),$$

where $\tilde{\theta} = (\theta_{1,1}, \ldots, \theta_{1,K}, \ldots, \theta_{I,1}, \ldots, \theta_{I,K})$, $\tilde{\alpha} = (\alpha_1, \ldots, \alpha_I)$ represents the gene-specific mean differences, and $\tilde{\beta} = (\beta_1, \ldots, \beta_K)$ models different means with respect to different classes of the subgroups.

If $\sigma^2$ and $\tau^2$ are known, then the Bayes estimator of $\theta_{i,k}$ under the squared error loss is [39]:
$$\theta_{i,k}^{B} = \frac{\sigma^2}{\sigma^2 + \tau^2}\,(\mu + \alpha_i + \beta_k) + \frac{\tau^2}{\sigma^2 + \tau^2}\, X_{i,k}.$$
Here, $\sigma^2$ is the variance of $\log(\chi_\nu^2/\nu)$ and is known [28, 40], but $\tau^2$ is not known. However, the marginal distribution of $X_{i,k}$ can be used to create an empirical Bayes estimator of $\tau^2$ and hence of $\theta_{i,k}$. Marginally, $X_{i,k} \sim N(\mu + \alpha_i + \beta_k, \sigma^2 + \tau^2)$, $i = 1, \ldots, I$, $k = 1, \ldots, K$, and, from this model, the least squares estimates $\hat{\mu}, \hat{\alpha}, \hat{\beta}$ of $\mu, \tilde{\alpha}, \tilde{\beta}$ are the uniformly minimum-variance unbiased estimators. Using the fact that
$$E\left(\frac{IK - (I + K - 1) - 2}{\sum_{i,k} (X_{i,k} - \hat{\mu} - \hat{\alpha}_i - \hat{\beta}_k)^2}\right) = \frac{1}{\sigma^2 + \tau^2},$$

the empirical Bayes estimator for $\tau^2$ is $\dfrac{\sum_{i,k} (X_{i,k} - \hat{\mu} - \hat{\alpha}_i - \hat{\beta}_k)^2}{IK - (I + K - 1) - 2} - \sigma^2$.

Then, we can construct the positive-part empirical Bayes estimator [40]:
$$\theta_{i,k}^{EB+} = \hat{X}_{i,k} + \left(1 - \frac{[IK - (I + K - 1) - 2]\,\sigma^2}{\sum_{i,k} (X_{i,k} - \hat{\mu} - \hat{\alpha}_i - \hat{\beta}_k)^2}\right)_{+} (X_{i,k} - \hat{X}_{i,k}), \qquad \hat{X}_{i,k} = \hat{\mu} + \hat{\alpha}_i + \hat{\beta}_k,$$
where $(x)_+ = \max(x, 0)$. The generalized shrinkage error estimate for $\sigma_{i,k}^2$ can be obtained by exponentiating $\theta_{i,k}^{EB+}$ as follows:
$$\tilde{\sigma}_{Gen,i,k}^2 = \exp(\theta_{i,k}^{EB+}).$$
Using a similar argument, the generalized shrinkage error estimator with the shrinkage target at each gene is
$$\tilde{\sigma}_{Gen\text{-}gene,i,k}^2 = \exp(m + \hat{\mu} + \hat{\alpha}_i) \cdot \exp\left[\left(1 - \frac{[IK - (I - 1) - 2]\,\sigma^2}{\sum_{i,k} (X_{i,k} - \hat{\mu} - \hat{\alpha}_i)^2}\right)_{+} (X_{i,k} - \hat{\mu} - \hat{\alpha}_i)\right],$$
with the shrinkage target at each group is
$$\tilde{\sigma}_{Gen\text{-}grp,i,k}^2 = \exp(m + \hat{\mu} + \hat{\beta}_k) \cdot \exp\left[\left(1 - \frac{[IK - (K - 1) - 2]\,\sigma^2}{\sum_{i,k} (X_{i,k} - \hat{\mu} - \hat{\beta}_k)^2}\right)_{+} (X_{i,k} - \hat{\mu} - \hat{\beta}_k)\right],$$
and with the shrinkage target at the common error, we have
$$\tilde{\sigma}_{Gen\text{-}ce,i,k}^2 = \exp(m + \hat{\mu}) \cdot \exp\left[\left(1 - \frac{[IK - 3]\,\sigma^2}{\sum_{i,k} (X_{i,k} - \hat{\mu})^2}\right)_{+} (X_{i,k} - \hat{\mu})\right].$$
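The gene-plus-group estimator can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation (their code is in SAS). It assumes a balanced I x K table of residual mean squares, so the least squares fit of the additive model reduces to row and column means. The function name `generalized_shrinkage` and its argument names are hypothetical.

```python
import math

def generalized_shrinkage(var_hat, m, sigma2):
    """Shrink an I x K table of variance estimates toward gene + group targets.

    var_hat[i][k] : residual mean square for gene i in group k
    m, sigma2     : mean and variance of log(chi2_nu / nu)
    """
    n_genes, n_groups = len(var_hat), len(var_hat[0])
    # X_{i,k} = log(sigma_hat^2_{i,k}) - m
    x = [[math.log(v) - m for v in row] for row in var_hat]

    # Least squares fit of mu + alpha_i + beta_k; in a balanced layout
    # the fitted value is row mean + column mean - grand mean.
    grand = sum(map(sum, x)) / (n_genes * n_groups)
    row_mean = [sum(row) / n_groups for row in x]
    col_mean = [sum(x[i][k] for i in range(n_genes)) / n_genes
                for k in range(n_groups)]
    target = [[row_mean[i] + col_mean[k] - grand for k in range(n_groups)]
              for i in range(n_genes)]

    resid_ss = sum((x[i][k] - target[i][k]) ** 2
                   for i in range(n_genes) for k in range(n_groups))
    df = n_genes * n_groups - (n_genes + n_groups - 1) - 2
    if df <= 0:
        raise ValueError("need IK - (I + K - 1) - 2 > 0")

    # Positive-part weight placed on the deviation from the target.
    # If the data sit exactly on the target, the weight is irrelevant.
    weight = max(1.0 - df * sigma2 / resid_ss, 0.0) if resid_ss > 0 else 0.0

    return [[math.exp(target[i][k] + weight * (x[i][k] - target[i][k]))
             for k in range(n_groups)] for i in range(n_genes)]
```

A weight of 0 returns the exponentiated target and a weight of 1 returns the exponentiated, bias-corrected raw estimate. The gene-only, group-only, and common-error variants would differ only in the fitted target and the degrees-of-freedom constant.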
The shrinkage error estimator proposed by Cui et al. [28] shrinks the gene-specific error estimators toward their common corrected geometric mean. Specifically, the estimator for $\sigma_i^2$ is calculated as
$$\tilde{\sigma}_{Cui,i}^2 = \exp\left(m + \frac{\sum_i X_i}{I}\right) \cdot \exp\left[\left(1 - \frac{[I - 3]\,\sigma^2}{\sum_i \left(X_i - \sum_i X_i / I\right)^2}\right)_{+} \left(X_i - \frac{\sum_i X_i}{I}\right)\right],$$

where $X_i$ is the residual variance estimate from a gene-specific model, and m and $\sigma^2$ are the mean and variance of $\log(\chi_{K\nu}^2 / (K\nu))$. The underlying assumption for this estimator is that there is no between-gene heteroscedasticity, as this estimator shrinks every gene-specific error estimator toward one target. Therefore, it will overshrink the gene-specific error estimates when gene heteroscedasticity exists. In comparison, generalized shrinkage error estimators are flexible in terms of incorporating a different type of heteroscedasticity. Some degrees of freedom are used for incorporating the heteroscedasticity. However, the gain is that the error estimator is then closer to the underlying distribution and should lead to better performance of the resultant F-like test statistics as shown in the results section.

In formulas (2), (3), (5), and (6), m is the mean and σ2 is the variance of a log-transformed chi-square random variable. The simulation-based approximate values of m and σ2 can be found from Table 1 in work of Cui et al. [28]. Pounds [41] gave analytical expressions for these parameters and developed R code for the exact calculation. Here, the simulation-based approximate values were used.
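For readers who want exact values rather than table lookups, the two constants follow from a standard result for the log of a gamma variable: the mean of log(χ²_ν/ν) is ψ(ν/2) − log(ν/2) and its variance is ψ′(ν/2), where ψ is the digamma function. The sketch below is not the R code cited above. It assumes only the Python standard library, so the digamma and trigamma functions are approximated with a recurrence and an asymptotic series, and the function names are hypothetical.

```python
import math

def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:              # shift upward until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result

def trigamma(x):
    """Trigamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    result += inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))
    return result

def log_chi2_moments(nu):
    """Mean m and variance sigma2 of log(chi2_nu / nu)."""
    m = digamma(nu / 2.0) - math.log(nu / 2.0)
    sigma2 = trigamma(nu / 2.0)
    return m, sigma2
```

For ν = 2 this gives m equal to minus the Euler-Mascheroni constant and σ² equal to π²/6, which is a convenient sanity check. Where SciPy is available, `scipy.special.digamma` and `scipy.special.polygamma` can replace the hand-written series.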
Table 1. Results from raw data permutation

Columns: Data set, F1, F2, F3, F Cui, F Gen, F Gen-gene, F Gen-grp. [Cell values not recoverable.]

CWER obtained from 1,000 permutations with the nominal significance level set at 0.05, with standard errors in parentheses. Nine hundred simulation runs were performed to get the empirical average CWER of all types of F-like test statistics.

Shrinkage F-like statistics

To construct a statistic for the hypothesis test of no interaction between two fixed effects, the traditional F-test is simply the ratio of the mean square of the interaction term (MSI) and the mean square of residuals (MSE). This F-test, referred to as F1 [42], is $F_1 = \mathrm{MSI}/\mathrm{MSE} = \mathrm{MSI}/\hat{\sigma}^2$. The F1 test corresponding to a specific gene i is denoted by
$$F_{1,i} = \frac{\mathrm{MSI}_i}{\hat{\sigma}_i^2}.$$

The error variance estimator in this test uses data from only gene i. In oligonucleotide microarray models, the degrees of freedom for the error estimate can be small because the sample size of RNA is usually small, and hence the power of F1 can be limited.

Following the method of constructing an F-test statistic given by Neter et al. [42], the gene-specific shrinkage F-like statistics for testing an interaction between two fixed effects can be obtained as
$$F_{Gen,i} = \frac{\mathrm{MSI}_i}{\sum_k \tilde{\sigma}_{Gen,i,k}^2 / K}, \quad F_{Gen\text{-}gene,i} = \frac{\mathrm{MSI}_i}{\sum_k \tilde{\sigma}_{Gen\text{-}gene,i,k}^2 / K}, \quad F_{Gen\text{-}grp,i} = \frac{\mathrm{MSI}_i}{\sum_k \tilde{\sigma}_{Gen\text{-}grp,i,k}^2 / K}, \quad F_{Gen\text{-}ce,i} = \frac{\mathrm{MSI}_i}{\sum_k \tilde{\sigma}_{Gen\text{-}ce,i,k}^2 / K}, \quad F_{Cui,i} = \frac{\mathrm{MSI}_i}{\tilde{\sigma}_{Cui,i}^2}.$$
When the homoscedastic error assumption is true, the pooled variance estimator, $\hat{\sigma}_{pool}^2$, can be used to construct an F-like statistic. For a balanced design, the pooled variance estimate is the average of all gene-specific error estimates. This statistic is denoted by F3 using the same notation used by Cui and Churchill [22], who also introduced another shrinkage-type F statistic, F2, which can also borrow information across genes when estimating the residual variances. The statistic F2 uses an equal-weighted average of a gene-specific error estimator $\hat{\sigma}_i^2$ and $\hat{\sigma}_{pool}^2$. The definitions of $F_{2,i}$ and $F_{3,i}$ are
$$F_{2,i} = \frac{\mathrm{MSI}_i}{0.5\,\hat{\sigma}_i^2 + 0.5\,\hat{\sigma}_{pool}^2}, \qquad F_{3,i} = \frac{\mathrm{MSI}_i}{\hat{\sigma}_{pool}^2}.$$
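The statistics in this section differ only in their denominators, which a short sketch makes concrete. The code below assumes a balanced design (so the pooled variance is a plain average) and that the interaction mean squares and variance estimates have already been computed. The function name `f_like_statistics` and the key `F_shrunk` are hypothetical labels, with `F_shrunk` standing for whichever shrinkage estimator supplied the K shrunken variances per gene.

```python
def f_like_statistics(msi, sigma2_gene, sigma2_shrunk):
    """Return per-gene F1, F2, F3 and a shrinkage F-like statistic.

    msi[i]           : interaction mean square for gene i
    sigma2_gene[i]   : gene-specific residual mean square
    sigma2_shrunk[i] : list of K shrunken variances for gene i
    """
    n_genes = len(msi)
    # Balanced design: pooled variance is the average over genes.
    pooled = sum(sigma2_gene) / n_genes
    results = []
    for i in range(n_genes):
        # Average the K group-level shrunken variances for the denominator.
        shrunk_mean = sum(sigma2_shrunk[i]) / len(sigma2_shrunk[i])
        results.append({
            "F1": msi[i] / sigma2_gene[i],
            "F2": msi[i] / (0.5 * sigma2_gene[i] + 0.5 * pooled),
            "F3": msi[i] / pooled,
            "F_shrunk": msi[i] / shrunk_mean,
        })
    return results
```

For example, with interaction mean squares 4 and 6, gene-specific variances 1 and 3, and shrunken variances (1, 3) and (2, 4), the pooled variance is 2 and the first gene has F1 = 4, F3 = 2, and a shrinkage statistic of 2.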

Permutation tests

For the proposed generalized shrinkage F-like test statistics, the null distributions do not follow any standard named distribution. Therefore, an empirical approach such as a permutation test can be used to estimate the null distributions. The permutation test for interaction is complicated, because there is no exact permutation test for such a purpose [43]. We therefore must consider an approximate permutation method for testing an interaction term in a crossed fixed/mixed model [44, 45].

Permutation approaches developed previously focused on a single ANOVA model. In the typical gene expression study, thousands of ANOVA models are considered simultaneously. The additional complexity of the shrinkage F-like statistics indicates that Monte Carlo studies are needed to investigate the performance of residual permutation and raw data permutation, with restrictions or not, in a gene-expression analysis. The choice of permutation procedures is critical for assessing the performance of a test statistic.

For all the modified F-like statistics presented in the previous section, the null distributions can only be approximated empirically, but permutation procedures can be used to find the approximate null distribution of all the F and F-like statistics. The important issues in performing a permutation analysis include the choice of the exchangeable units under the null hypothesis, the choice of using restricted permutation or not, and the choice of residual permutation or raw data permutation. These choices influence the power of a test statistic.

Residual permutation using residuals from a reduced model and unrestricted raw data permutation can be used to approximate the null distribution of a statistic for testing an interaction term [44]. When using F1 to test an interaction term in a single ANOVA model, the residual permutation leads to a more powerful test than unrestricted raw data permutation [44]. However, in gene expression analysis, thousands of gene-specific ANOVA models are simultaneously considered, and for a particular gene-specific ANOVA model, information from other gene-specific ANOVA models is used to construct the shrinkage error estimate. Hence, both residual permutation and raw data permutation were investigated. Furthermore, both restricted and unrestricted permutations were studied, because the permutation units are exchangeable only within each particular group when within-gene heteroscedasticity is present across those subgroups.
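The difference between restricted and unrestricted residual permutation can be sketched for a single gene as follows. This is a simplified illustration under stated assumptions, not the procedure used in the study: it uses the ordinary fixed-effects two-way F1 (the replicate-within-line random effect is ignored), it handles one gene at a time (no shrinkage across genes), and it reports a p-value with the common "+1" convention rather than a cutoff. All function names are hypothetical; data are indexed as `y[probe][line][replicate]`.

```python
import random

def cell_summaries(y):
    """Cell, probe, line, and grand means for balanced y[p][l][r]."""
    P, L, R = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[p][l]) / R for l in range(L)] for p in range(P)]
    probe = [sum(cell[p]) / L for p in range(P)]
    line = [sum(cell[p][l] for p in range(P)) / P for l in range(L)]
    return cell, probe, line, sum(probe) / P

def interaction_f(y):
    """Ordinary two-way ANOVA F statistic for the probe x line interaction."""
    P, L, R = len(y), len(y[0]), len(y[0][0])
    cell, probe, line, grand = cell_summaries(y)
    ss_int = R * sum((cell[p][l] - probe[p] - line[l] + grand) ** 2
                     for p in range(P) for l in range(L))
    ss_err = sum((y[p][l][r] - cell[p][l]) ** 2
                 for p in range(P) for l in range(L) for r in range(R))
    return (ss_int / ((P - 1) * (L - 1))) / (ss_err / (P * L * (R - 1)))

def reduced_fit(y):
    """Fitted values and residuals under the additive probe + line model."""
    P, L, R = len(y), len(y[0]), len(y[0][0])
    _, probe, line, grand = cell_summaries(y)
    fitted = [[probe[p] + line[l] - grand for l in range(L)] for p in range(P)]
    resid = [[[y[p][l][r] - fitted[p][l] for r in range(R)]
              for l in range(L)] for p in range(P)]
    return fitted, resid

def permute_residuals(resid, rng, restricted):
    """Shuffle residuals over all cells, or only within each line."""
    P, L, R = len(resid), len(resid[0]), len(resid[0][0])
    if restricted:
        blocks = [[(p, l, r) for p in range(P) for r in range(R)]
                  for l in range(L)]
    else:
        blocks = [[(p, l, r) for p in range(P) for l in range(L)
                   for r in range(R)]]
    out = [[[0.0] * R for _ in range(L)] for _ in range(P)]
    for block in blocks:
        values = [resid[p][l][r] for p, l, r in block]
        rng.shuffle(values)
        for (p, l, r), value in zip(block, values):
            out[p][l][r] = value
    return out

def permutation_p_value(y, n_perm=1000, restricted=False, seed=0):
    """Approximate p-value for the interaction via residual permutation."""
    rng = random.Random(seed)
    P, L, R = len(y), len(y[0]), len(y[0][0])
    observed = interaction_f(y)
    fitted, resid = reduced_fit(y)
    hits = 0
    for _ in range(n_perm):
        perm = permute_residuals(resid, rng, restricted)
        y_star = [[[fitted[p][l] + perm[p][l][r] for r in range(R)]
                   for l in range(L)] for p in range(P)]
        if interaction_f(y_star) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Restricting the shuffle to each line keeps residuals with different variances from being mixed, which is the exchangeability concern raised above. For the shrinkage statistics, every gene would need to be permuted jointly and the shrunken variances recomputed for each permuted data set.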


The properties of this shrinkage estimator are compared with those of other existing F and F-like statistics that have been proposed and described in the "Shrinkage F-like statistics" section.

Simulation studies

The purpose of these simulation studies was to compare the performances of F1, F2, F3, F Cui , F Gen , F Gen-gene , and F Gen-grp in terms of type I error and power and to compare the results of a particular F-like statistic using four different permutation strategies: restricted/unrestricted residual permutation and restricted/unrestricted raw data permutation.

In these simulation studies, 100 genes with two probes for each gene and three replicates from each of two lines were simulated to mimic a split-plot design in a general oligonucleotide microarray experiment. Data were generated from the gene-specific ANOVA model $y_{plr} = P_p + L_l + RL_{rl} + PL_{pl} + \epsilon_{plr}$, $p = 1, 2$, $l = 1, 2$, $r = 1, 2, 3$, where P, L, RL, and PL represent probe, line, replicates from a particular line, and the interaction between probe and line, respectively.

Replicates were nested within each line, and RL is usually treated as a random effect during the model-fitting procedure, which results in a correlation between probes from the same biological sample. In the simulated data sets, the correlation between genes was 0. A total of 900 simulation runs were carried out to compare the performances of F1, F2, F3, F Cui, F Gen, F Gen-gene, and F Gen-grp based on different permutation procedures. The four permutations tested were unrestricted residual permutation, restricted residual permutation with respect to each line, unrestricted raw data permutation, and restricted raw data permutation with respect to each line. The residuals permuted were from a reduced fixed model with fixed effects for only line and probe.

Two types of data were simulated: null cases and cases with a probe by line interaction at a range of degrees. Null cases included: null-ce, all probe-level expression values were simulated from the standard normal distribution; null-gh, the gene-specific error variances were simulated from the log-normal distribution with mean log at 0 and standard deviation at 2, mimicking the general heteroscedastic error distribution in typical datasets; null-wgh, all genes had the same error structures and the residual error variance of line 1 was 100 times that of line 2; null-bgh, simulated data were modified from null-gh, with the variance of line 1 multiplied by 100. Correspondingly, ce, gh, wgh, and bgh in Figures 1 and 2 were simulated by adding interaction terms to null-ce, null-gh, null-wgh, and null-bgh. Quantitative interaction was assumed and the differences in the opposite direction were set to make the detection powers for an interaction term based on traditional F-statistics and tabled p-values range from 0.05 to 0.95.
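The four null settings can be mimicked with a short generator. This is a sketch under assumptions, not the simulation code used in the study: errors are drawn independently (the replicate-within-line random effect and all main effects are omitted), "mean log at 0 and standard deviation at 2" is read as the log of the gene variance being normal with mean 0 and standard deviation 2, and the function name `simulate_null` and the short case labels are hypothetical.

```python
import math
import random

def simulate_null(case, n_genes=100, n_probes=2, n_lines=2, n_reps=3, seed=0):
    """Return data[g][p][l][r] for the 'ce', 'gh', 'wgh', or 'bgh' null case."""
    if case not in ("ce", "gh", "wgh", "bgh"):
        raise ValueError("case must be one of 'ce', 'gh', 'wgh', 'bgh'")
    rng = random.Random(seed)
    data = []
    for _ in range(n_genes):
        # Between-gene heteroscedasticity: log-normal gene-specific variance.
        if case in ("gh", "bgh"):
            gene_var = math.exp(rng.gauss(0.0, 2.0))
        else:
            gene_var = 1.0
        gene = []
        for _probe in range(n_probes):
            probe = []
            for line in range(n_lines):
                # Within-gene heteroscedasticity: the first line has
                # 100 times the variance of the second.
                inflate = 100.0 if case in ("wgh", "bgh") and line == 0 else 1.0
                sd = math.sqrt(gene_var * inflate)
                probe.append([rng.gauss(0.0, sd) for _ in range(n_reps)])
            gene.append(probe)
        data.append(gene)
    return data
```

Power cases would add a probe by line interaction term to each simulated value. The size of that term is tuned in the study to span a range of detection powers and is not reproduced here.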
Figure 1

The comparison of power curves of all F-like test statistics. The x-axis is the average power from analyzing 900 simulated data sets using F1 with tabled p-values. The y-axis is the estimated powers using empirical gene-specific null distributions from 1,000 residual permutations. The upper four plots show the results with restricted residual permutation, while the lower four plots show the results from unrestricted residual permutation. The solid line indicates the empirical average CWER of a statistic is at the prespecified level, and the dashed line shows an inflated empirical average CWER. "ce," all genes have common error; "gh," only between-gene heteroscedasticity exists; "wgh," only within-gene heteroscedasticity exists; "bgh," both between-gene and within-gene heteroscedasticity exist.

Tables 1 and 2 show the results from 900 simulation runs using raw data permutation and residual permutation, respectively. Data in Table 1 suggest that when both types of gene heteroscedasticity exist, the unrestricted raw data permutation had a greater average comparison-wise error rate (CWER) than residual permutation. Raw data permutation with restriction can control the prespecified CWER in all cases. In Table 2, for the common error cases, all test statistics had the prespecified CWER from both restricted and unrestricted residual permutation. When within-gene heteroscedasticity existed, F1 and F Cui had inflated CWER under both residual permutation procedures. Restricted residual permutation reduces, but does not solve, this problem. For F2 and F3, only the restricted residual permutation could control the prespecified CWER. For F Gen, F Gen-gene, and F Gen-grp, restricted residual permutation gave conservative results in terms of having CWER smaller than the prespecified level. When the shrinkage target is correctly set, unrestricted residual permutation controls the nominal CWER. As expected, only F Gen coupled with unrestricted residual permutation could be used for all cases, because the CWER was always less than the nominal level.
Table 2. Results from residual permutation

Columns: Data set, F1, F2, F3, F Cui, F Gen, F Gen-gene, F Gen-grp. [Cell values not recoverable.]

CWER obtained from 1,000 permutations with the nominal significance level set at 0.05, with standard errors in parentheses. Nine hundred simulation runs were performed to get the empirical average CWER of all types of F-like test statistics.

Further simulations to compare the rejection rates were conducted. Only results from residual permutation are shown because it was found that raw data permutation was less powerful than residual permutation. This is consistent with the findings of Anderson and Ter Braak [44]. Figure 1 shows the estimated average null hypothesis rejection rate curves from all F-like statistics and both restricted and unrestricted residual permutation procedures. The x-axis represents the average null hypothesis rejection rate using F1 and the tabulated p-values. The solid line shows that the corresponding statistic controls the prespecified CWER, and the dashed line shows that the corresponding CWER was inflated. In general, restricted residual permutation is less powerful than unrestricted residual permutation. For example, the power of all statistics from unrestricted residual permutation almost doubled in some cases where heteroscedasticity existed.

When the common error assumption is valid, F3 is obviously the most powerful test and the prespecified CWER is controlled. All other F-like statistics performed very similarly in this case. When the shrinkage target was correctly set, the resultant test statistic was the most powerful one. For example, when there was only within-gene heteroscedasticity, F Gen-grp was more powerful than F Gen and F Gen-gene based on either restricted or unrestricted residual permutation. The rejection rate comparison of statistically valid test statistics is further illustrated in Figure 2, where the x-axis is the average rejection rate from using F Gen and unrestricted residual permutation. Figure 2 clearly shows that unrestricted residual permutation is more favorable in terms of power. F Gen-grp appears to be more powerful than F Gen, but when both types of gene heteroscedasticity occur, F Gen-grp has inflated CWER.
Figure 2

The comparison of power curves of F Gen from unrestricted residual permutation versus other F-like test statistics. Only results from permutation combinations that can control prespecified CWER are used in this figure. The x-axis is the average power after analyzing 900 simulated data sets using F Gen and 1,000 unrestricted residual permutations. The y-axis is the estimated power from other F-like test statistics and empirical gene-specific null distributions based on the appropriate permutation. The solid black line corresponds to F Gen with unrestricted permutation, and this test always controls prespecified CWER. "ce," all genes have common error; "gh," only between-gene heteroscedasticity exists; "wgh," only within-gene heteroscedasticity exists; "bgh," both between-gene and within-gene heteroscedasticity exist; "res," restricted permutation; "unres," unrestricted permutation.

Drosophila data

The data used in this study are from a gene expression comparison study between D. melanogaster and D. simulans [46]. Expression of 10 genotypes of each species was measured in male flies. In D. simulans, each genotype was measured separately, and in D. melanogaster, a pool of 10 genotypes was measured. All genotypes (individual or pooled) were independently isolated and hybridized three times. The goal of the original study was to provide a genome-wide approach to identifying candidate genes potentially responsible for adaptation and speciation in D. simulans and D. melanogaster. In this study, we focus on identifying sequence differences between genotypes in D. simulans based on hybridization profiles. Within-gene heteroscedasticity is expected because the genotypes come from different lines. The proposed generalized shrinkage F-like test statistics F Gen, F Gen-gene, and F Gen-grp were compared with F2 and F3 under restricted residual permutation, which could control the prespecified CWER for any variance structure in the simulation studies. Furthermore, Smyth's moderated F-test statistic [25], both without multiple testing adjustment and with the false discovery rate (FDR) controlled at 5%, was used for comparison. As the main interest is in sequence difference, the focus is on the test of interaction between line and probe. The split-plot model described above is used. SAS program code is included in the additional files (additional file 1 and additional file 2).

The Drosophila genome has been fully sequenced and both SNPs and indels can cause a significant interaction term. Thus, the false positive rate and detection power based on SNP/indel sequence information can be calculated for a subset of the data. In the data set, there were 10 lines from D. simulans and three replicates from each line. Each probe set had 14 probes. The 1,285 probe sets containing all "good" probes were selected. A "bad" probe's sequence satisfies one or more of the following criteria: it matches the D. simulans genome multiple times; it cannot be mapped to the flybase 4.2.1 genome; or it has no information, such as hitting outside an exon, hitting a poorly aligned region, or hitting a region lacking a sequence. SNP or indel information could be determined in 777 probe sets. For this data set, there was a high degree of within-gene heteroscedasticity: about 22.3% of the probe sets had line-specific residual variance estimates that differed by 10-fold or more. Therefore, as suggested by the conclusions from the simulation studies, unrestricted residual permutation and restricted residual permutation were used for the generalized shrinkage F-like test statistics (F Gen, F Gen-gene, F Gen-grp), and restricted residual permutation was used for F2 and F3. The results are shown in Table 3. Consistent with the findings from the simulation studies, F Gen had about 30% more detection power than the other F-like test statistics (F2, F3) because it accounts for within-gene heteroscedasticity. The false discovery rate of F Gen was slightly higher than that of F2 and F3. F Gen-gene and F Gen-grp performed similarly to F Gen. Smyth's moderated F-test statistic, both without multiple testing adjustment and with the FDR set at 5%, detected more SNPs and indels, but at the expense of a greater FDR than F Gen.
Table 3. Probe sets with significant line*probe terms found by F-like test statistics and appropriate residual permutation procedures and Smyth's moderated F-test statistic

Columns: Test statistic, Restricted permutation?, Number of probe sets found, True false discovery rate.
Rows: F2, F3, F Gen, F Gen-gene, F Gen-grp, F Gen, F Gen-gene, F Gen-grp, moderated F-1, moderated F-2. [Cell values not recoverable.]

The CWER was set to 0.05. Gene-specific cutoff values were obtained from 1,000 permutations. "moderated F-1" and "moderated F-2" represent results from using moderated F statistic without any multiple testing adjustment and setting FDR to 5%.


For gene expression analysis, ANOVA models have been a popular modeling technique. Based on ANOVA models, flexible shrinkage F-like test statistics were developed to account for both the within-gene and between-gene heteroscedasticities. The emphasis here is on testing an interaction term, as this case is of increasing interest to biologists, and there is no clear existing theory on the most powerful, valid approach for such statistics. For all F-like statistics studied here, their null distributions were approximated empirically through permutations. Four different permutation procedures were investigated for eight different F-like statistical tests of the interaction term.

As expected, we found that when an error estimator overshrinks, the resulting F-like statistic cannot control the prespecified CWER. For example, F Gen-gene is an over-shrinkage error estimator when there is within-gene heteroscedasticity. As a result, compared with generalized shrinkage F-like statistics, it is not valid when within-gene heteroscedasticity exists. Undershrinkage is also important, as it will lead to a conservative test and lower power. This is clearly demonstrated when the common error can be assumed and the most powerful valid test is F Gen-grp .

The most striking result was the impact of the permutation procedures. Although this was not completely unexpected [43–45], the effect of the permutation procedures is dramatic and worthy of special attention. Unrestricted raw data permutation could not control prespecified CWER when there was within-gene heteroscedasticity. Restricted raw data permutation could be used, but it was less powerful than residual permutation. Also consistent with findings from Anderson and Ter Braak [44], restricted permutations are less powerful than unrestricted permutations. However, unrestricted permutations are valid only for a common error and when between-gene heteroscedasticity exists for our proposed shrinkage statistics; they are not valid in combination with F2, F3, or F Cui. For F Gen-grp, the unrestricted permutation can also be used in cases having within-gene heteroscedasticity, while only F Gen is valid with unrestricted permutation in all cases in terms of controlling prespecified CWER. Interestingly, the power gain from using the correct shrinkage target F Gen-grp rather than F Gen is far less than that of using unrestricted permutation. The result is that F3 is never the most powerful choice when testing an interaction term.

The correct shrinkage target can lead to the most powerful test statistic. As one of the reviewers suggested, a statistical test may be applied to help pick the best shrinkage target before obtaining shrinkage error estimates. However, this extra testing step may inflate the CWER of the test statistic when there is gene heteroscedasticity. For example, when there are both types of gene heteroscedasticities, it is possible that the above test suggests only within-gene heteroscedasticities exist, and F Gen-grp is shown to inflate the CWER. There is minimal penalty to using the shrinkage estimator we propose, so we recommend setting the shrinkage target in the full space spanned by group and gene and using unrestricted permutation to compensate for the possible power loss in fewer degrees of freedom left for estimating the errors.


The proposed generalized shrinkage F-like statistic with shrinkage targets located in a space spanned by gene and another group, F Gen, combined with unrestricted residual permutation, is always valid in terms of controlling a prespecified CWER. This statistic has reasonable power in most cases; thus, we generally recommend it for testing an interaction term in the analysis of real gene expression data.

List of abbreviations


CWER: comparison-wise error rate

FDR: false discovery rate

indel: insertion and deletion

SNP: single nucleotide polymorphism



We thank Brandon Walts for identifying true SNP positions; Angela J. McArthur and David R. Galloway for their help in scientific editing; and the associate editor and three reviewers for their constructive comments, which much improved this manuscript. This research was supported by NIH 1R01GM077618 (McIntyre) and NIH 1R01GM081704 (Casella).

Authors’ Affiliations

Department of Preventive Medicine, Stony Brook University
Department of Statistics, University of Florida
Department of Molecular Genetics and Microbiology, University of Florida
The Genetics Institute, University of Florida


  1. Fodor SPA: Massively parallel genomics. Science 1997, 277: 393–395. 10.1126/science.277.5324.393
  2. Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW: Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proceedings of the National Academy of Sciences USA 1996, 93: 10614–10619. 10.1073/pnas.93.20.10614
  3. Galitski T, Saldanha AJ, Styles CA, Lander ES, Fink GR: Ploidy regulation of gene expression in yeast. Science 1999, 285: 251–254. 10.1126/science.285.5425.251
  4. Tu BP, Kudlicki A, Rowicka M, McKnight SL: Logic of the yeast metabolic cycle: Temporal compartmentalization of cellular processes. Science 2005, 310: 1152–1158. 10.1126/science.1120499
  5. White KP, Rifkin SA, Hurban P, Hogness DS: Microarray analysis of Drosophila development during metamorphosis. Science 1999, 286: 2179–2184. 10.1126/science.286.5447.2179
  6. Chabas D, Baranzini SE, Mitchell D, Bernard CCA, Rittling SR, Denhardt DT, Sobel RA, Lock C, Karpuj M, Pedotti R, Heller R, Oksenberg JR, Steinman L: The influence of the proinflammatory cytokine, Osteopontin, on autoimmune demyelinating disease. Science 2001, 294: 1731–1735. 10.1126/science.1062960
  7. Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, Maner S, Massa H, Walker M, Chi MY, Navin N, Lucito R, Healy J, Hicks J, Ye K, Reiner A, Gilliam TC, Trask B, Patterson N, Zetterberg A, Wigler M: Large-scale copy number polymorphism in the human genome. Science 2005, 305: 525–528.
  8. Blekhman R, Marioni JC, Zumbo P, Stephens M, Gilad Y: Sex-specific and lineage-specific alternative splicing in primates. Genome Research 2010, 20(2):180–189. 10.1101/gr.099226.109
  9. Butte A: The use and analysis of microarray data. Nature Reviews 2002, 1: 951–960. 10.1038/nrd961
  10. Churchill GA: Fundamentals of experimental design for cDNA microarrays. Nature Genetics 2002, 32: 490–495. 10.1038/ng1031
  11. Craig BA, Black MA, Doerge RW: Gene expression data: The technology and statistical analysis. Journal of Agricultural, Biological, and Environmental Statistics 2003, 8(1):1–28. 10.1198/1085711031256
  12. Allison DA, Cui X, Page GP, Sabripour M: Microarray data analysis: from disarray to consolidation and consensus. Nature Reviews Genetics 2006, 7: 55–65. 10.1038/nrg1749
  13. Kerr MK, Martin M, Churchill GA: Analysis of variance for gene expression microarray data. Journal of Computational Biology 2000, 7: 819–837. 10.1089/10665270050514954
  14. Kerr MK, Churchill GA: Statistical design and the analysis of gene expression microarrays. Genetical Research 2001, 77: 123–128.
  15. Kerr MK, Churchill GA: Experimental design for gene expression microarrays. Biostatistics 2001, 2: 183–201. 10.1093/biostatistics/2.2.183
  16. Pritchard CC, Hsu L, Delrow J, Nelson PS: Project normal: Defining normal variation in mouse gene expression. Proceedings of the National Academy of Sciences USA 2001, 98: 13266–13271. 10.1073/pnas.221465998
  17. Wolfinger RD, Gibson G, Wolfinger ED, Bennett L, Hamadeh H, Bushel P, Ashfari C, Paules RS: Assessing gene significance from cDNA microarray expression data via mixed models. Journal of Computational Biology 2001, 8(6):625–637. 10.1089/106652701753307520
  18. Kerr MK, Afshari CA, Bennett L, Bushel P, Martinez J, Walker NJ, Churchill GA: Statistical analysis of a gene expression microarray experiment with replication. Statistica Sinica 2002, 12: 203–217.
  19. Wu H, Kerr MK, Cui XQ, Churchill GA: MAANOVA: A software package for the analysis of spotted cDNA microarray experiments. In The analysis of gene expression data: methods and software. Springer; 2002:313–341.
  20. Chu T, Weir B, Wolfinger R: A systematic statistical linear modeling approach to oligonucleotide array experiments. Mathematical Biosciences 2002, 176: 35–51. 10.1016/S0025-5564(01)00107-9
  21. Wayne ML, Pan YJ, Nuzhdin SV, McIntyre LM: Additivity and transacting effects on gene expression in male Drosophila simulans. Genetics 2004, 168: 1413–1420. 10.1534/genetics.104.030973
  22. Cui X, Churchill GA: Statistical tests for differential expression in cDNA microarray experiments. Genome Biology 2003, 4(4):201.
  23. Baldi P, Long AD: A Bayesian framework for the analysis of microarray expression data: Regularized t-test and statistical inferences of gene changes. Bioinformatics 2001, 17: 509–519. 10.1093/bioinformatics/17.6.509
  24. Lönnstedt I, Speed T: Replicated microarray data. Statistica Sinica 2002, 12: 31–46.
  25. Smyth GK: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 2004, 3: No. 1, Article 3.
  26. Tong TJ, Wang YD: Optimal shrinkage estimation of variances with applications to microarray data analysis. Journal of the American Statistical Association 2007, 102: 113–122. 10.1198/016214506000001266
  27. Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to ionizing radiation response. Proceedings of the National Academy of Sciences USA 2001, 98(9):5116–5121.
  28. Cui X, Hwang JTG, Qiu J, Blades NJ, Churchill GA: Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics 2005, 6: 59–75. 10.1093/biostatistics/kxh018
  29. Feng S, Wolfinger RD, Chu TM, Gibson GC, McGraw LA: Empirical Bayes analysis of variance component models for microarray data. Journal of Agricultural, Biological & Environmental Statistics 2006, 1113: 197–190.
  30. Kim SY, Lee JW, Sohn IS: Comparison of various statistical methods for identifying differential gene expression in replicated microarray data. Statistical Methods in Medical Research 2006, 15: 3–20. 10.1191/0962280206sm423oa
  31. Hwang JTG, Liu P: Optimal tests shrinking both means and variances applicable to microarray data analysis. In preprint 2007–03. Department of Statistics, Iowa State University, IA; 2007.
  32. Kizilkaya K, Tempelman RJ: A general approach to mixed effects modeling of residual variances in generalized linear mixed models. Genetics Selection Evolution 2005, 37: 31–56. 10.1186/1297-9686-37-1-31
  33. Jaffrezic F, Marot G, Degrelle S, Isabelle H, Foulley JL: A structural mixed model for variances in differential gene expression studies. Genetical Research 2007, 89(1):19–25. 10.1017/S0016672307008646
  34. Rostoks N, Borevitz JO, Hedley PE, Russell J, Mudie S, Morris J, Cardle L, Marshall DF, Waugh R: Single-feature polymorphism discovery in the Barley transcriptome. Genome Biology 2005, 6: R54. 10.1186/gb-2005-6-6-r54
  35. Kirst M, Caldo R, Casati P, Tanimoto G, Walbot V, Wise RP, Buckler ES: Genetic diversity contribution to errors in short oligonucleotide microarray analysis. Plant Biotechnology Journal 2006, 4: 489–498.
  36. Zhang X, Shiu SH, Cal A, Borevitz JO: Global analysis of genetic, epigenetic and transcriptional polymorphisms in Arabidopsis thaliana using whole genome tiling arrays. PLoS Genetics 2008, 4(3):e1000032. 10.1371/journal.pgen.1000032
  37. Zhang X, Borevitz JO: Global analysis of allele-specific expression in Arabidopsis thaliana. Genetics 2009, 182(4):943–954. 10.1534/genetics.109.103499
  38. McIntyre LM, Bono LM, Genissel A, Westerman R, Junk D, Telonis-Scott M, Harshman L, Wayne ML, Kopp A, Nuzhdin SV: Sex specific expression of alternative transcripts in Drosophila. Genome Biology 2006, 7: R79. 10.1186/gb-2006-7-8-r79
  39. Kelly P, Zhou YH, Whitehead J, Stallard N, Bowman C: Sequentially testing for a gene-drug interaction in a genomewide analysis. Statistics in Medicine 2008, 27: 2022–2034. 10.1002/sim.3059
  40. Lehmann EL, Casella G: Theory of Point Estimation. 2nd edition. New York: Springer-Verlag; 1998.
  41. Pounds S: Computational enhancement of a shrinkage-based analysis of variance F-test proposed for differential gene expression analysis. Biostatistics 2007, 8(3):505–506.
  42. Neter J, Wasserman W, Kutner MH: Applied Linear Statistical Models: Regression, Analysis of Variance, and Experimental Designs. 3rd edition. Irwin, Inc; 1990.
  43. Edgington ES: Randomization Tests. 3rd edition. New York: Marcel Dekker; 1995.
  44. Anderson MJ, Ter Braak CJF: Permutation tests for multi-factorial analysis of anova. Journal of Statistical Computation and Simulation 2003, 73(2):85–113.
  45. Churchill GA, Doerge RW: Naive application of permutation testing leads to inflated type I error rates. Genetics 2008, 178: 609–610. 10.1534/genetics.107.074609
  46. Nuzhdin SV, Wayne ML, Harmon KL, McIntyre LM: Common pattern of evolution of gene expression level and protein sequence in Drosophila. Molecular Biology and Evolution 2004, 21: 1308–1317. 10.1093/molbev/msh128


© Yang et al; licensee BioMed Central Ltd. 2011

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.