Nonparametric tests for differential gene expression and interaction effects in multi-factorial microarray experiments
© Gao and Song. 2005
Received: 19 April 2005
Accepted: 21 July 2005
Published: 21 July 2005
Skip to main content
© Gao and Song. 2005
Received: 19 April 2005
Accepted: 21 July 2005
Published: 21 July 2005
Numerous nonparametric approaches have been proposed in literature to detect differential gene expression in the setting of two user-defined groups. However, there is a lack of nonparametric procedures to analyze microarray data with multiple factors attributing to the gene expression. Furthermore, incorporating interaction effects in the analysis of microarray data has long been of great interest to biological scientists, little of which has been investigated in the nonparametric framework.
In this paper, we propose a set of nonparametric tests to detect treatment effects, clinical covariate effects, and interaction effects for multifactorial microarray data. When the distribution of expression data is skewed or heavy-tailed, the rank tests are substantially more powerful than the competing parametric F tests. On the other hand, in the case of light or medium-tailed distributions, the rank tests appear to be marginally less powerful than the parametric competitors.
The proposed rank tests enable us to detect differential gene expression and establish interaction effects for microarray data with various non-normally distributed expression measurements across genome. In the presence of outliers, they are advantageous alternative approaches to the existing parametric F tests due to the robustness feature.
High density oligonucleotide microarray, spotted cDNA array, or other array technologies have presented not only daunting amount of expression data for biologists to explore the inherent biological mechanisms, but also challenging statistical analysis problems. A replicated microarray experiment involves multiple arrays to compare gene expression profile under different conditions. However, normality assumption justifying parametric testing is often untenable in microarray studies [1, 2]. For instance, a set of 540 genes from a leukemia data set  were analyzed and various distributions for the different genes were found, in which only 13.3% genes have error distributions satisfying the normality assumption . If the underlying distributions of expression measurements can be validated properly, model-based approaches such as likelihood or Bayesian inference can validly accept non-normally distributed data and gain satisfactory power to detect differentially expressed genes (e.g. [5–10]). For example, a hierarchical mixture-model has been proposed with parameterizations for Gamma or log-normally distributed measurements . However when the distribution of the data is difficult to characterize, nonparametric inference makes less stringent distributional assumptions and thereby provide appropriate analysis.
Furthermore data contamination can arise in microarray setting due to different reasons. For instance, an image contamination can occur if a long scratch is present on the array image or a corner of the array is misaligned in the image processing stage. Sample contamination can occur if the mRNA sample is contaminated with other sources of RNA present in the laboratory environment. Such outliers are dramatically different from the majority of the observations and can greatly undermine the sensitivity of parametric approach. A method of assessing goodness of fit to a linear model has been used to automatically detect outliers that possess too large deviation from the overall pattern . Alternatively, a quality index based on coefficient variation was adopted to filter out outlying values with poor quality . Nevertheless the inspection process is time consuming for such large-scale expression data analysis . In this context, nonparametric inference is advantageous as it is insensitive to the presence of outliers. Even without the step of outlier filtering, the validity and power of the nonparametric procedures would be minimally affected.
The development of both parametric and nonparametric methods to address the two condition problem in microarray setting has recently received much attention. Most of the parametric tests employed t or t-like statistics and differ primarily in the estimation of variance . In contrast to these methods which treat the genes as separate fixed effects, the two-group Bayes method was proposed to treat the genes as arising from a certain population. Thus the dimensionality of the inference problem was reduced by sharing information across the array [5, 9]. Nonparametric approaches have also been proposed for two-user defined groups [12, 13]. The Wilcoxon rank sum test was considered in [14, 15] to identify differentially expressed genes in comparison with the Fisher-Pitman permutation test, which is also referred as the nonparametric t test . Recently, the Baumgartner-Weiß-Schindler test has been recommended to detect differentially expressed genes in two groups, which was shown to be less conservative and more powerful than the Wilcoxon rank sum test .
However, a microarray experiment often has more complicated design than that of two user-defined groups. Besides the treatment effects of interest, there may exist some clinical covariates such as age, gender and certain clinical symptoms, which also influence the gene expression level. For such experiments, a factorial design model is useful to account for the multiple sources of variation. Townsend and Hartl  derived a Bayesian model that has been widely used for the estimation of gene expression levels in multifactorial experiments [7, 17]. This model has been extended  to accommodate not only additive error terms but also multiplicative error terms to resolve small yet statistically significant differences in gene expression. Alternatively, an overall ANOVA model has also been widely used that simultaneously considers all the genes on the arrays and incorporates array effect and dye effect . A gene specific ANOVA model under the normality assumption was considered in . A mixed linear model was proposed to assess gene significance in which both fixed treatment effects and random array effects were assumed . Unfortunately, there has been no nonparametric procedure proposed up to date to analyze multifactorial microarray data. In addition, the establishment of interaction effect between the multiple attributing factors can help elucidate certain biological mechanisms related to the regulation of gene expression. Thus it is desirable to develop a set of nonparametric procedures to detect differential gene expression and establish interaction effects for multifactorial microarray data.
To account for the multiple sources of variation attributing to the gene expression, we consider the following model for each specific gene:
X kijn = θ k + T ij + C j + ε kijn , k = 1,..., K; i = 1,..., I; j = 1,..., J; n = 1,..., N (1)
with ∑ j C j = 0, and ∑ i, j T ij = 0, where k indexes for the gene number, i indexes for the treatment group, j indexes for the covariate group, n indexes for the replicate number. In the equation, X kijn represents the expression measurement, θ k represents the k th gene specific mean, T ij represents the effect of the i th treatment group (for instance, drug treatments, tissue types, and strains of mice) through its main effect and interaction effect with the j th level of the clinical covariate, and C j represents the effect of the j th level of the clinical covariate. The error terms ε kijn are independently and identically distributed random noise from a continuous distribution function F k .
To further discern the interaction effect, the treatment effect T ij , can be decomposed into T ij = M i + γ ij with ∑ i M i = 0 and ∑ i, j γ ij = 0, where M i denotes the main effect of the treatment group and γ ij denotes the interaction effect. Interaction effects are often of biological interest when the treatment effects are heterogeneous across the levels of the clinical covariate. For example, consider a data set with mouse strains as treatment groups and tissue types as covariate groups, the interaction effects arise when the effects of different mouse strains are disproportional over different tissue types. It is worth noting that model (1) is related to ANOVA models proposed by other researchers [18–20]. The difference between our factorial model (1) and the existing ANOVA models is two-fold: model (1) accommodates multifactor effects on each specific gene, and it does not make normality assumption on the error terms ε ijkn .
To develop nonparametric rank tests for multifactorial microarray experiments, it is natural to consider rank procedure which can be viewed as nonparametric analogue of the parametric analysis of variance approach. The popular rank transform (RT) method consists of replacing the observations by their ranks in the combined sample and performing one of the standard analysis of variance tests on these ranks . However in the general multifactorial model, the RT method is not valid for most of the common hypotheses due to the nonlinear nature of the rank transformation. For example in the presence of interaction effect, the naive application of ranks into ANOVA formula cannot be used to detect for main effect nor for the interaction effect. Theoretical validations of these limitations of RT method have been thoroughly discussed by Brunner and Neumann, Akritas, and Wilcox, among many others [22–28]. Since the RT method can be easily accomplished by using standard computer packages, extra caution needs to be exerted to prevent the inappropriate extensions of RT method for microarray data analysis under the multifactorial model. In the following, we shall present rank procedures which are similar to RT methods in the sense that they also resemble the analysis of variance approach, however they incorporate more rigorous treatment on the data rather than just replacing the actual observations by the overall rankings.
Convergence of type I error rates of the MRT test based on chi-squared approximation. The type I error rates of the MRT test based on chi-squared approximation were evaluated under varying sample sizes. A 2 × 2 design and a 3 × 4 design were considered.
N = 5
N = 15
N = 20
2 × 2
3 × 4
2 × 2
3 × 4
2 × 2
3 × 4
2 × 2
3 × 4
2 × 2
3 × 4
In practice, ties are commonly encountered in microarray data due to rounding and data modification . In the presence of ties, we adopted the method of mid-ranks which assigns each tied individual the average of the tied ranks. There are other methods of dealing with ties such as the methods of randomization and the average statistics. However it has been shown that the randomization method is less powerful under the alternatives due to the supplementary random effects introduced by the randomization. In addition, the method of average statistics typically leads to a conservative test that has a lower significance level than the nominal one . Thus the method of mid-ranks is most frequently used compared to other method to handle ties. As little is known about the small-sample performance of MRT using mid-ranks, it is of interest to conduct simulation studies to investigate this aspect. The related result is provided in the subsequent section.
An important aspect related to multifactorial design is to address treatment-covariate interaction effects. When the interaction is present, the gene expression level will be affected by the treatment disproportionally over different covariate levels. Based on the additive decomposition model for the treatment effects T ij = M i + γ ij , the testing of interaction effect is equivalent to the testing of the hypotheses H 20 : γ ij = 0, for all i, j versus H 2a : γ ij ≠ 0 for some i, j. As we have emphasized above, the RT method does not yield valid statistics for interaction effects (see [22–25]). Instead we employed the aligned rank transform test (ART) to test for the above hypotheses . ART test consists of performing the analysis of variance test on the ranked residuals of the aligned observations. Although both utilize the ANOVA formula, the ART method differs from the RT method as it is based on residuals after the alignment. In contrast to RT, ART is a valid test for interaction regardless of the presence of main effects .
If there are no interaction effects, we can consider a simpler model: X kijn = θ k + M i + C j + ε kijn , with M i denoting the treatment main effect and C j denoting the covariate effect. Testing for the treatment main effect corresponds to the hypotheses: H 30 : M i = 0 for all i against H 3a : M i ≠ 0 for some i. We propose to employ the rank transform statistic suggested in . It is worthy to point out that the testing of main effects in the absence of interaction is one of very few situations that naive application of the ANOVA formula on rank scores can yield valid statistic with satisfactory power properties.
In data analysis, the three testing procedures discussed above are connected. The following empirical rule regards how to proceed to choose the tests in a real data analysis. As the first step of analysis, the hypothesis of treatment effects (H 10) is usually tested to see if the gene is differentially expressed across treatment groups. If H 10 is accepted, no more actions will be taken as no differential expression is detected. If H 10 is rejected, we may further perform the test for interaction effects (H 20) to see if the differential expression is partly due to the interactions between treatment groups and covariate groups. The acceptance of H 20 implies there exist no interaction effects. Then the testing for main effects (H 30) can be pursued on the basis that the interaction effects are found insignificant.
Simulation studies were conducted to evaluate the performances of the proposed rank methods in comparison with the other two competing methods, the parametric F test (FT) and the permutation F test (PFT) that uses the F statistic but computes p-values through permutations. The criterion used in the comparison is the efficiency gain relative to the FT method, defined as
where T can be either the MRT, ART, RT, or PFT. Obviously, when the test T outperforms the FT, the EG will be positive; otherwise, the EG is negative.
The performances of these methods were evaluated under different noise distributions and different numbers of replications. We considered a replicated factorial array experiment involving two treatment groups, two levels of a clinical covariate and varying cell sample sizes. Average type I error rates and power were calculated from 1,000 simulation runs. From the literature it has been shown that normal, uniform, log-normal, Cauchy and normal mixture distributions, among others, are commonly seen for microarray expression data . In our simulation, we considered normal N(0, 1), uniform U(-2, 2), log-normal LN(0,1), Cauchy C(0.5) and contaminated normal CN(0.75, 0.5, 2) = 0.75N(0, 0.5) + 0.25N(0, 2). To some extent, contaminated normal can be used to model data with sample contamination, with one normal component representing the true underlying mRNA population of interest and the other normal behaving as the mRNA population from the contamination source. It is recognized that this normal mixture model may not be able to describe more irregular and dramatic data contamination such as distorted array image or scratched array regions. Fortunately the proposed nonparametric method does not rely on the correct characterization of the underlying distribution. This set of distributions were selected mainly for comparison purpose and they represent a broad range of characteristics from light-tailed to heavy-tailed, and from symmetric to asymmetric distributions.
We first evaluated the performance of the proposed MRT statistic for the testing of the treatment effects. We set the clinical effects as C 1 = -0.5, and C 2 = 0.5. Under the alternative situation, we set the treatment effects as T 11 = 0.7, and T 12 = 0.7, T 21 = -0.9, T 22 = -0.5, which were induced by the main effects M 1 = 0.6, M 2 = -0.6, and the interaction effects γ 11 = 0.1, γ 12 = 0.1, γ 21 = -0.3, γ 22 = 0.1.
Type I error rates and power for treatments effects. The type I error rates and power were evaluated under five different error distributions – normal, uniform, lognormal, contaminated normal and Cauchy. The values inside and outside parenthesis are type I error rates and power, respectively. The EG(PFT) and EG(MRT) denote the efficiency gain of PFT and MRT versus FT.
With regard to the power, the results are distribution dependent. For the medium or light-tailed distributions (normal and uniform), FT and PFT have similar performances and both of them achieve higher power than MRT test. In contrast, for the other distributions with heavy-tails, skewedness and contamination, MRT appears superior to the two competing methods. When the sample size N is 10, the MRT's efficiency gain, EG(MRT), is 54.3%, 15.6%, and 227.4% under log-normal, contaminated normal and Cauchy respectively. On the other hand, with the same sample size, the efficiency loss of the MRT is approximately 1.0% and 5.8% under normal and uniform. Compared to the amount of efficiency gain for the MRT versus the FT, the amount of efficiency loss seems to be marginal. Similar conclusions can be drawn when the sample size is 5. That is, the MRT's efficiency loss is approximately 5.6% for normal and 15.5% for uniform; the MRT's efficiency gain is 52.1%, 30.3%, and 145.2% for log-normal, contaminated normal and Cauchy respectively.
Type I error rates and power in the presence of ties. The type I error rates and power were evaluated under different error distributions and varying number of ties. The values inside and outside parenthesis are type I error rates and power, respectively. The cell sample size N = 5.
# of ties
Type I error rates and power of different tests for interaction effects. The type I error rates and power of the three different tests for interaction effects were evaluated under different error distributions. The values inside and outside parenthesis are type I error rates and power, respectively.
The above discussion focuses on the single gene analysis. However, in microarray analysis the subsequent analysis step typically involves either adjusting the significance for multiple testing [33, 34], or ranking genes according to the significance level such that the most relevant top k genes could be selected. Although discussing these global analysis approaches is beyond the scope of this paper, we are fully aware that the capability of a testing procedure to generate extreme p-values has a direct influence on the selection of the most relevant genes. When the Bonferroni procedure is employed to deal with the multiplicity, the Wilcoxon rank sum test is more conservative and less powerful than the Fisher-Pitman test or the parametric t-test . It was further demonstrated that the discreteness of the exact permutation distribution of the Wilcoxon test is responsible for the conservatism . Because of this, the Baumgartner-Weiß-Schindler test is recommended, as its exact permutation distribution has more non-zero mass probabilities and capable of generating richer small p-values than the Wilcoxon test. It is worthy pointing out that as the Bonferroni procedure is almost always more conservative than other multiple testing procedures, it will suffer most from the discreteness problem of the permutation distribution. Other multiple testing procedures impose less stringent p-value thresholds, therefore they are affected by the discreteness problem to a lesser extent.
If more exploratory research can be afforded to look for other genes, it is suggested to investigate the genes identified exclusively by the MRT (not by either the FT or the PFT). This extra list might provide a potential list of candidate genes that did not pass the two F-tests due to non-normal distributed noise in the data. To scrutinize this list, we also investigated the functions of 19 genes remaining in the list. The information regarding these 19 genes' functions is also available from the website as above. Among these 19 genes, 11 genes share similar functions as protein binding, transfer activity, signal pathway, receptor activities, mitochondrial electron transport chain, catalytic activities and kinase activities. It remains inconclusive if the other 8 genes can be supported as true positives due to the lack of known biological evidence.
As selecting the top listed genes only provides the set of most favorable candidates, no probabilistic statement can be attached to the findings. Alternatively, we can assess the significance of the findings under the multiple testing framework. Instead of using the stringent Bonferroni procedure, we applied the Benjamini and Hochberg's linear step-up procedure to control false discovery rate (FDR) . As this procedure selected genes based on the ordered p-values, the significant genes were chosen consecutively down the top gene lists. By Controlling the FDR at level 0.05, the parametric F-test found 13 significant genes, 8 of which were found by either the permutation F-test or the nonparametric rank test. Given that there are only two replicates in the data set, it is not surprising that permutation-based methods identified a smaller number of significant genes, due to the discreteness of the permutation distribution discussed above.
Because there is a loss of information whenever the original data is collapsed to ranked data, the abandonment of parametric methods may not be cost-effective in all settings. In this article we have thoroughly investigated the positives and negatives of the proposed nonparametric rank tests versus the parametric ANOVA tests: (1) Due to the information loss, the rank tests are marginally less powerful than the ANOVA tests for normal, uniform or other light-tailed distributions. On the other hand, our simulation illustrated that the rank tests are substantially more powerful than the ANOVA tests if the data follow heavy-tailed, skewed or asymmetric distributions. (2) Our investigation also demonstrated that reasonable number of replicates (N ≥ 5 for 2 × 2 design) are required to lessen the discreteness of permutation distribution encountered by the rank tests to evaluate p-values. In contrast, when the normality assumption is validated, the p-value of the parametric ANOVA statistic can be evaluated from the exact F distribution. (3) In the presence of severe outliers, the robust rank tests is more favorable than the parameter ANOVA tests. (4) When it is difficult to characterize the distribution of the data, the proposed distribution-free rank tests are useful to conduct an appropriate and powerful analysis.
As the comparative properties of rank tests relative to ANOVA tests are distribution dependent, distribution diagnostics can help the practitioners to determine which test will yield better power for a specific data set. Graphic inspections such as box-plot and normal probability plot offer a convenient way to visualize the shape of the underlying distribution. To quantify the magnitude of the deviation from normality, the Shapiro-Wilk test can be performed . Let x ,...,x [N] be the ordered values of N independent and identically distributed observations. Let z ,...,z [N] denote the vector of the associated quantiles of the standard normal distribution. The Shapiro-Wilk statistic is defined as the squared correlation between the ordered data values (sample quantiles) and the normal quantiles:
With regard to future extensions of the proposed methods, the tests discussed above can be applied to a high-way layout by collapsing these covariates into one. For example, a covariate with J levels and another covariate with K levels can be combined as a single factor of JK levels, so that the treatment effects can still be tested using the above two-way layout. When the data contains continuous covariates in certain applications, one can simply apply the proposed rank test method on the basis of residuals, the differences between the observations and the least squares fitted values calculated by using all the continuous covariates. Furthermore, it is possible to extend our methods to accommodate the dependence or heteroscedasticity which might occur in the microarray data sets. If the variances vary across different covariate groups, j = 1,..., J, the MRT statistic can still be employed to test for treatment effects using the standardized overall rank Z ijn . To deal with two-way models with repeated measures on one factor or on both factors, the rank statistic can be extended to a quadratic form incorporating an estimated covariance matrix reflecting the dependence structure in the data .
In this article, we have focused on the interaction effects between multiple attributing factors to the gene expression. Currently there has been an increasing interest in studying interactions between genes as opposed to clinical factors. To address this problem, we could select a number of genes and treat their expressions measurements as explanatory variables. The biological phenotype of interest can be chosen as the response variable. Then a linear model can be fitted linking the gene expressions and biological phenotype. The aforementioned interaction tests can be applied to this setting to investigate the possible interactions among the genes.
We have presented a set of nonparametric tests to detect treatment effects, clinical covariate effects, and interaction effects for multifactorial microarray data. These methods can be extended to accommodate high-way layouts, continuous covariates, dependent observations and heteroscedasticity which might occur in the microarray data sets. The proposed nonparametric procedures will prove to be of wide use in microarray data analysis as they can accommodate various noise distributions across genome.
The first hypothesis H 01 is formulated to test for treatment effects in two-way layout. Correspondingly, we have proposed a modified rank transform (MRT) test. This test standardizes the rank scores before plugging them into the analysis of variance formula. For simplicity in notation, we suppress the index k, as all the observations in the model are from a specific gene k. Let R ijn denote the rank of X ijn among all of the observations and define . Let denote the sample variance of ranks within the j th column. Define the standardized rank score Z ijn = R ijn /s j . Denote the marginal and overall averages of the standardized rank scores by and . The proposed modified rank transform statistic takes the following form:
It has been shown that the standardization procedure is essential for the validity of the MRT method as the nonlinear rank transformation introduces the heteroscedasticity into the ranked data . To assess the significance of the rank statistic, the permutation method will be invoked to provide p-values of the observed statistic. In implementation, we randomly relabel I treatment groups within each of J covariate levels. Namely, the set of observations X 1j1,..., X 1jN ,..., X Ij1,..., X IjN are shuffled within column j for 1 ≤ j ≤ J. For illustration purpose, consider a microarray data set with the covariate consisting of six different tissue regions and the treatment consisting of two distinct mouse strains. The six covariate levels correspond to the six tissue regions. To generate a permuted data set, for the 2N measurements obtained from the same tissue region, we randomly assign N of them to the first mouse strain and assign the remaining observations to the second mouse strain. Repeat this procedure until we have permuted for all the tissue regions to generate a new permuted data set. Then we calculate the proportion of the resulting statistic (3) being equal to or larger than the observed statistic over 10,000 permutations to obtain the permutation p-value.
The second hypothesis H 02 is formulated to test for interaction effects in two-way layout. To address this testing problem, the ART test is proposed to perform the analysis of variance test on the ranked residuals, of the aligned observations .
Here and are the Hodges-Lehmann estimates of the two main effects given by
Again, with low replicates, we propose to use the permutation method to compute p-values under the null H 20. In implementation, we randomly relabel both indices i and j within all the aligned observations and obtain the empirical p-value over 10,000 permutations.
The third hypothesis H 03 is formulated to test for main effects in the absence of interaction effects in a two-way layout. We propose to employ the rank transform statistic suggested in  which is formulated as follows:
The resulting RT statistic asymptotically follows a χ 2 distribution with I – 1 degrees of freedom. Likewise, we can test if there is a difference of gene expression among the clinical covariate levels for gene k, using a test statistic similar to (4), with only indexes I and J being swapped.
This work is supported by Natural Sciences and Engineering Research Council of Canada grants. We thank Hong Xu and Rui Liu for their assistance to the implementation of the algorithms. We are grateful to the Editor and two referees for their helpful comments that improved the manuscript.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.