Nonparametric tests for differential gene expression and interaction effects in multi-factorial microarray experiments
- Xin Gao^{1}Email author and
- Peter XK Song^{2}
DOI: 10.1186/1471-2105-6-186
© Gao and Song; licensee BioMed Central Ltd. 2005
Received: 19 April 2005
Accepted: 21 July 2005
Published: 21 July 2005
Abstract
Background
Numerous nonparametric approaches have been proposed in literature to detect differential gene expression in the setting of two user-defined groups. However, there is a lack of nonparametric procedures to analyze microarray data with multiple factors attributing to the gene expression. Furthermore, incorporating interaction effects in the analysis of microarray data has long been of great interest to biological scientists, little of which has been investigated in the nonparametric framework.
Results
In this paper, we propose a set of nonparametric tests to detect treatment effects, clinical covariate effects, and interaction effects for multifactorial microarray data. When the distribution of expression data is skewed or heavy-tailed, the rank tests are substantially more powerful than the competing parametric F tests. On the other hand, in the case of light or medium-tailed distributions, the rank tests appear to be marginally less powerful than the parametric competitors.
Conclusion
The proposed rank tests enable us to detect differential gene expression and establish interaction effects for microarray data with various non-normally distributed expression measurements across genome. In the presence of outliers, they are advantageous alternative approaches to the existing parametric F tests due to the robustness feature.
Background
High density oligonucleotide microarray, spotted cDNA array, or other array technologies have presented not only daunting amount of expression data for biologists to explore the inherent biological mechanisms, but also challenging statistical analysis problems. A replicated microarray experiment involves multiple arrays to compare gene expression profile under different conditions. However, normality assumption justifying parametric testing is often untenable in microarray studies [1, 2]. For instance, a set of 540 genes from a leukemia data set [3] were analyzed and various distributions for the different genes were found, in which only 13.3% genes have error distributions satisfying the normality assumption [4]. If the underlying distributions of expression measurements can be validated properly, model-based approaches such as likelihood or Bayesian inference can validly accept non-normally distributed data and gain satisfactory power to detect differentially expressed genes (e.g. [5–10]). For example, a hierarchical mixture-model has been proposed with parameterizations for Gamma or log-normally distributed measurements [10]. However when the distribution of the data is difficult to characterize, nonparametric inference makes less stringent distributional assumptions and thereby provide appropriate analysis.
Furthermore data contamination can arise in microarray setting due to different reasons. For instance, an image contamination can occur if a long scratch is present on the array image or a corner of the array is misaligned in the image processing stage. Sample contamination can occur if the mRNA sample is contaminated with other sources of RNA present in the laboratory environment. Such outliers are dramatically different from the majority of the observations and can greatly undermine the sensitivity of parametric approach. A method of assessing goodness of fit to a linear model has been used to automatically detect outliers that possess too large deviation from the overall pattern [11]. Alternatively, a quality index based on coefficient variation was adopted to filter out outlying values with poor quality [9]. Nevertheless the inspection process is time consuming for such large-scale expression data analysis [11]. In this context, nonparametric inference is advantageous as it is insensitive to the presence of outliers. Even without the step of outlier filtering, the validity and power of the nonparametric procedures would be minimally affected.
The development of both parametric and nonparametric methods to address the two condition problem in microarray setting has recently received much attention. Most of the parametric tests employed t or t-like statistics and differ primarily in the estimation of variance [12]. In contrast to these methods which treat the genes as separate fixed effects, the two-group Bayes method was proposed to treat the genes as arising from a certain population. Thus the dimensionality of the inference problem was reduced by sharing information across the array [5, 9]. Nonparametric approaches have also been proposed for two-user defined groups [12, 13]. The Wilcoxon rank sum test was considered in [14, 15] to identify differentially expressed genes in comparison with the Fisher-Pitman permutation test, which is also referred as the nonparametric t test [15]. Recently, the Baumgartner-Weiß-Schindler test has been recommended to detect differentially expressed genes in two groups, which was shown to be less conservative and more powerful than the Wilcoxon rank sum test [16].
However, a microarray experiment often has more complicated design than that of two user-defined groups. Besides the treatment effects of interest, there may exist some clinical covariates such as age, gender and certain clinical symptoms, which also influence the gene expression level. For such experiments, a factorial design model is useful to account for the multiple sources of variation. Townsend and Hartl [6] derived a Bayesian model that has been widely used for the estimation of gene expression levels in multifactorial experiments [7, 17]. This model has been extended [8] to accommodate not only additive error terms but also multiplicative error terms to resolve small yet statistically significant differences in gene expression. Alternatively, an overall ANOVA model has also been widely used that simultaneously considers all the genes on the arrays and incorporates array effect and dye effect [18]. A gene specific ANOVA model under the normality assumption was considered in [19]. A mixed linear model was proposed to assess gene significance in which both fixed treatment effects and random array effects were assumed [20]. Unfortunately, there has been no nonparametric procedure proposed up to date to analyze multifactorial microarray data. In addition, the establishment of interaction effect between the multiple attributing factors can help elucidate certain biological mechanisms related to the regulation of gene expression. Thus it is desirable to develop a set of nonparametric procedures to detect differential gene expression and establish interaction effects for multifactorial microarray data.
Results
Principle of the method
To account for the multiple sources of variation attributing to the gene expression, we consider the following model for each specific gene:
X_{ kijn }= θ_{ k }+ T_{ ij }+ C_{ j }+ ε_{ kijn }, k = 1,..., K; i = 1,..., I; j = 1,..., J; n = 1,..., N (1)
with ∑_{ j }C_{ j }= 0, and ∑_{i, j}T_{ ij }= 0, where k indexes for the gene number, i indexes for the treatment group, j indexes for the covariate group, n indexes for the replicate number. In the equation, X_{ kijn }represents the expression measurement, θ_{ k }represents the k^{ th }gene specific mean, T_{ ij }represents the effect of the i^{ th }treatment group (for instance, drug treatments, tissue types, and strains of mice) through its main effect and interaction effect with the j^{ th }level of the clinical covariate, and C_{ j }represents the effect of the j^{ th }level of the clinical covariate. The error terms ε_{ kijn }are independently and identically distributed random noise from a continuous distribution function F_{ k }.
To further discern the interaction effect, the treatment effect T_{ ij }, can be decomposed into T_{ ij }= M_{ i }+ γ_{ ij }with ∑_{ i }M_{ i }= 0 and ∑_{i, j}γ_{ ij }= 0, where M_{ i }denotes the main effect of the treatment group and γ_{ ij }denotes the interaction effect. Interaction effects are often of biological interest when the treatment effects are heterogeneous across the levels of the clinical covariate. For example, consider a data set with mouse strains as treatment groups and tissue types as covariate groups, the interaction effects arise when the effects of different mouse strains are disproportional over different tissue types. It is worth noting that model (1) is related to ANOVA models proposed by other researchers [18–20]. The difference between our factorial model (1) and the existing ANOVA models is two-fold: model (1) accommodates multifactor effects on each specific gene, and it does not make normality assumption on the error terms ε_{ ijkn }.
To develop nonparametric rank tests for multifactorial microarray experiments, it is natural to consider rank procedure which can be viewed as nonparametric analogue of the parametric analysis of variance approach. The popular rank transform (RT) method consists of replacing the observations by their ranks in the combined sample and performing one of the standard analysis of variance tests on these ranks [21]. However in the general multifactorial model, the RT method is not valid for most of the common hypotheses due to the nonlinear nature of the rank transformation. For example in the presence of interaction effect, the naive application of ranks into ANOVA formula cannot be used to detect for main effect nor for the interaction effect. Theoretical validations of these limitations of RT method have been thoroughly discussed by Brunner and Neumann, Akritas, and Wilcox, among many others [22–28]. Since the RT method can be easily accomplished by using standard computer packages, extra caution needs to be exerted to prevent the inappropriate extensions of RT method for microarray data analysis under the multifactorial model. In the following, we shall present rank procedures which are similar to RT methods in the sense that they also resemble the analysis of variance approach, however they incorporate more rigorous treatment on the data rather than just replacing the actual observations by the overall rankings.
Convergence of type I error rates of the MRT test based on chi-squared approximation. The type I error rates of the MRT test based on chi-squared approximation were evaluated under varying sample sizes. A 2 × 2 design and a 3 × 4 design were considered.
Dist | Design | N = 5 | N = 15 | N = 20 |
---|---|---|---|---|
N | 2 × 2 | 0.077 | 0.047 | 0.052 |
3 × 4 | 0.075 | 0.055 | 0.053 | |
U | 2 × 2 | 0.075 | 0.046 | 0.051 |
3 × 4 | 0.083 | 0.055 | 0.053 | |
LN | 2 × 2 | 0.074 | 0.060 | 0.055 |
3 × 4 | 0.078 | 0.061 | 0.056 | |
CN | 2 × 2 | 0.081 | 0.053 | 0.051 |
3 × 4 | 0.072 | 0.054 | 0.053 | |
C | 2 × 2 | 0.072 | 0.046 | 0.052 |
3 × 4 | 0.068 | 0.061 | 0.055 |
In practice, ties are commonly encountered in microarray data due to rounding and data modification [16]. In the presence of ties, we adopted the method of mid-ranks which assigns each tied individual the average of the tied ranks. There are other methods of dealing with ties such as the methods of randomization and the average statistics. However it has been shown that the randomization method is less powerful under the alternatives due to the supplementary random effects introduced by the randomization. In addition, the method of average statistics typically leads to a conservative test that has a lower significance level than the nominal one [29]. Thus the method of mid-ranks is most frequently used compared to other method to handle ties. As little is known about the small-sample performance of MRT using mid-ranks, it is of interest to conduct simulation studies to investigate this aspect. The related result is provided in the subsequent section.
An important aspect related to multifactorial design is to address treatment-covariate interaction effects. When the interaction is present, the gene expression level will be affected by the treatment disproportionally over different covariate levels. Based on the additive decomposition model for the treatment effects T_{ ij }= M_{ i }+ γ_{ ij }, the testing of interaction effect is equivalent to the testing of the hypotheses H_{20} : γ_{ ij }= 0, for all i, j versus H_{2a}: γ_{ ij }≠ 0 for some i, j. As we have emphasized above, the RT method does not yield valid statistics for interaction effects (see [22–25]). Instead we employed the aligned rank transform test (ART) to test for the above hypotheses [30]. ART test consists of performing the analysis of variance test on the ranked residuals of the aligned observations. Although both utilize the ANOVA formula, the ART method differs from the RT method as it is based on residuals after the alignment. In contrast to RT, ART is a valid test for interaction regardless of the presence of main effects [31].
If there are no interaction effects, we can consider a simpler model: X_{ kijn }= θ_{ k }+ M_{ i }+ C_{ j }+ ε_{ kijn }, with M_{ i }denoting the treatment main effect and C_{ j }denoting the covariate effect. Testing for the treatment main effect corresponds to the hypotheses: H_{30} : M_{ i }= 0 for all i against H_{3a}: M_{ i }≠ 0 for some i. We propose to employ the rank transform statistic suggested in [21]. It is worthy to point out that the testing of main effects in the absence of interaction is one of very few situations that naive application of the ANOVA formula on rank scores can yield valid statistic with satisfactory power properties.
In data analysis, the three testing procedures discussed above are connected. The following empirical rule regards how to proceed to choose the tests in a real data analysis. As the first step of analysis, the hypothesis of treatment effects (H_{10}) is usually tested to see if the gene is differentially expressed across treatment groups. If H_{10} is accepted, no more actions will be taken as no differential expression is detected. If H_{10} is rejected, we may further perform the test for interaction effects (H_{20}) to see if the differential expression is partly due to the interactions between treatment groups and covariate groups. The acceptance of H_{20} implies there exist no interaction effects. Then the testing for main effects (H_{30}) can be pursued on the basis that the interaction effects are found insignificant.
Single gene analysis
Simulation studies were conducted to evaluate the performances of the proposed rank methods in comparison with the other two competing methods, the parametric F test (FT) and the permutation F test (PFT) that uses the F statistic but computes p-values through permutations. The criterion used in the comparison is the efficiency gain relative to the FT method, defined as
where T can be either the MRT, ART, RT, or PFT. Obviously, when the test T outperforms the FT, the EG will be positive; otherwise, the EG is negative.
The performances of these methods were evaluated under different noise distributions and different numbers of replications. We considered a replicated factorial array experiment involving two treatment groups, two levels of a clinical covariate and varying cell sample sizes. Average type I error rates and power were calculated from 1,000 simulation runs. From the literature it has been shown that normal, uniform, log-normal, Cauchy and normal mixture distributions, among others, are commonly seen for microarray expression data [4]. In our simulation, we considered normal N(0, 1), uniform U(-2, 2), log-normal LN(0,1), Cauchy C(0.5) and contaminated normal CN(0.75, 0.5, 2) = 0.75N(0, 0.5) + 0.25N(0, 2). To some extent, contaminated normal can be used to model data with sample contamination, with one normal component representing the true underlying mRNA population of interest and the other normal behaving as the mRNA population from the contamination source. It is recognized that this normal mixture model may not be able to describe more irregular and dramatic data contamination such as distorted array image or scratched array regions. Fortunately the proposed nonparametric method does not rely on the correct characterization of the underlying distribution. This set of distributions were selected mainly for comparison purpose and they represent a broad range of characteristics from light-tailed to heavy-tailed, and from symmetric to asymmetric distributions.
We first evaluated the performance of the proposed MRT statistic for the testing of the treatment effects. We set the clinical effects as C_{1} = -0.5, and C_{2} = 0.5. Under the alternative situation, we set the treatment effects as T_{11} = 0.7, and T_{12} = 0.7, T_{21} = -0.9, T_{22} = -0.5, which were induced by the main effects M_{1} = 0.6, M_{2} = -0.6, and the interaction effects γ_{11} = 0.1, γ_{12} = 0.1, γ_{21} = -0.3, γ_{22} = 0.1.
Type I error rates and power for treatments effects. The type I error rates and power were evaluated under five different error distributions – normal, uniform, lognormal, contaminated normal and Cauchy. The values inside and outside parenthesis are type I error rates and power, respectively. The EG(PFT) and EG(MRT) denote the efficiency gain of PFT and MRT versus FT.
Dist | N | FT | PFT | MRT | EG (PFT) | EG (MRT) |
---|---|---|---|---|---|---|
N | 5 | 0.732 (0.050) | 0.732 (0.050) | 0.691 (0.054) | 0.000 | -0.056 |
10 | 0.976 (0.050) | 0.976 (0.050) | 0.966 (0.050) | 0.000 | -0.010 | |
U | 5 | 0.575 (0.051) | 0.565 (0.047) | 0.486 (0.048) | -0.017 | -0.155 |
10 | 0.930 (0.054) | 0.930 (0.054) | 0.876 (0.050) | 0.000 | -0.058 | |
LN | 5 | 0.386 (0.029) | 0.461 (0.047) | 0.587 (0.048) | 0.194 | 0.521 |
10 | 0.576 (0.036) | 0.620 (0.052) | 0.889 (0.052) | 0.076 | 0.543 | |
CN | 5 | 0.565 (0.039) | 0.598 (0.058) | 0.736 (0.055) | 0.058 | 0.303 |
10 | 0.829 (0.047) | 0.838 (0.056) | 0.958 (0.052) | 0.010 | 0.156 | |
C | 5 | 0.230 (0.021) | 0.366 (0.054) | 0.564 (0.049) | 0.591 | 1.452 |
10 | 0.259 (0.016) | 0.402 (0.052) | 0.848 (0.055) | 0.552 | 2.274 |
With regard to the power, the results are distribution dependent. For the medium or light-tailed distributions (normal and uniform), FT and PFT have similar performances and both of them achieve higher power than MRT test. In contrast, for the other distributions with heavy-tails, skewedness and contamination, MRT appears superior to the two competing methods. When the sample size N is 10, the MRT's efficiency gain, EG(MRT), is 54.3%, 15.6%, and 227.4% under log-normal, contaminated normal and Cauchy respectively. On the other hand, with the same sample size, the efficiency loss of the MRT is approximately 1.0% and 5.8% under normal and uniform. Compared to the amount of efficiency gain for the MRT versus the FT, the amount of efficiency loss seems to be marginal. Similar conclusions can be drawn when the sample size is 5. That is, the MRT's efficiency loss is approximately 5.6% for normal and 15.5% for uniform; the MRT's efficiency gain is 52.1%, 30.3%, and 145.2% for log-normal, contaminated normal and Cauchy respectively.
Type I error rates and power in the presence of ties. The type I error rates and power were evaluated under different error distributions and varying number of ties. The values inside and outside parenthesis are type I error rates and power, respectively. The cell sample size N = 5.
Dist | # of ties | FT | PFT | MRT |
---|---|---|---|---|
N | 2 | 0.729 (0.051) | 0.726 (0.051) | 0.684 (0.052) |
5 | 0.730 (0.056) | 0.730 (0.054) | 0.682 (0.052) | |
10 | 0.724 (0.053) | 0.715 (0.051) | 0.687 (0.051) | |
U | 2 | 0.589 (0.054) | 0.577 (0.052) | 0.495 (0.052) |
5 | 0.580 (0.050) | 0.568 (0.046) | 0.493 (0.051) | |
10 | 0.554 (0.048) | 0.539 (0.046) | 0.482 (0.048) | |
LN | 2 | 0.391 (0.032) | 0.455 (0.041) | 0.581 (0.048) |
5 | 0.411 (0.039) | 0.465 (0.054) | 0.585 (0.056) | |
10 | 0.412 (0.031) | 0.448 (0.043) | 0.589 (0.045) | |
CN | 2 | 0.563 (0.038) | 0.594 (0.051) | 0.730 (0.053) |
5 | 0.575 (0.040) | 0.599 (0.048) | 0.735 (0.056) | |
10 | 0.597 (0.044) | 0.613 (0.054) | 0.741 (0.057) | |
C | 2 | 0.244 (0.023) | 0.397 (0.057) | 0.533 (0.052) |
5 | 0.262 (0.022) | 0.370 (0.050) | 0.562 (0.049) | |
10 | 0.292 (0.030) | 0.368 (0.048) | 0.558 (0.055) |
Type I error rates and power of different tests for interaction effects. The type I error rates and power of the three different tests for interaction effects were evaluated under different error distributions. The values inside and outside parenthesis are type I error rates and power, respectively.
Dist | N | FT | PFT | ART | EG (PFT) | EG (ART) |
---|---|---|---|---|---|---|
N | 5 | 0.711 (0.055) | 0.697 (0.053) | 0.698 (0.048) | -0.020 | -0.018 |
10 | 0.952 (0.051) | 0.951 (0.051) | 0.947 (0.051) | -0.001 | -0.005 | |
U | 5 | 0.565 (0.049) | 0.548 (0.045) | 0.520 (0.050) | -0.030 | -0.080 |
10 | 0.891 (0.047) | 0.891 (0.048) | 0.833 (0.047) | 0.000 | -0.065 | |
LN | 5 | 0.405 (0.031) | 0.430 (0.040) | 0.579 (0.048) | 0.062 | 0.430 |
10 | 0.576 (0.038) | 0.593 (0.045) | 0.893 (0.050) | 0.030 | 0.550 | |
CN | 5 | 0.525 (0.035) | 0.540 (0.044) | 0.718 (0.045) | 0.029 | 0.368 |
10 | 0.812 (0.043) | 0.817 (0.050) | 0.969 (0.051) | 0.006 | 0.193 | |
C | 5 | 0.239 (0.014) | 0.336 (0.029) | 0.523 (0.044) | 0.406 | 1.188 |
10 | 0.274 (0.022) | 0.364 (0.056) | 0.859 (0.052) | 0.328 | 2.135 |
Global array analysis
The above discussion focuses on the single gene analysis. However, in microarray analysis the subsequent analysis step typically involves either adjusting the significance for multiple testing [33, 34], or ranking genes according to the significance level such that the most relevant top k genes could be selected. Although discussing these global analysis approaches is beyond the scope of this paper, we are fully aware that the capability of a testing procedure to generate extreme p-values has a direct influence on the selection of the most relevant genes. When the Bonferroni procedure is employed to deal with the multiplicity, the Wilcoxon rank sum test is more conservative and less powerful than the Fisher-Pitman test or the parametric t-test [15]. It was further demonstrated that the discreteness of the exact permutation distribution of the Wilcoxon test is responsible for the conservatism [16]. Because of this, the Baumgartner-Weiß-Schindler test is recommended, as its exact permutation distribution has more non-zero mass probabilities and capable of generating richer small p-values than the Wilcoxon test. It is worthy pointing out that as the Bonferroni procedure is almost always more conservative than other multiple testing procedures, it will suffer most from the discreteness problem of the permutation distribution. Other multiple testing procedures impose less stringent p-value thresholds, therefore they are affected by the discreteness problem to a lesser extent.
Biological data analysis
If more exploratory research can be afforded to look for other genes, it is suggested to investigate the genes identified exclusively by the MRT (not by either the FT or the PFT). This extra list might provide a potential list of candidate genes that did not pass the two F-tests due to non-normal distributed noise in the data. To scrutinize this list, we also investigated the functions of 19 genes remaining in the list. The information regarding these 19 genes' functions is also available from the website as above. Among these 19 genes, 11 genes share similar functions as protein binding, transfer activity, signal pathway, receptor activities, mitochondrial electron transport chain, catalytic activities and kinase activities. It remains inconclusive if the other 8 genes can be supported as true positives due to the lack of known biological evidence.
As selecting the top listed genes only provides the set of most favorable candidates, no probabilistic statement can be attached to the findings. Alternatively, we can assess the significance of the findings under the multiple testing framework. Instead of using the stringent Bonferroni procedure, we applied the Benjamini and Hochberg's linear step-up procedure to control false discovery rate (FDR) [38]. As this procedure selected genes based on the ordered p-values, the significant genes were chosen consecutively down the top gene lists. By Controlling the FDR at level 0.05, the parametric F-test found 13 significant genes, 8 of which were found by either the permutation F-test or the nonparametric rank test. Given that there are only two replicates in the data set, it is not surprising that permutation-based methods identified a smaller number of significant genes, due to the discreteness of the permutation distribution discussed above.
Discussion
Because there is a loss of information whenever the original data is collapsed to ranked data, the abandonment of parametric methods may not be cost-effective in all settings. In this article we have thoroughly investigated the positives and negatives of the proposed nonparametric rank tests versus the parametric ANOVA tests: (1) Due to the information loss, the rank tests are marginally less powerful than the ANOVA tests for normal, uniform or other light-tailed distributions. On the other hand, our simulation illustrated that the rank tests are substantially more powerful than the ANOVA tests if the data follow heavy-tailed, skewed or asymmetric distributions. (2) Our investigation also demonstrated that reasonable number of replicates (N ≥ 5 for 2 × 2 design) are required to lessen the discreteness of permutation distribution encountered by the rank tests to evaluate p-values. In contrast, when the normality assumption is validated, the p-value of the parametric ANOVA statistic can be evaluated from the exact F distribution. (3) In the presence of severe outliers, the robust rank tests is more favorable than the parameter ANOVA tests. (4) When it is difficult to characterize the distribution of the data, the proposed distribution-free rank tests are useful to conduct an appropriate and powerful analysis.
As the comparative properties of rank tests relative to ANOVA tests are distribution dependent, distribution diagnostics can help the practitioners to determine which test will yield better power for a specific data set. Graphic inspections such as box-plot and normal probability plot offer a convenient way to visualize the shape of the underlying distribution. To quantify the magnitude of the deviation from normality, the Shapiro-Wilk test can be performed [39]. Let x_{[1]},...,x_{[N]}be the ordered values of N independent and identically distributed observations. Let z_{[1]},...,z_{[N]}denote the vector of the associated quantiles of the standard normal distribution. The Shapiro-Wilk statistic is defined as the squared correlation between the ordered data values (sample quantiles) and the normal quantiles:
With regard to future extensions of the proposed methods, the tests discussed above can be applied to a high-way layout by collapsing these covariates into one. For example, a covariate with J levels and another covariate with K levels can be combined as a single factor of JK levels, so that the treatment effects can still be tested using the above two-way layout. When the data contains continuous covariates in certain applications, one can simply apply the proposed rank test method on the basis of residuals, the differences between the observations and the least squares fitted values calculated by using all the continuous covariates. Furthermore, it is possible to extend our methods to accommodate the dependence or heteroscedasticity which might occur in the microarray data sets. If the variances vary across different covariate groups, j = 1,..., J, the MRT statistic can still be employed to test for treatment effects using the standardized overall rank Z_{ ijn }[25]. To deal with two-way models with repeated measures on one factor or on both factors, the rank statistic can be extended to a quadratic form incorporating an estimated covariance matrix reflecting the dependence structure in the data [40].
In this article, we have focused on the interaction effects between multiple attributing factors to the gene expression. Currently there has been an increasing interest in studying interactions between genes as opposed to clinical factors. To address this problem, we could select a number of genes and treat their expressions measurements as explanatory variables. The biological phenotype of interest can be chosen as the response variable. Then a linear model can be fitted linking the gene expressions and biological phenotype. The aforementioned interaction tests can be applied to this setting to investigate the possible interactions among the genes.
Conclusion
We have presented a set of nonparametric tests to detect treatment effects, clinical covariate effects, and interaction effects for multifactorial microarray data. These methods can be extended to accommodate high-way layouts, continuous covariates, dependent observations and heteroscedasticity which might occur in the microarray data sets. The proposed nonparametric procedures will prove to be of wide use in microarray data analysis as they can accommodate various noise distributions across genome.
Methods
Rank test for treatment effects
The first hypothesis H_{01} is formulated to test for treatment effects in two-way layout. Correspondingly, we have proposed a modified rank transform (MRT) test. This test standardizes the rank scores before plugging them into the analysis of variance formula. For simplicity in notation, we suppress the index k, as all the observations in the model are from a specific gene k. Let R_{ ijn }denote the rank of X_{ ijn }among all of the observations and define . Let denote the sample variance of ranks within the j^{ th }column. Define the standardized rank score Z_{ ijn }= R_{ ijn }/s_{ j }. Denote the marginal and overall averages of the standardized rank scores by and . The proposed modified rank transform statistic takes the following form:
It has been shown that the standardization procedure is essential for the validity of the MRT method as the nonlinear rank transformation introduces the heteroscedasticity into the ranked data [25]. To assess the significance of the rank statistic, the permutation method will be invoked to provide p-values of the observed statistic. In implementation, we randomly relabel I treatment groups within each of J covariate levels. Namely, the set of observations X_{1j 1},..., X_{1jN},..., X_{Ij 1},..., X_{ IjN }are shuffled within column j for 1 ≤ j ≤ J. For illustration purpose, consider a microarray data set with the covariate consisting of six different tissue regions and the treatment consisting of two distinct mouse strains. The six covariate levels correspond to the six tissue regions. To generate a permuted data set, for the 2N measurements obtained from the same tissue region, we randomly assign N of them to the first mouse strain and assign the remaining observations to the second mouse strain. Repeat this procedure until we have permuted for all the tissue regions to generate a new permuted data set. Then we calculate the proportion of the resulting statistic (3) being equal to or larger than the observed statistic over 10,000 permutations to obtain the permutation p-value.
Rank test for interaction effects
The second hypothesis H_{02} is formulated to test for interaction effects in two-way layout. To address this testing problem, the ART test is proposed to perform the analysis of variance test on the ranked residuals, of the aligned observations .
Again, with low replicates, we propose to use the permutation method to compute p-values under the null H_{20}. In implementation, we randomly relabel both indices i and j within all the aligned observations and obtain the empirical p-value over 10,000 permutations.
Rank test for main effects
The third hypothesis H_{03} is formulated to test for main effects in the absence of interaction effects in a two-way layout. We propose to employ the rank transform statistic suggested in [21] which is formulated as follows:
The resulting RT statistic asymptotically follows a χ^{2} distribution with I – 1 degrees of freedom. Likewise, we can test if there is a difference of gene expression among the clinical covariate levels for gene k, using a test statistic similar to (4), with only indexes I and J being swapped.
Declarations
Acknowledgements
This work is supported by Natural Sciences and Engineering Research Council of Canada grants. We thank Hong Xu and Rui Liu for their assistance to the implementation of the algorithms. We are grateful to the Editor and two referees for their helpful comments that improved the manuscript.
Authors’ Affiliations
References
- Hunter L, Taylor RC, Leach SM, Simon R: GEST: a gene expression search tool based on a novel Bayesian similarity metric. Bioinformatics 2001, 17(Suppl 1):S115-S122.View ArticlePubMed
- Zhao Y, Pan W: Modified nonparametric approaches to detecting differentially expressed genes in replicated microarray experiments. Bioinformatics 2003, 19: 1046–1054. 10.1093/bioinformatics/btf879View ArticlePubMed
- Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov CollerHJP, Loh ML, Downing JR, Caligiuri MA: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286: 531–537. 10.1126/science.286.5439.531View ArticlePubMed
- Carey, 2004 HowTo Use the Bioconductor edd package[http://www.bioconductor.org/]
- Newton MA, Kendziorski CM, Richmond CS, Blattne rFR, Tsui KW: On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. Journal of computational biology 2001, 8: 37–52. 10.1089/106652701300099074View ArticlePubMed
- Townsend JP, Hartl DL: Bayesian analysis of gene expression levels: statistical quantification of relative mRNA level across multiple strains or treatments. Genome Biology 2002, 3: 1–71. 10.1186/gb-2002-3-12-research0071View Article
- Townsend JP: Multifactorial experimental design and the transitivity of ratios with spotted DNA microarrays. BMC Genomics 2003, 4: 41. 10.1186/1471-2164-4-41PubMed CentralView ArticlePubMed
- Townsend JP: Resolution of large and small differences in gene expression using models for the Bayesian analysis of gene expression levels and spotted DNA microarrys. BMC Bioinformatics 2004, 5: 54. 10.1186/1471-2105-5-54PubMed CentralView ArticlePubMed
- Tseng GC, Oh MK, Rohlin L, Liao JC, Wong WH: Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Research 2001, 29: 2549–2557. 10.1093/nar/29.12.2549PubMed CentralView ArticlePubMed
- Kendziorski CM, Newton MA, Lan L, Gould MN: On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine 2003, 22: 3899–3914. 10.1002/sim.1548View ArticlePubMed
- Li C, Wong WH: Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proc Natl Acad Sci USA 2001, 98: 31–36. 10.1073/pnas.011404098PubMed CentralView ArticlePubMed
- Dudoit S, Yang YH, Speed TP, Gallow MJ: Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica 2002, 12: 111–139.
- Park PJ, Pagano M, Bonetti M: A nonparametric scoring algorithm for identifying informative genes from microarray data. Pac Symp Biocomput 2001, 52–63.
- Wu TD: Analysis gene expression data from DNA microarrays to identify candidate genes. Journal of Pathology 2001, 195: 53–65. 10.1002/1096-9896(200109)195:1<53::AID-PATH891>3.0.CO;2-HView ArticlePubMed
- Troyanskaya OG, Barber ME, Brown PO, Botstein D, Altman RB: Nonparametric methods for identifying differentially expressed genes in microarray data. Bioinformatics 2002, 18: 1454–1461. 10.1093/bioinformatics/18.11.1454View ArticlePubMed
- Neuhäuser M, Senske R: The Baumgartner-Weiß-Schindler test for the detection of differentially expressed genes in replicated microarray experiments. Bioinformatics 2004, 20: 3553–3564.View ArticlePubMed
- Ranz JM, Castillo-Davis CI, Meiklejohn CD, Hartl DL: Sex-dependent gene expression and evolution of the Drosophila transcription. Science 2003, 300: 1742–1745. 10.1126/science.1085881View ArticlePubMed
- Kerr MK, Martin M, Churchill GA: Analysis of variance for gene expression microarray data. J Comput Biol 2000, 7: 819–837. 10.1089/10665270050514954View ArticlePubMed
- Pavlidis P, Noble WS: Analysis of strain and regional variation of gene expression in mouse brain. Genome Biology 2001, 2: 0042.1–0042.15. 10.1186/gb-2001-2-10-research0042View Article
- Wolfinger RD, Gibson G, Wolfinger ED, Bennett L, Hamadeh H, Bushel P, Afshari C, Paules RS: Assessing Gene Significance from cDNA Microarray Expression Data via Mixed Models. Journal of Computational Biology 2001, 8: 625–637. 10.1089/106652701753307520View ArticlePubMed
- Conover WJ, Iman RL: On some alternative procedures using ranks for the analysis of experimental designs. Communications in Statistics 1976, A5: 1349–1368.View Article
- Brunner E, Neumann N: Rank tests for the 2 × 2 split plot design. Metrika 1984, 31: 233–243. 10.1007/BF01915206View Article
- Brunner E, Neumann N: Two-sample rank tests in general models. Biometrical Journal 1986, 28: 395–402.View Article
- Brunner E, Neumann N: Rank tests in 2 × 2 designs. Statistica Neerlandica 1986, 40: 251–271.View Article
- Akritas MG: The rank transform method in some two-factor designs. Journal of the American Statistical Association 1990, 85: 73–78.View Article
- Akritas MG: Limitations of the rank transform procedure: A study of repeated-measure designs, Part I. Journal of the American Statistical Association 1991, 86: 457–460.View Article
- Akritas MG: Limitations of the rank transform procedure: A study of repeated-measure designs, Part II. Statistics and Probability Letters 1993, 17: 149–156. 10.1016/0167-7152(93)90009-8View Article
- Wilcox RR: Applying Contemporary Statistical Techniques. Academic press/Elsevier; 2003.
- Hájek J, Sidák Z: Theory of rank tests. New York: Academic Press; 1967.
- Mansouri H, Chang GH: A comparative study of some rank tests for interaction. Statistica Sinica 1995, 19: 85–96.
- Mansouri H: Aligned rank transform tests in linear models. Journal of Statistical Planning and Inference 1999, 79: 141–155. 10.1016/S0378-3758(98)00229-8View Article
- Auxiliary Simulation Results[http://www.math.yorku.ca/~xingao/biosupport.html]
- Reiner A, Yekutieli D, Benjamini Y: Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics 2003, 19: 368–375. 10.1093/bioinformatics/btf877View ArticlePubMed
- Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci USA 2003, 100: 9440–9445. 10.1073/pnas.1530509100PubMed CentralView ArticlePubMed
- Sandberg R, Yasuda R, Pankratz DG, Carter TA, Del Rio JA, Wodicka L, Mayford M, Lockhart DJ, Barlow C: Regional and strain-specific gene expression mapping in the adult mouse brain. Proc Natl Acad Sci USA 2000, 97: 11038–11043. 10.1073/pnas.97.20.11038PubMed CentralView ArticlePubMed
- Pavlidis P: Using ANOVA for gene selection from microarray studies of the nervous system. Methods 2003, 31: 282–289. 10.1016/S1046-2023(03)00157-9View ArticlePubMed
- Liu G, Loraine AE, Shigeta R, Cline M, Cheng J, Valmeekam V, Sun S, Kulp D, Siani-Rose MA: NetAffx: Affymetrix probesets and annotations. Nucleic Acids Res 2003, 31: 82–86. 10.1093/nar/gkg121PubMed CentralView ArticlePubMed
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 1995, 57: 289–300.
- Shapiro SS, Wilk MB: An analysis of variance test for normality (complete samples. Biometrika 1965, 52: 591–611.View Article
- Thompson GL: A unified approach to rank tests for multivariate and repeated measures designs. Journal of the American Statistical Association 1991, 86: 410–419.View Article
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.