Leveraging two-way probe-level block design for identifying differential gene expression with high-density oligonucleotide arrays
© Barrera et al; licensee BioMed Central Ltd. 2004
Received: 23 December 2003
Accepted: 20 April 2004
Published: 20 April 2004
To identify differentially expressed genes across experimental conditions in oligonucleotide microarray experiments, existing statistical methods commonly use a summary of probe-level expression data for each probe set and compare replicates of these values across conditions using a form of the t-test or rank sum test. Here we propose the use of a statistical method that takes advantage of the built-in redundancy architecture of high-density oligonucleotide arrays.
We employ parametric and nonparametric variants of two-way analysis of variance (ANOVA) on probe-level data to account for probe-level variation, and use the false-discovery rate (FDR) to account for simultaneous testing on thousands of genes (the multiple testing problem). Using publicly available data sets, we systematically compared the performance of parametric two-way ANOVA and the nonparametric Mack-Skillings test to the t-test and Wilcoxon rank-sum test for detecting differentially expressed genes at varying levels of fold change, concentration, and sample size. Using receiver operating characteristic (ROC) curve comparisons, we observed that two-way methods with FDR control on sample sizes with 2–3 replicates exhibit the same high sensitivity and specificity as a t-test with FDR control on sample sizes with 6–9 replicates in detecting at least two-fold change.
Our results suggest that the two-way ANOVA methods using probe-level data are substantially more powerful tests for detecting differential gene expression than corresponding methods for probe-set level data.
Keywords: gene expression analysis; differential expression; high-density oligonucleotide array; ANOVA; FDR
The use of DNA microarrays for monitoring the expression levels of thousands of genes simultaneously has generated a stream of methodological and computational challenges. In particular, the reliable identification of differentially expressed genes across different tissues, time points or treatment conditions is the most common and central task in the majority of such experiments. This task has been cast as a multiple hypothesis-testing problem of the simultaneous test for each gene j of the null hypothesis of no change in expression level between two or more experimental conditions. Tackling this problem generally involves the following key steps: (1) computing a test statistic T_j for each gene j and determining the significance of each test statistic using parametric assumptions or by appropriate estimation of a null distribution, and (2) employing an appropriate multiple testing procedure to determine which hypotheses to reject while controlling an appropriate error rate [2, 3].
A slew of statistical models has been developed to overcome the limitations of the classical t-test, rank-sum methods, and other one-way ANOVA methods currently applied to detecting differential gene expression. Under non-normal situations the classical parametric t-test is too conservative and, like the Wilcoxon test with its lack of distributional assumptions, suffers from low power. Non-parametric variants of the t-test include the use of permuted data sets to estimate the null distribution of t-statistics for each gene. With a small number of replicates, this permutation approach suffers from coarse resolution, resulting in too few or too many genes called differentially expressed depending on the significance threshold. A mixture-modeling approach to calculate the distribution of t-statistic type scores has been proposed to overcome that limitation. This approach is similar in spirit to the significance analysis of microarrays (SAM), an increasingly popular method which also uses a t-statistic type score. SAM uses permutations of repeated measurements and then pools estimated null statistics for each gene to compute an overall error rate defined as the false discovery rate (FDR) for genes identified as differentially expressed [2, 7].
The false discovery rate (FDR) is the rate at which features called significant are truly null. Here, it is the expected proportion of genes erroneously identified as differentially expressed. The control of the FDR as a multiple testing procedure was proposed by Benjamini and Hochberg as a more powerful alternative to controlling the family-wise error rate (FWER) when considering multiple null hypotheses simultaneously . Control of the FDR implies control of the FWER when all the null hypotheses are true . Bonferroni type procedures which control FWER are considered too stringent because they control the probability of making any Type I error among the hypotheses under consideration, thus rejecting too few hypotheses when identifying differentially expressed genes . On the other hand, control of the FDR has been increasingly favored for high-throughput screenings such as microarray experiments, striking a balance between FWER control and the per-comparison-error-rate (PCER) control which often yields too many false positives.
The persisting high cost of microarrays, in particular of commercial high-density oligonucleotide arrays (HDAs) such as the Affymetrix GeneChip, and the scarcity of samples in many experiments, continue to severely limit the number of replicates used per condition, and thus restrict the potential gain in statistical power of the statistical methods described above with increasing sample size. In addition, the statistical methods described above are generally applied to experiments using both cDNA microarrays and HDAs. The differences in design between the two microarray platforms have warranted different algorithms for aspects of array analysis such as gene expression level calculation, image analysis, and normalization .
In this light, instead of developing a new statistical method that can be generally applied to experiments using both cDNA microarrays and HDAs as those described above, we can leverage the unique design of HDAs for better differential gene expression identification. On HDAs supplied by Affymetrix, 11–20 25-base oligonucleotide probes that are exact complements to different fragments of the same gene target form a probe set. Unlike cDNA microarrays, where a single intensity ratio is collected for each gene, 11–20 probe-level measurements per probe set are collected simultaneously for any single array hybridization. However, these redundant measurements are typically summarized as one value in the form of an average difference (AD) or model-based expression index (MBEI) for the purpose of statistical analysis . Using probe-level measurements in identifying differentially expressed genes and blocking on the probe in an analysis of variance (ANOVA), combined with FDR adjustment for the multiple testing problem, are the key differences between our proposed approach and previously described related methods.
Although carrying out statistics at the probe level immediately increases the sample size by at least an order of magnitude, doing so is warranted because of the large and systematic differences that are known to exist among probes that survey the same gene. Due to these probe-specific biases, variation induced by probes is larger than that induced by array replicates. The use of the probe as a blocking factor in testing for differential gene expression in a two-way ANOVA on probe-level data is thus expected to be more sensitive than previously described methods.
Chu et al. also took an ANOVA approach at the probe level; however, the experimental design of their study was different from ours, which led to a more complicated model than what we propose here. Chu et al. compared their method to SAM on the same data set, but identified a very different set of differentially expressed genes (Table 3 in their paper). As pointed out by other researchers, this method cannot be easily benchmarked based only on data sets of unknown positives and negatives. Lemon et al. recently proposed a probe-level Logit-t method that was shown to be superior to other popular probe set methods. Independently of these two studies, we reached the same conclusion that using probe-level data could significantly improve the quality of the resultant gene list. In addition, we demonstrate the use of a rank-based Mack-Skillings test, which does not depend on any of the distribution models required by the two parametric studies mentioned above. Furthermore, by using an FDR-based criterion, our method not only ranks genes but also suggests statistically rigorous thresholds for gene selection.
In this study, we compared both the sensitivity and specificity of parametric two-way ANOVA and the nonparametric Mack-Skillings test on probe-level data against the commonly-used t-test and Wilcoxon test on probe-set level data. For all tests, we employed FDR-controlling procedures described above to account for the multiple testing problem. Two public data sets are used for benchmarking purposes: the Lemon set, where thousands of genes are expected to be differentially expressed and the Affymetrix Latin-square data set where only 14 spiked genes out of over 9000 genes on the array are expected to show real change [15–17]. We systematically tested the effects of key factors such as expression level (concentration) of the RNA transcripts, number of replicates, amount of change to be detected, in addition to the statistical methods. In almost all cases, the proposed probe-level methods outperformed previous methods based on probe-set level calculations. We also found that the two-way methods are most sensitive to transcript concentration between 4 pM and 128 pM and fold change greater than two. By comparing receiver operating characteristic (ROC) curves, we demonstrated that by taking advantage of the HDA design, the two-way methods applied on only 2–3 replicates can exhibit the same high sensitivity and specificity as a SAM-like t-test with FDR-control using 6–9 replicates for detecting at least two-fold change. Therefore, by taking advantage of the HDA design, the present limitations of one-way ANOVA-type methods can be overcome. Matlab scripts for our methods are available on http://carrier.gnf.org/publications/ProbeStatistics.
We compared the performance of the commonly used one-way ANOVA methods described above, against the two-way ANOVA methods using two publicly available microarray data sets. The first set of microarray experiments involves groups of human fibroblast cells in three conditions – serum starved, serum stimulated, and a 50:50 mixture of starved/stimulated – with six replicate Affymetrix HuGeneFL arrays in each group . For this set, a total of 7011 probe sets were examined per array after the preprocessing steps. The second data set is the Affymetrix Latin Square Data for Expression Algorithm Assessment . In 11 experiments (denote these as experiments A-K), 14 groups of human gene transcripts in 14 different known concentrations were spiked into a background RNA mixture and hybridized to 3 replicate microarrays. In two additional experiments (denote these as experiments L and M), the same Latin Square design is followed but 12 instead of 3 replicates were used per condition. In the following study, we tracked only 12 of 14 genes due to errors in the original data set for two of the probe sets. Transcript concentrations for each spiked gene ranged from 0 to 1024 pM over the various experiments . For this data, the Affymetrix HG_U95A array is used and a total of 9024 probe sets in each array were analyzed as described in the following sections after the preprocessing step.
We first assessed the relative sensitivity of the statistical tests by comparing the number of genes identified as differentially expressed when controlling the FDR using either an LSU procedure or a resampling-based approach. We compared the serum starved and serum stimulated data sets, between which the expression levels of a large number of genes were expected to vary significantly. We randomly sampled three replicates per condition to make the results comparable to later analyses for which only three replicates are available. The process was repeated 100 times and results were averaged.
To assess whether we are detecting biologically meaningful change, we also applied the statistical tests on random pairs of combinations of data values within the same treatment condition, i.e. comparing serum starved samples or serum stimulated samples among themselves, respectively. The dashed lines in Fig. 1 and Fig. 2 show that, as expected, none of the methods identifies any genes as differentially expressed within a reasonable level of FDR control. This result indicates that the extra sensitivity of the proposed two-way methods was not gained at the expense of robustness. The specificity of the methods will be further studied later.
To study the necessity of explicitly modeling the probe-treatment interaction in our ANOVA model, F-tests were applied to all genes. 5151 out of 7009 genes (73%) tested had P-values less than 0.05, even after a Bonferroni adjustment. A possible explanation is that the interaction term captures the changes in probe cross-hybridization properties caused by the large differences in the mRNA content between the two sample groups.
Percentage of identical genes called significant by pairs of procedures
A representative run of the probe-level Logit-t method on the Lemon data set identified 1032 genes as differentially expressed when controlling the LSU-FDR at level q = 0.05 as above – less than half the number identified by the two-way methods. The Logit-t method demonstrated a level of sensitivity similar to that of the t-test (Fig. 8) [see Additional File 1].
Effect of concentration and fold change
With as few as three replicates, we see in Fig. 3 that the parametric two-way ANOVA and the Mack-Skillings test are very sensitive to two-fold changes when testing within a maximum concentration range of 4 to 128 pM (Fig. 3a). With a four-fold change, the two-way methods are able to successfully detect nearly all spiked gene transcripts in all pairs of experiments at FDR level q = 0.05 with the exception of one changing from 0.25 to 1 pM (Fig. 3b). Only with an eight-fold change and maximum concentration between 32 and 128 pM do we begin to detect the spiked genes when using the t-test and controlling the FDR at q = 0.05 (Fig. 3c). These differences may explain the higher sensitivity of the two-way methods shown in Figs. 1 and 2.
Specificity and Sample Size Effect
In the previous sections, we compared the application of two-way ANOVA methods on probe-level data to standard statistical methods on probe-set level data in identifying differential gene expression in microarray experiments. We aimed to show the importance of leveraging HDA design in the choice of statistical test and not discarding information by working with a probe-set summary or average of probe-level data.
Using two-way ANOVA methods, we systematically accounted for probe-specific biases in hybridization or measurement efficiency, and thus achieved higher sensitivity and specificity compared with the t-test in the range of conditions investigated with varying levels of sample size, fold change, and maximum spike-in concentration. In the Lemon serum-starved and serum-stimulated data set, the two-way methods coupled with LSU-FDR control identified more than twice as many genes as differentially expressed compared with the t-test. With the Latin Square data set, we confirmed the specificity of the two-way methods by analyzing the ROC curves and observed that with as few as three replicates, the two-way ANOVA has a 91% sensitivity with a 99.84% specificity.
Parametric methods are commonly criticized for their lack of sensitivity and specificity when detecting differential gene expression. However, we discovered that the use of an LSU FDR-controlling procedure with the parametric two-way ANOVA method yielded the most promising results in terms of higher sensitivity and specificity for detecting differentially expressed genes. The outstanding performance of the parametric two-way ANOVA with the LSU FDR-controlling procedures relative to the other combinations of nonparametric tests and the resampling-based FDR in our study suggests that in the case of gene expression analysis with HDAs, there is a substantial gain in power by working with probe-level data, and that proper treatment of this data by appropriate normalization procedures and the application of appropriate transformations (logarithm, square root) can allow us to maintain assumptions critical to the method chosen.
Even without parametric assumptions, the advantage of treating the probe as a blocking factor was clearly demonstrated by the results using the Mack-Skillings test. Thus, if more conservative estimates from a two-way ANOVA analysis are desired, we can choose to use the results from the nonparametric Mack-Skillings test and still have a substantial gain in power over the t-test. The same Affymetrix Latin Square data set has been recently studied by Lemon et al. using a probe-level Logit-t method, and a low false positive rate of 0.03% was achieved at a sensitivity of 87%. The parametric two-way ANOVA achieved essentially the same performance, and the nonparametric Mack-Skillings method showed an even better false positive rate of 0.01% with the same sensitivity (see http://carrier.gnf.org/publications/ProbeStatistics).
The power of a statistical test is a function of its sensitivity and this further depends on (1) the magnitude of the real difference to be measured, (2) the noise level or standard deviation of sample measurements, (3) the significance level at which the tests are done, and (4) the sample sizes . Limitations inherent to the technology platform and suboptimal data preprocessing procedures can reduce the magnitudes of the real differences being measured. As we observed, there is a nonlinear relationship between expression values and the actual spike-in concentrations at the lower and higher ends of the concentration spectrum for the Latin Square data set due to detection and measurement saturation issues . With the use of probe-level data in a two-way ANOVA, we take advantage of informative probe-level differences between treatments and eliminate noise due to probe efficiency differences. In this way, the two-way ANOVA methods are better able to discern treatment differences. Control of the third factor depends on the number of false leads that one is willing to incur and this in turn varies with the goals of the experiments. Finally, the control of the fourth factor is limited by resource constraints, and in microarray experiments, this continues to be a key issue due to the costs of microarrays (two to three per condition in most labs) and availability of samples to be analyzed.
It is well-known that increasing sample size increases sensitivity for all statistical tests, and given enough samples, one can discern biologically meaningful changes well below the differences currently measured. That we can detect differentially expressed genes using two-way ANOVA methods with only two or three replicates and get comparable results with the use of the t-test on at least 6–9 replicates is evidence of the higher power of these methods on probe-level data all other factors being equal.
In this study, we have shown that coupled with an easily implemented linear step-up (LSU) FDR-controlling procedure, parametric and nonparametric two-way ANOVA methods using probe-level data are substantially more powerful tests than standard methods applied to probe-set level data for detecting differential gene expression. Their advantage in power is especially pronounced when working with samples with as few as two or three replicates – the most common sample sizes for microarray experiments . Although we only examined two sets of conditions in our data sets, the two-way ANOVA is a general design which easily handles other array experiment setups with two or more levels of treatments or time series points. As a well-known and extensively used statistical method in many fields, the two-way ANOVA has inspired a body of literature for dealing with many special cases, such as unequal group sizes due to missing data from replicates, which frequently occur in microarray experiments [21, 22]. Clearly, the ease of implementation of the two-way ANOVA-type methods coupled with LSU-FDR control, and the results shown herein, strongly suggest its use and further development for identifying differentially expressed genes.
For the following study, we use methods focusing on the two-sample case. We briefly describe the four well-known statistical tests and the two forms of FDR control employed in our study. Since the parametric statistical tests require the key assumption of equal group variances, logarithms of probe-level intensities and summarized expression values were taken to provide a better approximation .
The t-statistic and its variants are powerful measures for detecting differential expression because they permit selection of genes with maximal difference in mean level of expression between two groups and minimal variation of expression within each group . Here we employ the classical t-test which is a statistically equivalent test to the parametric one-way ANOVA in the two-sample case . As done in Reiner et al., we obtain the p-values directly using the t-distribution with appropriate degrees of freedom depending on the sample sizes .
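As an illustration of the probe-set level baseline, the following is a minimal Python sketch of the classical pooled-variance t-statistic for one gene. The function name and the example log2 expression values are hypothetical and are not taken from the study; the study itself used built-in Matlab functions. The p-value would then be read from a t-distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Classical two-sample t-statistic with pooled variance.

    Each input holds log-transformed probe-set expression values for one
    gene, one value per replicate array (at least two values per group,
    and not all values identical within both groups).
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance under the equal-group-variance assumption.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    t_stat = (mean(group_a) - mean(group_b)) / std_err
    return t_stat, df


# Hypothetical log2 expression values, three replicates per condition.
starved = [7.9, 8.1, 8.0]
stimulated = [9.0, 9.2, 9.1]
t_stat, df = pooled_t_statistic(starved, stimulated)
print(round(t_stat, 2), df)
```

With only three replicates per group the test has four degrees of freedom, which is one way to see why small sample sizes limit the power of probe-set level tests.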
Wilcoxon rank sum test
For each gene, the distribution-free rank sum test transforms the sorted gene expression values across experiments into ranks and then tests the null hypothesis of equality of the means of the ranked values between experimental conditions . For small sample sizes, exact p-values can be obtained from pre-calculated statistical tables. A normal approximation of standardized test statistics is typically used to obtain p-values for larger sample sizes. In this case, it was used for samples of size 9 and greater.
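The rank-sum calculation can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names are hypothetical, ties receive average ranks, and the variance carries no tie correction or continuity correction. It shows the normal approximation; for the small samples discussed above, exact tabulated p-values would be used instead.

```python
import math


def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        avg = (start + end) / 2.0 + 1.0  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Rank sum W of group_a, its z-score, and a two-sided normal p-value."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])
    mean_w = n_a * (n_a + n_b + 1) / 2.0
    var_w = n_a * n_b * (n_a + n_b + 1) / 12.0  # no tie correction
    z = (w - mean_w) / math.sqrt(var_w)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return w, z, p_two_sided


# Hypothetical expression values for one gene in two conditions.
w, z, p = rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(w, round(z, 3), round(p, 4))
```

Even with complete separation of the two groups, three replicates per condition cannot push the approximate p-value much below 0.05, which illustrates the coarse resolution of rank-based tests on probe-set summaries.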
The use of ANOVA in testing for the equality of group means relies on the computation of the ratio of the mean square variation among group means to the mean square variation within groups. A large ratio indicates a significant difference between group means. The one-way ANOVA model, a generalization of the t-test, reliably detects differences between group means only when other factors, which can cause large variation within groups, are controlled.
In the case of HDAs, probe-level intensities are a source of large and systematic variation. Thus, instead of using the summarized expression indices for each probe set for hypothesis testing and ignoring individual probe effects, we use intensity values for each probe in a probe set and control for probe-specific biases by considering probe type as a blocking factor in a two-way ANOVA. For each probe set, replicate measurements of log-transformed probe-level intensities for each probe are segregated into blocks across the treatment conditions. Two types of hypothesis tests can be performed in this case: (1) the test of the equality of probe or block means to assess the significance of explicitly modeling probe-level effects, and (2) the test of equality of treatment means having accounted for variation caused by individual probes. The ANOVA model is:
$$Y_{ijk} = \mu + P_i + T_j + (PT)_{ij} + \varepsilon_{k(ij)},$$
where $Y_{ijk}$ is the logarithm of the probe-level intensity measurement, $\mu$ is the overall mean, $P_i$ is the effect of probe i, $T_j$ is the effect of treatment j, $(PT)_{ij}$ is the effect of the interaction between probe i and treatment j, and $\varepsilon_{k(ij)}$ is the error. The probe-treatment interaction term is necessary based on our results on the Lemon data set (see Results for details).
In the first test we can measure the ratio of the mean square variation among blocks to the mean square variation within groups, where each group is a treatment/block combination. The significance of these probe-level differences has been documented and was again confirmed by the extremely low p-values associated with block effects in our study. However, it is not of particular interest that measured intensities for probe A differ significantly from those of probe B in a probe set when testing for differential gene expression. Here we only measure the amount of such fluctuations and remove it from the estimate of within-group variability. In the second test, the test of interest for identification of differential gene expression, we measure the ratio of the mean square variation among treatments to the mean square variation within treatment/block groups. The p-values corresponding to the ratios for the second test are determined using an F-distribution whose numerator has k-1 degrees of freedom, where k is the number of treatments, and whose denominator has pk(r-1) degrees of freedom, where p is the number of probes in the probe set and r is the number of replicates. In this study, we maintain the assumption of equal group sizes because there are corresponding probes for each probe set across experimental samples profiled using the same array type, and in the data sets used, there are equal numbers of replicates [22, 23].
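The second test can be sketched as follows for a balanced design. This is a minimal Python illustration under stated assumptions, not the authors' Matlab implementation: the function name, the nested-list data layout (`data[i][j]` holds the r replicate log-intensities of probe i under treatment j), and the toy numbers are all hypothetical. It returns the treatment F-ratio with k-1 and pk(r-1) degrees of freedom; the p-value is the upper tail of the corresponding F-distribution.

```python
def two_way_anova_treatment_f(data):
    """F-ratio for the treatment effect, with probe as the blocking factor.

    data[i][j] is the list of r replicate log-intensities for probe i
    under treatment j (balanced design: the same r in every cell, r >= 2).
    """
    p = len(data)          # number of probes (blocks)
    k = len(data[0])       # number of treatments
    r = len(data[0][0])    # replicates per probe/treatment cell
    grand = sum(y for row in data for cell in row for y in cell) / (p * k * r)
    # Treatment means, averaged over all probes and replicates.
    treat_means = [
        sum(y for i in range(p) for y in data[i][j]) / (p * r)
        for j in range(k)
    ]
    ss_treat = p * r * sum((m - grand) ** 2 for m in treat_means)
    # Error sum of squares: deviations from each probe/treatment cell mean,
    # so probe effects and probe-treatment interaction are excluded.
    ss_error = 0.0
    for row in data:
        for cell in row:
            cell_mean = sum(cell) / r
            ss_error += sum((y - cell_mean) ** 2 for y in cell)
    df_treat = k - 1
    df_error = p * k * (r - 1)
    f_ratio = (ss_treat / df_treat) / (ss_error / df_error)
    return f_ratio, df_treat, df_error


# Toy probe set: 2 probes x 2 treatments x 2 replicates (hypothetical values).
toy = [
    [[1.0, 3.0], [5.0, 7.0]],      # probe 0: treatment 0, treatment 1
    [[11.0, 13.0], [15.0, 17.0]],  # probe 1: much brighter probe overall
]
f_ratio, df_treat, df_error = two_way_anova_treatment_f(toy)
print(f_ratio, df_treat, df_error)
```

In the toy data the two probes differ by 10 units while the treatment shift is only 4 units; because the error term is computed within probe/treatment cells, the large probe offset does not inflate the denominator. If SciPy is available, `scipy.stats.f.sf(f_ratio, df_treat, df_error)` is one way to obtain the p-value.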
This distribution-free alternative to the classic two-way ANOVA model above transforms the probe-level intensities into ranks for each probe across the samples (replicates and conditions). It is a generalization of the nonparametric Friedman test when there are replicates. This test of no change across experimental conditions uses the Mack-Skillings statistic to measure the squared deviation of the sum of the ranks across the probes in a probe set for each treatment condition, from the expected sum based on no treatment differences. As with the Wilcoxon test, the exact p-values for small sample sizes can be found in statistical tables or computed numerically. Large-sample approximation allows the estimation of p-values using a chi-square distribution with k-1 degrees of freedom, where k is the number of experimental conditions .
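A minimal Python sketch of the Mack-Skillings statistic for a balanced layout follows, using the form given in standard nonparametric texts: with n probes, k treatments, c replicates per cell and N = nkc, observations are ranked within each probe, S_j sums the cell-average ranks of treatment j over probes, and MS = 12 / (k(N + n)) times the sum over j of (S_j - (N + n)/2)^2. The function names and data layout are hypothetical, ties receive average ranks without further correction, and the p-value helper covers only the two-condition case (chi-square with one degree of freedom).

```python
import math


def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        avg = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


def mack_skillings(data):
    """Mack-Skillings statistic; data[i][j] = c replicates, probe i, treatment j."""
    n = len(data)         # probes (blocks)
    k = len(data[0])      # treatments
    c = len(data[0][0])   # replicates per cell
    total = n * k * c
    s = [0.0] * k
    for row in data:
        flat = [y for cell in row for y in cell]
        ranks = average_ranks(flat)  # ranks 1..k*c within this probe
        for j in range(k):
            s[j] += sum(ranks[j * c:(j + 1) * c]) / c
    expected = (total + n) / 2.0     # expected S_j under no treatment effect
    return 12.0 / (k * (total + n)) * sum((sj - expected) ** 2 for sj in s)


def chi2_sf_one_df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Toy probe set: 2 probes x 2 treatments x 2 replicates (hypothetical values).
toy = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[11.0, 13.0], [15.0, 17.0]],
]
ms = mack_skillings(toy)
p_approx = chi2_sf_one_df(ms)
print(round(ms, 3), round(p_approx, 4))
```

Because ranks are assigned separately within each probe, the large intensity offset between the two toy probes has no influence on the statistic.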
Linear step-up (LSU) procedure
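The linear step-up (LSU) procedure originally described by Benjamini and Hochberg (1995) controls the FDR at level q by rejecting all hypotheses $H_{(i)}$, i = 1,...,k, where
$$k = \max\left\{ i : p_{(i)} \le \frac{i}{m}\, q \right\}$$
and $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$ are the ordered p-values of the m hypotheses tested. Here, we compute the multiplicity-adjusted p-values:
$$p^{\mathrm{LSU}}_{(i)} = \min_{j \ge i} \left\{ \frac{m\, p_{(j)}}{j} \right\}$$
and thus associate an FDR with each hypothesis test.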
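The LSU-adjusted p-values can be computed with a single pass from the largest p-value downward. The Python sketch below assumes the standard Benjamini-Hochberg adjustment, in which the adjusted value at rank i is the minimum of m·p_(j)/j over all ranks j ≥ i; the function name and example p-values are hypothetical.

```python
def lsu_adjusted_p_values(p_values):
    """Benjamini-Hochberg (linear step-up) adjusted p-values.

    Returns the adjusted values in the same order as the input list.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of m * p_(j) / j over all ranks j >= the current rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values.
raw = [0.01, 0.04, 0.03, 0.005]
adj = lsu_adjusted_p_values(raw)
significant = [a <= 0.05 for a in adj]  # genes called at FDR level q = 0.05
print([round(a, 4) for a in adj])
```

Rejecting every hypothesis whose adjusted p-value is at most q gives the same rejections as the step-up rule stated in terms of the ordered raw p-values.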
Resampling-based methods seek to gain more power by utilizing the empirical dependency structure of the data to construct more powerful FDR-controlling procedures [3, 9]. Here we generate an m × n matrix of resampling-based p-values [p_ik] for m probe sets using n permutations of treatment labels (n = 100 in this study). We naively estimate a resampling-based FDR for each probe set by ordering the observed p-values P_j and, starting with the largest p-value P_(m), computing the ratio V/R for each observed value.
V is the average number of assumed null p-values from all permutations as extreme as the observed value under consideration, whereas R is the number of observed p-values as extreme as the same value under consideration. The ratio of these values gives an estimate of the FDR associated with the rejection of the hypothesis under consideration.
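One reading of this naive estimate is sketched below in Python. The exact formula used in the study is not reproduced here, so the details are assumptions: V is taken as the per-permutation average count of null p-values (pooled over all probe sets) at or below the observed value, R as the count of observed p-values at or below it, and the ratio is capped at 1. No monotonicity step is applied. Function and variable names are hypothetical.

```python
def resampling_fdr(observed, null_p):
    """Naive resampling-based FDR estimate for each observed p-value.

    observed : list of m observed p-values, one per probe set.
    null_p   : m x n nested list; null_p[i][k] is the p-value of probe
               set i under permutation k of the treatment labels.
    """
    n_perm = len(null_p[0])
    pooled_null = [p for row in null_p for p in row]
    estimates = []
    for p_obs in observed:
        # V: average number of null p-values per permutation that are at
        # least as extreme as the observed value (assumed pooling scheme).
        v = sum(1 for p in pooled_null if p <= p_obs) / n_perm
        # R: number of observed p-values at least as extreme (always >= 1).
        r = sum(1 for p in observed if p <= p_obs)
        estimates.append(min(1.0, v / r))
    return estimates


# Hypothetical example: 2 probe sets, 2 permutations.
observed = [0.01, 0.5]
null_p = [[0.3, 0.6],
          [0.005, 0.9]]
fdr = resampling_fdr(observed, null_p)
print(fdr)
```

The double loop is quadratic in the number of probe sets; for thousands of probe sets, sorting both lists once and counting with binary search would be the more practical design.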
The statistical tests described above were performed using Matlab. Built-in Matlab functions were used to compute the test statistics and associated p-values, and FDR adjustments were implemented as described above.
Microarray intensity normalization and gene expression calculations were performed using dChip. Probe values were first normalized and their background intensities subtracted. Probe set expression values were computed using the perfect-match-only (PM-only) model for expression with standard outlier detection. An additional normalization step was used to adjust the probe set expression values of each array to a median expression level of 200. Aside from the previously published advantages of using only PM probes, intensity calculations using only PM probes tend to result in higher values and few if any negative values, alleviating complications when log transforming the data [10, 11]. In addition to the preprocessing using dChip, we also filtered the probe sets so that at least one sample group has an average expression level of 20. This is done in order to prevent comparing expression levels of genes that are either insignificantly expressed in both treatment conditions or are expressed at the noise level.
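The final two preprocessing steps can be illustrated with a short Python sketch. Two assumptions are made that the text does not state: the median adjustment is implemented as a multiplicative rescaling of each array, and the filter keeps a probe set when at least one group mean is greater than or equal to the threshold. Function names and example values are hypothetical; the dChip steps themselves are not reproduced.

```python
from statistics import mean, median


def scale_to_median(array_values, target=200.0):
    """Rescale one array so its median probe-set expression equals target.

    Assumes a multiplicative adjustment and a positive array median.
    """
    factor = target / median(array_values)
    return [v * factor for v in array_values]


def passes_expression_filter(group_a, group_b, threshold=20.0):
    """Keep a probe set if at least one group's mean expression reaches threshold."""
    return max(mean(group_a), mean(group_b)) >= threshold


# Hypothetical expression values for one array with three probe sets.
scaled = scale_to_median([50.0, 100.0, 400.0])
print(scaled)

# Hypothetical replicate values for one probe set in two conditions.
keep = passes_expression_filter([5.0, 10.0, 15.0], [25.0, 30.0, 35.0])
print(keep)
```

The filter is applied before log transformation, so probe sets near the noise level in both conditions never reach the hypothesis tests.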
Additional file 1
File name Figure 8
File type PDF
Description of the file: Number of probe sets identified by Logit-t method in the Lemon data set.
Number of probe sets called significant versus LSU-adjusted FDR in the Lemon data set computed with t-test, Wilcoxon test, Logit-t method, parametric two-way ANOVA and nonparametric Mack-Skillings method. Dashed lines indicate the control versus control comparisons.
ANOVA: analysis of variance
FDR: false discovery rate
HDA: high-density oligonucleotide array
ROC: receiver operating characteristic
LSU: linear step-up procedure
- Pan W: A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments. Bioinformatics 2002, 18: 546–554. 10.1093/bioinformatics/18.4.546
- Storey JD, Tibshirani R: SAM thresholding and false discovery rates for detecting differential gene expression in DNA microarrays. To appear in The Analysis of Gene Expression Data: Methods and Software (Edited by: Parmigiani G, Garrett ES, Irizarry RA and Zeger SL). Springer, New York 2003.
- Dudoit S, Yang YH, Callow MJ, Speed TP: Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Stat Sinica 2002, 12: 111–139.
- Troyanskaya O, Garber ME, Brown PO, Botstein D, Altman RB: Nonparametric methods for identifying differentially expressed genes in microarray data. Bioinformatics 2002, 18: 1454–1461. 10.1093/bioinformatics/18.11.1454
- Pan W, Lin J, Le CT: How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach. Genome Biol 2002, 3: research0022.1–0022.10. 10.1186/gb-2002-3-5-research0022
- Pan W, Lin J, Le C: A Mixture Model Approach to Detecting Differentially Expressed Genes with Microarray Data. To appear in Functional & Integrative Genomics 2003. (Also Report 2003–004, Division of Biostatistics, University of Minnesota)
- Tusher V, Tibshirani R, Chu G: Significance analysis of microarrays applied to transcriptional responses to ionizing radiation. Proc Natl Acad Sci USA 2001, 98: 5116–5121. 10.1073/pnas.091062498
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B 1995, 57: 289–300.
- Reiner A, Yekutieli D, Benjamini Y: Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics 2003, 19: 368–375. 10.1093/bioinformatics/btf877
- Zhou Y, Abagyan R: Algorithms for high-density oligonucleotide array. Curr Opin Drug Discov Devel 2003, 6: 339–345.
- Li C, Wong WH: Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proc Natl Acad Sci 2001, 98: 31–36. 10.1073/pnas.011404098
- Yang Y, Hoh J, Broger C, Neeb M, Edington J, Lindpainter K, Ott J: Statistical Methods for Analyzing Microarray Feature Data with Replications. J Comput Biol 2003, 10: 157–169. 10.1089/106652703321825946
- Chu TM, Weir B, Wolfinger R: A systematic statistical linear modeling approach to oligonucleotide array experiments. Math Biosci 2002, 176: 35–51. 10.1016/S0025-5564(01)00107-9
- Lemon WJ, Liyanarachchi S, You M: A high performance test of differential gene expression for oligonucleotide arrays. Genome Biology 2003, 4: R67. 10.1186/gb-2003-4-10-r67
- Affymetrix Latin Square Data for Expression Algorithm Assessment: Affymetrix Inc., Santa Clara, CA, USA [http://www.affymetrix.com/analysis/download_center2.affx]
- Lemon WJ, Palatini JJT, Krahe R, Wright FW: Theoretical and experimental comparisons of gene expression indices for oligonucleotide arrays. Bioinformatics 2002, 18: 1470–1476. 10.1093/bioinformatics/18.11.1470
- Liu W, Mei R, Di X, Ryder TB, Hubbell E, Dee S, Webster TA, Harrington CA, Ho M, Baid J, Smeekens SP: Analysis of high-density expression microarrays with signed-rank call algorithms. Bioinformatics 2002, 18: 1593–1599. 10.1093/bioinformatics/18.12.1593
- Hoffmann R, Seidl T, Dugas M: Profound effect of normalization on detection of differentially expressed genes in oligonucleotide microarray analysis. Genome Biol 2002, 3: research0033.1–0033.11. 10.1186/gb-2002-3-7-research0033
- Rice JA: Mathematical Statistics and Data Analysis. 2nd edition. Duxbury Press, Belmont, CA 1995.
- Rajagopalan D: Comparison of statistical methods for oligonucleotide arrays. Bioinformatics 2003, 19: 1469–1476. 10.1093/bioinformatics/btg202
- Hollander M, Wolfe D: Nonparametric Statistical Methods. 2nd edition. Wiley, New York 1999.
- Sokal RF: Biometry. 3rd edition. Freeman, New York 1995.
- Jobson JD: Applied Multivariate Data Analysis, Volume I: Regression and Experimental Design. Springer-Verlag, New York 1991.
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.