Using Affymetrix GeneArray® chips, we examined the relative magnitudes of different sources of variation in microarray experiment. Analysis of variance using mixed-effects linear models is a common way to account for and test the significance of various factors contributing to overall variation . Due to limitations of our hierarchical unbalanced experimental design and relatively small number of degrees of freedom, we did not include factors that can potentially contribute to variation such as day of processing, scanning order, mRNA preparation, etc. We assume that such factors were not significant. However, to formally test this assumption, another experiment is needed.
We used a general linear model to partition variance for each probe set into three components. The first source was biological (i.e. animal-to-animal) variation. The biological variation may be confounded by technical variation at the mRNA preparation step, but this variation is probably relatively small compared to variation at the hybridization step . Thus, we assume that most of the variation for this effect was due to true biological differences among animals. The second source of variation was the effect of labeling. Although our experiments were carried out by the same person, using the same equipment, under the same experimental conditions as much as realistically possible, there is always some variation caused by minor environmental differences in temperature, duration, pipetting etc., which influences labeling efficiency. The third source of variation other than animal-to-animal variation and labeling-effect variation was residual error caused by differences in hybridization, scanning and other factors. To compare the relative magnitudes of different sources of variation, we estimated variance components and applied shrunken variance estimators that borrow information across genes. We constructed these shrunken variance estimators by shrinking a group of individual variance estimators toward their common corrected geometric mean . The amount of shrinkage depends on the variation on the individual variance components estimators. These estimators were shown to be robust in respect to variance heterogeneity in gene expression data among groups .
We found that our results depend on the image processing algorithm used: biological variation was the largest source when dChip or RMA were used, but when *.cel files were processed with GCRMA-EB or MAS 5.0, the largest source was residual error. Bakay et al., 2002, found that biological variation presumably caused by tissue heterogeneity and genetic polymorphism was a major source of variation while technical variation was minor . Han et al., 2004, found that biological variation was about of the same size as other sources combined . Whitney et al., 2003, found that inter-individual variation in gene expression profiles was correlated with gender, age, and the time of day at which the sample was taken. These intrinsic differences in expression patterns were likely caused by differences in genotype, although they might also reflect epigenetic or environmental factors . Oleksiak et al., 2002, in their studies of teleost fish have observed significant differences in gene expression levels between individuals from the same population and between different populations. These differences could be caused by genetic variation as well as other factors, including maternal effects and genotype-environment interactions . On the contrary, Dumur et al., 2004, found that day-to day variation was the main source of variation . Woo et al., 2004, in studies of inbred mice strains, detected that most of the genes had small biological variance, but about 10% of genes showed large variation between individuals .
We found that technical replicates within a biological replicate had higher and more consistent correlations with each other than with other biological replicates. Generally, our correlations were higher than those observed by Dobbin et al., 2005, for interlaboratory correlations between tumor samples  and were compatible with values for in-lab correlations obtained in another study .
The consistency of the hybridization step was evaluated using paired t-tests following by modeling of distribution of resulting p-values. The significance depends on the image processing algorithm used: the hybridization effect was not significant for MAS 5.0 (4.8% of genes were differentially expressed between two technical replicates) and RMA (2.3% of genes), but the proportion of differentially expressed genes was higher than expected by chance for dChip (10.8% of genes) and GCRMA-EB (13.6% of genes).
The low-level data were analyzed using four popular methods implemented in the BioConductor  package: dChip , MAS 5.0 , RMA , and GCRMA-EB . We found that different low-level data processing algorithms produced different results. We provide comparisons mainly to illustrate the compatibility of several algorithms. Evaluation of the strengths and weaknesses of different image processing algorithms may require other experimental settings, such as spike-in data. Shedden et al., 2005, performed a comprehensive comparison of seven image processing methods for Affymetrix arrays and demonstrated that the choice of image processing algorithm has a major impact on the results of microarray data analysis . The authors found that the dChip method operates consistently well, while MAS 5.0 and GCRMA-EB consistently performed poorly. GCRMA-EB had a particular disagreement with other methods when a t-test was used for group comparison, presumably because it might be more sensitive to the underlying statistical assumptions of a test (e.g. independence of genes). Similarly, we observed that estimates of the proportion of differentially expressed genes between two technical replicates at the hybridization stage were different than those for data processed with GCRMA-EB compared to other methods, which is consistent with finding of Shedden et al. .
The results presented here are specific for the systems being studied, and other experimental conditions may yield different estimates. For example, we used an outbred strain of rats, which had greater inherent biological variation than inbred strains. In cell cultures of inbred mice strains under otherwise equal conditions, the relative magnitude of biological variation presumably would be smaller. Different steps in microarray data analysis, such as normalization, transformation, and gene filtering, may affect results as well [31–35]. A microarray platform and microarray facility can also have a significant impact, as was demonstrated in several recent studies [25, 36–38]. Testing the influence of these various factors could be an interesting topic of future research.