Gene expression microarrays are a widely used technology in functional genomics that allows the expression levels of thousands of genes to be measured efficiently in a single experiment. Among the wide spectrum of available array technologies and suppliers, two common technologies are the in-situ synthesized oligonucleotide GeneChips developed by Affymetrix and the spotted microarrays, which are microscope slides spotted with a variable number of probes according to the biological application. Spotted microarrays use either cDNA probes (Incyte Human UniGEM, Dualchip from Eppendorf, academic platforms, ...) or oligonucleotide probes (Agilent gene expression microarrays, Applied Biosystems gene expression microarrays, Codelink Bioarray from GE Healthcare, NCI from Operon, ...). The three major types of gene expression microarray applications are class comparison, class prediction and class discovery. In this paper, we focus on the preprocessing of spotted microarray data for a class comparison application, where the goal is to identify differentially expressed genes between two conditions.
Whatever the application, the first analytical step in a spotted microarray experiment is the acquisition of an image file with an optical scanner. The image analysis software then segments the acquired image into spotted and unspotted regions and returns the average and median of the pixel intensities for both the foreground and the surrounding area (named local background) of each spot. It is well known that the foreground intensity of a spot does not perfectly reflect the RNA abundance of its corresponding gene, owing to interference from non-specific hybridization on the probe. This interference is named background noise and arises from many sources, such as non-specific binding, deposits left by incomplete washing, intrinsic fluorescence of the glass slide or optical noise of the scanner. Han et al. showed that such interference can be minimized by optimizing the numerous steps of the microarray experiment, most particularly the hybridization and washing steps. The authors also showed that non-optimal protocols can lead to fold-change compression. In that context, de Cremoux et al. also discussed the importance of pre-analytical steps for transcriptome analysis.
Raw data returned by the scanner have to be preprocessed in three successive steps. The first step is background correction, for which the standard method is to subtract an estimate of the background noise of a spot from its foreground intensity. The background noise is usually calculated as the mean of the pixels in the surrounding area and is named the 'local background intensity'. The second step is the transformation of the corrected intensities, for which the standard method is a log2 transformation. The third step is normalization, which calibrates the signals from different microarrays so that they can be compared on a common scale. Commonly used methods for normalizing spotted microarray data perform either a global median normalization or a loess normalization.
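For illustration, the three successive steps can be sketched as follows for a single array. The function and parameter names are our own, and global median centering stands in for the normalization step (a loess fit would be used in practice for intensity-dependent effects):

```python
import numpy as np

def preprocess(foreground, background):
    """Sketch of the three standard preprocessing steps for one array."""
    # Step 1: background correction -- subtract the local background
    # intensity from the foreground intensity of each spot.
    corrected = foreground - background
    # Step 2: transformation -- log2, which is undefined for the
    # non-positive corrected intensities (they become missing values).
    with np.errstate(invalid="ignore", divide="ignore"):
        transformed = np.log2(corrected)
    # Step 3: normalization -- global median centering so that arrays
    # can be compared on a common scale.
    return transformed - np.nanmedian(transformed)

# Toy foreground/background intensities; the third spot is dimmer
# than its local background and yields a missing value.
fg = np.array([500.0, 120.0, 80.0, 2000.0])
bg = np.array([100.0, 100.0, 100.0, 100.0])
print(preprocess(fg, bg))
```

Note how the spot whose foreground is below its local background produces a NaN after the log2 step, which is exactly the missing-value drawback discussed below.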
The standard background correction method assumes that foreground intensities are additively affected by the background noise. Although well motivated, this standard method has been widely criticized for several reasons. The best known drawback is that local background subtraction causes problems when foreground intensities are lower than local background intensities: the correction then leads to negative corrected intensities and consequently to missing values after log2 transformation. Another cited drawback is the extreme variability of the log2 fold-changes obtained at low corrected intensities. To circumvent these drawbacks, alternatives to the standard background correction were proposed [9–13]. In addition, the generalized logarithmic (glog) transformation was proposed as a valuable alternative to the log2 transformation [14, 15] in order to stabilize the variance of low corrected intensities. The transformation is determined by the equation:

glog(x) = log₂( x − α + √((x − α)² + λ) )
where α and λ are two positive parameters. The glog transformation is sometimes referred to as the generalized arcsinh transformation because of the relationship between the arcsinh and log functions. Methods were developed to estimate the parameters of the glog transformation [16, 17]. Unlike the log2 transformation, the glog is defined for negative corrected intensities.
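As a minimal sketch, one common parameterization of the glog, log₂(x − α + √((x − α)² + λ)), returns finite values even for negative corrected intensities, where log2 yields missing values. The values of α and λ below are purely illustrative, not estimated from data:

```python
import numpy as np

def glog2(x, alpha=0.0, lam=1000.0):
    # Generalized log (base 2): defined for all real x, including
    # negative background-corrected intensities. alpha and lam are
    # illustrative defaults, not fitted parameter estimates.
    return np.log2(x - alpha + np.sqrt((x - alpha) ** 2 + lam))

corrected = np.array([-50.0, 0.0, 50.0, 5000.0])
print(glog2(corrected))           # finite everywhere
with np.errstate(invalid="ignore", divide="ignore"):
    print(np.log2(corrected))     # nan / -inf for the non-positive entries
```

For large x the glog behaves like an ordinary logarithm, while near zero (and below) it flattens out, which is what stabilizes the variance of low corrected intensities.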
Eight distinct background correction methods were assessed for differential expression by Ritchie et al. using data from two-color spotted cDNA microarrays. In that study, the variance stabilization method (VSN) of Huber et al. was considered as a background correction method, although it is actually the combination of the standard background correction with an arcsinh transformation whose parameters are computed so as to perform transformation and normalization in a single step. After the other background correction methods, a log2 transformation and a loess normalization were applied to the data before computing fold-changes with SAM regularized t-statistics and empirical Bayes moderated t-statistics. Using 9 Lucidea Universal ScoreCard (LUS) controls in a spike-in experiment, the authors also compared the average bias of each background correction method. Various transformation methods were compared by Cui et al., who recommended the glog transformation when low corrected intensities appear highly variable.
In this paper, we address the problem of background correction and transformation of spotted microarray data and their impact on fold-change compression and on the variance of processed intensities. The first objective of this study was to compare various background correction methods and transformations commonly used in the literature. We consider these two steps together because alternatives to the standard background correction method, like alternatives to the log2 transformation, were originally proposed to circumvent the same problems: the high variability of low corrected intensities on the log2 scale and the missing values obtained after log2 transformation of negative corrected intensities. These two successive preprocessing steps were assessed on datasets generated with two spotted microarray platforms (Dualchip from Eppendorf and Codelink from GE Healthcare) as well as with a quantitative PCR platform (Taqman) from the MicroArray Quality Control (MAQC) project. Data generated by the MAQC project provide a unique opportunity to assess the advantages and disadvantages of data analysis methods with the aim of reaching a consensus on microarray data analysis. Accordingly, data from the MAQC project were previously used to compare the third preprocessing step, i.e. the normalization. A second objective of the study was to confirm the additive effect of the background noise on the foreground intensity, which is the underlying assumption of the standard background correction method.