"Harshlighting" small blemishes on microarrays
© Suárez-Fariñas et al. 2005
Received: 26 October 2004
Accepted: 22 March 2005
Published: 22 March 2005
Microscopists are familiar with many blemishes that fluorescence images can have due to dust and debris, glass flaws, uneven distribution of fluids or surface coatings, etc. Microarray scans show similar artefacts, which affect the analysis, particularly when one tries to detect subtle changes. However, most blemishes are hard to find by the unaided eye, particularly in high-density oligonucleotide arrays (HDONAs).
We present a method that harnesses the statistical power provided by having several HDONAs available, which are obtained under similar conditions except for the experimental factor. This method "harshlights" blemishes and renders them evident. We find empirically that about 25% of our chips are blemished, and we analyze the impact of masking them on screening for differentially expressed genes.
Experiments attempting to assess subtle expression changes should be carefully screened for blemishes on the chips. The proposed method provides investigators with a novel robust approach to improve the sensitivity of microarray analyses. By utilizing topological information to identify and mask blemishes prior to model based analyses, the method prevents artefacts from confounding the process of background correction, normalization, and summarization.
Analysis of hybridized microarrays starts with scanning the fluorescent image. For high-density oligonucleotide arrays (HDONAs) such as Affymetrix GeneChip® oligonucleotide (Affy) arrays, the focus of this paper, each scanned image is stored pixel-by-pixel in a 'DAT' file. As the first step in measuring the intensity of the hybridization signal, a grid is overlaid, the image is segmented into spots or features, and the pixel intensities within each of these are summarized as a probe intensity estimate (See reviews  and  for cDNA chips). The probe-level intensity estimates are stored in a 'CEL' file. Each gene is represented by a set of probe pairs, each pair consisting of a 'perfect match' (PM) probe for a characteristic sequence of the gene and a 'mismatch' (MM) probe, which is identical except for the Watson-Crick complement at the central position. Expression of a gene is estimated from such a probe set by applying algorithms for background correction, normalization, and summarization.
The quality of data scanned from a microarray is affected by a plethora of potential confounders, which may act during printing/manufacturing, hybridization, washing, and reading. Each chip contains a number of probes specifically designed to assess the overall quality of the biochemistry, such as 'checkerboards' in the corners and borders, whose purpose is, e.g., to indicate problems with the biotinylated B2 hybridization. Affymetrix software provides for a number of criteria to assess the overall quality of a chip, such as percent present calls, scaling factor, background intensity, and overall pixel-to-pixel variation (raw Q). Software packages such as Bioconductor for R  have implemented biochemical quality control tools such as RNA degradation plots. If a quality problem is found, however, these criteria and tools do not easily suggest a remedy and they have little sensitivity to detect localized artefacts, like a speck of dust or a localized hybridization problem. Although such physical blemishes obviously affect the expression estimates, they have hitherto been only narrowly addressed in the literature. Thus, there are currently no safeguards to signal potential physical blemishes. Instead, researchers are merely advised to carefully inspect the chip images visually [4, 5]. Given the high variance among the hundreds of thousands of probes and their random allocation on the chip, it is impossible to visually detect any but the starkest artefacts. For two-colour cDNA arrays, a Bayesian network approach has been proposed , based on the 'features' of the pixel distribution within each probe, yet, due to the standardized manufacturing process, the probes on an oligonucleotide array have too few 'features' for such an approach to be effective.
As the price of microarrays continues to drop, a typical microarray experiment now contains several chips, each representing a sample obtained under conditions that were similar except for the experimental factor under investigation. Having collections of chips available offers new strategies not only for analyzing the effect of the experimental factor, but also for identifying blemishes. The power of having several chips available was first harnessed for estimating mRNA expression levels by the 'robust multichip average' (RMA) method . One of the assumptions underlying the RMA model is that probes across chips are highly correlated, due to differences in their affinity [8, 9] and because only a small proportion of genes are differentially expressed in any experimental setting. This correlation should be even higher for the mismatches, because they are less likely to be affected by the specific changes in gene expression induced by the experimental factor. Given the volume of pixel-level data (>50 megapixels per image), it is desirable to devise algorithms that work from the 100 times smaller probe-level files, which hold the same information used in traditional signal value estimation approaches.
Psoriasis is thought to be due to an overly active immune system [10, 11]. To study how the immune response of leukocytes isolated from blood can be affected by drugs that may serve to control autoimmune diseases like psoriasis, blood was drawn from five volunteers under a protocol that had been approved by The Rockefeller University Hospital Institutional Review Board .
For each subject, peripheral blood mononuclear cells (PBMCs) were isolated and cultured in six Petri dishes. Four cultures were activated with an anti CD3/CD28 antibody, two of which were pre-treated with a repressor drug. Two cultures served as control without drug or activation. One of the two sets of control, activated, and pre-treated cultures (subjects 1 and 2) was analyzed after 6 hrs, the other after 24 hrs. (For subjects 3, 4, and 5, only one time point is available.) All samples were hybridized to Affymetrix HuU95av2 chips.
Our method allows us to identify spatially correlated regions that are unlikely to originate from random fluctuations. To demonstrate that the statistical anomalies detected in the pseudo images at the probe level (Figure 2 and Figure 3) are, in fact, physical blemishes, we inspected the corresponding raw image at the pixel level. The regular artefacts seen (shadow, circle, cloud, etc.) are clearly blemishes, even if the precise nature of the physical blemish may not be known. Still, the difference in features between blemishes suggests different causes.
A number of factors are known to cause bright or dark spots in fluorescence micrographs. Dust on the front cover slip will cause a dark, out-of-focus shadow. Common white paper is bleached with strongly fluorescent dyes, so fibres from tissue paper ordinarily used for cleaning cause intense glare. Many organic solvents, detergents, and other chemicals will fluoresce when concentrated, so leftover droplets or condensates will appear as bright regions, regardless of whether they are in front or behind the focal plane. A crack in the glass would ordinarily be invisible to fluorescence microscopy – except for its ability to accumulate such substances. Glass will normally be coated with substances to prevent the direct binding of fluorophores to it; however, any damage to the fragile coating will cause fluorescent streaks. Illumination with a coherent source such as a laser, as opposed to a broadband source such as a xenon lamp, has specific artefacts such as speckle. In addition, the arrays themselves are manufactured through photolithographic techniques and may contain occasional damage.
The areas outside the dark clouds do not appear to be any grainier, so it does not seem to be a change of exposure setting or other simple global change. The image analysis software reports a single, global pixel-to-pixel variation Qraw; it would be useful to have a local quality measure as well, in a fashion similar to the reported background estimate for probe intensities. All dark clouds we found impinge on the array borders. We have no conjecture as to the physical origin of this problem.
The two artefacts crossing the left border of Figure 2 suggest yet another reason for blemishes on microarrays. Only one of our chips displayed this artefact, but it did so twice on the same border. Neither the raw image nor physical examination of the chip in a dissection microscope provided any hints to the possible cause (data not shown).
There are myriad possible explanations for what caused this striking artefact. A perfectly round structure with outliers concentrated near its perimeter, evocative of the 'coffee stain rings' phenomenon , suggests that a bubble (or a drop) may have formed during the microfluidic stage, through condensation after the washing stage, or as a manufacturing defect.
To determine the extent to which such artefacts may affect standard analyses, we compared the activated vs. the repressed samples (two each) for patient 2, and studied whether masking the blemishes affects the list of differentially expressed genes.
Genes whose expression estimates changed by more than 0.1 log2 through filtering were considered as 'altered' by filtering. The 'bright spot', where about 39 probes were affected, altered the expression of 16 genes by up to 1.37 log2. The 'shadowy circle' altered the expression of about 380 genes, more than 50 of them by more than 0.5 log2. The 'dark spot' affected 47 probes, altering expression of 103 genes by up to 1.6 log2. The 'cloud' altered the expression of 700 genes, 83 of them by more than 0.5 log2. The dirt, covering an area of around 25 × 25 probes, affected around 376 probes, altering 148 genes, 16 of them by more than 0.5 log2, up to a maximum of 1.26 log2.
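The 0.1 log2 criterion can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function name `altered_genes`, the gene labels, and all expression values are hypothetical, and the estimates are assumed to be on the log2 scale, computed once without and once with masking.

```python
def altered_genes(before, after, threshold=0.1):
    """Map each altered gene to its change in log2 expression.

    `before` and `after` are dicts of log2 expression estimates computed
    without and with masking; a gene counts as altered when the absolute
    change exceeds `threshold` (0.1 log2 in the text).
    """
    changes = {}
    for gene, value in before.items():
        if gene not in after:
            continue  # no estimate after masking; nothing to compare
        delta = after[gene] - value
        if abs(delta) > threshold:
            changes[gene] = delta
    return changes


# Hypothetical estimates (toy numbers, not taken from the study).
unmasked = {"gene_a": 7.20, "gene_b": 9.05, "gene_c": 5.50}
masked = {"gene_a": 7.25, "gene_b": 10.42, "gene_c": 5.20}

changed = altered_genes(unmasked, masked)
# Second cut-off used in the text: changes larger than 0.5 log2.
large = [g for g, d in changed.items() if abs(d) > 0.5]
print(sorted(changed), sorted(large))
```

Counting the entries of `changed` and `large` gives the two kinds of tallies reported for each blemish.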
Finally, we compared the two conditions (absence vs. presence of a repressor), mirroring masked probes on both the affected and the corresponding chip. As an exploratory criterion, we used the modified (paired) t-test suggested in Smyth  from the limma package of the Bioconductor project . As shown in Figure 10b, masking can dramatically change which genes are identified as differentially expressed, demonstrating the potential value of detecting blemishes and masking affected areas on microarrays.
As an alternative approach to identify blemishes, one might try to look at the residuals from parametric estimations in the background subtraction or summarization stages; e.g., looking at the residuals of the PM-MM difference model  or the RMA model over the PM values  to identify possible aberrations. Unfortunately, the variety of models currently being discussed attests to the fact that each model has its drawbacks. While random variation can typically be handled by statistical methods, systematic errors in the choice of the model assumptions may have a drastic impact on these processes. The proposed method is robust in the sense that only few assumptions are made. Another advantage of our approach is that we can include mismatch probes which are especially suitable to identify aberrations, because they are less sensitive to gene expression variations.
Moreover, in any such model of expression estimation, the residuals of the entire probe set containing a faulty probe are likely to be affected, so that errors are spread across the probe set and hence over the image. If one probe in a probe set is an outlier, e.g., very bright, all other probes would appear as slightly dark ghost images, similar to the 'ghosting' seen in Figure 4. Utilizing topological information for identification and elimination of blemishes has the advantage that suspect probes are identified before background correction, normalization and summarization take place. Thus, faulty data will not confound the preprocessing steps and further statistical analysis.
With the next generation of Affymetrix chips, the relevance of correcting for blemishes will increase further. Here, we analyzed U95 chips with 16 probe pairs per probe set. To make room for more probe sets, the number of pairs per set has been reduced to (as few as) 11 on the U133 chips. This not only increases the standard error by about 20%, and thus the effect of any artefacts on the results, but also reduces the ability of model-based methods to draw on probe set information. The number of neighbouring cells on a microarray, in contrast, is not adversely affected by reducing the size of the probe sets. In fact, smaller probe sets make it less likely that probe pairs from the same set are in close vicinity.
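The 20% figure follows from a short calculation. As an idealization introduced here for illustration, assume that the k probe pairs within a set contribute independent errors of equal variance σ², so that the standard error of a probe-set average scales as 1/√k:

```latex
\mathrm{SE}(k) = \frac{\sigma}{\sqrt{k}}, \qquad
\frac{\mathrm{SE}(11)}{\mathrm{SE}(16)} = \sqrt{\frac{16}{11}} \approx 1.21
```

Going from 16 to 11 pairs per set therefore raises the standard error by roughly one fifth under this assumption.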
We have presented an extremely simple method for finding blemishes on microarrays. The method's simplicity makes it robust and it does not rely on estimating model parameters. It sensitively tagged blemishes on chips that had passed our Gene Array Resource Center's quality control mechanism. Only one blemish (Figure 5) could have been readily seen in the raw images. That we found clear evidence of physical blemishes in the raw images for most of the artefacts identified on the pseudo images attests to the validity of the findings.
We have applied our method to an experimental dataset and were able to identify anomalies of different types. Approximately 25% of our chips are blemished, often more than once, and blemishes can cover areas from a few dozen to hundreds of probes. We examined the potential impact these blemishes have on the experiments. Failure to remove the blemishes from further analysis can materially affect the detection of subtle changes in experiments testing similar conditions. When applied to the Spike-in data set, the proposed method had an overall better sensitivity/(1-specificity) ratio.
For the future we propose to develop pattern recognition algorithms to automatically find and mask out suspected blemishes, and to modify the extant background correction and summarization algorithms to be able to properly handle missing data from blemish removal.
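Let $X^{(i)}$, $i = 1, \ldots, n$, represent the intensity values of the i-th of n chips, each consisting of m × m (e.g., 650 × 650) cells. Assuming that biological systems respond to relative, rather than absolute differences in gene expression, for each pair of chips a matrix of pointwise (log) ratios is defined as

$$R^{(i,i')} = \log X^{(i)} - \log X^{(i')} = \log \frac{X^{(i)}}{X^{(i')}}, \qquad i, i' = 1, \ldots, n,$$

with the logarithm and the division taken cell by cell.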
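As a minimal sketch of the pointwise log ratio for one pair of chips, the following uses plain nested lists. The function name `log_ratio`, the choice of base-2 logarithms, and the toy 2 × 2 intensities are assumptions made for illustration; real chips would be read from CEL files.

```python
import math


def log_ratio(chip_a, chip_b):
    """Cell-by-cell log2 ratio of two intensity grids of equal shape."""
    return [
        [math.log2(a) - math.log2(b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(chip_a, chip_b)
    ]


# Two hypothetical 2 x 2 chips (toy intensities, not real data).
x1 = [[100.0, 200.0], [400.0, 800.0]]
x2 = [[100.0, 100.0], [800.0, 800.0]]

r12 = log_ratio(x1, x2)
r21 = log_ratio(x2, x1)  # antisymmetric: r21 is the negative of r12
print(r12)
```

Because swapping the two chips only flips the sign, it is enough to compute each pair once.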
Given that the intensity at each cell is highly determined by the sequence of the probe , the spatial distribution of differences in log-intensities should have no identifiable features, except for probes belonging to probe sets related to the genes that are differentially expressed under the conditions the samples were taken. Here, we assume that the proportion of differentially expressed genes is small. Thus, since probes belonging to a probe set are (more or less) randomly distributed across the chip, cells of related genes are rarely located next to each other, so that no obvious pattern should be discernable. If, however, chip X (i) has a localized 'defect', this should result in a similar pattern across all R (i,i'≠i) in the region of the defect. To allow for visual inspection of such pattern, we draw on the fact that the distribution of differences in log-intensities should be (more or less) symmetrical, except for outliers caused by rare events affecting small areas in particular chips. Probe-wise outliers (due to both differential expression and defects) can be identified by comparing each chip to a measure of central tendency derived from all other chips. Although other measures of central tendency will be discussed below, we start our discussion with the special case of the arithmetic mean, which is known to be optimal in the classical linear model ()
Let

$$R^{(i,i')} = \Delta^{(i,i')} + D^{(i)} - D^{(i')} + \varepsilon \qquad (1)$$

where $\Delta^{(i,i')}$ indicates the random contribution from the differentially expressed genes, $D^{(i)}$ describes the defects of the i-th chip, and $\varepsilon$ other random errors. Then, $D^{(i)}$ contributes not only to $\bar{R}^{(i,\cdot)}$ (bars indicating the average over the index replaced by a dot), but also, albeit with only 1/n of the intensity, to each of the other $\bar{R}^{(i',\cdot)}$ as a 'negative shadow' or 'ghost' image. As the number of chips n increases, however, the law of large numbers allows for approximating the linear equation system (1), with hats indicating estimators, as

$$\bar{R}^{(i,\cdot)} \approx \hat{D}^{(i)} - \bar{\hat{D}}^{(\cdot)} \qquad (2)$$

From (2), we get the linear equation system:

$$\left(I - \tfrac{1}{n} J\right) \hat{D} = \bar{R}$$

where $I = (\delta_{j=j'})_{j,j'=1 \ldots n}$ and $J = (1)_{j,j'=1 \ldots n}$. A system $\left(I - \tfrac{1}{n} J\right) Y = D$ has the trivial solution $Y = D$ whenever column sums are zero ($JY = 0$). As (2) guarantees that $\sum_i \bar{R}^{(i,\cdot)} = 0$, setting $\bar{\hat{D}}^{(\cdot)} = 0$ yields the solution

$$\hat{D}^{(i)} = \bar{R}^{(i,\cdot)}$$

as the linear model estimate for the deviation of the i-th chip from the other chips. As the number of chips increases, ghosting reduces, so that any discernible pattern in $\hat{D}^{(i)}$ in the limit would suggest a defect.
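The ghosting effect and its decay with the number of chips can be illustrated for a single cell. The sketch below is a toy example under stated assumptions: all chips share the same hypothetical log-intensity at that cell, only chip 0 carries a defect, and the function names are invented for illustration.

```python
def mean_deviation(log_values):
    """Deviation of each chip from the average over all chips, for one cell."""
    average = sum(log_values) / len(log_values)
    return [v - average for v in log_values]


def single_defect(n_chips, defect=2.0, baseline=8.0):
    """Toy cell: identical log-intensity on all chips, plus a defect on chip 0."""
    values = [baseline] * n_chips
    values[0] += defect
    return mean_deviation(values)


few = single_defect(4)    # chip 0 keeps 3/4 of the defect, ghost is -1/4 of it
many = single_defect(20)  # chip 0 keeps 19/20 of the defect, ghost is -1/20

print(few[0], few[1])
print(round(many[0], 3), round(many[1], 3))
```

With more chips, the estimate for the blemished chip approaches the full defect while the negative ghost on the clean chips shrinks towards zero.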
The above justification for obtaining residuals within the linear model by subtracting the average is well known. Still, spelling out and justifying the individual steps above helps in two ways. First, we can fine tune the method for the particular situation we are faced with and, second, we can provide numerical examples comparing the proposed non-parametric with the traditional parametric approach. The justification for the choice of the arithmetic mean (average) as the measure of central tendency in linear models relies either on the law of large numbers and the central limit theorem or on the assumption that the distribution of errors is symmetrical, in general, and Gaussian, in particular. Neither assumption is easily justified for the errors caused by defects on a chip.
The arithmetic mean is known to be relatively sensitive to outliers. Thus, to discriminate outliers from observations close to the centre of the non-outliers, one would need either a very large number of chips or a measure of central tendency that is less likely to be affected by the outliers themselves. While microarray 'experiments' now typically consist of more than a single chip, the number of chips analyzed under comparable conditions is still too small to rely on the central limit theorem for outlier detection. With the number of chips in the single digits, even 'Winsorization' may not be feasible. Moreover, the need for choosing some Winsorization cut-off points adds an undesirable level of arbitrariness to the results. The median, as the most robust form of Winsorization, provides for a simple alternative measure of central tendency:
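By analogy with the mean-based estimate, the median-based deviation of chip i at a given cell can be taken as that chip's log-intensity minus the median over the chips. Whether chip i itself enters the median is an assumption of this sketch (here it does), and the five log2 intensities are hypothetical. The example contrasts the two measures of central tendency on one cell.

```python
from statistics import mean, median


def deviations(log_values, center):
    """Subtract a measure of central tendency from each chip's log-intensity."""
    c = center(log_values)
    return [v - c for v in log_values]


# Hypothetical log2 intensities of one cell on five chips; chip 0 is blemished.
cell = [10.0, 8.1, 7.9, 8.0, 8.2]

by_mean = deviations(cell, mean)      # the outlier pulls the centre upwards
by_median = deviations(cell, median)  # the centre stays with the clean chips

print([round(d, 2) for d in by_mean])
print([round(d, 2) for d in by_median])
```

In this toy case every clean chip receives a negative mean-based deviation (the ghost), whereas the median-based deviations of the clean chips scatter around zero and the blemished chip stands out more clearly.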
The authors wish to thank Marcelo O. Magnasco for helpful discussions and support. M.S.F. acknowledges a Woman in Science fellowship from RU. K.M.W. was supported in part by GCRC grant M01-RR00102 from the National Center for Research Resources at the National Institutes of Health. This paper is, in part, based on a presentation given at the 2004 Joint Statistical Meetings in Toronto, Canada.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.