Harshlight: a "corrective make-up" program for microarray chips
© Suárez-Fariñas et al. 2005
Received: 18 August 2005
Accepted: 10 December 2005
Published: 10 December 2005
Skip to main content
© Suárez-Fariñas et al. 2005
Received: 18 August 2005
Accepted: 10 December 2005
Published: 10 December 2005
Microscopists are familiar with many blemishes that fluorescence images can have due to dust and debris, glass flaws, uneven distribution of fluids or surface coatings, etc. Microarray scans do show similar artifacts, which might affect subsequent analysis. Although all but the starkest blemishes are hard to find by the unaided eye, particularly in high-density oligonucleotide arrays (HDONAs), few tools are available to help with the detection of those defects.
We develop a novel tool, Harshlight, for the automatic detection and masking of blemishes in HDONA microarray chips. Harshlight uses a combination of statistic and image processing methods to identify three different types of defects: localized blemishes affecting a few probes, diffuse defects affecting larger areas, and extended defects which may invalidate an entire chip.
We demonstrate the use of Harshlight can materially improve analysis of HDONA chips, especially for experiments with subtle changes between samples. For the widely used MAS5 algorithm, we show that compact blemishes cause an average of 8 gene expression values per chip to change by more than 50%, two of them by more than twofold; our masking algorithm restores about two thirds of this damage. Large-scale artifacts are successfully detected and eliminated.
Analysis of hybridized microarrays starts with scanning the fluorescent image. The quality of data scanned from a microarray is affected by a plethora of potential confounders, which may act during printing/manufacturing, hybridization, washing, and reading. For high-density oligonucleotide arrays (HDONAs) such as Affymetrix GeneChip® oligonucleotide (Affy) arrays, each chip contains a number of probes specifically designed to assess the overall quality of the biochemistry, whose purpose is, e.g., to indicate problems with the biotinylated B2 hybridization. Affymetrix software and packages from Bioconductor project for R  provide for a number of criteria and tools to assess overall chip quality, such as percent present calls, scaling factor, background intensity, raw Q, and degradation plots. However, these criteria and tools have little sensitivity to detect localized artifacts, like specks of dust on the face of the chip, which can substantially affect the sensitivity of detecting physiological (i.e., small) differences. In the absence of readily available safeguards to indicate potential physical blemishes, researchers are advised to carefully inspect the chip images visually [2, 3]. Unfortunately, it is impossible to visually detect any but the starkest artifacts against the background of hundreds of thousands of randomly allocated probes with high variance in affinity.
In  a simple method to "harshlight" blemishes in HDONAs chips was presented. The method produces an Error Image (E) for each chip, which indicates the deviation of this chip's log-intensities from the other chips in the experiment. Formally, E is calculated as E (i) = L (i) - median j L (j) where L (j) is the log-intensity matrix of chip i. Given that the intensity of each cell is highly determined by the sequence of the probe , this deviation should be near zero except for the probes belonging to the probe sets related to genes that are differentially expressed. In earlier Affymetrix chips, the probe pairs corresponding to a single probeset were located in adjacent positions on the array, but now probe pairs are randomly distributed on the chip , so that no obvious pattern should be discernable in E.
In about 25% of the chips we have seen, the error image shows artifacts with strikingly obvious patterns, which often hint to the physical cause of the blemish. While this makes such blemishes visible to the human eye, manually masking the defects is impractical except for small sets of chips and introduces undesirable subjectivity. Thus, we developed an R-package with subroutines in C, to automatically spot suspicious patterns in the error image (E) using a battery of diagnostic tests based on both image processing and statistical approaches.
For testing and developing purposes, several sets of chips were used, including chips from Affymetrix SpikeIn (HUG133 and HUG95) experiments  and from three other experiments undertaken at Rockefeller University facilities, for a total of 158 chips. These include a variety of experimental sets: HGU133a chips on embryonic stem cell samples , two clinical studies on psoriasis , undertaken using blood and skin samples (Haider A., personal communication), and a study on microglia cells (Kreek M.J., personal communication).
We have developed pattern recognition methods specifically tailored to each type of defect. We will describe them and how they are deployed in the next sections.
Having a batch of chips from a single experiment, the error image E is obtained for each chip as described above. Our algorithm detects patterns of outliers in these error images, so it is important to notice that henceforth, unless otherwise noted, we shall refer exclusively to the error images. In a typical experiment, only a small number of genes are expected to be differentially expressed. Thus, most pixels in E should be close to zero. Since probes belonging to a probe set are randomly distributed over the chip, variance in gene expression should not lead to spatially correlated patterns. Therefore any discernable pattern of outliers in E signals a defect. Harshlight automatically identifies those patterns and returns the batch of masked chips. The user may choose whether the intensity values of defective probes are to be substituted by missing values or by the median of the intensity values of the other chips (default).
Defects covering a large area (extended defects), can cause substantial variation in the overall intensity from one region of the chip to another, thereby compromising the assumption that most cells have only a small deviation from the median. To quantify such variation, we decomposed the error image E as:
E = B E + η E
where B E and η E represent, respectively, the background and features of the error image E. Please note that B E is a background in an image analysis sense, and it should not be confused with the optical background of the original chip image that is addressed in background correction procedures; B E is not related to the "dark" area of the original image and in fact can have either sign. sSimilarly, the "features" of the error image are its local spatial variations, and also can have either sign. Ideally, in an unblemished chip, the features η E originate in differentially expressed genes, which are expected to be spatially randomly distributed with mean zero and variance . Assuming background and features to be uncorrelated, this allows the variance of E to be decomposed as:
To estimate , the image is smoothed with a median filter, a technique commonly used in image processing to eliminate single-pixel noise. The median filtered image , created trough a sliding median kernel, is an estimator of the background B E as is defined by
where Θi are the pixels in the window centered in i. In our case, the mask used is a circular window with a user defined radius (default = 10 pixels). At the edges of the image, the part of the mask that lies outside of the chip borders is filled with the image mirrored at the border.
Since the background is locally constant, we have approximately:
where n δ is the number of pixels in the window, which is equal to 102π ≈ 314 in the case of a circular window with radius δ = 10. Thus, is a good approximation of the background variance.
In an unblemished chip, the variance of the deviations from the median chip, , is mainly due to the variance of the features η, and the background B E should suffer small variations across the chip, i.e. >> . However, in an image such that in Figure 1B there is a large variation of the background from one region to another. Thus, the proportion of variations in E explained by the background, namely / , quantifies the extent of such defect. If this quantity is bigger than a certain threshold, the chip should be discarded.
This kind of extended defect was rare; we only found three seriously flawed chips among the 158 chips we analyzed. The percentage of the estimated variance explained by the background / varied across chip collections; chips handled by our local facility had a median of 3% and always had <9% variations. The SpikeIn experiments had substantially larger ratios, and in the case of SpikeIn95, three outlier chips at 33%, 36% and 60% (the chip in Figure 1B). No chip in our collection has ratios between 17% and 33%, so any number in that range seems a reasonable threshold given our limited statistics. Since typically ση~0.4 in log2 units, chips with large ratios can be materially distorted; the background of the chip in Figure 1B has σB = 0.5, so the intensities in the bright region are more than double the intensities in the dark area.
We do not have enough data to ascertain what causes extended defects; since the chips are scanned by a laser-scanning system, extended defects are not caused by changes in illumination level or other simple physical causes. We therefore do not currently know what an appropriate remedy would be, so if an extended defect is detected analysis is stopped for this chip, and suggest to the user the chip should be discarded.
If the chip passed the previous test, analysis continues. First the chip is searched for compact defects, defined as small connected clusters of outliers in the error image E. As probe pairs are randomly distributed, differential gene expression leads to spatially uncorrelated variations. In good chips, the outlier pixels of E should not be connected, so connected outlier pixels indicate compact defects.
For each outlier image, the FloodFill algorithm is then used to detect clusters of connected outliers. For every flagged pixel in the image, the algorithm recursively looks for other flagged pixels in its neighborhood. If any are found, the pixels are assigned to the same cluster number. The process stops when no more connected pixels can be found (see Figure 3C for the resulting image). The user can choose whether two pixels are considered connected if they share only an edge (4 -neighbours connectivity) or also a corner (8-neighbours connectivity, default).
After each cluster is defined, the significance of its size s can be easily computed as 1-F(s), where F is the cumulative distribution of cluster size under the null hypothesis in a chip of the same dimension and proportion of outliers. If the size significance is bigger than a user-defined threshold (default α = 0.01), the cluster is discarded and not considered as a blemish (Figure 3D). In addition, its size is also compared to the minimum cluster size accepted (user defined, default = 15 pixels): again, if the cluster is not large enough it is not considered. The collection of chips we have examined displays a large number of compact defects, many of them quite large.
A histogram of the size distribution of compact defects, contrasted with the null hypothesis distribution derived from simulations, is shown in Figure 4. Within the range of a few units of area through several hundred probes, the distribution of compact defects can be approximated by a power law (similar to the Zipf law in linguistics) N(A)~A-3.1, while the null distribution falls off exponentially as N(A)~e-2.12A. Therefore for even moderately small areas the significance of such clusters is extremely high.
Areas covered by compact defects are excluded from the chip before continuing with the next step.
Diffuse defects are defined as areas with densely distributed, although not necessarily connected outliers.
The first step involves, as in the case of compact defects, the declaration of outliers. To avoid penalizing chips with small error variance, we declare outliers those pixels whose intensity values are higher (bright outliers) or lower (dark outliers) by a certain percentage than the expected intensity. In terms of the E, this criterion (also used in) implies that if pixels with x% of decrease in intensities are considered dark outliers, the dark outlier image can be defined as:
Both Outlier Images (one for dark outliers, one for the bright ones, overlaid in Figure 5A), are scanned with a circular sliding window of user-defined radius (default radius δ = 10). The borders are duplicated as described for the extended defects. For every pixel i in the Outlier image, the proportion of outliers in the surrounding circular window Θi is computed as:
A binomial test is then used to decide whether p i is larger than the overall proportion p o of outliers in the image, i.e. to test p i >p o vs. p i = p o .. A new image (Figure 5B) is then created as
where b α(p,n) is the α-percentile of the binomial distribution. D gives a better representation of the regions with high proportion of outliers, since the disconnected pixels in the Outlier Image appear now more connected (see Figure 5B).
The FloodFill algorithm is then used to detect connected flagged pixels (D i = 1) as before, and clusters of small size are discarded. The user can set the size limit of the clusters, but the default value is three times the area of the sliding window.
Finally, to better outline the area of blemishes, the image undergoes a closing procedure. This is a technique commonly used in image processing to close up breaks in the features of an image (see for example ). In our case, a circular kernel is centered in each pixel of the image (radius = radius of the kernel used to detect the diffuse defects, see later). Its centre is flagged if any of the pixels of the kernel is flagged.
This procedure (dilation) causes the borders of the defects to grow, eventually filling empty spaces inside the features. However, in order to maintain the original outer borders of the features, another circular kernel is applied to the image. This time, the centre of the window is flagged only if all of the pixels inside the window are flagged. This procedure (erosion) reverses any extension beyond the compact hull of the original cluster.
We suggest that all probes in the closed area should be masked, but the user can choose to mask only the outlier probes.
Real compact defects are isolated and highly connected clusters, so that after the closing procedure their area remains substantially the same. On the contrary, probes declared as compact defects that are part of diffuse defects are close to one another, and therefore after the closing procedure the area covered by the resulting cluster is appreciably bigger than the area covered by the compact defects alone. Comparing the extension of the areas before and after the procedure gives us an idea of how many compact defects there are in a specific region. The ratio between these two areas (Figure 6C and 6D) represents the density of compact defects in the region (Figure 6E). If this value is smaller than a threshold (default = 50%), the compact defects in the region are probably part of diffuse defects and shall not be eliminated when the compact detection procedure ends.
The package was implemented in R in compliance with the CRAN guidelines. Computationally intensive routines were implemented in C (R shared library builder 1.29 and GCC version 3.2.3) through the R interface for better efficiency.
The main Harshlight function accepts a Bioconductor object from the class affyBatch. For each batch of chips analyzed, the program returns two outputs: a file report in PostScript format and a new affyBatch object. The report shows, for each chip analyzed, the number and type of defects found, the percentage of the area eliminated after the analysis, an image is produced for each kind of defect, showing the areas where the blemishes are found. The output affyBatch object is identical to the input, except that the values within defects are declared missing. If some downstream subroutine does not allow for missing data, the user may choose to have missing data be substituted with the median of the other chips' intensity values for the blemished probe; this is a neutral substitution strategy, since it sets the error image values to zero on the blemished probe without affecting any other value. In general, the efficacy of an imputation method depends on what analysis is used downstream of it; because of this, only the median substitution method has been built into Harshlight. Other imputation methods can be still used, as functions taking an Affybatch object with missing values.
Default values for the parameters used in the program
Radius of the median filter
Threshold for the proportion of variance explained by the Background
Quantile for the definition of outliers
5th, two tails
Minimum size of clusters
Threshold for bright defects
40% more than original value
Threshold for dark defects
35% more than original value
p-value threshold for binomial test.
Radius of sliding window
Minimum size of clusters
We have built an algorithm upon a recently-developed methodology to visualize artifacts on HDONA microarrays, which automatically masks areas affected by these artifacts; we present an implementation of the algorithm in an R package, called Harshlight. The algorithm combines image analysis techniques with statistical approaches to recognize three types of defects frequent in Affymetrix microarray chips: extended, compact, and diffuse defects. The algorithm was tested on 158 chips, from 5 different experiments, including the two Affymetrix SpikeIn experiments. Output reports for all chips can be found online at.
That blemishes exist in fair abundance is clear from those output reports as well as from Figure 4. We shall now demonstrate that these blemishes affect the gene expression values and that Harshlight can restore this damage.
Different summarization algorithms are expected to resist blemishes differently, based on their statistical construction. We shall concentrate here on two popular algorithms, MAS5 and GCRMA. MAS5 is the "official" algorithm supplied by Affymetrix and by far the most widespread; GCRMA is an open-source method available in the Bioconductor suite, based on robust averaging techniques and sequence-dependent affinity corrections . The robust averaging employed in GCRMA should confer strong immunity to outliers. We shall show below that MAS5 is strongly affected by blemishes, and that GCRMA is affected to a smaller, yet still relevant extent.
We quantified the ability of Harshlight to apply "corrective make-up" using two distinct strategies: first, by artificially blemishing a "clean" dataset and verifying how much the values are affected and how well they are restored, and second, by using a case where nominal concentrations are known, the Affymetrix SpikeIn experiments.
In earlier chips the probesets were laid contiguously in space, so it was possible to detect localized defects by observing that the probeset had an "outlier" pattern . However, the entire probeset would have to be discarded, and the gene expression information would be lost. Correlated location in space precludes use of Harshlight on those earlier chips (e.g., HUGeneFL). The random allocation of probe pairs in newer generations of chips permits robust methods like GCRMA to partly resist damage by blemishes; and, in conjunction with a method like Harshlight, to restore the expression values for many affected genes.
While we have developed this method on the basis of our error image detection of outliers, in principle the residuals of any model such as  or GCRMA could be used to identify individual probes on a chip as outliers. To facilitate the integration of various methods, Harshlight accepts error images generated by other programs; if none are provided, then the error images are computed. We have, however, not yet explored the appropriate null hypothesis for these methods.
We have presented an R package that provides a way to automatically "harshlight" artifacts on the surface of HDONA microarray chips. The algorithm is based on statistical and image processing approaches in order to safely identify blemishes of different nature and correct the intensity values of the batch of chips provided by the user. The corrections made by Harshlight improve the reliability of the expression values when the chips are further analyzed with other programs, such as GCRMA and MAS5.
It has been shown that microarray results are affected if blemished chips enter the pipeline of the analysis; blemished probes may have values differing from the correct value by much more than the typical error, so blemishes are expected to have a particularly strong impact on experiments trying to discriminate subtle differences between samples or in a clinical diagnosis context. We present Harshlight in the hope it shall be a useful tool in quality assessment of microarray chips and will help improve microarray analysis.
Project name: Harshlight
Project home page: http://asterion.rockefeller.edu/Harshlight
Operating system(s): Platform independent, tested upon Red Hat Linux and is being under testing on Windows XP systems
Programming language: R, C
Other requirements: R 1.8.0 or higher
License: GNU, GPL
Any restrictions to use by non-academics: license needed
M.S.F. acknowledges a Woman in Science fellowship from RU. K.M.W. was supported in part by GCRC grant M01-RR00102 from the National Center for Research Resources at the NIH. We would like to thank Steffen Bohn for helpful comments and discussion.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.