Application of a correlation correction factor in a microarray cross-platform reproducibility study

Background Recent research examining cross-platform correlation of gene expression intensities has yielded mixed results. In this study, we demonstrate use of a correction factor for estimating cross-platform correlations. Results In this paper, three technical replicate microarrays were hybridized to each of three platforms. The three platforms were then analyzed to assess both intra- and cross-platform reproducibility. We present various methods for examining intra-platform reproducibility. We also examine cross-platform reproducibility using Pearson's correlation. Additionally, we previously developed a correction factor for Pearson's correlation which is applicable when X and Y are measured with error. Herein we demonstrate that correcting for measurement error by estimating the "disattenuated" correlation substantially improves cross-platform correlations. Conclusion When estimating cross-platform correlation, it is essential to thoroughly evaluate intra-platform reproducibility as a first step. In addition, since measurement error is present in microarray gene expression data, methods to correct for attenuation are useful in decreasing the bias in cross-platform correlation estimates.


Background
Previous microarray gene expression studies have examined within-platform reproducibility among different generations of the Affymetrix GeneChip [1,2] and among cDNA-based array platforms [3,4]. Subsequently, several cross-platform reproducibility studies have been reported, many of which examined either the consistency of intensities or the consistency with which different platforms identify significantly differentially expressed genes [5][6][7][8][9][10][11][12][13][14][15][16][17][18]. Results from another large cross-platform study, the MicroArray Quality Control (MAQC) project, led by the US Food and Drug Administration with 51 participating universities and major biotechnology companies, have also been reported [19][20][21][22][23][24]. Some of these early studies demonstrated poor cross-platform correlations. For example, among 384 genes commonly declared present in a cDNA-based microarray and the Affymetrix HG-U95Av2 GeneChip platform, the Spearman correlation was only 0.131. Other cross-platform studies also reported low cross-platform correlations [5,8]. In addition, in a study examining three microarray platforms in ten laboratories, correlations between Affymetrix and two-channel arrays ranged from 0.13 to 0.57 [25]. More recent research has demonstrated that poor correlations may be observed when at least one platform under examination suffers from low intra-platform reproducibility or when a poor data analytic method is applied [26].
Most of these studies estimated Pearson's correlation as a means of assessing cross-platform reproducibility. That is, we consider X and Y to be microarray gene expression values from two different platforms, and $\rho_{XY}$ is estimated.
However, for microarray data, both random variables X and Y are subject to measurement error. It is well known that the fluorescent intensities from the scanned microarray images are proxies for the true underlying gene expression values [27]. Therefore, microarray gene expression values are measured with error. When examining cross-platform correlation, inconsistencies in measured intensities can be due to systematic platform biases as well as random intra-platform variability. Statistical methods that account for measurement error (ME), such as regression calibration, have been applied in a variety of scenarios to correct for the known bias caused by ME in parameter estimation [28]. In a recent review, the authors stated that within the next 5 years, "calibration methods will be introduced to systematically correct ratio underestimation by microarray technology" [29]. We have undertaken such an effort by developing a "disattenuated" correlation estimate [30], which accounts for random intra-platform variation in both X and Y, and here we demonstrate its use in measuring cross-platform correlation.
Microarray hybridizations were performed using three different technologies, each in a different laboratory. The Affymetrix (Affy) HG-U133A GeneChip was utilized in Virginia Commonwealth University's (VCU) Division of Molecular Diagnostics Laboratory. A custom oligonucleotide microarray designed specifically to interrogate genes more commonly expressed in brain tissue was used in VCU's School of Engineering's Center for Bioelectronics, Biosensors and Biochips (C3B). The C3B microarray platform comprises 10,000 genes represented by 3' fifty-mer oligonucleotides (MWG Biotech) that were spotted in duplicate. Finally, a cDNA microarray spotted with full and partial length PCR probes (Research Genetics/Invitrogen) was used in George Mason University's (GMU) Center for Biomedical Genomics and Informatics.
Each laboratory designed a small experiment to assess intra-platform quality control. Each laboratory used the same lot of reference RNA, the Stratagene Total Human RNA, for hybridizing a set of technical replicates for a process variability study. These 'self-self' hybridizations permit meaningful assessments of reproducibility since, under ideal circumstances (the same experimental conditions among platforms and no probe-binding affinity effects), each gene across the set of chips should exhibit linearly related gene expression intensities across platforms. Although the RNA hybridized was from the same lot, the study designs and protocols differed from lab to lab. A description of each experiment can be found in the Methods section of this paper.

Within-platform comparisons
Prior to estimating cross-platform correlations, we performed a thorough examination of intra-platform reproducibility, as recommended [29]. Since the Stratagene Total Human RNA was used as both the experimental and reference sample, the expected ratio for all genes is 1 (that is, a log2 ratio of 0), so that no correlation is expected when comparing two arrays in terms of the log2 ratio. Therefore, for two-channel arrays, we restricted attention to intensities from one channel as well as to the post-normalized intensities from that same channel. For the Affymetrix GeneChip, intensities were highly correlated across the set of three technical replicates for all expression summary methods (Table 1 and Figure 1). The GMU arrays were strongly correlated, though the C3B arrays were not highly correlated (Figures 2 and 3).
The weighted kappa statistics indicated that the Affymetrix platform had the highest agreement among ranked intensities (Table 2), followed by the GMU array, which also exhibited good agreement among the technical replicates when considering the ranked gene intensities. The weighted kappa statistics for the C3B platform suggested that the ranked intensities from the three technical replicates were not in agreement, yielding a non-significant p-value for two of the array comparisons. A similar conclusion, that the Affymetrix platform followed by the GMU array demonstrated the highest reproducibility, with low reproducibility among the C3B arrays, was reached upon examination of the proportion of invariant features (Table 3). Although intra-platform reproducibility varied among the three platforms studied, all platforms yield gene expression intensities that are subject to some degree of measurement error.

Cross-platform comparisons
For the GMU array, the 21,168 spots correspond to 19,894 distinct clones, with the feature name of each spot denoted by Unigene ID. There were 2,744 Affy probe sets that matched a GMU Unigene ID. Among these, 145 Unigene IDs were interrogated by more than one probe set. After restricting attention to unique clones and probe sets, there were 2,587 unique probe sets/clones in common to the GMU and Affy platforms. For the C3B arrays, since the design is essentially two identical subarrays laid out in duplicate, with the feature name of each spot denoted by RefSeqID, the average expression for each RefSeqID was calculated prior to merging the spots with the Affymetrix probe sets. That is, the 21,168 long oligos correspond to 10,040 distinct genes. For the C3B array, 9,000 distinct RefSeqIDs were interrogated by at least one Affymetrix probe set meeting our criteria. Once the data from the two different two-channel arrays were merged with the Affymetrix GeneChip data (i.e., GMU-Affy and C3B-Affy), the two resulting datasets were merged by Affymetrix probe set ID, resulting in 1,288 common probe sets/spots among the three platforms.
Figure 1. Affymetrix. Pairwise scatterplots and Pearson's correlation for Affymetrix GeneChips (MAS5 summaries) restricted to the 1,288 genes in common among the three platforms.
Not accounting for measurement error, the average Pearson correlations ($\hat{\rho}_w$) of the log-transformed Affymetrix GeneChip expression and C3B array expression are reported in Table 4 for MAS 5.0, RMA, and GC-RMA expression summaries as 'naïve' estimates of correlation. In addition, the disattenuated correlations ($\hat{\rho}$), obtained when considering that the C3B and Affy gene intensities are subject to measurement error, are also reported. The attenuation for the C3B arrays is 0.386; that is, over half of the variability is attributed to measurement error. Consequently, the disattenuated correlations estimated using measurement error models are substantially higher, irrespective of the Affymetrix expression summary method used. This suggests that previous use of Pearson's correlation underestimated the true underlying cross-platform correlations. That is, random intra-platform variation degrades the apparent cross-platform correlation, and removing it through measurement error methodology increases the estimated cross-platform correlation.
The average Pearson correlations ($\hat{\rho}_w$) of the log-transformed Affymetrix GeneChip expression and GMU array expression are also reported in Table 4 for MAS 5.0, RMA, and GC-RMA expression summaries, as well as the disattenuated correlations ($\hat{\rho}$). The attenuation for the GMU arrays is 0.824; therefore, the disattenuated correlations estimated using measurement error models are larger than their corresponding naïve estimates, though the increase is less marked than for the C3B arrays. This is due to the higher reliability among the GMU expression intensities.

Discussion
In this paper, both intra- and cross-platform reproducibility were examined for the Affymetrix and two dual-channel microarrays (C3B and GMU). We applied various methods for examining within-platform reproducibility, including Pearson's correlation, the weighted kappa, and the percent of invariant genes. We also examined cross-platform reproducibility using Pearson's correlation. We previously demonstrated the effectiveness of applying a correlation correction factor via a small simulation study and demonstrated its application in estimating gene-specific correlations. In this paper we demonstrated its use in estimating cross-platform reproducibility. We note that correcting for measurement error by estimating the "disattenuated" correlation corrects for the bias, or attenuation, inherent in cross-platform correlation estimates. Specifically, to the extent that random intra-platform variation is present, it degrades the apparent cross-platform correlation; removing random intra-platform variation through measurement error methodology therefore increases the estimated cross-platform correlation.
Due to the increased public availability of gene expression microarray data through Gene Expression Omnibus [31] and ArrayExpress [32], researchers are increasingly interested in methods that integrate the results from various microarray studies performed on similar types of samples [33][34][35][36][37]. A careful understanding of variability due to platform-specific bias and random intra-platform variability will help investigators select methods for integrating cross-platform results. Specifically, the amount of attenuation for a specific platform could be used as a platform-specific quality measure and incorporated into a meta-analytic framework [38]. Moreover, gene-specific attenuation factors could be used to adjust for quality in a gene-wise fashion in such models.
A major application of DNA microarray technology is differential gene expression profiling, or the detection of differences in expression levels of genes between two different types of samples. Some have argued that the consistency of the differences via fold-change or ratio is a more relevant metric for assessing cross-platform comparability than intensities from a single channel. However, to estimate the correlation between fold-changes from two platforms, two different samples are needed. We therefore plan to use data from the MAQC project to examine cross-platform fold-change correlations. In addition, it has been suggested that a more relevant metric is not agreement in the identification of individual differentially expressed genes, but rather whether consistent and accurate predictions of sample class are obtained from the platforms being compared [39]. This metric should be included in such cross-platform studies as well.
Previous researchers demonstrated that single- and two-channel microarrays yield consistent results, and concluded that the selection of which technology to use is not necessarily a critical factor in the design of a microarray study [20]. Here we demonstrate the critical need to thoroughly evaluate intra-platform reproducibility, a finding which has been noted by others [26]. In this study, we examined two dual-channel platforms and the Affymetrix platform. While the C3B and GMU platforms are not widely used by the microarray research community, they do represent a class of microarrays that is commonly used: two-channel custom spotted/home-brewed arrays. Thus, we believe these results are of general interest to those who use both commercial and custom-designed arrays. While the C3B two-channel platform had poor reproducibility, the GMU two-channel and Affymetrix platforms had good reproducibility. We repeated the intra-platform analysis using the following three sets of randomly selected Affymetrix GeneChips: (6, 12, 2), (5, 16, 14), and (5, 2, 3). The intra-platform Affymetrix results were consistent with those presented in this paper. This high reproducibility of the Affymetrix GeneChip data has also been reported by other investigators [14,40]. These data have proven useful in selecting a platform for studying biological specimens being collected by our tissue bank. We recommend that prior to performing expensive microarray hybridizations using irreplaceable biological specimens procured from clinical studies, a thorough assessment of intra-platform reproducibility be conducted.
One limitation of this study is that platform is completely confounded with laboratory technician and protocol, that is, the platform-specific sequence of reactions, scanner, procedures and events involved in the production of microarray data. It was previously noted that there is a high positive correlation between technician experience and intra-platform correlation [25]. This is consistent with our findings, whereby a first-year graduate student performed the C3B hybridizations (intra-platform correlation of 0.656), while the GMU and Affy hybridizations were performed by Ph.D. faculty members (0.848 and 0.996, respectively). Future studies that control for external factors that may influence intra-platform reliability are warranted.
In calculating cross-platform correlation, we assumed that the correlation estimated using the 1,288 matching probes across the three platforms is representative of the expected correlation of genes in the human genome that could be represented on the platforms. Examination of absolute tag counts for the Stratagene Total Human RNA obtained using Serial Analysis of Gene Expression data (available from GEO #GSM1734) revealed that the intensity distribution of the 1,288 genes in common among the three platforms is not representative of the range of expected values (Figures 4, 5, 6, 7). Thus the commonly invoked procedure of estimating cross-platform consistency using only probes in common to all platforms suffers from bias related to genomic coverage and probe annotation. Future studies comparing commercially available and custom-designed arrays need to take this into consideration.

Conclusion
When estimating cross-platform correlation, it is essential to thoroughly evaluate intra-platform reproducibility as a first step. We also note that the commonly invoked procedure of estimating cross-platform consistency using only probes in common to all platforms suffers from bias related to genomic coverage and probe annotation. Future studies comparing commercially available and custom-designed arrays need to take this into consideration. Moreover, to the extent that random intra-platform variation is present, it degrades the apparent cross-platform correlation; removing random intra-platform variation through measurement error methodology therefore increases the estimated cross-platform correlation. Methods to correct for attenuation, such as that presented, are thus useful in decreasing such a bias in cross-platform correlation estimates. Platform-specific attenuation estimates may subsequently be used as a platform-specific quality measure and incorporated into a meta-analytic framework.
Figure 4. Histogram of log2 absolute tag counts from SAGE. Histogram of log2 absolute tag counts from Serial Analysis of Gene Expression using the Stratagene Total Human RNA for the 14,000 unique tags. Data available from GEO Accession #GSM1734.

Stratagene Technical Replicates Dataset
Previously, each laboratory designed a small experiment to assess intra-platform quality control. Each laboratory used the same lot of reference RNA, the Stratagene Total Human RNA, for hybridizing a set of technical replicates for a process variability study. These 'self-self' hybridizations permit meaningful assessments of reproducibility since, under ideal circumstances such as that the same experimental conditions exist among platforms and that there are no probe-binding affinity effects, each gene across the set of chips should exhibit linearly related gene expression intensities across platforms. Although the RNA hybridized was from the same lot, the study designs and protocols differed from lab to lab.
The Affy platform was assessed using an unbalanced three-factor design using 16 technical replicates [41]. The same reference RNA sample was examined in 16 different chips run on two days in four different modules of the Affymetrix fluidics workstation. Fresh fragmented cRNAs were hybridized to the first four GeneChips on Day 1, while frozen fragmented cRNAs were hybridized to the remaining four GeneChips on Day 1 and to all eight GeneChips processed on Day 2. To eliminate operator [...] Gene Array scanner. The full set of 16 Affymetrix GeneChips is publicly available [42].
Figure 6. Histogram of log2 average C3B signal. Histogram of log2 average C3B signal for the Stratagene Total Human RNA using the 1,288 genes in common among the three platforms.
Figure 7. Histogram of log2 average GMU signal. Histogram of log2 average GMU signal for the Stratagene Total Human RNA using the 1,288 genes in common among the three platforms.
At GMU, the RNA was amplified using the MessageAmp aRNA Kit (Ambion). The amplified RNA (aRNA) was quantified, its quality was monitored by agarose gel, and its average size was assessed with the Agilent 2100 Bioanalyzer. The same amount of aRNA (4 μg) was labeled with Cy3 and Cy5 according to The Institute for Genomic Research protocol and hybridized to three Human I chips. For each chip, the Stratagene Total Human RNA served as both the experimental and reference sample [43]. The ScanArray Express HT confocal laser scanner was used with settings of 75% photomultiplier tube gain, 75% laser power, and 10 μm pixel resolution. Images were acquired by ScanArray Express 2.0 software and processed with QuantArray software.
The C3B laboratory assessed the quality of their fabricated microarray using a fractional factorial design. The factors investigated were cDNA labeling strategy (3 levels: dye-conjugated nucleotide, aminoallyl, and Genesphere dendrimer labeling), input total RNA concentration ratio (3 levels: 1:1, 1:2, 1:4), hybridization time (2 levels: 4 and 16 hours), hybridization buffer (3 levels: Genesphere, MWG, and Amersham buffer), and production lot (2 levels: lot 7 and 9). Due to the expense of microarray production and hybridization, a fractional factorial design, rather than the full factorial design, was used. Therefore, not all combinations of experimental conditions were included. Specifically, by assuming that high-order interactions are negligible, information regarding the main effects and low-order interactions may be obtained by running only a fraction of the complete factorial design. Since we were interested in examining the effects of hybridization buffer (3 levels), RNA input ratio (3 levels), labeling strategy (3 levels), hybridization time (2 levels), and lot (2 levels), we were initially interested in a $3^3 \times 2^2$ design. However, due to the expense involved in running a full factorial microarray experiment, a $2^{8-2}$ fractional factorial design was adopted with defining relation I = ABCDG = ABEFH = CDEFGH. This resolution V design permits estimation of all main effects and two-factor interactions under the assumption that three-way and higher-order interaction terms may be ignored. Thus our experiment required 64 C3B arrays to be hybridized given the factors and levels of interest. Again, for each array the Stratagene Total Human RNA served as both the experimental and reference sample. Hybridized arrays were scanned with the ScanArray Express microarray scanner (Perkin Elmer) at 80% laser power, 70% PMT gain, and 5 μm scan resolution. Spot intensities were acquired from the images using QuantArray software.
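The run count and defining relation quoted above can be checked with a short sketch. It assumes the two generators implied by the defining relation, G = ABCD and H = ABEF, with factors coded as -1/+1. The letters A to H are those of the defining relation; how the laboratory factors map onto them is not stated in the text, so no mapping is attempted, and the function name is illustrative.

```python
from itertools import product

def fractional_factorial_2_8_2():
    """Enumerate a 2^(8-2) design with I = ABCDG = ABEFH = CDEFGH.

    Six base factors (A-F) take every -1/+1 combination; the remaining
    two columns are generated as G = ABCD and H = ABEF (an assumption
    read off the defining relation, not a documented lab protocol).
    """
    runs = []
    for a, b, c, d, e, f in product((-1, 1), repeat=6):
        g = a * b * c * d  # from I = ABCDG
        h = a * b * e * f  # from I = ABEFH
        runs.append((a, b, c, d, e, f, g, h))
    return runs

design = fractional_factorial_2_8_2()
print(len(design))  # number of arrays required by the design
```

Because the shortest word in the defining relation has five letters, no main effect or two-factor interaction is aliased with another main effect or two-factor interaction, which is what resolution V means.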
The analyses conducted in the current study were restricted to an equal number of chips by platform to ensure that one technology did not dominate the results simply because of having a larger sample size. Three arrays were hybridized at GMU, so a random sample of size 3 was taken from the 16 Affy hybridized samples. These three GeneChips were QAQC8.CEL (Day 1 Frozen), QAQC10.CEL (Day 2 Frozen), and QAQC13.CEL (Day 2 Frozen). The three replicates selected from the C3B fractional factorial study were chosen based on 'optimal' hybridization conditions identified from the fractional factorial experiment. Specifically, the number of genes found to be significantly different from the analysis of variance model was used as the metric estimating the relative influence of each main and two-factor interaction term. The level of each factor having the smallest number of genes differentially expressed was considered optimal. The three C3B chips used in this study were hybridized using the same buffer (Amersham), ratio of input experimental and control samples (1:1), and labeling method (Aminoallyl Post RT). The chips differed with respect to lot number and hybridization time, though these factors were found not to significantly influence the resulting intensities in the larger study.

Normalization
Since single-channel arrays measure expression intensities on an absolute scale whereas two-channel arrays measure expression intensities on a ratiometric scale, we first investigated intra-platform reproducibility using different methods for calculating gene expression to aid in our determination of how best to transform the intensities from the three platforms to a similar scale. In addition, since the objective included an assessment of platform-specific reproducibility across the set of available technical replicates, methods for within-array normalization, rather than methods that simultaneously normalize the data across all arrays, were applied in a platform-specific fashion.
For the two-channel arrays, we employed a commonly used procedure of normalizing the spot-level intensities on the array using print-tip loess regression and subsequently analyzing the normalized spot-level intensities [44]. Normalization removes, or at least reduces, systematic sources of variability attributed to technical artifacts of no interest, such as differences in deposition, labeling efficiency, and print tips. Specifically, due to spot differences attributed to deposition gain, print-tip, and dye effects noted among two-channel arrays, each two-channel array (C3B and GMU) was normalized by estimating the corrections for spots i = 1, ..., G by fitting print-tip loess regression models of the log difference $M_i = \log_2(\text{channel 1}_i / \text{channel 2}_i)$ on the log average $A_i = (\log_2(\text{channel 1}_i) + \log_2(\text{channel 2}_i))/2$ [45]. Probe intensities were then adjusted by $M_i^{*} = M_i - \hat{M}_i$, where $\hat{M}_i$ is the fitted value from the print-tip loess regression; $M_i^{*}$, therefore, represents the normalized log ratios [46]. In addition, to enforce an absolute expression measure, the normalized ratios were subsequently transformed to yield the channel 1 normalized intensities on the log2 scale by $A_i + M_i^{*}/2$ [44]. Background was estimated by the QuantArray software as the mean intensity among those pixels within the masked area between the 5th and 20th percentiles of intensities for a given spot. Since simple background subtraction has been demonstrated to increase spot-level variability [47], no background correction was applied.
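The transformation between channel intensities and the (M, A) scale can be sketched as follows. This is an illustration only: the trend within each print-tip group is estimated here by the group median of M, which is a deliberately simple stand-in for the print-tip loess fit used in the study, and all function and argument names are hypothetical. The back-transformation uses the identity log2(channel 1) = A + M/2, which follows directly from the definitions of M and A.

```python
import math
from statistics import median

def ma_transform(ch1, ch2):
    """Return log differences M and log averages A for paired spot intensities."""
    m = [math.log2(r / g) for r, g in zip(ch1, ch2)]
    a = [(math.log2(r) + math.log2(g)) / 2 for r, g in zip(ch1, ch2)]
    return m, a

def normalize_channel1(ch1, ch2, print_tip):
    """Normalize log ratios within print-tip groups and recover log2 channel 1.

    ASSUMPTION: the per-group median of M replaces the loess curve of M on A,
    so this corrects only a constant dye bias per print tip.
    """
    m, a = ma_transform(ch1, ch2)
    groups = {}
    for mi, tip in zip(m, print_tip):
        groups.setdefault(tip, []).append(mi)
    trend = {tip: median(vals) for tip, vals in groups.items()}
    m_star = [mi - trend[tip] for mi, tip in zip(m, print_tip)]  # normalized log ratios
    log2_ch1 = [ai + ms / 2 for ai, ms in zip(a, m_star)]        # normalized log2 channel 1
    return m_star, log2_ch1
```

For a self-self hybridization with a constant twofold dye bias, the normalized log ratios are all zero and the normalized channel 1 values equal A.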
The Affymetrix GeneChip Operating System (GCOS) was used to calculate expression summaries with a target intensity of 100 using the Microarray Suite version 5.0 (MAS 5.0) method [48]. For completeness, we also estimated expression using the robust multiarray average (RMA) [49] and GC-RMA methods [50], although these methods normalize and estimate probe set expression summaries utilizing data across the entire set of GeneChips and therefore may overestimate reproducibility. All normalization and expression summary methods were performed using the R software [51] and relevant Bioconductor packages [52].

Identifying common genes across platforms
The RESOURCERER annotation and cross-reference database [53] was developed to help investigators identify genes commonly interrogated by different microarray platforms. Other software tools such as MergeMaid [54], GeneHopper [55], MatchMiner [56], and ProbeMatchDB [57] have been developed for a similar purpose. Recent research has demonstrated improved cross-platform correlations when spots are matched by sequence rather than by gene identifiers [58][59][60].
Therefore, probe sets and spots with sequences common to all three platforms were retained for analysis using the following method. First, the GCG program 'netfetch' was used to obtain the NCBI GenBank records for spot IDs on the GMU and C3B microarray platforms. The perfect match (PM) probe-level sequence data for the Affymetrix HG-U133A GeneChip were downloaded from the Affymetrix website (06/14/2005). BLASTN (v2.2.10) was used to query the Affymetrix probe sequences against the C3B sequences. Thereafter, all probe sets for which at least 60% of the probes reported low E-values (E < 0.000001) for the same spot were retained as matches. This threshold was determined considering the breakdown bound of the Tukey biweight estimator used in the MAS 5.0 expression summary algorithm. M-estimators with a symmetric ψ-function have a breakdown bound close to 50%. Therefore, probe sets for which > 60% of the PM probes specifically interrogated the same RefSeqID were retained. For the C3B microarray, each RefSeqID is spotted two times on the array. For the intra-platform reliability study (Stratagene dataset), the average spot intensity per RefSeqID was retained as C3B gene expression. For the Affymetrix GeneChips, when multiple probe sets interrogated the same transcript, the probe set with the maximum proportion of probes with E < 0.000001 was retained first; when two or more probe sets had the same proportion, the most 3' probe set was retained, defined as the probe set with the maximum stop query sequence location among probes within a GenBank ID; when both quantities were the same, the probe set was randomly selected.
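The retention rule for a single probe set can be sketched as below. The input layout (one best BLAST hit per PM probe, given as a target ID and an E-value) and the function name are assumptions made for illustration; the text gives the threshold both as "at least 60%" and as "> 60%", and the sketch uses the inclusive form.

```python
from collections import Counter

def matched_target(probe_hits, min_fraction=0.6, e_cutoff=1e-6):
    """Return the target ID matched by a probe set, or None.

    probe_hits: one (target_id, e_value) pair per PM probe in the probe set;
    target_id is None when the probe has no hit (hypothetical layout).
    A probe set is retained when at least `min_fraction` of its probes hit
    the same target with E < `e_cutoff`.
    """
    if not probe_hits:
        return None
    counts = Counter(
        target for target, e_value in probe_hits
        if target is not None and e_value < e_cutoff
    )
    if not counts:
        return None
    target, n_hits = counts.most_common(1)[0]
    return target if n_hits / len(probe_hits) >= min_fraction else None
```

A probe set with 11 PM probes, 7 of which hit the same RefSeqID below the cutoff, would be retained (7/11 is about 0.64), whereas 6 of 11 would not.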
This process was completed separately for the Affy-C3B and Affy-GMU platform pairs. These two resulting datasets were merged by Affymetrix probe set ID, resulting in a dataset containing only genes in common to all three platforms.
All raw microarray files used in this study are publicly available [61].

Intra-platform analyses
It has been suggested that poor cross-platform correlation is likely a result of low intra-platform consistency [26]. Therefore, prior to estimating cross-platform reproducibility and gene-specific reliability, intra-platform reproducibility for three different microarray platforms was examined. After normalization and calculation of gene expression summaries, within-platform correlation was estimated using the average Pearson correlation for the K = 3 chips. In addition, reproducibility was examined by comparing the proportion of invariant genes across the set of technical replicates within a platform. Specifically, for spot i = 1, ..., G, the ranked expression for the k-th replicate of platform l is denoted by $R_{ikl}$. We then identified the rank difference for each spot i within platform l as $\Delta_{il} = \max_k(R_{ikl}) - \min_k(R_{ikl})$. A gene was designated as 'invariant' for platform l using the indicator $I(\Delta_{il}/G \le 0.05)$. As an example, this would correspond to permitting the rank to shift by no more than 1,114 when 22,283 genes are spotted on the array. Statistical tests of hypothesis comparing the proportions of invariant genes across platforms were conducted using a chi-square test. Finally, the weighted kappa statistic was estimated by first grouping gene expression intensities into 25 approximately equal-sized classes based on their ranked intensities, $y_i$. A weighted kappa statistic was used to allow a smaller penalty for misclassification among closely related classes, where the weights were taken to be $w_{rc} = 1 - 0.1 \times |r - c|$ when $|r - c| < 10$ and 0 otherwise.
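Both reproducibility summaries can be sketched in a few lines. The sketch below assumes untied intensities (ties are ranked in input order), takes the rank difference as the maximum minus the minimum rank across replicates, and uses the standard weighted kappa with agreement weights, (p_o - p_e)/(1 - p_e). Function names are illustrative and this is not the authors' code.

```python
def ranks(values):
    """Ranks 1..G by increasing value (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r

def invariant_proportion(replicates, tol=0.05):
    """Share of genes whose rank range across replicates is <= tol * G."""
    g = len(replicates[0])
    rk = [ranks(rep) for rep in replicates]
    n_inv = 0
    for i in range(g):
        gene_ranks = [r[i] for r in rk]
        if (max(gene_ranks) - min(gene_ranks)) / g <= tol:
            n_inv += 1
    return n_inv / g

def rank_classes(values, n_classes=25):
    """Assign each gene to one of n_classes roughly equal-sized rank classes."""
    g = len(values)
    return [(r - 1) * n_classes // g for r in ranks(values)]

def weighted_kappa(a, b, n_classes=25):
    """Weighted kappa with weights 1 - 0.1*|r - c| for |r - c| < 10, else 0."""
    def w(r, c):
        return 1 - 0.1 * abs(r - c) if abs(r - c) < 10 else 0.0
    n = len(a)
    obs = [[0.0] * n_classes for _ in range(n_classes)]
    for r, c in zip(a, b):
        obs[r][c] += 1 / n
    row = [sum(obs[r]) for r in range(n_classes)]
    col = [sum(obs[r][c] for r in range(n_classes)) for c in range(n_classes)]
    cells = [(r, c) for r in range(n_classes) for c in range(n_classes)]
    p_obs = sum(w(r, c) * obs[r][c] for r, c in cells)
    p_exp = sum(w(r, c) * row[r] * col[c] for r, c in cells)
    return (p_obs - p_exp) / (1 - p_exp)
```

Two arrays would be compared by passing `rank_classes(array_1)` and `rank_classes(array_2)` to `weighted_kappa`.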

Attenuation
When fitting a linear regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ for observed random variables $x_i$ and $y_i$ on observations i = 1, ..., n, it is assumed that $\varepsilon_i \sim (0, \sigma_\varepsilon^2)$, which is independent of $x_i$, and that $x_i$ is measured without error [62].
Using the formulas for estimating Pearson's correlation and the slope parameter $\beta_1$, Pearson's correlation can be shown to be $\rho_{XY} = \beta_1 \sigma_x / \sigma_y$. Therefore, Pearson's correlation measures the strength of the linear relationship between X and Y.
For a general problem, suppose $x_i$ cannot be measured precisely but rather is measured with error. Denote the error-prone measurements $x_i^{*} = x_i + u_i$, where $u_i \sim (0, \sigma_u^2)$.
It is well known that fitting the model using the error-prone values leads to the attenuated estimate $\beta_1^{*}$ for $\beta_1$ [28]. That is, the slope estimate is biased. Therefore, when fitting a simple linear regression model using the error-prone measurements $x_i^{*}$, the least-squares estimate is an estimate of $\beta_1^{*} = \lambda \beta_1$, where $\beta_1$ is the true slope parameter describing the relationship between $y_i$ and $x_i$ and $\lambda$ is the attenuation factor.
The attenuation factor is given by $\lambda = \sigma_x^2 / (\sigma_x^2 + \sigma_u^2)$ and is used to estimate $\beta_1$ when measurement error is present in both X and Y [28].
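A small simulation can make the attenuation of the slope concrete. The sketch assumes the classical additive error model, in which the observed predictor is the true value plus independent noise with variance σ_u², so that the attenuation factor is σ_x²/(σ_x² + σ_u²). Parameter values and function names are arbitrary choices for illustration.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_attenuation(n=20000, beta0=1.0, beta1=2.0,
                         sd_x=1.0, sd_u=1.0, sd_e=0.5, seed=1):
    """Return (slope using true x, slope using error-prone x, lambda)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, sd_x) for _ in range(n)]
    y = [beta0 + beta1 * xi + rng.gauss(0, sd_e) for xi in x]
    x_star = [xi + rng.gauss(0, sd_u) for xi in x]  # error-prone measurements
    lam = sd_x ** 2 / (sd_x ** 2 + sd_u ** 2)       # attenuation factor
    return ols_slope(x, y), ols_slope(x_star, y), lam

slope_true, slope_naive, lam = simulate_attenuation()
print(slope_true, slope_naive, lam * 2.0)
```

With equal true and error variances, the attenuation factor is 0.5, so the slope fitted to the error-prone predictor should sit near half of the true slope, and dividing it by the attenuation factor recovers the true slope approximately.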

Estimating cross-platform correlation
From the intra-platform results, it is clear that microarray gene expression data are subject to measurement error. When estimating cross-platform correlation, let X and Y represent the random variables for two different platforms, known to be measured with error. That is, $x_i^{*} = x_i + u_i$, $u_i \sim (0, \sigma_u^2)$, and $y_i^{*} = y_i + v_i$, $v_i \sim (0, \sigma_v^2)$. The average Pearson's correlation ($\hat{\rho}_w$), which is not corrected for measurement error, can be estimated as $\hat{\rho}_w = \sum_i (x_i^{*} - \bar{x}^{*})(y_i^{*} - \bar{y}^{*}) / \sqrt{\sum_i (x_i^{*} - \bar{x}^{*})^2 \sum_i (y_i^{*} - \bar{y}^{*})^2}$, where $x_i^{*}$ is the average log2 Affymetrix intensity and $y_i^{*}$ is the C3B or GMU expression. However, a more appropriate measure, the "disattenuated" correlation [30], can be calculated as $\hat{\rho} = \hat{\rho}_w \sqrt{(\hat{\sigma}_x^2 + \hat{\sigma}_u^2)(\hat{\sigma}_y^2 + \hat{\sigma}_v^2)} / (\hat{\sigma}_x \hat{\sigma}_y)$, that is, the naïve correlation divided by the square root of the product of the two platform-specific attenuation factors. This estimate adjusts for the bias present in estimating the correlation when measurement error is present. Estimates for $\sigma_x$, $\sigma_u$, $\sigma_y$, and $\sigma_v$ were fit using the regression calibration rcal function in Stata version 9 [63]. In estimating $\sigma_u^2$ and $\sigma_v^2$, the repeated measurements were assumed to be unbiased for the true gene expression values. Moreover, any missing value was treated as missing at random. Previous investigators have reported high reproducibility estimates for Affymetrix expression values [14,40]; therefore, we were primarily interested in estimating the correlation between Affymetrix and the custom-designed arrays (C3B and GMU) that we have used in various cancer genomics projects. The disattenuated correlation, $\hat{\rho}$, and average Pearson correlation, $\hat{\rho}_w$, were estimated separately for the GMU and C3B platforms relative to Affymetrix.
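The correction can be sketched from technical replicates alone. The sketch below is not the Stata regression calibration routine used in the study. It assumes a simple method-of-moments estimate of the variance components (pooled within-gene variance for the error variance, and the variance of gene means minus the error variance divided by K for the true-value variance), takes the naïve correlation as the average Pearson correlation over all cross-platform array pairs, and divides it by the square root of the product of the two attenuation factors. All function names are hypothetical.

```python
import random
from statistics import mean, variance

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5

def variance_components(reps):
    """reps[i] holds the K >= 2 replicate log intensities of gene i.

    Returns (true-value variance, measurement error variance) by the
    method of moments (an assumption, see lead-in).
    """
    k = len(reps[0])
    s2_err = mean(variance(g) for g in reps)  # pooled within-gene variance
    s2_true = variance([mean(g) for g in reps]) - s2_err / k
    if s2_true <= 0:
        raise ValueError("no detectable between-gene variance")
    return s2_true, s2_err

def attenuation(reps):
    """Attenuation factor for a single array: true / (true + error) variance."""
    s2_true, s2_err = variance_components(reps)
    return s2_true / (s2_true + s2_err)

def naive_and_disattenuated(reps_x, reps_y):
    """Average pairwise cross-platform correlation and its corrected value."""
    kx, ky = len(reps_x[0]), len(reps_y[0])
    pair_r = [pearson([g[a] for g in reps_x], [g[b] for g in reps_y])
              for a in range(kx) for b in range(ky)]
    r_naive = mean(pair_r)
    return r_naive, r_naive / (attenuation(reps_x) * attenuation(reps_y)) ** 0.5

def simulate_platform(true_values, error_sd, k, rng):
    """Hypothetical helper: k noisy replicates per gene."""
    return [[t + rng.gauss(0, error_sd) for _ in range(k)] for t in true_values]
```

On simulated data in which the true cross-platform correlation is known, the corrected estimate should land much closer to the truth than the naïve one, with the gap widening as the attenuation factor of either platform falls.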