A methodology for global validation of microarray experiments
© Miron et al; licensee BioMed Central Ltd. 2006
Received: 30 March 2006
Accepted: 05 July 2006
Published: 05 July 2006
DNA microarrays are popular tools for measuring gene expression of biological samples. This ever increasing popularity ensures that a large number of microarray studies are conducted, many of which make their data publicly available for mining by other investigators. Under most circumstances, validation of differential expression of genes is performed on a gene-by-gene basis. Thus, it is not possible to generalize validation results to the remaining majority of non-validated genes or to evaluate the overall quality of these studies.
We present an approach for the global validation of DNA microarray experiments that will allow researchers to evaluate the general quality of their experiment and to extrapolate validation results of a subset of genes to the remaining non-validated genes. We illustrate why the popular strategy of selecting only the most differentially expressed genes for validation generally fails as a global validation strategy and propose random-stratified sampling as a better gene selection method. We also illustrate shortcomings of often-used validation indices such as overlap of significant effects and the correlation coefficient and recommend the concordance correlation coefficient (CCC) as an alternative.
We provide recommendations that will enhance validity checks of microarray experiments while minimizing the need to run a large number of labour-intensive individual validation assays.
Microarrays provide large-scale comparative gene expression profiles between biological samples by simultaneously detecting either expression or differential expression in thousands of genes. The lack of agreement among various technologies putatively measuring the same processes has prompted calls for microarray results to be validated with other technologies before they are published [1, 2]. Microarray findings are usually validated on a gene-by-gene basis to lend support to biological models. It is not always clear, however, to what extent the validation of these genes reflects the entire microarray experiment, in part because validation procedures often fall short of optimal sampling and statistical requirements. Thus, microarray results where only a handful of genes have been validated, with little concern regarding the remainder of the data, are common. Moreover, public repositories of microarray experiments contain an ever increasing number of archived studies whose quality cannot readily be evaluated. Global validation of microarray experiments is necessary to address these shortcomings. A global validation approach should provide an index of the quality of the fold-change estimates regarding all differentially expressed genes. Such an approach would be a valuable tool for investigators to assess the value of their microarray experiments. Furthermore, if routinely provided in archived gene expression experiments, global validation information would identify studies most useful for hypothesis generation and would also provide a study confidence index that could be used for several applications including meta-analysis and Bayesian network modelling. Three important aspects of global validation must be considered: what measurement should be validated, how should a subset of differentially expressed genes be chosen for follow-up, and what statistical evidence is needed to confirm validation of the microarray results in total?
It has been proposed that fold-change (FC), rather than raw expression, is the appropriate measure for comparing results across platforms [6, 7]. Otherwise, various technology-specific artifacts (e.g., probe-specific biases across microarray platforms, amplification bias in PCR) can compromise direct comparisons between gene expression measurements. At the same time, there is general consensus that whereas FC is a reasonable measure of effect size, it is inadequate as a test statistic.
Investigators may select genes for validation based on reagent availability or they may select genes based on ad-hoc or a-priori biological models. One common strategy is to select the largest FCs or the most statistically significant differentially expressed genes, based on the idea that large effects are more likely to be valid. Such non-random sampling strategies have limited utility as gene selection procedures because validation results do not readily generalize to the entire set of differentially expressed genes. Moreover, the "regression toward the mean" statistical artifact [9, 10], whose effects are exacerbated by selecting genes with the largest FCs, may lead to underestimating the global level of agreement between microarray and validation samples. We describe two random sampling strategies which provide data appropriate for global validation.
Finally, what statistical evidence is needed to assess global validation? Perfect agreement between two sets of FC measurements is indicated by correspondence of the paired data points to the identity line (regression slope of 1 with a y-intercept of 0). Extent of agreement between this identity line and the data is not captured, however, by the commonly-used Pearson r correlation coefficient. For example, high r² values could be obtained between two sets of observations even if the average of one set differed greatly from the other (good precision in predicting one set based on the other, but poor accuracy). By contrast, low r² values could be obtained, despite the two sets agreeing quite well on average (good accuracy in predicting one set based on the other, but poor precision). A better indicator of validation is provided by the concordance correlation coefficient (CCC) [11–13], which combines accuracy and precision coefficients in one index.
The CCC can vary from 1 (perfect agreement) to -1 (perfect reversed agreement), with zero representing no agreement. Its precision coefficient squared is the Pearson r², which measures how close the data points are to the least-squares regression line; the CCC's accuracy coefficient measures how closely the regression line agrees with the identity line. Precision-squared and accuracy can vary from zero (no agreement) to 1 (perfect agreement). CCC values can be small because precision is low, accuracy is low, or both.
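A small worked example may help separate the two components. Assume three hypothetical genes with microarray log FCs x = (1, 2, 3) and validation log FCs y = (2, 3, 4). These numbers are invented for illustration, the symbol C_b for the accuracy coefficient is an assumption of this sketch, and sample variances are taken with the n − 1 denominator. Every y equals x + 1, so the points lie exactly on a line parallel to the identity line:

$$
r = 1, \qquad s_x^2 = s_y^2 = 1, \qquad \bar{y} - \bar{x} = 1,
$$

$$
C_b = \frac{2\, s_x s_y}{s_x^2 + s_y^2 + (\bar{y} - \bar{x})^2} = \frac{2}{1 + 1 + 1} = \frac{2}{3},
\qquad
\mathrm{CCC} = r \cdot C_b = \frac{2}{3} \approx 0.67 .
$$

Precision is perfect in this case, yet the CCC falls well short of 1 because the constant offset lowers accuracy, something r alone cannot detect.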
We present simulated and empirical (microarray and qrPCR) data to illustrate deficiencies in selecting only the largest effects for retest and to propose better gene selection methods. We also examine frequently-used statistical metrics for assessing validation and contrast their performance with the CCC index. We show that CCC is a useful predictor of global validation of microarray experiments, and that it can be used as an index of quality for all microarray studies.
Performance of sampling strategies using simulated data
The top-ranked sampling procedure produced inferior estimates of all five statistical indices relative to random and random-stratified sampling. Slope and y-intercept values were accurately estimated across all sampling methods, although estimates for top-ranked sampling were highly variable (Figures 1C and 1D, 2A and 2B). Moreover, 6.8% of the slopes for the top-ranked samples were negative (i.e., in the opposite direction) compared to 0.02% (random-stratified) and 0.03% (random) (Figure 1C).
Top-ranked sampling also generated both highly inaccurate (downwardly biased) and highly variable estimates of precision, accuracy, and CCC compared to random-stratified and random sampling (Figures 1E,F, and 1G, 2C,D, and 2E, and Additional file 1). Moreover, only 32% of the top-ranked precision values were greater than 0.63 (the nominal p < 0.05, two-tailed significance level) compared to 93% and 87% for the random-stratified and random samples, respectively (uncorrected for multiple testing and assuming random sampling) (Figure 2C).
The extent of the bias in the precision estimates of both random and random-stratified sampling will vary with the true population correlation and with sample size. This bias should, however, be negligible in microarray validation studies. For random sampling, there are formulas which provide approximate corrections for the negative bias, which can be as high as 0.03–0.04. To estimate the size of the bias for the type of stratification in the present study, we conducted additional simulations (10,000 runs at a time) in which we varied the number of "genes" selected per stratum (1–9 of 10). For randomly stratified data, the upward bias (mean difference with the benchmark data) of the sample correlation coefficient ranged from a high of 0.0040 (n = 1 per stratum) to a low of 0.0003 (n = 9 per stratum).
In summary, random and random-stratified sampling performed similarly well, although random sampling was slightly more variable and produced more outliers on all indices (Figures 1C–G and Additional file 1). Moreover, the top-ranked sampling strategy performed substantially worse than either of the two other strategies.
Performance of sampling strategies using empirical microarray data
List of genes upregulated in the microarray experiments selected for validation by qrPCR
Probe Set ID
dual specificity phosphatase 1
RIKEN cDNA 8430417G17
RAS p21 protein activator 3
CCAAT/enhancer binding protein delta
regulator of G-protein signaling 2
cysteine dioxygenase 1, cytosolic
RIKEN cDNA 1190002H23
mitogen-activated protein kinase kinase kinase 6
ras homolog gene family, member AB
TEA domain family member 4
downstream of tyrosine kinase 1
CCAAT/enhancer binding protein delta
nuclear factor, interleukin 3 regulated
cysteine rich protein 61
dipeptidase 1 (renal)
stromal interaction molecule 1
MAD homolog 1
RIKEN cDNA 2400006P09
chondroitin sulfate proteoglycan 2
cDNA sequence BC028953
forkhead box P1
expressed sequence AW538196
interleukin 6 receptor, alpha
Relation between microarray and qrPCR
Regression toward the mean
The lower precision and accuracy of the top sampling strategy can be explained by the regression toward the mean phenomenon. The phenomenon describes the tendency for extreme values of one set of observations to be less extreme on a second set. The lower the true correlation and the more extreme the values on the initial set, the more pronounced the tendency. Note also that regression toward the mean is bidirectional; the artifact remains if the initial and the retest sets are reversed.
The regression toward the mean effect depends solely on the correlation between two sets of observations; it occurs whenever the correlation is less than perfect (i.e., r2 < 1) [9, 10]. This correlation in turn depends on the variability of the true (unknown) values and the variability of the random error associated with the measurements. Reducing random error (e.g., by stringent quality control procedures) and sampling across the entire data range maximizes the observed correlation and minimizes the adverse effects of regression toward the mean. Finally, although for ease of exposition we conducted our simulations assuming linearity and normally distributed random error, the regression toward the mean phenomenon does not depend on these assumptions [9, 10].
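The effect can be reproduced with a few lines of simulation. The sketch below is only an illustration under assumed settings: it borrows the simulation parameters described in the Methods (true FC ~ N(1, 0.2), error ~ N(0, 0.1), 100 genes, top 10 selected), while the number of runs, the seed, and all variable names are hypothetical choices made here. With an expected correlation of 0.80, the retest mean of the top-ranked genes should sit at roughly 80% of their initial distance from the overall mean.

```python
import random
import statistics

rng = random.Random(0)          # fixed seed, chosen arbitrarily
n_genes, n_top, n_runs = 100, 10, 2000

top_initial, top_retest = [], []
for _ in range(n_runs):
    true_fc = [rng.gauss(1.0, 0.2) for _ in range(n_genes)]
    initial = [mu + rng.gauss(0.0, 0.1) for mu in true_fc]
    retest = [mu + rng.gauss(0.0, 0.1) for mu in true_fc]
    # Select the genes with the largest FCs in the initial sample only
    top = sorted(range(n_genes), key=lambda i: initial[i], reverse=True)[:n_top]
    top_initial.append(statistics.fmean(initial[i] for i in top))
    top_retest.append(statistics.fmean(retest[i] for i in top))

mean_initial = statistics.fmean(top_initial)
mean_retest = statistics.fmean(top_retest)
# Fraction of the initial distance from the population mean (1.0) kept on retest
shrinkage = (mean_retest - 1.0) / (mean_initial - 1.0)
print(round(mean_initial, 3), round(mean_retest, 3), round(shrinkage, 2))
```

Swapping the roles of `initial` and `retest` in the selection step gives the same shrinkage, which is the bidirectional property noted above.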
The strength of the adverse effect of top-ranked sampling will depend on the distributions of the microarray and validated sample data. Accordingly, choice of microarray and qrPCR preprocessing methods (e.g., background correction, normalization, transformation, calibration) will affect the regression toward the mean effect to the extent that they affect the data distributions, although the effect will always be present to some degree. Assuming linearity and homoscedasticity, restricting the microarray data to the top-ranked effects will underestimate the population correlation between microarray (x) and validation samples (y) according to the following formula:
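Under these assumptions, the standard range-restriction relation can be written as shown here. The notation is an assumption made for illustration (primes mark quantities in the restricted, top-ranked subset) and may differ from the original typesetting:

$$
\rho'_{xy} = \frac{\rho_{xy}\,(\sigma'_x / \sigma_x)}{\sqrt{1 - \rho_{xy}^2 + \rho_{xy}^2\,(\sigma'_x / \sigma_x)^2}}
$$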
Where ρ and σ are the population correlations and standard deviations, respectively.
Regression toward the mean can be most easily illustrated when the two sample means and standard deviations are at least approximately equal, as is the case for the simulated data in Figure 1 and the microarray-microarray data in Figure 3. Note the larger distances between the identity (red) line in Figure 1A and the benchmark data least squares regression (black) line for the extreme (low and high) values of the initial sample (x-axis). Similar differences are seen between the identity (red) line and the least squares regression (blue) line in Figures 3B–D. For low initial sample values, predicted values on retest are larger; for large initial sample values, predicted values on retest are smaller.
Pair-link diagrams provide another graphical illustration (Figure 1B). The lines linking the initial scores to their respective retest scores tend to cross. On average, high scores on the initial sample have negative (decreasing) slopes, low scores have positive (increasing) slopes, and middle scores have flat slopes. The upshot of this tendency is that retest values will have lower precision and lower accuracy when top-ranked initial values are selected for retest.
By contrast, the pair-link diagram for the microarray/qrPCR data (Figure 7E) shows that most lines linking the microarray FCs to their respective qrPCR FCs have a positive slope, especially among the top-ranked microarray values. Despite appearances, regression toward the mean is nonetheless present and provides an explanation for the lower level of agreement observed among the top-ranked genes. This type of apparent "regression away from the mean" can only occur when the standard deviation of the validation sample is larger than the standard deviation of the initial sample, as here (sd_qrPCR = 0.85; sd_microarray = 0.56). Regression toward the mean, however, is a phenomenon of standardized scores (it is simply not necessary for illustration purposes to use standardized scores when standard deviations are equal). When variance is taken into account and measurements are converted to standardized z-scores, the regression toward the mean effect is evident (Figures 7C, D and F). Extreme standardized qrPCR FCs are less extreme than their corresponding microarray FCs. Larger sample sizes would be needed to determine if the lower agreement among the top-ranked genes is due solely to regression toward the mean or to regression toward the mean plus some other effect (e.g., non-linearity).
Routine global validation of microarray results would provide valuable information on the quality of microarray studies and would complement existing standards for validating individual genes. Our results demonstrate that the outcome of global validation depends on how a subset of genes is chosen. Random-stratified sampling provides more accurate and more precise estimates of agreement between microarrays and qrPCR than does the often-used top-ranked sampling procedure. Our empirical results confirm the theoretical argument that selecting top-ranked differentially expressed genes for validation leads to underestimating the level of agreement between microarray and qrPCR validation FC values.
Genes which are deemed especially important to the experimenter can be validated separately from genes required for validation of the microarray experiment in total. Appreciation for the regression toward the mean effect, however, is still necessary for informed decisions regarding these specific genes. The tendency for extreme FCs to be less extreme on validation will still be operating, although the non-random sampling will make it difficult to assess the extent of the effect.
CCC index of validation
We argue that a one-to-one correspondence between microarray and validation FC estimates is the gold standard for validation. This tight clustering of retest FC values around the identity line is indexed uniquely among validation indices by the CCC measure. Accordingly, the CCC provides dimensionless metrics with which to compare technology platforms, statistical procedures, and laboratory protocols, and ultimately, the overall quality of any given microarray study. In the case of unusually low CCC values, the accuracy and precision components provide clues on how the validation samples deviate from the standard which may in turn suggest procedural or statistical remedies. Regression slope and intercept estimates provide additional information to convert microarray FC estimates into estimates from lower throughput methods. Finally, the reported robustness of the CCC with as few as 10 data points is supported in our data, as influential (outlier) data points and deviations from linearity had little effect on the CCC estimates, although robust analogues of the CCC are also available.
The relatively high CCC values we observed in pairwise comparisons between our microarray experiments (Figures 3B–D) lend support to FC as a good index of effect size for platform comparison purposes. However, FC measurements present a statistical technical difficulty when attempting to validate the entire FC range, including non-differentially expressed genes. Most log FCs near zero represent non-differentially expressed genes whose variation merely reflects noise. Correlation with qrPCR for these genes will be close to zero and the least squares line will be flat within this range. Differentially expressed genes, on the other hand, will have positive slopes for both up and down regulated genes. If only differentially expressed genes are selected for validation, up and down-regulated genes should be examined separately. Analyzing them together will upwardly bias correlation values due to a "range enhancement" artifact; in extreme cases, the correlation between microarray and qrPCR FCs may be close to 1, despite zero correlations when up and down-regulated genes are analyzed separately.
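The extreme case of range enhancement can be illustrated with simulated data. In the sketch below, which is not taken from the study, up- and down-regulated genes are given log FCs centred on +1 and −1 on both platforms, with independent noise within each group, so the within-group correlation is zero by construction. The group means, standard deviations, sample sizes, and seed are all hypothetical.

```python
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)   # arbitrary seed
n = 200                  # hypothetical number of genes per direction

# Within each group, the two platforms share a mean but have independent noise
up_array = [rng.gauss(1.0, 0.2) for _ in range(n)]
up_pcr = [rng.gauss(1.0, 0.2) for _ in range(n)]
down_array = [rng.gauss(-1.0, 0.2) for _ in range(n)]
down_pcr = [rng.gauss(-1.0, 0.2) for _ in range(n)]

r_up = pearson_r(up_array, up_pcr)          # near zero
r_down = pearson_r(down_array, down_pcr)    # near zero
# Pooling the groups lets the between-group gap dominate the correlation
r_pooled = pearson_r(up_array + down_array, up_pcr + down_pcr)
print(round(r_up, 2), round(r_down, 2), round(r_pooled, 2))
```

The pooled value is high only because the two clusters are far apart; it says nothing about agreement within either direction of regulation.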
The adverse effects of regression toward the mean are sometimes avoided when specific genes of interest are selected and they cover the full FC range coincidentally. The strategy remains less than optimal as a global validation strategy, however, because the non-random sampling nonetheless prevents the generalization of the conclusions to the remaining majority of differentially expressed genes.
Threshold index of validation
All top-ranked, but only nine out of the 14 random-stratified genes in our study were statistically significant by qrPCR in the three samples. Non-significant p values tended to occur among the smaller FCs, but this effect was not uniform, as the smallest average microarray FC gene was significant in all three PCR samples. The seemingly paradoxical difference between the statistical significance threshold and the CCC approaches to validation can be explained as follows.
One difficulty with this type of threshold-based strategy is the choice of threshold. It can be shown that the smaller the initial p value, the more probable a second test will meet a specified probability threshold [24, 25]. A true positive gene that is differentially expressed at p = 0.05 has only a 50% chance of being differentially expressed at p < 0.05 on retest; at p = 0.005, the probability of obtaining p < 0.05 on retest increases to 80%. Accordingly, the larger its initial FC, the more likely the gene will exceed the decision threshold in the validation sample despite regression toward the mean. This threshold approach to validation, however, is adequate only if one is interested in the largest FC effects to the exclusion of more moderate but potentially important effects.
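The 50% and 80% figures can be recovered with a short calculation. The sketch below rests on simplifying assumptions that are not spelled out in the text: a two-sided z-test, an observed effect that equals the true effect, a retest of the same size, and significance counted only in the original direction. The function name is hypothetical.

```python
from statistics import NormalDist

def retest_probability(p_initial, alpha=0.05):
    """Chance that a retest reaches p < alpha, given the initial two-sided p value."""
    std_normal = NormalDist()
    # z score matching the initial p value, taken here as the true effect
    z_initial = std_normal.inv_cdf(1 - p_initial / 2)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # The retest z score is distributed N(z_initial, 1)
    return 1 - std_normal.cdf(z_crit - z_initial)

print(round(retest_probability(0.05), 2))    # 0.5
print(round(retest_probability(0.005), 2))   # 0.8
```

When the initial result sits exactly on the threshold, the retest statistic is as likely to land below it as above it, which is where the 50% comes from.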
Additionally, consider the following example. Using a p < 0.05 threshold, a gene that is differentially expressed at p = 0.049 by microarray but at p = 0.051 by qrPCR would be said to not have validated despite almost identical p values. By contrast, a gene that is differentially expressed at p = 0.0001 by microarray and at p = 0.049 by qrPCR would be said to have validated despite a large discrepancy in p values. One way around this difficulty would be to require that the effect size of the validated sample not be significantly different from the effect size of the initial sample for a conclusion that the gene was validated, although this raises the additional question of how many replicate samples would be needed to provide sufficient statistical power for detecting differences. Moreover, the issue arises whether a gene would be considered validated if it were significant in both initial and validated samples but with significantly different effect sizes in the two samples. Despite their limitations, however, threshold approaches do provide a framework for dichotomous decisions regarding whether or not an individual gene has been validated.
The question arises of how one can reach this type of decision on individual genes with the validation approach that we are advocating. One approach might be to first determine the extent of validation across a number of appropriately (e.g., random, random stratified) sampled genes by some index of global validation (e.g., CCC). If the results of global validation were found to be inadequate, then the microarray experiment might well be considered a failure and the validation of individual genes moot. If on the other hand global validation were found to be adequate, then regression diagnostic methods might be used to identify outlier genes whose validation results deviated from the pattern of the majority of genes for a variety of reasons (e.g., splice variants; cross-hybridization).
These regression outlier genes could then be investigated further to determine their status; non-outlier genes would be considered validated.
A corollary to this approach is that one could extend the conclusion of validity to those microarray findings not selected for validation with PCR but whose effects fall within the sampled (and validated) range. Our approach advocates selection of genes through a random process; therefore, any index of global validation should be uninfluenced by the specific sample selected and should generalize to the non-validated genes. Moreover, the relative proportions of outlier and non-outlier genes would provide an indication of the relative risk involved in making such a generalization.
Our results point to the importance of gene selection strategy, choice of qrPCR calibration methods, and choice of validation index in the assessment of microarray validation results. Sample sizes of 10 to 25 genes should be adequate for most validation purposes, although more observations may be needed to reliably estimate non-linearity between microarray and validation FC values. The CCC provides a global indication of the reproducibility of gene expression FCs estimated by microarrays, providing that a suitably random procedure is used to select genes for validation. Thus, we propose that the CCC be used as a universal measure of study quality.
FC values were simulated according to the following model: FC = μ + ε, where μ ~ N (1, 0.2) and ε ~ N (0, 0.1). That is, for each of the 100 simulated genes, a "true" FC was randomly generated from a N (1, 0.2) distribution. Random error for each of these true values was randomly generated twice from a N (0, 0.1) distribution. Each of the pair of random error values was added to a single true value to produce an "initial sample" and a "retest sample" value. The 0.2 standard deviation across the true FC values and the 0.1 standard deviation value for ε were selected to produce an expected correlation of 0.80. The expected correlation between two measurements with iid error associated with the same true value (here FC) is the ratio of the variance of the true scores to the variance of the measured scores as follows (see pp. 134–140):

$$\rho = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}$$

where $\sigma_T^2$, $\sigma_X^2$, and $\sigma_E^2$ are the variances of the true scores, the measured scores, and random error, respectively.
For the current simulated data,

$$\rho = \frac{0.2^2}{0.2^2 + 0.1^2} = \frac{0.04}{0.05} = 0.80$$
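The simulation model can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the seeds, and the number of repeated experiments are hypothetical, and only the Python standard library is used. It checks that the average sample correlation between initial and retest values is close to the expected 0.80.

```python
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate_fc(n_genes=100, true_sd=0.2, error_sd=0.1, seed=None):
    """Return (initial, retest) FC lists under the model FC = mu + eps."""
    rng = random.Random(seed)
    true_fc = [rng.gauss(1.0, true_sd) for _ in range(n_genes)]
    # Two independent error draws are added to each shared true value
    initial = [mu + rng.gauss(0.0, error_sd) for mu in true_fc]
    retest = [mu + rng.gauss(0.0, error_sd) for mu in true_fc]
    return initial, retest

# Expected correlation from the variance ratio
expected_r = 0.2 ** 2 / (0.2 ** 2 + 0.1 ** 2)

# Average sample correlation over many simulated experiments
mean_r = statistics.fmean(pearson_r(*simulate_fc(seed=s)) for s in range(2000))
print(round(expected_r, 2), round(mean_r, 3))
```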
Sampling from simulated or from microarray data was performed in three different manners. For top-ranked sampling, 10 (simulated data) or 15 (microarray data) largest up-regulated FCs were selected. For random sampling, 10 (or 15) observations were randomly selected from the entire range of upregulated genes. For random-stratified sampling, the entire list of upregulated genes was rank ordered according to FCs; the data were then divided into 10 (or 15) equal-sized bins and one gene per bin was selected randomly.
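The three selection procedures can be written compactly. The sketch below is one possible reading of the description above, not the original implementation: the function names are hypothetical, and when the number of genes is not a multiple of the number of bins, the bins are allowed to differ in size by one gene.

```python
import random

def top_ranked(fcs, k):
    """Indices of the k largest fold changes."""
    order = sorted(range(len(fcs)), key=lambda i: fcs[i], reverse=True)
    return order[:k]

def simple_random(fcs, k, rng):
    """Indices of k genes drawn without replacement from the whole list."""
    return rng.sample(range(len(fcs)), k)

def random_stratified(fcs, k, rng):
    """One randomly chosen index from each of k rank-ordered bins."""
    order = sorted(range(len(fcs)), key=lambda i: fcs[i])
    n = len(order)
    # Integer bin boundaries covering the whole ranked list
    bounds = [j * n // k for j in range(k + 1)]
    return [rng.choice(order[bounds[j]:bounds[j + 1]]) for j in range(k)]

rng = random.Random(1)                                # arbitrary seed
fcs = [rng.gauss(1.0, 0.22) for _ in range(100)]      # hypothetical FC values
top = top_ranked(fcs, 10)
rand = simple_random(fcs, 10, rng)
strat = random_stratified(fcs, 10, rng)
```

Random-stratified selection guarantees coverage of the full FC range in every draw, whereas simple random selection only achieves it on average.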
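The concordance correlation coefficient ($r_c$) is given by:

$$r_c = \frac{2\, r\, s_y s_x}{s_y^2 + s_x^2 + (\bar{y} - \bar{x})^2}$$

where r is the Pearson correlation coefficient, and $s_y^2$, $s_x^2$, $\bar{y}$, and $\bar{x}$ are the y and x sample variances and means, respectively.
Note that $r_c = r$ when $\bar{y} = \bar{x}$ and $s_y = s_x$.
The CCC can also be written as the product of the accuracy and the precision coefficients. The precision coefficient is given by:

$$r = \frac{\sum z_x z_y}{n - 1}$$

where z are standardized scores (mean of zero with unit variance) and n is the number of paired observations.
The accuracy coefficient is given by:

$$C_b = \frac{2\, s_y s_x}{s_y^2 + s_x^2 + (\bar{y} - \bar{x})^2}$$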
The original SAS code for the CCC index is available from . We have adapted the code for S-Plus and R, which is available from the corresponding author.
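For readers without access to that code, the index is simple to compute directly. The following Python sketch is not the authors' SAS, S-Plus, or R code. It assumes the usual definition of the CCC as the product of a precision term (Pearson r) and an accuracy term that penalizes differences in means and standard deviations, and it uses n − 1 sample variances, which is a choice made here. The function name and the example data are hypothetical.

```python
import statistics

def ccc_components(x, y):
    """Return (precision, accuracy, ccc) for paired measurements x and y."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)   # n - 1 denominators
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    precision = sxy / (sx * sy)                          # Pearson r
    # Accuracy equals 1 only when means and standard deviations both match
    accuracy = 2 * sx * sy / (sx ** 2 + sy ** 2 + (my - mx) ** 2)
    return precision, accuracy, precision * accuracy

# Hypothetical log FCs: the validation values are shifted upward by 0.5
microarray = [0.2, 0.5, 0.9, 1.3, 1.8]
validation = [v + 0.5 for v in microarray]
precision, accuracy, ccc = ccc_components(microarray, validation)
print(round(precision, 3), round(accuracy, 3), round(ccc, 3))
```

Returning the two components alongside the CCC makes it easy to see whether a low value reflects scatter around the regression line, a departure of that line from the identity line, or both.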
Robust statistical test for non-linearity
The relation between microarray and qrPCR FCs was assessed for curvature (Figure 4) by applying the Cramér-von Mises test to residuals generated by the Theil-Sen algorithm [29, 30]; see  for a description of the procedure and S-Plus functions [see Additional file 1 for more information].
Empirical study design overview
The biological samples used in the microarray study and the subsequent validation by qrPCR were obtained from three replicate studies. First, the cell culture experiment was performed three distinct times, and every time, the 2 samples (1 control and 1 treatment) were divided into several smaller aliquots. For the microarray hybridization, total RNA was extracted from these smaller aliquots: (1 control + 1 treatment) × 3 experiments = 6 samples. From these total RNA samples, five aliquots of total RNA were labelled and hybridized onto Affymetrix GeneChips; hence 6 samples × 5 aliquots = 30 Genechips. For qrPCR, total RNA was extracted from a second set of the smaller aliquots. For each gene tested, we performed 6 technical replicates of the qrPCR on each of the 6 samples.
Cell culture, treatment, and RNA extraction
Mouse 3T3-L1 cells, obtained from ATCC, were grown in DMEM (Invitrogen Canada Inc.) containing 10% charcoal/dextran treated fetal bovine serum (Hyclone), 2 μM L-glutamine (Invitrogen) and 100 U/mL penicillin/streptomycin (Invitrogen). Two parallel cultures containing cells seeded at a density of 6 × 10⁵ per 150-mm plate were grown for 72 h (to confluence). The culture medium was replaced and the cells were incubated for an additional 48 h. The cells were then treated with 1 μM dexamethasone (Sigma; dissolved in ethanol) or ethanol (control) for 3 h and were harvested by adding 6 mL Trizol reagent (Invitrogen) directly to each culture dish. The experiment was repeated three times using successive cell passages. Total cellular RNA was prepared according to the manufacturer's instructions. The samples were quantified by spectrophotometry and the RNA integrity was assessed using Agilent BioAnalyser RNA LabChips.
Microarray probe preparation and hybridization
Biotinylated cRNA probes were prepared for microarray analysis according to the manufacturer's instructions, using 10 μg of total RNA. Five aliquots of each sample were used for probe preparation, and the probes were hybridized overnight to Affymetrix MG-U74Av2 GeneChips (30 in total). Following hybridization, non-specifically bound probe was removed by washing using the GeneChip Fluidics Station 400 (Affymetrix). Specifically bound probe was detected by incubating the arrays with streptavidin phycoerythrin (Molecular Probes) and biotinylated anti-streptavidin antibody (Vector Laboratories) and scanning the chips using a Gene Array Scanner (Agilent). To minimize technical variability, RNA processing steps (RNA extraction, probe labeling and microarray hybridization) were performed in parallel for all samples.
Microarray data processing
Data for the three experiments were analyzed on a per experiment basis. For each experiment, data were normalized by the robust multi-array average (RMA) algorithm. Differential expression was tested by independent t-tests and corrected for multiple testing using the false discovery rate (FDR) procedure. Two-hundred forty-two genes were significantly differentially expressed in the same direction in all three experiments (FDR q = 0.05). Modified t-tests using the significance analysis of microarrays (SAM) procedure were also computed, with deltas of 0.25, 0.425, and 0.54269 for experiments 1, 2 and 3, respectively. These deltas corresponded to a false positive rate of approximately 0.05 and yielded 400 genes which were significantly differentially expressed in the same direction in all three experiments (see also the MIAME document [Additional file 9]). The intersection of these two lists contained 241 of the 242 genes identified by t-test alone. Because of the large overlap of the two methods, we selected the list generated by t-test as our final gene list.
These stringent criteria were adopted for two reasons. We wished to strictly minimize the false positive rate so as to reflect as accurately as possible the simulation data while at the same time not using a criterion that was redundant with the validation indices under examination. It is probable that this approach overlooked a number of differentially expressed genes. As such, although it served the methodological purposes of the present study, we do not recommend it as a general analytical strategy inasmuch as the false negative rate was likely unfavorably high.
Selection of differentially expressed genes for validation
The within-experiment log2 FC averages for the 242 consistently statistically significant genes were calculated and then averaged across experiments. Downregulated genes and four upregulated outlier genes (> 5 MADs), which would have adversely affected statistical analysis, were eliminated from the list. Twenty-nine genes from the remaining 150 upregulated genes were selected for validation: the top 15 FCs and one gene randomly selected from each of 15 rank-ordered strata.
Oligonucleotide primer design for quantitative real time polymerase chain reaction
The cDNA and genomic sequences of all selected genes were obtained and the Primer 3 web tool was used to select pairs of oligonucleotide primers with an optimal melting temperature of 60°C. Primer pairs were selected to span an intron/exon junction, except for a few genes where this was not possible, such as intronless genes.
Quantitative real time polymerase chain reaction
All primer pairs were tested by the polymerase chain reaction [1× reaction buffer, 3.5 μM MgCl2, 0.2 mM dNTP, 0.2 μM of each primer, 5 ng cDNA template (labeled as above), 0.25 U Hotstart Taq polymerase (Qiagen)] and agarose gel electrophoresis to verify the presence of a single band of the predicted size. Selected pairs were tested in quantitative real time polymerase chain reactions (qrPCR) on an ABI Prism® 7900HT sequence detection system (Applied Biosystems). Each combination of 50, 150 and 300 nM of forward and reverse primer was tested in duplicate reactions [1× reaction buffer, 3.5 μM MgCl2, 0.2 mM dNTP, varying concentrations of primer, 0.04 μM ROX (Molecular Probes), 1× SYBR-green 1 (from a 10 000× stock, Molecular Probes), 1× bovine serum albumin (New England Biolabs), 5 ng cDNA template, 0.25 U Hotstart Taq polymerase (Qiagen)] and the optimal reaction condition was selected. For each gene, we prepared 6 technical replicate reactions of the following: a calibration curve from a two-fold dilution series ranging from 20 to 0.078 ng of cDNA (9 dilutions), a control containing no cDNA, and each of the six experimental samples; 16 groups in total, for a total of 96 qrPCR reactions. The placement of the 16 groups of qrPCR reactions within the 384-well plates was randomized for every gene such that the samples would not always be located at the same place on the plate. All the qrPCR reactions for a given gene were run in parallel on the same plate. The results from the three control and the three treatment samples were then calculated by comparison to the calibration curve. Seven standard curve data points (for five of the 29 genes) had large influence on the regression slope (as indicated by |standardized slope dfBeta| values > 1). These data points were deleted, and the data were recalibrated. The experimental samples were calibrated according to their respective standard curve equations.
Fifty-four of the 1044 calibrated values were found to be outliers (defined as values lying more than 1.5 times the inter-quartile range below the 25th percentile or above the 75th percentile). These outliers were deleted prior to averaging.
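The outlier rule above can be illustrated with a short sketch. It assumes that the rule is applied within each group of technical replicates and that percentiles are computed with NumPy's default linear interpolation; neither detail is specified in the text, and the function names are hypothetical.

```python
import numpy as np


def drop_iqr_outliers(values, k=1.5):
    """Keep values within k * IQR of the 25th and 75th percentiles."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    keep = (values >= q1 - k * iqr) & (values <= q3 + k * iqr)
    return values[keep]


def replicate_mean(values, k=1.5):
    """Average a group of replicate values after removing outliers."""
    return float(drop_iqr_outliers(values, k).mean())
```

For example, `replicate_mean([1.0, 1.1, 0.9, 1.05, 0.95, 3.0])` discards the value 3.0 and averages the remaining five replicates.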
CCC: concordance correlation coefficient.
FDR: false discovery rate.
qrPCR: quantitative real time polymerase chain reaction.
This work was supported by the "High Throughput Gene Expression" funding under the Génome Québec Phase II Bioinformatics Consortium program and by the Genome Canada/Génome Québec funding of the "Microarray Data Analysis for Class Comparison: Methods, Software, and Pedagogy" project (R.N.).
- Benes V, Muckenthaler M: Standardization of protocols in cDNA microarray analysis. Trends in Biochemical Sciences 2003, 28(5):244–249. doi:10.1016/S0968-0004(03)00068-9
- Firestein GS, Pisetsky DS: DNA microarrays: Boundless technology or bound by technology? Guidelines for studies using microarray technology. Arthritis and Rheumatism 2002, 46(4):859–861. doi:10.1002/art.10236
- Moreau Y, Aerts S, De Moor B, De Strooper B, Dabrowski M: Comparison and meta-analysis of microarray data: from the bench to the computer desk. Trends in Genetics 2003, 19(10):570–577. doi:10.1016/j.tig.2003.08.006
- Mosteller F, Colditz GA: Understanding research synthesis (meta-analysis). Annual Review of Public Health 1996, 17:1–23. doi:10.1146/annurev.pu.17.050196.000245
- Troyanskaya OG, Dolinski K, Owen AB, Altman RB, Botstein D: A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae). PNAS 2003, 100(14):8348–8353. doi:10.1073/pnas.0832373100
- Irizarry RA, Warren D, Spencer F, Kim IF, Biswal S, Frank BC, Gabrielson E, Garcia JG, Geoghegan J, Germino G, Griffin C, Hilmer SC, Hoffman E, Jedlicka AE, Kawasaki E, Martinez-Murillo F, Morsberger L, Lee H, Petersen D, Quackenbush J, Scott A, Wilson M, Yang Y, Ye SQ, Yu W: Multiple-laboratory comparison of microarray platforms. Nature Methods 2005, 2(5):345–350. doi:10.1038/nmeth756
- Larkin JE, Frank BC, Gavras H, Sultana R, Quackenbush J: Independence and reproducibility across microarray platforms. Nature Methods 2005, 2(5):337–343. doi:10.1038/nmeth757
- Chuaqui RF, Bonner RF, Best CJM, Gillespie JW, Flaig MJ, Hewitt SM, Phillips JL, Krizman DB, Tangrea MA, Ahram M, Linehan WM, Knezevic V, Emmert-Buck MR: Post-analysis follow-up and validation of microarray experiments. Nature Genetics 2002, 32:509–514. doi:10.1038/ng1034
- Campbell DT, Kenny DA: A primer on regression artifacts. New York, Guilford Press; 2003.
- Stigler SM: Statistical concepts in psychology. In Statistics on the table: The history of statistical concepts and methods. Cambridge, MA, Harvard University Press; 1999:189–199.
- Lin LI: A concordance correlation-coefficient to evaluate reproducibility. Biometrics 1989, 45(1):255–268. doi:10.2307/2532051
- Lin LIK: Total Deviation Index for Measuring Individual Agreement With Applications in Laboratory Performance and Bioequivalence. Statistics in Medicine 2000, 19(2):255–270. doi:10.1002/(SICI)1097-0258(20000130)19:2<255::AID-SIM293>3.0.CO;2-8
- Lin L, Hedayat AS, Sinha B, Yang M: Statistical methods in assessing agreement: Models, issues, and tools. Journal of the American Statistical Association 2002, 97(457):257–270. doi:10.1198/016214502753479392
- Zimmerman DW, Zumbo BD, Williams RH: Bias in estimation and hypothesis testing of correlation. Psicológica 2003, 24:133–158.
- Masseroli M, Cerveri P, Pelicci PG, Alcalay M: GAAS: Gene Array Analyzer Software for management, analysis and visualization of gene expression data. Bioinformatics 2003, 19(6):774–775. doi:10.1093/bioinformatics/btg071
- Hand DJ: Measurement theory and practice: The world through quantification. London, New York, Arnold; distributed in the U.S.A. by Oxford University Press; 2004.
- Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 2003, 4(2):249–264. doi:10.1093/biostatistics/4.2.249
- Li C, Wong WH: Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proceedings of the National Academy of Sciences of the United States of America 2001, 98(1):31–36. doi:10.1073/pnas.011404098
- Wu Z, Irizarry R, Gentleman R, Murillo F, Spencer F: A model based background adjustment for oligonucleotide expression arrays. Johns Hopkins University, Department of Biostatistics Working Papers, Baltimore, MD; 2004.
- Affymetrix: Microarray Suite User Guide, Version 5. [http://www.affymetrix.com/support/technical/manuals.affx]
- Affymetrix: Technical note for PLIER. [http://www.affymetrix.com/support/technical/technotes/plier_technote.pdf]
- Hunter JE, Schmidt FL: Methods of meta-analysis: correcting error and bias in research findings. 2nd edition. Thousand Oaks, Calif., Sage; 2004.
- King TS, Chinchilli VM: A generalized concordance correlation coefficient for continuous and categorical data. Statistics in Medicine 2001, 20(14):2131–2147. doi:10.1002/sim.845
- Greenwald AG, Gonzalez R, Harris RJ, Guthrie D: Effect Sizes and P Values: What Should Be Reported and What Should Be Replicated? Psychophysiology 1996, 33(2):175–183.
- Lander E, Kruglyak L: Genetic dissection of complex traits - Reply. Nature Genetics 1996, 12(4):357–358. doi:10.1038/ng0496-357
- Allison DB, Cui X, Page GP, Sabripour M: Microarray data analysis: from disarray to consolidation and consensus. Nature Reviews Genetics 2006, 7(1):55–65. doi:10.1038/nrg1749
- Cohen J, Cohen P, West SG, Aiken L: Applied multiple regression/correlation analysis for the behavioral sciences. Mahwah, N.J., London, L. Erlbaum Associates; 2003.
- SAS code for CCC: [http://tigger.uic.edu/~hedayat/]
- Sen PK: Estimates of regression coefficient based on Kendall's tau. Journal of the American Statistical Association 1968, 63(324):1379. doi:10.2307/2285891
- Theil H: A rank-invariant method of linear and polynomial regression analysis. Indagationes Mathematicae 1950, 12:85–91.
- Wilcox RR: Applying contemporary statistical techniques. Amsterdam, Boston, Academic Press; 2003:477–479, 531–533.
- Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP: Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Research 2003, 31(4).
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B - Methodological 1995, 57(1):289–300.
- Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences 2001, 98(9):5116–5121. doi:10.1073/pnas.091062498
- Primer3 web tool [http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.