Effect of various normalization methods on Applied Biosystems expression array system data
© Barbacioru et al; licensee BioMed Central Ltd. 2006
Received: 18 October 2006
Accepted: 15 December 2006
Published: 15 December 2006
DNA microarray technology provides a powerful tool for characterizing gene expression on a genome scale. While the technology has been widely used in discovery-based medical and basic biological research, its direct application in clinical practice and regulatory decision-making has been questioned. A few key issues, including the reproducibility, reliability, compatibility and standardization of microarray analysis and results, must be critically addressed before any routine usage of microarrays in clinical laboratory and regulated areas can occur. In this study we investigate some of these issues for the Applied Biosystems Human Genome Survey Microarrays.
We analyzed the gene expression profiles of two samples, brain and universal human reference (UHR, a mixture of RNAs from 10 cancer cell lines), using the Applied Biosystems Human Genome Survey Microarrays. Five technical replicates were run at each of three sites on the same total RNA samples according to the manufacturer's standard protocols. Five methods (quantile, median, scale, VSN and cyclic loess) were used to normalize AB microarray data within each site. 1,000 genes spanning a wide dynamic range in gene expression levels were selected for real-time PCR validation. Using the TaqMan® assays data set as the reference set, the performance of the five normalization methods was evaluated focusing on the following criteria: (1) Sensitivity and reproducibility in detection of expression; (2) Fold change correlation with real-time PCR data; (3) Sensitivity and specificity in detection of differential expression; (4) Reproducibility of differentially expressed gene lists.
Our results showed a high level of concordance between these normalization methods. This is true, regardless of whether signal, detection, variation, fold change measurements and reproducibility were interrogated. Furthermore, we used TaqMan® assays as a reference, to generate TPR and FDR plots for the various normalization methods across the assay range. Little impact is observed on the TP and FP rates in detection of differentially expressed genes. Additionally, little effect was observed by the various normalization methods on the statistical approaches analyzed which indicates a certain robustness of the analysis methods currently in use in the field, particularly when used in conjunction with the Applied Biosystems Gene Expression System.
DNA microarray technology provides a powerful tool for characterizing gene expression on a genome scale. While the technology has been widely used in discovery-based medical and basic biological research, its direct application in clinical practice and regulatory decision-making has been questioned [1, 2]. A few key issues, including the reproducibility, reliability, compatibility and standardization of microarray analysis and results, must be critically addressed before any routine usage of microarrays in clinical laboratory and regulated areas can occur. Considerable effort has been dedicated to investigate these important issues, most of which focused on the compatibility across different laboratories and analytical methods, as well as the correlation between different microarray platforms. In this study we investigate some of these issues using the Applied Biosystems Human Genome Survey Microarrays.
The microarrays contain 31,700 60-mer oligonucleotide probes representing 29,098 individual human genes, and use chemiluminescence (CL) to identify and measure gene expression levels in cells and tissues. In addition to the unique 60-mer probe, an internal control probe (a 24-mer oligonucleotide) is co-spotted with the 60-mer probe on the microarray and labeled with a complementary oligo containing the fluorescent LIZ® dye (FL) during the hybridization of the microarray.
In this study, we analyzed the gene expression profiles of two human samples: brain and universal human reference sample (UHR). Five technical replicates were run at each of three sites on the same total RNA samples according to the manufacturer's standard protocols. Five different methods, quantile [3, 4], median, scale [6, 7], VSN and cyclic loess, were used to normalize AB microarray data within each site. Since the dependence of fold change and variance on intensity is platform dependent, we were interested in evaluating the performance of these methods on AB microarray data, which makes this study the first one from this perspective. We restricted our attention to these five methods for the following reasons. They are the most frequently used normalization methods for AB microarray data. In addition, the microarrays used in this study contain one probe for each gene (in most cases); this design restricts the number of applicable normalization methods and rules out methods based on replicated measurements for each gene (RMA, Plier etc.). Other normalization methods that would also be inapplicable include those explicitly developed for two-color technology or for replicated measurements.
1,000 genes spanning a wide dynamic range in gene expression levels were selected for real-time PCR validation. Using the TaqMan® assays data as the reference set, the performance of the five normalization methods was evaluated focusing on the following criteria: (1) Sensitivity and reproducibility in detection of expression; (2) Fold change correlation with real-time PCR data; (3) Sensitivity and specificity in detection of differential expression; (4) Reproducibility of differentially expressed gene lists. The data set analyzed in this manuscript has been reported elsewhere and made publicly available via GEO accession number GSE5350 using the platform GPL 4097 for TaqMan® assays data and GPL 2986 for Applied Biosystems Human Genome Survey Microarrays data.
Target selection for real-time PCR validation
Sensitivity and reproducibility in detection of expression
Detection concordance between AB microarrays and TaqMan® assays in UHR and Brain (panels: UHR, site 1; Brain, site 1).
Fold change concordance with TaqMan® assays
Fold change concordance: linear regression parameters.
Sensitivity and specificity in detection of differential expression
Significantly differentially expressed genes concordance.
Differential expression methods applied to quantile-normalized data: t-test, t-test + FDR, t-test + FC, t-test + FDR + FC cut, and SAM.
Reproducibility of differentially expressed gene lists
Reproducibility of differentially expressed gene lists.
One unanswered question in the microarray field has been the effect of normalization and statistical methods on the end results of a profiling experiment and, more explicitly, whether using different normalization or statistical approaches results in different gene lists or in less concordance between microarray platforms. In this study we have assessed the performance of five different normalization methods using the Applied Biosystems Expression Array System. Our results show a high level of concordance between these normalization methods. This is true regardless of whether signals, variation or fold change measurements were interrogated. In addition, these five normalization methods showed similar signal reproducibility between the three testing sites used for this study. Furthermore, we used TaqMan® assays as a reference to generate TPR and FDR plots for the various normalization methods across the assay range (Figure 8). TPR was directly correlated to gene expression levels whereas FDR was inversely correlated. This is not completely surprising, as the two platforms have different dynamic ranges and sensitivity levels, with the detection levels of the microarrays being lower than those of TaqMan® assays. These differences more than likely explain the lower TP rates and higher FP rates for the genes at low expression levels. These effects were also observed for several other microarray platforms in a separate study. One conclusion of this study is that, at least for the microarray platform tested here, the current normalization approaches have little impact on the signal and detection levels, or on the TP and FP rates in detection of differentially expressed genes. These results are consistent with the findings of the MAQC study [18, 19]. In addition, we explored the contribution of several statistical approaches commonly used in the field to the TP and FP rates.
As expected, approaches that relax the stringency of the differential expression call give better detection and differential expression concordance, concomitant with a higher percentage of false positives. At the opposite end of the spectrum, FDR control and SAM methods, which are more restrictive in detection of differential expression, produce gene lists with fewer false positives. SAM, as expected, shows a reduced number of false positives for low expressers, at the expense of missing some differentially expressed genes. The expected percentage of false positives in these lists is close to the one observed when comparing results to TaqMan® assays. Unfortunately, it seems that the full strength of these statistical methods is obscured by the fact that the majority of the genes chosen for TaqMan® validation show significant fold changes between samples, minimizing the effect of FDR on the FP rate. More importantly, however, combining the different normalization approaches with the various statistical methods tried had no significant impact on identifying differentially expressed genes.
Finally, when comparing the overlap in gene lists generated by each of these statistical methods, a concordance of 69.7–74.01% was observed between all three sites, and 82.4–83.8% between sites 1 and 3, indicating little effect of the analysis approach used on the final gene list obtained. This result is, however, sensitive to the cut-offs used in determining the gene lists, which can affect the degree of overlap observed. We were pleasantly surprised, however, by how little effect the various normalization methods had on the statistical approaches analyzed, which indicates a certain robustness of the analysis methods currently in use in the field.
In this study we have assessed the performance of five different normalization methods using data generated with the Applied Biosystems Expression Array System. Our results show a high level of concordance between these normalization methods. This is true, regardless of whether signals, variation, site reproducibility or fold change measurements were interrogated. The same similarity is observed when TaqMan® assays were used as a reference, to generate TPR and FDR plots for the various normalization methods across the assay range. In addition we also explored the contribution of several statistical approaches commonly used in the field on the detection of differential expression. Little effect is observed by the various normalization methods on the statistical approaches analyzed which indicates a certain robustness of the analysis methods currently in use in the field, particularly when used in conjunction with the Applied Biosystems microarrays.
Sample A was Universal Human Reference RNA (Stratagene) and sample B was human brain total RNA (Ambion).
Selection of genes for validation by TaqMan assays
A list of 1,297 RefSeqs was selected by the MAQC consortium. Over 90% of these genes were selected from a subset of 9,442 RefSeqs common to the four platforms (Affymetrix, Agilent, GE Healthcare and Illumina) used in the MAQC Pilot-I Study (RNA Sample Pilot), based on annotation information provided by manufacturers in August 2005. This selection ensured that the genes would cover the entire intensity and fold-change ranges and include any bias due to RefSeq itself. The study used 1,000 TaqMan gene expression assays that match the MAQC gene list. These 1,000 assays covered 997 genes (3 genes had more than one assay).
Applied Biosystems Expression Array analysis
The Applied Biosystems Human Genome Survey Microarray (P/N 4337467) contains 31,700 60-mer oligonucleotide probes representing 29,098 individual human genes. Digoxigenin-UTP labeled cRNA was generated and amplified from 1 μg of total RNA from each sample using Applied Biosystems Chemiluminescent RT-IVT Labeling Kit v 1.0 (P/N 4340472) according to the manufacturer's protocol (P/N 4339629). Array hybridization was performed for 16 hrs at 55°C. Chemiluminescence detection, image acquisition and analysis were performed using Applied Biosystems Chemiluminescence Detection Kit (P/N 4342142) and Applied Biosystems 1700 Chemiluminescent Microarray Analyzer (P/N 4338036) following the manufacturer's protocol (P/N 4339629). Images were auto-gridded and the chemiluminescent signals were quantified, background subtracted, and finally, spot- and spatially-normalized using the Applied Biosystems 1700 Chemiluminescent Microarray Analyzer software v 1.1 (P/N 4336391). Five technical replicates were performed on each sample, at three different testing sites, for a total of 30 microarrays.
TaqMan® Gene Expression Assay based real-time PCR
Each TaqMan Gene Expression Assay consists of two sequence-specific PCR primers and a TaqMan assay-FAM™ dye-labeled MGB probe. Each TaqMan assay was run in four replicates for each RNA sample. 10 ng total cDNA (as total input RNA) in a 10 μl final volume was used for each replicate assay. Assays were run with 2× Universal PCR Master Mix without UNG (uracil-N-glycosylase) on Applied Biosystems 7900 Fast Real-Time PCR System using universal cycling conditions (10 min at 95°C; 15 sec at 95°C, 1 min at 60°C, 40 cycles). The assays and samples were analyzed across a total of 44 384-well plates. Robotic methods (Biomek FX) were used for plate setup and each sample and assay replicate was tracked on a per well, per plate basis.
Statistical analyses were performed using the open source and open development software project R together with the Bioconductor packages ab1700, limma, multtest and affy.
When running experiments that involve multiple high density long-oligonucleotide arrays, it is important to remove sources of variation between arrays of non-biological origin. Normalization is a process for reducing this variation. We present five methods of performing normalization at the probe intensity level.
The idea is to scale the log-ratios to have the same median across arrays.
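The median-matching idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the limma routine used in the study: intensities are taken to be already on the log2 scale, so scaling becomes an additive shift, and the common target (the mean of the array medians) and the function name `median_scale` are hypothetical choices.

```python
from statistics import median

def median_scale(log_arrays):
    """Shift each array of log2 intensities so that all arrays share one median."""
    medians = [median(a) for a in log_arrays]
    # Assumption: the common target is the mean of the per-array medians.
    target = sum(medians) / len(medians)
    return [[x - m + target for x in a] for a, m in zip(log_arrays, medians)]

# Two toy arrays with three probes each (made-up values).
arrays = [[1.0, 2.0, 3.0], [2.0, 3.0, 7.0]]
scaled = median_scale(arrays)
print([median(a) for a in scaled])
```

After the shift both toy arrays have the same median, while the spread within each array is left untouched.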
Quantile normalization was proposed by Bolstad et al. for Affymetrix-style single-channel arrays and by Yang and Thorne for two-color cDNA arrays. This method ensures that the intensities have the same empirical distribution across arrays.
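A minimal sketch of how quantile normalization forces a common empirical distribution is given below. It assumes arrays of equal length with no missing values and ignores ties; the function name `quantile_normalize` is illustrative and this is not the Bioconductor implementation.

```python
def quantile_normalize(arrays):
    """Give every array the same empirical distribution (no tie handling)."""
    n = len(arrays[0])
    sorted_cols = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    ref = [sum(col[k] for col in sorted_cols) / len(arrays) for k in range(n)]
    out = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: a[i])
        norm = [0.0] * n
        for rank, i in enumerate(order):
            norm[i] = ref[rank]  # replace each value by the reference quantile
        out.append(norm)
    return out

# Two toy arrays with three probes each (made-up values).
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
print(normalized)
```

Each probe keeps its rank within its own array, but the sorted values of every array become identical.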
VSN (Variance stabilization normalization)
VSN is based on a function (arsinh) that calibrates for sample-to-sample variations through shifting and scaling, and transforms the intensities to a scale where the variance is approximately independent of the mean intensity.
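The behaviour of the arsinh transform can be illustrated as follows. In the actual method the per-array offset and scale are estimated from the data; here `a` and `b` are hypothetical fixed values and `glog` is an illustrative name, so this is only a sketch of the transformation, not of the calibration.

```python
import math

def glog(x, a=0.0, b=1.0):
    """arsinh(a + b*x): a is a per-array offset, b a per-array scale factor."""
    return math.asinh(a + b * x)

# For large intensities arsinh(x) approaches log(2x); near zero it is roughly
# linear, so it stays defined for zero or negative background-subtracted values.
for x in (-5.0, 0.0, 1.0, 100.0, 10000.0):
    log_ref = math.log(2 * x) if x > 0 else None
    print(x, glog(x), log_ref)
```

The comparison column shows why the transform behaves like a log for strong signals while remaining finite for weak ones.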
Cyclic loess is based upon the idea of the M versus A plot, where M is the difference in log expression values and A is the average of the log expression values, presented in Dudoit et al. However, rather than being applied to two color channels on the same array, as is done in the cDNA case, it is applied to probe intensities from two arrays at a time.
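The pairwise step of cyclic loess can be sketched as below. M and A follow the definitions in the text; the trend fit is left as a pluggable function, and `constant_trend` (a flat line at the median of M) is a deliberately crude, hypothetical stand-in for a real loess fit. In the full procedure this step would be repeated over all pairs of arrays and iterated.

```python
from statistics import median

def ma_values(x, y):
    """M = difference and A = average of log2 intensities from two arrays."""
    m = [xi - yi for xi, yi in zip(x, y)]
    a = [(xi + yi) / 2 for xi, yi in zip(x, y)]
    return m, a

def adjust_pair(x, y, fit_trend):
    """Remove the fitted M-versus-A trend, splitting the correction between arrays."""
    m, a = ma_values(x, y)
    trend = fit_trend(a, m)  # fitted M value at each probe's A
    x_adj = [xi - t / 2 for xi, t in zip(x, trend)]
    y_adj = [yi + t / 2 for yi, t in zip(y, trend)]
    return x_adj, y_adj

def constant_trend(a, m):
    # Stand-in for a loess fit (assumption): a flat line at the median of M.
    return [median(m)] * len(m)

# Log2 intensities of three probes on two arrays (made-up values).
x = [8.0, 10.0, 12.0]
y = [7.0, 9.5, 11.0]
x_adj, y_adj = adjust_pair(x, y, constant_trend)
m_after, a_after = ma_values(x_adj, y_adj)
print(m_after, a_after)
```

Because the correction is split evenly between the two arrays, A is unchanged and only M is pulled towards zero.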
Signal detection analysis
Detection thresholds are defined according to each platform manufacturer's recommendation. For TaqMan Gene Expression Assays, the detection threshold is set as Ct < 35 and standard deviation (of the 4 technical replicates) < 0.5; for Applied Biosystems Expression Arrays, it is set as signal-to-noise ratio (S/N) > 3 and quality flag < 5000. A gene was called detected in a sample if it was detectable in 3 out of 4 technical replicates for TaqMan® assays and in 3 out of 5 technical replicates within each site for microarrays. Using TaqMan® Gene Expression Assay calls as the reference, contingency tables were constructed against microarrays, and true positive rates (genes detectable by both TaqMan® assay and microarrays as a percentage of all genes detectable by TaqMan® assays) are plotted against TaqMan Ct values (Figure 1).
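The detection rules above can be written out as a short sketch. The thresholds come from the text; how the per-replicate Ct criterion and the replicate standard deviation criterion are combined is one possible reading, and the function names and toy data are hypothetical.

```python
from statistics import stdev

def array_detected(sn_ratios, flags, min_reps=3):
    """Microarray call: S/N > 3 and flag < 5000 in at least 3 of 5 replicates."""
    passing = sum(1 for sn, f in zip(sn_ratios, flags) if sn > 3 and f < 5000)
    return passing >= min_reps

def taqman_detected(cts, min_reps=3):
    """TaqMan call (assumed reading): Ct < 35 in at least 3 of 4 replicates
    and a replicate standard deviation below 0.5."""
    return sum(1 for ct in cts if ct < 35) >= min_reps and stdev(cts) < 0.5

def true_positive_rate(taqman_calls, array_calls):
    """Genes detected by both platforms as a fraction of TaqMan-detected genes."""
    both = sum(1 for t, a in zip(taqman_calls, array_calls) if t and a)
    return both / sum(1 for t in taqman_calls if t)

# Three toy genes: Ct replicates, and (S/N, flag) replicates at one site.
taqman = [[24.1, 24.3, 24.2, 24.0], [36.0, 36.5, 35.8, 37.0], [30.0, 30.1, 30.2, 29.9]]
array_sn = [[10, 12, 9, 11, 2], [1, 2, 1, 1, 2], [2, 4, 1, 5, 2]]
array_flags = [[0] * 5, [0] * 5, [0] * 5]

taqman_calls = [taqman_detected(c) for c in taqman]
array_calls = [array_detected(s, f) for s, f in zip(array_sn, array_flags)]
tpr = true_positive_rate(taqman_calls, array_calls)
print(taqman_calls, array_calls, tpr)
```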
Variability within and between sites for different normalization methods for Applied Biosystems Microarray System
The coefficient of variation (CV) is used to measure variability within each site. Figure 4 shows how the CV at site 1 depends on the TaqMan CT measurement, for each normalization method and each sample. The curves are lowess fits of the CV across the 5 technical replicates of each gene against its CT measurement.
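For reference, the per-gene CV across technical replicates is simply the standard deviation divided by the mean. The sketch below assumes the sample standard deviation and intensities on the linear scale, neither of which is stated in the text; the replicate values are made up.

```python
from statistics import mean, stdev

def cv(values):
    """Coefficient of variation: sample standard deviation over the mean."""
    return stdev(values) / mean(values)

# Five technical replicates of one gene at one site (made-up values).
replicates = [100.0, 110.0, 90.0, 105.0, 95.0]
gene_cv = cv(replicates)
print(gene_cv)
```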
In order to quantify the variability between sites that these normalization methods produce, we perform a one-factor (site) ANOVA on all 29,098 genes. In this way we estimate, for each gene, the percentage of the total variability that can be explained by site variability (Figure 5). For each gene, CVs are plotted against the median expression level measured by quantile normalized data, and lowess fitting curves are used to approximate all the points generated from one normalization method.
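The share of variability attributable to site can be sketched from the one-way ANOVA sums of squares. This is an illustrative reading, taking the between-site sum of squares as a percentage of the total sum of squares for a single gene; the function name and the toy values are hypothetical, and the gene is assumed not to be constant across all replicates.

```python
def percent_site_variability(groups):
    """groups: one list of replicate values per site, for a single gene."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Between-site sum of squares: site size times squared deviation of the site mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return 100.0 * ss_between / ss_total

# One gene, five technical replicates at each of three sites (made-up values).
sites = [
    [9.8, 9.9, 10.0, 10.1, 10.2],
    [10.8, 10.9, 11.0, 11.1, 11.2],
    [10.3, 10.4, 10.5, 10.6, 10.7],
]
pct = percent_site_variability(sites)
print(pct)
```

In this toy case most of the spread comes from the offsets between sites, so the site share is high.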
True positive rates and false discovery rates in detection of differentially expressed genes for different normalization methods
A key issue in understanding the performance of these 5 normalization methods is the detection of differentially expressed genes between the UHR and Brain samples. Only genes detected in both samples A and B by TaqMan® assays were used in this comparison. Significantly differentially expressed genes between samples were defined as those with p-value < 0.05 based on a Student's t-test controlling FDR at the 5% level (BH). Using calls from TaqMan® Gene Expression Assays as the reference, contingency tables were constructed against the different normalization methods, taking into consideration both p-value significance and fold change direction (up or down regulation). Based on this matrix, the TPR, FPR, FDR and accuracy were calculated for each normalization method. Results are presented in Table 3. A more detailed representation of true positive rates and false discovery rates as functions of CT measurements is presented in Figure 8. Genes were first ranked according to their average value in the tissue comparison. For each bin of 50 consecutive genes (according to the ranking), we compare the results from each normalization method with the ones from TaqMan® Assays. We keep track of up/down regulation in each platform. The average value of these 50 genes in the two samples is plotted against the TPR or FDR of the concordance between the two platforms in detecting differentially expressed genes.
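The two ingredients of this comparison, BH adjustment of p-values and a direction-aware contingency count, can be sketched as follows. The BH step-up adjustment is the standard one; treating a significant microarray call with the wrong direction as a false positive is an assumption about how direction was handled, and all names and numbers are illustrative.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def tpr_fdr(reference_calls, array_calls):
    """Calls are +1 (up), -1 (down) or 0 (not significant).
    Assumption: a significant array call counts as a true positive only
    when the reference is significant in the same direction."""
    pairs = list(zip(reference_calls, array_calls))
    tp = sum(1 for r, a in pairs if a != 0 and a == r)
    fp = sum(1 for r, a in pairs if a != 0 and a != r)
    positives = sum(1 for r in reference_calls if r != 0)
    tpr = tp / positives
    fdr = fp / (tp + fp) if tp + fp else 0.0
    return tpr, fdr

adjusted = bh_adjust([0.01, 0.04, 0.03, 0.5])
tpr, fdr = tpr_fdr([1, -1, 1, 0, -1], [1, -1, -1, 1, 0])
print(adjusted, tpr, fdr)
```

Applying `tpr_fdr` to successive bins of 50 ranked genes would give curves of the kind described for Figure 8.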
- Hackett JL, Lesko LJ: Microarray data – the US FDA, industry and academia. Nat Biotechnol 2003, 21: 742–743. 10.1038/nbt0703-742
- Petricoin EF 3rd, Hackett JL, Lesko LJ, Puri RK, Gutman SI, Chumakov K, Woodcock J, Feigal DW Jr, Zoon KC, Sistare FD: Medical applications of microarray technologies: a regulatory science perspective. Nat Genet 2002, 32(Suppl):474–479. 10.1038/ng1029
- Bolstad BM, Irizarry RA, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003, 19: 185–193. 10.1093/bioinformatics/19.2.185
- Yang YH, Thorne NP: Normalization for two-color cDNA microarray data. In Science and Statistics: A Festschrift for Terry Speed, IMS Lecture Notes-Monograph Series Edited by: Goldstein DR. 2003, 40: 403–418.
- Hartemink A, Gifford D, Jaakkola T, Young R: Maximum Likelihood Estimation of Optimal Scaling Factors for Expression Array Normalization. In Microarrays: Optical Technologies and Informatics, Proceedings of SPIE Edited by: Bittner M, Chen Y, Dorsel A, Dougherty E. 2001, 4266: 132–140.
- Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research 2002, 30(4):e15. 10.1093/nar/30.4.e15
- Smyth GK, Speed TP: Normalization of cDNA microarray data. In METHODS: Selecting Candidate Genes from DNA Array Screens: Application to Neuroscience Edited by: Carter D. 2003, 31(4):265–273.
- Huber W, Heydebreck A, Sueltmann H, Poustka A, Vingron M: Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 2002, 18(Suppl 1):S96-S104.
- MAQC Consortium: The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 2006, 24(9):1151–61. 10.1038/nbt1239
- Applied Biosystems Expression Array System: System profile[http://docs.appliedbiosystems.com/pebiodocs/00113259.pdf]
- Heid CA, Stevens J, Livak KJ, Williams PM: Real time quantitative PCR. Genome Res 1996, 6: 986–994.
- Gibson UE, Heid CA, Williams PM: A novel method for real time quantitative RT-PCR. Genome Res 1996, 6: 995–1001.
- Cleveland WS, Devlin SJ: Locally-weighted regression: an approach to regression analysis by local fitting. J Am Stat Assoc 1988, 83(403):596–610. 10.2307/2289282
- Benjamini Y, Hochberg Y: Controlling the false discovery rate – a practical and powerful approach to multiple testing. J R Stat Soc B Met 1995, 57(1):289–300.
- Tusher V, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. PNAS 2001, 98: 5116–5121. 10.1073/pnas.091062498
- Wang Y, Barbacioru C, Hyland F, Xiao W, Hunkapiller KL, Blake J, Chan F, Gonzalez C, Zhang L, Samaha RR: Large scale real-time PCR validation on gene expression measurements from two commercial long-oligonucleotide microarrays. BMC Genomics 2006, 7: 59. 10.1186/1471-2164-7-59
- Canales R, Luo Y, Willey J, Austermiller B, Barbacioru C, Boysen C, Hunkapiller K, Jensen R, Knight C, Lee K, Ma Y, Maqsodi B, Papallo A, Herness-Peters E, Poulter K, Ruppel P, Samaha RR, Shi L, Yang W, Zhang L, Goodsaid F: Evaluation of DNA Microarray Results with Alternative Quantitative Technology Platforms. Nat Biotechnol 2006, 24(9):1115–22. 10.1038/nbt1236
- Guo L, Lobenhofer E, Wang C, Shippy R, Harris S, Zhang L, Mei N, Chen T, Herman D, Goodsaid F, Hurban P, Phillips K, Xu J, Deng X, Sun Y, Tong W, Dragan Y, Shi L: Rat toxicogenomic study reveals analytical consistency across microarray platforms. Nat Biotechnol 2006, 24(9):1162–69. 10.1038/nbt1238
- Patterson T, Lobenhofer E, Fulmer-Smentek S, Collins P, Chu T, Bao W, Fang H, Kawasaki E, Hager J, Tikhonova I, Walker S, Zhang L, Hurban P, de Longueville F, Fuscoe J, Tong W, Shi L, Wolfinger R: Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project. Nat Biotechnol 2006, 24(9):1140–50. 10.1038/nbt1242
- Dudoit S, Yang YH, Callow MJ, Speed TP: Statistical methods for identifying genes with differential expression in replicated cDNA microarray experiments. Stat Sin 2002, 12(1):111–139.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.