Normalization of Illumina Infinium whole-genome SNP data improves copy number estimates and allelic intensity ratios
© Staaf et al; licensee BioMed Central Ltd. 2008
Received: 03 June 2008
Accepted: 02 October 2008
Published: 02 October 2008
Illumina Infinium whole genome genotyping (WGG) arrays are increasingly being applied in cancer genomics to study gene copy number alterations and allele-specific aberrations such as loss-of-heterozygosity (LOH). Methods developed for normalization of WGG arrays have mostly focused on diploid, normal samples. However, for cancer samples genomic aberrations may confound normalization and data interpretation. Therefore, we examined the effects of the conventionally used normalization method for Illumina Infinium arrays when applied to cancer samples.
We demonstrate an asymmetry in the detection of the two alleles for each SNP, which deleteriously influences both allelic proportions and copy number estimates. The asymmetry is caused by a remaining bias between the two dyes used in the Infinium II assay after using the normalization method in Illumina's proprietary software (BeadStudio). We propose a quantile normalization strategy for correction of this dye bias. We tested the normalization strategy using 535 individual hybridizations from 10 data sets from the analysis of cancer genomes and normal blood samples generated on Illumina Infinium II 300 k version 1 and 2, 370 k and 550 k BeadChips. We show that the proposed normalization strategy successfully removes asymmetry in estimates of both allelic proportions and copy numbers. Additionally, the normalization strategy reduces the technical variation for copy number estimates while retaining the response to copy number alterations.
The proposed normalization strategy represents a valuable tool that improves the quality of data obtained from Illumina Infinium arrays, in particular when used for LOH and copy number variation studies.
Genomic copy number alterations (CNA) and allelic imbalances are common events in the development of cancer and certain genetic disorders [1, 2]. The introduction of whole genome genotyping (WGG) arrays based on single nucleotide polymorphism (SNP) genotyping [3, 4] allows for combined DNA copy number (SNP-CGH) and loss-of-heterozygosity (LOH) analysis at high resolution . Currently, two major SNP array platforms are in use, Affymetrix GeneChip arrays  and Illumina BeadChips . The Infinium assay for Illumina BeadChips is based on allele-specific hybridization coupled with primer extension of genomic DNA using primers directly surrounding the SNP on randomly ordered bead arrays . The Infinium assay has been further developed into allele-specific single base extension using two color labeling with the Cy3 and Cy5 fluorescent dyes (Infinium II) . Current generations of Infinium II arrays are able to interrogate more than 1 million SNPs simultaneously.
Infinium II is a two-channel assay and data consist of two intensity values (X, Y) for each SNP, with one intensity channel for each of the fluorescent dyes associated with the two alleles of the SNP. SNP markers are present at a high redundancy on Infinium II assays and the allele specific intensities (X, Y) are summarized estimates from replicate markers. The alleles measured by the X channel (Cy5 dye) are arbitrarily, with respect to haplotypes, called the A alleles, whereas the alleles measured by the Y channel (Cy3 dye) are called the B alleles. The allele specific intensities are normalized using a proprietary algorithm in the Illumina Beadstudio software. The normalization algorithm is applied on a sub-bead pool level and is designed to adjust for channel-dependent background and global intensity differences, and to scale the data. A sub-bead pool is a set of beads that were manufactured together and are located in roughly the same analytical location (stripe) on a BeadChip. The algorithm uses a 6-degree of freedom affine transformation with 5 main steps: outlier removal, background estimation, rotational estimation, shears estimation, and scaling estimation . After normalization, data should be as canonical as possible with homozygous SNPs positioned along the transformed X and Y intensity axes. Normalized allele intensities are transformed to a combined SNP intensity, R (R = X + Y), and an allelic intensity ratio, theta (θ = 2/π*arctan(Y/X)).
R values are calibrated to generate copy number estimates (CN) by comparison to either a matched reference sample analyzed simultaneously or to canonical genotype clusters . Canonical genotype clusters are generated from a large panel of normal samples and the clusters for a SNP indicate the R and theta values expected for each genotype (AA, AB and BB). Theta values are calibrated to generate B allele frequencies (BAF) using canonical genotype clusters. BAF is a value between 0 and 1 and represents the proportion contributed by one SNP allele (B) to the total copy number: BAF is an estimate of NB/(NA+ NB), where NA and NB are the number of A and B alleles, respectively. When canonical genotype clusters are used for calibration, copy number estimates are calculated per SNP by taking the log2 of the SNP intensity (R) divided by the SNP intensity expected from the canonical genotype clusters. Thus, copy number estimates may be regarded as a combination of two individual one-channel measurements of the amount of genetic material for a given SNP. Normalization of one-channel array data has been extensively explored, incorporating various algorithms, among which quantile normalization (QN) has been reported to perform consistently well  and has been widely used to normalize between arrays [10–12]. Recently, QN was applied, as one of several analysis steps, to Illumina Sentrix SNP BeadArrays to correct for an observed dye bias in copy number analysis .
Allelic imbalances in samples can be conveniently visualized in BAF plots . A BAF value of 0.5 indicates a heterozygous genotype (AB), whereas 0 and 1 indicate homozygous genotypes (AA and BB, respectively). The allelic intensity ratio may, in the Infinium II assay, be regarded as a comparative dual channel measurement of the allelic proportion for a given SNP, similar to, e.g., two-channel gene expression data. Several reports have underlined the importance of intensity-based normalization, e.g., lowess , to correct for dye specific differences both for gene expression profiling [15, 16] and array comparative genomic hybridization (aCGH) [17–19] in two-channel microarray data. Since alleles for SNPs are arbitrarily called A or B, a set of genomically consecutive SNPs will appear in BAF plots as horizontal bands that are expected to be symmetrically positioned around 0.5. For example, a region of single copy number gain in all cells will, in addition to the two bands of homozygous SNPs at BAF = 0 and BAF = 1, result in two bands: one at BAF = 0.33 with SNPs having genotype AAB and one at BAF = 0.67 with SNPs having genotype ABB.
Here we demonstrate that BAF plots for tumor samples analyzed on Infinium II BeadChips often display bands that are not symmetrically positioned around 0.5. We show that these asymmetrical allelic ratios are caused by a bias between the two dyes used in the Infinium II assay, and that this dye intensity bias also hampers copy number estimates. Dye-bias can potentially be both global and SNP-specific. We propose using a quantile normalization based strategy applied to summarized bead type data within arrays for global correction of this dye intensity bias. The strategy corrects asymmetries that remain between intensity channels after the conventionally used BeadStudio normalization for both allelic intensity ratios and copy number estimates in normal as well as in tumor samples. Note that whereas quantile normalization is widely applied to single channel arrays to normalize between arrays, we instead apply it to normalize between channels within Infinium II arrays. Of key importance for the success of the strategy is the generation of new normalized reference data sets for the calibration of theta and R into B allele frequency and log R ratio – the data set analyzed and the data set used for calibration should both be normalized in the same way. We investigated the performance of the normalization strategy using 535 individual hybridizations from 10 different data sets generated on four different Infinium II platforms. The investigated data sets contain normal blood samples as well as breast tumor, colon tumor, urothelial tumor and chronic lymphocytic leukemia (CLL) samples. The included tumors display a large number of different copy number imbalances, but also variation in tumor heterogeneity and normal cell contamination. We conclude that the normalization strategy improves Infinium II data for samples of many different types.
Results and discussion
Occurrence of asymmetrical B allele frequencies and copy number estimates in tumor specimens
Correction of dye intensity bias in HapMap samples using quantile normalization
Since the two alleles for each SNP are, with respect to haplotypes, arbitrarily associated with the X and Y intensities, normalized X and Y intensities should, in contrast to figure 1f, be expected to have essentially equal intensity distributions. Quantile normalization can be used to generate identical distributions from a set of distributions . To investigate the effect of within sample QN of X and Y intensities from normal samples, we performed QN on X and Y intensities from HapMap samples used to generate the reference data sets for the Illumina 300 k version 1 (n = 111), 300 k version 2 (n = 120), 370 k (n = 123) and 550 k (n = 120) BeadChips. For each sample and SNP we calculated new normalized theta and R values thereby generating QN reference data sets. QN has been extensively used to normalize one-channel microarray expression data such that identical intensity distributions are generated for a set of arrays (between array normalization) . Here we instead propose to use QN between channels within two-channel SNP arrays.
Comparison of Log R ratio standard deviations between BeadStudio and quantile normalized HapMap data.
Mean Log R ratio SD* BeadStudio
Mean Log R ratio SD* QN
p-value SDQN < SDBeadStudio **
Mean decrease in SD (%) ***
300 k v1
300 k v2
Comparison of effects on allelic intensity ratios between Illumina BeadStudio and quantile normalized HapMap data.
Mean ThetaAB ± mean SD BeadStudio*
Mean ThetaAB ± mean SD QN*
Paired p-value theta SDQN < SDBeadStduio **
300 k v1
0.581 ± 0.095
0.454 ± 0.087
300 k v2
0.595 ± 0.097
0.457 ± 0.086
0.594 ± 0.097
0.460 ± 0.082
0.608 ± 0.099
0.451 ± 0.084
The intensity transformation introduced by QN can negatively affect allelic intensity ratio estimates
To address how to improve QN, we investigated how QN transforms the X and Y intensities for HapMap sample NA06985 (Figure 2d and 2e). Low values of X are increased with relatively large factors in intensity, while Y values are generally decreased and scaled with smaller factors. SNPs with a low value of X are predominantly genotyped as BB, and the number of SNPs affected by the increase in X is large as seen by comparing the transformation (Figure 2d) with an X intensity histogram (Figure 2f). The same pattern is not observed for the Y intensity, for which the large majority of SNPs are transformed to a lower intensity (Figures 2e and 2g). Thus, QN introduces a transformation that results in a large increase for low X values, which affects a large number of SNPs.
Incorporation of an intensity transformation threshold for QN improves allelic intensity ratio estimates
The negative effect of QN on allelic intensity ratios could potentially be circumvented by limiting the factor with which X intensity values are increased. Hence, we introduced a threshold for the QN intensity transformations to limit the increase of X and Y values before calculation of the allelic intensity ratio. In all our analyses, we used a threshold of 1.5 for the factor with which X and Y values could maximally be increased. While the threshold is applied identically to both X and Y transformations, it essentially only influences X values. A value of 1.5 appears reasonable as it incorporates the majority of SNPs with low X values (compare Figures 2d and 2e) without affecting the corresponding Y values, but the threshold may potentially be further improved by tuning. Using this QN modified with a threshold, tQN, we generated new tQN reference data sets. The application of the threshold markedly improved quantile normalized tumor BAF data by removing asymmetry and reducing variation (Figures 3c and 3d). Additionally, the removed asymmetry for allelic intensity ratios may provide a higher probability for SNPs to be genotyped, e.g., as AA for chromosome 9 of urothelial carcinoma UC456_R (Figure 3c compared to 3a) or as AA on 1q32.1 to qter for urothelial carcinoma UC152_I (Figure 3d compared to 1c). Consequently, tQN of Infinium II data could increase the genotype call rate, a variable commonly used to assess sample quality. An increased call rate for tumor specimens may also be beneficial for downstream LOH analysis software relying on genotype calls such as dChipSNP .
Systematic investigation of BAF asymmetry in tumor samples before and after tQN
Effects of tQN on copy number estimates for tumor and normal samples
Having established that tQN corrects for asymmetry in allelic intensity ratio estimates, we investigated the effects of tQN on CN estimates compared to BeadStudio. To this aim, we applied tQN to Infinium II data sets containing both blood and tumor samples and performed three comparisons. First, we investigated whether tQN increase or decrease the response in log R ratio to CNAs. Second, we investigated if tQN decrease variation in CN estimates. Finally, we applied a CNV calling algorithm to tQN normalized HapMap data to investigate the overlap of identified regions compared to BeadStudio data.
To investigate the effect on variation in CN estimates by tQN we computed sample adaptive noise thresholds (SATs) for tQN and BeadStudio data as previously described . We obtained significantly lower SATs using tQN for four of six tested data sets, while SATs were essentially unchanged for the remaining two data sets (Figure 5b). The lack of effect by tQN on tumors hybridized on Infinium 300 k v1 BeadChips is in concordance with the reference data set (Table 1). The lack of improvement by tQN for tumors in the breast cancer data set is more difficult to explain. All tumors in this data set are either hyper-diploid or of high aneuploidy resulting in highly unbalanced CN profiles. Unbalanced CN profiles may be problematic for the affine transformation in BeadStudio, which scales the data based on that homozygous SNPs on average should exist in two copies, and therefore may confound normalization and data interpretation for aneuploid tumors . A detailed investigation of this hypothesis is however not within the scope of the current study. The CLL data set is part of a comparison of four different array platforms for detection of CNAs and LOH . In that study, Gunnarsson et al. compared the average copy number ratio and standard deviation for the normal chromosome 1 in all samples between the different platforms. The Illumina platform showed the highest average standard deviation (0.26) of the four platforms. We found that the average standard deviation for chromosome 1 after tQN was 0.21, which is comparable to the results obtained by Gunnarsson et al. for the Agilent platform (0.20). Furthermore, when applying tQN the asymmetry in CN estimates observed in figure 1d was removed (Figure 5c). The effect of tQN on CN and BAF estimates for various tumor and normal samples is further illustrated in Additional file 1. In conclusion, we find that tQN of Infinium II data is beneficial for CN estimates as variation is reduced while the dynamic response in copy number ratios to CNAs remains unchanged. A decreased variation for CN estimates can be beneficial for downstream analysis and detection of CNAs.
To investigate whether the reduced variation in copy number estimates by tQN affected CNV detection compared to BeadStudio we applied the PennCNV algorithm  to the HapMap 550 k reference data set. The overlap of identified SNPs between BeadStudio and tQN data was on average 80% across the 120 HapMap samples for CNV regions larger than 8 SNPs. Importantly, the overlap percentage increased for larger CNV regions. Even though we cannot validate the correctness of CNV regions identified in either tQN or BeadStudio data these findings indicate that tQN reduces noise without removing biologically relevant regions.
We have developed a normalization method that improves the quality of data obtained from Illumina Infinium II genotyping arrays. We show that both allelic intensity ratio and copy number estimates are improved by using a quantile normalization strategy with a threshold for the intensity transformations (tQN) for correction of intensity dye bias when Infinium II BeadChips are applied to cancer samples. This dye bias results in an asymmetric detection of the two alleles for each SNP leading to asymmetry for both allelic intensity ratios and copy number estimates. Importantly, tQN not only removes such asymmetry but also reduces variation in copy number estimates. Essential for the improved result is to create reference data sets for calibration of B allele frequency and copy number estimates that are normalized with the same method that is applied to the investigated samples. The normalization strategy was successfully applied both to normal blood samples and tumor specimens with varying tumor heterogeneity and normal cell contamination. Our strategy is applied on a sample per sample basis and we have not evaluated if Infinium II data can be improved by using between array normalization. Further optimization of the normalization approach for Infinium II data should include adjusting X and Y intensities on a sub bead-level instead of the currently used summarized bead level to address the initially unequal X and Y distributions. Such a correction would presumably alleviate the need for an additional normalization. Potentially, such improvements may also address the lower ratio response to CNAs and signal to noise observed with SNP-CGH compared to conventional aCGH [23, 25].
We used 10 data sets for evaluation of the QN strategy. Data set 1 (HapMap 300 k v2) consists of 120 HapMap  samples hybridized on Illumina HumanHap300 version 2 Genotyping BeadChips (Courtesy of Illumina Inc., San Diego, CA). Data set 2 (HapMap 370 k) consists of 123 HapMap samples hybridized on Illumina HumanCNV370 Genotyping BeadChips (Courtesy of Illumina Inc.). Data set 3 (HapMap 550 k) consists of 120 HapMap samples hybridized on Illumina HumanHap550 Genotyping BeadChips (Courtesy of Illumina Inc.). Data set 4 (urothelial tumors 370 k) consists of 17 urothelial carcinomas hybridized on HumanCNV370 Genotyping BeadChips. Data set 5 (normal 370 k) consists of 17 normal samples hybridized on Illumina HumanCNV370 Genotyping BeadChips. Samples in data set 5 displayed call rates between 99.5 to 99.8%. Twelve of the samples in data sets 5 and 6 are paired tumor-normal samples from the same individual. Data set 6 (breast tumors 550 k) consists of six breast tumors hybridized on Illumina HumanHap550 Genotyping BeadChips. Data set 7 (leukemia 300 k v2) consists of ten CLL cases hybridized on Illumina HumanHap300 version 2 Genotyping BeadChips . Data set 8 (breast/colon 300 k v1) consists of six hybridizations on Illumina HumanHap300 version 1 Genotyping BeadChips representing two breast cancers and one colon cancer with matching normal samples (Courtesy of Illumina Inc.). Data set 9 (HapMap 300 k v1) consists of 111 HapMap samples hybridized on Illumina HumanHap300 version 1 Genotyping BeadChips (Courtesy of Illumina Inc.). Data set 10 (normal 550 k) consists of one normal sample hybridized 5 times at different DNA concentrations on Illumina HumanHap550 Genotyping BeadChips (obtained from the PennCNV website ).
Chromosomes 1 to 22 were used in all comparisons. Data sets 4 and 5 were generated at the SCIBLU Genomics Centre at Lund University, Sweden  and data sets 6 and 7 at the SNP technology platform in Uppsala, Sweden  according to manufacturers instructions.
BeadStudio data preprocessing
Fluorescent signals were imported into the BeadStudio software version 3.1 (Illumina Inc) and normalized. For each sample, the normalized fluorescence signal intensities were compared with the signal intensities of a set of reference genotypes, and the log2-ratios between sample and reference signals were calculated on a SNP per SNP basis. In addition, the frequency of the B-allele was for each sample estimated based on the reference genotype clusters . Normalized X and Y intensities were exported for further analysis. Manifest used for 300 k version 2 BeadChips was HumanHap300v2_A. Manifest used for 300 k version 1 BeadChips was BDCHP-1x10-HUMANHAP300v1-1_11219278_C. Manifest used for 370 k BeadChips was HumanCNV370v1_C. Manifest used for 550 k BeadChips was HumanHap550v3_A. Mirrored B allele frequencies (mBAF) were calculated as mBAF = abs(BAF - 0.5) + 0.5 .
Quantile normalization (tQN) of Infinium II data
tQN was performed individually for each sample using affine normalized intensities (X, Y) from BeadStudio and the R  package limma . The combined SNP intensity, R, was calculated from tQN intensities. A threshold of 1.5 for the intensity transformations XQN/X and YQN/Y was applied prior to calculation of theta: XQN intensities larger than 1.5 * X were set to 1.5 * X; YQN intensities larger than 1.5 * Y were set to 1.5 * Y. Theta, B allele frequencies and copy number estimates were calculated from tQN normalized intensities and reference data sets as previously described . CNV probes in analyzed samples were excluded from normalization due to lack of genotype information. Instead, for these probes the BeadStudio BAF and log R ratio values were used.
Construction of tQN corrected reference data sets
Quantile normalized reference data sets were created from HapMap data sets using intensities (X, Y) normalized in BeadStudio as the starting point. For each sample and SNP, quantile normalized R and theta values were calculated as previously described . Cluster positions in theta and R were calculated for each SNP and genotype based on genotype information (AA, AB and BB) using the mean of all samples for the specific SNP and genotype. SNPs with no cluster positions (no genotype assignment across all HapMap samples) were excluded from the analysis. BAF and copy number estimates for SNPs with only one genotype across all HapMap samples were calculated using the value of the single cluster position. Theta values for SNPs with one heterozygous and only one homozygous cluster position (e.g. AB and AA) were imputed for the missing homozygous cluster position (e.g. BB) by the median of all theta values for the missing genotype. Corresponding R estimates for the missing genotype were set as missing values. For CNV probes the original BeadStudio cluster positions were kept.
Segmentation of allelic ratios for investigation of BAF asymmetry
For matched tumor-normal samples, SNPs homozygous in both the tumor and its matched normal sample were first removed from the tumor BAF profile. Next, each tumor sample was split into an upper and lower data set, based on BAF values > 0.5 or < 0.5. Both data sets were mirrored from BAF to mBAF (compare figure 1b) and segmented by CBS  using default settings as recently described . Segments from the lower and upper part of the BAF profile were cross-mapped and the difference in the average segmented mBAF values between the upper and lower part for each genomic segment was calculated. If no asymmetry is present, the difference between the upper and lower part of the BAF profile should be zero. Only segments larger than 30 SNPs in size and with a segmented mBAF value > 0.6 in the upper and/or lower part was used for evaluation of asymmetry. For tumor samples without a matched normal, SNPs with BAF > 0.97 or < 0.03 were removed prior to splitting BAF profiles into an upper and lower part.
Copy number analysis
Segmentation was performed on normalized Log R ratios for each sample, platform and method using CBS . The significance level for accepting change-points, α, was set to 0.001 for all analyzed data sets and normalization methods. For comparisons between methods only segmented regions > 20 SNPs were used.
Sample adaptive thresholds
Sample adaptive thresholds for CN estimates were calculated as previously described , using a smoothing window of 21 SNPs, the median of the SD distribution as cut-off, and a scaling factor of 2 for all analyzed data sets and normalization methods.
Availability and requirements
Project name: tQN
Project home page: http://baseplugins.thep.lu.se/wiki/se.lu.onk.IlluminaSNPNormalization
Operating system(s): Any operating system supporting Perl and R.
Programming language: Perl and R.
Other requirements: Perl modules File::Spec, Getopt::Long, IO::File and Pod::Usage. R package limma.
License: GNU GPL
Any restrictions to use by non-academics: None
Data set 6 (breast tumors 550 k) is available through NCBI's Gene Expression Omnibus  with accession GSE11977.
B allele frequency
circular binary segmentation
comparative genomic hybridization
chronic lymphocytic leukemia
copy number aberration
copy number variation
loss of heterozygosity
mirrored B allele frequency
sample adaptive threshold
single nucleotide polymorphism
thresholded quantile normalization
whole genome genotyping.
Financial support was provided by the Swedish Cancer Society, the Knut & Alice Wallenberg Foundation, the Foundation for Strategic Research through the Lund Centre for Clinical Cancer Research (CREATE Health), the American Cancer Society and the IngaBritt and Arne Lundberg Foundation. The SCIBLU Genomics center is supported by governmental funding of clinical research within the National Health Services (ALF) and by Lund University.
- Pinkel D, Albertson DG: Comparative genomic hybridization. Annu Rev Genomics Hum Genet 2005, 6: 331–354. 10.1146/annurev.genom.6.080604.162140View ArticlePubMedGoogle Scholar
- Rajagopalan H, Lengauer C: Aneuploidy and cancer. Nature 2004, 432: 338–341. 10.1038/nature03099View ArticlePubMedGoogle Scholar
- Matsuzaki H, Dong S, Loi H, Di X, Liu G, Hubbell E, Law J, Berntsen T, Chadha M, Hui H, Yang G, Kennedy GC, Webster TA, Cawley S, Walsh PS, Jones KW, Fodor SP, Mei R: Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays. Nat Methods 2004, 1: 109–111. 10.1038/nmeth718View ArticlePubMedGoogle Scholar
- Gunderson KL, Steemers FJ, Lee G, Mendoza LG, Chee MS: A genome-wide scalable SNP genotyping assay using microarray technology. Nat Genet 2005, 37: 549–554. 10.1038/ng1547View ArticlePubMedGoogle Scholar
- Peiffer DA, Le JM, Steemers FJ, Chang W, Jenniges T, Garcia F, Haden K, Li J, Shaw CA, Belmont J, Cheung SW, Shen RM, Barker DL, Gunderson KL: High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Res 2006, 16: 1136–1148. 10.1101/gr.5402306PubMed CentralView ArticlePubMedGoogle Scholar
- Steemers FJ, Chang W, Lee G, Barker DL, Shen R, Gunderson KL: Whole-genome genotyping with the single-base extension assay. Nat Methods 2006, 3: 31–33. 10.1038/nmeth842View ArticlePubMedGoogle Scholar
- Bolstad BM, Irizarry RA, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003, 19: 185–193. 10.1093/bioinformatics/19.2.185View ArticlePubMedGoogle Scholar
- Barnes M, Freudenberg J, Thompson S, Aronow B, Pavlidis P: Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms. Nucleic Acids Res 2005, 33: 5914–5923. 10.1093/nar/gki890PubMed CentralView ArticlePubMedGoogle Scholar
- Carvalho B, Bengtsson H, Speed TP, Irizarry RA: Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data. Biostatistics 2007, 8: 485–499. 10.1093/biostatistics/kxl042View ArticlePubMedGoogle Scholar
- Dunning MJ, Barbosa-Morais NL, Lynch AG, Tavare S, Ritchie ME: Statistical issues in the analysis of Illumina data. BMC Bioinformatics 2008, 9: 85. 10.1186/1471-2105-9-85PubMed CentralView ArticlePubMedGoogle Scholar
- Oosting J, Lips EH, van Eijk R, Eilers PH, Szuhai K, Wijmenga C, Morreau H, van Wezel T: High-resolution copy number analysis of paraffin-embedded archival tissue using SNP BeadArrays. Genome Res 2007, 17: 368–376. 10.1101/gr.5686107PubMed CentralView ArticlePubMedGoogle Scholar
- Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res 2002, 30: e15. 10.1093/nar/30.4.e15PubMed CentralView ArticlePubMedGoogle Scholar
- Quackenbush J: Microarray data normalization and transformation. Nat Genet 2002, (32 Suppl):496–501. 10.1038/ng1032View ArticlePubMedGoogle Scholar
- Smyth GK, Speed T: Normalization of cDNA microarray data. Methods 2003, 31: 265–273. 10.1016/S1046-2023(03)00155-5View ArticlePubMedGoogle Scholar
- Khojasteh M, Lam WL, Ward RK, MacAulay C: A stepwise framework for the normalization of array CGH data. BMC Bioinformatics 2005, 6: 274. 10.1186/1471-2105-6-274PubMed CentralView ArticlePubMedGoogle Scholar
- Staaf J, Jonsson G, Ringner M, Vallon-Christersson J: Normalization of array-CGH data: influence of copy number imbalances. BMC Genomics 2007, 8: 382. 10.1186/1471-2164-8-382PubMed CentralView ArticlePubMedGoogle Scholar
- Neuvial P, Hupe P, Brito I, Liva S, Manie E, Brennetot C, Radvanyi F, Aurias A, Barillot E: Spatial normalization of array-CGH data. BMC Bioinformatics 2006, 7: 264. 10.1186/1471-2105-7-264PubMed CentralView ArticlePubMedGoogle Scholar
- Assie G, LaFramboise T, Platzer P, Bertherat J, Stratakis CA, Eng C: SNP arrays in heterogeneous tissue: highly accurate collection of both germline and somatic genetic information from unpaired single tumor samples. Am J Hum Genet 2008, 82: 903–915. 10.1016/j.ajhg.2008.01.012PubMed CentralView ArticlePubMedGoogle Scholar
- Lin M, Wei LJ, Sellers WR, Lieberfarb M, Wong WH, Li C: dChipSNP: significance curve and clustering of SNP-array-based loss-of-heterozygosity data. Bioinformatics 2004, 20: 1233–1240. 10.1093/bioinformatics/bth069View ArticlePubMedGoogle Scholar
- Staaf J, Lindgren D, Vallon-Christersson J, Isaksson A, Goransson H, Juliusson G, Rosenquist R, Hoglund M, Borg A, Ringner M: Segmentation-based detection of allelic imbalance and loss-of-heterozygosity in cancer cells using whole genome SNP arrays. Genome Biol 2008, 9: R136. 10.1186/gb-2008-9-9-r136PubMed CentralView ArticlePubMedGoogle Scholar
- Gunnarsson R, Staaf J, Jansson M, Ottesen AM, Goransson H, Liljedahl U, Ralfkiaer U, Mansouri M, Buhl AM, Smedby KE, Hjalgrim H, Syvanen AC, Borg A, Isaksson A, Jurlander J, Juliusson G, Rosenquist R: Screening for copy-number alterations and loss of heterozygosity in chronic lymphocytic leukemia-A comparative study of four differently designed, high resolution microarray platforms. Genes Chromosomes Cancer 2008, 47: 697–711. 10.1002/gcc.20575View ArticlePubMedGoogle Scholar
- Wang K, Li M, Hadley D, Liu R, Glessner J, Grant SF, Hakonarson H, Bucan M: PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res 2007, 17: 1665–1674. 10.1101/gr.6861907PubMed CentralView ArticlePubMedGoogle Scholar
- Greshock J, Feng B, Nogueira C, Ivanova E, Perna I, Nathanson K, Protopopov A, Weber BL, Chin L: A comparison of DNA copy number profiling platforms. Cancer Res 2007, 67: 10173–10180. 10.1158/0008-5472.CAN-07-2102View ArticlePubMedGoogle Scholar
- SCIBLU Genomics, Lund University, Sweden[http://www.lth.se/sciblu]
- SNP Technology Platform in Uppsala, Sweden[http://www.genotyping.se]
- The R project for statistical computing[http://www.r-project.org]
- Venkatraman ES, Olshen AB: A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 2007, 23: 657–663. 10.1093/bioinformatics/btl646View ArticlePubMedGoogle Scholar
- Gene Expression Omnibus[http://www.ncbi.nlm.nih.gov/geo/]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.