Assessing batch effects of genotype calling algorithm BRLMM for the Affymetrix GeneChip Human Mapping 500 K array set using 270 HapMap samples
© Hong et al; licensee BioMed Central Ltd. 2008
- Published: 12 August 2008
Genome-wide association studies (GWAS) aim to identify genetic variants (usually single nucleotide polymorphisms [SNPs]) across the entire human genome that are associated with phenotypic traits such as disease status and drug response. Highly accurate and reproducible genotype calling is paramount since errors introduced by calling algorithms can inflate false associations between genotype and phenotype. Most genotype calling algorithms currently used for GWAS are based on multiple arrays. Because hundreds of gigabytes (GB) of raw data are generated from a GWAS, the samples are typically partitioned into batches containing subsets of the entire dataset for genotype calling. High call rates and accuracies have been achieved. However, the effects of batch size (i.e., number of chips analyzed together) and of batch composition (i.e., the choice of chips in a batch) on call rate and accuracy, as well as the propagation of these effects into the significantly associated SNPs identified, have not been investigated. In this paper, we analyzed both batch size and batch composition for effects on the genotype calling algorithm BRLMM using raw data of 270 HapMap samples analyzed with the Affymetrix Human Mapping 500 K array set.
Using data from 270 HapMap samples interrogated with the Affymetrix Human Mapping 500 K array set, three different batch sizes and three different batch compositions were used for genotyping using the BRLMM algorithm. Comparative analysis of the calling results and the corresponding lists of significant SNPs identified through association analysis revealed that both batch size and composition affected genotype calling results and significantly associated SNPs. Batch size and batch composition effects were more severe on samples and SNPs with lower call rates than ones with higher call rates, and on heterozygous genotype calls compared to homozygous genotype calls.
Batch size and composition affect the genotype calling results in GWAS using BRLMM. The larger the differences in batch sizes, the larger the effect. The more homogenous the samples in the batches, the more consistent the genotype calls. The inconsistency propagates to the lists of significantly associated SNPs identified in downstream association analysis. Thus, uniform and large batch sizes should be used to make genotype calls for GWAS. In addition, samples of high homogeneity should be placed into the same batch.
- Call Rate
- Batch Size
- Genotype Call
- Batch Composition
- Genotype Calling
Genome-wide association studies (GWAS) aim to identify genetic variants, usually single nucleotide polymorphisms (SNPs), across the entire human genome that are associated with phenotypic traits, such as disease status and drug response. The International HapMap project determined genotypes of over 3.1 million common SNPs in human populations and computationally assembled them into a genome-wide map of SNP-tagged haplotypes [1, 2]. Concurrently, high-throughput SNP genotyping technology advanced to enable simultaneous genotyping of hundreds of thousands of SNPs. These advances combine to make GWAS a feasible and promising research field for associating genotypes with various disease susceptibilities and health outcomes. Recently, GWAS was successfully applied to identify common genetic variants associated with a variety of phenotypes [3–31]. Many of these studies used the Affymetrix GeneChip Human Mapping 500 K array set [5, 6, 11]. The genomic DNA for one of the arrays is cleaved with the Nsp I restriction enzyme and ~262,000 SNPs are interrogated. The second chip uses Sty I-cleaved genomic DNA and ~238,000 SNPs are analyzed. Genotypes from Affymetrix GeneChip Human Mapping 500 K array set data are usually determined by the calling algorithm BRLMM [32] embedded in Affymetrix software packages. Algorithms developed by other laboratories such as PLASQ [33], GEL [34], CRLMM [35], SNiPer-HD [36], MAMS [37], and CHIAMO [11] are also utilized.
The MPAM algorithm was developed for analysis of raw data (i.e., the CEL files) from the first generation Affymetrix Mapping 10 K array and is based on clustering of chips for each SNP by modified partitioning around medoids [38]. MPAM was error prone for SNPs with missing genotype groups or low minor allele frequency, a problem more pronounced on the second generation Affymetrix Mapping 100 K array. This prompted Affymetrix to develop a new dynamic model based calling algorithm called DM for Mapping 100 K array data [39]. DM is a single-chip calling algorithm and usually calls genotypes with high overall call rate and accuracy. However, the algorithm exhibited a higher misclassification rate for heterozygous genotypes than for homozygous genotypes. To improve data analyses for genotyping arrays, the multi-chip genotype calling algorithm RLMM was developed. RLMM is based on a robustly fitted linear model that employs Mahalanobis distance for classification [40]. RLMM achieved a higher call rate than DM. With the release of the Mapping 500 K SNP array set, Affymetrix extended the RLMM model to BRLMM by adding a Bayesian step that provided improved estimates of cluster centers and variances. The DM and GEL algorithms operate on a single chip, while all others use multiple chips to call genotypes.
High call rate and accuracy of genotype calling are essential for the success of GWAS, since errors introduced in the genotypes by calling algorithms can inflate false associations and may cause true associations between genotype and phenotype to be missed. Each of the algorithms was reported to have a high successful call rate and accuracy or, more precisely, high concordance with genotypes determined by the International HapMap Consortium on the HapMap samples. With the exception of DM and GEL, the algorithms require data from multiple chips (i.e., a batch) to make genotype calls. A GWAS usually involves analyses of thousands of samples that generate thousands of raw data files (i.e., CEL files). The raw data for one sample (two CEL files for the Affymetrix Mapping 500 K array set: one from Nsp-digested genomic DNA and one from Sty-digested DNA) are about 130 MB in size. Computer memory (RAM) limits make it infeasible to analyze all CEL files of a GWAS in one single batch on a single computer. The samples are, therefore, divided into many batches for genotype calling. Affymetrix suggests 40 to 96 CEL files per batch for the BRLMM method. To date, the effects on genotype calls potentially caused by changing the number and specific combinations of CEL files in batches, and the propagation of these effects to the downstream association analysis, have not been investigated.
Since BRLMM is recommended by Affymetrix, we analyzed the effect of batch size and composition on the ability of the BRLMM algorithm to consistently call the 270 samples from the International HapMap project.
Batch size effect
Batch size effect was assessed by comparing the genotypes called from BS1, BS2, and BS3 (see Methods) for call rate and concordance. The overall call rates, defined as the proportion of successful calls to the total number of calls (successful calls plus missing calls), for BS1, BS2, and BS3 were 99.48%, 99.50%, and 99.49%, respectively. However, overall call rates are not informative enough to assess the distribution of missed calls on the chip. The batch size effect on genotype call rates is best assessed using one-against-one comparisons of the distributions of call rates on individual samples and SNPs. These distributions were calculated from the calling results of the experiments with the three batch sizes (BS1, BS2, and BS3).
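The call rate definitions used here can be illustrated with a minimal sketch. The genotype encoding ("AA", "AB", "BB", with None for a missing call) and the toy matrix are assumptions made for illustration only; they are not the output format of BRLMM or of the in-house programs.

```python
# Minimal sketch of overall, per-sample and per-SNP call rates.
# Assumed encoding (hypothetical): "AA"/"AB"/"BB" for successful calls, None for a missing call.

def overall_call_rate(calls):
    """calls[s][j] is the call for sample s at SNP j."""
    total = sum(len(row) for row in calls)
    successful = sum(1 for row in calls for g in row if g is not None)
    return successful / total

def sample_call_rates(calls):
    """Proportion of successful calls for each sample (one value per row)."""
    return [sum(g is not None for g in row) / len(row) for row in calls]

def snp_call_rates(calls):
    """Proportion of successful calls for each SNP (one value per column)."""
    n_samples = len(calls)
    n_snps = len(calls[0])
    return [
        sum(calls[s][j] is not None for s in range(n_samples)) / n_samples
        for j in range(n_snps)
    ]

# Toy data: 3 samples x 4 SNPs.
toy = [
    ["AA", "AB", None, "BB"],
    ["AA", None, None, "BB"],
    ["AB", "AB", "BB", "BB"],
]

print(overall_call_rate(toy))
print(sample_call_rates(toy))
print(snp_call_rates(toy))
```

Two experiments can have nearly identical overall call rates while their missing calls fall on different samples or SNPs, which is why the per-sample and per-SNP distributions are compared.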
Table: Concordance of calls between batch sizes. Columns: BS1 vs BS2, BS1 vs BS3, BS2 vs BS3. Rows: Successful Calls for Both, Concordant Calls (All), Concordant Calls (Hom), Concordant Calls (Het).
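The quantities tabulated for each pair of experiments can be computed as in the sketch below. It assumes the same hypothetical encoding as before ("AA", "AB", "BB", None for a missing call) and that both experiments list samples and SNPs in the same order; the toy calls are invented.

```python
# Sketch: pairwise concordance between two calling experiments.
# Assumption: both matrices are aligned (same samples in rows, same SNPs in columns).

def concordance(calls_a, calls_b):
    both = concordant = hom = het = 0
    for row_a, row_b in zip(calls_a, calls_b):
        for ga, gb in zip(row_a, row_b):
            if ga is None or gb is None:
                continue  # only positions called in both experiments are compared
            both += 1
            if ga == gb:
                concordant += 1
                if ga == "AB":
                    het += 1  # concordant heterozygous call
                else:
                    hom += 1  # concordant homozygous call (AA or BB)
    return {
        "successful_both": both,
        "concordant_all": concordant,
        "concordant_hom": hom,
        "concordant_het": het,
    }

# Invented toy calls for two experiments on 2 samples x 3 SNPs.
exp_1 = [["AA", "AB", "BB"], ["AB", None, "BB"]]
exp_2 = [["AA", "AA", "BB"], ["AB", "AB", None]]

print(concordance(exp_1, exp_2))
```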
Batch composition effect
The overall call rates based on all CEL files of the 270 HapMap samples for BC1, BC2, and BC3 (see Methods) were 99.48%, 99.43%, and 99.41%, respectively. The genetic homogeneity of the batches in BC1 (samples from 1 population group) is higher than that of BC2 (samples from 2 population groups) which, in turn, is higher than that of BC3 (samples from 3 population groups). The batch sizes were the same for all three experiments. Thus, higher call rates were obtained when genotype calling was conducted with samples of higher genetic homogeneity, although the effect of batch homogeneity was relatively minor by this measure. Because the distribution of missing calls on samples and SNPs was more informative for assessing batch effect in our first experiments (BS studies), we examined the distribution of call rates in the BC experiments.
Table: Concordance of calls between batch compositions. Columns: BC1 vs BC2, BC1 vs BC3, BC2 vs BC3. Rows: Successful Calls for Both, Concordant Calls (All), Concordant Calls (Hom), Concordant Calls (Het).
Quality of the raw data
Propagation of batch effect to significantly associated SNPs
The objective of a GWAS is to identify the genetic markers associated with a specific phenotypic trait. It is critical to assess whether and how the batch effect propagates to the significant SNPs identified in the downstream association analysis. Three case-control based association analyses were conducted for each of the calling results with different batch sizes and compositions to assess the propagation of batch effect in genotype calling to the significantly associated SNPs (see Methods).
After removal of low quality SNPs by quality control assessment, each of the three population groups (European, Asian, and African) was set as "case" while the other two groups were set as "control". Association analyses were conducted to identify SNPs that can differentiate the "case" group from the "control" group. The lists of SNPs significantly associated with the same population group, identified using the genotype calling results from different batch sizes and compositions, were compared using Venn diagrams.
The batch size effect on genotype calling clearly propagated into the downstream association analyses. Moreover, the larger the difference between two batch sizes, the fewer the significantly associated SNPs shared by the two batch sizes. For example, in the association analyses with European as "case", there were 471, 370, and 217 significantly associated SNPs shared only by BS2 and BS3, by BS1 and BS2, and by BS1 and BS3, respectively. These numbers are negatively related to the corresponding differences in batch size: 15, 45, and 60. The same trends were observed for the association analyses with African as "case" and with Asian as "case".
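The Venn diagram regions, including the "shared only by" counts quoted here, correspond to simple set operations on the three lists of significant SNPs. The sketch below uses invented SNP identifiers; the function and key names are hypothetical.

```python
# Sketch: the seven regions of a three-set Venn diagram of significant SNP lists.
# The rs identifiers below are invented placeholders, not results from the study.

def venn3_regions(a, b, c):
    a, b, c = set(a), set(b), set(c)
    return {
        "only_a": a - b - c,
        "only_b": b - a - c,
        "only_c": c - a - b,
        "a_and_b_only": (a & b) - c,   # shared only by the first and second lists
        "a_and_c_only": (a & c) - b,   # shared only by the first and third lists
        "b_and_c_only": (b & c) - a,   # shared only by the second and third lists
        "all_three": a & b & c,
    }

sig_bs1 = {"rs1", "rs2", "rs3", "rs4"}
sig_bs2 = {"rs2", "rs3", "rs5"}
sig_bs3 = {"rs3", "rs4", "rs5", "rs6"}

for region, snps in venn3_regions(sig_bs1, sig_bs2, sig_bs3).items():
    print(region, len(snps), sorted(snps))
```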
The Venn diagrams demonstrated that, for the same "case-control" setting, different lists of significantly associated SNPs were identified by the same statistical test (chi-square test) using the genotype calling results from different batch compositions. Therefore, the batch composition effect on genotype calling propagated to the significantly associated SNPs. Moreover, the larger the difference in genetic homogeneity between two batch compositions, the fewer the significantly associated SNPs shared by the two batch compositions. For example, there were 555, 512, and 229 significantly associated SNPs shared only by BC2 and BC3, by BC1 and BC2, and by BC1 and BC3, respectively, for the association analyses with European as "case". The numbers are negatively related to the corresponding differences in genetic homogeneity between the batch compositions: 0.17, 0.5, and 0.67. The same trends were observed for the association analyses with African as "case" and with Asian as "case".
GWAS is increasingly used to identify loci containing genetic variants associated with common diseases and drug responses. The number of SNPs interrogated in a GWAS has grown from thousands to millions; for example, the newest Affymetrix SNP Array 6.0 contains ~2 million probe sets. At the same time, the allele frequency difference of disease-associated or drug-associated SNPs is usually very small. Therefore, a very small error introduced in genotypes by genotype calling algorithms may result in inflated false associations between genotype and phenotype in the downstream association analysis. Reproducibility and robustness are as important to genotype calling as are the accuracy and call rate that are usually used to evaluate the performance of genotype calling algorithms. As most genotype calling algorithms are based on multiple chips, and genotype calling for a GWAS is usually conducted in many batches, the reproducibility and robustness of multi-chip calling algorithms under different batch sizes and compositions are important. Statistical tests of these properties would increase the confidence in the associated SNPs identified in downstream association analysis.
A heterozygous genotype typically carries the rare allele of a SNP. Therefore, robust calling of heterozygous genotypes reduces false positive associations and the chance of missing true associations. Our studies revealed that both batch size and composition affected genotype calling results, especially for heterozygous genotype calls. It was also demonstrated that the batch effect propagates to the downstream association analysis. Genotype calling algorithms that eliminate or reduce batch effects while maintaining high call rates and accuracy are preferred for GWAS.
BRLMM first derives an initial guess for each SNP's genotype using the DM algorithm and then analyzes across SNPs to identify cases of non-monomorphism. This subset of non-monomorphism SNPs is then used to estimate a prior distribution on cluster centers and variance-covariance matrices. This subset of SNP genotypes is revisited, and the clusters and variances of the initial genotype guesses are combined with the prior information of the SNP in an ad-hoc Bayesian procedure to derive a posterior estimate of cluster centers and variances. All SNPs in a chip are called according to their Mahalanobis distances from the three cluster centers, and confidence scores are assigned to the calls. With default settings, BRLMM randomly picks 10,000 SNPs to estimate cluster centers and variances. However, the number of non-monomorphism SNPs used to estimate the prior distribution on cluster centers and variance-covariance matrices varies with the number and the composition of CEL files in the calling batches. Batch size and batch composition therefore alter the estimates of the prior distribution and variance-covariance matrices. We confirmed that the number of non-monomorphism SNPs changes when batch size and composition are varied in BRLMM calling. The average numbers of non-monomorphism SNPs used to estimate the prior distributions were 5468 (Nsp) and 5422 (Sty), 4356 (Nsp) and 4358 (Sty), and 3612 (Nsp) and 3618 (Sty) for calling batches in BS1, BS2, and BS3, respectively. The difference in batch sizes is related to the difference in the numbers of non-monomorphism SNPs used to estimate the prior distribution which, in turn, is related to the difference in genotype calling results. The average numbers of non-monomorphism SNPs used to estimate the prior distribution were 5468 (Nsp) and 5422 (Sty), 6399 (Nsp) and 6308 (Sty), and 6788 (Nsp) and 6688 (Sty) for calling batches in BC1, BC2, and BC3, respectively. Differences in the genetic homogeneity of samples are likewise related to differences in the numbers of non-monomorphism SNPs used to estimate the prior which, in turn, are related to the differences in genotype calling results.
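The final classification step, assigning a genotype by Mahalanobis distance to three cluster centers, can be sketched as follows. This is only an illustration of the distance rule, not the BRLMM implementation: the two-dimensional signal space, the cluster centers, the covariance matrices and the function names are all assumptions, and the Bayesian estimation of the clusters and the confidence scores are not reproduced.

```python
# Sketch: nearest-cluster genotype assignment by squared Mahalanobis distance in 2-D.
# All numeric values below are invented for illustration.

def mahalanobis_sq(x, center, cov):
    """Squared Mahalanobis distance of point x from a cluster with a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx = x[0] - center[0]
    dy = x[1] - center[1]
    # Uses the closed-form inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]].
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

def classify(x, clusters):
    """Return (genotype, squared distance) of the closest cluster."""
    distances = {
        genotype: mahalanobis_sq(x, center, cov)
        for genotype, (center, cov) in clusters.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]

# Hypothetical cluster estimates for one SNP: genotype -> (center, covariance).
clusters = {
    "AA": ((1.0, 0.0), ((0.04, 0.0), (0.0, 0.04))),
    "AB": ((0.0, 0.0), ((0.04, 0.0), (0.0, 0.04))),
    "BB": ((-1.0, 0.0), ((0.04, 0.0), (0.0, 0.04))),
}

print(classify((0.1, 0.05), clusters))
```

Because the centers and covariances are estimated from the batch, any change in those estimates shifts the distances and can change borderline calls, which is one way a batch effect of the kind described here could arise.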
As demonstrated above, both batch size and batch composition affect the genotype calling results of GWAS using the BRLMM algorithm. The larger the difference in batch sizes, the larger the effect. When the samples in the calling batches are more homogenous, more concordant genotypes are called. The batch effect propagates to the downstream association analysis and makes the lists of significantly associated SNPs inconsistent. Therefore, we suggest that uniform and large batch sizes be used to make genotype calls for GWAS and that homogenous samples be put into the same batches.
The raw data (CEL files) from the Affymetrix GeneChip Human Mapping 500 K array set of the 270 HapMap samples were downloaded from the International HapMap project website http://www.hapmap.org/downloads/raw_data/affy500k/. The CEL file format was described on Affymetrix's developer pages http://www.affymetrix.com/Auth/support/developer/fusion/file_formats.zip. The file name indicated the population code (CEU/YRI/CHB+JPT), the sample identifier (e.g., NA12345), followed by the Affymetrix array type (based on restriction enzyme name: Nsp or Sty). Three population groups composed the data sets and each group contained 90 samples: CEU had 90 samples from Utah residents with ancestry from northern and western Europe (termed as European in this paper); CHB+JPT had 45 samples from Han Chinese in Beijing, China, and 45 samples from Japanese in Tokyo, Japan (termed as Asian in this paper); YRI had 90 samples from Yoruba in Ibadan, Nigeria (termed as African in this paper).
Quality of the raw data
The quality of the raw data from the Affymetrix Human Mapping 500 K array set was assessed using DM [39] before genotype calling by BRLMM. DM is a single-array algorithm; it processes one CEL file at a time in a multiple CEL file batch and statistically assesses experimental quality with a numerical score between 0 and 100. A high QC (quality control) score indicates high quality of the experiment (CEL file).
Genotype calling by BRLMM
All genotype calling experiments with BRLMM reported in this paper were conducted using apt-probeset-genotype of Affymetrix Power Tools 1.8.5. Affymetrix Power Tools (APT) contains a set of cross-platform command line programs that implement algorithms for analyzing and working with Affymetrix GeneChip® arrays. These programs are available on the Affymetrix website http://www.affymetrix.com/support/developer/powertools/index.affx. APT programs are intended for "power users" who prefer programs that can be utilized in scripting environments and are sophisticated enough to handle the complexity of extra features and functionality. The APT program apt-probeset-genotype is an application for making genotype calls using SNP arrays (100 K, 500 K, Genome-Wide SNP Arrays 5.0 and 6.0). BRLMM is one of the genotype calling algorithms implemented in this program, which allows many parameters to be changed by the user. For the studies reported here, all parameters, except as noted in the narrative, were set to the default values recommended by Affymetrix. The chip description files (cdf) for both Nsp and Sty chips of the Mapping 500 K array set, as well as the files defining SNPs on chromosome X, were also used for genotype calling. They were downloaded from the Affymetrix website. Nsp and Sty CEL files were genotype-called separately.
Batch size experiments
Three experiments were designed and conducted to assess the effect of batch size. In the first experiment (BS1), the 270 HapMap samples were divided into three batches based on their population groups: 90 Europeans, 90 Asians, and 90 Africans. The genotypes were called separately by BRLMM using the default parameter settings suggested by Affymetrix (CEL files from Nsp and CEL files from Sty were analyzed separately). Genotype calling results on Nsp files and on Sty files of the three batches in this experiment were then merged for comparison with results of the experiments with other batch sizes. The second experiment (BS2) used a batch size of 45 samples. Genotypes were called from the CEL files of the 90 European samples in two batches, each with 45 CEL files, using BRLMM with the same parameter settings as in the first experiment. The procedure was repeated for the Asian and African samples. In the third experiment (BS3), the batch size was 30 samples, with each batch again drawn from a single population group.
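The three batch size designs can be sketched as below. The sample identifiers are placeholders, and the assumption that samples are assigned to batches in list order is ours; how samples were assigned to sub-batches within a population group is not stated here.

```python
# Sketch: splitting each population group into calling batches of a fixed size.
# Sample IDs are placeholders; sequential assignment to batches is an assumption.

def split_into_batches(samples, batch_size):
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

def batch_size_design(groups, batch_size):
    """Return (batch name, samples) pairs; every batch holds one population group only."""
    batches = []
    for name, samples in groups.items():
        chunks = split_into_batches(samples, batch_size)
        for k, chunk in enumerate(chunks, start=1):
            batches.append((f"{name}_{k}", chunk))
    return batches

groups = {
    pop: [f"{pop}_{i:02d}" for i in range(1, 91)]
    for pop in ("European", "Asian", "African")
}

for label, size in (("BS1", 90), ("BS2", 45), ("BS3", 30)):
    design = batch_size_design(groups, size)
    print(label, "batches:", len(design), "batch size:", size)
```

Under this reading, BS1, BS2 and BS3 use 3, 6 and 9 batches per enzyme (Nsp and Sty CEL files being called separately).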
Batch composition experiments
The selection of samples (CEL files) to place in each batch can also be expected to alter genotype call rates. The term batch composition effect is used here to denote the effect of the choice of arrays placed within each batch. BRLMM was used with default parameter settings and the CEL files of the 270 HapMap samples to test batch composition effects. In the first experiment (BC1), the 270 samples were placed in three batches, each containing the 90 samples of one population group (Europeans, Asians, or Africans). In the second experiment (BC2), the 90 samples in each of the three population groups were evenly divided into two subgroups, each with 45 unique samples. Genotype calling was then conducted in three batches composed of: (i) subgroup 1 of Europeans + subgroup 1 of Asians, (ii) subgroup 2 of Europeans + subgroup 1 of Africans, and (iii) subgroup 2 of Africans + subgroup 2 of Asians. In the third experiment (BC3), the 90 samples in each of the three population groups were evenly divided into three subgroups, each with 30 unique samples. Genotype calling was then conducted in three batches composed of: (i) subgroup 1 of Europeans + subgroup 1 of Asians + subgroup 1 of Africans, (ii) subgroup 2 of Europeans + subgroup 2 of Asians + subgroup 2 of Africans, and (iii) subgroup 3 of Europeans + subgroup 3 of Asians + subgroup 3 of Africans. In each of the three experiments, the genotype calling results of the three batches were merged before conducting the comparisons.
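The three batch compositions can be written out explicitly, as in the following sketch. Sample identifiers are placeholders, and splitting each population group into subgroups in list order is an assumption made for illustration.

```python
# Sketch: building the BC1, BC2 and BC3 batch compositions (three batches of 90 each).
# Sample IDs are placeholders; subgroup membership by list order is an assumption.

POPULATIONS = ("European", "Asian", "African")

def make_groups(n_per_group=90):
    return {pop: [f"{pop}_{i:02d}" for i in range(1, n_per_group + 1)] for pop in POPULATIONS}

def subgroups(samples, n):
    size = len(samples) // n
    return [samples[k * size:(k + 1) * size] for k in range(n)]

def design_bc1(groups):
    # One population group per batch.
    return [list(groups[pop]) for pop in POPULATIONS]

def design_bc2(groups):
    # Two population groups per batch, 45 samples from each.
    eur, asn, afr = (subgroups(groups[pop], 2) for pop in POPULATIONS)
    return [eur[0] + asn[0], eur[1] + afr[0], afr[1] + asn[1]]

def design_bc3(groups):
    # Three population groups per batch, 30 samples from each.
    eur, asn, afr = (subgroups(groups[pop], 3) for pop in POPULATIONS)
    return [eur[k] + asn[k] + afr[k] for k in range(3)]

def populations_in(batch):
    return {sample_id.split("_")[0] for sample_id in batch}

groups = make_groups()
for label, design in (("BC1", design_bc1), ("BC2", design_bc2), ("BC3", design_bc3)):
    batches = design(groups)
    print(label, [(len(b), len(populations_in(b))) for b in batches])
```

All three designs keep the batch size at 90, so any difference between them reflects composition rather than size.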
Comparing genotype calling results
In each of the experiments reported here, the genotype calling results by BRLMM from the different calling batches were first merged using a set of in-house programs written in C++. When merging the calling results, the genotypes of SNPs on the Nsp and Sty chips of the same sample were merged first, followed by assembling the genotypes of all 270 HapMap samples. Thereafter, overall call rates for each of the experiments, call rates of individual samples and SNPs in each of the experiments, and concordant calls between experiments were calculated and exported as tab-delimited text files using the in-house C++ programs. Comparison of calling results was done using R.
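A paired two-sample t-test (t.test in R) was used to test the alternative hypothesis that call rates on samples or SNPs differ between two calling experiments. The test statistic is

$$t = \frac{\bar{d}}{s_d / \sqrt{N}}, \qquad \bar{d} = \frac{1}{N}\sum_{i=1}^{N}\left(r_{1,i} - r_{2,i}\right), \qquad s_d^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(r_{1,i} - r_{2,i} - \bar{d}\right)^2,$$

where $r_{1,i}$ and $r_{2,i}$ are the call rates of experiments 1 and 2 for sample i or SNP i, respectively; N is the total number of samples (in this case, 270) or SNPs (in this case, 500,668, which includes 50 QC probe sets in both Nsp and Sty chips).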
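As an illustration of the paired comparison, the sketch below computes the standard paired t statistic and its degrees of freedom from two aligned lists of call rates. It is not the R code used in the study; the function name and toy numbers are ours, and the p-value (which R's t.test reports) is omitted because it needs the t distribution.

```python
# Sketch: paired t statistic for call rates of the same samples (or SNPs) in two experiments.
import math
from statistics import mean, stdev

def paired_t_statistic(rates_1, rates_2):
    if len(rates_1) != len(rates_2):
        raise ValueError("both experiments must cover the same samples or SNPs")
    diffs = [a - b for a, b in zip(rates_1, rates_2)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1  # statistic and degrees of freedom

# Invented per-sample call rates for two experiments.
exp_1 = [0.990, 0.980, 0.970, 0.995]
exp_2 = [0.985, 0.970, 0.975, 0.990]
print(paired_t_statistic(exp_1, exp_2))
```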
To study the propagation of the batch effect to the significantly associated SNPs, all genotype calling results obtained from the raw data of the 270 HapMap samples using BRLMM with different batch sizes and compositions were analyzed with the chi-square test for associations between the SNPs and the case-control settings.
Prior to association analysis, quality control (QC) of the calling results was conducted to remove markers and samples of low quality. For each of the calling results, a call rate threshold of 90% was used to remove SNPs and samples. Minor allele frequency was used to filter SNPs, with the cut-off set to 0.01. Departure from Hardy-Weinberg equilibrium (HWE) was checked for all SNPs. The p-value of the chi-square test for HWE was first calculated for all SNPs, and the p-values were then adjusted for multiple testing using the Benjamini and Hochberg false discovery rate (FDR) [41]. An FDR of 0.01 was set as the cut-off for the HWE test. No samples were removed because of low quality. Between 54,942 (10.97%) and 55,496 (11.08%) SNPs were removed in the QC, mainly because of departure from HWE.
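The Benjamini and Hochberg adjustment and the three QC cut-offs can be sketched as follows. The function names are ours, and whether a value exactly at a cut-off is kept or removed is an assumption, since only the cut-off values are given.

```python
# Sketch: Benjamini-Hochberg FDR adjustment of HWE p-values and a simple SNP QC filter.

def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def passes_qc(call_rate, maf, hwe_fdr, min_call_rate=0.90, min_maf=0.01, hwe_cutoff=0.01):
    # Assumption: values exactly at a cut-off are kept.
    return call_rate >= min_call_rate and maf >= min_maf and hwe_fdr >= hwe_cutoff

# Invented HWE p-values for four SNPs.
hwe_p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(hwe_p))
print(passes_qc(call_rate=0.95, maf=0.20, hwe_fdr=0.50))
```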
To mimic "case-control" in GWAS, for each of the genotype calling results, each of the three population groups (European, African, and Asian) was assigned as "case" while the other two as "control" to form a data set for association analysis for identifying the SNPs significantly associated with the "case" population group.
In the association analyses, a 2 × 3 contingency table (case/control by the three genotypes) was generated for each SNP and case-control setting. The chi-square test was then applied to the contingency table to calculate a p-value measuring the statistical significance of the association between the tested SNP and the corresponding case-control setting. After the raw p-values for all SNPs in a data set were calculated, Bonferroni correction was applied to adjust them. Lastly, a Bonferroni-corrected p-value of less than 0.01 was used as the criterion to identify the significantly associated SNPs.
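A minimal sketch of the per-SNP test follows: a Pearson chi-square statistic on the 2 × 3 genotype table, its p-value, and a Bonferroni adjustment. The handling of empty genotype columns (dropping them and reducing the degrees of freedom) is our assumption, as are the function names and toy counts; the software used for this step is not named here.

```python
# Sketch: chi-square test on a 2 x 3 case/control-by-genotype table with Bonferroni correction.
import math

def chi2_2x3(case_counts, control_counts):
    """Return (statistic, degrees of freedom, p-value) for genotype counts (AA, AB, BB)."""
    table = [list(case_counts), list(control_counts)]
    # Assumption: genotype columns with no observations are dropped.
    cols = [j for j in range(3) if table[0][j] + table[1][j] > 0]
    row_tot = [sum(table[r][j] for j in cols) for r in range(2)]
    n = sum(row_tot)
    if n == 0 or min(row_tot) == 0 or len(cols) < 2:
        return 0.0, 0, 1.0  # no test possible
    stat = 0.0
    for j in cols:
        col_tot = table[0][j] + table[1][j]
        for r in range(2):
            expected = row_tot[r] * col_tot / n
            stat += (table[r][j] - expected) ** 2 / expected
    df = len(cols) - 1
    # Closed-form chi-square upper tail probabilities for 2 and 1 degrees of freedom.
    p = math.exp(-stat / 2) if df == 2 else math.erfc(math.sqrt(stat / 2))
    return stat, df, p

def bonferroni(p_values):
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Invented genotype counts (AA, AB, BB) for two SNPs: 90 cases vs 180 controls.
raw_p = [
    chi2_2x3((60, 25, 5), (40, 80, 60))[2],
    chi2_2x3((30, 40, 20), (60, 80, 40))[2],
]
adjusted = bonferroni(raw_p)
significant = [p < 0.01 for p in adjusted]
print(adjusted, significant)
```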
We thank Drs. Federico Goodsaid, Sue Jane Wang, and Li Zhang of CDER/FDA, Ansar Jawaid of AstraZeneca, David Craig of The Translational Genomics Research Institute, Uwe Scherf, Lakshmi Vishnuvajjala, Arkendra De, and Lakshman Ramamurthy of CDRH/FDA, Nick Xiao of Core Genotyping Facility/NCI, and Keith Nangle, Meg E. Ehm, and Gbenga R. Kazeem of GlaxoSmithKline for fruitful discussions. We are grateful to the reviewers for their comments and suggestions for revising and improving the paper. We also thank Dr. Tao Chen and Dr. Lei Guo for reading through the paper and their comments. The views presented in this article do not necessarily reflect those of the US Food and Drug Administration.
This article has been published as part of BMC Bioinformatics Volume 9 Supplement 9, 2008: Proceedings of the Fifth Annual MCBIOS Conference. Systems Biology: Bridging the Omics. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/9?issue=S9
1. The International HapMap Consortium: A haplotype map of the human genome. Nature 2005, 437: 1299–1320.
2. The International HapMap Consortium: A second generation human haplotype map of over 3.1 million SNPs. Nature 2007, 449: 851–862.
3. Klein RJ, et al.: Complement factor H polymorphism in age-related macular degeneration. Science 2005, 308: 385–389.
4. Duerr RH, et al.: A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science 2006, 314: 1461–1463.
5. Frayling TM, et al.: A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 2007, 316: 889–894.
6. Saxena R, et al.: Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride level. Science 2007, 316: 1331–1336.
7. Zeggini E, et al.: Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science 2007, 316: 1336–1341.
8. Scott L, et al.: A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 2007, 316: 1341–1345.
9. Sladek R, et al.: A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature 2007, 445: 881–885.
10. Easton DF, et al.: Genome-wide association study identifies novel breast cancer susceptibility loci. Nature 2007, 447: 1087–1093.
11. Wellcome Trust Case Control Consortium: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007, 447: 661–678.
12. Raelson JV, et al.: Genome-wide association study for Crohn's disease in the Quebec Founder Population identifies multiple validated disease loci. Proc Natl Acad Sci USA 2007, 104: 14747–14752.
13. Uda M, et al.: Genome-wide association study shows BCL11A associated with persistent fetal hemoglobin and amelioration of the phenotype of β-thalassemia. Proc Natl Acad Sci USA 2008, 105: 1620–1625.
14. Smyth DJ, et al.: A genome-wide association study of nonsynonymous SNPs identifies a type 1 diabetes locus in the interferon-induced helicase (IFIH1) region. Nature Genet 2006, 38: 617–619.
15. Hampe J, et al.: A genome-wide association scan of nonsynonymous SNPs identifies a susceptibility variant for Crohn disease in ATG16L1. Nature Genet 2007, 39: 207–211.
16. Rioux JD, et al.: Genome-wide association study identifies new susceptibility loci for Crohn disease and implicates autophagy in disease pathogenesis. Nature Genet 2007, 39: 596–604.
17. Gudmundsson J, et al.: Genome-wide association study identifies a second breast cancer susceptibility variant at 8q24. Nature Genet 2007, 39: 631–637.
18. Yeager M, et al.: Genome-wide association study of breast cancer identifies a second risk locus at 8q24. Nature Genet 2007, 39: 645–649.
19. van Heel DA, et al.: A genome-wide association study for celiac disease identifies risk variants in the region harbouring IL2 and IL21. Nature Genet 2007, 39: 827–829.
20. Todd JA, et al.: Robust associations of four new chromosome regions from genome-wide analysis of type 1 diabetes. Nature Genet 2007, 39: 857–864.
21. Hunter DJ, et al.: Genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nature Genet 2007, 39: 870–874.
22. Tomlinson I, et al.: A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nature Genet 2007, 39: 984–988.
23. Zanke BW, et al.: Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nature Genet 2007, 39: 989–994.
24. Buch S, et al.: A genome-wide association scan identifies the hepatic cholesterol transporter ABCG8 as a susceptibility factor for human gallstone disease. Nature Genet 2007, 39: 995–999.
25. Winkelmann J, et al.: Genome-wide association study of restless legs syndrome identifies common variants in three genomic regions. Nature Genet 2007, 39: 1000–1006.
26. Grupe A, et al.: Evidence for novel susceptibility genes for late-onset Alzheimer's disease from a genome-wide association study of putative functional variants. Hum Mol Genet 2007, 16: 865–873.
27. Cargill M, et al.: A large-scale genetic association study confirms IL12B and leads to the identification of IL23R as psoriasis-risk genes. Am J Hum Genet 2007, 80: 273–290.
28. Arking DE, et al.: A common genetic variant in the neurexin superfamily member CNTNAP2 increases familial risk of autism. Am J Hum Genet 2008, 82: 160–16.
29. Kayser M, et al.: Three Genome-wide Association Studies and a Linkage Analysis Identify HERC2 as a Human Iris Color Gene. Am J Hum Genet 2008, 82: 411–423.
30. Yang HH, Hu N, Taylor PR, Lee MP: Whole Genome-Wide Association Study Using Affymetrix SNP Chip: A Two-Stage Sequential Selection Method to Identify Genes That Increase the Risk of Developing Complex Diseases. Methods Mol Med 2008, 141: 23–35.
31. Butcher LM, Davis OS, Craig IW, Plomin R: Genome-wide quantitative trait locus association scan of general cognitive ability using pooled DNA and 500 K single nucleotide polymorphism microarrays. Genes Brain Behav 2008, 7(4): 435–446. [http://www.blackwell-synergy.com/doi/pdf/10.1111/j.1601-183X.2007.00368.x]
32. See the white paper on BRLMM of Affymetrix [http://www.affymetrix.com/support/technical/whitepapers/brlmm_whitepaper.pdf]
33. LaFramboise T, et al.: Allele-specific amplification in cancer revealed by SNP array analysis. PLoS Comput Biol 2005, 1: e65.
34. Nicolae DL, Wu X, Miake K, Cox NJ: GEL: a novel genotype calling algorithm using empirical likelihood. Bioinformatics 2006, 22: 1942–1947.
35. Carvalho B, Bengtsson H, Speed TP, Irizarry RA: Exploration, Normalization, and Genotype Calls of High Density Oligonucleotide SNP Array Data. Biostatistics 2007, 8: 485–499.
36. Hua J, et al.: SNiPer-HD: Improved genotype calling accuracy by an expectation-maximization algorithm for high-density SNP arrays. Bioinformatics 2007, 23: 57–63.
37. Xiao Y, Segal MR, Yang YH, Yeh RF: A multi-array multi-SNP genotyping algorithm for Affymetrix SNP microarrays. Bioinformatics 2007, 23(12): 1459–1467.
38. Liu WM, et al.: Algorithms for large scale genotyping microarrays. Bioinformatics 2003, 19: 2397–2403.
39. Di X, et al.: Dynamic model based algorithms for screening and genotyping over 100 K SNPs on oligonucleotide microarrays. Bioinformatics 2005, 21: 1958–1963.
40. Rabbee N, Speed TP: A genotype calling algorithm for Affymetrix SNP arrays. Bioinformatics 2006, 22: 7–12.
41. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Statist Soc B 1995, 57: 289–300.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.