Comparing genotyping algorithms for Illumina's Infinium whole-genome SNP BeadChips
- Matthew E Ritchie†1,2,
- Ruijie Liu†1,
- Benilton S Carvalho3,
- The Australia and New Zealand Multiple Sclerosis Genetics Consortium (ANZgene) and
- Rafael A Irizarry4
© Ritchie et al; licensee BioMed Central Ltd. 2011
Received: 8 November 2010
Accepted: 8 March 2011
Published: 8 March 2011
Illumina's Infinium SNP BeadChips are extensively used in both small and large-scale genetic studies. A fundamental step in any analysis is the processing of raw allele A and allele B intensities from each SNP into genotype calls (AA, AB, BB). Various algorithms which make use of different statistical models are available for this task. We compare four methods (GenCall, Illuminus, GenoSNP and CRLMM) on data where the true genotypes are known in advance and data from a recently published genome-wide association study.
In general, differences in accuracy are relatively small between the methods evaluated, although CRLMM and GenoSNP were found to consistently outperform GenCall. The performance of Illuminus is heavily dependent on sample size, with lower no call rates and improved accuracy as the number of samples available increases. For X chromosome SNPs, methods with sex-dependent models (Illuminus, CRLMM) perform better than methods which ignore gender information (GenCall, GenoSNP). We observe that CRLMM and GenoSNP are more accurate at calling SNPs with low minor allele frequency than GenCall or Illuminus. The sample quality metrics from each of the four methods were found to have a high level of agreement at flagging samples with unusual signal characteristics.
CRLMM, GenoSNP and GenCall can be applied with confidence in studies of any size, as their performance was shown to be invariant to the number of samples available. Illuminus on the other hand requires a larger number of samples to achieve comparable levels of accuracy and its use in smaller studies (50 or fewer individuals) is not recommended.
In the past decade, hundreds of studies investigating the genetics of common human diseases have been published. High-density SNP microarrays cataloguing variation identified in the HapMap project have been the enabling technology behind these large-scale genome-wide association studies. These microarrays allow the collection of genotypes for many SNPs in many individuals at relatively low cost. The two major producers of these microarrays are Affymetrix Inc. (Santa Clara, CA) and Illumina Inc. (San Diego, CA). The platforms offered by these companies differ substantially in terms of array fabrication, probe design, sample preparation and hybridization protocol. However, both currently genotype around 1 million SNPs per sample and also include non-polymorphic probes for assessing copy number variation in the genome.
Illumina's BeadChips have rapidly increased in both SNP density (from 100,000 to 1,000,000 SNPs) and in the number of samples processed in parallel (1, 2, 4, 8 or 12 per BeadChip) over the past few years. Illumina whole-genome SNP BeadChips use Infinium chemistry, which differentially labels allele A and allele B with red and green dye respectively [3, 4]. A number of algorithms are available for processing the raw signal from these arrays into genotype calls. These methods include: GenCall, Illumina's proprietary method implemented in the BeadStudio/GenomeStudio software; Illuminus; GenoSNP; CRLMM [8–10]; Birdseed, available in the Birdsuite software; and BeagleCall.
In this paper we compare the four widely applicable methods GenCall, Illuminus, GenoSNP and CRLMM on different data sets, measuring performance in terms of accuracy and the ability of each method to flag poor quality calls, SNPs and samples.
Summary of the genotyping algorithms compared.
GenCall is the standard vendor-provided method from Illumina, which is available as a module in the BeadStudio/GenomeStudio software. After reading in the data from binary files (idats) produced by Illumina's scanning system, normalization using an affine transform to rotate and re-scale the X and Y intensities is applied to decrease dependence between the two alleles. Normalization is performed separately for beads from different 'bead pools'. A 'bead pool' refers to a set of beads that have been manufactured together and are located in roughly the same physical position (strip) on a BeadChip. Polar coordinates (R, θ) are calculated from the normalized X and Y values. Clustering is performed by the GenTrain algorithm, which is a between-sample model. Neural networks which take the polar coordinate transformed data and estimate the SNP-specific centroids for each genotype are used. Default cluster centroids are calculated using data from a set of HapMap samples (Table 2). Alternatively, users may perform clustering using the available samples to calibrate the cluster positions to the data. Genotypes are then assigned by determining the nearest cluster. The GenCall score (GC) is a confidence measure assigned to each call which can be used to filter poor quality calls, SNPs or samples. Illumina generally recommends treating calls with GC ≤ 0.15 as failed genotypes. Averaged GC scores over all SNPs from a given sample, or across all samples for a given SNP, can be used as sample or SNP quality metrics. A more commonly used sample quality metric is the 'no call rate'. For GenCall, genotypes with GC score less than a given threshold (0.15 in our analyses) are declared as missing. The proportion of missing values, or 'no calls', in each sample gives the no call rate; samples with higher rates are deemed less reliable than samples with lower rates. No call rates less than 1% should be expected for good quality samples which have been properly processed (Illumina Technical Support, personal communication).
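The polar transform and the no call rate described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Illumina's implementation: the function names are hypothetical, and the transform assumes the commonly described convention R = X + Y and θ = (2/π)·arctan(Y/X), which the text above does not spell out. The no call rate follows the "less than the threshold" rule used in our analyses.

```python
import math


def to_polar(x, y):
    """Map normalized allele intensities (x, y) to (R, theta).

    Assumed convention (not stated in the text): R = x + y and
    theta = (2 / pi) * atan2(y, x), so theta is 0 for pure allele A
    signal and 1 for pure allele B signal.
    """
    r = x + y
    theta = (2.0 / math.pi) * math.atan2(y, x)
    return r, theta


def no_call_rate(gc_scores, threshold=0.15):
    """Proportion of calls in one sample with GC score below the threshold."""
    if not gc_scores:
        raise ValueError("gc_scores must not be empty")
    missing = sum(1 for gc in gc_scores if gc < threshold)
    return missing / len(gc_scores)
```

For example, a sample with GC scores 0.9, 0.1, 0.5 and 0.14 has two of its four calls below 0.15, giving a no call rate of 0.5.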
A second alternative, named Illuminus, uses GenCall normalized X and Y values as input. It models the data from each SNP using a four component mixture model which is fitted using an Expectation Maximization (EM) algorithm to the strength (log(X_ij + Y_ij)) and contrast ((X_ij - Y_ij)/(X_ij + Y_ij)) values to summarize the four possible states (AA, AB, BB or NC for no call). The indices i and j refer to sample and SNP respectively. Probabilities (p_ijk, where k = 1, ..., 4 is the genotype index) indicating how likely it is that a given call is correct under the model are also available. The genotype with the highest probability is the call reported to the user, and the probability provides a call confidence measure. Illuminus fits a separate three component model for X chromosome SNPs in male samples. A perturbation score is also calculated to quantify how sensitive the clustering is to changes in the initial values used in the EM-algorithm. This score serves as a SNP quality measure, and a cut-off of 0.95 and above, which equates to 95% or more of the calls agreeing after perturbation, is recommended in the Illuminus documentation. Sample quality can be measured by the percentage of calls with a posterior probability less than a threshold (0.95 is recommended). Alternatively, the percentage of no calls (NC or genotype index k = 4) obtained for each sample can be used as a sample quality indicator. The Illuminus software is implemented in C and is available from the authors on request.
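To make the Illuminus inputs and outputs concrete, the following Python sketch computes the strength and contrast values for one sample and SNP, picks the call with the highest probability, and computes the sample quality measure based on low-confidence calls. It does not fit the mixture model itself. The function names are hypothetical, the natural logarithm is an assumption (the base is not stated above), and the ordering of the labels for k = 1, ..., 4 is assumed to be AA, AB, BB, NC.

```python
import math


def strength_contrast(x, y):
    """Return (strength, contrast) for normalized intensities x and y.

    strength = log(x + y); contrast = (x - y) / (x + y).
    The natural log is an assumption of this sketch.
    """
    total = x + y
    if total <= 0:
        raise ValueError("x + y must be positive")
    return math.log(total), (x - y) / total


def call_from_probabilities(probs, labels=("AA", "AB", "BB", "NC")):
    """Report the state with the highest probability and that probability."""
    if len(probs) != len(labels):
        raise ValueError("one probability per label is required")
    best = max(range(len(probs)), key=lambda k: probs[k])
    return labels[best], probs[best]


def low_confidence_fraction(call_probs, threshold=0.95):
    """Fraction of a sample's calls whose probability is below the threshold."""
    if not call_probs:
        raise ValueError("call_probs must not be empty")
    return sum(1 for p in call_probs if p < threshold) / len(call_probs)
```

Under these definitions, contrast lies between -1 (all signal from allele B) and 1 (all signal from allele A), with heterozygotes expected near 0.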
A third method, GenoSNP, is the only method which fits a within-sample model to the data. GenoSNP uses the raw (non-normalized) X and Y intensities from GenCall, which are separated by bead pool and then quantile normalized within sample. A four component mixture model similar to Illuminus is then fitted to the normalized log2(X_ij + 1) and log2(Y_ij + 1) values. SNPs from the same bead pool within a given sample are called simultaneously using the model. This approach is quite different to the other methods, which use between-sample information to fit the model. In GenoSNP, a posterior probability is available for each call indicating how likely it is that the call comes from the assigned class. This value serves as a call confidence measure. The average posterior probability across all samples for a given SNP may be used to filter SNPs, with lower average probabilities indicative of SNPs with poorer clustering under the model. A SNP cut-off of 0.95 or higher is recommended for good quality data sets, and 0.8 or higher for lower quality data sets. Likewise, the average posterior probability of all calls from a given sample can be used as a sample quality metric. A sample quality threshold of 0.9 or higher is recommended. The GenoSNP software is implemented in C and is available from the authors on request.
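The within-sample pre-processing described above can be illustrated with a short Python sketch. It assumes that quantile normalization is applied between the X and Y channels of a single bead pool in a single sample, which is one reading of the description above and not a confirmed detail of the GenoSNP code. The function names are hypothetical and ties are broken by position instead of being averaged.

```python
import math


def quantile_normalize_pair(x, y):
    """Quantile normalize two equal-length intensity vectors to each other.

    The reference distribution is the mean of the two channels at each rank.
    Ties are broken by position, a simplification made for this sketch.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    order_x = sorted(range(n), key=lambda i: x[i])
    order_y = sorted(range(n), key=lambda i: y[i])
    x_norm = [0.0] * n
    y_norm = [0.0] * n
    for i, j in zip(order_x, order_y):
        # i and j hold the same rank in their respective channels
        ref = (x[i] + y[j]) / 2.0
        x_norm[i] = ref
        y_norm[j] = ref
    return x_norm, y_norm


def log2_plus_one(values):
    """Apply the log2(v + 1) transform used before model fitting."""
    return [math.log2(v + 1.0) for v in values]
```

After this step both channels share the same empirical distribution, so differences between log2(X_ij + 1) and log2(Y_ij + 1) reflect rank within the bead pool and not overall dye intensity.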
The final method in our comparison, CRLMM, was originally developed for Affymetrix data [8, 9] and has recently been adapted to suit Illumina's Infinium SNP BeadChips. CRLMM extracts summarized X and Y intensities directly from the idat files. For normalization, SNPs are separated based on their physical location (strip) on the BeadChip surface and simultaneously quantile normalized between channels (X and Y) and samples, using the reference distribution obtained from the HapMap training samples (Table 2). Each strip contains SNPs from multiple bead pools. After normalization, SNP-specific log-ratios (M_ij = log2(X_ij) - log2(Y_ij)) and average intensities (S_ij = (log2(X_ij) + log2(Y_ij))/2) are calculated for each array. To remove intensity-dependent effects of S on M, a three-component mixture model with smoothing splines is fitted to each array via the EM-algorithm. Next, a two-level hierarchical model, with SNP-specific means and standard deviations estimated from the relevant training data set using genotype information from the HapMap project, is fitted. The intensity-dependent splines and the SNP-specific genotype means and standard deviations are combined in the model [8, 9]. In general, the model assumes 3 clusters, except for X chromosome SNPs in male samples, where a 2 cluster model is used. Genotype calls are assigned by choosing the class that minimizes the negative log likelihood. CRLMM produces a number of quality assessment measures [9, 13]. Per call confidence is measured using the log-likelihood ratio test from the hierarchical model. At the SNP level, the minimum distance between the heterozygote center and either of the two homozygous centers provides a SNP confidence score. A signal-to-noise ratio (SNR) for each sample assesses the separation of the three major genotype clusters within an array, with lower values indicative of poorer quality data. The CRLMM method is implemented in R and is available as part of the Bioconductor project.
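The two summary values used by CRLMM, and the SNP-level separation score, follow directly from the definitions above and can be sketched in Python. This is not the CRLMM code (which is written in R); the function names are hypothetical, and measuring the cluster centers on the log-ratio (M) scale is an assumption of this sketch.

```python
import math


def log_ratio_and_average(x, y):
    """Return (M, S) for positive normalized intensities x and y.

    M = log2(x) - log2(y) is the log-ratio and
    S = (log2(x) + log2(y)) / 2 is the average intensity.
    """
    if x <= 0 or y <= 0:
        raise ValueError("intensities must be positive")
    lx, ly = math.log2(x), math.log2(y)
    return lx - ly, (lx + ly) / 2.0


def snp_separation(center_aa, center_ab, center_bb):
    """Minimum distance from the heterozygote center to a homozygous center.

    Centers are assumed to be expressed on the M scale.
    """
    return min(abs(center_aa - center_ab), abs(center_ab - center_bb))
```

For instance, intensities X = 8 and Y = 2 give M = 2 and S = 2, and cluster centers at 2.0, 0.0 and -1.5 give a separation of 1.5, since the heterozygote cluster lies closer to the BB cluster.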
None of the methods compared make calls for the non-polymorphic copy number specific probes which are available on many Infinium chip types.
Each of the four algorithms was applied to the data sets described in the following sections.
Table 2: Summary of the HapMap samples analyzed by each algorithm. Chip types: 370 k Duo, 1 m Duo, 370 k Quad, 610 k Quad and 660 k Quad.
Association study data
Table 3: The number of samples analyzed from the MS-GWAS.
Results and Discussion
Comparing accuracy using HapMap data
The accuracy versus drop rate calculations were repeated using per SNP quality measures instead of individual call confidence measures to filter entire SNPs from the analysis (Additional File 1: Supplemental Figure S3). The Illuminus perturbation score for SNP quality gives very similar accuracy to CRLMM's cluster separation metric when large numbers of samples are available (Additional File 1: Supplemental Figures S3A, S3C and S3E), while the average per SNP posterior probability of GenoSNP is slightly less accurate than these methods. For smaller sample sizes, Illuminus does less well. These measures are superior to GenCall's average GC score.
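An accuracy versus drop rate calculation of the kind referred to above can be sketched as follows. This is a hypothetical illustration, not the code used for the analysis: it assumes that higher scores indicate greater confidence, that the least confident fraction of calls is discarded first, and that accuracy is the proportion of retained calls agreeing with the known genotypes.

```python
def accuracy_at_drop_rate(calls, truth, confidence, drop_rate):
    """Accuracy of the calls kept after dropping the least confident ones.

    calls, truth and confidence are parallel sequences; drop_rate is the
    fraction of calls to discard (0 <= drop_rate < 1), rounded down to a
    whole number of calls.
    """
    if not (len(calls) == len(truth) == len(confidence)):
        raise ValueError("inputs must have the same length")
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    n = len(calls)
    n_drop = int(drop_rate * n)
    # Sort call indices from least to most confident, then keep the tail
    order = sorted(range(n), key=lambda i: confidence[i])
    kept = order[n_drop:]
    if not kept:
        raise ValueError("no calls left after filtering")
    correct = sum(1 for i in kept if calls[i] == truth[i])
    return correct / len(kept)
```

Evaluating this function over a grid of drop rates gives one curve per method. The same function applies to per SNP filtering if each call is assigned the quality score of its SNP.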
Higher-level performance assessment
The HapMap data sets analyzed are of very high quality and not subject to the same sources of variation that affect data from genome-wide association studies. In large projects, the collection of samples and genotypes may occur over a long period of time and arrays may be processed by multiple laboratories or core facilities. We examine data from the MS-GWAS where samples have been collected from different centers and processed in batches (Table 3). GenCall, Illuminus and CRLMM were each run independently on the different batches, while GenoSNP was run one sample at-a-time. For GenCall, re-clustering was carried out by the GenTrain algorithm using the samples available instead of the default HapMap cluster information. We use these data to assess how well each method performs at flagging samples of dubious quality.
Summary of the computing resources required by each method: time taken (mins) and peak memory usage (GB), reported for software including GenomeStudio (v 1.1.0) and CRLMM (v 1.2.4).
Our study represents the largest comparison of genotyping methods for Illumina's Infinium BeadChip platform to date. We examined the performance on data sets varying in size from tens to nearly 2000 samples from a wide range of chip types.
Despite the differences in approach, the four methods compared generally offer similar performance in terms of accuracy with high quality HapMap data (> 99% agreement), when call or SNP-specific quality scores were used to filter data. CRLMM is marginally better than GenoSNP and Illuminus (when sample size is large enough), followed by GenCall. Each method also gives high concordance between replicate samples (> 99% on average). Variations in the ability of different methods to correctly recover calls from SNPs with low minor allele frequency were observed, with CRLMM and GenoSNP outperforming GenCall and Illuminus for SNPs with the lowest MAF. This points to the benefit of borrowing information between SNPs. In GenoSNP, this is done explicitly by using the many observations from a given bead pool to estimate parameters in the mixture model and assign genotypes. For CRLMM, there will be little information from the training data set on the heterozygous and homozygous cluster locations involving the minor allele. However, since the SNP-specific parameters are updated by an empirical Bayes shrinkage procedure, more weight will be placed on the priors in these situations. These priors are derived from other SNPs in the data set. Both approaches cope better than methods which model the data from each SNP independently (GenCall and Illuminus) when MAF is low. This issue will be important as arrays include more rare variants (MAF < 5%), such as SNPs discovered in the 1000 Genomes Project.
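For readers unfamiliar with the quantity, the minor allele frequency of a SNP can be computed from its genotype calls as in the Python sketch below. The function name is hypothetical, and treating any label other than AA, AB or BB as a no call that is excluded from the count is an assumption of this sketch.

```python
def minor_allele_frequency(calls):
    """Minor allele frequency of one SNP from calls such as "AA", "AB", "BB".

    Any other label (for example "NC") is treated as a no call and skipped.
    """
    n_a = 0
    n_b = 0
    for call in calls:
        if call == "AA":
            n_a += 2
        elif call == "AB":
            n_a += 1
            n_b += 1
        elif call == "BB":
            n_b += 2
    total = n_a + n_b
    if total == 0:
        raise ValueError("no usable genotype calls")
    return min(n_a, n_b) / total
```

With 19 AA calls and a single AB call, only 1 of 40 alleles is B, giving a MAF of 0.025. A method that models each SNP independently then has a single observation from which to locate the heterozygote cluster, and none for the BB cluster.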
We observed that the performance of Illuminus depends upon the number of samples available for the analysis, with larger sample sizes (≥50) giving better results in terms of no call rate and accuracy. For genome-wide association studies, low sample numbers are not likely to be a problem; however, for linkage studies, which are often much smaller (< 10 samples), Illuminus would not be the method of choice unless the samples can be analyzed within a larger batch of the same chip type. All other methods can handle data from small-scale projects without compromising performance.
We note that relative to the time expended recruiting and collecting samples and processing arrays, the time taken to run each algorithm is insignificant, with slightly longer processing times unlikely to be a major factor affecting the choice of method. The ability to parallelize genotyping between multiple processors is a simple way to reduce the time taken to process samples. All four algorithms allow parallelization. By default, GenomeStudio divides the analysis between the available processors, splitting on sample or SNP depending upon the stage of the analysis. For GenoSNP, which processes samples one at a time, parallelization is trivial; the user can easily divide the samples between the processors available. For Illuminus and CRLMM, the between-sample nature of the modelling means that parallelization requires SNPs to be split between processors. This feature is available as an option in both algorithms. In CRLMM, the parallelization is handled using the snow package in R.
As for timing, researchers involved in large-scale studies are likely to have access to high performance computing facilities, which means that the large memory requirements of methods like CRLMM, and to a lesser extent Illuminus, are not likely to pose a limitation. In the most recent version of CRLMM, the memory footprint can be reduced through use of the ff package in R. When RAM is limited, this package uses available disk space instead of RAM to store the raw data and genotyping output.
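The splitting step that underlies both forms of parallelization (by sample for GenoSNP, by SNP for Illuminus and CRLMM) can be sketched in Python as below. This is a generic, hypothetical helper and does not reproduce how GenomeStudio, Illuminus or the snow package divide the work; it only shows one way to partition item indices into near-equal contiguous blocks.

```python
def split_indices(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous, near-equal chunks.

    The first (n_items % n_workers) chunks receive one extra item.
    """
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    base, extra = divmod(n_items, n_workers)
    chunks = []
    start = 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks
```

Each chunk can then be passed to a separate process. For a within-sample method the items are samples, so no information needs to be shared between workers. For a between-sample method the items are SNPs, and every worker needs the intensities of all samples for its block of SNPs.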
One drawback of the current implementation of CRLMM is its reliance on training data to calibrate the model parameters, which means that for customized genotyping, or genotyping in non-model organisms (such as cow, pig and chicken), it cannot be applied due to a lack of availability of HapMap-like training data. We are currently investigating modifications to CRLMM to ensure it can be applied in such settings. While GenCall also includes a training step on HapMap data for the chip types analyzed in this paper, it can also work in an unsupervised manner, where it estimates cluster centers using the data available without the need for any prior information. Illuminus and GenoSNP can also be used on BeadChips containing customized human SNP sets or SNPs from other diploid organisms.
Further work would be to extend the comparison to include newer genotyping methods, such as BeagleCall, which adds an extra layer of haplotype information to the genotype calling process. The improvements offered by the recently released update to the GenTrain clustering algorithm (version 2) are also of interest. GenTrain2 was not used in this study, as output from this software was unavailable for any of the data sets analyzed. Since most studies published to date will be based on the older version of GenCall, our comparison is still relevant.
The full list of authors and affiliations for the ANZgene Consortium is as follows:
Study design and management committee: Melanie Bahlo1, David R Booth6, Simon A Broadley7,8, Matthew A Brown9,10, Simon J Foote11, Lyn R Griffiths12, Trevor J Kilpatrick13-15, Jeanette Lechner-Scott16,17, Pablo Moscato17,18, Victoria M Perreau13, Justin P Rubio14, Rodney J Scott16-18, Jim Stankovich11, Graeme J Stewart6, Bruce V Taylor11, James Wiley19 (Chair).
Sample processing, data handling and genotyping: Matthew A Brown9,10, David R Booth6, Glynnis Clarke20, Mathew B Cox17,18, Peter A Csurhes21, Patrick Danoy9, Joanne L Dickinson11, Karen Drysdale11, Judith Field14, Simon J Foote11, Judith M Greer21, Lyn R Griffiths12, Preethi Guru11, Johanna Hadler9, Ella Hoban11, Brendan J McMorran11, Cathy J Jensen14, Laura J Johnson14, Ruth McCallum22, Marilyn Merriman22, Tony Merriman22, Andrea Polanowski11, Karena Pryce9, Rodney J Scott16-18, Graeme J Stewart6, Lotfi Tajouri12, Lucy Whittock11, Ella J Wilkins14, Justin P Rubio14 (Chair).
Data analysis: Melanie Bahlo1, Matthew A Brown9,10, Brian L Browning23, Sharon R Browning23, Devindri Perera11, Justin P Rubio14, Jim Stankovich11 (Chief analyst).
Phenotyping: Simon Broadley7,8, Helmut Butzkueven14,24, William M Carroll25,26, Caron Chapman27, Allan G Kermode25,26, Mark Marriott15, Deborah Mason28, Robert N Heard6, Michael P Pender29,30, Mark Slee31, Niall Tubridy32, Jeanette Lechner-Scott16,17, Bruce V Taylor11, Ernest Willoughby33, Trevor J Kilpatrick13-15 (Chair).
Addresses: 6The Westmead Millenium Institute, Westmead, New South Wales, Australia. 7School of Medicine, Griffith University, Queensland, Australia. 8Department of Neurology, Gold Coast Hospital, Queensland, Australia. 9Diamantina Institute of Cancer, Immunology and Metabolic Medicine, Princess Alexandra Hospital, University of Queensland, Brisbane, Queensland, Australia. 10Botnar Research Centre, Nuffield Department of Orthopaedic Surgery, University of Oxford, Oxford, UK. 11Menzies Research Institute, University of Tasmania, Hobart, Tasmania. 12Genomics Research Centre, Griffith University, Queensland, Australia. 13Centre for Neuroscience, University of Melbourne, Victoria, Australia. 14The Howard Florey Institute, University of Melbourne, Victoria, Australia. 15Royal Melbourne Hospital, Parkville, Victoria, Australia. 16John Hunter Hospital, Hunter New England Health Service, Newcastle, New South Wales, Australia. 17Hunter Medical Research Institute, Newcastle, New South Wales, Australia. 18Centre for Bioinformatics, Biomarker Discovery and Information-based Medicine, University of Newcastle, New South Wales, Australia. 19Department of Medicine, Nepean Hospital, Penrith, New South Wales, Australia. 20Christchurch School of Medicine and Health Sciences, University of Otago, New Zealand. 21UQ Centre for Clinical Research, University of Queensland, Queensland, Australia. 22Department of Biochemistry, University of Otago, Dunedin, New Zealand. 23Department of Statistics, The University of Auckland, Auckland, New Zealand. 24Department of Neurology, Box Hill Hospital, Victoria, Australia. 25Sir Charles Gairdner Hospital, Nedlands, Western Australia, Australia. 26Australian Neuromuscular Research Institute, Nedlands, West Australia, Australia. 27Barwon Health, Geelong, Victoria, Australia. 28Canterbury District Health Board, Christchurch, New Zealand. 29School of Medicine, University of Queensland, Queensland, Australia.
30Department of Neurology, Royal Brisbane and Women's Hospital, Queensland, Australia. 31School of Medicine, Department of Neurology, Flinders University, Bedford Park, Adelaide, South Australia, Australia. 32Department of Neurology, St. Vincent's University Hospital, Dublin, Republic of Ireland. 33Auckland District Healthboard, Auckland, New Zealand.
We thank Patrick Danoy for providing the raw data from the MS-GWAS and for providing information on the computing resources used by GenCall; Melanie Bahlo and Jim Stankovich for sample annotation information and useful insights into the MS-GWAS data; Dan Peiffer from Illumina Inc. for providing access to their in-house HapMap data; Mike Inouye for providing the Illuminus software and advice on its use; Eleni Giannoulatou for providing the GenoSNP software and advice on its use; Illumina's Technical Support for answering various questions on Infinium technology and the GenCall algorithm; Marvin Newhouse and Jiong Yang for maintaining the computing environment used for the analysis; Keith Satterley for help in generating several data packages used by CRLMM; Marc Carlson for making the CRLMM data packages available through Bioconductor; Terry Speed for feedback on the manuscript; and the anonymous reviewers whose comments also improved the final manuscript.
This work was supported by NHMRC Program grant 406657, NHMRC IRIISS grant 361646 and a Victorian State Government OIS grant (MER, RL), and NIH grants R01GM083084, R01RR021967 and P41HG004059 (BSC, RAI).
- Yu W, Gwinn M, Clyne M, Yesupriya A, Khoury MJ: A navigator for human genome epidemiology. Nat Genet 2008, 40: 124–5. 10.1038/ng0208-124
- International HapMap Consortium: A second generation human haplotype map of over 3.1 million SNPs. Nature 2007, 449: 851–61. 10.1038/nature06258
- Steemers F, Chang W, Lee G, Barker D, Shen R, Gunderson K: Whole-genome genotyping with the single-base extension assay. Nat Methods 2006, 3: 31–3. 10.1038/nmeth842
- Peiffer D, Le J, Steemers F, Chang W, Jenniges T, Garcia F, Haden K, Li J, Shaw C, Belmont J, Cheung S, Shen R, Barker D, Gunderson K: High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Res 2006, 16: 1136–48. 10.1101/gr.5402306
- Kermani BG: Artificial intelligence and global normalization methods for genotyping. 2008. [http://www.patentstorm.us/patents/7467117/fulltext.html]
- Teo Y, Inouye M, Small K, Gwilliam R, Deloukas P, Kwiatkowski D, Clark T: A genotype calling algorithm for the Illumina BeadArray platform. Bioinformatics 2007, 23: 2741–6. 10.1093/bioinformatics/btm443
- Giannoulatou E, Yau C, Colella S, Ragoussis J, Holmes C: GenoSNP: a variational Bayes within-sample SNP genotyping algorithm that does not require a reference population. Bioinformatics 2008, 24: 2209–14. 10.1093/bioinformatics/btn386
- Carvalho B, Bengtsson H, Speed T, Irizarry R: Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data. Biostatistics 2007, 8: 485–99. 10.1093/biostatistics/kxl042
- Lin S, Carvalho B, Cutler D, Arking D, Chakravarti A, Irizarry R: Validation and extension of an empirical Bayes method for SNP calling on Affymetrix microarrays. Genome Biol 2008, 9: R63. 10.1186/gb-2008-9-4-r63
- Ritchie M, Carvalho B, Hetrick K, Tavaré S, Irizarry R: R/Bioconductor software for Illumina's Infinium whole-genome genotyping BeadChips. Bioinformatics 2009, 25: 2621–3. 10.1093/bioinformatics/btp470
- Korn J, Kuruvilla F, McCarroll S, Wysoker A, Nemesh J, Cawley S, Hubbell E, Veitch J, Collins P, Darvishi K, Lee C, Nizzari M, Gabriel S, Purcell S, Daly M, Altshuler D: Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nat Genet 2008, 40: 1253–60. 10.1038/ng.237
- Browning B, Yu Z: Simultaneous genotype calling and haplotype phasing improves genotype accuracy and reduces false-positive associations for genome-wide association studies. Am J Hum Genet 2009, 85: 847–61. 10.1016/j.ajhg.2009.11.004
- Carvalho B, Louis T, Irizarry R: Quantifying uncertainty in genotype calls. Bioinformatics 2010, 26: 242–9. 10.1093/bioinformatics/btp624
- R Development Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna; 2010. [http://www.R-project.org]
- Gentleman R, Carey V, Bates D, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini A, Sawitzki G, Smith C, Smyth G, Tierney L, Yang J, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 2004, 5: R80. 10.1186/gb-2004-5-10-r80
- HapMart (version 21, NCBI Build 35) [http://hapmart.hapmap.org/BioMart/martview/]
- The Australia and New Zealand Multiple Sclerosis Genetics Consortium (ANZgene): Genome-wide association study identifies new multiple sclerosis susceptibility loci on chromosomes 12 and 20. Nat Genet 2009, 41: 824–8. 10.1038/ng.396
- Bahlo M, Stankovich J, Danoy P, Hickey P, Taylor B, Browning SR, Australian and New Zealand Multiple Sclerosis Genetics Consortium (ANZgene), Brown M, Rubio JP: Saliva-derived DNA performs well in large-scale, high-density single-nucleotide polymorphism microarray studies. Cancer Epidemiol Biomarkers Prev 2010, 19: 794–8. 10.1158/1055-9965.EPI-09-0812
- The 1000 Genomes Project Consortium: A map of human genome variation from population-scale sequencing. Nature 2010, 467: 1061–73. 10.1038/nature09534
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.