 Software
 Open Access
 Published:
Efficient analysis of large datasets and sex bias with ADMIXTURE
BMC Bioinformatics volume 17, Article number: 218 (2016)
Abstract
Background
A number of large genomic datasets are being generated for studies of human ancestry and diseases. The ADMIXTURE program is commonly used to infer individual ancestry from genomic data.
Results
We describe two improvements to the ADMIXTURE software. The first enables ADMIXTURE to infer ancestry for a new set of individuals using cluster allele frequencies from a reference set of individuals. Using data from the 1000 Genomes Project, we show that this allows ADMIXTURE to infer ancestry for 10,920 individuals in a few hours (a 5 × speedup). This mode also allows ADMIXTURE to correctly estimate individual ancestry and allele frequencies from a set of related individuals. The second modification allows ADMIXTURE to correctly handle Xchromosome (and other haploid) data from both males and females. We demonstrate increased power to detect sexbiased admixture in AfricanAmerican individuals from the 1000 Genomes project using this extension.
Conclusions
These modifications make ADMIXTURE more efficient and versatile, allowing users to extract more information from large genomic datasets.
Background
The ADMIXTURE program [1] estimates individual ancestry proportions for admixed individuals from genomic datasets. It uses a likelihood model [2] that assumes the diploid genotype n _{ ij } for individual i at biallelic SNP j, which represents the number of type “1” alleles observed, is generated by binomial sampling from a weighted sum of ancestral allele frequencies. For each individual, the weights are given by the proportions of ancestry derived from each ancestral population. Given K ancestral populations, genotypes are sampled as \(n_{ij} \sim \text {Binomial }\left (2, \sum _{k=1}^{K} q_{ik}p_{kj}\right)\) where q _{ ik } the fraction of individual i’s ancestry attributable to population k and p _{ kj } is the frequency of the type 1 allele at SNP j in population k. ADMIXTURE maximizes the resulting biconcave loglikelihood (Eq. 1) using a block relaxation algorithm.
We describe two extensions to the ADMIXTURE program that accelerate the analysis of large datasets and enable ancestry estimation for sex chromosomes. The first extension (“projection”) allows ADMIXTURE to estimate ancestry for a new set of individuals using ancestral populations from an earlier ADMIXTURE run. It enables efficient inference of ancestry on large genomic datasets using ancestral populations estimated from reference panels like the 1000 Genomes Project. It can also be used to correctly infer individual ancestry in pedigrees. The second extension allows ADMIXTURE to model the loglikelihood for haploid chromosomes. This can be used to correctly estimate ancestry on sex chromosomes and therefore estimate sex bias in ancestry between the autosomes and sex chromosomes. We demonstrate the utility of these extensions using data from the 1000 Genomes Project [3] and the HapMap Project [4].
Implementation
Projecting new samples on existing population structure
A number of large genomewide datasets of human populations such as the HapMap Project, 1000 Genomes Project etc. are now publicly available. Many studies (e.g. [5]) use these datasets as reference panels in combination with the study sample to estimate individual ancestry using ADMIXTURE since these large datasets summarize worldwide human population structure. For study samples which do not include a novel population, an efficient way of estimating individual ancestry is to “project” the new samples on to the population structure learned from the reference panels. This is intuitively similar to the projection operation used in principal components analysis, though the mathematical details differ. We extended the ADMIXTURE code to allow loading of trained models (the.P files with cluster allele frequencies). For two datasets with the same set of SNPs, clusters can be learned using the unsupervised mode of ADMIXTURE on the first dataset and ancestry proportions can be inferred for the second dataset using these learned clusters. The same approach can be used to infer ancestry on a set of related individuals. First, we infer the largest set of unrelated individuals in the dataset using pedigree information or methods such as PLINK [6], KING [7] or PRIMUS [8]. Then, ADMIXTURE is run on this set in unsupervised mode and the remaining individuals are projected on the resulting population structure.
Mathematically, this requires solving the likelihood maximization problem of Eq. 1 with respect to Q for a fixed P. This problem can be solved efficiently using the optimization described by Alexander et al. [1].
Analyzing haploid sexchromosomes
Admixture between populations is often sexbiased, i.e., different proportions of males and females from the source populations contribute to the admixed populations. In human populations, sexbiased admixture has been observed in AfricanAmericans and Latinos, often using evidence from Ychromosome or mitochondrial DNA [9–11]. An alternative way to study sexbiased admixture is to examine individual ancestry estimates on the autosomes vs the sex chromsomes [5, 12]. Therefore, we are interested in inferring individual ancestry using ADMIXTURE on the sex chromosomes, in particular on the haploid Xchromosome in males.
For a haploid sexchromosome SNP, we assume that hemizygous genotypes are coded as homozygotes for the observed allele. Then, using the same notation as before, genotypes can be sampled as \(\frac {n_{ij}}{2} \sim \text {Binomial }(1, \sum _{k=1}^{K} q_{ik}p_{kj})\). The corresponding loglikelihood for a haploid sexchromosome SNP in an individual is half of that for a homozygous autosomal diploid SNP in Eq. 1, as described in Eq. 2. We account for this in ADMIXTURE by keeping track of the sex of each individual and the chromosome each SNP belongs to and adjusting the loglikelihood accordingly.
To enable correct handling of haploid sexchromosomes in multiple species, we implemented the haploid option, which takes a single colonseparated argument describing the haploid sexes and the haploid chromosomes. For instance, for human data, sexchromsomes can be supplied as an argument for ADMIXTURE as haploid=“male:23,24” with 23 and 24 representing the X and Y chromosomes respectively.
Results
We demonstrate the utility of the newly implemented options using experiments on human genomic datasets.
Using reference panels for inferring ancestry proportions with projection
We duplicated data from Phase 1 of the 1000 Genomes Project to create a dataset with 10,920 individuals. The data was filtered to include only SNPs with minor allele frequency (MAF) ≥5 % and thinned for linkage disequilibrium (LD) to have pairwise r ^{2}≤0.1 in 50 kb windows. We compared the running time and accuracy of two analyses, with the number of clusters (K) ranging from 2 to 10:

Unsupervised: Unsupervised ADMIXTURE was run on the entire dataset of 10,920 individuals.

Projection: Unsupervised ADMIXTURE was first run on the original 1092 individuals from the 1000 Genomes Project and the remaining 9828 individuals were projected on to the learned population structure.
Each analysis was performed with 5 random starts, with running time limited to 72 h. All experiments were run on a single core of a server with Xeon E52660 processors, using 3.7 GB memory.
Figure 1 shows the comparison of running times for ADMIXTURE on the 10,920 individuals using the two approaches. The projection approach is much faster than unsupervised ADMIXTURE, with speed gains increasing with K, the number of clusters. We find that the ancestry proportions inferred using both approaches are identical.
Comparison with iAdmix
The projection step we describe has been recently independently implemented by Bansal et al. [13] in the software iAdmix, using a different optimization algorithm. We compared our ADMIXTURE projection implementation to the iAdmix projection implementation by running unsupervised ADMIXTURE on the first 1092 individuals from the previous analysis and using the learned allele frequencies to infer ancestry for the remaining 9828 (copied) individuals by projection using either ADMIXTURE or iAdmix. Figure 2 shows that projection using ADMIXTURE is approximately 4 times faster than using iAdmix^{1}.
Ancestry estimation for related individuals using projection
ADMIXTURE infers individual ancestry proportion and ancestral population allele frequencies simultaneously in an alternating optimization [1]. Inferring allele frequencies (AF) from related individuals without suitable correction for relatedness can lead to high variance in estimates [14]. We demonstrate that relatedness can affect the inferred population clusters when ADMIXTURE is run on related individuals using the CEPH (Utah residents with ancestry from northern and western Europe, CEU) and Yoruba in Ibadan, Nigeria (YRI) individuals from HapMap Phase 3. We also show how projection can be used to obtain more accurate AF estimates.
We used 165 CEU individuals (112 unrelated and 53 related) and 113 unrelated YRI indviduals to construct a dataset with 278 individuals. After filtering for LD (r ^{2}<0.2) and MAF >0.05, the dataset had 180,591 SNPs. The dataset then was then analyzed using ADMIXTURE with K=2 population clusters in two ways:

All individuals: ADMIXTURE was run on the entire dataset.

Unrelated individuals: The dataset was divided into two sets  one containing only the 225 unrelated CEU and YRI individuals and another containing the 53 related CEU individuals. ADMIXTURE was run on the unrelated set. The related individuals were then projected on the allele frequencies inferred from the unrelated set.
For both analyses, we then compared the inferred allele freqencies for the European components to AF estimates from the Exome Aggregation Consortium (ExAC [15]) data at a common set of 939 SNPs (with frequency between 5 and 95 % in ExaAC). We find that European component AF estimates are closer to ExAC allele frequencies for the unrelated analysis (root mean square error = 0.040) than for the analysis using all individuals (root mean square error = 0.041), with p = 0.005 for a onetailed paired ttest when the squared errors are compared for each SNP. However, this error includes (1) the variance of the estimate due to the sample size from which the AF is estimated and (2) the variance of the estimate due to the relatedness of the samples. Assuming the Exac AF f to be the true underlying frequency, a normal approximation for the sample AF f _{ n } estimated from n unrelated diploid samples is given by \(f_{n} \sim \text {Normal} \left (f, \frac {f(1f)}{2n} \right)\) [16]. Therefore, we can construct a zscore that accounts for sampling variance as \(z = \frac {\sqrt {2n}(f_{n}f)}{\sqrt {f(1f)}}\). Comparing zscores, we find that the zscore for the analysis using only unrelated individuals (mean z=−0.19) is smaller than the zscore for the analysis using all individuals (mean z=−0.25), with p< 2.2e–16 for a onetailed paired ttest. The zscore using only unrelated individuals also has a smaller variance (var(z) = 1.80) than that for the zscore using all individuals (var(z) = 2.74). This suggests that the allele frequency estimates from the analysis using unrelated individuals are more accurate than those using all individuals. An alternative way of evaluating the accuracy of estimated allele frequencies is discussed in Additional file 1: Section S1.
Inference of sex bias from autosomal and Xchromosome ancestry
To demonstrate the utility of ancestry inference on haploid sex chromosomes, we examine sexbiased admixture in the AfricanAmerican population in the southwestern United States (ASW). We used 1092 individuals from Phase 1 of the 1000 Genomes project including the ASW with populations from Europe, Africa, Asia and the Americas. We removed 5 ASW individuals (ids NA19921, NA19625, NA20414, NA20299, NA20314) who had very high (greater than 5 %) Native American ancestry based on results reported by the 1000 Genomes Project [3]. SNPs were filtered to include only those with MAF ≥5 % and then thinned for LD to have pairwise r ^{2}≤0.1 in 50 kb windows.
Sex bias was analyzed by running ADMIXTURE on the 1087 individuals with K=3 clusters on the autosomes and Xchromosome separately and comparing ancestry proportions for each individual on the two chromosome subsets. If there was no sexbias during admixture, then the ancestry proportions on the two chromosome sets should be (nearly) equal.
We compared two ways of analyzing sex bias:

Females only: Since ADMIXTURE (without the new haploid option) requires diploid data, we subset the dataset to 562 females and ran ADMIXTURE on the autosomes and Xchromosome separately.

Males and Females: Using the haploid option (the X chromosome was denoted haploid in males with haploid=“male:23”), we ran ADMIXTURE separately on the autosomes and Xchromosome on the entire set of 1087 individuals.
Table 1 shows the results of the analysis. From both analyses, we can see that autosomes have an excess of European ancestry and Xchromsomes have an excess of African and Native American ancestry. Since the ancestry proportions for each component (European/African/Native American) are not normally distributed, a ttest is not suitable for assessing statistical significance. Therefore we used a Wilcoxon signedrank test to compare the paired Xchromosome and autosomal ancestry proportions (see Additional file 1: Section S2 for the behavior of the test under the null hypothesis). We see that the analysis using both males and females can reject the null hypothesis of identical mean ranks (no sex bias) at the 0.05 significance level, while the femalesonly analysis fails to reject the null hypothesis. From previous work, there is evidence for sexbiased admixture in AfricanAmericans [9, 12, 17]. Thus, including male samples in the analysis of Xchromosome ancestry with the haploid option improves power to detect sex bias in admixture.
Discussion
We have described two extensions to the ADMIXTURE program. The projection extension allows ADMIXTURE to estimate ancestry for a new set of individuals using predefined ancestral population frequencies (usually from an earlier ADMIXTURE run). This functionality is similar to that implemented in iAdmix [13], which uses a different optimization method, and that implemented by Sikora et al. [18] for ancestry inference for ancient individuals using an expectationmaximization algorithm. This extension enables efficient inference of ancestry on large genomic datasets using ancestral populations estimated from reference panels like the 1000 Genomes Project. The allele frequencies inferred by ADMIXTURE have been used previously to simulate individual genotypes [19, 20]. The resulting individual genomes have been used in subsequent ADMIXTURE [19] or other [20] analyses to enable a “supervised” analysis [21]. Our extension provides an efficient and principled framework for this approach.
The projection approach is useful when a new dataset is strongly unbalanced in its distribution of populations, since an unbalanced dataset can affect the accuracy of ancestry inference [22]. Another advantage of the projection approach is that individual ancestry can be inferred in parallel for each individual. Thus, if a user has access to multiple computers (or a computing cluster), then ancestry can be estimated for hundreds of thousands of individuals in a few hours. Our results on a dataset of 10,920 individuals constructed using the 1000 Genomes project show how projection improves the efficiency of ADMIXTURE. The projection approach can also be used to infer the ancestry of ancient DNA samples, as in Sikora et al. [18] and other work. A limitation of the projection approach is that if the projected data contains a novel population which was not present in the initial (training) set, the projection results may not be identical to those obtained from running ADMIXTURE on the combined dataset. The fit of the projected data to the population structure in the training set can be evaluated using the posterior predictive checks (PPCs) of Mimno et al. [23]. This framework uses the inferred model parameters from ADMIXTURE to generate simulated datasets whose similarity to the original dataset is assessed through a set of population genetics summary statistics such as identitybystate, linkage disequilibrium, F _{ ST } etc. If the projected individuals belong to a population not present in the training set, the PPCs will indicate a high discrepancy between the summary statistics for the projected individuals and the generated datasets. An alternative way of examining fit between the projected individuals and the training set is to examine the crossvalidation error of the projection step using the “–cv” option of ADMIXTURE. A high crossvalidation error would indicate that the projected individuals belong to a population not present in the training set.
Through experiments on HapMap CEU and YRI individuals, we showed that the projection approach is also useful for accurate ancestry inference on related individuals. This approach allows us to infer allele frequencies for ancestral populations with reduced error. A limitation of this approach is that if the number of founders in a pedigree is small, then the error in allele frequencies estimated from running ADMIXTURE only on the unrelated individuals may be large due to a larger sampling variance. In such cases, the method may not produce more accurate estimates than those obtained by running ADMIXTURE on the entire dataset.
The second extension we have developed correctly models the loglikelihood for haploid chromosomes. This can be used to estimate ancestry on sex chromosomes and thus estimate sex bias in ancestry. Our analysis of sex bias in the ASW AfricanAmerican population shows that accurate ancestry inference on the haploid Xchromosome in males can improve power of tests for sex bias that use ancestry proportions as a test statistic. While the test we described based on a difference in ancestry has a number of limitations (correlated tests, no correction for multiple testing, etc.), it is only intended to demonstrate the advantage of ancestry inference on haploid chromosomes for more power in tests for sex bias and is applicable to other tests of sex bias.
Conclusions
ADMIXTURE is widely used for analysis of ancestry in genomic datasets. The extensions we have described increase the efficiency of ADMIXTURE and increase its versatility. The projection operation allows more efficient analysis of large datasets by using available reference panels. It also allows analysis of ancestry in pedigrees. Ancestry analysis of haploid sexchromosomes improves power to detect sex bias in populations using autosomal and Xchromosome ancestry. We expect that with the growing number of populations being sequenced and large amounts of individuallevel genotype data being generated, these extensions will make ADMIXTURE more useful to researchers.
Endnote
^{1} We only show results for one replicate since iAdmix produces 130 GB of output files for one replicate of such a large dataset.
References
 1
Alexander DH, Novembre J, Lange K. Fast modelbased estimation of ancestry in unrelated individuals. Genome Res. 2009; 19(9):1655–1664.
 2
Pritchard JK, Stephens M, Donnelly P. Inference of Population Structure Using Multilocus Genotype Data. Genetics. 2000; 155(2):945–959.
 3
1000 Genomes Project Consortium. An integrated map of genetic variation from 1092 human genomes. Nature. 2012; 491(7422):56–65.
 4
Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, Yu F, Bonnen PE, de Bakker PIW, Deloukas P, Gabriel SB, Gwilliam R, Hunt S, Inouye M, Jia X, Palotie A, Parkin M, Whittaker P, Chang K, Hawes A, Lewis LR, Ren Y, Wheeler D, Muzny DM, Barnes C, Darvishi K, Hurles M, Korn JM, Kristiansson K, Lee C, McCarrol SA, Nemesh J, Keinan A, Montgomery SB, Pollack S, Price AL, Soranzo N, GonzagaJauregui C, Anttila V, Brodeur W, Daly MJ, Leslie S, McVean G, Moutsianas L, Nguyen H, Zhang Q, Ghori MJR, McGinnis R, McLaren W, Takeuchi F, Grossman SR, Shlyakhter I, Hostetter EB, Sabeti PC, Adebamowo CA, Foster MW, Gordon DR, Licinio J, Manca MC, Marshall PA, Matsuda I, Ngare D, Wang VO, Reddy D, Rotimi CN, Royal CD, Sharp RR, Zeng C, Brooks LD, McEwen JE. Integrating common and rare genetic variation in diverse human populations. Nature. 2010; 467(7311):52–8.
 5
MorenoEstrada A, Gravel S, Zakharia F, McCauley JL, Byrnes JK, Gignoux CR, OrtizTello PA, Martínez RJ, Hedges DJ, Morris RW, Eng C, Sandoval K, AcevedoAcevedo S, Norman PJ, Layrisse Z, Parham P, MartínezCruzado JC, Burchard EG, Cuccaro ML, Martin ER, Bustamante CD. Reconstructing the population genetic history of the Caribbean. PLoS Genet. 2013; 9(11):1003925.
 6
Purcell S, Neale B, ToddBrown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC. PLINK: a tool set for wholegenome association and populationbased linkage analyses. Am J Hum Genet. 2007; 81(3):559–75.
 7
Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. Robust relationship inference in genomewide association studies. Bioinformatics (Oxford, England). 2010; 26(22):2867–73.
 8
Staples J, Nickerson DA, Below JE. Utilizing graph theory to select the largest set of unrelated individuals for genetic analysis. Genet Epidemiol. 2013; 37(2):136–41.
 9
Parra EJ, Kittles RA, Argyropoulos G, Pfaff CL, Hiester K, Bonilla C, Sylvester N, ParrishGause D, Garvey WT, Jin L, McKeigue PM, Kamboh MI, Ferrell RE, Pollitzer WS, Shriver MD. Ancestral proportions and admixture dynamics in geographically defined African Americans living in South Carolina. Am J Phys Anthropol. 2001; 114(1):18–29.
 10
Wood ET, Stover DA, Ehret C, DestroBisol G, Spedini G, McLeod H, Louie L, Bamshad M, Strassmann BI, Soodyall H, Hammer MF. Contrasting patterns of Y chromosome and mtDNA variation in Africa: evidence for sexbiased demographic processes. Eur J Hum Genet EJHG. 2005; 13(7):867–76.
 11
Stefflova K, Dulik MC, Pai AA, Walker AH, ZeiglerJohnson CM, Gueye SM, Schurr TG, Rebbeck TR. Evaluation of group genetic ancestry of populations from Philadelphia and Dakar in the context of sexbiased admixture in the Americas. PloS ONE. 2009; 4(11):7842.
 12
Bryc K, Auton A, Nelson MR, Oksenberg JR, Hauser SL, Williams S, Froment A, Bodo JM, Wambebe C, Tishkoff SA, Bustamante CD. Genomewide patterns of population structure and admixture in West Africans and African Americans. Proc Natl Acad Sci U S A. 2010; 107(2):786–91.
 13
Bansal V, Libiger O. Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations. BMC Bioinforma. 2015; 16(1):4.
 14
McPeek MS, Wu X, Ober C. Best linear unbiased allelefrequency estimation in complex pedigrees. Biometrics. 2004; 60(2):359–67.
 15
Consortium EA, Lek M, Karczewski K, Minikel E, Samocha K, Banks E, Fennell T, O’DonnellLuria A, Ware J, Hill A, Cummings B, Tukiainen T, Birnbaum D, Kosmicki J, Duncan L, Estrada K, Zhao F, Zou J, PierceHoffman E, Cooper D, DePristo M, Do R, Flannick J, Fromer M, Gauthier L, Goldstein J, Gupta N, Howrigan D, Kiezun A, Kurki M, Moonshine AL, Natarajan P, Orozco L, Peloso G, Poplin R, Rivas M, RuanoRubio V, Ruderfer D, Shakir K, Stenson P, Stevens C, Thomas B, Tiao G, TusieLuna M, Weisburd B, Won HH, Yu D, Altshuler D, Ardissino D, Boehnke M, Danesh J, Roberto E, Florez J, Gabriel S, Getz G, Hultman C, Kathiresan S, Laakso M, McCarroll S, McCarthy M, McGovern D, McPherson R, Neale B, Palotie A, Purcell S, Saleheen D, Scharf J, Sklar P, Patrick S, Tuomilehto J, Watkins H, Wilson J, Daly M, MacArthur D. Analysis of proteincoding genetic variation in 60,706 humans. Technical report. 2015. http://biorxiv.org/content/early/2015/10/30/030338.abstract. Accessed 31 Oct 2015.
 16
Nicholson G, Smith AV, Jonsson F, Gustafsson O, Stefansson K, Donnelly P. Assessing population differentiation and isolation from singlenucleotide polymorphism data. J R Stat Soc Ser B Stat Methodol. 2002; 64(4):695–715.
 17
Bryc K, Durand EY, Macpherson JM, Reich D, Mountain JL. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am J Hum Genet. 2015; 96(1):37–53.
 18
Sikora M, Carpenter ML, MorenoEstrada A, Henn BM, Underhill PA, SánchezQuinto F, Zara I, Pitzalis M, Sidore C, Busonero F, Maschio A, Angius A, Jones C, MendozaRevilla J, Nekhrizov G, Dimitrova D, Theodossiev N, Harkins TT, Keller A, Maixner F, Zink A, Abecasis G, Sanna S, Cucca F, Bustamante CD. Population genomic analysis of ancient and modern genomes yields new insights into the genetic ancestry of the Tyrolean Iceman and the genetic structure of Europe. PLoS Genet. 2014; 10(5):1004353.
 19
Dienekes. Dodecad Ancestry Project: How to create Zombies from ADMIXTURE etc. 2011. http://dodecad.blogspot.com/2011/05/howtocreatezombiesfromadmixture.html. Accessed 02 Sept 2015.
 20
Elhaik E, Tatarinova T, Chebotarev D, Piras IS, Maria Calò C, De Montis A, Atzori M, Marini M, Tofanelli S, Francalacci P, Pagani L, TylerSmith C, Xue Y, Cucca F, Schurr TG, Gaieski JB, Melendez C, Vilar MG, Owings AC, Gómez R, Fujita R, Santos FR, Comas D, Balanovsky O, Balanovska E, Zalloua P, Soodyall H, Pitchappan R, Ganeshprasad A, Hammer M, MatisooSmith L, Wells RS. Geographic population structure analysis of worldwide human populations infers their biogeographical origins. Nat Commun. 2014; 5:3513.
 21
Alexander DH, Lange K. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC Bioinforma. 2011; 12(1):246.
 22
Shringarpure S, Xing EP. Effects of sample selection bias on the accuracy of population structure and ancestry inference. G3 (Bethesda, Md). 2014; 4(5):901–11.
 23
Mimno D, Blei DM, Engelhardt BE. Posterior predictive checks to quantify lackoffit in admixture models of latent population structure. Proc Natl Acad Sci U S A. 2015; 112(26):3441–450.
Acknowledgements
The authors would like to thank Shaila Musharroff, Peter Carbonetto, Alicia Martin, Caitlin Uren, Brenna Henn and the Bustamante lab for comments on the manuscript and software, and the Stanford Genetics Bioinformatics Service Center for computing resources. CDB was supported by grants from the National Science Foundation (DMS1201234) and National Institutes of Health (U01HG007436). KL was supported by grants from the National Human Genome Research Institute (HG006139) and the National Institute of General Medical Sciences (GM053275).
Availability and requirements
Project name: ADMIXTURE
Project home page: http://www.genetics.ucla.edu/software/admixture
Operating system(s): Linux, Mac OS X
Programming language: C++
Other requirements: None
License: Binaries freely available; source code proprietary
Any restrictions to use by nonacademics: None
Authors’ contributions
SSS and KL devised the mathematical details. SSS and DHA implemented the software. SSS and CDB designed the experiments. SSS analyzed data. SSS, KL, CDB and DHA composed the manuscript. The authors have approved the manuscript.
Competing interests
CDB is on the scientific advisory boards (SABs) of Ancestry.com, Personalis, Liberty Biosecurity, and Etalon DX. He is also a founder and chair of the SAB of IdentifyGenomics. None of these entities played a role in the design, interpretation, or presentation of these results. The authors declare that they have no competing interests.
Author information
Additional file
Additional file 1
Supplementary Text. (PDF 132 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Shringarpure, S.S., Bustamante, C.D., Lange, K. et al. Efficient analysis of large datasets and sex bias with ADMIXTURE. BMC Bioinformatics 17, 218 (2016). https://doi.org/10.1186/s128590161082x
Received:
Accepted:
Published:
Keywords
 Supervised learning
 Reference panels
 Pedigrees
 Sexchromosome
 Sex bias
 Ancestry inference
 Admixture