- Poster presentation
- Open Access
Population structure analysis on 2504 individuals across 26 ancestries using bioinformatics approaches
BMC Bioinformatics volume 16, Article number: P19 (2015)
Characterizing genetic diversity is crucial for reconstructing human evolution and for understanding the genetic basis of complex diseases; however, human population genetics is highly complex. Previously, we showed that, under Hardy-Weinberg equilibrium, the expected ratio of heterozygous to non-reference homozygous single nucleotide polymorphisms (SNPs), the het/nonref-hom ratio, is two. Later, we found that this ratio is race dependent, with Africans being the most genetically diverse and Asians the most homozygous. This observation prompted us to investigate the reasons behind this diversity.
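As a minimal sketch of the quantity discussed above (not the authors' implementation), the het/nonref-hom ratio can be counted per individual from genotype calls. The sketch assumes each genotype is encoded as the number of non-reference alleles at a site (0, 1, or 2, with `None` for a missing call); this encoding and both function names are hypothetical. For a single biallelic site with non-reference allele frequency q, Hardy-Weinberg equilibrium gives genotype frequencies 2q(1-q) for heterozygotes and q^2 for non-reference homozygotes, so the expected ratio at that site is 2(1-q)/q, which equals two when q = 0.5.

```python
import math
from typing import Iterable, Optional


def expected_ratio_under_hwe(q: float) -> float:
    """Expected het/nonref-hom ratio at one biallelic site under
    Hardy-Weinberg equilibrium, where q is the non-reference allele
    frequency: het = 2q(1-q), nonref-hom = q^2, ratio = 2(1-q)/q."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    return 2.0 * (1.0 - q) / q


def het_nonref_hom_ratio(genotypes: Iterable[Optional[int]]) -> float:
    """Observed het/nonref-hom ratio for one individual.

    Each genotype is assumed to be the count of non-reference alleles
    at a site (0, 1, or 2); None marks a missing call and is skipped.
    Returns NaN when there are no non-reference homozygous sites.
    """
    het = 0
    nonref_hom = 0
    for g in genotypes:
        if g == 1:
            het += 1
        elif g == 2:
            nonref_hom += 1
    return het / nonref_hom if nonref_hom else math.nan


# Toy genotype vector (illustrative values only, not real data).
toy = [0, 1, 1, 2, 1, 2, 0, 1, None]
print(het_nonref_hom_ratio(toy))
print(expected_ratio_under_hwe(0.5))
```

Returning NaN rather than raising keeps the function usable across many individuals, at the cost of requiring the caller to filter undefined ratios before summarizing.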
As expected, we found that the het/nonref-hom ratio is strongly associated with human ancestry. Africans had the highest het/nonref-hom ratios, followed by Americans and Europeans, and East Asians had the lowest (Figure 1). More interestingly, the het/nonref-hom ratios of South Asians were much higher than those of East Asians, and Americans showed the widest range (Figure 1). We therefore quantitatively analyzed genetic variation in human populations using the 1000 Genomes Project (1000G) dataset of 10^11 observed genotypes (2504 individuals at 13424776 SNPs) with Structure 2.3.4. The resulting population structure is consistent with the major geographical regions. Each group showed a dominant origin population, except Americans, who had the most variation in structure and were represented by several populations, including the dominant population of Europeans (Figure 2). Moreover, East Asians and South Asians were found to originate from different ancestries (Figure 2).
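The comparison of ratio levels and ranges across ancestry groups could be summarized along the following lines. This is a sketch under stated assumptions, not the procedure behind Figure 1: it assumes one precomputed het/nonref-hom ratio per individual and a parallel list of group labels, and the function name, the labels "A" and "B", and the numbers are all hypothetical placeholders.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Sequence


def summarize_by_group(
    ratios: Sequence[float], groups: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Summarize per-individual ratios by group label.

    Returns, for each group, the number of individuals, the median
    ratio, and the range (maximum minus minimum).
    """
    if len(ratios) != len(groups):
        raise ValueError("ratios and groups must have the same length")
    by_group = defaultdict(list)
    for ratio, group in zip(ratios, groups):
        by_group[group].append(ratio)
    return {
        group: {
            "n": len(values),
            "median": median(values),
            "range": max(values) - min(values),
        }
        for group, values in by_group.items()
    }


# Placeholder values for illustration only; these are not study results.
ratios = [1.0, 2.0, 3.0, 1.5, 1.75]
groups = ["A", "A", "A", "B", "B"]
for group, stats in summarize_by_group(ratios, groups).items():
    print(group, stats)
```

The median is used here because it is less sensitive to a few extreme individuals than the mean; the source does not state which summary statistic the authors used.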
Using a novel bioinformatics approach, we gained new insights into the history and geography of human evolution. These insights are valuable for tracking human migration and adaptation to local conditions.
Guo Y, Ye F, Sheng Q, Clark T, Samuels DC: Three-stage quality control strategies for DNA re-sequencing data. Brief Bioinform. 2013
Wang J, Raskin L, Samuels DC, Shyr Y, Guo Y: Genome measures used for quality control are dependent on gene function and ancestry. Bioinformatics. 2015, 31 (3): 318-323.
Guo Y, Zhao S, Sheng Q, Ye F, Li J, Lehmann B, Pietenpol J, Samuels DC, Shyr Y: Multi-perspective quality control of Illumina exome sequencing data using QC3. Genomics. 2014, 103 (5-6): 323-328.
Hubisz MJ, Falush D, Stephens M, Pritchard JK: Inferring weak population structure with the assistance of sample group information. Mol Ecol Resour. 2009, 9 (5): 1322-1332.
Cite this article
Wang, J., Samuels, D.C., Shyr, Y. et al. Population structure analysis on 2504 individuals across 26 ancestries using bioinformatics approaches. BMC Bioinformatics 16, P19 (2015). https://doi.org/10.1186/1471-2105-16-S15-P19
- Single Nucleotide Polymorphism
- Population Structure
- Human Population
- Human Evolution
- Genome Project