Population structure analysis on 2504 individuals across 26 ancestries using bioinformatics approaches

Wang, Jing; Samuels, David C; Shyr, Yu; Guo, Yan

doi:10.1186/1471-2105-16-S15-P19

Volume 16 Supplement 15

Proceedings of the 14th Annual UT-KBRIN Bioinformatics Summit 2015

Poster presentation
Open access
Published: 23 October 2015

BMC Bioinformatics volume 16, Article number: P19 (2015)

Background

Characterizing genetic diversity is crucial for reconstructing human evolution and for understanding the genetic basis of complex diseases; however, human population genetics is highly complex. Previously, we proved that, under Hardy-Weinberg equilibrium, the ratio of heterozygous to non-reference homozygous single nucleotide polymorphisms (SNPs), het/nonref-hom, is two [1]. Later, we found that this ratio is ancestry dependent, with Africans being the most genetically diverse group and Asians the most homozygous [2]. This observation prompted us to investigate the reasons behind this diversity.
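
For orientation, the single-site arithmetic behind such a ratio can be written out as follows. This is only a sketch for one biallelic site with reference-allele frequency p and non-reference-allele frequency q = 1 - p (symbols introduced here for illustration); it is not claimed to be the derivation given in [1].

```latex
\begin{align*}
P(\text{het}) &= 2pq, \qquad P(\text{nonref-hom}) = q^{2}, \\
\frac{P(\text{het})}{P(\text{nonref-hom})} &= \frac{2pq}{q^{2}} = \frac{2p}{q},
\qquad \text{which equals } 2 \text{ when } p = q = \tfrac{1}{2}.
\end{align*}
```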

Materials and methods

Using genomic data released by the 1000 Genomes Project (1000G) for 2504 individuals (26 populations from five major continental groups), we first computed the het/nonref-hom ratio, which has been applied as a quality control parameter for sequencing data [1, 3].
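
As a minimal sketch of how this ratio could be computed for one individual, the following assumes genotypes are encoded VCF-style as strings such as `0/0`, `0/1`, and `1/1`. The function name `het_nonref_hom_ratio` and the toy genotype list are illustrative assumptions, not the pipeline used in this work.

```python
from typing import Iterable, Optional


def het_nonref_hom_ratio(genotypes: Iterable[str]) -> Optional[float]:
    """Return the het / non-reference-homozygous ratio for one individual.

    Genotypes are assumed to be VCF-style strings ("0/1", "1|1", "./.").
    "0" denotes the reference allele; any other index is non-reference.
    """
    het = 0
    nonref_hom = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # skip missing or non-diploid calls
        a, b = alleles
        if a != b:
            het += 1  # heterozygous: two different alleles
        elif a != "0":
            nonref_hom += 1  # homozygous for a non-reference allele
    if nonref_hom == 0:
        return None  # ratio is undefined without non-reference homozygotes
    return het / nonref_hom


# Toy data (hypothetical): 3 heterozygous calls, 2 non-reference homozygous calls.
sample = ["0/0", "0/1", "1|0", "1/1", "./.", "0/1", "1/1", "0/0"]
ratio = het_nonref_hom_ratio(sample)
print(ratio)
```

Returning `None` when no non-reference homozygous calls are present avoids a division by zero; a real analysis would also need to decide how to treat multi-allelic sites and low-quality calls.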

Results

As expected, the het/nonref-hom ratio was strongly associated with human ancestry. Africans had the highest het/nonref-hom ratios, followed by Americans and Europeans, and East Asians had the lowest (Figure 1). More interestingly, the het/nonref-hom ratios of South Asians were much higher than those of East Asians, and Americans showed the widest range (Figure 1). We therefore further analyzed genetic variation quantitatively across the 1000G dataset of 10¹¹ observed genotypes (2504 individuals at 13424776 SNPs) using Structure 2.3.4 [4]. The resulting population structure was consistent with the major geographical regions. Each group had a single dominant ancestral population, except Americans, who showed the most varied structure, represented by several populations including the dominant population of Europeans (Figure 2). Moreover, East Asians and South Asians were found to derive from different ancestral populations (Figure 2).
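
To illustrate how a dominant ancestral cluster can be read from per-individual membership coefficients of the kind that clustering tools such as Structure report, the sketch below averages the coefficients within each labelled group and picks the cluster with the largest mean. The membership values, group labels, and the function name `dominant_clusters` are made up for illustration and are not results or code from this study.

```python
from typing import Dict, List, Sequence, Tuple


def dominant_clusters(
    labels: Sequence[str], q_rows: Sequence[Sequence[float]]
) -> Dict[str, Tuple[int, float]]:
    """Map each group label to (dominant cluster index, mean membership)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for label, row in zip(labels, q_rows):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        for k, value in enumerate(row):
            sums[label][k] += value
        counts[label] += 1
    result: Dict[str, Tuple[int, float]] = {}
    for label, total in sums.items():
        means = [s / counts[label] for s in total]
        best = max(range(len(means)), key=means.__getitem__)
        result[label] = (best, means[best])
    return result


# Hypothetical membership coefficients for K = 3 clusters (each row sums to 1).
labels = ["group_A", "group_A", "group_B", "group_B"]
q_rows = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.30, 0.45, 0.25],
    [0.40, 0.35, 0.25],
]
summary = dominant_clusters(labels, q_rows)
for label, (cluster, mean) in summary.items():
    print(f"{label}: cluster {cluster}, mean membership {mean:.2f}")
```

In this toy input, `group_A` is concentrated in one cluster while `group_B` is spread across several, which is the kind of contrast the text describes between groups with a single dominant population and a more admixed group.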

Conclusions

Using a novel bioinformatics approach, we gained new insights into the history and geography of human evolution. These insights are valuable for tracking human migration and adaptation to local conditions.

References

1. Guo Y, Ye F, Sheng Q, Clark T, Samuels DC: Three-stage quality control strategies for DNA re-sequencing data. Brief Bioinform. 2013
2. Wang J, Raskin L, Samuels DC, Shyr Y, Guo Y: Genome measures used for quality control are dependent on gene function and ancestry. Bioinformatics. 2015, 31 (3): 318-323.
3. Guo Y, Zhao S, Sheng Q, Ye F, Li J, Lehmann B, Pietenpol J, Samuels DC, Shyr Y: Multi-perspective quality control of Illumina exome sequencing data using QC3. Genomics. 2014, 103 (5-6): 323-328.
4. Hubisz MJ, Falush D, Stephens M, Pritchard JK: Inferring weak population structure with the assistance of sample group information. Mol Ecol Resour. 2009, 9 (5): 1322-1332.

Author information

Authors and Affiliations

Center for Quantitative Sciences, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
Jing Wang, Yu Shyr & Yan Guo
Center for Human Genetics Research, Vanderbilt University, Nashville, TN, 37232, USA
David C Samuels

Corresponding author

Correspondence to Yan Guo.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Wang, J., Samuels, D.C., Shyr, Y. et al. Population structure analysis on 2504 individuals across 26 ancestries using bioinformatics approaches. BMC Bioinformatics 16 (Suppl 15), P19 (2015). https://doi.org/10.1186/1471-2105-16-S15-P19
