Table 2: F1 scores from 10-times-repeated 5-fold cross-validation for 19 population classes from the 1000 Genomes Project, using SVM, PCA or GTM

From: Probabilistic ancestry maps: a method to assess and visualize population substructures in genetics

| Ancestry | 1000G code | Population | PCA 8-NN | SVM 10 PCs | GTM 3 PCs | GTM 10 PCs |
|---|---|---|---|---|---|---|
| EAS | CHB | Han Chinese | 0.20±0.01 | 0.78±0.01 | 0.45±0.04 | 0.75±0.01 |
| EAS | JPT | Japanese | 0.37±0.02 | 1.00±0.00 | 0.80±0.01 | 1.00±0.00 |
| EAS | CHS | Southern Han Chinese | 0.34±0.02 | 0.80±0.01 | 0.54±0.02 | 0.80±0.01 |
| EAS | CDX | Chinese Dai | 0.24±0.02 | 0.10±0.02 | 0.51±0.03 | 0.44±0.08 |
| EAS | KHV | Kinh in Vietnam | 0.44±0.01 | 0.68±0.00 | 0.63±0.01 | 0.71±0.01 |
| EUR | CEU+GBR | Northern/Western European | 0.75±0.01 | 0.99±0.00 | 0.79±0.01 | 0.99±0.00 |
| EUR | TSI | Toscani | 0.46±0.01 | 0.74±0.02 | 0.58±0.01 | 0.54±0.06 |
| EUR | FIN | Finnish | 0.95±0.01 | 0.99±0.00 | 0.91±0.01 | 0.99±0.01 |
| EUR | IBS | Iberian | 0.35±0.03 | 0.81±0.01 | 0.35±0.04 | 0.74±0.02 |
| AFR | YRI | Yoruba in Nigeria | 0.30±0.02 | 0.69±0.00 | 0.15±0.03 | 0.66±0.03 |
| AFR | LWK | Luhya | 0.67±0.01 | 1.00±0.00 | 0.59±0.01 | 1.00±0.00 |
| AFR | GWD | Gambian | 0.26±0.02 | 0.94±0.02 | 0.23±0.02 | 0.78±0.07 |
| AFR | MSL | Mende | 0.25±0.03 | 0.93±0.02 | 0.35±0.03 | 0.81±0.04 |
| AFR | ESN | Esan in Nigeria | 0.28±0.02 | 0.00±0.01 | 0.19±0.05 | 0.28±0.13 |
| AMR | PUR | Puerto Ricans | 0.90±0.01 | 0.86±0.02 | 0.90±0.01 | 0.87±0.03 |
| AMR | CLM | Colombians | 0.69±0.01 | 0.85±0.01 | 0.84±0.01 | 0.82±0.02 |
| AMR | PEL | Peruvians | 0.88±0.01 | 0.97±0.01 | 0.94±0.01 | 0.95±0.01 |
| SAS | PJL | Punjabi | 0.89±0.01 | 0.96±0.01 | 0.96±0.01 | 0.96±0.00 |
| SAS | BEB | Bengali | 0.95±0.01 | 0.96±0.01 | 0.96±0.01 | 0.96±0.01 |
| Overall | | | 0.54±0.00 | 0.80±0.00 | 0.61±0.01 | 0.80±0.01 |
PCA 8-NN = k-nearest neighbours model (k = 8) based on a 2D PCA map; SVM 10 PCs = support vector machine classification model using 10 principal components; GTM 3 PCs or GTM 10 PCs = Bayesian classification model based on generative topographic mapping using 3 or 10 principal components. Ancestry codes: EAS = East Asians, EUR = Europeans, AFR = Africans, AMR = Admixed Americans, SAS = South Asians. CEU and GBR were merged into one class. Each value is a mean ± 95% confidence interval.
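The evaluation protocol named in the title and note (10-times-repeated 5-fold cross-validation, per-class F1, mean ± 95% confidence interval) can be sketched with scikit-learn as follows, here for the two PCA-based baselines. This is a minimal sketch built on assumptions the source does not state. Folds are stratified by population. PCA is refit inside each training fold. The SVM uses scikit-learn's default RBF kernel. The interval is a Student-t interval over the 50 fold-level scores. `make_blobs` stands in for real genotype data. The helper names `repeated_cv_f1`, `mean_ci95`, `pca_knn` and `pca_svm` are hypothetical.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import f1_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# "PCA 8-NN": k-nearest neighbours (k = 8) on a 2D PCA projection
pca_knn = make_pipeline(PCA(n_components=2), KNeighborsClassifier(n_neighbors=8))
# "SVM 10 PCs": SVM on the first 10 principal components
# (assumption: scikit-learn's default RBF kernel and hyperparameters)
pca_svm = make_pipeline(PCA(n_components=10), SVC())


def repeated_cv_f1(model, X, y, n_splits=5, n_repeats=10, seed=0):
    """Per-class F1 on every test fold of a repeated stratified k-fold CV."""
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats,
                                 random_state=seed)
    labels = np.unique(y)
    scores = []
    for train_idx, test_idx in cv.split(X, y):
        # The pipeline refits PCA on the training fold only
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        # average=None gives one F1 value per class, ordered as `labels`
        scores.append(f1_score(y[test_idx], pred, labels=labels,
                               average=None, zero_division=0))
    return labels, np.array(scores)  # shape: (n_splits * n_repeats, n_classes)


def mean_ci95(scores):
    """Column-wise mean and half-width of a Student-t 95% confidence interval."""
    n = scores.shape[0]
    mean = scores.mean(axis=0)
    sem = scores.std(axis=0, ddof=1) / np.sqrt(n)
    return mean, stats.t.ppf(0.975, df=n - 1) * sem


if __name__ == "__main__":
    # Synthetic stand-in for genotype data: 4 "populations", 50 features
    X, y = make_blobs(n_samples=300, n_features=50, centers=4,
                      cluster_std=5.0, random_state=0)
    for name, model in [("PCA 8-NN", pca_knn), ("SVM 10 PCs", pca_svm)]:
        labels, scores = repeated_cv_f1(model, X, y)
        mean, half = mean_ci95(scores)
        for c, m, h in zip(labels, mean, half):
            print(f"{name}  class {c}: {m:.2f}±{h:.2f}")
        # "Overall" taken here as the per-fold macro-averaged F1 (an assumption)
        o_mean, o_half = mean_ci95(scores.mean(axis=1, keepdims=True))
        print(f"{name}  overall: {o_mean[0]:.2f}±{o_half[0]:.2f}")
```

Wrapping PCA and the classifier in one pipeline keeps held-out samples out of the projection fit. The original study may have projected all samples at once, which would give slightly different numbers.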
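The note describes the GTM columns as Bayesian classifiers built on a generative topographic map. One common formulation works in three steps, and it is assumed here because the source does not spell out the details. First, fit a GTM by EM. Next, give each latent node a class distribution P(c | node) by Bayes' rule from the training responsibilities, with class priors N_c / N. Finally, score a new sample by weighting those node distributions with its own responsibilities. The NumPy sketch below is a minimal, unoptimised version. The class `GTMClassifier` is hypothetical, and so are its grid sizes, RBF width, regularisation `alpha` and PCA initialisation.

```python
import numpy as np
from sklearn.datasets import make_blobs


def grid(side):
    """Regular side x side grid of points on [-1, 1]^2."""
    g = np.linspace(-1.0, 1.0, side)
    return np.array([(a, b) for a in g for b in g])


def sq_dist(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)


class GTMClassifier:
    """Hypothetical minimal GTM (fitted by EM) with a Bayesian class map on its nodes."""

    def __init__(self, k_side=8, m_side=3, width=1.0, alpha=1e-3, n_iter=30):
        self.k_side, self.m_side = k_side, m_side  # K = k_side^2 nodes, M = m_side^2 RBFs
        self.width, self.alpha, self.n_iter = width, alpha, n_iter

    def _responsibilities(self, X):
        # R[k, n] = posterior probability that node k generated sample n
        log_r = -0.5 * self.beta * sq_dist(self.Y, X)
        log_r -= log_r.max(axis=0, keepdims=True)  # numerical stability
        R = np.exp(log_r)
        return R / R.sum(axis=0, keepdims=True)

    def fit(self, X, y):
        n, d = X.shape
        Z = grid(self.k_side)
        sigma = self.width * 2.0 / (self.m_side - 1)  # RBF width = width * centre spacing
        phi = np.exp(-sq_dist(Z, grid(self.m_side)) / (2.0 * sigma ** 2))
        self.Phi = np.hstack([phi, np.ones((len(Z), 1))])  # RBFs plus a bias column
        # Initialise the manifold on the plane of the first two principal components
        mean = X.mean(axis=0)
        _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
        target = Z @ (vt[:2] * (s[:2] / np.sqrt(n))[:, None]) + mean
        self.W = np.linalg.lstsq(self.Phi, target, rcond=None)[0]
        self.Y = self.Phi @ self.W
        self.beta = d / max(sq_dist(self.Y, X).min(axis=0).mean(), 1e-12)

        for _ in range(self.n_iter):
            R = self._responsibilities(X)  # E-step
            # M-step: (Phi^T G Phi + alpha/beta I) W = Phi^T R X, G = diag(sum_n R[k, n])
            A = (self.Phi * R.sum(axis=1)[:, None]).T @ self.Phi
            A += (self.alpha / self.beta) * np.eye(self.Phi.shape[1])
            self.W = np.linalg.solve(A, self.Phi.T @ (R @ X))
            self.Y = self.Phi @ self.W
            self.beta = n * d / (R * sq_dist(self.Y, X)).sum()

        # Bayes' rule per node with priors N_c / N:
        # P(c | k) = sum_{n in c} R[k, n] / sum_n R[k, n]
        R = self._responsibilities(X)
        self.classes_ = np.unique(y)
        onehot = (y[:, None] == self.classes_[None, :]).astype(float)
        counts = R @ onehot
        totals = counts.sum(axis=1, keepdims=True)
        # Nodes that receive (numerically) no data fall back to the class priors
        self.class_map = np.where(totals > 1e-12,
                                  counts / np.maximum(totals, 1e-12),
                                  onehot.mean(axis=0))
        return self

    def predict_proba(self, X):
        # P(c | x) = sum_k R[k, x] * P(c | k)
        return self._responsibilities(X).T @ self.class_map

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]


if __name__ == "__main__":
    X, y = make_blobs(n_samples=400, n_features=20, centers=4, random_state=0)
    clf = GTMClassifier().fit(X[::2], y[::2])
    acc = (clf.predict(X[1::2]) == y[1::2]).mean()
    print(f"held-out accuracy: {acc:.2f}")
```

In the table's setting, `X` would hold the first 3 or 10 principal components of the genotype matrix rather than synthetic blobs. The predicted population is the class with the highest P(c | x).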