Table 2. F1 scores from 10-times repeated 5-fold cross-validation for 19 population classes from the 1000 Genomes Project, using SVM, PCA or GTM

From: Probabilistic ancestry maps: a method to assess and visualize population substructures in genetics

| Ancestry | 1000G code | Population | PCA 8-NN | SVM 10 PCs | GTM 3 PCs | GTM 10 PCs |
|---|---|---|---|---|---|---|
| EAS | CHB | Han Chinese | 0.20±0.01 | 0.78±0.01 | 0.45±0.04 | 0.75±0.01 |
| EAS | JPT | Japanese | 0.37±0.02 | 1.00±0.00 | 0.80±0.01 | 1.00±0.00 |
| EAS | CHS | Southern Han Chinese | 0.34±0.02 | 0.80±0.01 | 0.54±0.02 | 0.80±0.01 |
| EAS | CDX | Chinese Dai | 0.24±0.02 | 0.10±0.02 | 0.51±0.03 | 0.44±0.08 |
| EAS | KHV | Kinh in Vietnam | 0.44±0.01 | 0.68±0.00 | 0.63±0.01 | 0.71±0.01 |
| EUR | CEU+GBR | Northern/Western Eur. | 0.75±0.01 | 0.99±0.00 | 0.79±0.01 | 0.99±0.00 |
| EUR | TSI | Toscani | 0.46±0.01 | 0.74±0.02 | 0.58±0.01 | 0.54±0.06 |
| EUR | FIN | Finnish | 0.95±0.01 | 0.99±0.00 | 0.91±0.01 | 0.99±0.01 |
| EUR | IBS | Iberian | 0.35±0.03 | 0.81±0.01 | 0.35±0.04 | 0.74±0.02 |
| AFR | YRI | Yoruba in Nigeria | 0.30±0.02 | 0.69±0.00 | 0.15±0.03 | 0.66±0.03 |
| AFR | LWK | Luhya | 0.67±0.01 | 1.00±0.00 | 0.59±0.01 | 1.00±0.00 |
| AFR | GWD | Gambian | 0.26±0.02 | 0.94±0.02 | 0.23±0.02 | 0.78±0.07 |
| AFR | MSL | Mende | 0.25±0.03 | 0.93±0.02 | 0.35±0.03 | 0.81±0.04 |
| AFR | ESN | Esan in Nigeria | 0.28±0.02 | 0.00±0.01 | 0.19±0.05 | 0.28±0.13 |
| AMR | PUR | Puerto Ricans | 0.90±0.01 | 0.86±0.02 | 0.90±0.01 | 0.87±0.03 |
| AMR | CLM | Colombians | 0.69±0.01 | 0.85±0.01 | 0.84±0.01 | 0.82±0.02 |
| AMR | PEL | Peruvians | 0.88±0.01 | 0.97±0.01 | 0.94±0.01 | 0.95±0.01 |
| SAS | PJL | Punjabi | 0.89±0.01 | 0.96±0.01 | 0.96±0.01 | 0.96±0.00 |
| SAS | BEB | Bengali | 0.95±0.01 | 0.96±0.01 | 0.96±0.01 | 0.96±0.01 |
| Overall | | | 0.54±0.00 | 0.80±0.00 | 0.61±0.01 | 0.80±0.01 |

SVM 10 PCs = support vector machine classification model using 10 principal components; PCA 8-NN = k-nearest neighbours model based on a 2D PCA map (k = 8); GTM 3 or 10 PCs = Bayesian classification model based on generative topographic mapping using 3 or 10 principal components. Ancestry codes: EAS = East Asians, EUR = Europeans, AFR = Africans, AMR = Admixed Americans, SAS = South Asians. CEU and GBR were merged into one class. Each value is an average with its 95% confidence interval.
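For readers unfamiliar with how an entry such as 0.78±0.01 can arise, the following is a minimal sketch of a per-class F1 score and a mean with a 95% confidence interval over repeated cross-validation runs. It is not the authors' code. The function names `f1_per_class` and `mean_ci95`, the toy labels, and the run scores are hypothetical, and the normal-approximation interval (1.96 standard errors) is an assumption, since the table does not state how the intervals were computed or whether averaging was done over repetitions or over individual folds.

```python
import math
from statistics import mean, stdev


def f1_per_class(y_true, y_pred):
    """Return {label: F1}, with F1 = 2*TP / (2*TP + FP + FN) per label."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class with no true and no predicted members gets F1 = 0 here.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def mean_ci95(values):
    """Mean and half-width of a normal-approximation 95% CI (assumed method)."""
    half_width = 1.96 * stdev(values) / math.sqrt(len(values))
    return mean(values), half_width


# Hypothetical toy labels for one test fold (not data from the table).
y_true = ["CHB", "CHB", "JPT", "JPT", "CHS", "CHS"]
y_pred = ["CHB", "CHS", "JPT", "JPT", "CHS", "CHB"]
scores = f1_per_class(y_true, y_pred)

# Hypothetical F1 values for one class across repeated runs.
runs = [0.78, 0.80, 0.76, 0.79, 0.77]
avg, hw = mean_ci95(runs)
print(f"{avg:.2f}±{hw:.2f}")
```

In this toy fold the two Han Chinese classes are confused with each other while JPT is classified perfectly, which mirrors the kind of within-ancestry confusion that lowers the per-class F1 for closely related populations.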