Table 7 Summary of the sub-continental classification problems results

From: ETHNOPRED: a novel machine learning method for accurate continental and sub-continental ancestry identification and population stratification correction

| Sub-continental problem | Number of subjects, split | Number of SNPs | Baseline | DT1 (Number of SNPs), Accuracy | Minimal Number of DTs (Number of SNPs), Accuracy | Number of Robust DTs (Number of SNPs) |
|---|---|---|---|---|---|---|
| European | 267 (CEU: 165, TSI: 102) | 882895 | 61.8% | 1 (10), 79.0% ± 5.6% | 3 (31), 86.6% ± 2.4% | 15 (180) |
| East Asian | 250 (CHB: 137, JPT: 113) | 892833 | 54.8% | 1 (12), 74.4% ± 7.9% | 39 (502), 95.6% ± 3.9% | 67 (877) |
| African | 497 (LWK: 110, MKK: 184, YRI: 203) | 616597 | 40.8% | 1 (23), 66.2% ± 5.3% | 21 (526), 95.6% ± 2.1% | 157 (4236) |
| North American | 548 (ASW: 87, CEU: 165, CHD: 109, GIH: 101, MXL: 86) | 526394 | 30.1% | 1 (19), 82.7% ± 5.4% | 11 (242), 98.4% ± 2.0% | 70 (1643) |
| Kenyan | 294 (LWK: 110, MKK: 184) | 781061 | 62.6% | 1 (11), 79.2% ± 3.5% | 25 (271), 95.9% ± 1.5% | 31 (341) |
| Chinese | 246 (CHB: 137, CHD: 109) | 829364 | 55.7% | 1 (15), 47.2% ± 9.1% | - (−), ≤55.7% | - (−) |

This table summarizes the results of our studies on various sub-continental classification problems. The “Number of subjects, split” column shows the total number of subjects, followed by the list of (ethnic group: number) pairs giving the name and size of each subgroup. The “Number of SNPs” column gives the number of SNPs used for this study. The “Baseline” column gives the baseline accuracy obtained by always predicting the majority class. The “DT1 (Number of SNPs), Accuracy” column provides the number of SNPs in the first decision tree and its estimated 10-fold cross-validation accuracy. The “Minimal Number of DTs (Number of SNPs), Accuracy” column gives the minimal number of disjoint decision trees required to achieve the highest accuracy, the number of SNPs involved in these trees, and that accuracy. The “Number of Robust DTs (Number of SNPs)” column gives the number of decision trees required to achieve robustness and the number of SNPs involved.
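
As a quick check on the “Baseline” column, the majority-class accuracy can be recomputed from the subgroup sizes alone. The sketch below is illustrative only: the `splits` dictionary and the `majority_baseline` helper are names introduced here, not part of ETHNOPRED, and the counts are copied from the “Number of subjects, split” column.

```python
# Subgroup sizes copied from the "Number of subjects, split" column.
# The dictionary layout and function name are illustrative assumptions.
splits = {
    "European": {"CEU": 165, "TSI": 102},
    "East Asian": {"CHB": 137, "JPT": 113},
    "African": {"LWK": 110, "MKK": 184, "YRI": 203},
    "North American": {"ASW": 87, "CEU": 165, "CHD": 109, "GIH": 101, "MXL": 86},
    "Kenyan": {"LWK": 110, "MKK": 184},
    "Chinese": {"CHB": 137, "CHD": 109},
}


def majority_baseline(groups):
    """Accuracy of always predicting the largest subgroup."""
    return max(groups.values()) / sum(groups.values())


for problem, groups in splits.items():
    total = sum(groups.values())
    print(f"{problem}: {total} subjects, baseline {majority_baseline(groups):.1%}")
```

Under these counts, the recomputed values agree with the table's “Baseline” column to one decimal place (for example, 165 of 267 European subjects are CEU, giving 61.8%). In the Chinese problem, the first decision tree's 47.2% accuracy falls below the 55.7% baseline.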