| Sub-continental problem | Number of subjects, split | Number of SNPs | Baseline | DT1 (Number of SNPs), Accuracy | Minimal Number of DTs (Number of SNPs), Accuracy | Number of Robust DTs (Number of SNPs) |
|---|---|---|---|---|---|---|
| European | 267 (CEU: 165, TSI: 102) | 882895 | 61.8% | 1 (10), 79.0% ± 5.6% | 3 (31), 86.6% ± 2.4% | 15 (180) |
| East Asian | 250 (CHB: 137, JPT: 113) | 892833 | 54.8% | 1 (12), 74.4% ± 7.9% | 39 (502), 95.6% ± 3.9% | 67 (877) |
| African | 497 (LWK: 110, MKK: 184, YRI: 203) | 616597 | 40.8% | 1 (23), 66.2% ± 5.3% | 21 (526), 95.6% ± 2.1% | 157 (4236) |
| North American | 548 (ASW: 87, CEU: 165, CHD: 109, GIH: 101, MXL: 86) | 526394 | 30.1% | 1 (19), 82.7% ± 5.4% | 11 (242), 98.4% ± 2.0% | 70 (1643) |
| Kenyan | 294 (LWK: 110, MKK: 184) | 781061 | 62.6% | 1 (11), 79.2% ± 3.5% | 25 (271), 95.9% ± 1.5% | 31 (341) |
| Chinese | 246 (CHB: 137, CHD: 109) | 829364 | 55.7% | 1 (15), 47.2% ± 9.1% | - (−), ≤55.7% | - (−) |

- This table summarizes the results of our studies on various sub-continental classification problems. The “Number of subjects, split” column shows the total number of subjects, followed by the name and size of each subgroup. The “Number of SNPs” column gives the number of SNPs used in each study. The “Baseline” column gives the accuracy obtained by always predicting the majority class. The “DT1 (Number of SNPs), Accuracy” column gives the number of SNPs in the first decision tree and its estimated 10-fold cross-validation accuracy. The “Minimal Number of DTs (Number of SNPs), Accuracy” column gives the minimal number of disjoint decision trees required to achieve the highest accuracy, the number of SNPs involved in these trees, and that accuracy. The “Number of Robust DTs (Number of SNPs)” column gives the number of decision trees required to achieve robustness and the number of SNPs involved.
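
The majority-class baseline described in the caption can be checked directly from the subgroup sizes. The following is a minimal sketch, not the authors' code: the dictionary `splits` and the helper `majority_baseline` are hypothetical names introduced here for illustration, and the subgroup sizes are copied from the table.

```python
# Subgroup sizes copied from the "Number of subjects, split" column.
# `splits` and `majority_baseline` are illustrative names, not from the paper.
splits = {
    "European": {"CEU": 165, "TSI": 102},
    "East Asian": {"CHB": 137, "JPT": 113},
    "African": {"LWK": 110, "MKK": 184, "YRI": 203},
    "North American": {"ASW": 87, "CEU": 165, "CHD": 109, "GIH": 101, "MXL": 86},
    "Kenyan": {"LWK": 110, "MKK": 184},
    "Chinese": {"CHB": 137, "CHD": 109},
}


def majority_baseline(groups):
    """Accuracy of always predicting the largest subgroup."""
    return max(groups.values()) / sum(groups.values())


for problem, groups in splits.items():
    total = sum(groups.values())
    print(f"{problem}: n={total}, baseline={100 * majority_baseline(groups):.1f}%")
```

Under these assumptions, each printed percentage should agree with the “Baseline” column; for example, the European problem gives 165 / 267, or about 61.8%.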