
Table 2 Pre-processing statistics of HapMap phase III datasets and sub-continental population classification problems

From: ETHNOPRED: a novel machine learning method for accurate continental and sub-continental ancestry identification and population stratification correction

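| Dataset/Problem | Samples | All SNPs | SNPs with call rate < 100% | SNPs on non-autosomal chromosomes | SNPs deviating from Hardy-Weinberg equilibrium (HWE) | Filtered SNPs | Unfiltered SNPs |
|---|---|---|---|---|---|---|---|
| ASW | 87 | 1458387 | 214898 | 34554 | 94234 | 298524 | 1159863 |
| CEU | 165 | 1458387 | 376531 | 34554 | 81633 | 427638 | 1030749 |
| CHB | 137 | 1458387 | 353208 | 34554 | 77028 | 423270 | 1035117 |
| CHD | 109 | 1458387 | 352031 | 34554 | 77111 | 421328 | 1037059 |
| GIH | 101 | 1458387 | 234863 | 34554 | 85463 | 314376 | 1144011 |
| JPT | 113 | 1458387 | 271105 | 34554 | 75502 | 337033 | 1121354 |
| LWK | 110 | 1458387 | 365638 | 34554 | 97174 | 425375 | 1033012 |
| MKK | 184 | 1458387 | 411395 | 34554 | 105490 | 471384 | 987003 |
| MXL | 86 | 1458387 | 311704 | 34554 | 86910 | 387207 | 1071180 |
| TSI | 102 | 1458387 | 268916 | 34554 | 81919 | 326585 | 1131802 |
| YRI | 203 | 1458387 | 423100 | 34554 | 94449 | 476513 | 981874 |
| European | 267 | 1458387 | 493449 | 34554 | 137488 | 575492 | 882895 |
| East Asian | 250 | 1458387 | 475217 | 34554 | 129695 | 565554 | 892833 |
| African | 497 | 1458387 | 742671 | 34554 | 228268 | 841790 | 616597 |
| North American | 548 | 1458387 | 803678 | 34554 | 306572 | 931993 | 526394 |
| Kenyan | 294 | 1458387 | 590202 | 34554 | 170547 | 677326 | 781061 |
| Chinese | 246 | 1458387 | 538224 | 34554 | 131394 | 629023 | 829364 |

Rows ASW through YRI are individual HapMap phase III population datasets; rows European through Chinese are the sub-continental population classification problems. In every row, Unfiltered SNPs equals All SNPs minus Filtered SNPs.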
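The column relationships in Table 2 can be checked mechanically. The sketch below transcribes the table into a Python dictionary (the name `TABLE_2` and the helper `check_row` are illustrative, not from the source) and asserts two properties: Filtered and Unfiltered SNPs partition All SNPs, and the Filtered count lies between the largest single filter category and the sum of the three. The second property is what one would expect if, as assumed here, a SNP failing several criteria is removed (and counted) only once.

```python
# Table 2 transcribed row by row. Column order:
# samples, all SNPs, call rate < 100%, non-autosomal, HWE deviation, filtered, unfiltered
TABLE_2 = {
    "ASW": (87, 1458387, 214898, 34554, 94234, 298524, 1159863),
    "CEU": (165, 1458387, 376531, 34554, 81633, 427638, 1030749),
    "CHB": (137, 1458387, 353208, 34554, 77028, 423270, 1035117),
    "CHD": (109, 1458387, 352031, 34554, 77111, 421328, 1037059),
    "GIH": (101, 1458387, 234863, 34554, 85463, 314376, 1144011),
    "JPT": (113, 1458387, 271105, 34554, 75502, 337033, 1121354),
    "LWK": (110, 1458387, 365638, 34554, 97174, 425375, 1033012),
    "MKK": (184, 1458387, 411395, 34554, 105490, 471384, 987003),
    "MXL": (86, 1458387, 311704, 34554, 86910, 387207, 1071180),
    "TSI": (102, 1458387, 268916, 34554, 81919, 326585, 1131802),
    "YRI": (203, 1458387, 423100, 34554, 94449, 476513, 981874),
    "European": (267, 1458387, 493449, 34554, 137488, 575492, 882895),
    "East Asian": (250, 1458387, 475217, 34554, 129695, 565554, 892833),
    "African": (497, 1458387, 742671, 34554, 228268, 841790, 616597),
    "North American": (548, 1458387, 803678, 34554, 306572, 931993, 526394),
    "Kenyan": (294, 1458387, 590202, 34554, 170547, 677326, 781061),
    "Chinese": (246, 1458387, 538224, 34554, 131394, 629023, 829364),
}


def check_row(name: str, row: tuple) -> int:
    """Assert the table's internal consistency for one row and return the overlap excess."""
    _samples, total, call_rate, non_auto, hwe, filtered, unfiltered = row
    # Every SNP is either removed or kept, so these two columns partition the panel.
    assert filtered + unfiltered == total, name
    # Under the union assumption, the filtered count is at least the largest
    # category and at most the sum of all three.
    category_sum = call_rate + non_auto + hwe
    assert max(call_rate, non_auto, hwe) <= filtered <= category_sum, name
    # Excess from double counting: summed over removed SNPs, (filters failed - 1).
    return category_sum - filtered


for name, row in TABLE_2.items():
    print(f"{name}: overlap excess = {check_row(name, row)}")
```

Every row passes both checks. The non-autosomal column is also identical (34554) in every row, which fits a criterion that depends only on the SNP map rather than on the samples.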
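The three filter columns suggest a per-SNP pre-processing pass: drop SNPs with any missing call, SNPs outside chromosomes 1 to 22, and SNPs failing an HWE test. The following is a minimal sketch of such a pass under stated assumptions, not the authors' implementation. The genotype encoding (0/1/2 allele copies, `-1` for a missing call), the 1-degree-of-freedom chi-square HWE test, and the threshold `hwe_alpha = 1e-3` are all hypothetical, because the table does not specify them. Each criterion is evaluated independently, so the per-category counts can sum to more than the filtered total, as they do in Table 2.

```python
import math
from typing import Dict, List, Sequence, Tuple

MISSING = -1  # hypothetical code for a failed genotype call
AUTOSOMES = {str(c) for c in range(1, 23)}  # chromosomes 1-22


def hwe_pvalue(genotypes: Sequence[int]) -> float:
    """1-df chi-square HWE test; genotypes count copies of one allele (0, 1, 2)."""
    calls = [g for g in genotypes if g != MISSING]
    n = len(calls)
    if n == 0:
        return 1.0
    observed = [calls.count(0), calls.count(1), calls.count(2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)  # frequency of the counted allele
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: nothing to test
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def filter_snps(
    genotypes: List[List[int]],  # one row per SNP, one column per sample
    chromosomes: List[str],
    hwe_alpha: float = 1e-3,  # hypothetical threshold; not stated in the table
) -> Tuple[List[bool], Dict[str, int]]:
    counts = {"call_rate": 0, "non_autosomal": 0, "hwe": 0, "filtered": 0}
    keep = []
    for row, chrom in zip(genotypes, chromosomes):
        # Criteria are checked independently; a SNP may count in several columns.
        fails_call = MISSING in row  # call rate < 100%
        fails_chrom = chrom not in AUTOSOMES
        fails_hwe = hwe_pvalue(row) < hwe_alpha
        counts["call_rate"] += fails_call
        counts["non_autosomal"] += fails_chrom
        counts["hwe"] += fails_hwe
        removed = fails_call or fails_chrom or fails_hwe
        counts["filtered"] += removed  # each removed SNP counted once
        keep.append(not removed)
    return keep, counts
```

For a toy panel where one SNP fails all three criteria, the category counts sum to 6 while only 4 SNPs are filtered, mirroring the gap between the category columns and the Filtered SNPs column. On real data, tools such as PLINK expose comparable filters (`--geno`, `--hwe`, `--autosome`); note that PLINK's `--hwe` uses an exact test rather than the chi-square approximation used here.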