
Table 1 HapMap trio datasets description

From: Shape-IT: new rapid and accurate algorithm for haplotype inference

| Datasets | Chromosome | #datasets | #SNP | #indiv | Details |
|---|---|---|---|---|---|
| CEU Size | 1 to 5 | 250 | 10 to 160 | 60 | 50 datasets of 10, 20, 40, 80 and 160 adjacent SNPs with MAF above 5% |
| CEU Density | 1 to 5 | 300 | 40 | 60 | 50 datasets with spanned distance between SNP above 0, 0.5, 1, 2, 4 and 8 kb (MAF 5%) |
| CEU MAF | 1 to 5 | 150 | 40 | 60 | 50 datasets with MAF above 1%, 5% and 10% |
| YRI Size | 1 to 5 | 250 | 10 to 160 | 60 | 50 datasets of 10, 20, 40, 80 and 160 adjacent SNPs with MAF above 5% |
| YRI Density | 1 to 5 | 300 | 40 | 60 | 50 datasets with spanned distance between SNP above 0, 0.5, 1, 2, 4 and 8 kb (MAF 5%) |
| YRI MAF | 1 to 5 | 150 | 40 | 60 | 50 datasets with MAF above 1%, 5% and 10% |
| CEU illumina 50 | 12 | 300 | 50 | 60 | 15,000 illumina SNPs grouped by dataset of 50 SNPs |
| CEU illumina 100 | 12 | 150 | 100 | 60 | 15,000 illumina SNPs grouped by dataset of 100 SNPs |
| CEU illumina 200 | 12 | 75 | 200 | 60 | 15,000 illumina SNPs grouped by dataset of 200 SNPs |
| GRIV | 1 | 90 | 50 to 200 | 100 to 300 | 3,500 illumina SNPs grouped by dataset of 50, 100 and 200 SNPs |

Description of the benchmarks derived from the HapMap trio datasets that we used to compare the accuracy and runtimes of the various algorithms in Table 4. For each parameter (size, density, and MAF), 10 samples were chosen on each of chromosomes 1 to 5, i.e. a total of 50 tests per parameter.
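The #datasets column can be cross-checked against the sampling scheme described above. The sketch below is only an illustration: the function and constant names are hypothetical, and it assumes that the 50 tests (10 samples on each of 5 chromosomes) are drawn once for every parameter value listed in the Details column, which is the reading that matches the tabulated counts for the Size, Density and MAF benchmarks.

```python
# Illustrative arithmetic only; names are hypothetical, figures come from Table 1.
SAMPLES_PER_CHROMOSOME = 10
CHROMOSOMES = range(1, 6)  # chromosomes 1 to 5

# Parameter values listed in the Details column for each benchmark family.
PARAMETER_VALUES = {
    "Size": [10, 20, 40, 80, 160],      # adjacent SNPs per dataset
    "Density": [0, 0.5, 1, 2, 4, 8],    # minimum distance between SNPs, in kb
    "MAF": [0.01, 0.05, 0.10],          # minimum minor allele frequency
}


def tests_per_value():
    """Number of datasets drawn for one parameter value (10 per chromosome)."""
    return SAMPLES_PER_CHROMOSOME * len(CHROMOSOMES)


def expected_dataset_count(family):
    """Assumed total: tests per value times the number of values tested."""
    return tests_per_value() * len(PARAMETER_VALUES[family])


def illumina_dataset_count(total_snps, snps_per_dataset):
    """Number of datasets when SNPs are grouped into fixed-size blocks."""
    return total_snps // snps_per_dataset


if __name__ == "__main__":
    for family in PARAMETER_VALUES:
        print(family, expected_dataset_count(family))
    for block_size in (50, 100, 200):
        print("illumina", block_size, illumina_dataset_count(15_000, block_size))
```

Under these assumptions the computed totals agree with the #datasets values listed for the CEU and YRI Size, Density and MAF rows and for the three CEU illumina rows. The GRIV row is not covered by this sketch, since its grouping is not described in enough detail here.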