Table 2 Exact genotypes in markers per data MAF bin

From: A joint use of pooling and imputation for genotyping SNPs

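| Scenario | Quantity | Method | MAF 0.00–0.02 | MAF 0.02–0.04 | MAF 0.04–0.06 | MAF 0.06–0.10 | MAF 0.10–0.20 | MAF 0.20–0.40 | MAF 0.40–0.50 |
|---|---|---|---|---|---|---|---|---|---|
| LD + HD | Number before imputation | | 520.000 | 779.000 | 673.000 | 1537.000 | 3969.000 | 6561.000 | 2976.000 |
| LD + HD | Number after imputation | Beagle | 12699.362 | 5167.613 | 2776.687 | 4673.658 | 8804.892 | 12301.371 | 5337.921 |
| LD + HD | Number after imputation | Prophaser | 12727.142 | 5193.438 | 2793.221 | 4705.346 | 8870.104 | 12396.258 | 5379.408 |
| LD + HD | Proportion before imputation | | 0.041 | 0.149 | 0.238 | 0.322 | 0.441 | 0.520 | 0.543 |
| LD + HD | Proportion after imputation | Beagle | 0.994 | 0.987 | 0.984 | 0.981 | 0.977 | 0.975 | 0.975 |
| LD + HD | Proportion after imputation | Prophaser | 0.996 | 0.992 | 0.989 | 0.987 | 0.985 | 0.983 | 0.982 |
| Pooled HD | Number before imputation | | 12534.608 | 4826.542 | 2396.671 | 3481.896 | 4249.592 | 1853.529 | 159.575 |
| Pooled HD | Number after imputation | Beagle | 12565.650 | 4892.246 | 2478.292 | 3778.296 | 5637.525 | 5407.479 | 1941.162 |
| Pooled HD | Number after imputation | Prophaser | 12755.854 | 5184.621 | 2758.079 | 4532.467 | 7964.742 | 9858.467 | 4012.725 |
| Pooled HD | Proportion before imputation | | 0.981 | 0.922 | 0.849 | 0.731 | 0.472 | 0.147 | 0.029 |
| Pooled HD | Proportion after imputation | Beagle | 0.984 | 0.935 | 0.878 | 0.793 | 0.626 | 0.429 | 0.354 |
| Pooled HD | Proportion after imputation | Prophaser | 0.999 | 0.990 | 0.977 | 0.951 | 0.884 | 0.782 | 0.733 |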

  1. The number of markers is given as the average over all samples in the study population, per bin. The proportion of markers is given relative to the number of markers per bin. Unlike concordance, only full matches with the true genotype are counted, not half-matches. For the LD + HD scenario, the number of exact genotypes before imputation equals the number of variants on the LD map. For the pooled HD scenario, the number of exact genotypes before imputation equals the average number of genotypes that are fully determined after the pooling simulation. Simulating pooling followed by imputation with Prophaser yields a gain in accuracy for the very rare variants (\(MAF < 0.02\)), almost all of which are exactly genotyped. This gain is not negligible given the low occurrence of these variants.
  2. The best accuracy scores achieved by Prophaser are marked in bold
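
The counting rule described in note 1 can be illustrated with a minimal sketch. It is not the authors' code. It assumes genotypes are coded as allele dosages 0, 1 or 2, with `None` for a genotype that is not fully determined, and that each MAF bin includes its lower edge. The names `maf_bin` and `exact_match_summary` are hypothetical.

```python
from bisect import bisect_right

# Upper edges of the MAF bins used in the table (the last bin ends at 0.50).
BIN_EDGES = [0.02, 0.04, 0.06, 0.10, 0.20, 0.40, 0.50]


def maf_bin(maf):
    """Return the bin index for a MAF value.

    Assumption: a bin includes its lower edge, and 0.50 falls in the last bin.
    """
    return min(bisect_right(BIN_EDGES, maf), len(BIN_EDGES) - 1)


def exact_match_summary(true_gt, called_gt, mafs):
    """Count exact genotype matches per MAF bin.

    true_gt, called_gt: one list of genotypes per sample (0, 1, 2, or None).
    mafs: one MAF value per marker.
    Returns (mean number of exact genotypes per sample, proportion per bin).
    """
    n_bins = len(BIN_EDGES)
    markers_per_bin = [0] * n_bins
    for maf in mafs:
        markers_per_bin[maf_bin(maf)] += 1

    exact = [0.0] * n_bins
    for true_row, called_row in zip(true_gt, called_gt):
        for t, c, maf in zip(true_row, called_row, mafs):
            # Only a full match counts; a half-match (one allele off) does not.
            if c is not None and c == t:
                exact[maf_bin(maf)] += 1

    n_samples = len(true_gt)
    mean_exact = [e / n_samples for e in exact]
    proportion = [
        m / n if n else float("nan")
        for m, n in zip(mean_exact, markers_per_bin)
    ]
    return mean_exact, proportion


# Toy data: 2 samples, 4 markers (values are illustrative only).
mafs = [0.01, 0.01, 0.30, 0.45]
true_gt = [[0, 0, 1, 2], [0, 1, 2, 1]]
called_gt = [[0, 0, 2, 2], [0, None, 2, 0]]
mean_exact, proportion = exact_match_summary(true_gt, called_gt, mafs)
```

On this toy input the first bin holds two markers and the two samples have three exact matches there in total, so the mean number is 1.5 and the proportion is 0.75.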