Table 1 Variant calling accuracy when inferring using 35x WGS data from HG003 and 30x WGS data from HG00733

From: Improving variant calling using population data and deep learning

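| Dataset | Variant type | Population | Precision | Recall | F1 |
|---|---|---|---|---|---|
| HG003 | INDEL | Agnostic | 0.997351 | 0.993922 | 0.995634 |
| HG003 | INDEL | 1000genomes | 0.997462 | 0.993977 | 0.995716 |
| HG003 | INDEL | AFR | 0.997337 | 0.993623 | 0.995476 |
| HG003 | INDEL | AMR | 0.997355 | 0.993787 | 0.995568 |
| HG003 | INDEL | EAS | 0.997021 | 0.993062 | 0.995038 |
| HG003 | INDEL | EUR | 0.997364 | 0.993801 | 0.995579 |
| HG003 | INDEL | SAS | 0.997333 | 0.993692 | 0.995509 |
| HG003 | SNP | Agnostic | 0.998131 | 0.993769 | 0.995945 |
| HG003 | SNP | 1000genomes | 0.998461 | 0.993868 | 0.996159 |
| HG003 | SNP | AFR | 0.998475 | 0.993671 | 0.996067 |
| HG003 | SNP | AMR | 0.998472 | 0.993816 | 0.996138 |
| HG003 | SNP | EAS | 0.998444 | 0.993489 | 0.995961 |
| HG003 | SNP | EUR | 0.998471 | 0.993808 | 0.996134 |
| HG003 | SNP | SAS | 0.998464 | 0.993782 | 0.996117 |
| HG00733 | SNP | Agnostic | 0.997700 | 0.993789 | 0.995740 |
| HG00733 | SNP | 1000genomes | 0.997783 | 0.994116 | 0.995946 |
| HG00733 | SNP | AFR | 0.997783 | 0.993956 | 0.995866 |
| HG00733 | SNP | AMR | 0.997802 | 0.993950 | 0.995873 |
| HG00733 | SNP | EAS | 0.997813 | 0.993409 | 0.995606 |
| HG00733 | SNP | EUR | 0.997813 | 0.993932 | 0.995868 |
| HG00733 | SNP | SAS | 0.997810 | 0.993862 | 0.995832 |
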
  1. Bold numbers indicate best performance in each dataset-variant type group
  2. Methods: default DeepVariant (Agnostic), population-aware DeepVariant using allele frequencies from the entire 1000Genomes (1000genomes) and five 1000Genomes superpopulations (AFR, AMR, EAS, EUR and SAS). Higher values correspond to higher accuracy
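
The table does not state how F1 is derived. As a minimal sketch, assuming the standard definition (the harmonic mean of precision and recall), the check below recomputes F1 for three rows copied from the table and compares the result with the reported value. The names `f1_score`, `ROWS`, and `max_abs_f1_error` are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (dataset, variant type, population, precision, recall, reported F1)
# Three rows copied from Table 1; the remaining rows can be added the same way.
ROWS = [
    ("HG003", "INDEL", "Agnostic", 0.997351, 0.993922, 0.995634),
    ("HG003", "SNP", "AFR", 0.998475, 0.993671, 0.996067),
    ("HG00733", "SNP", "1000genomes", 0.997783, 0.994116, 0.995946),
]


def max_abs_f1_error(rows=ROWS) -> float:
    """Largest absolute gap between recomputed and reported F1."""
    return max(abs(f1_score(p, r) - f1) for _, _, _, p, r, f1 in rows)


if __name__ == "__main__":
    # Inputs are rounded to six decimals, so a small gap is expected.
    print(f"max |recomputed F1 - reported F1| = {max_abs_f1_error():.2e}")
```

Because precision and recall are themselves rounded to six decimal places, the recomputed F1 can differ from the reported value in the last digit, so a tolerance such as 1e-5 is a reasonable threshold for this kind of consistency check.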