Table 3 Variant filtering of WES data improves Ti/Tv ratios

From: Effective filtering strategies to improve data quality from population-based whole exome sequencing studies

| Filters | Variants removed | Number of variants (% of unfiltered) | Ti/Tv (Novel) | Ti/Tv (Known) | Ti/Tv (Truth) | Ti/Tv (All) | p-valueᵃ |
|---|---|---|---|---|---|---|---|
| None | 0 | 448,862 (100%) | 1.93 | 2.71 | 3.05 | 2.39 | N/A |
| VQSR | 56,036 | 392,826 (87.5%) | 2.21 | 2.88 | 3.07 | 2.63 | <10⁻³²⁰ |
| HWE | 11,855 | 437,007 (97.4%) | 1.93 | 2.73 | 3.06 | 2.40 | 1.42 × 10⁻²¹ |
| Ave. GQ | 33,083 | 415,779 (92.6%) | 2.00 | 2.73 | 3.06 | 2.47 | 1.13 × 10⁻²⁶⁵ |
| Call Rate | 51,117 | 397,745 (88.6%) | 2.09 | 2.78 | 3.08 | 2.51 | <10⁻³²⁰ |
| Combined* | 59,952 | 388,910 (86.6%) | 2.09 | 2.80 | 3.09 | 2.52 | <10⁻³²⁰ |
| Combined* + VQSR | 97,840 | 351,022 (78.2%) | 2.38 | 2.96 | 3.10 | 2.75 | <10⁻³²⁰ (3.72 × 10⁻¹⁰⁶)ᵇ |
| VQSR + Combined* | 92,091 | 356,771 (79.5%) | 2.34 | 2.94 | 3.10 | 2.72 | <10⁻³²⁰ |

  1. Known: variants found in NCBI dbSNP Build 135.
  2. Truth: variants found in HapMap phase 3 release 3.
  3. *Combination of HWE, Ave. GQ and Call Rate filters.
  4. ᵃ p-value based on a hypergeometric test of whether the removed variants were enriched for Tv over Ti vs. the unfiltered variant sets.
  5. ᵇ p-value based on a hypergeometric test of whether the variants that differed between Combined + VQSR variant sets and VQSR + Combined variant sets were enriched for Tv over Ti.
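
The Ti/Tv ratios and the hypergeometric enrichment test described in the footnotes can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names (`is_transition`, `ti_tv_ratio`, `hypergeom_upper_tail`) and the toy counts are hypothetical, and the table does not report the underlying Ti and Tv counts. The sketch assumes biallelic single-nucleotide variants and a one-sided test, i.e. the probability of drawing at least as many transversions among the removed variants as observed, if removal were random with respect to the unfiltered set.

```python
import math

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T);
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def is_transition(ref, alt):
    """Return True if the ref -> alt substitution is a transition."""
    return (ref.upper(), alt.upper()) in TRANSITIONS


def ti_tv_ratio(snvs):
    """Ti/Tv ratio for an iterable of (ref, alt) single-base pairs."""
    snvs = list(snvs)
    ti = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_upper_tail(total, total_tv, removed, removed_tv):
    """P(X >= removed_tv) for X ~ Hypergeometric(total, total_tv, removed).

    total      : variants in the unfiltered set
    total_tv   : transversions in the unfiltered set
    removed    : variants removed by the filter
    removed_tv : transversions among the removed variants
    """
    total_ti = total - total_tv
    # k transversions drawn requires removed - k transitions to be available.
    lo = max(removed_tv, removed - total_ti, 0)
    hi = min(total_tv, removed)
    log_denom = _log_comb(total, removed)
    p = 0.0
    for k in range(lo, hi + 1):
        log_p = (_log_comb(total_tv, k)
                 + _log_comb(total_ti, removed - k)
                 - log_denom)
        p += math.exp(log_p)
    return min(p, 1.0)


# Toy example with hypothetical counts (NOT taken from the table):
# 10 variants of which 4 are Tv; a filter removes 3 variants, all of them Tv.
p_toy = hypergeom_upper_tail(total=10, total_tv=4, removed=3, removed_tv=3)
# Exact value is C(4,3) * C(6,0) / C(10,3) = 4/120.
```

For enrichments as extreme as those in the table, the direct sum of `math.exp` terms underflows to 0.0 in double precision, so a real analysis would need to keep the result in log space or use an arbitrary-precision library.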