Table 3 Variant filtering of WES data improves Ti/Tv ratios

From: Effective filtering strategies to improve data quality from population-based whole exome sequencing studies

| Filters | Variants removed | Number of variants (% of unfiltered) | Ti/Tv: Novel | Ti/Tv: Known | Ti/Tv: Truth | Ti/Tv: All | p-value^a |
|---|---|---|---|---|---|---|---|
| None | 0 | 448,862 (100%) | 1.93 | 2.71 | 3.05 | 2.39 | N/A |
| VQSR | 56,036 | 392,826 (87.5%) | 2.21 | 2.88 | 3.07 | 2.63 | <10^-320 |
| HWE | 11,855 | 437,007 (97.4%) | 1.93 | 2.73 | 3.06 | 2.40 | 1.42 × 10^-21 |
| Ave. GQ | 33,083 | 415,779 (92.6%) | 2.00 | 2.73 | 3.06 | 2.47 | 1.13 × 10^-265 |
| Call Rate | 51,117 | 397,745 (88.6%) | 2.09 | 2.78 | 3.08 | 2.51 | <10^-320 |
| Combined* | 59,952 | 388,910 (86.6%) | 2.09 | 2.80 | 3.09 | 2.52 | <10^-320 |
| Combined* + VQSR | 97,840 | 351,022 (78.2%) | 2.38 | 2.96 | 3.10 | 2.75 | <10^-320 (3.72 × 10^-106)^b |
| VQSR + Combined* | 92,091 | 356,771 (79.5%) | 2.34 | 2.94 | 3.10 | 2.72 | <10^-320 |
  1. Known: variants found in NCBI dbSNP Build 135.
  2. Truth: variants found in HapMap phase 3 release 3.
  3. *Combination of HWE, Ave. GQ and Call Rate filters.
  4. a: p-value based on a hypergeometric test of whether the removed variants were enriched for Tv over Ti relative to the unfiltered variant set.
  5. b: p-value based on a hypergeometric test of whether the variants that differed between the Combined* + VQSR and VQSR + Combined* variant sets were enriched for Tv over Ti.
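
The hypergeometric enrichment test named in the footnotes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the table does not say which software or which tail was used, so the one-sided upper tail P(X >= observed Tv among removed variants) is an assumption, the function names are hypothetical, and the counts in the usage example are made-up toy numbers rather than values from the study. The probability is accumulated in log space because p-values as small as those in the table would underflow ordinary floating point.

```python
import math


def ti_tv_ratio(n_ti: int, n_tv: int) -> float:
    """Transition/transversion ratio of a variant set."""
    return n_ti / n_tv


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), for 0 <= k <= n."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log10_enrichment_p(total: int, total_tv: int,
                       removed: int, removed_tv: int) -> float:
    """log10 of P(X >= removed_tv), X ~ Hypergeometric(total, total_tv, removed).

    total      : variants in the unfiltered set
    total_tv   : transversions in the unfiltered set
    removed    : variants removed by the filter
    removed_tv : transversions among the removed variants
    """
    # Restrict k to the support of the distribution.
    lower = max(removed_tv, removed - (total - total_tv))
    upper = min(total_tv, removed)
    if lower > upper:
        return float("-inf")
    log_denom = _log_comb(total, removed)
    terms = [
        _log_comb(total_tv, k)
        + _log_comb(total - total_tv, removed - k)
        - log_denom
        for k in range(lower, upper + 1)
    ]
    # Log-sum-exp keeps tiny probabilities from underflowing to zero.
    peak = max(terms)
    log_p = peak + math.log(sum(math.exp(t - peak) for t in terms))
    return min(log_p, 0.0) / math.log(10)


if __name__ == "__main__":
    # Toy counts (assumed for illustration): 12 Ti and 8 Tv before filtering;
    # the filter removes 5 variants, 4 of them Tv.
    print("Ti/Tv before:", ti_tv_ratio(12, 8))
    print("Ti/Tv after: ", ti_tv_ratio(12 - 1, 8 - 4))
    print("log10 p:     ", log10_enrichment_p(20, 8, 5, 4))
```

In the toy example, removing mostly transversions raises Ti/Tv from 1.5 to 2.75, the same direction of change the table reports for each filter. Returning log10(p) instead of p is a design choice that lets a result below the smallest representable double, such as the table's <10^-320 entries, still be reported as a finite number.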