
Table 1 Descriptive metrics of 1231 whole genome sequences by batch

From: Identifying and mitigating batch effects in whole genome sequencing data

| Variable | Group 1, mean (SD) | Group 2, mean (SD) | p-valueᵃ |
| --- | --- | --- | --- |
| N | 918 | 313 | |
| GATK Genotype Quality | 91.47 (2.72) | 90.77 (3.57) | NS |
| Median Read Depth | 33.65 (4.69) | 35.39 (6.81) | NS |
| Ti/Tv in Non-Coding Regions | 2.01 (0.012) | 1.95 (0.019) | < 0.0001 |
| Ti/Tv in Coding Regions | 2.99 (0.053) | 2.90 (0.032) | < 0.0001 |
| % Confirmed in 1000 Genomes | 81 (0.87) | 77 (0.76) | < 0.0001 |
| Percent Heterozygote | 7.5 (0.48) | 8.2 (0.45) | < 0.0001 |
  1. Group 1 and Group 2 refer to two different groups detected via a visualization of eigenvectors from a PCA of metrics derived from the gVCF files
  2. GATK: Genome Analysis Toolkit; Ti/Tv: transition/transversion ratio; NS: not significant
  3. The mean of each variable is reported with the standard deviation in parentheses
  4. ᵃ Differences between the two groups were assessed using the two-sided Wilcoxon rank-sum test with a Bonferroni adjustment for multiple tests
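
The testing procedure named in the last footnote can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' code: the function names `rank_sum_p` and `bonferroni` are hypothetical, the p-value uses the large-sample normal approximation with a tie correction and no continuity correction, and the two input sets are small toy values rather than the sequencing metrics in the table.

```python
import math


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0  # sum of (mid)ranks belonging to x
    tie_term = 0.0    # sum of t^3 - t over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy inputs (NOT data from the table): one shifted metric, one interleaved metric.
shifted_a, shifted_b = list(range(0, 20)), list(range(10, 30))
same_a, same_b = list(range(1, 20, 2)), list(range(2, 21, 2))

raw = [rank_sum_p(shifted_a, shifted_b), rank_sum_p(same_a, same_b)]
adjusted = bonferroni(raw)
for name, p in zip(["shifted metric", "interleaved metric"], adjusted):
    print(f"{name}: Bonferroni-adjusted p = {p:.3g}")
```

With group sizes like those in the table (918 and 313), the normal approximation is the usual choice. An established statistics library would be preferable to this hand-rolled version in real analyses.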