Table 1 Descriptive metrics of 1231 whole genome sequences by batch

From: Identifying and mitigating batch effects in whole genome sequencing data

| Variable, mean (SD) | Group 1 | Group 2 | p-value^a |
| --- | --- | --- | --- |
| N | 918 | 313 | |
| GATK Genotype Quality | 91.47 (2.72) | 90.77 (3.57) | NS |
| Median Read Depth | 33.65 (4.69) | 35.39 (6.81) | NS |
| Ti/Tv in Non-Coding Regions | 2.01 (0.012) | 1.95 (0.019) | < 0.0001 |
| Ti/Tv in Coding Regions | 2.99 (0.053) | 2.90 (0.032) | < 0.0001 |
| % Confirmed in 1000 Genomes | 81 (0.87) | 77 (0.76) | < 0.0001 |
| Percent Heterozygote | 7.5 (0.48) | 8.2 (0.45) | < 0.0001 |

  1. Group 1 and Group 2 refer to two different groups detected via a visualization of eigenvectors from a PCA of metrics derived from the gVCF files
  2. GATK: Genome Analysis Toolkit; Ti/Tv: transition/transversion ratio; NS: not significant
  3. The mean of each variable is reported with the standard deviation in parentheses
  4. ^a Differences between the two groups were assessed using the Wilcoxon Rank Sum Test, two-sided alternative, with a Bonferroni adjustment for multiple tests
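
The testing procedure named in the last footnote (a two-sided Wilcoxon rank-sum test per metric, followed by a Bonferroni adjustment) can be sketched as below. This is a minimal illustration, not the authors' code: it uses the large-sample normal approximation with a tie correction, and the function names (`rank_sum_p`, `bonferroni`), the metric names, and the simulated values are all hypothetical and unrelated to the study data.

```python
import math
import random


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0  # sum of ranks belonging to sample x
    tie_term = 0.0    # sum of t^3 - t over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:  # every observation identical: no evidence of a difference
        return 1.0
    z = (rank_sum_x - mean_w) / math.sqrt(var_w)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Simulated (hypothetical) per-sample metrics for two batches.
rng = random.Random(0)
metrics = {
    "shifted_metric": ([rng.gauss(2.00, 0.02) for _ in range(200)],
                       [rng.gauss(1.95, 0.02) for _ in range(80)]),
    "similar_metric": ([rng.gauss(30.0, 5.0) for _ in range(200)],
                       [rng.gauss(30.0, 5.0) for _ in range(80)]),
}

raw = [rank_sum_p(g1, g2) for g1, g2 in metrics.values()]
for name, p_raw, p_adj in zip(metrics, raw, bonferroni(raw)):
    print(f"{name}: raw p = {p_raw:.3g}, Bonferroni-adjusted p = {p_adj:.3g}")
```

In practice a statistics library routine such as `scipy.stats.mannwhitneyu` would normally replace the hand-written test; the standard-library version here only keeps the sketch self-contained.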