
Table 1 Minimal differences between Picard, SAMTools, and no duplicate removal

From: Evaluating the necessity of PCR duplicate removal from next-generation sequencing data and a comparison of approaches

| Subset | Total Variants | Ti/Tv Ratio | % Variants in dbSNP | Avg. Population Frequency | % Protein Changing Variants |
|---|---|---|---|---|---|
| All Picard | 16354497 | 2.14 | 72.05 | 0.21 | 0.40 |
| All SAMTools | 16250761 | 2.14 | 71.86 | 0.22 | 0.40 |
| All No Dups | 16494672 | 2.14 | 71.30 | 0.21 | 0.40 |
| P-Value | <2.60e-16 | 1.00 | 0.99 | 0.99 | 1.00 |
| Common to all three | 15688522 | 2.15 | 80.18 | 0.22 | 0.41 |
| Unique to Picard | 307486 | 1.92 | 66.27 | 0.16 | 0.33 |
| Unique to SAMTools | 150474 | 1.80 | 69.59 | 0.19 | 0.26 |
| Unique to No Dups | 398248 | 1.95 | 54.07 | 0.16 | 0.34 |
| Unique to Picard/SAMTools | 181176 | 1.97 | 73.86 | 0.22 | 0.33 |
| Unique to Picard/No Dups | 177313 | 2.07 | 65.30 | 0.21 | 0.31 |
| Unique to SAMTools/No Dups | 230589 | 1.73 | 52.17 | 0.23 | 0.24 |
| P-Value (comparing Unique rows) | <2.60e-16 | 1.00 | 0.32 | 0.84 | 1.00 |

  1. Here we present metrics for each portion of the Venn diagram (Fig. 2): total number of variants, transition/transversion (Ti/Tv) ratio, percentage of variants in dbSNP (the complement of the proportion of novel variants), average population frequency, and percentage of variants that change the protein product. The top part of the table reports characteristics of all variants called after duplicate removal with Picard or SAMTools, or without duplicate removal. Variants from the dataset processed with Picard are labeled Picard, those from the dataset processed with SAMTools are labeled SAMTools, and those from the dataset without duplicate removal are labeled No Dups. Population frequencies are based on the 1000 Genomes Project; dbSNP refers to build 138, and any variant not present in dbSNP is considered novel; protein-changing variants are missense SNVs or frameshifting InDels. We performed a chi-square goodness-of-fit test to test for significant differences among the values in each column. Two tests were performed for each column: (1) comparing the values for all variants in each main dataset ("All Picard", "All SAMTools", and "All No Dups"); and (2) comparing the values across all "Unique" groups. The number of variants differed significantly across groups, but none of the other measures were significantly different.
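The column-wise chi-square goodness-of-fit test described in the footnote can be illustrated on the Total Variants column. This is a minimal sketch, not the authors' code: it assumes a null hypothesis of equal expected counts across the compared groups (the footnote does not state the expected frequencies), and it uses only the Python standard library, with a closed-form chi-square upper-tail probability for integer degrees of freedom. The helper names `chi2_sf` and `chi_square_gof` are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with a
    positive integer number of degrees of freedom (closed-form series)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in powers of sqrt(x)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def chi_square_gof(observed):
    """Chi-square goodness-of-fit against equal expected values
    (assumed uniform null). Returns (statistic, df, p_value)."""
    k = len(observed)
    expected = sum(observed) / k
    stat = sum((o - expected) ** 2 / expected for o in observed)
    df = k - 1
    return stat, df, chi2_sf(stat, df)


# Total Variants column from Table 1
all_rows = [16354497, 16250761, 16494672]  # All Picard, All SAMTools, All No Dups
unique_rows = [307486, 150474, 398248, 181176, 177313, 230589]  # the six "Unique" rows

for label, counts in [("All", all_rows), ("Unique", unique_rows)]:
    stat, df, p = chi_square_gof(counts)
    print(f"{label}: chi2 = {stat:.1f}, df = {df}, p = {p:.3g}")
```

With counts this large, both p-values underflow to 0.0 in double precision, which is consistent with the "<2.60e-16" entries. The footnote does not say how the ratio and percentage columns were converted into frequencies for the test, so the sketch is limited to the count column.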