
Table 11 SNP calling on the M. musculus dataset with and without compression

From: QualComp: a new lossy compressor for quality scores based on rate distortion theory

    

One cluster

| R | MSE | T.P. | F.P. | F.N. | Selectivity (%) | Sensitivity (%) | Size (MB) |
|---|---|---|---|---|---|---|---|
| 0 | 143.0 | 11217 | 2033 | 1810 | 84.66 | 86.11 | 0.019 |
| 0.20 | 19.16 | 12585 | 1159 | 442 | 91.57 | 96.61 | 16.17 |
| 0.33 | 16.46 | 12602 | 1120 | 380 | 91.84 | 97.07 | 26.68 |
| 0.66 | 12.94 | 12669 | 998 | 358 | 92.70 | 97.25 | 53.34 |
| 1.00 | 10.01 | 12656 | 875 | 371 | 93.53 | 97.15 | 80.82 |
| 2.00 | 4.58 | 12733 | 594 | 294 | 95.54 | 97.74 | 161.62 |

    

Two clusters

| R | MSE | T.P. | F.P. | F.N. | Selectivity (%) | Sensitivity (%) | Size (MB) |
|---|---|---|---|---|---|---|---|
| 0 | 37.58 | 12086 | 1534 | 941 | 88.73 | 92.77 | 0.039 |
| 0.20 | 16.42 | 12644 | 1184 | 383 | 91.44 | 97.06 | 16.19 |
| 0.33 | 14.39 | 12655 | 1107 | 372 | 91.95 | 97.14 | 26.70 |
| 0.66 | 11.00 | 12669 | 985 | 358 | 92.78 | 97.25 | 53.36 |
| 1.00 | 8.59 | 12687 | 830 | 340 | 93.85 | 97.39 | 80.84 |
| 2.00 | 3.76 | 12751 | 606 | 276 | 95.46 | 97.88 | 161.64 |

    

Three clusters

| R | MSE | T.P. | F.P. | F.N. | Selectivity (%) | Sensitivity (%) | Size (MB) |
|---|---|---|---|---|---|---|---|
| 0 | 27.49 | 12048 | 1219 | 979 | 90.81 | 92.48 | 0.050 |
| 0.20 | 14.89 | 12638 | 1108 | 389 | 91.93 | 97.01 | 16.21 |
| 0.33 | 13.09 | 12645 | 1070 | 382 | 92.19 | 97.06 | 26.98 |
| 0.66 | 9.82 | 12646 | 909 | 381 | 93.29 | 97.07 | 53.91 |
| 1.00 | 7.35 | 12685 | 776 | 342 | 94.23 | 97.37 | 80.85 |
| 2.00 | 3.12 | 12730 | 554 | 297 | 95.83 | 97.72 | 161.65 |

  1. We compare the SNPs that Samtools detects from the original FASTQ file with those it detects from the compressed files, produced by QualComp with one, two and three clusters at different rates R. In all cases, reads were first aligned with BWA. T.P., F.P. and F.N. stand for true positives (SNPs detected with both the original and the reconstructed FASTQ file), false positives (detected only with the reconstructed file) and false negatives (detected only with the original file), respectively. Selectivity is computed as T.P./(T.P. + F.P.) and sensitivity as T.P./(T.P. + F.N.). Note that already at R = 0.2 the sensitivity is above 96% and the selectivity is above 91%.
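The definitions in the note can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The function names and the representation of a SNP as a (chromosome, position, alternate allele) tuple are assumptions made for illustration only.

```python
def snp_counts(original_snps, reconstructed_snps):
    """Count T.P., F.P. and F.N. from two collections of SNP calls.

    Assumption: each SNP is a hashable key such as
    (chromosome, position, alternate_allele); the source does not say
    how calls from the two files were matched.
    """
    original = set(original_snps)
    reconstructed = set(reconstructed_snps)
    tp = len(original & reconstructed)   # called with both files
    fp = len(reconstructed - original)   # called only with the reconstructed file
    fn = len(original - reconstructed)   # called only with the original file
    return tp, fp, fn


def selectivity(tp, fp):
    """Selectivity in percent: T.P. / (T.P. + F.P.)."""
    return 100.0 * tp / (tp + fp)


def sensitivity(tp, fn):
    """Sensitivity in percent: T.P. / (T.P. + F.N.)."""
    return 100.0 * tp / (tp + fn)


# Counts taken from the one-cluster row with R = 0 in the table.
print(round(selectivity(11217, 2033), 2))  # 84.66
print(round(sensitivity(11217, 1810), 2))  # 86.11
```

Applied to the counts in any other row, the two functions should reproduce the tabulated percentages up to rounding.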