Table 11 SNP calling on the M. musculus dataset with and without compression

From: QualComp: a new lossy compressor for quality scores based on rate distortion theory

One cluster
R MSE T.P. F.P. F.N. Selectivity (%) Sensitivity (%) Size (MB)
0 143.0 11217 2033  1810 84.66 86.11 0.019
0.20 19.16 12585 1159  442 91.57 96.61 16.17
0.33 16.46 12602 1120  380 91.84 97.07 26.68
0.66 12.94 12669 998  358 92.70 97.25 53.34
1.00 10.01 12656 875  371 93.53 97.15 80.82
2.00 4.58 12733 594  294 95.54 97.74 161.62
Two clusters
R MSE T.P. F.P. F.N. Selectivity (%) Sensitivity (%) Size (MB)
0 37.58 12086 1534  941 88.73 92.77 0.039
0.20 16.42 12644 1184  383 91.44 97.06 16.19
0.33 14.39 12655 1107  372 91.95 97.14 26.70
0.66 11.00 12669 985  358 92.78 97.25 53.36
1.00 8.59 12687 830  340 93.85 97.39 80.84
2.00 3.76 12751 606  276 95.46 97.88 161.64
Three clusters
R MSE T.P. F.P. F.N. Selectivity (%) Sensitivity (%) Size (MB)
0 27.49 12048 1219  979 90.81 92.48 0.050
0.20 14.89 12638 1108  389 91.93 97.01 16.21
0.33 13.09 12645 1070  382 92.19 97.06 26.98
0.66 9.82 12646 909  381 93.29 97.07 53.91
1.00 7.35 12685 776  342 94.23 97.37 80.85
2.00 3.12 12730 554  297 95.83 97.72 161.65
  1. We compare the SNPs detected by Samtools from the original FASTQ file with those detected from the compressed files, using QualComp with one, two and three clusters at different rates R. In all cases, reads were first aligned using the BWA algorithm. T.P., F.P. and F.N. stand for true positive (detected with both the original FASTQ file and the reconstructed one), false positive (detected only with the reconstructed FASTQ file) and false negative (detected only with the original FASTQ file), respectively. Selectivity is computed as T.P./(T.P. + F.P.), and sensitivity as T.P./(T.P. + F.N.). Note that already at R = 0.2 the sensitivity is above 96% and the selectivity is above 91% for all three cluster settings.
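
The definitions in the note above can be illustrated with a minimal Python sketch. It assumes each SNP call is represented as a (chromosome, position) pair collected into a set; that representation, the function names, and the toy data are hypothetical and are not taken from the paper or from Samtools. The last lines recompute one row of the table (one cluster, R = 0.20) from its T.P., F.P. and F.N. counts.

```python
def snp_confusion(original, reconstructed):
    """Count T.P., F.P. and F.N. from two sets of SNP calls.

    `original` holds the calls made on the uncompressed FASTQ file,
    `reconstructed` the calls made after lossy compression.
    """
    tp = len(original & reconstructed)   # detected with both files
    fp = len(reconstructed - original)   # detected only after compression
    fn = len(original - reconstructed)   # detected only with the original
    return tp, fp, fn


def selectivity(tp, fp):
    """Selectivity in percent: T.P. / (T.P. + F.P.)."""
    return 100.0 * tp / (tp + fp)


def sensitivity(tp, fn):
    """Sensitivity in percent: T.P. / (T.P. + F.N.)."""
    return 100.0 * tp / (tp + fn)


# Toy example with made-up calls (assumption: a call is a (chrom, pos) pair).
original = {("chr1", 100), ("chr1", 250), ("chr2", 40)}
reconstructed = {("chr1", 100), ("chr2", 40), ("chr2", 99)}
toy_counts = snp_confusion(original, reconstructed)

# Counts taken from the table: one cluster, R = 0.20.
row_selectivity = round(selectivity(12585, 1159), 2)
row_sensitivity = round(sensitivity(12585, 442), 2)
print(toy_counts, row_selectivity, row_sensitivity)
```

Recomputing the two percentages from the counts in this way is a quick consistency check on any row of the table.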