
Table 10 Alignment accuracy on the PhiX dataset with and without compression

From: QualComp: a new lossy compressor for quality scores based on rate distortion theory

| bits/quality score | 0 mismatches | 1 mismatch | 2 mismatches | 3 mismatches | ≥ 4 mismatches | Unmapped | Size (MB) |
|---|---|---|---|---|---|---|---|
| Original | | | | | | | |
| 2.95 | 11315113 | 1179411 | 237852 | 89385 | 141300 | 347707 (2.61%) | 468.096 |
| 1 Cluster | | | | | | | |
| 0 | 11315113 | 1178443 | 237493 | 67493 | 298 | 511928 (3.84%) | 0.097 |
| 0.20 | 11315113 | 1179059 | 237691 | 86662 | 90262 | 401981 (3.01%) | 32.097 |
| 0.50 | 11315113 | 1179153 | 237726 | 88051 | 100677 | 390048 (2.93%) | 80.097 |
| 1.00 | 11315113 | 1179233 | 237766 | 88771 | 109950 | 379935 (2.85%) | 159.097 |
| 2.00 | 11315113 | 1179304 | 237801 | 89177 | 120269 | 369104 (2.77%) | 318.097 |
| 2.50 | 11315113 | 1179318 | 237813 | 89250 | 123610 | 365664 (2.74%) | 397.097 |
| 3 Clusters | | | | | | | |
| 0 | 11315113 | 1179104 | 237763 | 79908 | 100618 | 398262 (2.99%) | 0.285 |
| 0.20 | 11315113 | 1179221 | 237793 | 86486 | 120835 | 371320 (2.78%) | 32.411 |
| 0.50 | 11315113 | 1179268 | 237799 | 87857 | 124371 | 366360 (2.75%) | 81.185 |
| 1.00 | 11315113 | 1179298 | 237816 | 88621 | 128182 | 361738 (2.71%) | 159.985 |
| 2.00 | 11315113 | 1179346 | 237827 | 89108 | 132675 | 356699 (2.67%) | 318.585 |
| 2.50 | 11315113 | 1179362 | 237835 | 89204 | 134221 | 355033 (2.66%) | 398.385 |
| 5 Clusters | | | | | | | |
| 0 | 11315113 | 1179057 | 237742 | 83060 | 110348 | 385448 (2.89%) | 0.476 |
| 0.20 | 11315113 | 1179239 | 237796 | 86437 | 121236 | 370947 (2.78%) | 32.551 |
| 0.50 | 11315113 | 1179283 | 237799 | 87858 | 124886 | 365829 (2.74%) | 80.606 |
| 1.00 | 11315113 | 1179321 | 237813 | 88664 | 128682 | 361175 (2.71%) | 160.376 |
| 2.00 | 11315113 | 1179363 | 237828 | 89146 | 133300 | 356018 (2.67%) | 319.270 |
| 2.50 | 11315113 | 1179364 | 237833 | 89230 | 134703 | 354525 (2.66%) | 400.176 |
Alignment results of Bowtie on the original PhiX FASTQ file and on the files reconstructed by QualComp with different parameters. The first column specifies the rate in bits per quality score. The following columns give the number of reads that map to the reference genome with 0, 1, 2, 3, and 4 or more mismatches, and the number of reads that did not map. The last column shows the total size after compression. To compute the size of the quality scores in the original FASTQ file, we apply SCALCE [35] with lossless compression. Note that the number of reads that map with zero mismatches remains constant across all choices of rate and number of clusters, and equals that of the original file.
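As a consistency check on the table, the following minimal sketch sums the mapped and unmapped read counts of three rows and recomputes the unmapped percentage. The counts are copied from the table; the names `ROWS`, `total_reads`, and `unmapped_percent` are hypothetical helpers introduced here and are not part of QualComp or Bowtie. The sketch assumes the printed percentages are truncated, not rounded, to two decimals, which is what these three rows suggest.

```python
import math

# Read counts copied from the table, in column order:
# 0, 1, 2, 3, >= 4 mismatches, unmapped.
ROWS = {
    "Original, 2.95 bits": (11315113, 1179411, 237852, 89385, 141300, 347707),
    "1 Cluster, 0 bits": (11315113, 1178443, 237493, 67493, 298, 511928),
    "1 Cluster, 0.20 bits": (11315113, 1179059, 237691, 86662, 90262, 401981),
}


def total_reads(counts):
    """Total number of reads in a row (mapped plus unmapped)."""
    return sum(counts)


def unmapped_percent(counts):
    """Unmapped reads as a percentage of all reads, truncated to two decimals.

    Truncation (not rounding) is an assumption inferred from the table.
    """
    pct = 100.0 * counts[-1] / total_reads(counts)
    return math.floor(pct * 100) / 100


for label, counts in ROWS.items():
    print(label, total_reads(counts), unmapped_percent(counts))
```

Each of the three rows sums to the same number of reads, as expected when the same read set is aligned under every setting; only the distribution across mismatch bins and the unmapped count change.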