
Table 10 Alignment accuracy on the PhiX dataset with and without compression

From: QualComp: a new lossy compressor for quality scores based on rate distortion theory

| Method | Rate (bits/quality score) | 0 mismatches | 1 mismatch | 2 mismatches | 3 mismatches | ≥ 4 mismatches | Unmapped | Size (MB) |
|---|---|---|---|---|---|---|---|---|
| Original | 2.95 | 11315113 | 1179411 | 237852 | 89385 | 141300 | 347707 (2.61%) | 468.096 |
| 1 Cluster | 0 | 11315113 | 1178443 | 237493 | 67493 | 298 | 511928 (3.84%) | 0.097 |
| 1 Cluster | 0.20 | 11315113 | 1179059 | 237691 | 86662 | 90262 | 401981 (3.01%) | 32.097 |
| 1 Cluster | 0.50 | 11315113 | 1179153 | 237726 | 88051 | 100677 | 390048 (2.93%) | 80.097 |
| 1 Cluster | 1.00 | 11315113 | 1179233 | 237766 | 88771 | 109950 | 379935 (2.85%) | 159.097 |
| 1 Cluster | 2.00 | 11315113 | 1179304 | 237801 | 89177 | 120269 | 369104 (2.77%) | 318.097 |
| 1 Cluster | 2.50 | 11315113 | 1179318 | 237813 | 89250 | 123610 | 365664 (2.74%) | 397.097 |
| 3 Clusters | 0 | 11315113 | 1179104 | 237763 | 79908 | 100618 | 398262 (2.99%) | 0.285 |
| 3 Clusters | 0.20 | 11315113 | 1179221 | 237793 | 86486 | 120835 | 371320 (2.78%) | 32.411 |
| 3 Clusters | 0.50 | 11315113 | 1179268 | 237799 | 87857 | 124371 | 366360 (2.75%) | 81.185 |
| 3 Clusters | 1.00 | 11315113 | 1179298 | 237816 | 88621 | 128182 | 361738 (2.71%) | 159.985 |
| 3 Clusters | 2.00 | 11315113 | 1179346 | 237827 | 89108 | 132675 | 356699 (2.67%) | 318.585 |
| 3 Clusters | 2.50 | 11315113 | 1179362 | 237835 | 89204 | 134221 | 355033 (2.66%) | 398.385 |
| 5 Clusters | 0 | 11315113 | 1179057 | 237742 | 83060 | 110348 | 385448 (2.89%) | 0.476 |
| 5 Clusters | 0.20 | 11315113 | 1179239 | 237796 | 86437 | 121236 | 370947 (2.78%) | 32.551 |
| 5 Clusters | 0.50 | 11315113 | 1179283 | 237799 | 87858 | 124886 | 365829 (2.74%) | 80.606 |
| 5 Clusters | 1.00 | 11315113 | 1179321 | 237813 | 88664 | 128682 | 361175 (2.71%) | 160.376 |
| 5 Clusters | 2.00 | 11315113 | 1179363 | 237828 | 89146 | 133300 | 356018 (2.67%) | 319.270 |
| 5 Clusters | 2.50 | 11315113 | 1179364 | 237833 | 89230 | 134703 | 354525 (2.66%) | 400.176 |

Alignment results of Bowtie with the original PhiX FASTQ file and with the files reconstructed by QualComp for different rates and numbers of clusters. The rate column gives the number of bits per quality score; the mismatch columns give the number of reads mapped to the reference genome with 0, 1, 2, 3, and 4 or more mismatches; the Unmapped column gives the number (and percentage) of reads that did not map; and the last column gives the total size after compression. To compute the size of the quality scores in the original FASTQ file, we apply SCALCE [35] with lossless compression. Note that the number of reads that map with zero mismatches remains constant for all choices of rate and number of clusters, and equals that of the original file.
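As a consistency check on the table, the following sketch recomputes the Unmapped percentages from the read counts. It assumes that each percentage is the unmapped count divided by the total number of reads in that row (the sum of all mismatch columns plus Unmapped). Every row sums to the same 13,310,768 reads, and the published percentages appear to be truncated rather than rounded to two decimals: for example, 511928 / 13310768 is about 3.846%, which is listed as 3.84%. The names `ROWS` and `unmapped_percent` are hypothetical and used for illustration only.

```python
# Read counts copied from Table 10: (label, [0, 1, 2, 3, >=4 mismatches], unmapped, published %).
ROWS = [
    ("Original", [11315113, 1179411, 237852, 89385, 141300], 347707, "2.61"),
    ("1 cl, 0", [11315113, 1178443, 237493, 67493, 298], 511928, "3.84"),
    ("1 cl, 0.20", [11315113, 1179059, 237691, 86662, 90262], 401981, "3.01"),
    ("1 cl, 0.50", [11315113, 1179153, 237726, 88051, 100677], 390048, "2.93"),
    ("1 cl, 1.00", [11315113, 1179233, 237766, 88771, 109950], 379935, "2.85"),
    ("1 cl, 2.00", [11315113, 1179304, 237801, 89177, 120269], 369104, "2.77"),
    ("1 cl, 2.50", [11315113, 1179318, 237813, 89250, 123610], 365664, "2.74"),
    ("3 cl, 0", [11315113, 1179104, 237763, 79908, 100618], 398262, "2.99"),
    ("3 cl, 0.20", [11315113, 1179221, 237793, 86486, 120835], 371320, "2.78"),
    ("3 cl, 0.50", [11315113, 1179268, 237799, 87857, 124371], 366360, "2.75"),
    ("3 cl, 1.00", [11315113, 1179298, 237816, 88621, 128182], 361738, "2.71"),
    ("3 cl, 2.00", [11315113, 1179346, 237827, 89108, 132675], 356699, "2.67"),
    ("3 cl, 2.50", [11315113, 1179362, 237835, 89204, 134221], 355033, "2.66"),
    ("5 cl, 0", [11315113, 1179057, 237742, 83060, 110348], 385448, "2.89"),
    ("5 cl, 0.20", [11315113, 1179239, 237796, 86437, 121236], 370947, "2.78"),
    ("5 cl, 0.50", [11315113, 1179283, 237799, 87858, 124886], 365829, "2.74"),
    ("5 cl, 1.00", [11315113, 1179321, 237813, 88664, 128682], 361175, "2.71"),
    ("5 cl, 2.00", [11315113, 1179363, 237828, 89146, 133300], 356018, "2.67"),
    ("5 cl, 2.50", [11315113, 1179364, 237833, 89230, 134703], 354525, "2.66"),
]


def unmapped_percent(mapped, unmapped):
    """Unmapped share of all reads as a two-decimal string.

    Assumption: the value is truncated (floored), not rounded, which is
    what the published column appears to do. Integer arithmetic avoids
    floating-point edge cases.
    """
    total = sum(mapped) + unmapped
    hundredths = unmapped * 10000 // total  # floor(100 * percentage)
    return f"{hundredths // 100}.{hundredths % 100:02d}"


if __name__ == "__main__":
    totals = {sum(mapped) + unmapped for _, mapped, unmapped, _ in ROWS}
    print("distinct per-row read totals:", totals)
    for label, mapped, unmapped, published in ROWS:
        print(f"{label:>12}: {unmapped_percent(mapped, unmapped)}% (published {published}%)")
```

Rounding to the nearest hundredth instead would give, for instance, 3.85% and 3.02% for the first two 1-cluster rows, so truncation is the only convention of the two that reproduces every entry.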