Table 2 Comparison of compression ratios of six software suites

From: MetaCRAM: an integrated pipeline for metagenomic taxonomy identification and compression

| Data | Original (MB) | MCH1 (MB) | MCH2 (MB) | MCG (MB) | MCEG (MB) | Align. % | Qual value (MB) | bzip2 (MB) | gzip (MB) | MFComp (MB) |
|---|---|---|---|---|---|---|---|---|---|---|
| ERR321482 | 1429 | 191 | 186 | 312 | 213 | 29.6 | 411 | 362 | 408 | 229 |
| SRR359032 | 3981 | 319 | 282 | 657 | 458 | 61.8 | 2183 | 998 | 1133 | 263 |
| ERR532393 | 8230 | 948 | 898 | 1503 | 1145 | 46.8 | 3410 | 2083 | 2366 | 1126 |
| SRR1450398 | 5399 | 703 | 697 | 854 | 729 | 7.7 | 365 | 1345 | 1532 | 726 |
| SRR062462 | 6478 | 137 | 135 | 188 | 144 | 2.7 | 153 | 222 | 356 | 161 |

  1. For shorthand notation, we used “MCH” = MetaCRAM-Huffman, “MCG” = MetaCRAM-Golomb, “MCEG” = MetaCRAM-extended Golomb, and “MFComp” = MFCompress. MCH1 is the default option of MetaCRAM with Huffman encoding, and MCH2 is a version of MetaCRAM in which we removed the redundancy in both quality scores and the read IDs. “Align. %” refers to the total alignment rate from the first and second iterations. The minimum compressed file size achievable by the methods is written in bold.
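
The table reports file sizes rather than ratios, so a reader who wants ratios has to derive them. The sketch below shows one way to do that in Python. It assumes a compression ratio is the original size divided by the compressed size, a definition this page does not state, and it leaves out the "Align. %" and "Qual value" columns because they are not method outputs. The names `ORIGINAL_MB`, `COMPRESSED_MB`, `compression_ratio`, `ratios`, and `smallest_output` are illustrative and not part of MetaCRAM.

```python
# Sizes in MB, copied from Table 2.
ORIGINAL_MB = {
    "ERR321482": 1429,
    "SRR359032": 3981,
    "ERR532393": 8230,
    "SRR1450398": 5399,
    "SRR062462": 6478,
}

# Compressed sizes per method, in MB, copied from Table 2.
COMPRESSED_MB = {
    "ERR321482": {"MCH1": 191, "MCH2": 186, "MCG": 312, "MCEG": 213,
                  "bzip2": 362, "gzip": 408, "MFComp": 229},
    "SRR359032": {"MCH1": 319, "MCH2": 282, "MCG": 657, "MCEG": 458,
                  "bzip2": 998, "gzip": 1133, "MFComp": 263},
    "ERR532393": {"MCH1": 948, "MCH2": 898, "MCG": 1503, "MCEG": 1145,
                  "bzip2": 2083, "gzip": 2366, "MFComp": 1126},
    "SRR1450398": {"MCH1": 703, "MCH2": 697, "MCG": 854, "MCEG": 729,
                   "bzip2": 1345, "gzip": 1532, "MFComp": 726},
    "SRR062462": {"MCH1": 137, "MCH2": 135, "MCG": 188, "MCEG": 144,
                  "bzip2": 222, "gzip": 356, "MFComp": 161},
}


def compression_ratio(original_mb, compressed_mb):
    """Assumed definition: original size divided by compressed size."""
    return original_mb / compressed_mb


def ratios(dataset):
    """Return {method: ratio} for one dataset in the table."""
    original = ORIGINAL_MB[dataset]
    return {method: compression_ratio(original, size)
            for method, size in COMPRESSED_MB[dataset].items()}


def smallest_output(dataset):
    """Return the method with the smallest compressed size for a dataset."""
    sizes = COMPRESSED_MB[dataset]
    return min(sizes, key=sizes.get)


if __name__ == "__main__":
    for dataset in ORIGINAL_MB:
        best = smallest_output(dataset)
        print(f"{dataset}: smallest output from {best}, "
              f"ratio {ratios(dataset)[best]:.2f}")
```

Under this definition a larger ratio means stronger compression, so the method with the smallest output in a row is also the one with the largest ratio for that dataset.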