Table 5 Comparison of Compression Results

From: Data structures and compression algorithms for high-throughput sequencing technologies

 

                                     Dataset 1       Dataset 2       Dataset 3

Original Data Sizes
   Raw Sequence                  1,030,333,440     353,181,920   8,869,613,392
   Uniform                         912,352,288     252,540,968   4,946,059,912
   Bowtie                        3,145,664,248     902,954,872  19,475,952,512
   Bowtie Extra Fields (7zip)       36,306,064      93,238,688     778,347,264

Best Compression                    56,078,940      35,983,322     390,541,330
   Raw Sequence                             18              10              23
   Uniform                                  16               7              13
   Bowtie                                   56              25              49
   Bowtie+                                  34               7              17

GenCompress                         56,166,419      36,099,480     390,541,330
   Raw Sequence                             18               9              23
   Uniform                                  16               7              13
   Bowtie                                   56              25              49
   Bowtie+                                  34               7              17

gzip
   Raw Sequence                     41,378,624      95,688,992     618,818,824
      (ratio)                               24               3              14
   Uniform                          42,918,256      54,762,528     603,836,784
      (ratio)                               21               4               8
   Bowtie                          459,640,264     236,156,432   1,640,587,416
      (ratio)                                7               4              12

bzip2
   Raw Sequence                     42,233,336      94,030,320     955,061,616
      (ratio)                               24               3               9
   Uniform                          36,400,576      54,656,000     649,419,632
      (ratio)                               25               4               7
   Bowtie                          250,373,616     171,835,792   1,609,317,768
      (ratio)                               13               5              12

7zip
   Raw Sequence                     30,651,664      83,319,584     411,811,520
      (ratio)                               33               4              21
   Uniform                          27,852,952      34,482,312     283,490,928
      (ratio)                               33               7              17
   Bowtie                          247,481,992     183,522,960   1,254,167,144
      (ratio)                               13               5              16
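
The small integer rows in the table can be checked against the size rows. The sketch below assumes that each of them is a compression ratio, meaning the original size of a format divided by the compressed size, and that the Bowtie+ row adds the 7zip-compressed extra fields to the compressed size before dividing. Neither definition is stated in the table itself; both are inferred from the numbers, and the variable and function names are illustrative. Under these assumptions the computed values agree with the published ones to within one unit, which suggests that some entries were rounded and others truncated.

```python
# Sizes copied from Table 5, in the order (Dataset 1, Dataset 2, Dataset 3).
# Units are whatever the table uses; only the ratios matter here.
ORIGINAL = {
    "Raw Sequence": (1_030_333_440, 353_181_920, 8_869_613_392),
    "Uniform": (912_352_288, 252_540_968, 4_946_059_912),
    "Bowtie": (3_145_664_248, 902_954_872, 19_475_952_512),
}
BEST = (56_078_940, 35_983_322, 390_541_330)
EXTRA_FIELDS_7ZIP = (36_306_064, 93_238_688, 778_347_264)


def ratios(original, compressed):
    """Assumed definition of a ratio row: original size / compressed size."""
    return [o / c for o, c in zip(original, compressed)]


# Ratios of each original format against the "Best Compression" sizes.
for name, sizes in ORIGINAL.items():
    print(f"{name:<14}", [f"{r:.2f}" for r in ratios(sizes, BEST)])

# Assumption: Bowtie+ charges the compressed extra fields to the output size.
best_plus_extra = [b + e for b, e in zip(BEST, EXTRA_FIELDS_7ZIP)]
bowtie_plus = ratios(ORIGINAL["Bowtie"], best_plus_extra)
print(f"{'Bowtie+':<14}", [f"{r:.2f}" for r in bowtie_plus])
```

The same `ratios` helper can be applied to the gzip, bzip2, and 7zip size rows by passing the matching original format as the first argument.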