Table 7 Sequence encoding using Huffman Trees.

From: Data structures and compression algorithms for high-throughput sequencing technologies

Dataset  k  Sequence bits  Tree bits  Total bits
1        1     31,674,558         40  31,674,598
         2     28,340,409        324  28,340,733
         3     27,708,166      1,951  27,710,117
         4     22,565,417     10,471  22,575,888
         5     19,126,288     53,178  19,179,466
         6     21,056,658    256,303  21,312,961
2        1     94,680,841         52  94,680,893
         2     81,954,644        549  81,955,193
         3     81,038,827      4,303  81,043,130
         4     80,554,549     27,458  80,582,007
         5     83,570,470    148,206  83,718,676
         6     79,977,714    622,784  80,600,498

  1. The data are preprocessed by counting the frequencies of k-mers, and these counts are used to build a Huffman tree. The tree is then used to encode the data. The table gives the number of bits needed to store the encoded sequence, the number needed to store the tree, and their total.
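
The procedure in the footnote can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices here are assumptions: the sequence is split into non-overlapping k-mers (the source does not say whether k-mers overlap), a trailing fragment shorter than k is treated as its own symbol, and the hypothetical helper `tree_bits_assumed` uses an invented serialization (one flag bit per tree node plus 8 bits per character of each leaf symbol). That serialization is not the scheme used for the "Tree bits" column, so it will not reproduce the table's figures. It only shows why the tree cost grows with k while the sequence cost tends to shrink.

```python
import heapq
from collections import Counter
from itertools import count


def kmer_counts(seq, k):
    """Count non-overlapping k-mers (assumption: the source does not specify overlap)."""
    return Counter(seq[i:i + k] for i in range(0, len(seq), k))


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman tree built from freqs."""
    if len(freqs) == 1:
        # A single symbol still needs one bit per occurrence.
        return {sym: 1 for sym in freqs}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        merged = {s: depth + 1 for s, depth in d1.items()}
        merged.update({s: depth + 1 for s, depth in d2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def sequence_bits(seq, k):
    """Bits needed to store seq when each k-mer is replaced by its Huffman code."""
    freqs = kmer_counts(seq, k)
    lengths = huffman_code_lengths(freqs)
    return sum(freqs[s] * lengths[s] for s in freqs)


def tree_bits_assumed(freqs):
    """Hypothetical tree cost: 1 flag bit per node + 8 bits per leaf character.

    This serialization is an assumption for illustration only; it is not
    the scheme behind the "Tree bits" column in the table.
    """
    n_leaves = len(freqs)
    n_nodes = 2 * n_leaves - 1  # a full binary tree with n leaves
    return n_nodes + sum(8 * len(sym) for sym in freqs)


if __name__ == "__main__":
    toy = "AAAACCGT"  # toy input, not one of the paper's datasets
    for k in (1, 2):
        seq_b = sequence_bits(toy, k)
        tree_b = tree_bits_assumed(kmer_counts(toy, k))
        print(k, seq_b, tree_b, seq_b + tree_b)
```

On real data the same trade-off seen in the table applies: a larger k gives the Huffman code more context to exploit, but the number of distinct k-mers, and hence the size of the stored tree, grows quickly.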