Table 3 Statistics of Three High-Throughput Data Sets

From: Data structures and compression algorithms for high-throughput sequencing technologies

| | Dataset 1 | Dataset 2 | Dataset 3 |
|---|---|---|---|
| Reads (× 10^6) | 6.4 | 1.7 | 31 |
| Read length | 19 | 25 | 23–44 |
| Coverage | Very sparse | Sparse | Full |
| File sizes | | | |
| Raw Sequence | 1,030,333,440 | 353,181,920 | 8,869,613,392 |
| Uniform | 912,352,288 | 252,540,968 | 4,946,059,912 |
| Uniform: Location | 743,517,128 | 226,557,032 | 4,232,120,216 |
| Uniform: Mismatches | 168,835,160 | 25,983,936 | 713,939,696 |
| Bowtie | 3,145,664,248 | 902,954,872 | 19,475,952,512 |
| Bowtie Extra Fields: gzip | 50,382,904 | 106,576,328 | 839,247,848 |
| Bowtie Extra Fields: 7zip | 36,306,064 | 93,238,688 | 778,347,264 |
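The file-size rows can be compared directly. In every column, Location plus Mismatches equals Uniform exactly. The short Python sketch below transcribes the table, checks that decomposition, and computes a few unitless ratios: Uniform and Bowtie relative to the raw sequence, the share of Uniform taken by mismatches, and 7zip relative to gzip on the Bowtie extra fields. The table does not state the unit of the file sizes, so the sketch only reports ratios. The constant names and the `summarize` function are hypothetical illustrations, not part of the original work.

```python
# Values transcribed from Table 3. The unit of the file sizes is not stated
# in the table, so only unitless ratios are computed below.
DATASETS = ["Dataset 1", "Dataset 2", "Dataset 3"]
RAW = [1_030_333_440, 353_181_920, 8_869_613_392]
UNIFORM = [912_352_288, 252_540_968, 4_946_059_912]
LOCATION = [743_517_128, 226_557_032, 4_232_120_216]
MISMATCHES = [168_835_160, 25_983_936, 713_939_696]
BOWTIE = [3_145_664_248, 902_954_872, 19_475_952_512]
EXTRA_GZIP = [50_382_904, 106_576_328, 839_247_848]
EXTRA_7ZIP = [36_306_064, 93_238_688, 778_347_264]


def summarize():
    """Return per-dataset size ratios (hypothetical helper for illustration)."""
    rows = []
    for i, name in enumerate(DATASETS):
        # The Uniform row is the sum of its Location and Mismatches parts.
        if LOCATION[i] + MISMATCHES[i] != UNIFORM[i]:
            raise ValueError(f"{name}: Location + Mismatches != Uniform")
        rows.append({
            "dataset": name,
            "uniform_vs_raw": UNIFORM[i] / RAW[i],
            "bowtie_vs_raw": BOWTIE[i] / RAW[i],
            "mismatch_share": MISMATCHES[i] / UNIFORM[i],
            "7zip_vs_gzip": EXTRA_7ZIP[i] / EXTRA_GZIP[i],
        })
    return rows


if __name__ == "__main__":
    for row in summarize():
        print(
            f"{row['dataset']}: Uniform/Raw = {row['uniform_vs_raw']:.3f}, "
            f"Bowtie/Raw = {row['bowtie_vs_raw']:.3f}, "
            f"mismatch share = {row['mismatch_share']:.3f}, "
            f"7zip/gzip = {row['7zip_vs_gzip']:.3f}"
        )
```

Working from ratios keeps the comparison valid whatever the underlying unit is. By these ratios, the Uniform encoding is smaller than the raw sequence for all three data sets, and the Bowtie output is more than twice the raw size.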