BMC Bioinformatics

Table 3 Statistics of Three High-Throughput Data Sets

From: Data structures and compression algorithms for high-throughput sequencing technologies

	Dataset 1	Dataset 2	Dataset 3
Reads (× 10⁶)	6.4	1.7	31
Read length	19	25	23-44
Coverage	Very sparse	Sparse	Full
File sizes
Raw Sequence	1,030,333,440	353,181,920	8,869,613,392
Uniform	912,352,288	252,540,968	4,946,059,912
Location	743,517,128	226,557,032	4,232,120,216
Mismatches	168,835,160	25,983,936	713,939,696
Bowtie	3,145,664,248	902,954,872	19,475,952,512
Bowtie Extra Fields
gzip	50,382,904	106,576,328	839,247,848
7zip	36,306,064	93,238,688	778,347,264
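
The file-size rows can be cross-checked with simple arithmetic. The sketch below is not part of the original article: it copies the values from the table as printed and confirms that, for each data set, the Location and Mismatches sizes add up to the Uniform size. It also expresses any row as a fraction of the Raw Sequence size. The names `SIZES`, `ratio_to_raw`, and `components_sum_to_uniform` are illustrative. The gzip and 7zip rows are left out because the table alone does not make clear which file they compress.

```python
# File sizes copied from Table 3, one value per data set (1, 2, 3).
# Units are whatever the table reports; no conversion is applied here.
SIZES = {
    "Raw Sequence": (1_030_333_440, 353_181_920, 8_869_613_392),
    "Uniform": (912_352_288, 252_540_968, 4_946_059_912),
    "Location": (743_517_128, 226_557_032, 4_232_120_216),
    "Mismatches": (168_835_160, 25_983_936, 713_939_696),
    "Bowtie": (3_145_664_248, 902_954_872, 19_475_952_512),
}


def ratio_to_raw(row):
    """Return the size of `row` divided by the Raw Sequence size, per data set."""
    return tuple(size / raw for size, raw in zip(SIZES[row], SIZES["Raw Sequence"]))


def components_sum_to_uniform():
    """Check that Location + Mismatches equals Uniform for every data set."""
    return all(
        loc + mis == uni
        for loc, mis, uni in zip(
            SIZES["Location"], SIZES["Mismatches"], SIZES["Uniform"]
        )
    )


if __name__ == "__main__":
    print("Location + Mismatches == Uniform:", components_sum_to_uniform())
    for row in ("Uniform", "Bowtie"):
        ratios = ", ".join(f"{r:.3f}" for r in ratio_to_raw(row))
        print(f"{row} / Raw Sequence: {ratios}")
```

Read this way, the Uniform encoding is smaller than the raw sequence file for all three data sets, while the Bowtie output is roughly two to three times larger.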

ISSN: 1471-2105