
Table 1 The ten real-world FASTQ data sets used for performance evaluation

From: LW-FQZip 2: a parallelized reference-based compression of FASTQ files

 

| Datasets | Platforms | Species | Read length (bp) | Size (MB) | GC content |
|---|---|---|---|---|---|
| Long-read | | | | | |
| SRR2916693 | 454GS | Pseudomonas moraviensis | 67-1201 | 425 | 58.8% |
| SRR2994368 | Illumina Miseq | Escherichia coli | 70-502 | 4688 | 49.7% |
| SRR3211986 | Pacbio RS | Homo sapiens | 2-62746 | 1759 | 39.6% |
| ERR739513 | MinION | Phage | 5-246140 | 871 | 47.9% |
| SRR3190692 | Illumina MiSeq | Escherichia coli | 70-602 | 11379 | 52.3% |
| Short-read | | | | | |
| ERR385912 | Illumina Hiseq 2000 | Escherichia coli | 51 | 641 | 43.5% |
| ERR386131 | Ion Torrent PGM | Capsicum baccatum | 151 | 1371 | 50.5% |
| SRR034509 | Illumina Analyzer II | Escherichia coli | 101 | 5247 | 52.6% |
| ERR174310 | Illumina Hiseq 2000 | Homo sapiens | 202 | 105122 | N.A. |
| ERR194147 | Illumina Hiseq 2000 | Homo sapiens | 101 | 202631 | 40.3% |

Note: The long-read data sets have variable-length reads, while the short-read data sets have fixed-length reads.
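
As a rough illustration of what the "Read length (bp)" and "GC content" columns summarize, the sketch below computes the shortest and longest read and the overall G+C fraction for a FASTQ stream. It is not taken from the paper. The function name `fastq_stats`, the in-memory `sample` records, and the choice to divide by all bases (including any `N`) are assumptions made for this example, and it assumes unwrapped four-line FASTQ records.

```python
import io


def fastq_stats(handle):
    """Return (min_len, max_len, gc_fraction) for four-line FASTQ records.

    Hypothetical helper for illustration; assumes each record is exactly
    four lines: header, sequence, '+' separator, quality string.
    """
    min_len = None
    max_len = 0
    gc_bases = 0
    total_bases = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # skip stray blank lines between records
        seq = handle.readline().strip().upper()
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality line, ignored
        length = len(seq)
        min_len = length if min_len is None else min(min_len, length)
        max_len = max(max_len, length)
        gc_bases += seq.count("G") + seq.count("C")
        total_bases += length
    # Assumption: GC fraction is taken over all bases, including N.
    gc_fraction = gc_bases / total_bases if total_bases else 0.0
    return min_len, max_len, gc_fraction


# Two made-up reads of different lengths, as in a variable-length data set.
sample = "@read1\nACGT\n+\nIIII\n@read2\nGGCCAT\n+\nIIIIII\n"
min_len, max_len, gc_fraction = fastq_stats(io.StringIO(sample))
print(f"{min_len}-{max_len} bp, GC {gc_fraction:.1%}")
```

For a fixed-length data set the minimum and maximum coincide, which corresponds to the single value reported for the short-read entries in the table.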