Table 2 Details of the datasets used for experiments

From: CASPER: context-aware scheme for paired-end reads from high-throughput amplicon sequencing

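| Dataset type | ID | Target† | # Total reads | # Refs‡ | Fragment length§ (bp) | Read length (bp) | Overlap length (bp) | Simulator (error model) or sequencer used | Source |
|---|---|---|---|---|---|---|---|---|---|
| Simulated | A4 | V5 | 1,000,000 | 23 | 160-190 | 100 | 10-40 | GemSIM (v4#) | [19, 20] |
| Simulated | A5 | V5 | 1,000,000 | 23 | 160-190 | 100 | 10-40 | GemSIM (v5b) | [19, 20] |
| Simulated | S4 | V5 | 1,000,000 | 1 | 160 | 100 | 40 | GemSIM (v4#) | [19, 20] |
| Simulated | S5 | V5 | 1,000,000 | 1 | 160 | 100 | 40 | GemSIM (v5b) | [19, 20] |
| Real | C1 | V3 | 716,366 | 9 | 169-195 | 125 | 55-81 | Illumina GAIIx | [21] |
| Real | C2 | V3 | 1,350,602 | 9 | 169-195 | 125 | 55-81 | Illumina GAIIx | [21] |
| Real | C3 | V3 | 673,845 | 1 | 198 | 108 | 18 | Illumina GAIIx | [10] |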
  1. †: hyper-variable regions in 16S rRNA; ‡: number of reference sequences; §: excluding the primer (simulated).
  2. #: Illumina error model v4 (forward rate 0.99%, reverse rate 2.40%); b: Illumina error model v5 (forward rate 0.28%, reverse rate 0.34%).
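
The overlap lengths listed in Table 2 appear consistent with a simple relation: when both reads of a pair have the same length and start at opposite ends of the fragment, overlap = 2 × read length − fragment length. The following is a minimal Python sketch of that arithmetic, checked against a few rows of the table. The function name `overlap_range` and the `datasets` dictionary are illustrative assumptions and are not part of CASPER or GemSIM.

```python
def overlap_range(read_len, frag_min, frag_max):
    """Return the (min, max) overlap between the two reads of a pair.

    Assumption: both reads have length read_len and start at opposite
    ends of a fragment whose length lies in [frag_min, frag_max].
    """
    total = 2 * read_len
    # The longest fragment gives the shortest overlap, and vice versa.
    # Clamp at 0 for fragments longer than the two reads combined.
    return max(total - frag_max, 0), max(total - frag_min, 0)


# Values copied from Table 2: ID -> (read length, min fragment, max fragment)
datasets = {
    "A4": (100, 160, 190),
    "S4": (100, 160, 160),
    "C1": (125, 169, 195),
    "C3": (108, 198, 198),
}

for name, (read_len, frag_min, frag_max) in datasets.items():
    lo, hi = overlap_range(read_len, frag_min, frag_max)
    # Print a single number when the fragment length is fixed.
    print(name, lo if lo == hi else f"{lo}-{hi}")
```

Under these assumptions the computed ranges match the "Overlap length" column for the rows shown, for example 10-40 for A4 and 55-81 for C1.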