Table 2 Details of the datasets used for experiments

From: CASPER: context-aware scheme for paired-end reads from high-throughput amplicon sequencing

| Dataset type | ID | Target† | # Total reads | # Refs‡ | Fragment length§ | Read length | Overlap length | Simulator (error model) or sequencer used | Source |
|---|---|---|---|---|---|---|---|---|---|
| Simulated | A4 | V5 | 1,000,000 | 23 | 160-190 | 100 | 10-40 | GemSIM (v4#) | [19, 20] |
| Simulated | A5 | V5 | 1,000,000 | 23 | 160-190 | 100 | 10-40 | GemSIM (v5b) | [19, 20] |
| Simulated | S4 | V5 | 1,000,000 | 1 | 160 | 100 | 40 | GemSIM (v4#) | [19, 20] |
| Simulated | S5 | V5 | 1,000,000 | 1 | 160 | 100 | 40 | GemSIM (v5b) | [19, 20] |
| Real | C1 | V3 | 716,366 | 9 | 169-195 | 125 | 55-81 | Illumina GAIIx | [21] |
| Real | C2 | V3 | 1,350,602 | 9 | 169-195 | 125 | 55-81 | Illumina GAIIx | [21] |
| Real | C3 | V3 | 673,845 | 1 | 198 | 108 | 18 | Illumina GAIIx | [10] |
  1. †hyper-variable regions in 16S rRNA; ‡the number of reference sequences; §excluding the primer (simulated).
  2. #Illumina error model v4 (error rate: forward reads 0.99%, reverse reads 2.40%); b: Illumina error model v5 (error rate: forward reads 0.28%, reverse reads 0.34%).
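
The overlap lengths in the table are consistent with a simple relation between the other two length columns: overlap = 2 × read length − fragment length. The minimal Python sketch below checks that relation against the table's rows. The relation itself is an assumption inferred from the numbers (the table does not state it), and the function name `overlap_range` and the `datasets` dictionary are hypothetical, introduced only for illustration.

```python
def overlap_range(fragment_min, fragment_max, read_length):
    """Return the (min, max) overlap of a paired-end read pair.

    Assumes overlap = 2 * read_length - fragment_length, i.e. both reads
    have the same length and together span the whole fragment.
    The longest fragment gives the shortest overlap, and vice versa.
    """
    return (2 * read_length - fragment_max, 2 * read_length - fragment_min)


# Values copied from Table 2: ID -> (fragment min, fragment max, read length)
datasets = {
    "A4": (160, 190, 100),
    "A5": (160, 190, 100),
    "S4": (160, 160, 100),
    "S5": (160, 160, 100),
    "C1": (169, 195, 125),
    "C2": (169, 195, 125),
    "C3": (198, 198, 108),
}

overlaps = {name: overlap_range(*row) for name, row in datasets.items()}

for name, (low, high) in overlaps.items():
    # Print a single value when the fragment length is fixed, else a range.
    text = str(low) if low == high else f"{low}-{high}"
    print(f"{name}: overlap {text}")
```

Under this assumption the computed ranges match the "Overlap length" column for all seven datasets, for example 10-40 for A4 and 18 for C3.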