CASPER: context-aware scheme for paired-end reads from high-throughput amplicon sequencing

BMC Bioinformatics

Table 2 Details of the datasets used for experiments

Dataset type	ID	Target^†	# Total reads	# Refs^‡	Fragment length^§ (bp)	Read length (bp)	Overlap length (bp)	Simulator (error model) or sequencer used	Source
Simulated	A4	V5	1,000,000	23	160-190	100	10-40	GemSIM (v4^#)	[19, 20]
	A5	V5	1,000,000	23	160-190	100	10-40	GemSIM (v5^b)	[19, 20]
	S4	V5	1,000,000	1	160	100	40	GemSIM (v4^#)	[19, 20]
	S5	V5	1,000,000	1	160	100	40	GemSIM (v5^b)	[19, 20]
Real	C1	V3	716,366	9	169-195	125	55-81	Illumina GAIIx	[21]
	C2	V3	1,350,602	9	169-195	125	55-81	Illumina GAIIx	[21]
	C3	V3	673,845	1	198	108	18	Illumina GAIIx	[10]

†Hyper-variable regions in 16S rRNA; ‡number of reference sequences; §excluding the primer (simulated).
# Illumina error model v4 (forward error rate 0.99%, reverse 2.40%); b Illumina error model v5 (forward error rate 0.28%, reverse 0.34%).
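
The overlap lengths in Table 2 are consistent with the read and fragment lengths: when two reads of equal length r are sequenced from opposite ends of a fragment of length f, the expected overlap is 2r - f. The following is a minimal sketch of that arithmetic, not part of CASPER itself; the helper names `overlap_length` and `overlap_range` are hypothetical, and it assumes both reads have the same length and together cover the whole fragment.

```python
def overlap_length(read_length: int, fragment_length: int) -> int:
    """Expected overlap (in bases) between two equal-length paired-end reads.

    Hypothetical helper for illustration only. Assumes the two reads start
    at opposite ends of the fragment and have the same length.
    """
    overlap = 2 * read_length - fragment_length
    if overlap < 0:
        raise ValueError("reads are too short to overlap on this fragment")
    return overlap


def overlap_range(read_length: int, fragment_min: int, fragment_max: int):
    """Return (min_overlap, max_overlap) for a range of fragment lengths."""
    # The longest fragment gives the shortest overlap, and vice versa.
    return (
        overlap_length(read_length, fragment_max),
        overlap_length(read_length, fragment_min),
    )


# Read and fragment lengths taken from Table 2.
print(overlap_range(100, 160, 190))  # A4, A5: (10, 40)
print(overlap_length(100, 160))      # S4, S5: 40
print(overlap_range(125, 169, 195))  # C1, C2: (55, 81)
print(overlap_length(108, 198))      # C3: 18
```

The printed values match the "Overlap length" column for every dataset in the table.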

ISSN: 1471-2105