To denoise or to cluster, that is not the question: optimizing pipelines for COI metabarcoding and metaphylogeography

Table 1 Main characteristics of the original and the generated datasets

Dataset	n. ESVs (*)	n. MOTUs	Single-ESV MOTUs	ESVs/MOTU (*)	Reads/MOTU
Original	330,382	–	–	–	–
Du (**)	60,198	–	–	–	–
Da	32,798	–	–	–	–
Du_e (***)	113,133	–	–	–	–
S	330,382	19,012	12,257	17.378	511.194
Du_S	60,198	19,058	12,471	3.159	509.961
S_Du	75,069	19,012	12,433	3.949	511.194
Da_S	32,798	19,167	15,565	1.711	507.060
S_Da	35,376	19,012	15,198	1.861	511.194
Du_d_S	60,198	19,058	12,471	3.159	509.960
Du_c_S	60,198	19,058	12,471	3.159	509.960
Du_e_S	113,133	19,016	12,365	5.949	511.087
Du_e_d_S	113,133	19,016	12,365	5.949	511.087
Du_e_c_S	113,133	19,016	12,365	5.949	511.087

All datasets had 9,718,827 reads. Single-ESV MOTUs refers to the number of MOTUs containing just one ESV. Codes of the datasets: Du, denoised with the UNOISE3 algorithm (unless otherwise stated, this refers to the original formulation, which gives precedence to abundance ratio); Da, denoised with the DADA2 algorithm; S, clustered with the SWARM algorithm; Du_S, denoised (UNOISE3) and clustered; S_Du, clustered and denoised (UNOISE3); Da_S, denoised (DADA2) and clustered; S_Da, clustered and denoised (DADA2); Du_d_S, denoised (UNOISE3) with precedence to distance and clustered; Du_c_S, denoised (UNOISE3) with combined precedence and clustered; Du_e_S, denoised (UNOISE3) with correction taking into account the entropy of the codon positions and clustered; Du_e_d_S, denoised (UNOISE3) with correction plus precedence to distance and clustered; Du_e_c_S, denoised (UNOISE3) with correction plus combined precedence and clustered.
*For the Original and S datasets, the number of sequences is reported instead of the number of ESVs
**The same values apply to Du_d (distance precedence) and Du_c (combined precedence)
***The same values apply to Du_e_d (distance precedence) and Du_e_c (combined precedence)
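
The two ratio columns appear to follow directly from the counts in the table: ESVs/MOTU as n. ESVs divided by n. MOTUs, and Reads/MOTU as the shared read total divided by n. MOTUs. The short sketch below recomputes them for a few rows under that assumption; the names `TOTAL_READS`, `ROWS` and `per_motu_ratios` are illustrative and are not taken from the authors' pipeline.

```python
# Total reads shared by every dataset (from the table note).
TOTAL_READS = 9_718_827

# (n. ESVs, n. MOTUs) copied from a few rows of Table 1.
ROWS = {
    "S": (330_382, 19_012),
    "S_Du": (75_069, 19_012),
    "Da_S": (32_798, 19_167),
    "Du_e_S": (113_133, 19_016),
}


def per_motu_ratios(n_esvs, n_motus, total_reads=TOTAL_READS):
    """Return (ESVs/MOTU, Reads/MOTU), each rounded to three decimals.

    Assumes both columns are plain ratios of the tabulated counts.
    """
    return round(n_esvs / n_motus, 3), round(total_reads / n_motus, 3)


for name, (n_esvs, n_motus) in ROWS.items():
    esvs_per_motu, reads_per_motu = per_motu_ratios(n_esvs, n_motus)
    print(f"{name}\t{esvs_per_motu:.3f}\t{reads_per_motu:.3f}")
```

For the S row, for example, this gives 17.378 ESVs/MOTU and 511.194 reads/MOTU, which agrees with the tabulated values.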

ISSN: 1471-2105