Table 1 Main characteristics of the original and the generated datasets

From: To denoise or to cluster, that is not the question: optimizing pipelines for COI metabarcoding and metaphylogeography

| Dataset | n. ESVs (*) | n. MOTUs | Single-ESV MOTUs | ESVs/MOTU (*) | Reads/MOTU |
|---|---|---|---|---|---|
| Original | 330,382 | | | | |
| Du (**) | 60,198 | | | | |
| Da | 32,798 | | | | |
| Du_e (***) | 113,133 | | | | |
| S | 330,382 | 19,012 | 12,257 | 17.378 | 511.194 |
| Du_S | 60,198 | 19,058 | 12,471 | 3.159 | 509.961 |
| S_Du | 75,069 | 19,012 | 12,433 | 3.949 | 511.194 |
| Da_S | 32,798 | 19,167 | 15,565 | 1.711 | 507.060 |
| S_Da | 35,376 | 19,012 | 15,198 | 1.861 | 511.194 |
| Du_d_S | 60,198 | 19,058 | 12,471 | 3.159 | 509.960 |
| Du_c_S | 60,198 | 19,058 | 12,471 | 3.159 | 509.960 |
| Du_e_S | 113,133 | 19,016 | 12,365 | 5.949 | 511.087 |
| Du_e_d_S | 113,133 | 19,016 | 12,365 | 5.949 | 511.087 |
| Du_e_c_S | 113,133 | 19,016 | 12,365 | 5.949 | 511.087 |
  1. All datasets had 9,718,827 reads. Single-ESV MOTUs refers to the number of MOTUs containing just one ESV. Codes of the datasets: Du, denoised with the UNOISE3 algorithm (unless otherwise stated, this refers to the original formulation giving precedence to abundance ratio); Da, denoised with the DADA2 algorithm; S, clustered with the SWARM algorithm; Du_S, denoised (UNOISE3) and clustered; S_Du, clustered and denoised (UNOISE3); Da_S, denoised (DADA2) and clustered; S_Da, clustered and denoised (DADA2); Du_d_S, denoised (UNOISE3) with precedence to distance and clustered; Du_c_S, denoised (UNOISE3) with combined precedence and clustered; Du_e_S, denoised (UNOISE3) with correction taking into account the entropy of the codon positions and clustered; Du_e_d_S, denoised (UNOISE3) with correction plus precedence to distance and clustered; Du_e_c_S, denoised (UNOISE3) with correction plus combined precedence and clustered
  2. *For the original and S datasets the number of sequences instead of ESVs is used
  3. **The same values apply to Du_d (distance precedence) and Du_c (combined precedence)
  4. ***The same values apply to Du_e_d (distance precedence) and Du_e_c (combined precedence)
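
The two ratio columns appear to follow directly from the counts: ESVs/MOTU as n. ESVs divided by n. MOTUs, and Reads/MOTU as the total read count (9,718,827) divided by n. MOTUs. The source does not state these formulas, so the sketch below is an assumption checked against a few rows only. The function name `per_motu_ratios` and the `datasets` dictionary are hypothetical and not part of the original pipeline.

```python
# Minimal sketch: recompute the ratio columns of Table 1 from the counts.
# Assumption (not stated in the source): ESVs/MOTU = n. ESVs / n. MOTUs and
# Reads/MOTU = total reads / n. MOTUs, both rounded to three decimals.

TOTAL_READS = 9_718_827  # "All datasets had 9,718,827 reads" (footnote 1)

# (n. ESVs, n. MOTUs) for a few clustered datasets, copied from the table.
datasets = {
    "S": (330_382, 19_012),
    "Da_S": (32_798, 19_167),
    "Du_e_S": (113_133, 19_016),
}


def per_motu_ratios(n_esvs, n_motus, total_reads=TOTAL_READS):
    """Return (ESVs per MOTU, reads per MOTU), rounded to 3 decimals."""
    return round(n_esvs / n_motus, 3), round(total_reads / n_motus, 3)


if __name__ == "__main__":
    for name, (n_esvs, n_motus) in datasets.items():
        esvs_per_motu, reads_per_motu = per_motu_ratios(n_esvs, n_motus)
        print(f"{name}: ESVs/MOTU={esvs_per_motu:.3f}, Reads/MOTU={reads_per_motu:.3f}")
```

For the three rows used here, the recomputed values agree with the tabulated ones to three decimals.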