Table 1 Number of clusters

From: DNACLUST: accurate and efficient clustering of phylogenetic marker genes

Method             0.99      0.97      0.95
DNACLUST exact     233879    73726     28241
DNACLUST inexact   240125    76391     28661
UCLUST exact       144339    48418     20039
UCLUST inexact     253108    71361     26685
CD-HIT             245851    100280    55208
The number of clusters produced by DNACLUST, UCLUST and CD-HIT at various identity/similarity thresholds on the twins dataset. Because each tool uses a slightly different distance measure, the number of clusters cannot be compared directly between tools. Specifically, the identity measure used by UCLUST and CD-HIT underestimates the distance between two sequences relative to the similarity measure used by DNACLUST. Instead, we compare the change in the number of clusters when switching between the exact and inexact modes of each tool; a smaller change indicates better performance.
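
The exact-versus-inexact comparison described in the caption can be reproduced from the table with a few lines of Python. This is only a sketch: the caption does not define "change" precisely, so the code assumes a relative change with the exact-mode count as the baseline, and the names `COUNTS`, `relative_change` and `changes_by_tool` are illustrative rather than taken from the paper. CD-HIT is left out because the table reports a single row for it.

```python
# Cluster counts copied from Table 1 (twins dataset), ordered by threshold.
THRESHOLDS = (0.99, 0.97, 0.95)
COUNTS = {
    "DNACLUST": {
        "exact": (233879, 73726, 28241),
        "inexact": (240125, 76391, 28661),
    },
    "UCLUST": {
        "exact": (144339, 48418, 20039),
        "inexact": (253108, 71361, 26685),
    },
}


def relative_change(exact, inexact):
    """Assumed metric (not defined in the caption): |inexact - exact| / exact."""
    return abs(inexact - exact) / exact


def changes_by_tool(counts=COUNTS):
    """Return, for each tool, the relative change at each threshold."""
    return {
        tool: tuple(
            relative_change(e, i)
            for e, i in zip(modes["exact"], modes["inexact"])
        )
        for tool, modes in counts.items()
    }


if __name__ == "__main__":
    for tool, changes in changes_by_tool().items():
        for threshold, change in zip(THRESHOLDS, changes):
            print(f"{tool} @ {threshold:.2f}: {change:.1%}")
```

Under this assumed metric, DNACLUST's counts shift by under 4% at every threshold, while UCLUST's shift by roughly 33% to 75%, which is consistent with the caption's reading that a smaller change indicates better performance.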