
Table 1 Number of clusters

From: DNACLUST: accurate and efficient clustering of phylogenetic marker genes

 

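| Tool             | 0.99   | 0.97   | 0.95  |
|------------------|--------|--------|-------|
| DNACLUST exact   | 233879 | 73726  | 28241 |
| DNACLUST inexact | 240125 | 76391  | 28661 |
| UCLUST exact     | 144339 | 48418  | 20039 |
| UCLUST inexact   | 253108 | 71361  | 26685 |
| CD-HIT           | 245851 | 100280 | 55208 |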

The number of clusters produced by DNACLUST, UCLUST and CD-HIT at various identity/similarity thresholds on the twins dataset. Because each tool uses a slightly different distance measure, the number of clusters cannot be compared directly between tools. (In particular, the identity measure used by UCLUST and CD-HIT underestimates the distance between two sequences relative to the similarity measure used by DNACLUST.) Instead, we compare the change in the number of clusters when switching between the exact and inexact modes of each tool; a smaller change indicates better performance.
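
The exact-versus-inexact comparison described for Table 1 can be sketched in a few lines of Python. This is only an illustration using the counts reported in the table. The names `THRESHOLDS`, `COUNTS` and `mode_change` are hypothetical. Reporting a relative change next to the absolute one is an assumption made here, since the caption speaks only of "the change". CD-HIT is left out because the table lists a single mode for it.

```python
# Cluster counts copied from Table 1 (twins dataset).
# Tuple positions follow THRESHOLDS: 0.99, 0.97, 0.95.
THRESHOLDS = (0.99, 0.97, 0.95)
COUNTS = {
    "DNACLUST": {
        "exact": (233879, 73726, 28241),
        "inexact": (240125, 76391, 28661),
    },
    "UCLUST": {
        "exact": (144339, 48418, 20039),
        "inexact": (253108, 71361, 26685),
    },
}


def mode_change(tool):
    """Return {threshold: (absolute change, relative change)} for one tool.

    The change is inexact minus exact; the relative change divides by the
    exact-mode count (an illustrative choice, not taken from the paper).
    """
    exact = COUNTS[tool]["exact"]
    inexact = COUNTS[tool]["inexact"]
    return {
        t: (i - e, (i - e) / e)
        for t, e, i in zip(THRESHOLDS, exact, inexact)
    }


if __name__ == "__main__":
    for tool in COUNTS:
        for t, (abs_change, rel_change) in mode_change(tool).items():
            print(f"{tool} {t:.2f}: {abs_change:+d} clusters ({rel_change:+.1%})")
```

Under these assumptions, the change for DNACLUST is smaller than the change for UCLUST at each of the three thresholds, in both absolute and relative terms.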