Table 2 Scalability benchmark through the viral nucleotide and amino acid sequence datasets

From: Clustering biological sequences with dynamic sequence similarity threshold

 

| Tool | Viral nucleotides (≤ 10,000 nts): No. of clusters | Viral nucleotides (≤ 10,000 nts): Time (hh:mm:ss) | Long viral nucleotides (> 10,000 nts): No. of clusters | Long viral nucleotides (> 10,000 nts): Time (hh:mm:ss) | Viral amino acids@: No. of clusters | Viral amino acids@: Time (hh:mm:ss) |
| --- | --- | --- | --- | --- | --- | --- |
| ALFATClust$ | 237,276 | 00:35:41 | 499 | 00:00:39 | 234,451 | 00:27:23 |
| CD-HIT | N.A.# | N.A.# | 506 | 00:16:10 | 109,515 | 04:29:06 |
| UCLUST | 245,658 | 07:47:25 | N.A.* | N.A.* | 243,982 | 00:00:56 |
| MMseqs2 | 221,997 | 00:02:18 | 428 | 00:00:15 | 235,921 | 00:06:43 |
| VSEARCH | 239,728 | 05:23:11 | 507 | 01:48:36 | | |
| MeShClust | 8,713 | 00:14:53 | 462 | 01:14:52 | | |
| MeShClust2 | 194,267 | 03:43:00 | 571 | 00:00:10 | | |
| DNACLUST | N.A.^ | N.A.^ | N.A.^ | N.A.^ | | |

  1. $ALFATClust runs at γ_low = 0.75, and all other tools run at T = 0.85
  2. #Terminated after running for 8 h
  3. *Memory limit exceeded for the community (32-bit) version of UCLUST
  4. ^Segmentation fault occurs
  5. @Only ALFATClust, CD-HIT, UCLUST, and MMseqs2 can process protein sequences