
Table 2 Multidomain proteins clustering results

From: kClust: fast and sensitive clustering of large protein sequence databases

 

| Method | #Clusters | #Corrupted clusters | %Corrupted clusters | #Wrong seq. per corrupted cluster | Time |
|---|---|---|---|---|---|
| BLAST | 6 977 | 186 | 2.66 | 1.5 | 3 h 21 m ∗ |
| kClust iter | 8 537 | 10 | 0.1 | 1.1 | 13 m/39 m † |
| kClust sens | 17 070 | 10 | 0.06 | 1.1 | 9 m |
| kClust 20 | 17 127 | 10 | 0.06 | 1.1 | 9 m |
| kClust 30 | 22 047 | 6 | 0.03 | 1.2 | 9 m |
| UCLUST 30 | 39 284 | 10 132 | 25.79 | 1.7 | 30 s |
| UCLUST 40 | 50 104 | 10 512 | 20.98 | 1.4 | 40 s |
| CD-HIT | 29 163 | 6 234 | 21.37 | 1.89 | 43 m |

Clustering results on a set of 100 000 two-domain sequences constructed from 1 000 domain architectures with 100 sequences each.

∗ The time is the sum of the run times of all 8 parallel threads.

† Includes the time for the calculation of multiple sequence alignments.
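
The %Corrupted clusters column appears to be the number of corrupted clusters divided by the total number of clusters, times 100. That definition is an assumption inferred from the reported numbers, not something the table states. A minimal sketch for recomputing the column from the two count columns follows; the names `TABLE2`, `percent_corrupted`, and `recompute` are illustrative only and do not come from the paper or from any of the tools listed.

```python
# Counts transcribed from Table 2:
# method -> (number of clusters, number of corrupted clusters, reported percentage)
TABLE2 = {
    "BLAST": (6977, 186, 2.66),
    "kClust iter": (8537, 10, 0.1),
    "kClust sens": (17070, 10, 0.06),
    "kClust 20": (17127, 10, 0.06),
    "kClust 30": (22047, 6, 0.03),
    "UCLUST 30": (39284, 10132, 25.79),
    "UCLUST 40": (50104, 10512, 20.98),
    "CD-HIT": (29163, 6234, 21.37),
}


def percent_corrupted(clusters, corrupted):
    """Share of clusters that are corrupted, in percent (assumed definition)."""
    return 100.0 * corrupted / clusters


def recompute(table=TABLE2):
    """Return method -> recomputed percentage of corrupted clusters."""
    return {
        method: percent_corrupted(clusters, corrupted)
        for method, (clusters, corrupted, _reported) in table.items()
    }


if __name__ == "__main__":
    for method, pct in recompute().items():
        reported = TABLE2[method][2]
        print(f"{method:12s} recomputed {pct:6.2f}   reported {reported}")
```

Under this assumed definition the recomputed values agree with the reported column up to the precision shown in the table, which supports reading the column as a simple ratio of the two counts.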