
Table 1 Pfam clustering results

From: kClust: fast and sensitive clustering of large protein sequence databases

 

| Method | #Clusters | #Corrupted clusters | %Corrupted clusters | #Wrong seqs per corrupted cluster | Time |
|---|---|---|---|---|---|
| BLAST | 118 920 | 5 | 4.2e-3 | 2.4 | 66 h 2 m ∗ |
| kClust iter | 111 251 | 4 | 5.2e-3 | 4.0 | 28 m/1 h 47 m † |
| kClust sens | 153 721 | 8 | 5.2e-3 | 1.6 | 17 m |
| kClust 20 | 153 883 | 8 | 5.1e-3 | 1.6 | 16 m |
| kClust 30 | 169 533 | 5 | 2.9e-3 | 1.6 | 16 m |
| UCLUST 30 | 234 039 | 10 | 4.3e-3 | 1.0 | 2 m |
| UCLUST 40 | 244 568 | 8 | 3.2e-3 | 1.0 | 2 m |
| CD-HIT | 170 750 | 4 086 | 2.39 | 1.39 | 5 h 26 m |

Clustering results on PfamA-seed single-domain sequences. ∗ The time is the sum of the run times of all 8 parallel threads. † Includes the time for the calculation of multiple sequence alignments.