
Table 4 All clusters matching the three largest Pfam clusters from data set #4. coreClust breaks down Pfam clusters into smaller clusters with zero to a few outliers.

From: Alignment-free clustering of large data sets of unannotated protein conserved regions using minhashing

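| Pfam family | \(\lvert Pfam \rvert\) | \(\lvert coreCl \rvert\) | \(\frac{\lvert Pfam \cap coreCl \rvert}{\lvert Pfam \rvert}\) | \(\frac{\lvert Pfam \cap coreCl \rvert}{\lvert coreCl \rvert}\) | Total clustered sequences |
|---|---|---|---|---|---|
| PF03880.12 | 1232 | 84 | 0.07 | 1 | 341 |
| | | 54 | 0.04 | 1 | |
| | | 44 | 0.03 | 1 | |
| | | 27 | 0.02 | 1 | |
| | | 24 | 0.02 | 1 | |
| | | 17 | 0.01 | 1 | |
| | | 15 | 0.01 | 1 | |
| | | 15 | 0.01 | 1 | |
| | | 14 | 0.01 | 1 | |
| | | 13 | 0.01 | 1 | |
| | | 13 | 0.01 | 1 | |
| | | 11 | 0.01 | 1 | |
| | | 10 | 0.01 | 1 | |
| PF00271.28 | 1192 | 364 | 0.30 | 1 | 1084 |
| | | 343 | 0.29 | 1 | |
| | | 253 | 0.21 | 1 | |
| | | 91 | 0.08 | 1 | |
| | | 22 | 0.02 | 1 | |
| | | 11 | 0.01 | 0.91 | |
| PF00270.26 | 1187 | 260 | 0.22 | 1 | 1031 |
| | | 245 | 0.21 | 1 | |
| | | 196 | 0.16 | 1 | |
| | | 141 | 0.12 | 1 | |
| | | 105 | 0.09 | 1 | |
| | | 72 | 0.06 | 1 | |
| | | 12 | 0.01 | 1 | |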