
Table 1 Comparison of SYSTERS, TribeMCL and single linkage clustering (SLC) on the Pfam and ENZYME data sets (J: Jaccard coefficient, SE: sensitivity, SP: specificity). The best result in each row is shown in bold face. For single linkage clustering, only the results of the "best" clustering are shown, together with the corresponding E-value cutoff. For sensitivity and specificity, the cutoff was chosen at the intersection of the two curves obtained by plotting both values over all possible E-value cutoffs. All clustering procedures were applied to the non-redundant data set; redundant sequences were then added back to the cluster sets for comparison with the "true" cluster sets. (a) 33,963,365 pairwise values of 283,113 non-redundant sequences used for clustering and 442,872 redundant sequences used in comparison. (b) 1,582,948 pairwise values of 38,176 non-redundant sequences used for clustering and 84,405 redundant sequences used in comparison.

From: Large scale hierarchical clustering of protein sequences

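| Data set | Measure | SLC best | SLC at cutoff | SYSTERS Superfam. | SYSTERS Subclust. | TribeMCL at Inflation 1.1 | TribeMCL at Inflation 2 | TribeMCL at Inflation 3 | TribeMCL at Inflation 4 | TribeMCL at Inflation 5 |
|---|---|---|---|---|---|---|---|---|---|---|
| Pfam (a) | J | 0.19362 | 1e-53 | 0.15637 | 0.20815 | --- | --- | --- | --- | --- |
| Pfam (a) | SE | 0.26886 | 1e-49 | 0.55272 | 0.48302 | --- | --- | --- | --- | --- |
| Pfam (a) | SP | 0.26536 | 1e-49 | 0.17902 | 0.26781 | --- | --- | --- | --- | --- |
| ENZYME (b), A.B.C.D | J | 0.88760 | 1e-21 | 0.77445 | 0.89670 | 0.60390 | 0.60074 | 0.59990 | 0.59942 | 0.59778 |
| ENZYME (b), A.B.C.D | SE | 0.92295 | 1e-08 | 0.92931 | 0.92297 | 0.61323 | 0.60328 | 0.60224 | 0.60164 | 0.59989 |
| ENZYME (b), A.B.C.D | SP | 0.93616 | 1e-08 | 0.82294 | 0.96924 | 0.97543 | 0.99304 | 0.99357 | 0.99388 | 0.99416 |
| ENZYME (b), A.B.C.? | J | 0.71527 | 1e-15 | 0.65915 | 0.72410 | 0.48721 | 0.47900 | 0.47803 | 0.47746 | 0.47600 |
| ENZYME (b), A.B.C.? | SE | 0.74985 | 1e-03 | 0.75320 | 0.73727 | 0.49099 | 0.47996 | 0.47895 | 0.47836 | 0.47688 |
| ENZYME (b), A.B.C.? | SP | 0.80855 | 1e-03 | 0.84073 | 0.97592 | 0.98445 | 0.99586 | 0.99601 | 0.99608 | 0.99617 |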
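
The three measures in the table and the cutoff selection described in the caption can be illustrated with a short Python sketch. The definitions below are assumptions, since this page does not state them: sequence pairs are counted as true positives (TP) when they share a cluster in both the predicted and the "true" set, and J = TP/(TP+FP+FN), SE = TP/(TP+FN), SP = TP/(TP+FP). These definitions are at least consistent with the reported numbers: for example, in the TribeMCL inflation 5 column for A.B.C.D, 1/SE + 1/SP - 1 equals 1/J to the printed precision. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def co_clustered_pairs(clusters):
    """Return the set of unordered sequence pairs that share a cluster.

    Clusters may overlap; a pair is counted once however often it co-occurs.
    """
    pairs = set()
    for members in clusters:
        for a, b in combinations(sorted(set(members)), 2):
            pairs.add((a, b))
    return pairs


def pair_scores(predicted, reference):
    """Pair-counting scores of a predicted clustering against a reference.

    Assumed definitions (not stated in the table caption):
      J  = TP / (TP + FP + FN)
      SE = TP / (TP + FN)
      SP = TP / (TP + FP)
    """
    pred = co_clustered_pairs(predicted)
    ref = co_clustered_pairs(reference)
    tp = len(pred & ref)   # pairs joined in both clusterings
    fp = len(pred - ref)   # pairs joined only in the prediction
    fn = len(ref - pred)   # pairs joined only in the reference
    return {
        "J": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
        "SE": tp / (tp + fn) if tp + fn else 0.0,
        "SP": tp / (tp + fp) if tp + fp else 0.0,
    }


def crossing_cutoff(curve):
    """Pick the cutoff at which sensitivity and specificity are closest.

    `curve` maps each E-value cutoff to an (SE, SP) tuple. On a discrete
    grid of cutoffs this approximates the intersection of the two curves.
    """
    return min(curve, key=lambda cutoff: abs(curve[cutoff][0] - curve[cutoff][1]))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    predicted = [["a", "b", "c"], ["d"]]
    reference = [["a", "b"], ["c", "d"]]
    print(pair_scores(predicted, reference))

    curve = {1e-3: (0.90, 0.50), 1e-8: (0.75, 0.74), 1e-20: (0.40, 0.95)}
    print(crossing_cutoff(curve))
```

On the toy data the prediction joins three pairs and the reference two, with one pair in common, so J is 0.25 and SE is 0.5. Enumerating all pairs is quadratic in cluster size, so for data sets of the size given in the caption one would count pairs per cluster rather than materialise them.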