
Table 1 Comparison of SYSTERS, TribeMCL and single linkage clustering (SLC) on the Pfam and ENZYME data sets (J: Jaccard coefficient, SE: sensitivity, SP: specificity). The best result in each row is shown in bold face. For single linkage clustering, only the results of the "best" clustering are shown, together with the corresponding E-value cutoff. For sensitivity and specificity, the cutoff was chosen at the intersection point of the two curves obtained by plotting the values for all possible E-value cutoffs. All clustering procedures were applied to the non-redundant data set, and the redundant sequences were then added back to the cluster sets for comparison with the "true" cluster sets. (a) 33,963,365 pairwise values of 283,113 non-redundant sequences used for clustering and 442,872 redundant sequences used in comparison; (b) 1,582,948 pairwise values of 38,176 non-redundant sequences used for clustering and 84,405 redundant sequences used in comparison.

From: Large scale hierarchical clustering of protein sequences

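| Data set | Measure | SLC best | SLC at cutoff | SYSTERS Superfam. | SYSTERS Subclust. | TribeMCL at inflation 1.1 | TribeMCL at inflation 2 | TribeMCL at inflation 3 | TribeMCL at inflation 4 | TribeMCL at inflation 5 |
|---|---|---|---|---|---|---|---|---|---|---|
| Pfam (a) | J | 0.19362 | 1e-53 | 0.15637 | 0.20815 | --- | --- | --- | --- | --- |
| Pfam (a) | SE | 0.26886 | 1e-49 | 0.55272 | 0.48302 | --- | --- | --- | --- | --- |
| Pfam (a) | SP | 0.26536 | 1e-49 | 0.17902 | 0.26781 | --- | --- | --- | --- | --- |
| ENZYME (b), A.B.C.D | J | 0.88760 | 1e-21 | 0.77445 | 0.89670 | 0.60390 | 0.60074 | 0.59990 | 0.59942 | 0.59778 |
| ENZYME (b), A.B.C.D | SE | 0.92295 | 1e-08 | 0.92931 | 0.92297 | 0.61323 | 0.60328 | 0.60224 | 0.60164 | 0.59989 |
| ENZYME (b), A.B.C.D | SP | 0.93616 | 1e-08 | 0.82294 | 0.96924 | 0.97543 | 0.99304 | 0.99357 | 0.99388 | 0.99416 |
| ENZYME (b), A.B.C.? | J | 0.71527 | 1e-15 | 0.65915 | 0.72410 | 0.48721 | 0.47900 | 0.47803 | 0.47746 | 0.47600 |
| ENZYME (b), A.B.C.? | SE | 0.74985 | 1e-03 | 0.75320 | 0.73727 | 0.49099 | 0.47996 | 0.47895 | 0.47836 | 0.47688 |
| ENZYME (b), A.B.C.? | SP | 0.80855 | 1e-03 | 0.84073 | 0.97592 | 0.98445 | 0.99586 | 0.99601 | 0.99608 | 0.99617 |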
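
The table reports a Jaccard coefficient per clustering and, for SLC, a cutoff taken where the sensitivity and specificity curves meet. The Python sketch below illustrates both ideas under stated assumptions: it uses the common pair-counting form of the Jaccard coefficient (pairs of sequences placed together in both clusterings, divided by pairs placed together in at least one), which may differ from the exact definition used for this table, and it approximates the intersection point as the sampled cutoff with the smallest gap between the two values. The function names and the toy data are hypothetical and are not taken from the article.

```python
from itertools import combinations


def co_clustered_pairs(clusters):
    """Return the set of unordered item pairs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        # Sorting gives each pair a canonical (a, b) order.
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def jaccard_coefficient(predicted, reference):
    """Pair-counting Jaccard coefficient between two clusterings (assumed definition)."""
    p = co_clustered_pairs(predicted)
    r = co_clustered_pairs(reference)
    union = p | r
    if not union:
        return 1.0  # neither clustering groups any pair; treat as identical
    return len(p & r) / len(union)


def crossing_cutoff(curve):
    """Pick the (cutoff, sensitivity, specificity) row where the two values are closest.

    This approximates the intersection point of the two curves when they
    are only sampled at discrete cutoffs.
    """
    return min(curve, key=lambda row: abs(row[1] - row[2]))


# Toy data, invented for illustration only.
reference = [{"s1", "s2", "s3"}, {"s4", "s5"}]
predicted = [{"s1", "s2"}, {"s3", "s4", "s5"}]
toy_curve = [(1e-3, 0.90, 0.50), (1e-8, 0.80, 0.79), (1e-20, 0.60, 0.95)]

print(jaccard_coefficient(predicted, reference))
print(crossing_cutoff(toy_curve))
```

Enumerating all pairs is quadratic in cluster size, so for data sets of the size listed in the caption a count-based formulation (cluster sizes and contingency counts) would be the more practical choice; the explicit pair sets are used here only to keep the definition visible.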