Table 3 Sequence-sequence comparison F-measures for clustered sequences

From: Evaluation and improvements of clustering algorithms for detecting remote homologous protein families

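| Level | Dataset | TransClust (T) | HiFix (s,c) | MCL (I) | SCPS (c) |
|---|---|---|---|---|---|
| Family | A-10 | **0.494** (1) | 0.467 (0.10,0.7) | 0.352 (18) | - |
| Family | A-20 | **0.573** (1) | 0.491 (0.15,0.7) | 0.398 (17) | - |
| Family | A-30 | **0.675** (1) | 0.583 (0.20,0.7) | 0.415 (51) | - |
| Family | A-50 | **0.721** (1) | 0.608 (0.25,0.7) | 0.457 (40) | - |
| Family | A-70 | **0.739** (1) | 0.630 (0.25,0.7) | 0.474 (30) | - |
| Family | A-90 | **0.758** (1) | 0.653 (0.25,0.7) | 0.511 (29) | - |
| Family | A-95 | **0.766** (1) | 0.654 (0.25,0.7) | 0.527 (22) | - |
| Family | GOLD | **0.914** (25) | 0.902 (0.30,0.6) | 0.880 (12) | - |
| Super-family | A-10 | **0.377** (1) | 0.337 (0.10,0.7) | 0.270 (18) | 0.297 (648) |
| Super-family | A-20 | **0.450** (1) | 0.362 (0.10,0.7) | 0.282 (18) | 0.352 (753) |
| Super-family | A-30 | **0.551** (1) | 0.473 (0.10,0.7) | 0.333 (57) | 0.473 (955) |
| Super-family | A-50 | **0.609** (1) | 0.507 (0.25,0.7) | 0.351 (59) | 0.557 (1188) |
| Super-family | A-70 | **0.631** (1) | 0.539 (0.25,0.7) | 0.377 (43) | 0.581 (1279) |
| Super-family | A-90 | **0.654** (1) | 0.560 (0.25,0.7) | 0.426 (43) | 0.607 (1345) |
| Super-family | A-95 | **0.659** (1) | 0.563 (0.25,0.7) | 0.435 (28) | 0.615 (1401) |
| Super-family | GOLD | 0.865 (1) | **0.915** (0.05,0.3) | 0.827 (36) | 0.904 (6) |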
The optimized parameters determined for each clustering algorithm are shown in parentheses; see Section ‘Parameter optimization’. Best values are shown in bold.
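
For readers unfamiliar with pair-based scoring, the following is a minimal sketch of one common way to compute a sequence-sequence (pairwise) F-measure between a clustering and a reference classification. It is an assumption about the general technique, not the article's exact definition, which may differ in detail. The function name `pairwise_f_measure` and the dictionary inputs are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def pairwise_f_measure(predicted, reference):
    """Pairwise F-measure between two labelings (illustrative sketch).

    predicted, reference: dicts mapping a sequence id to a cluster label
    and to a reference (e.g. family) label, respectively.
    Only sequences present in both dicts are scored.
    """
    ids = sorted(set(predicted) & set(reference))
    tp = fp = fn = 0
    for a, b in combinations(ids, 2):
        same_pred = predicted[a] == predicted[b]
        same_ref = reference[a] == reference[b]
        if same_pred and same_ref:
            tp += 1  # pair clustered together and truly related
        elif same_pred:
            fp += 1  # pair clustered together but not related
        elif same_ref:
            fn += 1  # related pair split across clusters
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With a reference of three sequences in one family and one in another, and a clustering that splits them two and two, one pair is correct, one is a false positive, and two are missed, which gives precision 0.5, recall 1/3, and an F-measure of 0.4.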