Table 6. Number of sequences, families, and super-families in the datasets

From: Evaluation and improvements of clustering algorithms for detecting remote homologous protein families

Dataset   Sequences     Families      Super-families
A-10       3461 (55%)    970 (25%)     589 (30%)
A-20       4260 (60%)   1144 (28%)     684 (34%)
A-30       6532 (72%)   1572 (38%)     868 (44%)
A-50      10816 (84%)   2109 (49%)    1080 (55%)
A-70      13391 (87%)   2306 (54%)    1162 (59%)
A-90      15861 (90%)   2420 (56%)    1222 (62%)
A-95      17505 (91%)   2521 (59%)    1273 (64%)
GOLD        866 (100%)    91 (100%)      5 (100%)
1. Numbers in parentheses give the percentage of sequences, families, and super-families that remained after singletons were removed.
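The percentages in parentheses come from a simple retained/original ratio at each level. The sketch below shows one way to compute them. It assumes that a "singleton" is a family represented by exactly one sequence; the table itself does not define the term. Under that assumption, a super-family disappears only when all of its families were singletons. The function names and the SCOP-style toy labels are hypothetical illustrations, not data or code from the article.

```python
from collections import Counter

def remove_singletons(records):
    """Drop sequences whose family has only one member.

    records: list of (seq_id, family, superfamily) tuples.
    Assumption: a 'singleton' is a family with exactly one sequence.
    """
    family_sizes = Counter(fam for _, fam, _ in records)
    return [r for r in records if family_sizes[r[1]] > 1]

def summarize(records):
    """Count distinct sequences, families, and super-families."""
    return (
        len({sid for sid, _, _ in records}),
        len({fam for _, fam, _ in records}),
        len({sf for _, _, sf in records}),
    )

def retention_table(records):
    """Return (count after removal, % retained) per level, as in the table."""
    before = summarize(records)
    after = summarize(remove_singletons(records))
    return [(a, round(100 * a / b)) for a, b in zip(after, before)]

# Hypothetical toy data with SCOP-style labels (not from the article).
toy = [
    ("s1", "a.1.1", "a.1"),
    ("s2", "a.1.1", "a.1"),
    ("s3", "a.1.2", "a.1"),  # singleton family; super-family a.1 survives
    ("s4", "b.2.1", "b.2"),  # singleton family; b.2 has no other family
    ("s5", "c.3.1", "c.3"),
    ("s6", "c.3.1", "c.3"),
]
print(retention_table(toy))  # [(4, 67), (2, 50), (2, 67)]
```

In the toy input, removing the two singleton families drops two of six sequences and two of four families. It removes only one of three super-families, because a.1 still contains the two-member family a.1.1. This mirrors why the super-family percentages in Table 6 are consistently higher than the family percentages.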