Table 6 Number of sequences, families, and super-families in the datasets

From: Evaluation and improvements of clustering algorithms for detecting remote homologous protein families

Dataset   Sequences       Families       Super-families
A-10       3461 (55%)      970 (25%)      589 (30%)
A-20       4260 (60%)     1144 (28%)      684 (34%)
A-30       6532 (72%)     1572 (38%)      868 (44%)
A-50      10816 (84%)     2109 (49%)     1080 (55%)
A-70      13391 (87%)     2306 (54%)     1162 (59%)
A-90      15861 (90%)     2420 (56%)     1222 (62%)
A-95      17505 (91%)     2521 (59%)     1273 (64%)
GOLD        866 (100%)      91 (100%)       5 (100%)

  1. Numbers in parentheses indicate the percentage of sequences, families, and super-families that remained after removing singletons.
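
The percentages in the footnote can be illustrated with a minimal sketch. It assumes that a "singleton" is a family containing exactly one sequence, which this excerpt does not define, and it uses made-up toy data rather than any of the datasets in the table. The function name `retained_after_removing_singletons` and the sequence-to-family mapping are hypothetical, introduced only for illustration.

```python
from collections import Counter


def retained_after_removing_singletons(seq_to_family):
    """Drop families that contain a single sequence (assumed meaning of
    'singleton') and report what percentage of sequences and families remain."""
    # Count how many sequences belong to each family.
    family_sizes = Counter(seq_to_family.values())

    # Keep only sequences whose family has more than one member.
    kept = {seq: fam for seq, fam in seq_to_family.items()
            if family_sizes[fam] > 1}

    pct_sequences = 100.0 * len(kept) / len(seq_to_family)
    pct_families = 100.0 * len(set(kept.values())) / len(family_sizes)
    return kept, pct_sequences, pct_families


# Toy data (not from the paper): famB has one member and is removed.
toy = {"s1": "famA", "s2": "famA", "s3": "famB", "s4": "famC", "s5": "famC"}
kept, pct_sequences, pct_families = retained_after_removing_singletons(toy)
print(f"{len(kept)} sequences kept ({pct_sequences:.0f}%), "
      f"{len(set(kept.values()))} families kept ({pct_families:.0f}%)")
```

The same calculation would apply at the super-family level by swapping the family labels for super-family labels, again under the assumed definition of a singleton.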