Table 1 General Properties of the Datasets

From: Clustering of protein families into functional subtypes using Relative Complexity Measure with reduced amino acid alphabets

| Family | # of sequences | # of subfamilies | μ Length | σ Length | μ PID* |
| --- | --- | --- | --- | --- | --- |
| Crotonases | 467 | 13 | 332 | 87 | 21 |
| Mandelate racemases | 184 | 8 | 416 | 74 | 27 |
| Vicinal oxygen chelates | 309 | 18 | 294 | 108 | 14 |
| Haloacid dehalogenases | 195 | 14 | 303 | 137 | 12 |
| Nucleotidyl cyclases | 75 | 2 | 1059 | 200 | 21 |
| Acyl transferases | 177 | 2 | 290 | 12 | 41 |
| GH2 hydrolases | 33 | 4 | 872 | 160 | 15 |

* Mean Percent Identity (μ PID) is the average of all pairwise sequence identities in a given family.
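
The μ PID definition in the footnote can be illustrated with a short sketch. This is not the authors' procedure: the source does not say how the pairwise identities were obtained, so the example assumes the sequences are already aligned to equal length with `-` as the gap character, and that identity is counted over columns where at least one sequence has a residue. The function names `percent_identity` and `mean_percent_identity` are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption (not from the source): columns where both sequences
    have a gap ('-') are ignored; all other columns are counted.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns in which at least one sequence has a residue.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def mean_percent_identity(aligned: list[str]) -> float:
    """Average percent identity over all unordered pairs of sequences."""
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy alignment for illustration only; not data from the table.
    toy_family = ["ACDE", "ACDF", "AC-E"]
    print(round(mean_percent_identity(toy_family), 2))
```

A family of n sequences has n(n-1)/2 pairs, so the largest family in the table (467 sequences) would require 108,811 pairwise comparisons under this definition.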