Table 1 Full protein family dataset used for function prediction experiments.

From: Analysis of substructural variation in families of enzymatic proteins with applications to protein function prediction

| EC class | PDB ID (Chain) | Amino acid number and labels | EC class size |
| --- | --- | --- | --- |
| 1.1.1.1 | 1HET (A) | 46C, 48S, 67H, 174C | 82 |
| 1.1.1.21 | 1US0 (A) | 43D, 48Y, 76S, 77K, 110H | 89 |
| 1.11.1.7 | 1ARU (A) | 52RQ, 56H, 57D, 93NR, 184H | 83 |
| 1.14.13.39 | 1DWW (A) | 194C, 346V, 363F, 366W, 367Y | 126 |
| 2.5.1.18 | 2A2R (A) | 7Y, 13FLR, 47ACFLM, 108CFLY | 190 |
| 2.6.1.1 | 2QA3 (A) | 32G, 34G, 183N, 374R | 105 |
| 2.7.4.6 | 1NHK (R) | 51Y, 117H, 119S, 128K | 60 |
| 3.1.1.7 | 1H23 (A) | 84W, 117G, 130Y, 279W, 330F | 110 |
| 3.1.3.1 | 1ANI (A) | 51D, 101D, 102S, 331H, 412H | 44 |
| 3.1.3.48 | 2CM2 (A) | 181DE, 182FHMY, 216S, 221R, 266Q | 248 |
| 3.2.1.1 | 1HT6 (A) | 52G, 178R, 180D, 205E, 291D | 133 |
| 3.5.2.6 | 1YLJ (A) | 70S, 73K, 130S, 132N | 254 |
| 4.2.1.1 | 1HCB (A) | 94H, 96H, 106E, 119H, 199T | 282 |
| 5.3.1.1 | 1YPI (A) | 12K, 95H, 96S, 165A | 95 |
| 5.3.1.5 | 1DID (A) | 53H, 56D, 93F, 136W, 182K | 71 |

  1. For each EC class family, a single PDB structure was used to define an input motif. The amino acid numbers listed are documented functional residues taken from the primary PDB (http://www.pdb.org) reference for each PDB structure. The letters following each amino acid number are the amino acid types that can match at that motif point; further details on the use of alternate amino acid labels are given in [43]. Where a motif point carries multiple amino acid labels, they were determined using ConSurf [28].
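
The motif notation described in the footnote can be read mechanically. The following is a minimal sketch, not the authors' code: it assumes every motif point is written as a residue number followed by one or more one-letter amino acid codes, and the names `parse_motif` and `matches` are hypothetical helpers introduced only for illustration.

```python
import re

# Assumed entry format: a residue number followed by one or more
# one-letter amino acid codes, e.g. "13FLR".
_MOTIF_POINT = re.compile(r"^(\d+)([A-Z]+)$")


def parse_motif(cell):
    """Parse a table cell such as '7Y, 13FLR' into (number, allowed_types) pairs."""
    points = []
    for token in cell.split(","):
        token = token.strip()
        if not token:  # tolerate a stray trailing comma
            continue
        match = _MOTIF_POINT.match(token)
        if match is None:
            raise ValueError(f"unrecognised motif point: {token!r}")
        points.append((int(match.group(1)), frozenset(match.group(2))))
    return points


def matches(point, residue_type):
    """Return True if a one-letter residue type is allowed at this motif point."""
    _, allowed = point
    return residue_type in allowed


# Motif for EC class 2.5.1.18 (PDB 2A2R, chain A) as listed in the table.
motif = parse_motif("7Y, 13FLR, 47ACFLM, 108CFLY")
print(motif[1][0], sorted(motif[1][1]))  # 13 ['F', 'L', 'R']
```

Under this reading, position 13 of the 2A2R motif accepts phenylalanine, leucine, or arginine, while position 7 accepts only tyrosine.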