
Table 1 Comparison of curated and automatically-generated domain hierarchies

From: Automated hierarchical classification of protein domain subfamilies based on functionally-divergent residue signatures

| CDD Ident. | Protein superfamily | Number of seqs‡ | Length | Manually curated nodes* | Manually curated LLR† | Automatically generated nodes* | Automatically generated LLR† | Automatically generated time§ |
|---|---|---|---|---|---|---|---|---|
| cd00030 | C2 | 23,452 | 102 | 106 (103) | 236574 | 78 (73) | 223857 | 19.4 |
| cd00138 | PLDc_SF | 16,765 | 119 | 105 (102) | 241766 | 36 (34) | 192876 | 10.0 |
| cd00142 | PI3Kc_like | 2,409 | 219 | 22 | 34129 | 16 | 34563 | 4.5 |
| cd00159 | RhoGAP | 4,815 | 169 | 39 (38) | 55604 | 32 | 53540 | 7.97 |
| cd00173 | SH2 | 5,917 | 79 | 111 (101) | 49274 | 39 | 40075 | 3.5 |
| cd00180 | Protein kinases | 104,912 | 215 | 280 (260) | 1378273 | 107 (104) | 1536991 | 241.0 |
| cd00229 | SGNH_hydrolase | 14,635 | 187 | 30 | 180667 | 29 | 183822 | 14.95 |
| cd00306 | S8/S53 peptidase | 10,960 | 241 | 36 | 161685 | 45 (44) | 173693 | 30.90 |
| cd00368 | Molybdopterin-Binding | 9,540 | 374 | 26 | 177569 | 44 | 209704 | 39.3 |
| cd00397 | DNA_BRE_C | 25,824 | 164 | 27 (26) | 187382 | 39 (37) | 211739 | 16.9 |
| cd00761 | Glycosyltransferase A (GT-A) | 66,260 | 156 | 71 (70) | 944727 | 123 (110) | 1048396 | 193.8 |
| cd00768 | Class II aaRS-like core | 37,160 | 211 | 17 | 674454 | 31 | 833691 | 54.3 |
| cd00838 | MPP_superfamily | 33,753 | 131 | 61 | 402297 | 55 (54) | 399553 | 65.1 |
| cd00900 | PH-like | 22,593 | 99 | 81 | 211812 | 99 (98) | 274945 | 52.3 |
| cd01067 | Globin_like | 9,933 | 117 | 4 (1) | 11133 | 26 (25) | 73808 | 4.3 |
| cd01391 | Periplasmic_Binding_Protein_1 | 36,330 | 269 | 142 (140) | 619713 | 68 (65) | 580753 | 169.1 |
| cd01494 | AAT_I (Pyridoxal-PO4-binding) | 114,781 | 170 | 16 | 1086328 | 92 (84) | 2027660 | 249.67 |
| cd01635 | Glycosyltransferase GTB | 44,366 | 229 | 45 | 723443 | 95 (93) | 881414 | 232.7 |
| cd02156 | Class I aaRS-like core √ | 53,605 | 105 | 34 | 522962 | 61 (57) | 698273 | 41.4 |
| cd02883 | Nudix_Hydrolase | 32,046 | 123 | 55 (54) | 321636 | 61 (60) | 367819 | 43.2 |
| cd03128 | GAT-1 (mcBPPS vs pmcBPPS) | 46,514 | 92 | 34 (32) | 319515 | 64 (62) | 388621 | 42.2 |
| cd03440 | hot_dog | 30,162 | 100 | 22 (18) | 141990 | 70 (69) | 345298 | 39.1 |
| cd03873 | Zinc peptidases | 24,455 | 237 | 81 | 596408 | 69 (66) | 590521 | 43.9 |
| cd05466 | Periplasmic_Binding_Protein_2 | 45,287 | 197 | 76 (73) | 523941 | 49 (41) | 411445 | 31.7 |
| cd06587 | Glo_EDI_BRP_like | 36,165 | 112 | 60 (58) | 335848 | 94 (91) | 479522 | 54.8 |
| cd06663 | Biotinyl-lipoyl | 25,013 | 73 | 4 | 53038 | 25 (18) | 66571 | 4.53 |
| cd06846 | Adenylation_DNA_ligase_like | 3,833 | 182 | 14 | 43276 | 20 | 48475 | 4.8 |
| cd08555 | PI-PLCc_GDPD_SF | 8,707 | 179 | 74 (73) | 143201 | 37 (32) | 123075 | 6.9 |
| cd08772 | GH43_62_32_68 (β propellers) | 6,760 | 286 | 28 | 111336 | 51 (50) | 176701 | 30.0 |
| cl09931 | Rossmann fold proteins | 424,764 | 93 | 361 (347) | 4110907 | 145 (130) | 4029120 | 757.2 |
|  | Average | 44,057 | 167.7 | 66.4 | 486696 | 56.9 | 556884 | 83.6 |

  1. ‡ After removing identical sequences and sequences that fail to align with at least 75% of the domain.
  2. * Numbers in parentheses indicate the nodes retained after insignificant nodes were removed by the mcBPPS program.
  3. † The log-likelihood ratio in nats.
  4. § The time (in minutes) is for Steps 2 and 3 of the algorithm only; Step 1 can be parallelized to run in less than 10% of the time shown.
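
The sequence filter described in footnote ‡ can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Hit` record, the `filter_hits` function, and the reading of "aligns with at least 75% of the domain" as the fraction of domain positions covered by the alignment are all assumptions made for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one sequence aligned to a domain model."""
    seq_id: str
    sequence: str          # residues of the matched region
    aligned_columns: int   # number of domain positions the alignment covers


def filter_hits(hits, domain_length, min_coverage=0.75):
    """Keep hits covering >= min_coverage of the domain, dropping exact duplicates.

    Assumption: coverage = aligned_columns / domain_length. The first
    occurrence of each identical sequence is the one that is kept.
    """
    seen = set()
    kept = []
    for hit in hits:
        # Drop sequences that align with too little of the domain.
        if hit.aligned_columns / domain_length < min_coverage:
            continue
        # Drop sequences identical to one already kept.
        if hit.sequence in seen:
            continue
        seen.add(hit.sequence)
        kept.append(hit)
    return kept


# Toy example using the C2 domain length from the table (102 positions).
hits = [
    Hit("a", "AAA", 102),
    Hit("b", "AAA", 102),  # identical to "a", removed
    Hit("c", "CCC", 50),   # 50/102 is below 75%, removed
    Hit("d", "DDD", 77),   # 77/102 is just above 75%, kept
]
print([h.seq_id for h in filter_hits(hits, domain_length=102)])  # ['a', 'd']
```

The sequence counts in the table are the sizes of the sets that remain after a filter of this kind, so they are smaller than the raw number of database hits for each superfamily.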