Table 1 Comparison of manually curated and automatically generated domain hierarchies

From: Automated hierarchical classification of protein domain subfamilies based on functionally-divergent residue signatures

| CDD Ident. | Protein superfamily | Number of seqs | Length | Manually curated nodes* | Manually curated LLR | Automatically generated nodes* | Automatically generated LLR | Time§ (min) |
|---|---|---|---|---|---|---|---|---|
| cd00030 | C2 | 23,452 | 102 | 106 (103) | 236574 | 78 (73) | 223857 | 19.4 |
| cd00138 | PLDc_SF | 16,765 | 119 | 105 (102) | 241766 | 36 (34) | 192876 | 10.0 |
| cd00142 | PI3Kc_like | 2,409 | 219 | 22 | 34129 | 16 | 34563 | 4.5 |
| cd00159 | RhoGAP | 4,815 | 169 | 39 (38) | 55604 | 32 | 53540 | 7.97 |
| cd00173 | SH2 | 5,917 | 79 | 111 (101) | 49274 | 39 | 40075 | 3.5 |
| cd00180 | Protein kinases | 104,912 | 215 | 280 (260) | 1378273 | 107 (104) | 1536991 | 241.0 |
| cd00229 | SGNH_hydrolase | 14,635 | 187 | 30 | 180667 | 29 | 183822 | 14.95 |
| cd00306 | S8/S53 peptidase | 10,960 | 241 | 36 | 161685 | 45 (44) | 173693 | 30.90 |
| cd00368 | Molybdopterin-Binding | 9,540 | 374 | 26 | 177569 | 44 | 209704 | 39.3 |
| cd00397 | DNA_BRE_C | 25,824 | 164 | 27 (26) | 187382 | 39 (37) | 211739 | 16.9 |
| cd00761 | Glycosyltransferase A (GT-A) | 66,260 | 156 | 71 (70) | 944727 | 123 (110) | 1048396 | 193.8 |
| cd00768 | Class II aaRS-like core | 37,160 | 211 | 17 | 674454 | 31 | 833691 | 54.3 |
| cd00838 | MPP_superfamily | 33,753 | 131 | 61 | 402297 | 55 (54) | 399553 | 65.1 |
| cd00900 | PH-like | 22,593 | 99 | 81 | 211812 | 99 (98) | 274945 | 52.3 |
| cd01067 | Globin_like | 9,933 | 117 | 4 (1) | 11133 | 26 (25) | 73808 | 4.3 |
| cd01391 | Periplasmic_Binding_Protein_1 | 36,330 | 269 | 142 (140) | 619713 | 68 (65) | 580753 | 169.1 |
| cd01494 | AAT_I (Pyridoxal-PO4-binding) | 114,781 | 170 | 16 | 1086328 | 92 (84) | 2027660 | 249.67 |
| cd01635 | Glycosyltransferase GTB | 44,366 | 229 | 45 | 723443 | 95 (93) | 881414 | 232.7 |
| cd02156 | Class I aaRS-like core √ | 53,605 | 105 | 34 | 522962 | 61 (57) | 698273 | 41.4 |
| cd02883 | Nudix_Hydrolase | 32,046 | 123 | 55 (54) | 321636 | 61 (60) | 367819 | 43.2 |
| cd03128 | GAT-1 (mcBPPS vs pmcBPPS) | 46,514 | 92 | 34 (32) | 319515 | 64 (62) | 388621 | 42.2 |
| cd03440 | hot_dog | 30,162 | 100 | 22 (18) | 141990 | 70 (69) | 345298 | 39.1 |
| cd03873 | Zinc peptidases | 24,455 | 237 | 81 | 596408 | 69 (66) | 590521 | 43.9 |
| cd05466 | Periplasmic_Binding_Protein_2 | 45,287 | 197 | 76 (73) | 523941 | 49 (41) | 411445 | 31.7 |
| cd06587 | Glo_EDI_BRP_like | 36,165 | 112 | 60 (58) | 335848 | 94 (91) | 479522 | 54.8 |
| cd06663 | Biotinyl-lipoyl | 25,013 | 73 | 4 | 53038 | 25 (18) | 66571 | 4.53 |
| cd06846 | Adenylation_DNA_ligase_like | 3,833 | 182 | 14 | 43276 | 20 | 48475 | 4.8 |
| cd08555 | PI-PLCc_GDPD_SF | 8,707 | 179 | 74 (73) | 143201 | 37 (32) | 123075 | 6.9 |
| cd08772 | GH43_62_32_68 (β propellers) | 6,760 | 286 | 28 | 111336 | 51 (50) | 176701 | 30.0 |
| cl09931 | Rossmann fold proteins | 424,764 | 93 | 361 (347) | 4110907 | 145 (130) | 4029120 | 757.2 |
| | Average | 44,057 | 167.7 | 66.4 | 486696 | 56.9 | 556884 | 83.6 |
  1. Number of seqs: counted after removing identical sequences and sequences that fail to align with at least 75% of the domain.
  2. * Numbers in parentheses indicate the nodes retained after insignificant nodes were removed by the mcBPPS program.
  3. LLR: the log-likelihood ratio, in nats.
  4. § The time (in minutes) is for Steps 2 and 3 of the algorithm only; Step 1 can be parallelized to run in less than 10% of the time shown.