Table 4 Dataset statistics and prediction accuracies after homologous sequence removal (HSR) at 90% and 70% identity. DS refers to the descriptor set, where D1 = amino acid composition; D2 = dipeptide composition; D3 = Moreau-Broto autocorrelation; D4 = Moran autocorrelation; D5 = Geary autocorrelation; D6 = composition, transition and distribution descriptors; D7 = quasi sequence order; D8 = pseudo amino acid composition; D9 = combination of D1+D2; and D10 = combination of D1-D8. Prediction results are given as TP (true positives), FN (false negatives), TN (true negatives), FP (false positives), Sen (sensitivity), Spec (specificity), Q (overall accuracy) and MCC (Matthews correlation coefficient).

From: Efficacy of different protein descriptors in predicting protein functional families

Independent evaluation set (P = positive samples: TP, FN, Sen; N = negative samples: TN, FP, Spec)
Protein family % HSR DS TP FN Sen(%) TN FP Spec(%) Q(%) MCC
EC2.4 90 D1 552 250 68.8 3235 201 94.2 89.4 0.65
   D2 626 176 78.1 3339 97 97.2 93.6 0.78
   D3 609 193 75.9 3384 52 98.5 94.2 0.80
   D4 603 199 75.2 3355 81 97.6 93.4 0.78
   D5 591 211 73.7 3381 55 98.4 93.7 0.79
   D6 501 301 62.5 3374 62 98.2 91.4 0.70
   D7 545 257 68.0 3261 175 94.9 89.8 0.66
   D8 666 136 83.0 3375 61 98.2 95.4 0.84
   D9 630 172 78.6 3357 79 97.7 94.1 0.80
   D10 670 132 83.5 3388 48 98.6 95.8 0.86
  70 D1 459 223 67.3 3193 199 94.1 89.6 0.62
   D2 516 166 75.7 3296 96 97.2 93.6 0.76
   D3 503 179 73.8 3341 51 98.5 94.4 0.78
   D4 495 187 72.6 3311 81 97.6 93.4 0.75
   D5 484 198 71.0 3339 53 98.4 93.8 0.77
   D6 399 283 58.5 3330 62 98.2 91.5 0.67
   D7 452 230 66.3 3218 174 94.9 90.1 0.63
   D8 551 131 80.8 3331 61 98.2 95.3 0.83
   D9 520 162 76.3 3314 78 97.7 94.1 0.78
   D10 554 128 81.2 3344 48 98.6 95.7 0.84
GPCR 90 D1 391 13 96.8 6724 58 99.1 99.0 0.91
   D2 395 9 97.8 6744 38 99.4 99.4 0.94
   D3 393 11 97.3 6726 56 99.2 99.1 0.92
   D4 386 18 95.5 6734 48 99.3 99.1 0.92
   D5 381 23 94.3 6723 59 99.1 98.9 0.90
   D6 391 13 96.8 6731 51 99.3 99.1 0.92
   D7 382 22 94.6 6685 97 98.6 98.3 0.86
   D8 387 17 95.8 6758 24 99.7 99.4 0.95
   D9 391 13 96.8 6752 30 99.6 99.4 0.94
   D10 388 16 96.0 6762 20 99.7 99.5 0.95
  70 D1 307 8 97.5 6695 58 99.1 99.1 0.90
   D2 309 6 98.1 6715 38 99.4 99.4 0.93
   D3 306 9 97.1 6697 56 99.2 99.1 0.90
   D4 301 14 95.6 6705 48 99.3 99.1 0.90
   D5 298 17 94.6 6694 59 99.1 98.9 0.88
   D6 307 8 97.5 6702 51 99.2 99.2 0.91
   D7 296 19 94.0 6656 97 98.6 98.4 0.83
   D8 301 14 95.6 6729 24 99.6 99.5 0.94
   D9 307 8 97.5 6723 30 99.6 99.5 0.94
   D10 302 13 95.9 6733 20 99.7 99.5 0.95
TC8.A 90 D1 28 27 50.9 1846 2 99.9 98.5 0.68
   D2 33 22 60.0 1846 2 99.9 98.7 0.75
   D3 34 21 61.8 1845 3 99.8 98.7 0.75
   D4 29 26 52.7 1845 3 99.8 98.8 0.75
   D5 29 26 52.7 1845 3 99.8 98.8 0.75
   D6 36 19 65.5 1846 2 99.9 98.9 0.78
   D7 35 20 63.6 1845 3 99.8 98.8 0.76
   D8 40 15 72.7 1845 3 99.8 99.2 0.82
   D9 33 22 60.0 1846 2 99.9 98.7 0.75
   D10 40 15 72.7 1845 3 99.8 99.2 0.82
  70 D1 25 24 51.0 1828 2 99.9 98.6 0.68
   D2 29 20 59.2 1828 2 99.9 98.8 0.74
   D3 29 20 59.2 1827 3 99.8 98.8 0.73
   D4 26 23 53.1 1828 2 99.9 98.7 0.70
   D5 26 23 53.1 1828 2 99.9 98.7 0.70
   D6 33 16 67.3 1828 2 99.9 99.0 0.79
   D7 30 19 61.2 1827 3 99.8 98.8 0.74
   D8 36 13 73.5 1827 3 99.8 99.2 0.82
   D9 29 20 59.2 1828 2 99.9 98.8 0.74
   D10 36 13 73.5 1827 3 99.8 99.2 0.82
Chlorophyll 90 D1 159 127 55.6 1594 8 99.5 92.9 0.70
   D2 205 81 71.7 1598 4 99.8 95.5 0.82
   D3 224 62 78.3 1599 3 99.8 96.6 0.86
   D4 222 64 77.6 1599 3 99.8 96.5 0.86
   D5 211 75 73.8 1598 4 99.8 95.8 0.83
   D6 182 104 63.6 1594 8 99.5 94.1 0.75
   D7 159 127 55.6 1593 9 99.4 92.8 0.69
   D8 233 53 81.5 1595 7 99.6 96.8 0.87
   D9 224 62 78.3 1594 8 99.5 96.3 0.85
   D10 229 57 80.1 1597 5 99.7 96.7 0.87
  70 D1 113 118 48.9 1578 8 99.5 93.1 0.65
   D2 155 76 67.1 1582 4 99.8 95.6 0.79
   D3 171 60 74.0 1583 3 99.8 96.5 0.84
   D4 171 60 74.0 1583 3 99.8 96.5 0.84
   D5 161 70 69.7 1582 4 99.8 95.9 0.81
   D6 137 94 59.3 1578 8 99.5 94.4 0.72
   D7 114 117 49.4 1575 11 99.3 93.0 0.64
   D8 182 49 78.8 1579 7 99.6 96.9 0.85
   D9 172 59 74.5 1578 8 99.5 96.3 0.82
   D10 178 53 77.1 1581 5 99.7 96.8 0.85
Lipid synthesis 90 D1 403 149 73.0 1213 59 95.4 88.6 0.72
   D2 431 121 78.1 1256 16 98.7 92.5 0.81
   D3 436 116 79.0 1268 4 99.7 93.4 0.84
   D4 421 131 76.3 1270 2 99.8 92.7 0.83
   D5 416 136 75.4 1270 2 99.8 92.4 0.82
   D6 449 103 81.3 1270 2 99.8 94.2 0.86
   D7 435 117 78.8 1269 3 99.8 93.4 0.84
   D8 423 129 76.6 1265 7 99.5 92.5 0.82
   D9 449 103 81.3 1245 27 97.9 92.9 0.83
   D10 454 98 82.3 1265 7 99.5 94.2 0.86
  70 D1 316 138 69.6 1205 59 95.3 88.5 0.69
   D2 343 111 75.6 1248 16 98.7 92.6 0.81
   D3 340 114 74.9 1260 4 99.7 93.1 0.82
   D4 330 124 72.7 1262 2 99.8 92.7 0.81
   D5 328 126 72.3 1260 4 99.7 92.4 0.80
   D6 358 96 78.9 1244 20 98.4 93.3 0.82
   D7 342 112 75.3 1257 7 99.5 93.1 0.82
   D8 331 123 72.9 1257 7 99.4 92.4 0.80
   D9 360 94 79.3 1237 27 97.9 93.0 0.81
   D10 360 94 79.3 1257 7 99.5 94.1 0.85
rRNA binding 90 D1 1407 91 93.9 3502 59 98.3 97.0 0.93
   D2 1437 61 95.9 3510 51 98.6 97.8 0.95
   D3 1403 95 93.7 3529 32 99.1 97.5 0.93
   D4 1347 151 89.9 3491 70 98.0 95.6 0.89
   D5 1347 151 89.9 3533 28 99.2 96.5 0.91
   D6 1451 47 96.9 3537 24 99.3 98.6 0.97
   D7 1358 140 90.7 3429 132 96.3 94.6 0.87
   D8 1442 56 96.3 3531 30 99.2 98.3 0.96
   D9 1436 62 95.9 3518 43 98.8 97.9 0.95
   D10 1449 49 96.7 3537 24 99.3 98.6 0.97
  70 D1 924 83 91.8 3454 59 98.3 96.9 0.91
   D2 952 55 94.5 3463 50 98.6 97.7 0.93
   D3 920 87 91.4 3483 30 99.2 97.4 0.92
   D4 907 100 90.1 3444 69 98.0 96.3 0.89
   D5 908 99 90.2 3485 28 99.2 97.2 0.92
   D6 963 44 95.6 3493 20 99.4 98.6 0.96
   D7 917 90 91.1 3382 131 96.3 95.1 0.86
   D8 954 53 94.7 3484 29 99.2 98.2 0.95
   D9 950 57 94.3 3471 42 98.8 97.8 0.94
   D10 960 47 95.3 3490 23 99.4 98.5 0.96