Table 3. Dataset training statistics and prediction accuracies of six protein functional families. DS refers to descriptor set, where D1 = amino acid composition; D2 = dipeptide composition; D3 = Moreau-Broto autocorrelation; D4 = Moran autocorrelation; D5 = Geary autocorrelation; D6 = composition, transition and distribution descriptors; D7 = quasi sequence order; D8 = pseudo amino acid composition; D9 = combination of D1+D2; and D10 = combination of D1-D8. Predicted results are given as TP (true positive), FN (false negative), TN (true negative), FP (false positive), Sen (sensitivity), Spec (specificity), Q (overall accuracy) and MCC (Matthews correlation coefficient).

From: Efficacy of different protein descriptors in predicting protein functional families

Columns, in row order (P = positive samples, N = negative samples):
Protein family; Descriptor set (DS); Training set: P, N; Testing set: TP, FN (P), TN, FP (N); Independent evaluation set: TP, FN, Sen(%) (P), TN, FP, Spec(%) (N), Q(%), MCC
EC2.4 D1 1249 2120 1154 1 9065 12 724 176 80.4 3244 202 94.1 91.3 0.74
  D2 1319 2120 1080 5 8806 1 646 154 82.9 3349 97 97.2 94.1 0.80
  D3 1105 1756 1295 4 9166 5 768 132 85.3 3394 52 98.5 95.8 0.87
  D4 1239 2221 1161 4 8701 5 756 144 84.0 3365 81 97.7 94.8 0.84
  D5 1242 2223 1160 2 8690 14 753 147 83.6 3391 55 98.4 95.4 0.85
  D6 1214 2077 1145 45 8846 4 741 159 82.3 3383 63 98.2 94.9 0.84
  D7 1293 2624 1072 39 8295 8 696 204 77.3 3270 176 94.9 91.3 0.73
  D8 1226 3008 1177 1 7918 1 794 106 88.2 3387 59 98.3 96.2 0.88
  D9 1275 2747 1129 0 8177 3 782 118 86.9 3367 79 97.7 95.5 0.86
  D10 1228 3254 1176 0 7672 1 798 102 88.7 3397 49 98.6 96.5 0.89
GPCR D1 1590 7458 1847 1 14166 3 505 17 96.7 6735 58 99.1 99.0 0.93
  D2 564 711 1728 3 14121 5 510 12 97.7 6737 56 99.2 99.1 0.93
  D3 1169 4628 1122 4 10208 1 507 15 97.1 6737 56 99.2 99.0 0.93
  D4 1257 4474 1037 1 10363 0 499 23 95.6 6745 48 99.3 99.0 0.93
  D5 1290 4724 997 8 10113 0 494 28 94.6 6734 59 99.1 98.8 0.91
  D6 757 2060 1536 2 12777 0 503 19 96.3 6742 51 99.2 99.0 0.93
  D7 812 2950 1482 1 11887 0 495 27 94.8 6696 97 98.6 98.3 0.88
  D8 653 2171 1644 0 12550 1 501 21 96.0 6769 24 99.7 99.4 0.95
  D9 1590 7458 693 12 7322 57 512 10 98.1 6735 58 99.1 99.1 0.93
  D10 672 2454 1625 0 12268 0 502 20 96.2 6757 36 99.5 99.2 0.94
TC8.A D1 118 2858 49 0 13121 0 36 27 57.1 1843 2 99.9 98.5 0.73
  D2 116 1100 50 0 14824 0 41 22 65.1 1843 2 99.9 98.7 0.78
  D3 94 7962 53 0 14501 0 42 21 66.7 1842 3 99.8 98.7 0.78
  D4 94 7962 47 0 11250 0 37 26 58.7 1843 2 99.9 98.5 0.74
  D5 94 7962 47 0 11137 0 37 26 58.7 1843 2 99.9 98.5 0.74
  D6 94 7962 64 0 15283 0 44 19 69.8 1843 2 99.9 98.9 0.81
  D7 94 7962 59 0 15045 0 43 20 68.3 1843 2 99.9 98.9 0.80
  D8 103 943 63 0 14981 0 48 15 76.2 1843 2 99.9 99.1 0.85
  D9 114 810 52 0 15114 0 41 22 65.1 1843 2 99.9 98.7 0.78
  D10 102 1068 64 0 14856 0 48 15 76.2 1843 2 99.9 99.1 0.85
Chlorophyll D1 356 7928 166 0 14297 0 182 128 58.7 1587 11 99.3 92.7 0.71
  D2 440 934 248 1 7927 1 228 82 73.6 1595 3 99.8 95.6 0.83
  D3 425 603 264 0 15253 0 246 64 79.4 1594 4 99.8 96.4 0.86
  D4 415 574 273 1 15282 0 247 65 79.7 1597 1 99.9 96.6 0.87
  D5 429 615 259 1 15240 1 233 77 75.2 1597 1 99.9 95.9 0.84
  D6 482 946 202 5 14910 0 205 105 66.1 1597 1 99.9 94.4 0.79
  D7 394 3337 210 85 12517 2 178 132 57.4 1597 1 99.9 93.0 0.73
  D8 371 1421 317 1 14435 0 255 55 82.3 1593 5 99.7 96.9 0.88
  D9 399 1273 289 1 14582 1 249 61 80.3 1591 7 99.6 96.4 0.86
  D10 381 1753 307 1 14102 1 251 59 81.0 1594 4 99.8 96.7 0.88
Lipid synthesis D1 849 2026 705 3 8229 7 470 165 74.0 1218 57 95.5 88.4 0.73
  D2 927 2037 629 1 8225 0 512 123 80.6 1259 16 98.6 92.7 0.84
  D3 898 2968 659 0 7294 0 509 126 80.2 1271 4 99.7 93.2 0.84
  D4 968 3227 588 1 7035 0 493 142 77.6 1273 2 99.8 92.5 0.83
  D5 970 3280 586 1 6982 0 491 144 77.3 1260 15 98.8 91.7 0.81
  D6 874 2112 681 2 8149 1 525 110 82.7 1268 7 99.5 93.9 0.86
  D7 863 2415 692 2 7845 2 512 123 80.6 1271 4 99.7 93.4 0.85
  D8 907 1608 615 0 4488 0 498 137 78.4 1268 7 99.5 92.5 0.83
  D9 815 1613 740 2 8638 11 525 110 82.7 1248 27 97.9 92.8 0.84
  D10 865 1640 657 0 4456 0 531 104 83.6 1268 7 99.5 94.2 0.87
rRNA binding D1 548 579 3390 6 9598 22 1824 87 95.5 3511 60 98.3 97.3 0.94
  D2 1133 1225 2811 0 8974 0 1844 67 96.5 3519 52 98.5 97.8 0.95
  D3 1126 1638 2816 2 8560 1 1812 99 94.8 3535 36 99.0 97.5 0.95
  D4 1337 1958 2697 0 8241 0 1783 128 93.3 3484 87 97.6 96.1 0.91
  D5 1372 1976 2572 0 8223 0 1784 127 93.4 3479 92 97.4 96.0 0.91
  D6 921 1208 2971 52 8991 0 1824 87 95.5 3541 30 99.2 97.9 0.95
  D7 878 2743 3040 26 7442 14 1808 103 94.6 3481 90 97.5 96.5 0.92
  D8 810 2245 3143 0 7954 0 1849 62 96.8 3541 30 99.2 98.3 0.96
  D9 810 972 3075 3 9182 2 1848 63 96.7 3526 45 98.7 98.0 0.96
  D10 900 2600 3044 0 7599 0 1858 53 97.2 3547 24 99.3 98.6 0.97