Table 6 Performance of BioHEL on the RSA datasets.

From: Automated Alphabet Reduction for Protein Datasets

Strategy  Alphabet size  Accuracy (%)  #Rules      #Expressed atts.
Orig.     20+1           70.7 ± 0.4    58.6 ± 2.3  9.0 ± 0.2
MI        2              67.6 ± 0.3•   52.9 ± 4.2  5.8 ± 1.3
          3              69.4 ± 0.3•   54.9 ± 1.1  5.4 ± 1.2
          4              68.9 ± 0.6•   54.5 ± 1.3  5.9 ± 1.2
          5              67.9 ± 0.9•   53.1 ± 3.8  6.8 ± 1.2
RMI       2              67.6 ± 0.3•   52.9 ± 4.2  5.8 ± 1.3
          3              69.7 ± 0.4•   56.5 ± 1.3  5.5 ± 1.2
          4              69.9 ± 0.4•   57.5 ± 1.2  6.3 ± 1.4
          5              --            --          --
DualRMI   2              66.6 ± 0.4•   33.4 ± 4.8  3.7 ± 0.8
          3              69.9 ± 0.4•   56.7 ± 1.3  5.3 ± 1.1
          4              70.1 ± 0.4    58.0 ± 1.2  6.0 ± 1.4
          5              70.3 ± 0.4    58.2 ± 1.1  6.5 ± 1.6
Accuracy is the average test accuracy over the ten cross-validation folds. A • marks reduced datasets whose performance is significantly worse than that of the original full amino acid (AA) representation, according to t-tests at the 99% confidence level.
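The footnote's procedure (average over ten folds, then a t-test at the 99% confidence level) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that the ± values are sample standard deviations over the ten folds and that the test is a paired, two-sided Student's t-test on per-fold accuracies. The source specifies neither. The fold accuracies in the usage lines are hypothetical values made up for illustration, not results from the paper.

```python
import math
import statistics

# Assumption: ten folds, paired test, so 10 - 1 = 9 degrees of freedom.
# Two-sided critical value of Student's t at the 99% confidence level
# (alpha = 0.01) for 9 degrees of freedom, from standard t tables.
T_CRIT_DF9_99 = 3.250


def summarize(fold_accuracies):
    """Return (mean, sample standard deviation) of per-fold accuracies."""
    return statistics.mean(fold_accuracies), statistics.stdev(fold_accuracies)


def paired_t_statistic(reduced, original):
    """Paired t statistic for the per-fold differences reduced - original."""
    if len(reduced) != len(original):
        raise ValueError("both runs must be evaluated on the same folds")
    diffs = [r - o for r, o in zip(reduced, original)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    if sd_diff == 0:
        # Identical differences on every fold: the statistic is unbounded.
        return math.copysign(math.inf, mean_diff) if mean_diff else 0.0
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))


def significantly_worse(reduced, original, t_crit=T_CRIT_DF9_99):
    """True if the reduced alphabet scores significantly below the original.

    The default t_crit is only valid for exactly ten paired folds.
    """
    return paired_t_statistic(reduced, original) < -t_crit


# Hypothetical per-fold test accuracies (%), for illustration only.
original = [70.5, 71.0, 70.2, 70.9, 70.8, 70.4, 71.1, 70.6, 70.3, 71.2]
reduced = [67.4, 67.9, 67.2, 67.8, 67.5, 67.3, 68.0, 67.6, 67.1, 67.9]

if __name__ == "__main__":
    mean, sd = summarize(original)
    print(f"{mean:.1f} ± {sd:.1f}")
    print(significantly_worse(reduced, original))  # -> True
```

A paired test is used here because both representations are evaluated on the same ten folds, so per-fold differences remove fold-to-fold variation. Whether the original study paired the folds is an assumption.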