
Table 4 Performance of BioHEL in the CN datasets.

From: Automated Alphabet Reduction for Protein Datasets

Strategy  Alphabet size  Accuracy (%)  #Rules      #expr. atts.
Orig      20+1           74.0 ± 0.6    34.4 ± 1.7  9.0 ± 0.1
MI        2              72.3 ± 0.6•   21.4 ± 1.0  6.2 ± 0.7
          3              73.2 ± 0.6    30.2 ± 1.7  6.7 ± 1.0
          4              72.4 ± 0.8•   26.4 ± 2.1  7.1 ± 1.1
          5              71.8 ± 0.9•   23.4 ± 4.8  7.8 ± 1.0
RMI       2              72.3 ± 0.6•   21.4 ± 1.0  6.2 ± 0.7
          3              73.2 ± 0.6    30.2 ± 1.7  6.7 ± 1.0
          4              73.3 ± 0.5    30.2 ± 1.5  6.1 ± 1.1
          5              --            --          --
DualRMI   2              72.4 ± 0.5•   24.0 ± 1.3  7.0 ± 1.0
          3              73.0 ± 0.6•   29.1 ± 1.6  6.5 ± 1.1
          4              73.3 ± 0.6    29.7 ± 1.3  6.3 ± 1.0
          5              73.3 ± 0.5    30.4 ± 1.1  6.2 ± 1.1
1. Accuracy is the average test accuracy over the ten cross-validation folds. A • marks reduced datasets whose performance is significantly worse than that of the original full amino-acid (AA) representation according to a t-test at the 99% confidence level.
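The significance marker described in the footnote can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a paired, two-sided t-test on the per-fold test accuracies of the same ten cross-validation folds (the footnote does not say whether the test was paired), and the function names `paired_t_statistic` and `significantly_worse` are hypothetical.

```python
import math
from statistics import mean, stdev

# Two-sided critical value of Student's t for 9 degrees of freedom at the 99%
# confidence level (alpha = 0.01). Only valid for exactly ten paired folds.
T_CRIT_DF9_99 = 3.2498


def paired_t_statistic(orig_acc, reduced_acc):
    """Paired t statistic of the per-fold differences (original - reduced)."""
    if len(orig_acc) != len(reduced_acc) or len(orig_acc) < 2:
        raise ValueError("need two equal-length lists with at least two folds")
    diffs = [o - r for o, r in zip(orig_acc, reduced_acc)]
    m = mean(diffs)
    sd = stdev(diffs)
    if sd == 0:
        # Zero variance across folds makes the statistic degenerate:
        # report +/-inf for a consistent shift and 0 for no shift at all.
        return math.copysign(math.inf, m) if m != 0 else 0.0
    return m / (sd / math.sqrt(len(diffs)))


def significantly_worse(orig_acc, reduced_acc):
    """True if the reduced alphabet would receive a bullet under these assumptions."""
    if len(orig_acc) != 10:
        raise ValueError("critical value assumes ten cross-validation folds")
    # Two-sided test, but only a drop in accuracy (positive t) is flagged.
    return paired_t_statistic(orig_acc, reduced_acc) > T_CRIT_DF9_99
```

The inputs would be the ten per-fold accuracies behind each row of the table; those per-fold values are not reported here, so the sketch cannot reproduce the bullets directly.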