
Table 6 Performance of BioHEL in the RSA datasets.

From: Automated Alphabet Reduction for Protein Datasets

| Strategy | Alphabet Size | % Accuracy | #Rules | #expr. atts. |
|----------|---------------|--------------|-------------|--------------|
| Orig.    | 20+1          | 70.7 ± 0.4   | 58.6 ± 2.3  | 9.0 ± 0.2    |
| MI       | 2             | 67.6 ± 0.3•  | 52.9 ± 4.2  | 5.8 ± 1.3    |
|          | 3             | 69.4 ± 0.3•  | 54.9 ± 1.1  | 5.4 ± 1.2    |
|          | 4             | 68.9 ± 0.6•  | 54.5 ± 1.3  | 5.9 ± 1.2    |
|          | 5             | 67.9 ± 0.9•  | 53.1 ± 3.8  | 6.8 ± 1.2    |
| RMI      | 2             | 67.6 ± 0.3•  | 52.9 ± 4.2  | 5.8 ± 1.3    |
|          | 3             | 69.7 ± 0.4•  | 56.5 ± 1.3  | 5.5 ± 1.2    |
|          | 4             | 69.9 ± 0.4•  | 57.5 ± 1.2  | 6.3 ± 1.4    |
|          | 5             | --           | --          | --           |
| DualRMI  | 2             | 66.6 ± 0.4•  | 33.4 ± 4.8  | 3.7 ± 0.8    |
|          | 3             | 69.9 ± 0.4•  | 56.7 ± 1.3  | 5.3 ± 1.1    |
|          | 4             | 70.1 ± 0.4   | 58.0 ± 1.2  | 6.0 ± 1.4    |
|          | 5             | 70.3 ± 0.4   | 58.2 ± 1.1  | 6.5 ± 1.6    |

Accuracy is the average test accuracy over the ten cross-validation folds. A • marks reduced datasets whose performance is significantly worse than that of the original full amino acid (AA) representation, according to t-tests at the 99% confidence level.