Table 4 Performance of BioHEL in the CN datasets.

From: Automated Alphabet Reduction for Protein Datasets

| Strategy | Alphabet Size | % Accuracy | #Rules | #expr. atts. |
|---|---|---|---|---|
| Orig | 20+1 | 74.0 ± 0.6 | 34.4 ± 1.7 | 9.0 ± 0.1 |
| MI | 2 | 72.3 ± 0.6• | 21.4 ± 1.0 | 6.2 ± 0.7 |
| MI | 3 | 73.2 ± 0.6 | 30.2 ± 1.7 | 6.7 ± 1.0 |
| MI | 4 | 72.4 ± 0.8• | 26.4 ± 2.1 | 7.1 ± 1.1 |
| MI | 5 | 71.8 ± 0.9• | 23.4 ± 4.8 | 7.8 ± 1.0 |
| RMI | 2 | 72.3 ± 0.6• | 21.4 ± 1.0 | 6.2 ± 0.7 |
| RMI | 3 | 73.2 ± 0.6 | 30.2 ± 1.7 | 6.7 ± 1.0 |
| RMI | 4 | 73.3 ± 0.5 | 30.2 ± 1.5 | 6.1 ± 1.1 |
| RMI | 5 | -- | -- | -- |
| DualRMI | 2 | 72.4 ± 0.5• | 24.0 ± 1.3 | 7.0 ± 1.0 |
| DualRMI | 3 | 73.0 ± 0.6• | 29.1 ± 1.6 | 6.5 ± 1.1 |
| DualRMI | 4 | 73.3 ± 0.6 | 29.7 ± 1.3 | 6.3 ± 1.0 |
| DualRMI | 5 | 73.3 ± 0.5 | 30.4 ± 1.1 | 6.2 ± 1.1 |

  1. Accuracy is the mean test accuracy over the ten cross-validation folds. A • marks reduced datasets whose accuracy is significantly worse than that of the original full amino acid (AA) representation, according to t-tests at the 99% confidence level.
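
As a rough illustration of the kind of comparison the footnote describes, the sketch below parses two accuracy cells from the table and computes a Welch t statistic from their summary values. Several things here are assumptions rather than details given in the source: that the ± value is a standard deviation, that each figure summarizes 10 samples, and that an unpaired Welch test is appropriate. The source does not state which t-test variant or correction was used, so this statistic is illustrative only and is not expected to reproduce the • markers. The helper names `parse_cell` and `welch_t` are hypothetical.

```python
import math


def parse_cell(cell):
    """Split a table cell like '72.3 ± 0.6•' into (mean, spread, flagged)."""
    flagged = cell.endswith("•")
    mean_text, spread_text = cell.rstrip("•").split("±")
    return float(mean_text), float(spread_text), flagged


def welch_t(mean_a, std_a, mean_b, std_b, n_a=10, n_b=10):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Assumes std_a and std_b are sample standard deviations and that each
    mean summarizes n_a / n_b values (10 is an assumption, not a fact
    taken from the table).
    """
    var_a = std_a ** 2 / n_a
    var_b = std_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


# Compare the original representation with the MI strategy at alphabet size 2.
orig_mean, orig_std, _ = parse_cell("74.0 ± 0.6")
mi2_mean, mi2_std, mi2_flagged = parse_cell("72.3 ± 0.6•")
t_stat, df = welch_t(orig_mean, orig_std, mi2_mean, mi2_std)
print(f"t = {t_stat:.2f}, df = {df:.1f}, flagged in table: {mi2_flagged}")
```

Turning the statistic into a significance decision would additionally require a critical value or p-value for the chosen test, which depends on details the table does not provide.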