Table 3 Benefit from feature selection.

From: A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data

  

| Data set | Subset | univariate selection: PLS | univariate selection: PC | univariate selection: RF | multivariate selection (Gini importance): PLS | multivariate selection (Gini importance): PC | multivariate selection (Gini importance): RF | multivariate selection (PLS/PC): PLS | multivariate selection (PLS/PC): PC | multivariate selection (PLS/PC): RF |
|---|---|---|---|---|---|---|---|---|---|---|
| MIR BSE | orig | 10.0 (6) | 10.0 (4) | 1.5 (7) | 10.0 (6) | 10.0 (6) | 2.0 (6) | 3.0 (13) | 0.9 (80) | 0.5 (51) |
| | binned | 6.0 (5) | 7.0 (5) | 2.2 (9) | 10.0 (9) | 10.0 (6) | 3.0 (9) | 10.0 (5) | 9.0 (4) | 0.4 (51) |
| MIR wine | French | 4.0 (3) | 3.0 (2) | 0.7 (64) | 5.0 (3) | 3.0 (1) | 3.0 (26) | 0.0 (100) | 0.6 (33) | 0.0 (64) |
| | grape | 8.0 (2) | 8.0 (21) | 0.6 (64) | 10.0 (4) | 10.0 (5) | 2.0 (11) | 4.0 (1) | 6.0 (1) | 0.0 (64) |
| NMR tumor | all | 1.0 (80) | 0.5 (11) | 4.0 (6) | 4.0 (51) | 0.0 (100) | 2.0 (6) | 0.8 (11) | 0.0 (100) | 0.3 (80) |
| | center | 2.0 (7) | 0.4 (6) | 0.8 (86) | 2.0 (26) | 0.2 (64) | 0.7 (41) | 0.0 (100) | 0.7 (13) | 0.3 (80) |
| NMR candida | 1 | 0.5 (80) | 0.0 (80) | 0.8 (80) | 0.0 (100) | 0.0 (100) | 0.8 (41) | 0.4 (64) | 0.0 (100) | 0.4 (9) |
| | 2 | 0.4 (80) | 0.9 (64) | 0.0 (80) | 2.0 (64) | 0.4 (26) | 0.0 (100) | 2.0 (21) | 1.0 (21) | 0.4 (41) |
| | 3 | 0.0 (100) | 0.0 (100) | 0.0 (80) | 2.0 (80) | 0.6 (80) | 2.0 (26) | 2.0 (80) | 0.0 (100) | 0.7 (41) |
| | 4 | 0.8 (80) | 0.0 (100) | 0.0 (80) | 2.0 (80) | 1.0 (80) | 2.0 (64) | 0.7 (33) | 0.0 (100) | 0.3 (32) |
| | 5 | 0.0 (100) | 0.0 (100) | 0.7 (80) | 0.0 (100) | 0.4 (80) | 1.0 (64) | 0.7 (64) | 0.7 (80) | 0.4 (21) |

Each entry gives the significance of the accuracy improvement obtained with feature selection, compared with using the full set of features, followed in parentheses by the percentage of original features used in the classification that has maximum accuracy. The significance is specified by -log10(p), where p is the p-value of a paired Wilcoxon test on the 100 hold-outs of the cross-validation (see text). For comparison, -log10(0.05) = 1.3 and -log10(0.001) = 3; the value of 6.0 reported for MIR BSE binned (second row, first column) indicates a highly significant improvement in classification accuracy, corresponding to a p-value of 10^-6.
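
The significance score described in the table note can be illustrated with a short calculation. The following is a minimal Python sketch, not the authors' code: it assumes a two-sided paired Wilcoxon signed-rank test using the normal approximation without continuity correction (the note does not say which variant was used), and the function names `wilcoxon_signed_rank_p` and `significance_score` are hypothetical.

```python
import math


def wilcoxon_signed_rank_p(a, b):
    """Two-sided p-value of a paired Wilcoxon signed-rank test.

    Assumption: normal approximation with tie correction and no
    continuity correction; pairs with zero difference are dropped.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 1.0  # no differences at all: no evidence of a change

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    # Sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def significance_score(p):
    """Convert a p-value to the -log10(p) scale used in the table."""
    return math.inf if p <= 0 else -math.log10(p)


# The reference points quoted in the table note.
print(round(significance_score(0.05), 1))   # 1.3
print(round(significance_score(0.001), 1))  # 3.0
```

In use, `a` and `b` would hold the accuracies on the 100 hold-outs with and without feature selection, and `significance_score(wilcoxon_signed_rank_p(a, b))` would give a value on the same scale as the table entries.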