| | | No selection | | | Univariate selection | | | Multivariate selection (Gini importance) | | | Multivariate selection (PLS/PC) | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | PLS | PC | RF | PLS | PC | RF | PLS | PC | RF | PLS | PC | RF |
| MIR BSE | orig | 66.8 | 62.9 | 74.9 | 80.7 | 80.7 | 76.7 | 84.1 | 83.2 | 77.4 | 68 | 63.5 | 75.5 |
| | | - | - | - | *** | *** | * | *** | *** | ** | ** | | |
| | binned | 72.7 | 73.4 | 75.3 | 80.4 | 80.7 | 76.6 | 86.8 | 85.8 | 77.3 | 85 | 82.1 | 75.6 |
| | | - | - | - | *** | *** | ** | *** | *** | ** | *** | *** | |
| MIR wine | French | 69.5 | 69.3 | 79.3 | 83.7 | 83.5 | 82.2 | 82.4 | 81 | 81.2 | 66.9 | 70.0 | 79.8 |
| | | - | - | - | *** | ** | | *** | ** | * | | | |
| | grape | 77 | 71.4 | 90.2 | 98.1 | 98.7 | 90.3 | 98.4 | 98.4 | 94.2 | 91.7 | 88.5 | 90.4 |
| | | - | - | - | *** | *** | | *** | *** | ** | *** | *** | |
| NMR tumor | all | 88.8 | 89 | 89 | 89.3 | 89.3 | 90.5 | 90.0 | 89.6 | 89.6 | 89.3 | 89.2 | 89.1 |
| | | - | - | - | * | | *** | ** | | * | | | |
| | center | 71.6 | 72.3 | 73.1 | 73.9 | 72.7 | 73.9 | 72.6 | 72.0 | 74.3 | 71.8 | 72.7 | 73.3 |
| | | - | - | - | ** | | | * | | | | | |
| NMR candida | 1 | 94.9 | 94.6 | 90.3 | 95.1 | 94.9 | 90.6 | 95.6 | 95.3 | 90.3 | 95.3 | 95.2 | 90.7 |
| | | - | - | - | | | | | | | | | |
| | 2 | 95.6 | 95.2 | 93.2 | 95.8 | 95.7 | 93.7 | 95.6 | 95.5 | 93.5 | 96.0 | 95.9 | 94.1 |
| | | - | - | - | | | | | | | * | | |
| | 3 | 93.7 | 93.8 | 89.7 | 93.7 | 93.8 | 89.9 | 94.2 | 93.8 | 89.9 | 94.0 | 94.0 | 90.2 |
| | | - | - | - | | | | * | | * | * | | |
| | 4 | 86.9 | 87.3 | 83.9 | 87.8 | 87.3 | 84.0 | 88.2 | 87.6 | 84.3 | 87.7 | 87.6 | 84.1 |
| | | - | - | - | | | | * | | | | | |
| | 5 | 92.7 | 92.6 | 89.2 | 92.7 | 92.6 | 89.9 | 92.5 | 92.5 | 90.3 | 92.8 | 92.6 | 90.0 |
| | | - | - | - | | | | | | | | | |
The best classification results on each data set are underlined. Approaches which do not differ significantly from the optimal result (at a 0.05 significance level) are set in bold type (see methods section). Significant differences in the performance of a method as compared to the same classifier without feature selection are marked with asterisks (* p-value < 0.05, ** p-value < 0.01, *** p-value < 0.001). The MIR data of this table benefit significantly from feature selection, whereas the NMR data do so only to a minor extent. Overall, feature selection by means of Gini importance in conjunction with a PLS classifier was successful in all cases and superior to the "native" classifier of Gini importance, the random forest, in all but one case.
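The last claim of the caption is simple arithmetic over the table, so it can be checked directly. The sketch below is an illustration under stated assumptions, not the authors' analysis code. It transcribes three accuracy columns from the table (no selection RF, Gini importance PLS, Gini importance RF) and lists the data sets where a random forest is at least as accurate as Gini importance + PLS. It also adds a helper that maps a p-value to the caption's asterisk code. The names `RESULTS`, `significance_stars` and `cases_where_rf_wins` are hypothetical. The helper only encodes the thresholds and does not reproduce the significance test described in the methods section.

```python
from typing import Dict, List, Tuple

# Accuracies (%) transcribed from the table above, per (data set, variant):
# (no selection RF, Gini importance + PLS, Gini importance + RF).
RESULTS: Dict[Tuple[str, str], Tuple[float, float, float]] = {
    ("MIR BSE", "orig"): (74.9, 84.1, 77.4),
    ("MIR BSE", "binned"): (75.3, 86.8, 77.3),
    ("MIR wine", "French"): (79.3, 82.4, 81.2),
    ("MIR wine", "grape"): (90.2, 98.4, 94.2),
    ("NMR tumor", "all"): (89.0, 90.0, 89.6),
    ("NMR tumor", "center"): (73.1, 72.6, 74.3),
    ("NMR candida", "1"): (90.3, 95.6, 90.3),
    ("NMR candida", "2"): (93.2, 95.6, 93.5),
    ("NMR candida", "3"): (89.7, 94.2, 89.9),
    ("NMR candida", "4"): (83.9, 88.2, 84.3),
    ("NMR candida", "5"): (89.2, 92.5, 90.3),
}


def significance_stars(p_value: float) -> str:
    """Map a p-value to the asterisk code used in the table caption."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


def cases_where_rf_wins(rf_index: int) -> List[Tuple[str, str]]:
    """Return data sets where the chosen RF column is >= Gini importance + PLS.

    rf_index 0 selects RF without feature selection;
    rf_index 2 selects RF trained on Gini-selected features.
    """
    return [key for key, row in RESULTS.items() if row[rf_index] >= row[1]]


if __name__ == "__main__":
    for label, index in (("RF, no selection", 0), ("RF, Gini selection", 2)):
        print(label, cases_where_rf_wins(index))
```

Both comparisons return only `("NMR tumor", "center")`. This agrees with the caption's "all but one case" whichever random forest column is taken as the reference.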