Table 2. Average cross-validated prediction accuracy (%).

From: A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data

Data set          | no selection   | univariate selection | multivariate selection (Gini importance) | multivariate selection (PLS/PC)
Classifier        | PLS  PC   RF   | PLS  PC   RF         | PLS  PC   RF                             | PLS  PC   RF
MIR BSE, orig     | 66.8 62.9 74.9 | 80.7 80.7 76.7       | 84.1 83.2 77.4                           | 68   63.5 75.5
                    - - - *** *** * *** *** ** **
MIR BSE, binned   | 72.7 73.4 75.3 | 80.4 80.7 76.6       | 86.8 85.8 77.3                           | 85   82.1 75.6
                    - - - *** *** ** *** *** ** *** ***
MIR wine, French  | 69.5 69.3 79.3 | 83.7 83.5 82.2       | 82.4 81   81.2                           | 66.9 70.0 79.8
                    - - - *** **   *** ** *
MIR wine, grape   | 77   71.4 90.2 | 98.1 98.7 90.3       | 98.4 98.4 94.2                           | 91.7 88.5 90.4
                    - - - *** ***   *** *** ** *** ***
NMR tumor, all    | 88.8 89   89   | 89.3 89.3 90.5       | 90.0 89.6 89.6                           | 89.3 89.2 89.1
                    - - - *   *** **   *
NMR tumor, center | 71.6 72.3 73.1 | 73.9 72.7 73.9       | 72.6 72.0 74.3                           | 71.8 72.7 73.3
                    - - - **    *
NMR candida, 1    | 94.9 94.6 90.3 | 95.1 94.9 90.6       | 95.6 95.3 90.3                           | 95.3 95.2 90.7
                    - - -
NMR candida, 2    | 95.6 95.2 93.2 | 95.8 95.7 93.7       | 95.6 95.5 93.5                           | 96.0 95.9 94.1
                    - - -        *
NMR candida, 3    | 93.7 93.8 89.7 | 93.7 93.8 89.9       | 94.2 93.8 89.9                           | 94.0 94.0 90.2
                    - - -     *   * *
NMR candida, 4    | 86.9 87.3 83.9 | 87.8 87.3 84.0       | 88.2 87.6 84.3                           | 87.7 87.6 84.1
                    - - -     *
NMR candida, 5    | 92.7 92.6 89.2 | 92.7 92.6 89.9       | 92.5 92.5 90.3                           | 92.8 92.6 90.0
                    - - -
  1. The best classification result on each data set is underlined. Approaches that do not differ significantly from the optimal result (at a 0.05 significance level) are set in bold type (see Methods section). Significant differences between a method and the same classifier without feature selection are marked with asterisks (* p-value < 0.05, ** p-value < 0.01, *** p-value < 0.001). The MIR data in this table benefit significantly from feature selection, whereas the NMR data do so only to a minor extent. Overall, feature selection by means of Gini importance in conjunction with a PLS classifier was successful in all cases and superior to the "native" classifier of Gini importance, the random forest, in all but one case.
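The two summary statements in the footnote can be cross-checked against the reported averages. The Python sketch below is a hypothetical helper, not part of the original study: `TABLE`, `column`, `best_setting` and `rows_where_gini_pls_not_better_than_rf` are illustrative names, and the accuracies are transcribed from the table above. It reports the highest mean accuracy in each row (presumably the underlined entry) and lists the data sets where Gini-based selection followed by PLS does not exceed random forest, assuming the comparison is with random forest on the same Gini-selected features. It does not reproduce the significance tests, which cannot be derived from the averages alone.

```python
# Hypothetical helper: mean cross-validated accuracies (%) transcribed from Table 2.
# Each row holds 12 values: 4 feature-selection strategies x 3 classifiers.
STRATEGIES = ["no selection", "univariate", "Gini importance", "PLS/PC"]
CLASSIFIERS = ["PLS", "PC", "RF"]

TABLE = {
    "MIR BSE, orig":     [66.8, 62.9, 74.9, 80.7, 80.7, 76.7, 84.1, 83.2, 77.4, 68.0, 63.5, 75.5],
    "MIR BSE, binned":   [72.7, 73.4, 75.3, 80.4, 80.7, 76.6, 86.8, 85.8, 77.3, 85.0, 82.1, 75.6],
    "MIR wine, French":  [69.5, 69.3, 79.3, 83.7, 83.5, 82.2, 82.4, 81.0, 81.2, 66.9, 70.0, 79.8],
    "MIR wine, grape":   [77.0, 71.4, 90.2, 98.1, 98.7, 90.3, 98.4, 98.4, 94.2, 91.7, 88.5, 90.4],
    "NMR tumor, all":    [88.8, 89.0, 89.0, 89.3, 89.3, 90.5, 90.0, 89.6, 89.6, 89.3, 89.2, 89.1],
    "NMR tumor, center": [71.6, 72.3, 73.1, 73.9, 72.7, 73.9, 72.6, 72.0, 74.3, 71.8, 72.7, 73.3],
    "NMR candida, 1":    [94.9, 94.6, 90.3, 95.1, 94.9, 90.6, 95.6, 95.3, 90.3, 95.3, 95.2, 90.7],
    "NMR candida, 2":    [95.6, 95.2, 93.2, 95.8, 95.7, 93.7, 95.6, 95.5, 93.5, 96.0, 95.9, 94.1],
    "NMR candida, 3":    [93.7, 93.8, 89.7, 93.7, 93.8, 89.9, 94.2, 93.8, 89.9, 94.0, 94.0, 90.2],
    "NMR candida, 4":    [86.9, 87.3, 83.9, 87.8, 87.3, 84.0, 88.2, 87.6, 84.3, 87.7, 87.6, 84.1],
    "NMR candida, 5":    [92.7, 92.6, 89.2, 92.7, 92.6, 89.9, 92.5, 92.5, 90.3, 92.8, 92.6, 90.0],
}


def column(strategy: str, classifier: str) -> int:
    """Index of a (strategy, classifier) pair within a 12-value row."""
    return STRATEGIES.index(strategy) * len(CLASSIFIERS) + CLASSIFIERS.index(classifier)


def best_setting(row: list[float]) -> tuple[str, str, float]:
    """Strategy, classifier and accuracy of the highest mean (first one wins on ties)."""
    i = max(range(len(row)), key=lambda j: row[j])
    return STRATEGIES[i // len(CLASSIFIERS)], CLASSIFIERS[i % len(CLASSIFIERS)], row[i]


def rows_where_gini_pls_not_better_than_rf(table: dict[str, list[float]]) -> list[str]:
    """Data sets where Gini selection + PLS does not exceed Gini selection + RF (assumed comparison)."""
    pls = column("Gini importance", "PLS")
    rf = column("Gini importance", "RF")
    return [name for name, row in table.items() if row[pls] <= row[rf]]


if __name__ == "__main__":
    for name, row in TABLE.items():
        strategy, classifier, acc = best_setting(row)
        print(f"{name:18s} best mean: {strategy} + {classifier} ({acc:.1f})")
    print("Gini + PLS not better than RF:", rows_where_gini_pls_not_better_than_rf(TABLE))
```

Because only the means are used, near-ties (for example, several NMR candida settings within a few tenths of a percent) say nothing about significance; that information is carried by the bold type and asterisks in the table.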