Table 1 Recursive feature selection.

From: A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data

1. Calculate feature importance on the training data
   a. Gini importance
   b. absolute value of regression coefficients (PLS/PCR)
   c. p-values from Wilcoxon test/t-test
2. Rank the features according to the importance measure and remove the p% least important
3. Train the classifier on the training data
   A. Random forest
   B. D-PLS
   C. D-PCR
   and apply it to the test data
4. Repeat steps 1.–4. until no features remain
5. Identify the best feature subset according to the test error

Workflow of the recursive feature selection, and the combinations of feature importance measures (1.a–1.c) and classifiers (3.A–3.C) tested in this study. Compare with the results in Table 2 and Fig. 4. Hyper-parameters of PLS/PCR/random forest are optimized in both the feature selection (1.) and the classification (3.) step, using the training data only. While the Gini importance (1.a) and the regression coefficients (1.b) have to be recalculated within each loop (steps 1.–4.), the univariate measures (1.c) only need to be calculated once.
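
As an illustration of the workflow above, the following is a minimal Python sketch of one of the tested combinations: Gini importance (1.a) as the ranking criterion with a random forest classifier (3.A). The use of scikit-learn, the synthetic data, and the fixed 10% removal fraction per iteration are assumptions made for this sketch, not details taken from the paper; the hyper-parameter optimization performed in steps 1. and 3. is omitted, and the PLS/PCR and univariate variants (1.b, 1.c) would differ only in how the feature ranking is obtained.

```python
# Sketch of the recursive feature selection in Table 1, using Gini importance
# (1.a) with a random forest classifier (3.A). Data, removal fraction, and
# the use of scikit-learn are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumed synthetic data standing in for the spectral data of the study.
X, y = make_classification(n_samples=200, n_features=50, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

remove_frac = 0.10                # "p%" of features removed per iteration (assumed)
features = np.arange(X.shape[1])  # indices of the currently kept features
history = []                      # (feature subset, test error) for each loop

while features.size > 0:
    # 1. Calculate feature importance on the training data (Gini importance, 1.a)
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(X_train[:, features], y_train)
    importance = rf.feature_importances_

    # 3. Train the classifier on the training data and apply it to the test data
    #    (the same forest is reused here for brevity; the study tunes steps 1. and
    #    3. separately on the training data)
    test_error = 1.0 - rf.score(X_test[:, features], y_test)
    history.append((features.copy(), test_error))

    # 2./4. Rank the features, remove the p% least important, repeat until none remain
    n_drop = max(1, int(np.ceil(remove_frac * features.size)))
    order = np.argsort(importance)        # ascending: least important first
    features = features[order[n_drop:]]

# 5. Identify the best feature subset according to the test error
best_subset, best_error = min(history, key=lambda item: item[1])
print(f"best subset: {best_subset.size} features, test error {best_error:.3f}")
```

For the univariate variant (1.c), the per-feature p-values would be computed once before the loop and the same ranking reused in every iteration, which is why the caption notes that only the Gini importance and the regression coefficients need to be recalculated inside the loop.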