Table 1 Random forest feature selection methods and their permutation requirements

From: binomialRF: interpretable combinatoric efficiency of random forests to identify biomarker interactions

| Permute | Method | P-value | Brief description |
| --- | --- | --- | --- |
| No | binomialRF [15] | Yes | Obtains optimal splitting features' p-values via one-sided correlated binomial tests |
| No | EFS [16] | No | Calculates a global score for each feature using eight different importance metrics and selects features whose score exceeds the median global score |
| No | AUC-RF [17] | No | Iteratively trains a random forest and removes predictors in a stepwise fashion to maximize the increase in AUC |
| No | RFE, dRFE [18] | No | Iteratively trains a random forest (RF) model and drops uninformative features based on a user-defined criterion |
| No | RF-ACE [19] | No | Creates phony variables ("Artificial Contrasts with Ensembles") and compares how often these sham variables are selected relative to the real ones |
| No | R2VIM [12] | No | Calculates variable importance (VI), divides by the minimum VI to obtain relative VI, and chooses important features based on a pre-selected cutoff |
| No | VarSelRF, geneSrF [5] | No | Iteratively removes the worst 20% (or a user-defined percentage) of all features, retrains the RF, and selects the smallest feature set among the best models |
| Yes | Vita [20] | Yes | Calculates p-values from an empirical null distribution of non-positive importance scores, which accelerates null-distribution estimation |
| Yes | Perm [20] | Yes | Permutes outcomes (Y) and determines importance based on which features retain a larger importance in Y_original vs. Y_permuted |
| Yes | PIMP [14] | Yes | Permutes the outcome and ranks features by increases in mutual information or Gini errors; a feature's p-value is produced by fitting the importance measure to a distribution |
| Yes | VSURF [17] | No | Two-step FS algorithm: 1) uses predictor permutations to identify features robust to noise, and 2) refines the model by stepwise forward inclusion of features until error convergence |
| Yes | Boruta [13] | No | Creates phony predictors (shadow variables) by permuting copies of the real predictors, runs an RF to obtain features' Z-scores, eliminates features whose Z-scores fall below a threshold, and repeats until convergence |

  1. The absence of permutations generally decreases computing time substantially. P-values provide an explicit ranking of features, which enables objective feature thresholding.
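To make the non-permutation half of the table concrete, the sketch below illustrates the iterative-elimination strategy shared by the RFE/dRFE and varSelRF rows: train a forest, drop the least important fraction of features, and repeat. This is a minimal illustration assuming scikit-learn's `RandomForestClassifier` and Gini importance, not the actual implementation of any listed package; the 20% drop fraction and the five-feature stopping criterion are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for a biomarker matrix (assumption).
X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)
features = list(range(X.shape[1]))

# The stopping criterion is user-defined in the real methods;
# keeping 5 features is purely illustrative.
while len(features) > 5:
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    rf.fit(X[:, features], y)
    # Drop the worst 20% of remaining features by Gini importance,
    # mirroring the fraction mentioned in the varSelRF row.
    n_drop = max(1, int(0.2 * len(features)))
    order = np.argsort(rf.feature_importances_)  # ascending importance
    features = [features[i] for i in order[n_drop:]]

print("Selected features:", sorted(features))
```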
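The permutation-based rows (Perm, PIMP) instead build an empirical null by refitting on permuted outcomes and comparing observed importances against it; this is the extra computation the footnote refers to. The sketch below is a minimal version of that idea, again assuming scikit-learn; the number of permutations, the Gini importance measure, and the 0.05 threshold are illustrative, and the published methods use their own importance measures and distribution fits.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=20, n_informative=4,
                           random_state=1)

def rf_importance(X, y, seed):
    """Gini importances from a freshly trained forest."""
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    return rf.fit(X, y).feature_importances_

observed = rf_importance(X, y, seed=0)

# Empirical null: importances obtained under permuted outcomes (Y_permuted).
rng = np.random.default_rng(0)
n_perm = 30  # illustrative; real analyses use many more permutations
null = np.stack([rf_importance(X, rng.permutation(y), seed=s)
                 for s in range(n_perm)])

# One-sided empirical p-value per feature: the fraction of permuted runs
# whose importance meets or exceeds the observed importance
# (with a +1 correction so p is never exactly zero).
pvals = (1 + (null >= observed).sum(axis=0)) / (1 + n_perm)
print("Features with p < 0.05:", np.where(pvals < 0.05)[0])
```

The repeated forest fits in the permutation loop are what the footnote's speed comparison captures, while the resulting p-values are what enables the objective thresholding it describes.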