Table 1 Random forest feature selection methods and their permutation requirements

From: binomialRF: interpretable combinatoric efficiency of random forests to identify biomarker interactions

| Permute | Method | P-value | Brief description |
|---------|--------|---------|--------------------|
| No | binomialRF [15] | Yes | Optimal splitting features’ p-values obtained via one-sided correlated binomial tests (see the first sketch below the table) |
| No | EFS [16] | No | Calculates a global score for each feature using eight different importance metrics and selects features whose score exceeds the median global score |
| No | AUC-RF [17] | No | Iteratively trains a random forest and removes predictors in a stepwise fashion to maximize the increase in AUC |
| No | RFE, dRFE [18] | No | Iteratively trains a random forest (RF) model and drops uninformative features based on a user-defined criterion |
| No | RF-ACE [19] | No | Creates phony variables (“Artificial Contrasts with Ensembles”) and compares how often these sham variables are selected relative to the real ones |
| No | R2VIM [12] | No | Calculates variable importance (VI), divides by the minimum VI to obtain a relative VI, and chooses important features based on a pre-selected cutoff |
| No | VarSelRF, geneSrF [5] | No | Iteratively removes the worst 20% (or a user-defined percentage) of all features, retrains the RF, and selects the smallest feature set among the best-performing models |
| Yes | Vita [20] | Yes | P-values are calculated from an empirical null distribution built from non-positive importance scores, which accelerates null-distribution estimation |
| Yes | Perm [20] | Yes | Permutes outcomes (Y) and determines importance based on which features retain a larger importance in Y_original vs. Y_permuted (see the second sketch below the table) |
| Yes | PIMP [14] | Yes | Permutes the outcome and ranks features based on increases in mutual information or Gini errors; a feature’s p-value is produced by fitting the importance measure to a null distribution |
| Yes | VSURF [17] | No | Two-step feature-selection algorithm: (1) uses predictor permutations to identify features robust to noise, and (2) refines the model by step-forward inclusion of features until error convergence |
| Yes | Boruta [13] | No | Creates phony predictors (“shadow variables”) by permuting the values of the real features, runs an RF to obtain features’ Z-scores, eliminates features whose Z-scores fall below a threshold, and repeats until convergence (see the third sketch below the table) |

1. Absence of permutations generally decreases computing time substantially. P-values provide an explicit ranking of features, which enables objective feature thresholding
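
To make the binomialRF row concrete, here is a minimal sketch of its root-split counting idea in Python with scikit-learn. It assumes a plain one-sided binomial test with null probability 1/m per feature; the published method uses a *correlated* binomial test that adjusts for dependence between trees, so this is a simplification rather than the package's implementation.

```python
# Simplified binomialRF-style test (not the package's correlated binomial
# test): count how often each feature wins the optimal (root) split across
# trees, then test that count against a plain binomial null.
import numpy as np
from scipy.stats import binom
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=3, random_state=0)
n_trees, m = 500, X.shape[1]

rf = RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(X, y)

# tree_.feature[0] is the feature index used at each tree's root node.
root_features = [est.tree_.feature[0] for est in rf.estimators_]
counts = np.bincount(root_features, minlength=m)

# Under the null of exchangeable features, each feature is equally likely
# to win the root split, so counts[j] ~ Binomial(n_trees, 1/m).
p_null = 1.0 / m
pvals = binom.sf(counts - 1, n_trees, p_null)  # one-sided: P(K >= count)

for j in np.argsort(pvals)[:5]:
    print(f"feature {j}: root-split count = {counts[j]}, p = {pvals[j]:.3g}")
```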
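The Perm and PIMP rows share an outcome-permutation strategy: refit the forest on permuted outcomes to build a null distribution of importance scores. The sketch below computes empirical one-sided p-values from that null; PIMP additionally fits a parametric distribution to the null scores, which this sketch omits.

```python
# Outcome-permutation null (Perm/PIMP idea): permuting y breaks the X-y
# association, so importances from permuted fits form a null distribution.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=3, random_state=0)
rng = np.random.default_rng(0)
n_perm = 50  # more permutations give finer p-value resolution

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
observed = rf.feature_importances_

null_imps = np.empty((n_perm, X.shape[1]))
for b in range(n_perm):
    y_perm = rng.permutation(y)  # destroy any real signal
    rf_b = RandomForestClassifier(n_estimators=200, random_state=b).fit(X, y_perm)
    null_imps[b] = rf_b.feature_importances_

# Empirical one-sided p-value: fraction of null importances >= observed,
# with the +1 correction so p is never exactly zero.
pvals = (1 + (null_imps >= observed).sum(axis=0)) / (1 + n_perm)
print(np.argsort(pvals)[:5], np.sort(pvals)[:5])
```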
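Finally, a single-pass sketch of Boruta's shadow-variable idea: a real feature counts as a hit only if its importance beats the best permuted copy. The actual Boruta iterates this comparison with Z-scores and a statistical test until convergence; the threshold used here (maximum shadow importance) is a simplification.

```python
# Shadow features (Boruta idea): column-wise permutations of X keep each
# feature's marginal distribution but carry no signal about y.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=3, random_state=0)
rng = np.random.default_rng(0)

shadows = np.column_stack([rng.permutation(col) for col in X.T])
X_aug = np.hstack([X, shadows])  # real features followed by their shadows

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_aug, y)
imp = rf.feature_importances_
real_imp, shadow_imp = imp[:X.shape[1]], imp[X.shape[1]:]

# A real feature is a "hit" if it outperforms every shadow feature.
hits = np.where(real_imp > shadow_imp.max())[0]
print("features beating the best shadow:", hits)
```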