Table 1 Random forest feature selection methods and their permutation requirements

From: binomialRF: interpretable combinatoric efficiency of random forests to identify biomarker interactions

| Permute | Method | P-value | Brief description |
|---------|--------|---------|--------------------|
| No | binomialRF [15] | Yes | Optimal splitting features’ p-values obtained via one-sided correlated binomial tests (see the first sketch below the table) |
| No | EFS [16] | No | Calculates a global score for each feature using eight different importance metrics and selects features whose score exceeds the median global score |
| No | AUC-RF [17] | No | Iteratively trains a random forest and removes predictors in a stepwise fashion to maximize the increase in AUC |
| No | RFE, dRFE [18] | No | Iteratively trains a random forest (RF) model and drops uninformative features based on a user-defined criterion |
| No | RF-ACE [19] | No | Creates phony variables (“Artificial Contrasts with Ensembles”) and compares how often these sham variables are selected relative to the real ones |
| No | R2VIM [12] | No | Calculates variable importance (VI), divides by the minimum VI to obtain a relative VI, and chooses important features based on a pre-selected cutoff |
| No | VarSelRF, geneSrF [5] | No | Iteratively removes the worst 20% (or a user-defined percentage) of all features, retrains the RF, and selects the smallest feature set among the best-performing models |
| Yes | Vita [20] | Yes | P-values are calculated from an empirical null distribution built from non-positive importance scores, which accelerates null-distribution estimation |
| Yes | Perm [20] | Yes | Permutes outcomes (Y) and determines importance based on which features retain a larger importance in Y_original vs. Y_permuted (see the second sketch below the table) |
| Yes | PIMP [14] | Yes | Permutes the outcome and ranks features based on increases in mutual information or Gini errors; a feature’s p-value is produced by fitting the importance measure to a null distribution |
| Yes | VSURF [17] | No | Two-step feature-selection algorithm: (1) uses predictor permutations to identify features robust to noise, and (2) refines the model by step-forward inclusion of features until error convergence |
| Yes | Boruta [13] | No | Creates phony predictors (“shadow variables”) by permuting the values of the real features, runs an RF to obtain features’ Z-scores, eliminates features whose Z-scores fall below a threshold, and repeats until convergence (see the third sketch below the table) |

1. Absence of permutations generally decreases computing time substantially. P-values provide an explicit ranking of features, which enables objective feature thresholding
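
To make the binomialRF row concrete, here is a minimal sketch of its root-split counting idea in Python with scikit-learn. It assumes a plain one-sided binomial test with null probability 1/m per feature; the published method uses a *correlated* binomial test that adjusts for dependence between trees, so this is a simplification rather than the package's implementation.

```python
# Simplified binomialRF-style test (not the package's correlated binomial
# test): count how often each feature wins the optimal (root) split across
# trees, then test that count against a plain binomial null.
import numpy as np
from scipy.stats import binom
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=3, random_state=0)
n_trees, m = 500, X.shape[1]

rf = RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(X, y)

# tree_.feature[0] is the feature index used at each tree's root node.
root_features = [est.tree_.feature[0] for est in rf.estimators_]
counts = np.bincount(root_features, minlength=m)

# Under the null of exchangeable features, each feature is equally likely
# to win the root split, so counts[j] ~ Binomial(n_trees, 1/m).
p_null = 1.0 / m
pvals = binom.sf(counts - 1, n_trees, p_null)  # one-sided: P(K >= count)

for j in np.argsort(pvals)[:5]:
    print(f"feature {j}: root-split count = {counts[j]}, p = {pvals[j]:.3g}")
```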
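The Perm and PIMP rows share an outcome-permutation strategy: refit the forest on permuted outcomes to build a null distribution of importance scores. The sketch below computes empirical one-sided p-values from that null; PIMP additionally fits a parametric distribution to the null scores, which this sketch omits.

```python
# Outcome-permutation null (Perm/PIMP idea): permuting y breaks the X-y
# association, so importances from permuted fits form a null distribution.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=3, random_state=0)
rng = np.random.default_rng(0)
n_perm = 50  # more permutations give finer p-value resolution

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
observed = rf.feature_importances_

null_imps = np.empty((n_perm, X.shape[1]))
for b in range(n_perm):
    y_perm = rng.permutation(y)  # destroy any real signal
    rf_b = RandomForestClassifier(n_estimators=200, random_state=b).fit(X, y_perm)
    null_imps[b] = rf_b.feature_importances_

# Empirical one-sided p-value: fraction of null importances >= observed,
# with the +1 correction so p is never exactly zero.
pvals = (1 + (null_imps >= observed).sum(axis=0)) / (1 + n_perm)
print(np.argsort(pvals)[:5], np.sort(pvals)[:5])
```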
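Finally, a single-pass sketch of Boruta's shadow-variable idea: a real feature counts as a hit only if its importance beats the best permuted copy. The actual Boruta iterates this comparison with Z-scores and a statistical test until convergence; the threshold used here (maximum shadow importance) is a simplification.

```python
# Shadow features (Boruta idea): column-wise permutations of X keep each
# feature's marginal distribution but carry no signal about y.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=3, random_state=0)
rng = np.random.default_rng(0)

shadows = np.column_stack([rng.permutation(col) for col in X.T])
X_aug = np.hstack([X, shadows])  # real features followed by their shadows

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_aug, y)
imp = rf.feature_importances_
real_imp, shadow_imp = imp[:X.shape[1]], imp[X.shape[1]:]

# A real feature is a "hit" if it outperforms every shadow feature.
hits = np.where(real_imp > shadow_imp.max())[0]
print("features beating the best shadow:", hits)
```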