Table 2 Summary of the modeling approaches included in the evaluation

From: A systematic evaluation of high-dimensional, ensemble-based regression for exploring large model spaces in microbiome analyses

| Model | Ensemble characteristics: tuning parameter | Ensemble characteristics: model space construction | Output | Paradigm | R package |
| --- | --- | --- | --- | --- | --- |
| ENC | \(\lambda_{\text{ENC}}\) | None | Influential variables | Frequentist (\(l_{1}\), \(l_{2}\) penalties) | quadrupen, glmnet |
| PS | \(\lambda_{\text{MB}}\) | Subsampling | Influential variables | Frequentist (\(l_{1}\), \(l_{2}\) penalties) | quadrupen, glmnet |
| LS | \(\lambda_{\text{ENC}}\) | Subsampling | Inclusion probabilities | Frequentist (\(l_{1}\), \(l_{2}\) penalties) | quadrupen, glmnet |
| SS | \(\Lambda\) | Subsampling | Inclusion probabilities | Frequentist (\(l_{1}\), \(l_{2}\) penalties) | quadrupen, glmnet |
| PR | \(\lambda_{\text{MB}}\) | Resampling | Influential variables | Frequentist (\(l_{1}\), \(l_{2}\) penalties) | quadrupen, glmnet |
| LR | \(\lambda_{\text{ENC}}\) | Resampling | Inclusion probabilities | Frequentist (\(l_{1}\), \(l_{2}\) penalties) | quadrupen, glmnet |
| SR | \(\Lambda\) | Resampling | Inclusion probabilities | Frequentist (\(l_{1}\), \(l_{2}\) penalties) | quadrupen, glmnet |
| BMA | \(\text{EMS} = 1\) | MCMC | Inclusion probabilities | Bayesian (Spike & slab prior) | BoomSpikeSlab |
| BMAC | \(\text{EMS}_{\text{CV}}\) | MCMC | Inclusion probabilities | Bayesian (Spike & slab prior) | BoomSpikeSlab |

  1. ENC: the baseline penalized regression model, an elastic net with \(\lambda_{\text{optimal}} = \lambda_{\text{ENC}}\) derived from cross-validation (CV). Ensembles based on 100 subsamples: PS, Meinshausen & Bühlmann’s algorithm with a single \(\lambda_{\text{optimal}} = \lambda_{\text{MB}}\) selected to minimize the expected number of false positives; LS, a single \(\lambda_{\text{optimal}} = \lambda_{\text{ENC}}\) with no variable selection; SS, stability selection across the entire grid of 100 values \(\lambda \in \Lambda\) with no variable selection. Ensembles based on 100 resamples: PR, LR and SR are identical to PS, LS and SS, respectively, with the model space constructed through resampling. BMA: Bayesian model averaging with expected model size (EMS) = 1. BMAC: BMA with EMS determined by CV (\(\text{EMS}_{\text{CV}}\)).
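
The single-λ ensembles in the table (LS with subsampling, LR with resampling) can be illustrated with a minimal Python sketch. It is not the authors' implementation, which relied on the R packages quadrupen and glmnet. Several details are assumptions made only for illustration: subsamples are half the data drawn without replacement, resamples are bootstrap draws with replacement, the elastic net is fitted by a small hand-written coordinate descent, and the values λ = 0.3 and α = 0.5 are arbitrary. The function names `elastic_net`, `inclusion_probabilities` and `make_toy_data` are hypothetical.

```python
import random


def soft_threshold(z, g):
    """Soft-thresholding operator used by the l1 part of the penalty."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def elastic_net(X, y, lam, alpha=0.5, max_iter=200, tol=1e-8):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1-alpha)/2 * ||b||_2^2),
    fitted on centered data (the intercept is not returned)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual when all coefficients are 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column, never selected
            # Covariance of predictor j with the partial residual.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            change = new - beta[j]
            if change != 0.0:
                for i in range(n):
                    resid[i] -= change * Xc[i][j]
                beta[j] = new
                max_change = max(max_change, abs(change))
        if max_change < tol:
            break
    return beta


def inclusion_probabilities(X, y, lam, n_draws=100, resample=False, seed=0):
    """Fraction of draws in which each predictor has a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_draws):
        if resample:
            # Assumption: resampling = bootstrap (n rows with replacement).
            idx = [rng.randrange(n) for _ in range(n)]
        else:
            # Assumption: subsampling = n/2 rows without replacement.
            idx = rng.sample(range(n), n // 2)
        beta = elastic_net([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_draws for c in counts]


def make_toy_data(n=60, p=6, seed=1):
    """Synthetic data: only the first two predictors affect the response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.5) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("subsampling:", inclusion_probabilities(X, y, lam=0.3))
    print("resampling: ", inclusion_probabilities(X, y, lam=0.3, resample=True))
```

On the toy data, the two predictors that generate the response should be selected in nearly every draw, while the remaining predictors enter only occasionally. Repeating the loop over a grid of λ values rather than a single one would move the sketch toward the Λ-based variants (SS, SR), again under the same assumptions.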