Table 3 Simulation results of biomarkers

From: binomialRF: interpretable combinatoric efficiency of random forests to identify biomarker interactions

Model        Precision    Recall       Test error   Model size
3A. Results: 100–2000 features
AUCRF        0.54 (0.25)  0.74 (0.26)  0.27 (0.1)   8.74 (0.13)
binomialRF   0.91 (0.13)  0.37 (0.36)  0.33 (0.13)  81.72 (0.08)
Boruta       0.89 (0.15)  0.41 (0.37)  0.32 (0.13)  63.38 (0.1)
EFS          0.83 (0.16)  0.69 (0.27)  0.25 (0.1)   8.66 (0.13)
Perm         0.33 (0.33)  0.82 (0.18)  0.30 (0.09)  59.42 (0.1)
PIMP^a       0.18 (0.36)  0.00 (0.01)  0.35 (0.1)   1.47 (0.11)
RFE          0.49 (0.35)  0.61 (0.23)  0.3 (0.08)   250.29 (0.09)
VarSelRF     0.67 (0.24)  0.65 (0.29)  0.27 (0.1)   12.31 (0.12)
Vita         0.46 (0.28)  0.66 (0.29)  0.28 (0.1)   35.44 (0.1)
VSURF        0.86 (0.15)  0.44 (0.36)  0.31 (0.12)  40.95 (0.1)
3B. Results: 10,000 features
AUCRF        0.17 (0.05)  0.33 (0.05)  0.41 (0.05)  215.68 (0.01)
binomialRF   0.51 (0.12)  0.14 (0.12)  0.41 (0.03)  28.6 (0.03)
Boruta       0.72 (0.18)  0.03 (0.18)  0.47 (0.01)  4.68 (0.02)
Perm         0.02 (0)     0.82 (0)     0.46 (0.03)  4958.26 (0.03)
RFE          0.03 (0)     0.66 (0)     0.44 (0.04)  1950.11 (0.02)
Vita         0.03 (0)     0.52 (0)     0.45 (0.05)  1954.32 (0.02)
  1. binomialRF and the algorithms in Table 1 were tested across a range of simulation scenarios (Table 6). Results are shown as mean (standard deviation) and ranked by decreasing F1-score. Table 3A shows results for all techniques with up to 2000 features. Table 3B shows results for a limited simulation scenario with 10,000 features and 100 seeded genes. Only a subset of methods is presented in 3B because the remaining methods either could not process 10,000 features (i.e., they induced memory errors) or imposed rate-limiting computational costs (see Fig. 1). Across both tables, Boruta and binomialRF attain the highest precision, while Perm attains the highest recall. Further studies in high-dimensional scenarios are needed to better understand each technique's behavior. Top accuracies are bolded.
  2. ^a Across many runs, the PIMP algorithm produced no gene predictions despite being run with its default parameters, which explains its low precision and recall values. Varying the parameters brought no improvement, so we mark these results to note that they warrant further investigation.
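The precision, recall, F1-score and model-size columns can be illustrated with a minimal sketch. It assumes that each method returns a set of selected feature indices, that the seeded genes are the ground truth, and that model size is the number of selected features. The function names (`selection_metrics`, `summarize`, `rank_by_f1`), the toy data, and the choice to score an empty selection (as in the PIMP footnote) as zero precision are hypothetical and are not the authors' evaluation code. Test error is not covered here.

```python
from statistics import mean, stdev


def selection_metrics(selected, seeded):
    """Score one simulation run: compare selected features with seeded genes."""
    selected, seeded = set(selected), set(seeded)
    tp = len(selected & seeded)  # seeded genes the method recovered
    # Assumption: an empty selection scores 0 precision instead of undefined.
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(seeded) if seeded else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0  # harmonic mean
    return {"precision": precision, "recall": recall, "f1": f1,
            "model_size": len(selected)}


def summarize(runs):
    """Mean and standard deviation of each metric over repeated runs."""
    return {k: (mean(r[k] for r in runs),
                stdev(r[k] for r in runs) if len(runs) > 1 else 0.0)
            for k in runs[0]}


def rank_by_f1(summaries):
    """Order method names by decreasing mean F1-score."""
    return sorted(summaries, key=lambda m: summaries[m]["f1"][0], reverse=True)


if __name__ == "__main__":
    seeded = {0, 1, 2, 3}  # hypothetical seeded gene indices
    runs = {
        # Precise but incomplete selections.
        "method_a": [selection_metrics({0, 1, 9}, seeded),
                     selection_metrics({0, 1, 2}, seeded)],
        # Complete but noisier selections.
        "method_b": [selection_metrics({0, 1, 2, 3, 7, 8, 9, 10}, seeded),
                     selection_metrics({0, 1, 2, 3, 5, 6}, seeded)],
    }
    summaries = {m: summarize(r) for m, r in runs.items()}
    for m in rank_by_f1(summaries):
        print(m, {k: f"{v[0]:.2f} ({v[1]:.2f})" for k, v in summaries[m].items()})
```

In this toy run, `method_b` ranks above `method_a` (mean F1 of about 0.73 versus 0.71) even though its precision is lower, which mirrors the precision and recall trade-off between methods such as binomialRF and Perm in the table. Note that the mean of per-run F1-scores generally differs from an F1-score computed from mean precision and mean recall.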