RGLM thinning versus RF thinning. This figure compares the thinned RGLM with the thinned RF in (A) the 20 disease related data sets and (B) the 700 gene expression traits. Numbers that connect dashed lines are RGLM thinning thresholds. For a pre-specified threshold, the number of features used for a thinned random forest is matched with that for the thinned RGLM (except for a threshold of 0). The x−a x i s (log-scaled) and the y−a x i s report the median number of genes left for prediction and the median accuracy across data sets, respectively. The Wilcoxon signed rank test was used to test whether the median accuracy of the thinned RGLM equals that of the thinned RF. Note that the thinned RGLM consistently yields higher accuracies than the thinned RF (according to the 2-sided test p-values).