| p | GDFS time (s) | GDFS (discrepancies) | CFS time (s) | CFS (discrepancies) | rpart time (s) | rpart (discrepancies) |
|---|---|---|---|---|---|---|
| 10 | 3.29 | (0) | 2.065 | (0) | 0.02 | (2) |
| 20 | 52.246 | (0) | 11.527 | (2) | 0.023 | (6) |
| 30 | 113.702 | (0) | 23.395 | (5) | 0.029 | (9) |
| 40 | 249.751 | (2) | 30.866 | (8) | 0.038 | (12) |
| 50 | 498.445 | (1) | 83.885 | (10) | 0.043 | (16) |
| 60 | 609.841 | (0) | 415.525 | (4) | 0.05 | (19) |
| 70 | 962.695 | (2) | 828.434 | (3) | 0.083 | (22) |
| 80 | 1902.347 | (0) | 696.083 | (10) | 0.068 | (26) |
| 90 | 1516.234 | (1) | 1286.167 | (9) | 0.078 | (28) |
| 100 | 2059.3 | (1) | 812.16 | (17) | 0.096 | (31) |
- Time taken (in seconds) to carry out feature selection using the greedy search approach (GDFS) on simulated datasets of increasing dimensionality, p. Tabulated alongside are the run times for CFS and rpart on the same datasets. Data were simulated from ordinary Dirichlet distributions across two groups, with 100 observations in each group. Approximately one third of the variables were set to differ between groups (“grouping variables”). Reported in brackets beside each run time is the number of discrepancies between the true set of grouping variables and the selected feature set, calculated as the number of variables incorrectly selected as features plus the number of true grouping variables that were not selected.
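
The discrepancy count described in the caption is the size of the symmetric difference between the true and selected variable sets. The following is a minimal sketch of that calculation, not the authors' code; the function name `count_discrepancies` and the example variable indices are hypothetical and chosen only for illustration.

```python
def count_discrepancies(true_vars, selected_vars):
    """Count mismatches between the true grouping variables and a selected feature set.

    Hypothetical helper: the name and interface are assumptions, not from the source.
    """
    true_set = set(true_vars)
    selected_set = set(selected_vars)
    # Variables selected as features that are not true grouping variables.
    incorrectly_selected = selected_set - true_set
    # True grouping variables that the selection algorithm missed.
    missed = true_set - selected_set
    return len(incorrectly_selected) + len(missed)


# Illustrative (made-up) indices: variable 7 is wrongly selected, variable 1 is missed.
true_grouping = {1, 2, 3}
selected = {2, 3, 7}
example_count = count_discrepancies(true_grouping, selected)
```

Under this definition, a count of 0 means the selected feature set matches the true set of grouping variables exactly.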