Table 10 Computational efficiency of the GDFS method compared with CFS and rpart

From: Greedy feature selection for glycan chromatography data with the generalized Dirichlet distribution

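| p | GDFS time, s (discrepancies) | CFS time, s (discrepancies) | rpart time, s (discrepancies) |
|---|---|---|---|
| 10 | 3.29 (0) | 2.065 (0) | 0.02 (2) |
| 20 | 52.246 (0) | 11.527 (2) | 0.023 (6) |
| 30 | 113.702 (0) | 23.395 (5) | 0.029 (9) |
| 40 | 249.751 (2) | 30.866 (8) | 0.038 (12) |
| 50 | 498.445 (1) | 83.885 (10) | 0.043 (16) |
| 60 | 609.841 (0) | 415.525 (4) | 0.05 (19) |
| 70 | 962.695 (2) | 828.434 (3) | 0.083 (22) |
| 80 | 1902.347 (0) | 696.083 (10) | 0.068 (26) |
| 90 | 1516.234 (1) | 1286.167 (9) | 0.078 (28) |
| 100 | 2059.3 (1) | 812.16 (17) | 0.096 (31) |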

  1. Time taken (in seconds) to carry out feature selection using the greedy search approach (GDFS) on simulated datasets of increasing dimensionality, p. The run times for CFS and rpart on the same datasets are tabulated alongside. Data were simulated from ordinary Dirichlet distributions across two groups, with 100 observations in each group. Approximately one third of the variables were set to differ between groups (“grouping variables”). The number in brackets beside each run time is the number of discrepancies between the true set of grouping variables and the selected feature set. It is calculated as the number of variables incorrectly selected as features plus the number of true grouping variables that the selection algorithm failed to select.
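
The discrepancy count described in the footnote is the size of the symmetric difference between the true grouping set and the selected set. The following is a minimal sketch of that calculation, not the authors' code; the function name `count_discrepancies` and the example variable indices are hypothetical and chosen only for illustration.

```python
def count_discrepancies(true_vars, selected_vars):
    """Return the number of wrongly selected variables plus the number
    of true grouping variables that were missed (hypothetical helper)."""
    true_set, selected_set = set(true_vars), set(selected_vars)
    false_positives = selected_set - true_set   # selected, but not grouping variables
    false_negatives = true_set - selected_set   # grouping variables that were not selected
    return len(false_positives) + len(false_negatives)


# Illustrative values only (not taken from the table):
# variables 1-3 truly differ between groups; the method selected 1, 2 and 5.
true_grouping = {1, 2, 3}
selected = {1, 2, 5}
n_discrepancies = count_discrepancies(true_grouping, selected)
print(n_discrepancies)  # 2: variable 5 wrongly selected, variable 3 missed
```

A count of 0 therefore means the selected feature set matches the true grouping variables exactly.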