Table 4 Computing times.

From: A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data

                       no selection             univariate selection     multivariate selection   multivariate selection
                                                                         (Gini importance)        (PLS/PC)
Data set                    PLS     PC     RF        PLS     PC     RF        PLS     PC     RF        PLS     PC     RF
MIR BSE     orig            5.7   11.1    9.9       46.4   53.9   46.8       88.8   97.0   91.5       87.9   92.4   88.0
            binned          2.8    3.2    3.1       13.6   14.7   15.9       26.1   27.1   29.0       28.7   29.6   31.5
MIR wine    French          8.8    7.8    2.4       26.6   21.8    7.7       47.0   45.9   33.5       17.2   14.7    7.4
            grape          12.1   10.3    2.5       28.9   22.3    8.0       54.0   47.6   33.5       15.8   13.1    6.5
NMR tumor   all             0.3    0.4    0.4        1.4    1.2    2.1        2.9    2.7    3.6        3.6    3.4    4.3
            center          0.2    0.2    0.2        1.1    0.8    1.1        2.2    1.9    2.1        2.1    1.8    2.0
NMR candida 1               4.6    8.8    7.7       22.4   41.2   37.1       43.5   62.5   61.1       59.8   78.4   75.4
            2               3.7    4.8    3.8       18.0   22.0   19.4       34.5   38.5   37.3       36.3   40.3   37.9
            3               3.7    4.7    3.7       17.4   20.1   17.9       33.4   36.0   34.7       34.6   37.8   35.1
            4               3.9    5.1    4.8       18.7   23.4   24.3       36.0   40.5   60.5       41.6   46.2   47.0
            5               3.5    3.9    2.6       31.9   32.4   27.0       62.6   63.0   60.0       58.3   43.4   38.5
The table reports the runtime of the different feature selection and classification approaches on each data set (on a 2 GHz personal computer with 2 GB of memory). Values are given in minutes for a ten-fold cross-validation, with the parameterisations used for the results shown in Tables 2 and 3. For all methods, a univariate feature selection takes about five times as long as classifying the same data set without feature selection. Both multivariate feature selection approaches require approximately the same amount of time for a given data set and classifier. Their computing time is no more than twice that of a recursive feature elimination based on a univariate feature importance measure.
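
The ratios quoted in the note can be checked against the tabulated values. The following is a minimal Python sketch, not part of the original study: the runtimes are copied from the table, while the names `TIMES`, `STRATEGIES`, `runtime`, `ratio` and `ratio_summary` are hypothetical helpers introduced only for this illustration. It reports the minimum, median and maximum ratio across all data sets and classifiers, since individual cells can deviate from the rules of thumb stated above.

```python
from statistics import median

CLASSIFIERS = ("PLS", "PC", "RF")
# Hypothetical short names for the four column groups of the table.
STRATEGIES = ("none", "univariate", "gini", "pls_pc")

# Runtimes in minutes, copied from the table. Each row holds
# 4 strategies x 3 classifiers, in the column order of the table.
TIMES = {
    ("MIR BSE", "orig"):     [5.7, 11.1, 9.9, 46.4, 53.9, 46.8, 88.8, 97.0, 91.5, 87.9, 92.4, 88.0],
    ("MIR BSE", "binned"):   [2.8, 3.2, 3.1, 13.6, 14.7, 15.9, 26.1, 27.1, 29.0, 28.7, 29.6, 31.5],
    ("MIR wine", "French"):  [8.8, 7.8, 2.4, 26.6, 21.8, 7.7, 47.0, 45.9, 33.5, 17.2, 14.7, 7.4],
    ("MIR wine", "grape"):   [12.1, 10.3, 2.5, 28.9, 22.3, 8.0, 54.0, 47.6, 33.5, 15.8, 13.1, 6.5],
    ("NMR tumor", "all"):    [0.3, 0.4, 0.4, 1.4, 1.2, 2.1, 2.9, 2.7, 3.6, 3.6, 3.4, 4.3],
    ("NMR tumor", "center"): [0.2, 0.2, 0.2, 1.1, 0.8, 1.1, 2.2, 1.9, 2.1, 2.1, 1.8, 2.0],
    ("NMR candida", "1"):    [4.6, 8.8, 7.7, 22.4, 41.2, 37.1, 43.5, 62.5, 61.1, 59.8, 78.4, 75.4],
    ("NMR candida", "2"):    [3.7, 4.8, 3.8, 18.0, 22.0, 19.4, 34.5, 38.5, 37.3, 36.3, 40.3, 37.9],
    ("NMR candida", "3"):    [3.7, 4.7, 3.7, 17.4, 20.1, 17.9, 33.4, 36.0, 34.7, 34.6, 37.8, 35.1],
    ("NMR candida", "4"):    [3.9, 5.1, 4.8, 18.7, 23.4, 24.3, 36.0, 40.5, 60.5, 41.6, 46.2, 47.0],
    ("NMR candida", "5"):    [3.5, 3.9, 2.6, 31.9, 32.4, 27.0, 62.6, 63.0, 60.0, 58.3, 43.4, 38.5],
}


def runtime(row, strategy, classifier):
    """Look up one cell of the table (runtime in minutes)."""
    col = STRATEGIES.index(strategy) * len(CLASSIFIERS) + CLASSIFIERS.index(classifier)
    return TIMES[row][col]


def ratio(row, numerator, denominator, classifier):
    """Runtime of one strategy relative to another, same data set and classifier."""
    return runtime(row, numerator, classifier) / runtime(row, denominator, classifier)


def ratio_summary(numerator, denominator):
    """Minimum, median and maximum ratio over all data sets and classifiers."""
    ratios = [
        ratio(row, numerator, denominator, clf)
        for row in TIMES
        for clf in CLASSIFIERS
    ]
    return min(ratios), median(ratios), max(ratios)


if __name__ == "__main__":
    comparisons = [("univariate", "none"), ("gini", "univariate"), ("pls_pc", "gini")]
    for num, den in comparisons:
        lo, mid, hi = ratio_summary(num, den)
        print(f"{num} / {den}: min {lo:.2f}, median {mid:.2f}, max {hi:.2f}")
```

The median is used rather than the mean so that a few unusually slow or fast cells do not dominate the summary.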