Figure 8 | BMC Bioinformatics

Figure 8

From: A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data

The effect of different noise processes on the performance of the feature selection methods in the synthetic bivariate classification problem illustrated in Fig. 1. In the left column, the feature vectors are extended by a random variable scaled by S; in the right column, a random offset of size S is added to the feature vectors. Top row: classification accuracy of the synthetic two-class problem (as in Fig. 7, for comparison); second row: multivariate Gini importance; bottom row: p-values of the univariate t-test. The black lines correspond to the values of the two features spanning the bivariate classification task (Fig. 1); the blue dotted line corresponds to the third feature in the synthetic data set, the random variable. The performance of the random forest remains nearly unchanged even in the presence of a strong source of "local" noise at high values of S.
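The two noise processes described in the caption can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' code: the data generator `make_bivariate_problem` is a hypothetical stand-in for the problem of Fig. 1 (two correlated features that separate the classes only jointly), the noise is assumed to be Gaussian, and the "random offset" is read here as one per-sample shift applied to every feature. All function names are invented for this sketch.

```python
import numpy as np


def make_bivariate_problem(n_per_class=200, seed=0):
    """Hypothetical stand-in for the two-class problem of Fig. 1.

    Both features share a common component, so each one alone separates
    the classes poorly, while their difference separates them well.
    """
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_per_class)
    shared = rng.normal(size=y.size)          # component common to both features
    shift = np.where(y == 0, -0.5, 0.5)       # class-dependent displacement
    X = np.column_stack([shared + shift, shared - shift])
    X += 0.1 * rng.normal(size=X.shape)       # small independent jitter
    return X, y


def extend_with_scaled_noise(X, S, rng):
    """Left column: append a random variable scaled by S as an extra feature."""
    noise = S * rng.normal(size=(X.shape[0], 1))
    return np.hstack([X, noise])


def add_random_offset(X, S, rng):
    """Right column (assumed reading): add one random offset of size S
    per sample to all features of that sample."""
    offset = S * rng.normal(size=(X.shape[0], 1))
    return X + offset                          # broadcasts across features
```

Sweeping S over a range of values and re-evaluating a classifier, the Gini importance, and the t-test p-values on the perturbed data would then give curves of the kind shown in the three rows of the figure; those evaluation steps are left out here.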
