Property | Consequence of using SMOTE on high-dimensional data |
---|---|
E(SMOTE) = E(X) | Little impact on classifiers that depend on mean values (DLDA); |
var(SMOTE) < var(X) | Minority class variability is underestimated; negative impact on classifiers that use class-specific variances (DQDA); inflated significance of statistical tests for comparing classes (t-test); |
d(SMOTE, TEST) < d(X, TEST), where d is the Euclidean distance | Test samples are classified mostly into the minority class by classifiers based on Euclidean distance (k-NN); variable selection helps reduce this problem; |
cor(SMOTE, X) ≥ 0; cor(SMOTE_s, SMOTE_t) ≥ 0 | Training set samples are no longer independent; independence of samples is assumed by most classifiers (DLDA, PLR, ...) and variable selection methods (t-test, Mann-Whitney, ...) |
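
The properties in the table can be checked numerically with a small simulation. The sketch below is illustrative only and is not taken from the source: the `smote` and helper functions are hypothetical minimal implementations, and the settings (50 minority samples, 1000 independent standard normal variables, k = 5 neighbours) are assumptions chosen for the example. Under these assumptions, a synthetic sample (1 - u)x + u x' built from two independent samples with u uniform on [0, 1] has variance equal to 2/3 of the original in expectation, so the variance ratio should fall clearly below 1. The correlation between a synthetic sample and its parent is measured across variables as a simple proxy.

```python
import numpy as np

def smote(X, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: interpolate between a sample and one of its k nearest neighbours."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # Pairwise squared Euclidean distances between minority samples.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)          # parent sample of each synthetic point
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    u = rng.random((n_new, 1))                     # interpolation weight in [0, 1)
    return X[base] + u * (X[neigh] - X[base]), base

def mean_dist(A, B):
    """Mean Euclidean distance between all rows of A and all rows of B."""
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.sqrt(np.maximum(d2, 0)).mean()

def rowwise_corr(A, B):
    """Pearson correlation between matching rows of A and B, computed across variables."""
    A = A - A.mean(axis=1, keepdims=True)
    B = B - B.mean(axis=1, keepdims=True)
    return (A * B).sum(1) / np.sqrt((A**2).sum(1) * (B**2).sum(1))

rng = np.random.default_rng(0)
n, p = 50, 1000                          # assumed: few samples, many variables
X = rng.standard_normal((n, p))          # simulated minority class
test = rng.standard_normal((200, p))     # test samples from the same distribution
S, base = smote(X, n_new=2000, k=5, rng=rng)

mean_gap = np.abs(S.mean(axis=0) - X.mean(axis=0)).mean()
var_ratio = S.var(axis=0, ddof=1).mean() / X.var(axis=0, ddof=1).mean()
d_smote, d_orig = mean_dist(S, test), mean_dist(X, test)
cor_with_parent = rowwise_corr(S, X[base]).mean()

print(f"mean absolute difference of variable means: {mean_gap:.3f}")
print(f"variance ratio SMOTE / original:            {var_ratio:.3f}")
print(f"mean distance to test, SMOTE vs original:   {d_smote:.2f} vs {d_orig:.2f}")
print(f"mean correlation with parent sample:        {cor_with_parent:.3f}")
```

With real data the variables are correlated and the exact values will differ, but the direction of each effect (similar means, smaller variance, smaller distance to test samples, positive correlation with the parent samples) is what the table describes.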