Table 4 The classification results of different sampling algorithms

From: CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests

Dataset                     Method               F       G-Mean  AUC     OOB error
1. Circle                   Original data        0.9081  0.9339  0.9389  0.0296
                            Random oversampling  0.9249  0.9553  0.9567  0.0163
                            SMOTE                0.9086  0.9535  0.9579  0.0384
                            Borderline-SMOTE1    0.9110  0.9534  0.9619  0.0438
                            Safe-level-SMOTE     0.9146  0.9595  0.9559  0.0431
                            C-SMOTE              0.9302  0.9713  0.9813  0.0702
                            k-means-SMOTE        0.9262  0.9589  0.9602  0.0323
                            CURE-SMOTE           0.9431  0.9808  0.9855  0.0323
2. Blood-transfusion        Original data        0.3509  0.5094  0.5083  0.2548
                            Random oversampling  0.3903  0.5490  0.5449  0.2250
                            SMOTE                0.4118  0.5798  0.5537  0.2152
                            Borderline-SMOTE1    0.4185  0.5832  0.5424  0.1630
                            Safe-level-SMOTE     0.4494  0.6174  0.5549  0.2479
                            C-SMOTE              0.4006  0.5549  0.5531  0.2418
                            k-means-SMOTE        0.4157  0.5941  0.5433  0.1872
                            CURE-SMOTE           0.5393  0.6719  0.6533  0.2531
3. Haberman’s survival      Original data        0.3279  0.5018  0.6063  0.3149
                            Random oversampling  0.3504  0.5178  0.5959  0.1534
                            SMOTE                0.4350  0.5971  0.6259  0.1728
                            Borderline-SMOTE1    0.4523  0.6119  0.6298  0.2589
                            Safe-level-SMOTE     0.4762  0.6008  0.6030  0.3077
                            C-SMOTE              0.4528  0.5487  0.5656  0.2780
                            k-means-SMOTE        0.4685  0.6249  0.6328  0.1828
                            CURE-SMOTE           0.5000  0.6282  0.6940  0.2717
4. Breast-cancer-wisconsin  Original data        0.9486  0.9619  0.9491  0.0446
                            Random oversampling  0.9451  0.9623  0.9620  0.0301
                            SMOTE                0.9502  0.9666  0.9627  0.0341
                            Borderline-SMOTE1    0.9506  0.9661  0.9635  0.0379
                            Safe-level-SMOTE     0.9509  0.9671  0.9638  0.0404
                            C-SMOTE              0.9491  0.9636  0.9561  0.0380
                            k-means-SMOTE        0.9449  0.9616  0.9562  0.0373
                            CURE-SMOTE           0.9511  0.9664  0.9621  0.0427
5. SPECT.train              Original data        0.6348  0.6764  0.6579  0.3634
                            Random oversampling  0.6539  0.6924  0.6753  0.3468
                            SMOTE                0.6618  0.6990  0.6825  0.3688
                            Borderline-SMOTE1    0.6710  0.6926  0.6746  0.3489
                            Safe-level-SMOTE     0.6770  0.7074  0.6913  0.3160
                            C-SMOTE              0.6564  0.6936  0.6764  0.3448
                            k-means-SMOTE        0.6796  0.6941  0.6846  0.3599
                            CURE-SMOTE           0.6855  0.7155  0.6951  0.1108
  1. In the classification results of the different sampling algorithms shown in Table 4, CURE-SMOTE achieves the best F-value, G-mean and AUC on the Circle dataset, and its OOB error is the second-lowest among the sampling algorithms (tied with k-means-SMOTE), behind only random oversampling. Overall classification performance on the blood-transfusion dataset is poorer, but CURE-SMOTE still achieves the best F-value, G-mean and AUC; its OOB error, however, is only marginally lower than that of the original data and higher than that of the other sampling algorithms. On the Haberman's survival dataset, the F-value, G-mean and AUC achieved by CURE-SMOTE are superior to those of the other sampling algorithms. On the breast-cancer-wisconsin dataset, CURE-SMOTE achieves the best F-value, while its G-mean and AUC are slightly below the best values, although the differences are small. On the SPECT.train dataset, CURE-SMOTE surpasses the other sampling algorithms in F-value, G-mean, AUC and OOB error.
  2. The best value of each performance evaluation criterion obtained by the algorithms is marked in boldface.
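The per-dataset comparisons in the notes above can be checked mechanically. The following is a minimal sketch, not part of the original article: it copies the Circle rows from Table 4 and picks the best method for each criterion, assuming that higher is better for F, G-Mean and AUC and that lower is better for OOB error. The names `CIRCLE`, `METRICS` and `best_per_metric` are hypothetical helpers introduced only for illustration.

```python
# Rows copied from the "1. Circle" block of Table 4:
# (method, F, G-Mean, AUC, OOB error)
CIRCLE = [
    ("Original data",       0.9081, 0.9339, 0.9389, 0.0296),
    ("Random oversampling", 0.9249, 0.9553, 0.9567, 0.0163),
    ("SMOTE",               0.9086, 0.9535, 0.9579, 0.0384),
    ("Borderline-SMOTE1",   0.9110, 0.9534, 0.9619, 0.0438),
    ("Safe-level-SMOTE",    0.9146, 0.9595, 0.9559, 0.0431),
    ("C-SMOTE",             0.9302, 0.9713, 0.9813, 0.0702),
    ("k-means-SMOTE",       0.9262, 0.9589, 0.9602, 0.0323),
    ("CURE-SMOTE",          0.9431, 0.9808, 0.9855, 0.0323),
]

# Assumption: F, G-Mean and AUC are maximized; OOB error is minimized.
# Each entry is (criterion name, column index in a row, selector function).
METRICS = [
    ("F", 1, max),
    ("G-Mean", 2, max),
    ("AUC", 3, max),
    ("OOB error", 4, min),
]


def best_per_metric(rows):
    """Return {criterion: (best method, best value)} for one dataset block."""
    best = {}
    for name, col, pick in METRICS:
        # On ties, max/min keep the first matching row in table order.
        row = pick(rows, key=lambda r: r[col])
        best[name] = (row[0], row[col])
    return best


if __name__ == "__main__":
    for criterion, (method, value) in best_per_metric(CIRCLE).items():
        print(f"{criterion}: {method} ({value:.4f})")
```

The same function can be applied to the other four dataset blocks by copying their rows in the same column order. Ties, such as the equal OOB errors of k-means-SMOTE and CURE-SMOTE on Circle, resolve to whichever row appears first.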