Table 4 The classification results of different sampling algorithms

From: CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests

| Dataset | Method | F | G-Mean | AUC | OOB error |
|---|---|---|---|---|---|
| 1. Circle | Original data | 0.9081 | 0.9339 | 0.9389 | 0.0296 |
| | Random oversampling | 0.9249 | 0.9553 | 0.9567 | 0.0163 |
| | SMOTE | 0.9086 | 0.9535 | 0.9579 | 0.0384 |
| | Borderline-SMOTE1 | 0.9110 | 0.9534 | 0.9619 | 0.0438 |
| | Safe-level-SMOTE | 0.9146 | 0.9595 | 0.9559 | 0.0431 |
| | C-SMOTE | 0.9302 | 0.9713 | 0.9813 | 0.0702 |
| | k-means-SMOTE | 0.9262 | 0.9589 | 0.9602 | 0.0323 |
| | CURE-SMOTE | 0.9431 | 0.9808 | 0.9855 | 0.0323 |
| 2. Blood-transfusion | Original data | 0.3509 | 0.5094 | 0.5083 | 0.2548 |
| | Random oversampling | 0.3903 | 0.5490 | 0.5449 | 0.2250 |
| | SMOTE | 0.4118 | 0.5798 | 0.5537 | 0.2152 |
| | Borderline-SMOTE1 | 0.4185 | 0.5832 | 0.5424 | 0.1630 |
| | Safe-level-SMOTE | 0.4494 | 0.6174 | 0.5549 | 0.2479 |
| | C-SMOTE | 0.4006 | 0.5549 | 0.5531 | 0.2418 |
| | k-means-SMOTE | 0.4157 | 0.5941 | 0.5433 | 0.1872 |
| | CURE-SMOTE | 0.5393 | 0.6719 | 0.6533 | 0.2531 |
| 3. Haberman’s survival | Original data | 0.3279 | 0.5018 | 0.6063 | 0.3149 |
| | Random oversampling | 0.3504 | 0.5178 | 0.5959 | 0.1534 |
| | SMOTE | 0.4350 | 0.5971 | 0.6259 | 0.1728 |
| | Borderline-SMOTE1 | 0.4523 | 0.6119 | 0.6298 | 0.2589 |
| | Safe-level-SMOTE | 0.4762 | 0.6008 | 0.6030 | 0.3077 |
| | C-SMOTE | 0.4528 | 0.5487 | 0.5656 | 0.2780 |
| | k-means-SMOTE | 0.4685 | 0.6249 | 0.6328 | 0.1828 |
| | CURE-SMOTE | 0.5000 | 0.6282 | 0.6940 | 0.2717 |
| 4. Breast-cancer-wisconsin | Original data | 0.9486 | 0.9619 | 0.9491 | 0.0446 |
| | Random oversampling | 0.9451 | 0.9623 | 0.9620 | 0.0301 |
| | SMOTE | 0.9502 | 0.9666 | 0.9627 | 0.0341 |
| | Borderline-SMOTE1 | 0.9506 | 0.9661 | 0.9635 | 0.0379 |
| | Safe-level-SMOTE | 0.9509 | 0.9671 | 0.9638 | 0.0404 |
| | C-SMOTE | 0.9491 | 0.9636 | 0.9561 | 0.0380 |
| | k-means-SMOTE | 0.9449 | 0.9616 | 0.9562 | 0.0373 |
| | CURE-SMOTE | 0.9511 | 0.9664 | 0.9621 | 0.0427 |
| 5. SPECT.train | Original data | 0.6348 | 0.6764 | 0.6579 | 0.3634 |
| | Random oversampling | 0.6539 | 0.6924 | 0.6753 | 0.3468 |
| | SMOTE | 0.6618 | 0.6990 | 0.6825 | 0.3688 |
| | Borderline-SMOTE1 | 0.6710 | 0.6926 | 0.6746 | 0.3489 |
| | Safe-level-SMOTE | 0.6770 | 0.7074 | 0.6913 | 0.3160 |
| | C-SMOTE | 0.6564 | 0.6936 | 0.6764 | 0.3448 |
| | k-means-SMOTE | 0.6796 | 0.6941 | 0.6846 | 0.3599 |
| | CURE-SMOTE | 0.6855 | 0.7155 | 0.6951 | 0.1108 |

  1. Among the sampling algorithms compared in Table 4, CURE-SMOTE achieves the best F-value, G-mean and AUC on the Circle dataset, and its OOB error (0.0323, tied with k-means-SMOTE) is the second-lowest among the sampling algorithms, behind only random oversampling (0.0163). Classification performance on the blood-transfusion dataset is poorer overall, but CURE-SMOTE still achieves the best F-value, G-mean and AUC, while its OOB error (0.2531) is only marginally lower than that of the original data (0.2548). On the Haberman's survival dataset, CURE-SMOTE achieves a higher F-value, G-mean and AUC than the other sampling algorithms. On the breast-cancer-wisconsin dataset, CURE-SMOTE achieves the best F-value; its G-mean and AUC are slightly below the best values but differ little from those of the other sampling algorithms. On the SPECT dataset, CURE-SMOTE surpasses the other sampling algorithms in F-value, G-mean, AUC and OOB error.
  2. The best value of each performance evaluation criterion obtained by the algorithms is marked in boldface.
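
The comparisons in the notes above can be checked mechanically. The following is a minimal sketch, not part of the original article: it transcribes the Table 4 values into a dictionary and reports, for each dataset, which sampling algorithm scores best on each criterion. The names `TABLE4`, `METRICS` and `best_method` are hypothetical helpers introduced only for this illustration, and the sketch assumes that higher is better for F, G-Mean and AUC while lower is better for OOB error.

```python
# Values transcribed from Table 4: (F, G-Mean, AUC, OOB error) per method.
METRICS = ("F", "G-Mean", "AUC", "OOB error")

TABLE4 = {
    "Circle": {
        "Original data": (0.9081, 0.9339, 0.9389, 0.0296),
        "Random oversampling": (0.9249, 0.9553, 0.9567, 0.0163),
        "SMOTE": (0.9086, 0.9535, 0.9579, 0.0384),
        "Borderline-SMOTE1": (0.9110, 0.9534, 0.9619, 0.0438),
        "Safe-level-SMOTE": (0.9146, 0.9595, 0.9559, 0.0431),
        "C-SMOTE": (0.9302, 0.9713, 0.9813, 0.0702),
        "k-means-SMOTE": (0.9262, 0.9589, 0.9602, 0.0323),
        "CURE-SMOTE": (0.9431, 0.9808, 0.9855, 0.0323),
    },
    "Blood-transfusion": {
        "Original data": (0.3509, 0.5094, 0.5083, 0.2548),
        "Random oversampling": (0.3903, 0.5490, 0.5449, 0.2250),
        "SMOTE": (0.4118, 0.5798, 0.5537, 0.2152),
        "Borderline-SMOTE1": (0.4185, 0.5832, 0.5424, 0.1630),
        "Safe-level-SMOTE": (0.4494, 0.6174, 0.5549, 0.2479),
        "C-SMOTE": (0.4006, 0.5549, 0.5531, 0.2418),
        "k-means-SMOTE": (0.4157, 0.5941, 0.5433, 0.1872),
        "CURE-SMOTE": (0.5393, 0.6719, 0.6533, 0.2531),
    },
    "Haberman's survival": {
        "Original data": (0.3279, 0.5018, 0.6063, 0.3149),
        "Random oversampling": (0.3504, 0.5178, 0.5959, 0.1534),
        "SMOTE": (0.4350, 0.5971, 0.6259, 0.1728),
        "Borderline-SMOTE1": (0.4523, 0.6119, 0.6298, 0.2589),
        "Safe-level-SMOTE": (0.4762, 0.6008, 0.6030, 0.3077),
        "C-SMOTE": (0.4528, 0.5487, 0.5656, 0.2780),
        "k-means-SMOTE": (0.4685, 0.6249, 0.6328, 0.1828),
        "CURE-SMOTE": (0.5000, 0.6282, 0.6940, 0.2717),
    },
    "Breast-cancer-wisconsin": {
        "Original data": (0.9486, 0.9619, 0.9491, 0.0446),
        "Random oversampling": (0.9451, 0.9623, 0.9620, 0.0301),
        "SMOTE": (0.9502, 0.9666, 0.9627, 0.0341),
        "Borderline-SMOTE1": (0.9506, 0.9661, 0.9635, 0.0379),
        "Safe-level-SMOTE": (0.9509, 0.9671, 0.9638, 0.0404),
        "C-SMOTE": (0.9491, 0.9636, 0.9561, 0.0380),
        "k-means-SMOTE": (0.9449, 0.9616, 0.9562, 0.0373),
        "CURE-SMOTE": (0.9511, 0.9664, 0.9621, 0.0427),
    },
    "SPECT.train": {
        "Original data": (0.6348, 0.6764, 0.6579, 0.3634),
        "Random oversampling": (0.6539, 0.6924, 0.6753, 0.3468),
        "SMOTE": (0.6618, 0.6990, 0.6825, 0.3688),
        "Borderline-SMOTE1": (0.6710, 0.6926, 0.6746, 0.3489),
        "Safe-level-SMOTE": (0.6770, 0.7074, 0.6913, 0.3160),
        "C-SMOTE": (0.6564, 0.6936, 0.6764, 0.3448),
        "k-means-SMOTE": (0.6796, 0.6941, 0.6846, 0.3599),
        "CURE-SMOTE": (0.6855, 0.7155, 0.6951, 0.1108),
    },
}


def best_method(dataset, metric, include_original=False):
    """Return the method with the best score for one dataset and criterion.

    Assumption: lower is better for "OOB error", higher is better otherwise.
    The unsampled baseline ("Original data") is excluded unless requested.
    """
    col = METRICS.index(metric)
    scores = {
        method: values[col]
        for method, values in TABLE4[dataset].items()
        if include_original or method != "Original data"
    }
    pick = min if metric == "OOB error" else max
    return pick(scores, key=scores.get)


if __name__ == "__main__":
    for dataset in TABLE4:
        winners = {metric: best_method(dataset, metric) for metric in METRICS}
        print(dataset, winners)
```

Run this way, the tally agrees with the notes: CURE-SMOTE leads on F-value for all five datasets, while the lowest OOB error goes to CURE-SMOTE only on SPECT.train.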