Table 6 Gathered p-values from statistically testing the superiority of DI2 with respect to predictive accuracy against alternative discretization procedures, and original data, using one-tailed paired t-test and considering 5 categories per variable (complementary information in Additional file 3)

From: DI2: prior-free and multi-item discretization of biological data and its applications

 

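| Classifier | DI2 (single) vs. K-means | DI2 (single) vs. Quantile | DI2 (single) vs. Uniform | DI2 (single) vs. Original | DI2 (single, optimized) vs. K-means | DI2 (single, optimized) vs. Quantile | DI2 (single, optimized) vs. Uniform | DI2 (single, optimized) vs. Original |
|---|---|---|---|---|---|---|---|---|
| Naïve Bayes | 0.686 | 0.897 | 0.005 | 0.719 | 0.287 | 0.431 | 0.002 | 0.325 |
| Random Forest | 0.404 | 0.921 | 0.101 | 0.998 | 0.126 | 0.653 | 0.016 | 0.998 |
| SMO | 0.980 | 0.968 | 0.014 | 0.456 | 0.790 | 0.773 | 0.017 | 0.441 |
| C4.5 | 0.500 | 0.345 | 0.044 | 0.965 | 0.230 | 0.194 | 0.013 | 0.891 |
| MLRM | 0.500 | 0.907 | 0.009 | 0.803 | 0.316 | 0.821 | 0.013 | 0.588 |
| FleBiC | 0.001 | 0.007 | 1.9E−08 | | 2.1E−05 | 1.0E−04 | 6.7E−09 | |
| FleBiC Hybrid | 5.4E−04 | 0.693 | 5.2E−05 | | 0.030 | 0.873 | 2.0E−04 | |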

 

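| Classifier | DI2 (whole) vs. K-means | DI2 (whole) vs. Quantile | DI2 (whole) vs. Uniform | DI2 (whole) vs. Original | DI2 (whole, optimized) vs. K-means | DI2 (whole, optimized) vs. Quantile | DI2 (whole, optimized) vs. Uniform | DI2 (whole, optimized) vs. Original |
|---|---|---|---|---|---|---|---|---|
| Naïve Bayes | 0.948 | 0.991 | 0.020 | 0.965 | 0.662 | 0.822 | 0.004 | 0.712 |
| Random Forest | 0.066 | 0.426 | 0.012 | 0.992 | 0.074 | 0.666 | 0.195 | 0.999 |
| SMO | 0.906 | 0.914 | 0.042 | 0.641 | 0.805 | 0.813 | 0.026 | 0.406 |
| C4.5 | 0.085 | 0.072 | 0.004 | 0.702 | 0.687 | 0.500 | 0.028 | 0.958 |
| MLRM | 0.952 | 0.986 | 0.148 | 0.993 | 0.721 | 0.896 | 0.047 | 0.942 |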

 

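| Classifier | DI2 (borders, single) vs. K-means | DI2 (borders, single) vs. Quantile | DI2 (borders, single) vs. Uniform | DI2 (borders, single) vs. Original | DI2 (borders, single, optimized) vs. K-means | DI2 (borders, single, optimized) vs. Quantile | DI2 (borders, single, optimized) vs. Uniform | DI2 (borders, single, optimized) vs. Original |
|---|---|---|---|---|---|---|---|---|
| FleBiC | 8.0E−05 | 7.3E−05 | 1.5E−08 | | 0.002 | 0.016 | 9.1E−08 | |
| FleBiC Hybrid | 1.4E−05 | 0.001 | 4.3E−06 | | 6.1E−04 | 0.084 | 1.0E−04 | |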

  1. DI2 is assessed with and without border values, on single columns and on the whole dataset, and with and without outlier removal
  2. Bold values indicate that the accuracy achieved using DI2 discretization is statistically superior to that achieved with the corresponding discretization
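
The caption states that each p-value comes from a one-tailed paired t-test of DI2's superiority in predictive accuracy. As a rough illustration of how such a p-value can be computed, the sketch below pairs accuracy scores from two procedures, forms the t statistic on their differences, and returns the upper-tail probability. The function names (`t_sf`, `paired_t_one_tailed`) and the accuracy figures are hypothetical and are not taken from the paper; the tail probability is obtained here by Simpson integration of the Student's t density, which is a convenience for keeping the example dependency-free and not necessarily how the authors computed their values.

```python
import math
from statistics import mean, stdev


def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T >= t) for Student's t with df degrees of freedom.

    Integrates the density from 0 to |t| with Simpson's rule (steps must be even).
    Intended for the small df typical of cross-validation folds.
    """
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    area = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    # The density is symmetric around 0, so the tail is 0.5 minus/plus the area.
    return 0.5 - area if t >= 0 else 0.5 + area


def paired_t_one_tailed(a, b):
    """One-tailed paired t-test of H1: mean(a) > mean(b).

    Returns (t statistic, p-value). a and b are paired measurements,
    e.g. accuracies of two procedures on the same datasets or folds.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, t_sf(t, len(diffs) - 1)


# Hypothetical accuracies for illustration only (not data from the paper).
acc_di2 = [0.81, 0.79, 0.84, 0.80, 0.83]
acc_uniform = [0.78, 0.80, 0.80, 0.79, 0.81]
t_stat, p_value = paired_t_one_tailed(acc_di2, acc_uniform)
print(f"t = {t_stat:.3f}, one-tailed p = {p_value:.3f}")
```

Because the test is one-tailed, a p-value close to 1 (as in several Quantile and Original columns) suggests the differences point in the opposite direction, not merely that no difference was detected.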