Table 6 p-values from one-tailed paired t-tests of whether DI2 achieves higher predictive accuracy than alternative discretization procedures and than the original data, considering 5 categories per variable (complementary information in Additional file 3)

From: DI2: prior-free and multi-item discretization of biological data and its applications

                 DI2 (single)                              DI2 (single, optimized)
Classifier       K-means   Quantile  Uniform   Original    K-means   Quantile  Uniform   Original
Naïve Bayes      0.686     0.897     0.005     0.719       0.287     0.431     0.002     0.325
Random Forest    0.404     0.921     0.101     0.998       0.126     0.653     0.016     0.998
SMO              0.980     0.968     0.014     0.456       0.790     0.773     0.017     0.441
C4.5             0.500     0.345     0.044     0.965       0.230     0.194     0.013     0.891
MLRM             0.500     0.907     0.009     0.803       0.316     0.821     0.013     0.588
FleBiC           0.001     0.007     1.9E−08   -           2.1E−05   1.0E−04   6.7E−09   -
FleBiC Hybrid    5.4E−04   0.693     5.2E−05   -           0.030     0.873     2.0E−04   -

                 DI2 (whole)                               DI2 (whole, optimized)
Classifier       K-means   Quantile  Uniform   Original    K-means   Quantile  Uniform   Original
Naïve Bayes      0.948     0.991     0.020     0.965       0.662     0.822     0.004     0.712
Random Forest    0.066     0.426     0.012     0.992       0.074     0.666     0.195     0.999
SMO              0.906     0.914     0.042     0.641       0.805     0.813     0.026     0.406
C4.5             0.085     0.072     0.004     0.702       0.687     0.500     0.028     0.958
MLRM             0.952     0.986     0.148     0.993       0.721     0.896     0.047     0.942

                 DI2 (borders, single)                     DI2 (borders, single, optimized)
Classifier       K-means   Quantile  Uniform   Original    K-means   Quantile  Uniform   Original
FleBiC           8.0E−05   7.3E−05   1.5E−08   -           0.002     0.016     9.1E−08   -
FleBiC Hybrid    1.4E−05   0.001     4.3E−06   -           6.1E−04   0.084     1.0E−04   -

  1. DI2 is assessed with and without border values, on single columns and on the whole dataset, and with and without outlier removal
  2. Bold values indicate that the accuracy achieved using DI2 discretization is statistically superior to that achieved with the corresponding discretization
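
For readers who want to see how a p-value of the kind reported in this table can be obtained, the following is a minimal sketch of a one-tailed paired t-test in pure Python. It is not the authors' code. The function names (`t_pdf`, `t_sf`, `paired_t_greater`) and the accuracy values are hypothetical, and the pairing unit (for example one pair of accuracies per dataset or per fold) is an assumption, since the table does not state it. The tail probability is approximated by Simpson's rule rather than a library routine.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T >= t), using Simpson's rule on [0, |t|].

    steps must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # approximates P(0 <= T <= |t|)
    upper = 0.5 - area            # P(T >= |t|), by symmetry around 0
    return upper if t >= 0 else 1.0 - upper


def paired_t_greater(acc_a, acc_b):
    """One-tailed paired t-test with alternative 'mean(acc_a - acc_b) > 0'.

    Returns (t statistic, one-tailed p-value).
    """
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(acc_a, acc_b)]
    sd = stdev(diffs)             # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, t_sf(t, n - 1)


# Hypothetical accuracies, NOT taken from the paper: one pair per dataset.
acc_di2 = [0.90, 0.80, 0.85]
acc_other = [0.80, 0.75, 0.70]
t_stat, p_value = paired_t_greater(acc_di2, acc_other)
print(f"t = {t_stat:.3f}, one-tailed p = {p_value:.4f}")
```

A small p-value supports the claim that the first procedure is more accurate than the second. Because the test is one-tailed, a value close to 1 (as for several "Original" entries above) points in the opposite direction rather than merely indicating no difference. If SciPy is available, `scipy.stats.ttest_rel(acc_di2, acc_other, alternative="greater")` performs the same test.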