BMC Bioinformatics

Table 6 p-values from one-tailed paired t-tests of whether DI2 achieves higher predictive accuracy than alternative discretization procedures and the original data, considering 5 categories per variable (complementary information in Additional file 3)

From: DI2: prior-free and multi-item discretization of biological data and its applications

	DI2 (single)				DI2 (single, optimized)
	K-means	Quantile	Uniform	Original	K-means	Quantile	Uniform	Original
Naïve Bayes	0.686	0.897	0.005	0.719	0.287	0.431	0.002	0.325
Random Forest	0.404	0.921	0.101	0.998	0.126	0.653	0.016	0.998
SMO	0.980	0.968	0.014	0.456	0.790	0.773	0.017	0.441
C4.5	0.500	0.345	0.044	0.965	0.230	0.194	0.013	0.891
MLRM	0.500	0.907	0.009	0.803	0.316	0.821	0.013	0.588
FleBiC	0.001	0.007	1.9E−08	–	2.1E−05	1.0E−04	6.7E−09	–
FleBiC Hybrid	5.4E−04	0.693	5.2E−05	–	0.030	0.873	2.0E−04	–

	DI2 (whole)				DI2 (whole, optimized)
	K-means	Quantile	Uniform	Original	K-means	Quantile	Uniform	Original
Naïve Bayes	0.948	0.991	0.020	0.965	0.662	0.822	0.004	0.712
Random Forest	0.066	0.426	0.012	0.992	0.074	0.666	0.195	0.999
SMO	0.906	0.914	0.042	0.641	0.805	0.813	0.026	0.406
C4.5	0.085	0.072	0.004	0.702	0.687	0.500	0.028	0.958
MLRM	0.952	0.986	0.148	0.993	0.721	0.896	0.047	0.942

	DI2 (borders, single)				DI2 (borders, single, optimized)
	K-means	Quantile	Uniform	Original	K-means	Quantile	Uniform	Original
FleBiC	8.0E−05	7.3E−05	1.5E−08	–	0.002	0.016	9.1E−08	–
FleBiC Hybrid	1.4E−05	0.001	4.3E−06	–	6.1E−04	0.084	1.0E−04	–

DI2 is assessed with and without border values, on single columns and on the whole dataset, and with and without outlier removal.
Bold values indicate that the accuracy achieved with DI2 discretization is statistically superior to that of the corresponding alternative.
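
For readers unfamiliar with the test named in the caption, the following is a minimal sketch of a one-tailed paired t-test on two lists of accuracies measured over the same datasets. It is an illustration under stated assumptions, not the authors' implementation: the function names (`t_pdf`, `t_sf`, `paired_t_one_tailed`) and the accuracy values are hypothetical, and the tail probability is approximated by numerical integration using only the Python standard library.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_sf(t, df, steps=20000):
    """Upper-tail probability P(T > t), using Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3          # P(0 < T < |t|)
    upper = 0.5 - central            # P(T > |t|), by symmetry around 0
    return upper if t >= 0 else 1.0 - upper


def paired_t_one_tailed(acc_a, acc_b):
    """Return (t, p) for H1: mean(acc_a - acc_b) > 0 on paired observations.

    Raises ZeroDivisionError if all paired differences are identical.
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, t_sf(t, n - 1)


# Hypothetical accuracies of one classifier on five datasets (not from the paper).
acc_method_a = [0.82, 0.79, 0.91, 0.85, 0.77]
acc_method_b = [0.80, 0.78, 0.88, 0.85, 0.74]
t_stat, p_value = paired_t_one_tailed(acc_method_a, acc_method_b)
print(f"t = {t_stat:.3f}, one-tailed p = {p_value:.4f}")
```

Because the test is one-tailed, swapping the two lists gives the complementary p-value (1 minus the original). Under this reading, a p-value close to 1 would suggest that the difference runs in the opposite direction to the tested hypothesis, rather than merely being non-significant.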


ISSN: 1471-2105
