Application of an efficient Bayesian discretization method to biomedical data

BMC Bioinformatics

Table 1 Description of datasets

Dataset	Dataset name	Type	P/D	#t	#n	#V	M
1	Alon et al.	T	D	2	61	6,584	0.651
2	Armstrong et al.	T	D	3	72	12,582	0.387
3	Beer et al.	T	P	2	86	5,372	0.795
4	Bhattacharjee et al.	T	D	7	203	12,600	0.657
5	Bhattacharjee et al.	T	P	2	69	5,372	0.746
6	Golub et al.	T	D	2	72	7,129	0.653
7	Hedenfalk et al.	T	D	2	36	7,464	0.500
8	Iizuka et al.	T	P	2	60	7,129	0.661
9	Khan et al.	T	D	4	83	2,308	0.345
10	Nutt et al.	T	D	4	50	12,625	0.296
11	Pomeroy et al.	T	D	5	90	7,129	0.642
12	Pomeroy et al.	T	P	2	60	7,129	0.645
13	Ramaswamy et al.	T	D	29	280	16,063	0.100
14	Rosenwald et al.	T	P	2	240	7,399	0.574
15	Staunton et al.	T	D	9	60	7,129	0.145
16	Shipp et al.	T	D	2	77	7,129	0.747
17	Su et al.	T	D	13	174	12,533	0.150
18	Singh et al.	T	D	2	102	10,510	0.510
19	Veer et al.	T	P	2	78	24,481	0.562
20	Welsch et al.	T	D	2	39	7,039	0.878
21	Yeoh et al.	T	P	2	249	12,625	0.805
22	Petricoin et al.	P	D	2	322	11,003	0.784
23	Pusztai et al.	P	D	3	159	11,170	0.364
24	Ranganathan et al.	P	D	2	52	36,778	0.556

In the Type column, T denotes transcriptomic and P denotes proteomic. In the P/D column, P denotes prognostic and D denotes diagnostic. #t is the number of distinct values of the target variable, #n is the number of instances in the dataset, and #V is the number of predictor variables. M is the proportion of instances that have the majority target value.
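
As an illustration of how the #t, #n, and M columns relate to a dataset's target variable, the following minimal Python sketch computes the three quantities from a list of target labels. The function name `describe_target` and the example labels are hypothetical and are not taken from the paper or from any dataset in Table 1.

```python
from collections import Counter


def describe_target(labels):
    """Return (#t, #n, M) for a sequence of target labels.

    #t: number of distinct target values
    #n: number of instances
    M:  proportion of instances carrying the majority target value
    """
    if len(labels) == 0:
        raise ValueError("labels must contain at least one instance")
    counts = Counter(labels)               # instances per target value
    n_instances = len(labels)
    n_target_values = len(counts)
    majority_count = max(counts.values())  # size of the largest class
    return n_target_values, n_instances, majority_count / n_instances


# Hypothetical labels for illustration only (not from Table 1).
example_labels = ["tumor"] * 6 + ["normal"] * 4
summary = describe_target(example_labels)
```

Under this definition, M is also the accuracy of a classifier that always predicts the majority target value, which makes it a natural baseline when comparing classifiers across the datasets.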

ISSN: 1471-2105