
Table 1 Description of datasets

From: Application of an efficient Bayesian discretization method to biomedical data

Dataset  Dataset name          Type  P/D  #t  #n   #V      M
1        Alon et al.           T     D    2   61   6,584   0.651
2        Armstrong et al.      T     D    3   72   12,582  0.387
3        Beer et al.           T     P    2   86   5,372   0.795
4        Bhattacharjee et al.  T     D    7   203  12,600  0.657
5        Bhattacharjee et al.  T     P    2   69   5,372   0.746
6        Golub et al.          T     D    2   72   7,129   0.653
7        Hedenfalk et al.      T     D    2   36   7,464   0.500
8        Iizuka et al.         T     P    2   60   7,129   0.661
9        Khan et al.           T     D    4   83   2,308   0.345
10       Nutt et al.           T     D    4   50   12,625  0.296
11       Pomeroy et al.        T     D    5   90   7,129   0.642
12       Pomeroy et al.        T     P    2   60   7,129   0.645
13       Ramaswamy et al.      T     D    29  280  16,063  0.100
14       Rosenwald et al.      T     P    2   240  7,399   0.574
15       Staunton et al.       T     D    9   60   7,129   0.145
16       Shipp et al.          T     D    2   77   7,129   0.747
17       Su et al.             T     D    13  174  12,533  0.150
18       Singh et al.          T     D    2   102  10,510  0.510
19       Veer et al.           T     P    2   78   24,481  0.562
20       Welsch et al.         T     D    2   39   7,039   0.878
21       Yeoh et al.           T     P    2   249  12,625  0.805
22       Petricoin et al.      P     D    2   322  11,003  0.784
23       Pusztai et al.        P     D    3   159  11,170  0.364
24       Ranganathan et al.    P     D    2   52   36,778  0.556
1. In the Type column, T denotes transcriptomic and P denotes proteomic. In the P/D column, P denotes prognostic and D denotes diagnostic. #t is the number of distinct values (classes) of the target variable, #n is the number of instances in the dataset, and #V is the number of predictor variables. M is the proportion of instances that have the majority target value.
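The footnote defines #t, #n and M in words. The following is a minimal sketch of how those three columns could be computed from a dataset's vector of target labels. The function name `describe_target` and the example labels are hypothetical illustrations, not part of the original study. M is the fraction of instances in the most frequent class, so it also equals the accuracy of always predicting the majority value.

```python
from collections import Counter


def describe_target(labels):
    """Return (#t, #n, M) for a sequence of target values (hypothetical helper).

    #t: number of distinct target values (classes)
    #n: number of instances
    M:  proportion of instances that have the majority target value
    """
    n = len(labels)
    if n == 0:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)           # class -> number of instances
    t = len(counts)                    # number of distinct target values
    m = max(counts.values()) / n       # majority-class proportion
    return t, n, m


# Hypothetical labels: two classes of 18 instances each, which gives the same
# #t, #n and M values as row 7 (2, 36, 0.500).
labels = ["A"] * 18 + ["B"] * 18
print(describe_target(labels))  # (2, 36, 0.5)
```

The table reports M rounded to three decimals, so `round(m, 3)` would reproduce that format.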