Table 1 Description of datasets

From: Application of an efficient Bayesian discretization method to biomedical data

| Dataset | Dataset name | Type | P/D | #t | #n | #V | M |
|---|---|---|---|---|---|---|---|
| 1 | Alon et al. | T | D | 2 | 61 | 6,584 | 0.651 |
| 2 | Armstrong et al. | T | D | 3 | 72 | 12,582 | 0.387 |
| 3 | Beer et al. | T | P | 2 | 86 | 5,372 | 0.795 |
| 4 | Bhattacharjee et al. | T | D | 7 | 203 | 12,600 | 0.657 |
| 5 | Bhattacharjee et al. | T | P | 2 | 69 | 5,372 | 0.746 |
| 6 | Golub et al. | T | D | 2 | 72 | 7,129 | 0.653 |
| 7 | Hedenfalk et al. | T | D | 2 | 36 | 7,464 | 0.500 |
| 8 | Iizuka et al. | T | P | 2 | 60 | 7,129 | 0.661 |
| 9 | Khan et al. | T | D | 4 | 83 | 2,308 | 0.345 |
| 10 | Nutt et al. | T | D | 4 | 50 | 12,625 | 0.296 |
| 11 | Pomeroy et al. | T | D | 5 | 90 | 7,129 | 0.642 |
| 12 | Pomeroy et al. | T | P | 2 | 60 | 7,129 | 0.645 |
| 13 | Ramaswamy et al. | T | D | 29 | 280 | 16,063 | 0.100 |
| 14 | Rosenwald et al. | T | P | 2 | 240 | 7,399 | 0.574 |
| 15 | Staunton et al. | T | D | 9 | 60 | 7,129 | 0.145 |
| 16 | Shipp et al. | T | D | 2 | 77 | 7,129 | 0.747 |
| 17 | Su et al. | T | D | 13 | 174 | 12,533 | 0.150 |
| 18 | Singh et al. | T | D | 2 | 102 | 10,510 | 0.510 |
| 19 | Veer et al. | T | P | 2 | 78 | 24,481 | 0.562 |
| 20 | Welsch et al. | T | D | 2 | 39 | 7,039 | 0.878 |
| 21 | Yeoh et al. | T | P | 2 | 249 | 12,625 | 0.805 |
| 22 | Petricoin et al. | P | D | 2 | 322 | 11,003 | 0.784 |
| 23 | Pusztai et al. | P | D | 3 | 159 | 11,170 | 0.364 |
| 24 | Ranganathan et al. | P | D | 2 | 52 | 36,778 | 0.556 |

  1. In the Type column, T denotes transcriptomic and P denotes proteomic. In the P/D column, P denotes prognostic and D denotes diagnostic. #t is the number of values of the target variable and #n is the number of instances in the dataset. #V is the number of predictor variables. M is the proportion of the data that has the majority target value.
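
As an illustration of the M column, the sketch below computes the majority-class proportion from a list of target labels. The function name `majority_proportion` and the label values are hypothetical and not taken from the paper. The example uses an evenly split two-class dataset of 36 instances, the situation implied by a row with #t = 2, #n = 36 and M = 0.500.

```python
from collections import Counter


def majority_proportion(labels):
    """Return the fraction of instances that carry the most frequent target value."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)  # target value -> number of instances
    return max(counts.values()) / len(labels)


# Hypothetical labels: 36 instances split evenly between two target values.
labels = ["class_a"] * 18 + ["class_b"] * 18

num_target_values = len(set(labels))  # corresponds to #t
num_instances = len(labels)           # corresponds to #n
m = majority_proportion(labels)       # corresponds to M
```

M is also the accuracy obtained by always predicting the majority target value, so it can serve as a baseline when reading classification results on these datasets.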