Table 1 Cancer datasets with missing values

From: Impact of missing data imputation methods on gene expression clustering and classification

| Dataset | Tissue | No. classes | Size of classes | No. samples | No. genes (original data) | % MV (original data) | % Genes with MV (original data) | No. genes (MV filtering) | % MV (MV filtering) | % Genes with MV (MV filtering) |
|---|---|---|---|---|---|---|---|---|---|---|
| alizadeh-2000-v1 | Blood | 2 | 21, 21 | 42 | 4022 | 3.25 | 49.30 | 3678 | 2.15 | 44.56 |
| alizadeh-2000-v2 | Blood | 3 | 42, 9, 11 | 62 | 4022 | 4.59 | 66.93 | 3369 | 2.75 | 60.52 |
| alizadeh-2000-v3 | Blood | 4 | 21, 21, 9, 11 | 62 | 4022 | 4.59 | 66.93 | 3369 | 2.75 | 60.52 |
| bredel-2005 | Brain | 3 | 31, 14, 5 | 50 | 41472 | 7.57 | 43.06 | 19200 | 3.25 | 30.56 |
| chen-2002 | Liver | 2 | 104, 75 | 179 | 24192 | 6.04 | 88.46 | 22336 | 2.18 | 85.46 |
| garber-2001 | Lung | 4 | 17, 40, 4, 5 | 66 | 24192 | 3.87 | 67.81 | 36663 | 2.23 | 65.14 |
| lapointe-2004-v1 | Prostate | 3 | 11, 39, 19 | 69 | 42640 | 4.56 | 73.57 | 35265 | 2.10 | 69.26 |
| lapointe-2004-v2 | Prostate | 4 | 11, 39, 19, 41 | 110 | 42640 | 4.93 | 67.16 | 36663 | 2.23 | 60.29 |
| liang-2005 | Brain | 3 | 28, 6, 3 | 37 | 42640 | 4.56 | 73.57 | 22923 | 0.82 | 23.16 |
| risinger-2003 | Endometrium | 4 | 13, 3, 19, 7 | 42 | 24192 | 7.97 | 74.33 | 8366 | 0.76 | 20.76 |
| tomlins-2006 | Prostate | 5 | 27, 20, 32, 13, 12 | 104 | 8872 | 4.46 | 89.34 | 9936 | 3.27 | 80.94 |
| tomlins-2006-v2 | Prostate | 4 | 27, 20, 32, 13 | 92 | 20001 | 4.04 | 84.23 | 10048 | 3.34 | 79.72 |
| Mean | | | | | 23575 | 5.04 | 70.39 | 17651 | 2.32 | 56.74 |
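
The two missing-value (MV) statistics reported in the table can be illustrated with a minimal Python sketch. It assumes a genes × samples matrix in which a missing value is encoded as NaN, takes "% MV" to be the share of missing cells in the whole matrix, and takes "% Genes with MV" to be the share of genes with at least one missing cell. The function names `mv_stats` and `filter_genes`, the toy matrix, and the 25% filtering threshold are hypothetical and chosen only for illustration; the table itself does not state the filtering rule that was used.

```python
import math


def mv_stats(matrix):
    """Return (% MV, % genes with MV) for a genes x samples matrix.

    Missing values are assumed to be encoded as float('nan').
    """
    n_genes = len(matrix)
    n_cells = sum(len(row) for row in matrix)
    missing_per_gene = [sum(1 for v in row if math.isnan(v)) for row in matrix]
    pct_mv = 100.0 * sum(missing_per_gene) / n_cells
    pct_genes_with_mv = 100.0 * sum(1 for m in missing_per_gene if m > 0) / n_genes
    return pct_mv, pct_genes_with_mv


def filter_genes(matrix, max_missing_fraction):
    """Keep genes whose fraction of missing samples does not exceed the threshold.

    The threshold is a hypothetical parameter, not a value taken from the table.
    """
    return [
        row
        for row in matrix
        if sum(1 for v in row if math.isnan(v)) / len(row) <= max_missing_fraction
    ]


nan = math.nan
# Toy matrix: 4 genes (rows) x 4 samples (columns).
toy = [
    [1.0, 2.0, 3.0, 4.0],
    [nan, 2.0, 3.0, 4.0],
    [nan, nan, nan, 4.0],
    [1.0, 2.0, 3.0, 4.0],
]

before = mv_stats(toy)                                  # statistics on the original data
filtered = filter_genes(toy, max_missing_fraction=0.25)  # drops the gene with 3 of 4 values missing
after = mv_stats(filtered)                              # statistics after MV filtering

print(before, len(filtered), after)
```

On the toy matrix, filtering removes the one gene that is mostly missing, so both statistics fall, which mirrors the direction of change between the original-data and MV-filtering columns in the table.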