
Table 1 Cancer datasets with missing values

From: Impact of missing data imputation methods on gene expression clustering and classification

                                                                             Original data                      After MV filtering
Dataset           Tissue       No. classes  Class sizes         No. samples  No. genes  % MV   % genes with MV  No. genes  % MV   % genes with MV
alizadeh-2000-v1  Blood        2            21, 21              42           4022       3.25   49.30            3678       2.15   44.56
alizadeh-2000-v2  Blood        3            42, 9, 11           62           4022       4.59   66.93            3369       2.75   60.52
alizadeh-2000-v3  Blood        4            21, 21, 9, 11       62           4022       4.59   66.93            3369       2.75   60.52
bredel-2005       Brain        3            31, 14, 5           50           41472      7.57   43.06            19200      3.25   30.56
chen-2002         Liver        2            104, 75             179          24192      6.04   88.46            22336      2.18   85.46
garber-2001       Lung         4            17, 40, 4, 5        66           24192      3.87   67.81            36663      2.23   65.14
lapointe-2004-v1  Prostate     3            11, 39, 19          69           42640      4.56   73.57            35265      2.10   69.26
lapointe-2004-v2  Prostate     4            11, 39, 19, 41      110          42640      4.93   67.16            36663      2.23   60.29
liang-2005        Brain        3            28, 6, 3            37           42640      4.56   73.57            22923      0.82   23.16
risinger-2003     Endometrium  4            13, 3, 19, 7        42           24192      7.97   74.33            8366       0.76   20.76
tomlins-2006      Prostate     5            27, 20, 32, 13, 12  104          8872       4.46   89.34            9936       3.27   80.94
tomlins-2006-v2   Prostate     4            27, 20, 32, 13      92           20001      4.04   84.23            10048      3.34   79.72
Mean                                                                         23575      5.04   70.39            17651      2.32   56.74

MV: missing values.
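The two missing-value columns can be illustrated with a short computation. This is a minimal sketch under our own assumptions, not the authors' code: each dataset is taken to be a genes-by-samples matrix with missing entries stored as NaN, "% MV" is read as the share of all entries that are missing, and "% genes with MV" as the share of genes with at least one missing entry. The helper name `missing_value_stats` and the toy matrix are hypothetical. The last lines check the Mean row for the original % MV column as a plain average over the twelve datasets.

```python
import numpy as np


def missing_value_stats(expr: np.ndarray) -> tuple[float, float]:
    """Return (% MV, % genes with MV) for a genes-by-samples matrix.

    Assumption: rows are genes, columns are samples, and missing
    values are encoded as NaN.
    """
    mask = np.isnan(expr)
    # Share of all matrix entries that are missing.
    pct_mv = 100.0 * mask.sum() / mask.size
    # Share of genes (rows) with at least one missing entry.
    pct_genes_with_mv = 100.0 * mask.any(axis=1).sum() / mask.shape[0]
    return float(pct_mv), float(pct_genes_with_mv)


# Hypothetical 4-gene x 5-sample matrix, not taken from any dataset in the table.
toy = np.array([
    [1.2, np.nan, 0.3, 0.8, 1.1],
    [0.5, 0.7, 0.9, 1.0, 0.4],
    [np.nan, np.nan, 0.2, 0.6, 0.9],
    [0.1, 0.3, 0.5, 0.7, 0.2],
])
print(missing_value_stats(toy))  # (15.0, 50.0): 3 of 20 entries, 2 of 4 genes

# The Mean row as a plain average over the twelve datasets (original % MV column).
pct_mv_original = [3.25, 4.59, 4.59, 7.57, 6.04, 3.87,
                   4.56, 4.93, 4.56, 7.97, 4.46, 4.04]
print(round(sum(pct_mv_original) / len(pct_mv_original), 2))  # 5.04
```

Counting at the gene level explains why "% genes with MV" is much larger than "% MV" in every row: a single missing entry is enough to mark a whole gene.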