Table 1 Datasets.

From: Missing value imputation improves clustering and interpretation of gene expression microarray data

Name         N   M     M_C   M_F   MV SD  MV    Type  PC1
Brauer05     19  6256  3924  3066  4.0    6.7%  MT    54.9%
Ronen05      26  7070  4916  2695  3.2    3.8%  MT    51.1%
Spahira04A   23  4771  2970  2090  3.9    2.7%  TS    62.0%
Spahira04B   14  4771  3340  2898  4.2    3.0%  TS    54.1%
Hirao03      8   6229  5913  259   0.7    0.9%  SS    43.3%
Yoshimoto02  24  6102  4379  2323  1.9    3.2%  MT    64.7%
Wyrick99     7   6180  6169  3600  0.0    0.0%  TS    61.3%
Spellman98E  14  6075  5766  1094  0.4    0.4%  TS    39.9%

  1. N is the number of measurements (columns of the observation matrix) and M is the number of genes (rows). M_C is the number of genes without missing values in the complete dataset, and M_F is the number of genes remaining after filtering. MV SD is the standard deviation of the missing value distribution over the measurements, and MV is the percentage of missing values. Type indicates whether the dataset is a time series (TS), steady state (SS), or mixed type (MT), i.e., multiple time series measured under different experimental conditions. PC1 gives the proportion of total variance explained by the first principal component; a high value indicates strong correlation structure among the genes [31].
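
As a rough illustration of how the summary columns could be derived from an expression matrix, the following minimal sketch computes N, M, the number of complete genes, MV, MV SD, and PC1 for a toy dataset. It is not the authors' code. The function names (summarize, pc1_share) are hypothetical, and several choices are assumptions not stated in the footnote: MV SD is taken as the sample standard deviation of the per-measurement missing percentages, PC1 is computed on the complete genes only after centering each measurement, and the leading component is found by power iteration. The filtering step behind the M F column is not described here and is left out.

```python
import math
import statistics


def pc1_share(rows, iterations=200):
    """Percent of total variance carried by the first principal component.

    rows: complete gene rows (no missing values), one value per measurement.
    Assumption: each measurement (column) is centered before the analysis.
    """
    n = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n)]
    centered = [[r[j] - means[j] for j in range(n)] for r in rows]
    # Scatter matrix X^T X (N x N); its normalisation cancels in the ratio.
    scatter = [[sum(r[a] * r[b] for r in centered) for b in range(n)]
               for a in range(n)]
    total = sum(scatter[a][a] for a in range(n))
    if total == 0:
        return 0.0
    # Power iteration for the leading eigenvector. Assumes the start vector
    # is not orthogonal to the leading component.
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(scatter[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the leading eigenvalue.
    top = sum(v[a] * sum(scatter[a][b] * v[b] for b in range(n))
              for a in range(n))
    return 100.0 * top / total


def summarize(matrix):
    """matrix: list of gene rows; None marks a missing value."""
    n_genes = len(matrix)
    n_meas = len(matrix[0])
    # Percentage of missing entries within each measurement (column).
    col_missing = [
        100.0 * sum(row[j] is None for row in matrix) / n_genes
        for j in range(n_meas)
    ]
    complete = [row for row in matrix if None not in row]
    return {
        "N": n_meas,
        "M": n_genes,
        "M_C": len(complete),
        # Every column has the same length, so the mean of the column
        # percentages equals the overall missing percentage.
        "MV": sum(col_missing) / n_meas,
        # Assumption: sample standard deviation across measurements.
        "MV_SD": statistics.stdev(col_missing),
        "PC1": pc1_share(complete),
    }


# Toy data: 4 genes x 3 measurements, one missing entry.
toy = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 6.0, 9.0],
    [None, 1.0, 1.0],
]
summary = summarize(toy)
```

In the toy data the three complete genes are exact multiples of one another, so nearly all of the variance falls on the first component. Real datasets such as those in the table show much lower PC1 values.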