Table 2 Description of the 20 disease expression data sets

From: Random generalized linear model: a highly accurate and interpretable ensemble predictor

| Data set | Samples | Features | Reference | Data set ID | Binary outcome |
|---|---|---|---|---|---|
| adenocarcinoma | 76 | 9868 | [32] | NA | most prevalent class vs others |
| brain | 42 | 5597 | [33] | NA | most prevalent class vs others |
| breast2 | 77 | 4869 | [34] | NA | most prevalent class vs others |
| breast3 | 95 | 4869 | [34] | NA | most prevalent class vs others |
| colon | 62 | 2000 | [35] | NA | most prevalent class vs others |
| leukemia | 38 | 3051 | [36] | NA | most prevalent class vs others |
| lymphoma | 62 | 4026 | [37] | NA | most prevalent class vs others |
| NCI60 | 61 | 5244 | [38] | NA | most prevalent class vs others |
| prostate | 102 | 6033 | [39] | NA | most prevalent class vs others |
| srbct | 63 | 2308 | [40] | NA | most prevalent class vs others |
| BrainTumor2 | 50 | 10367 | [41] | NA | Anaplastic oligodendrogliomas vs Glioblastomas |
| DLBCL | 77 | 5469 | [42] | NA | follicular lymphoma vs diffuse large B-cell lymphoma |
| lung1 | 58 | 10000 | [43] | GSE10245 | Adenocarcinoma vs Squamous cell carcinoma |
| lung2 | 46 | 10000 | [44] | GSE18842 | Adenocarcinoma vs Squamous cell carcinoma |
| lung3 | 71 | 10000 | [45] | GSE2109 | Adenocarcinoma vs Squamous cell carcinoma |
| psoriasis1 | 180 | 10000 | [46, 47] | GSE13355 | lesional vs healthy skin |
| psoriasis2 | 82 | 10000 | [48] | GSE14905 | lesional vs healthy skin |
| MSstage | 26 | 10000 | [49] | E-MTAB-69 | relapsing vs remitting RRMS |
| MSdiagnosis1 | 27 | 10000 | [50] | GSE21942 | RRMS vs healthy control |
| MSdiagnosis2 | 44 | 10000 | [49] | E-MTAB-69 | RRMS vs healthy control |

Sample size, number of features, original reference, data set IDs and binary outcomes for the 20 disease-related gene expression data sets.