Table 2 Description of the 20 disease expression data sets

From: Random generalized linear model: a highly accurate and interpretable ensemble predictor

| Data set | Samples | Features | Reference | Data set ID | Binary outcome |
|---|---|---|---|---|---|
| adenocarcinoma | 76 | 9868 | [32] | NA | most prevalent class vs others |
| brain | 42 | 5597 | [33] | NA | most prevalent class vs others |
| breast2 | 77 | 4869 | [34] | NA | most prevalent class vs others |
| breast3 | 95 | 4869 | [34] | NA | most prevalent class vs others |
| colon | 62 | 2000 | [35] | NA | most prevalent class vs others |
| leukemia | 38 | 3051 | [36] | NA | most prevalent class vs others |
| lymphoma | 62 | 4026 | [37] | NA | most prevalent class vs others |
| NCI60 | 61 | 5244 | [38] | NA | most prevalent class vs others |
| prostate | 102 | 6033 | [39] | NA | most prevalent class vs others |
| srbct | 63 | 2308 | [40] | NA | most prevalent class vs others |
| BrainTumor2 | 50 | 10367 | [41] | NA | Anaplastic oligodendrogliomas vs Glioblastomas |
| DLBCL | 77 | 5469 | [42] | NA | follicular lymphoma vs diffuse large B-cell lymphoma |
| lung1 | 58 | 10000 | [43] | GSE10245 | Adenocarcinoma vs Squamous cell carcinoma |
| lung2 | 46 | 10000 | [44] | GSE18842 | Adenocarcinoma vs Squamous cell carcinoma |
| lung3 | 71 | 10000 | [45] | GSE2109 | Adenocarcinoma vs Squamous cell carcinoma |
| psoriasis1 | 180 | 10000 | [46, 47] | GSE13355 | lesional vs healthy skin |
| psoriasis2 | 82 | 10000 | [48] | GSE14905 | lesional vs healthy skin |
| MSstage | 26 | 10000 | [49] | E-MTAB-69 | relapsing vs remitting RRMS |
| MSdiagnosis1 | 27 | 10000 | [50] | GSE21942 | RRMS vs healthy control |
| MSdiagnosis2 | 44 | 10000 | [49] | E-MTAB-69 | RRMS vs healthy control |
Sample size, number of features, original reference, data set IDs, and outcomes for the 20 disease-related gene expression data sets.
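
For the first ten data sets, the binary outcome is described as "most prevalent class vs others". The following is a minimal sketch of one way such a dichotomization could be done. It is an illustration under stated assumptions, not the authors' code: the function name binarize_most_prevalent and the example labels are hypothetical, and the tie-breaking rule (first label encountered wins) is an assumption that the table does not specify.

```python
from collections import Counter


def binarize_most_prevalent(labels):
    """Code the most prevalent class as 1 and every other class as 0.

    Assumption: if two classes are equally frequent, the one that
    appears first in `labels` is treated as the most prevalent.
    """
    counts = Counter(labels)
    # most_common(1) returns a one-element list of (label, count)
    top_label, _ = counts.most_common(1)[0]
    binary = [1 if y == top_label else 0 for y in labels]
    return top_label, binary


# Hypothetical class labels, not taken from any of the 20 data sets
labels = ["A", "B", "A", "C", "A", "B"]
top, binary = binarize_most_prevalent(labels)
print(top, binary)
```

With these made-up labels, class "A" occurs three times, so it is coded 1 and classes "B" and "C" are merged into the 0 group.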