
Table 1 Summary of the 17 binary classification datasets used in this study

From: McTwo: a two-step feature selection algorithm based on maximal information coefficient

| ID | Dataset | Samples | Features | Summary |
|----|---------|---------|----------|---------|
| 1 | DLBCL | 77 | 7129 | DLBCL patients (58) and follicular lymphoma (19) |
| 2 | Pros (Prostate) | 102 | 12625 | prostate (52) and non-prostate (50) |
| 3 | Colon | 62 | 2000 | tumour (40) and normal (22) |
| 4 | Leuk (Leukaemia) | 72 | 7129 | ALL (47) and AML (25) |
| 5 | Mye (Myeloma) | 173 | 12625 | presence (137) and absence (36) of focal lesions of bone |
| 6 | ALL1 | 128 | 12625 | B-cell (95) and T-cell (33) |
| 7 | ALL2 | 100 | 12625 | patients that did (65) and did not (35) relapse |
| 8 | ALL3 | 125 | 12625 | with (24) and without (101) multidrug resistance |
| 9 | ALL4 | 93 | 12625 | with (26) and without (67) the t(9;22) chromosome translocation |
| 10 | CNS | 60 | 7129 | medulloblastoma survivors (39) and treatment failures (21) |
| 11 | Lym (Lymphoma) | 45 | 4026 | germinal centre (22) and activated B-like DLBCL (23) |
| 12 | Adeno (Adenoma) | 36 | 7457 | colon adenocarcinoma (18) and normal (18) |
| 13 | Gas (Gastric) | 65 | 22645 | tumors (29) and non-malignants (36) |
| 14 | Gas1 (Gastric1) | 144 | 22283 | non-cardia (72) of gastric and normal (72) |
| 15 | Gas2 (Gastric2) | 124 | 22283 | cardia (62) of gastric and normal (62) |
| 16 | T1D | 101 | 54675 | T1D (57) and healthy control (44) |
| 17 | Stroke | 40 | 54675 | ischemic stroke (20) and control (20) |
Column “Dataset” gives the dataset names used throughout this manuscript. Columns “Samples” and “Features” give the number of samples and features in each dataset, respectively. Column “Summary” describes the two sample classes, with the number of samples in each class given in parentheses. Details of each dataset and its original study may be found in the references listed in the column “Reference”.
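As a quick sanity check on the table, the two class counts in “Summary” should add up to the value in “Samples” for every dataset. The sketch below is not part of the original study; it simply transcribes the rows of Table 1 into a Python list and verifies that arithmetic. The names `DATASETS`, `check_class_counts`, and `minority_fraction` are hypothetical and chosen for illustration only.

```python
# Rows transcribed from Table 1 as:
# (dataset name, samples, features, first class count, second class count)
DATASETS = [
    ("DLBCL", 77, 7129, 58, 19),
    ("Pros", 102, 12625, 52, 50),
    ("Colon", 62, 2000, 40, 22),
    ("Leuk", 72, 7129, 47, 25),
    ("Mye", 173, 12625, 137, 36),
    ("ALL1", 128, 12625, 95, 33),
    ("ALL2", 100, 12625, 65, 35),
    ("ALL3", 125, 12625, 24, 101),
    ("ALL4", 93, 12625, 26, 67),
    ("CNS", 60, 7129, 39, 21),
    ("Lym", 45, 4026, 22, 23),
    ("Adeno", 36, 7457, 18, 18),
    ("Gas", 65, 22645, 29, 36),
    ("Gas1", 144, 22283, 72, 72),
    ("Gas2", 124, 22283, 62, 62),
    ("T1D", 101, 54675, 57, 44),
    ("Stroke", 40, 54675, 20, 20),
]


def check_class_counts(rows):
    """Return the names of datasets whose class counts do not sum to Samples."""
    return [name for name, samples, _, a, b in rows if a + b != samples]


def minority_fraction(rows):
    """Map each dataset name to the share of samples in its smaller class."""
    return {name: min(a, b) / samples for name, samples, _, a, b in rows}


if __name__ == "__main__":
    mismatches = check_class_counts(DATASETS)
    print("Datasets with inconsistent counts:", mismatches)
    for name, frac in minority_fraction(DATASETS).items():
        print(f"{name}: minority class fraction = {frac:.2f}")
```

The minority-class fraction is included only as a convenient way to see how balanced each dataset is; it is an illustrative statistic, not one reported in the table.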