Table 1 The datasets used in the MAQC-II project.

From: Selecting a single model or combining multiple models for microarray-based classifier development? – A comparative analysis based on large and diverse datasets generated from the MAQC-II project

| Endpoint code | Endpoint | Endpoint description | Training set: #Samples | Training set: P/N ratio* | Validation set: #Samples | Validation set: P/N ratio* |
|---|---|---|---|---|---|---|
| A | Lung tumorigenicity | Lung tumorigen vs. non-tumorigen | 70 | 0.59 | 88 | 0.47 |
| B | Non-genotoxicity | Non-genotoxic hepatocarcinogen vs. non-carcinogen | 216 | 0.51 | 201 | 0.4 |
| C | Liver toxicity | Liver toxicants vs. non-toxicants | 214 | 0.58 | 204 | 0.62 |
| D | Breast cancer | Pathologic complete response, pCR | 130 | 0.34 | 100 | 0.18 |
| E | Breast cancer | Estrogen receptor status (ER +/-) | 130 | 1.6 | 100 | 1.56 |
| F | Multiple myeloma | Overall survival | 340 | 0.18 | 214 | 0.14 |
| G | Multiple myeloma | Event-free survival | 340 | 0.33 | 214 | 0.19 |
| H | Multiple myeloma | Male vs. female (positive control) | 340 | 1.33 | 214 | 1.89 |
| I | Multiple myeloma | Random 2-class label (negative control) | 340 | 1.43 | 214 | 1.33 |
| J | Neuroblastoma | Overall survival | 238 | 0.1 | 177 | 0.28 |
| K | Neuroblastoma | Event-free survival | 239 | 0.26 | 193 | 0.75 |
| L | Neuroblastoma | Male vs. female (positive control) | 246 | 1.44 | 231 | 1.36 |
| M | Neuroblastoma | Random 2-class label (negative control) | 246 | 1.44 | 253 | 1.36 |
* P/N = positive/negative ratio. "Positive" denotes the samples showing the positive result (e.g. cancer, tumor).
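
The P/N ratio defined in the footnote can be illustrated with a short sketch. It assumes class labels are encoded as 1 (positive) and 0 (negative). The helper names `pn_ratio` and `approx_class_counts` are hypothetical and are not part of the MAQC-II analysis. Because the table reports rounded ratios, the class counts recovered from them are estimates only.

```python
def pn_ratio(labels):
    """Return positives / negatives for a sequence of 0/1 class labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if negatives == 0:
        raise ValueError("P/N ratio is undefined without negative samples")
    return positives / negatives


def approx_class_counts(n_samples, ratio):
    """Estimate (positives, negatives) from a sample count and a rounded P/N ratio.

    Uses positives = n * r / (1 + r), which follows from r = P / N and n = P + N.
    """
    positives = round(n_samples * ratio / (1 + ratio))
    return positives, n_samples - positives


# Endpoint A training set: 70 samples, P/N ratio 0.59 (values from the table).
pos, neg = approx_class_counts(70, 0.59)
print(pos, neg)  # 26 44
print(round(pn_ratio([1] * pos + [0] * neg), 2))  # 0.59
```

A ratio well below 1 (for example, endpoint J in the training set) indicates a strongly imbalanced endpoint with few positive samples.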