
Table 1. The datasets used in the MAQC-II project.

From: Selecting a single model or combining multiple models for microarray-based classifier development? – A comparative analysis based on large and diverse datasets generated from the MAQC-II project

| Endpoint code | Endpoint | Endpoint description | Training set: # samples | Training set: P/N ratio* | Validation set: # samples | Validation set: P/N ratio |
|---|---|---|---|---|---|---|
| A | Lung tumorigenicity | Lung tumorigen vs. non-tumorigen | 70 | 0.59 | 88 | 0.47 |
| B | Non-genotoxicity | Non-genotoxic hepatocarcinogen vs. non-carcinogen | 216 | 0.51 | 201 | 0.4 |
| C | Liver toxicity | Liver toxicants vs. non-toxicants | 214 | 0.58 | 204 | 0.62 |
| D | Breast cancer | Pathologic complete response, pCR | 130 | 0.34 | 100 | 0.18 |
| E | Breast cancer | Estrogen receptor status (ER +/-) | 130 | 1.6 | 100 | 1.56 |
| F | Multiple myeloma | Overall survival | 340 | 0.18 | 214 | 0.14 |
| G | Multiple myeloma | Event-free survival | 340 | 0.33 | 214 | 0.19 |
| H | Multiple myeloma | Male vs. female (positive control) | 340 | 1.33 | 214 | 1.89 |
| I | Multiple myeloma | Random 2-class label (negative control) | 340 | 1.43 | 214 | 1.33 |
| J | Neuroblastoma | Overall survival | 238 | 0.1 | 177 | 0.28 |
| K | Neuroblastoma | Event-free survival | 239 | 0.26 | 193 | 0.75 |
| L | Neuroblastoma | Male vs. female (positive control) | 246 | 1.44 | 231 | 1.36 |
| M | Neuroblastoma | Random 2-class label (negative control) | 246 | 1.44 | 253 | 1.36 |

* P/N = positive/negative ratio. "Positive" denotes samples showing a positive result (e.g. cancer, tumor).
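
Because the table reports only the total sample count and the P/N ratio, the approximate class sizes have to be derived. The short sketch below solves P/N = r and P + N = n for P. The function name `class_counts` is hypothetical and not part of the original study. The published ratios are rounded to two decimals, so the recovered counts are estimates rather than reported values.

```python
from typing import Tuple


def class_counts(n_samples: int, pn_ratio: float) -> Tuple[int, int]:
    """Estimate (positives, negatives) from a sample count and a P/N ratio.

    Assumption: every sample belongs to exactly one of the two classes,
    so P + N = n_samples and P / N = pn_ratio.
    """
    # Solving the two equations gives P = n * r / (1 + r).
    positives = round(n_samples * pn_ratio / (1.0 + pn_ratio))
    negatives = n_samples - positives
    return positives, negatives


# Endpoint A training set from Table 1: 70 samples, P/N = 0.59
pos, neg = class_counts(70, 0.59)
print(pos, neg, round(pos / neg, 2))  # prints: 26 44 0.59
```

Recomputing the ratio from the estimated counts, as in the last line, is a quick way to check that an estimate agrees with the tabulated value.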