
Table 1 Data sets

From: Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

| Name | Description | Size¹ | Classes |
| --- | --- | --- | --- |
| Alizadeh [13] | Diffuse large B-cell lymphoma (DLBCL) and other lymphoid malignancies (FL and CLL), normal cell samples as well as tissue type cell lines. http://llmpp.nih.gov/lymphoma, GEO: GSE60 | 133 × 7806 (133 × 7430) | DLBCL (68) and other (65) samples (FL, CLL, normal cell samples and a variety of cell lines). |
| Finak [14] | Samples from epithelial and stromal tissue from breast reduction tissue from tumor-adjacent normal tissue. Agilent microarrays. GEO: GSE4823 | 66 × 33491 | Epithelial (34) and stromal (32) tissue. |
| Galland [15] | 40 non-functioning pituitary adenomas (NFPAs). Agilent microarrays. ArrayExpress: E-TABM-899 | 40 × 40475 (40 × 40291) | Invasive (22) and non-invasive (18). |
| Herschkowitz [16] | Human breast tumor samples (the full study includes 232 samples, but here only 119 samples run on a particular array (GPL1390) are included). GEO: GSE3165 | 106 × 19718 | ER status: ER+ (59) and ER- (47). |
| Jones [17] | High-grade lung neuroendocrine tumors. GEO: GSE1037 | 91 × 40233 (91 × 39746) | Patients with (72) and without (19) cancer. |
| Sørlie [18] | Human breast carcinomas. http://genome-www.stanford.edu/breast_cancer/mopo_clinical/data/mopo_clinical.gz.tar | 73 × 8033 (73 × 7734) | Clinical ER status: ER+ (55) and ER- (18). |
| Ye [19] | Samples from 40 hepatitis B-positive patients with hepatocellular carcinoma (HCC). GEO: GSE364 | 87 × 8911 | Metastatic (P) and non-metastatic (PN) patients: P (65), PN (22). |

  1. A summary of the data sets used in this study.
  2. ¹ Number of samples × number of genes. The values are those obtained after duplicates have been joined and after filtering with respect to missing values. For data sets where the resulting dimension differs under the background-corrected normalizations (norm.pt.bkg and norm.glob.bkg), the dimension given in parentheses applies to norm.pt.bkg and norm.glob.bkg.