Table 1 Data sets

From: Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

| Name | Description | Size¹ | Classes |
|---|---|---|---|
| Alizadeh [13] | Diffuse large B-cell lymphoma (DLBCL) and other lymphoid malignancies (FL and CLL), normal cell samples as well as tissue type cell lines. http://llmpp.nih.gov/lymphoma, GEO: GSE60 | 133 × 7806 (133 × 7430) | DLBCL (68) and other (65) samples (FL, CLL, normal cell samples and a variety of cell lines) |
| Finak [14] | Samples from epithelial and stromal tissue, from breast reduction tissue and from tumor-adjacent normal tissue. Agilent microarrays. GEO: GSE4823 | 66 × 33491 | Epithelial (34) and stromal (32) tissue |
| Galland [15] | 40 non-functioning pituitary adenomas (NFPAs). Agilent microarrays. ArrayExpress: E-TABM-899 | 40 × 40475 (40 × 40291) | Invasive (22) and non-invasive (18) |
| Herschkowitz [16] | Human breast tumor samples (the full study includes 232 samples, but here only 119 samples run on a particular array (GPL1390) are included). GEO: GSE3165 | 106 × 19718 | ER status: ER+ (59) and ER- (47) |
| Jones [17] | High-grade lung neuroendocrine tumors. GEO: GSE1037 | 91 × 40233 (91 × 39746) | Patients with (72) and without (19) cancer |
| Sørlie [18] | Human breast carcinomas. http://genome-www.stanford.edu/breast_cancer/mopo_clinical/data/mopo_clinical.gz.tar | 73 × 8033 (73 × 7734) | Clinical ER status: ER+ (55) and ER- (18) |
| Ye [19] | Samples from 40 hepatitis B-positive patients with hepatocellular carcinoma (HCC). GEO: GSE364 | 87 × 8911 | Metastatic (P) and non-metastatic (PN) patients: P (65), PN (22) |
  1. A summary of the data sets used in this study.
  2. ¹ Number of samples × number of genes. The values are those obtained after duplicates have been joined and after filtering with respect to missing values. For data sets where the resulting dimension differs for the background-corrected normalizations (norm.pt.bkg and norm.glob.bkg), the dimension given in parentheses applies to norm.pt.bkg and norm.glob.bkg.
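
The size footnote describes two preprocessing steps: joining duplicates and filtering on missing values. The sketch below illustrates how such steps could produce a "samples × genes" dimension. It is only an illustration under stated assumptions, not the procedure used in the study: the table does not say how duplicates were joined or which missing-value threshold was applied, so averaging duplicate rows and the `max_missing_fraction` cutoff are both hypothetical choices, as are the function names and the toy data.

```python
from collections import defaultdict
from statistics import mean


def join_duplicates(rows):
    """Merge rows that share a gene identifier.

    rows: list of (gene_id, values), one value per sample; None marks a
    missing value. ASSUMPTION: duplicates are joined by averaging the
    non-missing values per sample; the source does not state the rule.
    """
    grouped = defaultdict(list)
    for gene_id, values in rows:
        grouped[gene_id].append(values)

    joined = {}
    for gene_id, replicates in grouped.items():
        merged = []
        for sample_values in zip(*replicates):
            present = [v for v in sample_values if v is not None]
            merged.append(mean(present) if present else None)
        joined[gene_id] = merged
    return joined


def filter_missing(genes, max_missing_fraction=0.2):
    """Keep genes whose fraction of missing values is at most the cutoff.

    ASSUMPTION: the cutoff value is hypothetical; the source gives none.
    """
    kept = {}
    for gene_id, values in genes.items():
        missing = sum(v is None for v in values)
        if missing / len(values) <= max_missing_fraction:
            kept[gene_id] = values
    return kept


# Toy data: 4 samples, 3 distinct genes, one of them duplicated.
rows = [
    ("geneA", [1.0, 2.0, None, 4.0]),
    ("geneA", [3.0, None, None, 6.0]),
    ("geneB", [0.5, 0.7, 0.9, 1.1]),
    ("geneC", [None, None, None, 2.0]),
]

joined = join_duplicates(rows)
filtered = filter_missing(joined, max_missing_fraction=0.25)
n_samples = len(rows[0][1])
print(f"{n_samples} × {len(filtered)}")  # samples × genes after both steps
```

With a different normalization (for example one that includes background correction), more values may end up missing before filtering, which is one way the dimension in parentheses could differ from the main one.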