
Table 1 Overview of the data sets and the methods used in this study

From: Classification of microarrays; synergistic effects between normalization, gene selection and machine learning

| Data set (D) | Classes* | No. of genes** |
| --- | --- | --- |
| Alizadeh | DLBCL (68), other samples (65) | 7806 (7430) |
| Finak | Epithelial (34), stromal tissue (32) | 33491 |
| Galland | Invasive NFPAs (22), non-invasive NFPAs (18) | 40475 (40291) |
| Herschkowitz | High ER expression (58), low ER expression (46) | 19718 |
| Jones | Cancerous samples (72), non-cancerous samples (19) | 40233 (39746) |
| Sørlie | High ER expression (55), low ER expression (18) | 8033 (7734) |
| Ye | Metastatic (65), non-metastatic (22) | 8911 |

| Normalization (No) | Description |
| --- | --- |
| No 0 | Raw data |
| No 1 | Print-tip MA-loess, no background correction |
| No 2 | Print-tip MA-loess, background correction |
| No 3 | Global MA-loess, no background correction |
| No 4 | Global MA-loess, background correction |

| Gene selection (G) | Fixed parameters |
| --- | --- |
| T-test | Two-sided |
| Relief | Threshold = 0, nosample = # obs. in data set |
| Paired distance | Euclidean distance |

Number of genes (N): 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 100, 200, 300, 400, 150, 500, 600, 700, 800, 900, 1000

| Machine learning (M) | Description, Fixed parameters | Optimized parameters |
| --- | --- | --- |
| DT Gini | Decision tree, Splitting index = Gini | |
| DT Information | Decision tree, Splitting index = Information | |
| NN One layer | Neural Network, one hidden layer, decay = 0.001, rang = 0.1, maxit = 100 | size = [2-5] |
| NN No layer | Neural Network, no hidden layer, decay = 0.001, rang = 0.1, maxit = 100, skip = TRUE, size = 0 | |
| SVM Linear | Support Vector Machine, linear kernel, type = nu-svc, cross = 10, nu = 0.2, scaled = FALSE | |
| SVM Poly2 | Support Vector Machine, polynomial kernel, deg 2, type = nu-svc, cross = 10, nu = 0.2, scaled = FALSE | |
| SVM Poly3 | Support Vector Machine, polynomial kernel, deg 3, type = nu-svc, cross = 10, nu = 0.2, scaled = FALSE | |
| SVM Rb | Support Vector Machine, radial basis kernel, type = nu-svc, cross = 10, nu = 0.2, scaled = FALSE | sigma = [2^-14, 2^14] |

Acronyms defined here are used throughout the paper. "Fixed parameters" were given fixed values, while "Optimized parameters" were tuned by grid search in the inner cross-validation. *The number of samples belonging to each class is given in parentheses. **Dimensions after normalization with background correction (No 2 and No 4) are given in parentheses.
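
The factors in Table 1 can be laid out as a small configuration, which makes it easier to see how many method combinations the table implies and what an inner grid search would iterate over. The following is a minimal Python sketch, not the authors' code. The names `design`, `candidates`, and `PARAM_GRIDS` are hypothetical. It assumes that every level of D, No, G, N, and M is crossed with every other, that the sigma range reads as 2^-14 to 2^14, and that both grids step through integers (integer hidden-layer sizes, integer exponents of 2); the table states only the end points.

```python
from itertools import product

# Factor levels copied from Table 1.
DATA_SETS = ["Alizadeh", "Finak", "Galland", "Herschkowitz", "Jones", "Sørlie", "Ye"]
NORMALIZATIONS = ["No 0", "No 1", "No 2", "No 3", "No 4"]
GENE_SELECTION = ["T-test", "Relief", "Paired distance"]
N_GENES = [2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 100, 200, 300, 400, 150,
           500, 600, 700, 800, 900, 1000]
LEARNERS = ["DT Gini", "DT Information", "NN One layer", "NN No layer",
            "SVM Linear", "SVM Poly2", "SVM Poly3", "SVM Rb"]

# Optimized parameters. ASSUMPTION: the grid spacing is not given in the table;
# integer sizes and integer exponents of 2 are used here for illustration only.
PARAM_GRIDS = {
    "NN One layer": {"size": list(range(2, 6))},
    "SVM Rb": {"sigma": [2.0 ** k for k in range(-14, 15)]},
}


def design():
    """Return every (D, No, G, N, M) combination, assuming a full crossing."""
    return list(product(DATA_SETS, NORMALIZATIONS, GENE_SELECTION, N_GENES, LEARNERS))


def candidates(learner):
    """Return the parameter settings an inner grid search would try for a learner.

    Learners without optimized parameters get a single empty setting.
    """
    grid = PARAM_GRIDS.get(learner, {})
    names = sorted(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]
```

Under these assumptions, `len(design())` gives the number of method combinations per full crossing, and `candidates("SVM Rb")` lists the sigma values that the inner cross-validation would compare.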