The effect of oligonucleotide microarray data pre-processing on the analysis of patient-cohort studies

BMC Bioinformatics

Table 6 Performance of different classification algorithms: AML dataset, inv(16) problem. Mean test set error (standard deviation in parentheses) over 100 random splits of the original data into a training set (90%) and a test set (10%). Error is defined as the average of the per-class error rates, which corresponds to assuming a prior probability of 50% for each class. Classifiers were trained on 10, 20, 50, 100, 250, 500 and 1000 probe sets selected by the variation filter; results are shown for the number of probe sets that gave the smallest average test set error over the four pre-processing methods, indicated in parentheses after the classifier name.

Classifier (number of probe sets used)	MAS	dChip	RMA	GCRMA
NC (50)	0.01 (0.02)	0.02 (0.02)	0.01 (0.02)	0.02 (0.03)
PAM (20)	0.01 (0.02)	0.01 (0.02)	0.01 (0.02)	0.01 (0.02)
LIKNON (10)	0.02 (0.03)	0.02 (0.03)	0.02 (0.02)	0.02 (0.02)
k-NN (50)	0.01 (0.02)	0.01 (0.02)	0.01 (0.02)	0.01 (0.02)
SVC/P (10)	0.02 (0.03)	0.02 (0.02)	0.02 (0.03)	0.02 (0.03)
SVC/RBF (10)	0.03 (0.02)	0.02 (0.02)	0.02 (0.02)	0.02 (0.02)
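
The error measure and the splitting scheme described in the caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `class_averaged_error` and `random_split` are hypothetical, the split is assumed to be unstratified (the caption does not say), and the error is averaged only over classes that actually occur in the test labels.

```python
import random


def class_averaged_error(y_true, y_pred):
    """Average of the per-class error rates, each class weighted equally.

    Assumption: only classes present in y_true contribute to the average.
    """
    classes = sorted(set(y_true))
    per_class = []
    for c in classes:
        # Indices of the samples whose true label is class c.
        idx = [i for i, y in enumerate(y_true) if y == c]
        wrong = sum(1 for i in idx if y_pred[i] != c)
        per_class.append(wrong / len(idx))
    return sum(per_class) / len(per_class)


def random_split(n_samples, test_fraction=0.1, rng=random):
    """Return (train_indices, test_indices) for one random split.

    Assumption: a plain shuffle, with no stratification by class.
    """
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


# Toy labels: 8 samples of class 0, 2 samples of class 1, one class-1 mistake.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
toy_error = class_averaged_error(y_true, y_pred)
```

In the toy example the plain misclassification rate is 1/10, while the class-averaged error is (0/8 + 1/2) / 2 = 0.25, which shows why the measure matters when one class is much rarer than the other. Repeating `random_split` 100 times and collecting the error on each test set would give the mean and standard deviation in the form reported in the table.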

ISSN: 1471-2105