Tumor classification and marker gene prediction by feature selection and fuzzy c-means clustering using microarray data

Table 1 Comparison of the test error of the two proposed models with results from the literature and with an independent test. a) Test error of supervised clustering, from [5]. b) Test error of weighted voting: leukemia data from [9], brain tumor data from [16]. c) Test error of support vector machines, from [7]. d) Test error of the boosting method: leukemia and NCI data from [6], colon data from [4]. e) Test error of nearest neighbors: leukemia and NCI data from [6], colon data from [4]. f) Test error of an independent test that applied the t-test and Fisher's linear discriminant to the same data sets used to evaluate our proposed models. NA means that the test error is not available, either because we did not find classification results in the literature (for weighted voting, support vector machines, boosting and nearest neighbors) or because the model was not able to perform multi-class classification.

Method	Leukemia (2 class)	Colon (2 class)	Brain (5 class)	NCI (8 class)
Model one (manual feature selection): mean test error	2.4%	12.27%	10%	24%
Model one (manual feature selection): median test error	4%	13.64%	8.82%	22.73%
Model two (automatic feature selection): mean test error	4%	11.36%	13.53%	22.27%
Model two (automatic feature selection): median test error	4%	11.36%	14.71%	22.73%
a) Supervised clustering	2.62%	15.95%	16.86%	26.5%
b) Weighted voting	4.17%	NA	16.67%	NA
c) Support vector machines	5.88%	9.68%	NA	NA
d) Boosting	2.94%	17.7%	NA	42.86%
e) Nearest neighbors	2.94%	19.4%	NA	42.86%
f) T-test plus Fisher's linear discriminant	4%	18%	NA	NA

ISSN: 1471-2105