Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction

Shi, Ping; Ray, Surajit; Zhu, Qifu; Kon, Mark A

doi:10.1186/1471-2105-12-375

BMC Bioinformatics

Table 2 Comparison of various classifiers in structural variants of Data-I and Data-II

A. Data-I of fixed variance vs. random variance with abundant signal genes
Data	Data structure				Classification error rate on the test set (%)
	Signal genes	Variance	Correlation ρ	Signal vector	TSP	k-TSP	SVM	k-TSP + SVM	Fisher + SVM	RFE + SVM
Data-I	10%	Fixed unit	0	μ_3	39.2 ± 1.1	32.4 ± 0.9	24.1 ± 1.0	27.0 ± 1.1	26.5 ± 1.0	25.8 ± 1.1
Data-I	10%	Fixed unit	0.45	μ_3	34.0 ± 1.0	21.7 ± 0.8	21.4 ± 0.9	15.8 ± 0.9	21.8 ± 1.0	21.0 ± 1.0
Data-I	10%	Fixed unit	0.6	μ_3	31.0 ± 1.1	13.9 ± 1.0	20.6 ± 0.9	10.0 ± 0.8	21.9 ± 1.4	17.3 ± 1.1
Data-Ib	10%	Inverse gamma	0	μ_3	26.1 ± 1.2	19.1 ± 1.1	26.6 ± 1.1	12.1 ± 0.6	12.4 ± 0.6	22.5 ± 0.8
Data-Ib	10%	Inverse gamma	0.45	μ_3	18.0 ± 1.0	7.0 ± 0.5	23.7 ± 1.0	3.4 ± 0.5	5.4 ± 0.5	9.6 ± 1.0
Data-Ib	10%	Inverse gamma	0.6	μ_3	15.8 ± 0.9	5.3 ± 0.5	23.8 ± 1.0	1.6 ± 0.4	4.2 ± 0.6	5.4 ± 0.7
B. Data-I of stronger signal vs. weak signal with sparse signal genes
Data	Data structure				Classification error rate on the test set (%)
	Signal genes	Variance	Correlation ρ	Signal vector	TSP	k-TSP	SVM	k-TSP + SVM	Fisher + SVM	RFE + SVM
Data-Ic	1%	Fixed unit	0	μ_3	46.5 ± 1.1	49.4 ± 0.9	48.3 ± 1.0	47.8 ± 1.2	47.0 ± 1.1	46.8 ± 1.2
Data-Ic	1%	Fixed unit	0.45	μ_3	44.1 ± 1.2	44.7 ± 0.9	45.8 ± 1.0	43.1 ± 1.0	45.6 ± 1.2	45.0 ± 1.2
Data-Ic	1%	Fixed unit	0.6	μ_3	38.1 ± 1.5	43.2 ± 1.2	48.0 ± 1.1	40.3 ± 1.2	46.9 ± 1.2	41.7 ± 1.5
Data-Id	1%	Fixed unit	0	μ_3b	43.5 ± 1.4	44.9 ± 1.1	43.7 ± 1.0	42.2 ± 1.3	39.9 ± 1.1	41.0 ± 1.0
Data-Id	1%	Fixed unit	0.45	μ_3b	34.8 ± 1.2	36.8 ± 1.2	42.6 ± 0.9	30.4 ± 1.3	40.0 ± 1.2	35.0 ± 1.2
Data-Id	1%	Fixed unit	0.6	μ_3b	30.4 ± 1.2	33.8 ± 1.4	40.8 ± 1.1	23.0 ± 1.3	38.1 ± 1.2	30.1 ± 1.3
C. Data-II with independent blocks of signal genes vs. correlated blocks of signal genes
Data	Data structure				Classification error rate on the test set (%)
	Signal genes	Variance	Within-corr ρ	Inter-corr ρ'	TSP	k-TSP	SVM	k-TSP + SVM	Fisher + SVM	RFE + SVM
Data-IIb	10%	Fixed unit	0.6	0	42.5 ± 1.1	34.7 ± 1.1	34.6 ± 1.1	37.9 ± 1.2	38.9 ± 1.0	37.6 ± 1.3
Data-IIb	10%	Fixed unit	0.6	0.5	33.4 ± 0.9	22.9 ± 0.9	26.2 ± 0.8	24.2 ± 0.9	30.6 ± 1.3	28.5 ± 0.9

The classification error rates (mean ± SE) of various classifiers as correlation varies among signal genes in A) Data-I of fixed variance vs. random variance when signal genes are abundant (10%); B) Data-I of strong signal vs. weaker signal when signal genes are sparse (1%); and C) Data-II of independent blocks vs. correlated blocks. The lowest error rate for each dataset is shown in bold.
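
As a reading aid, the sketch below shows one way a "mean ± SE" entry could be computed and how the lowest error rate in a row can be found when bold formatting is not available. It assumes SE means the sample standard deviation divided by the square root of the number of replicates, which the table does not state. The helper names `mean_se` and `lowest_error` and the replicate values are hypothetical; only the `row` values are taken from the table (part A, Data-Ib, ρ = 0.6).

```python
import math
import statistics


def mean_se(error_rates):
    """Return (mean, standard error) of per-replicate test error rates (%).

    Assumption: SE = sample standard deviation / sqrt(number of replicates).
    """
    n = len(error_rates)
    mean = statistics.fmean(error_rates)
    se = statistics.stdev(error_rates) / math.sqrt(n)
    return mean, se


def lowest_error(row):
    """Return the classifier name(s) with the smallest mean error in one row."""
    best = min(row.values())
    return [name for name, err in row.items() if err == best]


# Hypothetical replicate error rates (%), invented for illustration only.
replicates = [10.0, 12.0, 14.0]
mean, se = mean_se(replicates)
print(f"{mean:.1f} ± {se:.1f}")

# Mean error rates copied from the Data-Ib, rho = 0.6 row of part A.
row = {
    "TSP": 15.8,
    "k-TSP": 5.3,
    "SVM": 23.8,
    "k-TSP + SVM": 1.6,
    "Fisher + SVM": 4.2,
    "RFE + SVM": 5.4,
}
print(lowest_error(row))
```

Comparing means alone ignores the standard errors; two classifiers whose intervals overlap may not differ meaningfully.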

ISSN: 1471-2105
