
Table 2 Comparison of classifiers across structural variants of Data-I and Data-II

From: Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction

A. Data-I of fixed variance vs. random variance with abundant signal genes
          Data structure                                             Classification error rate on the test set (%)
Data      Signal genes  Variance       Correlation ρ  Signal vector  TSP           k-TSP         SVM           k-TSP + SVM   Fisher + SVM  RFE + SVM
Data-I    10%           Fixed unit     0              μ3             39.2 ± 1.1    32.4 ± 0.9    24.1 ± 1.0    27.0 ± 1.1    26.5 ± 1.0    25.8 ± 1.1
Data-I    10%           Fixed unit     0.45           μ3             34.0 ± 1.0    21.7 ± 0.8    21.4 ± 0.9    15.8 ± 0.9    21.8 ± 1.0    21.0 ± 1.0
Data-I    10%           Fixed unit     0.6            μ3             31.0 ± 1.1    13.9 ± 1.0    20.6 ± 0.9    10.0 ± 0.8    21.9 ± 1.4    17.3 ± 1.1
Data-Ib   10%           Inverse gamma  0              μ3             26.1 ± 1.2    19.1 ± 1.1    26.6 ± 1.1    12.1 ± 0.6    12.4 ± 0.6    22.5 ± 0.8
Data-Ib   10%           Inverse gamma  0.45           μ3             18.0 ± 1.0    7.0 ± 0.5     23.7 ± 1.0    3.4 ± 0.5     5.4 ± 0.5     9.6 ± 1.0
Data-Ib   10%           Inverse gamma  0.6            μ3             15.8 ± 0.9    5.3 ± 0.5     23.8 ± 1.0    1.6 ± 0.4     4.2 ± 0.6     5.4 ± 0.7
B. Data-I of stronger signal vs. weaker signal with sparse signal genes
          Data structure                                             Classification error rate on the test set (%)
Data      Signal genes  Variance       Correlation ρ  Signal vector  TSP           k-TSP         SVM           k-TSP + SVM   Fisher + SVM  RFE + SVM
Data-Ic   1%            Fixed unit     0              μ3             46.5 ± 1.1    49.4 ± 0.9    48.3 ± 1.0    47.8 ± 1.2    47.0 ± 1.1    46.8 ± 1.2
Data-Ic   1%            Fixed unit     0.45           μ3             44.1 ± 1.2    44.7 ± 0.9    45.8 ± 1.0    43.1 ± 1.0    45.6 ± 1.2    45.0 ± 1.2
Data-Ic   1%            Fixed unit     0.6            μ3             38.1 ± 1.5    43.2 ± 1.2    48.0 ± 1.1    40.3 ± 1.2    46.9 ± 1.2    41.7 ± 1.5
Data-Id   1%            Fixed unit     0              μ3b            43.5 ± 1.4    44.9 ± 1.1    43.7 ± 1.0    42.2 ± 1.3    39.9 ± 1.1    41.0 ± 1.0
Data-Id   1%            Fixed unit     0.45           μ3b            34.8 ± 1.2    36.8 ± 1.2    42.6 ± 0.9    30.4 ± 1.3    40.0 ± 1.2    35.0 ± 1.2
Data-Id   1%            Fixed unit     0.6            μ3b            30.4 ± 1.2    33.8 ± 1.4    40.8 ± 1.1    23.0 ± 1.3    38.1 ± 1.2    30.1 ± 1.3
C. Data-II with independent blocks of signal genes vs. correlated blocks of signal genes
          Data structure                                             Classification error rate on the test set (%)
Data      Signal genes  Variance       Within-corr ρ  Inter-corr ρ'  TSP           k-TSP         SVM           k-TSP + SVM   Fisher + SVM  RFE + SVM
Data-IIb  10%           Fixed unit     0.6            0              42.5 ± 1.1    34.7 ± 1.1    34.6 ± 1.1    37.9 ± 1.2    38.9 ± 1.0    37.6 ± 1.3
Data-IIb  10%           Fixed unit     0.6            0.5            33.4 ± 0.9    22.9 ± 0.9    26.2 ± 0.8    24.2 ± 0.9    30.6 ± 1.3    28.5 ± 0.9
1. Classification error rates on the test set (mean ± SE) of the classifiers as the correlation among signal genes varies, in A) Data-I with fixed vs. random variance when signal genes are abundant (10%); B) Data-I with stronger vs. weaker signal when signal genes are sparse (1%); and C) Data-II with independent vs. correlated blocks of signal genes. The lowest error rate for each dataset is shown in bold.
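The caption's rule (lowest error rate per dataset in bold) can be checked mechanically. The Python sketch below is not from the paper; it transcribes the mean error rates from the table and reports the classifier with the smallest mean in each row, treating each row as one "dataset" (an assumption). It compares means only and ignores the standard errors, so near-ties such as 34.6 vs. 34.7 for Data-IIb with ρ' = 0 are reported as a single winner. The names `ROWS`, `CLASSIFIERS` and `lowest_error` are illustrative.

```python
"""Find the classifier with the lowest mean test error in each row of Table 2.

Means are transcribed from the table; standard errors are ignored, so
near-ties (differences smaller than the SEs) are not flagged.
"""

# Column order of the classifier columns in the table
CLASSIFIERS = ["TSP", "k-TSP", "SVM", "k-TSP + SVM", "Fisher + SVM", "RFE + SVM"]

# (dataset, correlation setting) -> mean error rate (%) per classifier, in CLASSIFIERS order
ROWS = {
    ("Data-I", "rho=0"): [39.2, 32.4, 24.1, 27.0, 26.5, 25.8],
    ("Data-I", "rho=0.45"): [34.0, 21.7, 21.4, 15.8, 21.8, 21.0],
    ("Data-I", "rho=0.6"): [31.0, 13.9, 20.6, 10.0, 21.9, 17.3],
    ("Data-Ib", "rho=0"): [26.1, 19.1, 26.6, 12.1, 12.4, 22.5],
    ("Data-Ib", "rho=0.45"): [18.0, 7.0, 23.7, 3.4, 5.4, 9.6],
    ("Data-Ib", "rho=0.6"): [15.8, 5.3, 23.8, 1.6, 4.2, 5.4],
    ("Data-Ic", "rho=0"): [46.5, 49.4, 48.3, 47.8, 47.0, 46.8],
    ("Data-Ic", "rho=0.45"): [44.1, 44.7, 45.8, 43.1, 45.6, 45.0],
    ("Data-Ic", "rho=0.6"): [38.1, 43.2, 48.0, 40.3, 46.9, 41.7],
    ("Data-Id", "rho=0"): [43.5, 44.9, 43.7, 42.2, 39.9, 41.0],
    ("Data-Id", "rho=0.45"): [34.8, 36.8, 42.6, 30.4, 40.0, 35.0],
    ("Data-Id", "rho=0.6"): [30.4, 33.8, 40.8, 23.0, 38.1, 30.1],
    # Data-IIb: within-block rho = 0.6 in both rows; the key gives the inter-block rho'
    ("Data-IIb", "rho'=0"): [42.5, 34.7, 34.6, 37.9, 38.9, 37.6],
    ("Data-IIb", "rho'=0.5"): [33.4, 22.9, 26.2, 24.2, 30.6, 28.5],
}


def lowest_error(rows):
    """Return {row_key: (classifier, mean_error)} for the smallest mean in each row."""
    best = {}
    for key, errors in rows.items():
        # Index of the minimum mean; the first one wins if two means are equal
        i = min(range(len(errors)), key=errors.__getitem__)
        best[key] = (CLASSIFIERS[i], errors[i])
    return best


if __name__ == "__main__":
    for (dataset, setting), (name, err) in lowest_error(ROWS).items():
        print(f"{dataset:<9} {setting:<9} {name} ({err}%)")
```

Running it prints one line per row. By this means-only criterion, k-TSP + SVM has the lowest mean error in 8 of the 14 rows, and TSP has the lowest in the two sparse-signal Data-Ic rows with ρ = 0 and ρ = 0.6.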