
Table 2 Comparison of various classifiers in structural variants of Data-I and Data-II

From: Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction

A. Data-I of fixed variance vs. random variance with abundant signal genes

Columns 2 to 5 give the data structure; columns 6 to 11 give the classification error rate on the test set (%).

| Data | Signal genes | Variance | Correlation ρ | Signal vector | TSP | k-TSP | SVM | k-TSP + SVM | Fisher + SVM | RFE + SVM |
|---|---|---|---|---|---|---|---|---|---|---|
| Data-I | 10% | Fixed unit | 0 | μ3 | 39.2 ± 1.1 | 32.4 ± 0.9 | 24.1 ± 1.0 | 27.0 ± 1.1 | 26.5 ± 1.0 | 25.8 ± 1.1 |
| Data-I | 10% | Fixed unit | 0.45 | μ3 | 34.0 ± 1.0 | 21.7 ± 0.8 | 21.4 ± 0.9 | 15.8 ± 0.9 | 21.8 ± 1.0 | 21.0 ± 1.0 |
| Data-I | 10% | Fixed unit | 0.6 | μ3 | 31.0 ± 1.1 | 13.9 ± 1.0 | 20.6 ± 0.9 | 10.0 ± 0.8 | 21.9 ± 1.4 | 17.3 ± 1.1 |
| Data-Ib | 10% | Inverse gamma | 0 | μ3 | 26.1 ± 1.2 | 19.1 ± 1.1 | 26.6 ± 1.1 | 12.1 ± 0.6 | 12.4 ± 0.6 | 22.5 ± 0.8 |
| Data-Ib | 10% | Inverse gamma | 0.45 | μ3 | 18.0 ± 1.0 | 7.0 ± 0.5 | 23.7 ± 1.0 | 3.4 ± 0.5 | 5.4 ± 0.5 | 9.6 ± 1.0 |
| Data-Ib | 10% | Inverse gamma | 0.6 | μ3 | 15.8 ± 0.9 | 5.3 ± 0.5 | 23.8 ± 1.0 | 1.6 ± 0.4 | 4.2 ± 0.6 | 5.4 ± 0.7 |

B. Data-I of stronger signal vs. weak signal with sparse signal genes

Columns 2 to 5 give the data structure; columns 6 to 11 give the classification error rate on the test set (%).

| Data | Signal genes | Variance | Correlation ρ | Signal vector | TSP | k-TSP | SVM | k-TSP + SVM | Fisher + SVM | RFE + SVM |
|---|---|---|---|---|---|---|---|---|---|---|
| Data-Ic | 1% | Fixed unit | 0 | μ3 | 46.5 ± 1.1 | 49.4 ± 0.9 | 48.3 ± 1.0 | 47.8 ± 1.2 | 47.0 ± 1.1 | 46.8 ± 1.2 |
| Data-Ic | 1% | Fixed unit | 0.45 | μ3 | 44.1 ± 1.2 | 44.7 ± 0.9 | 45.8 ± 1.0 | 43.1 ± 1.0 | 45.6 ± 1.2 | 45.0 ± 1.2 |
| Data-Ic | 1% | Fixed unit | 0.6 | μ3 | 38.1 ± 1.5 | 43.2 ± 1.2 | 48.0 ± 1.1 | 40.3 ± 1.2 | 46.9 ± 1.2 | 41.7 ± 1.5 |
| Data-Id | 1% | Fixed unit | 0 | μ3b | 43.5 ± 1.4 | 44.9 ± 1.1 | 43.7 ± 1.0 | 42.2 ± 1.3 | 39.9 ± 1.1 | 41.0 ± 1.0 |
| Data-Id | 1% | Fixed unit | 0.45 | μ3b | 34.8 ± 1.2 | 36.8 ± 1.2 | 42.6 ± 0.9 | 30.4 ± 1.3 | 40.0 ± 1.2 | 35.0 ± 1.2 |
| Data-Id | 1% | Fixed unit | 0.6 | μ3b | 30.4 ± 1.2 | 33.8 ± 1.4 | 40.8 ± 1.1 | 23.0 ± 1.3 | 38.1 ± 1.2 | 30.1 ± 1.3 |

C. Data-II with independent blocks of signal genes vs. correlated blocks of signal genes

Columns 2 to 5 give the data structure; columns 6 to 11 give the classification error rate on the test set (%).

| Data | Signal genes | Variance | Within-corr ρ | Inter-corr ρ' | TSP | k-TSP | SVM | k-TSP + SVM | Fisher + SVM | RFE + SVM |
|---|---|---|---|---|---|---|---|---|---|---|
| Data-IIb | 10% | Fixed unit | 0.6 | 0 | 42.5 ± 1.1 | 34.7 ± 1.1 | 34.6 ± 1.1 | 37.9 ± 1.2 | 38.9 ± 1.0 | 37.6 ± 1.3 |
| Data-IIb | 10% | Fixed unit | 0.6 | 0.5 | 33.4 ± 0.9 | 22.9 ± 0.9 | 26.2 ± 0.8 | 24.2 ± 0.9 | 30.6 ± 1.3 | 28.5 ± 0.9 |

  1. The classification error rates (mean ± SE) of various classifiers as correlation varies among signal genes in A) Data-I of fixed variance vs. random variance when signal genes are abundant (10%); B) Data-I of strong signal vs. weaker signal when signal genes are sparse (1%); and C) Data-II of independent blocks vs. correlated blocks. The lowest error rate for each dataset is indicated in bold.
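
As a reading aid for the "mean ± SE" entries, the following is a minimal sketch of how such a summary could be computed from repeated simulation runs. It assumes the standard error is the sample standard deviation divided by the square root of the number of runs. The function name `mean_se` and the per-run values are hypothetical and are not taken from the paper, which does not describe its exact procedure in this table.

```python
import math
import statistics


def mean_se(error_rates):
    """Return (mean, standard error) of per-run test error rates, in percent.

    Assumes SE = sample standard deviation / sqrt(number of runs).
    """
    n = len(error_rates)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    mean = statistics.fmean(error_rates)
    se = statistics.stdev(error_rates) / math.sqrt(n)
    return mean, se


# Hypothetical per-run error rates (%), for illustration only.
runs = [20.0, 22.0, 24.0, 26.0]
mean, se = mean_se(runs)
print(f"{mean:.1f} ± {se:.1f}")  # prints "23.0 ± 1.3"
```

Under this assumption, two classifiers whose means differ by less than roughly their combined SE, such as several of the Data-Ic entries, should not be read as clearly separated.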