
Table 1 Comparison of the results obtained with different classifiers in a variety of data-sets.

From: Using the information embedded in the testing sample to break the limits caused by the small sample size in microarray-based classification

| Data-set | genes | samples | DP | kNN | WV | LDA | SVM | ML-s | ML-d |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BRCA1 | 3226 | 7 BRCA1-positive, 15 BRCA1-negative | 21/22 | 18/22 (1) | 18/22 | 18/22 | 18/22 | 19/22 | 16/22 |
| BRCA2 | 3226 | 8 BRCA2-positive, 14 BRCA2-negative | 21/22 | 21/22 (1) | 17/22 | 19/22 | 18/22 | 17/22 | 17/22 |
| PROS | 12600 | 52 tumor tissue, 50 normal tissue | 93/102 | 90/102 (5) | 61/102 | 92/102 | 93/102 | 64/102 | 50/102 |
| PROS-OUT | 12625 | 8 non-recurrence, 13 recurrence | 15/21 | 12/21 (1) | 12/21 | 13/21 | 14/21 | 13/21 | 13/21 |
| DLBCL-FL | 6817 | 52 DLBCL, 25 FL | 74/77 | 71/77 (7) | 63/77 | 74/77 | 74/77 | 65/77 | 58/77 |
| ALL-AML | 6817 | 27 AML, 11 ALL | 38/38 | 37/38 (3) | 38/38 | 38/38 | 38/38 | 30/38 | 27/38 |
| I-2000 | 2000 | 40 tumor colon tissue, 22 normal colon tissue | 61/62 | 59/62 (3) | 58/62 | 61/62 | 61/62 | 59/62 | 58/62 |
  1. Columns indicate the algorithm used, rows the data-set. In each cell, the numerator gives the number of left-out samples correctly classified by the corresponding algorithm, and the denominator gives the total number of samples n. The kNN algorithm has a free parameter that needs to be determined, the number of neighbors k; to allow for a fair comparison, we have optimized this value for each data-set using cross-validation [12], and the resulting optimal value is given in parentheses. For the ML classifier we consider two cases: one in which the two classes are assumed to have the same variance, and one in which the variances are assumed to be different. These are referred to as ML-s (same) and ML-d (different), respectively.
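
As an illustration of the tuning step described in the footnote, the sketch below picks the number of neighbors k for a kNN classifier by leave-one-out cross-validation and reports the result in the same n_correct/n format as the table. It is a minimal example, not the authors' exact protocol: the expression matrix X, the labels y, the candidate values of k, and the use of a single (non-nested) cross-validation loop for both tuning and scoring are all assumptions made here for brevity, using scikit-learn.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data shaped like the BRCA1 set (22 samples x 3226 genes,
# 7 positive and 15 negative labels); random values, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(22, 3226))
y = np.array([1] * 7 + [0] * 15)

# Tune the number of neighbors k by leave-one-out cross-validation.
# Note: reusing the same folds to pick k and to report accuracy is
# slightly optimistic; a nested scheme would separate the two steps.
best_k, best_acc = None, -np.inf
for k in (1, 3, 5, 7):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y,
                          cv=LeaveOneOut()).mean()
    if acc > best_acc:
        best_k, best_acc = k, acc

# Report the chosen k and the count of correctly classified left-out samples.
print(f"k = {best_k}: {round(best_acc * len(y))}/{len(y)} correct")
```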