Table 1 Comparison of the results obtained with different classifiers in a variety of data-sets.

From: Using the information embedded in the testing sample to break the limits caused by the small sample size in microarray-based classification

Data-set   genes   samples                   DP      kNN         WV      LDA     SVM     ML-s    ML-d
BRCA1      3226    7 BRCA1-positive          21/22   18/22 (1)   18/22   18/22   18/22   19/22   16/22
                   15 BRCA1-negative
BRCA2      3226    8 BRCA2-positive          21/22   21/22 (1)   17/22   19/22   18/22   17/22   17/22
                   14 BRCA2-negative
PROS       12600   52 tumor tissue           93/102  90/102 (5)  61/102  92/102  93/102  64/102  50/102
                   50 normal tissue
PROS-OUT   12625   8 non-recurrence          15/21   12/21 (1)   12/21   13/21   14/21   13/21   13/21
                   13 recurrence
DLBCL-FL   6817    52 DLBCL                  74/77   71/77 (7)   63/77   74/77   74/77   65/77   58/77
                   25 FL
ALL-AML    6817    27 AML                    38/38   37/38 (3)   38/38   38/38   38/38   30/38   27/38
                   11 ALL
I-2000     2000    40 tumor colon tissue     61/62   59/62 (3)   58/62   61/62   61/62   59/62   58/62
                   22 normal colon tissue
  1. Columns indicate the algorithm used, rows the data-set. In each cell, the numerator is the number of left-out samples that have been correctly classified by the corresponding algorithm, and the denominator is the total number of samples n. The kNN algorithm has a free parameter, the number of neighbors k, that needs to be determined. To allow a fair comparison, we have optimized this value for each data-set using cross-validation [12]; the resulting optimal value is given in parentheses. For the ML classifier we consider two cases: one in which the two classes are assumed to have the same variance and one in which the variances are assumed to be different. These are referred to as ML-s (same) and ML-d (different), respectively.
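
As an illustration of the cross-validation step described in the note above, the sketch below shows one way the number of neighbors k could be selected by leave-one-out cross-validation. It is not the authors' implementation: the use of scikit-learn, the candidate values of k, and the placeholder expression matrix X and labels y are assumptions made only for this example.

```python
# Illustrative sketch only -- not the code used in the paper.
# Selects the number of neighbors k for a kNN classifier by
# leave-one-out cross-validation, analogous to the procedure
# described in the table note.
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

def select_k(X, y, candidate_ks=(1, 3, 5, 7)):
    """Return the k with the best leave-one-out accuracy and that accuracy."""
    search = GridSearchCV(
        KNeighborsClassifier(),
        param_grid={"n_neighbors": list(candidate_ks)},
        cv=LeaveOneOut(),        # one left-out sample per fold
        scoring="accuracy",
    )
    search.fit(X, y)
    return search.best_params_["n_neighbors"], search.best_score_

if __name__ == "__main__":
    # Random placeholder data with the shape of the I-2000 set
    # (62 samples x 2000 genes); real expression data would be loaded here.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(62, 2000))
    y = np.array([0] * 40 + [1] * 22)
    best_k, loo_accuracy = select_k(X, y)
    print(f"best k = {best_k}, LOO accuracy = {loo_accuracy:.2f}")
```

The same leave-one-out loop yields the fractions reported in the table: each fold leaves out one sample, and the numerator counts how many of the n left-out samples are classified correctly.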