
Table 4 Variable selection for prostate cancer data (prostate cancer vs. BPH)

From: Greedy feature selection for glycan chromatography data with the generalized Dirichlet distribution

 

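| Peak | GDFS | CFS | rpart | Predominant glycans (GDFS method) |
|---|---|---|---|---|
| Peak 1 | | | | |
| Peak 2 | | | | |
| Peak 3 | | | | |
| Peak 4 | | | | |
| Peak 5 | | | | |
| Peak 6* | | | | FA2[3]G1, FA2[6]BG1 |
| Peak 7 | | | | |
| Peak 8 | | | | |
| Peak 9 | | | | |
| Peak 10 | | | | FA2G2, FA2[6]G1S1, FA2[6]BG1S1 |
| Peak 11 | | | | |
| Peak 12 | | | | |
| Peak 13** | | | | A2BG2S1 |
| Peaks 14-24 | | | | |
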
Features selected from the prostate cancer dataset (prostate cancer vs. BPH cases) by the proposed GDFS method (GDFS), correlation-based feature selection (CFS), and recursive partitioning (rpart). Features that were selected in 90% or more of the cross-validation models are marked. The predominant glycan structures corresponding to each selected peak are also listed. The detailed N-glycan composition of human serum is described in Royle et al. [9], and peak 10 was also assigned in Saldova et al. [24]. *Peak 6 was the feature most commonly identified by the rpart method, although it was selected less than 90% of the time. **Peak 13 was selected more than 60% of the time by the GDFS method.
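
The 90% criterion in the note above amounts to counting how often each feature appears across the cross-validation models. The following is a minimal sketch of that bookkeeping only, not the authors' implementation: the function names, the feature labels, and the toy input are all hypothetical, and the toy values are placeholders rather than results from the study.

```python
from collections import Counter


def selection_frequency(selected_per_model):
    """Return the fraction of cross-validation models that selected each feature.

    `selected_per_model` is a list with one entry per fitted model, each entry
    being the collection of feature labels that model selected.
    """
    n_models = len(selected_per_model)
    # Use set() so a feature counts at most once per model.
    counts = Counter(f for selected in selected_per_model for f in set(selected))
    return {feature: count / n_models for feature, count in counts.items()}


def stable_features(selected_per_model, threshold=0.9):
    """Return the features selected in at least `threshold` of the models."""
    freq = selection_frequency(selected_per_model)
    return sorted(f for f, p in freq.items() if p >= threshold)


# Hypothetical toy input (placeholder labels, NOT data from the paper):
# ten models, nine selecting features a and b, one selecting a and c.
toy_models = [["feature_a", "feature_b"]] * 9 + [["feature_a", "feature_c"]]
toy_freq = selection_frequency(toy_models)
toy_stable = stable_features(toy_models)
```

With a lower `threshold` (for example 0.6), the same helper would flag features that are selected often but not consistently, which is the kind of distinction the footnotes on peaks 6 and 13 draw.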