Table 4 Variable selection for prostate cancer data (prostate cancer vs. BPH)

From: Greedy feature selection for glycan chromatography data with the generalized Dirichlet distribution

  GDFS CFS rpart Predominant glycans (GDFS method)
Peak 1  
Peak 2  
Peak 3  
Peak 4  
Peak 5  
Peak 6* FA2[3]G1, FA2[6]BG1
Peak 7  
Peak 8  
Peak 9  
Peak 10 FA2G2, FA2[6]G1S1, FA2[6]BG1S1
Peak 11  
Peak 12  
Peak 13** A2BG2S1
Peaks 14 - 24  
  1. Features selected from the prostate cancer dataset (prostate cancer vs. BPH cases) by the proposed GDFS method (GDFS), correlation-based feature selection (CFS), and recursive partitioning (rpart). Features that were selected in 90% or more of the cross-validation models are marked in the table. Also listed are the predominant glycan structures corresponding to each selected peak. The detailed N-glycan composition of human serum was described in Royle et al. [9], and peak 10 was also assigned in Saldova et al. [24]. *Peak 6 was the feature most commonly identified by the rpart method, although it was selected less than 90% of the time. **Peak 13 was selected more than 60% of the time by the GDFS method.
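
The marking criterion in the note above (a feature counts as selected when it appears in 90% or more of the cross-validation models) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `selection_frequency` and `flag_stable`, the feature labels, and the toy input are all hypothetical and chosen only to show the arithmetic.

```python
from collections import Counter


def selection_frequency(selected_per_model):
    """Return, for each feature, the fraction of models that selected it.

    `selected_per_model` is a list with one entry per cross-validation
    model; each entry is the collection of features that model selected.
    """
    n_models = len(selected_per_model)
    # set() guards against a feature being listed twice within one model.
    counts = Counter(f for selected in selected_per_model for f in set(selected))
    return {feature: count / n_models for feature, count in counts.items()}


def flag_stable(freqs, threshold=0.9):
    """Return the features whose selection frequency meets the threshold."""
    return sorted(f for f, p in freqs.items() if p >= threshold)


# Synthetic toy input (hypothetical, not data from the table):
# 10 models; feature_a is chosen by all 10, feature_b by 9, feature_c by 6.
toy_models = [
    ["feature_a"]
    + (["feature_b"] if i != 0 else [])
    + (["feature_c"] if i < 6 else [])
    for i in range(10)
]

freqs = selection_frequency(toy_models)
stable = flag_stable(freqs, threshold=0.9)
```

With this toy input, `feature_a` and `feature_b` meet the 90% threshold, while `feature_c` (selected in 6 of 10 models) does not, which mirrors how a feature can be selected more than 60% of the time yet remain unmarked.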