
Table 1 Feature selection for lung cancer data

From: Greedy feature selection for glycan chromatography data with the generalized Dirichlet distribution

  GDFS CFS rpart Predominant glycans (GDFS method)
Peak 1  
Peak 2 A2
Peak 3 FA2
Peak 4 FA2B, A2[3]G1, A2[6]G1, M5
Peak 5 FA2[3]G1, FA2[6]G1, FA2[3]BG1, FA2[6]BG1
Peak 6 A2G2, A2BG2, A2[3]G1S1, A2[6]G1S1
Peak 7 FA2G2, FA2BG2, FA2[3]G1S1, FA2[6]G1S1
Peak 8 A2G2S1, A2BG2S1
Peak 9 A3G3S2, A3BG3S2, A2F1G2S2
Peak 10  
Peak 11  
Peak 12 A3G3S2, A3BG3S2, A2F1G2S2
Peak 13  
Peak 14 A3F1G3S3
Peak 15  
Peak 16  
Peak 17 A4G4LacS4, A4F2G3S4
  1. Features selected from the lung cancer dataset (control vs. cancer cases) by the proposed GDFS method (GDFS), correlation-based feature selection (CFS), and recursive partitioning (rpart). Features that were selected in 90% or more of the cross-validation models are marked with . Also listed are the predominant glycan structures corresponding to each selected peak. The detailed N-glycan composition of human serum was described in [9], and these peaks were also assigned in [23]. Nomenclature follows [9, 50]: all N-glycans have two core GlcNAcs; F at the start of the abbreviation indicates a core fucose linked α1-6 to the inner GlcNAc; Man(x), number (x) of mannoses on the core GlcNAcs; A(x), number (x) of antennae (GlcNAc) on the trimannosyl core; B, bisecting GlcNAc linked β1-4 to the β1-3 mannose; F(x), number (x) of fucoses linked α1-3 to antenna GlcNAc; G(x), number (x) of galactoses on the antennae; [3]G1 and [6]G1 indicate that the galactose is on the antenna of the α1-3 or α1-6 mannose, respectively; S(x), number (x) of sialic acids on the antennae.
  2. Structural assignments of N-glycans to the peaks on a HILIC chromatogram are made using the Glycobase software (http://glycobase.nibrt.ie/glycobase/show_nibrt.action). Campbell et al. [51] provide further details.
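The abbreviation scheme in note 1 is regular enough to decode mechanically. The following is a minimal Python sketch, assuming the codes are read left to right exactly as note 1 describes; `parse_glycan` and the `Glycan` fields are hypothetical names introduced here for illustration and are not part of the paper or of Glycobase. The `Lac` token, which appears in Peak 17 but is not defined in note 1, is recorded as-is rather than interpreted.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Tokens: arm marker "[3]"/"[6]", "Lac", "Man<n>", or one letter with an optional count.
_TOKEN = re.compile(r"\[(3|6)\]|Lac|Man(\d+)|([FAMBGS])(\d*)")


@dataclass
class Glycan:
    core_fucose: bool = False           # leading "F": core fucose alpha1-6 to inner GlcNAc
    mannose: Optional[int] = None       # "M<n>" / "Man<n>"
    antennae: int = 0                   # "A<n>"
    bisecting: bool = False             # "B": bisecting GlcNAc
    antenna_fucose: int = 0             # "F<n>" after the start: alpha1-3 fucoses on antennae
    galactose: int = 0                  # "G<n>"
    galactose_arm: Optional[str] = None  # set by a preceding "[3]" or "[6]"
    sialic_acid: int = 0                # "S<n>"
    other: List[str] = field(default_factory=list)  # tokens note 1 does not define (e.g. "Lac")


def parse_glycan(code: str) -> Glycan:
    """Decode an abbreviation such as 'FA2[3]BG1' (hypothetical helper, sketch only)."""
    g = Glycan()
    pos = 0
    pending_arm = None  # arm marker waits for the next G token (it may precede B)
    for m in _TOKEN.finditer(code):
        if m.start() != pos:
            raise ValueError(f"unparsed text in {code!r} at index {pos}")
        pos = m.end()
        if m.group(1):  # "[3]" or "[6]"
            pending_arm = "α1-3" if m.group(1) == "3" else "α1-6"
        elif m.group(0) == "Lac":
            g.other.append("Lac")
        elif m.group(2):  # "Man<n>"
            g.mannose = int(m.group(2))
        else:
            letter, digits = m.group(3), m.group(4)
            if letter == "F" and m.start() == 0 and not digits:
                g.core_fucose = True
            elif letter == "B" and not digits:
                g.bisecting = True
            elif not digits:
                raise ValueError(f"missing count for {letter!r} in {code!r}")
            elif letter == "F":
                g.antenna_fucose = int(digits)
            elif letter == "A":
                g.antennae = int(digits)
            elif letter == "M":
                g.mannose = int(digits)
            elif letter == "G":
                g.galactose = int(digits)
                g.galactose_arm = pending_arm
                pending_arm = None
            elif letter == "S":
                g.sialic_acid = int(digits)
            else:
                raise ValueError(f"unexpected count on {letter!r} in {code!r}")
    if pos != len(code):
        raise ValueError(f"unparsed text in {code!r} at index {pos}")
    return g
```

For example, `parse_glycan("FA2[3]BG1")` reports a core fucose, two antennae, a bisecting GlcNAc and one galactose on the α1-3 arm, while `parse_glycan("A2F1G2S2")` reports no core fucose but one antenna fucose, because only an unnumbered `F` in first position is read as core fucosylation.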