Table 1 Feature selection for lung cancer data

From: Greedy feature selection for glycan chromatography data with the generalized Dirichlet distribution

 

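|  | GDFS | CFS | rpart | Predominant glycans (GDFS method) |
|---|---|---|---|---|
| Peak 1 |  |  |  |  |
| Peak 2 |  |  |  | A2 |
| Peak 3 |  |  |  | FA2 |
| Peak 4 |  |  |  | FA2B, A2[3]G1, A2[6]G1, M5 |
| Peak 5 |  |  |  | FA2[3]G1, FA2[6]G1, FA2[3]BG1, FA2[6]BG1 |
| Peak 6 |  |  |  | A2G2, A2BG2, A2[3]G1S1, A2[6]G1S1 |
| Peak 7 |  |  |  | FA2G2, FA2BG2, FA2[3]G1S1, FA2[6]G1S1 |
| Peak 8 |  |  |  | A2G2S1, A2BG2S1 |
| Peak 9 |  |  |  | A3G3S2, A3BG3S2, A2F1G2S2 |
| Peak 10 |  |  |  |  |
| Peak 11 |  |  |  |  |
| Peak 12 |  |  |  | A3G3S2, A3BG3S2, A2F1G2S2 |
| Peak 13 |  |  |  |  |
| Peak 14 |  |  |  | A3F1G3S3 |
| Peak 15 |  |  |  |  |
| Peak 16 |  |  |  |  |
| Peak 17 |  |  |  | A4G4LacS4, A4F2G3S4 |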

  1. Features selected from the lung cancer dataset (control vs. cancer cases) by the proposed GDFS method (GDFS), correlation-based feature selection (CFS), and recursive partitioning (rpart). Features selected in 90% or more of the cross-validation models are marked in the column of the corresponding method. Also listed are the predominant glycan structures corresponding to each selected peak. The detailed N-glycan composition of human serum is described in [9], and these peaks were also assigned in [23]. Nomenclature follows [9, 50]: all N-glycans have two core GlcNAcs; F at the start of the abbreviation indicates a core fucose linked α1-6 to the inner GlcNAc; Man(x), number (x) of mannose on core GlcNAcs; A(x), number (x) of antenna (GlcNAc) on trimannosyl core; B, bisecting GlcNAc linked β1-4 to β1-3 mannose; F(x), number (x) of fucose linked α1-3 to antenna GlcNAc; G(x), number (x) of galactose on antenna; [3]G1 and [6]G1 indicate that the galactose is on the antenna of the α1-3 or α1-6 mannose, respectively; S(x), number (x) of sialic acids on antenna.
  2. Structural assignments of N-glycans to the peaks on a HILIC chromatogram are made using the Glycobase software (http://glycobase.nibrt.ie/glycobase/show_nibrt.action). Campbell et al. [51] provide further details.
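
The abbreviation scheme in footnote 1 can be read mechanically, which may help when checking the glycan names in the table. The following is a minimal sketch, not part of the original article or of the Glycobase software: the function name `parse_glycan` and the output field names are hypothetical, and treating `M5` as shorthand for `Man5` is an assumption based on the table entry for Peak 4. Tokens that footnote 1 does not define (for example `Lac` in `A4G4LacS4`) are rejected rather than guessed.

```python
import re

# One token of the abbreviation: an arm marker such as [3] or [6],
# a letter code followed by a count (A2, G1, S1, F1, Man5, M5),
# or the bisecting GlcNAc flag B.
TOKEN = re.compile(
    r"\[(?P<arm>[36])\]|(?P<key>Man|M|A|F|G|S)(?P<n>\d+)|(?P<bisect>B)"
)

# Hypothetical field names for the counted residues.
FIELDS = {
    "Man": "mannose",
    "M": "mannose",          # assumption: M5 is shorthand for Man5
    "A": "antennae",
    "F": "antenna_fucose",   # F(x) after the antenna count
    "G": "galactose",
    "S": "sialic_acid",
}


def parse_glycan(name: str) -> dict:
    """Split an abbreviation such as 'FA2[3]G1S1' into its components."""
    result = {
        "core_fucose": False,
        "mannose": 0,
        "antennae": 0,
        "bisecting": False,
        "antenna_fucose": 0,
        "galactose": 0,
        "galactose_arm": None,
        "sialic_acid": 0,
    }
    rest = name
    # A leading F without a count denotes the core fucose.
    if rest.startswith("F") and not rest[1:2].isdigit():
        result["core_fucose"] = True
        rest = rest[1:]
    pos = 0
    while pos < len(rest):
        match = TOKEN.match(rest, pos)
        if match is None:
            raise ValueError(f"token not covered by footnote 1: {rest[pos:]!r}")
        if match.group("arm"):
            result["galactose_arm"] = int(match.group("arm"))
        elif match.group("bisect"):
            result["bisecting"] = True
        else:
            result[FIELDS[match.group("key")]] = int(match.group("n"))
        pos = match.end()
    return result


if __name__ == "__main__":
    for glycan in ["FA2[3]G1S1", "A3F1G3S3", "M5"]:
        print(glycan, parse_glycan(glycan))
```

The sketch scans tokens in the order they appear instead of enforcing a fixed order, because the table contains names such as `FA2[3]BG1` in which the arm marker precedes `B`.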