
Table 6 Comparison of classifier generalization performance using the F230 dataset before and after recursive feature elimination

From: LANDMark: an ensemble approach to the supervised selection of biomarkers in high-throughput sequencing data

| | LANDMark (oracle) | LANDMark (no oracle) | Extra trees | LinearSVC | Logistic regression | Random forest | Ridge regression | SGD (MH) | SGD (SH) |
|---|---|---|---|---|---|---|---|---|---|
| LANDMark (oracle) | 0.932 ± 0.005 / 0.933 ± 0.006 | 0.001 ± 0.003 | −0.008 ± 0.005 | −0.016 ± 0.007 | −0.005 ± 0.005 | −0.009 ± 0.006 | −0.008 ± 0.004 | −0.017 ± 0.007 | −0.037 ± 0.007* |
| LANDMark (no oracle) | −0.001 ± 0.004 | 0.933 ± 0.006 / 0.935 ± 0.005 | −0.009 ± 0.005 | −0.016 ± 0.007 | −0.006 ± 0.004 | −0.10 ± 0.006 | −0.008 ± 0.004 | −0.018 ± 0.007 | −0.038 ± 0.007* |
| Extra trees | 0.013 ± 0.006 | 0.014 ± 0.005 | 0.924 ± 0.005 / 0.921 ± 0.005 | −0.008 ± 0.007 | 0.002 ± 0.005 | −0.002 ± 0.005 | 0.000 ± 0.005 | −0.009 ± 0.007 | −0.029 ± 0.008* |
| LinearSVC | 0.007 ± 0.006 | 0.008 ± 0.004 | −0.006 ± 0.007 | 0.916 ± 0.007 / 0.926 ± 0.005 | 0.010 ± 0.006 | 0.006 ± 0.009 | 0.008 ± 0.006 | −0.001 ± 0.009 | −0.021 ± 0.008 |
| Logistic regression | 0.012 ± 0.005 | 0.014 ± 0.004 | 0.000 ± 0.005 | 0.005 ± 0.006 | 0.927 ± 0.005 / 0.921 ± 0.004 | −0.004 ± 0.005 | −0.002 ± 0.004 | −0.012 ± 0.006 | −0.032 ± 0.007* |
| Random forest | 0.011 ± 0.004 | 0.013 ± 0.005 | −0.001 ± 0.004 | 0.004 ± 0.007 | −0.001 ± 0.004 | 0.922 ± 0.006 / 0.922 ± 0.006 | 0.002 ± 0.005 | −0.007 ± 0.007 | −0.027 ± 0.010 |
| Ridge regression | 0.006 ± 0.004 | 0.008 ± 0.004 | −0.006 ± 0.005 | −0.001 ± 0.007 | −0.006 ± 0.005 | 0.005 ± 0.005 | 0.924 ± 0.005 / 0.927 ± 0.006 | −0.009 ± 0.006 | −0.029 ± 0.007* |
| SGD (MH) | 0.037 ± 0.009** | 0.038 ± 0.008 | 0.024 ± 0.010 | 0.030 ± 0.008** | 0.025 ± 0.008 | 0.031 ± 0.008** | 0.009 ± 0.006 | 0.915 ± 0.008 / 0.896 ± 0.008 | −0.020 ± 0.010 |
| SGD (SH) | 0.063 ± 0.012** | 0.064 ± 0.011** | 0.050 ± 0.010** | 0.055 ± 0.012** | 0.050 ± 0.011** | 0.056 ± 0.010** | 0.029 ± 0.007** | 0.020 ± 0.010 | 0.895 ± 0.008 / 0.871 ± 0.10 |

  1. Models were trained using data from the F230 amplicon, and generalization performance was measured using the balanced accuracy score. Values are reported as mean ± standard error. The main diagonal gives the mean performance of each model before and after recursive feature elimination (RFE), in that order. Cells above the diagonal give the difference in mean performance for each pairwise comparison before RFE; cells below the diagonal give the corresponding difference after RFE. Each difference is the column classifier's mean minus the row classifier's mean, so negative values indicate that the row classifier performed better. A single asterisk (*) marks a statistically significant difference (p ≤ 0.05) favoring the classifier in the row; a double asterisk (**) marks a statistically significant difference favoring the classifier in the column.
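The construction described in the footnote (balanced accuracy per model, means on the diagonal, pairwise differences of means split into a "before RFE" upper triangle and an "after RFE" lower triangle) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code: it assumes every classifier is scored on the same train/test splits, that off-diagonal cells are means of paired per-split differences (column minus row, a sign convention inferred from the published values), and that the standard error is the sample standard deviation divided by √n. The function names, the classifiers "A" and "B", and all scores are hypothetical, and the significance test behind the asterisks is not reproduced.

```python
from math import sqrt
from statistics import mean, stdev


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def mean_se(values):
    """Mean and standard error of the mean (sample SD / sqrt(n)); needs n >= 2."""
    return mean(values), stdev(values) / sqrt(len(values))


def comparison_matrix(before, after):
    """Build a Table 6-style matrix from paired per-split scores (hypothetical helper).

    before, after: dict mapping classifier name -> list of balanced accuracies,
    one per train/test split, with the same splits for every classifier.
    Diagonal: ((mean, SE) before RFE, (mean, SE) after RFE).
    Upper triangle (row < column): column minus row, before RFE (assumed convention).
    Lower triangle (row > column): column minus row, after RFE (assumed convention).
    """
    names = list(before)
    table = [[None] * len(names) for _ in names]
    for i, row in enumerate(names):
        for j, col in enumerate(names):
            if i == j:
                table[i][j] = (mean_se(before[row]), mean_se(after[row]))
            else:
                scores = before if i < j else after
                diffs = [c - r for r, c in zip(scores[row], scores[col])]
                table[i][j] = mean_se(diffs)
    return names, table


def fmt(pair):
    """Format a (mean, SE) pair as 'mean ± SE' with three decimals."""
    m, se = pair
    return f"{m:.3f} ± {se:.3f}"


if __name__ == "__main__":
    # Toy labels: recall is 3/4 for class 0 and 1/2 for class 1.
    print(balanced_accuracy([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0]))  # 0.625

    # Hypothetical per-split scores for two made-up classifiers A and B.
    before = {"A": [0.90, 0.92, 0.94], "B": [0.88, 0.91, 0.92]}
    after = {"A": [0.91, 0.93, 0.95], "B": [0.85, 0.88, 0.89]}
    names, table = comparison_matrix(before, after)
    for i, row in enumerate(names):
        cells = []
        for j in range(len(names)):
            cell = table[i][j]
            # Diagonal cells hold two (mean, SE) pairs: before / after RFE.
            cells.append(" / ".join(fmt(p) for p in cell) if i == j else fmt(cell))
        print(row, " | ".join(cells))
```

Using paired per-split differences, rather than subtracting two independently computed means, is what lets each off-diagonal cell carry its own standard error; whether the original analysis paired results this way is an assumption here.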