Table 5 Comparison of classifier generalization performance using the BE dataset before and after recursive feature elimination

From: LANDMark: an ensemble approach to the supervised selection of biomarkers in high-throughput sequencing data

 

| Diagonal: before / after RFE | LANDMark (oracle) | LANDMark (no oracle) | Extra trees | LinearSVC | Logistic regression | Random forest | Ridge regression | SGD (MH) | SGD (SH) |
|---|---|---|---|---|---|---|---|---|---|
| LANDMark (oracle) | 0.971 ± 0.002 / 0.963 ± 0.004 | 0.001 ± 0.001 | −0.005 ± 0.004 | 0.002 ± 0.004 | −0.013 ± 0.004 | −0.008 ± 0.004 | −0.006 ± 0.002 | −0.026 ± 0.006* | −0.018 ± 0.006 |
| LANDMark (no oracle) | 0.003 ± 0.002 | 0.971 ± 0.004 / 0.960 ± 0.004 | −0.006 ± 0.004 | −0.003 ± 0.004 | −0.013 ± 0.004 | −0.009 ± 0.004 | −0.007 ± 0.002 | −0.027 ± 0.006* | −0.023 ± 0.005* |
| Extra trees | 0.009 ± 0.004 | 0.006 ± 0.004 | 0.966 ± 0.004 / 0.954 ± 0.005 | 0.003 ± 0.004 | −0.008 ± 0.004 | −0.004 ± 0.003 | −0.002 ± 0.004 | −0.022 ± 0.005* | −0.023 ± 0.005* |
| LinearSVC | 0.016 ± 0.003** | −0.013 ± 0.004** | 0.007 ± 0.005 | 0.968 ± 0.003 / 0.947 ± 0.004 | −0.0118 ± 0.004 | 0.006 ± 0.005 | −0.004 ± 0.004 | −0.024 ± 0.006* | −0.021 ± 0.005* |
| Logistic regression | 0.017 ± 0.005** | 0.014 ± 0.004** | 0.008 ± 0.005 | 0.001 ± 0.005 | 0.958 ± 0.004 / 0.946 ± 0.005 | 0.004 ± 0.004 | 0.006 ± 0.003 | −0.014 ± 0.006 | −0.010 ± 0.006 |
| Random forest | 0.003 ± 0.004 | 0.000 ± 0.003 | −0.006 ± 0.003 | −0.013 ± 0.004 | −0.015 ± 0.004 | 0.962 ± 0.004 / 0.960 ± 0.004 | 0.002 ± 0.004 | −0.018 ± 0.006 | −0.014 ± 0.006 |
| Ridge regression | −0.003 ± 0.004 | −0.006 ± 0.004 | −0.012 ± 0.005 | −0.019 ± 0.004* | −0.020 ± 0.006* | −0.006 ± 0.005 | 0.964 ± 0.003 / 0.966 ± 0.003 | −0.020 ± 0.006 | −0.016 ± 0.006 |
| SGD (MH) | 0.030 ± 0.005** | 0.027 ± 0.005 | 0.021 ± 0.006** | 0.014 ± 0.006 | 0.012 ± 0.006 | 0.027 ± 0.006** | 0.033 ± 0.006** | 0.944 ± 0.005 / 0.933 ± 0.005 | 0.004 ± 0.007 |
| SGD (SH) | 0.062 ± 0.013** | 0.059 ± 0.013 | 0.053 ± 0.014** | 0.046 ± 0.014 | 0.044 ± 0.013 | 0.059 ± 0.013** | 0.064 ± 0.012** | 0.032 ± 0.006** | 0.948 ± 0.005 / 0.901 ± 0.012 |

Models were trained using data from the BE amplicon, and generalization performance was measured using the balanced accuracy score. Values are reported as mean ± standard error. The main diagonal gives the mean performance of each model before and after recursive feature elimination (RFE). Cells above the diagonal give the difference in means for each pairwise comparison before RFE; cells below the diagonal give the corresponding differences after RFE. A single asterisk (*) marks a statistically significant difference (p ≤ 0.05) in generalization performance favoring the classifier in the row; a double asterisk (**) marks a statistically significant difference favoring the classifier in the column.
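The summary statistics described in the footnote can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code. `balanced_accuracy` computes the mean per-class recall (the same quantity scikit-learn's `sklearn.metrics.balanced_accuracy_score` returns by default), `mean_and_se` reports mean ± standard error across evaluation splits, and `comparison_matrix` arranges results in the same shape as Table 5. The helper names, the use of per-split score lists, and the column-minus-row sign convention for differences are all hypothetical. The footnote does not name the significance test, so the asterisks are not reproduced.

```python
import math
from statistics import mean, stdev


def balanced_accuracy(y_true, y_pred):
    """Average of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        positions = [i for i, label in enumerate(y_true) if label == c]
        correct = sum(1 for i in positions if y_pred[i] == c)
        recalls.append(correct / len(positions))
    return sum(recalls) / len(recalls)


def mean_and_se(scores):
    """Mean and standard error (sample standard deviation / sqrt(n))."""
    return mean(scores), stdev(scores) / math.sqrt(len(scores))


def comparison_matrix(before, after):
    """Hypothetical helper that lays out results like Table 5.

    before, after: dicts mapping classifier name -> list of per-split
    balanced accuracy scores (before and after RFE).
    Diagonal cells hold (mean before, mean after). Off-diagonal cells hold a
    difference of means, taken here as column minus row (assumed convention):
    upper triangle uses pre-RFE scores, lower triangle uses post-RFE scores.
    """
    names = list(before)
    table = {}
    for i, row in enumerate(names):
        for j, col in enumerate(names):
            if i == j:
                table[row, col] = (mean(before[row]), mean(after[row]))
            elif i < j:  # upper triangle: before RFE
                table[row, col] = mean(before[col]) - mean(before[row])
            else:  # lower triangle: after RFE
                table[row, col] = mean(after[col]) - mean(after[row])
    return table


if __name__ == "__main__":
    # Class "a": 2 of 3 correct; class "b": 1 of 1 correct -> (2/3 + 1) / 2
    print(balanced_accuracy(["a", "a", "a", "b"], ["a", "a", "b", "b"]))
```

Balanced accuracy is used instead of plain accuracy because it weights every class equally, so a classifier cannot score well simply by predicting the majority class.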