
Table 4. Overview of the generalization performance (balanced accuracy) of each model when trained on metabarcoding data

From: LANDMark: an ensemble approach to the supervised selection of biomarkers in high-throughput sequencing data

| Amplicon | LANDMark (Oracle) | LANDMark (No Oracle) | Extra Trees | Linear SVC | Logistic Regression | Random Forest | Ridge Regression | SGD (MH) | SGD (SH) |
|---|---|---|---|---|---|---|---|---|---|
| F230 | **0.957 ± 0.061 (1)ᵃ** | 0.957 ± 0.061 (2)ᵃ | 0.940 ± 0.061 (6) | 0.944 ± 0.071 (5) | 0.931 ± 0.073 (7) | 0.951 ± 0.066 (3) | 0.944 ± 0.069 (4) | 0.922 ± 0.077 (8) | 0.919 ± 0.079 (9) |
| BE | 0.963 ± 0.049 (7.5) | 0.967 ± 0.048 (5) | 0.976 ± 0.043 (2) | 0.970 ± 0.0054 (4)ᵃ | 0.970 ± 0.047 (3) | **0.98 ± 0.043 (1)** | 0.963 ± 0.049 (7.5) | 0.963 ± 0.055 (6) | 0.934 ± 0.062 (9) |

  1. For each classification model, the average generalization performance (balanced accuracy) and its standard deviation were calculated on test data after training the model on data derived from either the F230 or the BE amplicon. The number in parentheses is the model's rank within that amplicon; tied models share the mean rank.
  2. The best-performing results are in bold.
  3. ᵃ When reported to three significant digits, this score was rounded up to the nearest thousandth.
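The quantities in the table can be sketched in plain Python. This is a minimal illustration built on stated assumptions. It is not the LANDMark authors' evaluation code. Balanced accuracy is taken as the mean of per-class recall, which is the definition used by scikit-learn's `balanced_accuracy_score`. The ± value is assumed to be the sample standard deviation across test splits. Tied scores are assumed to receive the mean of their rank positions, which matches the shared rank of 7.5 in the BE row. The helper names (`balanced_accuracy`, `summarize`, `average_ranks`) are hypothetical, and so are all the input values.

```python
from statistics import mean, stdev


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        positions = [i for i, label in enumerate(y_true) if label == c]
        correct = sum(1 for i in positions if y_pred[i] == c)
        recalls.append(correct / len(positions))
    return sum(recalls) / len(recalls)


def summarize(split_scores):
    """Mean and sample standard deviation of per-split scores (assumes >= 2 splits)."""
    return mean(split_scores), stdev(split_scores)


def average_ranks(scores):
    """Rank scores from best (1) to worst; tied scores share the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied scores while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


if __name__ == "__main__":
    # Hypothetical labels for one test split (not data from the study).
    y_true = ["a", "a", "a", "b"]
    y_pred = ["a", "a", "b", "b"]
    print(balanced_accuracy(y_true, y_pred))  # (2/3 + 1) / 2

    # Hypothetical per-split balanced accuracies for one model.
    m, s = summarize([0.90, 1.00, 0.95])
    print(f"{m:.3f} ± {s:.3f}")

    # Hypothetical mean scores for nine models; two tie for 7th/8th place.
    print(average_ranks([0.95, 0.94, 0.93, 0.92, 0.91, 0.90, 0.88, 0.88, 0.80]))
```

Ranks should be computed on unrounded scores. The table shows rounded means, so re-ranking the displayed values would produce ties that the full-precision ranks do not have. For example, Linear SVC and Logistic Regression on BE both display 0.970 but are ranked 4 and 3.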