Table 1 Performance of models with different ML algorithms and single descriptors

From: Prediction of polyreactive and nonspecific single-chain fragment variables through structural biochemical features and protein language-based descriptors

| Descriptor | Method | Train AUC | Valid AUC | Test AUC | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|---|---|---|
| F46 | GBM | 0.917 ± 0.006 | 0.705 ± 0.129 | 0.805 | 0.730 | 0.753 | 0.607 | 0.672 |
| | LGBM | 0.858 ± 0.009 | 0.704 ± 0.124 | 0.811 | 0.731 | 0.700 | 0.720 | 0.710 |
| | RF | 1.000 ± 0.000 | 0.697 ± 0.123 | 0.795 | 0.728 | 0.722 | 0.658 | 0.689 |
| | XGB | 0.951 ± 0.003 | 0.701 ± 0.125 | 0.810 | 0.732 | 0.705 | 0.710 | 0.708 |
| UniRep | GBM | 1.000 ± 0.000 | 0.591 ± 0.177 | 0.821 | 0.747 | 0.741 | 0.686 | 0.712 |
| | LGBM | 0.926 ± 0.006 | 0.606 ± 0.178 | 0.816 | 0.734 | 0.704 | 0.722 | 0.713 |
| | RF | 1.000 ± 0.000 | 0.575 ± 0.178 | 0.815 | 0.736 | 0.728 | 0.671 | 0.699 |
| | XGB | 0.999 ± 0.000 | 0.596 ± 0.181 | 0.824 | 0.740 | 0.718 | 0.709 | 0.713 |
| TAPE | GBM | 1.000 ± 0.000 | 0.647 ± 0.160 | 0.824 | 0.745 | 0.741 | 0.679 | 0.709 |
| | LGBM | 0.919 ± 0.005 | 0.657 ± 0.155 | 0.810 | 0.731 | 0.703 | 0.713 | 0.708 |
| | RF | 1.000 ± 0.000 | 0.638 ± 0.160 | 0.815 | 0.745 | 0.747 | 0.666 | 0.704 |
| | XGB | 0.998 ± 0.000 | 0.651 ± 0.156 | 0.822 | 0.746 | 0.729 | 0.706 | 0.717 |
| ESM-1b | GBM | 0.979 ± 0.003 | 0.603 ± 0.182 | 0.807 | 0.727 | 0.742 | 0.616 | 0.673 |
| | LGBM | 0.922 ± 0.006 | 0.608 ± 0.176 | 0.807 | 0.730 | 0.697 | 0.722 | 0.710 |
| | RF | 1.000 ± 0.000 | 0.593 ± 0.176 | 0.814 | 0.736 | 0.730 | 0.669 | 0.698 |
| | XGB | 0.998 ± 0.000 | 0.604 ± 0.177 | 0.821 | 0.741 | 0.718 | 0.713 | 0.716 |
| ESM-1v | GBM | 1.000 ± 0.000 | 0.594 ± 0.150 | 0.819 | 0.743 | 0.761 | 0.639 | 0.695 |
| | LGBM | 0.919 ± 0.005 | 0.602 ± 0.172 | 0.813 | 0.730 | 0.694 | 0.729 | 0.711 |
| | RF | 1.000 ± 0.000 | 0.582 ± 0.171 | 0.816 | 0.740 | 0.735 | 0.674 | 0.703 |
| | XGB | 0.949 ± 0.002 | 0.597 ± 0.175 | 0.808 | 0.728 | 0.702 | 0.701 | 0.701 |

  1. Bold indicates the best performance, measured by AUC on the test set.
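
As a quick consistency check on the table, the F1-score column can be recomputed from the Precision and Recall columns. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the source does not state the formula explicitly, and the helper name `f1_score` is illustrative only. The four rows are copied from Table 1, and small differences are expected because the inputs are rounded to three decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (descriptor, method, precision, recall, reported F1), copied from Table 1
rows = [
    ("F46", "GBM", 0.753, 0.607, 0.672),
    ("F46", "LGBM", 0.700, 0.720, 0.710),
    ("UniRep", "XGB", 0.718, 0.709, 0.713),
    ("TAPE", "GBM", 0.741, 0.679, 0.709),
]

for descriptor, method, p, r, reported in rows:
    computed = f1_score(p, r)
    print(f"{descriptor:7s} {method:5s} computed={computed:.3f} reported={reported:.3f}")
```

Under this assumption, the recomputed values for these rows agree with the reported ones to within rounding of the three-decimal inputs.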