Table 2 Statistical significances of features of cross-validation and blind data sets in discriminating large deviations from small

From: Dataset size and composition impact the reliability of performance benchmarks for peptide-MHC binding predictions

Features       SMMPMBEC   NetMHC     NetMHCpan
log_size_cv    7.7e-07    2.5e-04    2.5e-02
log_size_bl    2.9e-05    3.6e-03    1.2e-02
entss_cv       1.1e-04    1.7e-03    2.0e-02
entss_bl       3.4e-05    3.9e-04    5.1e-03
ent_meas_cv    1.7e-01    5.5e-01    4.6e-01
ent_meas_bl    4.6e-01    5.4e-01    8.5e-01
ent_pred_cv    1.5e-01    2.1e-01    2.0e-01
ent_pred_bl    4.8e-03    6.4e-02    1.1e-02
prbol_meas     3.5e-01    9.9e-02    3.1e-01
prbol_pred     7.8e-03    3.7e-02    2.8e-02

  1. Here, deviation = |cv_gs - blind|, where cv_gs and blind are the predictive performances measured as AROCs. Features significant in a two-tailed t-test at the 0.05 cutoff are italicized. See Methods for definitions of the features.
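
The 0.05 cutoff in the footnote can be applied directly to the values in Table 2 to see which features are significant for each method. The following is a minimal sketch, not the authors' code: the p-values are copied from the table, while the names `P_VALUES`, `METHODS`, `ALPHA`, and `significant_methods` are illustrative assumptions.

```python
# Minimal sketch: flag which Table 2 p-values fall below the 0.05 cutoff.
# The values are copied from the table; all names here are illustrative.

ALPHA = 0.05  # two-tailed significance cutoff stated in the footnote
METHODS = ("SMMPMBEC", "NetMHC", "NetMHCpan")

# feature -> p-values in the same order as METHODS
P_VALUES = {
    "log_size_cv": (7.7e-07, 2.5e-04, 2.5e-02),
    "log_size_bl": (2.9e-05, 3.6e-03, 1.2e-02),
    "entss_cv":    (1.1e-04, 1.7e-03, 2.0e-02),
    "entss_bl":    (3.4e-05, 3.9e-04, 5.1e-03),
    "ent_meas_cv": (1.7e-01, 5.5e-01, 4.6e-01),
    "ent_meas_bl": (4.6e-01, 5.4e-01, 8.5e-01),
    "ent_pred_cv": (1.5e-01, 2.1e-01, 2.0e-01),
    "ent_pred_bl": (4.8e-03, 6.4e-02, 1.1e-02),
    "prbol_meas":  (3.5e-01, 9.9e-02, 3.1e-01),
    "prbol_pred":  (7.8e-03, 3.7e-02, 2.8e-02),
}


def significant_methods(p_values, alpha=ALPHA):
    """Map each feature to the methods whose p-value is below alpha."""
    return {
        feature: [m for m, p in zip(METHODS, ps) if p < alpha]
        for feature, ps in p_values.items()
    }


flags = significant_methods(P_VALUES)
for feature, methods in flags.items():
    print(f"{feature:12s} {', '.join(methods) if methods else 'none'}")
```

With these values, the two size features, the two entss features, and prbol_pred fall below the cutoff for all three methods, while ent_pred_bl does so for SMMPMBEC and NetMHCpan but not NetMHC.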