Table 2 Details of training and test datasets

From: USMPep: universal sequence models for major histocompatibility complex binding affinity prediction

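| MHC class | Dataset | Usage | Total size | Share of binders | Number of alleles | Median size per allele | Share of quantitative measurements | Sequence length (residues) |
|---|---|---|---|---|---|---|---|---|
| I | BD2009 | train | 117326 | 0.25 | 53 | 1971 | 0.58 | 8–11 |
| I | Blind | test | 27680 | 0.33 | 53 | 470 | 0.58 | 8–11 |
| I | MHCFlurry18 | train | 120720 | 0.25 | 32 | 3659 | 0.68 | 8–15 |
| I | IEDB16_I | test | 2827 | 0.54 | 32 | 73 | 1.0 | 9 |
| I | MHCFlurry18 | train | 68117 | 0.26 | 7 | 6884 | 0.64 | 8–15 |
| I | HPV | test | 743 | 0.34 | 7 | 125 | 0.37 | 8–11 |
| II | Wang10 | train | 23203 | 0.37 | 24 | 999 | 1.0 | 15–37 |
| II | IEDB16_II | test | 15691 | 0.33 | 24 | 641 | 1.0 | 15 |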

Note: The binding-affinity threshold for MHC class I binders is 500 nM, except for the HPV dataset, where the threshold is 100 000 nM. For MHC class II binders, the threshold is 1000 nM.
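The footnote's thresholds determine how a measured affinity becomes a binder/non-binder label, which is what the "Share of binders" column counts. The Python sketch below illustrates that rule. It makes two assumptions that the table does not state: a peptide counts as a binder when its affinity (in nM) is strictly below the threshold, and the rule is applied only to quantitative measurements (the handling of qualitative measurements is not described here). The function names `binder_threshold_nm`, `is_binder` and `share_of_binders` are hypothetical and do not come from USMPep.

```python
from typing import List, Literal

# Binder thresholds in nM, as listed in the table footnote.
CLASS_I_THRESHOLD_NM = 500.0
HPV_THRESHOLD_NM = 100_000.0   # the HPV test set uses a much looser cutoff
CLASS_II_THRESHOLD_NM = 1000.0


def binder_threshold_nm(mhc_class: Literal["I", "II"], dataset: str) -> float:
    """Return the affinity threshold (nM) for the given MHC class and dataset."""
    if mhc_class == "I":
        return HPV_THRESHOLD_NM if dataset == "HPV" else CLASS_I_THRESHOLD_NM
    if mhc_class == "II":
        return CLASS_II_THRESHOLD_NM
    raise ValueError(f"unknown MHC class: {mhc_class!r}")


def is_binder(affinity_nm: float, mhc_class: Literal["I", "II"], dataset: str) -> bool:
    """Binarize one quantitative measurement.

    Assumption: strictly below the threshold means "binder".
    """
    return affinity_nm < binder_threshold_nm(mhc_class, dataset)


def share_of_binders(
    affinities_nm: List[float], mhc_class: Literal["I", "II"], dataset: str
) -> float:
    """Fraction of measurements labeled as binders (cf. the 'Share of binders' column)."""
    if not affinities_nm:
        raise ValueError("no measurements given")
    labels = [is_binder(a, mhc_class, dataset) for a in affinities_nm]
    return sum(labels) / len(labels)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the datasets above.
    print(is_binder(300.0, "I", "BD2009"))   # True: 300 nM < 500 nM
    print(is_binder(5000.0, "I", "HPV"))     # True: 5000 nM < 100 000 nM
    print(is_binder(1500.0, "II", "Wang10")) # False: 1500 nM >= 1000 nM
    print(share_of_binders([100.0, 400.0, 600.0, 2000.0], "I", "BD2009"))  # 0.5
```

Under these assumptions, the same 5000 nM measurement is a binder in the HPV test set but a non-binder in every other class I dataset. This is worth keeping in mind when comparing the "Share of binders" values across rows.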