
Table 2 Prediction Results on Unknown Alleles Dataset

From: Integrating peptides' sequence and energy of contact residues information improves prediction of peptide and HLA-I binding with unknown alleles

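| Allele | Supertype | ANNBM (AUC) | NetMHC (AUC) | NetMHCpan (AUC) | Peptides |
|--------|-----------|-------------|--------------|-----------------|----------|
| A*0101 | A1  | 0.854 | 0.672 | 0.873 | 1157 |
| A*0201 | A2  | 0.905 | 0.886 | 0.912 | 3089 |
| A*0202 | A2  | 0.840 | 0.784 | 0.815 | 1447 |
| A*0203 | A2  | 0.836 | 0.818 | 0.832 | 1443 |
| A*0206 | A2  | 0.883 | 0.826 | 0.847 | 1436 |
| A*0301 | A3  | 0.867 | 0.820 | 0.849 | 2094 |
| A*1101 | A3  | 0.879 | 0.851 | 0.866 | 1985 |
| A*2301 | A24 | 0.917 | 0.877 | 0.863 | 104 |
| A*2402 | A24 | 0.864 | 0.848 | 0.821 | 197 |
| A*2403 | A24 | 0.923 | 0.894 | 0.912 | 254 |
| A*2601 | A1  | 0.771 | 0.631 | 0.733 | 672 |
| A*2902 | A3  | 0.832 | 0.603 | 0.749 | 160 |
| A*3001 | A3  | 0.863 | 0.846 | 0.838 | 669 |
| A*3002 | A1  | 0.671 | 0.711 | 0.721 | 92 |
| A*3101 | A3  | 0.853 | 0.822 | 0.878 | 1869 |
| A*3301 | A3  | 0.838 | 0.699 | 0.763 | 1140 |
| A*6801 | A3  | 0.768 | 0.744 | 0.760 | 1141 |
| A*6802 | A2  | 0.812 | 0.664 | 0.669 | 1434 |
| A*6901 | A2  | 0.902 | 0.811 | 0.823 | 833 |
| B*0702 | B7  | 0.919 | 0.864 | 0.902 | 1262 |
| B*1501 | B62 | 0.687 | 0.536 | 0.750 | 978 |
| B*1801 | B62 | 0.823 | 0.775 | 0.729 | 969 |
| B*3501 | B7  | 0.805 | 0.737 | 0.762 | 736 |
| B*4001 | B44 | 0.852 | 0.818 | 0.870 | 1078 |
| B*4002 | B44 | 0.883 | 0.802 | 0.807 | 118 |
| B*4402 | B44 | 0.824 | 0.771 | 0.839 | 119 |
| B*4403 | B44 | 0.836 | 0.800 | 0.842 | 119 |
| B*4501 | B44 | 0.822 | 0.804 | 0.809 | 114 |
| B*5101 | B7  | 0.887 | 0.879 | 0.905 | 244 |
| B*5301 | B7  | 0.828 | 0.819 | 0.838 | 254 |
| B*5401 | B7  | 0.880 | 0.847 | 0.845 | 255 |
| B*5701 | B58 | 0.945 | 0.652 | 0.919 | 59 |
| B*5801 | B58 | 0.869 | 0.625 | 0.841 | 988 |
| AVG    |     | 0.847 | 0.774 | 0.824 |  |
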
  1. Table 2 shows that ANNBM achieves a higher average AUC than NetMHCpan and NetMHC, by 0.023 and 0.073 respectively. The NetMHC encoding does not take HLA molecule information into account. Although its training data come from the same supertype and it achieves excellent results on the allele-specific benchmark dataset, the differences between HLA molecules within a supertype are not reflected. It is therefore unsurprising that NetMHC's prediction accuracy drops below that of ANNBM and NetMHCpan, both of which encode HLA molecule information. ANNBM and NetMHCpan differ in how they encode the HLA molecule: ANNBM uses the B matrix and represents each amino acid that can interact with the peptide by a single numerical value, whereas NetMHCpan uses the BLOSUM matrix and represents each amino acid by a 20-dimensional vector. The ANNBM encoding therefore requires less storage and computation. The average AUC of ANNBM is also higher than that of NetMHCpan, most notably on A*0202 and B*3501, whose ROC curves are shown in Figures 6 and 7.
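
The margins quoted in this note can be checked directly against the table. The following is a minimal sketch, not part of the original analysis: it copies the AVG row and the A*0202 and B*3501 rows from Table 2, and the helper name `auc_margin` is a hypothetical choice made for illustration.

```python
# AUC values copied from Table 2 (AVG row and the two alleles highlighted in the note).
AUC = {
    "AVG":    {"ANNBM": 0.847, "NetMHC": 0.774, "NetMHCpan": 0.824},
    "A*0202": {"ANNBM": 0.840, "NetMHC": 0.784, "NetMHCpan": 0.815},
    "B*3501": {"ANNBM": 0.805, "NetMHC": 0.737, "NetMHCpan": 0.762},
}


def auc_margin(row: str, method: str, baseline: str) -> float:
    """Return AUC(method) - AUC(baseline) for one table row, rounded to 3 decimals.

    `auc_margin` is a hypothetical helper name, not something defined in the paper.
    """
    scores = AUC[row]
    return round(scores[method] - scores[baseline], 3)


if __name__ == "__main__":
    # Margin of ANNBM over each comparison method, per row.
    for row in AUC:
        for baseline in ("NetMHCpan", "NetMHC"):
            print(row, "ANNBM vs", baseline, auc_margin(row, "ANNBM", baseline))
```

A positive margin means ANNBM has the higher AUC on that row; the same helper can be pointed at any other row once its values are added to the `AUC` dictionary.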