
Table 2 Prediction Results on Unknown Alleles Dataset

From: Integrating peptides' sequence and energy of contact residues information improves prediction of peptide and HLA-I binding with unknown alleles

Allele Supertype ANNBM NetMHC NetMHCpan Peptides
A*0101 A1 0.854 0.672 0.873 1157
A*0201 A2 0.905 0.886 0.912 3089
A*0202 A2 0.840 0.784 0.815 1447
A*0203 A2 0.836 0.818 0.832 1443
A*0206 A2 0.883 0.826 0.847 1436
A*0301 A3 0.867 0.820 0.849 2094
A*1101 A3 0.879 0.851 0.866 1985
A*2301 A24 0.917 0.877 0.863 104
A*2402 A24 0.864 0.848 0.821 197
A*2403 A24 0.923 0.894 0.912 254
A*2601 A1 0.771 0.631 0.733 672
A*2902 A3 0.832 0.603 0.749 160
A*3001 A3 0.863 0.846 0.838 669
A*3002 A1 0.671 0.711 0.721 92
A*3101 A3 0.853 0.822 0.878 1869
A*3301 A3 0.838 0.699 0.763 1140
A*6801 A3 0.768 0.744 0.760 1141
A*6802 A2 0.812 0.664 0.669 1434
A*6901 A2 0.902 0.811 0.823 833
B*0702 B7 0.919 0.864 0.902 1262
B*1501 B62 0.687 0.536 0.750 978
B*1801 B62 0.823 0.775 0.729 969
B*3501 B7 0.805 0.737 0.762 736
B*4001 B44 0.852 0.818 0.870 1078
B*4002 B44 0.883 0.802 0.807 118
B*4402 B44 0.824 0.771 0.839 119
B*4403 B44 0.836 0.800 0.842 119
B*4501 B44 0.822 0.804 0.809 114
B*5101 B7 0.887 0.879 0.905 244
B*5301 B7 0.828 0.819 0.838 254
B*5401 B7 0.880 0.847 0.845 255
B*5701 B58 0.945 0.652 0.919 59
B*5801 B58 0.869 0.625 0.841 988
AVG   0.847 0.774 0.824  
  1. Table 2 shows that ANNBM achieves a higher average AUC than NetMHCpan and NetMHC, by 0.023 and 0.073 respectively. NetMHC's encoding does not take HLA molecule information into account. Although its training data come from the same supertype and it achieves excellent results on the allele-specific benchmark dataset, differences between HLA molecules within a supertype are not reflected, so it is unsurprising that NetMHC's prediction accuracy falls below that of ANNBM and NetMHCpan, both of which encode HLA molecule information. ANNBM and NetMHCpan encode the HLA molecule differently: ANNBM uses the B matrix and represents each amino acid that can interact with the peptide by a single numerical value, whereas NetMHCpan uses the BLOSUM matrix and represents each amino acid by a 20-dimensional vector. ANNBM's encoding therefore requires less storage and computation. The average AUC of ANNBM is also greater than that of NetMHCpan, with notable gains on A*0202 and B*3501, whose ROC curves are shown in Figures 6 and 7.
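The averages and differences quoted in the note can be checked directly against the table rows. The following is a minimal sketch, not the authors' code: it assumes the AVG row is an unweighted mean over the 33 alleles (i.e. not weighted by peptide count), and the names `ROWS`, `METHODS`, and `mean_auc` are illustrative only.

```python
# AUC values transcribed from Table 2: (allele, ANNBM, NetMHC, NetMHCpan).
ROWS = [
    ("A*0101", 0.854, 0.672, 0.873), ("A*0201", 0.905, 0.886, 0.912),
    ("A*0202", 0.840, 0.784, 0.815), ("A*0203", 0.836, 0.818, 0.832),
    ("A*0206", 0.883, 0.826, 0.847), ("A*0301", 0.867, 0.820, 0.849),
    ("A*1101", 0.879, 0.851, 0.866), ("A*2301", 0.917, 0.877, 0.863),
    ("A*2402", 0.864, 0.848, 0.821), ("A*2403", 0.923, 0.894, 0.912),
    ("A*2601", 0.771, 0.631, 0.733), ("A*2902", 0.832, 0.603, 0.749),
    ("A*3001", 0.863, 0.846, 0.838), ("A*3002", 0.671, 0.711, 0.721),
    ("A*3101", 0.853, 0.822, 0.878), ("A*3301", 0.838, 0.699, 0.763),
    ("A*6801", 0.768, 0.744, 0.760), ("A*6802", 0.812, 0.664, 0.669),
    ("A*6901", 0.902, 0.811, 0.823), ("B*0702", 0.919, 0.864, 0.902),
    ("B*1501", 0.687, 0.536, 0.750), ("B*1801", 0.823, 0.775, 0.729),
    ("B*3501", 0.805, 0.737, 0.762), ("B*4001", 0.852, 0.818, 0.870),
    ("B*4002", 0.883, 0.802, 0.807), ("B*4402", 0.824, 0.771, 0.839),
    ("B*4403", 0.836, 0.800, 0.842), ("B*4501", 0.822, 0.804, 0.809),
    ("B*5101", 0.887, 0.879, 0.905), ("B*5301", 0.828, 0.819, 0.838),
    ("B*5401", 0.880, 0.847, 0.845), ("B*5701", 0.945, 0.652, 0.919),
    ("B*5801", 0.869, 0.625, 0.841),
]

METHODS = ("ANNBM", "NetMHC", "NetMHCpan")


def mean_auc(rows):
    """Return the unweighted mean AUC per method across all alleles."""
    n = len(rows)
    # Column 0 is the allele name, so method i lives in column i + 1.
    return {m: sum(r[i + 1] for r in rows) / n for i, m in enumerate(METHODS)}


means = mean_auc(ROWS)
for m in METHODS:
    print(f"{m}: {means[m]:.3f}")
# ANNBM: 0.847
# NetMHC: 0.774
# NetMHCpan: 0.824

print(f"ANNBM - NetMHCpan: {means['ANNBM'] - means['NetMHCpan']:.3f}")  # 0.023
print(f"ANNBM - NetMHC:    {means['ANNBM'] - means['NetMHC']:.3f}")     # 0.073
```

Under that assumption the computed means reproduce the AVG row, and the two differences match the 0.023 and 0.073 stated in the note.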