USMPep: universal sequence models for major histocompatibility complex binding affinity prediction

Vielhaben, Johanna; Wenzel, Markus; Samek, Wojciech; Strodthoff, Nils

doi:10.1186/s12859-020-03631-1

BMC Bioinformatics

Table 1 Comparison of MHC I prediction tools

Architecture
SMMPMBEC [7]	One-hot encoding, linear model (scoring matrix)
consensus [8]	Linear model (scoring matrix), median rank as prediction
NetMHC4 [9]	Input: fixed-length 9mer with blocks substitution matrix (BLOSUM) encoding plus additional features; multilayer perceptron with one hidden layer
NetMHCpan4 [10]	Input: fixed-length 9mer BLOSUM encoding for the peptide, pseudo-sequence for the MHC molecule, plus additional features; multilayer perceptron with one hidden layer
MHCFlurry [11]	Input: fixed-length 15mer BLOSUM62 encoding, missing residues filled with a wildcard amino acid (AA); feedforward neural network (NN) with 0 to 2 locally connected hidden layers and one fully connected hidden layer
USMPep (this work)	Learned embedding layer; AWD LSTM with one hidden layer
Training procedure
SMMPMBEC	Ridge regression with modified regularization, peptide MHC binding energy covariance (PMBEC) similarity matrix as Bayesian prior
consensus	Four scoring matrices from existing algorithms
NetMHC4	Training on non-9mer peptides by insertion of a wildcard AA or deletion at all possible positions; training set augmented with natural peptides of each length, assumed to be negatives
NetMHCpan4	Same insertion/deletion procedure as NetMHC4; training set augmented with random artificial negatives
MHCFlurry	Pretraining on a BLOSUM62-similar allele for alleles with little training data; training set augmented with artificial negative peptides
USMPep	Optional: language model pretraining on unlabeled sequences
Model selection
SMMPMBEC	Single model
consensus	Single model
NetMHC4	Ensemble of 4 NNs
NetMHCpan4	Ensemble of 100 NNs
MHCFlurry	Ensemble of 8 to 16 NNs selected from 320 models on a validation set
USMPep	Optional: ensemble of 10 NNs with identical architectures and hyperparameters
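
The insertion/deletion procedure listed for NetMHC4 and NetMHCpan4 in the table above can be illustrated with a minimal Python sketch. It is not the tools' actual code: the function name `to_core_candidates`, the wildcard symbol `X`, and the choice of a single contiguous insertion or deletion block are assumptions made for illustration, and the real tools may restrict the allowed positions.

```python
from typing import List

WILDCARD = "X"   # assumed wildcard amino acid symbol
CORE_LEN = 9     # fixed input length used by NetMHC4 / NetMHCpan4 per the table


def to_core_candidates(peptide: str, core_len: int = CORE_LEN) -> List[str]:
    """Return fixed-length variants of `peptide`.

    Hypothetical helper: longer peptides lose one contiguous block of
    residues at every possible position; shorter peptides gain one
    contiguous block of wildcards at every possible position.
    """
    n = len(peptide)
    if n == core_len:
        return [peptide]
    if n > core_len:
        gap = n - core_len
        # Delete `gap` consecutive residues starting at each position.
        variants = [peptide[:i] + peptide[i + gap:] for i in range(n - gap + 1)]
    else:
        gap = core_len - n
        # Insert `gap` wildcards at each position, including both ends.
        variants = [peptide[:i] + WILDCARD * gap + peptide[i:] for i in range(n + 1)]
    # Drop duplicates while preserving order.
    return list(dict.fromkeys(variants))


if __name__ == "__main__":
    for candidate in to_core_candidates("ACDEFGHIKL"):  # a 10mer
        print(candidate)
```

Every returned string has the fixed length, so each candidate can be passed to a fixed-length encoder. USMPep, by contrast, uses a recurrent architecture with a learned embedding layer (see the Architecture rows) and so does not need this step.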

ISSN: 1471-2105
