Fig. 1 | BMC Bioinformatics

Fig. 1

From: Using protein language models for protein interaction hot spot prediction with limited data

Model training and validation using features derived from the representations learned by the ESM-2 protein language model. The target residue, together with its neighboring amino acid residues in the sequence, is passed to the ESM-2 encoder, which produces an N-dimensional (N = 1280) embedding vector for each residue in the sequence. The 1280 elements of each vector were supplied as input features for training a model on a 20% split of the dataset. Either all N features or a reduced set of k (k < N) features, selected randomly or according to AutoGluon's feature importance test, were used for training a model on a 70% split of the dataset.
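The feature-selection step in the caption can be illustrated with a minimal sketch. It uses only the Python standard library and does not call ESM-2 or AutoGluon. The function names `select_features` and `reduce_embedding` are hypothetical, and the importance scores are assumed to be a plain list of 1280 numbers obtained elsewhere (for example from a feature importance test). This is not the authors' implementation.

```python
import random

N_FEATURES = 1280  # embedding size per residue, as stated in the caption


def select_features(k, importances=None, n_features=N_FEATURES, seed=0):
    """Return the sorted indices of the k embedding dimensions to keep.

    If `importances` is None, k indices are drawn at random (seeded for
    reproducibility). Otherwise the k indices with the highest importance
    scores are kept. `importances` is an assumed input: one score per feature.
    """
    if not 0 < k <= n_features:
        raise ValueError("k must satisfy 0 < k <= n_features")
    if k == n_features:
        # "All features" case: nothing to select.
        return list(range(n_features))
    if importances is None:
        rng = random.Random(seed)
        return sorted(rng.sample(range(n_features), k))
    if len(importances) != n_features:
        raise ValueError("need exactly one importance score per feature")
    # Rank feature indices from most to least important, then keep the top k.
    ranked = sorted(range(n_features), key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:k])


def reduce_embedding(embedding, indices):
    """Keep only the selected dimensions of one residue's embedding vector."""
    return [embedding[i] for i in indices]
```

For instance, `select_features(50)` gives a random 50-feature subset, while `select_features(50, importances=scores)` keeps the 50 highest-scoring dimensions. Either index list can be passed to `reduce_embedding` before training.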
