Fig. 3

Confidence scores predicted by each model within the PPI-BioBERT-x10 ensemble for each sample in the test and validation sets. Left: standard deviation of the confidence scores for correct predictions. Right: standard deviation of the confidence scores for incorrect predictions. Incorrect predictions do not show low variation in the confidence score, except for 3 test samples in phosphorylation, which on manual verification are in fact correct.
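
The per-sample quantity plotted here can be illustrated with a minimal sketch. It rests on assumptions not stated in the caption: that the ensemble label is the argmax of the mean class probabilities across members, and that the spread is the population standard deviation of each member's probability for that label. The function name `ensemble_confidence_spread` and the toy probabilities are hypothetical, and three members are used instead of ten for brevity.

```python
from statistics import mean, pstdev


def ensemble_confidence_spread(member_probs, gold_label):
    """Summarise one sample across ensemble members.

    member_probs: one probability list per ensemble member, e.g. ten lists
    for a ten-member ensemble. Assumption: the ensemble label is the argmax
    of the mean probabilities, and the spread is the population standard
    deviation of the members' confidence in that label.
    """
    n_classes = len(member_probs[0])
    mean_probs = [mean(p[c] for p in member_probs) for c in range(n_classes)]
    predicted = max(range(n_classes), key=lambda c: mean_probs[c])
    member_scores = [p[predicted] for p in member_probs]
    return {
        "predicted": predicted,
        "confidence": mean_probs[predicted],
        "std": pstdev(member_scores),
        "correct": predicted == gold_label,
    }


# Toy data: (per-member class probabilities, gold label). Values are made up.
samples = [
    ([[0.90, 0.10], [0.80, 0.20], [0.85, 0.15]], 0),  # members agree
    ([[0.90, 0.10], [0.30, 0.70], [0.60, 0.40]], 1),  # members disagree
]

results = [ensemble_confidence_spread(probs, gold) for probs, gold in samples]

# Split the spreads into the two panels of the figure.
correct_stds = [r["std"] for r in results if r["correct"]]
incorrect_stds = [r["std"] for r in results if not r["correct"]]
```

In this toy case the sample on which the members agree is correct and has a small spread, while the sample on which they disagree is incorrect and has a large one, which is the pattern the caption describes.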