Fig. 6 | BMC Bioinformatics

Fig. 6

From: Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT

PTM-wise cosine similarity of test and large-scale abstracts with the train set. The blue region shows the similarity of the test set to the train set; the orange region shows the similarity of abstracts from the large-scale predictions to the train set. The count-vector representations of abstracts from high-quality large-scale predictions (top) are more similar to the test and train abstracts than those from low-quality predictions (bottom). Note that acetylation and ubiquitination have very few test samples.
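
As a rough illustration of the quantity plotted here, the sketch below computes cosine similarity between count-vector representations of abstracts. It is not the authors' code: the whitespace tokenizer, the function names (`count_vector`, `cosine_similarity`, `similarities_to_train`), and the toy abstracts are all assumptions made for the example, and how the per-abstract similarities are aggregated into the plotted distributions is left open.

```python
import math
from collections import Counter


def count_vector(text):
    """Build a sparse count vector from lowercased whitespace tokens.

    The tokenization is a simplifying assumption, not taken from the paper.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty abstract has no direction to compare
    return dot / (norm_a * norm_b)


def similarities_to_train(abstract, train_abstracts):
    """Similarity of one abstract to every abstract in the train set."""
    vec = count_vector(abstract)
    return [cosine_similarity(vec, count_vector(t)) for t in train_abstracts]


# Hypothetical toy abstracts, for illustration only.
train = [
    "kinase phosphorylates substrate",
    "ligase ubiquitinates target protein",
]
scores = similarities_to_train("kinase phosphorylates protein", train)
```

Grouping such scores by PTM type, separately for the test set and for the large-scale predictions, would give distributions of the kind compared in the figure.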
