Fig. 5 | BMC Bioinformatics

From: Protein language models can capture protein quaternary state

Cosine similarity provides an estimate of the reliability of quaternary state (qs) predictions based on annotation transfer from homologous proteins. A. Cosine similarities when the information used includes the qs of homologous sequences. For each qs (x-axis), a box plot shows the distribution of cosine similarities (y-axis) for correct (green) and incorrect (purple) qs predictions. The plot is capped at 0.85, and 3 points were removed for clarity. The two distributions differ significantly (one-sided Wilcoxon test, p < 0.05 for every qs except qs = 7, where p = 0.051; see Additional file 2: Table S1). This separation shows that the embeddings capture qs, and the cosine similarity can be used to assess the confidence of an annotation transfer. B. The corresponding plot when the qs of homologous sequences is NOT included (i.e., no entry with > 30% sequence identity is considered for annotation transfer). In this case, the two distributions do not differ significantly.
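As an illustration of how a similarity score can double as a confidence estimate, here is a minimal sketch of embedding-based annotation transfer. It assumes a simple nearest-neighbour rule: the query inherits the qs of the reference embedding with the highest cosine similarity. It also assumes that pairwise sequence identities to the references are already known, which allows homologs above a threshold to be excluded as in panel B. The function names, the toy 3-dimensional vectors, and the identity values are hypothetical and are not taken from the paper. Real protein language model embeddings are much larger, and the authors' exact transfer procedure may differ.

```python
import numpy as np


def cosine_similarity(query: np.ndarray, refs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector (d,) and each row of refs (n, d)."""
    q = query / np.linalg.norm(query)
    r = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    return r @ q


def transfer_qs(query, refs, ref_qs, identity=None, max_identity=None):
    """Hypothetical nearest-neighbour transfer: copy the qs of the most similar reference.

    Returns (predicted_qs, similarity), so the similarity can serve as a confidence score.
    If `identity` (sequence identity to each reference, in %) and `max_identity` are
    given, references above the threshold are excluded (the panel B setting).
    """
    sims = cosine_similarity(np.asarray(query, dtype=float), np.asarray(refs, dtype=float))
    if identity is not None and max_identity is not None:
        # Mask out close homologs so they can never be selected.
        sims = np.where(np.asarray(identity) > max_identity, -np.inf, sims)
    best = int(np.argmax(sims))
    if not np.isfinite(sims[best]):
        raise ValueError("no reference passes the identity filter")
    return ref_qs[best], float(sims[best])


# Toy data (hypothetical): three reference embeddings with known qs values.
refs = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.7, 0.7, 0.0]])
ref_qs = [1, 2, 4]
query = np.array([0.9, 0.1, 0.0])
identity = [95.0, 20.0, 25.0]  # % sequence identity of the query to each reference

with_homologs = transfer_qs(query, refs, ref_qs)
without_homologs = transfer_qs(query, refs, ref_qs, identity, max_identity=30.0)
print(with_homologs)     # qs 1, similarity about 0.994
print(without_homologs)  # qs 4, similarity about 0.781
```

In this toy case, excluding the 95%-identity homolog forces the transfer onto a less similar reference, and the reported similarity drops accordingly. Comparing the similarities of correct and incorrect transfers, as the figure does, would then require a two-sample test. If the test in question is the one-sided rank-sum form of the Wilcoxon test, `scipy.stats.mannwhitneyu(correct, incorrect, alternative="greater")` computes it. This is an assumption, because the caption does not specify which variant of the test was used.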
