Table 1 Statistics of entity ambiguity for the Bio-ID corpus

From: Knowledge-enhanced biomedical named entity recognition and normalization: application to proteins and genes

Properties                      Training set    Test set
# Mentions                      4440            1715
# Monosemous                    3031            1265
# Polysemous / Ambiguity Rate   1409 / 2.79     450 / 2.41
  1. The left column lists four attributes: the number of unique protein/gene mention terms (#Mentions), the number of those mentions with only one entity ID attested in the corpus (#Monosemous), the number of those mentions with two or more IDs attested in the corpus (#Polysemous), and the average number of candidate IDs per polysemous mention (Ambiguity Rate)
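
The four attributes defined in the note can be computed from a list of (mention, entity ID) annotations. The following is a minimal sketch, not the authors' code: the function name `ambiguity_stats`, the toy mentions, and the placeholder IDs are all hypothetical, and it assumes mentions are compared as exact strings (the source does not say whether any case or whitespace normalization was applied).

```python
from collections import defaultdict


def ambiguity_stats(annotations):
    """Compute mention-level ambiguity statistics.

    `annotations` is an iterable of (mention_text, entity_id) pairs.
    Assumption: two mentions are the same term only if their strings
    match exactly; the source does not specify any normalization.
    """
    # Collect the set of distinct entity IDs attested for each mention term.
    ids_by_mention = defaultdict(set)
    for mention, entity_id in annotations:
        ids_by_mention[mention].add(entity_id)

    # A mention is polysemous when two or more IDs are attested for it.
    polysemous = [ids for ids in ids_by_mention.values() if len(ids) >= 2]

    n_mentions = len(ids_by_mention)
    n_polysemous = len(polysemous)
    n_monosemous = n_mentions - n_polysemous

    # Ambiguity rate: average number of candidate IDs per polysemous mention.
    if polysemous:
        ambiguity_rate = sum(len(ids) for ids in polysemous) / n_polysemous
    else:
        ambiguity_rate = 0.0

    return {
        "mentions": n_mentions,
        "monosemous": n_monosemous,
        "polysemous": n_polysemous,
        "ambiguity_rate": ambiguity_rate,
    }


# Toy annotations with placeholder IDs (hypothetical, not from the corpus).
toy = [
    ("p53", "ID_A"), ("p53", "ID_B"),
    ("actin", "ID_C"), ("actin", "ID_C"),
    ("Rab5", "ID_D"), ("Rab5", "ID_E"), ("Rab5", "ID_F"),
]
stats = ambiguity_stats(toy)
print(stats)
```

By these definitions the monosemous and polysemous counts partition the unique mentions, which agrees with the table: 3031 + 1409 = 4440 for the training set and 1265 + 450 = 1715 for the test set.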