
Table 2 Upper bound protein name recognition performance after ideal lexicon enrichment

From: How to make the most of NE dictionaries in statistical NER

| Method | | R | P | F |
|---|---|---|---|---|
| Tagging (+test set protein names) | Full | 79.02 | 61.87 | 69.40 |
| | Left | 82.28 | 64.42 | 72.26 |
| | Right | 80.96 | 63.38 | 71.10 |
| Labelling (+test set protein names) | Full | 86.13 | 72.49 | 78.72 |
| | Left | 89.58 | 75.40 | 81.88 |
| | Right | 90.23 | 75.95 | 82.47 |

  1. Upper-bound performance on the JNLPBA-2004 test set, obtained by enriching the lexicon with protein names that appear in the test set. NB: only the lexicon was modified; the tagging and sequence labelling models were not retrained using the test set. The first block shows the performance of POS/PROTEIN tagging after the test-set protein names were added to the dictionary. Because many protein names overlap with general English words, some protein names in sentences are still not recognized as protein names. The second block shows the performance of sequence labelling based on the tagging output.
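
As a quick consistency check on the table, the sketch below recomputes F from the reported R and P values. It assumes that F is the balanced F-score (the harmonic mean of recall and precision), which the table itself does not state. The names `f_score`, `ROWS`, and `max_deviation` are illustrative and not taken from the paper.

```python
def f_score(recall: float, precision: float) -> float:
    """Balanced F-score (assumed definition): harmonic mean of R and P, in percent."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (method, second table column, R, P, F) exactly as reported in Table 2
ROWS = [
    ("Tagging", "Full", 79.02, 61.87, 69.40),
    ("Tagging", "Left", 82.28, 64.42, 72.26),
    ("Tagging", "Right", 80.96, 63.38, 71.10),
    ("Labelling", "Full", 86.13, 72.49, 78.72),
    ("Labelling", "Left", 89.58, 75.40, 81.88),
    ("Labelling", "Right", 90.23, 75.95, 82.47),
]


def max_deviation(rows=ROWS) -> float:
    """Largest absolute gap between the reported F and the recomputed F."""
    return max(abs(f_score(r, p) - f) for _, _, r, p, f in rows)


if __name__ == "__main__":
    for method, variant, r, p, f in ROWS:
        print(f"{method:<10} {variant:<6} reported F={f:.2f} recomputed F={f_score(r, p):.2f}")
    print(f"max deviation: {max_deviation():.4f}")
```

Under this assumption, every recomputed value agrees with the reported F to within 0.01. The small residual differences are to be expected, because R and P are themselves rounded to two decimal places.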