Table 3 POS tagging accuracies on the test set with gold tokenization

From: From POS tagging to dependency parsing for biomedical event extraction

| Model | GENIA (%) | CRAFT (%) |
|---|---|---|
| MarMoT | 98.61 | 97.07 |
| jPTDP-v1 | 98.66 | 97.24 |
| NLP4J-POS | 98.80 | 97.43 |
| BiLSTM-CRF | 98.44 | 97.25 |
| + CNN-char | 98.89 | 97.51 |
| + LSTM-char | 98.85 | 97.56 |
| Stanford tagger [⋆] | 98.37 | - |
| GENIA tagger [⋆] | 98.49 | - |

  1. [⋆] denotes a result obtained with a pre-trained POS tagger. We do not report accuracies of the pre-trained POS taggers on CRAFT because CRAFT uses an extended PTB POS tag set (i.e., it contains POS tags that are not defined in the original PTB POS tag set). Corpus-level accuracy differences of at least 0.17% on GENIA and 0.26% on CRAFT between two POS tagging models are significant at p ≤ 0.05. To measure significance, we compute sentence-level accuracies and then apply a paired t-test.
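
The significance procedure described in the footnote (sentence-level accuracies compared with a paired t-test) can be illustrated with a minimal sketch. This is not the authors' code: the function names `sentence_accuracies` and `paired_t_test` and the toy data are hypothetical, and the p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable only when the test set contains many sentences.

```python
import math
from statistics import NormalDist, mean, stdev


def sentence_accuracies(gold, pred):
    """Per-sentence tagging accuracy: fraction of tokens whose tag is correct.

    `gold` and `pred` are lists of sentences; each sentence is a list of tags.
    """
    return [
        sum(g == p for g, p in zip(gold_tags, pred_tags)) / len(gold_tags)
        for gold_tags, pred_tags in zip(gold, pred)
    ]


def paired_t_test(acc_a, acc_b):
    """Paired t-test on two aligned lists of sentence-level accuracies.

    Returns (t, p). The two-sided p-value is a normal approximation
    (an assumption of this sketch), adequate for large sample sizes.
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Toy example with made-up tags (not data from the paper).
gold = [["NN", "VB"], ["DT", "NN"], ["JJ", "NN"], ["NN", "IN"]]
model_a = [["NN", "VB"], ["DT", "NN"], ["NN", "NN"], ["NN", "IN"]]
model_b = [["NN", "NN"], ["DT", "NN"], ["NN", "NN"], ["NN", "NN"]]

t_stat, p_value = paired_t_test(
    sentence_accuracies(gold, model_a),
    sentence_accuracies(gold, model_b),
)
print(f"t = {t_stat:.3f}, approximate p = {p_value:.3f}")
```

For small samples, an exact p-value from the t distribution with n − 1 degrees of freedom would be more appropriate than the normal approximation used here.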