Table 3 POS tagging accuracies on the test set with gold tokenization

From: From POS tagging to dependency parsing for biomedical event extraction

Model                      GENIA (%)   CRAFT (%)
MarMoT                     98.61       97.07
jPTDP-v1                   98.66       97.24
NLP4J-POS                  98.80       97.43
BiLSTM-CRF                 98.44       97.25
BiLSTM-CRF + CNN-char      98.89       97.51
BiLSTM-CRF + LSTM-char     98.85       97.56
Stanford tagger*           98.37       n/a
GENIA tagger*              98.49       n/a

Rows marked * are results from pre-trained POS taggers. We do not report accuracies of the pre-trained POS taggers on CRAFT because CRAFT uses an extended PTB POS tag set (i.e. some POS tags in CRAFT are not defined in the original PTB tag set). A corpus-level accuracy difference between two POS tagging models is significant at p ≤ 0.05 when it is at least 0.17 percentage points on GENIA or 0.26 percentage points on CRAFT. To test significance, we compute sentence-level accuracies and then apply a paired t-test.
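The accuracy and significance procedure described in the note can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: each corpus is assumed to be a list of sentences holding one POS tag string per gold token; the helper names (`sentence_accuracies`, `corpus_accuracy`, `paired_t_test`, `exceeds_threshold`) are hypothetical; the p-value uses a large-sample normal approximation to the t distribution (`scipy.stats.ttest_rel` computes the exact t-based value); and the toy tag sequences are invented purely for illustration.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple

# Assumption: a sentence is a list of POS tags; a corpus is a list of sentences.
Corpus = List[List[str]]


def sentence_accuracies(gold: Corpus, pred: Corpus) -> List[float]:
    """Fraction of correctly tagged tokens in each gold-tokenized sentence."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora differ in sentence count")
    accs = []
    for g, p in zip(gold, pred):
        if not g or len(g) != len(p):
            raise ValueError("sentences must be non-empty and share gold tokenization")
        accs.append(sum(a == b for a, b in zip(g, p)) / len(g))
    return accs


def corpus_accuracy(gold: Corpus, pred: Corpus) -> float:
    """Token-level accuracy pooled over the whole corpus (the kind of figure in the table)."""
    correct = sum(a == b for g, p in zip(gold, pred) for a, b in zip(g, p))
    return correct / sum(len(g) for g in gold)


def paired_t_test(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Paired t-test on per-sentence accuracies; returns (t, two-sided p).

    Assumption: the p-value uses a normal approximation, adequate only for
    large test sets with many sentences.
    """
    diffs = [x - y for x, y in zip(xs, ys)]
    if len(diffs) < 2:
        raise ValueError("need at least two paired sentences")
    m, sd = mean(diffs), stdev(diffs)
    if sd == 0.0:
        # Identical differences everywhere: t is 0 (no gap) or unbounded.
        return (0.0, 1.0) if m == 0.0 else (math.copysign(math.inf, m), 0.0)
    t = m / (sd / math.sqrt(len(diffs)))
    p = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, p


# Thresholds quoted in the table note, in percentage points of corpus-level accuracy.
MIN_SIGNIFICANT_DIFF = {"GENIA": 0.17, "CRAFT": 0.26}


def exceeds_threshold(acc_a: float, acc_b: float, corpus: str) -> bool:
    """Quick reading aid: does a gap between two table entries reach the quoted threshold?"""
    return round(abs(acc_a - acc_b), 2) >= MIN_SIGNIFICANT_DIFF[corpus]


# Invented toy data, for illustration only.
gold = [["DT", "NN", "VBZ"], ["NN", "IN", "NN"], ["PRP", "VBD"]]
tagger_a = [["DT", "NN", "VBZ"], ["NN", "IN", "NN"], ["PRP", "VBD"]]
tagger_b = [["DT", "NN", "NNS"], ["NN", "IN", "NN"], ["PRP", "VBN"]]

print(corpus_accuracy(gold, tagger_b))  # 0.75
t, p = paired_t_test(sentence_accuracies(gold, tagger_a),
                     sentence_accuracies(gold, tagger_b))
print(exceeds_threshold(98.89, 98.80, "GENIA"))  # False
print(exceeds_threshold(98.89, 98.44, "GENIA"))  # True
```

Applied to the table, the 0.09-point GENIA gap between BiLSTM-CRF + CNN-char (98.89) and NLP4J-POS (98.80) falls below the 0.17 threshold, while its 0.45-point gap to plain BiLSTM-CRF (98.44) exceeds it. Rounding to two decimals in `exceeds_threshold` avoids spurious floating-point results at the boundary.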