
Table 4. Parsing results on the test set with predicted POS tags and gold tokenization (except [\(\mathcal{G}\)], which denotes results when employing gold POS tags in both training and testing phases).

From: From POS tagging to dependency parsing for biomedical event extraction

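| Corpus | Model | System | With punct. Overall LAS | With punct. Overall UAS | With punct. Exact match LAS | With punct. Exact match UAS | Without punct. Overall LAS | Without punct. Overall UAS | Without punct. Exact match LAS | Without punct. Exact match UAS |
|---|---|---|---|---|---|---|---|---|---|---|
| GENIA | Pre-trained | Stanford-NNdep [∙] | 86.66 | 88.22 | 25.15 | 29.26 | 87.31 | 89.02 | 25.88 | 30.22 |
| GENIA | Pre-trained | Stanford-Biaffine-v1 [∙] | 84.69 | 87.95 | 16.25 | 26.10 | 84.92 | 88.55 | 16.99 | 28.24 |
| GENIA | Pre-trained | Stanford-NNdep | 86.79 | 88.13 | 25.22 | 29.19 | 87.43 | 88.91 | 25.88 | 30.15 |
| GENIA | Pre-trained | Stanford-Biaffine-v1 | 84.72 | 87.89 | 16.47 | 25.81 | 84.94 | 88.45 | 17.06 | 27.79 |
| GENIA | Pre-trained | BLLIP+Bio | 88.38 | 89.92 | 28.82 | 35.96 | 88.76 | 90.49 | 29.93 | 37.43 |
| GENIA | Retrained | Stanford-NNdep | 87.02 | 88.34 | 25.74 | 30.07 | 87.56 | 89.02 | 26.03 | 30.59 |
| GENIA | Retrained | NLP4J-dep | 88.20 | 89.45 | 28.16 | 31.99 | 88.87 | 90.25 | 28.90 | 32.94 |
| GENIA | Retrained | jPTDP-v1 | 90.01 | 91.46 | 29.63 | 35.74 | 90.27 | 91.89 | 30.29 | 37.06 |
| GENIA | Retrained | Stanford-Biaffine-v2 | 91.04 | 92.31 | 33.38 | 39.56 | 91.23 | 92.64 | 34.41 | 41.10 |
| GENIA | Retrained | Stanford-Biaffine-v2 [\(\mathcal{G}\)] | 91.68 | 92.51 | 36.99 | 40.44 | 91.92 | 92.84 | 38.01 | 41.84 |
| CRAFT | Retrained | Stanford-NNdep | 84.76 | 86.64 | 25.31 | 30.40 | 85.59 | 87.81 | 25.48 | 30.96 |
| CRAFT | Retrained | NLP4J-dep | 86.98 | 88.85 | 27.60 | 33.71 | 87.62 | 89.80 | 28.16 | 34.60 |
| CRAFT | Retrained | jPTDP-v1 | 88.27 | 90.08 | 29.68 | 36.06 | 88.66 | 90.79 | 30.24 | 37.12 |
| CRAFT | Retrained | Stanford-Biaffine-v2 | 90.41 | 92.02 | 33.20 | 40.03 | 90.77 | 92.67 | 33.87 | 41.10 |
| CRAFT | Retrained | Stanford-Biaffine-v2 [\(\mathcal{G}\)] | 91.43 | 92.93 | 35.22 | 41.99 | 91.69 | 93.47 | 35.61 | 42.95 |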
“Without punctuation” refers to results that exclude punctuation and other symbols from evaluation. “Exact match” denotes the percentage of sentences whose predicted trees are entirely correct [25]. [∙] denotes the use of the pre-trained Stanford tagger, instead of the retrained NLP4J-POS model, for predicting POS tags on the test set. Score differences between the “retrained” parsers on both corpora are significant at p≤0.001 using McNemar’s test (except for the UAS scores obtained by Stanford-Biaffine-v2 with gold versus predicted POS tags on GENIA, i.e. 92.51 vs. 92.31 and 92.84 vs. 92.64, where p≤0.05).
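
The metrics named in the note can be illustrated with a minimal Python sketch. It is not the evaluation script used for the table. The data layout (gold tokens as `(head, label, is_punct)` tuples, predicted tokens as `(head, label)` pairs) and the function names `score` and `mcnemar_exact` are assumptions made for illustration. The note does not say which variant of McNemar's test was applied or which units were paired, so the exact two-sided binomial form below is only one common choice.

```python
from math import comb


def score(gold_sents, pred_sents, include_punct=True):
    """Return overall and exact-match UAS/LAS as percentages.

    gold_sents: list of sentences, each a list of (head, label, is_punct).
    pred_sents: list of sentences, each a list of (head, label).
    This layout is hypothetical, chosen only for the example.
    """
    tokens = uas_hits = las_hits = 0
    sents = uas_exact = las_exact = 0
    for gold, pred in zip(gold_sents, pred_sents):
        sent_uas_ok = sent_las_ok = True
        for (g_head, g_label, is_punct), (p_head, p_label) in zip(gold, pred):
            if is_punct and not include_punct:
                continue  # "without punctuation": token is not evaluated
            head_ok = g_head == p_head
            label_ok = head_ok and g_label == p_label  # LAS needs head and label
            tokens += 1
            uas_hits += head_ok
            las_hits += label_ok
            sent_uas_ok = sent_uas_ok and head_ok
            sent_las_ok = sent_las_ok and label_ok
        sents += 1
        uas_exact += sent_uas_ok  # every evaluated head in the sentence is right
        las_exact += sent_las_ok  # every evaluated head and label is right

    def pct(hits, total):
        return 100.0 * hits / total if total else 0.0

    return {
        "UAS": pct(uas_hits, tokens),
        "LAS": pct(las_hits, tokens),
        "UAS_exact": pct(uas_exact, sents),
        "LAS_exact": pct(las_exact, sents),
    }


def mcnemar_exact(only_a_correct, only_b_correct):
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0
    k = min(only_a_correct, only_b_correct)
    tail = sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(1.0, 2.0 * tail)


# Toy data: two sentences, one punctuation token with a wrong predicted head.
gold = [
    [(2, "nsubj", False), (0, "root", False), (2, "punct", True)],
    [(0, "root", False), (1, "dobj", False)],
]
pred = [
    [(2, "nsubj"), (0, "root"), (1, "punct")],
    [(0, "root"), (1, "nmod")],
]
with_punct = score(gold, pred, include_punct=True)
without_punct = score(gold, pred, include_punct=False)
```

On this toy data, dropping punctuation removes the only head error, so UAS and its exact-match rate rise while the label error in the second sentence still counts against LAS. This mirrors the direction of the gap between the “With punctuation” and “Without punctuation” columns.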