Table 25 Macro-averaged results for dependency parsers on the CRAFT folds and dev set, compared to results of the untrained (WSJ-trained) models on the dev set; labeled attachment score (LAS), unlabeled attachment score (UAS), labeled accuracy score (LS)

From: A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools

| Parser      | Metric | Fold 0 | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Training Average | Dev Set | Dev – WSJ model |
|-------------|--------|--------|--------|--------|--------|--------|------------------|---------|-----------------|
| MaltParser  | LAS    | 88.45  | 88.70  | 89.62  | 89.12  | 88.85  | 88.97            | 88.93   | 72.40           |
|             | UAS    | 90.33  | 90.63  | 91.50  | 90.94  | 90.51  | 90.80            | 90.72   | 75.90           |
|             | LS     | 93.43  | 93.78  | 94.23  | 94.16  | 93.93  | 93.92            | 94.03   | 82.73           |
| MSTParser   | LAS    | 88.30  | 88.85  | 89.58  | 89.12  | 88.90  | 88.98            | 89.36   | 75.99           |
|             | UAS    | 90.37  | 90.83  | 91.50  | 91.04  | 90.82  | 90.93            | 91.31   | 79.42           |
|             | LS     | 93.32  | 94.06  | 94.37  | 94.25  | 93.98  | 94.03            | 94.52   | 85.73           |
| ClearParser | LAS    | 89.09  | 89.43  | 90.33  | 89.86  | 89.59  | 89.68            | 90.09   | 74.56           |
|             | UAS    | 90.66  | 91.09  | 91.81  | 91.42  | 91.08  | 91.23            | 91.63   | 77.78           |
|             | LS     | 93.89  | 94.37  | 94.88  | 94.65  | 94.57  | 94.50            | 94.99   | 85.17           |
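For readers unfamiliar with the three metrics, the following is a minimal sketch of how they are conventionally computed for a single dependency-parsed sentence. It assumes gold and predicted parses are given as (head, label) pairs per token; the function and variable names are illustrative only and do not reflect the evaluation code actually used in the paper (which would typically be a standard CoNLL-style scorer applied over the whole corpus).

```python
from typing import List, Tuple

# One arc per token: (index of head token, dependency label).
Arc = Tuple[int, str]

def attachment_scores(gold: List[Arc], pred: List[Arc]) -> dict:
    """Compute UAS, LAS, and LS (as percentages) for one sentence.

    UAS: fraction of tokens whose predicted head is correct.
    LAS: fraction of tokens whose head AND label are both correct.
    LS : fraction of tokens whose label is correct (head ignored).
    """
    assert len(gold) == len(pred), "parses must cover the same tokens"
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    ls = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n
    return {"UAS": 100 * uas, "LAS": 100 * las, "LS": 100 * ls}

# Example: "Dogs bark loudly", heads indexed from 1 (0 = artificial root).
gold = [(2, "nsubj"), (0, "root"), (2, "advmod")]
pred = [(2, "nsubj"), (0, "root"), (1, "advmod")]  # wrong head, right label
print(attachment_scores(gold, pred))
# {'UAS': 66.66..., 'LAS': 66.66..., 'LS': 100.0}
```

Note that LAS can never exceed UAS, and LS can exceed both, which matches the ordering of the three rows for every parser in the table above. The "Training Average" column is the macro-average of the five fold scores, i.e. the unweighted mean of the per-fold values.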