A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools

Table 18 Part-of-speech tagging results on the CRAFT public release data (70% set)

POS Tagger	Precision	Recall	F-measure
LingPipe (Brown model)	0.59 (0.90)	0.58 (0.84)	0.59 (0.87)
LingPipe (MedPost model)	0.47 (0.88)	0.46 (0.83)	0.46 (0.85)
LingPipe (Genia model)	0.79 (0.88)	0.76 (0.85)	0.77 (0.87)
OpenNLP	0.82 (0.86)	0.74 (0.77)	0.78 (0.81)

Numbers in parentheses give each tool's upper-bound performance, calculated by removing occurrences of tags that did not align to the gold-standard tagset.
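The upper-bound figures can be illustrated with a small scoring sketch. The code below is an assumption-laden illustration, not the evaluation code used for Table 18. It assumes that tokens are matched on character offsets, that precision is correct tags over tagger tokens and recall is correct tags over gold tokens (so the two can differ when tokenization differs), and that "removing occurrences of tags that did not align" means dropping every token position where the tagger emitted a tag outside the gold tagset, from both the tagger output and the gold standard. The names `Annotation`, `prf`, `upper_bound_prf`, and `PTB_SUBSET` are hypothetical.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Annotation(NamedTuple):
    """One tagged token (hypothetical representation)."""
    start: int  # character offset where the token begins
    end: int    # character offset where the token ends
    tag: str    # part-of-speech label


def prf(predicted: Iterable[Annotation],
        gold: Iterable[Annotation]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure; a tag counts as correct only if
    span and label both match the gold annotation exactly."""
    pred, ref = set(predicted), set(gold)
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def upper_bound_prf(predicted: Iterable[Annotation],
                    gold: Iterable[Annotation],
                    gold_tagset: Set[str]) -> Tuple[float, float, float]:
    """Assumed upper-bound scoring: drop token positions whose predicted
    tag is not in the gold tagset, on both sides, then rescore."""
    predicted, gold = list(predicted), list(gold)
    unaligned = {(a.start, a.end) for a in predicted if a.tag not in gold_tagset}
    kept_pred = [a for a in predicted if (a.start, a.end) not in unaligned]
    kept_gold = [a for a in gold if (a.start, a.end) not in unaligned]
    return prf(kept_pred, kept_gold)


# Toy example for "The protein binds DNA" (illustrative data, not CRAFT).
PTB_SUBSET = {"DT", "NN", "NNP", "VBZ", "JJ"}  # hypothetical gold tagset slice
gold = [Annotation(0, 3, "DT"), Annotation(4, 11, "NN"),
        Annotation(12, 17, "VBZ"), Annotation(18, 21, "NN")]
# "AT" is the Brown-corpus article tag, which has no match in the gold tagset.
predicted = [Annotation(0, 3, "AT"), Annotation(4, 11, "NN"),
             Annotation(12, 17, "VBZ"), Annotation(18, 21, "NN")]

print(prf(predicted, gold))                          # (0.75, 0.75, 0.75)
print(upper_bound_prf(predicted, gold, PTB_SUBSET))  # (1.0, 1.0, 1.0)
```

In this toy case the only error is a tagset mismatch, so removing it lifts every score to 1.0. On real data the gap between the two figures reflects how many errors stem from tagset incompatibility (as with the Brown and MedPost models) rather than from genuine mis-tagging.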

ISSN: 1471-2105