
Table 17. Tokenization results on the CRAFT public release data (70% set)

From: A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools

Tokenizer           Precision   Recall   F-measure
UCompare OpenNLP    0.95        0.86     0.90
UIMA-native         0.96        0.93     0.95
PennBio             0.92        0.91     0.91
Offset Tokenizer    0.97        0.80     0.88
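As a quick consistency check, the F-measure column can be recomputed from the precision and recall columns. The sketch below assumes that "F-measure" means the balanced F1 score (the harmonic mean of precision and recall, beta = 1). The table does not state this explicitly, so treat it as an assumption. The function name `f_measure` and the `TABLE_17` dictionary are illustrative only.

```python
# Recompute the F-measure column of Table 17 from the printed precision and recall.
# Assumption: "F-measure" is the balanced F1 score (harmonic mean, beta = 1).

def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1 gives the harmonic mean of P and R."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall, reported F-measure), copied from Table 17
TABLE_17 = {
    "UCompare OpenNLP": (0.95, 0.86, 0.90),
    "UIMA-native": (0.96, 0.93, 0.95),
    "PennBio": (0.92, 0.91, 0.91),
    "Offset Tokenizer": (0.97, 0.80, 0.88),
}

if __name__ == "__main__":
    for name, (p, r, reported) in TABLE_17.items():
        f = f_measure(p, r)
        print(f"{name:<18} recomputed F = {f:.3f}  reported F = {reported:.2f}")
```

Because the inputs are already rounded to two decimals, recomputed values can differ from the printed F-measure by up to about 0.01. For example, UIMA-native recomputes to roughly 0.945 against a reported 0.95, which is consistent with the published figure having been computed from unrounded counts.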