A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools

Table 20. Results of constituent parsers using their distributed non-biomedical models on the CRAFT development set: labeled bracket precision (LB-P), recall (LB-R), and F-score (LB-F).

Parser	LB-P	LB-R	LB-F	unevaluated count
Berkeley	61.60	64.50	63.02	4
Bikel	63.97	65.82	64.89	2
Charniak-Johnson	62.51	65.55	64.00	59
Enju	71.93	43.56	54.26	8
Mogura	54.74	43.25	48.32	8
Stanford 1.6	60.76	64.70	62.67	3
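
The relationship between the three score columns can be checked with a short sketch. It assumes that LB-F is the balanced F-score, the harmonic mean of LB-P and LB-R; the table does not state this, although it is the usual convention for labeled bracket evaluation. The names `TABLE_20`, `f_score`, and `max_discrepancy` are illustrative and do not come from the article.

```python
# Rows copied from Table 20: parser -> (LB-P, LB-R, reported LB-F).
TABLE_20 = {
    "Berkeley": (61.60, 64.50, 63.02),
    "Bikel": (63.97, 65.82, 64.89),
    "Charniak-Johnson": (62.51, 65.55, 64.00),
    "Enju": (71.93, 43.56, 54.26),
    "Mogura": (54.74, 43.25, 48.32),
    "Stanford 1.6": (60.76, 64.70, 62.67),
}


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (harmonic mean of precision and recall).

    Assumption: the table's LB-F column uses this balanced form.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def max_discrepancy(table: dict) -> float:
    """Largest absolute gap between the recomputed and the reported F-score."""
    return max(abs(f_score(p, r) - f) for p, r, f in table.values())


if __name__ == "__main__":
    # Print recomputed vs. reported LB-F for each parser.
    for name, (p, r, reported) in TABLE_20.items():
        print(f"{name}\t{f_score(p, r):.2f}\t{reported:.2f}")
```

Because the precision and recall values in the table are already rounded to two decimals, a recomputed F-score can differ from the reported one in the last digit. The sketch also shows why Enju's F-score is comparatively low despite having the highest precision: the harmonic mean is pulled toward the smaller of the two values, here its recall of 43.56.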

ISSN: 1471-2105