BMC Bioinformatics

Table 1 The number of files (#file), sentences (#sent), word tokens (#token) and out-of-vocabulary (OOV) percentage in each experimental dataset

From: From POS tagging to dependency parsing for biomedical event extraction

Corpus	Dataset	#file	#sent	#token	OOV (%)
GENIA	Training	1,701	15,820	414,608	0.0
	Development	148	1,361	36,180	4.4
	Test	150	1,360	35,639	4.4
CRAFT	Training	55	18,644	481,247	0.0
	Development	6	1,280	31,820	6.6
	Test	6	1,786	47,926	6.3
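
The table does not say how the OOV percentage is computed. A minimal sketch follows, assuming the common token-level definition: the share of word tokens in a dataset whose word form never occurs in the corresponding training set. Under that assumption a training set scored against itself gives 0.0, which is consistent with the Training rows. The function name `oov_percentage` and the toy sentences are hypothetical and are not taken from the article or from either corpus.

```python
from typing import Iterable


def oov_percentage(train_tokens: Iterable[str], eval_tokens: Iterable[str]) -> float:
    """Percentage of evaluation tokens whose word form never occurs in training.

    Assumption: OOV is counted per token (not per unique word type) and
    word forms are compared exactly, with no lowercasing or normalization.
    """
    vocabulary = set(train_tokens)
    eval_list = list(eval_tokens)
    if not eval_list:
        return 0.0
    unseen = sum(1 for token in eval_list if token not in vocabulary)
    return 100.0 * unseen / len(eval_list)


# Toy data invented for illustration (not drawn from GENIA or CRAFT).
train = ["the", "protein", "binds", "the", "receptor"]
dev = ["the", "kinase", "binds", "the", "ligand"]

print(round(oov_percentage(train, dev), 1))    # 40.0 ("kinase" and "ligand" are unseen)
print(round(oov_percentage(train, train), 1))  # 0.0 (training set against itself)
```

If the article instead counts unique word types, or normalizes case before the lookup, the figures would differ; only the definition inside `oov_percentage` would need to change.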

ISSN: 1471-2105
