Table 1 The number of files (#file), sentences (#sent), word tokens (#token) and out-of-vocabulary (OOV) percentage in each experimental dataset

From: From POS tagging to dependency parsing for biomedical event extraction

Corpus  Dataset      #file  #sent   #token   OOV (%)
GENIA   Training     1701   15,820  414,608  0.0
        Development  148    1361    36,180   4.4
        Test         150    1360    35,639   4.4
CRAFT   Training     55     18,644  481,247  0.0
        Development  6      1280    31,820   6.6
        Test         6      1786    47,926   6.3