
Table 1. Number of files (#file), sentences (#sent), word tokens (#token), and out-of-vocabulary (OOV) percentage in each experimental dataset

From: From POS tagging to dependency parsing for biomedical event extraction

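| Corpus | Dataset | #file | #sent | #token | OOV (%) |
|---|---|---|---|---|---|
| GENIA | Training | 1701 | 15,820 | 414,608 | 0.0 |
| GENIA | Development | 148 | 1361 | 36,180 | 4.4 |
| GENIA | Test | 150 | 1360 | 35,639 | 4.4 |
| CRAFT | Training | 55 | 18,644 | 481,247 | 0.0 |
| CRAFT | Development | 6 | 1280 | 31,820 | 6.6 |
| CRAFT | Test | 6 | 1786 | 47,926 | 6.3 |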
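
The table does not say how the OOV percentage is computed. One common reading, assumed here, is the share of word tokens in a split whose form never occurs in the corresponding training set. That reading is consistent with the 0.0 reported for both training sets. A minimal sketch under that assumption follows; the function name `oov_percentage` and the toy sentences are hypothetical and are not taken from GENIA or CRAFT.

```python
from typing import Iterable, List


def oov_percentage(
    train_sents: Iterable[List[str]],
    eval_sents: Iterable[List[str]],
) -> float:
    """Percentage of evaluation tokens absent from the training vocabulary.

    Assumption: OOV is measured per token (not per word type) and word
    forms are compared exactly, with no lowercasing or normalization.
    """
    # Vocabulary = set of distinct word forms seen in the training split.
    vocab = {tok for sent in train_sents for tok in sent}
    eval_tokens = [tok for sent in eval_sents for tok in sent]
    if not eval_tokens:
        return 0.0
    unseen = sum(1 for tok in eval_tokens if tok not in vocab)
    return 100.0 * unseen / len(eval_tokens)


# Toy, already-tokenized sentences (hypothetical data).
train = [["the", "protein", "binds"], ["the", "gene", "is", "expressed"]]
dev = [["the", "kinase", "binds"], ["the", "gene"]]

print(oov_percentage(train, dev))    # 1 unseen token out of 5
print(oov_percentage(train, train))  # training split against itself
```

Whether the original figures count tokens or word types, and whether any case folding was applied, is not stated in the table, so the sketch should be adjusted if the source defines OOV differently.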