Table 1 BioCreAtIvE Data Sets

From: BioCreAtIvE Task1A: entity identification with a stochastic tagger

| Set | Number of Sentences | Number of Entities | 1 word | 2 words | 3 words | 4 words | > 4 words |
|---|---|---|---|---|---|---|---|
| training | 7500 | 8876 | 46.1% | 25.7% | 14.9% | 6.6% | 6.6% |
| devtest | 2500 | 2975 | 46.6% | 23.9% | 15.1% | 6.7% | 7.7% |
| official test | 5000 | 5949 | 46.1% | 26.7% | 14.3% | 6.2% | 6.7% |

  1. This table shows the BioCreAtIvE data sets, including the distribution of entities by length in words, which follows a similar trend across all three sets.
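
One way to check that the length distributions are similar across sets is to compare them pairwise. The sketch below is illustrative only: the percentages are copied from Table 1, while the function names and the use of the maximum absolute difference (in percentage points) as the similarity measure are assumptions of this example, not something taken from the paper.

```python
from itertools import combinations

# Share of entities by length in words (1, 2, 3, 4, >4), copied from Table 1.
DISTRIBUTIONS = {
    "training": [46.1, 25.7, 14.9, 6.6, 6.6],
    "devtest": [46.6, 23.9, 15.1, 6.7, 7.7],
    "official test": [46.1, 26.7, 14.3, 6.2, 6.7],
}


def max_abs_difference(first, second):
    """Largest gap, in percentage points, between two distributions."""
    return max(abs(a - b) for a, b in zip(first, second))


def pairwise_differences(distributions):
    """Map each pair of set names to its largest per-bucket gap."""
    return {
        (name_a, name_b): round(
            max_abs_difference(distributions[name_a], distributions[name_b]), 1
        )
        for name_a, name_b in combinations(distributions, 2)
    }


if __name__ == "__main__":
    for pair, gap in pairwise_differences(DISTRIBUTIONS).items():
        print(pair, gap)
```

Under this measure, no length bucket differs by more than 2.8 percentage points between any two sets; the largest gap is in the 2-word bucket between devtest and official test.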