Table 1 BioCreAtIvE Data Sets

From: BioCreAtIvE Task1A: entity identification with a stochastic tagger

| Set | Number of Sentences | Number of Entities | 1 word | 2 words | 3 words | 4 words | > 4 words |
|---|---|---|---|---|---|---|---|
| training | 7500 | 8876 | 46.1% | 25.7% | 14.9% | 6.6% | 6.6% |
| devtest | 2500 | 2975 | 46.6% | 23.9% | 15.1% | 6.7% | 7.7% |
| official test | 5000 | 5949 | 46.1% | 26.7% | 14.3% | 6.2% | 6.7% |
  1. This table shows the BioCreAtIvE data sets, including the distribution of entity lengths in words, which follows the same tendency across all three sets.