Table 1 Basic analysis of SPACCC corpus documents

From: Combining word embeddings to extract chemical and drug entities in biomedical literature

 

                           Train      Dev       Test
Number of documents        500        250       250
Avg. sentences             25.14      25.85     25.69
Number of tokens           202,901    96,869    100,963
Number of unique tokens    18,623     12,170    12,442