From: Combining word embeddings to extract chemical and drug entities in biomedical literature
                      Train     Dev       Test
Number of documents   500       250
Avg sentences         25.14     25.85     25.69
No. tokens            202,901   96,869    100,963
No. unique tokens     18,623    12,170    12,442