Table 1 The statistics of the datasets used in our experiments

From: Biomedical named entity recognition with the combined feature attention and fully-shared multi-task learning

| Dataset | Number of sentences | Sentence length | Entity type | Entity count |
|---|---|---|---|---|
| BC2GM | 20,000 | 28.5 | Gene/protein | 24,583 |
| JNLPBA | 24,806 | 29.7 | Gene/protein | 35,336 |
| BC5CDR-disease | 13,938 | 26.0 | Disease | 12,852 |
| NCBI-disease | 6881 | 26.1 | Disease | 6881 |
| Linnaeus | 23,155 | 23.1 | Species | 4263 |
| Species-800 | 8130 | 25.9 | Species | 3651 |
| BC5CDR-chemical | 13,938 | 26.0 | Chemical | 15,935 |