Table 1 BioNER corpora in experiments

From: DTranNER: biomedical named entity recognition with deep learning-based label-label transition model

| Dataset | Number of sentences | Entity type | Entity count | Max entity length (tokens) | Average entity length (tokens) |
|---|---|---|---|---|---|
| BC2GM [35] | 20128 | Gene/Protein | 24583 | 26 | 2.44 |
| BC4CHEMD [36] | 87682 | Chemical/Drug | 84310 | 137 | 2.19 |
| BC5CDR-Chemical [37] | 13935 | Chemical/Drug | 15935 | 56 | 1.33 |
| BC5CDR-Disease [37] | 13935 | Disease | 12852 | 19 | 1.65 |
| NCBI-Disease [38] | 7284 | Disease | 6881 | 22 | 2.21 |
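
For readers who want to reproduce statistics of this kind on their own data, the following is a minimal sketch of how sentence count, entity count, maximum entity length, and average entity length could be derived from token-level labels. It assumes each sentence is a list of BIO tags (`B-X`, `I-X`, `O`); the table does not state the tagging scheme or tokenizer used, so that is an assumption, and the function names `entity_lengths` and `corpus_stats` are hypothetical. The toy input is illustrative and is not taken from any of the corpora above.

```python
from typing import Dict, List


def entity_lengths(tags: List[str]) -> List[int]:
    """Return the token length of every entity in one BIO-tagged sentence.

    Simplification (assumption): entity types are not compared, so an
    I- tag always extends the currently open entity. An I- tag with no
    open entity is ignored.
    """
    lengths: List[int] = []
    current = 0  # length of the entity currently being read, 0 if none
    for tag in tags:
        if tag.startswith("B-"):
            if current:
                lengths.append(current)  # close the previous entity
            current = 1
        elif tag.startswith("I-") and current:
            current += 1
        else:
            if current:
                lengths.append(current)
            current = 0
    if current:
        lengths.append(current)  # entity that ends the sentence
    return lengths


def corpus_stats(sentences: List[List[str]]) -> Dict[str, float]:
    """Aggregate the per-corpus figures reported in a table like Table 1."""
    lengths = [n for tags in sentences for n in entity_lengths(tags)]
    count = len(lengths)
    return {
        "sentences": len(sentences),
        "entities": count,
        "max_len": max(lengths, default=0),
        "avg_len": sum(lengths) / count if count else 0.0,
    }


# Toy example with two sentences and three disease mentions.
toy = [
    ["B-Disease", "I-Disease", "O", "B-Disease"],
    ["O", "O", "B-Disease", "I-Disease", "I-Disease"],
]
stats = corpus_stats(toy)
print(stats)
```

Entity lengths here are measured in tokens, so the results depend on the tokenization applied before tagging; a different tokenizer would give different maximum and average lengths for the same corpus.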