Table 1 The datasets and details of their annotations

From: A neural network multi-task learning approach to biomedical named entity recognition

| Dataset | Contents | Entity counts |
| --- | --- | --- |
| AnatEM [38] | Anatomy NE | 13,701 |
| BC2GM [2] | Gene/Protein NE | 24,583 |
| BC4CHEMD [3] | Chemical NE | 84,310 |
| BC5CDR [5] | Chemical, Disease NEs | Chemical: 15,935; Disease: 12,852 |
| BioNLP09 [52] | Gene/Protein NE | 14,963 |
| BioNLP11EPI [53] | Gene/Protein NE | 15,811 |
| BioNLP11ID [53] | 4 NEs | Gene/Protein: 6551; Organism: 3471; Chemical: 973; Regulon-operon: 87 |
| BioNLP13CG [54] | 16 NEs | Gene/Protein: 7908; Cell: 3492; Cancer: 2582; Chemical: 2270; Organism: 1715; Multi-tissue structure: 857; Tissue: 587; Cellular component: 569; Organ: 421; Organism substance: 283; Pathological formation: 228; Amino acid: 135; Immaterial anatomical entity: 102; Organism subdivision: 98; Anatomical system: 41; Developing anatomical structure: 35 |
| BioNLP13GE [55] | Gene/Protein NE | 12,057 |
| BioNLP13PC [56] | 4 NEs | Gene/Protein: 10,891; Chemical: 2487; Complex: 1502; Cellular component: 1013 |
| CRAFT [57] | 6 NEs | SO: 18,974; Gene/Protein: 16,064; Taxonomy: 6868; Chemical: 6053; CL: 5495; GO-CC: 4180 |
| Ex-PTM [58] | Gene/Protein NE | 4698 |
| JNLPBA [44] | 5 NEs | Gene/Protein: 35,336; DNA: 10,589; Cell Type: 8639; Cell Line: 4330; RNA: 1069 |
| Linnaeus [4] | Species NE | 4263 |
| NCBI-Disease [6] | Disease NE | 6881 |
| GENIA-PoS [59] | PoS-Tagging | N/A |