Table 1 The datasets and details of their annotations

From: A neural network multi-task learning approach to biomedical named entity recognition

Dataset Contents Entity counts
AnatEM [38] Anatomy NE 13,701
BC2GM [2] Gene/Protein NE 24,583
BC4CHEMD [3] Chemical NE 84,310
BC5CDR [5] Chemical, Disease NEs Chemical: 15,935; Disease: 12,852
BioNLP09 [52] Gene/Protein NE 14,963
BioNLP11EPI [53] Gene/Protein NE 15,811
BioNLP11ID [53] 4 NEs Gene/Protein: 6551; Organism: 3471;
   Chemical: 973; Regulon-operon: 87
BioNLP13CG [54] 16 NEs Gene/Protein: 7908; Cell: 3492; Cancer: 2582;
   Chemical: 2270; Organism: 1715; Multi-tissue structure: 857;
   Tissue: 587; Cellular component: 569; Organ: 421;
   Organism substance: 283; Pathological formation: 228; Amino acid: 135;
   Immaterial anatomical entity: 102; Organism subdivision: 98;
   Anatomical system: 41; Developing anatomical structure: 35
BioNLP13GE [55] Gene/Protein NE 12,057
BioNLP13PC [56] 4 NEs Gene/Protein: 10,891; Chemical: 2487;
   Complex: 1502; Cellular component: 1013
CRAFT [57] 6 NEs SO: 18,974; Gene/Protein: 16,064;
   Taxonomy: 6868; Chemical: 6053; CL: 5495; GO-CC: 4180
Ex-PTM [58] Gene/Protein NE 4698
JNLPBA [44] 5 NEs Gene/Protein: 35,336; DNA: 10,589; Cell Type: 8639;
   Cell Line: 4330; RNA: 1069
Linnaeus [4] Species NE 4263
NCBI-Disease [6] Disease NE 6881
GENIA-PoS [59] PoS-Tagging N/A