
Table 1 Biomedical NER datasets used in the experiments

From: Multitask learning for biomedical named entity recognition with cross-sharing structure

Dataset        Size (sentences)   Entity types (counts)
BC2GM          20,131             Gene (24,583)
Ex-PTM         3,653              Protein (4,698)
NCBI-disease   7,287              Disease (6,881)
Linnaeus       23,155             Species (4,263)
JNLPBA         24,806             Cell (12,969), Gene (10,589), Protein (35,336)
BC5CDR         13,938             Chemical (15,935), Disease (12,852)
BioNLP09       11,356             Protein (14,963)
BioNLP11ID     5,178              Chemical (973), Protein (6,551), Species (3,471)
BioNLP13PC     5,051              Cell (1,013), Chemical (3,989), Gene (10,891)