From: Improving biomedical named entity recognition with syntactic information
Dataset | Entity type | Split | Token # | Sent. # | Entity # |
---|---|---|---|---|---|
BC2GM | Gene/protein | Train | 355.4k | 12.5k | 15.1k |
BC2GM | Gene/protein | Dev | 71.0k | 2.5k | 3.0k |
BC2GM | Gene/protein | Test | 143.4k | 5.0k | 6.3k |
JNLPBA | Gene/protein | Train | 443.6k | 14.6k | 32.1k |
JNLPBA | Gene/protein | Dev | 117.2k | 3.8k | 8.5k |
JNLPBA | Gene/protein | Test | 114.7k | 3.8k | 6.2k |
BC5CDR-chemical | Chemical | Train | 118.1k | 4.5k | 5.2k |
BC5CDR-chemical | Chemical | Dev | 117.4k | 4.5k | 5.3k |
BC5CDR-chemical | Chemical | Test | 124.7k | 4.7k | 5.3k |
NCBI-disease | Disease | Train | 135.7k | 5.4k | 5.1k |
NCBI-disease | Disease | Dev | 23.9k | 923 | 787 |
NCBI-disease | Disease | Test | 24.4k | 940 | 960 |
LINNAEUS | Species | Train | 281.2k | 11.9k | 2.1k |
LINNAEUS | Species | Dev | 93.8k | 4.0k | 711 |
LINNAEUS | Species | Test | 165k | 7.1k | 1.4k |
Species-800 | Species | Train | 147.2k | 5.7k | 2.5k |
Species-800 | Species | Dev | 22.2k | 830 | 384 |
Species-800 | Species | Test | 42.2k | 1.6k | 767 |