Dataset | Number of sentences | Average sentence length | Entity type | Entity count |
---|---|---|---|---|
BC2GM | 20,000 | 28.5 | Gene/protein | 24,583 |
JNLPBA | 24,806 | 29.7 | Gene/protein | 35,336 |
BC5CDR-disease | 13,938 | 26.0 | Disease | 12,852 |
NCBI-disease | 6,881 | 26.1 | Disease | 6,881 |
Linnaeus | 23,155 | 23.1 | Species | 4,263 |
Species-800 | 8,130 | 25.9 | Species | 3,651 |
BC5CDR-chemical | 13,938 | 26.0 | Chemical | 15,935 |