From: *Benchmarking for biomedical natural language processing tasks with a domain specific ALBERT*
Dataset | Task | Domain | Train | Dev | Test | Metric |
---|---|---|---|---|---|---|
BC5CDR (disease) | NER | Biomedical | 109,853 | 121,971 | 129,472 | F1-Score |
BC5CDR (chemical) | NER | Biomedical | 109,853 | 117,391 | 124,676 | F1-Score |
NCBI (disease) | NER | Clinical | 135,615 | 23,959 | 24,488 | F1-Score |
JNLPBA | NER | Biomedical | 443,653 | 117,213 | 114,709 | F1-Score |
BC2GM | NER | Biomedical | 333,920 | 70,937 | 118,189 | F1-Score |
LINNAEUS | NER | Biomedical | 267,500 | 87,991 | 134,622 | F1-Score |
Species-800 (S800) | NER | Biomedical | 147,269 | 22,217 | 42,287 | F1-Score |
ShARe/CLEFE | NER | Clinical | 4,628 | 1,075 | 5,195 | F1-Score |
GAD | RE | Biomedical | 3,277 | 1,025 | 820 | F1-Score |
EU-ADR | RE | Biomedical | 227 | 71 | 57 | F1-Score |
DDI | RE | Biomedical | 2,937 | 1,004 | 979 | F1-Score |
ChemProt | RE | Biomedical | 4,154 | 2,416 | 3,458 | F1-Score |
i2b2 | RE | Clinical | 3,110 | 11 | 6,293 | F1-Score |
HoC | Document classification | Biomedical | 1,108 | 157 | 315 | F1-Score |
MedNLI | Inference | Clinical | 11,232 | 1,395 | 1,422 | Accuracy |
MedSTS | Sentence similarity | Clinical | 675 | 75 | 318 | Pearson |
BIOSSES | Sentence similarity | Biomedical | 64 | 16 | 20 | Pearson |
BioASQ 4b-factoid | QA | Biomedical | 327 | – | 161 | Accuracy (Lenient) |
BioASQ 5b-factoid | QA | Biomedical | 486 | – | 150 | Accuracy (Lenient) |
BioASQ 6b-factoid | QA | Biomedical | 618 | – | 161 | Accuracy (Lenient) |
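The Metric column names three evaluation measures: F1-Score for the NER, RE, and document-classification tasks, Accuracy for inference and the (lenient) BioASQ factoid QA, and Pearson correlation for the sentence-similarity tasks. A minimal pure-Python sketch of how each could be computed from raw predictions (function names are illustrative, not from the paper; in practice libraries such as scikit-learn or seqeval are typically used):

```python
import math

def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall, given true-positive,
    # false-positive, and false-negative counts (NER/RE/HoC metric).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def accuracy(gold, pred):
    # Fraction of exact matches between gold and predicted labels
    # (MedNLI metric; BioASQ uses a lenient variant of exact match).
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def pearson(x, y):
    # Pearson correlation between predicted and gold similarity scores
    # (MedSTS/BIOSSES metric).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Note that entity-level F1 for NER additionally requires matching predicted entity spans against gold spans to obtain the tp/fp/fn counts; the sketch above only covers the final aggregation step.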