From: Benchmarking for biomedical natural language processing tasks with a domain specific ALBERT
Dataset | SOTA | BioALBERT Base1 | BioALBERT Base2 | BioALBERT Large1 | BioALBERT Large2 | BioALBERT Base3 | BioALBERT Base4 | BioALBERT Large3 | BioALBERT Large4 | Difference over SOTA |
---|---|---|---|---|---|---|---|---|---|---|
Named entity recognition task | ||||||||||
Share/Clefe | 75.40 | 94.27 | 94.47 | 93.16 | 94.30 | 94.84* | 94.82 | 94.70 | 94.66 | 19.44 \(\uparrow\) |
BC5CDR (disease) | 87.15 | 97.66 | 97.62 | 97.78* | 97.61 | 90.03 | 90.01 | 90.29 | 91.44 | 10.63 \(\uparrow\) |
BC5CDR (chemical) | 93.47 | 97.90 | 98.08* | 97.76 | 97.79 | 89.83 | 90.08 | 90.01 | 91.48 | 4.61 \(\uparrow\) |
JNLPBA | 82.00 | 82.72 | 83.22 | 84.01 | 83.53 | 86.74* | 86.56 | 86.20 | 85.72 | 4.74 \(\uparrow\) |
Linnaeus | 93.54 | 99.71 | 99.72 | 99.73 | 99.73* | 95.72 | 98.27 | 98.24 | 98.23 | 6.19 \(\uparrow\) |
NCBI (disease) | 89.71 | 95.89 | 95.61 | 97.18* | 95.85 | 85.82 | 85.93 | 85.86 | 85.83 | 7.47 \(\uparrow\) |
S800 | 75.31 | 98.76 | 98.49 | 99.02* | 98.72 | 93.53 | 93.63 | 93.63 | 93.63 | 23.71 \(\uparrow\) |
BC2GM | 85.10 | 96.34 | 96.02 | 96.97* | 96.33 | 83.35 | 83.38 | 83.44 | 84.72 | 11.87 \(\uparrow\) |
BLURB | 84.61 | 95.41 | 95.41 | 95.70* | 95.48 | 89.98 | 90.34 | 90.30 | 90.71 | 11.09 \(\uparrow\) |
Relation extraction task | ||||||||||
DDI | 82.36 | 82.32 | 79.98 | 83.76 | 84.05* | 76.22 | 75.57 | 76.28 | 76.46 | 1.69 \(\uparrow\) |
ChemProt | 77.50 | 78.32* | 76.42 | 77.77 | 77.97 | 62.85 | 62.34 | 61.69 | 57.46 | 0.82 \(\uparrow\) |
i2b2 | 76.40 | 76.49 | 76.54 | 76.86* | 76.81 | 73.83 | 73.08 | 72.19 | 75.09 | 0.46 \(\uparrow\) |
Euadr | 86.51 | 82.32 | 74.07 | 84.56 | 81.32 | 62.52 | 76.93 | 70.41 | 70.48 | −1.95 \(\downarrow\) |
GAD | 84.30 | 73.82 | 66.32 | 76.74 | 69.65 | 72.68 | 69.14 | 71.81 | 68.17 | −7.56 \(\downarrow\) |
BLURB | 79.14 | 78.66 | 74.67 | 79.94* | 77.96 | 69.62 | 71.41 | 70.50 | 69.53 | 0.80 \(\uparrow\) |
Sentence similarity task | ||||||||||
BIOSSES | 92.30 | 82.27 | 73.14 | 92.80* | 81.90 | 24.94 | 55.80 | 47.86 | 30.48 | 0.50 \(\uparrow\) |
MedSTS | 84.80 | 85.70 | 85.00 | 85.70* | 85.40 | 51.80 | 56.70 | 45.80 | 42.00 | 0.90 \(\uparrow\) |
BLURB | 88.20 | 83.99 | 79.07 | 89.25* | 83.65 | 38.37 | 56.25 | 46.83 | 36.24 | 1.05 \(\uparrow\) |
Inference task | ||||||||||
MedNLI | 84.00 | 77.69 | 76.35 | 79.38 | 79.52 | 78.25 | 77.20 | 76.34 | 75.51 | −4.48 \(\downarrow\) |
Document classification task | ||||||||||
HoC | 87.30 | 83.21 | 84.52 | 87.92* | 84.32 | 64.20 | 75.20 | 61.00 | 81.70 | 0.62 \(\uparrow\) |
Question answering task | ||||||||||
BioASQ 4b | 47.82 | 47.90 | 48.34 | 48.90* | 48.25 | 47.10 | 47.35 | 45.90 | 46.10 | 1.08 \(\uparrow\) |
BioASQ 5b | 60.00 | 61.10 | 61.90 | 62.31* | 61.57 | 58.54 | 59.21 | 58.98 | 58.50 | 2.31 \(\uparrow\) |
BioASQ 6b | 57.77 | 59.80 | 62.00 | 62.88* | 61.54 | 56.10 | 56.22 | 56.60 | 56.85 | 5.11 \(\uparrow\) |
BLURB | 55.20 | 56.27 | 57.41 | 58.03* | 57.12 | 53.91 | 54.26 | 53.83 | 53.82 | 2.83 \(\uparrow\) |
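
The "Difference over SOTA" column appears to be the best score among the eight BioALBERT variants minus the SOTA score. This reading is inferred from the numbers in the table rather than stated in this excerpt. The sketch below checks it on three rows; the `rows` dictionary and the function name `difference_over_sota` are illustrative, not taken from the paper.

```python
# Scores copied from three table rows: (SOTA, [eight BioALBERT variant scores]).
rows = {
    "Share/Clefe": (75.40, [94.27, 94.47, 93.16, 94.30, 94.84, 94.82, 94.70, 94.66]),
    "GAD": (84.30, [73.82, 66.32, 76.74, 69.65, 72.68, 69.14, 71.81, 68.17]),
    "MedNLI": (84.00, [77.69, 76.35, 79.38, 79.52, 78.25, 77.20, 76.34, 75.51]),
}


def difference_over_sota(sota, variants):
    """Best variant score minus SOTA, rounded to two decimals.

    Assumption: this is how the last column was computed; it is
    inferred from the table values, not confirmed by the source.
    """
    return round(max(variants) - sota, 2)


diffs = {
    name: difference_over_sota(sota, variants)
    for name, (sota, variants) in rows.items()
}

for name, diff in diffs.items():
    arrow = "up" if diff > 0 else "down"
    print(f"{name}: {diff:+.2f} ({arrow})")
```

A positive value corresponds to an upward arrow in the table and a negative value to a downward arrow.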