Table 6 Statistics of the datasets used

From: Benchmarking for biomedical natural language processing tasks with a domain specific ALBERT

| Dataset | Task | Domain | Train | Dev | Test | Metric |
|---|---|---|---|---|---|---|
| BC5CDR (disease) | NER | Biomedical | 109,853 | 121,971 | 129,472 | F1-Score |
| BC5CDR (chemical) | NER | Biomedical | 109,853 | 117,391 | 124,676 | F1-Score |
| NCBI (disease) | NER | Clinical | 135,615 | 23,959 | 24,488 | F1-Score |
| JNLPBA | NER | Biomedical | 443,653 | 117,213 | 114,709 | F1-Score |
| BC2GM | NER | Biomedical | 333,920 | 70,937 | 118,189 | F1-Score |
| LINNAEUS | NER | Biomedical | 267,500 | 87,991 | 134,622 | F1-Score |
| Species-800 (S800) | NER | Biomedical | 147,269 | 22,217 | 42,287 | F1-Score |
| ShARe/CLEFE | NER | Clinical | 4,628 | 1,075 | 5,195 | F1-Score |
| GAD | RE | Biomedical | 3,277 | 1,025 | 820 | F1-Score |
| EU-ADR | RE | Biomedical | 227 | 71 | 57 | F1-Score |
| DDI | RE | Biomedical | 2,937 | 1,004 | 979 | F1-Score |
| ChemProt | RE | Biomedical | 4,154 | 2,416 | 3,458 | F1-Score |
| i2b2 | RE | Clinical | 3,110 | 11 | 6,293 | F1-Score |
| HoC | Document classification | Biomedical | 1,108 | 157 | 315 | F1-Score |
| MedNLI | Inference | Clinical | 11,232 | 1,395 | 1,422 | Accuracy |
| MedSTS | Sentence similarity | Clinical | 675 | 75 | 318 | Pearson |
| BIOSSES | Sentence similarity | Biomedical | 64 | 16 | 20 | Pearson |
| BioASQ 4b-factoid | QA | Biomedical | 327 | – | 161 | Accuracy (Lenient) |
| BioASQ 5b-factoid | QA | Biomedical | 486 | – | 150 | Accuracy (Lenient) |
| BioASQ 6b-factoid | QA | Biomedical | 618 | – | 161 | Accuracy (Lenient) |
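
The Metric column maps to three standard evaluation functions. Below is a minimal sketch of how each could be computed; the library choices (seqeval, scikit-learn, SciPy) and averaging settings are assumptions, since the paper does not state its evaluation tooling here.

```python
# Minimal sketch of the metric families in the Metric column above.
# Libraries (seqeval, scikit-learn, SciPy) are assumptions, not the
# paper's stated tooling.
from seqeval.metrics import f1_score as entity_f1  # entity-level F1 for NER
from sklearn.metrics import f1_score, accuracy_score
from scipy.stats import pearsonr

# NER (BC5CDR, NCBI, JNLPBA, ...): entity-level F1 over BIO tag sequences.
gold = [["B-Disease", "I-Disease", "O", "B-Chemical"]]
pred = [["B-Disease", "I-Disease", "O", "O"]]
print(entity_f1(gold, pred))

# RE and document classification (GAD, DDI, ChemProt, HoC): F1 over labels.
print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))

# Inference (MedNLI) and QA (BioASQ): accuracy over predicted answers.
print(accuracy_score([0, 1, 2, 1], [0, 1, 1, 1]))

# Sentence similarity (MedSTS, BIOSSES): Pearson correlation of scores.
print(pearsonr([1.0, 2.5, 4.0, 0.5], [1.2, 2.4, 3.8, 0.9])[0])
```

For the BioASQ rows, "Accuracy (Lenient)" credits a prediction when the gold answer appears among the system's returned candidates; the plain `accuracy_score` call above stands in for that lenient matching.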