Benchmarking for biomedical natural language processing tasks with a domain specific ALBERT

BMC Bioinformatics

Table 1 Training corpora used in prior state-of-the-art studies compared with ours (BioALBERT)

Training corpus	BioBERT [13]	SciBERT [11]	BLUE [12]	PubMedBERT [14]	KeBioLM [15]	BioALBERT
General	\(\checkmark\)	\(\times\)	\(\checkmark\)	\(\times\)	\(\times\)	\(\checkmark\)
PMC	\(\checkmark\)	\(\times\)	\(\times\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
PubMed	\(\checkmark\)	\(\times\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
Clinical notes	\(\times\)	\(\times\)	\(\checkmark\)	\(\times\)	\(\times\)	\(\checkmark\)

ISSN: 1471-2105