Table 2 BERT model statistics: model parameters, vocabulary size in wordpieces, and number of English-language words in the pretraining data

From: Dependency parsing of biomedical text with BERT

| Model | Params (M) | Vocab (K) | English words (B) |
|---|---|---|---|
| Google BERT large | 340 | 29 | 3.3 |
| Google mBERT | 180 | 120 | 2.5 |
| SciBERT base scivocab uncased | 110 | 31 | 3.2 |
| BioBERT large v1.1 (custom vocab) | 360 | 59 | 21.3 |
| BlueBERT base P+M | 110 | 31 | 4.5 |
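
The parameter and vocabulary columns can be checked directly against published checkpoints; the pretraining-corpus word count in the last column is reported by each model's authors and cannot be recovered from a checkpoint. Below is a minimal sketch, assuming the HuggingFace `transformers` library and SciBERT's public checkpoint ID; it is an illustration, not part of the paper's pipeline.

```python
from transformers import AutoModel, AutoTokenizer

# Example checkpoint: SciBERT base scivocab uncased (public HuggingFace ID).
model_name = "allenai/scibert_scivocab_uncased"

model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Parameter count in millions, matching the "Params (M)" column.
params_m = sum(p.numel() for p in model.parameters()) / 1e6
# Wordpiece vocabulary size in thousands, matching the "Vocab (K)" column.
vocab_k = tokenizer.vocab_size / 1e3

print(f"{model_name}: ~{params_m:.0f}M parameters, ~{vocab_k:.0f}K wordpieces")
```

Swapping in another checkpoint ID (e.g. `bert-large-cased` for Google BERT large or `bert-base-multilingual-cased` for mBERT) reproduces the corresponding rows in the same way.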