Table 4 Structure of the candidate word selection model

From: MLM-based typographical error correction of unstructured medical texts for named entity recognition

| Layer               | Output shape         |
|---------------------|----------------------|
| Input               | (None, None, 768)    |
| Dense               | (768, 768)           |
| Layer normalization | (768,)               |
| Output              | (None, None, 30,522) |

  1. We added a dense layer of shape (768, 768) to the pre-trained BERT model to take the context of the text into account via MLM. The input layer is (None, None, 768), the output layer is (None, None, 30,522), and the layer normalization of shape (768,) normalizes the inputs to the next layer (a sketch of this head follows below).
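The following is a minimal sketch of the candidate word selection head with the shapes reported in Table 4, assuming TensorFlow/Keras and Hugging Face's TFBertModel; the function name `build_candidate_selection_model`, the GELU activation, and the `bert-base-uncased` checkpoint are illustrative assumptions, as the table only specifies layer shapes.

```python
import tensorflow as tf
from transformers import TFBertModel

HIDDEN_SIZE = 768   # BERT-base hidden size; input shape (None, None, 768)
VOCAB_SIZE = 30522  # BERT WordPiece vocabulary; output shape (None, None, 30,522)

def build_candidate_selection_model(
    bert_name: str = "bert-base-uncased",  # assumed checkpoint, not from the paper
) -> tf.keras.Model:
    input_ids = tf.keras.Input(shape=(None,), dtype=tf.int32, name="input_ids")
    attention_mask = tf.keras.Input(shape=(None,), dtype=tf.int32, name="attention_mask")

    # Pre-trained BERT encoder yields contextual embeddings of shape (None, None, 768).
    bert = TFBertModel.from_pretrained(bert_name)
    hidden = bert(input_ids, attention_mask=attention_mask).last_hidden_state

    # Dense (768, 768) transform; the activation is not given in the table,
    # GELU is assumed here because it matches BERT's standard MLM head.
    x = tf.keras.layers.Dense(HIDDEN_SIZE, activation="gelu")(hidden)

    # Layer normalization with learnable parameters of shape (768,),
    # normalizing the inputs to the output projection.
    x = tf.keras.layers.LayerNormalization()(x)

    # Output projection to the vocabulary gives per-token candidate-word logits.
    logits = tf.keras.layers.Dense(VOCAB_SIZE, name="mlm_logits")(x)
    return tf.keras.Model(inputs=[input_ids, attention_mask], outputs=logits)
```

Applying a softmax over the final axis of the logits turns each masked position's (None, None, 30,522) output into a distribution over the vocabulary, from which candidate replacement words can be ranked.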