Table 2 Model characteristics

From: Empirical evaluation of language modeling to ascertain cancer outcomes from clinical text reports

| Model | Architecture | Parameters (trainable/total) | Pre-trained or contextual token embeddings? | Language model pre-trained in domain? | Language model frozen for classification training? | Final classification training layer/strategy tested |
|---|---|---|---|---|---|---|
| TF-IDF | Bag-of-words logistic regression with elastic net regularization | 40 K | No | N/A | N/A | N/A |
| CNN | One-dimensional convolutional neural network with global max-pooling | 7 M | No | N/A | N/A | N/A |
| BERT-base | BERT | 766 K/110 M | Yes | No | Yes | CNN head |
| BERT-med | BERT | 521 K/42 M | Yes | No | Yes | CNN head |
| BERT-mini | BERT | 275 K/11 M | Yes | No | Yes | CNN head |
| BERT-tiny | BERT | 152 K/4.5 M | Yes | No | Yes | CNN head |
| Longformer [13] | RoBERTa with local-context and global attention | 766 K/128 M | Yes | No | Yes | CNN head |
| ClinicalBERT [10] | BERT | 766 K/110 M | Yes | Partial (trained on MIMIC-III ICU data) [39] | Yes | CNN head |
| DFCI-ImagingBERT, frozen | BERT | 766 K/110 M | Yes | Yes (trained on DFCI imaging reports) | Yes | CNN head |
| DFCI-ImagingBERT, unfrozen | BERT | 110 M | Yes | Yes (trained on DFCI imaging reports) | No | Linear head |
| Flan-T5 XXL | Text-to-Text Transfer Transformer | 11 B | Yes | No | N/A (zero-shot learning only) | 1 − the predicted probability of the word "no" |

Abbreviations: TF-IDF Term Frequency-Inverse Document Frequency; CNN convolutional neural network; BERT Bidirectional Encoder Representations from Transformers [9]; RoBERTa Robustly optimized BERT approach [40]; MIMIC Medical Information Mart for Intensive Care; DFCI Dana-Farber Cancer Institute.
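
For the "CNN head" rows, the table indicates that the pre-trained language model is kept frozen and only a small convolutional head is trained on its contextual token embeddings. The sketch below is an illustrative reconstruction of that setup, not the authors' released code; the encoder checkpoint, filter sizes, and label count are assumptions, so the trainable-parameter counts will not exactly match those in the table.

```python
# Minimal sketch (assumed hyperparameters): frozen BERT encoder + trainable 1-D CNN head.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class FrozenBertCnnClassifier(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", n_filters=100,
                 kernel_sizes=(3, 4, 5), n_classes=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        # "Language model frozen for classification training? Yes":
        # only the CNN head and final linear layer below stay trainable.
        for p in self.encoder.parameters():
            p.requires_grad = False
        hidden = self.encoder.config.hidden_size
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, n_filters, k) for k in kernel_sizes]
        )
        self.classifier = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():  # contextual token embeddings from the frozen encoder
            hidden = self.encoder(input_ids=input_ids,
                                  attention_mask=attention_mask).last_hidden_state
        x = hidden.transpose(1, 2)  # (batch, hidden_size, seq_len) for Conv1d
        # Convolve over the token dimension, then apply global max-pooling.
        pooled = [conv(x).relu().max(dim=-1).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=-1))

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = FrozenBertCnnClassifier()
batch = tokenizer(["CT chest: no evidence of disease progression."],
                  return_tensors="pt", padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])
```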
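For the Flan-T5 XXL row, classification is zero-shot: the model is prompted with the report and a yes/no question, and the outcome score is taken as 1 − the predicted probability of the word "no". The sketch below shows one way such a score could be computed with the Hugging Face transformers API; the prompt wording is an illustrative assumption, not the authors' prompt.

```python
# Minimal sketch (assumed prompt): zero-shot scoring as 1 - P("no" as first generated token).
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xxl")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xxl")

prompt = ("Read the following imaging report and answer yes or no: "
          "does it describe cancer progression?\n\nReport: ...")
inputs = tokenizer(prompt, return_tensors="pt")

# Score the first decoder step only; T5 decoding starts from decoder_start_token_id.
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
with torch.no_grad():
    first_step_logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits[0, 0]
probs = first_step_logits.softmax(dim=-1)

no_id = tokenizer("no", add_special_tokens=False).input_ids[0]
score = 1.0 - probs[no_id].item()  # higher score = less confident the answer is "no"
```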