From: Empirical evaluation of language modeling to ascertain cancer outcomes from clinical text reports
Model | Architecture | Parameters (trainable/total) | Pre-trained contextual token embeddings? | Language model pre-trained in domain? | Language model frozen for classification training? | Final classification layer/strategy tested |
---|---|---|---|---|---|---|
TF-IDF | Bag-of-words logistic regression with elastic net regularization | 40 K/40 K | No | N/A | N/A | N/A |
CNN | One-dimensional convolutional neural network with global max-pooling | 7 M/7 M | No | N/A | N/A | N/A |
BERT-base | BERT | 766 K/110 M | Yes | No | Yes | CNN head |
BERT-med | BERT | 521 K/42 M | Yes | No | Yes | CNN head |
BERT-mini | BERT | 275 K/11 M | Yes | No | Yes | CNN head |
BERT-tiny | BERT | 152 K/4.5 M | Yes | No | Yes | CNN head |
Longformer [13] | RoBERTa with sliding-window local attention plus global attention | 766 K/128 M | Yes | No | Yes | CNN head |
ClinicalBERT [10] | BERT | 766 K/110 M | Yes | Partial (pre-trained on MIMIC-III ICU notes) [39] | Yes | CNN head |
DFCI-ImagingBERT, frozen | BERT | 766 K/110 M | Yes | Yes (pre-trained on DFCI imaging reports) | Yes | CNN head |
DFCI-ImagingBERT, unfrozen | BERT | 110 M/110 M | Yes | Yes (pre-trained on DFCI imaging reports) | No | Linear head |
Flan-T5 XXL | Text-to-Text Transfer Transformer (T5) | 0/11 B | Yes | No | N/A (zero-shot only) | 1 − predicted probability of the token “no” |
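The dominant strategy in the table is to freeze the pre-trained language model and train only a small one-dimensional convolutional head over its contextual token embeddings. Below is a minimal PyTorch sketch of that setup, not the authors' code: the checkpoint name, filter count, kernel size, and two-class output are illustrative assumptions, and the Hugging Face `transformers` library is assumed.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class FrozenBertCnnClassifier(nn.Module):
    """Frozen BERT encoder + trainable 1-D CNN classification head."""

    def __init__(self, encoder_name="bert-base-uncased",
                 n_filters=100, kernel_size=3, n_classes=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        for p in self.encoder.parameters():
            p.requires_grad = False  # language model stays frozen
        hidden = self.encoder.config.hidden_size
        self.conv = nn.Conv1d(hidden, n_filters, kernel_size)
        self.fc = nn.Linear(n_filters, n_classes)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():  # contextual embeddings are fixed features
            out = self.encoder(input_ids=input_ids,
                               attention_mask=attention_mask)
        x = out.last_hidden_state.transpose(1, 2)  # (batch, hidden, seq)
        x = torch.relu(self.conv(x))
        x = x.max(dim=2).values  # global max-pooling over the sequence
        return self.fc(x)        # only conv + fc receive gradients

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = FrozenBertCnnClassifier()
batch = tokenizer(["No evidence of disease progression."],
                  return_tensors="pt", padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])
```

Because only the convolutional and linear layers are updated, the trainable parameter counts in the table sit orders of magnitude below the totals (e.g., 766 K versus 110 M for BERT-base).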
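The last row scores reports with no task-specific training at all: the report is posed as a yes/no question, and the outcome score is one minus the model's probability for generating the token “no”. The sketch below illustrates that scoring rule under an assumed prompt; the prompt wording is hypothetical, and a small Flan-T5 checkpoint stands in for the 11 B XXL model used in the table to keep the example lightweight.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# The table's row uses the 11 B google/flan-t5-xxl checkpoint; the small
# checkpoint here is an assumption made so the sketch runs on modest hardware.
name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

# Hypothetical prompt; the paper's exact wording is not shown in the table.
prompt = ("Does this imaging report show cancer progression? "
          "Answer yes or no. Report: No evidence of disease progression.")
inputs = tokenizer(prompt, return_tensors="pt")

# Score only the first generated token of the answer.
start = torch.tensor([[model.config.decoder_start_token_id]])
with torch.no_grad():
    logits = model(**inputs, decoder_input_ids=start).logits  # (1, 1, vocab)

probs = torch.softmax(logits[0, -1], dim=-1)
no_id = tokenizer("no", add_special_tokens=False).input_ids[0]
score = 1.0 - probs[no_id].item()  # 1 − P("no") as the positive-class score
```

This zero-shot strategy is why the row lists no trainable parameters and why the frozen/unfrozen distinction does not apply.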