From: Empirical evaluation of language modeling to ascertain cancer outcomes from clinical text reports
Progression

Model | Accuracy | Precision | AUROC [95% CI] | F1 | AUPRC | Recall | MCC |
---|---|---|---|---|---|---|---|
BERT-Base | 0.88 [0.87, 0.90] | 0.71 [0.66, 0.76] | 0.92 [0.91, 0.94] | 0.72 [0.68, 0.76] | 0.76 [0.72, 0.81] | 0.74 [0.69, 0.79] | 0.65 [0.60, 0.70] |
BERT-Med | 0.88 [0.86, 0.89] | 0.69 [0.64, 0.73] | 0.92 [0.91, 0.94] | 0.73 [0.69, 0.76] | 0.75 [0.69, 0.80] | 0.77 [0.72, 0.81] | 0.65 [0.60, 0.69] |
BERT-Mini | 0.85 [0.83, 0.87] | 0.61 [0.56, 0.66] | 0.89 [0.88, 0.91] | 0.68 [0.64, 0.72] | 0.68 [0.62, 0.73] | 0.77 [0.72, 0.81] | 0.59 [0.54, 0.63] |
BERT-Tiny | 0.80 [0.78, 0.82] | 0.51 [0.46, 0.56] | 0.84 [0.82, 0.86] | 0.56 [0.52, 0.61] | 0.56 [0.50, 0.63] | 0.63 [0.58, 0.68] | 0.43 [0.38, 0.49] |
Longformer | 0.86 [0.84, 0.87] | 0.70 [0.63, 0.75] | 0.89 [0.87, 0.91] | 0.62 [0.57, 0.67] | 0.70 [0.65, 0.75] | 0.55 [0.50, 0.61] | 0.54 [0.48, 0.59] |
Clinical BERT | 0.88 [0.86, 0.89] | 0.69 [0.64, 0.74] | 0.93 [0.91, 0.94] | 0.72 [0.68, 0.75] | 0.77 [0.72, 0.82] | 0.75 [0.70, 0.80] | 0.64 [0.59, 0.69] |
DFCI-ImagingBERT (BERT frozen, CNN head) | 0.90 [0.89, 0.92] | 0.75 [0.70, 0.79] | 0.95 [0.94, 0.96] | 0.78 [0.74, 0.81] | 0.84 [0.80, 0.87] | 0.81 [0.77, 0.85] | 0.72 [0.68, 0.76] |
DFCI-ImagingBERT (BERT unfrozen, linear head) | 0.90 [0.89, 0.92] | 0.74 [0.69, 0.79] | 0.95 [0.94, 0.96] | 0.78 [0.74, 0.81] | 0.85 [0.81, 0.89] | 0.81 [0.77, 0.85] | 0.71 [0.67, 0.76] |
CNN | 0.89 [0.87, 0.90] | 0.72 [0.66, 0.76] | 0.93 [0.92, 0.95] | 0.74 [0.70, 0.78] | 0.81 [0.77, 0.85] | 0.77 [0.72, 0.82] | 0.67 [0.62, 0.72] |
TF-IDF | 0.88 [0.86, 0.89] | 0.72 [0.67, 0.77] | 0.92 [0.90, 0.93] | 0.69 [0.64, 0.73] | 0.75 [0.71, 0.80] | 0.66 [0.61, 0.71] | 0.61 [0.56, 0.66] |
Flan-T5-XXL (zero-shot) | 0.89 [0.87, 0.90] | 0.77 [0.72, 0.82] | 0.92 [0.91, 0.94] | 0.71 [0.66, 0.75] | 0.77 [0.72, 0.81] | 0.65 [0.60, 0.71] | 0.64 [0.59, 0.69] |

Response

Model | Accuracy | Precision | AUROC [95% CI] | F1 | AUPRC | Recall | MCC |
---|---|---|---|---|---|---|---|
BERT-Base | 0.93 [0.92, 0.95] | 0.80 [0.74, 0.85] | 0.93 [0.90, 0.95] | 0.73 [0.68, 0.78] | 0.78 [0.73, 0.83] | 0.67 [0.61, 0.74] | 0.70 [0.64, 0.75] |
BERT-Med | 0.93 [0.92, 0.94] | 0.75 [0.69, 0.81] | 0.92 [0.90, 0.95] | 0.71 [0.66, 0.76] | 0.78 [0.72, 0.83] | 0.68 [0.62, 0.74] | 0.67 [0.62, 0.73] |
BERT-Mini | 0.92 [0.91, 0.94] | 0.72 [0.65, 0.78] | 0.90 [0.88, 0.93] | 0.71 [0.66, 0.76] | 0.74 [0.67, 0.79] | 0.71 [0.65, 0.77] | 0.67 [0.61, 0.72] |
BERT-Tiny | 0.89 [0.88, 0.91] | 0.59 [0.53, 0.66] | 0.86 [0.83, 0.89] | 0.61 [0.55, 0.67] | 0.63 [0.57, 0.70] | 0.63 [0.57, 0.70] | 0.55 [0.49, 0.61] |
Longformer | 0.92 [0.90, 0.93] | 0.80 [0.72, 0.87] | 0.89 [0.86, 0.91] | 0.61 [0.54, 0.67] | 0.71 [0.64, 0.77] | 0.49 [0.42, 0.56] | 0.59 [0.52, 0.65] |
Clinical BERT | 0.93 [0.92, 0.94] | 0.77 [0.70, 0.83] | 0.93 [0.90, 0.95] | 0.72 [0.66, 0.77] | 0.77 [0.70, 0.83] | 0.67 [0.61, 0.74] | 0.68 [0.62, 0.73] |
DFCI-ImagingBERT (BERT frozen, CNN head) | 0.94 [0.93, 0.95] | 0.83 [0.77, 0.89] | 0.94 [0.93, 0.96] | 0.76 [0.71, 0.80] | 0.81 [0.76, 0.86] | 0.69 [0.63, 0.76] | 0.73 [0.67, 0.78] |
DFCI-ImagingBERT (BERT unfrozen, linear head) | 0.94 [0.93, 0.95] | 0.84 [0.77, 0.89] | 0.93 [0.91, 0.95] | 0.73 [0.68, 0.78] | 0.80 [0.75, 0.85] | 0.65 [0.59, 0.72] | 0.71 [0.65, 0.76] |
CNN | 0.93 [0.92, 0.94] | 0.92 [0.86, 0.97] | 0.94 [0.92, 0.96] | 0.67 [0.60, 0.72] | 0.82 [0.77, 0.87] | 0.52 [0.45, 0.59] | 0.66 [0.60, 0.72] |
TF-IDF | 0.93 [0.91, 0.94] | 0.81 [0.74, 0.87] | 0.93 [0.91, 0.95] | 0.68 [0.63, 0.73] | 0.75 [0.69, 0.81] | 0.59 [0.53, 0.66] | 0.65 [0.59, 0.71] |
Flan-T5-XXL (zero-shot) | 0.92 [0.90, 0.93] | 0.69 [0.63, 0.76] | 0.90 [0.87, 0.93] | 0.69 [0.64, 0.74] | 0.69 [0.61, 0.75] | 0.68 [0.61, 0.75] | 0.64 [0.58, 0.70] |
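
The table does not say how the threshold-based metrics were computed. The sketch below assumes the standard binary-classification definitions: F1 as the harmonic mean of precision and recall, and MCC from confusion-matrix counts. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration. The precision, recall, and F1 values in `reported` are point estimates copied from the Progression rows above.

```python
import math


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, F1 and MCC from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_from_precision_recall(precision, recall),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# (precision, recall, reported F1) point estimates from the Progression table.
reported = {
    "BERT-Base": (0.71, 0.74, 0.72),
    "Longformer": (0.70, 0.55, 0.62),
}

for model, (precision, recall, reported_f1) in reported.items():
    recomputed = f1_from_precision_recall(precision, recall)
    print(f"{model}: recomputed F1 = {recomputed:.2f}, reported F1 = {reported_f1:.2f}")

# Hypothetical counts, not taken from the study.
print(metrics_from_counts(tp=8, fp=2, fn=2, tn=8))
```

Because the table entries are rounded to two decimals, a recomputed F1 can differ from the reported one by about 0.01. For example, the CNN row under Response (precision 0.92, recall 0.52) recomputes to roughly 0.66 against a reported 0.67.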