Table 2 Performance metrics of different fine-tuned language models

From: Leveraging pre-trained language models for mining microbiome-disease relationships

 

| Model | Accuracy | F1 score | Precision | Recall |
| --- | --- | --- | --- | --- |
| Baseline | | | | |
| BERE_TL(MDI) | NA | 0.738 | 0.736 | 0.740 |
| Our models (fine-tuned) | | | | |
| Bert-base-uncased | \(0.733 \pm 0.018\) | \(0.731 \pm 0.015\) | \(0.742 \pm 0.020\) | \(0.733 \pm 0.018\) |
| BioMegatron | \(0.778 \pm 0.008\) | \(0.769 \pm 0.013\) | \(0.771 \pm 0.013\) | \(0.778 \pm 0.008\) |
| PubMedBERT | \(0.782 \pm 0.022\) | \(0.778 \pm 0.019\) | \(0.783 \pm 0.021\) | \(0.782 \pm 0.022\) |
| BioClinicalBERT | \(0.729 \pm 0.032\) | \(0.724 \pm 0.029\) | \(0.731 \pm 0.032\) | \(0.729 \pm 0.032\) |
| BioLinkBERT-base | \(\textbf{0.811} \pm \textbf{0.029}\) | \(\textbf{0.804} \pm \textbf{0.036}\) | \(\textbf{0.813} \pm \textbf{0.034}\) | \(\textbf{0.811} \pm \textbf{0.028}\) |
| BioMedLM | \(0.806 \pm 0.028\) | \(0.804 \pm 0.028\) | \(\textbf{0.822} \pm \textbf{0.030}\) | \(0.806 \pm 0.028\) |
| BioGPT | \(0.732 \pm 0.017\) | \(0.726 \pm 0.017\) | \(0.732 \pm 0.025\) | \(0.736 \pm 0.016\) |
| GPT-3 | \(\textbf{0.814} \pm \textbf{0.021}\) | \(\textbf{0.810} \pm \textbf{0.025}\) | \(0.810 \pm 0.021\) | \(\textbf{0.814} \pm \textbf{0.021}\) |

  1. Bold indicates the best performance achieved for each metric
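
The table reports each metric as mean ± standard deviation over repeated fine-tuning runs. For reference, the following is a minimal sketch, not the authors' code, of how such summaries might be computed with scikit-learn; the `run_predictions` input, the weighted averaging scheme, and the dummy labels in the example are assumptions for illustration only.

```python
# Sketch: mean +/- std of accuracy, F1, precision, recall across fine-tuning runs.
# `run_predictions` is a hypothetical list of (y_true, y_pred) pairs, one per run.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def summarize_runs(run_predictions):
    """Return (mean, std) of accuracy, F1, precision, and recall across runs."""
    scores = {"accuracy": [], "f1": [], "precision": [], "recall": []}
    for y_true, y_pred in run_predictions:
        scores["accuracy"].append(accuracy_score(y_true, y_pred))
        # Weighted averaging is an assumption; the paper may use a different scheme.
        p, r, f1, _ = precision_recall_fscore_support(
            y_true, y_pred, average="weighted", zero_division=0
        )
        scores["precision"].append(p)
        scores["recall"].append(r)
        scores["f1"].append(f1)
    return {name: (np.mean(vals), np.std(vals)) for name, vals in scores.items()}

# Example with dummy labels for two runs
runs = [
    (np.array([1, 0, 1, 1, 0]), np.array([1, 0, 0, 1, 0])),
    (np.array([1, 0, 1, 1, 0]), np.array([1, 1, 1, 1, 0])),
]
for metric, (mean, std) in summarize_runs(runs).items():
    print(f"{metric}: {mean:.3f} ± {std:.3f}")
```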