
Table 1 Accuracy (%) of BERT models on three independent datasets. DrugProt contains 2788 articles describing DTIs (positive class) and 1215 articles from the negative class. Medline is an entirely negative dataset, and ChEMBL is an entirely positive dataset of articles containing DTIs

From: Using BERT to identify drug-target interactions from whole PubMed

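| Dataset | Articles | BERT | SciBERT | BioBERT | BioMed-RoBERTa | BlueBERT | Majority voting |
|---|---|---|---|---|---|---|---|
| DrugProt | 4003 | 68 | 65.9 | **71.5** | 71.4 | 67.5 | 69.6 |
| Medline | 55,056 | 99.7 | 98.6 | 75.2 | 99.9 | **100** | **100** |
| ChEMBL | 876 | 89.6 | **93.2** | 91.2 | 83.4 | 88.7 | 90.3 |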

  1. Bold values indicate the top result for each dataset
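
The "Majority voting" column combines the labels predicted by the five models for each article. The table does not say exactly how the vote was taken, so the following is only a minimal sketch, assuming a simple unweighted majority over binary labels (1 = article describes a DTI, 0 = it does not). The function names and the toy predictions are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label chosen by most models for a single article."""
    return Counter(votes).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Percentage of articles whose predicted label equals the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Hypothetical labels for four articles (illustrative values only).
model_predictions = {
    "BERT": [1, 0, 1, 0],
    "SciBERT": [1, 0, 0, 0],
    "BioBERT": [1, 1, 1, 0],
    "BioMed-RoBERTa": [0, 0, 1, 0],
    "BlueBERT": [1, 0, 1, 1],
}
true_labels = [1, 0, 1, 1]

# Regroup the per-model lists into one tuple of five votes per article.
votes_per_article = list(zip(*model_predictions.values()))
voted = [majority_vote(votes) for votes in votes_per_article]

print(voted)
print(accuracy(voted, true_labels))
```

With an odd number of models and two classes, a tie cannot occur, so no tie-breaking rule is needed in this sketch.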