
Table 2 Description of all benchmarked molecular featurizers

From: Learning self-supervised molecular representations for drug–drug interaction prediction

| Featurizer | Type | Dataset name | Dataset size | Feature vector dim. | Architecture |
| --- | --- | --- | --- | --- | --- |
| ECFP | Hashed fingerprint | | | 2048 | |
| ChemBERTa-77M | Pretrained | PubChem | 77 M | 384 | Transformer |
| MOL2VEC | Pretrained | ZINC + ChEMBL | 19.9 M | 300 | Word2vec |
| SMR-DDI | Pretrained | ChEMBL | 200 K | 262 | CNN |
| ChemGPT-1B | Pretrained | PubChem | 10 M | 256 | Transformer |
| MACCKEYS | Structural fingerprint | | | 166 | |
| gin_supervised_edgepred | Pretrained | ChEMBL + ZINC15 | 465 K + 2 M | 300 | Graph |
| ChemGPT-4M | Pretrained | PubChem | 10 M | 128 | Transformer |
| gin_supervised_contextpred | Pretrained | ChEMBL + ZINC15 | 465 K + 2 M | 300 | Graph |
| gin_supervised_masking | Pretrained | ChEMBL + ZINC15 | 465 K + 2 M | 300 | Graph |
