Table 3 IDF scores for the LINGOs in the imaginary sample SMILES strings SMI1 and SMI2. The IDF scores are computed assuming that SMI1 and SMI2 are compounds in the enzyme data set, which contains 445 compounds in total (N = 445); df is the number of compounds whose SMILES string contains the LINGO.

From: A comparative study of SMILES-based compound similarity functions for drug-target interaction prediction

| LINGO | IDF = log10(N/df) |
|-------|-------------------|
| OC(O  | log10(445/2)      |
| C(O)  | log10(445/113)    |
| (O)=  | log10(445/105)    |
| O)=O  | log10(445/143)    |
| CCCC  | log10(445/61)     |
| CCC(  | log10(445/49)     |
| CC(O  | log10(445/36)     |
| O)=C  | log10(445/4)      |
| )=C0  | log10(445/5)      |
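
The IDF column can be reproduced with a short script. This is a sketch under stated assumptions, not the authors' code. The df counts are copied from the table. The names `extract_lingos`, `document_frequencies`, `idf`, and `IDF_SCORES` are hypothetical. The LINGO extraction is also simplified: q = 4, every digit is treated as a ring-closure label and replaced with 0, and bracket atoms and multi-character element symbols are not handled.

```python
import math
import re
from collections import Counter

# Total number of compounds in the enzyme data set (from the caption).
N_COMPOUNDS = 445

# Document frequencies copied from the table: how many of the 445
# compounds contain each LINGO at least once.
DOC_FREQ = {
    "OC(O": 2,
    "C(O)": 113,
    "(O)=": 105,
    "O)=O": 143,
    "CCCC": 61,
    "CCC(": 49,
    "CC(O": 36,
    "O)=C": 4,
    ")=C0": 5,
}


def extract_lingos(smiles: str, q: int = 4) -> list[str]:
    """Return all overlapping substrings of length q (hypothetical helper).

    Simplifying assumption: every digit is treated as a ring-closure
    label and replaced with '0'; bracket atoms are not handled.
    """
    normalized = re.sub(r"\d", "0", smiles)
    return [normalized[i:i + q] for i in range(len(normalized) - q + 1)]


def document_frequencies(corpus: list[str], q: int = 4) -> Counter:
    """Count, for each LINGO, how many SMILES strings contain it."""
    df = Counter()
    for smiles in corpus:
        # A set, so a LINGO repeated within one compound counts once.
        df.update(set(extract_lingos(smiles, q)))
    return df


def idf(n_docs: int, df: int) -> float:
    """Inverse document frequency, log10(N / df)."""
    if df <= 0:
        raise ValueError("df must be a positive count")
    return math.log10(n_docs / df)


IDF_SCORES = {lingo: idf(N_COMPOUNDS, df) for lingo, df in DOC_FREQ.items()}

if __name__ == "__main__":
    # Print LINGOs from rarest (highest IDF) to most common (lowest IDF).
    for lingo, score in sorted(IDF_SCORES.items(), key=lambda kv: -kv[1]):
        print(f"{lingo}\t{score:.3f}")
```

Rarer LINGOs receive larger weights. `OC(O`, which occurs in only 2 of the 445 compounds, gets the highest score (about 2.35). `O)=O`, which occurs in 143 compounds, gets the lowest (about 0.49).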