Table 2 List of binary similarity measures included in SimCAL

From: SimCAL: a flexible tool to compute biochemical reaction similarity

| S. No. | Measure | Definition | Range |
|---|---|---|---|
| 1. | Tanimoto | \( \frac{a}{\left(a+b\right)+\left(a+c\right)-a} \) | [0, 1] |
| 2. | Dice | \( \frac{2a}{2a+b+c} \) | [0, 1] |
| 3. | Ochiai | \( \frac{a}{\sqrt{\left(a+b\right)\left(a+c\right)}} \) | [0, 1] |
| 4. | Simpson | \( \frac{a}{\min\left(a+b,\,a+c\right)} \) | [0, 1] |
| 5. | Russell and Rao | \( \frac{a}{a+b+c+d} \) | [0, 1] |
| 6. | Sokal and Michener | \( \frac{a+d}{a+b+c+d} \) | [0, 1] |
| 7. | Faith | \( \frac{a+0.5d}{a+b+c+d} \) | [0, 1] |
| 8. | Gower and Legendre | \( \frac{a+d}{a+0.5\left(b+c\right)+d} \) | [0, 1] |
| 9. | Rogers and Tanimoto | \( \frac{a+d}{a+2\left(b+c\right)+d} \) | [0, 1] |

  1. The measures follow [45]. a is the number of bits set in the fingerprints of both molecules. b is the number of bits set in the fingerprint of the first molecule but not in that of the second. c is the number of bits set in the fingerprint of the second molecule but not in that of the first. d is the number of bits unset in the fingerprints of both molecules. The size of the fingerprint is n = a + b + c + d.
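
The definitions above can be illustrated with a short Python sketch that derives a, b, c and d from two bit vectors and evaluates the nine measures. This is a minimal illustration, not SimCAL's own code: the function names `bit_counts`, `safe_ratio` and `binary_similarities` are hypothetical, fingerprints are assumed to be equal-length sequences of 0/1 values, and returning 0.0 when a denominator is zero is an assumption made here for convenience. Tanimoto is written in its simplified form a / (a + b + c).

```python
from math import sqrt


def bit_counts(fp1, fp2):
    """Return (a, b, c, d) for two equal-length sequences of 0/1 values."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    a = sum(1 for x, y in zip(fp1, fp2) if x and y)        # set in both
    b = sum(1 for x, y in zip(fp1, fp2) if x and not y)    # set only in first
    c = sum(1 for x, y in zip(fp1, fp2) if not x and y)    # set only in second
    d = len(fp1) - a - b - c                               # unset in both
    return a, b, c, d


def safe_ratio(num, den):
    """Divide, returning 0.0 for a zero denominator (an assumption of this sketch)."""
    return num / den if den else 0.0


def binary_similarities(fp1, fp2):
    """Evaluate the nine binary similarity measures listed in the table."""
    a, b, c, d = bit_counts(fp1, fp2)
    n = a + b + c + d
    return {
        "tanimoto": safe_ratio(a, a + b + c),
        "dice": safe_ratio(2 * a, 2 * a + b + c),
        "ochiai": safe_ratio(a, sqrt((a + b) * (a + c))),
        "simpson": safe_ratio(a, min(a + b, a + c)),
        "russell_rao": safe_ratio(a, n),
        "sokal_michener": safe_ratio(a + d, n),
        "faith": safe_ratio(a + 0.5 * d, n),
        "gower_legendre": safe_ratio(a + d, a + 0.5 * (b + c) + d),
        "rogers_tanimoto": safe_ratio(a + d, a + 2 * (b + c) + d),
    }


# Toy 8-bit fingerprints (made-up data): a = 3, b = 1, c = 1, d = 3
fp_first = [1, 1, 0, 1, 0, 0, 1, 0]
fp_second = [1, 0, 0, 1, 1, 0, 1, 0]
scores = binary_similarities(fp_first, fp_second)
```

For these toy vectors the measures that ignore d (Tanimoto, Dice, Ochiai, Simpson) depend only on the three shared and two mismatched set bits, while the remaining measures also reward the three positions that are unset in both fingerprints.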