Table 2 List of binary similarity measures included in SimCAL

From: SimCAL: a flexible tool to compute biochemical reaction similarity

S. No. Measure Definition Range
1. Tanimoto \( \frac{a}{\left(a+b\right)+\left(a+c\right)-a} \) [0-1]
2. Dice \( \frac{2a}{2a+b+c} \) [0-1]
3. Ochiai \( \frac{a}{\sqrt{\left(a+b\right)\left(a+c\right)}} \) [0-1]
4. Simpson \( \frac{a}{\min \left(a+b,\ a+c\right)} \) [0-1]
5. Russell and Rao \( \frac{a}{a+b+c+d} \) [0-1]
6. Sokal and Michener \( \frac{a+d}{a+b+c+d} \) [0-1]
7. Faith \( \frac{a+0.5d}{a+b+c+d} \) [0-1]
8. Gower and Legendre \( \frac{a+d}{a+0.5\left(b+c\right)+d} \) [0-1]
9. Rogers and Tanimoto \( \frac{a+d}{a+2\left(b+c\right)+d} \) [0-1]
The measures correspond to those in [45]. a is the number of bits set in the fingerprints of both molecules. b is the number of bits set in the fingerprint of the first molecule but not in that of the second. c is the number of bits set in the fingerprint of the second molecule but not in that of the first. d is the number of bits unset in the fingerprints of both molecules. The size of the fingerprint is n = a + b + c + d.
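
As an illustration of how the counts a, b, c and d feed into the nine measures, the following minimal Python sketch computes them for two fingerprints given as equal-length sequences of 0/1 values. It is not SimCAL's own code: the function names `bit_counts` and `similarities` are hypothetical, Tanimoto is written in its standard simplified form a / (a + b + c), and returning 0.0 when a denominator is zero is an assumption made here for convenience.

```python
from math import sqrt


def bit_counts(fp1, fp2):
    """Return (a, b, c, d) for two equal-length sequences of 0/1 bits."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    a = sum(1 for x, y in zip(fp1, fp2) if x and y)      # set in both
    b = sum(1 for x, y in zip(fp1, fp2) if x and not y)  # set only in first
    c = sum(1 for x, y in zip(fp1, fp2) if y and not x)  # set only in second
    d = len(fp1) - a - b - c                             # unset in both
    return a, b, c, d


def similarities(fp1, fp2):
    """Compute the nine binary similarity measures listed in the table."""
    a, b, c, d = bit_counts(fp1, fp2)
    n = a + b + c + d  # fingerprint size

    def ratio(num, den):
        # Assumption: report 0.0 when the denominator is zero
        # (e.g. two empty fingerprints), rather than raising an error.
        return num / den if den else 0.0

    return {
        "tanimoto": ratio(a, a + b + c),
        "dice": ratio(2 * a, 2 * a + b + c),
        "ochiai": ratio(a, sqrt((a + b) * (a + c))),
        "simpson": ratio(a, min(a + b, a + c)),
        "russell_rao": ratio(a, n),
        "sokal_michener": ratio(a + d, n),
        "faith": ratio(a + 0.5 * d, n),
        "gower_legendre": ratio(a + d, a + 0.5 * (b + c) + d),
        "rogers_tanimoto": ratio(a + d, a + 2 * (b + c) + d),
    }


if __name__ == "__main__":
    # Toy 8-bit fingerprints: a = 3, b = 1, c = 1, d = 3
    fp1 = [1, 1, 0, 1, 0, 0, 1, 0]
    fp2 = [1, 0, 0, 1, 1, 0, 1, 0]
    for name, value in similarities(fp1, fp2).items():
        print(f"{name}: {value:.4f}")
```

For the toy fingerprints above, Tanimoto evaluates to 3/5 = 0.6 while Sokal and Michener gives 6/8 = 0.75. The difference arises because the latter also counts the three positions unset in both fingerprints (d) as agreement.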