
Table 1 List of 79 binary similarity and dissimilarity measures

From: Finding an appropriate equation to measure similarity between binary vectors: case studies on Indonesian and Japanese herbal medicines

Eq. IDs

Equations

References

Note

1

\( {S}_{Jaccard}=\frac{a}{a+b+c} \)

[1, 20, 21, 23, 24, 29, 40–43, 45–50, 55]

 

2

\( {S}_{Dice-2}=\frac{a}{2a+b+c} \)

[20, 21, 47, 48]

 

3

\( {S}_{Dice-1/ Czekanowski}=\frac{2a}{2a+b+c} \)

[3, 23, 24, 29, 40–42, 44–47, 49, 50, 55]

***

4

\( {S}_{3W- Jaccard}=\frac{3a}{3a+b+c} \)

[23, 24, 43, 47]

 

5

\( {S}_{Nei\&Li}=\frac{2a}{\left(a+b\right)+\left(a+c\right)} \)

[23, 40, 54]

*

6

\( {S}_{Sokal\& Sneath-1}=\frac{a}{a+2b+2c} \)

[1, 23, 24, 40, 45, 47, 55]

 

7

\( {S}_{Sokal\& Michener}=\frac{a+d}{a+b+c+d} \)

[1, 3, 20, 21, 23, 24, 29, 40–42, 45, 46, 48–50]

 

8

\( {S}_{Sokal\& Sneath-2}=\frac{2\left(a+d\right)}{2a+b+c+2d} \)

[1, 23, 24, 40, 45, 49, 50, 55]

 

9

\( {S}_{Roger\& Tanimoto}=\frac{a+d}{a+2\left(b+c\right)+d} \)

[20, 21, 23, 24, 29, 40, 41, 45, 46, 48–50, 55, 56]

 

10

\( {S}_{Faith}=\frac{a+0.5d}{a+b+c+d} \)

[23, 24, 56, 57]

 

11

\( {S}_{Gower\& Legendre}=\frac{a+d}{a+0.5\left(b+c\right)+d} \)

[23, 24, 58]

*

12

\( {S}_{Intersection}=a \)

[23, 47]

 

13

\( {S}_{Innerproduct}=a+d \)

[23]

***

14

\( {S}_{Russell\&Rao}=\frac{a}{a+b+c+d} \)

[1, 3, 20, 21, 23, 24, 29, 40, 41, 45, 47–50, 55, 56]

***

15

\( {D}_{Hamming}=b+c \)

[23, 48, 59]

 

16

\( {D}_{Euclid}=\sqrt{b+c} \)

[23]

 

17

\( {D}_{Squared- euclid}=\sqrt{{\left(b+c\right)}^2} \)

[23, 60]

*

18

\( {D}_{Canberra}={\left(b+c\right)}^{\frac{2}{2}} \)

[23]

*

19

\( {D}_{Manhattan}=b+c \)

[23]

*

20

\( {D}_{Mean- Manhattan}=\frac{b+c}{a+b+c+d} \)

[23, 55]

***

21

\( {D}_{Cityblock}=b+c \)

[23]

*

22

\( {D}_{Minkowski}={\left(b+c\right)}^{\frac{1}{1}} \)

[23]

*

23

\( {D}_{Vari}=\frac{b+c}{4\left(a+b+c+d\right)} \)

[23, 61]

***

24

\( {D}_{SizeDifference}=\frac{{\left(b+c\right)}^2}{{\left(a+b+c+d\right)}^2} \)

[23]

 

25

\( {D}_{ShapeDifference}=\frac{n\left(b+c\right)-{\left(b-c\right)}^2}{{\left(a+b+c+d\right)}^2} \)

[23]

 

26

\( {D}_{PatternDifference}=\frac{4bc}{{\left(a+b+c+d\right)}^2} \)

[23]

 

27

\( {D}_{Lance\& Williams}=\frac{b+c}{2a+b+c} \)

[23, 61]

 

28

\( {D}_{Bray\& Curtis}=\frac{b+c}{2a+b+c} \)

[23]

*

29

\( {D}_{Hellinger}=2\sqrt{\left(1-\frac{a}{\sqrt{\left(a+b\right)\left(a+c\right)}}\right)} \)

[23]

 

30

\( {D}_{Chord}=\sqrt{2\left(1-\frac{a}{\sqrt{\left(a+b\right)\left(a+c\right)}}\right)} \)

[23]

***

31

\( {S}_{Cosine}=\frac{a}{\sqrt{\left(a+b\right)\left(a+c\right)}} \)

[24, 55]

 

32

\( {S}_{Gilbert\& Wells}= \log a- \log n- \log \left(\frac{a+b}{n}\right)- \log \left(\frac{a+c}{n}\right) \)

[23, 45]

**

33

\( {S}_{Ochiai-1}=\frac{a}{\sqrt{\left(a+b\right)\left(a+c\right)}} \)

[23, 24, 29, 40, 41, 49, 55, 56]

*

34

\( {S}_{Forbes-1}=\frac{na}{\left(a+b\right)\left(a+c\right)} \)

[23, 24, 40, 45, 47, 55]

 

35

\( {S}_{Fossum}=\frac{n{\left(a-0.5\right)}^2}{\left(a+b\right)\left(a+c\right)} \)

[23, 24, 55]

 

36

\( {S}_{Sorgenfrei}=\frac{a^2}{\left(a+b\right)\left(a+c\right)} \)

[23, 24, 40, 45]

 

37

\( {S}_{Mountford}=\frac{a}{0.5\left( ab+ac\right)+bc} \)

[23, 24, 40, 45]

**

38

\( {S}_{Otsuka}=\frac{a}{{\left(\left(a+b\right)\left(a+c\right)\right)}^{0.5}} \)

[23, 46]

*

39

\( {S}_{McConnaughey}=\frac{a^2-bc}{\left(a+b\right)\left(a+c\right)} \)

[23, 40, 45, 55]

 

40

\( {S}_{Tarwid}=\frac{na-\left(a+b\right)\left(a+c\right)}{na+\left(a+b\right)\left(a+c\right)} \)

[23, 45]

 

41

\( {S}_{Kulczynski-2}=\frac{\frac{a}{2}\left(2a+b+c\right)}{\left(a+b\right)\left(a+c\right)} \)

[23, 40, 45, 46, 49, 55]

***

42

\( {S}_{Driver\& Kroeber}=\frac{a}{2}\left(\frac{1}{a+b}+\frac{1}{a+c}\right) \)

[23, 40, 45]

***

43

\( {S}_{Johnson}=\frac{a}{a+b}+\frac{a}{a+c} \)

[23, 24, 40, 45, 51]

***

44

\( {S}_{Dennis}=\frac{ad-bc}{\sqrt{n\left(a+b\right)\left(a+c\right)}} \)

[23, 24, 55]

 

45

\( {S}_{Simpson}=\frac{a}{ \min \left(a+b,a+c\right)} \)

[23, 24, 40, 45, 55]

 

46

\( {S}_{Braun\& Blanquet}=\frac{a}{ \max \left(a+b,a+c\right)} \)

[23, 24, 40, 45, 47]

 

47

\( {S}_{Fager\& McGowan}=\frac{a}{\sqrt{\left(a+b\right)\left(a+c\right)}}-\frac{ \max \left(a+b,a+c\right)}{2} \)

[23, 45]

 

48

\( {S}_{Forbes-2}=\frac{na-\left(a+b\right)\left(a+c\right)}{n \min \left(a+b,a+c\right)-\left(a+b\right)\left(a+c\right)} \)

[23, 45]

 

49

\( {S}_{Sokal\& Sneath-4}=\frac{\frac{a}{\left(a+b\right)}+\frac{a}{\left(a+c\right)}+\frac{d}{\left(b+d\right)}+\frac{d}{\left(c+d\right)}}{4} \)

[1, 24, 40, 45]

 

50

\( {S}_{Gower}=\frac{a+d}{\sqrt{\left(a+b\right)\left(a+c\right)\left(b+d\right)\left(c+d\right)}} \)

[23]

 

51

\( {S}_{Pearson-1}={\chi}^2=\frac{n{\left( ad-bc\right)}^2}{\left(a+b\right)\left(a+c\right)\left(c+d\right)\left(b+d\right)} \)

[23, 40, 45]

 

52

\( {S}_{Pearson-2}={\left(\frac{\chi^2}{n+{\chi}^2}\right)}^{\frac{1}{2}} \)

[23, 45]

 

53

\( {S}_{Pearson-3}={\left(\frac{\rho }{n+\rho}\right)}^{\frac{1}{2}} \)

\( \mathrm{where}\kern0.75em \rho =\frac{ad-bc}{\sqrt{\left(a+b\right)\left(a+c\right)\left(b+d\right)\left(c+d\right)}} \)

[23]

**

54

\( {S}_{Pearson\& Heron-1}=\frac{ad-bc}{\sqrt{\left(a+b\right)\left(a+c\right)\left(b+d\right)\left(c+d\right)}} \)

[20, 21, 23, 24, 40, 45]

 

55

\( {S}_{Pearson\& Heron-2}= \cos \left(\frac{\pi \sqrt{bc}}{\sqrt{ad}+\sqrt{bc}}\right) \)

[23, 45]

 

56

\( {S}_{Sokal\& Sneath-3}=\frac{a+d}{b+c} \)

[23, 40, 45, 55]

**

57

\( {S}_{Sokal\& Sneath-5}=\frac{ad}{{\left(\left(a+b\right)\left(a+c\right)\left(b+d\right)\left(c+d\right)\right)}^{0.5}} \)

[1, 23, 24, 40, 45]

 

58

\( {S}_{Cole}=\frac{\sqrt{2}\left( ad-bc\right)}{\sqrt{{\left( ad-bc\right)}^2-\left(a+b\right)\left(a+c\right)\left(b+d\right)\left(c+d\right)}} \)

[23, 45]

**

59

\( {S}_{Stiles}={ \log}_{10}\frac{n{\left(\left| ad-bc\right|-\frac{n}{2}\right)}^2}{\left(a+b\right)\left(a+c\right)\left(b+d\right)\left(c+d\right)} \)

[23, 40, 53, 55]

 

60

\( {S}_{Ochiai-2}=\frac{ad}{\sqrt{\left(a+b\right)\left(a+c\right)\left(b+d\right)\left(c+d\right)}} \)

[23, 29, 49]

*

61

\( {S}_{Yuleq}=\frac{ad-bc}{ad+bc} \)

[20, 21, 23, 24, 40, 41, 45, 46, 48, 55]

 

62

\( {D}_{Yuleq}=\frac{2bc}{ad+bc} \)

[23]

 

63

\( {S}_{Yulew}=\frac{\sqrt{ad}-\sqrt{bc}}{\sqrt{ad}+\sqrt{bc}} \)

[3, 23, 24, 40, 45]

 

64

\( {S}_{Kulczynski-1}=\frac{a}{b+c} \)

[3, 20, 21, 23, 45–50, 55]

**

65

\( {S}_{Tanimoto}=\frac{a}{\left(a+b\right)+\left(a+c\right)-a} \)

[1, 23, 24, 55]

*

66

\( {S}_{Dispersion}=\frac{ad-bc}{{\left(a+b+c+d\right)}^2} \)

[23, 24]

 

67

\( {S}_{Hamann}=\frac{\left(a+d\right)-\left(b+c\right)}{a+b+c+d} \)

[3, 23, 40, 45, 46, 49, 50, 55]

***

68

\( {S}_{Michael}=\frac{4\left( ad-bc\right)}{{\left(a+d\right)}^2+{\left(b+c\right)}^2} \)

[23, 24, 40, 45, 52]

 

69

\( {S}_{Goodman\& Kruskal}=\frac{\sigma -{\sigma}^{\hbox{'}}}{2n-{\sigma}^{\hbox{'}}} \)

\( \begin{array}{l}\mathrm{where}\;\sigma = \max \left(a,b\right)+ \max \left(c,d\right)+ \max \left(a,c\right)+ \max \left(b,d\right)\\ {}\kern1.56em {\sigma}^{\hbox{'}}= \max \left(a+c,b+d\right)+ \max \left(a+b,c+d\right)\end{array} \)

[23]

**

70

\( {S}_{Anderberg}=\frac{\sigma -{\sigma}^{\hbox{'}}}{2n} \)

[23]

**

71

\( {S}_{Baroni- Urbani\& Buser-1}=\frac{\sqrt{ad}+a}{\sqrt{ad}+a+b+c} \)

[23, 24, 40, 45, 55, 56, 62]

 

72

\( {S}_{Baroni- Urbani\& Buser-2}=\frac{\sqrt{ad}+a-\left(b+c\right)}{\sqrt{ad}+a+b+c} \)

[23, 24, 40, 45, 62]

***

73

\( {S}_{Peirce}=\frac{ab+bc}{ab+2bc+ cd} \)

[23, 45]

**

74

\( {S}_{Eyraud}=\frac{n^2\left(na-\left(a+b\right)\left(a+c\right)\right)}{\left(a+b\right)\left(a+c\right)\left(b+d\right)\left(c+d\right)} \)

[23]

 

75

\( {S}_{Tarantula}=\frac{\frac{a}{\left(a+b\right)}}{\frac{c}{\left(c+d\right)}}=\frac{a\left(c+d\right)}{c\left(a+b\right)} \)

[23]

**

76

\( {S}_{Ample}=\left|\frac{\frac{a}{\left(a+b\right)}}{\frac{c}{\left(c+d\right)}}\right|=\left|\frac{a\left(c+d\right)}{c\left(a+b\right)}\right| \)

[23]

**

77

\( {S}_{Derived\_ Russell-Rao}=\frac{ \log \left(1+a\right)}{ \log \left(1+n\right)} \)

[1, 24]

 

78

\( {S}_{Derived\_ Jaccard}=\frac{ \log \left(1+a\right)}{ \log \left(1+a+b+c\right)} \)

[1, 24]

 

79

\( {S}_{Var\_ of\_ Correlation}=\frac{ \log \left(1+ ad\right)- \log \left(1+bc\right)}{ \log \left(1+{n}^2/4\right)} \)

[1, 24]

 
  1. S denotes a similarity measure and D a dissimilarity measure. * marks an algebraically redundant equation; ** marks an equation that can produce infinite/NaN coefficients or indeterminate forms; *** marks equations grouped in the same cluster with zero or near-zero distance. n is a constant (n = M = a + b + c + d).
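
As a rough illustration of how the equations in Table 1 are evaluated, the following Python sketch computes a few of them from two binary vectors. The table itself does not define a, b, c and d, so the sketch assumes the usual convention: a counts positions where both vectors are 1, b where only the first is 1, c where only the second is 1, and d where both are 0. The helper names (`otu_counts`, `safe_div`) and the example vectors are hypothetical, and returning NaN for a zero denominator is only one possible way to handle the ** cases, not the authors' procedure.

```python
import math


def otu_counts(x, y):
    """Return (a, b, c, d) for two equal-length 0/1 sequences.

    Assumed convention (not stated in the table):
    a = both 1, b = only x is 1, c = only y is 1, d = both 0.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d


def safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den != 0 else math.nan


def jaccard(a, b, c, d):         # Eq. 1
    return safe_div(a, a + b + c)


def dice_1(a, b, c, d):          # Eq. 3
    return safe_div(2 * a, 2 * a + b + c)


def nei_li(a, b, c, d):          # Eq. 5
    return safe_div(2 * a, (a + b) + (a + c))


def sokal_michener(a, b, c, d):  # Eq. 7
    return safe_div(a + d, a + b + c + d)


def hamming(a, b, c, d):         # Eq. 15
    return b + c


def kulczynski_1(a, b, c, d):    # Eq. 64
    return safe_div(a, b + c)


# Hypothetical example vectors, chosen only for illustration.
x = [1, 1, 0, 1, 0, 0, 1, 0]
y = [1, 0, 0, 1, 1, 0, 1, 0]
counts = otu_counts(x, y)

print("a, b, c, d     :", counts)
print("Jaccard        :", jaccard(*counts))
print("Dice-1         :", dice_1(*counts))
print("Nei&Li         :", nei_li(*counts))
print("Sokal&Michener :", sokal_michener(*counts))
print("Hamming        :", hamming(*counts))
# Identical vectors give b + c = 0, so Eq. 64 has a zero denominator.
print("Kulczynski-1 (x vs x):", kulczynski_1(*otu_counts(x, x)))
```

With these vectors the counts are a = 3, b = 1, c = 1, d = 3. Eq. 3 and Eq. 5 then return the same value, which is consistent with the * flag on Eq. 5, and comparing a vector with itself drives the denominator of Eq. 64 to zero, which is the kind of case the ** flag refers to.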