Table 1 Similarity score calculation

From: Automatic detection of false annotations via binary property clustering

| Source | Annotation | CD63_RABIT (a) | CD68_HUMAN (a) | Frequency (b) | -ln(freq) |
|---|---|---|---|---|---|
| SwissProt | Antigen | 1 | 1 | 0.007130 | 4.9435114 |
| SwissProt | Lysosome | 1 | 1 | 0.001929 | 6.2506136 |
| SwissProt | Glycoprotein | 1 | 1 | 0.094727 | 2.3567562 |
| SwissProt | Transmembrane | 1 | 1 | 0.159770 | 1.8340200 |
| SwissProt | Alternative splicing | 0 | 1 | 0.029281 | - |
| SwissProt | Signal | 0 | 1 | 0.123850 | - |
| SwissProt | Repeat | 0 | 1 | 0.078968 | - |
| InterPro | Serum albumin family | 1 | 0 | 0.000342 | - |
| InterPro | CD9/CD37/CD63 antigen | 1 | 0 | 0.000666 | - |
| InterPro | Lysosome-associated membrane glycoprotein (lamp)/CD68 | 0 | 1 | 0.000123 | - |
| GO | Membrane | 1 | 1 | 0.210869 | 1.5565182 |
| GO | Lysosome | 1 | 1 | 0.002043 | 6.1932038 |
| GO | Vacuole | 1 | 1 | 0.002184 | 6.1267895 |
| GO | Lytic vacuole | 1 | 1 | 0.002043 | 6.1932038 |
| GO | Cell | 1 | 1 | 0.440206 | 0.8205125 |
| GO | Integral membrane protein | 1 | 1 | 0.160874 | 1.8271338 |
| GO | Cytoplasm | 1 | 1 | 0.186569 | 1.6789541 |
| GO | Intracellular | 1 | 1 | 0.307578 | 1.1790266 |
| | Similarity score: | | | | 40.960244 |

The table shows the calculation of the similarity score between two SwissProt proteins: rabbit CD63 antigen (CD63_RABIT) and human Microsialin precursor (CD68_HUMAN). The similarity score is the sum of -ln(freq) over all annotations shared by both proteins. (a) A value of 1 or 0 indicates that the given protein has or does not have the annotation, respectively. (b) The frequency is the fraction of proteins in the database that have the annotation.
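
The calculation described in the note can be reproduced with a short script. The following is a minimal sketch, not the authors' implementation: the row layout, the name `TABLE_1`, and the function `similarity_score` are assumptions made for illustration, and the values are copied from Table 1. Because the frequencies in the table are rounded to six decimals, recomputing -ln(freq) from them should match the published total of 40.960244 only approximately.

```python
import math

# Rows copied from Table 1:
# (source, annotation, has_CD63_RABIT, has_CD68_HUMAN, database frequency)
# The tuple layout is an illustrative assumption, not the paper's data format.
TABLE_1 = [
    ("SwissProt", "Antigen", 1, 1, 0.007130),
    ("SwissProt", "Lysosome", 1, 1, 0.001929),
    ("SwissProt", "Glycoprotein", 1, 1, 0.094727),
    ("SwissProt", "Transmembrane", 1, 1, 0.159770),
    ("SwissProt", "Alternative splicing", 0, 1, 0.029281),
    ("SwissProt", "Signal", 0, 1, 0.123850),
    ("SwissProt", "Repeat", 0, 1, 0.078968),
    ("InterPro", "Serum albumin family", 1, 0, 0.000342),
    ("InterPro", "CD9/CD37/CD63 antigen", 1, 0, 0.000666),
    ("InterPro", "Lysosome-associated membrane glycoprotein (lamp)/CD68", 0, 1, 0.000123),
    ("GO", "Membrane", 1, 1, 0.210869),
    ("GO", "Lysosome", 1, 1, 0.002043),
    ("GO", "Vacuole", 1, 1, 0.002184),
    ("GO", "Lytic vacuole", 1, 1, 0.002043),
    ("GO", "Cell", 1, 1, 0.440206),
    ("GO", "Integral membrane protein", 1, 1, 0.160874),
    ("GO", "Cytoplasm", 1, 1, 0.186569),
    ("GO", "Intracellular", 1, 1, 0.307578),
]


def shared_annotations(rows):
    """Return the rows whose annotation is present in both proteins."""
    return [row for row in rows if row[2] == 1 and row[3] == 1]


def similarity_score(rows):
    """Sum -ln(freq) over the annotations shared by both proteins."""
    return sum(-math.log(freq) for *_, freq in shared_annotations(rows))


score = similarity_score(TABLE_1)
print(f"shared annotations: {len(shared_annotations(TABLE_1))}")
print(f"similarity score:   {score:.4f}")
```

Rare shared annotations (for example Lysosome, with a frequency near 0.002) contribute about 6 to the score each, while very common ones such as Cell contribute less than 1, so the score is dominated by the specific annotations the two proteins have in common.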