How can functional annotations be derived from profiles of phenotypic annotations?

Serrano-Solano, Beatriz; Díaz Ramos, Antonio; Hériché, Jean-Karim; Ranea, Juan A. G.

doi:10.1186/s12859-017-1503-5

BMC Bioinformatics

Table 3 Similarity measures used in this study


Name	Formula
Euclidean similarity	\(s^{2}\left (g_{1}, g_{2}\right)=\frac {1}{1+\left (x_{g1}-x_{g2}\right)\left (x_{g1}-x_{g2}\right)^{'}}\)
Correlation similarity	\(s\left (g_{1},g_{2}\right) = \frac {\left (x_{g1}-\overline {x}_{g1}\right)\left (x_{g2}-\overline {x}_{g2}\right)^{'}} {\sqrt {\left (x_{g1}-\overline {x}_{g1}\right)\left (x_{g1}-\overline {x}_{g1}\right)^{'}} \sqrt {\left (x_{g2}-\overline {x}_{g2}\right)\left (x_{g2}-\overline {x}_{g2}\right)^{'}}}\)
	where \(\overline {x}_{g1}=\frac {1}{n}\sum _{p \in P}x^{p}_{g1}\) and \(\overline {x}_{g2}=\frac {1}{n}\sum _{p \in P}x^{p}_{g2}\)
Cosine similarity	\(s\left (g_{1},g_{2}\right) = \frac {x_{g1}x_{g2}^{'}}{\sqrt {x_{g1}x_{g1}^{'}} \sqrt {x_{g2}x_{g2}^{'}}}\)
Hamming similarity	\(s\left (g_{1},g_{2}\right) = \frac {\left |\left \{p \in P : x^{p}_{g1}=x^{p}_{g2}\right \}\right |}{n}\)
Jaccard similarity	\(s\left (g_{1},g_{2}\right) = 1 - \frac {\left |\left \{p \in P : \left (x^{p}_{g1} \neq x^{p}_{g2}\right)\wedge \left (\left (x^{p}_{g1} \neq 0\right) \vee \left (x^{p}_{g2} \neq 0\right)\right)\right \}\right |} {\left |\left \{p \in P : \left (x^{p}_{g1} \neq 0\right) \vee \left (x^{p}_{g2} \neq 0\right)\right \}\right |}\)
Cohen’s kappa	\(s\left (g_{1},g_{2}\right)=\frac {p_{0}-p_{c}}{1-p_{c}}\) where:
	- \(p_{0}\) is the proportion of terms common to profiles \(g_{1}\) and \(g_{2}\), and
	- \(p_{c}\) is the proportion of terms common to profiles \(g_{1}\) and \(g_{2}\) expected by chance.
TF-IDF similarity	\(s\left (g_{1},g_{2}\right) = \max _{p \in P}\left \{x^{p}_{g1}x^{p}_{g2}IDF(p)\right \}\) where \(IDF(p)=\log \frac {n_{G}}{1+\sum _{g \in G}{x^{p}_{g}}}\)
Resnik’s semantic similarity	\(s\left (t_{1},t_{2}\right) = IC\left (t_{MICA}\right)\) where:
	- the Most Informative Common Ancestor is \(t_{MICA}={argmax}_{t \in S\left (t_{1},t_{2}\right)}{IC(t)}\),
	- the information content (IC) of a term t is \(IC(t)=-\log (p(t))\),
	- the probability of a term t is \(p(t)=\frac {annotations(t)}{totalAnnotations}\), and
	- \(S\left (t_{1},t_{2}\right)\) is the set of common ancestors of \(t_{1}\) and \(t_{2}\).
Lin’s semantic similarity	\(s\left (t_{1},t_{2}\right) = {\frac {{2\cdot IC\left (t_{MICA}\right)}}{IC\left (t_{1}\right)+IC\left (t_{2}\right)}}\)
Schlicker’s semantic similarity	\(s\left (t_{1},t_{2}\right) = \frac {2\cdot IC\left (t_{MICA}\right)}{IC\left (t_{1}\right)+IC\left (t_{2}\right)}\cdot \left (1-p\left (t_{MICA}\right)\right)\)
Jiang’s semantic similarity	\(s\left (t_{1},t_{2}\right) = 1+2\cdot IC\left (t_{MICA}\right)-\left (IC\left (t_{1}\right)+IC\left (t_{2}\right)\right)\)
Pesquita’s semantic similarity	\(s\left (t_{1},t_{2}\right) = \frac {\sum \limits _{t \in S(t_{1},t_{2})}{IC(t)}}{\sum \limits _{t \in P(t_{1},t_{2})}{IC(t)}}\) where:
	- \(P\left (t_{1},t_{2}\right)\) is the set of ancestors of either \(t_{1}\) or \(t_{2}\).
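
The term-level semantic similarities in the table can be sketched on a small ontology. The example below is only an illustration under stated assumptions: the ontology `PARENTS`, the annotation counts `DIRECT`, and all function names are hypothetical; a term is counted among its own ancestors; annotations(t) is taken to include annotations of descendant terms; and the Jiang row is read as 1 + 2·IC(t_MICA) − (IC(t₁) + IC(t₂)).

```python
import math

# Hypothetical toy ontology: each term maps to its set of parent terms.
PARENTS = {
    "root": set(),
    "a": {"root"},
    "b": {"root"},
    "c": {"a"},
    "d": {"a"},
    "e": {"b"},
}
# Hypothetical number of annotations made directly to each term.
DIRECT = {"root": 0, "a": 1, "b": 1, "c": 2, "d": 2, "e": 2}
TOTAL = sum(DIRECT.values())

def ancestors(t):
    """Return t plus all of its ancestors (assumption: t counts as its own ancestor)."""
    seen, stack = {t}, [t]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def p(t):
    """p(t) = annotations(t) / totalAnnotations, counting descendants' annotations."""
    return sum(n for u, n in DIRECT.items() if t in ancestors(u)) / TOTAL

def ic(t):
    """Information content IC(t) = -log(p(t))."""
    return -math.log(p(t))

def mica(t1, t2):
    """Most Informative Common Ancestor: the common ancestor with the highest IC."""
    return max(ancestors(t1) & ancestors(t2), key=ic)

def resnik(t1, t2):
    return ic(mica(t1, t2))

def lin(t1, t2):
    # Undefined (division by zero) when both terms have IC = 0, e.g. root vs root.
    return 2 * ic(mica(t1, t2)) / (ic(t1) + ic(t2))

def schlicker(t1, t2):
    return lin(t1, t2) * (1 - p(mica(t1, t2)))

def jiang(t1, t2):
    # Assumed reading of the Jiang row: 1 minus the Jiang-Conrath distance.
    return 1 + 2 * ic(mica(t1, t2)) - (ic(t1) + ic(t2))

def pesquita(t1, t2):
    common = ancestors(t1) & ancestors(t2)   # S(t1, t2)
    either = ancestors(t1) | ancestors(t2)   # P(t1, t2)
    return sum(ic(t) for t in common) / sum(ic(t) for t in either)

for name, f in [("Resnik", resnik), ("Lin", lin), ("Schlicker", schlicker),
                ("Jiang", jiang), ("Pesquita", pesquita)]:
    print(name, round(f("c", "d"), 4))
```

In this toy ontology the siblings `c` and `d` share the ancestor `a`, so their Resnik similarity is IC(a), whereas `c` and `e` only share the root, whose IC is zero. With natural logarithms and unnormalised IC values, the Jiang expression as read here is not confined to [0, 1].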

G is the full set of genes (\(n_{G}=4198\)) and P is the set of phenotypes (\(n_{P}=36\)). \(x_{g}\) denotes the phenotypic profile of gene g, with \(x^{p}_{g}=1\) if g shows phenotype p and \(x^{p}_{g}=0\) otherwise.
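
As a rough illustration of the profile-based rows of the table, the measures can be computed on binary phenotypic profiles as follows. This is a minimal sketch, not the study's implementation: the four toy profiles in `PROFILES` and all function names are hypothetical, n is assumed to be the number of phenotypes, and the Hamming and Jaccard fractions are read as counts over the phenotypes p in P. Cohen's kappa is left out because the table does not spell out how \(p_{0}\) and \(p_{c}\) are computed.

```python
import math

def euclidean_similarity(x1, x2):
    # Right-hand side of the Euclidean row: 1 / (1 + squared Euclidean distance).
    d2 = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return 1.0 / (1.0 + d2)

def correlation_similarity(x1, x2):
    # Pearson correlation of the two profiles; undefined for constant profiles.
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    num = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    den = (math.sqrt(sum((a - m1) ** 2 for a in x1))
           * math.sqrt(sum((b - m2) ** 2 for b in x2)))
    return num / den

def cosine_similarity(x1, x2):
    # Undefined for an all-zero profile (zero norm).
    num = sum(a * b for a, b in zip(x1, x2))
    den = math.sqrt(sum(a * a for a in x1)) * math.sqrt(sum(b * b for b in x2))
    return num / den

def hamming_similarity(x1, x2):
    # Fraction of phenotypes on which the two profiles agree.
    return sum(a == b for a, b in zip(x1, x2)) / len(x1)

def jaccard_similarity(x1, x2):
    # 1 - (positions that differ) / (positions where at least one profile is non-zero).
    union = sum(1 for a, b in zip(x1, x2) if a != 0 or b != 0)
    differ = sum(1 for a, b in zip(x1, x2) if a != b and (a != 0 or b != 0))
    return 1.0 - differ / union

def idf(profiles, p):
    # IDF(p) = log(n_G / (1 + number of genes showing phenotype p)).
    return math.log(len(profiles) / (1 + sum(x[p] for x in profiles)))

def tfidf_similarity(x1, x2, profiles):
    # Maximum over phenotypes of x1[p] * x2[p] * IDF(p).
    return max(a * b * idf(profiles, p) for p, (a, b) in enumerate(zip(x1, x2)))

# Hypothetical toy data: 4 genes x 5 phenotypes (the study uses 4198 x 36).
PROFILES = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
]
g1, g2 = PROFILES[0], PROFILES[1]

print("Euclidean  ", euclidean_similarity(g1, g2))
print("Correlation", correlation_similarity(g1, g2))
print("Cosine     ", cosine_similarity(g1, g2))
print("Hamming    ", hamming_similarity(g1, g2))
print("Jaccard    ", jaccard_similarity(g1, g2))
print("TF-IDF     ", tfidf_similarity(g1, g2, PROFILES))
```

For the two toy profiles, which agree on three of five phenotypes and share two of the four phenotypes shown by either gene, the Hamming similarity is 0.6 and the Jaccard similarity is 0.5. The TF-IDF measure needs the full set of profiles because IDF(p) depends on how many genes show each phenotype.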


ISSN: 1471-2105
