
Table 6 Evaluation using gene/protein name snippets from MEDLINE abstracts

From: Normalizing biomedical terms by minimizing ambiguity and variability

| Iter. | Dictionary ambiguity | Dictionary variability | Rule | Lookup precision | Lookup recall |
|---|---|---|---|---|---|
| 0 | 5.797 | 12.479 | (convert capital letters to lower case) | 0.782 | 0.582 |
| 1 | 5.807 | 12.161 | ‘-’ → ‘’ | 0.766 | 0.603 |
| 2 | 5.811 | 12.025 | ‘ precursor’ → ‘’ | 0.767 | 0.611 |
| 3 | 5.812 | 11.941 | ‘,’ → ‘’ | 0.767 | 0.611 |
| 4 | 5.812 | 11.907 | ‘inc finger protein’ → ‘nf’ | 0.767 | 0.611 |
| 5 | 5.812 | 11.868 | ‘ isoform 1’ → ‘’ | 0.767 | 0.611 |
| 6 | 5.813 | 11.832 | ‘ isoform 2’ → ‘’ | 0.766 | 0.611 |
| 7 | 5.813 | 11.806 | ‘ isoform a’ → ‘’ | 0.766 | 0.611 |
| 8 | 5.813 | 11.781 | ‘ isoform b’ → ‘’ | 0.766 | 0.611 |
| 9 | 5.813 | 11.748 | ‘ containing protein’ → ‘containing’ | 0.766 | 0.611 |
| 10 | 5.813 | 11.730 | ‘ variant’ → ‘’ | 0.766 | 0.611 |
| : | : | : | : | : | : |
| 21 | 5.815 | 11.597 | ‘nterleukin’ → ‘l’ | 0.767 | 0.613 |
| : | : | : | : | : | : |
| 24 | 5.816 | 11.566 | ‘specific’ → ‘’ | 0.767 | 0.615 |
| : | : | : | : | : | : |
| 33 | 5.816 | 11.450 | ‘protein’ → ‘gene’ | 0.765 | 0.616 |
| 34 | 5.828 | 11.056 | ‘ gene’ → ‘’ | 0.765 | 0.619 |
| : | : | : | : | : | : |
| 38 | 5.829 | 11.016 | ‘ recepto’ → ‘’ | 0.767 | 0.623 |
| : | : | : | : | : | : |
| 44 | 5.830 | 10.970 | ‘ alph’ → ‘’ | 0.765 | 0.625 |
| : | : | : | : | : | : |
| 75 | 5.831 | 10.838 | ‘ i’ → ‘1’ | 0.766 | 0.626 |
| : | : | : | : | : | : |
| 84 | 5.831 | 10.790 | ‘ lpha’ → ‘’ | 0.766 | 0.627 |
| : | : | : | : | : | : |
| 86 | 5.831 | 10.782 | ‘ beta’ → ‘b’ | 0.767 | 0.630 |
| : | : | : | : | : | : |
| 100 | 5.832 | 10.732 | ‘ type’ → ‘’ | 0.767 | 0.633 |
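
To make the rule column concrete, the following is a minimal sketch of how the first few rules in the table might be applied during dictionary lookup. It assumes each rule is a plain substring replacement applied in iteration order after lowercasing, and that both dictionary names and query snippets are normalized the same way; the source does not confirm these details. The function names `normalize` and `lookup`, the identifiers `G1` and `G2`, and the toy dictionary are hypothetical.

```python
# Rules copied from iterations 1-3 of the table (iteration 0 is lowercasing).
# Assumption: each rule is a plain substring replacement, applied in order.
RULES = [
    ("-", ""),
    (" precursor", ""),
    (",", ""),
]


def normalize(term, rules=RULES):
    """Lowercase the term, then apply each (old, new) rule in table order."""
    normalized = term.lower()
    for old, new in rules:
        normalized = normalized.replace(old, new)
    return normalized


def lookup(snippet, dictionary, rules=RULES):
    """Return the set of concept IDs whose normalized names equal the
    normalized snippet. `dictionary` maps a concept ID to a list of names."""
    index = {}
    for concept_id, names in dictionary.items():
        for name in names:
            index.setdefault(normalize(name, rules), set()).add(concept_id)
    return index.get(normalize(snippet, rules), set())


# Hypothetical toy dictionary, for illustration only.
toy_dictionary = {
    "G1": ["IL-2 precursor"],
    "G2": ["TNF, alpha"],
}

print(normalize("IL-2 precursor"))          # il2
print(lookup("TNF alpha", toy_dictionary))  # {'G2'}
```

Under these assumptions, removing surface differences lets more snippets match a dictionary entry, while distinct concepts may collapse onto the same normalized string. This is in line with the table's trend of rising recall, falling variability, and slightly rising ambiguity.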