
Table 3 Discovering rules from a gene/protein dictionary

From: Normalizing biomedical terms by minimizing ambiguity and variability

Iter. Ambiguity (dictionary) Variability (dictionary) Rule Precision (lookup) Recall (lookup)
0 1.004 10.399 (convert capital letters to lower case) 0.975 0.194
1 1.006 10.101 ‘ ’ → ‘-’ 0.967 0.233
2 1.009 9.759 ‘-’ → ‘’ 0.966 0.280
3 1.012 9.318 ‘protein’ → ‘’ 0.958 0.340
4 1.013 9.155 ‘precursor’ → ‘’ 0.959 0.347
5 1.013 9.038 ‘,’ → ‘’ 0.961 0.366
6 1.013 9.006 ‘incfinger’ → ‘nf’ 0.961 0.368
7 1.013 8.979 ‘isoforma’ → ‘’ 0.962 0.375
8 1.013 8.953 ‘isoformb’ → ‘’ 0.962 0.377
9 1.013 8.937 ‘prepro’ → ‘’ 0.962 0.379
10 1.013 8.916 ‘ike’ → ‘’ 0.962 0.380
11 1.013 8.911 ‘rotocadherin’ → ‘cdh’ 0.962 0.380
12 1.013 8.891 ‘(drosophila)’ → ‘’ 0.962 0.383
13 1.013 8.873 ‘variant’ → ‘’ 0.962 0.384
14 1.014 8.867 ‘nterleukin’ → ‘l’ 0.962 0.384
15 1.014 8.857 ‘drosophilahomologof’ → ‘homolog’ 0.963 0.385
16 1.014 8.846 ‘coupledrecepto’ → ‘p’ 0.963 0.387
17 1.014 8.830 ‘(s.cerevisiae)’ → ‘’ 0.963 0.390
: : : : : :
20 1.014 8.805 ‘oncogene’ → ‘’ 0.963 0.393
21 1.014 8.796 ‘ingfinger’ → ‘nf’ 0.963 0.394
22 1.014 8.790 ‘isoformc’ → ‘’ 0.963 0.395
23 1.014 8.783 ‘ransmembrane’ → ‘mem’ 0.963 0.395
24 1.014 8.778 ‘ibosomal’ → ‘p’ 0.964 0.396
25 1.014 8.770 ‘subunit’ → ‘chain’ 0.964 0.397
26 1.014 8.761 ‘s.cerevisiaehomologof’ → ‘’ 0.964 0.398
: : : : : :
34 1.014 8.719 ‘/’ → ‘f’ 0.962 0.400
: : : : : :
37 1.014 8.703 ‘hypothetical’ → ‘’ 0.962 0.402
: : : : : :
41 1.014 8.685 ‘eptid’ → ‘rote’ 0.962 0.403
42 1.014 8.682 ‘eucinerichrepeatcontaining’ → ‘rrc’ 0.962 0.403
43 1.014 8.678 ‘betadefensin’ → ‘defb’ 0.962 0.404
: : : : : :
57 1.014 8.639 ‘molecule’ → ‘antigen’ 0.962 0.405
: : : : : :
62 1.014 8.631 ‘oxonly’ → ‘x’ 0.962 0.406
63 1.014 8.627 ‘hromosome21openreadingframe’ → ‘21orf’ 0.962 0.407
64 1.014 8.625 ‘typeicytoskeletal’ → ‘’ 0.962 0.408
: : : : : :
68 1.014 8.611 ‘member’ → ‘’ 0.962 0.410
69 1.014 8.587 ‘lfactoryreceptorfamily’ → ‘r’ 0.963 0.413
: : : : : :
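
The rules in the table are plain substring replacements applied in the order they were discovered. The following is a minimal sketch of that idea, not the authors' implementation. It uses only a subset of the rules listed above (iterations 0 to 6 and 14). The function names, the toy dictionary and its concept IDs (`G1`, `G2`) are hypothetical. The definitions of ambiguity (average number of concept IDs per normalized term) and variability (average number of distinct normalized terms per concept ID) are an assumption based on how the two columns behave, and should be checked against the paper's own definitions.

```python
from collections import defaultdict

# Subset of the rules in Table 3, kept in discovery order: (iteration, old, new).
# Iteration 0 (lower-casing) is applied separately inside normalize().
RULES = [
    (1, " ", "-"),
    (2, "-", ""),
    (3, "protein", ""),
    (4, "precursor", ""),
    (5, ",", ""),
    (6, "incfinger", "nf"),
    (14, "nterleukin", "l"),
]


def normalize(term):
    """Lower-case the term, then apply each replacement rule in order."""
    s = term.lower()
    for _, old, new in RULES:
        s = s.replace(old, new)
    return s


def ambiguity_and_variability(entries, norm):
    """Compute (ambiguity, variability) for (concept_id, term) pairs.

    Assumed definitions (not taken from the table itself):
      ambiguity   = mean number of concept IDs per distinct normalized term
      variability = mean number of distinct normalized terms per concept ID
    """
    ids_per_term = defaultdict(set)
    terms_per_id = defaultdict(set)
    for concept_id, term in entries:
        key = norm(term)
        ids_per_term[key].add(concept_id)
        terms_per_id[concept_id].add(key)
    ambiguity = sum(len(v) for v in ids_per_term.values()) / len(ids_per_term)
    variability = sum(len(v) for v in terms_per_id.values()) / len(terms_per_id)
    return ambiguity, variability


# Hypothetical toy dictionary; IDs and entries are made up for illustration.
TOY_DICTIONARY = [
    ("G1", "Zinc finger protein 10"),
    ("G1", "ZNF10"),
    ("G2", "Interleukin-2 precursor"),
    ("G2", "IL-2"),
    ("G2", "IL2"),
]

if __name__ == "__main__":
    for _, term in TOY_DICTIONARY:
        print(term, "->", normalize(term))
    print("raw:       ", ambiguity_and_variability(TOY_DICTIONARY, lambda t: t))
    print("normalized:", ambiguity_and_variability(TOY_DICTIONARY, normalize))
```

On this toy input, the long and short forms of each name collapse to the same string (`znf10`, `il2`), so variability falls while ambiguity stays the same. This is the same direction of change the table reports for the real dictionary, where recall rises as variability drops.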