Table 3 Discovering rules from a gene/protein dictionary

From: Normalizing biomedical terms by minimizing ambiguity and variability

 

       Dictionary                                                          Lookup performance
Iter.  Ambiguity  Variability  Rule                                        Precision  Recall
-------------------------------------------------------------------------------------------
0      1.004      10.399       (convert capital letters to lower case)     0.975      0.194
1      1.006      10.101       ‘ ’ → ‘-’                                   0.967      0.233
2      1.009      9.759        ‘-’ → ‘’                                    0.966      0.280
3      1.012      9.318        ‘protein’ → ‘’                              0.958      0.340
4      1.013      9.155        ‘precursor’ → ‘’                            0.959      0.347
5      1.013      9.038        ‘,’ → ‘’                                    0.961      0.366
6      1.013      9.006        ‘incfinger’ → ‘nf’                          0.961      0.368
7      1.013      8.979        ‘isoforma’ → ‘’                             0.962      0.375
8      1.013      8.953        ‘isoformb’ → ‘’                             0.962      0.377
9      1.013      8.937        ‘prepro’ → ‘’                               0.962      0.379
10     1.013      8.916        ‘ike’ → ‘’                                  0.962      0.380
11     1.013      8.911        ‘rotocadherin’ → ‘cdh’                      0.962      0.380
12     1.013      8.891        ‘(drosophila)’ → ‘’                         0.962      0.383
13     1.013      8.873        ‘variant’ → ‘’                              0.962      0.384
14     1.014      8.867        ‘nterleukin’ → ‘l’                          0.962      0.384
15     1.014      8.857        ‘drosophilahomologof’ → ‘homolog’           0.963      0.385
16     1.014      8.846        ‘coupledrecepto’ → ‘p’                      0.963      0.387
17     1.014      8.830        ‘(s.cerevisiae)’ → ‘’                       0.963      0.390
:      :          :            :                                           :          :
20     1.014      8.805        ‘oncogene’ → ‘’                             0.963      0.393
21     1.014      8.796        ‘ingfinger’ → ‘nf’                          0.963      0.394
22     1.014      8.790        ‘isoformc’ → ‘’                             0.963      0.395
23     1.014      8.783        ‘ransmembrane’ → ‘mem’                      0.963      0.395
24     1.014      8.778        ‘ibosomal’ → ‘p’                            0.964      0.396
25     1.014      8.770        ‘subunit’ → ‘chain’                         0.964      0.397
26     1.014      8.761        ‘s.cerevisiaehomologof’ → ‘’                0.964      0.398
:      :          :            :                                           :          :
34     1.014      8.719        ‘/’ → ‘f’                                   0.962      0.400
:      :          :            :                                           :          :
37     1.014      8.703        ‘hypothetical’ → ‘’                         0.962      0.402
:      :          :            :                                           :          :
41     1.014      8.685        ‘eptid’ → ‘rote’                            0.962      0.403
42     1.014      8.682        ‘eucinerichrepeatcontaining’ → ‘rrc’        0.962      0.403
43     1.014      8.678        ‘betadefensin’ → ‘defb’                     0.962      0.404
:      :          :            :                                           :          :
57     1.014      8.639        ‘molecule’ → ‘antigen’                      0.962      0.405
:      :          :            :                                           :          :
62     1.014      8.631        ‘oxonly’ → ‘x’                              0.962      0.406
63     1.014      8.627        ‘hromosome21openreadingframe’ → ‘21orf’     0.962      0.407
64     1.014      8.625        ‘typeicytoskeletal’ → ‘’                    0.962      0.408
:      :          :            :                                           :          :
68     1.014      8.611        ‘member’ → ‘’                               0.962      0.410
69     1.014      8.587        ‘lfactoryreceptorfamily’ → ‘r’              0.963      0.413
:      :          :            :                                           :          :
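To make the rule notation concrete, here is a minimal sketch of how the first few rules in Table 3 could be applied to a term. It assumes each rule is a plain substring replacement and that rules are applied in the order of their iteration number, after lower-casing (iteration 0). The function name `normalize` and the sample terms are illustrative assumptions, not taken from the paper.

```python
# Rules copied from iterations 1-5 of Table 3, as (pattern, replacement) pairs.
# Iteration 0 (lower-casing) is handled separately inside normalize().
RULES = [
    (" ", "-"),         # iteration 1
    ("-", ""),          # iteration 2
    ("protein", ""),    # iteration 3
    ("precursor", ""),  # iteration 4
    (",", ""),          # iteration 5
]


def normalize(term, rules=RULES):
    """Lower-case the term, then apply each rule in order.

    Assumption: every rule is a literal substring replacement applied to
    all occurrences, and later rules see the output of earlier ones.
    """
    result = term.lower()
    for pattern, replacement in rules:
        result = result.replace(pattern, replacement)
    return result


if __name__ == "__main__":
    # Hypothetical sample terms chosen to exercise the rules above.
    for term in ["NF-kappa B", "NF kappaB", "Tau protein", "Insulin, precursor"]:
        print(term, "->", normalize(term))
```

Under these assumptions, spelling variants such as "NF-kappa B" and "NF kappaB" collapse to the same normalized string, which is the kind of effect the falling variability and rising recall columns would reflect.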