Table 4 Error analysis

From: How to make the most of NE dictionaries in statistical NER

 

False positives

No. | Cause                      | Correct extraction  | Identified term
1   | lexicon                    | -                   | protein, binding sites
2   | prefix word                | trans-acting factor | common trans-acting factor
3   | unknown word               | -                   | ATTTGCAT
4   | sequential labelling error | -                   | additional proteins
5   | test set error             | -                   | Estradiol receptors

 

False negatives

No. | Cause                                 | Correct extraction                        | Identified term
1   | anaphoric                             | (the) receptor, (the) binding sites       | -
2   | coordination (and, or)                | transcription factors NF-kappa B and AP-1 | transcription factors NF-kappa B
3   | prefix word                           | activation protein-1                      | protein-1
    |                                       | catfish STAT                              | STAT
4   | postfix word                          | nuclear factor kappa B complex            | nuclear factor kappa B
5   | plural                                | protein tyrosine kinase(s)                | protein tyrosine kinase
6   | family name, binding site, and domain | T3 binding sites                          | -
    |                                       | residues 639–656                          | -
7   | sequential labelling error            | PCNA                                      | -
    |                                       | Chloramphenicol acetyltransferase         | -
8   | test set error                        | superfamily member                        | -

Error analysis of the results of the dictionary-based statistical approach.