Table 4 Error analysis

From: How to make the most of NE dictionaries in statistical NER

False positives
No. | Cause | Correct extraction | Identified term
1 | lexicon | - | protein, binding sites
2 | prefix word | trans-acting factor | common trans-acting factor
3 | unknown word | - | ATTTGCAT
4 | sequential labelling error | - | additional proteins
5 | test set error | - | Estradiol receptors
False negatives
No. | Cause | Correct extraction | Identified term
1 | anaphoric | (the) receptor, (the) binding sites | -
2 | coordination (and, or) | transcription factors NF-kappa B and AP-1 | transcription factors NF-kappa B
3 | prefix word | activation protein-1 | protein-1
  |  | catfish STAT | STAT
4 | postfix word | nuclear factor kappa B complex | nuclear factor kappa B
5 | plural | protein tyrosine kinase(s) | protein tyrosine kinase
6 | family name, binding site, and domain | T3 binding sites | -
  |  | residues 639–656 | -
7 | sequential labelling error | PCNA | -
  |  | Chloramphenicol acetyltransferase | -
8 | test set error | superfamily member | -
Error analysis of the results of the dictionary-based statistical approach.
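
The prefix-word and postfix-word rows above differ from the correct extraction only at a term boundary. As a rough illustration, and not the authors' procedure, the following sketch labels such boundary mismatches by comparing whitespace-separated tokens. The function name `boundary_error` and its label strings are hypothetical, introduced here only for this example.

```python
def boundary_error(correct: str, identified: str) -> str:
    """Label how an identified term differs from the correct extraction.

    Hypothetical helper: compares whitespace tokens only, so it cannot
    tell a coordination error from a missed postfix word.
    """
    gold = correct.split()
    pred = identified.split()
    if gold == pred:
        return "match"
    if not pred:
        return "not extracted"
    if not gold:
        return "spurious"
    # Identified term is the tail of the correct term: leading word(s) lost.
    if len(pred) < len(gold) and gold[-len(pred):] == pred:
        return "prefix word missed"
    # Identified term is the head of the correct term: trailing word(s) lost.
    if len(pred) < len(gold) and gold[:len(pred)] == pred:
        return "postfix word missed"
    # Correct term is the tail of the identified term: extra leading word(s).
    if len(gold) < len(pred) and pred[-len(gold):] == gold:
        return "prefix word added"
    return "other"


# Term pairs taken from the table rows above.
print(boundary_error("activation protein-1", "protein-1"))
print(boundary_error("nuclear factor kappa B complex", "nuclear factor kappa B"))
print(boundary_error("trans-acting factor", "common trans-acting factor"))
```

Because the comparison is purely token based, causes such as anaphora, plural forms, or lexicon gaps would still need manual inspection.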