Table 3 List of boundary words

From: Investigating heterogeneous protein annotations toward cross-corpora utilization

Category     Word          AIMed               GENIA               GENETAG
                            N_a  N_n  E_b       N_a  N_n  E_b       N_a  N_n  E_b
Adjective    constitutive     0    0  0.0000     12   11  0.9986      2    2  1.0000
             endogenous       0    0  0.0000     22   11  0.9183      9    9  1.0000
             exogenous        0    0  0.0000      9   16  0.9427      2    3  0.9710
             inducible        0    0  0.0000     18   17  0.9994      0    0  0.0000
             low              0    0  0.0000     14   11  0.9896      2    3  0.9710
             major            0    0  0.0000     25   15  0.9544      1    5  0.6500
             putative         0    0  0.0000     15   15  1.0000      0    0  0.0000
             recombinant      1    8  0.5033     36   24  0.9710     26    2  0.3712
             soluble          1   10  0.4395     14   15  0.9991      1    4  0.7219
Noun before  factor           0    0  0.0000      5   26  0.6374     17   15  0.9972
             plasma           0    0  0.0000     13    1  0.3712     17   12  0.9784
             protein         12   34  0.8281    159   18  0.4743     53   10  0.6313
Noun after   form             0    0  0.0000     21   14  0.9710      0    0  0.0000
             pathway          0    0  0.0000      0    0  0.0000      8   10  0.9911
             protein         40   17  0.8791    794   14  0.1262    241    2  0.0688
Here, "Noun before" indicates a noun occurring before an entity as a modifier, and "Noun after" indicates a noun occurring after an entity as a head. N_a is the number of annotated occurrences, and N_n is the number of un-annotated occurrences.
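
The footnote does not define the third column, E_b. The reported values are consistent with the binary Shannon entropy (base 2) of the annotated versus un-annotated split, with words that never occur reported as 0. The sketch below rests on that assumption and is not the authors' stated procedure; the function name `boundary_entropy` is hypothetical.

```python
import math


def boundary_entropy(n_annotated: int, n_unannotated: int) -> float:
    """Binary Shannon entropy (base 2) of the annotated/un-annotated split.

    Assumption: E_b in the table is computed this way; a word with no
    occurrences in a corpus is given an entropy of 0.
    """
    total = n_annotated + n_unannotated
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in (n_annotated, n_unannotated):
        if count > 0:  # a zero count contributes nothing to the entropy
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Spot checks against table rows: (word, corpus, N_a, N_n, reported E_b).
rows = [
    ("constitutive", "GENIA", 12, 11, 0.9986),
    ("recombinant", "AIMed", 1, 8, 0.5033),
    ("protein (after)", "GENETAG", 241, 2, 0.0688),
]
for word, corpus, n_a, n_n, reported in rows:
    computed = boundary_entropy(n_a, n_n)
    print(f"{word:16s} {corpus:8s} computed={computed:.4f} reported={reported:.4f}")
```

Under this reading, a value near 1 means the word is included in the annotated entity about as often as it is left out, while a value near 0 means the corpus treats the word consistently one way or the other.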