
Table 15 Surface and parsing features generated from sentence text and used to train non-kernel-based classifiers

From: A detailed error analysis of 13 kernel methods for protein-protein interaction extraction

| Feature type | Feature | Example |
|---|---|---|
| surface | distance (word/char) | sentence length in characters; entity distance in words |
| | count | number of proteins in sentence |
| | negation clues (s/b/w/a) | negation word before entities |
| | hedge clues (s/b/w/a) | hedge word after entities |
| | enumeration clues (b) | comma between entities |
| | interaction word clues (s/b/w/a) | interaction word in sentence |
| | entity modifier (a) | -ing word after first entity |
| parsing | distance (graph) | length of syntax tree shortest path |
| | occurrence features (entire graph) | number of conj constituents in the syntax tree |
| | occurrence features (shortest path) | number of conj constituents along the shortest path in the syntax tree |
| | frequency features (entire graph) | relative frequency of conj labels over the dependency graph |
| | frequency features (shortest path) | relative frequency of conj labels over the shortest path relations |
| | entropy | Kullback-Leibler divergence of constituent types in the entire syntax tree |
Features may describe either sentence-level or pair-level characteristics. Parsing features were generated from both syntax (constituency) and dependency parses. The scope of a feature is typically the whole sentence (s), the text before the entities (b), between the entities (w), or after the entities (a).
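As a rough illustration of how features like those in the table could be computed, here is a minimal Python sketch. It assumes a tokenized sentence with the two candidate entities given as token indices, and a dependency parse given as (head, dependent, label) edges. The clue word lists, the function names, and the reference distribution used for the Kullback-Leibler divergence are hypothetical placeholders, not the lexicons or resources used in the study. For brevity, every parse-based feature here is computed from a single dependency graph, whereas the table derives some of them from the constituency (syntax) tree.

```python
import math
from collections import Counter, deque

# Hypothetical clue lexicons for illustration only; not the word lists used in the study.
NEGATION_WORDS = {"not", "no", "neither", "nor", "without"}
HEDGE_WORDS = {"may", "might", "possibly", "suggest", "putative"}
INTERACTION_WORDS = {"binds", "interacts", "phosphorylates", "activates", "inhibits"}


def scope_tokens(tokens, e1, e2, scope):
    """Tokens in scope s (sentence), b (before), w (between) or a (after the entities)."""
    spans = {"s": tokens, "b": tokens[:e1], "w": tokens[e1 + 1:e2], "a": tokens[e2 + 1:]}
    return spans[scope]


def surface_features(tokens, e1, e2, protein_indices):
    """Surface features for the entity pair at token indices e1 and e2."""
    e1, e2 = min(e1, e2), max(e1, e2)
    feats = {
        # Approximation: assumes the sentence is the tokens joined by single spaces.
        "sentence_len_chars": len(" ".join(tokens)),
        "entity_distance_words": e2 - e1,
        "protein_count": len(protein_indices),
        # Enumeration clue: a comma between the two entities.
        "comma_between": int("," in scope_tokens(tokens, e1, e2, "w")),
        # Entity modifier: an -ing word directly after the first entity.
        "ing_after_first": int(e1 + 1 < len(tokens) and tokens[e1 + 1].lower().endswith("ing")),
    }
    # Clue features are binary flags, one per scope s/b/w/a.
    for scope in "sbwa":
        words = {t.lower() for t in scope_tokens(tokens, e1, e2, scope)}
        feats[f"negation_{scope}"] = int(bool(words & NEGATION_WORDS))
        feats[f"hedge_{scope}"] = int(bool(words & HEDGE_WORDS))
        feats[f"interaction_{scope}"] = int(bool(words & INTERACTION_WORDS))
    return feats


def shortest_path(edges, src, dst):
    """Edge labels on a shortest src-dst path (BFS, edges treated as undirected), or None."""
    adj = {}
    for head, dep, label in edges:
        adj.setdefault(head, []).append((dep, label))
        adj.setdefault(dep, []).append((head, label))
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt, label in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, label)
                queue.append(nxt)
    if dst not in prev:
        return None
    # Walk the predecessor links back from dst, then reverse into src-to-dst order.
    labels, node = [], dst
    while prev[node] is not None:
        node, label = prev[node]
        labels.append(label)
    return labels[::-1]


def kl_divergence(labels, reference, eps=1e-6):
    """KL(P || Q): P is the label distribution of this parse, Q an assumed reference distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    # Labels missing from the reference get a small floor probability eps (not renormalized).
    return sum((c / total) * math.log((c / total) / reference.get(lab, eps))
               for lab, c in counts.items())


def parsing_features(edges, e1, e2, label="conj", reference=None):
    """Path length, plus occurrence and frequency of one label over the graph and the path."""
    path = shortest_path(edges, e1, e2)
    on_path = path or []
    all_labels = [lab for _, _, lab in edges]
    feats = {
        "path_length": len(path) if path is not None else -1,  # -1: entities not connected
        f"{label}_count_graph": all_labels.count(label),
        f"{label}_count_path": on_path.count(label),
        f"{label}_freq_graph": all_labels.count(label) / len(all_labels) if all_labels else 0.0,
        f"{label}_freq_path": on_path.count(label) / len(on_path) if on_path else 0.0,
    }
    if reference is not None:
        feats["label_kl"] = kl_divergence(all_labels, reference)
    return feats
```

For the sentence "ProtA binds ProtB but not ProtC" with entities at token positions 0 and 2, `surface_features` sets `negation_a` and `interaction_w` to 1 and `negation_w` to 0, because "not" occurs only after the second entity and "binds" sits between the entities. The `label` parameter defaults to `conj` to mirror the table's examples; any other dependency or constituent label can be passed the same way.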