
Table 15 Surface and parsing features generated from sentence text used for training non-kernel based classifiers

From: A detailed error analysis of 13 kernel methods for protein-protein interaction extraction

| Feature type | Feature | Example |
|---|---|---|
| surface | distance (word/char) | sentence length in characters |
| | | entity distance in words |
| | count | number of proteins in sentence |
| | negation clues (s/b/w/a) | negation word before entities |
| | hedge clues (s/b/w/a) | hedge word after entities |
| | enumeration clues (b) | comma between entities |
| | interaction word clues (s/b/w/a) | interaction word in sentence |
| | entity modifier (a) | -ing word after first entity |
| parsing | distance (graph) | length of syntax tree shortest path |
| | occurrence features (entire graph) | number of conj constituents in the syntax tree |
| | occurrence features (shortest path) | number of conj constituents along the shortest path in the syntax tree |
| | frequency features (entire graph) | relative frequency of conj labels over the dependency graph |
| | frequency features (shortest path) | relative frequency of conj labels over the shortest path relations |
| | entropy | Kullback-Leibler divergence of constituent types in the entire syntax tree |

1. Features may refer to both sentence-level and pair-level characteristics. Parsing features were generated from both syntax and dependency parses. The scope of a feature is typically the whole sentence (s), the text before the entities (b), the text between the entities (w), or the text after the entities (a).
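A minimal sketch of how the surface features in Table 15 might be computed for one candidate protein pair. It assumes a pre-tokenized sentence with token spans for the two entities. The clue lexicons (`NEGATION_WORDS`, `HEDGE_WORDS`, `INTERACTION_WORDS`), the `PairInstance` container, and all function names are hypothetical illustrations, not the word lists or code used in the study. The comma clue follows the example column and is read from the span between the entities.

```python
from dataclasses import dataclass

# Hypothetical clue lexicons; the word lists used in the study are not given in the table.
NEGATION_WORDS = {"not", "no", "neither", "nor", "unable"}
HEDGE_WORDS = {"may", "might", "possibly", "suggest", "suggests", "putative"}
INTERACTION_WORDS = {"bind", "binds", "binding", "interact", "interacts", "interaction", "activates"}


@dataclass
class PairInstance:
    """Hypothetical container for one candidate protein pair."""
    text: str                # raw sentence text
    tokens: list[str]        # tokenized sentence
    e1: tuple[int, int]      # token span [start, end) of one entity
    e2: tuple[int, int]      # token span [start, end) of the other entity
    n_proteins: int          # protein mentions in the whole sentence


def scopes(inst: PairInstance) -> dict[str, list[str]]:
    """Split tokens into the four scopes: sentence, before, between, after."""
    first, second = sorted([inst.e1, inst.e2])
    return {
        "s": inst.tokens,
        "b": inst.tokens[: first[0]],
        "w": inst.tokens[first[1] : second[0]],
        "a": inst.tokens[second[1] :],
    }


def clue_counts(scoped: dict[str, list[str]], lexicon: set[str], prefix: str) -> dict[str, int]:
    """Count lexicon hits (case-insensitive) separately in each scope."""
    return {f"{prefix}_{s}": sum(t.lower() in lexicon for t in toks) for s, toks in scoped.items()}


def surface_features(inst: PairInstance) -> dict[str, int]:
    scoped = scopes(inst)
    first, second = sorted([inst.e1, inst.e2])
    # Assumption: the "-ing word after first entity" is the token immediately after it.
    after_first = inst.tokens[first[1]] if first[1] < len(inst.tokens) else ""
    feats = {
        "sent_len_chars": len(inst.text),
        "entity_dist_words": second[0] - first[1],   # tokens strictly between the entities
        "n_proteins": inst.n_proteins,
        "comma_between": int("," in scoped["w"]),     # enumeration clue
        "ing_after_first": int(after_first.lower().endswith("ing")),
    }
    feats.update(clue_counts(scoped, NEGATION_WORDS, "neg"))
    feats.update(clue_counts(scoped, HEDGE_WORDS, "hedge"))
    feats.update(clue_counts(scoped, INTERACTION_WORDS, "inter"))
    return feats
```

Each clue family yields one count per scope, so a single lexicon contributes four features (s/b/w/a). Sorting the two spans first makes the result independent of the order in which the pair is given.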
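The parsing features reduce to counting labels over a whole graph and over the shortest path between the two entities. The sketch below assumes a dependency parse given as hypothetical `(head, dependent, label)` triples over token indices. It finds a shortest path by breadth-first search with edge direction ignored, then computes the conj counts and relative frequencies. In the table the distance, occurrence, and entropy features are computed on the syntax (constituency) tree; applying them to dependency edge labels here is a simplification. The table does not state the reference distribution for the Kullback-Leibler feature, so `reference` is an assumed caller-supplied label distribution (for example, corpus-level label frequencies).

```python
import math
from collections import Counter, deque
from typing import Optional

# Hypothetical input format: dependency edges as (head, dependent, label) over token indices.
Edge = tuple[int, int, str]


def shortest_path(edges: list[Edge], src: int, dst: int) -> Optional[list[Edge]]:
    """Breadth-first search ignoring edge direction; returns the edges on one shortest path."""
    adj: dict[int, list[tuple[int, Edge]]] = {}
    for e in edges:
        head, dep, _ = e
        adj.setdefault(head, []).append((dep, e))
        adj.setdefault(dep, []).append((head, e))
    prev: dict[int, tuple[int, Edge]] = {}
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt, e in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                prev[nxt] = (node, e)
                queue.append(nxt)
    if dst not in seen:
        return None  # the two entities are not connected in this parse
    path: list[Edge] = []
    node = dst
    while node != src:  # walk back from dst to src via BFS predecessors
        node, e = prev[node]
        path.append(e)
    return path[::-1]


def label_freq(edges: list[Edge], label: str) -> float:
    """Relative frequency of `label` among the given edges (0.0 for no edges)."""
    return sum(lab == label for _, _, lab in edges) / len(edges) if edges else 0.0


def kl_divergence(counts: Counter, reference: dict[str, float]) -> float:
    """KL(P || Q) in nats, P from label counts; Q must be positive for every observed label."""
    total = sum(counts.values())
    return sum((c / total) * math.log((c / total) / reference[lab])
               for lab, c in counts.items() if c > 0)


def parsing_features(edges: list[Edge], e1: int, e2: int,
                     reference: dict[str, float]) -> dict[str, float]:
    path = shortest_path(edges, e1, e2)
    on_path = path if path is not None else []
    labels = Counter(lab for _, _, lab in edges)
    return {
        "path_len": len(path) if path is not None else -1,  # -1 marks "no path"
        "conj_graph": labels["conj"],
        "conj_path": sum(lab == "conj" for _, _, lab in on_path),
        "conj_freq_graph": label_freq(edges, "conj"),
        "conj_freq_path": label_freq(on_path, "conj"),
        "label_kl": kl_divergence(labels, reference),
    }
```

For the toy parse of "RAD51 and BRCA2 interact" with edges `(3, 0, "nsubj")`, `(0, 1, "cc")`, `(0, 2, "conj")`, the two proteins (tokens 0 and 2) are joined by a single conj edge, so the path length is 1 and the conj frequency along the path is 1.0, while over the whole graph it is 1/3.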