From: A detailed error analysis of 13 kernel methods for protein-protein interaction extraction
Feature type | Feature | Example |
---|---|---|
surface | distance (word/char) | sentence length in characters |
surface | distance (word/char) | entity distance in words |
surface | count | number of proteins in sentence |
surface | negation clues (s/b/w/a) | negation word before entities |
surface | hedge clues (s/b/w/a) | hedge word after entities |
surface | enumeration clues (b) | comma between entities |
surface | interaction word clues (s/b/w/a) | interaction word in sentence |
surface | entity modifier (a) | -ing word after first entity |
parsing | distance (graph) | length of syntax tree shortest path |
parsing | occurrence features (entire graph) | number of conj constituents in the syntax tree |
parsing | occurrence features (shortest path) | number of conj constituents along the shortest path in the syntax tree |
parsing | frequency features (entire graph) | relative frequency of conj labels over the dependency graph |
parsing | frequency features (shortest path) | relative frequency of conj labels over the shortest path relations |
parsing | entropy | Kullback-Leibler divergence of constituent types in the entire syntax tree |
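
As an illustration of how some of the surface features in the table might be computed, here is a minimal Python sketch. It is not the authors' implementation: the function name `surface_features`, the `NEGATION_WORDS` lexicon, the whitespace tokenization, and the reading of "entity distance in words" as the number of tokens strictly between the two entities are all assumptions made for this example.

```python
from typing import Dict, Sequence

# Hypothetical clue lexicon; the source does not list the words actually used.
NEGATION_WORDS = frozenset({"not", "no", "neither", "nor", "without"})


def surface_features(tokens: Sequence[str],
                     protein_positions: Sequence[int],
                     e1: int, e2: int) -> Dict[str, object]:
    """Compute a few surface features for the entity pair at token indices e1 < e2.

    tokens            -- the tokenized sentence
    protein_positions -- token indices of all protein mentions in the sentence
    e1, e2            -- token indices of the two entities of the candidate pair
    """
    if not (0 <= e1 < e2 < len(tokens)):
        raise ValueError("expected 0 <= e1 < e2 < len(tokens)")

    before = tokens[:e1]            # tokens preceding the first entity
    between = tokens[e1 + 1:e2]     # tokens strictly between the entities
    next_token = tokens[e1 + 1]     # always exists because e1 < e2

    return {
        # distance (word/char)
        "sentence_length_chars": len(" ".join(tokens)),
        "entity_distance_words": len(between),
        # count
        "protein_count": len(protein_positions),
        # negation clue (before)
        "negation_before_entities": any(t.lower() in NEGATION_WORDS for t in before),
        # enumeration clue (between)
        "comma_between_entities": "," in between,
        # entity modifier (after)
        "ing_word_after_first_entity": next_token.lower().endswith("ing"),
    }


if __name__ == "__main__":
    sentence = "We found that ProtA , ProtB and ProtC are not interacting".split()
    print(surface_features(sentence, protein_positions=[3, 5, 7], e1=3, e2=5))
```

The parsing features in the lower half of the table would need a syntactic or dependency parser and are left out of this sketch.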