A detailed error analysis of 13 kernel methods for protein-protein interaction extraction

BMC Bioinformatics

Table 15 Surface and parsing features generated from sentence text and used for training non-kernel-based classifiers

Feature type	Feature	Example
surface	distance (word/char)	sentence length in characters
		entity distance in words
	count	number of proteins in sentence
	negation clues (s/b/w/a)	negation word before entities
	hedge clues (s/b/w/a)	hedge word after entities
	enumeration clues (b)	comma between entities
	interaction word clues (s/b/w/a)	interaction word in sentence
	entity modifier (a)	-ing word after first entity
parsing	distance (graph)	length of syntax tree shortest path
	occurrence features (entire graph)	number of conj constituents in the syntax tree
	occurrence features (shortest path)	number of conj constituents along the shortest path in the syntax tree
	frequency features (entire graph)	relative frequency of conj labels over the dependency graph
	frequency features (shortest path)	relative frequency of conj labels over the shortest path relations
	entropy	Kullback-Leibler divergence of constituent types in the entire syntax tree

Features may refer to both sentence-level and pair-level characteristics. Parsing features were generated from both syntax and dependency parses. The scope of a feature is typically the whole sentence (s), before the entities (b), between the entities (w), or after the entities (a).
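To make the scope notation concrete, the following is a minimal sketch of how a few of the surface features in the table could be computed for one entity pair. It is an illustration only, not the implementation used in the study: the tokenizer, the function names (`tokenize`, `scope_tokens`, `surface_features`), the feature names, and the small `NEGATION_WORDS` lexicon are all assumptions introduced here, and entity positions are assumed to be given as token indices.

```python
import re

# Hypothetical negation lexicon; the word lists used in the study are not given here.
NEGATION_WORDS = {"not", "no", "neither", "nor", "cannot"}

WORD = re.compile(r"\w+")


def tokenize(sentence):
    """Naive tokenizer (an assumption): runs of word characters or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def scope_tokens(tokens, e1, e2):
    """Split tokens into the four scopes, given entity token indices with e1 < e2."""
    return {
        "s": tokens,               # whole sentence
        "b": tokens[:e1],          # before the first entity
        "w": tokens[e1 + 1:e2],    # between the two entities
        "a": tokens[e2 + 1:],      # after the second entity
    }


def surface_features(sentence, e1, e2, protein_indices):
    """Compute a handful of sentence-level and pair-level surface features."""
    tokens = tokenize(sentence)
    scopes = scope_tokens(tokens, e1, e2)
    feats = {
        # distance (char): sentence length in characters
        "sentence_length_chars": len(sentence),
        # distance (word): number of word tokens between the entities
        "entity_distance_words": sum(1 for t in scopes["w"] if WORD.fullmatch(t)),
        # count: number of proteins in the sentence
        "protein_count": len(protein_indices),
        # enumeration clue (scope w in this sketch): comma between the entities
        "comma_between": "," in scopes["w"],
    }
    # negation clues, one boolean per scope (s/b/w/a)
    for name, toks in scopes.items():
        feats["negation_" + name] = any(t.lower() in NEGATION_WORDS for t in toks)
    return feats


if __name__ == "__main__":
    sentence = "ProtA does not bind ProtB, ProtC or ProtD."
    proteins = [0, 4, 6, 8]  # token indices of the four protein mentions
    for key, value in surface_features(sentence, 0, 4, proteins).items():
        print(key, value)
```

Each candidate pair in a sentence yields its own feature dictionary: sentence-level values such as the character length stay constant across pairs, while the scoped clues change with the positions of the two entities.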

ISSN: 1471-2105