
Table 15 The features used in the baseline argument classification model

From: BIOSMILE: A semantic role labeling system for biomedical verbs using a maximum-entropy model with automatically generated template features

BASIC FEATURES

• Predicate – The predicate lemma

• Path – The syntactic path through the parse tree from the constituent being classified to the predicate

• Constituent type

• Position – Whether the phrase is located before or after the predicate

• Voice – Passive if the predicate has the POS tag VBN and its chunk is not a VP or it is preceded by a form of "to be" or "to get" within its chunk; otherwise, active

• Head word – Calculated using the head word table described by Collins (1999)

• Head POS – The POS tag of the head word

• Sub-categorization – The phrase structure rule that expands the predicate's parent node in the parse tree

• First and last words and their POS tags

• Level – The level in the parse tree

PREDICATE FEATURES

• Predicate's verb class

• Predicate POS tag

• Predicate frequency

• Predicate's context POS

• Number of predicates

FULL PARSING FEATURES

• Parent, left sibling, and right sibling paths, constituent types, positions, head words, and head POS tags

• Head of Prepositional Phrase (PP) parent – If the parent is a PP, then the head of this PP is also used as a feature

COMBINATION FEATURES

• Predicate distance combination

• Predicate phrase type combination

• Head word and predicate combination

• Voice position combination

OTHERS

• Syntactic frame of predicate/NP

• Head word suffixes of lengths 2, 3, and 4

• Number of words in the phrase

• Context words and POS tags
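
As an illustration only, three of the features listed above (Path, Position, and head word suffixes) could be computed roughly as in the following Python sketch. This is not the BIOSMILE implementation. The `Node` class, the function names, the `^` and `!` separators for upward and downward steps, the use of token indices for Position, and the decision to skip suffixes longer than the word are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical parse-tree node; not taken from the paper."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        # Give every child a pointer back to this node.
        for child in self.children:
            child.parent = self


def ancestors(node: Optional[Node]) -> List[Node]:
    """Return the node itself followed by each ancestor up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def path_feature(constituent: Node, predicate: Node) -> str:
    """Labels from the constituent up to the lowest common ancestor,
    then down to the predicate ('^' = up, '!' = down; assumed notation)."""
    up = ancestors(constituent)
    down = ancestors(predicate)
    depth_in_down = {id(n): i for i, n in enumerate(down)}
    for i, node in enumerate(up):
        if id(node) in depth_in_down:
            j = depth_in_down[id(node)]
            break
    else:
        raise ValueError("nodes are not in the same tree")
    up_labels = [n.label for n in up[: i + 1]]
    down_labels = [n.label for n in reversed(down[:j])]
    return "^".join(up_labels) + "".join("!" + lab for lab in down_labels)


def position_feature(constituent_start: int, predicate_index: int) -> str:
    """'before' if the phrase starts before the predicate token, else 'after'."""
    return "before" if constituent_start < predicate_index else "after"


def suffix_features(head_word: str, lengths=(2, 3, 4)) -> Dict[str, str]:
    """Head word suffixes of the given lengths; shorter words are skipped."""
    return {f"suffix{n}": head_word[-n:] for n in lengths if len(head_word) >= n}


# Toy tree for "protein binds DNA": (S (NP (NN protein)) (VP (VBZ binds) (NP (NN DNA))))
np_subj = Node("NP", [Node("NN", word="protein")])
vbz = Node("VBZ", word="binds")
np_obj = Node("NP", [Node("NN", word="DNA")])
vp = Node("VP", [vbz, np_obj])
s = Node("S", [np_subj, vp])

print(path_feature(np_subj, vbz))   # NP^S!VP!VBZ
print(path_feature(np_obj, vbz))    # NP^VP!VBZ
print(position_feature(0, 1))       # before
print(suffix_features("kinase"))    # {'suffix2': 'se', 'suffix3': 'ase', 'suffix4': 'nase'}
```

In a maximum-entropy classifier, string-valued features such as these would typically be turned into binary indicator features before training; that step is omitted here.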