Table 3 Feature sets for learning.

From: Mining clinical relationships from patient narratives

| Feature set | Size | Description |
| --- | --- | --- |
| tokN | 8N | Surface string and POS of tokens surrounding the arguments, windowed -N to +N, N = 6 by default |
| gentokN | 8N | Root and generalised POS of tokens surrounding the argument entities, windowed -N to +N, N = 6 by default |
| atype | 1 | Concatenated semantic type of arguments, in arg1-arg2 order |
| dir | 1 | Direction: linear text order of the arguments (is arg1 before arg2, or vice versa?) |
| dist | 2 | Distance: absolute number of sentence and paragraph boundaries between arguments |
| str | 14 | Surface string features based on Zhou et al. [29]; see text for full description |
| pos | 14 | POS features, as above |
| root | 14 | Root features, as above |
| genpos | 14 | Generalised POS features, as above |
| inter | 11 | Intervening mentions: numbers and types of intervening entity mentions between arguments |
| event | 5 | Events: are any of the arguments, or intervening entities, events? |
| allgen | 96 | All above features in root and generalised POS forms, i.e. gentok6+atype+dir+dist+root+genpos+inter+event |
| notok | 48 | All above except tokN features, others in string and POS forms, i.e. atype+dir+dist+str+pos+inter+event |
| dep | 16 | Features based on a syntactic dependency path |
| syndist | 2 | The distance between the two arguments, along a token path and along a syntactic dependency path |
Feature sets used for learning relationships. The table is split into non-syntactic features, combined non-syntactic features, and syntactic features. The size of a set is the number of features in that set.
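The sizes of the two combined sets can be checked against their components. The following is a minimal sketch of that arithmetic, assuming the default window N = 6; the names `SIZES`, `COMBINED`, and `combined_size` are hypothetical and are not taken from the paper.

```python
# Sizes of the individual feature sets as listed in the table.
# N = 6 is the default window, so tokN and gentokN each contribute 8 * 6 = 48.
N = 6
SIZES = {
    "tok": 8 * N,
    "gentok": 8 * N,
    "atype": 1,
    "dir": 1,
    "dist": 2,
    "str": 14,
    "pos": 14,
    "root": 14,
    "genpos": 14,
    "inter": 11,
    "event": 5,
}

# Components of the combined sets, following the "i.e." lists in the table.
COMBINED = {
    "allgen": ["gentok", "atype", "dir", "dist", "root", "genpos", "inter", "event"],
    "notok": ["atype", "dir", "dist", "str", "pos", "inter", "event"],
}


def combined_size(name):
    """Sum the sizes of the component sets of a combined feature set."""
    return sum(SIZES[part] for part in COMBINED[name])


print(combined_size("allgen"))  # 96
print(combined_size("notok"))   # 48
```

Both sums agree with the sizes given in the table: 96 for allgen and 48 for notok.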