Table 1 Features used. Description of the full feature set used in the closed section submission.

From: Exploring the boundaries: gene and protein identification in biomedical text

Feature group           Features
Word Features           w_i
                        w_{i-1}
                        w_{i+1}
                        Last "real" word
                        Next "real" word
                        Disjunction of 4 previous words
                        Disjunction of 4 next words
Bigrams                 w_i + w_{i-1}
                        w_i + w_{i+1}
TnT POS                 POS_i
                        POS_{i-1}
                        POS_{i+1}
Character Substrings    Up to a length of 6
Abbreviations           abbr_i
                        abbr_{i-1} + abbr_i
                        abbr_i + abbr_{i+1}
                        abbr_{i-1} + abbr_i + abbr_{i+1}
Word Shape              shape_i
                        shape_{i-1}
                        shape_{i+1}
                        shape_{i-1} + shape_i
                        shape_i + shape_{i+1}
                        shape_{i-1} + shape_i + shape_{i+1}
Previous NE             NE_{i-1}
                        NE_{i-2} + NE_{i-1}
Previous NE + Word      NE_{i-1} + w_i
Previous NE + POS       NE_{i-1} + POS_{i-1} + POS_i
                        NE_{i-2} + NE_{i-1} + POS_{i-2} + POS_{i-1} + POS_i
Previous NE + Shape     NE_{i-1} + shape_i
                        NE_{i-1} + shape_{i+1}
                        NE_{i-1} + shape_{i-1} + shape_i
                        NE_{i-2} + NE_{i-1} + shape_{i-2} + shape_{i-1} + shape_i
Paren-Matching          A feature that signals when one parenthesis in a pair
                        has been assigned a different tag than the other in a
                        window of 4 words
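
As a rough illustration of how a few rows of Table 1 could be computed, the Python sketch below builds the word, bigram, disjunction, character-substring and word-shape features for one token position. It is not the authors' implementation: the function names, the `<PAD>` boundary token, the feature-string format and the particular shape mapping (X for uppercase, x for lowercase, d for digit, with runs collapsed) are all assumptions made for illustration. Rows that depend on external components (TnT POS tags, abbreviations, previous NE tags, paren-matching) are left out.

```python
from typing import List


def word_shape(token: str) -> str:
    """Map characters to classes (X upper, x lower, d digit), collapsing runs.

    Assumed definition: the table does not spell out the shape mapping.
    """
    shape: List[str] = []
    for ch in token:
        if ch.isupper():
            cls = "X"
        elif ch.islower():
            cls = "x"
        elif ch.isdigit():
            cls = "d"
        else:
            cls = ch  # keep punctuation such as '-' or '(' unchanged
        if not shape or shape[-1] != cls:
            shape.append(cls)
    return "".join(shape)


def char_substrings(token: str, max_len: int = 6) -> List[str]:
    """All contiguous substrings of the token, up to max_len characters."""
    return [
        token[start:start + length]
        for length in range(1, min(max_len, len(token)) + 1)
        for start in range(len(token) - length + 1)
    ]


def token_features(tokens: List[str], i: int, window: int = 4) -> List[str]:
    """String-valued features for position i, covering a few rows of Table 1."""

    def at(j: int) -> str:
        # Hypothetical padding token for positions outside the sentence.
        return tokens[j] if 0 <= j < len(tokens) else "<PAD>"

    feats = [
        f"w_i={at(i)}",
        f"w_i-1={at(i - 1)}",
        f"w_i+1={at(i + 1)}",
        f"w_i+w_i-1={at(i)}|{at(i - 1)}",
        f"w_i+w_i+1={at(i)}|{at(i + 1)}",
        f"shape_i={word_shape(at(i))}",
        f"shape_i-1+shape_i={word_shape(at(i - 1))}|{word_shape(at(i))}",
    ]
    # Disjunction features: record each nearby word without its exact offset.
    feats += [f"prev_any={w}" for w in sorted(set(tokens[max(0, i - window):i]))]
    feats += [f"next_any={w}" for w in sorted(set(tokens[i + 1:i + 1 + window]))]
    # Character substrings of the current word, up to length 6.
    feats += [f"substr={s}" for s in sorted(set(char_substrings(at(i))))]
    return feats
```

Under these assumptions, `word_shape("IL-2")` gives `X-d` and `word_shape("p53")` gives `xd`, so tokens that differ in spelling but share a typical gene-name pattern map to the same shape feature.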