Table 1. Features used. Description of the full feature set used in the closed section submission.

From: Exploring the boundaries: gene and protein identification in biomedical text

Word Features           w_i
                        w_{i-1}
                        w_{i+1}
                        Last "real" word
                        Next "real" word
                        Disjunction of 4 previous words
                        Disjunction of 4 next words

Bigrams                 w_i + w_{i-1}
                        w_i + w_{i+1}

TnT POS                 POS_i
                        POS_{i-1}
                        POS_{i+1}

Character Substrings    Up to a length of 6

Abbreviations           abbr_i
                        abbr_{i-1} + abbr_i
                        abbr_i + abbr_{i+1}
                        abbr_{i-1} + abbr_i + abbr_{i+1}

Word Shape              shape_i
                        shape_{i-1}
                        shape_{i+1}
                        shape_{i-1} + shape_i
                        shape_i + shape_{i+1}
                        shape_{i-1} + shape_i + shape_{i+1}

Previous NE             NE_{i-1}
                        NE_{i-2} + NE_{i-1}

Previous NE + Word      NE_{i-1} + w_i

Previous NE + POS       NE_{i-1} + POS_{i-1} + POS_i
                        NE_{i-2} + NE_{i-1} + POS_{i-2} + POS_{i-1} + POS_i

Previous NE + Shape     NE_{i-1} + shape_i
                        NE_{i-1} + shape_{i+1}
                        NE_{i-1} + shape_{i-1} + shape_i
                        NE_{i-2} + NE_{i-1} + shape_{i-2} + shape_{i-1} + shape_i

Paren-Matching          A feature that signals when one parenthesis in a pair
                        has been assigned a different tag than the other
                        within a window of 4 words
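
To make a few of the feature templates in Table 1 concrete, the following is a minimal sketch in Python, not the implementation used for the submission. The table does not define the exact shape mapping, substring scheme, or tag set, so these are assumptions: the word shape here collapses runs of upper-case letters, lower-case letters, and digits into X, x, and d; character substrings are taken to be all contiguous substrings of up to 6 characters; and the labels "GENE" and "O" are hypothetical. All function names and feature-string formats are illustrative. POS and abbreviation features are omitted because they depend on external tools.

```python
from typing import List

PAD = "<PAD>"  # hypothetical symbol for positions outside the sentence


def word_shape(word: str) -> str:
    """Assumed shape mapping: upper -> X, lower -> x, digit -> d, other
    characters kept as-is; consecutive identical symbols are collapsed."""
    shape: List[str] = []
    for ch in word:
        if ch.isupper():
            sym = "X"
        elif ch.islower():
            sym = "x"
        elif ch.isdigit():
            sym = "d"
        else:
            sym = ch
        if not shape or shape[-1] != sym:
            shape.append(sym)
    return "".join(shape)


def char_substrings(word: str, max_len: int = 6) -> List[str]:
    """All distinct contiguous substrings of length 1..max_len (assumed scheme)."""
    subs = set()
    for n in range(1, max_len + 1):
        for start in range(len(word) - n + 1):
            subs.add(word[start:start + n])
    return sorted(subs)


def at(seq: List[str], i: int) -> str:
    """Element i of seq, or PAD when i falls outside the sequence."""
    return seq[i] if 0 <= i < len(seq) else PAD


def extract_features(tokens: List[str], i: int, prev_tags: List[str]) -> List[str]:
    """Feature strings for position i; prev_tags holds tags for positions < i."""
    w, w_prev, w_next = at(tokens, i), at(tokens, i - 1), at(tokens, i + 1)
    feats = [f"w={w}", f"w-1={w_prev}", f"w+1={w_next}",
             f"w+w-1={w}|{w_prev}", f"w+w+1={w}|{w_next}"]
    # Disjunction features record word identity but not its exact offset.
    feats += [f"prev4={tokens[j]}" for j in range(max(0, i - 4), i)]
    feats += [f"next4={tokens[j]}" for j in range(i + 1, min(len(tokens), i + 5))]
    feats += [f"sub={s}" for s in char_substrings(w)]
    s, s_prev, s_next = word_shape(w), word_shape(w_prev), word_shape(w_next)
    feats += [f"shape={s}", f"shape-1={s_prev}", f"shape+1={s_next}",
              f"shape-1+shape={s_prev}|{s}", f"shape+shape+1={s}|{s_next}"]
    ne_prev = at(prev_tags, i - 1)
    feats += [f"ne-1={ne_prev}", f"ne-1+w={ne_prev}|{w}",
              f"ne-1+shape={ne_prev}|{s}"]
    return feats


def paren_mismatch(tokens: List[str], tags: List[str], i: int, window: int = 4) -> bool:
    """True if tokens[i] is ')' and the nearest '(' among the preceding
    `window` tokens carries a different tag than tags[i]."""
    if tokens[i] != ")":
        return False
    for j in range(i - 1, max(-1, i - window - 1), -1):
        if tokens[j] == "(":
            return tags[j] != tags[i]
    return False
```

Under these assumptions, `word_shape("IL-2")` returns `"X-d"`, and `paren_mismatch(["IL-2", "(", "interleukin", "2", ")"], ["GENE", "O", "GENE", "GENE", "GENE"], 4)` returns `True`, because the opening parenthesis is tagged "O" while the closing one is tagged "GENE".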