Table 6 Orthographical features

From: NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition

Feature name    Regular expression
INITCAP         ^[A-Z].+
CAPWORD         ^[A-Z][a-z]+$
ALLCAPS         ^[A-Z]+$
CAPSMIX         ^[A-z]*([A-Z][a-z]|[a-z][A-Z])[A-z]*$
ALPHANUMMIX     ^[A-z0-9]*([0-9][A-z]|[A-z][0-9])[A-z0-9]*$
ALPHANUM        ^[A-z]+[0-9]+$
UPPERCHAR       ^[A-Z]$
LOWERCHAR       ^[a-z]$
SHORTNUM        ^[0-9][0-9]?$
INTEGER         ^-?[0-9]+$
REAL            ^-?[0-9]\.[0-9]+$
ROMAN           ^[IVX]+$
HASDASH         -
INITDASH        ^-
ENDDASH         -$
PUNCTUATION     ^[,.;:?!]$
QUOTE           ^["'']$
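The table can be read as a lookup from feature name to pattern, where a token receives every feature whose pattern matches it. The following is a minimal Python sketch of that reading, not the authors' implementation. It assumes the patterns are written with no spaces between character classes, that LOWERCHAR covers a-z, and that INTEGER takes an optional leading minus sign. The names ORTHO_PATTERNS and orthographic_features are hypothetical.

```python
import re
from typing import List

# Patterns transcribed from Table 6 under the assumptions stated above.
# [A-z] is kept as printed; in ASCII it also matches the six characters
# between 'Z' and 'a' ([, \, ], ^, _ and `).
ORTHO_PATTERNS = {
    "INITCAP": r"^[A-Z].+",
    "CAPWORD": r"^[A-Z][a-z]+$",
    "ALLCAPS": r"^[A-Z]+$",
    "CAPSMIX": r"^[A-z]*([A-Z][a-z]|[a-z][A-Z])[A-z]*$",
    "ALPHANUMMIX": r"^[A-z0-9]*([0-9][A-z]|[A-z][0-9])[A-z0-9]*$",
    "ALPHANUM": r"^[A-z]+[0-9]+$",
    "UPPERCHAR": r"^[A-Z]$",
    "LOWERCHAR": r"^[a-z]$",
    "SHORTNUM": r"^[0-9][0-9]?$",
    "INTEGER": r"^-?[0-9]+$",
    "REAL": r"^-?[0-9]\.[0-9]+$",
    "ROMAN": r"^[IVX]+$",
    "HASDASH": r"-",
    "INITDASH": r"^-",
    "ENDDASH": r"-$",
    "PUNCTUATION": r"^[,.;:?!]$",
    "QUOTE": r'''^["']$''',
}

# Compile once so repeated lookups over a corpus stay cheap.
_COMPILED = {name: re.compile(pat) for name, pat in ORTHO_PATTERNS.items()}


def orthographic_features(token: str) -> List[str]:
    """Return the names of all features whose pattern matches the token.

    re.search is used because HASDASH, INITDASH and ENDDASH are not
    anchored at both ends; the other patterns carry their own anchors.
    """
    return [name for name, rx in _COMPILED.items() if rx.search(token)]


if __name__ == "__main__":
    print(orthographic_features("IL-2"))  # ['INITCAP', 'HASDASH']
    print(orthographic_features("p53"))   # ['ALPHANUMMIX', 'ALPHANUM']
    print(orthographic_features("NFkB"))  # ['INITCAP', 'CAPSMIX']
```

A token can fire several features at once, so the function returns a list instead of a single label; in a sequence tagger each returned name would typically become one binary feature for that token.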