From: Identifying gene and protein mentions in text using conditional random fields
Orthographic Feature | Reg. Exp. |
---|---|
Init Caps | [A-Z].* |
Init Caps Alpha | [A-Z][a-z]* |
All Caps | [A-Z]+ |
Caps Mix | [A-Za-z]+ |
Has Digit | .*[0-9].* |
Single Digit | [0-9] |
Double Digit | [0-9][0-9] |
Natural Number | [0-9]+ |
Real Number | [-0-9]+[.,]+[0-9.,]+ |
Alpha-Num | [A-Za-z0-9]+ |
Roman | [ivxdlcm]+ or [IVXDLCM]+ |
Has Dash | .*-.* |
Init Dash | -.* |
End Dash | .*- |
Punctuation | [,.;:?!-+'"] |
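
The table can be turned into a small feature extractor. The following is a minimal sketch in Python, not the authors' implementation. It rests on several assumptions: each expression must match the whole token (hence `re.fullmatch`), whitespace inside a pattern is not significant, Real Number is read as `[-0-9]+[.,]+[0-9.,]+`, and the dash in the Punctuation class is meant literally, so it is escaped here (unescaped, Python's `re` would read `!-+` as a character range). The dictionary keys and the function name `orthographic_features` are hypothetical.

```python
import re

# Patterns transcribed from the table. Key names are illustrative only.
ORTHOGRAPHIC_PATTERNS = {
    "InitCaps": r"[A-Z].*",
    "InitCapsAlpha": r"[A-Z][a-z]*",
    "AllCaps": r"[A-Z]+",
    "CapsMix": r"[A-Za-z]+",
    "HasDigit": r".*[0-9].*",
    "SingleDigit": r"[0-9]",
    "DoubleDigit": r"[0-9][0-9]",
    "NaturalNumber": r"[0-9]+",
    "RealNumber": r"[-0-9]+[.,]+[0-9.,]+",
    "AlphaNum": r"[A-Za-z0-9]+",
    "Roman": r"[ivxdlcm]+|[IVXDLCM]+",
    "HasDash": r".*-.*",
    "InitDash": r"-.*",
    "EndDash": r".*-",
    # Assumption: the dash is a literal character, so it is escaped.
    "Punctuation": r"""[,.;:?!\-+'"]""",
}

# Compile once so repeated calls do not re-parse the expressions.
COMPILED = {name: re.compile(p) for name, p in ORTHOGRAPHIC_PATTERNS.items()}


def orthographic_features(token):
    """Return the names of all patterns that match the entire token."""
    return [name for name, rx in COMPILED.items() if rx.fullmatch(token)]


if __name__ == "__main__":
    for tok in ["IL-2", "p53", "IV", "3.5"]:
        print(tok, orthographic_features(tok))
```

Under these assumptions a token such as `IL-2` fires Init Caps, Has Digit, and Has Dash at once. The features are not mutually exclusive, so each token yields a list of active feature names rather than a single label.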