From: Extraction of semantic biomedical relations from text using conditional random fields
Orthographic Feature | Regular Expression |
---|---|
Init Caps | `[A-Z].*` |
Init Caps Alpha | `[A-Z][a-z]*` |
All Caps | `[A-Z]+` |
Caps Mix | `[A-Za-z]+` |
Has Digit | `.*[0-9].*` |
Single Digit | `[0-9]` |
Double Digit | `[0-9][0-9]` |
Natural Number | `[0-9]+` |
Real Number | `[-\+]?[0-9]+[\.,]+[0-9\.,]+` |
Alpha-Numeric | `[A-Za-z0-9]+` |
Roman | `[ivxdlcm]+\|[IVXDLCM]+` |
Has Dash | `.*-.*` |
Init Dash | `-.*` |
End Dash | `.*-` |
Punctuation | `[,\.;:\?!-\+"]` |
Greek | `(alpha\|beta\|...\|omega)` |
Has Greek | `.*\b(alpha\|beta\|...\|omega)\b.*` |
Mutation Pattern | `\w*\d+-*\D+` |
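To illustrate how a table like this can be turned into CRF input, the following Python sketch applies every pattern to a single token and returns the names of the features that fire. It is not the authors' implementation, and several details are assumptions: each pattern must match the whole token (`re.fullmatch`), which the `.*` padding in rows such as Has Digit suggests; the Greek list, elided in the table, is spelled out as the 24 lowercase letter names; the Real Number row is read as `[-\+]?[0-9]+[\.,]+[0-9\.,]+`; and the feature names (e.g. `HAS_DIGIT`) and the function `orthographic_features` are hypothetical.

```python
from __future__ import annotations

import re

# The table elides the Greek list with "..."; spelling out all 24 lowercase
# letter names is an assumption made for this sketch.
GREEK_LETTERS = [
    "alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta",
    "iota", "kappa", "lambda", "mu", "nu", "xi", "omicron", "pi",
    "rho", "sigma", "tau", "upsilon", "phi", "chi", "psi", "omega",
]
GREEK_ALT = "(?:" + "|".join(GREEK_LETTERS) + ")"

# Hypothetical feature names mapped to the patterns from the table.
# REAL_NUMBER uses an assumed, well-formed reading of that row.
ORTHOGRAPHIC_PATTERNS = {
    "INIT_CAPS": r"[A-Z].*",
    "INIT_CAPS_ALPHA": r"[A-Z][a-z]*",
    "ALL_CAPS": r"[A-Z]+",
    "CAPS_MIX": r"[A-Za-z]+",
    "HAS_DIGIT": r".*[0-9].*",
    "SINGLE_DIGIT": r"[0-9]",
    "DOUBLE_DIGIT": r"[0-9][0-9]",
    "NATURAL_NUMBER": r"[0-9]+",
    "REAL_NUMBER": r"[-\+]?[0-9]+[\.,]+[0-9\.,]+",
    "ALPHA_NUMERIC": r"[A-Za-z0-9]+",
    "ROMAN": r"[ivxdlcm]+|[IVXDLCM]+",
    "HAS_DASH": r".*-.*",
    "INIT_DASH": r"-.*",
    "END_DASH": r".*-",
    "PUNCTUATION": r'[,\.;:\?!-\+"]',  # kept exactly as in the table
    "GREEK": GREEK_ALT,
    "HAS_GREEK": r".*\b" + GREEK_ALT + r"\b.*",
    "MUTATION_PATTERN": r"\w*\d+-*\D+",
}

# Compile once; each pattern is later applied with fullmatch, so it has to
# cover the entire token (assumed full-match semantics).
_COMPILED = {name: re.compile(p) for name, p in ORTHOGRAPHIC_PATTERNS.items()}


def orthographic_features(token: str) -> list[str]:
    """Return the names of all orthographic features that fire for a token."""
    return [name for name, rx in _COMPILED.items() if rx.fullmatch(token)]


if __name__ == "__main__":
    for tok in ["IL-2", "TNF-alpha", "A123G", "3.5", "IV", ";", "-"]:
        print(tok, orthographic_features(tok))
```

Each returned name can then serve as a binary feature of the token in a CRF. Running the sketch also exposes a quirk of the Punctuation row as written: inside the character class, `!-\+` is a range from `!` to `+`, so the pattern accepts `#`, `$`, `(` and similar characters, while a lone `-` does not match it.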