Table 5 Orthographic Features.

From: Extraction of semantic biomedical relations from text using conditional random fields

Orthographic Feature    Regular Expression
--------------------    ------------------
Init Caps               [A-Z].*
Init Caps Alpha         [A-Z][a-z]*
All Caps                [A-Z]+
Caps Mix                [A-Za-z]+
Has Digit               .*[0-9].*
Single Digit            [0-9]
Double Digit            [0-9][0-9]
Natural Number          [0-9]+
Real Number             [-\+][[0-9]+[\.,]+[0-9].,]+
Alpha-Numeric           [A-Za-z0-9]+
Roman                   [ivxdlcm]+|[IVXDLCM]+
Has Dash                .*-.*
Init Dash               -.*
End Dash                .*-
Punctuation             [,\.;:\?!-\+"]
Greek                   (alpha|beta|...|omega)
Has Greek               .*\b(alpha|beta|...|omega)\b.*
Mutation Pattern        \w*\d+-*\D+

Orthographic features and their corresponding regular expressions used in the experiments.
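
As a rough illustration of how the expressions in this table could be turned into per-token features, the following is a minimal Python sketch, not the authors' implementation. It assumes that each expression must match the whole token (the table does not state the matching semantics), and it uses a shortened, hypothetical list of Greek letter names because the table elides the full alternation. The Real Number and Punctuation rows are left out because their bracket expressions look damaged as printed. The names `GREEK_SAMPLE`, `ORTHOGRAPHIC_PATTERNS`, and `orthographic_features` are invented for this example.

```python
import re

# Hypothetical stand-in: the table elides the full list as "alpha|beta|...|omega".
GREEK_SAMPLE = "alpha|beta|gamma|delta|omega"

# Patterns copied from the table, keyed by the feature names used there.
# "Real Number" and "Punctuation" are deliberately omitted (see lead-in).
ORTHOGRAPHIC_PATTERNS = {
    "Init Caps": r"[A-Z].*",
    "Init Caps Alpha": r"[A-Z][a-z]*",
    "All Caps": r"[A-Z]+",
    "Caps Mix": r"[A-Za-z]+",
    "Has Digit": r".*[0-9].*",
    "Single Digit": r"[0-9]",
    "Double Digit": r"[0-9][0-9]",
    "Natural Number": r"[0-9]+",
    "Alpha-Numeric": r"[A-Za-z0-9]+",
    "Roman": r"[ivxdlcm]+|[IVXDLCM]+",
    "Has Dash": r".*-.*",
    "Init Dash": r"-.*",
    "End Dash": r".*-",
    "Greek": rf"({GREEK_SAMPLE})",
    "Has Greek": rf".*\b({GREEK_SAMPLE})\b.*",
    "Mutation Pattern": r"\w*\d+-*\D+",
}

COMPILED = {name: re.compile(p) for name, p in ORTHOGRAPHIC_PATTERNS.items()}


def orthographic_features(token):
    """Return the sorted names of all features whose pattern matches the whole token.

    Assumption: whole-token matching via re.fullmatch.
    """
    return sorted(name for name, rx in COMPILED.items() if rx.fullmatch(token))


if __name__ == "__main__":
    for tok in ["IL-2", "V600E", "alpha", "XIV"]:
        print(tok, orthographic_features(tok))
```

Under the whole-token assumption, a token can activate several features at once; for example, `V600E` fires Init Caps, Has Digit, Alpha-Numeric, and Mutation Pattern together. In a sequence labeller such as a conditional random field, each of these would typically become a separate binary feature.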