
Table 2 Token classes derived from SNOMED CT concept descriptions.

From: Building a biomedical tokenizer using the token lattice design pattern and the adapted Viterbi algorithm

Class               Examples
Whitespace          (space)
Independents        [ ? )
Dash or Hyphen      ACHE - Acetylcholine
Alphabetic          Does or dental
Numeric             1500 1.2 10,000 III 1/2
Possessive          ’s
Substances          2-chloroaniline
Serotypes           O128:NM
Abbreviations       L.H. O/E
Acronyms            DIY
Lists               Paracetamol + caffeine
Range               C1-4
Functional names    H-987
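
To illustrate how a few of the classes in Table 2 might be recognized, the following is a minimal Python sketch. The regular expressions, the `TOKEN_CLASSES` list, and the `classify` function are all hypothetical: they were written only to match the example strings in the table and are not the rules used in the paper. The function returns every class whose pattern matches, since a string such as "III" can plausibly belong to more than one class.

```python
import re

# Hypothetical patterns, written only to fit the example strings in Table 2.
# They are NOT the paper's rules and cover only some of the classes.
TOKEN_CLASSES = [
    ("Whitespace", re.compile(r"\s+")),
    ("Possessive", re.compile(r"[’']s")),
    # Decimal, grouped, or fractional digits, or a run of Roman numeral letters.
    ("Numeric", re.compile(r"\d+(?:[.,/]\d+)*|[IVXLCDM]+")),
    ("Acronym", re.compile(r"[A-Z]{2,}")),
    # Letters separated by periods or slashes, e.g. L.H. or O/E.
    ("Abbreviation", re.compile(r"(?:[A-Za-z][./])+[A-Za-z]?\.?")),
    ("Alphabetic", re.compile(r"[A-Za-z]+")),
    ("Independent", re.compile(r"[\[\]()?]")),
]


def classify(token):
    """Return the names of all classes whose pattern matches the whole token."""
    return [name for name, pattern in TOKEN_CLASSES if pattern.fullmatch(token)]


if __name__ == "__main__":
    for example in ["1500", "1.2", "10,000", "III", "L.H.", "O/E", "DIY", "’s", "dental"]:
        print(repr(example), classify(example))
```

Under these assumed patterns, "III" matches the Numeric, Acronym, and Alphabetic patterns at once, while "2-chloroaniline" matches none of them. Classes such as Substances, Serotypes, Lists, and Range would need rules beyond this sketch.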
