Table 2 Token classes derived from SNOMED CT concept descriptions.

From: Building a biomedical tokenizer using the token lattice design pattern and the adapted Viterbi algorithm

| Class | Examples |
| --- | --- |
| Whitespace | |
| Independents | [ ? ) |
| Dash or Hyphen | ACHE - Acetylcholine |
| Alphabetic | Does or dental |
| Numeric | 1500 1.2 10,000 III 1/2 |
| Possessive | ’s |
| Substances | 2-chloroaniline |
| Serotypes | O128:NM |
| Abbreviations | L.H. O/E |
| Acronyms | DIY |
| Lists | Paracetamol + caffeine |
| Range | C1-4 |
| Functional names | H-987 |
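
As a rough illustration of how the classes in this table could be assigned to candidate tokens, the sketch below matches a string against one regular expression per class and returns every class that fits. The patterns and the `candidate_classes` helper are assumptions inferred only from the examples shown here; they are not the rules used in the paper, and the multi-token Lists class is left out.

```python
import re

# Hypothetical patterns, inferred only from the examples in the table above.
# They are NOT the rules from the paper; real SNOMED CT descriptions would
# need much broader coverage.
TOKEN_CLASS_PATTERNS = {
    "Whitespace": r"\s+",
    "Independents": r"[\[\]()?]",
    "Dash or Hyphen": r"-",
    "Alphabetic": r"[A-Za-z]+",
    "Numeric": r"\d+(?:[.,/]\d+)*|[IVXLCDM]+",  # 1500, 1.2, 10,000, 1/2, III
    "Possessive": r"['\u2019]s",
    "Substances": r"\d+-[a-z]+",                # 2-chloroaniline
    "Serotypes": r"[A-Z]\d+:[A-Z0-9]+",         # O128:NM
    "Abbreviations": r"(?:[A-Za-z]\.)+|[A-Za-z]/[A-Za-z]",  # L.H., O/E
    "Acronyms": r"[A-Z]{2,}",                   # DIY
    "Range": r"[A-Z]\d+-\d+",                   # C1-4
    "Functional names": r"[A-Z]+-\d+",          # H-987
}

_COMPILED = {name: re.compile(p) for name, p in TOKEN_CLASS_PATTERNS.items()}


def candidate_classes(text: str) -> list[str]:
    """Return every class whose pattern matches the whole string.

    More than one class may match (for example, "III" looks alphabetic,
    numeric, and like an acronym), so all candidates are returned rather
    than a single forced choice.
    """
    return [name for name, rx in _COMPILED.items() if rx.fullmatch(text)]
```

Returning all matching classes, rather than the first one, keeps the ambiguity visible: a later disambiguation step can then choose among the candidates in context.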