Table 1 Word lists used to generate features

From: A system for de-identifying medical message board text

| Dictionary | Source |
| --- | --- |
| Proper name | Natural language toolkit [18] and Deid system [9, 16] |
| Common word | Ispell, GNU spell-checker dictionary; inspired by Thomas et al. [8] |
| Stop word | Generic list of very common English words |
| Medical word | Deid system lexicon [9, 16] |
| Drug word | Generated from the Cerner Multum Drug Lexicon (Denver, CO) |
| Honorific | Compiled by hand (e.g., mr., mrs, dr.) |
| All user | Users that have posted to this message board, generated from “author” field of each message |
| User variant | Users who have posted to this particular thread, with variants of these names derived automatically (strip digits, split by delimiters/camel case/known names and words) |
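
The "User variant" row names three derivation steps (strip digits, split by delimiters or camel case, split on known names and words) without giving details. The following is a minimal sketch of one way such variants might be derived, not the system's actual implementation. The function name `username_variants`, the `known_words` and `min_len` parameters, the prefix/suffix matching rule, and the restriction to ASCII letters are all assumptions made for illustration.

```python
import re

# Matches camel-case pieces: "JohnSmith" -> "John", "Smith"; "XMLParser" -> "XML", "Parser".
# Assumption: usernames use ASCII letters only.
CAMEL_RE = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])")


def username_variants(username, known_words=frozenset(), min_len=2):
    """Return a set of lowercase variants of a username (hypothetical helper)."""
    variants = {username.lower()}

    # Step 1: strip digits, e.g. "maryjones1978" -> "maryjones".
    no_digits = re.sub(r"\d+", "", username)
    variants.add(no_digits.lower())

    # Step 2: split on delimiters (any non-alphanumeric character or underscore).
    for chunk in re.split(r"[\W_]+", no_digits):
        variants.add(chunk.lower())

        # Step 3: split each chunk on camel-case boundaries.
        for part in CAMEL_RE.findall(chunk):
            part = part.lower()
            variants.add(part)

            # Step 4: split on known names and words (assumed rule: a known
            # word that is a proper prefix or suffix of the part is cut off).
            for word in known_words:
                if part != word and part.startswith(word):
                    variants.update((word, part[len(word):]))
                elif part != word and part.endswith(word):
                    variants.update((word, part[:-len(word)]))

    # Drop empty strings and single-character fragments.
    return {v for v in variants if len(v) >= min_len}


print(sorted(username_variants("John_Smith42")))
# ['john', 'john_smith', 'john_smith42', 'smith']
print(sorted(username_variants("maryjones1978", known_words={"mary"})))
# ['jones', 'mary', 'maryjones', 'maryjones1978']
```

In a sketch like this, the known-word set would presumably come from the proper name and common word lists in the table; how the original system combines those lists is not stated here.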