Table 1 Topics extracted from our corpus using a plain LDA model

From: Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies

Topic 1: renal, ckd, cr, kidney, appt, lasix, disease, anemia, pth, iv

Topic 2: htn, lisinopril, hctz, bp, lipitor, asa, date, amlodipine, ldl, hpl

Topic 3: pulm, pulmonary, ct, chest, copd, lung, pfts, sob, cough, pna

  1. Words are ranked by their significance in the topic (i.e., in the first topic the most important word is “renal”). The first topic includes words pertaining to renal disease, the second to hypertension, and the third to symptoms and treatments related to the pulmonary system.
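
As a rough illustration of the ranking described in the note above, the sketch below orders the words of a single topic by weight, highest first. It assumes the topic is available as a word-to-weight mapping; the function name `top_words` and the weights in `toy_topic` are hypothetical, invented for this example, and are not values from the paper or from any particular LDA library.

```python
from typing import Dict, List


def top_words(topic_word_weights: Dict[str, float], n: int = 10) -> List[str]:
    """Return the n highest-weighted words of one topic, most important first."""
    ranked = sorted(topic_word_weights.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n]]


# Hypothetical weights, made up for illustration only (not from the paper).
toy_topic = {"kidney": 0.04, "renal": 0.09, "cr": 0.05, "ckd": 0.07}

print(top_words(toy_topic, n=3))  # ['renal', 'ckd', 'cr']
```

In practice the weights would come from the fitted model's topic-word distribution; how that distribution is exposed depends on the LDA implementation used.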