Table 1. Topics extracted from our corpus using a plain LDA model

From: Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies

Topic 1: renal, ckd, cr, kidney, appt, lasix, disease, anemia, pth, iv
Topic 2: htn, lisinopril, hctz, bp, lipitor, asa, date, amlodipine, ldl, hpl
Topic 3: pulm, pulmonary, ct, chest, copd, lung, pfts, sob, cough, pna
Note: Within each topic, words are listed in decreasing order of significance (e.g., the most important word in the first topic is “renal”). The first topic contains words pertaining to renal disease, the second to hypertension, and the third to symptoms and treatments related to the pulmonary system.
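The ranking described in the note can be illustrated with a minimal sketch. It assumes that a fitted topic model exposes, for each topic, a weight per vocabulary word; the helper name `top_words` and the weights in `toy_topic` are hypothetical, invented for illustration only, and are not values from the paper.

```python
def top_words(topic_word_weights, n=10):
    """Return the n highest-weighted words of one topic, most significant first.

    topic_word_weights: dict mapping word -> weight within a single topic.
    Ties are broken alphabetically so the output is deterministic.
    """
    ranked = sorted(topic_word_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


# Hypothetical weights for a single topic (illustrative only, not from the paper).
toy_topic = {"kidney": 0.04, "renal": 0.09, "cr": 0.05, "ckd": 0.07, "appt": 0.03}

print(top_words(toy_topic, n=3))  # ['renal', 'ckd', 'cr']
```

Each row of the table corresponds to one such call with n = 10, applied to the word weights of a single topic.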