Table 3 Collocation detection results in the different corpora

From: Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies

Measure (TMI/PMI) | All informative (redundant) - 8,557 notes | Last informative (non-redundant) - 1,247 notes | Reduced redundancy - 3,970 notes
--- | --- | --- | ---
Collocations | 5,649/15,814 | 2,082/2,527 | 3,590/6,034
Avg. number of patients per collocation | 32/18 | 74/66 | 48/37
% collocations that appear in notes of 3 patients or less | 32%/36% | 1.2%/1% | 6.2%/5.8%