Table 6 Redundancy in same patient note pairs

From: Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies

| Corpus | Redundancy of in-corpus note pairs | Number of pairs in sample |
| --- | --- | --- |
| All Informative | 29% | 2,000 |
| Selective-Fingerprinting, maximum similarity 0.33 | 12.70% | 380 |
| Selective-Fingerprinting, maximum similarity 0.25 | 9.80% | 305 |
| Selective-Fingerprinting, maximum similarity 0.2 | 9.30% | 263 |
  1. Amount of redundancy in a random sample of 2,000 same-patient note pairs within the corpora using different similarity thresholds.