Figure 4
From: Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies

Model fit as a function of the number of topics on the WSJ corpora. In (a) we compare the effect of corpus size on LDA: bigger corpora yield better fit. In (b) we examine the effect of redundancy: the doubled/trebled corpora reduce fit slightly, while the noisier WSJs5 performs almost as badly as training on the smaller WSJ-600 corpus.