Figure 4
From: Statistical modeling of biomedical corpora: mining the Caenorhabditis Genetic Center Bibliography for genes related to life span

The perplexity of the LDA, mixture of unigrams, and unigram models, estimated and evaluated on the CGC corpus. The perplexity of the test documents is plotted against the number of latent topics. The perplexity of the unigram model is constant because this model has no notion of latent topics.
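
To make the constant unigram curve concrete, here is a minimal sketch of held-out perplexity for a unigram model, computed as the exponential of the negative mean per-token log-likelihood. The function name `unigram_perplexity`, the additive smoothing parameter `alpha`, and the choice to include test-set words in the vocabulary are assumptions made for illustration. They are not taken from the article, and the sketch does not reproduce the article's estimation procedure.

```python
import math
from collections import Counter


def unigram_perplexity(train_docs, test_docs, alpha=1.0):
    """Held-out perplexity of a smoothed unigram model.

    train_docs, test_docs: lists of documents, each a list of word tokens.
    alpha: additive smoothing constant (an illustrative assumption).
    """
    # Word counts pooled over all training documents.
    counts = Counter(w for doc in train_docs for w in doc)

    # Assumption: the vocabulary also covers words seen only in the test
    # set, so every test token gets a nonzero smoothed probability.
    vocab = set(counts) | {w for doc in test_docs for w in doc}
    total = sum(counts.values()) + alpha * len(vocab)

    log_lik = 0.0
    n_tokens = 0
    for doc in test_docs:
        for w in doc:
            log_lik += math.log((counts[w] + alpha) / total)
            n_tokens += 1

    if n_tokens == 0:
        raise ValueError("test_docs contains no tokens")

    # Perplexity = exp(-(1/N) * sum of log p(w)) over the N test tokens.
    return math.exp(-log_lik / n_tokens)


if __name__ == "__main__":
    train = [["gene", "life"], ["span", "worm"]]
    test = [["gene", "span", "worm"]]
    print(unigram_perplexity(train, test))
```

Nothing in this computation depends on a topic count, which is why a unigram model traces a flat line when perplexity is plotted against the number of latent topics. Lower perplexity indicates that the model assigns higher probability to the held-out documents.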