
Table 6 Comparison of most frequent LDA top five topic terms and top five Theme-generated terms

From: Discovering themes in biomedical literature using a projection-based algorithm

| LDA term | Freq. in topics / Freq. in themes | Freq. in SNP | Theme term | Freq. in themes / Freq. in topics | Freq. in SNP |
|---|---|---|---|---|---|
| polymorphisms | 46/0 | 32,071 | cancer | 94/14 | 8,175 |
| gene | 45/0 | 34,735 | risk | 47/24 | 20,363 |
| genetic | 42/3 | 29,383 | patients | 40/37 | 21,422 |
| associated | 37/0 | 31,365 | diabetes | 39/7 | 3,594 |
| patients | 37/40 | 21,422 | schizophrenia | 36/4 | 1,806 |
| study | 36/0 | 32,116 | dna | 36/21 | 11,098 |
| association | 30/11 | 30,831 | genome-wide | 32/5 | 8,100 |
| disease | 29/17 | 15,968 | traits | 31/6 | 4,063 |
| analysis | 27/1 | 23,797 | method | 28/6 | 6,551 |
| receptor | 25/10 | 7,511 | populations | 27/11 | 7,962 |
| two | 24/0 | 17,683 | power | 26/1 | 2,171 |
| risk | 24/47 | 20,363 | data | 23/19 | 15,234 |
| results | 22/0 | 31,862 | loci | 23/4 | 7,006 |
| p | 22/11 | 25,037 | genome | 23/5 | 5,790 |
| dna | 21/36 | 11,098 | snps | 22/18 | 23,870 |
| genes | 20/14 | 19,411 | repair | 21/5 | 1,388 |
| data | 19/23 | 15,234 | sequencing | 21/4 | 6,596 |
| snps | 18/22 | 23,870 | disorder | 21/6 | 3,517 |
| polymorphism | 17/4 | 23,162 | haplotype | 21/5 | 8,933 |
| cell | 16/11 | 5,832 | expression | 21/10 | 9,020 |
  1. Column 1 lists the most frequent LDA terms. Column 2 gives the number of LDA topics and the number of themes that contain each term, and Column 3 gives the term's frequency in the SNP dataset. Columns 4-6 present the same information for the most frequent Theme-generated terms, with the theme count listed first.
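
The paired counts in Columns 2 and 5 can be illustrated with a minimal sketch. It assumes each topic or theme is represented by a list of its top five terms; the lists below are made-up toy data, not the paper's, and the names `count_term_occurrences` and `paired_frequency` are hypothetical helpers introduced only for this example.

```python
from collections import Counter


def count_term_occurrences(term_lists):
    """Count how many lists contain each term (a list counts once per term)."""
    counts = Counter()
    for terms in term_lists:
        # set() ensures a term repeated within one list is counted only once.
        counts.update(set(terms))
    return counts


# Toy, made-up top-five term lists (assumption: NOT the paper's data).
lda_topics = [
    ["gene", "polymorphisms", "risk", "patients", "study"],
    ["gene", "dna", "repair", "risk", "cell"],
]
themes = [
    ["cancer", "risk", "patients", "dna", "repair"],
]

topic_counts = count_term_occurrences(lda_topics)
theme_counts = count_term_occurrences(themes)


def paired_frequency(term):
    """Format a term's counts as 'topics/themes', as in Column 2 of the table."""
    # Counter returns 0 for terms it has never seen.
    return f"{topic_counts[term]}/{theme_counts[term]}"


print(paired_frequency("risk"))  # counts for a term found in both sets
print(paired_frequency("gene"))  # counts for a term found only in LDA topics
```

Swapping the two counters in `paired_frequency` would give the themes/topics ordering used in Column 5.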