Table 6 Comparison of most frequent LDA top five topic terms and top five Theme-generated terms

From: Discovering themes in biomedical literature using a projection-based algorithm

| LDA term | Freq. in topics / Freq. in themes | Freq. in SNP | Theme term | Freq. in themes / Freq. in topics | Freq. in SNP |
|---|---|---|---|---|---|
| polymorphisms | 46/0 | 32,071 | cancer | 94/14 | 8,175 |
| gene | 45/0 | 34,735 | risk | 47/24 | 20,363 |
| genetic | 42/3 | 29,383 | patients | 40/37 | 21,422 |
| associated | 37/0 | 31,365 | diabetes | 39/7 | 3,594 |
| patients | 37/40 | 21,422 | schizophrenia | 36/4 | 1,806 |
| study | 36/0 | 32,116 | dna | 36/21 | 11,098 |
| association | 30/11 | 30,831 | genome-wide | 32/5 | 8,100 |
| disease | 29/17 | 15,968 | traits | 31/6 | 4,063 |
| analysis | 27/1 | 23,797 | method | 28/6 | 6,551 |
| receptor | 25/10 | 7,511 | populations | 27/11 | 7,962 |
| two | 24/0 | 17,683 | power | 26/1 | 2,171 |
| risk | 24/47 | 20,363 | data | 23/19 | 15,234 |
| results | 22/0 | 31,862 | loci | 23/4 | 7,006 |
| p | 22/11 | 25,037 | genome | 23/5 | 5,790 |
| dna | 21/36 | 11,098 | snps | 22/18 | 23,870 |
| genes | 20/14 | 19,411 | repair | 21/5 | 1,388 |
| data | 19/23 | 15,234 | sequencing | 21/4 | 6,596 |
| snps | 18/22 | 23,870 | disorder | 21/6 | 3,517 |
| polymorphism | 17/4 | 23,162 | haplotype | 21/5 | 8,933 |
| cell | 16/11 | 5,832 | expression | 21/10 | 9,020 |

  1. Column 1 lists the most frequent LDA terms. Column 2 gives the number of LDA topics and the number of themes that contain the term, and Column 3 gives the frequency of the term in the SNP dataset. Columns 4-6 present the same information for the most frequent Theme-generated terms, with the number of themes listed before the number of topics.
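
The topic and theme counts described in the note can be reproduced with a simple tally. The following is a minimal sketch, assuming each topic or theme is available as a list of its top five terms; the function name `term_topic_counts` and the toy input are hypothetical and are not taken from the paper's data or code.

```python
from collections import Counter


def term_topic_counts(top_terms_per_topic):
    """Count, for each term, how many topics (or themes) list it among their top terms."""
    counts = Counter()
    for terms in top_terms_per_topic:
        # Use a set so a term is counted at most once per topic.
        counts.update(set(terms))
    return counts


# Hypothetical toy input: each inner list is one topic's top five terms.
toy_topics = [
    ["gene", "risk", "cancer", "patients", "study"],
    ["gene", "dna", "repair", "cancer", "risk"],
    ["diabetes", "risk", "patients", "gene", "loci"],
]

counts = term_topic_counts(toy_topics)
for term, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(term, n)
```

Running the same tally once over the LDA topics and once over the themes would give the two numbers shown in each "topics / themes" cell, under the assumption that only the top five terms of each topic or theme are counted.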