
Table 2 Comparison of the results on the lung cancer dataset using the three proposed topic model-derived clustering methods.

From: Topic modeling for cluster analysis of large biological and medical datasets

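| Methods | k (no. of clusters) | Cluster ID | Adenocarcinoma | Squamous cell carcinoma | No. of misclassified samples | NMI |
|---|---|---|---|---|---|---|
| Clustering based on feature selection | 2 | 1 | 42 | 11 | 22 | 0.2809 |
| | | 2 | 11 | 47 | | |
| | 3 | 1 | 40 | 8 | 21 | 0.2417 |
| | | 2 | 4 | 15 | | |
| | | 3 | 9 | 35 | | |
| | 4 | 1 | 37 | 8 | 18 | 0.2926 |
| | | 2 | 9 | 35 | | |
| | | 3 | 0 | 14 | | |
| | | 4 | 7 | 1 | | |
| Clustering based on highest topic assignment | 2 | 1 | 13 | 46 | 25 | 0.2296 |
| | | 2 | 40 | 12 | | |
| | 3 | 1 | 11 | 29 | 25 | 0.1847 |
| | | 2 | 37 | 9 | | |
| | | 3 | 5 | 20 | | |
| | 4 | 1 | 5 | 13 | 26 | 0.1744 |
| | | 2 | 13 | 26 | | |
| | | 3 | 1 | 12 | | |
| | | 4 | 34 | 7 | | |
| Clustering based on feature extraction | 2 | 1 | 13 | 47 | 24 | 0.2461 |
| | | 2 | 40 | 11 | | |
| | 3 | 1 | 8 | 34 | 24 | 0.2055 |
| | | 2 | 8 | 16 | | |
| | | 3 | 37 | 8 | | |
| | 4 | 1 | 7 | 6 | 25 | 0.1820 |
| | | 2 | 33 | 6 | | |
| | | 3 | 8 | 31 | | |
| | | 4 | 5 | 15 | | |

NMI: normalized mutual information. The misclassification count and NMI are reported once per clustering (per method and k).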
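The two summary columns can be recomputed from the cluster-by-class counts. The sketch below is an illustration only. Its names (`TABLE2`, `misclassified`, `nmi`) are hypothetical. It assumes that a sample counts as misclassified when it falls outside its cluster's majority class. This rule is inferred from the numbers, not stated in the table, but it reproduces all nine reported counts. It also assumes NMI is normalized by the arithmetic mean of the cluster and class entropies, which this table does not specify. For the feature-selection k = 2 clustering, the cluster sizes equal the class sizes (53 and 58), so any standard normalization gives about 0.2810, in line with the reported 0.2809. For the other rows, a different normalization choice may shift the printed NMI away from Table 2.

```python
import math

# Cluster-by-class counts transcribed from Table 2; each tuple is one cluster,
# given as (adenocarcinoma, squamous cell carcinoma).
TABLE2 = {
    ("feature selection", 2): [(42, 11), (11, 47)],
    ("feature selection", 3): [(40, 8), (4, 15), (9, 35)],
    ("feature selection", 4): [(37, 8), (9, 35), (0, 14), (7, 1)],
    ("highest topic assignment", 2): [(13, 46), (40, 12)],
    ("highest topic assignment", 3): [(11, 29), (37, 9), (5, 20)],
    ("highest topic assignment", 4): [(5, 13), (13, 26), (1, 12), (34, 7)],
    ("feature extraction", 2): [(13, 47), (40, 11)],
    ("feature extraction", 3): [(8, 34), (8, 16), (37, 8)],
    ("feature extraction", 4): [(7, 6), (33, 6), (8, 31), (5, 15)],
}


def misclassified(table):
    """Assumed rule: label each cluster by its majority class; count the rest."""
    return sum(sum(row) - max(row) for row in table)


def entropy(counts, n):
    """Shannon entropy (natural log) of a distribution given as raw counts."""
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


def nmi(table):
    """Assumed normalization: I(clusters; classes) / mean(H(clusters), H(classes))."""
    n = sum(sum(row) for row in table)
    cluster_sizes = [sum(row) for row in table]
    class_sizes = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, nij in enumerate(row):
            if nij > 0:
                mi += nij / n * math.log(nij * n / (cluster_sizes[i] * class_sizes[j]))
    denom = (entropy(cluster_sizes, n) + entropy(class_sizes, n)) / 2
    return mi / denom if denom > 0 else 0.0


if __name__ == "__main__":
    for (method, k), table in TABLE2.items():
        # Column totals: every clustering should cover the same 53 + 58 samples.
        totals = tuple(sum(col) for col in zip(*table))
        print(f"{method:25s} k={k}  totals={totals}  "
              f"misclassified={misclassified(table):2d}  NMI={nmi(table):.4f}")
```

The column totals are a quick consistency check on the transcription. Each of the nine clusterings partitions the same 111 samples (53 adenocarcinoma, 58 squamous cell carcinoma).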