Table 3 Comparison of results on the lung cancer dataset for the proposed topic model-derived clustering based on feature selection and two conventional methods, k-means and PCA (10 features) followed by k-means.

From: Topic modeling for cluster analysis of large biological and medical datasets

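| Methods | k | Cluster ID | Adenocarcinoma | Squamous cell carcinoma | No. of misclassified samples | NMI |
|---|---|---|---|---|---|---|
| Topic model-derived clustering based on feature selection | 2 | 1 | 42 | 11 | 22 | 0.2809 |
| | | 2 | 11 | 47 | | |
| | 3 | 1 | 40 | 8 | 21 | 0.2417 |
| | | 2 | 4 | 15 | | |
| | | 3 | 9 | 35 | | |
| | 4 | 1 | 37 | 8 | 18 | 0.2926 |
| | | 2 | 9 | 35 | | |
| | | 3 | 0 | 14 | | |
| | | 4 | 7 | 1 | | |
| k-means | 2 | 1 | 41 | 12 | 24 | 0.2461 |
| | | 2 | 12 | 46 | | |
| | 3 | 1 | 8 | 35 | 31 | 0.1365 |
| | | 2 | 27 | 17 | | |
| | | 3 | 18 | 6 | | |
| | 4 | 1 | 6 | 14 | 25 | 0.1602 |
| | | 2 | 22 | 6 | | |
| | | 3 | 18 | 6 | | |
| | | 4 | 7 | 32 | | |
| PCA (10 features) + k-means | 2 | 1 | 12 | 46 | 24 | 0.2461 |
| | | 2 | 41 | 12 | | |
| | 3 | 1 | 8 | 35 | 31 | 0.1456 |
| | | 2 | 22 | 6 | | |
| | | 3 | 23 | 17 | | |
| | 4 | 1 | 16 | 5 | 25 | 0.1605 |
| | | 2 | 6 | 14 | | |
| | | 3 | 7 | 32 | | |
| | | 4 | 24 | 7 | | |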