Table 3 Comparison of results on the lung cancer dataset for the proposed topic model-derived clustering based on feature selection and two conventional approaches: k-means alone and PCA followed by k-means.

From: Topic modeling for cluster analysis of large biological and medical datasets

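| Methods | k | Cluster ID | Adenocarcinoma | Squamous cell carcinoma | No. of misclassified samples | NMI |
|---|---|---|---|---|---|---|
| Topic model-derived clustering based on feature selection | 2 | 1 | 42 | 11 | 22 | 0.2809 |
| | | 2 | 11 | 47 | | |
| | 3 | 1 | 40 | 8 | 21 | 0.2417 |
| | | 2 | 4 | 15 | | |
| | | 3 | 9 | 35 | | |
| | 4 | 1 | 37 | 8 | 18 | 0.2926 |
| | | 2 | 9 | 35 | | |
| | | 3 | 0 | 14 | | |
| | | 4 | 7 | 1 | | |
| k-means | 2 | 1 | 41 | 12 | 24 | 0.2461 |
| | | 2 | 12 | 46 | | |
| | 3 | 1 | 8 | 35 | 31 | 0.1365 |
| | | 2 | 27 | 17 | | |
| | | 3 | 18 | 6 | | |
| | 4 | 1 | 6 | 14 | 25 | 0.1602 |
| | | 2 | 22 | 6 | | |
| | | 3 | 18 | 6 | | |
| | | 4 | 7 | 32 | | |
| PCA (10 features) + k-means | 2 | 1 | 12 | 46 | 24 | 0.2461 |
| | | 2 | 41 | 12 | | |
| | 3 | 1 | 8 | 35 | 31 | 0.1456 |
| | | 2 | 22 | 6 | | |
| | | 3 | 23 | 17 | | |
| | 4 | 1 | 16 | 5 | 25 | 0.1605 |
| | | 2 | 6 | 14 | | |
| | | 3 | 7 | 32 | | |
| | | 4 | 24 | 7 | | |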
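
The two summary columns can be recomputed from the per-cluster counts. The sketch below is illustrative and not taken from the article: the function names are hypothetical, the misclassification count assumes each cluster is labeled by its majority class (which is consistent with every row of the table), and the NMI normalization by the geometric mean of the two entropies is an assumption, since the table does not state which normalization was used.

```python
import math


def misclassified(table):
    """Majority-vote error: within each cluster, every sample that is
    not in the cluster's largest class counts as misclassified."""
    return sum(sum(row) - max(row) for row in table)


def nmi(table):
    """Normalized mutual information between cluster IDs (rows) and
    class labels (columns). Normalizing by sqrt(H_rows * H_cols) is an
    assumption; other normalizations exist."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]

    def entropy(totals):
        return -sum(t / n * math.log(t / n) for t in totals if t > 0)

    # Empty cells contribute nothing to the mutual information.
    mi = sum(
        c / n * math.log(c * n / (row_tot[i] * col_tot[j]))
        for i, row in enumerate(table)
        for j, c in enumerate(row)
        if c > 0
    )
    return mi / math.sqrt(entropy(row_tot) * entropy(col_tot))


# Rows are clusters; columns are (adenocarcinoma, squamous cell carcinoma).
topic_k2 = [[42, 11], [11, 47]]
topic_k3 = [[40, 8], [4, 15], [9, 35]]
topic_k4 = [[37, 8], [9, 35], [0, 14], [7, 1]]

print(misclassified(topic_k2), misclassified(topic_k3), misclassified(topic_k4))
# prints: 22 21 18
print(nmi(topic_k2))
```

For the k = 2 row of the topic model method, the cluster sizes and class sizes happen to be identical (53 and 58), so the common NMI normalizations coincide and the result is about 0.281, in line with the reported 0.2809. For other rows the value depends on the normalization chosen and may differ from the table.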