Table 2 Comparison of the results on the lung cancer dataset using the three proposed topic model-derived clustering methods.

From: Topic modeling for cluster analysis of large biological and medical datasets

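| Method | k | Cluster ID | Adenocarcinoma | Squamous cell carcinoma | No. of misclassified samples | NMI |
|---|---|---|---|---|---|---|
| Clustering based on feature selection | 2 | 1 | 42 | 11 | 22 | 0.2809 |
| | | 2 | 11 | 47 | | |
| | 3 | 1 | 40 | 8 | 21 | 0.2417 |
| | | 2 | 4 | 15 | | |
| | | 3 | 9 | 35 | | |
| | 4 | 1 | 37 | 8 | 18 | 0.2926 |
| | | 2 | 9 | 35 | | |
| | | 3 | 0 | 14 | | |
| | | 4 | 7 | 1 | | |
| Clustering based on highest topic assignment | 2 | 1 | 13 | 46 | 25 | 0.2296 |
| | | 2 | 40 | 12 | | |
| | 3 | 1 | 11 | 29 | 25 | 0.1847 |
| | | 2 | 37 | 9 | | |
| | | 3 | 5 | 20 | | |
| | 4 | 1 | 5 | 13 | 26 | 0.1744 |
| | | 2 | 13 | 26 | | |
| | | 3 | 1 | 12 | | |
| | | 4 | 34 | 7 | | |
| Clustering based on feature extraction | 2 | 1 | 13 | 47 | 24 | 0.2461 |
| | | 2 | 40 | 11 | | |
| | 3 | 1 | 8 | 34 | 24 | 0.2055 |
| | | 2 | 8 | 16 | | |
| | | 3 | 37 | 8 | | |
| | 4 | 1 | 7 | 6 | 25 | 0.1820 |
| | | 2 | 33 | 6 | | |
| | | 3 | 8 | 31 | | |
| | | 4 | 5 | 15 | | |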
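
The two summary columns of Table 2 (No. of misclassified samples and NMI) can be recomputed from the per-cluster counts. The sketch below is illustrative and not the authors' code. It assumes that a sample counts as misclassified when it does not belong to the majority class of its cluster, and that NMI is the mutual information divided by the arithmetic mean of the cluster and class entropies. The table does not state either definition; both are assumptions that appear consistent with the reported values. The function names `misclassified` and `nmi` are hypothetical.

```python
import math


def misclassified(table):
    """Count samples that fall outside the majority class of their cluster.

    `table` is a list of rows, one per cluster, each row holding the
    counts [adenocarcinoma, squamous cell carcinoma].
    Assumption: "misclassified" means "not in the cluster's majority class".
    """
    return sum(sum(row) - max(row) for row in table)


def entropy(totals, n):
    """Shannon entropy (natural log) of a partition given its group sizes."""
    return -sum((t / n) * math.log(t / n) for t in totals if t > 0)


def nmi(table):
    """Normalized mutual information between clusters (rows) and classes (columns).

    Assumption: normalization by the arithmetic mean of the two entropies,
    i.e. NMI = 2 * I(C; K) / (H(C) + H(K)).
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    mutual_info = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute nothing
                mutual_info += (count / n) * math.log(
                    count * n / (row_totals[i] * col_totals[j])
                )

    return 2.0 * mutual_info / (entropy(row_totals, n) + entropy(col_totals, n))


# Feature selection, k = 2 (first block of Table 2).
feature_selection_k2 = [[42, 11], [11, 47]]
print(misclassified(feature_selection_k2))  # 22
print(round(nmi(feature_selection_k2), 4))
```

With two equally sized partitions (as in the k = 2 rows) every common NMI normalization gives the same value, so the choice of normalization only matters for k = 3 and k = 4.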