Table 2 Comparison of the clustering results

From: A novel procedure on next generation sequencing data analysis using text mining algorithm

Method^a             ARI      NMI
k-means              0.9232   0.8861
PCA(2) + k-means     0.9202   0.8547
PCA(5) + k-means     0.9322   0.8547
LDA(5)               0.5325   0.4209
LDA(30)              0.8634   0.7946
LDA(5) + k-means     0.4301   0.713
LDA(30) + k-means    0.9543   0.912
  1. ^a k-means: traditional k-means applied to the VSM format of the dataset, using Hamming distance
  2. PCA(2) + k-means: traditional k-means applied to the 2-feature matrix obtained by PCA
  3. PCA(5) + k-means: traditional k-means applied to the 5-feature matrix obtained by PCA
  4. LDA(5): “highest probable topic assignment” by LDA with 5 topics
  5. LDA(30): “highest probable topic assignment” by LDA with 30 topics
  6. LDA(5) + k-means: traditional k-means applied to the sample-topic matrix obtained by LDA with 5 topics
  7. LDA(30) + k-means: traditional k-means applied to the sample-topic matrix obtained by LDA with 30 topics
  8. Note that PCA(2) and PCA(5) exhibited better clustering quality than PCA(10) and PCA(30), so only PCA(2) and PCA(5) are shown in the table
  9. Bold numbers indicate the best results among the methods: LDA(30) + k-means, for both ARI and NMI
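
The two scores in the table, and the “highest probable topic assignment” used for the LDA(5) and LDA(30) rows, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the toy sample-topic matrix and labels are invented for illustration, and NMI is normalized by the arithmetic mean of the two entropies, which is an assumption because the table does not state which normalization was used.

```python
from collections import Counter
from math import comb, log


def highest_probable_topic(sample_topic):
    """Assign each sample to its most probable topic (row-wise argmax)."""
    return [max(range(len(row)), key=row.__getitem__) for row in sample_topic]


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index (ARI) between two labelings of the same samples."""
    n = len(truth)
    if n < 2:
        return 1.0  # no sample pairs to compare
    # Pair counts within contingency cells, true classes, and predicted clusters.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings have a single cluster
    return (sum_cells - expected) / (max_index - expected)


def normalized_mutual_info(truth, pred):
    """NMI normalized by the arithmetic mean of the two entropies (assumed)."""
    n = len(truth)
    ct, cp = Counter(truth), Counter(pred)
    mi = sum(
        c / n * log(c * n / (ct[t] * cp[p]))
        for (t, p), c in Counter(zip(truth, pred)).items()
    )
    h_truth = -sum(c / n * log(c / n) for c in ct.values())
    h_pred = -sum(c / n * log(c / n) for c in cp.values())
    mean_entropy = (h_truth + h_pred) / 2
    return mi / mean_entropy if mean_entropy > 0 else 1.0


# Toy sample-topic matrix (6 samples, 3 topics); values are illustrative only.
sample_topic = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.5, 0.1, 0.4],
]
true_labels = [0, 0, 1, 1, 2, 2]

clusters = highest_probable_topic(sample_topic)
print("clusters:", clusters)
print("ARI:", adjusted_rand_index(true_labels, clusters))
print("NMI:", normalized_mutual_info(true_labels, clusters))
```

Both scores depend only on how samples are grouped, not on the label values themselves, so a clustering that matches the true classes up to renaming scores 1 on both. For the “+ k-means” rows, the same scoring would be applied to the k-means cluster labels rather than to the argmax of the sample-topic matrix.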