Table 2 Comparison of the clustering results

From: A novel procedure on next generation sequencing data analysis using text mining algorithm

| Method^a | ARI | NMI |
| --- | --- | --- |
| k-means | 0.9232 | 0.8861 |
| PCA(2) + k-means | 0.9202 | 0.8547 |
| PCA(5) + k-means | 0.9322 | 0.8547 |
| LDA(5) | 0.5325 | 0.4209 |
| LDA(30) | 0.8634 | 0.7946 |
| LDA(5) + k-means | 0.4301 | 0.713 |
| LDA(30) + k-means | **0.9543** | **0.912** |

  1. ^a k-means: traditional k-means applied to the VSM (vector space model) representation of the dataset, using Hamming distance
  2. PCA(2) + k-means: traditional k-means applied to the 2-feature matrix obtained by PCA
  3. PCA(5) + k-means: traditional k-means applied to the 5-feature matrix obtained by PCA
  4. LDA(5): “highest probable topic assignment” from LDA with 5 topics
  5. LDA(30): “highest probable topic assignment” from LDA with 30 topics
  6. LDA(5) + k-means: traditional k-means applied to the sample-topic matrix from LDA with 5 topics
  7. LDA(30) + k-means: traditional k-means applied to the sample-topic matrix from LDA with 30 topics
  8. Note that PCA(2) and PCA(5) exhibited better clustering quality than PCA(10) and PCA(30), so only these two are shown in the table
  9. Bold numbers indicate the best results among various methods
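
The two scores reported in the table, and the “highest probable topic assignment” rule used for the LDA(5) and LDA(30) rows, can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the authors' code: the function names and the toy sample-topic matrix are hypothetical, and the NMI here is normalized by the arithmetic mean of the two entropies, which is one common convention that the table does not confirm.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(truth, pred):
    """ARI computed from the contingency table of two labelings (needs >= 2 samples)."""
    n = len(truth)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case: both labelings are trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(truth, pred):
    """NMI, normalized by the mean of the two entropies (an assumed convention)."""
    n = len(truth)
    rows, cols = Counter(truth), Counter(pred)
    mi = sum(
        c / n * log(c * n / (rows[a] * cols[b]))
        for (a, b), c in Counter(zip(truth, pred)).items()
    )
    denom = (entropy(truth) + entropy(pred)) / 2
    return mi / denom if denom > 0 else 1.0


def highest_probable_topic(sample_topic):
    """Assign each sample (row) to the index of its largest topic weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in sample_topic]


# Hypothetical toy data: 6 samples, 2 true groups, 3 topics.
truth = [0, 0, 0, 1, 1, 1]
sample_topic = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
    [0.1, 0.2, 0.7],
]
pred = highest_probable_topic(sample_topic)
ari = adjusted_rand_index(truth, pred)
nmi = normalized_mutual_info(truth, pred)
print(f"ARI = {ari:.4f}, NMI = {nmi:.4f}")
```

The same two scoring functions apply unchanged to the "+ k-means" rows: there the predicted labels would come from running k-means on the sample-topic (or PCA) matrix rather than from the per-row maximum.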