| Method^{a} | ARI | NMI |
|---|---|---|
| kmeans | 0.9232 | 0.8861 |
| PCA(2) + kmeans | 0.9202 | 0.8547 |
| PCA(5) + kmeans | 0.9322 | 0.8547 |
| LDA(5) | 0.5325 | 0.4209 |
| LDA(30) | 0.8634 | 0.7946 |
| LDA(5) + kmeans | 0.4301 | 0.713 |
| LDA(30) + kmeans | 0.9543 | 0.912 |


^{a} kmeans: traditional kmeans applied to the VSM representation of the dataset, using Hamming distance
 PCA(2) + kmeans: traditional kmeans applied to the 2-feature matrix obtained by PCA
 PCA(5) + kmeans: traditional kmeans applied to the 5-feature matrix obtained by PCA
 LDA(5): “highest probable topic assignment” by LDA with 5 topics
 LDA(30): “highest probable topic assignment” by LDA with 30 topics
 LDA(5) + kmeans: traditional kmeans applied to the sample-topic matrix obtained by LDA with 5 topics
 LDA(30) + kmeans: traditional kmeans applied to the sample-topic matrix obtained by LDA with 30 topics
 Note that PCA(2) and PCA(5) yielded better clustering quality than PCA(10) and PCA(30), so only PCA(2) and PCA(5) are shown in the table
 Bold numbers indicate the best result for each metric among the methods (LDA(30) + kmeans for both ARI and NMI)
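
The two scores reported in the table, and the “highest probable topic assignment” used for the LDA(5) and LDA(30) rows, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the toy sample-topic matrix are hypothetical, and the arithmetic-mean normalization of NMI is an assumption, since the source does not state which normalization was used.

```python
from collections import Counter
from math import comb, log


def hard_assign(sample_topic):
    """Return, for each sample (row), the index of its most probable topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in sample_topic]


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index of two labelings (assumes at least two samples)."""
    n = len(truth)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case: both labelings are trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(truth, pred):
    """NMI with arithmetic-mean normalization (an assumption, see lead-in)."""
    n = len(truth)
    count_t, count_p = Counter(truth), Counter(pred)
    mi = sum(
        c / n * log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in Counter(zip(truth, pred)).items()
    )
    denom = (entropy(truth) + entropy(pred)) / 2
    return mi / denom if denom > 0 else 1.0


# Toy data (hypothetical): 6 samples, 2 topics, known classes in `truth`.
truth = [0, 0, 0, 1, 1, 1]
sample_topic = [
    [0.8, 0.2], [0.7, 0.3], [0.4, 0.6],
    [0.1, 0.9], [0.2, 0.8], [0.3, 0.7],
]
pred = hard_assign(sample_topic)  # analogous to the LDA(k) rows in the table
print("ARI:", adjusted_rand_index(truth, pred))
print("NMI:", normalized_mutual_info(truth, pred))
```

For the “+ kmeans” rows, `pred` would instead be the cluster labels returned by kmeans on the corresponding feature matrix; the scoring step stays the same. Both scores are invariant to how cluster labels are numbered, which is why they can compare clusterings against known classes directly.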