Method^a | ARI | NMI |
---|---|---|
k-means | 0.9232 | 0.8861 |
PCA(2) + k-means | 0.9202 | 0.8547 |
PCA(5) + k-means | 0.9322 | 0.8547 |
LDA(5) | 0.5325 | 0.4209 |
LDA(30) | 0.8634 | 0.7946 |
LDA(5) + k-means | 0.4301 | 0.713 |
LDA(30) + k-means | **0.9543** | **0.912** |
^a Notes:
- k-means: traditional k-means applied to the VSM representation of the dataset, using Hamming distance
- PCA(2) + k-means: traditional k-means applied to the 2-feature matrix obtained by PCA
- PCA(5) + k-means: traditional k-means applied to the 5-feature matrix obtained by PCA
- LDA(5): “highest probable topic assignment” (each sample is assigned to its most probable topic) by LDA with 5 topics
- LDA(30): “highest probable topic assignment” (each sample is assigned to its most probable topic) by LDA with 30 topics
- LDA(5) + k-means: traditional k-means applied to the sample-topic matrix produced by LDA with 5 topics
- LDA(30) + k-means: traditional k-means applied to the sample-topic matrix produced by LDA with 30 topics
- PCA(2) and PCA(5) yielded better clustering quality than PCA(10) and PCA(30), and are therefore the PCA settings shown in the table
- Bold numbers indicate the best result among the methods
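
For readers unfamiliar with the two columns, the following is a minimal sketch of how ARI (adjusted Rand index) and NMI (normalized mutual information) can be computed from a reference labeling and a cluster assignment. It is not the authors' evaluation code. The function names are hypothetical, and the NMI normalization (arithmetic mean of the two entropies) is an assumption, since the table does not state which variant was used.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """ARI computed from pair counts of the contingency table."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare
    contingency = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)
    col_sums = Counter(labels_pred)

    # Number of sample pairs that fall together in a cell, a row, a column
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster on both sides
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def normalized_mutual_info(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization (an assumption, see lead-in)."""
    n = len(labels_true)
    contingency = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)
    col_sums = Counter(labels_pred)

    def entropy(counts):
        return -sum((c / n) * log(c / n) for c in counts.values())

    # Mutual information summed over the non-empty contingency cells
    mutual_info = sum(
        (c / n) * log(c * n / (row_sums[t] * col_sums[p]))
        for (t, p), c in contingency.items()
    )
    denom = (entropy(row_sums) + entropy(col_sums)) / 2
    if denom == 0:  # both labelings consist of a single cluster
        return 1.0
    return mutual_info / denom


# Illustrative toy labels (not data from the table)
truth = [0, 0, 0, 1, 1, 1]
clusters = [1, 1, 1, 0, 0, 0]  # same grouping, cluster IDs swapped
print(adjusted_rand_index(truth, clusters))
print(normalized_mutual_info(truth, clusters))
```

Both metrics depend only on how samples are grouped, not on the cluster IDs, so the relabeled grouping in the toy example scores 1.0 on both (up to floating-point rounding). ARI is corrected for chance and can be negative, whereas NMI lies between 0 and 1.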