A novel procedure on next generation sequencing data analysis using text mining algorithm

BMC Bioinformatics

Table 2 Comparison of the clustering results

^a k-means: traditional k-means applied to the VSM format of the dataset, using Hamming distance
PCA(2) + k-means: traditional k-means applied to the 2-feature matrix obtained by PCA
PCA(5) + k-means: traditional k-means applied to the 5-feature matrix obtained by PCA
LDA(5): “highest probable topic assignment” by LDA with 5 topics
LDA(30): “highest probable topic assignment” by LDA with 30 topics
LDA(5) + k-means: traditional k-means applied to the sample-topic matrix produced by LDA with 5 topics
LDA(30) + k-means: traditional k-means applied to the sample-topic matrix produced by LDA with 30 topics
Note that PCA(2) and PCA(5) exhibited better clustering quality than PCA(10) and PCA(30), so they are the PCA settings shown in the table
Bold numbers indicate the best results among various methods
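
The difference between the method variants listed above may be easier to see in code. The following is a minimal pure-Python sketch, not the authors' implementation. The toy matrices `vsm` and `theta`, the helper names, the choice of the first k points as initial centroids, and the component-wise majority update used with Hamming distance are all assumptions made for illustration. The PCA variants would follow the same pattern, passing the PCA-projected matrix to `kmeans` instead.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def hamming(a: Vector, b: Vector) -> float:
    """Number of positions at which two equal-length vectors differ."""
    return float(sum(x != y for x, y in zip(a, b)))


def sq_euclidean(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_center(members: List[Vector]) -> List[float]:
    """Component-wise mean, the usual k-means centroid update."""
    return [sum(col) / len(members) for col in zip(*members)]


def majority_center(members: List[Vector]) -> List[int]:
    """Component-wise majority vote for binary vectors (ties go to 1).
    Assumed here as the centroid update paired with Hamming distance."""
    return [1 if 2 * sum(col) >= len(members) else 0 for col in zip(*members)]


def kmeans(points: List[Vector], k: int,
           dist: Callable[[Vector, Vector], float],
           center: Callable[[List[Vector]], List[float]],
           max_iter: int = 50) -> List[int]:
    """Lloyd-style k-means with a pluggable distance and centroid update.
    Centroids start at the first k points (arbitrary but deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Recompute each non-empty cluster's centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = center(members)
    return labels


def argmax_topic(row: Vector) -> int:
    """'Highest probable topic assignment': index of the largest topic weight."""
    return max(range(len(row)), key=lambda t: row[t])


# Hypothetical binary VSM matrix (rows = samples, columns = terms).
vsm = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
# Hypothetical sample-topic matrix, as an LDA fit with 3 topics might return.
theta = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]

vsm_kmeans = kmeans(vsm, 2, hamming, majority_center)      # "k-means"
lda_argmax = [argmax_topic(row) for row in theta]          # "LDA(n)"
lda_kmeans = kmeans(theta, 2, sq_euclidean, mean_center)   # "LDA(n) + k-means"

print(vsm_kmeans, lda_argmax, lda_kmeans)
# [0, 0, 1, 1] [0, 0, 2, 2] [0, 0, 1, 1]
```

In this sketch the "LDA(n)" variant can produce at most as many clusters as there are topics, and its labels are topic indices, whereas "LDA(n) + k-means" treats each row of topic proportions as a feature vector, so the number of clusters is chosen independently of the number of topics.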

ISSN: 1471-2105