BMC Bioinformatics

Table 4 K-means clustering accuracy and running time on the SIDER2 dataset

From: A heuristic approach to determine an appropriate number of topics in topic modeling

T	5	10	20	30	40	50
Purity** (k = 20)	0.41	0.44	0.53	0.53	0.53	0.58
Purity** (k = 30)	0.41	0.44	0.56	0.50	0.54	0.60
Time (ms)	43,378	45,233	48,252	49,278	50,493	51,443
T	60	70	80	90	100
Purity** (k = 20)	0.59	0.55	0.57	0.56	0.54
Purity** (k = 30)	0.59	0.57	0.57	0.56	0.56
Time (ms)	52,526	52,577	54,298	54,468	54,608

**The purity of each cluster is calculated as the ratio of correctly classified drugs to all drugs in that cluster; the dataset contains 996 drugs in total. The values in the table are the average purities of the k clusters obtained for each number of topics T.
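The purity calculation described in the footnote can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that a drug counts as "correctly classified" when it carries the most common true label within its cluster, and the function names `cluster_purities` and `average_purity` are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purities(cluster_ids, true_labels):
    """Return {cluster_id: purity} for every cluster.

    Assumption: the purity of a cluster is the share of its members
    that carry the cluster's most common true label.
    """
    members = defaultdict(list)
    for cluster_id, label in zip(cluster_ids, true_labels):
        members[cluster_id].append(label)

    purities = {}
    for cluster_id, labels in members.items():
        # Size of the largest group of identically labeled members.
        majority_count = Counter(labels).most_common(1)[0][1]
        purities[cluster_id] = majority_count / len(labels)
    return purities


def average_purity(cluster_ids, true_labels):
    """Unweighted mean of the per-cluster purities (one value per table cell)."""
    purities = cluster_purities(cluster_ids, true_labels)
    return sum(purities.values()) / len(purities)
```

For example, with cluster assignments `[0, 0, 0, 1, 1]` and true labels `["a", "a", "b", "b", "b"]`, cluster 0 has purity 2/3 and cluster 1 has purity 1, so the average purity is 5/6. Note that this unweighted mean over clusters differs from the size-weighted purity often reported elsewhere; the footnote's wording suggests the unweighted form.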


ISSN: 1471-2105
