Table 4 K-means clustering accuracy and running time on the SIDER2 dataset

From: A heuristic approach to determine an appropriate number of topics in topic modeling

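| T | 5 | 10 | 20 | 30 | 40 | 50 |
|---|---|---|---|---|---|---|
| Purity** (k = 20) | 0.41 | 0.44 | 0.53 | 0.53 | 0.53 | 0.58 |
| Purity (k = 30) | 0.41 | 0.44 | 0.56 | 0.50 | 0.54 | 0.60 |
| Time (ms) | 43,378 | 45,233 | 48,252 | 49,278 | 50,493 | 51,443 |

| T | 60 | 70 | 80 | 90 | 100 |
|---|---|---|---|---|---|
| Purity (k = 20) | 0.59 | 0.55 | 0.57 | 0.56 | 0.54 |
| Purity (k = 30) | 0.59 | 0.57 | 0.57 | 0.56 | 0.56 |
| Time (ms) | 52,526 | 52,577 | 54,298 | 54,468 | 54,608 |
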
  1. **Purity of each cluster is calculated as the ratio of correctly classified drugs to all drugs in the cluster (the dataset contains 996 drugs in total). The values in the table are the average purities of the k clusters obtained for each topic model.
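
The purity described in the footnote can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that a "correctly classified" drug is one whose reference label matches the most frequent label in its cluster, and the function name `average_cluster_purity` and its arguments are hypothetical.

```python
from collections import Counter, defaultdict


def average_cluster_purity(cluster_ids, true_labels):
    """Return the mean per-cluster purity.

    Assumption (not stated in the source): an item counts as correctly
    classified when its label equals the majority label of its cluster.
    """
    # Group the reference labels by the cluster each item was assigned to.
    members = defaultdict(list)
    for cluster_id, label in zip(cluster_ids, true_labels):
        members[cluster_id].append(label)

    # Purity of one cluster = size of its majority class / cluster size.
    purities = []
    for labels in members.values():
        majority_count = Counter(labels).most_common(1)[0][1]
        purities.append(majority_count / len(labels))

    # Unweighted average over the clusters, as the footnote describes.
    return sum(purities) / len(purities)


# Toy example: cluster 0 holds labels a, a, b; cluster 1 holds c, c.
example = average_cluster_purity([0, 0, 0, 1, 1], ["a", "a", "b", "c", "c"])
```

In the toy example, cluster 0 has purity 2/3 and cluster 1 has purity 1, so the average is 5/6. Averaging per cluster gives small clusters the same weight as large ones; a size-weighted variant would instead divide the total number of majority-label items by the total number of items.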