
Table 1 The accuracy rate of identifying the true number of clusters when ρ = 0, B = 1 and p_G = 0.2

From: A novel pathway-based distance score enhances assessment of disease heterogeneity in gene expression

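| Method | Distance | δ = 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 | 1.1 | 1.2 | 1.3 | 1.4 | 1.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HC | Euclid All | 13% | 13% | 10% | 8% | 7% | 7% | 2% | 6% | 1% | 10% | 7% |
| HC | Euclid KEGG | 6% | 8% | 2% | 3% | 4% | 2% | 3% | 19% | 35% | 55% | 72% |
| HC | Path KEGG | 22% | 37% | 34% | 38% | 45% | 61% | 75% | 89% | 98% | 99% | 100% |
| Kmeans | Euclid All | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Kmeans | Euclid KEGG | 0% | 0% | 0% | 0% | 0% | 0% | 3% | 19% | 39% | 54% | 77% |
| Kmeans | Path KEGG | 19% | 50% | 77% | 92% | 97% | 97% | 100% | 100% | 100% | 99% | 100% |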

  1. When there is no correlation between genes, the table shows, for each value of δ, the percentage of simulated data sets for which the given distance identifies 3 as the optimal number of clusters based on the connectivity criterion. Both hierarchical tree (HC) and K-means (Kmeans) were used as clustering methods.
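The accuracy rate described in the note is a simple proportion, and a minimal sketch of that arithmetic may help. The function name `accuracy_rate`, its parameters, and the example selections below are hypothetical, introduced only for illustration. They are not taken from the article, and the sketch does not reproduce the connectivity computation itself.

```python
from typing import Sequence


def accuracy_rate(selected_k: Sequence[int], true_k: int = 3) -> float:
    """Return the percentage of simulated data sets whose selected
    number of clusters equals the true number (3 in this table)."""
    if not selected_k:
        raise ValueError("need at least one simulated data set")
    hits = sum(1 for k in selected_k if k == true_k)
    return 100.0 * hits / len(selected_k)


# Hypothetical cluster counts chosen for 10 simulated data sets
# (illustrative values only, not results from the article).
example = [3, 3, 2, 3, 4, 3, 3, 5, 3, 3]
print(f"{accuracy_rate(example):.0f}%")  # prints "70%"
```

Each cell of the table would correspond to one such percentage, computed for a given clustering method, distance and value of δ.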