Fig. 3 | BMC Bioinformatics

Fig. 3

From: A novel pathway-based distance score enhances assessment of disease heterogeneity in gene expression

Performance comparison for the high-dimensional simulation when ρ = 0 and B = 1. Panels a and b show the median connectivity when p_G = 0.8 (panel a) and p_G = 0.6 (panel b) for different numbers of clusters using four distances: Euclidean distance using all genes (Euclidean All Genes), Euclidean distance using KEGG-covered genes only (Euclidean KEGG), the KEGG pathway-based distance score (Pathway KEGG), and the Euclidean distance of the pathway activity scores calculated by Pathifier (Pathifier_KEGG). Both hierarchical clustering (HC) and K-means (KMEANS) were used to calculate the connectivity criterion. The lines in each panel show the connectivity across the numbers of clusters for each value of δ = 0.5, 0.7, 0.9, 1.1, 1.3, 1.5. Panels c and d show the median purity of the clustering results on the 100 simulated data sets when hierarchical clustering and K-means are applied to the four distances, for p_G = 0.8 (panel c) and p_G = 0.6 (panel d). The number of clusters was set to the optimal number identified by the connectivity criterion for the corresponding distance.
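
The caption does not define the purity criterion. As a minimal sketch, assuming the commonly used definition (each cluster is credited with its most frequent true class, and purity is the fraction of samples so credited), it could be computed as follows. The function name `purity` and the toy labels are hypothetical and are not taken from the article.

```python
from collections import Counter


def purity(true_labels, cluster_labels):
    """Cluster purity under the assumed standard definition.

    For each cluster, count the samples belonging to its most frequent
    true class; purity is the sum of those counts divided by the total
    number of samples.
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    if len(true_labels) == 0:
        raise ValueError("at least one sample is required")

    # Tally true classes within each cluster.
    by_cluster = {}
    for truth, cluster in zip(true_labels, cluster_labels):
        by_cluster.setdefault(cluster, Counter())[truth] += 1

    # Sum the majority-class count of every cluster.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(true_labels)


# Toy example (made-up labels): six samples, two true classes, two clusters.
example_truth = [0, 0, 0, 1, 1, 1]
example_clusters = [0, 0, 1, 1, 1, 1]
example_purity = purity(example_truth, example_clusters)
```

Under this definition a value of 1 means every cluster contains samples of a single true class. Purity tends to rise as the number of clusters grows, which may be why the caption fixes the number of clusters from the connectivity criterion before reporting it.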
