Fig. 4 | BMC Bioinformatics

Fig. 4

From: Euclidean distance-optimized data transformation for cluster analysis in biomedical data (EDOtrans)

Results of hierarchical cluster analysis of a modified version of the “Lsun” dataset from the “Fundamental Clustering and Projection Suite” (FCPS) [28]. The data set comprises n = 400 instances with d = 4 variables (X1–X4), of which two are the original variables and two are the same variables randomly permuted, and k = 3 classes (see inset C). A The original and transformed data (z-transformation, EDO transformation, and PVS transformation [6]) are shown as probability density functions (PDF) estimated using the Pareto density estimation (PDE [27]), which was developed as a nonparametric kernel density estimator to improve subgroup separation in mixtures. B Cluster quality and stability assessed as cluster accuracy and adjusted Rand index [21] against the prior classification of the data, and as Dunn’s index [22]. The boxes were constructed using minimum, quartiles, median (solid line inside the box) and maximum. The whiskers add 1.5 times the inter-quartile range (IQR) to the 75th percentile or subtract 1.5 times the IQR from the 25th percentile. The figure was created using the R software package (version 4.1.2 for Linux; https://CRAN.R-project.org/ [9]) and the R packages “ggplot2” (https://cran.r-project.org/package=ggplot2 [10]) and “FactoMineR” (https://cran.r-project.org/package=FactoMineR [16]). The colors were selected from the “colorblind_pal” palette provided with the R library “ggthemes” (https://cran.r-project.org/package=ggthemes [11]).
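
As an illustration of the adjusted Rand index used in panel B to compare a clustering with the prior classification, the following is a minimal sketch in Python using only the standard library. It is not the code used to produce the figure, which was created in R; the function name `adjusted_rand_index` and the toy label vectors are assumptions made here for illustration. The sketch uses the standard pair-counting formulation of the index: the number of instance pairs grouped together in both partitions, corrected for the number expected by chance.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same instances.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("at least two instances are required")

    # Pairs of instances placed together in both partitions
    # (cells of the contingency table).
    together_both = sum(
        comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values()
    )
    # Pairs placed together within each partition separately (table margins).
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = together_true * together_pred / comb(n, 2)
    maximum = (together_true + together_pred) / 2

    if maximum == expected:
        # Degenerate case, e.g. both partitions put everything in one group.
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Toy example: the same grouping with renamed cluster labels agrees perfectly.
prior_classes = [0, 0, 1, 1, 2, 2]
cluster_labels = [1, 1, 0, 0, 2, 2]
print(adjusted_rand_index(prior_classes, cluster_labels))  # 1.0
```

Because the index depends only on which instances are grouped together, it is unaffected by how the clusters are numbered, which is why it can be applied directly to cluster labels and the prior classes without matching them first.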
