Fig. 2 | BMC Bioinformatics

Fig. 2

From: Euclidean distance-optimized data transformation for cluster analysis in biomedical data (EDOtrans)

Effects of EDO transformation on innerclass and interclass distances and clustering of multivariate datasets. K-means clustering of an artificial data set representing a three-class scenario. Values were generated by Gaussian mixture models for four different variables with increasing means and various standard deviations, with a total of 3000 instances and class weights = [0.7, 0.2, 0.1] in each variable. The clustering was performed on untransformed (raw) data (panels A–D), on z-standardized data (panels E–H), and on EDO transformed data (panels I–L). For each kind of data transformation, four panels are shown. The left panels A, E, I show the original data that consist of three variables that are distributed according to a Gaussian mixture containing three modes. The sinaplot [15] shows the individual data points of the three subgroups dithering along the x-axis to create a contour indicating the probability density of the distribution of the data points. Panels B, F, J show the distribution of innerclass and interclass distances as histograms. Panels C, G, K show factorial plots of the individual data points on a principal component analysis projection, colored according to a k-means clustering. The borders of the colored areas visualize the cluster separation. The right panels D, H, L show Silhouette plots for the three clusters. Positive values indicate that a sample lies within its cluster, while negative values indicate that a sample might have been assigned to the wrong cluster because it is closer to a neighboring cluster than to its own. The figure was created using the R software package (version 4.1.2 for Linux; https://CRAN.R-project.org/ [9]) and the R packages “ggplot2” (https://cran.r-project.org/package=ggplot2 [10]) and “FactoMineR” (https://cran.r-project.org/package=FactoMineR [16]). The colors were selected from the “colorblind_pal” palette provided with the R library “ggthemes” (https://cran.r-project.org/package=ggthemes [11]).
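The quantities plotted in the distance histograms and the Silhouette panels can be illustrated with a minimal sketch. It is not the code used for the figure, which was produced in R; it uses only the Python standard library. The class means, standard deviations, the reduced sample size of 300, and all function names are hypothetical choices made for illustration, since the caption does not state the actual parameter values. The silhouette widths are computed from the known class labels as a stand-in for the k-means cluster assignment used in the figure.

```python
import math
import random


def make_mixture(n=300, weights=(0.7, 0.2, 0.1), means=(0.0, 5.0, 10.0),
                 sds=(1.0, 1.0, 1.0), n_vars=4, seed=42):
    """Draw n points from a three-class Gaussian mixture.

    The weights follow the caption; means, sds and n are assumed values.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        points.append([rng.gauss(means[k], sds[k]) for _ in range(n_vars)])
        labels.append(k)
    return points, labels


def split_distances(points, labels):
    """Split pairwise Euclidean distances into innerclass and interclass."""
    inner, inter = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            (inner if labels[i] == labels[j] else inter).append(d)
    return inner, inter


def silhouette_widths(points, labels):
    """Silhouette width s = (b - a) / max(a, b) for every point.

    a: mean distance to the other members of the point's own group.
    b: smallest mean distance to the members of any other group.
    """
    groups = sorted(set(labels))
    widths = []
    for i, p in enumerate(points):
        sums = {g: 0.0 for g in groups}
        counts = {g: 0 for g in groups}
        for j, q in enumerate(points):
            if i != j:
                sums[labels[j]] += math.dist(p, q)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:  # singleton group: width is defined as 0
            widths.append(0.0)
            continue
        a = sums[own] / counts[own]
        b = min(sums[g] / counts[g]
                for g in groups if g != own and counts[g] > 0)
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


points, labels = make_mixture()
inner, inter = split_distances(points, labels)
widths = silhouette_widths(points, labels)
print("mean innerclass distance:", sum(inner) / len(inner))
print("mean interclass distance:", sum(inter) / len(inter))
print("mean silhouette width:", sum(widths) / len(widths))
```

With well separated classes, as assumed here, the mean innerclass distance is smaller than the mean interclass distance and most silhouette widths are positive. A negative width marks a point that is on average closer to another group than to its own.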
