Fig. 1 | BMC Bioinformatics


From: Euclidean distance-optimized data transformation for cluster analysis in biomedical data (EDOtrans)


Implicit definition of instances within a class (innerclass instances) in an exemplary distribution using the Euclidean distance. Properties of the Euclidean distance relevant to innerclass and interclass distances. A The problem addressed by the EDO transformation originates in the behavior of the squared differences function, \(f\left(x\right)={x}^{2}\). Here, x² < |x| holds for nonzero x values in the interval [−1, 1], whereas x² > |x| holds for x values outside this interval. This behavior carries over to the Euclidean distance, which is based on the sum of the squared individual differences. The value x = 1 at which the change occurs is marked by a red solid line. The dashed dark gray lines indicate the identity x² = x. B Behavior of Euclidean distances compared with distances computed without squaring the individual differences, again showing a break from ≤ 1 to > 1 at a distance of \(d = 1\) (solid red line). The identity between the two distance implementations is shown as a (horizontal) dashed dark gray line. C–E Limits on the assignment of a data point to the inner center of a distribution. The green lines mark a distance of one standard deviation from the mean in a normally distributed data set with distribution N(4,3). The red vertical lines mark the boundaries within which a data point has a Euclidean distance ≤ 1 from the center. Data points located within the innerclass range are colored black, while data points located farther from the center are colored gold. C For untransformed raw data, this innerclass range is much narrower than the usual mean ± standard deviation range. D When z-standardization is applied, the innerclass range becomes wider. The graph again shows the original data, but the innerclass limits were calculated for z-standardized data and transformed back to the original data range. E With the EDO transformation, the innerclass range finally covers the usual mean ± standard deviation range.
Again, the graph shows the original data, but the innerclass limits were calculated for EDO-transformed data and transformed back to the original data range. The figure was created using the R software package (version 4.1.2 for Linux; https://CRAN.R-project.org/ [9]) and the R libraries “ggplot2” (https://cran.r-project.org/package=ggplot2 [10]) and “ggthemes” (https://cran.r-project.org/package=ggthemes [11]).
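As a rough numerical illustration of panels A and C, the Python sketch below compares x² with |x| around the break point x = 1. It then counts how many points of a simulated sample fall within a Euclidean distance of 1 from the mean in raw units, compared with the mean ± standard deviation band. This is not the authors' R code and does not implement the EDO transformation. It assumes that N(4,3) denotes a mean of 4 and a standard deviation of 3. The helper names `square_vs_abs`, `innerclass_limits` and `in_innerclass`, the random seed, and the sample size are hypothetical choices made only for this example.

```python
import numpy as np


def square_vs_abs(x):
    """Panel A idea: return (x**2, |x|). Squaring shrinks a single
    difference when |x| < 1 and inflates it when |x| > 1."""
    x = np.asarray(x, dtype=float)
    return x ** 2, np.abs(x)


def innerclass_limits(center, scale=1.0):
    """Hypothetical helper: for a transform x' = (x - center) / scale, the
    points with Euclidean distance <= 1 from the transformed center (0)
    lie between center - scale and center + scale in original units."""
    return center - scale, center + scale


def in_innerclass(x, center, scale=1.0):
    """Boolean mask of points within distance 1 of the center after the
    transform above (in 1-D the Euclidean distance is |difference|)."""
    x = np.asarray(x, dtype=float)
    return np.abs((x - center) / scale) <= 1.0


# Assumption: N(4, 3) is read as mean 4, standard deviation 3.
rng = np.random.default_rng(42)
data = rng.normal(loc=4.0, scale=3.0, size=1000)
mean, sd = data.mean(), data.std(ddof=1)

# Panel C: untransformed raw data, i.e. scale = 1.
raw_lo, raw_hi = innerclass_limits(mean)
share_raw = in_innerclass(data, mean).mean()

# Reference band: mean +/- one standard deviation (the green lines).
share_sd = in_innerclass(data, mean, scale=sd).mean()

print(f"raw innerclass range: [{raw_lo:.2f}, {raw_hi:.2f}]")
print(f"share within raw innerclass range: {share_raw:.3f}")
print(f"share within mean +/- sd: {share_sd:.3f}")
```

Under these assumptions, roughly a quarter of the simulated points fall within the raw-data innerclass range, versus about two thirds within mean ± standard deviation. This mirrors the contrast described for panel C.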
