Figure 1 | BMC Bioinformatics

Figure 1

From: Clustering protein environments for function prediction: finding PROSITE motifs in 3D

Comparison of distance metrics. We evaluated the performance of different distance metrics using the median silhouette value (S). A silhouette value measures how well each object fits its assigned cluster, as a continuous number between +1 (perfectly clustered) and -1 (closer to another cluster than to its own). We calculated the silhouette value for each object in 15 training clusters based on previously validated FEATURE models. (a) The Euclidean distance, calculated on FEATURE vectors in their original representation before any preprocessing, gave a negative median silhouette value (-0.117), indicating that it is not suitable for clustering FEATURE vectors. (b) After the FEATURE vectors in the 15 clusters were converted into their binary representations, the unweighted Hamming distance gave better separation between clusters (median silhouette value of 0.362). (c) The weighted Hamming distance (called F-distance) produced an even better result (median silhouette value of 0.414) than the unweighted Hamming distance and was thus selected for clustering the entire dataset of binary FEATURE vectors.
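
The silhouette calculation described in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy vectors, and the weights are hypothetical, and the caption does not specify how the F-distance weights are chosen. The sketch uses the standard silhouette definition s = (b - a) / max(a, b), where a is the mean distance from an object to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. With uniform weights, the weighted Hamming distance reduces to the unweighted one.

```python
from statistics import median


def weighted_hamming(x, y, weights):
    """Sum the weights of the positions where binary vectors x and y differ."""
    return sum(w for xi, yi, w in zip(x, y, weights) if xi != yi)


def silhouette_values(vectors, labels, weights):
    """Return one silhouette value per vector, using weighted Hamming distance."""
    n = len(vectors)
    dist = [
        [weighted_hamming(vectors[i], vectors[j], weights) for j in range(n)]
        for i in range(n)
    ]
    clusters = sorted(set(labels))
    values = []
    for i in range(n):
        # Distances to the other members of the object's own cluster.
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        # Distances to the members of every other cluster, grouped by cluster.
        others = [
            [dist[i][j] for j in range(n) if labels[j] == c]
            for c in clusters
            if c != labels[i]
        ]
        if not own or not others:
            # Singleton cluster or single-cluster data: 0 by convention.
            values.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in others)
        denom = max(a, b)
        values.append(0.0 if denom == 0 else (b - a) / denom)
    return values


# Hypothetical toy data: four binary vectors in two well-separated clusters.
toy_vectors = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 0]]
toy_labels = [0, 0, 1, 1]
uniform_weights = [1.0, 1.0, 1.0, 1.0]  # assumption: equal weights = plain Hamming

toy_median = median(silhouette_values(toy_vectors, toy_labels, uniform_weights))
print(f"median silhouette: {toy_median:.3f}")
```

Replacing `uniform_weights` with per-position weights changes only the distance function, so the same median statistic can be used to compare an unweighted and a weighted metric on identical cluster assignments, which is the comparison panels (b) and (c) report.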