Fig. 2 | BMC Bioinformatics

Fig. 2

From: DiviK: divisive intelligent K-means for hands-free unsupervised clustering in big biological data

Data-driven MSI peak selection based on histogram decomposition. Filtering considers each MSI peak’s average abundance (left panel) and abundance variance (right panel). The histograms of these two characteristics are decomposed into a Gaussian mixture model (GMM). Each component represents a set of features that are similar with respect to the selected characteristic, i.e., average abundance or abundance variance across a cluster. For each value of the selected characteristic, we calculate the conditional probability of each Gaussian component. We then apply the maximum classification rule, under which the crossing points of neighbouring GMM components become the filtering thresholds. For the average abundance, we remove all peaks represented by the first GMM component. For the abundance variance, we retain only the peaks represented by the topmost GMM components, but no fewer than 1% of all peaks.
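
The thresholding step described in the caption can be illustrated with a minimal sketch. It is not the DiviK implementation: it assumes the GMM has already been fitted and that each component is given as a hypothetical `(weight, mean, std)` tuple, and the function names are chosen here for illustration only. It locates the crossing point of two neighbouring weighted densities by bisection between their means, which is where the maximum classification rule switches from one component to the next. With unequal variances two components can cross more than once; this sketch only looks for the crossing that lies between the two means.

```python
import math


def weighted_pdf(x, weight, mean, std):
    """Weighted Gaussian density, proportional to the component's posterior at x."""
    z = (x - mean) / std
    return weight * math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def crossing_point(comp_a, comp_b, tol=1e-9):
    """Bisect between the means of two components (comp_a has the lower mean)
    for the point where their weighted densities are equal."""
    lo, hi = comp_a[1], comp_b[1]

    def diff(x):
        return weighted_pdf(x, *comp_a) - weighted_pdf(x, *comp_b)

    # Bisection needs comp_a to dominate at its own mean and comp_b at its own.
    if diff(lo) <= 0 or diff(hi) >= 0:
        raise ValueError("no crossing between the two component means")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diff(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def thresholds(components):
    """Crossing points of neighbouring components, ordered by component mean."""
    ordered = sorted(components, key=lambda c: c[1])
    return [crossing_point(a, b) for a, b in zip(ordered, ordered[1:])]


def assign(x, components):
    """Maximum classification rule: index of the component with the largest
    weighted density at x."""
    return max(range(len(components)),
               key=lambda k: weighted_pdf(x, *components[k]))
```

Under these assumptions, a peak whose average abundance falls below the first value returned by `thresholds` would be assigned to the first component and filtered out. The 1% floor applied to the abundance variance is not covered by this sketch.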
