Figure 1 | BMC Bioinformatics

Figure 1

From: AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number

Overview of the AutoSOME method. A simple dataset consisting of four 2-D Gaussian distributions of data points, shown on top, illustrates the major steps of the method. (A) Gaussian data points are mapped randomly to the untrained SOM node lattice (left panel) and are organized onto the planar SOM surface after training (middle panel). The error surface is then computed (right panel): red marks nodes whose data content is highly similar to that of their neighbors (low error), and blue marks nodes with dissimilar neighbors (high error). (B) A density-equalization procedure treats high-error nodes (Gaussian cluster boundaries) as high density and forces them away from each other, while low-error nodes (within clusters) are treated as low density and are forced to aggregate. (C) A minimum spanning tree is built from the rescaled node coordinates, and statistically significant point aggregations of diverse geometries are detected using Monte Carlo sampling. In this case, the procedure identifies four major clusters corresponding to the four Gaussian point distributions, along with several outlier clusters and singletons (shown by colored nodes in the rightmost image). (D) Impact of the number of ensemble iterations on the F-measure, which reflects cluster quality, using the dataset of two interlocking rings (see Figure 2).
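
The idea behind panel (C) can be illustrated with a minimal Python sketch. This is not the AutoSOME implementation: it skips the SOM training and density-equalization steps and builds the minimum spanning tree directly on four synthetic 2-D Gaussians. The Monte Carlo step is an assumed stand-in, in which MST edge lengths from uniformly random points in the same bounding box serve as a null distribution, and data edges shorter than a chosen null quantile are kept. All function names, the cluster centers, the spread of 0.5, and the quantile of 0.5 are hypothetical choices made for illustration.

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (length, i, j) tuples, one per MST edge.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known distance from the tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point closest to the current tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def null_edge_cutoff(n, bounds, trials, quantile, rng):
    """Assumed Monte Carlo null: MST edge lengths of uniform random points."""
    (xmin, xmax), (ymin, ymax) = bounds
    lengths = []
    for _ in range(trials):
        pts = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]
        lengths.extend(length for length, _, _ in mst_edges(pts))
    lengths.sort()
    return lengths[int(quantile * (len(lengths) - 1))]


def cluster(points, cutoff):
    """Keep MST edges shorter than cutoff; connected components are clusters."""
    label = list(range(len(points)))

    def find(i):
        while label[i] != i:
            label[i] = label[label[i]]  # path halving
            i = label[i]
        return i

    for length, i, j in mst_edges(points):
        if length < cutoff:
            label[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


# Four 2-D Gaussians (hypothetical centers and spread), 30 points each.
rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
points, sources = [], []
for k, (cx, cy) in enumerate(centers):
    for _ in range(30):
        points.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
        sources.append(k)

xs, ys = zip(*points)
bounds = ((min(xs), max(xs)), (min(ys), max(ys)))
cutoff = null_edge_cutoff(len(points), bounds, trials=20, quantile=0.5, rng=rng)
labels = cluster(points, cutoff)
sizes = sorted((labels.count(l) for l in set(labels)), reverse=True)
print("cutoff:", round(cutoff, 3), "cluster sizes:", sizes)
```

Because the Gaussians are far apart relative to their spread, no retained edge should join points from different distributions. Points in the tails may still split off as small clusters or singletons, loosely analogous to the outlier clusters and singletons mentioned in the caption.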
