A geometric interpretation of Gap. The green curve is the WCSS computed with K-means-R on the CNS Rat dataset. The red curve is the average WCSS, computed on ten datasets generated from the original data via the Ps null model. The vertical segments indicate the gap between the null model curve and the real curve. Since WCSS is expected to decrease sharply up to k* on the real dataset, while it has a nearly constant slope on the null model datasets, the length of the vertical segments is expected to increase up to k* and then to decrease. Indeed, we get k* = 7, a value very close to the number of classes (six) in the dataset.
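
The geometric reading above can be made concrete with a minimal sketch, which is not the implementation used for the figure. It assumes the WCSS values have already been computed for each k, and it takes the gap on a logarithmic scale, as is common for the Gap statistic; set `use_log=False` to measure the segments on the raw WCSS curves. The function names `gap_curve` and `peak_k` and all numbers in the usage example are hypothetical and are not taken from the CNS Rat dataset. Choosing the k with the longest segment is a simplification: the usual Gap rule also accounts for the standard error of the null model curves.

```python
import numpy as np

def gap_curve(real_wcss, null_wcss, use_log=True):
    """Length of the vertical segment between the averaged null curve and the real curve.

    real_wcss: sequence of length n_k, WCSS on the real dataset for each k.
    null_wcss: array of shape (n_null, n_k), WCSS on each null model dataset.
    use_log:   assumption of this sketch; compare the curves on a log scale.
    """
    real = np.asarray(real_wcss, dtype=float)
    null = np.asarray(null_wcss, dtype=float)
    if null.ndim != 2 or null.shape[1] != real.shape[0]:
        raise ValueError("null_wcss must have shape (n_null, len(real_wcss))")
    if use_log:
        real, null = np.log(real), np.log(null)
    # Average over the null datasets, then subtract the real curve.
    return null.mean(axis=0) - real

def peak_k(ks, gaps):
    """Return the k at which the vertical segment is longest (simplified rule)."""
    return int(np.asarray(ks)[int(np.argmax(gaps))])

# Hypothetical curves: the real WCSS drops sharply and then flattens,
# while the null model curves decrease at a nearly constant rate.
ks = [1, 2, 3, 4, 5]
real = [100.0, 60.0, 30.0, 25.0, 22.0]
null = [[100.0, 90.0, 80.0, 70.0, 60.0],
        [100.0, 88.0, 78.0, 72.0, 62.0]]

gaps = gap_curve(real, null)
print(peak_k(ks, gaps))
```

On these made-up curves the segments grow while the real WCSS is still dropping sharply and shrink once it flattens, which is the behaviour the figure illustrates.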