Schematic overview of the cross-validation scheme for hierarchical clustering. (A) Composition of the training data set, showing the index color and the proportion of spectra for each class. (B) Dendrogram of the training data set with the result of an optimal class assignment under a horizontal cut (left dendrogram, cut indicated by the dashed line) and of an optimal tree assignment (right dendrogram), in which each class is identified with the subtree colored in its index color. Tree-assignment-based segmentation not only achieves much higher accuracy but also differs substantially in the assignment of several classes. The crypt and submucosa classes are even identified as disjoint sets of spectra by the two approaches, while substantial differences exist for tumour, inflammatory tissue, follicles, and support cells. The two segmentations indicate that, even on well-curated training data, non-horizontal cuts in the dendrogram represent tissue classes much more reliably than horizontal cuts.
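
The difference between a horizontal cut and a per-class subtree assignment can be illustrated with a minimal sketch on synthetic data. This is not the procedure used to produce the figure: the optimal tree assignment shown in (B) selects subtrees for all classes jointly, whereas the sketch below scores each class independently by F1 and does not enforce that the chosen subtrees are disjoint. The function names, the synthetic two-dimensional "spectra", the Ward linkage, and the F1 criterion are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, to_tree


def f1(n_true_pos, n_predicted, n_class):
    """F1 score written as 2*TP / (|predicted| + |class|)."""
    return 2.0 * n_true_pos / (n_predicted + n_class)


def best_flat_cluster_f1(flat, labels):
    """For each class, best F1 over the flat clusters of a horizontal cut."""
    scores = {}
    for c in np.unique(labels):
        n_class = np.sum(labels == c)
        scores[int(c)] = max(
            f1(np.sum(labels[flat == k] == c), np.sum(flat == k), n_class)
            for k in np.unique(flat)
        )
    return scores


def best_subtree_per_class(Z, labels):
    """For each class, best F1 over all dendrogram nodes (subtrees).

    Simplification (assumption): classes are scored independently, so the
    selected subtrees are not guaranteed to be disjoint.
    """
    _, nodes = to_tree(Z, rd=True)  # nodes includes leaves and inner nodes
    # Leaf indices under every node, computed once.
    leaf_sets = [np.array(node.pre_order()) for node in nodes]
    best = {}
    for c in np.unique(labels):
        n_class = np.sum(labels == c)
        best_score, best_leaves = -1.0, None
        for leaves in leaf_sets:
            score = f1(np.sum(labels[leaves] == c), len(leaves), n_class)
            if score > best_score:
                best_score, best_leaves = score, leaves
        best[int(c)] = (best_score, best_leaves)
    return best


# Synthetic stand-in for a labelled training set: one broad class and two
# small, tight classes (hypothetical data, not from the study).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0.0, 0.0], 1.0, size=(40, 2)),
    rng.normal([6.0, 0.0], 0.2, size=(10, 2)),
    rng.normal([7.5, 0.0], 0.2, size=(10, 2)),
])
labels = np.repeat([0, 1, 2], [40, 10, 10])

Z = linkage(X, method="ward")

# Horizontal cut: one threshold for the whole tree, at most 3 clusters.
flat = fcluster(Z, t=3, criterion="maxclust")
cut_f1 = best_flat_cluster_f1(flat, labels)

# Non-horizontal cut: each class may pick a subtree at its own height.
tree_f1 = {c: s for c, (s, _) in best_subtree_per_class(Z, labels).items()}

for c in sorted(tree_f1):
    print(f"class {c}: horizontal cut F1={cut_f1[c]:.2f}, subtree F1={tree_f1[c]:.2f}")
```

Because every cluster produced by a horizontal cut is itself a subtree of the dendrogram, the per-class subtree score in this sketch can never fall below the horizontal-cut score; whether it is strictly higher depends on how unevenly the classes are spread across tree heights.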