Similarity maps and hierarchical clustering for annotating FTIR spectral images
BMC Bioinformatics volume 14, Article number: 333 (2013)
Abstract
Background
Unsupervised segmentation of multispectral images plays an important role in annotating infrared microscopic images and is an essential step in label-free spectral histopathology. In this context, diverse clustering approaches have been utilized and evaluated in order to achieve segmentations of Fourier Transform Infrared (FTIR) microscopic images that agree with histopathological characterization.
Results
We introduce so-called interactive similarity maps as an alternative strategy for annotating infrared microscopic images. We demonstrate that segmentations obtained from interactive similarity maps are similarly accurate to segmentations obtained from conventionally used hierarchical clustering approaches. In order to perform this comparison on quantitative grounds, we provide a scheme that identifies non-horizontal cuts in dendrograms. This yields a validation scheme for hierarchical clustering approaches commonly used in infrared microscopy.
Conclusions
We demonstrate that interactive similarity maps may identify more accurate segmentations than hierarchical clustering based approaches, and thus are a viable and, due to their interactive nature, attractive alternative to hierarchical clustering. Our validation scheme furthermore shows that the performance of hierarchical two-means is comparable to that of the traditionally used Ward’s clustering. As the former is much more efficient in time and memory, our results suggest another, less resource-demanding alternative for annotating large spectral images.
Background
In recent years, it has been well established that label-free Fourier transform infrared (FTIR) microscopy can resolve pathologically relevant information from histological tissue samples [1-3], as surveyed in [4]. Unveiling histopathologically relevant structures from localized absorbance spectra yielded by an FTIR microscope, schematically illustrated in Figure 1, is typically achieved through a combination of unsupervised and supervised learning approaches [5, 6]. First, a certain number of spectrally measured tissue sections are annotated based on pre-segmented spectral images, typically obtained by unsupervised clustering [6-8]. These annotations are then used to extract spectra as training data for supervised classifiers. Obviously, the quality of the annotation determines what tissue components can be resolved and how reliably they can be recognized by spectral classifiers. In this context, we introduce a novel interactive approach to annotation and quantitatively validate this approach in comparison to established annotation schemes. In particular, we provide novel algorithms to perform such a quantitative comparison. As Raman [9] or CARS [10] microscopy often follows the same data processing workflow as FTIR microscopy [11], the ideas discussed here may equally apply to these other types of label-free multispectral microscopy.
For annotating spectral images, at least two strategies are commonly employed. The first, straightforward approach is to cluster all image spectra into a suitable number k of clusters. Each cluster is then identified with one index color, so that a pathologist may identify regions in the corresponding index color image with tissue components. As a second and typically complementary approach, the spectral image can be overlaid with a Hematoxylin and Eosin (H&E) stained image of the same tissue region. Pathologists can identify relevant tissue components in the H&E stained image, whose location patterns can be carried over to the corresponding locations in the spectral image in order to extract spectra belonging to a certain tissue component. In practice, the accuracy in overlaying H&E stained images with FTIR spectral images is limited, e.g. due to slight distortions of the tissue during the staining procedure. Also, there are limitations in identifying and marking up precise borders between certain tissue components, so that most approaches to FTIR based spectral histopathology combine the two approaches [2, 7, 8]: A pre-segmentation of the spectral image is overlaid with the H&E stained image of the same sample, and then clusters in the spectral image are identified with tissue components based on their overlap with relevant regions in the H&E stained image, as identified by a pathologist. In general, the relation between clusters and tissue components is not one-to-one, but one tissue component may often be associated with several clusters. Thus, the number of clusters is usually chosen relatively large, so that the image is rather over-segmented. For obtaining pre-segmentations, hierarchical cluster analysis (HCA) [7] as well as k-means or fuzzy c-means [2, 8] are common choices.
Spectral image segmentation using similarity maps
As our main contribution, we introduce a novel interactive method for annotating FTIR spectral images. Based on so-called similarity maps [12], which rely on similarity measures between high-dimensional vectors, annotations result from interactively choosing reference pixel spectra for the tissue components that can be identified in the tissue sample. Overlaying the interactive similarity maps (ISMs) with an H&E stained reference image makes it possible to interactively take into account both spectral similarity and histopathological information from the stained image. This method is implemented in our so-called Lasagne software, which was originally proposed [13] and implemented [14] for multi-label fluorescence microscopy, while the present contribution adapts and quantitatively validates it for use in vibrational microspectroscopy.
Clustering and its validation for spectral image segmentation
In order to convincingly establish similarity maps as a suitable tool for infrared image annotation, it is essential to compare them to the currently predominant approach using clustering-based pre-segmentations. Clustering methods such as Ward’s method or k-means have been used extensively in infrared image segmentation and compared on a qualitative level for their suitability in vibrational microspectroscopy [7]. Yet, as has been noted prominently [15], “the validation of clustering is the most difficult and frustrating part of cluster analysis”. In fact, different applications and different clustering algorithms (in particular hierarchical ones) require different validation approaches. One application-specific consideration when employing and validating clustering in the context of infrared image annotation is that the correspondence between tissue components and clusters is not one-to-one, but rather one-to-many. Such considerations are rarely accounted for in existing validation procedures, and may be a reason that current comparisons of clustering approaches [7] are qualitative rather than quantitative. We address the lack of quantitative evaluation by introducing a validation scheme for hierarchical clustering based on so-called tree-assignments, which were introduced in [16, 17] in the context of tracking cells in live cell imaging. A very brief and preliminary validation of Ward’s clustering in FTIR image segmentation was given in [3] and is fully detailed and systematically elaborated in the present work. In particular, we evaluate the suitability of hierarchical clustering approaches that are more efficient in terms of computational resources.
To identify a suitable validation scheme for unsupervised infrared image segmentation, it is important to consider in some more detail how clustering based segmentations are commonly used in this context. While not commonly described in detail, one typically attempts to choose a number of clusters that over-segments the image in a pre-segmentation. Then, the task of the human annotator is to identify each tissue component with one or several of these clusters. In some approaches, clusters may be extended or divided by navigating along the hierarchy of a given dendrogram. Using a ground truth segmentation as a reference, a validation scheme thus should aim to identify the best possible segmentation obtainable through this workflow from a given clustering algorithm under realistic side assumptions.
One such side assumption is that there is a limit to the number of clusters that can be merged to represent one tissue component, which essentially represents a cognitive limit of a human annotator. Obviously, starting annotation from a crudely over-segmented image will in general allow more precise annotations. However, this comes at the cost of requiring the annotator to identify many small segments that merge into one tissue component. As there is no fixed limit to the degree of over-segmentation, we propose a validation scheme that takes the degree of over-segmentation into account as a parameter. We will refer to the degree of over-segmentation utilized during annotation as the depth of segmentation obtained from a dendrogram, and will introduce a validation scheme in which the segmentation depth is controlled through a parameter.
As surveyed in [18], a large diversity of validation measures for clustering algorithms has been proposed. Our main concern in this work is to validate clusters against a ground-truth reference segmentation, which is commonly referred to as external validation. While for external validation one can in principle rely on measures such as accuracy, known from the validation of supervised classifiers, measures such as the Rand index and the Jaccard index [19, 20], both possibly in normalized forms, are well established and commonly used. Furthermore, the so-called variation of information [21] can be considered a well-established information theoretic measure. However, these measures can only be applied to fixed segmentations, but not to dendrograms obtained from hierarchical clustering, and do not account for one-to-many relations between reference classes and clusters.
In infrared image annotation and other applications it is commonplace to use the dendrogram for obtaining a fixed partitioning into a certain number of classes. In this context, a straightforward and widely used approach horizontally cuts dendrograms into a fixed number of clusters [22]. In other words, given a dendrogram, one identifies edges $e_1,\dots,e_k$ so that each $e_i$ contains a point $v_i$ that has the same distance $\delta$ from the root for all $i$. The subtrees below these $k$ edges then define a partitioning into $k$ classes. In general, however, there are numerous non-horizontal cuts supported by the same dendrogram that yield a different partitioning into the same number of clusters, which has been considered only recently in the literature [23-25].
As illustrated in Figure 2, a segmentation based on a non-horizontal cut will generally reflect tissue components much better than a horizontal cut. Thus, validating different approaches to clustering in this context should take into account such non-horizontal cuts. Our contribution elaborates an approach that systematically identifies such non-horizontal cuts, yielding a corresponding validation scheme for hierarchical clustering. In particular, we utilize this scheme to quantitatively compare different hierarchical clustering approaches to interactive similarity maps. An important property of our validation scheme is that it can measure validity under different depths, i.e., different degrees of initial over-segmentation.
Methods
Interactive Similarity Maps (ISMs)
To introduce the concept of similarity maps following [12], let $F(x,y)$ denote the FTIR absorbance spectrum at position $(x,y)$ in the spectral image. By choosing a reference spectrum $R = F(x_R,y_R)$ at position $(x_R,y_R)$, one can measure the similarity between any position spectrum $F(x,y)$ and the reference spectrum $R=(R_1,\dots,R_n)$ using a suitable measure of spectral similarity. Now, interpreting the spectral similarity at each position as an intensity, we obtain the similarity map $M$ as an intensity image through
$$M(x,y) = \sigma_R\bigl(F(x,y)\bigr),$$
where $\sigma_R$ measures the spectral similarity to the reference spectrum $R$. We follow the suggestions from [12] in using
$$\sigma_R(S) = \prod_{i=1}^{n} \left(1 - |R_i - S_i|^{\alpha}\right) \qquad (1)$$
as our similarity function between reference spectrum $R=(R_1,\dots,R_n)$ and pixel spectrum $S=(S_1,\dots,S_n)$. Here, $\alpha$ is a non-negative real-valued parameter to adjust the sensitivity of the similarity measure. Note that Eq. (1) only makes sense if $R_i$ and $S_i$ range between 0 and 1. In practice, we achieve this by rescaling the dataset, setting the minimum absorption occurring at any wavenumber in any spectrum to 0, and correspondingly setting the maximum absorption occurring in the dataset to 1.
Interpreting $\sigma_R(S)$ as a similarity measure between vectors $R$ and $S$, it has been shown that Eq. (1) satisfies metric properties; it turns out to be a metric obtained by a natural and systematic scheme for inducing new metrics as products of other metrics [12]. A major advantage from a practical point of view is that Eq. (1) can be implemented on graphics hardware, so that the similarity map for all spectra from one image with respect to a given reference spectrum $R$ can be computed within fractions of a second. This allows an interactive exploration of a spectral image by setting and moving the coordinate for the reference spectrum $R$ with the mouse pointer.
In general, the product in Eq. (1) may vanish towards 0 rapidly if the difference is large for only a few features. The parameter α can be used to control this effect. Large values of α lessen the tendency of the product to vanish towards 0 in datasets with heterogeneous features. Choosing a small α close to 0, on the other hand, can be used to amplify the tendency for the product to vanish when working on datasets with little variability. In practice, the parameter can be adapted interactively, where common choices range between 1 and 2.
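The computation of a similarity map can be sketched in a few lines of Python. This is only an illustration, not the GPU implementation of the Lasagne software: it assumes the product form $\sigma_R(S) = \prod_i (1 - |R_i - S_i|^{\alpha})$ for Eq. (1), which matches the description above (a product over features, values in [0, 1], larger α weakening the tendency to vanish), and the function names `rescale`, `similarity` and `similarity_map` are hypothetical.

```python
def rescale(spectra):
    """Map all absorbances to [0, 1] using the global min and max of the dataset."""
    lo = min(min(s) for s in spectra)
    hi = max(max(s) for s in spectra)
    span = (hi - lo) or 1.0  # guard against a constant dataset
    return [[(v - lo) / span for v in s] for s in spectra]


def similarity(ref, spec, alpha=1.0):
    """Assumed form of Eq. (1): product over features of 1 - |R_i - S_i|**alpha."""
    sigma = 1.0
    for r, s in zip(ref, spec):
        sigma *= 1.0 - abs(r - s) ** alpha
    return sigma


def similarity_map(image, ref_xy, alpha=1.0):
    """image[y][x] is a rescaled spectrum; returns M[y][x] = sigma_R(F(x, y))."""
    rx, ry = ref_xy
    ref = image[ry][rx]
    return [[similarity(ref, spec, alpha) for spec in row] for row in image]


# Toy 1 x 3 "image" with two wavenumbers per pixel (made-up values).
image = [rescale([[0.0, 2.0], [1.0, 2.0], [4.0, 0.0]])]
m1 = similarity_map(image, (0, 0), alpha=1.0)
m2 = similarity_map(image, (0, 0), alpha=2.0)
```

In this toy example the reference pixel has similarity 1 to itself, and the second pixel receives a higher intensity for α = 2 than for α = 1, in line with the remark that larger values of α make the product vanish less quickly.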
As demonstrated in Additional files 1, 2, 3, 4 and 5, an image can be annotated through similarity maps by interactively mouse-clicking one (or several) reference positions for each tissue component. This yields one reference spectrum $R$ for each tissue component in the spectral image. Once a reference spectrum $R$ is chosen, an intensity cutoff in the similarity map $M_R$ can be set interactively. All positions $(x,y)$ exceeding this threshold will be considered part of the same tissue component. If there are $K$ different types of tissue components in the spectral image, annotation now reads as interactively identifying reference pixel spectra $R_1,\dots,R_K$. In practice, as discussed in further detail in the Results and discussion section, some tissue components need to be represented by two rather than one reference spectrum. In case one position exceeds the threshold of several reference spectra, the position is assigned to the similarity map of highest intensity.
As an implementation, we utilized a version of the Lasagne software [13, 14] adapted to the requirements of vibrational microspectroscopic data. A key feature of the Lasagne software is to perform the computation of Eq. (1) on graphics hardware, so that the similarity maps $M$ can be displayed in real time. In our adapted version, the Lasagne software may also display overlays between the similarity maps and a reference image such as an H&E stained image. In practice, annotation of an FTIR microscopic dataset using the Lasagne software works as follows (see Additional files 1, 2, 3, 4 and 5): The annotator loads both the FTIR dataset and an H&E stained image of the same sample in the same coordinate system into the Lasagne software. Now, the spectrum at the current position of the mouse cursor in the image is interpreted as the reference spectrum to interactively display a similarity map. When the cursor is moved to a suitable reference point, the similarity map may highlight a particular tissue compartment, which can be visually aligned by toggling between the similarity map and the H&E image. Once varying the reference point has produced a good visual match between the similarity map and the tissue structure identifiable from the H&E image, the annotator sets a suitable intensity threshold in the similarity map, so that the above-threshold positions can be used as training spectra for the corresponding tissue compartment. This may be done not only for one tissue compartment; the annotator may set one (or even several) reference points for each tissue compartment that is identifiable in the given tissue sample. Each tissue compartment may be associated with an index color, so that the annotation can be interpreted as an index color image that resolves the tissue structure.
In case of a conflicting position where several tissue compartments match the annotation, the position may be associated with the tissue compartment whose similarity map achieves the highest intensity.
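The thresholding and conflict rule described above can be sketched as follows. The function name `annotate`, the data layout and the toy intensities are illustrative assumptions and are not taken from the Lasagne software.

```python
def annotate(maps, thresholds):
    """maps[k][y][x]: similarity map of reference k; thresholds[k]: its cutoff.

    Returns labels[y][x] = index of the above-threshold map with the highest
    intensity, or None if no map exceeds its threshold at that position.
    """
    height, width = len(maps[0]), len(maps[0][0])
    labels = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best, best_val = None, float("-inf")
            for k, (sim_map, cutoff) in enumerate(zip(maps, thresholds)):
                value = sim_map[y][x]
                if value > cutoff and value > best_val:
                    best, best_val = k, value
            labels[y][x] = best
    return labels


# Two tissue components on a 1 x 3 image (made-up intensities).
maps = [
    [[0.9, 0.6, 0.1]],  # similarity map of component 0
    [[0.2, 0.7, 0.3]],  # similarity map of component 1
]
labels = annotate(maps, thresholds=[0.5, 0.5])
```

The middle pixel exceeds both thresholds and is assigned to component 1, whose map is more intense there; the last pixel exceeds neither threshold and stays unassigned.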
In order to validate similarity maps for image annotation, we overlaid the spectral image with a reference segmentation obtained from a supervised classifier using a well-established set of training spectra [3]. Reference points for the different tissue components were set by a human operator, aiming to reproduce the reference segmentation as well as possible. While the annotation thus achieved may not be optimal in the sense that a different choice of reference points might achieve a higher accuracy, it simulates a segmentation that may realistically be achieved by a histopathologist.
Hierarchical clustering
We employed two variants of hierarchical clustering. First, we hierarchically clustered spectra using Ward’s approach [26] based on two different distance measures: the well-established and widely used correlation distance (i.e., one minus the correlation coefficient) and the power metric $d_P(X,Y) = 1 - \sigma_X(Y)$ obtained using Eq. (1). As a second variant of hierarchical clustering, we performed hierarchical two-means, i.e., recursively bipartitioning the dataset into two groups using two-means clustering in a top-down fashion. In each round of two-means clustering, the best subdivision among five repetitions with different random initializations was used for the next round of subdivision. For Ward’s clustering, we utilized the (parallelized) implementation provided by the Statistics toolbox of Matlab version 7.11. Hierarchical two-means clustering was implemented using the k-means clustering provided by Matlab.
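The top-down procedure of hierarchical two-means can be illustrated with a small, self-contained Python sketch. It is not the Matlab implementation used in this work: the plain Lloyd iteration, the squared Euclidean distance, the fixed iteration count and the names `two_means` and `hierarchical_two_means` are assumptions made for illustration only.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def two_means(points, rng, n_iter=20):
    """One run of Lloyd's algorithm with k = 2; returns (cost, labels)."""
    centers = [list(c) for c in rng.sample(points, 2)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
                  for p in points]
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    cost = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return cost, labels


def hierarchical_two_means(points, indices=None, max_depth=3, restarts=5, rng=None):
    """Recursively bipartition; each split keeps the best of `restarts` runs."""
    rng = rng or random.Random(0)
    if indices is None:
        indices = list(range(len(points)))
    node = {"leaves": indices, "children": []}
    if max_depth == 0 or len(indices) < 2:
        return node
    subset = [points[i] for i in indices]
    runs = [two_means(subset, rng) for _ in range(restarts)]
    _, labels = min(runs, key=lambda run: run[0])
    parts = [[i for i, lab in zip(indices, labels) if lab == c] for c in (0, 1)]
    if all(parts):  # skip degenerate splits, e.g. identical spectra
        node["children"] = [
            hierarchical_two_means(points, part, max_depth - 1, restarts, rng)
            for part in parts
        ]
    return node


# Two well-separated groups of one-dimensional toy "spectra".
points = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
tree = hierarchical_two_means(points, max_depth=1)
split = sorted(sorted(child["leaves"]) for child in tree["children"])
```

Limiting `max_depth` corresponds to computing only the upper part of the dendrogram, which is what makes the top-down scheme attractive for large images.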
Validation of hierarchical clustering
When performing hierarchical clustering on curated training data with training spectra for tissue components $1,\dots,K$, a dendrogram obtained from an “ideal” hierarchical clustering would contain one vertex $v_i$ for each $i=1,\dots,K$ such that all spectra below $v_i$ belong to class $i$. In order to measure to what degree a dendrogram $D$ obtained by Ward’s clustering achieves this criterion, we identify vertices $v_1,\dots,v_K$ in $D$ that approach this goal as far as possible. As detailed below, this can be achieved based on ideas behind so-called tree-assignments recently introduced in a different context in [16, 17, 27].
The main idea behind validating how well a given dendrogram reflects a given reference partitioning of a set of spectra is to utilize measures for comparing partitionings, such as accuracy or the popular Rand index (RI). Once such a measure is chosen, we determine a partitioning supported by the dendrogram that maximizes this measure. This approach is in line with the VICut introduced in [24], which determines a partitioning that maximizes variation of information as an information theoretic measure for cluster validity. In terms of validation of infrared image segmentation and annotation, however, VICut does not allow the depth of annotation to be controlled. In fact, VICut may in the end perform its validation on a segmentation derived from the dendrogram that realistically may not be recoverable by a human annotator.
Measures for comparing partitionings
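As our main validity measure for comparing partitionings, we use the Rand index [19], which is a well-established measure to compare two partitionings in the context of cluster validation [28]. The Rand index is defined for two partitionings $\mathcal{C}$ and $\mathcal{C}'$ that partition the set $\{1,\dots,n\} = C_1 \cup \cdots \cup C_k = C'_1 \cup \cdots \cup C'_{\ell}$. Following the notation from [28], the Rand index (RI) is based on the indicator function
$$e(i,j) = \begin{cases} 1 & \text{if } i \text{ and } j \text{ are in the same class in } \mathcal{C}, \\ 0 & \text{otherwise,} \end{cases}$$
and $e'(i,j)$ correspondingly equal to one if $i$ and $j$ are in the same class in $\mathcal{C}'$ and 0 otherwise. We can now further define
$$n_{11} = \sum_{i<j} e(i,j)\,e'(i,j), \qquad n_{00} = \sum_{i<j} \bigl(1-e(i,j)\bigr)\bigl(1-e'(i,j)\bigr),$$
which finally yields the Rand index
$$\mathrm{RI}(\mathcal{C},\mathcal{C}') = \frac{2\,(n_{11}+n_{00})}{n(n-1)}.$$
We will also utilize the Mirkin metric, which as a close relative to the Rand index is defined as
$$\mathcal{M}(\mathcal{C},\mathcal{C}') = \sum_{i=1}^{k} |C_i|^2 + \sum_{j=1}^{\ell} |C'_j|^2 - 2\sum_{i=1}^{k}\sum_{j=1}^{\ell} m_{ij}^2,$$
where $m_{ij} = |C_i \cap C'_j|$. Obviously, one can compute the Rand index easily from the Mirkin metric [28] using
$$\mathrm{RI}(\mathcal{C},\mathcal{C}') = 1 - \frac{\mathcal{M}(\mathcal{C},\mathcal{C}')}{n(n-1)}. \qquad (2)$$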
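As a small numerical check, the following Python sketch computes the Rand index by pair counting and the Mirkin metric from cluster sizes and overlaps, using the standard definitions of both measures. The helper names and the toy label vectors are illustrative and not part of the original study.

```python
from collections import Counter
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which the two partitionings agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def mirkin_metric(labels_a, labels_b):
    """sum_i |C_i|^2 + sum_j |C'_j|^2 - 2 * sum_ij m_ij^2."""
    sizes_a = Counter(labels_a)
    sizes_b = Counter(labels_b)
    overlap = Counter(zip(labels_a, labels_b))  # m_ij = |C_i intersect C'_j|
    return (
        sum(v * v for v in sizes_a.values())
        + sum(v * v for v in sizes_b.values())
        - 2 * sum(v * v for v in overlap.values())
    )


a = [0, 0, 0, 1, 1, 1]  # clustering
b = [0, 0, 1, 1, 2, 2]  # reference partitioning
n = len(a)
ri = rand_index(a, b)
m = mirkin_metric(a, b)
```

For these six points the Mirkin metric is 10 and the Rand index is 10/15, so that RI = 1 - M / (n(n-1)) holds, which is the relation that allows maximizing the Rand index by minimizing the Mirkin metric.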
Determining optimal partitionings supported by a dendrogram
Given a dendrogram with $n$ leaves and a reference partitioning $\mathcal{C}'$ that partitions the numbers $\{1,\dots,n\}$, we now aim to use the dendrogram to obtain a partitioning $\mathcal{C}$ that maximizes the Rand index between $\mathcal{C}$ and $\mathcal{C}'$. We derive a partitioning from the dendrogram by assigning a class label to vertices in the dendrogram, so that all leaves below a labelled vertex $v$ will belong to the assigned class. To prevent assignments of leaves to more than one class, no ancestor or descendant of an assigned vertex can be further assigned to a class.
Eq. (2) shows that it is indeed sufficient to minimize the Mirkin metric rather than maximizing the Rand index. Furthermore, the Mirkin metric is composed of three parts. Since $\mathcal{C}'$ is the reference partitioning, $\sum_{j=1}^{\ell} |C'_j|^2$ is constant. Thus we only need to minimize the remaining two parts:
$$\sum_{i=1}^{k} |C_i|^2 - 2\sum_{i=1}^{k}\sum_{j=1}^{\ell} m_{ij}^2.$$
Let $w_i = |C_i|^2 - 2\sum_{j=1}^{\ell} m_{ij}^2$. Then
$$\sum_{i=1}^{k} |C_i|^2 - 2\sum_{i=1}^{k}\sum_{j=1}^{\ell} m_{ij}^2 = \sum_{i=1}^{k} w_i.$$
Here $w_i$ is the weight associated with class $C_i$, $|C_i|$ is the number of leaves underneath vertex $v_i$ and $m_{ij}$ is the number of points shared by cluster $C_i$ and $C'_j$. Thus, the values $w_i$ can be computed easily and quickly. The terminology introduced above suggests the following integer linear program to identify an optimal partitioning:
$$\min \sum_{i=1}^{p} w_i X_i \quad \text{subject to} \quad \sum_{i:\, v_i \in P} X_i \le 1 \ \text{for every root-to-leaf path } P, \qquad \sum_{i=1}^{p} X_i \le Q, \qquad X_i \in \{0,1\}.$$
Here, $p$ is the number of vertices in the dendrogram, $w_i$ is the contribution to the Mirkin metric incurred by a cut at vertex $v_i$, and $X_i$ is a binary variable, where $X_i = 1$ indicates that there is a cut at vertex $v_i$. Finally, $Q$ is the parameter that controls how many vertices may be assigned overall in the partitioning, thus controlling the depth of annotation: A small value of $Q$ means the “annotator” has to choose large, high-level vertices in the dendrogram to obtain the partitioning, whereas a large value of $Q$ means that the partitions can be merged from many small segments in lower parts of the dendrogram.
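For small dendrograms, the selection problem described above can also be solved without an ILP solver. The following Python sketch uses a dynamic program over the tree instead of the lp_solve based implementation used in this work. It assumes that at most $Q$ vertices may be selected, that no two selected vertices lie on the same root-to-leaf path, and that the objective is the sum of the weights $w_i = |C_i|^2 - 2\sum_j m_{ij}^2$; the nested-tuple tree encoding and the name `tree_assignment` are hypothetical.

```python
from collections import Counter


def tree_assignment(tree, Q):
    """Pick at most Q mutually non-nested vertices minimizing the summed weights.

    A leaf is a reference class label; an inner vertex is a pair (left, right).
    Returns (total_weight, picks), each pick being (w_i, class counts below v_i).
    """

    def solve(node):
        if isinstance(node, tuple):
            (counts_l, tab_l), (counts_r, tab_r) = solve(node[0]), solve(node[1])
            counts = counts_l + counts_r
            # Best way to distribute q picks over the two child subtrees.
            below = [
                min(
                    ((tab_l[a][0] + tab_r[q - a][0], tab_l[a][1] + tab_r[q - a][1])
                     for a in range(q + 1)),
                    key=lambda entry: entry[0],
                )
                for q in range(Q + 1)
            ]
        else:
            counts = Counter([node])
            below = [(0, [])] * (Q + 1)
        size = sum(counts.values())
        w = size * size - 2 * sum(m * m for m in counts.values())
        own = (w, [(w, dict(counts))])
        # With q >= 1 picks available, either cut at this vertex or go deeper.
        table = [below[0]] + [
            min(own, below[q], key=lambda entry: entry[0]) for q in range(1, Q + 1)
        ]
        return counts, table

    return solve(tree)[1][Q]


# Toy dendrogram over eight leaves carrying reference labels 0, 1 and 2.
tree = (((0, 0), (0, 1)), ((1, 1), (2, 2)))
cost3, picks3 = tree_assignment(tree, Q=3)
cost2, _ = tree_assignment(tree, Q=2)
```

On this toy tree, allowing three vertices lowers the objective from -8 to -12 compared with two vertices, illustrating how $Q$ trades annotation effort against agreement with the reference. The selected vertices need not lie at the same height of the dendrogram.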
Once a tree-assignment has been obtained, it is useful to obtain a partitioning of the dataset where each partition is assigned one of the classes in the reference partitioning $\mathcal{C}'$. Such a class assignment can be used to associate an accuracy with the segmentation, and in case of an image dataset can be used to produce an index color image. In order to obtain such a class assignment, we follow a straightforward majority vote approach: Whenever a vertex $v_i$ is active, i.e., $X_i = 1$, we need to associate the data points at the leaves below $v_i$ with a class. By considering the labels of these $q$ data points $x_{i,1},\dots,x_{i,q}$ in the reference partitioning $\mathcal{C}'$, we determine the label which occurs most often, and associate it with all leaf data points $x_{i,1},\dots,x_{i,q}$.
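The majority vote and the resulting accuracy can be sketched as follows. The function name `majority_vote` and the toy labels are illustrative assumptions.

```python
from collections import Counter


def majority_vote(reference_labels, active_vertices):
    """active_vertices: one list of leaf indices per vertex with X_i = 1."""
    predicted = [None] * len(reference_labels)
    for leaves in active_vertices:
        votes = Counter(reference_labels[i] for i in leaves)
        winner = votes.most_common(1)[0][0]
        for i in leaves:
            predicted[i] = winner
    return predicted


# Seven data points with reference classes A, B and C (made-up example).
reference = ["A", "A", "B", "A", "B", "B", "C"]
active = [[0, 1, 2, 3], [4, 5, 6]]
predicted = majority_vote(reference, active)
accuracy = sum(p == r for p, r in zip(predicted, reference)) / len(reference)
```

In this example the single point of class C is absorbed by the larger class B, which shows how the majority vote tends to relabel small classes as big ones.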
Our tree-assignment implementation is based on the Matlab interface to version 5.5 of lp_solve. In order to limit the size of the ILP and avoid assignments to very low-level vertices, only the topmost 255 vertices in each dendrogram were allowed to be assigned. Note that this cutoff is far beyond what could be utilized in an HCA based annotation by histologists, as the resulting pre-segmented index color image involves at most 128 different index colors and thus appears highly fragmented. Thus, vertices located even lower in the dendrogram can be considered as unidentifiable in practice by an annotator. Meanwhile, we only need to compute the topmost 255 vertices in hierarchical two-means, which can reduce the running time even further.
If applied to a training dataset where each spectrum is assigned a class label, the result of the tree-assignment reads as a reclassification of the training dataset. Thus, we can apply any validation measure used for measuring the quality of supervised classifiers. In particular, we can mimic validation schemes such as Monte-Carlo-type cross validation by repeatedly subsampling from the training dataset. In the Results and discussion section, we extensively utilize this idea to validate hierarchical clustering in comparison to both supervised classifiers and similarity-map based annotation.
Datasets
For our computational studies, we utilized a colon tissue spectral dataset derived from [3]. The dataset consists of a training data set comprising 23,278 pixel spectra grouped into 14 classes of tissue components, along with three large spectral images displaying 854 × 502, 576 × 672 and 832 × 416 FTIR pixel spectra of three tissue sections. The images will henceforth be referred to as 120514, 88180 and colon_p53_active, respectively. The spatial resolution is 5.5 μm/px. Following common practice in infrared image analysis, spectra exhibiting a weak signal or strong noise, e.g. resulting from holes or cracks in the tissue section or other artifacts, are discarded in a preprocessing step. This affects roughly 10% of all image spectra; for image 120514, e.g., 8.24% of the image spectra are not considered for further analysis.
Based on the training dataset, a Random Forest classifier has been trained (for details, refer to [3]), yielding a segmented version of the spectral images that assigns one of the 14 trained classes to each pixel, see Figure 3(A). The training data set contains well-curated spectra and has been validated in detail in [3] by further experimental evidence using fluorescence microscopy. Furthermore, it was shown in [3] that the segmentations obtained from this supervised classifier resolve histopathologically relevant details such as the lamina muscularis mucosae. Given the general difficulty of obtaining ground truth for biological image data [29], we used this fluorescence-validated and histopathologically well supported segmentation as a ground truth segmentation to quantitatively compare with segmentations obtained from similarity maps and hierarchical clustering algorithms. We may consider the cross-validation accuracy of the random forest of 94.92% on the training data as an estimate of the accuracy of our reference data.
Results and discussion
To compare the performance of the classification and segmentation methods introduced here with other methods, we used the mean accuracy achieved in a Monte-Carlo-type validation scheme whenever applicable.
Validation of tree assignments
We compared segmentations obtained from tree assignments, k-means and horizontal cuts (see Figure 4). In this case, the horizontal cut performs slightly better than k-means, while the non-horizontal cut using tree assignments achieves a much higher Rand index than the other two methods. Our results thus confirm the previous findings [7] in a systematic and quantitative way.
Validation of similarity maps
We applied the Lasagne software to all three spectral images, using the random forest classifications based on well-curated training data as reference segmentations. These RF-based reference segmentations were visually reconstructed as well as possible by a human operator using the Lasagne software. We allowed the operator to specify up to two reference pixel spectra per class. In the resulting segmentation of image 120514 (Figure 3), 376,718 (95.76%) out of the 393,378 non-background pixel spectra were assigned to one of the classes in the training dataset. The smallest five classes, namely out, fat remainders, follicles, blood and slime, could not be properly identified, as either too few spectra belong to the class (36 spectra for slime) or their location patterns were not unambiguously resolved spectrally by the Lasagne software. Yet, the resulting segmentation assigns 53.94% of the 393,378 pixel spectra to the correct class. This accuracy is higher than the accuracy achieved by either variant of HCA, where at most 53.35% of the spectra were assigned correctly by Ward’s clustering with the power metric. From the confusion matrices, we can see that both HCA based segmentations and the similarity map based segmentation perform better for big classes than for small classes. The difference is that for small classes that are difficult to identify, Lasagne declines to assign any class label, while HCA based methods make wrong assignments. In Figure 3(B), submucosa was entirely misidentified as either support cells or muscle, which is undesirable.
Figure 3 shows the RF-segmented reference image, the tree assignment based segmentation and the Lasagne-reconstructed image for dataset 120514. Corresponding results for the other two datasets 88180 and colon_p53_active are shown in Additional files 6 and 7. For dataset 88180, the Rand index is equivalent between similarity maps and either variant of HCA (0.75), while the accuracy is slightly higher for HCA based segmentations (≥ 59.39% for HCA vs. 56.23% for similarity maps). For dataset colon_p53_active, HCA accuracies are significantly higher (≥ 69.21% vs. 41.68%). Although HCA based segmentations received higher overall accuracies than the similarity map based segmentation, many details of the tissue structure are lost. Due to the majority vote approach of class assignment subsequent to the tree-assignment based validation, they are more likely to mistakenly recognize small tissue classes as big tissue classes. This property may cause problems for samples containing unbalanced proportions of tissue classes. Furthermore, our validation is conservative in the sense that HCA is validated by a segmentation that algorithmically mimics the annotation that the “best possible annotator” could obtain from the given dendrogram, whereas the similarity map relies on a real human annotator to visually reproduce the ground truth segmentation.
Validation of different hierarchical clustering approaches
We applied and evaluated tree assignments using different depths of segmentation $Q = 14, 16, \dots, 42$ (see Figure 5 and Additional file 8). Both Rand index and accuracy increase with larger values of $Q$ in essentially all cases. However, accuracy increases faster than the Rand index, which may be due to the relatively large number of 14 groups in our dataset, where the Rand index tends to approach 1. Hierarchical two-means performs worse than Ward’s method on training data, while it is comparable or even slightly better on image 120514. In general, we may conclude that hierarchical two-means works well on image data, and that using the power metric gives a slight, but not significant advantage over the established and widely used correlation distance on both image and training data. As is to be expected, the accuracy achieved by unsupervised HCA using either distance measure is much smaller than the 94.92% accuracy obtained from a supervised random forest.
Besides validation measures, the running time required for obtaining clustering results is of high practical relevance. While not investigated in further detail, clustering roughly half a million image spectra using hierarchical two-means takes only a few hours without parallelized computation, while Ward’s clustering consumes more than one week of computation time using up to 64 CPUs in parallel.
Conclusions
We have introduced two novel concepts in the context of annotating FTIR microspectroscopic images. First, we proposed a quantitative validation of hierarchical clustering schemes commonly employed during spectral image annotation. Second, we described and validated interactive similarity maps as an alternative to clustering-based image annotation.
Similarity maps for vibrational microscopy image segmentation
Our contribution on interactive similarity maps suggests that there are viable alternatives to the prevailing “clustering paradigm”. As our findings suggest, annotations obtained using similarity maps may be similarly accurate to annotations based on hierarchical clustering. Whereas the costs in computing time and memory remain significant even for the more efficient hierarchical two-means, similarity maps require no preprocessing beyond the commonly performed low-level normalization or baseline correction. Implemented on a GPU, recomputing the similarity map after an interactive change of a reference point can be done within fractions of a second even on large (>500,000 spectra) infrared images.
Visually identifying reference points is an intuitive concept that is accessible to histologists and pathologists without requiring explicit computational expertise, and this contribution provides a proof of concept based on quantitative validation. Establishing the approach as a routine task for histologists or pathologists in larger scale studies is a perspective that should be encouraged by our positive quantitative validation. Both similarity-map based exploration and annotation and the concept of tree assignments introduced here may be equally useful for Raman [9] and CARS [10] microscopy, which is worthwhile to explore in future contributions.
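The reason a similarity map can be recomputed so quickly is that it only compares one reference spectrum against every pixel spectrum. The following sketch illustrates this idea under assumptions that are not taken from the software used in this study: similarity is measured by Pearson correlation, and a segmentation is derived by assigning each pixel to the most similar of several user-chosen reference points. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length spectra (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def similarity_map(image, ref_row, ref_col):
    """Similarity of every pixel spectrum to the spectrum at the reference pixel.

    `image` is a nested list indexed as image[row][col] -> list of absorbances.
    """
    reference = image[ref_row][ref_col]
    return [[pearson(spectrum, reference) for spectrum in row] for row in image]


def segment_by_references(image, reference_points):
    """Assign each pixel the index of its most similar reference point."""
    maps = [similarity_map(image, r, c) for r, c in reference_points]
    rows, cols = len(image), len(image[0])
    return [
        [max(range(len(maps)), key=lambda k: maps[k][i][j]) for j in range(cols)]
        for i in range(rows)
    ]
```

Moving one reference point only requires recomputing that point's map, which is a single independent pass over the pixels and therefore maps naturally onto parallel hardware such as a GPU.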
Clustering in vibrational microscopy image segmentation
As part of our contribution to quantitatively comparing unsupervised infrared image segmentation strategies, we have provided a validation scheme for hierarchical clustering that matches the assumptions behind spectral image annotation, which turned out to be a non-trivial task in itself. As hierarchical clustering is arguably the most commonly used basis for infrared image annotation, this contribution is particularly important for systematically quantifying the performance of different methods, rather than comparing them by qualitative visual inspection. One immediate consequence is that the traditionally used Ward’s clustering may be substituted by hierarchical two-means for image segmentation without significant loss of quality. As the latter is much more time and memory efficient, this finding will make it much more practical to work with large spectral images. Being able to handle larger numbers of spectra without compromising accuracy becomes increasingly important in multispectral microscopy. In fact, the sizes of images keep growing with new generations of FTIR microscopes and array detectors, or when working on confocally measured stacks of Raman or CARS images.
Turning dendrograms into segmentations or partitionings
Finally, the idea of determining non-horizontal cuts in dendrograms and the cross-validation scheme based on this idea may be of further interest in infrared microscopy and beyond. Although not explored in this contribution, tree assignments also make it possible to compare two (or more) dendrograms by identifying an optimal set of classes supported by both dendrograms, rather than matching a fixed segmentation into one dendrogram. While this can be achieved by relatively simple modifications of the integer linear programming and the weighting scheme provided here, exploration is left for future contributions.
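The notion of a non-horizontal cut can be illustrated with a toy example. The sketch below is deliberately not the integer linear programming formulation or the weighting scheme used in this work; it swaps in a simple dynamic program that selects at most Q subtrees of a binary dendrogram so that the number of leaves agreeing with the majority reference class of their subtree is maximal. The tree encoding (nested tuples whose leaves are reference labels) and the function name are hypothetical.

```python
from collections import Counter


def best_cut(tree, max_clusters):
    """Best cut of a binary tree into at most `max_clusters` subtrees.

    A leaf is a reference label; an internal node is a tuple (left, right).
    Returns (number of majority-consistent leaves, list of per-cluster Counters).
    """

    def solve(node):
        # table[k] = (score, clusters) for the best cut of this subtree
        # into exactly k clusters.
        if not isinstance(node, tuple):
            counts = Counter([node])
            return counts, {1: (1, [counts])}
        left_counts, left_table = solve(node[0])
        right_counts, right_table = solve(node[1])
        counts = left_counts + right_counts
        # Option 1: keep the whole subtree as a single cluster.
        table = {1: (max(counts.values()), [counts])}
        # Option 2: cut here and combine the best cuts of both children.
        for k_left, (s_left, c_left) in left_table.items():
            for k_right, (s_right, c_right) in right_table.items():
                k = k_left + k_right
                if k > max_clusters:
                    continue
                score = s_left + s_right
                if k not in table or score > table[k][0]:
                    table[k] = (score, c_left + c_right)
        return counts, table

    _, table = solve(tree)
    return max(table.values(), key=lambda entry: entry[0])


if __name__ == "__main__":
    # Hypothetical dendrogram over six pixels with reference classes a, b, c.
    toy_tree = ((("a", "a"), ("b", "b")), ("c", "c"))
    score, clusters = best_cut(toy_tree, max_clusters=3)
    print(score, [dict(c) for c in clusters])
```

Because the cut may descend to different depths in different branches, the selected clusters need not lie at a common dendrogram height, which is the property that distinguishes a non-horizontal cut from a conventional one.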
Abbreviations
FTIR: Fourier transform infrared
H&E: Hematoxylin and Eosin
HCA: hierarchical clustering analysis
ISMs: interactive similarity maps
RF: random forest
ILP: integer linear programming
RI: Rand index.
References
 1.
Lasch P, Haensch W, Lewis EN, Kidder LH, Naumann D: Characterization of colorectal adenocarcinoma sections by spatially resolved FTIR microspectroscopy. Appl Spectrosc. 2002, 56 (1): 1-9. 10.1366/0003702021954322.
 2.
Steller W, Einenkel J, Horn LC, Braumann UD, Binder H, Salzer R, Krafft C: Delimitation of squamous cell cervical carcinoma using infrared microspectroscopic imaging. Anal Bioanal Chem. 2006, 384 (1): 145-154. 10.1007/s00216-005-0124-4.
 3.
Kallenbach-Thieltges A, Großerüschkamp F, Mosig A, Diem M, Tannapfel A, Gerwert K: Immunohistochemistry, histopathology and infrared spectral histopathology of colon cancer tissue sections. J Biophotonics. 2013, 6 (1): 88-100. 10.1002/jbio.201200132.
 4.
Trevisan J, Angelov PP, Carmichael PL, Scott AD, Martin FL: Extracting biological information with computational analysis of Fourier-transform infrared (FTIR) biospectroscopy datasets: current practices to future perspectives. Analyst. 2012, 137 (14): 3202-3215. 10.1039/c2an16300d.
 5.
Lasch P, Diem M, Hänsch W, Naumann D: Artificial neural networks as supervised techniques for FTIR microspectroscopic imaging. J Chemometrics. 2006, 20 (5): 209-220. 10.1002/cem.993.
 6.
Bird B, Miljkovic M, Romeo MJ, Smith J, Stone N, George MW, Diem M: Infrared microspectral imaging: distinction of tissue types in axillary lymph node histology. BMC Clin Pathol. 2008, 8 (1): 8. 10.1186/1472-6890-8-8.
 7.
Lasch P, Haensch W, Naumann D, Diem M: Imaging of colorectal adenocarcinoma using FTIR microspectroscopy and cluster analysis. Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease. 2004, 1688 (2): 176-186. 10.1016/j.bbadis.2003.12.006.
 8.
Kannan S, Ramathilagam S, Sathya A, Pandiyarajan R: Effective fuzzy c-means based kernel function in segmenting medical images. Comput Biol Med. 2010, 40 (6): 572-579. 10.1016/j.compbiomed.2010.04.001.
 9.
Turrell G, Corset J: Raman Microscopy: Developments and Applications. 1996, San Diego: Academic Press
 10.
Freudiger CW, Min W, Saar BG, Lu S, Holtom GR, He C, Tsai JC, Kang JX, Xie XS: Label-free biomedical imaging with high sensitivity by stimulated Raman scattering microscopy. Science. 2008, 322 (5909): 1857-1861. 10.1126/science.1165758.
 11.
Matthäus C, Chernenko T, Newmark JA, Warner CM, Diem M: Label-free detection of mitochondrial distribution in cells by non-resonant Raman microspectroscopy. Biophys J. 2007, 93 (2): 668-673. 10.1529/biophysj.106.102061.
 12.
Dress A, Lokot T, Schubert W, Serocka P: Two theorems about similarity maps. Ann Combinatorics. 2008, 12 (3): 279-290. 10.1007/s00026-008-0351-4.
 13.
Serocka P: Visualization of high-dimensional biomedical image data. Advances in Multimedia Information Processing - PCM 2007. 2007, Berlin Heidelberg: Springer, 475-482.
 14.
Schubert W, Gieseler A, Krusche A, Serocka P, Hillert R: Next-generation biomarkers based on 100-parameter functional super-resolution microscopy TIS. New Biotechnol. 2011, 29 (5): 599-610.
 15.
Jain AK, Dubes RC: Algorithms for Clustering Data. Prentice-Hall Advanced Reference Series. 1988, Upper Saddle River: Prentice Hall PTR
 16.
Mosig A, Jäger S, Wang C, Nath S, Ersoy I, Palaniappan K, Chen SS, et al: Tracking cells in life cell imaging videos using topological alignments. Algorithms Mol Biol. 2009, 4 (1): 10. 10.1186/1748-7188-4-10.
 17.
Xiao H, Li Y, Du J, Mosig A: Ct3d: tracking microglia motility in 3d using a novel cosegmentation approach. Bioinformatics. 2011, 27 (4): 564. 10.1093/bioinformatics/btq691.
 18.
Halkidi M, Batistakis Y, Vazirgiannis M: On clustering validation techniques. J Intell Inf Syst. 2001, 17: 107-145. 10.1023/A:1012801612483.
 19.
Rand WM: Objective criteria for the evaluation of clustering methods. J Am Stat Assoc. 1971, 66 (336): 846-850. 10.1080/01621459.1971.10482356.
 20.
Hubert L, Arabie P: Comparing partitions. J Classif. 1985, 2 (1): 193-218. 10.1007/BF01908075.
 21.
Meilă M: Comparing clusterings - an information based distance. J Multivariate Anal. 2007, 98 (5): 873-895. 10.1016/j.jmva.2006.11.013.
 22.
Friedman J, Hastie T, Tibshirani R: The Elements of Statistical Learning, 2nd edn. 2008, New York: Springer, Chap. 14.3.12 Hierarchical Clustering
 23.
Dotan-Cohen D, Melkman AA, Kasif S: Hierarchical tree snipping: clustering guided by prior knowledge. Bioinformatics. 2007, 23 (24): 3335-3342. 10.1093/bioinformatics/btm526.
 24.
Navlakha S, White J, Nagarajan N, Pop M, Kingsford C: Finding biologically accurate clusterings in hierarchical tree decompositions using the variation of information. J Comput Biol. 2010, 17 (3): 503-516. 10.1089/cmb.2009.0173.
 25.
Bruzzese D, Vistocco D: Cutting the dendrogram through permutation tests. Proceedings of COMPSTAT’2010. 2010, Physica-Verlag HD, 847-854. COMPSTAT 2010 Book of Abstracts, 62
 26.
Ward Jr JH: Hierarchical grouping to optimize an objective function. J Am Stat Assoc. 1963, 58 (301): 236-244. 10.1080/01621459.1963.10500845.
 27.
Xiao H, Zhang M, Mosig A, Leong H: Dynamic programming algorithms for efficiently computing cosegmentations between biological images. Algorithms in Bioinformatics. 2011, Berlin Heidelberg: Springer, 339-350.
 28.
Wagner S, Wagner D: Comparing clusterings: an overview. Technical Report 2006-04, Universität Karlsruhe, Fakultät für Informatik. 2007
 29.
Peng H, Chung P, Long F, Qu L, Jenett A, Seeds AM, Myers EW, Simpson JH: Brainaligner: 3d registration atlases of Drosophila brains. Nat Methods. 2011, 8 (6): 493-498. 10.1038/nmeth.1602.
Acknowledgements
This research was supported by the Protein Research Unit Ruhr within Europe (PURE) from the Ministry of Science and Technology, North Rhine-Westphalia, Germany. AM was supported by a Chinese Academy of Sciences Visiting Professorship for Senior International Scientists (grant No. 2011T1S11).
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
QZ implemented clustering and validation algorithms, conducted computational experiments, drafted the manuscript, and participated in design of the study. CY participated in algorithm implementation. FG and AK prepared datasets. PS implemented software for interactive similarity maps. KG coordinated the study. AM conceived of the study, participated in its design, and drafted the manuscript. All authors read and approved the final manuscript.
Cite this article
Zhong, Q., Yang, C., Großerüschkamp, F. et al. Similarity maps and hierarchical clustering for annotating FTIR spectral images. BMC Bioinformatics 14, 333 (2013). https://doi.org/10.1186/1471-2105-14-333
Keywords
 Hierarchical clustering
 Cluster validation
 FTIR microscopy
 Raman microscopy
 Image analysis
 Similarity maps