The basic color scheme, based on the HSB model, is shown in Figure 1 together with color representations for three-way comparisons of selected sets of values. The color representations were calculated according to the proposed procedure described in the Methods section. When the three compared values are identical, the resulting color is white (Figure 1, rows 1–3). If two of the values are identical and the third differs, the resulting color corresponds to the hue characteristic of the differing value. For example, if a is the differing value, the resulting color is red (rows 4–7); if b is the differing value, the resulting color is green (rows 8–11); and if c is the differing value, the resulting color is blue (rows 12 and 13).
When all three values to be compared are different, the color representing their three-way comparison is selected from the color gradient running between the characteristic hues of the two most distant values (measured by the absolute value of their difference). The exact color depends on the relative position of the remaining value between the two most distant values. If a and b are the most distant values and c lies half-way between them, the resulting color is yellow (rows 14–17). If c lies closer to b, the color becomes orange (row 29), and if c lies closer to a, the color becomes yellow-green (row 30). Similarly, if a and c are the most distant values and b lies half-way between them, the resulting color is pink (rows 18–24). If b and c are the most distant values and a lies half-way between them, the resulting color is cyan (rows 25–28).
The saturation of the colors indicates the extent of differences between the values. When two of the compared values are identical and one is different, the saturation value corresponds to the distance between the two identical values and the unique value (e.g. rows 4–13). If all three values are different, the saturation corresponds to the distance between the two most distant values (e.g. rows 18–28).
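The hue and saturation rules described above can be sketched in Python as follows. This is a minimal illustration rather than the procedure given in the Methods section: the function names, the characteristic hues of 0°, 120° and 240° for a, b and c, the linear interpolation of hue along the shorter arc, and the assumption that all values are scaled to the range 0–1 are choices made here for the example only.

```python
import colorsys

# Assumed characteristic hues (degrees) for a, b and c: red, green, blue.
HUES = (0.0, 120.0, 240.0)

def three_way_hsb(a, b, c):
    """Return (hue in degrees, saturation, brightness) for three values in [0, 1]."""
    vals = (a, b, c)
    # Each entry is (i, j, k): a candidate "most distant" pair i, j and the remaining value k.
    pairs = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]
    i, j, k = max(pairs, key=lambda p: abs(vals[p[0]] - vals[p[1]]))
    span = abs(vals[i] - vals[j])
    if span == 0:
        return 0.0, 0.0, 1.0  # identical values: white
    # t = 0 when the remaining value equals vals[j] (value i is the differing one),
    # t = 1 when it equals vals[i] (value j is the differing one).
    t = abs(vals[k] - vals[j]) / span
    # Signed shortest arc from the hue of i to the hue of j.
    arc = (HUES[j] - HUES[i] + 180.0) % 360.0 - 180.0
    hue = (HUES[i] + arc * t) % 360.0
    # Saturation follows the distance between the two most distant values.
    return hue, min(span, 1.0), 1.0

def to_rgb(hue_deg, saturation, brightness):
    """Convert the HSB triple to RGB components in [0, 1]."""
    return colorsys.hsv_to_rgb(hue_deg / 360.0, saturation, brightness)

print(three_way_hsb(0.5, 0.5, 0.5))   # identical values: saturation 0 (white)
print(three_way_hsb(1.0, 0.0, 0.0))   # a differs: hue 0 (red)
print(three_way_hsb(1.0, 0.0, 0.5))   # c half-way between a and b: hue 60 (yellow)
print(three_way_hsb(1.0, 0.0, 0.25))  # c closer to b: hue 30 (orange)
```

Under these assumptions the hue depends only on the absolute differences between the values, so a > b = c and a < b = c map to the same hue and differ only in saturation when the distances differ.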
To contrast other color schemes with our proposed color-coding method, Figure 1 also shows the colors that result from direct substitution of the compared values into the RGB (red, green, blue) and CMYK (cyan, magenta, yellow, black) color models. Identical values lead to colors from the white-to-black (grayscale) gradient for both color models (rows 1–3). Distributions in which two compared values are identical and one is different (rows 4–13) can each be represented by one of two colors with varying brightness. If a ≠ b = c, direct RGB coding leads to red if a > b = c (rows 4 and 7) or cyan if a < b = c (rows 5 and 6). For both RGB and CMYK direct coding, using two colors per distribution group (separated by horizontal lines in Figure 1) may provide additional distinguishing features for individual distributions, but may also lead to undesirable ambiguities. For example, the RGB colors for rows 18–20, corresponding to a ≠ b ≠ c with b lying half-way between a and c, are very similar to cyan, corresponding to a ≠ b = c (rows 5 and 6), and to blue, corresponding to a = b ≠ c (row 12). Other similar sources of ambiguity can be found in both the RGB and CMYK columns of Figure 1. Moreover, the brightness of the colors given by direct RGB or CMYK coding cannot be interpreted easily. For RGB direct coding, in some cases smaller absolute differences lead to darker colors (e.g. rows 4 and 7), while in other cases identical absolute differences lead to different brightness of the color (rows 21 and 22). For all these reasons the proposed color-coding approach appears superior for intuitive visualization of three-way comparisons.
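The behavior of direct RGB substitution can be illustrated with a short sketch. The function name direct_rgb and the numeric triples are hypothetical and were chosen for the example; they are not the values plotted in Figure 1. The sketch assumes values scaled to 0–1 and the mapping a to red, b to green and c to blue.

```python
import colorsys

def direct_rgb(a, b, c):
    """Substitute the three compared values directly into the R, G and B channels."""
    return (a, b, c)

def hue_and_brightness(rgb):
    """Return (hue in degrees, brightness) of an RGB triple, for inspection only."""
    h, _s, v = colorsys.rgb_to_hsv(*rgb)
    return h * 360.0, v

# a > b = c gives a red hue, whereas a < b = c gives cyan:
print(hue_and_brightness(direct_rgb(0.8, 0.2, 0.2)))
print(hue_and_brightness(direct_rgb(0.2, 0.8, 0.8)))

# The same absolute difference (0.6) yields different brightness values:
print(hue_and_brightness(direct_rgb(0.6, 0.0, 0.0)))
print(hue_and_brightness(direct_rgb(1.0, 0.4, 0.4)))
```

In this sketch the same kind of distribution (a ≠ b = c) lands on two opposite hues depending on the direction of the difference, and brightness tracks the largest value rather than the size of the difference.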
To illustrate how the visualization method can be used to analyze experimental data, we applied the proposed color-coding method to direct three-way comparisons of metabolite profiles. Three groups of replicate quantitative metabolite profiles (n = 5) derived from capillary electrophoresis time-of-flight mass spectrometry (CE-TOFMS) analysis of mouse liver samples were used for the comparison. The datasets originate from our previous work [2]. Replicate datasets from each group were normalized and averaged into single datasets, which are visualized as density plots in Figure 2. In this case the data are represented in three dimensions as a map of signals in time (x-axis), molecular mass (m/z), and intensity (color). An additional filter dataset was generated by calculating the F ratio (one-way ANOVA) for the groups of all corresponding signal intensities from the original replicate datasets. A moving average smoothing filter (window size 9) was applied to all electropherograms in the filter dataset. The averaged datasets (Figure 2) were used for the generation of an initial three-way comparison result (not shown). This preliminary comparison was then processed to remove signals for which the corresponding F ratio value in the filter dataset was below a threshold value of 3.9 (corresponding to p = 0.05 when comparing three groups of five replicate values). The final filtered three-way comparison result is shown in Figure 3a.
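The filtering steps described above (one-way ANOVA F ratio per signal, moving average smoothing with a window of 9, and thresholding at 3.9) can be sketched for a single electropherogram as follows. This is an illustrative outline, not the software used for the analysis: the function names, the list-based data layout, the shrinking window at the trace edges, and the use of None to mark removed signals are assumptions.

```python
def f_ratio(groups):
    """One-way ANOVA F ratio for a list of replicate groups (lists of numbers)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        # No within-group variation: F is undefined; treated here as 0 or infinity.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def moving_average(trace, window=9):
    """Centered moving average; the window shrinks at the edges (an assumption)."""
    half = window // 2
    smoothed = []
    for i in range(len(trace)):
        lo, hi = max(0, i - half), min(len(trace), i + half + 1)
        smoothed.append(sum(trace[lo:hi]) / (hi - lo))
    return smoothed

def filter_comparison(comparison, f_trace, threshold=3.9, window=9):
    """Keep comparison points whose smoothed F ratio reaches the threshold."""
    smoothed = moving_average(f_trace, window)
    return [c if f >= threshold else None for c, f in zip(comparison, smoothed)]

# Example: F ratio for three small hypothetical groups of replicate intensities.
print(f_ratio([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

Here f_ratio would be evaluated at every time point of every mass electropherogram to build the filter dataset, which is then smoothed and compared with the threshold.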
Parts of the data corresponding to the vicinity of the most significant differences according to the three-way comparison results (Figure 3a) in the normalized replicate datasets are shown in Figure 4 in the form of overlaid extracted electropherograms. These represent the mass electropherograms of metabolite profiles obtained from CE-TOFMS and are used here to confirm visually that the signals are genuine and not due to noise or other artifacts.
Multiple types of possible distributions of compared values, as discussed above, are visible in Figure 3a. Distributions in which one specific value is different and the remaining two compared values are similar are shown as red (label 321 in Figures 2 and 3, corresponding to Figure 4c), green (labels 54 and 38, Figure 4a, g), or blue (near label 245, Figure 4j). Distributions in which all three of the compared values are different and one value lies approximately half-way between the remaining two are shown as yellow (near label 320, Figure 4f), pink (near label 312), or cyan (near label 305).
As described in the Methods section, the brightness value of the HSB color model is not used in the proposed color-coding method but can be used to encode additional information about the three-way comparison. For example, Figure 3b shows an overlay of one of the compared averaged datasets (Figure 2a) onto the filtered three-way comparison result (Figure 3a) via the brightness value. This results in a darkening of the color of the spots that is proportional to the size of the corresponding peaks in the overlaid dataset. Peaks that do not differ significantly among the three compared averaged datasets produce no signals in the filtered three-way comparison result (Figure 3a) but appear as gray spots in Figure 3b (e.g. labels 50, 177, 300), providing both a global overview of total sample composition and instant visualization of specific differences.
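The brightness overlay can be sketched as a simple mapping from peak intensity to the brightness channel. The function name overlay_brightness, the linear scaling by a maximum intensity, and the clipping to the range 0–1 are assumptions made for this illustration; the exact scaling used for Figure 3b is not specified here.

```python
import colorsys

def overlay_brightness(hue_deg, saturation, intensity, max_intensity):
    """Darken a comparison color in proportion to the overlaid peak intensity.

    hue_deg and saturation come from the three-way comparison; filtered-out
    points are assumed to have saturation 0, so they render as shades of gray.
    """
    darkening = min(max(intensity / max_intensity, 0.0), 1.0)
    brightness = 1.0 - darkening
    return colorsys.hsv_to_rgb((hue_deg % 360.0) / 360.0, saturation, brightness)

# A peak with no significant difference (saturation 0) becomes a gray spot:
print(overlay_brightness(0.0, 0.0, 50.0, 100.0))
# A significant difference with no overlaid peak keeps its full brightness:
print(overlay_brightness(0.0, 1.0, 0.0, 100.0))
```

Because hue and saturation are left untouched, the overlay adds information about sample composition without changing how the three-way comparison itself is read.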