The utility of the plots described above are demonstrated with the aid of two different microarray-based datasets. The 2D plots are illustrated with the help of the Spellman et al. [26] dataset identifying cell cycle related genes in yeast, while microarray data from the human gene atlas study [27], profiling gene expression across multiple tissues, is used for the 3D plots.
2D plots
Spellman et al. [26] produced genome-wide time course profiles in yeast using micro-arrays under different synchronization methods. Fourier analysis was then used to identify 800 genes, with the correct periodicity, as cell cycle related. We consider only these 800 cell cycle related genes and study their profiles under alpha synchronization. For an example with a larger number of points without such periodicity see Additional File 1. Since a natural time ordering of the measurements exists, we are only interested in the relationship between genes.
For comparison to the plots produced by NeatMap we used the Multiexperiment Viewer (MeV) software to generate the standard clustered heat map for this data (figure 1a). Average linkage hierarchical clustering of the Pearson correlation, followed by MeV's function for optimal reordering of genes were used. Although the periodicity of these genes is clear, and locally good groupings are seen, the pattern as a whole appears quite jagged. This is because a cluster like topology was forced on an essentially continuous distribution. Closely related groups of genes are correctly clustered together but the global relations between genes in different clusters (which is essential for complete ordering) are lost. Figure 1b shows the result produced by a 2D embedding of the gene profiles using nMDS, again with the Pearson correlation. A clear continuous ring like pattern emerges naturally. (PCA, with normalized profiles, shows a similar result although the ring structure is more diffuse; see Additional File 2).
Such a ring-like structure is very common when an amplitude-normalized distance measure such as the Pearson correlation is used. In this situation, it is natural to parameterize the position of a gene by a single angle. This is what heatmap1 does. For each gene, its angular position in the nMDS result (figure 1b), with respect to its center of mass, is determined, and the profiles are placed (figure 1d) in a standard heat map ordered according to this angle. The periodic nature of the profiles is now clear, and it is evident that points are arranged by time of up-regulation; essentially the cell cycle phase in which the gene is expressed. While in this case the angular co-ordinate was interpretable as the cell cycle phase, this method works even with non-periodic data when such interpretation is not the possible (see, for example, Additional File 1). Note that heatmap1 also accepts orderings produced by other methods. The R package seriation [12] offers a variety of these, and heatmap1 plots using them for the Spellman data set are available as Additional File 3. In general, the NeatMap ordering is superior, except for the case of Rank Two Ellipse [23]. This method, like NeatMap, uses angular ordering based on normalized profiles (the correlation matrix itself in this case). heatmap1 also allows the superimposition of clustering results. Evidently, the local arrangements in nMDS and clustering are consistent. Large scale rearrangement, produced by incorrect 'swinging', however, makes the clustered heat map result seem poor.
There are some long lines in the gene clustering result in figure 1c spanning the entire length of the heat map. This is a consequence of the periodicity of the angular variable, which results in the two opposite ends of the heat map being almost identical. To avoid artifacts from this periodicity, one may use circularmap (figure 1e). The ordering of profiles is identical to heatmap1, except they are placed along a circle according to their angular positions in figure 1b. One additional advantage of this format is that the non-uniformity in the phase distribution stands out more clearly. It is much harder to gain this type of information from a traditional heat map display.
Figure 1c shows the lineplot based on the nMDS result in figure 1b. As explained earlier, each cell in the grid in figure 1c shows the time course profiles of all the genes in the corresponding cell in figure 1b. The sinusoidal nature of the profiles is much clearer in this plot. It also emerges that the radial coordinate in this case is a measure of 'cyclicity', with the genes close to the centre being less cyclic.
Thus, lineplot emphasizes the overall nature and change in profiles with position. However, compared to heatmap1 and circularmap, comparison of expression at a fixed time across genes is more difficult. It is also more difficult to quickly look up a specific gene. On the other hand, heatmap1 and circularmap are intended for essentially one dimensional results. To deal with the more general case we must use 3D rotatable plots.
Assuming the profiles are stored in matrix form in alpha.profiles, the code to produce figure 1c, d, and 1e (except for specific graphics options) is:
pos.nMDS<-nMDS(alpha.profiles)$x;# Perform nMDS embedding
lineplot(pos.nMDS,alpha.profiles,normalize=T); #1c
make.heatmap1(alpha.profiles,row.normalize=T); #1d
make.circularmap(alpha.profiles); #1e
To use PCA instead of nMDS, a single parameter specifying this would need to be added to each of these plots.
3D plots
We illustrate the 3D plots using the gene atlas dataset. Su et al. [27] used microarrays to analyze the expression profiles of genes in a variety of tissues in both humans and mouse. There is no natural ordering of the genes or tissues, but the relationships between tissues are more easily understood. We therefore primarily focus on these.
Since, in the present context, we are not interested in cross-species comparison, for this demonstration only human data was used (mouse gives similar results). The 1000 genes on the HG-U133A array showing largest variance across the 79 tissues were analyzed. Functionally, there are broadly 3 groups of tissues: those from the brain proper, some nervous system related tissues, and those from other parts of the body. The result of applying hierarchical clustering (average-linkage) using the Pearson correlation to the tissues is shown in figure 2a. Three distinct clusters are seen, one of which is composed solely of brain tissues. However, the nervous tissues are mixed with the other non-brain tissues in the second cluster and no relation to the brain can be gleaned from the leaf order or distance along the tree.
A 2D embedding of the same data using nMDS with Pearson correlation was also performed. The cluster analysis result was superimposed on the 2D nMDS result in a rotatable 3D environment using draw.dendrogram3d (figure 2b). The same three clusters are present, and there is broad agreement between the clustering and nMDS results. Unlike the clustering result, however, the relationship between the brain and nervous system tissues is much clearer. The nervous system genes are also quite similar to the central cluster of tissues in figure 2b. Apparently, cluster analysis assigns them to this cluster, and in doing so their relationship to the proper brain tissues is lost.
The profiles underlying the nMDS result may be displayed in a rotatable 3D environment by using profileplot3d. Figure 3a shows this with the cluster analysis results for genes and tissues superimposed on it. The genes were ordered according to their angular positions in a ring-like nMDS embedding by making use of the Pearson correlation, much like heatmap1. The separation between the three groups of tissues can be seen as before. However, profileplot3d makes it clear that there are different set of genes up-regulated in these groups. The same result can be viewed as a rotatable stereo plot using stereo.profileplot3d (figure 3b). This type of plot could be useful for publications and other environments where dynamic rotations are not possible.
Assuming the data is stored in matrix form (with genes along the rows and tissues along columns) in atlas.profiles, the cluster analysis result for tissues in atlas.cluster, and the three groups are color coded in atlas.group.colors the code to produce the plots in figure. 2 and 3 are:
atlas.nMDS<-nMDS(profiles)$x;
draw.dendrogram3d(atlas.nMDS,atlas.cluster,labels=colnames(atlas.profiles),
label.colors=atlas.group.colors);
make.profileplot3d(atlas.profiles,column.method="nMDS",
labels=colnames(atlas.profiles),label.colors=atlas.group.colors);
make.stereo.profileplot3d(atlas.profiles,column.method="nMDS",
labels=colnames(atlas.profiles),label.colors=atlas.group.colors);