The visualization methods discussed in this paper are designed for cluster solutions of partitioning cluster algorithms where clusters can be represented by centroids (e.g., K-means, PAM, or QTClust [10]).
Neighborhood graphs
Neighborhood graphs [7] use the mean relative distances between points and centers as edge weights in order to measure how well separated pairs of clusters are. Hence they display the distance between clusters. In the graph, each node corresponds to a cluster centroid, and two nodes are connected by an edge if there exists at least one point that has these two as closest and second-closest centroids.
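For a given data set X_{N} = {x_{1}, ..., x_{N}} the distance between points x and y is given by d(x, y), e.g., the Euclidean or absolute distance. C_{K} = {c_{1}, ..., c_{K}} is a set of centroids and the centroid closest to x is denoted by
c(x) = \arg\min_{c \in C_{K}} d(x, c).
The second-closest centroid to x is denoted by
\tilde{c}(x) = \arg\min_{c \in C_{K} \setminus \{c(x)\}} d(x, c).
The set of all points where c_{k} is the closest centroid is given by
A_{k} = \{x_{n} \mid c(x_{n}) = c_{k}\}.
Now the set of all points where c_{i} is the closest centroid and c_{j} is second-closest is given by
A_{ij} = \{x_{n} \mid c(x_{n}) = c_{i}, \tilde{c}(x_{n}) = c_{j}\}.
For each observation x the shadow value s(x) is defined as
s(x) = \frac{2 d(x, c(x))}{d(x, c(x)) + d(x, \tilde{c}(x))}.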
s(x) is small if x is close to its cluster centroid and close to 1 if it is almost equidistant between the two cluster centroids. The average s-value of all points where cluster i is closest and cluster j is second-closest can be used as a proximity measure between clusters and as edge weight in the graph:
s_{ij} = \begin{cases} |A_{i}|^{-1} \sum_{x \in A_{ij}} s(x), & A_{ij} \neq \emptyset \\ 0, & A_{ij} = \emptyset. \end{cases}
|A_{i}| is used in the denominator instead of |A_{ij}| to make sure that a small set A_{ij} consisting only of badly clustered points with large shadow values does not induce large cluster similarity.
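The edge weights can be illustrated with a short sketch. It assumes the shadow value takes the form s(x) = 2 d(x, c(x)) / (d(x, c(x)) + d(x, c~(x))), which matches the behaviour described above (0 at the centroid, 1 when equidistant), and it uses Euclidean distance. The function name `shadow_edge_weights` and the dictionary layout are hypothetical and are not part of flexclust.

```python
import math


def shadow_edge_weights(points, centroids):
    """Return (shadow, weights) for a centroid-based partition.

    shadow  : list of s(x), one value per point
    weights : dict mapping (i, j) to the sum of s(x) over A_ij divided by |A_i|
    Assumes at least two centroids and Euclidean distance.
    """
    k = len(centroids)
    sizes = [0] * k          # |A_i|: number of points whose closest centroid is i
    sums = {}                # sum of shadow values over A_ij
    shadow = []
    for x in points:
        # Rank centroids by distance; ties are broken by centroid index.
        order = sorted(range(k), key=lambda c: math.dist(x, centroids[c]))
        i, j = order[0], order[1]
        d1 = math.dist(x, centroids[i])
        d2 = math.dist(x, centroids[j])
        s = 2 * d1 / (d1 + d2) if d1 + d2 > 0 else 0.0
        shadow.append(s)
        sizes[i] += 1
        sums[(i, j)] = sums.get((i, j), 0.0) + s
    # Divide by |A_i| rather than |A_ij|, as motivated in the text.
    weights = {(i, j): total / sizes[i] for (i, j), total in sums.items()}
    return shadow, weights


centroids = [(0.0, 0.0), (4.0, 0.0)]
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (3.0, 0.0)]
shadow, weights = shadow_edge_weights(points, centroids)
```

In this toy example the two points lying on a centroid have shadow value 0, and the two points one unit away have shadow value 0.5, so both edge weights equal 0.25.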
Functional relevance test
Now the obtained similarity between clusters and the neighborhood graph can be used to evaluate a cluster result at hand. The cluster structure can be used to decide whether the clustering is too coarse and needs further subdivision to represent the data adequately, or whether it is too fine and some clusters should be merged. On the one hand, this can be accomplished by defining some threshold t for the shadow value s above which two clusters are merged. If clusters are too large, more accurate clusters can, for instance, be obtained by running the algorithm again with a larger K.
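The threshold rule can be sketched as follows. The sketch assumes that edge weights are stored in a dictionary keyed by ordered cluster pairs and that two clusters are merged as soon as the weight in either direction exceeds t; both choices, and the name `merge_clusters`, are illustrative assumptions rather than part of the described software.

```python
def merge_clusters(k, weights, t):
    """Group clusters 0..k-1 that are linked by an edge weight above t.

    weights : dict mapping (i, j) to the similarity between clusters i and j
    Returns a sorted list of merged groups (each a sorted list of cluster ids).
    """
    parent = list(range(k))

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for (i, j), w in weights.items():
        if w > t:
            parent[find(i)] = find(j)

    groups = {}
    for c in range(k):
        groups.setdefault(find(c), []).append(c)
    return sorted(groups.values())


merged = merge_clusters(4, {(0, 1): 0.6, (1, 0): 0.1, (2, 3): 0.2}, t=0.5)
```

Here clusters 0 and 1 are merged because one of their two edge weights exceeds 0.5, while clusters 2 and 3 remain separate.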
On the other hand, external knowledge about the data can be used to validate a given clustering. In the case of microarray data, a priori information about gene function or the association to functional groups can be used, as functionally related genes are more likely to be coexpressed. Clusters with similar expression patterns are connected in the neighborhood graph. If functional group F is independent of the experimental setup, genes classified to group F will be assigned to arbitrary clusters, i.e., they are assumed to be spread all over the neighborhood graph. Further, genes functionally independent of the experimental setup do not have a common expression pattern. If functional group F plays a role in the experiment, the corresponding genes are more likely to show a typical pattern of either up- or down-regulation, and there should be clusters with an accumulation of such genes.
Assigning all genes in the clustered data set to some functional group F yields proportions π_{1}, ..., π_{K}, where K is the number of clusters or nodes and N_{F} is the total number of genes in the data set assigned to group F. If there is no association between the functional group and the cluster solution then all proportions are the same, i.e., the differences between proportions d_{ij} = 0, where
d_{ij} = |\pi_{i} - \pi_{j}|.
If there is an association then some π_{k} will be large and others small. The test for functional relevance of a given clustering is conducted in a stepwise way.
Step 1: Perform a global test of the equality of proportions, i.e., test the null hypothesis that all proportions are the same:
H_{0}: \pi_{1} = \pi_{2} = \cdots = \pi_{K}.
The test procedure stops if there is no difference in proportions. But if there are significant differences in proportions each single difference has to be investigated in more detail. If the proportion of functionally related genes is the same in two clusters these two clusters are similar with respect to functional group F and can therefore be merged. This procedure yields separated subgraphs with common gene function within the neighborhood graph.
Without knowledge about the cluster structure and the similarities between clusters given in the neighborhood graph G, each pair of clusters has to be tested for a significant difference in proportions, i.e., K(K − 1)/2 tests have to be conducted. Using the neighborhood structure, only a fraction of all possible pairs, i.e., clusters connected by an edge, have to be tested. A further reduction of tests can be achieved by taking into account only nodes where the number of functionally assigned genes is above a threshold m.
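The reduction in the number of tests can be made concrete with a small sketch. It assumes that edges are given as pairs of cluster indices, that an edge in either direction counts as one pair to test, and that "above a threshold m" means strictly greater than m. The function name `pairs_to_test` is hypothetical.

```python
def pairs_to_test(k, edges, n_assigned, m):
    """Return the cluster pairs that still need a test of proportions.

    k          : number of clusters
    edges      : iterable of (i, j) pairs connected in the neighborhood graph
    n_assigned : number of genes assigned to group F in each cluster
    m          : minimum number of assigned genes for a node to be kept
    """
    keep = {c for c in range(k) if n_assigned[c] > m}
    pairs = {
        tuple(sorted((i, j)))
        for i, j in edges
        if i != j and i in keep and j in keep
    }
    return sorted(pairs)


k = 5
all_pairs = k * (k - 1) // 2
reduced = pairs_to_test(
    k,
    edges=[(0, 1), (1, 0), (1, 2), (3, 4)],
    n_assigned=[10, 8, 2, 7, 9],
    m=5,
)
```

With five clusters, 10 pairwise tests would be needed without the graph. The graph leaves three connected pairs, and dropping cluster 2 (only two assigned genes) leaves two.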
Step 2: Assess the significance of the observed differences with respect to a reference distribution by permuting the function labels. The null hypothesis is again no difference in proportions.

Select all clusters where the number of functionally assigned genes is above the predefined threshold m and conduct all further calculations on the resulting subgraph G'.

Calculate the difference between proportions d_{ij}, i, j = 1, ..., K, for each edge in the subgraph.

Permute the function labels, i.e., randomly assign N'_{F} genes to functional group F, where N'_{F} is the number of assigned genes in the subgraph G' with N'_{F} ≤ N_{F}. Compute the resulting differences in proportions d_{ij}^{l}, i, j = 1, ..., K, and keep the respective maximum
M^{l} = \max_{i,j} d_{ij}^{l}, \quad l = 1, ..., L,
as used in [11] to form a reference distribution, where L is the number of permutations considered.

Compute marginal tests of whether a particular d_{ij} is extreme relative to the joint distribution M^{l}, i.e., compute how often the maximum of the permuted differences in proportions is larger than the observed one.
In other words, if the observed difference in proportions is very unlikely with respect to the reference distribution of the maxima M^{l}, the edge will be removed. In this procedure a modified neighborhood graph is formed for the cluster solution and functional group under investigation. In this modified graph two clusters are only connected if they have

1. a large similarity value s and

2. no significant difference in proportions of functionally related genes.
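The permutation procedure of Step 2 can be sketched as follows. Several details are assumptions made for illustration only: π_k is taken as the fraction of genes in cluster k that are assigned to group F, differences are taken in absolute value, and the p-value of an edge is the fraction of permutation maxima that are at least as large as the observed difference. The function names are hypothetical.

```python
import random


def proportions(labels, clusters, k):
    """Fraction of genes per cluster carrying the function label (1 = in F)."""
    sizes = [0] * k
    hits = [0] * k
    for f, c in zip(labels, clusters):
        sizes[c] += 1
        hits[c] += f
    return [h / n if n else 0.0 for h, n in zip(hits, sizes)]


def max_diff_permutation_test(labels, clusters, edges, k, n_perm=1000, seed=0):
    """Return (observed differences, p-values) for each edge of the graph.

    edges must be non-empty; clusters are indexed 0..k-1.
    """
    rng = random.Random(seed)

    def diffs(lab):
        p = proportions(lab, clusters, k)
        return {e: abs(p[e[0]] - p[e[1]]) for e in edges}

    observed = diffs(labels)
    shuffled = list(labels)
    maxima = []
    for _ in range(n_perm):
        # Shuffling keeps the number of genes assigned to F fixed.
        rng.shuffle(shuffled)
        maxima.append(max(diffs(shuffled).values()))
    pvals = {e: sum(m >= observed[e] for m in maxima) / n_perm for e in edges}
    return observed, pvals


# Toy data: all genes of cluster 0 belong to F, none of clusters 1 and 2.
clusters = [0] * 20 + [1] * 20 + [2] * 20
labels = [1] * 20 + [0] * 40
observed, pvals = max_diff_permutation_test(
    labels, clusters, edges=[(0, 1), (1, 2)], k=3
)
```

Under these assumptions the edge between clusters 0 and 1 would be removed (significant difference), while the edge between clusters 1 and 2 would be kept.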
Compare cluster results
Validation of microarray cluster results is a challenging task (e.g., [2]) as there is in general no true cluster membership. The quality of a cluster solution should be judged based on its ability to provide insight into the underlying mechanistic biology. As described in the previous section the validity of a cluster solution can be judged based on its ability to find groups of functionally related genes. Another approach is to find genes with common mechanism of regulation by searching for groups of genes that show a common response in different experiments.
For that purpose another test procedure was developed. We test how valid a given cluster solution is on a different data set, taking into account the average within-cluster distance W = (w_{1}, ..., w_{K}), where
w_{k} = \frac{1}{|A_{k}|} \sum_{x_{n} \in A_{k}} d(x_{n}, c_{k}).
Let X_{N} be the data matrix of N genes for a given experiment and let M be the vector of length N of the corresponding cluster memberships. Further let Y_{N} be the data matrix of the same N genes in a different experiment. In order to test if the cluster memberships M found for data set X_{N} are also valid in data set Y_{N}, the following procedure is used.

1. Compute the new cluster centroids c_{k}^{Y}, k = 1, ..., K, for data set Y_{N} using the vector of cluster memberships M.

2. For each cluster k compute the average within-cluster distance of data points y_{n} to their assigned centroid c_{k}^{Y}, i.e.,
w_{k}^{Y} = \frac{1}{|\{n : M_{n} = k\}|} \sum_{n : M_{n} = k} d(y_{n}, c_{k}^{Y}).

3. Permute the cluster memberships, i.e., randomly assign the genes to clusters but do not modify cluster sizes. Compute the resulting average within-cluster distance w_{k}^{l} for each cluster and keep the w_{k}^{l}, l = 1, ..., L, where L is the number of permutations considered.

4. Compute marginal tests for each cluster of whether a particular w_{k}^{Y} is extreme relative to the joint distribution of w_{k}^{l}.
For each k, where k = 1, ..., K, a single test is performed with the null hypothesis
H_{0}: w_{k}^{Y} \geq w_{k}^{l}
and the alternative hypothesis is
H_{1}: w_{k}^{Y} < w_{k}^{l}.
The null hypothesis is rejected if the probability of observing a smaller within-cluster distance by randomly assigning genes to clusters is less than, e.g., 5%. In this case there is a relationship between the investigated cluster solution on the original data set and on the new data set, and genes with a common expression pattern across experiments are found.
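The four steps above can be sketched in a few lines. The sketch assumes that centroids are coordinate-wise means and that d is the Euclidean distance; other centroid and distance definitions are possible. The p-value of cluster k is taken as the fraction of permutations whose within-cluster distance is at most the observed one. All function names are hypothetical.

```python
import math
import random


def within_distances(data, members, k):
    """Average distance of each cluster's rows to the cluster mean."""
    groups = [[] for _ in range(k)]
    for row, c in zip(data, members):
        groups[c].append(row)
    result = []
    for rows in groups:  # every cluster is assumed to be non-empty
        center = [sum(col) / len(rows) for col in zip(*rows)]
        result.append(sum(math.dist(r, center) for r in rows) / len(rows))
    return result


def transfer_test(y, members, k, n_perm=1000, seed=0):
    """Return (observed distances, p-values) for memberships applied to y."""
    rng = random.Random(seed)
    observed = within_distances(y, members, k)
    perm = list(members)
    hits = [0] * k
    for _ in range(n_perm):
        rng.shuffle(perm)  # shuffling labels leaves cluster sizes unchanged
        w = within_distances(y, perm, k)
        for c in range(k):
            hits[c] += w[c] <= observed[c]
    return observed, [h / n_perm for h in hits]


# Toy second data set in which the memberships still describe two tight groups.
y = [(0.0,), (0.1,), (0.2,), (0.1,), (0.0,),
     (10.0,), (10.1,), (10.2,), (10.1,), (10.0,)]
members = [0] * 5 + [1] * 5
observed, pvals = transfer_test(y, members, k=2)
```

In this toy example a random assignment almost never produces clusters as compact as the observed ones, so the null hypothesis would be rejected for both clusters at the 5% level.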
Data
E. coli cultivation data were collected at the Department of Biotechnology of the University of Natural Resources and Applied Life Sciences in Vienna. Two recombinant E. coli processes with different induction strategies were conducted in order to evaluate the influence of the expression level of the inclusion body forming protein N^{pro}GFPmut3.1 on the host metabolism [12]. The standard strategy with a single pulse of inducer, yielding a fully induced system, was compared to a process with continuous supply of limiting amounts of inducer, resulting in a partially induced system [13]. In order to analyze the cellular response to different induction strategies on the transcription level, two independent DNA microarray experiments were performed. A dye-swap design was used and the cells in the non-induced state of each experiment were compared to samples taken after induction. The two experiments are available at ArrayExpress http://www.ebi.ac.uk/microarrayas/ae/. The experiment with the fully induced E. coli expression system has accession number E-MARS-16 and the experiment with the partially induced system has accession number E-MARS-17. For standard low-level analysis the data were preprocessed using print-tip loess normalization. Differential expression estimates were calculated using Bioconductor ([14], http://www.bioconductor.org) package limma [15]. The two data sets were filtered by selecting genes with a p-value of the corresponding F-statistic smaller than 0.05. Additionally, only genes expressed at a certain level (average log intensity A larger than 8) and genes with a clearly defined pattern (log-ratio M exceeding ±1.5 at at least one time point) were used. After filtering, the data acquired from the experiment with a fully induced E. coli expression system consist of 733 genes and the data acquired from the process with limited induction consist of 429 genes.
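The three filter criteria can be written as a simple predicate. This is only an illustration of the thresholds quoted above and not the code used for the analysis, which was carried out with limma in R. The argument names, and the assumption that each gene comes with one p-value, one average log intensity, and one log-ratio per time point, are hypothetical.

```python
def keep_gene(p_value, avg_a, m_values, p_max=0.05, a_min=8.0, m_min=1.5):
    """Return True if a gene passes all three filters.

    p_value  : p-value of the gene's F-statistic
    avg_a    : average log intensity A of the gene
    m_values : log-ratios M of the gene, one per time point
    """
    significant = p_value < p_max
    expressed = avg_a > a_min
    clear_pattern = any(abs(m) > m_min for m in m_values)
    return significant and expressed and clear_pattern


passes = keep_gene(0.01, 9.2, [0.3, -1.8, 0.9])
too_weak = keep_gene(0.01, 7.5, [0.3, -1.8, 0.9])
```

The first gene passes because it is significant, sufficiently expressed, and down-regulated by more than 1.5 at one time point. The second fails the intensity criterion.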
For the functional relevance test another E. coli experiment was used where various mutants were investigated under oxygen deprivation [16]. The mutants were designed to monitor the response from E. coli during an oxygen shift in order to target the a priori most relevant part of the transcriptional network by using six strains with knockouts of key transcriptional regulators in the oxygen response. These experiments provide expression profiles for 4205 genes derived from the original data set downloaded from the Gene Expression Omnibus [17] with accession GDS680 by applying the altering steps described in [18].
Functional grouping
Cluster analysis is used to find groups of coregulated genes in the microarray data without prior knowledge about the gene functions. However, by clustering expression profiles of coexpressed genes groups of genes with similar function are often found.
The annotation of genes to categories or classes is a very important aspect in the analysis of gene expression data. The genes can for example be mapped to functional groups like Gene Ontology (GO, [9]) classifications or to protein complexes. Gene functions are very complex, therefore genes are usually mapped to multiple classes. In any case the mapping is known a priori and does not depend on the data of the currently investigated experiment.
External information about the annotation of genes to functional groups can easily be included in the neighborhood graph, e.g., the accumulation of gene ontology (GO) classifications in certain gene clusters can be highlighted in the node representation. In microarray data analysis gene ontology classifications about Biological Process, Molecular Function and Cellular Component are typically investigated. In this study experimental data from E. coli is used where further sources of external knowledge are the GenProtEC ([19], http://genprotec.mbl.edu/) classification system for cellular and physiological roles of E. coli gene products and the RegulonDB ([20], http://regulondb.ccg.unam.mx/) for detailed information about operons and regulons.
Software and implementation
All cluster algorithms and visualization methods used are implemented in the statistical computing environment R [21]. R package flexclust [7] is a flexible toolbox to investigate the influence of distance measures and cluster algorithms. It contains extensible implementations of the K-centroids and QTClust algorithms and offers the possibility to try out a variety of distance or similarity measures, as cluster algorithms are treated separately from distance measures. New distance measures and centroid computations can easily be incorporated into cluster procedures. The default plotting method for cluster solutions in flexclust is the neighborhood graph.
A linear projection of the data into 2 dimensions using, for example, linear discriminant analysis (LDA) has the advantage that the lengths of edges in the graph are directly interpretable. However, LDA does not scale well in the number of clusters, and relationships between the centroids of more than 15 clusters can hardly be displayed in the plane. As shown in [22], linear methods cannot be used for high-dimensional gene expression data and a large number of clusters. R package gcExplorer [8] uses nonlinear layout algorithms implemented in the open source graph visualization software Graphviz (http://www.graphviz.org/) for the display of neighborhood graphs. Bioconductor packages graph and Rgraphviz [23] provide tools for creating, manipulating, and visualizing graphs in R as well as an interface to Graphviz. Rgraphviz returns the layout information for a graph object: the x- and y-coordinates of the graph's nodes as well as the parameterization of the trajectories of the edges. Several layout algorithms can be chosen:
dot: hierarchical layout algorithm for directed graphs
neato and fdp: layout algorithms for large undirected graphs
twopi: radial layout
circo: circular layout
The default layout algorithm in gcExplorer is "dot". Even though distances between nodes and lengths of edges are no longer interpretable when using nonlinear layout algorithms, the layouts are considerably more readable and more clearly arranged.
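To illustrate how a neighborhood graph can be handed to Graphviz, the following sketch writes a graph description in the DOT language. It does not reproduce gcExplorer, which works through Rgraphviz in R. Drawing edges as directed and mapping the edge weight to line width via the `penwidth` attribute are assumptions of the sketch, and the function name is hypothetical.

```python
def neighborhood_graph_dot(weights, layout="dot", max_penwidth=5.0):
    """Return DOT source for a graph with weighted, directed edges.

    weights : dict mapping (i, j) to a non-negative edge weight
    layout  : Graphviz layout engine, e.g. "dot", "neato", "fdp",
              "twopi" or "circo"
    """
    top = max(weights.values(), default=0.0)
    lines = ["digraph neighborhood {", f"  layout={layout};"]
    for (i, j), w in sorted(weights.items()):
        # Scale line width so that the heaviest edge gets max_penwidth.
        width = max_penwidth * w / top if top > 0 else 1.0
        lines.append(f'  "{i}" -> "{j}" [penwidth={width:.2f}];')
    lines.append("}")
    return "\n".join(lines)


dot_source = neighborhood_graph_dot({(1, 2): 0.5, (2, 1): 0.25})
```

The resulting text can be saved to a file and rendered with any of the Graphviz layout engines listed above.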
The latest release of gcExplorer is always available at the Comprehensive R Archive Network (CRAN, http://cran.Rproject.org/package=gcExplorer). Details on how to use gcExplorer can be found in the online appendix [see Additional file 1 for the vignette and Additional file 2 for the corresponding R code].