Exploratory and inferential analysis of gene cluster neighborhood graphs
 Theresa Scharl^{1, 2},
 Ingo Voglhuber^{1} and
 Friedrich Leisch^{3}
DOI: 10.1186/1471-2105-10-288
© Scharl et al; licensee BioMed Central Ltd. 2009
Received: 5 May 2009
Accepted: 14 September 2009
Published: 14 September 2009
Abstract
Background
Many different cluster methods are frequently used in gene expression data analysis to find groups of coexpressed genes. However, cluster algorithms with the ability to visualize the resulting clusters are usually preferred. The visualization of gene clusters gives practitioners an understanding of the cluster structure of their data and makes it easier to interpret the cluster results.
Results
In this paper recent extensions of R package gcExplorer are presented. gcExplorer is an interactive visualization toolbox for the investigation of the overall cluster structure as well as single clusters. The different visualization options including arbitrary node and panel functions are described in detail. Finally the toolbox can be used to investigate the quality of a given clustering graphically as well as theoretically by testing the association between a partition and a functional group under study.
Conclusion
It is shown that gcExplorer is a very helpful tool for a general exploration of microarray experiments. The identification of potentially interesting gene candidates or functional groups is substantially accelerated and eased. Inferential analysis on a cluster solution is used to judge its ability to provide insight into the underlying mechanistic biology of the experiment.
Background
Cluster analysis is frequently used in gene expression data analysis to find groups of coexpressed genes which can finally suggest functional pathways and interactions between genes. Clusters of coexpressed genes can help to discover potentially coregulated genes or association to conditions under investigation. Usually cluster analysis provides a good initial investigation of microarray data before actually focusing on functional subgroups of interest. Genetic interactions are complex and the definition of gene clusters is often not clear. Additionally microarray data are very noisy and coexpressed genes can end up in different clusters. Therefore the set of genes may be divided into artificial subsets where relationships between clusters play an important role.
In the literature numerous methods for clustering gene expression data have been proposed. Detailed reviews of currently used methods and challenges with gene expression data are given in [1–3]. The display of cluster solutions, particularly for a large number of clusters, is very important in exploratory data analysis. Visualization methods are necessary in order to make cluster analysis useful for practitioners. They give an understanding of the relationships between segments of a partition and make it easier to interpret the cluster results. In hierarchical clustering dendrograms and heatmaps are routinely used (e.g., [4]). The most popular group of partitioning cluster algorithms are centroid-based cluster algorithms (e.g., K-means or Partitioning Around Medoids). Once a set of centroids has been found, centroid-based cluster solutions are usually visualized by projection of the data into two dimensions (e.g., by principal component analysis). Silhouette plots [5] can be used to check whether clusters of points are well separated, whereas topology representing networks [6] reveal similarity between clusters. Neighborhood graphs [7] combine these two approaches to visualize cluster structure.
In this paper recent extensions of R package gcExplorer [8] are presented. In the package neighborhood graphs are used for visual assessment of the cluster structure. Several node functions can be used to add further information to the graph, e.g., cluster size or cluster tightness. Additionally it is possible to use distinct graphical symbols for the representation of single clusters, e.g., line plots or boxplots. Besides the node function a panel function is implemented, which allows the corresponding clusters to be explored interactively in more detail by looking at arbitrary cluster plots or HTML tables of the group of genes under investigation. Further, external information about the genes like gene function or association to gene sets like Gene Ontology [9] can easily be integrated into the exploration. Finally the toolbox can be used to investigate the quality of a given clustering graphically as well as theoretically. In the functional relevance test the association between a partition and a functional group under study is tested. Further, the validity of a cluster solution under different experimental conditions is tested.
Methods
The visualization methods discussed in this paper are designed for cluster solutions of partitioning cluster algorithms where clusters can be represented by centroids (e.g., K-means and PAM or QTClust [10]).
Neighborhood graphs
Neighborhood graphs [7] use the mean relative distances between points and centers as edge weights in order to measure how separated pairs of clusters are. Hence they display the distance between clusters. In the graph each node corresponds to a cluster centroid and two nodes are connected by an edge if there exists at least one point that has these two as closest and second-closest centroid.
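In the cluster similarity of [7] the size of A_{i}, the set of all points in cluster i, is used in the denominator instead of the size of A_{ij}, the set of points with centroid i as closest and centroid j as second-closest. This makes sure that a small set A_{ij} consisting only of badly clustered points with large shadow values does not induce large cluster similarity.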
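The formulas themselves are not reproduced in the text above. The following minimal sketch assumes the definitions of [7]: the shadow value of a point x is s(x) = 2 d(x, c(x)) / (d(x, c(x)) + d(x, c~(x))), with c(x) the closest and c~(x) the second-closest centroid, and the similarity of cluster i to cluster j is the sum of the shadow values over A_{ij} divided by the size of A_{i}. The function names and the Euclidean distance are illustrative choices, not part of flexclust or gcExplorer.

```python
import math

def euclidean(x, c):
    """Euclidean distance between a point and a centroid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))

def cluster_similarity(points, centroids, dist=euclidean):
    """Return the K x K matrix s[i][j] of directed cluster similarities.

    Assumed definition: s[i][j] = sum of shadow values over A_ij / |A_i|,
    where A_ij holds the points with centroid i closest and j second-closest.
    """
    k = len(centroids)
    sizes = [0] * k                        # |A_i|
    sums = [[0.0] * k for _ in range(k)]   # shadow values summed over A_ij
    for x in points:
        d = [dist(x, c) for c in centroids]
        order = sorted(range(k), key=lambda idx: d[idx])
        i, j = order[0], order[1]          # closest and second-closest centroid
        denom = d[i] + d[j]
        shadow = 2.0 * d[i] / denom if denom > 0 else 0.0
        sizes[i] += 1
        sums[i][j] += shadow
    return [[sums[i][j] / sizes[i] if sizes[i] else 0.0 for j in range(k)]
            for i in range(k)]

# Toy example: three centroids on a line, four points.
centroids = [(0.0, 0.0), (4.0, 0.0), (20.0, 0.0)]
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (5.0, 0.0)]
s = cluster_similarity(points, centroids)
```

In the toy example s[0][1] and s[1][0] differ, which illustrates why the resulting graph is directed, and no edge leads to the distant third centroid.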
Functional relevance test
Now the obtained similarity between clusters and the neighborhood graph can be used to evaluate a cluster result at hand. The cluster structure can be used to decide whether the clustering is too coarse and needs further subdivision to respect the data or whether it is too fine and some clusters should be merged. On the one hand this can be accomplished by defining some threshold t for the similarity value s above which two clusters are merged. If clusters are too large, more accurate clusters can for instance be obtained by running the algorithm again with a larger K.
On the other hand external knowledge about the data can be used to validate a given clustering. In the case of microarray data a priori information about gene function or the association to functional groups can be used, as functionally related genes are more likely to be coexpressed. Clusters with similar expression pattern are connected in the neighborhood graph. If functional group F is independent of the experimental setup, genes classified to group F will be assigned to arbitrary clusters, i.e., they are assumed to be spread all over the neighborhood graph. Further, genes functionally independent of the experimental setup do not have a common expression pattern. If functional group F plays a role in the experiment the corresponding genes are more likely to show a typical pattern of either up- or downregulation and there should be clusters with accumulation of such genes.
If there is an association then some of the proportions π_{k} of genes from group F in cluster k will be large and others small. The test for functional relevance of a given clustering is conducted in a stepwise way.
The test procedure stops if there is no difference in proportions. But if there are significant differences in proportions each single difference has to be investigated in more detail. If the proportion of functionally related genes is the same in two clusters these two clusters are similar with respect to functional group F and can therefore be merged. This procedure yields separated subgraphs with common gene function within the neighborhood graph.
Without knowledge about the cluster structure and the similarities between clusters given in the neighborhood graph G each pair of clusters has to be tested for a significant difference in proportions, i.e., K(K - 1)/2 tests have to be conducted. Using the neighborhood structure only a fraction of all possible pairs, i.e., clusters connected by an edge, have to be tested. A further reduction of tests can be achieved by taking into account only nodes where the number of functionally assigned genes is above a threshold m.
Step 2: Assess the significance of the observed differences with respect to a reference distribution by permuting the function labels. The null hypothesis is again no difference in proportions.

Select all clusters where the number of functionally assigned genes is above the predefined threshold m and conduct all further calculations on the resulting subgraph G'.

Calculate the difference between proportions d_{ij}, i, j = 1, ..., K, for each edge in the subgraph.

Permute the function labels and compute the maximum of the differences in proportions over all edges, M^{l}, l = 1, ..., L, as used in [11], to form a reference distribution, where L is the number of permutations considered.

Compute marginal tests of whether a particular d_{ij} is extreme relative to the joint distribution M^{l}, i.e., compute how often the maximum of the permuted differences in proportions is larger than the observed one.
 1. a large similarity value s and
 2. no significant difference in proportions of functionally related genes.
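The permutation step of the functional relevance test can be sketched as follows. This is a minimal illustration under stated assumptions, not the gcExplorer implementation: the function names are hypothetical, differences in proportions are taken as absolute values, and the p-value of an edge is the fraction of permutations whose maximum difference is at least as large as the observed one.

```python
import random

def proportions(cluster, in_group):
    """Proportion and count of functionally assigned genes per cluster."""
    size, hits = {}, {}
    for c, g in zip(cluster, in_group):
        size[c] = size.get(c, 0) + 1
        hits[c] = hits.get(c, 0) + (1 if g else 0)
    return {c: hits[c] / size[c] for c in size}, hits

def functional_relevance_test(cluster, in_group, edges, m=0, n_perm=1000, seed=1):
    """Return {edge: (observed difference, permutation p-value)}."""
    rng = random.Random(seed)
    prop, hits = proportions(cluster, in_group)
    # Subgraph G': keep clusters with more than m functionally assigned genes.
    keep = {c for c in hits if hits[c] > m}
    sub = [(i, j) for i, j in edges if i in keep and j in keep]
    if not sub:
        return {}
    observed = {(i, j): abs(prop[i] - prop[j]) for i, j in sub}
    labels = list(in_group)
    maxima = []                         # reference distribution M^l, l = 1..L
    for _ in range(n_perm):
        rng.shuffle(labels)             # permute the function labels
        perm_prop, _hits = proportions(cluster, labels)
        maxima.append(max(abs(perm_prop[i] - perm_prop[j]) for i, j in sub))
    return {e: (d, sum(mx >= d for mx in maxima) / n_perm)
            for e, d in observed.items()}

# Toy data: clusters 1 and 2 contain 10% group genes, cluster 3 contains 90%.
cluster = [1] * 40 + [2] * 40 + [3] * 20
in_group = ([True] * 4 + [False] * 36) * 2 + [True] * 18 + [False] * 2
result = functional_relevance_test(cluster, in_group, edges=[(1, 2), (2, 3)], m=2)
```

In the toy data the edge between clusters 1 and 2 shows no difference in proportions, so these clusters would be candidates for merging, whereas the edge between clusters 2 and 3 shows a large difference.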
Compare cluster results
Validation of microarray cluster results is a challenging task (e.g., [2]) as there is in general no true cluster membership. The quality of a cluster solution should be judged based on its ability to provide insight into the underlying mechanistic biology. As described in the previous section the validity of a cluster solution can be judged based on its ability to find groups of functionally related genes. Another approach is to find genes with common mechanism of regulation by searching for groups of genes that show a common response in different experiments.
 1.
 2.
 3. Permute the cluster memberships, i.e., randomly assign the genes to clusters but do not modify cluster sizes. Compute the resulting average within-cluster distance for each cluster and keep these values for each of the L permutations, where L is the number of permutations considered.
 4. Compute marginal tests for each cluster of whether a particular observed average within-cluster distance is extreme relative to the joint distribution of the permuted distances.
The null hypothesis is rejected if the probability of observing a smaller within-cluster distance by randomly assigning genes to clusters is less than, e.g., 5%. In this case there is a relationship between the investigated cluster solution on the original data set and on the new data set, and genes with common expression pattern across experiments are found.
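A minimal sketch of such a permutation test is given below. It is not the comp_test function of gcExplorer: the name comp_test_sketch is hypothetical, the within-cluster distance is assumed to be the average Euclidean distance to the cluster mean in the new data set, and each cluster is compared with its own permutation distribution.

```python
import math
import random

def avg_within_distance(data, membership):
    """Average Euclidean distance of the points of each cluster to its mean."""
    groups = {}
    for row, c in zip(data, membership):
        groups.setdefault(c, []).append(row)
    result = {}
    for c, rows in groups.items():
        center = [sum(col) / len(rows) for col in zip(*rows)]
        dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(r, center)))
                 for r in rows]
        result[c] = sum(dists) / len(rows)
    return result

def comp_test_sketch(new_data, membership, n_perm=500, seed=1):
    """Return {cluster: (observed distance, lower-tail permutation p-value)}."""
    rng = random.Random(seed)
    observed = avg_within_distance(new_data, membership)
    perm = list(membership)
    lower = {c: 0 for c in observed}
    for _ in range(n_perm):
        rng.shuffle(perm)               # random assignment, cluster sizes kept
        d = avg_within_distance(new_data, perm)
        for c in observed:
            if d[c] <= observed[c]:     # permuted cluster is at least as tight
                lower[c] += 1
    return {c: (observed[c], lower[c] / n_perm) for c in observed}

# Toy example: the memberships found on one data set also separate the new data.
new_data = [(0.1 * i, 0.0) for i in range(10)] + [(10.0 + 0.1 * i, 0.0) for i in range(10)]
membership = ["a"] * 10 + ["b"] * 10
res = comp_test_sketch(new_data, membership)
```

A small lower-tail p-value for a cluster indicates that its genes are closer together in the new data set than expected under random assignment.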
Data
E. coli cultivation data were collected at the Department of Biotechnology of the University of Natural Resources and Applied Life Sciences in Vienna. Two recombinant E. coli processes with different induction strategies were conducted in order to evaluate the influence of the expression level of the inclusion body forming protein N^{pro}GFPmut3.1 on the host metabolism [12]. The standard strategy with a single pulse of inducer yielding a fully induced system was compared to a process with continuous supply of limiting amounts of inducer resulting in a partially induced system [13]. In order to analyze the cellular response to different induction strategies on the transcription level two independent DNA microarray experiments were performed. A dye-swap design was used and the cells in the non-induced state of each experiment were compared to samples taken after induction. The two experiments are available at ArrayExpress (http://www.ebi.ac.uk/microarray-as/ae/). The experiment with the fully induced E. coli expression system has accession number E-MARS-16 and the experiment with the partially induced system has accession number E-MARS-17. For standard low-level analysis the data were preprocessed using print-tip loess normalization. Differential expression estimates were calculated using Bioconductor ([14], http://www.bioconductor.org) package limma [15]. The two data sets were filtered by selecting genes with a p-value of the corresponding F-statistic smaller than 0.05. Additionally, only genes expressed at a certain level (average log intensity A larger than 8) and genes with a clearly defined pattern (absolute log-ratio M larger than 1.5 at at least one time point) were used. After filtering, the data acquired from the experiment with a fully induced E. coli expression system consist of 733 genes and the data acquired from the process with limited induction consist of 429 genes.
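The three filter criteria can be written as a single predicate per gene. The sketch below is an illustration only: the function name and argument names are hypothetical, and the log-ratio criterion is read as an absolute log-ratio above 1.5 at one or more time points.

```python
def keep_gene(p_value, avg_log_intensity, log_ratios,
              p_max=0.05, a_min=8.0, m_min=1.5):
    """Return True if a gene passes all three filter criteria."""
    significant = p_value < p_max                        # F-statistic p-value
    expressed = avg_log_intensity > a_min                # average log intensity A
    patterned = any(abs(m) > m_min for m in log_ratios)  # log-ratio M over time
    return significant and expressed and patterned

# A gene that is significant, expressed and clearly downregulated at one time point.
passes = keep_gene(0.01, 9.2, [0.3, -1.8, 0.9])
```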
For the functional relevance test another E. coli experiment was used where various mutants were investigated under oxygen deprivation [16]. The mutants were designed to monitor the response from E. coli during an oxygen shift in order to target the a priori most relevant part of the transcriptional network by using six strains with knockouts of key transcriptional regulators in the oxygen response. These experiments provide expression profiles for 4205 genes derived from the original data set downloaded from the Gene Expression Omnibus [17] with accession GDS680 by applying the altering steps described in [18].
Functional grouping
Cluster analysis is used to find groups of coregulated genes in the microarray data without prior knowledge about the gene functions. However, by clustering expression profiles of coexpressed genes groups of genes with similar function are often found.
The annotation of genes to categories or classes is a very important aspect in the analysis of gene expression data. The genes can for example be mapped to functional groups like Gene Ontology (GO, [9]) classifications or to protein complexes. Gene functions are very complex, therefore genes are usually mapped to multiple classes. In any case the mapping is known a priori and does not depend on the data of the currently investigated experiment.
External information about the annotation of genes to functional groups can easily be included in the neighborhood graph, e.g., the accumulation of gene ontology (GO) classifications in certain gene clusters can be highlighted in the node representation. In microarray data analysis gene ontology classifications about Biological Process, Molecular Function and Cellular Component are typically investigated. In this study experimental data from E. coli is used where further sources of external knowledge are the GenProtEC ([19], http://genprotec.mbl.edu/) classification system for cellular and physiological roles of E. coli gene products and the RegulonDB ([20], http://regulondb.ccg.unam.mx/) for detailed information about operons and regulons.
Software and implementation
All cluster algorithms and visualization methods used are implemented in the statistical computing environment R [21]. R package flexclust [7] is a flexible toolbox to investigate the influence of distance measures and cluster algorithms. It contains extensible implementations of the K-centroids and QTClust algorithm and offers the possibility to try out a variety of distance or similarity measures as cluster algorithms are treated separately from distance measures. New distance measures and centroid computations can easily be incorporated into cluster procedures. The default plotting method for cluster solutions in flexclust is the neighborhood graph.
A linear projection of the data into 2 dimensions using for example linear discriminant analysis (LDA) has the advantage that the lengths of edges in the graph are directly interpretable. However, LDA does not scale well in the number of clusters, and relationships between the centroids of more than 15 clusters can hardly be displayed in the plane. As shown in [22] linear methods cannot be used for high-dimensional gene expression data and a large number of clusters. R package gcExplorer [8] uses nonlinear layout algorithms implemented in the open source graph visualization software Graphviz (http://www.graphviz.org/) for the display of neighborhood graphs. Bioconductor packages graph and Rgraphviz [23] provide tools for creating, manipulating, and visualizing graphs in R as well as an interface to Graphviz. Rgraphviz returns the layout information for a graph object, i.e., the x- and y-coordinates of the graph's nodes as well as the parameterization of the trajectories of the edges. Several layout algorithms can be chosen:
dot: hierarchical layout algorithm for directed graphs
neato and fdp: layout algorithms for large undirected graphs
twopi: radial layout
circo: circular layout
The default layout algorithm in gcExplorer is "dot". Even though distances between nodes and length of edges are no longer interpretable when using nonlinear layout algorithms the increase in readability and clear arrangement is obvious.
The latest release of gcExplorer is always available at the Comprehensive R Archive Network CRAN (http://cran.R-project.org/package=gcExplorer). Details on how to use gcExplorer can be found in the online appendix [see Additional file 1 for the vignette and Additional file 2 for the corresponding R code].
Results and Discussion
Exploratory analysis
Now the PS19 data is used to demonstrate the new functionality of gcExplorer. The data is clustered using stochastic QTClust [24] yielding a cluster object which consists of 14 clusters.
Color coding of nodes
In the graph shown above one single kind of node symbol is used for all nodes. This way no information about the different clusters is revealed. There are several ways to include additional information in the representation of nodes. The simplest method is to use color coding, e.g., to color nodes by size or tightness of the corresponding clusters. In this case the color of a node depends on the distribution of a certain property over all nodes, where the maximum gets the darkest and the minimum gets the brightest color. Usually the smaller or tighter clusters are more interesting and can more easily be explored. The percentage of genes in a cluster assigned to a functional group under investigation can also be used for color coding. The visualization of functional groups in the graph is not only a validation of the cluster method. It is also a very helpful tool for practitioners to quickly find subgroups of genes related to specific functions under study.
In panel (d) the GO term "flagellar motility" is shown, which is part of the biological process classification. Flagellar motility is an example of a functional group where the corresponding genes have similar expression profiles and are therefore grouped into similar clusters (i.e., clusters 11, 3 and 14) which are connected by edges in the neighborhood graph. In the case of σ_{32}-regulated genes (panel (c)) there is no clear relationship between the cluster solution and the functional group as the corresponding genes are located in various clusters.
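The mapping from a node property to a color can be sketched as a linear rescaling to a gray level. The function below is a hypothetical illustration, not the color scheme used by gcExplorer: it returns 1.0 (brightest) for the minimum and 0.0 (darkest) for the maximum of the property.

```python
def node_shades(values):
    """Map a property per node to a gray level in [0, 1] (0 = darkest)."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    if span == 0:
        return {node: 0.5 for node in values}   # all nodes equal: mid gray
    return {node: 1.0 - (v - lo) / span for node, v in values.items()}

# Example: color three clusters by their size.
shades = node_shades({1: 302, 3: 41, 10: 10})
```

The same function can be applied to cluster tightness or to the percentage of genes assigned to a functional group.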
Node symbols
The second option for adding further information to the display of the neighborhood graph is to use different graphical symbols for the representation of nodes. For that purpose gcExplorer makes use of R package symbols ([25], http://r-forge.r-project.org/projects/symbols). symbols is based on Grid [26], a very flexible graphics system for R. Grid features viewports, i.e., rectangular areas allowing the creation of plotting regions all over the R graphic device. Due to the layout algorithms used in gcExplorer nodes remain quite large, allowing large viewports for the visualization of nodes. Several grid-based functions are implemented in package symbols which can directly be used as node functions in gcExplorer.
Directed vs. undirected graph
The neighborhood graph is a directed graph as the similarity of cluster 1 to cluster 4 is in general different from the similarity of cluster 4 to cluster 1, and likewise for every other pair of clusters. Besides plotting the original directed graph there are several options for plotting edges, taking into account for instance the mean, minimum or maximum of the two similarities between a pair of clusters. In practice the mean similarity is frequently used, especially when testing the functional relationship between clusters (an example is given below).
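Collapsing the directed similarities into undirected edge weights can be sketched as follows. The function name is hypothetical and the rule that an undirected edge exists whenever one or both directed similarities are positive is an assumption made for this illustration.

```python
def undirected_weights(s, how="mean"):
    """Combine s[i][j] and s[j][i] into one weight per unordered cluster pair."""
    combine = {
        "mean": lambda a, b: (a + b) / 2.0,
        "min": min,
        "max": max,
    }[how]
    k = len(s)
    return {(i, j): combine(s[i][j], s[j][i])
            for i in range(k) for j in range(i + 1, k)
            if s[i][j] > 0 or s[j][i] > 0}

# Directed similarities between three clusters (rows: from, columns: to).
s_directed = [[0.0, 0.25, 0.0],
              [0.5, 0.0, 0.1],
              [0.0, 0.0, 0.0]]
w_mean = undirected_weights(s_directed, "mean")
w_min = undirected_weights(s_directed, "min")
w_max = undirected_weights(s_directed, "max")
```

With the minimum, an edge that exists in only one direction receives weight zero, so the choice of summary determines how strictly one-sided neighborhoods are treated.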
Graph modifications
The nonlinear layout algorithms implemented in Graphviz are optimized for the given set of nodes and edges. Removing an edge or a node will result in a different layout, which makes comparisons between graphs rather complicated. R package gcExplorer contains the function gcModify, which allows a given graph to be modified without changing the original layout. There are several ways to modify a given graph. However, it is only possible to remove nodes and edges from a larger graph. Adding new nodes and edges is not allowed. The node symbols are independent of the graph structure, so different node functions can be used in each modified graph.
Comparisons of different cutoff values as shown in Figure 6 are only possible when starting with the largest set of edges.
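The idea of modifying a graph while keeping its layout can be sketched with a plain data structure that stores the node coordinates once. This is not the interface of gcModify: the dictionary layout and the function modify_graph are hypothetical and only illustrate why one has to start from the largest set of edges.

```python
def modify_graph(graph, remove_nodes=(), cutoff=None):
    """Return a reduced copy of graph; node coordinates are never recomputed."""
    drop = set(remove_nodes)
    unknown = drop - set(graph["layout"])
    if unknown:
        raise ValueError("nodes not in the original graph: %s" % sorted(unknown))
    layout = {n: xy for n, xy in graph["layout"].items() if n not in drop}
    edges = {(i, j): w for (i, j), w in graph["edges"].items()
             if i not in drop and j not in drop
             and (cutoff is None or w >= cutoff)}
    return {"layout": layout, "edges": edges}

# Layout computed once for the largest graph (coordinates are made up).
full = {
    "layout": {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (0.5, 1.0)},
    "edges": {(1, 2): 0.4, (2, 3): 0.1, (1, 3): 0.25},
}
pruned = modify_graph(full, cutoff=0.2)        # drop weak edges
smaller = modify_graph(full, remove_nodes=[3]) # drop a node and its edges
```

Because edges can only be removed, graphs for increasing cutoff values are nested subgraphs of the full graph and share identical node positions.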
Inferential analysis
Compare cluster solutions
Result of comp_test. Judge the validity of the PS19 cluster solution for the PS17 data using the comp_test.
Cluster | Size | Obs. av. dist. | 5% quantile perm. | p-value (lower)
1 | 302 | 0.58 | 0.95 | 0.00
2 | 299 | 0.55 | 0.94 | 0.00
3 | 41 | 0.65 | 0.83 | 0.00
4 | 59 | 0.62 | 0.85 | 0.00
5 | 52 | 0.73 | 0.84 | 0.00
6 | 31 | 0.61 | 0.79 | 0.00
7 | 30 | 0.66 | 0.78 | 0.00
8 | 26 | 0.82 | 0.77 | 0.10
9 | 14 | 0.52 | 0.68 | 0.00
10 | 10 | 0.38 | 0.62 | 0.00
11 | 10 | 0.70 | 0.63 | 0.12
12 | 5 | 0.49 | 0.45 | 0.07
13 | 12 | 0.96 | 0.66 | 0.53
14 | 10 | 0.62 | 0.63 | 0.04
Functional relevance test
Result of functional relevance test.
Edge | Cl. size 1 | Cl. size 2 | Diff. in prop. | p-value
1~2 | 671 | 526 | 0.02 | 1.00
1~3 | 671 | 424 | 0.01 | 1.00
4~6 | 378 | 209 | 0.02 | 1.00
2~7 | 526 | 121 | 0.01 | 1.00
4~7 | 378 | 121 | 0.02 | 1.00
6~8 | 209 | 108 | 0.01 | 1.00
4~12 | 378 | 16 | 0.11 | 0.59
1~14 | 671 | 33 | 0.14 | 0.51
2~14 | 526 | 33 | 0.16 | 0.50
1~16 | 671 | 13 | 0.11 | 0.59
3~16 | 424 | 13 | 0.12 | 0.57
1~21 | 671 | 9 | 0.40 | 0.00
3~21 | 424 | 9 | 0.41 | 0.00
14~21 | 33 | 9 | 0.26 | 0.05
14~22 | 33 | 12 | 0.48 | 0.00
21~22 | 9 | 12 | 0.22 | 0.13
4~25 | 378 | 10 | 0.19 | 0.29
6~25 | 209 | 10 | 0.17 | 0.34
12~25 | 16 | 10 | 0.08 | 0.93
2~32 | 526 | 11 | 0.34 | 0.01
7~32 | 121 | 11 | 0.33 | 0.03
12~32 | 16 | 11 | 0.24 | 0.05
22~32 | 12 | 11 | 0.30 | 0.03
3~34 | 424 | 6 | 0.30 | 0.03
5~34 | 263 | 6 | 0.33 | 0.03
21~34 | 9 | 6 | 0.11 | 0.77
2~35 | 526 | 17 | 0.09 | 0.81
21~36 | 9 | 5 | 0.04 | 1.00
34~36 | 6 | 5 | 0.07 | 0.94
22~43 | 12 | 9 | 0.44 | 0.00
32~43 | 11 | 9 | 0.14 | 0.51
36~43 | 5 | 9 | 0.18 | 0.33
Power simulations for the functional relevance test
The power of the functional relevance test is simulated on artificial cluster solutions. For defined
data size
number of clusters
difference in proportions between cluster 1 and 2
proportion of grouped genes in cluster 1
proportion of grouped genes in the total data set
Power simulations for the functional relevance test.
Data size | prop. c1 | prop. all | d = 0.05 | d = 0.1 | d = 0.15 | d = 0.2 | d = 0.25 | d = 0.3 | d = 0.35 | d = 0.4
100 | 0.50 | 0.50 | 0 | 0.000 | 0.000 | 0.000 | 0.004 | 0.043 | 0.062 | 0.108
100 | 0.50 | 0.33 | 0 | 0.000 | 0.000 | 0.000 | 0.010 | 0.044 | 0.095 | 0.179
100 | 0.50 | 0.25 | 0 | 0.000 | 0.000 | 0.000 | 0.011 | 0.074 | 0.129 | 0.229
100 | 0.50 | 0.20 | 0 | 0.000 | 0.000 | 0.001 | 0.018 | 0.078 | 0.186 | 0.300
100 | 0.33 | 0.50 | 0 | 0.000 | 0.001 | 0.005 | 0.033 | 0.051 | 0.033 | 0.029
100 | 0.33 | 0.33 | 0 | 0.000 | 0.000 | 0.006 | 0.035 | 0.068 | 0.071 | 0.044
100 | 0.33 | 0.25 | 0 | 0.000 | 0.000 | 0.013 | 0.049 | 0.065 | 0.074 | 0.062
100 | 0.33 | 0.20 | 0 | 0.000 | 0.001 | 0.020 | 0.064 | 0.087 | 0.088 | 0.080
500 | 0.50 | 0.50 | 0 | 0.000 | 0.010 | 0.084 | 0.276 | 0.653 | 0.999 | 1.000
500 | 0.50 | 0.33 | 0 | 0.000 | 0.015 | 0.137 | 0.442 | 0.918 | 1.000 | 1.000
500 | 0.50 | 0.25 | 0 | 0.000 | 0.010 | 0.180 | 0.606 | 0.996 | 1.000 | 1.000
500 | 0.50 | 0.20 | 0 | 0.000 | 0.025 | 0.248 | 0.700 | 1.000 | 1.000 | 1.000
500 | 0.33 | 0.50 | 0 | 0.001 | 0.026 | 0.159 | 0.384 | 0.747 | 0.764 | 0.450
500 | 0.33 | 0.33 | 0 | 0.001 | 0.069 | 0.242 | 0.551 | 0.978 | 0.889 | 0.669
500 | 0.33 | 0.25 | 0 | 0.002 | 0.074 | 0.301 | 0.733 | 1.000 | 0.909 | 0.905
500 | 0.33 | 0.20 | 0 | 0.000 | 0.098 | 0.414 | 0.903 | 1.000 | 0.935 | 0.976
Conclusion
Clustering gene expression profiles is a helpful tool for finding biologically meaningful groups of genes without prior information from databases. As the definition of gene clusters is not very clear and genetic interactions are extremely complex, the relationship between clusters is very important and coexpressed genes can end up in different clusters. In order to make cluster analysis useful for practitioners the interactive visualization tool gcExplorer was developed. It not only visualizes the cluster structure in the form of neighborhood graphs; in addition, the gene clusters can be plotted or shown in HTML tables with links to databases. In this paper recent extensions of the package were presented, including different node representations using node coloring and the choice of node symbols. Additional properties of the clusters like cluster size or cluster tightness can be highlighted, as well as external information like functional grouping. Graphs can be modified by removing nodes and edges or by zooming into a subgraph of interest. Further, the functional relevance of a clustering can be tested using external information about gene function from databases. Finally, the validity of a cluster solution can be judged based on its performance on another data set where the same set of genes is investigated under different experimental conditions.
Availability and requirements
Project name: gcExplorer; Project home page: http://cran.R-project.org/package=gcExplorer. Operating system(s): A wide variety of UNIX platforms, Windows and MacOS. Programming language: R; License: GPL-2.
The gcExplorer package and its associated packages are part of the R/Bioconductor project, an environment for statistical computing and bioinformatics. The R software environment is freely available at http://www.r-project.org. The dependencies flexclust and Rgraphviz can be downloaded from CRAN (http://cran.r-project.org) and the Bioconductor project website (http://bioconductor.org).
Declarations
Acknowledgements
This work was supported by the Austrian K_{ind}/K_{net} Center of Biopharmaceutical Technology (ACBT).
The authors would like to thank Gerald Striedner and Karoline Marisch for software testing and valuable feedback.
Authors’ Affiliations
References
 1. Sheng Q, Moreau Y, Smet FD, Marchal K, Moor BD: Advances in Cluster Analysis of Microarray Data. In Data Analysis and Visualization in Genomics and Proteomics. Edited by: Azuaje F, Dopazo J. John Wiley & Sons, Ltd; 2005. ISBN 0-470-09439-7
 2. Androulakis I, Yang E, Almon R: Analysis of Time-Series Gene Expression Data: Methods, Challenges, and Opportunities. Annual Review of Biomedical Engineering 2007, 9: 205–228. 10.1146/annurev.bioeng.9.060906.151904
 3. Kerr G, Ruskin HJ, Crane M, Doolan P: Techniques for clustering gene expression data. Comput Biol Med 2008, 38(3):283–293. 10.1016/j.compbiomed.2007.11.001
 4. Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 1998, 95: 14863–14868. 10.1073/pnas.95.25.14863
 5. Rousseeuw P: Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 1987, 20: 53–65. 10.1016/0377-0427(87)90125-7
 6. Martinetz T, Schulten K: Topology representing networks. Neural Networks 1994, 7(3):507–522. 10.1016/0893-6080(94)90109-0
 7. Leisch F: A Toolbox for K-Centroids Cluster Analysis. Computational Statistics and Data Analysis 2006, 51(2):526–544. 10.1016/j.csda.2005.10.006
 8. Scharl T, Leisch F: gcExplorer: Interactive Exploration of Gene Clusters. Bioinformatics 2009, 25(8):1089–1090. 10.1093/bioinformatics/btp099
 9. The Gene Ontology Consortium: Gene Ontology: tool for the unification of biology. Nature Genetics 2000, 25: 25–29. 10.1038/75556
 10. Heyer LJ, Kruglyak S, Yooseph S: Exploring Expression Data: Identification and Analysis of Coexpressed Genes. Genome Research 1999, 9: 1106–1115. 10.1101/gr.9.11.1106
 11. Zeileis A, Meyer D, Hornik K: Residual-Based Shadings for Visualizing (Conditional) Independence. Journal of Computational and Graphical Statistics 2007, 16(3):507–525. 10.1198/106186007X237856
 12. Scharl T, Striedner G, Pötschacher F, Leisch F, Bayer K: Interactive visualization of clusters in microarray data: an efficient tool for improved metabolic analysis of E. coli. Microbial Cell Factories 2009, 8: 37. 10.1186/1475-2859-8-37
 13. Striedner G, Cserjan-Puschmann M, Pötschacher F, Bayer K: Tuning the transcription rate of recombinant protein in strong Escherichia coli expression systems through repressor titration. Biotechnol Prog 2003, 19(5):1427–32. 10.1021/bp034050u
 14. Gentleman R, Carey VJ, Huber W, Irizarry RA, Dudoit S, (Eds): Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Statistics for Biology and Health, New York: Springer-Verlag; 2005. ISBN 978-0387-25146-2
 15. Smyth GK: Limma: linear models for microarray data. In Bioinformatics and Computational Biology Solutions Using R and Bioconductor, Statistics for Biology and Health. Edited by: Gentleman R, Carey VJ, Huber W, Irizarry RA, Dudoit S. New York: Springer-Verlag; 2005. ISBN 978-0387-25146-2
 16. Covert M, Knight E, Reed J, Herrgard M, Palsson B: Integrating high-throughput and computational data elucidates bacterial networks. Nature 2004, 429(6987):92–96. 10.1038/nature02456
 17. Barrett T, Troup D, Wilhite S, Ledoux P, Rudnev D, Evangelista C, Kim I, Soboleva A, Tomashevsky M, Edgar R: NCBI GEO: mining tens of millions of expression profiles - database and tools update. Nucleic Acids Res 2007, 35: D760–5. 10.1093/nar/gkl887
 18. Castelo R, Roverato A: Reverse engineering molecular regulatory networks from microarray data with qp-graphs. Journal of Computational Biology 2009, 16(2):213–227. 10.1089/cmb.2008.08TT
 19. Serres M, Goswami S, Riley M: GenProtEC: an updated and improved analysis of functions of Escherichia coli K-12 proteins. Nucleic Acids Res 2004, 32: D300–2. 10.1093/nar/gkh087
 20. Salgado H, Gama-Castro S, Peralta-Gil M, Diaz-Peredo E, Sanchez-Solano F, Santos-Zavaleta A, Martinez-Flores I, Jimenez-Jacinto V, Bonavides-Martinez C, Segura-Salazar J, Martinez-Antonio A, Collado-Vides J: RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions. Nucleic Acids Res 2006, (34 Database):D394–7. 10.1093/nar/gkj156
 21. R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria; 2009. ISBN 3-900051-07-0 [http://www.R-project.org]
 22. Scharl T, Leisch F: Visualizing Gene Clusters Using Neighborhood Graphs in R. In Proceedings of COMPSTAT, International Conference on Computational Statistics (Porto, Portugal, August 24th-29th 2008). Edited by: Brito P. Physica-Verlag; 2008:51–58.
 23. Carey VJ, Gentleman R, Huber W, Gentry J: Bioconductor Software for Graphs. In Bioinformatics and Computational Biology Solutions Using R and Bioconductor, Statistics for Biology and Health. Edited by: Gentleman R, Carey VJ, Huber W, Irizarry RA, Dudoit S. New York: Springer-Verlag; 2005. ISBN 978-0387-25146-2
 24. Scharl T, Leisch F: The stochastic QT-clust algorithm: evaluation of stability and variance on time-course microarray data. In Compstat 2006 - Proceedings in Computational Statistics. Edited by: Rizzi A, Vichi M. Physica Verlag, Heidelberg, Germany; 2006:1015–1022.
 25. Voglhuber I: Visualization of Centroid-Based Cluster Solutions. Vienna University of Technology, Austria; 2008. [Diploma Thesis]
 26. Murrell P: R Graphics. Chapman & Hall/CRC Computer Science & Data Analysis, Taylor & Francis, Inc; 2005.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.