Methods for simultaneously identifying coherent local clusters with smooth global patterns in gene expression profiles
 Yin-Jing Tien^{1},
 Yun-Shien Lee^{2, 3},
 Han-Ming Wu^{4} and
 Chun-Houh Chen^{5}
https://doi.org/10.1186/1471-2105-9-155
© Tien et al; licensee BioMed Central Ltd. 2008
Received: 05 September 2007
Accepted: 20 March 2008
Published: 20 March 2008
Abstract
Background
The hierarchical clustering tree (HCT) with a dendrogram [1] and the singular value decomposition (SVD) with a dimension-reduced representative map [2] are popular methods for two-way sorting of the gene-by-array matrix map employed in gene expression profiling. While HCT dendrograms tend to optimize local coherent clustering patterns, SVD leading eigenvectors usually identify better global grouping and transitional structures.
Results
This study proposes a flipping mechanism for a conventional agglomerative HCT using a rank-two ellipse (R2E, an improved SVD algorithm for sorting purposes) seriation by Chen [3] as an external reference. While HCTs always produce permutations with good local behaviour, the rank-two ellipse seriation gives the best global grouping patterns and smooth transitional trends. The resulting algorithm automatically integrates the desirable properties of each method so that users have access to a clustering and visualization environment for gene expression profiles that preserves coherent local clusters and identifies global grouping trends.
Conclusion
We demonstrate, through four examples, that the proposed method not only possesses better numerical and statistical properties but also provides more meaningful biomedical insights than other sorting algorithms. We suggest that sorted proximity matrices for genes and arrays, in addition to the gene-by-array expression matrix, can greatly aid in the search for a comprehensive understanding of gene expression structures. Software for the proposed methods can be obtained at http://gap.stat.sinica.edu.tw/Software/GAP.
Background
Matrix visualization [4], for example the Cluster and TreeView package [5], is an important exploratory data analysis tool in the study of microarray gene expression profiles. The visual patterns of genes (rows) and arrays (columns) in the permuted gene-by-array expression profile matrix are useful for clustering purposes. The hierarchical clustering tree and the singular value decomposition are two popular methods for identifying suitable gene/array permutations. This section briefly reviews the advantages and disadvantages of the two techniques using the fibroblast to serum gene expression data [1, 6].
Hierarchical clustering tree (HCT)
The branching structure of a dendrogram plays an important role in identifying permutations of genes and arrays by its arrangement of intermediate nodes. For a given HCT with n terminal nodes (genes or arrays), there are n - 1 intermediate nodes. Each of these intermediate nodes can be flipped independently, resulting in 2^{n-1} possible orderings of the terminal nodes from the same dendrogram built on the identical proximity matrix. Bar-Joseph et al. [7] discussed the flipping of HCT intermediate nodes in detail. The problem was first formulated by Gruvaeus and Wainer [8]: to order the leaves of a binary HCT when two ordered branches are merged, the new branch is formed by placing the similar endpoints of the joining branches adjacent to each other. Many different heuristic ordering methods [1, 9, 10] have also been suggested for solving this problem. Bar-Joseph et al. [7] presented a fast optimal leaf ordering for the hierarchical clustering algorithm that maximizes the sum of the similarities of adjacent leaves in the Travelling Salesman sense [11]; we refer to this approach as the optimal tree method. Bar-Joseph et al. [12] proposed a heuristic algorithm for constructing k-ary trees by extending and improving the optimal leaf ordering algorithm in [7].
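The count of 2^{n-1} orderings can be checked on a toy dendrogram. The following is a minimal sketch, assuming a tree is written as nested pairs (a hypothetical representation chosen for illustration, not the data structure of any package named above); the function name leaf_orders is likewise ours.

```python
from itertools import product

def leaf_orders(tree):
    """Yield every leaf ordering reachable by flipping intermediate nodes.

    A tree is either a leaf label (any non-tuple value) or a pair
    (left_subtree, right_subtree).
    """
    if not isinstance(tree, tuple):
        yield [tree]
        return
    left, right = tree
    for lo, ro in product(list(leaf_orders(left)), list(leaf_orders(right))):
        yield lo + ro   # intermediate node kept as drawn
        yield ro + lo   # intermediate node flipped

# Toy dendrogram with n = 5 leaves and therefore n - 1 = 4 intermediate nodes.
tree = ((("a", "b"), "c"), ("d", "e"))
orders = list(leaf_orders(tree))
print(len(orders), 2 ** (5 - 1))
```

Every ordering keeps the leaves of each subtree contiguous, which is why flipping changes the displayed permutation without changing the clustering itself.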
Singular value decomposition (SVD) and rank-two ellipse seriation (R2E)
For identifying smooth transitional expression patterns and more global grouping structures, researchers turn to dimension reduction techniques such as singular value decomposition [2, 13, 14]. Alter et al. [2] laid down the mathematics of SVD for analyzing gene expression profiles and proposed the concept of eigenarrays and eigengenes as representative linear combinations of original arrays and genes. They further suggested that one sort the arrays and genes according to their relative positions on the subspaces spanned by the two leading eigenarrays and eigengenes.
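As an illustration of sorting by position on a two-dimensional subspace, the sketch below orders the rows of a matrix by their angle in the plane of the two leading left singular vectors. Reading "relative positions" as an angle is our assumption, as are the function name svd2_order, the row centring and the toy data; it is not claimed to reproduce the exact procedure of [2].

```python
import numpy as np

def svd2_order(X):
    """Order the rows of X by their angle in the plane spanned by the
    two leading left singular vectors (an assumed reading of SVD sorting)."""
    Xc = X - X.mean(axis=1, keepdims=True)        # centre each row (gene)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    angles = np.arctan2(U[:, 1], U[:, 0])         # position on the 2-D subspace
    return np.argsort(angles)

# Toy data: 12 "genes" that peak at different phases of a single cycle,
# observed at 16 time points and presented in shuffled order.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
phases = rng.permutation(np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False))
X = np.cos(t[None, :] - phases[:, None])

order = svd2_order(X)
sorted_phases = phases[order]   # phases in the recovered row order
print(np.round(sorted_phases, 2))
```

For such noise-free cyclic data the recovered order follows the phases around the cycle, up to the starting point and the direction of travel, both of which are arbitrary in an angular sort.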
Chen [3] introduced a sorting algorithm called rank-two ellipse (R2E) seriation, which improves the SVD method by extracting the elliptical structure of the converging sequence of iteratively formed correlation matrices using the eigenvalue decomposition. Figure 1b displays the resulting matrix visualization of the human fibroblasts expression profile sorted by the R2E algorithm. We see that the R2E sorted correlation matrix identifies a very smooth transitional pattern. More advantages of the R2E method over the SVD method will be discussed in the Methods section.
The proposed rank-two ellipse seriation-guided hierarchical clustering tree (HCT_R2E)
We propose to guide the flipping mechanism of a conventional agglomerative HCT using the rank-two ellipse (R2E) seriation of Chen [3] as an external reference. The resulting algorithm automatically integrates the desirable properties of HCT and R2E so that users have access to a clustering and visualization environment for gene expression profiles that preserves coherent local clusters and identifies global grouping trends.
The R2E-guided HCT with the corresponding permuted matrices can be seen in Figure 1c. The permuted correlation and gene expression matrices in Figure 1c resemble the corresponding matrices in Figure 1b extremely well, meaning that the coherent local structure (clusters) identified by the HCT architecture and the smooth global transitional pattern explored by the R2E algorithm do not necessarily conflict with each other. An important note here is that the dendrogram (hierarchical tree) architecture (merging steps) in Figure 1c (with R2E guide) is identical to that of Figure 1a (without R2E guide). The only difference is the flipping of intermediate nodes.
Global trend and the Robinson matrix
In the anti-Robinson score, I is an indicator function that outputs 1 if the condition is satisfied. More general anti-Robinson scores, the generalized anti-Robinson (GAR) and relative generalized anti-Robinson (RGAR) scores, are defined in the Methods section.
Results
Three additional real data sets, together with the fibroblast to serum gene expression data, are analyzed to demonstrate the performance of the proposed method. The first is the annotated subset of the cell cycle data from [16]; the second is the severe acute respiratory syndrome coronavirus (SARS-CoV) data studied in [17]; the transition metal study in [18] is the final example. The same eight sorting algorithms (SVD with one eigenvector (SVD1), SVD with two eigenvectors (SVD2), self-organizing maps (SOM) [19], rank-two ellipse (R2E), HCT with random flips (HCT_RAM), optimal tree (HCT_OPT), SOM-guided tree (HCT_SOM), and R2E-guided tree (HCT_R2E)) are tested for all data sets. We only summarize the results of two HCT and two non-HCT algorithms: SVD2, R2E, HCT_OPT, and HCT_R2E. (Please see Additional file 1 for a detailed comparison of all eight sorting algorithms.)
Fibroblast to serum data
Results
The GAR curves (window size ranging from 1 to 516) for the four sorting algorithms plotted in Figure 3a lead to the following observations:

the R2E (smooth green line) clearly outperforms the other three methods, with the lowest GAR scores;

the HCT_OPT algorithm has poor global (large window size) performance;

the proposed HCT_R2E method outperforms HCT_OPT, and is nearly as good as the SVD2 algorithm in the global sense.
For a better comparison of the local behaviour of the four methods, we plot in Figure 3b the relative generalized anti-Robinson (RGAR) loss scores and observe the following:

both HCT algorithms (curves with dots) outperform the two non-HCT algorithms (smooth curves) in the small window size area (1 ≤ w ≤ 50);

the optimal hierarchical clustering tree, HCT_OPT, has the best performance among the four HCTs for the smallest window size area (1 ≤ w ≤ 35);

the proposed HCT_R2E method actually scores best over a short interval in the middle range (35 ≤ w ≤ 75);

the R2E algorithm dominates the competition from w = 100 on.
Without the visualization of the two smooth transitional patterns for up- and down-regulated genes in Figure 1b, the HCT in Figure 1a suggests many gene clusters with very coherent expression profiles but gives no indication of the possible embedded smooth transitional patterns. The proposed HCT_R2E method automatically integrates the coherent local property of HCT and the smooth global trend of R2E to give the improved Figure 1c. The expression profile and correlation matrices in Figure 1c let users explore, simultaneously, the local behaviour of genes that function closely together on a small time scale and the more complicated global relationships over larger time intervals in such a time series expression experiment.
Yeast cell cycle data
These data are a subset of the original 6240 genes expressed at 17 time points used in Cho et al. [16]. We selected the 145 genes that have been biologically characterized and assigned to five different cell cycle phases (early G1, late G1, S, G2, and M). Expression at one abnormal time point was removed from the data set (as suggested by [20]) resulting in our gene expression profile of 145 genes at 16 time points.
Results
Matching scores of the rearranged phase positions for the 145 genes sorted by the eight seriation methods relative to the known (annotated) phase positions.
Seriation method    Match     Weighted match    Total deviation

(a) SVD1            0.5103    0.6862            1584
(b) SVD2            0.4552    0.6483            1688
(c) SOM             0.6828    0.8068            907
(d) R2E             0.6759    0.8034            915
(e) HCT_RAM         0.5172    0.6759            1665
(f) HCT_OPT         0.6552    0.7966            1056
(g) HCT_SOM         0.5931    0.7379            1288
(h) HCT_R2E         0.7103    0.8241            818

SVD2 performed rather poorly;

HCT_OPT permutation showed better correlation to the known phases than SVD2;

HCT_R2E arranged the 145 genes at positions very close to their annotated phase positions.
Although the HCT_R2E algorithm aligned the 145 genes close to their known phases, several genes deviated far from their annotated cell cycle phases, as can be seen from the cell cycle diagram in Figure 4c. We further examined the phase annotations provided by another yeast cell cycle study, that of Spellman et al. [21]; the cross-annotated phase labels for both studies are listed in Additional file 2. The 15 genes with the largest deviations from their annotated phase groups, as sorted by the proposed HCT_R2E algorithm, are boldfaced. From the corresponding annotated phases of [21], in the last column, we see that the Spellman et al. [21] annotated phases for these 15 genes either fit better into the overall cell cycle pattern (e.g., YKL067W from S to G1, and YEL017W from early G1 to S/G2), or their phase conditions are not annotated (7 out of 15). This result further implies that the proposed algorithm can be applied either to verify known biological conditions or to explore unknown phenomena.
Severe acute respiratory syndrome coronavirus (SARS-CoV) data
In the severe acute respiratory syndrome (SARS) study of Lee et al. [17], the expression profiles of 52 signature genes are used to explore the between-sample severity pattern from normal controls to acute SARS patients. A Euclidean distance matrix among 55 samples (11 acute SARS (AS) patients, 33 recovering SARS (RS) patients, and 11 normal control (NC) subjects) using these 52 genes is computed to identify a potential order that could reflect the severity structure of the disease. There are three major differences between this SARS example and the yeast cell cycle data analysis: these are not time series gene expression data; the focus is on the between-sample structure instead of the gene set; and the proximity measure adopted is the between-sample Euclidean distance instead of the correlation coefficient.
Results

there is a clear unidimensional Robinson pattern for this SARS Euclidean matrix;

the HCT_OPT (Figure 5a) algorithm presented rather coherent local structure;

R2E (Figure 5b) sorted samples identify colour bands that exhibit a clear blue (NC) to yellow (RS) to red (AS) severity structure of the disease;

the Euclidean matrix sorted by the proposed HCT_R2E (Figure 5c) method displays very coherent local relationships, as well as extremely good global structure. Its identity colour band has a coherent pattern within each sample subtype.
We have summarized the numerical comparisons (GAR, RGAR) for the eight sorting algorithms in Additional file 1.
Correlations of the SARS severity ranks derived from the eight seriation methods with the number of days after the onset of disease and with the clinical pulmonary infection score (CPIS).
Seriation method    Pearson correlation (days)    Pearson correlation (CPIS)

(a) SVD1            0.6303                        0.5006
(b) SVD2            0.3276                        0.1873
(c) SOM             0.4028                        0.2925
(d) R2E             0.6497                        0.4890
(e) HCT_RAM         0.1551                        0.0230
(f) HCT_OPT         0.4249                        0.5006
(g) HCT_SOM         0.6468                        0.3151
(h) HCT_R2E         0.6693                        0.5116

the proposed HCT_R2E algorithm has the highest correlation with the number of days after the onset of disease, with the R2E method next;

the proposed HCT_R2E algorithm has the highest correlation with CPIS among all eight sorting methods, while the SVD1 and HCT_OPT algorithms share second place.
From these comparisons we observe a clear advantage of the proposed R2E-guided hierarchical clustering tree in searching for meaningful biomedical information and correlation, so that researchers can propose more precise hypotheses and conduct more accurate experiments.
Transition metal stress data
Kaur et al. [18] tried to reconstruct physiological behaviours of Halobacterium NRC-1, an archaeal halophile, in sublethal stress levels of six transition metals (Mn [II], Fe [II], Co [II], Ni [II], Cu [II], and Zn [II]). Halobacterium NRC-1 was exposed for five hours to at least three concentrations of each of the six transition metals. In Figure 5 of [18], an HCT and a correspondence analysis (CA, [22]) are carried out using 468 genes that changed significantly in at least two conditions out of a total of 19 (3 concentrations for each of the 6 transition metals, with an additional concentration for Fe [II]); we obtained only 444 genes using identical selection criteria. Their HCT permutation for the 19 metal conditions does not correlate well with the pattern displayed in their CA plot for the conditions. Our task here is to guide the flips of HCT intermediate nodes by the R2E algorithm with the hope that the resulting permutation does not contradict that of the CA analysis.
Results
This study illustrates well that the proposed HCT_R2E method is capable of providing permutations with both good global and local properties, although the optimal HCT still outputs better local orders numerically. The accompanying distance matrix map clearly indicates the Zn(0.005) and Cu(0.7) conditions, in addition to the Ni [II] conditions, deviate from the main linear trend of these transition metals and the Robinson pattern.
Discussion and Conclusion
When analyzing gene expression profile data sets, researchers usually apply a hierarchical clustering tree (HCT) to search for coherent local clusters and the singular value decomposition (SVD) to identify smooth global trends. Users of HCT dendrograms would identify only local clusters without knowing of the existence of global structure that might accompany cell cycle-regulated experiments, dosage level studies, or subtypes of tumours. Applications of SVD, on the other hand, may overlook the importance of local behaviour.
While the optimal HCT [7] always produces permutations with the best local behaviour, the rank-two ellipse seriation [3] gives the best global grouping patterns and smooth transitional trends. The proposed hierarchical clustering tree guided by rank-two ellipse seriation (HCT_R2E) nicely integrates these two extremes and provides users with both coherent local clusters and smooth global patterns for gene expression profile studies.
In four data analyses, the proposed HCT_R2E algorithm not only exhibits outstanding numerical (statistical) performance but also provides better insights into the biomedical information embedded in these high dimensional data structures. Visualization of sorted proximity matrices, in addition to the visualization of the expression profile matrices, also greatly enhances the overall comprehension of the association structures of arrays and genes.
Applicability and limitation
As illustrated by the two time series data sets, the proposed rank-two ellipse-guided hierarchical clustering (HCT_R2E) is very powerful in identifying smooth time series patterns. The SARS data and the transition metal data, on the other hand, show that the proposed method can also be used to search for potential global grouping structure for genes and for arrays embedded in the given gene expression profiles.
When the underlying clustering pattern is clearly disjoint, the rank-two ellipse seriation method is only capable of identifying the global between-cluster pattern, not the within-cluster relationship. The optimal tree method gives better permutations than the proposed method in such circumstances.
The R2E algorithm (and the HCT_R2E method) is computationally more time consuming than other methods. On a personal computer (Celeron(R) 3.2 GHz CPU with 512 MB RAM) running C++ on Windows XP, it takes about 0.09 sec, 9.09 sec, and 2.71 hr to obtain the R2E permutations for proximity matrices with 50, 500, and 5000 rows/columns, respectively. The computational complexity of R2E is of order n^{3}. The computing speed is much slower in the current Java version of the GAP package, although we are implementing a much faster algorithm now. We have also developed a prototype PC cluster system for performing the proposed methods on very large proximity matrices; it will be released after it has been fully tested.
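The reported run times can be compared with the stated n^{3} order by a short calculation. The sketch below uses only the three timings quoted above; the variable names are ours, and the power-law fit between consecutive sizes is a rough check rather than a complexity analysis.

```python
import math

sizes = [50, 500, 5000]                 # rows/columns of the proximity matrix
seconds = [0.09, 9.09, 2.71 * 3600]     # reported R2E run times, in seconds

# Empirical exponent p in "time ~ n**p" between consecutive problem sizes.
exponents = [
    math.log(seconds[i + 1] / seconds[i]) / math.log(sizes[i + 1] / sizes[i])
    for i in range(len(sizes) - 1)
]

# Run time at n = 5000 that exact cubic growth would predict from n = 500.
predicted_hours = seconds[1] * (sizes[2] / sizes[1]) ** 3 / 3600

print([round(p, 2) for p in exponents], round(predicted_hours, 3))
```

The exponent is close to 2 between n = 50 and n = 500 and close to 3 between n = 500 and n = 5000, so the cubic term appears to dominate only for the larger matrices; the cubic prediction of roughly 2.5 hr for n = 5000 is near the reported 2.71 hr.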
Methods
Various concepts have been proposed for rearranging objects in statistical graphs in order to display information structure more effectively. Chen [3] proposed the concept of "relativity of a statistical graph" for placing similar (different) objects at closer (distant) positions in a statistical graph. The local property optimized by the aforementioned HCT techniques realizes only half of the relativity concept when it places similar objects in closer proximity without the necessity of distancing distinct objects.
Rank-two ellipse seriation
Chen [3] introduced a sorting algorithm called rank-two ellipse (R2E) seriation that extracts the elliptical structure at the iteration where the converging sequence of iteratively formed correlation matrices reaches rank two. R2E improves SVD in identifying even smoother global permutations. Please see Additional file 1 for an illustration with the 517-gene example. The permuted expression profile matrix and the sorted gene-by-gene correlation matrix using R2E are displayed in Figure 1b. The sorted expression matrix displays a clear, smooth transitional two-component pattern.
There are two advantages of the R2E method over the SVD method in the sorting of arrays and genes in expression profile matrices. The first is that users do not need to choose the number of leading components; the R2E method always summarizes the embedded variation structure into the final two eigenvectors of the rank-two correlation matrix. With a unidimensional underlying structure, the two eigenvectors form a half-ellipse pattern for sorting purposes. The second advantage is that it can be applied to any given proximity matrix, be it a correlation, covariance, Euclidean distance, or other proximity matrix for genes and arrays.
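The idea of iterating correlation matrices until rank two and then reading an order off the two remaining eigenvectors can be sketched as follows. This is a rough illustration under our own assumptions, not the GAP implementation: the numerical rank test with tolerance tol, the rescaling of the two eigenvectors by the square roots of their eigenvalues, and the choice to open the closed curve at its widest angular gap are all hypothetical details, as is the function name r2e_order.

```python
import numpy as np

def r2e_order(P, tol=1e-8, max_iter=100):
    """Sketch of rank-two ellipse seriation for a proximity matrix P."""
    R = np.corrcoef(P)                        # first correlation matrix
    for _ in range(max_iter):
        w = np.linalg.eigvalsh(R)             # eigenvalues in ascending order
        if np.sum(w > tol * w[-1]) <= 2:      # numerical rank two reached
            break
        R = np.corrcoef(R)                    # correlation of correlations
    w, V = np.linalg.eigh(R)
    # Coordinates given by the two leading eigenvectors, rescaled so that
    # the ellipse they trace becomes a circle.
    x = np.sqrt(max(w[-1], 0.0)) * V[:, -1]
    y = np.sqrt(max(w[-2], 0.0)) * V[:, -2]
    angles = np.arctan2(y, x)
    order = np.argsort(angles)
    # Assumption: turn the circular order into a linear one by cutting
    # at the widest angular gap between neighbouring objects.
    a = angles[order]
    gaps = np.diff(np.append(a, a[0] + 2.0 * np.pi))
    cut = (int(np.argmax(gaps)) + 1) % len(order)
    return np.roll(order, -cut)

# Toy proximity matrix: distances between shuffled points on a line.
pts = np.array([0.0, 1.0, 2.5, 3.0, 4.5, 6.0, 7.0, 9.0])
perm = np.array([3, 0, 6, 1, 7, 4, 2, 5])
P = np.abs(pts[perm][:, None] - pts[perm][None, :])
order = r2e_order(P)
print(pts[perm][order])   # positions of the points in the proposed order
```

Because the input is only required to be a square proximity matrix, the same function accepts a distance matrix, as here, or a correlation matrix, which mirrors the second advantage described above.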
Proximity matrix visualization
Although both the dendrogram of an HCT and representative genes (arrays) of an SVD are generated from given proximity matrices, researchers usually do not pay much attention to the sorted proximity matrices.
Comparing the permuted gene-by-gene correlation matrices in Figures 1a and 1c, we see that the HCT forms many blocks along the main diagonal of the correlation matrix, while the rank-two method identifies two smooth transitional patterns for up- and down-regulated genes. Without the visualization of the correlation matrix in Figure 1a, HCT suggests many gene clusters with very coherent expression profiles, but with no knowledge of the possible embedded smooth transitional patterns. In light of both correlation matrices in Figures 1a and 1c, one can see that the gene clusters are actually formed only because of the constraints imposed by the HCT dendrogram branching structure; the within-cluster coherent expression profiles are correctly identified, but the between-cluster contrasting patterns may not be applicable.
In addition to the visualization of permuted expression profile matrices, we want to emphasize the importance of visualization of sorted proximity matrices for comparing the differences in permutations that result from various sorting algorithms.
Integration of local clustering patterns and global grouping structures
Local coherent gene clusters with very similar expression profiles may represent groups of genes that are coregulated by certain transcription factors or activated by identical binding sites. Global clustering patterns and smooth transitional trends, on the other hand, could signal some biological processes at a higher-level control, such as metabolite pathways or the cell-cycle operation. It is necessary to develop clustering and visualization methods that can simultaneously explore local behaviours as well as global grouping effects of gene expression profiles.
This study proposes to guide the flipping mechanism of a conventional agglomerative HCT with the rank-two ellipse (R2E) seriation as an external reference. The standard working procedure of the proposed algorithm for gene clustering is illustrated as steps 0 to 5 in Figure 7, using Figure 2 as an example. The same process can be applied for array grouping and sorting.
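One simple way to let an external reference decide the flips can be sketched as below. The rule used here, placing first the branch whose leaves have the smaller mean reference rank, is our assumption for illustration and may differ from the steps shown in Figure 7; the nested-pair tree format and the name guided_order are also hypothetical.

```python
def guided_order(tree, ref_rank):
    """Flip each intermediate node so that the branch with the smaller mean
    reference rank comes first; return (leaf order, mean reference rank).

    tree is a leaf label or a pair (left, right); ref_rank maps each leaf
    label to its position in the external reference ordering.
    """
    if not isinstance(tree, tuple):
        return [tree], float(ref_rank[tree])
    lo, lm = guided_order(tree[0], ref_rank)
    ro, rm = guided_order(tree[1], ref_rank)
    if lm > rm:                                # flip this intermediate node
        lo, ro, lm, rm = ro, lo, rm, lm
    merged = lo + ro
    mean = (lm * len(lo) + rm * len(ro)) / len(merged)
    return merged, mean

# The dendrogram fixes the clusters {a, c}, {d, e} and {b, d, e};
# the reference ordering (standing in for an R2E result) is a, b, c, d, e.
tree = (("c", "a"), (("e", "d"), "b"))
ref_rank = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}
order, _ = guided_order(tree, ref_rank)
print(order)   # ['a', 'c', 'b', 'd', 'e']
```

The result follows the reference as closely as the tree allows: every cluster of the dendrogram stays contiguous, so b cannot move between a and c even though the reference places it there.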
Generalized anti-Robinson criteria
In the generalized anti-Robinson (GAR) score, w is the window size defining the range of summation, and I is an indicator function that outputs 1 if the condition is satisfied. The window size is the number of columns (rows) from the diagonal of D that we consider in counting the anti-Robinson events. Small window sizes correspond to criteria that consider only local behaviours, and larger window sizes correspond to criteria for more global relationships between subjects.
The relative generalized anti-Robinson (RGAR) score ranges from 0 (no anti-Robinson events) to 1 (all anti-Robinson events). The RGAR curves have better resolution than the GAR curves in the small window size region for comparing the performance of algorithms.
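A windowed count of anti-Robinson events can be sketched as follows. The exact summation limits are our assumption, chosen to be consistent with the description above: for each row i of a distance matrix D, only entries within w columns of the diagonal are compared, an event is counted whenever a distance decreases while moving away from the diagonal, and the relative score divides the count by the number of comparisons made. The function name gar_scores is ours.

```python
import numpy as np

def gar_scores(D, w):
    """Return (GAR, RGAR) for a sorted distance matrix D and window size w."""
    n = D.shape[0]
    events = 0
    comparisons = 0
    for i in range(n):
        lo, hi = max(0, i - w), min(n - 1, i + w)
        # Left of the diagonal: j is farther from i than k is.
        for j in range(lo, i):
            for k in range(j + 1, i):
                comparisons += 1
                events += int(D[i, j] < D[i, k])
        # Right of the diagonal: k is farther from i than j is.
        for j in range(i + 1, hi + 1):
            for k in range(j + 1, hi + 1):
                comparisons += 1
                events += int(D[i, j] > D[i, k])
    rgar = events / comparisons if comparisons else 0.0
    return events, rgar

# Distances between points on a line: a perfect Robinson pattern when the
# points are in order, and a violated one after shuffling.
x = np.arange(6.0)
D = np.abs(x[:, None] - x[None, :])
gar_ok, rgar_ok = gar_scores(D, 5)
p = [0, 3, 1, 4, 2, 5]
gar_bad, rgar_bad = gar_scores(D[np.ix_(p, p)], 5)
print(gar_ok, rgar_ok, gar_bad, round(rgar_bad, 3))
```

Evaluating the function over a range of w values gives curves of the kind compared in Figure 3, with small w probing local order and large w probing the global trend.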
Availability and requirements
The rank-two ellipse (R2E) seriation and the R2E-guided hierarchical clustering tree methods are implemented in the GAP (generalized association plots) system.
Project name: HCTR2E
Project home page: http://gap.stat.sinica.edu.tw/Software/GAP
Operating systems: any OS that supports the Java environment
Programming language: Java
License: free
Declarations
Acknowledgements
The authors are grateful to Donald Ylvisaker, Konan Peck, Chiun-How Kao, and Sheng-Li Tzeng for valuable suggestions and assistance. The comments of two anonymous reviewers are gratefully acknowledged. This work was supported partially by the National Science Council of Taiwan, R. O. C. (NSC943112B001012Y) and the Genomics Research Center, Academia Sinica, Taiwan, R. O. C. (94B002).
References
1. Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. PNAS 1998, 95: 14863–14868. 10.1073/pnas.95.25.14863
2. Alter O, Brown PO, Botstein D: Singular value decomposition for genome-wide expression data processing and modeling. PNAS 2000, 97: 10101–10106. 10.1073/pnas.97.18.10101
3. Chen CH: Generalized association plots for information visualization: The applications of the convergence of iteratively formed correlation matrices. Statistica Sinica 2002, 12: 1–23.
4. Chen CH, Hwu HG, Jang WJ, Kao CH, Tien YJ, Tzeng S, Wu HM: Matrix visualization and information mining. In Proceedings of Computational Statistics. Physica-Verlag, Heidelberg; 2004:85–100.
5. Eisen MB: Cluster v. 2.11 and Treeview v. 1.5. [http://rana.lbl.gov/EisenSoftware.htm]
6. Iyer VR, Eisen MB, Ross DT, Schuler G, Moore T, Lee JCF, Trent JM, Staudt LM, Hudson J Jr, Boguski MS, Lashkari D, Shalon D, Botstein D, Brown PO: The Transcriptional Program in the Response of Human Fibroblasts to Serum. Science 1999, 283: 83–87. 10.1126/science.283.5398.83
7. Bar-Joseph Z, Gifford DK, Jaakkola TS: Fast optimal leaf ordering for hierarchical clustering. Bioinformatics 2001, 17 Suppl 1: S22–S29.
8. Gruvaeus G, Wainer H: Two additions to hierarchical cluster analysis. British Journal of Mathematical and Statistical Psychology 1972, 25: 200–206.
9. Degerman R: Ordered binary trees constructed through an application of Kendall's tau. Psychometrika 1982, 47: 523–527. 10.1007/BF02293713
10. Gale N, Halperin CW, Costanzo CM: Unclassed matrix shading and optimal ordering in hierarchical cluster analysis. J Classification 1984, 1: 75–92. 10.1007/BF01890117
11. Lawler EL, Lenstra JK, Rinnooy Kan AHG, Shmoys DB: The travelling salesman problem: A guided tour of combinatorial optimization. Wiley, Chichester; 1985.
12. Bar-Joseph Z, Demaine ED, Gifford DK, Srebro N, Hamel AM, Jaakkola TS: K-ary clustering with optimal leaf ordering for gene expression data. Bioinformatics, Special section on Microarray Analysis 2003, 19: 1070–1078.
13. Alter O, Brown PO, Botstein D: Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms. PNAS 2003, 100: 3351–3356. 10.1073/pnas.0530258100
14. Holter NS, Maritan A, Cieplak M, Fedoroff NV, Banavar JR: Dynamic modeling of gene expression data. PNAS 2001, 98: 1693–1698. 10.1073/pnas.98.4.1693
15. Robinson W: A method for chronologically ordering archaeological deposits. American Antiquity 1951, 16: 293–301. 10.2307/276978
16. Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, Davis RW: A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 1998, 2: 65–73. 10.1016/S1097-2765(00)80114-8
17. Lee YS, Chen CH, Chao A, Chen ES, Wei ML, Chen LK, Yang K, Lin MC, Wang YH, Liu JW, Eng HL, Chiang PC, Wu TS, Tsao KC, Huang CG, Tien YJ, Wang TH, Wang HS, Lee YS: Molecular signature of clinical severity in recovering patients with severe acute respiratory syndrome coronavirus (SARS-CoV). BMC Genomics 2005, 6: 132. 10.1186/1471-2164-6-132
18. Kaur A, Pan M, Seislin M, Facciotti MT, El-Gewely R, Baliga NS: A systems view of haloarchaeal strategies to withstand stress from transition metals. Genome Research 2006, 16(7):841–854. 10.1101/gr.5189606
19. Kohonen T: Self-Organizing Maps. Berlin: Springer-Verlag; 1995.
20. Tamayo P, Slonim J, Mesirov D, Zhu J, Kitareewan S, Dmitrovsky E, Lander E, Golub T: Interpreting patterns of gene expression with self-organizing maps: Methods and applications to hematopoietic differentiation. PNAS 1999, 96: 2907–2912. 10.1073/pnas.96.6.2907
21. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 1998, 9: 3273–3297.
22. Fellenberg K, Hauser NC, Brors B, Neutzner A, Hoheisel JD, Vingron M: Correspondence analysis applied to microarray data. PNAS 2001, 98: 10781–10786. 10.1073/pnas.181597298
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.