GeneXplorer: an interactive web application for microarray data visualization and analysis
- Christian A Rees†1,
- Janos Demeter†2,
- John C Matese†1, 3,
- David Botstein1, 3 and
- Gavin Sherlock1
© Rees et al; licensee BioMed Central Ltd. 2004
Received: 21 May 2004
Accepted: 01 October 2004
Published: 01 October 2004
When publishing large-scale microarray datasets, it is of great value to create supplemental websites where either the full data, or selected subsets corresponding to figures within the paper, can be browsed. We set out to create a CGI application offering many of the features of existing standalone software for the visualization of clustered microarray data.
The software is released under the permissive MIT Open Source license, and the complete documentation and the entire source code are freely available for download from CPAN http://search.cpan.org/dist/Microarray-GeneXplorer/.
Microarray experiments produce vast amounts of data. The resulting datasets are highly complex, containing large matrices of expression measurements as well as sequence and experiment annotations that provide biological context. Organizing these different types of data in a way that allows intuitive exploration, and that helps reveal relationships within a given dataset, requires sophisticated visualization tools. Such tools benefit not only researchers analyzing, presenting, or publishing their own data, but also Model Organism Databases (MODs) that compile and display microarray data for a given model organism.
There are several excellent free tools that allow individual users to analyze their own data. These tools are either accessible on the web or can be downloaded and used on a desktop machine. Examples include the web-based tools EPCLUST [1], GEPAS [2, 3] and FGDP [4, 5], and the TMEV [6, 7] desktop tool from TIGR. However, once these tools have been used and a cluster or other group of genes has been selected, the resulting dataset needs to be made available to other people for browsing and exploration. A few free software tools can display such a static dataset, e.g. Michael Eisen's TreeView [8, 9], JavaTreeView [10], or the more recent MapleTree. All of these, however, are desktop tools that must themselves be downloaded and that work on locally stored datasets. The impetus for developing GeneXplorer was the desire to provide access to datasets via the Internet, without requiring users to download and install additional software. We developed GeneXplorer for use in the web supplements of microarray publications whose raw data are housed within the Stanford Microarray Database (SMD) [11, 12], and as a tool that lets SMD users browse their own data within SMD before publication. Using GeneXplorer, hierarchically clustered gene expression data can be viewed interactively with a web browser on any computer platform. GeneXplorer uses the widely accepted CDT file format [13] produced by several freely available clustering programs (e.g. [9, 14]), which between them have been downloaded several thousand times. GeneXplorer should therefore be widely usable by SMD and non-SMD users alike.
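As a concrete illustration of the CDT input, the following Python sketch reads such a file into gene annotations and an expression matrix. It assumes the common Cluster/TreeView layout (a header row of GID, an identifier column, NAME and GWEIGHT, followed by one column per experiment, with optional AID and EWEIGHT rows before the gene rows). The class and function names are hypothetical and are not part of the Microarray::GeneXplorer API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CdtData:
    """Annotations plus expression matrix read from one CDT file."""
    experiments: List[str]
    gene_ids: List[str] = field(default_factory=list)
    annotations: List[List[str]] = field(default_factory=list)
    values: List[List[Optional[float]]] = field(default_factory=list)


def read_cdt(text: str) -> CdtData:
    """Parse tab-delimited CDT text.

    Assumes the common Cluster/TreeView layout: a header row whose columns
    run GID, <identifier>, NAME, GWEIGHT, then one column per experiment,
    optionally followed by AID and EWEIGHT rows before the gene rows.
    """
    lines = [line.split("\t") for line in text.splitlines() if line.strip()]
    header = lines[0]
    first_exp = header.index("GWEIGHT") + 1  # experiment columns start here
    n_exp = len(header) - first_exp
    data = CdtData(experiments=header[first_exp:])
    for row in lines[1:]:
        if row[0] in ("AID", "EWEIGHT"):
            continue  # array-tree IDs and experiment weights, not gene rows
        cells = row[first_exp:]
        cells += [""] * (n_exp - len(cells))  # pad rows with trailing gaps
        data.gene_ids.append(row[0])
        data.annotations.append(row[1:first_exp - 1])  # identifier, NAME
        data.values.append([float(c) if c.strip() else None for c in cells])
    return data
```

Empty cells become `None`, so downstream code can skip missing measurements instead of treating them as zero.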
The Microarray::CdtDataset module has two essential functions. During dataset creation (see below), it decomposes the data file into its constituent parts and creates the files needed during data viewing. During data viewing, it provides the API for the viewer and supports searching and retrieval of the data. Under the current model, the dataset object itself is immutable. Microarray::CdtDataset was implemented as a client of the Microarray::DataMatrix module, which provides an API for accessing matrices of expression data. Some compromises had to be made in the design of the classes to accommodate the stateless client-server environment in which the program operates. Specifically, to allow rapid responses, pre-generated images and correlation data are cached in a compact format on the web server.
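The text does not specify the compact format used for the correlation cache. One plausible approach, sketched here purely as an assumption, stores each gene's correlated neighbors as fixed-width binary records, so that a stateless CGI request can read them without recomputing any correlations. The 6-byte layout (an unsigned 32-bit gene index plus the correlation scaled by 10,000 into a signed 16-bit integer) and the function names are hypothetical.

```python
import struct
from typing import List, Tuple

# Hypothetical record layout: little-endian uint32 gene index followed by
# an int16 holding the correlation multiplied by 10,000 (6 bytes in total).
RECORD = struct.Struct("<Ih")
SCALE = 10_000


def pack_neighbors(neighbors: List[Tuple[int, float]]) -> bytes:
    """Serialize (gene_index, r) pairs into fixed-width binary records."""
    return b"".join(RECORD.pack(i, round(r * SCALE)) for i, r in neighbors)


def unpack_neighbors(blob: bytes) -> List[Tuple[int, float]]:
    """Decode every record; r is recovered to within 0.5 / SCALE."""
    return [(i, q / SCALE) for i, q in RECORD.iter_unpack(blob)]


def neighbor_at(blob: bytes, k: int) -> Tuple[int, float]:
    """Fixed-width records allow direct access to the k-th neighbor."""
    i, q = RECORD.unpack_from(blob, k * RECORD.size)
    return i, q / SCALE
```

Quantizing r to four decimal places keeps each record small, and because every record has the same width, a request can seek straight to any neighbor.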
There are two stages required to publish a microarray dataset on the web using GeneXplorer. The first stage (executed only once per dataset) involves creation of all the necessary files for GeneXplorer to use. The second stage uses these files to produce the display using the GeneXplorer web front-end.
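Part of the first stage is producing the pre-generated images mentioned above. The text does not describe how they are drawn, so this sketch assumes the red/green convention of Eisen's TreeView display (red for positive log ratios, green for negative, black near zero), with grey for missing values. It writes a binary PPM image only because that format needs no external library; the `contrast` saturation parameter and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

RGB = Tuple[int, int, int]
MISSING: RGB = (128, 128, 128)  # grey for absent measurements (assumed)


def value_to_rgb(v: Optional[float], contrast: float = 2.0) -> RGB:
    """Red for positive, green for negative log ratios, black near zero.

    Intensity grows linearly with |v| and saturates at |v| >= contrast.
    """
    if v is None:
        return MISSING
    level = int(round(255 * min(abs(v) / contrast, 1.0)))
    return (level, 0, 0) if v > 0 else (0, level, 0)


def render_ppm(matrix: Sequence[Sequence[Optional[float]]],
               contrast: float = 2.0) -> bytes:
    """Render one pixel per measurement as a binary PPM (P6) image."""
    height = len(matrix)
    width = len(matrix[0]) if height else 0
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    pixels = bytes(channel
                   for row in matrix
                   for v in row
                   for channel in value_to_rgb(v, contrast))
    return header + pixels
```

Rendering once at dataset creation means each page view only has to serve a static file, which suits the stateless CGI model described above.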
Full text searching
The search box in the toolbar enables a string search of either all or specific gene annotation fields. The search string may contain more than one term, and each term must be at least two characters long. Spaces are interpreted as term separators, and the terms are combined using the logical 'AND' operator. Wildcard searches are allowed using the '*' character, provided that at least one character precedes the wildcard. The hits are displayed as expression patterns in the zoom frame, up to a maximum of 200 hits.
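The search rules above translate fairly directly into code. This Python sketch is not GeneXplorer's own implementation: it assumes case-insensitive substring matching, that each term may match in any of the searched fields, and that '*' matches any run of characters. The function names and the dictionary representation of gene annotations are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional

MAX_HITS = 200  # the zoom frame shows at most 200 hits


def compile_query(query: str) -> List[re.Pattern]:
    """Split on whitespace and turn each term into a case-insensitive regex."""
    terms = query.split()
    if not terms:
        raise ValueError("empty query")
    patterns = []
    for term in terms:
        if len(term) < 2:
            raise ValueError(f"term too short: {term!r}")
        if term.startswith("*"):
            raise ValueError(f"'*' must follow at least one character: {term!r}")
        # Escape literal text; each '*' becomes "any run of characters".
        regex = ".*".join(re.escape(part) for part in term.split("*"))
        patterns.append(re.compile(regex, re.IGNORECASE))
    return patterns


def search(genes: List[Dict[str, str]], query: str,
           fields: Optional[Iterable[str]] = None) -> List[int]:
    """Return indices of genes matching every term (logical AND).

    A term matches if it occurs anywhere in at least one searched field;
    with fields=None, all annotation fields of a gene are searched.
    """
    patterns = compile_query(query)
    selected = list(fields) if fields is not None else None
    hits: List[int] = []
    for idx, gene in enumerate(genes):
        keys = selected if selected is not None else list(gene)
        if all(any(p.search(gene.get(k, "")) for k in keys) for p in patterns):
            hits.append(idx)
            if len(hits) == MAX_HITS:
                break  # stop once the display limit is reached
    return hits
```

For example, `search(genes, "kinase map")` keeps only genes whose annotations contain both terms, mirroring the 'AND' behavior of the toolbar search.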
GeneXplorer allows configurable linking of gene annotations to external databases. There is no limit on the number of such links per gene, which makes it easy to look up a gene's information in several different databases. A configuration file in the dataset directory controls where the various gene identifiers are linked. Templates are available for various organisms, and the existing files can be edited manually if a link to a new database is desired. Because of limitations of the current CDT input file format, setting up the external database links may require manual editing at the time of dataset creation; this is fully described in the README document that is part of the distribution. The external database annotations cannot currently be updated automatically. We plan to address this by enabling GeneXplorer to read MAGE-ML (see future plans), which would allow such updates to be performed via web services.
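To illustrate template-based linking, the sketch below reads a hypothetical tab-delimited configuration (annotation column, database label, and a URL template containing an `{id}` placeholder) and builds the outgoing links for one gene. The file format, the placeholder syntax and the example.org URLs are assumptions made for illustration; the real configuration format is the one documented in the distribution's README.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote

LinkConfig = Dict[str, List[Tuple[str, str]]]


def parse_link_config(text: str) -> LinkConfig:
    """Read 'column<TAB>label<TAB>template' lines; '#' starts a comment line."""
    config: LinkConfig = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        column, label, template = line.split("\t")
        config.setdefault(column, []).append((label, template))
    return config


def links_for_gene(gene: Dict[str, str],
                   config: LinkConfig) -> List[Tuple[str, str]]:
    """Build (label, url) pairs for every configured column the gene has."""
    links = []
    for column, targets in config.items():
        value = gene.get(column, "").strip()
        if not value:
            continue  # no identifier in this column, so no link
        for label, template in targets:
            # Percent-encode the identifier so spaces or slashes stay safe.
            links.append((label, template.replace("{id}", quote(value, safe=""))))
    return links
```

Because any number of lines may name the same column, each gene can link to as many databases as the configuration lists.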
Installation and use
The GeneXplorer package is provided as a typical Perl distribution on the Comprehensive Perl Archive Network (CPAN) and follows the usual installation mantra for Perl modules. After unpacking the software, a user with administrative privileges merely needs to type:
perl Makefile.PL
make
make test
make install
This will install the libraries and the executable files that are needed for dataset creation by GeneXplorer into the regular system locations, unless otherwise specified during the first step above. The example in Figure 2 shows the file structure if the library and bin directories under the web server's root had been specified for installation of the libraries and executables respectively. To actually use the gx script, it must be copied into a cgi-bin directory, and the various html files must be copied to the appropriate location under the web server's root (see Figure 2).
Results and discussion
In addition to its use within SMD, GeneXplorer has been used in many publications to provide access to microarray datasets through their web supplements, which can be reached via SMD's publication page [17]; it was also used as the basis for visualizing fuzzy k-means cluster data [18]. We demonstrate how GeneXplorer works using an example dataset [19, 20]. Figure 3a shows this dataset displayed in the browser window. The whole dataset is displayed in the radar frame, and the zoom frame shows the selected section of this image, with the gene annotations at a readable size. Clicking on any of the hyperlinks in the zoom frame opens a new window displaying the biological information found in SOURCE [21] for the selected gene (Figure 3b). Searching the dataset for all genes whose name field contains the keyword 'kinase' produces the zoom frame shown in Figure 3c. This type of search allows comparison of the expression patterns of a subset of genes belonging to a functional category, e.g. GO process terms, if the annotation fields contain these terms. Clicking on one of the expression profiles (here, the one belonging to 'Estrogen Receptor 1') leads to the display in Figure 3d. The zoom frame now shows the expression profile of the selected gene as the top row, followed by all other expression profiles whose Pearson correlation with it is above 0.5. The length of the small orange bar to the right of each expression profile represents its correlation value graphically, and the actual value is displayed in the info box in the toolbar when the mouse is over the bar.
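The correlation view can be sketched in a few lines. The text specifies only that profiles with Pearson correlation above 0.5 are listed below the selected gene and that the orange bar's length reflects the correlation. This Python sketch additionally assumes ordinary (centred) Pearson correlation over pairwise non-missing values, a descending sort order, and a bar whose pixel width scales linearly with r; `bar_px` and the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Profile = Sequence[Optional[float]]


def pearson(x: Profile, y: Profile) -> Optional[float]:
    """Centred Pearson r over positions where both profiles have values."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None  # a flat profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def correlated_genes(matrix: Sequence[Profile], query: int,
                     threshold: float = 0.5,
                     bar_px: int = 40) -> List[Tuple[int, float, int]]:
    """List (row, r, bar_width) with the selected gene first, then every
    other gene whose r exceeds `threshold`, strongest correlation first."""
    hits = []
    for i, row in enumerate(matrix):
        if i == query:
            continue
        r = pearson(matrix[query], row)
        if r is not None and r > threshold:
            hits.append((i, r, round(r * bar_px)))  # bar length scales with r
    hits.sort(key=lambda h: h[1], reverse=True)
    return [(query, 1.0, bar_px)] + hits
```

In a deployment following the caching approach described earlier, these neighbor lists would be computed once at dataset creation rather than on every click.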
We are planning to develop GeneXplorer further so that it can handle other data formats. Specifically, we would like it to accept data files in MAGE-ML format [22], which is becoming a standard file format for communicating gene expression data. In addition, we would like it to display tree views of the clustered data and to allow zooming in on specific nodes of the cluster tree.
We have developed a web application, GeneXplorer, that allows microarray datasets to be visualized over the Internet using only a web browser. In our experience it has been extremely useful, serving both SMD users analyzing their own data and members of the public browsing published datasets.
Availability and requirements
GeneXplorer is available from CPAN [23] under the MIT Open Source license. It should work on any UNIX-type system capable of running Perl and a web server, although we ourselves have deployed it on Sun Solaris. Additional information on installation and usage is provided in the installation instructions and documentation that are part of the distribution.
List of abbreviations used
SMD: Stanford Microarray Database.
The authors would like to thank members of the Brown and Botstein laboratories for their feedback on GeneXplorer and for providing datasets for testing. Thanks also go to all the members of the Stanford Microarray Database group for stimulating discussions. This work was funded by a grant from the NHGRI, R01HG002732, to GS.
- EPCLUST - Clustering, visualization, and analysis[http://ep.ebi.ac.uk/EP/EPCLUST/]
- Herrero J, Al-Shahrour F, Diaz-Uriarte R, Mateos A, Vaquerizas JM, Santoyo J, Dopazo J: GEPAS: A web-based resource for microarray gene expression data analysis. Nucleic Acids Res 2003, 31: 3461–3467. 10.1093/nar/gkg591
- FGDP: Functional Genomics Data Pipeline[http://bioinformatics.fccc.edu/software/OpenSource/FGDP/FGDP.shtml]
- Grant JD, Somers LA, Zhang Y, Manion FJ, Bidaut G, Ochs MF: FGDP: functional genomics data pipeline for automated, multiple microarray data analyses. Bioinformatics 2004, 20: 282–283. 10.1093/bioinformatics/btg407
- MeV: MultiExperiment Viewer[http://www.tigr.org/software/tm4/mev.html]
- Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M, Sturn A, Snuffin M, Rezantsev A, Popov D, Ryltsov A, Kostukovich E, Borisovsky I, Liu Z, Vinsavich A, Trush V, Quackenbush J: TM4: a free, open-source system for microarray data management and analysis. Biotechniques 2003, 34: 374–378.
- Eisen Lab Software[http://rana.lbl.gov/EisenSoftware.htm]
- Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 1998, 95: 14863–14868. 10.1073/pnas.95.25.14863
- SourceForge.net: Project Info - Java Treeview[http://sourceforge.net/projects/jtreeview/]
- Gollub J, Ball CA, Binkley G, Demeter J, Finkelstein DB, Hebert JM, Hernandez-Boussard T, Jin H, Kaloper M, Matese JC, Schroeder M, Brown PO, Botstein D, Sherlock G: The Stanford Microarray Database: data access and quality assessment tools. Nucleic Acids Res 2003, 31: 94–96. 10.1093/nar/gkg078
- Sherlock G, Hernandez-Boussard T, Kasarskis A, Binkley G, Matese JC, Dwight SS, Kaloper M, Weng S, Jin H, Ball CA, Eisen MB, Spellman PT, Brown PO, Botstein D, Cherry JM: The Stanford Microarray Database. Nucleic Acids Res 2001, 29: 152–155. 10.1093/nar/29.1.152
- Stanford MicroArray Database File Format Help[http://smd.stanford.edu/help/formats.shtml]
- de Hoon MJ, Imoto S, Nolan J, Miyano S: Open source clustering software. Bioinformatics 2004, 20: 1453–1454. 10.1093/bioinformatics/bth078
- Krasner GE, Pope ST: A cookbook for using the model-view controller user interface paradigm in Smalltalk-80. Journal of Object Oriented Programming 1988, 1: 26–49.
- search.cpan.org: Lincoln D. Stein /GD[http://search.cpan.org/dist/GD/]
- SMD : List Data for publication[http://smd.stanford.edu/cgi-bin/tools/display/listMicroArrayData.pl?tableName=publication]
- Gasch AP, Eisen MB: Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biol 2002, 3: RESEARCH0059. 10.1186/gb-2002-3-11-research0059
- Prostate Cancer Molecular Subtypes[http://microarray-pubs.stanford.edu/cgi-bin/gx?n=prostate1&rx=5]
- Lapointe J, Li C, Higgins JP, van de Rijn M, Bair E, Montgomery K, Ferrari M, Egevad L, Rayford W, Bergerheim U, Ekman P, DeMarzo AM, Tibshirani R, Botstein D, Brown PO, Brooks JD, Pollack JR: Gene expression profiling identifies clinically relevant subtypes of prostate cancer. Proc Natl Acad Sci U S A 2004, 101: 811–816. 10.1073/pnas.0304146101
- Diehn M, Sherlock G, Binkley G, Jin H, Matese JC, Hernandez-Boussard T, Rees CA, Cherry JM, Botstein D, Brown PO, Alizadeh AA: SOURCE: a unified genomic resource of functional annotations, ontologies, and gene expression data. Nucleic Acids Res 2003, 31: 219–223. 10.1093/nar/gkg014
- Spellman PT, Miller M, Stewart J, Troup C, Sarkans U, Chervitz S, Bernhart D, Sherlock G, Ball C, Lepage M, Swiatek M, Marks WL, Goncalves J, Markel S, Iordan D, Shojatalab M, Pizarro A, White J, Hubley R, Deutsch E, Senger M, Aronow BJ, Robinson A, Bassett D, Stoeckert CJ Jr, Brazma A: Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol 2002, 3: RESEARCH0046. 10.1186/gb-2002-3-9-research0046
- search.cpan.org: Gavin Sherlock /Microarray-GeneXplorer[http://search.cpan.org/dist/Microarray-GeneXplorer/]
This article is published under license to BioMed Central Ltd. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.