Caryoscope: An Open Source Java application for viewing microarray data in a genomic context
© Awad et al; licensee BioMed Central Ltd. 2004
Received: 12 May 2004
Accepted: 15 October 2004
Published: 15 October 2004
Microarray-based comparative genome hybridization experiments generate data that can be mapped onto the genome. These data are interpreted more easily when represented graphically in a genomic context.
We have developed Caryoscope, which is an open source Java application for visualizing microarray data from array comparative genome hybridization experiments in a genomic context. Caryoscope can read General Feature Format files (GFF files), as well as comma- and tab-delimited files, that define the genomic positions of the microarray reporters for which data are obtained. The microarray data can be browsed using an interactive, zoomable interface, which helps users identify regions of chromosomal deletion or amplification. The graphical representation of the data can be exported in a number of graphic formats, including publication-quality formats such as PostScript.
Caryoscope is a useful tool that can aid in the visualization, exploration and interpretation of microarray data in a genomic context.
The application of high-throughput technologies (such as DNA microarrays) to biomedical experimentation generates large quantities of data that can be difficult to browse and interpret in the absence of a graphical representation. Eisen et al. have previously displayed clustered microarray data using a false color representation that greatly aids in the intuitive interpretation of the data. However, when these data are from array comparative genome hybridization (arrayCGH) experiments, the genomic locations of the reporters (the molecules in each spot on a microarray) that were used to generate the data are important for interpretation. A relative increase or decrease in the ratios for a group of reporters that report on adjacent genomic locations may indicate amplification or deletion of that genomic region, respectively. Additionally, the analysis of expression data in the context of genomic position can also identify regions of amplification or deletion, or even cases of aneuploidy ([3, 4]).
Beyond viewing and browsing arrayCGH data, users also need the data to be readily connected to annotation sources, so that they can easily determine the identity and attributes of the gene represented by a reporter on a microarray, for instance one that shows evidence of amplification or deletion. For example, in arrayCGH experiments using tumor cells as the DNA source, there is an obvious value in rapidly determining whether a deleted region contains a tumor suppressor gene.
Finally, researchers frequently need to create figures for publications, communication with co-workers, supplemental websites, or presentations. Researchers should therefore be able to produce the visual representation of their data in a variety of graphic formats.
Caryoscope was originally implemented as a Web form, generating either a bitmap or a clickable PDF output. When this became an important day-to-day tool for our users ([5–9]), we created an improved, interactive version, consisting of a standalone application for analyzing arrayCGH data and an open architecture of re-usable classes that may be embedded by other developers in their own applications. In this paper, we focus on Caryoscope as an application.
Some other software packages were developed while this work was in progress, and can perform some of these functions. For instance, Genome2D is designed to display bacterial transcriptome data on linear chromosome maps, while SeeGH was designed for viewing arrayCGH data (only for 2-channel arrays). However, both of these programs are designed to run solely on the Windows operating system, whereas Caryoscope is a Java application that can be used on Macintosh OS X, Linux and various UNIX operating systems, as well as Windows. Greshock et al. have built similar functionality, called CGHAnalyzer, on TIGR's Multiple Experiment Viewer (MeV) platform, but with a different (circular) whole-genome view. Furthermore, Caryoscope can be run in a command line mode, making it easy to embed within a CGI or a processing pipeline.
We implemented Caryoscope in Java and deployed it as a Java Web Start application, so a user may run it directly from our website by clicking on a link. One can also install Caryoscope directly on a computer, but we recommend launching via the website in order to obtain the most current version of the software.
Caryoscope accepts data as text input files in simple formats so as to maximize interoperability with other systems.
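As an illustration of how simple such input can be, the following minimal sketch reads one tab-delimited GFF line into a small value object. It follows the standard GFF column order (seqname, source, feature, start, end, score, strand, frame, optional attributes). The class name GffFeature is hypothetical and is not part of Caryoscope, and the use of the score column as the measured data value is an assumption made only for this example.

```java
// Hypothetical holder for one reporter read from a GFF line; not a Caryoscope class.
final class GffFeature {
    final String seqname;     // chromosome or contig name (column 1)
    final int start;          // start coordinate (column 4)
    final int end;            // end coordinate (column 5)
    final double score;       // column 6; NaN when the file contains "."
    final String attributes;  // column 9, empty when absent

    GffFeature(String seqname, int start, int end, double score, String attributes) {
        this.seqname = seqname;
        this.start = start;
        this.end = end;
        this.score = score;
        this.attributes = attributes;
    }

    // Parse a single tab-delimited GFF line (comment and blank lines are not handled here).
    static GffFeature parse(String line) {
        String[] f = line.split("\t", -1);
        if (f.length < 8) {
            throw new IllegalArgumentException("expected at least 8 tab-separated fields");
        }
        double score = f[5].equals(".") ? Double.NaN : Double.parseDouble(f[5]);
        String attrs = f.length > 8 ? f[8] : "";
        return new GffFeature(f[0], Integer.parseInt(f[3]), Integer.parseInt(f[4]), score, attrs);
    }

    public static void main(String[] args) {
        // Illustrative line only; the values are made up for the example.
        GffFeature g = parse("chr1\tarray\treporter\t1000\t2000\t1.8\t+\t.\tID=clone42");
        System.out.println(g.seqname + ":" + g.start + "-" + g.end + " score=" + g.score);
    }
}
```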
Caryoscope provides several modes in which the user may view the data; these are controlled by the View modes toolbar (Figure 2). In the various panning and zooming modes, the user may change the view of the data to drill down to specific regions of interest. In Navigate mode, the user sees tooltips (small informational pop-up windows) that appear immediately when mousing over the features, and can navigate to related URLs by clicking on each feature. Typically, users at Stanford University (our primary source of testers and users) link GenBank accessions, associated with the cDNA clones that are on their microarrays, to SOURCE Gene Pages.
The user can enable two built-in computations on the data values: the logarithm of the values (to any base the user specifies) and a moving average of the values. Both computations are controlled from the Settings dialog (see Figure 2b). Users can perform other computations outside Caryoscope; this is facilitated by the fact that we support common spreadsheet-compatible file formats (TXT and CSV).
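The two computations can be sketched as follows. This is a minimal illustration rather than Caryoscope's own code: the class and method names are hypothetical, and the moving average shown here is a centered window that shrinks at the ends of the series, which is an assumption, since the text does not say how the window is defined.

```java
// Hypothetical helper class; the names and the window definition are assumptions.
final class ValueTransforms {

    // Logarithm of x to an arbitrary base, using the change-of-base identity.
    static double logBase(double x, double base) {
        return Math.log(x) / Math.log(base);
    }

    // Centered moving average over an odd-sized window.
    // Near the ends of the array the window is truncated to the values that exist.
    static double[] movingAverage(double[] values, int window) {
        if (window < 1 || window % 2 == 0) {
            throw new IllegalArgumentException("window must be a positive odd number");
        }
        int half = window / 2;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            int lo = Math.max(0, i - half);
            int hi = Math.min(values.length - 1, i + half);
            double sum = 0.0;
            for (int j = lo; j <= hi; j++) {
                sum += values[j];
            }
            out[i] = sum / (hi - lo + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] ratios = {1.0, 2.0, 4.0, 8.0, 4.0};  // made-up ratios
        double[] logs = new double[ratios.length];
        for (int i = 0; i < ratios.length; i++) {
            logs[i] = logBase(ratios[i], 2.0);
        }
        System.out.println(java.util.Arrays.toString(movingAverage(logs, 3)));
    }
}
```

Taking the logarithm before averaging, as in the usage above, treats a two-fold gain and a two-fold loss symmetrically; whether the application applies the computations in this order is not stated in the text.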
To prepare diagrams, the user can export the Caryoscope display to a variety of graphics formats via the Export dialog. Specifically, Caryoscope supports vector (e.g., PostScript and PDF) output for scalable publication-quality results, and raster (e.g., JPEG and PNG) output for ease of viewing, posting on supplemental websites, and inclusion in presentations.
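For readers who embed similar displays in their own Java programs, raster export can be sketched with the standard Java 2D and Image I/O classes. This is not the export code used by Caryoscope, which is not shown in this paper; the class name and the single red bar standing in for a data feature are purely illustrative.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical sketch of raster (PNG) export using only standard Java classes.
final class RasterExportSketch {

    // Draw a stand-in display into an off-screen image and encode it as PNG bytes.
    static byte[] renderPng(int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);           // background
            g.setColor(Color.RED);
            g.fillRect(10, height / 2, width / 3, 4);  // stand-in for one data bar
        } finally {
            g.dispose();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(image, "png", out)) {
                throw new IllegalStateException("no PNG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("PNG size in bytes: " + renderPng(300, 100).length);
    }
}
```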
A user may export graphics from Caryoscope via the command line mode, without having to invoke the interactive user interface. For example, to export a view of a dataset as a PDF file, the user could invoke Caryoscope as follows:
java -jar caryoscope-run.jar
All settings may be modified via the command line. The -listFormats option provides a list of available graphics output formats, and the -help option prints a brief summary of the options.
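A processing pipeline written in Java could launch the command line mode as a child process. The sketch below is one possible way to do so under stated assumptions: the jar name is taken from the command shown above, the only options used are -listFormats and -help because they are the only ones named in the text, and the class CaryoscopeLauncher is hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher for running Caryoscope's command line mode from a pipeline.
final class CaryoscopeLauncher {

    // Build the command "java -jar <jarPath> <options...>".
    static List<String> command(String jarPath, String... options) {
        List<String> cmd = new ArrayList<String>(Arrays.asList("java", "-jar", jarPath));
        cmd.addAll(Arrays.asList(options));
        return cmd;
    }

    // Run the command, forwarding its output to this process, and return the exit code.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        List<String> cmd = command("caryoscope-run.jar", "-listFormats");
        System.out.println(String.join(" ", cmd));
        // Call run(cmd) here on a machine where the jar file is actually present.
    }
}
```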
Comparison to other tools. Itemized comparison of the features of Caryoscope, CGHAnalyzer and SeeGH.
Java (runs on any popular platform)
Java (runs on any popular platform)
Unlimited use; no signup
Unlimited use with signup
Free with signup for academics
Install Java; run application directly from Web
Install Java; install application
Install MySQL; install application; configure database tables
Input file formats
Single input file; spreadsheet compatible text or General Feature Format (GFF)
Multi-component input; text-based input formats
Text-based input formats
Internal data format
Text file or database
Source code availability
Download with signup
Source code license
Internal code dependencies
All Open Source
Depends on some commercial software
Superimposed karyotype plot
Informational summary for each feature
Yes, configurable in application settings
Yes, fixed tabular format
Yes, fixed tabular format
Web link from each feature
Yes, configurable in application settings
Yes, fixed Ensembl or GoldenPath links
Data points in genome coordinates
Data points in linear order
Simultaneous display of multiple experiments
Part of a more complete analysis framework
Yes (TIGR MeV)
Built-in data point filtering
Concentric circles representing chromosomes
Generate graphics from command line
In addition to immediately executable copies, the complete source code for Caryoscope is available without limitations from our website, and is covered by a very liberal Open Source license (the MIT License). All external components used by Caryoscope are also Open Source.
We update Caryoscope frequently (approximately once every three weeks) and post news items on the website. We also send e-mail announcements to people who have requested them.
Caryoscope is useful for viewing both arrayCGH and expression data in the context of genomic position. It helps a biologist gain insight by providing a high-level view of a large amount of data at once, where patterns can be perceived at a glance.
A biologist studying amplifications or deletions in tumor cells may create and export graphics representing arrayCGH and expression data for the same cells using Caryoscope, and visually compare the two side-by-side. For instance, co-located regions that are amplified at the DNA level and over-expressed at the RNA level would provide excellent confirmation of the results. Using the zooming and panning features of Caryoscope, the biologist could focus on specific regions of interest.
In a gene expression study, a biologist may suspect that some expression patterns are correlated with genomic position. Caryoscope allows one to view expression data, either for the whole genome or on a region of interest, to help confirm or refute a hypothesis.
Finally, in all this work, the biologist may want to have quick and easy access to information about the genomic features displayed. As long as the information needed is available from the annotations that were saved in the input data file (or available at a URL that can be built based on the annotations), one can use the Feature URL expression and Feature tooltip expression (Figure 4) settings to provide immediate mouse-over feedback with this information – almost as if the application were customized for a specific field of interest.
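The idea of building a tooltip or URL from saved annotations can be illustrated with a small template expander. The expression syntax that Caryoscope actually uses is not described here, so the ${name} placeholder form, the class FeatureExpression, the annotation keys and the example.org address below are all hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical template expander; the ${key} syntax is an assumption for illustration.
final class FeatureExpression {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    // Replace each ${key} with the matching annotation value (empty if the key is missing).
    // When urlEncode is true, values are encoded so they are safe inside a query string.
    static String expand(String template, Map<String, String> annotations, boolean urlEncode) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = annotations.get(m.group(1));
            if (value == null) {
                value = "";
            }
            if (urlEncode) {
                value = encode(value);
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> annotations = new HashMap<String, String>();
        annotations.put("accession", "AA 123");  // made-up annotation values
        annotations.put("name", "TP53");
        System.out.println(expand("${name} (${accession})", annotations, false));
        System.out.println(expand("http://example.org/gene?acc=${accession}", annotations, true));
    }
}
```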
We intend Caryoscope to be a bench-top visualization tool that biologists can use immediately to get day-to-day work done, with a very low "cost of entry" for getting started. This led us to a number of design choices.
Caryoscope can be launched from our web site without a prior installation step. Since it is Open Source, anyone, including any for-profit organization, can use it without restrictions and without having to obtain a license or register for access.
The input file formats and output graphical formats we chose are all in common usage. In particular, the TXT and CSV input formats can be generated using any popular spreadsheet or database software, or even with a plain text editor, without having to do any programming.
Finally, we built Caryoscope to be content-neutral, with no hard-coded specificity to any research field. Thus, users of Caryoscope may control how annotations are treated as numerical data, and can "program" data-driven interactive behavior of the display (i.e., the tooltips and hyperlinks).
From the outset of this project, our biggest challenge has been how to accurately represent the huge amounts of data in a typical gene expression or arrayCGH experiment using the limited number of pixels available on the screen or on a printed page. If we skew our display algorithms too much towards producing a "sharp", high-contrast plot, we risk obscuring detail in the data and leading biologists to the wrong conclusions. On the other hand, since the size of the data elements, properly scaled from genomic coordinates to the display device, can be far less than the size of one pixel, we need a supportable way to "summarize" the data within each pixel and represent that summary as a single value: the color (including the brightness) of that pixel.
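To make the summarization problem concrete, the sketch below assigns each data point to the pixel that contains its genomic position and reports the mean value per pixel. This is only one candidate summary and is not the method used by Caryoscope, which relies on Java's anti-aliasing as described next; the class name, the use of the mean, and the 0-based positions are assumptions. Other summaries, such as the value of largest magnitude in each pixel, would preserve narrow peaks that a mean smooths away.

```java
// Hypothetical per-pixel summary; the choice of the mean is an assumption for illustration.
final class PixelSummary {

    // positions: 0-based genomic positions in [0, chromLength); values: one value per position.
    // Returns one mean per pixel; pixels that receive no data points are NaN.
    static double[] meanPerPixel(long[] positions, double[] values, long chromLength, int pixels) {
        if (positions.length != values.length) {
            throw new IllegalArgumentException("positions and values must have the same length");
        }
        double[] sum = new double[pixels];
        int[] count = new int[pixels];
        for (int i = 0; i < positions.length; i++) {
            // Integer arithmetic avoids rounding surprises at pixel boundaries.
            int px = (int) Math.min(pixels - 1, positions[i] * pixels / chromLength);
            sum[px] += values[i];
            count[px]++;
        }
        double[] out = new double[pixels];
        for (int p = 0; p < pixels; p++) {
            out[p] = count[p] == 0 ? Double.NaN : sum[p] / count[p];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] positions = {0L, 10L, 60L, 99L};  // made-up positions on a 100 bp "chromosome"
        double[] values = {1.0, 3.0, 5.0, 7.0};
        System.out.println(java.util.Arrays.toString(meanPerPixel(positions, values, 100L, 2)));
    }
}
```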
Modern computer graphics systems use a technique called "anti-aliasing" (Figure 5a) to render sub-pixel details with the illusion of smoothness. The Java subsystems we use in Caryoscope do this automatically and, in the current version of the software, we simply rely on them (Figure 5b). However, the anti-aliasing in Java is designed to display visually appealing text, lines and arcs, but not to ensure the most accurate possible on-screen rendering of scientific data. Specifically, at low magnification, the data almost disappear unless we force a minimum pixel size for each locus (Figure 5c).
We will develop our display methods further to ensure that we can provide an easy-to-read display while retaining the subtle variations in the data. Following the spirit of medical diagnostic imaging (DICOM), whereby incorrect details in a few pixels could lead to an incorrect conclusion, we must ensure that our displays, which are used for important research decisions, are never misleading.
One suggested solution is to display the data elements not aligned to the position of the loci along the chromosome, but rather in strict sequential order with a fixed width. While this solves the anti-aliasing problems, it eliminates consistent chromosomal positions and alignments of the data. Furthermore, it causes the appearance of the display to be dependent on the specific choice of clones – which can be another source of subtle variation when comparing multiple datasets. Another idea is to display dots, rather than horizontal bars, so that the "spread" of the data is more visible even if data points are super-imposed. We will investigate this for a future release of Caryoscope.
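The effect of forcing a minimum pixel size can be sketched as follows. The code scales a locus from genomic coordinates to pixel coordinates and widens it when the scaled width would otherwise fall below a chosen minimum. The class name, parameter names and clamping behavior at the right edge are assumptions made for this example and do not describe Caryoscope's internal implementation.

```java
// Hypothetical mapping from a genomic interval to a pixel span with a minimum width.
final class LocusExtent {

    // start, end: 0-based genomic interval [start, end) on a chromosome of chromLength bases.
    // pixels: number of pixels representing the whole chromosome.
    // Returns {x, width} in pixels, with width never smaller than minWidth.
    static int[] pixelSpan(long start, long end, long chromLength, int pixels, int minWidth) {
        int x = (int) (start * pixels / chromLength);
        int width = (int) ((end - start) * pixels / chromLength);  // may round down to 0
        if (width < minWidth) {
            width = minWidth;                                      // keep the locus visible
        }
        if (x + width > pixels) {
            x = Math.max(0, pixels - width);                       // stay inside the drawing area
        }
        return new int[] {x, width};
    }

    public static void main(String[] args) {
        // A 2 kb locus on a 100 Mb chromosome drawn across 1000 pixels is far below one pixel wide.
        int[] span = pixelSpan(1000000L, 1002000L, 100000000L, 1000, 2);
        System.out.println("x=" + span[0] + " width=" + span[1]);
    }
}
```

A forced minimum width makes small loci visible at low magnification, at the cost of overstating their extent and of overlapping neighboring loci, which is part of the accuracy trade-off discussed in this section.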
We are particularly concerned about the use of Caryoscope (or similar) graphics in vector formats (such as PostScript, PDF and SVG) that are subsequently rendered on diverse display devices and printers. Since we have no control over the rendering at the destination, it is likely that the same vector output could look very different on different devices. Once we have studied this problem in more detail, we intend to provide practical usage guidelines for researchers.
In the future, we will modify our display so that the user can turn "on" or "off" the display of the available chromosomes. Within that display, and with the help of our users, we will review the role of the X-axis scaling: perhaps it should change the scaling of the data, or perhaps it does not belong in Caryoscope-like applications at all.
A common user request is to display two or more arrayCGH and/or expression datasets side by side, either to determine regions of recurrent deletion or amplification, or to discern visually the impact that changes in chromosome copy number may have on transcript levels. Another frequent request is to show the cytoband information for each chromosome. We intend to add these features as part of our future development, in the course of which we would perhaps redefine, or extend, the manner in which our input data are defined (i.e., we would accept the definition of the chromosome names, lengths and cytobands in a separate file that would be re-used by different datasets).
Caryoscope currently provides a flexible method to visualize, explore and create images of microarray data in a genomic context. With such a tool, microarray researchers will be able to answer questions about how genome copy number or genome position plays a role in biological processes or human diseases.
Caryoscope is a useful, flexible Java application for the visualization of microarray data in a genomic context. It is available as Open Source under the permissive MIT License, allowing anyone to use or modify it.
Availability and requirements
Project name: Caryoscope
Project home page: http://caryoscope.stanford.edu/
Operating system(s): Platform independent
Programming language: Java
Other requirements: Java 1.4 or higher
License: MIT License
Any restrictions to use by non-academics: None
The authors would like to thank members of the Pollack, Brown and Botstein laboratories, particularly Jon Pollack, for their feedback on Caryoscope, and for providing datasets for testing, and Ash Alizadeh for suggesting the Caryoscope name. Thanks also go to all the members of the Stanford Microarray Database group for stimulating discussions. This work was funded by a grant from the NHGRI, R01HG002732, to GS.
- Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 1998, 95: 14863–14868. 10.1073/pnas.95.25.14863
- Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, Williams CF, Jeffrey SS, Botstein D, Brown PO: Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat Genet 1999, 23: 41–46. 10.1038/14385
- Crawley JJ, Furge KA: Identification of frequent cytogenetic aberrations in hepatocellular carcinoma using gene-expression microarray data. Genome Biol 2002, 3: RESEARCH0075. 10.1186/gb-2002-3-12-research0075
- Hughes TR, Roberts CJ, Dai H, Jones AR, Meyer MR, Slade D, Burchard J, Dow S, Ward TR, Kidd MJ, Friend SH, Marton MJ: Widespread aneuploidy revealed by DNA microarray expression profiling. Nat Genet 2000, 25: 333–337. 10.1038/77116
- Pollack JR, Sorlie T, Perou CM, Rees CA, Jeffrey SS, Lonning PE, Tibshirani R, Botstein D, Borresen-Dale AL, Brown PO: Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc Natl Acad Sci U S A 2002, 99: 12963–12968. 10.1073/pnas.162471999
- Baldus CD, Liyanarachchi S, Mrozek K, Auer H, Tanner SM, Guimond M, Ruppert AS, Mohamed N, Davuluri RV, Caligiuri MA, Bloomfield CD, de la Chapelle A: Acute myeloid leukemia with complex karyotypes and abnormal chromosome 21: Amplification discloses overexpression of APP, ETS2, and ERG genes. Proc Natl Acad Sci U S A 2004, 101: 3915–3920. 10.1073/pnas.0400272101
- Linn SC, West RB, Pollack JR, Zhu S, Hernandez-Boussard T, Nielsen TO, Rubin BP, Patel R, Goldblum JR, Siegmund D, Botstein D, Brown PO, Gilks CB, van de Rijn M: Gene expression patterns and gene copy number changes in dermatofibrosarcoma protuberans. Am J Pathol 2003, 163: 2383–2395.
- Dunham MJ, Badrane H, Ferea T, Adams J, Brown PO, Rosenzweig F, Botstein D: Characteristic genome rearrangements in experimental evolution of Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 2002, 99: 16144–16149. 10.1073/pnas.242624799
- Lin JY, Pollack JR, Chou FL, Rees CA, Christian AT, Bedford JS, Brown PO, Ginsberg MH: Physical mapping of genes in somatic cell radiation hybrids by comparative genomic hybridization to cDNA microarrays. Genome Biol 2002, 3: RESEARCH0026. 10.1186/gb-2002-3-6-research0026
- Baerends RJ, Smits WK, De Jong A, Hamoen LW, Kok J, Kuipers OP: Genome2D: a visualization tool for the rapid analysis of bacterial transcriptome data. Genome Biol 2004, 5: R37. 10.1186/gb-2004-5-5-r37
- Chi B, DeLeeuw RJ, Coe BP, MacAulay C, Lam WL: SeeGH - A software tool for visualization of whole genome array comparative genomic hybridization data. BMC Bioinformatics 2004, 5: 13. 10.1186/1471-2105-5-13
- Greshock J, Naylor TL, Margolin A, Diskin S, Cleaver SH, Futreal PA, deJong PJ, Zhao S, Liebman M, Weber BL: 1-Mb resolution array-based comparative genomic hybridization using a BAC clone set optimized for cancer gene analysis. Genome Res 2004, 14: 179–187. 10.1101/gr.1847304
- Java Web Start [http://java.sun.com/products/javawebstart/]
- General Feature Format (GFF) [http://www.sanger.ac.uk/Software/formats/GFF/]
- Diehn M, Sherlock G, Binkley G, Jin H, Matese JC, Hernandez-Boussard T, Rees CA, Cherry JM, Botstein D, Brown PO, Alizadeh AA: SOURCE: a unified genomic resource of functional annotations, ontologies, and gene expression data. Nucleic Acids Res 2003, 31: 219–223. 10.1093/nar/gkg014
- Open Source Initiative (OSI) [http://www.opensource.org/]
- The MIT License [http://www.opensource.org/licenses/mit-license.php]
- Aliasing Problems and Anti-Aliasing Techniques [http://www.siggraph.org/education/materials/HyperGraph/aliasing/alias0.htm]
- Digital Imaging and Communications in Medicine (DICOM) [http://medical.nema.org/]
This article is published under license to BioMed Central Ltd. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.