Genoviz Software Development Kit: Java tool kit for building genomics visualization applications
© Helt et al; licensee BioMed Central Ltd. 2009
Received: 12 April 2009
Accepted: 25 August 2009
Published: 25 August 2009
Visualization software can expose previously undiscovered patterns in genomic data and advance biological science.
The Genoviz Software Development Kit (SDK) is an open source, Java-based framework designed for rapid assembly of visualization software applications for genomics. The Genoviz SDK framework provides a mechanism for incorporating adaptive, dynamic zooming into applications, a desirable feature of genome viewers. Visualization capabilities of the Genoviz SDK include automated layout of features along genetic or genomic axes; support for user interactions with graphical elements (Glyphs) in a map; a variety of Glyph sub-classes that promote experimentation with new ways of representing data in graphical formats; and support for adaptive, semantic zooming, whereby objects change their appearance depending on zoom level and zooming rate adapts to the current scale. Freely available demonstration and production quality applications, including the Integrated Genome Browser, illustrate Genoviz SDK capabilities.
Separation between graphics components and genomic data models makes it easy for developers to add visualization capability to pre-existing applications or build new applications using third-party data models. Source code, documentation, sample applications, and tutorials are available at http://genoviz.sourceforge.net/.
Since the beginning of the genomics era, numerous authors have warned against on-coming information overload, using metaphors that evoke natural disasters ("deluge,"  "avalanche" , "tsunami" ) to emphasize how our capacity to generate data threatens to overwhelm our ability to deploy it in research. Finding ways to store and analyze vast amounts of data is not as difficult as it once was, thanks to improvements in database technologies, web services, and computer hardware, but developing graphical software that allows scientists to visualize, explore, and interact with novel and rapidly expanding data sets from genomics remains a challenging task.
Flexible, highly-interactive visualization software has great value in genomics because it enables scientists to explore the genomic landscape without knowing in advance what patterns and relationships they might find [4–6]. Visualization techniques provide an excellent complement to more abstract, quantitative or statistical analysis methods in that they rely on innate human visual processing systems, rather than on abstract mathematical reasoning. Discoveries arising from interactive inspection of the genomic landscape will always require corresponding statistical validation, but visual methods have tremendous power to expose compelling patterns and give scientists ideas on what to test.
Developing visualization software for genomics can be difficult. Genome science changes so rapidly that new data types frequently appear in advance of statistical methods and visualization software needed to analyze and display them. Conceiving effective ways to represent new types of biological data in graphical formats can be difficult in part because even the scientists generating the data may not know precisely what they want to see until they have seen and interacted with real-life examples. Once they do, the experience of viewing their data in graphical format often suggests new questions and new ideas, creating the need to develop new ways to display data and new modes of interaction. However, once a developer has created a working application for scientists to use, it may not be easy or even feasible to modify how the data are represented, depending on how the developer has implemented the graphical components.
The Genoviz Software Development Kit (SDK) aims to solve some of these problems. The Genoviz SDK is an open-source, freely available Java-based toolkit for building genomics visualization applications. It provides core functionality developers can easily deploy in their applications, notably interactive, dynamic zooming, which allows rapid navigation and exploration of genome-scale data sets covering many orders of magnitude, from chromosomes to genes to individual base pairs. The toolkit implements functionality well-suited to genomics visualization applications, but its architecture also makes it easy for developers to invent new ways to represent emerging data types in graphical formats. The Genoviz SDK aims to help developers build new applications iteratively and organically, inventing novel graphical representational schemes in collaboration with users as they explore their own data, make discoveries, and think of new questions for their software and experiments to address.
The Genoviz SDK is implemented using the Java programming language and requires Java versions 1.6 or above.
The Genoviz SDK is an object-oriented, Java-based graphics framework that provides methods, objects, and a class hierarchy for developers to display genomic data in two-dimensional fields. It originated as the Neomorphic Software Development Kit (NGSDK) and was first developed at Neomorphic Software, a bioinformatics company that later merged with Affymetrix, which continued development of the software. The NGSDK later entered the public domain as open source software under a new name: the Genoviz SDK. As a result, many classes and packages in the toolkit bear appellations "neo" and "affymetrix," reflecting the Genoviz SDK's origins at the two companies.
The Genoviz SDK's core graphics system employs three collaborating classes: Scene, a View, and Glyph. A Scene object represents a two-dimensional data field and its coordinate system. For example, a Scene might represent the physical map of a single chromosome or chromosome arm, with its associated annotations and sequence data. In this case, the coordinate system comprises nucleotide positions, most easily expressed as whole numbers. Another Scene might represent a genetic or cytological map; in this case, the coordinate system might be based on map units, which can have fractional values. To accommodate different types of maps, the trio of interacting objects use floating point numbers to indicate positions, but programmers are free to use either floating point numbers or integers when adding items to a Scene.
A View represents the user's current view on the Scene. It captures the user's current level of zoom and the range of visible coordinates. Each View references a single Scene and encapsulates algorithms for transforming coordinates referencing the Scene's biology-based coordinate system into the pixel-based coordinate system of the user's computer screen. The details of how Scene and View objects accomplish this translation are largely hidden from the programmer, and most programmers would not need to manipulate instances of these classes directly, unless they required specialized behavior not already provided as part of the toolkit. However, because the system is open source, programmers can investigate the transformation logic in detail when necessary.
The Scene, Glyph, and View come together as components of a fourth object, called a Widget (see below), that manages their interactions with each other and implements basic functionality for mediating user interactions with the genomic data on display in the Scene. The Widget typically creates and manages scrollbars, sliders, or other graphical user interface components the user operates to zoom or pan the display. The Widget intercepts user requests to change scale or position, and then triggers invocation of drawing methods via the one or more View objects associated with its Scene. Each View in turn requests its Scene's Glyphs to draw themselves; Glyphs consult the View in which they appear to determine their sizes relative to the pixel-based coordinate system of the computer screen and then use the built-in Java AWT Graphics object (also obtained from a View) to draw graphical elements on the screen. The division of responsibility between Scene, Glyph, and View makes it possible for a single Scene to appear in multiple Views, allowing multiple, alternative representations of the same data. This can be useful in a number of settings, for example, when viewing an overview graphic of an entire chromosome in tandem with a zoomed-in view of the same data.
An important feature of this design is that it enables the graphics system to operate relatively independently of the data models, akin to the Model-View-Controller architectural design pattern commonly encountered in modern software applications that aim to separate business logic from presentation strategy . Glyphs can contain references to custom data models, and specialized Glyph subclasses may implement drawing logic that consults these data models, but otherwise, Glyphs do not enforce a particular scheme for modeling genomic data.
One of the core features of Genoviz SDK is that the graphical rendering system handles zooming and panning (scrolling) without the programmer having to provide explicit control of the system in response to user drags on scrollbars or sliders attached to a display. The zooming sub-system typically uses a default (but replaceable) log-based Transform object to adapt the amount of zoom obtained per unit of user drag (e.g., the number of pixels a scrollbar thumb moves) to the current scale of the view on display in an application. As a result, drag gestures at high zoom change the scale of the display less than do drags at lower zoom. Similarly, the programmer does not typically have to determine the vertical positioning of Glyphs in horizontal maps. Typically, the widget component implements algorithms (encapsulated in Packer objects) that determine the vertical location of new Glyphs, stacking them in ways that prevent them from colliding in the two-dimensional data field.
Custom data representation schemes using Glyphs
Separation of Concerns: Graphics Semantics and Genomics Semantics
The Genoviz SDK aims wherever possible to separate the semantics of graphical rendering from the semantics of the genomic data. This design decision enables developers to reuse preexisting libraries and applications when creating visualization applications. The Genoviz SDK graphics logic does not specify how data should be represented internally within an application. At first glance, this statement may seem contradictory in that the Genoviz SDK aims to make creation of genomic data visualization applications easy and convenient for programmers. However, the precise semantics of genomic data models are often application-specific, whereas the graphics components more often generalize across diverse applications and problem domains. For example, a developer who has implemented a database system for representing genomic data may wish to use data models that harmonize with the database. Similarly, a developer familiar with the open source BioJava library might prefer to use BioJava data models in conjunction with the Genoviz SDK . To ensure maximum reusability, the Genoviz SDK does not require programmers to conform to any pre-determined scheme for representing genomic data. By separating drawing logic from genomic semantics of genes and genomes, sequence and annotations, the Genoviz SDK graphics system provides conveniences for the programmer that are nonetheless well-adapted to representation of genomic data.
Genoviz SDK Widgets
The Genoviz SDK includes several classes (called Widgets) that provide convenient functions for interacting with and representing data types commonly encountered in bioinformatics. These Widgets implement a generic interface that specifies basic functionality related to zooming, panning, selection, and interaction with the underlying graphics system. Widgets provide methods for establishing horizontal and/or vertical axes, setting display bounds, panning, zooming, selecting and placing items at designated positions, configuring glyph factories and data adapters (for creating graphical objects with common attributes), setting background colors, establishing window resize behavior, and managing event handler objects that intercept and respond to user interactions. Specialized Widget implementation classes also provide methods and functionality for representing specific categories of genomic data. Some of the currently available widgets include a NeoMap object, for representation of physical and genetic maps; a TieredNeoMap widget for display of physical or genetics map data in individually-adjustable and configurable tracks; a NeoAssembler widget, aimed at display of EST/cDNA assemblies; and a NeoSeq widget for display of sequence data. By providing a generic Widget interface, the Genoviz SDK framework aims to encourage developers to create new widgets that support emerging data types, such as data from sequence-based expression profiling.
Creating an application using the Genoviz SDK
A typical Genoviz SDK application consists of several collaborating classes: parsers that read data from files or databases and generate in-memory data structures; adapter objects that translate these data structures into Glyph objects shown in the display; one or more display components (e.g., NeoMap) that mediate user interactions with data, and custom application logic that specifies how the application will respond to user interactions with Glyphs, menus, scrollbars, and other graphical elements. Typically, developers attach data models representing genomic data to Glyph objects via a generic setInfo method, which accepts any Java object; this allows developers to link the object models representing genomic data to the graphical elements that control how the data will appear within a display. When users interact with Glyphs, the display component generates events, which the application logic may capture and then interpret. For example, right-clicking a Glyph representing a gene model might signal a request to get more information about it, as happens in the Integrated Genome Browser . Alternatively, it could represent a request to perform an editing operation on the underlying genomic data model the Glyph represents. In this way, the graphics system provides a visual representation of data structures that users can easily inspect and manipulate via Glyph objects, similar to windows, checkboxes, and other familiar graphical elements that users can click, drag, and manipulate in standard graphical user interfaces.
Developers evaluating the Genoviz SDK for use in their applications may be interested in comparing it to other graphics frameworks that render two-dimensional data. The Jazz/Piccolo framework, first developed at the University of Maryland, enjoys a large group of enthusiastic users . In Jazz/Piccolo, zooming is typically two-dimensional, imitating the action of camera rising and lowering over a two-dimensional data field. Jazz/Piccolo zooming involves focusing on a single point around which the data field contracts or expands as the user zooms in or out. In genomic applications, where the field of data is typically one-dimensional and involves stretching and contracting a genomic sequence axis, zooming is simpler and more intuitive when restricted to a single dimension . The graphical components implemented in Jazz/Piccolo require more memory and processing power than their equivalents in the Genoviz SDK, thus limiting their usefulness for display of vast genomic data sets . They are best-suited to the representation of hundreds of data objects, whereas the Genoviz SDK is well-suited for representation of genome-scale data sets, which can include hundreds of thousands of objects and millions of data points.
Other toolkits for genomics visualization that have been published include bioTK and bioWidgets. David Searls' bioTk toolkit offered a set of configurable graphical components, termed "widgets," that programmers could manipulate using the Tcl/Tk command-line language . This early toolkit included components for drawing chromosomes, genome maps, and sequence displays. Later, Gregg Helt developed bioTK-Perl, which allowed Perl developers to use similar components in Perl applications, such as Genotator, a workbench for genome annotation developed by Nomi Harris . At least two Java-based toolkits for genomics visualization were developed in the late 1990s, including the BioViews toolkit from the Berkeley Drosophila Genome Project  and the BioWidgets toolkit from CBIL at the University of Pennsylvania . To our knowledge, these toolkits are no longer supported. The Genoviz SDK draws inspiration from these early groups' work; however, its structured graphics approach more closely resembles the Jazz/Piccolo toolkit and places greater emphasis on efficient memory usage. The open source BioJava and BioPerl projects include sequence visualization components, but they are more tightly coupled to their respective BioJava and BioPerl data models .
The Genoviz SDK is a Java data visualization toolkit for genome data application developers. It handles the low-level aspects of linking object-oriented data models to graphical widgets so that application developers can focus on the unique aspects of their data and application logic, rather than implement graphical rendering algorithms. An award from the National Science Foundation (U.S.A.), as well as volunteer efforts from a small but growing community of developers, continue to support development of the IGB software and the Genoviz SDK. Major efforts currently underway include creation and updates of documentation for novice and experienced developers, creation of new tutorials showing programmers how they can use Genoviz SDK to add visualization capability to their applications, and development of demonstration applications showing Genoviz SDK graphics capability. Resources for developers and users alike are available from http://genoviz.sourceforge.net.
Availability and Requirements
The Genoviz SDK is implemented in the Java programming language and is freely available under the Common Public License, v1.0, an Open Source Initiative http://www.opensource.org/ approved open source license. The project home page is http://genoviz.sourceforge.net, from where users can download and view source code anonymously. Users interested in downloading a pre-compiled copy of the Integrated Genome Browser software can obtain it at http://igb.bioviz.org. Please note that the igb.bioviz.org site systems administrator compiles composite usage statistics aimed at tracking the overall number of downloads and average number of accesses per IP address. The sourceforge site also tracks general usage statistics. Other than this, no details about individual users and their visits to the site are tracked. To run the software, users will require Java version 1.6 or higher.
Funding from the National Science Foundation Arabidopsis 2010 program (Award number 0820371, to PI Ann Loraine) and the National Institutes of Health (Award R01HG003040-02S1 to PI Gregg Helt) provided financial support for this work. We gratefully acknowledge former and current colleagues at Affymetrix, Neomorphic, and the Berkeley Drosophila Genome Project for their support and encouragement during the development of the Genoviz SDK. We also gratefully acknowledge Affymetrix, Inc. for releasing the Genoviz SDK under an open source license and donating developer time during the transition from closed to open source development.
- Aldhous P: Managing the genome data deluge. Science 1993, 262(5133):502–503. 10.1126/science.8211171View ArticlePubMedGoogle Scholar
- Sonnhammer EL: Genome informatics: taming the avalanche of genomic data. Genome Biol 2005, 6(1):301. 10.1186/gb-2004-6-1-301PubMed CentralView ArticlePubMedGoogle Scholar
- Hoon S, Ratnapu KK, Chia JM, Kumarasamy B, Juguang X, Clamp M, Stabenau A, Potter S, Clarke L, Stupka E: Biopipe: a flexible framework for protocol-based bioinformatics analysis. Genome Res 2003, 13(8):1904–1915.PubMed CentralPubMedGoogle Scholar
- Cline MS, Kent WJ: Understanding genome browsing. Nat Biotechnol 2009, 27(2):153–155. 10.1038/nbt0209-153View ArticlePubMedGoogle Scholar
- Helt GA, Lewis S, Loraine AE, Rubin GM: BioViews: Java-based tools for genomic data visualization. Genome Res 1998, 8(3):291–305.PubMed CentralView ArticlePubMedGoogle Scholar
- Loraine AE, Helt GA: Visualizing the genome: techniques for presenting human genome data and annotations. BMC Bioinformatics 2002, 3: 19. 10.1186/1471-2105-3-19PubMed CentralView ArticlePubMedGoogle Scholar
- Nicol JW, Helt GA, Blanchard SG Jr, Raja A, Loraine AE: The Integrated Genome Browser: Free software for distribution and exploration of genome-scale data sets. Bioinformatics 2009. [Epub ahead of print] [Epub ahead of print]Google Scholar
- Holland RC, Down TA, Pocock M, Prlic A, Huen D, James K, Foisy S, Drager A, Yates A, Heuer M, et al.: BioJava: an open-source framework for bioinformatics. Bioinformatics 2008, 24(18):2096–2097. 10.1093/bioinformatics/btn397PubMed CentralView ArticlePubMedGoogle Scholar
- Piccolo2D – A structured 2D Graphics Framework[http://www.piccolo2d.org]
- FAQ: Why is my app so slow?[http://www.piccolo2d.org/learn/dev-faq.html#q2]
- Searls DB: bioTk: Componentry for genome informatics graphical user interfaces. Gene 1995, 163(2):GC1–16. 10.1016/0378-1119(95)00424-5View ArticlePubMedGoogle Scholar
- Harris NL: Annotating sequence data using Genotator. Mol Biotechnol 2000, 16(3):221–232. 10.1385/MB:16:3:221View ArticlePubMedGoogle Scholar
- Fischer S, Crabtree J, Brunk B, Gibson M, Overton GC: bioWidgets: data interaction components for genomics. Bioinformatics 1999, 15(10):837–846. 10.1093/bioinformatics/15.10.837View ArticlePubMedGoogle Scholar
- Huntley D, Tang YA, Nesterova TB, Butcher S, Brockdorff N: Genome Environment Browser (GEB): a dynamic browser for visualising high-throughput experimental data in the context of genome features. BMC Bioinformatics 2008, 9: 501. 10.1186/1471-2105-9-501PubMed CentralView ArticlePubMedGoogle Scholar
- Haas BJ, Wortman JR, Ronning CM, Hannick LI, Smith RK Jr, Maiti R, Chan AP, Yu C, Farzad M, Wu D, et al.: Complete reannotation of the Arabidopsis genome: methods, tools, protocols and the final release. BMC Biol 2005, 3: 7. 10.1186/1741-7007-3-7PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.