Architecture
Thanks to technologies such as HTML5 and JavaScript, modern web browsers are capable of rendering fully featured graphical user interfaces for both websites and local applications. Krona's architecture takes a hybrid approach in which data are stored locally, but the interface code is hosted on the Internet. This allows each Krona chart to be contained in a single file, making charts easy to view, share, and integrate with existing websites. The only requirements for viewing are an Internet connection and a recent version of any major web browser (though local charts that do not require an Internet connection can also be created and viewed with a Krona installation). Modularity is achieved by embedding XML chart data in an XHTML document that links to an external JavaScript implementation of the interface (Figure 1). When a web browser renders the XHTML document, the JavaScript loads chart data from the embedded XML and renders the chart to an HTML5 canvas tag. Hosting the JavaScript on the Internet avoids installation requirements and allows seamless, automatic updating as Krona evolves. To allow Krona to be used for a wide variety of applications, utilities for creating Krona charts are separated from the viewing engine. A package of these, called KronaTools, comprises Perl scripts for importing data from several popular bioinformatics tools and generic file types.
Hierarchical classifications can be directly imported from the RDP Classifier, Phymm/PhymmBL, MG-RAST (both taxonomic and functional), or the web-based bioinformatics platform Galaxy [17]. Sequences can also be taxonomically classified from BLAST results downloaded from NCBI [9, 10] or the METAREP metagenomic repository [4]. Classification of raw BLAST results is performed by finding the lowest common ancestor of the highest scoring alignments (an approach similar to that of MEGAN), and data are mapped to a taxonomy tree automatically downloaded and indexed from the NCBI taxonomy database [18]. When importing classifications from RDP and PhymmBL a color gradient can be used to represent the average reported confidence of assignments to each node. For MG-RAST, METAREP, and raw BLAST results, the nodes can be colored by average log of e-value or average percent identity. Also, since Phymm/PhymmBL and BLAST classifications can be performed either on reads or assembled contigs, the scripts for importing from these tools allow the optional specification of magnitudes for each classified sequence. A script is also provided to generate magnitudes based on reads per contig from assemblies in the common ACE file format. Other types of classifications can be imported from basic text files or an Excel template detailing lineage and magnitude. Finally, an XML file can be imported to gain complete control over the chart, including custom attributes and colors for each node. Since node attributes can contain HTML and hyperlinks, XML import allows Krona to be deployed as a custom data browsing and extraction platform in addition to a visualization tool.
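The lowest-common-ancestor step described above can be illustrated with a minimal Python sketch. It is not the KronaTools implementation (which is written in Perl). The function names, the representation of a lineage as a root-to-leaf list of taxon names, and the `score_tolerance` parameter are all assumptions made for illustration, since the text does not specify how "highest scoring" alignments are selected.

```python
from typing import List, Sequence, Tuple


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest root-to-node path shared by every lineage."""
    if not lineages:
        return []
    shared: List[str] = []
    # zip stops at the shortest lineage, so the result never exceeds it.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


def classify(hits: Sequence[Tuple[float, Sequence[str]]],
             score_tolerance: float = 0.0) -> List[str]:
    """Classify one query from (score, lineage) pairs.

    Assumption: hits scoring within `score_tolerance` of the best hit
    count as "highest scoring"; the source does not state the rule used.
    """
    if not hits:
        return []
    best = max(score for score, _ in hits)
    top = [lineage for score, lineage in hits if score >= best - score_tolerance]
    return lowest_common_ancestor(top)


hits = [
    (200.0, ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Escherichia"]),
    (200.0, ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Salmonella"]),
    (150.0, ["Bacteria", "Firmicutes"]),
]
assignment = classify(hits)
```

With the two tied top hits, the query is assigned to Gammaproteobacteria, the deepest taxon both lineages share. Widening the tolerance to include the third hit moves the assignment up to Bacteria.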
Visual design
The Krona display resembles a pie chart, in that it subdivides separate classes into sectors, but with an embedded hierarchy. Each sector is overlaid with smaller sectors representing its children, which are squeezed toward the outside of the chart to give the parent room for labeling. This does not cause distortion because, as in a pie chart, magnitudes are represented by the angle of each sector rather than the area. For example, Figure 2 shows an oceanic metagenome [19] imported from METAREP. The taxon "Gammaproteobacteria" is selected, and the angle of the highlighted sector indicates the relative magnitude of the node (in this case 110,467 classified sequencing reads, as shown in the upper right corner). The sector also surrounds smaller sectors, which represent constituents of Gammaproteobacteria. In this case, the sum of the constituent angles equals the angle of the parent, indicating that no assignments were made directly to Gammaproteobacteria. If assignments had been made to this internal node, its angular sweep would be wider than the sum of its children's, clearly showing both the summary and the assigned amount in relation to each other.
A common criticism of RSF displays is the difficulty of comparing similarly sized nodes. To make comparisons easier, Krona sorts nodes by decreasing magnitude with respect to their siblings. In addition, the nodes can be colored using a novel algorithm that works with the sorting to visually emphasize both hierarchy and quantity. This algorithm, which is enabled by default, uses the hue-saturation-lightness (HSL) color model to allow procedural coloring that can adapt to different datasets. First, the hue spectrum is divided among the immediate children of the current root node. Each of these children in turn subdivides its hue range among its children using their magnitudes as weights. Coloring each sorted node by the minimum of its hue range causes recursive inheritance of node hue by the largest child of each generation. The result is visual consistency for lineages that are quantitatively skewed toward particular branches. To distinguish each generation without disrupting this consistency, the lightness aspect of the HSL model is increased with relative hierarchical depth, with saturation remaining constant.
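A minimal Python sketch of this coloring scheme follows. It is an illustration under stated assumptions rather than Krona's JavaScript implementation: the `Node` class, the saturation and lightness constants, and the choice to weight the root's children by magnitude (the text specifies magnitude weighting only for later generations) are all hypothetical. Node names are assumed to be unique because they are used as dictionary keys.

```python
import colorsys
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    magnitude: float
    children: List["Node"] = field(default_factory=list)


def assign_colors(root: Node,
                  saturation: float = 0.5,
                  base_lightness: float = 0.5,
                  lightness_step: float = 0.08,
                  max_lightness: float = 0.9) -> Dict[str, Tuple[float, float, float]]:
    """Map each descendant of `root` to an (h, s, l) triple in [0, 1]."""
    colors: Dict[str, Tuple[float, float, float]] = {}

    def visit(node: Node, hue_min: float, hue_max: float, depth: int) -> None:
        # Sort siblings by decreasing magnitude, as in the chart itself.
        kids = sorted(node.children, key=lambda c: c.magnitude, reverse=True)
        total = sum(c.magnitude for c in kids)
        start = hue_min
        for child in kids:
            share = child.magnitude / total if total > 0 else 1.0 / len(kids)
            end = start + (hue_max - hue_min) * share
            # Lightness grows with depth; saturation stays constant.
            lightness = min(base_lightness + lightness_step * depth, max_lightness)
            # Color by the minimum of the hue range, so the largest child
            # (visited first) inherits its parent's hue.
            colors[child.name] = (start, saturation, lightness)
            visit(child, start, end, depth + 1)
            start = end

    visit(root, 0.0, 1.0, 0)
    return colors


def to_rgb(hsl: Tuple[float, float, float]) -> Tuple[float, float, float]:
    """Convert (h, s, l) to RGB; note colorsys expects (h, l, s) order."""
    h, s, l = hsl
    return colorsys.hls_to_rgb(h, l, s)


tree = Node("root", 100, [
    Node("A", 60, [Node("A1", 40), Node("A2", 20)]),
    Node("B", 40),
])
colors = assign_colors(tree)
```

In this example, A1 is the largest child of A and receives the same hue as A, only lighter, while A2 and B start at later points in the spectrum. This is the inheritance that keeps quantitatively dominant lineages visually consistent.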
Spatial efficiency
Metagenomic hierarchies can easily become too complex for all nodes to be discernibly apportioned and labeled on a computer screen. Although Krona ameliorates this problem with interactive zooming, it also offers several modifications to RSF displays that maximize the amount of information contained in each view.
First, radix-tree compression is used to collapse linear subgraphs in the hierarchy, simplifying the chart without removing quantitative relationships. Linear subgraphs, which represent multiple ranks of the same classification, occur when taxonomic classifications for a sample are mapped onto a full taxonomy tree. For example, if Homo sapiens were the only representative species of the class Mammalia, it would typically be redundantly classified under Primates, Hominids, and other ranks. To allow such classifications to be viewed, collapsing can be dynamically toggled, with animation depicting the transition. For additional simplification of complex trees, the taxonomy can be pruned to summarize the data at a specified depth. Figure 2, for example, shows an NCBI taxonomy summarized at a maximum depth of 6 levels and with linear subgraphs collapsed.
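The collapsing of linear subgraphs can be sketched in Python as follows. This is an illustrative simplification, not Krona's code: the `Node` class and function names are hypothetical, a link is treated as redundant only when a node has exactly one child carrying the same magnitude (that is, no sequences are assigned directly to the parent), and the deepest node of each chain is assumed to be the one that remains visible.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    magnitude: float
    children: List["Node"] = field(default_factory=list)


def collapse(node: Node) -> Node:
    """Return a copy of the subtree with linear chains collapsed."""
    # Skip down through links that add no quantitative information.
    while len(node.children) == 1 and node.children[0].magnitude == node.magnitude:
        node = node.children[0]
    return Node(node.name, node.magnitude, [collapse(c) for c in node.children])


def collapse_tree(root: Node) -> Node:
    """Collapse every subtree below the root while keeping the root itself."""
    return Node(root.name, root.magnitude, [collapse(c) for c in root.children])


taxonomy = Node("Life", 100, [
    Node("Mammalia", 30, [
        Node("Primates", 30, [
            Node("Hominidae", 30, [Node("Homo sapiens", 30)]),
        ]),
    ]),
    Node("Bacteria", 70, [Node("Proteobacteria", 40), Node("Firmicutes", 30)]),
])
collapsed = collapse_tree(taxonomy)
```

Here the chain from Mammalia down to Homo sapiens is reduced to a single node, while Bacteria is left unchanged because it branches. All magnitudes are preserved, so the quantitative relationships in the chart are unaffected.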
Second, since deeper taxonomical levels are often the most interesting (e.g. genus and species classifications), Krona allows significant quantities at these levels to be viewed in direct relation to the root of the hierarchy. This is accomplished by dynamically reducing the labeling area of intermediate classifications, removing their labels if necessary. Compression is increased moving outward from the center to ensure that the highest levels of the current view can also be labeled. The intermediate levels that have been compressed can always be seen more clearly by zooming.
Finally, Krona's labeling algorithms greatly increase textual information density compared to other RSF implementations. Space is used efficiently by orienting leaf node labels along radii and internal node labels along tangents. Internal labels use step-wise positioning and collision-based shortening to display as much text as possible while avoiding overlaps.
Polar-coordinate zooming
Because radial space-filling displays recursively subdivide angles, the shapes of the nodes approach rectangles as hierarchical depth increases and as node magnitudes decrease. Thus, zooming small nodes by simply scaling the entire figure in Cartesian coordinate space would result in a loss of the angular aspect that makes RSF displays intuitive and space-efficient. To increase the capacity of the displays without causing this problem, Krona uses a polar coordinate space for zooming. This is accomplished by increasing the angular sweep and radius of the zooming node until it occupies the same circle as the original overview. The angular sweeps of surrounding nodes are decreased simultaneously, creating an animated "fisheye" effect. This animation helps the user follow the change in context, and the final zoomed view has the same capacity as the original. Zooming can then be repeated for any node with children, providing informative views of even the deepest levels of a complex hierarchy. Zooming out to traverse up the hierarchy can be accomplished similarly by clicking ancestral nodes, which are shown in the center of the plot and as summary pie charts next to the plot. This triggers the reverse of the fisheye animation, compressing the current node to reveal its position in the new, broader context.
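The angular part of this transition can be sketched in Python as follows. The sketch is an assumption-laden illustration rather than Krona's implementation: the function names are hypothetical, only angles are handled (the accompanying change in radius is omitted), and linear interpolation is assumed because the text does not describe the animation's timing.

```python
import math
from typing import Tuple

Span = Tuple[float, float]  # (start_angle, end_angle) in radians


def zoomed_angle(theta: float, zoom: Span) -> float:
    """Map an overview angle to its angle once `zoom` fills the circle."""
    zoom_start, zoom_end = zoom
    fraction = (theta - zoom_start) / (zoom_end - zoom_start)
    # Angles outside the zoomed node clamp to its edges, so surrounding
    # nodes shrink to zero sweep.
    return min(max(fraction, 0.0), 1.0) * 2.0 * math.pi


def tween_span(span: Span, zoom: Span, t: float) -> Span:
    """Interpolate a node's span between overview (t=0) and zoomed view (t=1)."""
    start, end = span
    target_start = zoomed_angle(start, zoom)
    target_end = zoomed_angle(end, zoom)
    return (start + (target_start - start) * t,
            end + (target_end - end) * t)


zoom_node = (math.pi / 2, math.pi)         # node being zoomed
child = (math.pi / 2, 3 * math.pi / 4)     # first half of the zoomed node
sibling = (0.0, math.pi / 2)               # neighbor outside the zoomed node

final_zoom = tween_span(zoom_node, zoom_node, 1.0)
final_child = tween_span(child, zoom_node, 1.0)
final_sibling = tween_span(sibling, zoom_node, 1.0)
```

At the end of the transition the zoomed node spans the full circle, its child keeps its proportional share (half the circle), and the sibling has zero sweep. Intermediate values of `t` give the frames of the fisheye animation, and running `t` from 1 back to 0 gives the reverse animation used when zooming out.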
Multi-dimensional data
To visualize secondary attributes in addition to magnitude, individual nodes in Krona may be colored by variable. For categorical variables, users may define the color of every node in the XML. For quantitative variables, a gradient may be defined that will color each node by value. An example of this is shown in Figure 3, where each node is colored by a quantitative red-green gradient representing classification confidence.
Additionally, metagenomic data are often generated at discrete points across multiple locations or times. Krona is able to store the data from multiple samples in a single document. Individual samples may then be stepped through, at any zoom level, using the navigation interface at the top left. For example, in Figure 2 Krona is displaying one of seven depth samples from the oceanic water column. Advancing through the series moves to progressively greater depths. The transition between samples is animated using a polar "tween" effect, emphasizing the difference between samples. The result of this style of navigation is a series of moving pictures, where the taxa dynamically grow and shrink from sample to sample, in this case as sampling descends the water column. This approach is eye-catching for a few samples, but direct comparison between many samples simultaneously is difficult with radial charts. Analysis across many samples is better left to traditional heatmap and differential barchart visualizations.