Software | Open | Published:
BMC Bioinformaticsvolume 18, Article number: 469 (2017)
Despite the ubiquity of mass spectrometry (MS), data processing tools can be surprisingly limited. To date, there is no stand-alone, cross-platform 3-D visualizer for MS data. Available visualization toolkits require large libraries with multiple dependencies and are not well suited for custom MS data processing modules, such as MS storage systems or data processing algorithms.
JS-MS enables custom MS data processing and evaluation by providing fast, 3-D visualization using improved navigation without dependencies. JS-MS is publicly available with a GPLv2 license at github.com/optimusmoose/jsms.
Mass spectrometry (MS) plays a role in many biological/biomedical investigations  because it can quantify and identify the major components (proteins, lipids, metabolites) of most cellular systems. MS is a long-standing technology that is vital for answering a variety of experimental questions across many disciplines . MS yields a list of identities and quantities of molecules in a sample through the analysis of signals generated by charged molecules (see Fig. 1).
While fragmented spectra are of principle interest in current proteomics mass spectrometry workflows, precursor or MS1 data is important in proteomics and other mass spectrometry applications due to its usefulness in constraining the peptide spectra search space, providing quantitative information, and in workflows (e.g. metabolomics or lipidomics) where fragmentation rules are not well known.
In such cases, there exists a need to view and interact with MS1 data. The most appropriate open source software for this purpose is TOPPView , a module of OpenMS. TOPPView opens any.mzML file and displays it in a 3-d representation with zoom and rotate capabilities. Repurposing TOPPView for manual MS data processing reveals several shortcomings:
TOPPView automatically filters out some visible points, making it impossible for the practitioner to dictate how these points ought to be interpreted.
The only way to annotate data in TOPPView is to export the visible subset of data as an.mzML file. The exported data will include hidden points, and further processing requires programming expertise to process the exported data.
TOPPView’s data plot does not allow for shifting to the right, left, up, or down. In order to shift the view, the user must zoom out and carefully zoom back in adjacent to the previous area. This makes navigation difficult, and keeping track of the unprocessed portion of the data exceedingly difficult.
TOPPView requires the installation of the large OpenMS library. This is not an issue on Windows or Mac, but for linux requires the installation of several difficult to install dependencies and a local build, which is beyond the technical ability of many potential users.
The form and function of current user interfaces do not provide the tools necessary to automate the user’s involvement, nor the visualizations necessary to efficiently iterate through the data.
JS-MS is designed under a modular view paradigm. It responds to a simple JSON API, making it extensible to any workflow. JS-MS is packaged with a modular back-end implemented in Java as an example of how it can operate modularly with any backend via a well-defined API. The package is provided as a single self-contained JAR file so that the only prerequisites to run the software are the Java Runtime Environment (JRE) and a web browser, typically pre-installed on any computer.
In this paper, we discuss the design of JS-MS and demonstrate its capabilities. JS-MS is publicly available and has been tested on Windows, Mac, and Linux operating systems with Chrome, Firefox, and Edge browsers.
An effective representation of the original full data set at any granularity is essential to maintaining a small rendering load while maintaining an accurate representation of the underlying data. Ideally, representations would be selected in such a way that the points chosen are real data points, that is, they are not an aggregation of points, and that the resulting graph looks similar to a graph of all original points within the viewing area. The first requirement is necessary for performing data processing, and the second requirement makes a great difference in usability and navigability of the data for the end user. The point selection process is referred to as “summarization”. While summarization takes place in the data store module and not JS-MS, a brief description of efficiencies designed into MzTree –JS-MS’s default data store module–is useful to understanding JS-MS’s design and performance.
The MzTree data structure performs summarization both statically and dynamically.
Static summarization occurs at MzTree construction time. First, the data set is culled to all points above an intensity threshold of 1–points with intensity below 1 are considered noise. This typically reduces data cardinality by over fifty percent. Second, MzTree is built recursively into a hierarchical bounding boxes. Each leaf of the MzTree contains data points, and each internal node samples a preset number of points from the set of all child nodes’ points. Each sample is merely a collection of pointers to avoid data duplication. This process has a trickle-up effect starting at the MzTree leaf level (all data); each level of the tree contains a sample set of points with a degree of summarization equal to its height. After construction completes the resulting MzTree has numerous prepared data samples of ascending detail levels for any given viewing area.
Dynamic summarization occurs at runtime on a per-query basis. Upon receiving a query (consisting of a viewing area and desired number of points) the MzTree is traversed breadth-first starting at the root node. At each level in the tree, each node’s data bounds are compared with the query’s viewing area. If the node’s data bounds overlap with the query’s viewing area its within-view points are included in the prospective point set for that level in the tree. The size of the prospective point set is then compared to the desired number of points. If the point set is smaller than the desired number of points the next level of the tree is processed (except at the leaf level, where fewer points than requested must be returned). If the level’s point set is sufficiently large dynamic summarization occurs on the point set to whittle it down to the desired number of points. Dynamic summarization enables query flexibility in the MzTree to ensure that JS-MS receives exactly the number of points requested.
While JS-MS is memory conservative by default, users have the option of increasing memory usage as a means of degrading render time. In the current implementation of the JS-MS viewer, a user-configurable level of detail determines the desired number of points per query. Detail level is implemented as a fixed number of points to render regardless of viewing area and location; by maintaining the same rendering load for any region of the data set, application performance remains uniform across an entire user session. Not all users have the same hardware capabilities or preferences for speed versus detail, thus the detail level is adjustable using a slider on the toolbar. In addition to the configurable detail level, JS-MS minimizes memory usage by keeping in memory only the currently visible data. Since the MzTree data server is exceedingly responsive, the viewer has no need to cache or prefetch data and can discard all loaded points from memory any time the viewing area changes.
JS-MS’s user interfaces and data rendering provide performance and functionality unavailable in other MS viewers.
The user is able to control the view range and perspective of the graph with a variety of graph transformations. Zoom level can be altered using one of two options: the mouse wheel, or a rectangular click and drag (see Fig. 3). The mouse wheel allows the user to zoom centered on the location of the mouse pointer. Alternatively, holding the control key activates click and drag zooming which allows the user to left-click and drag the mouse to specify a view window (see Fig. 3). In this manner rectangular zooming grants control over the graph’s aspect ratio.
JS-MS provides a rotation mechanism to allow users to view the data at different three-dimensional perspectives. By right-clicking and dragging, the graph can be rotated in any direction. Dragging vertically pitches the graph, dragging horizontally yaws.
The view range can be translated in all four horizontal directions by scrolling the graph. Scrolling is achieved by either left-clicking and dragging or using the arrow keys. Clicking and dragging scrolls the graph in the direction of the drag by the distance of the drag. Pressing an arrow key incrementally moves the graph in the key’s direction.
To further enable fast navigation, JS-MS is equipped with a jump to m/z feature. This feature allows the user to enter an m/z value and be immediately placed at that m/z position, while maintaining the current RT position, zoom level, and perspective.
To our knowledge, these features do not exist in any other 3-D MS viewing software. Without them, visiting each data point in a file would require zooming out and zooming in on adjacent data, a time-consuming practice resulting in loss of ones place and skipped points.
Two-dimensional mode provides a bird’s-eye view of the data. It is particularly useful for detecting low intensity points that would go unnoticed from a three-dimensional perspective. Switching to two-dimensional mode transforms data lines to squares so that they are visible from an overhead perspective. A variety of controls activate two-dimensional mode. Users can quickly view the data two-dimensionally from three-dimensional mode by either rotating to a bird’s-eye perspective or pressing and holding the shift key (releasing the shift key returns to the previous perspective). To persist the two dimensional view, the user can activate a toolbar toggle button to switch to two-dimensional mode. All applicable three-dimensional interfaces remain the same in two-dimensional mode, providing a seamless experience for the user. For example, the user can scroll the two-dimensional graph with the same controls as the three-dimensional graph.
.mzML is the standard open, vendor-neutral data format for storing and communicating mass spectrometry data and follows the XML schema . Parsing an.mzML file to extract all MS data points is necessary for using JS-MS, for using any other three-dimensional viewer, or for converting to novel file formats that are not spectrum ordered. This requires a new approach to reading.mzML files that is both fast and memory efficient.
There are mature libraries for parsing mzML files in many languages, such as C++ and Python; however, options for parsing.mzML files in Java are limited. Jmzml, an existing library for processing.mzML files in Java, is optimized for memory-efficient processing of large files. It implements on-the-fly indexing of files to ensure only requested data is loaded into memory . Jmzml is optimized for multiple, fast reads after a lengthy index-rich initialization, making it poorly suited for parsing.mzML to convert into another data structure. DOM (Document object model), a popular API for interacting with XML documents, is very fast when parsing large files but uses considerable memory (several times the size of the XML file). Since.mzML files can run into the tens of GBs, this is a limiting feature of DOM.
A single pass parser using the StAX (Streaming API for XML) interface in Java provides fast MS data extraction with minimal memory usage.
.mzML parsers based on the StAX API, the DOM API and Jmzml were implemented for reading the set of points in an.mzML file. The parsers were compared based on memory footprint and execution time. Experiments were carried out on a set of.mzML files from several MS experiments. File size varied from 4.3 to 1240 MB and included indexed and non-indexed.mzML files. Both uncompressed and zlib compressed binary arrays were used. Testing was performed on a personal computer running Ubuntu with an SSD. For timing experiments, minimum and maximum Java heap space were both set to 4 GB. Memory footprint was measured by reducing the Java heap space allocated until the parser failed to execute. The parsers were modified to discard points after accessing them in order to give an accurate measurement of the parser’s share of total memory.
Overall execution time was measured for all tests; in addition, build time for the parser object and actual parse time were measured for each of seven files by placing timers inside the code for each parser.
For each of the input.mzML files above 50 MB, a set of files containing subsets of the spectra for that original.mzML file were generated. For instance, for a file with 1000 spectra, a file containing only the first 20 of these spectra, a file containing the first 40 spectra, and so on could be created to yield a total of 50 files. In this way, the execution time for parsing as a function of file size or number of points was measured, with other aspects of the file held constant .
Memory usage of the different parsing objects in shown in Fig. 4. DOM parsing requires memory for indexing overhead that while amortized for larger file sizes, still leads to much greater requirements than Jmzml, which uses roughly two orders of magnitude less, and StAX, which uses about two orders of magnitude less. StAX memory requirements do not increase noticeably even on large files.
Figure 5 shows build time and parse time separately for the different parser objects. StAX outperformed DOM on all tests, with Jmzml requiring performing much slower than either alternative. under normal circumstances. For DOM, build time was minimal, and for StAX, build time was negligible.
Figure 6 shows plots of execution time versus file size for spectra subsets. The plots are linear for all parsers. This is to be expected, as file reading is a linear algorithm. For Jmzml, execution time increased more quickly with file size than for DOM and StAX, implying that Jmzml will always be slower even as the file size increases. This is likely because Jmzml has to read the file from the disk twice–once to index the file and once to access the data–while DOM and StAX only have to read the file once.
Performance-centric design allows JS-MS to handle the overwhelming processing and storage demands for a web environment created by the magnitude of mass spectrometry data. The viewer logic is fully contained in a thin client designed to flexibly interface with any data storage and retrieval system that interfaces with the JS-MS HTTP API. Our implementation is a Java web server that is designed to run either on the same machine as the client, or as a standard stand-alone server. The viewer is responsible for drawing the current window of data provided on the latest request to the server. The decoupling of the viewer from the backend allows implementation and testing of various different approaches for data storage and retrieval, algorithmic data processing, and manual inspection.
JS-MS is modular because it makes no assumptions about previous or successive data processes, other than what is defined in the API. You can use any backend of choice with JS-MS, and use the data for any downstream processes. JS-MS’s included Java server provides a working example of how users can interact with the viewer using its HTTP API, and also is a stand alone contribution for those who want to implement their own modular data processing tools for MS. The StAX parser for.mzML, which outperforms other publicly available parsers, should be useful for anyone who is using MS data with Java code. Those who wish to implement their own server or use another pre-existing server can do so provided they implement the API that JS-MS expects.
JS-MS is a cross-platform, modular, browser-based MS data viewer. It runs on any modern browser without additional dependencies. It is the first MS viewer to provide a full suite of navigational tools (including scroll). JS-MS’s advanced interfaces and novel backend enable sufficient speedup to make visual analysis practical.
Availability and requirements
Mass to charge ratio
Cravatt BF, Simon GM, Yates III JR. The biological impact of mass-spectrometry-based proteomics. Nature. 2007; 450(7172):991–1000.
Cole RB. Electrospray Ionization Mass Spectrometry: Fundamentals, Instrumentation, and Applications. New York: Wiley; 1997.
Sturm M, Kohlbacher O. TOPPView: An Open-Source Viewer for Mass Spectrometry Data. J Proteome Res. 2009; 8(7):3760–763.
Marrin C. WebGL specification: Khronos WebGL Working Group; 2011.
Danchilla B. Three.js framework. In: Beginning WebGL for HTML5. Berkley: Apress: 2012. p. 173–203.
Handy K, Rosen J, Gillan A, Smith R. Fast, axis-agnostic, dynamically summarized storage and retrieval for mass spectrometry data. PloS ONE. 2017. in review.
Deutsch E. mzML: a single, unifying data format for mass spectrometer output. Proteomics. 2008; 8(14):2776–7.
Côté RG, Reisinger F, Martens L. jmzML, an open-source Java API for mzML, the PSI standard for MS data. Proteomics. 2010; 10(7):1332–5.
Wilhelm M, Kirchner M, Steen JA, Steen H. mz5: space-and time-efficient storage of mass spectrometry data sets. Mol Cell Proteomics. 2012; 11(1):111–011379.
This material is based upon work supported by the National Science Foundation under Grant No. 1552240.
Availability of data and materials
JS-MS is publicly available with a GPLv2 license at github.com/optimusmoose/jsms.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
- Mass spectrometry visualization
- 3-D visualization
- Mass spectrometry data processing