OpenChrom: a cross-platform open source software for the mass spectrometric analysis of chromatographic data
© Wenig and Odermatt; licensee BioMed Central Ltd. 2010
Received: 9 March 2010
Accepted: 30 July 2010
Published: 30 July 2010
Today, data evaluation has become a bottleneck in chromatographic science. Analytical instruments equipped with automated samplers yield large amounts of measurement data, which need to be verified and analyzed. Since nearly every GC/MS instrument vendor offers its own data format and software tools, the consequences are problems with data exchange and a lack of comparability between analytical results. To address this situation, a number of commercial and non-profit software applications have been developed. These applications provide functions to import and analyze several data formats, but they lack transparency regarding the implemented analytical algorithms and/or are restricted to a specific computer platform.
This work describes a native approach to handling chromatographic data files. The approach can be extended in its functionality, for example with facilities to detect baselines, to detect, integrate and identify peaks and to compare mass spectra, as well as with the ability to internationalize the application. Additionally, filters can be applied to the chromatographic data to enhance its quality, for example to remove background and noise. Extended operations like do, undo and redo are supported.
OpenChrom is a software application to edit and analyze mass spectrometric chromatographic data. It is extensible in many different ways, depending on the demands of the users or the analytical procedures and algorithms. It offers a customizable graphical user interface. The software is independent of the operating system because the Rich Client Platform is written in Java. OpenChrom is released under the Eclipse Public License 1.0 (EPL). There are no license constraints regarding extensions: they can be published under open source as well as proprietary licenses. OpenChrom is available free of charge at http://www.openchrom.net.
Software has become an integral part of analysis techniques. Especially in the area of gas chromatography/mass spectrometry, automatic samplers enable high throughput analyses. Software assists in handling the large amounts of data generated by automated and fast operating analytical instruments. Modern computer systems are inexpensive, powerful and allow analysis techniques that could not have been applied in the past. Deconvolution, a chromatographic quality enhancing technique, demonstrates for instance that increasing processor power makes new analysis techniques applicable. The technique of deconvolution has been described by Biller and Biemann [1, 2], Dromey et al., Colby, Hindmarch et al., Halket et al., Kong et al., Taylor et al., Pool et al. [9, 10] and Davies in various ways. Stein published an enhanced deconvolution algorithm that has been implemented in the software AMDIS (Automated Mass Spectral Deconvolution and Identification System). AMDIS is available free of charge from the National Institute of Standards and Technology (NIST). Windig et al. [14, 15] described another approach to enhance chromatographic quality by a deconvolution method called CODA (Component Detection Algorithm). The commercially available software ACD/MS Manager offers an implementation of this approach.
Increasing computational power enables new applications, but there is still a lack of interoperability. Instrument vendors, such as Agilent Technologies, Shimadzu, Thermo Fisher Scientific and Waters Corporation, have created their own software and data formats. Usually, the mass spectral data formats are binary and can only be accessed by the instrument vendors' proprietary software. Some commercial tools exist to convert the mass spectral data files into other formats, such as MASS Transit from PALISADE Corporation. To avoid these limitations, some efforts have been made to design and implement interoperable data formats and software libraries, for example NetCDF or mzXML [19, 20]. But even if it is possible to convert the data files to other formats, there are drawbacks in data processing, as each software package implements specific functions, has its own graphical user interface and is in most cases only commercially available, as for example ChemStation, Xcalibur or MassLynx. Hence, users are forced to become familiar with different software systems, user interfaces and methods. Moreover, the software tools primarily target specific operating systems, such as Microsoft Windows and Mac OS X. The number of software applications that are independent of the operating system and can also be run under Unix or Linux is limited. Linux systems are open source, available at no cost and increasingly used in scientific research (see Scientific Linux), as well as in the public sector [22, 23]. Software applications such as AMDIS have been published to be used free of charge, but their source code is not available. Thus, it is not possible to evaluate the algorithms implemented in the software. Especially in the case of scientific research, it is not possible to inspect them and to extend them.
Even if algorithms are described in published papers [2, 4, 9, 12, 24], it is often impossible to validate them manually due to the complexity of chromatographic data. Other applications like ChemStation, Xcalibur and ACD/MS Manager are proprietary and closed source. They are only commercially available. There is no means of verifying the correctness of the algorithms they use. Efforts have been made to solve the problems of missing interoperability and restricted access to source code and algorithms. Bioclipse is a sophisticated open source project whose algorithms focus on metabolism analysis and gene sequencing. Its techniques are state-of-the-art. Some other projects are mMass, COMSPARI and fityk, but they have some restrictions regarding their interoperability and extensibility. BioSunMS is a tool to read TOF (Time of Flight) mass spectral data files, but it is not able to read instrument vendors' native data files. The Chemistry Development Kit (CDK) implements convenient features to edit chemical data and structures, but it has no appropriate user interface. The open source tool OpenMS aims to edit mass spectrometric data, but it is not completely platform independent, as it is written in the C++ programming language.
Projects like Bioclipse, Sashimi or TPP (Trans-Proteomic Pipeline) are focused on the evaluation of metabolism products and gene sequencing and make extensive use of accurate mass resolution techniques. But there is still a lack of software systems that are capable of enhancing nominal mass spectral data files, that are flexible and extensible, and that offer an easy-to-use graphical user interface. To the authors' knowledge, no application offers functions to import vendor systems' chromatographic data files together with the ability to edit and analyze chromatograms in the way ChemStation and AMDIS do. No application combines flexibility in analyses with being easily extensible, open source and platform independent and having a configurable graphical user interface.
Tools in different areas have been implemented based on the Rich Client Platform, such as the Eclipse IDE (Integrated Development Environment), Lotus Notes, Bioclipse, BioSunMS, XMind, Apache Directory Studio and several more. It is part of the OpenChrom architecture to define useful extension points and to build a suitable object model.
Some selected bundles of the OpenChrom software.
Compare chromatograms and mass spectra
Converter to read binary/textual data files
Read Agilent data files
Read and write NetCDF data files
Modify chromatographic data
Identify chromatograms, mass spectra and peaks
Models (chromatogram, mass spectrum, peak,...)
Third party libraries (SWTChart, log4j,...)
Graphical User Interface
The Rich Client Platform offers broad support for presenting an appropriate graphical user interface. Its concepts include editors, views, perspectives, wizards, menus, cheat sheets, settings and help pages. OpenChrom makes extensive use of the available concepts. The editor shows the graphical representation of a chromatogram and several options, for example a page to select or exclude distinct mass fragments. It also supports functions to save, edit and analyze chromatograms. The views are used to show different aspects of the chromatographic model. It is possible to show peaks in different kinds of views. One view could show a peak including the background of the chromatogram. Another could show the peak with its increasing and decreasing tangents and its width at 50% height. A flexible mechanism was introduced to inform all views when the chromatogram selection has changed. The update functionality is also realized by an extension point. Views and editors are composed in a task-specific way using perspectives.
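The notification mechanism described above can be illustrated with a plain listener pattern. The following is only a minimal sketch under stated assumptions: the names `ChromatogramSelection`, `SelectionListener` and `SelectionNotifier` are hypothetical and are not taken from the OpenChrom source code, and the real application realizes the update through an Eclipse extension point rather than through direct registration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: views register as listeners and are informed
 * whenever the chromatogram selection changes. All names are illustrative.
 */
class SelectionUpdateSketch {

    /** Immutable retention-time window of a chromatogram (assumed model). */
    public static final class ChromatogramSelection {
        public final double startRetentionTime;
        public final double stopRetentionTime;

        public ChromatogramSelection(double startRetentionTime, double stopRetentionTime) {
            this.startRetentionTime = startRetentionTime;
            this.stopRetentionTime = stopRetentionTime;
        }
    }

    /** Implemented by every view that wants to follow the selection. */
    public interface SelectionListener {
        void selectionChanged(ChromatogramSelection selection);
    }

    /** Keeps the registered views and forwards each change to all of them. */
    public static final class SelectionNotifier {
        private final List<SelectionListener> listeners = new ArrayList<>();

        public void register(SelectionListener listener) {
            listeners.add(listener);
        }

        public void fireSelectionChanged(ChromatogramSelection selection) {
            for (SelectionListener listener : listeners) {
                listener.selectionChanged(selection);
            }
        }
    }

    public static void main(String[] args) {
        SelectionNotifier notifier = new SelectionNotifier();
        // Two stand-ins for views that react to the same selection.
        notifier.register(s -> System.out.println("Peak view starts at " + s.startRetentionTime));
        notifier.register(s -> System.out.println("Spectrum view stops at " + s.stopRetentionTime));
        notifier.fireSelectionChanged(new ChromatogramSelection(1.5, 4.0));
    }
}
```

The point of the design is that the component changing the selection does not need to know which views exist; each view only has to register itself.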
Results and Discussion
The menu "Chromatogram Edit" provides access to functions that modify or evaluate the chromatographic data. For example, all registered bundles that support filters are listed in the sub menu "Filter". It is possible to add a filter that implements a Savitzky-Golay smoothing operation or to add filters that remove the background of the chromatogram. Each action is performed on the active chromatogram selection. Depending on the implemented algorithms, actions are commonly very fast because the chromatogram is kept in random access memory (RAM). Furthermore, the filter actions are reversible. This editing support is well known from modern IDEs and office suites. However, the support for do/undo and redo operations costs processing time. If reversibility is not needed, it can be deactivated in the application's preference dialog. Another extension point is responsible for registering baseline detectors. Different baseline detectors can be implemented in separate bundles and are offered in the "Baseline Detectors" sub menu. Peak detection and integration are commonly done in one run. One improvement achieved through OpenChrom is the division of peak detection and peak integration into two separate actions. The peak detectors can be applied by calling an appropriate detector in the sub menu "Peak Detectors", and the peak integration can be performed by using a listed integrator from the sub menu "Integrators". The separation of detector and integrator methods makes it possible to detect peaks in a chromatogram using several algorithms and methods. The chosen peak detectors could be of different types, for example detectors using deconvolution techniques like AMDIS or CODA. All detected peaks can afterwards be integrated by a single integrator, which leads to comparable results. This feature offers high flexibility in using different kinds of detectors and integrators.
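The separation of peak detection and peak integration can be sketched with two small interfaces. This is an illustration only, not the OpenChrom implementation: the interface and class names are hypothetical, the threshold detector and the trapezoidal integrator are deliberately simple stand-ins for real algorithms, and the signal is assumed to be a plain array of intensities with unit spacing between scans.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: detectors and integrators as independent, swappable parts. */
class PeakWorkflowSketch {

    /** A detected peak, given by inclusive scan indices into the signal. */
    public static final class Peak {
        public final int startScan;
        public final int stopScan;

        public Peak(int startScan, int stopScan) {
            this.startScan = startScan;
            this.stopScan = stopScan;
        }
    }

    public interface PeakDetector {
        List<Peak> detect(double[] signal);
    }

    public interface PeakIntegrator {
        double integrate(double[] signal, Peak peak);
    }

    /** Toy detector: every contiguous run of scans above a threshold is a peak. */
    public static final class ThresholdDetector implements PeakDetector {
        private final double threshold;

        public ThresholdDetector(double threshold) {
            this.threshold = threshold;
        }

        @Override
        public List<Peak> detect(double[] signal) {
            List<Peak> peaks = new ArrayList<>();
            int start = -1; // -1 means "currently not inside a peak"
            for (int i = 0; i < signal.length; i++) {
                boolean above = signal[i] > threshold;
                if (above && start < 0) {
                    start = i;
                } else if (!above && start >= 0) {
                    peaks.add(new Peak(start, i - 1));
                    start = -1;
                }
            }
            if (start >= 0) {
                peaks.add(new Peak(start, signal.length - 1));
            }
            return peaks;
        }
    }

    /** Toy integrator: trapezoidal area, assuming unit spacing between scans. */
    public static final class TrapezoidIntegrator implements PeakIntegrator {
        @Override
        public double integrate(double[] signal, Peak peak) {
            double area = 0.0;
            for (int i = peak.startScan; i < peak.stopScan; i++) {
                area += 0.5 * (signal[i] + signal[i + 1]);
            }
            return area;
        }
    }

    /** Runs any number of detectors, then integrates all peaks with one integrator. */
    public static List<Double> detectAndIntegrate(double[] signal,
                                                  List<PeakDetector> detectors,
                                                  PeakIntegrator integrator) {
        List<Double> areas = new ArrayList<>();
        for (PeakDetector detector : detectors) {
            for (Peak peak : detector.detect(signal)) {
                areas.add(integrator.integrate(signal, peak));
            }
        }
        return areas;
    }

    public static void main(String[] args) {
        double[] signal = {0, 1, 5, 9, 5, 1, 0, 0, 4, 8, 4, 0};
        List<PeakDetector> detectors = Collections.singletonList(new ThresholdDetector(2.0));
        System.out.println(detectAndIntegrate(signal, detectors, new TrapezoidIntegrator()));
        // prints [14.0, 12.0]
    }
}
```

Because every peak passes through the same integrator regardless of which detector found it, the resulting areas stay comparable, which is the benefit the text attributes to the separation.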
Furthermore, property views show miscellaneous values of the selected chromatogram. Due to the chromatogram object model, different values are shown depending on which chromatogram files have been loaded. Chromatograms from Agilent Technologies and NetCDF differ in their information content. Hence, the properties view helps to inspect the files. Additional extension points are implemented that enable adding bundles to compare mass spectra using different methods [24, 37–40] or to identify peaks or chromatograms. A method similar to the one implemented in the software F-Search from Frontier Laboratories Ltd. could be used to identify chromatograms, for example.
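As an illustration of what a mass spectrum comparison bundle might compute, the sketch below uses the cosine (normalized dot product) similarity between two nominal mass spectra. This is a generic, widely used measure chosen for the example; it is not claimed to be one of the cited methods or the method implemented in OpenChrom. Representing a spectrum as a map from nominal m/z to intensity is an assumption made for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: cosine similarity of two nominal mass spectra. */
class SpectrumComparisonSketch {

    /**
     * Returns a value between 0 (no common ions) and 1 (identical intensity pattern).
     * Keys are nominal m/z values, values are non-negative intensities.
     */
    public static double cosineSimilarity(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<Integer, Double> entry : a.entrySet()) {
            double intensityA = entry.getValue();
            normA += intensityA * intensityA;
            Double intensityB = b.get(entry.getKey());
            if (intensityB != null) {
                dot += intensityA * intensityB; // only ions present in both spectra contribute
            }
        }
        for (double intensityB : b.values()) {
            normB += intensityB * intensityB;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty spectrum matches nothing
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        Map<Integer, Double> unknown = new HashMap<>();
        unknown.put(43, 100.0);
        unknown.put(57, 50.0);
        Map<Integer, Double> reference = new HashMap<>();
        reference.put(43, 90.0);
        reference.put(57, 60.0);
        reference.put(71, 10.0);
        System.out.println("Match quality: " + cosineSimilarity(unknown, reference));
    }
}
```

The measure is insensitive to the absolute scale of the intensities, so spectra recorded at different concentrations can still be compared.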
Moreover, the OpenChrom platform supports bundles with a system built-in logging mechanism that extends the Apache project log4j. Each module can use the logging mechanism which makes it easier to detect problems and failures. Bundles are further separated into fragments, which allows the separation of concerns. Each OpenChrom bundle supports an internationalization (i18n) and JUnit test fragment. At the moment, approximately 3000 unit tests are written and can be executed to ensure the quality of the software.
If necessary, the extension point mechanism gives the flexibility to add functions needed by users at any time. Thus, OpenChrom can be connected to other systems, for example to a LIMS (Laboratory Information Management System), databases, existing software tools or workflow systems. The object model of OpenChrom offers convenient access to values and results from the edited chromatograms. Specialized modules take care of specific concerns, for example how to store results in an information management system. Furthermore, it is possible to implement bundles for specific analyses or for automated experimentation.
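The idea of connecting specialized modules through an extension mechanism can be sketched in plain Java. The example below is a simplified stand-in and makes several assumptions: `ResultExporter`, `ExporterRegistry` and `InMemoryExporter` are hypothetical names, and a hand-written registry replaces the Eclipse extension registry, in which contributions are declared by the bundles themselves rather than registered in code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: modules contribute result exporters through a registry. */
class ExtensionRegistrySketch {

    /** Contract a contributing module has to fulfil (illustrative). */
    public interface ResultExporter {
        void export(String chromatogramName, double peakArea);
    }

    /** Stand-in for a module that would write to a LIMS or database. */
    public static final class InMemoryExporter implements ResultExporter {
        public final List<String> rows = new ArrayList<>();

        @Override
        public void export(String chromatogramName, double peakArea) {
            rows.add(chromatogramName + ";" + peakArea);
        }
    }

    /** Collects contributions by id and hands results to all of them. */
    public static final class ExporterRegistry {
        private final Map<String, ResultExporter> exporters = new LinkedHashMap<>();

        public void register(String id, ResultExporter exporter) {
            exporters.put(id, exporter);
        }

        public void exportToAll(String chromatogramName, double peakArea) {
            for (ResultExporter exporter : exporters.values()) {
                exporter.export(chromatogramName, peakArea);
            }
        }
    }

    public static void main(String[] args) {
        ExporterRegistry registry = new ExporterRegistry();
        InMemoryExporter exporter = new InMemoryExporter();
        registry.register("example.exporter.memory", exporter);
        registry.exportToAll("sample-01", 14.0);
        System.out.println(exporter.rows);
    }
}
```

The core only depends on the `ResultExporter` contract, so a module that talks to a particular information management system can be added or removed without changing the code that produces the results.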
OpenChrom enables several ways to edit and analyze chromatographic data. Its flexibility and abstract architecture are an advantage, but they can make it somewhat difficult to get started with the platform, even though the functionality is divided into different bundles to decrease complexity and to focus on special tasks. The intention of publishing the software under an open source license is to support code contributions and to open the project for individual solutions. Moreover, the separation into bundles makes it easier for others to contribute new functionality. Further improvements will be made to optimize the current algorithms and to develop new and better filters, peak detectors and integrators.
OpenChrom has been designed to become an extensible cross-platform open source software for the mass spectrometric analysis of chromatographic data. It provides extension points to enable built-in import capabilities for binary or textual instrument vendors' data formats. In addition to its custom XML format, it supports the Agilent Technologies, mzXML and NetCDF mass spectrometric data formats. Further development is planned to support more data formats. The open source concept has been chosen to encourage contributions from third parties, as extending the capabilities of the presented concept depends on the ideas and needs of the community. OpenChrom offers extension points that enable the implementation of different baseline detectors as well as peak detectors and integrators. Furthermore, there is an option to implement filters, which are used to increase the chromatographic quality. The framework offers full support of do/undo and redo operations. The examples Bioclipse and BioSunMS show how to use the Eclipse Rich Client Platform in a specific way, but no software has been published until now that is capable of importing binary chromatographic files natively, offers support to edit and analyze chromatograms and makes it possible to implement new algorithms and methods. As it is open source, everybody can inspect the implemented algorithms and methods, especially for verification. OpenChrom is software with a special focus on the editing and evaluation of mass spectrometric chromatographic data. OpenChrom will hopefully be extended by contributing developers, scientists and companies in the future.
Availability and requirements
Project name: OpenChrom
Project homepage: http://www.openchrom.net
Operating systems: Platform independent
Programming language: Java
Java Runtime Environment: Sun/Oracle JVM 1.6.0, OpenJDK
Minimum RAM: 500 MB
Minimum Processor: 1 GHz
Commercial restrictions: none
OpenChrom is available for download free of charge from the project home page.
The Agilent data file input converter must be installed separately using the OpenChrom update mechanism. Instructions for installing the converter can be found at the following website: http://www.openchrom.net/plugins/converter/agilent.
OpenChrom is licensed under the Eclipse Public License 1.0 (EPL). The EPL is an OSI approved open source license that ensures that the source code will remain open source. OpenChrom uses some third party libraries that are partly published under different open source licenses. All third party libraries are available in separate bundles to ensure that no license conflicts occur. The third party library bundles are published under the Apache, LGPL, AGPL and EPL licenses, depending on the bundle. The GPL licenses are viral, which means that derivative works must be published under the GPL license too. The EPL and the Eclipse Rich Client Platform enable different licensing for the bundles, as a bundle using methods of another bundle cannot be seen as a derivative work, because it only uses its interfaces.
The authors thank all participants at the Department of Wood Science (University of Hamburg, Germany) for their support and their helpful suggestions.
- Biller JE, Herlihy WC, Biemann K: Identification Of Components Of Complex-Mixtures By Gcms. Abstracts Of Papers Of The American Chemical Society 1977, 173(MAR20):23–23.
- Biller JE, Biemann K: Reconstructed Mass-Spectra - Novel Approach For Utilization Of Gas Chromatograph - Mass-Spectrometer Data. Analytical Letters 1974, 7(7):515–528.
- Dromey RG, Stefik MJ, Rindfleisch TC, Duffield AM: Extraction Of Mass-Spectra Free Of Background And Neighboring Component Contributions From Gas Chromatography Mass Spectrometry Data. Analytical Chemistry 1976, 48(9):1368–1375. 10.1021/ac50003a027
- Colby BN: Spectral Deconvolution For Overlapping Gc Ms Components. Journal of the American Society for Mass Spectrometry 1992, 3(5):558–562. 10.1016/1044-0305(92)85033-G
- Hindmarch P, Demir C, Brereton RG: Deconvolution and spectral clean-up of two-component mixtures by factor analysis of gas chromatographic mass spectrometric data. The Analyst 1996, 121(8):993–1001. 10.1039/an9962100993
- Halket JM, Przyborowska A, Stein SE, Mallard WG, Down S, Chalmers RA: Deconvolution gas chromatography mass spectrometry of urinary organic acids - Potential for pattern recognition and automated identification of metabolic disorders. Rapid Communications In Mass Spectrometry 1999, 13(4):279–284. 10.1002/(SICI)1097-0231(19990228)13:4<279::AID-RCM478>3.0.CO;2-I
- Kong HW, Ye F, Lu X, Guo L, Tian J, Xu GW: Deconvolution of overlapped peaks based on the exponentially modified Gaussian model in comprehensive two-dimensional gas chromatography. Journal Of Chromatography A 2005, 1086(1–2):160–164. 10.1016/j.chroma.2005.05.103
- Taylor J, Goodacre R, Wade WG, Rowland JJ, Kell DB: The deconvolution of pyrolysis mass spectra using genetic programming: application to the identification of some Eubacterium species. FEMS Microbiology Letters 1998, 160(2):237–246. 10.1111/j.1574-6968.1998.tb12917.x
- Pool WG, deLeeuw JW, vandeGraaf B: Backfolding applied to differential gas chromatography mass spectrometry as a mathematical enhancement of chromatographic resolution. Journal Of Mass Spectrometry 1996, 31(5):509–516. 10.1002/(SICI)1096-9888(199605)31:5<509::AID-JMS323>3.0.CO;2-B
- Pool WG, deLeeuw JW, vandeGraaf B: Automated extraction of pure mass spectra from gas chromatographic mass spectrometric data. Journal Of Mass Spectrometry 1997, 32(4):438–443. 10.1002/(SICI)1096-9888(199704)32:4<438::AID-JMS499>3.0.CO;2-N
- Davies A: The new Automated Mass Spectrometry Deconvolution and Identification System (AMDIS). Spectroscopy Europe 1998, 10(3):22–26.
- Stein SE: An integrated method for spectrum extraction and compound identification from gas chromatography/mass spectrometry data. Journal of the American Society for Mass Spectrometry 1999, 10(8):770–781. 10.1016/S1044-0305(99)00047-1
- Windig W, Smith WF: Chemometric analysis of complex hyphenated data - Improvements of the component detection algorithm. Journal Of Chromatography A 2007, 1158(1–2):251–257. 10.1016/j.chroma.2007.03.081
- Windig W, Phalp JM, Payne AW: A noise and background reduction method for component detection in liquid chromatography mass spectrometry. Analytical Chemistry 1996, 68(20):3602–3606. 10.1021/ac960435y
- ACD Labs [http://www.acdlabs.com]
- Palisade Corporation [http://www.palisade.com]
- Pedrioli PGA, Eng JK, Hubley R, Vogelzang M, Deutsch EW, Raught B, Pratt B, Nilsson E, Angeletti RH, Apweiler R, Cheung K, Costello CE, Hermjakob H, Huang S, Julian RK, Kapp E, McComb ME, Oliver SG, Omenn G, Paton NW, Simpson R, Smith R, Taylor CF, Zhu WM, Aebersold R: A common open representation of mass spectrometry data and its application to proteomics research. Nature Biotechnology 2004, 22(11):1459–1466. 10.1038/nbt1031
- Falkner JA, Falkner JW, Andrews PC: ProteomeCommons.org IO Framework: reading and writing multiple proteomics data formats. Bioinformatics 2007, 23(2):262–263. 10.1093/bioinformatics/btl573
- Scientific Linux [http://en.wikipedia.org/wiki/Scientific_Linux]
- Alfassi ZB: On the normalization of a mass spectrum for comparison of two spectra. Journal of the American Society for Mass Spectrometry 2004, 15(3):385–387. 10.1016/j.jasms.2003.11.008
- Spjuth O, Helmus T, Willighagen EL, Kuhn S, Eklund M, Wagener J, Murray-Rust P, Steinbeck C, Wikberg JE: Bioclipse: an open source workbench for chemo- and bioinformatics. BMC Bioinformatics 2007, 8: 59–68. 10.1186/1471-2105-8-59
- Cao Y, Wang N, Ying XM, Li AL, Wang HS, Zhang XM, Li WJ: BioSunMS: a plug-in-based software for the management of patients information and the analysis of peptide profiles from mass spectrometry. BMC Medical Informatics and Decision Making 2009, 9: 1–9. 10.1186/1472-6947-9-13
- Steinbeck C, Han YQ, Kuhn S, Horlacher O, Luttmann E, Willighagen E: The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics. Journal of Chemical Information and Computer Sciences 2003, 43(2):493–500.
- Sturm M, Bertsch A, Gropl C, Hildebrandt A, Hussong R, Lange E, Pfeifer N, Schulz-Trieglaff O, Zerck A, Reinert K, Kohlbacher O: OpenMS-An open-source software framework for mass spectrometry. BMC Bioinformatics 2008, 9: 163–173. 10.1186/1471-2105-9-163
- TPP (Trans-Proteomic Pipeline) [http://tools.proteomecenter.org]
- Eclipse Rich Client Platform [http://wiki.eclipse.org/index.php/Rich_Client_Platform]
- Horstmann CGCS: Core Java 2: Fundamentals. Upper Saddle River, NJ, Prentice Hall; 2002.
- Savitzky A, Golay MJE: Smoothing + Differentiation Of Data By Simplified Least Squares Procedures. Analytical Chemistry 1964, 36(8):1627–1639. 10.1021/ac60214a047
- McLafferty FW, Zhang MY, Stauffer DB, Loh SY: Comparison of algorithms and databases for matching unknown mass spectra. Journal of the American Society for Mass Spectrometry 1998, 9: 92–95. 10.1016/S1044-0305(97)00235-3
- Loh SY, McLafferty FW: Exact-mass probability based matching of high-resolution unknown mass-spectra. Analytical Chemistry 1991, 63(6):546–550. 10.1021/ac00006a002
- Damen H, Henneberg D, Weimann B: Siscom - a new library search system for mass spectra. Analytica Chimica Acta 1978, 103(4):289–302. 10.1016/S0003-2670(01)83095-6
- Alfassi ZB: Vector analysis of multi-measurements identification. Journal Of Radioanalytical And Nuclear Chemistry 2005, 266(2):245–250. 10.1007/s10967-005-0899-y
- Frontier Labs [http://www.frontier-lab.com]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.