Decon2LS: An open-source software package for automated processing and visualization of high resolution mass spectrometry data
© Jaitly et al; licensee BioMed Central Ltd. 2009
Received: 07 June 2008
Accepted: 17 March 2009
Published: 17 March 2009
Data generated from liquid chromatography coupled to high-resolution mass spectrometry (LC-MS)-based studies of a biological sample can contain large amounts of biologically significant information in the form of proteins, peptides, and metabolites. Interpreting these data involves inferring the masses and abundances of biomolecules injected into the instrument. Because of the inherent complexity of mass spectral patterns produced by these biomolecules, the analysis is significantly enhanced by using visualization capabilities to inspect and confirm results. In this paper we describe Decon2LS, an open-source software package for automated processing and visualization of high-resolution MS data. Drawing extensively on algorithms developed over the last ten years for ICR2LS, Decon2LS packages the algorithms as a rich set of modular, reusable processing classes for performing diverse functions such as reading raw data, routine peak finding, theoretical isotope distribution modelling, and deisotoping. Because the source code is openly available, these functionalities can now be used to build derivative applications relatively quickly. In addition, Decon2LS provides an extensive set of visualization tools, such as high-performance chart controls.
With a variety of options that include peak processing, deisotoping, and isotope composition, Decon2LS supports processing of multiple raw data formats. Deisotoping can be performed on an individual scan, an individual dataset, or on multiple datasets using batch processing. Other processing options include creating a two-dimensional view of mass and liquid chromatography (LC) elution time features, generating spectrum files for tandem MS data, creating total intensity chromatograms, and visualizing theoretical peptide profiles. Application of Decon2LS to deisotope different datasets obtained across different instruments yielded a high number of features that can be used to identify and quantify peptides in the biological sample.
Decon2LS is an efficient software package for discovering and visualizing features in proteomics studies that require automated interpretation of mass spectra. Besides being easy to use, fast, and reliable, Decon2LS is also open-source, which allows developers in the proteomics and bioinformatics communities to reuse and refine the algorithms to meet individual needs.
Decon2LS source code, installer, and tutorials may be downloaded free of charge at http://ncrr.pnl.gov/software/.
High resolution mass spectrometry (MS) is used extensively in proteomics and metabolomics studies to identify and quantify proteins and metabolites. This information is inferred from peak patterns observed in either mass spectra of intact proteins, digested proteins (i.e., peptides) or metabolites, or tandem mass spectra (MS/MS) of proteins, peptides, or metabolites fragmented as a result of collision-induced dissociation within the instrument. While hundreds of individual species can be resolved from a single mass spectrum, even relatively simple proteomics and metabolomics samples can result in thousands of overlapping isotopic patterns. As these patterns may not be readily separable by the instrument or discernible by downstream processing algorithms, liquid chromatography (LC) and gas chromatography (GC) are often coupled to MS to reduce the complexity of an individual mass spectrum. Biological material eluting from the chromatographic system is continuously transferred to the mass spectrometer during the course of the analysis, and a mass spectrum is captured at regular intervals. As a result, a single experiment may contain thousands of mass spectra that require automated interpretation.
A single mass spectrum is composed of a list of ion mass-to-charge (m/z) ratios and their abundance values. As most elements (e.g., carbon, hydrogen, etc.) are naturally present in different isotopic forms, a population of the same molecular species produces a pattern that reflects the incorporation of the different isotopic contributions. As a result, a charged species is observed not as a single peak but as a pattern of peaks whose relative heights and m/z depend on the isotopic distribution of the elements they are composed of and operational aspects of the instrument such as resolution, type of detector, etc. Thus simply selecting each observed peak as a unique chemical species would give rise to too many false positives. Proper inference of chemical species from the mass spectra requires that the pattern of related peaks be grouped together into unique explanatory isotopic patterns, a process typically referred to as deisotoping. Since carbon, hydrogen, oxygen, nitrogen, phosphorus, and sulphur are the main elemental constituents of most biomolecules, their isotopic distribution in nature (or in the biological material in which the sample was grown) determines the spacing between observed peaks, as well as the relative heights of the peaks. (It should be noted here that the difference between masses of isotopes of the same element is not the same as the difference in the mass for the different number of neutrons between them because of the mass deficit resulting from nuclear binding energy of each nucleus. Hence, the mass difference between 13C and 12C, for example, is different from that between 2H and 1H and from half of the difference between 34S and 32S. In fact, in high resolution FTICR-MS runs one can discern the contribution of the 34S atom to the third isotope, separate from that of other atoms.)
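The parenthetical remark about unequal isotope mass differences can be checked with a few lines of arithmetic. The sketch below uses standard reference isotope masses (rounded to eight decimals; these values are not given in the text) and the helper name `mass_difference` is purely illustrative.

```python
# Isotope masses in Da (standard reference values, rounded; not taken from the paper).
ISOTOPE_MASS = {
    "1H": 1.00782503,
    "2H": 2.01410178,
    "12C": 12.0,
    "13C": 13.00335484,
    "32S": 31.97207117,
    "34S": 33.96786701,
}


def mass_difference(heavy, light):
    """Return the mass difference in Da between two isotopes of one element."""
    return ISOTOPE_MASS[heavy] - ISOTOPE_MASS[light]


d_carbon = mass_difference("13C", "12C")            # one extra neutron
d_hydrogen = mass_difference("2H", "1H")            # one extra neutron
d_sulfur_half = mass_difference("34S", "32S") / 2   # two extra neutrons, halved

print(f"13C - 12C       = {d_carbon:.5f} Da")
print(f"2H  - 1H        = {d_hydrogen:.5f} Da")
print(f"(34S - 32S) / 2 = {d_sulfur_half:.5f} Da")
```

The three "per neutron" differences disagree in the third decimal place, which is why a sufficiently high-resolution instrument can separate the 34S contribution from a double 13C substitution.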
The isotopic distributions of unmodified peptide species are primarily influenced by carbon as it has the largest proportion of naturally abundant isotope to any alternative isotope (i.e., 98.89% 12C, 1.11% 13C). Thus, "isotopic" species of peptides have a mass difference, on average, of ~1.003 Da (the difference between the masses of 13C and 12C). The spacing between the m/z peaks is approximately equal to 1.003/charge and relative peak heights depend on the elemental composition of the peptides. Since the hundreds of differently charged peptides transferred to the mass spectrometer may have abundances that span several orders of magnitude, the resulting mass spectrum can be composed of complex overlapping contributions. As a result, "deisotoping" algorithms are required to collapse a complex mass spectrum into a representative set of peptide (or metabolite) masses (typically the monoisotopic species) and their respective abundance values. It may be noted here that the nature of the spectrum observed is affected by the complexity of the sample and the distribution of masses and charges of the chemical species in the sample. Hence the complexity of the deisotoping step depends on these factors. For example, ESI spectra are typically harder to deisotope than MALDI spectra because most peptides carry multiple charges, thus requiring the ability to accurately determine charge state.
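The relationship "spacing ≈ 1.003/charge" can be turned around to estimate charge from two adjacent isotopic peaks, and from there a neutral mass. The following is a minimal sketch, assuming protonated ions; the function names and the example m/z values are hypothetical, and this is not the charge-determination method used by Decon2LS.

```python
PROTON_MASS = 1.00727647    # Da; assumed charge carrier (a proton)
ISOTOPE_SPACING = 1.00335   # Da; 13C - 12C mass difference


def charge_from_spacing(mz_a, mz_b):
    """Estimate charge from the m/z gap between two adjacent isotopic peaks."""
    return round(ISOTOPE_SPACING / abs(mz_b - mz_a))


def neutral_mass(mz, charge):
    """Convert an observed m/z to a neutral mass, assuming protonation."""
    return charge * (mz - PROTON_MASS)


# Hypothetical isotopic peaks of one peptide ion (made-up values).
peaks = [500.7672, 501.2689, 501.7705]
z = charge_from_spacing(peaks[0], peaks[1])
m = neutral_mass(peaks[0], z)
print(f"charge = {z}, neutral monoisotopic mass = {m:.4f} Da")
```

With real data the spacing estimate is noisy and peaks from different species interleave, which is why more robust charge-detection methods are needed in practice.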
Several deisotoping and visualization algorithms have been described in the literature [3–9] and are available as part of commercial vendor packages such as XCalibur (Thermo Fisher), MassHunter (Agilent) and Elucidator System (Rosetta Biosoftware). For example, BUDA, which is used to analyze FTICR data, incorporates the algorithm described by Kaur & O'Connor to deisotope intact protein mass spectra. Pep3D is useful for visualizing an LC-MS dataset as a density plot of abundance (as a function of m/z and scan), but does not perform deisotoping. MapQuant is used in the PEPPeR platform to deisotope LC-MS data and has image processing algorithms that utilize the retention time dimension. While XCMS is useful for data pre-processing, peak detection, retention-time alignment and peak matching across samples, this algorithm also does not perform deisotoping. msInspect is an analysis package for deisotoping high-resolution LC-MS data in mzXML format in sequential steps. Peaks are first detected in each scan (along the m/z dimension) using a wavelet additive decomposition method and then peaks identified as eluting isotopes in the retention dimension are retained. Peak clusters are identified as peptides by comparing the heights of observed isotopes to the heights of theoretical isotopes, which are approximated using a Poisson distribution. OpenMS is a recently released pipeline which allows rapid application development for liquid chromatography-mass spectrometry analyses. In addition to providing rich visualization capabilities, it also provides implementations of feature discovery algorithms, retention time alignment algorithms and other processing routines. Vendor software systems also provide access to some analytical capabilities. However, the algorithms employed are typically proprietary and specific to vendor systems, preventing standardization across different instrument types.
In addition, the range of options provided by the vendor software systems is typically limiting for users with fairly specific needs, and customization can be challenging, if possible at all. For example, while the Thermo Fisher XCalibur system can be used to extract a list of deisotoped species, it can only do so for a limited set of the most abundant features. As a result, users with specific needs such as access to lists of observed peaks, isotopic patterns, chemically labeled pairs, etc., typically have to write their own analytical software.
THRASH (Thorough High Resolution Analysis of Spectra by Horn) is one of the best known and most comprehensive algorithms for analyzing mass spectra. It includes methods for calculating background noise levels, determining charge state using the Fourier-Transform/Patterson technique, calculating theoretical profiles [2, 17], and for subsequent fitting with observed isotopic profiles. While a functional application of THRASH was not provided by its developers, the algorithm was reportedly incorporated into the MIDAS data system. Another application of THRASH was released in the form of ICR2LS (available only in executable format; G.A. Anderson, http://ncrr.pnl.gov/software). We have recently modularized the deisotoping and other algorithms previously implemented in ICR2LS (developed in Visual Basic 6 and unpublished) into an open-source software package referred to as Decon2LS, which has been developed in C++ with several improvements to optimize performance by almost an order of magnitude.
Herein, we describe Decon2LS, a software package for finding and visualizing features in high resolution MS datasets. Decon2LS uses a derivative of THRASH to determine monoisotopic mass lists from a given dataset of m/z and intensity values across scans and supports several different file formats for data visualization, including Thermo Fisher .RAW files, Agilent TOF .wiff, Micromass .dat files and Bruker .ser files provided the libraries are installed. In addition, mzXML standard and ascii file formats are supported, which provides users with a single tool for processing high resolution data from all major MS formats. Decon2LS has already been applied extensively in our laboratory to analyze >15,000 datasets obtained using different types of high resolution mass spectrometers. While the THRASH algorithm was developed for mass spectra of intact proteins and Decon2LS may be used for this purpose with a suitable adjustment of parameters, the implementation is geared towards deisotoping of spectra of proteins and peptides of mass less than 10,000 Da.
A typical mass spectrum of a proteomics sample consists of patterns of isotopic peak distributions for many different peptides, each with its own charge and intensity that can vary over five orders of magnitude. The pattern for a singly charged peptide is made up of several peaks spaced ~1 thomson (Th, the unit of m/z) apart, representing a mass difference of ~1 amu. (In reality, the individual "isotopic" peaks are composed of "fine structure" peaks, representing the slight mass differences for the different possible isotopic combinations of the elements, which are not generally resolvable by the mass spectrometer.) The same peptide with a higher charge generates peaks with relative intensities similar to those in the single charge state, but with a spacing of ~1.003/charge. Determining the masses present in a spectrum is a challenging task that requires selecting peak patterns most likely to represent peptides and metabolites in the spectrum. The Decon2LS software package uses an in-house modified version of the THRASH algorithm to select these peak patterns.
1. Peaks are picked from the mass spectrum provided the signal-to-noise ratio (calculated as the ratio of the intensity of the peak to the average intensity at the valleys of the peak) is greater than a user-specified threshold. In addition, the intensity of the peak is required to be greater than a user-specified multiple of the background intensity computed for the spectrum. This background intensity is computed for the entire spectrum as the average intensity of those points that are within five standard deviations of the average of all points in the spectrum. The selected peaks are processed in steps 2–6.
2. While unprocessed peaks remain in the spectrum, the charge of the most intense unprocessed peak is determined using the autocorrelation algorithm, and the average mass is computed from the m/z of the peak and the computed charge.
3. The most likely empirical composition is determined using the average mass determined in step 2 and the Averagine algorithm, which assumes an empirical composition that is equal to the average composition of all the peptides in a protein FASTA file. By default, Decon2LS uses the average formula derived from the Swiss-Prot protein FASTA database.
4. A theoretical spectrum is generated using the Mercury algorithm for the empirical formula generated in step 3 and fit against the observed spectrum by aligning the most abundant peak of the theoretical spectrum to the peak under consideration after scaling each peak to an intensity of 100. The score for the fit is computed on the basis of the similarity of the theoretical pattern to the observed pattern, using one of the following three user-selectable functions:
   a. Area fit function
   b. Peak fit function
   c. Chi-square fit function, as specified in Senko, et al.
5. Alternative fits are also computed by aligning the most abundant peak of the theoretical pattern with the observed isotopic peaks ("THRASHing"), which is accomplished by moving the theoretical pattern in steps of 1.003 Da (using the charge to convert Da to thomson units). The user is able to choose between THRASHing to the next isotopic peak only as long as scores improve, or as long as a peak is found at the appropriate thomson distance away (Complete Fit). The highest of the fit values is maintained for further consideration.
6. If an acceptable fit is found, then the isotopic peaks of the observed peak are deleted and the points in the spectrum are set to 0. The number of isotopic peaks deleted is specified indirectly by a user-specified Deletion parameter that specifies the minimum relative abundance (compared to the most abundant isotope) of all isotopic peaks to be deleted. The monoisotopic mass of the feature is calculated as the monoisotopic mass of the theoretical spectrum when overlaid with the best-fit isotope peak.
7. If an acceptable fit is not found, then the current peak is removed from the list of unprocessed peaks and the process is repeated, starting with step 2.
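The peak-picking rule that opens the procedure above can be sketched in a few lines. This is an illustrative reading of the prose, not the Decon2LS source: the function names, the default thresholds, the use of the population standard deviation, and the way valleys are located (walking downhill from the apex) are all assumptions.

```python
from statistics import mean, pstdev


def background_intensity(intensities, n_sigma=5.0):
    """Average of the points lying within n_sigma standard deviations of the mean."""
    mu = mean(intensities)
    sigma = pstdev(intensities)  # population std dev; an assumption
    kept = [y for y in intensities if abs(y - mu) <= n_sigma * sigma]
    return mean(kept)


def pick_peaks(mz, intensity, min_sn=3.0, min_background_ratio=5.0):
    """Return (m/z, intensity, S/N) for local maxima passing both thresholds."""
    background = background_intensity(intensity)
    picked = []
    for i in range(1, len(intensity) - 1):
        if not (intensity[i - 1] < intensity[i] >= intensity[i + 1]):
            continue  # not a local maximum
        # Walk downhill on each side to find the valleys flanking the peak.
        left = i
        while left > 0 and intensity[left - 1] < intensity[left]:
            left -= 1
        right = i
        while right < len(intensity) - 1 and intensity[right + 1] < intensity[right]:
            right += 1
        valley = (intensity[left] + intensity[right]) / 2.0
        sn = intensity[i] / valley if valley > 0 else float("inf")
        if sn > min_sn and intensity[i] > min_background_ratio * background:
            picked.append((mz[i], intensity[i], sn))
    return picked


# Synthetic spectrum: flat baseline of 10 with one triangular peak at m/z index 50.
mz_axis = list(range(100))
signal = [10.0] * 100
signal[49], signal[50], signal[51] = 300.0, 1000.0, 300.0
print(pick_peaks(mz_axis, signal))
```

Excluding points more than five standard deviations from the mean keeps a few very intense peaks from inflating the background estimate, so the background-ratio threshold remains meaningful in sparse spectra.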
Because the nature of the theoretical isotopic profile does not change much over a mass range of 1 Da, we optimized the performance of the algorithm by caching the isotopic distribution at every integer mass, storing only the position and the relative intensity of the isotopic peaks. The theoretical profile for a given mass and resolution can then be created from the cached intensities and masses of isotopic peaks by superimposing appropriate peak shapes on them.
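The caching idea can be illustrated as follows. This sketch is not the Decon2LS implementation: it replaces Mercury with a carbon-only binomial model, uses the published averagine carbon count and residue mass rather than the Swiss-Prot-derived formula, assumes a Gaussian peak shape, and all function names are hypothetical.

```python
import math
from functools import lru_cache

AVERAGINE_MASS = 111.1254     # Da per averagine unit (published averagine value; assumption)
AVERAGINE_CARBONS = 4.9384    # carbon atoms per averagine unit (assumption)
P_13C = 0.0111                # natural abundance of 13C, as quoted in the text
ISOTOPE_SPACING = 1.00335     # Da; 13C - 12C
PROTON_MASS = 1.00727647      # Da; assumed charge carrier


@lru_cache(maxsize=None)
def cached_sticks(integer_mass, n_peaks=6):
    """Isotope 'sticks' (mass offset, relative intensity 0-100) for one integer mass.

    Carbon-only binomial stand-in for a full isotope calculation such as Mercury.
    """
    n_c = round(integer_mass / AVERAGINE_MASS * AVERAGINE_CARBONS)
    raw = [
        math.comb(n_c, k) * P_13C ** k * (1 - P_13C) ** (n_c - k)
        for k in range(min(n_peaks, n_c + 1))
    ]
    top = max(raw)
    return tuple((k * ISOTOPE_SPACING, 100.0 * p / top) for k, p in enumerate(raw))


def theoretical_profile(mono_mass, charge, resolution, mz_grid):
    """Superimpose Gaussian peak shapes on the cached sticks for this mass."""
    sticks = cached_sticks(int(round(mono_mass)))  # one cache entry per integer mass
    profile = [0.0] * len(mz_grid)
    for offset, height in sticks:
        center = (mono_mass + offset) / charge + PROTON_MASS
        sigma = (center / resolution) / 2.3548200450309493  # FWHM -> sigma
        for i, x in enumerate(mz_grid):
            profile[i] += height * math.exp(-0.5 * ((x - center) / sigma) ** 2)
    return profile
```

Two requests such as `theoretical_profile(1000.2, 2, 50000, grid)` and `theoretical_profile(1000.4, 2, 50000, grid)` share one cached stick pattern; only the cheap peak-shape superposition is repeated.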
Parameters required to control all functionalities can be viewed and edited through an options form in Decon2LS. The main parameters include: the type of peak-fitting model (apex, three-point quadratic, or Lorentzian); signal-to-noise thresholds; an averaging window size to sum spectra along the retention time dimension to enhance low intensity isotopic patterns seen over multiple scans; fitness score thresholds to control the rate of false positive features; the scan and m/z ranges over which the algorithms are run; spectral pre-processing options for zero-filling and smoothing using the Savitzky-Golay filter; and the ability to change the isotopic composition of naturally occurring isotopes. All parameters can be saved in an XML format for future use.
Features file ([dataset]_isos.csv). This comma delimited file provides details for all deisotoped features, such as monoisotopic mass, most abundant isotope intensity, scan number, fitness score, and other feature-relevant information.
Scan summary file ([dataset]_scans.csv). This comma delimited file contains summary statistics for the LC scans in the datasets such as the scan type (MS or MS/MS), the base peak m/z, the total-intensity-chromatogram value per scan, number of peaks present, and number of peaks deisotoped.
Peaks file ([dataset].dat). This file contains relevant binary data such as the peak information and the deisotoped records that are visualized using the two dimensional view.
The output of Decon2LS can be loaded into auxiliary software such as VIPER, which identifies peptides by comparing LC-MS mass and elution time features to peptides in a database constructed from previous LC-MS/MS analyses.
With a modular design and open-source license, Decon2LS allows users to extract just the modules they need for their own applications. A good example of this reusability is the DeconMSn tool, which uses the raw data readers and the peak-finding and THRASH routines (including Averagine and Mercury) that are incorporated as part of the DeconEngine library to process LC-MS/MS datasets. X2XML (available at http://ncrr.pnl.gov) is another tool that uses the multiple raw data readers to convert several file formats to the mzXML standard for representing data. The visualization routines and user controls may readily be plugged into other .NET applications.
Results and discussion
Description of the samples analyzed by Decon2LS for this study
Shewanella oneidensis MR-1
Thermo Fisher LTQ-FT linear ion trap-FTICR hybrid mass spectrometer with electrospray ionization (ESI): a chemostatic growth experiment in oxygen-limited conditions with the addition of fumarate (an electron acceptor)
11.5-Tesla FTICR instrument, designed and constructed in our laboratory at PNNL: insoluble preparation of cells that were carbon-limited (glucose limitation) as part of a time course study
Thermo Fisher LTQ-Orbitrap hybrid mass spectrometer with electrospray ionization (ESI): samples from human nipple aspirate fluid collected as part of bilateral studies of possible ductal carcinoma in breast tissue
Main processing parameters and values used in analysis
Sets the type of peak-fitting to be performed (APEX, LORENTZIAN or Quadratic)
Sets the signal-noise ratio as the ratio of maximum peak intensity to the minimum of floor intensities on either side of the peak
Minimum Background Ratio: Sets the maximum intensity level to be considered as background
Horn Transform Parameters
Mass of charge carrier
Maximum mass to consider
Maximum charge to consider
If set, scores each isotopic profile in steps of +/- 1 Da for fit to data, exits and returns if new_score > current_score
If set, works same as THRASH except the best fit from a series of fits is returned
Sets the method of fitting theoretical and observed distributions (Area, Peak or Chi-Squared)
Sets the number of allowable shoulders as the number of non-decreasing peaks preceding a minima for it to be considered a shoulder
Range 0 to 1, measure the maximum difference allowable between theoretical and observed distribution
Threshold intensity for score: Sets the minimum normalized intensity (0–100) for selection of the area of the peak that will be used for fit calculation
Threshold intensity for deletion: Sets the intensity threshold (normalized 0–100) that determines which areas of a peak are to be deleted
[Table: Results of processing three LC-MS datasets using Decon2LS. Rows: MS scans in dataset; peaks in dataset; isotopic patterns found. Surviving entry: 1 hr 08 min.]
Once all ions were deisotoped, the software tool VIPER was used to find LC-MS features by grouping deisotoped features of similar masses in neighbouring LC scans into a single mass and retention time feature. These LC-MS features can be used to identify proteins and metabolites in proteomics and metabolomics studies.
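The grouping of deisotoped features into LC-MS features can be pictured with a simple greedy pass. This sketch is not VIPER's algorithm; the tolerance values, the abundance-weighted mass update, and the `group_features` name are assumptions made for illustration.

```python
def group_features(features, mass_tol_ppm=10.0, max_scan_gap=3):
    """Greedily merge (scan, monoisotopic_mass, abundance) tuples into LC-MS features.

    A feature joins an existing group when its mass is within mass_tol_ppm of the
    group's mass and its scan is at most max_scan_gap after the group's last scan.
    """
    groups = []
    for scan, mass, abundance in sorted(features):
        for group in groups:
            ppm = abs(mass - group["mass"]) / group["mass"] * 1e6
            if ppm <= mass_tol_ppm and scan - group["last_scan"] <= max_scan_gap:
                total = group["abundance"] + abundance
                # Keep an abundance-weighted mean mass for the group.
                group["mass"] = (group["mass"] * group["abundance"] + mass * abundance) / total
                group["abundance"] = total
                group["last_scan"] = scan
                group["members"] += 1
                break
        else:
            groups.append({
                "mass": mass,
                "abundance": abundance,
                "first_scan": scan,
                "last_scan": scan,
                "members": 1,
            })
    return groups


# Hypothetical deisotoped features: (scan, monoisotopic mass in Da, abundance).
example = [
    (100, 1000.000, 5e4),
    (101, 1000.002, 8e4),
    (102, 999.998, 6e4),
    (101, 1500.750, 2e4),
    (250, 1000.001, 1e4),
]
for g in group_features(example):
    print(g)
```

In the example, the three features near 1000 Da in scans 100–102 collapse into one LC-MS feature, while the feature of the same mass in scan 250 stays separate because it elutes far later.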
Decon2LS, a new software package for automated processing and visualization of LC-MS datasets, supports multiple vendor formats, making it useful for analyzing data from different MS instruments. Datasets can be viewed in traditional mode, whereby a total intensity chromatogram (TIC) can be used to select which scan of the analysis to view in a mass spectrum pane. A two-dimensional view is also supported, in which a contour map of peaks is linked to a selected ion chromatogram and a mass spectrum pane that aids users in choosing correct deisotoping parameter values.
Decon2LS uses a variant of the THRASH algorithm to determine accurate monoisotopic masses for the vast majority of observed isotopic distributions. The deisotoped results can be viewed by overlaying theoretical patterns on the observed spectrum so that the user has feedback on how the deisotoping worked for a particular dataset. Through the use of indexing data structures and faster search routines, Decon2LS is an order of magnitude faster than ICR2LS.
Availability and requirements
Project Name: Decon2LS
Project Home Page: http://omics.pnl.gov/software/Decon2LS.php
Operating System: Microsoft Windows XP.
Programming Language: Algorithms in C++, Visualization in C#
Other Requirements: .NET framework 1.1 for operation, Visual Studio 2003 for compilation
License: Apache 2.0
Any Restrictions to use by non-academics: None
Portions of this research were supported by the NIH National Center for Research Resources (RR018522), and the U. S. Department of Energy (DOE) Office of Biological and Environmental Research. Datasets were obtained in the Environmental Molecular Sciences Laboratory, a DOE national scientific user facility located at the Pacific Northwest National Laboratory (PNNL) in Richland, Washington. PNNL is a multiprogram national laboratory operated by Battelle for the DOE under Contract No. DE-AC05-76RLO 1830.
- Liu T, et al.: Accurate Mass Measurements in Proteomics. Chemical Reviews 2007, 107: 3621–3653. 10.1021/cr068288j
- Rockwood AL, Van Orden SL, Smith RD: Rapid Calculation of Isotope Distributions. Anal Chem 1995, 67: 2699–2704. 10.1021/ac00111a031
- Sturm M, et al.: OpenMS: An open-source software framework for mass spectrometry. BMC Bioinformatics 2008, 9.
- Mann M, Meng CK, Fenn JB: Interpreting mass spectra of multiply charged ions. Anal Chem 1989, 61: 1702–1708. 10.1021/ac00190a023
- Reinhold BB, Reinhold VN: Electrospray Ionization Mass Spectrometry: Deconvolution by an Entropy-Based Algorithm. J Am Soc Mass Spectrom 1992, 2: 207–215. 10.1016/1044-0305(92)87004-I
- Zhang Z, Marshall AG: A universal algorithm for fast and automated charge state deconvolution of electrospray mass-to-charge ratio spectra. J Am Soc Mass Spectrom 1998, 9(3):225–33. 10.1016/S1044-0305(97)00284-5
- Kaur P, O'Connor P: Algorithms for Automatic Interpretation of High Resolution Mass Spectra. J Am Soc Mass Spectrom 2006, 17: 459–68. 10.1016/j.jasms.2005.11.024
- Hoopmann MR, Finney GL, MacCoss MJ: High-Speed Data Reduction, Feature Detection, and MS/MS Spectrum Quality Assessment of Shotgun Proteomics Data Sets Using High-Resolution Mass Spectrometry. Anal Chem 2007, 79(15):5620–5632. 10.1021/ac0700833
- Du PC, Angeletti RH: Automatic deconvolution of isotope-resolved mass spectra using variable selection and quantized peptide mass distribution. Analytical Chemistry 2006, 78(10):3385–3392. 10.1021/ac052212q
- Li X-j, et al.: A Tool To Visualize and Evaluate Data Obtained by Liquid Chromatography-Electrospray Ionization-Mass Spectrometry. Anal Chem 2004, 76(13):3856–3860. 10.1021/ac035375s
- Leptos KC, et al.: MapQuant: open-source software for large-scale protein quantification. Proteomics 2006, 6(6):1770–82. 10.1002/pmic.200500201
- Jaffe JD, et al.: PEPPeR: A platform for experimental proteomic pattern recognition. Mol Cell Proteomics 2006.
- Smith CA, et al.: XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal Chem 2006, 78(3):779–87. 10.1021/ac051437y
- Bellew M, et al.: A suite of algorithms for the comprehensive analysis of complex protein mixtures using high-resolution LC-MS. Bioinformatics 2006, 22(15):1902–1909. 10.1093/bioinformatics/btl276
- Horn DM, Zubarev RA, McLafferty FW: Automated reduction and interpretation of high resolution electrospray mass spectra of large molecules. J Am Soc Mass Spectrom 2000, 11(4):320–32. 10.1016/S1044-0305(99)00157-9
- Senko MW, Beu SC, McLafferty FW: Automated assignment of charge states from resolved isotopic peaks for multiply charged ions. J Am Soc Mass Spectrom 1995, 6: 52–56. 10.1016/1044-0305(94)00091-D
- Senko MW, Beu SC, McLafferty FW: Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions. J Am Soc Mass Spectrom 1995, 6: 229–233. 10.1016/1044-0305(95)00017-8
- Senko MW, et al.: A high-performance modular data system for Fourier transform ion cyclotron resonance mass spectrometry. Rapid Commun Mass Spectrom 1996, 10(14):1839–44. 10.1002/(SICI)1097-0231(199611)10:14<1839::AID-RCM718>3.0.CO;2-V
- Monroe ME, et al.: VIPER: an advanced software package to support high-throughput LC-MS peptide identification. Bioinformatics 2007, 23(15):2021–3. 10.1093/bioinformatics/btm281
- Smith RD, et al.: An accurate mass tag strategy for quantitative and high-throughput proteome measurements. Proteomics 2002, 2(5):513–23. 10.1002/1615-9861(200205)2:5<513::AID-PROT513>3.0.CO;2-W
- Mayampurath AM, et al.: DeconMSn: A Software Tool for Determination of Accurate Monoisotopic Masses of Parent Ions of Tandem Mass Spectra. Bioinformatics 2008, 24(7):1021–1023. 10.1093/bioinformatics/btn063
- Eng JK, McCormack AL, Yates JR III: An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 1994, 5: 976–989. 10.1016/1044-0305(94)80016-2
- Harkewicz R, et al.: ESI-FTICR mass spectrometry employing data-dependent external ion selection and accumulation. J Am Soc Mass Spectrom 2002, 13(2):144–54. 10.1016/S1044-0305(01)00343-9
- Ramos-Fernandez A, Lopez-Ferrer D, Vazquez J: Improved method for differential expression proteomics using trypsin-catalyzed 18O labeling with a correction for labeling efficiency. Mol Cell Proteomics 2007, 6(7):1274–86. 10.1074/mcp.T600029-MCP200
- Ding J, et al.: Capillary LC coupled with high-mass measurement accuracy mass spectrometry for metabolic profiling. Anal Chem 2007, 79(16):6081–93. 10.1021/ac070080q
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.