Envelope: interactive software for modeling and fitting complex isotope distributions
© Sykes and Williamson; licensee BioMed Central Ltd. 2008
Received: 31 July 2008
Accepted: 20 October 2008
Published: 20 October 2008
An important aspect of proteomic mass spectrometry involves quantifying and interpreting the isotope distributions arising from mixtures of macromolecules with different isotope labeling patterns. These patterns can be quite complex, in particular with in vivo metabolic labeling experiments producing fractional atomic labeling or fractional residue labeling of peptides or other macromolecules. In general, it can be difficult to distinguish the contributions of species with different labeling patterns to an experimental spectrum and difficult to calculate a theoretical isotope distribution to fit such data. There is a need for interactive and user-friendly software that can calculate and fit the entire isotope distribution of a complex mixture while comparing these calculations with experimental data and extracting the contributions from the differently labeled species.
Envelope has been developed to be user-friendly while still being as flexible and powerful as possible. Envelope can simultaneously calculate the isotope distributions for any number of different labeling patterns for a given peptide or oligonucleotide, while automatically summing these into a single overall isotope distribution. Envelope can handle fractional or complete atom or residue-based labeling, and the contribution from each different user-defined labeling pattern is clearly illustrated in the interactive display and is individually adjustable. At present, Envelope supports labeling with 2H, 13C, and 15N, and supports adjustments for baseline correction, an instrument accuracy offset in the m/z domain, and peak width. Furthermore, Envelope can display experimental data superimposed on calculated isotope distributions, and calculate a least-squares goodness of fit between the two. All of this information is displayed on the screen in a single graphical user interface. Envelope supports high-quality output of experimental and calculated distributions in PNG or PDF format. Beyond simply comparing calculated distributions to experimental data, Envelope is useful for planning or designing metabolic labeling experiments, by visualizing hypothetical isotope distributions in order to evaluate the feasibility of a labeling strategy. Envelope is also useful as a teaching tool, with its real-time display capabilities providing a straightforward way to illustrate the key variable factors that contribute to an observed isotope distribution.
Envelope is a powerful tool for the interactive calculation and visualization of complex isotope distributions for comparison to experimental data. It is available under the GNU General Public License from http://williamson.scripps.edu/envelope/.
One of the challenges of quantitative mass spectrometry is dealing with the rich and complex distribution of peaks. Several software packages exist for the analysis of MS data, including OpenMS, MapQuant, MASPECTRAS, msInspect, MzMine, SpecArray, TPP, Viper, Superhirn, XCMS, mMass and isodist. Each of these programs focuses on the high-throughput processing of large datasets, which is crucial to analyzing proteomic MS data. There is a more limited set of tools aimed at calculating and visualizing individual isotope distributions. These include iMass, Isotopica, MS-Isotope and Isotopident. However, none of these tools is interactive while also offering fitting of the entire isotope distribution and a flexible definition of the labeling pattern.
Here, the program Envelope is described, offering a more sophisticated approach to the visualization of calculated isotope distributions than any existing software package. Any number of different labeling patterns and labeled species can be accommodated for both peptides and oligonucleotides. In addition, Envelope offers the ability to interact with data by directly comparing calculated distributions with experimental data. Envelope features a user-friendly interface, and the displayed isotope distributions change in near real-time in response to user-controlled changes in the labeling parameters using continuously variable slider controls or text input boxes. Envelope is a flexible tool designed to help researchers interpret the complex spectra they may see in their experiments, and to help plan new experiments by illustrating which combinations of labels will yield interpretable spectra. Envelope will also be useful in a classroom environment to help explain the principles of MS and isotope distributions since no scripting or command line knowledge is necessary. Envelope is a powerful MS analysis package that is well suited for a variety of research and educational purposes.
Source Code and Algorithms
Envelope is an open source desktop application. The user interface and visualization components are written in Objective-C, taking advantage of the Mac OS X Cocoa framework, while the isotope distribution calculation core is written in C. Isotope distributions are calculated using the Fourier Transform convolution (FTC) method previously described by Rockwood et al. [22, 23] and subsequently extended by Sperling et al. Briefly, MS datasets have the mass/charge (m/z) ratio as the independent variable (m-domain), and intensity as the dependent variable. A spectrum of N real points in the m-domain S(m) has a conjugate Fourier representation as a frequency in the μ-domain s(μ), where the μ-domain representation is complex. The two representations can be interconverted by forward Fourier Transforms (FT) and inverse Fourier Transforms (IFT). The isotope distribution for a species with a particular labeling pattern is first calculated in the μ-domain, followed by IFT to obtain the m-domain spectrum. FTC was chosen over the polynomial method as it is an exact method, and lends itself well to the calculation of both atom-based and residue-based labeling patterns. Fourier Transforms were implemented using the FFTW library. The Envelope executable (Additional file 1) and source code (Additional file 2) are both available for download as additional files or from the Envelope website.
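The idea behind FTC can be illustrated with a minimal sketch. This is not Envelope's C implementation: it works on a nominal-mass grid (one point per extra neutron), uses a naive discrete Fourier transform from the Python standard library instead of FFTW, and the names dft, NATURAL and isotope_distribution are hypothetical. The abundance values are approximate natural abundances entered by hand, and the composition C20H35N5O6 for the neutral peptide NVLP is an assumption tallied from standard residue formulas.

```python
import cmath

def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT library)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                    for j, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out

# Approximate natural isotope abundances, indexed by number of extra neutrons.
# These numbers are illustrative assumptions, not values taken from Envelope.
NATURAL = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
}

def isotope_distribution(composition, abundances=NATURAL, n_points=64):
    """Return peak probabilities for 0, 1, 2, ... extra neutrons.

    In the mu-domain, convolving one element's isotope pattern with itself
    `count` times becomes raising its transform to the power `count`.
    """
    spectrum_mu = [1 + 0j] * n_points
    for element, count in composition.items():
        pattern = abundances[element]
        padded = pattern + [0.0] * (n_points - len(pattern))
        element_mu = dft(padded)
        spectrum_mu = [s * e ** count for s, e in zip(spectrum_mu, element_mu)]
    # Inverse transform back to the m-domain; imaginary parts are round-off.
    return [v.real for v in dft(spectrum_mu, inverse=True)]

if __name__ == "__main__":
    nvlp = {"C": 20, "H": 35, "N": 5, "O": 6}  # assumed composition of NVLP
    for extra_neutrons, p in enumerate(isotope_distribution(nvlp)[:5]):
        print(extra_neutrons, round(p, 4))
```

The grid length n_points must be large enough that the distribution does not wrap around; 64 points is ample for a small peptide but would need to grow for larger or heavily labeled molecules.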
The isotope distribution for a species with a particular labeling pattern has a unit spectrum S(m)_i as well as an amplitude A_i by which the unit spectrum is multiplied. The labeling patterns are defined in terms of fractional 2H, 13C, and 15N content, and can be further defined in terms of all atoms of a particular type for the entire molecule (atom-based labeling) or only atoms belonging to specific residue types (residue-based labeling). In order to account for hydrogen exchange, two categories of hydrogen labeling can be defined for subsets of a residue's hydrogen atoms, each with an independent fraction of 2H. Hydrogen atoms beyond the sum of these two categories are assumed to have natural abundance isotope content.
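The relationship between unit spectra, amplitudes and the summed spectrum can be sketched as follows. This is a hedged illustration of atom-based fractional 15N labeling only: it uses direct convolution rather than FTC for brevity, works on a nominal-mass grid, and the function names (convolve, unit_spectrum, with_15n_fraction, summed_spectrum), the abundance values and the amplitudes in the usage lines are all assumptions made for the example.

```python
def convolve(a, b):
    """Direct convolution of two isotope patterns (lists of probabilities)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def unit_spectrum(atom_counts, abundances):
    """Unit spectrum: probabilities for 0, 1, 2, ... extra neutrons."""
    spectrum = [1.0]
    for element, count in atom_counts.items():
        for _ in range(count):
            spectrum = convolve(spectrum, abundances[element])
    return spectrum

def with_15n_fraction(abundances, fraction_15n):
    """Copy of the abundance table with nitrogen set to a fractional 15N content."""
    labeled = dict(abundances)
    labeled["N"] = [1.0 - fraction_15n, fraction_15n]
    return labeled

def summed_spectrum(unit_spectra, amplitudes):
    """Overall spectrum: each unit spectrum scaled by its amplitude, then summed."""
    total = [0.0] * max(len(s) for s in unit_spectra)
    for spectrum, amplitude in zip(unit_spectra, amplitudes):
        for k, value in enumerate(spectrum):
            total[k] += amplitude * value
    return total

if __name__ == "__main__":
    natural = {"C": [0.9893, 0.0107], "N": [0.99636, 0.00364]}  # approximate
    atoms = {"C": 20, "N": 5}  # toy composition, other elements omitted
    unlabeled = unit_spectrum(atoms, natural)
    half_labeled = unit_spectrum(atoms, with_15n_fraction(natural, 0.5))
    total = summed_spectrum([unlabeled, half_labeled], [0.6, 0.4])
    print([round(v, 4) for v in total[:8]])
```

Residue-based labeling would follow the same pattern, with the labeled abundance table applied only to the atoms of the selected residue types and the natural table applied to the rest.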
It would be useful for Envelope to be able to interact with the wide variety of existing MS analysis programs. To this end a subset of Envelope's functionality can be manipulated by script, allowing other programs to use Envelope as a frontend for the display of calculated and experimental isotope distributions.
Upon opening, Envelope presents the user with the default peptide NVLP, and two entries in the list of spectra: one entry for a calculated spectrum with natural abundance isotope levels, and one entry for the sum of all calculated spectra, which cannot be deleted. Pressing the Go button at this point will display the natural abundance isotope distribution for the peptide NVLP, z = +1. The Go button may be activated at any point to visualize changes that have occurred, or the Live checkbox may be selected and spectra will automatically be recalculated in response to user-initiated changes. The user may add additional calculated spectra using the "+" button, and the labeling pattern may be defined at any time. Only one calculated spectrum is needed to begin visualizing isotope distributions, though a distinct calculated spectrum is required for each species generated in an experiment. In quantitative proteomics experiments, there are typically one unlabeled and at least one labeled species. Once the necessary calculated spectra have been added, experimental data can be loaded using the "+" button, or via the File menu. The sequence, molecule type and charge will each need to be adjusted to match the data. The user may then adjust the labeling patterns for the calculated species, their amplitudes, and the baseline, Gaussian width and offset in order to fit the sum of the calculated spectra to the data by monitoring the decrease in χ². Envelope will display any desired ratio of amplitudes of calculated spectra defined by the user in order to quantitatively compare the amounts of different species. This value dynamically updates during the fitting process, always reflecting the current amplitudes.
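How the baseline, Gaussian width and m/z offset enter a manual fit can be sketched as below. This is an illustrative model, not Envelope's code: it assumes each calculated peak is drawn as a Gaussian on top of a constant baseline, and that χ² is the unweighted sum of squared residuals; the names render_profile and chi_squared and all numbers in the usage lines are hypothetical.

```python
import math

def render_profile(mz_grid, sticks, amplitude, baseline, width, offset):
    """Turn (m/z, intensity) sticks into a profile spectrum on mz_grid.

    Each stick becomes a Gaussian of standard deviation `width`, shifted by
    `offset` in m/z, scaled by `amplitude`, and added to a flat `baseline`.
    """
    profile = []
    for mz in mz_grid:
        y = baseline
        for center, intensity in sticks:
            y += amplitude * intensity * math.exp(
                -0.5 * ((mz - center - offset) / width) ** 2)
        profile.append(y)
    return profile

def chi_squared(experimental, calculated):
    """Unweighted sum of squared residuals (assumed form of the fit statistic)."""
    return sum((e - c) ** 2 for e, c in zip(experimental, calculated))

if __name__ == "__main__":
    grid = [441.0 + 0.01 * i for i in range(400)]
    sticks = [(442.0, 1.0), (443.0, 0.25)]  # made-up peak list
    # Synthetic "experimental" data generated with known parameters.
    data = render_profile(grid, sticks, 1.0, 0.02, 0.05, 0.03)
    # Scan the offset as a user might with a slider and watch chi-squared fall.
    for trial_offset in (0.0, 0.01, 0.02, 0.03):
        calc = render_profile(grid, sticks, 1.0, 0.02, 0.05, trial_offset)
        print(trial_offset, chi_squared(data, calc))
```

In this toy case χ² reaches zero when the trial offset matches the value used to generate the data; with real spectra it only approaches a minimum.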
15N Pulse Labeling
For this peptide f = 0.402, indicating that approximately 40% of the S2 protein in the sample was labeled with 50% 15N.
In a similar metabolic labeling experiment the ribosomal RNA was purified, digested with RNase T1 and submitted for LC/MS analysis in negative ion mode, without the addition of a fully 15N labeled species. The result is a mixture of just two species, one unlabeled and the other partially labeled (50% 15N). The experimental spectrum resulting from these two species, as well as the calculated isotope distribution (16S RNA, residues 766–769, AAAG, z = -2) are shown in Figure 3b. The amplitudes of the unlabeled and partially labeled species are 0.550 and 0.350 respectively, and f = 0.389. The data used are available for download (Additional File 3).
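The reported value of f for the oligonucleotide can be reproduced from the two amplitudes, assuming that f is the labeled amplitude divided by the sum of the labeled and unlabeled amplitudes. That formula is inferred from the numbers above rather than quoted from the text, and the function name fraction_labeled is hypothetical.

```python
def fraction_labeled(amplitude_unlabeled, amplitude_labeled):
    """Fraction labeled, assumed to be labeled / (unlabeled + labeled)."""
    return amplitude_labeled / (amplitude_unlabeled + amplitude_labeled)

# Amplitudes quoted for the AAAG fragment: 0.550 unlabeled, 0.350 labeled.
f = fraction_labeled(0.550, 0.350)
print(round(f, 3))  # prints 0.389
```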
Pulse Labeling with Amino Acids
While Envelope is not intended to replace high-throughput batch analysis programs for the large-scale fitting of data in a research environment, it has its own place alongside them. Envelope is useful in planning experiments by visualizing the isotope distributions produced by potential labeling patterns. By exploring the predicted isotope distributions before performing any actual experiments, optimal labeling patterns can be determined which help increase the quality of the final data. Envelope also allows the user to fit calculated distributions to experimental spectra that may be poorly fit by automatic fitting routines. This is especially true in the case of spectral overlap, where to the human eye it may be clear that adjacent peaks are overlapping the signal from the peptide of interest, but a computer algorithm cannot resolve this situation. A least-squares fit for example will increase the baseline of the calculated spectrum or amplitude of a species in an attempt to fit overlapping experimental peaks where no calculated distribution exists, skewing the final result. In a case where there is an unknown extent of isotope enrichment, Envelope can be used to fit data for a few peaks by hand, determining the actual enrichment to be used as a starting value for a batch fit using another program.
More than just a research tool, Envelope is useful as a learning aid, both for individual researchers and in the classroom environment. By defining a labeling pattern and immediately seeing the calculated distribution onscreen, then watching the distribution change in response to adjustments to the labeling pattern, one can easily understand the different factors that contribute to a given isotope distribution. There are two important concepts that must be distinguished in metabolic labeling experiments, which are the concepts of fraction labeled (f), defined here as the relative amount of the labeled species compared to the unlabeled species, and the isotope abundance which is the fractional isotope content of a particular labeled species. For students being introduced to the subject for the first time, these two distinct quantities can be confusing as they both deal with a different kind of extent of labeling. Using Envelope, these two concepts can be directly illustrated and explored as a didactic exercise. Envelope features slider bars to manipulate the isotope enrichment for a species while observing the effect on the spectrum in the interactive display, and checkboxes to define the fraction labeled in terms of the amplitudes of different species. These changes are displayed in near real-time, dependent on processor speed and complexity of the system. This immediate feedback is very useful to illustrate how the parameters affect the isotope distribution, and has been very effective as a live tool in seminars describing the analysis of LC/MS data from metabolic labeling experiments.
Envelope is a powerful tool for the interactive calculation and visualization of isotope distributions that is capable of simultaneously calculating distributions for an arbitrary number of species of a single peptide or oligonucleotide, each with a different labeling pattern. Envelope can visualize experimental mass spectra, allowing the user to perform manual least-squares fits of calculated distributions to real experimental data. Envelope is useful for small-scale data analysis and planning experiments. Moreover it can be used as a teaching tool, and its user-friendly and interactive qualities make it well suited for use by research groups, in seminars, or in the classroom.
Availability and requirements
Project name: Envelope
Project home page: http://williamson.scripps.edu/envelope/
Operating system(s): Mac OS X 10.4 or 10.5, Intel and Power PC
Programming language: C and Objective-C
Other requirements: Executable: None. Source: Compilation of the Envelope source requires the Apple Developer Tools, freely available from http://connect.apple.com. Envelope makes use of the FFTW library, freely available under the terms of the GNU General Public License from http://www.fftw.org.
License: GNU General Public License (GPL)
Any restrictions to use by non-academics: See GPL license for details
Abbreviations
FTC: Fourier Transform convolution
IFT: Inverse Fourier Transform
LC/MS: Liquid chromatography coupled mass spectrometry
ESI-TOF: Electrospray ionization time-of-flight.
This work was supported by NIH grant R37-GM53757 to J.R.W. M.T.S. was supported by grant number F32-GM083510 from the N.I.G.M.S. The authors wish to thank Edit Sperling for providing the experimental data and Stephen Chen for help testing Envelope and providing valuable feedback on its implementation and design.
- Smith JC, Lambert J-P, Elisma F, Figeys D: Proteomics in 2005/2006: developments, applications and challenges. Anal Chem 2007, 79: 4325–4343. 10.1021/ac070741j
- Oda Y, Huang K, Cross FR, Cowburn D, Chait BT: Accurate quantitation of protein expression and site-specific phosphorylation. PNAS 1999, 96: 6591–6596. 10.1073/pnas.96.12.6591
- Doherty MK, McLean L, Beynon RJ: Avian proteomics: advances, challenges and new technologies. Cytogenet Genome Res 2007, 117: 358–369. 10.1159/000103199
- Doherty MK, Whitehead C, McCormack H, Gaskell SJ, Beynon RJ: Proteome dynamics in complex organisms: using stable isotopes to monitor individual protein turnover rates. Proteomics 2005, 5: 522–533. 10.1002/pmic.200400959
- Hayter JR, Doherty MK, Whitehead C, McCormack H, Gaskell SJ, Beynon RJ: The subunit structure and dynamics of the 20S proteasome in chicken skeletal muscle. Mol Cell Proteomics 2005, 1370–1381.
- Sturm M, Bertsch A, Gröpl C, Hildebrandt A, Hussong R, Lange E, Pfeifer N, Schulz-Trieglaff O, Zerck A, Reinert K, Kohlbacher O: OpenMS – an open-source software framework for mass spectrometry. BMC Bioinformatics 2008, 9: 163. 10.1186/1471-2105-9-163
- Leptos KC, Sarracino DA, Jaffe JD, Krastins B, Church GM: MapQuant: open-source software for large-scale protein quantification. Proteomics 2006, 6(6):1770–1782. 10.1002/pmic.200500201
- Hartler J, Thallinger GG, Stocker G, Sturn A, Burkard TR, Körner E, Rader R, Schmidt A, Mechtler K, Trajanoski A: MASPECTRAS: a platform for management and analysis of proteomics LC-MS/MS data. BMC Bioinformatics 2007, 8: 197. 10.1186/1471-2105-8-197
- Bellew M, Coram M, Fitzgibbon M, Igra M, Randolph R, Wang P, May D, Eng J, Fang R, Lin C, Chen J, Goodlett D, Whiteaker J, Paulovich A, McIntosh M: A suite of algorithms for the comprehensive analysis of complex protein mixtures using high-resolution LC-MS. Bioinformatics 2006, 22: 1902–1909. 10.1093/bioinformatics/btl276
- Katajamaa M, Miettinen J, Oresic M: MZmine: toolbox for processing and visualization of mass spectrometry based molecular profile data. Bioinformatics 2006, 22: 634–636. 10.1093/bioinformatics/btk039
- Li X-J, Yi EC, Kemp CJ, Zhang H, Aebersold R: A software suite for the generation and comparison of peptide arrays from sets of data collected by liquid chromatography-mass spectrometry. Mol Cell Proteomics 2005, 4: 1328–1340. 10.1074/mcp.M500141-MCP200
- Keller A, Eng J, Zhang N, Li X-J, Aebersold R: A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol Syst Biol 2005, 1: 0017. 10.1038/msb4100024
- Monroe ME, Tolić N, Jaitly N, Shaw JL, Adkins JN, Smith RD: VIPER: an advanced software package to support high-throughput LC-MS peptide identification. Bioinformatics 2007, 23: 2021–2023. 10.1093/bioinformatics/btm281
- Mueller LN, Rinner O, Schmidt A, Letarte S, Bodenmiller B, Brusniak M-Y, Vitek O, Aebersold R, Müller M: SuperHirn – a novel tool for high resolution LC-MS-based peptide/protein profiling. Proteomics 2007, 7: 3470–3480. 10.1002/pmic.200700057
- Smith CA, Want EJ, O'Maille G, Abagyan R, Siuzdak G: XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal Chem 2006, 78: 779–787. 10.1021/ac051437y
- Strohalm M, Hassman M, Kosata B, Kodícek M: mMass data miner: an open source alternative for mass spectrometric data analysis. Rapid Commun Mass Spectrom 2008, 22(6):905–908. 10.1002/rcm.3444
- Sperling E, Bunner AE, Sykes MT, Williamson JR: Quantitative analysis of isotope distributions in proteomic mass spectrometry using least-squares Fourier transform convolution. Anal Chem 2008, 80: 4906–4917. 10.1021/ac800080v
- Fernandez-de-Cossio J, Gonzalez LJ, Satomi Y, Betancourt L, Ramos Y, Huerta V, Amaro A, Besada V, Padron G, Minamino N, Takao T: Isotopica: a tool for the calculation and viewing of complex isotopic envelopes. Nucleic Acids Res 2004, (32 Web Server):W674–8. 10.1093/nar/gkh423
- Rockwood AL, van Orden SL, Smith RD: Rapid Calculation of Isotope Distributions. Anal Chem 1995, 67: 2699–2704. 10.1021/ac00111a031
- Rockwood AL: Relationship of Fourier Transforms to Isotope Distribution Calculations. Rapid Commun Mass Sp 1995, 9: 103–105. 10.1002/rcm.1290090122
- Yergey J, Heller D, Hansen G, Cotter RJ, Fenselau C: Isotopic Distributions in Mass Spectra of Large Molecules. Anal Chem 1983, 55: 353–356. 10.1021/ac00253a037
- Frigo M, Johnson SG: The design and implementation of FFTW3. P IEEE 2005, 93: 216–231. 10.1109/JPROC.2004.840301
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.