Robust MS quantification method for phospho-peptides using 18O/16O labeling
© Andersen et al; licensee BioMed Central Ltd. 2009
Received: 15 February 2008
Accepted: 11 May 2009
Published: 11 May 2009
Quantitative measurements of specific protein phosphorylation sites, as presented here, can be used to investigate signal transduction pathways, which is an important aspect of cell dynamics. The presented method quantitatively compares peptide abundances from experiments using 18O/16O labeling starting from elaborated MS spectra. It was originally developed to study signaling cascades activated by amyloid-β treatment of neurons used as a cellular model system with relevance to Alzheimer's disease, but is generally applicable.
The presented method assesses, in complete cell lysates, the degree of phosphorylation of specific peptide residues from MS spectra using 18O/16O labeling. The abundance of each observed phospho-peptide from two cell states was estimated from three overlapping isotope contours. The influence of peptide-specific labeling efficiency was removed by performing a label swapped experiment and assuming that the labeling efficiency was unchanged upon label swapping. Different degrees of phosphorylation were reported using the fold change measure which was extended with a confidence interval found to reflect the quality of the underlying spectra. Furthermore a new way of method assessment using simulated data is presented. Using simulated data generated in a manner mimicking real data it was possible to show the method's robustness both with increasing noise levels and with decreasing labeling efficiency.
The fold change error assessable on simulated data was on average 0.16 (median 0.10) with an error-to-signal ratio and labeling efficiency distributions similar to the ones found in the experimentally observed spectra. Applied to experimentally observed spectra a very good match was found to the model (<10% error for 85% of spectra) with a high degree of robustness, as assessed by data removal. This new method can thus be used for quantitative signal cascade analysis of total cell extracts in a high throughput mode.
In order to better understand the vast complexity of the molecular events in biology, good measurement techniques and methodologies are required to investigate the biological processes as they unfold. The presented approach was developed to identify protein targets in Alzheimer's disease as part of the first steps in the drug discovery pipeline. The activated cellular signal transduction pathways were studied in a neuronal disease model immediately upon amyloid-β stimulation. Protein phosphorylation is a well known and extensively used signaling mechanism, so measuring specific changes in protein phosphorylation was used to inspect these pathways. To this end it is required to assess the degree of phosphorylation at a specific protein residue, which differs from the overall degree of phosphorylation of a given protein e.g. observed as a shift in isoelectric point on a gel.
The experimental setup uses stable isotope labeling by normal or heavy oxygen (16O or 18O) to differentiate between mixed treated and control peptides. This peptide mixture is analyzed by mass spectrometry in a single run. The proteins were extracted and the samples were analyzed in two steps. First the proteins were trypsinized and peptides identified in an MS/MS mode experiment from an unlabeled mixture of the treated and control samples. Secondly the proteins were extracted from the treated and untreated cells, an aliquot split was performed followed by 18O/16O C-terminal labeling by trypsination in two independent experiments (see Methods). This produced a 'direct' experiment, where the peptides from the treated cells were labeled with heavy oxygen (18O) and mixed with peptides from the untreated control cells labeled with light oxygen (16O), and an 'inverted' experiment where the labeling was swapped. The samples were subsequently analyzed by mass spectrometry and the acquired spectra were initially processed through a series of analysis steps (see Methods), which are not part of the method presented and therefore not detailed here.
Overview of Method Variables
We have chosen to report the primary end result as the fold change between treated and control. Choosing the fold change also means that the absolute intensities of treated and control are no longer needed, thus allowing the assumptions used to be reduced to one i.e. equal labeling efficiency between the two label swapped samples (see Methods).
The theoretical probability distribution between double, single and no incorporation of 18O upon labeling can also be derived. Here we have chosen not to make any such assumptions, but rather use additional experimental data in the form of label swapping and the limited assumption of equal labeling efficiency of a given peptide upon label swapping.
The basic analysis methodology starts with the fitting of theoretical isotopic contours for the heavy, mixed and light isotopes to the observed signal (see Figure 1). The fitting was carried out using multivariate linear regression where the squared error was minimized to yield the three intensities of labeled and unlabeled isotopes with their respective confidence intervals (see Methods). To reflect the quality of fit of the multivariate regression onto the resulting fold change, confidence intervals were calculated by a parametric bootstrap using the estimated covariance matrix of the three regression coefficients.
Some previous methods have relied on just a few peaks (2–4 peaks) to estimate the ratio between treated and control[5, 6] and in most cases the peptide sequence was assumed unknown. More relevant information is present in the remaining peaks which can be utilized to improve the peptide abundance estimate. The abundance estimate can be improved further if the peptide sequence and post-translational modifications are known, since the theoretical isotopic contour is improved, as presented here. The presented method can also use estimated isotopic contours directly if the peptides have not been identified prior to MS quantitation[7, 8]. This is the case for a method recently published by Eckel-Passow et al. which uses averagine (an imaginary average amino acid) and all of the MS-peaks presently discernible from background noise[3, 9]. They used the linear regression described by Mirgorodskaya et al. which is analogous to the one presented here to calculate the isotopic intensities. The sample used for their analysis was a simple mixture of two proteins in a 1:1 ratio, in contrast to proteins originating from a complete cell lysate as investigated here and by others[2, 11–13].
To assess a new peptide quantitation methodology ideally you would need to have a large and varied set of spectra with already known abundances. In short of such a data set we propose to use simulated spectra for a thorough model assessment and the limited experimental data for validation.
Noise was added to the simulated data with a distribution similar to the one found for the experimental spectra, thereby reflecting the experimental data from this perspective, resulting in less than 10% error for 85% of the spectra and less than 5% for 45% of the spectra. Another important parameter is the labeling efficiency which is defined as the ratio between the labeled isotope intensity (either fully or partially labeled) and the total intensity of the sample undergoing labeling (see Figure 2b and Equation 2). In most of the spectra a reasonably good labeling efficiency is observed (above 0.6), while there are three spectra pairs with low or absent labeling. We found that it was unrealistic to rescue these spectra with label efficiencies close to zero (data not shown) so for the simulated spectra the tail of the imposed label efficiency distribution was cut off at 0.4. The labeling efficiency shown for the simulated spectra is the one estimated by the model in the same manner as it is done for the experimentally observed spectra, which naturally differs from the one imposed when constructing the spectra. Finally in supplementary Figure [see Additional file 1] the estimated fold change distribution is shown for the experimental and simulated spectra. The average fold change was close to one, as one would expect from a biological perspective since only a small part of the phosphorylated proteins were expected to be involved in the cellular response. The experimental spectra had a trend towards positive fold changes, which we chose not to impose on the simulated data. Furthermore we also chose a larger spread of the fold change to provide enough data points for a statistical analysis of larger fold changes, which actually increase the average fold change error as shown below. Having inspected the experimentally observed spectra and ensured that the simulated spectra reflect their overall measurable characteristics, we can perform an in-depth assessment of the presented model's behavior using simulated spectra followed by an investigation of the experimentally observed spectra.
Fold change assessment
The fold change parameter reports the change in abundance between the treated and control samples in a symmetrical and straightforward manner by a simple ratio (see Methods). We used the simulated data presented above to assess the model estimates of fold change and its dependencies on noise, label efficiency and the absolute value of the fold change itself. We have chosen to display primarily the relationships with respect to the estimated parameters as they can be calculated from the experimentally observed spectra and thus relate the fold change to them.
Fold Change Confidence Interval Assessment
Based on the obvious assumption that a contributing intensity cannot be negative we were able to derive a set of constraints in order to assess whether a label swapped spectra pair was mutually consistent (see Methods). They specify that aD ≥ b I + c I and a I ≥ b D + c D , which basically means that a sample may not change intensity upon label swapping. They are important in order to identify problematic spectra pairs and were used as an initial filter on the experimental and simulated spectra pairs, thus flagging 8% and 5%, respectively.
Examples from Experimental Spectra
To illustrate the reported results on experimentally observed spectra pairs, a peptide with low labeling efficiency (0.61) is shown in Figure 5. Nevertheless a tight fold change interval could be estimated where FC = 2.49 with a 95% confidence interval of [2.05 3.21]. Another medium intensity peptide was observed as two separate species, sharing a phosphorylation site (pTPGTPGpTPSYPR and TPGTPGpTPSYPR). Using the presented methodology the phosphorylation site specific fold change was estimated for the shared Thr phosphorylation to 1.33 with a 95% confidence interval of [1.17 1.64] (see Methods). This peptide originates from the microtubule-associated protein 2 (map-2), the phosphorylation state of which has been found to influence cytoskeleton structure. In Alzheimer's, the disease studied, cytoskeleton integrity is known to be of great importance.
The presented method can estimate residue specific protein phosphorylation fold changes with associated confidence intervals from an 18O/16O label swapped spectra pair independently of the label efficiency, if minimal labeling occurs. The only prerequisite for the presented methodology is equal peptide specific label efficiency between label swapped experiments. The fold change for each peptide had a very low average and median error of 0.16 and 0.10, respectively, where the error was estimated from simulated spectra with similar noise and labeling efficiency distributions as the experimentally observed spectra. The labeling efficiency and spectrum noise were along with the absolute value of the fold change itself found to be the main determinants of fold change error, but are being continuously improved. Using the simulated data we were able to map this relationship in a contour plot, where the average fold change error is reported, given the estimated labeling efficiency and ε/S-ratio. The contour plot displays the non-linear relationship between labeling efficiency and ε/S-ratio and allows an additional estimate of the fold change error directly applicable to experimental observed spectra. A qualitative decrease in fold change estimation was found for a few poor spectra with an ε/S-ratio above 0.1 or estimated labeling efficiency below 0.5, which had an average fold change error of 0.50. If these spectra are excluded the average and median fold change drops to 0.13 and 0.09, respectively. Two major drawbacks where highlighted by Miyagi and Rao of the 18O/16O-labeling technique: variability in labeling and computational tools, both are addressed by the presented method.
The method presented was developed to support its application in high throughput experiments by quality check filtering based on spectra pair consistency and the accurate reporting of fold change confidence interval. The fold change confidence interval was found to summarize spectral quality nicely from several aspects such as noise, labeling efficiency and sample signal. We found that the confidence interval provided very valuable information, thus reducing the amount of time spent quality checking the underlying spectra manually. A matter of experimental reality is the occasional intrusion of an interfering species into a spectrum, which can be identified and eliminated using the leave-one-out robustness assessment presented. The described method can also be used for quantitation in a broader context such that different peptide isotope contributions (due to post-translational modifications or peptidase miscleavages) can be joined to yield a more accurate overall peptide fold change estimate and resulting protein quantitation. The presented method is thus useful for the elucidation of the constitution and dynamics of cellular signaling pathways by allowing the accurate measurement of residue-specific phosphorylation events.
Rat cortical neurons were treated with amyloid-β for a duration of 5 minutes and compared to untreated controls in order to identify proteins with differentially phosphorylated residues. The peptide labeling technique uses heavy (H218O) and normal (H216O) water as described by Yao et al. in a similar manner to others [2, 11]. This should ideally label all peptides except those originating from the C-terminal peptide of the protein. An aliquot split was performed in the biological sample and label swapping was performed in order to separate label efficiency from biological variation. The phosphorylated peptides were enriched using the Phos-Select IMAC resin (Sigma-Aldrich), separated using 2D-nanoflow-liquid chromatography with SCX step gradient as first dimension and RP as the second and finally analyzed using a nanoflow-ESI-QTOF-MS. The intensity of the isotopes of each peptide species considered was measured from MS spectra acquired in profile mode. All peptide spectra with an ion count above 40 were combined and centroided using the Mass Measure tool in MassLynx 4.0 (supplied by Waters), summing a window of retention time and multiple charge states. The isotope Modeling tool of MassLynx was used to compute the theoretical contour based on the identified peptide sequence and its post-translational modifications. The experiments produced a total of 52 spectra of which 4 were found to be inconsistent (see Label Swapped Spectra Pair Consistency) resulting in a total of 48 spectra pairs used in the presented analysis. All data are available as a table in the supporting material [see Additional file 2].
The present section describes the details needed to reproduce the data presented in the Result section covering noise level, labeling efficiency and fold change calculations. One million spectra were generated of which 53860 were found to be inconsistent (see Label Swapped Spectra Pair Consistency) and removed, thereby leaving 946140 which were characterized in Figure 2, 3 and 4. A simulated spectra for a peptide was generated by adding noise (here denominated imposed noise) to the sum of three idealized isotope contours (corresponding to light, mixed and heavy isotopes). The imposed noise was added from two flat distributions, since this was found to give an ε/S-ratio distribution similar to the one observed for the experimental spectra (see Figure 2). One distribution had a maximum of 37% noise and the other a maximum of 14% noise. The simulated spectra were generated by first choosing one of the two distributions for a spectrum e.g. 14%, then generating a random number for each peak ranging from +14% to -14% and adding this number to each peak in the spectra. This imposed noise level reported for each spectrum was the actual noise added by the random number generator, which varies from one simulated spectrum to another. The idealized spectra used were the same as those generated as part of the analysis of the 48 experimental spectra pairs used as example data.
Multivariate Linear Regression
The multivariate linear model fits three (light, mixed and heavy) stepwise shifted identical theoretical isotopic contours (T0–T4) to an observed spectrum (S0–S8) with intensities a, b and c finding the least squares solution (see Figure 1). The model also includes an adaptable background level (Kbg) resulting in a total of four parameters estimated from nine observables. An adaptation of the multivariate regression algorithm was required in order to produce realistic (non-negative) isotope intensities. If an estimated isotope intensity parameter (a, b or c) came out negative the regression was redone while fixing it to zero. In Table 1 an overview is given of the variables used. The statistical analyses were performed in MatLab and scripts are available upon request.
Label Efficiency Subtraction
The labeling reaction yield has been observed to vary considerably between the various phospho-peptides studied. Some have a very high labeling reaction yield, most are labeled well and yet some don't seem to be labeled at all [see Additional file 3]. With the single assumption of equal labeling reaction yield upon label swapping we derive the ratio between treated and control, which is all that is needed to calculate the fold change and is independent of the labeling efficiency of the peptide in question. Naturally when the labeling efficiency becomes very low or absent the fold change can no longer be estimated properly, but this should be reflected in the fold change confidence interval.
Eq. 2 (I) The primary assumption of this paper is defined here and states that the peptide label efficiency remains unchanged between Direct and Inverted experiments (i.e. label swapping). (II and III) The coefficients ( a , b and c ) used in the theoretical framework (see Table 1) are derived from the estimated coefficients by normalization, such that the total intensities ( I D and I I ) between Direct and Inverted experiments are equal. The expectancy, E() , of an estimated parameter is its theoretical counterpart after normalization. See Table 1 for an explanation of the theoretical quantities used.
The ratio is all that is needed in order to calculate the fold change (see Equation 3) and is independent of the peptide specific labeling efficiency.
The number of assumptions incorporated into the model has been minimized to avoid inconsistencies due to noise from the experimental measuring process. An example is the labeling efficiency which could have been formulated in a much more constraining manner, where the proportion of each isotope species (heavy, mixed and light) would have been assumed unvaried between the two label swapped experiments. In effect when noise is present in the experimental data or in the simulated data they rarely conform to this demanding assumption (data not shown).
Definition of Fold Change
Eq. 3 The fold change definition applied ensures a symmetrical representation of an increase in abundance from control to treated (e.g. 1.25 means 25% increase) with respect to a decrease in abundance (e.g. -1.25 means 25% decrease). The ratio between treated (I T ) and control (I C ) intensities was estimated using Equation (1).
Fold Change Confidence Interval Estimation
The basic assumption used for the confidence interval calculation of the fold change was that the parameters estimated by the linear regression (a, b and c) were following a multivariate Gaussian distribution. Although this is known not to be exact it is a workable solution which upon sufficient sampling becomes reliable. The regression fit method used to fit MS spectra was as described by Chatterjee and Hadi which also estimates the covariance matrix ∑ of the returned parameters (a, b and c). Subsequently a bootstrap calculation was performed in order to estimate the fold change distribution. 10000 random values (ak, bk and ck) were drawn from a multivariate Gaussian distribution with mean vector [a, b, c] and covariance matrix ∑; for each random set (ak, bk and ck) the fold change FCk was calculated using Equations 1 and 3. The 95% confidence interval was estimated by making use of the bootstrap generated fold change distribution.
Label Swapped Spectra Pair Consistency
Phosphorylation Site Specific Fold Change Estimation
When multiple peptides are joined to yield a single fold change an added experimental assumption is made of equal detectability of the involved peptide species. While for 18O labeling the change is believed to be negligible it is known not always to be so between different peptide species in general as described by Tang et al..
Generally in the figures reported we have used bin counting for optimal visualization. This means that when x versus y is plotted, each spot or circle reports the average x- and average y-value for a given x-axis percentile of the data e.g. if there are 50 spots the first spot reports the average x, y-value for the 2% of data with the lowest x-values. This enables a good estimation of the x-, y-values and also shows the data distribution without flooding the plot with each individual x, y pair. In the special case of reporting fold changes the mean and standard deviation were calculated on "zero-centered fold changes" (zero centered fold change ∈ (-∞, +∞)) in order to perform the statistics on a continuous value set i.e. before calculation of mean or standard deviation, one was added to all fold change values below -1 and one was subtracted from all the others. After calculation of the mean it was transformed back into a real fold change (fold change ∈ (-∞, -1) ∪ (1, +∞)) for reporting or visualization.
This work has been supported by a grant from the EC FP6 (ADIT, contract n. LSHB-CT-2005-511977)
- Williamson R, Scales T, Clark BR, Gibb G, Reynolds CH, Kellie S, Bird IN, Varndell IM, Sheppard PW, Everall I, et al.: Rapid tyrosine phosphorylation of neuronal proteins including tau and focal adhesion kinase in response to amyloid-beta peptide exposure: involvement of Src family protein kinases. J Neurosci 2002, 22: 10–20.PubMedGoogle Scholar
- Ramos-Fernandez A, Lopez-Ferrer D, Vazquez J: Improved method for differential expression proteomics using trypsin-catalyzed 18O labeling with a correction for labeling efficiency. Mol Cell Proteomics 2007, 6: 1274–1286. 10.1074/mcp.T600029-MCP200View ArticlePubMedGoogle Scholar
- Eckel-Passow JE, Oberg AL, Therneau TM, Mason CJ, Mahoney DW, Johnson KL, Olson JE, Bergen HR III: Regression analysis for comparing protein samples with 16O/18O stable-isotope labeled mass spectrometry. Bioinformatics 2006, 22: 2739–2745. 10.1093/bioinformatics/btl464View ArticlePubMedGoogle Scholar
- Chatterjee S, Hadi A: Influential Observations, High Leverage Points, and Outliers in Linear Regression. Statistical Science 1986, 1(3):p379–416. 10.1214/ss/1177013622View ArticleGoogle Scholar
- Johnson KL, Muddiman DC: A method for calculating 16O/18O peptide ion ratios for the relative quantification of proteomes. J Am Soc Mass Spectrom 2004, 15: 437–445. 10.1016/j.jasms.2003.11.016View ArticlePubMedGoogle Scholar
- Heller M, Mattou H, Menzel C, Yao X: Trypsin catalyzed 16O-to-18O exchange for comparative proteomics: tandem mass spectrometry comparison using MALDI-TOF, ESI-QTOF, and ESIion trap mass spectrometers. J Am Soc Mass Spectrom 2003, 14: 704–718. 10.1016/S1044-0305(03)00207-1View ArticlePubMedGoogle Scholar
- Horn DM, Zubarev RA, McLafferty FW: Automated reduction and interpretation of high resolution electrospray mass spectra of large molecules. J Am Soc Mass Spectrom 2000, 11: 320–332. 10.1016/S1044-0305(99)00157-9View ArticlePubMedGoogle Scholar
- Senko MWBSCaMFW: Determination of Monoisotopic Masses and Ion Populations for Large Biomolecules from Resolved Isotopic Distributions. J Am Soc Mass Spectrom 1995, 6: 229–233. 10.1016/1044-0305(95)00017-8View ArticlePubMedGoogle Scholar
- Mason CJ, Therneau TM, Eckel-Passow JE, Johnson KL, Oberg AL, Olson JE, Nair KS, Muddiman DC, Bergen HR III: A method for automatically interpreting mass spectra of 18O-labeled isotopic clusters. Mol Cell Proteomics 2007, 6: 305–318.View ArticlePubMedGoogle Scholar
- Mirgorodskaya OA, Kozmin YP, Titov MI, Korner R, Sonksen CP, Roepstorff P: Quantitation of peptides and proteins by matrix-assisted laser desorption/ionization mass spectrometry using (18)O-labeled internal standards. Rapid Commun Mass Spectrom 2000, 14: 1226–1232. 10.1002/1097-0231(20000730)14:14<1226::AID-RCM14>3.0.CO;2-VView ArticlePubMedGoogle Scholar
- Lane CS, Wang Y, Betts R, Griffiths WJ, Patterson LH: Comparative cytochrome P450 proteomics in the livers of immunodeficient mice using 18O stable isotope labeling. Mol Cell Proteomics 2007, 6: 953–962. 10.1074/mcp.M600296-MCP200PubMed CentralView ArticlePubMedGoogle Scholar
- Petyuk VA, Qian WJ, Chin MH, Wang H, Livesay EA, Monroe ME, Adkins JN, Jaitly N, Anderson DJ, Camp DG, et al.: Spatial mapping of protein abundances in the mouse brain by voxelation integrated with high-throughput liquid chromatography-mass spectrometry. Genome Res 2007, 17: 328–336. 10.1101/gr.5799207PubMed CentralView ArticlePubMedGoogle Scholar
- Smith JR, Olivier M, Greene AS: Relative quantification of peptide phosphorylation in a complex mixture using 18O labeling. Physiol Genomics 2007, 31: 357–363. 10.1152/physiolgenomics.00096.2007PubMed CentralView ArticlePubMedGoogle Scholar
- Ozer RS, Halpain S: Phosphorylation-dependent localization of microtubule-associated protein MAP2c to the actin cytoskeleton. Mol Biol Cell 2000, 11: 3573–3587.PubMed CentralView ArticlePubMedGoogle Scholar
- Mandelkow E: Alzheimer's disease. The tangled tale of tau. Nature 1999, 402: 588–589. 10.1038/45095View ArticlePubMedGoogle Scholar
- Korbel S, Schumann M, Bittorf T, Krause E: Relative quantification of erythropoietin receptor-dependent phosphoproteins using in-gel 18O-labeling and tandem mass spectrometry. Rapid Commun Mass Spectrom 2005, 19: 2259–2271. 10.1002/rcm.2054View ArticlePubMedGoogle Scholar
- Miyagi M, Rao KC: Proteolytic 18O-labeling strategies for quantitative proteomics. Mass Spectrom Rev 2007, 26: 121–136. 10.1002/mas.20116View ArticlePubMedGoogle Scholar
- Raggiaschi R, Gotta S, Terstappen GC: Phosphoproteome analysis. Biosci Rep 2005, 25: 33–44. 10.1007/s10540-005-2846-0View ArticlePubMedGoogle Scholar
- Yao X, Afonso C, Fenselau C: Dissection of proteolytic 18O labeling: endoprotease-catalyzed 16O-to-18O exchange of truncated peptide substrates. J Proteome Res 2003, 2: 147–152. 10.1021/pr025572sView ArticlePubMedGoogle Scholar
- Efron B, Tibshirani R: An Introduction to the Bootstrap. Chapman & Hall/CRC press, Florida; 1994.Google Scholar
- Tang H, Arnold RJ, Alves P, Xun Z, Clemmer DE, Novotny MV, Reilly JP, Radivojac P: A computational approach toward label-free protein quantification using predicted peptide detectability. Bioinformatics 2006, 22: e481-e488. 10.1093/bioinformatics/btl237View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.