High-throughput peptide quantification using mTRAQ reagent triplex

Background: Protein quantification is an essential step in many proteomics experiments. A number of labeling approaches have been proposed and adopted for mass spectrometry (MS) based relative quantification. mTRAQ, one of the stable isotope labeling methods, is amine-specific and available in triplex format, so sample throughput can be doubled compared with duplex reagents.
Methods and results: Here we propose a novel data analysis algorithm for peptide quantification in triplex mTRAQ experiments. It improves the accuracy of quantification in two ways. First, it identifies and separates the triplex isotopic clusters of a peptide in each full MS scan. We designed a schematic model of triplex overlapping isotopic clusters, and separated the triplex isotopic clusters by solving cubic equations deduced from the schematic model. Second, it automatically determines the elution areas of peptides. Some peptides have similar atomic masses and elution times, so their elution areas can overlap. Our algorithm successfully identified the overlaps and found accurate elution areas. We validated our algorithm using standard protein mixture experiments.
Conclusions: We showed that our algorithm was able to accurately quantify peptides in triplex mTRAQ experiments. Its software implementation is compatible with the Trans-Proteomic Pipeline (TPP), and thus enables high-throughput analysis of proteomics data.


Background
The introduction of mass spectrometry (MS) has provided a wealth of biological information about proteins for both qualitative and quantitative analysis [1]. Recently, quantitative analyses have become of particular interest in proteomics research [2]. To determine differences in protein expression across samples representing different physiological or disease states, various experimental approaches have been developed: spectral counting, stable isotope labeling, and label-free quantification [3].
In this paper, we focus on the isotope label mTRAQ, which is a nonisobaric variant of iTRAQ and was originally designed for multiple reaction monitoring (MRM) [20]. The mTRAQ labels were first designed in two chemically identical versions. The heavy label is identical to the iTRAQ 117 label and its mass is 145 Da. The light label is chemically identical to the heavy label, but it has no 13C or 15N, so its mass is 141 Da. They label lysine residues and peptide N-termini. We previously verified that mTRAQ is a powerful isotope label for MS-based relative quantification [8], and developed a new algorithm to improve the accuracy of peptide quantification in mTRAQ labeling based MS experiments [21]. Recently, mTRAQ has become available in triplex format, in which a label with a mass of 149 Da is added.
One of the major obstacles to accurate peptide quantification is the overlap of isotopic clusters. There are two types of overlap problems: one is the overlap between differently labeled peptides, and the other is the overlap between chemically different peptides. The former can happen when the mass difference between labels is very small. In mTRAQ experiments, the mass difference between differently labeled peptides is 4 Da if the original peptide has no lysine, so it is important to separate their isotopic clusters correctly. The latter can be found in all kinds of MS-based experiments. For peptide quantification, most of the time we are interested in the relative quantification of peptides whose amino acid sequences are known. When we know the sequences of the peptides of interest, there is a better chance of recognizing the overlaps from differential labeling by comparing them to the theoretical isotopic distributions.
In this manuscript, we present a new data analysis algorithm for peptide quantification in triplex mTRAQ experiments. It is an extension of the algorithm for duplex mTRAQ experiments [21]. We identify isotopic clusters of triplex labeled peptides and separate their intensities using cubic equation modelling when there are overlaps. We also designed an automatic determination algorithm for the elution area of peptides, which could recognize the overlap between chemically different peptides. We demonstrate the performance of our algorithm using standard protein mixture experiments.

Mass spectrometric analyses of mTRAQ labeled samples
Labeled sample mixtures were reconstituted in 0.4% acetic acid and an aliquot (~1 μg) was injected onto a reversed-phase Magic C18aq column (15 cm × 75 μm) on an Eksigent multi-dimensional liquid chromatography (MDLC) system at a flow rate of 300 nL/min. The column was equilibrated with 95% buffer A (0.1% formic acid in H2O) + 5% buffer B (0.1% formic acid in acetonitrile) prior to use. The peptides were eluted with a linear gradient of 10 to 40% buffer B over 40 min.
The high performance liquid chromatography (HPLC) system was coupled to a linear trap quadrupole (LTQ) XL-Orbitrap mass spectrometer (Thermo Scientific, San Jose, CA, U.S.A.). The spray voltage was set to 1.9 kV, and the temperature of the heated capillary was set to 250 °C. Survey full-scan MS spectra (m/z 300-2,000) were acquired in the Orbitrap with 1 microscan and a resolution of 100,000, allowing the preview mode for precursor selection and charge-state determination. MS/MS spectra of the five most intense ions from the preview survey scan were acquired in the ion trap concurrently with full-scan acquisition in the Orbitrap with the following options: isolation width, ±10 ppm; normalized

Overview of the algorithm
Our algorithm is designed to be executed within TPP. For each LC/MS experiment, TPP generates a pepXML file which contains a list of peptides with sequences, tandem scans, charges, and modifications. Our algorithm calculates medium to light (M/L) and heavy to light (H/L) ratios of peptides in pepXML files and produces new pepXML files that can be used for further analysis. For each peptide, our algorithm first determines its elution area. It then identifies triplex isotopic clusters and calculates M/L and H/L ratios for each MS scan contained in the elution area. Finally, each of the set of M/L and H/L ratios is integrated based on linear regression.

Model of overlapping isotopic clusters
We made a schematic model of overlapping triplex isotopic clusters, which is an extension of the model in our previous work (Figure 1) [21]. We assumed that an isotopic cluster of a peptide has 8 or fewer peaks. This assumption is reasonable for peptides whose masses are less than 4000 Da because the relative intensity of the ninth peak in the theoretical distribution of a 4000 Da peptide is only 0.56% according to an averagine model [23]. Under this assumption, an overlap exists only if a peptide has no lysine. In this case, the mass difference between labeled peptides is 4 Da, thus the intensity of the kth peak I_k is given as follows:
$$I_k = L_k + M_{k-4} + H_{k-8} \qquad (1)$$
where n is the number of peaks in the isotopic distribution of a peptide, L_k, M_k, and H_k are the intensities of the kth peaks of the isotopic distributions of the light-, medium-, and heavy-labeled peptides, respectively, and each term is taken as zero when its index falls outside the range 1 to n.
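For concreteness, the overlap described above can be written out peak by peak. The following is a worked expansion under the stated assumptions (a peptide without lysine, at most n = 8 peaks per cluster, and a shift of four peak positions between successive labels); it is a sketch of the model rather than a reproduction of Figure 1.

```latex
I_k =
\begin{cases}
L_k,               & 1 \le k \le 4 \\
L_k + M_{k-4},     & 5 \le k \le 8 \\
M_{k-4} + H_{k-8}, & 9 \le k \le 12 \\
H_{k-8},           & 13 \le k \le 16
\end{cases}
```

Only the first four and the last four composite peaks come from a single label; the eight peaks in between each mix two labels, which is why the clusters have to be separated algebraically.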
Let a be the M/L ratio and b be the H/L ratio. For 1 ≤ k ≤ 4, equation (2) follows directly from equation (1). Using equation (2), we derived three equations, (3), (4), and (5). From equations (4) and (5), we obtain a cubic equation for b, equation (6). Solving equation (6), we obtain up to three candidate values for b. Then, by substituting each candidate into equation (3) and solving it, we obtain up to two candidate values for a. (Substituting candidates for b into equation (4) may lead to an abnormal a value because I_{k+12} can be very small and therefore inaccurate. Substituting into equation (5) could also be problematic because a small value of b could cause an inaccurate a value.) To select the most accurate ratio pair, we define an error function in which T_k is the intensity of the kth peak of the theoretical isotopic distribution of the peptide. (The EMASS algorithm was used to calculate the T_k values [24].) The error value should be very small for the correct ratio pair because L_{k+4}/L_k, M_{k+4}/M_k, and H_{k+4}/H_k are theoretically the same as T_{k+4}/T_k. Therefore, we calculate the error value for each candidate pair and select the pair with the lowest error value. After all pairs for 1 ≤ k ≤ 4 are selected, we can calculate the M/L ratio
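The separation step can be illustrated with a small sketch. It assumes the 4-peak offset model with M_k = a·L_k and H_k = b·L_k, so that for 1 ≤ k ≤ 4 the composite peaks are I_k = L_k, I_{k+4} = L_{k+4} + a·I_k, I_{k+8} = a·L_{k+4} + b·I_k, and I_{k+12} = b·L_{k+4}. Eliminating L_{k+4} and a gives the cubic I_k²·b³ − I_k·I_{k+8}·b² + I_{k+4}·I_{k+12}·b − I_{k+12}² = 0, and eliminating only L_{k+4} from the first two mixed peaks gives the quadratic I_k·a² − I_{k+4}·a + (I_{k+8} − b·I_k) = 0. This is one derivation consistent with the description; the exact form of the numbered equations and of the error function is not recoverable here, so the sum of squared ratio differences used below, and the names `real_cubic_roots` and `best_ratio_pair`, are assumptions made for illustration.

```python
import math


def real_cubic_roots(c3, c2, c1, c0):
    """Real roots of c3*x^3 + c2*x^2 + c1*x + c0 = 0 (c3 != 0), by Cardano's method."""
    p, q, r = c2 / c3, c1 / c3, c0 / c3
    big_p = q - p * p / 3.0                       # depressed cubic: t^3 + P*t + Q = 0
    big_q = 2.0 * p ** 3 / 27.0 - p * q / 3.0 + r
    shift = -p / 3.0                              # x = t + shift
    disc = (big_q / 2.0) ** 2 + (big_p / 3.0) ** 3
    if disc > 0:                                  # exactly one real root
        s = math.sqrt(disc)
        u, v = -big_q / 2.0 + s, -big_q / 2.0 - s
        cbrt = lambda w: math.copysign(abs(w) ** (1.0 / 3.0), w)
        return [cbrt(u) + cbrt(v) + shift]
    if big_p == 0:                                # triple root
        return [shift]
    m = 2.0 * math.sqrt(-big_p / 3.0)             # three real roots, trigonometric form
    arg = max(-1.0, min(1.0, 3.0 * big_q / (big_p * m)))
    theta = math.acos(arg) / 3.0
    return [m * math.cos(theta - 2.0 * math.pi * j / 3.0) + shift for j in range(3)]


def best_ratio_pair(i0, i4, i8, i12, t_ratio):
    """Pick (a, b, error) for composite peaks I_k, I_{k+4}, I_{k+8}, I_{k+12}.

    t_ratio is the theoretical T_{k+4}/T_k; i0 must be positive.
    Returns None when no positive candidate pair exists.
    """
    best = None
    # Cubic in b obtained by eliminating L_{k+4} and a.
    for b in real_cubic_roots(i0 * i0, -i0 * i8, i4 * i12, -i12 * i12):
        if b <= 0:
            continue
        # Quadratic in a: i0*a^2 - i4*a + (i8 - b*i0) = 0.
        disc = i4 * i4 - 4.0 * i0 * (i8 - b * i0)
        if disc < 0:
            continue
        root = math.sqrt(disc)
        for a in ((i4 + root) / (2.0 * i0), (i4 - root) / (2.0 * i0)):
            if a <= 0:
                continue
            # Implied L, M, H ratios between peak k+4 and peak k.
            ratios = ((i4 - a * i0) / i0,
                      (i8 - b * i0) / (a * i0),
                      i12 / (b * i0))
            # Assumed error: squared deviation from the theoretical ratio.
            err = sum((x - t_ratio) ** 2 for x in ratios)
            if best is None or err < best[2]:
                best = (a, b, err)
    return best
```

For example, synthetic peaks built from L_k = 100, L_{k+4} = 10, a = 2, and b = 0.5 give I = (100, 210, 70, 5). The cubic then has three positive roots, each of which solves the three overlap relations exactly, and only the comparison with the theoretical ratio 0.1 singles out the pair (2, 0.5).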

Determination of the elution areas of peptides
In most LC/MS experiments, tandem MS scans are acquired using dynamic exclusion (DE). For each MS/MS scan, therefore, we know only one MS scan where the identified peptide is eluted. We need to determine the elution area of the peptide because it is eluted over a period of time. However, some peptides have similar atomic masses and elution times, so their elution areas can overlap. A naive approach such as using a fixed range (e.g. within ±30 s of the tandem scan of a peptide) risks including incorrect MS scans in which other peptides overlap. Therefore, it is very important to determine accurate elution areas of the peptides for accurate relative quantification.
We assume that the distribution of peptide elution time can be approximated as a normal distribution. Because of noise and overlap of peptides, MS scans with low intensities at both ends of the elution area may not be trustworthy. If we use only MS scans with high total ion current while modeling the elution profile as a normal distribution, the mean μ of the normal distribution can be approximated, but the variance σ² cannot. Instead, we use the full width at half maximum (FWHM) to derive σ, using the relation $\mathrm{FWHM} = 2\sqrt{2\ln 2}\,\sigma$ that holds for a normal distribution.
When a peptide identification and the associated tandem MS scan are given, our algorithm first finds the maximum point of the peptide's elution profile. For each MS scan within ±30 s of the given tandem scan, it identifies the triplex isotopic clusters and calculates the sum of their intensities. (Details are explained in the next section.) The MS scan whose sum of intensities is the highest is selected as the maximum point of the elution area. The algorithm then extends the elution area while the sum of intensities of the MS scan is above half of that of the maximum point. The length of the extended area is used as the FWHM, and the intensity-weighted average time of the scans in the extended area is used as μ. The area with intensities higher than 10% of the maximum intensity in the normal distribution (from $\mu - \frac{\mathrm{FWHM}}{2}\sqrt{\ln 10/\ln 2}$ to $\mu + \frac{\mathrm{FWHM}}{2}\sqrt{\ln 10/\ln 2}$) is used as the elution area of a peptide. An example of the approximation to a normal distribution is shown in Figure 2.
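The elution-area procedure can be sketched as follows. This is a minimal illustration, assuming the caller has already restricted the input to the MS scans within ±30 s of the tandem scan and has computed the summed triplex intensity for each scan; the function name `elution_window` and its arguments are hypothetical. For a normal profile, the 10% level lies at μ ± (FWHM/2)·sqrt(ln 10 / ln 2), which is about μ ± 0.91·FWHM.

```python
import math


def elution_window(times, intensities, level=0.10):
    """Estimate (start, end) of the elution area from per-scan summed intensities.

    times       -- retention times of the candidate MS scans, in ascending order
    intensities -- summed triplex cluster intensity of each scan
    level       -- fraction of the apex intensity that bounds the area (10% in the text)
    """
    # Apex: the scan with the highest summed intensity.
    apex = max(range(len(times)), key=lambda i: intensities[i])
    half = intensities[apex] / 2.0

    # Extend in both directions while scans stay above half of the apex.
    lo = hi = apex
    while lo > 0 and intensities[lo - 1] > half:
        lo -= 1
    while hi < len(times) - 1 and intensities[hi + 1] > half:
        hi += 1

    # The extended span is used as the FWHM, its weighted mean time as mu.
    fwhm = times[hi] - times[lo]
    weights = intensities[lo:hi + 1]
    mu = sum(t * w for t, w in zip(times[lo:hi + 1], weights)) / sum(weights)

    # Half-width of the region where a normal profile exceeds `level` of its maximum.
    half_width = (fwhm / 2.0) * math.sqrt(math.log(1.0 / level) / math.log(2.0))
    return mu - half_width, mu + half_width
```

Because the FWHM is measured between discrete scans, it is only as precise as the scan spacing, which is one reason to treat the result as an approximation of the true elution area.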
Our algorithm calculates M/L and H/L ratios for all MS scans in the elution area. Then, each set of M/L and H/L ratios is integrated by linear regression of the form y = cx. The peak intensities are split into the intensities of the light-, medium-, and heavy-labeled peptides. We estimate c using the intensities of the light-labeled peptide as the x_i values, and the intensities of the medium- and heavy-labeled peptides as the y_i values for the M/L and H/L ratios, respectively.
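A regression of the form y = cx has a closed-form slope. The sketch below assumes ordinary least squares, for which c = Σ x_i·y_i / Σ x_i²; the text does not state which fitting criterion was used, and the function name `ratio_through_origin` is hypothetical.

```python
def ratio_through_origin(light, labeled):
    """Least-squares slope c of y = c*x, with no intercept term.

    light   -- per-scan intensities of the light-labeled peptide (x_i)
    labeled -- per-scan intensities of the medium- or heavy-labeled peptide (y_i)
    """
    sxx = sum(x * x for x in light)
    if sxx == 0:
        raise ValueError("light-labeled intensities are all zero")
    sxy = sum(x * y for x, y in zip(light, labeled))
    return sxy / sxx
```

Compared with averaging per-scan ratios, this fit weights scans by intensity, so low-intensity scans at the edges of the elution area have less influence on the final ratio.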

Identification and validation of triplex isotopic clusters
For each MS scan in the elution area, our algorithm identifies the isotopic clusters of a target peptide. Let MZ_k be the m/z of the kth peak of an isotopic cluster; then we can calculate the three MZ_1 values corresponding to the triplex isotopic clusters from the given sequence, charge z, and modification. Our algorithm first finds the monoisotopic peak of each isotopic cluster from MZ_1 within a 10 ppm error tolerance. Then, it finds subsequent isotopic peaks from MZ_k = MZ_{k-1} + 1.00235/z within a 10 ppm error tolerance. The kth peak is inserted into the isotopic cluster only if the peak improves the least squares fit value (LSQ). If the LSQ between the theoretical distribution of the peptide and the isotopic cluster without the kth peak is lower than that with the kth peak, the kth peak is discarded and the algorithm does not look for any more peaks. If there are two or more candidate peaks for the kth peak, the peak with the lowest LSQ is selected. For example, in Figure 3a there are two candidates for the third isotopic peak of the light-labeled isotopic cluster of the target peptide, and the smaller peak is selected.
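The peak lookup described above can be sketched as follows. This is an illustration only: it assumes each MS scan is available as a list of (m/z, intensity) pairs and that expected positions are chained from the theoretical MZ_1, and it leaves out the LSQ-based acceptance step because the LSQ formula is not given. The names `expected_mz` and `candidate_peaks` are hypothetical.

```python
ISOTOPE_SPACING = 1.00235  # mass spacing between isotopic peaks used in the text


def expected_mz(mz1, z, k):
    """Expected m/z of the kth isotopic peak (k = 1 is the monoisotopic peak)."""
    return mz1 + (k - 1) * ISOTOPE_SPACING / z


def candidate_peaks(peaks, target_mz, tol_ppm=10.0):
    """Return all (mz, intensity) peaks within tol_ppm of target_mz."""
    tol = target_mz * tol_ppm * 1e-6
    return [(mz, inten) for mz, inten in peaks if abs(mz - target_mz) <= tol]
```

When `candidate_peaks` returns more than one entry, the text resolves the ambiguity by keeping the candidate that yields the lowest LSQ.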
After identification of the triplex isotopic clusters of a target peptide, we check them and discard the current MS scan if they are doubtful according to the following criteria. First, we check whether the overall shape of each isotopic cluster resembles that of the theoretical isotopic distribution. At a minimum, the LSQ of the most abundant isotopic cluster must be below a threshold (e.g. 0.2). The LSQ values of the others should also be below the threshold unless their sums of intensities are lower than half of that of the most abundant isotopic cluster. (If an isotopic cluster has low abundance, its shape could be abnormal because of interference from chemical noise and other peptides.) Second, we check whether the identified isotopic cluster overlaps with that of another peptide. Four types of overlaps are shown in Figure 3. There is no problem if no isotopic peak is shared by two isotopic clusters (Figure 3a). If an isotopic cluster with a different charge overlaps, the LSQ of the identified isotopic cluster should be significantly high, so we can discard the current MS scan (Figure 3b). If an isotopic cluster with the same charge and a higher mass overlaps, the shared isotopic peaks are not inserted into the isotopic cluster of the target peptide because they would increase its LSQ (Figure 3c). Only the case in which an isotopic cluster with the same charge and a lower mass overlaps needs additional filtering (Figure 3d). We can easily detect these overlaps by examining the preceding peaks, but we cannot separate the overlapping isotopic clusters in this case because they look like one isotopic cluster. Therefore, we discard the current MS scan if at least one isotopic cluster of a target peptide could be identified as part of an isotopic cluster with the same charge and a lower mass.
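The first criterion (the shape check) can be expressed as a short predicate. The sketch below assumes the LSQ and summed intensity of each of the three clusters have already been computed; the function name `passes_shape_check` and the tuple layout are hypothetical, and 0.2 is only the example threshold mentioned in the text.

```python
def passes_shape_check(clusters, threshold=0.2):
    """Return True if the scan survives the shape criterion.

    clusters -- list of (lsq, summed_intensity), one per labeled cluster
    A cluster must have lsq below the threshold unless its summed intensity
    is lower than half of that of the most abundant cluster.
    """
    top = max(total for _, total in clusters)
    for lsq, total in clusters:
        low_abundance = total < top / 2.0
        if not low_abundance and lsq >= threshold:
            return False
    return True
```

The most abundant cluster is never exempt, since its own intensity cannot be below half of the maximum, which matches the requirement that at least that cluster fits the theoretical distribution.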

Results and discussion
Application to 7-standard protein data mixed with known ratios
We analyzed two datasets in which seven standard proteins were mixed in different ratios. For the Set1 experiment, Std1 was labeled with light, Std2 with medium, and Std3 with heavy. For the Set2 experiment, Std1 was labeled with heavy, Std2 with medium, and Std3 with light. The expected ratios for each experiment are shown in Table 2.
After validation, we obtained 147 MS/MS scans from Set1 and 139 MS/MS scans from Set2, resulting in 168 unique peptides in total. We calculated the M/L and H/L ratios of the peptides and grouped them by protein. We then calculated the average ratio in each case and compared it to the expected ratio (Table 2). The M/L ratios were generally similar to the expected ratios, except for CSN2 and CSN1S2, whose ratios were somewhat higher than expected. Most H/L ratios were somewhat lower than the expected ratios, but their standard deviations were small. We manually inspected the isotopic clusters of these peptides and concluded that the computed ratios are correct despite their discrepancy from the expected ratios. Some examples of these cases are shown in Figure 4. In spite of our effort to label the samples and to mix them accurately, the mixed ratios of the samples may deviate from the expected ratios because of different labeling efficiencies between the labels and experimental errors such as unequal mixing. The average ratio and the standard deviation are the two parameters that determine the accuracy of our quantification analysis. Unlike the average ratio, which is very sensitive to such errors, the standard deviation is more robust because the ratios originating from peptides of the same protein should be identical in theory. Therefore, the low standard deviations give strong evidence that our computed ratios were accurately determined. Figure 5 shows the ratios for a subset of the proteins; those for the other proteins are given in Supplementary Figure S1 in Additional file 1.

Separation of overlapping triplex isotopic clusters
To verify the robustness of our method against the overlap of triplex isotopic clusters, we prepared the Set3 experiment, in which Std1 labeled with light and Std3 labeled with heavy were mixed. In the Set3 experiment, the two isotopic clusters originating from the same peptide have no overlap, and their relative ratio can be computed accurately even when the peptide has no lysine. Therefore, we can show the robustness of our method by comparing the H/L ratios from the Set1 experiment to those from the Set3 experiment. Fifteen unique peptides were identified in both experiments and their H/L ratios are shown in Figure 6. The H/L ratios in the Set1 experiment were very close to the H/L ratios in the Set3 experiment in spite of the interference of the medium-labeled peptides. The ratio between the two H/L ratios ranged from 0.868378 to 1.315178, except for two peptides from CSN2, whose expected L:M:H ratio in the Set1 experiment was 5:10:1. The H/L ratios of the CSN2 peptides were somewhat lower in the Set1 experiment than in the Set3 experiment because the medium-labeled isotopic cluster was much larger than the heavy-labeled isotopic cluster and influenced it (Figure 4a).

Cause of low abundance of heavy-labeled peptides
Std1 and Std3 were labeled with light and heavy mTRAQ labels, respectively, in the Set1 experiment and vice versa in the Set2 experiment. The calculated H/L ratios were lower than the estimated values in both cases, which excludes the possibility of under-digestion of some of the standard mixtures compared to the others. If that were the case, we would expect reversed H/L ratios between the two experimental sets. This becomes even more evident if we consider the MS/MS search results, in which only one out of 168 validated peptides was identified as partially labeled. The most probable explanation at this point is low labeling efficiency of the heavy reagent. If we assume that the M/L ratios are correct, we can approximate the H/L ratios in the Set1 experiment using the M/L ratios in the Set2 experiment. Similarly, we can approximate the H/L ratios in the Set2 experiment using the M/L ratios in the Set1 experiment. We compared them with the computed H/L ratios and observed that the computed H/L ratios are consistently 50-70% of the approximated H/L ratios, except for CYCS in the Set1 experiment (Table 3). This result suggests that the heavy reagent had low labeling efficiency. The discrepancy can also be explained, though only in part, by isotopic impurity of the heavy label. Upon closer inspection of the MS spectra of the identified peptides, a peak 1 Da smaller than the monoisotopic peak of the heavy label was frequently found (Supplementary Figure S2 in Additional file 1). It was reported that iTRAQ reagents contain trace levels of isotopic impurities [25]. Since mTRAQ shares the same chemical structure with iTRAQ, we expect that the same problem will occur in mTRAQ data analysis.
In real experiments where quantification of complex proteome is needed, one can add a known standard at the ratio of 1:1:1, and use the calculated ratio of the standard as a correction factor. For example, if the calculated ratio of LALBA in the current study is used as a correction factor, the ratios of other proteins become closer to the expected ratios.
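The correction described above amounts to a simple rescaling. The sketch below assumes the correction is multiplicative, i.e. each observed ratio is divided by the ratio measured for the spiked standard and multiplied by the ratio at which the standard was actually mixed; the function name `correct_ratio` is hypothetical.

```python
def correct_ratio(observed, standard_observed, standard_expected=1.0):
    """Rescale an observed ratio using a standard of known mixing ratio.

    observed          -- measured M/L or H/L ratio of the protein of interest
    standard_observed -- measured ratio of the spiked standard (same channels)
    standard_expected -- ratio at which the standard was mixed (1.0 for 1:1:1)
    """
    if standard_observed <= 0:
        raise ValueError("standard ratio must be positive")
    return observed * standard_expected / standard_observed
```

For instance, if a standard mixed at 1:1:1 yields a measured H/L of 0.6, an observed H/L of 0.3 for another protein would be corrected to 0.5.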

Conclusions
We have developed a new data analysis algorithm for peptide quantification in triplex mTRAQ experiments. It can calculate the ratios of peptides accurately by separating overlapping triplex isotopic clusters based on an arithmetic model of isotope overlap and by automatically determining the elution areas of peptides. When used within TPP, it can easily analyze high-throughput proteomics data.