BPDA  A Bayesian peptide detection algorithm for mass spectrometry
 Youting Sun^{1},
 Jianqiu Zhang^{2}Email author,
 Ulisses BragaNeto^{1} and
 Edward R Dougherty^{1, 3, 4}Email author
https://doi.org/10.1186/1471210511490
© Sun et al; licensee BioMed Central Ltd. 2010
Received: 1 June 2010
Accepted: 29 September 2010
Published: 29 September 2010
Abstract
Background
Mass spectrometry (MS) is an essential analytical tool in proteomics. Many existing algorithms for peptide detection are based on isotope template matching and usually work at different charge states separately, making them ineffective to detect overlapping peptides and low abundance peptides.
Results
We present BPDA, a Bayesian approach for peptide detection in data produced by MS instruments with high enough resolution to baselineresolve isotopic peaks, such as MALDITOF and LCMS. We model the spectra as a mixture of candidate peptide signals, and the model is parameterized by MS physical properties. BPDA is based on a rigorous statistical framework and avoids problems, such as voting and adhoc thresholding, generally encountered in algorithms based on template matching. It systematically evaluates all possible combinations of possible peptide candidates to interpret a given spectrum, and iteratively finds the best fitting peptide signal in order to minimize the mean squared error of the inferred spectrum to the observed spectrum. In contrast to previous detection methods, BPDA performs deisotoping and deconvolution of mass spectra simultaneously, which enables better identification of weak peptide signals and produces higher sensitivities and more robust results. Unlike templatematching algorithms, BPDA can handle complex data where features overlap. Our experimental results indicate that BPDA performs well on simulated data and real MS data sets, for various resolutions and signal to noise ratios, and compares very favorably with commonly used commercial and opensource software, such as flexAnalysis, OpenMS, and Decon2LS, according to sensitivity and detection accuracy.
Conclusion
Unlike previous detection methods, which only employ isotopic distributions and work at each single charge state alone, BPDA takes into account the charge state distribution as well, thus lending information to better identify weak peptide signals and produce more robust results. The proposed approach is based on a rigorous statistical framework, which avoids problems generally encountered in algorithms based on template matching. Our experiments indicate that BPDA performs well on both simulated data and real data, and compares very favorably with commonly used commercial and opensource software. The BPDA software can be downloaded from http://gsp.tamu.edu/Publications/supplementary/sun10a/bpda.
Keywords
Background
Mass spectrometry (MS) is a key analytical tool in proteomics. A mass spectrometer measures the concentration of ionized molecules at a range of masstocharge ratios (m/z). MS instruments consist of three modules: an ionization source, a mass analyzer and a detector which captures the ions and measures the intensity of each ion species. Widely used ionization methods include electrospray ionization (ESI) [1] and matrixassisted laser desorption/ionization (MALDI) [2, 3]. Mass analyzers separate the ions according to their masstocharge ratios. There are several types of mass analyzers including the Orbitrap [4], Quadrupole [5], TimeofFlight (TOF) [6, 7], and fourier transform ion cyclotron resonance (FTICR) [8]. Liquid Chromatography (LC) is often coupled with MS to achieve additional separation of peptides and thus reduce the complexity of an individual mass spectrum. Before entering the mass spectrometer, peptide species pass through a LC column with different speeds depending on their physicochemical properties and interactions with the solvent [9]. A single LCMS experiment usually produces hundreds to thousands of mass spectra sampled during the LC elution process.
Peptide detection, which converts raw spectra to a list of peptide masses, is usually the first step in protein MS data processing. It directly affects the accuracy of subsequent analyses such as protein identification and quantification, data alignment between multiple experiments, biomarker discovery and classification of different samples. One difficulty in peptide detection is that a peptide species may register several peaks in the spectra due to the following two points: First, a peptide species may take different numbers of charges during ionization, therefore its peaks can be observed at different charge states. Second, at a given charge state, several peaks can be observed due to heavy isotopes (e.g. ^{13}C), which are commonly referred to as isotopic peaks or the isotope series. The isotopic distribution of a peptide can be determined by the elemental formula of the peptide and the natural abundance of heavy isotopes, and therefore known [10]. When mass spectra have low resolution in which isotopic peaks cannot be baseline resolved (i.e. the isotopic peaks convolve together to form isotope envelopes, and only one peak can be observed for one peptide at a given charge state), and when peptides are singly charged as commonly observed in MALDI, to report each detected peak as a peptide feature might be sufficient as in [11–14]. But for high resolution spectra, reporting each observed peak as a unique peptide species would give rise to too many false positives. Thus a variety of algorithms for deisotoping and charge states deconvolution have been proposed. Many of these algorithms such as PepList [15], msInspect [16], Noy's method [17], Decon2LS [18], and OpenMS [19] are based on template matching. Templates employed in the first four algorithms are based on 1 D theoretic isotope patterns predicted from peptide masses [20–22], while the last algorithm combines isotope patterns (along the m/z dimension) with elution peaks (along the elution time dimension) to form 2 D templates. If the observed signal (a cluster of peaks) matches the proposed template well  the quality of the match is assessed by a fitting score  it will be reported as a feature and then subtracted from the spectrum. The process iterates until no more matches can be found. The major problem of this matching and subtraction process is that it may be ineffective to detect overlapping peptides. In the case of overlapping (e.g. one doubly charged peptide can overlap with a singly charged peptide of half the mass), if the peak cluster of one peptide is incorrectly matched and subtracted, the rest of the peptides can not be detected correctly based on the remaining spectrum, which will cause error propagation. Besides, each predicted template is based on a single peptide and it can not match the observed overlapping peaks well, which renders a low quality match and reduces the sensitivities of these algorithms. In addition to these algorithms based on template matching, Du et al. developed an algorithm based on variable selection [10]. The key idea is to select the least number of candidate isotope series to explain the spectrum, and hence find the corresponding peptides. But the superimposed criterion "selecting the least number of candidates" is not justified. In fact, while this criterion may result in a desirable reduction of the false detection rate, the sensitivity may be reduced as well. Zhang et. al. [23] proposed a Bayesian approach for peptide ion peak detection. A model for the m/z interval of one dalton was developed, a Bayesian approach was applied to estimate the model parameters based on the observed spectrum, and the existence probability of a peptide ion peak at each charge state and isotope position was calculated. Note that this algorithm did not perform peptide level peak detection as what we are proposing in this paper. The authors reported that their method had better sensitivity results than the wavelet based algorithms when tested by simulated data and eight sets of real prOTOF MS data. In this paper, we propose a Bayesian Peptide Detection Algorithm (BPDA), which is basically an extension of Zhang's method. The difference being that a model for the whole spectrum is developed, and both isotope patterns and charge state distributions of peptides are considered in our method.
The proposed method, BPDA, can be applied to data generated by MS instruments with mass resolutions high enough to baselineresolve isotopic peaks. BPDA evaluates all possible combinations of possible peptide candidates (originated from welldefined peaks of the raw spectrum  see Methods section for more details) to interpret a given spectrum, and iteratively finds the best fitting peptide parameters (peptide peak heights, existence probabilities, etc.) in order to minimize the mean squared error of the inferred spectrum to the observed spectrum. BPDA offers the following four advantages: Firstly, since BPDA looks for the optimal among all possible interpretations of the MS spectra, the procedure is thus systematic. In contrast, the aforementioned templatematching methods are greedy. They cannot evaluate all potential interpretations of a given spectrum. Hence, they are neither systematic nor optimal. Besides, many methods do not include all possible peptide candidates, especially the ones with low abundance in the first place, and the sensitivity is reduced compared to BPDA. Secondly, BPDA considers all charge states and isotopic peaks of peptides for detection. It is noted that multiply charged peptides can register peaks at several charge states, but deisotoping and charge state deconvolution are often dealt with separately in many algorithms [10, 16–19, 24]. While high abundance charge states may be correctly detected, low abundance charge states might be missed or wrongly assigned, rendering low sensitivity results in peptide identification and inaccuracy in peptide quantification. In contrast, BPDA combines the information of isotopic peaks at different charge states as a whole to detect one peptide, lending information to better identify weak peptide signals. Thirdly, BPDA provides existence probabilities for all the peptides considered, as opposed to the fitting scores generally provided by templatematching methods, the benefits being that the existence probabilities can be directly used for probabilitybased evaluation of peptides and confident peptide detection similar to that of PeptideProphet [25], which is a popular software used for LCMS/MS peptide identification. Finally, most of the parameters in the proposed method possess a clear physical meaning, since they come directly from the observation of the mass spectra. In contrast, many other approaches require the selection of numerous nonintuitive parameters, such as wavelet functions and coefficients [11–13].
Methods
For 1 D MS spectrum, we first perform spectrum preprocessing to remove the baseline, filter the noise and generate a list of peptide candidates. Then BPDA is applied based on the developed MS model to infer the best fitting peptide signals of the observed spectrum, the results being peptide abundances, existence probabilities and so on. For 2 D LCMS spectra, we first detect peptide elution peaks along the retention time dimension, and build elution peak groups by collecting the peaks which have similar retention time together using a method similar to [24]. Each group contains a series of consecutive spectra, which are then averaged to form a mean spectrum. The rationale of using a mean spectrum to represent the group is that the noise of consecutive spectra could be canceled out to a certain degree [11]. The BPDA algorithm is then applied to each of the mean spectra, and finally an overall peptide list is generated. The details of the preprocessing step, the proposed MS model, and the BPDA algorithm are described in the following subsections.
Spectrum preprocessing and obtaining peptide candidates
where mass is the mass of one peptide candidate, m_{ pc } is the mass of one positive charge and m_{ nt } is the mass shift caused by addition of one neutron. Due to mass defect, the mass shift varies for different elements. We approximate m_{ nt } using the mass shift from ^{13}C to ^{12}C, which is 1.0034, since Carbon contributes most to the isotope patterns. This approximation works well if the mass calibration of the instrument is correct. The parameters cs and iso are user defined maximum numbers of considered charge states and isotopic positions, respectively. It is easy to see from the above equation that each detected peak gives rise to cs × (iso + 1) different peptide candidates (masses). These candidates exhaust all the possibilities to generate the peak with centroid d, but it does not follow that all the candidates really exist in the sample. Therefore, our primary goal in peptide detection is to find the existence probability of each peptide candidate. Also note that the total number of candidates should be less than or equal to cs × (iso + 1) × number of detected peaks, as is possible that multiple peaks yield the same candidate mass.
Modeling the mass spectrum
where the peak shape function is given by $f\left({x}_{m};{\rho}_{k,ij},{\alpha}_{k,ij}\right)={e}^{\u2013{\rho}_{k,ij}{({x}_{m}\u2013{\alpha}_{k,ij})}^{2}}$. That is, the peak is modeled as Gaussianshaped, as in [27]. It is reported that the Gaussianshaped peak approximates the reality well enough to obtain good detection results [17]. Still, this peak shape function can be adjusted for different instruments without affecting the overall structure of the algorithm.
In the above three equations, x_{ m } is the m th masstocharge ratio (m/z) in the spectrum, y_{ m } is the observed intensity at x_{ m } , M is the number of observations, and ϵ_{ m } is Gaussian random noise with zero mean and standard deviation σ. The value of can be approximated by the standard deviation of the background region in the spectrum. Note that we model ϵ_{ m } as additive Gaussian which is generally a good model for the thermal noise in electronic instruments. There are reports of nonGaussian noise in FTMS [28] and thus it is safer to apply the proposed algorithm to TOF MS instruments [29]. The parameters of the k th candidate, namely, α_{ k,ij } , ρ_{ k,ij } , λ_{ k } and c_{ k,ij } are discussed in detail below:

α_{ k,ij }is the theoretic centroid (m/z value) of the peak generated by candidate k, at charge state i and isotopic number j.${\alpha}_{k,ij}=\frac{mas{s}_{k}+i{m}_{pc}+j{m}_{nt}}{i},i=1,2,...,cs,j=0,1,...,iso,$(4)
where mass_{ k } is the mass of the k th candidate. Since the candidate's mass is already obtained, α_{ k }_{, ij}can be calculated.

ρ_{ k,ij }relates to the shape (width) of the peak centered at α_{ k,ij }. It can be estimated by using its relationship to the peak's Full Width at Half Maximum (FWHM): ${\rho}_{k,ij}=2\sqrt{2\mathrm{ln}2}/\text{FWHM}$.

λ_{ k }is an indicator random variable, which is 1 if the k th peptide candidate truly exists in the sample and 0 otherwise.

c_{ k,ij }is the height (i.e. intensity) of the peak generated by peptide k, at charge state i and isotopic number j.
In summary, the model considers peaks at different isotopic positions and charge states simultaneously for each peptide candidate, incorporating candidates' existence probabilities and the spectrum thermal noise.
Bayesian peptide detection
be the set of all the unknown model parameters. The goal of our algorithm is to determine the value of θ based on the observed spectrum y = [y_{1},..., y_{ M } ] ^{ T } . In fact, the value of λ_{ k } is of our prime interest for the peptide detection problem. For this purpose, we can use a Bayesian approach to first obtain the a posteriori probability (APP) of all the parameters, P (θ y). Then the APPs P (λ_{ k }y), k = 1, ..., N, can be obtained by integration of the joint posterior distribution P (θ y) over all parameters except λ_{k}. Clearly, the calculation involves high dimension integration which is not an easy task. Besides, due to the highly nonlinear nature of the data model, none of the desired APPs can be obtained analytically. To overcome the computational obstacle, we resort to the Gibbs sampling method [30], which is a variant of the Markov Chain Monte Carlo (MCMC) approach [31], to sample the model parameters.
Gibbs sampling is an iterative scheme, which uses the popular strategy of divideandconquer to sample a subset of parameters at a time while fixing the rest at the sample values from the previous iteration, as if they were true. In other words, for the l th parameter group θ _{ l } , we sample from the conditional posterior distribution P( θ _{ l } θ_{l}, y), where θ _{ l } ≜ θ\θ_{ l } . After this sampling process iterates among the parameter groups for a sufficient number of cycles (which is referred to as the "burnin" period), convergence is reached. The samples collected afterwards are shown to be from the marginal posterior distribution P( θ _{ l } y), which is independent of θ_{ l } , and thus these samples can be used to estimate the target parameters.
The Gibbs sampling process for the k th peptide candidate and the derivations of the conditional posterior distributions of important model parameters are briefly summarized below. The detailed derivations can be found in Additional file 1.

Sample the peak height vector c_{k} ≜ [c_{ k,ij; } i = 1,..., cs, j = 0,..., iso]^{ T }for the k th peptide candidate
where ${\theta}_{{\text{c}}_{k}}\triangleq \theta \backslash {c}_{k}$.
The derivations of the likelihood, the prior distribution and the conditional posterior distribution of c_{ k } are given in Additional file 1.
• Sample the peptide existence indicator variable λ _{ k }
where ${\theta}_{{\lambda}_{k}}\triangleq \theta \backslash {\lambda}_{k}$.
The derivations of the likelihood, the prior distribution and the conditional posterior distribution of λ_{ k } are given in Additional file 1
The complexity of the proposed Gibbs sampling algorithm is determined by two factors: (1) the sheer number of peptide candidates, and (2) the correlation between parameters that need to be sampled. The algorithm complexity grows exponentially with the number of peptide candidates, and the correlation between parameters reduces the sampling efficiency. To address these two issues, we first partition nonoverlapping peptide candidates into different groups. The proposed algorithm can be applied to each group in a parallel manner and the algorithm complexity is reduced, because within each group the number of candidates is reduced, and the corresponding signalcontaining spectrum region is restricted. Peptide candidates within each group are then clustered by the kmeans clustering algorithm [32], the distance measure being the correlation between peptide candidate signals. Peptide candidates within a cluster have strong correlations among each other, and their indicator variables are sampled from the joint conditional posterior distribution. These two measures improve the overall efficiency of the algorithm. The pseudocode of the entire Gibbs sampling process is given in Additional file 2: Table S1.
where r_{0} is the first iteration after convergence is reached, R is the total number of iterations, and ${\lambda}_{k}^{r}$ is the sample value of λ_{ k } in the r th iteration. The k th peptide candidate is said to be detected if its existence probability P(λ_{ k } = 1y) is greater than a predefined threshold.
Results
We report below the observed performance of BPDA, side by side with wellknown tools, such as OpenMS and Decon2LS, in a number of experiments using both synthetic and real data.
Synthetic data
It is difficult to evaluate the performance of a given detection method using real data due to the existence of unpredictable contaminants and the unknown true composition of the samples. The merit of using simulated data is that the ground truth is known and thus algorithm evaluation can be carried out [27, 29].
Synthetic 20mix spectra with different abundance levels (SNRs)
First, to test the robustness of our algorithm, we generated MS data sets with different signal to noise ratios (SNRs), using the method described in [27]. In fact, the mean signal strength (i.e., peptide abundance) was varied while the noise level (i.e., the mean and variance of the noise) was fixed. For each peptide abundance level a, a ∈ {500, 2500, 12500}, the simulation was repeated 50 times. In each repetition, 20 true peptides (with abundance level a and masses randomly selected from a qualitycontrol Shewanella Oneidensis data set provided by PNNL [33]) served as the input of the data model given by Eq. 3. The charge state distribution of one peptide was modeled by a binomial distribution, which was reported to approximate the real data well [27]. The isotopic distribution was obtained for each peptide by using the Averagine model [22] and the Mercury algorithm [21]. The output consists of a simulated mass spectrum. BPDA was applied to obtain the peptide existence probabilities and abundance results. Its performance was evaluated by the classic Receiver Operating Characteristic (ROC) curve. To obtain the ROC curve, first a series of detection levels τ ranging from 0 to 1 with 0.001 increments was selected. Peptides with existence probabilities not less than τ were said to be detected at this specific detection level. The True Positive Rate (TPR) and False Positive Rate (TPR) were then calculated at each detection level as follows:
Synthetic 10mix spectrum with overlapping peptides
Results for synthetic 10mix spectrum with overlapping peptides
BPDA  OpenMS  

True Mass (Da)/Intn/CS  dM (Da)/Intn/CS  dM (Da)/Intn/CS 
1264.279/0.034/13  0.0065/0.032/13  NA 
1266.383/0.103/13  0.0025/0.110/13  0.0025/0.156/13 
1382.247/0.171/14  0.0028/0.181/14  0.0031*/0.228/13 
1388.367/0.114/14  0.0073/0.097/14  0.0046/0.150/13 
1293.323/0.006/13  0.0081/0.007/12  NA 
1294.345/0.008/13  0.0124/0.008/13  0.0033/0.018/12 
1312.441/0.229/14  0.0018/0.247/14  0.0019*/0.334/14 
1313.451/0.183/14  0.0061/0.173/14  NA 
1327.386/0.080/14  0.0035/0.067/13  0.0061*/0.114/13 
1329.378/0.072/14  0.0035/0.078/14  NA 
We remark that Decon2LS results are missing from both synthetic experiments described previously because the synthetic data could not be loaded, causing the program to crash (the data was contained in a mzXML file converted from a 3column text file by the OpenMS FileConverter tool, whose format was successfully verified against mzXML version 2.1). We contacted Decon2LS's developers, but did not hear from them in time to have the Decon2LS results included.
Real data
In this section we report results from experiments carried out with real MS data. The test data and parameter files used for different software tools were provided as supplementary files on the BPDA project website. We stick mainly to the recommended parameter values while only adjusted a few parameters such as mass range and detection level to adapt to each data set.
MALDITOF MS 7mix spectrum
Results for the MALDITOF MS 7mix spectrum
BPDA  OpenMS  Decon2LS  Bruker  

True Masses (Da)  dM (Da)/Intn  dM (Da)/Intn  dM (Da)/Intn  dM (Da)/Intn 
1045.535  0.023/0.550  0.019/0.655  0.021/0.615  0.023/0.532 
1295.678  0.003/0.173  0.026/0.232  0.002/0.168  0.001/0.167 
1346.728  0.017/0.053  0.040/0.070  0.013/0.050  0.011/0.052 
1618.815  0.035/0.178  NA  0.024/0.137  0.022/0.202 
2092.079  0.021/0.004  0.021/0.009  NA  NA 
2464.191  0.012/0.042  0.020/0.034  0.007/0.030  0.009/0.047 
3146.464  NA  NA  NA  NA 
Highresolution LCMS data set MyoLCMS
The preparation of the MyoLCMS data set is detailed as below: the data set was collected from an overnight tryptic digest of horse myoglobin. Capillary liquid chromatographymass spectrometry (cLC/MS) was performed with a splitless nanoLC2 D pump (Eksigent), a 50 mmi.d. column packed with 10 cm of 5 mmo.d. C18 particles, nanoelectrospray and a highresolution timeofflight mass spectrometer (MicrOTOF; Bruker Daltonics). The cLC gradient was 2 to 98% 0.1% formic acid/acetonitrile in 172 seconds at 400 nL/min. Sample was injected at a concentration of 60 fmol/mL with an injection volume of 10 mL (600 fmol injected oncolumn).
Discussion
We observed in our experiments that BPDA performs well on both simulated data and real data, for various SNRs and resolutions, and in complex cases where features overlap.
For the synthetic 20mix experiment, we observe in Figure 1 that the sensitivity (i.e., TPR) of BPDA was consistently higher than that of OpenMS for each abundance level, and both methods gave better sensitivity results as the abundance level (i.e., SNR) increased. Also it is observed that BPDA was quite robust for different SNRs. For the synthetic 10mix experiment with overlapping peptides, we saw that BPDA detected all the peptides at a small falsepositive rate FPR = 0.1, with very small mass deviations and quite accurate abundance results, and nearly all the charge states of the 10 true peptides were correctly reported. In contrast, at FPR = 0.1, OpenMS could detect only a few of the peptides. The abundance results given by OpenMS were not very close to those of the true peptides. Also OpenMS could only detect about half of the charge states.
The results obtained with real data corroborated the findings made with the synthetic experiments. For the MALDITOF MS 7mix data, the four algorithms yielded similar intensity results, but BPDA was the only one to detect six out of the seven peptides. For the MyoLCMS experiment, we focused on protein coverage results, which is an important criterion to determine the confidence in protein identification and quantification [38, 39]. It was observed that BPDA displayed the largest protein coverage among the programs tested.
Conclusions
We have presented BPDA, a Bayesian approach for peptide detection. Feature extraction in MS analysis is difficult because peptides can register multiple peaks. We model peptide signals based on both charge state distributions and isotopic distributions. And unlike perviously published methods, where the detection only utilizes isotopic distributions and works at each single charge state alone, BPDA takes into account the charge state distribution as well, and performs deisotoping and charge state deconvolution at the same time, thus lending information to better identify weak peptide signals and produce more robust results. Moreover, the proposed approach is systematic. It is based on a rigorous statistical framework and avoids problems, such as voting, thresholding and matching ambiguities, generally encountered in algorithms based on template matching.
We have shown that BPDA performs well on both simulated data and real data, for various SNRs and resolutions, and in complex cases where features overlap. Our experimental results indicate that BPDA compares very favorably with commercial software flexAnalysis and commonly used opensource softwares such as OpenMS and Decon2LS in terms of detection performance. As for computational time, BPDA is a globalbased approach, which looks for the optimal solution iteratively through Gibbs sampling, while templatematching based algorithms such as OpenMS and Decon2LS work on a local region at a time and calculate the fitting score, which typically does not require much computation. Hence, BPDA is expected to be more timeconsuming than those algorithms, especially when running under the raw data mode. For example, for 10mix data set, the running times for OpenMS and BPDA were 1 minutes and 30 minutes, respectively  although these figures are not entirely comparable at present, since OpenMS and Decon2LS are developed using the C programming language, while BPDA is developed using Matlab; we plan to translate the Matlab code into C in future work. In addition, we point out that the user can choose the centroid mode to run BPDA as a tradeoff between running time and performance.
Availability and requirements
Project name: BPDA
Project home page: http://gsp.tamu.edu/Publications/supplementary/sun10a/bpda
Operating system(s): Platform independent
Programming language: Matlab
Licence: GNU GPL (note that we do not allow material transfer agreements or software transfer agreements for academics)
Any restrictions to use by nonacademics: licence needed.
Declarations
Acknowledgements
The authors thank the support of the Partnership for Personalized Medicine (PPM) project, through Translational Genomics (TGen) contract C0800904. The authors thank the RCMI Proteomics Core at UTSA (NIH G12 RR013646) for assistance with experiment design, sample preparation and data collection for the horse myoglobin data. The authors thank the Cancer Biology and Infectious Diseases Research Center at the Eastern Virginia Medical School for the peptide standard data collected on the Bruker Ultra flex MALDITOF.
Jianqiu Zhang's research is supported by a San Antonio Life Science Institute research enhancement grant and a grant with award number G12RR013646 from the National Center For Research Resources, USA.
Authors’ Affiliations
References
 Hop C, Bakhtiar R: An introduction to electrospray ionization and matrixassisted laser desorption/ionization mass spectrometry: essential tools in a modern biotechnology environment. Biospectroscopy 1997, 3: 259–280. 10.1002/(SICI)15206343(1997)3:4<259::AIDBSPY2>3.0.CO;2#View ArticleGoogle Scholar
 Karas M, Bahr U: Laser desorption ionization mass spectrometry of large biomolecules. Trends Anal Chem 1990, 9: 321–325. 10.1016/01659936(90)85065FView ArticleGoogle Scholar
 Batoy S, Akhmetova E, Miladinovic S, Smeal J, Wilkins CL: Developments in MALDI mass spectrometry: the quest for the perfect matrix. Appl Spectrosc Rev 2008, 43: 485–550. 10.1080/05704920802108198View ArticleGoogle Scholar
 Hu Q, Noll RJ, Li H, Makarov A, Hardman M, Graham Cooks R: The Orbitrap: a new mass spectrometer. Journal of mass spectrometry 2005, 40: 430–443. 10.1002/jms.856View ArticlePubMedGoogle Scholar
 Todd JFJ, March RE: Quadrupole ion trap mass spectrometry. New York, NY, USA: WileyInterscience; 2005.Google Scholar
 Wollnik H: Timeofflight mass analyzers. Mass Spectrometry Reviews 1993, 12: 89–11. 10.1002/mas.1280120202View ArticleGoogle Scholar
 Chernushevich IV, Loboda AV, Thomson BA: An introduction to quadrupoletimeofflight mass spectrometry. J Mass Spectrom 2001, 36: 849–865. 10.1002/jms.207View ArticlePubMedGoogle Scholar
 Gross ML, Rempel DL: Fourier transform mass spectrometry. Science 1984, 226: 261–268. 10.1126/science.6385250View ArticlePubMedGoogle Scholar
 Zhang J, Gonzalez E, Hestilow T, Haskins W, Huang Y: Review of peak detection algorithms in liquidchromatographymass spectrometry. Curr Genomics 2009, 10: 388–401. 10.2174/138920209789177638View ArticlePubMedPubMed CentralGoogle Scholar
 Du P, Angeletti RH: Automatic Deconvolution of IsotopeResolved Mass Spectra Using Variable Selection and Quantized Peptide Mass Distribution. Anal Chem 2006, 78: 3385–3392. 10.1021/ac052212qView ArticlePubMedGoogle Scholar
 Morris JS, Coombes KR, Koomen J, Baggerly KA, Kobayashi R: Feature extraction and quantification for mass spectrometry in biomedical applications using the mean spectrum. Bioinformatics 2005, 21: 1764–1775. 10.1093/bioinformatics/bti254View ArticlePubMedGoogle Scholar
 Coombes KR, Tsavachidis S, Morris JS, Baggerly KA, Hung MC, Kuerer HM: Improved peak detection and quantification of mass spectrometry data acquired from surfaceenhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform. Proteomics 2005, 5: 4107–4117. 10.1002/pmic.200401261View ArticlePubMedGoogle Scholar
 Du P, Kibbe WA, Lin SM: Improved peak detection in mass spectrum by incorporating continuous wavelet transformbased pattern matching. Bioinformatics 2006, 22: 2059–2065. 10.1093/bioinformatics/btl355View ArticlePubMedGoogle Scholar
 Wang Y, Zhou X, Wang H, Li K, Yao L, Wong STC: Reversible jump MCMC approach for peak identification for stroke SELDI mass spectrometry using mixture model. Bioinformatics 2008, 24: i407i413. 10.1093/bioinformatics/btn143View ArticlePubMedPubMed CentralGoogle Scholar
 Li X, Yi EC, Kemp CJ, Zhang H, Aebersold R: A software suite for the generation and comparison of peptide arrays from sets of data collected by liquid chromatographymass spectrometry. S Mol Cell Proteom 2005, 4: 1328–1340. 10.1074/mcp.M500141MCP200View ArticleGoogle Scholar
 Bellew M, Coram M, Fitzgibbon M, Igra M, Randolph T, Wang P, May D, Eng J, Fang R, Lin C, Chen J, Goodlett D, Whiteaker J, Paulovich A, McIntosh M: A suite of algorithms for the comprehensive analysis of complex protein mixtures using highresolution LCMS. Bioinformatics 2006, 22(15):1902–909. 10.1093/bioinformatics/btl276View ArticlePubMedGoogle Scholar
 Noy K, Fasulo D: Improved modelbased, platformindependent feature extraction for mass spectrometry. Bioinformatics 2007, 23: 2528–2535. 10.1093/bioinformatics/btm385View ArticlePubMedGoogle Scholar
 Jaitly N, Mayampurath A, Littlefield K, Adkins JN, Anderson GA, Smith RD: Decon2LS: An opensource software package for automated processing and visualization of high resolution mass spectrometry data. BMC bioinformatics 2009, 10: 87. 10.1186/147121051087View ArticlePubMedPubMed CentralGoogle Scholar
 Sturm M, Bertsch A, Gröpl C, Hildebrandt A, Hussong R, Lange E, Pfeifer N, SchulzTrieglaff O, Zerck A, Reinert K, Kohlbacher O: OpenMS  An opensource software framework for mass spectrometry. BMC Bioinformatics 2008, 9: 163. 10.1186/147121059163View ArticlePubMedPubMed CentralGoogle Scholar
 Yergey JA: A general approach to calculating isotopic distributions for mass spectrometry. Int J Mass Spectrom Ion Phys 1983, 52: 337–349. 10.1016/00207381(83)850530View ArticleGoogle Scholar
 Rockwood AL, Van Orden SL, Smith R: Rapid cacluation of isotope distributions. Anal Chem 1995, 67: 2699–2704. 10.1021/ac00111a031View ArticleGoogle Scholar
 Horn DM, Zubarev RA, McLafferty FW: Automated reduction and interpretation of high resolution electrospray mass spectra of large molecules. Journal of the American Society for Mass Spectrometry 2000, 11(4):320–332. 10.1016/S10440305(99)001579View ArticlePubMedGoogle Scholar
 Zhang J, Wang H, Suffredini A, Gonzales D, Gonzales E, Huang Y, Zhou X: Bayesian peak detection for proTOF MS MALDI data. In Proc of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Las Vegas, NV, USA; 2008:661–664.Google Scholar
 Du P, Sudha R, Prystowsky MB, Angeletti RH: Data reduction of isotoperesolved LCMS spectra. Bioinformatics 2007, 23: 1394–1400. 10.1093/bioinformatics/btm083View ArticlePubMedGoogle Scholar
 Keller A, Nesvizhskii AI, Kolker E, Aebersold R: Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem 2002, 74: 5383–5392. 10.1021/ac025747hView ArticlePubMedGoogle Scholar
 Matlab mspeaks[http://www.mathworks.com/access/helpdesk/help/toolbox/bioinfo/ref/mspeaks.html]
 SchulzTrieglaff O, Pfeifer N, Gröpl C, Kohlbacher O, Reinert K: LCMSsim  a simulation software for liquid chromatography mass spectrometry data. BMC Bioinformatics 2008, 9: 423. 10.1186/147121059423View ArticlePubMedPubMed CentralGoogle Scholar
 Du P, Stolovitzky G, Horvatovich P, Bischoff R, Lim J, Suits F: A noise model for mass spectrometry based proteomics. Bioinformatics 2008, 24: 1070–1077. 10.1093/bioinformatics/btn078View ArticlePubMedGoogle Scholar
 Coombes KR: Understanding the characteristics of mass spectrometry data through the use of simulation. Cancer Informatics 2005, 1: 41–52.PubMedPubMed CentralGoogle Scholar
 Geman S, Geman D: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 1984, 6: 721–741. 10.1109/TPAMI.1984.4767596View ArticlePubMedGoogle Scholar
 Robert CP, Casella G: Monte Carlo Statistical Methods. New York, NY, USA: Springer; 2004.View ArticleGoogle Scholar
 Duda R, Hart P: Pattern Classification. New York, NY, USA: JohnWiley&Sons; 2001.Google Scholar
 Shewanella Oneidensis data set[http://omics.pnl.gov]
 OpenMS/TOPP website[http://openms.sourceforge.net]
 Bruker peptide calibration standard[http://www2.bdal.de/data/careonline_data/206195/PI_206195_Peptide%20Cal%20Stand_V2.pdf]
 Bruker Daltonics website[http://www.bdal.de]
 Monroe ME, Tolic N, Jaitly N, Shaw JL, Adkins JN, Smith RD: VIPER: an advanced software package to support highthroughput LCMS peptide identification. Bioinformatics 2007, 23(15):2021–2023. 10.1093/bioinformatics/btm281View ArticlePubMedGoogle Scholar
 Stead DA, Preece A, Brown JP: Universal metrics for quality assessment of protein identifications by mass spectrometry. Mol Cell Prot 2006, 5: 1205–1211. 10.1074/mcp.M500426MCP200View ArticleGoogle Scholar
 McHugh L, Arthur JW: Computational Methods for Protein Identification from Mass Spectrometry Data. PLoS Comput Biol 2008, 4: e12. 10.1371/journal.pcbi.0040012View ArticlePubMedPubMed CentralGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.