LC-MSsim – a simulation software for liquid chromatography mass spectrometry data
© Schulz-Trieglaff et al; licensee BioMed Central Ltd. 2008
Received: 07 May 2008
Accepted: 08 October 2008
Published: 08 October 2008
Mass Spectrometry coupled to Liquid Chromatography (LC-MS) is commonly used to analyze the protein content of biological samples in large scale studies. The data resulting from an LC-MS experiment is huge, highly complex and noisy. Accordingly, it has sparked new developments in Bioinformatics, especially in the fields of algorithm development, statistics and software engineering. In a quantitative label-free mass spectrometry experiment, crucial steps are the detection of peptide features in the mass spectra and the alignment of samples by correcting for shifts in retention time. At the moment, it is difficult to compare the plethora of algorithms for these tasks. So far, curated benchmark data exist only for peptide identification algorithms; there are no data that represent a ground truth for the evaluation of feature detection, alignment and filtering algorithms.
We present LC-MSsim, a simulation software for LC-ESI-MS experiments. It simulates ESI spectra on the MS level. It reads a list of proteins from a FASTA file and digests the protein mixture using a user-defined enzyme. The software creates an LC-MS data set using a predictor for the retention time of the peptides and a model for peak shapes and elution profiles of the mass spectral peaks. Our software also offers the possibility to add contaminants, to change the background noise level and includes a model for the detectability of peptides in mass spectra. After the simulation, LC-MSsim writes the simulated data to mzData, a public XML format. The software also stores the positions (monoisotopic m/z and retention time) and ion counts of the simulated ions in separate files.
LC-MSsim generates simulated LC-MS data sets and incorporates models for peak shapes and contaminations. Algorithm developers can match the results of feature detection and alignment algorithms against the simulated ion lists and meaningful error rates can be computed. We anticipate that LC-MSsim will be useful to the wider community to perform benchmark studies and comparisons between computational tools.
In mass spectrometry (MS) based proteomics, proteins in a sample are digested and the resulting peptides are separated by high-performance liquid chromatography (LC) before they are injected into the mass spectrometer. In this work, we focus on data from LC-MS experiments, as opposed to LC-MS/MS experiments, in which selected sample compounds are fragmented to obtain ion ladders that can be used to identify the compound. Pure LC-MS experiments do not directly give information about the sequences of the peptides in a sample, but we can still use the information on the LC-MS level to quantify the sample proteins. In this application, algorithms detect peptide ion signals (features) in LC-MS spectra and estimate their abundances by integrating the signal area. Different charge variants of the same peptide are summarized (deconvoluted), and the peptides are mapped back to their parent protein to obtain abundance estimates at the protein level.
The plethora of available tools for these tasks is often confusing for users who need to decide which algorithms to apply to their data. Developers of new algorithms also need standardized benchmark data to compare their approach to existing ones. This is difficult, since only a few quality metrics and only limited benchmarks exist so far. Carefully compiled databases of annotated test data are standard in other fields such as multiple sequence alignment [17–19] or RNA structure analysis. However, they are not yet available for mass spectrometry based proteomics. Only a few researchers make their LC-MS data publicly available, and all proteomic databases so far focus on data for the identification of peptides from MS/MS spectra [21–25] rather than on broader applications such as quantitative experiments.
An ideal LC-MS data set for the evaluation of feature detection, alignment and quantification algorithms would contain annotations with the positions of all peptide ion signals, their charge states, monoisotopic masses and abundances. Only this information would allow meaningful comparisons between different methods and fair benchmark studies. Of course, this information can be partially obtained from peptide identifications using MS/MS fragmentation. Unfortunately, only a few of the peptide ions present in a sample are selected for fragmentation. Furthermore, many of the fragmented ions cannot be identified due to noise, mutations or posttranslational modifications. For these reasons, annotations by MS/MS will always be incomplete. Manual annotations by a human expert have been performed for single data sets but are clearly infeasible if our aim is to generate larger benchmarks. We believe that the simulation of LC-MS spectra is a valid approach, to be supplanted by the accumulation of annotated real-world spectral databases.
In the following sections, we introduce our software LC-MSsim and describe its implementation details. We would like to emphasize that our aim was not to create a detailed physical model of mass spectra generation as, for instance, attempted in . But we want to simulate data that is reasonably close to reality and provides a fair testing ground for data analysis methods. The idea of simulating ESI mass spectra to assess the performance of MS feature detection algorithms was pioneered by Wong et al.  who presented a straightforward model for the simulation of ESI mass spectra. They simulate spectra as mass lists derived from theoretical digests of protein sequences with normalized intensities without prediction of ion intensities, retention times or simulation of isotopic pattern. They also restrict their comparison to their own algorithm which implements a very specific task, the detection of protein-ligands and other macromolecular complexes in mass spectra. Of course, the applications of LC-MSsim are not restricted to feature detection benchmarks. The next obvious step would be to compare alignment algorithms, but even the comparison of a full quantification workflow is an interesting scenario.
To our knowledge, LC-MSsim is the first software that models the whole LC-MS data acquisition process and delivers an output (the simulated LC-MS map and the list of peptides and contaminants with m/z and retention time) that can directly be used for the assessment of proteomics algorithms. There are, of course, some programs that simulate individual parts of the LC-MS data acquisition process, such as the estimation of isotopic peak patterns [28, 29], the prediction of peptide retention times [30–34] or detectability [35, 36]. However, these tools are written in different programming languages and they have different output formats that cannot be easily combined. Therefore, to simulate a full LC-MS run, it is clearly desirable to have all of these tools combined in a single application.
LC-MSsim is written in C++ as an add-on for OpenMS, our software library for computational mass spectrometry. LC-MSsim uses OpenMS data structures for file reading, writing and the calculation of isotopic patterns. It is also compatible with The OpenMS Proteomics Pipeline (TOPP) and can readily be integrated into its workflows. This makes it very easy to generate large numbers of simulated data sets and to pipe them directly into a TOPP data analysis pipeline. LC-MSsim is compatible with the current OpenMS release, version 1.1.
Furthermore, LC-MSsim supports the TOPP INI (configuration) file format. This format is XML-based and can be edited using common XML editors or the INIFileEditor supplied with TOPP. LC-MSsim, OpenMS and TOPP are all published under the GNU Lesser General Public License (LGPL). The source code can be downloaded from http://sourceforge.net/projects/lcms-sim.
An artificial LC-MS data set is generated in the following steps: digestion of proteins, prediction of peptide detectability and retention time, computation of the relative abundances of charge states, modeling of isotopic and elution profiles, and addition of shot noise to the spectra. Key parameters that influence the outcome of the simulation are the minimum accepted peptide detectability, which determines how many theoretical peptides appear in the LC-MS spectra; the mass accuracy and resolution; the Full-Width-At-Half-Maximum (FWHM) of the peptide peaks; and the percentage of non-peptide contaminants added by the simulation software. In the following sections, we give an overview of all simulation steps and explain their parameters in more detail.
The user can supply a list of protein sequences in a FASTA file and define their relative abundances in the sequence header. If no abundance is given, we assume that each protein, and thus its peptides, will appear in equal abundances in the mass spectra (apart from effects such as ion suppression etc.). LC-MSsim supports only tryptic digests in this version, but new proteases can be added easily by extending the corresponding OpenMS classes by new regular expressions. We can also simulate missed cleavages and self-digestion of the protease.
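The digestion step can be illustrated with a short C++ sketch. It is not LC-MSsim's code, which builds on OpenMS classes; the function name `tryptic_digest` is hypothetical, and the cleavage rule (cut after K or R unless the next residue is P) is the commonly used textbook rule for trypsin, assumed here. Missed cleavages are modeled by joining up to `max_missed` consecutive fragments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not LC-MSsim's implementation. Assumes the common
// trypsin rule: cleave C-terminal to K or R unless the next residue is P.
// Returns every peptide that spans at most `max_missed` uncleaved sites.
std::vector<std::string> tryptic_digest(const std::string& protein,
                                        std::size_t max_missed) {
  if (protein.empty()) return {};

  // Cleavage positions (index just after the cut), framed by 0 and size().
  std::vector<std::size_t> cuts{0};
  for (std::size_t i = 0; i + 1 < protein.size(); ++i) {
    const char aa = protein[i];
    if ((aa == 'K' || aa == 'R') && protein[i + 1] != 'P') {
      cuts.push_back(i + 1);
    }
  }
  cuts.push_back(protein.size());

  // The peptide from cuts[s] to cuts[s + 1 + m] contains m missed cleavages.
  std::vector<std::string> peptides;
  for (std::size_t s = 0; s + 1 < cuts.size(); ++s) {
    for (std::size_t m = 0; m <= max_missed && s + 1 + m < cuts.size(); ++m) {
      peptides.push_back(protein.substr(cuts[s], cuts[s + 1 + m] - cuts[s]));
    }
  }
  return peptides;
}
```

For example, `tryptic_digest("MAKPGRSTK", 1)` yields `MAKPGR`, `MAKPGRSTK` and `STK`; the K before P is not cleaved. Self-digestion of the protease could, in this sketch, be mimicked by passing the protease's own sequence through the same function.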
Detectability and Retention Time Prediction
After the enzymatic digest of all protein sequences, we need to determine the retention time of each peptide. Pfeifer et al.  recently introduced the paired oligo-border kernel (POBK) for machine learning problems in computational proteomics. Support vector regression  using this kernel function yields very accurate retention time predictions while requiring only a small number of training samples. We use the POBK for retention time prediction in our simulator. We trained the SVM on the test set of Petritis et al.  and determined the best parameters using nested cross-validation. The data set consists of 1304 peptide identifications of capillary reversed-phase liquid chromatography runs.
Charge Distribution Model
After protein digestion, peptide detectability and retention time prediction, we need to determine the relative abundances of the ions created for each peptide. LC-MSsim models an electrospray ionization (ESI) mass spectrometer. ESI ionizes peptides and other sample compounds by applying a strong electric field to the sample. This field induces a charge accumulation at the liquid surface, which leads to the formation of highly charged droplets. As a result, each peptide is typically observed as ions of one to four different charge states, with charge states two and three being the most common. There are several chemical models describing the charge distribution of molecules after ESI, and numerous factors such as the pH, the sample composition and the conformation of the peptide influence this distribution [43, 44]. However, our experiments have shown that a simple model gives a good approximation of real data.
For this reason, we decided to stick with a straightforward model of an ESI mass spectrometer in positive ion mode. We follow an approach by Schnier et al. and assume that each basic amino acid in a peptide can receive at most one charge unit (proton). Consequently, most tryptic peptides have a maximum charge state of 2–3, which matches observations of real data. We determine the relative abundances of each charge state by sampling from a binomial distribution. As a result, low charge states are much more likely to occur than higher ones.
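A minimal C++ sketch of such a binomial charge model follows. The choice of protonatable sites (the N-terminus plus every K, R and H) and the protonation probability `p` are assumptions made for illustration, not LC-MSsim's exact settings. Under this assumption a tryptic peptide without internal basic residues has two sites, so its maximum charge is 2, and one internal histidine raises it to 3. The neutral state z = 0 cannot be observed, so it is dropped and the remaining probabilities are renormalized.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// ASSUMPTION: protonatable sites are the N-terminus plus every K, R and H.
std::size_t basic_sites(const std::string& peptide) {
  std::size_t n = 1;  // N-terminal amine
  for (char aa : peptide)
    if (aa == 'K' || aa == 'R' || aa == 'H') ++n;
  return n;
}

// Binomial coefficient computed in floating point (small n only).
double binom_coeff(std::size_t n, std::size_t k) {
  double c = 1.0;
  for (std::size_t i = 1; i <= k; ++i) c = c * double(n - k + i) / double(i);
  return c;
}

// Relative abundances of charge states 1..n, where n = number of basic sites.
// Each site carries a proton independently with probability p (hypothetical
// parameter, 0 < p <= 1). Index z - 1 holds the abundance of charge state z.
std::vector<double> charge_distribution(const std::string& peptide, double p) {
  const std::size_t n = basic_sites(peptide);
  std::vector<double> abundance(n, 0.0);
  double total = 0.0;
  for (std::size_t z = 1; z <= n; ++z) {
    const double pz = binom_coeff(n, z) * std::pow(p, double(z)) *
                      std::pow(1.0 - p, double(n - z));
    abundance[z - 1] = pz;
    total += pz;
  }
  for (double& a : abundance) a /= total;  // renormalize without z = 0
  return abundance;
}
```

With `p = 0.5`, the peptide `PEPTIDEK` has two sites and receives charge 1 with relative abundance 2/3 and charge 2 with 1/3; smaller values of `p` shift the distribution further toward low charge states.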
Ion Signal Model
The position of a peptide ion signal in the LC-MS map is determined by three parameters: monoisotopic mass, charge and retention time. We calculate the mass from the amino acid sequence, the charge is given by our binomial charge distribution model, and the retention time is predicted by the SVM.
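The m/z coordinate follows from the monoisotopic mass M and the charge z as m/z = (M + z·m_p)/z, where m_p ≈ 1.007276 Da is the proton mass. The C++ sketch below computes M by summing standard monoisotopic residue masses and adding one water molecule for the free termini. It ignores modifications and is only an illustration, not the OpenMS code that LC-MSsim relies on; function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <stdexcept>
#include <string>

// Monoisotopic residue masses (Da) of the 20 standard amino acids.
const std::map<char, double> kResidueMass = {
    {'G', 57.02146},  {'A', 71.03711},  {'S', 87.03203},  {'P', 97.05276},
    {'V', 99.06841},  {'T', 101.04768}, {'C', 103.00919}, {'L', 113.08406},
    {'I', 113.08406}, {'N', 114.04293}, {'D', 115.02694}, {'Q', 128.05858},
    {'K', 128.09496}, {'E', 129.04259}, {'M', 131.04049}, {'H', 137.05891},
    {'F', 147.06841}, {'R', 156.10111}, {'Y', 163.06333}, {'W', 186.07931}};
constexpr double kWater = 18.01056;   // H2O for the free N- and C-termini
constexpr double kProton = 1.007276;  // proton mass (Da)

// Unmodified peptide; throws std::out_of_range on an unknown residue letter.
double monoisotopic_mass(const std::string& peptide) {
  double m = kWater;
  for (char aa : peptide) m += kResidueMass.at(aa);
  return m;
}

// m/z of the [M + zH]^z+ ion.
double mz(double mono_mass, int charge) {
  return (mono_mass + charge * kProton) / charge;
}
```

For a single lysine, M = 146.10552 Da, giving m/z ≈ 147.1128 for z = 1 and ≈ 74.0600 for z = 2.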
Usually, a peptide ion gives rise to several peaks in a mass spectrum because some of its atoms occur as heavier isotopes. Given the sequence of the peptide, we calculate its monoisotopic mass from its empirical formula. The relative intensities of the isotopic peaks are calculated using a simple but fast algorithm. We model the shape of each peak as a Gaussian. The user can choose the peak width in terms of the Full-Width-At-Half-Maximum (FWHM), i.e. the difference between the two m/z values at which the ion count equals half of the maximum ion count of the peak. For a Gaussian peak shape, the FWHM is given by $\mathrm{FWHM} = 2\sqrt{2\ln 2}\,\sigma \approx 2.355\,\sigma$, where σ is the standard deviation of the Gaussian.
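To make the peak model concrete, the C++ sketch below converts a user-chosen FWHM into σ and sums one Gaussian per isotopic peak on a regular m/z grid. Two simplifications are assumptions of this sketch only: the isotope pattern considers 13C alone (a binomial with about 1.07% abundance), which is not the algorithm LC-MSsim uses, and each profile is truncated four standard deviations beyond the outermost peak centers. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified isotope pattern (assumption, not LC-MSsim's algorithm): only 13C
// is considered, so peak k of a molecule with n carbons has Binomial(n, p) mass.
std::vector<double> carbon_isotope_pattern(int n_carbon, int n_peaks) {
  const double p = 0.0107;  // natural abundance of 13C
  std::vector<double> pattern;
  double prob = std::pow(1.0 - p, n_carbon);  // k = 0: monoisotopic peak
  for (int k = 0; k < n_peaks && k <= n_carbon; ++k) {
    pattern.push_back(prob);
    // P(k + 1) = P(k) * (n - k) / (k + 1) * p / (1 - p)
    prob *= double(n_carbon - k) / double(k + 1) * p / (1.0 - p);
  }
  return pattern;
}

// Gaussian peak: FWHM = 2 * sqrt(2 ln 2) * sigma.
double sigma_from_fwhm(double fwhm) {
  return fwhm / (2.0 * std::sqrt(2.0 * std::log(2.0)));
}

// Samples (m/z, intensity) points on a grid with spacing `step`. Isotopic peak
// k is a Gaussian of height pattern[k] centred at mono_mz + k * 1.00335 / charge
// (1.00335 Da is the 13C-12C mass difference).
std::vector<std::pair<double, double>> sample_ion_profile(
    double mono_mz, int charge, const std::vector<double>& pattern,
    double fwhm, double step) {
  if (pattern.empty()) return {};
  const double sigma = sigma_from_fwhm(fwhm);
  const double spacing = 1.00335 / charge;
  const double lo = mono_mz - 4.0 * sigma;
  const double hi = mono_mz + spacing * double(pattern.size() - 1) + 4.0 * sigma;
  const std::size_t n_points = static_cast<std::size_t>((hi - lo) / step) + 1;

  std::vector<std::pair<double, double>> points;
  points.reserve(n_points);
  for (std::size_t i = 0; i < n_points; ++i) {
    const double x = lo + double(i) * step;  // index-based to avoid drift
    double y = 0.0;
    for (std::size_t k = 0; k < pattern.size(); ++k) {
      const double d = x - (mono_mz + double(k) * spacing);
      y += pattern[k] * std::exp(-0.5 * d * d / (sigma * sigma));
    }
    points.emplace_back(x, y);
  }
  return points;
}
```

In this sketch, lowering `fwhm` narrows the peaks and lowering `step` samples them more densely, which loosely corresponds to simulating a higher-resolution mass analyzer.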
Noise and Contaminations
No real LC-MS data set consists only of true signals, i.e. signals caused by sample compounds. There is always some (and often a high amount of) noise in each spectrum. LC-MSsim has several parameters that allow the user to introduce various forms of noise into a data set. Users can simulate almost perfect LC-MS runs as well as runs with a high amount of noise that pose severe challenges to data analysis algorithms.
First, the user can define error bounds on the theoretically predicted retention times. By doing so, we simulate retention time shifts between different experiments and can, for instance, evaluate the performance of LC-MS alignment algorithms that correct for these shifts, as illustrated in Fig. 1. LC-MSsim assumes these errors to be Gaussian-distributed, and the user can define their mean and standard deviation in each case.
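The retention time error model can be sketched in a few lines of C++ with `std::normal_distribution`. The function name and parameters are hypothetical; the sketch only shows the idea of adding an N(mean, sd²) error to each predicted retention time.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Sketch (hypothetical names): perturb predicted retention times with Gaussian
// errors N(mean_shift, sd_shift^2) to mimic run-to-run shifts.
// std::normal_distribution requires sd_shift > 0.
std::vector<double> shift_retention_times(const std::vector<double>& rt,
                                          double mean_shift, double sd_shift,
                                          std::mt19937& rng) {
  std::normal_distribution<double> error(mean_shift, sd_shift);
  std::vector<double> shifted;
  shifted.reserve(rt.size());
  for (double t : rt) shifted.push_back(t + error(rng));
  return shifted;
}
```

Calling the function twice with differently seeded generators yields two replicate runs whose true shifts are known, which is the situation needed to score an alignment algorithm.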
Mass analyzers with different mass accuracies and resolutions are simulated by changing the FWHM of the peptide peaks as described above and by altering the sampling step size of the peptide models. Furthermore, LC-MSsim simulates inaccuracies in peak intensity measurements by adding Gaussian-distributed noise to peptide peaks. Finally, ESI mass spectra frequently contain high-frequency noise signals of low to medium intensity, often referred to as shot noise. The term stems from electronics and physics and describes statistical fluctuations that occur when the number of particles measured by a detector is very small. Its absolute magnitude increases with the average intensity of the detected signal, but it is usually only noticeable when the measured signal is very weak. The common assumption is that shot noise is Poisson-distributed.
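Both noise sources can be sketched with the C++ `<random>` facilities. The relative (multiplicative) form of the Gaussian intensity error and the per-point Poisson draw for shot noise are modeling assumptions of this sketch, not a description of LC-MSsim's code; the function names are hypothetical. Because a Poisson variable has a variance equal to its mean, its relative fluctuation shrinks as 1/√mean, which fits the observation that shot noise is mainly visible on weak signals.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Gaussian intensity error (assumed relative form): each intensity I becomes
// I * (1 + rel_sd * e) with e ~ N(0, 1), clamped at zero.
std::vector<double> add_intensity_noise(const std::vector<double>& intensity,
                                        double rel_sd, std::mt19937& rng) {
  std::normal_distribution<double> standard(0.0, 1.0);
  std::vector<double> noisy;
  noisy.reserve(intensity.size());
  for (double i : intensity) {
    const double value = i * (1.0 + rel_sd * standard(rng));
    noisy.push_back(value > 0.0 ? value : 0.0);
  }
  return noisy;
}

// Shot noise: the observed ion count is drawn from a Poisson distribution whose
// mean is the expected count; variance == mean, so weak signals fluctuate most.
std::vector<long> add_shot_noise(const std::vector<double>& expected_counts,
                                 std::mt19937& rng) {
  std::vector<long> observed;
  observed.reserve(expected_counts.size());
  for (double mean : expected_counts) {
    if (mean <= 0.0) {
      observed.push_back(0);  // poisson_distribution requires mean > 0
      continue;
    }
    std::poisson_distribution<long> poisson(mean);
    observed.push_back(poisson(rng));
  }
  return observed;
}
```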
Another typical phenomenon in mass spectra is a so-called baseline signal, which usually decays with increasing m/z. This is mostly a problem for MALDI instruments and less so for ESI mass spectrometry. LC-MSsim can simulate baseline signals by adding an exponentially decaying baseline to each mass spectrum, but this feature is turned off by default.
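An exponentially decaying baseline can be sketched as b(m/z) = A * exp(-k * (m/z - m/z0)). The amplitude A, the decay rate k and the reference position m/z0 are hypothetical parameters chosen for illustration; the text does not specify the functional form beyond exponential decay.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Adds b(mz) = amplitude * exp(-decay * (mz - mz_start)) to one spectrum.
// `amplitude`, `decay` and `mz_start` are hypothetical parameters.
void add_baseline(const std::vector<double>& mz, std::vector<double>& intensity,
                  double amplitude, double decay, double mz_start) {
  for (std::size_t i = 0; i < mz.size() && i < intensity.size(); ++i) {
    intensity[i] += amplitude * std::exp(-decay * (mz[i] - mz_start));
  }
}
```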
Shot noise and a baseline both hamper computational analysis. But of equal concern for feature detection algorithms are non-peptidic contaminants in an LC-MS experiment and signals arising from modified peptides. Hoopmann et al. demonstrated that the detection of modified peptides is difficult and requires additional computational effort, since the isotopic pattern of these peptides does not follow the typical averagine pattern assumed by most algorithms. In short, an averagine is an average amino acid with a composition estimated from a large number of protein sequences. Using the averagine, we can estimate the average isotopic pattern for a peptide of a given mass. Furthermore, contaminants such as salt molecules or metabolites are of lesser interest in proteomics studies and should not be reported by peptide feature detection algorithms. For these reasons, we decided to simulate these interferences as well. LC-MSsim comes with a list of sample contaminants that can easily be extended by editing the corresponding text file. The current list of available contaminants comprises a snapshot of metabolites downloaded from the Human Metabolome Database. The user can set the percentage of added contaminants with respect to the number of peptides.
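The averagine idea can be illustrated in C++: scale the averagine composition of Senko et al. (C4.9384 H7.7583 N1.3577 O1.4773 S0.0417, average mass 111.1254 Da) to the target mass, round to an integer formula, and convolve the natural isotope abundances of its atoms at nominal-mass resolution. This is a coarse sketch for illustration, not the isotope algorithm used by OpenMS or LC-MSsim; the names are hypothetical, and the abundance values are standard natural isotope abundances.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Integer elemental formula (hypothetical helper type).
struct Composition { int c, h, n, o, s; };

// Averagine (Senko et al., 1995): C4.9384 H7.7583 N1.3577 O1.4773 S0.0417 per
// residue, average mass 111.1254 Da; scaled to `mass` and rounded.
Composition averagine_composition(double mass) {
  const double units = mass / 111.1254;
  auto count = [units](double per_residue) {
    return static_cast<int>(std::lround(per_residue * units));
  };
  return {count(4.9384), count(7.7583), count(1.3577), count(1.4773),
          count(0.0417)};
}

// Convolves two distributions indexed by nominal mass offset (+0, +1, ...),
// keeping at most `max_peaks` entries.
std::vector<double> convolve(const std::vector<double>& a,
                             const std::vector<double>& b,
                             std::size_t max_peaks) {
  std::vector<double> out(std::min(a.size() + b.size() - 1, max_peaks), 0.0);
  for (std::size_t i = 0; i < a.size() && i < out.size(); ++i)
    for (std::size_t j = 0; j < b.size() && i + j < out.size(); ++j)
      out[i + j] += a[i] * b[j];
  return out;
}

// Coarse isotope distribution: convolve the natural isotope abundances of
// every atom. Truncation means the entries may sum to slightly less than 1.
std::vector<double> isotope_distribution(const Composition& f,
                                         std::size_t max_peaks) {
  const std::vector<double> C{0.9893, 0.0107};
  const std::vector<double> H{0.999885, 0.000115};
  const std::vector<double> N{0.99636, 0.00364};
  const std::vector<double> O{0.99757, 0.00038, 0.00205};
  const std::vector<double> S{0.9499, 0.0075, 0.0425, 0.0, 0.0001};
  std::vector<double> dist{1.0};
  auto add_atoms = [&](const std::vector<double>& iso, int count) {
    for (int k = 0; k < count; ++k) dist = convolve(dist, iso, max_peaks);
  };
  add_atoms(C, f.c);
  add_atoms(H, f.h);
  add_atoms(N, f.n);
  add_atoms(O, f.o);
  add_atoms(S, f.s);
  return dist;
}
```

A peptide feature detector built on this idea would compare an observed isotope pattern with `isotope_distribution(averagine_composition(M), k)`; a contaminant whose true formula differs from the averagine may still produce a similar pattern.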
In this section, we present exemplary applications of LC-MSsim. The advantage of our simulator is that we can generate LC-MS maps for which the exact mass, charge, retention time and bounding box of all compounds are known. The bounding box is the smallest axis-parallel rectangle that fully encloses the raw data points constituting the peptide feature. We can also deliberately introduce noise or change instrument parameters such as resolution or chromatographic behavior. This allows scientists to perform fine grained comparisons of LC-MS data analysis algorithms.
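Computing the bounding box of a simulated feature is straightforward. The C++ sketch below takes the raw (retention time, m/z, intensity) points of one feature and returns the enclosing axis-parallel rectangle; the types and names are hypothetical. The included `contains` test shows one possible matching criterion, namely whether a detected feature's centroid falls inside a simulated feature's box.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

struct RawPoint { double rt, mz, intensity; };
struct BoundingBox { double rt_min, rt_max, mz_min, mz_max; };

// Smallest axis-parallel rectangle in (RT, m/z) enclosing all raw data points
// of one feature. Expects a non-empty input.
BoundingBox bounding_box(const std::vector<RawPoint>& points) {
  const double inf = std::numeric_limits<double>::infinity();
  BoundingBox box{inf, -inf, inf, -inf};
  for (const RawPoint& p : points) {
    box.rt_min = std::min(box.rt_min, p.rt);
    box.rt_max = std::max(box.rt_max, p.rt);
    box.mz_min = std::min(box.mz_min, p.mz);
    box.mz_max = std::max(box.mz_max, p.mz);
  }
  return box;
}

// True if the point (rt, mz) lies inside or on the border of the box.
bool contains(const BoundingBox& box, double rt, double mz) {
  return rt >= box.rt_min && rt <= box.rt_max &&
         mz >= box.mz_min && mz <= box.mz_max;
}
```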
We decided to focus on peptide feature detection algorithms and compared msInspect, SuperHirn, SpecArray, MZmine and Decon2LS. Decon2LS is an implementation of the THRASH algorithm. We also report results for the algorithm implemented in OpenMS. These results are for reference only: our feature detection algorithm uses some of the same models as the simulation, which would bias a benchmark of the OpenMS algorithm.
The algorithms we compared differ heavily in the type and number of parameters they accept. Some require only the m/z and retention time range in which to search for features; others require many parameters such as confidence cutoffs, bin width or minimum signal-to-noise levels, to name just a few. Parameters are also not always well documented. To make the comparison as fair and unbiased as possible, we set, for each algorithm, the parameters that depend on the simulation run (such as mass resolution and m/z range), but otherwise kept the default parameters and did not optimize them further.
Quality of Simulation
Performing simulations always raises the question whether the simulated data is sufficiently close to reality. In this section, we will demonstrate that our simulations are realistic.
Influence of Mass Resolution on Feature Detection
We noticed early on that each algorithm follows a different strategy. Some algorithms report many potential peptide features even for simple data sets rather than risk missing an important signal. The rationale seems to be that it is better to obtain many false positives than to miss a potentially crucial signal. Of course, spurious noise signals can be removed at later stages of the workflow, for instance by removing signals that do not appear at consistent positions during alignment. Nevertheless, this makes matters unnecessarily difficult. In contrast, some algorithms are highly specific but tend to miss poorly resolved signals. Which strategy is best might depend on the specific task and on the complexity of the data.
False Discovery and True Positive Rates for changing mass resolutions
We also note that some algorithms, especially msInspect and Decon2LS, report huge numbers of false positives and, consequently, have poor False Discovery Rates. On the other hand, both algorithms find almost all true signals, especially on the high resolution data set.
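Rates like these can be computed by matching detected features against the simulated ion list. The C++ sketch below uses an assumed criterion, not necessarily the one used in this study: a detection is a true positive if its m/z and retention time lie within fixed tolerances of a not-yet-matched simulated ion. Matching is greedy in detection order rather than an optimal assignment. Then TPR = TP / (number of simulated ions) and FDR = FP / (number of detections). The names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Feature { double mz, rt; };  // centroid of a simulated or detected feature
struct Rates { double tpr, fdr; };

// Greedy one-to-one matching: each detection claims the first unmatched
// simulated ion within mz_tol and rt_tol (assumed tolerances).
Rates evaluate_features(const std::vector<Feature>& truth,
                        const std::vector<Feature>& detected,
                        double mz_tol, double rt_tol) {
  std::vector<bool> matched(truth.size(), false);
  std::size_t true_positives = 0;
  for (const Feature& d : detected) {
    for (std::size_t i = 0; i < truth.size(); ++i) {
      if (!matched[i] && std::abs(d.mz - truth[i].mz) <= mz_tol &&
          std::abs(d.rt - truth[i].rt) <= rt_tol) {
        matched[i] = true;
        ++true_positives;
        break;
      }
    }
  }
  const double tpr =
      truth.empty() ? 0.0 : double(true_positives) / truth.size();
  const double fdr =
      detected.empty()
          ? 0.0
          : double(detected.size() - true_positives) / detected.size();
  return {tpr, fdr};
}
```

With two simulated ions and three detections of which one matches, this yields TPR = 0.5 and FDR = 2/3, illustrating how an algorithm can find many true signals and still have a poor False Discovery Rate.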
Influence of Chromatographic Conditions
False Discovery and True Positive Rates under changing chromatography conditions
The performance of most algorithms remains stable across chromatographic conditions. Only two algorithms, OpenMS and MZmine, fall behind when the elution peaks become noisier. The first simulated run, with good column conditions, contains many overlapping isotopic patterns, and OpenMS is not able to separate strongly overlapping signals. Furthermore, OpenMS uses a Gaussian model to fit the elution curve of a feature and discards features that have a poor probability under this model, which dampens the performance of OpenMS in this experiment. MZmine does not perform well on high resolution data, as shown in the previous section; this might be due to unfavorable parameter settings. The False Discovery Rate of SpecArray increases slightly under poorer chromatographic conditions. All other algorithms are not affected. Note that Decon2LS is not affected by changes in the chromatographic conditions since this tool detects isotopic patterns in a scan-wise manner and does not take the elution profile into consideration.
Finally, we tested to what extent current peptide feature detection algorithms can discriminate between peptide signals and signals of other sample compounds. To this end, we generated an LC-MS map consisting of 360 metabolites, but no peptides. These metabolites represent a random subset of compounds from the Human Metabolome Database (accessed 11 March 2008). For each metabolite, we computed its isotopic distribution and placed it at a randomly-determined retention time in the LC-MS map. We modeled the elution profile using a Gaussian function.
Percentage of metabolites declared as features
We computed the Pearson correlation coefficient between the isotopic profiles of the metabolites and those of peptides. The lowest correlation we observed is 0.95, which is still high. This means that current algorithms that try to detect peptide signals using the averagine model will hardly be able to distinguish peptides from other biomolecules on the basis of their isotopic patterns.
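For reference, a Pearson correlation between two isotope profiles can be computed as in the C++ sketch below. How the profiles were aligned and how many isotopic peaks were compared is not stated here, so the sketch simply assumes two equal-length vectors of peak intensities (for example, the first few isotopic peaks of a metabolite and of a peptide of similar mass). The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation of two equal-length intensity profiles (n >= 2).
// Returns NaN if either profile is constant (zero variance).
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
  const std::size_t n = x.size();
  assert(n == y.size() && n >= 2);
  double mx = 0.0, my = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    mx += x[i];
    my += y[i];
  }
  mx /= n;
  my /= n;
  double sxy = 0.0, sxx = 0.0, syy = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    sxy += (x[i] - mx) * (y[i] - my);
    sxx += (x[i] - mx) * (x[i] - mx);
    syy += (y[i] - my) * (y[i] - my);
  }
  return sxy / std::sqrt(sxx * syy);
}
```

Because isotope profiles of molecules with similar mass all start with a dominant monoisotopic peak followed by decreasing peaks, such correlations tend to be high, which is consistent with the values reported above.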
This problem might not be grave. If we simply search for signals that discriminate between two conditions e.g. control and disease, it might at first not be that important whether this signal is caused by a peptide or a metabolite. But it is a fact that users have to keep in mind: most feature detection algorithms detect a lot of features in a real world data set, many more than are sequenced. This has usually been attributed to the fact that the data dependent acquisition process is a semi-random sampling of sample compounds and many peptides will never be identified. But users need to be aware that not all detected features will be caused by peptides, but also by other biomaterials including metabolites.
LC-MSsim simulates mass spectrometry experiments with a wide range of instrument settings and column performances. There are some ways this software could be improved. To give an example, we trained our SVM predictor for the detectability of peptides on data obtained from MS/MS identifications. That is, our model actually predicts whether a peptide is detected and identified using MS/MS. But we use it to predict whether a peptide occurs at all in an LC-MS data set or not. This is clearly a less stringent criterion since not all peptides visible in a mass spectrum will be identified by MS/MS.
It would also be interesting to test another important class of LC-MS data analysis algorithms, namely alignment methods. There is a similar diversity of approaches  as for feature detection algorithms and it would be highly beneficial to the computational proteomics community to know about their individual strengths and weaknesses. The next step, as already mentioned in the introduction, would be to test full data analysis pipelines for accuracy of quantification, robustness in the presence of noise and contaminants, etc. Obviously, finding good parameters for each and every pipeline will become even more difficult than it was already in this smaller study. It might be a good idea to compile a benchmark data set consisting of some real and manually annotated LC-MS runs, complemented by a large number of simulated runs. This would be an ideal testing ground for the proteomics community to compare and assess different analysis methods.
To summarize, our aim was not to develop a simulation capturing all physical aspects of an LC-MS experiment. This is hard since not all these aspects are entirely understood. But our aim was to develop a tool which yields benchmark data that are sufficiently close to reality. Furthermore, we tried to keep the source code as modular as possible such that the community can adopt it or add new ideas and simulation models.
We presented LC-MSsim, a simulation software for LC-ESI-MS spectra. Our software contains predictors for peptide retention time and detectability as well as models for charge distribution, peak shapes and isotopic intensity distributions. It has already proved to be valuable for in-house studies and we make it publicly available in the hope that it will be useful to the wider community.
LC-MSsim is implemented as an add-on to the OpenMS C++ software library and available for free under an open source license (LGPL). Both OpenMS and LC-MSsim can be downloaded from the sourceforge software repository. From a software engineering point of view, LC-MSsim is an example of how mass spectrometry-related software can easily be built using the OpenMS library.
In this work, we demonstrated the versatility of LC-MSsim for the benchmarking of peptide feature detection algorithms. This is a difficult task on real LC-MS data since there is no clearly defined ground truth in this case. We were able to probe the capabilities of currently available algorithms to a deeper extent than previously possible.
Availability and Requirements
LC-MSsim runs under Linux and Windows (using the MinGW compiler). Source code is available from http://sourceforge.net/projects/lcms-sim. Installation instructions can be found at http://lcms-sim.sourceforge.net/. The software depends on several data structures in the OpenMS software library, which can be downloaded at http://www.openms.de.
List of abbreviations used
LC-MS: Liquid Chromatography coupled to Mass Spectrometry
LGPL: GNU Lesser General Public License. Available at http://www.gnu.org/licenses/lgpl.html
LIBSVM: An integrated software for support vector classification, available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/
MudPIT: Multidimensional Protein Identification Technology; it combines 2D chromatography, i.e. two coupled columns, with a mass spectrometer
POBK: Paired Oligo-Border Kernel
SVM: Support Vector Machine
TPR: True Positive Rate
O.S.-T. acknowledges funding by the International Max Planck Research School for Computational Biology and Scientific Computing (IMPRS-CBSC) and by a grant of the German Federal Ministry for Education and Research (BMBF), grant no. 031369C.
We thank Parag Mallick (UC Los Angeles) for providing us with the data sets for peptide detectability prediction. We also thank Alexander Haupt who implemented a preliminary version of the simulator and Marcel Grunert who implemented the peak elution profiles. Many other researchers and students in Berlin, Saarbrücken and Tübingen contributed to the OpenMS software library of which we made heavy use in this work. We also thank the anonymous reviewers who helped to improve this manuscript.
- Mann M, Aebersold R: Mass spectrometry-based proteomics. Nature 2003, 422: 198–207. 10.1038/nature01511
- Nesvizhskii AI, Vitek O, Aebersold R: Analysis and validation of proteomic data generated by tandem mass spectrometry. Nat Meth 2007, 4(10):787–797. 10.1038/nmeth1088
- MacCoss M, Matthews DE: Quantitative MS for proteomics: Teaching a new dog old tricks. Anal Chem 2005, 77(15):294A-302A.
- Schulz-Trieglaff O, Hussong R, Gröpl C, Hildebrandt A, Reinert K: A fast and accurate algorithm for the quantification of peptides from LC-MS data. In Research in Computational Molecular Biology, 11th Annual International Conference, RECOMB 2007, Oakland, CA, USA, April 21–25, 2007, Proceedings, Volume 4453 of Lecture Notes in Computer Science. Edited by: Speed TP, Huang H. Springer; 2007:473–487.
- Hoopmann M, Finney G, MacCoss M: High-Speed Data Reduction, Feature Detection, and MS/MS Spectrum Quality Assessment of Shotgun Proteomics Data Sets Using High-Resolution Mass Spectrometry. Analytical Chemistry 2007, 79(15):5620–5632. 10.1021/ac0700833
- Du P, Sudha R, Prystowsky MB, Angeletti RH: Data reduction of isotope-resolved LC-MS spectra. Bioinformatics 2007, 23(11):1394–1400. 10.1093/bioinformatics/btm083
- Prakash A, Mallick P, Whiteaker J, Zhang H, Paulovich A, Flory M, Lee H, Aebersold R, Schwikowski B: Signal Maps for Mass Spectrometry-based Comparative Proteomics. Mol Cell Proteomics 2006, 5(3):423–432.
- Lange E, Gröpl C, Schulz-Trieglaff O, Leinenbach A, Huber C, Reinert K: A geometric approach for the alignment of liquid chromatography mass spectrometry data. Bioinformatics 2007, 23(13):i273–281. 10.1093/bioinformatics/btm209
- Prince J, Marcotte E: Chromatographic Alignment of ESI-LC-MS Proteomics Data Sets by Ordered Bijective Interpolated Warping. Analytical Chemistry 2006, 78(17):6140–6152. 10.1021/ac0605344
- Listgarten J, Emili A: Statistical and computational methods for comparative proteomic profiling using liquid chromatography-tandem mass spectrometry. Mol Cell Proteomics 2005, 4(4):419–434. 10.1074/mcp.R500005-MCP200
- Bellew M, Coram M, Fitzgibbon M, Igra M, Randolph T, Wang P, May D, Eng J, Fang R, Lin CW, Chen J, Goodlett D, Whiteaker J, Paulovich A, McIntosh M: A suite of algorithms for the comprehensive analysis of complex protein mixtures using high-resolution LC-MS. Bioinformatics 2006, 22(15):1902–1909. 10.1093/bioinformatics/btl276
- Katajamaa M, Orešič M: Processing methods for differential analysis of LC/MS profile data. BMC Bioinformatics 2005, 6: 179. 10.1186/1471-2105-6-179
- Kohlbacher O, Reinert K, Gröpl C, Lange E, Pfeifer N, Schulz-Trieglaff O, Sturm M: TOPP-the OpenMS proteomics pipeline. Bioinformatics 2007, 23(2):e191–197. 10.1093/bioinformatics/btl299
- Mueller LN, Rinner O, Schmidt A, Letarte S, Bodenmiller B, Brusniak MY, Vitek O, Aebersold R, Müller M: SuperHirn – a novel tool for high resolution LC-MS-based peptide/protein profiling. Proteomics 2007, 7(19):3470–3480. 10.1002/pmic.200700057
- Mueller LN, Brusniak MY, Mani DR, Aebersold R: An Assessment of Software Solutions for the Analysis of Mass Spectrometry Based Quantitative Proteomics Data. Journal of Proteome Research 2008, 7: 51–61. 10.1021/pr700758r
- Piening B, Wang P, Bangur C, Whiteaker J, Zhang H, Feng LC, Keane J, Eng J, Tang H, Prakash A, McIntosh M, Paulovich A: Quality Control Metrics for LC-MS Feature Detection Tools Demonstrated on Saccharomyces cerevisiae Proteomic Profiles. Journal of Proteome Research 2006, 5(7):1527–1534. 10.1021/pr050436j
- Thompson JD, Plewniak F, Poch O: BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 1999, 15: 87–88. 10.1093/bioinformatics/15.1.87
- Thompson JD, Koehl P, Ripp R, Poch O: BAliBASE 3.0: Latest developments of the multiple sequence alignment benchmark. Proteins: Structure, Function, and Bioinformatics 2005, 61: 127–136. 10.1002/prot.20527
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 2004, 32(5):1792–1797. 10.1093/nar/gkh340
- Gardner P, Giegerich R: A comprehensive comparison of comparative RNA structure prediction approaches. BMC Bioinformatics 2004, 5: 140. 10.1186/1471-2105-5-140
- Desiere F, Deutsch E, Nesvizhskii A, Mallick P, King N, Eng J, Aderem A, Boyle R, Brunner E, Donohoe S, Fausto N, Hafen E, Hood L, Katze M, Kennedy K, Kregenow F, Lee H, Lin B, Martin D, Ranish J, Rawlings D, Samelson L, Shiio Y, Watts J, Wollscheid B, Wright M, Yan W, Yang L, Yi E, Zhang H, Aebersold R: Integration with the human genome of peptide sequences obtained by high-throughput mass spectrometry. Genome Biology 2004, 6: R9. 10.1186/gb-2004-6-1-r9
- Klimek J, Eddes J, Hohmann L, Jackson J, Peterson A, Letarte S, Gafken P, Katz J, Mallick P, Lee H, Schmidt A, Ossola R, Eng J, Aebersold R, Martin D: The Standard Protein Mix Database: A Diverse Data Set To Assist in the Production of Improved Peptide and Protein Identification Software Tools. Journal of Proteome Research 2007.
- Prince JT, Carlson MW, Wang R, Lu P, Marcotte EM: The need for a public proteomics repository. Nat Biotech 2004, 22(4):471–472. 10.1038/nbt0404-471
- Bodenmiller B, Malmstrom J, Gerrits B, Campbell D, Lam H, Schmidt A, Rinner O, Mueller LN, Shannon PT, Pedrioli PG, Panse C, Lee HK, Schlapbach R, Aebersold R: PhosphoPep – a phosphoproteome resource for systems biology research in Drosophila Kc167 cells. Mol Syst Biol 2007, 3.
- Jones P, Cote RG, Martens L, Quinn AF, Taylor CF, Derache W, Hermjakob H, Apweiler R: PRIDE: a public repository of protein and peptide identifications for the proteomics community. Nucl Acids Res 2006, 34: D659–663. doi:10.1093/nar/gkj138
- Coombes KR, Koomen J, Baggerly KA, Morris JS, Kobayashi R: Understanding the Characteristics of Mass Spectrometry Data Through the Use of Simulation. Cancer Informatics 2005, 1.
- Wong JWH, Downard KM: Performance of the computer algorithm COMPLX for the detection of protein complexes in the mass spectra of simulated biological mixtures. Journal of Mass Spectrometry 2005, 40(9):1187–1196. doi:10.1002/jms.894
- ExPASy: Isotopident [http://education.expasy.org/student_projects/isotopident/htdocs/]
- ProteinProspector (MS-Isotope) [http://prospector.ucsf.edu/]
- Meek JL: Prediction of Peptide Retention Times in High-Pressure Liquid Chromatography on the Basis of Amino Acid Composition. PNAS 1980, 77: 1632–1636. doi:10.1073/pnas.77.3.1632
- Petritis K, Kangas LJ, Yan B, Monroe ME, Strittmatter EF, Qian WJ, Adkins JN, Moore RJ, Xu Y, Lipton MS, Camp DG, Smith RD: Improved peptide elution time prediction for reversed-phase liquid chromatography-MS by incorporating peptide sequence information. Anal Chem 2006, 78(14):5026–5039. doi:10.1021/ac060143p
- Krokhin OV: Sequence-specific retention calculator. Algorithm for peptide retention prediction in ion-pair RP-HPLC: application to 300- and 100-A pore size C18 sorbents. Anal Chem 2006, 78(22):7785–7795. doi:10.1021/ac060777w
- Klammer A, Yi X, MacCoss M, Noble W: Improving Tandem Mass Spectrum Identification Using Peptide Retention Time Prediction across Diverse Chromatography Conditions. Analytical Chemistry 2007, 79(16):6111–6118. doi:10.1021/ac070262k
- Pfeifer N, Leinenbach A, Huber CG, Kohlbacher O: Statistical learning of peptide retention behavior in chromatographic separations: a new kernel-based approach for computational proteomics. BMC Bioinformatics 2007, 8: 468. doi:10.1186/1471-2105-8-468
- Mallick P, Schirle M, Chen SS, Flory MR, Lee H, Martin D, Ranish J, Raught B, Schmitt R, Werner T, Kuster B, Aebersold R: Computational prediction of proteotypic peptides for quantitative proteomics. Nat Biotech 2007, 25: 125–131. doi:10.1038/nbt1275
- Tang H, Arnold RJ, Alves P, Xun Z, Clemmer DE, Novotny MV, Reilly JP, Radivojac P: A computational approach toward label-free protein quantification using predicted peptide detectability. Bioinformatics 2006, 22(14):e481–488. doi:10.1093/bioinformatics/btl237
- Sturm M, Bertsch A, Groepl C, Hildebrandt A, Hussong R, Lange E, Pfeifer N, Schulz-Trieglaff O, Zerck A, Reinert K, Kohlbacher O: OpenMS – An open-source software framework for mass spectrometry. BMC Bioinformatics 2008, 9.
- Schölkopf B, Smola AJ, Williamson RC, Bartlett PL: New Support Vector Algorithms. Neural Computation 2000, 12(5):1207–1245. doi:10.1162/089976600300015565
- Sanders W, Bridges S, McCarthy F, Nanduri B, Burgess S: Prediction of peptides observable by mass spectrometry applied at the experimental set level. BMC Bioinformatics 2007, 8(Suppl 7):S23. doi:10.1186/1471-2105-8-S7-S23
- Vapnik VN: The nature of statistical learning theory. New York, NY, USA: Springer-Verlag New York, Inc; 1995.
- Wu T, Lin C, Weng R: Probability estimates for multi-class classification by pairwise coupling. 2003.
- Chang CC, Lin CJ: LIBSVM: a library for support vector machines. 2001. [http://www.csie.ntu.edu.tw/~cjlin/libsvm]
- Iavarone AT, Jurchen JC, Williams ER: Effects of solvent on the maximum charge state and charge state distribution of protein ions produced by electrospray ionization. Journal of the American Society for Mass Spectrometry 2000, 11(11):976–985. doi:10.1016/S1044-0305(00)00169-0
- Konermann L: A Minimalist Model for Exploring Conformational Effects on the Electrospray Charge State Distribution of Proteins. Journal of Physical Chemistry B 2007, 111(23):6534–6543. doi:10.1021/jp070720t
- Schnier PD, Gross DS, Williams ER: On the Maximum Charge State and Proton Transfer Reactivity of Peptide and Protein Ions Formed By Electrospray Ionization. Journal of the American Society for Mass Spectrometry 1995, 6(11):1086–1097. doi:10.1016/1044-0305(95)00532-3
- Kubinyi H: Calculation of Isotope Distributions in Mass Spectrometry. A Trivial Solution for a Non-Trivial Problem. Anal Chim Acta 1991, 247: 107–109. doi:10.1016/S0003-2670(00)83059-7
- Grushka E: Characterization of exponentially modified Gaussian peaks in chromatography. Anal Chem 1972, 44(11):1733–1738. doi:10.1021/ac60319a011
- Li J: Comparison of the capability of peak functions in describing real chromatographic peaks. Journal of Chromatography A 2002, 952(1–2):63–70. doi:10.1016/S0021-9673(02)00090-0
- Naish P, Hartwell S: Exponentially Modified Gaussian functions: A good model for chromatographic peaks in isocratic HPLC? Chromatographia 1988, 26: 285–296. doi:10.1007/BF02268168
- Sarpeshkar R, Delbrück T, Mead CA: White noise in MOS transistors and resistors. IEEE Circuits Devices Mag 1993, 23–29. doi:10.1109/101.261888
- van Etten WC: Poisson Processes and Shot Noise. Introduction to Random Signals and Noise 2006, 193–210.
- Anderle M, Roy S, Lin H, Becker C, Joho K: Quantifying reproducibility for differential proteomics: noise analysis for protein liquid chromatography-mass spectrometry of human serum. Bioinformatics 2004, 20: 3575–3582. doi:10.1093/bioinformatics/bth446
- Du P, Stolovitzky G, Horvatovich P, Bischoff R, Lim J, Suits F: A Noise Model for Mass Spectrometry Based Proteomics. Bioinformatics 2008, 1070–1077. doi:10.1093/bioinformatics/btn078
- Shin H, Koomen J, Baggerly K, Markey M: Towards a noise model of MALDI TOF spectra. American Association for Cancer Research (AACR) advances in proteomics in cancer research, Key Biscayne, FL 2004.
- Wishart DS, Tzur D, Knox C, Eisner R, Guo AC, Young N, Cheng D, Jewell K, Arndt D, Sawhney S, Fung C, Nikolai L, Lewis M, Coutouly MA, Forsythe I, Tang P, Shrivastava S, Jeroncic K, Stothard P, Amegbey G, Block D, Hau DD, Wagner J, Miniaci J, Clements M, Gebremedhin M, Guo N, Zhang Y, Duggan GE, MacInnis GD, Weljie AM, Dowlatabadi R, Bamforth F, Clive D, Greiner R, Li L, Marrie T, Sykes BD, Vogel HJ, Querengesser L: HMDB: the Human Metabolome Database. Nucl Acids Res 2007, 35: D521–526. doi:10.1093/nar/gkl923
- Li XJ, Yi EC, Kemp CJ, Zhang H, Aebersold R: A Software Suite for the Generation and Comparison of Peptide Arrays from Sets of Data Collected by Liquid Chromatography-Mass Spectrometry. Mol Cell Proteomics 2005, 4: 1328–1340. doi:10.1074/mcp.M500141-MCP200
- NCRR Proteomics Resource at PNNL: Decon2LS. [http://ncrr.pnl.gov/software/]
- Horn DM, Zubarev RA, McLafferty FW: Automated reduction and interpretation of high resolution electrospray mass spectra of large molecules. Journal of the American Society for Mass Spectrometry 2000, 11(4):320–332. doi:10.1016/S1044-0305(99)00157-9
- Schley C, Swart R, Huber C: Capillary scale monolithic trap column for desalting and preconcentration of peptides and proteins in one- and two-dimensional separations. J Chromatogr A 2006, 1136(2):210–220. doi:10.1016/j.chroma.2006.09.072
- Mayr BM, Kohlbacher O, Reinert K, Sturm M, Gröpl C, Lange E, Klein C, Huber C: Absolute Myoglobin Quantitation in Serum by Combining Two-Dimensional Liquid Chromatography-Electrospray Ionization Mass Spectrometry and Novel Data Analysis Algorithms. J Proteome Res 2006, 5: 414–421. doi:10.1021/pr050344u
- Senko M, Beu S, McLafferty F: Determination of Monoisotopic Masses and Ion Populations for Large Biomolecules from Resolved Isotopic Distributions. Journal of the American Society for Mass Spectrometry 1995, 6: 229–233. doi:10.1016/1044-0305(95)00017-8
- America AHP, Cordewener JHG: Comparative LC-MS: A landscape of peaks and valleys. Proteomics 2008, 8(4):731–749. doi:10.1002/pmic.200700694
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.