A robust linear regression based algorithm for automated evaluation of peptide identifications from shotgun proteomics by use of reversed-phase liquid chromatography retention time
© Xu et al; licensee BioMed Central Ltd. 2008
Received: 11 March 2008
Accepted: 19 August 2008
Published: 19 August 2008
Rejection of false positive peptide matches in database searches of shotgun proteomic experimental data is highly desirable. Several methods have been developed that use peptide retention time to refine and improve peptide identifications from database search algorithms. This report describes the implementation of an automated approach to reduce false positives and validate peptide matches.
A robust linear regression based algorithm was developed to automate the evaluation of peptide identifications obtained from shotgun proteomic experiments. The algorithm scores peptides based on their predicted and observed reversed-phase liquid chromatography retention times. The robust algorithm does not require internal or external peptide standards to train or calibrate the linear regression model used for peptide retention time prediction. The algorithm is generic and can be incorporated into any database search program to perform automated evaluation of the candidate peptide matches based on their retention times. It provides a statistical score for each peptide match based on its retention time.
Analysis of peptide matches where the retention time score was included resulted in a significant reduction of false positive matches with little effect on the number of true positives. Overall higher sensitivities and specificities were achieved for database searches carried out with MassMatrix, Mascot and X!Tandem after implementation of the retention time based score algorithm.
The science of proteomics encapsulates the large-scale identification, characterization and quantitation of proteins from biological samples. Mass spectrometry (MS) has been recognized as a powerful technique to study proteins. High-performance liquid chromatography (HPLC) coupled with tandem mass spectrometry (LC-MS/MS) is most commonly used in shotgun proteomics to resolve and identify proteolytic peptides generated from complex protein mixtures. Peptide and protein identifications are usually derived from information contained in the tandem MS data. Automated database searching and de novo sequencing algorithms are routinely used to convert the MS/MS data into peptide and protein identifications. Database search algorithms are more commonly used at this time due to their relatively low computational expense and higher compatibility with low mass accuracy and low quality MS/MS data [3, 4].
It has also been recognized that the LC retention times of peptides are related to their sequences and can be used as complementary information for their identification and characterization. Several methods have been developed to predict peptide retention times in reversed-phase liquid chromatography (RPLC) based on amino acid compositions and/or sequences [6–22]. High correlation between observed and predicted retention times for peptides in RPLC under different conditions has been achieved by use of these methods. Furthermore, these approaches can be combined with mass spectrometry to achieve better confidence in peptide identification than MS alone. For example, accurate mass tags combined with peptide retention time prediction have been used effectively by several groups to improve proteome characterization [23–26].
Peptide retention time prediction can also be used to refine and improve peptide identifications resulting from analysis of LC-MS/MS by database search software. In this way, false peptide matches from database search results can be minimized and true peptide matches can be confirmed with higher confidence. Krokhin et al reported an algorithm to refine the results obtained from the Global Proteome Machine database searches. In their approach, either internal or external standard peptides were used to estimate the regression parameters for the linear retention time prediction model used to refine their results. Strittmatter et al reported a post-database search method that evaluated peptide matches from SEQUEST based upon their retention times. They reported a peptide retention time prediction model based on artificial neural networks. The prediction model was calibrated with highly reliable peptide matches from SEQUEST for each LC-MS/MS analysis. An empirical discriminant score based on retention time and SEQUEST scores was also developed for peptide matches. It was shown that the number of reliable peptide matches was increased by use of peptide retention time information. Klammer et al also developed an algorithm based on support vector regression to improve peptide identifications in tandem mass spectrometry by use of retention time prediction. As many as 50% of false peptide identifications in database search results from SEQUEST could be filtered while only 3% of true peptide matches were lost. The algorithm also trains the linear regression model dynamically for each data set.
We recently developed a robust linear regression based algorithm for automated evaluation of peptide identifications from database search programs based on retention time in RPLC. The algorithm extends the retention time prediction algorithm and its use for peptide identification in off-line LC-MS/MS by Krokhin et al [16, 28]. The algorithm described here works for on-line LC-MS/MS experiments and eliminates the need for retention time prediction model calibration by use of internal or external standard peptides. The algorithm is generic and can be used to evaluate peptide matches from any database search program. It has been included in a database search program, MassMatrix, to perform automated data analysis. A post-hoc retention time analysis program, LR_RT, was also developed to analyze search results from other publicly available programs, such as Mascot and X!Tandem. Furthermore, a score algorithm was developed to provide a statistical score for each peptide match based on its predicted and observed retention times.
Sample preparation and mass spectrometry
Bovine histones were isolated from bovine thymus tissue as described by Sures et al [32, 33]. The bovine histone mixture was digested by use of trypsin in 100 mM ammonium bicarbonate buffer (pH 8.0). Trypsin was used at a 25:1 ratio (substrate:enzyme) and the mixture was incubated at 37°C for two hours. The digested peptides were identified by use of data-dependent nano-LC-MS/MS on an LCQ Deca XP ion trap mass spectrometer (ThermoFisher, San Jose, CA, USA) as reported previously by Su et al. In brief, 2.0 μL of bovine histone peptides at a total concentration of 0.1 μg/μL was injected and eluted off the capillary HPLC column (5 cm × 75 μm Pico Frit C18 column, 300 Å pore size, New Objective, Woburn, MA) into the LCQ mass spectrometer at a flow rate of ~250 nL/min. Mobile phases A and B were water with 0.1% acetic acid and acetonitrile with 0.1% acetic acid, respectively. A linear gradient of 5–50% of mobile phase B over 35 minutes was used. The total run time was 70 minutes.
Database Search and Search Parameters
The .RAW data files obtained from the mass spectrometer were converted to mzXML files by use of ReAdW (http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW). Tandem mass spectra that were not derived from singly charged precursor ions were considered as both doubly and triply charged precursors. The mzXML file was searched by use of MassMatrix (http://www.massmatrix.net) against a database that contained both the bovine histone database and a reversed NCBInr human protein database as a decoy database. The search options were set as follows: i) No variable or fixed modifications; ii) Enzyme: trypsin; iii) Missed Cleavages = 3; iv) Peptide Length = 6 to 30 amino acid residues; and v) Mass tolerances of 2.0 Da and 0.8 Da for the precursor and product ions, respectively. The data set was also evaluated by Mascot and X!Tandem. The search parameters were identical to those in MassMatrix. The search results from Mascot and X!Tandem were then analyzed by the post-hoc retention time analysis program, LR_RT (http://www.massmatrix.net), to obtain retention time based scores for each peptide match from the two search programs.
Results and discussion
A key advantage of this algorithm is that the LR model can be trained independently for each search. Thus there is no need to train or calibrate the LR model with internal or external standards for a given batch of samples. Furthermore, the algorithm is generic and can be used to evaluate peptide matches from any database search program. The algorithm can use different linear regression models for predicting peptide retention times under a variety of chromatographic conditions. For analysis of shotgun proteomic data sets, the linear regression model for peptide retention time prediction developed by Krokhin et al was used. The model can be used to accurately predict retention times of tryptic peptides on reversed-phase (300 Å pore size) HPLC columns of various sizes with linear water-acetonitrile gradients containing trifluoroacetic acid, acetic acid, or formic acid as the ion-pairing agent. The detailed implementation and performance of the algorithm are described in the next sections.
Linear regression model for predicting peptide retention times in RPLC
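Retention times for true peptide matches identified by database search programs were assumed to follow a linear regression (LR) model:
$T = aH + b + \varepsilon$ (1)
where $T$ is the retention time of a peptide match, $H$ is the calculated hydrophobicity of the peptide, $a$ and $b$ are the regression parameters, and $\varepsilon$ is the residual.
The RC and RcNt values for the 20 common amino acid residues were reported previously by Krokhin et al.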
Selection of training data
The retention time of the peptide match for each tandem mass spectrum was obtained from the mzXML or mzData file. We assume that retention times for true peptide matches will follow the LR model described in eqn. 1. The parameters of the LR model for true peptide matches are estimated from a training data set, which are then used to evaluate all peptide matches in the search result. In order to eliminate the need for LR model training by internal or external protein or peptide standards, the algorithm creates a training data set directly from the current database search results. This training data contains a selected number of peptide matches from the database search program with scores above a specified threshold. The accuracy and reliability of the LR model directly rely on the quality of this training data set. There are two major factors that affect model training: 1) size of the training data set and 2) false peptide matches included in the training data set that do not follow the linear regression model of the true matches. These false peptide matches should appear as outliers in our LR model and are referred to as outlier peptide matches. These outlier peptide matches have a negative impact on the calculation of parameters in the LR model. Increasing the threshold for statistical scores contained in the training data can minimize outlier peptide matches but will also significantly reduce the size of the training data set. By setting up moderate thresholds, a typical database search can generate training data sets containing 100 to 500 peptide matches. One challenge of this approach is that outlier peptide matches may be retained within the training data, especially for searches with large databases obtained at low mass accuracy.
Recursive outlier-removal algorithm
1. For each iteration step, a robust LR model is fitted to the training data set and gives robust estimates, $\hat{\beta}$, of the parameters for the LR model in eqn. 1 (see Appendix 2 for details).
2. Calculate the residuals $e = [e_1, e_2, \cdots, e_n]^T_{n \times 1}$ based on the robust parameter estimates by $e = T - H\hat{\beta}$, where $T$ is the vector of observed retention times and each row of $H$ is $[1, H_i]$ for the $i$th peptide match.
3. Remove those data as outliers that have residuals outside the 95% confidence interval, i.e. $e_i \le \tilde{e} - 1.96\hat{\sigma}$ or $e_i \ge \tilde{e} + 1.96\hat{\sigma}$, where $\tilde{e}$ is the median of the residuals, and $\hat{\sigma}$ is equal to the median absolute value of the residuals divided by 0.6745.
4. Repeat steps 1, 2 and 3 until no outliers are detected from the training data set in step 3 (Figure 1).
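The loop described in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not the MassMatrix implementation: the function names are hypothetical, and the robust fit is a Theil-Sen estimator (median of pairwise slopes) used here only as a simple stand-in for the robust LR of Appendix 2. Any routine that returns a slope and an intercept can be passed in through the `fit` argument.

```python
from statistics import median

def theil_sen_fit(H, T):
    """Stand-in robust fit (assumption): median pairwise slope, median intercept."""
    slopes = [(T[j] - T[i]) / (H[j] - H[i])
              for i in range(len(H)) for j in range(i + 1, len(H))
              if H[j] != H[i]]
    a = median(slopes)
    b = median(t - a * h for h, t in zip(H, T))
    return a, b

def remove_outliers(H, T, fit=theil_sen_fit, z=1.96):
    """Refit and trim until no residual lies outside median +/- z * sigma."""
    H, T = list(H), list(T)
    while True:
        a, b = fit(H, T)                                   # step 1: robust fit
        e = [t - (a * h + b) for h, t in zip(H, T)]        # step 2: residuals
        e_med = median(e)
        sigma = median(abs(r) for r in e) / 0.6745         # robust scale estimate
        keep = [abs(r - e_med) < z * sigma for r in e]     # step 3: flag outliers
        # Stop when nothing is flagged; the other two checks are safety guards
        # added for this sketch (degenerate scale, too few points left to fit).
        if sigma == 0.0 or all(keep) or sum(keep) < 3:
            return a, b, H, T
        H = [h for h, k in zip(H, keep) if k]              # step 4: repeat on the
        T = [t for t, k in zip(T, keep) if k]              # trimmed training set
```

Calling `remove_outliers(hydrophobicities, retention_times)` returns the final slope and intercept together with the retained training points.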
Score algorithm based on peptide retention time
Here the standard error of the predicted retention time is given in eqn. A.9, and $F_{t(n-2)}(x)$ is the cumulative distribution function of the t distribution with n - 2 degrees of freedom. A smaller δ gives a higher CRT score and indicates higher confidence in the peptide match.
The theoretical distribution of CRT for random peptide matches is unknown and varies from one search to another.
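To make the role of the t distribution concrete, the following sketch computes a retention time score from observed and predicted retention times. The scoring equation itself is not reproduced in this text, so the form used here is an assumption: δ is taken as the absolute prediction error divided by its standard error, and the score as the two-sided tail probability of the t distribution with n - 2 degrees of freedom. That choice is consistent with the description above (smaller δ, higher score) but should not be read as the published formula. The names `t_cdf` and `rt_score` are hypothetical.

```python
import math

def t_cdf(x, df, steps=2000):
    """CDF of Student's t, by Simpson integration of the density from 0 to |x|."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    c /= math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1.0 + u * u / df) ** (-(df + 1) / 2)

    upper = abs(x)
    h = upper / steps                      # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(upper)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * pdf(k * h)
    area *= h / 3.0
    return 0.5 + area if x >= 0 else 0.5 - area

def rt_score(t_obs, t_pred, se_pred, n):
    """Assumed score: two-sided tail probability of the scaled prediction error."""
    delta = abs(t_obs - t_pred) / se_pred          # assumed definition of delta
    score = 2.0 * (1.0 - t_cdf(delta, n - 2))
    return min(1.0, max(0.0, score))               # clamp numerical round-off
```

Under this assumed form, a peptide match whose observed retention time equals the prediction scores 1, and the score falls toward 0 as the scaled error grows.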
Automated evaluation of peptide matches based on their retention times
The retention time score algorithm is included as part of the MassMatrix database search program to perform automated evaluation of peptide matches. Due to the robustness of the algorithm, the score threshold for selection of training data does not significantly affect the model training and results. The score threshold for selection of training data for the algorithm in MassMatrix was set to be ≥ 8.0 for both pp and pp2 scores and ≥ 2.0 for pptag score. MassMatrix takes the mzXML, mzData and MGF data files as input data formats. The retention time based algorithm automatically scores peptide matches if the input data file is either mzXML or mzData. The retention time based algorithm is not used if the input data file is an MGF file because MGF files lack retention times.
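The two decisions described in this paragraph (whether retention time scoring applies to an input file, and whether a peptide match qualifies as training data) can be sketched as follows. Both functions are hypothetical illustrations. Detecting the format from the file extension is an assumption, as is combining the three score thresholds with a logical AND; the text does not say how MassMatrix does either.

```python
import os

# Input formats that carry retention times (MGF does not).
RT_FORMATS = {".mzxml", ".mzdata"}

def rt_scoring_enabled(path):
    """Assumption: the input format is recognised by its file extension."""
    return os.path.splitext(path)[1].lower() in RT_FORMATS

def is_training_match(pp, pp2, pptag):
    """Assumption: all three thresholds quoted above must be met together."""
    return pp >= 8.0 and pp2 >= 8.0 and pptag >= 2.0
```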
Post-hoc retention time scoring of other database search programs
The retention time based algorithm described herein is generic and a post-hoc retention time analysis program for all other database search programs, LR_RT, was developed to perform automated post-search evaluation of the peptide matches. The post-hoc analysis requires the original mzData or mzXML file along with a tab or space delimited .txt file of the search results. The search result file must contain the scan number, peptide sequence and score information for each peptide match. The program was tested on Mascot and X!Tandem search results. Search results in the tab delimited .txt files can be obtained from Mascot html search results or X!Tandem pepXML search results by use of Perl scripts available at http://www.massmatrix.net. Score thresholds for selection of training data for the retention time based algorithm are set to be ≥ 30 and ≤ 0.1 for Mascot and X!Tandem results respectively.
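As an illustration of the input described above, the sketch below reads a tab or space delimited result file and selects training data with the quoted thresholds. It is not the LR_RT code. The column order (scan number, peptide sequence, score) and the function names are assumptions made for this example, and the actual files produced by the Perl scripts may be laid out differently.

```python
def parse_matches(text):
    """Parse lines of: scan number, peptide sequence, score (assumed order)."""
    matches = []
    for line in text.splitlines():
        fields = line.split()              # splits on tabs and on spaces
        if len(fields) < 3 or not fields[0].isdigit():
            continue                       # skip blank lines and header rows
        matches.append((int(fields[0]), fields[1], float(fields[2])))
    return matches

def select_training(matches, engine):
    """Apply the training thresholds quoted above for each search program."""
    if engine == "mascot":
        return [m for m in matches if m[2] >= 30.0]   # Mascot score >= 30
    if engine == "xtandem":
        return [m for m in matches if m[2] <= 0.1]    # expectation value <= 0.1
    raise ValueError(f"unknown engine: {engine}")
```

The thresholds run in opposite directions because a higher Mascot score is better, whereas a lower X!Tandem expectation value is better.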
Evaluation of the robust linear regression based algorithm
MassMatrix automated evaluation of peptide matches based on their retention times
The robust LR based algorithm built into MassMatrix was evaluated against experimental LC-MS/MS data from bovine histone digests acquired by use of an LCQ Deca XP+ MS. The data set contained 3166 tandem mass spectra and was searched against a database that contained a bovine histone database and the NCBInr reversed human protein database as decoy sequences. The complete list of peptide matches is provided in the additional file [see Additional file 1]. The decoy database was much larger than the bovine histone database and created ~1000 times as many theoretical peptides as the bovine histone database. False positive peptide matches from the bovine histone database were thus assumed to be negligible [37, 38]. As a result, peptide matches returned from the bovine histone database were considered as true positives (TPs) while those from the decoy database were considered to be false positives (FPs).
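The target-decoy bookkeeping described above reduces to simple counting. The sketch below labels each match as a TP or FP by the database it came from and reports what a retention time score cut-off would remove. The function name, the input layout and the cut-off value are hypothetical, chosen only to illustrate the calculation.

```python
def filter_summary(matches, threshold):
    """matches: iterable of (is_decoy, rt_score) pairs.

    Returns (percent of TPs lost, percent of FPs removed) when matches
    with rt_score below the threshold are discarded.
    """
    matches = list(matches)
    tp = [s for is_decoy, s in matches if not is_decoy]   # target hits = TPs
    fp = [s for is_decoy, s in matches if is_decoy]       # decoy hits = FPs

    def pct_below(scores):
        if not scores:
            return 0.0                                    # nothing to filter
        return 100.0 * sum(s < threshold for s in scores) / len(scores)

    return pct_below(tp), pct_below(fp)
```

For example, with four target and four decoy matches of which one target and three decoys score below the cut-off, the function returns 25% of TPs lost and 75% of FPs removed.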
Figure 2b shows a scatter plot of the training data set that contained 143 peptide matches after removal of outliers by the recursive outlier-removal algorithm described above. Outlier removal resulted in a strong linear relationship between the retention times and the hydrophobicities of peptide matches in the training data. The training data set was fitted to the LR model, and the R2 value was improved from 0.35 to 0.90 after removal of the outliers. Furthermore, the 99% confidence band of the LR models was much narrower after the outlier removal by the algorithm.
The algorithm was also evaluated by use of two publicly available LC-MS/MS data sets from significantly more complex samples. The first data set was created by use of LC-MS/MS on an LCQ ion trap mass spectrometer from a tryptic digest of a proteome sample from Deinococcus radiodurans MR-1 gram-positive bacteria. The data set (Dataset_021014.RAW for Deinococcus radiodurans data) along with the experimental details can be obtained at http://ncrr.pnl.gov/data/. The data set was searched by use of the MassMatrix database search program against a database that contained both the Deinococcus radiodurans database and a dominant reversed NCBInr human protein database used as a decoy database. The second data set was created by use of 2D-LC-MS/MS on an LCQ Deca XP+ mass spectrometer from the tryptic digest of a human proteome sample. The sample was separated by an SCX column in 11 salt steps and the fraction from each step was analyzed by C18 RPLC-MS/MS. Eleven MS/MS data sets were generated. The data set from the first fraction, which contained the greatest number of MS/MS scans among all 11 data sets, was used in our evaluation. The data set and the experimental details can be found at http://bioinformatics.icmb.utexas.edu/OPD/. The data set was searched against a database with a target NCBInr human database and a dominant decoy database. The dominant decoy database contained ten randomized NCBInr human databases and one reversed human database. The search parameters for both data sets were the same as those used for the bovine histone data set.
Test of the assumptions of the algorithm
There are two assumptions involved in the algorithm. The first is that all true positives follow the linear regression model. It can be seen from the previous discussion that this assumption was violated. However, this departure from the first assumption was small and only caused small losses (0.31 to 5.47%) of TPs.
Post-hoc retention time analysis program for other database search programs
The post-hoc retention time analysis program was also evaluated on Mascot and X!Tandem search results from the Deinococcus radiodurans and human proteome data sets from complex samples. The databases and search parameters in Mascot and X!Tandem were the same as those used in the MassMatrix searches for the two data sets. For the Mascot searches of the Deinococcus radiodurans and human data sets (Figures 7b and 7c) and the X!Tandem search of the Deinococcus radiodurans data set (Figure 8b), the algorithm also effectively reduced false positives with small losses of true positives. However, the algorithm was not applicable to the X!Tandem search of the human data set because X!Tandem did not return a significant number of true positives for that data set. The number of peptide matches with expectation value ≤ 0.1 from the target database of the search was 14, which was not enough for peptide retention time model training.
An algorithm based on robust LR has been developed for automated evaluation of peptide matches from database searches by use of peptide retention time in reversed-phase HPLC. The recursive outlier-removal algorithm based on robust LR enables the algorithm to train the LR model on the fly for each search; thus the need for internal or external protein or peptide standards is eliminated. The LR model for peptide retention in RPLC developed by Krokhin et al was adopted in the current implementation of the algorithm.
The algorithm was implemented in the MassMatrix database search program and evaluated with an LC-MS/MS data set of bovine histones obtained on an LCQ Deca XP mass spectrometer. The R2 value for the LR model was improved from 0.35 to 0.90 after outlier removal. The majority (96.21%) of true peptide matches fell within the 99% confidence band for the trained LR model, whereas only 39.02% of false peptide matches fell in the same 99% confidence band. By use of this approach the majority (60.98%) of the false peptide matches can be filtered from the results based on retention time while only losing 3.79% of the true positive peptide matches.
A post-hoc retention time analysis program, LR_RT, was also developed to analyze peptide matches from other database search programs. The program was tested on Mascot and X!Tandem search results for the bovine histone data set. More than 60% of false positives in Mascot and X!Tandem search results were filtered by the program with a loss of less than 3.5% for true positives.
The algorithm was also tested on two publicly available data sets from complex samples. For the data set from a Deinococcus radiodurans proteome sample, the algorithm was able to remove the majority of false positives at a small loss of true positives for searches in MassMatrix, Mascot and X!Tandem. For the data set from a human proteome sample, the algorithm could still effectively reduce false positive rates for searches in MassMatrix and Mascot. For the search of that data set in X!Tandem, the algorithm was not applicable because X!Tandem was not able to identify a significant number of true positives.
A statistical score algorithm was developed for ranking peptide matches based on predicted and observed retention times. The score distribution for true peptide matches was close to its theoretical distribution, which indicates that the LR model trained by the robust LR based algorithm represents the true linear relationship between the peptide retention times in RPLC and their calculated hydrophobicities. False peptide matches tend to have much lower scores than true matches, and the majority of the false matches have scores less than 0.01. This score enables differentiation between true and false matches based on retention time. After removal of peptide matches with insignificant scores based on retention time, higher sensitivities and specificities were achieved and the false positive rates of the searches were significantly lowered as shown by the ROC analysis for all the three database search programs.
Availability and requirements
Project name: MassMatrix Retention Time Analysis.
Project home page: http://www.massmatrix.net/.
Operating systems: Windows, Linux.
Programming language: ANSI C++.
Other requirements: None.
Any restrictions to use by non-academics: None.
1. Linear Regression
where $h = [1, H]_{1 \times 2}$.
A regression model as in eqn. 1 that assumes the residuals $\varepsilon$ are independent and follow $N(0, \sigma^2)$ is called a normal error regression model. For this type of model, the variance of $T$ is $\mathrm{Var}(T) = \sigma^2$.
Due to the normality assumption of the residuals, the prediction error follows a normal distribution. Therefore, the prediction error divided by its estimated standard error follows a t distribution with (n - 2) degrees of freedom.
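For reference, the textbook expressions for a simple linear regression with one predictor are written out below. They are given as a standard-form sketch, on the assumption that the standard error referred to as eqn. A.9 takes this usual form; the equation itself is not reproduced in this text. The symbols $\hat{a}$, $\hat{b}$, $\bar{H}$, $s$ and $s_{\mathrm{pred}}$ are introduced here for illustration, with $H_0$ and $T_0$ denoting the hydrophobicity and observed retention time of a new peptide match.

```latex
\begin{align*}
\hat{T}_0 &= \hat{a} H_0 + \hat{b}, \\
s^2 &= \frac{1}{n-2} \sum_{i=1}^{n} \left( T_i - \hat{a} H_i - \hat{b} \right)^2, \\
s_{\mathrm{pred}} &= s \sqrt{1 + \frac{1}{n}
    + \frac{(H_0 - \bar{H})^2}{\sum_{i=1}^{n} (H_i - \bar{H})^2}}, \\
\frac{T_0 - \hat{T}_0}{s_{\mathrm{pred}}} &\sim t(n-2).
\end{align*}
```

The leading 1 under the square root accounts for the residual variance of the new observation itself, which is why the prediction band is wider than the confidence band for the fitted line.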
2. Robust Linear Regression
1. The ordinary least-squares estimates of the LR model from eqn. A.1 are obtained as initial estimates of the regression parameters, $\hat{\beta}^{(0)}$.
2. At the $i$th iteration step, calculate residuals based on the parameter estimates from the previous $(i-1)$th iteration, $\hat{\beta}^{(i-1)}$ (A.10).
3. Calculate the weighted-least-squares estimates (A.13).
4. Repeat steps 2 and 3 until the parameter estimates converge.
The converged estimates, $\hat{\beta}$, are the solution for the robust LR model.
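The iteration above is the usual iteratively reweighted least squares scheme, sketched below in plain Python. The weight function used by the authors is not visible in this text, so Huber weights with the conventional tuning constant 1.345 are used here as an assumption. The robust scale is taken as the median absolute residual divided by 0.6745, as in the outlier-removal step. The function names and the convergence tolerance are hypothetical.

```python
from statistics import median

def wls_fit(H, T, w):
    """Weighted least squares for T = a*H + b; returns (a, b)."""
    sw = sum(w)
    h_bar = sum(wi * h for wi, h in zip(w, H)) / sw
    t_bar = sum(wi * t for wi, t in zip(w, T)) / sw
    sxx = sum(wi * (h - h_bar) ** 2 for wi, h in zip(w, H))
    sxy = sum(wi * (h - h_bar) * (t - t_bar) for wi, h, t in zip(w, H, T))
    a = sxy / sxx
    return a, t_bar - a * h_bar

def robust_fit(H, T, k=1.345, tol=1e-8, max_iter=100):
    """IRLS with Huber weights (assumed weight function); returns (a, b)."""
    a, b = wls_fit(H, T, [1.0] * len(H))                 # step 1: OLS start
    for _ in range(max_iter):
        e = [t - (a * h + b) for h, t in zip(H, T)]      # step 2: residuals
        sigma = median(abs(r) for r in e) / 0.6745       # robust scale
        if sigma == 0.0:
            break                                        # exact fit, nothing to reweight
        w = [1.0 if abs(r) <= k * sigma else k * sigma / abs(r) for r in e]
        a_new, b_new = wls_fit(H, T, w)                  # step 3: weighted LS
        converged = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:                                    # step 4: stop on convergence
            break
    return a, b
```

Points with large residuals receive weights below 1, so they pull the fitted line less than they would in ordinary least squares.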
The study was funded by the Ohio State University, the National Institutes of Health (CA107106, CA101956), the V Foundation (AACR Translational Cancer Research Grant) and the Leukemia & Lymphoma Society. The authors thank Ken J Auberry, Richard D Smith, and Anderson Gordon for their help in obtaining the Deinococcus radiodurans data set. The data set was obtained by Richard D. Smith and the Biological Systems Analysis and Mass Spectrometry group at Pacific Northwest National Laboratory (PNNL) in Richland, Washington. Portions of this research were supported by the NIH National Center for Research Resources (RR18522), and the W.R. Wiley Environmental Molecular Science Laboratory (a national scientific user facility sponsored by the U.S. Department of Energy's Office of Biological and Environmental Research and located at PNNL). PNNL is operated by Battelle Memorial Institute for the U.S. Department of Energy under contract DE-AC05-76RL01830. The human proteome data set was obtained from http://bioinformatics.icmb.utexas.edu/OPD/.
- Aebersold R, Mann M: Mass spectrometry-based proteomics. Nature 2003, 422: 198–207. 10.1038/nature01511
- Nesvizhskii AI, Aebersold R: Analysis, statistical validation and dissemination of large-scale proteomics datasets generated by tandem MS. Drug Discov Today 2004, 9(4):173–181. 10.1016/S1359-6446(03)02978-7
- Sadygov RG, Cociorva DC, Yates JR: Large-scale database searching using tandem mass spectra: Looking up the answer in the back of the book. Nature Methods 2004, 1(3):195–202. 10.1038/nmeth725
- Kapp EA, Schütz F, Connolly LM, Chakel JA, Meza JE, Miller CA, Fenyo D, Eng JK, Adkins JN, Omenn GS, Simpson RJ: An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: sensitivity and specificity analysis. Proteomics 2005, 5(13):3475–3490. 10.1002/pmic.200500126
- Shinoda K, Sugimoto M, Tomita M, Ishihama Y: Informatics for peptide retention properties in proteomics LC-MS. Proteomics 2008, 8: 787–798. 10.1002/pmic.200700692
- Meek JL: Prediction of peptide retention times in high-pressure liquid chromatography on the basis of amino acid composition. Proc Natl Acad Sci USA 1980, 77(3):1632–1636. 10.1073/pnas.77.3.1632
- Meek JL, Rossetti ZL: Factors affecting retention and resolution of peptides in high-performance liquid-chromatography. J Chromatogr 1981, 211(1):15–28. 10.1016/S0021-9673(00)81169-3
- Browne CA, Bennett HPJ, Solomon S: The isolation of peptides by high-performance liquid-chromatography using predicted elution positions. Anal Biochem 1982, 124(1):201–208. 10.1016/0003-2697(82)90238-X
- Sasagawa T, Okuyama T, Teller DC: Prediction of peptide retention times in reversed-phase high-performance liquid-chromatography during linear gradient elution. J Chromatogr 1982, 240(2):329–340. 10.1016/S0021-9673(00)99612-2
- Guo D, Mant CT, Taneja AK, Hodges RS: Prediction of peptide retention times in reversed-phase high-performance liquid chromatography II. Correlation of observed and predicted peptide retention times and factors influencing the retention times of peptides. J Chromatogr A 1986, 359: 519–532. 10.1016/0021-9673(86)80103-0
- Guo D, Mant CT, Taneja AK, Parker JMR, Hodges RS: Prediction of peptide retention times in reversed-phase high-performance liquid chromatography I. Determination of retention coefficients of amino acid residues of model synthetic peptides. J Chromatogr A 1986, 359: 499–518. 10.1016/0021-9673(86)80102-9
- Mant CT, Burke TWL, Black JA, Hodges RS: Effect of peptide-chain length on peptide retention behavior in reversed-phase chromatography. J Chromatogr 1988, 458: 193–205. 10.1016/S0021-9673(00)90564-8
- Sakamoto Y, Kawakami N, Sasagawa T: Prediction of peptide retention times. J Chromatogr 1988, 442: 69–79. 10.1016/S0021-9673(00)94457-1
- Palmblad M, Ramstrom M, Markides KE, Hakansson P, Bergquist J: Prediction of chromatographic retention and protein identification in liquid chromatography/mass spectrometry. Anal Chem 2002, 74(22):5826–5830. 10.1021/ac0256890
- Petritis K, Kangas LJ, Ferguson PL, Anderson GA, Pasa-Tolic L, Lipton MS, Auberry KJ, Strittmatter EF, Shen Y, Zhao R, Smith RD: Use of artificial neural networks for the accurate prediction of peptide liquid chromatography elution times in proteome analysis. Anal Chem 2003, 75: 1039–1048. 10.1021/ac0205154
- Krokhin OV, Craig R, Spicer V, Ens W, Standing KG, Beavis RC, Wilkins JA: An improved model for prediction of retention times of tryptic peptides in ion pair reversed-phase HPLC. Mol Cell Proteomics 2004, 3(9):908–919. 10.1074/mcp.M400031-MCP200
- Strittmatter EF, Kangas LJ, Petritis K, Mottaz HM, Anderson GA, Shen Y, Jacobs JM, Camp II DG, Smith RD: Application of peptide LC retention time information in a discriminant function for peptide identification by tandem mass spectrometry. J Proteome Res 2004, 3: 760–769. 10.1021/pr049965y
- Baczek T, Wiczling P, Marszall M, Heyden YV, Kaliszan R: Prediction of peptide retention at different HPLC conditions from multiple linear regression models. J Proteome Res 2005, 4(2):555–563. 10.1021/pr049780r
- Wang Y, Gu X, Zhang J, Zhang XM: Prediction of peptide retention in RPLC. Chromatographia 2005, 62: 385–392. 10.1365/s10337-005-0644-2
- Gorshkov AV, Tarasova IA, Evreinov VV, Savitski MM, Nielsen ML, Zubarev RA, Gorshkov MV: Liquid chromatography at critical conditions: Comprehensive approach to sequence-dependent retention time prediction. Anal Chem 2006, 78: 7770–7777. 10.1021/ac060913x
- Petritis K, Kangas LJ, Yan B, Monroe ME, Strittmatter EF, Qian W, Adkins JN, Moore RJ, Xu Y, Lipton MS, Camp II DG, Smith RD: Improved peptide elution time prediction for reversed-phase liquid chromatography-MS by incorporating peptide sequence information. Anal Chem 2006, 78: 5026–5039. 10.1021/ac060143p
- Tripet B, Cepeniene DC, Kovacs JM, Mant CT, Krokhin OV, Hodges RS: Requirements for prediction of peptide retention time in reversed-phase high-performance liquid chromatography: Hydrophilicity/hydrophobicity of side-chains at the N- and C-termini of peptides are dramatically affected by the end-groups and location. J Chromatogr A 2007, 1141: 212–225. 10.1016/j.chroma.2006.12.024
- May D, Fitzgibbon M, Liu Y, Holzman T, Eng J, Kemp CJ, Whiteaker J, Paulovich A, McIntosh M: A platform for accurate mass and time analysis of mass spectrometry data. J Proteome Res 2007, 6: 2685–2694. 10.1021/pr070146y
- Norbeck AD, Monroe ME, Adkins JN, Anderson KK, Daly DS, Smith RD: The utility of accurate mass and LC elution time information in the analysis of complex proteomes. J Am Soc Mass Spectrom 2005, 16: 1239–1249. 10.1016/j.jasms.2005.05.009
- Jaitly N, Monroe ME, Petyuk VA, Clauss TRW, Adkins JN, Smith RD: Robust algorithm for alignment of liquid chromatography-mass spectrometry analyses in an accurate mass and time tag data analysis pipeline. Anal Chem 2006, 78: 7397–7409. 10.1021/ac052197p
- Palmblad M, Ramstrom M, Bailey CG, McCutchen-Maloney SL, Bergquist J, Zeller LC: Protein identification by liquid chromatography-mass spectrometry using retention time prediction. J Chromatogr B Analyt Technol Biomed Life Sci 2004, 803(1):131–135. 10.1016/j.jchromb.2003.11.007
- Craig R, Cortens JP, Beavis RC: Open source system for analyzing, validating, and storing protein identification data. J Proteome Res 2004, 3(6):1234–1242. 10.1021/pr049882h
- Krokhin OV, Ying S, Cortens JP, Ghosh D, Spicer V, Ens W, Standing KG, Beavis RC, Wilkins JA: Use of peptide retention prediction for protein identification by off-line reversed-phase HPLC-MALDI MS/MS. Anal Chem 2006, 78: 6265–6269. 10.1021/ac060251b
- Eng JK, McCormack AL, Yates JR: An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 1994, 5: 976–989. 10.1016/1044-0305(94)80016-2
- Klammer AA, Yi X, MacCoss MJ, Noble WS: Improving tandem mass spectrum identification using peptide retention time prediction across diverse chromatography conditions. Anal Chem 2007, 79: 6111–6118. 10.1021/ac070262k
- Xu H, Freitas MA: A high mass accuracy sensitive probability based scoring algorithm for database searching of tandem mass spectrometry data. BMC Bioinformatics 2007, 8: 133. 10.1186/1471-2105-8-133
- Sures I, Gallwitz D: Histone-specific acetyltransferases from calf thymus. Isolation, properties, and substrate specificity of three different enzymes. Biochem 1980, 19: 943–951. 10.1021/bi00546a019
- Zhang LW, Freitas MA, Wickham J, Parthun MR, Klisovic MI, Marcucci G, Byrd JC: Differential expression of histone post-translational modifications in acute myeloid and chronic lymphocytic leukemia determined by high-pressure liquid chromatography and mass spectrometry. J Am Soc Mass Spectrom 2004, 15: 77–86. 10.1016/j.jasms.2003.10.001
- Su X, Jacob NK, Amunugama R, Lucas DM, Knapp AR, Ren C, Davis ME, Marcucci G, Parthun MR, Byrd JC, Fishel R, Freitas MA: Liquid chromatography mass spectrometry profiling of histones. J Chromatogr B 2007, 850: 440–454. 10.1016/j.jchromb.2006.12.037
- Perkins DN, Pappin DJC, Creasy DM, Cottrell JS: Probability-based protein identification by searching sequence database using mass spectrometry data. Electrophoresis 1999, 20: 3551–3567. 10.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2
- Xu H, Freitas MA: Monte Carlo simulation based algorithms for analysis of shotgun proteomic data. J Proteome Res 2008, 7(7):2605–2615. 10.1021/pr800002u
- Huttlin EL, Hegeman AD, Harms AC, Sussman MR: Prediction of error associated with false-positive rate determination for peptide identification in large-scale proteomics experiments using a combined reversed and forward peptide sequence database strategy. J Proteome Res 2007, 6: 392–398. 10.1021/pr0603194
- Elias JE, Gygi SP: Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nature Methods 2007, 4(3):207–214. 10.1038/nmeth1019
- Prince JT, Carlson MW, Wang R, Lu P, Marcotte EM: The need for a public proteomics repository. Nature Biotechnology 2004, 22(4):471–472. 10.1038/nbt0404-471
- Fox J: An R and S-PLUS companion to applied regression. Thousand Oaks, CA, USA: Sage; 2002.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.