Optimization of filtering criterion for SEQUEST database searching to improve proteome coverage in shotgun proteomics
© Jiang et al; licensee BioMed Central Ltd. 2007
Received: 14 November 2006
Accepted: 31 August 2007
Published: 31 August 2007
In proteomic analysis, MS/MS spectra acquired by the mass spectrometer are assigned to peptides by database searching algorithms such as SEQUEST. Each peptide assignment made by SEQUEST is characterized by several scores, including Xcorr, ΔCn, Sp, Rsp and matched ion count. Filtering criteria based on several of these scores are used to separate correct identifications from random assignments. However, these filtering criteria have not yet been systematically optimized.
In this study, we implemented a machine learning approach known as predictive genetic algorithm (GA) to optimize the filtering criteria so as to maximize the number of identified peptides at a fixed false-discovery rate (FDR) for SEQUEST database searching. Because the FDR was determined directly by a decoy database search scheme, the GA-based optimization approach did not require any prior knowledge of the characteristics of the data set, which is a significant advantage over statistical approaches such as PeptideProphet. Compared with PeptideProphet, the GA-based approach achieved similar performance in distinguishing true from false assignments with only about 1/10 of the processing time. Moreover, the GA-based approach can be easily extended to process other database search results, as it does not rely on any assumptions about the data.
Our results indicate that filtering criteria should be optimized individually for different samples. The newly developed GA-based software provides a convenient and fast way to create tailored optimal criteria for different proteome samples and thereby improve proteome coverage.
Because of its high sensitivity, mass spectrometry has been widely used for protein identification and characterization in proteome research over the past decade [1, 2]. The shotgun proteomics approach, which is based on liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS), can be applied to analyze complex protein mixtures directly, without any prior purification step. Large-scale proteome profiling using multidimensional LC-MS/MS is increasingly applied to the analysis of many biological samples, including various mammalian tissues, cell lines, and serum/plasma [3–8]. In shotgun proteomics, complex protein mixtures are first digested by an enzyme (e.g. trypsin) to produce peptide mixtures. The peptide mixtures are then subjected to extensive separation, such as strong cation exchange chromatography (SCX) coupled on-line or off-line with reversed-phase capillary LC (RPLC). Peptides eluting from the reversed-phase capillary LC column are sprayed into a tandem mass spectrometer to produce MS/MS spectra. Peptide sequences are then assigned to the experimental MS/MS spectra by a database searching algorithm.
SEQUEST, Mascot and other database searching algorithms match experimental spectra against theoretical spectra generated in silico from peptide sequences, and then calculate scores to evaluate how well they match. These scores help to discriminate between correct and incorrect peptide assignments. One of the major issues in database searching for proteome analysis is determining the false-discovery rate (FDR) of the identifications. FDR is the proportion of identifications accepted as significant that are actually incorrect. A variety of methods have been developed to determine FDR for peptide identifications. Some efforts have focused on statistical analysis methods [11–17] that estimate the probability that an identification is correct, e.g. PeptideProphet. These methods often require complicated statistical algorithms. A simpler way to evaluate FDR is the decoy database approach introduced by Peng et al. In this method, FDR is determined by searching a composite database that contains the original protein database and its reversed version. Statistically, the probability that a peptide is identified incorrectly from the reversed database is expected to equal the probability that it is identified incorrectly from the original protein database, because the two databases are the same size [19–21]. Therefore, FDR can be calculated using the following equation:
FDR = 2*n(rev)/(n(rev)+n(forw)), (1)
where n(forw) and n(rev) are the numbers of peptides identified in proteins with forward (original) and reversed sequences, respectively [18, 22]. The factor of 2 reflects the expectation that every reversed hit is matched by one incorrect forward hit. The database searching strategy using a composite database is also known as the reversed database searching strategy. Because it is simple to use, it has been widely applied to the evaluation of proteomic search results [18, 22–26], including studies of post-translational modifications (PTMs) [19, 27, 28].
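Equation (1) can be illustrated with a few lines of Python. This is only a sketch: the function name `decoy_fdr` and the example counts are hypothetical and are not taken from SFOER, and returning 0 for an empty result set is an assumption made here for convenience.

```python
def decoy_fdr(n_forward: int, n_reversed: int) -> float:
    """Estimate FDR from a composite (forward + reversed) search, Equation (1)."""
    total = n_forward + n_reversed
    if total == 0:
        return 0.0  # assumption: an empty result set is treated as FDR 0
    return 2.0 * n_reversed / total


# Illustrative counts only (not from the paper): 50 reversed hits out of 10,000.
print(decoy_fdr(n_forward=9950, n_reversed=50))  # 2 * 50 / 10000 = 0.01
```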
SEQUEST is one of the most commonly used database searching algorithms. It first counts the peaks that are common to the experimental and theoretical spectra and computes a preliminary score (Sp). It then selects a proportion of the top candidate peptides, based on the rank of the preliminary score (Rsp), for cross-correlation analysis. Thus, several scores and rankings are determined for each candidate peptide identification. To distinguish correct from incorrect identifications, filters based on a set of database searching scores are applied, including the two commonly used scores Xcorr and ΔCn. To evaluate the FDR of the identifications, a reversed database search can be performed and the FDR determined by Equation (1). To control FDR, many research groups use fixed Xcorr values and manually increase ΔCn until the peptide identifications reach a specific FDR, or use a fixed ΔCn value and manually increase the Xcorr cutoffs. However, criteria determined by adjusting only one score filter to reach a specific FDR may not be optimal.
Genetic algorithms (GAs) belong to the family of evolutionary algorithms and mimic the process of natural selection, in which better-fitted individuals are selected. The optimization is based on a multi-point search in which many solutions are evaluated simultaneously. If the fitness function is properly designed, a GA can search through very large sets of possible solutions and converge quickly to an optimal or near-optimal solution. GAs have been successfully applied to the processing of MS data in proteome research [30–32].
In this work, we combined the decoy database searching approach with automated optimization of filtering criteria and developed a GA-based software suite named SFOER (SEQUEST Filter Optimizer Using Genetic Algorithm), which enables simultaneous optimization of multiple SEQUEST score filtering criteria. The optimized criteria were used to filter datasets generated from two different human samples and yielded approximately 20% more peptide identifications than conventional criteria [14, 25, 33] at the same FDR (<1%). SFOER and PeptideProphet were compared directly using both complex human samples and standard protein mixtures. Compared with PeptideProphet, SFOER showed nearly the same ability to distinguish correct peptide identifications from incorrect ones with only about 1/10 of the processing time. Because SFOER does not rely on models based on possibly unfounded assumptions, it provides a safe way to quickly determine tailored optimal filtering criteria for different proteome samples, so that higher proteome coverage can be achieved.
Results and discussion
To evaluate the confidence of peptide assignments by SEQUEST and to generate the score distribution for peptide identifications, we generated large datasets from human proteome samples by SCX/RPLC-MS/MS. Approximately 277,000 MS/MS spectra were generated from human liver tissue lysate. All MS/MS spectra were searched by SEQUEST against a composite database containing human IPI proteins in both forward and reversed orientation. Only the top matched peptide for each spectrum at a specific charge state was accepted. Approximately 11,000, 186,000 and 181,000 peptides were generated for the charge states 1+, 2+ and 3+, respectively. Of these, 165,966 (43.86%) peptides were derived from the reversed protein database and 212,430 (56.14%) from the forward protein database.
True and false assignment distribution
To obtain confident identifications at a specific FDR (<1%), filter criteria based on two SEQUEST scores, Xcorr and ΔCn, need to be adjusted. A series of filtering criteria using these two cutoff scores can be determined as follows: the Xcorr cutoff is increased step by step by a fixed value (e.g. 0.05), and for each Xcorr cutoff the ΔCn cutoff is chosen so that the identifications passing the filtering criterion have an overall FDR of less than 1%. The resulting curves are shown in the graphs of Figure 1 for the charge states 1+, 2+ and 3+; every point on a curve represents a set of criteria leading to FDR <1% for peptide identifications with that charge state. These curves show that many different criteria can be used to achieve peptide identifications with FDR less than 1%.
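The stepwise procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used to produce Figure 1: the `Psm` record, the function names and the use of `>=` for cutoffs are hypothetical choices, and the brute-force search is written for clarity rather than speed.

```python
from typing import List, NamedTuple, Optional, Tuple


class Psm(NamedTuple):
    """One top-ranked peptide-spectrum match (hypothetical record layout)."""
    xcorr: float
    delta_cn: float
    is_reversed: bool


def fdr(psms: List[Psm]) -> float:
    """Equation (1): 2 * n(rev) / (n(rev) + n(forw))."""
    if not psms:
        return 0.0
    n_rev = sum(1 for p in psms if p.is_reversed)
    return 2.0 * n_rev / len(psms)


def min_delta_cn(psms: List[Psm], xcorr_cutoff: float,
                 max_fdr: float = 0.01) -> Optional[float]:
    """Smallest observed delta-Cn cutoff giving FDR < max_fdr at this Xcorr cutoff."""
    kept = [p for p in psms if p.xcorr >= xcorr_cutoff]
    for cutoff in sorted({p.delta_cn for p in kept}):
        passing = [p for p in kept if p.delta_cn >= cutoff]
        if fdr(passing) < max_fdr:
            return cutoff
    return None  # no delta-Cn cutoff reaches the target FDR


def sweep(psms: List[Psm], start: float, stop: float,
          step: float = 0.05) -> List[Tuple[float, float]]:
    """Return (Xcorr cutoff, delta-Cn cutoff) pairs, one per Xcorr step."""
    curve = []
    n_steps = int(round((stop - start) / step))
    for i in range(n_steps + 1):
        xcorr_cutoff = round(start + i * step, 10)
        delta_cn_cutoff = min_delta_cn(psms, xcorr_cutoff)
        if delta_cn_cutoff is not None:
            curve.append((xcorr_cutoff, delta_cn_cutoff))
    return curve
```

Each pair returned by `sweep` corresponds to one point on a curve of the kind described for Figure 1, computed separately for each charge state.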
Optimization of filtering criteria by genetic algorithm
Using the GA optimization procedure, SFOER was applied to optimize the filtering criteria for the dataset generated from human liver tissue lysate. The following optimized criteria were obtained (FDR <1%): for the charge states 1+, 2+ and 3+, Xcorr should be greater than 1.76, 2.31 and 2.41, ΔCn should be greater than 0.061, 0.199 and 0.265, Sp should be greater than 44.42, 104 and 276.9, and Rsp should be within 3, 4 and 2, respectively. Filtering with this set of criteria yielded 29,934 positive peptides, including 162 singly charged, 18,513 doubly charged and 11,259 triply charged peptides.
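As an illustration of how such charge-dependent criteria are applied, the sketch below encodes the liver cutoffs quoted above in a small lookup table. The `Criteria` class, the `passes_criteria` function and the reading of "within" as "less than or equal to" for Rsp are assumptions made for this example; they are not SFOER's own data structures.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Criteria:
    min_xcorr: float     # Xcorr must be greater than this value
    min_delta_cn: float  # delta-Cn must be greater than this value
    min_sp: float        # Sp must be greater than this value
    max_rsp: int         # Rsp must not exceed this value (assumed meaning of "within")


# Optimized liver criteria (FDR < 1%) as quoted in the text, keyed by charge state.
LIVER_CRITERIA: Dict[int, Criteria] = {
    1: Criteria(1.76, 0.061, 44.42, 3),
    2: Criteria(2.31, 0.199, 104.0, 4),
    3: Criteria(2.41, 0.265, 276.9, 2),
}


def passes_criteria(charge: int, xcorr: float, delta_cn: float, sp: float,
                    rsp: int, criteria: Dict[int, Criteria] = LIVER_CRITERIA) -> bool:
    """Return True if an identification satisfies all four cutoffs for its charge."""
    c = criteria.get(charge)
    if c is None:
        return False  # assumption: charge states without criteria are rejected
    return (xcorr > c.min_xcorr and delta_cn > c.min_delta_cn
            and sp > c.min_sp and rsp <= c.max_rsp)
```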
Table: Comparison of the performance of conventional criteria, PeptideProphet and SFOER in peptide identifications for the analysis of human liver tissue lysate (table body not recovered; surviving column headers: "# false pep", "% incr unique pep").
When the optimization was performed on a very different sample, human blood plasma, a different set of optimal filtering criteria was generated. For the charge states 1+, 2+ and 3+, Xcorr should be greater than 1.88, 2.31 and 2.40, ΔCn should be greater than 0.179, 0.27 and 0.319, Sp should be greater than 238, 71 and 215.6, and Rsp should be within 80, 2 and 1, respectively. Filtering with this set of criteria yielded 14,218 peptides, a 15.3% increase in peptide identifications over the conventional criteria (Xcorr cutoffs greater than 2.0, 2.5 and 3.8 for singly, doubly and triply charged peptides and ΔCn greater than 0.265) [14, 25, 33].
The optimized criteria of peptide identifications from human liver tissue lysate and human plasma by SFOER with FDR less than 1%
In most cases, differences between proteome analyses are inevitable: protein samples may come from different tissues or even different species, and mass spectra may be collected on different types of mass spectrometers under different separation conditions. These differences result in datasets with different characteristics. Statistical approaches that are trained, under certain assumptions, on one type of dataset may only work well on datasets of that type; for datasets with different characteristics, they may need retraining or redesign. SFOER, in contrast, does not employ any statistical model and requires no training. It can therefore be applied to any database search results, provided that the searches were performed against a decoy database so that FDR can be easily determined. With this GA-based software suite, optimized criteria for different datasets can be easily determined, and these tailored criteria should be very effective in improving the coverage of proteome analysis.
Table: Summary of the peptide identifications from human liver tissue by applying filtering criteria optimized using different score combinations (table body not recovered; surviving row labels: all four scores; Xcorr, ΔCn and Rsp; Xcorr, ΔCn and Sp).
Classification performance of SFOER
Compared with the conventional approach, the number of identified peptides increased significantly when the filtering criteria optimized by SFOER were applied. One concern is whether the additional peptide identifications are true identifications. For the dataset from the human liver tissue sample, 5,588 extra peptide identifications were obtained when the filtering criteria optimized by SFOER were applied. It is impractical to validate all of these peptide identifications manually. A practical alternative is to randomly select a small portion of the additional identifications and check their spectra manually. Thus, 300 of the 5,588 extra peptide identifications were randomly selected. Each of these spectra was assessed for an acceptable signal-to-noise ratio and the presence of at least three consecutive b or y ion fragments. In total, 98.3% (295 out of 300) of these peptides were true positives, so the false-discovery rate was very close to the overall predicted FDR. In addition, 84% (4,693 out of 5,588) of the additional peptides were also detected by PeptideProphet at a probability cutoff of 0.9, for which the empirical error rate was 1.1%. These results demonstrate that the additional peptide identifications obtained by SFOER were of high confidence. (MS/MS spectra of the additional peptide identifications obtained with our optimized criteria can be downloaded from our website.)
The classification performance of SFOER was further validated using a standard protein mixture. A tryptic digest of seven standard proteins was used as the sample. The acquired MS/MS spectra were searched against a composite database containing both forward and reversed sequences of all control proteins (including trypsin) as well as forward and reversed protein sequences from yeast, chosen for its low homology with readily available control proteins. Because the proteins present in the sample were known, correct and incorrect peptide assignments could be easily distinguished according to whether the peptide came from one of the known standard proteins. Thus the actual FDR, i.e. the observed FDR, can be determined as the percentage of peptide identifications not from standard proteins among all peptide identifications, while the predicted FDR is determined by Equation (1). Unless otherwise stated, FDR refers to the predicted FDR. The classification performance of SFOER can be evaluated by comparing the actual and predicted FDR.
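The difference between the two quantities can be made concrete with a short sketch. It assumes, purely for illustration, that each peptide identification is represented by the accession of its protein and that reversed entries carry a hypothetical `REV_` prefix; neither convention is taken from the original software.

```python
from typing import Iterable, List, Set


def predicted_fdr(accessions: List[str], decoy_prefix: str = "REV_") -> float:
    """Equation (1), with reversed hits recognised by a hypothetical prefix."""
    if not accessions:
        return 0.0
    n_rev = sum(1 for a in accessions if a.startswith(decoy_prefix))
    return 2.0 * n_rev / len(accessions)


def actual_fdr(accessions: List[str], standard_proteins: Iterable[str]) -> float:
    """Fraction of identifications that do not come from a known standard protein."""
    if not accessions:
        return 0.0
    known: Set[str] = set(standard_proteins)
    n_wrong = sum(1 for a in accessions if a not in known)
    return n_wrong / len(accessions)


# Toy example: 97 hits to a standard protein, 1 forward yeast hit, 2 reversed hits.
hits = ["BSA"] * 97 + ["YEAST_1", "REV_YEAST_2", "REV_BSA"]
observed = actual_fdr(hits, {"BSA"})   # 3 wrong hits out of 100
estimated = predicted_fdr(hits)        # 2 * 2 reversed hits out of 100
```

In this toy example the observed value is 0.03 and the estimate from Equation (1) is 0.04; a close agreement of this kind on real data is what the comparison in the text is meant to test.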
GA is a very efficient algorithm and is widely used to search for optimal or near-optimal solutions, and SFOER, which employs GA, should inherit this advantage. Approximately 277,000 spectra (12 LC-MS/MS runs) were processed separately by PeptideProphet and SFOER on a Pentium 4 (3.0 GHz) computer. The optimization procedure using SFOER took less than 4 min (10 s for 1+, 100 s for 2+ and 99 s for 3+), while the calculation of probabilities by PeptideProphet took about 38 min. The IO procedures took about 40 min for PeptideProphet and 28 min for SFOER. For PeptideProphet these consisted of assembling peptides from out files into html files and converting the files from html to xml format, while for SFOER they only included assembling peptides from out files into plain text files. Thus, excluding the IO procedures, SFOER needed only about 1/10 of the time required by PeptideProphet to search for the optimal criteria.
For a model-based algorithm like PeptideProphet, accuracy relies on how well the empirical model fits the obtained datasets. If the model accurately reflects the physical processes by which the data are generated, it can work well even with a small amount of training data. On the other hand, if the data diverge significantly from the model, classification errors proportional to the degree of divergence result. SFOER carries less risk because it does not rely on a model: no prior knowledge of the properties of the dataset and no assumptions about the dataset are required. Therefore, this approach is equally applicable to datasets with different characteristics. There is, however, one requirement for the application of SFOER. Because the FDR of peptide identifications is needed during the optimization, SFOER can only process database search results obtained with a decoy database.
SFOER can also be extended to some special applications with slight revision. Currently, SFOER only optimizes cutoffs for several SEQUEST scores such as Xcorr, ΔCn, Sp and Rsp. It has been reported that some peptide properties obtained experimentally during proteome analysis can be used to increase the confidence of peptide identifications. These properties include pI values obtained from isoelectric focusing (IEF), hydrophobicity or elution times obtained from reversed-phase LC separation (NET), and high-accuracy masses obtained with an FT mass spectrometer. In principle, these properties can be optimized together with the SEQUEST scores as part of the filtering criteria by this software suite, and a significant improvement in proteome coverage is expected. Although SFOER was developed to optimize filtering criteria for SEQUEST database searches, after slight revision it should also be applicable to the optimization of filtering criteria for other database search engines such as Mascot, as long as the decoy database search strategy is applied.
A software suite named SFOER was developed using a predictive genetic algorithm (GA) to optimize filtering criteria for SEQUEST database searching. The optimization is based on a reversed database search, from which FDR can be easily determined. It was demonstrated that SFOER was able to maximize the number of identified peptides without increasing the FDR. Compared with the statistical approach PeptideProphet, SFOER had nearly the same classification performance but required much less processing time. Moreover, as it does not rely on possibly unfounded assumptions about the data, SFOER can create tailored criteria for datasets that are obtained from different samples, generated on different mass spectrometers, or even searched with different database searching algorithms (provided that the scores being optimized are changed accordingly).
Materials and reagents
Magic C18AQ (5 μm, 100 Å pore size) was purchased from Michrom BioResources (Auburn, CA, USA), and Polysulfoethyl Aspartamide (5 μm, 200 Å pore size) was from PolyLC Inc (Columbia, MD, USA). PEEK tubing, sleeves, microtee and microcross were obtained from Upchurch Scientific (Oak Harbor, WA, USA). Fused-silica capillaries (50, 75 and 100 μm I.D.) were purchased from Polymicro Technologies (Phoenix, AZ, USA). All the water used in the experiments was purified using a Milli-Q system (Millipore, Bedford, MA, USA). Dithiothreitol (DTT) and iodoacetamide were purchased from Sino-American Biotechnology Corporation (Beijing, China). Urea, ammonium acetate, ammonium bicarbonate and acetic acid were obtained from Sigma (St. Louis, MO, USA). Trypsin was from Promega (Madison, WI, USA). Tris was from Amresco (Solon, Ohio, USA). Formic acid was obtained from Fluka (Buches, Germany). Acetonitrile (ACN, HPLC grade) was from Merck (Darmstadt, Germany). Protease inhibitor cocktail tablets (Complete Mini) were purchased from Roche.
Human blood plasma was obtained from one healthy male donor (age 37, blood type O) and was provided by Zhuanghe Blood Center (Dalian, China). An initial protein concentration of ~95 mg/mL was determined in plasma using the Bradford method. Human liver tissue was homogenized in lysis buffer (40 mM Tris, 6 M guanidine HCl, 65 mM DTT, 310 mM NaF, 3.45 mM NaVO3, protease inhibitor cocktail) and then sonicated for 180 s, followed by centrifugation at 25,000 g for 1 h. The supernatant was collected as the protein sample and its concentration was determined by the Bradford assay.
The human plasma sample and human liver tissue lysate were reduced with DTT and alkylated with iodoacetamide. The solutions were then diluted to 1 M guanidine-HCl, and the pH was adjusted to 8.1. Finally, trypsin was added (trypsin:protein, 1:50) and the protein samples were incubated at 37°C for 20 h. Tryptic digests were desalted with a C18 solid-phase cartridge.
Tryptic digests of standard proteins were prepared by digesting 500 pmol each of reduced, iodoacetamide-alkylated bovine serum albumin, horse myoglobin, horse cytochrome c, chick ovalbumin, human hemoglobin, bovine β-casein and bovine α-casein. Bovine serum albumin was purchased from Roche and all other standard proteins were from Sigma-Aldrich. These digests were pooled to prepare the seven-protein digest mixture. The final concentrations of these proteins ranged from 16 to 300 fmol per microliter.
LC-MS/MS analysis and database search
The configurations for 1D and 2D LC-MS/MS analysis were set as reported previously. A Finnigan LTQ linear ion trap mass spectrometer (Thermo, San Jose, CA) was coupled with capillary reversed-phase LC for the collection of MS/MS spectra. The tryptic digest of the 7 standard proteins was analyzed by 1D LC-MS/MS in 7 replicate runs, and the human sample digests were analyzed by 2D LC-MS/MS.
The acquired MS/MS spectra were searched using Turbo SEQUEST in the BioWorks 3.2 software suite (Thermo Finnigan, San Jose, CA). For the 7 standard proteins, the database was a composite of protein sequences from yeast (9,492 entries) in forward and reversed orientation together with the forward and reversed sequences of all control proteins, trypsin and α-s2-casein (an impurity of α-casein). The database used for the two human proteome samples was a composite of the normal IPI human database (v3.04, 49,078 entries) from the European Bioinformatics Institute with the reversed version of the same database appended at the end. MS/MS spectra were searched using fully tryptic cleavage constraints, and up to two missed cleavage sites were allowed. A static modification of +57.0215 Da was set on cysteine residues and a variable modification of +15.9949 Da on methionine residues. Mass tolerances were 2 Da for peptides and 1 Da for fragments. FDR was determined by Equation (1).
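The layout of such a composite database, forward entries followed by their reversed counterparts, can be sketched as follows. This is an illustrative example only: the `REV_` prefix, the function names and the 60-character line width are hypothetical choices and do not describe how the databases in this study were actually built.

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # (accession, amino-acid sequence)


def make_composite(entries: List[Entry], decoy_prefix: str = "REV_") -> List[Entry]:
    """Append a reversed copy of every sequence after all forward entries."""
    forward = list(entries)
    decoys = [(decoy_prefix + name, seq[::-1]) for name, seq in forward]
    return forward + decoys


def to_fasta(entries: List[Entry], width: int = 60) -> str:
    """Serialise entries as FASTA text with fixed-width sequence lines."""
    lines = []
    for name, seq in entries:
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


composite = make_composite([("P1", "MKTAY")])
fasta_text = to_fasta(composite)
```

Because every forward sequence has exactly one reversed counterpart, the forward and reversed halves are the same size, which is the property Equation (1) relies on.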
Development of software suite SFOER using GA
Parameter settings for the genetic algorithm
Defining a fitness function for evaluating the individual members of a population is perhaps the most crucial step in designing a genetic algorithm. The goal of this study was to derive optimized filtering criteria that achieve maximal separation between correct and incorrect peptide identifications and give maximal sensitivity for true positive peptide identifications at a specified confidence level (e.g. >99%). However, in most proteome studies the total number of true positive peptides is unknown, so sensitivity cannot be computed directly. We therefore used the following fitness function:
F(p) = n(p), (2)
where F(p) is the fitness value of a given filtering criterion p, which consists of cutoff values for several different scores, and n(p) is the total number of positive peptide identifications that pass this filtering criterion. When the FDR of the peptide identifications filtered by a criterion is higher than the specified value, the fitness of this criterion is set to zero. At a fixed FDR, this function reflects the sensitivity of a specific criterion.
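A minimal Python sketch of the fitness function in Equation (2), including the zero-fitness rule for criteria that exceed the target FDR, is given below. The dictionary keys, function names and the interpretation of n(p) as the count of all identifications passing the filter (forward and reversed) are assumptions made for illustration.

```python
from typing import Dict, List


def passes(psm: Dict, cutoffs: Dict) -> bool:
    """Apply the four score cutoffs of one candidate filtering criterion."""
    return (psm["xcorr"] > cutoffs["xcorr"]
            and psm["delta_cn"] > cutoffs["delta_cn"]
            and psm["sp"] > cutoffs["sp"]
            and psm["rsp"] <= cutoffs["rsp"])


def fitness(psms: List[Dict], cutoffs: Dict, max_fdr: float = 0.01) -> int:
    """Equation (2): F(p) = n(p), or 0 when the FDR exceeds the specification."""
    passing = [p for p in psms if passes(p, cutoffs)]
    if not passing:
        return 0
    n_rev = sum(1 for p in passing if p["is_reversed"])
    fdr = 2.0 * n_rev / len(passing)  # Equation (1)
    if fdr > max_fdr:
        return 0
    return len(passing)  # assumption: n(p) counts every passing identification
```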
Creation of the starting population: The initial population, the starting point of the genetic algorithm, was generated randomly. Each complete chromosome was assembled from the cutoff values of a certain number of different SEQUEST scores, and the population size was set to 100.
Selection: Roulette wheel selection was used to determine each individual's probability of reproduction, following the policy that the fitter a parent chromosome is, the more descendants carrying its chromosome are produced. When the fitness of an individual became zero, this individual was considered dead and was replaced by a new randomly initialized individual.
Genetic manipulation: Two new chromosomes were then bred by single-point cross-over, in which the genes of the two parents are exchanged at one randomly chosen point along the length of the chromosome, in analogy to naturally occurring cross-over. The cross-over rate was set to 0.2. The rate of the subsequent point mutation, in which a binary character is changed from 1 to 0 or vice versa, was set to 0.01.
The selection and genetic manipulation steps were repeated until the optimization was terminated. A stop criterion was not pre-defined, because little was known about the search space. In this study, the optimization was terminated after a fixed number of generations, which can be set manually.
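The steps above can be sketched as a compact genetic algorithm in Python. This is a simplified illustration, not the SFOER implementation (which is written in Java): the population size (100), cross-over rate (0.2) and mutation rate (0.01) follow the text, whereas the 8-bit gene width, the cutoff ranges passed in `ranges`, the random seed and the way dead individuals are replaced are assumptions made here.

```python
import random
from typing import Callable, List, Sequence, Tuple


def decode(bits: List[int], ranges: Sequence[Tuple[float, float]],
           bits_per_gene: int) -> List[float]:
    """Map each fixed-width binary gene onto its [low, high] cutoff range."""
    values = []
    for g, (low, high) in enumerate(ranges):
        gene = bits[g * bits_per_gene:(g + 1) * bits_per_gene]
        as_int = int("".join(map(str, gene)), 2)
        values.append(low + (high - low) * as_int / (2 ** bits_per_gene - 1))
    return values


def run_ga(fitness: Callable[[List[float]], float],
           ranges: Sequence[Tuple[float, float]],
           generations: int = 50, pop_size: int = 100, bits_per_gene: int = 8,
           crossover_rate: float = 0.2, mutation_rate: float = 0.01,
           seed: int = 0) -> Tuple[List[float], float]:
    """Return the best decoded cutoffs seen and their fitness."""
    rng = random.Random(seed)
    length = bits_per_gene * len(ranges)

    def new_individual() -> List[int]:
        return [rng.randint(0, 1) for _ in range(length)]

    population = [new_individual() for _ in range(pop_size)]
    best_bits, best_fit = population[0], -1.0
    for _ in range(generations):
        scores = [fitness(decode(ind, ranges, bits_per_gene)) for ind in population]
        for ind, score in zip(population, scores):
            if score > best_fit:
                best_bits, best_fit = ind[:], score
        # Individuals with zero fitness "die"; only survivors may reproduce.
        survivors = [(ind, s) for ind, s in zip(population, scores) if s > 0]
        next_pop: List[List[int]] = []
        if survivors:
            inds, weights = zip(*survivors)
            while len(next_pop) < len(survivors):
                # Roulette wheel: selection probability proportional to fitness.
                a, b = (p[:] for p in rng.choices(inds, weights=weights, k=2))
                if rng.random() < crossover_rate:  # single-point cross-over
                    point = rng.randrange(1, length)
                    a, b = a[:point] + b[point:], b[:point] + a[point:]
                for child in (a, b):
                    for i in range(length):  # point mutation: flip one bit
                        if rng.random() < mutation_rate:
                            child[i] ^= 1
                    next_pop.append(child)
            next_pop = next_pop[:len(survivors)]
        # Dead individuals are replaced by new random individuals.
        while len(next_pop) < pop_size:
            next_pop.append(new_individual())
        population = next_pop
    return decode(best_bits, ranges, bits_per_gene), best_fit
```

In practice the `fitness` argument would wrap a function like Equation (2) evaluated on the search results of one charge state, with one `(low, high)` range per score being optimized. The run stops after a fixed number of generations, mirroring the manually set termination described above.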
All database search results were processed by SFOER to generate optimized criteria at different confidence levels, and the peptide identifications were then filtered with these sets of criteria. PeptideProphet, which was downloaded as part of the Trans-Proteomics Pipeline (TPP) from the Seattle Proteome Center, was also used to process these datasets. All peptides assigned by database searching were parsed by PeptideProphet with default parameters to generate PeptideProphet probabilities. The peptide probability threshold was adjusted manually to generate peptide identifications at different confidence levels.
Availability and requirements
SFOER was developed using the Java 2 Platform Standard Edition (J2SE) Development Kit 5.0 (Sun Microsystems, Inc) and is platform independent. Java Runtime Environment 1.5.0 or higher is required. It is distributed under the GNU General Public License (GPL) and is available at http://bioanalysis.dicp.ac.cn/proteomics/software/SFOER.html.
Financial support from the National Natural Sciences Foundation of China (No. 20675081), the China State Key Basic Research Program Grant (2005CB522701, 2007CB914104), the China High Technology Research Program Grant (2006AA02A309), the Knowledge Innovation program of CAS (KJCX2.YW.HO9) and the Knowledge Innovation program of DICP to H.Z. is gratefully acknowledged.
- Aebersold R, Mann M: Mass spectrometry-based proteomics. Nature. 2003, 422 (6928): 198-207. 10.1038/nature01511.
- Yates JR: Mass spectral analysis in proteomics. Annu Rev Biophys Biomolec Struct. 2004, 33: 297-316. 10.1146/annurev.biophys.33.111502.082538.
- Koller A, Washburn MP, Lange BM, Andon NL, Deciu C, Haynes PA, Hays L, Schieltz D, Ulaszek R, Wei J, Wolters D, Yates JR: Proteomic survey of metabolic pathways in rice. Proc Natl Acad Sci U S A. 2002, 99 (18): 11969-11974. 10.1073/pnas.172183199.
- Wu CC, MacCoss MJ, Howell KE, Yates JR: A method for the comprehensive proteomic analysis of membrane proteins. Nat Biotechnol. 2003, 21 (5): 532-538. 10.1038/nbt819.
- Florens L, Washburn MP, Raine JD, Anthony RM, Grainger M, Haynes JD, Moch JK, Muster N, Sacci JB, Tabb DL, Witney AA, Wolters D, Wu YM, Gardner MJ, Holder AA, Sinden RE, Yates JR, Carucci DJ: A proteomic view of the Plasmodium falciparum life cycle. Nature. 2002, 419 (6906): 520-526. 10.1038/nature01107.
- Jessani N, Niessen S, Wei BQQ, Nicolau M, Humphrey M, Ji YR, Han WS, Noh DY, Yates JR, Jeffrey SS, Cravatt BF: A streamlined platform for high-content functional proteomics of primary human specimens. Nat Methods. 2005, 2 (9): 691-697. 10.1038/nmeth778.
- Chen EI, Hewel J, Felding-Habermann B, Yates JR: Large scale protein profiling by combination of protein fractionation and multidimensional protein identification technology (MudPIT). Mol Cell Proteomics. 2006, 5 (1): 53-56. 10.1074/mcp.T500013-MCP200.
- Washburn MP, Wolters D, Yates JR: Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotechnol. 2001, 19 (3): 242-247. 10.1038/85686.
- Eng JK, McCormack AL, Yates JR III: An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom. 1994, 5 (11): 976-989. 10.1016/1044-0305(94)80016-2.
- Perkins DN, Pappin DJC, Creasy DM, Cottrell JS: Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999, 20 (18): 3551-3567. 10.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2.
- Weatherly DB, Atwood JA, Minning TA, Cavola C, Tarleton RL, Orlando R: A heuristic method for assigning a false-discovery rate for protein identifications from mascot database search results. Mol Cell Proteomics. 2005, 4 (6): 762-772. 10.1074/mcp.M400215-MCP200.View ArticlePubMedGoogle Scholar
- Keller A, Nesvizhskii AI, Kolker E, Aebersold R: Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem. 2002, 74 (20): 5383-5392. 10.1021/ac025747h.View ArticlePubMedGoogle Scholar
- Nesvizhskii AI, Keller A, Kolker E, Aebersold R: A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem. 2003, 75 (17): 4646-4658. 10.1021/ac0341261.View ArticlePubMedGoogle Scholar
- Sadygov RG, Liu H, Yates JR: Statistical models for protein validation using tandem mass spectral data and protein amino acid sequence databases. Anal Chem. 2004, 76 (6): 1664-1671. 10.1021/ac035112y.View ArticlePubMedGoogle Scholar
- Moore RE, Young MK, Lee TD: Qscore: An algorithm for evaluating SEQUEST database search results. J Am Soc Mass Spectrom. 2002, 13 (4): 378-386. 10.1016/S1044-0305(02)00352-5.View ArticlePubMedGoogle Scholar
- Baczek T, Bucinski A, Ivanov AR, Kaliszan R: Artificial neural network analysis for evaluation of peptide MS/MS spectra in proteomics. Anal Chem. 2004, 76 (6): 1726-1732. 10.1021/ac030297u.View ArticlePubMedGoogle Scholar
- Ulintz PJ, Zhu J, Qin ZHS, Andrews PC: Improved classification of mass spectrometry database search results using newer machine learning approaches. Mol Cell Proteomics. 2006, 5 (3): 497-509. 10.1074/mcp.M500233-MCP200.View ArticlePubMedGoogle Scholar
- Peng JM, Elias JE, Thoreen CC, Licklider LJ, Gygi SP: Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: The yeast proteome. J Proteome Res. 2003, 2 (1): 43-50. 10.1021/pr025556v.View ArticlePubMedGoogle Scholar
- Beausoleil SA, Villen J, Gerber SA, Rush J, Gygi SP: A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat Biotechnol. 2006, 24 (10): 1285-1292. 10.1038/nbt1240.View ArticlePubMedGoogle Scholar
- Elias JE, Gygi SP: Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat Methods. 2007, 4 (3): 207-214. 10.1038/nmeth1019.View ArticlePubMedGoogle Scholar
- Higdon R, Kolker E: A predictive model for identifying proteins by a single peptide match. Bioinformatics. 2007, 23 (3): 277-280. 10.1093/bioinformatics/btl595.View ArticlePubMedGoogle Scholar
- Elias JE, Haas W, Faherty BK, Gygi SP: Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations. Nat Methods. 2005, 2 (9): 667-675. 10.1038/nmeth785.View ArticlePubMedGoogle Scholar
- Park GW, Kwon KH, Kim JY, Lee JH, Yun SH, Kim SI, Park YM, Ch SY, Paik YK, Yoo JS: Human plasma proteome analysis by reversed sequence database search and molecular weight correlation based on a bacterial proteome analysis. Proteomics. 2006, 6 (4): 1121-1132. 10.1002/pmic.200500318.View ArticlePubMedGoogle Scholar
- Qian WJ, Liu T, Monroe ME, Strittmatter EF, Jacobs JM, Kangas LJ, Petritis K, CampIi DG, Smith RD: Probability-based evaluation of peptide and protein identifications from tandem mass spectrometry and SEQUEST analysis: The human proteome. J Proteome Res. 2005, 4 (1): 53-62. 10.1021/pr0498638.View ArticlePubMedGoogle Scholar
- Xie HW, Griffin TJ: Trade-off between high sensitivity and increased potential for false positive peptide sequence matches using a two-dimensional linear ion trap for tandem mass spectrometry-based proteomics. J Proteome Res. 2006, 5 (4): 1003-1009. 10.1021/pr050472i.
- Kislinger T, Cox B, Kannan A, Chung C, Hu PZ, Ignatchenko A, Scott MS, Gramolini AO, Morris Q, Hallett MT, Rossant J, Hughes TR, Frey B, Emili A: Global survey of organ and organelle protein expression in mouse: Combined proteomic and transcriptomic profiling. Cell. 2006, 125 (1): 173-186. 10.1016/j.cell.2006.01.044.
- Lu BW, Ruse C, Xu T, Park SK, Yates J: Automatic validation of phosphopeptide identifications from tandem mass spectra. Anal Chem. 2007, 79 (4): 1301-1310. 10.1021/ac061334v.
- Olsen JV, Blagoev B, Gnad F, Macek B, Kumar C, Mortensen P, Mann M: Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell. 2006, 127 (3): 635-648. 10.1016/j.cell.2006.09.026.
- Goldberg DE: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley: New York. 1989.
- Li LH, Tang H, Wu ZB, Gong JL, Gruidl M, Zou J, Tockman M, Clark RA: Data mining techniques for cancer detection using serum proteomic profiling. Artif Intell Med. 2004, 32 (2): 71-83. 10.1016/j.artmed.2004.03.006.
- Heredia-Langner A, Cannon WR, Jarman KD, Jarman KH: Sequence optimization as an alternative to de novo analysis of tandem mass spectrometry data. Bioinformatics. 2004, 20 (14): 2296-2304. 10.1093/bioinformatics/bth242.
- Jeffries NO: Performance of a genetic algorithm for mass spectrometry proteomics. BMC Bioinformatics. 2004, 5: 180. 10.1186/1471-2105-5-180.
- Wilmarth PA, Riviere MA, Rustvold DL, Lauten JD, Madden TE, David LL: Two-dimensional liquid chromatography study of the human whole saliva proteome. J Proteome Res. 2004, 3 (5): 1017-1023. 10.1021/pr049911o.
- Jiang XG, Feng S, Tian RJ, Han GH, Jiang XN, Ye ML, Zou HF: Automation of nanoflow liquid chromatography-tandem mass spectrometry for proteome analysis by using a strong cation exchange trap column. Proteomics. 2007, 7 (4): 528-539. 10.1002/pmic.200600661.
- Qian WJ, Jacobs JM, Camp DG, Monroe ME, Moore RJ, Gritsenko MA, Calvano SE, Lowry SF, Xiao WZ, Moldawer LL, Davis RW, Tompkins RG, Smith RD: Comparative proteome analyses of human plasma following in vivo lipopolysaccharide administration using multidimensional separations coupled with tandem mass spectrometry. Proteomics. 2005, 5 (2): 572-584. 10.1002/pmic.200400942.
- Bodenmiller B, Mueller LN, Mueller M, Domon B, Aebersold R: Reproducible isolation of distinct, overlapping segments of the phosphoproteome. Nat Methods. 2007, 4 (3): 231-237. 10.1038/nmeth1005.
- Na SJ, Paek E: Quality assessment of tandem mass spectra based on cumulative intensity normalization. J Proteome Res. 2006, 5 (12): 3241-3248. 10.1021/pr0603248.
- Tao WA, Wollscheid B, O'Brien R, Eng JK, Li XJ, Bodenmiller B, Watts JD, Hood L, Aebersold R: Quantitative phosphoproteome analysis using a dendrimer conjugation chemistry and tandem mass spectrometry. Nat Methods. 2005, 2 (8): 591-598. 10.1038/nmeth776.
- Link AJ, Eng J, Schieltz DM, Carmack E, Mize GJ, Morris DR, Garvik BM, Yates JR: Direct analysis of protein complexes using mass spectrometry. Nat Biotechnol. 1999, 17 (7): 676-682. 10.1038/10890.
- DTA files. [http://bioanalysis.dicp.ac.cn/proteomics/software/SFOER.dta.rar]
- Krijgsveld J, Gauci S, Dormeyer W, Heck AJR: In-gel isoelectric focusing of peptides as a tool for improved protein identification. J Proteome Res. 2006, 5 (7): 1721-1730. 10.1021/pr0601180.
- Everley PA, Bakalarski CE, Elias JE, Waghorne CG, Beausoleil SA, Gerber SA, Faherty BK, Zetter BR, Gygi SP: Enhanced analysis of metastatic prostate cancer using stable isotopes and high mass accuracy instrumentation. J Proteome Res. 2006, 5 (5): 1224-1231. 10.1021/pr0504891.
- TPP project. [http://tools.proteomecenter.org/TPP.php]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.