Scientific workflow optimization for improved peptide and protein identification
© Holl et al. 2015
Received: 17 March 2015
Accepted: 24 August 2015
Published: 3 September 2015
Peptide-spectrum matching is a common step in most data processing workflows for mass spectrometry-based proteomics. Many algorithms and software packages, both free and commercial, have been developed to address this task. However, these algorithms typically require the user to select instrument- and sample-dependent parameters, such as mass measurement error tolerances and number of missed enzymatic cleavages. In order to select the best algorithm and parameter set for a particular dataset, in-depth knowledge about the data as well as the algorithms themselves is needed. Most researchers therefore tend to use default parameters, which are not necessarily optimal.
We have applied a new optimization framework for the Taverna scientific workflow management system (http://ms-utils.org/Taverna_Optimization.pdf) to find the best combination of parameters for a given scientific workflow to perform peptide-spectrum matching. The optimizations themselves are non-trivial, as demonstrated by several phenomena that can be observed when allowing for larger mass measurement errors in sequence database searches. On-the-fly parameter optimization embedded in scientific workflow management systems enables experts and non-experts alike to extract the maximum amount of information from the data. The same workflows could be used for exploring the parameter space and comparing algorithms, not only for peptide-spectrum matching, but also for other tasks, such as retention time prediction.
Using the optimization framework, we were able to learn about how the data were acquired as well as about the algorithms explored. We observed a phenomenon in which many ammonia-loss b-ion spectra were identified as peptides with N-terminal pyroglutamate and a large precursor mass measurement error. These insights could only be gained by extending the commonly used range of the mass measurement error tolerance parameters explored by the optimization framework.
In mass spectrometry based proteomics, whether bottom-up, top-down or middle-down, the matching of a single tandem mass spectrum, or a spectral tree, to a peptide is an integral part of most methods for identifying peptides and proteins. Existing methods fall into one of three broad categories: sequence database searches, spectral libraries [4–6] and de novo sequencing. Most recent methods can be applied to data from collision-induced dissociation, electron capture dissociation or other fragmentation techniques, individually or in combination [10, 11]. The identification may be based on MS2, MS3 or a combination of these. Several groups have also published efforts on combining multiple algorithms for peptide-spectrum matching, for instance the framework developed by Searle et al., the MSblender software from Kwon et al. or the FDRAnalysis algorithm of Wedge et al. Recently, in de Bruin et al. and Mohammed et al., we have shown how some of these algorithms can be integrated with other algorithms in scientific workflows. Scientific workflows enable researchers to concentrate on their research purpose rather than on computational challenges. However, all these algorithms use a number of user-defined input parameters, such as the specificity and fidelity of the enzymatic digestion, the sequences or library to search spectra against, the mass measurement uncertainty or error (MME), and score or probability thresholds in the assembly of peptide-spectrum matches into peptide or protein sets. Typically, the choice of algorithm and parameters is determined by the users’ experience and expert knowledge about the experiment, instrumentation and data quality. Previously, Piehowski et al. used a “systematic trial-and-error parameter selection” to optimize peptide identification using SEQUEST, showing significant improvement over default search parameters.
Here we describe the use of a framework for automated optimization of scientific workflows with two very different analysis tasks: peptide-spectrum matching and chromatographic retention time prediction. The optimization process can be reproduced by other researchers with the same or a different target and workflows, provided that all required applications, Taverna and the optimization plugin are installed as described at http://ms-utils.org/Taverna_Optimization.pdf.
Test samples and sequences
In this study, we used six representative datasets from two different organisms and three different types of mass analyzers. Three datasets were generated in our own lab and three were retrieved from the PRIDE repository. As a prokaryote with a small genome and a limited number of modified peptides, we used an E. coli whole-cell lysate, prepared as described by Mostovenko et al. This sample was analyzed both on a high-resolution TOF mass spectrometer and on an ion trap. As a eukaryote with a larger genome and frequent occurrence of modified peptides, we used a sample of human plasma isolated from blood drawn from a self-declared healthy individual after verbal informed consent according to local guidelines approved by the Medical Ethics Committee at the Leiden University Medical Center. The human plasma sample was analyzed on the same ion trap as the E. coli digest. The three additional datasets downloaded from PRIDE were an orbitrap dataset from a study of label-free absolute proteome quantification methods using E. coli (project PXD000283, dataset #29781), an orbitrap dataset from glioma-derived cancer stem cells (PXD000563, file “GSC11_24h_R1.raw”) and a TOF dataset of human induced pluripotent stem cells (PXD000071, “120118ry_201B7-32_2_2.wiff”). These datasets cover three common types of mass analyzers with varying resolving power and mass measurement accuracy, as well as organisms with small and large genomes. UniProt reference proteome data for E. coli (April 2013, 4,439 sequences and the same number of decoys) and H. sapiens (April 2013, 89,601 sequences including isoforms and the same number of decoys) were used for peptide identification with the X!Tandem sequence search engine.
Liquid chromatography – tandem mass spectrometry
The ion-trap-only datasets were generated as follows. Two μL of each tryptic digest were loaded and desalted on a 300 μm-i.d. 5-mm PepMap C18 trap column (Dionex, Sunnyvale, CA) and separated by reversed-phase liquid chromatography using a 15-cm, 300 μm-i.d. ChromXP C18 column (Eksigent, Dublin, CA) connected to a splitless NanoLC-Ultra 2D plus system (Eksigent) with a linear 90-min gradient from 4 to 33 % acetonitrile in 0.05 % formic acid and a constant flow rate of 4 μL/min. The LC system was coupled to an amaZon ETD ion trap (Bruker Daltonics, Bremen, Germany) via a CaptiveSpray™ ESI source. After each MS scan, up to 10 abundant multiply charged species in m/z 300–1300 were selected for MS/MS and excluded for one minute after having been selected twice for MS/MS. Each individual scan or tandem mass spectrum was saved to disk. The LC system was controlled by HyStar 3.2 and the ion trap by trapControl 7.0. To generate a hybrid TOF/ion trap dataset, the E. coli digest was loaded and desalted as above and separated on a 15-cm, 75 μm-i.d. PepMap C18 column in an Ultimate 3000 LC system (Thermo Scientific, Sunnyvale, CA) with a 180-min, 300 nL/min piece-wise linear gradient with the following breakpoints: 2 % B at 0 and 10 min, 5 % B at 25 min, 25 % B at 165 min, 30 % B at 175 min and 35 % B at 190 min, where B is 95 % acetonitrile and 0.1 % formic acid. The LC system was coupled simultaneously to a maXis high-resolution TOF (also Bruker) and an amaZon speed ion trap using a post-column flow splitter (RePlay™, Advion, Ithaca, NY), both with the CaptiveSpray™ ESI source.
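The piece-wise linear gradient used for the hybrid TOF/ion trap dataset amounts to linear interpolation between its breakpoints. The Python sketch below uses the breakpoints exactly as listed above and returns %B at a given time, clamping before the first and after the last breakpoint. The function name `percent_b` is our own; this illustrates what "piece-wise linear" means here and is not a reconstruction of the instrument method.

```python
from bisect import bisect_right

# (time in min, %B) breakpoints as listed in the text; B is 95 % acetonitrile
# with 0.1 % formic acid.
BREAKPOINTS = [(0.0, 2.0), (10.0, 2.0), (25.0, 5.0),
               (165.0, 25.0), (175.0, 30.0), (190.0, 35.0)]


def percent_b(t_min, breakpoints=BREAKPOINTS):
    """Return %B at time t_min by linear interpolation between breakpoints.

    Times before the first or after the last breakpoint are clamped to the
    first or last %B value, respectively.
    """
    times = [t for t, _ in breakpoints]
    if t_min <= times[0]:
        return breakpoints[0][1]
    if t_min >= times[-1]:
        return breakpoints[-1][1]
    i = bisect_right(times, t_min)  # index of the first breakpoint after t_min
    (t0, b0), (t1, b1) = breakpoints[i - 1], breakpoints[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
```

For example, 95 min falls on the 25 to 165 min segment, so `percent_b(95)` interpolates between 5 % and 25 % B and gives 15 % B.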
Optimization of the X!Tandem workflow
In X!Tandem and many other search engines, it is possible to define not only the number of allowed missed cleavage sites within a peptide, but also the fidelity of the enzyme. The latter determines whether zero or one of the peptide termini may deviate from the enzymatic specificity. In X!Tandem these settings are referred to as “full” (strict tryptic cleavage), meaning that both termini have to result from tryptic cleavage unless the peptide is at the protein N- or C-terminus, and “semi”, meaning that only one of the termini has to result from cleavage by trypsin. In software such as Mascot, this is not an independent parameter but is implemented as a virtual enzyme (“semiTrypsin”). To fully demonstrate the advantage of the optimization framework, we performed a second optimization on the E. coli ion trap data, starting from the MME tolerance optimum, with two additional parameters included in the optimization process: the number of missed cleavages (integer ∈ [0, 4]) and the enzymatic fidelity, defined by a Boolean representing “full” (default) or “semi-tryptic”.
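To make the two enzyme parameters concrete, the following sketch counts missed cleavages and classifies a peptide's enzymatic fidelity from its position in the protein sequence. It assumes the common trypsin rule (cleave after K or R unless the next residue is P); the helper names and the extra "non" category are hypothetical, and this is not X!Tandem's implementation.

```python
def _is_cleavage_site(prev_res, next_res):
    """Assumed trypsin rule: cleave after K or R, but not before P."""
    return prev_res in "KR" and next_res != "P"


def missed_cleavages(peptide):
    """Count internal K/R residues that would be cleavage sites but were not cut."""
    return sum(_is_cleavage_site(peptide[i], peptide[i + 1])
               for i in range(len(peptide) - 1))


def enzymatic_fidelity(protein, start, end):
    """Classify protein[start:end] as 'full', 'semi' or 'non' tryptic.

    A terminus counts as tryptic if it results from a trypsin cleavage
    or coincides with the protein N- or C-terminus.
    """
    n_ok = start == 0 or _is_cleavage_site(protein[start - 1], protein[start])
    c_ok = end == len(protein) or _is_cleavage_site(protein[end - 1], protein[end])
    if n_ok and c_ok:
        return "full"
    if n_ok or c_ok:
        return "semi"
    return "non"
```

For the hypothetical protein `MAGKPLLRSTEKAAR`, the peptide `STEKAAR` is fully tryptic with one missed cleavage (the K followed by A), `MAGKPLLR` has none (the K is followed by P), and `TEKAAR` is semi-tryptic. Under the definition above, a "semi" search would admit both "full" and "semi" peptides, whereas "full" admits only the former.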
There are many methods available for finding the optimum of a given function. Care should be taken with methods based on derivatives, which here would have to be numerical, as it is not feasible to find an analytical expression. For instance, when allowing isotope errors (or “# 13C” in Mascot), the derivative of the number of PSMs as a function of the allowed MME is discontinuous where the sum of the negative and positive error is 1 Da (Fig. 2b). In search engines having only one MME tolerance parameter, i.e. the same positive and negative maximum MME, this happens exactly at 0.5 Da maximum MME. This is easy to understand: the two or three searched mass windows merge into one, which then expands along two edges rather than four or six. There are also a number of discrete variables that can be modified and that influence the peptide-spectrum matching, for example isotope error, missed cleavages, minimum and maximum peptide length, and both fixed and variable post-translational modifications. These parameters are often binary (isotope error, included PTMs) but can sometimes take on any integer value in a small range (peptide length, missed cleavages, maximum number of variable PTMs per peptide). Additionally, the choice of search algorithm itself can be subject to optimization. Most database search engines have equivalent parameters, such as MME, missed cleavages, peptide length and considered PTMs. To optimize the parameters described above, we use the Taverna workflow optimization framework, which employs an evolutionary algorithm to optimize multiple continuous, discrete or binary parameters and find the combination that gives the best global performance according to a user-defined target, or fitness function.
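The kink in the derivative at 0.5 Da can be illustrated by computing the total width of the searched precursor mass windows. The sketch below assumes isotope-shifted windows centered at whole-dalton offsets (an approximation of the 13C spacing) and merges overlapping intervals. The number of PSMs is not simply proportional to this width, but to the extent that candidate matches grow with the searched width, this is where the slope changes. The function name `searched_width` is hypothetical.

```python
def searched_width(neg_tol, pos_tol, offsets=(0.0, 1.0)):
    """Total width (Da) of the union of precursor mass windows.

    Each window is [offset - neg_tol, offset + pos_tol]; `offsets` are the
    mass shifts searched when isotope errors are allowed, approximated here
    as whole daltons.
    """
    intervals = sorted((o - neg_tol, o + pos_tol) for o in offsets)
    total = 0.0
    cur_lo, cur_hi = intervals[0]
    for lo, hi in intervals[1:]:
        if lo <= cur_hi:            # overlapping windows merge into one
            cur_hi = max(cur_hi, hi)
        else:                       # disjoint window: close the current one
            total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
    return total + (cur_hi - cur_lo)
```

With a symmetric tolerance t and two windows, the width is 4t below t = 0.5 Da (1.0 Da at t = 0.25) and 1 + 2t above it (2.5 Da at t = 0.75), so the slope drops from 4 to 2. With `offsets=(0.0, 1.0, 2.0)` it drops from 6 to 2, matching the four or six expanding edges described above.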
As fitness function we use either the number of estimated correct PSMs from doubly charged precursors, as given by PeptideProphet using decoys and the non-parametric model, divided by the total number of tandem mass spectra, or the root-mean-square deviation of predicted peptide retention times, as these are robust and easily calculated metrics. We use PSMs because they are closer to the data and better represent discrete units of information in a bottom-up proteomics experiment (in quantitation by spectral counting, for example) than the perhaps biologically more relevant number of unique peptides or proteome coverage. However, there is no reason to assume that parameters optimized for the number of PSMs would not also be good parameters for unique peptides and proteins.
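Both fitness functions can be written in a few lines. In this sketch the estimated number of correct PSMs is taken to be the sum of the PeptideProphet probabilities of the doubly charged PSMs; that is our assumption about how the estimate is obtained. The function names are hypothetical, and parsing of PeptideProphet output is omitted.

```python
import math


def psm_fitness(psm_probabilities, n_spectra):
    """Estimated correct 2+ PSMs per acquired tandem mass spectrum (maximize).

    Assumes the estimated number of correct PSMs equals the sum of the
    PeptideProphet probabilities of the doubly charged PSMs.
    """
    if n_spectra <= 0:
        raise ValueError("n_spectra must be positive")
    return sum(psm_probabilities) / n_spectra


def rt_rmsd(predicted, observed):
    """Root-mean-square deviation of predicted retention times (minimize)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

If the optimizer maximizes its target, the RMSD would have to be negated (or otherwise inverted) before being used as fitness.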
The Taverna optimization framework used in this paper offers a generic application programming interface to extend Taverna with various types of optimization as well as optimization algorithms. For non-linear and partially discrete problems such as the algorithms and simulations used in scientific workflows, the fitness landscape may be rugged and undefined in many places. Properly dealing with these issues requires a robust and versatile method, such as metaheuristic optimization. The intrinsic parallelizability of such methods is a major advantage in large optimization problems such as those addressed here. Evolutionary algorithms are the preferred parallel metaheuristic, and the optimization plugin used in this paper was therefore implemented with evolutionary algorithms, specifically genetic algorithms (GA). Additional motivations for using GAs are their simplicity, proven performance, versatility and success in the life sciences. The plugin uses an existing genetic algorithm library, JGAP, and was adapted to workflow parameter optimization by coding each input parameter as a “gene” on a “chromosome”, where each chromosome contains a particular combination of input parameters. In each generation, one instance of the workflow is executed for each chromosome (parameter set). After a user-defined number of generations, or when another termination criterion is met, the framework presents the user with the optimal or best parameter set found. Additional statistics, which we also use in this paper, can be saved after the optimization phase. By using this generic optimization framework and the extended parameter optimization plugin, we obtain a better and more robust parameter set than by using defaults or refining parameters by trial and error. Additionally, no prior knowledge of optimization techniques is needed, as the framework and plugin manage all aspects of the optimization.
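The gene and chromosome encoding described above can be illustrated with a minimal genetic algorithm. This Python sketch is neither JGAP (a Java library) nor the plugin's code: tournament selection, uniform crossover, random-reset mutation and single-individual elitism are our own choices. The four hypothetical genes combine the MME tolerances (bounded by the 25 Da maximum mentioned in the results) with the two enzyme parameters, and `toy_fitness` is a made-up stand-in for a workflow run.

```python
import random

# Hypothetical search-parameter genes: (name, kind, lower bound, upper bound).
GENES = [
    ("mme_minus_da", "float", 0.0, 25.0),
    ("mme_plus_da", "float", 0.0, 25.0),
    ("missed_cleavages", "int", 0, 4),
    ("semi_tryptic", "bool", False, True),
]


def random_gene(kind, lo, hi, rng):
    """Draw a random value for one gene within its bounds."""
    if kind == "float":
        return rng.uniform(lo, hi)
    if kind == "int":
        return rng.randint(lo, hi)
    return rng.random() < 0.5  # Boolean gene


def random_chromosome(rng):
    return [random_gene(kind, lo, hi, rng) for _, kind, lo, hi in GENES]


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(chrom, rate, rng):
    """Random-reset mutation: redraw each gene with probability `rate`."""
    return [random_gene(kind, lo, hi, rng) if rng.random() < rate else gene
            for gene, (_, kind, lo, hi) in zip(chrom, GENES)]


def tournament(scored, rng, k=3):
    """Return the fittest chromosome among k randomly drawn candidates."""
    return max(rng.sample(scored, k), key=lambda s: s[0])[1]


def optimize(fitness, pop_size=20, generations=15, mutation_rate=0.1, seed=1):
    """Maximize `fitness` over GENES; returns (best fitness, best chromosome)."""
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        # In the workflow setting each evaluation would be one workflow run,
        # and all runs of a generation could execute in parallel.
        scored = [(fitness(c), c) for c in population]
        gen_best = max(scored, key=lambda s: s[0])
        if best is None or gen_best[0] > best[0]:
            best = gen_best
        children = [list(best[1])]  # elitism: carry the best set over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(scored, rng), tournament(scored, rng)
            children.append(mutate(crossover(p1, p2, rng), mutation_rate, rng))
        population = children
    return best


def toy_fitness(chrom):
    """Made-up stand-in for the PSM fitness, peaking at (0.5, 7.5, 1, False)."""
    minus, plus, mc, semi = chrom
    penalty = (minus - 0.5) ** 2 + (plus - 7.5) ** 2 + 0.5 * abs(mc - 1)
    return -penalty - (0.3 if semi else 0.0)
```

Calling `optimize(toy_fitness)` returns the best (fitness, parameter list) pair found. In a real setting, `fitness` would launch the search workflow and return the PSM fraction; elitism guarantees that the best parameter set found is never lost between generations.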
The framework enables researchers to easily optimize scientific workflows and thus to increase scientific output more efficiently than by trial and error or parameter sweeps. More information about the optimization framework, the optimization process and other examples can be found at http://ms-utils.org/Taverna_Optimization.pdf.
All computing-intensive executions (e.g. X!Tandem) performed during the optimizations in this work were conducted on a Grid set up with the UNICORE Grid software. The calculations were executed on a cluster within the Grid with 206 compute nodes, each consisting of two 2.66 GHz Intel Xeon 6-core processors and 96 GiB of main memory. For execution on the Grid, 4 CPUs were requested per job. The scheduling and execution of the jobs were handled by UNICORE, as described previously.
Optimization of retention time prediction
Results and discussion
Results of X!Tandem optimization
Results from the X!Tandem and PeptideProphet optimization of the six test datasets with information on number of unique peptides and the optimal MME
PSMs [M + 2H]2+
opt. PSMs [M + 2H]2+
14057 (+9.1 %)
9260 (+10.3 %)
1296 (+8.3 %)
13221 (+17.2 %)
11264 (+17.2 %)
4356 (+13.4 %)
18548 (+1.1 %)
11366 (+2.1 %)
7526 (+1.4 %)
8490 (+4.1 %)
5571 (+4.8 %)
577 (+9.3 %)
8619 (-0.4 %)
5833 (+0.5 %)
4300 (+12.1 %)
19551 (+12.3 %)
13772 (+12.5 %)
5164 (+20.5 %)
Results from the X!Tandem and PeptideProphet optimization of the six test datasets with information on execution times and the total time for the optimization
E. coli (ion trap)
E. coli (TOF)
E. coli (orbitrap)
H. sapiens (ion trap)
H. sapiens (TOF)
H. sapiens (orbitrap)
As we make the mass error tolerance window larger, we also retrieve more random, or false, peptide matches. The score of the best-matching random peptide increases monotonically as a function of MME. In PeptideProphet, this corresponds to a shift of the negative distribution to higher discriminant scores while the positive distribution remains unchanged. At some point, the cost of allowing better random matches will exceed the gain of additional PSMs. In addition, searching a larger window is more computationally expensive, scaling roughly linearly with the width of the error tolerance window. The X!Tandem run time at the optimum varied from 7 to 10 min for the E. coli datasets and from 9 min to 3 h for the human datasets (Table 2). Execution time and computational cost were not explicitly considered in the optimizations, and for datasets such as the human TOF data used here, the relatively marginal improvement of 2.5 % additional PSMs may not justify the 2.5 h of additional computational time, even though all computationally intensive components of these workflows have been parallelized and can be run on clouds, grids or supercomputers. As mentioned above, it is generally recommended to run these database searches in parallel. The total computational cost of an optimization is the sum of the run times of all workflow executions. Because the workflows within a generation are executed in parallel, however, the actual (wall-clock) runtime of an optimization process is the sum of the longest workflow execution within each generation. For example, if in generation 1 the longest workflow execution took 10 min and in the second generation 12 min, the total time for this optimization was 22 min, with 40 workflows having been executed in these two generations. This is possible because of the parallel execution mechanism implemented within the optimization framework in Taverna. In any case, the researcher should be aware of the total compute resources required for the execution of the workflows.
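The difference between total compute cost and wall-clock time can be checked with a small calculation. This sketch assumes that all runs of a generation start together and that the next generation waits for the slowest run; queuing delays are ignored, and the function name is hypothetical.

```python
def optimization_times(run_times_per_generation):
    """Return (total compute time, wall-clock time) in the input's time unit.

    Assumes all workflow runs of a generation execute in parallel and the
    next generation starts only when the slowest run has finished.
    """
    total = sum(sum(gen) for gen in run_times_per_generation)
    wall = sum(max(gen) for gen in run_times_per_generation)
    return total, wall
```

For two generations of 20 runs whose longest runs take 10 and 12 min, e.g. `optimization_times([[10] + [8] * 19, [12] + [9] * 19])` with invented shorter run times, the wall-clock time is 22 min as in the example above, while the compute time is 345 min.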
Table 2 also lists the runtimes of the workflow using the default MME tolerances (±0.5 Da), the maximum tolerances (±25 Da) and the optimum window, as well as the time required for each entire optimization, although the optimization should only be required once for each combination of sample type, instrument and method parameters. Again, the researcher should be aware that the actual times depend on the availability of computing resources and on queuing times.
Results from the second optimization, in which different numbers of missed cleavages and different enzyme fidelities were also investigated for the E. coli hybrid ion trap/TOF data
PSMs [M + 2H]2+
14057 (+1.2 %)
1292 (-1.9 %)
13888 (-1.2 %)
9282 (+0.2 %)
1271 (-1.9 %)
Results of retention time prediction optimization
The two examples shown here demonstrate that systematic exploration of parameters and algorithms for data analysis in mass spectrometry based proteomics can achieve at least two things. First and foremost, re-evaluating legacy parameter and model choices allows more peptides and proteins to be identified, which may allow more biologically relevant information to be extracted from the raw mass spectrometry data. The optimization should be done on a representative dataset, or a fraction of all the spectra, for instance sampled using random data decomposition. Secondly, exploring different combinations of parameters and algorithms leads to new insight into the data and the algorithms themselves, for example the ammonia-loss b-ion spectra identified as peptides with N-terminal pyroglutamate and the behavior of the retention time predictors for training sets of different sizes. These phenomena were not chosen for investigation, but were uncovered during the parameter optimization when the parameters were allowed to vary over a wide range. The optimum MME windows were found to be asymmetric, with a larger tolerance for positive MMEs. In one dataset, the optimal positive MME was found along the ridge (+17.62 Da) corresponding to the pyroglutamate/NH3-loss PSMs. The other optimal MMEs were found either just outside the actual mass measurement errors (-0.31 or -0.50 Da) or just outside the MME corresponding to the precursor isolation window, as illustrated in Fig. 4 (-5.57, +7.32 or +7.95 Da). Similar observations were independently reported by three different groups at a recent international conference [38–40], including data from a Q Exactive Orbitrap. The phenomenon is consistent with the distribution of MMEs observed when allowing very large MMEs in the X!Tandem search, with few PSMs with MMEs below -6 Da or between 8 and 15 Da.
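The position of the pyroglutamate ridge follows from simple mass arithmetic. Assuming MME is defined as measured minus theoretical precursor mass, matching an intact, unmodified precursor to the pyroglutamate form of the same sequence (lighter by NH3 for Gln or H2O for Glu) yields an apparent MME equal to the neutral loss, about +17.03 Da or +18.01 Da. The 1500 Da precursor mass below is hypothetical and cancels out.

```python
# Monoisotopic atomic masses (Da).
H, N, O = 1.00782503207, 14.0030740048, 15.99491461956

# Cyclization of an N-terminal residue to pyroglutamate releases
# NH3 (from Gln) or H2O (from Glu).
NEUTRAL_LOSS = {"NH3 (Gln)": N + 3 * H, "H2O (Glu)": 2 * H + O}


def apparent_mme(measured_precursor, candidate_mass):
    """Precursor mass measurement error, assumed as measured minus theoretical (Da)."""
    return measured_precursor - candidate_mass


# An unmodified precursor of (hypothetical) mass 1500.0 Da matched to the
# lighter pyroglutamate form of the same sequence:
for name, loss in NEUTRAL_LOSS.items():
    print(name, round(apparent_mme(1500.0, 1500.0 - loss), 3))
```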
An important point here is that the genetic algorithm searches a very large parameter space and would also find an optimum very close to zero, if one existed, for very accurate precursor mass measurements.
It is also important to be aware of a number of effects that can mislead optimization procedures such as the ones followed here. For some combinations of parameters, possibly very far from optimal, the PeptideProphet expectation-maximization (EM) algorithm may fail to find the globally best fit to the measured discriminant score distribution. This can sometimes be explained by a noisy discriminant score distribution, but sometimes the PeptideProphet EM algorithm gets stuck in a local optimum. We therefore settled on the target/decoy and non-parametric model of “2+” spectra in PeptideProphet, as this does not fail over the range of parameters investigated in this study, whereas it occasionally fails for “1+” and “3+” spectra, especially when using the parametric model. The optimum found should still be a very good parameter choice for slightly different targets, as roughly two thirds of the identifiable spectra are from doubly charged precursors. The workflow feedback in the form of parameter surfaces is helpful for visually validating the optimization and for catching obviously erroneous numbers returned from a failed EM (such as nearly 100 % of the spectra being identified). Over smaller ranges and for more or better data and algorithms, the parametric model may still function sufficiently well for use in optimization. A different optimization target, such as the number of unique identified peptides, may theoretically produce a smaller optimum MME tolerance, as many of the peptides identified in the larger windows, such as the co-eluting peptides in Fig. 4, would also have been selected for MS/MS and identified from different spectra in the same dataset. However, it is good to remember that random (false) matches tend to map to unique peptides, and that optimizing for the number of unique peptides or proteins is therefore biased toward spurious identifications.
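The visual check for failed EM fits could be complemented by an automatic plausibility guard in the fitness function. This is our suggestion rather than part of the workflows described here, and the 0.9 ceiling is an arbitrary, hypothetical threshold.

```python
def guarded_fitness(fraction_identified, max_plausible=0.9, worst=0.0):
    """Replace implausible PSM fractions by the worst possible fitness.

    A fraction near 1 (nearly every spectrum identified) most likely stems
    from a failed PeptideProphet EM fit rather than a good parameter set.
    """
    if not 0.0 <= fraction_identified <= max_plausible:
        return worst
    return fraction_identified
```

Returning the worst fitness makes a genetic algorithm discard such parameter sets rather than propagate them to later generations.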
The use of the Taverna workflow management system and the optimization framework produced only a small overhead in this experiment. Although scientific workflows are still new in the proteomics field, many researchers are already familiar with scientific workflow management systems like Taverna. As Taverna is implemented in Java, it can be executed as a Java application without installation and thus on practically any machine with a Java runtime. With the Taverna graphical interface, users can design their own workflows or reuse existing ones from a repository. Some workflows require access to, or installation of, applications that will be called by the workflow, and adaptation is sometimes needed in order to run the workflows on one’s own machine. This procedure depends on the workflow and cannot be described in general. References and further literature can be found at http://www.taverna.org.uk. The workflow optimization plugin is designed as a standard Taverna plugin and can be installed automatically by adding the download page to Taverna (as described in http://ms-utils.org/Taverna_Optimization.pdf). To enable the optimization process on a workflow, a graphical user interface is offered to select the sub-workflow, define termination criteria, and specify parameters along with their ranges and dependencies. No modification of the workflow is required for the optimization. After the optimization process, the result is presented to the user, who can store the entire optimization process including execution statistics and other information. For more detailed information on the optimization plugin, please refer to .
We used a new optimization framework to optimize a scientific workflow for peptide-spectrum matching and retention time prediction. The two steps were optimized separately in the Taverna Workflow Manager. With the optimization framework, users can optimize various parameters of any algorithm or tool within a scientific workflow. In our use case, we allowed a much larger MME window for X!Tandem than typically used. With this setup we were able to find new PSMs outside of the commonly searched MME window. These PSMs were primarily due to the unexpected matching of peptides with N-terminal pyroglutamate, formed from glutamine or glutamic acid, to measured spectra of unmodified peptides that had lost ammonia or water from the N-terminal glutamine or glutamic acid during fragmentation.
In conclusion, we suggest that an open mind, and perhaps a wider search window, is needed whenever looking at data from new types of experiments or new mass spectrometers. Scientific workflows, for example in Taverna, have many advantages for the analysis of large proteomics datasets, such as comprehension, shareability, provenance and interfacing with cloud or grid computing. In combination with the Taverna optimization framework, the workflow can then be optimized with respect to parameters as well as algorithms, on-the-fly and fully transparently. Additional search parameters and exclusion criteria, such as minimum number of peaks, minimum fragment m/z and minimum peptide length, may also deserve investigation, although short peptides tend to be less protein-specific and therefore of less value in practice.
Availability of supporting data
All software and workflows are freely available at http://unicore-dev.zam.kfa-juelich.de/taverna/plugins/ and from myExperiment.org. The installation and usage guide is available at http://ms-utils.org/Taverna_Optimization.pdf. The X!Tandem and PeptideProphet workflow is available at http://www.myexperiment.org/workflows/3693.html. The workflow for the retention time prediction optimization can be accessed at http://www.myexperiment.org/workflows/3691.html. The liquid chromatography-tandem mass spectrometry datasets produced in-house, including the hybrid ion trap/maXis data, are available from http://cpm.lumc.nl/export/public_datasets/.
The authors wish to express their gratitude to Hans Dalebout for technical assistance, Dr. Oleg Klychnikov and Dr. Paul Hensbergen for providing the E. coli amaZon/maXis raw data, Dr. Ekaterina Mostovenko for the E. coli and human plasma datasets and Prof. André M. Deelder for helpful comments on the manuscript.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Han X, Aslanian A, Yates 3rd JR. Mass spectrometry for proteomics. Curr Opin Chem Biol. 2008;12(5):483–90. doi:10.1016/j.cbpa.2008.07.024.View ArticlePubMedPubMed CentralGoogle Scholar
- Xu H, Wang L, Sallans L, Freitas MA. A hierarchical MS2/MS3 database search algorithm for automated analysis of phosphopeptide tandem mass spectra. Proteomics. 2009;9(7):1763–70. doi:10.1002/pmic.200800282.View ArticlePubMedPubMed CentralGoogle Scholar
- Hernandez P, Muller M, Appel RD. Automated protein identification by tandem mass spectrometry: issues and strategies. Mass Spectrom Rev. 2006;25(2):235–54. doi:10.1002/mas.20068.View ArticlePubMedGoogle Scholar
- Craig R, Cortens JC, Fenyo D, Beavis RC. Using annotated peptide mass spectrum libraries for protein identification. J Proteome Res. 2006;5(8):1843–9. doi:10.1021/pr0602085.View ArticlePubMedGoogle Scholar
- Frewen BE, Merrihew GE, Wu CC, Noble WS, MacCoss MJ. Analysis of peptide MS/MS spectra from large-scale proteomics experiments using spectrum libraries. Anal Chem. 2006;78(16):5678–84. doi:10.1021/ac060279n.View ArticlePubMedGoogle Scholar
- Lam H, Aebersold R. Spectral library searching for peptide identification via tandem MS. Methods Mol Biol. 2010;604:95–103. doi:10.1007/978-1-60761-444-9_7.View ArticlePubMedGoogle Scholar
- Seidler J, Zinn N, Boehm ME, Lehmann WD. De novo sequencing of peptides by MS/MS. Proteomics. 2010;10(4):634–49. doi:10.1002/pmic.200900459.View ArticlePubMedGoogle Scholar
- Wells JM, McLuckey SA. Collision-induced dissociation (CID) of peptides and proteins. Methods Enzymol. 2005;402:148–85. doi:10.1016/S0076-6879(05)02005-7.View ArticlePubMedGoogle Scholar
- Syka JE, Coon JJ, Schroeder MJ, Shabanowitz J, Hunt DF. Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry. Proc Natl Acad Sci U S A. 2004;101(26):9528–33. doi:10.1073/pnas.0402700101.View ArticlePubMedPubMed CentralGoogle Scholar
- Nielsen ML, Savitski MM, Zubarev RA. Improving protein identification using complementary fragmentation techniques in fourier transform mass spectrometry. Mol Cell Proteomics. 2005;4(6):835–45. doi:10.1074/mcp.T400022-MCP200.View ArticlePubMedGoogle Scholar
- Kim S, Mischerikow N, Bandeira N, Navarro JD, Wich L, Mohammed S, et al. The generating function of CID, ETD, and CID/ETD pairs of tandem mass spectra: applications to database search. Mol Cell Proteomics. 2010;9(12):2840–52. doi:10.1074/mcp.M110.003731.View ArticlePubMedPubMed CentralGoogle Scholar
- Searle BC, Turner M, Nesvizhskii AI. Improving sensitivity by probabilistically combining results from multiple MS/MS search methodologies. J Proteome Res. 2008;7(1):245–53. doi:10.1021/pr070540w.View ArticlePubMedGoogle Scholar
- Kwon T, Choi H, Vogel C, Nesvizhskii AI, Marcotte EM. MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines. J Proteome Res. 2011;10(7):2949–58. doi:10.1021/pr2002116.View ArticlePubMedPubMed CentralGoogle Scholar
- Wedge DC, Krishna R, Blackhurst P, Siepen JA, Jones AR, Hubbard SJ. FDRAnalysis: a tool for the integrated analysis of tandem mass spectrometry identification results from multiple search engines. J Proteome Res. 2011;10(4):2088–94. doi:10.1021/pr101157s.View ArticlePubMedPubMed CentralGoogle Scholar
- de Bruin JS, Deelder AM, Palmblad M. Scientific workflow management in proteomics. Mol Cell Proteomics. 2012;11(7):M111. doi:10.1074/mcp.M111.010595. 010595.View ArticlePubMedPubMed CentralGoogle Scholar
- Mohammed Y, Mostovenko E, Henneman AA, Marissen RJ, Deelder AM, Palmblad M. Cloud parallel processing of tandem mass spectrometry based proteomics data. J Proteome Res. 2012;11(10):5101–8. doi:10.1021/pr300561q.View ArticlePubMedGoogle Scholar
- Littauer R, Ram K, Ludäscher B, Michener W, Koskela R. Trends in Use of Scientific Workflows: Insights from a Public Repository and Recommendations for Best Practice. Int J Digit Curation. 2012;7(2):92–100. doi:10.2218/ijdc.v7i2.232.View ArticleGoogle Scholar
- Piehowski PD, Petyuk VA, Sandoval JD, Burnum KE, Kiebel GR, Monroe ME, et al. STEPS: a grid search methodology for optimized peptide identification filtering of MS/MS database search results. Proteomics. 2013;13(5):766–70. doi:10.1002/pmic.201200096.View ArticlePubMedPubMed CentralGoogle Scholar
- Holl S, Zimmermann O, Hofmann-Apitius M, editors. A new optimization phase for scientific workflow management systems. 2012 IEEE 8th International Conference on E-Science (e-Science). Washington DC: IEEE Computer Society; 2012 8-12 Oct. 2012.Google Scholar
- Vizcaino JA, Cote RG, Csordas A, Dianes JA, Fabregat A, Foster JM et al. The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res. 2013;41(Database issue). doi:10.1093/nar/gks1262.
- Mostovenko E, Deelder AM, Palmblad M. Protein expression dynamics during Escherichia coli glucose-lactose diauxie. BMC Microbiol. 2011;11:126. doi:10.1186/1471-2180-11-126.
- Arike L, Valgepea K, Peil L, Nahku R, Adamberg K, Vilu R. Comparison and applications of label-free absolute proteome quantification methods on Escherichia coli. J Proteomics. 2012;75(17):5437–48. doi:10.1016/j.jprot.2012.06.020.
- Lichti CF, Liu H, Shavkunov AS, Mostovenko E, Sulman EP, Ezhilarasan R, et al. Integrated chromosome 19 transcriptomic and proteomic data sets derived from glioma cancer stem-cell lines. J Proteome Res. 2014;13(1):191–9. doi:10.1021/pr400786s.
- Yamana R, Iwasaki M, Wakabayashi M, Nakagawa M, Yamanaka S, Ishihama Y. Rapid and deep profiling of human induced pluripotent stem cell proteome by one-shot NanoLC-MS/MS analysis with meter-scale monolithic silica columns. J Proteome Res. 2013;12(1):214–21. doi:10.1021/pr300837u.
- Craig R, Beavis RC. TANDEM: matching proteins with tandem mass spectra. Bioinformatics. 2004;20(9):1466–7. doi:10.1093/bioinformatics/bth092.
- Junker J, Bielow C, Bertsch A, Sturm M, Reinert K, Kohlbacher O. TOPPAS: a graphical workflow editor for the analysis of high-throughput proteomics data. J Proteome Res. 2012;11(7):3914–20. doi:10.1021/pr300187f.
- Lam H, Deutsch EW, Eddes JS, Eng JK, King N, Stein SE, et al. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics. 2007;7(5):655–67. doi:10.1002/pmic.200600625.
- Holl S, Zimmermann O, Hofmann-Apitius M. A UNICORE Plugin for HPC-Enabled Scientific Workflows in Taverna 2.2. 2011 IEEE World Congress on Services (SERVICES); 2011 July 4-9. Washington DC: IEEE Computer Society; 2011.
- Alba E, Luque G, Nesmachnow S. Parallel metaheuristics: recent advances and new trends. Int Trans Oper Res. 2013;20(1):1–48.
- Holland JH. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. Cambridge, MA: MIT Press; 1992.
- Niazi A, Leardi R. Genetic algorithms in chemometrics. J Chemometr. 2012;26(6):345–51.
- Meffert K. JGAP - Java Genetic Algorithms Package. 2013. http://jgap.sourceforge.net/.
- Holl S. Automated Optimization Methods for Scientific Workflows in e-Science Infrastructures. Forschungszentrum Jülich; 2014. http://juser.fz-juelich.de/record/153150.
- Streit A, Bala P, Beck-Ratzka A, et al. UNICORE 6 – Recent and Future Advancements. Report. Jülich: Forschungszentrum Jülich Zentralbibliothek, Verlag Jülich; 2013. http://juser.fz-juelich.de/record/136184.
- Palmblad M, Ramström M, Markides KE, Håkansson P, Bergquist J. Prediction of chromatographic retention and protein identification in liquid chromatography/mass spectrometry. Anal Chem. 2002;74(22):5826–30.
- Krokhin OV. Sequence-specific retention calculator. Algorithm for peptide retention prediction in ion-pair RP-HPLC: application to 300- and 100-angstrom pore size C18 sorbents. Anal Chem. 2006;78(22):7785–95. doi:10.1021/ac060777w.
- Petritis K, Kangas LJ, Yan B, Monroe ME, Strittmatter EF, Qian WJ, et al. Improved peptide elution time prediction for reversed-phase liquid chromatography-MS by incorporating peptide sequence information. Anal Chem. 2006;78(14):5026–39. doi:10.1021/ac060143p.
- Gibson RJ, Nepomuceno AI, Randall SM, Muthusamy N, Ghashghaei HT, Muddiman DC. Elucidation of Search Parameters for Q-Exactive to Maximize Protein Identifications at 1% False Discovery Rate Using Wild-Type and FoxJ1 Knock Out Mouse Brain Tissues. 61st ASMS Conference on Mass Spectrometry and Allied Topics; 2013 June 9-13. Minneapolis, MN: ASMS; 2013.
- Wilmarth PA, Rathje WJ, David LL. An unbiased comparison of peptide identification performance between SEQUEST, Mascot and X!Tandem. 61st ASMS Conference on Mass Spectrometry and Allied Topics; 2013 June 9-13. Minneapolis, MN: ASMS; 2013.
- Kim S, Slysz GW, Crowell KL, Payne SH, Anderson GA, Smith RD. IPA: an Informed Proteomics Analysis Tool for Improved Peptide Identifications. 61st ASMS Conference on Mass Spectrometry and Allied Topics; 2013 June 9-13. Minneapolis, MN: ASMS; 2013.
- Michalski A, Damoc E, Hauschild JP, Lange O, Wieghaus A, Makarov A, et al. Mass spectrometry-based proteomics using Q Exactive, a high-performance benchtop quadrupole Orbitrap mass spectrometer. Mol Cell Proteomics. 2011;10(9):M111.011015. doi:10.1074/mcp.M111.011015.
- Stoyanovich J, Taskar B, Davidson S. Exploring repositories of scientific workflows. Proceedings of the 1st International Workshop on Workflow Approaches to New Data-centric Science, vol. 1833405. Indianapolis, Indiana: ACM; 2010. p. 1–10.