Progressive search in tandem mass spectrometry

Abstract

Background

High-throughput proteomics has been accelerated by (tandem) mass spectrometry. However, the slow speed of mass spectra analysis makes it hard to keep analysis results up to date. A tandem mass spectrometry database search requires O(|S||D|) time, where S is the set of spectra and D is the set of peptides in a database. With typical values of |S| and |D|, a database search is quite time-consuming. Meanwhile, the database used for the search is usually updated every month, with 0.5–2% changes. Although the change in the database is usually very small, it may cause extensive changes in the overall analysis results because individual PSM scores such as deltaCn and E-value depend on the entire search results. Therefore, to keep the search results up to date, one needs to rerun the database search from scratch every time the database is updated, which is very inefficient.

Results

Thus, we present a very efficient method to keep the search results up-to-date where the results are the same as those achieved by the normal search from scratch. This method, called progressive search, runs in O(|S||ΔD|) time on average where ΔD is the difference between the old and the new databases. The experimental results show that the progressive search is up to 53.9 times faster for PSM update only and up to 16.5 times faster for both PSM and E-value update.

Conclusions

Progressive search is a novel approach to efficiently obtain analysis results for an updated database in tandem mass spectrometry. Compared to performing a normal search from scratch, progressive search achieves the same results much faster. Progressive search is freely available at: https://isa.hanyang.ac.kr/ProgSearch.html.

Background

Database search in tandem mass spectrometry, usually done with tools such as Sequest [1], Tide [2], Comet [3], Mascot [4], MaxQuant [5], MS-GF [6], and MSFragger [7], is quite time-consuming, especially when the number of spectra is large (for example, more than 10 million spectra [8, 9]) and/or the search space is wide, as in open search [7].

Meanwhile, protein databases used for search are updated frequently. For example, the most widely used database, UniProt [10], is updated monthly with 0.5 to 2% changes: newly identified protein sequences are inserted and some incorrect sequences are deleted. Although the change is very small, it may cause changes in the overall analysis results because each spectrum's score is calculated relative to the entire search results. Therefore, to keep the search results up to date, one needs to rerun the database search from scratch every time the database is updated, which is very inefficient.

Thus, we present a very efficient method to keep the search results up to date, where the results are the same as those achieved by the normal search from scratch. This method, called progressive search, reduces the computation so that it is much faster than the normal search from scratch. In this study, we applied progressive search to Comet, which is both a stand-alone open-source tandem mass spectrometry database search engine and a component of widely used proteomics pipelines such as the Trans-Proteomic Pipeline [11] and Crux [12]. Our experimental results in Figs. 1 and 2 show that progressive search is 16.5–53.9 times faster than the normal search when the database change is 0.16%, the number of tryptic termini is 1, and the number of missed cleavages is 2.

Fig. 1
Search time comparison between the normal search from scratch and the progressive search

Fig. 2
Search time comparison according to database update intervals for ntt1mc2

Implementation

Database separation

First, we compare the old database Dold and the new database Dnew to identify Dsrd, Ddel, and Dins, where Dsrd contains the proteins shared by both Dold and Dnew, Ddel contains the proteins stored only in Dold, and Dins contains the proteins stored only in Dnew. Let Rold, Rnew, and Rsrd denote the PSM results for Dold, Dnew, and Dsrd, respectively. Figure 3 shows the case in which Dold is the set of proteins {A, B, C, D, E} and Dnew is the set of proteins {B, C, D, E, F}. Thus, Dsrd is the set of proteins {B, C, D, E}, Ddel is {A}, and Dins is {F}.
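The separation can be sketched with plain set operations. This is only an illustration in Python, not the authors' implementation; the function name `separate_databases` is hypothetical, and proteins are represented by single-letter identifiers as in the Fig. 3 example (a real comparison would have to decide whether to key on accession, sequence, or both).

```python
def separate_databases(d_old, d_new):
    """Split two protein collections into shared, deleted, and inserted sets."""
    old, new = set(d_old), set(d_new)
    d_srd = old & new   # proteins in both databases
    d_del = old - new   # proteins only in the old database
    d_ins = new - old   # proteins only in the new database
    return d_srd, d_del, d_ins


# Example from the text: Dold = {A, B, C, D, E}, Dnew = {B, C, D, E, F}
d_srd, d_del, d_ins = separate_databases(
    ["A", "B", "C", "D", "E"],
    ["B", "C", "D", "E", "F"],
)
print(sorted(d_srd), sorted(d_del), sorted(d_ins))
```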

Fig. 3
Database comparison example

For the experimental results, we used the UniProtKB databases released from January to June 2020 (Fig. 4). On average, 0.07% and 0.67% of the proteins were deleted and inserted every month, respectively. Likewise, 0.09% and 0.70% of the amino acids were deleted and inserted every month on average, respectively.

Fig. 4
Differences between various Uniprot versions. We used different Uniprot database versions from January to June 2020 for database comparison. The databases were compared based on the number of proteins and amino acids

Workflow

Progressive search consists of four steps called “deletion”, “insertion”, “score calculation”, and “E-value calculation” (Fig. 5). It runs in O(|S||ΔD|) time on average, where S is the set of spectra, ΔD is the difference between the old and the new databases, and |X| denotes the number of elements in X.

  (1) Deletion. This is the process of obtaining Rsrd from Rold. Rsrd is the same as Rold except for the PSMs whose peptide sequences come only from Ddel. Those PSMs are deleted and replaced by PSMs obtained by searching Dsrd for the spectra in the deleted PSMs (Sdel). For example, the PSMs of scans 1, 3 and 6 are updated after deletion in Fig. 5.

  (2) Insertion. This is the process of obtaining Rnew from Rsrd. We search Dins for all the spectra to find PSMs. The PSMs found are then compared with the PSMs in Rsrd, and the PSMs with better scores are selected and stored in Rnew. For example, the PSMs of scans 2, 3 and 6 are updated after insertion in Fig. 5.

  (3) Score calculation. This is the process of calculating deltaCn values in Rnew. deltaCn is a score representing the difference between Xcorr values. Since the Xcorr values of Rnew are available from the previous steps, the deltaCn values of Rnew can be calculated in this step.

  (4) E-value calculation. This is the process of calculating E-values in Rnew. Note that the E-value of every PSM may become invalid even if only one PSM has changed. Since E-value calculation requires PSM information that the original Comet does not output, we built “Comet-E”, a modified version of Comet, to handle the E-value correction.

Fig. 5
Workflow overall

Detailed explanations are given in the following subsections: Deletion, Insertion, Score calculation, and E-value calculation.

Deletion

Algorithm description: The main purpose of deletion is to convert Rold into Rsrd. Each spectrum in Rold was identified by either Ddel or Dsrd (Fig. 6, Composition of database). Recall that Sdel denotes the set of spectra identified only by Ddel, and let Ssrd denote the set of spectra identified by Dsrd. While the PSMs for Ssrd remain as they are, the PSMs for Sdel should be replaced by the PSMs obtained by searching Dsrd for Sdel. In the example in Fig. 6, among the PSMs for scans 1–7, only the PSMs for scans 1 and 6 are identified only by Ddel (= {A}). Thus, Sdel consists of the spectra in scans 1 and 6. (Note that the PSM for scan 3 belongs to Ssrd because its peptide AARASLIEQ exists in both proteins A and C, and thus all we have to do is delete A from the protein list of scan 3.) We search Dsrd (= {B, C, D, E}) for Sdel, and the new results replace the old results of Sdel.
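The deletion step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: only the top-ranked PSM per scan is modelled, the PSM record layout and the callable `search_srd` are hypothetical, and the peptide names `PEP…` and most Xcorr values are made-up toy data (only AARASLIEQ, LGGLWSAV, and the Rsrd Xcorr values 2.03 and 0.83 for scans 3 and 6 come from the text).

```python
def deletion_step(r_old, d_del, search_srd):
    """Convert R_old into R_srd (top-ranked PSM per scan only).

    r_old      : dict scan -> {"peptide": str, "xcorr": float, "proteins": set}
    d_del      : set of protein ids found only in the old database
    search_srd : hypothetical callable(scans) -> dict scan -> PSM that
                 searches D_srd for the given scans
    """
    r_srd, s_del = {}, []
    for scan, psm in r_old.items():
        remaining = psm["proteins"] - d_del
        if remaining:
            # Peptide still occurs in a shared protein: keep the PSM and
            # only drop the deleted proteins from its protein list.
            r_srd[scan] = {**psm, "proteins": remaining}
        else:
            # Peptide came only from D_del: the spectrum must be re-searched.
            s_del.append(scan)
    r_srd.update(search_srd(s_del))
    return r_srd, s_del


# Toy R_old loosely following Fig. 6 (placeholder peptides and scores).
r_old = {
    1: {"peptide": "PEP1", "xcorr": 1.90, "proteins": {"A"}},
    2: {"peptide": "LGGLWSAV", "xcorr": 1.75, "proteins": {"B"}},
    3: {"peptide": "AARASLIEQ", "xcorr": 2.03, "proteins": {"A", "C"}},
    6: {"peptide": "PEP6", "xcorr": 1.20, "proteins": {"A"}},
}


def fake_search_srd(scans):
    """Stand-in for a real search of D_srd = {B, C, D, E}."""
    hits = {
        1: {"peptide": "PEP1B", "xcorr": 1.10, "proteins": {"D"}},
        6: {"peptide": "PEP6B", "xcorr": 0.83, "proteins": {"E"}},
    }
    return {scan: hits[scan] for scan in scans if scan in hits}


r_srd, s_del = deletion_step(r_old, {"A"}, fake_search_srd)
print(s_del)                  # scans that had to be re-searched
print(r_srd[3]["proteins"])   # protein A removed from the list of scan 3
```

Only the scans collected in `s_del` are passed to the search, which is what keeps the cost of this step small when Ddel is small.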

Fig. 6
Deletion example

Time complexity: Since only Dsrd is searched for Sdel, the time complexity is O(|Sdel|∙|Dsrd|). We show that O(|Sdel|∙|Dsrd|) reduces to O(|S|∙|Ddel|) on average, where S is the set of all spectra, S = Sdel ∪ Ssrd. If we assume that PSMs are selected randomly from Dold in general (this assumption is verified in Results), the ratio |Sdel|/|S| is similar to the ratio |Ddel|/|Dold|. Thus, |Sdel| is approximately |S|∙|Ddel|/|Dold|, and the time complexity can be expressed as O(|S|∙|Ddel|∙|Dsrd|/|Dold|). Furthermore, since |Dsrd|/|Dold| ≤ 1, the time complexity reduces to O(|S|∙|Ddel|) on average.
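The averaging argument can be written as a short chain of estimates. This is only a restatement under the same random-selection assumption, using LaTeX subscripts for the sets written as Sdel, Dsrd, Ddel, and Dold in the text:

$$|S_{del}| \approx |S| \cdot \frac{|D_{del}|}{|D_{old}|} \quad\Longrightarrow\quad |S_{del}| \cdot |D_{srd}| \approx |S| \cdot |D_{del}| \cdot \frac{|D_{srd}|}{|D_{old}|} \le |S| \cdot |D_{del}|$$

As a rough illustration, with the 1,121,149 spectra used in Results and the average monthly protein deletion rate of 0.07% reported for Fig. 4, this estimate suggests on the order of 1,121,149 × 0.0007 ≈ 785 spectra in Sdel; the actual count depends on the data.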

Insertion

The main purpose of insertion is to convert Rsrd into Rnew. Each spectrum in Rnew is identified by either Dins or Dsrd. First, we search Dins for the set S. Let Rins denote the search result. For each spectrum, we replace its PSM in Rsrd with its PSM in Rins if the Xcorr of the PSM in Rins is higher than that in Rsrd. In Fig. 7, we search Dins (= {F}) for all the spectra and get Rins. Since only the PSMs of scans 3 and 6 in Rins have higher Xcorr values (2.91 and 1.51) than those in Rsrd (2.03 and 0.83), Rnew is obtained by replacing the PSMs of scans 3 and 6 in Rsrd with those in Rins. (Note that the scan 2 result in Rins is the same as that in Rsrd because its peptide LGGLWSAV exists in both proteins B and F, and thus all we have to do is add F to the protein list of scan 2.) Since the main part of insertion is searching Dins for the set S, the time complexity is O(|S|∙|Dins|).
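The merge rule of the insertion step can be sketched as below. This is an illustrative Python sketch, not the authors' implementation: it handles only the top-ranked PSM per scan, the record layout is hypothetical, and the `PEP…` peptides, the Xcorr of scan 2, and all of scan 4 are made-up toy values. The Xcorr values for scans 3 and 6 and the peptides AARASLIEQ and LGGLWSAV follow the Fig. 7 example in the text.

```python
def insertion_step(r_srd, r_ins):
    """Merge R_srd with R_ins (search of D_ins) into R_new.

    Both arguments map scan -> {"peptide": str, "xcorr": float, "proteins": set}.
    """
    r_new = dict(r_srd)
    for scan, cand in r_ins.items():
        cur = r_new.get(scan)
        if cur is None or cand["xcorr"] > cur["xcorr"]:
            # The inserted proteins give a strictly better match.
            r_new[scan] = cand
        elif cand["peptide"] == cur["peptide"]:
            # Same peptide also occurs in an inserted protein:
            # keep the PSM and extend its protein list.
            r_new[scan] = {**cur, "proteins": cur["proteins"] | cand["proteins"]}
    return r_new


r_srd = {
    2: {"peptide": "LGGLWSAV", "xcorr": 1.75, "proteins": {"B"}},
    3: {"peptide": "AARASLIEQ", "xcorr": 2.03, "proteins": {"C"}},
    4: {"peptide": "PEP4", "xcorr": 2.40, "proteins": {"D"}},
    6: {"peptide": "PEP6B", "xcorr": 0.83, "proteins": {"E"}},
}
r_ins = {
    2: {"peptide": "LGGLWSAV", "xcorr": 1.75, "proteins": {"F"}},
    3: {"peptide": "PEP3F", "xcorr": 2.91, "proteins": {"F"}},
    4: {"peptide": "PEP4F", "xcorr": 1.00, "proteins": {"F"}},
    6: {"peptide": "PEP6F", "xcorr": 1.51, "proteins": {"F"}},
}

r_new = insertion_step(r_srd, r_ins)
for scan in sorted(r_new):
    print(scan, r_new[scan]["peptide"], r_new[scan]["xcorr"], sorted(r_new[scan]["proteins"]))
```

The loop touches every spectrum once, so the merge itself is linear in |S|; the dominant cost remains the search of Dins that produces `r_ins`.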

Fig. 7
Insertion example

Score calculation

After the deletion and insertion, all PSMs with their Xcorr scores have been updated for Dnew. Now, the deltaCn values which are defined as follows should be recalculated.

$$\text{deltaCn}(i) = 1 - \frac{\text{Xcorr}(i+1)}{\text{Xcorr}(i)}$$

where Xcorr(i) denotes the i-th largest PSM score for a spectrum. Thus, recalculating deltaCn takes O(|S|) (= O(|PSM|)) time in the worst case. In addition, there are two subtleties to consider when deltaCn is updated.

  (i) Increment of the parameter num_output_lines by 1

In order to calculate deltaCn(i), not only Xcorr(i) but also Xcorr(i + 1) is required. Since the parameter num_output_lines of Comet determines the number of Xcorr values in the output, num_output_lines should be n + 1 if the values deltaCn(i) for \(i\le n\) are to be calculated by progressive search (Comet-P or Comet-E). Even though Comet outputs only n lines, it always calculates the Xcorr values for PSMs of all ranks, and thus incrementing num_output_lines by 1 hardly affects the total running time.

  (ii) Xcorr precision refinement in the output

In Comet, the internal data type of Xcorr is double but the Xcorr values in the output of Comet are rounded to the fourth decimal place as shown in Table 1. Thus, the deltaCn(1) calculated by Comet is different from the deltaCn(1) calculated by Xcorr(1) and Xcorr(2) values from the output of Comet as explained in the legend of Table 1. Hence, the Xcorr values in the output of Comet-P/Comet-E are rounded to the seventh decimal place so that the deltaCn calculated by the output of Comet-P/Comet-E is the same as that calculated by Comet.
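Both subtleties can be illustrated with a few lines of Python. This is a sketch rather than Comet's C++ code: the function `delta_cn` is hypothetical, the two Xcorr values are made up, and positive Xcorr values are assumed so that the division is defined. It shows that n + 1 Xcorr values are needed for n deltaCn values, and that recomputing deltaCn from values rounded to four decimals drifts further from the full-precision result than recomputing from values rounded to seven decimals.

```python
def delta_cn(xcorrs):
    """deltaCn(i) = 1 - Xcorr(i+1) / Xcorr(i) for a descending list of Xcorr values.

    A list of n + 1 scores yields n deltaCn values (assumes positive scores).
    """
    return [1.0 - xcorrs[i + 1] / xcorrs[i] for i in range(len(xcorrs) - 1)]


# Made-up full-precision scores for the two best PSMs of one spectrum.
full = [1.23456789, 1.11111111]

exact = delta_cn(full)[0]                          # from internal doubles
d4 = delta_cn([round(x, 4) for x in full])[0]      # from 4-decimal output
d7 = delta_cn([round(x, 7) for x in full])[0]      # from 7-decimal output

print(f"exact     : {exact:.10f}")
print(f"4 decimals: {d4:.10f} (error {abs(d4 - exact):.2e})")
print(f"7 decimals: {d7:.10f} (error {abs(d7 - exact):.2e})")
```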

Table 1 Xcorr precision refinement

E-value calculation

The purpose of “E-value calculation” is to convert Rnew into Rnew-E-value. As an example, we explain how the E-values of Comet are calculated. We built “Comet-E”, a modified version of Comet, to handle the E-value correction. Comet-E has two features that the original Comet lacks. First, it can output the histogram of Xcorr values, which was only an intermediate data structure used to calculate E-values in Comet. Second, it can take a histogram of Xcorr values as input and calculate E-values based on that histogram. Let His(R) denote the histogram for a result set R. We calculate His(Rnew) as follows: First, we run Comet-E to acquire the histograms His(Rdel) and His(Rins). Then, His(Rnew) is calculated as “His(Rold) − His(Rdel) + His(Rins)”, where His(Rold) was already produced earlier by Comet-E. Finally, His(Rnew) is given as input to Comet-E, which recalculates the E-values and thereby converts Rnew into Rnew-E-value. Detailed explanations are given in the following subsections (i), (ii), and (iii). Subsection (i) explains the E-value calculation by Comet, and subsections (ii) and (iii) explain the two new features of Comet-E.

  (i) E-value calculation by Comet

Comet calculates the E-value for each spectrum based on the Xcorr values of all candidate peptides (Fig. 8). Comet needs at least 3,000 Xcorr values for each spectrum to calculate its E-value. If the number of Xcorr values is less than 3,000, Comet uses decoy peptides predefined in Comet. Then, Comet calculates the histogram of the Xcorr values for each spectrum. The histogram is used to calculate the E-value of each spectrum by the internal scoring function of Comet.

  (ii) Comet-E (output)

Fig. 8
E-value calculation of Comet

Comet-E can output the histogram of Xcorr values for each spectrum (Fig. 9). The histogram consists of Xcorr values for the sequences in the database only, excluding decoy sequences. Unlike Comet, Comet-E outputs the histogram table for every spectrum as a .txt file. Histograms are created with a bin width of 0.1 and have an average of 10 bin counts per spectrum. Thus, the histogram information (Xcorr counts for all bins) for each spectrum can be represented using only about 20 numbers.
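A sparse histogram of this kind might be built as in the following Python sketch. The binning rule (bin index = floor of Xcorr divided by the bin width) and the function name are assumptions made for illustration; the exact rule and file layout used by Comet-E are not specified here, and the Xcorr values are made up. Storing one (bin, count) pair per non-empty bin is one way to read the "about 20 numbers for about 10 bins" figure.

```python
import math
from collections import Counter

BIN_WIDTH = 0.1  # bin width stated in the text


def xcorr_histogram(xcorrs, bin_width=BIN_WIDTH):
    """Count Xcorr values per bin; only non-empty bins are stored."""
    # Assumed binning rule: bin index = floor(xcorr / bin_width).
    return Counter(math.floor(x / bin_width) for x in xcorrs)


# Made-up Xcorr values, chosen away from bin edges to avoid
# floating-point ambiguity at the boundaries.
hist = xcorr_histogram([0.05, 0.12, 0.18, 0.33, 1.27])

# Flatten to (bin, count) pairs: two numbers per non-empty bin.
flat = [value for pair in sorted(hist.items()) for value in pair]
print(dict(hist))
print(flat)
```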

  (iii) Comet-E (E-value recalculation)

Fig. 9
Histogram output by Comet-E

Given His(Rnew) as input, Comet-E can calculate the E-values of Rnew (Fig. 10). Note that His(Rnew) is calculated as “His(Rold) − His(Rdel) + His(Rins)”. This calculation is performed for each bin. If a bin has no entry in one of the histograms, its frequency there is set to 0. Note that His(Rold) was produced by Comet-E when Rold was generated, and His(Rdel) and His(Rins) are produced by Comet-E when Rdel and Rins are generated, respectively. The time complexity of E-value calculation is O(|S|) because it is independent of the size of the database difference and proportional only to the number of spectra.
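The per-bin update for a single spectrum can be sketched as follows. This is an illustrative Python version, not Comet-E's code; histograms are assumed to be dictionaries mapping a bin index to a count, the function name is hypothetical, and the counts are made-up toy numbers. Missing bins are treated as 0, as described above.

```python
def update_histogram(his_old, his_del, his_ins):
    """Return His(R_new) = His(R_old) - His(R_del) + His(R_ins), bin by bin."""
    his_new = {}
    for b in set(his_old) | set(his_del) | set(his_ins):
        # A bin absent from a histogram contributes a frequency of 0.
        count = his_old.get(b, 0) - his_del.get(b, 0) + his_ins.get(b, 0)
        if count < 0:
            raise ValueError(f"negative count in bin {b}: inconsistent histograms")
        if count:
            his_new[b] = count  # keep the result sparse
    return his_new


# Toy histograms for one spectrum (bin index -> number of Xcorr values).
his_old = {0: 5, 1: 3, 2: 1}
his_del = {1: 1, 2: 1}
his_ins = {1: 2, 4: 1}

his_new = update_histogram(his_old, his_del, his_ins)
print(sorted(his_new.items()))
```

Because each spectrum has only a handful of non-empty bins, this update costs a small constant amount of work per spectrum, which is consistent with the O(|S|) bound stated above.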

Fig. 10
E-value recalculation by Comet-E

Results

We measured and compared the running times of Comet, Comet-P (progressive Comet with PSM update only), and Comet-E (progressive Comet with both PSM and E-value update). The databases used were the SwissProt and TrEMBL human protein databases provided by UniProt. Tandem mass spectrometry (MS/MS) spectra for HEK293 cells [13] were used as input, and the total number of spectra was 1,121,149. We compared the three programs under different parameter settings: In subsection (i), we show the results when the difference between Dold and Dnew is fixed and the numbers of tryptic termini (ntt) and missed cleavages (mc) change. In subsection (ii), we show the results when the difference between Dold and Dnew changes and ntt and mc are fixed. The search results of Comet, Comet-P, and Comet-E are consistent at both the PSM and peptide levels (Fig. 11).

Fig. 11
Consistency between Comet and Progressive Search results at the PSM and peptide level. For comparison, Progressive Search used the results analyzed using databases updated from Uniprot 2020.01 to Uniprot 2020.02, Uniprot 2020.03, Uniprot 2020.04, Uniprot 2020.05, and Uniprot 2020.06 versions. Comparisons were made for Comet, Comet-E and Comet-P for ntt1mc2

All experiments were carried out on a Linux PC with an Intel(R) Xeon(R) octa-core CPU E5-2609 v3 @ 1.90 GHz and 36 GB of RAM. The Linux version is Ubuntu 12.04.5 LTS and the compiler is GNU C compiler 6.5.0. All experiments were run with a single thread.

  (i) Changing the numbers of tryptic termini and missed cleavages

Table 2A shows the running time results when Dold and Dnew are fixed to Uniprot 2020.01 and Uniprot 2020.02, respectively, and ntt changes from 0 to 1 and mc changes from 0 to 2. Note that the difference between Dold (Uniprot 2020.01) and Dnew (Uniprot 2020.02) is 0.16% (Fig. 4 #amino acid). Table 2A shows not only the overall running times of Comet, Comet-P, and Comet-E, but also breaks down the overall running time of Comet-E into the running times of the individual modules (database separation, deletion, insertion, and E-value calculation). Note that the running time of Comet-P is the sum of the running times of all individual modules except the E-value calculation. For example, consider the leftmost column, ntt2mc0. In this case, Comet, Comet-P, and Comet-E take 7846.9, 459.1, and 1900.1 s, respectively. The 459.1 s running time of Comet-P is the sum of 3.7 s (database separation), 244.1 s (deletion), and 211.3 s (insertion). The 1900.1 s running time of Comet-E is the sum of 459.1 s (Comet-P) and 1441.0 s (E-value calculation).

Table 2 Summary of the running times for various search parameter settings

Table 2B shows the statistics of the running time results in Table 2A. The running time ratio rows show the ratios of the individual running times to the running time of Comet. Consider the leftmost column, ntt2mc0, again. Since the running time of Comet-P is 459.1 s and that of the original Comet is 7846.9 s, the ratio is 459.1/7846.9 = 0.0585 = 5.85%. Accordingly, Comet-P is 17.09 (= (1/5.85)*100) times faster than Comet, which is shown just below 5.85% in the table. The speedup of Comet-P is between 17.09 (ntt2mc0) and 53.92 (ntt1mc2), and the speedup of Comet-E is between 4.13 (ntt2mc0) and 16.52 (ntt1mc2). Hence, the more nontryptic termini and missed cleavages there are, the bigger the speedup is.
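The arithmetic for the ntt2mc0 column can be reproduced with a few lines of Python. The timings are the values quoted in the text for Table 2A; the variable names are only illustrative.

```python
# Running times in seconds for the ntt2mc0 column, as quoted in the text.
comet = 7846.9
db_separation, deletion, insertion, e_value = 3.7, 244.1, 211.3, 1441.0

comet_p = db_separation + deletion + insertion   # PSM update only
comet_e = comet_p + e_value                      # PSM and E-value update

ratio_p = comet_p / comet        # fraction of Comet's running time
speedup_p = comet / comet_p      # how many times faster than Comet
speedup_e = comet / comet_e

print(f"Comet-P: {comet_p:.1f} s, ratio {ratio_p:.2%}, speedup {speedup_p:.2f}x")
print(f"Comet-E: {comet_e:.1f} s, speedup {speedup_e:.2f}x")
```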

Finally, it should be noted that the E-value calculation time does not change much as the number of nontryptic termini or missed cleavages changes. It is between 1261.4 and 1459.9 s, as shown in the last row of Table 2A. This may seem strange at first glance, but it is reasonable because the time complexity of E-value calculation is just O(|S|), which means it is independent of the size of the database difference.

  (ii) Changing database update interval

Table 3 shows the running times of Comet, Comet-P, and Comet-E and their statistics when ntt and mc are fixed to 1 and 2, respectively, and the database update interval changes from 1 to 5 months. In this experiment, Dold is fixed to Uniprot 2020.01 and Dnew ranges from Uniprot 2020.02 to Uniprot 2020.06. As the database update interval increases, the ratio |Ddel|/|Dnew| increases from 0.02% to 0.44% and the ratio |Dins|/|Dnew| increases from 0.14% to 3.48%, as shown in the last two rows of Table 3B. Recall that the time complexities of deletion and insertion are O(|S|∙|Ddel|) and O(|S|∙|Dins|), respectively. Thus, their running times are expected to increase as the database update interval increases. As expected, the measured running time of deletion (resp. insertion) increases from 344.3 to 598.3 s (resp. from 288.6 to 1443.9 s) as the database update interval increases, as shown in Table 3A. The running time of the E-value calculation, whose time complexity is O(|S|), is independent of the database update interval; it is between 1394.5 and 1496.7 s. Overall, the speedup of Comet-P is between 53.92 (1 month) and 17.39 (5 months), and the speedup of Comet-E is between 16.52 (1 month) and 10.23 (5 months), as shown in Table 3B.

Table 3 Summary of the running times for several database update intervals

Conclusions

Progressive search is a novel approach to efficiently obtain analysis results for an updated database in tandem mass spectrometry. Its running time is O(|S||ΔD|) on average, and thus, for intervals of up to 5 months, it is up to 53.9 times faster than the normal search from scratch for PSM update only (including the update of PSM scores such as Xcorr and deltaCn) and up to 16.5 times faster for both PSM and E-value update. We also found that progressive search is effective even for longer intervals. Comet-P and Comet-E are 2.5 and 4 times faster than normal search, respectively, even with an interval of 34 months (July 2019 and May 2022 databases) (data not shown). The PSMs and E-values achieved by progressive search are the same as those achieved by the normal search from scratch. In addition, we verified that repeated use of progressive search does not increase the differences in deltaCn values due to rounding. We compared the results of a search over a 3-month interval (between Jan. 2020 and Apr. 2020) with the results of 3 repeated searches over 1-month intervals (between Jan. 2020 and Feb. 2020, between Feb. 2020 and Mar. 2020, and between Mar. 2020 and Apr. 2020). The deltaCn values were the same in both results even though progressive search was applied multiple times. This study demonstrates the applicability of progressive search for efficient tandem mass spectrometry database search. The approach can be extended to a variety of public search tools in addition to Comet.

Availability and requirements

Project name: progressive search.

Project home page: https://isa.hanyang.ac.kr/ProgSearch.html

Operating system(s): Linux.

Programming language: Java, C++

Other requirements: JDK 1.8 or higher.

License: Apache License V2.0

Any restrictions to use by non-academics: as stipulated by Apache License V2.0

Availability of data and materials

Experiments were carried out with the August 2019 version of Comet, which can be obtained from the Comet website: http://comet-ms.sourceforge.net. The databases used in this study are publicly available on the UniProt website: https://www.uniprot.org. The databases used were the SwissProt and TrEMBL human protein databases provided by UniProt. We measured the performance of progressive search using tandem mass spectrometry (MS/MS) spectra for HEK293 cells [13]. The HEK293 24-fraction MS/MS dataset was used in the experiment, and the total number of spectra was 1,121,149.

Abbreviations

S: Set of spectra

D: Set of peptides in a database

Dnew: New database

Dold: Old database

Dsrd: Database which contains the proteins shared by both Dold and Dnew

Ddel: Database which contains the proteins stored only in Dold

Dins: Database which contains the proteins stored only in Dnew

Rnew: PSM results for Dnew

Rold: PSM results for Dold

Rsrd: PSM results for Dsrd

Rdel: PSM results for Ddel

Rins: PSM results for Dins

References

  1. Eng JK, McCormack AL, Yates JR. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom. 1994;5(11):976–89.

  2. Diament BJ, Noble WS. Faster SEQUEST searching for peptide identification from tandem mass spectra. J Proteome Res. 2011;10(9):3871–9.

  3. Eng JK, Jahan TA, Hoopmann MR. Comet: an open-source MS/MS sequence database search tool. Proteomics. 2013;13(1):22–4.

  4. Perkins DN, Pappin DJ, Creasy DM, Cottrell JS. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999;20(18):3551–67.

  5. Cox J, Mann M. MaxQuant enables high peptide identification rates, individualized ppb-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol. 2008;26(12):1367–72.

  6. Kim S, Gupta N, Pevzner PA. Spectral probabilities and generating functions of tandem mass spectra: a strike against decoy databases. J Proteome Res. 2008;7(8):3354–63.

  7. Kong AT, Leprevost FV, Avtonomov DM, Mellacheruvu D, Nesvizhskii AI. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry–based proteomics. Nat Methods. 2017;14(5):513–20.

  8. Frank AM, Bandeira N, Shen Z, Tanner S, Briggs SP, Smith RD, Pevzner PA. Clustering millions of tandem mass spectra. J Proteome Res. 2008;7(01):113–22.

  9. Griss J, Perez-Riverol Y, Lewis S, Tabb DL, Dianes JA, Del-Toro N, Rurik M, Walzer M, Kohlbacher O, Hermjakob H. Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets. Nat Methods. 2016;13(8):651–6.

  10. Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, et al. The Universal protein resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 2006;34:D187–91.

  11. Deutsch EW, Mendoza L, Shteynberg D, Farrah T, Lam H, Tasman N, Sun Z, Nilsson E, Pratt B, Prazen B. A guided tour of the trans-proteomic pipeline. Proteomics. 2010;10(6):1150–9.

  12. McIlwain S, Tamura K, Kertesz-Farkas A, Grant CE, Diament B, Frewen B, Howbert JJ, Hoopmann MR, Käll L, Eng JK. Crux: rapid open source protein tandem mass spectrometry analysis. J Proteome Res. 2014;13(10):4488–91.

  13. Chick JM, Kolippakkam D, Nusinow DP, Zhai B, Rad R, Huttlin EL, Gygi SP. A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides. Nat Biotechnol. 2015;33(7):743–9.

Acknowledgements

Not applicable

Funding

This work was supported by the National Research Foundation of Korea grant funded by the Korea government (Ministry of Science and ICT) (No. 2018R1A5A7059549 and No. 2021M3H9A2030520), and by the Korea Institute of Science and Technology Information (KISTI) and Korea Bio Data Station (K-BDS) with computing resources including technical support.

Author information

Contributions

YJ, KL, HK, and HP designed the study. YJ and KL performed computing experiments. All authors wrote, read and approved the final manuscript.

Corresponding author

Correspondence to Heejin Park.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Cite this article

Joh, Y., Lee, K., Kim, H. et al. Progressive search in tandem mass spectrometry. BMC Bioinformatics 24, 94 (2023). https://doi.org/10.1186/s12859-023-05222-2
