Skip to main content

Spectra library assisted de novo peptide sequencing for HCD and ETD spectra pairs



De novo peptide sequencing via tandem mass spectrometry (MS/MS) has been developed rapidly in recent years. With the use of spectra pairs from the same peptide under different fragmentation modes, performance of de novo sequencing is greatly improved. Currently, with large amount of spectra sequenced everyday, spectra libraries containing tens of thousands of annotated experimental MS/MS spectra become available. These libraries provide information of the spectra properties, thus have the potential to be used with de novo sequencing to improve its performance.


In this study, an improved de novo sequencing method assisted with spectra library is proposed. It uses spectra libraries as training datasets and introduces significant scores of the features used in our previous de novo sequencing method for HCD and ETD spectra pairs. Two pairs of HCD and ETD spectral datasets were used to test the performance of the proposed method and our previous method. The results show that this proposed method achieves better sequencing accuracy with higher ranked correct sequences and less computational time.


This paper proposed an advanced de novo sequencing method for HCD and ETD spectra pair and used information from spectra libraries and significant improved previous similar methods.


Tandem mass spectrometry (MS/MS) is a dominant technique nowadays for peptide sequencing [1]. A typical MS/MS experiment usually includes the following steps: protein mixtures are first digested into suitably sized peptides, and then the peptides are ionized via an ionization process. After that, selected peptides (also named as precursor ions) are further broken into fragment ions using different fragmentation techniques, and their tandem mass spectra (MS/MS spectra) are output [2]. MS/MS spectra usually contain two kinds of information of each ion detected, the mass-to-charge (m/z) value and the intensity.

In MS/MS experiments, precursor ions are broken into various kinds of fragment ions, among which, the commonly observed ones are named a-, b-, c-, x-, y-, and z-ions according to the cleavage sites on the peptide backbones. Different fragmentation techniques used in MS/MS yield differing dominant types of fragment ions. Collision-induced dissociation (CID) and higher-energy collisional dissociation (HCD) yield b-ions and y-ions as dominating ions. Electron capture dissociation (ECD) and electron transfer dissociation (ETD) preferentially produce variants of c-ions and z-ions, and occasionally a-ions [35].

Different kinds of computational methods including database search, de novo sequencing, and spectra library search have been developed for peptide sequencing using various MS/MS data. In database search, theoretical peptide spectra are computed from an existing protein database and peptides are identified by matching the theoretical spectra to experimental spectra. Spectra library search is a relatively new method that uses pre-build spectra libraries instead of protein database as the reference for matching. Spectra libraries contain annotated experimental spectra. This kind of method is claimed to be superior in speed and sensitivity, compared to the database search [6]. The limitation of these two kinds of methods is obvious that they can only identify peptides that are included in protein database or spectra libraries. De novo sequencing, on the other hand, interprets spectra directly using the masses of amino acids [710]. No prior database nor library is needed. Therefore, this kind of method has the potential to identify peptides that are not included in protein database and spectra libraries. The development of de novo sequencing used to be limited by insufficient information from an MS/MS spectrum itself, especially when the spectrum quality is low. However, with the recent development of high mass-accuracy MS/MS, alternative fragmentation techniques, and the idea of using multiple spectra from the same peptide for sequencing [11, 12], de novo sequencing has shown promising developments[3, 1317]. Therefore, this study focuses on de novo peptide sequencing methods.

A recent popular way of peptide sequencing is to combine different kinds of methods properly in order to achieve superior performance, for example, tag based searching methods [18, 19] can be viewed as a combination of de novo sequencing and database search. This kind of methods usually first produce partial sequences using de novo sequencing, called tags, from an MS/MS spectrum, and then use these tags to search against a protein database. The use of tags can dramatically reduce the search space and time needed.

Nowadays, with large amount of MS/MS spectra produced and sequenced, spectra libraries are expending. Information extracted from these experimental spectra in libraries can be used to enhance de novo sequencing performance. Previously, we have produced a series of de novo sequencing methods for alternative spectra including HCD and ECD/ETD spectra, and for multiple spectra generated by the same peptide [911]. We believe that information extracted from spectra libraries could help with these existing de novo sequencing methods. In this study, we use spectra libraries as training datasets to improve our previously proposed method for HCD and ETD spectra pairs.

Our previously proposed method for HCD and ETD spectra pairs [11] is based on the widely used spectrum graph model with proper modifications. In this method, a pair of spectra are first merged into one spectrum (the detailed merging steps are introduced in the following section), and a new spectrum graph with multiple types of edges is built on the merged spectrum. Then, the method uses peptide tags to separate the whole sequencing into small regions, and integrates amino acid composition (AAC) information into the graph model. Partial candidate sequences inferred from the graph are assembled together to be final candidates, and a ranking scheme is applied at last to find the best match. Several spectrum-specific features are applied to the graph model for sequencing. Since spectra libraries consist of annotated experimental spectra, features extracted from them are expected to reflect properties of real MS/MS spectra, and have the potential to improve our previous method. In this study, we propose an improved de novo sequencing method with the use of spectra libraries.


Spectra merging improvement

In our previously proposed method, in order to merge a pair of HCD and ETD spectra to be one spectrum for sequencing, peaks from both spectra satisfying certain criteria are selected. Having spectra libraries, we can evaluate these criteria and assign significant scores to them. Therefore, a new dimension of information, the significant score of each selected peak, can be added into the sequencing method. That is to say, previously we just decide to select a peak i into the merged spectrum (denoted as S m ) or not; but now, each selected i has a score associated with it, denoted as s s i , indicating the confidence level that it is a real fragment ion rather than a noisy peak. This score can be used as additional information in the following sequencing and candidate ranking steps. Next, we introduce the peak selection criteria and significant score calculation in details.

Two major selection criteria used in the spectra merging step are amino acid mass difference and ion complementarity. For the amino acid mass difference, a peak v in an experimental spectrum S is selected if there are two peaks u and t in S, one of the masses is smaller than v and the other is larger than v, and both u and t have mass difference to v equal to one of the 20 amino acid masses. For the ion complementarity, ions u and v in an experimental spectrum are both selected if masses of u and v satisfying the complementary ion relationship. Since ions may loss small molecules and contain various charge values, these variations are considered during the selection.

In order to evaluate the selection criteria and assign significant scores, spectra libraries are used as training datasets to calculate accuracies of the criteria. To be specific, on each spectrum in a spectra library, denoted as S L k , we apply the above selection criteria and then check how many of the selected peaks are real fragment ions (accuracy of such selection). The average accuracy on all S L k in the library is used as the significant score of such selection criterion.

To describe the selection and score calculation clearly, we denote A as the set of 20 amino acids, and a i A as a certain amino acid. a i is also used to represent its residue mass. S e represents an ETD spectrum and S h represents a HCD spectrum. m loss is defined to be the mass of some small molecules or groups lost from fragment ions, which include H 2 O and N H 3. u +,v +, and t + are the m/z values of u,v, and t in charge state +1. θ is a given threshold; and m p is the parent peptide mass. Relationships for selection and score calculation are summarized in Table 1. In the table, \(N_{ion}^{select}\) and \(N_{ion}^{real}\) are the number of selected ions using the selection criteria and real fragment ions in all selected ones, respectively. \(N_{comp}^{select}\) and \(N_{comp}^{real}\) are the number of selected complementary ions using the selection criteria and real fragment ions in all selected ones, respectively. a i ,a j A, and σ can be 0 or m loss (considering the loss of small molecules of fragment ions).

Table 1 Relationships for selection and score calculation

In the above selection, u,v, and t in multiple charges up to n−1 are considered, where n is the precursor ion charge. In a spectrum from a charge n precursor ion, typically the highest charged ion is the precursor ion itself, and fragment ions are in lower charge states (from +1 to n−1). Since the purpose here is to select real fragment ions, ion charges up to n−1 is considered. Basically, n−1 assumptions of charge values for each ion are built during the calculation, and the m/z values in charge state +1 is used for selection.

Finally, we summarize all scores calculated from spectra libraries in Table 2. We denote the total spectra number in a library is L. If a peak v in an experimental HCD or ETD spectrum satisfies multiple selection criteria, its final score s s v is the sum of the all scores from the selection.

Table 2 Scores calculated using spectra library SL

Here, we give a simple example to show the spectra merging and score assignment. We use the same example spectra as the ones in [11] with addition of significant scores. Assume the m/z values of two experimental spectra are S c ={130,199,277,346} (represent a HCD spectrum) and S e ={132,182,234} (represent an ETD spectrum). The parent mass is m p =492. The charge states of S c and S e are +2 and +3, respectively. The lost small molecule is H 2 O, and m loss =18. Integer values are used for all masses here to simplify the calculation and focus on the method process.

In the above selection, ions in multiple charges up to n−1 are considered, where n is the precursor ion charge. Therefore, we build n−1 assumptions of each spectrum and convert all ions to charge state +1. For the two spectra in the example, three associated spectra are generated. A spectrum with subscripts i t o1(i=1,2) represents the spectrum with charge +1 m/z values of the ions when assuming all ions are in charge state i. Different fonts and underlining of values are explained in later context.

$$\begin{array}{@{}rcl@{}} &S^{c}_{1to1}=\{130, \underline{\mathbf{199}}, \underline{277}, 346\},\\ &S^{e}_{1to1}=\{\underline{132}, 182, 234\},\\ &S^{e}_{2to1}=\{263, \underline{363}, 467\}. \end{array} $$

We first deal with \(S^{c}_{1to1}\). From the calculations in Table 1, we get that values 130, 199, and 346 satisfy (199−130)−a S +18=0 and (346−199)−a E −18=0, where a S =87 and a E =129 are the masses of serine and glutamine, respectively. Then we infer that the ion having m/z value of 199 (in boldface above) is a charge +1 fragment ion having a molecular of water loss, and score \({ss}_{199} = S^{aa}_{SL-HCD}\), where \(S^{aa}_{SL-HCD}\) is the amino acid difference score calculated on a HCD spectra library. In addition, we get that values 199 and 277 (underlined above) satisfy 492+2m H −(199+277)−18=0. Then we infer that these two ions are complimentary ions in charge state +1. s s 199 is updated to be \({ss}_{199} = S^{aa}_{SL-HCD} + S^{comp}_{SL-HCD}\), and \({ss}_{277} = S^{comp}_{SL-HCD}\), where \(S^{comp}_{SL-HCD}\) is ion complementarity score calculated on a HCD spectra library.

We now deal with \(S^{e}_{1to1}\) and \(S^{e}_{2to1}\). Values 132 and 363 (underlined above) satisfy 492+3m H −(132+363)=0. Then we infer that these two ions are complimentary ions, and the ion having m/z value of 182 is in charge state +2 (the ion at the same position as ion 363 in \(S^{e}_{1to1}\)). Ion complementarity score calculated on a ETD spectra library, \(S^{comp}_{SL-ETD}\), is assigned to both ions.

At this point, no more ions can be found satisfying the relationships described in Table 1. Therefore, the final merged spectrum S m is \(S_{m}=\left \{\left (132, S^{comp}_{SL-ETD}\right),\right.\left.\! \left (363, S^{comp}_{SL-ETD}\right), \left (\!199, S^{aa}_{SL-HCD} +^{comp}_{SL-HCD}\right), \left (277, S^{comp}_{SL-HCD}\right)\!\right \}\).

De novo sequencing modification

In the sequencing part, we first extend the peptide tags and re-rank them. The previously used tags are partial sequences consisting of three amino acids. If two tags t i ,t j T (T is the tag set) have two successive amino acids overlap and m/z values associated with the two tags have overlap, then a new tag t ij consisting of four amino acids is generated and added into T. Let us say t i =T A G and t j =A G T where the overlapped amino acids are A G,t i is generated by peaks I 1,I 2,I 3,I 4, and t j is generated by peaks J 1,J 2,J 3,J 4. I x and J x are also used to represent their m/z values, where x=1,2,3,4. If |I x J x−1|≤θ, where x=2,3,4, then a new tag t ij =T A G T is generated, and Tt ij T. If there is another tag t p =G T A where t p T, and the m/z of the ions generating t ij and t p satisfying the above relationship, a new tag t ijp =T A G T A is generated, and Tt ijp T. This process continues until no more new tags can be generated. For each tag tT with length l t , it is generated by l t +1 peaks in an experimental spectrum. Typically, amino acids in the middle of a tag, for example, the AT in t ij , tend to be more reliable than the amino acids in the ends, for example,the two Ts in t ij . Since each peak has a score calculated from above subsection, the score of t, denoted as s t , is a sum of the l t +1 peaks with proper weights to all peaks. s t is calculated using Eq. 1. Here, s s l is the score of l th peak in tag t. (1+0.1×m i n{l,l t l}) is the weight assigned to the l th peak. With this calculation, peaks in the two ends have lower weights and peaks in the middle parts have higher weights.

$$ s_{t} = \frac{1}{l_{t}+1} \sum^{l_{t} + 1}_{l=1} {ss}_{l}\left(1+0.1 \times min\lbrace l, l_{t}-l\rbrace\right) $$

All the tags tT are then ranked according to s t , and Sel tags with highest ranking are selected for the following sequencing. The set of the selected tags is denoted as TS.

Having a tag t sT S, the graph model with multiple types of edges are applied to find candidate peptide sequences. Since each peak has a score, the algorithm searches from the highest scored peak to extend paths, and a threshold is used here to stop the searching. Here, when K paths are successfully found, the searching stops. Here K is a user defined threshold.

Candidate ranking

Each candidate peptide P cp is generated by finding a proper path in the graph model. Since each vertex on the path represents a peak in the merged spectrum, the score of P cp , denoted as c s cp , is defined as the peaks’ score sum of all the peaks on the path generating P cp . When all candidate peptides are generated, we rank them with their score P cp , and output highest C candidates and their scores. Here, C can be defined by users.

Results and discussion

In this section, we use two spectra libraries, one containing HCD spectra and the other containing ETD spectra, as training datasets to calculate significant scores introduced above. Two pairs of HCD and ETD spectral datasets are used to test the performance of the proposed de novo sequencing method. The comparison to our previous method and results analysis are given as well.

Spectra libraries and MS/MS data

In this study, two spectra libraries consisting of annotated HCD and ETD spectra respectively are used. The first library of HCD spectra is from The National Institute of Science and Technology (NIST) website ( NIST has built MS/MS spectra libraries for several model organisms and made them publicly available [20]. We use the human peptide spectral library (built date Nov 24, 2014) containing 183,140 spectra measured with Orbitrap-HCD. The other library is a peptide library of over 100,000 synthetic, unmodified peptides and their phosphorylated counterparts, and they were analysed by both HCD and ETD fragmentation of MS/MS [21]. Among them, the ETD spectra of unmodified peptides are used in this study. The annotated peptide associated with each spectrum in these libraries was used as the correct sequence of such spectrum.

Experimental MS/MS spectra used here are similar as the ones in our previous study [11]. Two pairs of HCD and ETD spectral datasets, SCX _HCD_decon and SCX _ETD_decon, plus SCX _HCD_no_decon and SCX _ETD_no_decon, are used here. These pairs of datasets are from the same research paper [22]. The latter dataset pair (labeled with “ _no_decon”) contains raw data without deconvolution of spectra while the other pair contains spectra with deconvolution [11]. The original datasets contain spectra analysed by CID, HCD, and ETD fragmentation. Each spectrum has a sequence associated with it. The HCD and ETD spectra pairs having the same peptide sequences were selected first, and those pairs that can be successfully sequenced using only single spectrum separately are filtered out. Methods used in this filtration are NovoHCD [9] and NovoGMET [10], respectively. The reason for this filtration is that the focus of de novo sequencing using multiple spectra is for those ones that can not be sequenced by using just one spectrum. The number of spectra, the charges of spectra, and the number of selected pairs of spectra for experiment are summarized in Table 3.

Table 3 Number of spectra and charges in each dataset used in the experiments


There are several parameters usesd in the proposed method, and the values applied in the experiments are listed in Table 4. θ, number of tags generated for each experimental spectrum, and number of output candiates are set according to our previous study and experiments [11]. The number of tags is chosen to be 10 because the tags ranked lower than the top 10 tags are most likely to be wrong tags according to our previous study [9].

Table 4 Parameters used in the experiments

Score calculation

When using spectra libraries to calculate significant scores of the selection criteria, we investigate the score variation on different peaks in a spectrum. The results show that for the ion complementarity score on HCD spectra, the peak pairs in the middle of a spectrum tend to generate lower significant scores than the pairs in the two ends of a spectrum. Therefore, for a spectrum S L k in NIST-HCD library, we divide the peaks on S L k into four parts evenly according to the highest m/z valued peak. Then, \(S^{comp}_{SL}\) calculated using peak pairs in the middle two parts and the end two parts, denoted as \(S^{compM}_{SL}\) and \(S^{compE}_{SL}\) respectively, are shown in Table 5.

Table 5 Significant score calculated using spectra libraries

From Table 5 one can see the scores of peak pairs on different positions on a spectrum are distinguishable, and it is necessary to use two scores to represent them. We then investigate the same score on SynthETD library. However, the score variation is slight on it (0.38 to 0.44). Therefore, we use just one \(S^{comp}_{SL}\) score for ETD spectra.

For the score \(S^{aa}_{SL}\), preliminary results show that this change is not significant as well (0.54 to 0.62 on NIST-HCD library and 0.61 to 0.68 on SynthETD library). In addition, considering that there will be 400 slightly different scores if we distinguish the 20 amino acids, we just use one score, \(S^{aa}_{SL}\), to describe the significance of the amino acids differences. The values of \(S^{aa}_{SL}\) calculated on the two spectra libraries are shown in Table 5.

De novo sequencing performance

We first investigate the full length sequencing accuracy of the proposed method and our previous method. Our previous method output the three highest ranked peptide candidates for each spectra pair, and if any one of them matches the correct sequence, we say that this pair of spectra are correctly sequenced. Here, the same criterion is applied to the proposed method. The accuracy comparison of the proposed method and previous method using two pair of HCD and ETD datasets is shown in Table 6. Results on the previous method are from the orignal research paper presented them [11].

Table 6 Full length peptide sequencing accuracy comparison on two HCD and ETD dataset pairs

One can see from the results that the proposed method has similar accuracy compared to the previous method. This indicates that with the use of longer peptide tags and stop criterion (threshold K), the proposed method maintains the performance without any drop of accuracy. The proposed method does has a slight accuracy increase on these two pairs of datasets. After further investigating the results, it shows that the increase is because of the new ranking score of candidate peptides. Correct sequences are ranked higher (within top three) using the new ranking scores for those newly sequenced spectra, compared to the previous method, which are ranked out of the top three.

We then further analyse the rankings of candidate peptides sequenced from the proposed method since that it is a major difference between the proposed method and the previous method. One situation in the previous method is that often, several top ranked ones have very similar ranking scores. (We omit the detials of the ranking scheme here to aviod reduandency, and details can be seen in [11]). The correct sequence may not always be the highest scored one (ranked as first), but second or third ranked ones who has similar ranking scores as the highest one. So the previous method outputs all 3 highest ranked candidates. In this study, we would like to improve the ranking scheme with the significant scores calculated from spectra libraries. In the following, in order to show the contribution of the new ranking scheme, we compare the accuracy differences of the following three cases: output only the highest ranked one, the top two ranked ones, and the top three ranked ones. Results of the previous and proposed method on two different dataset pairs are shown in Tables 7 and 8.

Table 7 Accuracy comparison on SCX _HCD_decon and SCX _ETD_decon dataset pair with different output
Table 8 Accuracy comparison on SCX _HCD_no_decon and SCX _ETD_no_decon dataset pair with different output

From these figures one can see that the new ranking scheme has better performance than the one used in the previous method. With this new approach, more of the highest ranked candidates are the correct sequences, with an increase up to 11% compared to the previous method, if only outputting the first ranked candidates.

Finally, the computational time of the proposed method and our previous method is compared in Table 9. Both algorithms were written using MATLAB (2010b) and run on a PC with a 3.07 GHz quad-core CPU and MS Windows 7 operating system. Since the proposed method uses longer tags and limits the number of paths in the graph model, it uses less computational time for calculation compared to the previous method. The time saving is about 25 and 40% on the two pair of HCD and ETD datasets.

Table 9 Computational time comparison on two HCD and ETD dataset pairs


In this paper, an improved de novo sequencing method assisted with spectra library for HCD and ETD spectra pairs is proposed. It is a development of our previous proposed method for the same problem [11]. The proposed method uses spectra libraries as training datasets and introduces significant scores to the spectra merging criteria of the previous method. In addition, the use of tags is improved; the original length-three tags (three amino acids long) are extended to be longer tags in this method.

Two spectra libraries, one of HCD and the other of ETD spectra, were used to generate signigicant scores. To investigate the performance of the proposed method, two pairs of HCD and ETD spectral datasets were used for test and compared with our previous method. When outputting top three ranked candidates, the proposed method has a slight increase in terms of sequencing accuracy compared to the previous method. But the accuracy differs significantly when outputting only top one ranked candidates. In the latter case, the proposed method achieved higher accuracy up to 11% increase, compared to the previous method. In addition, with longer peptide tags used, the proposed method uses less computational time than the previous method, with a time saving up to 25 and 40% on the two pair of experimental spectral datasets. To summarize the advantages of this proposed method, it achieves better de novo sequencing accuracy with higher ranked correct sequences and less computational time.

In future, we would like to evaluate the proposed method on more MS/MS datasets, and further study the spectra library to integrate more information to the de novo sequencing methods for enhanced performance.


  1. Yan Y, Kusalik AJ, Wu FX. Protein Pept Lett. 2015; 22(11):983–91.

    Article  CAS  PubMed  Google Scholar 

  2. Wysocki VH, Resing KA, Zhang Q, Cheng G. Mass spectrometry of peptides and proteins. Methods. 2005; 35(3):211–22.

    Article  CAS  PubMed  Google Scholar 

  3. He L, Ma B. ADEPTS: advance peptide de novo sequencing with a pair of tandem mass spectra. J Bioinforma Comput Biol. 2010; 8:981–94.

    Article  CAS  Google Scholar 

  4. Fälth M, Savitski MM, Nielsen ML, Kjeldsen F, Andren PE, Zubarev Ra. Analytical utility of small neutral losses from reduced species in electron capture dissociation studied using SwedECD database. Anal Chem. 2008; 80(21):8089–94.

    Article  PubMed  Google Scholar 

  5. Chalkley RJ, Medzihradszky KF, Lynn aJ, Baker PR, Burlingame aL. Statistical analysis of peptide electron transfer dissociation fragmentation mass spectrometry. Anal Chem. 2010; 82(2):579–84.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Lam H, Aebersold R. Building and searching tandem mass (ms/ms) spectral libraries for peptide identification in proteomics. Methods. 2011; 54(4):424–31. Advances in biological mass spectrometry and proteomics.

    Article  CAS  PubMed  Google Scholar 

  7. Dancik V, Addona TA, Clauser KR, Vath JE, Pevzner PA. De novo peptide sequencing via tandem mass spectrometry. J Comput Biol. 1999; 6(3-4):327–42.

    Article  CAS  PubMed  Google Scholar 

  8. Horn DM, Zubarev RA, McLafferty FW. Automated de novo sequencing of proteins by tandem high-resolution mass spectrometry. Proc Natl Acad Sci. 2000; 97(19):10313–7.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Yan Y, Kusalik AJ, Wu FX. NovoHCD: De novo peptide sequencing from HCD spectra. IEEE Trans NanoBioscience. 2014; 13(2):65–72.

    Article  PubMed  Google Scholar 

  10. Yan Y, Kusalik AJ, Wu FX. NovoExD: De novo peptide sequencing for ETD/ECD spectra. IEEE Trans Comput Biol Bioinforma. 2015; PP(99):1–1.

    Article  Google Scholar 

  11. Yan Y, Kusalik AJ, Wu FX. A framework of de novo peptide sequencing for multiple tandem mass spectra. IEEE Trans NanoBioscience. 2015; 14(4):478–84.

    Article  Google Scholar 

  12. Guthals A, Clauser KR, Frank AM, Bandeira N. Sequencing-grade de novo analysis of MS/MS triplets (CID/HCD/ETD) from overlapping peptides. J Proteome Res. 2013; 12(6):2846–57.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Lu B, Chen T. Algorithms for de novo peptide sequencing using tandem mass spectrometry. Drug Discov Today: BIOSILICO. 2004; 2(2):85–90.

    Article  CAS  Google Scholar 

  14. Ma B, Zhang K, Hendrie C, Liang C, Li M, Doherty-Kirby A, Lajoie G. PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun Mass Spectrom. 2003; 17(20):2337–42.

    Article  CAS  PubMed  Google Scholar 

  15. Frank A, Pevzner P. Pepnovo: A de novo peptide sequencing via probabilistic network modeling. Anal Chem. 2005; 77(4):964–73.

    Article  CAS  PubMed  Google Scholar 

  16. Savitski MM, Nielsen ML, Kjeldsen F, Zubarev RA. Proteomics-grade de novo sequencing approach. J Proteome Res. 2005; 4(6):2348–54.

    Article  CAS  PubMed  Google Scholar 

  17. Bertsch A, Leinenbach A, Pervukhin A, Lubeck M, Hartmer R, Baessmann C, Elnakady YA, Müller R, Böcker S, Huber CG, et al. De novo peptide sequencing by tandem ms using complementary cid and electron transfer dissociation. Electrophoresis. 2009; 30(21):3736–47.

    Article  CAS  PubMed  Google Scholar 

  18. Pan C, Park B, McDonald W, Carey P, Banfield J, VerBerkmoes N, Hettich R, Samatova N. A high-throughput de novo sequencing approach for shotgun proteomics using high-resolution tandem mass spectrometry. BMC Bioinforma. 2010; 11(1):118.

    Article  Google Scholar 

  19. Tabb DL, Ma ZQ, Martin DB, Ham A-JL, Chambers MC. DirecTag: Accurate sequence tags from peptide MS/MS through statistical scoring. J Proteome Res. 2008; 7(9):3838–46.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Ma B. Novor: Real-time peptide de novo sequencing software. J Am Soc Mass Spectrom. 2015; 26(11):1885–94.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Marx H, Lemeer S, Schliep JE, Matheron L, Mohammed S, Cox J, Mann M, Heck AJ, Kuster B. A large synthetic peptide and phosphopeptide reference library for mass spectrometry-based proteomics. Nat Biotechnol. 2013; 31(6):557–64.

    Article  CAS  PubMed  Google Scholar 

  22. Frese CK, Altelaar AFM, Hennrich ML, Nolting D, Zeller M, Griep-Raming J, Heck AJR, Mohammed S. Improved peptide identification by targeted fragmentation using CID, HCD and ETD on an LTQ-Orbitrap velos. J Proteome Res. 2011; 10(5):2377–88.

    Article  CAS  PubMed  Google Scholar 

Download references


Not applicable.


This article has been published as part of BMC Bioinformatics Volume 17 Supplement 17, 2016: Proceedings of the 27th International Conference on Genome Informatics: bioinformatics. The full contents of the supplement are available online at


This work and the publication cost of this paper are supported by a discovery research grant and a discovery accelerator supplements grant from Natural Sciences and Engineering Research Council of Canada (NSERC).

Availability of data and material

The experimental datasets analysed during the current study are included in the published articles in references [11]. The spectra libraries are from reference [21] and

Authors’ contributions

YY developed the proposed method, conducted the experiments and wrote the manuscript. KZ provided comments and suggestions on the method development, and reviewed the manuscript on revision. Both authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Not applicable.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Kaizhong Zhang.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver( applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Yan, Y., Zhang, K. Spectra library assisted de novo peptide sequencing for HCD and ETD spectra pairs. BMC Bioinformatics 17 (Suppl 17), 538 (2016).

Download citation

  • Published:

  • DOI: