Table 1 Features used in the machine learning formulation

From: An improved machine learning protocol for the identification of correct Sequest search results

| Group | Name | Meaning | Origin |
|---|---|---|---|
| SEQUEST | XCorr | Cross-correlation score; the primary ranking score from the SEQUEST search | SEQUEST |
| | deltaMH | Difference between the mass of the parent ion and the mass of the identified peptide | SEQUEST |
| | deltCn | Difference between the XCorr of the highest-ranked peptide and that of the peptide in question | SEQUEST |
| | SP score | Preliminary score of the peptide in the search procedure | SEQUEST |
| | SP rank | Initial rank of the peptide based on the SP score | SEQUEST |
| | Ion fraction | Percentage of the peptide's theoretical fragment ions that could be matched to peaks in the spectrum | SEQUEST |
| Published | Number of tryptic termini | Number of tryptic cleavage sites at the peptide termini (NTT) | Calculated |
| | Peptide length | Residue count of the peptide | Calculated |
| | Summed Intensity | Sum of the peak intensities in the spectrum | Calculated |
| | Mobile proton factor (MPF) | Measure of proton mobility in the peptide | Calculated |
| | C-terminal Residue | Amino acid residue at the C-terminus (Arg = 1, Lys = 2, other = 3) | Calculated |
| | Mass-window peptides | Number of database peptides within a prespecified mass window around the parent ion mass | Calculated |
| | Proline count | Number of Pro residues in the peptide | Calculated |
| | Arginine count | Number of Arg residues in the peptide | Calculated |
| Novel | Intensity Mean | Mean of the peak intensities | Calculated |
| | Intensity Std. | Standard deviation of the peak intensities | Calculated |
| | Intensity bins | Distribution of the peak intensities in 20% bins | Calculated |
| | Protein Hit Count (PHC) | Probability score for observing x peptides from the parent protein | Calculated |
| | Potential Coverage Ratio | Potential sequence coverage | Calculated |
| | PTM percentage | Percentage of the possible PTMs found in the peptide | Calculated |

1. For each feature, we give a brief description and indicate whether it was obtained from the output of the SEQUEST algorithm or calculated from the identified peptide, the mass spectrum, or database statistics. The features are divided into three groups, SEQUEST, Published, and Novel, denoting features derived directly from the SEQUEST output, features used in previously published studies of the identification problem, and features introduced in this work, respectively.
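Several of the "Calculated" features are simple functions of the peptide sequence, its flanking residues, and the spectrum's peak list. The following is a minimal Python sketch of how some of them might be computed; it is not the code used in the study, and all function names are hypothetical. It assumes: trypsin cleaves after K/R but not before P, with '-' marking a protein terminus (counted as tryptic); intensity bins are 20% bands relative to the most intense peak; the standard deviation is the population form; and deltCn follows the usual SEQUEST convention of normalizing the XCorr difference by the top XCorr. MPF, PHC, potential coverage ratio, and PTM percentage are omitted because the table does not define them precisely enough to reproduce.

```python
import bisect
import statistics

# Illustrative sketch only; helper names are hypothetical.
TRYPTIC = set("KR")


def c_terminal_code(peptide: str) -> int:
    """Encode the C-terminal residue: Arg = 1, Lys = 2, other = 3."""
    last = peptide[-1]
    if last == "R":
        return 1
    if last == "K":
        return 2
    return 3


def num_tryptic_termini(peptide: str, prev_aa: str, next_aa: str) -> int:
    """Count tryptic termini (0, 1 or 2).

    Assumption: cleavage after K/R but not before P; '-' marks a protein
    terminus and is treated as tryptic.
    """
    n_term = prev_aa == "-" or (prev_aa in TRYPTIC and peptide[0] != "P")
    c_term = next_aa == "-" or (peptide[-1] in TRYPTIC and next_aa != "P")
    return int(n_term) + int(c_term)


def intensity_bins(intensities: list[float], n_bins: int = 5) -> list[float]:
    """Fraction of peaks in each 20% band of the maximum intensity.

    Assumption: bands are relative to the most intense peak; a peak at
    exactly 100% is placed in the top band.
    """
    top = max(intensities)
    counts = [0] * n_bins
    for x in intensities:
        idx = min(int(x / top * n_bins), n_bins - 1)
        counts[idx] += 1
    return [c / len(intensities) for c in counts]


def mass_window_count(sorted_db_masses: list[float], parent_mass: float, window: float) -> int:
    """Number of database peptide masses within +/- window of the parent mass."""
    lo = bisect.bisect_left(sorted_db_masses, parent_mass - window)
    hi = bisect.bisect_right(sorted_db_masses, parent_mass + window)
    return hi - lo


def delta_cn(top_xcorr: float, xcorr: float) -> float:
    """SEQUEST-style deltCn: XCorr difference normalized by the top XCorr."""
    return (top_xcorr - xcorr) / top_xcorr


def calculated_features(peptide: str, prev_aa: str, next_aa: str,
                        intensities: list[float]) -> dict:
    """Collect the sequence- and spectrum-derived features into one record."""
    return {
        "peptide_length": len(peptide),
        "c_terminal_residue": c_terminal_code(peptide),
        "ntt": num_tryptic_termini(peptide, prev_aa, next_aa),
        "proline_count": peptide.count("P"),
        "arginine_count": peptide.count("R"),
        "summed_intensity": sum(intensities),
        "intensity_mean": statistics.mean(intensities),
        "intensity_std": statistics.pstdev(intensities),
        "intensity_bins": intensity_bins(intensities),
    }


if __name__ == "__main__":
    print(calculated_features("LVNELTEFAKPR", "K", "A", [10.0, 50.0, 100.0, 30.0]))
    print(mass_window_count([998.5, 1000.2, 1000.9, 1003.0], 1000.5, 1.0))
    print(delta_cn(3.2, 2.4))
```

Keeping the database masses sorted lets the mass-window count use two binary searches instead of a linear scan, which matters when the feature is computed for every spectrum against a large database.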