Table 1 Summary of notation and abbreviations used throughout this paper.

From: Computational approaches to protein inference in shotgun proteomics

| Notation | Description |
|---|---|
|  | Set of all fragmentation spectra output by the mass spectrometer |
|  | Set of spectra identified for peptide $j$ |
| $s$ | A single fragmentation spectrum from the set of all spectra |
| $P_i$ or $i$ | Protein $i$ |
| $p_j$ or $j$ | Peptide $j$ |
| $p_{ij}$ | Peptide $j$ derived from protein $i$; used to make the parent protein of peptide $j$ explicit |
|  | Protein database, the set of proteins used for peptide and protein identification |
|  | Peptide database, the set of all (tryptic) peptides derived from the protein database |
|  | Set of peptides derived from protein $P_i$ |
| $t_j$ | Indicator variable, set to 1 if peptide $p_j$ is confidently identified |
|  | Set of peptides that are confidently identified |
| $x_j$ | Indicator variable, set to 1 if peptide $p_j$ is present in the sample |
| $y_i$ | Indicator variable, set to 1 if protein $P_i$ is present in the sample |
| $x = (x_1, \ldots, x_j, \ldots)$ | Indicator vector over all peptides in the peptide database |
| $y = (y_1, \ldots, y_i, \ldots)$ | Indicator vector over all proteins in the protein database |
| $N(i)$ | Set of peptides mapped to protein $P_i$ |
| $N(j)$ | Set of proteins that contain peptide $p_j$ |
| $x_{N(i)}$ | Indicator vector over the peptides in $N(i)$ |
|  | Peptide identification probability: the probability that peptide $j$ is present in the sample, given the spectra identified for peptide $j$ |
| $P(x_j = 1 \mid s)$ | Probability that the PSM is correct, given that peptide $j$ is the top-scoring match for spectrum $s$ |
|  | Protein posterior probability: the probability that protein $i$ is present in the sample, given all spectra |
| $d_{ij}(q)$ | Detectability of peptide $p_{ij}$ at a specified quantity $q$; effective detectability |
|  | Detectability of peptide $p_{ij}$ at standard quantity $q_0$; standard detectability |
| $d_{ij}$ | Detectability of peptide $p_{ij}$; effective detectability |
| $\mathrm{NSP}_{ij}$ | Estimated number of (identified) sibling peptides of peptide $p_{ij}$; used by ProteinProphet to adjust the peptide identification probability |
| PSM | Peptide-spectrum match; when clear from context, PSM also refers to the top-scoring PSM for each spectrum |
| FDR | False discovery rate: the fraction of incorrect peptide identifications among the confidently identified peptides, or the fraction of incorrect protein identifications in a given list output by a protein inference algorithm. FDR should be distinguished from the false positive rate (FPR): the fraction of database peptides (proteins) absent from the sample that are nevertheless predicted to be present (at a particular threshold). |
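The sets $N(i)$ and $N(j)$ define a bipartite graph between proteins and peptides, and the indicator vectors $x$ and $x_{N(i)}$ are read off that graph. The Python sketch below builds both maps from a hypothetical three-protein database. It assumes a simplified in-silico trypsin digestion (cleave after every K or R, ignoring the proline rule and missed cleavages, and dropping peptides shorter than two residues); the function names and sequences are illustrative only.

```python
import re
from collections import defaultdict


def digest(sequence, min_length=2):
    """Simplified tryptic digestion (assumption): cleave after every K or R.

    Ignores the proline rule and missed cleavages; drops pieces shorter than min_length.
    """
    pieces = re.split(r"(?<=[KR])", sequence)  # zero-width split after K/R (Python 3.7+)
    return {p for p in pieces if len(p) >= min_length}


def build_graph(proteins):
    """Return N(i) (protein -> set of peptides) and N(j) (peptide -> set of proteins)."""
    n_protein = {pid: digest(seq) for pid, seq in proteins.items()}
    n_peptide = defaultdict(set)
    for pid, peptides in n_protein.items():
        for pep in peptides:
            n_peptide[pep].add(pid)
    return n_protein, dict(n_peptide)


def indicator(items, present):
    """Indicator vector over a fixed ordering of items: 1 if the item is present, else 0."""
    return [1 if item in present else 0 for item in items]


# Hypothetical toy protein database.
proteins = {"P1": "MKAGRLLK", "P2": "AGRTTKW", "P3": "LLKPPR"}
n_protein, n_peptide = build_graph(proteins)

peptide_db = sorted(n_peptide)      # peptide database in a fixed order
present = {"AGR", "LLK"}            # hypothetical peptides present in the sample
x = indicator(peptide_db, present)  # x over the whole peptide database
x_n_p2 = indicator(sorted(n_protein["P2"]), present)  # x_N(i) for protein P2

print(peptide_db, x, x_n_p2)  # ['AGR', 'LLK', 'MK', 'PPR', 'TTK'] [1, 1, 0, 0, 0] [1, 0]
```

Here AGR and LLK each map to two proteins ($|N(j)| > 1$); such shared peptides are what make the presence of an individual protein ambiguous.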
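The table separates PSM-level probabilities $P(x_j = 1 \mid s)$ from peptide identification probabilities and protein posteriors, but the rule for combining them is left to individual methods. A minimal sketch, assuming (i) a peptide's identification probability is its best PSM probability and (ii) peptides provide independent evidence, with a protein treated as present if at least one of its peptides is correctly identified, gives the naive posterior $1 - \prod_{j \in N(i)} (1 - p_j)$. Both rules are illustrative assumptions; they ignore shared peptides and the NSP adjustment, and the probabilities are hypothetical.

```python
from math import prod


def peptide_probability(psm_probs):
    """Assumed rule: P(x_j = 1 | S_j) is the highest PSM probability among the spectra in S_j."""
    return max(psm_probs)


def protein_probability(peptide_probs):
    """Naive posterior for protein i: 1 - prod over j in N(i) of (1 - p_j).

    Assumes the peptides of N(i) are independent pieces of evidence.
    """
    return 1.0 - prod(1.0 - p for p in peptide_probs)


# Hypothetical PSM probabilities P(x_j = 1 | s) for the spectra identified for each peptide.
psm_probs = {"AGR": [0.6, 0.9], "TTK": [0.5]}
peptide_probs = {j: peptide_probability(ps) for j, ps in psm_probs.items()}

# Hypothetical protein P2 with N(P2) = {AGR, TTK}.
p_protein = protein_probability(peptide_probs[j] for j in ("AGR", "TTK"))
print(round(p_protein, 3))  # 0.95
```

Taking the maximum over spectra, rather than a noisy-OR across them, avoids treating repeated spectra of the same peptide as independent evidence.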
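The quantity $\mathrm{NSP}_{ij}$ can be illustrated with a short sketch. It assumes the estimate used in ProteinProphet, in which the expected number of siblings of peptide $p_{ij}$ is the sum of the identification probabilities of the other peptides of protein $P_i$. The subsequent adjustment of peptide probabilities, which ProteinProphet learns from the data, is not shown; the function name and probabilities are hypothetical.

```python
def estimated_sibling_counts(protein_peptides, peptide_probs):
    """NSP_ij for each peptide j of one protein i (assumed definition):
    the sum of identification probabilities of the protein's other peptides."""
    peptides = sorted(protein_peptides)  # fixed order so floating-point sums are reproducible
    return {
        j: sum(peptide_probs[k] for k in peptides if k != j)
        for j in peptides
    }


# Hypothetical identification probabilities for the peptides of protein P1.
probs = {"MK": 0.2, "AGR": 0.9, "LLK": 0.7}
nsp_p1 = estimated_sibling_counts({"MK", "AGR", "LLK"}, probs)
print({j: round(v, 2) for j, v in nsp_p1.items()})  # {'AGR': 0.9, 'LLK': 1.1, 'MK': 1.6}
```

The weakly identified peptide MK has the largest NSP because its siblings are confidently identified, which is the kind of support the adjustment is meant to capture.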
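The FDR and FPR defined in the table differ in their denominators: FDR divides the false positives by the number of reported identifications, while FPR divides them by the number of database entries absent from the sample. A small sketch on a hypothetical 1,000-protein database makes the difference concrete; it assumes the true composition of the sample is known, as in a benchmark mixture, and the function name is illustrative.

```python
def fdr_and_fpr(predicted, present, database):
    """Return (FDR, FPR) for a list of predicted items, given the items truly present."""
    predicted, present, database = set(predicted), set(present), set(database)
    false_positives = predicted - present
    absent = database - present
    # FDR: fraction of the reported list that is wrong (defined as 0 for an empty list).
    fdr = len(false_positives) / len(predicted) if predicted else 0.0
    # FPR: fraction of absent database entries that were nevertheless predicted present.
    fpr = len(false_positives) / len(absent) if absent else 0.0
    return fdr, fpr


# Hypothetical benchmark: 1,000 database proteins, the first 100 present in the sample.
database = [f"P{k}" for k in range(1000)]
present = database[:100]
predicted = database[60:110]  # 40 true hits (P60..P99) and 10 false ones (P100..P109)

fdr, fpr = fdr_and_fpr(predicted, present, database)
print(f"FDR = {fdr:.3f}, FPR = {fpr:.4f}")  # FDR = 0.200, FPR = 0.0111
```

The same ten false positives yield a 20% FDR but only about a 1% FPR, because the many absent database entries inflate the FPR denominator.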