Table 1 Summary of notation and abbreviations used throughout this paper.

From: Computational approaches to protein inference in shotgun proteomics

Notation

Description

Set of all fragmentation spectra output by the mass spectrometer

Set of spectra identified for peptide j

s

A single fragmentation spectrum from the set of all fragmentation spectra

P_i or i

Protein i

p_j or j

Peptide j

p_ij

Peptide j derived from protein i; used to explicitly indicate the parent protein for peptide j

Protein database, a set of proteins used for peptide and protein identification

Peptide database, the set of all (tryptic) peptides derived from the protein database

Set of peptides derived from protein P_i

t_j

Indicator variable, set to 1 if peptide p_j is confidently identified

Set of peptides that are confidently identified

x_j

Indicator variable, set to 1 if peptide p_j is present in the sample

y_i

Indicator variable, set to 1 if protein P_i is present in the sample

x = (x_1, ..., x_j, ...)

Indicator vector representing all peptides in the peptide database

y = (y_1, ..., y_i, ...)

Indicator vector representing all proteins in the protein database

N(i)

Set of peptides mapped to protein P_i

N(j)

Set of proteins that contain peptide p_j

x_N(i)

Indicator vector representing the peptides in N(i)

Peptide identification probability, the probability that peptide j is present in the sample given the spectra identified for peptide j

P(x_j = 1 | s)

The probability that the PSM is correct when peptide j is the top-scoring match of spectrum s

Protein posterior probability, the probability that protein i is present in the sample given all spectra

d_ij(q)

Detectability of peptide p_ij at some specified quantity q; effective detectability

Detectability of peptide p_ij at standard quantity q_0; standard detectability

d_ij

Detectability of peptide p_ij; effective detectability

NSP_ij

The estimated number of (identified) sibling peptides of peptide p_ij, used by ProteinProphet to adjust the peptide identification probability

PSM

Peptide-spectrum match; when it is clear from the context, we use PSM to also refer to the top-scoring PSM per spectrum

FDR

False discovery rate; the fraction of incorrect peptide identifications in the set of confidently identified peptides, or the fraction of incorrect protein identifications in a given list output by a protein inference algorithm. FDR should be distinguished from the false positive rate (FPR), the fraction of all database peptides (proteins) not present in the sample that are nevertheless predicted to be present (at a particular threshold).
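
The difference between FDR and FPR is easiest to see on a small example. The following Python sketch is an illustration only and is not taken from the paper: the function name `fdr_and_fpr`, the protein identifiers, and the assumption that the truly present set is known (as in a benchmark with a defined mixture) are all hypothetical. It computes FDR relative to the reported list and FPR relative to the database entries that are absent from the sample.

```python
def fdr_and_fpr(database, present, reported):
    """Return (FDR, FPR) for a reported identification list.

    database: all candidate entities (peptides or proteins) in the database
    present:  entities truly present in the sample (assumed known, e.g. a benchmark)
    reported: entities the algorithm reports at some threshold (subset of database)
    """
    database, present, reported = set(database), set(present), set(reported)
    false_positives = reported - present   # reported but not in the sample
    absent = database - present            # database entries not in the sample
    # FDR: incorrect identifications as a fraction of the reported list.
    fdr = len(false_positives) / len(reported) if reported else 0.0
    # FPR: incorrect identifications as a fraction of all absent entries.
    fpr = len(false_positives) / len(absent) if absent else 0.0
    return fdr, fpr


# Toy example with hypothetical protein identifiers.
database = {"P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "P9", "P10"}
present = {"P1", "P2", "P3"}
reported = {"P1", "P2", "P4", "P5"}

fdr, fpr = fdr_and_fpr(database, present, reported)
print(f"FDR = {fdr:.3f}, FPR = {fpr:.3f}")  # FDR = 0.500, FPR = 0.286
```

Here two of the four reported proteins are incorrect, so FDR is 2/4, while the same two errors are spread over the seven database proteins absent from the sample, so FPR is 2/7. With realistic databases, where most entries are absent from the sample, FPR can be very small even when FDR is large.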