Table 2 A comparison between different probabilistic protein inference algorithms.

From: Computational approaches to protein inference in shotgun proteomics

| Methods | ProteinProphet | MSBayesPro | Fido | MIPGEM |
| --- | --- | --- | --- | --- |
| Underlying graph structure | Bipartite graph with identified peptides and matching proteins¹ | Bayesian network with all peptides from proteins with at least one identified peptide | Bayesian network with identified peptides and matching proteins | k-partite graph with identified peptides, matching proteins and (optionally) matching gene models² |
| Inference algorithm | EM (Expectation Maximization)-like | 1) Exact³; 2) Memorizing-Gibbs sampling | 1) Exact³; 2) Pruning approximation | 1) Exact³; 2) Direct sampling |
| Input | Probabilities for peptides with user-defined cutoff for p (often p > 0.05 is used) | Likelihood ratios for peptides with p > 0.05 and peptide detectabilities | Likelihood ratios for peptides with p > 0.05 | Probabilities for peptides with user-defined cutoff for p (often p > 0.05 is used; 0.9 for best performance) |
| Output | 1) Protein probabilities; 2) Protein group probabilities; 3) NSP adjusted peptide probabilities | 1) MAP solution, protein abundances and probabilities; 2) Protein group probabilities; 3) Posterior peptide probabilities | 1) Protein probabilities; 2) Protein group probabilities | 1) Protein probabilities; 2) Gene model probabilities |
| Protein prior estimation | No protein priors | Direct frequency estimation based on protein posterior probabilities in one run of MSBayesPro | Grid search optimizing cross-validation performance through multiple runs of Fido with different priors | Grid search optimizing model likelihood through multiple runs of MIPGEM with different priors |
| Peptide probability adjustment by | NSP from a parent protein | Protein quantity adjusted peptide detectability | Two detectability-like parameters α, β | Treating peptide identifications as random variables |
| Protein grouping | Yes | No (indistinguishable proteins are resolved) | Yes | No (indistinguishable proteins are not resolved) |
| Peptide charge | Considered | Ignored | Considered | Considered |
| Novel aspects | 1) First probabilistic protein inference algorithm; 2) Efficient EM algorithm | 1) A Bayesian network; 2) Resolves indistinguishable proteins using unidentified peptides and peptide detectability; 3) Modified Gibbs sampling | 1) Using a noise model to remedy inaccurate peptide probabilities; 2) Pruning algorithm, efficient inference | Gene model probabilities⁴ |
| Availability | http://tools.proteomecenter.org | http://darwin.informatics.indiana.edu/yonli/ | http://noble.gs.washington.edu/proj/fido | - |

  1. For ProteinProphet, the underlying bipartite graph does not correspond to a Bayesian network, although it guides the EM-like algorithm through inference.
  2. MIPGEM uses a rule-based protein removal scheme to simplify the network structure.
  3. Exact computation is used only for small connected components.
  4. Gene-centric proteomics was proposed in [77], and implemented earlier in a deterministic way in [67].
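
Footnote 3 says that exact computation is used only for small connected components. The following minimal sketch illustrates why: exact inference by enumeration visits every presence/absence configuration of the proteins in a component, so its cost doubles with each added protein. It is not the implementation of any tool in the table. It assumes a noisy-OR style emission model with two detectability-like parameters `alpha` and `beta` (named after the α, β in the Fido column), a hypothetical uniform protein prior `gamma`, and a simplified peptide evidence term; the function name and input layout are likewise illustrative assumptions.

```python
from itertools import product


def exact_protein_posteriors(peptide_scores, edges, alpha=0.1, beta=0.01, gamma=0.5):
    """Exact protein posteriors for one small connected component.

    peptide_scores: dict peptide -> probability that the identification is correct
    edges:          dict peptide -> list of parent proteins that contain it
    alpha, beta:    assumed noisy-OR emission and noise parameters
    gamma:          assumed prior probability that any protein is present
    """
    proteins = sorted({p for parents in edges.values() for p in parents})
    mass_present = {p: 0.0 for p in proteins}
    total = 0.0

    # Enumerate all 2^n presence/absence states (feasible only for small n).
    for state in product([0, 1], repeat=len(proteins)):
        present = dict(zip(proteins, state))
        n_on = sum(state)
        # Independent prior over proteins.
        weight = gamma ** n_on * (1 - gamma) ** (len(proteins) - n_on)

        for pep, score in peptide_scores.items():
            n_parents = sum(present[p] for p in edges[pep])
            # Noisy-OR: probability the peptide is emitted given its present parents.
            q = 1 - (1 - beta) * (1 - alpha) ** n_parents
            # Simplified soft-evidence term (an assumption of this sketch).
            weight *= score * q + (1 - score) * (1 - q)

        total += weight
        for p in proteins:
            if present[p]:
                mass_present[p] += weight

    # Normalize to obtain marginal posterior probabilities.
    return {p: mass_present[p] / total for p in proteins}


# Toy component: protein A has a confident unique peptide, protein B does not.
scores = {"pep1": 0.95, "pep2": 0.90, "pep3": 0.10}
graph = {"pep1": ["A"], "pep2": ["A", "B"], "pep3": ["B"]}
posteriors = exact_protein_posteriors(scores, graph)
```

In this toy component, protein A is supported by a confident unique peptide and a shared peptide, while protein B has only the shared peptide and a low-scoring unique one, so A receives the higher posterior. For larger components the enumeration becomes intractable, which is where the sampling and pruning approximations listed in the table come in.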