
Table 2 A comparison between different probabilistic protein inference algorithms.

From: Computational approaches to protein inference in shotgun proteomics

| Methods | ProteinProphet | MSBayesPro | Fido | MIPGEM |
| --- | --- | --- | --- | --- |
| Underlying graph structure | Bipartite graph with identified peptides and matching proteins¹ | Bayesian network with all peptides from proteins with at least one identified peptide | Bayesian network with identified peptides and matching proteins | k-partite graph with identified peptides, matching proteins and (optionally) matching gene models² |
| Inference algorithm | EM-like (Expectation Maximization) | 1) Exact³; 2) Memorizing-Gibbs sampling | 1) Exact³; 2) Pruning approximation | 1) Exact³; 2) Direct sampling |
| Input | Probabilities for peptides with user-defined cutoff for p (often p > 0.05 is used) | Likelihood ratios for peptides with p > 0.05 and peptide detectabilities | Likelihood ratios for peptides with p > 0.05 | Probabilities for peptides with user-defined cutoff for p (often p > 0.05 is used; 0.9 for best performance) |
| Output | 1) Protein probabilities; 2) Protein group probabilities; 3) NSP adjusted peptide probabilities | 1) MAP solution, protein abundances and probabilities; 2) Protein group probabilities; 3) Posterior peptide probabilities | 1) Protein probabilities; 2) Protein group probabilities | 1) Protein probabilities; 2) Gene model probabilities |
| Protein prior estimation | No protein priors | Direct frequency estimation based on protein posterior probabilities in one run of MSBayesPro | Grid search optimizing cross-validation performance through multi-runs of Fido with different priors | Grid search optimizing model likelihood through multi-runs of the MIPGEM with different priors |
| Peptide probability adjustment | By NSP from a parent protein | Protein quantity adjusted peptide detectability | Two detectability-like parameters α, β | Treating peptide identifications as random variables |
| Protein grouping | Yes | No (indistinguishable proteins are resolved) | Yes | No (indistinguishable proteins are not resolved) |
| Peptide charge | Considered | Ignored | Considered | Considered |
| Novel aspects | 1) First probabilistic protein inference algorithm; 2) Efficient EM algorithm | 1) A Bayesian network; 2) Resolves indistinguishable proteins using unidentified peptides and peptide detectability; 3) Modified Gibbs sampling | 1) Using a noise model to remedy inaccurate peptide probabilities; 2) Pruning algorithm, efficient inference | Gene model probabilities⁴ |
| Availability | http://tools.proteomecenter.org | http://darwin.informatics.indiana.edu/yonli/ | http://noble.gs.washington.edu/proj/fido | - |
  1. For ProteinProphet, the underlying bipartite graph does not correspond to a Bayesian network, although it guides the EM-like algorithm through inference.
  2. MIPGEM uses a rule-based protein removal scheme to simplify the network structure.
  3. Exact computation is used only for small connected components.
  4. Gene-centric proteomics was proposed in [77], and implemented earlier in a deterministic way in [67].
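
Note 3 says that exact computation is used only for small connected components. As an illustration only, the sketch below splits a peptide-protein bipartite graph into connected components and labels each one by size. It is not taken from any of the four tools: the function names, the input format (a mapping from peptide to matching proteins), and the `max_exact_proteins` threshold are assumptions made for this example.

```python
from collections import defaultdict, deque


def connected_components(peptide_to_proteins):
    """Split a peptide-protein bipartite graph into connected components.

    `peptide_to_proteins` maps each identified peptide to the proteins it
    matches (a hypothetical input format chosen for this sketch).
    """
    # Build an undirected adjacency map. Nodes are tagged tuples so that a
    # peptide and a protein sharing a name cannot collide.
    adjacency = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        pep_node = ("peptide", peptide)
        adjacency[pep_node]  # registers peptides that match no protein
        for protein in proteins:
            prot_node = ("protein", protein)
            adjacency[pep_node].add(prot_node)
            adjacency[prot_node].add(pep_node)

    seen = set()
    components = []
    for start in list(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = {"peptide": set(), "protein": set()}
        while queue:
            kind, name = queue.popleft()
            component[kind].add(name)
            for neighbour in adjacency[(kind, name)]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def choose_inference(component, max_exact_proteins=10):
    """Pick a strategy by component size; the threshold is an arbitrary assumption."""
    if len(component["protein"]) <= max_exact_proteins:
        return "exact"
    return "approximate"


# Toy data: PEPA and PEPB share protein P2, so they fall into one component.
matches = {"PEPA": ["P1", "P2"], "PEPB": ["P2"], "PEPC": ["P3"]}
components = connected_components(matches)
strategies = [choose_inference(c, max_exact_proteins=1) for c in components]
```

With the toy data above, the two-protein component would be routed to an approximate method (sampling or pruning in the tools compared here) and the single-protein component to exact computation. In practice the cutoff would depend on the cost of exact inference for the model in use.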