Table 1 Accuracy results for the mean 85 AA COG simulation

From: pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree

range     ML μ   PP μ   ML σ   PP σ   ML FC   PP FC   ML #     PP #
0.0-0.1   -      -      -      -      -       -       0        0
0.1-0.2   3.57   3.78   3.09   3.27   0.07    0.03    4149     2312
0.2-0.3   2.97   3.19   3.04   3.06   0.16    0.11    15123    9018
0.3-0.4   2.39   2.76   3.00   3.07   0.26    0.17    22696    18373
0.4-0.5   2.25   2.29   3.11   2.98   0.32    0.24    20120    23022
0.5-0.6   2.14   2.11   3.09   3.01   0.36    0.32    17228    20090
0.6-0.7   1.94   1.95   3.04   2.99   0.42    0.38    14113    16223
0.7-0.8   1.86   1.85   3.05   3.01   0.47    0.44    13527    14879
0.8-0.9   1.62   1.65   2.97   2.97   0.55    0.52    14850    15747
0.9-1.0   0.32   0.32   1.54   1.53   0.92    0.92    163815   165957
Error analysis for the COG simulation with the error metric described in the text. As in Figure 6, simulated reads had normally distributed lengths with a mean of 85 amino acids and a standard deviation of 20. This table pools the results and shows the mean (μ) and standard deviation (σ) of the error, the fraction placed correctly (FC), and the number of reads placed (#) for pplacer run in maximum likelihood (ML) and posterior probability (PP) modes. For example, the "ML" columns in the row labeled 0.4-0.5 show error statistics for all of the reads in the simulation that had a likelihood weight ratio between 0.4 and 0.5: there were 20120 such reads, of which 32% were placed correctly, with an error mean and standard deviation of 2.25 and 3.11, respectively. This table demonstrates the effectiveness of the confidence scores: as the confidence scores increase, the error decreases. We note that the ML and PP methods have very comparable performance for this length of read, and thus the quickly calculated ML weight ratio can act as a proxy for the more statistically rigorous posterior probability calculation.
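The pooling described in the caption can be illustrated with a minimal Python sketch. It is not pplacer's own code: the function name `pool_by_confidence`, the `(confidence, error)` input format, the use of the population standard deviation, and the rule that a read counts as "placed correctly" when its error is exactly 0 are all assumptions made for illustration.

```python
import statistics
from collections import defaultdict


def pool_by_confidence(records, n_bins=10):
    """Bin (confidence, error) pairs by confidence and summarize each bin.

    `records` is a hypothetical input format: an iterable of pairs where
    confidence lies in [0, 1] (an ML weight ratio or a posterior
    probability) and error is the placement error for that read.
    """
    bins = defaultdict(list)
    for confidence, error in records:
        # A confidence of exactly 1.0 is clamped into the top bin (0.9-1.0).
        idx = min(int(confidence * n_bins), n_bins - 1)
        bins[idx].append(error)

    rows = []
    for idx in range(n_bins):
        errors = bins.get(idx, [])
        label = f"{idx / n_bins:.1f}-{(idx + 1) / n_bins:.1f}"
        if not errors:
            # Empty bins are reported with no statistics, like the "-" row.
            rows.append({"range": label, "mean": None, "sd": None,
                         "fc": None, "n": 0})
            continue
        rows.append({
            "range": label,
            "mean": statistics.mean(errors),
            # Assumption: population standard deviation.
            "sd": statistics.pstdev(errors),
            # Assumption: "placed correctly" means an error of exactly 0.
            "fc": sum(1 for e in errors if e == 0) / len(errors),
            "n": len(errors),
        })
    return rows


if __name__ == "__main__":
    # Made-up toy data, not the simulation results.
    toy = [(0.45, 0.0), (0.42, 4.0), (0.95, 0.0), (1.0, 0.0)]
    for row in pool_by_confidence(toy):
        print(row)
```

Running the function once on the ML weight ratios and once on the posterior probabilities would give the two sets of columns, under the assumptions stated above.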