Figure 2 | BMC Bioinformatics

From: Denoising PCR-amplified metagenome data

Discrimination plots for a typical cluster of 4691 reads in the Artificial data set: (a) simulated errors drawn from the error model and (b) the real errors in the cluster. Each sequence (diamond) is characterized by its abundance and by λ, the per-read probability of its having been produced by error. On the x-axis we plot log λ scaled by the logarithm of the most common error probability, T_{A→G}, so that values can be interpreted as an effective Hamming distance. The dashed lines delineate the region (the lower left quadrant) in which, for user-supplied significance thresholds Ω_a and Ω_r, DADA accepts that a sequence could have arisen via the error model. The vertical dashed line marks the λ below which (equivalently, the effective distance above which) the read p-value rejects sequences as errors, and the curved dashed line marks, for each value of λ, the abundance above which the abundance p-value rejects sequences as errors. Several sequences in the real data (red diamonds) would be rejected by the abundance p-value at the Ω_a = 0.01 significance level; we posit that early-round PCR effects are a plausible explanation for these departures from the error model.
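The x-axis scaling and the two tests described in the caption can be sketched in Python. This is a minimal, hypothetical illustration under stated assumptions, not DADA's implementation. The function names are ours. The effective distance follows directly from the caption: if λ = T_{A→G}^d, then d = log λ / log T_{A→G}. The abundance p-value assumes that a sequence's abundance is Poisson-distributed with mean n·λ and conditioned on being at least 1, where n is a read count the caller supplies. The read p-value is a simplified stand-in, 1 − (1 − λ)^n, and not necessarily the paper's exact definition.

```python
import math


def effective_hamming_distance(lam: float, t_ag: float) -> float:
    """x-axis value: log(lam) / log(t_ag).

    If lam == t_ag ** d, this returns d, i.e. the number of A->G errors
    that would be exactly as improbable as lam.
    """
    return math.log(lam) / math.log(t_ag)


def _poisson_log_pmf(k: int, mu: float) -> float:
    """log P(X = k) for X ~ Poisson(mu), mu > 0."""
    return -mu + k * math.log(mu) - math.lgamma(k + 1)


def poisson_upper_tail(a: int, mu: float) -> float:
    """P(X >= a) for X ~ Poisson(mu), avoiding catastrophic cancellation."""
    if a <= 0:
        return 1.0
    if a <= mu:
        # The tail is large here, so 1 - P(X < a) is accurate.
        lower = sum(math.exp(_poisson_log_pmf(k, mu)) for k in range(a))
        return max(0.0, 1.0 - lower)
    # a > mu: the pmf decreases from k = a on, so sum upward until negligible.
    total, k = 0.0, a
    term = math.exp(_poisson_log_pmf(a, mu))
    while term > 0.0 and term > total * 1e-17:
        total += term
        k += 1
        term *= mu / k
    return min(total, 1.0)


def abundance_p_value(abundance: int, n: int, lam: float) -> float:
    """ASSUMED model: abundance ~ Poisson(n * lam), conditioned on abundance >= 1."""
    mu = n * lam
    return poisson_upper_tail(abundance, mu) / -math.expm1(-mu)


def read_p_value(n: int, lam: float) -> float:
    """HYPOTHETICAL stand-in: chance that at least one of n reads becomes
    this sequence, 1 - (1 - lam) ** n, computed in a numerically stable way."""
    return -math.expm1(n * math.log1p(-lam))


def could_be_error(abundance: int, n: int, lam: float,
                   omega_a: float, omega_r: float) -> bool:
    """True if neither test rejects, i.e. the point lies in the accepted region."""
    return (abundance_p_value(abundance, n, lam) >= omega_a
            and read_p_value(n, lam) >= omega_r)
```

For example, `could_be_error(500, 4691, 1e-5, omega_a=0.01, omega_r=0.01)` is `False`, because an abundance of 500 is far above the Poisson mean of about 0.047. The caption does not fix n or Ω_r, so this sketch leaves both to the caller. One natural choice for n, which is also an assumption, is the number of reads of the cluster's central sequence.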