Impact of variance components on reliability of absolute quantification using digital PCR

Background

Digital polymerase chain reaction (dPCR) is an increasingly popular technology for detecting and quantifying target nucleic acids. Its advertised strength is high-precision absolute quantification without the need for reference curves. The standard data-analytic approach follows a seemingly straightforward theoretical framework but ignores sources of variation in the data-generating process. These stem from both technical and biological factors, among which we distinguish features that are 1) hard-wired in the equipment, 2) user-dependent and 3) provided by manufacturers but adaptable by the user. The impact of the corresponding variance components on the accuracy and precision of target concentration estimators presented in the literature is studied through simulation.

Results

We reveal how system-specific technical factors influence both the accuracy and the precision of concentration estimates. We find that a well-chosen sample dilution level and modifiable settings such as the fluorescence cut-off for target copy detection have a substantial impact on reliability and can be adapted to the sample analysed. User-dependent technical variation, including pipette inaccuracy and specific sources of sample heterogeneity, leads to a steep increase in the uncertainty of estimated concentrations; users can detect this through replicate experiments and derived variance estimation. Finally, detection performance can be improved by optimizing the fluorescence intensity cut point, as suboptimal thresholds reduce the accuracy of concentration estimates considerably.

Conclusions

Like any other technology, dPCR is subject to variation induced by natural perturbations, systematic settings and user-dependent protocols. The corresponding uncertainty may be controlled with an adapted experimental design. Our findings point to key modifiable sources of uncertainty that form an important starting point for the development of guidelines on dPCR design and data analysis with correct precision bounds. Besides clever choices of sample dilution levels, experiment-specific tuning of machine settings can greatly improve results. Well-chosen, data-driven fluorescence intensity thresholds in particular result in major improvements in target presence detection. We call on manufacturers to provide sufficiently detailed output data that allow users to maximize the potential of the method in their setting and obtain high precision and accuracy for their experiments.

Electronic supplementary material The online version of this article (doi:10.1186/1471-2105-15-283) contains supplementary material, which is available to authorized users.

In what follows, we consider experiments with a given number $n_r$ of retained partitions. We assume $n_r$ is fixed and known, and all results are derived conditional on $n_r$. To improve readability, we omit this conditioning from the notation.

Derivation of the confidence interval
Under regularity assumptions, the number of target copies $X$ in a constant volume follows a Poisson distribution $\mathrm{Pois}(\lambda)$ [1,2]. Since $E[X] = \lambda$, $\lambda$ can be interpreted as the expected number of copies per partition if we assume that the partition volume is constant. Let $K$ denote the number of partitions that return a negative signal out of the $n_r$ retained partitions. The probability that a partition did not contain an initial target copy, $p = P(X = 0) = \exp(-\lambda)$, can be estimated as $\hat{p} = K/n_r$, and $\lambda$ can be estimated as

$$\hat{\lambda} = -\log\left(\frac{K}{n_r}\right).$$

It is shown in [1] that this is the maximum likelihood estimator (MLE) under the following model. Let $Y$ be a binary indicator with $Y = 1$ if a given partition does not contain a target copy ($X = 0$) and $Y = 0$ if it does ($X > 0$). Under the assumptions that the number of copies in a constant volume is Poisson distributed and that all partitions have the same probability $1 - p$ of containing a target copy, the number of partitions with a negative signal $K$ satisfies

$$K \sim B\left(n_r, e^{-\lambda}\right),$$

where $B$ denotes the binomial distribution. Using maximum likelihood theory, we immediately obtain the asymptotic variance by inverting the Fisher information:

$$\mathrm{Var}(\hat{\lambda}) = \frac{1 - e^{-\lambda}}{n_r\, e^{-\lambda}}.$$

For more details, we refer to [1]. A plug-in estimator for the variance of the estimated number of target copies per partition is obtained after replacing $e^{-\lambda}$ by its estimator $K/n_r$:

$$\widehat{\mathrm{Var}}(\hat{\lambda}) = \frac{n_r - K}{K\, n_r}.$$
Since $\hat{\lambda}$ is an MLE, an asymptotic 95% confidence interval can be calculated as

$$\hat{\lambda} \pm 1.96\,\sqrt{\frac{n_r - K}{K\, n_r}}.$$

This confidence interval gives highly similar results to the one derived in [3], but it is more accurate close to the right border (few negative partitions).
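As a minimal numerical sketch of these formulas, the estimator, its plug-in variance and the asymptotic confidence interval can be computed as follows; the counts `K = 4000` and `n_r = 20000` are hypothetical:

```python
import math

def dpcr_estimate(K, n_r, z=1.96):
    """MLE of the mean copies per partition with an asymptotic CI,
    from K negative partitions out of n_r retained partitions."""
    p_hat = K / n_r                   # estimated P(partition contains no copy)
    lam_hat = -math.log(p_hat)        # MLE: lambda_hat = -log(K / n_r)
    var_hat = (n_r - K) / (K * n_r)   # plug-in variance of lambda_hat
    half = z * math.sqrt(var_hat)
    return lam_hat, (lam_hat - half, lam_hat + half)

# hypothetical experiment: 4000 negative partitions out of 20000 retained
lam_hat, (ci_lo, ci_hi) = dpcr_estimate(K=4000, n_r=20000)
```

Note that the interval collapses when $K = n_r$ (no positives) and is undefined when $K = 0$, mirroring the boundary behaviour of the MLE itself.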

Optimization of the theoretical precision
The width of the confidence interval of the concentration depends on the asymptotic variance, which is a function of $\lambda$, the mean number of copies per partition. As such, it can be minimized with respect to the parameter $\lambda$. We aim to find the concentration for which the relative variance per copy number, $\mathrm{Var}(\hat{\lambda})/\lambda^2$, is minimal. This is equivalent to optimizing the dilution for the most precise measurements. We minimize the loss function

$$L(\lambda) = \frac{\mathrm{Var}(\hat{\lambda})}{\lambda^2} = \frac{1 - e^{-\lambda}}{n_r\, \lambda^2\, e^{-\lambda}} = \frac{e^{\lambda} - 1}{n_r\, \lambda^2}.$$

After derivation, we get

$$L'(\lambda) = \frac{\lambda\, e^{\lambda} - 2\left(e^{\lambda} - 1\right)}{n_r\, \lambda^3}.$$

Since $\lambda > 0$, setting the numerator to zero and multiplying by $e^{-\lambda}$ leaves us to solve $\lambda - 2 + 2e^{-\lambda} = 0$ for $\lambda$. This has no closed-form solution, but it can easily be approximated numerically. We get $\lambda \approx 1.59$, which means the most precise estimates are obtained at 1.59 copies per partition. Note that the same result can be found by maximizing the Fisher information of $\log(\lambda)$ [1].
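The root of $\lambda - 2 + 2e^{-\lambda} = 0$ can be approximated with a few lines of bisection; this is only a sketch of the numerical step, not code from the paper:

```python
import math

def optimal_lambda(tol=1e-12):
    """Solve lambda - 2 + 2*exp(-lambda) = 0 for lambda > 0 by bisection;
    the root minimizes the relative variance of the dPCR estimator."""
    f = lambda x: x - 2 + 2 * math.exp(-x)
    lo, hi = 0.1, 10.0                 # f(0.1) < 0 < f(10): the root is bracketed
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return (lo + hi) / 2

lam_opt = optimal_lambda()   # approximately 1.59 copies per partition
```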

Decomposition of the variance in the presence of pipette error
Suppose we examine a sample and prepare a reaction mix in several replicates to determine the concentration of a target gene. We define $\theta$ as the concentration of target nucleic acids (NA) in the raw material. When preparing the technical replicates, we mix the purified NA with appropriate primers, probes and other material necessary for the PCR reaction. Under the assumptions of the Poisson model, the copy number of each replicate, $c_k$, is drawn from a $\mathrm{Poisson}(\eta_k)$ distribution with $\eta_k = \eta = \theta\, V_p / V_r$, where $V_p$ is the pipetted volume and $V_r$ the volume of each reaction mix. In practice, pipette errors and sample heterogeneity occur. Hence, we have to redefine

$$\eta_k = \theta\, \frac{V_{p_k}}{V_{r_k}}$$

as the expected concentration in each replicate given the actual pipetted volume, $V_{p_k}$, and the actual volume of the reaction mix, $V_{r_k}$. We thus estimate $\hat{\theta}_k = \hat{\eta}_k\, V_r / V_p$ for each replicate $k$, using the nominal volumes. When technical replicates are prepared by the same operator and/or pipette, systematic pipette error can lead to bias: $E[\eta_k] \neq \eta$ and consequently $E[\hat{\theta}_k] \neq \theta$. Users can assess and correct for this in a controlled laboratory environment.
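To make the effect of a systematic pipette error concrete, consider a hypothetical pipette that consistently delivers 2% less than its nominal volume; all numbers below are illustrative, not taken from the paper:

```python
def theta_hat(eta_hat, V_r, V_p):
    """Back-calculate the raw-material concentration from the estimated
    per-reaction concentration, using the *nominal* volumes."""
    return eta_hat * V_r / V_p

# hypothetical values: theta = 100 copies/unit volume, V_p = 2, V_r = 20
theta_true, V_p_nominal, V_r = 100.0, 2.0, 20.0
V_p_actual = 0.98 * V_p_nominal            # pipette delivers 2% too little
eta = theta_true * V_p_actual / V_r        # actual per-reaction concentration
bias = theta_hat(eta, V_r, V_p_nominal) - theta_true
```

The 2% volume shortfall propagates directly into a 2% downward bias in the concentration estimate, which no amount of replication with the same pipette can remove.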
Additional variability cannot be avoided, as every pipetting step introduces random error. By the law of total variance, we have the general property

$$\mathrm{Var}(\hat{\theta}_k) = \underbrace{E\left[\mathrm{Var}(\hat{\theta}_k \mid \theta_k)\right]}_{A} + \underbrace{\mathrm{Var}\left(E[\hat{\theta}_k \mid \theta_k]\right)}_{B}.$$

Only for an ideal pipette do we have $\eta_k = \eta$ and $\theta_k = \theta$ for every $k$, so that term B equals 0 and we can use the asymptotic variance estimator.
In the presence of random pipette error and the absence of systematic error, the $\eta_k$ fluctuate randomly around $\eta$ and term B does not vanish. Hence, the asymptotic variance estimator will underestimate the variance: pipette error introduces an additional source of between-replicate variation. Technical replicates can be used to account for this, since empirical variance estimators capture both the variation of the individual estimates (term A) and the variation of the $\theta_k$ around $\theta$ (term B).
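A small simulation illustrates how random pipette error inflates the between-replicate variance beyond the asymptotic term; the settings ($n_r = 5000$ partitions, $\lambda = 1$, a 5% coefficient of variation on the pipetted amount) are assumed for illustration:

```python
import math
import random
import statistics

random.seed(1)
n_r, lam, cv = 5000, 1.0, 0.05   # hypothetical settings
n_rep = 200

def replicate(lam_k):
    """Simulate one replicate and return the estimated copies per partition."""
    # each partition returns a negative signal with probability exp(-lam_k)
    K = sum(random.random() < math.exp(-lam_k) for _ in range(n_r))
    return -math.log(K / n_r)

# ideal pipette: every replicate targets exactly lam
est_ideal = [replicate(lam) for _ in range(n_rep)]
# random pipette error: the per-replicate mean fluctuates around lam
est_noisy = [replicate(lam * random.gauss(1.0, cv)) for _ in range(n_rep)]

var_ideal = statistics.variance(est_ideal)   # close to term A only
var_noisy = statistics.variance(est_noisy)   # term A plus term B
```

With these settings the empirical variance under pipette error is several times larger than under the ideal pipette, even though the asymptotic variance formula is identical in both cases.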

Derivations for a model with unequal partition sizes
We assume in what follows that the probability of containing a copy is proportional to the partition size. Using the previous notation, for partition $i$ with size $s_i$ we have $\lambda_i \propto s_i$ and $p_i = P(X_i = 0) = e^{-\lambda_i}$. For an experiment with unequal partition sizes $s_i$, the expected number of negative partitions is

$$E[K] = \sum_{i=1}^{n_r} e^{-\lambda_i}.$$

When we consider a hypothetical reference experiment on the same replicate with equal partition sizes, then $\sum_{i=1}^{n_r} \lambda_i = n_r\, \lambda$ and we see that

$$\sum_{i=1}^{n_r} e^{-\lambda_i} = n_r\, \frac{1}{n_r} \sum_{i=1}^{n_r} e^{-\lambda_i} \;\geq\; n_r \left(\prod_{i=1}^{n_r} e^{-\lambda_i}\right)^{1/n_r} = n_r\, e^{-\lambda} = n_r\, p,$$

where the inequality follows from the property that an arithmetic average is always at least as large as a geometric average. Consequently, we have shown that the expected number of partitions without a copy is larger than the expected number under the equal-partition-size assumption, so the concentration estimate is biased downwards. Note that when the $s_i$ are similar, so are the $\lambda_i$, and the difference will be small. Although at first sight invisible in the formula, this difference is highly dependent on the number of target copies in the mix. We have

$$\frac{\partial \hat{\lambda}}{\partial K} = \frac{\partial}{\partial K}\left(-\log\frac{K}{n_r}\right) = -\frac{1}{K},$$

so changes for $K$ close to 0 (few negative partitions, a high concentration of target copies) have a much larger influence on the estimate than changes for $K$ close to $n_r$ (many negative partitions, a low concentration of target copies). Consequently, the downward bias resulting from unequal partition sizes will be especially visible when many target copies are present in the reaction mix. It is difficult to give a theoretical estimate of the variance in this case, as every partition has a unique $\lambda_i$.
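The downward bias can be checked by simulation; the 20% coefficient of variation on partition size and the concentration of 3 copies per partition below are assumed for illustration only:

```python
import math
import random
import statistics

random.seed(2)
n_r, lam, cv_size = 5000, 3.0, 0.2   # hypothetical: high concentration, 20% size CV
n_rep = 100

def estimate(unequal):
    """One simulated replicate; lambda_i is proportional to the partition size."""
    lams = [lam * max(random.gauss(1.0, cv_size), 0.05) if unequal else lam
            for _ in range(n_r)]
    K = sum(random.random() < math.exp(-li) for li in lams)
    return -math.log(K / n_r)

mean_equal = statistics.fmean(estimate(False) for _ in range(n_rep))
mean_unequal = statistics.fmean(estimate(True) for _ in range(n_rep))
# unequal sizes give more empty partitions on average, hence a smaller estimate
```

As the AM–GM argument predicts, `mean_unequal` falls noticeably below the true value of 3 while `mean_equal` does not.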

An optimal ratio to minimize misclassification
We can write the ratio of false negatives to false positives that yields an unbiased estimator as a function of the concentration. The estimator is unbiased if the expected number of false positives equals the expected number of false negatives. Define $\pi_{FNR} = P(\text{negative signal} \mid \text{target})$, the false negative rate, and $\pi_{FPR} = P(\text{positive signal} \mid \text{no target})$, the false positive rate. Assume for simplicity an experiment with equal partition sizes and no pipette error. We consider

$$E[K] = n_r\left(p\,(1 - \pi_{FPR}) + (1 - p)\,\pi_{FNR}\right).$$

In the absence of bias, $E[K] = n_r\, p$. We can solve this for $\pi_{FNR}/\pi_{FPR}$ and get

$$\frac{\pi_{FNR}}{\pi_{FPR}} = \frac{p}{1 - p} = \frac{e^{-\lambda}}{1 - e^{-\lambda}},$$

the ratio necessary for an unbiased estimator when the concentration is given or already estimated.
This confirms the intuition that for a small number of target copies we can accept a higher rate of false negatives as long as we keep the false positive rate small. For very concentrated samples, a higher rate of false positives is not problematic, but we want to keep the false negative rate small. Alternatively, we can solve the equation for $E[\hat{p}]$ to know at which concentration the ratio of the error rates equals a certain given value.

$$E[\hat{p}] = \frac{\pi_{FNR}}{\pi_{FNR} + \pi_{FPR}} \;\Rightarrow\; \lambda = -\log\left(\frac{\pi_{FNR}}{\pi_{FNR} + \pi_{FPR}}\right).$$

If we have an estimate of the false positive and false negative rate, we can not only estimate the bias, but also find the optimal concentration $\lambda$ at which the estimate is unbiased. This can be used in combination with the results of a dilution series to reduce bias. Note that we need a less concentrated sample to reduce the bias if we expect a higher probability of false negatives, which may seem counter-intuitive.
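These two identities can be sketched directly; the error rates used in the example are hypothetical:

```python
import math

def unbiased_ratio(lam):
    """pi_FNR / pi_FPR that makes the estimator unbiased at concentration lam."""
    p = math.exp(-lam)            # probability of a truly empty partition
    return p / (1 - p)

def unbiased_concentration(pi_fnr, pi_fpr):
    """Concentration lambda at which the given error rates cancel (zero bias)."""
    return -math.log(pi_fnr / (pi_fnr + pi_fpr))

# hypothetical error rates: pi_FNR = 0.1%, pi_FPR = 1%
lam0 = unbiased_concentration(0.001, 0.01)   # the bias-free concentration
ratio = unbiased_ratio(lam0)                 # recovers pi_FNR / pi_FPR = 0.1
```

Consistent with the counter-intuitive remark above, increasing `pi_fnr` while holding `pi_fpr` fixed moves the bias-free concentration `lam0` downwards.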