Improving statistical inference on pathogen densities estimated by quantitative molecular methods: malaria gametocytaemia as a case study

Background Quantitative molecular methods (QMMs) such as quantitative real-time polymerase chain reaction (q-PCR), reverse-transcriptase PCR (qRT-PCR) and quantitative nucleic acid sequence-based amplification (QT-NASBA) are increasingly used to estimate pathogen density in a variety of clinical and epidemiological contexts. These methods are often classified as semi-quantitative, yet estimates of reliability or sensitivity are seldom reported. Here, a statistical framework is developed for assessing the reliability (uncertainty) of pathogen densities estimated using QMMs and the associated diagnostic sensitivity. The method is illustrated with quantification of Plasmodium falciparum gametocytaemia by QT-NASBA. Results The reliability of pathogen (e.g. gametocyte) densities, and the accompanying diagnostic sensitivity, estimated by two contrasting statistical calibration techniques, are compared; a traditional method and a mixed model Bayesian approach. The latter accounts for statistical dependence of QMM assays run under identical laboratory protocols and permits structural modelling of experimental measurements, allowing precision to vary with pathogen density. Traditional calibration cannot account for inter-assay variability arising from imperfect QMMs and generates estimates of pathogen density that have poor reliability, are variable among assays and inaccurately reflect diagnostic sensitivity. The Bayesian mixed model approach assimilates information from replica QMM assays, improving reliability and inter-assay homogeneity, providing an accurate appraisal of quantitative and diagnostic performance. Conclusions Bayesian mixed model statistical calibration supersedes traditional techniques in the context of QMM-derived estimates of pathogen density, offering the potential to improve substantially the depth and quality of clinical and epidemiological inference for a wide variety of pathogens. Electronic supplementary material The online version of this article (doi:10.1186/s12859-014-0402-2) contains supplementary material, which is available to authorized users.

Results: The reliability of pathogen (e.g. gametocyte) densities, and the accompanying diagnostic sensitivity, estimated by two contrasting statistical calibration techniques, are compared; a traditional method and a mixed model Bayesian approach. The latter accounts for statistical dependence of QMM assays run under identical laboratory protocols and permits structural modelling of experimental measurements, allowing precision to vary with pathogen density. Traditional calibration cannot account for inter-assay variability arising from imperfect QMMs and generates estimates of pathogen density that have poor reliability, are variable among assays and inaccurately reflect diagnostic sensitivity. The Bayesian mixed model approach assimilates information from replica QMM assays, improving reliability and inter-assay homogeneity, providing an accurate appraisal of quantitative and diagnostic performance.
Conclusions: Bayesian mixed model statistical calibration supersedes traditional techniques in the context of QMM-derived estimates of pathogen density, offering the potential to improve substantially the depth and quality of clinical and epidemiological inference for a wide variety of pathogens.

Background
The development of quantitative molecular methods (QMMs) has allowed the detection and quantification of pathogens at concentrations below the threshold of detection by conventional diagnostic tools [1]. Molecular tools such as quantitative real-time polymerase chain reaction (q-PCR), reverse-transcriptase PCR (qRT-PCR) and quantitative nucleic acid sequence-based amplification (QT-NASBA) are routinely used to estimate the density of a variety of pathogens, including human immunodeficiency virus (HIV), influenza viruses and Plasmodium species protozoa which cause malaria. Pathogen density estimates are increasingly being used in epidemiological assessments (for example, to determine viral [2,3] and bacterial [4] transmissibility), clinical management (such as in HIV [5] and bacterial pneumonia [6]), and to assess the effectiveness of control interventions [7,8]. Therefore, it is critically important that the quantitative and diagnostic performance of QMMs is accurately appraised and that point-estimates of pathogen density are accompanied by robust estimates of reliability (uncertainty).
The principles underlying QMMs such as qPCR, qRT-PCR and QT-NASBA are broadly the same. Nucleic acid in a sample is amplified together with a fluorescent probe and the time taken for the reaction to achieve a certain degree of fluorescence-the experimental measurement-is used to estimate the initial quantity of nucleic acid. ' Absolute' quantification [9] uses calibration or 'standard' curves of test samples with concentrations measured precisely enough to be considered known, so-called calibrators. Typically, this is achieved by diluting a sample of high concentration measured by the available gold standard quantitative diagnostic to yield a 'dynamic range' of calibrators typically in the order of 4 to 8 logarithms, a procedure called serial dilution. The alternative 'relative' quantification uses an internal reference gene and calculates the relative expression ratio [10]. Based on the theory of nucleic acid amplification, the quantity of nucleic acid in the amplification phase increases exponentially and so plotting the experimental measurement against the logarithm of the calibrators yields a linear relationship. The fitted regression line describing this relationship is called a calibration or standard 'curve'. Statistical calibration [11] refers to the process of using a calibration curve to estimate an unknown (logarithm of ) quantity of interest (here pathogen density) from an experimental measurement.
Quantitative molecular methods have been described as either quantitative or semi-quantitative [12]. In reality, their performance ranges from quantitative and highly accurate, to predominantly qualitative indicators of presence or absence. A cascade of numerous potential sources of uncertainty in laboratory protocol [13] mean that most QMMs lie between these extremes, having intermediate quantitative resolution [14,15]. Regardless of the source, uncertainties manifest in calibration curves with nonnegligible (intra-assay) residual error in experimental measurements, and potential inter-assay variability among slopes and intercepts, even when undertaken using standardized protocols within the same laboratory [13]. These errors are widely acknowledged, defined in the MIQE guidelines (minimum information for publication of quantitative real-time PCR experiments) as 'repeatability' (intra-assay variance) and 'reproducibility' (inter-assay variance) respectively [16], and are broadly indicative of the quantitative and diagnostic performance of the QMM in question.
Despite this, there is a lack of statistical understanding on how exactly such (intra-and inter-assay) errors translate into the reliability of estimated pathogen densities or nucleic acid copy numbers, and into the diagnostic sensitivity (sometimes termed 'clinical sensitivity' to distinguish it from 'analytical sensitivity' which refers to the minimum number of detectable nucleic acid copies [16]) of the QMM. Indeed, calibration techniques developed in the statistical literature [11] have not been adequately applied in the context of QMMs. By contrast, in applied physical science disciplines, particularly in analytical chemistry, where calibration is also widely used, methodological protocols are more firmly embedded within their statistical foundations [17].
In this paper, statistical calibration techniques are applied, as a case study, to 12 calibration curves derived from 12 QT-NASBA assays (1 curve per assay), generated from a single laboratory [18], and used routinely for estimating the density of Plasmodium gametocytes present in human blood (gametocytaemia). The QT-NASBA assay uses time to positivity (TTP) in minutes as an indirect measure of pathogen density; the shorter the TTP, the higher the density. Gametocytaemia density determines host infectivity to mosquito vectors and has major epidemiological implications, ranging from quantifying the contribution of different individuals to the reservoir of infection [19], to assessing the effectiveness of transmissionblocking interventions against malaria. Notwithstanding the importance of QT-NASBA to malariologists, the analytical approaches presented here are more broadly applicable to the absolute quantification of a wide range of pathogens by other QMMs. In particular, it is shown how refinements to the traditional calibration approach using random effects and implemented in a Bayesian framework, enable data (calibration curves) from multiple assays to be combined, yielding substantial improvements in accuracy, reliability and consistency of statistical inference on estimated pathogen densities as well as in diagnostic sensitivity.

Ethical clearance
Data were primarily derived from cultured NF54 gametocytes; natural gametocyte isolates were used from a previously published clinical study that received approval from the Ministry of Health of Burkina Faso (2000/3174/ MS/SG/DEP).

The QT-NASBA technique
Full details on the molecular aspects of the QT-NASBA technique are described elsewhere [18]. Briefly, 50-100 μl of blood is collected; the RNA of gametocytes is extracted and then amplified in the presence of a fluorescence probe. The assay measures time to positivity (TTP) in minutes which is the time it takes for the number of target amplicons detected to exceed a defined threshold. In the context of qPCR, TTP is analogous to the quantification cycle (C q ), threshold cycle (C T ), crossing point (C P ), and take-off point (TOP) [16].

Assays
A QT-NASBA assay is typically run on a 48-well plate. Here, 39 wells contained test samples of unknown gametocyte density; 3 were reserved for negative controls (water), and 6 wells contained samples of known gametocyte density (calibrators) used to calibrate the TTP-gametocyte density relationship. Experimental data from 12 QT-NASBA assays are analysed in this paper to motivate and illustrate the proposed analytical framework.

Calibrators
Calibrators were prepared using synchronized, purified mature gametocytes derived from an in vitro culture of P. falciparum [20]. A starting density of 10 6 gametocytes per ml was estimated using microscopy and used in 6 tenfold dilution series (10 6 to 10 1 gametocytes per ml). Hence, 6 calibrators were included in each of the 12 QT-NASBA assays.

Calibration curves
The log 10 (base 10)-transformed density of calibrator i = 1,2,…,n (here n = 6, i.e. 6 calibrators per assay) from assay j = 1,2,…,r (here r = 12, i.e. 12 assays in total), is denoted x ij . The TTP value associated with each calibrator (viz. each x ij ) is estimated using QT-NASBA and denoted by the random variable Y ij . The relationship between TTP and pathogen density is described by a linear mixed model (LMM) [21,22] of the form where x is the mean of the calibrators, ε ij is a normally distributed residual error term with mean 0, and β 0j and β 1j are random effects which follow a multivariate normal distribution with a mean vector of fixed effects β = (β 0 , β 1 ), standard deviation (SD) vector τ = (τ 0 , τ 1 ) and correlation ρ. Setting τ = (0, 0), the LMM becomes a linear model (LM) with fixed effects β 0j = β 0 and β 1j = β 1 (no random variation among assay-specific regression coefficients). Homoscedastic (constant) intra-assay residual variance is defined by var(ε ij ) = σ 2 . Alternatively, heteroscedasticity (non-constant intra-assay variance) is captured by defining var(ε ij ) as a non-constant function. Specifically, and based on inspecting residuals of a fitted model (Figure 1), var (ε ij ) is defined as a log-linear function of the mean of noting that γ = 0 reproduces intra-assay homoscedasticity.

Classical calibration
Statistical calibration [11] concerns making inference on an unknown value of the independent variable (gametocyte density in assay j), denoted X 0j , from a single experimental observation (a TTP), y 0j , or more generally from the mean of m observations (TTPs), y 0j ¼ Σ i y 0ij =m. Solving (1) for x ij and substituting x ij and Y ij for X 0j and y 0j respectively yields A point-estimate of X 0j , denotedx 0j ; is given by replacing β 0j and β 1j in (3) with their so-called empirical best linear unbiased predictors (EBLUPs) [21], denoted b 0j and b 1j , For the fixed effects LM where τ = (0,0), b 0j = b 0 and b 1j = b 1 , (4) becomes Eisenhart's classical calibration Figure 1 Heteroscedasticity of calibration or standard curves from quantitative nucleic acid sequence-based amplification (QT-NASBA) assays. Panel A depicts the conditional raw residuals of a homoscedastic linear mixed model with a correlated random intercept and slope (the homoscedastic homologue of the selected Model 7 in Table 2) plotted against the fitted time to positivity (TTP). The data points in panel B depict the natural logarithm (ln) of the variance of the residuals presented in panel A grouped by deciles (10-quantiles) of fitted TTP. The solid line and shaded area are for presentation purposes only, representing a linear regression ± 2 standard deviations respectively. estimator and the estimates b 0 and b 1 can be obtained by ordinary least squares.
The classical calibration estimator [23], derived from a linear regression model with normally distributed errors, is the fixed effects version of (4), Uncertainty in the estimates b 0 and b 1 propagates into the sampling distribution ofx 0 resulting in a ratio distribution with undefined moments [24,25]. Specifically, if the residual error variance σ 2 of the calibration curve is known,x 0 is distributed as the ratio of two normally distributed random variables. More commonly, σ 2 is estimated as s 2 , andx 0 is distributed as the ratio of t distributed random variables [26]. In practice, the distributional nuances ofx 0 are seldom relevant for 'high quality' calibration curves where the absolute magnitude of β 1 is large relative to σ 2 , and there are numerous and adequately dispersed calibrators (i.e. when Σ i x i − x ð Þ 2 is large). In such circumstances, b 1 is strongly significantly different from 0. Indeed, conditioning on this event has proved a popular approach to ensure the existence of finite moments ofx 0 [27,28]. By this method, the approximate variance ofx 0 is which is used widely, particularly in analytical chemistry, to construct confidence intervals (CIs); usually invoking a t-distribution, replacing, respectively, σ 2 and β 1 with s 2 and b 1 , and assuming that the final term in the parentheses is negligible [17,29,30]. The so-called 'fiducial' CI [31] is also popular, derived by finding values ofx 0 that satisfy the bounds of the prediction interval of y 0 atx 0 [32], where and t is the critical value of Student's t-distribution with n + m −3 degrees of freedom. Term g (the 'g statistic') is important because it inversely measures the 'performance' of a calibration curve, assimilating the gradient, the residual variability, and the number and spread of the calibrators into a single metric; as g → 0 the variance ofx 0 decreases [33][34][35]. It is also noteworthy that as g → 0, the fiducial CI tends to the CI constructed using the variance approximation. Indeed, the later approach is deemed valid only for g < 0.05 [17]. The fiducial limit, although generally more versatile than the variance approximation approach, performs increasingly poorly for increasing g [26,36].

Bayesian calibration
In general, the undefined moments of the ratio distribution b 0j / b 1j create problems for the quantification of SE by traditional (frequentist) approaches, albeit point estimates are largely invariant. This is particularly the case for a LMM where prerequisite estimation of uncertainty in the EBLUPs (b 0j and b 1j ) is problematic [21]. The Bayesian solution [11] rests on evaluating the posterior distribution of X 0j , given by Bayes' theorem, p X 0j y 0j ; θÞ ∝p y 0j X 0j ; θ Á p X 0j jθÞ; À where θ represents in a generic manner the parameters of the hetero-or homoscedastic LM or LMM. Indicated by (9) is that-in the absence of observed y 0j -the posterior distribution (hereafter abbreviated to posterior) of X 0j can be simulated via the posterior predictive distribution [37] of Y at X 0j ¼ x 0j ; pð y 0j X 0j ¼ x 0j ; θÞ , before applying the rearranged regression (3). Therefore, p(X 0j |ӯ 0j ,θ) is rewritten as p(X 0j |x 0j ,θ). Simulating from p(X 0j |x 0j ,θ) in this way enables the performance of calibration curves (summarized in terms of reliability and diagnostic sensitivity) to be evaluated at chosen hypothetical values of the 'true' unknown gametocyte density. In general, it is necessary to evaluate p(X 0j |x 0j ,θ) numerically by Markov chain Monte Carlo (MCMC) methods.
Simulations were conducted by MCMC sampling implemented in OpenBUGS (http://www.openbugs.net) [38], the currently maintained and updated version of WinBUGS [39]. To reflect the absence of prior information on the parameter values, vague (uninformative) prior distributions were defined for the θ: regression coefficients [fixed effects, including those of the log-linear heteroscedastic function defined by (2)] and random effects were assigned normal priors with mean 0 and variance 1000; variance parameters were assigned inverse-gamma priors with shape and rate parameters equal to 0.001, and the covariance matrix Σ of linear mixed models was assigned an inverse Wishart distribution with 2 degrees of freedom [37]. Three Markov chains were initialized for each simulation. Visual inspection of the Markov chains, autocorrelation plots and the Gelman-Rubin statistic [37] were used to assess convergence on the parameter posterior distributions and to check that the conclusions were not sensitive to the choice of starting values. In general, the first 2,500 samples from each chain were discarded as 'burn-in' and a further 50,000 samples were used to estimate the marginal posterior distributions.

Goodness of fit
Calibration curves fitted to the Plasmodium gametocytaemia data from the 12 assays, either individually using a homoscedastic (constant intra-assay variance) linear model (HoLM)also referred to as the traditional approach-or collectively using a heteroscedastic (dynamic intra-assay variance) linear mixed model (HeLMM), are depicted in Figure 2. Parameter estimates and summary statistics of the HoLMs are given in Table 1. The goodness-of-fit of these models varies considerably among assays, from R 2 = 97% in assay 7 (assay j = 7) to R 2 = 74% in assay 4 (assay j = 4). Reflecting this heterogeneity, the 95% prediction intervals for the mean of m TTP observations, y 0j ; from hypothetical 'true' values of log 10 gametocyte density, x 0j , also vary markedly ( Figure 2).
The HeLMM was selected from several parameterizations of the LMM (including homoscedastic variants) which were compared using the deviance information criterion (DIC) [40] ( Table 2). The selected LMM with the lowest DIC included a correlated random intercept and slope, and intra-assay heteroscedastic errors (Model 7 in Table 2). Inclusion of heteroscedasticity yielded particularly large reductions in the DIC of all models compared with their homoscedastic counterparts (Table 2), consistent with an analysis of the residuals from the homoscedastic LMM (Figure 1). Generally, the fitted lines (strictly, posterior means) of the selected HeLMM do not differ substantially from the individually-fitted HoLMs. Posterior means and SDs of estimated parameters of all LMM parameterizations are given in Table 3.

Performance and variability
The performance of calibration curves is summarized in terms of reliability and diagnostic sensitivity. Specifically, inference is based on the numerical (MCMC) approximation of the gametocyte density posterior conditioned on known x 0j and parameters θ, p(X 0j |x 0j ,θ). Reliability is measured by either the 95% Bayesian credible interval (BCI) (Figure 3) or the SD (hereafter referred to in a frequentist manner as a standard error, SE) ( Figure 4) of p(X 0j |x 0j ,θ). Diagnostic sensitivity is quantified by the percentage of realizations from p(X 0j |x 0j ,θ) greater than 1 gametocyte per 0.05 ml volume of test sample (the volume in a single well of a plate used to run each assay) (Figure 3). Estimates of reliability (Figure 3; Figure 4) and sensitivity ( Figure 5) derived from HoLMs are highly heterogeneous, reflecting substantial variation in performance among calibration curves (assays). This heterogeneity is also captured by the so-called g statistic (8) ( Table 1) which expresses the performance of each HoLM in a single metric. The g statistic also indicates when uncertainty intervals constructed using classical frequentist approximations [e.g. Equation (7)] will prove satisfactory (Additional file 1). By contrast, calibration using the selected HeLMM (Model 7, Table 2) yields less variable estimates of reliability ( Figure 3; Figure 4) and sensitivity ( Figure 5) by the (random effects) assumption that data from different assays are statistically related; that is, calibration curves are realizations from an underlying distribution, each with a different intercept and slope (Table 2), but with the same (heteroscedastic) intra-assay variance function (2).
Intra-assay heteroscedasticity decreases the SE of the gametocyte density (decreases variance, Figure 1) with increasing x 0j in a density-dependent manner (Figure 3; Figure 4). Consequently, the reliability of the HeLMMderived estimates at low x 0j , while quite homogeneous among assays, is only moderately superior to the majority of HoLM-derived estimates. Conversely, at high x 0j , HeLMM-derived estimates are markedly more reliable (Figure 3, Figure 4). Heteroscedasticity also introduces density dependence into diagnostic sensitivity, reducing the sensitivity estimated from the HeLMM compared to the HoLM at low x 0j , and vice versa. Indeed, at 10 gametocytes per ml, the HeLMM-derived estimates of sensitivity are between 25% and 50% compared to the HoLMderived estimates which are all greater than 75% ( Figure 4). Unsurprisingly, the SE and sensitivity estimated by both models over the range of x 0j is improved by increasing m replica TTP observations ( Figure 5).

Discussion
The availability of QMMs has led to a new dimension in clinical and epidemiological research, in which pathogen densities can be detected and quantified below the thresholds of conventional diagnostics. Despite broad awareness of the numerous potential sources of uncertainty inherent to QMMs, even implemented using standardized experimental protocols, there has hitherto been a lack of clear explanation on how these uncertainties: (a) manifest as intra-and inter-assay errors in calibration curves; (b) project onto the reliability (uncertainty) of estimated pathogen densities and the diagnostic sensitivity of the QMM, and (c) should be handled statistically to estimate robust measures of quantitative and diagnostic performance. These gaps perhaps explain why estimates of reliability and sensitivity-essential metrics for statistical inference-seldom accompany point pathogen density estimates. The presented analysis serves to address these issues by calling attention to calibration methods developed in the statistical literature and refining these techniques to develop a novel and powerful modelling framework.

Calibration curves, reliability and sensitivity
Previous studies on the reliability of QMM-derived estimates have focused predominantly on uncertainty and intra-assay variability in nucleic acid amplification efficiency [41][42][43], rather than uncertainty in measures of estimated nucleic acid concentrations or pathogen densities. This is understandable given the uppermost importance of a consistent amplification efficacy for analytical approaches designed for relative quantification [10] and  because, for microparasitic pathogens (viruses, bacteria, protozoa), relative quantification, or just detection (presence or absence), is often considered sufficient in clinical contexts [42]. However, in terms of absolute quantification, variation in amplification efficiency-which manifests as inter-assay variability among the slopes of calibration curves-does not determine the reliability of estimates per se. This assertion is embodied by the g statistic, a single-metric indicator of the 'performance' of a simple linear calibration curve (a HoLM) which influences in a Intercept, β 0 (SD a ) Slope, β 1 (SD) Covariance matrix, Σ (SD) exp(intercept), σ 2 (SD) Slope, γ (SD)  non-linear manner the reliability of derived estimates and includes all the components of a desirable,'high performing' calibration curve; one with a steep slope, numerous calibrators spread over a wide dynamic range, and a small degree of residual variability of experimental measurements.
The diagnostic sensitivity of a QMM is inextricably associated with the reliability of estimated pathogen densities. That is, purely by chance, the number of pathogens in a positive sample may be estimated, via a calibration curve, as less than 1. The probability of this occurrence  defines diagnostic sensitivity. The results presented here indicate that the uncertainty in QT-NASBA pointestimates increases with decreasing gametocyte densities in a density-dependent manner. The compounding of (initially small) errors throughout the dilution series is a likely and possibly predominant reason for this; at each dilution, additional uncertainty is introduced in the number of 'known' gametocytes, error which propagates into the variability of the experimental measurements (TTPs). The net result is that the diagnostic sensitivity of the assay is also density-dependent, decreasing with decreasing gametocyte density (Figure 4). This finding is consistent with other QMMs used to quantify the density of a variety of pathogens [44][45][46], indicating a general result. It is thus important to emphasize that failing to capture accurately, using a statistical model, systematic changes in the residual variation of experimental measures about a calibration line (intra-assay variance) ( Figure 1) risks inaccurately estimating reliability ( Figure 4) and diagnostic sensitivity ( Figure 5).

Statistical dependence of calibration curves
Heterogeneity in the quality of calibration curves derived from different QMM assays is inescapable; consequently, variation in estimates of reliability and sensitivity (collectively referred to as 'performance') is also inevitable. This is the case in state-of-the-art laboratories using rigorous and standardized experimental protocols [13], and even more so in difficult field settings [47]. Refinements to experimental protocols, including statistically-driven efforts to optimize the number and distribution of calibrators [34], will reduce but not eliminate variation in the quality of calibration curves. Furthermore, the impact of quality on performance is intensified by the practice of using relatively few calibrators per assay (here 6 calibrators were used per 48-well plate, 6 to 8 being commonplace).
The 'traditional' calibration approach, whereby calibration curves from different assays are treated as statistically independent, yields considerable variation in estimates of performance. This is compounded with the ubiquitous assumption of homoscedasticity (constant intra-assay variance) of experimental measures-despite observations that SEs tend to be greater at low nucleic acid concentrations or densities (density dependence) of a variety of pathogens [41,44,48]-resulting in performance estimates that are inaccurate as well as heterogeneous. This presents particular difficulties to studies concerned with inference at low pathogen densities [19,49], not only because this is where the problem is worst, but also because this is precisely where QMMs are preferred for their superior detection sensitivity compared with conventional diagnostics.
The problem of variable performance estimates is resolved using mixed model techniques [21,22] adapted to statistical calibration [50]. Mixed models treat calibration curves from different assays as statically related, permitting inference on simultaneous analysis of all data derived from a set of assays run at a particular time or in a particular laboratory. (In principle, the effect of laboratory, time and any other measurable covariate could be incorporated into the framework presented here, although this is beyond the scope of the current paper.) The problem of inaccurate performance estimates is resolved by departing from the assumption of constant intra-assay variability and explicitly modelling density dependence in the residual variance of experimental measures about calibration curves (Figure 1), a construct that can be embedded within the mixed model framework. The resulting heteroscesdastic mixed model comes, of course, with the requirement to justify the modelling assumptions and is more technically challenging and time consuming to implement than a traditional calibration approach, which is based on the simple linear regression model. Nevertheless, the Bayesian MCMC techniques that provide the versatility to implement almost arbitrarily complex custom-built mixed models are now readily accessible in statistical software packages such as WinBUGS [39], OpenBUGS [38], and JAGS [51]. In the end the decision about whether or not to conduct the more time consuming Bayesian analysis will depend on the g statistic (a value which should always be quoted) and the degree of precision required from the assay.

The hazards of 'quality control'
Hitherto, the issue of variable quality calibration curves and the resulting heterogeneous assay performance has been addressed under the broad banner of 'quality control'. Strategies have included post hoc exclusion of 'outlying' calibration data points [41,44] and vetting of calibration curves, as practiced for quantification of hepatitis B and Epstein-Barr viraemias [34]. Both approaches are statistically equivocal unless there are measured independent variables (covariates) to explain the 'unexpected' data. For example, exclusion of the poorest quality calibration curve from assay 4 (see Figure 2 and Table 1) would have: (a) made valid statistical inference on the value of an unknown gametocyte density run on that particular assay extremely difficult, but more importantly (b) rendered the estimated performance of all other assays-which are informed by the quality of all the calibration curves-overly optimistic. In essence, the calibration data from assay 4 were observed without a priori information to explain their somewhat 'unexpected' or 'outlying' nature and so they should contribute information to the estimated performance of replica assays like any other. Myriad explanatory variables such as time, reagent batch, technician etc. could be legitimately included in the analysis and may demonstrate that quality, and hence performance, is genuinely better or worse under particular circumstances.