Evaluation of absolute quantitation by nonlinear regression in probe-based real-time PCR
BMC Bioinformatics volume 7, Article number: 107 (2006)
In real-time PCR data analysis, the cycle threshold (CT) method is currently the gold standard. This method is based on an assumption of equal PCR efficiency in all reactions, and precision may suffer if this condition is not met. Nonlinear regression analysis (NLR) or curve fitting has therefore been suggested as an alternative to the cycle threshold method for absolute quantitation. The advantages of NLR are that the individual sample efficiency is simulated by the model and that absolute quantitation is possible without a standard curve, releasing reaction wells for unknown samples. However, the calculation method has not been evaluated systematically and has not previously been applied to a TaqMan platform. Aim: To develop and evaluate an automated NLR algorithm capable of generating batch production regression analysis.
Total RNA samples extracted from human gastric mucosa were reverse transcribed and analysed for TNFA, IL18 and ACTB by TaqMan real-time PCR. Fluorescence data were analysed by the regular CT method with a standard curve, and by NLR with a positive control for conversion of fluorescence intensity to copy number, and for this purpose an automated algorithm was written in SPSS syntax. Eleven separate regression models were tested, and the output data was subjected to Altman-Bland analysis. The Altman-Bland analysis showed that the best regression model yielded quantitative data with an intra-assay variation of 58% vs. 24% for the CT derived copy numbers, and with a mean inter-method deviation of × 0.8.
NLR can be automated for batch production analysis, but the CT method is more precise for absolute quantitation in the present setting. The observed inter-method deviation is an indication that assessment of the fluorescence conversion factor used in the regression method can be improved. However, the versatility depends on the level of precision required, and in some settings the increased cost effectiveness of NLR may justify the lower precision.
The use of real-time PCR in functional genomics has increased dramatically during the past decade. With this method, the detection of template accumulation in the PCR reaction is based on a fluorescent probe or a fluorescent dye. The advantages compared to former PCR approaches are many: A: A closed compartment method decreases the risk of contamination, as no post-PCR handling is necessary. B: The data used for calculation of quantity are collected as the PCR reaction runs, reducing the time from pre-PCR procedures to final results. C: Compared to endpoint analyses of PCR reactions, real-time PCR is unmatched in precision. D: The dynamic range is extremely wide, 7–8 log10 [1, 2].
In the software currently available, analysis of real-time data is generally based on the "cycle-threshold" (CT) method. Some packages offer curve-smoothing and normalisation, but the basic CT algorithm remains unchanged. Threshold fluorescence is calculated from the initial cycles, and in each reaction the CT value is defined by the fractional cycle at which the fluorescence intensity equals the threshold fluorescence. A standard curve can be used for absolute quantitation, or the comparative CT method can be used for relative quantitation.
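The CT calculation described above can be illustrated with a minimal sketch. It assumes linear interpolation between the two cycles that bracket the threshold and an ordinary least-squares standard curve of CT against log10 copy number; the function names are hypothetical and this is not the algorithm used by the SDS software.

```python
import math

def fractional_ct(fluorescence, threshold):
    """Fractional cycle at which the trace first reaches the threshold.

    fluorescence[i - 1] is the reading for cycle i (cycles are 1-based).
    Linear interpolation between bracketing cycles is an assumption.
    """
    for i in range(1, len(fluorescence)):
        lo, hi = fluorescence[i - 1], fluorescence[i]
        if lo < threshold <= hi:
            return i + (threshold - lo) / (hi - lo)
    return None  # the threshold was never reached

def fit_standard_curve(copies, cts):
    """Least-squares line CT = slope * log10(copies) + intercept."""
    xs = [math.log10(n) for n in copies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(cts) / len(cts)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the copy number of an unknown."""
    return 10 ** ((ct - intercept) / slope)

def efficiency_from_slope(slope):
    """Amplification factor per cycle implied by the slope (2.0 = perfect doubling)."""
    return 10 ** (-1.0 / slope)
```

With an ideal dilution series (perfect doubling in every cycle) the fitted slope is about -3.32 cycles per log10 and the implied amplification factor is 2.0.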
The CT method is quite stable and straightforward, so why try to complicate things? The answer is that the precision of estimates is impaired if efficiency is not equal in all reactions. Uniform reaction efficiency is the most important assumption of the CT method. The simplest estimate of individual sample efficiency is calculated from the slope of the first part of the log-linear phase, and can be used for identification of outliers or correction of values from individual samples. The sigmoid curve fit or non-linear regression (NLR), on the other hand, assumes a dynamic change in efficiency and closely resembles the observed course of fluorescence accumulation during the whole reaction. A further advantage of regression analysis is the possibility to generate estimates of initial copy number directly from the regression estimates, eliminating the need for a standard curve. In small study series, the standard curve may be the best choice, but in a high-throughput production lab, elimination of the standard curve could liberate time and resources.
The first obstacle to the use of NLR is that the algorithm needs to be automated. The second and more important obstacle is the lack of proper evaluation, both of NLR in comparison with the CT method and of the performance of NLR with probe-based chemistry. We therefore decided to develop and evaluate an automated regression model, to test whether NLR is a real alternative to the traditional CT method.
Figure 1 shows an example of a curve fit generated by NLR. In models 6, 9 and 11, one or more regressions returned bad fits (defined as generation of "impossible values" such as a negative Fmax, etc.). In figure 2, plots of NLR- vs. CT-generated data are shown. Most models show a fair correlation. Models 3, 8, and 10 have higher bias than the rest, and the error is higher in models 2, 8, and 10. Models with one or more "bad fits" are not shown.
Altman-Bland plots were made of the numerical differences between duplicates (error) vs. duplicate means for each dataset, for each regression model and for the CT method. These plots showed an increase of error with increasing mean (an example is shown in figure 3). However, a log10 transformation of all final estimated values resolved this pattern, and the error plots then showed independence (figure 4 compares intra-assay variation for model 4 and for the CT method, across all assays). The intra-assay variation could then be characterised by the 95th percentile of the observed errors. The inverse log10 of this percentile can be interpreted as a factor variation and recalculated to a percentage, as presented in table 2.
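The error summary described above can be sketched as follows. The sketch makes two assumptions that the text does not spell out: the percentile is obtained by linear interpolation between ranked errors, and the percentage is taken as (factor - 1) × 100. The function name is hypothetical.

```python
import math

def intra_assay_variation(duplicates, percentile=95.0):
    """Summarise duplicate error on the log10 scale.

    duplicates: list of (copy_number_a, copy_number_b) pairs.
    Returns (log10 percentile, factor variation, percentage).
    """
    # Numerical (absolute) difference between log10-transformed duplicates
    errors = sorted(abs(math.log10(a) - math.log10(b)) for a, b in duplicates)
    # Percentile by linear interpolation between ranks (an assumption)
    rank = (percentile / 100.0) * (len(errors) - 1)
    lo, hi = math.floor(rank), math.ceil(rank)
    log_p = errors[lo] + (errors[hi] - errors[lo]) * (rank - lo)
    factor = 10 ** log_p               # inverse log10: factor variation
    percent = (factor - 1.0) * 100.0   # assumed conversion to a percentage
    return log_p, factor, percent
```

Under these assumptions, a data set in which every duplicate pair differs by a factor of 1.24 yields a variation of 24%.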
The mean copy number of duplicates was then analysed in plots of differences between NLR- and CT-derived values (bias) vs. means (of NLR- and CT-derived values). Again, independence could be observed after log10 transformation of the copy number values, but not in the raw data. In each experiment there was a relative bias, but when comparing the different experiments the bias was clearly not systematic. In figure 5, the bias of model 4 is shown in an Altman-Bland plot containing data from all three experiments. The distribution of the data clouds indicates that each conversion factor varies between experiments in a random manner.
The calculated conversion factors ranged from 7.96E+10 to 3.07E+11 copies/fluorescence unit. Table 2 offers an overview of all models tested and key figures of their performance. The error percentiles stated are calculated on pooled data from all 3 assays, and the bias values are means of pooled numerical bias. As can be seen in figure 5, a simple average of pooled values would yield an erroneously low estimate of the bias, so the overall bias of each regression model has been calculated as an average of numerical bias values. For evaluation of the modifications applied, table 3 offers an overview of resulting R2 mean, error, and bias changes.
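The point about pooled averages can be illustrated with a small sketch. It assumes that "numerical bias" means the absolute value of the log10 difference between NLR- and CT-derived copy numbers; the function names and the example numbers are hypothetical and are not data from this study.

```python
import math

def log_bias(nlr_copies, ct_copies):
    """Point-to-point bias on the log10 scale (NLR minus CT)."""
    return [math.log10(n) - math.log10(c) for n, c in zip(nlr_copies, ct_copies)]

def summarise_bias(biases):
    """Signed and absolute mean bias, both expressed as factors."""
    signed = sum(biases) / len(biases)
    absolute = sum(abs(b) for b in biases) / len(biases)
    return {"signed_factor": 10 ** signed, "absolute_factor": 10 ** absolute}

# Hypothetical pooled data: one experiment reads 2x high, another 2x low
nlr = [200.0, 2000.0, 50.0, 500.0]
ct = [100.0, 1000.0, 100.0, 1000.0]
summary = summarise_bias(log_bias(nlr, ct))
```

In this constructed example the signed mean suggests no bias at all (factor 1.0), while the mean of the absolute values shows a two-fold deviation, which is why opposite biases in different experiments must not be allowed to cancel.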
In the CT method equal efficiency in all reactions is assumed, and the impact of this assumption on final estimates has been underlined previously [4, 7, 8]. Tichopad [4] presented a standardised, automatable algorithm for estimation of sample specific efficiency, and a similar approach was published by Ramakers et al. [8]. These models calculate efficiency at the early log-linear phase, and assume homogeneous efficiency before that. However, calculation of sample specific efficiency was also evaluated by Peirson et al. [9], who concluded that this approach was good for detection of outliers, but that individual efficiency correction did not improve the precision of absolute quantitation.
The CT method has also been combined with curve-smoothing to obtain background correction and data smoothing (in the soFAR software package [10] and by Larionov et al. [11], who also included amplitude normalisation). The latter approach may produce nice curves, but amplitude normalisation in particular will change the slope of the log-linear phase and thereby mask differences in reaction efficiency.
Theoretically, a calculation of template accumulation that mimics the dynamic change in PCR efficiency, and includes a larger array of the collected fluorescence data, could be more precise than the CT method. Alternatives to CT-based calculation have been suggested previously [5, 6, 12, 13]. One model that assumes a dynamic change in efficiency is the sigmoidal curve fit, though limitations apply [6, 12]. The late plateau phase of the reaction is especially difficult to fit with this mathematical model. Rutledge suggested removal of observations from the late plateau phase to increase goodness-of-fit to the remaining data. Principal objections aside, the latter approach is less well suited for automation. To solve this problem we tested weighted analysis, which performed well in automation but unfortunately did not improve the precision of estimates.
Algorithms for this type of analysis should be independent of user input apart from the raw data, to eliminate user-dependent bias. In general, "mass production" techniques should be used with caution in complicated regression models, as small errors may impair the precision of the final estimates. Of the models initially investigated in this study, three produced one or more bad fits when automated, which illustrates a potential disadvantage of NLR when compared to CT analysis. The remaining eight models seemed robust, and could be evaluated more thoroughly.
The R2 value can be interpreted as "the amount of observed variation explained by the regression model". The mean R2 values in table 2 show that all models generated values above 0.99. Obviously, differences in the 3rd decimal place of R2 are not a good measure of model performance, so the Altman-Bland method is more informative.
In the present study, the gold standard CT method has an intra-assay variation (error) of 24%, which is close to previously reported values. This error is a sum of the inaccuracies in fluorescence measurement, thermocycling, pre-PCR procedures, and the CT fractional cycle estimate. Most of these inaccuracies are common to both calculation methods. In NLR, 4 or 5 variables are estimated in each analysis (C1/2, Fmax, k, Fb, f), and each of these estimates contains intrinsic error. The resulting intra-assay variation is therefore a combination of inaccuracies in the pre-PCR procedures, equipment errors, and errors in the variable estimates. In effect, at least 35% of the total 59% error in model 4 is generated by the mathematical model itself.
Of the four different modifications to the original model tested, changes in R2 were minute – but in terms of error all modifications tested had a negative impact on the model, probably due to the increased number of variables estimated. Model 4 (log10 transformation of raw data) produced marginally lower bias and marginally higher error, and this is the only modification that was not directly harmful to model performance.
The high-performance assays used in this study are an optimal setting for CT analysis, and our evaluation may therefore be quite conservative in terms of demonstrating the advantages of NLR. In assays with varying PCR efficiency, the NLR method may yet prove to be more precise than the CT method. This, however, awaits systematic evaluation.
The use of an absolute conversion factor, or optical calibration, has been evaluated previously in different analysis models [6, 13]. The three data clouds in figure 5 were generated with separate conversion factors, and their distribution shows a pattern of random variation, underlining that our conversion factor assessment was inaccurate. However, the conversion factor only affects the absolute sample value and not the intra-assay variation, nor the rank position of a sample in the data set.
Probe-based chemistry theoretically offers a stoichiometric calibration, as each probe has one reporter and one quencher molecule. In effect, the conversion factor should be universal and independent of the template measured. The conversion factors calculated in this series spanned a factor of 3.9 from lowest to highest, and this also indicates that the precision of our conversion factors was less than optimal. Stoichiometric calibration was investigated in detail by Swillens et al. [13], who lowered probe concentration to define probe as the limiting factor of fluorescence accumulation. This approach assumes a precise probe concentration and 100% conjugation and purity. As the problem of signal to noise ratio is inherent in all probe-based assays, a reduction of probe concentration lowers the detection window even further and may impair precision.
Alternative mathematical models
For curve smoothing combined with the CT method, the sigmoidal curve fit may not be optimal, as the Gompertz function [16] shows a better fit both with the steep increase phase and the late plateau phase. The Gompertz algorithm is not suitable for estimation of initial fluorescence, though (tested, not shown).
To calculate the initial copy number accurately, the efficiency of each cycle must be estimated:
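The relation referred to here can be written in the following general form. This is a sketch of the standard cycle-by-cycle amplification relation, not necessarily the authors' exact equation; the symbols N_0 (initial copy number), N_C (copy number after cycle C) and E_i (amplification factor in cycle i, between 1 and 2, matching the efficiency scale used elsewhere in the text) are assumptions:

```latex
\[
N_C = N_0 \prod_{i=1}^{C} E_i
\quad\Longrightarrow\quad
N_0 = \frac{N_C}{\prod_{i=1}^{C} E_i}
\]
```

If every E_i equals the same constant E, this reduces to N_C = N_0 E^C.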
In theory each of these efficiencies could be measured directly on the fluorescence curve. In practice, however, only a few points on the PCR curve yield workable efficiency estimates because the early plateau phase is dominated by background noise. Rutledge recently proposed an alternative model for estimation of maximal efficiency based on the sigmoidal model [17]. As the efficiency is directly calculable in the log-linear phase, the important extremes of efficiency (E0 and ECT) can be assessed. Further work will show if this model is workable, or if it will fall short on the problem of multiple estimates.
NLR is automatable and may be a powerful tool for analysis of fluorescence data from real-time PCR experiments. The unfavourable signal to noise ratio of the probe-based assays did not impair NLR analysis. The versatility of NLR depends on the precision needed, but where it is adaptable, this analysis method may save both time and resources in the laboratory. Further work is needed to improve the precision of the fluorescence-copy number conversion factor in order to reduce the bias observed in this study.
It is indeed possible to obtain absolute quantitation from real-time data without a standard curve. In an optimised assay, however, the CT method remains the gold standard due to the inherent errors of the multiple estimates used in NLR.
Forty-four biopsies of human gastric mucosa, collected by endoscopy of outpatients referred for dyspepsia, were included in this study after written informed consent. Biopsies were stored in RNA-Later (Ambion, Austin, Texas, USA) until extraction by the Trizol method (Invitrogen, Carlsbad, California, USA) according to the manufacturer's instructions. A standardised amount of total RNA (1μg) was reverse transcribed by Superscript II (Invitrogen), and cDNA was stored at -70°C. Samples were measured in duplicate by real-time PCR in an ABI-Prism 7900 instrument using TaqMan chemistry and SDS 2.1 software (Applied Biosystems, Foster City, California, USA), and a standard protocol in 25μL format. Three different templates were measured; table 1 shows the primers and probes, manufactured as custom oligos by Eurogentec, Seraing, Belgium. The absolute standard was produced by serial dilution of a dsDNA PCR product, purified by gel band analysis/extraction (GFX columns, Amersham, Piscataway, NJ, USA), sequenced (BigDye 2.0, Applied Biosystems) and quantified by spectrophotometry (Eppendorf Biophotometer, Hamburg, Germany). Based on repeated standard curves, all three assays performed well with calculated mean efficiencies above 1.99, and standards with concentrations of 100 copies/μL or more yielded CT values with a narrow 95%CI. At lower concentrations (10 and 1 copies/μL) CT values showed increasing standard error, compatible with increasing stochastic effects at low concentrations. The assays chosen have different expression levels in the tissue analysed (ACTB>IL18>TNFA). Raw fluorescence readings were exported from SDS as "clipped" text files which are readable by the statistics software. Regression analysis was performed in SPSS 12.0.1 (SPSS Inc., Chicago, Illinois, USA).
Equation 1: $F_C = \dfrac{F_{max}}{1 + e^{-(C - C_{1/2})/k}} + F_b$, where $F_C$ is fluorescence at cycle $C$; $F_{max}$ is the maximal fluorescence intensity; $C$ is cycle number; $C_{1/2}$ is the fractional cycle at half of maximal fluorescence; $k$ is a slope constant related to PCR efficiency; and $F_b$ is the background fluorescence.
This equation was tested with combinations of additional mathematical modifications to increase goodness-of-fit (R2 closer to 1), as described below. For an overview of the 11 regression models, see table 2.
Baseline drift correction
In most of the reactions a slight, but significant linear increase of background fluorescence was observed. This baseline drift could be corrected by the introduction of a linear term in the regression model:
Equation 2: $F_C = \dfrac{F_{max}}{1 + e^{-(C - C_{1/2})/k}} + F_b + f \cdot C$, where $f$ is a constant.
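To make the regression step concrete, the following is a minimal, self-contained sketch of a nonlinear least-squares fit of a sigmoid with a linear drift term. It is not the authors' SPSS syntax: the model form Fmax/(1 + exp(-(C - C1/2)/k)) + Fb + f·C is assumed from the parameter definitions above, the optimiser is a simple Levenberg-Marquardt loop with a finite-difference Jacobian, and all function names and starting values are hypothetical.

```python
import math

def sigmoid_with_drift(c, fmax, c_half, k, fb, f):
    """Assumed model: sigmoid plus background Fb plus linear drift f * C."""
    return fmax / (1.0 + math.exp(-(c - c_half) / k)) + fb + f * c

def sum_sq(params, cycles, fluor):
    """Sum of squared residuals for a parameter vector."""
    return sum((y - sigmoid_with_drift(c, *params)) ** 2 for c, y in zip(cycles, fluor))

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_sigmoid(cycles, fluor, start, iterations=100):
    """Levenberg-Marquardt fit; returns (parameters, residual sum of squares)."""
    p = list(start)
    n = len(p)
    best = sum_sq(p, cycles, fluor)
    lam = 1e-3  # damping factor
    for _ in range(iterations):
        model = [sigmoid_with_drift(c, *p) for c in cycles]
        resid = [y - v for y, v in zip(fluor, model)]
        jac = []  # jac[j] holds d(model)/d(p[j]) for every cycle
        for j in range(n):
            h = 1e-6 * max(1.0, abs(p[j]))
            q = p[:]
            q[j] += h
            jac.append([(sigmoid_with_drift(c, *q) - v) / h for c, v in zip(cycles, model)])
        jtj = [[sum(u * v for u, v in zip(jac[i], jac[j])) for j in range(n)] for i in range(n)]
        jtr = [sum(u * r for u, r in zip(jac[i], resid)) for i in range(n)]
        for i in range(n):
            jtj[i][i] *= 1.0 + lam
        try:
            step = solve_linear(jtj, jtr)
            trial = [pi + si for pi, si in zip(p, step)]
            trial_ss = sum_sq(trial, cycles, fluor)
        except (OverflowError, ZeroDivisionError):
            trial_ss = math.inf  # treat numerical failure as a rejected step
        if trial_ss < best:
            p, best, lam = trial, trial_ss, lam / 10.0  # accept, trust the model more
        else:
            lam *= 10.0  # reject, take a more cautious step next time
    return p, best
```

A bad fit in the sense used in this paper (for example a negative Fmax) would have to be detected by inspecting the returned parameters; the sketch does not constrain them.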
In the late plateau phase, a deviation from the sigmoid pattern can be observed with SYBR green chemistry, and unfortunately this tendency seems to be even stronger with TaqMan chemistry. Rutledge addressed this problem by removing such values from the calculation [6]. We preferred a weighted regression, which allows the impact of individual data points to be increased or decreased, rather than removing values completely from the calculations. To automate this process, a "weight function" was devised based on a C1/2 estimate. This function generates a set of weights that is tailored to each specific reaction.
The constant initialises the weight at a base level; the second term gradually increases the weight from around 20 cycles before C1/2. The third term decreases the weight rapidly at C1/2, and the fourth term reduces the impact of the weights at C1/2 above 35.
An alternative way of dealing with late plateau phase drift is log10 transformation of fluorescence data, which changes the profile of the fluorescence curve to a more sigmoid pattern. In log10 transformed fluorescence data, however, the background fluorescence makes the early plateau phase very noisy – so a second, similar weight profile algorithm was devised to lessen the impact of early plateau phase data on the calculations.
CT analysis is based on fluorescence data corrected for background noise. When exporting data from SDS, two tables are generated: one with raw data (no correction), and one with background subtraction (background corrected).
Calculation of the template-related initial fluorescence was made by substitution of C by 0:
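Substituting C = 0 into the sigmoid term and leaving out the background term gives the expression below. It is reconstructed from the parameter definitions given for Equation 1 (Fmax, C1/2, k), assuming the standard four-parameter sigmoid, and the symbol F_0 for the template-related initial fluorescence follows its use in the next paragraph:

```latex
\[
F_0 = \frac{F_{max}}{1 + e^{C_{1/2}/k}}
\]
```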
The "optical calibration" was performed by running NLR on the reactions with known copy number (the absolute standard), and a conversion factor CF was calculated from the estimated F0.
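The calibration step can be sketched as follows. The sketch assumes that F0 is obtained from the fitted sigmoid at C = 0 without the background term, and that the conversion factor is simply the known copy number of a standard divided by its estimated F0 (consistent with the unit copies/fluorescence unit); how several standards were combined into one factor is not specified here, and the function names and example values are hypothetical.

```python
import math

def initial_fluorescence(fmax, c_half, k):
    """Template-related fluorescence at cycle 0: Fmax / (1 + exp(C1/2 / k))."""
    return fmax / (1.0 + math.exp(c_half / k))

def conversion_factor(known_copies, f0_standard):
    """Copies per fluorescence unit, from a reaction with known copy number."""
    return known_copies / f0_standard

def copies_from_fit(fmax, c_half, k, cf):
    """Absolute copy number of an unknown from its fitted parameters."""
    return cf * initial_fluorescence(fmax, c_half, k)

# Hypothetical standard: 1e5 copies, fitted Fmax = 8.0, C1/2 = 24.0, k = 1.5
f0_std = initial_fluorescence(8.0, 24.0, 1.5)
cf = conversion_factor(1e5, f0_std)

# Hypothetical unknown whose curve rises 1.5 * ln(10) cycles earlier
unknown = copies_from_fit(8.0, 24.0 - 1.5 * math.log(10.0), 1.5, cf)
```

In this constructed example the unknown comes out at roughly ten times the copy number of the standard, because its fitted C1/2 lies k·ln(10) cycles earlier while Fmax and k are unchanged.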
The regression models were written in SPSS syntax. On a decent PC (2.6 GHz P4, 256 MB RAM, XP pro), the algorithm processes an entire 96 reaction plate in less than 2 minutes.
Evaluation of output data
The three SDS files were subjected to analysis by CT/standard curve and by all 11 NLR algorithms. For each model, a mean R2 was calculated for comparison of goodness of fit between models. The data sets were then subjected to Altman-Bland analysis [18]. Two types of Altman-Bland plots were generated. In the first type, intra-assay variability was evaluated in plots of numerical difference between duplicate values (termed "error") vs. mean of the duplicate values. If the error is independent of mean value, the 95th percentile is a measure of the overall intra-assay variability. If independence is not observed (i.e. patterns are observed in the scatter plot), appropriate transformation of raw data (here: the calculated copy numbers) or partitioned analysis must be applied before the error can be evaluated. The second type of Altman-Bland plot was aimed at evaluation of inter-method agreement (i.e., comparison of NLR vs. CT derived values). Plots of the point-to-point differences (termed "bias") versus the means of results derived by the two methods were inspected and rules of independence applied. The mean of the observed bias values yields a reasonable measure of the systematic error of estimates.
The project was recommended by the regional committee of medical research ethics (REK Northern Norway), ref # 200100973-5/IAY/400.
All files include 4 sections: A: Imports raw data into an SPSS data file. B: Performs nonlinear regression on absolute standards for calculation of calibration factor. C: Performs nonlinear regression on all remaining reaction traces. D: Collects data in an SPSS data file and calculates absolute copy numbers for each reaction.
Bustin SA: Quantification of mRNA using real-time reverse transcription PCR (RT-PCR): trends and problems. J Mol Endocrinol 2002, 29: 23–39. 10.1677/jme.0.0290023
Freeman WM, Walker SJ, Vrana KE: Quantitative RT-PCR: pitfalls and potential. Biotechniques 1999, 26: 112–115.
Pfaffl MW: Quantification strategies in real-time PCR. In A-Z of quantitative PCR. Edited by: Bustin SA. International University Line (IUL), La Jolla, CA, USA; 2004.
Tichopad A, Dilger M, Schwarz G, Pfaffl MW: Standardized determination of real-time PCR efficiency from a single reaction set-up. Nucleic Acids Res 2003, 31: e122. 10.1093/nar/gng122
Liu W, Saint DA: Validation of a quantitative method for real time PCR kinetics. Biochem Biophys Res Commun 2002, 294: 347–353. 10.1016/S0006-291X(02)00478-3
Rutledge RG: Sigmoidal curve-fitting redefines quantitative real-time PCR with the prospective of developing automated high-throughput applications. Nucleic Acids Res 2004, 32: e178. 10.1093/nar/gnh177
Marino JH, Cook P, Miller KS: Accurate and statistically verified quantification of relative mRNA abundances using SYBR Green I and real-time RT-PCR. J Immunol Methods 2003, 283: 291–306. 10.1016/S0022-1759(03)00103-0
Ramakers C, Ruijter JM, Deprez RH, Moorman AF: Assumption-free analysis of quantitative real-time polymerase chain reaction (PCR) data. Neurosci Lett 2003, 339: 62–66. 10.1016/S0304-3940(02)01423-4
Peirson SN, Butler JN, Foster RG: Experimental validation of novel and conventional approaches to quantitative real-time PCR data analysis. Nucleic Acids Res 2003, 31: e73. 10.1093/nar/gng073
Wilhelm J, Pingoud A, Hahn M: SoFAR: software for fully automatic evaluation of real-time PCR data. Biotechniques 2003, 34: 324–332.
Larionov A, Krause A, Miller W: A standard curve based method for relative real time PCR data processing. BMC Bioinformatics 2005, 6: 62. 10.1186/1471-2105-6-62
Rutledge RG, Cote C: Mathematics of quantitative kinetic PCR and the application of standard curves. Nucleic Acids Res 2003, 31: e93. 10.1093/nar/gng093
Swillens S, Goffard JC, Marechal Y, de Kerchove EA, El Housni H: Instant evaluation of the absolute initial number of cDNA copies from a single real-time PCR curve. Nucleic Acids Res 2004, 32: e56. 10.1093/nar/gnh053
Samar VJ, De Filippo CL: Round-off error, blind faith, and the powers that be: a caution on numerical error in coefficients for polynomial curves fit to psychophysical data. J Outcome Meas 1998, 2: 159–167.
Gentle A, Anastasopoulos F, McBrien NA: High-resolution semi-quantitative real-time PCR without the use of a standard curve. Biotechniques 2001, 31: 502, 504–506, 508.
Marusic M, Bajzer Z, Freyer JP, Vuk-Pavlovic S: Analysis of growth of multicellular tumour spheroids by mathematical models. Cell Prolif 1994, 27: 73–94.
Rutledge RG: Amplification efficiency dynamics and its implications: Developing a kinetic based approach for quantitative analysis. 2nd International qPCR Symposium, TUMTECH, Munich - in submission process 2005.
Bland JM, Altman DG: Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986, 1: 307–310.
The project was funded by grants from the Helsenord RHF (SPF 54–04).
RGO conceived of the study, designed the project, and performed the experiments and calculations. All authors contributed in discussion of regression models, conclusions, and in preparation of this paper.
Electronic supplementary material
Additional File 1: The original sigmoid model with no modifications, performed on data that were not corrected for background fluorescence. (SPS 5 KB)
Additional File 8: Raw data corrected for background fluorescence, weight emphasis on the early plateau phase. (SPS 7 KB)
Additional File 9: Raw data corrected for background fluorescence, log10 transformed, and weighted with emphasis on late plateau phase. (SPS 8 KB)
Goll, R., Olsen, T., Cui, G. et al. Evaluation of absolute quantitation by nonlinear regression in probe-based real-time PCR. BMC Bioinformatics 7, 107 (2006). https://doi.org/10.1186/1471-2105-7-107