Highly accurate sigmoidal fitting of real-time PCR data by introducing a parameter for asymmetry
- Andrej-Nikolai Spiess^{1},
- Caroline Feig^{1} and
- Christian Ritz^{2}
https://doi.org/10.1186/1471-2105-9-221
© Spiess et al; licensee BioMed Central Ltd. 2008
Received: 06 November 2007
Accepted: 29 April 2008
Published: 29 April 2008
Abstract
Background
Fitting four-parameter sigmoidal models is one of the established methods in the analysis of quantitative real-time PCR (qPCR) data. We observed that these models do not yield optimal fits because of their inherent constraint of symmetry around the point of inflection. We therefore employed a model that circumvents this problem by using an additional parameter to accommodate asymmetrical structures in sigmoidal qPCR data.
Results
The four-parameter models were compared to their five-parameter counterparts by means of nested F-tests based on the residual variance, thus acquiring a statistical measure for higher performance. For nearly all qPCR data we examined, five-parameter models resulted in a significantly better fit. Furthermore, accuracy and precision for the estimation of efficiencies and calculation of quantitative ratios were assessed with four independent dilution datasets and compared to the most commonly used quantification methods. It could be shown that the five-parameter model exhibits an accuracy and precision more similar to the non-sigmoidal quantification methods.
Conclusion
The five-parameter sigmoidal models outperform the established four-parameter model with high statistical significance. The estimation of essential PCR parameters such as PCR efficiency, threshold cycles and initial template fluorescence is more robust and has smaller variance. The model is implemented in the qpcR package for the freely available statistical R environment. The package can be downloaded from the author's homepage.
Background
Quantitative real-time polymerase chain reaction (qPCR) has become an invaluable tool for monitoring gene expression changes, combining the sensitivity of the PCR technique with the ability to quantify transcriptional changes with high accuracy [1]. Several different methods exist with respect to hardware (i.e. capillary-based systems or thermal block-based systems) or fluorescence chemistry and design. Using the DNA intercalating dye SYBR Green I is one of the most widely applied systems, as the fluorescence readout can be obtained from any PCR amplicon irrespective of its sequence. In this way, qPCR experiments can be conducted quickly and with many different sequences, as is the case in screening and evaluating differential gene expression obtained from microarray experiments [2, 3].
When investigating differential gene expression, qPCR data of two or more different conditions (such as control/treatment or healthy/pathological) are compared by using the fluorescence data acquired by the hardware. One approach is the comparison of the threshold cycles, when the fluorescence of the qPCR reaction rises significantly above the background level, commonly done by the ΔΔCt methods. Originally developed with the tenet that the PCR efficiency is 2 [4], this was soon extended by the long known observation that PCR efficiency can have smaller values and be very different between two different amplicons, as is the case when normalizing a gene of interest against a 'housekeeping' gene. This necessitates the calculation of the efficiency in order to derive a realistic estimate of the expression changes. Various algorithms have been developed such as estimation from the slope of a calibration curve [5, 6] or from a linear fit of the logarithmized data within the exponential region either defined by the 'midpoint' [7] or the region with highest linearity ('window-of-linearity')[8].
In contrast to the linear quantitation methods described above, sigmoidal models have been developed for non-linear fitting of the PCR data, most commonly the Boltzmann or logistic sigmoidal function [9, 10, 17]. The advantage of non-linear fitting is the paradigm that PCR efficiency is not a constant but a variable that changes during PCR, having a maximum in the exponential phase of the reaction and declining in later cycles when reagents become depleted, thus leading to the sigmoidal curvature. Non-linear fitting can then be used to calculate threshold fluorescence, cycle-dependent efficiency (E_{cyc}) and an estimate of the starting template amount (F_{0}). The described sigmoidal qPCR models are four-parameter models whose fitted function defines the parameters ground asymptote, slope, point of inflection and maximum asymptote. The fitted parameters of logistic curves usually describe the qPCR data well and are superior to other models such as Gompertz and Chapman [11].
Although PCR data can be fitted with the four-parameter approach, this model implies symmetry of the lower and upper part of the curve, which results in the same curvature on either side of the inflection point. We found that this poses two essential problems that needed to be solved. Firstly, it is not evident that qPCR curves can be assumed to be symmetric; that they are indeed not symmetric will be shown in this work. Secondly, and most importantly for quantification, fitting four-parameter models with symmetry as an inherent constraint onto asymmetric data will lead to suboptimal fits and parameter estimates [12].
We investigated the effect of applying logistic and also log-logistic five-parameter models to qPCR data, in which the fifth parameter takes a possible asymmetrical structure of the data into account. Five-parameter models have only recently found their way into the dose-response analysis of immunological data [13].
Furthermore, we tested the significance of this approach with various statistical measures by comparing it to fits of models with fewer parameters. The algorithms described here are implemented (besides many other functions) in the qpcR library [22], an extension for the open source statistical programming environment R [23].
Results
The f-parameter in different PCR regimes
Model selection for the best fit and statistical analysis
The application of the model selection process is based on the F-test significance from the fit of the complete (or the largest part) of the amplification curve. As the PCR efficiency and second derivative maximum are derived mainly within the exponential region, it was necessary to evaluate the performance of the five-parameter model with a measure for the goodness-of-fit solely within this important part of the amplification curve. We identified the exponential region by two different methods: (i) the studentized residuals method as described in [9] and (ii) by fitting an exponential model with a window of seven points along the complete amplification curve and identifying the region with the smallest residual variance of the fit. The outcome from both methods was nearly always identical.
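Method (ii) can be sketched in a few lines. This is only an illustration under stated assumptions, not the qpcR implementation: the exponential model is approximated by a straight-line fit to the log-transformed fluorescence, all values are assumed to be positive, and the function names `loglinear_residual_variance` and `most_exponential_window` are hypothetical.

```python
import math


def loglinear_residual_variance(cycles, fluo):
    """Fit log(F) = a + s * cycle by least squares and return rss / (n - 2).

    Assumption: a log-linear fit stands in for the exponential model.
    """
    n = len(cycles)
    logs = [math.log(v) for v in fluo]
    mean_x = sum(cycles) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(cycles, logs))
    return rss / (n - 2)


def most_exponential_window(cycles, fluo, width=7):
    """Return the start index of the window with the smallest residual variance."""
    best_start, best_var = None, float("inf")
    for start in range(len(cycles) - width + 1):
        var = loglinear_residual_variance(
            cycles[start:start + width], fluo[start:start + width]
        )
        if var < best_var:
            best_start, best_var = start, var
    return best_start
```

The window width of seven points follows the text; a real analysis would need to handle non-positive fluorescence values before taking logarithms.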
Estimation of essential qPCR parameters from the five-parameter model compared to previously established quantification models
For relative quantification of qPCR data, the estimates of the PCR efficiency have to be combined with the results from the threshold cycle analysis. Thus, it was necessary to derive the five-parameter equivalent of the threshold cycle, which is implemented in the qpcR package as the second derivative maximum (cpD2). The efficiency is then estimated at this point (see Equation 8). The calculation of the parameters follows the model selection step, so that they are based on the best-performing sigmoidal model.
Accuracy and precision of ratio estimates obtained from Δct methods and initial template fluorescence (F0)
Summary for accuracy and precision of dilution ratio quantitation obtained from four independent dilution datasets.
Method | Set 1 | Set 2 | Set 3 | Set 4 |
---|---|---|---|---|
sigm/Δct/4-par | 65.5 (36.9) | 106.4 (7.0) | 108.3 (16.6) | 84.9 (7.6) |
sigm/Δct/5-par | 73.5 (21.9) | 77.2 (4.5) | 89.3 (4.6) | 75.9 (8.3) |
sigm/F0/4-par | 46.6 (35.8) | 79.7 (15.2) | 117.3 (79.1) | 68.8 (21.1) |
sigm/F0/5-par | 70.4 (42.4) | 86.1 (17.8) | 87.1 (33.6) | 72.6 (48.6) |
exp/Δct/4-par | 85.7 (21.8) | 104.2 (11.2) | 318.8 (31.7) | 339.5 (60.7) |
exp/Δct/5-par | 82.6 (21.2) | 101.3 (10.2) | 319.74 (29.9) | 355.4 (61.3) |
exp/F0 | 198.0 (93.1) | 238.9 (92.6) | 79.2 (134.0) | N.V. |
w-o-l/Δct/4-par | 59.4 (18.2) | 83.9 (5.4) | 105.7 (19.5) | 74.1 (20.7) |
w-o-l/Δct/5-par | 57.5 (18.3) | 81.9 (5.0) | 103.9 (14.9) | 73.9 (21.8) |
w-o-l/F0 | 53.5 (25.9) | 76.3 (14.3) | 310.2 (118.2) | 295.8 (175.6) |
calib/Δct/4-par | 126.0 (21.6) | 100.6 (5.8) | 100.6 (11.5) | 99.9 (7.7) |
calib/Δct/5-par | 128.6 (20.8) | 97.9 (5.1) | 99.3 (5.9) | 99.3 (8.1) |
With efficiency and threshold cycles estimated from the five-parameter model, the performance of the Δct method increased in three datasets, although the improvement was found with different methods in each case. In contrast to this observation, ratios estimated by initial fluorescence from the five-parameter sigmoidal fit (sigm/F0/5-par) presented higher accuracy and precision throughout all datasets.
Discussion and Conclusion
By fitting four-parameter sigmoidal models onto many datasets, we observed that the fitted curves were often not optimal at the ground asymptote, top asymptote ('plateau phase') and even more important at the log-linear region that is used for the estimation of PCR efficiencies or threshold cycles. As an asymmetrical structure of the data would be a proper explanation for this phenomenon, we analyzed the performance of fitting five-parameter models onto qPCR data.
The fifth parameter (termed 'f' in this work) has profound impact on the sigmoidal curvature of the fit. When equal to 1, the five-parameter fit is reduced to its four-parameter equivalent. We rarely observed values very near to 1 after non-linear fitting. This is the reason why we emphasize the use of the five-parameter models, since asymmetry of qPCR data seems to be an inherent characteristic and absolutely symmetric qPCR data seldom (in our observations never) occur. As shown on different qPCR scenarios, the asymmetry parameter is unique to every curve and due to its interaction with other parameters of the fit (mostly 'e', the inflection point and 'b', the slope) the results of the fit are often similar but not directly comparable to four-parameter models.
To base our newly proposed model on solid statistical ground, we conducted a nested F-test of the five-parameter models versus their four-parameter versions in order to validate the increased performance. This is common practice for selecting the best model in non-linear fitting regimes [18] and delivers the essential p-value for choosing the fit with the smallest residual variance. Statistical significance in the range of p = 10^{-3} to p = 10^{-16} in favour of five-parameter logistic or log-logistic models over their four-parameter counterparts was seen in almost all qPCR curves we examined. The log-logistic model 'l5' was selected most often, but we also observed fits in which the logistic 'b5' model performed best, especially when the raw fluorescence data had low values. We believe that the advantage of the log-logistic model over the logistic model lies in a reduced effect of the plateau cycles on the fit as a consequence of the logarithmized x-values (cycle numbers).
By using the RMSE and statistics based on the residual values, it could be shown that the five-parameter models clearly outperform their four-parameter counterparts when the fit is evaluated solely in the exponential region. In most cases the exponential fit, which exhibited very low RMSE values and highly accurate fitting characteristics, was superior. This was not the case for the dataset from Rutledge et al. [17] and another dataset from our group, where the five-parameter log-logistic model surpassed the exponential model. These two datasets exhibit lower raw fluorescence values in general, so the different performance is likely to be due to the underlying platform or enzymatic system. The exponential model does not fit this kind of data optimally, and the fitting procedure was often problematic and yielded unsatisfactory estimates.
The reproducibility of the efficiency estimation with the five-parameter models was not only significantly better than with the four-parameter models, but also often surpassed the reproducibility of the exponential model and the 'window-of-linearity' method. This characteristic was found for the same datasets with low exponential fitting performance as described above.
In the aforementioned work from authors utilizing four-parameter models, the feasibility of using these models was corroborated by using the R^{2}-value as the figure-of-merit, demonstrating very high values (R^{2}>0.99). As we have seen, the R^{2}-value is not a sensitive measure for model comparison: a dramatic improvement (as is often the case when going from symmetry to asymmetry) is hardly reflected in the R^{2}-value. There is considerable controversy about the use of this measure in non-linear fitting [19]. Consequently, we advocate that this measure should not be reported or trusted as the sole evidence for the validity of a fit in sigmoidal qPCR data.
The introduction of the five-parameter model is in our opinion another leap in the direction of automatic qPCR data analysis. This goal, introduced in [17], has yet to be reached. It is unfortunately a fact that different methods in qPCR analysis can yield very different values with respect to PCR efficiency, threshold cycle or estimation of the exponential phase [20, 21]. Yet, when focusing on sigmoidal models, we believe that the additional aspect of asymmetry is an important feature to take into consideration, since the performance of the fit in each part of the curve (and most importantly in the exponential region) is nearly always improved by using five-parameter models.
Methods
RNA extraction and cDNA synthesis
Total RNA was extracted from human testicular biopsies with RNApure™ (Peqlab, Germany) and re-purified on RNeasy™ columns (Qiagen, Germany) according to the manufacturers' protocols. RNA purity and integrity (28S/18S ratio) were assessed by loading aliquots of approximately 200 ng onto RNA 6000 nano assay chips using an Agilent Bioanalyzer (Model 2100; Agilent Technologies, Palo Alto, CA). Only samples with an RNA integrity number higher than 7.5 (RIN, Agilent software) were included for the PCR experiments. cDNAs were synthesized with Superscript™ II reverse transcriptase (Invitrogen, Carlsbad, CA) according to the manufacturers' protocol.
Quantitative real-time PCR (qRT-PCR)
qRT-PCR was performed using LightCycler™ (Roche, Basel, Switzerland) technology with 10 pmol of each gene-specific primer, 2 μl dNTP mix (25 mM each, Takara Bio, Shiga, Japan), 0.5 μl SYBR Green I (1:1000 in DMSO; Molecular Probes, Leiden, Netherlands), 0.25 μl BSA (20 mg/ml; Sigma, Germany) and 0.2 μl Ex-Taq HS (5 U/μl; Takara Bio, Shiga, Japan) in a total volume of 20 μl. Cycling conditions were 95°C for 5 min, then 95°C for 10 s, 60°C for 10 s and 72°C for 30 s with a single fluorescence measurement at the end of the segment, repeated 50 times. A melting curve program (60–95°C with a heating rate of 0.1°C/s and continuous fluorescence measurement) was conducted and the PCR products were electrophoretically separated on 1.3% agarose/TAE gels and verified by sequence analysis.
Curve fitting
Both the curve fitting process and the data analysis were conducted with the qpcR package, which is tailored to the special application of real-time polymerase chain reaction and houses several functions for fitting different curve types to qPCR data. The fitting process and model selection are done by using functionality from the package drc [14], which is the statistical analysis engine, while the qpcR package extends the fitting process by deriving several important qPCR parameters and by providing optimization procedures and graphical evaluation of the results. The raw data used for the analysis were not pre-processed (i.e. not baseline-corrected).
We compared the most widely applied sigmoidal curve type for qPCR analysis, the four-parameter logistic curve (also termed Boltzmann fit) and additionally its four-parameter log-logistic counterpart to five-parameter versions that were shown to exhibit better fits for asymmetric data [15].
with respect to the parameters (b, c, d, e) and also f in case of the five-parameter models. This procedure is in principle feasible with any data analysis software that is capable of fitting non-linear regression models, but is much more conveniently accessible through built-in models (such that the user need not provide initial values ("guesses") of the parameter values). The four-parameter models are commonly available in various data analysis software. Tools for the five-parameter models are not generally available; two exceptions are the StatLIA software (Brendan Technologies) and the five-parameter log-logistic Richards model found in GraphPad Prism version 5.0 (GraphPad Software Inc.).
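The model equations are not reproduced in this copy of the text. As a hedged illustration, the sketch below assumes the parametrization commonly used by the drc package, with b as slope, c as ground asymptote, d as maximum asymptote, e as inflection point and f as asymmetry parameter. The function names `logistic5` and `loglogistic5` are hypothetical and are not qpcR or drc functions.

```python
import math


def logistic5(x, b, c, d, e, f=1.0):
    """Five-parameter logistic curve (assumed form).

    With f = 1 this reduces to the symmetric four-parameter curve.
    A negative b gives a curve that rises with x, as in qPCR data.
    """
    return c + (d - c) / (1.0 + math.exp(b * (x - e))) ** f


def loglogistic5(x, b, c, d, e, f=1.0):
    """Five-parameter log-logistic curve (assumed form); x and e must be > 0."""
    return c + (d - c) / (1.0 + math.exp(b * (math.log(x) - math.log(e)))) ** f


# Symmetry check around the inflection point e = 20 for a curve from 0 to 10.
symmetric_sum = logistic5(23, -1.0, 0.0, 10.0, 20.0) + logistic5(17, -1.0, 0.0, 10.0, 20.0)
asymmetric_sum = logistic5(23, -1.0, 0.0, 10.0, 20.0, 2.0) + logistic5(17, -1.0, 0.0, 10.0, 20.0, 2.0)
```

For f = 1 the two values equidistant from e add up to c + d, which is the symmetry constraint discussed in the Background; for f = 2 they do not.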
Choosing the best model by nested F-tests
with rss = residual sum-of-squares = $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ ($y_i$ = the actual y value, $\hat{y}_i$ = the fitted value), df = degrees of freedom, 4pl = four-parameter fit and 5pl = five-parameter fit. The p-value obtained from an F-distribution then gives the probability that, if the experiment were repeated, one would randomly obtain data yielding an even larger relative decrease in the residual sum-of-squares than observed for the actual data.
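The F statistic itself is not reproduced in this copy. A minimal sketch, assuming the standard nested F-test built from the quantities defined above (the relative decrease in rss divided by the relative decrease in df), is given below; the function name `nested_f_statistic` is hypothetical.

```python
def nested_f_statistic(rss_4pl, df_4pl, rss_5pl, df_5pl):
    """Standard nested F statistic for a four- versus five-parameter fit.

    Assumes df = number of data points minus number of fitted parameters,
    so df_4pl - df_5pl = 1 for these two models.
    """
    numerator = (rss_4pl - rss_5pl) / (df_4pl - df_5pl)
    denominator = rss_5pl / df_5pl
    return numerator / denominator


# Example with 40 cycles: df is 36 for the 4-parameter and 35 for the 5-parameter fit.
f_value = nested_f_statistic(rss_4pl=1.2, df_4pl=36, rss_5pl=0.5, df_5pl=35)
```

If SciPy is available, the corresponding p-value can be obtained as `scipy.stats.f.sf(f_value, df_4pl - df_5pl, df_5pl)`; the rss values in the example are made up for illustration.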
Evaluating the different measures of goodness-of-fit for sigmoidal models
with rss, n and k as previously defined.
with rss = residual sum-of-squares and tss = total sum-of-squares = $\sum_{i=1}^{n}(y_i - \bar{y})^2$ (where $\bar{y}$ is the average of the y values). This measure has been used to demonstrate the validity and goodness of four-parameter sigmoidal models in the analysis of real-time PCR data [10, 11, 17].
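The formula for this measure is missing from this copy. Assuming the usual definition of the coefficient of determination, R^{2} = 1 - rss/tss, which matches the rss and tss defined above, it can be computed as in this sketch (the function name `r_squared` is hypothetical).

```python
def r_squared(observed, fitted):
    """Coefficient of determination, assumed here to be 1 - rss / tss."""
    mean_y = sum(observed) / len(observed)
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, fitted))
    tss = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - rss / tss
```

Because tss is dominated by the large difference between baseline and plateau fluorescence, even a visibly poor fit in the exponential region changes rss/tss only slightly, which is consistent with the low sensitivity of R^{2} noted in the Discussion.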
Deriving essential PCR parameters from the five-parameter models
with F = raw fluorescence at cycle x, and cpD2 = cycle number at second derivative maximum of the curve.
with F(cpD2) = raw fluorescence at second derivative maximum cycle number, Eff_{cpD2} = efficiency at second derivative maximum as calculated by Equation 8 and cpD2 = second derivative maximum cycle number.
Estimates for the PCR efficiency, the maxima of the first and the second derivatives and the initial template fluorescence based on the five-parameter sigmoidal fits are derived within the qpcR package and were used for calculations of four independent dilution datasets.
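Equations 8 and 9 are not reproduced in this copy. The sketch below rests on two assumptions that should be checked against the original equations: the efficiency at a cycle is taken as the ratio of the fitted fluorescence at that cycle to the fitted fluorescence one cycle earlier, and the initial template fluorescence is taken as F(cpD2) divided by Eff_{cpD2} raised to the power cpD2. The second derivative maximum is located numerically on a grid. All function names are hypothetical and do not belong to the qpcR package.

```python
import math


def l5(x, b, c, d, e, f):
    """Five-parameter logistic curve in the assumed drc-style parametrization."""
    return c + (d - c) / (1.0 + math.exp(b * (x - e))) ** f


def second_derivative_max(curve, lo, hi, step=0.01):
    """Grid search for the cycle where the finite-difference second derivative peaks."""
    h = step
    best_x, best_d2 = lo, float("-inf")
    steps = int(round((hi - lo) / step))
    for i in range(steps + 1):
        x = lo + i * step
        d2 = (curve(x + h) - 2.0 * curve(x) + curve(x - h)) / (h * h)
        if d2 > best_d2:
            best_x, best_d2 = x, d2
    return best_x


def efficiency_at(curve, cycle):
    """Assumed efficiency: fluorescence at `cycle` over fluorescence one cycle before."""
    return curve(cycle) / curve(cycle - 1.0)


def initial_fluorescence(curve, cycle):
    """Assumed F0: back-extrapolation of F(cycle) with constant efficiency."""
    return curve(cycle) / efficiency_at(curve, cycle) ** cycle
```

A typical call would pass a fitted curve such as `lambda x: l5(x, b, c, d, e, f)` with the estimated parameters, compute `cpD2 = second_derivative_max(curve, 1, 40)` and then evaluate `efficiency_at(curve, cpD2)` and `initial_fluorescence(curve, cpD2)`.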
Comparison of the new model to previously established models
with res (residual) = actual value – predicted value (fit) and n = number of points. One can compare different models by absolute comparison of the RMSE values for the different models: the smaller the RMSE the better the model fit.
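The RMSE formula is missing from this copy. Assuming the usual definition, the square root of the mean squared residual with res and n as defined above, a minimal sketch is shown below (the function name `rmse` is hypothetical).

```python
import math


def rmse(observed, fitted):
    """Root-mean-squared error: sqrt(sum(res^2) / n), with res = observed - fitted."""
    residuals = [y - y_hat for y, y_hat in zip(observed, fitted)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))
```

To compare models within the exponential region only, the same function can be applied to the subset of cycles identified as exponential, once for each fitted model.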
Declarations
Acknowledgements
We thank H. Capallo-Obermann and M. Behnen for fruitful discussion.
This work was supported by grant Sp721/1-1 of the German Research Foundation (DFG) to ANS and CF.
Authors’ Affiliations
References
- Bustin SA: Absolute quantification of mRNA using real-time reverse transcription polymerase chain reaction assays. J Mol Endocrinol 2000, 25: 169–93. 10.1677/jme.0.0250169
- Ginzinger DG: Gene quantification using real-time quantitative PCR: an emerging technology hits the mainstream. Exp Hematol 2002, 30: 503–12. 10.1016/S0301-472X(02)00806-8
- Klein D: Quantification using real-time PCR technology: applications and limitations. Trends Mol Med 2002, 8: 257–60. 10.1016/S1471-4914(02)02355-9
- Livak KJ, Schmittgen TD: Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods 2001, 25: 402–8. 10.1006/meth.2001.1262
- Pfaffl MW, Lange IG, Daxenberger A, Meyer HH: Tissue-specific expression pattern of estrogen receptors (ER): quantification of ER alpha and ER beta mRNA with real-time RT-PCR. Apmis 2001, 109: 345–55. 10.1034/j.1600-0463.2001.090503.x
- Pfaffl MW: A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res 2001, 29: e45. 10.1093/nar/29.9.e45
- Peirson SN, Butler JN, Foster RG: Experimental validation of novel and conventional approaches to quantitative real-time PCR data analysis. Nucleic Acids Res 2003, 31: e73. 10.1093/nar/gng073
- Ramakers C, Ruijter JM, Deprez RH, Moorman AF: Assumption-free analysis of quantitative real-time polymerase chain reaction (PCR) data. Neurosci Lett 2003, 339: 62–6. 10.1016/S0304-3940(02)01423-4
- Tichopad A, Dilger M, Schwarz G, Pfaffl MW: Standardized determination of real-time PCR efficiency from a single reaction set-up. Nucleic Acids Res 2003, 31: e122. 10.1093/nar/gng122
- Liu W, Saint DA: Validation of a quantitative method for real time PCR kinetics. Biochem Biophys Res Commun 2002, 294: 347–53. 10.1016/S0006-291X(02)00478-3
- Zhao S, Fernald RD: Comprehensive algorithm for quantitative real-time polymerase chain reaction. J Comput Biol 2005, 12: 1047–64. 10.1089/cmb.2005.12.1047
- Graaf PH, Schoemaker RC: Analysis of asymmetry of agonist concentration-effect curves. J Pharmacol Toxicol Methods 1999, 41: 107–15. 10.1016/S1056-8719(99)00026-X
- Gottschalk PG, Dunn JR: The five-parameter logistic: a characterization and comparison with the four-parameter logistic. Anal Biochem 2005, 343: 54–65.
- Ritz C, Streibig JC: Bioassay analysis using R. J Stat Soft 2005, 12: 1–22.
- Finney DJ: Bioassay and the Practise of Statistical Inference. Int Statist Rev 1979, 47: 1–12.
- Burnham KP, Anderson DR: Model selection and inference: a practical information-theoretic approach. Springer Verlag New York, USA; 2002.
- Rutledge RG: Sigmoidal curve-fitting redefines quantitative real-time PCR with the prospective of developing automated high-throughput applications. Nucleic Acids Res 2004, 32: e178. 10.1093/nar/gnh177
- Bates DM, Watts DG: Nonlinear regression analysis and its applications. John Wiley & Sons Hoboken, NJ, USA; 1988.
- Environment Canada: Guidance Document on Statistical Methods for Environmental Toxicity Tests. 2007.
- Karlen Y, McNair A, Perseguers S, Mazza C, Mermod N: Statistical Significance of quantitative PCR. BMC Bioinformatics 2007, 8: 131. 10.1186/1471-2105-8-131
- Skern R, Frost P, Nilsen F: Relative transcript quantification by quantitative PCR: roughly right or precisely wrong? BMC Mol Biol 2005, 6: 10. 10.1186/1471-2199-6-10
- The qpcR homepage [http://www.dr-spiess.de/qpcR.html]
- The R project homepage [http://www.r-project.org]
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.