Simultaneous fitting of real-time PCR data with efficiency of amplification modeled as Gaussian function of target fluorescence

Background In real-time PCR, it is necessary to consider the efficiency of amplification (EA) of amplicons in order to determine initial target levels properly. EAs can be deduced from standard curves, but these involve extra effort and cost and may yield invalid EAs. Alternatively, EA can be extracted from individual fluorescence curves. Unfortunately, this is not reliable enough. Results Here we introduce simultaneous non-linear fitting to determine – without standard curves – an optimal common EA for all samples of a group. In order to adjust EA as a function of target fluorescence, and still to describe fluorescence as a function of cycle number, we use an iterative algorithm that increases fluorescence cycle by cycle and thus simulates the PCR process. A Gauss peak function is used to model the decrease of EA with increasing amplicon accumulation. Our approach was validated experimentally with hydrolysis probe or SYBR green detection with dilution series of 5 different targets. It performed distinctly better in terms of accuracy than standard curve, DART-PCR, and LinRegPCR approaches. Based on reliable EAs, it was possible to detect that for some amplicons, extraordinary fluorescence (EA > 2.00) was generated with locked nucleic acid hydrolysis probes, but not with SYBR green. Conclusion In comparison to previously reported approaches that are based on the separate analysis of each curve and on modelling EA as a function of cycle number, our approach yields more accurate and precise estimates of relative initial target levels.


Background
In real-time PCR, fluorescence is recorded at each cycle to monitor the generation of product [1]. Typically, after several cycles with no or minor changes in background fluo-rescence, there is a short phase with vigorous exponential increase of fluorescence, which then gradually slows down to a plateau phase. In conventional data analysis, for each fluorescence curve a crossing point (Cp) alias threshold cycle (Ct) is determined from the visible exponential amplification phase using either the fit point method or the second-derivative method [2]. It is clear that for proper calculation of initial target levels, differences in efficiency of amplification (EA) must be taken into account [3]. Even small EA differences amplify to large apparent differences in mRNA levels [4]. The above methods require the set-up of standard curves from which EA is deduced. The disadvantages of standard curves are (i) the extra effort and cost to set up additional samples e.g. by serial dilution, and (ii) non-matching EAs if inhibitors are present and serially diluted [4].
The alternative to using standard curves is to determine EA directly from the samples [5]. The initial exponential amplification can be described in terms of fluorescence (based on the assumption that accumulation of fluorescence is proportional to accumulation of amplification product) by the following equation: See Table 1 for definition of parameters. Note that in this report, EA has limits of 1 (= no amplification) and 2 (= ideal amplification, i.e. complete doubling of target with each cycle); all references to papers where EA runs between 0 and 1 have been transformed by adding 1. Ideally, one would like to determine the individual EA of each sample to determine accurate F 0 values; F 0 is directly proportional to the sample target cDNA amount. However, for each trace of fluorescence there are only very few (around 5 to 7) data points with virtually constant EA which can be used for an analysis according to equation 1. In earlier cycles, there is only background fluorescence (i.e. amplification product can not be detected for many cycles), and in later cycles the EA declines due to product accumulation [6]. Thus, very few qualified data points combined with considerable measurement error makes direct exponential extrapolation inaccurate. One strategy to improve parameter estimation is to include later points of the fluorescence curve and to adjust EA as a function of cycle number [7][8][9]. However, we have observed that these approaches can not properly model target fluorescence in detail.
Very recently, Alvarez et al. have introduced into real-time PCR data analysis the useful notion to model the decrease of EA not as a function of cycle number, but as a function of fluorescence, the indicator of product accumulation [10]. This insightful concept is more difficult to apply to data analysis though, since it does not allow direct fitting of flourescence as a simple function of cycle number. Alvarez et al. calculate, as F i+1 /F i ratio, amplification efficiencies for each cycle, then fit 2 parameters of a sigmoidal function to these EA values as a function of fluorescence, and finally estimate, with both parameters fixed, F 0 by iterative discrete fitting. The downsides of this approach are large errors in the F i+1 /F i ratios, non-linear regression with fluorescence as the independent variable (which violates the idea of x having a small or no error), fluorescence data (y axis: F i+1 /F i ratio; x axis: F i ) on both axes, and fitting twice to the same set of information. Further limitations are indicated in the Discussion.
Based on the innovative concept of modelling EA as a function of amplicon fluorescence, it was our aim here to overcome the defects of the approach of Alvarez et al.. As the key improvement, we find that iterative simulation of the PCR process with EA modelled as a Gaussian peak function of amplicon fluorescence yields precise and correct initial EA values, both with hydrolysis probe and SYBR green detection. Our approach includes, for the first time, simultaneous non-linear fitting to determine EA as a common parameter for all samples of a group. Compared to established methods of real-time PCR data analysis, our approach results in more accurate estimates of relative cDNA levels.

Modelling EA as a function of target fluorescence
Initially, we tried to fit equation 1 to a limited number (< 6) of data points from the very early visible exponential phase, i.e. the first points above background fluorescence.
In this phase the EA should still be, as a good approximation, constant and equal to the initial EA. However, this approach was relatively unreliable, even with simultaneous fitting of multiple curves, since there is considerable (random) experimental error (cf. background fluorescence differences in Fig. 1B) with every fluorescence reading, yet the last point with the highest fluorescence is always fitted best, even when various weighing options were applied. It is thus necessary to include data points from later cycles in order to mitigate random fluorescence errors. We tested previously reported sigmoid [7,8], logistic [9], and other (e.g. asymmetric sigmoid or reverse asymmetric sigmoid) transition functions in order to model target fluorescence as a function of cycle number. All of these, however, showed systematic deviations between calculated and observed fluorescence particularly in the early exponential phase (not shown).
With the aim of modelling EA as a function of fluorescence [10], we inspected several of our own experimental  (Fig. 2). As an important difference, the logistic function always yields higher EA 0 values than the Gauss function (see below). However, it is not possible to determine which function is more appropriate from this plot, since the critical region of low F i is unaccessible, because of very large errors.
In order to describe experimental fluorescence as a function of cycle number, we use an iterative approach that yields all 3 parameters by a single non-linear fitting procedure. Depending on F 0 , the virtual initial target fluorescence, EA 0 , the initial efficiency of amplification, and k, the fluorescence is increased cycle by cycle -with EA adjusted as a function of target fluorescence -up to cycle x. Note that e.g. function EA = 1 + (EA 0 -1)/exp [F i 2 /k] is valid for the plot of F i /F i-1 versus F i . However, in the PCR simulation, it is necessary to calculate -in the other direction -F i+1 from F i ; since EA is not a linear function of F i , the available ratio F i /F i-1 can not be used. Thus, combining EA = F i+1 /F i and EA = 1 + (EA 0 -1)/exp [F i+1 2 /k] gives F i * (EA 0 -1)/exp [F i+1 2 /k] + F i -F i+1 = 0. We use the algorithm of Newton [11] to solve this equation by iteration. Note that Alvarez et al. have used a F i+1 /F i plot to avoid the need to calculate F i+1 from F i by Newton iteration.

Selection of data points
Like previously reported approaches, neither Gauss nor logistic function can reliably model the plateau phase of the PCR fluorescence curve (Fig. 2). We therefore exclude all data points beyond the minimum of the second derivative (approximated by a 5 point peak; see Fig. 1 and Methods for details) from analysis. Also with the fluorescence difference (dF) data, we define the background interval that is modeled by a straight line (Fig. 1B).

Simultaneous fitting
The EA 0 values that result from fitting to individual fluorescence curves are still uncertain to an extent that precludes direct use (see below). We thus use simultaneous fitting in the final stage of data analysis to determine an optimal common EA 0 . For this, all associated curves (up to n = 16), with the same points selected as previously for individual fitting, are first pooled into a group by transformation of the cycle numbers (see Methods). Note that the protein of interest and the standard used for normaliza- Figure 1 Selection of points. (A) Fluorescence difference (dF) as a function of cycle number with data from a hydrolysis probe assay run on a LightCycler™. Filled circles were used for fitting with function EAvPeak; fitting results are displayed as line graph. Circles marked by an asterisk indicate the 5 point peak. (B) As part A, but at higher magnification on the y axis. Points for definition of background fluorescence are selected as follows: each position i, going down cycle by cycle from the 5 point peak, is checked until the dF values of at least 3 points in the interval i-8 to i-1 surpass the reference level, which is the average dF of points i, i+1, and i+2. The upper limit of the background definition interval, denoted by a filled arrow, corresponds to i; the lower limit, denoted by an open arrow, is at i-8. Finally, slope and offset of the background line is determined by linear regression on the raw fluorescence data at i-8 to i. (C) Corresponding raw fluorescence data. Fitting results from function EAv are displayed as line graph. Open circles represent points that were excluded from fitting. Asterisks and arrows as above.

Selection of points
tion, e.g. beta-actin, constitute separate groups, since different primers (and probes) are used. Samples with markedly different individual EA 0 s should be gathered into separate groups. With the iterative algorithm described above, a single common EA 0 is fitted to all curves of a group; at the same time, individual F 0 and k parameters are fitted for each curve. Based on the shared EA 0 , the final F 0 values, which are proportional to initial target amount, can be directly used to calculate relative expression levels; for this, normalized ratios, calculated from F 0 values of the protein of interest and of the corresponding standard protein, are compared.

Validation
Locked nucleic acid (LNA) hydrolysis probe and SYBR green assays with 5 different targets as shown in Table 2 were performed to resolve whether to use the Gauss function or the logistic function and to validate our approach. For each target, 3 identical dilution series of 5 samples each were processed in parallel; all 15 samples were analyzed as a group. Note that these assays are to some extent imperfect, since they were pipetted and operated by 4 different persons with ordinary skills. In these comparisons, goodness of fit, indicated by chi squared, was not always better with the Gauss function; nevertheless, on average goodness of fit was better i.e. chi squared was smaller by a factor of 1.23 (geometric mean; data not shown). Decisively, the Gauss function performed better than the logis-Efficiency of amplification can be described as a function of fluorescence with a Gauss peak or a logistic peak model Figure 2 Efficiency of amplification can be described as a function of fluorescence with a Gauss peak or a logistic peak model. For each amplicon described in Table 2 (same order, left to right corresponds to top to bottom), the third fluorescence curve (arbitrarily chosen; numbering refers to the Excel raw data file [see Additional file 5]) was analyzed. After subtraction of linear background, EA was calculated as F i /F i-1 ratio and plotted against fluorescence (F i ) as shown. To avoid confusion, points with F i < 0.1 are not displayed; most of these, because of large errors in EA, lie outside the y axis range. The line graphs were drawn with the Gauss peak function (upper row) or the logistic peak function (lower row). Note that function parameters were not fitted to the points shown, but determined by our stepwise PCR simulation approach based on the raw fluorescence versus cycle number data. Open circles represent data that was not used for fitting.
tic function according to 2 criteria: i) the sum of accuracy errors (error = absolute value of accuracy factor -1; see Table 2) is smaller, i.e. 0.26 (Gauss) vs. 0.44 (logistic). ii) With SYBR green detection of the human GAPDH amplicon, the logistic function yielded a concerted EA 0 of 2.05; this is significantly higher -the standard deviation from fitting of the concerted EA 0 s was ≤ 0.01 for all 5 targets (data not shown) -than the theoretical upper limit of 2; by contrast, the Gauss function produced an EA of 1.99. Thus, the Gauss function was used for all analyses below. Table 2 indicate good precision (median of relative errors: 5%) and accuracy. In order to compare our approach with the standard curve approach [3], each dilution series was analyzed as a separate subgroup as shown in Table 3. With our approach, precision was better by a factor of 1.5 (median of relative errors: 8% vs. 12%); more importantly, accuracy was better by a factor of 2.1 (median of relative errors: 13% vs. 27.5%). Note that our method yields EA 0 s both higher and lower than the corresponding EAs of the standard curve approach. We suppose that this is caused by the LightCycler software for Cp estimation, which can not properly correct a drifting baseline, since the best available baseline adjustment ("arithmetic") simply subtracts a constant offset from all data points. Table 4 shows results from analysis of our data sets with 2 of the tools that are available for data analysis without standard curves. Estimates from LinRegPCR analysis [4] were much less precise (38.5%) and accurate (58.5%). In comparison to the DART-PCR approach [12], which uses the average of individual EAs to calculate F 0 values, precision was virtually identical (8% vs. 7%); however, accuracy was in favour of our approach by a factor of 1.5 (13% vs. 19.5%). Table 5 suggests that our approach is better than DART-PCR because individual EAs are determined more precisely; SEMs on average (geometric mean) were smaller by a factor of 2.0. Surprisingly, with some hydrolysis probe assays we obtained EA0s definitely higher than 2.00; concurrently, the measured dilution factors of corresponding dilution series were strikingly wrong. With the same primers, but SYBR green instead of hydrolysis probe detection, EA0s ≤ 2.00 were determined, and measured matched intended dilution factors. Thus, with LNA hydrolysis probes (Roche Universal Probe Library), efficiency of fluorescence generation can be higher than efficiency of amplification. Extra fluorescence is not caused by the probe alone, since for one amplicon probe #89 gave a higher EA0 than the SYBR green assay (2.11 vs. 1.86; ≥ 3 samples per group), but for  Figure 3, to explain additional exponential probe hydrolysis. We suppose that, given matching partial binding sites as indicated, the tightly-binding LNA probe may guide the polymerase to switch to a second antisense strand during synthesis of sense strand. This low-efficiency template-switching [13,14] generates an extended amplicon with two perfect probe binding sites instead of one. The extended amplicon can be extended further by the same mechanism. In support of the model, when CCCA (antisense strand, close to the 5' end; read from right to left) was replaced by GGTG, EA0 dropped from 2.27 to 2.08 (3 samples per group). Residual fluorescence growth may be caused analogously by the sequence TGAG (marked by half dashes in the figure) in reverse strand synthesis.

Discussion
In real-time PCR, without a doubt, it would be optimal to determine an individual EA for each sample. However, it does not seem possible with present experimental technology to determine individual EAs according to equation 1 reliably: very few qualified data points (i.e. only the first 5-7 points that rise above background fluorescence with virtually constant EA) combined with considerable measurement error makes direct exponential extrapolation inaccurate. One strategy to improve parameter estimation is to include later points of the fluorescence curve. However, we find that sigmoid [7,8], logistic [9], or other functions can not properly model target fluorescence in detail. Very recently, Alvarez et al. have introduced a fundamentally different approach [10]. It appreciates that the decrease of EA is caused by product accumulation [6,15]. This concept allows to embrace even more points for analysis (i.e. up to the minimum of the second derivative of fluorescence) than other methods, which use the maximum of the second derivative as an upper limit [9] or the center of selection [12]. Unfortunately, the particular algorithm of Alvarez et al., which is based on a sigmoidal function, suffers from a number of disadvantages (see Introduction and below). In the present report we use iterative non-linear fitting with a Gauss function to describe EA as a function of fluorescence. Both approaches use the same number of parameters for fitting, i.e. 2 parameters plus the actual result, F 0 . However, our approach has the following advantages over the approach of Alvarez et al.: i) Parameters EA 0 , F 0 , and k are fitted directly to the fluorescence vs. cycle number data without any data transformation except for inevitable subtraction of background; this avoids additional errors (as in the F i+1 /F i ratios) and In an extensive comparison, the approach of Alvarez et al. displayed the lowest quantification error of all methods of individual curve analysis ( Fig. 3B and Table 2 in the cited work); similar results were only obtained with EAs esti-mated from standard curves based on dilution series [3]. We have not applied the approach of Alvarez et al. to our data, since, as explained above, the approach is based in parts on unfavorable design. However, our comparison with the widely-used standard curve approach suggests that our approach gives markedly better results (Table 3). Also, we find that our approach is much better in terms of precision and accuracy than the LinRegPCR approach (Table 4). With the DART-PCR approach, which uses the average of individual EAs to calculate F 0 values, precision was virtually identical; however, accuracy was distinctly better with our approach. We suppose that this is caused predominantly by much more (factor 2.0) precise individual EAs (Table 5). Moreover, with DART-PCR, the mean EAs of 2 amplicons were markedly smaller than the corresponding EA 0 s from our approach; the other 3 were not significantly larger. This is not surprising, since DART-PCR assumes a constant EA which is determined around the second derivative maximum and thereby may underestimate the initial EA.
In spite of these improvements, the F 0 values that result from fitting to individual fluorescence curves are still uncertain to an extent that precludes direct use (see Table  5, column EA 0 individual). The individual EAs are useful to identify erratic samples and to judge the quality of primers and probes, but, as was observed previously, they introduce additional error and thus increase data variance [12]. Indeed, in the afore-mentioned comparison of available individual curve analysis methods, accuracy and precision in quantification of experimental dilution series was poor [10]; similarly, with our data sets, the LinReg-PCR software yielded the least accurate results ( Table 4). Given that determination of F 0 values from individual EAs is futile because of experimental limitations, then the next best thing is to analyze related samples as a group with a concerted EA. Towards this end, Peirson et al. have simply calculated the arithmetic mean of individual EAs [12]. In the present report we introduce simultaneous non-linear regression to determine an optimal EA for all samples of a group. Note that with our large data sets, EA 0 determined by simultaneous fitting was not dramatically different from the arithmetic mean (compare Arithmetic mean values, Table 5, with EA 0 group values, Proposed mechanism of extraordinary fluorescence growth with LNA hydrolysis probe detection Figure 3 Proposed mechanism of extraordinary fluorescence growth with LNA hydrolysis probe detection. The example shown is the GAPDH amplicon (see Materials and Methods) which e.g. had an EA 0 of 2.33 with hydrolysis probe and 2.00 with SYBR green detection. The diagram shows forward primer, LNA hydrolysis probe with 5' fluorophore (filled circle) and 3' quencher (open circle), and the entire amplicon antisense strand, with reverse primer sequence underlined. A 3' phosphate (P) prevents elongation of the probe. fitting (EA 0 group = 2.01) and arithmetic mean (2.08) may yield markedly disparate results. We suggest that simultaneous fitting provides the best possible EA 0 that optimally unifies all related fluorescence curves; simultaneous fitting thus contributes to the better performance of our approach. Empirically, for a reliable EA 0 we would recommend to employ at least 3 samples per group.
Making good use of accurate EA 0 s, our study has revealed that fluorescence generation with some LNA hydrolysis probe assays may overestimate DNA amplification and hence cause incorrect results. To explain this, we assume low-efficiency polymerase template switching that leads to progressive amplicon elongation including additional probe binding sites (Fig. 3). It would thus seem advisable to verify each new LNA hydrolysis probe amplicon with SYBR green detection to avoid spurious fluorescence generation.

Conclusion
In the present report we introduce a new approach to analyze real-time PCR fluorescence curves without standard curves. Our strategy is based on the useful concept of Alvarez et al. to model EA as a function of amplicon fluorescence. As the key improvement, we find that a Gaussian model overcomes the defects of the original sigmoidal model. Iterative simulation of the PCR process up to the minimum of the second derivative of fluorescence yields precise and meaningful initial amplification efficiency values. In the final stage of analysis, a common EA 0 is fitted simultaneously to all curves of a group of related samples. In comparison to previously reported approaches that are based on the separate analysis of each curve and on modelling EA as a function of cycle number, our approach yields more accurate and precise estimates of relative initial target levels.

Isolation of total RNA and reverse transcription
Total RNA was isolated by the method of Chomczynski and Sacchi [16] from frozen (-80°C) tissues. Reverse transcription was performed as detailed previously [17] with the following modifications: i) RQ1-DNase (Promega, Mannheim, Germany) was used at 1 U/μg total RNA; ii) Random nonamers were used for priming; iii) cDNA synthesis was performed at 42°C.

PCR
A LightCycler 1.0 apparatus with system 2.0 software (Roche, Mannheim, Germany) was used for real-time PCR. Product accumulation was detected with SYBR green I or with locked nucleic acid hydrolysis probes (TaqMan principle [18]) from the Universal ProbeLibrary (Roche). Primers and probes (see Table 6  Selection of points First, each fluorescence curve is analyzed separately. The change of fluorescence (dF) as a function of cycle number is used to define an upper limit of useful points and to select points for linear background definition. With the dF data (calculated as fluorescence at cycle i minus fluorescence at cycle i-1), a 5 point peak is identified as the highest sum of dF values of a 5 point sliding window (Fig. 1A). The background fluorescence is modeled individually for each curve by a straight line; this line is defined by a 9 point interval as explained in the legend to Figure 1B. Slope and offset of the background line are determined by linear regression of the corresponding raw fluorescence data. Points preceding the 9 point interval and following the 5 point peak are excluded from further analysis (Fig.  1C). The parameters of a peak function (EAvPeak) are fitted to all remaining points to estimate starting parameters for function EAv. The EAvPeak function basically works like the EAv function described below, but it yields the fluorescence difference between the last and the second-tolast cylce as y output. Note that the number of points used for definition of peak and background were chosen empirically; higher numbers might work as well.
Fitting to a single fluorescence curve with variable efficiency of amplification For all remaining points, linear fluorescence background is subtracted from raw fluorescence. Then, the parameters of function EAv are determined by non-linear regression.
In essence, our Gauss model is based on the following equation (exp indicates e raised to the power of its argument in square brackets): See Table 7 for definition of parameters. Note, however, that F i+1 is calculated from F i by means of Newton iteration [11]. For details, see the function listings provided online as additional files (see above). Amplification is repeated until the cycle number reaches the x input; the final fluorescence is yielded as y output. In other words, each call of function EAv simulates a PCR reaction up to cycle x, starting with F i = F 0 at cycle 1. Apart from linear background, which is added for visual display of final results, this generates a step-wise increase in fluorescence. Individual fitted EA 0 values are displayed to the user for comparison.

Simultaneous fitting
In the final stage, a simultaneous fit is made with all curves of a group, with the same points selected as previously for function EAv. Function M16EAv uses a single global EA 0 parameter for all fluorescence curves; for each curve, parameters F 0 and k are fitted individually. The same algorithm as in function EAv is used; however, data sets are first joined by transformation of the cycle numbers: curve 1 uses cycle numbers 1 to 50, curve 2 uses 51 to 100 and so forth. Function M16EAv recognizes the input cycle number x and picks the F 0 and k parameters accordingly. Individual F 0 values and the common EA 0 are displayed to the user as the final result.