Application of the common base method to regression and analysis of covariance (ANCOVA) in qPCR experiments and subsequent relative expression calculation

Background: Quantitative polymerase chain reaction (qPCR) is the technique of choice for quantifying gene expression. While the technique itself is well established, approaches for the analysis of qPCR data continue to improve.

Results: Here we expand on the common base method to develop procedures for testing linear relationships between gene expression and either a measured dependent variable, an independent variable, or the expression of another gene. We further develop functions relating these variables to a relative expression value and derive calculations for the associated confidence intervals.

Conclusions: Traditional qPCR analysis methods typically rely on paired designs. The common base method does not require such pairing of samples. It is therefore applicable to other designs within the general linear model, such as linear regression and analysis of covariance. The methodology presented here is also simple enough to be performed using basic spreadsheet software.

A central goal of gene expression studies is to elucidate the set of genes expressed, determine exactly how expression changes in response to external and internal signals, and ultimately link this response to phenotypic changes. Quantification of gene expression can be performed in a variety of ways via different methodologies [6], but the most common is to use differences in mRNA concentrations to quantify what is called relative expression, which utilizes the polymerase chain reaction (PCR) to make detection of differences in initial RNA concentration possible [7]. Quantitative PCR (qPCR) has become the gold standard for such quantification and the technique of choice for diverse research questions [8][9][10].
The growth of amplicons within a qPCR reaction is expected to follow a logistic growth model in which the increase in amplicons is exponential up until the point where reagents in the reaction begin to become limiting [8]. Because of this, Livak and Schmittgen [11] use the number 2 in their calculation of relative expression (equation 1) to indicate the potential for a doubling of the amplicon number each PCR cycle:

R = 2^(−ΔΔC_q) = 2^(−[(C_q,GOI − C_q,REF)_Treatment A − (C_q,GOI − C_q,REF)_Treatment B])   (equation 1)

This equation couples together the C_q values from Treatment A for both a gene of interest (GOI) and a reference gene (REF) and does the same for Treatment B. The difference between the C_q values for GOI and REF is referred to as a ΔC_q value, and the difference between two ΔC_q values as a ΔΔC_q value [11].
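As a concrete illustration of equation 1, a minimal Python sketch (the function name and example values are ours, not from the source):

```python
def livak_relative_expression(cq_goi_a, cq_ref_a, cq_goi_b, cq_ref_b):
    """2^-ddCq relative expression (Livak and Schmittgen), which
    assumes a perfect doubling of amplicons every PCR cycle."""
    dcq_a = cq_goi_a - cq_ref_a   # dCq for Treatment A
    dcq_b = cq_goi_b - cq_ref_b   # dCq for Treatment B
    ddcq = dcq_a - dcq_b          # ddCq
    return 2.0 ** -ddcq

# A GOI crossing threshold two cycles sooner (relative to REF) in
# Treatment A than in Treatment B is ~4-fold more highly expressed.
fold_change = livak_relative_expression(20.0, 18.0, 24.0, 20.0)
```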
From a theoretical perspective amplicons are expected to double each PCR cycle, yet many have shown that for various reasons this does not happen [12][13][14], and neglecting this fact can have measurable impacts on gene expression calculations [15,16]. Others [15,17] have developed methods for determining relative expression by incorporating a measure of the growth rate of a population of amplicons, called an efficiency value (E).
Though not readily apparent in this formulation, the Pfaffl method equation (equation 2 [17]) also works with both ΔC_q and ΔΔC_q values (see [15] for mathematical exposition):

R = (E_GOI)^(ΔC_q,GOI(control − sample)) / (E_REF)^(ΔC_q,REF(control − sample))   (equation 2)
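A matching sketch of the efficiency-corrected ratio in equation 2 (again with hypothetical names and values); note that with both efficiencies at a perfect 2 it collapses to the Livak calculation:

```python
def pfaffl_relative_expression(e_goi, dcq_goi, e_ref, dcq_ref):
    """Pfaffl-style ratio, where dcq_goi and dcq_ref are the
    (control - sample) Cq differences for GOI and REF."""
    return (e_goi ** dcq_goi) / (e_ref ** dcq_ref)

# Perfect doubling reproduces the 2^-ddCq result ...
assert pfaffl_relative_expression(2.0, 3.0, 2.0, 1.0) == 4.0
# ... while a sub-2 efficiency shrinks the estimated ratio.
assert pfaffl_relative_expression(1.9, 3.0, 1.9, 1.0) < 4.0
```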
The technique of qPCR occupies a central position in the workflow, preceded by the design and execution of the main experiment and extraction of nucleic acid. qPCR is then followed by the analysis of data and finally the post-hoc calculation of a relative expression value (Fig. 1). Though these steps are separated by qPCR, they are in fact linked, in that experimental design dictates how gene expression should be analyzed and relative expression determined. It is worth noting that the commonly used models, specifically the 2^(−ΔΔC_q) method [11] (2001; over 106,500 citations as of March 2020) and the Pfaffl method [17] (2001; over 26,000 citations as of March 2020), were developed to analyze paired experimental designs. In such cases, the experimental design is paired in nature, and so then is the analysis. Paired models have their place and have proved very useful in determining expression of a gene 1) before and after treatment or 2) between two tissue types within the same organism. However, many types of experimental designs exist beyond paired designs that can be used to address a multitude of experimental questions. Such questions suggest the need for the development of alternative approaches.
The common base method for the analysis of qPCR data [18] has inherent advantages over traditional methodologies and lends itself to use with other types of analyses within the general linear model (Fig. 2). Here we further develop statistical methodologies for unpaired models with a focus on linear relationships, specifically regression and analysis of covariance (ANCOVA). As with the common base method [18], we work with efficiency-weighted ΔC_q values and develop relative expression calculations with associated confidence intervals post hoc.

The Common Base method
The common base method calculations are kept in the log scale for as long as possible. Remaining in the log scale allows for the use of the more familiar arithmetic mean instead of the geometric mean and permits the use of parametric statistics [18]. Any choice of base for a logarithm may be made as long as it is used consistently. We have chosen to use base-10 logarithms throughout this work.
The common base method uses C_q and efficiency (E) values to calculate an efficiency-weighted value, C_q^(w). Let r denote a particular biological replicate, t denote a sample type, and g denote a particular gene (equation 3):

C_q,r,t,g^(w) = log10(E_r,t,g) · C_q,r,t,g   (equation 3)
The C_q,r,t,g^(w) value is then normalized using a reference gene or genes, where GOI is the gene of interest and REF is a reference gene (equation 4 [18]):

ΔC_q,r,t^(w) = C_q,r,t,GOI^(w) − C_q,r,t,REF^(w)   (equation 4)
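Equations 3 and 4 translate directly into code. A minimal sketch (our function names; the GOI-minus-REF orientation assumed here matches the R = 10^(−ΔΔC_q^(w)) convention used later in the text):

```python
import math

def weighted_cq(cq, efficiency):
    """Equation 3: efficiency-weighted Cq value, Cq * log10(E)."""
    return cq * math.log10(efficiency)

def delta_cq_w(cq_goi, e_goi, cq_ref, e_ref):
    """Equation 4: normalize the GOI against a reference gene."""
    return weighted_cq(cq_goi, e_goi) - weighted_cq(cq_ref, e_ref)

# With perfect doubling (E = 2) for both genes, a 2-cycle difference
# gives a dCq_w of 2 * log10(2), about 0.602.
example = delta_cq_w(20.0, 2.0, 18.0, 2.0)
```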
The advantage of such values is that each efficiency-weighted ΔC_q^(w) value can be treated separately in unpaired models that incorporate categorical and/or continuous variables. The major goal of our work here is to show that the common base method can be expanded to other statistical tools, including regression and analysis of covariance (ANCOVA). We will provide the mathematical approach for consideration of linear relationships where at least one of the variables is ΔC_q^(w), including calculation of ΔΔC_q^(w) values, relative expression ratios, and associated confidence intervals. We begin with regression and proceed into ANCOVA.

ΔC_q^(w) as the Dependent Variable

We begin with consideration of the case where the dependent variable (y) is ΔC_q^(w), while the independent variable is a non-gene-expression variable (x). For example, consider the concentration of a hypothetical hormone α1 in plant leaves and expression of gene G in these same leaves, using the ΔC_q^(w) of G. We may be interested in how these two variables are related. For each individual, we could measure the α1 concentration and quantify, through qPCR, an efficiency-weighted C_q of gene G as ΔC_q^(w). Suppose that all necessary assumptions for a regression (linearity, homoscedasticity, independence, and normality) have been met by our data set. Note that the assumptions of regression analysis are covered in any introductory statistics text.
Once the regression analysis has been performed, it is possible to calculate relative expression ratios as a function of hormone concentration along with associated confidence intervals. As discussed earlier, in unpaired models ΔΔC_q^(w) values are used to calculate relative expression ratios (R) after statistical analyses have occurred (Fig. 2).
Suppose the line of best fit is of the form

ŷ = mx + b   (equation 5)

where ŷ is used to denote the predicted value of ΔC_q^(w) given a value of x based on the linear equation (Fig. 3a). We can then rework the linear equation into a form that will yield an equation whose input is the concentration of hormone α1 and whose output is a relative expression ratio R. We first must choose a fixed input concentration of hormone α1 to be a "baseline" level (x0) for comparison. For our example, let x0 be the mean α1 concentration found in the original experiment. Let

ŷ0 = mx0 + b   (equation 6)

be the output predicted from the x0 concentration of hormone α1. We will now subtract (equation 6) from (equation 5) to produce an equation that outputs predictions for ΔΔC_q^(w) values based on predicted ΔC_q^(w) values and the choice of baseline x0 (Fig. 3b). In other words,

ΔΔC_q^(w) = ŷ − ŷ0 = m(x − x0)   (equation 7)

where each ΔΔC_q^(w) uses the baseline concentration of hormone α1 and varies the chosen concentrations of hormone α1 within the range of values used in the experiment (Fig. 3b). By applying an exponential function to (equation 7), we arrive at an exponential equation for relative expression ratio using the baseline. As a formula,

R̂ = 10^(−ΔΔC_q^(w)) = 10^(−m(x − x0)) = 10^(m(x0 − x))   (equation 8)

In other words, from a plot of ΔC_q^(w) and x (Figure 4a, Table 1), we have an equation that takes as input a concentration x of hormone α1 and outputs a predicted R̂ that is relative to the baseline concentration of α1, x0 (Figure 4c, Table 1). Notice that using x = x0 as the input in (equation 8) predicts a relative expression ratio of 1, which is exactly as it should be. We can predict that a plant with a hormone concentration of 8.85 pg/mL would have an expression of gene G that is 27% lower (R̂ = 0.73) than that of plants with average hormone concentration. (Any values for the independent variable may be chosen to predict R̂ as long as they do not occur outside of the minimum and maximum values used in the study.) It is important to note that relative expression plots tend to be inverse versions of their corresponding ΔC_q^(w) plots.
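The pipeline from fitted slope to predicted ratio (equation 8) is short enough to sketch in Python; the least-squares fit is generic, and the example values use the slope m = −0.139 and baseline x0 = 9.85 reported in Table 1:

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope m and intercept b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return m, mean_y - m * mean_x

def relative_expression(x, x0, m):
    """Equation 8: predicted R relative to the baseline x0."""
    return 10.0 ** (m * (x0 - x))

# A plant at 8.85 pg/mL is predicted to express gene G at ~73% of the
# level seen at the average concentration (x0 = 9.85).
r_hat = relative_expression(8.85, 9.85, -0.139)
```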

Confidence interval calculations from regression
While functions describing the relationship between two variables have great value, they only represent point estimates of output values for each input. However, assuming that the statistical assumptions for a valid regression have been met, one can also produce confidence intervals to envelope the point estimates resulting from the regression formula, allowing for meaningful error bars to be placed around point estimates. We will demonstrate that in order to calculate confidence intervals for relative expression value estimates, we first need to calculate the confidence intervals for ΔΔC_q^(w). These confidence intervals are derived from the confidence interval around the regression slope m. Most statistical software tools (e.g., SPSS or Minitab), and even Excel, will compute the confidence interval for a regression slope as part of the standard regression output. This output is typically given as the low-end and high-end slope values of the 95% confidence interval in a form such as (L, U), though many tools allow for reporting of other confidence intervals. The formulas for L and U can be found in any introductory statistics textbook that covers inference related to linear regression.

Fig. 4 Results of regression analysis between concentration of hormone α1 and ΔC_q^(w) where the two variables are (a) highly correlated (r² = 0.962) and (b) correlated (r² = 0.709). Plot of predicted relative expression ratios (R̂) for (c) the regression in (a) with 95% confidence interval (CI) and for (d) the regression in (b) with 95% confidence interval (CI). (e) Plot of predicted relative expression ratios (R̂) based on a linear regression between concentration of hormone α1 and ΔC_q^(w) with 95% confidence interval (CI). Relative expression and CI in (c and d) are based on comparisons to the average concentration of hormone α1 measured, while (e) compares to the largest concentration of hormone α1 measured. Vertical dotted lines indicate x0.
We return to the setting where the concentration of hormone α1 (x) and the ΔC_q^(w) value (y) are linearly related and fit a linear formula as in (equation 5). Let x be an arbitrary input value in the range of data values collected in your study, and let x0 be the fixed baseline input value with associated linear output as in (equation 6). In our example, we fix x0 to be the mean value of x, but any fixed choice will work. Recall from (equation 7) that ΔΔC_q^(w) = m(x − x0). Thus, the only random element in the estimate of ΔΔC_q^(w) is the slope m, and so the uncertainty of ΔΔC_q^(w) is solely a function of the uncertainty around m.
Suppose that the confidence interval (CI) on the slope parameter m is (L, U). Then the confidence interval for ΔΔC_q^(w) is given by

(L(x − x0), U(x − x0)) or (U(x − x0), L(x − x0))   (equation 9)

depending upon whether (x − x0) is positive or negative for each x. In order to calculate the corresponding confidence interval for the predicted relative expression ratio R̂, we apply the exponential transformation to the interval calculated in (equation 9) (Fig. 4c) and mimic our end formula in (equation 8):

(10^(L(x0 − x)), 10^(U(x0 − x))) or (10^(U(x0 − x)), 10^(L(x0 − x)))   (equation 10)

depending upon whether (x0 − x) is positive or negative. (Notice the change in order of x and x0 made to match the order given in (equation 8).) From our example, the 95% confidence interval around our estimate of R̂ given a hormone concentration of 8.85 pg/mL is 0.69-0.76, indicating relative expression of 69-76% compared to that of individuals with average hormone α1 concentration.

Table 1 Hypothetical data used to generate Figure 4a where ΔC_q^(w) is the dependent variable. Calculation of predicted relative expression, R̂, follows 10^(m(x0 − x)), where m = −0.139, and these values are plotted in Figure 4c. x0 = 9.85 is the mean x. The 95% confidence interval for the slope m is (−0.162, −0.117).
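Equations 9 and 10 amount to scaling the slope interval and exponentiating; a sketch using the slope CI (−0.162, −0.117) from Table 1 (function names are ours):

```python
def ddcq_w_ci(x, x0, slope_ci):
    """Equation 9: CI for ddCq_w, reordered if (x - x0) is negative."""
    slope_lo, slope_hi = slope_ci
    a, b = slope_lo * (x - x0), slope_hi * (x - x0)
    return (a, b) if a <= b else (b, a)

def relative_expression_ci(x, x0, slope_ci):
    """Equation 10: CI for predicted R; 10**(-ddCq_w) flips the order."""
    lo, hi = ddcq_w_ci(x, x0, slope_ci)
    return 10.0 ** -hi, 10.0 ** -lo

# Reproduces the interval of roughly 0.69 to 0.76 reported in the text.
ci = relative_expression_ci(8.85, 9.85, (-0.162, -0.117))
```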
For any regression, r² is an indication of the overall quality of the equation of the best fit line. Lower r² values tend to increase the size of the confidence intervals around predicted relative expression ratios because as the r² value lowers, the margin of error around the predicted slope value increases (Fig. 4b, d; Table 2).

Table 2 Hypothetical data used to generate Figure 4b. Calculation of predicted relative expression, R̂, follows 10^(m(x0 − x)), where m = −0.139, and the values are plotted in Figure 4d. x0 = 9.85 is the mean x. The 95% confidence interval for the slope m is (−0.212, −0.066).

A Comment on Choosing the Baseline Value for the Independent Variable
Notice that the widths of our confidence intervals are functions of the distance between the input x and the baseline value x0 (equation 10). The uncertainty that leads to the error for the estimates is solely due to uncertainty in the slope m, which means that the choice of baseline value x0 does not alter the uncertainty. However, the choice of x0 does play a role in how that uncertainty is translated into a confidence interval around a given ΔΔC_q^(w). As such, choosing x0 to be the mean value of x will result in overall smaller and more symmetrically distributed error bars around estimates compared to choosing x0 to be one of the extreme values (minimum or maximum) (Fig. 4e; Table 3).
The selection of x 0 should always be influenced by the experimental design. In our example, we selected the mean value of x for the baseline value x 0 since values of hormone α 1 concentration and ΔC ðwÞ q values were determined from randomly chosen plants. Suppose, however, that there is a tendency for the variable x to take on a certain value x 0 in nature. If your experiment is to test the effects on gene expression by varying or manipulating the value of x, then it may make better sense to use the unmanipulated value x 0 as the baseline in your calculations instead of the mean value of x, as that value serves as a natural point of comparison in your experiment. Such decisions should be made prudently.
In the absence of any other motivating factors or when the values of the independent variable will not be manipulated in the course of the experiment, we generally advocate choosing the mean value of x as the baseline value x 0 .

A comment on slope of the regression line
The p-value in a linear regression is used to test the null hypothesis m = 0. In our example above, we were able to reject the null hypothesis and obtained the formula (equation 8) as a result. Notice that if we were unable to reject the null hypothesis, we would be left with the assumption that the slope is not significantly different from zero, and (equation 6) would result in the constant function ŷ = b, meaning that we have no evidence that the concentration of α1 has any effect on gene expression. (Equation 8) would yield R̂ = 1, showing that changes in α1 concentration have no impact on the relative expression ratio for the gene in question.
ΔC_q^(w) as the Independent Variable

It may be of interest to determine the effect of the expression of a gene on some measurable quantity (y). Such an approach is common in experiments where the level of expression of a gene is explicitly manipulated either by varying the strength of the promoter or varying the number of gene copies. The result would be two values for each individual: the efficiency-weighted ΔC_q^(w) for a particular gene or gene array and a response variable, y. For example, suppose that a particular gene's expression is thought to correlate with promiscuity in a certain species of animal as measured by time (min) spent huddling with their partner (conceptual example derived from [19]). In this case, we would be using ΔC_q^(w) values as the independent variable x, and y (time spent huddling) would be the dependent variable. The mathematics for this case is the inverse of the case above. Suppose that the assumptions for a valid linear regression have been met and produce a line of best fit with associated statistics (Fig. 5a, Table 4).
To calculate a functional form that involves relative expression ratios R and confidence intervals, one should judiciously choose a baseline value for gene expression ΔC_q^(w), which we label as x0 for brevity. We set

ŷ = mx + b   (equation 11)

and

ŷ0 = mx0 + b   (equation 12)

Therefore, subtracting (equation 12) from (equation 11) gives

ŷ − ŷ0 = m(x − x0)   (equation 13)

Because ΔΔC_q^(w) = x − x0 and R = 10^(−ΔΔC_q^(w)), we have x − x0 = −log10(R), and so

ŷ − ŷ0 = −m · log10(R)   (equation 14)

We can rearrange that into a final form by adding ŷ0 to both sides of the equation:

ŷ = ŷ0 − m · log10(R)   (equation 15)

(Equation 15) tells us that for a given R, or relative expression ratio between two values (x and x0), we expect a specific change in time spent huddling (Fig. 5b, Table 4). In our hypothetical case, individuals with 50% higher expression of the promiscuity gene (R = 1.5) have an increase in huddling time of 73.0 s. Note that this value is only applicable to a comparison with the currently chosen x0; in other words, a 50% increase in expression relative to x0. If you require a different set of comparisons, then you will require a new baseline for comparison.
As with all predictions of ŷ, we recommend confidence interval calculations. We can generate formulas for confidence intervals to place around predicted values of the dependent variable given values of R. Suppose that the confidence interval on the slope parameter m is (L, U). Substitute this expression into (equation 15) and simplify to calculate a confidence interval for ŷ based on a specified value of R:

(ŷ0 − U · log10(R), ŷ0 − L · log10(R)) or (ŷ0 − L · log10(R), ŷ0 − U · log10(R))   (equation 16)
where the order of L and U is swapped because of the negative multiplier in the formula. Given our hypothetical example above, the 95% CI for huddling time given a 50% increase in expression would be an increase in huddling time of 61.3-84.6 s.

(As the cases of ΔC_q^(w) as dependent variable and ΔC_q^(w) as independent variable are inverses, they each present essentially the same information but in two different manners. The nature of the experiment should help guide which approach is preferred. We advocate using ΔC_q^(w) as the independent variable only in situations where ΔC_q^(w) is a manipulated variable, i.e., the experimental design manipulated the level of some gene's expression. Otherwise, we suggest relegating ΔC_q^(w) to the dependent variable. When ΔC_q^(w) is a dependent variable, you will be able to calculate a predicted relative expression ratio from a given input value x. When ΔC_q^(w) is the independent variable, you will only be able to calculate a predicted change in variable y compared to a predicted baseline given an input relative expression ratio, instead of predicting an absolute calculation for y. The former situation is slightly easier to plot and describe.)
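The huddling-time prediction (equation 15) and its slope-CI propagation can be sketched with the Table 4 values (m = −6.907, ŷ0 = 9.847); the slope CI used below is illustrative, back-calculated to match the reported 61.3-84.6 s interval rather than taken from the source:

```python
import math

def predicted_y(r, y0, m):
    """Equation 15: y-hat = y0 - m * log10(R)."""
    return y0 - m * math.log10(r)

def predicted_y_ci(r, y0, slope_ci):
    """Slope CI (L, U) propagated through equation 15, reordered."""
    slope_lo, slope_hi = slope_ci
    a = y0 - slope_hi * math.log10(r)
    b = y0 - slope_lo * math.log10(r)
    return (a, b) if a <= b else (b, a)

# 50% higher expression (R = 1.5) adds about 73 s of huddling time.
extra_seconds = (predicted_y(1.5, 9.847, -6.907) - 9.847) * 60.0
# Illustrative slope CI chosen to reproduce the reported interval.
ci_lo, ci_hi = predicted_y_ci(1.5, 9.847, (-8.01, -5.80))
extra_lo = (ci_lo - 9.847) * 60.0
extra_hi = (ci_hi - 9.847) * 60.0
```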

ΔC_q^(w) as Both Independent and Dependent Variable

Another useful technique might be to relate ΔC_q^(w) values for two separate genes. This case is the intersection of the two cases listed above, but we include the derivation to make it explicit. The resulting regression would allow us to establish how the ΔC_q^(w) values of the two genes are related. The choice of independent variable will give one value either for the regression slope or its reciprocal and will vary the margin of error for that slope, resulting in different widths for the confidence intervals.
Suppose that the independent variable x is given by ΔC_q,A^(w) for gene A and the dependent variable y by ΔC_q,B^(w) for gene B, and that a valid linear regression (Figure 6A, Table 5) has produced the formula

ŷ = mx + b   (equation 17)

We fix a baseline level for ΔC_q,A^(w), which we label as x0, and get ŷ0 = mx0 + b as usual. Given ΔΔC_q,A^(w) = x − x0, we then subtract ŷ0 = mx0 + b from (equation 17) and use notation similar to (equation 12) for genes A and B to produce

ΔΔC_q,B^(w) = ŷ − ŷ0 = m(x − x0) = m · ΔΔC_q,A^(w)   (equation 18)

Applying an exponential function to both sides and applying some algebra reveals

R̂_B = 10^(−m · ΔΔC_q,A^(w)) = (10^(−ΔΔC_q,A^(w)))^m = R_A^m   (equation 19)

showing that the relative expression ratio for B is the m-th power of the relative expression ratio for A in this case (Figure 6B, Table 5). From our example, individuals with 10% higher expression of gene A (R_A = 1.1) are predicted to express gene B at a 3.6% higher rate (R̂_B = 1.036) relative to individuals with average gene A expression.
Table 4 Hypothetical data used to generate Fig. 5a. Calculation of predicted huddling time, ŷ, follows ŷ0 − m · log10(R), where m = −6.907, and these values are plotted in Fig. 5b. x0 = 0.457 is the mean x, and ŷ0 = 9.847. The 95% confidence interval for the slope m is (−8.

Yet again we can generate formulas for confidence intervals for each value of R̂_B predicted by a given value of R_A. As in all earlier cases, all uncertainty derives directly from the uncertainty in the slope parameter. Suppose that the confidence interval on the slope parameter m is (L, U). Then the confidence interval for R̂_B is

(R_A^L, R_A^U) or (R_A^U, R_A^L)   (equation 20)

(Figure 6B, Table 5)
depending upon whether R_A > 1 or 0 < R_A < 1. For our example, the 95% confidence interval around R̂_B is 1.027-1.044, which corresponds to a predicted expression of gene B 2.7-4.4% higher than that of individuals with average gene A expression.
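The power-function prediction for gene B and its confidence interval can be sketched as well; the point estimate uses m = 0.367 from Table 5, while the slope CI below is illustrative, back-calculated to match the reported 1.027-1.044 interval rather than taken from the source:

```python
def relative_expression_b(r_a, m):
    """Predicted R_B = R_A ** m (the power relationship in the text)."""
    return r_a ** m

def relative_expression_b_ci(r_a, slope_ci):
    """Endpoints R_A**L and R_A**U, reordered when 0 < R_A < 1."""
    slope_lo, slope_hi = slope_ci
    a, b = r_a ** slope_lo, r_a ** slope_hi
    return (a, b) if a <= b else (b, a)

r_b = relative_expression_b(1.1, 0.367)             # ~1.036
ci = relative_expression_b_ci(1.1, (0.280, 0.452))  # illustrative CI
```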

A note on the assumption of linearity
There are important assumptions that must be met for regression analysis to be considered appropriate. These assumptions are covered in any general statistics text, and so we omit them here to conserve space. However, one of these assumptions, that of linearity, is worth discussing further. All of the work above assumes that there is a linear relationship between variable x and ΔC_q^(w), between ΔC_q^(w) and variable y, or between ΔC_q,A^(w) and ΔC_q,B^(w). In these cases, the linear relationship between y and x resulted in either an exponential relationship between relative expression ratio R and x, a logarithmic relationship between R and y, or a power relationship between R_A and R_B. Theoretically, the functional relationships between measured variables and measures of gene expression (in our case the efficiency-weighted C_q, ΔC_q^(w)) could assume any number of shapes depending on the gene of interest, the experimental condition, and even the species [5,20], leading to other functional relationships between R and x, R and y, and R_A and R_B. In cases where x and y are not linearly related, it is common to apply transformations to the data to improve linearity. A properly chosen transformation can allow the linearity assumption to be met and a linear regression to be performed. However, the mathematical approach to calculating R is constrained by the specific transformation that was chosen.
The common base method is amenable to many functional types; however, for this paper we focus on only a few cases that we hope will illustrate the general concept. Above, we developed the calculations for the relationship between relative expression ratio R and an independent variable x, which is exponential (R = k·b^x) when ΔC_q^(w) and x are linearly related. We also developed a logarithmic formula, y = a + b·log(R), for the relationship between a dependent variable y and R when y and ΔC_q^(w) are linearly related. We finally showed that a power function (R_B = R_A^m) results when ΔC_q,A^(w) and ΔC_q,B^(w) are linearly related.

Table 5 Hypothetical data used to generate Figure 6A. Calculation of predicted relative expression, R̂_B, follows R_A^m, where m = 0.367, and these values are plotted in Figure 6B.
Log-Transformed x with ΔC_q^(w) as the Dependent Variable

Suppose that the relationship between x and y is logarithmic (Figure 7A). Such plots are linearized by log-transformation of x (Figure 7B, Table 6). For example, suppose that expression of a particular bacterial gene is predicted by the density of the bacteria in culture. The function relating ΔC_q^(w) to the density of cells shows that ΔC_q^(w) responds more to a change in density when the bacterial count is low than when it is high.
We again choose a fixed baseline value x0 for the variable x and subtract equations using inputs x and x0 as we did with (equation 5) and (equation 6), yielding

ΔΔC_q^(w) = m(log10(x) − log10(x0))   (equation 24)

After applying the exponential transformation, we have

R̂ = 10^(−m(log10(x) − log10(x0)))   (equation 25)

Using algebraic properties of the logarithm, we produce

R̂ = 10^(m · log10(x0/x))   (equation 26)

and finally

R̂ = (x0/x)^m   (equation 27)

In conclusion, when efficiency-weighted ΔC_q^(w) values have a logarithmic relationship to x, we obtain a power function relationship between relative expression ratio R and x (Figure 7C, Table 6).
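The power function R̂ = (x0/x)^m with the Table 6 slope (m = −0.116) can be sketched directly; the slope CI below is illustrative, back-calculated from the reported 7.3-8.2% interval rather than taken from the source:

```python
def relative_expression_power(x, x0, m):
    """R-hat = (x0 / x) ** m when dCq_w is linear in log10(x)."""
    return (x0 / x) ** m

def relative_expression_power_ci(x, x0, slope_ci):
    """CI endpoints from the slope CI, reordered when x0/x < 1."""
    slope_lo, slope_hi = slope_ci
    a, b = (x0 / x) ** slope_lo, (x0 / x) ** slope_hi
    return (a, b) if a <= b else (b, a)

# 70 cells/nL versus the mean of 140 cells/nL: ~7.7% lower expression.
r_hat = relative_expression_power(70.0, 140.0, -0.116)
ci = relative_expression_power_ci(70.0, 140.0, (-0.123, -0.109))
```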
Again, notice that inputting the baseline value x = x0 will result in a predicted relative expression ratio of 1, as we would expect.
In the case where log(x) and ΔC_q^(w) are linearly related, the process for calculating a confidence interval only needs slight alterations compared to our first case. By tracking (equations 9, 23-27), we see that appending log() around each x or x0 will result in the correct formula. Therefore, we adjust (equation 10) and apply some algebraic properties of logarithms (as in (equation 26)) to obtain

((x0/x)^L, (x0/x)^U) or ((x0/x)^U, (x0/x)^L)   (equation 28)

depending upon whether the ratio x0/x is greater than 1 or less than 1 for each value of x, which in turn is equivalent to whether (x − x0) is positive or negative (Figure 8C). From our example above (Table 6), a concentration of 70 cells/nL would be predicted to have a 7.7% lower expression (R̂ = 0.923) than cells at the average concentration of 140 cells/nL, with a 95% CI of a decrease in expression of 7.3-8.2%.

Table 6 Hypothetical data used to generate Figure 7A, B. Calculation of predicted relative expression, R̂, follows (x0/x)^m, where m = −0.116, and these values are plotted in Figure 7C.

ΔC_q^(w) as the Independent Variable and Log-Transformed y

Where the relationship between x and y is log-linear (Figure 8A, Table 7), it may be necessary to log-transform the dependent y values to establish a linear relationship with ΔC_q^(w) as the independent variable (Figure 8B). For example, in a species of insect, a particular gene is implicated in determining the size at pupation, and slight changes in gene expression produce measurable changes in that size. Suppose that the assumptions for a valid linear regression have been met with a line of best fit

log10(ŷ) = mx + b   (equation 29)

Again, one should judiciously choose a baseline value for gene expression ΔC_q^(w), which we label as x0. We again set

ΔΔC_q^(w) = x − x0   (equation 30)

and have log10(y0) = mx0 + b. Thus,

x − x0 = −log10(R)   (equation 31)

Subtracting the equation for log10(y0) from (equation 29) yields the formula

log10(ŷ) − log10(y0) = m(x − x0)   (equation 32)

We apply some logarithmic properties to obtain the following:

log10(ŷ/y0) = −m · log10(R) = log10(R^(−m))   (equation 33)

Next, apply the exponential function:

ŷ/y0 = R^(−m)   (equation 34)
Table 7 Hypothetical data used to generate Figure 8A, B. Calculation of predicted y, ŷ, follows y0 · R^(−m), where m = 7.878 and y0 = 31.094, and these values are plotted in Figure 8C. x0 = 0.417 is the mean x. The 95% confidence interval for the slope m is (7.516, 8

Finally, solve for ŷ to obtain the power function (Figure 8C, Table 7):

ŷ = y0 · R^(−m)   (equation 35)

This equation tells us that for a given R, or relative expression ratio between two values, we expect a specific change in response variable y (Figure 8C, Table 7). We can generate formulas for confidence intervals to place around predicted values of the dependent variable given values of R. Suppose that the confidence interval on the slope parameter m is (L, U). Substitute this expression into (equation 35) and simplify to calculate a confidence interval for ŷ based on a specified value of R:

(y0 · R^(−U), y0 · R^(−L)) or (y0 · R^(−L), y0 · R^(−U))   (equation 36)
depending upon which ordering gives the interval endpoints in increasing order. Given our example, a 10% higher level of expression (R = 1.1) predicts a decrease in length of larvae at pupation from 16.4 mm to 14.7 mm. The 95% CI for the length of the larva at pupation is 14.2-15.2 mm when expression is 10% higher than in individuals with average expression. Note that these results are only applicable with the currently chosen x0.
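Equation 35 with the Table 7 values (m = 7.878, y0 = 31.094) can be checked in one line; the function name is ours:

```python
def predicted_y_power(r, y0, m):
    """Equation 35: y-hat = y0 * R**(-m) for a log-transformed y."""
    return y0 * r ** -m

# 10% higher expression (R = 1.1) predicts a pupation length of ~14.7 mm.
length_mm = predicted_y_power(1.1, 31.094, 7.878)
```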

Other cases
While we treated cases above where the non-gene variable needed to be log-transformed first to establish a linear relationship, we have not discussed cases where the gene expression variable ΔC_q^(w) itself would need to be log-transformed. Translating such log-transformed ΔC_q^(w) values into relative expression ratio R will yield functional formulas that are "doubly exponential" or "doubly logarithmic." While such formulas are not impossible, they do not appear to be common in nature. Another way to consider this situation is that since R = 10^(−ΔΔC_q^(w)), with ΔΔC_q^(w) in the exponent of R, we can view ΔΔC_q^(w) as something that is already derived through a log-transformation applied to R. Thus, applying a logarithm to ΔC_q^(w) would be like applying two layers of log transformations to R, which does not seem likely to be necessary. On the other hand, one should not view the omission of any particular functional form in this work as a dismissal of that form as impossible. Nevertheless, our treatment of linear, exponential, logarithmic, and power forms covers the most common functional relationships (curve shapes) for two variables (Figure 9).

Analysis of covariance
The common base method [18] may be used to perform paired and unpaired 2-sample t-tests and calculate 2-sample t-intervals as well as analysis of variance (ANOVA). These approaches can fail, however, when the quantities being compared between the groups are also affected by an uncontrolled quantitative covariate. In that case, analysis of covariance (ANCOVA) is a powerful analysis tool that combines ANOVA and linear regression techniques. In a simple, one-way ANCOVA, there will be three variables of interest: the factor or treatment effect (an independent categorical variable consisting of at least two groups), the response (a dependent quantitative variable), and a covariate (an independent quantitative variable).
For example, suppose that we have determined that the ΔC_q^(w) of a gene RT in larvae is affected by temperature. We might have a suspicion that RT expression is also affected by the larvae's diet. We could perform an experiment at a single temperature where larvae are given an experimental diet and a control diet. This would be a traditional use of qPCR and can be analyzed with the common base method as a 2-sample t-test. However, since we already know that temperature affects RT, we would be left wondering if the diet change was effective in altering RT expression across temperatures or if temperature and diet interact in some fashion. We could design an experiment that looks at both temperature and diet at the same time. Instead of designing an experiment with several larvae (replicates) in each combination of temperature and diet (a two-factor ANOVA), we will instead grow larvae in three treatments, two experimental diets and one control diet, across a range of temperatures (the covariate) in order to analyze the effect on expression of RT (the response).
Since we know from previous research that temperature and the ΔC_q^(w) of RT are related linearly, we really are not interested in performing another experiment to test this hypothesis. Instead we are interested in the effect of diet on the ΔC_q^(w) of RT, and we can determine if this effect is similar across temperatures or whether diet and temperature interact to alter the ΔC_q^(w) of RT. An ANCOVA is the obvious choice to test this hypothesis. Note that in our example above, temperature is manipulated by the researcher. However, covariates may also be unmanipulated variables that vary among individuals and are known to affect y.

The basic process for ANCOVA
(1) Perform separate linear regressions on the response as a function of the covariate for each of the treatment groups, and determine that at least one of those lines has a slope significantly different from zero.
(2) Verify homogeneity of slopes for the lines. Although it is unlikely that the regression step produced lines with identical slopes, it is possible that the data fit a model with an enforced common slope. Testing homogeneity of slopes relies on testing the significance of the interaction term between the treatment and covariate, diet*temperature in our example. Depending upon your choice of software, you will probably run some form of fit for a general linear model (possibly within an ANOVA menu) that accepts a response, treatment, and covariate. Often in an option for "model," you can enter the interaction term. The resulting output should include a p-value for the interaction. The p-value for this interaction tests the null hypothesis that the slopes are the same. If the p-value is greater than 0.05, then you fail to reject the null hypothesis and may assume the slopes are homogeneous. If the p-value is smaller than 0.05, then the interaction between the treatment and covariate is significant, so the slopes of the lines are likely different. In this case, ANCOVA is not appropriate.
(3) Where slopes are homogeneous, rerun the general linear model routine without the interaction term in order to recalculate the regression lines with a new enforced common slope. Most software packages also offer options for "contrasts" or "comparisons" that will generate confidence intervals for pairwise comparisons between treatments. We will avoid dictating which of the many types of contrasts (Fisher, Tukey, Šidák, or Bonferroni) is preferable.
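Step (3) can also be sketched without statistical software: once homogeneity is established, the enforced common slope is the pooled within-group slope, and each group's intercept follows from its group means. A minimal pure-Python illustration (the function name and data layout are our own, not from any particular package):

```python
def ancova_common_slope(groups):
    """Fit y = m*x + b_g with a single slope m shared by all groups.

    groups: dict mapping group name -> list of (x, y) pairs.
    The common slope pools the within-group sums of squares:
        m = sum_g Sxy_g / sum_g Sxx_g,  b_g = ybar_g - m * xbar_g.
    Returns m and a dict of per-group intercepts b_g.
    """
    sxy = sxx = 0.0
    means = {}
    for name, pts in groups.items():
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        xbar = sum(xs) / len(xs)
        ybar = sum(ys) / len(ys)
        sxy += sum((x - xbar) * (y - ybar) for x, y in pts)
        sxx += sum((x - xbar) ** 2 for x in xs)
        means[name] = (xbar, ybar)
    m = sxy / sxx
    intercepts = {name: ybar - m * xbar for name, (xbar, ybar) in means.items()}
    return m, intercepts
```

The differences between the returned intercepts are the constant vertical offsets between the parallel lines; a full software routine additionally supplies standard errors and confidence intervals for those differences.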

Relative expression ratios and confidence intervals from ANCOVA
Suppose that all three steps above have gone correctly and that for the three treatments we now have regression lines that share an enforced common slope:

ŷ = mx + b_1, ŷ = mx + b_2, ŷ = mx + b_3.

Notice that the slope, m, is the same for each equation. The differences between the lines are then measured by b_2 − b_1, b_3 − b_1, and b_3 − b_2, respectively.
In our example, x stands for temperature while y stands for the ΔC_q^(w) of RT. We use the subscript c to denote the control diet and t1 and t2 to denote the treatment diets. Since the lines have the same slope, they are all parallel, and each pair has a constant vertical difference given by the difference between intercept values: b_t1 − b_c, b_t2 − b_c, and b_t2 − b_t1. As that difference is a measurement on the y-scale, it represents a predicted ΔΔC_q^(w) measurement (Figure 10). For example, b_t1 − b_c and its confidence interval predict the difference in ΔC_q^(w) between treatment1 and the control at any given value x of the covariate. In our example, we are calculating the effect that the two different diets have on expression of the gene RT while controlling for temperature.
We may now calculate a predicted relative expression ratio R̂ = 10^(−(b_i − b_j)) showing the difference between any pair of factors (e.g., the treatment1 effect on the gene relative to the control) at any given covariate value.
Similar to our regression analysis, we may also calculate a confidence interval for this predicted relative expression ratio using (equation 38) and the confidence interval (L, U) calculated for the difference b_i − b_j between any two factors:

(10^(−U), 10^(−L)),

where the order of L and U has switched because of the negative multiplier in the exponential function. For our example data (Table 8), a check of the homogeneity of slopes assumption shows that we can treat our lines as parallel (p = 0.613). Rerunning the analysis without the interaction term shows that both temperature and diet affect ΔC_q^(w). Post-hoc analysis shows that the treatment diets were both significantly different from the control (p < 0.001), but the two treatment diets were not different from each other (p = 0.829). Larvae exposed to the treatment1 diet expressed RT at a level 194% higher than in the control (95% CI = 181-207%; Figure 11). Larvae exposed to the treatment2 diet expressed RT at a level 192% higher than in the control (95% CI = 181-207%; Figure 11). Consistent with there being no difference in RT expression between the two treatments, the 95% CI for relative expression comparing treatment2 to treatment1 (R = 0.993) overlaps 1, with 95% CI = 0.930-1.061 (Figure 11).
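The back-transformation just described is simple to automate. Below is a hypothetical helper (our own illustration, not code from any published package) that converts an intercept difference b_i − b_j and its confidence interval (L, U) into the predicted relative expression ratio and its interval, showing how the endpoints swap under the negative exponent:

```python
def relative_expression(b_diff, ci=None):
    """Convert an intercept difference b_i - b_j (a predicted
    delta-delta-Cq(w) on the log10 scale) into a relative
    expression ratio R = 10 ** -(b_i - b_j).

    If ci = (L, U) is a confidence interval for b_i - b_j, the
    interval for R is (10 ** -U, 10 ** -L): the negative
    exponent reverses the order of the endpoints.
    """
    r = 10.0 ** (-b_diff)
    if ci is None:
        return r
    low, high = ci
    return r, (10.0 ** (-high), 10.0 ** (-low))
```

For example, an intercept difference of −1 corresponds to a tenfold increase in expression, and an interval for the difference that straddles 0 yields a relative expression interval that straddles 1.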
One of the key assumptions of the ANCOVA process is that the slopes of the regression lines can be statistically treated as equal, even if they are not calculated to be exactly equal during the individual regression analyses. The analysis generates a common slope for each trend line, and the differences between the intercepts derive from these common-slope fits rather than from the original slope estimates. In our example above, the common slope is estimated to be 0.033. If this homogeneity assumption does not hold, then the ANCOVA cannot proceed, as there is evidence that the difference between the lines is not constant with respect to the covariate.

Table 8 Hypothetical data used to generate Figures 10 and 11. Plotted regression lines use the common slope of 0.033. Calculation of relative expression follows 10^(−(b1 − b2)), where b represents the y-intercept and the subscripts c, t1, and t2 represent control, treatment1, and treatment2, respectively.

3. Production of the linear equation through regression analysis allows us to determine y values given x values. Interpretation of this relationship depends upon the experimental design. Where x values are measured from randomly chosen individuals (unmanipulated), the relationship is predictive but not necessarily causal; care should be exercised in such interpretations. Where x values are manipulated as part of an experiment, it may be appropriate to infer such causality.
4. Presentation of relative expression values should be accompanied by confidence intervals [18]. It is not enough to report the relative expression value alone since, depending on the tightness of the relationship, confidence can vary greatly.

Conclusion
Traditional qPCR analysis is not able to address statistical models other than the paired t-test. The common base method is amenable to use with any of the statistical models within the general linear model. Here we have shown how the common base method may be applied to determine relationships between ΔC_q^(w) values and an independent variable, a dependent variable, or another gene's ΔC_q^(w) values. We have also shown how to plot relative expression ratios R against an untransformed or log-transformed dependent or independent variable, or against another relative expression ratio. In this manner, we can predict how relative expression will change given a change in a measured variable, how a measured variable will change given an experimental change in expression, or how expression will change given a change in expression of a second gene.

Regression
In a simple linear regression analysis, we are attempting to determine whether a linear relationship exists between two variables and, if so, to describe that relationship. A linear regression analysis will return a linear equation y = mx + b connecting the two variables x and y. The analysis will, at a minimum, yield a coefficient of determination r² and a p-value associated with the slope test. The r² value is a number between 0 and 1 that indicates the proportion of variation in y that can be explained by variation in x. The closer r² is to 1, the better the linear relationship, or fit, between the two variables. The p-value is used to test whether or not the slope m is significantly different from zero.
In the Results section we describe cases of linear regression where one of the variables is the efficiency-weighted C_q, ΔC_q^(w). The ultimate goal will then be to show how such a regression line can be transformed into a nonlinear formula where one of the variables is a relative expression ratio R. To the best of our knowledge, conceptualization of relative expression ratios in this manner is novel.