Making concentration estimates and estimating their errors in our ELISA microarray studies involve a sequence of steps beginning with the layout of the ELISA microarray and design of the experiment. Following execution of the analytical components of the experiment, the statistical analysis proceeds with data screening, normalization, and model identification. Estimation and evaluation of the standard curves and error estimation functions come next. Finally, the standard curves and error estimation functions are applied and then evaluated using a modeling diagnostic.

### Layout of the ELISA microarray and design of the experiment

To estimate errors in concentration estimates, it is necessary to carefully lay out the microarray and design the experiment. Our layout features several distally separate replicates of each assay spot on each microarray to evaluate local processing effects. Our design addresses selection and application of treatments – in particular, replicate treatments – to a collection of arrays. This replication facilitates adjustments for the sources of variability that lead to ambiguous concentration estimates [16, 17]. In array experiments featuring relatively small numbers of assays, usually 50 or fewer analytes, thoughtful design is critical to normalization, calibration, and estimation of concentrations because these arrays lack the large number of technical replicates found in arrays with thousands of assays. With regard to error estimation, the major consideration in the design of the experiments is replication of treatments across arrays to capture the effects of process error.

To illustrate our technique for evaluating estimation errors in an ELISA microarray experiment, we used a subset of data from an ELISA microarray investigation of breast cancer biomarkers. The ELISA microarray experiments were performed as previously described [2, 3]. Briefly, capture antibodies were covalently attached to an aminosilanated glass slide surface (Sigma, St. Louis, Missouri, USA) using a Microgrid 2 robot from Genomic Solutions (Ann Arbor, Michigan, USA) equipped with ChipMaker2 split pins from TeleChem (Sunnyvale, California, USA). As demonstrated previously, these spots are typically uniform in shape with a reasonably homogeneous distribution of protein across the spot [1–3]. That is, "donut" formation is not normally observed. These spatially confined antibodies bind a specific antigen from a sample overlaying the array. A second, biotinylated antibody that recognizes the same antigen as the first antibody but at a different epitope is then used for detection. Detection of the second antibody is based upon streptavidin (which binds biotin) and an enzymatic signal enhancement method known as tyramide signal amplification (TSA). The resultant fluorescence was detected at 10-micron scan resolution using a ScanArray 3000 from General Scanning (Billerica, Massachusetts, USA). The experiment used 94 arrays printed in pairs on 47 slides. Each array contained 4 (2 × 2) replicate subarrays of 25 (5 × 5) spots. A subarray contained 21 unique assays, 1 positive control and 3 negative control spots. A set of 7 known standard concentrations and a buffer blank was assembled by performing a three-fold dilution series of a single mixture of all the standards. Each standard concentration was applied to duplicate slides. The remaining 39 slides were treated with serum samples from women with or without breast cancer. These sera were encoded to prevent knowledge of the study group during sample processing.
The treated microarrays were imaged with a ScanArray microarray scanner (PerkinElmer, Boston, Massachusetts, USA). The spot fluorescence estimates were calculated with custom array-image-analysis software that was developed in-house.

### Data screening, normalization and model identification

Data screening, an exploratory data analysis, serves several purposes – identifying outliers, anomalous values, and experimental design shortcomings; identifying data transforms to improve curve-fitting and application; identifying measurement trends and other processing effects; and suggesting an appropriate functional form for the standard curve [6, 18–21]. This exploratory analysis combines simple summary statistics and graphical displays. For instance, graphs of control spot intensities versus processing variables such as array print order or pin number may reveal variability due to processing. These processing trends can be made more apparent with locally weighted regression, or loess, a statistical technique to fit a smooth curve through the scatterplot [22, 23]. These graphs can be used as the basis for modifying the process or for data normalization.
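
The loess smoothing mentioned above can be sketched in a few lines. The following is a minimal illustration, not the software used in this study: it implements a basic local linear fit with tricube weights using NumPy only, and the function name `loess_smooth`, the `frac` window parameter, and the simulated control-spot data are all assumptions made for the example.

```python
import numpy as np

def loess_smooth(x, y, frac=0.5):
    """Basic loess: local linear regression with tricube weights."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = max(3, int(np.ceil(frac * n)))      # number of points in each local window
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[k - 1]            # bandwidth: distance to the k-th nearest point
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3   # tricube weights
        sw = np.sqrt(w)
        # Weighted least squares fit of a line centered at x[i];
        # the intercept is the smoothed value at x[i].
        A = np.column_stack([np.ones(n), x - x[i]])
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0]
    return fitted

# Hypothetical data: positive-control intensity drifting with array print order
rng = np.random.default_rng(0)
print_order = np.arange(1, 95)              # 94 arrays
control = 10.0 - 0.01 * print_order + rng.normal(0.0, 0.2, print_order.size)
trend = loess_smooth(print_order, control, frac=0.5)
```

Plotting `trend` over the scatter of `control` versus `print_order` would make a processing drift of this kind visible; a smaller `frac` follows local features more closely at the cost of a noisier curve.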

Because our protein arrays feature fewer spots per array than do typical gene expression microarrays, a different approach to normalization, suitable for low spot-frequency arrays, is required. This normalization is critical, given that array-to-array processing error is common and that standard curves are estimated from reference spot intensities calculated from one set of arrays and then applied to sample spot intensities estimated from a separate set of arrays.

A scatterplot of intensity estimates of standard spots versus concentration is particularly useful. First, outliers and anomalies may be readily apparent. Second, the spacing between concentration values may be assessed. If standard concentrations follow from a dilution series, then the separation between concentrations decreases significantly with the decrease in concentration. This results in spot intensities measured at higher concentrations having much more leverage on the fit of the model than may be desirable. It should also be apparent whether the variability in spot intensity is increasing with mean spot intensity. Both increasing spacing in the concentrations and heteroskedasticity in the measured intensities affect the model fit and follow-on statistical inferences [24]. These may be minimized with $\log_e$ transformations of both concentrations and spot intensities.
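
The effect of the log transformation on the spacing of a dilution series can be checked with a short calculation. This is only an illustration: the top concentration of 1000 (arbitrary units) is a hypothetical value, not one taken from the experiment.

```python
import numpy as np

top = 1000.0                          # hypothetical top standard concentration
conc = top / 3.0 ** np.arange(7)      # 7 standards from a three-fold dilution series

raw_gaps = -np.diff(conc)             # gaps shrink by a factor of 3 at each dilution step
log_gaps = -np.diff(np.log(conc))     # constant gap of log_e(3) after transformation
```

On the raw scale the highest standards are far apart and dominate the fit; on the $\log_e$ scale the standards are evenly spaced, so no single concentration carries disproportionate leverage.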

A scatterplot of raw or transformed standard spot intensity versus concentration also provides an indication of the appropriate model for the data. In particular, data following a sigmoid curve favor the logistic curves while data apparently lacking the horizontal asymptotes of a sigmoid curve favor a linear or power law model. Although several models may be fit and one selected based on a goodness-of-fit statistic (see next section), the scatterplot is a useful visual check on this selection.

Several plots provide useful information about the quality of the fitted model. Of special importance are the scatterplot of residuals versus concentration and the scatterplot of residuals versus estimated intensity. In both cases, the variability of the residuals should be centered about zero and constant across concentration or intensity. Model bias is indicated by a systematic drift of residuals to one side of the zero line. Heteroskedasticity is indicated by a systematic change in the variation of the residuals. Both may indicate that a better model is necessary before proceeding to estimation of sample concentrations and estimation of concentration errors.

### Standard curves and estimation errors

An ELISA standard curve expresses protein concentration as a function of spot intensity. One standard curve is required for each assay. In an ELISA microarray experiment, the standard data are collected by fixing a set of concentrations and measuring spot intensities via imagery of the treated arrays. A standard curve is estimated by fitting an appropriate function to the set of (concentration, intensity) measurement pairs [25]. This equation is then inverted to obtain the standard curve.

Common parametric choices for standard curve models are multiparameter logistic functions and power law functions. For an ELISA microarray, a strictly monotone model is consistent with the belief that a monotone change in concentration should result in a monotone change in spot intensity.

We estimate standard curves with both logistic and power law parametric models. The four-parameter logistic model [26], which expresses intensity $I$ as a function of concentration $C$ and parameters $P_1$, $P_2$, $P_3$ and $P_4$, is referred to below as Eqn. 1.

The two-parameter power law model [27] expressing intensity $I$ as a function of concentration $C$ and parameters $P_1$ and $P_2$, in $\log_e$ terms, is

$$\log_e(I) = P_1 + P_2 \log_e(C) + \varepsilon$$

We assume the errors, denoted by the term $\varepsilon$, are independent and normally distributed with mean 0 and variance $\sigma^2$. With either of these parametric models, concentration estimation errors may be estimated using propagation of error, also known as the delta method.
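
As a rough sketch of the first ingredient needed for propagation of error, the power law model can be fit by ordinary least squares on the $\log_e$ scale, which also yields the covariance matrix of the parameter estimates. The function name `fit_power_law` and the simulated standards below are assumptions for illustration, not the study's data or code.

```python
import numpy as np

def fit_power_law(conc, intensity):
    """OLS fit of log(I) = P1 + P2*log(C).

    Returns the parameter estimates, their covariance matrix,
    and the residual variance estimate.
    """
    x = np.log(np.asarray(conc, dtype=float))
    y = np.log(np.asarray(intensity, dtype=float))
    X = np.column_stack([np.ones_like(x), x])
    p_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ p_hat
    sigma2 = resid @ resid / (len(y) - 2)      # unbiased estimate of sigma^2
    cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance of the (P1, P2) estimates
    return p_hat, cov, sigma2

# Hypothetical standards: 7-point three-fold dilution series, 8 replicate spots each
rng = np.random.default_rng(1)
conc = np.repeat(1000.0 / 3.0 ** np.arange(7), 8)
intensity = np.exp(2.0 + 0.8 * np.log(conc) + rng.normal(0.0, 0.1, conc.size))
p_hat, cov, sigma2 = fit_power_law(conc, intensity)
```

The matrix `cov` plays the role of the parameter covariance matrix that is later propagated through the inverted model.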

To choose between competing candidate models, a number of measures exist for evaluating model fit when replicate observations of each assay are available. These include partitioning the mean squared error, or MSE, into components representing pure error and lack of fit [28], and penalized measures such as Akaike (AIC) and Bayesian (BIC) information criteria [29]. We also examine the *PRESS* statistic, a direct measure of the predictive capability of each candidate model [30].

To calculate the *PRESS* statistic for each candidate model, suppose we exclude each point $(x_j, y_j)$ in turn and fit the model to the remaining points. We predict the value $\hat{y}_{j,-j}$ at the excluded point $x_j$ and calculate the *PRESS* residual defined by $e_{j,-j} = y_j - \hat{y}_{j,-j}$. Then, the *PRESS* statistic is the sum of the squared *PRESS* residuals

$$PRESS = \sum_{j} e_{j,-j}^{2}$$

We select the candidate model with the lowest *PRESS* score as the best predictive model to estimate concentrations.
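
The leave-one-out calculation behind the *PRESS* statistic can be sketched as follows. To keep the example short, the candidate models here are polynomials in $\log_e$ concentration fit with `numpy.polyfit`: degree 1 corresponds to the power law model in log terms, and the cubic is only a stand-in for a more flexible competitor, not the logistic model used in the study. The function name and simulated data are assumptions.

```python
import numpy as np

def press_statistic(x, y, degree=1):
    """Leave-one-out PRESS for a polynomial of the given degree fit by least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    press = 0.0
    for j in range(len(x)):
        keep = np.arange(len(x)) != j               # exclude point j
        coef = np.polyfit(x[keep], y[keep], degree) # refit on the remaining points
        y_pred = np.polyval(coef, x[j])             # prediction at the excluded x_j
        press += (y[j] - y_pred) ** 2               # squared PRESS residual
    return press

# Hypothetical standard data on the log_e scale: 7 concentrations, 4 replicates each
rng = np.random.default_rng(2)
log_c = np.repeat(np.log(1000.0 / 3.0 ** np.arange(7)), 4)
log_i = 2.0 + 0.8 * log_c + rng.normal(0.0, 0.1, log_c.size)

press_line = press_statistic(log_c, log_i, degree=1)    # power law in log terms
press_cubic = press_statistic(log_c, log_i, degree=3)   # stand-in competing model
```

The model with the smaller value would be preferred for prediction. For nonlinear models such as the logistic, the same loop applies with the polynomial fit replaced by the appropriate nonlinear fitting routine.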

The basic approach to estimating concentration errors with the propagation of error method has three steps [31]. First, fit intensity as a function of concentration and estimate the covariance among model parameter estimates. Next, solve the fitted function for concentration as a function of intensity. Finally, propagate error estimates from the fitted model through the inverted model and combine with the error estimate of the sampled spot intensity to estimate the concentration estimation error.

Let $C(I \mid \hat{\mathbf{P}})$, with $\hat{\mathbf{P}} = (\hat{P}_1, \ldots, \hat{P}_N)$, denote the inverted $N$-parameter model expressing concentration $C$ as a function of intensity $I$ and the parameter estimates $\hat{\mathbf{P}}$. Suppose $\hat{\Sigma}_{\mathbf{P}}$ is the $N \times N$ parameter covariance matrix estimated by fitting $I$ as a function of $C$, say $I(C \mid \mathbf{P})$. Now, let $C_S$ be the estimated concentration from the sample intensity estimate $I_S$, say $C_S = C(I_S \mid \hat{\mathbf{P}})$, and let $\hat{\sigma}_{I_S}$ be the corresponding estimated standard error of $I_S$. Then, the propagation of error estimate for the concentration estimate $C_S$ is the square root of the product of $\Sigma$, the sample covariance matrix augmented with $\hat{\sigma}_{I_S}^{2}$, with the Jacobian matrix $J$ and its transpose, evaluated at $I_S$ and the parameter estimates $\hat{\mathbf{P}}$. In this application, the Jacobian is the matrix of partial derivatives of $C(I \mid \mathbf{P})$ with respect to the intensity $I$ and the parameters $\mathbf{P}$. Hence, the concentration estimation error of $C(I \mid \mathbf{P})$ is the square root of the concentration estimation variance $V(C(I \mid \mathbf{P}))$

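$$V(C(I \mid \mathbf{P})) = J(C(I \mid \mathbf{P}))^{T} \, \Sigma \, J(C(I \mid \mathbf{P}))$$

where the Jacobian is

$$J(C(I \mid \mathbf{P}))^{T} = \left[ \frac{\partial C}{\partial I}, \frac{\partial C}{\partial P_1}, \ldots, \frac{\partial C}{\partial P_N} \right]$$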

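and the augmented covariance matrix, with the variance $\hat{\sigma}_{I_S}^{2}$ of the sample intensity estimate placed ahead of the $N \times N$ parameter covariance matrix $\hat{\Sigma}_{\mathbf{P}}$ to match the ordering of the Jacobian, is

$$\Sigma = \begin{bmatrix} \hat{\sigma}_{I_S}^{2} & \mathbf{0}^{T} \\ \mathbf{0} & \hat{\Sigma}_{\mathbf{P}} \end{bmatrix}$$

Hence, the formula for the estimated standard error of $C_S$ is

$$SE[C_S] = \sqrt{J^{T} \, \Sigma \, J}$$

with $J$ evaluated at $I_S$ and the parameter estimates.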

For a given intensity estimate $I_S$ and standard error $\hat{\sigma}_{I_S}$, the estimated concentration and approximate 95% confidence interval $(C_{95\%L}, C_{95\%U})$ are

$$C_S = C(I_S)$$

$$C_{95\%L} = C_S - 2\,SE[C_S] \qquad (2)$$

and

$$C_{95\%U} = C_S + 2\,SE[C_S] \qquad (3)$$
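
The three steps of the propagation of error calculation can be illustrated with the two-parameter power law model, whose inverse has the closed form $C = \exp((\log_e I - P_1)/P_2)$. The sketch below is not the authors' implementation: the function name, the numerical parameter values, and the covariance entries are hypothetical, and the sample intensity error is assumed to be uncorrelated with the parameter estimates.

```python
import numpy as np

def concentration_with_error(i_s, se_i, p_hat, cov_p):
    """Concentration estimate, delta-method standard error, and approximate
    95% interval for the inverted power law C = exp((log(I) - P1) / P2)."""
    p1, p2 = p_hat
    u = (np.log(i_s) - p1) / p2
    c_s = np.exp(u)
    # Jacobian ordered as [dC/dI, dC/dP1, dC/dP2]
    jac = np.array([c_s / (p2 * i_s), -c_s / p2, -c_s * u / p2])
    # Augmented covariance: intensity variance first, then the parameter covariance;
    # the zero off-diagonal blocks encode the assumed lack of correlation.
    sigma = np.zeros((3, 3))
    sigma[0, 0] = se_i ** 2
    sigma[1:, 1:] = cov_p
    se_c = np.sqrt(jac @ sigma @ jac)
    return c_s, se_c, (c_s - 2.0 * se_c, c_s + 2.0 * se_c)

# Hypothetical fitted values and sample spot intensity
p_hat = np.array([2.0, 0.8])
cov_p = np.array([[7.0e-4, -1.3e-4],
                  [-1.3e-4, 3.7e-5]])
c_s, se_c, (c_lo, c_hi) = concentration_with_error(200.0, 10.0, p_hat, cov_p)
```

The interval endpoints `c_lo` and `c_hi` correspond to Eqns. 2 and 3. Setting `cov_p` to zero isolates the contribution of the sample intensity error, and setting `se_i` to zero isolates the contribution of the standard curve.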

For example, consider the four-parameter logistic model, Eqn. 1. The concentration estimation equation, Eqn. 4, is obtained by solving this equation for $C$ in terms of $I$ and the four parameters.

The Jacobian matrix is obtained by taking the partial derivatives of the inverted four-parameter logistic function of $C$ (Eqn. 4) with respect to $I$ and the parameters $P_1$, $P_2$, $P_3$ and $P_4$.
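
To make the inversion and Jacobian steps concrete, the sketch below uses one common four-parameter logistic form, $I = P_2 + (P_1 - P_2)/(1 + (C/P_3)^{P_4})$. This parameterization is an assumption chosen for illustration and may differ from the form used in Eqn. 1. The Jacobian is approximated by central differences rather than derived analytically, and all parameter values are hypothetical.

```python
import numpy as np

def logistic4(c, p):
    """Assumed 4PL form: I = P2 + (P1 - P2) / (1 + (C / P3) ** P4)."""
    p1, p2, p3, p4 = p
    return p2 + (p1 - p2) / (1.0 + (c / p3) ** p4)

def inverse_logistic4(i, p):
    """Solve the assumed 4PL form for concentration C given intensity I."""
    p1, p2, p3, p4 = p
    return p3 * ((p1 - p2) / (i - p2) - 1.0) ** (1.0 / p4)

def jacobian_numeric(i, p, rel_step=1e-6):
    """Central-difference derivatives of C with respect to [I, P1, P2, P3, P4]."""
    theta = np.concatenate([[i], np.asarray(p, dtype=float)])
    jac = np.empty(theta.size)
    for k in range(theta.size):
        h = rel_step * max(abs(theta[k]), 1.0)
        up, down = theta.copy(), theta.copy()
        up[k] += h
        down[k] -= h
        jac[k] = (inverse_logistic4(up[0], up[1:])
                  - inverse_logistic4(down[0], down[1:])) / (2.0 * h)
    return jac

# Hypothetical parameters: P1 = response at zero concentration, P2 = response at
# saturating concentration, P3 = midpoint concentration, P4 = slope
p = np.array([100.0, 10000.0, 50.0, 1.2])
i_s = 2000.0
c_s = inverse_logistic4(i_s, p)
jac = jacobian_numeric(i_s, p)
```

The vector `jac` has the same ordering as the Jacobian defined above, so it can be combined with a 5 × 5 augmented covariance matrix to obtain the concentration standard error. Near either asymptote the term `i - p2` or the bracketed ratio approaches a singular value, which is the numerical counterpart of the diverging error bands.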

### Diagnostic visualizations

A three-panel display combining a histogram of normalized sample spot intensities for a given antigen, its corresponding standard curve, and the graph of the concentration coefficient of variation, or relative error, versus concentration provides pertinent information about the conduct of the current experiment as well as information to improve future experiments. The standard curve panel presents a scatterplot of normalized standard spot intensities versus standard concentrations. The scatterplot is overlain with the estimated standard curve expressing concentration as a function of spot intensity. This panel also includes approximate 95% confidence intervals. These intervals summarize the uncertainty in concentration estimates due to both the uncertainty in estimating the standard curve and the uncertainty in the sample spot intensity estimate. Finally, a highlighted region helps distinguish concentration estimates with acceptable errors from concentration estimates with possibly less than acceptable errors.

The segment of the standard curve corresponding to acceptable concentration errors may be determined using the 95% confidence intervals. The lower and upper endpoints of this segment, $(I_L, C_L)$ and $(I_U, C_U)$, are the two points such that the confidence intervals begin to increase significantly in length. This segment generally corresponds to the linear segment of a standard curve. We identify the intensity $I_L$ of the lower pair as the smallest intensity such that 95% UB($I_L$) is less than 95% UB($I$) for intensity values $I$ less than $I_L$. Similarly, we identify $I_U$ as the largest intensity such that 95% LB($I_U$) is greater than 95% LB($I$) for intensity values $I$ greater than $I_U$. We define $C_L$ and $C_U$ to be $C_L = C(I_L)$ and $C_U = C(I_U)$, respectively. We believe that this is a conservative approach to identifying intensities that generate concentration estimates with acceptable errors.

An informative visualization of acceptable concentration estimates may be generated using the points $(I_L, C_L)$ and $(I_U, C_U)$. Consider the union of the two rectangular regions defined by the two sets of vertices $[(I_L, 0), (I_L, C_L), (I_U, C_U), (I_U, 0)]$ and $[(0, C_L), (0, C_U), (I_U, C_U), (I_U, C_L)]$. This union defines an L-shaped region covering the standard curve segment and bounded at its extremes by the intensity and concentration segments. From this visualization, one can quickly grasp the dynamic range of acceptable intensities and the potential range of acceptable concentration estimates.

In regard to this first panel, two aspects of this propagation of error methodology are noteworthy. First, the error bands are computed pointwise and provide reasonable error estimates for a small number of concentrations. As the number of concentration estimates grows, the impact of the multiple testing problem grows [32]. This is a problem in any biomedical testing that features numerous simultaneous tests and has spawned considerable debate and research. The second aspect of note is the divergence of the error bands from the estimated standard curve as the standard curve approaches a horizontal asymptote. We see this apparent deficiency in the method as a plus: the divergence is a clear indicator that concentration estimates in the segment of a standard curve approaching a horizontal asymptote are highly suspect.

The second panel in this display shows the concentration coefficient of variation – that is, %*CCV* = 100 × *SE*(*C*|*I*)/*C*(*I*), or relative error of a concentration estimate – as a function of concentration. This provides an alternative view of the error in concentration estimation over the concentration range covered by the concentration estimation equation. For a standard curve modeled with a four-parameter logistic function, this graph generally will have a bathtub shape due to the increasing uncertainty in concentration estimates at the two ends of the concentration range where the curve approaches horizontal asymptotes.
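
As a worked illustration of the relative error, consider the simpler two-parameter power law model, for which the calculation can be carried out in closed form. The symbols $u$, $s_{11}$, $s_{12}$ and $s_{22}$ (the entries of the parameter covariance matrix) are notation introduced here, and the sample intensity error is assumed to be uncorrelated with the parameter estimates.

```latex
\begin{align*}
C_S &= \exp\!\left(\frac{\log_e I_S - \hat{P}_1}{\hat{P}_2}\right),
      \qquad u = \log_e C_S, \\
J^{T} &= \frac{C_S}{\hat{P}_2}\left[\frac{1}{I_S},\; -1,\; -u\right], \\
\%CCV &= 100\,\frac{SE[C_S]}{C_S}
       = \frac{100}{|\hat{P}_2|}
         \sqrt{\frac{\hat{\sigma}_{I_S}^{2}}{I_S^{2}} + s_{11} + 2u\,s_{12} + u^{2}s_{22}}.
\end{align*}
```

Under these assumptions the relative error depends on the sample only through the relative intensity error and $u$, and the parameter contribution is quadratic in $u$. The sharp rise at both ends that produces the bathtub shape is therefore specific to models with horizontal asymptotes, such as the logistic.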

The third panel in this display features an annotated histogram of sample spot intensity estimates on the intensity axis opposite the scatterplot. In this representation, it is easy to see the extent of overlap between the distribution of sample intensity estimates and the range of intensities that result in concentration estimates with acceptable errors.