Variance component analysis to assess protein quantification in biomarker validation: application to selected reaction monitoring-mass spectrometry

Background In the field of biomarker validation with mass spectrometry, controlling the technical variability is a critical issue. In selected reaction monitoring (SRM) measurements, this issue offers the opportunity to use variance component analysis to distinguish various sources of variability. However, in the case of unbalanced data (unequal numbers of observations across factor combinations), the classical methods cannot correctly estimate the various sources of variability, particularly in the presence of interaction. The present paper proposes an extension of variance component analysis that estimates the various components of the variance, including an interaction component, in the case of unbalanced data. Results We applied an experimental design that uses a serial dilution to generate known relative protein concentrations and estimated these concentrations with two processing algorithms, a classical and a more recent one. The extended method allowed estimating, for each algorithm, the variance explained by the dilution and the variance explained by the technical process in an experiment with 9 proteins: L-FABP, 14.3.3 sigma, Calgi, Def.A6, Villin, Calmo, I-FABP, Peroxi-5, and S100A14. Whereas the recent algorithm gave a higher dilution variance and a lower technical variance than the classical one for the two proteins with three peptides (L-FABP and Villin), there was no significant difference between the two algorithms over all proteins. Conclusions The extension of variance component analysis correctly estimated the variance components of protein concentration measurements in the case of an unbalanced design. Electronic supplementary material The online version of this article (10.1186/s12859-018-2075-8) contains supplementary material, which is available to authorized users.


Background
In recent years, there has been a growing interest in using high-throughput technologies to discover biomarkers. Because of the random sampling of the proteome within populations and the high false discovery rates, it became necessary to validate candidate biomarkers through quantitative assays [1]. ELISAs (Enzyme-Linked Immunosorbent Assays) have high specificities (because they often use two antibodies against the candidate biomarker) and high sensitivities that allow quantifying some biomarkers in human plasma. However, the limits of ELISA are the restricted possibility of performing multiple assays, the unavailability of antibodies for every new candidate biomarker, and the long and expensive development of new assays [2]. The absolute quantification of protein biomarkers by mass spectrometry (MS) has naturally emerged as an alternative [3]. Eckel-Passow et al. [4] have discussed the difficulties of achieving good repeatability and reproducibility in MS and expressed the need for more research dedicated to proteomics data, including signal processing, experimental design, and statistical analysis.
In selected reaction monitoring (SRM, a specific form of multiple reaction monitoring, MRM) [5], the issues are somewhat different and offer the opportunity to use variance component analysis to investigate repeatability, reproducibility, and other sources of variability [6]. However, when the data are unbalanced (unequal numbers of observations across factor combinations), classical methods cannot correctly estimate the various sources of variability, particularly in the presence of interaction.
The present paper proposes an extension of variance component analysis, via adjusted sums of squares, that correctly estimates the various sources of variability of protein concentration on unbalanced data. This analysis is applied within an experimental design that uses a serial dilution to generate known relative protein concentrations and involves a small number of controlled sources of variation. Two processing algorithms, a classical and a more recent one (namely, NLP and BHI, respectively), are used to estimate protein concentrations.
This analysis allowed an initial investigation of the performance of the new algorithm and a first comparison with the classical algorithm. In addition, the results given by the two algorithms are compared with those obtained by ELISA.

Experimental design
The experimental design is shown in Fig. 1. From each aliquot, two vials of 125 μL were taken for separate digestions. Labelled AQUA internal standards were added immediately before SRM-MS analysis, then two injections (readings) were performed on each vial. SRM readings of the 24 aliquots (6 samples × 2 digestions × 2 injections) were carried out over 4 couples of days. Each set of samples had to be "read" over a couple of days because of equipment-related constraints (SRM does not allow analyzing all the samples in a single day). To avoid unexpected or uncontrolled biases, the samples were read in random order and two chromatographic columns were used alternately (for more details on SRM, see Additional file 1).
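As a rough illustration of this layout, the following Python sketch enumerates the 24 readings and randomizes their order; the dilution labels, the assignment of readings to day-couples, and the column alternation rule are illustrative assumptions, not the study's actual schedule.

```python
import itertools
import random

# Hypothetical dilution labels; the actual serial-dilution steps are not listed here.
samples = [f"dilution 1/{2 ** k}" for k in range(1, 7)]

# 6 samples x 2 digestions x 2 injections = 24 readings.
readings = list(itertools.product(samples, (1, 2), (1, 2)))

random.seed(0)
random.shuffle(readings)  # readings performed in random order to avoid biases

for order, (sample, digestion, injection) in enumerate(readings):
    day_couple = order // 6 + 1              # 6 readings per day-couple (simplifying assumption)
    column = "A" if order % 2 == 0 else "B"  # two chromatographic columns used alternately
    print(f"day-couple {day_couple}, column {column}: "
          f"{sample}, digestion {digestion}, injection {injection}")
```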
In the methodology associated with the BHI algorithm, we used quality control (QC) measurements made daily before peptide reading to estimate the digestion yield. In parallel, each "extra" aliquot of dilution 1/4 was used on one reading day as QC measurement. From each "extra" aliquot, two vials of 125 μL were taken for digestion. One vial was processed at the start of the day and the other at the end; each vial was then injected twice, which led to the calibration of four digestion yields per day. The estimation of protein concentration with the BHI algorithm on a given day changes according to the digestion yield estimated the same day.
For the ELISA measurement of Liver-Fatty Acid Binding Protein (L-FABP), each sample provided five replicates. Each replicate was read four times, leading to 20 readings per sample. The samples were read in random order.

The BHI algorithm
The Bayesian Hierarchical Algorithm (BHI) is based on a full graphical hierarchical model of the SRM acquisition chain which combines biological and technical parameters (Fig. 2a, Tables 1 and 2).
To estimate all these parameters, two calibrations are required: the use of quality control (QC) samples measured each day for calibration (at protein level) and the use of AQUA peptides for calibration (at peptide level) (see section "Experimental design"). This set of measurements leads to a set of equations that captures the links between the unknown latent variables and parameters to estimate and the known SRM measurements. Estimating a protein concentration requires estimating, at the same time, the technical parameters included in the model. Table 1 shows all the parameters and variables involved in the description of the SRM analytical chain model. Let θ_tech = (κ, τ, λ, ξ, ϕ*, γ, γ*) be the set of latent technical parameters that describes the SRM acquisition chain. Table 2 shows the hierarchical model that links protein concentration y to the native transition signals I of native peptides and the labeled transition signals I* of AQUA peptides.
The BHI algorithm has to solve the inverse problem and compute protein concentration y and technical parameters θ tech . This problem is solved in a Bayesian framework [8][9][10][11][12][13][14][15]. Table 3 shows the distribution type used for each variable in this Bayesian framework.
To estimate the protein concentration and the parameters together, we used the native transition signals I along with the labeled transition signals I*. Regarding the labeled signals, the peptide concentration is known, but the transition gains and the inverse variance of the noise in the AQUA signal have to be estimated.
Using the distributions defined in Table 3, the full a posteriori distribution can be written, up to a normalizing constant, as p(y, θ_tech | I, I*) ∝ p(I, I* | y, θ_tech) p(y) p(θ_tech). The protein concentration and the parameters are estimated by the expectation of this a posteriori distribution (EAP), defined as (ŷ, θ̂_tech) = E[(y, θ_tech) | I, I*] = ∫ (y, θ_tech) p(y, θ_tech | I, I*) dy dθ_tech. Computing the EAP is achieved with methods based on a Markov chain Monte Carlo (MCMC) procedure and a hierarchical Gibbs structure. The algorithm sequentially draws a random sample of each parameter of (y, θ_tech) from its a posteriori distribution, conditionally on the previously sampled parameters, and iterates. The parameters are sampled in the following order: κ, τ, λ, ξ, ϕ*, γ, γ*. In the case of a Normal distribution, the sampling is achieved knowing explicitly the mean and the inverse variance of the distribution. In the case of a uniform distribution, the sampling is achieved using one iteration of a Metropolis-Hastings random walk. After a fixed number of iterations, the algorithm computes the empirical mean of each parameter beyond a warm-up index; this index defines the number of iterations needed for convergence towards the a posteriori distribution.
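The following minimal sketch shows the generic structure of such an EAP computation, assuming user-supplied conditional samplers; the actual BHI conditionals for κ, τ, λ, ξ, ϕ*, γ, and γ* are defined by the hierarchical model of Table 2 and are not reproduced here.

```python
import numpy as np

def gibbs_eap(init_state, conditional_samplers, n_iter=5000, warmup=1000, seed=0):
    """Generic hierarchical Gibbs sampler returning the EAP of each parameter.

    conditional_samplers maps each parameter name to a function
    (rng, state) -> new value drawn from its full conditional: an explicit
    Normal draw when the conditional is Normal, or one Metropolis-Hastings
    random-walk step when the prior is uniform.
    """
    rng = np.random.default_rng(seed)
    state = dict(init_state)
    draws = {name: [] for name in state}
    for it in range(n_iter):
        # Sample each parameter conditionally on the others, in a fixed scan order.
        for name, sampler in conditional_samplers.items():
            state[name] = sampler(rng, state)
        if it >= warmup:  # keep only post-warm-up draws
            for name, value in state.items():
                draws[name].append(value)
    # EAP = empirical mean of the draws kept after the warm-up index.
    return {name: np.mean(values, axis=0) for name, values in draws.items()}
```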
Up to this point, we supposed that the digestion yields are known. With BHI, we introduced a protocol for estimating the digestion yields from the control signals measured on the quality-control sample. We assumed that the digestion factors d_ip, defined by the number of peptides i present in protein p (i = 1, …, S; p = 1, …, P), are known. The digestion yield g_ip is defined as the correction factor to apply to the digestion factor d_ip to obtain the peptide-to-protein concentration ratio. Note here that a matrix formulation allows handling non-proteotypic peptides shared by several proteins. The control signals combine both native transition signals I_QC and labeled transition signals I*_QC. According to the above-described Bayesian algorithm, the unknown becomes the digestion yield of each peptide instead of the protein concentration. Here too, estimating the EAP calls for an MCMC algorithm with hierarchical Gibbs structure. This calibration is done once for each calibration batch by selecting one quality-control measurement. The process may be generalized to cases where several quality-control measurements are available by combining, within the EAP computation, the information delivered by each measurement. The BHI algorithm includes an automated selection to initialize the peak position, based on the set of transitions associated with each peptide: it computes the product of the traces and searches for the position of the maximum value of this product. This way, only the peaks present in all traces are detected.
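The peak-position initialization lends itself to a very short sketch (hypothetical helper name); only the product step and the argmax come from the description above.

```python
import numpy as np

def init_peak_position(traces):
    """traces: one row per transition trace of a peptide (same time grid)."""
    product = np.prod(np.asarray(traces, dtype=float), axis=0)  # point-wise product
    return int(np.argmax(product))  # a peak survives only if present in all traces
```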
The BHI algorithm fuses the information delivered by all traces, which improves its robustness when the number of traces is large. Indeed, processing algorithms for protein quantification generally perform best with proteins of ≥3 peptides and peptides with ≥3 transitions [16].

The NLP algorithm
The NLP algorithm (Fig. 2b) is based on the median value, over all transitions, of the log-transformation of the ratio native transition peak area/labeled transition peak area. This algorithm is derived from a gold standard algorithm used for oligonucleotide array analysis [17]. The peaks are detected by MultiQuant™ software (AB Sciex, France). These peaks are checked by an operator who decides whether a signal of the labeled AQUA internal peptide standard does not make sense and should be considered as missing, or whether a too low or absent signal of the native transition should be assigned the value 0.
The NLP algorithm uses, as input data, a normalized and log-transformed quantity t defined by t = Ln(1 + I/I*), where I represents the area under the peak of a given native transition and I* the area under the peak of the corresponding labeled transition.
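A minimal sketch of this computation (hypothetical function name), following the formula above: one t value per transition, then the median over transitions.

```python
import numpy as np

def nlp_estimate(native_areas, labeled_areas):
    """native_areas: peak areas I per transition; labeled_areas: areas I*."""
    I = np.asarray(native_areas, dtype=float)
    I_star = np.asarray(labeled_areas, dtype=float)
    t = np.log1p(I / I_star)    # t = Ln(1 + I/I*) per transition
    return float(np.median(t))  # median over all transitions
```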

Statistical modeling and analysis
In this article, the performance of each algorithm in SRM and the performance of ELISA were defined as the ability to find the concentrations generated by serial dilution. This ability was estimated by the linear slope and the variance decomposition of the linear model that links the measured to the theoretical protein concentration generated by dilution. The best performance corresponds to the highest part of dilution variance (explained by the dilution) and the lowest part of technical variance (explained by the measurement error and lab procedures).
Only proteins that have a correlation coefficient ≥ 0.7 between theoretical and measured concentration with either NLP or BHI algorithm were selected for the statistical analysis.

Linearity analysis
For each algorithm and each protein, a linear regression model was built to link the protein concentration y with the theoretical protein concentration x. A log2 transformation of the measurements was applied to stabilize the variance. Because of the two-fold dilution, the log2 transformation was applied to x and y. With this transformation, the regression line is expected to have a slope close to 1.
Because the reading on a given couple of days may influence the relationship between the measured and the theoretical concentration of each protein, the model included a slope and an intercept for each day-couple; this amounts to including an interaction term between protein concentration and day-couple. A fixed effects model was applied and 'sum to zero contrasts' were used to obtain estimates of the mean intercept and the mean slope as follows (Model 1S):

log2(y_ijr) = (β_0 + β_0j) + (β_1 + β_1j) log2(x_i) + ε_ijr, with Σ_j β_0j = Σ_j β_1j = 0,

where i, j, and r correspond, respectively, to the sample, the day-couple D_j, and the digestion-injection step. Parameters β_0 and β_1 are, respectively, the mean intercept and the mean slope of the regression line between the log2 values of the measured protein concentrations and the log2 values of the theoretical protein concentrations, β_0j and β_1j being, respectively, the two-day-reading effects on the mean intercept and the mean slope.
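The statistical software used is not specified here; as an illustration, a model of this form can be fitted in Python with statsmodels, where C(day_couple, Sum) imposes the sum-to-zero contrasts (the data below are toy values, not the study's measurements).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# Toy data: 6 two-fold dilutions, 4 day-couples, 2 readings per combination.
x = np.repeat([1.0, 2.0, 4.0, 8.0, 16.0, 32.0], 8)
day = np.tile(np.repeat([1, 2, 3, 4], 2), 6)
y = x * 2 ** rng.normal(0.0, 0.1, x.size)  # measured = theoretical + log2-scale noise
df = pd.DataFrame({"log2_y": np.log2(y), "log2_x": np.log2(x), "day_couple": day})

# With sum-to-zero contrasts, the Intercept and log2_x coefficients estimate the
# mean intercept beta_0 and mean slope beta_1; the interaction terms give the
# day-couple effects beta_0j and beta_1j.
model_1s = smf.ols("log2_y ~ log2_x * C(day_couple, Sum)", data=df).fit()
print(model_1s.params[["Intercept", "log2_x"]])  # mean slope expected close to 1
```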
In parallel, a log2 transformation was also applied to the ELISA measurements. These measurements were then analyzed by a linear model (Model 1E) that included the theoretical concentration x, the reading order T, and the interaction between them:

log2(y_ijr) = (β_0 + β_0j) + (β_1 + β_1j) log2(x_i) + ε_ijr,

where i, j, and r correspond, respectively, to the sample, the reading order T_j, and the replicate.

Variance decomposition
In this work, the data processed by the NLP algorithm included null intensities and missing values (see Additional file 2). These values were excluded after log2 transformation. As their number differed between day-couples, the data were considered unbalanced.
To quantify the components of the variance, we calculated adjusted sums of squares by comparing the complete Model 1S with each of its nested models. Model 2S included only the effect of the theoretical concentration, Model 3S only the effect of the two-day measurement, and Model 4S both effects without interaction:

Model 2S: log2(y_ijr) = β_0 + β_1 log2(x_i) + ε_ijr
Model 3S: log2(y_ijr) = β_0 + β_0j + ε_ijr
Model 4S: log2(y_ijr) = (β_0 + β_0j) + β_1 log2(x_i) + ε_ijr

Table 4 and Fig. 3 present the components of the analysis of variance. The dilution variance plus its interaction with the two-day measurement effect was calculated as the difference between the residual sums of squares of Model 3S and Model 1S. The lab procedure variance, which corresponds to the variance explained by the two-day measurement effect plus its interaction with the theoretical concentration, was calculated as the difference between the residual sums of squares of Model 2S and Model 1S. The variance explained by the sole interaction between the theoretical concentration and the two-day measurement was calculated as the difference between the residual sums of squares of Model 4S and Model 1S.
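Continuing the toy example above, the adjusted sums of squares follow by fitting the nested models and differencing their residual sums of squares (a sketch of the decomposition described above, not the study's code).

```python
# Reuses df and smf from the previous sketch.
m1 = smf.ols("log2_y ~ log2_x * C(day_couple, Sum)", data=df).fit()  # Model 1S (complete)
m2 = smf.ols("log2_y ~ log2_x", data=df).fit()                       # Model 2S (dilution only)
m3 = smf.ols("log2_y ~ C(day_couple, Sum)", data=df).fit()           # Model 3S (day-couple only)
m4 = smf.ols("log2_y ~ log2_x + C(day_couple, Sum)", data=df).fit()  # Model 4S (no interaction)

ss_total = m1.centered_tss
ss_dilution = m3.ssr - m1.ssr     # dilution + its interaction with day-couple
ss_lab = m2.ssr - m1.ssr          # day-couple effect + its interaction with dilution
ss_interaction = m4.ssr - m1.ssr  # interaction alone

# Each component expressed as a percentage of the total sum of squares.
for name, ss in [("dilution", ss_dilution), ("lab procedure", ss_lab),
                 ("interaction", ss_interaction), ("residual", m1.ssr)]:
    print(f"{name}: {100 * ss / ss_total:.1f}%")
```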
The residual variance was split into two components [18]: 1) the measurement error due to instrumental and algorithmic errors, which was calculated as the sum of the squares of the differences between the injection replicate values and their mean, and 2) the lack of fit of the model.
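A sketch of this split on the same toy data: the measurement-error (pure-error) sum of squares comes from deviations of replicate readings around their cell mean (here, the pairs within each dilution-by-day-couple cell stand in for the injection replicates), and the lack of fit is the remainder of the Model 1S residual sum of squares.

```python
# Reuses df and m1 from the previous sketches.
cell_mean = df.groupby(["log2_x", "day_couple"])["log2_y"].transform("mean")
ss_measurement_error = ((df["log2_y"] - cell_mean) ** 2).sum()  # pure error
ss_lack_of_fit = m1.ssr - ss_measurement_error                  # remainder of the residual
```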
For ELISA, the same analysis of variance was applied to Model 1E. Each component of the sum of squares was divided by the total sum of squares and expressed as a percentage, which allowed comparing the three methods (ELISA and the two processing algorithms for SRM).
Two Wilcoxon signed-rank tests were used on all proteins to test, first, the difference between the parts of dilution variance, then the difference between the parts of technical variance given by the two processing algorithms. These two tests are not independent and would correspond to a single test in the absence of interaction.
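A sketch of this paired comparison with scipy (the variance shares below are toy values, not the study's results):

```python
from scipy.stats import wilcoxon

# Per-protein parts of dilution variance under each algorithm (toy values).
dilution_nlp = [0.80, 0.75, 0.90, 0.60, 0.85, 0.70, 0.88, 0.65, 0.72]
dilution_bhi = [0.85, 0.70, 0.92, 0.62, 0.80, 0.68, 0.83, 0.70, 0.75]

stat, p_value = wilcoxon(dilution_nlp, dilution_bhi)  # paired signed-rank test
print(p_value)
```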

Results
Over all protein reads, the correlation coefficient between the theoretical and the measured protein concentration was ≥0.7 in 9 out of 21 proteins: L-FABP, 14.3.3 sigma, Calgi, Def.A6, Villin, Calmo, I-FABP, Peroxi-5, and S100A14 (Additional files 3 and 4). The correlation coefficient was ≥0.7 with both BHI and NLP for 14.3.3 sigma, Calgi, Def.A6, and Villin. This coefficient was ≥0.7 with NLP only for Calmo, I-FABP, Peroxi-5, and S100A14, and with BHI only for L-FABP. Table 5 and Additional file 5 summarize the analysis of variance in each linear model relative to each of the 9 above-cited proteins. Table 5 also shows the mean slopes of these linear models.

Linearity analysis
For L-FABP and Villin, the BHI algorithm gave a higher dilution variance and a lower technical variance than the NLP algorithm. In addition, the mean slope of Model 1S was closer to 1 with the BHI algorithm than with the NLP algorithm.
The BHI algorithm gave mixed results with the other proteins, those with fewer than three peptides. For 14.3.3 sigma and Calgi, the BHI algorithm gave a higher dilution variance and a lower technical variance than the NLP algorithm. Def.A6 gave similar results with both algorithms. The BHI algorithm gave a lower dilution variance and a higher technical variance than the NLP algorithm with Calmo, I-FABP, Peroxi-5, and S100A14. Moreover, for 14.3.3 sigma, Calgi, Calmo, I-FABP, Peroxi-5, and S100A14, the mean slope of Model 1S was closer to 1 with the NLP than with the BHI algorithm.
However, over all proteins, the dilution variances and the technical variances were not significantly different between the two algorithms (p-value = 0.35 in both comparisons, Wilcoxon signed-rank test).
With L-FABP, ELISA gave a higher dilution variance and a lower technical variance than SRM with either algorithm. Besides, the mean slope of Model 1E was closer to 1 than the mean slopes obtained with Model 1S with either the BHI or the NLP algorithm.

Technical variance components
The measurement error represented the highest part of the technical variance with BHI for L-FABP, 14.3.3 sigma, Calgi, Def.A6, and Villin, and with NLP for Calmo, I-FABP, Peroxi-5, and S100A14. The other components of the technical variance (i.e., the two-day measurement and the interaction between this measurement and the theoretical concentration) included a variability of the intercept and a variability of the slope of the regression lines relative to the 4 day-couples. Figures 4 and 5 show the relationships between the theoretical and the measured protein concentrations on the log2-log2 scale for the 9 proteins with algorithms NLP and BHI, respectively. Figure 6 shows the relationship between the theoretical and the measured L-FABP concentration with ELISA. In these figures, for L-FABP, 14.3.3 sigma, Calgi, Def.A6, and Villin, the four regression lines relative to the 4 day-couples were more grouped with BHI than with NLP. This means that the part of the variance due to the two-day measurement process and the interaction between this process and the theoretical concentration is smaller with BHI than with NLP (also shown in Table 5). For Def.A6, this part of the variance was very low with both algorithms; thus, the part due to the measurement error was the highest part of the technical variance.
In Figs. 4 and 5, for Calmo, I-FABP, Peroxi-5, and S100A14, the four regression lines relative to the 4 day-couples were more grouped with NLP than with BHI. This means that the part of the variance due to the two-day measurement process and the interaction between this process and the theoretical concentration is smaller with NLP than with BHI.

Discussion
The present article proposes an extension of variance component analysis, via adjusted sums of squares, that correctly estimates the various sources of variability on unbalanced data. This analysis allows estimating separately the dilution variability and the technical variability. In an application to protein concentration estimation by two processing algorithms (NLP and BHI), this extension allowed quantifying and comparing algorithm performance. The results showed that the performance of each algorithm, as reflected by the dilution and the technical variance, depended on the protein and that, over all proteins, there was no significant difference between the two algorithms.
Other statistical modeling frameworks have been proposed for protein quantification in SRM experiments. SRMstats [19] uses a linear mixed-effects model to compare distinct groups (or specific subjects from these groups). Its primary output is a list of proteins that are differentially abundant across settings, and the dependent variables are the log-intensities of the transitions. Here, a simple linear model was used to recover the theoretical protein concentrations generated by serial dilution; the primary outputs are the components of the variance (essentially, the variance component explained by the serial dilution) and the dependent variable is the protein concentration estimated by the quantification algorithm on the basis of the ratio of native to labeled transitions (see sections "The BHI algorithm" and "The NLP algorithm"). In the publication of Xia et al. [6], the reproducibility of the SRM experiment was assessed by decomposing the variance into parts attributable to different factors using mixed-effects models. The sequential ANOVA was used to quantify the variance components of the fixed effects. However, when the data are unbalanced, the sequential ANOVA cannot correctly estimate the different parts of the variance: with balanced data, one factor can be held constant whereas the other is allowed to vary independently, but this desirable property of orthogonality is usually lost with unbalanced data, which generates correlations between factors. With such data, the use of adjusted sums of squares (Type II and Type III sums of squares in some statistical analysis programs) [20][21][22][23] is an appropriate alternative to the sequential sums of squares. With Type II, each effect is adjusted on all other terms except its interactions; one limitation is thus that this approach is not applicable in the presence of interactions. With Type III, each effect is adjusted on all other terms including its interactions, but one major criticism is that some nested models used for estimating the sums of squares are unrealistic [24] (e.g., with three factors A, B, and C, the model that includes the interaction A*B*C does not include all the effects A, B, and C). Here, we propose an approach that responds to this criticism.
In SRM protein quantification, the BHI algorithm revealed a higher part of the whole dilution variance and a lower part of the whole technical variance vs. NLP with L-FABP and Villin, the only two proteins with three peptides. This is one limitation of the present study: a fuller assessment of the performance of the BHI algorithm would require other proteins with three or more peptides.
In comparing the two algorithms on proteins with fewer than three peptides, the differences in measurement results may be explained by differences in the properties of these algorithms. Firstly, the NLP algorithm is supervised and assesses the quality of both the native and the labeled transitions before estimating their ratio (spectrum visualization by an operator), whereas the BHI algorithm is automatic and gives directly an estimation of this ratio, including a weighting of each transition according to the estimated level of noise. When the signals of the labeled transition are not detectable, the measures do not make sense and are considered missing by the NLP algorithm but give incorrect values with the BHI algorithm. For the Peroxi-5 and S100A14 proteins, 27% and 36% of the values of the labeled transitions measured by the NLP algorithm (see Additional file 2) were discarded by the operator but read by the BHI algorithm. This may explain why these proteins gave poor results with the BHI algorithm. Thus, it would be interesting to compare the two algorithms only on spikes selected by the operator.
The BHI algorithm (but not the NLP) accounts for the variability stemming from the SRM pre-analytic step (precisely, the digestion step) by using QC samples to estimate the daily digestion yield. This may reduce the variability due to the two-day measurement process, but not systematically: in the presence of endogenous proteins with fragments, the ratio of native to labeled transitions may be altered, which leads to incorrect estimations of the digestion yields. Other strategies to monitor the digestion step should then be used to estimate correctly the digestion yield, such as the addition of a Protein Standard Absolute Quantification (PSAQ) [25]. Spiking isotopically labelled proteins has the advantage that incomplete or unspecific digestion does not corrupt the results; this corruption may occur when labelled AQUA internal standards are used.
In each algorithm, when a linear relationship was not clearly observed (correlation coefficient < 0.70 between the theoretical and the measured protein concentration), the protein concentration was assumed to be below the analytical limit of detection. Indeed, the SRM sensitivity depends not only on the protein amount but also on the tryptic peptide sequence and the matrix effect [26].