Analyzing 2D gel images using a two-component empirical Bayes model
- Feng Li^{1, 2} and
- Françoise Seillier-Moiseiwitsch^{3}
https://doi.org/10.1186/1471-2105-12-433
© Li and Seillier-Moiseiwitsch; licensee BioMed Central Ltd. 2011
Received: 4 May 2011
Accepted: 8 November 2011
Published: 8 November 2011
Abstract
Background
Two-dimensional polyacrylamide gel electrophoresis (2D gel, 2D PAGE, 2-DE) is a powerful tool for analyzing the proteome of an organism. Differential analysis of 2D gel images aims at finding proteins that change under different conditions, which leads to large-scale hypothesis testing as in microarray data analysis. Two-component empirical Bayes (EB) models have been widely discussed for large-scale hypothesis testing and applied in the context of genomic data, but they have not been implemented for the differential analysis of 2D gel data. In the literature, the mixture and null densities of the test statistics are estimated separately, and the estimation of the mixture density does not take into account assumptions about the null density. Thus, there is no guarantee that the estimated null component will be no greater than the mixture density, as it should be.
Results
We present an implementation of a two-component EB model for the analysis of 2D gel images. In contrast to the published estimation method, we propose to estimate the mixture and null densities simultaneously using a constrained estimation approach, which relies on an iteratively re-weighted least-squares algorithm. The assumption about the null density is naturally taken into account in the estimation of the mixture density. This strategy is illustrated using a set of 2D gel images from a factorial experiment. The proposed approach is validated using a set of simulated gels.
Conclusions
The two-component EB model is very useful for large-scale hypothesis testing. In proteomic analysis, the theoretical null density is often not appropriate. We demonstrate how to implement a two-component EB model for analyzing a set of 2D gel images. We show that it is necessary to estimate the mixture density and empirical null component simultaneously. The proposed constrained estimation method always yields valid estimates and more stable results. The proposed estimation approach can be applied to other contexts where large-scale hypothesis testing occurs.
Background
Complementing functional genomics, proteomics deals with the large-scale analysis of proteins expressed by a tissue under specific physiological conditions. A broad range of technologies is used in proteomics, but the central paradigm has been the use of a method for separating mixtures of proteins followed by identification of the proteins by mass spectrometry (MS). Two-dimensional polyacrylamide gel electrophoresis (2D PAGE, 2D gel, 2-DE) remains very popular, despite the availability of other powerful separation techniques. With 2D PAGE [1], proteins are separated in one dimension according to their molecular mass and in the orthogonal dimension according to their isoelectric point. In theory, each protein is uniquely determined by its location along the two dimensions of separation. The separated proteins are then stained with fluorescent dyes so that they are amenable to imaging. Proteomic differences across multiple samples can be studied by comparing the expression profiles across sets of gels.
The main steps in differential analysis of two-dimensional gels involve image de-noising, spot detection, spot quantification, spot matching and statistical analysis, which were discussed in detail in [2]. Unlike the analysis of microarray data, the statistical differential analysis of 2D gel images is still in its infancy. The main difficulties are the discrimination between actual protein spots and noise, the quantification of protein expression levels thereafter, and spot matching for individual comparison. Although there are commercial software packages for 2D gel image analysis (e.g. PDQuest, Dymension), considerable human intervention is required for spot matching. Spot matching is the process by which one maps a spot on a particular gel to the corresponding spots on the other gels so that spots corresponding to the same protein are identified. With a larger number of images, this step becomes increasingly problematic as fewer spots are matched and the analysis is performed on sparser data [3]. Moreover, in available software packages, the comparison of the quantitative features is based on classical tests, such as the t-test or the F-test. Attempts have been made to avoid image segmentation and spot quantification. Models based on image pixels [4] are not practical given the huge number of pixels, high variation in the background intensity and sensitivity to misalignment.
Recently, academic software was developed to cope with difficulties in the analysis pipeline including protein spot detection, quantification and spot matching [3, 5, 6]. To improve the spot-detection results and avoid spot matching, the methods in [3, 6] utilize the mean gel image as the template for locating spots. The Pinnacle method [3] uses a fixed window for spot detection, quantification and background separation. The approaches in [5, 6] rely on the watershed transform [7] for spot segmentation and quantification. The RegStatGel software [6] provides advanced statistical tools. Comparison of different software for protein spot quantification is beyond the scope of the current paper. We shall focus on the statistical analysis, assuming that spot quantification has been performed appropriately. For convenience, we employ RegStatGel [6] to obtain spot quantification for statistical analysis of the set of gel images considered in this paper.
Since hundreds or thousands of proteins are usually featured on a gel, once proteins are quantified, we are faced with a large-scale hypothesis-testing problem. The RegStatGel software [6] applies the Benjamini-Hochberg (BH-FDR) procedure [8] in combination with multivariate analysis for identifying significantly changed proteins. The BH-FDR procedure is widely used for selecting the p-value threshold to control the false discovery rate (FDR). Under the assumptions that tests are independent or weakly dependent and that the null distribution of the p-values is uniform, the BH-FDR procedure controls the false discovery rate at a given level. In practice, however, these two assumptions are often invalid. Strong dependence usually exists, especially in genomics and proteomics [9], where the dependencies themselves are also of interest. Considerable effort has been dedicated to the estimation of the proportion of true null hypotheses and of the false discovery rate at a given p-value threshold [10–19]. The empirical Bayes methodology and closely related methods exploiting a two-component mixture model [10, 15, 20, 21] represent typical examples of such effort. The two-component EB model assumes that a test statistic follows either the null or the non-null distribution.
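As a point of reference for the BH-FDR procedure mentioned above, the following is a minimal sketch of the standard step-up rule of [8]. The function name `benjamini_hochberg` and its arguments are illustrative choices, not part of RegStatGel or any other package discussed here.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Return one reject flag per hypothesis using the BH step-up rule."""
    n = len(p_values)
    # Indices of the p-values sorted in increasing order.
    order = sorted(range(n), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= k * q / n.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / n:
            largest_k = rank
    # Reject the hypotheses with the largest_k smallest p-values.
    reject = [False] * n
    for idx in order[:largest_k]:
        reject[idx] = True
    return reject
```

The rule only controls the FDR at level q under the independence (or weak dependence) and uniform-null assumptions stated above, which is the limitation that motivates the empirical null.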
It has been commonly assumed that the null distribution of the test statistics follows a known theoretical distribution. However, Efron [12–15] pointed out that in large-scale hypothesis testing the theoretical null distribution often does not hold, for reasons including incorrect model assumptions, unobserved covariates and correlations among test statistics. It is more appropriate to estimate the null density of the test statistics directly from the data instead of using the theoretical null density. Using the two-component empirical Bayes (EB) model, Efron [12–15] proposed to estimate the mixture density from the entire histogram and the null component from data around the central peak of the mixture density. The two-component EB model aims at separating a small subset of interesting cases from a large group of uninteresting cases. Efron's innovative concept and estimation approach have been thoroughly discussed [22–26]. The locfdr R package [27] was developed to estimate the two-component model using Poisson regression and to compute the local false discovery rate (local FDR).
Two methods [12, 15] were proposed to estimate the null component. One is based on finding an optimal normal approximation to the mixture density around the central peak of the histogram, and the other on maximum-likelihood estimation. In both methods, the mixture density and the null component are estimated separately. The estimation of the mixture density does not take assumptions about the null density into account. Thus, there is no guarantee that the estimated null component is no greater than the mixture density over the entire domain. The two approaches may result in the estimated local FDR having multiple peaks or its being greater than 1 [25]; neither is desirable. We present a modified estimation method for the two-component EB model: the null and the mixture densities are estimated simultaneously with a necessary constraint, which can be achieved with a constrained iteratively re-weighted least squares (IRLS) algorithm. The proposed methodology is applied to the analysis of a set of 2D gel images from a factorial experiment. Simulation studies are conducted to further validate and investigate the performance of the proposed approach.
Methods
Data
To investigate the effect of nicotine exposure on the proteome of spleen cells of female and male rats, a 2 × 2 factorial design with gender and treatment (nicotine exposure) factors was used with 3 rats in each experimental group. Spleen cells from the control and treated rats were harvested on post-natal day 65 and then cultured in the presence of concanavalin A. After 4 days in culture, cell pellets were lysed and solubilized directly in rehydration buffer. Lysates were aliquoted and stored frozen at -80°C. Samples were thawed and 20 μg protein from each sample was applied to a pH 4-7 immobilized pH gradient strip (IPG; Amersham Biosciences/GE Healthcare) by overnight rehydration. Isoelectric focusing was performed using an IPGphor IEF system (Amersham Biosciences/GE Healthcare) with voltage increased gradually from 500 to 800 V and then kept constant at 8000 V for 4 hours. Separation in the second dimension was performed on 12.5% Excel prepared gels specifically made for the Multiphor II apparatus (Amersham Biosciences/GE Healthcare) and run at 40 mA for 35 minutes followed by 100 mA for 1.25 hours. Gels were silver stained (Amersham Plus One silver stain kit) and imaged using a UMACS Power Look 3 scanner (Amersham).
Figure 1 shows four images, each from a different experimental group. The top row has examples of control rats and the bottom row of rats exposed to nicotine. The left column has examples for female rats and the right column of male rats. First, the images were aligned using the algorithm described in [28]. After alignment, boundaries for the interesting portion of the images were set and the region outside these boundaries was cropped.
The objective is to find proteins that changed in quantity under exposure to nicotine or show a gender effect. The next steps would be to determine the sequence of the differentially expressed proteins by mass spectrometry and to refer these sequences to a database of protein sequences in order to identify them and investigate their functions.
The proteins were detected and quantified using the default settings of the RegStatGel software [6, 29]. Specifically, the watershed algorithm was applied to the mean image to generate a master watershed map, which was then imposed onto each individual gel image. Each watershed region contains a single object, either a single spot or an aggregate of two spots (a rare occurrence). The pixels in each region are then classified as belonging either to the object or to the background using Otsu's method [30]. The mean intensity difference between the object and background serves as a summary statistic for each region and therefore for each protein (or aggregate), and is used for comparison across images. The RegStatGel software is fast, easy to use and has comparable performance to commercial software packages [29]. Note that other free programs such as Pinnacle [3] can also be used for protein quantification.
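The per-region summary described above can be illustrated with a small sketch. It is not the RegStatGel implementation: the function names are hypothetical, the region is passed as a flat, non-empty list of pixel intensities, and we assume that spot pixels have higher intensity than the background (for images where spots are darker, the intensities would be inverted first).

```python
from typing import List


def otsu_threshold(pixels: List[float]) -> float:
    """Cut point that maximizes the between-class variance (Otsu's criterion)."""
    values = sorted(set(pixels))
    n = len(pixels)
    total = sum(pixels)
    best_t, best_var = values[0], -1.0
    # Try each distinct intensity except the largest as a cut point:
    # background = pixels <= t, object = pixels > t.
    for t in values[:-1]:
        low = [p for p in pixels if p <= t]
        w0 = len(low) / n
        w1 = 1.0 - w0
        mu0 = sum(low) / len(low)
        mu1 = (total - sum(low)) / (n - len(low))
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def spot_summary(pixels: List[float]) -> float:
    """Mean object intensity minus mean background intensity for one region."""
    t = otsu_threshold(pixels)
    background = [p for p in pixels if p <= t]
    spot = [p for p in pixels if p > t]
    if not spot:  # constant region: nothing to separate from the background
        return 0.0
    return sum(spot) / len(spot) - sum(background) / len(background)
```

This exhaustive search is quadratic in the number of distinct intensities, which is acceptable for a single watershed region; a histogram-based version would be preferable for whole images.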
where S_{ i } is the pooled sample variance and t_{ i } follows the t-distribution with df = 4(n - 1) degrees of freedom under the null hypothesis that τ_{ i } = 0. The test statistics for the gender and interaction effects follow the same t-distribution under the null hypothesis. Let z_{ i } = Φ^{-1}(F_{ df }(t_{ i })), where F_{ df } is the cumulative t_{ df } distribution. Theoretically, under the null hypothesis, z_{ i } follows the standard normal distribution.
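The transformation z_i = Φ^{-1}(F_df(t_i)) can be sketched with the Python standard library alone. This is an illustration under stated assumptions: the helper names are ours, and the t cumulative distribution is obtained by simple numerical integration (Simpson's rule) rather than by a statistical library, which is adequate for moderate |t| but not tuned for extreme tails.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """F_df(t) via Simpson's rule on [0, |t|] and symmetry about zero."""
    a = abs(t)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


def t_to_z(t: float, df: float) -> float:
    """z = Phi^{-1}(F_df(t)); the probability is clipped away from 0 and 1."""
    p = min(max(t_cdf(t, df), 1e-15), 1 - 1e-15)
    return NormalDist().inv_cdf(p)
```

With n = 3 rats per group, df = 4(n - 1) = 8, so a statistic near the 97.5% point of the t_8 distribution (about 2.306) maps to a z-value near 1.96.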
Two-component Empirical Bayes Model
where p_{0} is the prior probability that z_{ i } complies with the true null hypothesis, f_{0}(z_{ i }) is the null density and f_{1}(z_{ i }) is the density under the alternative hypothesis. This model is very popular in the literature on differential analysis of microarray data, where most authors assume the null density is the theoretical null density.
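For reference, the definitions above correspond to the standard two-groups formulation. The display below is a reconstruction from those definitions and from the local FDR expression used later in the text, not a verbatim copy of the original equations:

```latex
f(z) = p_0 f_0(z) + (1 - p_0) f_1(z),
\qquad
fdr(z) = \frac{p_0 f_0(z)}{f(z)} .
```

Because (1 - p_0) f_1(z) is non-negative, the null component p_0 f_0(z) can never exceed the mixture density f(z), so a valid local FDR lies between 0 and 1.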
To estimate the local FDR, we must estimate the unknown p_{0}, f_{0} and f. Theoretically, f_{0} should be the N(0, 1) density. However, for many reasons, this theoretical null density may not be valid in practice. For example, strong correlations among tests or covariates unaccounted for in the model will invalidate the usual assumptions [12–15]. Moreover, when the majority of tests show small effects, it is sounder to select the relatively more interesting effects by comparing larger effects to smaller effects rather than to the theoretical zero effects. Therefore, it is more appropriate to estimate the null density of the test statistics directly from the data instead of using the theoretical null distribution.
where Δ is the width of the bin and N is the total number of tests. log(ν_{ j }) can be modeled using a polynomial function at x_{ j } or a natural cubic spline and estimated using standard generalized linear models (GLM) for Poisson observations.
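The binning step behind this Poisson regression can be sketched as follows. We assume the usual relation ν_j = N Δ f(x_j) between the expected count in bin j and the mixture density at the bin midpoint x_j; the function names are illustrative only.

```python
from typing import List, Tuple


def bin_counts(z: List[float], lo: float, hi: float, n_bins: int
               ) -> Tuple[List[float], List[int], float]:
    """Return bin midpoints x_j, counts m_j and bin width Delta on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for value in z:
        if lo <= value <= hi:
            # Values equal to hi are placed in the last bin.
            j = min(int((value - lo) / width), n_bins - 1)
            counts[j] += 1
    mids = [lo + (j + 0.5) * width for j in range(n_bins)]
    return mids, counts, width


def counts_to_density(nu: List[float], n_total: int, width: float) -> List[float]:
    """Invert nu_j = N * Delta * f(x_j) to obtain density values f(x_j)."""
    return [v / (n_total * width) for v in nu]
```

Applying `counts_to_density` to fitted values of ν_j from the Poisson regression gives the smooth estimate of f, while applying it to the raw counts simply rescales the histogram.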
Efron's estimation methods for the empirical null distribution
Both the central matching (CME) and the maximum likelihood (MLE) methods of estimation are implemented in the locfdr R package [15, 27]. MLE is somewhat more stable but can be more biased than CME. Efron [12] shows that CME yields nearly unbiased estimates.
Central matching
p_{0}, δ, and σ can be estimated from $\widehat{\beta}_{0}$, $\widehat{\beta}_{1}$, and $\widehat{\beta}_{2}$. The local FDR at z is then estimated by $\widehat{fdr}(z) = \widehat{p_{0} f_{0}}(z) / \widehat{f}(z)$. The quadratic curve is obtained by finding a least-squares approximation to the estimated $\log(\widehat{f}(z))$ using bins in a selected interval [a, b] containing null z_{ i }'s.
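The mapping from the quadratic coefficients to the null parameters can be made explicit. The sketch below assumes the parameterization log(p_0 f_0(z)) = β_0 + β_1 z + β_2 z^2 with f_0 the N(δ, σ^2) density, which gives β_2 = -1/(2σ^2), β_1 = δ/σ^2 and β_0 = log p_0 - δ^2/(2σ^2) - (1/2) log(2πσ^2). The function name is hypothetical.

```python
import math
from typing import Tuple


def null_params_from_quadratic(b0: float, b1: float, b2: float
                               ) -> Tuple[float, float, float]:
    """Map (beta0, beta1, beta2) to (p0, delta, sigma); requires beta2 < 0."""
    if b2 >= 0:
        raise ValueError("beta2 must be negative for a normal-shaped null")
    sigma2 = -1.0 / (2.0 * b2)          # from beta2 = -1 / (2 sigma^2)
    delta = b1 * sigma2                 # from beta1 = delta / sigma^2
    log_p0 = (b0 + delta ** 2 / (2.0 * sigma2)
              + 0.5 * math.log(2.0 * math.pi * sigma2))
    return math.exp(log_p0), delta, math.sqrt(sigma2)
```

Nothing in this mapping forces the recovered p_0 to be at most 1, which is one way an unconstrained fit can produce invalid estimates.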
Maximum likelihood estimation
where ϕ denotes the normal density. The estimates of p_{0}, δ, and σ can be obtained by maximizing this likelihood.
Constrained Estimation Approach
where q(x; β) is a quadratic function with parameter β.
where L(m_{ j }, x_{ j }; θ) = -exp{s(x_{ j }; θ)} + m_{ j } s(x_{ j }; θ). L(m_{ j }, x_{ j };θ) is the Poisson log likelihood for bin j, omitting the constant term unrelated to the parameter θ. q(x; β) is the best quadratic approximation to s(x; θ) based on bins in [a, b]. To solve this, the parameter β must be expressed as a function of θ. Below, we show how to re-write the constraint in terms of the spline parameter θ.
where Γ is a J × D matrix whose entry in row j and column d is Γ(j, d) = B_{ d }(x_{ j }). Similarly, we denote the values of the spline at bins in [a, b] in vector form as S_{0}(θ) = Γ_{0}θ, where Γ_{0} is the corresponding sub-matrix of Γ. S_{0}(θ) approximates the null component of the mixture density. Let q(x; β) = ω(x)'β be a quadratic function, where ω(x) = [1, x, x^{2}]' and β = [β_{1}, β_{2}, β_{3}]'. The values of the quadratic function at all bin midpoints can be written in vector form as Q(β) = Ωβ, where Ω is the J × 3 matrix whose j th row is ω(x_{ j })' for j = 1, ..., J. Similarly, we denote the values of the quadratic function at bin midpoints in [a, b] as Q_{0}(β) = Ω_{0}β, where Ω_{0} is the submatrix of Ω corresponding to bins in [a, b].
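In this notation, the constrained problem can be sketched as follows. The display is a reconstruction from the surrounding definitions (the Poisson log likelihood L, the spline values Γθ, the quadratic values Ωβ, and the least-squares link between β and θ), not a verbatim restoration of the original equation; the matrix name A is introduced here only for illustration.

```latex
\begin{aligned}
\max_{\theta}\; & \sum_{j=1}^{J} L(m_j, x_j; \theta)
  \quad \text{subject to} \quad
  q(x_j; \beta) \le s(x_j; \theta), \quad j = 1, \dots, J, \\
\beta &= (\Omega_0' \Omega_0)^{-1} \Omega_0' \Gamma_0 \theta
  \quad \Longrightarrow \quad
  A \theta \ge 0, \qquad
  A = \Gamma - \Omega (\Omega_0' \Omega_0)^{-1} \Omega_0' \Gamma_0 .
\end{aligned}
```

Written this way, the requirement that the null component stay below the mixture density at every bin becomes a set of J linear inequality constraints on the spline parameter θ.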
The above problem is solved by means of non-linear programming. A simple computational algorithm for estimating the null and mixture densities is to modify the iteratively reweighted least-squares (IRLS) procedure [32] for Poisson regression by adding the constraint to the weighted least-squares regression. In our experience, the IRLS algorithm converges quickly.
The pseudo code for the modified IRLS algorithm is as follows:
/* Initialization of deviance Dev and oldDev */
Dev = 100000, oldDev = 0
/* Initialization of estimation of ν_{ k } */
While (|Dev - oldDev| > tolerance)
{
/* Update weights */
w_{ j } = ν_{ j }
/* Constrained weighted regression */
ν_{ j } = exp{s(x_{ j }; θ)}
/* Update Poisson deviance */
oldDev = Dev
Dev = 2Σ{m_{ j } log(m_{ j }) - m_{ j } - (m_{ j } log(ν_{ j }) - ν_{ j })}
}
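The loop structure above can be illustrated with a runnable sketch of plain (unconstrained) IRLS for Poisson regression. Several simplifications are assumptions of this sketch and not part of the proposed method: a polynomial basis replaces the natural spline, the starting values ν_j = m_j + 0.5 are our choice, and the weighted regression is solved without the inequality constraint. In the proposed algorithm, the marked weighted least-squares step would be replaced by its constrained counterpart.

```python
import math
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def poisson_irls(x: Sequence[float], m: Sequence[float], degree: int = 2,
                 tol: float = 1e-8, max_iter: int = 100
                 ) -> Tuple[List[float], List[float], float]:
    """Fit log(nu_j) = sum_d theta_d * x_j**d to counts m_j by IRLS."""
    p = degree + 1
    basis = [[xj ** d for d in range(p)] for xj in x]
    nu = [mj + 0.5 for mj in m]          # assumed positive starting values
    theta = [0.0] * p
    dev, old_dev = 100000.0, 0.0
    for _ in range(max_iter):
        if abs(dev - old_dev) <= tol:
            break
        w = nu[:]                        # update weights: w_j = nu_j
        # Working response of the log-link Poisson model.
        y = [math.log(v) + (mj - v) / v for v, mj in zip(nu, m)]
        # Weighted least squares (unconstrained in this sketch).
        xtwx = [[sum(wj * bj[r] * bj[c] for wj, bj in zip(w, basis))
                 for c in range(p)] for r in range(p)]
        xtwy = [sum(wj * bj[r] * yj for wj, bj, yj in zip(w, basis, y))
                for r in range(p)]
        theta = solve(xtwx, xtwy)
        nu = [math.exp(sum(t * b for t, b in zip(theta, bj))) for bj in basis]
        # Update the Poisson deviance.
        old_dev = dev
        dev = 2.0 * sum((mj * math.log(mj / v) if mj > 0 else 0.0) - (mj - v)
                        for mj, v in zip(m, nu))
    return theta, nu, dev
```

Adding the constraint turns the marked step into a quadratic program with linear inequality constraints, for which a dedicated solver would be needed.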
The local FDR can then be estimated using $\widehat{fdr}(z) = \exp\{q(z; \widehat{\beta}) - s(z; \widehat{\theta})\}$, where $\widehat{\beta} = (\Omega_{0}'\Omega_{0})^{-1}\Omega_{0}'\Gamma_{0}\widehat{\theta}$.
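This last step can be sketched at the bin midpoints. The code below takes the fitted log-scale values s(x_j; θ) as given, obtains β by ordinary least squares on the bins inside [a, b] (the normal equations for rows [1, x, x^2]), and evaluates exp{q - s}. Function names are illustrative, and at least three distinct midpoints must fall in [a, b].

```python
import math
from typing import List, Sequence


def fit_quadratic(x0: Sequence[float], s0: Sequence[float]) -> List[float]:
    """Least-squares beta for rows omega(x) = [1, x, x^2] (normal equations)."""
    rows = [[1.0, x, x * x] for x in x0]
    # Augmented 3 x 4 system [Omega0' Omega0 | Omega0' s0].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * s for r, s in zip(rows, s0))] for i in range(3)]
    for c in range(3):  # Gaussian elimination with partial pivoting
        p = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, 3):
            f = m[r][c] / m[c][c]
            for k in range(c, 4):
                m[r][k] -= f * m[c][k]
    beta = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        beta[r] = (m[r][3] - sum(m[r][k] * beta[k] for k in range(r + 1, 3))) / m[r][r]
    return beta


def local_fdr(x: Sequence[float], s: Sequence[float], a: float, b: float
              ) -> List[float]:
    """Evaluate exp{q(x_j; beta) - s(x_j)} at every bin midpoint x_j."""
    inside = [(xj, sj) for xj, sj in zip(x, s) if a <= xj <= b]
    beta = fit_quadratic([pt[0] for pt in inside], [pt[1] for pt in inside])
    return [math.exp(beta[0] + beta[1] * xj + beta[2] * xj * xj - sj)
            for xj, sj in zip(x, s)]
```

When s comes from an unconstrained fit, the returned values may exceed 1 wherever the quadratic lies above the spline; under the constraint described above they cannot.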
Results and Discussion
In this section, we implement the two-component EB model on the set of 2D gel images described previously. Both Efron's estimation approach and the proposed one will be applied for comparison. These approaches will be further compared using simulations.
Analyzing 2D Gel Images
First, we analyze the z_{ i } values for the treatment, gender, and interaction effects using Efron's locfdr R package. The upper panel of Figure 2 shows the histograms (50 bins) of the corresponding z-values, the mixture density from the Poisson regression, and the null component estimated using CME and MLE. For estimation of the null component, we chose the intervals [-1.25, 0.25], [-2.5, -1.2] and [-0.5, 1.2] for the treatment, gender and interaction effects, respectively. The degrees of freedom of the splines were chosen to minimize the AIC [33]; the selected values were 5, 10 and 10, respectively. The green solid curves in the upper panel of Figure 2 are estimates of the mixture densities from the Poisson regression. The blue dashed and red dotted curves in the upper panel represent the empirical null component estimated using the CME and MLE methods, respectively. The lower panel of Figure 2 shows the local FDR at different z-values based on the empirical null component from the CME (blue solid line) and MLE (red dotted line) methods. Figure 2 clearly conveys the message that the theoretical null, the standard normal density N(0, 1), is not appropriate for the proteomic data at hand. Taking the treatment effect as an example, the empirical null distribution is N(-0.595, 0.915^{2}) by CME and N(-0.48, 0.891^{2}) by MLE, with proportions of true null hypotheses close to 1 for both, which indicates that nicotine exposure affects all proteins expressed by spleen cells similarly. The empirical null density is even further from its theoretical form for the gender effect: the central peak of the z-values is to the left of -1.
Comparing with Figure 2, we see that the proposed constrained estimation approach yielded results similar to those obtained with CME. However, the empirical null component is now below the mixture density, and the local FDR estimate is no greater than 1, smooth and non-increasing at both tails. For treatment and interaction effects, the null proportion is nearly one, indicating that there is no apparent differential effect of nicotine exposure. The treatment and interaction effects follow approximately N(-0.459, 0.89^{2}) and N(0.284, 0.915^{2}), respectively. The empirical null distribution for the gender effect is N(-1.511, 1.07^{2}) with the null proportion about 0.84. The results for the gender effect show that we need to interpret results from large-scale hypothesis testing with caution. The bulk of the histogram is centered around -1.5, indicating that the majority of proteins have higher expression in female rats. The local FDR plot for the gender effect reveals that there is a small group of proteins with higher expression in males. This group of proteins is clearly separate from the rest, as evidenced by the small local FDR. The local FDR is therefore more indicative of how different the gender effect is on a protein compared to the majority of the proteome, and less indicative of how significant the gender effect is. Should the theoretical null distribution be used, there would be a large number of effects at the left tail. Overall, we note that the estimated means of the null components are far from zero, especially for the gender effect, which may indicate the need to further normalize the data to remove some systematic bias.
Simulation Validations
Numerical simulation
Even when the true null distribution is normal and there is a large number of observations, the unconstrained estimation approach generated undesirable results. The null component is greater than the mixture distribution at some points around the peak of the histogram. Moreover, the left tail of the local FDR is close to 0, indicating that some true null values will be declared as non-null depending on the threshold of the local FDR. The estimated null density follows N(-1.013, 0.876^{2}) with the null proportion ${\widehat{p}}_{0}=0.837$, which is quite different from the values in the simulation model. In contrast, the empirical null density estimated using the constrained estimation approach is more accurate. The estimated empirical null density follows N(-1.011, 0.979^{2}) with ${\widehat{p}}_{0}=0.905$. The right tails of the estimated local FDR are similar under the two approaches, which indicates that both have similar sensitivity. The left tail of the local FDR has much larger values in the constrained method, indicating a lower chance of a true null value being declared as a non-null.
Table 1. Comparison of Estimates for Null Parameters (δ = -1, σ = 1, p_{0} = 0.909; 100 simulations).
Parameter | Statistic | 50 bins, N = 550: unconstrained | 50 bins, N = 550: constrained | 100 bins, N = 550: unconstrained | 100 bins, N = 550: constrained | 100 bins, N = 5500: unconstrained | 100 bins, N = 5500: constrained |
---|---|---|---|---|---|---|---|
δ | mean | -1.008 | -1.001 | -1.002 | -0.995 | -0.999 | -1.000 |
δ | SD | 0.089 | 0.056 | 0.097 | 0.058 | 0.032 | 0.020 |
σ | mean | 0.997 | 0.992 | 1.000 | 0.991 | 1.004 | 0.994 |
σ | SD | 0.164 | 0.043 | 0.125 | 0.043 | 0.045 | 0.017 |
p_{0} | mean | 0.914 | 0.905 | 0.916 | 0.906 | 0.913 | 0.907 |
p_{0} | SD | 0.108 | 0.011 | 0.076 | 0.012 | 0.025 | 0.005 |
From Table 1, we see that both approaches yielded estimates that are nearly unbiased. The estimates from the proposed approach have much smaller standard errors, especially for σ and p_{0}. The superior performance of the constrained procedure continues as the total number of observations increases. The constrained approach is not sensitive to the number of bins used for estimation when this number is large enough (50 or 100) for the histogram counts to be roughly proportional to the density in the bins. The unconstrained approach is more affected by the number of bins, with a smaller number leading to increased variability for the estimates of σ and p_{0}. The simulation results clearly demonstrate that the constrained approach is better at estimating the null component.
Table 2. Comparison of Estimates for Local FDR (100 simulations).
z | 50 bins, N = 550: ratio, unconstrained | 50 bins, N = 550: ratio, constrained | 50 bins, N = 550: relative SD | 100 bins, N = 550: ratio, unconstrained | 100 bins, N = 550: ratio, constrained | 100 bins, N = 550: relative SD | 100 bins, N = 5500: ratio, unconstrained | 100 bins, N = 5500: ratio, constrained | 100 bins, N = 5500: relative SD |
---|---|---|---|---|---|---|---|---|---|
2 | 1.42 | 1.04 | 0.24 | 3.80 | 1.20 | 0.05 | 1.17 | 1.00 | 0.40 |
2.5 | 1.99 | 1.17 | 0.23 | 1.25 | 1.30 | 0.02 | 1.30 | 1.00 | 0.25 |
3 | 3.16 | 1.14 | 0.14 | 62.0 | 1.30 | 0.003 | 1.48 | 0.95 | 0.15 |
3.5 | 8.98 | 1.17 | 0.04 | 535.6 | 1.30 | 0.0004 | 1.94 | 0.97 | 0.10 |
4 | 35.9 | 1.34 | 0.01 | 764.6 | 1.60 | 0.00004 | 2.81 | 1.00 | 0.06 |
Table 2 clearly shows that the estimate of the local FDR from the proposed procedure has smaller bias and much less variability, and converges to the true value faster as N increases. The bias (relative to the magnitude of the true values) in the unconstrained approach increases with greater values of z (smaller local FDR) and with a larger number of bins when N is fixed. The bias of both approaches decreases as N increases. When N is not so large and the number of observations per bin is small, the unconstrained approach leads to much larger variability and bias for smaller true local FDR values. Overall, the performance of constrained estimation is much more stable and is not sensitive to the number of bins or to the magnitude of the true local FDR values.
Validation using Simulated Gels
Conclusions
Similar to microarray data analysis, proteomic analysis leads to large-scale simultaneous hypothesis testing and thus carries similar challenges. The two-component model plays an important role in the microarray literature. We applied a two-component EB model for analyzing a set of 2D gel images. As demonstrated by the 2D gel data, the true null density can be very different from its theoretical form, which supports Efron's innovative idea of choosing the empirical null distribution for hypothesis testing. The problem of estimating the null density is important and fundamental in the two-component EB model. Efron generalized the theoretical null N(0, 1) to N(δ, σ^{2}) and proposed two methods, CME and MLE, for estimating the null density, which are convenient to use.
However, as shown here, neither method is devoid of problematic results, which are hard to interpret in practice. To improve the estimation of the null density, we proposed a constrained estimation approach based on the central matching method. This novel procedure naturally takes the shape of the null density and its relationship to the mixture density into account for estimation, and explicitly constrains the estimated mixture density to be no less than the null density. Both the unconstrained and constrained approaches are nearly unbiased. The constrained method yields more stable and desirable estimates, as demonstrated by our simulation results. It can be generalized to include the situation where the null density comes from a family broader than the normal. The proposed approach can be applied to any context where large-scale hypothesis testing occurs. Here, we have constrained the null component to be no greater than the mixture density for the histogram bins. This is a simplified version of the constraint that the null component is no greater than the mixture density over the entire real line, which is much more complicated. We note that, given the smoothness of the mixture density, the simplified constraint suffices in practice. It is reasonable to assume that the local FDR is a non-increasing function near the tail areas where the z-values are farther away from the null component. To impose this non-increasing property on the estimation of the local FDR, the monotone spline regression technique [34] should be utilized. We will tackle this in our future work.
The choice of the interval [a, b] may be influential for the estimation, especially if it is misspecified. When it is appropriately specified, i.e., the non-null component is nearly zero in the interval, our limited experience showed that the proposed approach is not sensitive to the choice of [a, b]. However, how the interval [a, b] can affect the estimation in general needs further research.
A quite different method for empirical null estimation is based on Fourier analysis [35]. Rather than modeling the mixture density, an attractive method for modeling the local FDR directly has also been proposed [25]. The former is non-parametric and the latter relies on parametric model assumptions. Both methods yield good estimates.
We have focused on estimating the local FDR based on test statistics. The two-component EB model is robust to correlation effects among the test statistics. It may be more informative to model the structure inherent in the data, which is certainly a challenging problem and relies on model assumptions. Further research is certainly needed here.
We utilized the protein quantifications from the RegStatGel software with default settings. It should be noted that different software may generate different quantifications [36]. It is beyond the scope of the current paper to compare different quantifications.
Declarations
Acknowledgements
Work for this paper was supported in part by NIH Grant 5R01GM075298. The authors thank Carol Whisnant for the use of her data. This article reflects the views of the author and should not be construed to represent FDA's view or policies.
References
- O'Farrell P: High resolution two-dimensional electrophoresis of proteins. Journal of Biol Chem 1975, 250: 4007–4021.Google Scholar
- Roy A, Seillier-Moiseiwitsch F, Lee K, Hang Y, Marten M, Raman B: Analyzing Two-Dimensional Gel Images. Chance 2003, 16: 13–18.View ArticleGoogle Scholar
- Morris J, Clark BN, Gutstein HB: Pinnacle: A fast, automatic and accurate method for detecting and quantifying protein spots in 2-dimensional gel electrophoresis data. Bioinformatics 2008, 24: 529–536. 10.1093/bioinformatics/btm590PubMed CentralView ArticlePubMedGoogle Scholar
- Conradsen K, Pedersen J: Analysis of Two-Dimensional Electrophoretic Gels. Biometrics 1992, 48: 1273–1287. 10.2307/2532718View ArticleGoogle Scholar
- Anjos Ad, Moller ALB, Ersbol BK, Finnie C, Shahbazkia HR: New approach for segmentation and quantification of two-dimensional gel electrophoresis images. Bioinformatics 2011, 27: 368–375. 10.1093/bioinformatics/btq666View ArticlePubMedGoogle Scholar
- Li F, Seillier-Moiseiwitsch F: Differential Analysis of 2D Gel Images. In Methods in Enzymology. Volume 487. Edited by: Johnson M, Brand L. San Diego: Academic Press; 2011:596–609.Google Scholar
- Vincent L, Soille P: Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Transactions on Pattern Analysis and Machine Intelligence 1991, 13: 583–598. 10.1109/34.87344View ArticleGoogle Scholar
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 1995, 57: 289–300.Google Scholar
- Qiu X, Klebanov L, Yakovlev A: Correlation Between Gene Expression Levels and Limitations of the Empirical Bayes Methodology for Finding Differentially Expressed Genes. Statistical Applications in Genetics and Molecular Biology 2005, 4: 1–13.View ArticleGoogle Scholar
- Efron B, Tibshirani R, Storey , Tusher V: Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association 2001, 96: 1151–1160. 10.1198/016214501753382129View ArticleGoogle Scholar
- Efron B: Robbins, Empirical Bayes, and Microarrays. The Annals of Statistics 2003, 24: 366–378.View ArticleGoogle Scholar
- Efron B: Large-scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis. Journal of the American Statistical Association 2004, 99: 96–104. 10.1198/016214504000000089View ArticleGoogle Scholar
- Efron B: Correlation and Large-Scale Simultaneous Significance Testing. Journal of American Statistical Association 2007, 102: 93–103. 10.1198/016214506000001211View ArticleGoogle Scholar
- Efron B: Size, Power, and False Discovery Rates. Annal of Statistics 2007, 35: 1351–1377. 10.1214/009053606000001460View ArticleGoogle Scholar
- Efron B: Microarrays, Empirical Bayes and the Two-Groups Model. Statistical Science 2008, 23: 1–22. 10.1214/07-STS236View ArticleGoogle Scholar
- Storey J, Tibshirani R: Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences 2003, 100: 9440–9445. 10.1073/pnas.1530509100
- Pounds S, Morris SW: Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values. Bioinformatics 2008, 19: 1236–1242.
- Aubert J, Bar-hen A, Daudin J, Robin S: Determination of the differentially expressed genes in microarray experiments using local FDR. BMC Bioinformatics 2004.
- Broberg P: A new estimate of the proportion unchanged genes in a microarray experiment. Genome Biology 2005.
- Lee MLT, Kuo F, Whitmore G, Sklar J: Importance of replication in microarray gene expression studies: Statistical methods and evidence from repetitive cDNA hybridizations. Proc Natl Acad Sci 2000, 97: 9834–9838.
- Newton M, Kendziorski C, Richmond C, Blattner F, Tsui K: On differential variability of expression ratios: Improving statistical inference about gene expression changes from microarray data. J Computational Biology 2001, 37–52.
- Benjamini Y: Comment: Microarrays, Empirical Bayes and the Two-Groups Model. Statistical Science 2008, 23: 23–28. 10.1214/07-STS236B
- Morris C: Comment: Microarrays, Empirical Bayes and the Two-Groups Model. Statistical Science 2008, 23: 34–40. 10.1214/08-STS236D
- Cai T: Comment: Microarrays, Empirical Bayes and the Two-Groups Model. Statistical Science 2008, 23: 29–33. 10.1214/07-STS236C
- Rice K, Spiegelhalter D: Comment: Microarrays, Empirical Bayes and the Two-Groups Model. Statistical Science 2008, 23: 41–44. 10.1214/07-STS236A
- Efron B: Rejoinder: Microarrays, Empirical Bayes and the Two-Groups Model. Statistical Science 2008, 23: 45–47. 10.1214/08-STS236REJ
- Locfdr: R package for computing local false discovery rate [http://cran.r-project.org/web/packages/locfdr/index.html]
- Potra F, Liu X, Seillier-Moiseiwitsch F, Roy A, Hang Y, Marten M, Raman B: Protein Image Alignment via Piecewise Affine Transformations. Journal of Computational Biology 2006, 13: 614–630. 10.1089/cmb.2006.13.614
- Li F, Seillier-Moiseiwitsch F: Region-based Statistical Analysis of 2D PAGE Images. Computational Statistics and Data Analysis 2011, 55: 3059–3072. 10.1016/j.csda.2011.05.013
- Otsu N: A threshold selection method from gray level histograms. IEEE Transactions on Systems, Man and Cybernetics 1979, 9: 62–66.
- Hastie T, Tibshirani R, Friedman J: The elements of statistical learning. Springer-Verlag; 2008.
- Hardin JW, Hilbe JM: Generalized Linear Models and Extensions. StataCorp LP; 2001.
- Akaike H: A new look at the statistical model identification. IEEE Transactions on Automatic Control 1974, 19: 716–723. 10.1109/TAC.1974.1100705
- Ramsay J: Monotone Regression Splines in Action. Statistical Science 1988, 3: 425–441. 10.1214/ss/1177012761
- Jin J, Cai T: Estimating the null and the proportion of non-null effects in large-scale multiple comparison. Journal of the American Statistical Association 2007, 495–506.
- Stressl M, Noe CR, Lachmann B: Influence of image-analysis software on quantitation of two-dimensional gel electrophoresis data. Electrophoresis 2009, 30: 325–328. 10.1002/elps.200800213
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.