Spot quantification in two dimensional gel electrophoresis image analysis: comparison of different approaches and presentation of a novel compound fitting algorithm

Brauner, Jan M; Groemer, Teja W; Stroebel, Armin; Grosse-Holz, Simon; Oberstein, Timo; Wiltfang, Jens; Kornhuber, Johannes; Maler, Juan Manuel

doi:10.1186/1471-2105-15-181

Research article
Open access
Published: 11 June 2014

Spot quantification in two dimensional gel electrophoresis image analysis: comparison of different approaches and presentation of a novel compound fitting algorithm

Jan M Brauner¹,
Teja W Groemer¹,
Armin Stroebel^1,3,
Simon Grosse-Holz¹,
Timo Oberstein¹,
Jens Wiltfang²,
Johannes Kornhuber¹ &
…
Juan Manuel Maler¹

BMC Bioinformatics volume 15, Article number: 181 (2014) Cite this article

7808 Accesses
19 Citations
3 Altmetric
Metrics details

Abstract

Background

Various computer-based methods exist for the detection and quantification of protein spots in two dimensional gel electrophoresis images. Area-based methods are commonly used for spot quantification: an area is assigned to each spot and the sum of the pixel intensities in that area, the so-called volume, is used a measure for spot signal. Other methods use the optical density, i.e. the intensity of the most intense pixel of a spot, or calculate the volume from the parameters of a fitted function.

Results

In this study we compare the performance of different spot quantification methods using synthetic and real data. We propose a ready-to-use algorithm for spot detection and quantification that uses fitting of two dimensional Gaussian function curves for the extraction of data from two dimensional gel electrophoresis (2-DE) images. The algorithm implements fitting using logical compounds and is computationally efficient. The applicability of the compound fitting algorithm was evaluated for various simulated data and compared with other quantification approaches. We provide evidence that even if an incorrect bell-shaped function is used, the fitting method is superior to other approaches, especially when spots overlap. Finally, we validated the method with experimental data of urea-based 2-DE of Aβ peptides andre-analyzed published data sets. Our methods showed higher precision and accuracy than other approaches when applied to exposure time series and standard gels.

Conclusion

Compound fitting as a quantification method for 2-DE spots shows several advantages over other approaches and could be combined with various spot detection methods.

The algorithm was scripted in MATLAB (Mathworks) and is available as a supplemental file.

Background

2-dimensional gel electrophoresis (2-DE) is a powerful tool in proteomics and software-based analysis of gel images is an important step in the process. Various computer-based methods have been proposed for the detection of protein spots in 2-DE images [1]. Recent developments involve segmentation-oriented methods, which are based on the watershed transformation [2, 3] or surface-oriented methods [4]. However, a general concern regarding current spot detection and quantification algorithms is that no method is fully automated, because they require the use of user-defined parameters [1]. Often, more complex methods require additional input parameters [5] and thus can be difficult to implement for a user less experienced in image processing. This difficulty and other factors might be the reasons why the automated image analysis of electrophoretic data is not yet standard in all fields of research. Despite numerous solutions for automated 2-DE analysis have been introduced [1], the majority of these approaches focus on spot detection rather than on spot quantification. The amount of marked protein in an electro chemiluminescent spot is proportional to the number of photons detected by the camera in the respective pixels, which is proportional to the intensities of those pixels. Therefore, to obtain measures for protein quantities, one has to determine the overall signal amount of every spot. Common methods for spot quantification are area-based approaches that quantitate spots based on the areas they occupy in the image. There are various published methods for determining these spot areas [3, 4, 6, 7], many of which make use of the watershed transformation. Spots can be quantitated by various area-related measures and many methods suggest the use of volume, i.e. the sum of pixel intensities within the respective areas. However, as we show in this paper, area-based approaches face certain limitations, especially in analyzing superimposed spots. The optical density (OD), i.e. the intensity of the most intense pixel of a spot, can also be used to quantify spots [8]. A different quantification approach is to find a fitted function for each spot and calculate the volume under the surface (VUS) of the function curve. Fitting of 2D-Gaussian function curves has been used for the quantification of 2-DE spots for a long time [9, 10], despite the fact that the adequate functions to model 2-DE spots continue to be contentious [1, 11]. Recent developments include a method to improve the applicability of Gaussian functions for saturated spots [12] or the use of 2D-gaussian function curves in the creation of synthetic gel images for the evaluation of image analysis algorithms [13]. In this study, we propose a simple algorithm that fits Gaussian functions to detected spots. We introduce the novel concept of compound fitting, i.e. the simultaneous fitting of neighboring groups of spots, for computationally efficient resolution of overlapping spots. Furthermore, we conduct the, to our knowledge, first structured comparison between different spot quantification approaches, namely the use of optical density of a spot, area-based quantification and Gaussian function fitting. The proposed compound fitting method is a highly accurate system for spot quantification and we demonstrate its superior performance relative to other methods for both synthetic and real data.

Regarding real 2-DE data, we chose beta amyloid (Aβ) peptides as our primary study system for two reasons: Firstly, 2-DE is a valuable tool in the analysis of Aβpeptides. Although it is difficult to separate low concentrated beta amyloid (Aβ) peptides based only on their almost identical masses, their high content of hydrophobic amino acids has been used to efficiently separate Aβ species with single amino acid differences in urea gels according to their hydrophobicity [14]. However, efficient separation can only be achieved using 2-dimensional gel electrophoresis because many species have identical hydrophobicities but different isoelectric points [15].The use of 2-dimensional electrophoretic methods has the advantage of displaying more than 30 different peptides at very low concentrations [15], that in surface enhanced laser desorption/ionization time-of flight mass spectroscopy are not detectable [16]. Secondly, exact quantification is especially important in the field of Aβ peptides, because it is necessary not only for the analysis of differential expression in different experimental conditions, but also for the use of Aβpeptides as biomarkers. The quantification of single Aβ peptide species and their ratios, e.g. Aβ_1-42 and Aβ_1-40 in particular, is valuable for the neurochemical diagnosis of Alzheimer’s dementia [16]. High-quality quantification is crucial to use these peptides as biomarkers [17].

Methods

Spot detection and compound fitting algorithm

Our algorithm comprises two major components: spot detection/identification and spot quantification. The flowchart of the algorithm is shown in Figure 1a. Spot detection (step 2 and 3) was performed using a simplified version of an established method [18, 19]. The algorithm involves a user-defined parameter w, which is used as the minimum allowed distance between peaks. A peak is defined as the most intense pixel of a spot. For 2-DE spots, the peak is usually in the center of a spot. Thus, w must be chosen to be smaller than the distance between two spots but sufficiently large to prevent the detection of several peaks on one spot. For the application of the algorithm on real data, the following user-friendly and intuitive method for selecting the value of w may be employed: the user is asked to click on the peaks of the two clearly identifiable spots with the smallest inter-peak distance (IPD).

Based on this input, w is calculated as

w = [\frac{IPD}{2}],

where [ ] is the Gaussian floor function because w must be an integer. If w is set/calculated too small, several peaks might be detected on top of a broad spot. If w is estimated too large, the detection might fail to resolve overlapping spots.

Estimation of background noise

The first step of the spot detection algorithm is the estimation of the standard deviation (SD) of the image noise σ_ns, which is performed in an area where only background signal is detected. Usually such an area without protein spots can be found in any typical 2-DE image, often near the edges. We used the following approach to implement the estimation.

The entire image A(x,y) is rastered with a quadratic w by w pixel grid. For every pixel (x,y), the SD σ_area(x, y) and the mean value μ_area(x, y) of the surrounding w by w area’s pixel intensities are calculated. μ_area(x, y) of a pixel in a background area should be less than the mean of all μ_area(x, y), because the majority of a gel image is occupied with background. Thus, the mean value of σ_area(x, y) that also satisfies

μ_{area} (x, y) < mea n_{i, j} (μ_{area} (i, j)) is taken as σ_{ns} .

Image preparation

Like any digital image, 2-DE images suffer from various types of noise and imperfections. We implemented noise reduction and image restoration for the spot detection procedure, whereas the function fit was performed using the original image A(x,y).

The following operations were performed:

the local background was modeled using a boxcar average over a square region of edge length 2 · w + 1,
random digital noise was suppressed by convoluting the image using a Gaussian surface of standard deviation $\sqrt{2}$ and
the restored image was calculated by subtracting the background image from the noise-reduced image.

These steps can be performed in one convolution. The restored image, which is the basis for further spot detection, was calculated as

A_{res} (x, y) = \sum_{i = - w}^{w} \sum_{j = - w}^{w} A (x - i, y - j) K (i, j),

where the convolution kernel K, normalization constant K₀ and factor B are given as follows:

K (i, j) = \frac{1}{K_{0}} [\frac{1}{B} exp (- \frac{i^{2} + j^{2}}{4}) - \frac{1}{{(2 w + 1)}^{2}}]

K_{0} = \frac{1}{B} {[\sum_{i = - w}^{w} exp (- \frac{i^{2}}{2})]}^{2} - \frac{B}{{(2 w + 1)}^{2}}

B = {[\sum_{i = - w}^{w} exp (- \frac{i^{2}}{4})]}^{2}

Peak detection

In the following section, we will use the term 'peak’ to refer to the most intense pixel of a protein spot, usually in the center of the spot. In the restored image, a particular pixel was considered to be the location of a peak if it met two conditions:

a)
It was the brightest pixel within the distance of w.
b)
Its intensity was greater than a threshold defined by σ _ns and the local factor μ _w, which is calculated as the mean intensity of all pixels within the distance of w.
a)
This criterion is efficiently computed with a grayscale dilation of the image with a flat structuring element of circular shape and radius w.

The dilated image A_dil is given as

Consequently, pixels (x_p, y_p) that satisfy
b)
After the peak candidates have been identified, a pixel (x _p, y _p) is considered a peak if it satisfies
$A_{dil} (x, y) = {max}_{(i^{2} + j^{2} \leq w^{2})} (A_{res} (x + i, y + j)) .$

$A_{res} (x_{p}, y_{p}) = A_{dil} (x_{p}, y_{p})$

are potential peak candidates.

A (x_{p}, y_{p}) > μ_{w} (x_{p}, y_{p}) + t \cdot σ_{ns},

where

μ_{w} (x, y) = (\sum_{i^{2} + j^{2} \leq w^{2}} A (x + i, y + j)) / (\sum_{i^{2} + j^{2} \leq w^{2}} 1) .

Based on our experience, t = 10 yields good results for real 2-DE images. For t < 10, a higher rate of false-positive detections is expected, especially for relatively large image sizes. If t is set too large, the rate of false negatives might increase.

Determination of the fitting compounds

For every spot, a Gaussian function is fitted in 4 parameters (for the equation, see next paragraph). For an assumed number of 50 spots per image, fitting the entire image at once would mean calculating the (nonlinear) influence of 200 parameters for every pixel, which is not possible within a reasonable computation time. To solve this problem, we used a compound fitting process that only fits neighboring groups of spots within the surrounding area of pixels simultaneously.

The general concept is that the spots of an image are divided into compounds - in such a way that the spots from one compound do not affect the pixels of another compound area. The defined groups of spots can then be fit individually within their respective compound areas, thereby significantly reducing the necessary computation time.

In the following, the term “compound” will be used for a group of spots that are fitted at the same time. The term “compound area” refers to the area that surrounds all the spots from one compound, comprising all the pixels that are significantly influenced by the spots and therefore the area that must be considered to be able to perform a proper fit.

The compound area around a single spot is a quadrat of edge length d. The minimum size of d is calculated in the results section. If two spots are close to each other in such a way that their compound areas would overlap, the overlapping area is influenced by both spots, and therefore, these spots must be fit at the same time. In this case, their individual compounds need to be fused, which means that if any two peaks P and Q satisfy |x_P - x_Q| ≤ d and |y_P - y_Q| ≤ d, they always belong to the same compound. This definition is compatible with the fact that in a single fitting compound, pairs of peaks can occur that do not satisfy this condition. For example, if there are three peaks P, Q, and R with

|x_{P} - x_{Q}| \leq d, |y_{P} - y_{Q}| \leq d, |x_{Q} - x_{R}| \leq d and |y_{Q} - y_{R}| \leq d,

there will be only one fitting compound for all three spots, even if |x_P - x_R| > d.

In this case, peak P and Q are called connected, whereas P is called connected to R via Q.

A reasonable approach for the determination of the compounds is to calculate the adjacency matrix:

The peaks are numbered, where N is the number of peaks/spot.

The matrix I_1,norm has a size of N × N and is defined as

I_{1, norm} (i, j) = {\begin{cases} 1, if |x_{i} - x_{j}| \leq d and |y_{i} - y_{j}| \leq d \\ 0, otherwise \end{cases}},

where x_i, y_i, x_j, y_j are the peak coordinates of the respective spots.

The subscripts indicate that I_1,norm is the adjacency matrix of first order and it is normalized so that the entries are either 0 or 1. The entry of I_1,norm at a certain position (P,Q) has the value 1, if the peaks P and Q are connected. Otherwise, the entry has the value 0. Of course, this constraint also means that all diagonal elements are 1.

Next the square of I_1,norm is calculated

I_{2} = {I_{1, norm}}^{2}

and thus

I_{2} (i, j) = \sum_{k = 1}^{N} I_{1, norm} (i, k) \cdot I_{1, norm} (k, j) .

I_2,norm is a matrix identical to I₂ but with all elements larger than 1 set to 1. Thus,

I_{2, norm} (i, j) = {\begin{cases} 1, if I_{2} (i, j) \geq 1 \\ 0, if I_{2} (i, j) = 0 \end{cases}}

If there is pair of (P,Q) with I_2,norm(P, Q) = 1, then there is a value k₁ ∈ 1 … N that satisfies

I_{1, norm} (P, k_{1}) \cdot I_{1, norm} (k_{1}, Q) > 0 .

Consequently,

I_{1, norm} (P, k_{1}) \neq 0

and

I_{1, norm} (k_{1}, Q) \neq 0,

and therefore, peak p is connected to peak q via peak k₁.

For the matrix I_2,norm, an entry of value 1 at a certain position (P,Q) indicates that peaks P and Q are connected via at most one other peak, whereas the value 0 indicates no connection.

This procedure of squaring and normalizing is repeated until I_h,norm = I_{(h - 1),norm} and the various fitting compounds containing the respective spots can be read out.

Nonlinear least squared error fit of 2-dimensional Gaussian distribution curves in respective compounds

A Gaussian function of circular shape is characterized by

f (x, y) = I \cdot exp (- 0.5 \frac{{(x - x_{0})}^{2} + {(y - y_{0})}^{2}}{σ^{2}}) .

This leaves four parameters: the x- and y- coordinates of the peak (x₀ and y₀), the spot’s intensity I and its width/standard deviation σ.

A nonlinear least-squared error type of fit is used. Before the fit, the mean value of the background μ_bg must be calculated. Because all areas having a spot signal are enclosed by compound areas μ_bg can simply be calculated as the mean of the intensity values of all pixels that are not included inside a compound area. Therefore, the function that is actually fit to model the data is

f (x, y) = I \cdot exp (- 0.5 \frac{{(x - x_{0})}^{2} + {(y - y_{0})}^{2}}{σ^{2}}) + μ_{bg},

with the 4 aforementioned parameters.

After determining the fitting compounds, each compound is fit individually; the spots of one compound are all fit at the same time, within the respective compound area.

The volume under the surface (VUS) of a Gaussian function curve, which is proportional to the number of photons detected by the camera and thus to the amount of protein at a certain location, can be calculated using the results from the fit via

\begin{array}{l} VUS = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} I \cdot exp (- 0.5 * \frac{{(x - x_{0})}^{2} + {(y - y_{0})}^{2}}{σ^{2}}) dx dy \\ = 2 \cdot π \cdot I \cdot σ^{2} \end{array}

Discrimination of false positives

Because spots can vary greatly in their size and intensity, it is difficult to define criteria for deciding whether a detected spot is a true or a false positive (i.e., whether a detected spot corresponds to a real spot). However, spots that are fitted with a very small width/standard deviation may represent camera noise because certain types of camera noise usually affect single pixels that subsequently form peaks without a broad base.

Thus, spots with σ ≤ 1 can be discriminated.

Comparison of different spot quantification methods

When we applied the three quantification methods to simulated gel images containing two spots P and Q, each method yielded different values signal_P and signal_Q, which correspond to the amount of protein signal in spot P and spot Q, respectively.

For the OD-method, signal_P and signal_Q represent the intensity of the brightest pixel of the respective spot; for the area-based method, they represent the sum of the pixel intensities in the respective spot areas. For our compound fitting method, these values represent the VUS of the curves calculated using the function parameters obtained by compound fitting. Consequently, there was no use in evaluating the different approaches on the basis of the absolute values of signal_P and signal_Q. Instead, it was necessary to consider the quantity $(\frac{signa l_{P}}{signa l_{Q}})$ , because it shows how a method calculated the ratio of protein signal between the two spots. For the simulated spots, the true signal ratios can be calculated from the function parameters used for simulation.

This provided a measure α of the quality of the quantification, which was defined as the deviation of the calculated signal ratio from the true signal ratio: $α = |((\frac{signa l_{P}}{signa l_{Q}}) / (\frac{true_signa l_{P}}{true_signa l_{Q}})) - 1| \cdot 100 %$

For further information on the different models and parameters used for data simulation, refer to Additional file 1.

Availability of supporting data

A ready-to-use version of this algorithm with graphical user interface as well as a command-line version was implemented for MATLAB (Mathworks). The code is included to the article in Additional file 2.

Results

The compound fitting algorithm

We developed an algorithm for the detection and quantification of protein spots on two dimensional gel electrophoresis (2-DE) images. Spot quantification is performed by fitting of 2-dimensional Gaussian function curves to the spots and calculating the volume under the surface (VUS) of the fitted functions. Before fitting, the spots are divided into fitting compounds. Only groups of neighboring spots, that show some grade of overlapping, are fitted simultaneously. The different compounds are fitted sequentially, in order to reduce the calculation time. For the determination of the minimal required compound area size, an evaluation of the fitting performance on asymmetric spots and an comparison regarding computation time between compound fitting and common fitting procedures, please refer to Additional files 3 and 4.

Comparison of different approaches of spot quantification using synthetic data

Next, we compared compound fitting with other quantification methods. We chose to compare the following three methods for spot quantification:

Method 1: Optical density (OD)

The intensity of the most intense pixel of a spot, i.e. the height of the peak, is used as the measure of signal quantity. This approach is used in recent approaches [8, 20] and has been reported to be a better [21] as well as a worse [22] measure of spot quantity than the area-based volume. It has been argued that OD is suitable for the resolution of overlapping spots, because it only takes into consideration the peak/center of a spot, where the contribution of overlapping parts of neighboring spots is expected to be small [23].

Method 2: Area-based approach

In an area-based approach, a certain area of the image that is assigned to each protein spot is used as the base for quantification. This is the most common method for spot quantification and usually the sum of the intensities of the pixels in an 'area’ , the so called 'volume’ , is used as the measure for spot signal [4, 6, 7, 20]. Our example algorithm determined these areas according to the following scheme. First, the area of the protein signal was determined using all pixels that surpassed a certain intensity threshold. Assigning areas to spots always requires a somewhat arbitrary threshold because spots usually are continuous structures without abrupt edges that would justify borders of the area. If two spots were present in one area, it had to be decided which pixel belonged to which spot. For this decision, the minimum of the line profile between the two spots (P,Q) was calculated, and a perpendicular to the inter-peak line was drawn through the minimum; pixels on one side of this perpendicular were assigned to spot P, and pixels on the other side were assigned to spot Q. Finally, the 'volume’ was calculated by summing the pixel intensities of each area.

Method 3 Fitting of 2-dimensional Gaussian function curves to determine the parameters

The three methods are illustrated in Figure 2a-d.

The quantification of overlapping spots

Having defined the three different quantification methods and a measure for the quality of quantification α (see 'Material and methods’), we compared the three methods regarding their performance in the quantification of overlapping spots. Despite the fact that the use of 2-D Gaussian functions to model protein spots is disputed [1, 11],a function fit as a way to quantify protein spots has some advantages regarding spot superposition. In superimposed areas, one pixel can be influenced by several spots. While area-based methods can only assign such a pixel to one spot and thereby neglect its amount of intensity related to other spots, spot superposition can be resolved by a function fit. For the comparison we simulated gel images containing two spots with a varying grade of superposition. Taking into account that the use of 2-D Gaussian functions as a model for protein spots is disputed, we simulated not only Gaussian spots but also spots with a shape based on Lorentz curves or based on a diffusion model.The former have been discussed as a function that models the shapes of spots in 1-D electrophoresis gels [24] and the latter have been suggested as another potential function modeling 2-DE spots [11]. For all three simulated spot shapes, the results were similar. The OD miscalculated the spot ratio even at high IPD, but the result was relatively robust to spot overlap. The area-based approach yielded good results for high IPD values, but failed to resolve overlapping spots. Only compound fitting yielded satisfactory results for high and low IPD values, not only for Gaussian shaped (Figure 2e,f), but also for Lorentz-shaped (Figure 2g,h) or diffusion model-based (Figure 2i,k) spots. Importantly, the same level of precision of our method when applied to Gaussian spots could be achieved neither for Lorentz-shaped spots nor for diffusion model-based spots. With increasing spot superposition, there was a sudden large increase in α for the area-based approach, as shown with an arrow. At this point, the two spots were superimposed in such a way that there was no valley present on the line profile from peak P to peak Q, but the line profile was a monotonically decreasing curve and Q was the minimum. This led to further deviations in the assignment of the areas because the line used to separate the areas corresponding to each spot was drawn through the peak coordinates of Q. Remarkably, spot detection in these cases still succeeded to detect two separate spots, because peak detection was not performed on the original but rather on the restored image A_res (see Image preparation).The ability of a method to correctly quantify overlapping spots directly influences the method’s ability to quantify areas with a high spot density. Indeed, when we simulated images with an increasing number of Gaussian spots of various parameters (Figure 2l), the result was similar to the studies with only two spots (Figure 2m). The area-based approach yielded good results for low spot density, but the quality deteriorated for higher spot densities. The OD misestimated spot ratios, but was affected less than the area-based approach by increasing spot density. Compound fitting yielded good quantification for low and high spot density.

In summary, compound fitting proved to be a valid method for the quantification of overlapping spots, while area-based approaches faced problems especially at low inter peak distances. This effect was also observed for not-exactly Gaussian shaped spots.

The quantification of spots with different intensities

Furthermore, we compared the methods regarding their performance in the quantification of spots of different intensities. The use of area-based approaches is questionable because the shapes of both protein spots and Gaussian function curves do not have any abrupt edges but the intensity values decline continuously with distance from the spot’s center. Therefore, arbitrary thresholds are needed to assign a limited area to a protein spot without well-defined edges. As area-based approaches by definition evaluate spots based on limited areas, they always neglect a certain portion of the generally infinite spatial extent of a spot; i.e., they omit the periphery of a spot. This effect generally does not pose a problem in the comparison of similar spots because the measure for each spot is affected equally, and thus, the ratio of the spots remains the same. However, this ratio can be affected when comparing spots of varied intensities. In our area-based approaches, the area corresponding to a protein spot was defined by all pixels that surpass a certain fixed intensity threshold. In this manner, larger areas are assigned to spots with a higher intensity. Although it may intuitively appear correct to assign larger areas to brighter spots, this method omits a larger percentage of the signal of small spots than that of spots with a high intensity (Figure 3a, inlay) and thus might compromise the comparison of spots with different intensities.To compare the three methods in the quantification of spots with different intensities and to test this hypothesis, we simulated gel images containing spots of varying intensities and compared the volume calculated by the quantification approaches to the true spot signal. We used Gaussian-shaped (Figure 3a), Lorentz-shaped (Figure 3b) and diffusion model-based (Figure 3c) spots in the simulation. For all three types of spots, the area-based approach increasingly underestimated the spots with lower intensities, in accordance with the above explanation. The function fit overestimated the Lorentz-shaped spots and the diffusion model-based spots by approximately 20 percent or 8 percent respectively, which is readily comprehensible because Lorentz curves and diffusion model-based curves are different from Gaussian curves. However, the advantage of our method over the area-based approach was that our method overestimated all spots by a similar percentage, therefore enabling unbiased comparisons of spots with different intensities. It should be mentioned that this problem with area-based approaches is not an issue of poor assignments of the areas but is inherent to the system of evaluating curves with an infinite spatial extent based on finite areas. For example, instead of defining the areas using an intensity threshold, one could use an equally sized area for every spot (e.g., a circle around the peak with a fixed radius). This approach would solve the difficulties associated with comparing spots of different intensities but would compromise the comparison of spots with different widths.

The OD does not calculate a volume, but takes the intensity as a measure of spot signal. Therefore it could not be included into this experiment, because we compared the volume yielded by the methods to the true volume. However, as the parameter relevant for the OD is the height of the spot and this is the only parameter that is varied in this experiment, the OD perfectly depicted the ratio between spots of varying height (data not shown).

In conclusion, the area-based method biased the quantification of spots with different intensities, because it omitted a larger part of the signal for small spots than for large spots. For Gaussian as well as not-exactly Gaussian shaped spots, the compound fitting of Gaussian function curves did not show such a tendency.

Comparison of different approaches of spot quantification using real data

We chose Aβpeptides as our primary study system for the evaluation of the different spot quantification methods. The immunoblots used differ from average 2-DE images in terms of spot number, with the immunoblots containing relatively few spots. The application of our method to a large 2-DE image containing about 1000 spots and the comparison with the noncommercial methods BEADS [4] and pinnacle [8] and the commercial software package Melanie 7 (Geneva Bioinformatics SA) can be found in Additional files 5 and 6.

Precision of the different quantification methods

We further evaluated our method on real data. To create a standard for the evaluation of our method, 2-D Aβ Western immunoblot was performed on Aβ standard solutions with fixed amounts of the different Aβ species. 2-DE was performed on Aβ standard solutions, the peptides were transferred to PVDF membranes and detected by a Horseradish peroxidase coupled antibody reaction (n = 4). Images of the blots (Figure 4a) were taken at different exposure times.We then tested the different quantification approaches on the images of the exposure time series. There were seven relevant spots on every blot, because the standard solution contained seven Aβ species. For each image of an exposure time series, we calculated the contribution of each spot (relative spot signal) to the sum of the overall signal of all spots (“total Aβ immunoreactivity“) in that image. This is depicted for one blot in Figure 4b. Subsequently, we determined for each spot the coefficient of variation (CV) of the relative spot signal across the exposure time series. Since the images of an exposure time series depict the same blot, a sound quantification method is expected to yield little variance of the relative spot intensities. Across all spots and blots, the mean CV of relative signal yielded by compound fitting was 0.043 (0.016 for the most intense, 0.065 for the least intense spot). For the OD and the area-based approach, the mean CV was 0.305 (0.240 for the most intense, 0.295 for the least intense spot) and 0.298 (0.208 for most intense, 0.317 for least intense spot) respectively. However, some of the images of the exposure time series were saturated. The OD is clearly not suitable for the analysis of saturated spots, because the OD of saturated peaks will always be the saturation threshold (as can be seen in Figure 4b). We therefore repeated the analysis, taking into consideration only the unsaturated images of each exposure time series. For the unsaturated images, the mean CV of the relative signal across all spots and blots was 0.007 for compound fitting, 0.044 for OD and 0.140 for the area-based approach.

In summary, for different images of the same blot, the spot intensities yielded by compound fitting have less variance compared to OD and area-based approach. When saturated images were included, the performance of OD and the area-based approach were similar. However, when only unsaturated images were analyzed, OD was superior to the area-based approach.

Accuracy of the different quantification methods

Having evaluated the precision of our method, we also wanted to measure the accuracy of compound fitting, i.e. the grade to which the results display the actual protein amounts. For each blot, we used the most intense image that did not show any saturation. We compared only the Aβ_{1–37/38/39/40/42} spots in our standard immunoblots, because the primary antibody (mAb 1E8) used for staining binds to the N-terminus and the differential binding of the antibody compromises the comparison of Aβ_1-x and Aβ_2-xspecies. In all four blots, we compared the relative spot signal of an Aβ species yielded by the three quantification methods to the percentage of the amount of that species in the standard solutions (relative peptide amount). For compound fitting, the mean deviation of the relative signal to the relative peptide amount was 3.81%, for OD and the area-based approach the mean deviation was 4.37% and 4.71%, respectively.

This means that compound fitting has a 13% or 19% higher accuracy than OD or the area-based approach, respectively. However, one has to be cautious when comparing the protein amount with the spot signal. Differential binding of the antibody, solubility of the different species, possible oxidation reaction and many more variables can influence the relation between protein load of the gel and detected signal.

Comparison of compound fitting with published, area based, results

Next, we applied our algorithm to immunoblot images of Aβ peptides in human plasma in healthy controls using datasets which we published previously [15]. When our algorithm was applied to the original images, we found that it produced a suitable fit for the images (Figure 4c-e).In our previous publication, 30 peptide spots were identified, and the relative contribution of each spot to the sum of the overall signal of all counted spots (“total Aβ immunoreactivity“) was calculated using an area-based approach (Quantity One software v4.1, BioRad). The relative spot signals calculated by our method closely matched the published values (spearman’s rho ρ =0.87, Figure 4f).Interestingly, the differences between the published area-based results and the results of the fit of the corresponding images showed a systematic trend. For spots with a low intensity, our method yielded higher relative values than the area-based approach, whereas for the spots with the highest intensities, the area-based approach yielded higher values (Figure 4g). This finding is in accord with our simulations on spots with different intensities (Figure 3). In the presented data, the peak intensity values varied by the factor 100 from lowest to highest. Because low-intensity spots are underestimated by the area-based approach but not by the function fit method, the area-based approach calculates a lower contribution to the total Aβ immunoreactivity for such spots. Consequently, because the sum of all spots must be equal to 100%, the area-based approach overestimates spots with a high intensity.

In summary, area-based approaches might underestimate small spots in immunoblot images of Aβ peptides in human plasma.

Discussion

In this present work, we validated the compound fitting of two dimensional Gaussian functions as a method for the quantification of two dimensional gel electrophoresis (2-DE) data, thereby enabling robust comparisons between different protein spots within a brief computation time. The fitting method was found to be valid for quantifying simulated data and was further evaluated using immunoblots of Aβ standard solutions and a published set of 2-DE images. The comparison of the compound fitting approach to area-based quantification methods revealed two advantages of the function fit method: 1. an improved ability to resolve superposed spots and 2. The unbiased comparison of spots with greatly differing heights/widths. The optical density (OD), i.e. the intensity of the most intense pixel of a spot, as a measure for spot signal does not take into account the width of a spot and is therefore not suitable to depict the signal ratio between different spots. However, it perfectly depicts the ratio between spots of similar shape but different height and is relatively robust to spot overlapping. It might therefore be useful for differential expression analysis across several gels, especially if the spots are strongly distorted. When we applied all three methods to exposure time series and standard gels of Aβ peptides, compound fitting showed the best precision as well as accuracy. Although for our data, our method produced suitable fits with little residues (Figure 4c-e), it has also been argued that protein spots are often not modeled exactly by Gaussian spots [11]. However, even for not-exactly-Gaussian spots, if the underlying function is not known, compound fitting may be the method of choice due to the aforementioned advantages, as demonstrated by the application of our algorithm to Lorentz-shaped or diffusion model-based spots. Although, as expected, our algorithm miscalculated those spots, it nevertheless outperformed the area-based approaches in terms of the resolution of superimposed spots (Figure 2e-m) and the comparison of spots of varying height (Figure 3). Due to these advantages, fitting of 2-dimensional Gaussian function curves has a higher precision (Figure 4b) and accuracy than area-based approaches when evaluated on immunoblots of standard peptide solutions.

Conclusion

Low SNRs and heterogeneous image conditions across different gels are common problems in 2-DE data analysis [7]. We found that it was most practical to robustly detect peaks and then allow the user to inspect the results and add or discard peaks based on experience. Spots that are clearly separated in one experiment may not be detected as separate spots in another gel. In this case, the experienced user might add or correct peak locations. However, because all spot detection methods have advantages and drawbacks when addressing a wide variety of electrophoresis images [1], compound fitting as a quantification tool could be combined with other spot detection algorithms or the manual identification of spots. This combination and the influence of different spot detection methods on the quantification process are described in Additional file 7.

With respect to Aβ peptides, high-quality quantification is necessary to use these peptides as biomarkers [17]. In addition, the exact quantification of 2-DE spots will continue to be crucial for the identification of differences in the amounts of Aβ species in the plasma of patients with Alzheimer’s disease and for the development of additional molecular markers.

Abbreviations

2-DE:: Two dimensional gel electrophoresis
CV:: Coefficient of variation
IPD:: Inter peak distance
OD:: Optical density
SNR:: Signal to noise ratio
VUS:: Volume under surface.

References

Sellers KF, Miecznikowski JC: Feature detection techniques for preprocessing proteomic data. Int J Biomed Imaging. 2010, 2010: 896718-
Article PubMed Central PubMed Google Scholar
Srinark T, Kambhamettu C: An image analysis suite for spot detection and spot matching in two-dimensional electrophoresis gels. Electrophoresis. 2008, 29: 706-715. 10.1002/elps.200700244.
Article PubMed CAS Google Scholar
Li F, Seillier-Moiseiwitsch F: Differential analysis of 2D gel images. Methods Enzymol. 2011, 487: 595-609.
Article PubMed CAS Google Scholar
Langella O, Zivy M: A method based on bead flows for spot detection on 2-D gel images. Proteomics. 2008, 8: 4914-4918. 10.1002/pmic.200800644.
Article PubMed CAS Google Scholar
Stroebel A, Welzel O, Kornhuber J, Groemer TW: Background determination-based detection of scattered peaks. Microsc Res Tech. 2010, 73: 1115-1122. 10.1002/jemt.20858.
Article PubMed Google Scholar
Natale M, Maresca B, Abrescia P, Bucci EM: Image analysis workflow for 2-D electrophoresis gels based on ImageJ. PRI. 2011, 37: 437-449.
Google Scholar
dos Anjos A, Moller AL, Ersboll BK, Finnie C, Shahbazkia HR: New approach for segmentation and quantification of two-dimensional gel electrophoresis images. Bioinformatics. 2011, 27: 368-375. 10.1093/bioinformatics/btq666.
Article PubMed CAS Google Scholar
Morris JS, Clark BN, Gutstein HB: Pinnacle: a fast, automatic and accurate method for detecting and quantifying protein spots in 2-dimensional gel electrophoresis data. Bioinformatics. 2008, 24: 529-536. 10.1093/bioinformatics/btm590.
Article PubMed Central PubMed CAS Google Scholar
Garrels JI: The QUEST system for quantitative analysis of two-dimensional gels. J Biol Chem. 1989, 64 (9): 5269-5282.
Google Scholar
Appel RD, Vargas JR, Palagi PM, Walther D, Hochstrasser DF: Melanie II--a third-generation software package for analysis of two-dimensional electrophoresis images: II. Algorithms. Electrophoresis. 1997, 18: 2735-2748. 10.1002/elps.1150181507.
Article PubMed CAS Google Scholar
Rogers M, Graham J, Tonge RP: Statistical models of shape for the analysis of protein spots in two-dimensional electrophoresis gel images. Proteomics. 2003, 3: 887-896. 10.1002/pmic.200300421.
Article PubMed CAS Google Scholar
Natale M, Caiazzo A, Bucci EM, Ficarra E: A novel Gaussian extrapolation approach for 2D gel electrophoresis saturated protein spots. Genomics Proteomics Bioinformatics. 2012, 10: 336-344. 10.1016/j.gpb.2012.06.005.
Article PubMed CAS Google Scholar
Marengo E, Demartini M, Robotti E, Bobba M: A new algorithm for the simulation of sodium dodecil sulfate two-dimensional polyacrylamide gel electrophoresis data sets. J Proteome Res. 2010, 9: 1864-1872. 10.1021/pr901014n.
Article PubMed CAS Google Scholar
Klafki HW, Wiltfang J, Staufenbiel M: Electrophoretic separation of betaA4 peptides (1-40) and (1-42). Anal Biochem. 1996, 237: 24-29. 10.1006/abio.1996.0195.
Article PubMed CAS Google Scholar
Maler JM, Klafki H, Paul S, Spitzer P, Groemer TW, Henkel AW, Esselmann H, Lewczuk P, Kornhuber J, Wiltfang J: Urea-based two-dimensional electrophoresis of beta-amyloid peptides in human plasma: evidence for novel Abeta species. Proteomics. 2007, 7: 3815-3820. 10.1002/pmic.200700311.
Article PubMed CAS Google Scholar
Lewczuk P, Esselmann H, Groemer TW, Bibl M, Maler JM, Steinacker P, Otto M, Kornhuber J, Wiltfang J: Amyloid beta peptides in cerebrospinal fluid as profiled with surface enhanced laser desorption/ionization time-of-flight mass spectrometry: evidence of novel biomarkers in Alzheimer's disease. Biol Psychiatry. 2004, 55: 524-530. 10.1016/j.biopsych.2003.10.014.
Article PubMed CAS Google Scholar
Lewczuk P, Esselmann H, Otto M, Maler JM, Henkel AW, Henkel MK, Eikenberg O, Antz C, Krause W, Reulbach U, Kornhuber J, Wiltfang J: Neurochemical diagnosis of alzheimer's dementia by CSF Abeta42, Abeta42/Abeta40 ratio and total tau. Neurobiol Aging. 2004, 25: 273-281. 10.1016/S0197-4580(03)00086-1.
Article PubMed CAS Google Scholar
Sbalzarini IF, Koumoutsakos P: Feature point tracking and trajectory analysis for video imaging in cell biology. J Struct Biol. 2005, 151: 182-195. 10.1016/j.jsb.2005.06.002.
Article PubMed CAS Google Scholar
Crocker JC, Grier DG: Methods of digital video microscopy for colloidal studies. J Colloid Interface Sci. 1996, 179: 298-310. 10.1006/jcis.1996.0217.
Article CAS Google Scholar
Dowsey AW, Dunn MJ, Yang G: The role of bioinformatics in two-dimensional gel electrophoresis. Proteomics. 2003, 3: 1567-1596. 10.1002/pmic.200300459.
Article PubMed CAS Google Scholar
Lustig RH, Pfaff DW, Mobbs CV: Considerations in the quantitative analysis of autoradiograms from 2-dimensional gels. J Neurosci Methods. 1989, 29: 17-26. 10.1016/0165-0270(89)90104-0.
Article PubMed CAS Google Scholar
Yan JX, Sanchez JC, Tonella L, Williams KL, Hochstrasser DF: Studies of quantitative analysis of protein expression in Saccharomyces cerevisiae. Electrophoresis. 1999, 20: 738-742. 10.1002/(SICI)1522-2683(19990101)20:4/5<738::AID-ELPS738>3.0.CO;2-2.
Article PubMed CAS Google Scholar
Morris JS, Clark BN, Wei W, Gutstein HB: Evaluating the performance of new approaches to spot quantification and differential expression in 2-dimensional gel electrophoresis studies. J Proteome Res. 2010, 9: 595-604. 10.1021/pr9005603.
Article PubMed Central PubMed CAS Google Scholar
Vohradsky J, Panek J: Quantitative analysis of gel electrophoretograms by image analysis and least squares modeling. Electrophoresis. 1993, 14: 601-612. 10.1002/elps.1150140195.
Article PubMed CAS Google Scholar
Raman B, Cheung A, Marten MR: Quantitative comparison and evaluation of two commercially available, two-dimensional electrophoresis image analysis software packages, Z3 and Melanie. Electrophoresis. 2002, 23: 2194-2202. 10.1002/1522-2683(200207)23:14<2194::AID-ELPS2194>3.0.CO;2-#.
Article PubMed CAS Google Scholar
Tsakanikas P, Manolakos ES: Protein spot detection and quantification in 2-DE gel images using machine-learning methods. Proteomics. 2011, 11: 2038-2050. 10.1002/pmic.201000601.
Article PubMed CAS Google Scholar

Download references

Acknowledgement

We thank Friederike Grosse-Holz for proof-reading of the manuscript.

Author information

Authors and Affiliations

Department of Psychiatry and Psychotherapy, Friedrich-Alexander-University of Erlangen-Nuremberg, Schwabachanlage 6, 091054, Erlangen, Germany
Jan M Brauner, Teja W Groemer, Armin Stroebel, Simon Grosse-Holz, Timo Oberstein, Johannes Kornhuber & Juan Manuel Maler
Department of Psychiatry and Psychotherapy, University of Duisburg-Essen, Virchowstrasse 174, 45147, Essen, Germany
Jens Wiltfang
CERES GMBH, Evaluation & research, Marie-Curie-Strasse 8, 79539, Lörrach, Germany
Armin Stroebel

Authors

Jan M Brauner
View author publications
You can also search for this author in PubMed Google Scholar
Teja W Groemer
View author publications
You can also search for this author in PubMed Google Scholar
Armin Stroebel
View author publications
You can also search for this author in PubMed Google Scholar
Simon Grosse-Holz
View author publications
You can also search for this author in PubMed Google Scholar
Timo Oberstein
View author publications
You can also search for this author in PubMed Google Scholar
Jens Wiltfang
View author publications
You can also search for this author in PubMed Google Scholar
Johannes Kornhuber
View author publications
You can also search for this author in PubMed Google Scholar
Juan Manuel Maler
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Jan M Brauner or Teja W Groemer.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contribution

JMB and TWG conceived the study, evaluated the algorithm and wrote the manuscript. JMB, AS and SG implemented the algorithm. TO performed the experiments. JW, JK and JMM conceived the experiments. All authors read and approved the final manuscript.

Jan M Brauner, Teja W Groemer contributed equally to this work.

Electronic supplementary material

Additional file 1: Supplementary methods: Data Simulation in detail.(DOC 86 KB)

Additional file 2: Matlab code: Compound fitting.(ZIP 33 KB)

Additional file 3: Evaluation of compound fitting on synthetic data.(DOC 36 KB)

12859_2013_6475_MOESM4_ESM.tiff

Additional file 4: Figure S1: Performance of compound fitting on simulated data sets. (a) Influence of compound area size and SNR on the quality of fit. For each curve, the deviation of the fit result from the true VUS is displayed for varying compound area size d. For comparison, the curves for the different SNRs are plotted above each other. The inlays show examples of simulated spots with respective SNR.200 images per data point. (b) Evaluation of the quality of fit on asymmetric Gaussian-shaped spots. The deviation of the fit result from the true spot signal is displayed as a function of the asymmetry of the spot. The inlays show simulated spots with σ_x/ σ_y = 1, 2 and 3, from left to right. (c) Comparison of compound fitting and usual fitting. The computation time in seconds for compound fitting (red) and usual fitting (blue) is displayed in dependence of the number of simulated spots in the image. 200 images per data point. For details on data simulation, refer to additional file 1. (TIFF 339 KB)

12859_2013_6475_MOESM5_ESM.doc

Additional file 5: Analysis of a large 2-DE image and comparison with other methods [25], [26]. (DOC 46 KB)

Additional file 6: Figure S2: Immunoblot of Aβ peptides in human plasma from healthy controls. (PNG 35 KB)

Additional file 7: The combination of compound fitting with different spot detection methods.(DOC 40 KB)

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Authors’ original file for figure 4

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.

Reprints and permissions

About this article

Cite this article

Brauner, J.M., Groemer, T.W., Stroebel, A. et al. Spot quantification in two dimensional gel electrophoresis image analysis: comparison of different approaches and presentation of a novel compound fitting algorithm. BMC Bioinformatics 15, 181 (2014). https://doi.org/10.1186/1471-2105-15-181

Download citation

Received: 13 April 2013
Accepted: 28 May 2014
Published: 11 June 2014
DOI: https://doi.org/10.1186/1471-2105-15-181

Spot quantification in two dimensional gel electrophoresis image analysis: comparison of different approaches and presentation of a novel compound fitting algorithm

Abstract

Background

Results

Conclusion

Background

Methods

Spot detection and compound fitting algorithm

Estimation of background noise

Image preparation

Peak detection

Determination of the fitting compounds

Nonlinear least squared error fit of 2-dimensional Gaussian distribution curves in respective compounds

Discrimination of false positives

Comparison of different spot quantification methods

Availability of supporting data

Results

The compound fitting algorithm

Comparison of different approaches of spot quantification using synthetic data

Method 1: Optical density (OD)

Method 2: Area-based approach

Method 3 Fitting of 2-dimensional Gaussian function curves to determine the parameters

The quantification of overlapping spots

The quantification of spots with different intensities

Comparison of different approaches of spot quantification using real data

Precision of the different quantification methods

Accuracy of the different quantification methods

Comparison of compound fitting with published, area based, results

Discussion

Conclusion

Abbreviations

References

Acknowledgement

Author information

Authors and Affiliations

Corresponding authors

Additional information

Competing interests

Authors’ contribution

Electronic supplementary material

Authors’ original submitted files for images

Rights and permissions

About this article

Cite this article

Share this article

Keywords