Joint pre-processing framework for two-dimensional gel electrophoresis images based on nonlinear filtering, background correction and normalization techniques

Background: Two-dimensional gel electrophoresis (2-DGE) is a commonly used tool for proteomic analysis. This gel-based technique separates proteins in a sample according to their isoelectric point and molecular weight. 2-DGE images often present anomalies due to the acquisition process, such as diffuse and overlapping spots and background noise. This study proposes a joint pre-processing framework that combines the capabilities of nonlinear filtering, background correction and image normalization techniques for pre-processing 2-DGE images. Among the most important of these techniques, joint nonlinear diffusion filtering, adaptive piecewise histogram equalization and multilevel thresholding were evaluated using both synthetic data and real 2-DGE images.
Results: An improvement of up to 46% in spot detection efficiency was achieved for synthetic data using the proposed framework compared to implementing a single technique of either normalization, background correction or filtering. Additionally, the proposed framework increased the detection of low-abundance spots by 20% for synthetic data compared to a normalization technique, and increased the background estimation by 67% compared to a background correction technique. In terms of real data, the joint pre-processing framework reduced the false positives by up to 93%.
Conclusions: The proposed joint pre-processing framework outperforms results achieved with a single approach. The best structure was obtained with the ordered combination of adaptive piecewise histogram equalization for image normalization, geometric nonlinear diffusion (GNDF) for filtering, and multilevel thresholding for background correction.


Introduction
A commonly used gel-based approach for proteomic analysis is two-dimensional gel electrophoresis (2-DGE), a technique that separates proteins in a sample based on both their isoelectric point and molecular weight [1]. This technique is often used in preliminary comparative proteomic analyses, as it is capable of resolving thousands of proteins in a single run. Once the proteins in the sample have been separated, the gel is scanned and the image is processed using computational tools. Often these 2-DGE images exhibit anomalies due to the technique itself or to the image scan and acquisition [2]. The purpose of 2-DGE image analysis is to detect the proteins (black spots) within the gel. However, a noisy background with variable intensity, diffuse or low-intensity spots, and over-saturated spots often hinder the detection of individual proteins. Therefore, a pre-processing step that minimizes these anomalies is an important phase prior to the analysis of these kinds of images, and it remains an open issue in the literature [3].
Pre-processing techniques for 2-DGE image analysis are classified as image normalization, background correction, and noise reduction techniques [3,4]. Image normalization improves the detection of low-abundance proteins (low-intensity spots) [5]. Satisfactory image normalization results are achieved using multiple gels, obtaining a pattern that is compared with each sample; however, aligning the multiple images is the main difficulty of this technique [6]. On the other hand, the aim of background correction is to increase contrast and decrease the effects of non-homogeneous regions, thus improving spot detection. In the literature, there are several background correction techniques reported for 2-DGE image processing, such as adjustment by either local or global minima, polynomial adjustment, and approaches based on histograms [6,7]. Despite the advances in normalization and background correction techniques, noise reduction approaches have been the most studied for 2-DGE image pre-processing. We found several linear and nonlinear filters used for noise reduction of 2-DGE images [3,4]. Usually, linear filters blur the spots and reduce their intensities, which is not optimal as it alters the end results [8]. Thus, it is common to use nonlinear filters, such as filters based on Wavelet [3], Contourlet [9] and total variation (TV) [10]. The most commonly used nonlinear filtering technique for 2-DGE is based on the Wavelet transform, which achieves high noise reduction; however, with this technique it is difficult to preserve the spot contours [3,4]. On the other hand, TV better preserves spot edges due to a variable smoothing operation, but is limited in terms of noise reduction [10]. The Contourlet transform also performs better than Wavelet in preserving edge information [9]. Xin and Zhao [11] used a combined version of Wavelet and TV (WTTV) to reduce information loss in 2-DGE image pre-processing.
In a previous work [4], we presented a comparison between Wavelet, Contourlet, TV, and WTTV filters using synthetic and real 2-DGE images, showing that with synthetic data, Wavelet and WTTV had the lowest sensitivity to noise levels, while Wavelet presented the best detection rate for known proteins on real 2-DGE images. However, these results were obtained by executing each technique separately, and a joint framework was not considered.
Noise reduction, image normalization and background correction techniques reduce specific anomalies in 2-DGE images. For example, noise reduction minimizes the effect of impulsive and white noise; image normalization normalizes over-saturated and low abundance spots, as well as light saturation; and background correction reduces variability, saturation and streaking. Since each approach reduces a specific anomaly in 2-DGE images, it is necessary to combine them in order to enhance the spots in the image. This paper discusses a joint framework that combines the capabilities of image normalization, background correction and nonlinear filtering. Since there are several techniques for each approach, we first present a comparative study using both synthetic and real 2-DGE images and then we evaluate the combined framework. For this comparison, we used four metrics to evaluate the performance of the techniques applied to synthetic data, and we evaluated their capabilities in reducing anomalies in real 2-DGE images.

Pre-processing framework for 2-DGE images
In the proposed framework, the first step is image normalization. This step improves the contrast of protein spots, mainly low-intensity ones. Because several normalization techniques are reported in the literature, we compared three enhancement techniques: histogram equalization, adaptive piecewise histogram equalization [12], and a modification of background pixel intensity [7].
As mentioned previously, image normalization improves the contrast of low-intensity protein spots; however, it also increases the intensity of both isolated points and impulsive noise. Therefore, in the proposed joint pre-processing framework, noise reduction is the second step in the process. For noise reduction, nonlinear filtering techniques are recommended for low edge distortion. A comparison of the most commonly used nonlinear techniques for 2-DGE images is presented in [4]. Quantitative comparison showed that Wavelet filtering performs better than Contourlet, TV, and WTTV. However, the results in [4] showed that with Wavelet there was less noise reduction but edge information was better preserved than with other techniques. In this paper, we evaluate the use of geometric nonlinear diffusion filtering (GNDF) for the pre-processing of 2-DGE images [13].
Finally, background correction techniques achieve better results when processing images with low levels of noise, so background correction is the last step in the pre-processing framework. We compared thresholding, multilevel thresholding [7] and surface approximation [14].

Image normalization
The histogram is an estimation of the probability of occurrence of grey levels in an image. The histogram is given by [15]:
$$p(k) = \frac{n_k}{n}, \quad k = 0, 1, \ldots, L-1$$
where $n$ is the total number of pixels in the image, $n_k$ is the number of pixels with grey level equal to $k$, $L$ is the number of possible grey levels, and $p(k)$ is the probability of occurrence of $k$. Histogram equalization is an image transformation that brings the probability of occurrence of grey levels closer to a uniform probability density function. This transformation improves the use of the dynamic range for grey levels, thus improving contrast. From the histogram, the histogram equalization is obtained by computing the function $S_k$ given by:
$$S_k = \sum_{j=0}^{k} p(j)$$
and then mapping each pixel with level $k$ to a pixel value equal to $(L-1)S_k$ in the equalized image. Given that pixel intensities behave randomly due to the type of sample and the acquisition process, an adaptive piecewise histogram equalization is proposed in [12]. This technique performs multiple histogram equalizations considering the maximum and minimum intensity levels. Further details of the algorithm are in [12].
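The global equalization described above can be sketched in a few lines. This is a minimal illustration, assuming an integer-valued grey-level image and NumPy; the function name `equalize_histogram` and the rounding of $(L-1)S_k$ to the nearest grey level are assumptions made for the example, and the adaptive piecewise variant of [12] is not reproduced here.

```python
import numpy as np

def equalize_histogram(image, levels=256):
    """Global histogram equalization of an integer grey-level image (sketch)."""
    image = np.asarray(image)
    n = image.size
    # n_k: number of pixels at each grey level k = 0 .. L-1
    n_k = np.bincount(image.ravel(), minlength=levels)
    p = n_k / n                 # p(k) = n_k / n
    s = np.cumsum(p)            # S_k = sum of p(j) for j <= k
    # Map level k to (L - 1) * S_k, rounded to the nearest level (assumption)
    mapping = np.round((levels - 1) * s).astype(image.dtype)
    return mapping[image]

img = np.array([[0, 0, 1, 1],
                [1, 2, 3, 3]], dtype=np.uint8)
out = equalize_histogram(img, levels=4)
```

In this toy image the two darkest levels hold most of the pixels, so the mapping pushes them upward and spreads the occupied levels over the available range.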
Another way to perform image normalization is to modify the background pixel intensity [7]. The background of the image is estimated using a threshold and then it is subtracted from the data.

Nonlinear filtering
GNDF [13] reduces noise while preserving edge information, so it is expected to improve spot detection in 2-DGE image analysis. GNDF solves a nonlinear partial differential equation of the form:
$$\frac{\partial I}{\partial t} = \operatorname{div}\left(C\,\nabla I\right)$$
where the initial condition $I(t = 0)$ is the 2-DGE image, $\nabla I$ is the image gradient, and $C$ are the diffusion coefficients, which depend on a threshold $k$ that determines the level of noise to be removed [13]. The estimation of $k$ is obtained from the signal-to-noise ratio of the image [13].
In addition to GNDF, in this study we used the wavelet transform (WT) for noise reduction. A comparison of WT and other filtering techniques is presented in [4]. We used WT with a Daubechies wavelet family and five levels of decomposition [2,4].
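To make the idea of edge-preserving diffusion concrete, the following is a minimal sketch of an explicit Perona-Malik-style scheme. It is a stand-in, not the GNDF of [13]: the exponential form of the diffusion coefficient, the four-neighbour discretization, the function name `nonlinear_diffusion` and the parameter names are all assumptions. The defaults loosely echo the 35 iterations and the value 0.2 reported in the noise reduction comparison, although mapping that value to the step size is itself an assumption.

```python
import numpy as np

def nonlinear_diffusion(image, n_iter=35, k=0.1, step=0.2):
    """Explicit Perona-Malik-style diffusion (illustrative stand-in for GNDF)."""
    I = np.asarray(image, dtype=float).copy()

    def coeff(d):
        # Assumed coefficient: close to 1 for small differences (noise),
        # close to 0 for large differences (edges), controlled by k
        return np.exp(-(d / k) ** 2)

    for _ in range(n_iter):
        # Differences to the four nearest neighbours, replicated borders
        padded = np.pad(I, 1, mode="edge")
        d_n = padded[:-2, 1:-1] - I
        d_s = padded[2:, 1:-1] - I
        d_w = padded[1:-1, :-2] - I
        d_e = padded[1:-1, 2:] - I
        # step <= 0.25 keeps the explicit update stable
        I += step * (coeff(d_n) * d_n + coeff(d_s) * d_s
                     + coeff(d_w) * d_w + coeff(d_e) * d_e)
    return I

rng = np.random.default_rng(0)
noisy = 0.5 + 0.01 * rng.standard_normal((32, 32))
smoothed = nonlinear_diffusion(noisy)
```

With a threshold `k` well above the noise amplitude, small fluctuations diffuse away, while intensity jumps much larger than `k` receive a coefficient near zero and are left largely intact.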

Background correction
We compared three background correction techniques: thresholding, multilevel thresholding [7] and surface approximation [14]. Thresholding estimates the intensities of background pixels to be subtracted from the image. Because the background of 2-DGE images is usually not homogeneous, techniques such as multilevel thresholding can yield better results. Multilevel thresholding divides the image into several regions and estimates the intensities of the background pixels in each region. For this paper, two levels, $G_{f1}$ and $G_{f2}$, are used: $G_{f1}$ is the first level, with pixels of intensities between the minimum grey level and the median of a percentile $P_x$ ($\bar{P}_x$), and $G_{f2}$ is the second level, with pixels of intensities between $\bar{P}_x$ and the maximum value of the percentile, $\max P_x$. The third method used in this paper for background correction is surface approximation [7]. A B-Spline surface is used to estimate the background with the iterative algorithm presented in [7].
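One possible reading of the two-level scheme is sketched below. Several points are assumptions rather than the method of [7]: that the background lies on the low-intensity side, that the "percentile set" is the set of pixels at or below the x-th percentile, and that the per-level background estimate is the level mean. The function names `two_level_background` and `subtract_background` are hypothetical.

```python
import numpy as np

def two_level_background(image, percentile=60):
    """Return boolean masks for the two background levels (sketch)."""
    img = np.asarray(image, dtype=float)
    p_x = np.percentile(img, percentile)
    candidates = img[img <= p_x]       # assumed background candidates
    p_med = np.median(candidates)      # median of the percentile set
    p_max = candidates.max()           # maximum of the percentile set
    level1 = img <= p_med                        # first level: [min, median]
    level2 = (img > p_med) & (img <= p_max)      # second level: (median, max]
    return level1, level2

def subtract_background(image, percentile=60):
    """Subtract a per-level background estimate (hypothetical: level mean)."""
    img = np.asarray(image, dtype=float)
    corrected = img.copy()
    for mask in two_level_background(img, percentile):
        if mask.any():
            corrected[mask] -= img[mask].mean()
    # Pixels outside both levels (assumed foreground) are left unchanged
    return np.clip(corrected, 0.0, None)

toy = np.arange(10, dtype=float).reshape(2, 5)
level1, level2 = two_level_background(toy)
corrected = subtract_background(toy)
```

Estimating the background separately in each level is what lets the correction follow a non-homogeneous background, which a single global threshold cannot do.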

Database 1: synthetic dataset
Synthetic proteins were modelled as two-dimensional Gaussian distributions [16], assuming the mean, μ, and standard deviation, σ, are equal for both dimensions. The size and spread of a protein spot are varied through σ. Protein location within a synthetic image was randomly generated using a uniform distribution. The random distribution generated some overlapping spots. Gaussian, Rayleigh and exponential noise, given by (7), (8) and (9) respectively, were added to the synthetic images. The parameters presented in [4] were used for each noise type in order to simulate images with a signal-to-noise ratio (SNR) between 8 and 20 dB.
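A synthetic image of this kind can be generated along the following lines. This is a sketch under stated assumptions: the spot amplitudes, the σ range, the noise scale and the function name `synthetic_gel` are hypothetical and are not the parameters of [4], and the noise is simply added to the clean image.

```python
import numpy as np

def synthetic_gel(shape=(128, 128), n_spots=20, sigma_range=(2.0, 5.0),
                  noise="gaussian", noise_scale=0.05, seed=0):
    """Return (clean, noisy) synthetic gel images with Gaussian-shaped spots."""
    rng = np.random.default_rng(seed)
    rows, cols = np.indices(shape)
    clean = np.zeros(shape, dtype=float)
    for _ in range(n_spots):
        # Spot centre drawn from a uniform distribution; overlaps may occur
        r0 = rng.uniform(0, shape[0])
        c0 = rng.uniform(0, shape[1])
        sigma = rng.uniform(*sigma_range)    # same sigma in both dimensions
        amplitude = rng.uniform(0.1, 1.0)    # hypothetical peak intensity
        clean += amplitude * np.exp(
            -((rows - r0) ** 2 + (cols - c0) ** 2) / (2 * sigma ** 2))
    if noise == "gaussian":
        sample = rng.normal(0.0, noise_scale, shape)
    elif noise == "rayleigh":
        sample = rng.rayleigh(noise_scale, shape)
    elif noise == "exponential":
        sample = rng.exponential(noise_scale, shape)
    else:
        raise ValueError(f"unknown noise type: {noise}")
    return clean, clean + sample

clean, noisy = synthetic_gel(shape=(32, 32), n_spots=5, noise="rayleigh", seed=1)
```

Returning the clean image together with the noisy one keeps the ground truth available for measures that compare the original and filtered pixels.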

Database 2: ITM 2-DGE image database
This dataset was collected from previous studies carried out in the Laboratory of Molecular and Cell Biology of the Instituto Tecnologico Metropolitano (ITM) of Medellin (Colombia). The 2-DGE images correspond to two different sample types: a) bee venom collected from Africanized worker bees (samp_01-02-03 and 04); b) urine samples taken from patients with prostate cancer (samp_05 and 06).

Database 3: LECB 2-D PAGE gel image database
This database consists of four 2-DGE image data sets previously analyzed with the GELLAB-II system [18]. These data sets consist of over 300 gel images (gif format) with annotations and landmark data in html, tab-delimited and xml formats. The data sets and experimental conditions are described and documented in the papers associated with each data set [19][20][21][22]. From this database, four 2-DGE images were randomly selected for this study, one from each data set. This database is available for public use and can be downloaded from http://www.bioinformatics.org/lecb2dgeldb/.

Validation measures
In this study, four indicators were used to evaluate the performance of the pre-processing techniques. For evaluating normalization, we used the percentage of low-abundance proteins detected (LPD), defined as the ratio between the number of low-abundance spots detected ($LAS_{det}$) and the total number of low-abundance spots ($LAS_{tot}$) in the image:
$$LPD = \frac{LAS_{det}}{LAS_{tot}} \times 100 \qquad (10)$$
In the case of noise reduction techniques, the signal-to-noise ratio (SNR), based on the normalized mean square error ($MSE_n$), was used:
$$MSE_n = \frac{\sum_i (x_i - \hat{x}_i)^2}{\sum_i x_i^2} \qquad (11)$$
$$SNR = 10 \log_{10}\left(\frac{1}{MSE_n}\right) \qquad (12)$$
where $x_i$ is a pixel in the original image and $\hat{x}_i$ is the same pixel in the filtered image. Additionally, spot efficiency (13) was used to evaluate the performance of noise reduction techniques, in terms of the number of true detected spots ($\varsigma_t$), false detected spots ($\varsigma_f$) and lost spots ($\varsigma_l$) [3,4]. Finally, the background correction methods were evaluated using the background subtraction index (BSI), which was calculated in terms of the number of detected pixels that belong to the background and the total number of pixels that belong to the background. Thus, BSI is the percentage of pixels identified as background:
$$BSI = \frac{\text{detected background pixels}}{\text{total background pixels}} \times 100 \qquad (14)$$
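Three of these measures can be sketched directly from their verbal definitions. The code assumes that LPD and BSI are the stated ratios expressed as percentages and that $MSE_n$ is the squared error normalized by the energy of the original image; that normalization and the function names are assumptions. Spot efficiency is omitted because its formula is not given in the text.

```python
import numpy as np

def lpd(las_det, las_tot):
    """Percentage of low-abundance spots detected."""
    return 100.0 * las_det / las_tot

def normalized_mse(original, filtered):
    """Squared error normalized by the energy of the original (assumption)."""
    x = np.asarray(original, dtype=float)
    x_hat = np.asarray(filtered, dtype=float)
    return np.sum((x - x_hat) ** 2) / np.sum(x ** 2)

def snr_db(original, filtered):
    """SNR = 10 * log10(1 / MSE_n), in dB."""
    return 10.0 * np.log10(1.0 / normalized_mse(original, filtered))

def bsi(detected_background, true_background):
    """Percentage of true background pixels that were detected as background."""
    det = np.asarray(detected_background, dtype=bool)
    true = np.asarray(true_background, dtype=bool)
    return 100.0 * np.count_nonzero(det & true) / np.count_nonzero(true)

example_lpd = lpd(6, 10)
example_snr = snr_db([2.0, 0.0], [1.0, 0.0])
example_bsi = bsi([1, 1, 1, 0, 1], [1, 1, 1, 1, 0])
```

Note that `snr_db` is undefined when the filtered image equals the original, since $MSE_n$ is then zero; a caller would need to handle that case separately.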

Proposed approach
According to the measures expressed by (10), (12), (13) and (14), several configurations of stages for normalization, noise reduction and background correction were tested in a sequential structure made up of three stages, named in this work the joint pre-processing framework. The order of the stages was an important aspect to evaluate, and the performance of several techniques in each stage was recorded in order to find the most effective structure configuration, which was validated by experts. It is important to note that the training was executed using synthetic images, but the validation was performed using real 2-DGE images, where the algorithm results were compared with the experts' opinions.
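The sequential structure can be sketched as a list of stage functions applied in order. The three toy stages below are hypothetical stand-ins (a contrast stretch, a floor that suppresses small values, and a minimum subtraction), not the techniques compared in this paper; they only illustrate why the ordering of stages can change the outcome for faint spots.

```python
import numpy as np

def run_pipeline(image, stages):
    """Apply each stage, in order, to the output of the previous one."""
    result = np.asarray(image, dtype=float)
    for stage in stages:
        result = stage(result)
    return result

def normalize(img):
    # Toy normalization: stretch intensities to [0, 1] (non-constant input)
    return (img - img.min()) / (img.max() - img.min())

def suppress_small(img, floor=0.1):
    # Toy noise reduction: values below the floor are treated as noise
    return np.where(img < floor, 0.0, img)

def subtract_floor(img):
    # Toy background correction: remove the global minimum
    return img - img.min()

# One faint spot (0.05) next to a stronger one (0.2)
gel = np.array([[0.0, 0.05],
                [0.2, 0.0]])
normalize_first = run_pipeline(gel, [normalize, suppress_small, subtract_floor])
filter_first = run_pipeline(gel, [suppress_small, normalize, subtract_floor])
```

With normalization first, the faint spot is stretched above the floor and survives; with the noise stage first, it is removed before it can be enhanced.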

Comparison of normalization techniques
As image normalization seeks to enhance low-abundance proteins, we used a synthetic image with these kinds of spots (see Fig. 1a). The synthetic image had 1024 × 1024 pixels, with an opaque background and 150 spots, which were generated by a Gaussian distribution with standard deviation between 0. 3  to simulate low-abundance proteins with a grey level between 0.1 and 0.8. We compared histogram equalization, adaptive piecewise histogram equalization [12], and a modification of background pixel intensity [7] for image normalization, and used the percentage of low-abundance proteins detected (LPD) to evaluate the performance of each technique. The LPD results are presented in Table 1, where the values in bold indicate the best LPD achieved. The technique based on background pixel intensity detected only 48.7% of low-abundance spots. On the other hand, the histogram and adaptive piecewise histogram equalizations detected 82.1% and 88.9% of low-abundance spots, respectively. As can be seen in Fig. 1b and c, the techniques based on equalization enhanced the contrast of the low-abundance spots. Figure 2 presents the normalization results for a real 2-DGE image (samp_05). The equalization-based approach improves contrast by increasing the grey level intensity of the protein spots and decreasing the intensity of the background pixels (see Fig. 2b and c). However, normalization also increases the background noise, so it was necessary to combine image normalization with a noise reduction technique.

Comparison of noise reduction techniques
Wavelet transform (WT) is one of the nonlinear filters that presents the best performance for noise reduction in 2-DGE images [4]. However, there are other nonlinear methods that allow noise reduction without smoothing spot edges. We compared WT with geometric nonlinear diffusion filtering (GNDF). GNDF has been shown to perform well with several types of medical images but has not been used with 2-DGE images. For the WT filter, a Daubechies wavelet family was used with five decomposition levels [4]. For GNDF, we used 35 smoothing iterations with a diffusion coefficient equal to 0.2 and windows of 5 × 5 pixels. The performance was evaluated using the signal-to-noise ratio (SNR) and spot efficiency [4]. WT and GNDF were tested with synthetic images with Gaussian, Rayleigh and exponential noise with SNR from 20 to 8 dB. Each synthetic image had 512 × 512 pixels with 250 spots. Table 2 presents the spot efficiency comparison using WT and GNDF filters for the synthetic images with noise; the values in bold indicate the best spot efficiency for each noise level. In terms of spot efficiency, WT and GNDF yielded very similar results for most noise levels, with differences close to 2%. However, for the synthetic image with Gaussian noise of 8 dB (i.e., the highest noise level), GNDF presented a spot efficiency of 77.86%, while WT obtained 67.5%. GNDF also obtained better results in terms of SNR. Table 3 shows the SNR comparison for WT and GNDF filters. In the case of the image with an SNR of 8 dB, WT obtained images with 19.31 dB, 9.78 dB and 12.71 dB for the Gaussian, Rayleigh and exponential noise respectively, while GNDF obtained images with 20.11 dB, 10.5 dB and 15.61 dB for Gaussian, Rayleigh and exponential noise respectively.
Both nonlinear filtering techniques, WT and GNDF, were applied to a real 2-DGE image (samp_05). As Fig. 3 shows, the effect of filtering is most visible in the background: GNDF reduces the background noise while preserving the spot contours.

Comparison of background correction
We compared three background correction techniques: thresholding, multilevel thresholding [7] and surface approximation [14]. First, we generated a synthetic image with changes in background intensity (see Fig. 4a). The background variation was obtained by increasing the initial intensity up to 155%. A percentile of 60% was used for both thresholding techniques. A B-Spline equation [14] was used for the surface approximation technique, optimizing the parameters over 150 iterations. The performance was evaluated by the background subtraction index (BSI), which compares the number of pixels detected as background with the total number of background pixels. Figure 4 presents the background correction results in the synthetic image. Using thresholding, the background was partially removed, but as can be seen in part b of the figure, the background is divided into two regions. Conversely, a uniform background was obtained with multilevel thresholding. The surface approximation removed most of the background, but this technique did not work for pixels close to the spots. The BSI results are presented in Table 4, where the values in bold indicate the best BSI achieved. Thresholding detected 71.8% of background pixels, while surface approximation and multilevel thresholding detected 97.9% and 98.5% of background pixels for the synthetic images, respectively. Figure 5 presents the background correction for a real 2-DGE image (samp_05). Thresholding preserved background intensities around spots, but the background obtained from the multilevel thresholding and surface approximation approaches was uniform and increased spot contrast. However, background noise was also preserved; hence, it is necessary to combine background correction with noise reduction techniques for pre-processing of 2-DGE images.

Proposal novelties I: joint pre-processing framework
Based on the comparison of image normalization, noise reduction and background correction techniques, we show that a joint pre-processing framework is needed. The proposed framework takes advantage of the capabilities of image normalization to increase the contrast of low-abundance proteins, of nonlinear filtering to reduce noise while preserving edge information, and of background correction to homogenize background pixels. According to the previous results, we used adaptive piecewise histogram equalization for image normalization, GNDF for filtering and multilevel thresholding for background correction. The joint pre-processing framework was evaluated using both synthetic and real 2-DGE images. The synthetic evaluation used a synthetic image with variable background and noise. Table 5 presents the performance results using LPD, spot efficiency and BSI. The BSI metric was only computed for the images obtained from the background correction and joint pre-processing techniques, as it measures the background subtracted from the image. The best LPD was obtained using the joint pre-processing framework, with 60% of low-abundance spots detected in the image. By comparison, this percentage was 40% when only the normalization technique was implemented. In terms of spot efficiency, the proposed framework detected 63.84% of spots, while lower percentages were obtained when using a single technique: 3.57% for normalization, 17.69% for the filtered image, and 6.69% using background correction. Furthermore, the best subtraction index was also obtained by the proposed framework, with 78.62% in comparison with 11.37% using only the modified histogram-based technique for background correction. Figure 6 presents the effects of the joint pre-processing framework on three of the real 2-DGE images (samp_05-09-10). In the three processed images (Fig. 6b, d, and f), we can see the effect of noise reduction and background homogenization. Additionally, the enhancement of low-abundance spots is noticeable.

Proposal novelties II: validation with real 2-DGE images
The joint pre-processing framework was validated using real 2-DGE images captured from four apitoxin (honey bee venom) samples, two urine samples from patients with prostate cancer, and four 2-DGE images from the LECB 2-D PAGE Gel Image Database. Table 6 presents the number of detected spots from the original and pre-processed samples, as well as the true positives and false positives. We obtained the false positive reduction percentages by comparing the original and pre-processed images. For the 2-DGE images of apitoxin (samp_01-02-03-04), the joint pre-processing framework reduced the false positives by between 43% and 72%. For the urine samples (samp_05-06), the false positives from the pre-processed images decreased by 91% and 85% respectively. For the four images from the LECB 2-D PAGE Gel Image Database (samp_07-08-09-10), the false positives were reduced by between 71% and 93%. From these results, we can see that the joint pre-processing framework improves protein detection by reducing the false positives caused by noise and non-homogeneous background.
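The reduction percentages can be computed as a relative decrease in the false positive count. The formula below is an assumption about how the comparison was made, and the counts in the example are hypothetical rather than values from Table 6.

```python
def false_positive_reduction(fp_original, fp_preprocessed):
    """Relative decrease in false positives, as a percentage (assumed formula)."""
    if fp_original <= 0:
        raise ValueError("fp_original must be positive")
    return 100.0 * (fp_original - fp_preprocessed) / fp_original

# Hypothetical counts: 200 false positives before, 30 after pre-processing
reduction = false_positive_reduction(200, 30)
```

Under this definition, a gel with 200 false positives in the original image and 30 after pre-processing would show an 85% reduction.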

Conclusions
2-DGE images commonly present several anomalies that hinder spot detection and analysis. In this paper, several digital image processing techniques were tested and validated in three stages, i.e., normalization, noise reduction and background correction, achieving an enhancement of the image for subsequent analysis. Each approach helps reduce specific anomalies, and here we introduce a new joint pre-processing framework that combines the capabilities of the selected techniques for each of the three stages. The techniques used in each of the stages of image pre-processing were compared on synthetic images using four validation measures, i.e., LPD, SNR, spot efficiency and BSI, which offered representative and consistent values associated with pre-processing performance; these quantitative indicators therefore proved to be very useful measures for 2-DGE image applications.
Table 5 Performance of the joint pre-processing framework for a synthetic image with variable background and noise using LPD, spot efficiency, and BSI
Experimental results from synthetic images demonstrated that the order of the stages impacts the final results. For example, if the noise reduction stage is executed before normalization, the faint spots, which carry important information for the interpretation of the image, are often removed. Consequently, the order with the best performance was the following: 1) normalization, 2) noise reduction and 3) background correction. In particular, the best