
Adaptive loss-guided multi-stage residual ASPP for lesion segmentation and disease detection in cucumber under complex backgrounds

Abstract

Background

In complex agricultural environments, the presence of shadows, leaf debris, and uneven illumination can hinder the performance of leaf segmentation models for cucumber disease detection. This is further exacerbated by the imbalance in pixel ratios between background and lesion areas, which affects the accuracy of lesion extraction.

Results

We propose an original image segmentation framework, the LS-ASPP model, which uses a two-stage Atrous Spatial Pyramid Pooling (ASPP) approach combined with an adaptive loss to address these challenges. The Leaf-ASPP stage employs attention modules and residual structures to capture multi-scale semantic information and enhance edge perception, allowing precise extraction of leaf contours from complex backgrounds. In the Spot-ASPP stage, we adjust the dilation rates of the ASPP module and introduce a Convolutional Block Attention Module (CBAM) to accurately segment lesion areas.

Conclusions

The LS-ASPP model demonstrates improved semantic segmentation accuracy under complex conditions, providing a robust solution for precise cucumber lesion segmentation. By focusing on challenging pixels and adapting to the specific requirements of agricultural image analysis, our framework has the potential to enhance disease detection accuracy and facilitate timely, effective crop management decisions.


Introduction

Plant diseases are one of the main causes of yield reduction in cucumbers, often leading to significant losses or even total crop failure, directly affecting crop quality and yield and causing substantial economic damage [1, 2]. To improve crop quality and yield, it is therefore crucial to study plant diseases and to detect and identify them in time to determine the optimal moment for prevention and treatment. When crops are infected by pathogens, most symptoms appear on the leaves, producing phenomena such as lesions [3] and localized rot and wilting [4]. Crop diseases are numerous, and manual diagnosis is complex with a high misdiagnosis rate [5, 6]. Meanwhile, pesticide spraying is a primary measure for the prevention and treatment of plant diseases, but application often ignores the severity of crop symptoms [7], leading to imprecise dosages, soil pollution, and pesticide overuse [8]. Hence, to help non-specialists in crop production carry out their duties effectively, detect diseases, and diagnose them promptly to avoid further losses, artificial intelligence and digital image processing techniques are typically employed for disease detection. Segmenting and extracting lesions on cucumber leaves can provide a reliable basis for subsequent plant disease diagnosis and is of significant importance for the prevention and control of plant diseases and pests.

The severity of crop diseases can be assessed using image segmentation methods. Traditional methods assess severity in crops such as cucumber by segmenting the leaf and lesion areas and calculating their areas. The main methods include: (1) Threshold-based segmentation [9, 10]: these methods, including genetic algorithms and Otsu's method, are relatively simple to implement and computationally cheap. In real-world scenes, however, the subtle grayscale differences within crop leaves and the overlapping grayscale values across multi-scale leaves make image processing challenging, hindering lesion segmentation and detection. (2) Cluster-based segmentation [11, 12]: common machine learning clustering methods include K-means and Fuzzy C-means. These methods are applicable to a wide range of samples, but the segmentation results often depend on the choice of initial parameters, which can lead to local optima and reduced accuracy. (3) Region-based segmentation [13, 14]: common region-based methods include region growing and watershed algorithms. Because region growing is sensitive to noise, it is unsuitable for leaf lesion segmentation and detection in complex scenes. Overall, traditional methods require complex preprocessing, generalize poorly, and mostly target a single disease. Their transferability to other plant disease segmentation tasks is weak, making it difficult to handle multi-disease, multi-scale lesion segmentation simultaneously, which severely limits multi-disease segmentation and detection.

Beyond early traditional methods, deep learning offers solutions that improve both the transferability and the accuracy of plant lesion segmentation. Unlike early hand-crafted feature extraction, deep segmentation networks such as Fully Convolutional Networks [15] adopt end-to-end feature extraction and avoid cumbersome preprocessing; the U-Net network [16] likewise requires no complex preprocessing and achieves segmentation results closer to real samples. Variants of the DeepLab network [17], with higher accuracy and stronger transferability, have drawn more researchers into agricultural image processing, with good progress [18,19,20]. However, owing to the complexity of real environments, ordinary deep learning methods only work well in a single setting and segment poorly in complex scenes, so such models lack universality. For example, Chen et al. proposed BLSNet, an improved semantic segmentation network based on U-Net that introduces attention mechanisms and multi-scale modules and achieves high segmentation and classification accuracy [21]. In recent years, researchers have paid increasing attention to lesion segmentation and disease recognition under complex backgrounds and have explored the importance of lesion extraction for assessing disease severity.

Wang et al. proposed a network structure combining DeepLabV3+ and U-Net (DUNet) for lesion segmentation and disease recognition in complex backgrounds, reducing the interference of background pixels with values similar to the lesion [22, 23]. However, DUNet was not optimized for the pixel-ratio imbalance in lesion segmentation or for the difficulty of recognizing leaf-edge pixels caused by leaf overlap and debris occlusion. This paper proposes a novel two-stage LS-ASPP network model guided by an adaptive loss, which effectively targets the pixel-ratio imbalance in lesion segmentation, and validates the precision and accuracy of this method on a cucumber leaf disease dataset. The main contributions of this study are as follows:

(1) To address the imbalance between background and foreground target pixel ratios in lesion segmentation, an adaptive loss is proposed. During training, the classifier must process a large number of easily classified pixels, which drowns out sparse samples and degrades their segmentation accuracy. The adaptive loss therefore introduces a modulation factor that adjusts each pixel's classification weight: the factor decreases as classification confidence increases, down-weighting easily classified pixels and up-weighting hard-to-classify ones, prompting the model to focus on pixels that are difficult to classify, such as hard-to-distinguish leaf-edge pixels.

(2) The first stage, the Leaf-ASPP network, segments the complete leaf contour from the complex environment. To strengthen the model's focus on key regions and reduce the impact of overlapping leaves and debris, the improved Multi-Residual ASPP (MRA) module replaces the original ASPP module to extract multi-scale feature maps, and an attention mechanism combining ordinary convolutions with small-kernel convolutions is introduced to obtain more discriminative features.

(3) The second stage, the Spot-ASPP network, extracts lesion areas from the segmented leaves. Adjusting the dilation rates of the ASPP module allows smaller lesions to be recognized without loss of accuracy, yielding more complete lesion regions. A Convolutional Block Attention Module (CBAM) is introduced to focus attention on pixels in key regions.

(4) Combining the two stages, the integrated LS-ASPP model is constructed for cucumber lesion segmentation and disease detection in complex scenes. The overall segmentation task is decomposed into diseased-leaf segmentation and lesion extraction, which effectively improves segmentation accuracy and supports downstream disease classification and severity assessment tasks.

Data sources

A. Xinjiang dataset

The experimental data comes from images of three types of cucumber diseases in the cucumber dataset from the AI laboratory of Xinjiang Institute of Technology, including 48 images of cucumber powdery mildew, 88 images of cucumber angular leaf spot, and 64 images of cucumber downy mildew.

B. Extensive cucumber dataset

This dataset consists of 1280 original images at 512 × 512 resolution, covering eight cucumber classes for building machine vision-based algorithms: Anthracnose, Bacterial Wilt, Belly Rot, Downy Mildew, Pythium Fruit Rot, Gummy Stem Blight, Fresh leaves, and Fresh cucumber.

Figure 1 presents examples of diseased samples from the cucumber leaf diseases in the datasets.

Fig. 1. An example of a cucumber leaf with diseased spots: A Xinjiang dataset, B Extensive cucumber dataset

As can be seen from Fig. 1, the identification and segmentation of cucumber leaf diseases face four main difficulties: (1) different cucumber leaf diseases present different characteristics, and lesion segmentation must follow these disease features; (2) similarities between the characteristics of different diseases, such as cucumber downy mildew and cucumber angular leaf spot, interfere with the recognition task and lower recognition accuracy; (3) complex backgrounds interfere with leaf segmentation, and shadows cast by obstructions are misdetected as lesion areas; (4) because cucumber lesion areas are irregularly shaped, small initial lesions are hard to discover, which increases the difficulty of segmentation.

Data augmentation

Training data augmentation

Because the samples are unevenly distributed, training may lead to model overfitting, so data augmentation is adopted to improve the generalization of the trained model. In this study, a random augmentation method is chosen for each batch of data during training. Without enlarging the original dataset, the original data features are preserved while better simulating the variation among samples in a real complex environment. The training set mainly employs the following augmentation methods: (1) Flipping: images are manipulated through horizontal flipping, vertical flipping, vertical-then-horizontal flipping, and mirroring, four flip operations in total, simulating the randomness of shooting angles during sample collection without changing the shape of lesions or their distribution on the leaf. (2) Color jitter: adjusting the brightness or saturation of the image simulates differences in real-world lighting while avoiding image distortion. (3) Adding noise: noise is added to simulate the noise generated during data collection and to prevent the network from overfitting. The effect of training data augmentation is shown in Fig. 2, and a code sketch follows the figure.

Fig. 2. Illustration of training data sample augmentation
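As a minimal sketch, the per-batch random augmentation described above could be implemented in PyTorch as follows; the jitter ranges and noise scale are illustrative assumptions rather than values reported here:

```python
import random
import torch
from torchvision.transforms import functional as TF

def augment_train(image, mask):
    """Randomly apply one of the augmentations described above.

    `image` and `mask` are float tensors in [0, 1]; geometric ops are
    applied to both so pixel-level labels stay aligned.
    """
    op = random.choice(["flip", "jitter", "noise"])
    if op == "flip":
        # Random horizontal/vertical flips cover the four flip variants.
        if random.random() < 0.5:
            image, mask = TF.hflip(image), TF.hflip(mask)
        if random.random() < 0.5:
            image, mask = TF.vflip(image), TF.vflip(mask)
    elif op == "jitter":
        # Color jitter: perturb brightness/saturation to mimic lighting changes.
        image = TF.adjust_brightness(image, 1.0 + random.uniform(-0.2, 0.2))
        image = TF.adjust_saturation(image, 1.0 + random.uniform(-0.2, 0.2))
    else:
        # Additive Gaussian noise simulating sensor noise during collection.
        image = (image + 0.02 * torch.randn_like(image)).clamp(0.0, 1.0)
    return image, mask
```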

Test data augmentation

The dataset used in this study was shot under laboratory conditions, so to address the data fluctuations of real complex environments and simulate insufficient lighting, leaf deviation from the center, and obstructions in natural shooting, four image augmentation methods are employed on the test data: (1) Translation: part of the image is cropped and the missing pixels are filled with the border pixel values, simulating leaf deviation from the lens center and incomplete leaves during shooting. (2) Occlusion: randomly sized, located, and rotated images of tomato fruit, soil, and green leaves are generated to occlude the leaf area, simulating the target leaf being obscured. (3) Cropping: cropping a certain proportion of the main area effectively highlights the diseased part. (4) Reducing brightness: image brightness is reduced to simulate insufficient light; the brightness of diseased leaf images is reduced to 50% of the original and to grayscale. The results of the test data augmentation are shown in Fig. 3.

Fig. 3. Illustration of test data sample augmentation

Figure 3 shows the outcomes of implementing various test data augmentation techniques on the original cucumber leaf images. The methods demonstrated include grayscaling, reducing brightness, and cropping.

(a) Original test image: the unedited, raw image taken in a controlled lab setting, showing how the dataset looks before any modifications are applied.

(b) Grayscale: a black-and-white version of the original image. This transformation simulates low-light conditions, where colors are less distinct, and encourages the model to focus on textures rather than colors, making it more adaptable to different lighting scenarios.

(c) Reduce brightness by 50%: the brightness of the original image is halved, mimicking situations with reduced lighting and preparing the model for real-world applications. By gradually decreasing brightness, the model becomes better at recognizing important features under varying lighting conditions.

These examples highlight the importance of using data augmentation techniques to enhance the model’s performance in different environments. By incorporating these methods, the model becomes more reliable and can smoothly transition from ideal lab conditions to practical field settings.
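The test-time perturbations above can be sketched with torchvision's functional API as follows; the shift offsets and occlusion region are placeholder parameters, and zeroing out the occluded region stands in for pasting distractor patches (tomato fruit, soil, leaves):

```python
import torch
from torchvision.transforms import functional as TF

def reduce_brightness(image, factor=0.5):
    # Scale brightness to `factor` of the original (50% in the test protocol).
    return TF.adjust_brightness(image, factor)

def to_grayscale(image):
    # Grayscale conversion, replicated to 3 channels so the input shape is unchanged.
    return TF.rgb_to_grayscale(image, num_output_channels=3)

def translate_with_border_fill(image, dx=30, dy=30):
    # Shift the content right/down by padding the left/top edges with border
    # pixels, then crop back to the original size (leaf drifts off-centre).
    padded = TF.pad(image, [dx, dy, 0, 0], padding_mode="edge")
    return padded[..., : image.shape[-2], : image.shape[-1]]

def occlude(image, top=100, left=100, h=64, w=64):
    # Blank out a region to simulate the target leaf being obscured.
    out = image.clone()
    out[..., top : top + h, left : left + w] = 0.0
    return out
```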

Data annotation

While the plant disease identification network only requires image-level annotations, which the dataset already provides, the disease spot segmentation network requires pixel-level annotations, which the Xinjiang Institute of Technology laboratory dataset lacks. Training the segmentation model therefore requires pixel-level annotation of the disease spots, as shown in Fig. 4 (leaf annotation and disease spot annotation).

Fig. 4. Pixel-level annotation of diseased leaf spots

Pixel-level annotation requires substantial human and material resources. Therefore, disease spot annotations were produced for only a portion of the samples: a total of 30 diseased plant samples were selected from the various categories, and the LabelMe image annotation tool was used to annotate the disease spot areas. Of these, 10 annotated samples are used for training the segmentation model and 20 for testing it.
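As an illustration of this annotation pipeline, the sketch below converts a LabelMe JSON file into a pixel-level label mask; the class names `leaf` and `spot` are hypothetical placeholders for whatever labels were assigned during annotation:

```python
import json
import numpy as np
from PIL import Image, ImageDraw

def labelme_to_mask(json_path, label_ids=None):
    """Rasterize LabelMe polygon annotations into a label mask."""
    if label_ids is None:
        label_ids = {"leaf": 1, "spot": 2}  # hypothetical class names
    with open(json_path) as f:
        ann = json.load(f)
    mask = Image.new("L", (ann["imageWidth"], ann["imageHeight"]), 0)
    draw = ImageDraw.Draw(mask)
    for shape in ann["shapes"]:
        cls = label_ids.get(shape["label"])
        if cls is not None:
            # Fill each annotated polygon with its class index.
            draw.polygon([tuple(pt) for pt in shape["points"]], fill=cls)
    return np.array(mask, dtype=np.uint8)
```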

Methods

The main issues faced by cucumber disease spot segmentation and disease detection include:

• 1. Pixel ratio imbalance: This issue stems mainly from sparse pixels in the target area, which unbalances the ratio of background pixels to target pixels, as in the task of extracting small disease spots. As shown in the figure, the ratio of disease spot pixels to leaf pixels is small, so disease spot pixels are easily lost. Additionally, the large number of easily classified background pixels generates substantial loss, so the total loss, when finally computed, far exceeds that contributed by the disease spot pixels. This directly reduces training efficiency and severely impacts segmentation results.

• 2. High number of hard samples: Hard samples originate mainly from complex background images in the natural environment. Owing to interference from clutter, leaf overlap, shadow occlusion, uneven lighting, and so on, the pixels of the interfering regions can be regarded as hard samples. This leads to incomplete leaf edge segmentation and difficulty in disease spot extraction; all of these hard-to-distinguish pixels impair disease spot extraction.

To address the above issues, this study makes improvements in two aspects: the loss function and the model structure. Adopting an adaptive loss function ameliorates the low training efficiency caused by the summed loss of many easy samples dominating the total loss, and to some extent resolves the pixel imbalance and the low precision caused by hard samples. Performing leaf and disease spot segmentation in separate stages reduces interference from complex background pixels: the first stage uses the Leaf-ASPP network to segment leaf contours, and the second stage uses the Spot-ASPP network to extract disease spot areas.

Adaptive loss function

The introduction of an adaptive loss function mainly aims to resolve issues such as pixel ratio imbalance and an excessive number of hard samples in the task of segmenting diseased cucumber leaves, problems that traditional Cross-Entropy (CE) loss cannot solve. While balanced loss can effectively alleviate class sample imbalance, it overlooks the issue caused by an excessive number of hard samples. Therefore, by improving CE loss and balanced loss, an adaptive loss is generated and its optimization effect is discussed.

As a classic loss function in semantic segmentation in image processing, the Cross-Entropy binary classification loss function is defined as shown in the following Eq. (1):

$$CE = \begin{cases} -\log(p), & \text{if } y = 1 \\ -\log(1 - p), & \text{if } y = 0 \end{cases}$$
(1)

where \(y \in \{0, 1\}\) is the ground-truth label of the pixel and \(p \in [0, 1]\) is the probability that the model predicts the pixel belongs to the class \(y = 1\). Specifically, in this paper, segmentation determines whether a pixel belongs to the foreground; otherwise it is a background pixel. In the first stage, \(y = 1\) indicates that the pixel belongs to the target leaf area.

For easily classified pixels, such as those with probability values far greater than 0.5, the CE loss function generates a very small loss value per pixel. However, when such pixels vastly outnumber the hard-to-distinguish ones, their accumulated loss overwhelms that of the hard pixels, leading to insufficient training and poor network performance. Their loss therefore cannot be ignored.

α-balanced CE loss is a common method to solve the problem of class imbalance. By introducing a balancing factor \(\alpha\) to the CE loss function, the following Eq. (2) is formed:

$$\alpha\text{-balanced } CE(y, p) = \begin{cases} -\alpha \log(p), & \text{if } y = 1 \\ -(1 - \alpha)\log(1 - p), & \text{if } y = 0 \end{cases}$$
(2)

In practice, with the factor set by cross-validation, \(\alpha\)-balanced CE loss can increase the weight of minority categories. Although it assigns a single weight to all samples of the same category, it reduces the impact of the majority class on the loss. However, the hard samples in the data must still be considered, such as how to correctly partition leaf pixels covered by shadows, raindrops, or dust, or pixels overlapping with other leaves in the background, so as to exclude interference. \(\alpha\)-balanced CE loss, however, cannot effectively solve the hard sample problem. Therefore, a new adaptive loss function is proposed to address these issues.

Whether the model can actively focus on hard-to-classify pixels during training, without manually set weights, is a key link in disease spot segmentation. A modulation factor \(\sin(p + \pi) + 1\) is therefore introduced into the CE loss function. This term decays as the pixel classification confidence increases, changing the weight of hard-to-classify and sparse-category pixels in the overall loss. The adaptive loss function is given in Eq. (3):

$$L_{adaptive}(y, p) = \begin{cases} -\left[\sin(p + \pi) + 1\right]\log(p), & \text{if } y = 1 \\ -\left[\sin((1 - p) + \pi) + 1\right]\log(1 - p), & \text{if } y = 0 \end{cases}$$
(3)

here \(p\) is the probability that the model assigns to the class \(y = 1\). The modulation factor is determined by the predicted probability of a pixel's true class and decays as that probability increases, thereby reducing the loss of easily classified pixels. Equation (3) has the following properties:

(1) When the probability \(p\) of the true class is small, the pixel is hard to classify, and the modulation factor grows as \(p\) decreases. When \(p = 0\), the modulation factor \(\sin(p + \pi) + 1\) reaches its maximum value of 1, so the loss of the hardest pixels is passed through at full weight and retains its influence on the overall loss.

(2) When the probability \(p\) of the true class rises, the pixel is easy to classify, and the modulation factor decreases as \(p\) rises, so the loss of easily classified pixels shrinks. When \(p\) reaches 1, the modulation factor attains its minimum value \(1 - \sin(1) \approx 0.16\), and the pixel's loss is reduced to its minimum.

The modulation factor can dynamically adjust the weight size according to the difficulty level of the probability values belonging to different categories, thereby adaptively adjusting the loss value. This process reduces the impact of the total loss of easily classified pixels on model performance. The adaptive loss function can reflect dynamic attention to pixels of two categories of different difficulty levels. To a certain extent, it can alleviate the problem of an excessive number of hard samples in leaf segmentation. It can adaptively assign gradually decreasing weight values to easily classified pixels in the background area, improving segmentation accuracy and enhancing network model performance. At the same time, it effectively mitigates the imbalance problem of the pixel ratio in disease spot segmentation. Overall, the adaptive loss function can effectively improve the network model’s disease spot segmentation performance.
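A minimal PyTorch sketch of the adaptive loss of Eq. (3), assuming the network outputs per-pixel foreground probabilities (e.g., after a sigmoid), could read:

```python
import math
import torch

def adaptive_loss(pred, target, eps=1e-7):
    """Adaptive loss of Eq. (3) for binary segmentation.

    `pred` holds per-pixel foreground probabilities in [0, 1];
    `target` holds the {0, 1} ground-truth mask.
    """
    # p_t: predicted probability of each pixel's true class.
    p_t = torch.where(target == 1, pred, 1.0 - pred)
    # Modulation factor sin(p_t + pi) + 1 = 1 - sin(p_t): equal to 1 for the
    # hardest pixels (p_t = 0) and decaying to 1 - sin(1) as p_t approaches 1.
    modulation = torch.sin(p_t + math.pi) + 1.0
    ce = -torch.log(p_t.clamp(min=eps))  # per-pixel cross-entropy
    return (modulation * ce).mean()
```

Given raw logits `z`, the call would be `adaptive_loss(torch.sigmoid(z), mask)`.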

Two-stage LS-ASPP network model

Most leaves in complex environments overlap one another, and background clutter and irrelevant leaves overlap the target leaf, degrading segmentation. In addition, the background may contain diseased leaves, and regions resembling disease spots can also interfere with target leaf segmentation. A single-stage segmentation task may therefore yield an incomplete disease spot region, and the resulting low segmentation accuracy leads to inaccurate disease detection. The segmentation task is consequently refined into two stages, from obtaining the diseased leaf outline to extracting the disease spot area, optimizing the segmentation process and improving precision. This study uses ASPP as the benchmark network structure and designs a two-stage segmentation model for cucumber diseased leaf and disease spot segmentation.

The two-stage LS-ASPP segmentation network model consists of Leaf-ASPP and Spot-ASPP. Both stages use the Atrous Spatial Pyramid Pooling (ASPP) as the benchmark network structure. The proposed model’s first-stage network structure uses Leaf-ASPP to extract target leaves from complex scenes. Then, in the second stage, Spot-ASPP is used to segment the more complete disease spot area in the segmented disease leaf. Each stage focuses only on one specific type, reducing the difficulty of segmentation.
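The resulting inference flow can be sketched as follows; `leaf_net` and `spot_net` are placeholders for the Leaf-ASPP and Spot-ASPP networks described in the next subsections, and masking the input image with the stage-one prediction is one plausible way of passing the segmented leaf to stage two:

```python
import torch

@torch.no_grad()
def segment_lesions(image, leaf_net, spot_net, threshold=0.5):
    """Two-stage LS-ASPP inference sketch: leaf contour first, lesions second."""
    leaf_prob = torch.sigmoid(leaf_net(image))      # stage 1: leaf vs. background
    leaf_mask = (leaf_prob > threshold).float()
    leaf_only = image * leaf_mask                   # suppress background pixels
    spot_prob = torch.sigmoid(spot_net(leaf_only))  # stage 2: lesion areas
    spot_mask = (spot_prob > threshold).float() * leaf_mask
    return leaf_mask, spot_mask
```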

Leaf-ASPP

In real scenarios, the image background often contains overlapping leaves, making it difficult to accurately extract the contours of the target leaf from among the other leaves. Moreover, uneven illumination, raindrops, and dust can also directly affect segmentation. To address these issues and strengthen the capture of cucumber diseased leaf outlines, we improved upon the original Atrous Spatial Pyramid Pooling (ASPP), naming the optimized network Leaf-ASPP. The main structural optimization is the replacement of the ASPP module with the Multi-Residual ASPP (MRA) module, enhancing the model's ability to perceive diseased leaf outlines in complex backgrounds. The Leaf-ASPP network consists of encoder and decoder parts; its architecture is shown in Fig. 5.

Fig. 5. Architecture of MRA-Net

To improve diseased leaf segmentation in complex scenes, we introduce the Multi-Residual ASPP (MRA) module; the resulting network is called MRA-Net. It captures leaf outline features at multiple scales.

Generally, the larger the receptive field, the better the network can perceive and judge each pixel. However, in large neural network models, the progressively increasing number of layers and the frequent use of up-sampling and down-sampling modules to process features can cause loss of detail and reduced segmentation accuracy.

Both the Multi-Residual ASPP and the original ASPP module use dilated convolutions to enlarge the receptive field and obtain feature maps at different scales. However, the original ASPP mainly consists of three parallel dilated convolutions applied to a feature map, each with a 3 × 3 kernel. As a result, the features extracted by these kernels are similar and cannot distinguish difficult pixel features, so diseased leaf outlines cannot be captured accurately. We therefore improved the original network: embedding the MRA module enhances the model's edge extraction ability.

As shown in Fig. 5, each branch of the MRA module consists of ordinary convolution, dilated convolution, and attention modules. The difference from the original spatial pyramid pooling structure lies in the different kernel sizes between the branches of the MRA module. In ordinary convolution, different kernel sizes will capture different receptive fields for each branch. Each branch’s basic feature map will effectively capture different information and improve feature distinguishability. Finally, the outputs of each branch are fused to form multi-scale features.

In the encoder, two features are output: low-level features and high-level features. Low-level features are extracted by the Xception backbone network, mainly containing shallow information such as disease spot outlines and shapes. High-level features are processed by the backbone network and residual ASPP, mainly containing deep information such as texture and color features.

The Residual ASPP feeds the original features into three 1 × 1 convolution modules, four dilated attention convolution units, and one residual unit. Each dilated attention convolution unit consists of an atrous (dilated) convolution module, a 3 × 3 convolution module, and an attention module, while the residual unit is composed of a 1 × 1 convolution module and an attention module. The dilation rates of the four dilated attention convolutions are 1, 3, 3, and 5, with a kernel size of 3 × 3. The output of each dilated attention convolution unit is added to the output of the residual unit, giving the four output feature maps of the Residual ASPP. Finally, the four feature maps are concatenated, and the merged result is passed through a 1 × 1 convolution module. Through these operations, the high-level features are obtained.

In the decoder, the outputs of low-level and optimized high-level features from the encoder are received. First, the low-level features are input into the attention module and the 1 × 1 convolution layer, yielding a small-scale refined low-level feature map. Then, the up-sampled high-level features are concatenated with the shallow features to obtain a fused feature map. Finally, the fused feature map is input into a 3 × 3 convolution layer for up-sampling processing to get the network’s prediction base map.

By improving on the original ASPP module, the MRA module's multi-scale feature extraction suppresses more irrelevant information and sharpens the model's perception of diseased leaf edge pixels, significantly improving the original network's segmentation performance.
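A sketch of the residual ASPP described above follows. The text does not specify the internal attention module, so a squeeze-and-excitation block is used as a stand-in; the dilation rates (1, 3, 3, 5) follow the description:

```python
import torch
from torch import nn

class SEAttention(nn.Module):
    """Squeeze-and-excitation channel attention, standing in for the
    unspecified attention module of each branch (an assumption)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)

class ResidualASPP(nn.Module):
    """Four dilated-attention branches (rates 1, 3, 3, 5) plus a 1x1 residual
    unit; each branch output is added to the residual, then the four maps are
    concatenated and fused by a 1x1 convolution, as described above."""
    def __init__(self, in_ch, out_ch, rates=(1, 3, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                SEAttention(out_ch),
            )
            for r in rates
        )
        self.residual = nn.Sequential(nn.Conv2d(in_ch, out_ch, 1), SEAttention(out_ch))
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        res = self.residual(x)
        feats = [branch(x) + res for branch in self.branches]
        return self.fuse(torch.cat(feats, dim=1))
```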

Spot-ASPP

In the first stage, the diseased leaf contour information has been obtained. The diseased leaf image contains only sparse disease spot features, and spot pixels account for a small proportion of the total leaf pixels, which increases the difficulty of disease spot extraction in the second stage and lowers segmentation accuracy. The spatial pyramid pooling of the original network is therefore optimized again to enhance segmentation performance. The main improvements are: (1) adjusting the dilation rates in the ASPP module to reduce the loss of detailed information; (2) introducing the Convolutional Block Attention Module (CBAM) to re-emphasize important features, capture small pixel regions such as disease spots, and suppress irrelevant leaf information. The improved network is named Spot-ASPP; its framework is shown in Fig. 6.

Fig. 6. Architecture of CBAM-Net

Firstly, to enhance disease spot segmentation, the dilated convolutions are given smaller dilation rates of 2, 4, 6, and 8; the improved structure is referred to as CBAM-Net. Although increasing the dilation rate expands the receptive field, it weakens the correlation of adjacent local information in the feature map and directly loses small-target detail. The CBAM-Net structure therefore retains dilated convolutions with smaller dilation rates, which are more conducive to extracting small disease spot pixels and achieve more precise segmentation.

Secondly, to enhance the robustness of the model’s segmentation performance, the Convolutional Block Attention Module (CBAM) is introduced following the optimization of the original ASPP network. In the channel attention module, the feature maps are input into the max pooling layer (Maxpool) and the average pooling layer (Avgpool), generating feature maps that are passed to a Multilayer Perceptron (MLP), thus creating the channel attention map (see Fig. 7).

Fig. 7. Architecture of the channel attention module

The channel attention module uses Avgpool and Maxpool modules to integrate the channel information of the feature maps, outputting two types of spatial information contexts processed by max pooling and average pooling respectively. Following a matrix summation operation, the fused matrix map is multiplied by the input features. This operation will effectively enhance the extraction capability for important features and strengthen the expressive power of the features.
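The channel attention of Fig. 7 can be sketched as follows; per the standard CBAM formulation, a sigmoid squashes the summed pooled descriptors before they rescale the input, a step the text leaves implicit:

```python
import torch
from torch import nn

class ChannelAttention(nn.Module):
    """Max- and average-pooled descriptors pass through a shared MLP,
    are summed, and the resulting per-channel weights rescale the input."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(nn.functional.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(nn.functional.adaptive_max_pool2d(x, 1))
        weights = torch.sigmoid(avg + mx)  # fused channel attention map
        return x * weights
```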

Model training

Experiment configuration

To validate the effectiveness of the LS-ASPP network, the proposed model is applied to the task of cucumber leaf disease spot image segmentation and compared with other methods. Training and testing both run on the Ubuntu 18.04 LTS 64-bit operating system. The method is implemented in Python 3.7, with PyTorch 1.10.2 as the deep learning framework. The hardware platform includes an Intel(R) Core(TM) i9-10900F CPU @ 2.80 GHz, 32 GB of memory, and an NVIDIA GeForce RTX 3080 Ti GPU with 12 GB of video memory. CUDA 11.6.0 and cuDNN are used as the library tools to accelerate network model training.

Model parameters

The convolutional layers of the LS-ASPP U-shaped network follow the U-Net model's initialization scheme, Kaiming initialization, which PyTorch has now adopted as its default parameter initialization; the negative slope of the activation function is 0. Kaiming initialization is designed for deep neural networks with nonlinear activations and effectively prevents the outputs of activation layers from exploding or vanishing during forward propagation, accelerating model convergence. The learning rate is 0.0001, the number of training epochs is 15, the total number of iterations is 360, and the batch size for disease spot segmentation training is 4. The optimizer is Adam [17], with a weight decay of 0.00005.
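The reported configuration translates to PyTorch roughly as follows; treating the nonlinearity as ReLU (negative slope 0) is an assumption consistent with the stated settings:

```python
import torch
from torch import nn

def init_kaiming(model: nn.Module):
    # Kaiming (He) initialization with negative slope a = 0, as stated above;
    # this is also PyTorch's default scheme for convolutional layers.
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, a=0, nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)

def make_optimizer(model: nn.Module):
    # Adam with the reported learning rate and weight decay.
    return torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=5e-5)

# Reported training settings: 15 epochs, batch size 4.
NUM_EPOCHS, BATCH_SIZE = 15, 4
```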

Experimental results and analysis

This study selects at least three network models for comparison: FCN, U-Net, and ViT. Some disease spots are quite similar, and the same type of disease spot shows different features at different disease stages, which can cause a feature extraction network to attend excessively to particular disease features.

Training without pre-trained parameters and with a small proportion of training samples leads to poor recognition performance. After the leaf position is shifted, changes in the background shape can crop away some of the required disease spot features, degrading recognition and segmentation performance.

The ViT network model attends more to the disease spot area. The U-Net network model is sensitive to the orientation and location of disease spots; changes in the relative position of the background and the disease spot area degrade its recognition. Global pooling strengthens the relationship between feature maps and categories, is invariant to spatial changes, and performs well in shift tests.

Analysis of disease segmentation results

Segmentation result metrics

To accurately assess the segmentation precision of the diseased spot area, this study employs four conventional evaluation metrics for semantic segmentation: Pixel Accuracy (PA), Mean Pixel Accuracy (MPA), Mean Intersection over Union (MIoU), and Frequency Weighted Intersection over Union (FWIoU).

Pixel Accuracy refers to the ratio of the number of correctly predicted pixels to the total number of pixels, as shown in Eq. (4):

$$R_{PA} = \frac{\sum_{i=0}^{k} p_{ii}}{\sum_{i=0}^{k}\sum_{j=0}^{k} p_{ij}}$$
(4)

where \(k + 1\) is the number of categories, \(p_{ii}\) is the number of pixels of class \(i\) correctly predicted as class \(i\), and \(p_{ij}\) is the number of pixels of class \(i\) predicted as class \(j\).

Mean Pixel Accuracy is the proportion of correctly classified pixels computed for each category and then averaged over the categories, as shown in Eq. (5):

$$R_{MPA} = \frac{1}{k + 1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k} p_{ij}}$$
(5)

Mean Intersection over Union averages, over all categories, the ratio of the intersection to the union of the predicted and ground-truth regions, as shown in Eq. (6):

$$R_{MIoU} = \frac{1}{k + 1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}}$$
(6)
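All four metrics can be computed from a confusion matrix, as in the NumPy sketch below; since the text does not give the FWIoU formula, its standard frequency-weighted definition is used:

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """PA, MPA, MIoU, and FWIoU from integer label maps of equal shape,
    following Eqs. (4)-(6) and the standard FWIoU definition."""
    k = num_classes
    # Confusion matrix: rows are ground-truth classes, columns predictions.
    cm = np.bincount(k * gt.reshape(-1) + pred.reshape(-1),
                     minlength=k * k).reshape(k, k)
    diag = np.diag(cm)
    pa = diag.sum() / cm.sum()                                  # Eq. (4)
    mpa = np.nanmean(diag / cm.sum(axis=1))                     # Eq. (5)
    iou = diag / (cm.sum(axis=1) + cm.sum(axis=0) - diag)
    miou = np.nanmean(iou)                                      # Eq. (6)
    freq = cm.sum(axis=1) / cm.sum()
    fwiou = np.nansum(freq * iou)                               # FWIoU
    return pa, mpa, miou, fwiou
```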

Disease spot segmentation precision

The experiment used 84 images with disease spot annotations, partitioned with stratified sampling, as training samples, and 36 images as test samples. The annotation divided each image into disease spot and background regions only, without considering the disease category. The experimental results are averages over 225 image test results. Table 1 compares the segmentation precision of each algorithm. As Table 1 shows, the segmentation precision of LS-ASPP is markedly improved over FCN, U-Net, and ViT. The precision of the U-Net model does not differ greatly from that of the other compared networks, indicating that good segmentation results can be achieved even with very few training samples. The U-Net and ViT structures outperform the FCN model because FCN uses only the feature maps from its last three pooling layers during upsampling, without merging low-level semantic features, which leads to poor segmentation results; network models that merge low-level semantic features improve the disease spot segmentation.

Table 1 Segmentation accuracy of each algorithm

Figure 8 shows the segmentation results of cucumber downy mildew leaves by FCN, U-Net, ViT, and the LS-ASPP model with the self-attention mechanism used in this experiment. Adding skip-connection modules allows low-level semantic feature information to be fully merged: the U-Net, ViT, and LS-ASPP models can segment smaller disease spot areas, while the FCN model loses some disease spot information, produces incomplete and relatively blurry segmentation boundaries, and captures distant small disease spots poorly. The U-Net model can yield uneven edges around the disease spot area, mainly because U-Net applies only a convolutional layer during upsampling and does not capture all global disease spot information.

Fig. 8. Comparison of segmentation results of various deep learning network models

Because this experiment uses the LS-ASPP network structure with attention modules, the model can capture long-range global information and thus more accurately captures global disease spot semantics. It therefore segments less prominent small disease spot areas more capably and captures global leaf spot information more accurately.

Figure 8 also illustrates the segmentation results of the three compared models: FCN, U-Net, and ViT. The addition of skip connections and the fusion of low-level semantic features enable the U-Net and ViT models to segment smaller diseased areas, while the FCN structure tends to lose detail: its segmentation boundaries are blurry, and closely situated diseased areas tend to stick together. The LS-ASPP model produces smoother disease boundaries than U-Net, whose segmentation edges appear jagged, because U-Net only introduces a convolutional layer after upsampling, whereas the LS-ASPP, with its convolution operation in the reconstruction layer and its attention mechanism, can further smooth edges and gather global segmentation information.

Different diseases present different levels of segmentation difficulty. For instance, the diseased areas of cucumber powdery mildew are light yellow, with early symptoms and indistinct edges, making segmentation challenging. The average segmentation results of 84 test samples for each type of disease spot indicate that the LS-ASPP model exhibits high segmentation accuracy for diseases of the spot type, as shown in Table 2.

Table 2 Segmentation accuracy for diseases

The model’s input size influences segmentation results: when the input size is increased from 224 × 224 to 384 × 384, while keeping the patch size at 4, the LS-ASPP network’s input token sequence becomes larger, thereby improving the model’s segmentation performance. Despite a slight improvement in segmentation accuracy, the computational burden of the entire network also increases significantly. To ensure algorithm efficiency, this study uses an input resolution scale of 224 × 224.

The model's scale also influences results: similar to reference [19], we find that simply deepening the network does not enhance performance but instead raises the computational cost of the entire network. Balancing precision and speed, we opt for an attention mechanism-based symmetric model for plant disease segmentation.

Performance comparison for different diseases

The LS-ASPP model was evaluated on test datasets for three diseases: cucumber powdery mildew, cucumber angular leaf spot, and cucumber downy mildew. We analyzed the performance metrics for each disease individually and present the average scores in Table 3. The table shows that the LS-ASPP model accurately segments all three diseases: it achieves the highest score for cucumber powdery mildew and slightly lower scores for the other two, while maintaining consistently strong performance overall, indicating its suitability for these diseases and potentially for similar ones.

Table 3 Performance metrics for each disease and their average scores

According to the data in the table, it is clear that the LS-ASPP model performs well in all the metrics that were evaluated. Specifically, the model achieves an impressive Mean IoU of about 79%, 78%, and 74% for cucumber powdery mildew, cucumber angular leaf spot, and cucumber downy mildew, respectively. Similarly, the FreqWeighted IoU metric shows a very satisfactory score of around 89%, 88%, and 91% for the corresponding diseases. This indicates that the LS-ASPP model consistently performs well in handling different cucumber diseases, demonstrating its strength and reliability.

Evolution of loss during model training process

It is important to monitor how the loss changes while training a model. By checking the training and validation losses, we can learn how the model is learning and whether it is overfitting or underfitting. Figure 9 shows the losses recorded every five epochs throughout training.

As displayed in Fig. 9, initially, the training loss starts relatively high at 0.762, accompanied by a considerable validation loss of 0.684. Subsequently, the training loss steadily declines as the model trains longer, reaching 0.317 at epoch 50. Meanwhile, the validation loss experiences a mild oscillatory pattern, eventually settling near the training loss. This observation implies that the model successfully minimizes the errors on both training and validation sets.

Fig. 9. Evolution of loss during model training process

Discussions

We suggest adding more data sets with different types of crops and environments to strengthen the model’s effectiveness. By including a wide range of crops and external factors in the data set, we can ensure that the model is better suited for real-world use. This expansion will involve retraining the model with the new data and assessing its performance using established criteria. The expected results of this project are:

  1. (1)

    Enhanced model flexibility: By expanding the data set to include various crops and growing conditions, we increase the chances of successful implementation in different fields.

  2. (2)

    Confirmation of robustness: Testing the model on diverse data sets confirms its ability to maintain high performance levels despite variations in appearance and environmental factors.

  3. (3)

    Real-world applicability: Demonstrating the model’s effectiveness in handling a wide range of crops and environmental conditions strengthens its credibility and usefulness in practical applications.

By taking this approach, the model we propose can move beyond just being good at identifying cucumber leaves to becoming a comprehensive solution for detecting and monitoring plant diseases, meeting the needs of modern agricultural practices.

Conclusion

Plant disease segmentation models are prone to interference from shadows and obstructions, and the extraction of features has an inherent uncertainty. To address these challenges, the LS-ASPP network model was constructed using images from the dataset. The use of the LS-ASPP module in the network model enhances the model’s ability to capture global information, thereby improving the segmentation of disease spots.

The model is trained with only a small amount of annotated disease spot samples, significantly reducing the annotation cost. Compared to the FCN, U-Net, and ViT models, the LS-ASPP model achieves superior segmentation accuracy, performing well on pixel accuracy, mean intersection over union, and frequency weighted intersection over union, all key segmentation evaluation metrics. This suggests that the model is robust to shadows and obstructions.

By adding skip connections, the network model can integrate low-level features, and by restoring detailed features, it can retain smaller disease spots and refine segmentation boundaries. The experiments demonstrate that the LS-ASPP model, equipped with a self-attention mechanism, exhibits good generalization performance and robustness.

Availability of data and materials

The raw data used in this study cannot be publicly shared due to confidentiality restrictions, but processed data used for analysis may be available from the corresponding author upon reasonable request.

Abbreviations

LS-ASPP:

Leaf-ASPP and spot-ASPP (two-stage model)

ASPP:

Atrous spatial pyramid pooling

CBAM:

Convolutional block attention module

AI:

Artificial intelligence

CE:

Cross-entropy

MRA:

Multi-residual ASPP

MLP:

Multilayer perceptron

FCN:

Fully convolutional network

U-Net:

U-shaped convolutional network

VIT:

Vision transformer

PA:

Pixel accuracy

MPA:

Mean pixel accuracy

MIoU:

Mean intersection over union

FWIoU:

Frequency weighted intersection over union

GPU:

Graphics processing unit

CNN:

Convolutional neural network

IoU:

Intersection over union

References

  1. Liu C, Zhu H, Guo W, et al. EFDet: an efficient detection method for cucumber disease under natural complex environments. Comput Electron Agric. 2021;189:106378.


  2. Zhang P, Yang L, Li D. EfficientNet-B4-Ranger: a novel method for greenhouse cucumber disease recognition under natural complex environment. Comput Electron Agric. 2020;176:105652.


  3. Bai X, Li X, Fu Z, et al. A fuzzy clustering segmentation method based on neighborhood grayscale information for defining cucumber leaf spot disease images. Comput Electron Agric. 2017;136:157–65.


  4. Pixia D, Xiangdong W. Recognition of greenhouse cucumber disease based on image processing technology. Open J Appl Sci. 2013;3(01):27–31.


  5. Luo Y, Sun J, Shen J, Wu X, Wang L, Zhu W. Apple leaf disease recognition and sub-class categorization based on improved multi-scale feature fusion network. IEEE Access. 2021;9:95517–27. https://doi.org/10.1109/ACCESS.2021.3094802.


  6. Jiang P, Chen Y, Liu B, He D, Liang C. Real-time detection of apple leaf diseases using deep learning approach based on improved convolutional neural networks. IEEE Access. 2019;7:59069–80.


  7. Mu H, Wang K, Yang X, Xu W, Liu X, Ritsema CJ, Geissen V. Pesticide usage practices and the exposure risk to pollinators: a case study in the north China plain. Ecotoxicol Environ Saf. 2022;241:113713.


  8. Pan D, He M, Kong F. Risk attitude, risk perception, and farmers’ pesticide application behavior in China: a moderation and mediation model. J Clean Prod. 2020;276:124241.


  9. Yang P, Song W, Zhao X, et al. An improved Otsu threshold segmentation algorithm. Int J Comput Sci Eng. 2020;22(1):146–53.


  10. Pare S, Kumar A, Singh GK, et al. Image segmentation using multilevel thresholding: a research review. Iran J Sci Technol Trans Electr Eng. 2020;44:1–29.


  11. Rajabi A, Eskandari M, Ghadi MJ, et al. A comparative study of clustering techniques for electrical load pattern segmentation. Renew Sustain Energy Rev. 2020;120:109628.


  12. Chavolla E, Zaldivar D, Cuevas E, et al. Color spaces advantages and disadvantages in image color clustering segmentation. In: Advances in soft computing and machine learning in image processing; 2018, pp. 3–22.

  13. Abbas Q, Celebi ME, Garcia IF. Breast mass segmentation using region-based and edge-based methods in a 4-stage multiscale system. Biomed Signal Process Control. 2013;8(2):204–14.


  14. Ji X, Li Y, Cheng J, et al. Cell image segmentation based on an improved watershed algorithm. In: 2015 8th International congress on image and signal processing (CISP). IEEE; 2015, pp. 433–437.

  15. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2015, pp. 3431–3440.

  16. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer; 2015, pp. 234–241.

  17. Liu C, Chen L C, Schroff F, et al. Auto-deeplab: hierarchical neural architecture search for semantic image segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition; 2019, pp. 82–92.

  18. Czajkowska J, Badura P, Korzekwa S, et al. Automated segmentation of epidermis in high-frequency ultrasound of pathological skin using a cascade of DeepLab v3+ networks and fuzzy connectedness. Comput Med Imaging Graph. 2022;95:102023.


  19. Yan J, Yan T, Ye W, et al. Cotton leaf segmentation with composite backbone architecture combining convolution and attention. Front Plant Sci. 2023;14:1111175.


  20. Storey G, Meng Q, Li B. Leaf disease segmentation and detection in apple orchards for precise smart spraying in sustainable agriculture. Sustainability. 2022;14(3):1458.


  21. Chen S, Zhang K, et al. An approach for rice bacterial leaf streak disease segmentation and disease severity estimation. Agriculture. 2021;11(5):420.


  22. Wang C, Du P, Wu H, Li J, Zhao C, Zhu H. A cucumber leaf disease severity classification method based on the fusion of DeepLabV3+ and U-Net. Comput Electron Agric. 2021;189:106373.


  23. Sultana N, Shorif SB, Akter M, Uddin MS. A dataset for successful recognition of cucumber diseases. Data Brief. 2023;49:109320.



Acknowledgements

We would like to acknowledge the contributions of our colleagues who provided valuable feedback and support throughout the project.

Funding

This research received no external funding.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed equally to the conception and design of the study, data collection and analysis, drafting and revising the manuscript, and have approved the final version for publication.

Corresponding author

Correspondence to Jiya Tian.

Ethics declarations

Ethical approval and consent to participate

This study does not involve human participants, human data, or human tissue.

Consent for publication

All authors have consented to the publication of this manuscript.

Competing interests

The authors declare that they have no competing interests.
