
Accurate classification of white blood cells by coupling pre-trained ResNet and DenseNet with SCAM mechanism



Counting the different kinds of white blood cells (WBCs) provides a good quantitative description of a person’s health status and is therefore critical for the early treatment of several diseases. Correct classification of WBCs is thus crucial. Unfortunately, manual microscopic evaluation is complicated, time-consuming, and subjective, which limits its statistical reliability. Hence, automatic and accurate identification of WBCs is of great benefit. However, the similarity between WBC samples, together with the imbalance and insufficiency of samples in medical computer vision, makes intelligent and accurate classification of WBCs challenging. To tackle these challenges, this study proposes a deep learning framework that couples pre-trained ResNet and DenseNet with SCAM (spatial and channel attention module) to classify WBCs accurately.


In the proposed network, ResNet and DenseNet enable information reuse and new information exploration, respectively, which are both important and compatible for learning good representations. Meanwhile, the SCAM module sequentially infers attention maps along two separate dimensions, space and channel, to emphasize important information and suppress unnecessary information, further enhancing the representation power of our model for WBCs and helping to overcome the limitation of sample similarity. Moreover, data augmentation and transfer learning are used to handle imbalanced and insufficient data. In addition, the mixup approach is adopted to model the vicinity relation across training samples of different categories, increasing the generalizability of the model. Compared with five representative networks on our newly developed LDWBC dataset and the publicly available LISC, BCCD, and Raabin WBC datasets, our model achieves the best overall performance. We also perform occlusion testing with the gradient-weighted class activation mapping (Grad-CAM) algorithm to improve the interpretability of our model.


The proposed method has great potential for application in intelligent and accurate classification of WBCs.



WBCs, also called leukocytes, are produced in the bone marrow and lymphoid tissue and are part of the human immune system. They protect the human body from infections caused by pathogens such as bacteria, viruses, and fungi [1,2,3]. Traditionally, WBCs are divided into granulocytes and agranulocytes [4, 5]. The granulocytes comprise basophils (0–1%), eosinophils (1–5%), and neutrophils (50–70%), while the agranulocytes include monocytes (2–10%) and lymphocytes (20–45%) [4, 6]. Figure 1 shows some examples of WBC images. A WBC count above or below the reference values may be associated with many kinds of diseases [7, 8]. Hence, accurate classification of the different types of WBCs is necessary.

Fig. 1 Examples of five types of WBC images

WBC classification techniques can be divided into three types: manual examination, detection with automated hematology analyzers, and machine learning. Manual examination is considered the gold standard for discriminating WBCs [9, 10]. However, it is inefficient, and its results depend on the experience and knowledge of the hematologists.

By comparison, automated hematology analyzers can address the above issues [11, 12]. They rely on technologies such as electrical impedance, radiofrequency conductivity, light scatter, fluorescent scatter, and cytochemistry [13, 14] to differentiate WBC types automatically, and can achieve high accuracy and efficiency. However, these analyzers cannot use the morphology of WBCs in blood smears for classification. Furthermore, they do not digitally preserve blood smears, so retrospective studies are not possible. This means that once the detection device encounters any abnormality, hematologists have to re-collect blood smears and distinguish WBCs by manual examination.

Recently, digital images of blood smears have become easy to obtain thanks to the rapid development of digital microscopes and information technology [15, 16]. Consequently, many computer-aided methods based on machine learning, including traditional machine learning methods and deep learning methods, have been developed to automatically distinguish different types of WBCs in blood cell images. Traditional machine learning methods feed extracted discriminative features that represent WBCs into a classifier to perform the classification. For instance, Alqudah et al. [17] investigated WBC feature extraction and classification by combining principal component analysis with three classifiers [probabilistic neural network, support vector machine (SVM), and random forest (RF)]. Duan et al. [18] extracted texture, shape, and spectral features from segmented cells and applied SVM to recognize the WBC types. Sharma et al. [19] used the bio-inspired grey wolf optimization algorithm to find the optimal features and then combined them with SVM, decision tree, RF, and k-nearest neighbor classifiers to detect WBCs. Dong et al. [20] first extracted geometry, color, and texture features from segmented WBCs, then removed irrelevant and redundant features with a feature selection algorithm based on classification and regression trees, and finally analyzed the performance of a particle swarm optimization SVM. Although these approaches can yield good results, they rely heavily on feature engineering, and determining which features to use for constructing a classification model is generally difficult.

Unlike traditional machine learning methods, deep learning methods can automatically learn features from images and perform classification simultaneously. Thus, many deep learning approaches have been developed and successfully applied to WBC classification. For instance, Ridoy et al. [21] evaluated the convolutional neural network (CNN)-based model they proposed for automatic WBC classification on the BCCD (blood cell count and detection) dataset [22]. Mohamed et al. [23] proposed a hybrid deep learning and traditional machine learning framework for WBC classification, in which the deep network produces the feature vector and a traditional machine learning classifier performs the classification. They experimented with several combinations on the BCCD dataset and found that a pre-trained 1.0 MobileNet-224 model combined with a logistic regression classifier reached the highest classification accuracy. To investigate the classification performance of different network structures, Habibzadeh et al. [24] transferred a variety of pre-trained Inception and ResNet models to the public BCCD dataset and found that, for 4-class classification, fine-tuning all layers gave better results than fine-tuning only the last layers, and that the ResNet models outperformed the Inception models. Kutlu et al. [25] obtained similar results after experimenting with various deep learning networks on the combination of the BCCD and LISC (leukocyte images for segmentation and classification) datasets [26]. We think that the good performance of ResNet models may be attributed to their skip connections, which create paths that propagate information from lower layers directly to higher layers, thus effectively alleviating the vanishing gradient problem and easing model optimization.
Recently, some fusion models have been proposed to improve the accuracy of classifying WBCs by combining several CNNs, e.g., CNN-RNN (recurrent neural network) [27], AlexNet-GoogleNet-DenseNet [28], etc. However, whether these models can inherit the advantages of each CNN needs to be further explored.

Notably, the work of Chen et al. [29] has shown that ResNet and DenseNet are good at reusing features and exploring new features, respectively, which helps to enhance the representation power of a model. Building on their study, we develop a parallel CNN that combines ResNet and DenseNet modules to integrate the advantages of both. Besides, we add the SCAM module [30] to our network for adaptive feature refinement, further encouraging the model to learn discriminative information from WBC images and thereby address the problem of sample similarity. In addition, to deal with imbalanced and insufficient data, data augmentation and transfer learning (TL) strategies are adopted during training. Meanwhile, the mixup method is used to model the vicinity relation between different kinds of training samples to improve the generalization ability of the proposed method. Finally, the Grad-CAM algorithm [31] is used for occlusion testing to understand the decision-making process of the model.

The remainder of this paper is organized as follows: “Materials and methods” section introduces the data collection and processing and the proposed methods. “Experiments and results” section presents the experimental results and analysis. Finally, “Conclusion” section concludes this work.

Materials and methods

Data collection

In this study, we collected four WBC datasets from several data sources and used them to evaluate the performance of our method.

Fig. 2 The process of WBC image generation. a Blood smear. b Microscopic image. c Color deconvolution to separate the nucleus from the background. d Marker extraction to locate WBCs; the white regions mark the locations of the nuclei. e Watershed algorithm to segment WBCs. f Cropping to extract WBC images

From our cooperating medical institutions, we acquired 150 blood samples from 150 subjects. All samples were anonymized, so there is no privacy concern. The samples were smeared, stained with Wright-Giemsa [32, 33], and scanned with a high-resolution micro-scanning imaging device to obtain digital images. From each image, WBC images of 1280 \(\times\) 1280 pixels were extracted using our own cell segmentation method, which consists of color deconvolution [34], marker extraction, and the watershed algorithm [35]. Marker extraction locates the nuclei and, from them, the cells. Locating the nuclei involves image binarization, hole filling, a morphological opening operation, a dilation operation, a distance transform, and morphological reconstruction. Figure 2 illustrates the generation process of WBC images. All images were definitively labeled by a team of hematologists. In total, we collected 22645 WBC images, including 224 basophils, 968 monocytes, 539 eosinophils, 10469 neutrophils, and 10445 lymphocytes.
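The marker-extraction and cropping steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it starts from an already color-deconvolved nucleus-stain channel, uses a hypothetical hand-picked `threshold`, and replaces the distance transform, morphological reconstruction, and watershed steps with plain connected-component labeling, so touching nuclei are not separated. The function names and iteration counts are illustrative; SciPy's `ndimage` module is assumed to be available.

```python
import numpy as np
from scipy import ndimage


def nucleus_markers(nucleus_channel, threshold, open_iter=2, dilate_iter=2):
    """Binarize a nucleus-stain channel and label candidate nuclei.

    Assumption: `nucleus_channel` is the nucleus stain obtained by color
    deconvolution; `threshold` is a hypothetical, hand-picked value.
    """
    mask = nucleus_channel > threshold                            # image binarization
    mask = ndimage.binary_fill_holes(mask)                        # hole filling
    mask = ndimage.binary_opening(mask, iterations=open_iter)     # remove small specks
    mask = ndimage.binary_dilation(mask, iterations=dilate_iter)  # grow nuclei slightly
    # Simplification: connected components stand in for the distance transform,
    # morphological reconstruction and watershed steps used in the paper.
    labels, n_nuclei = ndimage.label(mask)
    return labels, n_nuclei


def crop_around_markers(image, labels, n_nuclei, size):
    """Cut a size x size patch centred on each labelled nucleus (zero-padded at borders)."""
    if n_nuclei == 0:
        return []
    half = size // 2
    pad = ((half, half), (half, half)) + ((0, 0),) * (image.ndim - 2)
    padded = np.pad(image, pad, mode="constant")
    centres = ndimage.center_of_mass(np.ones(labels.shape), labels,
                                     np.arange(1, n_nuclei + 1))
    crops = []
    for cy, cx in centres:
        top, left = int(round(cy)), int(round(cx))
        # in padded coordinates, the window [top, top + size) is centred on (cy, cx)
        crops.append(padded[top:top + size, left:left + size])
    return crops
```

On a real smear one would tune the threshold and morphology parameters per staining batch; the fixed-size crop mirrors the fixed 1280 \(\times\) 1280 WBC images mentioned above only in spirit.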

Considering that the quantity and diversity of data are of great importance for training a model with excellent performance [36], this study releases this dataset (called the LDWBC dataset), which is, to the best of our knowledge, the largest freely available WBC image dataset, to help facilitate the development of clinical hematology.

From the LISC database, we obtained 242 WBC images of 720 \(\times\) 576 pixels each. All the images were manually segmented and classified into five types by hematologists: 53 basophils, 48 monocytes, 39 eosinophils, 50 neutrophils, and 52 lymphocytes.

From the BCCD database, we collected 12444 WBC images divided into four categories: 3098 monocytes, 3120 eosinophils, 3123 neutrophils, and 3103 lymphocytes. The images in this dataset are cropped images of 320 \(\times\) 240 pixels.

From the Raabin database [37], we downloaded 14514 WBC images at a resolution of 575 \(\times\) 575 pixels, comprising 301 basophils, 795 monocytes, 1066 eosinophils, 8891 neutrophils, and 3461 lymphocytes.

Table 1 summarizes the four publicly available WBC datasets. Note that the images in the LISC and BCCD datasets have a low signal-to-noise ratio because they contain many irrelevant background elements, which may degrade model performance. We therefore cropped the WBC images in the LISC dataset using the provided WBC mask images, and extracted WBC images from the BCCD dataset with our cell segmentation method. A total of 12336 WBC images were obtained from the BCCD dataset; the remaining 108 images were excluded from this study because they contained no WBC or only a small fraction of one. Note also that most WBCs in the BCCD dataset lie at the image edges, so the cropped WBC images still contain considerable noise.

Table 1 The image information in the four datasets

Classification model

Fig. 3 a The architecture of our model. b The structure diagram of the SCAM block used in a. Conv: convolutional; FC: fully-connected; GAP: global average pooling; GMP: global max pooling

Figure 3a depicts the architecture of our model. In the parallel network, ResNet and DenseNet are selected to combine their respective advantages: the former encourages feature reuse, while the latter is able to explore new features, and both are significant for learning good representations. To fuse their extracted features, we kept the middle layers of each network and removed their last fully-connected (FC) layers (yielding the ResNet and DenseNet modules), and then used a convolutional layer (kernel size: 1 \(\times\) 1, number of filters: 512, stride: 1) to adjust the number of channels of the feature maps output by the two modules so that the feature maps have the same size. Attention plays an important role in human perception: humans do not attempt to process a whole scene at once but selectively concentrate on its salient parts to better capture the visual structure [38]. Inspired by this, and because the nucleus of a WBC contains a large amount of discriminative information about the cell, we embedded an attention module in the model to improve the representation power of our network for the nucleus and thus overcome the limitation of sample similarity. We adopt the SCAM block shown in Fig. 3b because it includes both a spatial attention module (SAM) and a channel attention module (CAM): the SAM emphasizes where the important features are, while the CAM emphasizes which features in the feature maps are meaningful. Finally, we stacked two FC layers to perform the WBC classification task. To alleviate overfitting, dropout was applied before the last FC layer.
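A forward pass of a SCAM-style block can be sketched in NumPy as below. This is a minimal illustration under assumptions, modeled on the common CBAM-style design: spatial attention convolves the stacked channel-wise average and max maps, and channel attention passes GAP and GMP vectors through a shared two-layer MLP, applied spatial-first as the name SCAM indicates. The 7 \(\times\) 7 kernel, the reduction ratio implied by the weight shapes, the absence of biases, and all function names are assumptions; in the real model these weights are learned.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def spatial_attention(feat, kernel):
    """feat: (C, H, W). kernel: (2, k, k) weights of a k x k conv over the stacked
    channel-wise average and max maps (odd k, 'same' zero padding)."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(pooled, ((0, 0), (p, p), (p, p)))           # zero padding
    h, w = feat.shape[1:]
    logits = np.zeros((h, w))
    for i in range(k):                                           # cross-correlation, as in conv layers
        for j in range(k):
            logits += np.einsum("c,chw->hw", kernel[:, i, j], padded[:, i:i + h, j:j + w])
    return feat * sigmoid(logits)[None, :, :]                    # "where" to emphasize


def channel_attention(feat, w1, w2):
    """w1: (C // r, C) and w2: (C, C // r) form an MLP shared by the GAP and GMP branches."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)                      # FC -> ReLU -> FC
    gap = feat.mean(axis=(1, 2))
    gmp = feat.max(axis=(1, 2))
    return feat * sigmoid(mlp(gap) + mlp(gmp))[:, None, None]    # "what" to emphasize


def scam(feat, kernel, w1, w2):
    """Spatial attention first, then channel attention."""
    return channel_attention(spatial_attention(feat, kernel), w1, w2)
```

Swapping the two calls inside `scam` gives the channel-first ordering (CSAM) that is compared against SCAM in the experiments.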

Although CNNs are highly effective in many applications, especially image classification, training CNNs to high accuracy usually relies on massive data to learn the underlying patterns [39, 40]. Unfortunately, building large-scale WBC image datasets is extremely difficult in clinical practice, because collecting and annotating WBC data is complex and expensive. TL, however, relaxes the assumption that the training and test data must be independent and identically distributed [39], which means it can use knowledge learned in a similar domain to tackle a task in a given domain, thus addressing the problem of limited data in the target domain. Some recent studies have fruitfully exploited TL in fields such as biomedicine [41,42,43], motivating us to also use TL to deal with insufficient WBC data. In addition, the low-level features extracted by CNNs are generic and largely independent of the dataset used, while the top-level features are abstract and depend heavily on the chosen dataset and task [44]. ResNet50 [45] and DenseNet121 [46] pre-trained on the ImageNet dataset have already learned sufficient low-level features such as color, geometry, and texture, and similar features are also present in WBC images. Based on this consideration, we transferred the parameters of the middle layers of these two pre-trained models into our model, enabling our network to concentrate on learning top-level features from WBC images for our classification task.
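Transferring only the middle-layer parameters can be sketched framework-agnostically, treating each model's parameters as a name-to-array dictionary (the form most deep learning frameworks use for saved weights). This is a minimal sketch under assumptions: the layer names and the prefixes that define the "middle layers" below are hypothetical, and the paper does not state whether the transferred layers were frozen or fine-tuned, so the sketch only copies values.

```python
import numpy as np


def transfer_middle_layers(target, pretrained, keep_prefixes):
    """Copy pretrained arrays into `target` (a name -> array dict) for layers whose
    names start with one of `keep_prefixes` and whose shapes match.

    Layers outside the prefixes (e.g. the ImageNet classifier) are not copied, and
    layers that exist only in `target` (e.g. the new 1x1 conv, SCAM and FC head)
    keep their fresh initialization. Returns the names that were copied.
    """
    copied = []
    for name, weight in pretrained.items():
        if (name in target
                and name.startswith(tuple(keep_prefixes))
                and target[name].shape == weight.shape):
            target[name] = weight.copy()
            copied.append(name)
    return copied
```

The shape check is a safety net: a 1000-way ImageNet classifier can never be copied into a 5-way WBC head even if the names happened to match.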

Data processing

Data augmentation

Although applying TL to a deep learning model can alleviate the issue of insufficient WBC data to a certain extent, deep learning models are also generally very sensitive to class imbalance [47], and the numbers of the different types of WBCs in the human body are naturally imbalanced. To tackle this problem, data augmentation strategies are employed [48]. Data augmentation also increases the amount of training data, improving the generalization ability of the model. In this work, data augmentation was performed on the training sets of the LDWBC, LISC, and Raabin datasets by randomly combining several transformations, including rotation, flipping, and translation. Note that the training set of the BCCD dataset is already augmented. The number of images in each augmented training set of the four datasets is displayed in Fig. 4.
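Class-balancing augmentation of this kind can be sketched as follows. This is a minimal sketch under assumptions: the paper does not specify rotation angles, shift ranges, or per-class target counts, so the 90-degree rotations, the wrap-around shifts, and the hypothetical `target_per_class` parameter are illustrative choices. Square images are assumed so that rotated copies keep their shape.

```python
import numpy as np


def random_transform(img, rng, max_shift=10):
    """Randomly combine rotation, flipping and translation of an (H, W, C) image with H == W."""
    img = np.rot90(img, k=int(rng.integers(4)), axes=(0, 1))   # rotation by a multiple of 90 degrees
    if rng.random() < 0.5:
        img = np.flip(img, axis=1)                             # horizontal flip
    if rng.random() < 0.5:
        img = np.flip(img, axis=0)                             # vertical flip
    dy, dx = (int(s) for s in rng.integers(-max_shift, max_shift + 1, size=2))
    return np.roll(img, shift=(dy, dx), axis=(0, 1))           # translation with wrap-around


def balance_by_augmentation(images, labels, target_per_class, seed=0):
    """Top up every class to `target_per_class` images with augmented copies of its
    own samples; classes already at or above the target are left unchanged."""
    rng = np.random.default_rng(seed)
    out_x, out_y = list(images), list(labels)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        for _ in range(max(0, target_per_class - len(idx))):
            out_x.append(random_transform(images[rng.choice(idx)], rng))
            out_y.append(c)
    return np.stack(out_x), np.array(out_y)
```

Augmentation is applied only to the training split; applying it before splitting would leak near-duplicates of training images into the validation and test sets.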

Fig. 4 Distribution for each category in the four augmented training sets

Following the recommended input size of the ResNet and DenseNet models, all WBC images in the four datasets were resized to a uniform 224 \(\times\) 224 pixels. We then randomly split the LDWBC and LISC datasets into training, validation, and test sets in a 3:1:1 ratio. Because the BCCD and Raabin datasets already include test sets, we randomly divided the training data of each of these two datasets into training and validation sets in a 3:1 ratio. The training set is used to fit and update the model parameters, the validation set is used for model selection and parameter tuning, and the test set is used to objectively assess the performance of the trained model. Table 2 presents the number of WBC images in the different sets.
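A random split by integer ratios can be sketched as below. This is a minimal sketch, assuming a plain (non-stratified) random permutation as "randomly split" suggests; the function name and the rounding convention (the last part absorbs any remainder) are assumptions, so the resulting set sizes may differ slightly from those reported in Table 2.

```python
import numpy as np


def random_split(n_samples, ratios=(3, 1, 1), seed=0):
    """Randomly partition sample indices according to integer ratios,
    e.g. 3:1:1 (train/val/test) or 3:1 (train/val)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    ratios = np.asarray(ratios, dtype=np.int64)
    # integer arithmetic avoids floating-point rounding at the cut points;
    # the last part absorbs any remainder
    cuts = np.cumsum(ratios)[:-1] * n_samples // ratios.sum()
    return np.split(perm, cuts)
```

For the 22645 LDWBC images, this convention yields 13587 training, 4529 validation, and 4529 test indices.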

Table 2 The number of images in different sets in the four datasets

Mixup operation

Data augmentation assumes that samples in a vicinity share the same category and ignores the vicinity relation between samples of different categories. Zhang et al. [49] showed that the mixup method models this vicinity relation by training the model on convex combinations of pairs of samples and their labels, acting as a regularizer that suppresses overfitting. Inspired by their work, we combine data augmentation with the mixup operation on the training data to further improve the generalization of the model.

The mixup operation works as follows. Suppose \((x_{u}, y_{u})\) and \((x_{v}, y_{v})\) are two samples randomly selected from the training data, where \(x_{u}\) and \(x_{v}\) denote the pixel matrices and \(y_{u}\) and \(y_{v}\) the corresponding one-hot encoded labels. The virtual instance \((x, y)\) is constructed by the mixup operation:

$$\begin{aligned} x= & {} \lambda *x_{u}+(1-\lambda )*x_{v} \end{aligned}$$
$$\begin{aligned} y= & {} \lambda *y_{u}+(1-\lambda )*y_{v} \end{aligned}$$

where \(\lambda \in [0, 1]\) is a weight factor drawn from the Beta(\(\alpha\), \(\alpha\)) distribution and \(\alpha \in (0, +\infty)\) is a hyperparameter. Fig. 5 gives an example of how a virtual training sample is generated by the mixup operation.
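Eqs. (1) and (2) can be sketched for a mini-batch as below. This is a minimal sketch under assumptions: drawing a single \(\lambda\) per batch and pairing each sample with a partner from a shuffled copy of the same batch is a common way to implement Zhang et al.'s mixup, but the paper does not state how pairs are formed. Following the convention used in the experiments, \(\alpha = 0\) is treated as "no mixup".

```python
import numpy as np


def mixup_batch(x, y, alpha=0.2, rng=None):
    """Mix a batch with a shuffled copy of itself.

    x: (B, ...) images; y: (B, N) one-hot labels. One lambda ~ Beta(alpha, alpha)
    is drawn per batch; alpha <= 0 returns the raw batch (lambda = 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha) if alpha > 0 else 1.0
    partner = rng.permutation(len(x))                 # random (u, v) pairing within the batch
    x_mix = lam * x + (1.0 - lam) * x[partner]        # Eq. (1)
    y_mix = lam * y + (1.0 - lam) * y[partner]        # Eq. (2)
    return x_mix, y_mix, lam
```

With small \(\alpha\) such as 0.2, Beta(\(\alpha\), \(\alpha\)) places most of its mass near 0 and 1, so most virtual samples stay close to one of the two originals.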

Fig. 5 An example of the mixup operation for constructing a virtual training sample

Model training

All models were trained, validated, and tested on a 64-bit Ubuntu 16.04 operating system with an Intel E5-2650 v4 2.20 GHz CPU, 256 GB of RAM, and an NVIDIA TITAN Xp 12 GB GPU. For training, the RAdam optimizer [50] is used to minimize the categorical cross-entropy loss in Eq. (3). The parameter configuration is listed in Table 3.

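$$\begin{aligned} loss = -\sum \limits _{c = 0}^{N - 1} {y_{c}\log ({\hat{y}}_{c})} \end{aligned}$$

where N is the number of classes, \(y_{c}\) is the c-th component of the true label y (one-hot, or a soft label after mixup), and \({\hat{y}}_{c}\) is the predicted probability for class c.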
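The categorical cross-entropy named in the text can be sketched in NumPy as below, assuming (as is usual for multi-class CNNs, though not stated explicitly in the paper) that the predicted probabilities come from a softmax over the final FC layer's outputs. The function names and the small `eps` guard are illustrative.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def categorical_cross_entropy(y_true, logits, eps=1e-12):
    """Batch mean of -sum_c y_c * log(p_c), with p = softmax(logits).

    y_true may be one-hot or a soft mixup label; eps guards against log(0).
    """
    p = softmax(logits)
    return float(np.mean(-np.sum(y_true * np.log(p + eps), axis=-1)))
```

Because this loss is linear in the label vector, training on a mixup label \(\lambda y_{u} + (1-\lambda) y_{v}\) is equivalent to weighting the losses for \(y_{u}\) and \(y_{v}\) by \(\lambda\) and \(1-\lambda\).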

Table 3 The parameter configuration of models

Experiments and results

We first evaluated the impact of the mixup operation on model performance and then compared the effects of several attention methods. After that, ablation studies verified the contributions of the ResNet and DenseNet modules and the attention module in our model, as well as the effect of TL. Next, the proposed model was compared with five representative networks on the four WBC datasets. Finally, we applied the Grad-CAM algorithm for occlusion testing to help explain the decision-making process of our model.

Performance metrics

The overall accuracy (OA), average precision (AP), average recall (AR), and average F1-score (AF1) are utilized to evaluate the ability of the model to identify WBC images. OA is calculated by dividing the number of correctly classified samples by the total number of samples. The other three evaluation criteria are stated as:

$$\begin{aligned} AP= & {} \frac{1}{N}\sum \limits _{c = 0}^{N - 1} {\frac{{TP(c)}}{{TP(c) + FP(c)}}} \end{aligned}$$
$$\begin{aligned} AR= & {} \frac{1}{N}\sum \limits _{c = 0}^{N - 1} {\frac{{TP(c)}}{{TP(c) + FN(c)}}} \end{aligned}$$
$$\begin{aligned} AF1= & {} \frac{{2*AP*AR}}{{AP + AR}} \end{aligned}$$

where N is the number of classes and c indexes the class treated as positive, with all other classes treated as negative (one-vs-rest). TP (true positive): number of correctly classified positive samples; FP (false positive): number of negative samples misclassified as positive; TN (true negative): number of correctly classified negative samples; FN (false negative): number of positive samples misclassified as negative.
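The four metrics can be computed from a multi-class confusion matrix as sketched below. This is a minimal sketch, assuming the convention that rows are true classes and columns are predicted classes; a class that is never predicted gives 0/0 for its precision, which NumPy turns into `nan` with a warning.

```python
import numpy as np


def wbc_metrics(cm):
    """Return (OA, AP, AR, AF1) from cm[i, j] = number of samples of true class i
    predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as c but belonging to another class
    fn = cm.sum(axis=1) - tp          # belonging to c but predicted as another class
    oa = tp.sum() / cm.sum()
    ap = np.mean(tp / (tp + fp))      # macro-averaged precision
    ar = np.mean(tp / (tp + fn))      # macro-averaged recall
    af1 = 2 * ap * ar / (ap + ar)     # harmonic mean of AP and AR
    return oa, ap, ar, af1
```

Note that AF1 as defined here is the harmonic mean of AP and AR, which in general differs from the mean of the per-class F1-scores.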

Investigation on effect of mixup operation on model

According to Eqs. (1) and (2), the degree of linear interpolation between training samples depends on \(\lambda\), whose distribution is controlled by the parameter \(\alpha\). We therefore assessed the effect on classification performance of setting this parameter from 0 to 1 in steps of 0.2. Table 4 displays the classification results of our model on the LDWBC test set. The table shows that the model trained with virtual samples yields higher scores than the one trained with raw samples (\(\alpha = 0\)), and that our model performs best when \(\alpha = 0.2\). Therefore, \(\alpha\) is set to 0.2 when generating the virtual training samples used to construct our model.

Table 4 The classification results (%) under different \(\alpha\) settings on our LDWBC test set
Fig. 6 Effect of the mixup operation on the training and validation sets

We also plotted the training and validation accuracy curves against training epochs for \(\alpha = 0\) and \(\alpha = 0.2\) in Fig. 6, which shows that the model trained on the raw data overfits: its training accuracy reaches 100% after several epochs, whereas its highest validation accuracy is only 97.37%. In contrast, the training and validation accuracies of the model trained on the virtual data are very close (98.53% and 97.62%), which illustrates that using virtual samples instead of raw ones yields more robust models. With virtual data, the validation accuracy fluctuates somewhat but is improved to a certain extent. In addition, because the training accuracy without virtual data has approached 100%, the network updates become slow; we believe the network has effectively stopped learning at this point, so its validation accuracy changes little and therefore appears more stable.

Comparison of different attention methods

Fig. 7 The confusion matrices for classification on our LDWBC test set. a CAM. b CSAM. c SCAM

Table 5 lists the effects of several common attention modules and their arrangements on model performance. It shows that channel attention, spatial attention, and their combinations all enhance the representation ability of the network. However, the model seems to perform better overall when using only channel attention. For further insight into the classification results, Table 6 gives the accuracy of the model in identifying each type of WBC. Compared with using only channel or only spatial attention, the parallel arrangement (CAM // SAM) does not improve model performance, whereas the sequential arrangements (CSAM and SCAM) significantly raise the ability of the model to recognize monocytes. This suggests that the attention maps generated by the sequential arrangements are finer than those generated by the parallel one. To reveal the classification behavior of the models using CAM, CSAM, and SCAM in more detail, Fig. 7 provides the corresponding confusion matrices. Fig. 7 clearly shows that with CAM the model performs best on lymphocytes but worst on monocytes. In contrast, the models using CSAM or SCAM perform more evenly on these two types of WBCs, which indicates that spatial attention indeed enhances the representation ability of the model for the nucleus. Finally, a further comparison shows that SCAM performs more evenly across all WBC categories than CSAM. This is likely because CAM and SAM have different functions, so the order in which they are combined affects model performance.

Table 5 The performances (%) of models with different attention methods on our LDWBC test set
Table 6 The accuracies (%) of models with different attention methods for each category on our LDWBC test set

Ablation study on model

Since the role of the SCAM module in our model was evaluated in the previous section, here we only assess the contribution of the ResNet and DenseNet modules through an ablation study. Table 7 lists the comparison results on different performance metrics. Model performance decreases regardless of which branch is removed, which shows that the advantages of the ResNet and DenseNet modules are complementary and together enhance the ability of our model to exploit the information in WBC images.

Table 7 The classification results (%) of the proposed components on our LDWBC test set

Further, the effect of TL on our model was also validated via an ablation study. Tables 8 and 9 show the overall classification results of the model and the classification accuracy for each category, respectively. As these tables show, using TL in either branch significantly enhances the ability of the model to identify basophils and monocytes, and using TL in both branches simultaneously further raises the classification ability of the model on monocytes. This implies that TL enables the model to better learn the abstract features in WBC images and thus improves its representation ability. It also shows that TL is an effective method for WBC classification when training data are limited.

Table 8 The classification results (%) of the TL method on our LDWBC test set
Table 9 The accuracies (%) of the TL method for each category on our LDWBC test set

Comparison with other methods

Fig. 8 Training accuracy of all models on our LDWBC dataset

To evaluate classification performance, we compared our model with five state-of-the-art methods on the four WBC datasets, using the same parameter configuration for all methods. On the LDWBC dataset, each model was trained on the training set with both raw data and virtual data, and for each method the model with the highest validation accuracy was selected as the final model. We evaluated the final models on the test sets; the comparison results are shown in Table 10. As Table 10 shows, the mixup operation improves the performance of most models, and our model yields the best classification results. We also compared the training process of the proposed model with that of the five models on the LDWBC dataset; the results are shown in Figs. 8 and 9, respectively. Our model not only obtains the highest accuracies on both the training and validation sets, but its performance also fluctuates only slightly across training epochs. These results again demonstrate that our model is robust and adapts well to the data. In addition, the performances of these models with the mixup operation were also compared on the other three datasets (see Table 11). In Table 11, our model ranks first on the BCCD and Raabin datasets and ties for second on the LISC dataset. Together, these results demonstrate that our model has excellent overall classification performance.

Fig. 9 Validation accuracy of all models on our LDWBC dataset. THA refers to the highest accuracy

Table 10 The comparing results (%) of different methods for raw data and virtual data on our LDWBC test set
Table 11 The comparing results (%) of different methods on the LISC, BCCD, and Raabin test sets

We also present the classification accuracy of all models on these four datasets for each category of WBCs in Table 12. We find that our method displays excellent performance for almost all types of WBC on each dataset compared to other methods, especially on monocytes, which again shows the promising performance of our method. We also find that almost all methods are able to identify each type of WBC well on the LISC and Raabin datasets. However, all methods perform worse on the BCCD dataset than on the other datasets, which is likely attributable to the cropped WBC images in the dataset still having a low signal-to-noise ratio.

Table 12 The accuracies (%) of models for each category on the test sets of the four datasets

Interpretability of model

Fig. 10 Several visualization examples selected from the test sets of the four datasets. For each set, the left column is the raw input image, and the right column is the occlusion map generated by superimposing the heatmap on the raw input image

To investigate the interpretability of our model, occlusion testing was performed with the Grad-CAM algorithm to visualize the regions that had the greatest impact on model decisions. In our model, Grad-CAM was applied to the output of the SCAM module to visualize its contribution to the prediction for each type of WBC image, as shown in Fig. 10. In Fig. 10, the red regions on the occlusion map are the areas the model attends to most during classification, while the blue regions receive the least attention, as decoded by the color bar on the right. The salient areas of the occlusion maps are located on the nucleus, which indicates that the model draws its classification conclusions from features extracted from these specific regions of the input WBC images.
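The heatmap computation at the core of Grad-CAM [31] can be sketched in NumPy as below, for one image and one target class. This is a minimal sketch under assumptions: the feature maps of the chosen layer (here, the SCAM output) and the gradients of the class score with respect to them are taken as given, since in practice they come from the training framework's automatic differentiation. Upsampling the map to the input size and overlaying it on the image are not shown.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one image and one target class.

    activations: (K, H, W) feature maps A^k of the chosen layer.
    gradients:   (K, H, W) d(score of the target class) / d(A^k).
    Returns an (H, W) map scaled to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))                              # alpha_k: spatially averaged gradients
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # ReLU(sum_k alpha_k A^k)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The ReLU keeps only regions whose features increase the target-class score, which is why the salient (red) areas in the maps can be read as evidence for the predicted WBC type.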


Conclusion

In the present study, a novel deep learning method is developed to automatically and accurately differentiate WBCs. The proposed method learns better feature representations by integrating the advantages of ResNet and DenseNet. Moreover, it benefits from the guidance of the SCAM mechanism, which further enhances the representation ability of the model by emphasizing meaningful features in WBC images along two independent dimensions, space and channel, helping to tackle the issue of sample similarity. Because spatial attention and channel attention have different functions, different arrangements of them can yield different classification results. Considering that imbalanced or insufficient training data may harm the performance of deep learning models, we adopt data augmentation and TL, respectively. Furthermore, in addition to dropout, we use the mixup method to model the vicinity relation between training samples of different classes, forming a strong regularizer that further improves the generalization ability of the model. On the four WBC datasets, our method not only achieves superior overall classification performance but also performs well on each class of WBCs compared with other state-of-the-art methods. Finally, occlusion testing with the Grad-CAM algorithm visualizes the discriminative areas used by our model, thereby improving the explainability of the classification results.

Although our method achieves promising results, it has several limitations. First, the cross-entropy loss penalizes misclassified samples so as to separate the features of different categories, but it does not explicitly account for the differences among these samples. In future work, we plan to improve the loss function so that it simultaneously decreases intra-class variation and increases inter-class differences, further raising the representation power of our method. Second, the current classification covers only the five major subtypes of WBCs; classification into finer subtypes remains a challenge for future study.
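The two quantities such an improved loss would target can be made concrete with a small diagnostic. The sketch below is hypothetical and is not part of the authors' method: it assumes `features` is an (N, D) array of embeddings (for example, taken from the layer before the classifier) and `labels` holds the class of each row. It reports the mean distance of samples to their own class centroid (intra-class variation, which should shrink) and the smallest distance between any two class centroids (inter-class difference, which should grow). The helper name `class_separation` is illustrative.

```python
import numpy as np


def class_separation(features, labels):
    """Return (mean intra-class distance, min inter-centroid distance).

    features: array of shape (N, D); labels: array of shape (N,).
    Requires at least two distinct classes.
    """
    classes = np.unique(labels)  # sorted unique class ids
    if len(classes) < 2:
        raise ValueError("need at least two classes")
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Intra-class variation: average distance from each sample to its class centroid.
    idx = np.searchsorted(classes, labels)
    intra = np.linalg.norm(features - centroids[idx], axis=1).mean()
    # Inter-class difference: smallest distance between any two class centroids.
    dists = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    inter = dists[np.triu_indices(len(classes), k=1)].min()
    return float(intra), float(inter)
```

Plain cross-entropy can reach a low value without improving either number much, which is why losses that add an explicit compactness or margin term are a natural direction for the future work described above.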

Availability of data and materials

The four WBC datasets analysed during the current study are publicly available through the following links: LDWBC: LISC: BCCD: Raabin:

Abbreviations

White blood cell
Spatial and channel attention module
Transfer learning
Leukocyte images for segmentation and classification
Blood cell count and detection
Gradient-weighted class activation mapping
Support vector machine
Random forest
Convolutional neural network
Recurrent neural network
Spatial attention module
Channel attention module
Overall accuracy
Average precision
Average recall
Average F1-score
True positive
False positive
True negative
False negative
Channel and spatial attention module
Global average pooling
Global max pooling
The highest accuracy
Channel dimension
Spatial dimension
Two dimensions
Squeeze and excitation
Efficient channel attention
ResNet module
DenseNet module

References

  1. Almezhghwi K, Serte S. Improved classification of white blood cells with the generative adversarial network and deep convolutional neural network. Comput Intell Neurosci. 2020;2020:1–12.

  2. Siddique MAI, Aziz AZB, Matin A. An improved deep learning based classification of human white blood cell images. In: International conference on electrical and computer engineering (ICECE), 2020. p. 149–52.

  3. Khan MA, Qasim M, Lodhi HMJ, Nazir M, Javed K, Rubab S, Din A, Habib U. Automated design for recognition of blood cells diseases from hematopathology using classical features selection and ELM. Microsc Res Tech. 2021;84(2):202–16.

  4. Saade P, El Jammal R, El Hayek S, Abi Zeid J, Falou O, Azar D. Computer-aided detection of white blood cells using geometric features and color. In: Cairo international biomedical engineering conference (CIBEC), 2018. p. 142–5.

  5. Ghosh S, Majumder M, Kudeshia A. LeukoX: leukocyte classification using least entropy combiner (LEC) for ensemble learning. IEEE Trans Circuits Syst II Express Briefs. 2021;68(8):2977–81.

  6. Karthikeyan M, Venkatesan R. Interpolative Leishman-stained transformation invariant deep pattern classification for white blood cells. Soft Comput. 2020;24(16):12215–25.

  7. Özyurt F. A fused CNN model for WBC detection with mRMR feature selection and extreme learning machine. Soft Comput. 2020;24(11):8163–72.

  8. Baby D, Devaraj SJ, Hemanth J. Leukocyte classification based on feature selection using extra trees classifier: a transfer learning approach. Turk J Electr Eng Comput Sci. 2021;29:2742–57.

  9. Hegde RB, Prasad K, Hebbar H, Singh BMK. Feature extraction using traditional image processing and convolutional neural network methods to classify white blood cells: a study. Australas Phys Eng Sci Med. 2019;42(2):627–38.

  10. Wijesinghe CB, Wickramarachchi DN, Kalupahana IN, Lokesha R, Silva ID, Nanayakkara ND. Fully automated detection and classification of white blood cells. In: Annual international conference of the IEEE engineering in medicine & biology society (EMBC), 2020. p. 1816–9.

  11. Ryabchykov O, Ramoji A, Bocklitz T, Foerster M, Hagel S, Kroegel C, Bauer M, Neugebauer U, Popp J. Leukocyte subtypes classification by means of image processing. In: Federated conference on computer science and information systems (FedCSIS), 2016. p. 309–16.

  12. Chhabra G. Automated hematology analyzers: recent trends and applications. J Lab Phys. 2018;10(01):015–6.

  13. Bigorra L, Larriba I, Gutiérrez-Gallego R. Machine learning algorithms for the detection of spurious white blood cell differentials due to erythrocyte lysis resistance. J Clin Pathol. 2019;72(6):431–7.

  14. Dhieb N, Ghazzai H, Besbes H, Massoud Y. An automated blood cells counting and classification framework using Mask R-CNN deep learning model. In: International conference on microelectronics (ICM), 2019. p. 300–3.

  15. Kratz A, Lee S-h, Zini G, Riedl JA, Hur M, Machin S, International Council for Standardization in Haematology IC. Digital morphology analyzers in hematology: ICSH review and recommendations. Int J Lab Hematol. 2019;41(4):437–47.

  16. El Achi H, Khoury JD. Artificial intelligence and digital microscopy applications in diagnostic hematopathology. Cancers. 2020;12(4):797–811.

  17. Alqudah AM, Al-Ta’ani O, Al-Badarneh A. Automatic segmentation and classification of white blood cells in peripheral blood samples. J Eng Sci Tech Rev. 2018;11(6):7–13.

  18. Duan Y, Wang J, Hu M, Zhou M, Li Q, Sun L, Qiu S, Wang Y. Leukocyte classification based on spatial and spectral features of microscopic hyperspectral images. Opt Laser Technol. 2019;112(15):530–8.

  19. Sharma P, Sharma M, Gupta D, Mittal N. Detection of white blood cells using optimized QGWO. Intell Decis Technol. 2021;15(1):141–9.

  20. Dong N, Zhai M-d, Chang J-f, Wu C-h. A self-adaptive approach for white blood cell classification towards point-of-care testing. Appl Soft Comput. 2021;111:107709.

  21. Ridoy MAR, Islam MR. An automated approach to white blood cell classification using a lightweight convolutional neural network. In: International conference on advanced information and communication technology (ICAICT), 2020. p. 480–3.

  22. Mooney P. Blood cell images. 2018. Accessed 6 Jan 2022.

  23. H Mohamed E, H El-Behaidy W, Khoriba G, Li J. Improved white blood cells classification based on pre-trained deep learning models. J Commun Softw Syst. 2020;16(1):37–45.

  24. Habibzadeh M, Jannesari M, Rezaei Z, Baharvand H, Totonchi M. Automatic white blood cell classification using pre-trained deep learning models: ResNet and Inception. In: International conference on machine vision (ICMV), 2018. p. 274–81.

  25. Kutlu H, Avci E, Özyurt F. White blood cells detection and classification based on regional convolutional neural networks. Med Hypotheses. 2020;135:109472.

  26. Rezatofighi SH, Soltanian-Zadeh H. Automatic recognition of five types of white blood cells in peripheral blood. Comput Med Imaging Graph. 2011;35(4):333–43.

  27. Liang G, Hong H, Xie W, Zheng L. Combining convolutional neural network with recursive neural network for blood cell image classification. IEEE Access. 2018;6:36188–97.

  28. Toğaçar M, Ergen B, Cömert Z. Classification of white blood cells using deep features obtained from convolutional neural network models based on the combination of feature selection methods. Appl Soft Comput. 2020;97:106810.

  29. Chen Y, Li J, Xiao H, Jin X, Yan S, Feng J. Dual path networks. In: International conference on neural information processing systems (NIPS), 2017. p. 4470–8.

  30. Woo S, Park J, Lee J-Y, Kweon IS. CBAM: convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV), 2018. p. 3–19.

  31. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int J Comput Vis. 2019;128:336–59.

  32. Modabbernia MJ, Mirsafa AR, Modabbernia A, Pilehroodi F, Shirazi M. Catatonic syndrome associated with lead intoxication: a case report. Cases J. 2009;2(1):1–3.

  33. Rafiee MH. Evaluation of cytotoxic effect of zinc on Raji cell-line by MTT assay. Iran J Toxicol. 2011;4(4):390–6.

  34. Khan AM, Rajpoot N, Treanor D, Magee D. A nonlinear mapping approach to stain normalization in digital histopathology images using image-specific color deconvolution. IEEE Trans Biomed Eng. 2014;61(6):1729–38.

  35. Riddell C, Brigger P, Carson RE, Bacharach SL. The watershed algorithm: a method to segment noisy PET transmission images. IEEE Trans Nucl Sci. 1999;46(3):713–9.

  36. Zhang L, Lu L, Nogues I, Summers RM, Liu S, Yao J. DeepPap: deep convolutional networks for cervical cell classification. IEEE J Biomed Health Inform. 2017;21(6):1633–43.

  37. Kouzehkanan ZM, Saghari S, Tavakoli E, Rostami P, Abaszadeh M, Mirzadeh F, Satlsar ES, Gheidishahran M, Gorgi F, Mohammadi S, et al. Raabin-WBC: a large free access dataset of white blood cells from normal peripheral blood. bioRxiv. 2021.

  38. Larochelle H, Hinton GE. Learning to combine foveal glimpses with a third-order Boltzmann machine. In: International conference on neural information processing systems (NIPS), 2010. p. 1243–51.

  39. Tan C, Sun F, Kong T, Zhang W, Yang C, Liu C. A survey on deep transfer learning. In: International conference on artificial neural networks (ICANN), 2018. p. 270–9.

  40. Kim HE, Cosa-Linan A, Santhanam N, Jannesari M, Maros ME, Ganslandt T. Transfer learning for medical image classification: a literature review. BMC Med Imag. 2022;22(1):1–13.

  41. Pio G, Mignone P, Magazzù G, Zampieri G, Ceci M, Angione C. Integrating genome-scale metabolic modelling and transfer learning for human gene regulatory network reconstruction. Bioinformatics. 2022;38(2):487–93.

  42. Kakati T, Bhattacharyya DK, Kalita JK, Norden-Krichmar TM. DEGnext: classification of differentially expressed genes from RNA-seq data using a convolutional neural network with transfer learning. BMC Bioinform. 2022;23(1):1–18.

  43. Cengil E, Çınar A, Yıldırım M. A hybrid approach for efficient multi-classification of white blood cells based on transfer learning techniques and traditional machine learning methods. Concurrency Computat Pract Exper. 2022;34(6):1–14.

  44. Kandaswamy C, Silva LM, Alexandre LA, Santos JM. Deep transfer learning ensemble for classification. In: International work-conference on artificial neural networks (IWANN), 2015. p. 335–48.

  45. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Conference on computer vision and pattern recognition (CVPR), 2016. p. 770–8.

  46. Huang G, Liu Z, Maaten L, Weinberger KQ. Densely connected convolutional networks. In: Conference on computer vision and pattern recognition (CVPR), 2017. p. 2261–9.

  47. Lippeveld M, Knill C, Ladlow E, Fuller A, Michaelis LJ, Saeys Y, Filby A, Peralta D. Classification of human white blood cells using machine learning for stain-free imaging flow cytometry. Cytom Part A. 2020;97(3):308–19.

  48. Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. J Big Data. 2019;6(1):1–48.

  49. Zhang H, Cisse M, Dauphin YN, Lopez-Paz D. Mixup: beyond empirical risk minimization. In: International conference on learning representations (ICLR), 2018. p. 1–13.

  50. Liu L, Jiang H, He P, Chen W, Liu X, Gao J, Han J. On the variance of the adaptive learning rate and beyond. In: International conference on learning representations (ICLR), 2020. p. 1–14.

  51. Hu J, Shen L, Albanie S, Sun G, Wu E. Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell. 2020;42(8):2011–23.

  52. Chen H, Liu J, Hua C, Zuo Z, Feng J, Pang B, Xiao D. TransMixNet: an attention based double-branch model for white blood cell classification and its training with the fuzzified training data. In: IEEE international conference on bioinformatics and biomedicine (BIBM), 2021. p. 842–7.

  53. Wang Q, Wu B, Zhu P, Li P, Zuo W, Hu Q. ECA-Net: efficient channel attention for deep convolutional neural networks. In: IEEE/CVF conference on computer vision and pattern recognition (CVPR), 2020. p. 11531–9.

  54. Yu W, Chang J, Yang C, Zhang L, Shen H, Xia Y, Sha J. Automatic classification of leukocytes using deep neural network. In: International conference on ASIC (ASICON), 2017. p. 1041–4.

  55. Jiang M, Cheng L, Qin F, Du L, Zhang M. White blood cells classification with deep convolutional neural networks. Int J Pattern Recognit Artif Intell. 2018;32(09):1857006.

  56. Sharma M, Bhave A, Janghel RR. White blood cell classification using convolutional neural network. In: International conference on soft computing and signal processing (ICSCSP), 2019. p. 135–43.


Acknowledgements

Not applicable.

Funding

This work was supported by the Major Projects of Technological Innovation in Hubei Province [Grant Number 2019AEA170]; the Frontier Projects of Wuhan for Application Foundation [Grant Number 2019010701011381]; and the Translational Medicine and Interdisciplinary Research Joint Fund of Zhongnan Hospital of Wuhan University [Grant Number ZNJC201919].

Author information




HC, JL, and CH conceived and designed the study. HC and CH executed the experiments. JF processed and analyzed the data. BP, DC, and CL collected the data. HC and JL wrote this paper. JL provided guidance and made critical revisions. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Juan Liu.

Ethics declarations

Ethics approval and consent to participate

All procedures performed in this study were in accordance with the tenets of the Declaration of Helsinki and were approved by the ethics committee of the Zhongnan Hospital of Wuhan University. Informed consent was obtained from all participants.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Chen, H., Liu, J., Hua, C. et al. Accurate classification of white blood cells by coupling pre-trained ResNet and DenseNet with SCAM mechanism. BMC Bioinformatics 23, 282 (2022).


  • Received:

  • Accepted:

  • Published:

  • DOI:

Keywords

  • Deep learning
  • Spatial and channel attention
  • Transfer learning
  • Mixup
  • White blood cells classification