CAD systems for COVID-19 diagnosis and disease stage classification by segmentation of infected regions from CT images

Alshayeji, Mohammad H.; ChandraBhasi Sindhu, Silpa; Abed, Sa’ed

doi:10.1186/s12859-022-04818-4

Research
Open access
Published: 06 July 2022

BMC Bioinformatics volume 23, Article number: 264 (2022)

Abstract

Background

Here, we propose a computer-aided diagnosis (CAD) system to differentiate COVID-19 (the coronavirus disease of 2019) patients from normal cases, as well as to perform infection region segmentation along with infection severity estimation using computed tomography (CT) images. The developed system facilitates timely administration of appropriate treatment by identifying the disease stage without reliance on medical professionals. To the best of our knowledge, the developed model is the most accurate fully automatic, real-time COVID-19 CAD framework reported so far.

Results

The CT image dataset of COVID-19 and non-COVID-19 individuals was subjected to conventional machine learning (ML) stages to perform binary classification. In the feature extraction stage, the SIFT, SURF, and ORB image descriptors and the bag of features technique were implemented to differentiate chest CT regions affected by COVID-19 from normal cases. To the best of our knowledge, this is the first work introducing this concept for COVID-19 diagnosis. The diverse database and the selected features, which are invariant to scale, rotation, distortion, noise, etc., make this framework applicable in real time. Also, this fully automatic approach is faster than existing models, which helps to incorporate it into CAD systems. The severity score was measured based on the infected regions along the lung field. Infected regions were segmented through a three-class semantic segmentation of the lung CT image. Using the severity score, the disease stages were classified as mild if the lesion area covers less than 25% of the lung area, moderate if it covers 25–50%, and severe if it covers more than 50%. Our proposed model resulted in a classification accuracy of 99.7% with a probabilistic neural network (PNN) classifier, along with an area under the curve (AUC) of 0.9988, 99.6% sensitivity, 99.9% specificity and a misclassification rate of 0.0027. The developed infected region segmentation model gave 99.47% global accuracy, 94.04% mean accuracy, 0.8968 mean IoU (intersection over union), 0.9899 weighted IoU, and a mean Boundary F1 (BF) contour matching score of 0.9453, using DeepLabv3+ with its weights initialized using ResNet-50.

Conclusions

The developed CAD system is able to perform fully automatic and accurate diagnosis of COVID-19 along with infected region extraction and disease stage identification. The ORB image descriptor with the bag of features technique and the PNN classifier achieved the best classification performance.

Background

The lung is a respiratory organ that is vulnerable to airborne injuries and contamination. According to recent World Health Organization (WHO) reports [1], lung disease is the third most common cause of death, with about three million people dying per year. Even though smoking, genetics, and air pollution are among the causes of lung diseases, the main current reason for the huge rise in lung disease is infection by bacteria or viruses. Acute lower respiratory tract infections appear to be a primary cause of death and illness in both children and adults. Bacterial or viral lung infections affect lung function and may even lead to death if not treated in time. SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a recent human pathogenic virus that causes severe lung disease and has affected several million people worldwide [2]. The disease was first reported in China’s Hubei Province in or before December 2019 and subsequently spread internationally. Symptoms appear within two to fourteen days after the virus comes into contact with the mucous membranes that line a person's mouth, nose and eyes. By entering healthy body cells, the virus kick-starts the production of more infected cells; it soon multiplies and infects other cells. Viral proteins use angiotensin-converting enzyme 2 (ACE2) receptors to gain entrance to healthy cells, which they then take over and eventually destroy. As there are more ACE2 receptors in the lower airways than in other places, COVID-19 is more likely to travel deeper into the respiratory tract. The virus affects either the upper or the lower part of the human respiratory tract, and the immune system tries to fight the infection as it passes through the tract. Infection caused by the virus results in lung and airway swelling and inflammation. The infection usually starts in one part of the lung and may then spread further.

COVID-19 causes lung complications, such as pneumonia, resulting in shortness of breath caused by fluid build-up in the lungs. Furthermore, lung inflammation inhibits the ability to absorb oxygen. Based on the level of infection within the lungs, the disease can be mild, moderate, or severe. Because of the huge daily rise in patient counts, hospitals are struggling to provide treatment that keeps up with admission rates. Chest X-ray (CXR) and CT scans are the medical imaging tools employed to diagnose COVID-19. These imaging modalities are also highly helpful in the early diagnosis of lung diseases. Samples taken from the nose and throat are used to determine whether COVID-19 is present through the real-time reverse transcription-polymerase chain reaction (RT-PCR) test. Some studies, however, have reported that RT-PCR is less sensitive during the initial disease stages and that 24 h are required for a result to be confirmed. Physicians can obtain a more detailed picture of the disease from a CT scan than from conventional X-rays. Moreover, a CT scan can identify the exact problem location more precisely [3]. The common CT findings in COVID-19 patients are ground-glass opacities (GGOs), peripheral distribution, multilobar involvement, bilateral lesion involvement, and posterior lesion topography [4]. Most commonly, GGOs are seen with crazy-paving patterns and nodular or rounded features. The lower lobes of the lungs are the most affected by this pneumonia and, in most cases, GGO findings are visible even in the initial disease stages. Specifically, CT images give clearer and more detailed information about the lung region than CXR.

CT was selected as the imaging modality for model development since it can identify the exact problem location more precisely. Although simple conventional segmentation methods exist, they are not effective for our purpose because of the diversity of the database. Recently, deep neural networks have attracted many researchers, who have used different deep learning (DL) models for semantic segmentation. Here, we performed a two-class segmentation (background and lungs) in the diagnosis stage for region of interest (ROI) extraction, since this improves computational efficiency in the later stages. Once a case was diagnosed as COVID-19, the infected regions were segmented using a three-class segmentation (i.e., background, lungs, and COVID-19-infected regions). This semantic segmentation was achieved using DeepLabv3+, a recent, fast and computationally efficient semantic segmentation network developed by Google [5]. In the feature extraction stage, the scale-invariant feature transform (SIFT), oriented FAST and rotated BRIEF (ORB) and speeded-up robust features (SURF) techniques [6] were implemented to ensure that the model was invariant to scale, rotation, distortion, noise, etc. Because classical ML involves direct feature engineering, these algorithms are quite easy to interpret and understand. Hence, we settled on conventional ML classifiers for the proposed model.
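
The bag of features idea mentioned above can be illustrated with a minimal Python sketch. It assumes that local descriptors (for example from SIFT, SURF, or ORB) have already been extracted as numeric vectors and that a visual vocabulary is already available; the function names, the toy two-dimensional descriptors, and the use of Euclidean distance are illustrative assumptions and not the authors' implementation.

```python
import math

def nearest_word(descriptor, vocabulary):
    """Return the index of the visual word closest to the descriptor (Euclidean)."""
    distances = [math.dist(descriptor, word) for word in vocabulary]
    return distances.index(min(distances))

def bag_of_features(descriptors, vocabulary):
    """Encode one image as a normalized histogram of visual-word counts."""
    histogram = [0.0] * len(vocabulary)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, vocabulary)] += 1.0
    total = sum(histogram)
    # An image without descriptors keeps an all-zero histogram.
    return [h / total for h in histogram] if total else histogram

# Hypothetical toy data: real descriptors are much higher-dimensional,
# and a vocabulary is usually learned by clustering training descriptors.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_descriptors = [(0.1, 0.2), (0.9, 1.1), (1.2, 0.8), (4.8, 5.1)]
feature_vector = bag_of_features(image_descriptors, vocabulary)
print(feature_vector)
```

The resulting fixed-length histogram is what a conventional classifier would receive as input, regardless of how many keypoints were detected in the image.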

The complete workflow is illustrated in Fig. 1. The major contributions of this paper are outlined below.

1. We developed an automatic CAD system able to perform COVID-19 diagnosis from lung CT images using conventional ML steps. We also implemented the SIFT, SURF, and ORB image descriptors together with the bag of features technique in the feature extraction stage of the CAD system. Applying these image descriptors helps to differentiate accurately between COVID-19-affected and normal lung CT images, and training the model with this information helps to achieve high performance. To the best of our knowledge, this is the first work utilizing this feature extraction technique for COVID-19 diagnosis.
2. We developed a DL semantic segmentation method that segments and visualizes the infected regions in lung CT scans of COVID-19 patients.
3. We implemented a severity score evaluation in the developed CAD system, allowing infection stages to be identified without the need for medical professionals and appropriate medical assistance to be given without delay at the time of hospital admission.
4. We used a diverse dataset with a large number of CT images to achieve a model applicable in real time. Moreover, we experimented with different networks in the DL model and employed transfer learning, grid search (GS), and cross-validation.
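
The severity staging in contribution 3 can be sketched as follows, using the thresholds stated in the abstract (mild below 25%, moderate 25–50%, severe above 50% of the lung area). The function names are hypothetical, and the sketch assumes that the lung area counts both healthy and infected lung pixels, which the text does not specify.

```python
def lesion_percentage(lesion_pixels, lung_pixels):
    """Percentage of the lung field covered by lesions (hypothetical helper).

    Assumption: lung_pixels is the whole lung field, infected regions included.
    """
    if lung_pixels <= 0:
        raise ValueError("lung area must be positive")
    return 100.0 * lesion_pixels / lung_pixels

def disease_stage(severity_score):
    """Map a severity score in percent to a disease stage."""
    if severity_score < 25.0:
        return "mild"
    if severity_score <= 50.0:
        return "moderate"
    return "severe"

# Example: 1,200 lesion pixels inside a lung field of 4,000 pixels (30%).
print(disease_stage(lesion_percentage(1200, 4000)))
```

Both boundary values, 25% and 50%, fall into the moderate stage here, following the wording "25–50%".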

Related works

From the very beginning of the COVID-19 pandemic, researchers from different areas started working on diagnosis applications to produce findings that would aid automatic diagnosis systems.

DL classification approaches

The majority of COVID-19 diagnosis research works were based purely on DL networks. Silva et al. [7] employed a high-quality DL model for COVID-19 diagnosis with EfficientNet by implementing a voting-based approach and a cross-dataset study, using the two largest publicly available datasets. The major limitations in the use of CT scan images were that slices from the same patient were treated independently and that images from the same patient could appear in both the training and test sets. In their study, this issue was addressed by the voting-based approach. The voting scheme considered all CT images of a given patient rather than a single CT image; hence, it gave a high success rate. Jin et al. [8] developed and deployed a COVID-19 diagnosis system in four weeks, using the limited CT image dataset that was available in the initial stage of the COVID-19 pandemic. In this work, the authors performed 3D segmentation and classification as key stages using 3DUnet++-ResNet-50. Later, in the research by Santosh et al. [9], a type of active learning was used in which the learner had some role in deciding which data were used for training; hence, it was a kind of self-learning. This kind of incremental learning helps the model adapt to a new kind of dataset without losing knowledge of an existing one. Furthermore, an anomaly detection technique was employed to assess changes in the data.
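
A patient-level voting scheme of the kind described for [7] can be sketched in a few lines of Python. This is a simple majority vote over per-slice predictions; the exact scheme used in [7] may differ, and the function name and example labels are hypothetical.

```python
from collections import Counter, defaultdict

def patient_level_vote(slice_predictions):
    """Aggregate per-slice labels into one label per patient by majority vote.

    slice_predictions: iterable of (patient_id, predicted_label) pairs.
    """
    votes = defaultdict(Counter)
    for patient_id, label in slice_predictions:
        votes[patient_id][label] += 1
    # most_common(1) returns the label with the highest count for each patient.
    return {pid: counts.most_common(1)[0][0] for pid, counts in votes.items()}

# Hypothetical slice-level predictions for two patients.
predictions = [
    ("p1", "COVID-19"), ("p1", "COVID-19"), ("p1", "normal"),
    ("p2", "normal"), ("p2", "normal"),
]
patient_labels = patient_level_vote(predictions)
print(patient_labels)
```

Splitting data by patient before training, and voting per patient at test time, keeps slices of one patient from being counted as independent evidence.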

In [10], DL was utilized to train on X-ray and CT-scan images individually. Upgraded VGG16 deep transfer learning models were used to perform COVID-19 classification. For binary classification of COVID-19 CT-scan images, the authors employed four pre-trained convolutional neural network (CNN) models (VGG16, DenseNet121, ResNet50, and ResNet152) and suggested the Fast.AI ResNet framework for detecting COVID-19 in CT-scan images, with a high accuracy of 99%. However, owing to limitations of the metadata, they were unable to incorporate a disease severity identification module into their framework. COVID-Net, a novel deep neural network architecture tailored to the detection of COVID-19 cases from CXR images and designed using a human–machine collaborative design strategy, was implemented in [11]. The predictions of COVID-Net were examined with an explainability method in an attempt to gain deeper insight into the critical factors associated with COVID cases, which can assist clinicians in improved screening and promote trust and transparency when COVID-Net is employed for accelerated computer-aided screening. This approach achieved a 98.9% positive predictive value (PPV) but did not predict the risk status.

The primary goal of Kassania et al.’s [12] work was to implement a generic feature extraction method using a CNN to eliminate the handcrafted and complex features needed for imaging modalities, as well as to reduce generalization error and increase diagnosis accuracy. In this study, they employed 15 different CNN feature extractors and 6 ML classifiers to distinguish COVID-19 from normal cases, using X-ray and CT scan images. Since they lacked sufficiently large training data to develop the model from scratch, they used transfer learning, which also reduced the problem of overfitting. To achieve better generalization, they also avoided data augmentation and extensive pre-processing. The authors state that avoiding extensive pre-processing helps to make the model more robust to noise, artifacts and variations in input images during the feature extraction phase, and that avoiding data augmentation reduces bias in the model performance. They concluded that combinations of a deep CNN and a bagging tree classifier give better classification performance.

All these reviewed models relied completely on DL networks to make COVID-19 diagnosis decisions. Since these networks act like black boxes, it is not possible to identify the criteria on which a network based its decisions. In the DL approach applied by Gozes et al. [13], abnormalities were visualized using the Grad-CAM technique by extracting network activations, since these indicate the area responsible for a DL network’s decision. Similarly, Grad-CAM visualization was used in [14], where transfer learning was implemented to test for COVID-19 using CT images and the effects of various starting parameters on the results were analyzed. The authors demonstrated that the model pre-trained on ImageNet21k has strong generalizability on CT images; it achieved an accuracy of 99.2%.

Classical ML approaches

Only a few research works on COVID-19 diagnosis have used a classical ML approach, in which handcrafted features come into play. Al-Karawi et al. [15] proposed an ML approach to identify COVID-19 patients, using texture analysis in the feature extraction stage by employing a fast Fourier transform (FFT) Gabor scheme. They achieved an average accuracy of 95.37%, along with very few false negatives. They were also able to give visual evidence by displaying the final features on which the prediction decision was based. In [16], Barstugan et al. used the grey-level co-occurrence matrix (GLCM), grey-level run length matrix (GLRLM), grey-level size zone matrix (GLSZM), local directional pattern (LDP) and discrete wavelet transform (DWT) algorithms as feature extraction methods. Abd Elaziz et al. [17] utilized orthogonal moment feature properties and feature selection techniques. Features were extracted using new fractional multichannel exponent moments (FrMEMs), and a new feature selection method was obtained by improving manta ray foraging optimization (MRFO) using differential evolution (DE). Patel et al. [18] used features such as clinical, blood-panel profile and socio-demographic data for severity identification and stated that the ML model with random forest (RF) gives the most accurate prediction of critical illness and the need for mechanical ventilation. The authors of [19] used clinical information along with CT images, including leukocyte count, absolute lymphocyte count, and the percentages of neutrophils and lymphocytes. In the classification stage, they used support vector machine (SVM), multilayer perceptron (MLP) and RF classifiers, of which the MLP performed well. The final model combined radiological and clinical information.

Lung infection segmentation approaches

None of the works reviewed above performs separate COVID-19 infection region extraction after COVID-19 classification. This step is important for helping clinicians make vital decisions in a timely manner. The authors of [20] presented CoSinGAN, a new conditional generative model that can be learned from a single radiological image with a given condition, such as the annotation mask of the lungs and infected regions. CoSinGAN can capture the conditional distribution of a single radiological image and synthesize high-resolution and diversified radiological images that closely fit the input conditions. Higher segmentation performance was achieved using 2D and 3D U-Net. The drawback of this work is that the structural masks of the lungs and diseased regions must still be drawn by hand.

Deng et al. [21] developed a lung infection segmentation network called “Inf-Net”. Infected region extraction usually faces problems such as variation in the appearance of infections and low contrast between infected and normal regions. In Inf-Net, a parallel partial decoder generates a global map by aggregating high-level features. Explicit edge attention and implicit reverse attention are used to model boundaries and improve representations. The development of a semi-supervised segmentation framework named “Semi Inf-Net” alleviated the limited availability of CT images with segmentation annotations. For COVID-19 infection segmentation on CT images, a domain adaptation based self-correction model (DASC-Net) was proposed in [22]; it consists of a novel attention and feature domain enhanced domain adaptation model (AFD-DA) to handle domain shifts and a self-correction learning process to refine segmentation results. An image-level activation feature extractor with a focus on lung anomalies and a multilevel discrimination module for hierarchical feature domain alignment are among the new features in AFD-DA. Even though this model outperformed “Semi Inf-Net”, it has the limitation that it assumes all of the source data samples are annotated. In practice, the number of well-annotated data samples is restricted, and the performance of domain adaptation (DA) approaches can suffer significantly when there are fewer labeled examples.

Severity prediction approaches

The majority of works focused only on distinguishing COVID-19 from normal CT images. However, once the disease has been identified, it is equally important to predict its severity level. Mahdavi et al. [23] utilized patients’ clinical, laboratory, and demographic features at the time of hospital admission to predict mortality prognosis, as these data can reduce the mortality rate by helping to prioritize appropriate treatments. They implemented three ML models, using an SVM framework with three groups of input data. The first group of input data included demographic and clinical features, the second group comprised the laboratory features, and the third group combined both inputs. The criteria used for severity classification were saturation of peripheral oxygen ($\mathrm{SpO}_2$) and respiratory rate (RR). An $\mathrm{SpO}_2$ of less than 90 and an RR greater than or equal to 30 were categorized as severe cases. Moreover, the authors stated that non-invasive (clinical and demographic) features are able to give a better prediction of mortality even when there are fewer of them.
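
The severity criterion attributed to [23] can be written as a short rule. The sketch below is only an illustration: the text does not state whether both conditions must hold or whether either one suffices, so this is exposed as a hypothetical `require_both` parameter, and the function name is likewise an assumption.

```python
def is_severe(spo2, respiratory_rate, require_both=True):
    """Flag a case as severe from SpO2 and respiratory rate (RR).

    Thresholds follow the text: SpO2 below 90, RR of 30 or more.
    require_both is an assumption: True reads the criterion as "and",
    False reads it as "or".
    """
    low_oxygen = spo2 < 90
    fast_breathing = respiratory_rate >= 30
    if require_both:
        return low_oxygen and fast_breathing
    return low_oxygen or fast_breathing

print(is_severe(88, 32))                      # both criteria met
print(is_severe(88, 20))                      # only low SpO2
print(is_severe(88, 20, require_both=False))  # "or" reading
```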

In [24], the authors collected data from 641 patients and developed a model that calculates a risk score to predict intensive care unit (ICU) admission and mortality. The authors also identified the key clinical features to be considered for ICU admission and mortality prediction. A reduced lymphocyte count was among the top predictors of ICU admission, as was a history of smoking. The authors also validated the developed risk-score model on different internal datasets. Su et al. [25] used another dataset of 93 mild and 32 severe cases of COVID-19 to develop a model predicting progression to severe symptoms. The model achieved 94.1% sensitivity, 90.2% specificity, and an area under the ROC curve (AUC) of 94.4%. Although the authors found that 17 features could be used to distinguish between mild and severe cases, they identified only four of them as independent and as playing a key role in severity prediction: the C-reactive protein (CRP) test, RR, comorbidities and lactate dehydrogenase (LDH). In [26], we observed that CT scores were calculated manually by evaluating the lobar involvement in chest CT, incorporating different clinical and laboratory features. However, these works employed clinical measures to obtain the risk score, which required human intervention. After reviewing these works, we decided to develop an automatic severity prediction model along with COVID-19 diagnosis.

After reviewing these works, we chose CT images over X-rays to develop the framework, since CT images clearly show the majority of COVID-19 infection findings even in the early stages. Many of the developed models faced generalization issues due to dataset limitations. Hence, we preferred the largest publicly available dataset of COVID-19 CT images, which was collected from different cohorts, so that the model could be incorporated into real-time CAD applications. However, conventional segmentation approaches do not work well because of the diversity of the database, so we opted for semantic segmentation using DL. The majority of the reviewed works used either DL features or features that vary with scale and space in their model development. Hence, we decided to develop our model using local features that are invariant to scale, space, distortion, and noise, to make use of COVID-19-related findings from each CT image irrespective of the diversity of the database. Because classical ML involves direct feature engineering, these algorithms are quite easy to interpret and understand; hence, we settled on classical ML approaches for classification. To fill the research gap in COVID-19 diagnosis applications, an infection segmentation model together with automatic disease severity prediction was needed. To obtain a precise infection segmentation model even under real-time conditions, the DL semantic segmentation concept was implemented. In total, the framework provides a complete, automatic, real-time COVID-19 CAD model along with infection extraction, severity score prediction and disease stage identification.

Methods

This section describes the materials, processes and methodologies used to create the COVID-19 classification and infection segmentation architecture.

Description of materials

CT-scan database

In this research, we used datasets from the China National Centre for Bioinformation [27], which provides a large CT image dataset. In this dataset, COVID-19 is referred to as novel coronavirus pneumonia (NCP). The images in this collection were compiled from the China Consortium of Chest CT Image Investigation (CC-CCII) cohorts. The metadata provided include patient ID, scan ID, liver function, lung function, age, sex, critical illness and time of progression. The CT images and metadata were acquired at the time of the patients' hospital admission. Across the entire dataset, CT images vary in size from 256 × 256 to 2592 × 2592 and are in “jpg” and “png” formats. In addition to the complete set of CT images from the different categories, the dataset provides information on 55,692 CT images with lesions belonging to both NCP and common pneumonia (CP) in a csv file named “lesions_slices.csv”. Moreover, it provides a set of 750 CT images obtained from 150 patients with manual pixel annotations by radiologists, provided by another study [28] that used the same dataset. In the pixel-labelled images, the pixels are annotated as zero for background, one for lung field, two for GGOs, and three for consolidation (CL). The complete details of the abovementioned dataset are given in Table 1.
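
To connect the four annotation labels to the three-class segmentation described earlier (background, lungs, infected regions), one possible preprocessing step is sketched below. Merging GGO and CL into a single infection class is an assumption about how the labels were combined, and the function names and the toy mask are hypothetical.

```python
def to_three_class(mask):
    """Collapse labels 0 (background), 1 (lung), 2 (GGO), 3 (CL)
    into 0 (background), 1 (lung), 2 (infection).

    Assumption: GGO and CL are both treated as infected regions.
    """
    remap = {0: 0, 1: 1, 2: 2, 3: 2}
    return [[remap[pixel] for pixel in row] for row in mask]

def class_pixel_counts(mask, num_classes=3):
    """Count how many pixels carry each class label."""
    counts = [0] * num_classes
    for row in mask:
        for pixel in row:
            counts[pixel] += 1
    return counts

# Hypothetical 3 x 4 pixel-labelled mask.
mask = [
    [0, 1, 1, 0],
    [1, 2, 3, 1],
    [0, 1, 3, 0],
]
three_class = to_three_class(mask)
background, lung, infection = class_pixel_counts(three_class)
print(three_class, background, lung, infection)
```

Per-class pixel counts of this kind are also the raw quantities from which a lesion-to-lung area ratio could be derived.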

Table 1 CT image details of complete database for two classes
