Year: 2022 | Volume: 5 | Issue: 1 | Page: 19-25
Deep learning-based COVID-19 triage tool: An observational study on an X-ray dataset
Abhishek Mahajan1, Vivek Pawar2, Vivek Punia2, Aakash Vaswani2, Piyush Gupta2, KS S. Bharadwaj2, Arvind Salunke3, Sujit D Palande3, Kalashree Banderkar4, M L V. Apparao2
1 Department of Radiodiagnosis, Tata Memorial Hospital, Homi Bhabha National Institute, Mumbai, Maharashtra, India
2 Endimension Technology Pvt. Ltd., Thane, Maharashtra, India
3 Kaushalya Medical Foundation, Thane, Maharashtra, India
4 Jupiter Hospital, Thane, Maharashtra, India
Date of Submission: 15-Jul-2021
Date of Decision: 30-Jan-2022
Date of Acceptance: 02-Feb-2022
Date of Web Publication: 31-Mar-2022
Correspondence Address: Fellowship in Cancer Imaging, MRes (KCL, London), FRCR (UK), Consultant Radiologist, The Clatterbridge Cancer Centre NHS Foundation Trust, Pembroke Place, Liverpool, L7 8YA
Source of Support: None, Conflict of Interest: None
Background: Easy availability, low cost, and low radiation exposure make chest radiography an ideal modality for coronavirus disease 2019 (COVID-19) detection.
Objectives: In this study, we propose the use of an artificial intelligence (AI) algorithm to automatically detect abnormalities associated with COVID-19 on chest radiographs. We aimed to evaluate the performance of the algorithm against the interpretation of radiologists to assess its utility as a COVID-19 triage tool.
Materials and Methods: The study was conducted in collaboration with Kaushalya Medical Trust Foundation Hospital, Thane, Maharashtra, between July and August 2020. We used a collection of public and private datasets to train our AI models. Specificity and sensitivity measures were used to assess the performance of the AI algorithm by comparing AI and radiology predictions using the result of the reverse transcriptase-polymerase chain reaction as reference. We also compared the existing open-source AI algorithms with our method using our private dataset to ascertain the reliability of our algorithm.
Results: We evaluated 611 scans for semantic and non-semantic features. Our algorithm showed a sensitivity of 77.7% and a specificity of 75.4%, performing slightly better than the radiologists, who showed a sensitivity of 75.9% and a specificity of 75.4%. The open-source model evaluated on the same dataset showed a large disparity in performance measures, with a specificity of 46.5% and a sensitivity of 91.8%, confirming the reliability of our approach.
Conclusion: Our AI algorithm can aid radiologists in confirming the findings of COVID-19 pneumonia on chest radiography and identifying additional abnormalities and can be used as an assistive and complementary first-line COVID-19 triage tool.
Keywords: Artificial intelligence, assistive technology, coronavirus disease 2019, deep learning, radiology, triage, X-ray
How to cite this article:
Mahajan A, Pawar V, Punia V, Vaswani A, Gupta P, S. Bharadwaj K S, Salunke A, Palande SD, Banderkar K, Apparao M L. Deep learning-based COVID-19 triage tool: An observational study on an X-ray dataset. Cancer Res Stat Treat 2022;5:19-25
How to cite this URL:
Mahajan A, Pawar V, Punia V, Vaswani A, Gupta P, S. Bharadwaj K S, Salunke A, Palande SD, Banderkar K, Apparao M L. Deep learning-based COVID-19 triage tool: An observational study on an X-ray dataset. Cancer Res Stat Treat [serial online] 2022 [cited 2022 May 28];5:19-25. Available from: https://www.crstonline.com/text.asp?2022/5/1/19/341236
Introduction
The first case of coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was reported from Wuhan city in the Hubei province of China. Since then, it has rapidly spread across the world. The COVID-19 outbreak was declared a pandemic by the World Health Organization (WHO) on March 11, 2020, and at the time of writing this article, there were over 73 million COVID-19 cases and 1.6 million deaths reported globally. The high communicability of the disease, the need for intensive care unit (ICU) admission upon infection, and dependence on mechanical ventilation make a timely diagnosis of COVID-19 imperative to reduce the stress on the healthcare system. Reverse transcriptase-polymerase chain reaction (RT-PCR) is currently the gold standard for the diagnosis of COVID-19; however, some studies have shown that its sensitivity may not be high enough to be relied upon solely. The sensitivity can be as low as 30%, depending on factors such as sample collection and transportation, RT-PCR kit performance, and the protocols used. Shortages of RT-PCR kits are often reported and can further delay test results.
The current literature in radiology focuses primarily on computed tomography (CT) imaging for the diagnosis of COVID-19. A few countries, such as China, have CT suites dedicated to imaging suspected COVID-19 patients as a first-line investigation. This practice is not only nearly impossible to implement globally but can also disrupt the availability of radiological services because of the frequent decontamination the CT suites require. Thus, the American College of Radiology has recommended the use of portable X-rays to minimize the risk of cross-infection, and several countries use chest radiography as the first-line triage tool. Its widespread availability, low cost, and low radiation exposure make chest radiography an ideal modality for first-line investigation, for mass screening of the general population and of contacts who may be asymptomatic, and for assessing the need for additional investigations such as CT.
Artificial intelligence (AI) algorithms excel at automatically recognizing complex patterns in imaging data and providing quantitative, rather than qualitative, assessments of radiographic characteristics. Therefore, in this study, we proposed a method of using an AI algorithm to automatically detect abnormalities associated with COVID-19 on chest radiographs and evaluated the performance of the algorithm against the interpretation of radiologists. AI algorithms are highly dependent on data to produce viable results; hence, we also included data from a collaborating hospital.
Materials and Methods
General study details
The study was conducted in collaboration with Kaushalya Medical Trust Foundation Hospital, Thane, Maharashtra between July and August 2020. The study was approved by the Institutional Ethics Committee of the Kaushalya Medical Trust Foundation Hospital, Thane, Maharashtra. The need for obtaining informed consent was waived, and data collection and storage were performed in accordance with the local guidelines. Data were anonymized to keep the personal information of patients confidential. The study was performed in accordance with the ethical guidelines outlined in the Declaration of Helsinki, Good Clinical Practice guidelines, and the Indian Council of Medical Research guidelines. The study was not registered in a public clinical trials registry. No funding was used for the purpose of this study.
We used standard specificity and sensitivity measures to assess the performance of the AI algorithm by comparing the AI and radiology predictions using RT-PCR as the conventional gold standard. We further compared existing open-source AI algorithms with our method on our private dataset to test its reliability.
We used the standard procedure for the evaluation of AI-based systems in this study. We used a collection of public and private datasets to train our AI models. At each stage of the AI pipeline, different datasets and types of supervision were used. Details of the datasets, type of supervision, and loss function used in the pipeline are elucidated below. We used the specificity and sensitivity measures to estimate the performance of the overall pipeline.
We developed the AI algorithm using a set of open-source and private datasets. We used a three-stage pipeline to process the images [Figure 1]. The first stage identifies ground-glass opacities (GGOs) and consolidation in chest radiographs using a deep convolutional neural network (DCNN). Using the rough identification of GGOs and consolidation obtained in the first stage, the second stage confirms whether the scan is of the chest region and removes predictions for regions outside the chest. The second-stage network is a variant of U-Net, a state-of-the-art deep convolutional network. The third stage uses the same deep convolutional network architecture as the first stage but is trained on a different dataset; it takes the predictions from the second-stage network and confirms or rejects the GGO and consolidation predictions based on COVID-19- and non-COVID-19-related criteria.
Figure 1: The artificial intelligence pipeline used for the study, with a different deep neural network for each of the three stages. The first-stage network makes rough predictions of consolidation and ground-glass opacities; the second-stage network removes predictions outside the chest area; the third-stage network filters the predictions based on confidence scores and relevance to coronavirus disease 2019 infection findings
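The three-stage flow described above can be sketched as a simple orchestration function. The stage callables here are hypothetical placeholders standing in for the trained networks, not the authors' actual models:

```python
def run_pipeline(image, stage1, stage2, stage3):
    """Sketch of the three-stage triage flow.

    stage1(image)        -> list of candidate GGO/consolidation boxes
    stage2(image, box)   -> True if the box lies inside the chest region
    stage3(image, boxes) -> boxes confirmed as COVID-19-related
    """
    candidates = stage1(image)          # rough GGO/consolidation detection
    if not candidates:                  # no findings: nothing to confirm
        return []
    in_chest = [b for b in candidates if stage2(image, b)]  # drop non-chest hits
    return stage3(image, in_chest)     # confirm or reject the remaining boxes
```

An empty first-stage output short-circuits the pipeline, mirroring the paper's rule that the third-stage network is only invoked when candidate boxes exist.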
The RT-PCR reports provided by collaborating hospitals were used as reference labels for the study. We performed a multi-reader analysis where the findings were evaluated against the assessment provided by two consultant radiologists. The senior radiologist had 12 years of experience and the junior radiologist had 6 years of experience. Both readers assessed the images independently and were blinded to the opinions of the other reader as well as the clinical information and RT-PCR status.
Deep learning models require vast amounts of data to learn to identify patterns. Since the beginning of the COVID-19 pandemic, several public datasets have been released by various sources to tackle the challenges associated with the detection of COVID-19 symptoms on chest radiographs. Public datasets play a huge role in the development of AI algorithms, but there are no Indian patient datasets that are publicly available. In order to perform an India-specific study, we collaborated with one hospital (Kaushalya Medical Trust) in India. The collaborating hospital provided COVID-19 positive/negative labels for the data, based on the interpretation of two radiologists. Available datasets were used to train models at different stages of the AI pipeline.
Data were collected from Kaushalya Medical Trust Foundation Hospital, Thane, Maharashtra. The dataset comprised 2661 scans from 2661 patients, which were anonymized for confidentiality. These data were collected on a daily basis during the study period and annotated by in-house radiologists to tune the models for Indian patients. Chest radiography data were collected for patients within 3 days of the RT-PCR test. In addition to the RT-PCR results, each scan was assigned a "COVID-19"/"non-COVID-19" label. The data were split into training (80%) and validation (20%) sets for network training.
Kaggle Pneumonia Challenge dataset
Data were collected from the Kaggle RSNA Pneumonia Detection Challenge hosted by the Radiological Society of North America (RSNA). This dataset comprised 16,447 scans. A team of in-house radiologists further modified the data by updating the scan labels ("COVID-19"/"non-COVID-19") and the GGO/consolidation markings in the dataset. It was used by the first- and third-stage networks of the AI pipeline for training and validation.
Segmentation dataset for second stage
This dataset comprised approximately 3000 scans from Kaushalya Medical Trust Foundation Hospital, Thane, Maharashtra, as well as the Kaggle RSNA Pneumonia Detection Challenge dataset. The scans were added to the dataset across multiple iterations. In each iteration, we trained the AI to segment the lungs from the chest radiograph and checked the performance of the model on a held-out subset of the training data. The radiologists relabeled the scans that the AI could not accurately segment, and these scans were used as training data in the next iteration. At the beginning of training, we used 1000 scans to train the AI and 500 to evaluate its performance. The labeling process continued till we exhausted all the training data; in the final iteration, the AI was evaluated on the second-stage evaluation data. This helped us track the AI's mistakes and the progress of its performance at a more granular level. The dataset was labeled by radiologists to segment six different regions of the chest (lungs, clavicle, central bronchovascular markings, visible spinous processes of the upper thoracic vertebrae, ribcage boundary, and ribcage). It was used to train the second-stage network of the AI pipeline, the purpose being to teach the AI algorithm to reliably identify the lungs in a chest radiograph so that first-stage predictions lying outside the lung area could be eliminated.
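The human-in-the-loop labeling scheme above can be sketched as follows. Every callable is a hypothetical stand-in (train_fn for model training, misses_fn for evaluation, relabel_fn for the radiologists' corrections), and the chunked growth of the labeled pool is our simplification of the iterations described:

```python
def iterative_labeling(pool, train_fn, misses_fn, relabel_fn, chunk=500):
    """Sketch of the iterative relabeling loop.

    train_fn(scans)         -> model trained on the labeled scans
    misses_fn(model, scans) -> subset the model fails to segment correctly
    relabel_fn(scans)       -> the same scans with corrected radiologist labels
    """
    labeled = list(pool[:chunk])
    remaining = list(pool[chunk:])
    model = train_fn(labeled)
    while remaining:
        batch, remaining = remaining[:chunk], remaining[chunk:]
        failures = misses_fn(model, batch)
        # radiologists relabel only the scans the model got wrong;
        # the rest of the batch keeps its existing labels
        labeled += relabel_fn(failures) + [s for s in batch if s not in failures]
        model = train_fn(labeled)      # retrain with the corrected labels
    return model
```

The loop terminates when the pool is exhausted, matching the paper's description of continuing until all training data were labeled.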
Open-source COVID-19 dataset
This dataset comprised COVID-19 and other viral and bacterial pneumonia scans collected from different sources. It contained 951 scans covering different pathologies. Scans from patients with COVID-19 were considered positive, and the rest were considered negative, including viral (SARS, influenza), bacterial (streptococcus, legionella), fungal, and pneumocystis infections, and lipoid pneumonia.
Stages of artificial intelligence pipeline
First stage network
The network was trained to predict bounding boxes around GGO/consolidation regions, the class of each box (GGO or consolidation), and the segmentation of the GGO/consolidation inside the box. We used ResNet-101 with weights pre-trained on ImageNet. Data from the Kaggle RSNA challenge and the Indian hospital were used to train this network. The network has a shared backbone with three different output heads, each optimized for a different loss function (smooth L1, focal loss, and dice loss). Multiple losses were used to improve generalization through multiple objectives.
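For reference, two of the losses named above can be written compactly. This NumPy sketch follows the standard published definitions (smooth L1 from Fast R-CNN for box regression, focal loss from RetinaNet for classification), not the authors' exact implementation; the hyperparameter defaults (beta, alpha, gamma) are the commonly used values, an assumption on our part:

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 (Huber-style) loss for bounding-box regression:
    quadratic near zero, linear for large errors."""
    d = np.abs(pred - target)
    return np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta).mean()

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: down-weights easy examples via (1 - p_t)^gamma
    so training focuses on hard, misclassified boxes."""
    p = np.clip(p, eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)          # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return (-alpha_t * (1 - p_t) ** gamma * np.log(p_t)).mean()
```

A confident correct prediction contributes almost nothing to the focal loss, which is what makes it suitable for the heavily imbalanced box-classification task.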
Second stage network
The second-stage network was trained to segment the lung regions provided by radiologist annotations in the given chest radiograph. We used the pre-trained U-Net architecture, modifying the convolutional operations into residual blocks. A total of 3000 scans were used to train the model with an 80:20 training:validation split. A mix of scans from the Indian data and the Kaggle RSNA Pneumonia Detection Challenge data was used to create this dataset. The network was trained to optimize the dice loss.
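The dice loss optimized here can be sketched as the usual soft Dice formulation over a predicted probability mask and a binary ground-truth mask; the smoothing constant is a common default, not a value taken from the paper:

```python
import numpy as np

def dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss for binary segmentation.

    pred: predicted probabilities in [0, 1]; target: binary ground truth.
    Returns 1 - Dice coefficient, so perfect overlap gives a loss of 0.
    """
    pred, target = pred.ravel(), target.ravel()
    intersection = (pred * target).sum()
    dice = (2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth)
    return 1.0 - dice
```

Because the loss is driven by the overlap ratio rather than per-pixel accuracy, it remains informative even when the lungs occupy a small fraction of the radiograph.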
Third stage network
The model was trained to predict the box around the GGO/consolidation area and the class of the box on COVID-19 scans. We used the same architecture as the first stage, initialized with the first-stage weights. If the first-stage network did not predict any bounding boxes, this network was not used in the pipeline. If it did, we filtered those predictions using this network to remove false positives and improve performance. The network took the scan as input and predicted bounding boxes for GGOs and consolidation. It was trained in the same fashion as the first-stage network, except that scans with non-COVID-19 GGO/consolidation were treated as negative. The dataset from the Indian hospital, with an additional COVID-19/non-COVID-19 label, was used to train this network. The network was trained to optimize the average of the focal loss and the smooth L1 loss.
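The confirm/reject behavior of the third stage can be reduced, for illustration, to a single image-level score. Here rescore_fn is a stand-in for the third-stage network, and taking the maximum box score as the image-level COVID-19 probability is our assumption rather than a detail stated in the paper:

```python
def covid_probability(candidate_boxes, rescore_fn):
    """Sketch: re-score first-stage boxes with the third-stage network and
    reduce them to one image-level COVID-19 probability (max box score)."""
    if not candidate_boxes:        # first stage found nothing: stage 3 skipped
        return 0.0
    return max(rescore_fn(b) for b in candidate_boxes)
```

The resulting per-scan probability is what the thresholding step described under "Training and validation" converts into a binary COVID-19/non-COVID-19 call.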
Training and validation
All the models were trained for 50 epochs with a learning rate of 0.01 using the stochastic gradient descent optimizer. Training each network took 20 min per epoch. We used three different losses (smooth L1, focal loss, and dice loss); different networks used different loss functions and datasets during training. The training and validation curves of all three networks of the pipeline are shown in [Figure 2].
Figure 2: Training and validation loss during the training of each model. Both losses fell sharply at first and plateaued around the 50th epoch; additional training produced no further learning, so model training was stopped at 50 epochs. Even after transfer learning, direct predictions from the first-stage network plateaued at a high loss value, so we cleaned the data and trained the second- and third-stage networks to improve performance
Sensitivity was defined as the fraction of RT-PCR-confirmed COVID-19-positive cases covered by the predictions, and specificity as the fraction of RT-PCR-confirmed COVID-19-negative cases correctly covered by the predictions. The receiver operating characteristic (ROC) curve is a graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied; the area under the ROC curve (AUROC) can be used to find an appropriate threshold for binary classification. As the AI predictions are in the form of a probability of COVID-19, we had to decide on a threshold to convert the probability values into binary COVID-19/non-COVID-19 predictions, and we used the ROC analysis to find the best threshold.
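One common way to pick the operating point is to sweep candidate thresholds along the ROC curve and maximize Youden's J statistic (sensitivity + specificity - 1); the paper does not state its exact criterion, so this sketch is illustrative:

```python
import numpy as np

def sens_spec(probs, labels, thr):
    """Sensitivity and specificity at a given probability threshold."""
    pred = probs >= thr
    tp = np.sum(pred & (labels == 1))
    fn = np.sum(~pred & (labels == 1))
    tn = np.sum(~pred & (labels == 0))
    fp = np.sum(pred & (labels == 0))
    return tp / (tp + fn), tn / (tn + fp)

def best_threshold(probs, labels):
    """Sweep every observed probability value and maximize Youden's J."""
    best, best_j = 0.5, -1.0
    for thr in np.unique(probs):             # candidate operating points
        sens, spec = sens_spec(probs, labels, thr)
        j = sens + spec - 1.0
        if j > best_j:
            best, best_j = thr, j
    return best
```

Youden's J weights sensitivity and specificity equally, which fits a triage setting where both false negatives and false positives carry cost.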
Results
We validated our AI algorithm using a total of 611 scans, of which 220 were confirmed by RT-PCR to be COVID-19 positive. The results of the study are described in [Table 1]. Our algorithm showed a sensitivity of 77.7% and a specificity of 75.4%; the corresponding false-negative and false-positive rates were 22.3% and 24.6%, respectively. The radiologists correctly classified 167 COVID-19 cases, achieving a sensitivity of 75.9% and a specificity of 75.4%. The most representative images from each class of predictions (true positive, false positive, false negative, true negative) are shown in [Figure 3].
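As a quick consistency check (our own arithmetic, not reported in the paper): with 220 RT-PCR-positive scans among 611, a sensitivity of 77.7% corresponds to about 171 true positives, and the radiologists' 167 correct classifications give 167/220, or about 75.9%:

```python
# Cross-check of the reported figures: 611 scans, 220 RT-PCR positive.
positives = 220
negatives = 611 - positives                         # RT-PCR-negative scans

ai_true_positives = round(0.777 * positives)        # sensitivity 77.7%
radiologist_sensitivity = round(167 / positives * 100, 1)  # 167 correct calls
false_negative_rate = round(100 - 77.7, 1)          # 1 - sensitivity, in %
false_positive_rate = round(100 - 75.4, 1)          # 1 - specificity, in %
```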
Table 1: The comparison between radiologist annotations and artificial intelligence predictions on Indian data from 611 cases
Figure 3: Representative sample of different prediction types of our artificial intelligence (AI) against labels provided by radiologists. Blue boxes indicate radiologist labels; red boxes indicate AI predictions
We also compared our model's performance against an open-source, state-of-the-art model called COVID-Net. As shown in [Table 2], COVID-Net achieved a higher sensitivity of 91.8% but a much lower specificity of 46.5%. Our model thus demonstrated more consistent results on the Indian data.
Table 2: The sensitivity and specificity comparison of the open-source COVID-Net model and our artificial intelligence
Discussion
Our proposed AI pipeline performed competitively against the radiologists' annotations, with a sensitivity of 77.7% and a specificity of 75.4%. The performance of our algorithm improved because of its ability to ignore non-lung predictions, and improved further in the third stage of the pipeline, which distinguishes COVID-19 from other pulmonary infections. In addition to classifying scans, our AI pipeline provides bounding boxes around infected regions, which aids the interpretation of AI results, the visualization of infected regions, and cross-verification by radiologists. The marked areas can be considered the areas of focus when the pipeline makes its COVID-19/non-COVID-19 classification decisions, offering a glimpse into its reasoning. In the representative images from each class of predictions shown in [Figure 3], the blue bounding boxes were drawn by radiologists to indicate the COVID-19-infected areas, and the red bounding boxes are predictions from our AI pipeline. The pipeline predicts the location of the infected area with adequate accuracy, and the results are qualitatively competitive. Hazy areas may lead to false-positive predictions: in these areas, the radiologists have not labeled anything, but the algorithm predicts red bounding boxes. In cases of very mild infection, the pipeline is not able to predict the bounding boxes. Nevertheless, judging by its specificity and sensitivity, the model's ability to classify infected regions appears adequate, leading to relatively few false-negative and false-positive results.
We also compared our AI pipeline with other openly available AI models and observed that the performance of the best such model was highly skewed on the Indian dataset. Since reliability and balanced results are crucial for triaging applications, we believe that our model will perform better in real-world settings. This also shows that a model trained on one data source cannot be used directly on other datasets; approaches like transfer learning are needed, in which a model trained on a general, openly available dataset is fine-tuned on the specific dataset before use. Even though several datasets are available for training AI algorithms for COVID-19 detection, models that perform well on those datasets will not necessarily perform equally well on new data. We believe that clinical validation studies like ours are necessary for proper evaluation before deployment, and we plan to conduct more such studies in the future. Such evaluations become even more necessary in light of the lack of diversity in these datasets.
The competitive performance of our AI algorithm suggests various potential applications of this pipeline in patient management. It can be used alongside the predictions made by radiologists as a check, or as an initial filter to reduce the load on radiologists. It is not yet suitable for independent deployment, but it could serve as a baseline for future models. We plan to further improve the pipeline by gathering more data and validating it in different settings.
Even though our AI algorithm performs competitively against radiologist annotations, our study has some limitations: the absence of annotations for other pathologies, the lack of a more diverse distribution of patients from different demographics, and the lack of still larger datasets (more collaborating hospitals). Training and evaluation on such datasets would provide more detailed performance measures for our AI pipeline, such as its performance in the presence of other pathologies and across demographic groups. Therefore, in the future, we plan to work on larger, more diverse, and more granular datasets, which would help us measure and further improve the performance of our algorithm. We also aim to conduct a multi-center clinical validation study to evaluate our AI-based methods further.
Conclusion
Our AI algorithm can aid radiologists in confirming the findings of COVID-19 on chest radiography and identifying additional abnormalities and can be used as an assistive and complementary first-line COVID-19 triage tool.
Data sharing statement
Individual de-identified participant data (including data dictionaries) will not be shared.
Acknowledgments
The authors would like to acknowledge the support of Kaushalya Medical Trust Foundation Hospital, Thane, Maharashtra, in providing us with invaluable data and permission to perform this validation study. We would also like to thank Dr. Arvind Salunke and Dr. Sujit D. Palande for providing the annotations used to validate the performance of our AI against the radiological view.
Financial support and sponsorship
None.
Conflicts of interest
We conducted the study with Kaushalya Medical Trust Foundation Hospital, Thane, Maharashtra. The hospital has provided a conflict of interest waiver for the study.
References
Ciotti M, Ciccozzi M, Terrinoni A, Jiang WC, Wang CB, Bernardini S. The COVID-19 pandemic. Crit Rev Clin Lab Sci 2020;57:365-88.
Ohannessian R, Duong TA, Odone A. Global telemedicine implementation and integration within health systems to fight the COVID-19 pandemic: A call to action. JMIR Public Health Surveill 2020;6:e18810.
COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). Available from: https://coronavirus.jhu.edu/map.html. [Last accessed on 2022 Jan 13].
Mahajan A. COVID-19 and its socioeconomic impact. Cancer Res Stat Treat 2021;4:12-8. [Full text]
Bothra M, Shera TA, Bajpai J, Mahajan A. COVID-19: A review of the pandemic with emphasis on the role of imaging. Indian J Med Paediatr Oncol 2020;41:640-51. [Full text]
Pande P, Sharma P, Goyal D, Kulkarni T, Rane S, Mahajan A. COVID-19: A review of the ongoing pandemic. Cancer Res Stat Treat 2020;3:221-32. [Full text]
Pereira RM, Bertolini D, Teixeira LO, Silla CN Jr., Costa YM. COVID-19 identification in chest X-ray images on flat and hierarchical classification scenarios. Comput Methods Programs Biomed 2020;194:105532.
Kulkarni T, Sharma P, Pande P, Agrawal R, Rane S, Mahajan A. COVID-19: A review of protective measures. Cancer Res Stat Treat 2020;3:244-53. [Full text]
Mahajan A, Sharma P. COVID-19 and radiologist: Image wisely. Indian J Med Paediatr Oncol 2020;41:121-6. [Full text]
Naguib M, Moustafa F, Salman MT, Saeed NK, Al-Qahtani M. The use of radiological imaging alongside reverse transcriptase PCR in diagnosing novel coronavirus disease 2019: A narrative review. Future Microbiol 2020;15:897-903.
Sharma PJ, Mahajan A, Rane S, Bhattacharjee A. Assessment of COVID-19 severity using computed tomography imaging: A systematic review and meta-analysis. Cancer Res Stat Treat 2021;4:78-87. [Full text]
Mahajan A. Recent updates on imaging in patients with COVID-19. Cancer Res Stat Treat 2020;3:351-2. [Full text]
Wong HY, Lam HY, Fong AH, Leung ST, Chin TW, Lo CS, et al. Frequency and distribution of chest radiographic findings in patients positive for COVID-19. Radiology 2020;296:E72-8.
Ahuja A, Mahajan A. Imaging and COVID-19: Preparing the radiologist for the pandemic. Cancer Res Stat Treat 2020;3 Suppl S1:80-5.
Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJ. Artificial intelligence in radiology. Nat Rev Cancer 2018;18:500-10.
Kss B, Pawar V, Punia V, Mlv A, Mahajan A. Novel artificial intelligence algorithm for automatic detection of COVID-19 abnormalities in computed tomography images. Cancer Res Stat Treat 2021;4:256-61.
Krizhevsky A. One weird trick for parallelizing convolutional neural networks. arXiv preprint arXiv:1404.5997; 2014.
He K, Zhang X, Ren S, Sun J. “Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016. p. 770-8.
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. 3rd International Conference on Learning Representations (ICLR); 2015.
He K, Zhang X, Ren S, Sun J. “Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-8.
Wang L, Lin ZQ, Wong A. COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Sci Rep 2020;10:19549.
Ronneberger O, Fischer P, Brox T. U-net: Convolutional Networks for Biomedical Image Segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer; 2015.
Graning L, Jin Y, Sendhoff B. Generalization improvement in multi-objective learning. The 2006 IEEE International Joint Conference on Neural Network Proceedings. Vancouver, BC; 2006. p. 4839-46.
Girshick R. Fast R-CNN. 2015 IEEE International Conference on Computer Vision (ICCV); 2015. p. 1440-8.
Sudre CH, Li W, Vercauteren T, Ourselin S, Cardoso MJ. Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Cham: Springer; 2017. p. 240-8.