Abstract:
Chronic respiratory diseases, including chronic obstructive pulmonary disease, affect millions of people around the globe, particularly in low- and middle-income countries such as Pakistan, and account for millions of years lived with disability. Chest X-rays (CXRs) are the most commonly used imaging modality in radiology for diagnosing these pulmonary diseases, with close to 2 billion CXRs taken every year. Despite this widespread use, their sheer volume strains healthcare systems and consumes a great deal of radiologists' time and resources, making an automated system that utilises this modality imperative. Furthermore, merely providing an image-level diagnosis for a CXR is insufficient, as disease can affect multiple lung regions, and this region-level detail is crucial for assessing the severity and progression of the condition.
Taking these challenges into consideration, the framework proposed in this research offers a unified solution capable of disease classification, severity scoring for a subset of lung diseases based on segmenting the lungs into six regions, and chest X-ray report generation. The classification sub-module proposes a modified progressive learning technique in which the amount of augmentation applied at each step is capped. Additionally, an ensemble of four EfficientNet-B0 models is used to improve this sub-module's performance and generalisability by taking advantage of a number of augmentation techniques.
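As a minimal sketch only, the following PyTorch snippet shows how such a four-member EfficientNet-B0 ensemble could be assembled; the class count, the plain logit averaging, and all names here are illustrative assumptions rather than details taken from this work.

# Minimal sketch, not the thesis implementation: a four-member EfficientNet-B0
# ensemble for multi-label CXR classification. The class count (14) and plain
# logit averaging are assumptions made for illustration.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class CXREnsemble(nn.Module):
    def __init__(self, num_members: int = 4, num_classes: int = 14):
        super().__init__()
        # In the described setup, each member would be trained with a different
        # augmentation policy to improve generalisability.
        self.members = nn.ModuleList(
            efficientnet_b0(weights=None, num_classes=num_classes)
            for _ in range(num_members)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Average the members' logits, then apply a sigmoid for per-disease probabilities.
        logits = torch.stack([m(x) for m in self.members]).mean(dim=0)
        return torch.sigmoid(logits)

model = CXREnsemble()
probs = model(torch.randn(1, 3, 224, 224))  # one channel-replicated CXR at 224x224
print(probs.shape)  # torch.Size([1, 14])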
Furthermore, the segmentation task makes use of an attention map generated within and by the network itself; this attention mechanism makes it possible to achieve segmentation results on par with networks having an order of magnitude or more parameters. Severity scoring is introduced for four thoracic diseases, providing, with the help of radiologists, a single-digit score corresponding to the spread of opacity across the different lung segments.
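Purely as an illustration of what such a single-digit score could look like, the sketch below counts how many of the six lung segments contain opacity; this reading, and the segment names, are assumptions and not the scoring protocol defined with the radiologists.

# Illustrative assumption only: a single-digit severity score computed as the
# number of lung segments (out of six) in which opacity is detected.
from typing import Mapping

LUNG_SEGMENTS = ("right_upper", "right_middle", "right_lower",
                 "left_upper", "left_middle", "left_lower")

def severity_score(opacity_by_segment: Mapping[str, bool]) -> int:
    """Return a 0-6 score: the count of segments flagged as containing opacity."""
    return sum(bool(opacity_by_segment.get(seg, False)) for seg in LUNG_SEGMENTS)

print(severity_score({"right_lower": True, "left_lower": True}))  # 2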
The report generation sub-module of the proposed framework generates a CXR report that provides the findings from a single CXR taken in either the Anterior-Posterior (AP) or Posterior-Anterior (PA) view. An encoder and a decoder are employed in this module; the former splits the image into patches to create hidden states, while the latter uses the encoded hidden states to generate word probabilities, which are then used to build the final report.
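To make this encoder-decoder flow concrete, the sketch below pairs a patch-based vision encoder with an autoregressive text decoder using Hugging Face's VisionEncoderDecoderModel; the specific checkpoints (ViT and GPT-2), the file path, and the greedy decoding are assumptions for illustration, not the components used in this framework.

# Illustrative sketch of the encoder-decoder idea: image patches -> hidden states
# -> word probabilities -> report text. The ViT/GPT-2 pairing and file path are
# assumptions for demonstration, not the framework's actual components.
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer
from PIL import Image

model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k", "gpt2"
)
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.eos_token_id

image = Image.open("example_cxr.png").convert("RGB")  # hypothetical CXR file
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# The encoder splits the image into patches and produces hidden states; the decoder
# turns those hidden states into word probabilities, decoded here greedily.
generated_ids = model.generate(pixel_values, max_new_tokens=64)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))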
A foundation model is first fine-tuned in an unsupervised manner and then used as the teacher for knowledge transfer to a smaller student model via Knowledge Distillation (KD), with the Kullback–Leibler (KL) divergence employed as the distillation loss. The distilled student model is then used as the encoder, in conjunction with a decoder, for report generation.
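A minimal sketch of such a KL-divergence distillation loss, assuming the standard temperature-scaled formulation, is given below; the temperature and the tensor shapes are illustrative choices rather than the values used in this work.

# Minimal sketch of KL-divergence knowledge distillation between a frozen teacher
# and a smaller student; the temperature and tensor shapes are assumptions.
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # 'batchmean' matches the mathematical definition of KL in PyTorch; the T^2
    # factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

student_out = torch.randn(8, 768)  # student encoder outputs for a batch (assumed shape)
teacher_out = torch.randn(8, 768)  # frozen teacher outputs for the same inputs
print(kd_loss(student_out, teacher_out).item())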
Training and evaluation are performed using nine different CXR datasets, both publicly available and locally collected, including BRAX, Indiana, MIMIC, JSRT, Shenzhen, SIIM, and others, utilising nearly 400,000 CXR images from diverse demographic and geographical locations. On the BRAX validation set for segmentation, we achieve F1 scores of 0.924 and 0.939 without and with fine-tuning, respectively, and a mean matching score of 80.8% for severity score grading. An average area under the receiver operating characteristic curve of 0.88 is achieved for classification using the proposed modified progressive learning, an improvement of almost 9% over the literature. Incorporating KD in the report generation framework, by first fine-tuning a foundation model and then training a student model, increases the BLEU-1 score on the Indiana dataset by 4% and the BERTScore by 7.5%.
Similarly, pre-training on larger datasets for report generation, when used in combination with KD, further increases the BLEU-1 score on the Indiana dataset by 7.2% and the BERTScore by 3%. For the MIMIC dataset, comparable performance is achieved on the Findings and Impression sections of the report, while the proposed framework outperforms other techniques when both sections are combined. For the MIMIC-PRO dataset, a semb score of 0.4069 and a RadGraph F1 score of 0.1165 are achieved, outperforming other techniques in the literature; with the highest BERTScore on the same dataset being 0.2245, the difference from the state of the art (SOTA) is just 1.06%.
Finally, the proposed framework is also evaluated, without any re-training or fine-tuning, on a locally gathered dataset and on a subset of BRAX, resulting in a BLEU-1 score of 0.3827 and a BERTScore of 0.4392 for the local dataset, and a BLEU-1 score of 0.1671 and a BERTScore of 0.2186 for the BRAX subset, demonstrating its generalisation ability. The results indicate that the proposed framework performs comparably to existing techniques for some sub-modules and outperforms state-of-the-art techniques for others, while using a simple architecture with a relatively small parameter count.
By obtaining many insights from a single chest X-ray, the approaches used in the proposed framework have the potential to improve the precision of lung disease diagnosis and to offer the medical community a comprehensive solution for expediting the chest examination procedure. Subsequent improvements in the performance of the proposed framework will increase its utility even further.