dc.description.abstract |
Spine radiographs are important for identifying various conditions, including, but
not limited to, fractures, tumors, and degenerative diseases affecting the spine.
These images assist healthcare practitioners in making decisions regarding the
treatment and management of patients. However, the current practice of examining
spinal radiographs faces a number of limitations in many parts of the world, such as
limited access to qualified radiologists, inadequate medical infrastructure, and a
lack of modern equipment. Given such conditions, which are very common in
low-income countries, automated systems for the diagnosis of spine radiographs are
greatly needed, as they would not only improve diagnosis but also increase workflow
efficiency and reduce the socioeconomic burden that treating spinal disorders
places on the healthcare system. The development
of deep learning technologies introduces a spectrum of possibilities, particularly in
the area of automated report writing in medical imaging. In order to overcome
the aforementioned challenges, we propose a novel spine radiology report generation
model. This framework uses transformer models to combine visual information from
spine CT scan sagittal images and the text content of clinical reports. By incorporating visual characteristics of spinal structures, the proposed framework aims to
generate detailed and reliable radiology reports that closely mimic the expertise of
a human radiologist.
This thesis introduces a novel spine radiology report generation framework, utilizing
transformers trained on text reports and visual data from spine CT scans in the
sagittal view. The core of this framework consists of a foundation model, which is
fine-tuned during the training period. Moreover, Knowledge Distillation (KD) is
applied, through which the encoder improves its learning by receiving knowledge
transferred from a more complex teacher model. The report generation module
comprises an encoder and a decoder that process the input medical images and
generate a detailed report. The fine-tuned foundation model, in conjunction with KD, has
shown significant improvements in performance metrics.
Extensive evaluations performed on a public dataset validate the effectiveness of
the proposed framework. The results show significant improvements in the BERTScore
and BLEU-1 metrics, with the former rising from 0.7486 to 0.7522 and the latter
from 0.6361 to 0.7291. Further, the framework is examined using four different
approaches: (1) the original reports written by practicing radiologists, (2) reports
lacking spine-level details, (3) modified reports, and (4) reports written by
ChatGPT. Among these, the approach that omitted spine-level annotations fared best
across most of the metrics, with highest scores of 0.9056 for BLEU-4 and 0.915 for
BERTScore. This indicates that higher-quality generated text is attained with
simplified reports that contain no detailed spine-level
reporting. A test set of 50 cases was created to measure the quality of the report,
with completeness, correctness, and conciseness assessed by radiologists
as well as ChatGPT. The radiologists’ original reports outscored the
automatically generated reports on completeness, correctness, and conciseness.
Reports generated by ChatGPT performed somewhat worse than the other approaches,
although they remained competitive within every sub-method in
terms of ROUGE-L (0.8552) and BERTScore (0.8655). This also shows that the
model manages to capture fundamental linguistic structures but struggles with
more complex or extended sequences. ChatGPT’s assessments showed less disparity
between original and generated reports. Although the original reports were rated
highly by radiologists, the generated reports received more favorable assessments
from ChatGPT. Even so, the scores for the generated reports were not as high as
those earned by the original reports written
by radiologists.
The results indicate that the framework presented in this thesis is a significant
advance over the existing process for generating spine radiology reports, automating
it with modern deep learning techniques. The system generates reports that are
contextually relevant and semantically correct, but further improvements are still
necessary if clinical requirements are to be strictly met. The use of transformers
and Knowledge Distillation has also improved report generation performance.
This underlines that many research opportunities remain,
not only in this area but also in the automated reporting of medical images more broadly. |
en_US |