NUST Institutional Repository

KNOWLEDGE-DISTILLATION DRIVEN TRANSFORMER MODEL FOR REPORT GENERATION OF SPINE RADIOGRAPHS

dc.contributor.author Mukhtar, Asmat
dc.date.accessioned 2024-12-12T04:49:51Z
dc.date.available 2024-12-12T04:49:51Z
dc.date.issued 2024
dc.identifier.other 363369
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/48255
dc.description Supervisor: Dr. Muhammad Usman Akram en_US
dc.description.abstract Spine radiographs are important for identifying various conditions, including but not limited to fractures, tumors, and degenerative diseases affecting the spine. These images assist healthcare practitioners in making decisions regarding the treatment and management of patients. However, the current way of examining spinal radiographs faces a number of limitations in many parts of the world, such as limited access to qualified radiologists, inadequate medical infrastructure, and a lack of modern equipment. Given such conditions, which are very common in low-income countries, automated systems for the diagnosis of spine radiographs are in great need, as they would not only improve diagnosis but also increase workflow efficiency and decrease the socioeconomic burden that treating spinal disorders places on the healthcare system. The development of deep learning technologies introduces a spectrum of possibilities, particularly in the area of automated report writing in medical imaging. To overcome the aforementioned challenges, this thesis proposes a novel spine radiology report generation framework. The framework uses transformer models to combine visual information from sagittal spine CT images with the text content of clinical reports. By incorporating visual characteristics of spinal structures, it aims to generate detailed and reliable radiology reports that closely mimic the expertise of a human radiologist. The core of the framework is a foundation model, which is fine-tuned during training. Moreover, Knowledge Distillation (KD) is applied, through which the encoder enhances its learning by receiving knowledge transferred from a more complex teacher model.
The report generation module comprises an encoder and a decoder that operate on the input medical images and generate a detailed report. The fine-tuned foundation model, in conjunction with KD, has shown significant improvements in performance metrics. Extensive evaluations performed on a public dataset validate the effectiveness of the proposed framework. The results show improvements in both BERTScore, which rises from 0.7486 to 0.7522, and BLEU-1, which rises from 0.6361 to 0.7291. Further, the framework is examined using four different approaches: (1) the original reports written by practicing radiologists, (2) reports lacking spine-level details, (3) modified reports, and (4) reports written by ChatGPT. Among these, the approach that omitted spine-level annotations fared best across most of the metrics, with scores as high as 0.9056 for BLEU-4 and 0.915 for BERTScore. This indicates that higher-quality generated text is attained with simplified reports containing no detailed spine-level reporting. A test set of 50 cases was created to measure report quality, with completeness, correctness, and conciseness assessed by radiologists as well as by ChatGPT. The radiologists' original reports outscored the automatically generated reports on all three criteria. The performance of the reports generated by ChatGPT is relatively lower than that of the other methods, although it remains competitive across the sub-methods in terms of ROUGE-L (0.8552) and BERTScore (0.8655). This also shows that the model manages to capture fundamental linguistic structures but struggles with more complex or extended sequences. There is less disparity between original and generated reports in the ChatGPT assessment.
Although the original reports were rated highly by radiologists, the generated reports received more favorable assessments from ChatGPT. Even so, the scores for the generated reports were not as high as those earned by the original reports written by radiologists. The results indicate that the framework presented in this thesis is a significant advancement over the existing process of generating spine radiology reports, achieved through automation with modern deep learning techniques. The system's ability to generate reports that are contextually relevant and semantically correct is commendable, but some improvements are still necessary if clinical requirements are to be strictly met. The use of transformers and Knowledge Distillation has also improved report generation performance. This underlines that many research opportunities remain, not only in this area but also in the automated reporting of medical images more broadly. en_US
dc.language.iso en en_US
dc.publisher College of Electrical & Mechanical Engineering (CEME), NUST en_US
dc.title KNOWLEDGE-DISTILLATION DRIVEN TRANSFORMER MODEL FOR REPORT GENERATION OF SPINE RADIOGRAPHS en_US
dc.type Thesis en_US


This item appears in the following Collection(s)

  • MS [441]
