Abstract:
In medicine, medical images are visual representations of organs and their functions. They are used to identify medical problems in patients, and highly trained professionals spend a considerable amount of time each day interpreting these images into reports, with each report taking around 4-6 minutes. Beyond highlighting diseases, writing a summarized report is increasingly important because it helps less experienced readers understand the findings and supports better treatment. Automated summarization of radiology reports therefore has tremendous potential to make the diagnostic process more efficient. Extracting a joint image-text embedding from chest X-rays and radiology reports to produce summarized reports along with findings/tags would significantly reduce the workload of doctors and help them treat patients. Because of the sensitivity of the task, existing methods and techniques are not sufficiently accurate, and the limited amount of available data hampers model training. Generating a summarized radiology report is therefore an exceedingly difficult task. A novel approach is proposed to address this issue.
In this approach, pre-trained vision-and-language models such as VisualBERT, UNITER, and LXMERT are used to learn multimodal representations from chest X-rays (radiographs) and their reports. The pre-trained model classifies the findings/tags in the chest X-ray (CXR), and a Gated Recurrent Unit (GRU) decoder generates a summarized report based on them.
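To make the pipeline concrete, the following is a minimal sketch of one way such an encoder-decoder could be assembled, assuming a Hugging Face VisualBERT checkpoint, pre-extracted image region features, and a single-layer GRU; the checkpoint name, dimensions, and tag count are illustrative assumptions, not the paper's exact implementation.

# Minimal sketch (not the paper's exact code): fuse CXR region features and report
# text with a pre-trained VisualBERT encoder, then decode a summary with a GRU.
# Checkpoint name, feature dimensions, and tag count are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import VisualBertModel

class CxrSummarizer(nn.Module):
    def __init__(self, vocab_size, num_tags=14, hidden=768):
        super().__init__()
        # Image-text joint encoder (UNITER or LXMERT could be swapped in here)
        self.encoder = VisualBertModel.from_pretrained("uclanlp/visualbert-vqa-coco-pre")
        self.tag_head = nn.Linear(hidden, num_tags)               # multi-label findings/tags
        self.embed = nn.Embedding(vocab_size, hidden)             # summary token embeddings
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)   # GRU report decoder
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, input_ids, attention_mask, visual_embeds, summary_ids):
        # visual_embeds: (batch, regions, 2048) features from an external detector (assumed)
        enc = self.encoder(
            input_ids=input_ids,
            attention_mask=attention_mask,
            visual_embeds=visual_embeds,
            visual_attention_mask=torch.ones(visual_embeds.shape[:2],
                                             device=visual_embeds.device),
        )
        joint = enc.last_hidden_state[:, 0]                       # [CLS] joint embedding
        tag_logits = self.tag_head(joint)                         # findings/tags scores
        # Teacher forcing: initialise the GRU hidden state with the joint embedding
        dec_out, _ = self.decoder(self.embed(summary_ids), joint.unsqueeze(0))
        return tag_logits, self.out(dec_out)                      # per-step token logits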
The chest X-ray images and reports come from the publicly available Indiana University dataset. Existing methods for automatic report summarization and findings/tags classification are mostly CNN-RNN-based models that rely on the text or the image alone and achieve lower accuracy. The image-text joint embedding learned by the pre-trained models enables more accurate report generation and improves performance on both the thoracic-findings classification and summarized-report generation tasks. Experimental results on the Indiana University (IU) CXR dataset show that the proposed model reaches state-of-the-art performance compared with existing baseline solutions. BLEU and ROUGE are used as evaluation metrics for the summaries, along with AUC for the findings/tags. Across multiple experimental settings, the model achieves about 98% accuracy on disease findings, a BLEU score of 0.35, and a ROUGE score of 0.65 for the summarized radiology report.
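For reference, a brief sketch of how such metrics could be computed for a single generated summary, assuming NLTK for BLEU, the rouge-score package for ROUGE, and scikit-learn for AUC; the texts and labels below are placeholders, not results from the study.

# Illustrative metric computation (assumed libraries: nltk, rouge-score, scikit-learn).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from sklearn.metrics import roc_auc_score

reference = "heart size is normal no focal airspace consolidation"   # placeholder ground truth
generated = "heart size normal no airspace consolidation"            # placeholder model output

bleu = sentence_bleu([reference.split()], generated.split(),
                     smoothing_function=SmoothingFunction().method1)
rouge_l = rouge_scorer.RougeScorer(["rougeL"]).score(reference, generated)["rougeL"].fmeasure

# Multi-label findings/tags: AUC from predicted probabilities (placeholder values).
y_true = [1, 0, 1]            # e.g. cardiomegaly, effusion, opacity present/absent
y_score = [0.92, 0.08, 0.75]
auc = roc_auc_score(y_true, y_score)

print(f"BLEU={bleu:.2f}  ROUGE-L={rouge_l:.2f}  AUC={auc:.2f}")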