Abstract:
The burden of ocular diseases is a significant public health concern globally, with millions of
people affected by conditions that can lead to visual impairment and blindness. In Pakistan, the
prevalence of ocular diseases such as Diabetic Macular Edema (DME), Choroidal Neovascularization
(CNV), and Drusen has been rising, primarily due to the increasing incidence of diabetes
and an aging population. The early detection and accurate classification of these diseases are
crucial to prevent severe vision impairment and blindness. However, diagnostic resources are
scarce, and traditional methods of diagnosis, which rely heavily on the expertise of
ophthalmologists, are time-consuming and subject to human error. Applying deep learning models
to the classification of retinal diseases from Optical Coherence Tomography (OCT) images offers
a promising solution to this problem, enabling automated, rapid, and accurate diagnosis even in
resource-limited settings.
The primary goal of this project is to develop and evaluate deep neural network models for
the classification of ocular diseases using OCT images. We investigated the performance of
four convolutional neural network (CNN) architectures: VGG16, VGG19, ResNet50, and InceptionV3.
These models were evaluated under three conditions: without image pre-processing and data
augmentation, with image pre-processing and data augmentation, and with the Synthetic Minority
Over-sampling Technique (SMOTE) to address class imbalance.
Performance was measured using accuracy, precision, recall, F1 score, Cohen’s Kappa score,
Matthews Correlation Coefficient (MCC), and the area under the Receiver Operating
Characteristic curve (ROC AUC).
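As an illustration of the less familiar metrics listed above, the following minimal sketch
computes accuracy, Cohen's Kappa, and the multi-class MCC from a confusion matrix using their
standard definitions. The function names are hypothetical and are not taken from the study's
code; class labels are assumed to be integers 0..n-1.

```python
from math import sqrt

def confusion_matrix(y_true, y_pred, n_classes):
    # cm[t][p] counts samples of true class t that were predicted as class p
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def summary_metrics(cm):
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    true_counts = [sum(cm[k]) for k in range(n)]                       # row sums
    pred_counts = [sum(cm[r][k] for r in range(n)) for k in range(n)]  # column sums
    cross = sum(t * p for t, p in zip(true_counts, pred_counts))

    accuracy = correct / total
    # Cohen's Kappa: observed agreement corrected for chance agreement
    expected = cross / total ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    # Multi-class Matthews Correlation Coefficient
    denom = sqrt((total ** 2 - sum(p * p for p in pred_counts))
                 * (total ** 2 - sum(t * t for t in true_counts)))
    mcc = (correct * total - cross) / denom if denom else 0.0
    return {"accuracy": accuracy, "kappa": kappa, "mcc": mcc}
```

Unlike plain accuracy, both Kappa and MCC account for how often each class occurs, which is
why they are informative when the classes are imbalanced.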
The dataset comprised 7,314 OCT images categorized into four classes: DME, Drusen, CNV, and
Normal. The dataset was partitioned into training, validation, and test sets in a ratio of
90%, 5%, and 5%, respectively. The baseline evaluation indicated that VGG16 and VGG19
outperformed InceptionV3 and ResNet50. Models trained with data augmentation and
pre-processing improved substantially over the baseline, with VGG16 and VGG19 showing the
highest accuracy and robustness. The Adam optimizer consistently outperformed the SGD
optimizer across all models.
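The 90%/5%/5% partition described above could be produced per class, so that each subset keeps
the original class proportions. The sketch below is one such stratified split; whether the
study stratified its split is not stated here, and the function name, seed, and rounding rule
are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, ratios=(0.90, 0.05, 0.05), seed=0):
    """Return (train, val, test) lists of sample indices, split class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * ratios[0])
        n_val = round(len(idxs) * ratios[1])
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]   # remainder goes to the test set
    return train, val, test
```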
SMOTE was applied to the best-performing models (VGG16 and VGG19), which improved their
performance and effectively addressed the class imbalance issue. The SMOTE-enhanced VGG19
model achieved 99.12% accuracy, 99.13% precision, 99.12% recall, and a 99.11% F1 score. These
gains were verified on an external dataset, where VGG19 performed best, supporting its
generalizability.
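To make the oversampling step concrete, the sketch below shows the core idea of SMOTE: each
synthetic sample is an interpolation between a minority-class sample and one of its k nearest
minority-class neighbours. It operates on generic numeric feature vectors; the representation
the study applied SMOTE to (for example, flattened pixels or extracted features) is not stated
here, and the function name and defaults are hypothetical. In practice a library
implementation such as the one in imbalanced-learn would normally be used.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic vectors from a list of minority-class vectors."""
    rng = random.Random(seed)

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen sample (excluding itself)
        neighbours = sorted((s for s in minority if s is not base),
                            key=lambda s: sqdist(base, s))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Because the new points lie on line segments between existing minority samples, they stay
inside the region the minority class already occupies rather than duplicating samples.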
Gradient-weighted Class Activation Mapping (Grad-CAM) was employed to examine how the models
reach their decisions by identifying and visualizing the image regions that most influenced
each prediction. The Grad-CAM visualizations provided valuable insights into the diseased
regions of the OCT images.
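The Grad-CAM combination step can be sketched as follows: each feature-map channel is weighted
by the spatial mean of the class-score gradient for that channel, the weighted maps are
summed, and a ReLU keeps only positively contributing regions. This sketch assumes the
activations and gradients of the last convolutional layer have already been extracted from the
network as nested lists shaped [channels][height][width]; the function name is hypothetical.

```python
def grad_cam(activations, gradients):
    """Return a [height][width] heatmap scaled to the range 0..1."""
    h, w = len(activations[0]), len(activations[0][0])
    # Channel importance: global average of the gradient over spatial positions
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum over channels, followed by ReLU
    cam = [[max(0.0, sum(wk * a[i][j] for wk, a in zip(weights, activations)))
            for j in range(w)]
           for i in range(h)]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam
```

The resulting low-resolution map is typically upsampled to the input size and overlaid on the
OCT image to show which regions drove the prediction.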
This study successfully demonstrates the potential of deep learning models, particularly VGG16
and VGG19, for accurate and robust eye disease classification using OCT images. The findings
highlight the importance of data augmentation, pre-processing, and addressing class imbalance
in training deep learning models for medical image analysis. Future research will aim to
optimize model architectures, investigate ensemble learning techniques, and enable the
practical deployment of these models in clinical settings. This would support the prompt
detection and timely treatment of ocular conditions in Pakistan.