Abstract:
The worldwide pharmaceutical industry is worth 1.48 trillion USD [1]. The industry has a
need for automation, especially in the developing
world. This thesis investigates the application of object detection and optical character
recognition (OCR) techniques to detect medicine boxes and extract information
from them. The primary objective is to develop a deep learning-based system that can
accurately detect and recognize medicine boxes and extract critical information such as
batch number, expiry date, manufacturing date and retail price. The study compares the
performance of five prominent object detection models, YOLOv8n, YOLOv8s,
YOLOv9t, YOLOv9c and YOLOv10n, trained, tested and validated on a custom
dataset comprising 21,579 images.
The methodology first involves training a range of YOLO and RTDETR models, for varying
numbers of epochs, on a similar but smaller exploratory dataset of medicine boxes. A
total of 68 models were trained on the exploratory dataset. These models were then
evaluated using plots of mAP50-95 against inference time. Five models were chosen: those
that offered the best balance of precision and speed, or that were the fastest or the
most precise. The selected models were then trained from scratch on a larger dataset,
termed the primary dataset, again for varying numbers of epochs.
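The selection step described above can be illustrated with a small sketch. It assumes each trained model is summarised by two numbers, its mAP50-95 and its mean inference time, and keeps the models that no other model beats on both. This Pareto-style filter is one possible way to formalise "a balance of precision and speed"; it is not necessarily the procedure used in the thesis. The `Candidate` class, the model names and all numbers are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str            # hypothetical label for a trained model
    map50_95: float      # mAP50-95 on a validation split (higher is better)
    inference_ms: float  # mean inference time per image in ms (lower is better)


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    A candidate is dominated if another one is at least as precise and at
    least as fast, and strictly better on one of the two measures.
    """
    front = []
    for c in candidates:
        dominated = any(
            o.map50_95 >= c.map50_95
            and o.inference_ms <= c.inference_ms
            and (o.map50_95 > c.map50_95 or o.inference_ms < c.inference_ms)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    # Order the surviving models from fastest to slowest.
    return sorted(front, key=lambda c: c.inference_ms)


# Placeholder numbers for illustration only; these are not thesis results.
candidates = [
    Candidate("model_a", 0.80, 2.0),
    Candidate("model_b", 0.85, 4.0),
    Candidate("model_c", 0.78, 3.0),  # slower and less precise than model_a
    Candidate("model_d", 0.90, 9.0),
    Candidate("model_e", 0.84, 6.0),  # slower and less precise than model_b
]

front = pareto_front(candidates)
fastest = min(candidates, key=lambda c: c.inference_ms)
most_precise = max(candidates, key=lambda c: c.map50_95)
print([c.name for c in front])
```

The fastest and the most precise models always lie on this front, so a filter of this kind covers all three selection criteria mentioned above.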
The results showed that, on the exploratory dataset, YOLOv8n, YOLOv8s,
YOLOv9t, YOLOv9c and YOLOv10n performed best. These five models were
carried into the next phase of the research, in which they were trained on the primary
dataset. The model that performed best of the five in terms of speed and accuracy
was then integrated with a GUI system built for the industry. This system performs
inference on medicine boxes to detect them and extracts crucial information from
them using OCR.
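As a rough illustration of the extraction step, the sketch below turns raw OCR text into the four fields named earlier (batch number, manufacturing date, expiry date and retail price). It assumes the OCR stage returns a plain text string and that labels follow the hypothetical formats encoded in `FIELD_PATTERNS`. Real packaging varies widely, so these patterns are assumptions rather than the rules used by the system.

```python
import re

# Hypothetical label formats; real medicine boxes differ by manufacturer.
FIELD_PATTERNS = {
    "batch_number": re.compile(
        r"\bB(?:atch)?\.?\s*No\.?\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    "manufacturing_date": re.compile(
        r"\bMfg\.?\s*(?:Date)?\s*[:\-]?\s*(\d{2}[/\-]\d{4})", re.IGNORECASE
    ),
    "expiry_date": re.compile(
        r"\bExp(?:iry)?\.?\s*(?:Date)?\s*[:\-]?\s*(\d{2}[/\-]\d{4})", re.IGNORECASE
    ),
    "retail_price": re.compile(
        r"\bMRP\s*[:\-]?\s*(?:Rs\.?)?\s*(\d+(?:\.\d{1,2})?)", re.IGNORECASE
    ),
}


def parse_fields(ocr_text):
    """Map each field name to the first matching value, or None if absent."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    return fields


# Made-up OCR output for illustration only.
sample_text = "Batch No: AB1234\nMfg. Date: 03/2024\nExp. Date: 02/2026\nMRP: Rs. 120.50"
parsed = parse_fields(sample_text)
print(parsed)
```

Returning `None` for a missing field, rather than raising an error, lets a GUI flag incomplete reads for manual review.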
Future work will focus on the identified areas for improvement, i.e., expanding the dataset,
refining the models and exploring additional features, with the aim of further enhancing
the system's performance and applicability.