Abstract:
Efficient extraction of information from identification documents, such as Computerized National
Identity Cards (CNICs), is a key task in modern document analysis and information retrieval
systems. Traditional Optical Character Recognition (OCR) techniques often struggle with the
conditions found in real-world images, including blur, varying illumination, and complex
backgrounds. This thesis presents an approach based on the OCR-free Document Understanding
Transformer, known as "Donut", combined with pre-processing and optimization techniques to
improve the accuracy and robustness of information extraction. The study begins with a
localization stage that uses YOLOv5 to detect text regions, followed by OCR-based recognition
and extraction with Tesseract. Recognizing the limitations of this OCR-based approach, the
research then moves to the OCR-free approach and prepares a self-annotated CNIC dataset whose
annotations are encoded in the JSON Lines text format. The proposed methodology applies dataset
pre-processing and training-time augmentation, including random cropping, random rotation,
random brightness-contrast adjustment, and Gaussian noise injection. The Donut model
configuration is described in detail, and the model is optimized for memory usage, with emphasis
on its ability to handle challenging visual data such as blurred, dark, bright, and noisy images.
With the proposed pipeline, the model achieves an accuracy of 99.96% and an F1 score of 99.46% on
the test data, indicating robust performance under real-world conditions. In addition, bio-data
forms are prepared in HTML and used to train the Donut model with the same pipeline, which
shows consistent performance on their test data. For practical use, a Django API is developed for
testing the model on images, demonstrating its effectiveness in real-time applications. These
findings highlight the value of OCR-free approaches, specifically Donut, in overcoming the
limitations of traditional OCR techniques. The results confirm the model’s strong performance on
ID-card information extraction and lay a foundation for further work in document analysis,
identity verification, and information retrieval more broadly.
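The four training-time augmentations named in the abstract (random crop, random rotation, random brightness-contrast adjustment, Gaussian noise injection) can be sketched in Python with NumPy as follows. This is a minimal illustration, not the thesis implementation: the function names, the parameter ranges (crops of at least 90% per side, rotations within ±5 degrees, noise sigma of 10) and the per-transform probability are all hypothetical, and a production pipeline would more likely use an augmentation library such as Albumentations.

```python
import numpy as np

# NOTE: all parameter ranges below are hypothetical, not the thesis settings.


def random_crop(img, rng, min_scale=0.9):
    """Crop a random window whose sides are at least `min_scale` of the original."""
    h, w = img.shape[:2]
    ch = max(1, int(h * rng.uniform(min_scale, 1.0)))
    cw = max(1, int(w * rng.uniform(min_scale, 1.0)))
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    return img[top:top + ch, left:left + cw]


def random_rotate(img, rng, max_deg=5.0, fill=255):
    """Rotate by a random angle in [-max_deg, max_deg] about the image centre.

    Uses nearest-neighbour inverse mapping; output pixels whose source falls
    outside the image are set to `fill` (white, assuming a light background).
    """
    h, w = img.shape[:2]
    theta = np.deg2rad(rng.uniform(-max_deg, max_deg))
    cos, sin = np.cos(theta), np.sin(theta)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    dx, dy = xs - cx, ys - cy
    # Inverse rotation: for every output pixel, locate its source pixel.
    src_x = np.rint(cos * dx + sin * dy + cx).astype(int)
    src_y = np.rint(-sin * dx + cos * dy + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full_like(img, fill)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out


def random_brightness_contrast(img, rng, brightness=0.2, contrast=0.2):
    """Scale contrast about the mean intensity, shift brightness, clip to [0, 255]."""
    alpha = 1.0 + rng.uniform(-contrast, contrast)       # contrast factor
    beta = 255.0 * rng.uniform(-brightness, brightness)  # brightness offset
    x = img.astype(np.float32)
    out = (x - x.mean()) * alpha + x.mean() + beta
    return np.clip(out, 0, 255).astype(np.uint8)


def gaussian_noise(img, rng, sigma=10.0):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def augment(img, rng, p=0.5):
    """Apply each transform independently with probability `p`."""
    for transform in (random_crop, random_rotate,
                      random_brightness_contrast, gaussian_noise):
        if rng.random() < p:
            img = transform(img, rng)
    return img


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    card = np.full((200, 320, 3), 230, dtype=np.uint8)  # blank stand-in for a card scan
    sample = augment(card, rng, p=1.0)
    print(sample.shape, sample.dtype)
```

Because each transform fires independently with probability p, a given training image may receive any subset of the four degradations, which is one simple way to expose a model to dark, bright, noisy, and slightly misaligned inputs.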
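To show what "encoded in the JSON Lines text format" can look like, the sketch below stores one annotated CNIC sample per line and flattens its fields into a Donut-style target sequence of <s_key>value</s_key> tokens. It assumes the metadata.jsonl layout used by the public Donut repository, where each line holds a file_name and a ground_truth string wrapping a gt_parse object; the abstract does not give the thesis's actual schema, so the field names, placeholder values, and helper functions (make_record, json2token, write_jsonl, read_jsonl) are hypothetical and only loosely modelled on that repository.

```python
import json


def make_record(file_name, fields):
    """Build one JSON Lines record; `ground_truth` is itself a JSON string."""
    return json.dumps({
        "file_name": file_name,
        "ground_truth": json.dumps({"gt_parse": fields}),
    })


def json2token(obj):
    """Flatten nested fields into a <s_key>value</s_key> token sequence.

    Simplified sketch: lists are joined with <sep/>, scalars become strings.
    """
    if isinstance(obj, dict):
        return "".join(f"<s_{k}>{json2token(v)}</s_{k}>" for k, v in obj.items())
    if isinstance(obj, list):
        return "<sep/>".join(json2token(v) for v in obj)
    return str(obj)


def write_jsonl(path, samples):
    """Write (file_name, fields) pairs, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for file_name, fields in samples:
            fh.write(make_record(file_name, fields) + "\n")


def read_jsonl(path):
    """Yield (file_name, gt_parse) pairs back from a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            rec = json.loads(line)
            yield rec["file_name"], json.loads(rec["ground_truth"])["gt_parse"]


if __name__ == "__main__":
    # Hypothetical field names with placeholder (non-real) values.
    fields = {"name": "SAMPLE NAME", "identity_number": "00000-0000000-0"}
    print(make_record("cnic_0001.jpg", fields))
    print(json2token(fields))
```

Keeping one self-contained JSON object per line lets the dataset be appended to and streamed line by line, while the token sequence gives the decoder an explicit structure to reproduce for each field.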