dc.description.abstract |
This thesis introduces a word-level Optical Character Recognition (OCR) model designed
specifically for digital Urdu text recognition. The proposed model uses a transformer-based
architecture with attention mechanisms and was trained on a dataset of
approximately 160,000 Urdu text images. The model achieved a character
error rate (CER) of 0.242 in recognizing Urdu characters. The key
strength of the model lies in its architecture, which incorporates the permuted autoregressive
sequence (PARSeq) model. This approach enables context-aware inference and iterative
refinement, using bidirectional context information to improve recognition accuracy.
The model also handles a diverse range of Urdu text styles, fonts, and variations,
which broadens its applicability in real-world scenarios. While the model demonstrates promising
results, it has some limitations. Blurred images, non-horizontal text orientations, and overlaid
patterns, lines, or other text can occasionally degrade recognition. Punctuation marks
trailing a word can also introduce noise into the recognition process. Addressing these challenges
will be a focal point of future research. The proposed model's performance and its ability
to adapt to various text styles make it a useful tool for applications that require accurate and
efficient Urdu text recognition. Planned work includes refining the model, exploring data
augmentation techniques, optimizing hyperparameters, and integrating context-aware language
models to further improve its overall performance and robustness. |
en_US |