Abstract:
Urdu is a complex language widely spoken in South Asia. Since it is also the national language of Pakistan, Optical Character Recognition (OCR) of Urdu numerals has many applications in our country, including reading Pakistani currency notes and postage stamps, and automatic dictation. Because public datasets of Urdu numerals are unavailable, little research has been done in this domain, whereas the many public datasets available for widely used languages such as English, Chinese, Arabic, and Japanese have allowed those languages to excel in the latest research areas. In this thesis, we provide a novel dataset of Urdu numerals, collected with real-world variability in mind, since every person has a unique style of writing. Since state-of-the-art deep learning techniques require large amounts of training data, we employ Deep Convolutional Generative Adversarial Networks (DCGAN), which have rarely been explored for this problem, to augment the dataset. The resulting augmented images are visualized using t-distributed Stochastic Neighbour Embedding (t-SNE), which further confirms their realism: real and generated images are almost impossible to tell apart in the 2D embedding space. Next, we build an Urdu numeral classifier to recognize the diverse Urdu digits. We employ three models: ResNet18, ResNet18 with a Squeeze-and-Excitation (SE) block, and ResNet18 with a Convolutional Block Attention Module (CBAM). These models are tested with and without the DCGAN-generated augmented data. The ResNet18 with CBAM model achieved 100% accuracy on our dataset. To further validate this accuracy, we tested the model on a test set drawn from four sources: another set of handwritten numerals collected along the same lines as the original dataset, Pakistani currency notes, numerals written on touch-screen gadgets, and numerals written in thin strokes using a pointer.
Our model achieved 95% accuracy on these diversified test sets. Furthermore, the classifier was tested on Persian, Arabic, and English numerals. It achieved an accuracy of 79.3% on Persian numerals and 54.5% on Arabic numerals, while the lowest accuracy, 18.4%, was obtained on English numerals. These results make our model reliable for deployment in practical applications. Using this model can help revive our national language and bring it up to speed with the research world.
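
For readers unfamiliar with the Squeeze-and-Excitation (SE) block mentioned above, the following is a minimal NumPy sketch of the general SE idea: global average pooling, a two-layer bottleneck, and sigmoid channel gates. It is an illustration only, not the implementation used in this thesis. The function name `se_block`, the reduction ratio `r`, the tensor sizes, and the random weights are all assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def se_block(features, w1, b1, w2, b2):
    """Illustrative Squeeze-and-Excitation block (hypothetical helper).

    features: feature map of shape (C, H, W)
    w1, b1:   bottleneck layer, shapes (C // r, C) and (C // r,)
    w2, b2:   expansion layer, shapes (C, C // r) and (C,)
    Returns the recalibrated feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = features.mean(axis=(1, 2))             # shape (C,)
    # Excitation: bottleneck with ReLU, then expansion with sigmoid gates.
    hidden = np.maximum(0.0, w1 @ squeezed + b1)      # shape (C // r,)
    gates = sigmoid(w2 @ hidden + b2)                 # shape (C,), values in (0, 1)
    # Scale: reweight every channel of the input by its gate.
    return features * gates[:, None, None], gates


# Assumed toy sizes: 8 channels, 4x4 spatial map, reduction ratio 4.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
features = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

recalibrated, gates = se_block(features, w1, b1, w2, b2)
print(recalibrated.shape, gates.shape)
```

In a ResNet18-style network, a block of this kind would typically be applied to the output of each residual block before the skip connection is added; that placement is the common convention and is assumed here rather than taken from the thesis.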