Abstract:
With increased digitization of documents over the past decades, the task
of word spotting has acquired much significance in the field of document
analysis and recognition. Deep learning has revolutionized many fields and
promises to make similar inroads in this field and improve performance for
various document analysis tasks. This research presents a system for
word spotting in Urdu text using effective feature extraction. The
system takes ligature images of Urdu text and extracts features that are
used to train two different learning models. For feature extraction, HOG
features and autoencoders have been used. The classifiers used in this study
were SVM and LSTM models. The system has been tested on two separate
data sets of printed and handwritten Urdu text. The system produced
outstanding results when trained on the printed Urdu database. For the
handwritten database, however, the intra-class variation is too large,
which results in poor accuracy. Hence, for the handwritten
text, the data was modified during preprocessing to create three different
data sets, which improved the system's performance significantly. Since
Convolutional Neural Networks (CNNs) are well suited to classification
with image inputs, the results obtained in this study are compared with
CNN classification on both data sets in the results section. The CNN
architecture used for the comparison is VGG16. The research shows that
the best results were obtained when the LSTM was trained on HOG features
of the ligature images.