dc.contributor.author |
Ali, Zeeshan |
|
dc.date.accessioned |
2023-08-02T11:04:40Z |
|
dc.date.available |
2023-08-02T11:04:40Z |
|
dc.date.issued |
2023 |
|
dc.identifier.other |
319593 |
|
dc.identifier.uri |
http://10.250.8.41:8080/xmlui/handle/123456789/35440 |
|
dc.description |
Supervisor: Dr. Safdar Abbas Khan |
en_US |
dc.description.abstract |
The advent of Deep Learning in Computer Vision has driven advances across a
diverse set of fields. Optical Character Recognition (OCR) plays
a vital role in the modern age of Artificial Intelligence. It is a challenging task, difficult to
implement, and computationally expensive. Sindhi is a literature-rich language spoken
by millions of people around the globe. It has an abundance of preserved grammatical
forms. OCR systems for English have seen significant development, whereas little
work has been done on Arabic script. Most Sindhi literature uses the extended
Perso-Arabic script. No benchmark datasets have been published to the best of our
knowledge. Consequently, no state-of-the-art Sindhi OCR models have been devised.
This thesis attempts to fill this research gap by making the following contributions. We
have extracted a set of 22,597 ligatures that are found in Sindhi literature. We present
a synthesized benchmark dataset for Sindhi printed text recognition at ligature level.
The dataset is font-diverse, comprising 256 unique fonts. Finally, we have set up a
baseline neural network for Sindhi Ligature Recognition in printed text. It has achieved
91.85% test accuracy on the benchmark dataset. Our baseline can be used to build the
complete pipeline of a font-invariant Sindhi OCR system. |
en_US |
dc.language.iso |
en_US |
en_US |
dc.publisher |
School of Electrical Engineering & Computer Sciences (SEECS), NUST |
en_US |
dc.title |
Sindhi Ligature Recognition in Printed Text: A Large Scale Font Diverse Sindhi Ligature Recognition System |
en_US |
dc.type |
Thesis |
en_US |