Abstract:
Deep learning has significantly improved handwritten text recognition, especially
for Latin scripts. Arabic scripts, including Urdu, are a family of complex scripts
that pose difficult challenges for deep learning architectures. Data availability
is a significant obstacle in developing Urdu handwriting recognition
systems. Since gathering data is a costly and challenging task, there is a
need to increase training data using novel approaches. One possible solution
is to build a model that can generate similar yet different samples from
the existing data samples. In this paper, we propose such models based on
Generative Adversarial Networks (GANs) that can synthesize realistic samples
similar to those in the original dataset. The generator is class-conditioned
to produce Urdu samples of varying characters that differ in
style. Visual and quantitative analyses indicate that the generated samples are
realistic and can be used to augment datasets. Synthesized samples
integrated with the existing training set are shown to improve the performance
of a handwriting recognition model.