Abstract:
Numerous text detection and recognition methods have been proposed in recent times
that have demonstrated remarkable performance on the standard benchmark datasets.
The existing datasets include scripts of numerous languages, such as English, Chinese,
French, Arabic, and German. The traffic navigation signboards in Pakistan and many
states of India are written in Urdu along with an English translation to guide
drivers. To the best of our knowledge, there is no public dataset available that includes
annotations of real traffic navigation signboards containing Urdu text regions with the
corresponding transcription. To this end, we present Deep Learning Laboratory’s Traf fic Signboards Dataset (DLL-TraffSiD) along with multi-lingual text detection, recog nition, and language identification annotations to develop multi-lingual text detection
and recognition methods for traffic signboards. In addition, we present a pipeline for
multi-lingual text detection and recognition to perform an efficient multi-lingual text
detection and recognition in outdoor road environment. The proposed pipeline con sists of three sub-architectures (i) text-detection, (ii) language identification, and (iii)
text-recognition, which achieved 89%, 98.7%, and 92.18% accuracy, respectively. Lastly,
a comprehensive comparison has been carried out to demonstrate the effectiveness of
our proposed dataset and multi-lingual text detection and recognition pipeline. The
proposed dataset, along with the implementation, is available at https://github.com/aatiibutt/TraffSign/.
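The three-stage design (detection, then language identification, then recognition) can be illustrated with a minimal sketch. This is not the authors' implementation: the stage callables, the box format `(x_min, y_min, x_max, y_max)`, the language labels `"urdu"` and `"english"`, and the names `TextRegion` and `run_pipeline` are all hypothetical placeholders standing in for the trained models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical box format (assumption): (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class TextRegion:
    box: Box
    language: str
    transcription: str


def run_pipeline(
    image: object,
    detect: Callable[[object], List[Box]],
    identify_language: Callable[[object, Box], str],
    recognizers: Dict[str, Callable[[object, Box], str]],
) -> List[TextRegion]:
    """Detect text regions, identify each region's language, then route the
    region to a language-specific recognizer. All stages are placeholders."""
    results: List[TextRegion] = []
    for box in detect(image):                  # stage (i): text detection
        lang = identify_language(image, box)   # stage (ii): language identification
        recognize = recognizers.get(lang)
        if recognize is None:                  # skip languages with no recognizer
            continue
        # stage (iii): language-specific text recognition
        results.append(TextRegion(box, lang, recognize(image, box)))
    return results


# Usage with dummy stubs in place of trained models (hypothetical values).
regions = run_pipeline(
    None,
    detect=lambda img: [(0, 0, 50, 20), (0, 30, 50, 50)],
    identify_language=lambda img, box: "urdu" if box[1] == 0 else "english",
    recognizers={
        "urdu": lambda img, box: "لاہور",
        "english": lambda img, box: "Lahore",
    },
)
for region in regions:
    print(region.language, region.transcription)
```

Routing each region through a language-identification step lets separate, script-specific recognizers handle the right-to-left Urdu text and the left-to-right English text, instead of forcing a single recognizer to cover both scripts.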