Abstract:
Automatic speech recognition (ASR) has recently gained considerable public attention, driven by growing interest from major technology companies. From
voice-activated digital assistants in our homes to voice-based
search engines, speech recognition is now in widespread use. Modern voice recognition services support many languages, but Urdu is usually
not one of them. In Pakistan, a large portion of the population does not speak or
understand English. Even some popular English voice recognition systems do not reliably recognize English spoken with a Pakistani accent. In this study
we developed a mixed English-Urdu speech recognition system for TPL Maps
Pakistan (a part of the TPL Corp) for their voice-enabled navigation service.
Kaldi, an open-source speech recognition toolkit, is used to develop the
speech recognition models. Two different ASR systems are developed and
compared in this study, one trained on general Urdu data and the other on mixed data (general
Urdu + Roman Urdu addresses). As part of this study, various GMM-HMM
and DNN-HMM models are developed and evaluated for both ASR systems.
In terms of word error rate (WER), the ASR system developed using mixed data is
found to outperform the system trained using only general Urdu data.
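
For readers unfamiliar with the metric, word error rate is conventionally defined as the word-level edit distance between a reference transcript and the recognizer output (substitutions + deletions + insertions), divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the scoring code used in this study; the function name `word_error_rate` and the example utterances are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + substitution_cost,   # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative (made-up) address utterance: one deleted word ("e") and
# one substituted word ("five" -> "nine"), so 2 errors over 5 reference words.
example_wer = word_error_rate(
    "gulshan e iqbal block five",
    "gulshan iqbal block nine",
)
```

A lower value indicates better recognition; because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.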