Abstract:
With great advances in computational power, Automatic Speech
Recognition (ASR) systems have seen a surge in interest and usage. Much research
has been done on ASR systems for languages such as Chinese, English, Spanish, Korean, and
even our national language, Urdu, resulting in better Human-Computer Interaction (HCI).
However, there is a dearth of speech recognition systems for regional and
local languages like Sindhi. Over 30 million speakers of the Sindhi language in Pakistan
are unable to communicate with a machine in Sindhi, which is a great hurdle to utilizing
the best of what technology has to offer. ASR
systems built specifically for local languages can help overcome these hurdles. In
this study, a speech recognition system for the Sindhi language has been built with the Kaldi
toolkit. Hidden Markov Models (HMM) have been used along with Gaussian Mixture
Models (GMM) and Deep Neural Networks (DNN). Experiments have been conducted
on GMM-HMM and DNN-HMM techniques with respect to noise, training set size, phonetic dictionary size, and DNN parameters. DNNs were tested and compared across parameters
such as the value of p in the p-norm non-linearity, the number of hidden layers, and the learning rate.
The DNN with 6 hidden layers and p=2 gave the best results. The accuracy of our speech recognition system is measured by Word Error Rate (WER). Experiments have been carried
out on various speech recognition models and recipes to improve WER.
These results could then be applied in areas such as navigation and home automation
to increase HCI and the use of technology by Sindhi speakers.
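
For readers unfamiliar with the metric, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring code used in this study; the function name word_error_rate and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance / number of reference words.

    Hypothetical helper for illustration; tokenizes on whitespace (an assumption).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference; lower values indicate better recognition.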