dc.contributor.author |
RASHED, WAQAS |
|
dc.date.accessioned |
2023-08-10T11:34:24Z |
|
dc.date.available |
2023-08-10T11:34:24Z |
|
dc.date.issued |
2019 |
|
dc.identifier.other |
00000118445 |
|
dc.identifier.uri |
http://10.250.8.41:8080/xmlui/handle/123456789/36275 |
|
dc.description |
Supervisor: Dr. Arslan Shaukat |
en_US |
dc.description.abstract |
A computer must be able to recognize what you said before it can understand what you
mean, a task that falls within Natural Language Processing (NLP). Few open-source speech
recognition systems are available with close to human-level performance, and this is
particularly true for the Urdu language. In this thesis we develop an Urdu-language
Automatic Speech Recognition (ASR) system based on Deep Learning. Mozilla’s DeepSpeech,
an open-source speech recognition engine implemented in TensorFlow that combines an optimized
RNN, CTC and a bidirectional LSTM, is used to train and build the acoustic model for Urdu
speech recognition. A language model is constructed and trained to represent the Urdu alphabet
and common vocabulary, and its binary file is created with the KenLM tools and fed to the system
for estimation and decoding. This method makes training a speech recognition system much
simpler: it requires neither many complex neural network layers nor knowledge about the
language. The system is trained on a virtual machine on Google’s cloud platform with GPU
support. The number of nodes in the core layers of the neural network is optimized based on the
Word Error Rate (WER) obtained on the data set, subject to GPU memory limits. Our study led
to an Urdu-language acoustic model trained on a data set that we collected ourselves. We
collected almost 390K speech instances of a total of 1008 sentences from 400 male and female
students. Desktop and mobile applications were developed to automatically record the spoken
audio files and collect them on the cloud along with their transcripts. The language model is
constructed from these commonly used Urdu sentences. The data set is separated into three
categories, training, validation and testing, with a split of 70%, 20% and 10% respectively.
The best WER we obtain from the system is less than 10%. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
College of Electrical & Mechanical Engineering (CEME), NUST |
en_US |
dc.subject |
Key Words: Deep Learning, Urdu ASR, DeepSpeech, LSTM, RNN, CTC |
en_US |
dc.title |
Automated Urdu Speech Recognition System using Deep Learning |
en_US |
dc.type |
Thesis |
en_US |