NUST Institutional Repository

Automated Urdu Speech Recognition System using Deep Learning

dc.contributor.author RASHED, WAQAS
dc.date.accessioned 2023-08-10T11:34:24Z
dc.date.available 2023-08-10T11:34:24Z
dc.date.issued 2019
dc.identifier.other 00000118445
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/36275
dc.description Supervisor: Dr. Arslan Shaukat en_US
dc.description.abstract A computer must be able to recognize what was said before it can understand what was meant, a task that falls within Natural Language Processing (NLP). Few open-source speech recognition systems with close to human-level performance are available, and this is particularly true for the Urdu language. In this thesis we develop an Urdu Automatic Speech Recognition (ASR) system based on deep learning. Mozilla's DeepSpeech, an open-source implementation built on TensorFlow that uses an optimized RNN, CTC and a bidirectional LSTM, is used to train and build the acoustic model for Urdu speech recognition. A language model is constructed and trained to represent the Urdu alphabet and common vocabulary, and its binary file is created with KenLM tools and fed to the system for estimation and decoding. This approach makes training a speech recognition system considerably simpler: it requires neither many complex neural network layers nor expert knowledge of the language. The system is trained on a virtual machine on Google's cloud platform with GPU support. The number of nodes in the core layers of the neural network is optimized on the data set based on Word Error Rate (WER), subject to GPU memory limits. Our study led to an Urdu acoustic model trained on a data set that we collected ourselves. We collected almost 390K speech instances of 1008 sentences in total from 400 male and female students. Desktop and mobile applications were developed to automatically record the spoken audio and collect the files, along with their transcripts, on the cloud. The language model is constructed from these commonly used Urdu sentences. The data set is split into training, validation and testing sets in proportions of 70%, 20% and 10% respectively. The best WER obtained from the system is less than 10%. en_US
dc.language.iso en en_US
dc.publisher College of Electrical & Mechanical Engineering (CEME), NUST en_US
dc.subject Key Words: Deep Learning, Urdu ASR, DeepSpeech, LSTM, RNN, CTC en_US
dc.title Automated Urdu Speech Recognition System using Deep Learning en_US
dc.type Thesis en_US
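
The abstract reports its results as Word Error Rate (WER). As a minimal sketch, assuming the conventional definition (word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words), WER could be computed as below. This is not taken from the thesis: the function name `word_error_rate` and the sample strings are hypothetical, and the thesis may have relied on the scoring built into its toolkit instead.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level Levenshtein distance / reference length.

    Hypothetical helper for explanation only; not the thesis's own scoring code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder transcripts; one substituted word out of four.
    print(word_error_rate("a b c d", "a x c d"))
```

Under this definition, a WER below 10% means fewer than one word-level error (substitution, deletion or insertion) per ten reference words on average.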


This item appears in the following Collection(s)

  • MS [441]
