Automated Urdu Speech Recognition System using Deep Learning

RASHED, WAQAS

dc.contributor.author	RASHED, WAQAS
dc.date.accessioned	2023-08-10T11:34:24Z
dc.date.available	2023-08-10T11:34:24Z
dc.date.issued	2019
dc.identifier.other	00000118445
dc.identifier.uri	http://10.250.8.41:8080/xmlui/handle/123456789/36275
dc.description	Supervisor: Dr. Arslan Shaukat	en_US
dc.description.abstract	A computer must be able to recognize what was said before it can interpret what was meant, a task that falls under Natural Language Processing (NLP). Few open-source speech recognition systems offer close to human-level performance, and this is particularly true for the Urdu language. In this thesis we develop an Urdu-language Automatic Speech Recognition (ASR) system based on Deep Learning. Mozilla's DeepSpeech, an open-source engine implemented in TensorFlow that uses an optimized RNN, CTC, and bidirectional LSTM, is used to train and build the acoustic model for Urdu speech recognition. A language model is constructed and trained to represent the Urdu alphabet and common vocabulary, and its binary file is created with the KenLM tools and supplied to the system for estimation and decoding. This approach makes training a speech recognition system much simpler: it requires neither many complex neural network layers nor detailed knowledge of the language. The system is trained on a virtual machine with GPU support on Google's cloud platform. The number of nodes in the core layers of the neural network is optimized based on the WER (Word Error Rate) obtained on the data set, subject to GPU memory limits. Our study led to an Urdu-language acoustic model trained on a data set that we collected ourselves. We collected almost 390K speech instances of a total of 1008 sentences from 400 male and female students. Desktop and mobile applications were developed to automatically record the spoken audio files and collect them in the cloud along with their transcripts. The language model is constructed from these commonly used Urdu sentences. The data set is divided into training, validation, and testing subsets with a split of 70%, 20%, and 10%, respectively. The best WER obtained from the system is less than 10%.	en_US
dc.language.iso	en	en_US
dc.publisher	College of Electrical & Mechanical Engineering (CEME), NUST	en_US
dc.subject	Key Words: Deep Learning, Urdu ASR, DeepSpeech, LSTM, RNN, CTC	en_US
dc.title	Automated Urdu Speech Recognition System using Deep Learning	en_US
dc.type	Thesis	en_US
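
Note on the WER metric: the abstract reports results as Word Error Rate. The following is a minimal sketch of how WER is conventionally computed (word-level edit distance divided by the number of reference words). It is not taken from the thesis or from DeepSpeech; the function name `word_error_rate` and the transliterated example sentences are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Illustrative helper, not the thesis's evaluation code. Words are obtained
    by whitespace splitting, which is an assumption about tokenization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up transliterated sentences: one substituted word out of four.
    print(word_error_rate("yeh aik kitab hai", "yeh aik qalam hai"))
```

Under this definition, a WER below 10% means fewer than one word-level error per ten reference words on average.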