ASRB: A Novel Automatic Speech Recognition for Spoken Burushaski Language

Wali, Hussain

dc.contributor.author	Wali, Hussain
dc.date.accessioned	2023-06-23T09:20:24Z
dc.date.available	2023-06-23T09:20:24Z
dc.date.issued	2023
dc.identifier.other	321147
dc.identifier.uri	http://10.250.8.41:8080/xmlui/handle/123456789/34190
dc.description	Supervisor: Dr Muhammad Khuram Shehzad	en_US
dc.description.abstract	This thesis aims to establish a foundation for research and development on the Burushaski language. We construct the first audio and textual dataset for the language that can be used in future research. Our goals include the development of a Latin-based script, a structured and clean audio dataset, a usable text corpus, and an initial Automatic Speech Recognition (ASR) system for Burushaski, built with the Kaldi toolkit on the developed datasets. Burushaski is a language isolate and is considered one of the most difficult languages to learn and model. In this work, we present the first free, open-source database of Burushaski audio and text collected from speakers. Additionally, we present a continuous Burushaski speech recognition model built with the Kaldi toolkit. From continuous speech samples in the audio dataset, we extracted Mel-frequency cepstral coefficient (MFCC) features for the ASR system. We report in detail the performance of the ASR system for the monophone model and for the triphone models (tri1, tri2, and tri3), using an N-gram language model. Performance is measured by the word error rate (WER). We trained the system on a limited dataset and observed that the tri3 triphone model performs significantly better than the monophone model. The tri3 model also performs much better than the tri2 model, which in turn performs better than the tri1 model. We also present a detailed framework that can be used to design and develop ASR systems for other zero-resource languages. This framework can be used for dataset generation in any language.	en_US
dc.language.iso	en	en_US
dc.publisher	School of Electrical Engineering and Computer Sciences (SEECS), NUST	en_US
dc.title	ASRB: A Novel Automatic Speech Recognition for Spoken Burushaski Language	en_US
dc.type	Thesis	en_US
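
The abstract names word error rate (WER) as the evaluation metric. WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, where the edit counts come from a minimum word-level edit distance. The Python sketch below illustrates that standard definition only; it is not the thesis's scoring code, and the function name `word_error_rate` and the placeholder transcripts are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch of the standard WER definition; the function
    name and whitespace tokenization are assumptions, not the thesis's method.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens, not real Burushaski data:
# one substitution (b -> x) and one deletion (d) over four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.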