NUST Institutional Repository

Speech Corpus Generation For Low Resource Language (Pashto)

dc.contributor.author Shoaib, Muhammad
dc.contributor.author Supervised by Dr. Shibli Nisar.
dc.date.accessioned 2023-01-20T04:35:55Z
dc.date.available 2023-01-20T04:35:55Z
dc.date.issued 2022-12
dc.identifier.other TCS-537
dc.identifier.other MSCSE / MSSE-27
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/32286
dc.description.abstract Meta AI’s unsupervised speech recognition framework (the wav2vec family of models) is the latest development in several years of work on speech recognition models, datasets, and training techniques. The wav2vec model has changed how traditional ASR works: only a few hours of spoken data are now required to obtain transcribed speech. Despite this, over 6,000 languages cannot exploit the opportunity because they lack the required speech corpus. The corpus should contain 4–5 hours of speech data on average, which is a challenge, especially for a low-resource language. The current approach to meeting this challenge is to record speech manually and then transcribe it, which is resource-intensive and costly. Meanwhile, a wealth of speech data is available on the internet. To capitalize on it, we replace manual recording with an automated speech-utilization process. In this thesis, we propose a model that automatically fetches audio data from free video/audio-sharing websites and segments it into audio frames of the desired length. The proposed model is generic and can be implemented for any low-resource language. Furthermore, using the proposed pipeline, we generated speech data for the Pashto language. en_US
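The segmentation step described in the abstract (splitting fetched audio into fixed-length frames) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function name `segment_wav`, the 5-second segment length, and the demo file are all assumptions, and a real pipeline would fetch audio from the web first.

```python
import contextlib
import math
import struct
import wave

def segment_wav(path, segment_seconds=5.0):
    """Split a WAV file into fixed-length PCM byte segments.

    The last segment may be shorter than segment_seconds.
    (Hypothetical helper for illustration; not from the thesis.)
    """
    with contextlib.closing(wave.open(path, "rb")) as wf:
        rate = wf.getframerate()
        frames_per_segment = int(rate * segment_seconds)
        segments = []
        while True:
            frames = wf.readframes(frames_per_segment)
            if not frames:
                break
            segments.append(frames)
        return segments, rate

# Build a 12-second mono 16 kHz sine tone on disk for demonstration,
# standing in for audio fetched from a video/audio-sharing site.
rate = 16000
tone = b"".join(
    struct.pack("<h", int(10000 * math.sin(2 * math.pi * 440 * t / rate)))
    for t in range(12 * rate)
)
with contextlib.closing(wave.open("demo.wav", "wb")) as wf:
    wf.setnchannels(1)       # mono
    wf.setsampwidth(2)       # 16-bit PCM
    wf.setframerate(rate)
    wf.writeframes(tone)

segments, rate = segment_wav("demo.wav", segment_seconds=5.0)
print(len(segments))  # 12 s at 5 s per segment -> 3 segments (last is 2 s)
```

Each segment holds raw 16-bit PCM bytes, so a 5-second chunk at 16 kHz is 160,000 bytes; segments can then be written back out as individual WAV files for the corpus.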
dc.language.iso en en_US
dc.publisher MCS en_US
dc.title Speech Corpus Generation For Low Resource Language (Pashto) en_US
dc.type Thesis en_US

