Abstract:
Owing to advances in technology, speech recognition has become one of the most important aspects of human-computer interaction. In recent years, a tremendous amount of research has been conducted in speech signal processing, and Automatic Speech Recognition (ASR) in particular has attracted growing interest. ASR began with basic systems that could recognize only a handful of sounds. It has since developed into complex systems that can not only understand human speech but also respond to it with ease. Major languages benefit from a wealth of ASR research, but low-resource languages remain underrepresented in this area of study. This research work focuses on ASR for Pashto, a low-resource language. Keeping pace with current advances in ASR technology requires building data repositories for low-resource languages such as Pashto, so that the newest ASR approaches can be evaluated on them. A successful Pashto ASR system would allow native speakers to interact with computers through voice commands in their native language and take full advantage of the digitization boom. Research in this field will benefit academics by expanding existing corpora and producing state-of-the-art ASR models for languages with limited resources. It will also help identify significant ASR challenges in practical settings and highlight the present shortcomings of traditional ASR systems. To this end, this work implements ASR for Pashto using wav2vec2, a recent self-supervised speech representation learning framework from Facebook that reflects the latest trends in ASR. After developing the dataset, we trained and fine-tuned our model on it. The resulting model achieves almost 66% accuracy with a word error rate (WER) of 37%, a promising result for a low-resource ASR system.