Abstract:
Cyberbullying on social media platforms is now at its peak, and it poses a vital challenge for researchers. Consequently, a large body of research addresses this issue in a variety of languages around the globe. Social media platforms are heavily used by people to express their views in their native languages. Besides positive views, people often use abusive or offensive language to express their anger or frustration. Resource-rich languages have offensive language detection systems that automatically monitor and block offensive content; however, such systems are very rare for low-resource languages, because datasets for local languages are not available. This work proposes a model that automatically detects offensive language for a very low-resource language, namely Pashto. A Roman Pashto dataset is created by collecting 60 thousand comments from different social media platforms and labeling them manually. The proposed model is trained and tested using three different feature extraction approaches: bag-of-words (BoW), term frequency-inverse document frequency (TF-IDF), and sequence integer encoding. Four traditional classifiers and a deep sequence model are trained on this task. Experimental results show that the random forest classifier performs best, achieving a score of 94.07. The corpus created in this work is made available to researchers working in this domain.
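As a rough illustration of the three feature extraction approaches named above, the following minimal sketch computes each representation in plain Python. It is not the authors' implementation. The toy comments, the function names (`bag_of_words`, `tf_idf`, `integer_sequence`), the padding length, and the unsmoothed TF-IDF formula are all assumptions made for illustration; libraries commonly use smoothed variants.

```python
import math
from collections import Counter

# Toy placeholder comments (hypothetical; NOT taken from the paper's corpus).
comments = ["you are kind", "you are rude rude", "kind words"]
tokenized = [c.split() for c in comments]

# Sorted vocabulary so that feature columns have a deterministic order.
vocab = sorted({tok for doc in tokenized for tok in doc})

def bag_of_words(doc):
    """Return raw term counts for one document, one entry per vocabulary word."""
    counts = Counter(doc)
    return [counts[t] for t in vocab]

def tf_idf(doc):
    """Return unsmoothed TF-IDF weights: (count / doc length) * log(N / df)."""
    n_docs = len(tokenized)
    counts = Counter(doc)
    weights = []
    for t in vocab:
        tf = counts[t] / len(doc)
        df = sum(1 for d in tokenized if t in d)  # documents containing t
        weights.append(tf * math.log(n_docs / df))
    return weights

# Index 0 is reserved for padding, so word ids start at 1.
word_index = {t: i + 1 for i, t in enumerate(vocab)}

def integer_sequence(doc, max_len=4):
    """Map tokens to integer ids, then truncate or zero-pad to max_len."""
    ids = [word_index[t] for t in doc][:max_len]
    return ids + [0] * (max_len - len(ids))

bow = [bag_of_words(d) for d in tokenized]
tfidf = [tf_idf(d) for d in tokenized]
seqs = [integer_sequence(d) for d in tokenized]
```

BoW and TF-IDF yield fixed-length vectors that discard word order, which suits traditional classifiers such as random forest. Sequence integer encoding preserves word order, which is what a deep sequence model consumes.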