Abstract:
Cyberbullying has emerged as a serious threat to social norms and ethics: users express annoyance or frustration through offensive and abusive language, making its detection and monitoring on social media platforms one of the most critical challenges for researchers. Considerable progress has been made by the linguistic community, but mostly for resource-rich languages; very little effort has been devoted to resource-poor languages for developing automatic systems that monitor and detect abhorrent remarks. The main reason is the non-availability of datasets for native/local languages. The objective of this research work is to detect abhorrent and abusive remarks automatically in a resource-poor language, namely Punjabi, through the Zero-Shot Learning technique, wherein the model classifies samples from categories it has never seen during training. Zero-shot approaches relate seen and unseen categories through auxiliary information that encodes observable distinguishing properties of objects. Dataset creation was the foremost task and was done manually because no such dataset is available online. We collected two Punjabi datasets from different social media platforms: one of 0.1 million (100,000) comments/feedback and one of 1,000 comments/feedback. The 1,000 comments were manually labeled as 'Offensive' or 'Non-Offensive' to evaluate the accuracy of the predictions of our proposed feature-extraction-based model. The model classifies a document by computing the Euclidean distance between the document and the centroid of the offensive words. Against the manually labeled dataset, the zero-shot learning model achieved an accuracy of 84.6%. Moreover, the 0.1 million-comment dataset was labeled automatically by the proposed model. The corpus created in this work is made available to researchers working in this domain.
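The centroid-based classification step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embedding dimensionality, the use of a mean of word vectors as the document representation, the distance threshold, and all names are assumptions for demonstration.

```python
import numpy as np

def classify_document(doc_vectors, offensive_vectors, threshold):
    """Label a document 'Offensive' or 'Non-Offensive' by the Euclidean
    distance between its embedding and the centroid of offensive-word
    embeddings. The threshold value is a hypothetical parameter."""
    # Centroid of the offensive-word embeddings
    centroid = np.mean(offensive_vectors, axis=0)
    # Represent the document as the mean of its word vectors (an assumption)
    doc_embedding = np.mean(doc_vectors, axis=0)
    # Euclidean distance from the document to the offensive centroid
    distance = np.linalg.norm(doc_embedding - centroid)
    return "Offensive" if distance < threshold else "Non-Offensive"

# Toy example with 3-dimensional vectors standing in for real embeddings
offensive = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
near_doc = np.array([[0.95, 0.05, 0.0]])   # close to the offensive centroid
far_doc = np.array([[0.0, 0.0, 1.0]])      # far from the offensive centroid
print(classify_document(near_doc, offensive, threshold=0.5))  # Offensive
print(classify_document(far_doc, offensive, threshold=0.5))   # Non-Offensive
```

Because the labels of unseen documents are inferred purely from their geometric proximity to the offensive-word centroid, no offensive/non-offensive training examples are needed, which is what makes the approach zero-shot.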