dc.description.abstract |
The proliferation of political extremism on social media harms not only the individuals who
are targeted but also society at large. It also damages the platforms that host such
content. Even though notable research work has
been done on sentiment analysis and classification in both academia and industry, building an
effective and robust tool to detect and classify political extremism across social media
platforms remains a challenge. Previous research has largely focused on detecting general hate
speech on social media via binary classification. However, given the diverse nature of extremism,
binary classification does not suffice. In this research, we studied existing solutions,
identified their limitations, and developed a multi-class, multilingual model that detects
and distinguishes between neutral, moderate, and strong political extremist
content. For training our model, we collected a data set of around 10,000 tweets from prominent
political parties and politicians in Pakistan. We used the latest pre-trained BERT model and
machine learning classifiers such as Support Vector Machine, Random Forest, Naïve Bayes, and
Stochastic Gradient Descent to analyze and detect different classes of extremism. The highest
accuracy we achieved was 89% in binary classification and 86% in multi-class classification,
using Term Frequency-Inverse Document Frequency (TF-IDF) features with the SVM classifier. We
hope that the results of this thesis will provide researchers and organizations with a viable
solution to detect and classify extreme political sentiments. |
en_US |