NUST Institutional Repository

Enhancing Transparency of Social Media Offensive Communication Detection Techniques by Integrating Explainable Artificial Intelligence (XAI)

Show simple item record

dc.contributor.author Sana, Ayesha
dc.date.accessioned 2024-11-14T11:07:58Z
dc.date.available 2024-11-14T11:07:58Z
dc.date.issued 2024
dc.identifier.other 402433
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/47954
dc.description Supervisor: Dr. Momina Moetesum; Co-Supervisor: Dr. Ahsan Saadat en_US
dc.description.abstract The rapid expansion of social media has intensified the spread of hate speech and online harassment, posing serious threats to vulnerable groups based on age, gender, religion, and ethnicity. While Artificial Intelligence (AI) offers powerful tools for detecting and mitigating toxic content, existing AI models often suffer from two critical limitations: biased predictions that disproportionately impact specific communities, and a lack of interpretability that hinders trust in the results. Most hate speech detection models overlook the need for transparent explanations behind their classifications, leaving users and affected communities uncertain about how decisions are made. Addressing these gaps is essential for developing fair and trustworthy AI solutions that protect targeted groups from online abuse, which can escalate into real-world violence. This research tackles the problem of hate speech by developing a method that integrates Explainable Artificial Intelligence (XAI) to provide clear and understandable explanations and to reduce bias that targets specific groups based on age, gender, ethnicity, and religion. To develop a hate speech detection system that incorporates XAI, we began by applying five machine learning models, including Multinomial Naïve Bayes (MNB), Logistic Regression (LR), Long Short-Term Memory (LSTM), and Bidirectional Encoder Representations from Transformers (BERT), to the HateXplain benchmark dataset for text classification. The results revealed that BERT outperformed the other models, achieving an accuracy of 98.5%. To interpret the model’s predictions, we used two explainability methods, LIME and SHAP, which provided insights into the features influencing the classification decisions. To detect hateful content targeted at specific groups, we developed a multiclass word list based on attributes such as age, religion, gender, and ethnicity. After comparing the model’s output with this word list, we used the keywords to redefine and update the data and then retrained the BERT model. Finally, we provided explanations for hate speech targeted at specific groups. The explainability methods were evaluated on comprehensiveness, sufficiency, and Intersection over Union (IoU), which measure how well the model-generated explanations align with human-annotated rationales; the results showed that while LIME and SHAP performed comparably in providing explanations, SHAP proved more computationally expensive and time-consuming. Nevertheless, this work opens promising opportunities for further research into enhancing explainability methods. Future work could explore additional XAI approaches and apply these methods to diverse datasets to further improve their effectiveness. en_US
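
The abstract evaluates explanations with comprehensiveness, sufficiency, and Intersection over Union (IoU). Below is a minimal sketch, in Python, of how these ERASER-style metrics are commonly computed; it is an illustration under stated assumptions, not the thesis's actual implementation. The predict_proba callable (wrapping the fine-tuned classifier) and the token-index sets for model and human rationales are hypothetical names introduced here.

from typing import Callable, Sequence, Set

def comprehensiveness(predict_proba: Callable[[str], Sequence[float]],
                      tokens: Sequence[str], rationale_idx: Set[int], label: int) -> float:
    # p(label | full text) minus p(label | text with rationale tokens removed).
    # A large drop indicates the rationale tokens were genuinely needed for the prediction.
    full = predict_proba(" ".join(tokens))[label]
    remaining = [t for i, t in enumerate(tokens) if i not in rationale_idx]
    return full - predict_proba(" ".join(remaining))[label]

def sufficiency(predict_proba: Callable[[str], Sequence[float]],
                tokens: Sequence[str], rationale_idx: Set[int], label: int) -> float:
    # p(label | full text) minus p(label | rationale tokens only).
    # A small difference indicates the rationale alone reproduces the prediction.
    full = predict_proba(" ".join(tokens))[label]
    rationale_only = [t for i, t in enumerate(tokens) if i in rationale_idx]
    return full - predict_proba(" ".join(rationale_only))[label]

def token_iou(model_idx: Set[int], human_idx: Set[int]) -> float:
    # Token-level overlap between model-attributed tokens and human-annotated rationale tokens.
    union = model_idx | human_idx
    return len(model_idx & human_idx) / len(union) if union else 0.0

In practice, model_idx would be taken from the top-weighted tokens returned by LIME or SHAP for the predicted class, and predict_proba would wrap the retrained BERT classifier described in the abstract.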
dc.language.iso en en_US
dc.publisher School of Electrical Engineering & Computer Science (SEECS), NUST en_US
dc.title Enhancing Transparency of Social Media Offensive Communication Detection Techniques by Integrating Explainable Artificial Intelligence (XAI) en_US
dc.type Thesis en_US


This item appears in the following Collection(s)

  • MS [375]
