Abstract:
In a world of constant connectivity through electronic devices, Islamophobia has become a critical issue, especially on social networking sites. In contrast to conventional hate speech, which is most often expressed in words, online Islamophobia can be expressed through images, text, video, and audio, and is therefore much harder to trace. Conventional machine learning techniques are poorly suited to this classification task, since they fail to capture the context needed to detect hate speech; researchers have therefore shifted to deep learning techniques. However, previously developed deep learning approaches rely on unimodal architectures that classify either textual or visual data alone, ignoring the overall context conveyed jointly by both modalities. This research aims to fill the existing gap in the identification and categorization of Islamophobic memes. We propose a multimodal technique that integrates deep learning models to classify Islamophobic content from both textual and visual information, using BERT for text classification and ResNet-50 for image classification. The evaluation results demonstrate that the proposed multimodal approach accurately identifies Islamophobic content, with an overall accuracy of 95% and a cross-entropy loss of 0.15.
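
The sketch below illustrates the kind of architecture the abstract describes, assuming a simple late-fusion design in which BERT text features and ResNet-50 image features are concatenated and passed to a shared classification head. The checkpoint names, layer sizes, and fusion strategy are illustrative assumptions, not the paper's reported configuration.

import torch
import torch.nn as nn
from torchvision.models import resnet50
from transformers import BertModel

class MultimodalMemeClassifier(nn.Module):
    """Late-fusion BERT + ResNet-50 classifier (illustrative sketch only)."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Text branch: BERT's pooled [CLS] representation (768-dim).
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        # Image branch: ResNet-50 with its final FC layer replaced by an
        # identity, exposing the 2048-dim feature vector.
        self.image_encoder = resnet50(weights="IMAGENET1K_V1")
        self.image_encoder.fc = nn.Identity()
        # Fusion head over the concatenated text + image features; the
        # hidden size and dropout rate are assumptions for illustration.
        self.classifier = nn.Sequential(
            nn.Linear(768 + 2048, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes),
        )

    def forward(self, input_ids, attention_mask, pixel_values):
        text_feat = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).pooler_output                                # (batch, 768)
        image_feat = self.image_encoder(pixel_values)  # (batch, 2048)
        fused = torch.cat([text_feat, image_feat], dim=1)
        return self.classifier(fused)  # class logits

Training such a model would minimize nn.CrossEntropyLoss() over the output logits, which is consistent with the cross-entropy loss the abstract reports.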