Abstract:
Social media has lately become a popular subject in fields such as Data Science, Machine
Learning, Data Mining, Big Data, and Natural Language Processing, because platforms such as
Facebook, Twitter, Instagram, and Flickr hold an abundance of user data. Insight into this data is
valuable for tailored campaigns, such as advertisements or political campaigns. Gender prediction
is also significant in other domains where identifying a user's gender is important, for example
emergency management, or campaigns in which distinguishing between male and female is critical,
such as campaigns that address gender-specific issues or raise awareness of gender-based violence.
Taking the significance of gender prediction into consideration, this research assesses and
evaluates an approach to automatically detect the gender of users from their tweets. Such a system
can help target a specific gender group with advertisements or social media campaigns, which are an
effective way to educate a wide range of people from different backgrounds and geographical
locations. A Convolutional Neural Network (CNN) has been used for this classification. Although
CNNs are mostly used for image classification, they are also effective for text classification;
here, the CNN classifies each user's gender from the text of their tweets.
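To make the architecture concrete, the following is a minimal, dependency-free Python sketch of a text CNN forward pass: an embedding lookup, a 1-D convolution with ReLU, global max pooling, and a sigmoid output. It is not the thesis's implementation. The layer sizes, the random (untrained) weights, the token ids, and the assumption of a single binary (male/female) sigmoid output are all hypothetical choices made for illustration.

```python
import math
import random


def embed(token_ids, table):
    """Embedding layer: look up one dense vector per token id."""
    return [table[t] for t in token_ids]


def conv1d_relu(seq, filters, bias):
    """Slide each filter over the embedded sequence and apply ReLU.

    seq:     L embedding vectors of dimension d
    filters: F filters, each made of k vectors of dimension d
    Returns F feature maps, each of length L - k + 1.
    """
    maps = []
    for f, b in zip(filters, bias):
        k = len(f)
        fmap = []
        for i in range(len(seq) - k + 1):
            s = b
            for j in range(k):
                s += sum(w * x for w, x in zip(f[j], seq[i + j]))
            fmap.append(max(0.0, s))
        maps.append(fmap)
    return maps


def predict_proba(token_ids, table, filters, bias, out_w, out_b):
    """Forward pass; assumes len(token_ids) >= filter width k."""
    maps = conv1d_relu(embed(token_ids, table), filters, bias)
    pooled = [max(m) for m in maps]  # global max pooling: one value per filter
    z = out_b + sum(w * p for w, p in zip(out_w, pooled))
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid: P(class 1), e.g. "female"


# Hypothetical sizes and randomly initialised (untrained) weights.
random.seed(0)
VOCAB, DIM, K, F = 50, 8, 3, 4
table = [[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(VOCAB)]
filters = [[[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(K)]
           for _ in range(F)]
bias = [0.0] * F
out_w = [random.gauss(0, 0.1) for _ in range(F)]
out_b = 0.0

tweet = [3, 17, 42, 8, 5]  # hypothetical token ids of one preprocessed tweet
p = predict_proba(tweet, table, filters, bias, out_w, out_b)
print(f"P(class 1) = {p:.3f}")
```

In a trained model these weights would be learned from labelled tweets; the sketch only shows how a tokenised tweet flows through the layers to a single probability.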
The CrowdFlower dataset has been used in this thesis. After preprocessing, the cleaned user tweets
are fed to the CNN through its embedding layer. Neural network weights are commonly updated with
plain (stochastic) gradient descent, using gradients computed by backpropagation; here, the
Adaptive Moment Estimation (Adam) optimizer has been used for weight optimization instead.
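The role of the optimizer can be illustrated with a minimal Python sketch of the Adam update rule. Here the gradient is supplied by a function and the objective is a toy quadratic, (x - 3)^2 + (y + 1)^2, chosen only for illustration; in the CNN the gradients would instead come from backpropagation through the network. The values beta1 = 0.9, beta2 = 0.999, and eps = 1e-8 are the commonly used defaults; the learning rate of 0.1 and the step count are assumptions for this toy problem, not settings reported in the thesis.

```python
import math


def adam_minimize(grad, theta0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimise a function with Adam, given a function returning its gradient."""
    theta = list(theta0)
    m = [0.0] * len(theta)  # first-moment (mean) estimate of the gradient
    v = [0.0] * len(theta)  # second-moment (uncentred variance) estimate
    for t in range(1, steps + 1):
        g = grad(theta)
        for i in range(len(theta)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # correct the bias from zero init
            v_hat = v[i] / (1 - beta2 ** t)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def toy_grad(p):
    """Gradient of the toy objective (x - 3)^2 + (y + 1)^2."""
    x, y = p
    return [2 * (x - 3), 2 * (y + 1)]


x, y = adam_minimize(toy_grad, [0.0, 0.0])
print(f"x = {x:.4f}, y = {y:.4f}")  # should approach the minimum at (3, -1)
```

Each parameter gets its own step size, scaled by its running gradient statistics, which is what distinguishes Adam from plain gradient descent. In practice, frameworks ship Adam built in (for example `tf.keras.optimizers.Adam` or `torch.optim.Adam`), so a hand-written loop like this is for illustration only.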
The proposed system achieves a mean accuracy of 97%. This high accuracy suggests that the system
could form the basis of an automated gender prediction system serving numerous purposes, including
tailored advertisements.
In future work, different combinations of optimizers and loss functions could be explored to further
improve the performance of the proposed system.