Depression Detection using Machine Learning Techniques

Nadeem, Aleena; Supervised by Dr. Hammad Afzal.

dc.contributor.author	Nadeem, Aleena
dc.contributor.author	Supervised by Dr. Hammad Afzal.
dc.date.accessioned	2021-11-20T08:40:47Z
dc.date.available	2021-11-20T08:40:47Z
dc.date.issued	2021-10
dc.identifier.other	MSSE/MSCSE-26
dc.identifier.other	TCS-494
dc.identifier.uri	http://10.250.8.41:8080/xmlui/handle/123456789/27578
dc.description.abstract	Social media has proven to be a widely used platform for people to express their emotions and feelings. A user's social media activity can therefore reveal a great deal about their emotional state and mental health. Depression is a mental health problem that has become prevalent worldwide, and it was predicted to be the second leading cause of the global burden of disease by 2020. Given the high prevalence of the disease, this study applies Natural Language Processing techniques to the task of depression detection. For this task we used a dataset of tweets. We first manually annotated the tweets to capture both implicit and explicit content about depression. The data was labelled in two ways: with binary labels and with ternary labels. Binary labels divide the data into depressed and non-depressed classes, whereas ternary labels divide the tweets into depressed, non-depressed and a third category in which a person refers to someone else's depression or gives generic information about the topic. Twitter-specific data preprocessing steps were carried out. Various feature extraction techniques, such as TF-IDF, N-grams and word embeddings including Word2Vec, GloVe and FastText, were applied along with machine learning and deep learning models. The evaluation metrics used are accuracy and F1-score; F1-score is more pertinent for multi-labelled data. Among the machine learning classifiers, Support Vector Machines (SVM) performed well on the binary-labelled data, with an accuracy of 96.1 and an F1-score of 96. For the ternary labels, better results were achieved by the Logistic Regression (LR) model, with an F1-score of 77.3. Among the deep learning models, FastText embeddings combined with a Bi-LSTM and CNN ensemble performed well with both binary and ternary annotations, giving F1-scores of 96.6 and 80.1 respectively.
Finally, a deep learning-based hybrid framework with a self-attention mechanism is proposed in this study. The framework uses GloVe pretrained word embeddings for feature extraction and an LSTM + CNN + GRU + self-attention architecture for the depression detection task. After the embedding layer, an LSTM and a 1D-CNN are used to capture the sequence and semantics of the tweets. A GRU with a self-attention mechanism is then used to focus on contextual and implicit information in the tweets. The proposed framework improved the accuracy and F1-score to 97.4 on the binary-labelled data and to 82.9 on the ternary-labelled data. Furthermore, cross-domain validation was carried out for the framework using the ‘News Headline Dataset’ for sarcasm detection, where it also outperformed the comparative results. Hence, a novel framework for binary and multi-label classification is proposed in this study.	en_US
dc.language.iso	en	en_US
dc.publisher	MCS	en_US
dc.title	Depression Detection using Machine Learning Techniques	en_US
dc.type	Thesis	en_US
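
The abstract reports accuracy and F1-score for the binary and ternary labellings. The sketch below shows one way the two metrics could be computed for a three-class labelling. It assumes macro-averaged F1, which the record does not specify, and the label names (`dep`, `non`, `ref`) and the toy predictions are hypothetical, not taken from the thesis data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical ternary labels: depressed, non-depressed, referring to others
y_true = ["dep", "dep", "non", "non", "non", "non", "ref", "ref"]
y_pred = ["dep", "non", "non", "non", "non", "non", "ref", "non"]

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

In this toy example the majority class dominates accuracy, while macro F1 weights the two smaller classes equally, which illustrates why F1-score can be the more informative metric when classes are imbalanced.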