dc.description.abstract |
In this research, we present a novel approach to enhance Text Classification task using improved
word representations that have sentence embeddings. Word2vec is the vector based
word representation, can be manipulated to work in conjunction with vector based sentence
representations to improve the accuracy of classifier in text classification task. we will
demonstrate how distributed representations of words could be used for sentence classification
by jointly learning both representations for detecting semantic relatedness between
sentences. Distributed representation of words exhibit semantic association between words
however very less effective amount of work done on sentence classification. Findings also
suggests that CBOW model works better on semantic tasks as opposed to the skipgram.
The primary objective of this thesis is to surface out semantic relatedness between questions’
pairs. In this thesis a deep neural network Multi-Layer Perceptron is used for sentence
encoding and multiple distance and similarity measures are used for similarity estimation.
Different experiments are performed to find out sentence vectors. This is then used to classify
questions as similar or dissimilar, which asks same intent but apparently looks different
because of different wordings. It is a challenging task because in Online User Forums it is
crucial to gaurantee that each question can occur only once so that it can be answered once
and improves the quality of knowledge foundations. This saves the time of the experts because
they dont’t have to answer same questions again and again. To date word2vec achieve
state of the art performance in natural language processing tasks however it is still inadequate
to consider sentence classification in it. To incorporate such information we primarily
focus on to collaborate both semantic information of both words and sentences to produce
high quality vectors. The resulting word vectors reportedly shows effective improvements
over baseline techniques. We summarize our best published results on the famous publically
available benchmark dataset of text classification.
The experimental results and quantitative comparison with other modern techniques is provided
to demonstrate the importance of subject techniques |
en_US |