Abstract:
Text classification is the process of assigning an appropriate label to a text phrase or text document. Supervised learning is the most common
method used for classifying texts. Traditional text classification methods often need a substantial quantity of labelled training data. However,
it is not always possible to get a labelled text dataset for the purpose of
training classification algorithms. Data labelling often requires a significant
amount of time and cost. Insufficient or unlabelled data can therefore be a problem in
classification tasks. Unsupervised methods offer a way
to perform low-cost text categorization on unlabelled data. This
dissertation focuses on unsupervised text classification using word embeddings. A previous study reported results using generic
embedding models such as word2vec, GloVe, and Doc2vec. We use the Lbl2vec
approach to perform unsupervised text classification, in which a document is
classified and assigned a category by looking at the distance between each
label vector and the centroid of the document vector. This model is used
to classify unlabelled texts into different categories. After this classification
process, the performance of the classifiers is measured with evaluation metrics such as precision, recall, and F1 score. Our experiments on
benchmark text datasets show that the proposed method raises the
F1 score to 0.81. A comparative analysis presents the classification results against different supervised and unsupervised classification
algorithms. The benchmark text datasets 20 Newsgroups and AG News are used for comparison and evaluation.
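
As an illustration of the classification and evaluation steps described above, the following is a minimal sketch and not the Lbl2vec implementation itself. It assumes that a document vector is the centroid (mean) of its word vectors, that distance is measured by cosine similarity, and that the F1 score is macro-averaged. The two-dimensional vectors, the label names, and the function names are all hypothetical toy values chosen for readability.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either is a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(doc_word_vectors, label_vectors):
    """Assign the label whose vector is most similar to the document centroid."""
    doc_vec = centroid(doc_word_vectors)
    return max(label_vectors,
               key=lambda label: cosine_similarity(doc_vec, label_vectors[label]))


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical toy data: one vector per label, a few word vectors per document.
label_vectors = {"sports": [1.0, 0.1], "tech": [0.1, 1.0]}
documents = [
    [[0.9, 0.2], [0.8, 0.0]],
    [[0.2, 0.9], [0.0, 0.7]],
    [[0.7, 0.5]],
]
y_true = ["sports", "tech", "tech"]  # gold labels, used only for evaluation
y_pred = [classify(doc, label_vectors) for doc in documents]

print(y_pred)
print(round(macro_f1(y_true, y_pred), 3))
```

No labelled documents are needed to make the predictions; the gold labels enter only when the score is computed, which is what makes the approach applicable to unlabelled corpora.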