NUST Institutional Repository

Document Retrieval Using Unsupervised Text Classification With Word Embeddings

dc.contributor.author Siraj, Zahra
dc.date.accessioned 2022-08-07T13:31:46Z
dc.date.available 2022-08-07T13:31:46Z
dc.date.issued 2022
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/30048
dc.description CL-T-6622 en_US
dc.description.abstract Text classification is the process of assigning an appropriate label to a text phrase or document. Supervised learning is the most common approach to classifying text, and traditional methods often need a substantial quantity of labelled training data. However, a labelled text dataset is not always available for training classification algorithms, and labelling data often requires significant time and cost. Insufficient or unlabelled data can therefore be a problem in classification tasks. Unsupervised methods offer the potential for low-cost text categorization of unlabelled data. This dissertation focuses on unsupervised text classification using word embeddings. A previous study generated results using generic embeddings such as word2vec, GloVe, and Doc2vec. We use the Lbl2vec approach to perform unsupervised text classification, in which a document is assigned a category by looking at the distance between each label vector and the centroid of the document vector. This model is used to classify unlabelled texts into different categories. After classification, the performance of the classifiers is measured with evaluation metrics such as precision, recall, and F1 score. Our experiments on benchmark text datasets show that the proposed method raises the F1 score to 0.81. A comparative analysis shows the classification results relative to different supervised and unsupervised classification algorithms. The benchmark text datasets 20 Newsgroups and AG News were used for comparison and evaluation. en_US
dc.description.sponsorship Dr. Rabia Irfan en_US
dc.language.iso en en_US
dc.publisher SEECS-School of Electrical Engineering and Computer Science NUST Islamabad en_US
dc.title Document Retrieval Using Unsupervised Text Classification With Word Embeddings en_US
dc.type Thesis en_US
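
The classification step described in the abstract (comparing a document's centroid vector against each label vector) can be illustrated with a minimal sketch. This is not the thesis implementation or the Lbl2vec library API. It assumes cosine similarity as the distance measure, which the abstract does not specify, and it uses tiny hand-made 2-dimensional embeddings. All names (`centroid`, `cosine_similarity`, `classify`, `WORD_VECTORS`, `LABEL_VECTORS`) are hypothetical.

```python
import math


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(tokens, word_vectors, label_vectors):
    """Return the label whose vector is most similar to the document centroid.

    Tokens without an embedding are skipped; returns None if none are known.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    doc_vector = centroid(known)
    return max(
        label_vectors,
        key=lambda label: cosine_similarity(doc_vector, label_vectors[label]),
    )


# Toy embeddings invented for illustration only (assumption, not thesis data).
WORD_VECTORS = {
    "goal": [0.9, 0.1],
    "match": [0.8, 0.2],
    "stock": [0.1, 0.9],
    "market": [0.2, 0.8],
}
LABEL_VECTORS = {"sports": [1.0, 0.0], "business": [0.0, 1.0]}
```

Calling `classify(["goal", "match"], WORD_VECTORS, LABEL_VECTORS)` picks the label vector closest to the averaged word vectors. In practice the embeddings and label vectors would be learned from the corpus rather than written by hand.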


This item appears in the following Collection(s)

  • MS [375]
