NUST Institutional Repository

Text Classification Using NLP

Show simple item record

dc.contributor.author Fahid Bin Tariq
dc.date.accessioned 2024-04-18T11:39:52Z
dc.date.available 2024-04-18T11:39:52Z
dc.date.issued 2024
dc.identifier.other 330491
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/42994
dc.description Supervisor: Dr. Ahmad Salman en_US
dc.description.abstract This research fine-tunes the BERT model and adds long short-term memory (LSTM) layers. BERT, already well known for text classification, is combined with additional layers to further improve classification accuracy. Many task-specific BERT models exist, but each is trained for a single task. This work classifies text into four classes: chats, emails, news, and tweets. The method is straightforward: first, a dataset for each target class is collected and preprocessed with NLP libraries to strip redundant and irrelevant data. Data loaders are then prepared to feed the testing and validation data into a BERT-base model. Before that, the BERT tokenizer is applied, since BERT accepts only input in a specific format carrying special tokens ([CLS] and [SEP]). Following a recommended approach, BERT is fine-tuned for each target class in turn. The innovation is the introduction of LSTM layers merged with fully connected (FC) layers and, where needed, pooling layers. The trained output of BERT, the BERT embeddings, serves as input to the LSTM model. Although BERT alone performs well, these additional layers provide an edge on more complex datasets: the LSTM layers run bidirectionally to capture features in the text from both directions and classify it more effectively, and overall accuracy is enhanced. For binary classification on limited datasets, introducing LSTM layers changes accuracy only marginally, but for multi-class classification on complex data the improvement is noticeable. The achieved accuracy is 99% for chats, 98% for emails, 97% for news, and 85% for tweets (complex data with multi-label sentiment analysis). These accuracies exceed those of the standalone BERT model. en_US
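The architecture the abstract describes (a bidirectional LSTM plus pooling and a fully connected layer on top of BERT token embeddings) can be sketched roughly as below. This is a minimal illustration, not the thesis's actual code: the hidden size, pooling choice, and class count of four (chats, emails, news, tweets) are assumptions, and a random tensor stands in for the embeddings that a fine-tuned BERT encoder (e.g. one loaded via Hugging Face `transformers`) would produce.

```python
import torch
import torch.nn as nn

class BertBiLSTMClassifier(nn.Module):
    """Hypothetical BiLSTM + pooling + FC head over BERT embeddings."""

    def __init__(self, bert_hidden=768, lstm_hidden=256, num_classes=4):
        super().__init__()
        # Bidirectional LSTM reads the BERT token embeddings in both
        # directions, as described in the abstract.
        self.lstm = nn.LSTM(bert_hidden, lstm_hidden,
                            batch_first=True, bidirectional=True)
        # FC layer maps the pooled BiLSTM features to the class logits.
        self.fc = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, bert_embeddings):
        # bert_embeddings: (batch, seq_len, 768), the last hidden states
        # of a fine-tuned BERT-base model for a [CLS] ... [SEP] sequence.
        out, _ = self.lstm(bert_embeddings)
        # Mean-pool over the sequence dimension, then classify.
        pooled = out.mean(dim=1)
        return self.fc(pooled)

# Stand-in for BERT output: 2 sequences of 16 tokens, 768-dim each.
emb = torch.randn(2, 16, 768)
logits = BertBiLSTMClassifier()(emb)
print(logits.shape)  # one 4-class logit vector per sequence
```

In practice the embeddings would come from the fine-tuned BERT encoder rather than `torch.randn`, and the whole stack would be trained end to end with a cross-entropy loss over the four target classes.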
dc.language.iso en en_US
dc.publisher School of Electrical Engineering and Computer Science (SEECS), NUST en_US
dc.title Text Classification Using NLP en_US
dc.type Thesis en_US



This item appears in the following Collection(s)

  • MS [881]
