NUST Institutional Repository

Urdu Summarization using Pre-trained Language Models

dc.contributor.author Munaf, Raja Mubashir
dc.contributor.author Supervised by Dr. Hammad Afzal.
dc.date.accessioned 2022-12-07T06:25:48Z
dc.date.available 2022-12-07T06:25:48Z
dc.date.issued 2022-07
dc.identifier.other TCS-529
dc.identifier.other MSCSE / MSSE-25
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/31764
dc.description.abstract The ever-increasing influx of data over the internet poses the challenge of sifting through it to extract meaningful information. Over the last two decades, users have been overwhelmed with both textual and multimedia data owing to the popularity of social media and news platforms. To cope with the challenges of information overload, various research technologies have gained popularity. Natural Language Processing (NLP) has seen significant improvements in the efficiency and accuracy of textual data processing since the inception of language models built on deep-learning-based artificial neural networks. Automatic Summarization (under the umbrella of NLP) is the process of extracting only the meaningful information from a text, reducing its length while preserving its sense. Urdu, despite being the 10th most spoken language in the world, is still a low-resource language with little to no research in Automatic Summarization and NLP; most existing research is restricted to high-resource languages such as English. This work explores deep-learning-based pre-trained language models composed of self-attentive transformers for both extractive and abstractive summarization, capturing contextual information. Moreover, a summarization dataset of 76k records is created by collecting article-summary pairs from the news domain; to the best of our knowledge, it is the first and largest dataset available for Urdu summarization. Experimental results demonstrate competitive evaluation scores (ROUGE, BERTScore) for summarization models fine-tuned on the newly created dataset. Human evaluation is also carried out, identifying the shortcomings of automatic evaluation methods in the field of summarization. en_US
dc.language.iso en en_US
dc.publisher MCS en_US
dc.title Urdu Summarization using Pre-trained Language Models en_US
dc.type Thesis en_US
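
The abstract above reports ROUGE and BERTScore results for the fine-tuned summarization models. The thesis's actual evaluation pipeline is not part of this record; the short Python sketch below only illustrates what an n-gram-overlap ROUGE-N F1 score measures for a candidate/reference summary pair. The whitespace tokenization, the example Urdu sentences, and the choice of n are illustrative assumptions, not the evaluation setup used in the thesis.

# Minimal ROUGE-N F1 sketch for a candidate/reference Urdu summary pair.
# Whitespace tokenization and the example sentences are simplifying
# assumptions; this is not the thesis's actual evaluation pipeline.
from collections import Counter

def ngrams(tokens, n):
    """Return a multiset (Counter) of n-grams from a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_f1(candidate, reference, n=1):
    """F1 of clipped n-gram overlap between candidate and reference."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())
    if not cand or not ref or overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Hypothetical system output and gold summary (placeholder Urdu text).
    system_summary = "وزیراعظم نے نئی تعلیمی پالیسی کا اعلان کیا"
    gold_summary = "وزیراعظم نے آج نئی تعلیمی پالیسی کا اعلان کر دیا"
    for n in (1, 2):
        print(f"ROUGE-{n} F1: {rouge_n_f1(system_summary, gold_summary, n):.3f}")

Surface n-gram overlap of this kind penalizes valid paraphrases, which is one reason the abstract also reports BERTScore (an embedding-based metric, typically computed with a multilingual encoder for Urdu) and complements both with human evaluation.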

