NUST Institutional Repository

Urdu Summarization using Pre-trained Language Models

dc.contributor.author Munaf, Raja Mubashir
dc.contributor.author Supervised by Dr. Hammad Afzal.
dc.date.accessioned 2022-12-07T06:25:48Z
dc.date.available 2022-12-07T06:25:48Z
dc.date.issued 2022-07
dc.identifier.other TCS-529
dc.identifier.other MSCSE / MSSE-25
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/31764
dc.description.abstract The ever-increasing influx of data over the internet poses the challenge of sifting through it to extract meaningful information. Over the last two decades, users have been overwhelmed with both textual and multimedia data owing to the popularity of social media and news platforms. To cope with the challenges of information overload, various research technologies have gained popularity. Natural Language Processing (NLP) has seen significant improvements in the efficiency and accuracy of textual data processing since the inception of language models built on deep-learning-based artificial neural networks. Automatic Summarization (under the umbrella of NLP) is the process of extracting only the meaningful information from a text, reducing its length while preserving its sense. Urdu, despite being the 10th most spoken language in the world, is still a low-resource language with little to no research in Automatic Summarization and NLP; most existing research is restricted to high-resource languages such as English. This work explores deep-learning-based pre-trained language models composed of self-attentive transformers for both extractive and abstractive summarization, capturing contextual information. Moreover, a summarization dataset of 76k records is created by collecting article-summary pairs from the news domain; to the best of our knowledge, it is the first and largest dataset available for Urdu summarization. Experimental results demonstrate competitive evaluation scores (ROUGE, BERTScore) for summarization models fine-tuned on the newly created dataset. Human evaluation is also carried out, identifying the shortcomings of automatic evaluation methods in the field of summarization. en_US
dc.language.iso en en_US
dc.publisher MCS en_US
dc.title Urdu Summarization using Pre-trained Language Models en_US
dc.type Thesis en_US
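
The abstract above reports ROUGE and BERTScore results for the fine-tuned summarization models. The thesis's actual evaluation pipeline is not part of this record; the short Python sketch below only illustrates what an n-gram-overlap ROUGE-N F1 score measures for a candidate/reference summary pair. The whitespace tokenization, the example Urdu sentences, and the choice of n are illustrative assumptions, not the evaluation setup used in the thesis.

# Minimal ROUGE-N F1 sketch for a candidate/reference Urdu summary pair.
# Whitespace tokenization and the example sentences are simplifying
# assumptions; this is not the thesis's actual evaluation pipeline.
from collections import Counter

def ngrams(tokens, n):
    """Return a multiset (Counter) of n-grams from a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_f1(candidate, reference, n=1):
    """F1 of clipped n-gram overlap between candidate and reference."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())
    if not cand or not ref or overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Hypothetical system output and gold summary (placeholder Urdu text).
    system_summary = "وزیراعظم نے نئی تعلیمی پالیسی کا اعلان کیا"
    gold_summary = "وزیراعظم نے آج نئی تعلیمی پالیسی کا اعلان کر دیا"
    for n in (1, 2):
        print(f"ROUGE-{n} F1: {rouge_n_f1(system_summary, gold_summary, n):.3f}")

Surface n-gram overlap of this kind penalizes valid paraphrases, which is one reason the abstract also reports BERTScore (an embedding-based metric, typically computed with a multilingual encoder for Urdu) and complements both with human evaluation.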

