Abstract:
We live in an era of linguistic diversity and global interconnectedness, in which the ability to bridge language barriers is paramount for cross-cultural communication. This study presents a cross-lingual Urdu-to-English extractive text summarization framework based on an unsupervised NLP approach. The framework applies a sequence of steps to a manually prepared, language-specific dataset. It integrates text translation, summarization with the TextRank algorithm, ROUGE score calculation, and sentiment analysis to support language comprehension and conversion.
The motivation for this research is the need to address the linguistic divide in a multilingual society such as Pakistan, where Urdu serves as the national language but English also holds significant importance, especially in professional and educational settings. The primary objective is to develop a framework capable of accurately translating cross-lingual content while preserving the semantic meaning of the source text.
The framework is built on a manually curated dataset paired with human-generated summaries. ROUGE scores computed against these reference summaries are used to assess the accuracy and effectiveness of the framework-generated summaries.
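As an illustration of how a framework-generated summary can be scored against a human-written one, the following is a minimal sketch of ROUGE-1 (unigram overlap). It assumes whitespace tokenization and lowercasing; the function name `rouge_1` is hypothetical, and the study's actual ROUGE implementation and variants are not specified here.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap ROUGE-1 between a candidate and a reference summary.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand_counts & ref_counts).values())

    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge_1("the cat sat", "the cat sat on the mat")
    print(scores)
```

Recall reflects how much of the human summary is covered, while precision reflects how much of the generated summary is supported by the reference.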
The methodology comprises dataset preparation, text translation, summarization, evaluation using ROUGE scores, and sentiment analysis, which gives the reader a sense of the overall sentiment of the content. The findings of this study contribute to the advancement of cross-lingual text summarization technologies.
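The summarization step can be sketched as follows. This is a minimal, standard-library-only illustration of TextRank-style extractive summarization, not the study's implementation. It assumes the input is already translated English text, splits sentences on end punctuation, and uses the word-overlap similarity from the original TextRank formulation; the names `split_sentences`, `similarity`, and `textrank_summary`, and the parameter defaults, are hypothetical.

```python
import math
import re


def split_sentences(text: str) -> list:
    """Naive sentence splitter on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a: str, b: str) -> float:
    """Word overlap normalized by the log of each sentence's distinct-word count."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    return len(words_a & words_b) / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_summary(text: str, k: int = 2, damping: float = 0.85,
                     iterations: int = 50) -> list:
    """Return the k highest-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= k:
        return sentences

    # Weighted, undirected sentence graph stored as an adjacency matrix.
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]

    # Power iteration of the weighted PageRank update.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sample = ("Urdu is the national language of Pakistan. "
              "English is widely used in education in Pakistan. "
              "Many students read Urdu and English text in Pakistan. "
              "Bananas are yellow.")
    print(textrank_summary(sample, k=2))
```

Because the ranking depends only on sentence-to-sentence similarity, no labeled training data is needed, which is consistent with the unsupervised approach described above.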