Abstract:
Named Entity Recognition (NER) is the part of Natural Language Processing (NLP) that identifies and classifies named entities such as the names of people, locations, and organizations. NER also plays a key role in improving the results of Information Extraction, Machine Translation, and many other NLP applications. Social media contains a large volume of data and is therefore considered a valuable source of information. As the information on social media grows exponentially, managing it becomes challenging. A great deal of NER work has been done for resource-rich languages such as English and German. Because Urdu is a low-resource language, very little work has been done on Urdu NER. In recent years, NER has been dominated by deep neural networks, which have achieved higher accuracy than traditional machine learning models. This thesis aims to perform Named Entity Recognition on Urdu social media text using both traditional machine learning and deep learning models. The process starts with the collection of Urdu tweets through the Twitter data API and a web scraper. After the necessary text preprocessing, the refined dataset will be annotated with four named-entity classes (Person, Location, Organization, and Others). The labelled data will then be passed to a feature extraction module to derive relevant features. The extracted features will be used to train state-of-the-art machine learning and deep learning classifiers in order to evaluate the performance of the proposed model.