Abstract:
This thesis presents an approach to bilingual sentiment analysis of short
messages retrieved from Twitter. Social networks have evolved into one of the most
advanced and widespread communication media of the modern era. They also serve as
multi-cultural and multi-lingual information centers. Information analysis of these social
networks can help in designing better government and commercial policies on local,
national and international platforms. A number of behavioral and demographic
analytical studies that use data from social networks have been reported; however, most of
these studies focus on the English language. Despite being spoken by almost 350
million people (6% of the world's population), Hindi and its sister languages (Urdu) lack
extensive work on such sentiment analysis. This research focuses on sentiment
analysis of a bilingual dataset, composed of English and Roman-Urdu tweets, on
the subject of the 2013 General Elections in Pakistan. A bilingual sentiment lexicon (BSL) is
semi-automatically created for assigning sentiment strengths to the short messages (tweets). In
order to provide maximum lexicon coverage for sentiment analysis, other linguistic
resources such as WordNet have been used. The proposed lexicon is used to measure the
popularity of four major political parties on Twitter. The proposed system yields promising
results, with 76% accuracy in classifying the sentiment strength of tweets.