Abstract:
Purpose: In the last few years, the use of social networking sites such as Twitter and
Facebook has grown rapidly. This growth has attracted researchers to use social network
data for sentiment analysis. Sentiment analysis, also known as opinion mining, is the process of
identifying the sentiment (positive, negative, or neutral) expressed in a sequence of words. A huge
amount of data is now generated on the internet, and extracting useful information from it has
become a topic of interest for researchers. Sentiment analysis has been done mostly in English and
Chinese. In this paper, sentiment classification is performed on Urdu news tweets. The
proposed methodology consists of two steps. In the first step, the data is preprocessed by
removing hashtags and stop words. In the second step, a feature vector is constructed.
The feature vector is built from the number of positive words, the number of negative
words, the presence of negation, and POS tags. Once the feature vector is formed, a
decision tree is used as the classification algorithm. The decision tree classifies each tweet as
positive, negative, or neutral. The experimental results show that the proposed methodology
performs well in terms of classification accuracy.
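
The two steps described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the word lists, the `POS_LOOKUP` table, and the function names `preprocess` and `feature_vector` are hypothetical placeholders, and the choice of an adjective count as the POS feature is an assumption, since the text only states that POS tags are used. Hashtag removal is assumed to drop the whole hashtag token.

```python
import re

# Hypothetical toy resources; a real system would use full Urdu lexicons.
STOP_WORDS = {"کا", "کی", "کے", "ہے", "میں", "اور"}
POSITIVE_WORDS = {"اچھا", "خوش", "کامیاب"}
NEGATIVE_WORDS = {"برا", "خراب", "ناکام"}
NEGATION_WORDS = {"نہیں", "نہ", "مت"}
# Placeholder lookup standing in for a real Urdu POS tagger (assumption).
POS_LOOKUP = {"اچھا": "ADJ", "برا": "ADJ", "خراب": "ADJ", "فیصلہ": "NOUN"}

# In Python 3, \w matches Unicode letters, so Urdu hashtags are covered.
HASHTAG = re.compile(r"#\w+")


def preprocess(tweet):
    """Step 1: drop hashtags, split on whitespace, remove stop words."""
    text = HASHTAG.sub(" ", tweet)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def feature_vector(tokens):
    """Step 2: [positive count, negative count, negation flag, adjective count]."""
    return [
        sum(t in POSITIVE_WORDS for t in tokens),
        sum(t in NEGATIVE_WORDS for t in tokens),
        int(any(t in NEGATION_WORDS for t in tokens)),
        sum(POS_LOOKUP.get(t) == "ADJ" for t in tokens),
    ]


if __name__ == "__main__":
    tokens = preprocess("یہ فیصلہ اچھا نہیں ہے #خبر")
    print(tokens, feature_vector(tokens))
```

Keeping the negation flag separate from the polarity counts lets the classifier learn that a positive word combined with negation may indicate a negative tweet, rather than hard-coding that rule.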
Methods: This paper proposes sentence-level sentiment analysis. This section presents the
methodology for sentiment analysis of Urdu news tweets. First, the details of dataset collection
are provided. Next, the dataset is annotated with the help of human annotators. Then data
preprocessing is carried out. The feature vector is then computed from the most relevant
features. Finally, the tweets are classified as positive, negative, or neutral using a decision tree
classifier.
Results: In this research, we presented a summary of the existing state of the art for the
classification of Urdu news tweets and attempted sentiment classification of Urdu tweets.
The impact of the feature vector and the decision tree on classifying tweets as
positive, negative, or neutral was analyzed.
The preprocessed training data, together with the feature vector, was supplied to the C4.5
algorithm, which is used to build the decision tree.
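
For readers unfamiliar with C4.5, the sketch below shows the gain ratio criterion that the algorithm uses to choose which feature to split on. It is an illustration under simplifying assumptions, not the code used in this work: features are treated as discrete values (C4.5 additionally handles continuous attributes with thresholds), the function names `entropy` and `gain_ratio` are hypothetical, and the data in the example is invented toy data.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in a list."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on `values`, normalized by split information."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information is the entropy of the feature values themselves.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Toy data: a negation flag per tweet and the annotated label.
    has_negation = [0, 0, 1, 1]
    labels = ["positive", "positive", "negative", "negative"]
    print(gain_ratio(has_negation, labels))
```

At each node, the feature with the highest gain ratio is selected for the split; normalizing by split information discourages features that fragment the data into many small groups.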