Abstract:
Natural Language Processing (NLP) is a growing field of Artificial Intelligence that deals
with interaction between computers and humans. In NLP, negation is of great importance
because it changes the polarity of a sentence. Recognizing the cue and its scope is an
important part of negation detection. Research on negation detection has been reported
for the English language; for example, in the biomedical domain, the BioScope corpus is a
corpus of biomedical events annotated with negation cues and scopes. No such research
has been done for Urdu, where negation detection is difficult because of the language's
morphologically rich structure. In this thesis, a corpus has been created using BBC Urdu News articles.
Starting from the annotation guidelines of the BioScope corpus, further rules suitable for
Urdu are devised and applied to the BBC Urdu corpus. The corpus comprises 1600
sentences belonging to four domains (politics, sports, ). Different types of negation
cues are extracted from the corpus: single, multiple, and prefix cues. Annotation
has been carried out by 3 domain experts, and inter-annotator agreement has been
measured using the Kappa statistic. The annotated corpus is then used to devise a machine
learning based method using Conditional Random Fields (CRF) to detect the “cue” and “scope”
automatically. This system detected negation cues with 100% precision, 94% recall and
96% F-measure, whereas scope is detected with 75% precision, 81% recall and 77% F-measure.
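
As an illustration of the agreement step, the following is a minimal sketch of how a Kappa statistic can be computed for three annotators. It assumes Fleiss' kappa (the abstract says only "Kappa") and a hypothetical token-level label set of `CUE` and `O`; the function name `fleiss_kappa` and the toy data are illustrative and are not taken from the thesis.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated by the same number of annotators.

    ratings: list of lists; ratings[i] holds one category label per annotator
    for item i (hypothetical format, chosen for this sketch).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: fraction of agreeing annotator pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)

    if p_expected == 1.0:
        # Only one category was ever used; agreement is trivially perfect.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example: four tokens, three annotators each (made-up labels).
toy_ratings = [
    ["CUE", "CUE", "CUE"],
    ["O", "O", "O"],
    ["CUE", "CUE", "O"],
    ["O", "O", "O"],
]
kappa = fleiss_kappa(toy_ratings)
```

If agreement is instead reported pairwise between annotators, Cohen's kappa would be the corresponding choice; the sketch above only covers the multi-annotator case.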
We further investigated the effect of automatically detected negation on sentence-level Sentiment Analysis. For this purpose, we performed Sentiment Analysis on the BBC Urdu News corpus with and without negation detection. Experiments showed that accuracy increased to 82.6% with negation detection, compared to 76.4% without it.