RANKED INFORMATION RETRIEVAL USING WEIGHTED TF IDF

ANWAR, SALEEM

URI: http://10.250.8.41:8080/xmlui/handle/123456789/37775

Date: 2008

Abstract:

Document retrieval is the task of retrieving a relevant document in response to a query, a question, or a reference document. Tasks such as question answering, summarization, novelty detection, and information provenance use a document retrieval module as a preprocessing step, so the performance of these systems depends on the quality of that module. Other tasks, such as information extraction and machine translation, operate on documents, using them as training data, as the unit of input or output, or both, and may benefit from document retrieval to build a training corpus or as a post-processing step. In this thesis we begin by studying the IR model, then build a thorough understanding of existing IR algorithms such as TF IDF, Okapi BM25, and pivoted length normalization, to name a few. While studying these algorithms we identified some deficiencies in them and set out to address those deficiencies. We propose a better approach for scoring documents, named Weighted TF IDF (WTF IDF), in which terms are weighted with respect to locality of documents and term order, rather than simply counted as in TF IDF. Moreover, to cope with different writing styles, we search for a synonym query along with the original query, which increases the chance of retrieving novel information from the corpus. We provide implementations of the existing algorithms, compare their performance with that of the proposed WTF IDF approach, and present the results. The proposed approach gives better results than the existing ones.
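
For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of plain TF IDF ranking, in which query terms are simply counted in each document and scaled by inverse document frequency. It is not the thesis's WTF IDF method, whose weighting details are not given here. The function names `tokenize` and `tf_idf_rank`, the whitespace tokenizer, the unsmoothed `log(N / df)` form of IDF, and the sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def tf_idf_rank(query, documents):
    """Return (document index, score) pairs sorted by descending TF IDF score.

    Hypothetical helper: the score of a document is the sum, over query terms
    that occur in it, of raw term count times log(N / document frequency).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for idx, tokens in enumerate(tokenized):
        tf = Counter(tokens)  # raw term counts, the "counted" behaviour of plain TF IDF
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                score += tf[term] * math.log(n_docs / df[term])
        scores.append((idx, score))

    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Illustrative corpus (invented for this sketch, not taken from the thesis).
docs = [
    "ranked retrieval of documents",
    "machine translation of documents",
    "question answering systems",
]
ranking = tf_idf_rank("ranked retrieval", docs)
```

Because this baseline only counts occurrences, a term contributes the same amount wherever it appears in a document and in whatever order the query terms occur, which is the limitation the abstract says WTF IDF is meant to address.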