NUST Institutional Repository

Extended Refinement Methodology for Automatic Keyphrase Assignment

dc.contributor.author Rabia Irfan
dc.date.accessioned 2020-11-23T13:53:13Z
dc.date.available 2020-11-23T13:53:13Z
dc.date.issued 2012
dc.description Supervisor: Dr Sharifullah Khan en_US
dc.description.abstract Keyphrases help readers find the right information in digital documents. Keyphrase assignment is the alignment of a document or text with the keyphrases of a standard classification taxonomy. Kea++ is a well-known tool for automatic keyphrase assignment; however, it assigns irrelevant terms along with relevant ones. To reduce noise in the Kea++ result set, the refinement methodology defined refinement rules that exploit the semantics of the taxonomy's hierarchical structure. This methodology is a layer on top of Kea++. It was evaluated on a computing domain taxonomy and showed better results than Kea++. However, the refinement methodology is focused on the computing domain taxonomy and does not perform well for taxonomies with a deep hierarchy of keyphrases. The training-level is the hierarchical level of the taxonomy adopted in the manually generated keyphrases for documents in the Kea++ training dataset. In the refinement methodology, the training-level is the key parameter for selecting or rejecting any keyphrase in the Kea++ result set, but its selection process does not give priority to the taxonomy level at which the most keyphrases are aligned in the training dataset. Moreover, the methodology does not apply the standard terminology used in taxonomy languages. This work aims to extend and generalize the refinement methodology to multiple domains and to improve its results. In the proposed extended refinement methodology, the training-level selection process has been revised, and due consideration has been given to taxonomies with a deep hierarchy of keyphrases. Standard terminology used in taxonomy languages has been adopted, and the refinement methodology has been amended accordingly to be practical in multiple domains. The extended refinement methodology was evaluated on three domain taxonomies and datasets: computing, agriculture and mathematics.
The evaluation metrics were (i) precision, recall and F-measure, (ii) the average number of keyphrases assigned to test documents, and (iii) a statistical t-test. The evaluation demonstrates a significant improvement in reducing noise in the Kea++ result set across multiple domains. We conclude that the extended refinement methodology has been generalized and can be applied in domains other than computing. It also showed better results than its predecessor. en_US
dc.publisher SEECS, National University of Sciences and Technology, Islamabad en_US
dc.subject Information Technology en_US
dc.title Extended Refinement Methodology for Automatic Keyphrase Assignment en_US
dc.type Thesis en_US
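
The abstract names two ideas that a short sketch may make more concrete: choosing the training-level as the taxonomy level where most training keyphrases sit, and scoring assigned keyphrases with precision, recall and F-measure. The Python below is only an illustration under stated assumptions, not the thesis implementation. The function names, the input format (one integer depth per manually assigned keyphrase) and the tie-break toward the shallower level are all hypothetical and do not come from the thesis or from Kea++.

```python
from collections import Counter


def select_training_level(keyphrase_levels):
    """Return the taxonomy level that holds the most training keyphrases.

    keyphrase_levels: iterable of ints, one per manually assigned keyphrase,
    giving that keyphrase's depth in the taxonomy (hypothetical input format).
    Ties go to the shallower level; this tie-break is an assumption.
    """
    counts = Counter(keyphrase_levels)
    if not counts:
        raise ValueError("no training keyphrases supplied")
    # Sort key: highest count first, then smallest level number.
    return min(counts, key=lambda level: (-counts[level], level))


def precision_recall_f1(assigned, gold):
    """Set-based precision, recall and F-measure for a single document.

    assigned: keyphrases produced by the system.
    gold: manually assigned keyphrases for the same document.
    """
    assigned, gold = set(assigned), set(gold)
    hits = len(assigned & gold)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(gold) if gold else 0.0
    # F-measure is the harmonic mean; define it as 0 when nothing matches.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```

Calling select_training_level on the depths of all training keyphrases would give a single level to use as the selection parameter, and precision_recall_f1 could be averaged over test documents to compare a refined result set with the raw Kea++ output.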
