dc.contributor.author |
Irfan Ali Khan |
|
dc.date.accessioned |
2020-12-09T06:05:56Z |
|
dc.date.available |
2020-12-09T06:05:56Z |
|
dc.date.issued |
2018 |
|
dc.identifier.uri |
http://10.250.8.41:8080/xmlui/handle/123456789/17136 |
|
dc.description |
Supervisor: Dr. Sharifullah Khan |
en_US |
dc.description.abstract |
A huge amount of textual data is generated every second in today’s digital world. It is essential to process this data in such a way that valuable information can be extracted successfully and precisely. A taxonomy fulfills this requirement by providing one such structure. Many techniques have been developed to generate taxonomies, drawing on fields such as natural language processing, machine learning, information retrieval, and data mining. After reviewing many well-known automatic taxonomy generation techniques, this study divides them into two categories: document-based and concept-based taxonomy generation. In document-based taxonomy generation, documents are arranged in a hierarchical structure based on their similarity to other documents in the collection, whereas in concept-based taxonomy generation, concepts are arranged in a hierarchical structure based on their similarity to other concepts in the collection. Existing techniques have not used common datasets for taxonomy generation, nor have they adopted common criteria for taxonomy evaluation, and no comparison has been made between the results of the document-based and concept-based taxonomy generation processes. The aim of this research is to compare the document-based and concept-based automatic taxonomy generation processes. In this study, both taxonomy generation systems were evaluated on the basis of the content of the generated taxonomy through various experiments. Content quality was measured using content quality measure-precision (CQM-P), content quality measure-recall (CQM-R), and the F1-measure for the generated taxonomy with respect to the gold-standard MEDLINE and ACM taxonomies. The research concluded that the document-based automatic taxonomy generation process performed better than the concept-based process on the Medical and Computer Science domain datasets. 
In addition to content quality evaluation metrics, structure quality evaluation metrics could also be used in future work to evaluate the generated taxonomy. |
en_US |
dc.publisher |
SEECS, National University of Sciences and Technology, Islamabad |
en_US |
dc.subject |
Computer Science |
en_US |
dc.title |
Comparative Analysis of Document vs. Concept based Automatic Taxonomy Generation Process |
en_US |
dc.type |
Thesis |
en_US |