NUST Institutional Repository

TPAC: Tool for Parallel Agglomerative Clustering

Muhammad Usman Anwer 2020-11-20T15:34:36Z 2020-11-20T15:34:36Z 2005
dc.description Supervisor: Dr. Arshad Ali en_US
dc.description.abstract Data mining has emerged as one of the most prominent technologies today. It is used in fields such as bioinformatics, customer prospecting, and product matching. In each of these fields the purpose is the same: to extract useful information from raw data. In this project, we use data mining for house-holding, i.e. identifying all customers who share the same address. We use Agglomerative Hierarchical Clustering (AHC) because house-holding requires viewing clusters at multiple levels of granularity. AHC also uses the maximum amount of information at the initial levels, where the most critical clusters are merged, which makes the clustering more accurate. To avoid the problem of local minima, we constrain AHC with the k-means clustering algorithm. To reduce the running time of AHC, we run it in a distributed environment. Keys are generated from the available data set in a rule-based fashion that depends on the domain. The keys are then distributed across several nodes and clustered. The final results are sent back to the server, which displays a visualization of them. The k-means algorithm helps partition the data while presenting class boundaries. en_US
dc.publisher SEECS, National University of Sciences and Technology, Islamabad en_US
dc.subject Information Technology en_US
dc.title TPAC: Tool for Parallel Agglomerative Clustering en_US
dc.type Thesis en_US
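
The pipeline outlined in the abstract (rule-based key generation, distribution of keys across nodes, agglomerative clustering on each node, results returned to a server) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the thesis's implementation: the normalization rules in `RULES`, the function names, the use of a CRC of the key's last token to choose a node (a simple stand-in for the k-means partitioning the abstract mentions), single linkage, the string-similarity measure, and the 0.8 threshold. The nodes are simulated by a sequential loop rather than real distributed workers.

```python
import re
import zlib
from difflib import SequenceMatcher

# Hypothetical normalization rules; the record does not list the actual ones.
RULES = {"house": "h", "street": "st", "road": "rd", "no": "", "number": ""}


def make_key(address):
    """Build a rule-based matching key from a raw address string."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    tokens = [RULES.get(t, t) for t in tokens]
    return " ".join(t for t in tokens if t)


def assign_node(key, n_nodes):
    """Pick a node from the key's last token (assumed here to be the city)."""
    last = key.split()[-1] if key else ""
    return zlib.crc32(last.encode("utf-8")) % n_nodes


def similarity(a, b):
    """String similarity in [0, 1] between two keys."""
    return SequenceMatcher(None, a, b).ratio()


def agglomerate(items, threshold=0.8):
    """Single-linkage agglomerative clustering of (customer_id, key) pairs."""
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: similarity of the closest pair of members.
                s = max(similarity(a[1], b[1])
                        for a in clusters[i] for b in clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        i, j = pair
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


def household(records, n_nodes=2, threshold=0.8):
    """records: iterable of (customer_id, address). Returns sorted id groups."""
    nodes = [[] for _ in range(n_nodes)]
    for cid, address in records:
        key = make_key(address)
        nodes[assign_node(key, n_nodes)].append((cid, key))
    # Each node would cluster independently; here the loop runs sequentially.
    groups = []
    for items in nodes:
        for cluster in agglomerate(items, threshold):
            groups.append(sorted(cid for cid, _ in cluster))
    return sorted(groups)


records = [
    (1, "House No. 12, Street 5, Islamabad"),
    (2, "H 12 St 5 Islamabad"),
    (3, "House 40, Road 9, Lahore"),
]
print(household(records))  # [[1, 2], [3]]
```

Customers 1 and 2 reduce to the same key and are merged into one household, while customer 3 stays separate. A limitation of this simplified partitioning is that addresses whose last token differs (for example through a typo in the city name) may land on different nodes and never be compared.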

This item appears in the following Collection(s)

  • BS [440]
