TPAC: Tool for Parallel Agglomerative Clustering

Muhammad Usman Anwer

dc.contributor.author	Muhammad Usman Anwer
dc.date.accessioned	2020-11-20T15:34:36Z
dc.date.available	2020-11-20T15:34:36Z
dc.date.issued	2005
dc.identifier.uri	http://10.250.8.41:8080/xmlui/handle/123456789/13248
dc.description	Supervisor: Dr. Arshad Ali	en_US
dc.description.abstract	Data mining has emerged as one of the most prominent technologies of today's world. It has been used in a number of fields such as bioinformatics, customer prospecting, and product matching. In each of these fields the purpose is the same: to extract useful information from raw data. In this project, we have used data mining for householding, i.e. identifying all customers who share the same address. We have used Agglomerative Hierarchical Clustering (AHC) because householding requires viewing clusters at multiple levels of granularity. Another reason for using AHC is that it makes use of the maximum amount of information at the initial levels, where the most critical clusters are merged, and hence makes the clustering more accurate. In addition, to avoid the problem of local minima, we have constrained AHC with the k-means clustering algorithm. Furthermore, to reduce the time requirement of AHC, we have employed it in a distributed environment. We aim to generate keys from the available data set in a rule-based fashion that depends on the domain. These keys are then distributed across several nodes and clustered there. The final results are sent back to the server, which presents a visualization of them. The k-means algorithm helps in partitioning the data while presenting class boundaries.	en_US
dc.publisher	SEECS, National University of Sciences and Technology, Islamabad	en_US
dc.subject	Information Technology	en_US
dc.title	TPAC: Tool for Parallel Agglomerative Clustering	en_US
dc.type	Thesis	en_US
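The abstract describes a pipeline of rule-based key generation, distribution of keys to worker nodes, agglomerative clustering of each key's records, and collection of the results at a server. The following is a minimal sketch of that flow for householding, not the thesis implementation. The blocking rule (first two address tokens), the string similarity (`difflib.SequenceMatcher`), single linkage, the 0.8 merge threshold, and all function names (`household_key`, `agglomerate`, `householding`) are illustrative assumptions. A thread pool stands in for the distributed nodes, and the k-means constraint mentioned in the abstract is not modelled.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher


def normalize(address: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    address = re.sub(r"[^\w\s]", " ", address.lower())
    return " ".join(address.split())


def household_key(address: str) -> str:
    """Hypothetical blocking rule: the first two normalized tokens
    (e.g. house number + street name). Real rules would be domain-specific."""
    return " ".join(normalize(address).split()[:2])


def similarity(a: str, b: str) -> float:
    """Assumed string similarity in [0, 1] between two normalized addresses."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def agglomerate(records, threshold=0.8):
    """Single-linkage agglomerative clustering of (id, address) pairs.

    Starts with one cluster per record and repeatedly merges the most
    similar pair of clusters until no pair reaches `threshold`."""
    clusters = [[r] for r in records]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: similarity of the closest members.
                s = max(similarity(a[1], b[1])
                        for a in clusters[i] for b in clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break  # Stop cutting the hierarchy at this level.
        i, j = pair
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid.
    return [[rid for rid, _ in c] for c in clusters]


def householding(records, threshold=0.8, workers=4):
    """Block records by key, cluster each block in parallel, gather results."""
    blocks = defaultdict(list)
    for rid, address in records:
        blocks[household_key(address)].append((rid, address))
    # Threads stand in for worker nodes; each key block is clustered independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_block = list(pool.map(lambda b: agglomerate(b, threshold),
                                  blocks.values()))
    # "Server" step: merge every block's clusters into one sorted list.
    return sorted(sorted(c) for clusters in per_block for c in clusters)
```

For example, calling `householding` on records such as `(1, "12 Main St., Springfield")`, `(2, "12 main street Springfield")`, and `(4, "40 Oak Avenue, Shelbyville")` groups the two Main Street variants into one household and leaves the Oak Avenue record in its own cluster. Because blocking confines the quadratic pairwise comparisons to records that share a key, each block can be processed on a different node. This is the property that makes distributing AHC worthwhile.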