dc.description.abstract |
Data mining has emerged as one of the most prominent technologies today. It has been used in a number of fields, such as bioinformatics, customer prospecting, and product matching. In each of these fields the purpose is the same: to extract useful information from raw data. In this project, we use data mining for house-holding, i.e. identifying all customers who share the same address. We use Agglomerative Hierarchical Clustering (AHC) because house-holding requires viewing clusters at multiple levels of granularity. Another reason for using AHC is that it uses the maximum amount of information at the initial levels, where the most critical clusters are merged, which makes the clustering more accurate. In addition, to avoid the problem of local minima, we constrain AHC with the k-means clustering algorithm. Furthermore, to reduce the running time of AHC, we run it in a distributed environment. We generate keys on the available data set in a rule-based fashion, depending on the domain. The keys are then distributed across several nodes and clustered. The final results are sent back to the server, which presents a visualization of them. The k-means algorithm helps partition the data while exposing class boundaries. |
en_US |