Abstract:
Many business organizations face the dilemma of being data-rich and information-poor. An unprecedented decrease in the cost of data storage has led to an exponential increase in the amount of data held by key public sector organizations. The real value of these vast silos of multivariate data lies in the extraction of useful information that may guide the policies and practices of these organizations. For better customer management, organizations need a consolidated view of their customer data. Obtaining such a view becomes complex when the sources of information are large and heterogeneous and share no common attribute that identifies records pertaining to the same customer. In this project we set out to tackle the problem of linking different records pertaining to the same customer based on name and address information.
The main focus of our project was to develop a hybrid solution encompassing existing and new techniques from the domain of semantic data matching and record linkage. We therefore undertook an extensive study of the related techniques and arrived at an idea of our own that could possibly extend the existing record linkage model. We tested this idea through several analyses on test data and achieved an accuracy level of 90% in these tests. This document describes our work in detail, along with the implemented solution, the analysis, and the conclusions drawn.