Abstract:
Clustering and Multidimensional data plotting is a research project that introduces a new technique for visual clustering. The aim is to take a large amount of multivariate data and visualize it so that records exhibiting similar characteristics appear close to one another. The research targets a broad range of high-dimensional data sets, and a proof-of-concept application is built on customer transactional data. Applying this visualization technique to customer data will enable business analysts and decision makers to view a large number of rows of customer data at once and to extract useful information from them.
The chosen area of application, transactional data, consists of several attributes that define the buying behavior of a customer. This data is to be visualized on a two-dimensional screen such that points that are close to one another in N dimensions remain close in two dimensions. This requires a dimension reduction technique that preserves the distances between the rows of the data set as closely as possible and reduces the visualization process to a series of mathematical transformations. After a detailed study of several techniques, Singular Value Decomposition emerged as the most appropriate technique for the visualization problem.
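As an illustration of the idea, the following minimal sketch projects the rows of a small table onto two dimensions with a standard Singular Value Decomposition. It is not taken from the project itself: the function name `project_to_2d`, the mean-centering step, and the sample customer values are all assumptions made for the example.

```python
import numpy as np

def project_to_2d(data):
    """Map each row of an (n_rows, n_attributes) table to a 2-D point.

    Assumption: columns are mean-centered before the decomposition.
    """
    X = np.asarray(data, dtype=float)
    X = X - X.mean(axis=0)
    # Thin SVD: X = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Coordinates of every row along the two strongest directions.
    return U[:, :2] * s[:2]

# Hypothetical customers: (visits, amount spent, items bought).
customers = np.array([
    [5.0, 100.0, 2.0],
    [5.5, 110.0, 2.0],
    [40.0, 900.0, 30.0],
    [42.0, 950.0, 28.0],
])
points = project_to_2d(customers)
```

Rows that are similar in the original attribute space (here the first two customers) land near each other in the plane, while dissimilar rows stay far apart.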
Moreover, since the project aims to visualize large amounts of multivariate data, the limitation of main memory is another concern that must be addressed. To handle this memory constraint, a new flavor of singular value decomposition called Fold SVD is introduced. Fold SVD eliminates the RAM limitation, thus enabling visualization of large amounts of customer data. Another important characteristic of this project is that it aims to integrate the user into the cluster identification process, thus combining the fields of data mining and data visualization.
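The abstract does not describe how Fold SVD works, so the sketch below does not attempt to reproduce it. It only illustrates the general memory problem using a different, well-known out-of-core approach: accumulate the small attribute-by-attribute Gram matrix one chunk of rows at a time, take its leading eigenvectors, then project each chunk. The names `chunked_projection` and `read_chunks` are hypothetical, the data is random, and centering is omitted for brevity.

```python
import numpy as np

def chunked_projection(chunks_factory, n_components=2):
    """Yield low-dimensional coordinates chunk by chunk.

    chunks_factory is a zero-argument callable that returns a fresh
    iterator over row chunks, so the full table is never held in memory.
    This is a generic Gram-matrix approach, NOT the project's Fold SVD.
    """
    gram = None
    # Pass 1: accumulate X^T X, which is only n_attributes x n_attributes.
    for chunk in chunks_factory():
        block = np.asarray(chunk, dtype=float)
        if gram is None:
            gram = np.zeros((block.shape[1], block.shape[1]))
        gram += block.T @ block
    # Eigenvectors of X^T X are the right singular vectors of X.
    eigvals, eigvecs = np.linalg.eigh(gram)  # ascending eigenvalues
    top = eigvecs[:, ::-1][:, :n_components]
    # Pass 2: project each chunk onto the leading directions.
    for chunk in chunks_factory():
        yield np.asarray(chunk, dtype=float) @ top

# Stand-in for a table too large for RAM, read three rows at a time.
rng = np.random.default_rng(0)
table = rng.normal(size=(10, 4))

def read_chunks():
    for start in range(0, len(table), 3):
        yield table[start:start + 3]

coords = np.vstack(list(chunked_projection(read_chunks)))
```

Up to the sign of each axis, the result matches an in-memory SVD of the same table, but peak memory depends on the chunk size and the number of attributes rather than on the number of rows.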