Automated Integration of Heterogenous databases

Safdar, Mamoona

URI: http://10.250.8.41:8080/xmlui/handle/123456789/35303

Date: 2020

Abstract:

Integrating heterogeneous databases means combining data from different sources. This is a challenging task because the data model and the representation of data vary even among relational databases. It becomes more complicated when the sources include both relational databases, for example SQL (Structured Query Language) databases, and non-relational ones, for example NoSQL (Not only SQL) databases. In the past, researchers focused on integrating different relational databases. Nowadays, the integration of SQL and NoSQL databases has become an important issue because of the popularity of NoSQL. Until now, various techniques based on supervised machine learning algorithms have been introduced to solve the problem of heterogeneous database integration, and every method performs the integration in its own way. We introduce unsupervised machine learning algorithms to perform the integration. The main idea of this approach is to integrate relational and non-relational databases, with the aim of increasing the efficiency of the data, by using unsupervised machine learning algorithms, so that no training or supervision of the dataset is needed. The proposed approach first retrieves data from MongoDB and applies clustering to that data using set centroid values. The algorithm then represents the clusters with different colors, and each cluster represents a specific table of the SQL database. We also look for the best machine learning algorithm by comparing different algorithms on accuracy. We used only the K-means, spectral, agglomerative, and mean shift algorithms. To validate the clustering produced by each algorithm, we used a confusion matrix. The proposed approach has been validated through multiple case studies. There is a gap between techniques based on supervised machine learning algorithms and techniques based on unsupervised ones, so an unsupervised solution is needed to automatically integrate NoSQL to SQL databases and close this research gap.
We have proposed this solution for integrating relational and non-relational databases through unsupervised machine learning algorithms. It automatically predicts similar data in the non-relational database in the form of clusters, which we can then use to represent the entities of the relational database.
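
The clustering and validation steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. The sample documents, the field-presence featurization, the choice of initial centroids, and the helper names (`to_vector`, `kmeans`, `confusion_matrix`) are all assumptions made for illustration. Only K-means is shown, written with the Python standard library, and the documents are hard-coded rather than read from MongoDB.

```python
# Hypothetical MongoDB-style documents (assumed data, not from the thesis).
# The first three resemble a "customers" table, the last three an "orders" table.
DOCS = [
    {"name": "Ali", "email": "ali@example.com", "city": "Lahore"},
    {"name": "Sara", "email": "sara@example.com"},
    {"name": "Omar", "city": "Karachi"},
    {"order_id": 1, "product": "pen", "amount": 2},
    {"order_id": 2, "product": "ink"},
    {"order_id": 3, "product": "pad", "amount": 5},
]
TRUE_TABLES = [0, 0, 0, 1, 1, 1]  # assumed ground truth: 0 = customers, 1 = orders


def to_vector(doc, fields):
    """Encode a document as 1.0/0.0 per field (assumed featurization)."""
    return [1.0 if f in doc else 0.0 for f in fields]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iter=100):
    """Plain K-means starting from explicitly set centroid values."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def confusion_matrix(true_labels, cluster_labels, n_classes, n_clusters):
    """Rows are true tables, columns are predicted clusters."""
    matrix = [[0] * n_clusters for _ in range(n_classes)]
    for t, c in zip(true_labels, cluster_labels):
        matrix[t][c] += 1
    return matrix


fields = sorted({key for doc in DOCS for key in doc})
points = [to_vector(doc, fields) for doc in DOCS]

# Set centroid values: seed one cluster per expected table (an assumption).
labels, centroids = kmeans(points, [points[0], points[3]])
matrix = confusion_matrix(TRUE_TABLES, labels, n_classes=2, n_clusters=2)

# Cluster ids are arbitrary in general; here the seeding makes cluster k
# line up with table k, so the diagonal counts the correct assignments.
accuracy = sum(matrix[i][i] for i in range(2)) / len(DOCS)
print("cluster labels:", labels)
print("confusion matrix:", matrix)
print("accuracy:", accuracy)
```

In a fuller comparison, the same confusion-matrix check would be repeated for spectral, agglomerative, and mean shift clustering, with the algorithm scoring the highest accuracy preferred.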