NUST Institutional Repository

Framework to cluster structured data and a recovery mechanism

dc.contributor.author Khalid, Amna
dc.contributor.author Supervised by Dr. Hammad Afzal
dc.date.accessioned 2020-11-17T06:26:01Z
dc.date.available 2020-11-17T06:26:01Z
dc.date.issued 2014-09
dc.identifier.other TCS-336
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/12361
dc.description.abstract Analytic business applications are trend-defining applications; they also determine the worth of data at a particular time. For any business organization it is important to keep records and log everyday business activities, strategies, and the reasons behind each decision. Adding structure to data makes it so powerful that no unstructured data management tool can handle it. Hadoop tried to apply an unstructured data management tool to structured data, but query execution became expensive. MapReduce is an internationally recognized algorithm for querying unstructured data, yet Hadoop claims to handle both structured and unstructured data. This research illustrates a method that can help Hadoop and other big data management tools work on structured data as effectively as they work on unstructured data. The performance of traditional data management tools drops when running cross-table analytical queries on structured data in a distributed processing environment; their response times are high because of ill-aligned data sets and the complex hierarchy of the distributed computing environment. Data alignment requires a complete shift in the data deployment paradigm from a row-oriented storage layout to a column-oriented storage layout, and the complex hierarchy of the distributed computing environment can be handled by keeping metadata for the entire data set. Response time to analytical queries can be lowered with the support of two concepts: shared architecture and multi-path query execution. Highly scalable systems are based on a shared-nothing architecture, but degraded performance and fault tolerance are side effects that come with high scalability. The proposed method is an effort to balance scalability, performance, and fault tolerance. Shared architecture and active backup help improve the system's performance by sharing the per-node workload. The proposed clustering methodology sheds the data pressure points to minimize the data loss per node crash. en_US
dc.language.iso en en_US
dc.publisher MCS en_US
dc.title Framework to cluster structured data and a recovery mechanism en_US
dc.type Thesis en_US

