dc.description.abstract |
Analytic business applications are trend-defining applications; they also determine the worth of data at a particular time. For any business organization, it is important to keep records and log everyday business activities, strategies, and the reasons behind each decision. Adding structure to data makes it so powerful that no unstructured data management tool can handle it effectively. Hadoop tried to apply an unstructured data management tool to structured data, but query execution became expensive. MapReduce is a widely recognized programming model for querying unstructured data, yet Hadoop claims to handle both structured and unstructured data. This research illustrates a method that can help Hadoop and other big data management tools work on structured data as effectively as they work on unstructured data. The performance of traditional data management tools drops when running cross-table analytical queries on structured data in a distributed processing environment; their response time is high because of ill-aligned data sets and the complex hierarchy of the distributed computing environment. Data alignment requires a complete shift in the data deployment paradigm from a row-oriented storage layout to a column-oriented storage layout, and the complex hierarchy of the distributed computing environment can be handled by keeping metadata for the entire data set. Response time to analytical queries can be lowered with the support of two concepts: shared architecture and multi-path query execution. Highly scalable systems are based on a shared-nothing architecture, but degraded performance and fault tolerance are side effects that come with high scalability. The proposed method is an effort to balance scalability, performance, and fault tolerance. Shared architecture and active backup help improve the system’s performance by sharing the workload per node. 
The proposed clustering methodology relieves data pressure points to minimize the data lost per node crash. |
en_US |