Implementation and evaluation of storage layouts for querying RDF data on Hadoop using MapReduce

Tariq, Haroon

dc.contributor.author	Tariq, Haroon
dc.date.accessioned	2020-10-28T07:11:08Z
dc.date.available	2020-10-28T07:11:08Z
dc.date.issued	2015
dc.identifier.uri	http://10.250.8.41:8080/xmlui/handle/123456789/6466
dc.description	Supervisor: Dr. Khalid Latif	en_US
dc.description.abstract	With advances in the Semantic Web, the volume of Resource Description Framework (RDF) triples is growing rapidly, and storing and querying this data has become a major issue. Traditional data storage models cannot provide both scalable storage and efficient querying. A parallel, distributed, and scalable model is needed to store and process large amounts of RDF data. Although significant work has been done in this field to achieve high performance and scalability, every approach has its drawbacks. Hadoop is a framework designed for parallel, distributed storage and processing of large datasets, which makes it an appropriate choice for storing and processing large amounts of graph data. In this thesis we implemented RDF storage layouts on a highly scalable, distributed, column-oriented storage model using Hadoop/HBase. We used two RDF storage layouts: vertical partitioning and property table. In vertical partitioning, a table is created for each property, containing the subjects and objects related to that property; this approach provides better support for multi-valued attributes. In the property table approach, subjects of a similar type are grouped in a single table with all of their properties, which avoids expensive joins. The Berlin SPARQL Benchmark (BSBM) dataset was used to test and evaluate the layouts. The storage layouts are compared on the basis of storage and query efficiency, and the situations in which a particular layout performs better than the others are identified. In our experiments, property table and vertical partitioning performed much better than the triple table. Property table performed particularly well for subject-based queries, and vertical partitioning performed better for queries that require a full table scan.	en_US
dc.publisher	SEECS, National University of Science & Technology	en_US
dc.subject	storage layouts, RDF data, Hadoop, Information Technology	en_US
dc.title	Implementation and evaluation of storage layouts for querying RDF data on Hadoop using MapReduce	en_US
dc.type	Thesis	en_US
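
The two layouts described in the abstract can be illustrated with a minimal in-memory sketch. This is not the thesis implementation, which runs on Hadoop/HBase. The sample triples and the function names are hypothetical, and for brevity the property table here holds all subjects in one table rather than grouping subjects by type.

```python
from collections import defaultdict

# Hypothetical sample triples (subject, property, object); not BSBM data.
triples = [
    ("product1", "label", "Widget"),
    ("product1", "feature", "f1"),
    ("product1", "feature", "f2"),
    ("product2", "label", "Gadget"),
]


def vertical_partitioning(triples):
    """Build one table per property, each holding (subject, object) pairs."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def property_table(triples):
    """Build one row per subject, with each property as a column.

    Column values are lists so that multi-valued properties fit in one row.
    """
    rows = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        rows[s][p].append(o)
    return {s: dict(cols) for s, cols in rows.items()}


vp = vertical_partitioning(triples)
pt = property_table(triples)

# Property-based access reads a single property table.
print(vp["feature"])
# Subject-based access reads a single row, with no join across properties.
print(pt["product1"])
```

In this sketch a lookup by subject touches one row of the property table, while a scan over one property touches one vertically partitioned table, which mirrors the trade-off the abstract reports.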