NUST Institutional Repository

Implementation and evaluation of storage layouts for querying RDF data on Hadoop using MapReduce

dc.contributor.author Tariq, Haroon
dc.date.accessioned 2020-10-28T07:11:08Z
dc.date.available 2020-10-28T07:11:08Z
dc.date.issued 2015
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/6466
dc.description Supervisor: Dr. Khalid Latif en_US
dc.description.abstract With the advancement of the semantic web, the volume of Resource Description Framework (RDF) triples is growing rapidly, and storing and querying this data has become a major issue. Traditional data storage models cannot provide both scalable storage and efficient querying. A parallel, distributed, and scalable model is needed to store and process large amounts of RDF data. Although significant work has been done in this field to achieve high performance and scalability, every approach has its drawbacks. Hadoop is a framework designed for parallel and distributed storage and processing of large datasets, which makes it an appropriate choice for storing and processing large amounts of graph data. In this thesis we have implemented RDF storage layouts on a highly scalable, distributed, column-oriented storage model using Hadoop/HBase. Our approach uses two different RDF data models: vertical partitioning and property table. In vertical partitioning, a table is created for each property, containing the subjects and objects related to that property; this approach provides better support for multi-valued attributes. In the property table approach, subjects of a similar type are grouped in a single table together with all of their properties, which avoids the cost of expensive joins. The Berlin SPARQL Benchmark (BSBM) dataset is used to test and evaluate the layouts. The storage layouts are compared on the basis of storage and query efficiency, and the situations in which a particular layout performs better than the others are identified. In our case, property table and vertical partitioning performed much better than the triple table. Property table performed significantly better for subject-based queries, and vertical partitioning gave better performance for queries that require a whole-table scan. en_US
dc.publisher SEEC, National University of Science & Technology en_US
dc.subject storage layouts, RDF data, Hadoop, Information Technology en_US
dc.title Implementation and evaluation of storage layouts for querying RDF data on Hadoop using MapReduce en_US
dc.type Thesis en_US
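
The two layouts named in the abstract can be illustrated with a minimal in-memory sketch. It does not reproduce the thesis implementation on Hadoop/HBase: the sample triples, the function names `vertical_partition` and `property_tables`, and the choice of `rdf:type` as the way to group "similar" subjects are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample triples (subject, property, object).
# They are NOT taken from the thesis or from the BSBM dataset.
TRIPLES = [
    ("product1", "rdf:type", "Product"),
    ("product1", "label", "Widget"),
    ("product1", "feature", "f1"),
    ("product1", "feature", "f2"),
    ("product2", "rdf:type", "Product"),
    ("product2", "label", "Gadget"),
    ("vendor1", "rdf:type", "Vendor"),
    ("vendor1", "label", "Acme"),
]


def vertical_partition(triples):
    """Build one table per property, each holding (subject, object) pairs.

    A multi-valued property simply contributes several rows to its table.
    """
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def property_tables(triples, type_property="rdf:type"):
    """Build one table per subject type; each row holds all properties of a subject.

    Assumption: subjects are considered "similar" when they share the same
    value for `type_property`. Subjects without a type go to "untyped".
    """
    subject_type = {s: o for s, p, o in triples if p == type_property}
    tables = {}
    for s, p, o in triples:
        if p == type_property:
            continue  # the type selects the table, it is not stored as a column
        group = subject_type.get(s, "untyped")
        row = tables.setdefault(group, {}).setdefault(s, {})
        row.setdefault(p, []).append(o)
    return tables


if __name__ == "__main__":
    vp = vertical_partition(TRIPLES)
    pt = property_tables(TRIPLES)
    # Whole-property scan: read a single vertical-partition table.
    print(vp["feature"])
    # Subject lookup: read a single property-table row, no join needed.
    print(pt["Product"]["product1"])
```

In this sketch, a query over every value of one property touches a single vertical-partition table, while a query for everything known about one subject touches a single property-table row, which mirrors the trade-off the abstract describes.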


This item appears in the following Collection(s)

  • MS [435]
