dc.contributor.author |
Tariq, Haroon |
|
dc.date.accessioned |
2020-10-28T07:11:08Z |
|
dc.date.available |
2020-10-28T07:11:08Z |
|
dc.date.issued |
2015 |
|
dc.identifier.uri |
http://10.250.8.41:8080/xmlui/handle/123456789/6466 |
|
dc.description |
Supervisor: Dr. Khalid Latif |
en_US |
dc.description.abstract |
With the advancement of the Semantic Web, the volume of Resource Description Framework (RDF)
triples is increasing tremendously. Storing and querying this huge amount of data has become a
major issue. Traditional data storage models are unable to solve the problems of scalable storage
and query efficiency. A parallel, distributed, and scalable model is needed to meet the
requirements of storing and processing huge amounts of RDF data. Although significant work
has been done in this field to achieve high performance and scalability, every approach has its
drawbacks.
Hadoop is a framework designed for parallel and distributed storage and processing of
large datasets, which makes it a very appropriate choice for storing and processing huge amounts
of graph data. In this thesis we have implemented RDF storage layouts on a highly scalable,
distributed, column-oriented storage model using Hadoop/HBase. In our approach we have
used two different RDF data models (vertical partitioning and property table). In vertical
partitioning, a table is created for each property, containing the subjects and objects related to
that particular property; this approach provides better support for multi-valued attributes. In the
property table approach, subjects of similar type are grouped in a single table with all of their
properties, which saves the cost of expensive joins. The Berlin SPARQL Benchmark (BSBM) dataset is used for
the testing and evaluation of the layouts. The storage layouts are compared on the basis of
storage and query efficiency, and the situations in which a particular layout performs better
than the others are identified. In our case, the property table and vertical partitioning layouts
performed much better than the triple table. The property table performed particularly well for
subject-based queries, and vertical partitioning gave better performance for queries that require
a full table scan. |
en_US |
dc.publisher |
SEEC, National University of Science & Technology |
en_US |
dc.subject |
storage layouts, RDF data, Hadoop, Information Technology |
en_US |
dc.title |
Implementation and evaluation of storage layouts for querying RDF data on Hadoop using MapReduce |
en_US |
dc.type |
Thesis |
en_US |