MapRQL: SPARQL over MapReduce

Karim, Farah

URI: http://10.250.8.41:8080/xmlui/handle/123456789/9835

Date: 2013

Abstract:

The Semantic Web is an emerging technology that enables machines to manipulate web data and produce useful results. RDF is the W3C standard framework for Semantic Web data annotation. RDF data repositories are growing in both number and size with each passing day. This continuous growth of Semantic Web data has attracted the research community to work on efficient access to and manipulation of RDF data. Jena, Sesame and OpenLink Virtuoso have laid the groundwork for the development of RDF data management systems. However, these traditional RDF data management approaches are centralized and cannot manage huge volumes of RDF data. In view of the growing volume of RDF data, emerging RDF tools are based on distributed technologies. This work is a contribution to that domain. It aims to provide a distributed SPARQL query framework for RDF data processing using Hadoop. We combine two components of Hadoop: MapReduce and HBase. MapReduce provides efficient processing of huge volumes of data on commodity hardware, whereas HBase stores semi-structured data in a scalable way. MapRQL generates customized MapReduce jobs for SPARQL graph patterns and runs them over huge volumes of RDF data stored in HBase. MapRQL supports SPARQL queries with a basic graph pattern, a basic graph pattern with a filter constraint, a union (alternative) graph pattern, and an optional graph pattern. MapRQL is evaluated on the Barton dataset by observing execution time as the dataset size is gradually increased up to 50 million RDF triples. The same SPARQL queries are run directly over Hive on the same dataset, and the query execution time of MapRQL is compared with that of Hive. The results show a significant performance gain for MapRQL in executing SPARQL queries. In future, indexing can be implemented in MapRQL for efficient retrieval of RDF data.
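
To illustrate the general idea of evaluating a basic graph pattern with map and reduce phases, the following is a minimal in-memory sketch in plain Python. It is not the thesis's implementation: it uses neither Hadoop nor HBase, and the sample triples, the predicate names, and the functions `match`, `map_phase`, `shuffle`, `reduce_phase` and `run_bgp` are all hypothetical names chosen for this example. It also assumes that every triple pattern contains the join variable and that the patterns share no other variable.

```python
from collections import defaultdict

# Toy RDF triples (hypothetical data, not taken from the Barton dataset).
TRIPLES = [
    ("book1", "type", "Text"),
    ("book1", "language", "fre"),
    ("book2", "type", "Text"),
    ("book2", "language", "eng"),
    ("map1", "type", "Map"),
]

# Basic graph pattern:  ?s type Text .  ?s language ?lang
PATTERNS = [("?s", "type", "Text"), ("?s", "language", "?lang")]


def match(pattern, triple):
    """Return variable bindings if the triple matches the pattern, else None."""
    bindings = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable repeated inside one pattern must bind to one value.
            if bindings.get(term, value) != value:
                return None
            bindings[term] = value
        elif term != value:
            return None
    return bindings


def map_phase(triples, patterns, join_var):
    """Emit (join key, (pattern index, bindings)) for every matching triple."""
    for triple in triples:
        for index, pattern in enumerate(patterns):
            bindings = match(pattern, triple)
            if bindings is not None:
                # Assumption: every pattern contains join_var.
                yield bindings[join_var], (index, bindings)


def shuffle(pairs):
    """Group mapper output by key, standing in for the framework's shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, pattern_count):
    """Join the bindings that share a key across all patterns."""
    results = []
    for key in sorted(groups):
        by_pattern = defaultdict(list)
        for index, bindings in groups[key]:
            by_pattern[index].append(bindings)
        if len(by_pattern) < pattern_count:
            continue  # at least one pattern has no match for this key
        merged = [{}]
        for index in range(pattern_count):
            # Assumption: join_var is the only variable shared by the patterns.
            merged = [{**left, **right}
                      for left in merged for right in by_pattern[index]]
        results.extend(merged)
    return results


def run_bgp(triples, patterns, join_var="?s"):
    """Evaluate the pattern as one map step, one shuffle, and one reduce step."""
    pairs = map_phase(triples, patterns, join_var)
    return reduce_phase(shuffle(pairs), len(patterns))


if __name__ == "__main__":
    for solution in run_bgp(TRIPLES, PATTERNS):
        print(solution)
```

In this sketch the join variable acts as the shuffle key, so all bindings for the same subject reach the same reducer call. An optional pattern could plausibly be handled by keeping keys whose optional side has no match instead of skipping them, but that is a variation on this sketch and not a description of how MapRQL does it.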