Abstract:
Resource Description Framework (RDF) is a W3C Recommendation for
knowledge representation on the Semantic Web. The growing volume of
RDF-annotated data demands scalable semantic stores. Hadoop-based
distributed and parallel processing frameworks such as HBase and Hive are
increasingly popular for storing voluminous data and for flexibly handling
complex data. Hive is a Hadoop-based data warehousing infrastructure with
support for complex analytical processing. Its query interface
does not, however, support data exploration using SPARQL, the standard
query language for RDF. Integrating these technologies and adding support
for SPARQL queries could yield a scalable Semantic Web data store. We
propose a semantics-preserving SPARQL-to-HiveQL translation scheme that
adds a query interface to Hadoop-based RDF stores. The major contributions
of our research are (i) a semantics-preserving SPARQL-to-HiveQL query
translation algorithm and (ii) a storage-schema-independent querying
mechanism that accommodates different storage schemes without affecting
translation time. The experimental results demonstrate the efficiency of the
proposed translation algorithm and its support for different types of
SPARQL queries.
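To make the idea of SPARQL-to-HiveQL translation concrete, the following is a minimal sketch under stated assumptions, not the authors' algorithm. It assumes a single hypothetical triple table `triples(subject, predicate, object)` and handles only a basic graph pattern (a conjunction of triple patterns): each pattern becomes one self-join alias, a variable shared between patterns becomes a join equality, and each constant becomes a WHERE filter. The table name, column names, and the function `bgp_to_hiveql` are illustrative assumptions; the storage-schema-independent mechanism described above is not modeled.

```python
from typing import Dict, List, Tuple

# A triple pattern is (subject, predicate, object); terms starting with "?" are variables.
TriplePattern = Tuple[str, str, str]

# Hypothetical column names of an assumed single triple table.
COLUMNS = ("subject", "predicate", "object")


def _hive_literal(term: str) -> str:
    """Quote a constant as a HiveQL string literal (backslash-escaping \\ and ')."""
    escaped = term.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def bgp_to_hiveql(patterns: List[TriplePattern], table: str = "triples") -> str:
    """Translate a SPARQL basic graph pattern into one HiveQL SELECT with self-joins."""
    if not patterns:
        raise ValueError("a basic graph pattern needs at least one triple pattern")

    var_binding: Dict[str, str] = {}  # variable -> first column reference that binds it
    from_items: List[str] = []        # FROM / JOIN clauses, one per triple pattern
    filters: List[str] = []           # WHERE conditions

    for i, pattern in enumerate(patterns):
        alias = f"t{i}"
        join_conds: List[str] = []
        for col, term in zip(COLUMNS, pattern):
            ref = f"{alias}.{col}"
            if term.startswith("?"):
                if term in var_binding:
                    # Shared variable: equate with the column that first bound it.
                    join_conds.append(f"{ref} = {var_binding[term]}")
                else:
                    var_binding[term] = ref
            else:
                # Constant term: becomes a filter on this alias.
                filters.append(f"{ref} = {_hive_literal(term)}")

        if i == 0:
            from_items.append(f"{table} {alias}")
            filters.extend(join_conds)  # e.g. (?x, p, ?x) inside the first pattern
        elif join_conds:
            from_items.append(f"JOIN {table} {alias} ON ({' AND '.join(join_conds)})")
        else:
            # No shared variable with earlier patterns: a Cartesian product.
            from_items.append(f"CROSS JOIN {table} {alias}")

    # Project every variable under its SPARQL name (without the leading "?").
    select_list = ", ".join(f"{ref} AS {var[1:]}" for var, ref in var_binding.items()) or "1"
    sql = f"SELECT {select_list} FROM " + " ".join(from_items)
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    return sql


if __name__ == "__main__":
    query = [
        ("?person", "rdf:type", "foaf:Person"),
        ("?person", "foaf:name", "?name"),
    ]
    print(bgp_to_hiveql(query))
```

For the two-pattern query in the usage example, the sketch emits `SELECT t0.subject AS person, t1.object AS name FROM triples t0 JOIN triples t1 ON (t1.subject = t0.subject) WHERE t0.predicate = 'rdf:type' AND t0.object = 'foaf:Person' AND t1.predicate = 'foaf:name'`. Other storage layouts (for example, property tables) would change how patterns map to tables and columns, which is where a schema-independent design would plug in.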