Abstract:
The concept of the Semantic Web was initially proposed by Tim Berners-Lee
in 1999. In the Semantic Web, information is represented using specific
languages such as the Resource Description Framework (RDF) and the Web
Ontology Language (OWL). RDF is simple and has been standardized by the
World Wide Web Consortium (W3C), and consequently its usage in knowledge
management applications has widely increased. A storage infrastructure
capable of storing and processing large RDF datasets is therefore an
essential need. Existing RDF processing frameworks handle small datasets
efficiently, but processing large datasets requires a costly, high-powered
server setup. There is an essential need to cope with this challenge in
order to provide a cost-effective and scalable system that can efficiently
handle massive amounts of RDF data.
Distributed and parallel processing models are commonly used to process
massive datasets efficiently and effectively. Hadoop is such an open-source
distributed and parallel processing framework. The Hadoop Distributed File
System (HDFS), HBase (a distributed database of Hadoop), and Hive (a data
warehousing framework) are already being used to process massive data. We
developed a framework based on HDFS, HBase, and Hive to store and retrieve
massive RDF datasets using cheap commodity hardware. We stored massive RDF
data in HDFS and HBase to test scalability and then executed various
queries to analyze the performance and efficiency of our framework.
Result analysis indicated that we are able to cope with the scalability
issue by storing massive RDF data on a configuration of a few simple
machines in a distributed environment; moreover, the execution of various
queries also showed that our proposed framework is very effective and
efficient compared to existing frameworks such as Jena, Sesame, and
AllegroGraph.