Abstract:
The concept of the Semantic Web was initially proposed by Tim Berners-Lee
in 1999. In the Semantic Web, information is represented using specific
languages such as the Resource Description Framework (RDF) and the Web
Ontology Language (OWL). RDF is simple and has been standardized by the
World Wide Web Consortium (W3C), and consequently its usage in knowledge
management applications has widely increased. A storage infrastructure
capable of storing and processing large RDF datasets is therefore an
essential need. Existing RDF processing frameworks handle small datasets
efficiently, but processing large datasets requires a costly, high-powered
server setup. There is an essential need to cope with this challenge in
order to provide a cost-effective and scalable system that can efficiently
handle massive amounts of RDF data.
Distributed and parallel processing models are commonly used to process
massive datasets efficiently and effectively. Hadoop is such an open-source
distributed and parallel processing framework. The Hadoop Distributed File
System (HDFS), HBase (a distributed database of Hadoop), and Hive (a data
warehousing framework) are already being used to process massive data. We
developed a framework based on HDFS, HBase, and Hive to store and retrieve
massive RDF datasets using cheap commodity hardware. We stored massive RDF
data in HDFS and HBase to test scalability and then executed various
queries to analyze the performance and efficiency of our framework.
Result analysis indicated that we are able to cope with the scalability
issue by storing massive RDF data on a configuration of a few simple
machines in a distributed environment; moreover, the execution of various
queries also showed that our proposed framework is very effective and
efficient compared to existing frameworks such as Jena, Sesame, and
AllegroGraph.