Ontology Driven Relevance Reasoning for Source Selection in Data Integration

Bilal, Muhammad

URI: http://10.250.8.41:8080/xmlui/handle/123456789/6423

Date: 2006

Abstract:

Online data sources are autonomous, heterogeneous, and geographically distributed, and they can join or leave a data integration system arbitrarily. Some sources contribute little to a user query because they are not relevant to it. Executing every query against all available sources therefore consumes resources unnecessarily and makes queries expensive. Source selection addresses this issue. However, existing relevance reasoning techniques for source selection spend significant time traversing source descriptions, so query response time degrades as the number of available sources grows. Moreover, simple matching cannot resolve fine-grained semantic heterogeneities in the data, and the semantic heterogeneity of data sources makes relevance reasoning complex. Together, these issues degrade the performance of data integration systems.

In this research, we propose an ontology-driven relevance reasoning architecture that identifies the data sources relevant to a user query before the query is executed. The methodology aligns source descriptions (i.e., local ontologies) with a domain ontology through a bitmap index. Instead of traversing the local ontologies, it performs relevance reasoning over the bitmap index to improve query response time. Relevance reasoning uses semantic matching to provide semantic interoperability: semantic operators such as exactMatch, sameAs, equivalentOf, subClassOf, and disjointFrom are introduced to resolve fine-grained semantic heterogeneities among data sources. Each operator is assigned a quantitative score, and data sources are ranked by the similarity scores they obtain.
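The abstract does not specify the layout of the bitmap index or the score assigned to each operator, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes one bitmap per (domain concept, operator) pair, with bit i set when source i's local ontology aligns with that domain concept through that operator, and it uses hypothetical operator scores. The names `BitmapIndex`, `OPERATOR_SCORES`, `align`, and `rank` are illustrative and do not come from the thesis.

```python
from collections import defaultdict

# Hypothetical operator scores: the thesis assigns quantitative scores to
# these operators, but the values below are assumptions for illustration.
OPERATOR_SCORES = {
    "exactMatch": 1.0,
    "sameAs": 0.9,
    "equivalentOf": 0.8,
    "subClassOf": 0.6,
    "disjointFrom": 0.0,  # a disjoint alignment adds nothing to relevance
}


class BitmapIndex:
    """Hypothetical index: (domain concept, operator) -> bitmap over sources."""

    def __init__(self, sources):
        self.sources = list(sources)     # list position = bit position
        self.bitmaps = defaultdict(int)  # Python ints serve as bitsets

    def align(self, source, alignments):
        """Record a source's alignments, given as {domain_concept: operator}."""
        bit = 1 << self.sources.index(source)
        for concept, operator in alignments.items():
            if operator not in OPERATOR_SCORES:
                raise ValueError(f"unknown operator: {operator}")
            self.bitmaps[(concept, operator)] |= bit

    def rank(self, query_concepts):
        """Score sources from the index alone, without touching local ontologies."""
        scores = [0.0] * len(self.sources)
        for concept in query_concepts:
            for operator, weight in OPERATOR_SCORES.items():
                bitmap = self.bitmaps.get((concept, operator), 0)
                i = 0
                while bitmap:  # walk the set bits of this bitmap
                    if bitmap & 1:
                        scores[i] += weight
                    bitmap >>= 1
                    i += 1
        # Keep only sources with a positive score; highest score first,
        # ties broken by source name so the order is deterministic.
        ranked = [(s, src) for s, src in zip(scores, self.sources) if s > 0]
        ranked.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(src, s) for s, src in ranked]


if __name__ == "__main__":
    index = BitmapIndex(["S1", "S2", "S3"])
    index.align("S1", {"Patient": "exactMatch", "Doctor": "sameAs"})
    index.align("S2", {"Patient": "subClassOf"})
    index.align("S3", {"Patient": "disjointFrom", "Ward": "exactMatch"})
    print(index.rank(["Patient", "Doctor"]))  # [('S1', 1.9), ('S2', 0.6)]
```

In this sketch, answering a query only reads the index entries for the query's concepts, which mirrors the abstract's point that relevance reasoning avoids traversing each local ontology. Source S3 is excluded because its only alignment with a query concept is `disjointFrom`, which (under the assumed scores) contributes nothing.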