Abstract:
The rise of the information technology sector has increased storage requirements
in cloud data centers at an unprecedented pace. According to the EMC Digital Universe
study 2012 [1], global storage reached 2.8 trillion GB and is projected to reach
5,247 GB per user by 2020. Data redundancy is one of the root causes of storage
scarcity because clients upload data without knowing what content is already
available on the server. The Ponemon Institute detected 18 percent redundant data
in its "National Survey on Data Center Outages"
[15]. To resolve this issue, the concept of data deduplication is used, where each
file has a unique hash identifier that changes with the content of the file. If a
client tries to save a duplicate of an existing file, the client receives a pointer
for retrieving the existing file. In this way, data deduplication reduces storage
consumption and identifies redundant copies of the same files stored at data centers.
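To make the hash-and-pointer mechanism concrete, the following is a minimal sketch of file-level deduplication in Python; the in-memory index, the SHA-256 choice, and the function names are illustrative assumptions rather than details of the system described in this study.

import hashlib

# Illustrative in-memory index: content hash -> location of the single stored copy.
# (Assumption: a real data center would persist this index in a database.)
dedup_index = {}

def file_hash(path):
    """Hash the whole file, so the identifier changes whenever its content changes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def store_file(path):
    """Store a file only if its content is new; otherwise point to the existing copy."""
    digest = file_hash(path)
    if digest not in dedup_index:
        dedup_index[digest] = path  # placeholder: copy the bytes to backend storage here
    return digest  # the client keeps this pointer to retrieve the (possibly shared) file

A client who uploads an exact copy of an already stored file thus receives only the existing pointer, which is where the storage saving comes from.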
Therefore, many popular cloud storage vendors such as Amazon, Google, Dropbox,
IBM Cloud, Microsoft Azure, SpiderOak, Wuala, and Mozy have adopted data deduplication. In this
study, we compare the commonly used File-level deduplication with our proposed
Block-level deduplication for cloud data centers. We implemented both deduplication
approaches on a local dataset and demonstrated that the proposed Block-level
deduplication approach achieves 5 percent better results than the File-level
deduplication approach. Furthermore, we expect that performance can be further
improved by considering a larger dataset with more users working in a
similar domain.
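As a rough illustration of why Block-level deduplication can outperform File-level deduplication, the sketch below splits files into fixed-size blocks and compares the deduplicated size of each approach; the 4 KB block size and fixed-size chunking are assumptions made only for illustration, not the parameters used in our experiments.

import hashlib

BLOCK_SIZE = 4 * 1024  # assumed fixed block size, chosen only for illustration

def dedup_ratio(files):
    """Return (file_level_ratio, block_level_ratio), i.e. stored size / original size.

    Assumes at least one non-empty file in `files` (a list of byte strings).
    """
    original = sum(len(f) for f in files)
    # File-level: keep one copy per distinct whole-file hash.
    unique_files = {hashlib.sha256(f).hexdigest(): len(f) for f in files}
    # Block-level: keep one copy per distinct block hash.
    unique_blocks = {}
    for f in files:
        for i in range(0, len(f), BLOCK_SIZE):
            block = f[i:i + BLOCK_SIZE]
            unique_blocks[hashlib.sha256(block).hexdigest()] = len(block)
    return (sum(unique_files.values()) / original,
            sum(unique_blocks.values()) / original)

For example, two files that differ only in their last block are entirely distinct at the file level, yet share all earlier blocks at the block level; this is the kind of redundancy that Block-level deduplication captures and File-level deduplication misses.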