Abstract:
Protecting patient privacy while keeping healthcare data useful for analysis is a major challenge in the era of digital health records. This thesis examines differential privacy as a strong privacy-preserving method for medical data. Differential privacy is a mathematical framework that prevents sensitive information from being disclosed while preserving data utility for analysis; it does this by introducing controlled noise into the data. The main objective of this thesis is therefore the privacy preservation of healthcare data in the Internet of Things (IoT) using differential privacy. This research proposes a secure, privacy-preserving scheme that ensures strong individual privacy while maintaining data utility and allowing queries based on sensitive attributes under differential privacy. The mechanism guarantees individual privacy while incurring minimal computation and communication costs.

We have designed a basic framework that achieves a differential privacy guarantee and evaluated the level of privacy that can be achieved on electronic healthcare data. For this purpose, we have implemented differential privacy on two publicly available datasets: the Breast Cancer Prediction Dataset and the Nursing Home COVID-19 Dataset. By applying differential privacy mechanisms to these datasets, we evaluate the balance between privacy and data utility, demonstrating the effectiveness of differential privacy in real-world healthcare scenarios. Additionally, we have conducted a timing comparison by running multiple complex queries on these datasets to analyze the computational overhead introduced by differential privacy. The results show that, despite a slight increase in query processing time, the overhead remains within reasonable bounds, making differential privacy practical for real-time applications.

A significant part of this study involves the selection of the privacy parameter ε, which determines the degree of privacy protection. We have examined the impact of varying ε values on both the privacy and the utility of the data. Our experiments demonstrate that a lower ε value enhances privacy at the cost of reduced data utility, whereas a higher ε value offers better utility but weaker privacy protection. This trade-off analysis provides crucial insights into choosing ε for different healthcare data use cases.
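For reference, the guarantee the framework aims to satisfy can be stated in its standard form from the differential privacy literature (this formulation is standard and not specific to this thesis):

```latex
% Standard epsilon-differential privacy definition and the Laplace mechanism.
A randomized mechanism $\mathcal{M}$ satisfies $\varepsilon$-differential
privacy if, for all neighboring datasets $D$ and $D'$ differing in a single
record and all output sets $S \subseteq \mathrm{Range}(\mathcal{M})$,
\[
  \Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \cdot \Pr[\mathcal{M}(D') \in S].
\]
The Laplace mechanism achieves this guarantee by releasing
\[
  \mathcal{M}(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right),
  \qquad
  \Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1,
\]
where $\Delta f$ is the global sensitivity of the query $f$.
```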
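To make the ε trade-off concrete, the following is a minimal sketch of a differentially private count query using the Laplace mechanism. The count value and the grid of ε values are illustrative placeholders, not results from the thesis experiments.

```python
# Minimal sketch of the Laplace mechanism for a differentially private
# count query, illustrating the privacy/utility trade-off governed by epsilon.
# The true count and epsilon values below are hypothetical placeholders.
import numpy as np

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Return a noisy count satisfying epsilon-differential privacy.

    Adding or removing one record changes a count query by at most 1,
    so the global sensitivity of a count is 1.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Hypothetical count, e.g. patients with a positive diagnosis in a dataset.
true_count = 357
for eps in [0.1, 0.5, 1.0, 5.0]:
    noisy = dp_count(true_count, eps)
    # Smaller epsilon -> larger noise scale (1/eps) -> stronger privacy but
    # lower utility; larger epsilon reverses the trade-off.
    print(f"epsilon={eps:4.1f}  noisy count={noisy:8.2f}  "
          f"abs error={abs(noisy - true_count):6.2f}")
```

Running the sketch repeatedly shows the reported error shrinking on average as ε grows, which is the trade-off analyzed empirically in this thesis.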
The findings of this thesis contribute to the growing body of knowledge on privacy-preserving data analysis in the healthcare industry by providing practical recommendations and insights for applying differential privacy in various healthcare data scenarios. This work underscores the importance of adopting advanced privacy-preserving techniques to foster trust and compliance in healthcare data sharing and analytics.