Abstract:
Cloud computing has revolutionized the way data is stored and managed,
providing unparalleled scalability and accessibility. However, the rapid adoption of
cloud services has led to an escalating challenge in digital forensic investigations,
resulting in a considerable backlog of cases. In response to this critical issue, we
define cloud forensic constraints and then design a Cloud Forensic Framework
to alleviate this forensic backlog. Building upon these constraints and
the framework, a streamlined Cloud Forensic Process Flow is
established. To address the data duplication that contributes to the forensic
backlog, we use hashing to reduce duplicate data. This optimizes storage utilization
and minimizes redundancy, thereby expediting investigations. In the
context of fraud detection within cloud-stored email data, we focus on relevant
data extraction and prioritization. Our framework offers an approach to identify
pertinent information efficiently, enhancing the effectiveness of subsequent analysis.
Specifically, we employ Topic Modelling using Latent Dirichlet Allocation (LDA)
for the detection of fraudulent emails, facilitating rapid fraud identification. To
further augment information extraction, Named Entity Recognition (NER)
powered by BERT is employed to identify entities of interest from the email text
data. Finally, we describe Relation Extraction, which uncovers
connections between the identified named entities and thereby
supports the investigation. The results of our
experimentation show that the BERT model outperforms both the
rule-based approach and the CRF model. They further show that data
deduplication, relevant data extraction, and prioritization within our
Forensic Framework significantly reduce investigation time, storage, and backlog.
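
The hash-based deduplication mentioned above can be illustrated with a minimal
sketch. The abstract does not name a hash function or a data layout, so SHA-256,
the (identifier, content) pairs, and the function names below are assumptions
made purely for illustration, not the implementation used in this work.

```python
import hashlib


def sha256_digest(content: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string (hash choice is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def deduplicate(items):
    """Split (identifier, content) pairs into unique items and duplicates.

    Returns a tuple (unique_ids, duplicates), where duplicates is a list of
    (duplicate_id, original_id) pairs. Hypothetical helper for illustration.
    """
    first_seen = {}   # digest -> identifier of the first item with that content
    unique_ids = []
    duplicates = []
    for item_id, content in items:
        digest = sha256_digest(content)
        if digest in first_seen:
            # Same digest as an earlier item: record it as a duplicate
            duplicates.append((item_id, first_seen[digest]))
        else:
            first_seen[digest] = item_id
            unique_ids.append(item_id)
    return unique_ids, duplicates


# Illustrative data only: three email bodies, two of them byte-identical
emails = [
    ("mail-001", b"Please wire the funds today."),
    ("mail-002", b"Meeting moved to 3 pm."),
    ("mail-003", b"Please wire the funds today."),
]
unique_ids, duplicates = deduplicate(emails)
```

Only the items in unique_ids would then be kept for storage and analysis, while
each duplicate is retained as a reference to its original. Exact hashing of this
kind detects only byte-identical content; near-duplicates would need a different
technique.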