Abstract:
Distributed Denial of Service attacks over the Internet have the ability to
cripple online businesses and services. Subsequent extortion attempts by attackers
require Internet service providers to conduct rapid forensics on the digital
evidence collected, so that businesses can avoid paying large sums of ransom
money. Unfortunately, because DDoS attacks inherently produce huge volumes of
garbage and spoofed network data, forensic investigators have to manually sift
through gigabytes of network logs to discern useful information such as the
source of the attacks, the victim IPs and the vulnerabilities exploited. Owing to
the lack of distributed DDoS forensics tools, these investigations can take
several days or weeks to complete, leading to the loss of substantial funds in
ransom money. Cloud Computing has recently emerged as a promising technology
which allows everyday users to harness the massively parallel processing
capabilities of commodity machines as a pay-as-you-go utility service.
The contribution of this work is the conceptualization, design and implementation
of two distributed DDoS forensics frameworks that harness the power of the cloud
via the MapReduce paradigm to perform entropy-based clustering and security
analysis of the key features of DDoS attack traffic. We have evaluated our
framework on two large and publicly available DDoS attack datasets. Our results
show that our framework is as accurate in correctly identifying the various phases
of a DDoS attack as other competing approaches. Moreover, we achieve an 86%
speedup with our solution that uses hierarchical agglomerative clustering designed
for Hadoop to perform forensic analysis on a modestly sized cloud of ten nodes.
We also performed the forensic analysis using Mahout’s k-means, but that solution
is not as practical as our own Hadoop-based solution because it requires very
accurate threshold values, which cannot be obtained without trial-and-error
experiments or some spectral analysis, which itself takes a lot of time. However,
once it is given accurate values, its performance is better and it provides over
87% speedup compared to the sequential paradigm. Additionally, our hierarchical
agglomerative clustering unit can be added to Mahout.
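
As a rough illustration of the entropy-based analysis mentioned above, the following minimal sketch computes the Shannon entropy of source IP addresses per time window in a map/reduce style. It is not the implementation evaluated in this work: the choice of source IP as the feature, the 60-second window, the sample packets, and the names `shannon_entropy`, `map_packet` and `reduce_windows` are all assumptions made for illustration, and the code runs in a single process rather than on Hadoop.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def map_packet(packet, window_seconds=60):
    """Map step: key each packet by its time window (hypothetical 60 s default)."""
    timestamp, src_ip = packet
    return (int(timestamp // window_seconds), src_ip)


def reduce_windows(mapped):
    """Reduce step: group source IPs by window and compute entropy per window."""
    groups = defaultdict(list)
    for window, src_ip in mapped:
        groups[window].append(src_ip)
    return {window: shannon_entropy(ips) for window, ips in groups.items()}


# Illustrative (timestamp, source IP) pairs, not taken from any real dataset.
packets = [
    (0, "10.0.0.1"), (10, "10.0.0.1"), (20, "10.0.0.1"), (30, "10.0.0.1"),
    (60, "1.1.1.1"), (70, "2.2.2.2"), (80, "3.3.3.3"), (90, "4.4.4.4"),
]
entropy_by_window = reduce_windows(map_packet(p) for p in packets)
```

In this toy input, the first window contains a single source address and the second contains four distinct ones, so the per-window entropy rises sharply between them. A change of that kind in a traffic feature is the sort of signal a clustering step could use to separate attack phases, under the assumptions stated above.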