dc.description.abstract |
Cyber Threat Intelligence (CTI) is the collection and analysis of cyber incidents that
pose risks to the safety of an enterprise. According to a survey by Gemalto, in 2019,
approximately 9 billion records were breached globally, resulting in an economic loss
of $608 billion. This suggests that CTI is not consumed effectively and that current
approaches have shortcomings. A common practice is to share technical attack artifacts,
which can be easily modified and disguised, resulting in a deceptive and biased threat
investigation. It is ironic that, while adversaries do occasionally enhance their exploit
kits and malware tools, the underlying attack patterns have remained the same over
the course of history. To investigate cyber threats effectively, it is necessary to identify
them based on the adversary's attack patterns in the cyber kill-chain model. These patterns,
as reported in CTI, are textual and cannot be directly interpreted by machines
to investigate a threat incident. Based on the aforementioned problems, the objective
of this research is to motivate the development of a data-driven and structured vocabulary
for extracting adversaries' attack patterns from CTI documents and using them effectively
in proactive defense. We employed a design science methodology to explore
solutions in the domains of distributional semantic topic modeling,
machine learning, and multi-criteria decision making. The solutions proposed in this
thesis are grouped under four frameworks. The first uses machine learning to identify
cyber threats based on observed threat artifacts. The second uses distributional semantic
topic analysis to profile cyber threat actors based on their attack patterns reported in
textual CTI reports. The third framework identifies prevalent attack patterns and finds
useful associations among them. Finally, to help consumers select an appropriate service
provider, the fourth framework proposes an extensive set of criteria for selecting
CTI vendors according to consumers' requirements. As a result of this research, a
benchmark dataset was created and the efficacy of the proposed frameworks
was evaluated. The dataset is being used by security researchers worldwide. The proposed
solution can identify cyber threats with high accuracy (92%), a low false-positive
rate (2%), and a low detection time (0.15 s), a marked improvement over
competing solutions. The baselines established by this research help researchers
and practitioners enhance cyber threat investigation procedures.
The proposed solution depends on the reliability of the CTI source. In the future, the
economic aspect of incentivizing threat sharing will be explored. |
en_US |