Abstract:
The last couple of years have seen a radical shift in the cyber defense paradigm
from reactive to proactive, a change marked by the steady growth of
Cyber Threat Intelligence (CTI) sharing. Currently, there are
numerous Open Source Intelligence (OSINT) sources providing periodically
updated threat feeds that are fed into various analytical solutions. These
sources now produce an excessive amount of data, both structured (STIX,
IOC, etc.) and unstructured (blacklists, etc.).
However, threat feeds often lack the level of detail required for making
informed security decisions, since most indicators are atomic in nature,
such as IP addresses and hashes, and are usually rather volatile.
These feeds distinctly lack strategic threat information, such as attack
patterns and techniques that truly represent the behavior of an attacker or
an exploit.
Another vital piece of information missing from CTI is the course of action
taken by an organization to combat a threat; sharing it would make it easier
for other organizations to formulate their own countermeasures against that
threat.
We propose using natural language processing to extract threat feeds by
mining unstructured cyber threat information sources, then cleansing,
aggregating, tagging, and indexing the information and providing output in
standard formats such as STIX, a widely accepted industry standard for
representing CTI. Automating this otherwise tedious manual task would ensure
the timely gathering and sharing of relevant CTI, giving organizations the
edge to proactively defend against known as well as unknown threats.
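
The proposed flow (mine, cleanse, aggregate, tag, emit STIX) can be
illustrated with a minimal sketch. It is not the system described in this
work: regular expressions stand in for the natural language processing step,
only two atomic indicator types are handled, and all function names and the
sample reports are hypothetical. The output follows the general shape of a
STIX 2.1 indicator but has not been validated against the specification.

```python
import json
import re
import uuid
from datetime import datetime, timezone

# Assumption: regexes stand in for the NLP extraction step of the proposal.
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")

# STIX pattern templates for the two indicator tags used in this sketch.
PATTERNS = {
    "ipv4": "[ipv4-addr:value = '{}']",
    "sha256": "[file:hashes.'SHA-256' = '{}']",
}


def cleanse(text):
    """Undo '[.]' defanging and collapse runs of whitespace."""
    return " ".join(text.replace("[.]", ".").split())


def extract_indicators(reports):
    """Aggregate unique (tag, value) pairs found across all reports."""
    found = set()
    for report in reports:
        clean = cleanse(report)
        found.update(("ipv4", ip) for ip in IPV4_RE.findall(clean))
        found.update(("sha256", h.lower()) for h in SHA256_RE.findall(clean))
    return sorted(found)


def to_stix_bundle(indicators):
    """Wrap tagged indicators in a simplified STIX 2.1-style bundle."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    objects = [
        {
            "type": "indicator",
            "spec_version": "2.1",
            "id": f"indicator--{uuid.uuid4()}",
            "created": now,
            "modified": now,
            "labels": [tag],
            "pattern_type": "stix",
            "pattern": PATTERNS[tag].format(value),
            "valid_from": now,
        }
        for tag, value in indicators
    ]
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}


if __name__ == "__main__":
    # Hypothetical report snippets; the address is from a documentation range.
    sample_reports = [
        "Beaconing to 203[.]0[.]113[.]7 was observed.",
        "C2 at 203.0.113.7; dropper hash " + "ab" * 32,
    ]
    print(json.dumps(to_stix_bundle(extract_indicators(sample_reports)), indent=2))
```

Note that this sketch only recovers atomic indicators, which is exactly the
limitation of existing feeds discussed above; extracting attack patterns or
courses of action would require the language-level analysis that the
proposal calls for.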