Abstract:
The basic idea of exploit code detection inside network flows with the goal of preventing remote exploits is not new. For fast, efficient and real time analysis of network flows statistical methods are the most feasible choice. Unlike other anomaly based detection systems instead of modeling benign behavior we are trying to figure out the probability of
vi
presence of executable code in normal network traffic flows. Markov chains are used to model the transition probabilities. This technique is supposed to look for assembly language code in any kind of compressed and uncompressed data streams. We are trying to overcome the size constraints faced by already presented statistical methods in this domain. Our technique should be efficient enough to locate few bytes (i.e. 20 to 30 bytes) of assembly code in fairly large files which may larger than 20 MB.
Like any stochastic process in coding at any step if it is provided with present state, the chances that future states will be equally likely are very less; some future states will have more probability of occurring then the others. This property is referred as Markov property: “Given the present future states are independent of the past states”.The changes of the state are called transitions and the probabilities associated with various state changes are called transition probabilities. Fairly efficient estimates can be obtained if the proper transition probabilities are obtained.
We have used this approach to design a novel algorithm that can distinguish malcode/exploit shellcode from benign code or random data. We show that our algorithm can identify shellcode as short as 57 bytes with reliability.