Transfer Learning Autoencoder Neural Networks For Anomaly Detection In Malware Infected IoT Devices

Shafiq, Unsub

URI: http://10.250.8.41:8080/xmlui/handle/123456789/29278

Date: 2022

Abstract:

Distributed Denial of Service (DDoS) attacks have persisted despite defensive measures, owing to their capacity for obscurity and the simplicity of the attack vector they exploit, i.e., exhausting the victim’s computing resources. The advent of the Internet of Things (IoT) has led to a massive increase in smart internet-connected devices, which often run customized firmware with limited and irregular security patches. This has made them targets for hosting bot malware, which lets them contribute traffic to a DDoS attack. Many researchers have developed anomaly detectors to identify potentially infected hosts and have incorporated machine learning (ML) models into their techniques. However, the ubiquity of IoT devices has made training ML models on a per-device and per-malware basis impractical. In this thesis, we focus on the autoencoder neural-network-based anomaly detection technique and evaluate the efficacy of transfer learning in reducing the training time of anomaly models for IoT devices. We base our hypothesis on the intuition that similar IoT devices should have similar network footprints and that the latent representation of a network footprint should therefore be transferable across devices. The study is based on the NBaIoT [1] dataset, which consists of 115 traffic features of real IoT devices infected with Mirai and Bashlite. We observe that while the accuracy of an anomaly model decreased when tested against data from a new IoT device, re-training the innermost layers of the autoencoder with at least 10% of the available dataset restored the anomaly model’s performance. We further evaluate the capability of autoencoders against the CIC-IDS2017 dataset, which consists of network-flow information derived from PCAP data and more closely resembles the IPFIX/NetFlow record formats prevalent in the industry.
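The transfer step described in the abstract (reuse a trained autoencoder on a new device and re-train only its innermost layers on 10% of that device's data) can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the layer sizes other than the 115 input features, the tanh activations, the learning rate, the epoch count, and the synthetic stand-in data are all assumptions made for illustration, and the helper names (init_autoencoder, train, synthetic_device) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# 115 matches the feature count in the abstract; hidden and latent sizes are assumed.
N_FEATURES, N_HIDDEN, N_LATENT = 115, 32, 8

def init_autoencoder():
    """Weights for: outer encoder, inner encoder, inner decoder, outer decoder."""
    sizes = [(N_FEATURES, N_HIDDEN), (N_HIDDEN, N_LATENT),
             (N_LATENT, N_HIDDEN), (N_HIDDEN, N_FEATURES)]
    return [rng.normal(0.0, 0.1, size=s) for s in sizes]

def forward(weights, x):
    """Return all layer activations; tanh on hidden layers, linear output."""
    acts = [x]
    for i, w in enumerate(weights):
        z = acts[-1] @ w
        acts.append(z if i == len(weights) - 1 else np.tanh(z))
    return acts

def reconstruction_error(weights, x):
    """Per-sample mean squared reconstruction error (the anomaly score)."""
    return np.mean((forward(weights, x)[-1] - x) ** 2, axis=1)

def train(weights, x, trainable, epochs=300, lr=0.05):
    """Full-batch gradient descent that updates only the layers in `trainable`."""
    for _ in range(epochs):
        acts = forward(weights, x)
        # Gradient of the mean squared error with respect to the linear output.
        delta = 2.0 * (acts[-1] - x) / x.size
        grads = [None] * len(weights)
        for i in reversed(range(len(weights))):
            grads[i] = acts[i].T @ delta
            if i > 0:
                # Backpropagate through the tanh that produced acts[i].
                delta = (delta @ weights[i].T) * (1.0 - acts[i] ** 2)
        for i in trainable:
            weights[i] -= lr * grads[i]  # layers not listed stay frozen
    return weights

def synthetic_device(mixing, n_samples):
    """Assumed stand-in for benign traffic features: low-rank signal plus noise."""
    latent = rng.normal(size=(n_samples, mixing.shape[0]))
    return latent @ mixing + 0.05 * rng.normal(size=(n_samples, N_FEATURES))

mix_a = 0.5 * rng.normal(size=(4, N_FEATURES))
mix_b = mix_a + 0.2 * rng.normal(size=(4, N_FEATURES))  # a "similar" device
x_a = synthetic_device(mix_a, 500)
x_b = synthetic_device(mix_b, 500)

# 1. Train every layer on the source device.
source_model = train(init_autoencoder(), x_a, trainable=[0, 1, 2, 3])

# 2. Copy the weights, then re-train only the two innermost layers
#    on 10% of the target device's data.
target_model = [w.copy() for w in source_model]
x_b_small = x_b[: len(x_b) // 10]
err_before = reconstruction_error(target_model, x_b_small).mean()
train(target_model, x_b_small, trainable=[1, 2])
err_after = reconstruction_error(target_model, x_b_small).mean()
```

Freezing the outer layers keeps the number of updated parameters small, which is one plausible reason a fraction of the target data could suffice; whether the sketch reproduces the reported recovery in accuracy depends on the real data and architecture, which are not specified here.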
