Abstract:
Since their inception, Internet of Things (IoT) devices have been a technological breakthrough.
These devices are well connected; they generate and consume data, which involves transmitting
data back and forth among various devices. With this advancement in technology, we are
connected not only with each other but also with man-made devices, and we can control them
anywhere and at any time. While the majority of scientists work on advancing this technology,
a few try to exploit its weaknesses, for example by gaining unauthorized access, using
resources without permission, and rendering services unavailable through Denial of Service
(DoS) and Distributed DoS (DDoS) attacks. Ensuring the security of data is therefore a
critical challenge for IoT. Since IoT devices are inherently low-power and have limited
compute power, a Network Intrusion Detection System (NIDS) is typically employed to detect
malicious packets and block them from entering the network.
In this context, this thesis proposes feature clusters for Flow, Message Queuing Telemetry
Transport (MQTT) and Transmission Control Protocol (TCP), built from features in the
UNSW-NB15 dataset. We apply supervised Machine Learning (ML) algorithms, namely Random
Forest (RF), Support Vector Machine (SVM) and Artificial Neural Networks (ANN), to the
clusters. Using RF, we achieve accuracies of 98.67% and 97.37% in binary and multi-class
classification, respectively. With the cluster-based techniques, RF achieves classification
accuracies of 96.96%, 91.4% and 97.54% on Flow & MQTT features, TCP features and the top
features from both clusters, respectively. We also created a dataset of packets from IoT
traffic by comparing features from the UNSW-NB15 and Bot-IoT datasets, in order to classify
normal, DoS and DDoS traffic. In both techniques, we addressed problems such as over-fitting,
the curse of dimensionality and imbalance in the dataset. Using Deep Learning (DL), we
achieved a classification accuracy of 96.3%. Moreover, we show that the proposed feature
clusters and our comparison of datasets provide higher accuracy and require less training
time than other state-of-the-art ML-based approaches.
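
As a minimal sketch of what a feature cluster means here, the snippet below groups a few
UNSW-NB15 column names into two clusters and extracts one cluster from a flow record. The
grouping shown is an illustrative assumption, not the assignment used in this thesis, and the
names FEATURE_CLUSTERS, select_cluster and accuracy are hypothetical helpers introduced only
for this example.

```python
# Illustrative assumption: a possible grouping of UNSW-NB15 columns.
# The actual cluster membership is defined in the body of the thesis.
FEATURE_CLUSTERS = {
    "flow_mqtt": ["dur", "proto", "service", "sbytes", "dbytes", "rate"],
    "tcp": ["swin", "dwin", "stcpb", "dtcpb", "tcprtt", "synack", "ackdat"],
}


def select_cluster(record, cluster):
    """Return the values of one feature cluster in a fixed column order."""
    return [record[name] for name in FEATURE_CLUSTERS[cluster]]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    # Made-up record for demonstration only; not taken from the dataset.
    record = {
        "dur": 0.12, "proto": "tcp", "service": "-", "sbytes": 496,
        "dbytes": 1762, "rate": 74.1, "swin": 255, "dwin": 255,
        "stcpb": 1000, "dtcpb": 2000, "tcprtt": 0.05, "synack": 0.02,
        "ackdat": 0.03,
    }
    print(select_cluster(record, "tcp"))
    print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))
```

Each classifier would then be trained on the vectors of one cluster (or on the top features
drawn from both), and the accuracy helper corresponds to the metric reported above.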