Performance Analysis of IDS Using Feature Selection and ML Methods

Khan, Muhammad Muheet; Supervised by Dr. Fahim Arif.

URI: http://10.250.8.41:8080/xmlui/handle/123456789/32448

Date: 2022-12

Abstract:

Intrusion detection systems (IDS) have for many years played an important part in improving the performance of systems by avoiding and preventing false attacks on those systems, making networks safer and more secure. Working on the internet has become increasingly difficult because of cyber-attacks and security risks such as intrusions. Because an IDS can detect abnormal system behavior very accurately, researchers have produced different types of IDS for different types of environments. However, new problems keep arising for researchers and for intrusion detection systems as the behavior of attacks changes. This is a serious concern because the attacks are highly impactful and the life and accuracy of the system are at stake. Relying on existing intrusion detection systems (IDS) and prevention systems (PS) is therefore risky, because they are unable to detect threats posed by new levels and kinds of attacks. Recently, machine learning has shown good potential for detecting threats and anomalies in cases where anomaly-based detection systems generally fail. For that purpose, state-of-the-art classical machine learning algorithms are applied to the UNSW-NB15 dataset, which serves as the benchmark for experimentation, although other datasets are also available. In this work, several machine learning techniques are used to detect attacks on the system accurately. Performance degrades when the data is high-dimensional; therefore, the number of dimensions was reduced from 49 to 12 to achieve higher accuracy. In addition, features that are sparse or that have little importance or impact on the results were filtered out.
Several classifiers, including ensemble methods, were then applied and their accuracy compared: Random Forest (RF), Support Vector Machine (SVM), XGBoost, Logistic Regression (LR) and Decision Tree (DT). The wrapper-based feature selection technique Recursive Feature Elimination (RFE) is used to obtain the desired features. All of the machine learning algorithms are applied to both binary and multi-class classification. In our results, Random Forest combined with RFE feature selection achieved an accuracy of 99.70% for binary classification, the highest among the models; the experiments were run using Anaconda 3.0 with Python 3.0 and Google Colab. For multi-class classification, the accuracy is 70.72% for training and 65.86% for the training data set. Another wrapper-based technique, sequential feature selection, was also applied and gave an accuracy of 99.70% with Random Forest using feature elimination.
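
The RFE plus Random Forest pipeline named in the abstract could look roughly like the sketch below. It is not the thesis code: it assumes scikit-learn, and it uses a synthetic 49-feature data set from make_classification as a stand-in for UNSW-NB15. The sample count, tree count, step size and split ratio are illustrative choices, so the accuracy it prints says nothing about the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in for UNSW-NB15 with 49 columns and a
# binary label (normal vs. attack). Replace with the real data to reproduce.
X, y = make_classification(
    n_samples=600, n_features=49, n_informative=12, n_redundant=5,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# RFE repeatedly fits the forest, ranks features by importance and drops
# the weakest ones (up to 5 per round here) until 12 features remain.
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=12,
    step=5,
)
selector.fit(X_train, y_train)

# Train the final classifier on the reduced feature set only.
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train_sel, y_train)

test_accuracy = accuracy_score(y_test, clf.predict(X_test_sel))
print(f"features kept: {X_train_sel.shape[1]}, test accuracy: {test_accuracy:.4f}")
```

The selector is fitted on the training split only, so the held-out split plays no part in choosing the 12 features. For the multi-class case, the same sketch would apply with a multi-class label column in place of the binary one.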
